
Logistic Regression

Explore the fundamentals of logistic regression, a key classification algorithm that predicts class probabilities using the sigmoid function. Understand how to model parameters with maximum likelihood estimation and apply this approach to multi-class problems using one-versus-all strategy. This lesson prepares you to implement logistic regression and optimize coefficients via gradient ascent for better prediction accuracy.

Moving towards class probability

In this lesson, you will learn logistic regression, a classification algorithm that works with probabilities.

Instead of predicting the outcome directly, we can predict the probability of that outcome.

In the linear model, we used the score t to get the class output y. If the score is greater than zero, we say that the class is positive. If the score is less than zero, we say that the class is negative. The score can range from negative infinity to positive infinity.

If the score approaches positive infinity (or is a very large positive value), we can say with confidence that the class is positive. If the score approaches negative infinity (or is a very large negative value), we can say with confidence that the class is negative. If the score is near zero, the answer is uncertain.

So, we have to map the score to the probability range [0, 1].

We can use a link function to do this transformation. This function takes the score and maps it to probability space [0,1].

Logistic function

We can use different types of functions to map the score to a probability scale. A common choice is the sigmoid (logistic) function, defined as:

Sigmoid = \frac{1}{1 + e^{-Score}}

The plot of score versus function value looks like this:

[Plot: the S-shaped sigmoid curve, rising from 0 toward 1 as the score goes from large negative to large positive values, passing through 0.5 at a score of zero]
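As a minimal sketch, the sigmoid can be written in Python like this (the function name is our own choice):

```python
import math

def sigmoid(score):
    """Map a raw score in (-inf, inf) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))

print(sigmoid(0))    # 0.5: a score of zero is maximally uncertain
print(sigmoid(10))   # close to 1: confident positive class
print(sigmoid(-10))  # close to 0: confident negative class
```

Note that the sigmoid never actually reaches 0 or 1; it only approaches them as the score grows in magnitude.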

Our objective in logistic regression is to find the coefficients β that predict the right class with high probability.

Sigmoid = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_i[1] + \beta_2 x_i[2] + \ldots + \beta_k x_i[k])}}

Logistic regression refers to learning the β weights that maximize a quality metric.
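Putting the two pieces together, predicting a class probability means computing the linear score from the features and passing it through the sigmoid. The coefficient values and the example below are made up purely for illustration:

```python
import math

def predict_probability(x, beta):
    """P(y = +1 | x, beta): sigmoid of the linear score.
    beta[0] is the intercept; beta[1:] pair with the features in x."""
    score = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical coefficients and one iris-like example (illustrative only)
beta = [-1.0, 0.3, -0.2, 0.6, 0.9]
x = [6.7, 3.3, 5.7, 2.1]
p = predict_probability(x, beta)  # a probability strictly between 0 and 1
```

If p is above 0.5, we would predict the positive class for this example.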

Quiz: Sigmoid function

1.

What is the output of the Sigmoid function if the score is 3?

A.

0.75

B.

0.85

C.

0.95

D.

1



Handling multiclasses (more than 2 classes)

Multiclass problems have more than two classes, and many real-world problems are multiclass. For example, the original iris dataset has three classes, and we have to predict the right one from the given features.

We can build a multiclass classification model using the one-versus-all strategy. We train C classifiers, where C is the number of classes. Classifier i treats the data belonging to class i as positive and puts all other data in a single negative category. All C classifiers are trained this way. At prediction time, we get a probability from each classifier and output the class with the maximum probability.

For example, let’s say we have three classes: {Cat, Dog, Horse}

  • Classifier 1: Cat and Other dataset
  • Classifier 2: Dog and Other dataset
  • Classifier 3: Horse and Other dataset

When we get a new example, we run it through all the classifier models. The predicted class is the one whose classifier outputs the highest probability.
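The prediction step of one-versus-all can be sketched in a few lines. Here the three "trained classifiers" are stand-in functions returning fixed probabilities, just to show the selection logic; the names are our own:

```python
def one_vs_all_predict(x, classifiers):
    """classifiers maps each class name to a function returning P(class | x).
    Return the class whose classifier gives the highest probability."""
    return max(classifiers, key=lambda cls: classifiers[cls](x))

# Toy stand-ins for three trained binary classifiers (illustrative only)
classifiers = {
    "Cat":   lambda x: 0.20,
    "Dog":   lambda x: 0.70,
    "Horse": lambda x: 0.10,
}

print(one_vs_all_predict(None, classifiers))  # Dog
```

In a real model, each stand-in would be a trained logistic regression classifier for one class versus the rest.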

Logistic regression parameters

Fitting a logistic regression model means finding the parameters (coefficients) used to predict the probability of an example. To do so, we have to define a quality metric. We use maximum likelihood estimation, or MLE.

If we have examples from two classes, then we want the predicted probability of the positive class to be near 1 when the example belongs to the positive class. Similarly, we want the predicted probability of the positive class to be near zero when the example belongs to the negative class.

Now, let’s understand the MLE with an example. We will revisit the iris dataset problem.

sepal_length sepal_width petal_length petal_width Species
6.7 3.3 5.7 2.1 virginica
6.4 3.1 5.5 1.8 virginica
4.9 3.1 1.5 0.1 setosa
5.5 3.5 1.3 0.2 setosa

For simplicity, assume virginica as +1 and setosa as -1. Our model should give a maximum probability of a positive class when the species is virginica. It should also give a maximum probability of negative class when the species is setosa.

sepal_length sepal_width petal_length petal_width Species Maximize probability
6.7 3.3 5.7 2.1 +1 P(y=+1 | x_1, β)
6.4 3.1 5.5 1.8 +1 P(y=+1 | x_2, β)
4.9 3.1 1.5 0.1 -1 P(y=-1 | x_3, β)
5.5 3.5 1.3 0.2 -1 P(y=-1 | x_4, β)

The likelihood is the multiplication of all these probability terms:

L(\beta) = P(y=+1|x_1,\beta) \cdot P(y=+1|x_2,\beta) \cdot P(y=-1|x_3,\beta) \cdot P(y=-1|x_4,\beta)

L(\beta) = \prod_{i=1}^{N} P(y_i|x_i,\beta)

So, our goal is to pick the parameter values that make the above function as large as possible.
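The likelihood for a set of examples is just the product of their per-example probabilities. A small sketch, with made-up probability values standing in for the four iris rows above; we also show the log-likelihood, which is what implementations usually maximize because a long product of small probabilities underflows floating-point numbers:

```python
import math

def likelihood(probs):
    """Product of per-example probabilities P(y_i | x_i, beta)."""
    result = 1.0
    for p in probs:
        result *= p
    return result

def log_likelihood(probs):
    # The sum of logs equals the log of the product but is numerically safer
    return sum(math.log(p) for p in probs)

# Illustrative probabilities for four examples (not computed from real data)
probs = [0.9, 0.8, 0.85, 0.95]
L = likelihood(probs)  # 0.9 * 0.8 * 0.85 * 0.95 = 0.5814
```

Maximizing the log-likelihood gives the same parameters as maximizing the likelihood, since the logarithm is monotonically increasing.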

This can be done with the Gradient Ascent algorithm. We start with random values of the parameters and move closer to the solution by maximizing the likelihood.

Gradient Ascent algorithm

We can think of this as a hill-climbing approach for finding the maximum value.

While not converging

\beta_{T+1} = \beta_T + Stp \cdot \frac{dL(\beta)}{d\beta}

Stp is the step size. It can be fixed, or we can change it dynamically as the algorithm proceeds. In practice, we stop when the derivative is near zero.
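The update rule above can be sketched end to end for logistic regression. This is a minimal illustration, assuming labels encoded as +1/-1 and using the log-likelihood gradient for that encoding (for y ∈ {+1, -1}, the per-example gradient is y · x · (1 − sigmoid(y · score))); a fixed step size and a fixed iteration count stand in for a real convergence check:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_ascent(X, y, step=0.1, iters=1000):
    """Maximize the log-likelihood for labels y in {+1, -1}.
    Each row of X is a feature vector with an intercept column of 1s prepended."""
    beta = [0.0] * len(X[0])  # start from all-zero parameters
    for _ in range(iters):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            score = sum(b * v for b, v in zip(beta, xi))
            # Derivative of log P(y_i | x_i, beta) for the +1/-1 encoding
            coef = yi * (1.0 - sigmoid(yi * score))
            for j, v in enumerate(xi):
                grad[j] += coef * v
        # Move the parameters uphill along the gradient
        beta = [b + step * g for b, g in zip(beta, grad)]
    return beta

# Tiny separable toy data: each row is [intercept, feature]
X = [[1.0, 2.0], [1.0, 3.0], [1.0, -2.0], [1.0, -3.0]]
y = [+1, +1, -1, -1]
beta = gradient_ascent(X, y)
```

After training, the model assigns high probability to the positive examples and low probability to the negative ones; a production implementation would vectorize the loops and stop when the gradient norm falls below a threshold.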

Quiz: Step size in Gradient Ascent algorithm

1.

Mark all correct options Multi-select

A.

If the step size is small, the likelihood-versus-iteration curve is smoother.

B.

If the step size is small, it will take a long time to converge.

C.

If the step size is too large, the likelihood-versus-iteration curve has fewer oscillations.

D.

If the step size is too small, the likelihood-versus-iteration curve has fewer oscillations.



Quiz: Select True or False.

1.

Logistic regression can be used to solve multi-class prediction problems.

A.

True

B.

False



Quiz: Logistic regression

1.

What is the probability range for the class in logistic regression?

A.

[0,1]

B.

(-∞, ∞)

C.

[0,∞)

D.

(-∞,1]


1.

What is the benefit of normalizing features before applying logistic regression?
