Kernel Logistic Regression

Learn how to implement kernel logistic regression along with its derivation.


In the previous lessons, we mastered logistic regression, a powerful discriminative classifier, and understood how to optimize it using gradient descent on the binary cross-entropy (BCE) loss. However, as a linear model, standard logistic regression is fundamentally limited to problems where the classes are linearly separable.

To overcome this limitation and enable logistic regression to tackle complex, non-linear data (like concentric circles or interlocking spirals), we must employ the kernel trick.

We can kernelize logistic regression just like other linear models by observing that the parameter vector $\bold w$ is a linear combination of the feature vectors in $\Phi(X)$, that is:

$$
\bold w = \Phi(X) \bold a
$$

Here, $\bold a$ is the dual parameter vector, and the loss function now depends on $\bold a$.
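To make the role of the kernel explicit, we can substitute this expression into the model's score for a training point $\bold x_i$, writing $K = \Phi(X)^T\Phi(X)$ for the kernel (Gram) matrix with entries $K_{ij} = \phi(\bold x_i)^T\phi(\bold x_j) = k(\bold x_i, \bold x_j)$:

$$
\bold w^T \phi(\bold x_i) = (\Phi(X)\bold a)^T \phi(\bold x_i) = \bold a^T \Phi(X)^T \phi(\bold x_i) = (K\bold a)_i
$$

This quantity is exactly the pre-activation $z_i$ used in the loss below, so the model only ever needs inner products $k(\bold x_i, \bold x_j)$ and never the explicit feature map $\phi$.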

Minimizing the BCE Loss

To minimize the BCE loss, we need to find the dual parameters $\bold a$ that result in the smallest loss value. The BCE loss is defined as:

$$
\begin{align*}
L_{BCE}(\bold{a}) &= \sum_{i=1}^n L_i\\
L_i &= -\left(y_i\log(\hat y_i)+(1-y_i)\log(1-\hat y_i)\right) \\
\hat{y}_i &= \sigma(z_i)=\frac{1}{1+e^{-z_i}} \\
z_i &= \bold a^T \Phi(X)^T\phi(\bold{x}_i)
\end{align*}
$$
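As a concrete illustration, here is a minimal NumPy sketch of kernel logistic regression trained with batch gradient descent on this loss. The RBF kernel, the helper names (`rbf_kernel`, `fit_kernel_logreg`, `predict_proba`), and the hyperparameters `gamma`, `lr`, and `n_iters` are illustrative assumptions rather than code from this lesson:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """RBF (Gaussian) kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-gamma * sq_dists)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_kernel_logreg(X, y, gamma=1.0, lr=0.1, n_iters=1000):
    """Gradient descent on the BCE loss over the dual parameters a.

    With z = K a and y_hat = sigmoid(z), the gradient of the (summed) BCE
    loss with respect to a is K (y_hat - y), since dL/dz_i = y_hat_i - y_i
    and K is symmetric. Averaged gradient steps are used here for stability.
    """
    K = rbf_kernel(X, X, gamma)        # n x n kernel (Gram) matrix
    a = np.zeros(X.shape[0])           # one dual parameter per training point
    for _ in range(n_iters):
        y_hat = sigmoid(K @ a)         # predicted probabilities for all points
        grad = K @ (y_hat - y)         # gradient of the BCE loss w.r.t. a
        a -= lr * grad / len(y)        # averaged gradient-descent step
    return a

def predict_proba(X_train, a, X_new, gamma=1.0):
    """Probabilities for new points: sigmoid of their kernel scores."""
    K_new = rbf_kernel(X_new, X_train, gamma)   # m x n cross-kernel matrix
    return sigmoid(K_new @ a)

# Example usage on non-linearly separable data (assumes scikit-learn is installed):
# from sklearn.datasets import make_circles
# X, y = make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=0)
# a = fit_kernel_logreg(X, y.astype(float), gamma=2.0)
# probs = predict_proba(X, a, X, gamma=2.0)
```

Because the scores depend on the data only through kernel evaluations, the same gradient-descent loop can now separate data such as the concentric circles mentioned earlier.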
