...


From Logistic Regression to Neural Networks

Learn the evolution from logistic regression to neural networks.

In the preceding lessons, we mastered logistic regression, the foundational discriminative model that transforms a weighted linear input $\mathbf{w}^T \phi(\mathbf{x})$ into a probability using the sigmoid function. Logistic regression, at its core, represents a single computational unit.

This lesson explores the evolution from this single unit to a complex, multi-layered architecture: the neural network.

We will first define the structure and components of a single artificial neuron, highlighting how it directly mimics a logistic regression unit. We will then assemble these units into a full network architecture, introducing concepts like hidden layers, the forward pass, and the necessity of non-linear activation functions to ensure the network can model complex, non-linear patterns. Finally, we will delve into the Backpropagation algorithm, the sophisticated optimization technique that uses the chain rule to train all parameters across all layers simultaneously.

Neuron

A neuron, in the context of neural networks and artificial intelligence, is a fundamental computational unit that mimics the behavior of biological neurons found in the human brain. Neurons are the building blocks of artificial neural networks, which are used for various machine learning tasks, including image recognition, natural language processing, and more.

Components of a neuron

Let’s discuss the key components and functions of an artificial neuron:

  • Input: Neurons receive input signals, either from the neurons of the previous layer or directly from the input features.

  • Weights: Each input is associated with a weight that determines its influence on the neuron’s output. These weights are learnable parameters that are adjusted during the training process to optimize the neuron’s performance.

  • Summation: The weighted input signals are summed together, often with an additional bias term, to produce a single value. This weighted sum represents the net input to the neuron.

  • Activation function: The net input is then passed through an activation function. The activation function introduces nonlinearity into the neuron’s computation.

  • Output: The result of the activation function is the output of the neuron, which can be passed to other neurons in subsequent layers of the neural network.

Here’s the illustration for the components of a neuron:

Neural network
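To make these components concrete, here is a minimal sketch in NumPy that walks through the bullet points above: inputs, weights, summation with a bias, an activation function, and the resulting output. The specific input, weight, and bias values are made up purely for illustration.

```python
import numpy as np

def sigmoid(z):
    """Squash the net input into the (0, 1) range."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical values, chosen only to illustrate the components above.
x = np.array([0.5, -1.2, 3.0])   # inputs from the previous layer (or raw features)
w = np.array([0.8, 0.1, -0.4])   # learnable weights, one per input
b = 0.2                          # bias term

z = np.dot(w, x) + b             # summation: weighted sum plus bias (net input)
a = sigmoid(z)                   # activation function applied to the net input
print(a)                         # output passed on to neurons in the next layer
```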

Neural network

In a neural network, we combine distinct neurons, each with its own set of parameters $\mathbf{w}$ and an activation function. The figure below exemplifies a typical neural network. The first layer is the input layer, whose values are labeled $x_i$. Unlike the other layers, the input layer contains no neurons and simply copies the input to the subsequent layer. The last layer is the output layer, with outputs labeled $\hat{y}_j$. All layers situated between the input and output layers are referred to as hidden layers. The hidden layers use labels of the form $a_k^l$, where $k$ denotes the neuron index in layer $l$. Both the hidden layers and the output layer are computational, meaning they are composed of neurons serving as computational units.

An example of a neural network

A neural network is composed of multiple units, with each unit resembling a logistic regression unit at its core. The crucial difference lies in the flexibility afforded by multiple units and hidden layers:

  1. Logistic regression (LR) is a single-unit model that uses a sigmoid activation function to perform simple binary classification (i.e., finding a single linear decision boundary).

  2. A neural network (NN) stacks these units, allowing the hidden layers to learn complex, non-linear features and representations. The key distinction lies in the flexibility of these units, which can employ various nonlinear functions on top of the weighted sum $\mathbf{x}^T \mathbf{w}$. This composition enables the NN to model highly complex relationships and classification boundaries, going far beyond the limitations of the sigmoid function in a single logistic regression unit, as the sketch after this list illustrates.
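As a concrete illustration of this added flexibility, the sketch below hard-codes a tiny two-layer network that computes XOR, a classic pattern that no single logistic regression unit can capture with one linear decision boundary. The weights are chosen by hand purely for illustration (nothing is learned here), and a hard step activation is used for readability.

```python
import numpy as np

def step(z):
    """A hard threshold activation, used here only for readability."""
    return (z > 0).astype(float)

def xor_net(x1, x2):
    x = np.array([x1, x2])
    # Hidden layer: the first unit fires like OR, the second like AND.
    W_hidden = np.array([[1.0, 1.0],    # OR-like unit
                         [1.0, 1.0]])   # AND-like unit
    b_hidden = np.array([-0.5, -1.5])
    h = step(W_hidden @ x + b_hidden)
    # Output unit: OR minus AND yields XOR.
    w_out = np.array([1.0, -2.0])
    b_out = -0.5
    return step(w_out @ h + b_out)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "->", xor_net(x1, x2))   # prints 0, 1, 1, 0
```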

The image below illustrates the concept of a neuron as the functional equivalent of a simple logistic regression model:

A neuron can be viewed as a general logistic regression unit

The figure above shows that a single neuron operates identically to a logistic regression model. This model uses the sigmoid activation function $\sigma$ to transform the weighted input into the output $\hat{y}_1$. The image provides two equivalent ways to express this calculation:

  1. Scalar notation: $\hat{y}_1 = \sigma(w_1 x_1 + w_2 x_2)$, which explicitly shows the component-wise multiplication and summation for the two features $x_1$ and $x_2$.

  2. Vector notation: $\hat{y}_1 = \sigma(\mathbf{x}^T \mathbf{w})$, where $\mathbf{x}$ is the input vector and $\mathbf{w}$ is the weight vector.

The vector notation, $\mathbf{x}^T \mathbf{w}$, is preferred in machine learning for its compactness and scalability, allowing the network to handle an arbitrarily large number of features efficiently using matrix operations. Since a single neuron performs this fundamental calculation, the overall functioning of a neural network is defined by how these individual units are interconnected and how their computations are executed sequentially across the layers.
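The equivalence of the two notations is easy to check numerically. The short sketch below, using arbitrary example values for the two features and weights, computes the neuron output both ways and confirms that they agree.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.4, 1.5])        # input vector (x1, x2), example values only
w = np.array([-0.7, 0.3])       # weight vector (w1, w2), example values only

# Scalar notation: explicit component-wise products and a sum.
y_scalar = sigmoid(w[0] * x[0] + w[1] * x[1])

# Vector notation: the same quantity as a single dot product.
y_vector = sigmoid(x @ w)

print(y_scalar, y_vector)       # identical values
```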

Forward pass

In a neural network, each neuron takes as input a vector consisting of the outputs from all neurons in the previous layer. Consequently, every neuron produces a real number as its output. For a layer $l$ containing $n_l$ neurons, the output vector of this layer comprises $n_l$ components, which act as the input for all neurons in the subsequent layer $l+1$. This arrangement ensures that each neuron in layer $l+1$ possesses $n_l$ parameters, maintaining uniformity across all neurons in the same layer. Although each neuron has its own set of parameters, they all share the same number of parameters.

Let $\mathbf{w}_k^{l}$ represent the parameter vector of the $k^{th}$ neuron in layer $l$; then the parameter matrix $\mathbf{W}^l$ can be defined as $\begin{bmatrix}\mathbf{w}_1^{l} & \mathbf{w}_2^{l} & \dots & \mathbf{w}_{n_l}^{l}\end{bmatrix}^T$. If all neurons in layer $l$ employ the same activation function $g^l$, and $\mathbf{a}^{l-1}$ denotes the output vector of layer $l-1$, the relationship between them can be expressed as:

$\mathbf{a}^l = g^l(\mathbf{W}^l \mathbf{a}^{l-1})$ ...
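A minimal sketch of this forward pass is given below. It assumes a small network with two inputs, one hidden layer of three neurons, and a single output neuron; the weight matrices are filled with random values purely to show the shapes, the same sigmoid serves as $g^l$ in every layer, and no bias terms are used, matching the formula above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Layer sizes: 2 inputs -> 3 hidden neurons -> 1 output neuron.
W1 = rng.normal(size=(3, 2))   # row k holds the weight vector w_k^1
W2 = rng.normal(size=(1, 3))   # row k holds the weight vector w_k^2

a0 = np.array([0.6, -1.0])     # the input layer simply copies x

# Forward pass: a^l = g^l(W^l a^{l-1}) for each computational layer.
a1 = sigmoid(W1 @ a0)          # hidden layer output, shape (3,)
a2 = sigmoid(W2 @ a1)          # network output y_hat, shape (1,)
print(a2)
```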
