Overview of Probabilistic Models

Explore generative and discriminative models for classification and regression within the realm of probabilistic modeling.

Imagine a weather app that predicts the temperature tomorrow will be exactly $20^\circ\text{C}$. What if it could tell you there’s an $80\%$ chance it will be $20^\circ\text{C}$, a $15\%$ chance it will be $22^\circ\text{C}$, and a $5\%$ chance it will be $18^\circ\text{C}$? Which prediction is more useful?

In the context of supervised learning, it’s far more advantageous to predict the distribution of the target variable rather than a specific fixed value. This distribution provides a measure of uncertainty or confidence alongside the prediction.

What are probabilistic models?

A probabilistic model in machine learning is any model that provides an output in terms of probabilities. Instead of directly predicting a class label (like “Dog”) or a single value (like “Price is $500”), it predicts a probability distribution (a mathematical function that describes the likelihood of different outcomes occurring in a random experiment, quantifying the uncertainty associated with various events or values) over all possible outcomes.

  • Classification example: Instead of outputting “Class A,” a probabilistic model might output: $P(\text{Class A}) = 0.90$, $P(\text{Class B}) = 0.08$, $P(\text{Class C}) = 0.02$. The fixed prediction is then typically the value with the highest probability (Class A).

  • Regression example: Instead of outputting “Price = $500,” the model might output a Gaussian distribution (a bell-shaped curve that shows which values are most likely: values near the center are common, and values far away are unlikely) centered at $500, indicating that prices close to $500 are highly likely, while prices far from it are unlikely.
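The two examples above can be sketched in a few lines of Python. This is a minimal illustration with made-up numbers, not output from a real trained model; the class probabilities and the Gaussian parameters (mean $500, standard deviation 25) are assumptions for demonstration.

```python
import numpy as np

# Classification: the model's output is a distribution over classes.
class_probs = {"Class A": 0.90, "Class B": 0.08, "Class C": 0.02}

# The fixed prediction is typically the class with the highest probability.
prediction = max(class_probs, key=class_probs.get)
print(prediction)  # Class A

# Regression: the model's output is a Gaussian centered at $500.
mean, std = 500.0, 25.0  # hypothetical parameters

def gaussian_pdf(x, mu, sigma):
    """Density of a Gaussian distribution at point x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Prices near the center of the distribution are more likely than distant ones.
print(gaussian_pdf(500.0, mean, std) > gaussian_pdf(450.0, mean, std))  # True
```

Note that in both cases the model’s real output is the whole distribution; the single “prediction” is just a summary of it.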

In fact, numerous classification and regression models implicitly calculate the underlying distribution and then produce a fixed target from it. The most common approach is simply to select the value with the highest probability (the mode of the distribution), although genuinely sampling a value at random according to the distribution is also possible.

The role of probability in learning

Probabilistic modeling forces us to explicitly calculate and model the relationship between our input features ($\mathbf{x}$) and the target variable ($y$). This formal approach to uncertainty is mathematically grounded and highly interpretable.

However, certain models don’t lend themselves easily to estimating a probability distribution for the target variable. In situations where obtaining samples of the target variable proves difficult, we turn to conventional machine learning models as an alternative solution.

To formally understand how these models calculate probabilities, we must turn to the foundational tool of probabilistic machine learning: Bayes’ rule. Bayes’ rule provides the mathematical framework for updating the probability of a hypothesis as new evidence or data is gathered, forming the basis of many powerful probabilistic algorithms.

Bayes’ rule

We can express the computation of the probability distribution of a target variable $y$ given an input feature vector $\mathbf{x}$, written $p(y|\mathbf{x})$, using Bayes’ rule in the following manner:

$$p(y|\mathbf{x}) = \frac{p(\mathbf{x}|y)\,p(y)}{p(\mathbf{x})}$$
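Bayes’ rule is easy to verify numerically. The sketch below assumes a hypothetical binary spam/ham classification for a fixed input $\mathbf{x}$; the likelihood and prior values are made-up numbers chosen for illustration.

```python
# Hypothetical values for a fixed input x:
likelihood = {"spam": 0.60, "ham": 0.10}  # p(x | y)
prior      = {"spam": 0.20, "ham": 0.80}  # p(y)

# Evidence p(x) = sum over y of p(x|y) p(y) (law of total probability).
evidence = sum(likelihood[y] * prior[y] for y in prior)

# Posterior p(y|x) via Bayes' rule.
posterior = {y: likelihood[y] * prior[y] / evidence for y in prior}
print(posterior)  # {'spam': 0.6, 'ham': 0.4}
```

Notice how the evidence term $p(\mathbf{x})$ acts as a normalizer: it guarantees that the posterior probabilities sum to one.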

Here, $p(y|\mathbf{x})$ ...
