Overview of Probabilistic Models
Explore generative and discriminative models for classification and regression within the realm of probabilistic modeling.
Imagine a weather app that predicts the temperature tomorrow will be exactly 75°F. What if it could instead tell you there’s a 70% chance it will be 75°F, a 20% chance it will be 73°F, and a 10% chance it will be 78°F? Which prediction is more useful?
In the context of supervised learning, it’s far more advantageous to predict the distribution of the target variable rather than a specific fixed value. This distribution provides a measure of uncertainty or confidence alongside the prediction.
What are probabilistic models?
A probabilistic model in machine learning is any model that provides an output in terms of probabilities. Instead of directly predicting a class label (like “Dog”) or a single value (like “Price is $500”), it predicts a probability distribution over the possible outcomes.
- Classification example: Instead of outputting “Class A,” a probabilistic model might output P(Class A) = 0.7, P(Class B) = 0.2, P(Class C) = 0.1. The fixed prediction is then typically the value with the highest probability (Class A).
- Regression example: Instead of outputting “Price = $500,” the model might output a Gaussian distribution centered at $500, indicating that prices close to $500 are highly likely, while prices far from it are unlikely. (A Gaussian distribution is a bell-shaped curve that shows which values are most likely: values near the center are common, and values far away are unlikely.)
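To make the regression example concrete, here is a minimal sketch of what a Gaussian predictive distribution over price looks like. The center ($500) comes from the example above; the standard deviation of $20 is an assumed value for illustration.

```python
import math

# Hypothetical probabilistic regression output: a Gaussian over price,
# centered at $500 with an assumed standard deviation of $20.
mu, sigma = 500.0, 20.0

def density(price):
    """Gaussian probability density of the predicted price distribution."""
    return math.exp(-0.5 * ((price - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Prices near the center are far more likely than prices far away.
print(density(500.0) > density(560.0))  # prints True
```

Note that the model's output here is the whole curve, not a single number: the density at each price encodes how plausible that price is.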
In fact, numerous classification and regression models implicitly calculate the underlying distribution and generate a fixed target from it. The most common approach is to select the value with the highest probability, although alternative sampling strategies also exist.
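The two ways of turning a distribution into a fixed prediction can be sketched as follows. The class names and probabilities are the illustrative values from the classification example above, not the output of any particular model.

```python
import numpy as np

# Hypothetical probabilistic classifier output for one input:
# a distribution over three classes rather than a single label.
classes = np.array(["A", "B", "C"])
probs = np.array([0.7, 0.2, 0.1])  # probabilities sum to 1

# Most common approach: pick the class with the highest probability.
fixed_prediction = classes[np.argmax(probs)]
print(fixed_prediction)  # prints A

# Alternative: draw a class at random according to the distribution,
# so "A" is returned about 70% of the time.
rng = np.random.default_rng(0)
sampled_prediction = rng.choice(classes, p=probs)
```

The argmax rule is deterministic, while sampling preserves the model's uncertainty in the prediction itself; which is preferable depends on the application.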
The role of probability in learning
Probabilistic modeling forces us to explicitly calculate and model the relationship between our input features (x) and the target variable (y). This formal approach to uncertainty is mathematically grounded and highly interpretable.
However, certain models don’t lend themselves easily to estimating a probability distribution for the target variable. In situations where obtaining samples of the target variable proves difficult, we turn to conventional machine learning models as an alternative solution.
To formally understand how these models calculate probabilities, we must turn to the foundational tool of probabilistic machine learning: Bayes’ rule. Bayes’ rule provides the mathematical framework for updating the probability of a hypothesis as new evidence or data is gathered, forming the basis of many powerful probabilistic algorithms.
Bayes’ rule
We can express the computation of the probability distribution of the target variable y given an input feature vector x, represented as P(y | x), using Bayes’ rule in the following manner:

P(y | x) = P(x | y) P(y) / P(x)
Here, ...
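A small numeric check of Bayes’ rule makes the update concrete. All the numbers below (a binary target y, a binary feature x, and their probabilities) are assumed purely for illustration.

```python
# Assumed toy numbers: binary target y and binary feature x.
p_y1 = 0.3             # prior P(y=1)
p_x1_given_y1 = 0.8    # likelihood P(x=1 | y=1)
p_x1_given_y0 = 0.1    # likelihood P(x=1 | y=0)

# Evidence via total probability:
# P(x=1) = P(x=1|y=1)P(y=1) + P(x=1|y=0)P(y=0)
p_x1 = p_x1_given_y1 * p_y1 + p_x1_given_y0 * (1 - p_y1)

# Bayes' rule: P(y=1 | x=1) = P(x=1 | y=1) P(y=1) / P(x=1)
p_y1_given_x1 = p_x1_given_y1 * p_y1 / p_x1
print(round(p_y1_given_x1, 3))  # prints 0.774
```

Observing x = 1 raises the probability of y = 1 from the prior 0.3 to about 0.77, because x = 1 is far more likely under y = 1 than under y = 0: this is exactly the "updating a hypothesis with evidence" that Bayes’ rule formalizes.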