Group the Similar Stuff
Learn how to uncover hidden patterns in unlabeled data.
As data scientists, we often work with datasets that don’t come with neat, pre-defined categories. Previously, we explored classification, where our models learned to predict a flower’s species because we already had labeled examples.
But what happens when we’re faced with a large volume of data, perhaps customer transaction histories or complex biological measurements, and no one has told us what the underlying groups are or how many distinct groups might exist? We don’t have labels, just raw information.
This is where we transition from supervised learning to unsupervised learning. Our goal here isn’t to predict a known outcome but to discover inherent patterns, structures, and natural groupings within the data. The primary technique we use for this type of exploratory analysis is clustering.
What is clustering?
Clustering is an unsupervised learning technique that partitions a dataset into distinct groups, called clusters. Unlike classification algorithms, which learn from labeled data to predict predefined categories, clustering algorithms work on unlabeled data and receive no prior information about which group a point belongs to. They form groups by measuring the similarity or distance between individual data points, so that points within a cluster are more alike than points in different clusters.
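To make the idea of grouping by distance concrete, here is a minimal sketch in plain Python. It is not a specific algorithm from this lesson: the function name `group_by_distance`, the `threshold` parameter, and the sample points are all hypothetical, chosen only for illustration. The sketch puts two points in the same group whenever they are linked by a chain of neighbors closer than the threshold, using Euclidean distance as the measure of similarity.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def group_by_distance(points, threshold):
    """Label each point so that points linked by a chain of neighbors
    closer than `threshold` share the same group label.

    Illustrative assumption only: real clustering algorithms use more
    refined rules for forming groups.
    """
    labels = [None] * len(points)
    next_label = 0
    for start in range(len(points)):
        if labels[start] is not None:
            continue  # this point already belongs to a group
        # Grow a new group outward from an unlabeled point.
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] is None and dist(points[i], points[j]) < threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


# Hypothetical unlabeled data: two visibly separate clumps of 2-D points.
points = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
labels = group_by_distance(points, threshold=1.0)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Notice that we never told the function how many groups exist or which point belongs where. The two groups emerge purely from the distances between points, which is the core idea behind clustering. The result does depend on the threshold we pick, though: a very large value would merge everything into a single group.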