Robust Scaling

Explore how to apply robust scaling to data using scikit-learn's RobustScaler, which scales features based on median and interquartile range. Understand why this method helps manage outliers and ensures your data preprocessing is more reliable.

Chapter Goals

  • Learn how to scale data without being affected by outliers

A. Data outliers

Outliers are an important aspect of data that we have to deal with. In general terms, an outlier is a data point that lies significantly farther from the rest of the data than the other points lie from each other. For example, if we had watermelons of weights 5, 4, 6, 7, and 20 pounds, the 20-pound watermelon is an outlier.
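To make the watermelon example concrete, the short sketch below compares the mean and the median of the weights with and without the 20-pound watermelon. It uses only Python's standard `statistics` module. The variable names are illustrative and not part of the lesson's own code.

```python
from statistics import mean, median

# Watermelon weights in pounds, taken from the example above.
weights = [5, 4, 6, 7, 20]

# The same data with the 20-pound outlier left out (illustrative helper list).
typical = [w for w in weights if w != 20]

# The mean is pulled upward by the single outlier.
mean_with = mean(weights)
mean_without = mean(typical)

# The median changes far less, because it depends on rank order, not magnitude.
median_with = median(weights)
median_without = median(typical)

print("mean:  ", mean_with, "vs", mean_without)
print("median:", median_with, "vs", median_without)
```

One outlier moves the mean from 5.5 to 8.4 pounds, while the median only moves from 5.5 to 6. This is the intuition behind scaling with median-based statistics when outliers are present.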

A 2-D data plot with the outlier data points circled. Note that the outliers in this plot are exaggerated, and in real life outliers are not usually this far from the non-outlier data.

The data scaling methods from the previous two chapters are both affected by ...