Outliers

This lesson explains what outliers are, why they occur, and how to remove them.

What is an outlier?

Another part of data cleaning is dealing with outliers. First, how do you define an outlier? The answer often depends on domain knowledge and other context, but a simple way to start is to look at a box plot:

Box Plot of Hours Per Week

The plot above was generated with this command:

bbox = train_df['hoursperweek'].plot(kind="box")

Detection of an outlier

Here, anything outside the “whiskers” could be considered an outlier. As a refresher, the “whiskers” are the lines extending from the box; each one reaches to the most extreme data point that lies within 1.5 times the interquartile range of the box's edge. The interquartile ...
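
The whisker rule described above can also be applied numerically rather than read off the plot. The following is a minimal sketch, assuming the common 1.5 times IQR convention; the helper name iqr_outlier_mask, the sample_df frame, and its values are hypothetical and only stand in for train_df.

```python
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask that is True where a value lies beyond the whisker limits."""
    q1 = values.quantile(0.25)   # first quartile (bottom edge of the box)
    q3 = values.quantile(0.75)   # third quartile (top edge of the box)
    iqr = q3 - q1                # interquartile range (height of the box)
    lower = q1 - k * iqr         # lowest value a whisker may reach
    upper = q3 + k * iqr         # highest value a whisker may reach
    return (values < lower) | (values > upper)


# Hypothetical stand-in for train_df; the values are made up for illustration.
sample_df = pd.DataFrame({"hoursperweek": [40, 38, 45, 40, 50, 35, 40, 99, 1, 42]})

mask = iqr_outlier_mask(sample_df["hoursperweek"])
outliers = sample_df[mask]      # rows flagged as outliers
cleaned = sample_df[~mask]      # rows kept after removing them
print(outliers)
```

With real data the same call would be iqr_outlier_mask(train_df['hoursperweek']). Whether flagged rows should actually be dropped remains a judgment call that depends on the domain.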