Outliers
This lesson explains what outliers are, why they occur, and how to remove them.
What is an outlier?
Dealing with outliers is another part of data cleaning. First off, how do you define an outlier? The answer can require domain knowledge as well as other information, but a simple way to start is by looking at a box plot:
Box Plot of Hours Per Week
The above plot was generated with this command:
bbox = train_df['hoursperweek'].plot(kind="box")
Detection of an outlier
Here, anything outside the “whiskers” could be considered an outlier. As a refresher, the “whiskers” are the lines sticking out from the box; each one reaches the most extreme data point that lies within 1.5 times the interquartile range of the box's edge. The interquartile ...
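The whisker rule can also be applied numerically rather than read off the plot. The following is a minimal sketch, not necessarily how the lesson proceeds: the `hours` series is a small made-up stand-in for `train_df['hoursperweek']`, and the names `lower_fence`, `upper_fence`, and `is_outlier` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical sample standing in for train_df['hoursperweek'];
# these values are invented for illustration only.
hours = pd.Series(
    [40, 40, 38, 45, 50, 40, 35, 60, 99, 1, 40, 42],
    name="hoursperweek",
)

# First and third quartiles mark the edges of the box.
q1 = hours.quantile(0.25)
q3 = hours.quantile(0.75)
iqr = q3 - q1  # interquartile range: the height of the box

# Whiskers may extend at most 1.5 * IQR beyond the box edges.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Flag every value that falls outside the fences.
is_outlier = (hours < lower_fence) | (hours > upper_fence)
outliers = hours[is_outlier]

print(outliers)
```

Whether a flagged value should actually be removed still depends on domain knowledge; the mask only identifies candidates.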