US State Incomes and Literacy Rates
Learn to analyze the relationship between literacy rates and income levels across US states using Python. This lesson guides you through importing data, classifying incomes by quantiles, merging datasets, and creating plots. Understand why literacy may correlate with income rather than cause differences in economic groups.
US literacy rates
Now let’s look at a smaller scale and compare literacy rates and income for each US state. The data for literacy rates of US states has been taken from the ThinkImpact website. Please run the following code to see it yourself.
Incomes of each state
The income dataset for US states is taken from Wikipedia. The table may look intimidating to import into Python, but a handy website, wikitable2csv.ggor.de, does the work for us. We paste the Wikipedia URL, click “Convert”, and the site converts the table to CSV.
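As a rough sketch of the import step, the snippet below reads a small CSV into a pandas data frame. The inline text stands in for the file downloaded from wikitable2csv; the column names (State, Mean wage), the "$" and comma formatting, and the wage figures are placeholders assumed for illustration, not the real Wikipedia values.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV downloaded from wikitable2csv.
# Column names and wage figures are placeholders, not real data.
csv_text = '''State,Mean wage
State A,"$52,000"
State B,"$61,500"
State C,"$48,250"
'''

# pd.read_csv accepts a file path or any file-like object.
income = pd.read_csv(io.StringIO(csv_text))

print(income.head())
print(income.dtypes)  # "Mean wage" is read as text because of "$" and ","
```

With a real download, the io.StringIO wrapper would be replaced by the path of the saved CSV file.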
Classifying income of each state
The income dataset contains the mean wage of each state. Using these wages, we will classify the states into four groups. First, we convert the Mean wage column to an integer type. Next, we use the command dfv.quantile to find the three cut points (the 25%, 50%, and 75% quantiles) that divide the data into four equal parts, and group the wages into the following classes:
- Low income: The first quarter (0% to 25%).
- Lower middle income: The second quarter (25% to 50%).
- Upper middle income: The third quarter (50% to 75%).
- High income: The fourth quarter (75% to 100%).
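The classification above can be sketched as follows. This is a minimal illustration under stated assumptions, not the lesson's original code: the state names and wages are invented, the wages are assumed to arrive as text with "$" and commas, and the helper classify and the Income group column are hypothetical names.

```python
import pandas as pd

# Placeholder data: state names and wages are invented for illustration.
income = pd.DataFrame({
    "State": ["A", "B", "C", "D", "E", "F", "G", "H"],
    "Mean wage": ["$40,000", "$45,000", "$50,000", "$55,000",
                  "$60,000", "$65,000", "$70,000", "$75,000"],
})

# Assumption: wages are text such as "$40,000". Strip "$" and ",",
# then convert the column to integers.
income["Mean wage"] = (
    income["Mean wage"].str.replace(r"[$,]", "", regex=True).astype(int)
)

# The three cut points that split the wages into four equal parts.
q1, q2, q3 = income["Mean wage"].quantile([0.25, 0.50, 0.75]).tolist()


def classify(wage):
    """Map a wage to one of the four income groups (hypothetical helper)."""
    if wage <= q1:
        return "Low income"
    if wage <= q2:
        return "Lower middle income"
    if wage <= q3:
        return "Upper middle income"
    return "High income"


income["Income group"] = income["Mean wage"].apply(classify)
print(income)
```

pandas also provides pd.qcut, which bins a column into equal-sized quantile groups in a single call; the explicit version is used here because it mirrors the steps described above.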
Merging and plotting of the data
Just as we did with the international data earlier, we’ll create a merged_data data frame, this time joined on the State column. Run the code below to merge the data and plot literacy rate against income group.
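A minimal sketch of the merge step is shown below. The two small data frames are invented placeholders, and the column names Literacy rate and Income group are assumptions; only the State key and the merged_data name come from the lesson. Plotting is left out so that the example depends on pandas alone.

```python
import pandas as pd

# Placeholder tables: all values are invented for illustration.
literacy = pd.DataFrame({
    "State": ["A", "B", "C", "D", "E"],
    "Literacy rate": [86.0, 88.0, 90.0, 92.0, 89.0],
})
income = pd.DataFrame({
    "State": ["A", "B", "C", "D"],
    "Income group": ["Low income", "Low income",
                     "High income", "High income"],
})

# Inner join on State: only states present in both tables are kept,
# so state "E" (missing from the income table) is dropped.
merged_data = pd.merge(literacy, income, on="State", how="inner")

# Average literacy rate per income group, ready to be plotted.
summary = merged_data.groupby("Income group")["Literacy rate"].mean()
print(summary)
```

If matplotlib is installed, summary.plot(kind="bar") would draw these group averages as a bar chart.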
The plot gives little additional insight into the relationship between a state’s income group and its literacy rate, because almost all states have a high literacy rate. This suggests that differences in income group among the states depend on factors other than literacy. It is therefore consistent with the idea that the relationship between literacy and income is a correlation rather than a causal link.