Scraping Data from a Single Vote

Explore how to scrape Canadian parliamentary voting records and convert them into pandas data frames. Learn techniques to analyze voting patterns, understand party-line adherence, and assess the representativeness of government through practical Python examples in a Jupyter notebook environment.

We'll cover the following...

Pulling of the data

Change the names of columns

Jupyter notebook in action

In a parliamentary democracy, voters elect members of parliament to represent them, and the party that can win the support of a majority of those members forms the government. In this lesson, we'll look at recorded votes from Canada's House of Commons.

We will see how often members of parliament (MPs) vote strictly along party lines. If they're always following their party leaders, then it would seem that they're serving their parties. If, on the other hand, they often cast votes independently of their parties, then they might be thinking more about their own constituents.

This won’t definitively prove anything one way or the other, but if we can access a large enough dataset, we should be able to draw some interesting insights.
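To make "voting along party lines" measurable, one simple proxy is to treat each party's most common vote on a division as its line and count how many of its members followed it. The sketch below assumes a table with one row per member and hypothetical columns `Member`, `Party`, and `Vote`; the rows are made up for illustration and are not taken from OurCommons.

```python
import pandas as pd

# Made-up stand-in for one vote's results; the real table's columns may differ.
votes = pd.DataFrame({
    "Member": ["A", "B", "C", "D", "E", "F"],
    "Party": ["Liberal", "Liberal", "Liberal", "Conservative", "Conservative", "NDP"],
    "Vote": ["Yea", "Yea", "Nay", "Nay", "Nay", "Yea"],
})

def party_line_share(df: pd.DataFrame) -> pd.Series:
    """Return, per party, the fraction of members who voted with
    their party's majority position on this vote (hypothetical helper)."""
    # The most common vote within each party is treated as the "party line".
    # On a tie, mode() returns values sorted, so the first one alphabetically wins.
    party_line = df.groupby("Party")["Vote"].agg(lambda v: v.mode().iloc[0])
    # True where a member's vote matches their own party's line.
    with_party = df["Vote"] == df["Party"].map(party_line)
    # Mean of a boolean column = share of True values in each party.
    return with_party.groupby(df["Party"]).mean()

print(party_line_share(votes))
```

A share close to 1.0 for every party on most votes would point toward party discipline; repeating the calculation across many votes is what would make the comparison meaningful.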

We'll begin on a web page managed by the House of Commons itself. The OurCommons website works with predictable URLs built on the base address ourcommons.ca/Members/. Appending en tells the server that you want the service in English, and appending a forward slash and the word votes, /votes, means that you're looking for voting records.

Some of the terms we need to be familiar with while analyzing this data are:

Parliament: In this context, a parliament is all the sittings from the first meeting after a general election until its dissolution before the next election.
Session: A parliament is divided into one or more sessions, each a few months' (or even years') worth of sittings.
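Putting the URL pattern and these terms together, a small helper can build the address of any recorded vote. This sketch assumes the path ends in /parliament/session/vote number, which is how the URL used in this lesson (43/2/17) appears to be laid out; `vote_url` is a hypothetical name, not part of any library.

```python
# Base address described above: "en" selects English, "votes" selects voting records.
BASE_URL = "https://www.ourcommons.ca/Members/en/votes"

def vote_url(parliament: int, session: int, vote_number: int) -> str:
    """Build the URL for a single recorded vote (hypothetical helper).

    Assumes the layout /votes/<parliament>/<session>/<vote number>.
    """
    return f"{BASE_URL}/{parliament}/{session}/{vote_number}"

print(vote_url(43, 2, 17))
# prints https://www.ourcommons.ca/Members/en/votes/43/2/17
```

The same helper could be called in a loop over vote numbers to collect many votes from one session, as long as you only request numbers that actually exist for that session.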

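What's the difference between private members' bills and government bills? Private members' bills are sponsored by members of parliament from any party who aren't cabinet ministers, while government bills are always sponsored by cabinet ministers and reflect the government's official position.

Now let's pull the results of a single vote. The URL below ends in 43/2/17: the 43rd Parliament, its second session, and vote number 17.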

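import pandas as pd

# read_html parses every <table> on the page (it needs lxml, or BeautifulSoup4 + html5lib)
# and returns a list of DataFrames; header=0 uses each table's first row as column names.
dfs = pd.read_html('https://www.ourcommons.ca/Members/en/votes/43/2/17', header=0)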

pd.read_html doesn't return a single data frame: it returns a list of data frames, one for each table it finds on the page. The vote results are in the first table, dfs[0], so for convenience we'll assign that to a new variable, df:

df = dfs[0]
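Indexing with [0] relies on the vote table being the first table on the page. A slightly sturdier sketch picks the table by the columns it contains instead (pandas' own `read_html` also accepts a `match` argument that keeps only tables containing matching text). The helper `pick_table`, the column names `Member` and `Party`, and the frames below are all hypothetical stand-ins for what `read_html` might return.

```python
from typing import Iterable

import pandas as pd

def pick_table(tables: Iterable[pd.DataFrame], required: Iterable[str]) -> pd.DataFrame:
    """Return the first data frame whose columns include every name in
    `required` (hypothetical helper, not part of pandas)."""
    wanted = set(required)
    for table in tables:
        if wanted.issubset(table.columns):
            return table
    raise ValueError(f"no table has columns {sorted(wanted)}")

# Made-up list standing in for the output of pd.read_html.
tables = [
    pd.DataFrame({"Date": ["2021-01-01"]}),
    pd.DataFrame({"Member": ["A", "B"], "Party": ["Liberal", "NDP"]}),
]
vote_table = pick_table(tables, {"Member", "Party"})
print(vote_table.shape)  # (2, 2)
```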

Let’s see what our data looks like.
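A few standard pandas calls give a quick picture of a freshly scraped frame, and renaming long column headers to short names makes later grouping code easier to read. The frame below is made up, and its column names ("Member Voted", "Political Affiliation") are hypothetical; check `df.columns` on the real data before choosing a renaming map.

```python
import pandas as pd

# Made-up stand-in for df; the real column names on the vote page may differ.
df = pd.DataFrame({
    "Member Voted": ["A", "B", "C"],
    "Political Affiliation": ["Liberal", "Conservative", "Liberal"],
    "Vote": ["Yea", "Nay", "Yea"],
})

print(df.shape)             # (rows, columns)
print(df.columns.tolist())  # column names as a plain list
print(df.head())            # first five rows

# Map old column names to shorter ones; columns not listed are left unchanged.
df = df.rename(columns={"Member Voted": "Member", "Political Affiliation": "Party"})

# How many members of each party appear in this vote.
print(df["Party"].value_counts())
```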
