
Scraping Data from a Single Vote

Explore how to scrape and convert Canadian parliamentary voting records into Pandas data frames. Learn techniques to analyze voting patterns, understand party line adherence, and assess the representativeness of government through practical Python examples in a Jupyter notebook environment.

In a parliamentary democracy, citizens elect members of parliament to represent them, and those members then vote on legislation. In this lesson, we will look at voting records from the Canadian House of Commons.

We will see how often parliament members vote strictly along party lines. If they’re always following their party leaders, then it would seem that they’re serving their parties. If, on the other hand, they often cast votes independently of their parties, then they might be thinking more about their own constituents.

This won’t definitively prove anything one way or the other, but if we can access a large enough dataset, we should be able to draw some interesting insights.

We’ll begin on a web page managed by the House of Commons itself. The OurCommons site serves its data through predictable URLs built on the base address, ourcommons.ca/Members/. Adding en tells the server that you want the service in English. Adding a forward slash and then the word votes, /votes, means that you’re looking for voting records.
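The URL pattern can be sketched as a small helper. This is only an illustration: the function name vote_url and its parameters are our own, and the trailing parliament/session/vote segments are inferred from the single-vote address used later in this lesson rather than from any official specification.

```python
# Base address described above
BASE_URL = "https://www.ourcommons.ca/Members"


def vote_url(parliament: int, session: int, vote: int, lang: str = "en") -> str:
    """Build the address of one recorded vote (hypothetical helper)."""
    # Assumed layout: <base>/<language>/votes/<parliament>/<session>/<vote>
    return f"{BASE_URL}/{lang}/votes/{parliament}/{session}/{vote}"


# Seventeenth vote of the second session of the 43rd parliament
url = vote_url(43, 2, 17)
print(url)
```

Keeping the numbers as parameters makes it easy to loop over many votes later instead of editing the address by hand.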

Some of the terms we need to be familiar with while analyzing this data are:

  • Parliament: A parliament in this context is all the sittings between the formation of a new government after one election and its dissolution before the next election.
  • Session: A session is a few months’ (or even years’) worth of sittings.

What’s the difference between private members’ bills and government bills? The former are sponsored by regular members of parliament of any party, while the latter are always sponsored by cabinet ministers and reflect the government’s official position.

In this chapter, we will look at the Canadian voting statistics from the second session of the 43rd parliament, the first session of the 42nd parliament, and the second session of the 41st parliament. We’ll analyze this data to see if the government could have been more representative.

Pulling the data

With those introductions out of the way, let’s start pulling some data. Converting the webpage for a single vote into a Pandas data frame is surprisingly simple. After importing the Python Pandas library, we only need to pass a URL (for the seventeenth vote of the second session of the 43rd parliament, in this case) to the pd.read_html function. Pandas will read the page’s HTML, find the tables in it, and convert each one to a data frame. The header=0 argument tells Pandas to use the first row of each table as its column headers.

import pandas as pd
dfs = pd.read_html('https://www.ourcommons.ca/Members/en/votes/43/2/17',header=0)

pd.read_html returns a list of data frames, one for each table it finds on the page, so the data we’re after is the first element of that list, dfs[0] (rather than dfs itself). For convenience, we’ll assign that to a new data frame, df:

df = dfs[0]

Let’s see what our data looks like.

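import pandas as pd
# read_html returns a list of data frames, one per table on the page
dfs = pd.read_html('https://www.ourcommons.ca/Members/en/votes/43/2/17', header=0)
df = dfs[0]
# shape is a (rows, columns) tuple
print(df.shape)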

Change the names of columns

The data frame has 319 rows and 4 columns. That means 319 members cast votes on this bill. To keep things clean, we’ll rename the column headers.
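Assigning a list to df.columns replaces every header by position, so the list needs exactly one name per column; otherwise Pandas raises a ValueError. Here is a minimal sketch on a toy frame. The original header text and the two rows are invented placeholders, not the real page contents.

```python
import pandas as pd

# Toy stand-in for the scraped table; headers and rows are invented
df = pd.DataFrame(
    [["Member One", "Party A", "Nay", None],
     ["Member Two", "Party B", "Yea", None]],
    columns=["Long header 1", "Long header 2", "Long header 3", "Long header 4"],
)

new_names = ["Member", "Party", "Vote", "Paired"]

# Guard against a mismatch before overwriting the headers
if len(new_names) != df.shape[1]:
    raise ValueError(f"expected {df.shape[1]} names, got {len(new_names)}")

df.columns = new_names
print(df.columns.tolist())
```

The explicit length check is optional, but it produces a clearer message than the default error if the page layout ever changes.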

Displaying the first five rows with df.head() shows that all five of those members voted against the bill (“Nay”). An affirmative vote would be identified as “Yea.” We can easily see how many members of each party took part using the .value_counts() method. Let’s see what our data looks like.

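import pandas as pd
dfs = pd.read_html('https://www.ourcommons.ca/Members/en/votes/43/2/17', header=0)
df = dfs[0]
# Replace the page's headers with shorter names, one per column
df.columns = ['Member', 'Party', 'Vote', 'Paired']
# Count how many rows (members) belong to each party
print(df['Party'].value_counts().to_frame())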
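To see what .value_counts() does without touching the network, here is a small sketch on a hand-made frame. The party labels and votes are invented for illustration only. The method counts how often each distinct value appears in a column and sorts the result from most to least frequent; passing normalize=True returns proportions instead of counts.

```python
import pandas as pd

# Invented sample data, not real voting records
votes = pd.DataFrame({
    "Party": ["A", "A", "B", "B", "B", "C"],
    "Vote": ["Yea", "Yea", "Nay", "Nay", "Yea", "Nay"],
})

# Number of rows per party, most common first
party_counts = votes["Party"].value_counts()

# Fraction of rows per vote value instead of raw counts
vote_shares = votes["Vote"].value_counts(normalize=True)

print(party_counts.to_frame())
print(vote_shares)
```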

We’re sure you’re impatiently waiting to hear how the vote went. Once again, it’s .value_counts() to the rescue.

Let’s see what our data looks like.

import pandas as pd
dfs = pd.read_html('https://www.ourcommons.ca/Members/en/votes/43/2/17', header=0)
df = dfs[0]
df.columns = ['Member', 'Party', 'Vote', 'Paired']
# Count the Yea and Nay votes
print(df['Vote'].value_counts().to_frame())

Not a happy ending: the bill didn’t pass.
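The two tallies above can also be combined into a single party-by-vote table, which is a first step toward the party-line question this lesson set out to explore. The following is only a sketch on invented data. The cohesion measure (the share of a party’s members who voted with that party’s most common position) is our own simple assumption, not an official statistic.

```python
import pandas as pd

# Invented sample data, not real voting records
votes = pd.DataFrame({
    "Party": ["A", "A", "A", "B", "B", "C"],
    "Vote": ["Yea", "Yea", "Nay", "Nay", "Nay", "Yea"],
})

# Rows are parties, columns are vote values, cells are counts
by_party = pd.crosstab(votes["Party"], votes["Vote"])

# Share of each party that sided with its own majority position
cohesion = by_party.max(axis=1) / by_party.sum(axis=1)

print(by_party)
print(cohesion)
```

A cohesion value of 1.0 means every member of that party voted the same way on this vote; lower values indicate some members broke ranks.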

Jupyter notebook in action

To see the above Python scripts in a notebook, click to launch the application.

1. What is the purpose of value_counts()?
