...
Web Scraping using Beautiful Soup
Learn how to scrape GitHub data using Beautiful Soup.
Scrape GitHub data
Create a function to get the data:
def getData(userName):
    pass
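Once the fetching steps below are in place, the stub might be filled in along these lines. This is only a sketch: the `raise_for_status()` check and the `get_profile_url` helper name are assumptions for illustration, not part of the lesson's code.

```python
import requests

def get_profile_url(user_name):
    # Build the repositories-tab URL for a GitHub user (illustrative helper).
    return "https://github.com/{}?tab=repositories".format(user_name)

def getData(userName):
    # Fetch the profile page; raise on HTTP errors (e.g. 404 for an unknown user).
    page = requests.get(get_profile_url(userName))
    page.raise_for_status()
    return page
```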
The URL https://github.com/{user_name}?tab=repositories contains the user’s information and their recent public repositories. We’ll use the requests library to get the contents of the page.
Let’s run the following code to scrape the user’s data:
import requests
userName = input("Enter Github user name: ")
url = "https://github.com/{}?tab=repositories".format(userName)
page = requests.get(url)
decoded = page.content.decode("utf-8") # Decoding the response bytes into an HTML string
# Creating and saving the HTML file
f = open("index.html", 'w', encoding="utf-8")
f.write(decoded)
f.close()
Displaying a repository's content
Next, we create an instance of BeautifulSoup and pass page.content as the parameter. We create an empty dictionary to store the user information.
from bs4 import BeautifulSoup

soup = BeautifulSoup(page.content, 'html.parser')
info = {}
We’ll scrape the following information:
- Full name
- Image
- Number of followers
- Number of users following
- Location (if it exists)
- Portfolio URL (if it exists)
- Repo name, repo link, repo last update, repo programming language, repo description
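Before targeting GitHub's real markup, it helps to see how Beautiful Soup pulls fields into the `info` dictionary. The snippet and class names below are purely illustrative stand-ins — GitHub's actual HTML differs and changes over time:

```python
from bs4 import BeautifulSoup

# Illustrative HTML only -- not GitHub's real markup.
html = """
<div>
  <span class="p-name">Jane Doe</span>
  <a href="?tab=followers"><span>42</span> followers</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
info = {}
# find() locates the first tag matching the name and class.
info["name"] = soup.find("span", class_="p-name").get_text(strip=True)
# select_one() accepts a CSS selector, here an attribute match on href.
info["followers"] = soup.select_one('a[href="?tab=followers"] span').get_text(strip=True)
print(info)  # {'name': 'Jane Doe', 'followers': '42'}
```

The same `find`/`select_one` pattern applies to each field in the list above; only the selectors change once you inspect the live page.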
Full name
The full name is inside an ...