Coding Exercise: Analyze Clickstream with RDDs

Practice using PySpark RDDs to load and summarize simplified clickstream data.

Scenario

You’re working as a junior data engineer at a growing e-commerce company. Every day, the platform collects millions of clickstream logs—records of users interacting with the website. For now, you’ve been given a small sample to practice with.

Dataset

Here’s a small sample of clickstream logs stored as a list of dictionaries, simulating a large dataset:
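
As a minimal sketch of the load-and-summarize step, the list could be turned into an RDD and counted per page roughly as follows. The records and field names (`user_id`, `page`, `action`) are made-up assumptions, not the exercise's actual sample. The helper names `to_page_pair`, `count_records_per_page`, and `main` are also hypothetical.

```python
# Hypothetical sample records: field names and values are illustrative
# assumptions, not the exercise's actual dataset.
SAMPLE_LOGS = [
    {"user_id": "u1", "page": "/home", "action": "view"},
    {"user_id": "u2", "page": "/product/42", "action": "view"},
    {"user_id": "u1", "page": "/product/42", "action": "add_to_cart"},
    {"user_id": "u3", "page": "/home", "action": "view"},
]


def to_page_pair(record):
    """Map one log record to a (page, 1) pair so records can be counted by page."""
    return (record["page"], 1)


def count_records_per_page(sc, logs):
    """Load the logs into an RDD and count how many records each page has."""
    rdd = sc.parallelize(logs)  # distribute the Python list as an RDD
    counts = rdd.map(to_page_pair).reduceByKey(lambda a, b: a + b)
    return dict(counts.collect())  # the result is small, so collecting is safe


def main():
    # Imported here so the helpers above stay importable without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    try:
        print(count_records_per_page(sc, SAMPLE_LOGS))
    finally:
        sc.stop()
```

Calling `main()` in an environment with PySpark installed runs the pipeline on the stand-in records. Using `reduceByKey` rather than collecting everything and counting in plain Python keeps the aggregation distributed, which is what matters once the sample is replaced by the full logs.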
