Coding Exercise: Analyze Clickstream with RDDs
Practice using PySpark RDDs to load and summarize simplified clickstream data.
Scenario
You’re working as a junior data engineer at a growing e-commerce company. Every day, the platform collects millions of clickstream logs, each one a record of a user interacting with the website. For now, you’ve been given a small sample to practice with.
Dataset
Here’s a small sample of clickstream logs stored as a list of dictionaries, simulating a large dataset:
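As a hypothetical stand-in for such a sample, the sketch below defines a short list of dictionaries and one possible RDD summary over it. The field names `user_id`, `page`, and `action`, every value, and the helper names `count_events_by_page` and `run_locally` are assumptions made for illustration, not the exercise's actual data or solution.

```python
# Hypothetical sample: field names and values are illustrative assumptions.
clickstream_logs = [
    {"user_id": "u1", "page": "/home", "action": "view"},
    {"user_id": "u2", "page": "/products", "action": "view"},
    {"user_id": "u1", "page": "/products", "action": "click"},
    {"user_id": "u3", "page": "/home", "action": "view"},
    {"user_id": "u2", "page": "/cart", "action": "click"},
    {"user_id": "u1", "page": "/cart", "action": "purchase"},
]


def count_events_by_page(sc, logs):
    """Count log records per page using the RDD API.

    `sc` is expected to be a pyspark.SparkContext.
    """
    return (
        sc.parallelize(logs)                  # distribute the Python list as an RDD
        .map(lambda rec: (rec["page"], 1))    # emit (page, 1) for each record
        .reduceByKey(lambda a, b: a + b)      # sum the ones per page
        .collectAsMap()                       # bring the result back as a dict
    )


def run_locally():
    """Run the summary on a local Spark context (requires pyspark)."""
    # Imported here so the sample data can be inspected without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext("local[*]", "ClickstreamExercise")
    try:
        return count_events_by_page(sc, clickstream_logs)
    finally:
        sc.stop()  # always release the context, even if the job fails
```

Calling `run_locally()` in an environment with PySpark installed should return a dictionary mapping each page to its number of records. Passing the `SparkContext` into `count_events_by_page` rather than creating it inside keeps the transformation easy to reuse in a notebook that already has a context.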