According to a 2022 Statista analysis, the volume of data created, captured, copied, and consumed worldwide is projected to reach 181 zettabytes by 2025:
Modern organizations are awash in data, which makes data processing and analysis unavoidable. A data pipeline is the backbone of any reliable data workflow: it takes raw inputs, applies structured transformations, and produces clean outputs one can actually use. In this project, we'll build a data pipeline in Python from scratch using Kedro, an open-source framework designed for creating modular, reproducible, and production-ready data pipelines. Rather than writing one-off scripts, we'll structure the work into reusable nodes and datasets, the way professional data engineering workflows are actually organized.
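Before introducing any framework, the core idea can be sketched in plain Python: each stage is a pure function with a clear input and output, and the pipeline is just their composition. This is a conceptual sketch of what Kedro formalizes, not Kedro code; the function names and sample data are illustrative.

```python
# A pipeline in miniature: ingest raw records, clean them, produce a
# summary output. Each stage is a pure function, so any stage can be
# tested or rerun in isolation.

def ingest(raw_rows):
    """Parse raw comma-separated records into dicts (the 'raw input' stage)."""
    return [dict(zip(("name", "score"), row.split(","))) for row in raw_rows]

def clean(records):
    """Drop malformed rows and cast types (the 'transformation' stage)."""
    out = []
    for rec in records:
        try:
            out.append({"name": rec["name"].strip(), "score": float(rec["score"])})
        except (KeyError, ValueError):
            continue  # skip rows that fail validation
    return out

def summarize(records):
    """Produce the 'clean output' downstream consumers actually use."""
    return {
        "rows": len(records),
        "mean_score": sum(r["score"] for r in records) / len(records),
    }

raw = ["alice,10", "bob,not-a-number", "carol,20"]
report = summarize(clean(ingest(raw)))
# The malformed "bob" row is dropped; two valid rows remain.
```

A framework like Kedro takes this same shape and adds what one-off scripts lack: declared inputs and outputs per stage, managed I/O, and automatic dependency resolution between stages.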
We'll begin with data ingestion, i.e., loading raw data into the pipeline and configuring Kedro's DataCatalog to manage inputs and outputs cleanly. From there, we'll implement data preprocessing and transformation stages as discrete pipeline nodes, learning how Kedro resolves dependencies between steps automatically and makes each stage independently rerunnable. This is what separates a real data pipeline from a notebook full of sequential cells.
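In Kedro, the DataCatalog is typically configured declaratively in `conf/base/catalog.yml` rather than in code. A minimal sketch, assuming a raw CSV input and an intermediate Parquet output (the dataset names and file paths are illustrative; `pandas.CSVDataset` is the Kedro 0.19+ spelling, while older releases use `pandas.CSVDataSet`):

```yaml
# conf/base/catalog.yml -- illustrative entries
companies:                      # the name nodes refer to as an input
  type: pandas.CSVDataset
  filepath: data/01_raw/companies.csv

preprocessed_companies:         # the name nodes refer to as an output
  type: pandas.ParquetDataset
  filepath: data/02_intermediate/preprocessed_companies.parquet
```

Nodes never touch file paths directly; they declare these catalog names as inputs and outputs, which is what lets Kedro infer the dependency graph between stages and rerun any one of them in isolation.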
Once the pipeline is running end-to-end, we'll shift to visualization. Using hvPlot, a high-level Python plotting library built on HoloViews, we'll build an interactive data visualization dashboard with dynamic charts, filters, zoom, and hover capabilities that Matplotlib alone doesn't offer. This is where raw pipeline outputs become interpretable: we'll explore distributions, compare categories, and surface patterns through interactive views rather than static plots.
By the end, we'll have a complete, working example of a Python data pipeline paired with an interactive dashboard: a practical foundation in both data pipeline design and data visualization that reflects how analysts and data engineers approach the problem in real teams.