Know Your Data
Learn where data comes from, what forms it takes, and how it’s stored in the systems around us.
For a data engineer, everything starts with understanding the data itself. Where is it coming from? What form does it take: structured, unstructured, or something in between? How often does it change? These questions may sound simple, but they determine how you'll collect, store, and manage data across systems. Before any tools or technology come into play, this clarity about your data is what sets the stage for efficient, reliable data workflows.
In data engineering, a data structure designed around a poor understanding of the source data is a common reason pipelines fail at scale.
Know the backstory
Every dataset has a backstory. It might be clean and well-organized, or it might be messy and inconsistent. Maybe it was collected through a web form, a sensor, or a survey. Each of these origins shapes what the data can tell us—and what it can’t.
Let’s head back to your kitchen. Remember all those scattered recipes—some scribbled on napkins, a few saved in your notes app, others stuck to the fridge? Last time, we talked about organizing them neatly with labels, categories, and a proper system—that’s what a database helped us do. But here’s the next step: before you decide how to store or organize anything, you need to understand what you’re dealing with.
Are all the recipes handwritten or typed? Are some missing ingredients? Do a few include cooking times while others don’t? That’s the messy reality of real-world data. As a data engineer, your first job isn’t to build pipelines—it’s to get a feel for the ingredients in front of you.
If you don’t know what kind of data you’re working with, how often it changes, or where it comes from, you’ll end up building systems that break down later. Think of this as your data pantry check: you’re not cooking anything yet, just figuring out what’s fresh, what’s useful, and what needs cleanup.
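A pantry check can be as simple as counting which fields are missing and where each record came from. The sketch below assumes the recipes have already been gathered into a list of Python dictionaries; the records, the field names in EXPECTED_FIELDS, and the helpers profile and sources are all hypothetical, chosen only to illustrate the idea.

```python
from collections import Counter

# Hypothetical recipe records gathered from different places (notes app, napkin, fridge).
recipes = [
    {"name": "Pancakes", "source": "notes_app", "ingredients": ["flour", "milk", "eggs"], "cook_minutes": 15},
    {"name": "Tomato soup", "source": "napkin", "ingredients": None, "cook_minutes": None},
    {"name": "Lemon bars", "source": "fridge", "ingredients": ["lemons", "butter", "sugar"]},
]

# Hypothetical list of fields we expect every recipe to have.
EXPECTED_FIELDS = ["name", "source", "ingredients", "cook_minutes"]


def profile(records, fields):
    """Count, for each expected field, how many records lack a usable value."""
    missing = Counter()
    for record in records:
        for field in fields:
            # A field counts as missing if the key is absent or its value is None.
            if record.get(field) is None:
                missing[field] += 1
    return {field: missing[field] for field in fields}


def sources(records):
    """Tally where the records came from, using 'unknown' when no source is recorded."""
    return Counter(record.get("source", "unknown") for record in records)


report = profile(recipes, EXPECTED_FIELDS)
print(report)  # {'name': 0, 'source': 0, 'ingredients': 1, 'cook_minutes': 2}
print(sources(recipes))
```

Nothing is cleaned or transformed here; the point is only to see how incomplete the data is before deciding how to store it.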
Fun fact: The term “garbage in, garbage out” originated in computing but perfectly applies here—poor quality or misunderstood data leads to faulty outcomes, no matter how advanced your system is.
Data types
Not all data looks the same. Some datasets are highly organized and easy to explore. Others are more complex, less structured, and require a bit more work to make sense of. Knowing how to recognize different types of data is a foundational skill—it shapes how we clean, analyze, and model it.
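To make the difference concrete, here is a minimal sketch that captures the same hypothetical recipe three ways: as structured CSV, as semi-structured JSON, and as unstructured free text. It then pulls out the cooking time from each. The sample strings and the regex rule are illustrative assumptions, not a recommended parsing approach.

```python
import csv
import io
import json
import re

# The same recipe captured three ways (hypothetical examples).
structured = "name,servings,cook_minutes\nPancakes,4,15\n"
semi_structured = '{"name": "Pancakes", "ingredients": ["flour", "milk", "eggs"], "notes": {"cook_minutes": 15}}'
unstructured = "Grandma's pancakes: mix flour, milk and eggs, then cook for about 15 minutes."

# Structured: every row has the same columns, so a CSV reader maps values straight to fields.
row = next(csv.DictReader(io.StringIO(structured)))
cook_from_csv = int(row["cook_minutes"])  # CSV values arrive as strings

# Semi-structured: keys and nesting can vary between records, so we navigate defensively.
doc = json.loads(semi_structured)
cook_from_json = doc.get("notes", {}).get("cook_minutes")

# Unstructured: there is no schema; extracting a value needs a rule we invent (here, a regex).
match = re.search(r"(\d+)\s*minutes", unstructured)
cook_from_text = int(match.group(1)) if match else None

print(cook_from_csv, cook_from_json, cook_from_text)  # 15 15 15
```

All three yield the same answer, but the effort grows from one column lookup, to walking a nested shape, to writing a pattern that may miss a recipe phrased differently.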