Getting Started

Explore the basics of datasets and CSV file formats, understand how data is organized in tables, and begin working with real-world data by analyzing Amazon's top 50 bestselling books. Learn to read and interpret CSV files to prepare for interactive coding projects.

What is a dataset?

A dataset is a collection of data. Tabular data is arranged in tables, where each column represents a variable and each row is one record.

Below is a very basic medical chart that a doctor created to record patient information.

Notice that the chart has three columns:

  • Name
  • Age
  • Weight

The chart has six rows. The first row holds Carol, 21, and 55 kg. Instead of writing "Carol is 21 years old and weighs 55 kg," the doctor stores the information in tabular form. Anyone can easily map the values in a row back to the column names.
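
To make the mapping from row values to column names concrete, here is a minimal Python sketch. The variable names (columns, first_row, record) are illustrative assumptions and are not part of the lesson.

```python
# Column names from the medical chart.
columns = ["Name", "Age", "Weight"]

# The first row of the chart, in the same order as the columns.
first_row = ["Carol", 21, "55 kg"]

# Pair each value with its column name to form one record.
record = dict(zip(columns, first_row))

print(record)  # {'Name': 'Carol', 'Age': 21, 'Weight': '55 kg'}
```

Because the values are stored in column order, each one can be looked up by its column name, for example record["Age"].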

Overview of the project

In this project, you’ll read a dataset from Kaggle. The dataset is a .csv file. You might be hearing the term ‘.csv’ for the first time, but you have likely noticed that files come in different formats, such as ‘.txt’ and ‘.java’.

Similarly, a .csv file is a delimited text file that uses commas to separate values. The name stands for comma-separated values. Below is the medical chart from earlier, translated into a .csv file.

Name, Age, Weight
Carol, 21, 55 kg
Maze, 25, 61 kg
Charles, 32, 72 kg
Krystal, 19, 45 kg
Paul, 47, 66 kg
David, 61, 77 kg 

Each line in a .csv file is a row. In the example above, the first line holds the column names. The comma between two values acts as a separator, indicating that the values belong to different columns.
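
As a rough illustration of how a program can read these rows, the sketch below parses the medical chart with Python's built-in csv module. It assumes the .csv text is held in a string named csv_text (a hypothetical name) instead of a file, so that the example is self-contained.

```python
import csv
import io

# The medical chart from above, written as .csv text.
csv_text = """Name, Age, Weight
Carol, 21, 55 kg
Maze, 25, 61 kg
Charles, 32, 72 kg
Krystal, 19, 45 kg
Paul, 47, 66 kg
David, 61, 77 kg
"""

# skipinitialspace=True drops the space that follows each comma.
reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)

header = next(reader)  # first line: the column names
rows = list(reader)    # remaining lines: one list of values per row

print(header)     # ['Name', 'Age', 'Weight']
print(len(rows))  # 6
print(rows[0])    # ['Carol', '21', '55 kg']
```

Note that every value comes back as text, so a number such as the age would need converting (for example with int) before doing arithmetic on it.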

The Kaggle dataset for this project covers Amazon’s top 50 bestselling books from 2009 to 2019. It holds records for 550 books in a .csv file.

Let’s get started!