Combining and Merging Datasets

Explore how to combine datasets in pandas using the merge, concat, and join methods. Understand key parameters like how and on, perform inner and outer joins, and handle multiple key columns for data analysis.

Data in pandas objects can be combined in several ways:

  • The merge() method connects rows in DataFrames based on one or more keys.
  • The pd.concat() function concatenates, or “stacks”, objects together along an axis.
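To make the difference between the two concrete, here is a minimal sketch. The frames left and right and their columns are made up for illustration and are not part of the lesson's data.

```python
import pandas as pd

# Hypothetical example frames (not from the lesson).
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# merge(): match rows by the values in the shared "key" column.
# The default is how="inner", so only keys present in both frames survive.
merged = pd.merge(left, right, on="key")

# concat(): stack the frames along the row axis; no key matching happens.
# Columns missing from one frame are filled with NaN.
stacked = pd.concat([left, right], axis=0, ignore_index=True)

print(merged)
print(stacked)
```

The merged result has one row per matching key, while the stacked result simply has the rows of both frames, one after the other.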

The merge() method may be familiar to users of SQL or other relational databases, but no prior experience is required: this section introduces merging concepts through simple, step-by-step examples. The goal here is not to learn SQL but to work through the inner and outer joins that are widely used in data wrangling.

It’s important to note that merging operations may produce NaN values in the output, typically where a key in one DataFrame has no match in the other. These missing values need to be handled according to the requirements of the analysis.
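As a rough illustration of where NaN comes from and two common ways to deal with it, consider the sketch below. The frames orders and prices are hypothetical, and whether dropping or filling is the right choice depends on the analysis.

```python
import pandas as pd

# Hypothetical example frames (not from the lesson).
orders = pd.DataFrame({"key": ["a", "b", "c"], "qty": [1, 2, 3]})
prices = pd.DataFrame({"key": ["a", "b"], "price": [9.5, 4.0]})

# A left join keeps every row of orders. Key "c" has no match in prices,
# so its price is NaN in the result.
result = orders.merge(prices, on="key", how="left")

# Count the missing values introduced by the merge.
n_missing = int(result["price"].isna().sum())

# Option 1: drop the rows that have no price.
dropped = result.dropna(subset=["price"])

# Option 2: fill the gaps with a placeholder (0.0 is an arbitrary choice).
filled = result.fillna({"price": 0.0})

print(result)
print("missing prices:", n_missing)
```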

Let’s learn these methods with examples:

Database-style DataFrame joins

Merge or join operations combine datasets by linking rows using one or more keys. These operations are central to relational databases (for example, SQL-based databases).
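When a single column is not enough to identify a row, the on parameter also accepts a list of column names. The sketch below assumes two made-up frames, sales and costs, that are linked by both store and year.

```python
import pandas as pd

# Hypothetical example frames (not from the lesson).
sales = pd.DataFrame({
    "store": ["north", "north", "south"],
    "year": [2020, 2021, 2020],
    "revenue": [100, 120, 90],
})
costs = pd.DataFrame({
    "store": ["north", "south", "south"],
    "year": [2020, 2020, 2021],
    "cost": [70, 60, 65],
})

# Rows are linked only when BOTH store and year match.
both = pd.merge(sales, costs, on=["store", "year"], how="inner")

print(both)
```

Only the (north, 2020) and (south, 2020) combinations appear in both frames, so the inner join keeps just those two rows.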

Let’s create two DataFrames, df1 and df2.

Python 3.5
import numpy as np
import pandas as pd

# df1 holds five keys (a to e); df2 shares only the first three (a to c)
df1 = pd.DataFrame({'key': ['a', 'b', 'c', 'd', 'e'], 'A1': range(5), 'B1': range(5, 10)})
df2 = pd.DataFrame({'key': ['a', 'b', 'c'], 'A2': range(3), 'B2': range(3, 6)})
print("Dataframe 1\n", df1)
print("Dataframe 2\n", df2)

Now that we’ve ...