Data Merging Techniques#

Merging data is a fundamental aspect of many data analysis tasks, especially when combining information from multiple sources. Pandas provides powerful functions to merge DataFrames in a manner similar to SQL joins. This chapter will cover four primary types of merges: outer, inner, left, and right joins.

Outer Join#

An outer join returns all records when there is a match in either the left or right DataFrame. If there is no match, the missing side will contain NaN.

import pandas as pd

# Sample DataFrames
data1 = {'column': ['A', 'B', 'C'],
         'values1': [1, 2, 3]}
df1 = pd.DataFrame(data1)
data2 = {'column': ['B', 'C', 'D'],
         'values2': [4, 5, 6]}
df2 = pd.DataFrame(data2)

# Performing an outer join
outer_joined = pd.merge(df1, df2, on = 'column', how = 'outer')
print(outer_joined)

Result:

  column  values1  values2
0      A      1.0      NaN
1      B      2.0      4.0
2      C      3.0      5.0
3      D      NaN      6.0

Inner Join#

An inner join returns records that have matching values in both DataFrames.

# Performing an inner join
inner_joined = pd.merge(df1, df2, on = 'column', how = 'inner')
print(inner_joined)

Result:

  column  values1  values2
0      B        2        4
1      C        3        5

Left Join#

A left join returns all records from the left DataFrame, and the matched records from the right DataFrame. The result is NaN in the right side where there is no match.

# Performing a left join
left_joined = pd.merge(df1, df2, on = 'column', how = 'left')
print(left_joined)

Result:

  column  values1  values2
0      A        1      NaN
1      B        2      4.0
2      C        3      5.0

Right Join#

A right join returns all records from the right DataFrame, and the matched records from the left DataFrame. The result is NaN in the left side where there is no match.

# Performing a right join
right_joined = pd.merge(df1, df2, on = 'column', how = 'right')
print(right_joined)

Result:

  column  values1  values2
0      B        2      4.0
1      C        3      5.0
2      D      NaN      6.0

These data merging techniques are crucial for combining data from different sources, allowing for more comprehensive analyses by creating a unified dataset from multiple disparate sources.