Multi-Index Operations

Higher-dimensional data is often handled in pandas with multi-level (hierarchical) indexing, or MultiIndex, which lets you store and manipulate data with an arbitrary number of dimensions in lower-dimensional data structures like DataFrames. This chapter covers creating a MultiIndex and slicing the resulting structures.

Creating MultiIndex

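A MultiIndex makes it straightforward to group and aggregate by index level and to select data along several keys at once. One common way to create it is to pass a list of column names to set_index, which turns each column into one level of the index: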

import pandas as pd

# Sample DataFrame
data = {
    'state': ['CA', 'CA', 'NY', 'NY', 'TX', 'TX'],
    'year': [2001, 2002, 2001, 2002, 2001, 2002],
    'population': [34.5, 35.2, 18.9, 19.7, 20.1, 20.9]
}
df = pd.DataFrame(data)

# Promote 'state' and 'year' to index levels (set_index returns a new DataFrame)
df = df.set_index(['state', 'year'])
print(df)

Result:

            population
state year
CA    2001        34.5
      2002        35.2
NY    2001        18.9
      2002        19.7
TX    2001        20.1
      2002        20.9
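
When the index levels are not already columns, the same structure can be built directly. The following is a minimal sketch, assuming the same population figures as above; it uses pd.MultiIndex.from_product to generate every state/year combination and then aggregates by the 'state' level. The variable names index, df_direct, and mean_by_state are illustrative only.

```python
import pandas as pd

# Build the index from the Cartesian product of the two level value lists.
# The resulting order is CA/2001, CA/2002, NY/2001, NY/2002, TX/2001, TX/2002.
index = pd.MultiIndex.from_product(
    [['CA', 'NY', 'TX'], [2001, 2002]],
    names=['state', 'year'],
)

# Attach the values in the same order as the generated index.
df_direct = pd.DataFrame(
    {'population': [34.5, 35.2, 18.9, 19.7, 20.1, 20.9]},
    index=index,
)

# Aggregate over one index level by referring to it by name.
mean_by_state = df_direct.groupby(level='state')['population'].mean()
print(mean_by_state)
```

from_product suits a complete grid of combinations; for an irregular set of keys, pd.MultiIndex.from_tuples accepts an explicit list of (state, year) pairs instead.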

Slicing on MultiIndex

Slicing a DataFrame with a MultiIndex means giving a label or a label range for one or more index levels, either with Python's built-in slice object or by passing index values directly. Label-based range slicing expects the index to be sorted, as it is here:

# Select every row whose 'state' label falls between 'CA' and 'NY' (both ends inclusive)
sliced_df = df.loc[slice('CA', 'NY'), :]
print(sliced_df)

Result:

            population
state year
CA    2001        34.5
      2002        35.2
NY    2001        18.9
      2002        19.7

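This example slices only the first index level, 'state', from 'CA' to 'NY'. Label slices include both endpoints, so both states are returned, while 'TX' is excluded because it sorts after 'NY'. The 'year' level is not constrained, so every year available for the selected states (here 2001 and 2002) appears in the result.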
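
To constrain more than one level, or to pick rows by exact index values, a few other selection forms are available. The sketch below rebuilds the sample DataFrame so that it runs on its own, and uses pd.IndexSlice, a full key tuple, and DataFrame.xs; the variable names only_2002, ca_2001, and year_2001 are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'state': ['CA', 'CA', 'NY', 'NY', 'TX', 'TX'],
        'year': [2001, 2002, 2001, 2002, 2001, 2002],
        'population': [34.5, 35.2, 18.9, 19.7, 20.1, 20.9],
    }
).set_index(['state', 'year']).sort_index()  # sorting keeps label slicing valid

idx = pd.IndexSlice

# Slice the 'state' level and fix the 'year' level in a single selection.
only_2002 = df.loc[idx['CA':'NY', 2002], :]
print(only_2002)

# Specify index values directly: a full (state, year) key selects one row.
ca_2001 = df.loc[('CA', 2001), 'population']
print(ca_2001)

# Cross-section: all states for one year; the 'year' level is dropped.
year_2001 = df.xs(2001, level='year')
print(year_2001)
```

pd.IndexSlice is mainly a readability aid: idx['CA':'NY', 2002] builds the same tuple as (slice('CA', 'NY'), 2002).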

Together, these operations cover the basic MultiIndex workflow: build a hierarchical index from existing columns with set_index, then retrieve rows by level with label-based selection through loc.