Multi-Index Operations#
Handling high-dimensional data often requires the use of multi-level indexing, or MultiIndex, which allows you to store and manipulate data with an arbitrary number of dimensions in lower-dimensional data structures like DataFrames. This chapter covers creating a MultiIndex and performing slicing operations on such structures.
Creating MultiIndex#
MultiIndexing enhances data aggregation and grouping capabilities. It allows for more complex data manipulations and more sophisticated analysis:
import pandas as pd
# Sample DataFrame
data = {
'state': ['CA', 'CA', 'NY', 'NY', 'TX', 'TX'],
'year': [2001, 2002, 2001, 2002, 2001, 2002],
'population': [34.5, 35.2, 18.9, 19.7, 20.1, 20.9]
}
df = pd.DataFrame(data)
# Creating a MultiIndex DataFrame
df.set_index(['state', 'year'], inplace = True)
print(df)
Result:
population
state year
CA 2001 34.5
2002 35.2
NY 2001 18.9
2002 19.7
TX 2001 20.1
2002 20.9
Slicing on MultiIndex#
Slicing a DataFrame with a MultiIndex involves specifying the ranges for each level of the index, which can be done using the slice
function or by specifying index values directly:
# Slicing MultiIndex DataFrame
sliced_df = df.loc[(slice('CA', 'NY'),)]
print(sliced_df)
Result:
population
state year
CA 2001 34.5
2002 35.2
NY 2001 18.9
2002 19.7
This example demonstrates slicing the DataFrame to include data from states 'CA' to 'NY' for the years 2001 and 2002.
These MultiIndex operations are essential for working with complex data structures effectively, enabling more nuanced data retrieval and manipulation.