Categorical Data Operations#
Categorical data is common in many data sets involving categories or labels, such as survey responses, product types, or user roles. Efficient handling of such data can lead to significant performance improvements and ease of use in data manipulation and analysis. Pandas provides robust support for categorical data, including converting data types to categorical and specifying the order of categories.
Convert Column to Categorical#
Converting a column to a categorical type can optimize memory usage and improve performance, especially for large datasets. Here’s how to convert a column to categorical:
import pandas as pd
# Sample DataFrame
data = {'product': ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']}
df = pd.DataFrame(data)
# Converting 'product' column to categorical
df['product'] = df['product'].astype('category')
print(df['product'])
Result:
0 apple
1 banana
2 apple
3 orange
4 banana
5 apple
Name: product, dtype: category
Categories (3, object): ['apple', 'banana', 'orange']
This shows that the 'product' column is now of type category
with three categories.
Order Categories#
Sometimes, the natural order of categories matters (e.g., in ordinal data such as 'low', 'medium', 'high'). Pandas allows you to set and order categories:
# Sample DataFrame with unordered categorical data
data = {'size': ['medium', 'small', 'large', 'small', 'large', 'medium']}
df = pd.DataFrame(data)
df['size'] = df['size'].astype('category')
# Setting and ordering categories
df['size'] = df['size'].cat.set_categories(['small', 'medium', 'large'], ordered = True, )
print(df['size'])
Result:
0 medium
1 small
2 large
3 small
4 large
5 medium
Name: size, dtype: category
Categories (3, object): ['small' < 'medium' < 'large']
This conversion and ordering process ensures that the 'size' column is not only categorical but also correctly ordered from 'small' to 'large'.
These categorical data operations in Pandas facilitate the effective handling of nominal and ordinal data, enhancing both performance and the capacity for meaningful data analysis.