Data Formatting and Conversion#
Data often needs to be formatted or converted to different types to meet the requirements of various analysis tasks. Pandas provides versatile capabilities for data formatting and type conversion, allowing for effective manipulation and preparation of data. This chapter covers some essential operations for data formatting and conversion.
Convert Data Types#
Changing the data type of a column in a DataFrame is often necessary during data cleaning and preparation. Use astype
to convert the data type of a column:
import pandas as pd
# Sample DataFrame
data = {'age': ['25', '30', '35']}
df = pd.DataFrame(data)
# Converting the data type of the 'age' column to integer
df['age'] = df['age'].astype(int)
print(df['age'].dtypes)
Result:
int64
String Operations#
Pandas can perform vectorized string operations on Series using .str
. This is useful for cleaning and transforming text data:
# Sample DataFrame
data = {'name': ['Alice', 'Bob', 'Charlie']}
df = pd.DataFrame(data)
# Converting all names to lowercase
df['name'] = df['name'].str.lower()
print(df)
Result:
name
0 alice
1 bob
2 charlie
Datetime Conversion#
Converting strings or other datetime formats into a standardized datetime64
type is essential for time series analysis. Use pd.to_datetime
to convert a column:
# Sample DataFrame
data = {'date': ['2023-01-01', '2023-01-02', '2023-01-03']}
df = pd.DataFrame(data)
# Converting 'date' column to datetime
df['date'] = pd.to_datetime(df['date'])
print(df['date'].dtypes)
Result:
datetime64[ns]
Setting Index#
Setting a specific column as the index of a DataFrame can facilitate faster searches, better alignment, and easier access to rows:
# Sample DataFrame
data = {'name': ['Alice', 'Bob', 'Charlie'],
'age': [25, 30, 35]}
df = pd.DataFrame(data)
# Setting 'name' as the index
df.set_index('name', inplace=True)
print(df)
Result:
age
name
Alice 25
Bob 30
Charlie 35
These formatting and conversion techniques are crucial for preparing your dataset for detailed analysis and ensuring compatibility across different analysis and visualization tools.