Handling Time Series Data#
Time series data analysis is a crucial aspect of many fields such as finance, economics, and meteorology. Pandas provides robust tools for working with time series data, allowing for detailed analysis of time-stamped information. This chapter will explore how to manipulate time series data effectively using Pandas.
Set Datetime Index#
Setting a datetime index is foundational in time series analysis as it facilitates easier slicing, aggregation, and resampling of data:
import pandas as pd
# Sample DataFrame with date information
data = {'date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04'],
'value': [100, 110, 120, 130]}
df = pd.DataFrame(data)
# Converting 'date' column to datetime and setting it as index
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
print(df)
Result:
value
date
2023-01-01 100
2023-01-02 110
2023-01-03 120
2023-01-04 130
Resampling Data#
Resampling is a powerful method for time series data aggregation or downsampling, which changes the frequency of your data:
# Resampling the data monthly and calculating the mean
monthly_mean = df.resample('M').mean()
print(monthly_mean)
Result:
value
date
2023-01-31 115.0
Rolling Window Operations#
Rolling window operations are useful for smoothing or calculating moving averages, which can help in identifying trends in time series data:
# Adding more data points for a better rolling example
additional_data = {'date': pd.date_range('2023-01-05', periods = 5, freq = 'D'),
'value': [140, 150, 160, 170, 180]}
additional_df = pd.DataFrame(additional_data)
df = pd.concat([df, additional_df.set_index('date')])
# Calculating rolling mean with a window of 5 days
rolling_mean = df.rolling(window = 5).mean()
print(rolling_mean)
Result:
value
date
2023-01-01 NaN
2023-01-02 NaN
2023-01-03 NaN
2023-01-04 NaN
2023-01-05 120.0
2023-01-06 130.0
2023-01-07 140.0
2023-01-08 150.0
2023-01-09 160.0
These techniques are essential for analyzing time series data efficiently, providing the tools needed to handle trends, seasonality, and other temporal structures in data.