Handling Time Series Data

Time series analysis is central to fields such as finance, economics, and meteorology. Pandas provides tools for parsing, indexing, resampling, and aggregating time-stamped data. This chapter covers three core operations: setting a datetime index, resampling, and rolling window calculations.

Set Datetime Index

A datetime index is the foundation of time series work in Pandas. Once the index is a DatetimeIndex, rows can be sliced by date, resampled, and aggregated by time period:

import pandas as pd

# Sample DataFrame with date information
data = {'date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04'],
        'value': [100, 110, 120, 130]}
df = pd.DataFrame(data)

# Converting 'date' column to datetime and setting it as index
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
print(df)

Result:

            value
date
2023-01-01    100
2023-01-02    110
2023-01-03    120
2023-01-04    130
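
The date-based slicing mentioned above can be sketched as follows. This is a minimal illustration that rebuilds the same sample frame so it runs on its own; the names `subset` and `january` are chosen here for the example and are not part of the original walkthrough.

```python
import pandas as pd

# Rebuild the sample frame with a DatetimeIndex
df = pd.DataFrame(
    {'value': [100, 110, 120, 130]},
    index=pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04']),
)
df.index.name = 'date'

# Label-based slicing with date strings; both endpoints are included
subset = df.loc['2023-01-02':'2023-01-03']
print(subset)

# Partial string indexing: select every row that falls in January 2023
january = df.loc['2023-01']
print(january)
```

Unlike positional slicing, a date-string slice with `.loc` includes the end label, so `subset` holds the rows for both January 2 and January 3.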

Resampling Data

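Resampling changes the frequency of a time series. Downsampling to a lower frequency (for example, daily to monthly) aggregates the values in each period, while upsampling to a higher frequency requires filling or interpolating the new rows:

# Resample to month-end frequency and take the mean of each month
# ('M' is the month-end alias; pandas 2.2 and later prefer 'ME' and warn on 'M')
monthly_mean = df.resample('M').mean()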
print(monthly_mean)

Result:

            value
date
2023-01-31  115.0
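
With only four days of data, a monthly resample collapses everything into one row. The sketch below, which assumes the same four sample values, shows a finer-grained downsample and a simple upsample. The names `two_day_sum` and `half_day` and the choice of `sum` and forward fill are illustrative, not prescribed by the chapter.

```python
import pandas as pd

# Same four daily values as the sample frame, rebuilt so the block runs on its own
df = pd.DataFrame(
    {'value': [100, 110, 120, 130]},
    index=pd.date_range('2023-01-01', periods=4, freq='D', name='date'),
)

# Downsample: group the daily rows into 2-day bins and sum each bin
two_day_sum = df.resample('2D').sum()
print(two_day_sum)

# Upsample: move to a 12-hour frequency and carry the last known value forward
half_day = df.resample('12h').ffill()
print(half_day)
```

Downsampling always needs an aggregation such as `sum` or `mean`, while upsampling needs a fill strategy; forward fill is only one option, and interpolation may suit continuous measurements better.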

Rolling Window Operations

Rolling window operations compute a statistic over a sliding window of consecutive observations. A moving average, for example, smooths short-term fluctuations and makes trends easier to see:

# Append five more daily data points for a more useful rolling example
additional_data = {'date': pd.date_range('2023-01-05', periods=5, freq='D'),
                   'value': [140, 150, 160, 170, 180]}
additional_df = pd.DataFrame(additional_data)
df = pd.concat([df, additional_df.set_index('date')])

# Rolling mean over a window of 5 rows (5 days here, since the data is daily).
# The first 4 rows are NaN because a full window is not yet available.
rolling_mean = df.rolling(window=5).mean()
print(rolling_mean)

Result:

            value
date
2023-01-01    NaN
2023-01-02    NaN
2023-01-03    NaN
2023-01-04    NaN
2023-01-05  120.0
2023-01-06  130.0
2023-01-07  140.0
2023-01-08  150.0
2023-01-09  160.0
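
The leading NaN rows come from the default requirement that a window be full before a value is produced. As a hedged sketch, the example below rebuilds the nine-day frame and shows two alternatives: relaxing that requirement with `min_periods`, and defining the window by time span rather than row count. The window sizes and the names `partial` and `time_based` are illustrative.

```python
import pandas as pd

# Nine daily values matching the extended sample frame
df = pd.DataFrame(
    {'value': [100, 110, 120, 130, 140, 150, 160, 170, 180]},
    index=pd.date_range('2023-01-01', periods=9, freq='D', name='date'),
)

# Row-based window of 3 observations; min_periods=1 returns a partial mean
# at the start of the series instead of NaN
partial = df.rolling(window=3, min_periods=1).mean()
print(partial.head(4))

# Time-based window: all observations within the 3 days ending at each timestamp
# (requires a DatetimeIndex; useful when observations are irregularly spaced)
time_based = df.rolling('3D').mean()
print(time_based.head(4))
```

On evenly spaced daily data the two results coincide. They differ once dates are missing, because the time-based window then covers fewer rows while the row-based window reaches further back in time.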

Together, a datetime index, resampling, and rolling windows cover the most common time series tasks in Pandas: selecting by date, changing frequency, and smoothing data to reveal trends and seasonality.