Goal of this lesson:
In this lesson, you’ll learn how to work with dates and times in pandas, which is critical in many data engineering tasks such as time-series analysis, monitoring, and scheduling.
Often, dates are stored as strings in raw data. pandas provides pd.to_datetime() to convert these strings into datetime objects.
Let’s start with a simple example:
import pandas as pd
# Sample data with date strings
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Join_Date': ['2023-01-15', '2022-06-24', '2021-03-10', '2020-12-19']
}
df = pd.DataFrame(data)
# Convert the 'Join_Date' column to datetime
df['Join_Date'] = pd.to_datetime(df['Join_Date'])
print(df)Now, the Join_Date column contains datetime objects, making it easier to work with dates in pandas.
Once you’ve converted your strings to datetime objects, you can easily extract specific components like the year, month, or day.
- Extracting the year:
df['Year'] = df['Join_Date'].dt.year
print(df)- Extracting the month:
df['Month'] = df['Join_Date'].dt.month
print(df)- Extracting the day of the week:
df['Day_of_Week'] = df['Join_Date'].dt.day_name()
print(df)If you’re working with time-series data (such as monitoring logs or stock prices), you’ll often need to set a datetime column as the index for easier analysis. Let’s look at an example:
# Sample time-series data
data = {
'Date': ['2024-09-01', '2024-09-02', '2024-09-03', '2024-09-04'],
'Value': [100, 110, 105, 115]
}
df_ts = pd.DataFrame(data)
# Convert 'Date' to datetime and set as index
df_ts['Date'] = pd.to_datetime(df_ts['Date'])
df_ts.set_index('Date', inplace=True)
print(df_ts)With the Date column set as the index, you can now perform time-based operations more easily.
Now it’s your turn to play with some time-series data. Try the following tasks with this dataset:
data = {
'Date': ['2024-01-01', '2024-02-01', '2024-03-01', '2024-04-01'],
'Temperature': [5.2, 7.4, 10.1, 12.8]
}
df_weather = pd.DataFrame(data)
# Exercise 1: Convert the 'Date' column to datetime
df_weather['Date'] = pd.to_datetime(df_weather['Date'])
print(df_weather)
# Exercise 2: Extract the month and year from the 'Date' column
df_weather['Month'] = df_weather['Date'].dt.month
df_weather['Year'] = df_weather['Date'].dt.year
print(df_weather)
# Exercise 3: Set the 'Date' column as the index and view the DataFrame
df_weather.set_index('Date', inplace=True)
print(df_weather)In this lesson, we’ve covered:
- How to use
pd.to_datetime()to convert date strings to datetime objects. - Extracting date components like year, month, and day of the week.
- Handling time-series data and setting a datetime column as the index.
- Hands-on exercises to practice working with dates and times.