Easy datetime manipulation with the pandas to_datetime() function.

It seems like no matter what (programming) language you use, there are nuances when it comes to working with date and datetime values. Nevertheless, dates are super important; a lot hinges on them. For instance, it is paramount that my employer and I are both conscious of the days and hours I work. Be it business-related or centered around life, we always tend to keep time, so to speak. Pandas has a to_datetime() function that provides quite a bit of useful functionality when working with datetime values. Inspired by a great video where I learned about this function, this post covers several basic examples of its use. As always, read and learn with me…

OS, Database, and software used:

  • OpenSuse Leap 15.0
  • Python 3.7.2


Self-Promotion:

If you enjoy the content written here, by all means, share this blog and your favorite post(s) with others who may benefit from or like it as well. Since coffee is my favorite drink, you can even buy me one if you would like!


To start, I must give credit where credit is due. I learned a great deal from this fantastic pandas video. Several of the examples that follow are inspired by or derived from it, applying the concepts to my own data sources. Be sure to check it out!

To use pandas, we first need to import it. I’ll also be using CSV data, so the powerful read_csv() function takes care of loading that. Do check out any of the several links provided in the closing section for other blog posts I’ve written on the read_csv() function.

>>> import pandas as pd
>>> from datetime import date
>>> data = pd.read_csv(r'/home/joshua/Practice Data/Fitness_DB_Data/aug_2019_hiking_stats.csv', delimiter=',')
>>> data.head()
   day_walked  cal_burned  miles_walked  duration  mph  shoe_id
0  2019-08-01       358.1          3.53  01:05:51  3.2        6
1  2019-08-02       354.7          3.46  01:05:12  3.2        6
2  2019-08-04       354.2          3.55  01:05:07  3.3        6
3  2019-08-05       351.3          3.52  01:04:36  3.3        6
4  2019-08-06       292.2          2.83  00:53:44  3.2        4

Let’s look at the dtypes attribute, paying particular attention to the ‘day_walked’ column’s type.

>>> data.dtypes
day_walked       object
cal_burned      float64
miles_walked    float64
duration         object
mph             float64
shoe_id           int64
dtype: object

Using to_datetime(), I’ll reassign the ‘day_walked’ column to a datetime type, specifying the format the column’s strings currently follow: 4-digit year, 2-digit month, and 2-digit day:

data['day_walked'] = pd.to_datetime(data['day_walked'], format='%Y-%m-%d')

Now have a look at the ‘day_walked’ column type:

>>> data.dtypes
day_walked      datetime64[ns]
cal_burned             float64
miles_walked           float64
duration                object
mph                    float64
shoe_id                  int64
dtype: object
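As an aside that is not in the original post: read_csv() can handle this conversion at load time through its parse_dates parameter, and to_datetime() accepts errors='coerce' if you would rather turn unparseable values into NaT instead of raising an error. A minimal sketch:

# Alternative (illustrative): parse the date column while reading the CSV.
df = pd.read_csv(r'/home/joshua/Practice Data/Fitness_DB_Data/aug_2019_hiking_stats.csv',
                 delimiter=',', parse_dates=['day_walked'])

# Or, when converting an existing column, coerce unparseable values to NaT
# instead of raising an error.
df['day_walked'] = pd.to_datetime(df['day_walked'], format='%Y-%m-%d', errors='coerce')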

Since it is now a datetime type, you can access several properties relating to date and time functionality through the .dt accessor. It is important to note here that my dataset does not store any time-of-day values in the ‘day_walked’ column, so the time-specific properties are not meaningful for it.

You can retrieve the actual weekday name via the weekday_name property:

>>> data['day_walked'].dt.weekday_name
0 Thursday
1 Friday
2 Sunday
3 Monday
4 Tuesday
5 Wednesday
6 Thursday
7 Friday
8 Sunday
9 Monday
10 Tuesday
11 Wednesday
12 Thursday
13 Friday
14 Sunday
15 Monday
16 Tuesday
17 Thursday
18 Friday
19 Sunday
20 Tuesday
21 Wednesday
22 Thursday
23 Friday
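One hedged note that is not in the original post: in newer pandas releases (1.0 and later), the weekday_name attribute has been removed in favor of the day_name() method, which returns the same weekday names:

# Equivalent in pandas >= 1.0, where dt.weekday_name no longer exists.
data['day_walked'].dt.day_name()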

Although we can obviously see it in the dataset, you can also extract the year value through the year property:

>>> data['day_walked'].dt.year
0 2019
1 2019
2 2019
3 2019
4 2019
5 2019
6 2019
7 2019
8 2019
9 2019
10 2019
11 2019
12 2019
13 2019
14 2019
15 2019
16 2019
17 2019
18 2019
19 2019
20 2019
21 2019
22 2019
23 2019

Same with the month number:

>>> data['day_walked'].dt.month
0 8
1 8
2 8
3 8
4 8
5 8
6 8
7 8
8 8
9 8
10 8
11 8
12 8
13 8
14 8
15 8
16 8
17 8
18 8
19 8
20 8
21 8
22 8
23 8
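These accessor values plug straight into other pandas operations. As a small sketch of my own (not from the original post), you could group by the month number to total the miles walked per month; with only August present in this dataset, it yields a single group:

# Sum miles_walked per month number (illustrative; only August exists here).
data.groupby(data['day_walked'].dt.month)['miles_walked'].sum()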

Need just the day of the month? There is an attribute for it:

>>> data['day_walked'].dt.day
0 1
1 2
2 4
3 5
4 6
5 7
6 8
7 9
8 11
9 12
10 13
11 14
12 15
13 16
14 18
15 19
16 20
17 22
18 23
19 25
20 27
21 28
22 29
23 30

The weekday attribute returns a number indicating the day of the week, where Monday is 0 (zero) and Sunday is 6:

>>> data['day_walked'].dt.weekday
0 3
1 4
2 6
3 0
4 1
5 2
6 3
7 4
8 6
9 0
10 1
11 2
12 3
13 4
14 6
15 0
16 1
17 3
18 4
19 6
20 1
21 2
22 3
23 4
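Since the numbers follow that Monday-through-Sunday convention, they are handy for filtering. A quick sketch of my own (not from the original post) that keeps only the weekend walks:

# Saturday is 5 and Sunday is 6, so weekday >= 5 selects weekend rows.
weekend_walks = data.loc[data['day_walked'].dt.weekday >= 5]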

To know the ordinal day of the year, the dayofyear attribute is available:

>>> data['day_walked'].dt.dayofyear
0 213
1 214
2 216
3 217
4 218
5 219
6 220
7 221
8 223
9 224
10 225
11 226
12 227
13 228
14 230
15 231
16 232
17 234
18 235
19 237
20 239
21 240
22 241
23 242
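Datetime columns also support arithmetic. As a hedged aside not in the original post, subtracting the earliest date from the latest yields a Timedelta, whose days attribute gives the span covered by the dataset:

# Timedelta between the last and first recorded walk.
span = data['day_walked'].max() - data['day_walked'].min()
print(span.days)  # number of days between the first and last walk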

You can easily determine whether a date is the first day of its month with is_month_start. For those dates that are not, False is returned:

>>> data['day_walked'].dt.is_month_start
0 True
1 False
2 False
3 False
4 False
5 False
6 False
7 False
8 False
9 False
10 False
11 False
12 False
13 False
14 False
15 False
16 False
17 False
18 False
19 False
20 False
21 False
22 False
23 False

According to the results above, I did walk on the first day of the month. However, is_month_end returns False for every row, indicating I did not walk on the last day of the month, as shown below:

>>> data['day_walked'].dt.is_month_end
0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 False
8 False
9 False
10 False
11 False
12 False
13 False
14 False
15 False
16 False
17 False
18 False
19 False
20 False
21 False
22 False
23 False
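These boolean accessors compose nicely with pandas’ aggregation methods. A small sketch of my own that confirms the observation above in one call:

# True if any walk fell on the last day of its month; False for this dataset.
data['day_walked'].dt.is_month_end.any()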

Because it is a datetime data type, the ‘day_walked’ column can be used in comparisons with other datetime values. Let’s assign the date ‘2019-08-19’ to a variable:

>>> curious_day = pd.to_datetime('2019-08-19')

To retrieve those rows where the date is greater than or equal to the date value of ‘2019-08-19’, the comparison could look like this:

>>> data.loc[data['day_walked'] >= curious_day]
   day_walked  cal_burned  miles_walked  duration  mph  shoe_id
15 2019-08-19       370.1          3.68  01:08:03  3.2        5
16 2019-08-20       360.1          3.60  01:06:20  3.3        5
17 2019-08-22       376.1          3.75  01:09:09  3.3        5
18 2019-08-23       351.7          3.46  01:04:39  3.2        5
19 2019-08-25       355.1          3.51  01:05:17  3.2        5
20 2019-08-27       378.1          3.71  01:09:31  3.2        5
21 2019-08-28       345.1          3.41  01:03:27  3.2        5
22 2019-08-29       365.8          3.58  01:07:15  3.2        5
23 2019-08-30       205.5          2.02  00:37:47  3.2        5

Likewise, using the less-than comparison operator retrieves those rows with dates prior to ‘2019-08-19’:

>>> data.loc[data['day_walked'] < curious_day]
   day_walked  cal_burned  miles_walked  duration  mph  shoe_id
0  2019-08-01       358.1          3.53  01:05:51  3.2        6
1  2019-08-02       354.7          3.46  01:05:12  3.2        6
2  2019-08-04       354.2          3.55  01:05:07  3.3        6
3  2019-08-05       351.3          3.52  01:04:36  3.3        6
4  2019-08-06       292.2          2.83  00:53:44  3.2        4
5  2019-08-07       339.0          3.31  01:02:20  3.2        4
6  2019-08-08       344.1          3.36  01:03:16  3.2        4
7  2019-08-09       316.7          3.12  00:58:13  3.2        5
8  2019-08-11       363.2          3.64  01:06:47  3.3        5
9  2019-08-12       346.1          3.42  01:03:38  3.2        5
10 2019-08-13       370.0          3.66  01:08:01  3.2        5
11 2019-08-14       347.0          3.45  01:03:55  3.2        5
12 2019-08-15       327.0          3.25  01:00:09  3.2        5
13 2019-08-16       351.0          3.45  01:04:32  3.2        5
14 2019-08-18       347.4          3.47  01:03:53  3.3        5
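For a bounded date range, Series.between() works as well and is inclusive of both endpoints by default. A hedged sketch, with the start and end dates chosen purely for illustration:

# Rows walked between (and including) these two illustrative dates.
start = pd.to_datetime('2019-08-05')
end = pd.to_datetime('2019-08-09')
in_range = data.loc[data['day_walked'].between(start, end)]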

Explore the to_datetime() function, trying out its handy functionality on your own datasets’ datetime values. Thanks for reading!

See these links to official documentation on many of the pandas topics covered in the post:

Other posts you may be interested in: Bulk CSV Uploads with Pandas and PostgreSQL

Like what you have read? See anything incorrect? Please comment below and thanks for reading!!!

A Call To Action!

Thank you for taking the time to read this post. I truly hope you discovered something interesting and enlightening. Please share your findings here, with someone else you know who would get the same value out of it as well.

Visit the Portfolio-Projects page to see blog post/technical writing I have completed for clients.

Have I mentioned how much I love a cup of coffee?!?!

To receive email notifications (Never Spam) from this blog (“Digital Owl’s Prose”) for the latest blog posts as they are published, please subscribe (of your own volition) by clicking the ‘Click To Subscribe!’ button in the sidebar on the homepage! (Feel free at any time to review the Digital Owl’s Prose Privacy Policy Page for any questions you may have about: email updates, opt-in, opt-out, contact forms, etc…)

Be sure and visit the “Best Of” page for a collection of my best blog posts.


Josh Otwell has a passion to study and grow as a SQL Developer and blogger. Other favorite activities find him with his nose buried in a good book, article, or the Linux command line. Among those, he shares a love of tabletop RPG games, reading fantasy novels, and spending time with his wife and two daughters.

Disclaimer: The examples presented in this post are hypothetical ideas of how to achieve similar types of results. They are not the utmost best solutions. The majority, if not all, of the examples provided are performed on a personal development/learning workstation environment and should not be considered production quality or ready. Your particular goals and needs may vary. Use those practices that best benefit your needs and goals. Opinions are my own.
