Pandas dataframe ordering with examples using sort_values().

Oftentimes, you need a result set in a particular order. In the SQL world, the order of query results is not guaranteed without an ORDER BY clause. What if you are working in the pandas world? Fear not. You can order DataFrame results much as you would with ORDER BY, using the sort_values() method. Let’s learn together by example…


Note: All data, names, or naming found within the database presented in this post are used strictly for practice, learning, instruction, and testing purposes. They by no means depict actual data belonging to or being used by any party or organization.

OS and DB used:
  • Xubuntu Linux 18.04.2 LTS (Bionic Beaver)
  • PostgreSQL 11.5



For a working data set, I’ll use the pandas read_csv() function to load a CSV file’s contents into a DataFrame object:

>>> import pandas as pd
>>> df = pd.read_csv('/home/linux_user/Practice Data/Fitness_DB_Data/july_2019_hiking_stats.csv')
(To get started with read_csv(), see this post I wrote using a simple example)
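
If you do not have a CSV file handy, a minimal sketch like the one below builds a comparable DataFrame from an in-memory string. The five rows are the first five rows of the data set used in this post; the names csv_text and sample_df are my own and purely illustrative.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file: the first five rows of the data set.
csv_text = """day_walked,cal_burned,miles_walked,duration,mph,shoe_id
2019-07-01,330.5,3.27,01:00:45,3.2,4
2019-07-03,306.1,2.98,00:56:17,3.2,4
2019-07-04,330.4,3.17,01:00:45,3.1,4
2019-07-05,326.9,3.19,01:00:06,3.2,4
2019-07-06,327.2,3.23,01:00:06,3.2,4
"""

# read_csv() accepts a file-like object as well as a file path.
sample_df = pd.read_csv(io.StringIO(csv_text))
print(sample_df.shape)
```

Wrapping the text in io.StringIO lets read_csv() treat it like an open file, which is handy for quick experiments.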

Calling the head() method returns the first 5 rows of data, giving us a sense of the DataFrame’s contents:

>>> df.head()
   day_walked  cal_burned  miles_walked  duration  mph  shoe_id
0  2019-07-01       330.5          3.27  01:00:45  3.2        4
1  2019-07-03       306.1          2.98  00:56:17  3.2        4
2  2019-07-04       330.4          3.17  01:00:45  3.1        4
3  2019-07-05       326.9          3.19  01:00:06  3.2        4
4  2019-07-06       327.2          3.23  01:00:06  3.2        4

One of several DataFrame attributes you can access for information is columns, which returns an Index object holding the DataFrame’s column names:

>>> df.columns
Index(['day_walked', 'cal_burned', 'miles_walked', 'duration', 'mph',
       'shoe_id'],
      dtype='object')
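
Since columns is an Index rather than a plain list, here is a small sketch of two common ways to work with it. The demo_df frame is a hypothetical stand-in that reuses two column names from this post.

```python
import pandas as pd

# Small hypothetical frame reusing two of the column names from this post.
demo_df = pd.DataFrame({"day_walked": ["2019-07-01"], "cal_burned": [330.5]})

# columns is an Index object; tolist() converts it to a plain Python list.
column_names = demo_df.columns.tolist()
print(column_names)

# Membership tests work directly on the Index.
has_column = "cal_burned" in demo_df.columns
print(has_column)
```

Checking membership first is a cheap way to avoid a KeyError before passing a column name to sort_values().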

Suppose you want the DataFrame results sorted by the ‘day_walked’ column, from the earliest date to the latest. Simply pass that column name to the by parameter of sort_values():

>>> dw_sort = df.sort_values(by=['day_walked'])
>>> dw_sort
    day_walked  cal_burned  miles_walked  duration  mph  shoe_id
0   2019-07-01       330.5          3.27  01:00:45  3.2        4
1   2019-07-03       306.1          2.98  00:56:17  3.2        4
2   2019-07-04       330.4          3.17  01:00:45  3.1        4
3   2019-07-05       326.9          3.19  01:00:06  3.2        4
4   2019-07-06       327.2          3.23  01:00:06  3.2        4
5   2019-07-08       330.4          3.31  01:00:45  3.3        4
6   2019-07-09       337.8          3.33  01:02:07  3.2        4
7   2019-07-10       323.1          3.18  00:59:24  3.2        4
8   2019-07-11       327.3          3.22  01:00:11  3.2        4
9   2019-07-12       327.0          3.21  01:00:08  3.2        4
10  2019-07-14       368.0          3.65  01:07:40  3.2        4
11  2019-07-15       359.4          3.55  01:06:05  3.2        5
12  2019-07-16       356.4          3.50  01:05:31  3.2        5
13  2019-07-17       354.7          3.54  01:05:13  3.3        5
14  2019-07-18       332.4          3.31  01:01:07  3.2        5
15  2019-07-19       358.8          3.52  01:05:58  3.2        5
16  2019-07-21       356.2          3.60  01:05:29  3.3        5
17  2019-07-22       355.7          3.50  01:05:24  3.2        5
18  2019-07-23       349.8          3.49  01:04:19  3.3        5
19  2019-07-24       352.7          3.49  01:04:51  3.2        5
20  2019-07-25       350.8          3.44  01:04:30  3.2        5
21  2019-07-26       358.5          3.52  01:05:54  3.2        5
22  2019-07-28       361.0          3.55  01:06:22  3.2        4
23  2019-07-29       359.9          3.52  01:06:11  3.2        6
24  2019-07-30       358.1          3.53  01:05:51  3.2        6
25  2019-07-31       224.0          2.22  00:41:11  3.2        6
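
One caveat worth sketching: read_csv() as called above does not parse dates, so ‘day_walked’ is held as text. Text in YYYY-MM-DD form happens to sort chronologically, but other date formats would not. Assuming you want a true date sort, you could convert the column first with pd.to_datetime(). The walks frame below is hypothetical, with rows deliberately out of order.

```python
import pandas as pd

# Hypothetical rows, deliberately out of order, using dates from this post.
walks = pd.DataFrame({
    "day_walked": ["2019-07-14", "2019-07-01", "2019-07-31"],
    "cal_burned": [368.0, 330.5, 224.0],
})

# Convert the text column to real datetime values before sorting.
walks["day_walked"] = pd.to_datetime(walks["day_walked"])

# by accepts a single column name as well as a list of names.
by_date = walks.sort_values(by="day_walked")
print(by_date)
```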

Ascending sorting (from least to greatest) is the default. Specifying False for the ascending parameter reverses the sort to a descending (greatest to least) order:

>>> dw_sort = df.sort_values(by=['day_walked'], ascending=False)
>>> dw_sort
    day_walked  cal_burned  miles_walked  duration  mph  shoe_id
25  2019-07-31       224.0          2.22  00:41:11  3.2        6
24  2019-07-30       358.1          3.53  01:05:51  3.2        6
23  2019-07-29       359.9          3.52  01:06:11  3.2        6
22  2019-07-28       361.0          3.55  01:06:22  3.2        4
21  2019-07-26       358.5          3.52  01:05:54  3.2        5
20  2019-07-25       350.8          3.44  01:04:30  3.2        5
19  2019-07-24       352.7          3.49  01:04:51  3.2        5
18  2019-07-23       349.8          3.49  01:04:19  3.3        5
17  2019-07-22       355.7          3.50  01:05:24  3.2        5
16  2019-07-21       356.2          3.60  01:05:29  3.3        5
15  2019-07-19       358.8          3.52  01:05:58  3.2        5
14  2019-07-18       332.4          3.31  01:01:07  3.2        5
13  2019-07-17       354.7          3.54  01:05:13  3.3        5
12  2019-07-16       356.4          3.50  01:05:31  3.2        5
11  2019-07-15       359.4          3.55  01:06:05  3.2        5
10  2019-07-14       368.0          3.65  01:07:40  3.2        4
9   2019-07-12       327.0          3.21  01:00:08  3.2        4
8   2019-07-11       327.3          3.22  01:00:11  3.2        4
7   2019-07-10       323.1          3.18  00:59:24  3.2        4
6   2019-07-09       337.8          3.33  01:02:07  3.2        4
5   2019-07-08       330.4          3.31  01:00:45  3.3        4
4   2019-07-06       327.2          3.23  01:00:06  3.2        4
3   2019-07-05       326.9          3.19  01:00:06  3.2        4
2   2019-07-04       330.4          3.17  01:00:45  3.1        4
1   2019-07-03       306.1          2.98  00:56:17  3.2        4
0   2019-07-01       330.5          3.27  01:00:45  3.2        4
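
Notice that the index labels in the output above travel with their rows. The sketch below, using a hypothetical three-row walks frame, shows that sort_values() returns a new DataFrame by default and that reset_index(drop=True) is one way to renumber the result.

```python
import pandas as pd

# Hypothetical three-row frame with values taken from this post.
walks = pd.DataFrame({
    "day_walked": ["2019-07-01", "2019-07-03", "2019-07-04"],
    "cal_burned": [330.5, 306.1, 330.4],
})

# sort_values() returns a new DataFrame; walks itself is not modified.
newest_first = walks.sort_values(by="day_walked", ascending=False)

# The original index labels stay attached to their rows (2, 1, 0 here).
# reset_index(drop=True) replaces them with a fresh 0..n-1 index.
renumbered = newest_first.reset_index(drop=True)
print(renumbered)
```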

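Multiple-column sorting is possible with sort_values() as well. Pass a list of column names to by and a matching list of booleans to ascending. Here, inplace=True modifies df itself instead of returning a new DataFrame: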

>>> df.sort_values(by=['day_walked','cal_burned'], ascending=[False,False], inplace=True)
>>> df
    day_walked  cal_burned  miles_walked  duration  mph  shoe_id
25  2019-07-31       224.0          2.22  00:41:11  3.2        6
24  2019-07-30       358.1          3.53  01:05:51  3.2        6
23  2019-07-29       359.9          3.52  01:06:11  3.2        6
22  2019-07-28       361.0          3.55  01:06:22  3.2        4
21  2019-07-26       358.5          3.52  01:05:54  3.2        5
20  2019-07-25       350.8          3.44  01:04:30  3.2        5
19  2019-07-24       352.7          3.49  01:04:51  3.2        5
18  2019-07-23       349.8          3.49  01:04:19  3.3        5
17  2019-07-22       355.7          3.50  01:05:24  3.2        5
16  2019-07-21       356.2          3.60  01:05:29  3.3        5
15  2019-07-19       358.8          3.52  01:05:58  3.2        5
14  2019-07-18       332.4          3.31  01:01:07  3.2        5
13  2019-07-17       354.7          3.54  01:05:13  3.3        5
12  2019-07-16       356.4          3.50  01:05:31  3.2        5
11  2019-07-15       359.4          3.55  01:06:05  3.2        5
10  2019-07-14       368.0          3.65  01:07:40  3.2        4
9   2019-07-12       327.0          3.21  01:00:08  3.2        4
8   2019-07-11       327.3          3.22  01:00:11  3.2        4
7   2019-07-10       323.1          3.18  00:59:24  3.2        4
6   2019-07-09       337.8          3.33  01:02:07  3.2        4
5   2019-07-08       330.4          3.31  01:00:45  3.3        4
4   2019-07-06       327.2          3.23  01:00:06  3.2        4
3   2019-07-05       326.9          3.19  01:00:06  3.2        4
2   2019-07-04       330.4          3.17  01:00:45  3.1        4
1   2019-07-03       306.1          2.98  00:56:17  3.2        4
0   2019-07-01       330.5          3.27  01:00:45  3.2        4
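(Note: The ‘day_walked’ column takes precedence because it is listed first in by. sort_values() orders rows by the first column, then uses each later column only to break ties among rows that share the same values in the earlier ones. Every ‘day_walked’ value in this data set is unique, so ‘cal_burned’ has no visible effect here. Feel free to share thoughts, comments, and more information about multiple column sorts…)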
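
To see how multiple sort keys interact, it helps to lead with a column that has repeated values. The sketch below is hypothetical: it takes five rows from this post and sorts by ‘shoe_id’ ascending, then by ‘cal_burned’ descending within each shoe.

```python
import pandas as pd

# Five rows taken from the data set in this post; shoe_id has repeated values.
walks = pd.DataFrame({
    "day_walked": ["2019-07-14", "2019-07-15", "2019-07-16", "2019-07-28", "2019-07-29"],
    "cal_burned": [368.0, 359.4, 356.4, 361.0, 359.9],
    "shoe_id": [4, 5, 5, 4, 6],
})

# shoe_id is the primary key (ascending); cal_burned only breaks ties
# between rows that share the same shoe_id (descending).
by_shoe = walks.sort_values(by=["shoe_id", "cal_burned"], ascending=[True, False])
print(by_shoe)
```

Within each ‘shoe_id’ group the higher ‘cal_burned’ values come first, while the groups themselves stay in ascending ‘shoe_id’ order.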


Have a look at sort_values() when you need a sure sorting order for pandas DataFrame contents.

Like what you have read? See anything incorrect? Please comment below and thanks for reading!!!



Josh Otwell has a passion to study and grow as a SQL Developer and blogger. Other favorite activities find him with his nose buried in a good book, article, or the Linux command line. Among those, he shares a love of tabletop RPG games, reading fantasy novels, and spending time with his wife and two daughters.

Disclaimer: The examples presented in this post are hypothetical ideas of how to achieve similar types of results. They are not the utmost best solution(s). The majority, if not all, of the examples provided are performed on a personal development/learning workstation environment and should not be considered production quality or ready. Your particular goals and needs may vary. Use those practices that best benefit your needs and goals. Opinions are my own.
