Window Functions in PostgreSQL – example with 3-day rolling average.

After reading this fantastic post, Window Functions in Python and SQL, I decided to apply a similar function to a data set that interests me: the walking/hiking stats I keep for all of my (daily) walks. This blog post covers the SQL aspect; I plan to write one covering the Python and pandas portion in the near future.

OS, Database, and software used:
  • Xubuntu Linux 18.04.2 LTS (Bionic Beaver)
  • PostgreSQL 11.4


I use this table to store and track my walking stats. I have written several blog posts detailing different methods, using PostgreSQL and pandas, for bulk loading CSV data into it. Be sure to visit those linked posts in the closing section below if you are interested.

walking_stats=> \d stats;
                          Table "public.stats"
    Column    |          Type          | Collation | Nullable | Default
--------------+------------------------+-----------+----------+---------
 day_walked   | date                   |           |          |
 cal_burned   | numeric(4,1)           |           |          |
 miles_walked | numeric(4,2)           |           |          |
 duration     | time without time zone |           |          |
 mph          | numeric(2,1)           |           |          |
 shoe_id      | integer                |           |          |

Using a Window Function, we can retrieve a 3-day rolling average of calories burned:

SELECT day_walked,
cal_burned,
AVG(cal_burned) OVER(ORDER BY day_walked ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS three_day_avg
FROM stats
WHERE EXTRACT(MONTH FROM day_walked) = 1;
 day_walked | cal_burned |    three_day_avg    
------------+------------+----------------------
 2019-01-01 |      132.8 | 132.8000000000000000
 2019-01-02 |      181.1 | 156.9500000000000000
 2019-01-07 |      207.3 | 173.7333333333333333
 2019-01-08 |      218.2 | 202.2000000000000000
 2019-01-09 |      193.0 | 206.1666666666666667
 2019-01-10 |      160.2 | 190.4666666666666667
 2019-01-11 |      206.3 | 186.5000000000000000
 2019-01-13 |      253.2 | 206.5666666666666667
 2019-01-14 |      177.6 | 212.3666666666666667
 2019-01-15 |      207.0 | 212.6000000000000000
 2019-01-16 |      248.7 | 211.1000000000000000
 2019-01-17 |      176.3 | 210.6666666666666667
 2019-01-19 |      200.2 | 208.4000000000000000
 2019-01-20 |      244.4 | 206.9666666666666667
 2019-01-21 |      205.9 | 216.8333333333333333
 2019-01-22 |      244.8 | 231.7000000000000000
 2019-01-23 |      231.8 | 227.5000000000000000
 2019-01-25 |      244.9 | 240.5000000000000000
 2019-01-27 |      302.7 | 259.8000000000000000
 2019-01-28 |      170.2 | 239.2666666666666667
 2019-01-29 |      235.5 | 236.1333333333333333
 2019-01-30 |      254.2 | 219.9666666666666667
 2019-01-31 |      229.5 | 239.7333333333333333
(23 rows)

To clean up all the extra digits in the ‘three_day_avg’ column, we can wrap the Window Function in the ROUND() function, keeping only 2 digits after the decimal:

walking_stats=> SELECT day_walked, cal_burned, ROUND(AVG(cal_burned) OVER(ORDER BY day_walked ROWS BETWEEN 2 PRECEDING AND CURRENT ROW),2) AS three_day_avg
FROM stats
WHERE EXTRACT(MONTH FROM day_walked) = 1;
 day_walked | cal_burned | three_day_avg
------------+------------+---------------
 2019-01-01 |      132.8 |        132.80
 2019-01-02 |      181.1 |        156.95
 2019-01-07 |      207.3 |        173.73
 2019-01-08 |      218.2 |        202.20
 2019-01-09 |      193.0 |        206.17
 2019-01-10 |      160.2 |        190.47
 2019-01-11 |      206.3 |        186.50
 2019-01-13 |      253.2 |        206.57
 2019-01-14 |      177.6 |        212.37
 2019-01-15 |      207.0 |        212.60
 2019-01-16 |      248.7 |        211.10
 2019-01-17 |      176.3 |        210.67
 2019-01-19 |      200.2 |        208.40
 2019-01-20 |      244.4 |        206.97
 2019-01-21 |      205.9 |        216.83
 2019-01-22 |      244.8 |        231.70
 2019-01-23 |      231.8 |        227.50
 2019-01-25 |      244.9 |        240.50
 2019-01-27 |      302.7 |        259.80
 2019-01-28 |      170.2 |        239.27
 2019-01-29 |      235.5 |        236.13
 2019-01-30 |      254.2 |        219.97
 2019-01-31 |      229.5 |        239.73
(23 rows)

And there it is, a 3-day rolling average of calories burned for the month of ‘January’.
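
One detail worth keeping in mind: the WHERE clause is applied before window functions are evaluated, so the first two January rows are averaged over fewer than three values (the average for ‘2019-01-01’ is just its own 132.8). If the table also holds rows from the previous December, and you want them to count toward the early-January averages, one option is to compute the window in a subquery and filter afterwards. This is a minimal sketch and not the query used above; the subquery alias s and the explicit date bounds are assumptions made for illustration:

```sql
-- Compute the rolling average over every row first, then keep only January 2019.
-- Assumes the stats table shown earlier; the alias "s" and the date bounds are illustrative.
SELECT day_walked, cal_burned, three_day_avg
FROM (
    SELECT day_walked,
           cal_burned,
           ROUND(AVG(cal_burned) OVER (ORDER BY day_walked
                                       ROWS BETWEEN 2 PRECEDING AND CURRENT ROW), 2) AS three_day_avg
    FROM stats
) AS s
WHERE day_walked >= DATE '2019-01-01'
  AND day_walked <  DATE '2019-02-01'
ORDER BY day_walked;
```

The explicit date range also limits the results to January of a single year, whereas EXTRACT(MONTH FROM day_walked) = 1 matches January of every year stored in the table.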

We can use SQL to easily check the math for a particular row. I’ll focus on row 3, dated ‘2019-01-07’. The frame portion of the OVER() clause, ROWS BETWEEN 2 PRECEDING AND CURRENT ROW, essentially means: average the ‘cal_burned’ values for the current row and the 2 rows above it (the PRECEDING 2 rows). For row 3, the math looks like this:

walking_stats=> SELECT ROUND((207.3 + 181.1 + 132.8) / 3,2) AS three_day_avg;
 three_day_avg
---------------
        173.73
(1 row)
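
Notice that the three values averaged for ‘2019-01-07’ come from ‘2019-01-07’, ‘2019-01-02’, and ‘2019-01-01’. A ROWS frame counts rows, not calendar days, so gaps between dates are simply skipped over. If a window based on the calendar is what you are after, a RANGE frame with an interval offset is one possibility. The sketch below assumes PostgreSQL 11 or later (the release that added RANGE frames with an offset) and runs on its own against the first five January rows from the results above; the CTE name sample_walks and the output column names are hypothetical:

```sql
-- Self-contained comparison on the first five January rows shown above.
-- RANGE with an interval offset requires PostgreSQL 11 or later.
-- The CTE name "sample_walks" and the output column names are illustrative.
WITH sample_walks (day_walked, cal_burned) AS (
    VALUES (DATE '2019-01-01', 132.8),
           (DATE '2019-01-02', 181.1),
           (DATE '2019-01-07', 207.3),
           (DATE '2019-01-08', 218.2),
           (DATE '2019-01-09', 193.0)
)
SELECT day_walked,
       cal_burned,
       -- the current row plus the 2 rows before it, whatever their dates
       ROUND(AVG(cal_burned) OVER (ORDER BY day_walked
                                   ROWS BETWEEN 2 PRECEDING AND CURRENT ROW), 2) AS avg_3_rows,
       -- rows dated from 2 calendar days before the current row's date up to that date
       ROUND(AVG(cal_burned) OVER (ORDER BY day_walked
                                   RANGE BETWEEN INTERVAL '2 days' PRECEDING AND CURRENT ROW), 2) AS avg_3_days
FROM sample_walks
ORDER BY day_walked;
```

For ‘2019-01-07’, avg_3_rows is 173.73 as before, while avg_3_days is 207.30, because no other row is dated between ‘2019-01-05’ and ‘2019-01-07’.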

I have written several blog posts about Window Functions within both the PostgreSQL and MySQL ecosystems; however, the 2 below are most similar to this post and provide more information concerning the windowing portion of the OVER() clause:

Other posts you may be interested in: Bulk CSV Uploads with Pandas and PostgreSQL

Try out Window Functions yourself to calculate rolling averages, sums, and the like on data sets that interest you. Hit me up in the comments with some examples. I’d love to know of more interesting use cases. Thanks for reading!
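
As a starting point for that kind of experimenting, here is a hedged sketch that reuses one frame for both a rolling average and a rolling sum through a named WINDOW clause. It assumes the stats table from this post; the window name w and the output column names are made up for illustration:

```sql
-- One frame definition, shared by two window function calls.
-- Assumes the stats table shown earlier; "w" and the column aliases are illustrative.
SELECT day_walked,
       cal_burned,
       ROUND(AVG(cal_burned) OVER w, 2) AS three_row_avg,
       SUM(cal_burned) OVER w           AS three_row_sum
FROM stats
WHERE EXTRACT(MONTH FROM day_walked) = 1
WINDOW w AS (ORDER BY day_walked ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
ORDER BY day_walked;
```

Naming the window keeps the frame definition in one place, so changing it to, say, 6 PRECEDING updates both columns at once.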


Josh Otwell has a passion to study and grow as a SQL Developer and blogger. Other favorite activities find him with his nose buried in a good book, article, or the Linux command line. Among those, he shares a love of tabletop RPG games, reading fantasy novels, and spending time with his wife and two daughters.

Disclaimer: The examples presented in this post are hypothetical ideas of how to achieve similar types of results. They are not the utmost best solution(s). The majority, if not all, of the examples provided are performed on a personal development/learning workstation environment and should not be considered production quality or ready. Your particular goals and needs may vary. Use those practices that best benefit your needs and goals. Opinions are my own.
