The PARTITION BY clause of a Window Function – with an example in MySQL.

In this blog post, I will look at a couple of interesting result sets produced by the PARTITION BY clause, used inside the OVER() clause of a Window Function, with an example in MySQL.


Note: All data and names found within the database presented in this post are used strictly for practice, learning, instruction, and testing purposes. They by no means depict actual data belonging to or being used by any party or organization.

OS and DB used:


  • Xubuntu Linux 16.04.5 LTS (Xenial Xerus)
  • MySQL 8.0.15



I’ll be using the table structure below, containing close to 90 rows of data:

mysql> DESC hiking_stats;
+-----------------+--------------+------+-----+---------+-------+
| Field           | Type         | Null | Key | Default | Extra |
+-----------------+--------------+------+-----+---------+-------+
| day_walked      | date         | YES  |     | NULL    |       |
| burned_calories | decimal(4,1) | YES  |     | NULL    |       |
| distance_walked | decimal(4,2) | YES  |     | NULL    |       |
| time_walking    | time         | YES  |     | NULL    |       |
| pace            | decimal(2,1) | YES  |     | NULL    |       |
| shoes_worn      | text         | YES  |     | NULL    |       |
| trail_hiked     | text         | YES  |     | NULL    |       |
+-----------------+--------------+------+-----+---------+-------+
7 rows in set (0.00 sec)

mysql> SELECT COUNT(*) FROM hiking_stats;
+----------+
| COUNT(*) |
+----------+
|       84 |
+----------+
1 row in set (0.00 sec)
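
If you would like to follow along, here is a minimal sketch of a table that matches the DESC output above. The column names and types come straight from that output. The six sample rows are copied from the July results further down in this post, and the columns those results do not show are left as NULL. This is only a hypothetical reconstruction for practice, not the actual table definition or the full 84 rows.

```sql
-- Hypothetical reconstruction of hiking_stats from the DESC output.
-- Every column is nullable with a NULL default, as DESC reports.
CREATE TABLE hiking_stats (
    day_walked      DATE,
    burned_calories DECIMAL(4,1),
    distance_walked DECIMAL(4,2),
    time_walking    TIME,
    pace            DECIMAL(2,1),
    shoes_worn      TEXT,
    trail_hiked     TEXT
);

-- A handful of rows taken from the July result sets in this post.
-- Columns not listed here are simply left as NULL.
INSERT INTO hiking_stats (day_walked, burned_calories, trail_hiked)
VALUES
    ('2018-07-21',  93.1, 'Yard Mowing'),
    ('2018-07-02', 379.5, 'Yard Mowing'),
    ('2018-07-13', 416.2, 'Yard Mowing'),
    ('2018-07-28', 337.4, 'Sandy Trail-Drive'),
    ('2018-07-07', 347.6, 'Sandy Trail-Drive'),
    ('2018-07-15', 382.9, 'House-Power Line Route');
```

With only these six rows loaded, the queries in this post will run, but the running totals will naturally differ from the ones shown for the full data set.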

We can apply aggregate functions (e.g., SUM(), AVG(), etc.) in conjunction with the OVER() clause of a window function, producing some interesting query results.

I’ll start with the SUM() function and the OVER() clause, filtering for just those walks in the month of ‘July’:

mysql> SELECT day_walked, burned_calories,
    -> SUM(burned_calories) OVER(ORDER BY burned_calories ASC) AS amt_per_trail,
    -> trail_hiked FROM hiking_stats WHERE MONTHNAME(day_walked) = 'July';
+------------+-----------------+---------------+------------------------+
| day_walked | burned_calories | amt_per_trail | trail_hiked            |
+------------+-----------------+---------------+------------------------+
| 2018-07-21 |            93.1 |          93.1 | Yard Mowing            |
| 2018-07-23 |           322.9 |         416.0 | West Boundary          |
| 2018-07-03 |           323.7 |         739.7 | West Boundary          |
| 2018-07-12 |           325.9 |        1065.6 | West Boundary          |
| 2018-07-09 |           336.0 |        1401.6 | West Boundary          |
| 2018-07-28 |           337.4 |        1739.0 | Sandy Trail-Drive      |
| 2018-07-19 |           339.2 |        2078.2 | West Boundary          |
| 2018-07-17 |           339.4 |        2417.6 | West Boundary          |
| 2018-07-04 |           342.8 |        2760.4 | West Boundary          |
| 2018-07-07 |           347.6 |        3108.0 | Sandy Trail-Drive      |
| 2018-07-29 |           348.7 |        3456.7 | West Boundary          |
| 2018-07-08 |           351.6 |        3808.3 | West Boundary          |
| 2018-07-31 |           359.9 |        4168.2 | West Boundary          |
| 2018-07-30 |           361.6 |        4529.8 | West Boundary          |
| 2018-07-18 |           368.1 |        4897.9 | West Boundary          |
| 2018-07-16 |           368.6 |        5266.5 | West Boundary          |
| 2018-07-11 |           375.2 |        5641.7 | West Boundary          |
| 2018-07-06 |           375.7 |        6017.4 | West Boundary          |
| 2018-07-22 |           378.3 |        6774.0 | West Boundary          |
| 2018-07-27 |           378.3 |        6774.0 | West Boundary          |
| 2018-07-02 |           379.5 |        7153.5 | Yard Mowing            |
| 2018-07-25 |           379.9 |        7533.4 | West Boundary          |
| 2018-07-15 |           382.9 |        7916.3 | House-Power Line Route |
| 2018-07-24 |           386.4 |        8302.7 | West Boundary          |
| 2018-07-13 |           416.2 |        8718.9 | Yard Mowing            |
+------------+-----------------+---------------+------------------------+
25 rows in set (0.00 sec)

Note that on the 3rd row of the result set, the ‘amt_per_trail’ value of 739.7 is the sum of the ‘burned_calories’ values from the 2 preceding rows plus the current row’s value (i.e., 93.1 + 322.9 + 323.7 = 739.7). This pattern continues through the rest of the results (the two tied 378.3 rows share a single total of 6774.0), with the combined total of all rows making up the last value of 8718.9.
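
One detail in that result set is worth a closer look: both rows with a ‘burned_calories’ value of 378.3 show 6774.0. When OVER() has an ORDER BY but no frame clause, MySQL uses the default frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which treats rows with equal ORDER BY values as peers and adds them in together. The sketch below is one way to get a strict row-by-row running total instead. The ‘running_total’ alias and the use of ‘day_walked’ as a tie-breaker are my own assumptions, added only for illustration.

```sql
-- Same running total, but with an explicit ROWS frame so tied
-- burned_calories values are accumulated one row at a time.
-- day_walked is added to ORDER BY purely as a tie-breaker (an assumption
-- made here so the order of tied rows is deterministic).
SELECT day_walked,
       burned_calories,
       SUM(burned_calories) OVER(
           ORDER BY burned_calories ASC, day_walked ASC
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_total,
       trail_hiked
FROM hiking_stats
WHERE MONTHNAME(day_walked) = 'July'
ORDER BY burned_calories ASC, day_walked ASC;
```

Against the July data shown above, the 2018-07-22 row should then report 6395.7 (6017.4 + 378.3) and the 2018-07-27 row 6774.0, while every other row keeps the same total as before.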

Although that query gives us a nice running total across all of the July rows, we can use additional clauses to retrieve even more detailed and interesting query results.

PARTITION BY, one of the 3 optional clause arguments allowed in the OVER() clause (ordering and windowing being the other 2), breaks the rows into individual sub-groups, or partitions, and the window function is then evaluated separately within each one. Let’s see with an example:

mysql> SELECT day_walked,
    -> burned_calories,
    -> SUM(burned_calories) OVER(PARTITION BY trail_hiked ORDER BY burned_calories ASC) AS amt_per_trail,
    -> trail_hiked
    -> FROM hiking_stats WHERE MONTHNAME(day_walked) = 'July';
+------------+-----------------+---------------+------------------------+
| day_walked | burned_calories | amt_per_trail | trail_hiked            |
+------------+-----------------+---------------+------------------------+
| 2018-07-15 |           382.9 |         382.9 | House-Power Line Route |
| 2018-07-28 |           337.4 |         337.4 | Sandy Trail-Drive      |
| 2018-07-07 |           347.6 |         685.0 | Sandy Trail-Drive      |
| 2018-07-23 |           322.9 |         322.9 | West Boundary          |
| 2018-07-03 |           323.7 |         646.6 | West Boundary          |
| 2018-07-12 |           325.9 |         972.5 | West Boundary          |
| 2018-07-09 |           336.0 |        1308.5 | West Boundary          |
| 2018-07-19 |           339.2 |        1647.7 | West Boundary          |
| 2018-07-17 |           339.4 |        1987.1 | West Boundary          |
| 2018-07-04 |           342.8 |        2329.9 | West Boundary          |
| 2018-07-29 |           348.7 |        2678.6 | West Boundary          |
| 2018-07-08 |           351.6 |        3030.2 | West Boundary          |
| 2018-07-31 |           359.9 |        3390.1 | West Boundary          |
| 2018-07-30 |           361.6 |        3751.7 | West Boundary          |
| 2018-07-18 |           368.1 |        4119.8 | West Boundary          |
| 2018-07-16 |           368.6 |        4488.4 | West Boundary          |
| 2018-07-11 |           375.2 |        4863.6 | West Boundary          |
| 2018-07-06 |           375.7 |        5239.3 | West Boundary          |
| 2018-07-22 |           378.3 |        5995.9 | West Boundary          |
| 2018-07-27 |           378.3 |        5995.9 | West Boundary          |
| 2018-07-25 |           379.9 |        6375.8 | West Boundary          |
| 2018-07-24 |           386.4 |        6762.2 | West Boundary          |
| 2018-07-21 |            93.1 |          93.1 | Yard Mowing            |
| 2018-07-02 |           379.5 |         472.6 | Yard Mowing            |
| 2018-07-13 |           416.2 |         888.8 | Yard Mowing            |
+------------+-----------------+---------------+------------------------+
25 rows in set (0.00 sec)

An important difference in this query is that the ‘amt_per_trail’ running total resets for each distinct ‘trail_hiked’ value, since that column is specified in the PARTITION BY clause and the rows are separated into those sub-groups. Here’s the breakdown.

First, ‘House-Power Line Route’ has only a single row, so its total is simply that row’s value. The 3rd row is the sum of both rows for the ‘Sandy Trail-Drive’ trail. Then, on the immediately following row, the ‘West Boundary’ group begins, ending on the 4th row from the end of the result set. And finally, the ‘Yard Mowing’ values begin on the 3rd row from the end of the result set and continue through the remaining rows.

The ending row value for each sub-group is the aggregated total (via SUM()) for that entire group’s ‘burned_calories’ values.
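
If only those per-trail totals are of interest, a variation to try is leaving ORDER BY out of the OVER() clause. Without it, the frame covers the whole partition, so every row carries its trail’s full total instead of a running one. The sketch below shows that next to a plain GROUP BY for comparison. The ‘trail_total’ alias is my own naming, used only for illustration.

```sql
-- Without ORDER BY inside OVER(), the frame is the entire partition,
-- so each row shows its trail's full total next to its own value.
SELECT day_walked,
       burned_calories,
       SUM(burned_calories) OVER(PARTITION BY trail_hiked) AS trail_total,
       trail_hiked
FROM hiking_stats
WHERE MONTHNAME(day_walked) = 'July'
ORDER BY trail_hiked, burned_calories;

-- For comparison, GROUP BY returns the same totals but collapses
-- each trail down to a single row.
SELECT trail_hiked,
       SUM(burned_calories) AS trail_total
FROM hiking_stats
WHERE MONTHNAME(day_walked) = 'July'
GROUP BY trail_hiked;
```

Based on the results above, both queries should report 382.9 for ‘House-Power Line Route’, 685.0 for ‘Sandy Trail-Drive’, 6762.2 for ‘West Boundary’, and 888.8 for ‘Yard Mowing’. The difference is that the window function version keeps all 25 detail rows, whereas GROUP BY returns just 4.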

I am still learning about the power of Window Functions. Through these examples, I hope to help both you and me understand them even better.

Like what you have read? See anything incorrect? Please comment below and thanks for reading!!!

Explore the official MySQL 8.0 Online Manual for more information.



Josh Otwell has a passion to study and grow as a SQL Developer and blogger. Other favorite activities find him with his nose buried in a good book, article, or the Linux command line. Among those, he shares a love of tabletop RPG games, reading fantasy novels, and spending time with his wife and two daughters.

Disclaimer: The examples presented in this post are hypothetical ideas of how to achieve similar types of results. They are not necessarily the best solution(s). The majority, if not all, of the examples provided are performed on a personal development/learning workstation environment and should not be considered production-quality or production-ready. Your particular goals and needs may vary. Use those practices that best benefit your needs and goals. Opinions are my own.
