PostgreSQL Subquery as predicate in WHERE clause.

In Exploring subqueries in PostgreSQL – Column names and expressions, I discussed generating SELECT statement columns and expressions with subqueries. This blog post will visit another common pattern. Subqueries as predicate filters in the WHERE clause.

shot looking down a high tech tunnel

Photo by Daniel Chen on Unsplash

OS and DB used:

  • Xubuntu Linux 16.04.3 LTS (Xenial Xerus)
  • PostgreSQL 10.4


The tables and data are the same used in the previous (linked above) blog post, but here is a refresher:
(Note: Output format is unaligned with psql \a meta-command).

1
2
3
4
5
6
7
8
9
10
11
12
fitness=> \d hiking_stats;
Table "public.hiking_stats"
Column           |Type                  |
hike_id          |integer               |
day_walked       |date                  |
cal_burned       |numeric(4,1)          |
miles_walked     |numeric(4,2)          |
duration         |time without time zone|
mph              |numeric(2,1)          |
additional_weight|boolean               |
trekking_poles   |boolean               |
shoe_id          |integer               |

To make things a bit easier for the blog post, I’ll gather some data to work with using CREATE TABLE AS (I wrote a previous blog post on CREATE TABLE AS you can check out here) syntax:

1
2
3
4
5
6
fitness=> CREATE TABLE may_stats
fitness-> AS
fitness-> SELECT * FROM hiking_stats
fitness-> WHERE extract(month FROM day_walked) = 5
fitness-> WITH DATA;
SELECT 25

Now I have a table with only the results for the month of May.
I am interested on which hikes the cal_burned values are greater than the average calories burned for all hikes in May.
First of all, I need the average across all rows for that column to even make any sort of comparison, so I’ll retrieve that value with this query:

1
2
3
4
5
6
fitness=> SELECT ROUND(AVG(cal_burned), 2) AS calorie_avg
fitness-> FROM may_stats;
 calorie_avg
-------------
      367.84
(1 row)

And with that value, construct this query to answer my original question:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
fitness=> SELECT day_walked, cal_burned, duration
fitness-> FROM may_stats
fitness-> WHERE cal_burned > 367.84;
 day_walked | cal_burned | duration
------------+------------+----------
 2018-05-18 |      566.0 | 01:59:50
 2018-05-20 |      400.0 | 01:13:32
 2018-05-21 |      401.6 | 01:13:50
 2018-05-22 |      407.8 | 01:14:58
 2018-05-23 |      397.9 | 01:14:52
 2018-05-24 |      396.3 | 01:14:34
 2018-05-27 |      395.1 | 01:14:21
 2018-05-28 |      388.5 | 01:13:06
 2018-05-29 |      390.7 | 01:13:32
 2018-05-30 |      394.7 | 01:14:17
 2018-05-31 |      392.9 | 01:13:56
 2018-05-08 |      371.8 | 01:08:21
 2018-05-13 |      372.2 | 01:08:26
(13 rows)

I’ve had numerous hikes above the average!
Woot woot for me!!!
While this does work, I’m using two separate queries to arrive here.
Are you thinking like I am?
I thought so.
There is a better way.
This is a prime use case for filtering with a subquery in the WHERE clause.
We can essentially take that original ‘exploratory’ query, and ‘feed’ it (so to speak) to the WHERE clause like so:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
fitness=> SELECT day_walked, cal_burned, duration
fitness-> FROM may_stats
fitness-> WHERE cal_burned > (SELECT ROUND(AVG(cal_burned)) FROM may_stats);
 day_walked | cal_burned | duration
------------+------------+----------
 2018-05-18 |      566.0 | 01:59:50
 2018-05-20 |      400.0 | 01:13:32
 2018-05-21 |      401.6 | 01:13:50
 2018-05-22 |      407.8 | 01:14:58
 2018-05-23 |      397.9 | 01:14:52
 2018-05-24 |      396.3 | 01:14:34
 2018-05-27 |      395.1 | 01:14:21
 2018-05-28 |      388.5 | 01:13:06
 2018-05-29 |      390.7 | 01:13:32
 2018-05-30 |      394.7 | 01:14:17
 2018-05-31 |      392.9 | 01:13:56
 2018-05-08 |      371.8 | 01:08:21
 2018-05-13 |      372.2 | 01:08:26
(13 rows)

Identical results.
Yet, with only one query versus two.
Successful execution for this individual subquery depends on it only returning one column and one row from that column.
This type of subquery is known as a scalar subquery.
(Visit 4.2.11. Scalar Subqueries from the official documentation for more information.)

Here are error examples if you attempt retrieving multiple rows for this type of subquery:


!!! Warning: Incorrect !!!

1
2
3
4
fitness=> SELECT day_walked, cal_burned, duration
fitness-> FROM may_stats
fitness-> WHERE cal_burned > (SELECT cal_burned FROM may_stats);
ERROR:  more than one row returned by a subquery used as an expression

By removing the AVG() aggregate function on the cal_burned column, all rows for that column are returned.

1
2
3
4
5
6
fitness=> SELECT day_walked, cal_burned, duration
fitness-> FROM may_stats
fitness-> WHERE cal_burned > (SELECT day_walked, cal_burned FROM may_stats);
ERROR:  subquery must return only one column
LINE 3: WHERE cal_burned > (SELECT day_walked, cal_burned FROM may_s...
                           ^

The error is self-explanatory.
The subquery selects and returns more than one column.


While this blog post is not exhaustive by any measure when it comes to subqueries as predicate filters, my hope is that it is a solid start in the right direction for you.
I would love to know of better practices and any other observations in the comments below.
Thanks for reading.
Be sure and visit the stellar on-line PostgreSQL 10 Documentation for any related questions.


A Call To Action!

Thank you for taking the time to read this post. I truly hope you discovered something interesting and enlightening. Please share your findings here, with someone else you know who would get the same value out of it as well.

Visit the Portfolio-Projects page to see blog post/technical writing I have completed for clients.

Have I mentioned how much I love a cup of coffee?!?!

To receive email notifications (Never Spam) from this blog (“Digital Owl’s Prose”) for the latest blog posts as they are published, please subscribe (of your own volition) by clicking the ‘Click To Subscribe!’ button in the sidebar on the homepage! (Feel free at any time to review the Digital Owl’s Prose Privacy Policy Page for any questions you may have about: email updates, opt-in, opt-out, contact forms, etc…)
Be sure and visit the “Best Of” page for a collection of my best blog posts.


Josh Otwell has a passion to study and grow as a SQL Developer and blogger. Other favorite activities find him with his nose buried in a good book, article, or the Linux command line. Among those, he shares a love of tabletop RPG games, reading fantasy novels, and spending time with his wife and two daughters.


Disclaimer: The examples presented in this post are hypothetical ideas of how to achieve similar types of results. They are not the utmost best solution(s). The majority, if not all, of the examples provided, is performed on a personal development/learning workstation-environment and should not be considered production quality or ready. Your particular goals and needs may vary. Use those practices that best benefit your needs and goals. Opinions are my own.

 

Advertisements

Hey thanks for commenting! Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.