Exploring subqueries in PostgreSQL – Column names and expressions.

There is an old saying that there is more than one way to skin a cat. Fear not, I won’t be skinning any, but that saying applies for many queries within PostgreSQL (and SQL in general). While the structure of some queries are ideal for execution, speed, and performance others are useful for analysis, exploration, and reporting. In this blog post, I’ll explore using subqueries to generate SELECT statement columns.

sticky post-it notes on a wall

Photo by Patrick Perkins on Unsplash

Note: All data, names or naming found within the database presented in this post, are strictly used for practice, learning, instruction, and testing purposes. It by no means depicts actual data belonging to or being used by any party or organization.

OS and Database:

  • Xubuntu Linux 16.04.3 LTS (Xenial Xerus)
  • PostgreSQL 10.4

Subqueries are a special type of SELECT statement. They can be used for the target columns of your SELECT clauses, even if the table that subquery derives from is not named in the FROM clause.
Oftentimes they are used as part of predicate filters in WHERE clauses but that topic is for another blog post.
All in all, they are quite a powerful feature.

To start, let’s see the description of the two tables I’ll use that are part of a database in my personal/development environment used to track trail hiking stats.
(Note: Format output has been unaligned with the psql \a meta-command for a better on-screen display, hence many columns are absent from the regular \dt meta-command output)
The hiking_stats table:

1
2
3
4
5
6
7
8
9
10
11
12
fitness=> \d hiking_stats;
Table "public.hiking_stats"
Column           |Type                  |
hike_id          |integer               |
day_walked       |date                  |
cal_burned       |numeric(4,1)          |
miles_walked     |numeric(4,2)          |
duration         |time without time zone|
mph              |numeric(2,1)          |
additional_weight|boolean               |
trekking_poles   |boolean               |
shoe_id          |integer               |

And a shoe_brand table:

1
2
3
4
5
fitness=> \d shoe_brand
Table "public.shoe_brand"
Column    |Type   |
shoe_id   |integer|
name_brand|text   |

If you notice, these tables share a common shoe_id column that can be used to form a relationship between them.
I do track what shoes I wear on my hikes, so this is important information to me.
To know what shoe I wore on a hike, I can query with this INNER JOIN as shown below.
I’ll filter the query results for those rows from the month of May:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
fitness=> SELECT hs.day_walked, hs.cal_burned, hs.miles_walked, hs.duration, sb.name_brand
fitness-> FROM hiking_stats AS hs
fitness-> INNER JOIN shoe_brand AS sb
fitness-> ON hs.shoe_id = sb.shoe_id
fitness-> WHERE extract(month from day_walked) = 5;
 day_walked | cal_burned | miles_walked | duration |              name_brand              
------------+------------+--------------+----------+---------------------------------------
 2018-05-02 |      311.2 |         3.27 | 00:57:13 | New Balance Trail Runners-All Terrain
 2018-05-03 |      320.8 |         3.38 | 00:58:59 | New Balance Trail Runners-All Terrain
 2018-05-04 |      291.3 |         3.01 | 00:53:33 | Keen Koven WP(keen-dry)
 2018-05-06 |      363.8 |         3.82 | 01:06:53 | New Balance Trail Runners-All Terrain
 2018-05-08 |      371.8 |         3.89 | 01:08:21 | New Balance Trail Runners-All Terrain
 2018-05-09 |      362.7 |         3.83 | 01:06:42 | New Balance Trail Runners-All Terrain
 2018-05-07 |      354.0 |         3.72 | 01:05:05 | Keen Koven WP(keen-dry)
 2018-05-10 |      353.3 |         3.60 | 01:14:47 | New Balance Trail Runners-All Terrain
 2018-05-12 |      144.2 |         1.28 | 00:30:31 | New Balance Trail Runners-All Terrain
 2018-05-13 |      372.2 |         3.86 | 01:08:26 | New Balance Trail Runners-All Terrain
 2018-05-14 |      363.3 |         3.88 | 01:06:48 | New Balance Trail Runners-All Terrain
 2018-05-15 |      360.7 |         3.73 | 01:06:19 | New Balance Trail Runners-All Terrain
 2018-05-16 |      360.5 |         3.72 | 01:06:16 | New Balance Trail Runners-All Terrain
(13 rows)

This works just fine.
And likely is the ‘ideal’, most efficient query to run.
But, as with much of SQL, there is more than one way to retrieve a results set.
Let’s look at using a subquery to generate the name_brand column from table shoe_brand without performing a JOIN.
Or naming it in the outside (outer) query’s FROM clause:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
fitness=> SELECT hs.day_walked, hs.cal_burned,
fitness-> hs.miles_walked, hs.duration,
fitness-> (SELECT sb.name_brand FROM shoe_brand AS sb WHERE hs.shoe_id = sb.shoe_id) AS shoe_worn
fitness-> FROM hiking_stats AS hs
fitness-> WHERE extract(month from day_walked) = 5;
 day_walked | cal_burned | miles_walked | duration |               shoe_worn              
------------+------------+--------------+----------+---------------------------------------
 2018-05-02 |      311.2 |         3.27 | 00:57:13 | New Balance Trail Runners-All Terrain
 2018-05-03 |      320.8 |         3.38 | 00:58:59 | New Balance Trail Runners-All Terrain
 2018-05-04 |      291.3 |         3.01 | 00:53:33 | Keen Koven WP(keen-dry)
 2018-05-06 |      363.8 |         3.82 | 01:06:53 | New Balance Trail Runners-All Terrain
 2018-05-08 |      371.8 |         3.89 | 01:08:21 | New Balance Trail Runners-All Terrain
 2018-05-09 |      362.7 |         3.83 | 01:06:42 | New Balance Trail Runners-All Terrain
 2018-05-07 |      354.0 |         3.72 | 01:05:05 | Keen Koven WP(keen-dry)
 2018-05-10 |      353.3 |         3.60 | 01:14:47 | New Balance Trail Runners-All Terrain
 2018-05-12 |      144.2 |         1.28 | 00:30:31 | New Balance Trail Runners-All Terrain
 2018-05-13 |      372.2 |         3.86 | 01:08:26 | New Balance Trail Runners-All Terrain
 2018-05-14 |      363.3 |         3.88 | 01:06:48 | New Balance Trail Runners-All Terrain
 2018-05-15 |      360.7 |         3.73 | 01:06:19 | New Balance Trail Runners-All Terrain
 2018-05-16 |      360.5 |         3.72 | 01:06:16 | New Balance Trail Runners-All Terrain
(13 rows)

Identical results as the previous INNER JOIN query, but this time with a subquery.
The shoe_id column is a good candidate here to match with.
And a critical component to the subquery.
Notice the subquery recognized the hs table ALIAS from the outer query and executed no problem with it.
What if we didn’t filter the subquery results with the WHERE clause?
Let’s see:

1
2
3
4
5
6
fitness=> SELECT hs.day_walked, hs.cal_burned,
fitness-> hs.miles_walked, hs.duration,
fitness-> (SELECT sb.name_brand FROM shoe_brand AS sb) AS shoe_worn
fitness-> FROM hiking_stats AS hs
fitness-> WHERE extract(month from day_walked) = 5;
ERROR:  more than one row returned by a subquery used as an expression

Let’s visit 4.2.11. Scalar Subqueries from the official documentation to learn why this error happened.
This passage directly from that section sums it up nicely:

A scalar subquery is an ordinary SELECT query in parentheses that returns exactly one row with one
column. (See Chapter 7 for information about writing queries.) The SELECT query is executed and the
single returned value is used in the surrounding value expression. It is an error to use a query that returns
more than one row or more than one column as a scalar subquery.

By not filtering with the WHERE clause in the subquery, the SELECT statement is doing this:

1
2
3
4
5
6
7
fitness=> SELECT sb.name_brand FROM shoe_brand AS sb;
              name_brand              
---------------------------------------
 New Balance 510v2
 New Balance Trail Runners-All Terrain
 Keen Koven WP(keen-dry)
(3 rows)

And we are trying to stuff all three of those rows into one row for column shoe_brand.
Not happening!
Therefore, by restricting the shoe_id within the subquery to that of the matching shoe_id column value from the hiking_stats table row in the outer query, we prohibit returning all the rows.
Which is what threw the error.

Scalar subqueries can be as complex as we need, to retrieve the desired column name or expression.
We’ll visit a subquery that utilizes an INNER JOIN and a WHERE clause on a ‘linking table’, while the main ‘outer’ query is filtering with a WHERE clause itself.
Confused?
Not to worry!!!
Let’s view the additional tables for this next example.

1
2
3
4
5
6
fitness=> \d trail_route;
Table "public.trail_route"
Column          |Type   |
trail_id        |integer|
name            |text   |
surface_material|text   |

And here is the hiking_trail ‘linking table’:

1
2
3
4
5
fitness=> \d hiking_trail;
Table "public.hiking_trail"
Column  |Type    |
th_id   |integer |
tr_id   |integer |

Column trail_id from table trail_route forms a relationship (links) to table hiking_trail‘s tr_id column.
Next, column th_id references column hike_id in table hiking_stats.
Let’s see the sample query using this relationship:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
fitness=> SELECT hs.day_walked,
fitness-> (SELECT tr.name FROM trail_route AS tr INNER JOIN hiking_trail AS ht ON tr.trail_id = ht.tr_id WHERE ht.th_id = hs.hike_id) AS trail_hiked
fitness-> FROM hiking_stats AS hs
fitness-> WHERE hs.trekking_poles IS TRUE;
 day_walked |      trail_hiked      
------------+------------------------
 2018-04-24 | Tree Trail-extended
 2018-04-25 | Tree Trail-extended
 2018-04-26 | Tree Trail-extended
 2018-04-27 | Tree Trail-extended
 2018-04-29 | Tree Trail-extended
 2018-05-02 | Tree Trail-extended
 2018-05-03 | Sandy Trail-Drive
 2018-05-04 | House-Power Line Route
 2018-05-06 | Sandy Trail-Drive
 2018-05-08 | House-Power Line Route
 2018-05-09 | House-Power Line Route
 2018-05-07 | Sandy Trail-Drive
 2018-05-13 | Sandy Trail-Drive
 2018-05-14 | House-Power Line Route
 2018-05-15 | House-Power Line Route
 2018-05-16 | House-Power Line Route
(16 rows)

Again, these results can also be obtained with a multiple-table INNER JOIN like so:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
fitness=> SELECT hs.day_walked, tr.name
fitness-> FROM hiking_stats AS hs
fitness-> INNER JOIN hiking_trail AS ht
fitness-> ON hs.hike_id = ht.th_id
fitness-> INNER JOIN trail_route AS tr
fitness-> ON ht.tr_id = tr.trail_id
fitness-> WHERE hs.trekking_poles IS TRUE;
 day_walked |          name          
------------+------------------------
 2018-04-24 | Tree Trail-extended
 2018-04-25 | Tree Trail-extended
 2018-04-26 | Tree Trail-extended
 2018-04-27 | Tree Trail-extended
 2018-04-29 | Tree Trail-extended
 2018-05-02 | Tree Trail-extended
 2018-05-03 | Sandy Trail-Drive
 2018-05-04 | House-Power Line Route
 2018-05-06 | Sandy Trail-Drive
 2018-05-07 | Sandy Trail-Drive
 2018-05-08 | House-Power Line Route
 2018-05-09 | House-Power Line Route
 2018-05-13 | Sandy Trail-Drive
 2018-05-14 | House-Power Line Route
 2018-05-15 | House-Power Line Route
 2018-05-16 | House-Power Line Route
(16 rows)

Same results, just in a different query to obtain them.
One of the true beauties of SQL.

For this next example, I will get a count of how many times I wore each of the shoes I track in the shoe_brand table.
First, let’s see how an INNER JOIN will get us there:

1
2
3
4
5
6
7
8
9
10
11
fitness=> SELECT sb.name_brand, COUNT(*)
fitness-> FROM shoe_brand AS sb
fitness-> INNER JOIN hiking_stats AS hs
fitness-> ON sb.shoe_id = hs.shoe_id
fitness-> GROUP BY sb.name_brand;
              name_brand               | count
---------------------------------------+-------
 New Balance Trail Runners-All Terrain |    17
 New Balance 510v2                     |     1
 Keen Koven WP(keen-dry)               |     6
(3 rows)

It’s evident I have a favorite shoe!
Next, I’ll use a subquery to get the results:

1
2
3
4
5
6
7
8
9
fitness=> SELECT sb.name_brand,
fitness-> (SELECT count(*) FROM hiking_stats AS hs WHERE sb.shoe_id = hs.shoe_id) AS num_times_worn
fitness-> FROM shoe_brand AS sb;
              name_brand               | num_times_worn
---------------------------------------+----------------
 New Balance 510v2                     |              1
 New Balance Trail Runners-All Terrain |             17
 Keen Koven WP(keen-dry)               |              6
(3 rows)

Like previous examples, we restrict rows for the column (the subquery) with a match on the shoe_id column in the WHERE clause.
And if we remove the WHERE clause from the subquery?
Will it run?
Or error out on us?
Let’s see.

1
2
3
4
5
6
7
8
9
fitness=> SELECT sb.name_brand,
fitness-> (SELECT COUNT(*) FROM hiking_stats) AS num_times_worn
fitness-> FROM shoe_brand AS sb;
              name_brand               | num_times_worn
---------------------------------------+----------------
 New Balance 510v2                     |             24
 New Balance Trail Runners-All Terrain |             24
 Keen Koven WP(keen-dry)               |             24
(3 rows)

No errors.
However, the results are not correct.
Since we did not filter with a WHERE clause, the subquery is basically doing this:

1
2
3
4
5
fitness=> SELECT COUNT(*) FROM hiking_stats AS hs;
 count
-------
    24
(1 row)

Which meets the definition of a scalar subquery, returning no more than one column and one row.
Nevertheless, incorrect results because COUNT(*) returns all input rows.
As we can see from the above example, it is often necessary for a subquery to match up with a value from the outer query, to return correct results

My hope is through this blog post and examples, you now know how to generate column expressions from subqueries.
I’d love to know of those interesting times you have found them beneficial or a better fit than perhaps a JOIN.
Feel free to leave any comments below.
Be sure and visit the stellar on-line PostgreSQL 10 Documentation for any related questions.


A Call To Action!

Thank you for taking the time to read this post. I truly hope you discovered something interesting and enlightening. Please share your findings here, with someone else you know who would get the same value out of it as well.

Visit the Portfolio-Projects page to see blog post/technical writing I have completed for clients.

Have I mentioned how much I love a cup of coffee?!?!

To receive email notifications (Never spam) from this blog (“Digital Owl’s Prose”) for the latest blog posts as they are published, please subscribe (of your own volition) by clicking the ‘Click To Subscribe!’ button in the sidebar on the homepage! (Feel free at any time to review the Digital Owl’s Prose Privacy Policy Page for any questions you may have about: email updates, opt-in, opt-out, contact forms, etc…)
Be sure and visit the “Best Of” page for a collection of my best blog posts.


Josh Otwell is an open-source SQL database enthusiast with a passion to study and grow as a SQL Developer and blogger. Other favorite activities find him with his nose buried in a good book, article, or the Linux command line. Among those, he shares a love of tabletop RPG games, reading fantasy novels, and spending time with his wife and two daughters.


Disclaimer: The examples presented in this post are hypothetical ideas of how to achieve similar types of results. They are not the utmost best solution(s). Your particular goals and needs may vary. Use those practices that best benefit your needs and goals. Opinions are my own.