Conditional logic with SUM() and the IF() functions in MySQL…

Recently at work, I began hacking on a semi-personal/on-the-clock/side project, prototyping an idea I had for some “database stuff” on unfamiliar grounds.

MS Access or using it period is a foreign landscape to me as I tend to lean more towards open-source alternatives, namely MySQL (and PostgreSQL in the past). However, it is what it is so I dug in a put my SQL thinking cap on…

Note: All data, names or naming found within the database presented in this post, are strictly used for practice, learning, instruction, and testing purposes. It by no means depicts actual data belonging to or being used by any party or organization.

OS and DB used:


  • Xubuntu Linux 16.04.5 LTS (Xenial Xerus)
  • MySQL 8.0.15
Photo by Antoine Dautry on Unsplash

Self-Promotion:

If you enjoy the content written here, by all means, share this blog and your favorite post(s) with others who may benefit from or like it as well. Since coffee is my favorite drink, you can even buy me one if you would like!


Due to the nature of the particular problem I wanted to solve, I knew I needed some form of conditional logic involving Aggregate functions, namely the SUM() function. What I ran into was the strange MS Access IIF() function.

To my knowledge, there is no CASE expression in MS Access (although there is a sort of switch construct I’ve yet to explore).

To get a better handle on the required logic I needed, during off hours, I used my favorite SQL dialect and began to wrap my head around the best approach.

Having previously written, Understanding MySQL’s IF() function – with examples, I knew enough about it to start from there since I would eventually have to reproduce (or as close as I could) the same functionality in the MS Access world.

Below are some discoveries along the way to getting something working in MySQL and then being able to translate what I learned there, over to the MS Access ecosystem.

Here is the setup of tables. I have a ‘worker’ table with ‘name’, ‘id’ and ‘pay_rate’ columns, used in conjunction with a ‘worker_hours’ table for logging hours worked with the accompanying id for the worker. I mainly use this setup for an understanding of calculating rates and hours. To be crystal clear, I work with and handle Pipeline Survey data. Not HR-type data (E.g., employee pay, hours, rates, etc…). However, I wanted to get a handle on this type of functionality in case I became (my dream and hopes) the ‘Database Person’ asked to implement something along these lines. (To Note: This probably is not the best-normalized structure but serves for the purpose of this post and learning – for me at least.)

Below let’s look at the structure and data:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
mysql> DESC workers;
+------------+--------------+------+-----+---------+-------+
| Field      | Type         | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| id         | int(11)      | YES  |     | NULL    |       |
| first_name | varchar(25)  | YES  |     | NULL    |       |
| pay_rate   | decimal(4,2) | YES  |     | NULL    |       |
+------------+--------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

mysql> SELECT * FROM workers;
+------+------------+----------+
| id   | first_name | pay_rate |
+------+------------+----------+
|    2 | Mary       |    12.00 |
|    3 | Bob        |     8.00 |
|    1 | Jim        |    10.00 |
+------+------------+----------+
3 rows in set (0.00 sec)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
mysql> DESC worker_hours;
+--------------+--------------+------+-----+---------+-------+
| Field        | Type         | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+-------+
| worker_id    | int(11)      | YES  |     | NULL    |       |
| hours_worked | decimal(4,2) | YES  |     | NULL    |       |
+--------------+--------------+------+-----+---------+-------+
2 rows in set (0.00 sec)

mysql> SELECT * FROM worker_hours;
+-----------+--------------+
| worker_id | hours_worked |
+-----------+--------------+
|         1 |        10.00 |
|         1 |        10.00 |
|         1 |         9.50 |
|         2 |        12.00 |
|         2 |        12.00 |
|         2 |        14.00 |
|         2 |        10.00 |
|         3 |        12.00 |
|         3 |        12.00 |
|         3 |        12.00 |
|         3 |         4.00 |
+-----------+--------------+
11 rows in set (0.00 sec)

I total the hours per worker with the SUM() aggregate function, so pretty straight-forward there:

1
2
3
4
5
6
7
8
9
10
11
mysql> SELECT worker_id, SUM(hours_worked)
    -> FROM worker_hours
    -> GROUP BY worker_id;
+-----------+-------------------+
| worker_id | SUM(hours_worked) |
+-----------+-------------------+
|         1 |             29.50 |
|         2 |             48.00 |
|         3 |             40.00 |
+-----------+-------------------+
3 rows in set (0.00 sec)

Then, use an INNER JOIN on the ‘workers’ table and include the ‘first_name’ of each worker. INNER JOIN‘s are a staple and often, a necessity when querying multiple tables.

Not familiar with them? Then you should have a look at this specific blog post I wrote for information on them.

Basically, I am after a result set like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
mysql> SELECT w.first_name, SUM(wh.hours_worked)
    -> FROM workers AS w
    -> INNER JOIN worker_hours AS wh
    -> ON(w.id = wh.worker_id)
    -> GROUP BY w.first_name;
+------------+----------------------+
| first_name | SUM(wh.hours_worked) |
+------------+----------------------+
| Jim        |                29.50 |
| Mary       |                48.00 |
| Bob        |                40.00 |
+------------+----------------------+
3 rows in set (0.03 sec)

That’s wonderful and all, but, the workers must be paid accordingly to their total hours, which is is where the conditional logic comes in. If the total (SUM()) hours worked is more than 40, the ‘pay_rate’ is calculated at time and a half (at least in the US to my knowledge). Pretty simple right?

Let’s try this:

1
2
3
4
5
6
7
8
9
10
SELECT
w.first_name,
SUM(wh.hours_worked) AS total_hours,
IF(SUM(wh.hours_worked) >= 40, w.pay_rate * 40, w.pay_rate * wh.hours_worked) AS regular_pay,
IF((SUM(wh.hours_worked) - 40) > 0, w.pay_rate * 1.5 * (wh.hours_worked - 40), 0) AS overtime_pay
FROM workers AS w
INNER JOIN worker_hours AS wh
ON(w.id = wh.worker_id)
GROUP BY w.first_name;
ERROR 1055 (42000): Expression #3 of SELECT list is not in GROUP BY clause and contains non-aggregated column 'practice.w.pay_rate' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

Oh duh. Silly me. I need to include those IF() columns in the GROUP BY clause:

1
2
3
4
5
6
7
8
9
10
11
12
SELECT
w.first_name,
SUM(wh.hours_worked) AS total_hours,
IF(SUM(wh.hours_worked) >= 40, w.pay_rate * 40, w.pay_rate * wh.hours_worked) AS regular_pay,
IF((SUM(wh.hours_worked) - 40) > 0, w.pay_rate * 1.5 * (wh.hours_worked - 40), 0) AS overtime_pay
FROM workers AS w
INNER JOIN worker_hours AS wh
ON(w.id = wh.worker_id)
GROUP BY w.first_name,
IF(SUM(wh.hours_worked) >= 40, w.pay_rate * 40, w.pay_rate * wh.hours_worked),
IF((SUM(wh.hours_worked) - 40) > 0, w.pay_rate * 1.5 * (wh.hours_worked - 40), 0);
ERROR 1111 (HY000): Invalid use of group function

I am a blithering idiot! Of course, I need to place the IF() inside the SUM() function:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
SELECT
w.first_name,
w.pay_rate,
SUM(wh.hours_worked) AS total_hours,
SUM(IF(wh.hours_worked >= 40, ROUND(w.pay_rate * 40, 2), ROUND(w.pay_rate * wh.hours_worked, 2))) AS regular_pay,
SUM(IF((wh.hours_worked - 40) > 0, w.pay_rate * 1.5 * (wh.hours_worked - 40), 0)) AS overtime_pay
FROM workers AS w
INNER JOIN worker_hours AS wh
ON(w.id = wh.worker_id)
GROUP BY w.first_name, w.pay_rate;
+------------+----------+-------------+-------------+--------------+
| first_name | pay_rate | total_hours | regular_pay | overtime_pay |
+------------+----------+-------------+-------------+--------------+
| Jim        |    10.00 |       29.50 |      295.00 |      0.00000 |
| Mary       |    12.00 |       48.00 |      576.00 |      0.00000 |
| Bob        |     8.00 |       40.00 |      320.00 |      0.00000 |
+------------+----------+-------------+-------------+--------------+
3 rows in set (0.00 sec)

Well now, this is a start. However, still not correct. The ‘regular_pay’ column calculation seems to work fine for hours less than or equal to 40 but not for those greater than 40.

Even slightly changing the conditional logic for the overtime hours still did not produce correct calculations:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
SELECT
w.first_name,
w.pay_rate,
SUM(wh.hours_worked) AS total_hours,
SUM(IF(wh.hours_worked >= 40, ROUND(w.pay_rate * 40, 2), ROUND(w.pay_rate * wh.hours_worked, 2))) AS regular_pay,
SUM(IF(wh.hours_worked > 40, w.pay_rate * 1.5 * (wh.hours_worked - 40), 0)) AS overtime_pay
FROM workers AS w
INNER JOIN worker_hours AS wh
ON(w.id = wh.worker_id)
GROUP BY w.first_name, w.pay_rate;
+------------+----------+-------------+-------------+--------------+
| first_name | pay_rate | total_hours | regular_pay | overtime_pay |
+------------+----------+-------------+-------------+--------------+
| Jim        |    10.00 |       29.50 |      295.00 |      0.00000 |
| Mary       |    12.00 |       48.00 |      576.00 |      0.00000 |
| Bob        |     8.00 |       40.00 |      320.00 |      0.00000 |
+------------+----------+-------------+-------------+--------------+
3 rows in set (0.01 sec)

While tackling this problem, something just kept bugging me about how I was wrapping the IF() function with SUM(). It did not seem to be calculating results like it should.

Let’s look at a meaningful, base query real quick and get our mental juices flowing:

1
2
3
4
5
6
7
8
9
10
11
mysql> SELECT worker_id, SUM(hours_worked) AS total_hours
    -> FROM worker_hours
    -> GROUP BY worker_id;
+-----------+-------------+
| worker_id | total_hours |
+-----------+-------------+
|         1 |       29.50 |
|         2 |       48.00 |
|         3 |       40.00 |
+-----------+-------------+
3 rows in set (0.00 sec)

How could I use this results set to my advantage? Let’s explore a couple of different options I tried with great success.

Up first, using a Derived Table. With this option, the above query will essentially become the table named in the INNER JOIN:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
SELECT
w.first_name,
w.pay_rate,
tab_1.total_hours AS total_hours,
IF(tab_1.total_hours >= 40, ROUND(w.pay_rate * 40, 2), ROUND(w.pay_rate * tab_1.total_hours, 2)) AS regular_pay,
IF(tab_1.total_hours > 40, ROUND(w.pay_rate * 1.5 * (tab_1.total_hours - 40), 2), 0) AS overtime_pay
FROM workers AS w
INNER JOIN (SELECT worker_id,
SUM(hours_worked) AS total_hours
FROM worker_hours
GROUP BY worker_id) AS tab_1
ON w.id = tab_1.worker_id
GROUP BY w.first_name, w.pay_rate, tab_1.total_hours;
+------------+----------+-------------+-------------+--------------+
| first_name | pay_rate | total_hours | regular_pay | overtime_pay |
+------------+----------+-------------+-------------+--------------+
| Mary       |    12.00 |       48.00 |      480.00 |       144.00 |
| Bob        |     8.00 |       40.00 |      320.00 |         0.00 |
| Jim        |    10.00 |       29.50 |      295.00 |         0.00 |
+------------+----------+-------------+-------------+--------------+
3 rows in set (0.00 sec)

Finally! Correct results!

Next, I will use something a bit more interesting. A CTE. Mentioning CTE’s, have a look at CTE’s in MySQL – First steps…, where I covered a similar type of functionality as to that we will see below.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
WITH tab_1 AS (
SELECT worker_id,
SUM(hours_worked) AS total_hours
FROM worker_hours
GROUP BY worker_id
)

SELECT
w.first_name,
w.pay_rate,
tab_1.total_hours AS total_hours,
IF(tab_1.total_hours >= 40, ROUND(w.pay_rate * 40, 2), ROUND(w.pay_rate * tab_1.total_hours, 2)) AS regular_pay,
IF(tab_1.total_hours > 40, ROUND(w.pay_rate * 1.5 * (tab_1.total_hours - 40), 2), 0) AS overtime_pay
FROM workers AS w
INNER JOIN tab_1
ON w.id = tab_1.worker_id
GROUP BY w.first_name, w.pay_rate, tab_1.total_hours;
+------------+----------+-------------+-------------+--------------+
| first_name | pay_rate | total_hours | regular_pay | overtime_pay |
+------------+----------+-------------+-------------+--------------+
| Mary       |    12.00 |       48.00 |      480.00 |       144.00 |
| Bob        |     8.00 |       40.00 |      320.00 |         0.00 |
| Jim        |    10.00 |       29.50 |      295.00 |         0.00 |
+------------+----------+-------------+-------------+--------------+
3 rows in set (0.01 sec)

To be crystal clear, to my knowledge, CTE’s are not available in MS Access. So I am not saying (or trying) to implement them there. I only included the CTE here as a sort of alternative to the Derived Table.

I suppose the most important question is why did this work? In my opinion, by providing a table column that already contained the aggregated total, summed values – via the ‘tab_1.total_hours’ column alias – enabled the IF() function to evaluate the conditional logic with correct computations. However, I am not 100% sure about this. But, it seems like intermingling the calls to IF() and SUM() together, were ‘wrecking’ the ‘overtime_pay’ calculations. Any feedback (and always) corrections in the comments are a treasure for me so I would love to see them from any readers.

Like what you have read? See anything incorrect? Please comment below and thanks for reading!!!

Explore the official MySQL 8.0 Online Manual for more information.

A Call To Action!

Thank you for taking the time to read this post. I truly hope you discovered something interesting and enlightening. Please share your findings here, with someone else you know who would get the same value out of it as well.

Visit the Portfolio-Projects page to see blog post/technical writing I have completed for clients.

Have I mentioned how much I love a cup of coffee?!?!

To receive email notifications (Never Spam) from this blog (“Digital Owl’s Prose”) for the latest blog posts as they are published, please subscribe (of your own volition) by clicking the ‘Click To Subscribe!’ button in the sidebar on the homepage! (Feel free at any time to review the Digital Owl’s Prose Privacy Policy Page for any questions you may have about: email updates, opt-in, opt-out, contact forms, etc…)

Be sure and visit the “Best Of” page for a collection of my best blog posts.


Josh Otwell has a passion to study and grow as a SQL Developer and blogger. Other favorite activities find him with his nose buried in a good book, article, or the Linux command line. Among those, he shares a love of tabletop RPG games, reading fantasy novels, and spending time with his wife and two daughters.

Disclaimer: The examples presented in this post are hypothetical ideas of how to achieve similar types of results. They are not the utmost best solution(s). The majority, if not all, of the examples provided, is performed on a personal development/learning workstation-environment and should not be considered production quality or ready. Your particular goals and needs may vary. Use those practices that best benefit your needs and goals. Opinions are my own.

Advertisements

Code snippet syntax highlighting in Gutenberg via shortcodes – Continued learning…

With Gutenberg coming full-on to WordPress core soon, I’ve committed to try to learn all I can about it. I may as well face the fact that I am going to have to use it. Honestly, so far, I have not found it to be all that bad.

Learning Gutenberg also? Me too! Keep reading…

Getting a handle on Gutenberg… slowly.

With blogging, I am always on the search for ways to make the whole process easier and more proficient. One thing I discovered long ago, the WordPress default editor leaves a lot to be desired. In my humble opinion at least. I believe it is named the TinyMCE editor but don’t sell the farm on that. That’s how much time I spend (or try not to spend) using it.

Continue reading >>>