PostgreSQL TRIGGERS and \copy working in tandem.

After coming off of an injury-ridden June, July turned out to be a more promising (except the sweltering Louisiana heat) month with me getting in plenty of solid hikes as I try to stay active and maintain better weight. I also began tracking my hiking stats a little differently. This blog post will cover how I bulk-uploaded them into my database at months end…


multiple-stacks-of-cut-and-stacked-firewood

Photo by Lastly Creative on Unsplash



OS and DB used:

  • Xubuntu Linux 16.04.5 LTS (Xenial Xerus)
  • PostgreSQL 10.5


The first part of the month, I continued to individually INSERT each days hiking stats in my fitness DB. Then, I decided to store the remaining half of the months, in a CSV file.
Here are the contents of that file (viewed in text mode):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
day_walked, cal_burned, miles_walked, duration, mph, additonal_weight, weight_amount, trekking_poles, shoe_id, trail_id
2018-07-12, 325.9, 3.28, 01:01:20, 3.2, true, 3.0, true, 4, 7
2018-07-13, 416.2, 4.44, 01:30:12, 3.0, false, 0.0, false, 4, 4
2018-07-15, 382.9, 3.91, 01:12:03, 3.3, true, 1.5, true, 4, 5
2018-07-16, 368.6, 3.72, 01:09:22, 3.2, true, 1.5, true, 4, 7
2018-07-17, 339.4, 3.46, 01:03:52, 3.3, true, 1.5, true, 4, 7
2018-07-18, 368.1, 3.72, 01:08:28, 3.3, true, 1.5, true, 4, 7
2018-07-19, 339.2, 3.44, 01:03:06, 3.3, true, 1.5, true, 4, 7
2018-07-21, 93.1, 0.87, 00:17:18, 3.0, false, 0.0, false, 4, 4
2018-07-22, 378.3, 3.76, 01:10:22, 3.2, true, 1.5, true, 4, 7
2018-07-23, 322.9, 3.28, 01:00:03, 3.3, true, 1.5, true, 4, 7
2018-07-24, 386.4, 3.81, 01:11:53, 3.2, true, 1.5, true, 4, 7
2018-07-25, 379.9, 3.83, 01:10:39, 3.3, true, 1.5, true, 4, 7
2018-07-27, 378.3, 3.73, 01:10:21, 3.2, true, 1.5, true, 4, 7
2018-07-28, 337.4, 3.39, 01:02:45, 3.2, true, 1.5, true, 4, 6
2018-07-29, 348.7, 3.50, 01:04:52, 3.2, true, 1.5, true, 4, 7
2018-07-30, 361.6, 3.69, 01:07:15, 3.3, true, 1.5, true, 4, 7
2018-07-31, 359.9, 3.66, 01:06:57, 3.3, true, 1.5, true, 4, 7

(Note: All dates prior to 2018-07-12 have been uploaded already so that’s why they are not present in the document. I told y’all I got in a good month of hiking haha.)
Reason I am switching this up is, I can use the psql \copy meta-command to bulk upload that file’s contents to a staging table at one time. (Take a look at Two handy examples of the psql \copy meta-command. I wrote for an overview of this command. It’s really cool if I don’t say so myself!!!)

My plan was, during the bulk (sorry for the lackadaisical usage of that term – this is definitely not Big Data) upload, a TRIGGER will fire, distributing the data to all the tables that make up the normalized ‘system’ of tables.
Here is a recounting of events in not so real time.

Up first, CREATE the ‘staging’ table:

1
2
3
4
5
6
7
fitness=> CREATE TABLE staging_stats(s_day_walked DATE,
fitness(> s_cal_burned NUMERIC(4,1), s_miles_walked NUMERIC(4,2),
fitness(> s_duration time without time zone,
fitness(> s_mph NUMERIC(2,1), s_additional_weight BOOLEAN,
fitness(> s_weight_amount NUMERIC(4,2), s_trekking_poles BOOLEAN,
fitness(> s_shoe_id INTEGER, s_trail_id INTEGER);
CREATE TABLE

This structure differs slightly from the original hiking_stats table, all due to the weight_amount value I am tracking from the hydration bladder pack I wear.

For safety’s sake and to ensure all goes according to plan in the upload, I’ll use a TRANSACTION in a .sql source file with the necessary PostgreSQL commands to CREATE the FUNCTION, TRIGGER, and call \copy. Also, I’ll leverage SAVEPOINT‘s should we run into any issues. That way, I can ROLLBACK to revert any undesirable outcome(s) if needed.

Below is the (somewhat busy) source file:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
BEGIN;

CREATE OR REPLACE FUNCTION staging_stats_ait()
RETURNS TRIGGER AS
$$
DECLARE
    v_hike_id INTEGER;
BEGIN
INSERT INTO hiking_stats(day_walked, cal_burned, miles_walked, duration, mph, additional_weight, trekking_poles, shoe_id)
VALUES(NEW.s_day_walked, NEW.s_cal_burned, NEW.s_miles_walked, NEW.s_duration, NEW.s_mph, NEW.s_additional_weight, NEW.s_trekking_poles, NEW.s_shoe_id);
SELECT hike_id INTO STRICT v_hike_id FROM hiking_stats WHERE day_walked = NEW.s_day_walked;
INSERT INTO hiking_trail(th_id, tr_id)
VALUES(v_hike_id, NEW.s_trail_id);
BEGIN
    IF NEW.s_additional_weight IS TRUE THEN
        INSERT INTO weight(wh_id, weight_added)
        VALUES (v_hike_id, NEW.s_weight_amount);
    END IF;
    END;
    RETURN NEW;
END;
$$ LANGUAGE PLpgSQL;

CREATE TRIGGER staging_stats_ait
AFTER INSERT ON staging_stats
FOR EACH ROW
EXECUTE PROCEDURE staging_stats_ait();

SAVEPOINT first_save;

\copy staging_stats FROM '~/Practice_Data/Fitness_DB_Data/stats.csv' WITH DELIMITER ',' CSV HEADER;

SAVEPOINT second_save;


Credit where credit is due: I learned of this FUNCTION and TRIGGER naming convention from: PostgreSQL Triggers and Stored Function Basics. Do yourself a favor and read this blog post. It is excellent!


I’ll use \i to call the source file within psql:

1
2
3
4
5
6
7
fitness=> \i /path_to_folder/trig_copy.sql
BEGIN
CREATE FUNCTION
CREATE TRIGGER
SAVEPOINT
COPY 17
SAVEPOINT

Let’s verify the uploaded data, querying with this JOIN:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
fitness=> SELECT hs.hike_id, hs.day_walked, ht.th_id,
fitness-> ht.tr_id, tr.trail_id, tr.name,
fitness-> hs.shoe_id, sb.name_brand
fitness-> FROM hiking_stats AS hs
fitness-> INNER JOIN hiking_trail AS ht
fitness-> ON hs.hike_id = ht.th_id
fitness-> INNER JOIN trail_route AS tr
fitness-> ON ht.tr_id = tr.trail_id
fitness-> INNER JOIN shoe_brand AS sb
fitness-> ON hs.shoe_id = sb.shoe_id
fitness-> WHERE extract(month from hs.day_walked) = 7
fitness-> and extract(day from hs.day_walked) >= 12;
 hike_id | day_walked | th_id | tr_id | trail_id |          name          | shoe_id |    name_brand    
---------+------------+-------+-------+----------+------------------------+---------+-------------------
      91 | 2018-07-12 |    91 |     7 |        7 | West Boundary          |       4 | Oboz Sawtooth Low
      92 | 2018-07-13 |    92 |     4 |        4 | Yard Mowing            |       4 | Oboz Sawtooth Low
      93 | 2018-07-15 |    93 |     5 |        5 | House-Power Line Route |       4 | Oboz Sawtooth Low
      94 | 2018-07-16 |    94 |     7 |        7 | West Boundary          |       4 | Oboz Sawtooth Low
      95 | 2018-07-17 |    95 |     7 |        7 | West Boundary          |       4 | Oboz Sawtooth Low
      96 | 2018-07-18 |    96 |     7 |        7 | West Boundary          |       4 | Oboz Sawtooth Low
      97 | 2018-07-19 |    97 |     7 |        7 | West Boundary          |       4 | Oboz Sawtooth Low
      98 | 2018-07-21 |    98 |     4 |        4 | Yard Mowing            |       4 | Oboz Sawtooth Low
      99 | 2018-07-22 |    99 |     7 |        7 | West Boundary          |       4 | Oboz Sawtooth Low
     100 | 2018-07-23 |   100 |     7 |        7 | West Boundary          |       4 | Oboz Sawtooth Low
     101 | 2018-07-24 |   101 |     7 |        7 | West Boundary          |       4 | Oboz Sawtooth Low
     102 | 2018-07-25 |   102 |     7 |        7 | West Boundary          |       4 | Oboz Sawtooth Low
     103 | 2018-07-27 |   103 |     7 |        7 | West Boundary          |       4 | Oboz Sawtooth Low
     104 | 2018-07-28 |   104 |     6 |        6 | Sandy Trail-Drive      |       4 | Oboz Sawtooth Low
     105 | 2018-07-29 |   105 |     7 |        7 | West Boundary          |       4 | Oboz Sawtooth Low
     106 | 2018-07-30 |   106 |     7 |        7 | West Boundary          |       4 | Oboz Sawtooth Low
     107 | 2018-07-31 |   107 |     7 |        7 | West Boundary          |       4 | Oboz Sawtooth Low
(17 rows)

Note: I filtered this query for dates greater than the 12th since any days prior were already in the database.

I’ll also view the weight amount carried and confirm it is correct with this query:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
fitness=> SELECT hs.hike_id, hs.day_walked, w.wh_id, w.weight_added
fitness-> FROM hiking_stats AS hs
fitness-> INNER JOIN weight AS w
fitness-> ON hs.hike_id = w.wh_id
fitness-> WHERE hs.additional_weight IS TRUE
fitness-> AND extract(month from hs.day_walked) = 7
fitness-> AND extract(day from hs.day_walked) >= 12;
 hike_id | day_walked | wh_id | weight_added
---------+------------+-------+--------------
      91 | 2018-07-12 |    91 |         3.00
      93 | 2018-07-15 |    93 |         1.50
      94 | 2018-07-16 |    94 |         1.50
      95 | 2018-07-17 |    95 |         1.50
      96 | 2018-07-18 |    96 |         1.50
      97 | 2018-07-19 |    97 |         1.50
      99 | 2018-07-22 |    99 |         1.50
     100 | 2018-07-23 |   100 |         1.50
     101 | 2018-07-24 |   101 |         1.50
     102 | 2018-07-25 |   102 |         1.50
     103 | 2018-07-27 |   103 |         1.50
     104 | 2018-07-28 |   104 |         1.50
     105 | 2018-07-29 |   105 |         1.50
     106 | 2018-07-30 |   106 |         1.50
     107 | 2018-07-31 |   107 |         1.50
(15 rows)

This query targets the staging_stats table, confirming the above rows with dates and weight amounts:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
fitness=> SELECT s_day_walked, s_weight_amount
fitness-> FROM staging_stats
fitness-> WHERE s_additional_weight IS TRUE;
 s_day_walked | s_weight_amount
--------------+-----------------
 2018-07-12   |            3.00
 2018-07-15   |            1.50
 2018-07-16   |            1.50
 2018-07-17   |            1.50
 2018-07-18   |            1.50
 2018-07-19   |            1.50
 2018-07-22   |            1.50
 2018-07-23   |            1.50
 2018-07-24   |            1.50
 2018-07-25   |            1.50
 2018-07-27   |            1.50
 2018-07-28   |            1.50
 2018-07-29   |            1.50
 2018-07-30   |            1.50
 2018-07-31   |            1.50
(15 rows)

So far so good, but since all of this was encapsulated in a TRANSACTION, nothing has officially been committed just yet and we are currently at SAVEPOINT second_save. I’ll make this entire operation permanent by issuing COMMIT:

1
2
fitness=> COMMIT;
COMMIT

I’ll do my best to describe what this TRIGGER is doing in layman’s terms:

  • An INSERT on table staging_stats causes the TRIGGER to fire since it is defined of an AFTER INSERT ‘variety’.
  • A TRIGGER provides many special variables. One of them I use extensively to make this work, is the NEW variable. Here is a description of it from the documentation:

    “Data type RECORD; variable holding the new database row for INSERT/UPDATE operations in row-level triggers. This variable is unassigned in statement-level triggers and for DELETE operations.”

    This variable is key to move over the new data to the normalized database tables for final storage. Taken care of by the INSERT‘s on table hiking_stats.

  • The line:
    1
        SELECT hike_id INTO STRICT v_hike_id FROM hiking_stats WHERE day_walked = NEW.s_day_walked;

    provides us the the correct hike_id (from table hiking_stats) and compares it with the day_walked date value to that same date value present in the NEW.s_day_walked column from table staging_stats. Storing this value in variable v_hike_id, we have it available to use for INSERT in other tables where needed.

  • Speaking of, we use it next by inserting it and the available NEW.s_trail_id column value from table staging_stats.
  • The IF conditional block that immediately follows acts only if the NEW.s_additional_weight amount is TRUE. Meaning I wore the hydration pack on that day’s hike. Therefore, that weight amount stored in variable NEW.s_weight_amount is stored in table weight with the accompanying hike_id. While I feel there may be a better way to handle this particular value, I still need to find it. Since this value will change in the future as I do plan to add weight as my strength and endurance develop.
  • To finish off this overview, all of these events happen for each inserted row in table staging_stats as defined in the CREATE TRIGGER function.

This blog post is my take on one way to accomplish this type of bulk upload for a normalized database with many tables. While I am sure there are more proficient alternatives, this one works pretty well in my personal development/learning environment. However, it’s likely not without flaws. See anything that can be improved? Feel free to dissect it and provide any tips you feel are better. As always, many thanks for reading!!

Be sure and visit the stellar on-line PostgreSQL 10 Documentation for any related questions.


A Call To Action!

Thank you for taking the time to read this post. I truly hope you discovered something interesting and enlightening. Please share your findings here, with someone else you know who would get the same value out of it as well.

Visit the Portfolio-Projects page to see blog post/technical writing I have completed for clients.

Have I mentioned how much I love a cup of coffee?!?!

To receive email notifications (Never spam) from this blog (“Digital Owl’s Prose”) for the latest blog posts as they are published, please subscribe (of your own volition) by clicking the ‘Click To Subscribe!’ button in the sidebar on the homepage! (Feel free at any time to review the Digital Owl’s Prose Privacy Policy Page for any questions you may have about: email updates, opt-in, opt-out, contact forms, etc…)
Be sure and visit the “Best Of” page for a collection of my best blog posts.


Josh Otwell has a passion to study and grow as a SQL Developer and blogger. Other favorite activities find him with his nose buried in a good book, article, or the Linux command line. Among those, he shares a love of tabletop RPG games, reading fantasy novels, and spending time with his wife and two daughters.


Disclaimer: The examples presented in this post are hypothetical ideas of how to achieve similar types of results. They are not the utmost best solution(s). The majority, if not all, of the examples provided are performed on a personal development/learning workstation-environment and should not be considered production quality or ready. Your particular goals and needs may vary. Use those practices that best benefit your needs and goals. Opinions are my own.

Advertisements

Hey thanks for commenting! Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.