After coming off an injury-ridden June, July turned out to be a more promising month (the sweltering Louisiana heat notwithstanding), with me getting in plenty of solid hikes as I try to stay active and maintain a better weight. I also began tracking my hiking stats a little differently. This blog post will cover how I bulk-uploaded them into my database at month's end…

Photo by Lastly Creative on Unsplash
OS and DB used:
- Xubuntu Linux 16.04.5 LTS (Xenial Xerus)
- PostgreSQL 10.5
For the first part of the month, I continued to individually INSERT each day's hiking stats into my fitness DB. Then, I decided to store the remaining half of the month's stats in a CSV file.
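For context, a single day's manual insert looked something along these lines (the values here are hypothetical, but the column list matches the hiking_stats table used throughout this post):

```sql
-- One day's stats, inserted by hand (hypothetical values).
INSERT INTO hiking_stats(day_walked, cal_burned, miles_walked,
                         duration, mph, additional_weight,
                         trekking_poles, shoe_id)
VALUES ('2018-07-02', 340.5, 3.40, '01:02:15', 3.3, true, true, 4);
```

Doing that every single day gets old fast, which is what prompted the change-up below.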
Here are the contents of that file (viewed in text mode):
```
day_walked, cal_burned, miles_walked, duration, mph, additional_weight, weight_amount, trekking_poles, shoe_id, trail_id
2018-07-12, 325.9, 3.28, 01:01:20, 3.2, true, 3.0, true, 4, 7
2018-07-13, 416.2, 4.44, 01:30:12, 3.0, false, 0.0, false, 4, 4
2018-07-15, 382.9, 3.91, 01:12:03, 3.3, true, 1.5, true, 4, 5
2018-07-16, 368.6, 3.72, 01:09:22, 3.2, true, 1.5, true, 4, 7
2018-07-17, 339.4, 3.46, 01:03:52, 3.3, true, 1.5, true, 4, 7
2018-07-18, 368.1, 3.72, 01:08:28, 3.3, true, 1.5, true, 4, 7
2018-07-19, 339.2, 3.44, 01:03:06, 3.3, true, 1.5, true, 4, 7
2018-07-21, 93.1, 0.87, 00:17:18, 3.0, false, 0.0, false, 4, 4
2018-07-22, 378.3, 3.76, 01:10:22, 3.2, true, 1.5, true, 4, 7
2018-07-23, 322.9, 3.28, 01:00:03, 3.3, true, 1.5, true, 4, 7
2018-07-24, 386.4, 3.81, 01:11:53, 3.2, true, 1.5, true, 4, 7
2018-07-25, 379.9, 3.83, 01:10:39, 3.3, true, 1.5, true, 4, 7
2018-07-27, 378.3, 3.73, 01:10:21, 3.2, true, 1.5, true, 4, 7
2018-07-28, 337.4, 3.39, 01:02:45, 3.2, true, 1.5, true, 4, 6
2018-07-29, 348.7, 3.50, 01:04:52, 3.2, true, 1.5, true, 4, 7
2018-07-30, 361.6, 3.69, 01:07:15, 3.3, true, 1.5, true, 4, 7
2018-07-31, 359.9, 3.66, 01:06:57, 3.3, true, 1.5, true, 4, 7
```
(Note: All dates prior to 2018-07-12 have been uploaded already so that’s why they are not present in the document. I told y’all I got in a good month of hiking haha.)
The reason I am switching this up is that I can use the psql \copy meta-command to bulk-upload the file's contents to a staging table in one go. (Take a look at Two handy examples of the psql \copy meta-command, which I wrote as an overview of this command. It's really cool if I do say so myself!)
My plan was that during the bulk upload (sorry for the loose usage of that term – this is definitely not Big Data), a TRIGGER would fire, distributing the data to all the tables that make up the normalized 'system' of tables.
Here is a recounting of events in not-so-real time.
Up first, CREATE the 'staging' table:
```
fitness=> CREATE TABLE staging_stats(s_day_walked DATE,
fitness(> s_cal_burned NUMERIC(4,1), s_miles_walked NUMERIC(4,2),
fitness(> s_duration time without time zone,
fitness(> s_mph NUMERIC(2,1), s_additional_weight BOOLEAN,
fitness(> s_weight_amount NUMERIC(4,2), s_trekking_poles BOOLEAN,
fitness(> s_shoe_id INTEGER, s_trail_id INTEGER);
CREATE TABLE
```
This structure differs slightly from the original hiking_stats table, all due to the weight_amount value I am tracking from the hydration bladder pack I wear.
For safety's sake and to ensure all goes according to plan during the upload, I'll use a TRANSACTION in a .sql source file containing the necessary PostgreSQL commands to CREATE the FUNCTION and TRIGGER, and to call \copy. I'll also leverage SAVEPOINTs should we run into any issues. That way, I can ROLLBACK to revert any undesirable outcome(s) if needed.
Below is the (somewhat busy) source file:
```sql
BEGIN;

CREATE OR REPLACE FUNCTION staging_stats_ait()
RETURNS TRIGGER AS $$
DECLARE
    v_hike_id INTEGER;
BEGIN
    INSERT INTO hiking_stats(day_walked, cal_burned, miles_walked,
                             duration, mph, additional_weight,
                             trekking_poles, shoe_id)
    VALUES(NEW.s_day_walked, NEW.s_cal_burned, NEW.s_miles_walked,
           NEW.s_duration, NEW.s_mph, NEW.s_additional_weight,
           NEW.s_trekking_poles, NEW.s_shoe_id);

    SELECT hike_id INTO STRICT v_hike_id
    FROM hiking_stats
    WHERE day_walked = NEW.s_day_walked;

    INSERT INTO hiking_trail(th_id, tr_id)
    VALUES(v_hike_id, NEW.s_trail_id);

    IF NEW.s_additional_weight IS TRUE THEN
        INSERT INTO weight(wh_id, weight_added)
        VALUES(v_hike_id, NEW.s_weight_amount);
    END IF;

    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER staging_stats_ait
AFTER INSERT ON staging_stats
FOR EACH ROW
EXECUTE PROCEDURE staging_stats_ait();

SAVEPOINT first_save;

\copy staging_stats FROM '~/Practice_Data/Fitness_DB_Data/stats.csv' WITH DELIMITER ',' CSV HEADER;

SAVEPOINT second_save;
```
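Had anything gone wrong during the \copy step, I could have rewound to the savepoint taken just before it instead of scrapping the whole transaction. A hypothetical recovery session (not something I actually had to run) might look like:

```sql
-- Hypothetical recovery: undo only the work done after first_save
-- (the staged rows and any trigger-driven inserts), keeping the
-- FUNCTION and TRIGGER created earlier in the transaction.
ROLLBACK TO SAVEPOINT first_save;

-- ...fix the CSV file, then retry the \copy...

-- Or, if things are beyond saving, abandon everything:
ROLLBACK;
```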
Credit where credit is due: I learned of this FUNCTION and TRIGGER naming convention from PostgreSQL Triggers and Stored Function Basics. Do yourself a favor and read this blog post. It is excellent!
I'll use \i to call the source file within psql:
```
fitness=> \i /path_to_folder/trig_copy.sql
BEGIN
CREATE FUNCTION
CREATE TRIGGER
SAVEPOINT
COPY 17
SAVEPOINT
```
Let's verify the uploaded data, querying with this JOIN:
```
fitness=> SELECT hs.hike_id, hs.day_walked, ht.th_id,
fitness-> ht.tr_id, tr.trail_id, tr.name,
fitness-> hs.shoe_id, sb.name_brand
fitness-> FROM hiking_stats AS hs
fitness-> INNER JOIN hiking_trail AS ht
fitness-> ON hs.hike_id = ht.th_id
fitness-> INNER JOIN trail_route AS tr
fitness-> ON ht.tr_id = tr.trail_id
fitness-> INNER JOIN shoe_brand AS sb
fitness-> ON hs.shoe_id = sb.shoe_id
fitness-> WHERE extract(month from hs.day_walked) = 7
fitness-> and extract(day from hs.day_walked) >= 12;
 hike_id | day_walked | th_id | tr_id | trail_id |          name          | shoe_id |    name_brand
---------+------------+-------+-------+----------+------------------------+---------+-------------------
      91 | 2018-07-12 |    91 |     7 |        7 | West Boundary          |       4 | Oboz Sawtooth Low
      92 | 2018-07-13 |    92 |     4 |        4 | Yard Mowing            |       4 | Oboz Sawtooth Low
      93 | 2018-07-15 |    93 |     5 |        5 | House-Power Line Route |       4 | Oboz Sawtooth Low
      94 | 2018-07-16 |    94 |     7 |        7 | West Boundary          |       4 | Oboz Sawtooth Low
      95 | 2018-07-17 |    95 |     7 |        7 | West Boundary          |       4 | Oboz Sawtooth Low
      96 | 2018-07-18 |    96 |     7 |        7 | West Boundary          |       4 | Oboz Sawtooth Low
      97 | 2018-07-19 |    97 |     7 |        7 | West Boundary          |       4 | Oboz Sawtooth Low
      98 | 2018-07-21 |    98 |     4 |        4 | Yard Mowing            |       4 | Oboz Sawtooth Low
      99 | 2018-07-22 |    99 |     7 |        7 | West Boundary          |       4 | Oboz Sawtooth Low
     100 | 2018-07-23 |   100 |     7 |        7 | West Boundary          |       4 | Oboz Sawtooth Low
     101 | 2018-07-24 |   101 |     7 |        7 | West Boundary          |       4 | Oboz Sawtooth Low
     102 | 2018-07-25 |   102 |     7 |        7 | West Boundary          |       4 | Oboz Sawtooth Low
     103 | 2018-07-27 |   103 |     7 |        7 | West Boundary          |       4 | Oboz Sawtooth Low
     104 | 2018-07-28 |   104 |     6 |        6 | Sandy Trail-Drive      |       4 | Oboz Sawtooth Low
     105 | 2018-07-29 |   105 |     7 |        7 | West Boundary          |       4 | Oboz Sawtooth Low
     106 | 2018-07-30 |   106 |     7 |        7 | West Boundary          |       4 | Oboz Sawtooth Low
     107 | 2018-07-31 |   107 |     7 |        7 | West Boundary          |       4 | Oboz Sawtooth Low
(17 rows)
```
Note: I filtered this query for dates on or after the 12th since any days prior were already in the database.
I’ll also view the weight amount carried and confirm it is correct with this query:
```
fitness=> SELECT hs.hike_id, hs.day_walked, w.wh_id, w.weight_added
fitness-> FROM hiking_stats AS hs
fitness-> INNER JOIN weight AS w
fitness-> ON hs.hike_id = w.wh_id
fitness-> WHERE hs.additional_weight IS TRUE
fitness-> AND extract(month from hs.day_walked) = 7
fitness-> AND extract(day from hs.day_walked) >= 12;
 hike_id | day_walked | wh_id | weight_added
---------+------------+-------+--------------
      91 | 2018-07-12 |    91 |         3.00
      93 | 2018-07-15 |    93 |         1.50
      94 | 2018-07-16 |    94 |         1.50
      95 | 2018-07-17 |    95 |         1.50
      96 | 2018-07-18 |    96 |         1.50
      97 | 2018-07-19 |    97 |         1.50
      99 | 2018-07-22 |    99 |         1.50
     100 | 2018-07-23 |   100 |         1.50
     101 | 2018-07-24 |   101 |         1.50
     102 | 2018-07-25 |   102 |         1.50
     103 | 2018-07-27 |   103 |         1.50
     104 | 2018-07-28 |   104 |         1.50
     105 | 2018-07-29 |   105 |         1.50
     106 | 2018-07-30 |   106 |         1.50
     107 | 2018-07-31 |   107 |         1.50
(15 rows)
```
This query targets the staging_stats table, confirming the above rows' dates and weight amounts:
```
fitness=> SELECT s_day_walked, s_weight_amount
fitness-> FROM staging_stats
fitness-> WHERE s_additional_weight IS TRUE;
 s_day_walked | s_weight_amount
--------------+-----------------
 2018-07-12   |            3.00
 2018-07-15   |            1.50
 2018-07-16   |            1.50
 2018-07-17   |            1.50
 2018-07-18   |            1.50
 2018-07-19   |            1.50
 2018-07-22   |            1.50
 2018-07-23   |            1.50
 2018-07-24   |            1.50
 2018-07-25   |            1.50
 2018-07-27   |            1.50
 2018-07-28   |            1.50
 2018-07-29   |            1.50
 2018-07-30   |            1.50
 2018-07-31   |            1.50
(15 rows)
```
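One more sanity check worth mentioning (a hedged extra here, not part of my original session) is comparing row counts between the staging table and the freshly loaded date range in hiking_stats:

```sql
-- If the trigger processed every staged row, both counts should match
-- the 17 rows reported by \copy.
SELECT (SELECT COUNT(*) FROM staging_stats) AS staged_rows,
       (SELECT COUNT(*) FROM hiking_stats
        WHERE day_walked >= '2018-07-12') AS loaded_rows;
```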
So far so good, but since all of this was encapsulated in a TRANSACTION, nothing has officially been committed just yet, and we are currently at SAVEPOINT second_save. I'll make this entire operation permanent by issuing COMMIT:
```
fitness=> COMMIT;
COMMIT
```
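One optional bit of housekeeping I'll mention (not shown in my session above): with the rows distributed to their permanent tables, the staging table can be emptied so it's ready for next month's upload:

```sql
-- Clear out the staged rows; the table structure (and its trigger)
-- remain in place for the next bulk upload.
TRUNCATE TABLE staging_stats;
```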
I'll do my best to describe what this TRIGGER is doing in layman's terms:
- An INSERT on table staging_stats causes the TRIGGER to fire, since it is defined as the AFTER INSERT 'variety'.
- A TRIGGER provides many special variables. The one I use extensively to make this work is the NEW variable. Here is a description of it from the documentation:
  "Data type RECORD; variable holding the new database row for INSERT/UPDATE operations in row-level triggers. This variable is unassigned in statement-level triggers and for DELETE operations."
  This variable is key to moving the new data over to the normalized database tables for final storage, which is taken care of by the INSERT on table hiking_stats.
- The line:
  ```
  SELECT hike_id INTO STRICT v_hike_id FROM hiking_stats WHERE day_walked = NEW.s_day_walked;
  ```
  fetches the correct hike_id from table hiking_stats by matching its day_walked value against the NEW.s_day_walked value from table staging_stats. Storing this value in variable v_hike_id makes it available for the INSERTs into other tables where needed.
- Speaking of which, we use it next, inserting it along with the NEW.s_trail_id column value from table staging_stats into table hiking_trail.
- The IF conditional block that immediately follows acts only if the NEW.s_additional_weight value is TRUE, meaning I wore the hydration pack on that day's hike. In that case, the weight amount in NEW.s_weight_amount is stored in table weight along with the accompanying hike_id. While I feel there may be a better way to handle this particular value, I still need to find it, since the amount will change in the future as I plan to add weight while my strength and endurance develop.
- To finish off this overview: all of these events happen for each inserted row in table staging_stats, as specified by FOR EACH ROW in the CREATE TRIGGER statement.
This blog post is my take on one way to accomplish this type of bulk upload into a normalized database with many tables. While I am sure there are more efficient alternatives, this one works pretty well in my personal development/learning environment. However, it's likely not without flaws. See anything that can be improved? Feel free to dissect it and provide any tips you feel are better. As always, many thanks for reading!!
Be sure and visit the stellar on-line PostgreSQL 10 Documentation for any related questions.
A Call To Action!
Thank you for taking the time to read this post. I truly hope you discovered something interesting and enlightening. Please share your findings here, with someone else you know who would get the same value out of it as well.
Visit the Portfolio-Projects page to see blog post/technical writing I have completed for clients.
Have I mentioned how much I love a cup of coffee?!?!
To receive email notifications (Never spam) from this blog (“Digital Owl’s Prose”) for the latest blog posts as they are published, please subscribe (of your own volition) by clicking the ‘Click To Subscribe!’ button in the sidebar on the homepage! (Feel free at any time to review the Digital Owl’s Prose Privacy Policy Page for any questions you may have about: email updates, opt-in, opt-out, contact forms, etc…)
Be sure and visit the “Best Of” page for a collection of my best blog posts.
Josh Otwell has a passion to study and grow as a SQL Developer and blogger. Other favorite activities find him with his nose buried in a good book, article, or the Linux command line. Among those, he shares a love of tabletop RPG games, reading fantasy novels, and spending time with his wife and two daughters.
Disclaimer: The examples presented in this post are hypothetical ideas of how to achieve similar types of results. They are not the utmost best solution(s). The majority, if not all, of the examples provided are performed on a personal development/learning workstation-environment and should not be considered production quality or ready. Your particular goals and needs may vary. Use those practices that best benefit your needs and goals. Opinions are my own.