COPY and CAST() – Bulk uploads in PostgreSQL

Loading data into database tables is pretty much a necessity. Without data, what do we have? Not much at all. The CSV format is super common, used far and wide. I keep a CSV file of my daily walking/hiking stats and want to store them in a PostgreSQL database on my local learning/development machine. How can I load a CSV, with several rows of data, in one go in Postgres? What about data types? Any concerns there? Continue reading to see a simple, yet effective solution…

Note: All data, names or naming found within the database presented in this post, are strictly used for practice, learning, instruction, and testing purposes. It by no means depicts actual data belonging to or being used by any party or organization.

OS and DB used:
  • Xubuntu Linux 18.04.2 LTS (Bionic Beaver)
  • PostgreSQL 11.4



The table below is the final stop for the data I am importing. As shown, its structure uses several different data types. A CSV file, however, carries no type information (every field is just text), which poses a challenge in this case.

walking_stats=> \d stats;
                          Table "public.stats"
    Column    |          Type          | Collation | Nullable | Default
--------------+------------------------+-----------+----------+---------
 day_walked   | date                   |           |          |
 cal_burned   | numeric(4,1)           |           |          |
 miles_walked | numeric(4,2)           |           |          |
 duration     | time without time zone |           |          |
 mph          | numeric(2,1)           |           |          |
 shoe_id      | integer                |           |          |

To measure my fitness progress (or lack thereof), and to get some great SQL Window Function practice, I want to run analytics on over 6 months’ worth of walking metrics. With data loss a possibility during the import, I need to mitigate that risk. As I highlighted previously, CSVs store data as strings.

Below I have mostly a mirror image of table ‘stats’, yet all columns are of the TEXT data type:

walking_stats=# \d stat_staging;
             Table "public.stat_staging"
    Column    | Type | Collation | Nullable | Default
--------------+------+-----------+----------+---------
 day_walked   | text |           |          |
 cal_burned   | text |           |          |
 miles_walked | text |           |          |
 duration     | text |           |          |
 mph          | text |           |          |
 shoe_id      | text |           |          |
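
For anyone following along, the two \d listings above correspond to DDL along these lines. This is a sketch reconstructed from the psql output, not the statements actually used to create the tables; no constraints or defaults are shown because none appear in the listings.

```sql
-- Final destination: one column per metric, each with its intended type.
CREATE TABLE stats (
    day_walked   date,
    cal_burned   numeric(4,1),
    miles_walked numeric(4,2),
    duration     time,          -- "time without time zone" in \d output
    mph          numeric(2,1),
    shoe_id      integer
);

-- Staging mirror: same column names, but everything is TEXT so that
-- any value found in the CSV can be loaded without a type error.
CREATE TABLE stat_staging (
    day_walked   text,
    cal_burned   text,
    miles_walked text,
    duration     text,
    mph          text,
    shoe_id      text
);
```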

I plan to import the CSV data into this staging table, then type cast the rows as I move them over to table ‘stats’. Let’s see how.

To date, the post Two handy examples of the psql \copy meta-command has had more views than any other on my blog, and I truly hope you read it as well. That being said, \copy is a client-side psql meta-command: psql reads the file and sends the data to the server. With COPY, on the other hand, the server itself reads the file, so it depends on server-side file access and permissions (superuser, or in PostgreSQL 11, membership in the pg_read_server_files role). Since COPY is part of the focus of this post, I will use the postgres user to run it.

I’ll run the below COPY command to import the records from the CSV into the ‘stat_staging’ table:

walking_stats=# COPY stat_staging (day_walked, cal_burned, miles_walked, duration, mph, shoe_id)
    FROM '/path_to/jan_2019_hiking_stats.csv'
DELIMITER ','
CSV HEADER;
COPY 23

Two things to note: 1) Listing out the column names as I did is optional. 2) Upon success, COPY returns the number of rows copied (23 in this example).
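
If the CSV lives on the client machine, or superuser-level access is not available, the client-side \copy mentioned earlier takes nearly the same form. The sketch below was not run for this post; it reuses the placeholder path from the command above and shows the parenthesized option syntax as an alternative spelling. Use one statement or the other, since running both would load the file twice.

```sql
-- Client-side alternative: psql reads the file and streams it to the server.
-- A \copy meta-command must be written on a single line.
\copy stat_staging (day_walked, cal_burned, miles_walked, duration, mph, shoe_id) FROM '/path_to/jan_2019_hiking_stats.csv' WITH (FORMAT csv, HEADER true)

-- Server-side COPY with the parenthesized option list. The comma is already
-- the default delimiter in CSV format, so DELIMITER can be left out.
COPY stat_staging (day_walked, cal_burned, miles_walked, duration, mph, shoe_id)
FROM '/path_to/jan_2019_hiking_stats.csv'
WITH (FORMAT csv, HEADER true);
```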

Querying table ‘stat_staging’, we can see all the rows of data present (Ugh! What low numbers I had in January!):

walking_stats=# SELECT * FROM stat_staging;
 day_walked | cal_burned | miles_walked | duration  | mph  | shoe_id
------------+------------+--------------+-----------+------+---------
 2019-01-01 |  132.8     |  1.27        |  00:24:24 |  3.1 |  4
 2019-01-02 |  181.1     |  1.76        |  00:33:18 |  3.2 |  3
 2019-01-07 |  207.3     |  2.03        |  00:38:07 |  3.2 |  4
 2019-01-08 |  218.2     |  2.13        |  00:40:07 |  3.2 |  4
 2019-01-09 |  193.0     |  1.94        |  00:35:29 |  3.3 |  4
 2019-01-10 |  160.2     |  1.58        |  00:29:27 |  3.2 |  4
 2019-01-11 |  206.3     |  2.03        |  00:37:55 |  3.2 |  4
 2019-01-13 |  253.2     |  2.49        |  00:46:33 |  3.2 |  4
 2019-01-14 |  177.6     |  1.78        |  00:32:39 |  3.3 |  4
 2019-01-15 |  207.0     |  2.03        |  00:38:03 |  3.2 |  4
 2019-01-16 |  248.7     |  2.42        |  00:45:43 |  3.2 |  4
 2019-01-17 |  176.3     |  1.76        |  00:32:25 |  3.3 |  4
 2019-01-19 |  200.2     |  2.01        |  00:36:48 |  3.3 |  4
 2019-01-20 |  244.4     |  2.42        |  00:44:57 |  3.2 |  4
 2019-01-21 |  205.9     |  2.03        |  00:37:52 |  3.2 |  4
 2019-01-22 |  244.8     |  2.43        |  00:45:01 |  3.2 |  4
 2019-01-23 |  231.8     |  2.35        |  00:42:37 |  3.3 |  4
 2019-01-25 |  244.9     |  2.44        |  00:45:02 |  3.3 |  4
 2019-01-27 |  302.7     |  3.04        |  00:55:39 |  3.3 |  4
 2019-01-28 |  170.2     |  1.66        |  00:31:17 |  3.2 |  4
 2019-01-29 |  235.5     |  2.31        |  00:43:18 |  3.2 |  4
 2019-01-30 |  254.2     |  2.52        |  00:46:44 |  3.2 |  4
 2019-01-31 |  229.5     |  2.27        |  00:42:11 |  3.2 |  4
(23 rows)
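
Before casting, it can be worth checking the staged text for values that would not convert cleanly. The query below is one possible sanity check, not something from the original workflow. The patterns are my own assumptions about what each column should look like, and they are deliberately stricter than the casts themselves. The btrim() calls are there because several staged values appear to carry a leading space.

```sql
-- Rows returned here hold at least one value that does not look like the
-- target type. NULLs are not flagged (NULL !~ ... yields NULL); they would
-- simply cast to NULL.
SELECT *
FROM stat_staging
WHERE btrim(day_walked)   !~ '^\d{4}-\d{2}-\d{2}$'     -- date, ISO format
   OR btrim(cal_burned)   !~ '^\d{1,3}(\.\d)?$'        -- numeric(4,1)
   OR btrim(miles_walked) !~ '^\d{1,2}(\.\d{1,2})?$'   -- numeric(4,2)
   OR btrim(duration)     !~ '^\d{2}:\d{2}:\d{2}$'     -- time, HH:MM:SS
   OR btrim(mph)          !~ '^\d(\.\d)?$'             -- numeric(2,1)
   OR btrim(shoe_id)      !~ '^\d+$';                  -- integer
```

An empty result suggests the values have the expected shape. The patterns only check shape, so an impossible date such as month 13 would still pass here and fail at cast time.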

Next, I’ll type cast the column data values with CAST() in an INSERT with SELECT, moving the rows from ‘stat_staging’ over to table ‘stats’:

walking_stats=# INSERT INTO stats(day_walked, cal_burned, miles_walked, duration, mph, shoe_id)
SELECT
    CAST(day_walked AS DATE),
    CAST(cal_burned AS numeric(4,1)),
    CAST(miles_walked AS numeric(4,2)),
    CAST(duration AS TIME),
    CAST(mph AS numeric(2,1)),
    CAST(shoe_id AS INTEGER)
FROM stat_staging;
INSERT 0 23
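
As an aside, PostgreSQL also accepts the :: shorthand for the same casts. The SELECT half of the statement could be written as below. This is an equivalent spelling, not what was run here, and on its own it only previews the converted rows without inserting anything.

```sql
-- PostgreSQL-specific cast shorthand; CAST(x AS type) is the SQL-standard form.
SELECT
    day_walked::date,
    cal_burned::numeric(4,1),
    miles_walked::numeric(4,2),
    duration::time,
    mph::numeric(2,1),
    shoe_id::integer
FROM stat_staging
LIMIT 5;
```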

As shown in the INSERT, CAST() converts each column value to the exact data type of the matching column in table ‘stats’. Although the command tag returned on completion (INSERT 0 23) reports 23 rows inserted, I’ll verify by simply querying table ‘stats’:

walking_stats=# SELECT * FROM stats;
 day_walked | cal_burned | miles_walked | duration | mph | shoe_id
------------+------------+--------------+----------+-----+---------
 2019-01-01 |      132.8 |         1.27 | 00:24:24 | 3.1 |       4
 2019-01-02 |      181.1 |         1.76 | 00:33:18 | 3.2 |       3
 2019-01-07 |      207.3 |         2.03 | 00:38:07 | 3.2 |       4
 2019-01-08 |      218.2 |         2.13 | 00:40:07 | 3.2 |       4
 2019-01-09 |      193.0 |         1.94 | 00:35:29 | 3.3 |       4
 2019-01-10 |      160.2 |         1.58 | 00:29:27 | 3.2 |       4
 2019-01-11 |      206.3 |         2.03 | 00:37:55 | 3.2 |       4
 2019-01-13 |      253.2 |         2.49 | 00:46:33 | 3.2 |       4
 2019-01-14 |      177.6 |         1.78 | 00:32:39 | 3.3 |       4
 2019-01-15 |      207.0 |         2.03 | 00:38:03 | 3.2 |       4
 2019-01-16 |      248.7 |         2.42 | 00:45:43 | 3.2 |       4
 2019-01-17 |      176.3 |         1.76 | 00:32:25 | 3.3 |       4
 2019-01-19 |      200.2 |         2.01 | 00:36:48 | 3.3 |       4
 2019-01-20 |      244.4 |         2.42 | 00:44:57 | 3.2 |       4
 2019-01-21 |      205.9 |         2.03 | 00:37:52 | 3.2 |       4
 2019-01-22 |      244.8 |         2.43 | 00:45:01 | 3.2 |       4
 2019-01-23 |      231.8 |         2.35 | 00:42:37 | 3.3 |       4
 2019-01-25 |      244.9 |         2.44 | 00:45:02 | 3.3 |       4
 2019-01-27 |      302.7 |         3.04 | 00:55:39 | 3.3 |       4
 2019-01-28 |      170.2 |         1.66 | 00:31:17 | 3.2 |       4
 2019-01-29 |      235.5 |         2.31 | 00:43:18 | 3.2 |       4
 2019-01-30 |      254.2 |         2.52 | 00:46:44 | 3.2 |       4
 2019-01-31 |      229.5 |         2.27 | 00:42:11 | 3.2 |       4
(23 rows)

All 23 rows are present in table ‘stats’, with their respective columns’ data configured in the correct data type format.
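
To confirm the types rather than eyeball them, one option is the built-in pg_typeof() function. This is a small sketch, and the column aliases are my own.

```sql
-- pg_typeof() reports the base type of each value; numeric columns show as
-- "numeric" without the (precision, scale) modifier.
SELECT
    pg_typeof(day_walked)   AS day_walked_type,
    pg_typeof(cal_burned)   AS cal_burned_type,
    pg_typeof(miles_walked) AS miles_walked_type,
    pg_typeof(duration)     AS duration_type,
    pg_typeof(mph)          AS mph_type,
    pg_typeof(shoe_id)      AS shoe_id_type
FROM stats
LIMIT 1;
```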

If you have suggestions for improvements or other issues to look out for that I may have missed or not covered, please comment them below.

I have more CSV files with several months’ worth of walking stats, and I plan on uploading each one in turn in a different way. The best thing about it is, there will be a blog post coming for each of them. I plan on using PostgreSQL functionality coupled with Python for the majority of them. Do check back in to read the forthcoming posts!

Like what you have read? See anything incorrect? Please comment below and thanks for reading!!!

Explore the official PostgreSQL 11 On-line Documentation for more information.


Josh Otwell has a passion to study and grow as a SQL Developer and blogger. Other favorite activities find him with his nose buried in a good book, article, or the Linux command line. Among those, he shares a love of tabletop RPG games, reading fantasy novels, and spending time with his wife and two daughters.

Disclaimer: The examples presented in this post are hypothetical ideas of how to achieve similar types of results. They are not the utmost best solution(s). The majority, if not all, of the examples provided, is performed on a personal development/learning workstation-environment and should not be considered production quality or ready. Your particular goals and needs may vary. Use those practices that best benefit your needs and goals. Opinions are my own.
