Postgres, Python, and Psycopg2 – executemany() method CSV upload example.

Having previously covered a couple of different ways to upload a CSV’s worth of data into a PostgreSQL table, this post is quite similar to the second one, with a slight change in the psycopg2 method used. Visit “COPY and CAST() – Bulk uploads in PostgreSQL” and “Python and psycopg2 for CSV bulk uploads in PostgreSQL – with examples” to get up to speed. Then read on to see the differences between the methods used…

Note: All data, names or naming found within the database presented in this post, are strictly used for practice, learning, instruction, and testing purposes. It by no means depicts actual data belonging to or being used by any party or organization.

OS and DB used:
  • Xubuntu Linux 18.04.2 LTS (Bionic Beaver)
  • PostgreSQL 11.4
  • Python 3.7


Like previous posts in this series, I have this staging table (currently empty):

walking_stats=> TABLE stat_staging;
 day_walked | cal_burned | miles_walked | duration | mph | shoe_id
------------+------------+--------------+----------+-----+---------
(0 rows)

In the previous post, I wondered aloud whether the psycopg2 executemany() method provides any performance gain over execute(). What I should have done was simply visit the documentation, which is where I snapped the screenshot below. It apparently answers my question quite clearly (according to my interpretation):

[Screenshot of a passage from the psycopg2 documentation on executemany()]
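
As I read it, the documentation's point is that executemany() does roughly the same work as calling execute() once per row. Here is a minimal sketch of that idea, not psycopg2's actual implementation: RecordingCursor is a hypothetical stand-in that only records calls, and the shortened two-column INSERT is for illustration only.

```python
class RecordingCursor:
    """Hypothetical stand-in for a database cursor; it only records calls."""

    def __init__(self):
        self.calls = []

    def execute(self, query, params=None):
        # A real cursor would send the statement to the server here.
        self.calls.append((query, tuple(params)))

    def executemany(self, query, vars_list):
        # Conceptually: one execute() call per parameter set.
        for params in vars_list:
            self.execute(query, params)


sql = "INSERT INTO stat_staging(day_walked, shoe_id) VALUES(%s, %s)"
rows = [("2019-03-01", 4), ("2019-03-03", 4)]

# Variant 1: an explicit loop over execute().
looped = RecordingCursor()
for row in rows:
    looped.execute(sql, row)

# Variant 2: a single executemany() call.
batched = RecordingCursor()
batched.executemany(sql, rows)

same_calls = looped.calls == batched.calls
print(same_calls)
```

Both variants record the same sequence of calls in this sketch, which is why no speedup would be expected from the switch alone.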

With that being said, executemany() does not appear to be more efficient than execute(), so the script below mirrors the one I used with execute() before:

import psycopg2 as pg
import csv

file = r'/my_linux_user/path_to_file/mar_2019_hiking_stats.csv'
sql_insert = """INSERT INTO stat_staging(day_walked, cal_burned, miles_walked,
                duration, mph, shoe_id)
                VALUES(%s, %s, %s, %s, %s, %s)"""

conn = None  # Lets the finally block run safely if connect() itself fails.
try:
    conn = pg.connect(user="my_user",
        password="my_password",
        host="127.0.0.1",
        port="5432",
        database="walking_stats")
    cursor = conn.cursor()
    with open(file, 'r') as f:
        reader = csv.reader(f)
        next(reader) # This skips the 1st row which is the header.
        cursor.executemany(sql_insert, f)
        conn.commit()
except (Exception, pg.Error) as e:
    print(e)
finally:
    if conn is not None:
        cursor.close()
        conn.close()
        print("Connection closed.")

However, I did run into a couple of issues, the majority of them due to lack of knowledge and focus. On the bright side, each folly is also an opportunity to learn so it is not all bad. Executing the above script in my venv returned:

(pg_py_database) linux_user@LE2:~/pg_py_database$ python3 database2.py
not all arguments converted during string formatting
Connection closed.

Since I was not at all familiar with this error, I put some Google-Fu to use and searched…

Partway through searching the web, I decided to wrap the ‘f’ object in a tuple and changed the line as shown below:

    cursor.executemany(sql_insert, (f,))

And like that, out with the first error and in with a new one! Go figure lol!

(pg_py_database) linux_user@LE2:~/pg_py_database$ python3 database2.py
'_io.TextIOWrapper' object does not support indexing

After more scrutiny, I realized that I totally botched this important line:

    reader = csv.reader(f)

Duh. I had assigned the csv.reader(f) object to the ‘reader’ variable but never actually used it. Easy fix, right?

    cursor.executemany(sql_insert, (reader,))

Nope. Instead, I was provided with a new error. On a roll now, folks!

(pg_py_database) linux_user@LE2:~/pg_py_database$ python3 database2.py
'_csv.reader' object does not support indexing
Connection closed.

So I thought, “Is the csv.reader(f) object actually a tuple?”
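
That question can be checked directly in Python. The sketch below uses io.StringIO as an in-memory stand-in for the CSV file, with made-up sample rows, so it runs without a database:

```python
import csv
import io

# In-memory stand-in for the CSV file (hypothetical sample rows).
f = io.StringIO("day_walked,shoe_id\n2019-03-01,4\n2019-03-03,4\n")
reader = csv.reader(f)

is_tuple = isinstance(reader, tuple)   # False: the reader is not a tuple
is_iterator = iter(reader) is reader   # True: it is its own iterator
header = next(reader)                  # ['day_walked', 'shoe_id']

# Indexing is what the error message complained about; try it directly.
try:
    reader[0]
    supports_indexing = True
except TypeError:
    supports_indexing = False

print(is_tuple, is_iterator, supports_indexing)
```

So the reader is neither a tuple nor indexable, but it can be iterated, handing back one row at a time.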

For a shot in the dark, I made this change, just passing in the ‘reader’ variable:

    cursor.executemany(sql_insert, reader)

Then running the script:

(pg_py_database) linux_user@LE2:~/pg_py_database$ python3 database2.py
Connection closed.

No errors… And the ‘Connection closed’ message via the finally block.

To verify all the rows from the CSV were inserted in table ‘stat_staging’ I’ll check over in a psql session:

walking_stats=> TABLE stat_staging;
 day_walked | cal_burned | miles_walked | duration  | mph  | shoe_id
------------+------------+--------------+-----------+------+---------
 2019-03-01 |  176.8     |  1.78        |  00:32:30 |  3.3 |  4
 2019-03-03 |  232.2     |  2.33        |  00:42:41 |  3.3 |  4
 2019-03-04 |  207.2     |  2.02        |  00:38:05 |  3.2 |  4
 2019-03-05 |  234.7     |  2.37        |  00:43:09 |  3.3 |  4
 2019-03-06 |  246.4     |  2.48        |  00:45:18 |  3.3 |  4
 2019-03-07 |  244.5     |  2.45        |  00:44:57 |  3.3 |  4
 2019-03-11 |  243.2     |  2.43        |  00:44:43 |  3.3 |  4
 2019-03-12 |  230.8     |  2.32        |  00:42:25 |  3.3 |  4
 2019-03-15 |  209.3     |  2.04        |  00:38:29 |  3.2 |  4
 2019-03-17 |  238.8     |  2.43        |  00:43:54 |  3.3 |  4
 2019-03-18 |  226.5     |  2.24        |  00:41:39 |  3.2 |  4
 2019-03-19 |  215.7     |  2.13        |  00:39:39 |  3.2 |  4
 2019-03-20 |  236.5     |  2.39        |  00:43:29 |  3.3 |  4
 2019-03-21 |  140.6     |  1.37        |  00:25:51 |  3.2 |  4
 2019-03-24 |  251.4     |  2.51        |  00:46:13 |  3.3 |  4
 2019-03-25 |  221.3     |  2.20        |  00:40:41 |  3.2 |  4
 2019-03-26 |  212.3     |  2.09        |  00:39:02 |  3.2 |  4
 2019-03-27 |  241.0     |  2.36        |  00:44:18 |  3.2 |  4
 2019-03-28 |  203.6     |  2.03        |  00:37:26 |  3.3 |  4
 2019-03-31 |  220.0     |  2.15        |  00:40:27 |  3.2 |  4
(20 rows)
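
The same check could be scripted instead of eyeballed. A minimal sketch, assuming hypothetical helper names count_csv_rows and count_table_rows, and any DB-API cursor (such as the psycopg2 one from the script) passed in:

```python
import csv
import io


def count_csv_rows(f):
    """Count the data rows in an open CSV file, skipping the header."""
    reader = csv.reader(f)
    next(reader, None)  # skip the header; tolerate an empty file
    return sum(1 for row in reader if row)  # blank lines come back as []


def count_table_rows(cursor):
    """Return the number of rows currently in stat_staging."""
    cursor.execute("SELECT COUNT(*) FROM stat_staging")
    return cursor.fetchone()[0]


# In-memory stand-in for the CSV file (hypothetical sample rows).
sample = io.StringIO("day_walked,shoe_id\n2019-03-01,4\n2019-03-03,4\n")
csv_rows = count_csv_rows(sample)
print(csv_rows)  # 2
```

With a live connection, comparing count_csv_rows(f) against count_table_rows(cursor) after the commit should give matching numbers for a table that started out empty.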

And voilà, folks, that is how you fumble your way through error messages! (JK BTW)

So what really happened, allowing the last change to work?

According to my interpretation of the executemany() documentation, the second of its two parameters – vars_list – should be a sequence of parameter sets, one set for each execution of the query.

The documentation for the csv.reader() object provides this (partial) passage:

“Each row read from the csv file is returned as a list of strings.”

Based on my beginner Python understanding, a list object is a sequence, so iterating over the reader hands executemany() one list of column values per CSV row. Any advice, direction, or correction(s) on this from seasoned Python devs/readers are more than welcome, so please comment freely below.
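
To see why only the last attempt worked, it helps to look at the first item executemany() would pull out of each argument I tried. The sketch below is my own illustration, with io.StringIO standing in for the file, made-up sample rows, and a hypothetical first_item helper:

```python
import csv
import io

CSV_TEXT = "day_walked,shoe_id\n2019-03-01,4\n2019-03-03,4\n"  # hypothetical sample


def first_item(make_vars_list):
    """Return the first item that iterating over vars_list would produce."""
    f = io.StringIO(CSV_TEXT)
    reader = csv.reader(f)
    next(reader)  # skip the header, as in the script
    return next(iter(make_vars_list(f, reader)))


line = first_item(lambda f, reader: f)                # attempt 1: f
file_obj = first_item(lambda f, reader: (f,))         # attempt 2: (f,)
reader_obj = first_item(lambda f, reader: (reader,))  # attempt 3: (reader,)
row = first_item(lambda f, reader: reader)            # attempt 4: reader

print(repr(line))                # one raw line of text, a single str
print(type(file_obj).__name__)   # the whole file object
print(type(reader_obj).__name__) # the whole reader object
print(row)                       # a list of strings, one per column
```

Only the last variant yields a list with one value per column, which is the shape the %s placeholders expect. The first yields a raw string and the middle two yield a single non-indexable object, which seems consistent with the three error messages above.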

While this post seemed to be all over the place – or it did to me at least – hopefully, you got something useful out of it. I know I sure did.

And that is another method covered to get data loaded from a CSV file into a PostgreSQL table with Python. Check back in for more articles to come along these same lines.

Like what you have read? See anything incorrect? Please comment below and thanks for reading!!!

Explore the official PostgreSQL 11 On-line Documentation for more information.

Josh Otwell has a passion to study and grow as a SQL Developer and blogger. Other favorite activities find him with his nose buried in a good book, article, or the Linux command line. Among those, he shares a love of tabletop RPG games, reading fantasy novels, and spending time with his wife and two daughters.

Disclaimer: The examples presented in this post are hypothetical ideas of how to achieve similar types of results. They are not the utmost best solution(s). The majority, if not all, of the examples provided, is performed on a personal development/learning workstation-environment and should not be considered production quality or ready. Your particular goals and needs may vary. Use those practices that best benefit your needs and goals. Opinions are my own.
