Wholesale inserts with Python, Psycopg2 and PostgreSQL.

Triple P threat to all data files!!! Hahahaha… Now that I've collected myself, in this blog post I will use Python, Psycopg2, and PostgreSQL for a bulk upload, importing data from a CSV file into a database table.

Note: All data, names, or naming found within the database presented in this post are used strictly for practice, learning, instruction, and testing purposes. They by no means depict actual data belonging to or used by any party or organization.

I will be using Xubuntu Linux 16.04.2 LTS (Xenial Xerus), Python 3.6, PostgreSQL 9.6.4, and psycopg2 for these exercises.


Squeezing and constricting…

Having performed a wholesale insert with the PostgreSQL COPY command on the server side (see this blog post), we will now look at a similar upload using Python 3.6 and psycopg2.

Luckily, the Daily_Pipe_Tally_python_staging.csv file is ready for use (giving sed a rest), as shown below:

bigdaddy@LE2:~/Practice_Data$ cat Daily_Pipe_Tally_python_staging.csv
pipe_id,kind,pipe_jt_num,pipe_heat,pipe_length,pipe_wall_thickness,degree
Joint-1849A,PIPE_JOINT,14525,11145,19.3,0.547,0
Joint-22011,PIPE_JOINT,858A,6585,14.55,0.547,0
Joint-33899,BEND,4546,3333CC4,6.66,0.547,15.45
Joint-66525,BEND,369874PP,89874,10.45,0.547,8.25
Joint-11996,BEND,15458,555522,8.25,0.547,14.75
Joint-33912,BEND,965412300,325877,9.65,0.547,33.5
Joint-55122,PIPE_JOINT,UI-4545,985474,15.98,0.547,0
Joint-3896,PIPE_JOINT,17899L,14145,12,0.547,0
Joint-9872,PIPE_JOINT,363621,54888911,17.11,0.547,0
Joint-93114,PIPE_JOINT,20233,77TR-33-A,13.36,0.547,0
Joint-56B12,PIPE_JOINT,17P-55,1456322,9.88,0.547,0
Joint-0110,PIPE_JOINT,369777,789888,14.12,0.547,0
Joint-4849A,PIPE_JOINT,71344RT,4569,19.3,0.547,0
Joint-22000,PIPE_JOINT,11-A44,444474,14.55,0.547,0
Joint-33833,BEND,87474,25256,6.66,0.547,22.5
Joint-66666,BEND,993930,218222,10.45,0.547,16
Joint-11901,BEND,1-88A,A8969,8.25,0.547,4.75
Joint-33903,BEND,9191,363-C,9.65,0.547,45
Joint-56122,PIPE_JOINT,44566,9632,15.98,0.547,0
Joint-3891,PIPE_JOINT,333B1,987453211,12,0.547,0
Joint-9873,PIPE_JOINT,9844,AA-74,17.11,0.547,0
Joint-9333,PIPE_JOINT,88722,66-34TR,13.36,0.547,0
Joint-56A1,PIPE_JOINT,4949,17O11,9.88,0.547,0
Joint-0101,PIPE_JOINT,9043-C,19081,14.12,0.547,0
Joint-454,BEND,8989,2022,9.33,0.587,17
Joint-991,PIPE_JOINT,1191,891A,12.98,0.547,0
Joint-190A,PIPE_JOINT,2121A,991CT,13.01,0.547,0

One option for performing this operation with Python and psycopg2 is the copy_command.py script depicted in the image below:

[Image: the copy_command.py script]
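In case the screenshot is hard to read, here is a rough, minimal sketch of what a script like copy_command.py could look like. The file path, the exact COPY options, and the error handling are assumptions pieced together from the walkthrough below, and the line numbers referenced in that walkthrough are those of the screenshot, not of this sketch.

#!/usr/bin/python3
# Sketch only: reconstructed from the description below, not the original file.

import psycopg2
import constants  # user-defined module holding connect_str

# Absolute path to the CSV file (assumed location; adjust as needed).
staging_path = r"/home/bigdaddy/Practice_Data/Daily_Pipe_Tally_python_staging.csv"

# FROM STDIN is required for copy_expert(); the CSV HEADER option is assumed
# here because the file's first row holds column names.
SQL = "COPY staging_assets FROM STDIN WITH CSV HEADER"

try:
    conn = psycopg2.connect(constants.connect_str)
    cursor = conn.cursor()
except psycopg2.DatabaseError as err:
    print("Connection error: {}".format(err))
    raise


def main():
    try:
        # The context manager closes the file once the block is exited.
        with open(staging_path) as f:
            cursor.copy_expert(SQL, f)
        conn.commit()
        cursor.close()
    except (psycopg2.DatabaseError, OSError) as err:
        print("Error during COPY: {}".format(err))


if __name__ == "__main__":
    main()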

Lines 3 and 4 are import statements. The psycopg2 module is imported along with constants.py, a user-defined Python module that shares the same directory as our example working file.

At line 7 a string variable named staging_path is assigned the absolute path of the CSV data file being used for this upload. The string is prefixed with an r, meaning it is a raw string. Python raw strings treat the backslash as a literal character, where normally it introduces C-style escape sequences. Linux, Unix, and Unix-like OS's use the forward slash (/) as their file-hierarchy delimiter, while other OS's use the backslash. I feel it's good practice to make this variable a raw string so the script stays portable should it be used on an OS other than Linux or Unix.
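To see why the r prefix matters, here is a quick, purely hypothetical example using a Windows-style path (the file names are made up):

plain_path = "C:\new_data\tally.csv"   # "\n" and "\t" are interpreted as newline and tab
raw_path = r"C:\new_data\tally.csv"    # backslashes are kept literally

print(plain_path)   # prints a mangled path split by a newline and a tab
print(raw_path)     # prints C:\new_data\tally.csv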

The string variable SQL on line 9 is assigned the SQL statement psycopg2 will use for the COPY command. Notice the FROM STDIN contained in the string, which is required by the psycopg2 copy_expert() method called on line 26.

At line 13 the connect_str variable from the imported constants module is accessed via the dot operator (.) and passed to the psycopg2 connect() function, whose return value is assigned to conn (short for connection). The connect() function starts up a database session and returns a connection object.

The connect_str variable encapsulates all of the server-side connection/login information (i.e., database name, port number, host address, username, and password) required to connect to the PostgreSQL server. That assignment could look similar to the following example:

connect_str = "host=127.0.0.1 dbname=db_to_connect_to user=user_name_here port=port_number_here password=password_here"

The cursor variable on line 15 is a cursor object returned by the connection's cursor() method and is used to execute database commands. The connection object, not the cursor, is responsible for terminating transactions with either the commit() or rollback() method.

All of the connectivity operations are wrapped in a try/except block to help catch, report, and troubleshoot failed-connection errors should any arise.

The main() function defined on line 22 contains another try/except block that houses all of the COPY command functionality executed by the psycopg2 module.

Let’s dig into the copy_expert() method used in this block. Its syntax is as follows:

copy_expert(sql, file)

This method takes at least two parameters, sql and file.

  1. sql: The COPY statement to execute.
    • For exporting a table to a file-like object (COPY TO operations), the sql statement should have the form COPY table TO STDOUT.
    • For importing the contents of a file-like object into a table (COPY FROM operations), the sql statement should have the form COPY table FROM STDIN.
  2. file: A file-like object to read from or write to (both directions are sketched just below).
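To make both forms concrete, here is a hedged sketch of a round trip with copy_expert(); the connection string, file name, and the staging_assets_copy table are made up for illustration:

import psycopg2

conn = psycopg2.connect("host=127.0.0.1 dbname=fab_tracking user=some_user password=some_password")
cur = conn.cursor()

# Export (COPY ... TO STDOUT): write the staging_assets table to a file-like object.
with open("staging_assets_backup.csv", "w") as out_file:
    cur.copy_expert("COPY staging_assets TO STDOUT WITH CSV HEADER", out_file)

# Import (COPY ... FROM STDIN): read a file-like object into a (hypothetical) copy of the table.
with open("staging_assets_backup.csv", "r") as in_file:
    cur.copy_expert("COPY staging_assets_copy FROM STDIN WITH CSV HEADER", in_file)

conn.commit()
cur.close()
conn.close()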

Lines 25 through 28 are where the actual work is being done.

Line 25 demonstrates using a context manager to open the Daily_Pipe_Tally_python_staging.csv file via the staging_path variable. By using the with statement with the open() function, Python makes sure the open file resource is closed when execution leaves this block of code. Yay and thanks, Python!


Line 26 uses the connected cursor object to call the copy_expert() method, passing the SQL and f variables as its arguments.

This operation inserts data into the staging_assets table, hence the commit() method needs to be called on the conn object to complete the transaction.

Following best practices, on line 28 I then call the close() method on the cursor object to close the cursor and release the resources it holds back to the system. The cursor is now unusable and will raise errors if any operations attempt to use it.

Finally, the main() function is called at line 35 and is executed when the copy_command.py script is run in the terminal.


Upload away…

Saving the copy_command.py file, I then execute it in the Linux terminal:

(postgresfab) bigdaddy@LE2:~/Fab_Py_PostFiles$ python3.6 copy_command.py


Note: Prior to executing the script, make sure you have executable privileges for it, and that your database server is up and running on your system.


No errors were returned, so I will verify the count of records in the staging_assets table to confirm these 30 records were inserted:

fab_tracking=> SELECT COUNT(*) FROM staging_assets;
 count
-------
    45
(1 row)

The table previously held only 15 records and now has a total of 45, so the psycopg2 copy_expert() method executed successfully.

Querying the staging_assets table for the row where pipe_id is 'Joint-56122' returns all data for that asset, again confirming the operation was a success.

fab_tracking=> SELECT *
FROM staging_assets
WHERE pipe_id = 'Joint-56122';
   pipe_id   |    kind    | pipe_jt_num | pipe_heat | pipe_length | pipe_wall_thickness | degree
-------------+------------+-------------+-----------+-------------+---------------------+--------
 Joint-56122 | PIPE_JOINT | 44566       | 9632      |       15.98 |               0.547 |   0.00
(1 row)

For a full rundown, and for any inquiries you may have about psycopg2, see its excellent documentation.


Although brief, my interactions with psycopg2 have piqued my interest in further exploring its applications and uses. I hope you follow along, as future blog posts will detail my learning and discovery.


Thank you for taking the time to read this post. I truly hope you discovered something interesting and enlightening. Please share your findings here, and pass this post along to someone else you know who would get the same value out of it as well.


To receive notifications for the latest post from “Digital Owl’s Prose” via email, please subscribe by clicking the ‘Click To Subscribe!’ button in the sidebar!


Disclaimer: The examples presented in this post are hypothetical ideas of how to achieve similar types of results. They are not the utmost best solution(s). The majority, if not all, of the examples provided are performed on a personal development/learning workstation-environment and should not be considered production quality or ready. Your particular goals and needs may vary. Use those practices that best benefit your needs and goals. Opinions are my own.
