Triple P threat to all data files!!! Hahahaha… Now that I have collected myself, in this blog post I will use Python, psycopg2, and PostgreSQL to perform a bulk upload, importing data from a CSV file into a database table.
Note: All data, names, and naming found within the database presented in this post are strictly used for practice, learning, instruction, and testing purposes. It by no means depicts actual data belonging to or being used by any party or organization.
I will be using Xubuntu Linux 16.04.2 LTS (Xenial Xerus), Python 3.6, PostgreSQL 9.6.4, and psycopg2 for these exercises.
Squeezing and constricting…
Having performed a wholesale insert with the PostgreSQL COPY command server-side (see this blog post), we will now look at a similar upload using Python 3.6 and psycopg2.
Luckily, the Daily_Pipe_Tally_python.csv file is ready for use (giving sed a rest), as shown below:
pipe_id,kind,pipe_jt_num,pipe_heat,pipe_length,pipe_wall_thickness,degree
Joint-1849A,PIPE_JOINT,14525,11145,19.3,0.547,0
Joint-22011,PIPE_JOINT,858A,6585,14.55,0.547,0
Joint-33899,BEND,4546,3333CC4,6.66,0.547,15.45
Joint-66525,BEND,369874PP,89874,10.45,0.547,8.25
Joint-11996,BEND,15458,555522,8.25,0.547,14.75
Joint-33912,BEND,965412300,325877,9.65,0.547,33.5
Joint-55122,PIPE_JOINT,UI-4545,985474,15.98,0.547,0
Joint-3896,PIPE_JOINT,17899L,14145,12,0.547,0
Joint-9872,PIPE_JOINT,363621,54888911,17.11,0.547,0
Joint-93114,PIPE_JOINT,20233,77TR-33-A,13.36,0.547,0
Joint-56B12,PIPE_JOINT,17P-55,1456322,9.88,0.547,0
Joint-0110,PIPE_JOINT,369777,789888,14.12,0.547,0
Joint-4849A,PIPE_JOINT,71344RT,4569,19.3,0.547,0
Joint-22000,PIPE_JOINT,11-A44,444474,14.55,0.547,0
Joint-33833,BEND,87474,25256,6.66,0.547,22.5
Joint-66666,BEND,993930,218222,10.45,0.547,16
Joint-11901,BEND,1-88A,A8969,8.25,0.547,4.75
Joint-33903,BEND,9191,363-C,9.65,0.547,45
Joint-56122,PIPE_JOINT,44566,9632,15.98,0.547,0
Joint-3891,PIPE_JOINT,333B1,987453211,12,0.547,0
Joint-9873,PIPE_JOINT,9844,AA-74,17.11,0.547,0
Joint-9333,PIPE_JOINT,88722,66-34TR,13.36,0.547,0
Joint-56A1,PIPE_JOINT,4949,17O11,9.88,0.547,0
Joint-0101,PIPE_JOINT,9043-C,19081,14.12,0.547,0
Joint-454,BEND,8989,2022,9.33,0.587,17
Joint-991,PIPE_JOINT,1191,891A,12.98,0.547,0
Joint-190A,PIPE_JOINT,2121A,991CT,13.01,0.547,0
One option for performing this operation using Python and psycopg2 is presented in the copy_command.py script file depicted in the image below:
Lines 3 and 4 are import statements. The psycopg2 module is imported along with constants.py, a user-defined Python module that shares the same current directory as our example working file.
At line 7 a string variable named staging_path is assigned the absolute path of the CSV data file being used for this upload. This string is prefixed with an r, meaning it is a raw string. Python raw strings treat the backslash as a literal character, whereas normally it introduces a C-style escape sequence. Linux, Unix, and Unix-like OSes use the forward slash (/) as their file hierarchy delimiter, while other OSes use the backslash. I feel it is good practice to make this variable a raw string, keeping the script portable should it be used on an OS other than Linux or Unix.
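A quick, made-up example illustrates the difference (these paths are hypothetical and not from the script above):

```python
# In a normal string, "\n" and "\t" are C-style escapes (newline, tab);
# in a raw string, each backslash is kept as a literal character.
normal = "C:\new_data\tally.csv"   # contains an actual newline and tab!
raw = r"C:\new_data\tally.csv"     # contains literal backslashes

print(len("\n"))   # 1 -- one newline character
print(len(r"\n"))  # 2 -- a backslash plus the letter 'n'
```

On Linux a forward-slash path contains no backslashes anyway, so the r prefix costs nothing there, which is exactly why it is a safe habit.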
The string variable SQL on line 9 is assigned the psycopg2-usable SQL statement for the COPY command. Notice the FROM STDIN contained in the string, which is required by the psycopg2 copy_expert() method called on line 26.
At line 13 the connect_str variable, accessed via the dot operator (.) from the imported constants.py module, is passed into the psycopg2 connect() function. That function starts a database session and returns a connection object, which is assigned to the variable conn (short for connection).
The connect_str variable encapsulates all of the server-side connection/login information (i.e., database name, port number, host address, username, and password) required to connect to the PostgreSQL server. That assignment could look similar to the following example:
connect_str = "host=127.0.0.1 dbname=db_to_connect_to user=user_name_here port=port_number_here password=password_here"
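The string itself is a space-separated list of key=value pairs in libpq's connection-string format. Just to illustrate the pieces (with made-up credentials, and a naive split that ignores libpq's quoting rules), the standard library alone can pull it apart:

```python
# Made-up credentials for illustration only.
connect_str = ("host=127.0.0.1 dbname=practice_db user=app_user "
               "port=5432 password=not_a_real_password")

# Break the libpq-style string into its key/value pairs.
params = dict(pair.split("=", 1) for pair in connect_str.split())
print(params["dbname"])  # practice_db
print(params["port"])    # 5432
```

Keeping this string in a separate constants.py, as the script does, keeps credentials out of the main source file.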
The cursor variable on line 15 is assigned the cursor object returned by the connection's cursor() method, and it is used to execute database commands. The connection, not the cursor, is responsible for terminating transactions, with either the commit() or rollback() method.
All of the connectivity operations are within a try/except block to help catch, report, and troubleshoot failed connections, should any issues arise.
The main() function defined on line 22 contains another try/except block that houses all of the COPY command functionality executed by the psycopg2 module.
Let’s dig into the copy_expert() method used in this block. Its syntax is as follows:
copy_expert(sql, file)
This method takes at least two parameters, sql and file.
- sql: The COPY statement to execute.
  - For exporting a table to a file-like object (COPY TO operations), the sql statement should have the form COPY table TO STDOUT.
  - For importing the contents of a file-like object into a table (COPY FROM operations), the sql statement should have the form COPY table FROM STDIN.
- file: A file-like object to read from or write to.
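To make the two forms concrete for this post's staging_assets table, the statements might look like the following pair (the FORMAT csv and HEADER options are assumptions about the file layout, not requirements of copy_expert() itself):

```python
# Export: table -> file-like object; the statement must read TO STDOUT.
export_sql = "COPY staging_assets TO STDOUT WITH (FORMAT csv, HEADER)"

# Import: file-like object -> table; the statement must read FROM STDIN.
import_sql = "COPY staging_assets FROM STDIN WITH (FORMAT csv, HEADER)"

# With an open cursor, each statement pairs with the matching file mode:
#   with open("dump.csv", "w") as f_out:
#       cursor.copy_expert(export_sql, f_out)
#   with open("Daily_Pipe_Tally_python.csv") as f_in:
#       cursor.copy_expert(import_sql, f_in)
```

The STDIN/STDOUT wording is what tells the server the data is streamed through the client connection rather than read from a server-side file path.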
Lines 25 through 28 are where the actual work is being done.
Line 25 demonstrates using a context manager to open the Daily_Pipe_Tally_python.csv file via the staging_path variable. By using the with statement with the open() function, Python makes sure to close this open file descriptor when execution leaves the block of code. Yay and thanks, Python!
Line 26 uses the connected cursor object to call the copy_expert() method, passing in the variables SQL and f as parameters.
This operation is inserting data into the staging_assets table, hence the commit() method needs to be called on the conn object to complete the transaction.
Following best practices, on line 28 I then call the close() method on the cursor object to close the cursor and release the resources it was using back to the system. The cursor is now unusable and will raise an error if any operation attempts to use it. (The connection itself should likewise be closed with conn.close() once all work is done.)
Finally, the main() function is called at line 35 and will be executed when the copy_command.py script is run in the terminal.
Upload away…
Saving the copy_command.py file, I then execute it in the Linux terminal:
(postgresfab) bigdaddy@LE2:~/Fab_Py_PostFiles$ python3.6 copy_command.py
Note: Prior to executing the script, make sure you have executable privileges for it, and that your database server is up and running on your system.
No errors were returned, so I will verify the record count in the staging_assets table to confirm these 30 records were inserted:
SELECT count(*) FROM staging_assets;
 count
-------
    45
(1 row)
The table previously held only 15 records and now holds a total of 45, so the psycopg2 copy_expert() method executed successfully.
Querying the staging_assets table for all columns of 'Joint-56122' returns the data for that asset, again confirming the operation was a success.
SELECT *
FROM staging_assets
WHERE pipe_id = 'Joint-56122';
pipe_id | kind | pipe_jt_num | pipe_heat | pipe_length | pipe_wall_thickness | degree
-------------+------------+-------------+-----------+-------------+---------------------+--------
Joint-56122 | PIPE_JOINT | 44566 | 9632 | 15.98 | 0.547 | 0.00
(1 row)
For a full rundown and any inquiries you may have about psycopg2, see their excellent documentation.
Although brief, my interactions with psycopg2 have piqued my interest in further exploring its applications and uses. I hope you follow along, as future blog posts will detail my learning and discovery.
Thank you for taking the time to read this post. I truly hope you discovered something interesting and enlightening. Please share your findings here, with someone else you know who would get the same value out of it as well.
To receive notifications for the latest post from “Digital Owl’s Prose” via email, please subscribe by clicking the ‘Click To Subscribe!’ button in the sidebar!
Disclaimer: The examples presented in this post are hypothetical ideas of how to achieve similar types of results. They are not the utmost best solution(s). The majority, if not all, of the examples provided are performed on a personal development/learning workstation-environment and should not be considered production quality or ready. Your particular goals and needs may vary. Use those practices that best benefit your needs and goals. Opinions are my own.