Triple P threat to all data files!!! Hahahaha… Now that I have myself collected and together, in this blog post I will utilize Python, Psycopg2, and PostgreSQL for a bulk upload, importing data from a
CSV file to a database table.
Note: All data, names or naming found within the database presented in this post, are strictly used for practice, learning, instruction, and testing purposes. It by no means depicts actual data belonging to or being used by any party or organization.
I will be using Xubuntu Linux 16.04.2 LTS (Xenial Xerus), Python3.6, PostgreSQL 9.6.4, and psycopg2 for these exercises.
Squeezing and constricting…
Having performed a wholesale insert with the PostgreSQL
COPY command, server-side, (see this blog post) we will now look at a similar upload using Python3.6 and psycopg2.
Daily_Pipe_Tally_python.csv file is ready for use (giving
sed a rest) as shown below:
One option of performing this operation using Python and psycopg2 is present in the
copy_command.py script file depicted in the image below:
Lines 3 and 4 are import statements. The psycopg2 module is imported along with
constants.py, a user-defined Python module, that shares the same current directory as our example working file.
At line 7 a string variable named
staging_path is assigned to the absolute path of the
CSV data file being used for this upload. This string is prefixed with an
r meaning it’s a
raw string. Python
raw strings allow the backslash character to be treated as a literal character, where normally, this character is used for C style escaping. Linux, Unix, and Unix-like OS’s use the forward slash (
/) as their file hierarchy delimiter, while other OS’s use the backslash. I feel it’s good practice to make this variable a
raw string, allowing the script to be portable, should it be used on a different OS other than Linux or Unix.
The string variable,
SQL, on line 9 is assigned the
psycopg2 usable SQL statement for the
COPY command. Notice the
FROM STDIN contained in the string, which is required by the
pyscopg2 copy_expert() method called on line 26.
At line 13 the
connect_str variable is accessed via the dot operator (.), from that
constants.py file I imported, which is passed into the variable
conn (short for connection). The
connect() function starts up a database session and returns a connection.
connect_str variable encapsulates all of the server-side connection/login information (ie: database name, port #, host address, username, and password) that is required to connect to the PostgreSQL server. That assignment could look similar to the following example:
connect_str = "host=127.0.0.1 dbname=db_to_connect_to user=user_name_here port=port_number_here password=password_here"
cursor variable on line 15 is an object of the
psycopg2 cursor() method which is used to execute database commands. The
cursor() method is responsible for terminating transactions with either the
rollback() methods as well.
All of the connectivity operations are within a
try/except block to help with catching, reporting, and troubleshooting failed connection errors/issues should any arise.
main() function defined on line 22 contains another
try/except block that houses all of the
COPY command functionality executed by the
Let’s dig into the
copy_expert() method used in this block. Its syntax is as follows:
This method takes at least two parameters,
COPYstatement to execute.
- For exporting (
COPY TOoperations) a table to a file-object the
sqlstatement should have the form of
COPY table TO STDOUT.
- For importing (
COPY FROMoperations) the contents from a file-object to a table the
sqlstatement should have the form of
COPY table FROM STDIN.
- For exporting (
file: A file-like object to read or write.
Lines 25 through 28 are where the actual work is being done.
Line 25 demonstrates using a context manager to open the
Daily_Pipe_Tally_python.csv file via the
staging_path variable. Using the
with statement on the
open() function, Python will make sure and close this open file descriptor resource, when execution leaves this block of code. Yay and thanks, Python!
Line 26 is using the connected
cursor object to call the
copy_expert() method which has variables
f passed into it for parameters.
This operation is inserting data into the staging_assets table, hence the
commit() method needs to be called by the
conn object to complete the transaction.
Following best practices, I then call the
close() method on line 28 on the
cursor object to close off or sever that connection and release those resources being used by this connection, back to the operating system. This connection is now unusable and will raise errors if any operations attempt to use it.
main() function is called at line 35 and will be executed when the
copy_command.py script is executed in the terminal.
copy_command.py file, I then execute it in the Linux terminal:
(postgresfab) bigdaddy@LE2:~/Fab_Py_PostFiles$ python3.6 copy_command.py
Note: Prior to executing the script, make sure you have executable privileges for it. Also, that your database server is started, up, and running on your system.
No errors were returned so I will verify the count of records in the staging_assets table to confirm these 30 records were inserted:
Previously holding only 15 records, now with a total of 45, the psycopg2
copy_expert() method executed successfully.
Querying the staging_assets table for the columns for
'Joint-56122'returns all data for that asset, again confirming the operation was a success.
WHERE pipe_id = 'Joint-56122';
pipe_id | kind | pipe_jt_num | pipe_heat | pipe_length | pipe_wall_thickness | degree
Joint-56122 | PIPE_JOINT | 44566 | 9632 | 15.98 | 0.547 | 0.00
For a full rundown and any inquiries you may have about psycopg2, see their excellent documentation.
Although brief, interactions with Psycopg2 have piqued my interest in further exploring its application and uses. I hope you follow along, as future blog posts will detail my learning and discovery.
Thank you for taking the time to read this post. I truly hope you discovered something interesting and enlightening. Please share your findings here, with someone else you know who would get the same value out of it as well.
To receive notifications for the latest post from “Digital Owl’s Prose” via email, please subscribe by clicking the ‘Click To Subscribe!’ button in the sidebar!
Disclaimer: The examples presented in this post are hypothetical ideas of how to achieve similar types of results. They are not the utmost best solution(s). The majority, if not all, of the examples provided are performed on a personal development/learning workstation-environment and should not be considered production quality or ready. Your particular goals and needs may vary. Use those practices that best benefit your needs and goals. Opinions are my own.