PostgreSQL user-defined functions with PL/pgSQL for repetitive Database tasks.

Repetitive tasks can oftentimes become a pain. Why not avoid redundancy if we have the option to? In this post we will look at an example of writing our own function with the procedural language PLpgSQL, to help automate some of these mundane chores we may face daily, allowing us to be more productive in PostgreSQL.


Note: All data, names or naming found within the database presented in this post, are strictly used for practice, instruction and testing purposes. It by no means depicts actual data belonging to or being used by any party or organization.


I will be using Xubuntu Linux 16.04.2 LTS (Xenial Xerus), PostgreSQL 9.6.2 and pgAdmin for these exercises.

The data found in our database table will depict a mock pipe tally cut log sheet, used to track cuts to records of pipe. Let’s get a look at the entire table and see what we are working with:

whole_table

We can see we have 25 records in the cut_log table with the following columns: id, pipe_id, length, cuts, and new_lt. Suppose one of our daily tasks is to update this table, inputting any cuts for any joint of pipe that has a cut reported for it (perhaps from field crews or a fabrication site). Maybe we are supplied a list containing the pipe_id and length of cut. The PostgreSQL command for accomplishing this could look something like this:

single_update

We will break down this statement to see what is being executed here:

UPDATE cut_log
SET cuts = 2.1
WHERE pipe_id = 'Asset-1292;

UPDATE will change the values of the specified columns in all rows that satisfy the specified condition. We are instructing PostgreSQL to UPDATE the cut_log table, SET the value (the expression to assign to the specified column) 2.1 (our cuts value for this pipe) for the cuts column, WHERE (this is the specified condition) the pipe_id (column) has the pipe_id ‘Asset-1292’.

We are returned:

UPDATE 1

after our query, informing us we have successfully completed updating 1 record in the table. This count can be more than one if we updated multiple records, but in this case, we only updated one record. Let’s verify this with the following query:

check_single_update

Again the WHERE clause here is telling PostgreSQL to only return information about the record with the pipe_id ‘Asset-1292’ and we can see the cuts column now has the 2.1 value we inserted with the SET command in the previous query. We can go ahead and update the new_lt column as well since it has now changed due to 2.1 cut value.

set_new_lt_postgres

Here we are using the SET command again to make the new_lt column, for pipe_id ‘Asset-1292’ be equal to the value of the length column, subtracted from the cuts column. Again, UPDATE 1 informs us we updated a record in the table so let’s have a look:

see_single_update

That all worked out just fine. But wait a minute. Perhaps we made an error when checking our form or report for updating this table, to discover the cuts value for this record should be 2.5, not 2.1. We need to fix this real quick. Can we correct both the cuts and new_lt column at the same time? Let’s see:

update_2_at_once

And the results?

check_two_for_one_update

Yes, we can update both columns in one query. This is not too terribly burdensome for a table with 25 records in it. But what if we had a table with 250 records in it? Instead of retyping this UPDATE statement over and over, we can automate this tasks and reuse it in the future anytime we need to perform this operation by writing a user-defined function in PL/pgSQL to take care of some of this heavy lifting for us.

PL/pgSQL, is a fully featured programming language. It allows much more procedural control than SQL. Also enabling us to use loops and other control structures should we choose or need to (this example will not feature any loops or control structures).

A function can be thought of as a type of procedure or routine. Functions may or may not return a value. In our example, we will create a function of a grouping of common variables and SQL statements, to help automate and carry out recurring tasks we need to perform on our database data. For brevity, this will not be an in-depth discussion on functions but more of a useful example of one with needed highlights for an explanation.

To accomplish the task of updating our cut_log table, we could write a PL/pgSQL function like this example:

cut_log_func

There is a lot here to digest so we will break it down step by step to try to understand what is taking place.

On line 1 we have:

 CREATE OR REPLACE FUNCTION cut_input(jt_id VARCHAR(25), cut_amt REAL)

The CREATE OR REPLACE FUNCTION statement does just that. It either creates a new function or replaces an existing one. The name of the new function must not match any existing function with the same input argument or parameter types in the same table (schema).  The

cut_input(jt_id VARCHAR(25), cut_amt REAL)

portion of this line means we are naming our function cut_input (a good practice is to avoid naming any user-defined functions with the same name as language-specific, provided functions to avoid confusion) that takes two arguments or parameters, jt_id and cut_amt. These arguments basically mean that whatever we will pass into this function is what this function will be performing/carrying out operations or tasks on. In place of jt_id we will specify something like ‘Asset-2’. In place of cut_amt we will input whatever the amount of the length of the cut to update. When creating the function, we must specify the data types of the parameters. VARCHAR(25) is a variable-length with a limit, character type of length 25 (ie…can hold up to 25 characters) and REAL is a decimal value number type. Calling (using) our function would look something like this:

SELECT cut_input('Pipe-3939', 3.50);

Line 2 has

RETURNS TEXT AS $$

This is where we specify the return type of our function. This function returns the type TEXT. Next is:

DECLARE
my_row cut_log%ROWTYPE;

We must declare any variables used in the block in this section. We have declared the variable my_row of type cut_log%ROWTYPE, which is the name of our table, cut_log. This my_row variable is of type %ROWTYPE. Quoting the online documentation,  “Such a variable can hold a whole row of a SELECT or FOR query result, so long as that query’s column set matches the declared type of the variable.”

 In this next section of our function, a lot of powerful operations are taking place. Let’s examine it:

BEGIN
    EXECUTE 
        'UPDATE cut_log 
        SET cuts = $2, new_lt = length - $2 
        WHERE pipe_id = $1'
        USING jt_id, cut_amt;
    
    EXECUTE 
        'SELECT * FROM cut_log
        WHERE pipe_id = $1'
        INTO my_row
        USING jt_id;
        RETURN 'pipe_id: '||my_row.pipe_id||' updated successfully.';
END

Note: Quoting the online documentation, “PL/pgSQL‘s BEGIN/END are only for grouping; they do not start or end a transaction.”  In essence, we use the BEGIN/END for grouping in PL/pgSQL.

Just after the first EXECUTE statement should look somewhat familiar. This is basically the same previous UPDATE statement we performed earlier before deciding to write this function. There are some differences, however. First, the $1 and $2 were not included in those previous UPDATE statements. These are identifiers for the parameters we passed to our functions, with the $1 representing the first argument (jt_id) and the $2 representing the second argument (cut_amt). These are then available for our use in this section as can be seen with the:

USING jt_id, cut_amt;

line (although we can use other means as ‘placeholders’ for our parameters, they are not discussed here). In short, anyplace in this statement (also known as a command string here due to the EXECUTE statement), there is a $1, that means whatever value we passed in as our first argument or parameter. Likewise for the $2. These are the values supplied in the USING clause.

After the second EXECUTE statement, we see the query:

'SELECT * FROM cut_log 
WHERE pipe_id = $1' 
INTO my_row
USING jt_id;

Pretty straightforward for the most part. Similar to our earlier queries, we are selecting everything from the cut_log table where the pipe_id is the value of the jt_id function parameter we passed in (again the placeholder $1 is here) and is supplied in this section by the USING clause. But what is this INTO my_row clause we are seeing? INTO specifies where a SQL command, that is returning rows, should be assigned. In our case, we are assigning the results of the SELECT query here to the my_row variable, which we declared as type %ROWTYPE in the DECLARE section of the function. This variable now holds the information of the row returned from the SELECT query and we can access the values of the row with the . (dot) notation by using row_variable.field syntax to access the column for the returned row. We see that in the line:

RETURN 'pipe_id: '||my_row.pipe_id||' updated successfully.';

Here we are retrieving the value of the cut_log column pipe_id, from the returned row, of the query results. Notice we are specifying this column or field exclusively and could very easily access any other columns from our table for this row result. We are also RETURNING our return value (TEXT) back to the function. The string within the single quotes uses the concatenation operator || ( in this post I briefly touch on this operator) to join together the pipe_id with the rest of the words enclosed in single quotes to display a message indicating the jt_id parameter we passed into our function has been updated. Now to briefly examine these EXECUTE statements. Whenever we want our PL/pgSQL functions to generate dynamic commands (ie.. commands involving different tables or data types each time they are executed), we must use the EXECUTE statement.
Now let’s use our cut_input function like so:

trying_our_func

‘Joint-191A’ is the jt_id parameter (1st) and 3.55 is the cut_amt parameter (2nd) required by the cut_input function.

We see the message displaying from the RETURN statement, informing us that ‘Joint-191A’ was successfully updated.

We can verify our results to make sure the function worked as expected:

verify new func

‘Joint-191A’ has a cuts value of 3.55 recorded and 64.87 for the new_lt value. Our function works as expected.

Some final thoughts concerning the function we created here may be what are the ‘$$‘ symbols for and what is the meaning of LANGUAGE plpgsql?

The ‘$$‘ used like this is called dollar quotingAs explained, quoting the documentation, “While the standard syntax for specifying string constants is usually convenient, it can be difficult to understand when the desired string contains many single quotes or backslashes since each of those must be doubled. To allow more readable queries in such situations, PostgreSQL provides another way, called “dollar quoting”, to write string constants. A dollar-quoted string constant consists of a dollar sign ($), an optional “tag” of zero or more characters, another dollar sign, an arbitrary sequence of characters that makes up the string content, a dollar sign, the same tag that began this dollar quote, and a dollar sign.” In other words, dollar quoting enables us to write ‘Some string query like this’, vs having to escape all single quotes (or backslashes) with another single quote like so: ”Some string query like this”

As for LANGUAGE plpgsql, this is the language the function is implemented in. In our case, PL/pgSQL.

Although relatively dense, this post example is by no means the ‘be all end all’ of PL/pgSQL functions. We can and should strive to write better functions with some checks in place. One added functionality is we could implement measures to confirm that the pipe_id value passed to the cut_input function is present in our table and act accordingly if not. Another check that should be included is what if someone entered an incorrect value for the cut_amt parameter? Perhaps tried to enter in a zero or negative value? These and many other best practices can be found in quality, well-written functions. Those implementations will be a topic and the subject of a future post. The idea behind this post is to present an example of user-defined functions we can customize and write ourselves to suit and benefit our particular needs.

As we can see, PL/pgSQL offers powerful features we can utilize in PostgreSQL. We have but scratched the surface of the capabilities of this procedural language.

I hope that you find something interesting and meaningful in this post. I always look forward to and welcome any questions or comments.

Thank you all so much for taking the time to read this.

To receive notifications for the latest post from this blog via email, please subscribe by clicking the ‘Follow Digital Owl’s Prose’ button in the sidebar!


Disclaimer: The examples presented in this post are hypothetical ideas of how to achieve similar types of results. They are not the utmost best solution(s). The majority, if not all, of the examples provided are performed on a personal development/learning workstation-environment and should not be considered production quality or ready. Your particular goals and needs may vary. Use those practices that best benefit your needs and goals. Opinions are my own.

Edited 10/09/17
07/08/2018

2 thoughts on “PostgreSQL user-defined functions with PL/pgSQL for repetitive Database tasks.

Hey thanks for commenting! Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.