Regular expressions in PostgreSQL with regexp_match() – With examples.

Regular expressions are somewhat new to me in the sense of, I know what they are, it’s just I have never had to use them. I’ve always gotten by using the LIKE search pattern. However, I wanted to replicate a particular UPDATE in PostgreSQL that I carried-out in MS Access (not with regex’s there either) and discovered a need for basic regex search and match in order to accomplish it. Let’s visit and learn about the regexp_match() function and how I used it…

zoomed-in-picture-of-dictionary
Photo by Romain Vignes on Unsplash

Note: All data, names or naming found within the database presented in this post, are strictly used for practice, learning, instruction, and testing purposes. It by no means depicts actual data belonging to or being used by any party or organization.

OS and DB used:
  • Xubuntu Linux 18.04.2 LTS (Bionic Beaver)
  • PostgreSQL 11.5


Self-Promotion:

If you enjoy the content written here, by all means, share this blog and your favorite post(s) with others who may benefit from or like it as well. Since coffee is my favorite drink, you can even buy me one if you would like!

What exactly is a regular expression (or regexp as it is also called)?

Here is a portion of the Wikipedia definition:

“A regular expression, regex or regexp[1] (sometimes called a rational expression)[2][3] is a sequence of characters that define a search pattern.”

Now that we have an idea of what a regexp is, let’s see the table and data for this example.

The table structure I am using is relatively simple, having 3 columns: ‘asset_name’, ‘has_parent’, and ‘parent_id’.

1
2
3
4
5
6
7
learning=> \d pipe_asset;
               Table "public.pipe_asset"
   Column   |  Type   | Collation | Nullable | Default
------------+---------+-----------+----------+---------
 asset_name | text    |           |          |
 has_parent | boolean |           |          | false
 parent_id  | text    |           |          |

I’ll use the psql \pset meta-command and set NULL to something a bit more visible and vibrant for this session:

1
2
learning=> \pset null *##*
Null display is "*##*".

Table ‘pipe_asset’ has the following records:

1
2
3
4
5
6
7
8
learning=> TABLE pipe_asset;
 asset_name | has_parent | parent_id
------------+------------+-----------
 34343      | f          | *##*
 12143A     | f          | *##*
 78932-A    | f          | *##*
 18431R     | f          | *##*
(4 rows)

At this point, columns ‘has_parent’ and ‘parent_id’ have their default values: FALSE and the NULL marker respectively.

Let’s take a moment and discuss the significance of the table columns. In the world of Energy pipelines, the pieces of pipe used to construct the line, are often cut for any number of reasons. When these pipe assets are cut, that essentially creates a new piece of pipe – also called a PUP for ‘Partial Unit of Pipe’ – that must be tracked and accounted for.

Typically pipe assets are labeled with some form of identification (I’ve mostly been exposed to numerical naming conventions in my career) and when a piece is cut, it is typically renamed with an Alphabet character tacked on the end of the original name.

Therefore, table ‘pipe_asset’ is implemented for this purpose, storing cut pipe assets with the corresponding origin pipe name from which is was cut.

Imagine records are loaded into this table from a CSV file and my daily requirement is to update this table reflecting a TRUE value for column ‘has_parent’ along with the ‘parent_id’ for those pipe assets that have been cut.

So, a successful UPDATE must: SET the ‘has_parent’ column to TRUE along with the ‘parent_id’ column to the name of the original parent pipe, all based off of the ‘asset_name’ values ending letter.

This post will focus on the regexp portion of the query and future post(s) will cover the actual UPDATE and logic.


Do you need to load CSV data into your PostgreSQL database tables? Have a look at these blog posts I wrote using both \copy and COPY with simple examples:

Among the plethora of pattern matching and regexp functionality Postgres supports, I settled on the regexp_match function for this requirement. At this time, it makes the most sense to me. To make it perfectly clear, I am not a regexp expert by any means. I more than welcome suggestions for better implementations (and any uses of regexp’s in general) via the comments below so do share!

Here is a portion of the description from the documentation for regexp_match:

“The regexp_match function returns a text array of captured substring(s) resulting from the first match of a POSIX regular expression pattern to a string. It has the syntax regexp_match(string, pattern [, flags ]). If there is no match, the result is NULL.”

Let’s see an example query using regexp_match and gain a sense of what it does:

1
2
3
4
5
6
7
8
learning=> SELECT asset_name, (regexp_match(RIGHT(asset_name,1),'[A-Z]')) FROM pipe_asset;
 asset_name | regexp_match
------------+--------------
 34343      | *##*
 12143A     | {A}
 78932-A    | {A}
 18431R     | {R}
(4 rows)

For starters, I like the fact that if there is not a match found in the string for the pattern argument, NULL is returned as is the case in the first record with ‘asset_name’ ‘34343. The call to RIGHT(asset_name,1) supplies the search string for the regexp pattern, which is '[A-Z], meaning, if the right-most character is a capital letter, there is a match.

To see a better visual representation, I’ll include that right-most character as a column in the SELECT query:

1
2
3
4
5
6
7
8
9
10
11
SELECT asset_name,
(regexp_match(RIGHT(asset_name,1),'[A-Z]')),
RIGHT(asset_name,1)
FROM pipe_asset;
 asset_name | regexp_match | right
------------+--------------+-------
 34343      | *##*         | 3
 12143A     | {A}          | A
 78932-A    | {A}          | A
 18431R     | {R}          | R
(4 rows)
(I have written posts on both LEFT() and RIGHT() string functions if you are interested in more about them.)

You may have noticed that in row 3, there is a dash (-) between the ending numerical character (Talk about an oxymoron!) and the ending ‘A’. Although that row still matches, this could become an issue if using the original name as part of the desired UPDATE. We can account for this with a small adjustment to the regexp.

Notice the difference between the 2 queries below. The regexp pattern, ‘-[A-Z]’, requires that the 2 rightmost characters extracted with RIGHT() must be a dash (-), followed by any capital letter.

1
2
3
4
5
6
7
8
learning=> SELECT asset_name, (regexp_match(RIGHT(asset_name,2),'-[A-Z]')) FROM pipe_asset;
 asset_name | regexp_match
------------+--------------
 34343      | *##*
 12143A     | *##*
 78932-A    | {-A}
 18431R     | *##*
(4 rows)

A noble attempt no doubt. But, this version of the regexp returns a false positive (I’m on a roll here folks with the oxymoron’s right!). In the instance that there is, a dash preceding the ending letter (if it exists), that works great. But, what about the other 2 records in which the ‘asset_name’ column value does end with a letter but no dash is present? We do want to know and UPDATE those accordingly as well. Plainly seen above, those 2 rows returned NULL and technically will cause the UPDATE to miss potential rows that should be changed.

In order to resolve this, I need the preceding dash to be optional.

1
2
3
4
5
6
7
8
learning=> SELECT asset_name, (regexp_match(RIGHT(asset_name,2),'-?[A-Z]')) FROM pipe_asset;
 asset_name | regexp_match
------------+--------------
 34343      | *##*
 12143A     | {A}
 78932-A    | {-A}
 18431R     | {R}
(4 rows)

By incorporating the question mark (?) quantifier, the dash (-) is now optional, allowing for rows with and without it, to make the query results if the ‘asset_name’ column value ends with a letter. (Just FYI: Should the ending character letter be lower-case, changing the regexp to ‘-?[a-zA-Z]’ will catch and account for them as well, yielding the same results had it been upper-case.)

The next task will be to use regexp_match() and the regexp as part of the UPDATE in order to modify the correct row(s). That will be covered in the next post so check back in when it drops!

Like what you have read? See anything incorrect? Please comment below and thanks for reading!!!

A Call To Action!

Thank you for taking the time to read this post. I truly hope you discovered something interesting and enlightening. Please share your findings here, with someone else you know who would get the same value out of it as well.

Visit the Portfolio-Projects page to see blog post/technical writing I have completed for clients.

Have I mentioned how much I love a cup of coffee?!?!

To receive email notifications (Never Spam) from this blog (“Digital Owl’s Prose”) for the latest blog posts as they are published, please subscribe (of your own volition) by clicking the ‘Click To Subscribe!’ button in the sidebar on the homepage! (Feel free at any time to review the Digital Owl’s Prose Privacy Policy Page for any questions you may have about: email updates, opt-in, opt-out, contact forms, etc…)

Be sure and visit the “Best Of” page for a collection of my best blog posts.


Josh Otwell has a passion to study and grow as a SQL Developer and blogger. Other favorite activities find him with his nose buried in a good book, article, or the Linux command line. Among those, he shares a love of tabletop RPG games, reading fantasy novels, and spending time with his wife and two daughters.

Disclaimer: The examples presented in this post are hypothetical ideas of how to achieve similar types of results. They are not the utmost best solution(s). The majority, if not all, of the examples provided, is performed on a personal development/learning workstation-environment and should not be considered production quality or ready. Your particular goals and needs may vary. Use those practices that best benefit your needs and goals. Opinions are my own.

Advertisements

Hey thanks for commenting! Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.