Searching numerous columns of data for particular records is time-consuming and less than productive. In this post, I will demonstrate pattern matching examples using the powerful LIKE operator available in both traditional SQL and PostgreSQL. Along with briefly visiting the PostgreSQL keyword ILIKE as well. This post will be Part 1 in a planned 3 part series on pattern matching.
Note: All data, names or naming found within the database presented in this post, are strictly used for practice, instruction and testing purposes. It by no means depicts actual data belonging to or being used by any party or organization.
I will be using Xubuntu Linux 16.04.2 LTS (Xenial Xerus), PostgreSQL 9.6.2 and pgAdmin for these exercises.
The mock table and data being used in this post will depict a Pipeline Weld Id or Weld X-Ray log containing the columns: id, weld_id, tech_name and date_completed
For the purpose of this blog post, suppose part of your task is to peruse this weld_log table and perform any number of operations with the weld_id field. Perhaps you need to review differences in the naming of any repaired Weld Ids? In the past, these repairs have been labeled with an ‘R’ at the end of the designated name but the naming in this table is mixed differently.
Pouring over these records one at a time is not optimal. In this example table, there are only 19 rows but what if it was populated with perhaps 300 plus records? The time needed to search through all those records would dramatically increase. There is a much more efficient and better way. Especially when you are not 100 percent sure of a specific name or naming.
You can utilize the LIKE operator to simplify and speed up this type of query process tremendously. This first example will involve querying for those records that are labeled ending with an ‘R’, to get an idea of how many are found within this table, with that particular naming scheme. Using the LIKE operator you can determine that quickly with the following query:
Using this query results in the following records being returned:
As defined here, the LIKE operator’s syntax is as follows:
string LIKE pattern [ESCAPE escape-character]
LIKE returns the record(s) of the string you want to search (the weld_id column value in this example) if the string matches the particular pattern you supply, to the expression. The pattern is surrounded by single quotes and utilizes the percent sign (%) character. When used in this context, the % character is known as a wildcard character. By placing the % wildcard in front of the ‘R’, the specified pattern matches all records for a weld_id ending with an ‘R’, where the ‘R’ is preceded by any number of other characters.
This type of query is quite flexible. Returning the records for a weld_id that begins with an ‘R’, followed by any number of other characters would look like this:
How many records have a weld_id column value that begins with ‘R’?
By specifying that the weld_id column value must begin with an ‘R’, then followed by the % wildcard character, which matches any number of other characters, this query returns two records that match as can be seen above.
POINT TO REMEMBER: The % wildcard character represents any number of characters to match.
What results would be returned if you used this query?
SELECT * FROM weld_log WHERE weld_id LIKE '%R%';
By placing the % wildcard character before and after the ‘R’, this query will return all records from the weld_id column that has an ‘R’ anywhere in the weld_id string, due to LIKE searching the entire string for the pattern.
You also have the option to be very specific with the patterns you supply using the % wildcard character. For example, suppose your naming convention has allowed the weld_id to contain ‘-Id-‘. To quickly locate those records, you can use a query like this:
But exercise caution and always be specific with the patterns you supply. You may think running a similar query such as below, would return the same results as above:
However, the results are surprising:
The first pattern specifies any number of characters leading up to a hyphen (-) which is followed by the letters ‘Id’ in turn, then followed by another hyphen (-) and finally any number of other characters following that hyphen (-).
But the pattern,
specifies any number of characters preceding the letters ‘Id’ followed by any number of characters thereafter. This is a huge difference. These types of queries demonstrate the power of the LIKE operator.
Take a close look at those most recent results. In particular, the record with id 11 and weld_id ‘Id-527R’. Why did ‘%Id%’ return that record with the others? It does contain ‘Id’ but ‘Id’ is not preceded by any other characters. ‘Id’ is at the beginning of the string. So what gives? The answer is the % wildcard character preceding the ‘Id’.
To expand, I will run the query without the preceding % wildcard:
and it returns,
which specifies the weld_id string must begin with (due to the absence of the % wildcard character) ‘Id’ and is then followed by any number of other characters.
Moving along, there is another wildcard character that can be used in the string pattern supplied to be searched. The underscore (_). Imagine a repaired weld_id is to be named ending with 4 numbers and an ‘R’. The technician catches a mistake a day late and informs you that an extra hyphen (-) was mistakenly placed between the four numbers and the ‘R’. How can you quickly search for those specific records and get them corrected and updated?
Specifying the % wildcard first, then 4 underscores preceding ‘-R’, the returned records match the pattern for any number of characters leading up to exactly 4 characters that are followed by a ‘-R’.
And the results:
show three records returned that match this specific pattern.
POINT TO REMEMBER: The _ wildcard character represents only one character to match.
Ready for a twist? The previous example showed all the weld_id columns that were named with any number of characters leading up to exactly 4 characters that preceded an ending ‘-R‘. We used the hypothetical situation that those records were named wrong. Suppose for a particular naming convention, that naming is how all repaired Weld Id’s should be defined. And those three returned records are the only ones in the table that are correct according to those specifications. What query could you use to locate any others that need to be structured exactly like that? By using the NOT operator in conjunction with the LIKE operator.
This query returns:
The results are profound and indicative of how using NOT combined with LIKE results in a very powerful query. This example use case would be excellent in discovering which records you essentially needed to be returned in the query. As referenced in the mock example, these records are the ones needing to be corrected and updated.
POINT TO REMEMBER: If a % or _ wildcard character is a part of the actual string pattern you are searching, both of them have to be ‘escaped’. Normally with the backslash ‘\’ character, which is the default escape character.
Example: ‘%TI-\%%’ would return any records that contained the pattern ‘TI-%’, preceded by any number of other characters and also followed by any number of other characters. By escaping the % here with the backslash, it holds it’s intended value. The percent value, not the % wildcard value.
Also very important to note, the LIKE operator is case-sensitive. For example:
are not the same search pattern to the LIKE operator due to the capitalized ‘ID’ in the second example.
If you need queries with the flexibility to be case-insensitive the PostgreSQL ILIKE keyword is available. An example of that type of query would look like this:
This query returns:
Utilizing ILIKE with the pattern to be searched, returns any records containing a match of ‘%-Id-%’ regardless of case.
Searching the pattern with lower-case letters in the query:
returns the exact same results:
LIKE operator important takeaways:
- With the LIKE operator, pattern matching is performed over the string in its entirety. If you want to find your search pattern anywhere within the whole string, the pattern you specify has to begin and end with a % wildcard character. (Refer to the previously discussed example above)
- If the string pattern does not contain a % or _ wildcard character, it is treated as a regular string. In these instances, LIKE mimics the equals operator.
- If a literal % or _ character is within the string to search, escape them with the default backslash ‘\’ escape character. By using the ESCAPE clause this escape character can be changed if needed. The syntax is below for those use cases:
string LIKE pattern [ESCAPE desired-escape-character-here]
- Is Case-sensitive
- Can be used in conjunction with the NOT operator.
ILIKE keyword important takeaways:
- Is Case-insensitive
- Is a PostgreSQL extension.
- Can be used in conjunction with the NOT operator.
Using LIKE and if needed ILIKE in search queries similar to those presented in this blog post, will help tremendously when trying to locate certain records or fields within large database tables. Especially in those cases when you are unaware of how data values are named. LIKE also works exceptionally well when you have only partial naming to use in your queries. I highly recommend experimenting with the LIKE operator in your search queries to fully benefit from its use.
Thank you for taking the time to read this post. I truly hope you discovered something interesting and enlightening. Please share your findings here, with someone else you know who would get the same value out of it as well.
To receive notifications for the latest post from this blog via email, please subscribe by clicking the ‘Follow Digital Owl’s Prose’ button in the sidebar!
Disclaimer: The examples presented in this post are hypothetical ideas of how to achieve similar types of results. They are not hard and fast rules or the utmost best solution(s). Your particular goals and needs may vary. Use those practices that best benefit your needs and goals.