SUBSTRING_INDEX() function in MySQL – with examples.

I recently used the MySQL SUBSTRING_INDEX(), function for a requirement in a PHP/MySQL reporting dashboard I am developing, and thought to write about SUBSTRING_INDEX() general usage. Although this post has arbitrary examples, I’m sure you can find a use for this function in specific cases. Continue reading and see basic queries including SUBSTRING_INDEX in the SELECT clause…

paper with black text on it
Photo by Denny Müller on Unsplash

OS, Software, and DB used:

  • OpenSuse Leap 15.1
  • MySQL 8.0.19


Self-Promotion:

If you enjoy the content written here, by all means, share this blog and your favorite post(s) with others who may benefit from or like it as well. Since coffee is my favorite drink, you can even buy me one if you would like!


SUBSTRING_INDEX() accepts 3 arguments:

  1. A string to search.
  2. A delimiter to search on.
  3. A number of occurrences of the specified delimiter. In other words, which occurrence of the delimiter to split on should multiple be present in the string to search.

I will use a simple table, ‘file_names’ for the examples throughout the post:

1
2
3
4
5
6
7
 MySQL  localhost:33060+ ssl  learning  SQL > DESC file_names;
+-------------+------+------+-----+---------+-------+
| Field       | Type | Null | Key | Default | Extra |
+-------------+------+------+-----+---------+-------+
| target_file | text | YES  |     | NULL    |       |
+-------------+------+------+-----+---------+-------+
1 row in set (0.0028 sec)

1
2
3
4
5
6
7
8
9
10
 MySQL  localhost:33060+ ssl  learning  SQL > SELECT * FROM file_names;
+-------------------+
| target_file       |
+-------------------+
| NAME-1-A_02202020 |
| NAME-2-A_02212020 |
| NAME3-A_02212020  |
| NAME-4-B_02192020 |
+-------------------+
4 rows in set (0.0007 sec)

Table ‘file_names’ stores file names data (not the actual files), which is where I applied SUBSTRING_INDEX() functionality. In my particular requirement, the files are named with a certain naming scheme. Based on that naming, I know what project certain data points belong to.

However, in order to determine which project a file does in fact belong to, I must extract (not in the literal sense) a portion of text based on the location of a delimiter (an underscore (_) in this instance).

Moving forward…

In this first example query, I split the file name on the first dash (-) occurrence, as specified by the number ‘1’ in the third argument position:

1
2
3
4
5
6
7
8
9
10
11
 MySQL  localhost:33060+ ssl  learning  SQL > SELECT target_file, SUBSTRING_INDEX(target_file, '-', 1) AS sub_split
                                           -> FROM file_names;
+-------------------+-----------+
| target_file       | sub_split |
+-------------------+-----------+
| NAME-1-A_02202020 | NAME      |
| NAME-2-A_02212020 | NAME      |
| NAME3-A_02212020  | NAME3     |
| NAME-4-B_02192020 | NAME      |
+-------------------+-----------+
4 rows in set (0.0009 sec)

The query results show the returned substring contains all characters preceding the first occurrence of the dash delimiter.

Are you wondering if you can split on another occurrence of the delimiter character?

In this query, I simply change the number of delimiter occurrences to 2, therefore splitting on the second instance of the dash:

1
2
3
4
5
6
7
8
9
10
11
 MySQL  localhost:33060+ ssl  learning  SQL > SELECT target_file, SUBSTRING_INDEX(target_file, '-', 2) AS sub_split
                                           -> FROM file_names;
+-------------------+------------------+
| target_file       | sub_split        |
+-------------------+------------------+
| NAME-1-A_02202020 | NAME-1           |
| NAME-2-A_02212020 | NAME-2           |
| NAME3-A_02212020  | NAME3-A_02212020 |
| NAME-4-B_02192020 | NAME-4           |
+-------------------+------------------+
4 rows in set (0.0008 sec)

Notice the third ‘target_file’ row in the results set does not contain a second dash in its name. In that case, the entire ‘target_file’ name is returned.

With so many two’s embedded in the ‘target_file’ names, let’s execute an example query where ‘2’ is the delimiter:

1
2
3
4
5
6
7
8
9
10
11
 MySQL  localhost:33060+ ssl  learning  SQL > SELECT target_file, SUBSTRING_INDEX(target_file, '2', 1) AS sub_split
                                           -> FROM file_names;
+-------------------+------------+
| target_file       | sub_split  |
+-------------------+------------+
| NAME-1-A_02202020 | NAME-1-A_0 |
| NAME-2-A_02212020 | NAME-      |
| NAME3-A_02212020  | NAME3-A_0  |
| NAME-4-B_02192020 | NAME-4-B_0 |
+-------------------+------------+
4 rows in set (0.0008 sec)

Splitting on the first occurrence of ‘2’, returns varied results. In particular, the 2nd row’s ‘target_file’ name has a 2 early on in the name, whereas the with the other row’s names, 2 occurs in the strings date portion later in the string.

Again, splitting on a different instance of the delimiter, returns varied results:

1
2
3
4
5
6
7
8
9
10
11
 MySQL  localhost:33060+ ssl  learning  SQL > SELECT target_file, SUBSTRING_INDEX(target_file, '2', 2) AS sub_split
                                           -> FROM file_names;
+-------------------+---------------+
| target_file       | sub_split     |
+-------------------+---------------+
| NAME-1-A_02202020 | NAME-1-A_02   |
| NAME-2-A_02212020 | NAME-2-A_0    |
| NAME3-A_02212020  | NAME3-A_02    |
| NAME-4-B_02192020 | NAME-4-B_0219 |
+-------------------+---------------+
4 rows in set (0.0008 sec)

Another direction…

In all of the examples shown so far, we have split on a specific delimiter using a positive number. However, SUSBSTRING_INDEX() accepts a negative number for the third argument as well.

As you have likely surmised, a positive number returns all characters to the left of the delimiter and starts with the specified delimiter from that direction. On the other hand, a negative number yields text from the right, also using the specified occurrence of the delimiter from that direction.

The below queries demonstrate using a negative number for the third argument:

1
2
3
4
5
6
7
8
9
10
11
 MySQL  localhost:33060+ ssl  learning  SQL > SELECT target_file, SUBSTRING_INDEX(target_file, '-', -1)
                                           -> FROM file_names;
+-------------------+---------------------------------------+
| target_file       | SUBSTRING_INDEX(target_file, '-', -1) |
+-------------------+---------------------------------------+
| NAME-1-A_02202020 | A_02202020                            |
| NAME-2-A_02212020 | A_02212020                            |
| NAME3-A_02212020  | A_02212020                            |
| NAME-4-B_02192020 | B_02192020                            |
+-------------------+---------------------------------------+
4 rows in set (0.0143 sec)

1
2
3
4
5
6
7
8
9
10
11
 MySQL  localhost:33060+ ssl  learning  SQL > SELECT target_file, SUBSTRING_INDEX(target_file, '2', -1)
                                           -> FROM file_names;
+-------------------+---------------------------------------+
| target_file       | SUBSTRING_INDEX(target_file, '2', -1) |
+-------------------+---------------------------------------+
| NAME-1-A_02202020 | 0                                     |
| NAME-2-A_02212020 | 0                                     |
| NAME3-A_02212020  | 0                                     |
| NAME-4-B_02192020 | 0                                     |
+-------------------+---------------------------------------+
4 rows in set (0.0008 sec)

1
2
3
4
5
6
7
8
9
10
11
 MySQL  localhost:33060+ ssl  learning  SQL > SELECT target_file, SUBSTRING_INDEX(target_file, '2', -2)
                                           -> FROM file_names;
+-------------------+---------------------------------------+
| target_file       | SUBSTRING_INDEX(target_file, '2', -2) |
+-------------------+---------------------------------------+
| NAME-1-A_02202020 | 020                                   |
| NAME-2-A_02212020 | 020                                   |
| NAME3-A_02212020  | 020                                   |
| NAME-4-B_02192020 | 020                                   |
+-------------------+---------------------------------------+
4 rows in set (0.0008 sec)

1
2
3
4
5
6
7
8
9
10
11
 MySQL  localhost:33060+ ssl  learning  SQL > SELECT target_file, SUBSTRING_INDEX(target_file, 'E', 1)
                                           -> FROM file_names;
+-------------------+--------------------------------------+
| target_file       | SUBSTRING_INDEX(target_file, 'E', 1) |
+-------------------+--------------------------------------+
| NAME-1-A_02202020 | NAM                                  |
| NAME-2-A_02212020 | NAM                                  |
| NAME3-A_02212020  | NAM                                  |
| NAME-4-B_02192020 | NAM                                  |
+-------------------+--------------------------------------+
4 rows in set (0.0008 sec)

1
2
3
4
5
6
7
8
9
10
11
 MySQL  localhost:33060+ ssl  learning  SQL > SELECT target_file, SUBSTRING_INDEX(target_file, 'E', -1)
                                           -> FROM file_names;
+-------------------+---------------------------------------+
| target_file       | SUBSTRING_INDEX(target_file, 'E', -1) |
+-------------------+---------------------------------------+
| NAME-1-A_02202020 | -1-A_02202020                         |
| NAME-2-A_02212020 | -2-A_02212020                         |
| NAME3-A_02212020  | 3-A_02212020                          |
| NAME-4-B_02192020 | -4-B_02192020                         |
+-------------------+---------------------------------------+
4 rows in set (0.0008 sec)

Can you use the SUBSTRING_INDEX() function in any of your particular queries? Tell me all about it in the comments. Thanks for reading!

Feel free to comment below with any questions or suggestions you may have about the post.

Like what you have read? See anything incorrect? Please comment below and thanks for reading!!!

A Call To Action!

Thank you for taking the time to read this post. I truly hope you discovered something interesting and enlightening. Please share your findings here, with someone else you know who would get the same value out of it as well.

Visit the Portfolio-Projects page to see blog post/technical writing I have completed for clients.

Have I mentioned how much I love a cup of coffee?!?!

To receive email notifications (Never Spam) from this blog (“Digital Owl’s Prose”) for the latest blog posts as they are published, please subscribe (of your own volition) by clicking the ‘Click To Subscribe!’ button in the sidebar on the homepage! (Feel free at any time to review the Digital Owl’s Prose Privacy Policy Page for any questions you may have about: email updates, opt-in, opt-out, contact forms, etc…)

Be sure and visit the “Best Of” page for a collection of my best blog posts.


Josh Otwell has a passion to study and grow as a SQL Developer and blogger. Other favorite activities find him with his nose buried in a good book, article, or the Linux command line. Among those, he shares a love of tabletop RPG games, reading fantasy novels, and spending time with his wife and two daughters.

Disclaimer: The examples presented in this post are hypothetical ideas of how to achieve similar types of results. They are not the utmost best solution(s). The majority, if not all, of the examples provided, is performed on a personal development/learning workstation-environment and should not be considered production quality or ready. Your particular goals and needs may vary. Use those practices that best benefit your needs and goals. Opinions are my own.

Hey thanks for commenting! Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.