Pattern matching string functions v16

EDB Postgres Advanced Server offers support for the REGEXP_COUNT, REGEXP_INSTR, and REGEXP_SUBSTR functions. These functions search a string for a pattern specified by a regular expression and return information about occurrences of the pattern in the string. Use a POSIX-style regular expression. For more information about forming a POSIX-style regular expression, see the PostgreSQL core documentation.

REGEXP_COUNT

REGEXP_COUNT searches a string for a regular expression and returns a count of the times that the regular expression occurs. The signature is:

INTEGER REGEXP_COUNT
(
  srcstr    TEXT,
  pattern   TEXT,
  position  DEFAULT 1
  modifier  DEFAULT NULL
)

Parameters

srcstr

srcstr specifies the string to search.

pattern

pattern specifies the regular expression REGEXP_COUNT searches for.

position

position is an integer value that indicates the position in the source string where REGEXP_COUNT begins searching. The default value is 1.

modifier

modifier specifies values that control the pattern-matching behavior. The default value is NULL. For a complete list of the modifiers supported by EDB Postgres Advanced Server, see the PostgreSQL core documentation.

Example

In this example, REGEXP_COUNT returns a count of the number of times the letter i is used in the character string 'reinitializing':

edb=# SELECT REGEXP_COUNT('reinitializing', 'i', 1) FROM DUAL;
Output
 regexp_count
--------------
            5
(1 row)

The command instructs REGEXP_COUNT to begin counting in the first position. If you modify the command to start the count on the sixth position, REGEXP_COUNT returns 3:

edb=# SELECT REGEXP_COUNT('reinitializing', 'i', 6) FROM DUAL;
Output
 regexp_count
--------------
            3
(1 row)

The count now excludes any occurrences of the letter i that occur before the sixth position.

REGEXP_INSTR

REGEXP_INSTR searches a string for a POSIX-style regular expression. This function returns the position in the string where the match was located. The signature is:

INTEGER REGEXP_INSTR
(
  srcstr        TEXT,
  pattern       TEXT,
  position      INT  DEFAULT 1,
  occurrence    INT  DEFAULT 1,
  returnparam   INT  DEFAULT 0,
  modifier      TEXT DEFAULT NULL,
  subexpression INT  DEFAULT 0,
)

Parameters

srcstr

srcstr specifies the string to search.

pattern

pattern specifies the regular expression that REGEXP_INSTR searches for.

position

position specifies an integer value that indicates the start position in a source string. The default value is 1.

occurrence

occurrence specifies the match to return if more than one occurrence of the pattern occurs in the string to search. The default value is 1.

returnparam

returnparam is an integer value that specifies the location in the string for REGEXP_INSTR to return. The default value is 0. Specify:

  • 0 to return the location in the string of the first character that matches pattern.

  • A value greater than 0 to return the position of the first character following the end of the pattern.

modifier

modifier specifies values that control the pattern-matching behavior. The default value is NULL. For a complete list of the modifiers supported by EDB Postgres Advanced Server, see the PostgreSQL core documentation.

subexpression

subexpression is an integer value that identifies the portion of the pattern that's returned by REGEXP_INSTR. The default value of subexpression is 0.

If you specify a value for subexpression, you must include one or more sets of parentheses in the pattern that isolate a portion of the value you are searching for. The value specified by subexpression indicates the set of parentheses to return. For example, if subexpression is 2, REGEXP_INSTR returns the position of the second set of parentheses.

Example

In this example, REGEXP_INSTR searches a string that contains a phone number for the first occurrence of a pattern that contains three consecutive digits:

edb=# SELECT REGEXP_INSTR('800-555-1212', '[0-9][0-9][0-9]', 1, 1) FROM DUAL;
Output
 regexp_instr
--------------
            1
(1 row)

The command instructs REGEXP_INSTR to return the position of the first occurrence. If we modify the command to return the start of the second occurrence of three consecutive digits, REGEXP_INSTR returns 5. The second occurrence of three consecutive digits begins in the fifth position.

edb=# SELECT REGEXP_INSTR('800-555-1212', '[0-9][0-9][0-9]', 1, 2) FROM DUAL;
Output
 regexp_instr
--------------
            5
(1 row)

REGEXP_SUBSTR

The REGEXP_SUBSTR function searches a string for a pattern specified by a POSIX-compliant regular expression. REGEXP_SUBSTR returns the string that matches the pattern specified in the call to the function. The signature of the function is:

TEXT REGEXP_SUBSTR
(
  srcstr        TEXT,
  pattern       TEXT,
  position      INT  DEFAULT 1,
  occurrence    INT  DEFAULT 1,
  modifier      TEXT DEFAULT NULL,
  subexpression INT  DEFAULT 0
)

Parameters

srcstr

srcstr specifies the string to search.

pattern

pattern specifies the regular expression REGEXP_SUBSTR searches for.

position

position specifies an integer value that indicates the start position in a source string. The default value is 1.

occurrence

occurrence specifies the match returned if more than one occurrence of the search pattern occurs. The default value is 1.

modifier

modifier specifies values that control the pattern-matching behavior. The default value is NULL. For a complete list of the modifiers supported by EDB Postgres Advanced Server, see the PostgreSQL core documentation.

subexpression

subexpression is an integer value that identifies the portion of the pattern that's returned by REGEXP_SUBSTR. The default value of subexpression is 0.

If you specify a value for subexpression, you must include one or more sets of parentheses in the pattern that isolate a portion of the value being searched for. The value specified by subexpression indicates the set of parentheses to return. For example, if subexpression is 2, REGEXP_SUBSTR returns the value contained in the second set of parentheses.

Example

In this example, REGEXP_SUBSTR searches a string that contains a phone number for the first set of three consecutive digits:

edb=# SELECT REGEXP_SUBSTR('800-555-1212', '[0-9][0-9][0-9]', 1, 1) FROM DUAL;
Output
 regexp_substr
---------------
 800
(1 row)

It locates the first occurrence of three digits and returns the string (800). If we modify the command to check for the second occurrence of three consecutive digits, REGEXP_SUBSTR returns 555, the contents of the second substring.

edb=# SELECT REGEXP_SUBSTR('800-555-1212', '[0-9][0-9][0-9]', 1, 2) FROM DUAL;
Output
 regexp_substr
---------------
 555
(1 row)