Regular Expressions in Oracle (pl/)SQL
|
|
|
|
|
  OraDev.com
|
Regular Expressions in Oracle SQL
Since Oracle 10g you can use regular expressions in the database. Regular expressions enable you to search for patterns
in string data by using standardized syntax conventions. You specify a regular expression through the following
types of characters:
-Metacharacters, which are operators that specify search algorithms
-Literals, which are the characters for which you are searching
Regular Expressions Functions:
Function | Description | Syntax |
REGEXP_LIKE |
This is like the LIKE operator, but much more powerfull.
|
REGEXP_LIKE(search_string, pattern [,match_option])
|
REGEXP_INSTR |
This functions searches for a regular expression pattern in a string and returns the position of the first character of the
string that matches the expression.
|
REGEXP_INSTR(search_string, pattern [, position [, occurrence
[, return_option [, match_option]]]])
|
REGEXP_REPLACE |
This functions searches for a regular expression pattern in a string and replaces this with a replacement string.
|
REGEXP_REPLACE(search_string, pattern [,replacestring [, position
[, occurrence [, match_option]]]])
|
REGEXP_SUBSTR |
This function you can use to search for a regular expression pattern and return the matched substring.
|
REGEXP_SUBSTR(search_string, pattern [, position
[, occurrence [, match_option]]])
|
REGEXP_COUNT |
counts the number of times a pattern appears in a string.
|
REGEXP_COUNT(search_string, pattern [,position] [,match_option])
|
with:
search_string: the string you are searching in.
pattern: A regular expression. (max 512 bytes).
position: The character position from which you want the search to begin.
occurrence: The occurence you are searching for. If you want to return the second match for the regular expression, this is a 2.
return_option: 0 means return the pattern's beginning position. 1 means return the ending character position.
match_option can have one of these values:
'c': case-sensitive (default) ;
'i': case-insensitive ;
'n': Allow match-any-character operator (.) ;
'm': Treat source string as multiple line. For every line ^ and $ will be used for the beginning and ending of the line.
Example: select * from emp where REGEXP_LIKE(name,'A.{2,5))
means select rows from emp table where name starts with an A followed by 2 to 5 chararcters.
Metacharacters
POSIX (Standard) Metacharacters
The POSIX standard for matching ASCII data is used in other languages also. Here is an overview
of the POSIX metacharacters used in Oracle:
Operator | Description |
* |
Matches zero or more occurrences
|
+ |
Matches one or more occurrences
|
? |
Matches zero or one occurrence
|
. |
Matches any character in the supported character set except NULL
|
| |
Alternation operator (OR) for specifying alternative matches
|
^ |
Matches the beginning of a string by default. In multiline mode, it matches the beginning of any line anywhere within the source string.
|
$ |
Matches the end of a string by default. In multiline mode, it matches the end of any line anywhere within the source string.
|
\ |
Escape character. Use a backslash to search for a character that is normally threated as a metacharacter.
|
[ ] |
Bracket expression for specifying a matching list that should match any one of the expressions represented in the list.
|
[^ ] |
Non matching character list. Matches none of the expressions represented in the list.
|
( ) |
Grouping expression, treated as a single subexpression
|
{m} |
Matches exactly m times
|
{m,} |
Matches at least m times
|
{m,n} |
Matches at least m times but no more than n times
|
\n |
The backreference expression (n is a digit between 1 and 9) matches the nth subexpression enclosed between '(' and ')' preceding the \n
|
[..] |
Specifies one collation element, and can be a multicharacter element (for example, [.ch.] in Spanish)
|
[: :] |
Specifies POSIX character classes. It matches any character within the character class. This can be:
[:alnum:] Alphanumeric characters
[:alpha:] Alphabetic characters
[:ascii:] ASCII characters
[:blank:] Space and tab
[:cntrl:] Control characters
[:digit:] Digits
[:graph:] Visible characters
[:lower:] Lowercase letters [a-z]
[:print:] Visible characters and spaces
[:punct:] Punctuation and symbols.
[:space:] All whitespace characters, including line breaks
[:upper:] Uppercase letters
[:word:] Word characters
[:xdigit:] Hexadecimal digits
Character classes are supported only within bracket expressions. So [:blank]{10} does not work, but [[:blank:]]{10} is ok.
|
[==] |
Specifies equivalence classes. For example, [=a=] matches all characters having base letter 'a'.
|
Perl has expressions that are not in the POSIX standard. Because these expressions are widly used, they are also implemented in Oracle.
Operator | Description |
\d |
A digit character. It is equivalent to the POSIX class [[:digit:]].
|
\D |
A nondigit character. It is equivalent to the POSIX class [^[:digit:]].
|
\w |
A word character, which is defined as an alphanumeric or underscore (_) character. It is equivalent to the POSIX class [[:alnum:]_].
|
\W |
A nonword character. It is equivalent to the POSIX class [^[:alnum:]_].
|
\pP |
Names properties. This is no POSIX equivalent class.
|
\s |
A whitespace character. It is equivalent to the POSIX class [[:space:]] except that [:space:] includes vertical tab and \s does not.
|
\S |
A nonwhitespace character. It is equivalent to the POSIX class [^[:space:]].
|
\A |
Only at the beginning of a string. In multi-line mode, that is, when embedded newline characters in a string are considered the termination of a line, \A does not match the beginning of each line.
|
\Z |
Only at the end of string or before a newline ending a string. In multi-line mode, that is, when embedded newline characters in a string are considered the termination of a line, \Z does not match the end of each line.
|
z |
Only at the end of a string.
|
*? |
The preceding pattern element 0 or more times ("nongreedy"). This quantifier matches the empty string whenever possible.
|
+? |
The preceding pattern element 1 or more times ("nongreedy").
|
?? |
The preceding pattern element 0 or 1 time ("nongreedy"). This quantifier matches the empty string whenever possible.
|
{n}? |
The preceding pattern element exactly n times ("nongreedy"). In this case {n}? is equivalent to {n}.
|
{n,}? |
The preceding pattern element at least n times ("nongreedy").
|
{n,m}? |
At least n but not more than m times ("nongreedy"). {0,m}? matches the empty string whenever possible.
|
Regular expressions in constraints
Regular expressions can be used to enforce constraints on a database table.
For example:
alter table employee
add CONSTRAINT emp_ck1
CHECK (REGEXP_LIKE (telephone_nr, '\d{0,4}[-]?\d{6,103}'));
|
|
|
|
|
|