pcre Syntax

The pcre keyword accepts standard Perl-compatible regular expression (PCRE) syntax. The following sections describe that syntax.

Tip

While this section describes the basic syntax you may use for PCRE, you may want to consult an online reference or book dedicated to Perl and PCRE for more advanced information.

Metacharacters

Metacharacters are characters that have special meaning within regular expressions. To match one of them as a literal character within a regular expression, you must “escape” it by preceding it with a backslash.
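The effect of escaping can be sketched with a short script. This is an illustration only: it uses Python's `re` module, which is not the PCRE library but treats the period and backslash the same way, and the sample strings are invented for the example.

```python
import re

# Unescaped, the period is a metacharacter and matches any character.
unescaped = re.compile(r"1.5")
# Escaped, the period matches only a literal period.
escaped = re.compile(r"1\.5")

# Hypothetical sample strings, chosen only to show the difference.
samples = ["1.5", "1x5"]
unescaped_hits = [s for s in samples if unescaped.fullmatch(s)]
escaped_hits = [s for s in samples if escaped.fullmatch(s)]

print(unescaped_hits)  # ['1.5', '1x5']
print(escaped_hits)    # ['1.5']
```

Without the backslash, the pattern also accepts 1x5, which is rarely what a rule that looks for a literal period intends.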

The following table describes the metacharacters you can use with PCRE and gives examples of each.

PCRE Metacharacters

| Metacharacter | Description | Example |
| --- | --- | --- |
| `.` | Matches any character except a newline. If `s` is used as a modifying option, it also matches newline characters. | `abc.` matches abcd, abc1, abc#, and so on. |
| `*` | Matches zero or more occurrences of the preceding character or expression. | `abc*` matches ab, abc, abcc, abccc, and so on. |
| `?` | Matches zero or one occurrence of the preceding character or expression. | `abc?` matches ab or abc. |
| `+` | Matches one or more occurrences of the preceding character or expression. | `abc+` matches abc, abcc, abccc, abccccc, and so on. |
| `()` | Groups expressions. | `(abc)+` matches abc, abcabc, abcabcabc, and so on. |
| `{}` | Specifies how many times the preceding character or expression must match. To set a lower and upper limit, separate the lower limit and upper limit with a comma. | `a{4,6}` matches aaaa, aaaaa, or aaaaaa. `(ab){2}` matches abab. |
| `[]` | Defines a character class, which matches any single character in the set. | `[abc123]` matches a, b, c, 1, 2, or 3. |
| `^` | Matches content at the beginning of a string. Also negates a character class when it is the first character inside the brackets. | `^in` matches the “in” in info, but not in bin. `[^a]` matches any single character other than a. |
| `$` | Matches content at the end of a string. | `ce$` matches the “ce” in announce, but not in cent. |
| `\|` | Indicates an OR expression. | `(MAILTO\|HELP)` matches MAILTO or HELP. |
| `\` | Allows you to use metacharacters as actual characters and is also used to specify a predefined character class. | `\.` matches a period, `\*` matches an asterisk, `\\` matches a backslash, and so on. `\d` matches numeric characters, `\w` matches alphanumeric characters, and so on. |
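The metacharacter examples above can be checked with a short script. This is a sketch under stated assumptions: it uses Python's `re` module rather than the PCRE library (the two agree on the constructs shown here), `matches` is a hypothetical helper written for this example, and Python's `re.DOTALL` flag stands in for the `s` modifying option.

```python
import re

def matches(pattern, text, flags=0):
    """Hypothetical helper: True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text, flags) is not None

# Quantifiers apply to the preceding character or group.
star = [s for s in ["ab", "abc", "abcc", "abd"] if matches(r"abc*", s)]
optional = [s for s in ["ab", "abc", "abcc"] if matches(r"abc?", s)]
plus = [s for s in ["ab", "abc", "abcc"] if matches(r"abc+", s)]
grouped = [s for s in ["abc", "abcabc", "abcab"] if matches(r"(abc)+", s)]
bounded = [s for s in ["aaa", "aaaa", "aaaaaa", "aaaaaaa"] if matches(r"a{4,6}", s)]

# Anchors: re.search scans the string, so ^ and $ do the anchoring.
starts_with_in = [s for s in ["info", "bin"] if re.search(r"^in", s)]
ends_with_ce = [s for s in ["announce", "cent"] if re.search(r"ce$", s)]

# Alternation accepts either branch.
alternatives = [s for s in ["MAILTO", "HELP", "HELO"] if matches(r"(MAILTO|HELP)", s)]

# The period skips newlines unless the s option (re.DOTALL here) is set.
dot_default = matches(r"abc.", "abc\n")
dot_with_s = matches(r"abc.", "abc\n", re.DOTALL)

print(star, optional, plus)
print(grouped, bounded)
print(starts_with_in, ends_with_ce, alternatives)
print(dot_default, dot_with_s)
```

Note that `abc*` and `abc?` both accept ab, because the quantifier applies only to the final c and allows zero occurrences of it.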

Character Classes

Character classes include alphabetic characters, numeric characters, alphanumeric characters, and white space characters. While you can create your own character classes within brackets, you can use the predefined classes as shortcuts for common types of characters. When used without additional qualifiers, a character class matches a single character.

The following table describes and provides examples of the predefined character classes accepted by PCRE.

PCRE Character Classes

| Character Class | Description | Character Class Definition |
| --- | --- | --- |
| `\d` | Matches a numeric character (“digit”). | `[0-9]` |
| `\D` | Matches anything that is not a numeric character. | `[^0-9]` |
| `\w` | Matches an alphanumeric character or an underscore (“word”). | `[a-zA-Z0-9_]` |
| `\W` | Matches anything that is not an alphanumeric character or an underscore. | `[^a-zA-Z0-9_]` |
| `\s` | Matches a white space character: a space, carriage return, tab, newline, or form feed. | `[ \r\t\n\f]` |
| `\S` | Matches anything that is not a white space character. | `[^ \r\t\n\f]` |
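The relationship between the predefined classes and their bracket definitions can be sketched as follows. This again assumes Python's `re` module as a stand-in for PCRE. The `re.ASCII` flag restricts Python's classes to the ASCII ranges listed above, and Python's `\s` additionally matches a vertical tab, so it is slightly broader than the definition shown in the table. The sample strings are invented for the example.

```python
import re

# Shorthand classes paired with the bracket definitions from the table.
pairs = {
    r"\d": r"[0-9]",
    r"\D": r"[^0-9]",
    r"\w": r"[a-zA-Z0-9_]",
    r"\W": r"[^a-zA-Z0-9_]",
}

# Hypothetical sample text.
sample = "Port_80 open, 2 retries!"

# Each shorthand should pick out the same characters as its definition.
same = {
    short: re.findall(short, sample, re.ASCII) == re.findall(full, sample)
    for short, full in pairs.items()
}

# A class matches one character; add a quantifier to match runs.
digits = re.findall(r"\d+", sample, re.ASCII)
words = re.findall(r"\w+", sample, re.ASCII)
fields = re.split(r"\s+", "GET\t/index.html  HTTP/1.1")

print(same)
print(digits)  # ['80', '2']
print(words)   # ['Port_80', 'open', '2', 'retries']
print(fields)  # ['GET', '/index.html', 'HTTP/1.1']
```

Because `\w` includes the underscore, Port_80 is returned as a single word.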