Data Patterns in Custom Sensitive Data Types
You define the data pattern for a custom data type using a simple set of regular expressions comprised of the following:
-
three metacharacters
-
escaped characters that allow you to use the metacharacters as literal characters
-
six character classes
Metacharacters are literal characters that have special meaning within regular expressions.
Metacharacter |
Description |
Example |
---|---|---|
|
Matches zero or one occurrence of the preceding character or escape sequence; that is, the preceding character or escape sequence is optional. |
|
|
Matches the preceding character or escape sequence n times. |
For example,
|
|
Allows you to use metacharacters as actual characters and is also used to specify a predefined character class. |
|
You must use a backslash to escape certain characters for the sensitive data preprocessor to interpret them correctly as literal characters.
Use this escaped character... |
To represent this literal character... |
---|---|
\? |
? |
\{ |
{ |
\} |
} |
\\ |
\ |
When defining a custom sensitive data pattern, you can use character classes.
Character Class |
Description |
Character Class Definition |
---|---|---|
\d |
Matches any numeric ASCII character 0-9 |
0-9 |
\D |
Matches any byte that is not a numeric ASCII character |
not 0-9 |
\l (lowercase “ell”) |
Matches any ASCII letter |
a-zA-Z |
\L |
Matches any byte that is not an ASCII letter |
not a-zA-Z |
\w |
Matches any ASCII alphanumeric character Note that, unlike PCRE regular expressions, this does not include an underscore (_). |
a-zA-Z0-9 |
\W |
Matches any byte that is not an ASCII alphanumeric character |
not a-zA-Z0-9 |
The preprocessor treats characters entered directly, instead of
as part of a regular expression, as literal characters. For example, the data
pattern 1234 matches
1234
.
The following data pattern example, which is used in system-provided sensitive data rule 138:4, uses the escaped digits character class, the multiplier and option-specifier metacharacters, and the literal dash (-) and left and right parentheses () characters to detect U.S. phone numbers:
(\d{3}) ?\d{3}-\d{4}
Exercise caution when creating custom data patterns. Consider the following alternative data pattern for detecting phone numbers which, although using valid syntax, could cause many false positives:
(?\d{3})? ?\d{3}-?\d{4}
Because the second example combines optional parentheses, optional spaces, and optional dashes, it would detect, among others, phone numbers in the following desirable patterns:
-
(555)123-4567
-
555123-4567
-
5551234567
However, the second example pattern would also detect, among others, the following potentially invalid patterns, resulting in false positives:
-
(555 1234567
-
555)123-4567
-
555) 123-4567
Consider finally, for illustration purposes only, an extreme
example in which you create a data pattern that detects the lowercase letter
a
using a low event threshold in all destination
traffic on a small company network. Such a data pattern could overwhelm your
system with literally millions of events in only a few minutes.