Character Classes
A character class defines a set of characters, any one of which can match a single character in the input string. The following table summarizes the character-matching syntax.
Character class | Description |
---|---|
. | Matches any character except \n. When the Singleline option is set, a period matches any character, including \n. For more information, see Regular Expression Options. |
[aeiou] | Matches any single character included in the specified set of characters. |
[^aeiou] | Matches any single character not in the specified set of characters. |
[0-9a-fA-F] | Matches any single character in the specified ranges. A hyphen (-) specifies a contiguous character range. |
\p{name} | Matches any character in the named character class specified by {name}. Supported names are Unicode groups and block ranges. For example, Ll, Nd, Z, IsGreek, IsBoxDrawing. |
\P{name} | Matches any single character not included in the Unicode group or block range specified by {name}. |
\w | Matches any word character. Equivalent to the Unicode character categories [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \w is equivalent to [a-zA-Z_0-9]. |
\W | Matches any nonword character. Equivalent to the Unicode character categories [^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \W is equivalent to [^a-zA-Z_0-9]. |
\s | Matches any white-space character. Equivalent to the character class [\f\n\r\t\v\x85\p{Z}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \s is equivalent to [ \f\n\r\t\v]. |
\S | Matches any non-white-space character. Equivalent to the character class [^\f\n\r\t\v\x85\p{Z}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \S is equivalent to [^ \f\n\r\t\v]. |
\d | Matches any decimal digit. Equivalent to \p{Nd} for Unicode and [0-9] for non-Unicode, ECMAScript behavior. |
\D | Matches any nondigit. Equivalent to \P{Nd} for Unicode and [^0-9] for non-Unicode, ECMAScript behavior. |
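The behavior summarized in the table can be checked with a short C# program. This is a minimal sketch, not part of the original reference: the class name CharacterClassDemo and the helper CountMatches are hypothetical names introduced for illustration, and the sample strings are arbitrary.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharacterClassDemo
{
    // Hypothetical helper: counts the matches of a pattern in an input string.
    public static int CountMatches(string input, string pattern,
                                   RegexOptions options = RegexOptions.None)
    {
        return Regex.Matches(input, pattern, options).Count;
    }

    public static void Main()
    {
        // "." skips \n by default but matches it under Singleline.
        Console.WriteLine(CountMatches("a\nb", "."));                          // 2
        Console.WriteLine(CountMatches("a\nb", ".", RegexOptions.Singleline)); // 3

        // Positive set, negative set, and ranges.
        Console.WriteLine(CountMatches("regex", "[aeiou]"));    // 2
        Console.WriteLine(CountMatches("regex", "[^aeiou]"));   // 3
        Console.WriteLine(CountMatches("0x1F", "[0-9a-fA-F]")); // 3

        // Named Unicode categories and their negation.
        Console.WriteLine(CountMatches("Hello World", @"\p{Lu}")); // 2
        Console.WriteLine(CountMatches("Ab1", @"\P{Lu}"));         // 2

        // \d and \w are Unicode-aware unless the ECMAScript option is set.
        string arabicThree = "\u0663"; // ARABIC-INDIC DIGIT THREE, category Nd
        Console.WriteLine(Regex.IsMatch(arabicThree, @"\d"));                          // True
        Console.WriteLine(Regex.IsMatch(arabicThree, @"\d", RegexOptions.ECMAScript)); // False
        Console.WriteLine(Regex.IsMatch("\u00E9", @"\w"));                             // True
        Console.WriteLine(Regex.IsMatch("\u00E9", @"\w", RegexOptions.ECMAScript));    // False
    }
}
```

The last four lines show why the ECMAScript option matters for input validation: a pattern such as \d accepts digits from any script unless ECMAScript-compliant behavior is requested.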
To find the Unicode category that a character belongs to, call the GetUnicodeCategory method.
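As a brief illustration, the following sketch prints the category of a few sample characters by calling Char.GetUnicodeCategory. The class name UnicodeCategoryDemo and the wrapper CategoryOf are hypothetical names chosen for this example. The returned UnicodeCategory values correspond to the two-letter names used with \p{name}, for example LowercaseLetter for Ll and DecimalDigitNumber for Nd.

```csharp
using System;
using System.Globalization;

public static class UnicodeCategoryDemo
{
    // Hypothetical wrapper around Char.GetUnicodeCategory.
    public static UnicodeCategory CategoryOf(char c)
    {
        return char.GetUnicodeCategory(c);
    }

    public static void Main()
    {
        // 'a' -> LowercaseLetter (Ll), 'A' -> UppercaseLetter (Lu),
        // '7' -> DecimalDigitNumber (Nd), '_' -> ConnectorPunctuation (Pc),
        // ' ' -> SpaceSeparator (Zs)
        foreach (char c in new[] { 'a', 'A', '7', '_', ' ' })
        {
            Console.WriteLine("'{0}' -> {1}", c, CategoryOf(c));
        }
    }
}
```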
For more information on Unicode character categories, see the document Unicode Data File Format, available on the Unicode Technical Committee's (UTC) Web site at http://www.unicode.org/Public/UNIDATA/UnicodeData.html.
See Also
Regular Expression Language Elements | GetUnicodeCategory | Regular Expression Options