.NET and C# Regular Expressions
Microsoft's .NET Framework provides a consistent and powerful set of regular expression classes for all .NET implementations. The following sections list the .NET regular expression syntax and the core .NET classes. .NET uses a Traditional NFA match engine.
.NET defines its regular expression support in the System.Text.RegularExpressions namespace. The Regex( ) constructor handles regular expression creation, and the rest of the Regex methods handle pattern matching. The Group and Match classes contain information about each match.
C#'s verbatim string syntax, @"...", allows you to define regular expression patterns without having to escape embedded backslashes. Inside a verbatim string, a literal double quote is written as two double quotes ("").
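
As a small illustration of the point above, the following sketch compares a regular string literal with a verbatim one. The class name VerbatimPatternDemo and the sample inputs are hypothetical and chosen only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are illustrative only.
static class VerbatimPatternDemo
{
    // Regular literal: every regex backslash must itself be escaped.
    public const string Escaped = "\\d+\\.\\d+";

    // Verbatim literal: backslashes are taken as-is.
    public const string Verbatim = @"\d+\.\d+";

    // Inside a verbatim literal, a double quote is written as "".
    public const string Quoted = @"""[^""]*""";

    static void Main()
    {
        Console.WriteLine(Escaped == Verbatim);                          // True
        Console.WriteLine(Regex.Match("pi is 3.14", Verbatim).Value);    // 3.14
        Console.WriteLine(Regex.Match("say \"hi\" now", Quoted).Value);  // "hi"
    }
}
```

Both literals produce the same pattern text; the verbatim form is simply easier to read and harder to get wrong.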
Match
Properties
- public bool Success
  Indicates whether the match was successful.
- public string Value
  Text of the match.
- public int Length
  Number of characters in the matched text.
- public int Index
  Zero-based character index of the start of the match.
- public GroupCollection Groups
  A GroupCollection object where Groups[0].Value contains the text of the entire match, and each additional Groups element contains the text matched by a capture group.
Methods
- public Match NextMatch( )
  Return a Match object for the next match of the regex in the input string.
- public virtual string Result(string result)
  Return result with special replacement sequences (such as $1) replaced by values from this match.
- public static Match Synchronized(Match inner)
  Return a Match object identical to inner, except also safe for multithreaded use.
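
The Match members listed above can be exercised together in a short sketch. The class MatchWalkDemo, its method names, and the sample inputs are assumptions made for illustration; only Regex.Match, Success, Index, Length, Value, NextMatch, and Result come from the API described here.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are illustrative only.
static class MatchWalkDemo
{
    // Collects "index:length:value" for every match,
    // looping with Success and NextMatch().
    public static List<string> Describe(string input, string pattern)
    {
        var found = new List<string>();
        for (Match m = Regex.Match(input, pattern); m.Success; m = m.NextMatch())
        {
            found.Add($"{m.Index}:{m.Length}:{m.Value}");
        }
        return found;
    }

    // Expands a replacement pattern against the first match.
    // Result() is only called when the match succeeded.
    public static string Swap(string input)
    {
        Match m = Regex.Match(input, @"(\w+)@(\w+)");
        return m.Success ? m.Result("$2 hosts $1") : "";
    }

    static void Main()
    {
        foreach (string s in Describe("a1 b22 c333", @"\d+"))
            Console.WriteLine(s);                         // 1:1:1, 4:2:22, 8:3:333
        Console.WriteLine(Swap("mail bob@example now"));  // example hosts bob
    }
}
```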
Group
Properties
- public bool Success
  True if the group participated in the match.
- public string Value
  Text captured by this group.
- public int Length
  Number of characters captured by this group.
- public int Index
  Zero-based character index of the start of the text captured by this group.
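
To make the Group properties concrete, here is a minimal sketch that uses a pattern with an optional capture group, so that Success can be seen to differ between inputs. The class GroupDemo, the pattern, and the inputs are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are illustrative only.
static class GroupDemo
{
    // A key, optionally followed by =value. Group 2 is optional.
    const string Pattern = @"(\w+)(?:=(\w+))?";

    public static string Report(string input)
    {
        Match m = Regex.Match(input, Pattern);
        Group key = m.Groups[1];
        Group val = m.Groups[2];
        return $"whole={m.Groups[0].Value} key={key.Value}@{key.Index} " +
               $"valueSet={val.Success} valueLen={val.Length}";
    }

    static void Main()
    {
        Console.WriteLine(Report("mode=fast"));
        // whole=mode=fast key=mode@0 valueSet=True valueLen=4
        Console.WriteLine(Report("verbose"));
        // whole=verbose key=verbose@0 valueSet=False valueLen=0
    }
}
```

A group that did not participate reports Success as false and an empty Value, which is different from a group that matched an empty string.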
Metacharacter representations

| Sequence | Meaning |
|---|---|
| \a | Alert (bell), x07. |
| \b | Backspace, x08; supported only in a character class. |
| \e | ESC character, x1B. |
| \n | Newline, x0A. |
| \r | Carriage return, x0D. |
| \f | Form feed, x0C. |
| \t | Horizontal tab, x09. |
| \v | Vertical tab, x0B. |
| \0octal | Character specified by a two-digit octal code. |
| \xhex | Character specified by a two-digit hexadecimal code. |
| \uhex | Character specified by a four-digit hexadecimal code. |
| \cchar | Named control character. |

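
A few of these escapes can be checked directly. The sketch below is illustrative only; the class EscapeDemo and its method names are hypothetical, and it covers just the hexadecimal, Unicode, tab, control-character, and backspace sequences.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are illustrative only.
static class EscapeDemo
{
    // \x41 (two hex digits) and \u0041 (four hex digits) both name 'A'.
    public static bool HexMatchesA() => Regex.IsMatch("A", @"^\x41$");
    public static bool UnicodeMatchesA() => Regex.IsMatch("A", @"^\u0041$");

    // \t in the pattern matches a real tab character in the input.
    public static string[] SplitOnTab(string line) => Regex.Split(line, @"\t");

    // \cI is Control-I, which is the tab character (x09).
    public static bool ControlIMatchesTab() => Regex.IsMatch("\t", @"^\cI$");

    // Inside a character class, \b means backspace (x08).
    // The input "\b" here is a C# escape for the backspace character.
    public static bool BackspaceInClass() => Regex.IsMatch("\b", @"[\b]");

    static void Main()
    {
        Console.WriteLine(HexMatchesA());                // True
        Console.WriteLine(UnicodeMatchesA());            // True
        Console.WriteLine(SplitOnTab("a\tb\tc").Length); // 3
        Console.WriteLine(ControlIMatchesTab());         // True
        Console.WriteLine(BackspaceInClass());           // True
    }
}
```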
Character classes

| Class | Meaning |
|---|---|
| [...] | A single character listed or contained within a listed range. |
| [^...] | A single character not listed and not contained within a listed range. |
| . | Any character, except a line terminator (unless single-line mode, s). |
| \w | Word character, or [a-zA-Z_0-9] in ECMAScript mode. |
| \W | Non-word character, or [^a-zA-Z_0-9] in ECMAScript mode. |
| \d | Digit, or [0-9] in ECMAScript mode. |
| \D | Non-digit, or [^0-9] in ECMAScript mode. |
| \s | Whitespace character, or [ \f\n\r\t\v] in ECMAScript mode. |
| \S | Non-whitespace character, or [^ \f\n\r\t\v] in ECMAScript mode. |
| \p{prop} | Character contained by given Unicode block or property. |
| \P{prop} | Character not contained by given Unicode block or property. |

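
The difference between the default classes and their ECMAScript forms is easiest to see with a non-ASCII digit. The following sketch assumes a hypothetical class ClassDemo; the Arabic-Indic digit U+0663 is used only as a sample input.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are illustrative only.
static class ClassDemo
{
    // By default, \d matches any Unicode decimal digit.
    public static bool IsDigitDefault(string s) => Regex.IsMatch(s, @"^\d$");

    // In ECMAScript mode, \d is restricted to [0-9].
    public static bool IsDigitEcma(string s) =>
        Regex.IsMatch(s, @"^\d$", RegexOptions.ECMAScript);

    // \P{Lu} matches anything that is not an uppercase letter,
    // so removing those characters keeps only uppercase letters.
    public static string UppercaseOnly(string s) => Regex.Replace(s, @"\P{Lu}", "");

    // Dot matches a newline only when Singleline is set.
    public static bool DotMatchesNewline(RegexOptions options) =>
        Regex.IsMatch("\n", ".", options);

    static void Main()
    {
        Console.WriteLine(IsDigitDefault("\u0663"));                    // True
        Console.WriteLine(IsDigitEcma("\u0663"));                       // False
        Console.WriteLine(UppercaseOnly("Hello World"));                // HW
        Console.WriteLine(DotMatchesNewline(RegexOptions.None));        // False
        Console.WriteLine(DotMatchesNewline(RegexOptions.Singleline));  // True
    }
}
```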
Comments and mode modifiers

| Modifier/sequence | Mode character | Meaning |
|---|---|---|
| Singleline | s | Dot (.) matches any character, including a line terminator. |
| Multiline | m | ^ and $ match next to embedded line terminators. |
| IgnorePatternWhitespace | x | Ignore whitespace and allow embedded comments starting with #. |
| IgnoreCase | i | Case-insensitive match based on characters in the current culture. |
| CultureInvariant | | Culture-insensitive match; used together with IgnoreCase. |
| ExplicitCapture | n | Allow named capture groups, but treat unnamed parentheses as non-capturing groups. |
| Compiled | | Compile regular expression. |
| RightToLeft | | Search from right to left, starting to the left of the start position. |
| ECMAScript | | Enables ECMAScript-compliant behavior; can be combined only with IgnoreCase, Multiline, and Compiled. |
| (?imnsx-imnsx) | | Turn match flags on or off for rest of pattern. |
| (?imnsx-imnsx:...) | | Turn match flags on or off for the enclosed subexpression. |
| (?#...) | | Treat substring as a comment. |
| #... | | Treat rest of line as a comment in x (IgnorePatternWhitespace) mode. |

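
The modes above can be set either through RegexOptions or inline in the pattern. The sketch below shows a few of them side by side; the class ModeDemo, its method names, and the sample inputs are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are illustrative only.
static class ModeDemo
{
    // (?i) turns on case-insensitivity for the rest of the pattern.
    public static bool InlineIgnoreCase(string s) => Regex.IsMatch(s, @"^(?i)hello world$");

    // (?i:...) limits case-insensitivity to the enclosed subexpression.
    public static bool ScopedIgnoreCase(string s) => Regex.IsMatch(s, @"^(?i:hello) world$");

    // With Multiline, ^ matches at the start of every line.
    public static int CountLineStarts(string text, RegexOptions options) =>
        Regex.Matches(text, @"^\w+", options).Count;

    // With IgnorePatternWhitespace, layout and # comments are ignored.
    public static string YearOf(string s) =>
        Regex.Match(s, @"
            (\d{4})   # year
            -
            (\d{2})   # month
            ", RegexOptions.IgnorePatternWhitespace).Groups[1].Value;

    // With ExplicitCapture, only the named group is captured (plus group 0).
    public static int GroupCount() =>
        Regex.Match("ab", @"(a)(?<second>b)", RegexOptions.ExplicitCapture).Groups.Count;

    static void Main()
    {
        Console.WriteLine(InlineIgnoreCase("HELLO WORLD"));  // True
        Console.WriteLine(ScopedIgnoreCase("HELLO WORLD"));  // False
        Console.WriteLine(ScopedIgnoreCase("HELLO world"));  // True
        Console.WriteLine(CountLineStarts("one\ntwo\nthree", RegexOptions.None));       // 1
        Console.WriteLine(CountLineStarts("one\ntwo\nthree", RegexOptions.Multiline));  // 3
        Console.WriteLine(YearOf("released 2024-05"));       // 2024
        Console.WriteLine(GroupCount());                     // 2
    }
}
```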
Grouping, capturing, conditional, and control

| Sequence | Meaning |
|---|---|
| (...) | Grouping. Submatches fill \1, \2, ... and $1, $2, .... |
| \n | In a regular expression, match what was matched by the nth earlier submatch (n is a group number). |
| $n | In a replacement string, contains the nth earlier submatch. |
| (?<name>...) | Captures matched substring into the group called name. |
| (?:...) | Grouping-only parentheses, no capturing. |
| (?>...) | Disallow backtracking for subpattern. |
| ...\|... | Alternation; match one or the other. |
| * | Match 0 or more times. |
| + | Match 1 or more times. |
| ? | Match 1 or 0 times. |
| {n} | Match exactly n times. |
| {n,} | Match at least n times. |
| {x,y} | Match at least x times, but no more than y times. |
| *? | Match 0 or more times, but as few times as possible. |
| +? | Match 1 or more times, but as few times as possible. |
| ?? | Match 0 or 1 times, but as few times as possible. |
| {n,}? | Match at least n times, but as few times as possible. |
| {x,y}? | Match at least x times, no more than y times, but as few times as possible. |

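
Several of these constructs are easier to follow with concrete inputs. The sketch below is one possible illustration; the class GroupingDemo, its method names, and the sample strings are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are illustrative only.
static class GroupingDemo
{
    // \1 in the pattern refers back to what group 1 matched.
    public static string FirstDoubledWord(string s) =>
        Regex.Match(s, @"\b(\w+) \1\b").Groups[1].Value;

    // $1 and $2 in the replacement string refer to the captured groups.
    public static string SwapNames(string s) =>
        Regex.Replace(s, @"(\w+), (\w+)", "$2 $1");

    // A named group is read back by name.
    public static string Year(string s) =>
        Regex.Match(s, @"(?<year>\d{4})-\d{2}").Groups["year"].Value;

    // Greedy + takes as much as possible; lazy +? takes as little as possible.
    public static string Greedy(string s) => Regex.Match(s, @"<.+>").Value;
    public static string Lazy(string s) => Regex.Match(s, @"<.+?>").Value;

    // (?>a+) refuses to give back characters, so the trailing "ab" cannot match.
    public static bool Backtracking(string s) => Regex.IsMatch(s, @"a+ab");
    public static bool Atomic(string s) => Regex.IsMatch(s, @"(?>a+)ab");

    static void Main()
    {
        Console.WriteLine(FirstDoubledWord("it was the the best"));  // the
        Console.WriteLine(SwapNames("Kent, Clark"));                 // Clark Kent
        Console.WriteLine(Year("on 2024-05"));                       // 2024
        Console.WriteLine(Greedy("<b>bold</b>"));                    // <b>bold</b>
        Console.WriteLine(Lazy("<b>bold</b>"));                      // <b>
        Console.WriteLine(Backtracking("aaab"));                     // True
        Console.WriteLine(Atomic("aaab"));                           // False
    }
}
```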
Example
    // Match super-Man, Superman, SUPER-MAN, etc.
    using System;
    using System.Text.RegularExpressions;

    namespace Regex_PocketRef
    {
        class SimpleMatchTest
        {
            static void Main()
            {
                string dailybugle = "Super-Man Menaces City!";
                // Optional hyphen or space between "super" and "man".
                string regex = @"super[- ]?man";
                // IgnoreCase makes the comparison case-insensitive.
                if (Regex.IsMatch(dailybugle, regex, RegexOptions.IgnoreCase))
                {
                    // do something
                    Console.WriteLine("Match found.");
                }
            }
        }
    }