This is a very elegant summary of regular expressions from The AWK Programming Language.

1. The regular expression metacharacters are:

\ ^ $ . [ ] | ( ) * + ?

2. A basic regular expression is one of the following:

  • a nonmetacharacter, such as A, that matches itself.
  • an escape sequence that matches a special symbol: \t matches a tab.
  • a quoted metacharacter, such as \*, that matches the metacharacter literally.
  • ^, which matches the beginning of a string.
  • $, which matches the end of a string.
  • ., which matches any single character.
  • a character class: [ABC] matches any of the characters A, B, or C. Character classes may include abbreviations: [A-Za-z] matches any single letter.
  • a complemented character class: [^0-9] matches any character except a digit.
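The basic forms above can be tried out with a short sketch. It uses Python's `re` module as a stand-in for AWK's matcher, on the assumption that the two agree for these simple constructs (they differ on some edge cases, such as how `.` and `$` treat newlines). The helper `matches` and the table `BASIC_EXAMPLES` are hypothetical names introduced only for illustration.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if pattern matches somewhere in text (roughly AWK's text ~ /pattern/)."""
    return re.search(pattern, text) is not None


# description -> (pattern, a string that should match, a string that should not)
BASIC_EXAMPLES = {
    "nonmetacharacter": (r"A", "BAT", "bet"),
    "escape sequence": (r"\t", "a\tb", "a b"),
    "quoted metacharacter": (r"\*", "2*3", "23"),
    "beginning of string": (r"^A", "AB", "BA"),
    "end of string": (r"A$", "BA", "AB"),
    "any single character": (r"^.$", "x", ""),
    "character class": (r"^[ABC]$", "B", "D"),
    "class with ranges": (r"^[A-Za-z]$", "q", "7"),
    "complemented class": (r"^[^0-9]$", "x", "5"),
}

# For each form, record (matched the good string, matched the bad string).
results = {
    name: (matches(pattern, good), matches(pattern, bad))
    for name, (pattern, good, bad) in BASIC_EXAMPLES.items()
}

for name, outcome in results.items():
    print(f"{name}: {outcome}")
```

Each entry pairs a pattern with one string it should accept and one it should reject, so a form that behaves as described comes out as `(True, False)`.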

3. These operators combine regular expressions into larger ones:

  • alternation: A | B matches A or B.
  • concatenation: AB matches A immediately followed by B.
  • closure: A* matches zero or more A's.
  • positive closure: A+ matches one or more A's.
  • zero or one: A? matches the null string or A.
  • parentheses: (r) matches the same strings as r does.
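The operators above can be checked the same way. This is a minimal sketch, again assuming Python's `re` module behaves like AWK for these operators. `matches_whole` and `OPERATOR_EXAMPLES` are hypothetical names. The helper requires the whole string to match, which corresponds to wrapping the pattern as `^(r)$` in AWK.

```python
import re


def matches_whole(pattern: str, text: str) -> bool:
    """Return True only if pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


# (operator, pattern, strings that should match, strings that should not)
OPERATOR_EXAMPLES = [
    ("alternation", r"A|B", ["A", "B"], ["AB", ""]),
    ("concatenation", r"AB", ["AB"], ["A", "BA"]),
    ("closure", r"A*", ["", "A", "AAA"], ["AAB"]),
    ("positive closure", r"A+", ["A", "AAA"], [""]),
    ("zero or one", r"A?", ["", "A"], ["AA"]),
    ("parentheses", r"(A)", ["A"], ["AA"]),
]

# For each operator: (all accepted strings matched, any rejected string matched).
operator_results = {
    name: (
        all(matches_whole(pattern, s) for s in accepted),
        any(matches_whole(pattern, s) for s in rejected),
    )
    for name, pattern, accepted, rejected in OPERATOR_EXAMPLES
}

for name, outcome in operator_results.items():
    print(f"{name}: {outcome}")
```

Anchoring to the whole string makes the difference between closure and positive closure visible: `A*` accepts the empty string, while `A+` does not.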

Regular Expressions
Expression          Matches
c                   the nonmetacharacter c
\c                  escape sequence or literal character c
^                   beginning of string
$                   end of string
.                   any character
[$c_1$$c_2$...]     any character in $c_1$$c_2$...
[^$c_1$$c_2$...]    any character not in $c_1$$c_2$...
[$c_1$-$c_2$]       any character in the range beginning with $c_1$ and ending with $c_2$
[^$c_1$-$c_2$]      any character not in the range $c_1$ to $c_2$
$r_1$|$r_2$         any string matched by $r_1$ or $r_2$
($r_1$)($r_2$)      any string xy where $r_1$ matches x and $r_2$ matches y; parentheses not needed around arguments with no alternations
(r)*                zero or more consecutive strings matched by r
(r)+                one or more consecutive strings matched by r
(r)?                zero or one string matched by r; parentheses not needed around basic regular expressions
(r)                 any string matched by r
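The table's notes about when parentheses can be dropped imply an order of precedence: repetition binds to the basic regular expression just before it, and concatenation binds more tightly than alternation. The sketch below illustrates this, using Python's `re` module as a stand-in for AWK on the assumption that both group these patterns the same way. `matches_whole` and `PRECEDENCE_EXAMPLES` are hypothetical names.

```python
import re


def matches_whole(pattern: str, text: str) -> bool:
    """Return True only if pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


# (pattern, text, whether the whole text is expected to match)
PRECEDENCE_EXAMPLES = [
    (r"ab*", "abbb", True),     # * applies to b alone: a followed by b's
    (r"ab*", "abab", False),
    (r"(ab)*", "abab", True),   # parentheses make * apply to ab
    (r"(ab)*", "abbb", False),
    (r"a|bc", "a", True),       # read as a|(bc), not (a|b)c
    (r"a|bc", "ac", False),
    (r"(a|b)c", "ac", True),    # parentheses change the grouping
    (r"(a|b)c", "a", False),
    (r"[a-c]", "b", True),      # range inside a character class
    (r"[^a-c]", "b", False),    # complemented range
]

observed = [matches_whole(pattern, text) for pattern, text, _ in PRECEDENCE_EXAMPLES]
expected = [want for _, _, want in PRECEDENCE_EXAMPLES]

for (pattern, text, _), got in zip(PRECEDENCE_EXAMPLES, observed):
    print(f"{pattern!r} against {text!r}: {got}")
```

Comparing `ab*` with `(ab)*`, and `a|bc` with `(a|b)c`, shows where the parentheses are needed and where they are redundant.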

