* Samples
Swap two items
s/(\S+)\s+(\S+)/$2 $1/
Search for a C identifier
m/[_A-Za-z][_A-Za-z0-9]*/
m/[_[:alpha:]][_[:alnum:]]*/
Empty line
/^$/
Word
\b\w+\b
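A minimal sketch of the samples above in Python's re module (one of the flavors listed under Reference). The input strings are made up for illustration, and Perl's $1/$2 become \1/\2 in the replacement.

```python
import re

# Swap two items: Perl's s/(\S+)\s+(\S+)/$2 $1/ has no g flag, hence count=1.
swapped = re.sub(r"(\S+)\s+(\S+)", r"\2 \1", "hello world", count=1)

# Search for a C identifier (explicit ASCII ranges).
ident = re.search(r"[_A-Za-z][_A-Za-z0-9]*", "42 + foo_1")

# Empty line: ^$ needs multi-line mode to test every line of a longer string.
empty_line_offsets = [m.start() for m in re.finditer(r"^$", "a\n\nb", flags=re.M)]

# Words.
words = re.findall(r"\b\w+\b", "swap two-items, please")

print(swapped)             # world hello
print(ident.group())       # foo_1
print(empty_line_offsets)  # [2]
print(words)               # ['swap', 'two', 'items', 'please']
```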
* Questions
* Reference
perlre (bytes and utf8)
regex.h (regcomp regexec regfree regerror) (single byte only)
java (unicode only)
python (bytes and unicode)
* Basic structure
* Syntax
m/regex/ismx
s/regex/replacement/ismxg
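* Modifiers
i case-insensitive matching
s single-line mode (. matches any character, including newline)
m multi-line mode (affects only ^ and $, letting them match at the start/end of every line within a string)
x allow whitespace and comments inside the pattern (Perl)
g global (replace all matches)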
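A rough Python counterpart of the modifiers, assuming the re module: re.I, re.S, re.M and re.X play the role of i, s, m and x. There is no g flag; re.sub replaces every match unless count is given. The sample text and the date pattern are illustrative only.

```python
import re

text = "Foo\nbar"

# i: ignore case
case_insensitive = re.search(r"foo", text, flags=re.I)

# s: without it . stops at a newline; with it . matches the newline too
dot_plain = re.search(r"Foo.bar", text)
dot_all = re.search(r"Foo.bar", text, flags=re.S)

# m: ^ matches at the start of every line
line_starts = re.findall(r"^\w+", text, flags=re.M)

# x: whitespace and comments inside the pattern are ignored
date = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
""", flags=re.X)

# g: replace all (the default) versus only the first match
replace_all = re.sub(r"a", "b", "aaa")
replace_first = re.sub(r"a", "b", "aaa", count=1)
```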
* Alternation
m/ABC|XYZ/
* Sequence
m/ABC/
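A small sketch in Python showing that | binds more loosely than a sequence: ABC|XYZ means "ABC or XYZ", not "AB, then C or X, then YZ". The grouped variant is added only for contrast.

```python
import re

whole_alternatives = re.compile(r"ABC|XYZ")
grouped_middle = re.compile(r"AB(?:C|X)YZ")  # contrast: alternation limited to one character

matches_xyz = whole_alternatives.fullmatch("XYZ") is not None
matches_abxyz = whole_alternatives.fullmatch("ABXYZ") is not None
grouped_matches_abxyz = grouped_middle.fullmatch("ABXYZ") is not None
```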
* Repetition
(greedy)
a? 0 or 1
a* 0 or more
a+ 1 or more
a{m} exactly m
a{m,} m or more
a{m,n} m to n (inclusive)
(lazy)
a??
a*?
a+?
a{m}?
a{m,}?
a{m,n}?
Example: matching against the string aa
(a?)(a*)   $1 => "a", $2 => "a"
(a??)(a*)  $1 => "",  $2 => "aa"
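The same greedy versus lazy comparison can be checked in Python; groups() returns ($1, $2). The tag example is an extra illustration, not from the notes.

```python
import re

greedy = re.match(r"(a?)(a*)", "aa").groups()
lazy = re.match(r"(a??)(a*)", "aa").groups()

# A greedy + runs to the last possible >, a lazy +? stops at the first one.
greedy_tag = re.search(r"<.+>", "<b>x</b>").group()
lazy_tag = re.search(r"<.+?>", "<b>x</b>").group()

print(greedy)      # ('a', 'a')
print(lazy)        # ('', 'aa')
print(greedy_tag)  # <b>x</b>
print(lazy_tag)    # <b>
```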
*原子
Character = a b c
Character Class
Escape = \ + non-alphanumeric character, such as \\, \+, \( (a backslash followed by a digit is a back-reference instead)
Meta Escape = \ + letter [a-zA-Z]
Groups = (...)
* Character Class
[abc] [a-b] [^abc] [^abc0-9]
A - right after [ and a ] right after [ (or [^) are treated as literals
[-a] = - or a
[^\-]
[[]
[]]
[ ]
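A quick check of the literal rules for - and ] in Python's re module. Other engines may differ in detail, and the input strings are invented for the example.

```python
import re

dash_or_a = re.findall(r"[-a]", "a-b")       # leading - is a literal
not_dash = re.findall(r"[^\-]", "a-b")       # escaped - inside a negated class
close_bracket = re.findall(r"[]]", "[x]")    # ] right after [ is a literal
single_space = re.findall(r"[ ]", "a b c")   # a class holding only a space

print(dash_or_a)      # ['a', '-']
print(not_dash)       # ['a', 'b']
print(close_bracket)  # [']']
print(single_space)   # [' ', ' ']
```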
* Posix Character Class
[[.a.]] collating element
[[=a=]] equivalence class
[[:alpha:]] character class
* Meta
. anything except newlines (normal mode)
. anything (s mode, singleline, dotall)
^ start of string, or start of line (m mode)
$ end of string (or just before a final newline), or end of line (m mode)
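A short Python sketch of how . and $ behave with and without the s and m modes. \Z is Python's "very end of string" anchor and is shown only for contrast with $.

```python
import re

# $ also matches just before a final newline; \Z (Python) does not.
dollar_before_newline = re.search(r"abc$", "abc\n") is not None
strict_end = re.search(r"abc\Z", "abc\n") is not None

# Without m mode only the last line ends at $; with it every line does.
last_only = re.findall(r"\w+$", "one\ntwo\n")
every_line = re.findall(r"\w+$", "one\ntwo\n", flags=re.M)

# . skips newlines unless s mode (re.S) is on.
dot_normal = re.findall(r".", "a\nb")
dot_single_line = re.findall(r".", "a\nb", flags=re.S)
```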
* Meta Escape
\t \n \r \f \a \e
\0nn \xnn octal and hexadecimal character codes, respectively
\cA control character, computed as ch ^ 0x40 (so \cA is \x01)
\cM carriage return (\x0D)
\N{name}
\l lowercase next char
\u uppercase next char
\L...\E lowercase until \E
\U...\E uppercase until \E
\Q...\E quote until \E
\w \W word character / non-word character
\s \S whitespace / non-whitespace
\d \D digit / non-digit
\b \B word boundary / non-boundary
\p{property}
\P{property}
\X combining character sequence
\C single byte (perl)
\< \> start of word, end of word (emacs)
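The ch ^ 0x40 rule for \cA and \cM can be sketched as plain arithmetic in Python. control_char is a hypothetical helper name, and the value is computed directly rather than through a pattern.

```python
def control_char(letter: str) -> str:
    """Return the character \\c<letter> denotes, using the ch ^ 0x40 rule."""
    # Upper-case first so that 'a' and 'A' give the same control character.
    return chr(ord(letter.upper()) ^ 0x40)


print(repr(control_char("A")))  # '\x01'
print(repr(control_char("M")))  # '\r'
print(repr(control_char("?")))  # '\x7f'
```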
* Groups
(abc) for capture group
* Special group
(?#comment)
(?imsx-imsx) embedded flags
(?:pattern) for non-capture
(?imsx-imsx:pattern) subpattern
(?=pattern) positive look ahead
(?!pattern) negative look ahead
(?<=pattern) positive look behind
(?<!pattern) negative look behind
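A sketch of the special groups in Python's re module, which accepts the same syntax for the forms used here. The scoped form (?i:...) assumes Python 3.6 or later; all inputs are made up.

```python
import re

# (?#comment) is ignored by the engine.
with_comment = re.search(r"a(?#just a note)b", "ab") is not None

# (?:...) groups without capturing, so only (c) shows up in groups().
non_capture = re.match(r"(?:ab)+(c)", "ababc").groups()

# (?i:...) switches a flag on for the sub-pattern only.
scoped_hit = re.search(r"a(?i:b)c", "aBc") is not None
scoped_miss = re.search(r"a(?i:b)c", "ABc") is not None

# Look-ahead and look-behind check context without consuming it.
before_colon = re.findall(r"\w+(?=:)", "key: value")
not_percent = re.findall(r"\b\d+\b(?!%)", "10% of 250")
after_dollar = re.findall(r"(?<=\$)\d+", "cost $42, qty 7")
no_dollar = re.findall(r"(?<!\$)\b\d+", "cost $42, qty 7")
```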
* Reference for capture
m/(x)\1/
s/(x)/$1$1/
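The two capture references above, sketched in Python: \1 inside the pattern works as in Perl, and \1 in the replacement string stands in for Perl's $1. The doubled-word search is an extra illustration.

```python
import re

# m/(x)\1/ : the same text that group 1 captured must appear again.
doubled_x = re.search(r"(x)\1", "axxb").group()

# s/(x)/$1$1/ : reuse the capture in the replacement.
duplicated = re.sub(r"(x)", r"\1\1", "axb")

# A common use: find an accidentally repeated word.
repeated_word = re.search(r"\b(\w+) \1\b", "this is is a test")

print(doubled_x)               # xx
print(duplicated)              # axxb
print(repeated_word.group(1))  # is
```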
* Traditional (basic) vs extended
\{m,n\} vs {m,n}
\(xxx\) vs (xxx)
Emacs still uses traditional regular expressions
* Special extensions
\< \> start of word, end of word (emacs)
* Line breaks
\n \v \r \r\n \f \x85 \x{2028} \x{2029} \x1A
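A sketch of splitting text on the line-break sequences listed above, assuming Python's re module. \r\n comes first in the alternation so that it counts as one break. \x1A (Ctrl-Z, historically an end-of-file marker) is left out of this sketch, and LINE_BREAK is a made-up name.

```python
import re

# \r\n must be tried before the single characters, or it would split twice.
LINE_BREAK = re.compile(r"\r\n|[\n\v\r\f\x85\u2028\u2029]")

sample = "a\r\nb\nc\u2028d\x85e"
parts = LINE_BREAK.split(sample)

print(parts)  # ['a', 'b', 'c', 'd', 'e']
```

For this sample, the built-in str.splitlines() gives the same pieces, since it recognizes these separators as well.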