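- DFAs, NFAs, and regular expressions have the same expressive power: each describes exactly the regular languages.
- A regular expression can be converted to an NFA, and mature algorithms exist for this conversion (Thompson's construction, for example).
- An NFA can be converted to a DFA, also with a well-established algorithm (the subset construction).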
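The NFA-to-DFA step can be sketched in a few lines. The NFA below is hand-written for [ab]*abb and has no epsilon transitions (an assumption made to keep the example short; an NFA produced mechanically from a regular expression would normally have them). The names `subset_construction` and `dfa_accepts` are illustrative, not taken from any library.

```python
from collections import deque

# Hand-written NFA for [ab]*abb: state 0 is the start state, state 3 accepts.
NFA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
NFA_START, NFA_ACCEPT, ALPHABET = 0, {3}, "ab"


def subset_construction(nfa, start, accept, alphabet):
    """Build a DFA whose states are frozensets of NFA states."""
    dfa_start = frozenset([start])
    transitions = {}
    queue, seen = deque([dfa_start]), {dfa_start}
    while queue:
        current = queue.popleft()
        for symbol in alphabet:
            # Union of every NFA move from any state in the current set.
            target = frozenset(
                nxt for state in current for nxt in nfa.get((state, symbol), ())
            )
            transitions[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # A DFA state accepts if it contains at least one accepting NFA state.
    dfa_accept = {s for s in seen if s & accept}
    return dfa_start, transitions, dfa_accept


def dfa_accepts(dfa, text):
    """Run the DFA over text; a symbol outside the alphabet rejects the input."""
    state, transitions, accepting = dfa
    for ch in text:
        if (state, ch) not in transitions:
            return False
        state = transitions[(state, ch)]
    return state in accepting


dfa = subset_construction(NFA, NFA_START, NFA_ACCEPT, ALPHABET)
for text in ["abb", "aabb", "babb", "ab", "abba"]:
    print(text, dfa_accepts(dfa, text))
# abb True
# aabb True
# babb True
# ab False
# abba False
```

For this NFA the construction yields four DFA states: {0}, {0,1}, {0,2}, and {0,3}.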
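Regular expression examples

Earlier we used Python's re module to briefly show how the regular expression [ab]*abb is matched. The commonly used regular expressions below show how powerful regular expressions are.
Before starting, here are some of the conventions of regular expressions in Python.
- \w matches a word character, i.e. [a-zA-Z0-9_]; \W is the opposite and matches [^a-zA-Z0-9_]. (For str patterns in Python 3, \w also matches Unicode letters and digits unless the re.ASCII flag is set.)
- \s matches a single whitespace character: space, newline (\n), carriage return (\r), tab (\t), form feed (\f), or vertical tab (\v), i.e. [ \n\r\t\f\v]; \S is the opposite.
- \d matches a digit, [0-9]; \D is the opposite and matches [^0-9].
- (…) creates a group, which can later be referenced by its index.
- (?P<name>…) creates a named group, which can be referenced directly by its name.
- (?!…) matches only when … does not come next. This is a negative lookahead assertion.
- r"pattern" makes pattern a raw string, in which Python gives the backslash no special treatment. r"\n" is a string of two characters, "\" and "n", not a newline character. Python performs no escape substitution on a raw string, so the backslash reaches the re module intact (re then interprets \n itself as a newline). For more on raw strings, see the Stack Overflow question Python regex – r prefix.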
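- re.findall: returns a list of all the substrings that match the pattern.
- re.search: scans the string until it finds a substring that matches the pattern, then returns a match object (re.Match in Python 3, called re.MatchObject in the Python 2 documentation); otherwise returns None.
- re.match: returns a match object if the beginning of the string matches the pattern; otherwise returns None.
- re.sub(pattern, repl, string, count=0, flags=0): returns the string obtained by replacing the matches of pattern in string with repl.
- re.split: splits the string wherever pattern occurs.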
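The rules and functions listed above can be tried out on a small string. This is a minimal sketch: the sample text and the variable names are made up for illustration, and the printed results assume Python 3.

```python
import re

text = "order 66, order 1138\ttotal_2"

all_numbers = re.findall(r"\d+", text)    # every non-overlapping match
all_words = re.findall(r"\w+", text)      # \w includes digits and "_"
first_number = re.search(r"\d+", text)    # first match anywhere in text
leading_number = re.match(r"\d+", text)   # None: text starts with "order"
masked = re.sub(r"\d+", "N", text)        # replace every match
masked_once = re.sub(r"\d+", "N", text, count=1)  # replace only the first
fields = re.split(r"[,\s]+", text)        # split on commas and whitespace

print(all_numbers)           # ['66', '1138', '2']
print(all_words)             # ['order', '66', 'order', '1138', 'total_2']
print(first_number.group())  # 66
print(leading_number)        # None
print(fields)                # ['order', '66', 'order', '1138', 'total_2']

# Raw strings: Python leaves the backslash alone, so re sees it.
print(len(r"\n"), len("\n"))                   # 2 1
# "\b" in a normal string is a backspace character, not a word boundary.
print(re.search("\bcat\b", "a cat") is None)   # True
print(re.search(r"\bcat\b", "a cat").group())  # cat
```

The last two lines show why patterns are usually written as raw strings: without the r prefix, Python turns "\b" into a backspace before re ever sees it.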
A match object always has a boolean value of True, and both match() and search() return None when nothing matches, so a simple if statement is enough to test for a match.

    match = re.search(pattern, string)
    if match:
        process(match)

A match object mainly offers the methods group([group1, …]) and groups([default]). group returns one or more subgroups of the match; groups returns a tuple containing all the subgroups.
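As a small illustration of group and groups, the sketch below mixes a numbered group with two named groups. The date string and the group names month and day are sample values chosen for this example.

```python
import re

pattern = re.compile(r"(\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
match = pattern.search("released on 2015-07-21")

if match:
    print(match.group())          # 2015-07-21  (the whole match)
    print(match.group(1))         # 2015        (by index)
    print(match.group("month"))   # 07          (by name)
    print(match.group(2))         # 07          (named groups are numbered too)
    print(match.group(1, "day"))  # ('2015', '21')
    print(match.groups())         # ('2015', '07', '21')
    print(match.groupdict())      # {'month': '07', 'day': '21'}

# groups(default) fills in groups that did not take part in the match.
optional = re.match(r"(\d+)(\.\d+)?", "42")
print(optional.groups())          # ('42', None)
print(optional.groups(""))        # ('42', '')
```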
Example 1: match Hello if and only if it is not immediately followed by World.

    import re

    strings = ["HelloWorld!",
               "Hello World!"]

    # (?!World) is a negative lookahead: it succeeds only when the text
    # right after "Hello" is not "World", and it consumes no characters.
    pattern = re.compile(r"Hello(?!World).*")
    for string in strings:
        result = pattern.search(string)
        if result:
            print(string, ">", result.group())
        else:
            print(string, ">", "Not match")

    # Output:
    # HelloWorld! > Not match
    # Hello World! > Hello World!

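A lookahead only inspects the text that follows; it does not become part of the match. The short sketch below contrasts the negative form (?!…) with the positive form (?=…) on a made-up sample string.

```python
import re

text = "HelloWorld! Hello World! Hello"

# Negative lookahead: "Hello" not immediately followed by "World".
negative = re.findall(r"Hello(?!World)", text)
# Positive lookahead: "Hello" immediately followed by "World".
positive = re.findall(r"Hello(?=World)", text)
# The lookahead consumes nothing, so only "Hello" itself is replaced.
replaced = re.sub(r"Hello(?!World)", "Hi", text)

print(negative)  # ['Hello', 'Hello']
print(positive)  # ['Hello']
print(replaced)  # HelloWorld! Hi World! Hi
```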
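Example 2: match email addresses. No regular expression describes email addresses perfectly; see the Stack Overflow question Using a regular expression to validate an email address. Here we use [\w.-]+@[\w-]+\.[\w.-]+ as a simple approximation.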

    import re

    content = """
    alice@google.com
    alice-bob@gmail.._com gmail
    alice.bob@apple.com apple
    alice.bob@gmailcom invalid gmail
    """

    # Local part, "@", a domain label, a literal dot, then the rest of the domain.
    address = re.compile(r'[\w.-]+@[\w-]+\.[\w.-]+')
    print(address.findall(content))

    # Output:
    # ['alice@google.com', 'alice-bob@gmail.._com', 'alice.bob@apple.com']

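To check whether one whole string looks like an address, rather than to extract addresses from a longer text, the same pattern can be used with fullmatch. This is a sketch with made-up candidate strings; it also shows how permissive the pattern is.

```python
import re

address = re.compile(r"[\w.-]+@[\w-]+\.[\w.-]+")

candidates = [
    "alice@google.com",
    "alice-bob@gmail.._com",   # malformed domain, yet the loose pattern accepts it
    "alice.bob@gmailcom",      # no dot after the "@" part
    "see alice@google.com",    # extra text around the address
]

# fullmatch succeeds only if the entire string matches the pattern.
results = {c: bool(address.fullmatch(c)) for c in candidates}
for candidate, ok in results.items():
    print(candidate, ok)
# alice@google.com True
# alice-bob@gmail.._com True
# alice.bob@gmailcom False
# see alice@google.com False
```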
Example 3: add a decorator to a function.

    import re

    original = """
    def runaway():
        print("running away...")
    """

    # \1 re-inserts the captured "name():" after the new decorator line.
    pattern = re.compile(r"def (\w+\(\):)")
    wrapped = pattern.sub(r"@get_car\ndef \1", original)
    print(original, "--->", wrapped, "----", sep="")

    # Output:
    #
    # def runaway():
    #     print("running away...")
    # --->
    # @get_car
    # def runaway():
    #     print("running away...")
    # ----

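The pattern in Example 3 only handles functions without parameters at the top level. One possible variation, sketched below, also covers functions with parameters and indented methods by capturing the indentation and using re.MULTILINE. The Garage class and its park method are hypothetical sample input, and rewriting source code with regular expressions remains fragile (for example, a "def" at the start of a line inside a string literal would also be rewritten).

```python
import re

source = '''
def runaway():
    print("running away...")

class Garage:
    def park(self, car, slot=0):
        return slot
'''

# With re.MULTILINE, ^ matches at the start of every line. Group 1 captures
# the indentation so the decorator lines up with the def it decorates;
# group 2 captures "def name(" and is re-inserted unchanged.
pattern = re.compile(r"^([ \t]*)(def \w+\()", re.MULTILINE)
decorated = pattern.sub(r"\1@get_car\n\1\2", source)
print(decorated)
```

Both definitions receive an @get_car line, and the one inside the class keeps its four-space indentation.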