==================
Regular Expression
==================

.. contents:: Table of Contents
    :backlinks: none

Compare HTML tags
-----------------

+------------------+--------------+--------------+
| tag type         | format       | example      |
+==================+==============+==============+
| all tags         | <[^>]+>      | <br />, <a>  |
+------------------+--------------+--------------+
| open tag         | <[^/>][^>]*> | <a>, <table> |
+------------------+--------------+--------------+
| close tag        | </[^>]+>     | </p>, </a>   |
+------------------+--------------+--------------+
| self-closing tag | <[^/>]+/>    | <br />       |
+------------------+--------------+--------------+

.. code-block:: python

    >>> import re

    # open tag
    >>> re.search('<[^/>][^>]*>', '<table>') is not None
    True
    >>> re.search('<[^/>][^>]*>', '<a href="#label">') is not None
    True
    >>> re.search('<[^/>][^>]*>', '<img src="/img">') is not None
    True
    >>> re.search('<[^/>][^>]*>', '</table>') is not None
    False

    # close tag
    >>> re.search('</[^>]+>', '</table>') is not None
    True

    # self-closing tag
    >>> re.search('<[^/>]+/>', '<br />') is not None
    True

``re.findall()`` match string
-----------------------------

.. code-block:: python

    # split a string into words
    >>> import re
    >>> source = "Hello World Ker HAHA"
    >>> re.findall(r'[\w]+', source)
    ['Hello', 'World', 'Ker', 'HAHA']

    # parse the python.org home page (Python 3: urlopen lives in urllib.request)
    >>> from urllib.request import urlopen
    >>> s = urlopen('https://www.python.org')
    >>> html = s.read().decode('utf-8')  # read() returns bytes; decode before matching
    >>> s.close()
    >>> print("open tags")
    open tags
    >>> re.findall('<[^/>][^>]*>', html)[0:2]  # first two open tags; the result depends on the live page
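
The tag patterns and ``re.findall()`` can be combined without any network access. The following is a minimal sketch rather than a robust HTML parser: ``classify_tags`` and ``sample`` are hypothetical names introduced only for illustration, and the patterns are the ones listed in the tag table of this document.

.. code-block:: python

    import re

    # Patterns from the tag table in this document.
    ANY_TAG = re.compile(r'<[^>]+>')
    OPEN_TAG = re.compile(r'<[^/>][^>]*>')
    CLOSE_TAG = re.compile(r'</[^>]+>')
    SELF_CLOSING_TAG = re.compile(r'<[^/>]+/>')


    def classify_tags(html):
        """Group every tag found in html by type (hypothetical helper)."""
        groups = {'open': [], 'close': [], 'self_closing': []}
        for tag in ANY_TAG.findall(html):
            # Test self-closing first: the open-tag pattern also matches '<br />'.
            if SELF_CLOSING_TAG.fullmatch(tag):
                groups['self_closing'].append(tag)
            elif CLOSE_TAG.fullmatch(tag):
                groups['close'].append(tag)
            elif OPEN_TAG.fullmatch(tag):
                groups['open'].append(tag)
        return groups


    # Assumed sample input, chosen only to exercise each pattern once or twice.
    sample = '<div><a href="#top">top</a><br /></div>'
    result = classify_tags(sample)

For this sample, ``result['open']`` holds ``<div>`` and ``<a href="#top">``, ``result['close']`` holds ``</a>`` and ``</div>``, and ``result['self_closing']`` holds ``<br />``. Note that the self-closing pattern rejects any tag containing an earlier ``/``, so a tag such as ``<img src="/img" />`` would fall through to the open-tag group.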