===================
Regular Expressions
===================

.. contents:: Table of Contents
    :backlinks: none

Matching HTML Tags
------------------

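+------------------+------------------+----------------------+
| Tag type         | Pattern          | Examples             |
+==================+==================+======================+
| All tags         | ``<[^>]+>``      | ``<br />``, ``<a>``  |
+------------------+------------------+----------------------+
| Opening tag      | ``<[^/>][^>]*>`` | ``<a>``, ``<table>`` |
+------------------+------------------+----------------------+
| Closing tag      | ``</[^>]+>``     | ``</p>``, ``</a>``   |
+------------------+------------------+----------------------+
| Self-closing tag | ``<[^/>]+/>``    | ``<br />``           |
+------------------+------------------+----------------------+
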
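.. code-block:: python

    >>> import re

    # opening tag: matches '<table>' but not '</table>'
    >>> re.search(r'<[^/>][^>]*>', '<table>') is not None
    True
    >>> re.search(r'<[^/>][^>]*>', '</table>') is not None
    False

    # closing tag
    >>> re.search(r'</[^>]+>', '</table>') is not None
    True

    # self-closing tag
    >>> re.search(r'<[^/>]+/>', '<br />') is not None
    True
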
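
As a minimal sketch of how the patterns in the table above could be combined, the helper below classifies a single tag string. The name ``classify_tag`` and its return labels are hypothetical and chosen for illustration only; they are not part of the ``re`` module.

.. code-block:: python

    import re

    # Patterns copied from the table; the constant names are illustrative.
    SELF_CLOSING = re.compile(r'<[^/>]+/>')
    CLOSING = re.compile(r'</[^>]+>')
    OPENING = re.compile(r'<[^/>][^>]*>')

    def classify_tag(tag):
        """Return 'self-closing', 'closing', 'opening', or None for one tag string."""
        # Test self-closing first: '<br />' also satisfies the opening-tag pattern.
        if SELF_CLOSING.fullmatch(tag):
            return 'self-closing'
        if CLOSING.fullmatch(tag):
            return 'closing'
        if OPENING.fullmatch(tag):
            return 'opening'
        return None

    print(classify_tag('<table>'))     # opening
    print(classify_tag('</table>'))    # closing
    print(classify_tag('<br />'))      # self-closing
    print(classify_tag('plain text'))  # None

The order of the checks matters because the opening-tag pattern also accepts self-closing tags. Note that the self-closing pattern rejects any ``/`` inside the tag, so a tag such as ``<img src="/img" />`` would be reported as ``'opening'`` by this sketch.
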

Matching Strings with ``re.findall()``
--------------------------------------

.. code-block:: python

    # split a string into words
    >>> import re
    >>> source = "Hello World Ker HAHA"
    >>> re.findall(r'\w+', source)
    ['Hello', 'World', 'Ker', 'HAHA']

    # parse the python.org home page (Python 3; assumes the page is UTF-8)
    >>> from urllib.request import urlopen
    >>> s = urlopen('https://www.python.org')
    >>> html = s.read().decode('utf-8')
    >>> s.close()
    >>> print("open tags")
    open tags
    >>> re.findall(r'<[^/>][^>]*>', html)[0:2]
    ['', '