Python 3: json vs. regular expressions, a parsing speed test

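This evening a question suddenly occurred to me: when parsing JSON data, which is faster, a regular expression or the json package?

I wrote a short script to test it: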

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Author: loveNight
# @Date: 2015-10-28 19:59:24

import requests
import re
import json
import timeit

url = r"http://image.baidu.com/search/acjson?tn=resultjson_com&ipn=rj&ct=201326592&is=&fp=result&queryWord=%E9%95%BF%E8%80%85%E8%9B%A4&cl=2&lm=-1&ie=utf-8&oe=utf-8&adpicid=&st=&z=&ic=&word=%E9%95%BF%E8%80%85%E8%9B%A4&s=&se=&tab=&width=&height=&face=&istype=&qc=&nc=1&fr=&pn=60&rn=60&gsm=3c&1447527816768="
re_url = re.compile(r'"objURL":"(.*?)".*?"fromPageTitle":"(.*?)"')

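# The encoding of this JSON response is not detected correctly,
# so decode the raw bytes as UTF-8 explicitly
html = requests.get(url).content.decode("utf-8")


def byRe(html):
    # Extract (objURL, fromPageTitle) pairs straight from the text
    return re_url.findall(html)


def byJson(html):
    # Parse the whole document, then keep the entries that have both keys
    data = json.loads(html)
    return [[x["objURL"], x["fromPageTitle"]] for x in data["data"] if "objURL" in x and "fromPageTitle" in x]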


timeJson = timeit.timeit(
    "byJson(html)", "from __main__ import byJson; from __main__ import html", number=10000)
print("10,000 JSON parses:", timeJson, "seconds")

timeRe = timeit.timeit(
    "byRe(html)", "from __main__ import byRe; from __main__ import html", number=10000)
print("10,000 regex parses:", timeRe, "seconds")

Output:

10,000 JSON parses: 14.212283402610288 seconds
10,000 regex parses: 9.272848993345992 seconds
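
The numbers above depend on whatever the live URL returns at the time. As a rough sketch for rerunning the comparison without a network request, the version below times the same regular expression against a synthetic payload. The payload shape (60 items, an extra "width" field, a trailing empty object), the example.com URLs, and the names make_payload, by_re and by_json are all assumptions made for illustration, not the real response or the original functions.

```python
import json
import re
import timeit

# Same pattern as in the script above.
re_url = re.compile(r'"objURL":"(.*?)".*?"fromPageTitle":"(.*?)"')


def make_payload(n=60):
    """Build a compact JSON string loosely shaped like the response (synthetic data)."""
    items = [
        {"objURL": "http://example.com/%d.jpg" % i,
         "width": 640,
         "fromPageTitle": "title %d" % i}
        for i in range(n)
    ]
    # Assumption: one entry without the two keys, so the key check in by_json matters.
    items.append({})
    # Compact separators, so '"objURL":"' appears without a space, as the regex expects.
    return json.dumps({"data": items}, separators=(",", ":"))


def by_re(text):
    return re_url.findall(text)


def by_json(text):
    data = json.loads(text)
    return [(x["objURL"], x["fromPageTitle"])
            for x in data["data"]
            if "objURL" in x and "fromPageTitle" in x]


payload = make_payload()
# Sanity check: on this plain-ASCII payload both approaches extract the same pairs.
assert by_re(payload) == by_json(payload)

if __name__ == "__main__":
    for name, func in (("json", by_json), ("re", by_re)):
        # Take the best of several repeats to reduce noise from other processes.
        best = min(timeit.repeat(lambda: func(payload), number=1000, repeat=5))
        print(name, "best of 5 (1000 runs each):", best, "seconds")
```

Passing a callable to timeit.repeat avoids the "from __main__ import ..." setup string, and taking the minimum of several repeats is the usual way to damp timing noise. The absolute figures will differ from the ones above, since the payload is much smaller than a real response.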

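So the regular expression is faster on this input: roughly 9.3 seconds against 14.2 seconds for 10,000 runs. I am not familiar with how either one is implemented, so I can't take the analysis any further for now.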
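
One thing that can be checked without knowing the internals is that the two functions do not do quite the same work: json.loads decodes string escapes, while the regular expression returns the raw text between the quotes. The sketch below shows this on a made-up record; the record and the example.com URL are assumptions for illustration, not data from the real response.

```python
import json
import re

# Same pattern as in the script above.
re_url = re.compile(r'"objURL":"(.*?)".*?"fromPageTitle":"(.*?)"')

# Hypothetical record. With the default ensure_ascii=True, json.dumps writes
# the non-ASCII title as \uXXXX escapes.
raw = json.dumps(
    {"data": [{"objURL": "http://example.com/a.jpg", "fromPageTitle": "长者"}]},
    separators=(",", ":"),
)

from_re = re_url.findall(raw)[0][1]
from_json = json.loads(raw)["data"][0]["fromPageTitle"]

print(from_re)    # the escaped form, exactly as it appears in the text
print(from_json)  # the decoded title
```

So part of the time difference may simply be that the regex version skips decoding and skips building the rest of the document; if the titles are needed in decoded form, that step still has to happen somewhere.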
