Some pitfalls of the Python Requests library when handling responses
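Python's Requests library (http://docs.python-requests.org/en/latest/) is quite convenient for making http/https requests and is widely used.
However, some care is needed when handling the response. In short, the issue is the difference between the Response object's content and text properties. The relevant source code is: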
```python
# requests/models.py, class Response (excerpt from an older release)
@property
def content(self):
    """Content of the response, in bytes."""

    if self._content is False:
        # Read the contents.
        try:
            if self._content_consumed:
                raise RuntimeError(
                    'The content for this response was already consumed')

            if self.status_code == 0:
                self._content = None
            else:
                self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()

        except AttributeError:
            self._content = None

    self._content_consumed = True
    # don't need to release the connection; that's been handled by urllib3
    # since we exhausted the data.
    return self._content

@property
def text(self):
    """Content of the response, in unicode.

    if Response.encoding is None and chardet module is available, encoding
    will be guessed.
    """

    # Try charset from content-type
    content = None
    encoding = self.encoding

    if not self.content:
        return str('')

    # Fallback to auto-detected encoding.
    if self.encoding is None:
        encoding = self.apparent_encoding

    # Decode unicode from given encoding.
    try:
        content = str(self.content, encoding, errors='replace')
    except (LookupError, TypeError):
        # A LookupError is raised if the encoding was not found which could
        # indicate a misspelling or similar mistake.
        #
        # A TypeError can be raised if encoding is None
        #
        # So we try blindly encoding.
        content = str(self.content, errors='replace')

    return content

@property
def apparent_encoding(self):
    """The apparent encoding, provided by the lovely Charade library
    (Thanks, Ian!)."""
    return chardet.detect(self.content)['encoding']
```
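As you can see, text decodes the raw bytes returned by content into a string. When self.encoding is None, it first calls apparent_encoding, which runs chardet.detect over the entire response body. That detection step is what can make text much slower than content.
The response's encoding attribute is assigned in build_response of HTTPAdapter in adapters.py: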
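The decoding steps of text can be sketched without Requests. This is a simplified, hypothetical re-implementation: decode_body and fake_detect are illustrative names and are not part of Requests. The guess callback stands in for apparent_encoding, and it is called only when no encoding is known.

```python
def decode_body(content, encoding, guess):
    """Mimic the decoding steps of Response.text (simplified sketch)."""
    if not content:
        return ''
    if encoding is None:
        # In Requests this is apparent_encoding, i.e. chardet over the whole body.
        encoding = guess(content)
    try:
        return str(content, encoding, errors='replace')
    except (LookupError, TypeError):
        # Unknown codec name (LookupError) or a guess of None (TypeError):
        # fall back to str()'s default codec, UTF-8.
        return str(content, errors='replace')


calls = []

def fake_detect(data):
    calls.append(len(data))  # record that detection ran, and on how many bytes
    return 'utf-8'

body = 'café'.encode('utf-8')
print(decode_body(body, 'utf-8', fake_detect))       # café   (declared, no guessing)
print(decode_body(body, 'ISO-8859-1', fake_detect))  # cafÃ©  (wrong declared encoding)
print(decode_body(body, None, fake_detect))          # café   (guessed)
print(calls)                                         # [5]    (whole body was scanned)
```

Detection runs only on the path where encoding is None. A wrong declared encoding does not raise an error. It silently produces mojibake instead.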
```python
# requests/adapters.py, class HTTPAdapter (excerpt)
def build_response(self, req, resp):
    """Builds a :class:`Response <requests.Response>` object from a urllib3
    response. This should not be called from user code, and is only exposed
    for use when subclassing the
    :class:`HTTPAdapter <requests.adapters.HTTPAdapter>`

    :param req: The :class:`PreparedRequest <PreparedRequest>` used to generate the response.
    :param resp: The urllib3 response object.
    """
    response = Response()

    # Fallback to None if there's no status_code, for whatever reason.
    response.status_code = getattr(resp, 'status', None)

    # Make headers case-insensitive.
    response.headers = CaseInsensitiveDict(getattr(resp, 'headers', {}))

    # Set encoding.
    response.encoding = get_encoding_from_headers(response.headers)
    response.raw = resp
    response.reason = response.raw.reason

    if isinstance(req.url, bytes):
        response.url = req.url.decode('utf-8')
    else:
        response.url = req.url

    # Add new cookies from the server.
    extract_cookies_to_jar(response.cookies, req, resp)

    # Give the Response some context.
    response.request = req
    response.connection = self

    return response
```
The line response.encoding = get_encoding_from_headers(response.headers) above shows that encoding comes from parsing the response headers:
```python
# requests/utils.py (excerpt)
def get_encoding_from_headers(headers):
    """Returns encodings from given HTTP Header Dict.

    :param headers: dictionary to extract encoding from.
    """
    content_type = headers.get('content-type')
    if not content_type:
        return None
    content_type, params = cgi.parse_header(content_type)
    if 'charset' in params:
        return params['charset'].strip("'\"")
    if 'text' in content_type:
        return 'ISO-8859-1'
```
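The cgi module used above was deprecated in Python 3.11 and removed in 3.13. The sketch below needs no cgi and reproduces the same three-way rule so you can experiment with it. charset_from_content_type is a hypothetical name, and the parsing is deliberately simplified: it does not handle RFC 2231 parameters or quoted semicolons.

```python
def charset_from_content_type(content_type):
    """Follow the rule of get_encoding_from_headers above (simplified sketch)."""
    if not content_type:
        return None  # no Content-Type header at all
    mime_type, *params = content_type.split(';')
    for param in params:
        key, sep, value = param.partition('=')
        if sep and key.strip().lower() == 'charset':
            # Strip whitespace, then optional quotes, as the excerpt does.
            return value.strip().strip("'\"")
    if 'text' in mime_type:
        return 'ISO-8859-1'  # text/* without a charset: RFC 2616 default
    return None  # e.g. application/octet-stream: encoding stays None


for header in ['text/html; charset=utf-8', 'text/html; charset="GBK"',
               'text/html', 'application/octet-stream', None]:
    print(header, '->', charset_from_content_type(header))
```

Only the last two cases leave encoding as None. Those are the cases where text falls back to chardet.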
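To keep Requests from guessing the response encoding with chardet, use the text property with care. Read content directly and decode the bytes yourself as needed. Also note from get_encoding_from_headers that a text/* response without a charset gets 'ISO-8859-1' rather than None, which garbles pages actually encoded in UTF-8 or GBK. Detection only runs when encoding is None: the Content-Type header is missing, it is a non-text type without a charset, or encoding has been reset to None by hand, as the test below does.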
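The sketch below shows both workarounds and assumes Requests is installed. To stay offline, it builds Response objects by hand and sets the private _content attribute. That attribute is an implementation detail, used here only for illustration. CountingResponse and make_response are hypothetical helpers that count how often apparent_encoding runs.

```python
import requests


class CountingResponse(requests.Response):
    """Hypothetical subclass that counts how often encoding detection runs."""

    detections = 0

    @property
    def apparent_encoding(self):
        CountingResponse.detections += 1
        return super().apparent_encoding


def make_response(body, encoding):
    # Build a Response without a network request; _content is private API.
    r = CountingResponse()
    r.status_code = 200
    r._content = body
    r.encoding = encoding
    return r


body = '你好, requests'.encode('utf-8')

# 1. Decode content yourself: no detection involved.
r1 = make_response(body, None)
print(r1.content.decode('utf-8'))

# 2. Set encoding before reading text: detection is skipped.
r2 = make_response(body, 'utf-8')
print(r2.text)

# 3. encoding is None: text falls back to apparent_encoding.
r3 = make_response(body, None)
r3.text
print(CountingResponse.detections)  # only case 3 triggered detection
```

With a real response, the same applies: decode r.content yourself, or assign r.encoding before touching r.text.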
For a response whose server does not explicitly declare a charset, the difference between using text and content is as follows:
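Code:
```python
import time

import requests

print(time.time())
print('begin request')
r = requests.get('http://www.sina.com.cn')
# Erase the response encoding so that r.text must fall back to chardet
r.encoding = None
r.text
# r.content  # swap this in for r.text to time the bytes-only path
print('request end')
print(time.time())
```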
Time taken when using text: [screenshot of the timing output]
Time taken when using content: [screenshot of the timing output]
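The timestamps above include the network round trip. The sketch below isolates the cost of the properties themselves. With the default stream=False, requests.get has already read the whole body, so reading content afterwards is essentially a cached attribute access. Reading content first therefore means the text timing covers only encoding detection (if any) plus decoding. time_access is a hypothetical helper, not part of Requests.

```python
import time


def time_access(response):
    """Time reading .content and then .text on an already-fetched response."""
    timings = {}
    # content first: it is cached, so the text timing afterwards covers only
    # encoding detection (if any) plus decoding, not the download.
    for name in ('content', 'text'):
        start = time.perf_counter()
        getattr(response, name)
        timings[name] = time.perf_counter() - start
    return timings

# Usage, assuming Requests is installed and the URL is reachable:
#     r = requests.get('http://www.sina.com.cn')
#     r.encoding = None          # force the apparent_encoding fallback
#     print(time_access(r))
```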