Sending Large Batches of Requests with Python

Original work. Do not repost.

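Factors to consider when sending requests in large batches:

1. Server capacity (network bandwidth, hardware configuration);

2. Client-side I/O, client bandwidth, and hardware configuration;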

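Approach:

1. Any approach is relative to the situation;

2. In my case there was only one client machine, so a distributed setup was not an option, and the server's capacity turned out to be very limited (as repeated tuning showed);

3. I did not use JMeter here, although JMeter could do the job as well.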

Note: unless stated otherwise, the code below was run on Windows 7 64-bit / CentOS 6.5 64-bit with Python 3.6+.

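Python offers many ways to send requests in bulk. I will only cover the ones I have used:

1. Using grequests:

grequests can fire off a very large batch of requests in one go, but I have heard that it modifies the underlying socket communication, so it may be unstable or unsafe. It is also inconvenient when you need to check what each request sent against what came back, because it sends in bulk and then collects in bulk. Sample code: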

import grequests
import time
from collections import OrderedDict
import hashlib
import os
import xlrd
import json
import datetime


class voiceSearchInterface:

    @classmethod
    def formatUrlAndHeader(cls, des, singer, songName):
        # Logic that builds the url and header goes here
        return url, h

    @classmethod
    def exeRequests(cls):
        startTime = datetime.datetime.now()
        rightCount = 0
        errCount = 0
        descr = ["播放", "搜索", "搜", "听", "我要听", "我想听", "来一首", "来一个", "来一段", "来一曲", "来首", "来个", "来段", "来曲"]
        orgPath = os.path.join(os.path.dirname(os.getcwd()), "test", "SongsAndSingers", "singersAndSongs3.txt")
        i = 0
        urlsAndHs = []
        # Each source line is "singer<TAB>song"; build one request per search phrase
        with open(orgPath, "rb") as f:
            for line in f:
                temp = line.decode().split("\t")
                orgSinger = temp[0]
                orgSong = temp[1].replace("\n", "")
                for k in descr:
                    urlAndH = cls.formatUrlAndHeader(k, orgSinger, orgSong)
                    urlsAndHs.append(urlAndH)
        rs = (grequests.get(u[0], headers=u[1], stream=False) for u in urlsAndHs)
        # At most 20 requests in flight; responses are yielded as they complete
        rsText = grequests.imap(rs, size=20)
        with open("errorLog.txt", "w+") as errorLog:
            for r in rsText:
                i += 1
                try:
                    searchResult = json.loads(r.text)
                    searchItem = searchResult["data"]["searchitem"]
                    tt = searchItem.split("Split")
                    searchSinger = tt[1]
                    searchSong = tt[-1]
                    resultSinger = searchResult["data"]["sounds"][0]["singer"]
                    resultSong = searchResult["data"]["sounds"][0]["title"]
                    if searchSinger == resultSinger and searchSong == resultSong:
                        rightCount += 1
                    else:
                        errCount += 1
                        print(searchSinger, "\t", resultSinger, "\t", searchSong, "\t", resultSong)
                except Exception:
                    errCount += 1
                    errorLog.write((r.text + "\n").encode('latin-1').decode('unicode_escape'))
                print(i)
                # Overwrite the progress file with the number of source lines handled so far
                with open("Log.txt", "w") as executingLog:
                    executingLog.write(str(int(i / len(descr))))
        endTime = datetime.datetime.now()
        passRate = rightCount / i * 100 if i else 0.0
        print("Elapsed: %d s, correct: %d, errors: %d, total: %d, pass rate: %.2f%%" % ((endTime - startTime).seconds, rightCount, errCount, i, passRate))


voiceSearchInterface.exeRequests()

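Note: grequests may have pitfalls. Because it modifies the underlying socket communication, it could cause problems on the system. I have not run into any so far, but it is worth a friendly warning here.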
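One way to make each response checkable against the parameters that produced it is to keep the two attached while the request is in flight. The sketch below illustrates only that bookkeeping. It uses the standard library's concurrent.futures rather than grequests, and fake_send and send_all are hypothetical names: fake_send is a stand-in that echoes its input instead of making a real HTTP call.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_send(singer, song):
    # Hypothetical stand-in for the real HTTP call; it simply echoes its input.
    return {"singer": singer, "title": song}


def send_all(pairs, max_workers=20):
    """Send every (singer, song) pair and return (params, response) tuples."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the parameters that produced it.
        future_to_params = {pool.submit(fake_send, s, t): (s, t) for s, t in pairs}
        # Responses arrive in completion order, but each one still knows its request.
        for future in as_completed(future_to_params):
            results.append((future_to_params[future], future.result()))
    return results


if __name__ == "__main__":
    for params, response in send_all([("A", "song1"), ("B", "song2")]):
        print(params, response)
```

In a real run fake_send would be replaced by the actual request, and the comparison of singer and song could then be done per tuple.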

2. Using multiple processes + the requests library:

Python's multiprocessing library and the requests library are both excellent tools. Straight to the code:

# -*- coding: utf-8 -*-
import multiprocessing
import time
from collections import OrderedDict
import hashlib
import linecache
import os
import requests
import json


def formatUrlAndHeader(des, singer, songName):
    # Logic that builds the url and header goes here
    return url, h


# Each process reads its own file and records its progress by writing a file,
# as a safeguard against power loss or any other abnormal termination
def worker(fileName):
    descr = ["播放", "搜索", "搜", "听", "我要听", "我想听", "来一首", "来一个", "来一段", "来一曲", "来首", "来个", "来段", "来曲"]
    Logprefix = os.path.split(fileName)[1].replace(".txt", "")
    resultLogPath = os.path.join(os.getcwd(), "log", Logprefix + ".log")
    logbreakPoint = os.path.join(os.getcwd(), "log", Logprefix + ".txt")
    # The checkpoint file holds the 1-based line number to resume from
    with open(logbreakPoint, "r") as b:
        startLine = int(b.read())
    with open(fileName, "r", encoding="utf-8") as f:
        totalLines = len(f.readlines())
    with open(resultLogPath, "a+", encoding="utf-8") as logF:
        for LineNum in range(startLine, totalLines + 1):
            # Each source line is "singer<TAB>song"
            line = linecache.getline(fileName, LineNum).split("\t")
            singer = line[0]
            song = line[1].replace("\n", "")
            for i in descr:
                uAndH = formatUrlAndHeader(i, singer, song)
                try:
                    r = requests.get(url=uAndH[0], headers=uAndH[1])
                    # Record the line currently being processed
                    with open(logbreakPoint, "w") as w:
                        w.write(str(LineNum))
                    print("searching:%s, line: %d\n" % (fileName, LineNum))
                    result = json.loads(r.text)
                    resultSinger = result["data"]["sounds"][0]["singer"]
                    resultSong = result["data"]["sounds"][0]["title"]
                    if not (resultSinger == singer and resultSong == song):
                        logF.write("Error: search des: %s, singer:%s, song:%s;return: %s\n" % (i, singer, song, r.text.encode('latin-1').decode('unicode_escape')))
                except Exception as e:
                    logF.write("Error: search des: %s, singer:%s, song:%s;return: %s\n" % (i, singer, song, str(e).encode('latin-1').decode('unicode_escape')))


if __name__ == '__main__':
    orgPath = os.path.join(os.getcwd(), "data")
    files = os.listdir(orgPath)
    # One process per data source file
    for i in files:
        f = os.path.join(orgPath, i)
        if os.path.isfile(f):
            p = multiprocessing.Process(target=worker, args=(f,))
            p.start()

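The program starts one process per data source file. Each process reads its own data source file, calls formatUrlAndHeader to get the url and header, sends the requests one by one, and saves its current progress to a designated file. The advantage of this approach is that, for every request, you can compare the parameters sent with the response data received.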
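The progress file is what lets a process pick up where it left off after a crash, so it helps if it tolerates a missing file and is never left half-written. The following is a minimal sketch of that idea, not the code used in the test run; read_checkpoint and write_checkpoint are hypothetical helper names, and falling back to line 1 is an assumption.

```python
import os
import tempfile


def read_checkpoint(path, default=1):
    """Return the saved line number, or `default` if no usable checkpoint exists."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return int(f.read().strip())
    except (FileNotFoundError, ValueError):
        # No file yet, or the file is empty or corrupted: start from `default`.
        return default


def write_checkpoint(path, line_num):
    """Write the line number to a temp file, then rename it over the checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(str(line_num))
    # os.replace swaps the new file in as a single step.
    os.replace(tmp_path, path)
```

With helpers like these, a worker could call read_checkpoint once at startup and write_checkpoint after each source line, without having to create the progress file by hand first.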

3. Using async asyncio and aiohttp

asyncio joined the standard library in Python 3.4 and is the recommended approach, while aiohttp has to be installed separately. The code:

# -*- coding: utf-8 -*-
import aiohttp
import time
from collections import OrderedDict
import hashlib
import asyncio
import os
import linecache
import threading


def formatUrlAndHeader(des, singer, songName):
    # Logic that builds the url and header goes here
    return url, h


async def fetch_async(uandh):
    u, h = uandh[0], uandh[1]
    # Overall limit of 301 seconds per request
    timeout = aiohttp.ClientTimeout(total=301)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.get(u, headers=h) as r:
            data = await r.text()
            return data


async def fetch_batch(urlsAndHs):
    # Schedule every request in the batch and wait until all of them finish
    tasks = [asyncio.ensure_future(fetch_async(uandh)) for uandh in urlsAndHs]
    done, pending = await asyncio.wait(tasks)
    return done


loop = asyncio.new_event_loop()
descr = ["播放", "搜索", "搜", "听", "我要听", "我想听", "来一首", "来一个", "来一段", "来一曲", "来首", "来个", "来段", "来曲"]
orgPath = os.path.join(os.path.dirname(os.getcwd()), "test", "SongsAndSingers", "singersAndSongs3.txt")


def runRequests(startNum):
    start = time.time()
    urlsAndHs = []
    for i in range(20):
        line = linecache.getline(orgPath, startNum + i).split("\t")
        if len(line) < 2:
            # Past the end of the file, or a line without a tab
            continue
        orgSinger = line[0]
        orgSong = line[1].replace("\n", "")
        for k in descr:
            urlAndH = formatUrlAndHeader(k, orgSinger, orgSong)
            urlsAndHs.append(urlAndH)
    linecache.clearcache()
    if not urlsAndHs:
        return
    done = loop.run_until_complete(fetch_batch(urlsAndHs))
    for i in done:
        print(i.result().encode('latin-1').decode('unicode_escape'))
    end = time.time()
    print(end - start)


# Start lines 1, 21 and 41; each thread is joined before the next one starts
for i in range(1, 50, 20):
    t = threading.Thread(target=runRequests, args=(i,))
    t.start()
    t.join()

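One source data file, multiple threads. Each thread reads 20 consecutive lines of the file starting from the line number passed in and sends the requests for those lines as one batch (20 lines × 14 phrasings); the next thread does not start until the previous one has finished. This approach also sends in bulk and collects in bulk: the results come back as an unordered set, so it cannot compare each request's parameters with its response individually.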
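If per-request comparison is wanted with asyncio as well, one option is to return each url together with its response and to cap the number of requests in flight with a semaphore. This is a minimal sketch under those assumptions, not the code used in the test run: fake_fetch is a hypothetical stand-in for the aiohttp call, and fetch_all and run are illustrative names.

```python
import asyncio


async def fake_fetch(url):
    # Hypothetical stand-in for an aiohttp request; it yields control once and echoes the url.
    await asyncio.sleep(0)
    return "response for " + url


async def fetch_all(urls, limit=20):
    """Fetch every url with at most `limit` requests in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        async with semaphore:
            # Return the url alongside its response so the two stay paired.
            return url, await fake_fetch(url)

    # gather returns results in the same order as its arguments.
    return await asyncio.gather(*(bounded(u) for u in urls))


def run(urls, limit=20):
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(fetch_all(urls, limit))
    finally:
        loop.close()


if __name__ == "__main__":
    for url, text in run(["u1", "u2", "u3"], limit=2):
        print(url, text)
```

Because the semaphore already limits concurrency, the whole request list can be passed in at once instead of being split across sequential threads.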

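Any of the 3 approaches above met my testing needs. Observations from actual runs:

1. For a single request to the PHP interface, the pagesize parameter has a large effect on response time; the exact cause is unknown;

2. I/O-intensive operations are very CPU-hungry on the server. With all 3 approaches I was basically sending only about 20 requests at a time, yet the server's CPU (8 cores) was already fully loaded!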
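Since the server saturated at roughly 20 requests at a time, it can be handy to slice the full request list into fixed-size batches before handing them to whichever sender is in use. A small sketch of such a helper follows; batches is a hypothetical name, and the default size of 20 simply mirrors the figure observed above.

```python
from itertools import islice


def batches(items, size=20):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    for batch in batches(range(45), 20):
        print(len(batch))
```

Lowering size is then a one-line change when the server turns out to handle less than expected.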
