Simpler bulk operations for Redis, MongoDB, and MySQL: an automatic batch-task aggregator.

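1. The Python packages for Redis, MongoDB, and MySQL all provide bulk write operations, but the caller has to split, say, 1,000,001 tasks into batches of 1,000 and also handle the remainder left over after the last full batch. Doing this once is fine; doing it for many different tasks gets tedious.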

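Take Redis as an example (MongoDB is the same): you have to build the batch list yourself outside the client, and after the loop ends you must not forget the tasks that never filled a whole batch.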

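city_items is an iterator, fairly long, and it does not divide evenly into batches. Every time, just to cut the batches and cope with the remainder, you end up writing a pile of code like this: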

        task_dict_list = []  # buffer for the current batch
        for city_item in city_items:
            task_dict = OrderedDict()
            task_dict['city_cn'] = city_item.get('city')
            task_dict['city_en'] = city_item.get('cityEn')
            task_dict['is_international'] = is_international
            task_dict['url'] = url_city
            self.logger.debug(task_dict)
            task_dict_list.append(task_dict)
            # A full batch has accumulated: write it through one pipeline
            if len(task_dict_list) == 2000:
                self.logger.debug('Inserting 2000 city tasks')
                with self.redis_local_db7.pipeline(transaction=False) as p:
                    for task_dict in task_dict_list:
                        p.sadd(self.start_urls_key, json.dumps(task_dict))
                    p.execute()
                task_dict_list.clear()
        # Flush the remainder that did not fill a whole batch
        task_dict_list_length = len(task_dict_list)
        if task_dict_list_length > 0:
            self.logger.debug('Inserting {} city tasks'.format(task_dict_list_length))
            with self.redis_local_db7.pipeline(transaction=False) as p:
                for task_dict in task_dict_list:
                    p.sadd(self.start_urls_key, json.dumps(task_dict))
                p.execute()
            task_dict_list.clear()
        self.logger.debug(total_city_count)
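The batch-cutting and remainder handling in the snippet above can be separated from the Redis calls. The following is a minimal sketch, not part of the original code: `iter_batches` is a hypothetical helper name, and it relies only on `itertools.islice` from the standard library.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of at most batch_size items; the last list holds the remainder."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:  # the source is exhausted
            return
        yield batch


# Works on any iterable, including generators whose length is unknown.
batch_sizes = [len(batch) for batch in iter_batches(range(4500), 2000)]
print(batch_sizes)  # [2000, 2000, 500]
```

Each yielded batch could then be written through a single pipeline, so the remainder needs no separate code path. The caller still has to write the loop over batches, which is the inconvenience the rest of this post sets out to remove.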

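2. A simpler approach is this: outside the class, the caller only submits single tasks through one task-submission API, and the class aggregates multiple tasks into a batch internally. For high throughput you must insert many tasks per call, rather than use multiple threads that each insert one task at a time. The two differ greatly in efficiency, especially when writing to a remote host over a public IP.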
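To make the idea concrete, here is a minimal single-threaded sketch of such an aggregator. It is a simplification and not the author's implementation, which uses a queue drained by a background thread: `BatchAggregator` and `flush_func` are hypothetical names, and the "database" is just a Python list so the example runs without any server.

```python
from typing import Any, Callable, List


class BatchAggregator:
    """Buffer single tasks and hand them to flush_func in batches."""

    def __init__(self, flush_func: Callable[[List[Any]], None], threshold: int = 100):
        self._flush_func = flush_func
        self._threshold = threshold
        self._buffer: List[Any] = []

    def add_task(self, task: Any) -> None:
        """Submit one task; a batch is written once the threshold is reached."""
        self._buffer.append(task)
        if len(self._buffer) >= self._threshold:
            self.flush()

    def flush(self) -> None:
        """Write whatever is buffered, including a partial batch."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._flush_func(batch)

    def __enter__(self) -> "BatchAggregator":
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> bool:
        self.flush()  # write the remainder when the block ends
        return False


written_batches = []  # stand-in for a real bulk write
with BatchAggregator(written_batches.append, threshold=1000) as aggregator:
    for i in range(2500):
        aggregator.add_task(i)

print([len(batch) for batch in written_batches])  # [1000, 1000, 500]
```

The caller only ever calls `add_task`; the threshold check and the final partial batch are handled inside the class. In this sketch the remainder is flushed when the `with` block ends, whereas the original code relies on a timer and an `atexit` hook.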

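Below is a simple bulk-operation API for the three databases; usage is shown in the unittest class. All of the bulk operations are built on the native bulk APIs of Redis, MongoDB, and MySQL.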

# coding=utf8
"""
@author:Administrator
@file: bulk_operation.py
@time: 2018/08/27
Simpler batch operations for the three databases
"""
import atexit
from typing import Union
import abc
import time
from queue import Queue, Empty
import unittest
from pymongo import UpdateOne, InsertOne, collection, MongoClient
import redis
from app.utils_ydf import torndb_for_python3
from app.utils_ydf import LoggerMixin, decorators, LogManager, MongoMixin  # NOQA

class RedisOperation:
    """A Redis operation; this class mainly exists to standardize the format."""

    def __init__(self, operation_name: str, key: str, value: str):
        """
        :param operation_name: name of the Redis operation, e.g. sadd, lpush
        :param key: the Redis key
        :param value: the value for the Redis key
        """
        self.operation_name = operation_name
        self.key = key
        self.value = value

class BaseBulkHelper(LoggerMixin, metaclass=abc.ABCMeta):
    """Abstract base class for bulk operations."""
    bulk_helper_map = {}

    def __new__(cls, base_object, *args, **kwargs):
        if str(base_object) not in cls.bulk_helper_map:  # str() is used because instances of some types are unhashable and cannot be dict keys
            self = super().__new__(cls)
            return self
        else:
            return cls.bulk_helper_map[str(base_object)]

    def __init__(self, base_object: Union[collection.Collection, redis.Redis, torndb_for_python3.Connection], threshold: int = 100, is_print_log: bool = True):
        if str(base_object) not in self.bulk_helper_map:
            self._custom_init(base_object, threshold, is_print_log)
            self.bulk_helper_map[str(base_object)] = self

    def _custom_init(self, base_object, threshold, is_print_log):
        self.base_object = base_object
        self._threshold = threshold
        self._is_print_log = is_print_log
        self._to_be_request_queue = Queue(threshold * 2)
        self._current_time = time.time()
        atexit.register(self.__do_something_before_exit)  # run the registered function before the program exits
        self._main_thread_has_exit = False
        self.__excute_bulk_operation_in_other_thread()
        self.logger.debug(f'{self.__class__} instantiated')

    def add_task(self, base_operation: Union[UpdateOne, InsertOne, RedisOperation, tuple]):
        """Add a single operation to execute; operations are aggregated into batches automatically."""
        self._to_be_request_queue.put(base_operation)

    @decorators.tomorrow_threads(10)
    def __excute_bulk_operation_in_other_thread(self):
        while True:
            # Flush when a full batch is queued or 10 seconds have passed since the last flush
            if self._to_be_request_queue.qsize() >= self._threshold or time.time() > self._current_time + 10:
                self._do_bulk_operation()
            if self._main_thread_has_exit and self._to_be_request_queue.qsize() == 0:
                break
            time.sleep(10 ** -4)

    @abc.abstractmethod
    def _do_bulk_operation(self):
        raise NotImplementedError

    def __do_something_before_exit(self):
        self._main_thread_has_exit = True
        self.logger.critical(f'Executing the remaining tasks for  [{str(self.base_object)}]  before the program exits')

class MongoBulkWriteHelper(BaseBulkHelper):
    """
    A simpler MongoDB bulk write: submit one operation at a time and the helper aggregates them into a batch before writing, which is many times faster.
    """

    def _do_bulk_operation(self):
        if self._to_be_request_queue.qsize() > 0:
            t_start = time.time()
            count = 0
            request_list = []
            # Drain at most one batch worth of requests from the queue
            for _ in range(self._threshold):
                try:
                    request = self._to_be_request_queue.get_nowait()
                    count += 1
                    request_list.append(request)
                except Empty:
                    pass
            if request_list:
                self.base_object.bulk_write(request_list, ordered=False)
            if self._is_print_log:
                self.logger.info(f'[{str(self.base_object)}]  tasks written in this batch: {count}, time taken: {round(time.time() - t_start,6)}')
            self._current_time = time.time()

class RedisBulkWriteHelper(BaseBulkHelper):
    """Redis bulk write; handles batches that do not divide evenly more conveniently than the built-in pipeline."""

    def _do_bulk_operation(self):
        if self._to_be_request_queue.qsize() > 0:
            t_start = time.time()
            count = 0
            pipeline = self.base_object.pipeline()
            for _ in range(self._threshold):
                try:
                    request = self._to_be_request_queue.get_nowait()
                    count += 1
                except Empty:
                    pass
                else:
                    # Dispatch by operation name, e.g. pipeline.sadd(key, value)
                    getattr(pipeline, request.operation_name)(request.key, request.value)
            pipeline.execute()
            pipeline.reset()
            if self._is_print_log:
                self.logger.info(f'[{str(self.base_object)}]  tasks written in this batch: {count}, time taken: {round(time.time() - t_start,6)}')
            self._current_time = time.time()

class MysqlBulkWriteHelper(BaseBulkHelper):
    """MySQL bulk operations."""

    def __new__(cls, base_object: torndb_for_python3.Connection, *, sql_short: str = None, threshold: int = 100, is_print_log: bool = True):
        # print(cls.bulk_helper_map)
        if str(base_object) + sql_short not in cls.bulk_helper_map:  # str() is used because instances of some types are unhashable and cannot be dict keys
            self = object.__new__(cls)
            return self
        else:
            return cls.bulk_helper_map[str(base_object) + sql_short]

    def __init__(self, base_object: torndb_for_python3.Connection, *, sql_short: str = None, threshold: int = 100, is_print_log: bool = True):
        if str(base_object) + sql_short not in self.bulk_helper_map:
            super()._custom_init(base_object, threshold, is_print_log)
            self.sql_short = sql_short
            self.bulk_helper_map[str(self.base_object) + sql_short] = self

    def _do_bulk_operation(self):
        if self._to_be_request_queue.qsize() > 0:
            t_start = time.time()
            count = 0
            values_list = []
            for _ in range(self._threshold):
                try:
                    request = self._to_be_request_queue.get_nowait()
                    count += 1
                    values_list.append(request)
                except Empty:
                    pass
            if values_list:
                real_count = self.base_object.executemany_rowcount(self.sql_short, values_list)
                if self._is_print_log:
                    self.logger.info(f'[{str(self.base_object)}]  rows written in this batch: {real_count}, time taken: {round(time.time() - t_start,6)}')
                self._current_time = time.time()

class _Test(unittest.TestCase, LoggerMixin):
    @unittest.skip
    def test_mongo_bulk_write(self):
        # col = MongoMixin().mongo_16_client.get_database('test').get_collection('ydf_test2')
        col = MongoClient().get_database('test').get_collection('ydf_test2')
        with decorators.TimerContextManager():
            for i in range(50000 + 13):
                # time.sleep(0.01)
                item = {'_id': i, 'field1': i * 2}
                mongo_helper = MongoBulkWriteHelper(col, 10000, is_print_log=True)
                mongo_helper.add_task(UpdateOne({'_id': item['_id']}, {'$set': item}, upsert=True))

    @unittest.skip
    def test_redis_bulk_write(self):
        with decorators.TimerContextManager():
            r = redis.Redis(password='')
            # redis_helper = RedisBulkWriteHelper(r, 100)  # instantiating outside the loop works
            for i in range(100003):
                # time.sleep(0.2)
                redis_helper = RedisBulkWriteHelper(r, 2000)  # instantiating repeatedly here works too
                redis_helper.add_task(RedisOperation('sadd', 'key1', str(i)))

    # @unittest.skip
    def test_mysql_bulk_write(self):
        mysql_conn = torndb_for_python3.Connection(host='localhost', database='test', user='root', password='', charset='utf8')
        with decorators.TimerContextManager():
            # mysql_helper = MysqlBulkWriteHelper(mysql_conn, sql_short='INSERT INTO test.table_2 (column_1, column_2) VALUES (%s,%s)', threshold=200) # preferably placed outside the loop
            for i in range(100000 + 9):
                mysql_helper = MysqlBulkWriteHelper(mysql_conn, sql_short='INSERT INTO test.table_2 (column_1, column_2) VALUES (%s,%s)', threshold=20000, )  # repeated instantiation is supported, so accidentally putting it inside the loop is harmless
                mysql_helper.add_task((i, i * 2))

if __name__ == '__main__':
    unittest.main()

All three databases are used the same way: call the add_task method to submit one task at a time.

[Screenshot of the MySQL bulk operation]

3. The code mainly uses three design patterns: template method, flyweight, and proxy.

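The template method pattern saves code: when adding bulk operations for another kind of database, fewer methods need to be written. The strategy pattern could be used instead.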
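In the listing, the base class owns the queue and the flush loop, and each subclass supplies only `_do_bulk_operation`. The three subclasses still repeat the same queue-draining loop, so one possible refinement is to move that loop into the base class as well and leave only the write call as the hook. The sketch below illustrates this under assumed names (`DrainingBulkHelper`, `_write_batch`, `ListSinkHelper`); it is not the author's code, and a plain list stands in for a database.

```python
import abc
from queue import Empty, Queue
from typing import Any, List


class DrainingBulkHelper(abc.ABC):
    """Base class that fixes the overall flow; subclasses fill in the write step."""

    def __init__(self, threshold: int = 100):
        self._threshold = threshold
        self._queue: Queue = Queue()

    def add_task(self, task: Any) -> None:
        self._queue.put(task)

    def do_bulk_operation(self) -> int:
        """Template method: drain up to threshold tasks, then call the hook."""
        batch: List[Any] = []
        for _ in range(self._threshold):
            try:
                batch.append(self._queue.get_nowait())
            except Empty:
                break
        if batch:
            self._write_batch(batch)
        return len(batch)

    @abc.abstractmethod
    def _write_batch(self, batch: List[Any]) -> None:
        """Database-specific step, the only method a new backend must provide."""


class ListSinkHelper(DrainingBulkHelper):
    """Stand-in backend that records batches in a list instead of a database."""

    def __init__(self, threshold: int = 100):
        super().__init__(threshold)
        self.written: List[List[Any]] = []

    def _write_batch(self, batch: List[Any]) -> None:
        self.written.append(batch)


helper = ListSinkHelper(threshold=3)
for i in range(7):
    helper.add_task(i)
while helper.do_bulk_operation():  # keep flushing until the queue is empty
    pass
print(helper.written)  # [[0, 1, 2], [3, 4, 5], [6]]
```

With this split, supporting another database means writing one short `_write_batch` method.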

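The flyweight pattern means users do not have to carefully initialize the helper at one suitable place in the code and then keep reusing that object. Instances can be created anywhere, including inside a for loop.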
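The mechanism behind this is instance caching in `__new__`, combined with a guard in `__init__`, because Python runs `__init__` on every constructor call even when `__new__` returns an existing object. A minimal sketch follows, with the hypothetical class name `SharedHelper`. Unlike the listing, the cache key here also includes the class, which is a choice made for this example.

```python
class SharedHelper:
    """One instance per key; repeated construction returns the cached object."""

    _instances = {}

    def __new__(cls, key, threshold=100):
        cache_key = (cls, str(key))  # str() so unhashable objects can serve as keys
        instance = cls._instances.get(cache_key)
        if instance is None:
            instance = super().__new__(cls)
            instance._initialized = False
            cls._instances[cache_key] = instance
        return instance

    def __init__(self, key, threshold=100):
        if self._initialized:  # __init__ runs on every call, so guard it
            return
        self.key = key
        self.threshold = threshold
        self.tasks = []
        self._initialized = True


first = SharedHelper("redis://localhost/7", threshold=2000)
for i in range(3):
    SharedHelper("redis://localhost/7").tasks.append(i)  # same object each time

print(first.tasks)      # [0, 1, 2]
print(first.threshold)  # 2000
```

Note that the arguments of the first construction win: later calls with a different `threshold` return the existing instance unchanged. The sketch is also not thread-safe; a lock around the cache lookup would be needed if helpers were created from several threads.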

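The proxy pattern means users do not call the official pipeline, executemany, or bulk_write methods of the three databases directly; the helper object calls these official interfaces itself.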
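As a rough illustration of that delegation, the sketch below forwards operations by name, in the spirit of the `getattr(pipeline, request.operation_name)` call in `RedisBulkWriteHelper`. `RecordingPipeline` is a hypothetical stand-in that only records calls, so the example runs without a Redis server, and `PipelineProxy` is likewise an assumed name.

```python
from typing import Iterable, List, Tuple


class RecordingPipeline:
    """Hypothetical stand-in for a client pipeline; it only records calls."""

    def __init__(self) -> None:
        self.commands: List[Tuple[str, str, str]] = []
        self.execute_calls = 0

    def sadd(self, key: str, value: str) -> None:
        self.commands.append(("sadd", key, value))

    def lpush(self, key: str, value: str) -> None:
        self.commands.append(("lpush", key, value))

    def execute(self) -> None:
        self.execute_calls += 1


class PipelineProxy:
    """Callers describe operations as data; the proxy talks to the pipeline."""

    def __init__(self, pipeline: RecordingPipeline) -> None:
        self._pipeline = pipeline

    def submit(self, operations: Iterable[Tuple[str, str, str]]) -> None:
        for name, key, value in operations:
            getattr(self._pipeline, name)(key, value)  # dispatch by operation name
        self._pipeline.execute()  # one round trip for the whole batch


pipeline = RecordingPipeline()
PipelineProxy(pipeline).submit([("sadd", "key1", "1"), ("lpush", "key2", "2")])
print(pipeline.commands)       # [('sadd', 'key1', '1'), ('lpush', 'key2', '2')]
print(pipeline.execute_calls)  # 1
```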
