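Overview

How can a very large volume of data, for example tens or hundreds of millions of rows, be inserted quickly into a table that has indexes?

Data preparation

Prepare a table with twenty indexes.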


kingbase=# \d+ bigtab
Table "kingbase.bigtab"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+----------+--------------+-------------
id | integer | | | | plain | |
c01 | integer | | | | plain | |
c02 | integer | | | | plain | |
c03 | integer | | | | plain | |
c04 | integer | | | | plain | |
c05 | integer | | | | plain | |
c06 | integer | | | | plain | |
c07 | integer | | | | plain | |
c08 | integer | | | | plain | |
c09 | integer | | | | plain | |
c10 | integer | | | | plain | |
c11 | integer | | | | plain | |
c12 | integer | | | | plain | |
c13 | integer | | | | plain | |
c14 | integer | | | | plain | |
c15 | integer | | | | plain | |
c16 | integer | | | | plain | |
c17 | integer | | | | plain | |
c18 | integer | | | | plain | |
c19 | integer | | | | plain | |
c20 | integer | | | | plain | |
c21 | integer | | | | plain | |
c22 | integer | | | | plain | |
c23 | integer | | | | plain | |
c24 | integer | | | | plain | |
c25 | integer | | | | plain | |
c26 | integer | | | | plain | |
c27 | integer | | | | plain | |
c28 | integer | | | | plain | |
c29 | integer | | | | plain | |
t01 | text | | | | extended | |
t02 | text | | | | extended | |
t03 | text | | | | extended | |
t04 | text | | | | extended | |
t05 | text | | | | extended | |
t06 | text | | | | extended | |
t07 | text | | | | extended | |
t08 | text | | | | extended | |
t09 | text | | | | extended | |
t10 | text | | | | extended | |
t11 | text | | | | extended | |
t12 | text | | | | extended | |
t13 | text | | | | extended | |
t14 | text | | | | extended | |
t15 | text | | | | extended | |
t16 | text | | | | extended | |
t17 | text | | | | extended | |
t18 | text | | | | extended | |
t19 | text | | | | extended | |
t20 | text | | | | extended | |
Indexes:
"bigtab_i01" btree (c01)
"bigtab_i02" btree (c02)
"bigtab_i03" btree (c03)
"bigtab_i04" btree (c04)
"bigtab_i05" btree (c05)
"bigtab_i06" btree (c06)
"bigtab_i07" btree (c07)
"bigtab_i08" btree (c08)
"bigtab_i09" btree (c09)
"bigtab_i10" btree (c10)
"bigtab_i11" btree (c11)
"bigtab_i12" btree (c12)
"bigtab_i13" btree (c13)
"bigtab_i14" btree (c14)
"bigtab_i15" btree (c15)
"bigtab_i16" btree (c16)
"bigtab_i17" btree (c17)
"bigtab_i18" btree (c18)
"bigtab_i19" btree (c19)
"bigtab_i20" btree (c20)
Access method: heap
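
The post shows only the `\d+` output, not the DDL behind it. The following is a minimal sketch of how a table of this shape could be created. It is not the author's script: it assumes PostgreSQL-style PL/pgSQL with `format()` and `lpad()` available in the KingbaseES session, and it generates the column and index names seen in the listing above.

```sql
do
$$
declare
    cols text := 'id integer';   -- column list, built up piece by piece
    n    text;                   -- zero-padded suffix such as '01'
begin
    -- c01 .. c29: integer columns
    for i in 1..29 loop
        n := lpad(i::text, 2, '0');
        cols := cols || format(', c%s integer', n);
    end loop;

    -- t01 .. t20: text columns
    for i in 1..20 loop
        n := lpad(i::text, 2, '0');
        cols := cols || format(', t%s text', n);
    end loop;

    execute format('create table bigtab (%s)', cols);

    -- bigtab_i01 .. bigtab_i20: one btree index on each of c01 .. c20
    for i in 1..20 loop
        n := lpad(i::text, 2, '0');
        execute format('create index bigtab_i%s on bigtab (c%s)', n, n);
    end loop;
end;
$$;
```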

Method 1: Insert the data directly and let the indexes be maintained automatically



kingbase=#
kingbase=# insert into bigtab
kingbase-# select id
kingbase-# , (random() * 100)::int + 1000 c01
kingbase-# , (random() * 200)::int + 1000 c02
kingbase-# , (random() * 300)::int + 10000 c03
kingbase-# , (random() * 400)::int + 10000 c04
kingbase-# , (random() * 500)::int + 10000 c05
kingbase-# , (random() * 600)::int + 10000 c06
kingbase-# , (random() * 700)::int + 10000 c07
kingbase-# , (random() * 800)::int + 10000 c08
kingbase-# , (random() * 900)::int + 10000 c09
kingbase-# , (random() * 1000)::int + 10000 c10
kingbase-# , (random() * 2000)::int + 10000 c11
kingbase-# , (random() * 3000)::int + 10000 c12
kingbase-# , (random() * 4000)::int + 10000 c13
kingbase-# , (random() * 5000)::int + 10000 c14
kingbase-# , (random() * 6000)::int + 10000 c15
kingbase-# , (random() * 7000)::int + 10000 c16
kingbase-# , (random() * 8000)::int + 10000 c17
kingbase-# , (random() * 9000)::int + 10000 c18
kingbase-# , (random() * 10000)::int + 10000 c19
kingbase-# , (random() * 20000)::int + 10000 c20
kingbase-# , (random() * 30000)::int + 10000 c21
kingbase-# , (random() * 40000)::int + 10000 c22
kingbase-# , (random() * 50000)::int + 10000 c23
kingbase-# , (random() * 60000)::int + 10000 c24
kingbase-# , (random() * 70000)::int + 10000 c25
kingbase-# , (random() * 80000)::int + 10000 c26
kingbase-# , (random() * 90000)::int + 10000 c27
kingbase-# , (random() * 10000)::int + 10000 c28
kingbase-# , (random() * 10000)::int + 10000 c29
kingbase-# , md5(random()::text) t01
kingbase-# , md5(random()::text) t02
kingbase-# , md5(random()::text) t03
kingbase-# , md5(random()::text) t04
kingbase-# , md5(random()::text) t05
kingbase-# , md5(random()::text) t06
kingbase-# , md5(random()::text) t07
kingbase-# , md5(random()::text) t08
kingbase-# , md5(random()::text) t09
kingbase-# , md5(random()::text) t10
kingbase-# , md5(random()::text) t11
kingbase-# , md5(random()::text) t12
kingbase-# , md5(random()::text) t13
kingbase-# , md5(random()::text) t14
kingbase-# , md5(random()::text) t15
kingbase-# , md5(random()::text) t16
kingbase-# , md5(random()::text) t17
kingbase-# , md5(random()::text) t18
kingbase-# , md5(random()::text) t19
kingbase-# , md5(random()::text) t20
kingbase-# from generate_series(1, 2000000) id;
INSERT 0 2000000
Time: 299331.143 ms (04:59.331)

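Advantages: a single statement; the indexes are maintained automatically; indexes added to the table later are covered without any change.

Disadvantages: the indexes are maintained row by row, so the load is slow (about 5 minutes for 2,000,000 rows in this test).

Method 2: Drop the indexes, insert the data, then recreate the indexes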


kingbase=#
kingbase=# do
kingbase-# $$
kingbase$# begin
kingbase$# drop index bigtab_i01;
kingbase$# drop index bigtab_i02;
kingbase$# drop index bigtab_i03;
kingbase$# drop index bigtab_i04;
kingbase$# drop index bigtab_i05;
kingbase$# drop index bigtab_i06;
kingbase$# drop index bigtab_i07;
kingbase$# drop index bigtab_i08;
kingbase$# drop index bigtab_i09;
kingbase$# drop index bigtab_i10;
kingbase$# drop index bigtab_i11;
kingbase$# drop index bigtab_i12;
kingbase$# drop index bigtab_i13;
kingbase$# drop index bigtab_i14;
kingbase$# drop index bigtab_i15;
kingbase$# drop index bigtab_i16;
kingbase$# drop index bigtab_i17;
kingbase$# drop index bigtab_i18;
kingbase$# drop index bigtab_i19;
kingbase$# drop index bigtab_i20;
kingbase$#
kingbase$# insert into bigtab
kingbase$# select id
kingbase$# , (random() * 100)::int + 1000 c01
kingbase$# , (random() * 200)::int + 1000 c02
kingbase$# , (random() * 300)::int + 10000 c03
kingbase$# , (random() * 400)::int + 10000 c04
kingbase$# , (random() * 500)::int + 10000 c05
kingbase$# , (random() * 600)::int + 10000 c06
kingbase$# , (random() * 700)::int + 10000 c07
kingbase$# , (random() * 800)::int + 10000 c08
kingbase$# , (random() * 900)::int + 10000 c09
kingbase$# , (random() * 1000)::int + 10000 c10
kingbase$# , (random() * 2000)::int + 10000 c11
kingbase$# , (random() * 3000)::int + 10000 c12
kingbase$# , (random() * 4000)::int + 10000 c13
kingbase$# , (random() * 5000)::int + 10000 c14
kingbase$# , (random() * 6000)::int + 10000 c15
kingbase$# , (random() * 7000)::int + 10000 c16
kingbase$# , (random() * 8000)::int + 10000 c17
kingbase$# , (random() * 9000)::int + 10000 c18
kingbase$# , (random() * 10000)::int + 10000 c19
kingbase$# , (random() * 20000)::int + 10000 c20
kingbase$# , (random() * 30000)::int + 10000 c21
kingbase$# , (random() * 40000)::int + 10000 c22
kingbase$# , (random() * 50000)::int + 10000 c23
kingbase$# , (random() * 60000)::int + 10000 c24
kingbase$# , (random() * 70000)::int + 10000 c25
kingbase$# , (random() * 80000)::int + 10000 c26
kingbase$# , (random() * 90000)::int + 10000 c27
kingbase$# , (random() * 10000)::int + 10000 c28
kingbase$# , (random() * 10000)::int + 10000 c29
kingbase$# , md5(random()::text) t01
kingbase$# , md5(random()::text) t02
kingbase$# , md5(random()::text) t03
kingbase$# , md5(random()::text) t04
kingbase$# , md5(random()::text) t05
kingbase$# , md5(random()::text) t06
kingbase$# , md5(random()::text) t07
kingbase$# , md5(random()::text) t08
kingbase$# , md5(random()::text) t09
kingbase$# , md5(random()::text) t10
kingbase$# , md5(random()::text) t11
kingbase$# , md5(random()::text) t12
kingbase$# , md5(random()::text) t13
kingbase$# , md5(random()::text) t14
kingbase$# , md5(random()::text) t15
kingbase$# , md5(random()::text) t16
kingbase$# , md5(random()::text) t17
kingbase$# , md5(random()::text) t18
kingbase$# , md5(random()::text) t19
kingbase$# , md5(random()::text) t20
kingbase$# from generate_series(1, 2000000) id;
kingbase$#
kingbase$# create index bigtab_i01 on bigtab (c01);
kingbase$# create index bigtab_i02 on bigtab (c02);
kingbase$# create index bigtab_i03 on bigtab (c03);
kingbase$# create index bigtab_i04 on bigtab (c04);
kingbase$# create index bigtab_i05 on bigtab (c05);
kingbase$# create index bigtab_i06 on bigtab (c06);
kingbase$# create index bigtab_i07 on bigtab (c07);
kingbase$# create index bigtab_i08 on bigtab (c08);
kingbase$# create index bigtab_i09 on bigtab (c09);
kingbase$# create index bigtab_i10 on bigtab (c10);
kingbase$# create index bigtab_i11 on bigtab (c11);
kingbase$# create index bigtab_i12 on bigtab (c12);
kingbase$# create index bigtab_i13 on bigtab (c13);
kingbase$# create index bigtab_i14 on bigtab (c14);
kingbase$# create index bigtab_i15 on bigtab (c15);
kingbase$# create index bigtab_i16 on bigtab (c16);
kingbase$# create index bigtab_i17 on bigtab (c17);
kingbase$# create index bigtab_i18 on bigtab (c18);
kingbase$# create index bigtab_i19 on bigtab (c19);
kingbase$# create index bigtab_i20 on bigtab (c20);
kingbase$#
kingbase$# end;
kingbase$# $$;
ANONYMOUS BLOCK
Time: 83069.170 ms (01:23.069)

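Advantages: the indexes are built in bulk, giving the shortest elapsed time (about 83 seconds in this test).

Disadvantages: the statements are complex and hard-coded; the DROP INDEX and CREATE INDEX statements must be maintained by hand; indexes added to the table later are not covered.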
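
One possible way to soften the hard-coding drawback is to read the index definitions from the catalog before dropping them. The sketch below is an illustration, not part of the original test. It assumes the PostgreSQL-compatible `pg_index` catalog (which the post itself queries) and the `pg_get_indexdef()` function are available. It does not handle indexes that back primary-key or unique constraints, because those cannot be removed with DROP INDEX.

```sql
do
$$
declare
    idx_names text[];   -- names of the indexes on bigtab
    idx_defs  text[];   -- their full CREATE INDEX statements
    stmt      text;
begin
    -- Capture the name and definition of every index on bigtab.
    select array_agg(indexrelid::regclass::text),
           array_agg(pg_get_indexdef(indexrelid))
      into idx_names, idx_defs
      from pg_index
     where indrelid = 'bigtab'::regclass;

    -- Drop them all so the bulk insert does no per-row index maintenance.
    foreach stmt in array coalesce(idx_names, '{}'::text[]) loop
        execute 'drop index ' || stmt;
    end loop;

    -- Placeholder: the large INSERT ... SELECT from Method 2 goes here.

    -- Recreate every index from its saved definition.
    foreach stmt in array coalesce(idx_defs, '{}'::text[]) loop
        execute stmt;
    end loop;
end;
$$;
```

Because the definitions are read at run time, an index created on the table later is picked up without editing the block.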

Method 3: Disable index maintenance, insert the data, then rebuild all indexes on the table


kingbase=# do
kingbase-# $$
kingbase$# begin
kingbase$#
kingbase$# update pg_index
kingbase$# set indislive= false
kingbase$# where indrelid = 'bigtab'::regclass;
kingbase$#
kingbase$# insert into bigtab
kingbase$# select id
kingbase$# , (random() * 100)::int + 1000 c01
kingbase$# , (random() * 200)::int + 1000 c02
kingbase$# , (random() * 300)::int + 10000 c03
kingbase$# , (random() * 400)::int + 10000 c04
kingbase$# , (random() * 500)::int + 10000 c05
kingbase$# , (random() * 600)::int + 10000 c06
kingbase$# , (random() * 700)::int + 10000 c07
kingbase$# , (random() * 800)::int + 10000 c08
kingbase$# , (random() * 900)::int + 10000 c09
kingbase$# , (random() * 1000)::int + 10000 c10
kingbase$# , (random() * 2000)::int + 10000 c11
kingbase$# , (random() * 3000)::int + 10000 c12
kingbase$# , (random() * 4000)::int + 10000 c13
kingbase$# , (random() * 5000)::int + 10000 c14
kingbase$# , (random() * 6000)::int + 10000 c15
kingbase$# , (random() * 7000)::int + 10000 c16
kingbase$# , (random() * 8000)::int + 10000 c17
kingbase$# , (random() * 9000)::int + 10000 c18
kingbase$# , (random() * 10000)::int + 10000 c19
kingbase$# , (random() * 20000)::int + 10000 c20
kingbase$# , (random() * 30000)::int + 10000 c21
kingbase$# , (random() * 40000)::int + 10000 c22
kingbase$# , (random() * 50000)::int + 10000 c23
kingbase$# , (random() * 60000)::int + 10000 c24
kingbase$# , (random() * 70000)::int + 10000 c25
kingbase$# , (random() * 80000)::int + 10000 c26
kingbase$# , (random() * 90000)::int + 10000 c27
kingbase$# , (random() * 10000)::int + 10000 c28
kingbase$# , (random() * 10000)::int + 10000 c29
kingbase$# , md5(random()::text) t01
kingbase$# , md5(random()::text) t02
kingbase$# , md5(random()::text) t03
kingbase$# , md5(random()::text) t04
kingbase$# , md5(random()::text) t05
kingbase$# , md5(random()::text) t06
kingbase$# , md5(random()::text) t07
kingbase$# , md5(random()::text) t08
kingbase$# , md5(random()::text) t09
kingbase$# , md5(random()::text) t10
kingbase$# , md5(random()::text) t11
kingbase$# , md5(random()::text) t12
kingbase$# , md5(random()::text) t13
kingbase$# , md5(random()::text) t14
kingbase$# , md5(random()::text) t15
kingbase$# , md5(random()::text) t16
kingbase$# , md5(random()::text) t17
kingbase$# , md5(random()::text) t18
kingbase$# , md5(random()::text) t19
kingbase$# , md5(random()::text) t20
kingbase$# from generate_series(1, 2000000) id;
kingbase$#
kingbase$# update pg_index
kingbase$# set indislive= true
kingbase$# where indrelid = 'bigtab'::regclass;
kingbase$#
kingbase$# analyse bigtab;
kingbase$# reindex table bigtab;
kingbase$#
kingbase$# end;
kingbase$# $$;
ANONYMOUS BLOCK
Time: 87110.126 ms (01:27.110)

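Advantages: the indexes are rebuilt in bulk, so the load is fast (about 87 seconds in this test); the statements follow a fixed pattern; the indexes are maintained automatically; indexes added to the table later are covered.

Disadvantages: several SQL statements are involved, which makes the approach awkward to embed in a statement block.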

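Closing notes

REINDEX TABLE depends on statistics, so the table must be analysed first (the `analyse bigtab` step above); only then are all of the table's updatable indexes rebuilt successfully.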

REINDEX INDEX is not affected by this. It can force a rebuild of an index that is not being updated, and it automatically sets indislive = true.
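
As an illustration only, rebuilding one index by name and then checking its catalog flags might look like this. `bigtab_i01` is one of the indexes from the table listing; the flag columns are those of the PostgreSQL-compatible `pg_index` catalog, assumed here to be exposed unchanged by KingbaseES.

```sql
-- Rebuild a single index by name.
reindex index bigtab_i01;

-- Inspect the index's catalog flags afterwards.
select indexrelid::regclass as index_name,
       indisvalid,
       indisready,
       indislive
  from pg_index
 where indexrelid = 'bigtab_i01'::regclass;
```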

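If an error occurs during REINDEX, every index that needed rebuilding is left in the invalid state: the indexes still occupy space and their definitions remain, but they cannot be used.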
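
A query along the following lines could be used to spot such indexes after a failed rebuild and to see how much space they still hold. It is a sketch, assuming the PostgreSQL-compatible `pg_index` catalog and the `pg_relation_size()` and `pg_size_pretty()` functions are available.

```sql
-- List indexes on bigtab that are marked invalid, with their on-disk size.
select indexrelid::regclass                         as index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) as index_size,
       pg_get_indexdef(indexrelid)                  as definition
  from pg_index
 where indrelid = 'bigtab'::regclass
   and not indisvalid;
```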

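To avoid errors during REINDEX, skip unique indexes, indexes that foreign keys depend on, and similar indexes when changing the index update flag.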
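
One way to express that filter, sketched under the assumption that the `pg_index` flag columns behave as in PostgreSQL: restrict the catalog update from Method 3 to indexes that do not enforce a constraint. A foreign key always references a primary-key or unique constraint, so excluding unique indexes also leaves those supporting indexes untouched.

```sql
-- Before the bulk insert: stop maintaining only the plain (non-constraint) indexes.
update pg_index
   set indislive = false
 where indrelid = 'bigtab'::regclass
   and not indisprimary
   and not indisunique
   and not indisexclusion;

-- After the bulk insert: switch the same set of indexes back on.
update pg_index
   set indislive = true
 where indrelid = 'bigtab'::regclass
   and not indisprimary
   and not indisunique
   and not indisexclusion;
```

Unique and primary-key indexes stay live during the load, so uniqueness is still enforced row by row for them.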
