Reposted from:

http://venublog.com/2007/11/07/load-data-infile-performance/

I have often noticed people complaining about LOAD DATA performance when loading a table with a large number of rows. Even today I saw a case where LOAD DATA on a simple 3-column table with about 5 million rows took ~15 minutes. This is because the server had not been tuned for bulk insertion.

Consider the following simple MyISAM table on Red Hat Linux 32-bit.

CREATE TABLE load1 (
  `col1` varchar(100) NOT NULL default '',
  `col2` int(11) default NULL,
  `col3` char(1) default NULL,
  PRIMARY KEY  (`col1`)
) ENGINE=MyISAM;
 

The table has a string key column. Here is the data file I used for testing (a download link is in the original post):

[vanugant@escapereply:t55 tmp]$ wc loaddata.csv
  5164946   5164946 227257389 loaddata.csv
[vanugant@escapereply:t55 tmp]$ ls -alh loaddata.csv
-rw-r--r--  1 vanugant users 217M Nov  6 14:42 loaddata.csv
[vanugant@escapereply:t55 tmp]$
 

Here are the default MySQL system variables related to LOAD DATA:

mysql> show variables;
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
| bulk_insert_buffer_size | 8388608  |
| myisam_sort_buffer_size | 16777216 |
| key_buffer_size         | 33554432 |
+-------------------------+----------+
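
The raw byte counts are easier to compare once converted to mebibytes. The following is a minimal Python sketch of that arithmetic; the `to_mib` helper is a name introduced here for illustration, and the values are copied from the output above.

```python
# Values copied from the SHOW VARIABLES output above (in bytes).
defaults = {
    "bulk_insert_buffer_size": 8388608,
    "myisam_sort_buffer_size": 16777216,
    "key_buffer_size": 33554432,
}

MIB = 1024 * 1024  # bytes per mebibyte


def to_mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes (hypothetical helper)."""
    return n_bytes / MIB


for name, value in defaults.items():
    print(f"{name}: {to_mib(value):g} MiB")
```

So the defaults are 8, 16 and 32 MiB respectively, all far smaller than the 217M input file.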
 

And here is the actual LOAD DATA query that loads all ~5M rows (~217M of data) into the table, with its timing:

mysql> LOAD DATA INFILE '/home/vanugant/tmp/loaddata.csv' IGNORE INTO TABLE load1 FIELDS TERMINATED BY ',';
Query OK, 4675823 rows affected (14 min 56.84 sec)
Records: 5164946  Deleted: 0  Skipped: 489123  Warnings: 0
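
The counts add up: 5164946 records minus 489123 skipped gives the 4675823 rows affected. With IGNORE, a row whose key duplicates an existing one is discarded rather than raising an error, so the skipped rows are presumably duplicate col1 values in the file. Below is a small Python sketch of that behavior. It is a simplification: `load_ignore` is a hypothetical name, the sample rows are invented, and keys are compared as exact strings, whereas MySQL compares them using the column's collation.

```python
import csv
import io


def load_ignore(lines):
    """Mimic LOAD DATA ... IGNORE for a table keyed on the first field."""
    table = {}
    skipped = 0
    for row in csv.reader(lines):
        key = row[0]
        if key in table:
            skipped += 1  # duplicate key: discard this row, keep the first one
        else:
            table[key] = row[1:]
    return table, skipped


# Invented three-row sample; the third row repeats the key "a".
sample = io.StringIO("a,1,x\nb,2,y\na,3,z\n")
table, skipped = load_ignore(sample)
print(len(table), "rows loaded,", skipped, "skipped")
```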
 

Now, let's experiment by raising bulk_insert_buffer_size and disabling the keys in the table before running the LOAD DATA:

mysql> SET SESSION BULK_INSERT_BUFFER_SIZE=314572800;
Query OK, 0 rows affected (0.00 sec)
 
mysql> alter table load1 disable keys;
Query OK, 0 rows affected (0.00 sec)
 
mysql> LOAD DATA INFILE '/home/vanugant/tmp/loaddata.csv' IGNORE INTO TABLE load1 FIELDS TERMINATED BY ',';
Query OK, 4675823 rows affected (13 min 47.50 sec)
Records: 5164946  Deleted: 0  Skipped: 489123  Warnings: 0
 

No use: only about 8% faster (13 min 47.50 sec versus 14 min 56.84 sec). Now let's set realistic MyISAM values and try again:

mysql> SET SESSION BULK_INSERT_BUFFER_SIZE=256217728;
Query OK, 0 rows affected (0.00 sec)
 
mysql> set session MYISAM_SORT_BUFFER_SIZE=256217728;
Query OK, 0 rows affected (0.00 sec)
 
mysql> set global KEY_BUFFER_SIZE=256217728;
Query OK, 0 rows affected (0.05 sec)
 
mysql> alter table load1 disable keys;
Query OK, 0 rows affected (0.00 sec)
 
mysql> LOAD DATA INFILE '/home/vanugant/tmp/loaddata.csv' IGNORE INTO TABLE load1 FIELDS TERMINATED BY ',';
Query OK, 4675823 rows affected (1 min 55.05 sec)
Records: 5164946  Deleted: 0  Skipped: 489123  Warnings: 0
 
mysql> alter table load1 enable keys;
Query OK, 0 rows affected (0.00 sec)
 

Wow, that's almost a 90% reduction in load time (1 min 55.05 sec versus 14 min 56.84 sec). So disabling the keys in MyISAM is not the whole story; tuning the buffer sizes also plays a role, depending on the input data.
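
To make the comparison concrete, here is a hedged Python sketch that parses the timings reported by the mysql client and computes how much load time each MyISAM run saved relative to the untuned baseline. The helper names `to_seconds` and `reduction_pct` are introduced here for illustration; the timings are the ones reported above.

```python
import re


def to_seconds(elapsed: str) -> float:
    """Parse a mysql client timing such as '14 min 56.84 sec' into seconds."""
    m = re.fullmatch(r"(?:(\d+) min )?(\d+(?:\.\d+)?) sec", elapsed)
    if m is None:
        raise ValueError(f"unrecognized timing: {elapsed!r}")
    minutes = int(m.group(1) or 0)
    return minutes * 60 + float(m.group(2))


def reduction_pct(before: str, after: str) -> float:
    """Percentage of load time saved going from `before` to `after`."""
    b, a = to_seconds(before), to_seconds(after)
    return (b - a) / b * 100


baseline = "14 min 56.84 sec"  # default settings
runs = {
    "bulk_insert_buffer_size + disable keys": "13 min 47.50 sec",
    "all three buffers + disable keys": "1 min 55.05 sec",
}
for label, timing in runs.items():
    print(f"{label}: {reduction_pct(baseline, timing):.1f}% less time")
```

The first experiment saves under a tenth of the load time, while the second saves close to nine tenths.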

For the same case with InnoDB, here is the result after setting innodb_buffer_pool_size=1G and innodb_log_file_size=256M, along with innodb_flush_log_at_trx_commit=1:

mysql> show variables like '%innodb%size';
+---------------------------------+------------+
| Variable_name                   | Value      |
+---------------------------------+------------+
| innodb_additional_mem_pool_size | 26214400   |
| innodb_buffer_pool_size         | 1073741824 |
| innodb_log_buffer_size          | 8388608    |
| innodb_log_file_size            | 268435456  |
+---------------------------------+------------+
 
mysql> LOAD DATA INFILE '/home/vanugant/tmp/loaddata.csv' IGNORE INTO TABLE load1 FIELDS TERMINATED BY ',';
Query OK, 4675823 rows affected (2 min 37.53 sec)
Records: 5164946  Deleted: 0  Skipped: 489123  Warnings: 0
 

With innodb_flush_log_at_trx_commit=2, innodb_flush_method=O_DIRECT and innodb_doublewrite=0, the load takes about 28% less time, which is roughly 40% more throughput (use all these variables with caution, unless you know what you are doing):

mysql> LOAD DATA INFILE '/home/vanugant/tmp/loaddata.csv' IGNORE INTO TABLE load1 FIELDS TERMINATED BY ',';
Query OK, 4675823 rows affected (1 min 53.69 sec)
Records: 5164946  Deleted: 0  Skipped: 489123  Warnings: 0
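
The two InnoDB runs can also be compared as throughput. This is a minimal Python sketch of the arithmetic using the row count and timings reported above; `rows_per_second` is a helper name introduced here for illustration.

```python
ROWS_LOADED = 4675823  # "rows affected" reported by both InnoDB runs


def rows_per_second(rows: int, minutes: int, seconds: float) -> float:
    """Average load throughput for a run lasting `minutes` min `seconds` sec."""
    return rows / (minutes * 60 + seconds)


durable = rows_per_second(ROWS_LOADED, 2, 37.53)  # innodb_flush_log_at_trx_commit=1
relaxed = rows_per_second(ROWS_LOADED, 1, 53.69)  # relaxed flush and doublewrite settings

print(f"durable settings: {durable:,.0f} rows/sec")
print(f"relaxed settings: {relaxed:,.0f} rows/sec")
print(f"throughput ratio: {relaxed / durable:.2f}x")
```

The ratio comes out just under 1.4, so the relaxed settings buy a real but much smaller gain than the MyISAM buffer tuning did.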
