Can a single table really be exported in parallel with Oracle Data Pump's PARALLEL parameter?
Oracle Data Pump (expdp/impdp) is a server-side tool for logical export and import, commonly used by DBAs for data migration: from database A to database B, from schema A to schema B, and so on.
This raises a question:
On Oracle 11.2.0.4, when a single table is exported with the PARALLEL parameter, is the export truly parallel? With PARALLEL=2, are there really two processes, each exporting half of the data?
1. Test: export two different tables with PARALLEL=2
$ expdp scott/tiger directory=dump dumpfile=D%U.dmp parallel=2 cluster=n tables=dept,emp
Total estimation using BLOCKS method: ... KB
. . exported "SCOTT"."DEPT"  5.929 KB  4 rows
. . exported "SCOTT"."EMP"   8.562 KB  14 rows
Dump file set for SCOTT.SYS_EXPORT_TABLE_01 is:
/home/oracle/D01.dmp
/home/oracle/D02.dmp
One of the two files is large and the other small — they are not equal-sized halves at all. And since dump files are binary, there is no way to see what each one actually contains, so inspecting the dump files directly is a dead end.
2. Reading the MOS note
Parallel Capabilities of Oracle Data Pump (Doc ID 365459.1):

"For every export operation, Data Pump estimates how much disk space each table data object in the export job will consume (in bytes). This is done whether or not the user uses the ESTIMATE parameter. The estimate is printed in the log file and displayed on the client's standard output device. The estimate is for table row data only; it does not include metadata. This estimate is used to determine how many PX processes will be applied to the table data object, if any."

For every table exported, Data Pump estimates its size, counting row data only, and uses that estimate to decide how many PX processes (if any) to apply to it.

"The columns of the tables are examined to determine if direct path, external tables, or both methods can be used. For direct path, the parallel number for the table data object is always one since direct path does not support parallel unload of a table data object. PX processes are only used with external tables."

Data Pump has two unload methods, direct path and external tables. Direct path always runs with a degree of one; only the external tables method allows a parallel unload.

"If the external tables method is chosen, Data Pump will determine the maximum number of PX processes that can work on a table data object. It does this by dividing the estimated size of the table data object by 250 MB and rounding the result down. If the result is zero or one, then PX processes are not used to unload the table."

With external tables, the maximum number of PX processes for a table data object is its estimated size divided by 250 MB, rounded down; if the result is zero or one, no PX processes are used.

"If a job is not big enough to make use of the maximum parallel number, then the user will not see the maximum number of active workers and Parallel Execution Processes. For example, if there is one 800 MB table, and it has been determined that external tables will be used, there will be one worker for the metadata, one worker for the data, and three PX processes. As mentioned above, the worker process for the data acts as the coordinator for the PX processes and does not count toward the parallel total. So, if a user specifies PARALLEL=10, the degree of parallelism is actually four. The user will only see one active worker in the STATUS display. Data Pump is working optimally; the job is too small for the specified degree of parallelism."

This last paragraph is confusing at first read; the tests below will make it concrete.
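The sizing rule from the note can be sketched as a tiny calculation (this is only the arithmetic restated from the note, not Oracle code; the function name is my own):

```python
import math

def px_processes(estimated_mb: float) -> int:
    """Maximum PX processes Data Pump applies to one table data object
    under the external-tables method, per Doc ID 365459.1: the estimated
    size divided by 250 MB, rounded down; a result of 0 or 1 means no
    PX processes are used at all."""
    n = math.floor(estimated_mb / 250)
    return n if n > 1 else 0  # 0 or 1 -> unloaded without PX processes

# The note's example: one 800 MB table -> 3 PX processes, so the
# effective degree is 1 metadata worker + 3 PX = 4.
print(px_processes(800))   # 3
print(px_processes(824))   # 3 (the test table used below)
print(px_processes(249))   # 0 -> no PX
```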
3. Test: an ~800 MB non-partitioned table exported with PARALLEL=10 — what happens?
A side question first: is the export size estimated from the table's optimizer statistics, or from DBA_SEGMENTS?
SQL> select sum(bytes)/1024/1024 from dba_segments where owner='SCOTT' and segment_name='A';

SUM(BYTES)/1024/1024
--------------------
                 824

SQL> select owner,table_name,NUM_ROWS,BLOCKS*8/1024,SAMPLE_SIZE from dba_tables where owner='SCOTT' and table_name='A';

OWNER      TABLE_NAME                       NUM_ROWS BLOCKS*8/1024 SAMPLE_SIZE
---------- ------------------------------ ---------- ------------- -----------
SCOTT      A

(No statistics have been gathered on SCOTT.A, so NUM_ROWS, BLOCKS*8/1024 and SAMPLE_SIZE are all empty.)
The export:
$ expdp scott/tiger directory=dump dumpfile=D%U.dmp parallel=10 tables=a
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 824 MB
Processing object type TABLE_EXPORT/TABLE/TABLE
. . exported "SCOTT"."A"  708.3 MB  ... rows
******************************************************
Dump file set for SCOTT.SYS_EXPORT_TABLE_01 is:
/u01/dump/D01.dmp
/u01/dump/D02.dmp
/u01/dump/D03.dmp
Job "SCOTT"."SYS_EXPORT_TABLE_01" successfully completed at Wed Aug 5 ... 2020 elapsed 0 00:00:11

$ ls -lrt D*.dmp
-rw------- oracle oinstall ... Aug  5 ... D03.dmp
-rw------- oracle oinstall ... Aug  5 ... D02.dmp
-rw------- oracle oinstall ... Aug  5 ... D01.dmp

While the job was running, attach to it to check the worker status:
$ expdp \'/ as sysdba\' attach=SYS_EXPORT_TABLE_01
Job: SYS_EXPORT_TABLE_01
Owner: SCOTT
Operation: EXPORT
Creator Privs: TRUE
GUID: AC156DF4AC940F1DE053453CA8C0F1FA
Start Time: Wednesday, August 5, 2020 ...
Mode: TABLE
Instance: tt
Max Parallelism: 10
EXPORT Job Parameters:
Parameter Name Parameter Value:
CLIENT_COMMAND scott/******** directory=dump dumpfile=D%U.dmp parallel=10 tables=a
State: EXECUTING
Bytes Processed: 0
Current Parallelism: 10
Job Error Count: 0
Dump File: /u01/dump/D01.dmp
bytes written: 4,096
Dump File: /u01/dump/D%u.dmp

Worker 1 Status:
Process Name: DW00
State: EXECUTING
Object Schema: SCOTT
Object Name: A
Object Type: TABLE_EXPORT/TABLE/TABLE_DATA
Completed Objects: 1
Total Objects: 1
Worker Parallelism: 1

Worker 2 Status:
Process Name: DW01
State: EXECUTING
Object Schema: SCOTT
Object Name: A
Object Type: TABLE_EXPORT/TABLE/TABLE_DATA
Completed Objects: 1
Total Objects: 1
Worker Parallelism: 3

Answering my own question: Data Pump calls this value an estimate, but it is in fact taken straight from DBA_SEGMENTS:
Total estimation using BLOCKS method: 824 MB
Combined with the last paragraph of the MOS note, this now makes sense. PARALLEL=10 was specified, but effectively only two main worker processes are used: one exports the metadata and, once that is done, helps push a little data into the dump files; the other is in charge of the data and, per the formula, gets three PX helper processes to speed up the unload — 824/250 = 3, rounded down.
This also confirms, from another angle, that for a non-partitioned table a parallel export does not fan out into many extra worker processes.
Which raises the next question: for a single-table export, does the PARALLEL parameter actually make things faster or not?
With PARALLEL=10, the job ran in 11 seconds. Without PARALLEL:
$ expdp scott/tiger directory=dump dumpfile=D1%U.dmp tables=a
******************************************************************************
Dump file set for SCOTT.SYS_EXPORT_TABLE_01 is:
/u01/dump/D101.dmp
Job "SCOTT"."SYS_EXPORT_TABLE_01" successfully completed at Wed Aug 5 06:12:40 2020 elapsed 0 00:00:15
15 s versus 11 s — hardly any difference. So for a single table, PARALLEL does not deliver a meaningful speedup.
4. Test: a partitioned table with 800 MB of data — how does a parallel export perform?
create table RANGE_PART_TAB(id number,
deal_date date, area_code number, contents varchar2(4000))  -- the varchar2 length was lost from the original post; 4000 is a representative value
partition by range(deal_date)
(
partition p1 values less than(to_date('2020-02-01','yyyy-mm-dd')),
partition p2 values less than(to_date('2020-03-01','yyyy-mm-dd')),
partition p3 values less than(to_date('2020-04-01','yyyy-mm-dd')),
partition p4 values less than(to_date('2020-05-01','yyyy-mm-dd')),
partition p5 values less than(to_date('2020-06-01','yyyy-mm-dd')),
partition p6 values less than(to_date('2020-07-01','yyyy-mm-dd')),
partition p7 values less than(to_date('2020-08-01','yyyy-mm-dd')),
partition p8 values less than(to_date('2020-09-01','yyyy-mm-dd')),
partition p9 values less than(to_date('2020-10-01','yyyy-mm-dd')),
partition p10 values less than(to_date('2020-11-01','yyyy-mm-dd')));

-- The numeric literals in this INSERT were lost from the original post;
-- the values below are representative ones that spread rows across the partitions.
insert into range_part_tab (id,deal_date,area_code,contents)
select rownum,
to_date(to_char(to_date('20200101','yyyymmdd'),'J') + trunc(dbms_random.value(0,300)),'J'),
ceil(dbms_random.value(1,100)),
rpad('*',400,'*')
from dual
connect by rownum <= 100000;

SQL> r   (re-run the self-insert several times to multiply the data)
  1* insert into range_part_tab select * from range_part_tab

... rows created.

SQL> commit;
SQL> select sum(bytes)/1024/1024 from dba_segments where owner='SCOTT' and segment_name='RANGE_PART_TAB';

SUM(BYTES)/1024/1024
--------------------
                 ...

SQL> select PARTITION_NAME,sum(bytes)/1024/1024 from dba_segments where owner='SCOTT' and segment_name='RANGE_PART_TAB' group by PARTITION_NAME order by 1;

PARTITION_NAME                 SUM(BYTES)/1024/1024
------------------------------ --------------------
P1                             ...
P10                            ...
P2                             ...
P3                             ...
P4                             ...
P5                             ...
P6                             ...
P7                             ...
P8                             ...
P9                             ...

10 rows selected.

(The sizes were lost from the original post; at this point each partition is well under 250 MB.)
[oracle@test dump]$ expdp scott/tiger directory=dump dumpfile=D2%U.dmp tables=RANGE_PART_TAB parallel=10

Worker 1 Status:
Process Name: DW00
State: EXECUTING
Object Schema: SCOTT
Object Name: RANGE_PART_TAB
Object Type: TABLE_EXPORT/TABLE/TABLE_DATA
Completed Objects: ...
Total Objects: ...
Completed Rows: ...
Completed Bytes: ...
Percent Done: ...
Worker Parallelism: 1

Worker 2 Status:
Process Name: DW01
State: WORK WAITING
******************************************************************************
Dump file set for SCOTT.SYS_EXPORT_TABLE_01 is:
/u01/dump/D201.dmp
/u01/dump/D202.dmp
/u01/dump/D203.dmp
/u01/dump/D204.dmp
/u01/dump/D205.dmp
/u01/dump/D206.dmp
/u01/dump/D207.dmp
/u01/dump/D208.dmp
/u01/dump/D209.dmp
Job "SCOTT"."SYS_EXPORT_TABLE_01" successfully completed at Wed Aug 5 ... elapsed ...

[oracle@test dump]$ expdp scott/tiger directory=dump dumpfile=D3%U.dmp tables=RANGE_PART_TAB
Job "SCOTT"."SYS_EXPORT_TABLE_01" successfully completed at Wed Aug 5 ... elapsed ...

The test proved nothing of the sort! There were not ten main worker processes each taking one partition, as I had imagined — a single worker exported the partitions serially, one after another. Naive of me. Could it be the 250 MB per-segment threshold again? I grew the test table by several multiples, and sure enough, 250 MB is the threshold: once each partition segment exceeds 250 MB, the parallelism kicks in for real and the export runs with ten main worker processes.
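What was observed can be condensed into a small model (this summarizes only the behavior seen in these 11.2.0.4 tests, not documented Oracle behavior; the function name and threshold handling are my own):

```python
def active_workers(partition_sizes_mb, requested_parallel):
    """Model of the observed expdp behavior for a range-partitioned table:
    when every partition was under 250 MB, one worker unloaded them
    serially; once partitions exceeded 250 MB, each got its own worker,
    capped by the requested PARALLEL value."""
    big = sum(1 for size in partition_sizes_mb if size >= 250)
    if big == 0:
        return 1  # all partitions small -> one worker exports them serially
    return min(big, requested_parallel)

# First partitioned test: ten ~80 MB partitions -> one serial worker.
print(active_workers([80] * 10, 10))   # 1
# After growing the table: ten ~300 MB partitions -> ten workers.
print(active_workers([300] * 10, 10))  # 10
```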
SQL> select PARTITION_NAME,sum(bytes)/1024/1024 from dba_segments where owner='SCOTT' and segment_name='RANGE_PART_TAB' group by PARTITION_NAME;
PARTITION_NAME SUM(BYTES)/1024/1024
------------------------------ --------------------
P7 309
P4 304
P1 312
P2 288
P3 304
P6 296
P10 264
P9 300
P8 312
P5 312
10 rows selected.
[oracle@test dump]$ expdp scott/tiger directory=dump dumpfile=D4%U.dmp tables=RANGE_PART_TAB parallel=10
Worker 1 Status:
Process Name: DW00
State: EXECUTING
Object Schema: SCOTT
Object Name: RANGE_PART_TAB
Object Type: TABLE_EXPORT/TABLE/TABLE_DATA
Completed Objects: 1
Total Objects: 10
Worker Parallelism: 1
Worker 2 Status:
Process Name: DW01
State: EXECUTING
Object Schema: SCOTT
Object Name: RANGE_PART_TAB
Object Type: TABLE_EXPORT/TABLE/TABLE_DATA
Completed Objects: 1
Total Objects: 10
Worker Parallelism: 1
Worker 3 Status:
Process Name: DW02
State: EXECUTING
Object Schema: SCOTT
Object Name: RANGE_PART_TAB
Object Type: TABLE_EXPORT/TABLE/TABLE_DATA
Completed Objects: 1
Total Objects: 10
Worker Parallelism: 1
Worker 4 Status:
Process Name: DW03
State: EXECUTING
Object Schema: SCOTT
Object Name: RANGE_PART_TAB
Object Type: TABLE_EXPORT/TABLE/TABLE_DATA
Completed Objects: 1
Total Objects: 10
Worker Parallelism: 1
Worker 5 Status:
Process Name: DW04
State: EXECUTING
Object Schema: SCOTT
Object Name: RANGE_PART_TAB
Object Type: TABLE_EXPORT/TABLE/TABLE_DATA
Completed Objects: 1
Total Objects: 10
Worker Parallelism: 1
Worker 6 Status:
Process Name: DW05
State: EXECUTING
Object Schema: SCOTT
Object Name: RANGE_PART_TAB
Object Type: TABLE_EXPORT/TABLE/TABLE_DATA
Completed Objects: 1
Total Objects: 10
Worker Parallelism: 1
Worker 7 Status:
Process Name: DW06
State: EXECUTING
Object Schema: SCOTT
Object Name: RANGE_PART_TAB
Object Type: TABLE_EXPORT/TABLE/TABLE_DATA
Completed Objects: 1
Total Objects: 10
Worker Parallelism: 1
Worker 8 Status:
Process Name: DW07
State: EXECUTING
Object Schema: SCOTT
Object Name: RANGE_PART_TAB
Object Type: TABLE_EXPORT/TABLE/TABLE_DATA
Completed Objects: 1
Total Objects: 10
Worker Parallelism: 1
Worker 9 Status:
Process Name: DW08
State: EXECUTING
Object Schema: SCOTT
Object Name: RANGE_PART_TAB
Object Type: TABLE_EXPORT/TABLE/TABLE_DATA
Completed Objects: 1
Total Objects: 10
Worker Parallelism: 1
Worker 10 Status:
Process Name: DW09
State: EXECUTING
Object Schema: SCOTT
Object Name: RANGE_PART_TAB
Object Type: TABLE_EXPORT/TABLE/TABLE_DATA
Completed Objects: 1
Total Objects: 10
Worker Parallelism: 1
******************************************************************************
Dump file set for SCOTT.SYS_EXPORT_TABLE_01 is:
/u01/dump/D401.dmp
/u01/dump/D402.dmp
/u01/dump/D403.dmp
/u01/dump/D404.dmp
/u01/dump/D405.dmp
/u01/dump/D406.dmp
/u01/dump/D407.dmp
/u01/dump/D408.dmp
/u01/dump/D409.dmp
/u01/dump/D410.dmp
Job "SCOTT"."SYS_EXPORT_TABLE_01" successfully completed at Wed Aug 5 06:37:20 2020 elapsed 0 00:00:40
[oracle@test dump]$ expdp scott/tiger directory=dump dumpfile=D5%U.dmp tables=RANGE_PART_TAB
For comparison, time the same export without PARALLEL:
Job "SCOTT"."SYS_EXPORT_TABLE_01" successfully completed at Wed Aug 5 06:39:14 2020 elapsed 0 00:01:21
PARALLEL=10 versus no PARALLEL: 40 s versus 81 s, roughly 1:2. The saving is substantial, and the bigger the data, the more pronounced the effect.
Summary:
1. For a single table smaller than 250 MB, PARALLEL is meaningless; the effective degree is always 1.
2. For a single table with PARALLEL=2, the first worker essentially exports the metadata and the second exports the data; after the metadata is done, the first worker keeps feeding some data into the dump files, but there is almost no real parallel effect.
3. For a large non-partitioned table, a high PARALLEL value only adds a few PX helper processes to the unload; the gain was not rigorously measured here, but the initial comparison showed no noticeable effect.
4. For a partitioned table, a high PARALLEL value only pays off when individual partitions exceed 250 MB; if the partitions are all small, it is effectively the same as a serial export. The larger the data volume, the more real the parallelism becomes.
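The summary points can be pulled together into one rough model (again, only a sketch of what these 11.2.0.4 tests showed, not an official formula; pass partition_sizes_mb=None for a non-partitioned table):

```python
import math

def effective_degree(size_mb, requested_parallel, partition_sizes_mb=None):
    """Rough effective parallelism of an expdp of a single table, per the
    tests above: a non-partitioned table gets 1 metadata worker plus
    floor(size/250) PX processes (when that is more than 1); a partitioned
    table gets one worker per partition over 250 MB, capped by PARALLEL."""
    if partition_sizes_mb is None:
        px = math.floor(size_mb / 250)
        return 1 + (px if px > 1 else 0)
    workers = sum(1 for size in partition_sizes_mb if size >= 250)
    return max(1, min(workers, requested_parallel))

print(effective_degree(824, 10))                 # 4: 1 worker + 3 PX
print(effective_degree(800, 10, [80] * 10))      # 1: small partitions, serial
print(effective_degree(3000, 10, [300] * 10))    # 10: one worker per partition
```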