11g包dbms_parallel_execute在海量数据处理过程中的应用
11g包dbms_parallel_execute在海量数据处理过程中的应用
一.1 BLOG文档结构图
一.2 前言部分
一.2.1 导读
各位技术爱好者,看完本文后,你可以掌握如下的技能,也可以学到一些其它你所不知道的知识,~O(∩_∩)O~:
① 11g包dbms_parallel_execute在海量数据处理过程中的应用
注意:本篇BLOG中代码部分需要特别关注的地方我都用黄色背景和红色字体来表示,比如下边的例子中,thread 1的最大归档日志号为33,thread 2的最大归档日志号为43是需要特别关注的地方。
List of Archived Logs in backup set 11
Thrd Seq Low SCN Low Time Next SCN Next Time
---- ------- ---------- ------------------- ---------- ---------
1 32 1621589 2015-05-29 11:09:52 1625242 2015-05-29 11:15:48
1 33 1625242 2015-05-29 11:15:48 1625293 2015-05-29 11:15:58
2 42 1613951 2015-05-29 10:41:18 1625245 2015-05-29 11:15:49
2 43 1625245 2015-05-29 11:15:49 1625253 2015-05-29 11:15:53
本文如有错误或不完善的地方请大家多多指正,ITPUB留言或QQ皆可,您的批评指正是我写作的最大动力。
一.2.2 实验环境介绍
11.2.0.1 RHEL6.5
一.2.3 相关参考文章链接
http://blog.itpub.net/26736162/viewspace-1684095/ |
|
http://blog.itpub.net/26736162/viewspace-1683913/ |
|
http://blog.itpub.net/26736162/viewspace-1683912/ |
一.2.4 本文简介
一个朋友own_my要处理批量数据,但是脚本跑的太慢了,于是网上搜到了dbms_parallel_execute这个包,用完后给我说这个包非常强大,于是我也学习学习,关于优化一直是我喜欢的内容,在参考了大神realkid4 的blog后,我自己也做了做实验,感觉很强大,记录在此。
一.3 相关知识点扫盲
参考大神的blog:http://blog.itpub.net/17203031/
一.4 实验部分
一.4.1 实验目标
测试dbms_parallel_execute包在海量数据处理过程中的应用。
一.4.2 实验过程
[oracle@etlhost206 ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.1.0 Production on Wed Jun 3 13:40:34 2015
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> CONN LHR/lhr
Connected.
SQL> CREATE TABLE T AS SELECT * FROM DBA_OBJECTS;
Table created.
SQL> insert into t select * from t;
76369 rows created.
SQL> insert into t select * from t;
152738 rows created.
SQL> insert into t select * from t;
305476 rows created.
SQL> COMMIT;
Commit complete.
SQL> insert into t select * from t;
610952 rows created.
SQL> insert into t select * from t;
1221904 rows created.
SQL> insert into t select * from t;
2443808 rows created.
SQL> insert into t select * from t;
4887616 rows created.
SQL> COMMIT;
Commit complete.
SQL> insert into t select * from t;
9775232 rows created.
SQL> COMMIT;
Commit complete.
SQL> insert into t select * from t;
19550464 rows created.
SQL> COMMIT;
Commit complete.
SQL> select bytes/1024/1024 from dba_segments a where a.segment_name='T';
BYTES/1024/1024
---------------
4341
SQL> SELECT COUNT(1) FROM T;
COUNT(1)
----------
39100928
SQL> show parameter job
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
job_queue_processes integer 1000
SQL> show parameter cpu
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
cpu_count integer 8
parallel_threads_per_cpu integer 2
resource_manager_cpu_allocation integer 8
SQL> set timing on
SQL> set time on;
15:50:01 SQL>
15:50:02 SQL> show parameter job
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
job_queue_processes integer 1000
15:50:09 SQL> select bytes/1024/1024 from dba_segments a where a.segment_name='T';
BYTES/1024/1024
---------------
4341
Elapsed: 00:00:00.41
15:50:31 SQL> declare
15:50:39 2 vc_task varchar2(100);
15:50:39 3 vc_sql varchar2(1000);
15:50:39 4 n_try number;
15:50:39 5 n_status number;
15:50:39 6 begin
15:50:39 7 --Define the Task
15:50:39 8 vc_task := 'Task 1: By Rowid'; --Task名称
15:50:39 9 dbms_parallel_execute.create_task(task_name => vc_task); --手工定义一个Task任务;
15:50:39 10
15:50:39 11 --Define the Spilt
15:50:39 12 dbms_parallel_execute.create_chunks_by_rowid(task_name => vc_task,
15:50:39 13 table_owner => 'LHR',
15:50:39 14 table_name => 'T',
15:50:39 15 by_row => true,
15:50:39 16 chunk_size => 10000); --定义Chunk
15:50:39 17
15:50:39 18 vc_sql := 'update /*+ ROWID(dda) */ t set DATA_OBJECT_ID=object_id+1 where rowid between :start_id and :end_id';
15:50:40 19 --Run the task
15:50:40 20 dbms_parallel_execute.run_task(task_name => vc_task,
15:50:40 21 sql_stmt => vc_sql,
15:50:40 22 language_flag => dbms_sql.native,
15:50:40 23 parallel_level => 4); --执行任务,确定并行度
15:50:40 24
15:50:40 25 --Controller
15:50:40 26 n_try := 0;
15:50:40 27 n_status := dbms_parallel_execute.task_status(task_name => vc_task);
15:50:40 28 while (n_try < 2 and n_status != dbms_parallel_execute.FINISHED) loop
15:50:40 29 dbms_parallel_execute.resume_task(task_name => vc_task);
15:50:40 30 n_status := dbms_parallel_execute.task_status(task_name => vc_task);
15:50:40 31 end loop;
15:50:40 32
15:50:40 33 --Deal with Result
15:50:40 34 dbms_parallel_execute.drop_task(task_name => vc_task);
15:50:40 35 end;
15:50:40 36 /
PL/SQL procedure successfully completed.
Elapsed: 00:03:50.78
15:58:05 SQL>
15:58:06 SQL> create index idx_t_id on t(object_id) nologging parallel 4;
Index created.
Elapsed: 00:01:35.12
16:00:05 SQL> alter index idx_t_id noparallel;
Index altered.
Elapsed: 00:00:00.07
16:00:15 SQL>
16:02:51 SQL> declare
16:02:52 2 vc_task varchar2(100);
16:02:52 3 vc_sql varchar2(1000);
16:02:52 4 n_try number;
16:02:52 5 n_status number;
16:02:52 6 begin
16:02:52 7 --Define the Task
16:02:52 8 vc_task := 'Task 2: By Number Col';
16:02:52 9 dbms_parallel_execute.create_task(task_name => vc_task);
16:02:52 10
16:02:52 11 --Define the Spilt
16:02:52 12 dbms_parallel_execute.create_chunks_by_number_col(task_name => vc_task,
16:02:52 13 table_owner => 'LHR',
16:02:52 14 table_name => 'T',
16:02:52 15 table_column => 'OBJECT_ID',
16:02:52 16 chunk_size => 100000); --定义chunk
16:02:53 17 16:02:53 18 vc_sql := 'update /*+ ROWID(dda) */ t set DATA_OBJECT_ID=object_id+1 where object_id between :start_id and :end_id';
16:02:53 19 --Run the task
16:02:53 20 dbms_parallel_execute.run_task(task_name => vc_task,
16:02:53 21 sql_stmt => vc_sql,
16:02:53 22 language_flag => dbms_sql.native,
16:02:53 23 parallel_level => 4);
16:02:53 24
16:02:53 25 --Controller
16:02:53 26 n_try := 0;
16:02:53 27 n_status := dbms_parallel_execute.task_status(task_name => vc_task);
16:02:53 28 while (n_try < 2 and n_status != dbms_parallel_execute.FINISHED) loop
16:02:53 29 dbms_parallel_execute.resume_task(task_name => vc_task);
16:02:53 30 n_status := dbms_parallel_execute.task_status(task_name => vc_task);
16:02:53 31 end loop;
16:02:53 32
16:02:53 33 --Deal with Result
16:02:53 34 dbms_parallel_execute.drop_task(task_name => vc_task);
16:02:53 35 end;
16:02:53 36 /
^Cdeclare
*
ERROR at line 1:
ORA-01013: user requested cancel of current operation
ORA-06512: at "SYS.DBMS_LOCK", line 201
ORA-06512: at "SYS.DBMS_PARALLEL_EXECUTE", line 44
ORA-06512: at "SYS.DBMS_PARALLEL_EXECUTE", line 390
ORA-06512: at "SYS.DBMS_PARALLEL_EXECUTE", line 417
ORA-06512: at line 20
Elapsed: 00:07:12.08
16:11:36 SQL>
16:11:36 SQL> EXEC dbms_parallel_execute.drop_task(task_name => 'Task 2: By Number Col');
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.11
16:31:53 SQL> declare
16:32:05 2 vc_task varchar2(100);
16:32:05 3 vc_sql varchar2(1000);
16:32:05 4 vc_sql_mt varchar2(1000);
16:32:05 5 n_try number;
16:32:05 6 n_status number;
16:32:05 7 begin
16:32:05 8 --Define the Task
16:32:05 9 vc_task := 'Task 3: By SQL';
16:32:05 10 dbms_parallel_execute.create_task(task_name => vc_task);
16:32:05 11
16:32:05 12 --Define the Spilt
16:32:05 13 vc_sql_mt := 'select distinct object_id, object_id from t';
16:32:05 14 dbms_parallel_execute.create_chunks_by_SQL(task_name => vc_task,
16:32:05 15 sql_stmt => vc_sql_mt,
16:32:05 16 by_rowid => false);
16:32:05 17
16:32:05 18 vc_sql := 'update /*+ ROWID(dda) */t set DATA_OBJECT_ID=object_id+1 where object_id between :start_id and :end_id';
16:32:05 19 --Run the task
16:32:05 20 dbms_parallel_execute.run_task(task_name => vc_task,
16:32:05 21 sql_stmt => vc_sql,
16:32:05 22 language_flag => dbms_sql.native,
16:32:05 23 parallel_level => 4);
16:32:05 24
16:32:05 25 --Controller
16:32:05 26 n_try := 0;
16:32:05 27 n_status := dbms_parallel_execute.task_status(task_name => vc_task);
16:32:05 28 while (n_try < 2 and n_status != dbms_parallel_execute.FINISHED) loop
16:32:05 29 dbms_parallel_execute.resume_task(task_name => vc_task);
16:32:05 30 n_status := dbms_parallel_execute.task_status(task_name => vc_task);
16:32:05 31 end loop;
16:32:05 32
16:32:05 33 --Deal with Result
16:32:05 34 dbms_parallel_execute.drop_task(task_name => vc_task);
16:32:05 35 end;
16:32:05 36 /
^Cdeclare
*
ERROR at line 1:
ORA-01013: user requested cancel of current operation
ORA-06512: at "SYS.DBMS_PARALLEL_EXECUTE_INTERNAL", line 634
ORA-06512: at "SYS.DBMS_PARALLEL_EXECUTE", line 163
ORA-06512: at line 14
Elapsed: 00:01:09.08
16:33:14 SQL> EXEC dbms_parallel_execute.drop_task(task_name => 'Task 3: By SQL');
PL/SQL procedure successfully completed.
一.4.2.1 相关字典视图查询
一、 create_chunks_by_rowid过程
SELECT * FROM DBA_PARALLEL_EXECUTE_TASKS;
SELECT * FROM DBA_PARALLEL_EXECUTE_CHUNKS;
SELECT count(1) FROM DBA_PARALLEL_EXECUTE_CHUNKS;
select status, count(*) from user_parallel_execute_chunks group by status;
select D.owner,D.job_name,D.JOB_STYLE,D.JOB_TYPE,D.JOB_ACTION from dba_scheduler_jobs d where d.owner='LHR';
告警日志:
Wed Jun 03 15:53:48 2015
Archived Log entry 1202 added for thread 1 sequence 2669 ID 0x6779dfc4 dest 1:
Thread 1 advanced to log sequence 2671 (LGWR switch)
Current log# 4 seq# 2671 mem# 0: /app/oracle/flash_recovery_area/CNYDB/onlinelog/o1_mf_4_bpxd8g7v_.log
Wed Jun 03 15:53:49 2015
Archived Log entry 1203 added for thread 1 sequence 2670 ID 0x6779dfc4 dest 1:
Wed Jun 03 15:53:57 2015
Thread 1 advanced to log sequence 2672 (LGWR switch)
Current log# 5 seq# 2672 mem# 0: /app/oracle/flash_recovery_area/CNYDB/onlinelog/o1_mf_5_bpxdbwdz_.log
Wed Jun 03 15:53:58 2015
Archived Log entry 1204 added for thread 1 sequence 2671 ID 0x6779dfc4 dest 1:
Thread 1 advanced to log sequence 2673 (LGWR switch)
Current log# 1 seq# 2673 mem# 0: /app/oracle/oradata/CNYDB/redo01.log
Wed Jun 03 15:54:04 2015
Archived Log entry 1205 added for thread 1 sequence 2672 ID 0x6779dfc4 dest 1:
Thread 1 advanced to log sequence 2674 (LGWR switch)
Current log# 6 seq# 2674 mem# 0: /app/oracle/flash_recovery_area/CNYDB/onlinelog/o1_mf_6_bpxdcjx2_.log
Wed Jun 03 15:54:05 2015
Archived Log entry 1206 added for thread 1 sequence 2673 ID 0x6779dfc4 dest 1:
由告警日志可以看出redo切换非常迅速,归档来不及,所以还是需要在空闲的时候来做实验。
二、 create_chunks_by_number_col过程
SELECT * FROM DBA_PARALLEL_EXECUTE_CHUNKS;
SELECT * FROM DBA_PARALLEL_EXECUTE_TASKS;
select status, count(*) from dba_parallel_execute_chunks group by status;
select sid, serial#, status, PROGRAM, SQL_ID, event from v$session where action like 'TASK$%';
select D.owner,D.job_name,D.JOB_STYLE,D.JOB_TYPE,D.JOB_ACTION from dba_scheduler_jobs d where d.owner='LHR';
一.4.3 实验总结
由实验可以看出,采用dbms_parallel_execute.create_chunks_by_rowid方法,4千万的数据量大约4G大小的表更新完大约4分钟,这个速度还是可以的,另外2种方式更新下来速度太慢就没有测试了,具体可以参考这里:http://blog.itpub.net/26736162/viewspace-1683912/,http://blog.itpub.net/26736162/viewspace-1683913/。
一.4.4 实验脚本
一.4.4.1 create_chunks_by_rowid方式
declare
vc_task varchar2(100);
vc_sql varchar2(1000);
n_try number;
n_status number;
begin
--Define the Task
vc_task := 'Task 1: By Rowid'; --Task名称
dbms_parallel_execute.create_task(task_name => vc_task); --手工定义一个Task任务;
--Define the Spilt
dbms_parallel_execute.create_chunks_by_rowid(task_name => vc_task,
table_owner => 'LHR',
table_name => 'T',
by_row => true,
chunk_size => 10000); --定义Chunk
vc_sql := 'update /*+ ROWID(dda) */ t set DATA_OBJECT_ID=object_id+1 where rowid between :start_id and :end_id';
--Run the task
dbms_parallel_execute.run_task(task_name => vc_task,
sql_stmt => vc_sql,
language_flag => dbms_sql.native,
parallel_level => 4); --执行任务,确定并行度
--Controller
n_try := 0;
n_status := dbms_parallel_execute.task_status(task_name => vc_task);
while (n_try < 2 and n_status != dbms_parallel_execute.FINISHED) loop
dbms_parallel_execute.resume_task(task_name => vc_task);
n_status := dbms_parallel_execute.task_status(task_name => vc_task);
end loop;
--Deal with Result
dbms_parallel_execute.drop_task(task_name => vc_task);
end;
/
一.4.4.2 create_chunks_by_number_col
declare
vc_task varchar2(100);
vc_sql varchar2(1000);
n_try number;
n_status number;
begin
--Define the Task
vc_task := 'Task 2: By Number Col';
dbms_parallel_execute.create_task(task_name => vc_task);
--Define the Spilt
dbms_parallel_execute.create_chunks_by_number_col(task_name => vc_task,
table_owner => 'LHR',
table_name => 'T',
table_column => 'OBJECT_ID',
chunk_size => 10000); --定义chunk
vc_sql := 'update /*+ ROWID(dda) */ t set DATA_OBJECT_ID=object_id+1 where object_id between :start_id and :end_id';
--Run the task
dbms_parallel_execute.run_task(task_name => vc_task,
sql_stmt => vc_sql,
language_flag => dbms_sql.native,
parallel_level => 4);
--Controller
n_try := 0;
n_status := dbms_parallel_execute.task_status(task_name => vc_task);
while (n_try < 2 and n_status != dbms_parallel_execute.FINISHED) loop
dbms_parallel_execute.resume_task(task_name => vc_task);
n_status := dbms_parallel_execute.task_status(task_name => vc_task);
end loop;
--Deal with Result
dbms_parallel_execute.drop_task(task_name => vc_task);
end;
/
一.4.4.3 create_chunks_by_SQL
declare
vc_task varchar2(100);
vc_sql varchar2(1000);
vc_sql_mt varchar2(1000);
n_try number;
n_status number;
begin
--Define the Task
vc_task := 'Task 3: By SQL';
dbms_parallel_execute.create_task(task_name => vc_task);
--Define the Spilt
vc_sql_mt := 'select distinct object_id, object_id from t';
dbms_parallel_execute.create_chunks_by_SQL(task_name => vc_task,
sql_stmt => vc_sql_mt,
by_rowid => false);
vc_sql := 'update /*+ ROWID(dda) */t set DATA_OBJECT_ID=object_id+1 where object_id between :start_id and :end_id';
--Run the task
dbms_parallel_execute.run_task(task_name => vc_task,
sql_stmt => vc_sql,
language_flag => dbms_sql.native,
parallel_level => 4);
--Controller
n_try := 0;
n_status := dbms_parallel_execute.task_status(task_name => vc_task);
while (n_try < 2 and n_status != dbms_parallel_execute.FINISHED) loop
dbms_parallel_execute.resume_task(task_name => vc_task);
n_status := dbms_parallel_execute.task_status(task_name => vc_task);
end loop;
--Deal with Result
dbms_parallel_execute.drop_task(task_name => vc_task);
end;
/
一.5 About Me
...........................................................................................................................................................................................
本文作者:小麦苗,只专注于数据库的技术,更注重技术的运用
ITPUB BLOG:http://blog.itpub.net/26736162
本文地址:http://blog.itpub.net/26736162/viewspace-1684396/
本文pdf版:http://yunpan.cn/QCwUAI9bn7g7w 提取码:af2d
QQ:642808185 若加QQ请注明你所正在读的文章标题
创作时间地点:2015-06-03 10:00~ 2015-06-03 18:00 于外汇交易中心
<版权所有,文章允许转载,但须以链接方式注明源地址,否则追究法律责任!>
...........................................................................................................................................................................................
11g包dbms_parallel_execute在海量数据处理过程中的应用的更多相关文章
- idea在maven中引入了jar包依赖,但是编译过程中报出XXX程序包不存在,已解决
idea在maven中引入了jar包依赖,但是编译过程中报出XXX程序包不存在 1. 报错具体情况 2. Project Structure中的Libraries没有任何红色波浪线 3. 发现自己要引 ...
- 上传jar包到maven中央仓库过程中遇到的一些问题总结!
网上有很多相关教程, 我按步骤一步步走下来, 都还算顺利, 简单列举一下步骤以及其中需要注意的几个点(不详细, 不适合当教程) 第一步: 到https://issues.sonatype.org/se ...
- DBA_Oracle海量数据处理分析(方法论)
2014-12-18 Created By BaoXinjian
- 腾讯海量数据处理平台TDW
TDW是腾讯海量数据处理平台中最核心的模块,它有以下几个作用: 提供海量的离线计算和存储服务.TDW是腾讯内部规模最大的离线数据处理平台,公司内大多数业务的产品报表.运营分析.数据挖掘等的存储和计算都 ...
- 基于Lua脚本解决实时数据处理流程中的关键问题
摘要 在处理实时数据的过程中需要缓存的参与,由于在更新实时数据时并发处理的特点,因此在更新实时数据时经常产生新老数据相互覆盖的情况,针对这个情况调查了Redis事务和Lua脚本后,发现Redis事务并 ...
- PLUTO平台是由美林数据技术股份有限公司下属西安交大美林数据挖掘研究中心自主研发的一款基于云计算技术架构的数据挖掘产品,产品设计严格遵循国际数据挖掘标准CRISP-DM(跨行业数据挖掘过程标准),具备完备的数据准备、模型构建、模型评估、模型管理、海量数据处理和高纬数据可视化分析能力。
http://www.meritdata.com.cn/article/90 PLUTO平台是由美林数据技术股份有限公司下属西安交大美林数据挖掘研究中心自主研发的一款基于云计算技术架构的数据挖掘产品, ...
- 从hadoop框架与MapReduce模式中谈海量数据处理
http://blog.csdn.net/wind19/article/details/7716326 前言 几周前,当我最初听到,以致后来初次接触Hadoop与MapReduce这两个东西,我便稍显 ...
- 从Hadoop框架与MapReduce模式中谈海量数据处理(含淘宝技术架构) (转)
转自:http://blog.csdn.net/v_july_v/article/details/6704077 从hadoop框架与MapReduce模式中谈海量数据处理 前言 几周前,当我最初听到 ...
- rpm包安装过程中依赖问题“libc.so.6 is needed by XXX”解决方法
rpm包安装过程中依赖问题"libc.so.6 is needed by XXX"解决方法 折腾了几天,终于搞定了CentOS上的Canon LBP2900打印机驱动.中间遇到了一 ...
随机推荐
- scrapy爬虫案例:用MongoDB保存数据
用Pymongo保存数据 爬取豆瓣电影top250movie.douban.com/top250的电影数据,并保存在MongoDB中. items.py class DoubanspiderItem( ...
- android分渠道打包,监测日活量统计(基于友盟SDK)
客服说要看App日活,让加个统计功能. (我们技术部已经混到客服部都能直接提需求的地步) 首先接入友盟统计的SDK,在项目外层的build.gradle中添加依赖'https://dl.bintray ...
- LeetCode 104. Maximum Depth of Binary Tree(二叉树深度)
Given a binary tree, find its maximum depth. The maximum depth is the number of nodes along the long ...
- jw python 培训帮助 手册
#########sample 1 (如果python 遇到有问题,就求助于 help 命令,python 是 个人开发的胶水语言,因此不具备 java,c++的类的继承关系) Python hel ...
- PAT-2019年冬季考试-甲级 7-1 Good in C (20分)
7-1 Good in C (20分) When your interviewer asks you to write "Hello World" using C, can y ...
- 【ARM-Linux开发】ARM嵌入式设备Linux系统启动步骤和方式
1). 简介 本文简单介绍ARM嵌入式设备基于嵌入式Linux操作系统时候的启动步骤和启动方式, 区别与X86平台,ARM平台下并没有一个标准的启动步骤,不同ARM SoC都会使用各自定义的boot ...
- 【转帖】HBase之五:hbase的region分区
HBase之五:hbase的region分区 https://www.cnblogs.com/duanxz/p/3154487.html 一.Region 概念 Region是表获取和分布的基本元素, ...
- springboot异步线程(二)
前言 本篇文章针对上篇文章springboot异步线程,有一位大佬在评论中提出第一点是错误的,当时看到了这个问题,最近刚好有空,针对第一点的问题去搜索了不少的文章: 问题 我在文章中第一点去验证:Sc ...
- Docker容器挂载文件(转载)
一.Docker pull 安装 Nginx 1.查看docker仓库中的 nginx 命令 # 使用 docker search 命令搜索存放在 Docker Hub 中的镜像 docker sea ...
- 【题解】Luogu CF1172B Nauuo and Circle
原题传送门 题意:在圆上有n个节点(珂以构成凸多边形),让你给节点编号,使得将题目给你的边(一棵树)没有交叉 我们钦定1为这个树的根节点.任意节点\(x\)的一颗子树的点应该是圆弧上连续的一段(我也不 ...