The ESTIMATE_PERCENT parameter in DBMS_STATS.GATHER_*_STATS procedures controls the percentage of rows to sample when gathering optimizer statistics. What percentage of rows should you sample to achieve accurate statistics? 100% will ensure that statistics are accurate, but it could take a long time. A 1% sample will finish much more quickly but it could result in poor statistics. It’s not an easy question to answer, which is why it is best practice to use the default: AUTO_SAMPLE_SIZE.

 

In this post, I’ll cover how the AUTO_SAMPLE_SIZE algorithm works in Oracle Database 12c and how it affects the accuracy of the statistics being gathered. If you want to learn more of the history prior to Oracle Database 12c, then this post on Oracle Database 11g is a good place to look. I will indicate below where there are differences between Oracle Database 11g and Oracle Database 12c.

It’s not always appreciated that (in general) a large proportion of the time and resource cost required to gather statistics is associated with evaluating the number of distinct values (NDVs) for each column. Calculating NDV using an exact algorithm can be expensive because the database needs to record and sort column values while statistics are being gathered. If the NDV is high, retaining and sorting column values can become resource-intensive, especially if the sort spills to TEMP. Auto sample size instead uses an approximate (but accurate) algorithm to calculate NDV that avoids the need to sort column data or spill to TEMP. In return for this saving, the database can afford to use a full table scan to ensure that the other basic column statistics are accurate.

Similarly, it can be resource-intensive to generate histograms but the Oracle Database mitigates this cost as follows:

  • Frequency and top frequency histograms are created as the database gathers basic column statistics (such as NDV, MIN, MAX) from the full table scan mentioned above. This is new to Oracle Database 12c.
  • If a frequency or top frequency histogram is not feasible, then the database will collect hybrid histograms using a sample of the column data. Top frequency is only feasible when the top 254 values constitute more than 99% of the entire non null column values and frequency is only feasible if NDV is 254 or less.
  • When the user has specified 'SIZE AUTO' in the METHOD_OPT clause for automatic histogram creation, the Oracle Database chooses which columns to consider for histogram creation based column usage data that’s gathered by the optimizer. Columns that are not used in WHERE-clause predicates or joins are not considered for histograms.

Both Oracle Database 11g and Oracle Database 12c use the following query to gather basic column statistics (it is a simplified here for illustrative purposes).

SELECT COUNT(c1), MIN(c1), MAX(c1)
FROM  t;

The query reads the table (T) and scans all rows (rather than using a sample). The database also needs to calculate the number of distinct values (NDV) for each column but the query does not use COUNT(DISTINCT c1) and so on, but instead, during execution,  a special statistics gathering row source is injected into the query. The statistics gathering row source uses a one-pass, hash-based distinct algorithm to gather NDV. The algorithm requires a full scan of the data, uses a bounded amount of memory and yields a highly accurate NDV that is nearly identical to a 100 percent sampling (a fact that can be proven mathematically). The statistics gathering row source also gathers the number of rows, number of nulls and average column length. Since a full scan is used, the number of rows, average column length, minimum and maximum values are 100% accurate.

Effect of auto sample size on histogram gathering

Hybrid histogram gathering is decoupled from basic column statistics gathering and uses a sampleof column values. This technique was used in Oracle Database 11g to build height-balanced histograms. More information on this can be found in this blog post. Oracle Database 12c replaced height-balanced histograms with hybrid histograms.

Effect of auto sample size on index stats gathering

AUTO_SAMPLE_SIZE affects how index statistics are gathered. Index statistics gathering is sample-based and it can potentially go through several iterations if the sample contains too few blocks or the sample size was too small to properly gather number of distinct keys (NDKs). The algorithm has not changed since Oracle Database 11g, so I’ve left it to the previous blog to go more detail. There one other thing to note:

At the time of writing, there are some cases where index sampling can lead to NDV mis-estimates for composite indexes. The best work-around is to create a column group on the relevant columns and use gather_table_stats. Alternatively, there is a one-off fix - 27268249. This patch changes the way NDV is calculated for indexes on large tables (and no column group is required). It is available for 12.2.0.1 at the moment, but note that it cannot be backported. As you might guess, it's significantly slower than index block sampling, but it's still very fast. At the time of writing, if you find a case where index NDV is causing an issue with a query plan, then the recommended approach is to add a column group rather than attempting to apply this patch.

Summary:

Note that top frequency and hybrid histograms are new to Oracle Database 12c. Oracle Database 11g had frequency and height-balanced histograms only. Hybrid histograms replaced height-balanced histograms.

  1. The auto sample size algorithm uses a full table scan (a 100% sample) to gather basic column statistics.
  2. The cost of a full table scan (verses row sampling) is mitigated by the approximate NDV algorithm, which eliminates the need to sort column data.
  3. The approximate NDV gathered by AUTO_SAMPLE_SIZE is close to the accuracy of a 100% sample.
  4. Other basic column statistics, such as the number of nulls, average column length, minimal and maximal values have an accuracy equivalent to 100% sampling.
  5. Frequency and top frequency histograms are created using a 100%* sample of column values and are created when basic column statistics are gathered. This is different to Oracle Database 11g, which decoupled frequency histogram creation from basic column statistics gathering (and used a sample of column values).
  6. Hybrid histograms are created using a sample of column values. Internally, this step is decoupled from basic column statistics gathering.
  7. Index statistics are gathered using a sample of column values. The sample size is determined automatically.

*There is an exception to case 5, above. Frequency histograms are created using a sample if OPTIONS=>'GATHER AUTO' is used after a bulk load where statistics have been gathered using online statistics gathering.

 

oracle 12c AUTO_SAMPLE_SIZE动态采用工作机制的更多相关文章

  1. oracle 11g AUTO_SAMPLE_SIZE动态采用工作机制

    Note that if you're interested in learning about Oracle Database 12c, there's an updated version of ...

  2. 2014年2月5日 Oracle ORACLE的工作机制[转]

      网上看到一篇描写ORACLE工作机制的文章,觉得很不错!特摘录了下来.   ORACLE的工作机制-1 (by xyf_tck) 我们从一个用户请求开始讲,ORACLE的简要的工作机制是怎样的,首 ...

  3. java开发连接Oracle 12c采用PDB遇到问题记录

    今天初次使用java连接Oracle 12c,遇到各种问题,为方便后续查询,在汇总了问题记录及解决方案如下. ORA-28040: No matching authentication protoco ...

  4. 从一个简单的main方法执行谈谈JVM工作机制

    本来JVM的工作原理浅到可以泛泛而谈,但如果真的想把JVM工作机制弄清楚,实在是很难,涉及到的知识领域太多.所以,本文通过简单的mian方法执行,浅谈JVM工作原理,看看JVM里面都发生了什么. 先上 ...

  5. JavaScript工作机制:V8 引擎内部机制及如何编写优化代码的5个诀窍

    概述 JavaScript引擎是一个执行JavaScript代码的程序或解释器.JavaScript引擎可以被实现为标准解释器,或者实现为以某种形式将JavaScript编译为字节码的即时编译器. 下 ...

  6. oracle 12c 列式存储 ( In Memory 理论)

    随着Oracle 12c推出了in memory组件,使得Oracle数据库具有了双模式数据存放方式,从而能够实现对混合类型应用的支持:传统的以行形式保存的数据满足OLTP应用:列形式保存的数据满足以 ...

  7. Httpd服务入门知识-http协议版本,工作机制及http服务器应用扫盲篇

    Httpd服务入门知识-http协议版本,工作机制及http服务器应用扫盲篇 作者:尹正杰 版权声明:原创作品,谢绝转载!否则将追究法律责任. 一.Internet与中国 Internet最早来源于美 ...

  8. malloc 函数工作机制(转)

    malloc()工作机制 malloc函数的实质体现在,它有一个将可用的内存块连接为一个长长的列表的所谓空闲链表.调用malloc函数时,它沿连接表寻找一个大到足以满足用户请求所需要的内存块.然后,将 ...

  9. Oracle 12C RAC的optimizer_adaptive_features造成数据插入超时

    问题分析 使用10046事件追踪方式,直接生成上传时的数据库事件日志进行分析,发现主要区别在于以下两条sql语句在每次长时间上传时都有出现,并且执行用户不是上传用户,而是数据库SYS用户. ***** ...

随机推荐

  1. [Java in NetBeans] Lesson 08. If: conditional statement

    这个课程的参考视频和图片来自youtube. 主要学到的知识点有: 1. If-else statement if (x > 5) { System.out.println("Inpu ...

  2. 一个简单的MapReduce示例(多个MapReduce任务处理)

    一.需求 有一个列表,只有两列:id.pro,记录了id与pro的对应关系,但是在同一个id下,pro有可能是重复的. 现在需要写一个程序,统计一下每个id下有多少个不重复的pro. 为了写一个完整的 ...

  3. python filter函数应用,过滤字符串

    >>> candidate = 'dade142.;!0142f[.,]ad' >>> filter(str.isdigit, candidate) #保留数字 ' ...

  4. ASP.NET Core 启动流程图

    简洁描述: 一   WebHostBuilder.Build() =>1注入公共的实例 2创建WebHost实例 3注入自定义实例 4创建IServer 5添加中间件(_components集合 ...

  5. Web API 入门 一

    因为只是是一个简单的入门.所有暂时不去研究web API一些规范.比如RESTful API 这里有个接收RESTful API的.RESTful API 什么是WebApi 看这里:http://w ...

  6. PHP 函数 ignore_user_abort()详解笔记

     定义和用法 ignore_user_abort()函数设置与客户机断开是否会终止脚本的执行  语法 ignore_user_abort(setting) 参数 描述 setting 可选.如果设置为 ...

  7. Linux shell脚本 批量创建多个用户

    Linux shell脚本 批量创建多个用户 #!/bin/bash groupadd charlesgroup for username in charles1 charles2 charles3 ...

  8. MySQL.ERROR 1133 (42000): Can't find any matching row in the user table

    ERROR 1133 (42000): Can't find any matching row in the user table 今天在执行  grant all privileges on cac ...

  9. java 连接redis 以及基本操作

    一.首先下载安装redis 二.项目搭建 1.搭建一个maven 工程 2. 在pom.xml文件的dependencies节点下增加如下内容: <!-- resis --> <de ...

  10. gispro设置标注属性字体样式设置

    为了应对电子地图和卫星影像的底图,标注样式选择比较关键.挑选了黑字白色晕圈效果.记住不是设置字体轮廓. 因为字体宽度(字粗)有限,设置轮廓直接把字体本身的颜色覆盖了