The ESTIMATE_PERCENT parameter of the DBMS_STATS.GATHER_*_STATS procedures controls the percentage of rows to sample when gathering optimizer statistics. What percentage of rows should you sample to achieve accurate statistics? A 100% sample ensures that the statistics are accurate, but it could take a long time. A 1% sample will finish much more quickly, but it could result in poor statistics. It’s not an easy question to answer, which is why it is best practice to use the default: AUTO_SAMPLE_SIZE.
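
To see why no fixed sample percentage is a safe choice, here is a small Python illustration. It is not how Oracle estimates NDV. It compares two deliberately naive ways of turning a 1% row sample into an NDV estimate: counting the distinct values seen in the sample, and scaling that count up by the sampling factor. The function name `sample_ndv_estimates` and the two example columns are hypothetical.

```python
import random


def sample_ndv_estimates(rows, percent, seed=0):
    """Return (exact NDV, distinct values seen in sample, scaled-up estimate).

    Both estimators are intentionally naive; this is NOT Oracle's algorithm.
    """
    rng = random.Random(seed)                    # fixed seed: repeatable sample
    k = max(1, len(rows) * percent // 100)       # number of rows to sample
    sample = rng.sample(rows, k)
    seen = len(set(sample))                      # distinct values in the sample
    scaled = seen * len(rows) // k               # naive linear scale-up
    return len(set(rows)), seen, scaled


# Hypothetical column A: 50,000 distinct values, each appearing exactly twice.
col_a = [i % 50_000 for i in range(100_000)]
# Hypothetical column B: only 10 distinct values, each appearing 10,000 times.
col_b = [i % 10 for i in range(100_000)]

for name, col in (("A", col_a), ("B", col_b)):
    exact, seen, scaled = sample_ndv_estimates(col, percent=1)
    print(f"column {name}: exact={exact} seen_in_sample={seen} scaled={scaled}")
```

The distinct count seen in the sample is right for column B but far too low for column A. The scaled-up estimate is far too high for both columns. No single rule for extrapolating from a small sample works for every data distribution.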

 

In this post, I’ll cover how the AUTO_SAMPLE_SIZE algorithm works in Oracle Database 12c and how it affects the accuracy of the statistics being gathered. If you want to learn about the history prior to Oracle Database 12c, this post on Oracle Database 11g is a good place to start. I’ll point out below where Oracle Database 11g and Oracle Database 12c differ.

It’s not always appreciated that (in general) a large proportion of the time and resource cost required to gather statistics is associated with evaluating the number of distinct values (NDVs) for each column. Calculating NDV using an exact algorithm can be expensive because the database needs to record and sort column values while statistics are being gathered. If the NDV is high, retaining and sorting column values can become resource-intensive, especially if the sort spills to TEMP. Auto sample size instead uses an approximate (but accurate) algorithm to calculate NDV that avoids the need to sort column data or spill to TEMP. In return for this saving, the database can afford to use a full table scan to ensure that the other basic column statistics are accurate.
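
The idea of a bounded-memory, sort-free NDV estimate can be sketched in Python using adaptive hash sampling (also called distinct sampling). The approach works as follows:

- Hash every value.
- Keep only the hashes whose lowest `level` bits are zero.
- Raise `level` whenever the retained set exceeds a fixed capacity.

This is a generic technique chosen for illustration. The function names and the `capacity` parameter are hypothetical, and this is not Oracle's internal code.

```python
import hashlib


def hash64(value) -> int:
    """Stable 64-bit hash of a value (Python's built-in hash() is salted for str)."""
    digest = hashlib.blake2b(repr(value).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def approx_ndv(values, capacity=1024):
    """One-pass NDV estimate that never sorts and never keeps more than
    capacity + 1 hashes, however many rows or distinct values there are."""
    level = 0          # a hash is kept only if its lowest `level` bits are 0
    synopsis = set()   # retained hashes: a uniform 1/2**level sample of distinct values
    for value in values:
        h = hash64(value)
        if h & ((1 << level) - 1) == 0:
            synopsis.add(h)            # duplicates hash identically, so they collapse
            while len(synopsis) > capacity:
                level += 1             # halve the sampling rate...
                mask = (1 << level) - 1
                synopsis = {x for x in synopsis if x & mask == 0}  # ...and re-filter
    # Each retained hash stands for about 2**level distinct values.
    return len(synopsis) * (1 << level)


# Fewer distinct values than `capacity`: the count is exact.
print(approx_ndv(range(500)))
# 300,000 rows containing 100,000 distinct values: an estimate close to 100,000.
print(approx_ndv(i % 100_000 for i in range(300_000)))
```

Repeated values produce identical hashes, so they never consume extra memory, and nothing is sorted or spilled. The price is a small estimation error, typically a few percent with the capacity used here.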

Similarly, generating histograms can be resource-intensive, but Oracle Database mitigates this cost as follows:

  • Frequency and top frequency histograms are created as the database gathers basic column statistics (such as NDV, MIN, MAX) from the full table scan mentioned above. This is new to Oracle Database 12c.
  • If a frequency or top frequency histogram is not feasible, the database gathers a hybrid histogram using a sample of the column data. A frequency histogram is feasible only if the NDV is 254 or less. A top frequency histogram is feasible only if the top 254 values make up more than 99% of all non-null column values.
  • When the user has specified 'SIZE AUTO' in the METHOD_OPT parameter for automatic histogram creation, Oracle Database chooses which columns to consider for histogram creation based on column usage data gathered by the optimizer. Columns that are not used in WHERE-clause predicates or joins are not considered for histograms.
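
The feasibility rules above can be sketched as a small Python decision function. It applies the thresholds exactly as stated in the text: NDV of 254 or less, and the top 254 values covering more than 99% of non-null values. The name `choose_histogram` and its labels are hypothetical. Other factors that SIZE AUTO weighs, such as whether the data is actually skewed, are not modeled.

```python
from collections import Counter


def choose_histogram(values, used_in_predicates, buckets=254):
    """Pick a histogram type using the simplified rules described in the text."""
    if not used_in_predicates:
        return "NONE"                      # SIZE AUTO: unused columns are not considered
    counts = Counter(v for v in values if v is not None)   # nulls are ignored
    non_null = sum(counts.values())
    if non_null == 0:
        return "NONE"
    if len(counts) <= buckets:
        return "FREQUENCY"                 # every distinct value gets its own bucket
    top = sum(c for _, c in counts.most_common(buckets))
    if top / non_null > 0.99:
        return "TOP-FREQUENCY"             # rare values beyond the top 254 are left out
    return "HYBRID"                        # built separately from a sample of the column


skewed = [0] * 10_000 + list(range(1, 301))    # 301 distinct values, one dominant
uniform = [i % 1_000 for i in range(10_000)]   # 1,000 equally common values

print(choose_histogram([1, 2, 2, None, 3], used_in_predicates=True))
print(choose_histogram(skewed, used_in_predicates=True))
print(choose_histogram(uniform, used_in_predicates=True))
print(choose_histogram(uniform, used_in_predicates=False))
```

The skewed column has more than 254 distinct values, so a frequency histogram is ruled out. Its 254 most common values still cover about 99.5% of the rows, so a top frequency histogram qualifies. The uniform column fails both tests and falls through to a hybrid histogram.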

Both Oracle Database 11g and Oracle Database 12c use the following query to gather basic column statistics (simplified here for illustrative purposes):

SELECT COUNT(c1), MIN(c1), MAX(c1)
FROM  t;

The query reads the table (T) and scans all rows rather than a sample. The database also needs to calculate the NDV for each column, but the query does not use COUNT(DISTINCT c1) and so on. Instead, a special statistics gathering row source is injected into the query during execution. The statistics gathering row source uses a one-pass, hash-based distinct algorithm to gather NDV. The algorithm requires a full scan of the data and uses a bounded amount of memory. It yields a highly accurate NDV that is nearly identical to that of a 100 percent sample (a fact that can be proven mathematically). The statistics gathering row source also gathers the number of rows, number of nulls and average column length. Since a full scan is used, the number of rows, number of nulls, average column length, and minimum and maximum values are 100% accurate.
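
The claim that a full scan makes these statistics exact can be illustrated with a small Python accumulator. It sees every row once and keeps only a handful of running values. The class name `ColumnStats` and its fields are hypothetical. NDV is left out for brevity, and `avg_col_len` here is simply the mean length of `str(value)` over non-null values, not how Oracle computes AVG_COL_LEN.

```python
class ColumnStats:
    """Accumulates basic column statistics in a single pass over the rows."""

    def __init__(self):
        self.num_rows = 0
        self.num_nulls = 0
        self.min = None
        self.max = None
        self._total_len = 0     # running total of value lengths (non-null only)

    def add(self, value):
        self.num_rows += 1
        if value is None:
            self.num_nulls += 1
            return
        self.min = value if self.min is None else min(self.min, value)
        self.max = value if self.max is None else max(self.max, value)
        self._total_len += len(str(value))

    @property
    def avg_col_len(self):
        non_null = self.num_rows - self.num_nulls
        return self._total_len / non_null if non_null else 0.0


stats = ColumnStats()
for v in [42, None, 7, 1_000_000, None, 19]:   # every row is read: no sampling
    stats.add(v)
print(stats.num_rows, stats.num_nulls, stats.min, stats.max, stats.avg_col_len)
```

Every row passes through `add()`, so a single extreme value such as 1,000,000 can never be missed. A row sample always carries that risk for MIN and MAX.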

Effect of auto sample size on histogram gathering

Hybrid histogram gathering is decoupled from basic column statistics gathering and uses a sample of column values. This technique was used in Oracle Database 11g to build height-balanced histograms. More information on this can be found in this blog post. Oracle Database 12c replaced height-balanced histograms with hybrid histograms.
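
A simplified Python sketch shows how a hybrid histogram can be built from a sample:

- Sort the sampled values.
- Cut them into roughly equal buckets.
- Extend each bucket so that all copies of its endpoint value land in the same bucket.
- Record how many times that endpoint value repeats.

This reflects the general idea behind hybrid histograms (no value spans two buckets, and each endpoint carries a repeat count). `hybrid_histogram` is a hypothetical name, and Oracle's exact bucket boundaries may differ.

```python
from bisect import bisect_left


def hybrid_histogram(sample, buckets):
    """Return a list of (endpoint_value, endpoint_number, endpoint_repeat_count)."""
    vals = sorted(v for v in sample if v is not None)
    n = len(vals)
    if n == 0:
        return []
    bucket_size = -(-n // buckets)                 # ceiling division
    result = []
    start = 0
    while start < n:
        end = min(start + bucket_size, n) - 1      # tentative last index of this bucket
        endpoint = vals[end]
        # Never split a value across buckets: absorb any remaining copies of it.
        while end + 1 < n and vals[end + 1] == endpoint:
            end += 1
        repeat = end + 1 - bisect_left(vals, endpoint)   # copies of the endpoint value
        result.append((endpoint, end + 1, repeat))       # endpoint_number is cumulative
        start = end + 1
    return result


print(hybrid_histogram([5, 1, 2, 5, 3, 2, 6, 5, 4, 1, 2, 5], buckets=4))
```

Four buckets were requested but only three are produced. Absorbing the repeated values 2 and 5 into their buckets consumes the room that a fourth bucket would have used. The repeat counts (3 and 4) let the optimizer recognize those popular values.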

Effect of auto sample size on index stats gathering

AUTO_SAMPLE_SIZE also affects how index statistics are gathered. Index statistics gathering is sample-based, and it can go through several iterations if the sample contains too few blocks or if the sample size is too small to properly gather the number of distinct keys (NDKs). The algorithm has not changed since Oracle Database 11g, so I’ll leave the details to the previous blog post. There is one other thing to note:

At the time of writing, there are some cases where index sampling can lead to NDV mis-estimates for composite indexes. The best workaround is to create a column group on the relevant columns and use gather_table_stats. Alternatively, there is a one-off fix, 27268249. This patch changes the way NDV is calculated for indexes on large tables, and no column group is required. It is currently available for 12.2.0.1, but note that it cannot be backported. As you might guess, it's significantly slower than index block sampling, but it's still very fast. At the time of writing, if you find a case where index NDV is causing an issue with a query plan, the recommended approach is to add a column group rather than attempting to apply this patch.
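
The iterative index sampling described above can be sketched as a simple retry loop. It samples a percentage of the index leaf blocks, and if too few blocks were sampled, it doubles the percentage and tries again, up to 100%. Everything here is an assumption for illustration: the function name, the `min_blocks` threshold, the starting percentage and the doubling policy. The NDK-based stopping criterion is not modeled, and this is not Oracle's actual algorithm.

```python
def index_sample_percentages(leaf_blocks, start_percent=1.0, min_blocks=500):
    """Return the sample percentages tried until enough leaf blocks are sampled.

    Hypothetical policy: double the percentage each round, capped at 100%.
    """
    if start_percent <= 0:
        raise ValueError("start_percent must be positive")
    tried = []
    percent = float(start_percent)
    while True:
        tried.append(percent)
        sampled_blocks = leaf_blocks * percent / 100
        if sampled_blocks >= min_blocks or percent >= 100:
            return tried
        percent = min(percent * 2, 100.0)


print(index_sample_percentages(1_000_000))   # large index: one round is enough
print(index_sample_percentages(20_000))      # smaller index: several rounds
```

A small index can end up fully scanned, which costs little for a small index anyway. A large index usually meets the threshold in the first round.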

Summary:

Note that top frequency and hybrid histograms are new to Oracle Database 12c. Oracle Database 11g had frequency and height-balanced histograms only. Hybrid histograms replaced height-balanced histograms.

  1. The auto sample size algorithm uses a full table scan (a 100% sample) to gather basic column statistics.
  2. The cost of a full table scan (versus row sampling) is mitigated by the approximate NDV algorithm, which eliminates the need to sort column data.
  3. The approximate NDV gathered by AUTO_SAMPLE_SIZE is close to the accuracy of a 100% sample.
  4. Other basic column statistics, such as the number of nulls, average column length, and minimum and maximum values, have an accuracy equivalent to 100% sampling.
  5. Frequency and top frequency histograms are created using a 100%* sample of column values and are created when basic column statistics are gathered. This is different from Oracle Database 11g, which decoupled frequency histogram creation from basic column statistics gathering (and used a sample of column values).
  6. Hybrid histograms are created using a sample of column values. Internally, this step is decoupled from basic column statistics gathering.
  7. Index statistics are gathered using a sample of column values. The sample size is determined automatically.

*There is one exception to item 5 above. Frequency histograms are created using a sample if OPTIONS=>'GATHER AUTO' is used after a bulk load where statistics have been gathered using online statistics gathering.

 
