Redundant data in update statements
Hibernate generates UPDATE statements that include all columns, regardless of whether I'm changing the value in those columns, e.g.:
tx.begin();
Item i = em.find(Item.class, 12345);
i.setA("a-value");
tx.commit();
issues this UPDATE statement:
update Item set A = $1, B = $2, C = $3, D = $4 where id = $5
so columns B, C, D are updated, while I didn't change them.
Say, Items are updated frequently and all columns are indexed. The question is: does it make sense to optimize the Hibernate part to something like this:
tx.begin();
em.createQuery("update Item i set i.a = :a where i.id = :id")
.setParameter("a", "a-value")
.setParameter("id", 12345)
.executeUpdate();
tx.commit();
EXPLAIN plans of the 'unoptimized' and the 'optimized' query versions are identical!
Due to PostgreSQL MVCC, an UPDATE is effectively a DELETE plus an INSERT. (To be precise, the "deleted" row is just invisible to any transaction starting after the delete and is vacuumed later.) Therefore, on the database side, including index manipulation, there is in effect no difference between the two statements. Sending all columns increases network traffic a bit (depending on your data) and needs a bit more parsing.
I studied HOT updates after araqnid's input and ran some tests. Updates on columns that don't actually change the value make no difference whatsoever as far as HOT updates are concerned. My answer holds. See details below.
However, if you use per-column triggers (introduced with v9.0), this may have undesired side effects!
I quote the manual on triggers:
... a command such as UPDATE ... SET x = x ... will fire a trigger on column x, even though the column's value did not change.
Abstraction layers are for convenience. They are useful for SQL-illiterate developers or if the application needs to be portable between different RDBMS. On the downside, they can butcher performance and introduce additional points of failure. I avoid them wherever possible.
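Where per-column triggers make it worthwhile to send only the changed columns, the idea can be sketched in plain Java as below. This is a minimal illustration under stated assumptions, not how Hibernate works internally: the class ChangedColumnsUpdate, its build method and the two snapshot maps are hypothetical names made up for this example, and table and column names are assumed to be trusted identifiers. (Hibernate itself offers the @DynamicUpdate annotation for this purpose.)

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper, not part of Hibernate or JPA.
public class ChangedColumnsUpdate {

    /** SQL text plus the values to bind, in placeholder order. */
    public static final class UpdateStatement {
        public final String sql;
        public final List<Object> params;

        UpdateStatement(String sql, List<Object> params) {
            this.sql = sql;
            this.params = params;
        }
    }

    /**
     * Builds an UPDATE that sets only the columns whose value differs between
     * the state loaded from the database and the current state.
     * Returns null if nothing changed, so no statement needs to be sent.
     */
    public static UpdateStatement build(String table, String idColumn, Object id,
                                        Map<String, Object> loaded,
                                        Map<String, Object> current) {
        List<String> assignments = new ArrayList<>();
        List<Object> params = new ArrayList<>();
        for (Map.Entry<String, Object> e : current.entrySet()) {
            // Null-safe comparison of old and new value for this column.
            if (!Objects.equals(loaded.get(e.getKey()), e.getValue())) {
                assignments.add(e.getKey() + " = ?");
                params.add(e.getValue());
            }
        }
        if (assignments.isEmpty()) {
            return null;
        }
        params.add(id); // the id is bound last, for the WHERE clause
        String sql = "update " + table + " set " + String.join(", ", assignments)
                + " where " + idColumn + " = ?";
        return new UpdateStatement(sql, params);
    }

    public static void main(String[] args) {
        Map<String, Object> loaded = new LinkedHashMap<>();
        loaded.put("A", "old-value");
        loaded.put("B", "b");
        loaded.put("C", "c");
        loaded.put("D", "d");

        Map<String, Object> current = new LinkedHashMap<>(loaded);
        current.put("A", "a-value"); // only column A is changed

        UpdateStatement stmt = build("Item", "id", 12345, loaded, current);
        System.out.println(stmt.sql);    // update Item set A = ? where id = ?
        System.out.println(stmt.params); // [a-value, 12345]
    }
}
```

Columns B, C and D never appear in the SET list here, so a trigger declared on one of them would not fire. On a table without such triggers, the database-side work is the same as for the full-column statement, as described above.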
Concerning HOT (Heap-only tuple) updates
Heap-Only Tuples were introduced with Postgres 8.3, with important improvements in 8.3.4 and 8.4.9.
The release notes for Postgres 8.3:
UPDATEs and DELETEs leave dead tuples behind, as do failed INSERTs. Previously only VACUUM could reclaim space taken by dead tuples.
With HOT dead tuple space can be automatically reclaimed at the time of INSERT or UPDATE if no changes are made to indexed columns. This allows for more consistent performance. Also, HOT avoids adding duplicate index entries.
Emphasis mine. And "no changes" includes cases where columns are updated with the same value as they already hold. I actually tested that just now, as I wasn't sure.
You don't have to take my word for it. See for yourself: Postgres provides a couple of functions to check statistics. Run your UPDATE with and without all columns and check if it makes any difference.
-- Number of rows HOT-updated in table:
SELECT pg_stat_get_tuples_hot_updated('table_name'::regclass::oid);
-- Number of rows HOT-updated in table, in the current transaction:
SELECT pg_stat_get_xact_tuples_hot_updated('table_name'::regclass::oid);
Or use pgAdmin. Select your table and inspect the "Statistics" tab in the main window.
Be aware that HOT updates are only possible when there is room for the new tuple version on the same page. One simple way to force that condition is to test with a small table that holds only a few rows. Page size is typically 8k, so there must be free space on the page.
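araqnid's demonstration is as follows (it uses the heap_page_items, get_raw_page and bt_page_items functions from the pageinspect extension):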
create temp table t1(t1_id serial primary key, reference varchar(16) not null unique, value varchar(16) not null);
copy t1(reference, value) from stdin;
FOO foo
BAR bar
QUUX quux
\.
create temp view t1_combined as
select t1_id, reference, value, ctid, lp_flags, lp_off, case when t_ctid <> ctid then t_ctid end as t_ctid,
t_xmin, xmin_visible, case when t_xmax::text <> '' then t_xmax end as t_xmax, xmax_visible,
xmin_visible and (xmax_visible is null or not xmax_visible or t_locked <> '') as visible, t_hot_updated, t_heap_only
from (select *,
t_xmin_valid and txid_visible_in_snapshot(t_xmin::text::bigint, txid_current_snapshot()) as xmin_visible,
t_xmax_valid and txid_visible_in_snapshot(t_xmax::text::bigint, txid_current_snapshot()) as xmax_visible
from (select ('(' || 0 || ',' || lp || ')')::tid as ctid,
lp, lp_off, case lp_flags when 0 then 'UNUSED' when 1 then 'NORMAL' when 2 then 'REDIRECT' when 3 then 'DEAD' end as lp_flags,
lp_len, t_xmin, t_xmax, t_field3, t_ctid, (t_infomask&1)<>0 as t_hasnull, (t_infomask&2)<>0 as t_hasvarwidth,
(t_infomask&4)<>0 as t_hasexternal, (t_infomask&8)<>0 as t_hasoid, (t_infomask&32)<>0 as t_combocid,
case t_infomask & 192 when 64 then 'EXCL' when 128 then 'SHARE' when 0 then '' when 192 then 'INVALID' end as t_locked,
(t_infomask&256)<>0 as t_xmin_committed, (t_infomask&512)=0 as t_xmin_valid,
(t_infomask&1024)<>0 as t_xmax_committed, (t_infomask&2048)=0 as t_xmax_valid,
(t_infomask&4096)<>0 as t_xmax_is_multi, (t_infomask&8192)<>0 as t_updated,
(t_infomask&16384)<>0 as t_moved_off, (t_infomask&32768)<>0 as t_moved_in,
t_infomask2&2047 as t_natts, (t_infomask2&16384)<>0 as t_hot_updated,
(t_infomask2&32768)<>0 as t_heap_only,
t_hoff, t_bits, t_oid
from heap_page_items(get_raw_page('t1', 0))) format_heap_page_items
) heap
full outer join (select ctid, * from t1) t1 using (ctid);
create temp view t1_indices as
select ctid, pkey_content.itemoffset as pkey_itemoffset, pkey_content.data as pkey_data, auxkey_content.itemoffset as auxkey_itemoffset, auxkey_content.data as auxkey_data
from bt_page_items('t1_pkey', 1) pkey_content
full outer join bt_page_items('t1_reference_key', 1) auxkey_content using (ctid);
\echo ********************************************************************************
\echo * Initial table
\echo
select * from t1_combined;
select * from t1_indices;
\echo ********************************************************************************
\echo * Update non-indexed column
\echo * - index entries untouched
\echo * - old tuple at ctid (0,1) has t_hot_updated set
\echo * - new tuple at ctid (0,4) has t_heap_only set
\echo * - t_ctid of (0,1) points to (0,4)
\echo
begin;
update t1 set value = 'mumble' where t1_id = 1;
end;
select * from t1_combined;
select * from t1_indices;
\echo ********************************************************************************
\echo * Update non-indexed column again
\echo * - tuple at ctid (0,4) now just points to ctid (0,5) and is redundant
\echo
begin;
update t1 set value = 'womble' where t1_id = 1;
end;
select * from t1_combined;
select * from t1_indices;
\echo ********************************************************************************
\echo * Vacuum table
\echo * - line pointer ctid (0,1) converted to REDIRECT since index entries still point to it
\echo * - redundant tuple at ctid (0,4) reclaimed for reuse
\echo
vacuum t1;
select * from t1_combined;
select * from t1_indices;
\echo ********************************************************************************
\echo * Update indexed column
\echo * - New index entries written for new tuple at ctid (0,4) which is now reused
\echo
update t1 set reference = 'WOMBLE' where t1_id = 1;
select * from t1_combined;
select * from t1_indices;
\echo ********************************************************************************
\echo * Update indexed column to contain same value
\echo * - even though indexed column is mentioned in update, this makes a heap-only change
\echo * - current version is now (0,6) but indices still indicate (0,4)
\echo
update t1 set reference = 'WOMBLE', value = 'womble2' where t1_id = 1;
select * from t1_combined;
select * from t1_indices;
\echo ********************************************************************************
\echo * Vacuum table
\echo * - ctid (0,1) now reclaimed, index entries pointing to it removed
\echo * - ctid (0,5) reclaimed too, it never had index entries pointing to it
\echo
vacuum t1;
select * from t1_combined;
select * from t1_indices;
You can run the script yourself to see the results; they are not listed here.
Reference: https://stackoverflow.com/questions/7806058/redundant-data-in-update-statements/7806610#7806610