Recovering PostgreSQL After Accidentally Deleting the Clog, Without a Backup

Create the test table
postgres# create table t (n_id int primary key,c_name varchar(300));
CREATE TABLE
postgres# insert into t select id,(id*1000)::text as name from
generate_series(1,1000) id;
INSERT 0 1000
postgres# delete from t where n_id =1000;
DELETE 1
postgres# update t set c_name = 'cs' where n_id > 990;
UPDATE 9
postgres# select * from t; -- output omitted; this step is critical
postgres# insert into t values ( 1001,'insert'),(1002,'insert');
INSERT 0 2
postgres# update t set c_name = 'update' where n_id = 1002;
UPDATE 1

Shut down and back up the database
$ pg_ctl stop
waiting for server to shut down.... done
server stopped
$ cp -R $PGDATA $PGDATA/../pgdata_bak1

Delete the clog file
$ cd $PGDATA/pg_xact
$ ls
0000 bak
$ rm 0000

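Starting the database fails
$ pg_ctl start
pg_ctl: could not start server
Examine the log output.

Skipping some unimportant output, the log reports that the file pg_xact/0000 cannot be opened.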

$ tail postgresql-2020-08-11_123341.csv
License expiry handling: notify
Days of advance notice before license expiry: 15
",,,,,,,,,""

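Create a clog file with dd
A clog segment file is at most 256 kB, so creating a single 256 kB file is enough. Each transaction's status takes 2 bits, so a file filled with zeros marks every transaction it covers as IN_PROGRESS.

Status value  Transaction status
0x00          TRANSACTION_STATUS_IN_PROGRESS
0x01          TRANSACTION_STATUS_COMMITTED
0x02          TRANSACTION_STATUS_ABORTED
0x03          TRANSACTION_STATUS_SUB_COMMITTED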

$ dd if=/dev/zero of=$PGDATA/pg_xact/0000 bs=8K count=32
32+0 records in
32+0 records out
262144 bytes (262 kB) copied, 0.00102351 s, 256 MB/s
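The 256 kB figure follows from how the clog packs transaction status. The sketch below only illustrates that arithmetic. It assumes the default 8 kB block size and 32 pages per segment file (the same numbers as the dd command above), and clog_location and status_from_byte are hypothetical helper names, not PostgreSQL functions.

```python
# Arithmetic sketch of the clog (pg_xact) layout. Assumes the default
# 8 kB block size and 32 pages per segment file. clog_location and
# status_from_byte are hypothetical helpers, not PostgreSQL functions.
BLCKSZ = 8192                 # bytes per clog page (default build)
PAGES_PER_SEGMENT = 32        # pages per segment file
BITS_PER_XACT = 2             # four possible states fit in 2 bits
XACTS_PER_BYTE = 8 // BITS_PER_XACT
XACTS_PER_PAGE = BLCKSZ * XACTS_PER_BYTE
XACTS_PER_SEGMENT = XACTS_PER_PAGE * PAGES_PER_SEGMENT

STATUS_NAMES = {0: "IN_PROGRESS", 1: "COMMITTED",
                2: "ABORTED", 3: "SUB_COMMITTED"}


def clog_location(xid):
    """Return (segment file name, byte offset in the file, bit shift)."""
    segment = xid // XACTS_PER_SEGMENT
    byte_in_file = (xid % XACTS_PER_SEGMENT) // XACTS_PER_BYTE
    shift = (xid % XACTS_PER_BYTE) * BITS_PER_XACT
    return "%04X" % segment, byte_in_file, shift


def status_from_byte(byte_value, shift):
    """Extract the 2-bit status stored at the given shift within a byte."""
    return STATUS_NAMES[(byte_value >> shift) & 0x03]


print(BLCKSZ * PAGES_PER_SEGMENT)   # 262144 bytes per segment file
print(XACTS_PER_SEGMENT)            # 1048576 transactions per segment
print(clog_location(253975))        # ('0000', 63493, 6)
print(status_from_byte(0x00, 6))    # IN_PROGRESS: what a zeroed file yields
```

Under these assumptions one segment covers a little over a million transactions, which is why a small test instance has only the file 0000.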

Start again and verify the data loss
postgres# select * from t where n_id >=990;
n_id | c_name
------+--------
990 | 990000
991 | cs
992 | cs
993 | cs
994 | cs
995 | cs
996 | cs
997 | cs
998 | cs
999 | cs
1002 | insert

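With the clog lost, the effects of some transactions are still preserved

Changes made by a transaction in IN_PROGRESS state should be invisible to other sessions. According to the PostgreSQL documentation, an ordinary heap tuple consists of three parts: the HeapTupleHeaderData structure, the null bitmap, and the user data. The layout of HeapTupleHeaderData is as follows: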

Field        Type             Length   Description
t_xmin       TransactionId    4 bytes  insert XID stamp
t_xmax       TransactionId    4 bytes  delete XID stamp
t_cid        CommandId        4 bytes  insert and/or delete CID stamp (overlays with t_xvac)
t_xvac       TransactionId    4 bytes  XID for VACUUM operation moving a row version
t_ctid       ItemPointerData  6 bytes  current TID of this or newer row version
t_infomask2  uint16           2 bytes  number of attributes, plus various flag bits
t_infomask   uint16           2 bytes  various flag bits
t_hoff       uint8            1 byte   offset to user data

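The t_infomask field also takes part in the visibility decision, and it is consulted before the clog. It is a 16-bit field, conveniently read as four groups of 4 bits. The 0x0F00 group (HEAP_XMIN_COMMITTED, HEAP_XMIN_INVALID, HEAP_XMAX_COMMITTED, HEAP_XMAX_INVALID) records whether the inserting and deleting transactions committed, and it is what the visibility check looks at first.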

#define HEAP_HASNULL 0x0001 /* has null attribute(s) */
#define HEAP_HASVARWIDTH 0x0002 /* has variable-width attribute(s) */
#define HEAP_HASEXTERNAL 0x0004 /* has external stored attribute(s) */
#define HEAP_HASOID 0x0008 /* has an object-id field */
#define HEAP_XMAX_KEYSHR_LOCK 0x0010 /* xmax is a key-shared locker */
#define HEAP_COMBOCID 0x0020 /* t_cid is a combo cid */
#define HEAP_XMAX_EXCL_LOCK 0x0040 /* xmax is exclusive locker */
#define HEAP_XMAX_LOCK_ONLY 0x0080 /* xmax, if valid, is only a locker */
/* xmax is a shared locker */
#define HEAP_XMAX_SHR_LOCK (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)
#define HEAP_LOCK_MASK (HEAP_XMAX_SHR_LOCK | HEAP_XMAX_EXCL_LOCK | \
HEAP_XMAX_KEYSHR_LOCK)
#define HEAP_XMIN_COMMITTED 0x0100 /* t_xmin committed */
#define HEAP_XMIN_INVALID 0x0200 /* t_xmin invalid/aborted */
#define HEAP_XMIN_FROZEN (HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID)
#define HEAP_XMAX_COMMITTED 0x0400 /* t_xmax committed */
#define HEAP_XMAX_INVALID 0x0800 /* t_xmax invalid/aborted */
#define HEAP_XMAX_IS_MULTI 0x1000 /* t_xmax is a MultiXactId */
#define HEAP_UPDATED 0x2000 /* this is UPDATEd version of row */
#define HEAP_MOVED_OFF 0x4000 /* moved to another place by pre-9.0
                               * VACUUM FULL; kept for binary
                               * upgrade support */
#define HEAP_MOVED_IN 0x8000 /* moved from another place by pre-9.0
                              * VACUUM FULL; kept for binary
                              * upgrade support */
#define HEAP_MOVED (HEAP_MOVED_OFF | HEAP_MOVED_IN)
#define HEAP_XACT_MASK 0xFFF0 /* visibility-related bits */
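
To make the flag list easier to apply, here is a small decoder that turns a t_infomask value into flag names. It is only a sketch: the bit values are copied from the #define list above, and decode_infomask is a hypothetical helper. The value 258 is the t_infomask that appears in the heap_page_items output later in this article; 0x0902 is an illustrative value for a freshly inserted row whose commit has already been hinted.

```python
# Decode a t_infomask value into flag names, using the bit values from
# the #define list above. decode_infomask is a hypothetical helper.
INFOMASK_FLAGS = [
    (0x0001, "HEAP_HASNULL"),
    (0x0002, "HEAP_HASVARWIDTH"),
    (0x0004, "HEAP_HASEXTERNAL"),
    (0x0008, "HEAP_HASOID"),
    (0x0010, "HEAP_XMAX_KEYSHR_LOCK"),
    (0x0020, "HEAP_COMBOCID"),
    (0x0040, "HEAP_XMAX_EXCL_LOCK"),
    (0x0080, "HEAP_XMAX_LOCK_ONLY"),
    (0x0100, "HEAP_XMIN_COMMITTED"),
    (0x0200, "HEAP_XMIN_INVALID"),
    (0x0400, "HEAP_XMAX_COMMITTED"),
    (0x0800, "HEAP_XMAX_INVALID"),
    (0x1000, "HEAP_XMAX_IS_MULTI"),
    (0x2000, "HEAP_UPDATED"),
    (0x4000, "HEAP_MOVED_OFF"),
    (0x8000, "HEAP_MOVED_IN"),
]


def decode_infomask(value):
    """Return the names of all single-bit flags set in value."""
    return [name for bit, name in INFOMASK_FLAGS if value & bit]


print(decode_infomask(0x0902))
# ['HEAP_HASVARWIDTH', 'HEAP_XMIN_COMMITTED', 'HEAP_XMAX_INVALID']
print(decode_infomask(258))
# ['HEAP_HASVARWIDTH', 'HEAP_XMIN_COMMITTED']
```

For 258 the decoder reports a committed xmin hint but no hint at all for xmax, so the fate of the deleting transaction still has to be looked up in the clog. Combined masks such as HEAP_XMIN_FROZEN show up as their two component bits.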

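Based on the experiment above, we make a bold guess:
t_infomask is updated on the first access after the data has been written. So when the row with id 1002 was updated, the commit status of the transaction that inserted 1002 was written into its t_infomask. That is why the experiment lost only the insert of 1001 and the update of 1002.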
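
The guess can be checked against a toy model before reading the source. The sketch below is not PostgreSQL code: Tuple, Table, and visible are hypothetical names, every statement is its own already-committed transaction, and there are no concurrent sessions. It keeps only two ideas: a visibility check caches the clog answer in hint bits, and an UPDATE or DELETE examines only the tuples matching its predicate, as an index scan would.

```python
# Toy model of hint bits versus the clog. Not PostgreSQL code; all names
# are hypothetical. Every statement runs as its own committed transaction.
XMIN_COMMITTED, XMIN_INVALID = 0x0100, 0x0200
XMAX_COMMITTED, XMAX_INVALID = 0x0400, 0x0800


class Tuple:
    def __init__(self, key, value, xmin):
        self.key, self.value = key, value
        self.xmin, self.xmax = xmin, 0
        self.infomask = XMAX_INVALID        # a fresh tuple has no deleter


def visible(tup, clog):
    """Simplified visibility check; caches clog lookups as hint bits."""
    if not tup.infomask & XMIN_COMMITTED:
        if tup.infomask & XMIN_INVALID:
            return False
        if clog.get(tup.xmin) != "COMMITTED":
            tup.infomask |= XMIN_INVALID    # treated as aborted or crashed
            return False
        tup.infomask |= XMIN_COMMITTED
    if tup.infomask & XMAX_INVALID:
        return True
    if not tup.infomask & XMAX_COMMITTED:
        if clog.get(tup.xmax) != "COMMITTED":
            tup.infomask |= XMAX_INVALID    # deleter never committed
            return True
        tup.infomask |= XMAX_COMMITTED
    return False


class Table:
    def __init__(self):
        self.tuples, self.clog, self.next_xid = [], {}, 100

    def _commit(self):
        xid = self.next_xid
        self.clog[xid] = "COMMITTED"
        self.next_xid += 1
        return xid

    def _find(self, pred):
        # Only tuples whose key matches are examined, like an index scan.
        return [t for t in self.tuples
                if pred(t.key) and visible(t, self.clog)]

    def insert(self, rows):
        xid = self._commit()
        self.tuples += [Tuple(k, v, xid) for k, v in rows]

    def delete(self, pred):
        xid = self._commit()
        for t in self._find(pred):
            t.xmax = xid
            t.infomask &= ~(XMAX_INVALID | XMAX_COMMITTED)

    def update(self, pred, value):
        xid = self._commit()
        for t in self._find(pred):
            t.xmax = xid
            t.infomask &= ~(XMAX_INVALID | XMAX_COMMITTED)
            self.tuples.append(Tuple(t.key, value, xid))

    def select(self):
        return sorted((t.key, t.value) for t in self.tuples
                      if visible(t, self.clog))


t = Table()
t.insert([(i, str(i * 1000)) for i in range(1, 1001)])
t.delete(lambda k: k == 1000)
t.update(lambda k: k > 990, "cs")
t.select()                      # full scan: hint bits get set everywhere
t.insert([(1001, "insert"), (1002, "insert")])
t.update(lambda k: k == 1002, "update")
t.clog.clear()                  # same effect as the zero-filled 0000 file
print([row for row in t.select() if row[0] >= 990])
```

In this model the final query returns 990, the nine 'cs' rows, and 1002 with the value 'insert', the same rows as the real experiment: row 1001 was never examined after its insert, and the old version of 1002 was examined by the UPDATE, which recorded the insert's commit but not its own.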

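Verify against the source code
The comment below explains that, to avoid contention on high-traffic shared transaction state (the comment names the PGXACT array; the clog directory was renamed from pg_clog to pg_xact in version 10), the code no longer updates t_infomask as early as possible and instead updates it on a later visit to the tuple (changed in PostgreSQL 9.6, which corresponds to abase 3.6.1).
heapam_visibility.c, line 962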

/*
 * HeapTupleSatisfiesMVCC
 *      True iff heap tuple is valid for the given MVCC snapshot.
 *
 * See SNAPSHOT_MVCC's definition for the intended behaviour.
 *
 * Notice that here, we will not update the tuple status hint bits if the
 * inserting/deleting transaction is still running according to our snapshot,
 * even if in reality it's committed or aborted by now. This is intentional.
 * Checking the true transaction state would require access to high-traffic
 * shared data structures, creating contention we'd rather do without, and it
 * would not change the result of our visibility check anyway. The hint bits
 * will be updated by the first visitor that has a snapshot new enough to see
 * the inserting/deleting transaction as done. In the meantime, the cost of
 * leaving the hint bits unset is basically that each HeapTupleSatisfiesMVCC
 * call will need to run TransactionIdIsCurrentTransactionId in addition to
 * XidInMVCCSnapshot (but it would have to do the latter anyway). In the old
 * coding where we tried to set the hint bits as soon as possible, we instead
 * did TransactionIdIsInProgress in each call --- to no avail, as long as the
 * inserting/deleting transaction was still running --- which was more cycles
 * and more contention on the PGXACT array.
 */
static bool
HeapTupleSatisfiesMVCC(HeapTuple htup, Snapshot snapshot,
                       Buffer buffer)
{
    HeapTupleHeader tuple = htup->t_data;

    Assert(ItemPointerIsValid(&htup->t_self));
    Assert(htup->t_tableOid != InvalidOid);

    if (!HeapTupleHeaderXminCommitted(tuple))
    {
        if (HeapTupleHeaderXminInvalid(tuple))
            return false;

        /* Used by pre-9.0 binary upgrades */
        if (tuple->t_infomask & HEAP_MOVED_OFF)
        {
            TransactionId xvac = HeapTupleHeaderGetXvac(tuple);

            if (TransactionIdIsCurrentTransactionId(xvac))
                return false;
            if (!XidInMVCCSnapshot(xvac, snapshot))
            {
                if (TransactionIdDidCommit(xvac))
                {
                    SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
                                InvalidTransactionId);
                    return false;
                }
                SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
                            InvalidTransactionId);
            }
        }
        /* Used by pre-9.0 binary upgrades */
        else if (tuple->t_infomask & HEAP_MOVED_IN)
        {
            TransactionId xvac = HeapTupleHeaderGetXvac(tuple);

            if (!TransactionIdIsCurrentTransactionId(xvac))
            {
                if (XidInMVCCSnapshot(xvac, snapshot))
                    return false;
                if (TransactionIdDidCommit(xvac))
                    SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
                                InvalidTransactionId);
                else
                {
                    SetHintBits(tuple, buffer, HEAP_XMIN_INVALID,
                                InvalidTransactionId);
                    return false;
                }
            }
        }
        else if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetRawXmin(tuple)))
        {
            if (HeapTupleHeaderGetCmin(tuple) >= snapshot->curcid)
                return false;   /* inserted after scan started */

            if (tuple->t_infomask & HEAP_XMAX_INVALID)  /* xid invalid */
                return true;

            if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))    /* not deleter */
                return true;

            if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
            {
                TransactionId xmax;

                xmax = HeapTupleGetUpdateXid(tuple);

                /* not LOCKED_ONLY, so it has to have an xmax */
                Assert(TransactionIdIsValid(xmax));

                /* updating subtransaction must have aborted */
                if (!TransactionIdIsCurrentTransactionId(xmax))
                    return true;
                else if (HeapTupleHeaderGetCmax(tuple) >= snapshot->curcid)
                    return true;    /* updated after scan started */
                else
                    return false;   /* updated before scan started */
            }

            if (!TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetRawXmax(tuple)))
            {
                /* deleting subtransaction must have aborted */
                SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
                            InvalidTransactionId);
                return true;
            }

            if (HeapTupleHeaderGetCmax(tuple) >= snapshot->curcid)
                return true;    /* deleted after scan started */
            else
                return false;   /* deleted before scan started */
        }
        else if (XidInMVCCSnapshot(HeapTupleHeaderGetRawXmin(tuple), snapshot))
            return false;
        else if (TransactionIdDidCommit(HeapTupleHeaderGetRawXmin(tuple)))
            SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED,
                        HeapTupleHeaderGetRawXmin(tuple));
        ...

heapam_visibility.c, line 113

/*
 * SetHintBits()
 *
 * Set commit/abort hint bits on a tuple, if appropriate at this time.
 *
 * It is only safe to set a transaction-committed hint bit if we know the
 * transaction's commit record is guaranteed to be flushed to disk before the
 * buffer, or if the table is temporary or unlogged and will be obliterated by
 * a crash anyway. We cannot change the LSN of the page here, because we may
 * hold only a share lock on the buffer, so we can only use the LSN to
 * interlock this if the buffer's LSN already is newer than the commit LSN;
 * otherwise we have to just refrain from setting the hint bit until some
 * future re-examination of the tuple.
 *
 * We can always set hint bits when marking a transaction aborted. (Some
 * code in heapam.c relies on that!)
 *
 * Also, if we are cleaning up HEAP_MOVED_IN or HEAP_MOVED_OFF entries, then
 * we can always set the hint bits, since pre-9.0 VACUUM FULL always used
 * synchronous commits and didn't move tuples that weren't previously
 * hinted. (This is not known by this subroutine, but is applied by its
 * callers.) Note: old-style VACUUM FULL is gone, but we have to keep this
 * module's support for MOVED_OFF/MOVED_IN flag bits for as long as we
 * support in-place update from pre-9.0 databases.
 *
 * Normal commits may be asynchronous, so for those we need to get the LSN
 * of the transaction and then check whether this is flushed.
 *
 * The caller should pass xid as the XID of the transaction to check, or
 * InvalidTransactionId if no check is needed.
 */
static inline void
SetHintBits(HeapTupleHeader tuple, Buffer buffer,
            uint16 infomask, TransactionId xid)
{
    if (TransactionIdIsValid(xid))
    {
        /* NB: xid must be known committed here! */
        XLogRecPtr  commitLSN = TransactionIdGetCommitLSN(xid);

        if (BufferIsPermanent(buffer) && XLogNeedsFlush(commitLSN) &&
            BufferGetLSNAtomic(buffer) < commitLSN)
        {
            /* not flushed and no LSN interlock, so don't set hint */
            return;
        }
    }

    tuple->t_infomask |= infomask;
    MarkBufferDirtyHint(buffer, true);
}

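Summary
Backups are essential.
Row visibility is decided not only by the row's xmin and xmax together with the clog. The hint bits in the row's t_infomask also decide it, and they take precedence over the clog.
After a transaction modifies data and commits, the commit status is not written to t_infomask right away; it is written when the row is first accessed afterwards (abase 3.6.1 and later).
If the clog is lost and the database is started with an all-zero file created by dd, part of the data is lost: an insert not accessed afterwards is lost; a delete not accessed afterwards is undone, so the row still exists; an update not accessed afterwards leaves the old version visible, and the change is lost.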

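Recovering accidentally deleted rows without a backup, using the visibility rules
Note: this works only in specific circumstances. Proper backups remain essential, and data safety should rest on them. This example is mainly meant to help in understanding MVCC, vacuum, and the tuple visibility rules.
Create the test table
Delete all rows to simulate an accidental delete
With autovacuum enabled, once all rows of the table have been deleted, the next autovacuum run on the table cleans all of them up. After that the data cannot be recovered with this method, so shut the database down as soon as possible after the delete.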

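Set autovacuum to off
vi $PGDATA/postgresql.auto.conf
# change or add the following setting
autovacuum = 'off'

Start the database and look up the transaction ID of the delete

# The table in this experiment is newly created, so the picture is simple: there is only one deleting transaction ID, 253975. get_raw_page (from the pageinspect extension) takes the table name as its first argument and the page number, starting at 0, as its second.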
select t_xmax,* from heap_page_items(get_raw_page('public.t',0));
select t_xmax,* from heap_page_items(get_raw_page('public.t',1));
select t_xmax,* from heap_page_items(get_raw_page('public.t',2));
select t_xmax,* from heap_page_items(get_raw_page('public.t',3));
select t_xmax,* from heap_page_items(get_raw_page('public.t',4));
select t_xmax,* from heap_page_items(get_raw_page('public.t',5));
 t_xmax | lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_data
--------+----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------+----------------------
 253975 |  1 |   8152 |        1 |     33 | 253974 | 253975 |        0 | (0,1)  |        8194 |        258 |     24 |        |       | \x010000000b31303030
 253975 |  2 |   8112 |        1 |     33 | 253974 | 253975 |        0 | (0,2)  |        8194 |        258 |     24 |        |       | \x020000000b32303030
 253975 |  3 |   8072 |        1 |     33 | 253974 | 253975 |        0 | (0,3)  |        8194 |        258 |     24 |        |       | \x030000000b33303030
 253975 |  4 |   8032 |        1 |     33 | 253974 | 253975 |        0 | (0,4)  |        8194 |        258 |     24 |        |       | \x040000000b34303030
 253975 |  5 |   7992 |        1 |     33 | 253974 | 253975 |        0 | (0,5)  |        8194 |        258 |     24 |        |       | \x050000000b35303030
 253975 |  6 |   7952 |        1 |     33 | 253974 | 253975 |        0 | (0,6)  |        8194 |        258 |     24 |        |       | \x060000000b36303030
 253975 |  7 |   7912 |        1 |     33 | 253974 | 253975 |        0 | (0,7)  |        8194 |        258 |     24 |        |       | \x070000000b37303030
 253975 |  8 |   7872 |        1 |     33 | 253974 | 253975 |        0 | (0,8)  |        8194 |        258 |     24 |        |       | \x080000000b38303030
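
The t_data column holds the raw user data, which confirms that the deleted rows are physically still on the page. The following sketch decodes it for this particular table (int4 n_id, varchar c_name). It assumes a little-endian server and values short enough to use the 1-byte varlena header; decode_t_row is a hypothetical helper written for this article's table only.

```python
import struct


def decode_t_row(t_data_hex):
    """Decode t_data for table t (int4 n_id, varchar c_name).

    Assumes a little-endian server and a c_name stored with the 1-byte
    varlena header. decode_t_row is a hypothetical helper.
    """
    raw = bytes.fromhex(t_data_hex.replace("\\x", ""))
    (n_id,) = struct.unpack_from("<i", raw, 0)   # 4-byte integer column
    header = raw[4]
    if header & 0x01 == 0 or header == 0x01:
        raise ValueError("not a plain 1-byte varlena header")
    total_len = header >> 1                      # length includes the header byte
    c_name = raw[5:4 + total_len].decode("utf-8")
    return n_id, c_name


print(decode_t_row(r"\x010000000b31303030"))     # (1, '1000')
print(decode_t_row(r"\x080000000b38303030"))     # (8, '8000')
```

The sizes add up as well: a 24-byte header (t_hoff) plus 4 bytes for n_id and 5 bytes for c_name gives the lp_len of 33 shown in the output.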

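Shut down the database
$ pg_ctl stop
waiting for server to shut down.... done
server stopped

Use pg_resetwal to set the next transaction ID to the deleting transaction ID found above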

$ pg_resetwal $PGDATA -x 253975
Write-ahead log reset

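Start the database
$ pg_ctl start
server started

Copy the deleted rows into a temporary table
The database's next transaction ID is now 253975, so the delete performed by transaction 253975 is not visible. The 1000 deleted rows can still be queried from the table, and their xmax is 253975.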

postgres# create table t_del as select * from t where xmax=253975;

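Query table t again: the rows are gone
The CREATE TABLE statement above advanced the transaction ID by one, so the changes made by the deleting transaction became visible.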

postgres# select * from t;
n_id | c_name
------+--------
(0 rows)
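
Why the rows are visible right after pg_resetwal and gone one statement later can be written down as a simplified rule. This is a toy model with hypothetical names, not the real snapshot logic: it ignores in-progress transaction lists, hint bits, and XID wraparound, and it assumes the clog entry for 253975 reads as committed.

```python
# Toy model of the visibility flip after pg_resetwal. Hypothetical names;
# real snapshots also track in-progress XIDs and handle wraparound.
def delete_is_visible(xmax, next_xid, committed):
    """A deleter counts only if its XID is in the past and committed."""
    if xmax >= next_xid:      # XID not yet assigned, from this snapshot's view
        return False
    return xmax in committed


committed = {253974, 253975}  # assumption: clog reports both as committed
DELETE_XID = 253975

next_xid = 253975             # state right after pg_resetwal -x 253975
print(delete_is_visible(DELETE_XID, next_xid, committed))   # False

next_xid += 1                 # CREATE TABLE ... AS consumed XID 253975
print(delete_is_visible(DELETE_XID, next_xid, committed))   # True
```

This is also why the rescue copy has to be the first statement that consumes a transaction ID after the restart: any other write would flip the result just the same.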

Insert the rows back into table t to complete the recovery
postgres# insert into t select * from t_del;
INSERT 0 1000
postgres# select count(*) from t;
count
-------
1000
(1 row)

Run the following SQL so that the configuration change takes effect: select pg_reload_conf();

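Summary
Backups are essential.
Because of how vacuum works in abase, deleted rows are not removed immediately; they are only flagged. Once vacuum has cleaned them up, they cannot be recovered.
The minimum interval between autovacuum runs is controlled by autovacuum_naptime, which defaults to 1 minute.
When autovacuum runs, whether a table is vacuumed is decided jointly by autovacuum_vacuum_scale_factor and autovacuum_vacuum_threshold: the table is vacuumed only when the number of dead tuples exceeds autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples (the number of rows in the table).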
db01=# show autovacuum_naptime ;
autovacuum_naptime
--------------------
1min
db01=# show autovacuum_vacuum_scale_factor ;
autovacuum_vacuum_scale_factor
--------------------------------
0.2
(1 row)
db01=# show autovacuum_vacuum_threshold ;
autovacuum_vacuum_threshold
-----------------------------
50
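
With the settings shown above, the trigger condition can be worked through for the test table. This is a sketch of the documented formula only; autovacuum_vacuum_needed is a hypothetical name, it assumes reltuples still reports 1000 for table t, and it uses a strict comparison because the PostgreSQL documentation describes the trigger as the number of dead tuples exceeding the threshold.

```python
def autovacuum_vacuum_needed(dead_tuples, reltuples,
                             scale_factor=0.2, threshold=50):
    """Apply the vacuum threshold formula with the settings shown above."""
    vacuum_threshold = threshold + scale_factor * reltuples
    return dead_tuples > vacuum_threshold


# Table t: 1000 rows, all deleted. Threshold = 50 + 0.2 * 1000 = 250.
print(autovacuum_vacuum_needed(1000, 1000))   # True
print(autovacuum_vacuum_needed(200, 1000))    # False
```

Deleting every row of t leaves far more dead tuples than the threshold, so under these assumptions the next autovacuum pass would pick the table up, which is why the database has to be stopped quickly.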
