Hierarchical data in postgres
https://coderwall.com/p/whf3-a/hierarchical-data-in-postgres
-------------------------------------------------------------------------------
This tip will try to answer the following questions:
- How can we represent a tree of data in postgres
- How can we efficiently query for any entire single node and all of it's children (and children's children).
The test data
Since we want to keep this simple we will assume our data is just a bunch of sections
. A section just has a name
and each section
has a single parent section
.
Section A
|--- Section A.1
Section B
|--- Section B.1
|--- Section B.1
|--- Section B.1.1
We'll use this simple data for examples below.
Simple self-referencing
When designing a self-referential table (something that joins itself to itself) the most obvious choice is to have some kind of parent_id
column on our table that references itself.
CREATE TABLE section (
id INTEGER PRIMARY KEY,
name TEXT,
parent_id INTEGER REFERENCES section,
);
ALTER TABLE page ADD COLUMN parent_id INTEGER REFERENCES page;
CREATE INDEX section_parent_id_idx ON section (parent_id);
Now insert our example data, using the parent_id
to related the nodes together:
INSERT INTO section (id, name, parent_id) VALUES (1, 'Section A', NULL);
INSERT INTO section (id, name, parent_id) VALUES (2, 'Section A.1', 1);
INSERT INTO section (id, name, parent_id) VALUES (3, 'Section B', NULL);
INSERT INTO section (id, name, parent_id) VALUES (4, 'Section B.1', 3);
INSERT INTO section (id, name, parent_id) VALUES (5, 'Section B.2', 3);
INSERT INTO section (id, name, parent_id) VALUES (6, 'Section B.2.1', 5);
This works great for simple queries such as, fetch the direct children of Section B:
SELECT * FROM section WHERE parent = 3
but it will require complex or recursive queries for questions like fetch me all the children and children's children of Section B:
WITH RECURSIVE nodes(id,name,parent_id) AS (
SELECT s1.id, s1.name, s1.parent_id
FROM section s1 WHERE parent_id = 3
UNION
SELECT s2.id, s2.name, s2.parent_id
FROM section s2, nodes s1 WHERE s2.parent_id = s1.id
)
SELECT * FROM nodes;
So we have answered the "how to build a tree" part of the question, but are not happy with the "how to query for a node and all it's children" part.
Enter ltree. (Short for "label tree" - I think?).
The ltree extension
The ltree extension is a great choice for querying hierarchical data. This is especially true for self-referential relationships.
Lets rebuild the above example using ltree. We'll use the page's primary keys as the "labels" within our ltree paths and a special "root" label to denote the top of the tree.
CREATE EXTENSION ltree;
CREATE TABLE section (
id INTEGER PRIMARY KEY,
name TEXT,
parent_path LTREE
);
CREATE INDEX section_parent_path_idx ON section USING GIST (parent_path);
We'll add in our data again, this time rather than using the id
for the parent, we will construct an ltree path that represents the parent node.
INSERT INTO section (id, name, parent_path) VALUES (1, 'Section 1', 'root');
INSERT INTO section (id, name, parent_path) VALUES (2, 'Section 1.1', 'root.1');
INSERT INTO section (id, name, parent_path) VALUES (3, 'Section 2', 'root');
INSERT INTO section (id, name, parent_path) VALUES (4, 'Section 2.1', 'root.3');
INSERT INTO section (id, name, parent_path) VALUES (4, 'Section 2.2', 'root.3');
INSERT INTO section (id, name, parent_path) VALUES (5, 'Section 2.2.1', 'root.3.4');
Cool. So now we can make use of ltree's operators @>
and <@
to answer our original question like:
SELECT * FROM section WHERE parent_path <@ 'root.3';
However we have introduced a few issues.
- Our simple
parent_id
version ensured referential consistancy by making use of theREFERENCES
constraint. We lost that by switching to ltree paths. - Ensuring that the ltree paths are valid can be a bit of a pain, and if paths become stale for some reason your queries may return unexpected results or you may "orphan" nodes.
The final solution
To fix these issues we want a hybrid of our original parent_id
(for the referential consistency and simplicity of the child/parent relationship) and our ltree paths (for improved querying power/indexing). To achieve this we will hide the management of the ltree path behind a trigger and only ever update the parent_id
column.
CREATE EXTENSION ltree;
CREATE TABLE section (
id INTEGER PRIMARY KEY,
name TEXT,
parent_id INTEGER REFERENCES section,
parent_path LTREE
);
CREATE INDEX section_parent_path_idx ON section USING GIST (parent_path);
CREATE INDEX section_parent_id_idx ON section (parent_id);
CREATE OR REPLACE FUNCTION update_section_parent_path() RETURNS TRIGGER AS $$
DECLARE
path ltree;
BEGIN
IF NEW.parent_id IS NULL THEN
NEW.parent_path = 'root'::ltree;
ELSEIF TG_OP = 'INSERT' OR OLD.parent_id IS NULL OR OLD.parent_id != NEW.parent_id THEN
SELECT parent_path || id::text FROM section WHERE id = NEW.parent_id INTO path;
IF path IS NULL THEN
RAISE EXCEPTION 'Invalid parent_id %', NEW.parent_id;
END IF;
NEW.parent_path = path;
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER parent_path_tgr
BEFORE INSERT OR UPDATE ON section
FOR EACH ROW EXECUTE PROCEDURE update_section_parent_path();
Much better.
More
Written by Chris Farmiloe
Hierarchical data in postgres的更多相关文章
- asp.net Hierarchical Data
Introduction A Hierarchical Data is a data that is organized in a tree-like structure and structure ...
- mysql 树形数据,层级数据Managing Hierarchical Data in MySQL
原文:http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/ 引言 大多数用户都曾在数据库中处理过分层数据(hiera ...
- Managing Hierarchical Data in MySQL
Managing Hierarchical Data in MySQL Introduction Most users at one time or another have dealt with h ...
- Managing Hierarchical Data in MySQL(邻接表模型)[转载]
原文在:http://dev.mysql.com/tech-resources/articles/hierarchical-data.html 来源: http://www.cnblogs.com/p ...
- 云原生 PostgreSQL 集群 - PGO:来自 Crunchy Data 的 Postgres Operator
使用 PGO 在 Kubernetes 上运行 Cloud Native PostgreSQL:来自 Crunchy Data 的 Postgres Operator! Cloud Native Po ...
- [Postgres] Group and Aggregate Data in Postgres
How can we see a histogram of movies on IMDB with a particular rating? Or how much movies grossed at ...
- 《利用Python进行数据分析: Python for Data Analysis 》学习随笔
NoteBook of <Data Analysis with Python> 3.IPython基础 Tab自动补齐 变量名 变量方法 路径 解释 ?解释, ??显示函数源码 ?搜索命名 ...
- Following a Select Statement Through Postgres Internals
This is the third of a series of posts based on a presentation I did at the Barcelona Ruby Conferenc ...
- ZOJ 3826 Hierarchical Notation 模拟
模拟: 语法的分析 hash一切Key建设规划,对于记录在几个地点的每个节点原始的字符串开始输出. . .. 对每一个询问沿图走就能够了. .. . Hierarchical Notation Tim ...
随机推荐
- 【原创】Mysql中事务ACID实现原理
引言 照例,我们先来一个场景~ 面试官:"知道事务的四大特性么?" 你:"懂,ACID嘛,原子性(Atomicity).一致性(Consistency).隔离性(Isol ...
- 04_ThreadLocal整合事务操作
文章导读: 本文主要讲解了如何在没有框架情况下如何解决Dao的事务问题, 重点理解Connection存放到WeakReference中为什么垃圾回收的时候Connection不回收 视频与源码下载: ...
- 大数据学习——SparkStreaming整合Kafka完成网站点击流实时统计
1.安装并配置zk 2.安装并配置Kafka 3.启动zk 4.启动Kafka 5.创建topic [root@mini3 kafka]# bin/kafka-console-producer. -- ...
- Phpstrom 书签应用
F11增加书签 书签 Ctrl + F11切换书签助记符 Ctrl +#[0-9]转到编号书签 Shift + F11显示书签
- 简单使用EL进行标签的替换
package com.ceshi; public class HtmlShow { public static String transfer(String txt,String cssClass) ...
- 使用runtime关联对象将视图添加到视图的类目里
//get方法 - (RJCircularLoaderView*)rj_circularLoaderView { RJCircularLoaderView *loaderView = objc_get ...
- C++ ---->中include <iostream>和include <iostream.h>的区别
简单来说: .h的是标准C的头文件,没有.h的是标准C++的头文件,两种都是头文件. 造成这两种形式不同的原因,是C++的发展历史决定的,刚才正好有别的人也问这个问题,这里我再回答一下(注意vs200 ...
- 深入理解Java中的volatile关键字
在再有人问你Java内存模型是什么,就把这篇文章发给他中我们曾经介绍过,Java语言为了解决并发编程中存在的原子性.可见性和有序性问题,提供了一系列和并发处理相关的关键字,比如synchronized ...
- <定时主库导出/备库导入>
1.设置定时任务时间及所需要的dmp文件路径 [mm1@localhost ~]$ crontab -e 0 0 * * * sh /home/mm1/exp_table.sh 2>& ...
- 在 IBM RAD 平台上基于 JAX-WS 开发 Web Services服务器端,客户端
原文地址:https://www.ibm.com/developerworks/cn/websphere/library/techarticles/1305_jiangpl_rad/1305_jian ...