Following a Select Statement Through Postgres Internals
This is the third of a series of posts based on a presentation I did at the Barcelona Ruby Conference called “20,000 Leagues Under ActiveRecord.” (posts: one two and video).
Preparing for this presentation over the Summer, I decided to read through parts of the PostgreSQL C source code. I executed a very simple select statement and watched what Postgres did with it using LLDB, a C debugger. How did Postgres understand my query? How did it actually find the data I was looking for?
This post is an informal journal of my trip through the guts of Postgres. I’ll describe the path I took and what I saw along the way. I’ll use a series of simple, conceptual diagrams to explain how Postgres executed my query. In case you understand C, I’ll also leave you a few landmarks and signposts you can look for if you ever decide to hack on Postgres internals.
In the end, the Postgres source code delighted me. It was clean, well documented and easy to follow. Find out for yourself how Postgres works internally by following me on a journey deep inside a tool you use everyday.
Finding Captain Nemo
Here’s the example query from the first half of my presentation; we’ll follow Postgres as it searches for Captain Nemo:
Professor Aronnax and Captain Nemo
plot the course of the Nautilus.
Finding a single name in string column like this should be straightforward, shouldn’t it? We’ll hold tightly onto this select statement while we explore Postgres internals, like a rope deep sea divers use to find their way back to the surface.
The Big Picture
What does Postgres do with this SQL string? How does it understand what we meant? How does it know what data we are looking for?
Postgres processes each SQL command we send it using a four step process.
In the first step, Postgres parses our SQL statement and converts it into a series of C memory structures, a parse tree. Next Postgres analyzes and rewrites our query, optimizing and simplifying it using a series of complex algorithms. After that, Postgres generates a plan for finding our data. Like an obsessive compulsive person who won’t leave home without every suitcase packed perfectly, Postgres doesn’t run our query until it has a plan. Finally, Postgres actually executes our query. In this presentation I’ll briefly touch on the first three topics, and then focus more on the last step: Execute.
The C function inside of Postgres that implements this 4 step process is called exec_simple_query. You can find a link to it below, along with an LLDB backtrace which gives some context about exactly when and how Postgres callsexec_simple_query.
Parse
How does Postgres understand the SQL string we sent it? How does it make sense of the SQL keywords and expressions in our select statement? Through a process called parsing, Postgres converts our SQL string into an internal data structure it understands, the parse tree.
It turns out that Postgres uses the same parsing technology that Ruby does, a parser generator called Bison. Bison runs during the Postgres C build process and generates parser code based on a series of grammar rules. The generated parser code is what runs inside of Postgres when we send it SQL commands. Each grammar rule is triggered when the generated parser finds a corresponding pattern or syntax in the SQL string, and inserts a new C memory structure into the parse tree data structure.
I won’t take the time today to explain how parsing algorithms work in detail. If you’re interested in that sort of thing, I’d suggest taking a look at my book Ruby Under a Microscope. In Chapter One I go through a detailed example of the LALR parse algorithm used by Bison and Ruby. Postgres parses SQL statements in exactly the same way.
Using LLDB and enabling some C logging code, I observed the Postgres parser produce this parse tree for our Captain Nemo query:
At the top is a node representing the entire SQL statement, and below that are child nodes or branches that represent the different portions of the SQL statement syntax: the target list (a list of columns), the from clause (a list of tables), the where clause, the sort order and a limit count.
If you want to learn more about how Postgres parses SQL statements, follow the flow of control down fromexec_simple_query through another C function called pg_parse_query.
As you can see there are many helpful and detailed comments in the Postgres source code that not only explain what is happening but also point out important design decisions.
All That Hard Work For Nothing
The parse tree above should look familiar – it’s almost precisely the same as the abstract syntax tree (AST) we saw ActiveRecord create earlier. Recall from the first half of the presentation ActiveRecord generated our Captain Nemo select statement when we executed this Ruby query:
We saw that ActiveRecord internally created an AST when we called methods such as where and first. Later (see thesecond post), we watched as the Arel gem converted the AST into our example select statement using an algorithm based on the visitor pattern.
Thinking about this, it’s ironic that the first thing Postgres does with your SQL statement is convert it from a string back into an AST. Postgres’s parse process reverses everything ActiveRecord did earlier; all of that hard work the Arel gem did was for nothing! The only reason for creating the SQL string at all was to communicate with Postgres over a network connection. Once Postgres has the string, it converts it back into an AST, which is a much more convenient and useful way of representing queries.
Learning this you might ask: Is there a better way? Is there some way of conceptually specifying the data we want to Postgres without writing a SQL statement? Without learning the complex SQL language or paying the performance overhead of using ActiveRecord and Arel? It seems like a waste of time to go to such lengths to generate a SQL string from an AST, just to convert it back to an AST again. Maybe we should be using a NoSQL database solution instead?
Of course, the AST Postgres uses is much different from the AST used by ActiveRecord. ActiveRecord’s AST was comprised of Ruby objects, while Postgres’s AST is formed of a series of C memory structures. Same idea but very different implementations.
Analyze and Rewrite
Once Postgres has generated a parse tree, it then converts it into a another tree structure using a different set of nodes. This is known as the query tree. Returning to the exec_simple_query C function, you can see it next calls another C functionpg_analyze_and_rewrite.
Waving my hands a bit and glossing over many important details, the analyze and rewrite process applies a series of sophisticated algorithms and heuristics to try to optimize and simplify your SQL statement. If you had executed a complex select statement with sub-selects and multiple inner and outer joins, then there is a lot of room for optimization. It’s quite possible that Postgres could reduce the number of sub-select clauses or joins to produce a simpler query that runs faster.
For our simple select statement, here’s the query tree that pg_analyze_and_rewrite produces:
I don’t pretend to understand the detailed algorithms behind pg_analyze_and_rewrite. I simply observed that for our example the query tree largely resembled the parse tree. This means the select statement was so straightforward Postgres wasn’t able to simplify it further.
Plan
The last step Postgres takes before starting to execute our query is to create a plan. This involves generating a third tree of nodes that form a list of instructions for Postgres to follow. Here’s the plan tree for our select statement.
Imagine that each node in the plan tree is a machine or worker of some kind. The plan tree resembles a pipeline of data or a conveyor belt in a factory. In my simple example there is only one branch in the tree. Each node in the plan tree takes some the output data from the node below, processes it, and returns results as input to the node above. We’ll follow Postgres as it executes the plan in the next section.
The C function that starts the query planning process is called pg_plan_queries.
Note the startup_cost and total_cost values in each plan node. Postgres uses these values to estimate how long the plan will take to complete. You don’t have to use a C debugger to see the execution plan for your query. Just prepend the SQLEXPLAIN command to your query, like this:
This is a powerful way to understand what Postgres is doing internally with one of your queries, and why it might be slow or inefficient – despite the sophisticated planning algorithms in pg_plan_queries.
Executing a Limit Plan Node
By now, Postgres has parsed your SQL statement and converted it back into an AST. Then it optimized and rewrote your query, possibly in a simpler way. Third, Postgres wrote a plan which it will follow to find and return the data you are looking for. Finally it’s time for Postgres to actually execute your query. How does it do this? It follows the plan, of course!
Let’s start at the top of the plan tree and move down. Skipping the root node, the first worker that Postgres uses for our Captain Nemo query is called Limit. The Limit node, as you might guess, implements the LIMIT SQL command, which limits the result set to the specified number of records. The same plan node also implements the OFFSET command, which starts the result set window at the specified row.
The first time Postgres calls the Limit node, it calculates what the limit and offset values should be, because they might be set to the result of some dynamic calculation. In our example, offset is 0 and limit is 1.
Next, the Limit plan node repeatedly calls the subplan, in our case Sort, counting until it reaches the offset value:
In our example the offset value is zero, so this loop will load the first data value and stop iterating. Then Postgres returns the last data value loaded from the subplan to the calling or upper plan. For us, this will be that first value from the subplan.
Finally when Postgres continues to call the Limit node, it will pass the data values through from the subplan one at a time:
In our example, because the limit value is 1 Limit will immediately return NULL indicating to the upper plan there is no more data available.
Postgres implements the Limit node using code in a file called nodeLimit.c
You can see the Postgres source code uses words such as tuple (a set a values, one from each column) and subplan. The subplan in this example is the Sort node, which appears below Limit in the plan.
Executing a Sort Plan Node
Where do the data values Limit filters come from? From the Sort plan node that appears under Limit in the plan tree. Sort loads data values from its subplan and returns them to its calling plan, Limit. Here’s what Sort does when the Limit node calls it for the first time, to get the first data value:
You can see that Sort functions very differently from Limit. It immediately loads all of the available data from the subplan into a buffer, before returning anything. Then it sorts the buffer using the Quicksort algorithm, and finally returns the first sorted value.
For the second and subsequent calls, Sort simply returns additional values from the sorted buffer, and never needs to call the subplan again:
The Sort plan node is implemented by a C function called ExecSort:
Executing a SeqScan Plan Node
Where does ExecSort get its values? From its subplan, or the SeqScan node that appears at the bottom of the plan tree. SeqScan stands for sequence scan, which means to look through the values in a table, returning values that match a given filter. To understand how the scan works with our filter, let’s step through an imaginary users table filled with fake names, looking for Captain Nemo.
Postgres starts at the first record in a table (known as a relation in the Postgres source code) and executes the boolean expression from the plan tree. In simple terms, Postgres asks the question: “Is this Captain Nemo?” Because Laurianne Goodwin is not Captain Nemo, Postgres steps down to the next record.
No, Candace is also not Captain Nemo. Postgres continues:
… and eventually finds Captain Nemo!
Postgres implements the SeqScan node using a C function called ExecSeqScan.
What Are We Doing Wrong?
Now we’re done! We’ve followed a simple select statement all the way through the guts of Postgres, and have seen how it was parsed, rewritten, planned and finally executed. After executing many thousands of lines of C code, Postgres has found the data we are looking for! Now all Postgres has to do is return the Captain Nemo string back to our Rails application and ActiveRecord can create a Ruby object. We can finally return to the surface of our application.
But Postgres doesn’t stop! Instead of simply returning, Postgres continues to scan through the users table, even though we’ve already found Captain Nemo:
While returning from the South Pole, the air
supply inside the Nautilus began to run out.
What’s going on here? Why is Postgres wasting its time, continuing to search even though it’s already found the data we’re looking for?
The answer lies farther up the plan tree in the Sort node. Recall that in order to sort all of the users, ExecSort first loads all of the values into a buffer, by calling the subplan repeatedly until there are no values left. That means that ExecSeqScan will continue to scan to the end of the table, until it has all of the matching users. If our users table contained thousands or even millions of records (imagine we work at Facebook or Twitter), ExecSeqScan will have to loop over every single user record and execute the string comparison for each one. This is obviously inefficient and slow, and will get slower as more and more user records are added.
If we have only one Captain Nemo record, then ExecSort will “sort” just that single matching record, and ExecLimit will pass that single record through its offset/limit filter… but only after ExecSeqScan has iterated over all of the names.
Next Time
How do we fix this problem? What should we do if our SQL queries on the users table take more and more time to execute? The answer is simple: we create an index.
In the next and final post in this series we’ll learn how to create a Postgres index and to avoid the use of ExecSeqScan. More importantly, I’ll show you what a Postgres index looks like: howit works and why it speeds up queries like this one.
注:非常赞的文章,讲解了pg中执行select操作时优化器的所做。
关于lldb具体的操作步骤参考:
https://gist.github.com/patshaughnessy/70519495343412504686
参考:
http://patshaughnessy.net/2014/10/13/following-a-select-statement-through-postgres-internals
Following a Select Statement Through Postgres Internals的更多相关文章
- Select Statement Syntax [AX 2012]
Applies To: Microsoft Dynamics AX 2012 R3, Microsoft Dynamics AX 2012 R2, Microsoft Dynamics AX 2012 ...
- SQL Fundamentals: Basic SELECT statement基本的select语句(控制操作的现实列)(FROM-SELECT)
SQL Fundamentals || Oracle SQL语言 Capabilities of the SELECT Statement(SELECT语句的功能) Data retrieval fr ...
- A SELECT statement that assigns a value to a variable must ... (向变量赋值的 SELECT 语句不能与数据检索操作结合使用 )
A SELECT statement that assigns a value to a variable must ... (向变量赋值的 SELECT 语句不能与数据检索操作结合使用 ) 总结一句 ...
- Cannot resolve collation conflict between "Chinese_Taiwan_Stroke_CI_AS" and "Chinese_PRC_CI_AS" in UNION ALL operator occurring in SELECT statement column 1.
Cannot resolve collation conflict between . 解决方案: COLLATE Chinese_PRC_CI_AS 例子: SELECT A.Name FROM A ...
- Part 5 Select statement in sql server
Select specific or all columns select * from 表名 select * from Student select 列名,列名... from 表名 select ...
- pg 资料大全1
https://github.com/ty4z2008/Qix/blob/master/pg.md?from=timeline&isappinstalled=0 PostgreSQL(数据库) ...
- using the library to generate a dynamic SELECT or DELETE statement mysqlbaits xml配置文件 与 sql构造器 对比
https://github.com/mybatis/mybatis-dynamic-sql MyBatis Dynamic SQL What Is This? This library is ...
- Discovering the Computer Science Behind Postgres Indexes
This is the last in a series of Postgres posts that Pat Shaughnessy wrote based on his presentation ...
- MySQL and Postgres command equivalents (mysql vs psql)
MySQL and Postgres command equivalents (mysql vs psql) 博客分类: Database From: http://blog.endpoint.c ...
随机推荐
- CSS 3中边框怎么用
(1)设置边框图片的来源 图片边框默认只在四个顶点显示 none: 无背景图片; border-image-source: url('borderImage.png'); (2)边框图片的分割 将图片 ...
- 数据库 基础篇2(mysql)
2.1MySQL入门 1)到mysql官网下载. 2)安装mysql软件 3)使用 验证是否成功 打开cmd -> 输入 mysql -u root -p 回车 -> 输入密码 ...
- Minimum Path Sum [LeetCode]
Given a m x n grid filled with non-negative numbers, find a path from top left to bottom right which ...
- Android中Webview使用自定义的javascript进行回调
先说为什么需要讨论这个问题. 现在很多的手机应用,都可能会直接嵌入一个web页面.这样做的好处:一个是功能更新方便,维护起来容易,只需要维护服务器的页面即可,不需要更新客户端:另一个是功能通用,不仅a ...
- guava学习--File1
ByteSource:表示一个可读的字节.通常情况下,我们期望的字节来源是一个文件,但它也可以从一个字节数组读取字节. File f1 = new File("D:\\test2.txt&q ...
- hdu 1005 1021 递归超限 找规律 // 只要看题中n较大都是有规律的
因为n>1000000000所以用递归 数组超限, 由递归函数f(n)=(A*f(n-1)+B*f(n-2))%7; 因为是除7的余数 因次一共有7*7=49种情况, 以后的值都和之前的对应相等 ...
- UE3植被工具-支持刷Actor)
[目标] 植被工具-刷Actor [思路] 1 添加类型FFoliageMeshInfo.AddInstance 的函数 2 添加Instance就直接SpawnActor 3 类结构 4 修改的函数 ...
- CSS动画 防止动画结束后,回归原位
animation-fill-mode防止动画结束后,回归原位 animation: arrowsfirst 1s; animation-timing-function: linear; animat ...
- (DFS)codevs1004-四子连棋
题目地址 方法是建立dfs,并在其中加入pre变量,记录之前移动的是W还是B.外面套for循环,从1步开始逐次递增,直到在i步时可以走完(dfs返回1),break退出循环,即为最短步. 本题的关键主 ...
- .gitignore
# Xcode # build/ *.pbxuser !default.pbxuser *.mode1v3 !default.mode1v3 *.mode2v3 !default.mode2v3 *. ...