PostgreSQL Q&A: Building an Enterprise-Grade PostgreSQL Setup Using Open Source Tools
Reposted from: https://www.percona.com/blog/2018/10/19/postgresql-building-enterprise-grade-setup-with-open-source/
Hello everyone, and thank you to those who attended our webinar on Building an Enterprise-grade PostgreSQL setup using open source tools last Wednesday. You'll find the recording, as well as the slides we used during the presentation, here.
We had over forty questions during the webinar but were only able to tackle a handful in the time available, so most remained unanswered. We address the remaining ones below, grouped into categories for better organization. Thank you for sending them over! We have merged related questions and kept some of our answers concise, but please leave us a comment if you would like to see a particular point addressed further.
Backups
Q: In our experience, pg_basebackup with compression is slow due to single-thread gzip compression. How to speed up online compressed full backup?
Single-thread operation is indeed a limitation of pg_basebackup, and this is not limited to compression only. pgBackRest is an interesting alternative tool in this regard as it does have support for parallel processing.
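As a minimal sketch of what parallel backup processing looks like with pgBackRest, the `process-max` option controls how many compression/transfer processes run at once. The stanza name, paths, and degree of parallelism below are illustrative assumptions, not recommendations:

```ini
# /etc/pgbackrest/pgbackrest.conf — hedged sketch; adjust paths and
# stanza name to your environment
[global]
repo1-path=/var/lib/pgbackrest
process-max=4          # run 4 backup processes in parallel

[demo]
pg1-path=/var/lib/pgsql/11/data
```

A full backup of the `demo` stanza would then be taken with `pgbackrest --stanza=demo --type=full backup`.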
Q: Usually one sets up database backups on the primary DB in an HA setup. Is it possible to automatically activate backups on the new primary DB after a Patroni failover? (or with other HA solutions)
Yes. This can be done transparently by pointing your backup system to the “master-role” port in the HAProxy instead – or to the “replica-role” port; in fact, it’s more common to use standby replicas as the backup source.
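For illustration, a base backup taken through HAProxy's replica-role port might look like the sketch below. The host name and port number are assumptions about your HAProxy configuration, and `backup_user` is a hypothetical replication-capable role:

```shell
# Hedged sketch — haproxy.example.local and port 5001 (the
# "replica-role" port) are placeholders for your own setup
pg_basebackup -h haproxy.example.local -p 5001 -U backup_user \
    -D /backups/base -F tar -z -X stream -P
```

Because the port always routes to a healthy standby, the backup source follows the cluster topology automatically after a failover.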
Q: Do backups and WAL backups work with third party backup managers like NetBackup for example?
Yes, though as usual it depends on how good the vendor support is. NetBackup supports PostgreSQL, and so does Zmanda, to mention another one.
Security and auditing
Q: Do you know of a TDE solution for PostgreSQL? Can you talk a little bit about encryption-at-rest solutions for Postgres PCI/PII applications from the Percona standpoint?
At this point PostgreSQL does not provide native Transparent Data Encryption (TDE) functionality, relying instead on the underlying file system for data-at-rest encryption. Encryption at the column level can be achieved through the pgcrypto module.
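A minimal pgcrypto sketch of column-level encryption follows; the table, column names, and inline key are illustrative assumptions (in production, keys should never be hard-coded in SQL):

```sql
-- Hedged sketch of symmetric column encryption with pgcrypto;
-- names and the key source are assumptions for illustration
CREATE EXTENSION IF NOT EXISTS pgcrypto;

CREATE TABLE customer (
    id  serial PRIMARY KEY,
    ssn bytea   -- stored encrypted, never in plain text
);

INSERT INTO customer (ssn)
VALUES (pgp_sym_encrypt('123-45-6789', 'change-me-key'));

-- only sessions that know the key can read the plain text
SELECT pgp_sym_decrypt(ssn, 'change-me-key') FROM customer;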
Moreover, other PostgreSQL security features related to PCI compliance are:
- row-level security
- host based authentication (pg_hba.conf)
- encryption of data over the wire using SSL
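To make the last two points concrete, here is a hedged pg_hba.conf sketch that requires SSL and strong password hashing for application connections; the network range, database, and user names are assumptions:

```
# pg_hba.conf sketch — "hostssl" only matches SSL connections,
# so plain-text connections from elsewhere fall through to "reject"
# TYPE     DATABASE   USER      ADDRESS         METHOD
hostssl    appdb      appuser   10.0.0.0/24     scram-sha-256
host       all        all       0.0.0.0/0       reject
```

Note that scram-sha-256 authentication requires PostgreSQL 10 or later; on older versions md5 would be used instead.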
Q: How to prevent a superuser account from accessing raw data in Postgres? (…) companies we encounter usually ask that even managed accounts cannot access the real data by any means.
It is fundamental to maintain a superuser account that is able to access any object in the database for maintenance activities. Having said that, it is currently not possible to deny a superuser direct access to the raw data found in tables. What you can do to protect sensitive data from superuser access is to store it encrypted. As mentioned above, pgcrypto offers the necessary functionality for achieving this.
Furthermore, avoiding connecting to the database as a superuser is a best practice. The set_user extension allows unprivileged users to escalate to superuser on demand for maintenance tasks, while providing an additional layer of logging and control for better auditing. Also, as discussed in the webinar, it's possible to implement segregation of users using roles and privileges. Remember it's best practice to grant each role, including application users, only the privileges essential to fulfill its duties. Additionally, password authentication should be enforced for superusers.
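The points above can be sketched as follows; role, database, and table names are illustrative assumptions, and we hedge that `set_user()`/`reset_user()` are the entry points provided by the set_user extension:

```sql
-- Least-privilege grants for an application role (names are examples)
CREATE ROLE app_rw LOGIN PASSWORD 'change-me';
GRANT CONNECT ON DATABASE appdb TO app_rw;
GRANT SELECT, INSERT, UPDATE ON orders TO app_rw;

-- A DBA logs in unprivileged and escalates only when needed,
-- leaving an audit trail of the escalation
CREATE EXTENSION set_user;
SELECT set_user('postgres');  -- logged privilege escalation
-- ... maintenance tasks ...
SELECT reset_user();
```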
Q: How can you make audit logging in Postgres record DMLs while masking data content in these recorded SQLs?
To the best of our knowledge, there is currently no solution for applying query obfuscation to logs. Bind parameters are always included in both the auditing and logging of DMLs, and that is by design. If you would rather avoid logging bind parameters and only want to keep track of the statements executed, you can use the pg_stat_statements extension instead. Note that while pg_stat_statements provides overall statistics of the executed statements, it does not keep track of when each DML was executed.
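A short sketch of the pg_stat_statements approach: the view stores normalized statements with constants replaced by placeholders such as `$1`, so sensitive literal values never appear. It must first be preloaded (`shared_preload_libraries = 'pg_stat_statements'` in postgresql.conf, followed by a restart); the column names below match PostgreSQL 11-era releases:

```sql
CREATE EXTENSION pg_stat_statements;

-- top statements by cumulative execution time; literals are masked
SELECT query, calls, total_time
  FROM pg_stat_statements
 ORDER BY total_time DESC
 LIMIT 5;
```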
Q: How to set up database audit logging effectively when utilizing PgBouncer or Pgpool?
A key part of auditing is having separate user accounts in the database instead of a single, shared account. The connection to the database should be made by the appropriate user/application account. In pgBouncer we can have multiple pools for each of the user accounts. Every action by a connection from that pool will be audited against the corresponding user.
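As an illustrative pgbouncer.ini fragment (database, paths, and settings below are assumptions), each user authenticates with their own credentials through the pooler, so actions in PostgreSQL remain attributable to the right account:

```ini
; pgbouncer.ini sketch — names and values are examples only
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
```

Because connections are established per user/database pair, PgBouncer maintains a separate pool for each account, and server-side logging (e.g. log_connections) still sees the real user name.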
High Availability and replication
Q: Is there anything like Galera for PostgreSQL?
The Galera replication library provides support for multi-master, active-active MySQL clusters based on synchronous replication, such as Percona XtraDB Cluster. PostgreSQL does have support for synchronous replication, but it is limited to a single-active-master context.
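For reference, PostgreSQL's synchronous replication is enabled on the primary with a configuration fragment like the sketch below; the standby names are assumptions and must match the `application_name` each standby uses in its connection string (the `FIRST` syntax requires PostgreSQL 10 or later):

```
# postgresql.conf sketch on the primary — names are illustrative
synchronous_commit = on
synchronous_standby_names = 'FIRST 1 (standby1, standby2)'
```

With this setting, a commit waits until at least one of the listed standbys confirms receipt of the WAL, but there is still only one writable master.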
There are, however, clustering solutions for PostgreSQL that address similar business requirements or problem domains such as scalability and high availability (HA). We have presented one of them, Patroni, in our webinar; it focuses on HA and read scaling. For write scaling, there have long been sharding-based solutions, including Citus, and PostgreSQL 10 (and now 11!) brings substantial new features in the partitioning area. Finally, PostgreSQL-based solutions like Greenplum and Amazon Redshift address scalability for analytical processing, while TimescaleDB has been conceived to handle large volumes of time series data.
Q: Pgpool can load balance – what is the benefit of HAProxy over Pgpool?
No doubt Pgpool is feature rich, including load balancing besides connection pooling, among other functionality. It could be used in place of HAProxy and PgBouncer, yes. But features are just one of the criteria for selecting a solution. In our evaluation we gave more weight to lightweight, fast, and scalable solutions. HAProxy is well known for its lightweight connection routing capability without consuming much of the server's resources.
Q: How to combine PgBouncer and Pgpool together so that one can achieve transaction pooling + load balancing? Can you let me know between the two scaling solutions which one is better, PgBouncer or Pgpool-II?
It depends, and must be analyzed on a case-by-case basis. If what we really need is just a connection pooler, PgBouncer will be our first choice because it is more lightweight than Pgpool. PgBouncer is thread-based while Pgpool is process-based, like PostgreSQL itself, and forking the main process for each inbound connection is a somewhat expensive operation. PgBouncer is more effective on this front.
However, Pgpool's relative heaviness comes with a lot of features, including the capability to manage PostgreSQL replication and the ability to parse statements fired against PostgreSQL and redirect them to certain cluster nodes for load balancing. When your application cannot differentiate between read and write requests, Pgpool can parse the individual SQL statements and redirect writes to the master and reads to a standby replica, as configured in your Pgpool setup. The demo application we used in our webinar was able to distinguish reads from writes and use multiple connection strings accordingly, so we employed HAProxy on top of Patroni.
We have seen environments where Pgpool was used for its load balancing capabilities while connection pooling duties were left for PgBouncer, but this is not a great combination. As described above, HAProxy is more efficient than Pgpool as a load balancer.
Finally, as discussed in the webinar, any external connection pooler like PgBouncer is required only if there is no proper application-layer connection pooler, or if the application-layer connection pooler is not doing a great job of maintaining a proper connection pool, resulting in frequent connections and disconnections.
Q: Is it possible for Postgres to have a built-in connection pool worker? Maybe merge Pgbouncer into postgres core? That would make it much easier to use advanced authentication mechanisms (e.g. LDAP).
A great thought. That would indeed be a better approach in many aspects than employing an external connection pooler like PgBouncer. Recently there were discussions among PostgreSQL contributors on this topic, as seen here. A few sample patches have been submitted by hackers but nothing has been accepted yet. The PostgreSQL community is very keen to keep the server code lightweight and stable.
Q: Is rebooting the standby the only way to change master in PostgreSQL?
A standby-to-master promotion does not involve any restart.
From the user's perspective, a standby is promoted with the pg_ctl promote command or by creating a trigger file. During this operation, the replica stops recovery-related processing and becomes a read-write database.
Once we have a new master, all the other standby servers need to start replicating from it. This involves changes to the recovery.conf parameters and, yes, a restart: the restart happens only on the standby side when the current master has to be changed. PostgreSQL currently does not allow us to change this parameter using a SIGHUP.
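For illustration, repointing a standby at the new master involves a recovery.conf change like the sketch below (host name and replication user are assumptions), followed by a restart of that standby, since primary_conninfo was not reloadable in the PostgreSQL versions current at the time of writing:

```
# recovery.conf sketch on a standby being repointed after failover
standby_mode = 'on'
recovery_target_timeline = 'latest'
primary_conninfo = 'host=new-master.example.local port=5432 user=replicator'
```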
Q: Are external connection pooling solutions (PgBouncer, Pgpool) compatible with the Java Hibernate ORM?
External connection poolers like PgBouncer and Pgpool are compatible with regular PostgreSQL connections. So connections from Hibernate can treat PgBouncer as a regular PostgreSQL server running on a different port (or the same one, depending on how you configure it). An important point to remember is that they are complementary to connection pools that integrate well with ORM components. For example, c3p0 is a well-known connection pooler for Hibernate. If an ORM connection pooler can be tuned well enough to avoid frequent connections and disconnections, then external pooling solutions like PgBouncer or Pgpool become redundant and can/should be avoided.
Q: Question regarding connection pool: I want to understand if the connections are never closed or if there are any settings to force the closing of the connection after some time.
There is no need to close a connection if it can be reused (recycled) again and again instead of having a new one created. That is the very purpose of the connection pooler. When an application “closes” a connection, the connection pooler will virtually release the connection from the application and recover it back to the pool of connections. On the next connection request, instead of establishing a new connection to the database the connection pooler will pick a connection from the pool of connections and “lend” it to the application. Furthermore, most connection poolers include a parameter to control the release of connections after a specified idle time.
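In PgBouncer, for example, the idle-connection behavior described above is controlled by parameters like the following; the values shown are illustrative:

```ini
; pgbouncer.ini sketch — values are examples, not recommendations
; close server connections that have been unused this many seconds
server_idle_timeout = 600
; 0 disables dropping idle client connections; set > 0 to force it
client_idle_timeout = 0
```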
Q: Question regarding Patroni: can we select in the settings not to fail over automatically and only use Patroni for manual failover/failback?
Yes, Patroni allows users to pause its automation, leaving them to manually trigger operations such as a failover. The actual procedure would make an interesting blog post (we've put it on our to-do list).
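In brief, this is driven through patronictl; the cluster name and configuration path below are assumptions for illustration:

```shell
# Hedged sketch — "pg-cluster" and the config path are placeholders
patronictl -c /etc/patroni/patroni.yml pause pg-cluster     # disable automatic failover
patronictl -c /etc/patroni/patroni.yml failover pg-cluster  # trigger a manual failover
patronictl -c /etc/patroni/patroni.yml resume pg-cluster    # re-enable automation
```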
Q: Where should we install PgBouncer, Patroni and HAProxy to fulfill the three-layer format: web frontends, app backends and DB servers? What about etcd?
Patroni and etcd must be installed on the database servers. In fact, etcd can run on other servers as well, since the set of etcd instances simply forms the distributed consensus store. HAProxy and PgBouncer can be installed on the application servers for simplicity, or they can optionally run on dedicated servers, especially when you run a large number of them. Having said that, HAProxy is very lightweight and can be maintained on each application server without added impact. If you want to install PgBouncer on dedicated servers, just make sure to avoid a SPOF (single point of failure) by employing active-passive servers.
Q: How does HAProxy in your demo setup know how to route DML appropriately to the master and slaves (e.g. writes always go to the master and reads are load balanced between the replicas)?
HAProxy does not parse SQL statements in the intermediate layer in order to redirect them to the master or to one of the replicas accordingly—this must be done at the application level. In order to benefit from this traffic distribution, your application needs to send write requests to the appropriate HAproxy port; the same with read requests. In our demo setup, the application connected to two different ports, one for reads and another for writes (DML).
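The two-port arrangement can be sketched with an haproxy.cfg fragment like the one below. Server addresses and port numbers are assumptions; the health checks rely on Patroni's REST API (port 8008 by default), which answers 200 on /master only from the leader and on /replica only from healthy standbys:

```
# haproxy.cfg sketch — addresses and ports are illustrative
listen master
    bind *:5000
    option httpchk GET /master
    http-check expect status 200
    server pg1 10.0.0.1:5432 check port 8008
    server pg2 10.0.0.2:5432 check port 8008

listen replicas
    bind *:5001
    balance roundrobin
    option httpchk GET /replica
    http-check expect status 200
    server pg1 10.0.0.1:5432 check port 8008
    server pg2 10.0.0.2:5432 check port 8008
```

The application then sends writes to port 5000 and reads to port 5001; HAProxy never inspects the SQL itself.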
Q: How often does the cluster poll each node/slave? Is it tunable for poor performing networks?
Patroni uses an underlying distributed consensus mechanism for all heartbeat checks. For example, etcd, which can be used for this, has a default heartbeat interval of 100 ms, but it is adjustable. Apart from this, in every layer of the stack there are tunable TCP-like timeouts. For connection routing, HAProxy polls by making use of the Patroni API, which also allows further control over how the checks are done. Having said that, please keep in mind that poorly performing networks are often a bad choice for distributed services, with problems spanning beyond timeout checks.
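On the Patroni side, the main knobs are in the cluster configuration; the values below are the usual defaults, shown for illustration rather than as tuning advice. Raising them trades slower failure detection for more tolerance of slow links:

```yaml
# patroni.yml sketch — values are illustrative defaults
ttl: 30            # leader key time-to-live in the consensus store
loop_wait: 10      # seconds between Patroni heartbeat loops
retry_timeout: 10  # retry window for DCS/PostgreSQL operations
```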
Miscellaneous
Q: Hi Avinash/Nando/Jobin, maybe I wasn't able to catch up with DDLs, but what's the best way to handle DDLs? In MySQL, we can use pt-online-schema-change and avoid large replica lag; is there a way to achieve the same in PostgreSQL without blocking/downtime, or does Percona have an equivalent tool for PostgreSQL? Looking forward to this!
Currently, PostgreSQL locks tables for DDLs, though some DDLs, such as creating triggers and indexes, do not block every activity on the table. There isn't a tool like pt-online-schema-change for PostgreSQL yet. There is, however, an extension called pg_repack, which assists in rebuilding a table online. Additionally, adding the keyword CONCURRENTLY to a CREATE INDEX statement makes it gentle on the system and allows concurrent DMLs and queries to happen while the index is being built. Suppose you want to rebuild the index behind a primary key or unique key: a new index can be created independently, and the index behind the key can then be replaced with only a momentary lock, which may be nearly seamless.
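The index-swap technique just described can be sketched as follows; table and constraint names are assumptions, and note that dropping a primary key referenced by foreign keys needs extra care:

```sql
-- Build the replacement index online; concurrent DML keeps running
CREATE UNIQUE INDEX CONCURRENTLY orders_pkey_new ON orders (id);

-- Swap it in behind the constraint; only this short transaction
-- takes a blocking lock on the table
BEGIN;
ALTER TABLE orders DROP CONSTRAINT orders_pkey;
ALTER TABLE orders ADD CONSTRAINT orders_pkey
    PRIMARY KEY USING INDEX orders_pkey_new;
COMMIT;
```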
A lot of new features are added in this space with each new release. One of the extreme cases of extended locking is adding a NOT NULL column with a DEFAULT value to a table. In most database systems this operation holds a write lock on the table until it completes. The just-released PostgreSQL 11 makes it a brief operation irrespective of the size of the table, achieved with a simple metadata change rather than a complete table rebuild. Since it does not result in a table rewrite, excessive I/O and side effects like replication lag are avoided. As PostgreSQL continues to get better at handling DDLs, the scope for external tools keeps shrinking.
Q: What are the actions that can be performed by the parallelization option in PostgreSQL ?
This is the area where PostgreSQL has improved significantly in the last few versions. The answer, then, depends on which version you are using. Parallelization was introduced in PostgreSQL 9.6, with more capabilities added in version 10. As of version 11, pretty much everything can make use of parallelization, including index building. The more CPU cores your server has at its disposal, the more you will benefit from the latest versions of PostgreSQL, provided it is properly tuned for parallel execution.
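A quick way to check whether a given query benefits is to inspect its plan; the table name below is an assumption, and the setting shown is a session-level knob, not server-wide tuning advice:

```sql
-- Allow up to 4 parallel workers per Gather node in this session
SET max_parallel_workers_per_gather = 4;

-- Look for "Gather" and "Workers Planned/Launched" in the output
EXPLAIN (ANALYZE) SELECT count(*) FROM big_table;
```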
Q: is there any flashback query or flashback database option in PostgreSQL ?
If flashback queries are an application requirement, please consider using temporal tables to better visualize data from a specific time or period. If the application is handling time series data (like IoT devices), then TimescaleDB may be an interesting option for you.
Flashback of the database can be achieved in multiple ways, either with the help of backup tools (and point-in-time recovery) or using a delayed standby replica.
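The delayed-standby approach can be sketched with a recovery.conf fragment (PostgreSQL 11 and earlier); host name, user, and delay value are illustrative assumptions:

```
# recovery.conf sketch on a deliberately delayed standby
standby_mode = 'on'
primary_conninfo = 'host=primary.example.local user=replicator'
recovery_min_apply_delay = '4h'
```

The standby stays roughly four hours behind the primary, leaving a window in which replay can be paused before an accidental change is applied.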
Q: Question regarding pg_repack: we have attempted running pg_repack and for some reason it kept running forever; can we simply cancel/abort its execution ?
Yes, the execution of pg_repack can be aborted without prejudice. This is safe to do because the tool creates an auxiliary table and uses it to rearrange the data, swapping it with the original table at the end of the process. If its execution is interrupted before it completes, the swapping of tables just doesn’t take place. However, since it works online and doesn’t hold an exclusive lock on the target table, depending on its size and the changes made on the target table during the process, it might take considerable time to complete. Please explore the parallel feature available with pg_repack.
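For reference, the parallel feature is driven by the `--jobs` option; the table and database names below are assumptions for illustration:

```shell
# Hedged sketch — rebuild one table using 4 parallel index-build jobs
pg_repack --jobs=4 --table=orders appdb
```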
Q: Will the monitoring tool from Percona be open source ?
Percona Monitoring and Management (PMM) has been released already as an open source project with its source code being available at GitHub.
Q: It’s unfortunate that the Master/Slave terminology is still used on the slides. Why not use leader/follower or orchestrator node/node instead?
We agree with you, particularly regarding the reference on “slave” – “replica” is a more generally accepted term (for good reason), with “standby” [server|replica] being more commonly used with PostgreSQL.
Patroni usually employs the terms “leader” and “followers”.
The use of “cluster” (and thus “node”) in PostgreSQL, however, contrasts with what is usually the norm (when we think about traditional beowulf clusters, or even Galera and Patroni) as it denotes the set of databases running on a single PostgreSQL instance/server.