Presto Infrastructure at Lyft
(A repost of an article on Lyft's Presto platform engineering practice.)
Overview
Early in 2017 we started exploring Presto for OLAP use cases and realized the potential of this amazing query engine. It started as an ad-hoc querying tool for data engineers and analysts to prototype their SQL faster than they could with Apache Hive. Back then, a lot of internal dashboards were powered by AWS Redshift, which coupled data storage and compute together. Our data was growing exponentially (doubling every few days), which required frequent storage scaling as well. With storage coupled to compute, any maintenance or upgrade required downtime, and scaling out nodes made querying extremely slow (as massive amounts of data moved across nodes). We needed a system where data and compute were decoupled, and that is where Presto fit nicely into our use case. A pipeline to store event data in Parquet format had already been set up, with the data accessed via Hive; adding Presto to this stack was the icing on the cake.
Presto infra stack
Now thousands of dashboards are powered by Presto, and about 1.5K weekly active users run a couple of million queries every month on this platform. As of today we have 60 PB of queryable event data stored in an S3-based data lake, and about 10 PB of raw data is scanned every day using Presto. The following charts show the timeline of Presto usage growth.
Monthly query volume growth
Daily input data processed
The chart above shows raw data scan volume per day; daily raw data scan volume has grown 4x in the last four months.
Presto Clients
Lyft users consume data using query tools such as Apache Superset, Mode, Looker, Tableau, Jupyter notebooks, and some internal ML tools that build and improve application-specific machine learning models. All of these tools connect to Presto via Presto Gateway, which load-balances the queries they send uniformly across multiple Presto clusters. The gateway also lets us perform zero-downtime upgrades that are transparent to the users and applications querying through these clients.
Setup: build, release and deployments
We have forked prestosql/presto into lyft/presto and created a lyft-stable branch, adding additional cherry-picks as needed. We have a private repository where we use Saltstack and aws-orca scripts to roll out deployments to our environments. In this private branch we have added dependencies specific to our environment, along with integration tests that run on every update or pull request. We have dockerized this repo and use the Docker container as a development environment. On every new commit, we trigger the integration tests against the dev environment through Jenkins.
Here are the various components we use while releasing Presto to production.
- Lyft’s Presto query-log plugin: We have added a query logging and blocking plugin based on Presto’s EventListener framework. We use the EventListener to intercept each new query as it lands and block the few types of queries that we deem harmful to the system (e.g., some tools aggressively cache column names and query system.jdbc.columns or catalog.information_schema.columns, putting additional load on the system). We log query stats in both the queryCreated and queryCompleted events, which helps us analyze success rates, latencies, and the various types of failures and their root causes. A sketch of this approach appears after this list.
- Presto UDFs: We let users add custom UDFs when no existing Presto function addresses their use case; some users have added custom geo functions, for example. A minimal example follows this list.
- Python-based stats collection: We use the datadog-agent library and have added multiple checks that collect system/JVM/process metrics and push them to statsd. A statsd collector forwards the metrics on to Wavefront, where we have set up various alerts, and we get paged through PagerDuty when a threshold is breached.
- Test suites: Before we roll a new Presto version out to production, we run the following types of tests to ensure the quality of each release candidate.
a. Integration test suite: Runs table-CRUD and UDF tests against each build, launching the devbox Docker container in a Jenkins build task.
b. Replayer test suite: After rolling a release candidate out to pre-prod, we launch a day-long test that replays an older day's worth of queries the same way they executed in the past. We feed the query log in through our event-logging pipeline and compare the performance with the previously logged data.
c. Verifier test suite: Similar to the replayer, we use presto-verifier to run a static set of queries against the old and new versions. We move to the new version only if the performance results have not degraded. A sample verifier configuration appears after this list.
- PrestoInfra: A collection of configurations and scripts to build and release deployable artifacts, plus Saltstack scripts to roll out environment-specific deployments.
- PrestoProxy: A customized presto-gateway with Lyft-specific routing rules and overrides.
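To make the query-log plugin concrete, here is a minimal EventListener sketch against the prestosql SPI. The class name, the harmful-query pattern, and the logging calls are stand-ins for Lyft's internal implementation, and the listener would be registered through an EventListenerFactory in a Presto Plugin.

```java
import java.util.regex.Pattern;

import io.prestosql.spi.eventlistener.EventListener;
import io.prestosql.spi.eventlistener.QueryCompletedEvent;
import io.prestosql.spi.eventlistener.QueryCreatedEvent;

// Illustrative sketch of a query-log/blocking listener; not Lyft's actual code.
public class QueryLogListener implements EventListener
{
    // Metadata queries that some BI tools issue aggressively to cache
    // column names, overloading the coordinator.
    private static final Pattern HARMFUL = Pattern.compile(
            "system\\.jdbc\\.columns|information_schema\\.columns");

    @Override
    public void queryCreated(QueryCreatedEvent event)
    {
        String queryId = event.getMetadata().getQueryId();
        String sql = event.getMetadata().getQuery();
        if (HARMFUL.matcher(sql).find()) {
            // Flag the query as a blocking candidate; how blocking is
            // enforced is internal to Lyft's plugin.
            System.err.printf("blocked-category query: %s%n", queryId);
        }
        // Creation-time stats: query id, user, source, raw SQL, etc.
        System.out.printf("queryCreated %s user=%s%n",
                queryId, event.getContext().getUser());
    }

    @Override
    public void queryCompleted(QueryCompletedEvent event)
    {
        // Completion stats feed success-rate, latency, and failure analysis.
        System.out.printf("queryCompleted %s wallMs=%d failed=%b%n",
                event.getMetadata().getQueryId(),
                event.getStatistics().getWallTime().toMillis(),
                event.getFailureInfo().isPresent());
    }
}
```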
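Custom UDFs plug in through the same SPI. Below is a minimal scalar-function sketch; haversine_km is a hypothetical geo-style example rather than one of the actual Lyft UDFs, and the class would be exposed via a Plugin's getFunctions().

```java
import io.prestosql.spi.function.Description;
import io.prestosql.spi.function.ScalarFunction;
import io.prestosql.spi.function.SqlType;
import io.prestosql.spi.type.StandardTypes;

public final class GeoFunctions
{
    private GeoFunctions() {}

    // Usage in SQL: SELECT haversine_km(37.77, -122.42, 34.05, -118.24)
    @ScalarFunction("haversine_km")
    @Description("Great-circle distance between two lat/lng points in kilometers")
    @SqlType(StandardTypes.DOUBLE)
    public static double haversineKm(
            @SqlType(StandardTypes.DOUBLE) double lat1,
            @SqlType(StandardTypes.DOUBLE) double lng1,
            @SqlType(StandardTypes.DOUBLE) double lat2,
            @SqlType(StandardTypes.DOUBLE) double lng2)
    {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLng = Math.toRadians(lng2 - lng1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLng / 2), 2);
        return 2 * 6371.0 * Math.asin(Math.sqrt(a)); // Earth radius ~6371 km
    }
}
```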
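The verifier suite, in turn, is driven by a small properties file along the following lines (hosts, database, and suite name are placeholders; see the Presto verifier documentation for the full option set):

```properties
suite=release-candidate
query-database=jdbc:mysql://verifier-db:3306/presto_verifier?user=verifier&password=secret
control.gateway=jdbc:presto://presto-current.example.com:8080
test.gateway=jdbc:presto://presto-candidate.example.com:8080
thread-count=4
```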
Presto production environment and configurations
We run multiple Presto clusters, each with 100+ worker nodes, sharing load through presto-gateway. The node specifications are as follows:
Presto cluster configuration
Presto configurations
On each cluster, we have set a maximum query concurrency of 25 and a maximum query queue size of 200, with each user allowed at most 4 running and 20 queued queries. Each query has a 30-minute max run time limit and a 10-minute max execution time limit. We rotate the entire cluster once a day to avoid old-gen GC buildup. We let each query scan up to 1 TB, with each worker node loading up to 10 GB of data. Below are the exact configurations to achieve these settings.
Presto server properties
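The limits above map onto Presto's standard configuration surface roughly as follows. This is a hedged sketch, not our literal production file; the per-query scan caps are enforced through cluster memory and resource limits whose exact property names vary by version.

```properties
# etc/config.properties (excerpt)
query.max-run-time=30m
query.max-execution-time=10m

# etc/resource-groups.properties
resource-groups.configuration-manager=file
resource-groups.config-file=etc/resource_groups.json
```

The concurrency and queueing limits live in the resource-group definition:

```json
{
  "rootGroups": [
    {
      "name": "global",
      "softMemoryLimit": "100%",
      "hardConcurrencyLimit": 25,
      "maxQueued": 200,
      "subGroups": [
        {
          "name": "${USER}",
          "softMemoryLimit": "10%",
          "hardConcurrencyLimit": 4,
          "maxQueued": 20
        }
      ]
    }
  ],
  "selectors": [
    {
      "group": "global.${USER}"
    }
  ]
}
```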
In Presto Gateway, each coordinator can be assigned a routing group, and setting the X-Presto-Routing-Group header routes a request to one of the clusters in that routing group. We have one cluster that runs with extended memory limits, assigned the nolimit routing group. A user adds the comment `-- higherlimit` to a query as a hint that it is resource-heavy, and presto-gateway routes it to the cluster with the higher resource limits.
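For example, a resource-heavy query hinted toward the nolimit cluster would look like this (the table name is hypothetical):

```sql
-- higherlimit
SELECT origin_region, count(*)
FROM ride_events
GROUP BY 1;
```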
JVM Configuration
We run Java 11 on Presto nodes. Java 11 performs the full (old generation) GC phase in parallel, reducing collection time significantly.
Following is the JVM configuration we run Presto processes with. With the Java 11 upgrade we brought old-gen GC pauses down to the order of a couple of seconds, whereas on Java 8 they peaked on the order of 400 seconds, crippling the service every once in a while.
Presto JVM config salt template
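A minimal sketch of a salt/Jinja-templated jvm.config, combining the templated heap size with the stock flags recommended in the Presto deployment docs (an assumption, not necessarily Lyft's exact flag list):

```
-server
-Xmx{{ max_heap_size_mb }}m
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
```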
The value of max_heap_size_mb is 0.6 * node_memory_mb, which comes to 114 GB on our worker nodes.
Presto node recycle
Despite all the optimizations, Presto consumes host resources over time, and experience has taught us that the longer the service runs, the slower it becomes. Old-gen GC pause time increases as more and more queries execute on the cluster, and we have noticed that a fresh cluster yields better query performance than an old one. With that in mind, we designed the infrastructure to recycle all the nodes in each cluster once every 24 hours.
We use PrestoGateway’s deactivate API to disable a cluster 30 minutes before its scheduled shutdown time and the activate API to re-enable it 30 minutes after its scheduled start time, driven by a cron, giving us no-downtime maintenance. Since we set a 30-minute max run time per query, no query is lost during these operations. We use aws-orca scripts to trigger AWS ScheduledActions that bring an entire Presto cluster up or down on a given schedule.
This is the salt script for shutting down a cluster at 5:20 PM and bringing it back up at 7 PM Pacific Time.
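aws-orca is a Lyft-internal tool, but the underlying mechanism is a plain AWS Auto Scaling scheduled action. An equivalent can be sketched with the AWS CLI; the group name and sizes are placeholders, and the UTC recurrence times assume Pacific daylight time:

```sh
# Daily recycle: scale the worker fleet down at 5:20 PM PT (00:20 UTC)
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name presto-adhoc-worker-asg \
  --scheduled-action-name daily-recycle-down \
  --recurrence "20 0 * * *" \
  --min-size 0 --max-size 0 --desired-capacity 0

# ...and bring it back at 7:00 PM PT (02:00 UTC)
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name presto-adhoc-worker-asg \
  --scheduled-action-name daily-recycle-up \
  --recurrence "0 2 * * *" \
  --min-size 100 --max-size 100 --desired-capacity 100
```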
ASG scheduled action based presto scaling
Presto Gateway
Presto-Gateway is a stateful load-balancer, proxy, and router for multiple Presto clusters; it provides transparent access to the underlying Presto backends without changing the protocol. It started as a proxy service to expose Presto coordinators to external BI tools like Mode and Looker in a secure way. As query volume increased, we faced more frequent outages, since a single cluster could not handle the load, and we soon realized we needed a multi-cluster setup. At first we dedicated one cluster to each such query tool. Though this reduced the frequency of outages and incidents, we still had not achieved zero-downtime maintenance, and we noticed that one cluster could sit relatively idle while another suffered from heavy queuing. At that point we decided to implement a true load-balancing proxy router. Because the BI tools were external, they accessed the Presto infrastructure via agents deployed in our network, and these agents were unable to follow HTTP redirects, so we needed to implement a true proxy router and load-balancer.
We used JettyProxy, a proxy server that allows binding custom filters and plugging in proxy handlers. We implemented the routing rules in a ProxyHandler, which lets us intercept HTTP requests and responses, inspect the query, source, headers, etc., and based on that choose a backend host to serve the request. Clients send two types of requests: (1) a new-query submission request, and (2) follow-up requests for a previously submitted query. The PrestoGateway proxy handler caches the query-id-to-backend mapping so that it can stick follow-up requests to the backend that ran the original query.
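In miniature, the sticky-routing idea looks like the sketch below. This is illustrative Java, not the actual presto-gateway classes; the real ProxyHandler hooks these two steps into JettyProxy's request and response callbacks.

```java
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

// Illustrative sketch of sticky query routing in a gateway proxy handler.
public class StickyRouter
{
    // Follow-up requests carry the query id in the path, e.g.
    // /v1/statement/20191107_abcdef_00001_xyz/1
    private static final Pattern QUERY_ID =
            Pattern.compile("/v1/statement/([^/]+)");

    private final Cache<String, String> queryIdToBackend =
            CacheBuilder.newBuilder()
                    .expireAfterWrite(1, TimeUnit.HOURS)
                    .build();

    // Called when a backend answers a new-query POST: remember who owns it.
    public void recordBackend(String queryId, String backendUrl)
    {
        queryIdToBackend.put(queryId, backendUrl);
    }

    // Called for every incoming request: stick follow-ups to the owning
    // backend, otherwise fall back to normal routing-group selection.
    public String pickBackend(String requestPath, String defaultBackend)
    {
        Matcher m = QUERY_ID.matcher(requestPath);
        if (m.find()) {
            String cached = queryIdToBackend.getIfPresent(m.group(1));
            if (cached != null) {
                return cached;
            }
        }
        return defaultBackend;
    }
}
```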
Presto Gateway has 3 components:
BaseApp: Provides boilerplate code to add/remove pluggable components through YAML configuration, with a built-in metrics registry module that makes it easy to emit custom metrics from apps built on top of it.
BaseApp class diagram
ProxyServer: A library built on top of Jetty proxy that provides a proxy server implementation with a pluggable proxy handler.
ProxyServer class diagram
Gateway: Acts as a container for the proxy server and plugs in ProxyHandlers to provide the proxy, routing, and load-balancing functionality. It also exposes a few endpoints and a UI to activate/deactivate backends, along with a query-history view of recently submitted queries.
Presto Gateway UI
We use Lombok to cut down boilerplate such as getters/setters/loggers/equals, speeding up development, and the app is built on the Dropwizard framework.
Cost aware scaling to meet query demand
We started with one Presto cluster, and as usage grew, we kept adding worker nodes to support the higher compute demand. As we added worker nodes, the query concurrency setting had to be raised to utilize the cluster's full potential, and that required a presto-coordinator restart, causing downtime. After introducing presto-gateway, we moved to a multi-cluster mode with the gateway as the load-balancer. Usage of the Presto infrastructure is not uniform throughout the day and our workloads are bursty, so we over-provision the infrastructure so that a burst of queries can be absorbed gracefully.
Query volume for a week (granularity — 1 hour)
To optimize cost, we implemented dynamic scaling. Looking at the incoming query volume, we noticed that usage is usually higher during business hours. With the backend activate/deactivate APIs in the gateway, we could already perform no-downtime upgrades, deployments, and maintenance; we took this to the next level and added scheduled downtime for half of the clusters during non-business hours. Thirty minutes before triggering a shutdown, we disable the cluster through the Gateway APIs, and since we set a 30-minute max run time for any query, no query is penalized in the process. The following chart shows how the number of nodes varies with time.
Total number of nodes over a week time
Cutting the infrastructure by 50% during non-business hours resulted in roughly 30% cost savings overall.
Google sheets connector plugin
This connector allows Presto to read data from a Google Sheet, so that small dimensional data can be pulled into a query.
Querying google sheet as table in Presto
We announced this feature at Presto Summit 2019 and contributed it back to open source.
Limitations
Currently the gsheets connector plugin has the following restrictions:
- If a sheet is not public, it must be shared with the service account user with at least view access.
- The first line of a sheet is always treated as the header, containing all the column names.
- All columns are resolved with type VARCHAR.
- The Google Sheets API rate limit of 100 calls per 100 seconds (unlimited for the day) applies if the Google project account does not have billing enabled. To avoid hitting it, users can choose a higher value for the cache config property sheets-data-expire-after-write in the presto-gsheets configuration.
Setup instructions
Following are the steps to set up the Presto Google Sheets plugin.
Step 1.
To use the connector, generate a service account JSON credential in the GSuite console after creating a project and granting read access to the Google Sheets API. Store this credential file and reference it in sheets.properties as credentials-path. The service account user id is in the credentials file; every Google Sheet that Presto should be able to read must be shared with this service account user.
Step 2.
Now create a metadata sheet to hold the table-name-to-sheet-id mapping. Share this sheet with the service account user and note down its sheet id, then add that id to sheets.properties as the metadata-sheet-id config property.
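After these two steps, the catalog configuration ends up looking roughly like this (ids and paths are placeholders; the two cache properties are optional tuning knobs):

```properties
# etc/catalog/sheets.properties
connector.name=gsheets
credentials-path=/etc/presto/gsheets-service-account.json
metadata-sheet-id=1AbCdEfGhIjKlMnOpQrStUvWxYz0123456789#Sheet1
sheets-data-max-cache-size=1000
sheets-data-expire-after-write=5m
```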
Step 3.
Now add the table name and the corresponding sheet id (after sharing that sheet with the service account user) to the metadata sheet, and you should be able to query the table in Presto.
Metadata sheet
Sample table
Step 4.
Happy querying!
Show tables
Desc table
Selecting from table
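In SQL terms, the screenshots above correspond to something like the following, assuming the catalog is named sheets and a table was registered as my_table in the metadata sheet:

```sql
SHOW TABLES FROM sheets.default;
DESCRIBE sheets.default.my_table;
SELECT * FROM sheets.default.my_table LIMIT 10;
```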
This feature is currently in beta at Lyft and has become quite popular since we rolled it out; a couple of hundred users are already running thousands of queries every day against 50+ Google Sheets as tables.
Google sheets as table usage — each color band is query volume per user
Superset Presto Integration Improvements
There are many challenges users face when developing a SQL-based workflow or pipeline. Query browsers surface errors and suggestions for syntactic mistakes (bad SQL or wrong UDF usage) only after executing the query, so users resolve such errors through multiple slow iterations, which degrades the user experience. We implemented Explain Type (Validate) queries in Presto and send these explain queries as the user types in SQL Lab in Superset, ahead of actual query execution; the returned explain plan captures syntactic errors along with the validity of columns, tables, and UDF signatures. This performs deep query validation without actually executing the whole query, improving the overall querying experience by eliminating the debugging time involved in writing complex SQL queries.
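For instance, as the user types, Superset can fire a validation query like the one below (table and column names are hypothetical); it fails fast on bad names or UDF signatures without scanning any data:

```sql
EXPLAIN (TYPE VALIDATE)
SELECT ride_id, fare_amount
FROM rides
WHERE ds = '2019-11-01';
```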
Table name validation while writing presto query
Column validation while writing the query
UDF validation while writing the query
Apache Superset — pre-execution deep query validation. We have contributed these features back to open source.
Summary
It is important for all teams at Lyft to make data-driven decisions, and it has been the Data Platform team's mission to put data at the heart of all decisions made at Lyft. Presto is a key component in achieving this mission. Over the last couple of years we have focused primarily on scaling the infrastructure, reducing data-arrival latency, and improving the user querying experience, while staying cost-effective at the same time.
A huge thanks to the Data Platform Infra team who helped in improving, scaling and supporting Presto Infrastructure at Lyft.
Also, thanks to the rest of the PrestoSQL open-source community for helping and supporting us through this development. Please check this page to learn how to get involved in the project.
The Data Platform team is actively looking for talented and driven engineers, data engineers, scientists, product managers, and marketers to join our team. Learn about life at Lyft and visit the careers section for the latest openings.