Presto Infrastructure at Lyft
(A repost of an article on Lyft's Presto platform engineering practice.)
Overview
Early in 2017 we started exploring Presto for OLAP use cases and realized the potential of this amazing query engine. It started as an ad-hoc querying tool for data engineers and analysts to prototype their SQL faster than they could with Apache Hive. Back then, a lot of internal dashboards were powered by AWS Redshift, which coupled data storage and compute together. Our data was growing exponentially (doubling every few days), which required frequent storage scaling as well. With storage coupled to compute, any maintenance or upgrade required downtime, and scaling out nodes made querying extremely slow (as massive amounts of data moved across nodes). We needed a system where data and compute were decoupled, and that is where Presto fit nicely into our use case. A pipeline to store event data in Parquet format had already been set up, with the data accessed via Hive; adding Presto to this stack was the icing on the cake.
Presto infra stack
Now thousands of dashboards are powered by Presto, and about 1.5K weekly active users run a couple of million queries every month on this platform. As of today we have 60 PB of queryable event data stored in an S3-based data lake, and about 10 PB of raw data is scanned every day using Presto. The following charts show the timeline of Presto usage growth.
Monthly query volume growth
Daily input data processed
The chart above shows raw data scan volume per day; daily raw data scan volume has grown 4x in the last four months.
Presto Clients
Lyft users consume data using query tools such as Apache Superset, Mode, Looker, Tableau, Jupyter notebooks, and some internal ML tools that build and improve application-specific machine learning models. All of these tools connect to Presto via Presto Gateway, which load-balances the queries they send uniformly across multiple Presto clusters. The gateway also lets us perform zero-downtime upgrades that are transparent to the users and applications querying through these clients.
Setup: build, release and deployments
We have forked prestosql/presto into lyft/presto and created a lyft-stable branch, adding additional cherry-picks as needed. We have a private repository where we use Saltstack and aws-orca scripts to roll out deployments to our environments. In this private branch we have added dependencies specific to our environment, along with integration tests that run on every update or pull request. We have dockerized this repo and use the Docker container as a development environment. On every new commit, we trigger the integration tests against the dev environment through Jenkins.
Here are the various components we use while releasing Presto to production.
- Lyft’s Presto query-log plugin: We have added a query logging and blocking plugin based on Presto’s EventListener framework. We use the EventListener to intercept each new query as it lands and block the few types of queries that we deem harmful to the system (e.g., some tools aggressively cache column names and query system.jdbc.columns or catalog.information_schema.columns, putting additional load on the system). We log query stats in both the queryCreated and queryCompleted events, which helps us analyze success rates, latencies, and the various types of failures and their root causes. A sketch of this approach appears after this list.
- Presto UDFs: We let users add custom UDFs when no existing Presto function addresses their use case; some users have added custom geo functions, for example. A minimal example follows this list.
- Python-based stats collection: We use the datadog-agent library and have added multiple checks that collect system/JVM/process metrics and push them to statsd. A statsd collector forwards the metrics on to Wavefront, where we have set up various alerts, and we get paged through PagerDuty when a threshold is breached.
- Test suites: Before we roll a new Presto version out to production, we run the following types of tests to ensure the quality of each release candidate.
a. Integration test suite: Runs table-CRUD and UDF tests against each build, launching the devbox Docker container in a Jenkins build task.
b. Replayer test suite: After rolling a release candidate out to pre-prod, we launch a day-long test that replays an older day's worth of queries the same way they executed in the past. We feed the query log in through our event-logging pipeline and compare the performance with the previously logged data.
c. Verifier test suite: Similar to the replayer, we use presto-verifier to run a static set of queries against the old and new versions. We move to the new version only if the performance results have not degraded. A sample verifier configuration appears after this list.
- PrestoInfra: A collection of configurations and scripts to build and release deployable artifacts, plus Saltstack scripts to roll out environment-specific deployments.
- PrestoProxy: A customized presto-gateway with Lyft-specific routing rules and overrides.
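To make the query-log plugin concrete, here is a minimal EventListener sketch against the prestosql SPI. The class name, the harmful-query pattern, and the logging calls are stand-ins for Lyft's internal implementation, and the listener would be registered through an EventListenerFactory in a Presto Plugin.

```java
import java.util.regex.Pattern;

import io.prestosql.spi.eventlistener.EventListener;
import io.prestosql.spi.eventlistener.QueryCompletedEvent;
import io.prestosql.spi.eventlistener.QueryCreatedEvent;

// Illustrative sketch of a query-log/blocking listener; not Lyft's actual code.
public class QueryLogListener implements EventListener
{
    // Metadata queries that some BI tools issue aggressively to cache
    // column names, overloading the coordinator.
    private static final Pattern HARMFUL = Pattern.compile(
            "system\\.jdbc\\.columns|information_schema\\.columns");

    @Override
    public void queryCreated(QueryCreatedEvent event)
    {
        String queryId = event.getMetadata().getQueryId();
        String sql = event.getMetadata().getQuery();
        if (HARMFUL.matcher(sql).find()) {
            // Flag the query as a blocking candidate; how blocking is
            // enforced is internal to Lyft's plugin.
            System.err.printf("blocked-category query: %s%n", queryId);
        }
        // Creation-time stats: query id, user, source, raw SQL, etc.
        System.out.printf("queryCreated %s user=%s%n",
                queryId, event.getContext().getUser());
    }

    @Override
    public void queryCompleted(QueryCompletedEvent event)
    {
        // Completion stats feed success-rate, latency, and failure analysis.
        System.out.printf("queryCompleted %s wallMs=%d failed=%b%n",
                event.getMetadata().getQueryId(),
                event.getStatistics().getWallTime().toMillis(),
                event.getFailureInfo().isPresent());
    }
}
```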
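Custom UDFs plug in through the same SPI. Below is a minimal scalar-function sketch; haversine_km is a hypothetical geo-style example rather than one of the actual Lyft UDFs, and the class would be exposed via a Plugin's getFunctions().

```java
import io.prestosql.spi.function.Description;
import io.prestosql.spi.function.ScalarFunction;
import io.prestosql.spi.function.SqlType;
import io.prestosql.spi.type.StandardTypes;

public final class GeoFunctions
{
    private GeoFunctions() {}

    // Usage in SQL: SELECT haversine_km(37.77, -122.42, 34.05, -118.24)
    @ScalarFunction("haversine_km")
    @Description("Great-circle distance between two lat/lng points in kilometers")
    @SqlType(StandardTypes.DOUBLE)
    public static double haversineKm(
            @SqlType(StandardTypes.DOUBLE) double lat1,
            @SqlType(StandardTypes.DOUBLE) double lng1,
            @SqlType(StandardTypes.DOUBLE) double lat2,
            @SqlType(StandardTypes.DOUBLE) double lng2)
    {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLng = Math.toRadians(lng2 - lng1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLng / 2), 2);
        return 2 * 6371.0 * Math.asin(Math.sqrt(a)); // Earth radius ~6371 km
    }
}
```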
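The verifier suite, in turn, is driven by a small properties file along the following lines (hosts, database, and suite name are placeholders; see the Presto verifier documentation for the full option set):

```properties
suite=release-candidate
query-database=jdbc:mysql://verifier-db:3306/presto_verifier?user=verifier&password=secret
control.gateway=jdbc:presto://presto-current.example.com:8080
test.gateway=jdbc:presto://presto-candidate.example.com:8080
thread-count=4
```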
Presto production environment and configurations
We run multiple Presto clusters, each with 100+ worker nodes, sharing load through presto-gateway. The node specifications are as follows:
Presto cluster configuration
Presto configurations
On each cluster, we have set a maximum query concurrency of 25 and a maximum query queue size of 200, with each user allowed at most 4 running and 20 queued queries. Each query has a 30-minute max run time limit and a 10-minute max execution time limit. We rotate the entire cluster once a day to avoid old-gen GC buildup. We let each query scan up to 1 TB, with each worker node loading up to 10 GB of data. Below are the exact configurations to achieve these settings.
Presto server properties
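The limits above map onto Presto's standard configuration surface roughly as follows. This is a hedged sketch, not our literal production file; the per-query scan caps are enforced through cluster memory and resource limits whose exact property names vary by version.

```properties
# etc/config.properties (excerpt)
query.max-run-time=30m
query.max-execution-time=10m

# etc/resource-groups.properties
resource-groups.configuration-manager=file
resource-groups.config-file=etc/resource_groups.json
```

The concurrency and queueing limits live in the resource-group definition:

```json
{
  "rootGroups": [
    {
      "name": "global",
      "softMemoryLimit": "100%",
      "hardConcurrencyLimit": 25,
      "maxQueued": 200,
      "subGroups": [
        {
          "name": "${USER}",
          "softMemoryLimit": "10%",
          "hardConcurrencyLimit": 4,
          "maxQueued": 20
        }
      ]
    }
  ],
  "selectors": [
    {
      "group": "global.${USER}"
    }
  ]
}
```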
In Presto Gateway, each coordinator can be assigned a routing group, and setting the X-Presto-Routing-Group header routes a request to one of the clusters in that routing group. We have one cluster that runs with extended memory limits, assigned the nolimit routing group. A user adds the comment `-- higherlimit` to a query as a hint that it is resource-heavy, and presto-gateway routes it to the cluster with the higher resource limits.
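For example, a resource-heavy query hinted toward the nolimit cluster would look like this (the table name is hypothetical):

```sql
-- higherlimit
SELECT origin_region, count(*)
FROM ride_events
GROUP BY 1;
```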
JVM Configuration
We run Java 11 on Presto nodes. Java 11 performs the full (old generation) GC phase in parallel, reducing collection time significantly.
Following is the JVM configuration we run Presto processes with. With the Java 11 upgrade we brought old-gen GC pauses down to the order of a couple of seconds, whereas on Java 8 they peaked on the order of 400 seconds, crippling the service every once in a while.
Presto JVM config salt template
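A minimal sketch of a salt/Jinja-templated jvm.config, combining the templated heap size with the stock flags recommended in the Presto deployment docs (an assumption, not necessarily Lyft's exact flag list):

```
-server
-Xmx{{ max_heap_size_mb }}m
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
```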
The value of max_heap_size_mb is 0.6 * node_memory_mb, which comes to 114 GB on our worker nodes.
Presto node recycle
Despite all the optimizations, Presto consumes host resources over time, and experience has taught us that the longer the service runs, the slower it becomes. Old-gen GC pause time increases as more and more queries execute on the cluster, and we have noticed that a fresh cluster yields better query performance than an old one. With that in mind, we designed the infrastructure to recycle all the nodes in each cluster once every 24 hours.
We use PrestoGateway’s deactivate API to disable a cluster 30 minutes before its scheduled shutdown time and the activate API to re-enable it 30 minutes after its scheduled start time, driven by a cron, giving us no-downtime maintenance. Since we set a 30-minute max run time per query, no query is lost during these operations. We use aws-orca scripts to trigger AWS ScheduledActions that bring an entire Presto cluster up or down on a given schedule.
This is the salt script for shutting down a cluster at 5:20 PM and bringing it back up at 7 PM Pacific Time.
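aws-orca is a Lyft-internal tool, but the underlying mechanism is a plain AWS Auto Scaling scheduled action. An equivalent can be sketched with the AWS CLI; the group name and sizes are placeholders, and the UTC recurrence times assume Pacific daylight time:

```sh
# Daily recycle: scale the worker fleet down at 5:20 PM PT (00:20 UTC)
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name presto-adhoc-worker-asg \
  --scheduled-action-name daily-recycle-down \
  --recurrence "20 0 * * *" \
  --min-size 0 --max-size 0 --desired-capacity 0

# ...and bring it back at 7:00 PM PT (02:00 UTC)
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name presto-adhoc-worker-asg \
  --scheduled-action-name daily-recycle-up \
  --recurrence "0 2 * * *" \
  --min-size 100 --max-size 100 --desired-capacity 100
```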
ASG scheduled action based presto scaling
Presto Gateway
Presto-Gateway is a stateful load-balancer, proxy, and router for multiple Presto clusters; it provides transparent access to the underlying Presto backends without changing the protocol. It started as a proxy service to expose Presto coordinators to external BI tools like Mode and Looker in a secure way. As query volume increased, we faced more frequent outages, since a single cluster could not handle the load, and we soon realized we needed a multi-cluster setup. At first we dedicated one cluster to each such query tool. Though this reduced the frequency of outages and incidents, we still had not achieved zero-downtime maintenance, and we noticed that one cluster could sit relatively idle while another suffered from heavy queuing. At that point we decided to implement a true load-balancing proxy router. Because the BI tools were external, they accessed the Presto infrastructure via agents deployed in our network, and these agents were unable to follow HTTP redirects, so we needed to implement a true proxy router and load-balancer.
We used JettyProxy, a proxy server that allows binding custom filters and plugging in proxy handlers. We implemented the routing rules in a ProxyHandler, which lets us intercept HTTP requests and responses, inspect the query, source, headers, etc., and based on that choose a backend host to serve the request. Clients send two types of requests: (1) a new-query submission request, and (2) follow-up requests for a previously submitted query. The PrestoGateway proxy handler caches the query-id-to-backend mapping so that it can stick follow-up requests to the backend that ran the original query.
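In miniature, the sticky-routing idea looks like the sketch below. This is illustrative Java, not the actual presto-gateway classes; the real ProxyHandler hooks these two steps into JettyProxy's request and response callbacks.

```java
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

// Illustrative sketch of sticky query routing in a gateway proxy handler.
public class StickyRouter
{
    // Follow-up requests carry the query id in the path, e.g.
    // /v1/statement/20191107_abcdef_00001_xyz/1
    private static final Pattern QUERY_ID =
            Pattern.compile("/v1/statement/([^/]+)");

    private final Cache<String, String> queryIdToBackend =
            CacheBuilder.newBuilder()
                    .expireAfterWrite(1, TimeUnit.HOURS)
                    .build();

    // Called when a backend answers a new-query POST: remember who owns it.
    public void recordBackend(String queryId, String backendUrl)
    {
        queryIdToBackend.put(queryId, backendUrl);
    }

    // Called for every incoming request: stick follow-ups to the owning
    // backend, otherwise fall back to normal routing-group selection.
    public String pickBackend(String requestPath, String defaultBackend)
    {
        Matcher m = QUERY_ID.matcher(requestPath);
        if (m.find()) {
            String cached = queryIdToBackend.getIfPresent(m.group(1));
            if (cached != null) {
                return cached;
            }
        }
        return defaultBackend;
    }
}
```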
Presto Gateway has 3 components:
BaseApp: Provides boilerplate code to add/remove pluggable components through YAML configuration, with a built-in metrics registry module that makes it easy to emit custom metrics from apps built on top of it.
BaseApp class diagram
ProxyServer: A library built on top of Jetty proxy that provides a proxy server implementation with a pluggable proxy handler.
ProxyServer class diagram
Gateway: Acts as a container for the proxy server and plugs in ProxyHandlers to provide the proxy, routing, and load-balancing functionality. It also exposes a few endpoints and a UI to activate/deactivate backends, along with a query-history view of recently submitted queries.
Presto Gateway UI
We use Lombok to cut down boilerplate such as getters/setters/loggers/equals, speeding up development, and the app is built on the Dropwizard framework.
Cost aware scaling to meet query demand
We started with one Presto cluster, and as usage grew, we kept adding worker nodes to support the higher compute demand. As we added worker nodes, the query concurrency setting had to be raised to utilize the cluster's full potential, and that required a presto-coordinator restart, causing downtime. After introducing presto-gateway, we moved to a multi-cluster mode with the gateway as the load-balancer. Usage of the Presto infrastructure is not uniform throughout the day and our workloads are bursty, so we over-provision the infrastructure so that a burst of queries can be absorbed gracefully.
Query volume for a week (granularity — 1 hour)
To optimize cost, we implemented dynamic scaling. Looking at the incoming query volume, we noticed that usage is usually higher during business hours. With the backend activate/deactivate APIs in the gateway, we could already perform no-downtime upgrades, deployments, and maintenance; we took this to the next level and added scheduled downtime for half of the clusters during non-business hours. Thirty minutes before triggering a shutdown, we disable the cluster through the Gateway APIs, and since we set a 30-minute max run time for any query, no query is penalized in the process. The following chart shows how the number of nodes varies with time.
Total number of nodes over a week time
Cutting the infrastructure by 50% during non-business hours resulted in roughly 30% cost savings overall.
Google sheets connector plugin
This connector allows Presto to read data from a Google Sheet, so that small dimensional data can be pulled into a query.
Querying google sheet as table in Presto
We announced this feature at Presto Summit 2019 and contributed it back to open source.
Limitations
Currently the gsheets connector plugin has the following restrictions:
- If a sheet is not public, it must be shared with the service account user with at least view access.
- The first line of a sheet is always treated as the header, containing all the column names.
- All columns are resolved with type VARCHAR.
- The Google Sheets API rate limit of 100 calls per 100 seconds (unlimited for the day) applies if the Google project account does not have billing enabled. To avoid hitting it, users can choose a higher value for the cache config property sheets-data-expire-after-write in the presto-gsheets configuration.
Setup instructions
Following are the steps to set up the Presto Google Sheets plugin.
Step 1.
To use the connector, generate a service account JSON credential in the GSuite console after creating a project and granting read access to the Google Sheets API. Store this credential file and reference it in sheets.properties as credentials-path. The service account user id is in the credentials file; every Google Sheet that Presto should be able to read must be shared with this service account user.
Step 2.
Now create a metadata sheet to hold the table-name-to-sheet-id mapping. Share this sheet with the service account user and note down its sheet id, then add that id to sheets.properties as the metadata-sheet-id config property.
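After these two steps, the catalog configuration ends up looking roughly like this (ids and paths are placeholders; the two cache properties are optional tuning knobs):

```properties
# etc/catalog/sheets.properties
connector.name=gsheets
credentials-path=/etc/presto/gsheets-service-account.json
metadata-sheet-id=1AbCdEfGhIjKlMnOpQrStUvWxYz0123456789#Sheet1
sheets-data-max-cache-size=1000
sheets-data-expire-after-write=5m
```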
Step 3.
Now add the table name and the corresponding sheet id (after sharing that sheet with the service account user) to the metadata sheet, and you should be able to query the table in Presto.
Metadata sheet
Sample table
Step 4.
Happy querying!
Show tables
Desc table
Selecting from table
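In SQL terms, the screenshots above correspond to something like the following, assuming the catalog is named sheets and a table was registered as my_table in the metadata sheet:

```sql
SHOW TABLES FROM sheets.default;
DESCRIBE sheets.default.my_table;
SELECT * FROM sheets.default.my_table LIMIT 10;
```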
This feature is currently in beta at Lyft and has become quite popular since we rolled it out; a couple of hundred users are already running thousands of queries every day against 50+ Google Sheets as tables.
Google sheets as table usage — each color band is query volume per user
Superset Presto Integration Improvements
There are many challenges users face when developing a SQL-based workflow or pipeline. Query browsers surface errors and suggestions for syntactic mistakes (bad SQL or wrong UDF usage) only after executing the query, so users resolve such errors through multiple slow iterations, which degrades the user experience. We implemented Explain Type (Validate) queries in Presto and send these explain queries as the user types in SQL Lab in Superset, ahead of actual query execution; the returned explain plan captures syntactic errors along with the validity of columns, tables, and UDF signatures. This performs deep query validation without actually executing the whole query, improving the overall querying experience by eliminating the debugging time involved in writing complex SQL queries.
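For instance, as the user types, Superset can fire a validation query like the one below (table and column names are hypothetical); it fails fast on bad names or UDF signatures without scanning any data:

```sql
EXPLAIN (TYPE VALIDATE)
SELECT ride_id, fare_amount
FROM rides
WHERE ds = '2019-11-01';
```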
Table name validation while writing presto query
Column validation while writing the query
UDF validation while writing the query
Apache Superset — pre-execution deep query validation. We have contributed these features back to open source.
Summary
It is important for all teams at Lyft to make data-driven decisions, and it has been the Data Platform team's mission to put data at the heart of all decisions made at Lyft. Presto is a key component in achieving this mission. Over the last couple of years we have focused primarily on scaling the infrastructure, reducing data-arrival latency, and improving the user querying experience, while staying cost-effective at the same time.
A huge thanks to the Data Platform Infra team who helped in improving, scaling and supporting Presto Infrastructure at Lyft.
Also, thanks to the rest of the PrestoSQL open-source community for helping and supporting us through this development. Please check this page to learn how to get involved in the project.
The Data Platform team is actively looking for talented and driven engineers, data engineers, scientists, product managers, and marketers to join our team. Learn about life at Lyft and visit the careers section for the latest openings.