A repost of an article on Lyft's Presto platform engineering practice.

Overview

Early in 2017 we started exploring Presto for OLAP use cases and realized the potential of this amazing query engine. It started as an ad hoc querying tool for data engineers and analysts to prototype SQL queries faster than Apache Hive allowed. Back then a lot of internal dashboards were powered by AWS Redshift, which couples data storage and compute. Our data was growing exponentially (doubling every few days), which required frequent storage scaling as well. With storage coupled to compute, any maintenance or upgrade required downtime, and scaling nodes made querying extremely slow as massive amounts of data moved across nodes. We needed a system where data and compute were decoupled, and that's where Presto fit nicely into our use case. A pipeline to store event data in Parquet format had already been set up and the data was being accessed via Hive; adding Presto to this stack was the icing on the cake.

Presto infra stack

Now thousands of dashboards are powered by Presto, and about 1.5K weekly active users run a couple of million queries every month on this platform. As of today we have 60 PB of queryable event data stored in an S3-based data lake, and about 10 PB of raw data is scanned every day using Presto. The following charts show the growth of Presto usage over time.

Monthly query volume growth

 

Daily input data processed

The above chart shows raw data scan volume per day; daily raw data scan volume has grown 4x in the last 4 months.

Presto Clients

Lyft users consume data using query tools such as Apache Superset, Mode, Looker, Tableau, Jupyter notebooks, and some internal ML tools that build and improve application-specific machine learning models. All these tools connect to Presto via Presto Gateway, and queries sent by these clients are load-balanced uniformly across multiple Presto clusters. With the gateway we perform zero-downtime upgrades that are transparent to the users and apps querying through these clients.

Setup: build, release and deployments

We have forked prestosql/presto under lyft/presto and created the lyft-stable branch (adding additional cherry-picks when needed). We have a private repository where we use Saltstack and aws-orca scripts to roll out deployments to our environments. In this private branch we have added dependencies specific to our environment, along with integration tests that run on every update or pull request. We have dockerized this repo and use the Docker container as a development environment. On every new commit, we trigger the integration tests against the dev environment through Jenkins.

Here are the various components we use while releasing Presto to production.

  1. Lyft’s presto query-log plugin: We have added a query logging and blocking plugin based on Presto's EventListener framework. We use an EventListener to intercept each new query as it lands, and we block a few types of queries we deem harmful to the system (e.g. some tools aggressively cache column names by querying system.jdbc.columns or catalog.information_schema.columns, causing extra load). We log query stats in both the queryCreated and queryCompleted events; this helps us analyze success rates, latencies, and the various types of failures and their root causes. A minimal listener sketch appears after this list.
  2. Presto UDFs: We let users add custom UDFs when no existing Presto function addresses their use case. Some users have added custom geo functions, for example.
  3. Python-based stats collection: We use the datadog-agent library and have added multiple checks to collect system/JVM/process stats and push them to statsd. A statsd collector forwards the metrics to Wavefront. We have set up various alerts and get paged through PagerDuty when a threshold is breached.
  4. Test suites: Before we roll a new Presto version to production, we run multiple types of tests to ensure the quality of a release candidate.

Following are the types of tests we run to ensure the quality of each release.

a. Integration test suite — Runs table-CRUD and UDF tests against each build, launching the devbox Docker container in a Jenkins build task.

b. Replayer test suite — After rolling a release candidate out to pre-prod, we launch a day-long test that replays a previous day's worth of queries exactly as they executed originally. We feed the query log through our event-logging pipeline and compare performance against the previously logged data.

c. Verifier test suite — Similar to the replayer, we use presto-verifier to run a static set of queries against the old and new versions. We move to the new version only if performance has not degraded.

5. PrestoInfra: a collection of configurations and scripts to build and release deployable artifacts, plus Saltstack scripts to roll out environment-specific deployments.

6. PrestoProxy: a customized presto-gateway with Lyft-specific routing rules and overrides.
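
As a minimal sketch of the query-log plugin idea referenced in item 1 above (illustrative, not Lyft's actual plugin; it assumes the prestosql SPI and only logs where the real plugin would block):

```java
// Illustrative sketch of a query-logging/blocking EventListener, not Lyft's
// actual plugin. Assumes the prestosql SPI (io.prestosql.spi.eventlistener).
import io.prestosql.spi.eventlistener.EventListener;
import io.prestosql.spi.eventlistener.QueryCompletedEvent;
import io.prestosql.spi.eventlistener.QueryCreatedEvent;

public class QueryLogListener implements EventListener {
    @Override
    public void queryCreated(QueryCreatedEvent event) {
        String sql = event.getMetadata().getQuery();
        // Flag metadata-scraping queries that overload the coordinator,
        // e.g. BI tools eagerly caching all column names.
        boolean harmful = sql.contains("system.jdbc.columns")
                || sql.contains("information_schema.columns");
        if (harmful) {
            // The actual blocking mechanism is deployment-specific; here we only log.
            System.err.println("blocked-query-pattern: " + event.getMetadata().getQueryId());
        }
        // Log queryCreated stats: user, source, catalog, create time, etc.
    }

    @Override
    public void queryCompleted(QueryCompletedEvent event) {
        // Log queryCompleted stats: wall time, bytes scanned, failure info, etc.
        event.getFailureInfo().ifPresent(failure ->
                System.err.println("query-failed: " + failure.getErrorCode()));
    }
}
```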

Presto production environment and configurations

We are running multiple Presto clusters sharing load through presto-gateway, each with 100+ worker nodes. The node specifications are as follows:

 

Presto cluster configuration

Presto configurations

On each cluster, we have set a maximum query concurrency of 25 and a queue of up to 200 queries, with a per-user limit of 4 running and 20 queued queries. Each query has a 30-minute maximum run time and a 10-minute maximum execution time. We recycle the entire cluster once a day to avoid old-gen GC buildup. We let each query scan up to 1 TB, with each worker node loading up to 10 GB of data. Below are the exact configurations for these settings.

Presto server properties
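
The properties appear as an image in the original post. As a hedged sketch, the time limits map to documented Presto config properties, while concurrency and queuing limits are typically expressed through resource groups; the values below mirror the numbers above, and the file layout is illustrative:

```properties
# etc/config.properties (excerpt)
query.max-run-time=30m
query.max-execution-time=10m
```

```json
{
  "rootGroups": [{
    "name": "global",
    "softMemoryLimit": "100%",
    "hardConcurrencyLimit": 25,
    "maxQueued": 200,
    "subGroups": [{
      "name": "${USER}",
      "softMemoryLimit": "10%",
      "hardConcurrencyLimit": 4,
      "maxQueued": 20
    }]
  }],
  "selectors": [{ "group": "global.${USER}" }]
}
```

The second file is a resource-groups.json sketch: 25 concurrent / 200 queued per cluster, and 4 concurrent / 20 queued per user via the per-user subgroup.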

In Presto Gateway, each coordinator can be assigned a routing group, and setting the X-Presto-Routing-Group header routes a request to one of the clusters in that routing group. We have one cluster that runs with extended memory limits, assigned to the nolimit routing group. A user adds the comment `-- higherlimit` to a query as a hint that it is resource-heavy, and presto-gateway routes that query to the cluster with the higher resource limits.
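
For example, a resource-heavy query hinted this way (the table name here is illustrative) gets routed to the nolimit cluster:

```sql
-- higherlimit
SELECT ride_id, count(*) AS events
FROM hive.events.ride_events
GROUP BY ride_id;
```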

JVM Configuration

We are running Java 11 on Presto nodes. Java 11 runs the old-generation (full GC) phase in parallel, reducing collection time significantly.

Following is the JVM configuration we run Presto processes with. With the Java 11 upgrade we brought old-gen GC pauses down to the order of a couple of seconds, whereas with Java 8 they peaked on the order of 400 seconds, crippling the service every once in a while.

Presto JVM config salt template

The value of max_heap_size_mb is 0.6 × node_memory_mb, which comes to 114 GB on worker nodes.
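
The salt template itself is shown as an image in the original post. A representative jvm.config along these lines might look as follows (flags follow Presto's standard deployment guidance; exact values are illustrative):

```
-server
-Xmx114G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
```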

Presto node recycle

Despite all the optimizations, Presto consumes host resources over time, and from experience we have learned that the longer the service runs, the slower it becomes. Old-gen GC pause time increases as more and more queries execute on the cluster, and we have noticed that a fresh cluster yields better query performance than an old one. With that in mind, we have designed the infrastructure to recycle all the nodes in each cluster once every 24 hours.

We use PrestoGateway’s deactivate API to disable a cluster 30 minutes before its scheduled shutdown time and activate it 30 minutes after the scheduled start time, through a cron, providing no-downtime maintenance. Since each query has a 30-minute max run time, no queries are lost during these operations. We use aws-orca scripts to trigger AWS ScheduledActions that bring an entire Presto cluster up or down on a given schedule.

This is the salt script for shutting a cluster down at 5:20 PM and bringing it back up at 7 PM Pacific Time.

ASG scheduled action based presto scaling
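
The script itself appears as an image in the original post. The same effect can be sketched with plain AWS scheduled actions, without Lyft's internal aws-orca tooling (the ASG name, sizes, and UTC recurrences are illustrative; 5:20 PM PDT is 00:20 UTC):

```
# Scale the worker ASG to zero at 5:20 PM PDT, back to full size at 7 PM PDT
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name presto-workers \
  --scheduled-action-name nightly-shutdown \
  --recurrence "20 0 * * *" \
  --min-size 0 --max-size 0 --desired-capacity 0

aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name presto-workers \
  --scheduled-action-name morning-startup \
  --recurrence "0 2 * * *" \
  --min-size 100 --max-size 100 --desired-capacity 100
```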

Presto Gateway

Presto-Gateway is a stateful load balancer, proxy, and router for multiple Presto clusters; it provides transparent access to the underlying Presto backends without changing the protocol. It started as a proxy service to expose Presto coordinators to external BI tools like Mode and Looker in a secure way. As query volume increased, we faced more frequent outages because a single cluster could not handle the load, and we soon realized we needed a multi-cluster setup. So we dedicated one cluster to each such query tool. Though this reduced the frequency of outages and incidents, we still had not achieved zero-downtime maintenance, and we noticed that one cluster would sit relatively idle while another suffered heavy queuing. At that point we decided to implement a true proxying, load-balancing router. Since the BI tools were external, they accessed the Presto infrastructure via agents deployed in our network, and these agents could not follow HTTP redirects, so we needed a true proxy router and load balancer.

We used JettyProxy, a proxy server that allows binding custom filters and plugging in a proxy handler. We implemented routing rules in a proxy handler, which lets us intercept HTTP requests and responses, inspect the query, source, headers, etc., and based on that choose a backend host to serve the query request. Clients send two types of requests: (1) new query submission requests and (2) follow-up requests for previously submitted queries. The PrestoGateway proxy handler caches the query-id-to-backend mapping so that it can stick follow-up requests to the backend that ran the original query.
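
A simplified sketch of this stickiness logic (illustrative, not the actual presto-gateway implementation; names are invented for clarity):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class StickyRouter {
    // queryId -> backend that ran the original query
    private final Map<String, String> queryIdToBackend = new ConcurrentHashMap<>();

    /** Choose a backend for an incoming request path. */
    public String route(String path, String routingGroup) {
        String queryId = extractQueryId(path);
        if (queryId != null && queryIdToBackend.containsKey(queryId)) {
            // Follow-up request: stick to the backend that owns this query.
            return queryIdToBackend.get(queryId);
        }
        // New query submission: pick any healthy backend in the routing group.
        return pickBackend(routingGroup);
    }

    /** Called once the query id is known from the submission response. */
    public void recordQuery(String queryId, String backend) {
        queryIdToBackend.put(queryId, backend);
    }

    // Presto query ids look like 20191002_201213_00123_xxxxx
    private static String extractQueryId(String path) {
        for (String part : path.split("/")) {
            if (part.matches("\\d{8}_\\d{6}_\\d{5}_\\w+")) {
                return part;
            }
        }
        return null;
    }

    private String pickBackend(String routingGroup) {
        // Placeholder: the real implementation load-balances across active clusters.
        return "http://presto-cluster-1.example.com:8080";
    }
}
```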

Presto Gateway has 3 components:

BaseApp — It provides boilerplate code to add/remove pluggable components through YAML configuration and has an inbuilt metrics registry module, making it easy to emit custom metrics from apps built on top of it.

BaseApp class diagram

ProxyServer — It's a library built on top of Jetty proxy that provides a proxy server implementation with a pluggable proxy handler.

 

ProxyServer class diagram

Gateway — This component acts as the container for the proxy server and plugs in ProxyHandlers to provide the proxy, routing, and load-balancing functionality. It also exposes a few endpoints and a UI to activate/deactivate backends, plus a query history view for recently submitted queries.
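
For example, the activate/deactivate endpoints can be driven like this (the paths follow the open-source presto-gateway README as we understand it; the host and cluster name are illustrative):

```
# Take a cluster out of rotation before maintenance, then bring it back
curl -X POST http://presto-gateway:8080/gateway/backend/deactivate/presto-cluster-2
curl -X POST http://presto-gateway:8080/gateway/backend/activate/presto-cluster-2
```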

Presto Gateway UI

We are using Lombok to cut a lot of boilerplate code such as getters/setters/loggers/equals, speeding up development, and the app is built using the Dropwizard framework.

Cost aware scaling to meet query demand

We started with one Presto cluster and, as usage grew, kept adding worker nodes to meet the higher compute demand. As we added worker nodes, the query concurrency setting had to be raised to utilize the cluster's full potential, and that required a presto-coordinator restart, causing downtime. After introducing presto-gateway, we moved to a multi-cluster mode using the gateway as a load balancer. Usage of the Presto infrastructure is not uniform throughout the day and our workloads are bursty, so we over-provision the infrastructure so that bursts of queries can be absorbed gracefully.

 

Query volume for a week (granularity — 1 hour)

To optimize cost we implemented dynamic scaling. Looking at the incoming query volume, we noticed that usage is usually higher during business hours. With the backend activate/deactivate API at the gateway we could already perform no-downtime upgrades, deployments, and maintenance; we took this a step further and added scheduled downtime for half of the clusters during non-business hours. Thirty minutes before triggering a shutdown, we disable the cluster using the Gateway APIs, and since we set a 30-minute max run time for any query, no query is penalized in this process. The following chart shows how the number of nodes varies with time.

 

Total number of nodes over a week time

Cutting 50% of the infrastructure during non-business hours resulted in roughly 30% overall cost savings.

Google sheets connector plugin

This connector allows Presto to read data from a Google Sheet, so that small dimension tables can be pulled into a query.

Querying google sheet as table in Presto

We announced this feature at Presto Summit 2019 and contributed it back to open source.

Limitations

Currently the gsheets connector plugin has the following restrictions.

  1. If a sheet is not public, it has to be shared with the service account user with at least view access.
  2. The first line of the sheet is always treated as the header and must contain all the column names.
  3. All columns are resolved as type VARCHAR.
  4. The Google Sheets API rate limit of 100 calls per 100 seconds (unlimited for the day) applies if the Google project does not have billing enabled. To stay under it, users can raise the cache config property sheets-data-expire-after-write in the presto-gsheets configuration.

Setup instructions

Following are the steps to set up the Presto Google Sheets plugin:

Step 1.

Generate a service account JSON credential in the GSuite console after creating a project and granting read access to the Google Sheets API. Store this credential file and point to it in sheets.properties via credentials-path. The service account user id is inside the credentials file; share every Google Sheet with this service account user so Presto can read it.

Step 2.

Create a metadata sheet that will hold the table-name-to-sheet-id mapping. Share this sheet with the service account user and note down its sheet id. Add the sheet id to sheets.properties as the metadata-sheet-id config property.
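
A minimal sheets.properties might look like the following (the credential path and sheet id are placeholders; the property names follow the connector documentation):

```properties
connector.name=gsheets
credentials-path=/etc/presto/gsheets-credentials.json
metadata-sheet-id=<metadata-sheet-id>
# optional: cache sheet data longer to stay under the Sheets API rate limit
sheets-data-expire-after-write=5m
```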

Step 3.

Now add the table name and the corresponding sheet id (after sharing that sheet with the service account user) to the metadata sheet, and you should be able to query the table in Presto.

Metadata sheet

Sample table

Step 4.

Happy querying!

Show tables

Desc table

Selecting from table
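
The screenshots are not reproduced here; the flow looks roughly like this (the catalog and table names are illustrative, and we assume the connector exposes tables under its default schema):

```sql
SHOW TABLES FROM gsheets.default;
DESCRIBE gsheets.default.ride_zones;
SELECT * FROM gsheets.default.ride_zones LIMIT 10;
```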

This feature is currently in beta at Lyft and has become quite popular since we rolled it out: a couple of hundred users are already running thousands of queries every day against 50+ Google Sheets as tables.

Google sheets as table usage — each color band is query volume per user

Superset Presto Integration Improvements

There are many challenges users face when developing a SQL-based workflow or pipeline. Query browsers surface errors and suggestions for syntactic mistakes (bad SQL or wrong UDF usage) only after executing the query, so it takes users multiple iterations to find and resolve such errors, which degrades the experience. We implemented EXPLAIN (TYPE VALIDATE) queries in Presto and send these explain queries as the user types in SQL Lab in Superset, ahead of actual query execution; the returned plan captures syntactic errors along with the validity of columns, tables, and UDF signatures. This performs deep query validation without executing the whole query, improving the overall querying experience by eliminating much of the debugging time involved in writing complex SQL.

Table name validation while writing presto query

Column validation while writing the query

UDF validation while writing the query

Apache Superset — pre-execution deep query validation. We have contributed these features back to open source.
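
For illustration, the validation request is an ordinary Presto query of this shape (the table and column names are placeholders), which returns either success or the exact semantic error without running the query:

```sql
EXPLAIN (TYPE VALIDATE)
SELECT user_id, count(*) AS rides
FROM hive.events.ride_events
WHERE ds = '2019-10-01'
GROUP BY user_id;
```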

Summary

It is important for every team at Lyft to make data-driven decisions, and it has been the Data Platform team's mission to put data at the heart of all decisions made at Lyft. Presto is a key component in achieving this mission. In the last couple of years we have focused primarily on scaling the infrastructure, reducing data-arrival latency, and improving the user querying experience while staying cost-effective.

A huge thanks to the Data Platform Infra team who helped in improving, scaling and supporting Presto Infrastructure at Lyft.

Also, thanks to the rest of the PrestoSQL open source community for helping and supporting us through the development. Please check this page on how to get involved in the project.

The Data Platform team is actively looking for talented and driven engineers, data engineers, scientists, product managers, and marketers to join our team. Learn about life at Lyft and visit the careers section for the latest openings.
