1)将coordinator和executor角色分离

By default, each host in the cluster that runs the impalad daemon can act as the coordinator for an Impala query, execute the fragments of the execution plan for the query, or both. During highly concurrent workloads for large-scale queries, especially on large clusters, the dual roles can cause scalability issues:

  • The extra work required for a host to act as the coordinator could interfere with its capacity to perform other work for the earlier phases of the query. For example, the coordinator can experience significant network and CPU overhead during queries containing a large number of query fragments. Each coordinator caches metadata for all table partitions and data files, which can be substantial and contend with memory needed to process joins, aggregations, and other operations performed by query executors.
  • Having a large number of hosts act as coordinators can cause unnecessary network overhead, or even timeout errors, as each of those hosts communicates with the statestored daemon for metadata updates.
  • The “soft limits” imposed by the admission control feature are more likely to be exceeded when there are a large number of heavily loaded hosts acting as coordinators.

2)default_pool_max_requests,默认是200,要根据自己集群的内存规模以及每个查询需要的内存进行调整;

Maximum number of concurrent outstanding requests allowed to run before incoming requests are queued. Because this limit applies cluster-wide, but each Impala node makes independent decisions to run queries immediately or queue them, it is a soft limit; the overall number of concurrent queries might be slightly higher during times of heavy load. A negative value indicates no limit. Ignored if fair_scheduler_config_path and llama_site_path are set.

3)开启kerberos之后,通过jdbc访问需要做客户端load balance,因为jdbc url里需要携带对应server的principal;

【原创】大数据基础之Impala(3)部分调优的更多相关文章

  1. 【原创】大数据基础之Impala(1)简介、安装、使用

    impala2.12 官方:http://impala.apache.org/ 一 简介 Apache Impala is the open source, native analytic datab ...

  2. 大数据技术 - MapReduce的Shuffle及调优

    本章内容我们学习一下 MapReduce 中的 Shuffle 过程,Shuffle 发生在 map 输出到 reduce 输入的过程,它的中文解释是 “洗牌”,顾名思义该过程涉及数据的重新分配,主要 ...

  3. 【原创】大数据基础之Impala(2)实现细节

    一 架构 Impala is a massively-parallel query execution engine, which runs on hundreds of machines in ex ...

  4. 【原创】大数据基础之Zookeeper(2)源代码解析

    核心枚举 public enum ServerState { LOOKING, FOLLOWING, LEADING, OBSERVING; } zookeeper服务器状态:刚启动LOOKING,f ...

  5. 【原创】大数据基础之Benchmark(2)TPC-DS

    tpc 官方:http://www.tpc.org/ 一 简介 The TPC is a non-profit corporation founded to define transaction pr ...

  6. 【原创】大数据基础之词频统计Word Count

    对文件进行词频统计,是一个大数据领域的hello word级别的应用,来看下实现有多简单: 1 Linux单机处理 egrep -o "\b[[:alpha:]]+\b" test ...

  7. 【原创】大数据基础之ElasticSearch(4)es数据导入过程

    1 准备analyzer 内置analyzer 参考:https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis- ...

  8. 【原创】大数据基础之Hive(5)性能调优Performance Tuning

    1 compress & mr hive默认的execution engine是mr hive> set hive.execution.engine;hive.execution.eng ...

  9. 【原创】大数据基础之ElasticSearch(5)重要配置及调优

    Index Settings 重要索引配置 Index level settings can be set per-index. Settings may be: 1 static 静态索引配置 Th ...

随机推荐

  1. python爬虫基础应用----爬取校花网视频

    一.爬虫简单介绍 爬虫是什么? 爬虫是首先使用模拟浏览器访问网站获取数据,然后通过解析过滤获得有价值的信息,最后保存到到自己库中的程序. 爬虫程序包括哪些模块? python中的爬虫程序主要包括,re ...

  2. 老年OIer的Python实践记—— Codeforces Round #555 (Div. 3) solution

    对没错下面的代码全部是python 3(除了E的那个multiset) 题目链接:https://codeforces.com/contest/1157 A. Reachable Numbers 按位 ...

  3. tar.gz,直接解压可用?还是需要编译安装?

    在linux搭建环境,下载的tar.gz安装包,有的直接解压就可以用,有的需要编译安装后才可用 怎么知道该怎么操作呢? 其实,tar -zxvf解压后,进入目录看README.md就知道答案了 另外, ...

  4. babel7-按需加载polyfill

    babel7 babel7发布了. 在升级到 Babel 7 时需要注意几个重大变化: 移除对 Node.js 6 之前版本的支持: 使用带有作用域的 @babel 命名空间,以防止与官方 Babel ...

  5. 《Exception团队》第一次作业:团队亮相

    一.项目基本介绍 项目 内容 这个作业属于哪个课程 任课教师博客主页链接 这个作业的要求在哪里 作业链接地址 团队名称 Exception 作业学习目标 深入了解软件思想,强化编程技术 二.正文 1. ...

  6. hive字段名、注释中文显示问号

    问题如下图: 解决方法: header1的/etc/my.conf文件,在[mysqld]分组下面添加配置:character-set-server=utf8init_connect='SET NAM ...

  7. 操作系统层面聊聊BIO,NIO和AIO (epoll)

    BIO 有了Block的定义,就可以讨论BIO和NIO了.BIO是Blocking IO的意思.在类似于网络中进行read, write, connect一类的系统调用时会被卡住. 举个例子,当用re ...

  8. Github 开源项目(一)websocketd (实战:实时监控服务器内存信息)

    websocketd 是WebSocket守护进程,它负责处理WebSocket连接,启动您的程序来处理WebSockets,并在程序和Web浏览器之间传递消息. 安装:websocketd wget ...

  9. kubernetes包管理工具Helm安装

    helm官方建议使用tls,首先生成证书. openssl genrsa -out ca.key.pem openssl req -key ca.key.pem -new -x509 -days -s ...

  10. 080、Weave Scope 容器地图(2019-04-28 周日)

    参考https://www.cnblogs.com/CloudMan6/p/7655294.html   Weave Scope 的最大特点是会自动生成一张 Docker 容器地图,让我们能够直接的理 ...