1. A region is split when its store file size exceeds hbase.hregion.max.filesize, or when the configured region split policy calls for a split.
  2. At this point the region server divides the region into two.
  3. The region server creates two reference files, one for each daughter region.
  4. These reference files are stored in a new directory called splits under the parent region's directory.
  5. At this point the parent region is marked as closed (offline), so that no client tries to read from or write to it.
  6. The region server then creates two new directories inside the splits directory, one for each daughter region.
  7. If steps 1 through 6 complete successfully, the region server moves both daughter region directories under the table directory.
  8. The META table is then informed of the creation of the two new regions, and the parent region's entry is updated to show that it has been split and is offline (OFFLINE=true, SPLIT=true).
  9. The reference files are very small: each contains only the key at which the split happened and a flag saying whether it represents the top half or the bottom half of the parent region.
  10. A class called "HalfHFileReader" uses these reference files to read the parent region's original data file and to decide which half of the file has to be read.
  11. The region server now brings both daughter regions online, and they start serving client requests.
  12. As soon as the daughter regions come online, a compaction is scheduled that rewrites the parent region's HFile into two independent HFiles, one for each daughter region.
  13. When the compaction in step 12 completes, each new HFile cleanly replaces its corresponding reference file. The compaction itself runs under the .tmp directory of each daughter region.
  14. Once step 13 has completed successfully, the parent region is removed from META and all of its files and directories are marked for deletion.
  15. Finally, the region server informs the Master about the two new regions. The Master then decides whether they keep running on the same region server or are moved to another one.
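
The role of the reference files in steps 9, 10, 12 and 13 can be sketched with a toy model. This is not HBase code: `Reference`, `read_half` and `compact` are hypothetical names, rows are plain in-memory tuples rather than HFile blocks, and the sketch assumes that "top" means the keys at or above the split key.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple

Row = Tuple[bytes, bytes]  # (row key, value); a stand-in for HFile cells


@dataclass(frozen=True)
class Reference:
    """Hypothetical stand-in for a reference file: split key plus which half."""
    split_key: bytes
    top: bool  # assumption: True means keys >= split_key, False means keys < split_key


def read_half(parent_rows: List[Row], ref: Reference) -> Iterator[Row]:
    """Read the parent's data through a reference, yielding only one half."""
    for key, value in parent_rows:
        if (key >= ref.split_key) == ref.top:
            yield key, value


def compact(parent_rows: List[Row], ref: Reference) -> List[Row]:
    """Materialise one half as an independent file that can replace the reference."""
    return list(read_half(parent_rows, ref))


# The parent's single data file, sorted by row key.
parent = [(b"apple", b"1"), (b"mango", b"2"), (b"zebra", b"3")]

# Two tiny references instead of two copies of the data.
bottom_ref = Reference(split_key=b"mango", top=False)
top_ref = Reference(split_key=b"mango", top=True)

# After compaction each daughter owns its own rows and no longer needs the parent.
daughter_a = compact(parent, bottom_ref)
daughter_b = compact(parent, top_ref)
```

The point of the design is that the split itself copies no data: until the compaction runs, each daughter serves reads by filtering the parent's file through its own reference.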
