hbase-indexer官网wiki
Home
Introduction
The HBase Indexer project provides indexing (via Solr) for content stored in HBase. It provides a flexible and extensible way of defining indexing rules, and is designed to scale.
Indexing is performed asynchronously, so it does not impact write throughput on HBase. SolrCloud is used for storing the actual index in order to ensure scalability of the indexing.
Getting started with the HBase Indexer
- Make sure you've got the required software installed, as detailed on the Requirements page.
- Follow the Tutorial to get a feel for how to use the HBase Indexer.
- Customize your indexing setup as needed using the other reference documentation provided here.
How it works
The HBase Indexer works by acting as an HBase replication sink. As updates are written to HBase region servers, they are "replicated" asynchronously to the HBase Indexer processes.
The indexer analyzes incoming HBase mutation events, and where applicable it creates Solr documents and pushes them to SolrCloud servers.
The indexed documents in Solr contain enough information to uniquely identify the HBase row that they are based on, allowing you to use Solr to search for content that is stored in HBase.
HBase replication is based on reading the HBase log files, which are the precise source of truth of the what is stored in HBase: there are no missing or no extra events. In various cases, the log also contains all the information needed to index, so that no expensive random-read on HBase is necessary (see the read-row attribute in the Indexer Configuration).
HBase replication delivers (small) batches of events. HBase-indexer exploits this by avoiding double-indexing of the same row if it would have been updated twice in a short time frame, and as well will batch/buffer the updates towards Solr, which gives important performance gains. The updates are applied to Solr before confirming the processing of the events to HBase, so that no event loss is possible.
Horizontal scalability
All information about indexers is stored in ZooKeeper. New indexer hosts can always be added to a cluster, in the same way that HBase regionservers can be added to to an HBase cluster.
All indexing work for a single configured indexer is shared over all machines in the cluster. In this way, adding additional indexer nodes allows horizontal scaling.
Automatic failure handling
The HBase replication system upon which the HBase Indexer is based is designed to handle hardware failures. Because the HBase Indexer is based on this system, it also benefits from the same ability to handle failures.
In general, indexing nodes going down or Solr nodes going down will not result in any lost data in the HBase Indexer.
hbase-indexer官网wiki的更多相关文章
- eclipse p2更新官网wiki的例子
官网的cvs好像没了,不过在github上找到一份,可用. https://github.com/anthonydahanne/make-p2-buildable-with-tycho/tree/ma ...
- hbases索引技术:Lily HBase Indexer介绍
Lily HBase Indexer 为hbase提供快速查询,他允许不写代码,快速容易的把hbase行索引到solr.Lily HBase Indexer drives HBase indexing ...
- 卸载 Cloudera Manager 5.1.x.和 相关软件【官网翻译】
问题导读: 1.不同的安装方式,卸载方法存在什么区别?2.不同的操作系统,卸载 Cloudera Manager Server and 数据库有什么区别? 重新安装不完整如果你来到这里,因为你的安装没 ...
- dubbo系列一:dubbo介绍、dubbo架构、dubbo的官网入门使用demo
一.dubbo介绍 Dubbo是阿里巴巴公司开源的一个高性能优秀的服务框架,使得应用可通过高性能的RPC实现服务的输出和输入功能,可以和Spring框架无缝集成.简单地说,dubbo是一个基于Spri ...
- OpenTSDB(时序数据库官网)
官网地址:http://opentsdb.net/ 下载地址:https://github.com/OpenTSDB/opentsdb/releases ----------------------- ...
- Java微信扫描支付模式二Demo ,整合官网直接运行版本
概述 场景介绍 用户使用微信“扫一扫”扫描二维码后,获取商品支付信息,引导用户完成支付. 详细 代码下载:http://www.demodashi.com/demo/13880.html 一.相关配置 ...
- caffe官网的部分翻译及NG的教程
Caffe原来叫:Convolutional Architecture for Fast Feature Embedding 官网的个人翻译:http://blog.csdn.net/fengbing ...
- 【原】Zookeeper 概述 + 官网 Overview 翻译
分布式应用 分布式应用 distributed application可以在给定时间(同时)在网络中的多个系统上运行,通过协调它们以快速有效的方式完成特定任务. (a), (b): a distrib ...
- 几个比較好的IT站和开发库官网
几个比較好的IT站和开发库官网 1.IT技术.项目类站点 (1)首推CodeProject,一个国外的IT站点,官网地址为:http://www.codeproject.com,这个站点为程序开发人员 ...
随机推荐
- 浅析Volatile关键字
浅析Volatile关键字 在java中线程并发中,线程之间通信方式分为两种:共享内存和消息传递.共享内存指的是多个线程之间共享内存的属性状态:消息传递指的是线程之间发送信息来通信.在介绍volati ...
- Peekaboo(2019年上海网络赛K题+圆上整点)
目录 题目链接 题意 思路 代码 题目链接 传送门 题意 你的位置在\(O(0,0)\),\(A\)的位置为\((x_1,y_1)\),\(B\)的位置为\((x_2,y_2)\),现在已知\(a=O ...
- Xcode添加库文件framework (转)
首先需要了解一下iOS中静态库和动态库.framework的概念 静态库与动态库的区别 首先来看什么是库,库(Library)说白了就是一段编译好的二进制代码,加上头文件就可以供别人使用. 什么时候我 ...
- SpringBoot整合自定义FTP文件连接池
说明:通过GenericObjectPool实现的FTP连接池,记录一下以供以后使用环境:JDK版本1.8框架 :springboot2.1文件服务器: Serv-U1.引入依赖 <!--ftp ...
- CentOS7.5下实现MySQL5.7主从同步
这里使用两台Linux主机(一台充当MySQL主服务器,另一台充当MySQL从服务器),MySQL用yum安装,版本均为5.7,下表是它们所使用的操作系统以及IP地址. 两台Linux主机所使用的操作 ...
- pydev离线安装及安装后eclipse中不显示解决办法
eclipse插件安装方法(离线安装)pydev进入eclipse目录1.创建links目录2.复制压缩包到目录前解压3.在links目录下新建pydev.link文件(记事本修改后缀名即可)4.py ...
- Linux—— 记录所有登陆用户的历史操作记录
前言 记录相应的人登陆服务器后,做了那些操作,这个不是我自己写的,因为时间久了,原作者连接也无法提供,尴尬. 步骤 history是查询当前连接所操作的命令,通过编写以下内容添加至/etc/profi ...
- NOIP 2006 明明的随机数
洛谷 P1059 明明的随机数 洛谷传送门 JDOJ 1423: [NOIP2006]明明的随机数 T1 JDOJ传送门 Description 明明想在学校中请一些同学一起做一项问卷调查,为了实验的 ...
- spark基础知识一
1. spark是什么 Apache Spark™ is a unified analytics engine for large-scale data processing. spark是针对于大规 ...
- CSS居中方案
1.行内元素或者内联元素 1.垂直居中 设置行高和高度一致,如果没必要设置高度的话,可以直接利用line-height垂直性,直接设置需要的高度为line-height的高度亦可居中 .center- ...