For developers new to HBase, it is often a nightmare to understand how the different modules talk to each other and what they say. Knowing the protocols helps in understanding what each sub-system is responsible for and which information it maintains. This post documents the protocols through which the HBase modules communicate. I will cover the HBase-specific protocols, especially the Protocol Buffers ones.

HBase Protocols

In the HBase 0.96.0 release, HBase is moving to Protocol Buffers for communication between its sub-systems. There are five major protocols in use, as shown in the figure above.

MasterMonitorProtocol: the protocol that a client uses to communicate with the Master (for monitoring purposes).

  • MasterMonitorService

    • GetSchemaAlterStatusRequest: Used by the client to get the number of regions that have received the updated schema.
    • GetSchemaAlterStatusResponse: Indicates the number of regions updated. yetToUpdateRegions is the number of regions that are yet to be updated; totalRegions is the total number of regions of the table.
    • GetTableDescriptorsRequest: Gets the list of TableDescriptors for the requested tables. Contains tableNames, the requested tables; if it is empty, all tables are requested.
    • GetTableDescriptorsResponse: The TableSchema (see below for the description of TableSchema) for all the requested tables.
    • GetClusterStatusRequest: Used by the clients to get the current status of the cluster
    • GetClusterStatusResponse: The ClusterStatus object is returned.
  • MasterService

    • IsMasterRunningRequest: Used by clients to see if the master is running or not
    • IsMasterRunningResponse: Returns true if the master is available.
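
The two counters in GetSchemaAlterStatusResponse are enough for a client to derive how far a schema change has progressed. The following is a minimal Python sketch, not HBase code: `SchemaAlterStatus` is a hypothetical stand-in for the protobuf message, and only the two field names are taken from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaAlterStatus:
    """Hypothetical stand-in for GetSchemaAlterStatusResponse."""
    yet_to_update_regions: int  # regions that have not received the new schema
    total_regions: int          # all regions of the table

    @property
    def updated_regions(self) -> int:
        # Regions that already received the updated schema.
        return self.total_regions - self.yet_to_update_regions

    @property
    def done(self) -> bool:
        # The alter is complete once no region is left to update.
        return self.yet_to_update_regions == 0


status = SchemaAlterStatus(yet_to_update_regions=3, total_regions=10)
print(f"{status.updated_regions}/{status.total_regions} regions updated, done={status.done}")
```

A client would typically poll the master with GetSchemaAlterStatusRequest until the equivalent of `done` becomes true.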

MasterAdminProtocol: the protocol that a client uses to communicate with the Master (for admin purposes).

  • MasterAdminService

    • AddColumnRequest: Adds a column to the specified table. The request contains tableName (the table to modify) and column (a ColumnFamilySchema).
    • AddColumnResponse: Empty response.
    • DeleteColumnRequest: Deletes a column from the specified table. The table must be disabled. The request contains tableName (the table to modify) and column (the column name).
    • DeleteColumnResponse: Empty response.
    • ModifyColumnRequest: Modifies an existing column on the specified table. The request contains tableName (the table to modify) and column (a ColumnFamilySchema).
    • ModifyColumnResponse: Empty response.
    • MoveRegionRequest: Moves a region to a specified destination server. The request contains region (the RegionSpecifier) and destServerName (the ServerName of the destination region server). If destServerName is an empty byte array, the region is assigned to a random server.
    • MoveRegionResponse: Empty response.
    • AssignRegionRequest: Assigns a region to a server chosen at random. The request contains the RegionSpecifier. An existing RegionPlan is used if one is found.
    • AssignRegionResponse: Empty Response
    • UnassignRegionRequest: Unassigns a region from the region server currently hosting it. The region is then assigned to a region server chosen at random, which could be the same server. The request contains a RegionSpecifier (any existing RegionPlan is cleared) and a bool saying whether to force the unassignment. Forcing also removes the region from regions-in-transition, if present, and from the assigned regions, which is radical; if it results in a double assignment, use hbck -fix to resolve it.
    • UnassignRegionResponse: Empty response.
    • OfflineRegionRequest: Offlines a region in the assignment manager’s in-memory state. The region should be in a closed state and, unlike unassign, there is no attempt to reassign it automatically. This is a special method that should only be used by experts or hbck. The request contains the RegionSpecifier of the region to take offline. Any existing RegionPlan is cleared.
    • OfflineRegionResponse: Empty Response
    • DeleteTableRequest: Deletes a table. The request contains the table name in bytes.
    • DeleteTableResponse: Empty response.
    • EnableTableRequest: Brings the table online (only needed if the table was previously taken offline). The request contains the table name in bytes.
    • EnableTableResponse: Empty response.
    • DisableTableRequest: Takes the table offline. The request contains the table name in bytes.
    • DisableTableResponse: Empty response.
    • ModifyTableRequest: Modifies a table’s metadata. The request contains tableName (the table to modify) and a TableSchema (the new descriptor for the table).
    • ModifyTableResponse: Empty response.
    • CreateTableRequest: Creates a new table asynchronously. If splitKeys are specified, the table is created with an initial set of multiple regions. If splitKeys is null, the table is created with a single region. The request also contains the TableSchema.
    • CreateTableResponse: Empty Response
    • ShutdownRequest: Shutdown an HBase cluster
    • ShutdownResponse: Empty Response
    • StopMasterRequest: Stop HBase Master only. Does not shutdown the cluster.
    • StopMasterResponse: Empty Response
    • BalanceRequest: Runs the balancer. If there are regions to move, it goes ahead and does the reassignments. The balancer may not run, for various reasons; check the logs.
    • BalanceResponse: Contains balancerRan, which is true if the balancer ran and was able to tell the region servers to unassign all the regions to balance (the reassignment itself is asynchronous), false otherwise.
    • SetBalancerRunningRequest: Turns the load balancer on or off. Contains on (if true, enable the balancer; if false, disable it) and synchronous (if true, wait for the current balance() call, if one is outstanding, to return).
    • SetBalancerRunningResponse: Contains prevBalanceValue, the previous balancer value.
    • CatalogScanRequest: Runs a scan of the catalog table.
    • CatalogScanResponse: Contains the int return code corresponding to the number of entries cleaned.
    • EnableCatalogJanitorRequest: Enables or disables the catalog janitor. Contains enable (if true, enable the catalog janitor; if false, disable it).
    • EnableCatalogJanitorResponse: Contains prevValue, which is true if the janitor was enabled previously, false otherwise.
    • IsCatalogJanitorEnabledRequest: Queries whether the catalog janitor is enabled.
    • IsCatalogJanitorEnabledResponse: Contains value, which is true if the janitor is enabled, false otherwise.

AdminProtocol: the protocol that an HBase client uses to communicate with a region server (for admin purposes).

  • AdminService

    • GetRegionInfoRequest: Gets the information of a region identified by a RegionSpecifier. The request also says whether to retrieve the compaction state (NONE, MINOR, MAJOR, MINOR_AND_MAJOR) of the region.
    • GetRegionInfoResponse: Returns the RegionInfo and, if requested, the compaction state of the region.
    • GetStoreFileRequest: Gets the list of store files for a set of column families in a region identified by a RegionSpecifier. If no column family is specified, the store files of all column families are returned.
    • GetStoreFileResponse: A list of the store file names of the region.
    • GetOnlineRegionRequest: Gets information about all the regions that are online on a particular region server.
    • GetOnlineRegionResponse: A list of RegionInfo for all the regions that are online on the given region server.
    • OpenRegionRequest: Opens a region, or a list of regions, identified by RegionInfo on the region server.
    • OpenRegionResponse: A list with the state of each requested region (OPENED, ALREADY_OPEN, FAILED_OPENING).
    • CloseRegionRequest: Closes the specified region; a flag determines whether ZooKeeper is used during the close. The request contains a RegionSpecifier, a ServerName, the transitionInZK flag, etc.
    • CloseRegionResponse: A flag indicating whether the region was closed.
    • FlushRegionRequest: Flushes the MemStore of the region identified by a RegionSpecifier on the region server. This method is synchronous. The request has an optional timestamp above which the region should be flushed, and otherwise not.
    • FlushRegionResponse: Has the last flush time and a bool specifying whether the region was flushed (this field is not currently used in the code).
    • SplitRegionRequest: Splits a region identified by a RegionSpecifier on the region server. This method currently flushes the region and then forces a compaction, which then triggers a split. The flush is synchronous but the compaction is asynchronous.
    • SplitRegionResponse: Empty Response
    • CompactRegionRequest: Compacts the region identified by a RegionSpecifier. Performs a major compaction if the flag is set to true. This method is asynchronous.
    • CompactRegionResponse: Empty response.
    • ReplicateWALEntryRequest: Replicates the given entries. The guarantee is that the given entries will be durable on the slave cluster if this method returns without an exception. “hbase.replication” has to be set to true for this to work.
    • ReplicateWALEntryResponse: Empty response
    • RollWALWriterRequest: Rolls the WAL writer of the region server.
    • RollWALWriterResponse: A list of regions to flush.
    • GetServerInfoRequest: Gets some information about the region server.
    • GetServerInfoResponse: Contains the ServerName and the web UI port.
    • StopServerRequest: Stops the region server. The request contains a reason for stopping.
    • StopServerResponse: Empty Response.
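
The "empty means all" rule of GetStoreFileRequest is easy to get wrong on the client side, so here is a small Python sketch of it. It is not the region server's implementation: `REGION_STORE_FILES` and `get_store_files` are hypothetical names, and the file names are made up for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical in-memory view of one region: column family -> store file names.
REGION_STORE_FILES: Dict[str, List[str]] = {
    "info": ["info/hfile-001", "info/hfile-002"],
    "metrics": ["metrics/hfile-001"],
}


def get_store_files(families: Sequence[str] = ()) -> List[str]:
    """Return store file names for the requested column families.

    An empty request selects every column family, mirroring the
    behaviour described for GetStoreFileRequest.
    """
    selected = families or REGION_STORE_FILES.keys()
    files: List[str] = []
    for family in selected:
        # Unknown families simply contribute no files in this sketch.
        files.extend(REGION_STORE_FILES.get(family, []))
    return files


print(get_store_files())             # all families
print(get_store_files(["metrics"]))  # one family
```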

ClientProtocol: the protocol that an HBase client uses to communicate with a region server (for data operations).

  • ClientService

    • GetRequest: Gets data from a table with a single Get operation. Unless existenceOnly is specified, it returns all the requested data for the row that matches exactly, or for the row that immediately precedes it if closestRowBefore is specified. If existenceOnly is set, only the existence of the row is returned.
    • GetResponse: Contains the fetched row or the closest row before it. If only existence was requested, a bool indicates whether the row exists.
    • MutateRequest: Mutates data in a table (APPEND, INCREMENT, PUT, DELETE) with a single Mutate operation. Optionally, you can specify a condition; the mutate takes place only if the condition is met and is ignored otherwise.
    • MutateResponse: Contains the result (only for APPEND and INCREMENT) and a bool, processed (for PUT and DELETE), which indicates whether the mutate actually happened.
    • ScanRequest: Instead of getting a row from a table, you can scan it with optional filters. You can specify the row key range, the time range, the columns/families to scan, and so on. The first scan request should specify a scan. Later on, you can use the returned scanner id to fetch result batches with further scan requests. The scanner remains open as long as there are more results and it is not explicitly asked to close. You can fetch results and ask for the scanner to be closed in the same request, which saves a trip if you are not interested in the remaining results. The request contains a RegionSpecifier.
    • ScanResponse: Contains the result of the scan (rows). If there are no more results, moreResults is false. If moreResults is not specified, there are more results.
    • LockRowRequest: Locks a row in a table explicitly from the client. The request contains a RegionSpecifier.
    • LockRowResponse: Returns the lock id and the TTL of the lock (the TTL is not implemented in the code yet).
    • UnlockRowRequest: Unlocks a locked row in a table. The request contains the RegionSpecifier and the lockId.
    • UnlockRowResponse: Empty response.
    • BulkLoadHFileRequest: Atomically bulk loads multiple HFiles (say, from different column families) into an open region. The request contains a RegionSpecifier, the column name and the file path.
    • BulkLoadHFileResponse: A bool indicating whether the files were loaded.
    • ExecCoprocessorRequest: Executes a single org.apache.hadoop.hbase.ipc.CoprocessorProtocol method using the registered protocol handlers. The request contains the RegionSpecifier and an individual coprocessor call, for which you must specify the protocol, the method, and the row on which the call will be executed. You can specify configuration settings in the property list. The parameter list holds the parameters of the method; each parameter is a pair of a name, which is the parameter class name, and a value in binary format, for example a Protocol Buffers encoded value.
    • ExecCoprocessorResponse: Result of the call.
    • MultiRequest: Executes multiple actions on a table: get, mutate, and/or execCoprocessor. You can execute a list of actions on a given region in order. If it is a list of mutate actions, atomic can be set to make sure they are processed atomically, just like RowMutations.
    • MultiResponse: A list of the results of the actions performed.
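
The scanner life cycle described for ScanRequest and ScanResponse (open with a scan, continue with the scanner id, stop when moreResults is false, optionally close early) can be sketched in Python as follows. `ToyRegionServer`, `scan_all`, the batch size and the row names are all hypothetical; this in-memory toy only mimics the request/response flow and is not how HBase implements scanners.

```python
import itertools
from typing import Dict, Iterator, List, Optional, Tuple

ROWS = [f"row-{i:02d}" for i in range(7)]  # toy table contents, sorted by row key


class ToyRegionServer:
    """Hypothetical in-memory server that mimics the scan request flow."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._scanners: Dict[int, Iterator[str]] = {}

    def scan(self, start_row: Optional[str] = None, scanner_id: Optional[int] = None,
             batch: int = 3, close: bool = False) -> Tuple[int, List[str], bool]:
        """Return (scanner_id, rows, more_results) for one scan request."""
        if scanner_id is None:
            # First request: it carries the scan itself, so open a new scanner.
            scanner_id = next(self._ids)
            self._scanners[scanner_id] = iter(
                r for r in ROWS if r >= (start_row or ""))
        rows = list(itertools.islice(self._scanners[scanner_id], batch))
        more = len(rows) == batch  # a short batch means the scanner is exhausted
        if close or not more:
            # Closed on request, or automatically once nothing is left.
            self._scanners.pop(scanner_id, None)
        return scanner_id, rows, more


def scan_all(server: ToyRegionServer, start_row: str) -> List[str]:
    """Client loop: open the scanner, then fetch batches by scanner id."""
    scanner_id, rows, more = server.scan(start_row=start_row)
    result = list(rows)
    while more:
        _, rows, more = server.scan(scanner_id=scanner_id)
        result.extend(rows)
    return result


server = ToyRegionServer()
print(scan_all(server, "row-02"))
```

Passing `close=True` with a request fetches one batch and releases the scanner in the same round trip, which corresponds to the "save a trip" case above.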

RegionServerStatusProtocol: the protocol that a RegionServer uses to communicate its status to the Master.

  • RegionServerStatusService

    • RegionServerStartupRequest: Sent when the region server first starts up. The request contains the port number, the start code, and the server’s current time in ms.
    • RegionServerStartupResponse: The response from the master contains configuration (a list of NameStringPair) for the region server to use, e.g. the filesystem, the HBase rootdir, the hostname to use when creating the RegionServer ServerName, etc.
    • RegionServerReportRequest: Periodic report sent from a region server (identified by its ServerName), which contains the ServerLoad.
    • RegionServerReportResponse: Empty response.
    • ReportRSFatalErrorRequest: Reports a region server abort (possibly due to OOM or HDFS problems) to the master. The request contains the ServerName and the error message.
    • ReportRSFatalErrorResponse: Empty response (no one might receive this).

Internal Messages or Data Structures in Protocol Buffers

  • ClusterId

    • ClusterId: Content of the ‘/hbase/hbaseid’ (cluster id) znode.
  • ClusterStatus

    • RegionState: State of a region while it undergoes a transition. Has the RegionInfo, the State and a timestamp.
    • RegionInTransition: Identifies a region in transition. Has a RegionSpecifier and a RegionState.
    • LiveServerInfo: Describes a live server with its ServerName and its current ServerLoad.
    • ClusterStatus: ClusterStatus provides clients with information such as
      • The count and names of region servers in the cluster.
      • The count and names of dead region servers in the cluster
      • The name of the active master for the cluster.
      • The name(s) of the backup master(s) for the cluster, if they exist.
      • The average cluster load.
      • The number of regions deployed on the cluster.
      • The number of requests since last report.
      • Detailed region server loading and resource usage information, per server and per region.
      • Regions in transition at the master.
      • The unique cluster ID.
      • Has HBaseVersionFileContent, LiveServerInfo, ServerName (for dead servers), RegionInTransition, ClusterId, Coprocessor, ServerName (for the master), ServerName (for backup masters), etc.
  • FS

    • HBaseVersionFileContent: The ${HBASE_ROOTDIR}/hbase.version file content
    • Reference: Reference file content used when we split an hfile under a region
  • HBase

    • TableSchema: Describes a table, has Table name and ColumnFamilySchema for each column family present.
    • ColumnFamilySchema: Describes the column family name and other attributes like Bloom filter, inMemory etc.
    • RegionInfo: Protocol buffer version of HRegionInfo. Has information like regionId, table name, start key, end key etc.
    • RegionSpecifier: Identifies a region either by its region name or by the hash of the region name, which is known as the encoded region name.
    • RegionLoad: Describes the load on a given region. Has information about store files, request counts, index size and bloom size.
    • ServerLoad: Describes the load on a given server. Has information such as request count, memory used, RegionLoads, Coprocessors, etc.
    • TimeRange: A range of time. Both from and to are Java timestamps in milliseconds. If you don’t specify a time range, it means all time: by default, from = 0 and to = Long.MAX_VALUE.
    • KeyValue: The generic key type used in HLog. Has information like row, family, qualifier, timestamp, and the key type for mutable operations.
    • ServerName: Host and port of a server.
    • Coprocessor: Name in string.
    • NameStringPair: A generic name value pair in String format.
    • NameBytesPair: A generic name value pair in bytes format.
  • RPC

    • UserInformation: A real and an effective user name.
    • ConnectionHeader: Contains user info beyond what is established at connection establishment, plus the protocol used. As part of setting up a connection to a server, the client needs to send the ConnectionHeader. At the data level, this looks like <“hrpc”-bytearray><5-byte><length-of-serialized-ConnectionHeader-obj[int32]><ConnectionHeader-object serialized>
    • RpcRequest: For every RPC that the client makes, it needs to send an RpcRequest. At the data level this looks like <length-of-serialized-RpcRequest-obj><RpcRequest-object serialized>
    • RpcException: The server sends an exception message if the request throws an exception. It contains the class of the exception that was thrown and the stack trace.
    • RpcResponse: The server sends back an RpcResponse object as the response. At the data level this looks like <protobuf-encoded-length-of-serialized-RpcResponse-obj><RpcResponse-object serialized>. The RpcException is optional here.
  • Zookeeper

    • RootRegionServer: Content of the root-region-server znode, which contains the ServerName currently hosting the root region.
    • Master: ServerName of the current master, stored in the master znode.
    • ClusterUp: Content of the ‘/hbase/shutdown’ (cluster state) znode. If this znode is present, the cluster is up. Currently the data is the cluster start date.
    • RegionTransition: What we write under unassigned in ZooKeeper as a region moves through open, close, etc. Details a region in transition. Contains originServerName, regionName, createTime, etc.
    • SplitLogTask: Content of the WAL SplitLog directory znodes. Used for distributed WAL splitting. Holds the current state and the name of the server that originated the split. Contains state and ServerName.
    • Table: The znode that holds the state of a table (Enabled, Disabled, etc.). If there is no znode for a table, its state is presumed enabled.
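
The wire layouts given for ConnectionHeader and RpcResponse in the RPC list can be sketched in Python as below. Several points are assumptions rather than facts from this post: `<5-byte>` is read as a single byte holding the value 5, the int32 length is taken to be big-endian (the usual Java convention), and "protobuf-encoded length" is taken to mean a Protocol Buffers varint. The payloads are opaque bytes standing in for serialized messages, and all function names are hypothetical.

```python
import struct

MAGIC = b"hrpc"
VERSION = 5  # assumption: "<5-byte>" is one byte with the value 5


def frame_connection_header(header: bytes) -> bytes:
    """Build <"hrpc"><version byte><int32 length><serialized ConnectionHeader>."""
    # ">i" = big-endian signed 32-bit length (assumed byte order).
    return MAGIC + bytes([VERSION]) + struct.pack(">i", len(header)) + header


def parse_connection_header(data: bytes) -> bytes:
    """Check the preamble and return the serialized ConnectionHeader bytes."""
    if data[:4] != MAGIC or data[4] != VERSION:
        raise ValueError("bad preamble")
    (length,) = struct.unpack(">i", data[5:9])
    return data[9:9 + length]


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protocol Buffers base-128 varint."""
    if value < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)         # last byte
            return bytes(out)


def frame_response(response: bytes) -> bytes:
    """Build <varint length><serialized RpcResponse> (assumed varint prefix)."""
    return encode_varint(len(response)) + response


framed = frame_connection_header(b"abc")  # b"abc" stands in for a real message
print(framed)
print(frame_response(b"ok"))
```

In a real client the payload bytes would come from the generated protobuf classes; only the framing around them is illustrated here.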

Original article: http://blog.zahoor.in/2012/08/protocol-buffers-in-hbase/
