This chapter covers
■ The origins of Hadoop, HBase, and NoSQL
■ Common use cases for HBase
■ A basic HBase installation
■ Storing and querying data with HBase






HBase is a database: the Hadoop database. It’s often described as a sparse, distributed, persistent, multidimensional sorted map, which is indexed by rowkey, column key, and timestamp. You’ll hear people refer to it as a key-value store, a column family-oriented database, and sometimes a database storing versioned maps of maps. All these descriptions are correct. But fundamentally, it’s a platform for storing and retrieving data with random access, meaning you can write data as you like and read it back again as you need it. HBase stores structured and semistructured data naturally so you can load it with tweets and parsed log files and a catalog of all your products right along with their customer reviews. It can store unstructured data too, as long as it’s not too large. It doesn’t care about types and allows for a dynamic and flexible data model that doesn’t constrain the kind of data you store.
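To make the "versioned maps of maps" description more concrete, here is a minimal in-memory sketch in Python. It is only a toy model of the logical data layout, not the HBase API: the class name `ToyTable`, its methods, and the `info:name` column key are all hypothetical, and the distributed and persistent aspects are not modeled at all.

```python
class ToyTable:
    """Toy model: rowkey -> column key -> timestamp -> value (all assumptions)."""

    def __init__(self):
        self._rows = {}

    def put(self, rowkey, column, value, timestamp):
        # Sparse: a row only holds the columns that were actually written.
        self._rows.setdefault(rowkey, {}).setdefault(column, {})[timestamp] = value

    def get(self, rowkey, column, timestamp=None):
        # Return the newest version at or before `timestamp` (newest overall if None).
        versions = self._rows.get(rowkey, {}).get(column, {})
        candidates = [ts for ts in versions if timestamp is None or ts <= timestamp]
        if not candidates:
            return None
        return versions[max(candidates)]

    def scan(self):
        # Sorted map: rows come back ordered by rowkey.
        for rowkey in sorted(self._rows):
            yield rowkey, self._rows[rowkey]


table = ToyTable()
table.put(b"user2", b"info:name", b"Bob", timestamp=1)
table.put(b"user1", b"info:name", b"Al", timestamp=1)
table.put(b"user1", b"info:name", b"Alice", timestamp=2)

latest = table.get(b"user1", b"info:name")                # newest version
older = table.get(b"user1", b"info:name", timestamp=1)    # version as of timestamp 1
missing = table.get(b"user1", b"info:email")              # never written
row_order = [rowkey for rowkey, _ in table.scan()]
```

Each lookup is addressed by the three coordinates named above (rowkey, column key, timestamp), and a column that was never written simply does not exist for that row.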


HBase isn’t a relational database like the ones to which you’re likely accustomed. It doesn’t speak SQL or enforce relationships within your data. It doesn’t allow interrow transactions, and it doesn’t mind storing an integer in one row and a string in another for the same column.
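The point about mixing an integer and a string in the same column can be illustrated with a small Python sketch. It assumes that cell values are opaque byte strings whose interpretation is left to the client; the dictionary, the `info:value` column key, and the helper functions are hypothetical and stand in for a real table.

```python
import struct

# Hypothetical cells keyed by (rowkey, column key); every value is just bytes.
cells = {
    (b"row1", b"info:value"): struct.pack(">q", 42),           # 8-byte big-endian integer
    (b"row2", b"info:value"): "forty-two".encode("utf-8"),     # UTF-8 text
}


def read_as_long(raw):
    # The reader decides to interpret the bytes as a 64-bit integer.
    return struct.unpack(">q", raw)[0]


def read_as_text(raw):
    # The reader decides to interpret the bytes as UTF-8 text.
    return raw.decode("utf-8")


number = read_as_long(cells[(b"row1", b"info:value")])
text = read_as_text(cells[(b"row2", b"info:value")])
```

Nothing in the store records which encoding was used, so the application has to know how to decode what it wrote.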


HBase is designed to run on a cluster of computers instead of a single computer. The cluster can be built using commodity hardware; HBase scales horizontally as you add more machines to the cluster. Each node in the cluster provides a bit of storage, a bit of cache, and a bit of computation as well. This makes HBase incredibly flexible and forgiving. No node is unique, so if one of those machines breaks down, you simply replace it with another. This adds up to a powerful, scalable approach to data that, until now, hasn’t been commonly available to mere mortals.


Join the community

Unfortunately, no official public numbers specify the largest HBase clusters running in production. This kind of information easily falls under the realm of business confidential and isn’t often shared. For now, the curious must rely on footnotes in publications, bullets in presentations, and the friendly, unofficial chatter you’ll find at user groups, meet-ups, and conferences.

So participate! It’s good for you, and it’s how we became involved as well. HBase is an open source project in an extremely specialized space. It has well-financed competition from some of the largest software companies on the planet. It’s the community that created HBase and the community that keeps it competitive and innovative.

Plus, it’s an intelligent, friendly group. The best way to get started is to join the mailing lists. You can follow the features, enhancements, and bugs currently being worked on using the JIRA site. It’s open source and collaborative, and users like yourself drive the project’s direction and development.

Step up, say hello, and tell them we sent you!


Given that HBase has a different design and different goals as compared to traditional database systems, building applications using HBase involves a different approach as well. This book is geared toward teaching you how to effectively use the features HBase has to offer in building applications that are required to work with large amounts of data. Before you set out on the journey of learning how to use HBase, let’s get some historical perspective on how HBase came into being and the motivations behind it. We’ll then touch on use cases people have successfully solved using HBase. If you’re like us, you’ll want to play with HBase before going much further. We’ll wrap up by walking through installing HBase on your laptop, tossing in some data, and pulling it out. Context is important, so let’s start at the beginning.


HBase project mailing lists:

HBase JIRA site:

