As we now know, many prominent internet companies, most notably Google, Amazon, Yahoo!, and Facebook, were at the forefront of this explosion of data. Some generated their own data, and others collected what was freely available; but managing these vastly different kinds of datasets became core to doing business. They all started by building on the technology available at the time, but the limitations of this technology became limitations on the continued growth and success of these businesses. Although data management technology wasn’t core to the businesses, it became essential for doing business. The ensuing internal investment in technical research resulted in many new experiments in data technology.

Although many companies kept their research closely guarded, Google chose to talk about its successes. The publications that shook things up were the Google File System and MapReduce papers. Taken together, these papers represented a novel approach to the storage and processing of data. Shortly thereafter, Google published the Bigtable paper, which provided a complement to the storage paradigm provided by its file system. Other companies built on this momentum, adopting both the ideas and the habit of publishing their successful experiments. As Google’s publications provided insight into indexing the internet, Amazon published Dynamo, demystifying a fundamental component of the company’s shopping cart.

It didn’t take long for all these new ideas to begin condensing into open source implementations. In the years following, the data management space has come to host all manner of projects. Some focus on fast key-value stores, whereas others provide native data structures or document-based abstractions. Equally diverse are the intended access patterns and data volumes these technologies support. Some forgo writing data to disk, sacrificing immediate persistence for performance. Most of these technologies don’t hold ACID guarantees as sacred. Although proprietary products do exist, the vast majority of the technologies are open source projects. Thus, these technologies as a collection have come to be known as NoSQL.

Where does HBase fit in? HBase does qualify as a NoSQL store. It provides a key-value API, although with a twist not common in other key-value stores. It promises strong consistency so clients can see data immediately after it’s written. HBase runs on multiple nodes in a cluster instead of on a single machine. It doesn’t expose this detail to its clients. Your application code doesn’t know if it’s talking to 1 node or 100, which makes things simpler for everyone. HBase is designed for terabytes to petabytes of data, so it optimizes for this use case. It’s a part of the Hadoop ecosystem and depends on some key features, such as data redundancy and batch processing, to be provided by other parts of Hadoop.
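
To make two of these properties concrete, here is a minimal sketch in Python of a key-value client that hides the size of the cluster behind it and lets a caller read a value right after writing it. This is a toy model, not the HBase client API: the names `Node`, `ClusterClient`, and `record_visit` are hypothetical, the nodes are plain in-memory dictionaries, and the hash-based routing is an arbitrary choice for illustration rather than HBase's actual mechanism.

```python
import hashlib


class Node:
    """One storage server, modeled as an in-memory dict (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.cells = {}


class ClusterClient:
    """Key-value facade that hides how many nodes sit behind it."""

    def __init__(self, node_count):
        self._nodes = [Node(f"node-{i}") for i in range(node_count)]

    def _node_for(self, key):
        # Stable hash so the same key always maps to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._nodes)
        return self._nodes[index]

    def put(self, key, value):
        # The write is applied before put() returns, so a later get()
        # on this client sees it immediately.
        self._node_for(key).cells[key] = value

    def get(self, key, default=None):
        return self._node_for(key).cells.get(key, default)


def record_visit(store, user):
    """Application code: identical whether the store has 1 node or 100."""
    count = store.get(user, 0) + 1
    store.put(user, count)
    return store.get(user)  # reads back the value that was just written


if __name__ == "__main__":
    for size in (1, 100):
        store = ClusterClient(size)
        record_visit(store, "alice")
        print(size, record_visit(store, "alice"))
```

Note that `record_visit` never asks how many nodes exist. The cluster size appears only where the client is constructed, which is the kind of simplification the paragraph above describes.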

Now that you have some context for the environment at large, let’s consider specifically the beginnings of HBase.

Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, “The Google File System,” Google Research Publications, http://research.google.com/archive/gfs.html.

Jeffrey Dean and Sanjay Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” Google Research Publications, http://research.google.com/archive/mapreduce.html.

Fay Chang et al., “Bigtable: A Distributed Storage System for Structured Data,” Google Research Publications, http://research.google.com/archive/bigtable.html.

Werner Vogels, “Amazon’s Dynamo,” All Things Distributed, www.allthingsdistributed.com/2007/10/amazons_dynamo.html.
