Please include a link to the original when reposting: http://www.cnblogs.com/xczyd/p/5577124.html

Customers using HBase often complain that writes are too slow and that throughput will not scale with concurrency. Whenever I ran into this in the past, I would go straight to the HBase cluster, inspect its load, and hunt for performance bottlenecks.

Then an experienced colleague said: hold on — first look at how the client code that accesses the HBase cluster is written.

So I spent some time reading it.

What I found was eye-opening. Client code written by customers (and by our own field engineers) is often full of basic mistakes, for example:

(1) overusing synchronized;

(2) creating connections and never releasing them;

(3) calling an API several times when a single call would do — and if it happens to be an expensive API, you can imagine what that does to performance;

(4) assorted other oddities...

For that reason, this post starts from the HBase Java API and, through source-code analysis and a simple experiment, tries to find the most suitable way to call it (mainly for high-concurrency scenarios).

If you are not familiar with HBase's Java API, the official documentation is a good place to start.

Now to the main content:

When interacting with an HBase cluster through the Java API, you first create an HTable instance and then use the methods it provides for inserts/deletes/queries.

To create the HTable object, you first need a Configuration instance conf containing the HBase cluster information; it is typically created as follows:

    Configuration conf = HBaseConfiguration.create();
    // set the ZooKeeper quorum address and client port of the HBase cluster
    conf.set("hbase.zookeeper.quorum", "XX.XXX.X.XX");
    conf.set("hbase.zookeeper.property.clientPort", "2181");

With conf in hand, an HTable instance can be created through either of the following two constructors:

Method one: create the HTable instance directly from conf

The corresponding constructor is:

    public HTable(Configuration conf, final TableName tableName)
    throws IOException {
      this.tableName = tableName;
      this.cleanupPoolOnClose = this.cleanupConnectionOnClose = true;
      if (conf == null) {
        this.connection = null;
        return;
      }
      // key line: look up a shared connection keyed by conf
      this.connection = HConnectionManager.getConnection(conf);
      this.configuration = conf;

      this.pool = getDefaultExecutor(conf);
      this.finishSetup();
    }

Note the call to HConnectionManager.getConnection. In this constructor, getConnection takes conf as its input and returns an HConnection instance, connection. Anyone familiar with ODBC or JDBC will recognize the pattern: Java database APIs create a connection (or connection pool) of this kind to maintain the links between the database and its clients. For HBase, HConnection plays that role. One difference from, say, Oracle is that what an HConnection actually connects to is not the HBase cluster itself but the ZooKeeper (ZK) ensemble that maintains the cluster's key metadata. ZK is not covered here; if it is unfamiliar, simply think of it as a standalone metadata-management service. Back to getConnection; its implementation looks like this:

    public static HConnection getConnection(final Configuration conf)
    throws IOException {
      HConnectionKey connectionKey = new HConnectionKey(conf);   // key built from conf
      synchronized (CONNECTION_INSTANCES) {
        HConnectionImplementation connection = CONNECTION_INSTANCES.get(connectionKey);  // cache lookup
        if (connection == null) {
          connection = (HConnectionImplementation)createConnection(conf, true);          // create on miss
          CONNECTION_INSTANCES.put(connectionKey, connection);
        } else if (connection.isClosed()) {
          HConnectionManager.deleteConnection(connectionKey, true);
          connection = (HConnectionImplementation)createConnection(conf, true);
          CONNECTION_INSTANCES.put(connectionKey, connection);
        }
        connection.incCount();
        return connection;
      }
    }

Here CONNECTION_INSTANCES is of type LinkedHashMap<HConnectionKey, HConnectionImplementation>, where HConnectionImplementation is just the concrete implementation of HConnection. Note the three essential steps: first, an HConnectionKey instance connectionKey is built from conf; second, CONNECTION_INSTANCES is consulted for an existing HConnection instance under that connectionKey; third, if none exists (or the cached one has been closed), createConnection creates a fresh HConnection, otherwise the HConnection found in the map is returned directly.

At the risk of belaboring the point, here are HConnectionKey's constructor and its overridden hashCode:

    HConnectionKey(Configuration conf) {
      Map<String, String> m = new HashMap<String, String>();
      if (conf != null) {
        for (String property : CONNECTION_PROPERTIES) {
          String value = conf.get(property);
          if (value != null) {
            m.put(property, value);
          }
        }
      }
      this.properties = Collections.unmodifiableMap(m);

      try {
        UserProvider provider = UserProvider.instantiate(conf);
        User currentUser = provider.getCurrent();
        if (currentUser != null) {
          username = currentUser.getName();
        }
      } catch (IOException ioe) {
        HConnectionManager.LOG.warn("Error obtaining current user, skipping username in HConnectionKey", ioe);
      }
    }

    public int hashCode() {
      final int prime = 31;
      int result = 1;
      if (username != null) {
        result = username.hashCode();
      }
      for (String property : CONNECTION_PROPERTIES) {
        String value = properties.get(property);
        if (value != null) {
          result = prime * result + value.hashCode();
        }
      }

      return result;
    }

As the code shows, the overridden hashCode derives its result from username together with the values of the connection-related properties (CONNECTION_PROPERTIES) copied out of conf. The username comes from currentUser, which comes from a UserProvider instantiated from conf. So two Configuration objects with the same contents yield the same username and the same property values, and therefore HConnectionKeys with equal hash codes. CONNECTION_INSTANCES is a LinkedHashMap, and its get method uses HConnectionKey's hashCode (together with equals) to decide whether a matching connection is already cached. In essence, then, getConnection returns a connection keyed by the contents of conf: for each distinct conf content, only one connection is ever handed out.
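
To make the sharing concrete, here is a minimal sketch (the quorum address is a placeholder and the table names mirror the test tables used later): two Configuration objects with identical contents map to equal HConnectionKeys, so both HTable instances end up on one shared HConnection.

    Configuration conf1 = HBaseConfiguration.create();
    conf1.set("hbase.zookeeper.quorum", "XX.XXX.X.XX");
    conf1.set("hbase.zookeeper.property.clientPort", "2181");

    Configuration conf2 = HBaseConfiguration.create();
    conf2.set("hbase.zookeeper.quorum", "XX.XXX.X.XX");
    conf2.set("hbase.zookeeper.property.clientPort", "2181");

    // Distinct Configuration objects, identical connection properties:
    // both go through HConnectionManager.getConnection and share one HConnection.
    HTable t1 = new HTable(conf1, TableName.valueOf("test0"));
    HTable t2 = new HTable(conf2, TableName.valueOf("test1"));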

Method two: explicitly create an HConnection instance by calling createConnection, then pass it in as an argument when constructing the HTable

The createConnection method and the corresponding HTable constructor are as follows:

    public static HConnection createConnection(Configuration conf) throws IOException {
      UserProvider provider = UserProvider.instantiate(conf);
      return createConnection(conf, false, null, provider.getCurrent());
    }

    static HConnection createConnection(final Configuration conf, final boolean managed,
        final ExecutorService pool, final User user)
        throws IOException {
      String className = conf.get("hbase.client.connection.impl",
          HConnectionManager.HConnectionImplementation.class.getName());
      Class<?> clazz = null;
      try {
        clazz = Class.forName(className);
      } catch (ClassNotFoundException e) {
        throw new IOException(e);
      }
      try {
        // Default HCM#HCI is not accessible; make it so before invoking.
        Constructor<?> constructor =
            clazz.getDeclaredConstructor(Configuration.class,
                boolean.class, ExecutorService.class, User.class);
        constructor.setAccessible(true);
        return (HConnection) constructor.newInstance(conf, managed, pool, user);
      } catch (Exception e) {
        throw new IOException(e);
      }
    }

    public HTable(TableName tableName, HConnection connection) throws IOException {
      this.tableName = tableName;
      this.cleanupPoolOnClose = true;
      this.cleanupConnectionOnClose = false;
      this.connection = connection;
      this.configuration = connection.getConfiguration();

      this.pool = getDefaultExecutor(this.configuration);
      this.finishSetup();
    }

As you can see, this way of constructing an HTable creates a brand-new HConnection instance via reflection each time, instead of sharing a single HConnection as method one does.

It is worth noting that an HConnection created this way must be explicitly released with close() once it is no longer needed; otherwise it is easy to run into problems such as lingering occupied ports.
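
A minimal usage sketch of method two, assuming the same placeholder quorum as above and the test tables created later (col1 is one of their column families):

    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.zookeeper.quorum", "XX.XXX.X.XX");
    conf.set("hbase.zookeeper.property.clientPort", "2181");

    HConnection conn = HConnectionManager.createConnection(conf);
    try {
        HTable table = new HTable(TableName.valueOf("test0"), conn);
        try {
            Put p = new Put(Bytes.toBytes("row1"));
            p.add(Bytes.toBytes("col1"), Bytes.toBytes("q"), Bytes.toBytes("v"));
            table.put(p);
        } finally {
            table.close();
        }
    } finally {
        conn.close(); // release the explicitly created connection
    }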

So how do the two approaches perform on inserts/deletes/queries? Let us first reason about it from the code. For simplicity, consider only what HTable actually does during a put (insert).

HTable's put path is as follows:

    public void put(final Put put) throws InterruptedIOException, RetriesExhaustedWithDetailsException {
      doPut(put);
      if (autoFlush) {
        flushCommits();
      }
    }

    private void doPut(Put put) throws InterruptedIOException, RetriesExhaustedWithDetailsException {
      if (ap.hasError()){
        writeAsyncBuffer.add(put);
        backgroundFlushCommits(true);
      }

      validatePut(put);

      currentWriteBufferSize += put.heapSize();
      writeAsyncBuffer.add(put);

      while (currentWriteBufferSize > writeBufferSize) {
        backgroundFlushCommits(false);
      }
    }

    private void backgroundFlushCommits(boolean synchronous) throws InterruptedIOException, RetriesExhaustedWithDetailsException {
      try {
        do {
          ap.submit(writeAsyncBuffer, true);
        } while (synchronous && !writeAsyncBuffer.isEmpty());

        if (synchronous) {
          ap.waitUntilDone();
        }

        if (ap.hasError()) {
          LOG.debug(tableName + ": One or more of the operations have failed -" +
              " waiting for all operation in progress to finish (successfully or not)");
          while (!writeAsyncBuffer.isEmpty()) {
            ap.submit(writeAsyncBuffer, true);
          }
          ap.waitUntilDone();

          if (!clearBufferOnFail) {
            // if clearBufferOnFailed is not set, we're supposed to keep the failed operation in the
            // write buffer. This is a questionable feature kept here for backward compatibility
            writeAsyncBuffer.addAll(ap.getFailedOperations());
          }
          RetriesExhaustedWithDetailsException e = ap.getErrors();
          ap.clearErrors();
          throw e;
        }
      } finally {
        currentWriteBufferSize = 0;
        for (Row mut : writeAsyncBuffer) {
          if (mut instanceof Mutation) {
            currentWriteBufferSize += ((Mutation) mut).heapSize();
          }
        }
      }
    }

The call chain is put -> doPut -> backgroundFlushCommits -> ap.submit, where ap is an instance of AsyncProcess. Following the chain into AsyncProcess:

    public void submit(List<? extends Row> rows, boolean atLeastOne) throws InterruptedIOException {
      submitLowPriority(rows, atLeastOne, false);
    }

    public void submitLowPriority(List<? extends Row> rows, boolean atLeastOne, boolean isLowPripority) throws InterruptedIOException {
      if (rows.isEmpty()) {
        return;
      }

      // This looks like we are keying by region but HRegionLocation has a comparator that compares
      // on the server portion only (hostname + port) so this Map collects regions by server.
      Map<HRegionLocation, MultiAction<Row>> actionsByServer = new HashMap<HRegionLocation, MultiAction<Row>>();
      List<Action<Row>> retainedActions = new ArrayList<Action<Row>>(rows.size());

      long currentTaskCnt = tasksDone.get();
      boolean alreadyLooped = false;

      NonceGenerator ng = this.hConnection.getNonceGenerator();
      do {
        if (alreadyLooped){
          // if, for whatever reason, we looped, we want to be sure that something has changed.
          waitForNextTaskDone(currentTaskCnt);
          currentTaskCnt = tasksDone.get();
        } else {
          alreadyLooped = true;
        }

        // Wait until there is at least one slot for a new task.
        waitForMaximumCurrentTasks(maxTotalConcurrentTasks - 1);

        // Remember the previous decisions about regions or region servers we put in the
        // final multi.
        Map<Long, Boolean> regionIncluded = new HashMap<Long, Boolean>();
        Map<ServerName, Boolean> serverIncluded = new HashMap<ServerName, Boolean>();

        int posInList = -1;
        Iterator<? extends Row> it = rows.iterator();
        while (it.hasNext()) {
          Row r = it.next();
          HRegionLocation loc = findDestLocation(r, posInList);

          if (loc == null) { // loc is null if there is an error such as meta not available.
            it.remove();
          } else if (canTakeOperation(loc, regionIncluded, serverIncluded)) {
            Action<Row> action = new Action<Row>(r, ++posInList);
            setNonce(ng, r, action);
            retainedActions.add(action);
            addAction(loc, action, actionsByServer, ng);
            it.remove();
          }
        }
      } while (retainedActions.isEmpty() && atLeastOne && !hasError());

      HConnectionManager.ServerErrorTracker errorsByServer = createServerErrorTracker();
      sendMultiAction(retainedActions, actionsByServer, 1, errorsByServer, isLowPripority);
    }

    private HRegionLocation findDestLocation(Row row, int posInList) {
      if (row == null) throw new IllegalArgumentException("#" + id + ", row cannot be null");
      HRegionLocation loc = null;
      IOException locationException = null;
      try {
        loc = hConnection.locateRegion(this.tableName, row.getRow());
        if (loc == null) {
          locationException = new IOException("#" + id + ", no location found, aborting submit for" +
              " tableName=" + tableName +
              " rowkey=" + Arrays.toString(row.getRow()));
        }
      } catch (IOException e) {
        locationException = e;
      }
      if (locationException != null) {
        // There are multiple retries in locateRegion already. No need to add new.
        // We can't continue with this row, hence it's the last retry.
        manageError(posInList, row, false, locationException, null);
        return null;
      }

      return loc;
    }

The key mechanism here is asynchronous submission: not every put writes to HBase immediately; instead, puts accumulate in a client-side buffer until it holds enough data (2 MB by default), at which point a batch write is performed. (On the server side those operations presumably still have to queue; the exact mechanics are out of scope here.) What we can be sure of is that in the insert/query/delete APIs, the HConnection serves only to locate the right RegionServer; once it is located, the operation itself is carried out by the client through RPC calls, regardless of how many connections the client has created. Also, locateRegion only goes over the network on a cache miss; otherwise it reads the RegionServer information straight from its cache — see the locateRegion source for details, which are likewise not expanded here.
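
To see the buffer in action, here is a small sketch assuming the conf from earlier; setAutoFlush/setWriteBufferSize are the client-side knobs for this buffering, and the 4 MB figure is an arbitrary illustration (the 2 MB default comes from hbase.client.write.buffer):

    HTable table = new HTable(conf, TableName.valueOf("test0"));
    table.setAutoFlush(false);                 // stop flushing after every put
    table.setWriteBufferSize(4 * 1024 * 1024); // flush once ~4 MB of puts accumulate

    Put p = new Put(Bytes.toBytes("row1"));
    p.add(Bytes.toBytes("col1"), Bytes.toBytes("q"), Bytes.toBytes("v"));
    table.put(p);         // buffered in writeAsyncBuffer, not yet sent
    table.flushCommits(); // force any buffered puts out
    table.close();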

That concludes the code analysis. It establishes that creating large numbers of HConnections via createConnection does nothing for write performance; on the contrary, since it simply squanders resources, it should be slower than getConnection. How much slower, however, cannot be judged from the code alone.

A simple experiment can put this claim to the test:

Server environment: an HBase cluster of four Linux servers with 64 GB of RAM each; a ping averages about 5 ms. (To be rigorous one should also list CPU core counts and clock speeds, disk RPM, and so on.)

Client environment: an Ubuntu VM on a Mac with 10 GB of RAM allocated; its CPU, network, and disk I/O are all noticeably slower than a physical machine's, but this does not affect the conclusions.

The experiment code is as follows:

    public class HbaseConectionTest {

        public static void main(String[] args) throws Exception {

            Configuration conf = HBaseConfiguration.create();

            conf.set("hbase.zookeeper.quorum", "XX.XXX.X.XX");
            conf.set("hbase.zookeeper.property.clientPort", "2181");

            ThreadInfo info = new ThreadInfo();
            info.setTableNamePrefix("test");
            info.setColNames("col1,col2");
            info.setTableCount(1);
            info.setConnStrategy("CREATEWITHCONF"); // CREATEWITHCONF or CREATEWITHCONN
            info.setWriteStrategy("SEPERATE");      // OVERLAP or SEPERATE
            info.setLifeCycle(60000L);

            int threadCount = 100;

            for (int i = 0; i < threadCount; i++) {
                //createTable(tableNamePrefix+i, colNames, conf);
            }

            for (int i = 0; i < threadCount; i++) {
                new Thread(new WriteThread(conf, info, i)).start();
            }

            //HBaseAdmin admin = new HBaseAdmin(conf);
            //System.out.println(admin.tableExists("test"));
        }

        public static void createTable(String tableName, String[] colNames, Configuration conf) {
            System.out.println("start create table " + tableName);
            try {
                HBaseAdmin hBaseAdmin = new HBaseAdmin(conf);
                if (hBaseAdmin.tableExists(tableName)) {
                    System.out.println(tableName + " is exist");
                    //hBaseAdmin.disableTable(tableName);
                    //hBaseAdmin.deleteTable(tableName);
                    return;
                }
                HTableDescriptor tableDescriptor = new HTableDescriptor(tableName);
                for (int i = 0; i < colNames.length; i++) {
                    tableDescriptor.addFamily(new HColumnDescriptor(colNames[i]));
                }
                hBaseAdmin.createTable(tableDescriptor);
            } catch (Exception ex) {
                ex.printStackTrace();
            }
            System.out.println("end create table " + tableName);
        }

    }

    // Configuration describing what each worker thread should do
    class ThreadInfo {

        private int tableCount;

        String tableNamePrefix;
        String[] colNames;

        // CREATEWITHCONF or CREATEWITHCONN
        String connStrategy;

        // OVERLAP or SEPERATE
        String writeStrategy;

        long lifeCycle;

        public ThreadInfo() {
        }

        public int getTableCount() {
            return tableCount;
        }

        public void setTableCount(int tableCount) {
            this.tableCount = tableCount;
        }

        public String getTableNamePrefix() {
            return tableNamePrefix;
        }

        public void setTableNamePrefix(String tableNamePrefix) {
            this.tableNamePrefix = tableNamePrefix;
        }

        public String[] getColNames() {
            return colNames;
        }

        public void setColNames(String[] colNames) {
            this.colNames = colNames;
        }

        public void setColNames(String colNames) {
            if (colNames == null) {
                this.colNames = null;
            } else {
                this.colNames = colNames.split(",");
            }
        }

        public String getWriteStrategy() {
            return writeStrategy;
        }

        public void setWriteStrategy(String writeStrategy) {
            this.writeStrategy = writeStrategy;
        }

        public String getConnStrategy() {
            return connStrategy;
        }

        public void setConnStrategy(String connStrategy) {
            this.connStrategy = connStrategy;
        }

        public long getLifeCycle() {
            return lifeCycle;
        }

        public void setLifeCycle(long lifeCycle) {
            this.lifeCycle = lifeCycle;
        }

    }

    class WriteThread implements Runnable {

        private Configuration conf;
        private ThreadInfo info;
        private int index;

        public WriteThread(Configuration conf, ThreadInfo info, int index) {
            this.conf = conf;
            this.info = info;
            this.index = index;
        }

        @Override
        public void run() {

            String threadName = Thread.currentThread().getName();
            int operationCount = 0;

            HTable[] htables = null;
            HConnection conn = null;

            int tableCount = info.getTableCount();

            String tableNamePrefix = info.getTableNamePrefix();
            String[] colNames = info.getColNames();

            String connStrategy = info.getConnStrategy();
            String writeStrategy = info.getWriteStrategy();

            long lifeCycle = info.getLifeCycle();

            System.out.println(threadName + ": started with index " + index);

            try {
                if (connStrategy.equals("CREATEWITHCONN")) {

                    conn = HConnectionManager.createConnection(conf);

                    if (writeStrategy.equals("SEPERATE")) {
                        htables = new HTable[1];
                        htables[0] = new HTable(TableName.valueOf(tableNamePrefix + (index % tableCount)), conn);
                    } else if (writeStrategy.equals("OVERLAP")) {
                        htables = new HTable[tableCount];
                        for (int i = 0; i < tableCount; i++) {
                            htables[i] = new HTable(TableName.valueOf(tableNamePrefix + i), conn);
                        }
                    } else {
                        return;
                    }
                } else if (connStrategy.equals("CREATEWITHCONF")) {

                    conn = null;

                    if (writeStrategy.equals("SEPERATE")) {
                        htables = new HTable[1];
                        htables[0] = new HTable(conf, TableName.valueOf(tableNamePrefix + (index % tableCount)));
                    } else if (writeStrategy.equals("OVERLAP")) {
                        htables = new HTable[tableCount];
                        for (int i = 0; i < tableCount; i++) {
                            htables[i] = new HTable(conf, TableName.valueOf(tableNamePrefix + i));
                        }
                    } else {
                        return;
                    }
                } else {
                    return;
                }

                long start = System.currentTimeMillis();
                long end = System.currentTimeMillis();

                // Fetch each table's column families once, up front:
                // getTableDescriptor() is a network round-trip, so calling it
                // inside the write loop would add enormous overhead.
                Map<HTable, HColumnDescriptor[]> table_columnFamilies = new HashMap<HTable, HColumnDescriptor[]>();
                for (int i = 0; i < htables.length; i++) {
                    table_columnFamilies.put(htables[i], htables[i].getTableDescriptor().getColumnFamilies());
                }

                while (end - start <= lifeCycle) {
                    // pick a table at random when this thread holds several
                    HTable table = htables.length == 1 ? htables[0] : htables[(int) (Math.random() * htables.length)];
                    long s1 = System.currentTimeMillis();
                    double r = Math.random();
                    HColumnDescriptor[] columnFamilies = table_columnFamilies.get(table);
                    Put put = generatePut(threadName, columnFamilies, colNames, operationCount);
                    table.put(put);
                    if (r > 0.999) {
                        // sample roughly one put in a thousand and print its latency
                        System.out.println(System.currentTimeMillis() - s1);
                    }
                    operationCount++;
                    end = System.currentTimeMillis();
                }

                if (conn != null) {
                    conn.close();
                }

            } catch (Exception ex) {
                ex.printStackTrace();
            }

            System.out.println(threadName + ": ended with operation count:" + operationCount);
        }

        private Put generatePut(String threadName, HColumnDescriptor[] columnFamilies, String[] colNames, int operationCount) {
            Put put = new Put(Bytes.toBytes(threadName + "_" + operationCount));
            for (int i = 0; i < columnFamilies.length; i++) {
                String familyName = columnFamilies[i].getNameAsString();
                for (int j = 0; j < colNames.length; j++) {
                    if (familyName.equals(colNames[j])) {
                        String columnName = familyName + (int) (Math.floor(Math.random() * 5 + 10 * j));
                        String val = "" + columnName.hashCode() % 100;
                        put.add(Bytes.toBytes(familyName), Bytes.toBytes(columnName), Bytes.toBytes(val));
                    }
                }
            }
            return put;
        }
    }

In short, the program first creates some HBase tables with two column families, then starts a number of threads that write data for one minute using either the getConnection strategy (CREATEWITHCONF) or the createConnection strategy (CREATEWITHCONN). How many tables to write, for how long, what to write, and how to write it are all configurable; for instance, the code supports both writing one fixed table per thread (SEPERATE) and picking a table at random (OVERLAP). Note the block that fetches the column-family information of the target tables up front: getTableDescriptor incurs network traffic, so call it as rarely as possible to avoid needless extra overhead (in practice that overhead is enormous).
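
To underline the getTableDescriptor point, a before/after sketch (row and qualifier names are illustrative):

    HTable table = new HTable(conf, TableName.valueOf("test0"));

    // Slow: getTableDescriptor() is an RPC, so this pays one network
    // round-trip per iteration.
    for (int i = 0; i < 1000; i++) {
        HColumnDescriptor[] fams = table.getTableDescriptor().getColumnFamilies();
        Put p = new Put(Bytes.toBytes("row" + i));
        p.add(fams[0].getName(), Bytes.toBytes("q"), Bytes.toBytes("v"));
        table.put(p);
    }

    // Fast: fetch the descriptor once and reuse it.
    HColumnDescriptor[] fams = table.getTableDescriptor().getColumnFamilies();
    for (int i = 0; i < 1000; i++) {
        Put p = new Put(Bytes.toBytes("row" + i));
        p.add(fams[0].getName(), Bytes.toBytes("q"), Bytes.toBytes("v"));
        table.put(p);
    }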

The detailed numbers are shown in the table below; exact values vary somewhat with network jitter and the like. Overall, at higher concurrency (more than 30 threads) getConnection is clearly faster than createConnection, whereas at low concurrency (10 threads or fewer) createConnection has a slight edge. Furthermore, with getConnection, writing a single table is clearly faster under high concurrency than writing many tables, though at low concurrency the difference is not obvious; with createConnection, writing a single table is about as fast as writing many tables regardless of concurrency, and sometimes even slightly slower.

These observations do not fully match the code analysis. There are two main discrepancies:

(1) Why does createConnection win when there are few threads? In theory they should be on par. I have no good explanation for this; it remains an open question;

(2) Why is createConnection so much slower when there are many threads? My guess is that ZK on the server side becomes overloaded maintaining the large number of connections, and even though multiple RegionServers handle the actual writes, performance still degrades.

Both points await further investigation. Even so, the practical recommendation stands: when using the Java API with HBase, and especially under high concurrency, prefer creating HTable objects via the getConnection route, and avoid keeping unnecessary connections around and wasting resources.

thread_count  table_count  conn_strategy  write_strategy  interval  result (rows per thread * threads)
1             1            CONF           OVERLAP         60s       10000*1=10000
5             1            CONF           OVERLAP         60s       11000*5=55000
10            1            CONF           OVERLAP         60s       12000*10=120000
30            1            CONF           OVERLAP         60s       8300*30=249000
60            1            CONF           OVERLAP         60s       6000*60=360000
100           1            CONF           OVERLAP         60s       4700*100=470000
1             1            CONN           OVERLAP         60s       12000*1=12000
5             1            CONN           OVERLAP         60s       16000*5=80000
10            1            CONN           OVERLAP         60s       10000*10=100000
30            1            CONN           OVERLAP         60s       2500*30=75000
60            1            CONN           OVERLAP         60s       1200*60=72000
100           1            CONN           OVERLAP         60s       1000*100=100000
5             5            CONF           SEPERATE        60s       10600*5=53000
10            10           CONF           SEPERATE        60s       11900*10=119000
30            30           CONF           SEPERATE        60s       6900*30=207000
60            60           CONF           SEPERATE        60s       3650*60=219000
100           100          CONF           SEPERATE        60s       2500*100=250000
5             5            CONN           SEPERATE        60s       14000*5=70000
10            10           CONN           SEPERATE        60s       10500*10=105000
30            30           CONN           SEPERATE        60s       3250*30=97500
60            60           CONN           SEPERATE        60s       1450*60=87000
100           100          CONN           SEPERATE        60s       930*100=93000
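
To close, a minimal sketch of the recommended pattern under the assumptions above (placeholder quorum, a test table with family col1): every thread builds its HTable from the same Configuration contents, so all of them share one HConnection via getConnection.

    final Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.zookeeper.quorum", "XX.XXX.X.XX");
    conf.set("hbase.zookeeper.property.clientPort", "2181");

    Runnable writer = new Runnable() {
        @Override
        public void run() {
            try {
                // Shares the process-wide HConnection keyed by conf's contents.
                HTable table = new HTable(conf, TableName.valueOf("test0"));
                try {
                    Put p = new Put(Bytes.toBytes(Thread.currentThread().getName()));
                    p.add(Bytes.toBytes("col1"), Bytes.toBytes("q"), Bytes.toBytes("v"));
                    table.put(p);
                } finally {
                    // Releases this handle; the shared connection is
                    // reference-counted and survives for the other threads.
                    table.close();
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    };

    for (int i = 0; i < 100; i++) {
        new Thread(writer).start();
    }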
