hbase报Dead Region Servers
问题描述:
16010端口启动成功,16020未启动。
hbase-root-regionserver-hbase2.log日志:
2019-08-14 16:45:10,552 WARN [Thread-37] hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /hbase/default/tsdb/3f2398c5b49b581c09687c49a739b007/recovered.edits/0000000000006253152-hbase2%2C16020%2C1562822459462.1565198820284.temp could only be written to 0 of the 1 minReplication nodes. There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2121)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:295)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2702)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:875)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:561)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) at org.apache.hadoop.ipc.Client.call(Client.java:1476)
at org.apache.hadoop.ipc.Client.call(Client.java:1413)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy19.addBlock(Unknown Source)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:372)
at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1603)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1388)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:554)
2019-08-14 16:45:10,568 ERROR [RS_LOG_REPLAY_OPS-regionserver/hbase2:16020-1-Writer-2] wal.WALSplitter: Got while writing log entry to log
java.io.IOException: File /hbase/default/tsdb/3f2398c5b49b581c09687c49a739b007/recovered.edits/0000000000006253152-hbase2%2C16020%2C1562822459462.1565198820284.temp could only be written to 0 of the 1 minReplication nodes. There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2121)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:295)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2702)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:875)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:561)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.appendBuffer(WALSplitter.java:1601)
at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.append(WALSplitter.java:1559)
at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.writeBuffer(WALSplitter.java:1084)
at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.doRun(WALSplitter.java:1076)
at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.run(WALSplitter.java:1046)
16010上日志:

原因:参考网址https://issues.apache.org/jira/browse/HBASE-12426
Description
I initially had a set of 5 region servers which had a single table which was pre split into 30 regions and was evenly distributed to all the regions with data.I then went ahead and removed/decommissioned a coupe of region servers,so in the end I have 3 region servers.Ran hbase hbck and verified there were 0 inconsistencies.However when 'status' command is issued is from hbase shell it shows a dead region server and the same is displayed in master UI as well.Fail over of hbase master did not fix the issue.On investigation we could see some WAL entries which was still pointing to the old region server.
/hbase/WALs/myserver,60020,1406745344969-splitting After removing these orphan entries from hdfs and master failover the dead region servers went away.I wonder if this could have caused any replication issues in the cluster.

解决:删除/hbase/WALs上的分割文件
<!--查看--> hdfs dfs -ls -R /hbase/WALs <!--删除-->
hdfs dfs -rm -R /hbase/WALs/*
<!--重启hbase-->
stop-hbase.sh
start-hbase.sh


16010上验证:

hbase报Dead Region Servers的更多相关文章
- HBase管理与监控——Dead Region Servers
[问题描述] 在持续批量写入HBase的情况下,出现了Dead Region Servers的情况.集群会把dead掉节点上的region自动分发到另外2个节点上,集群还能继续运行,只是少了1个节点. ...
- 应用程序连接hbase报错:java.net.SocketTimeoutException: callTimeout=60000
背景说明: 今天对生产环境hbase增加了节点,下午的时候一个同事反馈,应用程序后台报错,如下: Tue Feb 26 17:35:35 CST 2019, null, java.net.Socket ...
- HBase原理–所有Region切分的细节都在这里了
本文由 网易云发布. 作者:范欣欣(本篇文章仅限内部分享,如需转载,请联系网易获取授权.) Region自动切分是HBase能够拥有良好扩张性的最重要因素之一,也必然是所有分布式系统追求无限 ...
- Eclipse连接HBase 报错:org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
在eclipse中连接到HBase报错org.apache.hadoop.hbase.PleaseHoldException: Master is initializing,搜索了好久,网上其它人说的 ...
- hbase hbck及region RIT处理
hbase hbck主要用来检查hbase集群region的状态以及对有问题的region进行修复. hbase hbck :检查hbase所有表的一致性,如果正常,就会Print OK hbase ...
- hbase集群region数量和大小的影响
1.Region数量的影响 通常较少的region数量可使群集运行的更加平稳,官方指出每个RegionServer大约100个regions的时候效果最好,理由如下: 1)Hbase的一个特性MSLA ...
- 读者来信-5 | 如果你家HBase集群Region太多请点进来看看,这个问题你可能会遇到
前言:<读者来信>是HBase老店开设的一个问答专栏,旨在能为更多的小伙伴解决工作中常遇到的HBase相关的问题.老店会尽力帮大家解决这些问题或帮你发出求救贴,老店希望这会是一个互帮互助的 ...
- 读者来信 | 如果你家HBase集群Region太多请点进来看看,这个问题你可能会遇到
前言:<读者来信>是HBase老店开设的一个问答专栏,旨在能为更多的小伙伴解决工作中常遇到的HBase相关的问题.老店会尽力帮大家解决这些问题或帮你发出求救贴,老店希望这会是一个互帮互助的 ...
- java连接hbase报错
报错信息如下: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the va ...
随机推荐
- Ubuntu 12.04安装Gitlab及问题解决
最近看了下Git,并且之前听同学说过gitlab这个东西,就想自己也搭建一个gitlab,做一个像github那样的代码管理站点,现在的gitlab要安装确实是非常非常方便, https://abou ...
- [2019杭电多校第六场][hdu6635]Nonsense Time
题目链接:http://acm.hdu.edu.cn/showproblem.php?pid=6635 题意是说一开始所有数都冻结,第i秒会解冻第ki个数,求每秒状态下的最长上上升子序列长度. 这种题 ...
- free野指针问题
gdb backtrace内容如下: Program received signal SIGABRT, Aborted. (gdb) p cmd No symbol "cmd" i ...
- OtterTune配置记录
0. 准备两台Ubuntu 18.04的虚拟机,安装mysql(供server-side存储调优数据用)和postgresql(供client-side存储业务数据用,这里以PostgreSQL为例. ...
- js中JSON和JSONP的区别,让你从懵逼到恍然大悟
说到AJAX就会不可避免的面临两个问题,第一个是AJAX以何种格式来交换数据?第二个是跨域的需求如何解决?这两个问题目前都有不同的解决方案,比如数据可以用自定义字符串或者用XML来描述,跨域可以通过服 ...
- ElasticSearch 入门介绍
tags: 第三方 lucene [toc] 1. what Elastic Search(ES)是什么 全文检索和lucene 全文检索 优点:高效,准确,分词全文检索允许用户输入一些关键字,从数据 ...
- CentOS 6.5下源码安装LAMP(Linux+Apache+Mysql+Php)环境
---恢复内容开始--- 一.系统环境 系统平台:CentOS 6.5 (Final) Apache版本:httpd-2.2.31.tar.gz(最新版本2015-07-16) Mysql 版本:my ...
- 一、touch.js
一.touch.js 1.引用链接: <script src="https://cdn.bootcss.com/touchjs/0.2.14/touch.min.js"> ...
- rename 重命名文件
1. 使用范例 范例1: 批量修改文件名 [root@localhost data]# touch {a,b,c,d,e}.txt [root@localhost data]# ls a.txt ...
- oracle汇编03
.long expression1, expression2, ..., expressionNThe .long directive generates a long integer (32-bit ...