1. Cannot allocate memory

Error message:

Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000005c5330000, 8502706176, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 8502706176 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /root/hs_err_pid9168.log


Log

As the error message above says, the full report was written to /root/hs_err_pid9168.log, so inspect that file.

# View the report
vim /root/hs_err_pid9168.log
# Contents
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 8502706176 bytes for committing reserved memory.
# Possible reasons:
# The system is out of physical RAM or swap space
# In 32 bit mode, the process size limit was hit
# Possible solutions:
# Reduce memory load on the system
# Increase physical memory or swap space
# Check if swap backing store is full
# Use 64 bit Java on a 64 bit OS
# Decrease Java heap size (-Xmx/-Xms)
# Decrease number of Java threads
# Decrease Java thread stack sizes (-Xss)
# Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
#
# Out of Memory Error (os_linux.cpp:2743), pid=9168, tid=0x00007f22fdcce700
#
# JRE version: (8.0_191-b12) (build )
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.191-b12 mixed mode linux-amd64 compressed oops)
# Core dump written. Default location: /root/core or core.9168
#

---------------  T H R E A D  ---------------

Current thread (0x00007f22f4016000):  JavaThread "Unknown thread" [_thread_in_vm, id=9255, stack(0x00007f22fdbcf000,0x00007f22fdccf000)]

Stack: [0x00007f22fdbcf000,0x00007f22fdccf000],  sp=0x00007f22fdccd4c0,  free space=1017k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0xace425] VMError::report_and_die()+0x2c5
V [libjvm.so+0x4deb77] report_vm_out_of_memory(char const*, int, unsigned long, VMErrorType, char const*)+0x67
V [libjvm.so+0x90c570] os::pd_commit_memory(char*, unsigned long, unsigned long, bool)+0x100
V [libjvm.so+0x903eaf] os::commit_memory(char*, unsigned long, unsigned long, bool)+0x1f
V [libjvm.so+0xaca93c] VirtualSpace::initialize(ReservedSpace, unsigned long)+0x20c
V [libjvm.so+0x5ea477] CardGeneration::CardGeneration(ReservedSpace, unsigned long, int, GenRemSet*)+0xc7
V [libjvm.so+0x5eb842] GenerationSpec::init(ReservedSpace, int, GenRemSet*)+0x182
V [libjvm.so+0x5d699f] GenCollectedHeap::initialize()+0x20f
V [libjvm.so+0xa922ba] Universe::initialize_heap()+0x16a
V [libjvm.so+0xa92593] universe_init()+0x33
V [libjvm.so+0x62f0f0] init_globals()+0x50
V [libjvm.so+0xa74c57] Threads::create_vm(JavaVMInitArgs*, bool*)+0x257
V [libjvm.so+0x6d49ff] JNI_CreateJavaVM+0x4f
C [libjli.so+0x7e74] JavaMain+0x84
C  [libpthread.so.0+0x7dd5]  start_thread+0xc5

---------------  P R O C E S S  ---------------

Java Threads: ( => current thread )

Other Threads:
=>0x00007f22f4016000 (exited) JavaThread "Unknown thread" [_thread_in_vm, id=9255, stack(0x00007f22fdbcf000,0x00007f22fdccf000)]

VM state:not at safepoint (not fully initialized)

VM Mutex/Monitor currently owned by a thread: None

GC Heap History (0 events):
No events

Deoptimization events (0 events):
No events

Classes redefined (0 events):
No events

Internal exceptions (0 events):
No events

Events (0 events):
No events

Dynamic libraries:
00400000-00401000 r-xp 00000000 fd:00 202099235 /usr/local/soft/jdk/jdk1.8.0_191/bin/java
00600000-00601000 r--p 00000000 fd:00 202099235 /usr/local/soft/jdk/jdk1.8.0_191/bin/java
00601000-00602000 rw-p 00001000 fd:00 202099235 /usr/local/soft/jdk/jdk1.8.0_191/bin/java
00dc0000-00dfa000 rw-p 00000000 00:00 0 [heap]
5c0000000-5c5330000 rw-p 00000000 00:00 0
7f22e5000000-7f22e5270000 rwxp 00000000 00:00 0
7f22e5270000-7f22f4000000 ---p 00000000 00:00 0
7f22f4000000-7f22f4043000 rw-p 00000000 00:00 0
7f22f4043000-7f22f8000000 ---p 00000000 00:00 0
7f22f9b03000-7f22f9ec1000 rw-p 00000000 00:00 0
7f22f9ec1000-7f22fae97000 ---p 00000000 00:00 0
7f22fae97000-7f22fae98000 rw-p 00000000 00:00 0
7f22fae98000-7f22fae99000 ---p 00000000 00:00 0
7f22fae99000-7f22fafa3000 rw-p 00000000 00:00 0
7f22fafa3000-7f22fb359000 ---p 00000000 00:00 0
7f22fb359000-7f22fb373000 r-xp 00000000 fd:00 134320156 /usr/local/soft/jdk/jdk1.8.0_191/jre/lib/amd64/libzip.so
7f22fb373000-7f22fb573000 ---p 0001a000 fd:00 134320156 /usr/local/soft/jdk/jdk1.8.0_191/jre/lib/amd64/libzip.so
7f22fb573000-7f22fb574000 r--p 0001a000 fd:00 134320156 /usr/local/soft/jdk/jdk1.8.0_191/jre/lib/amd64/libzip.so
7f22fb574000-7f22fb575000 rw-p 0001b000 fd:00 134320156 /usr/local/soft/jdk/jdk1.8.0_191/jre/lib/amd64/libzip.so
7f22fb575000-7f22fb581000 r-xp 00000000 fd:00 1048275 /usr/lib64/libnss_files-2.17.so
7f22fb581000-7f22fb780000 ---p 0000c000 fd:00 1048275 /usr/lib64/libnss_files-2.17.so
7f22fb780000-7f22fb781000 r--p 0000b000 fd:00 1048275 /usr/lib64/libnss_files-2.17.so
7f22fb781000-7f22fb782000 rw-p 0000c000 fd:00 1048275 /usr/lib64/libnss_files-2.17.so
7f22fcdb2000-7f22fcfb2000 ---p 00ce2000 fd:00 19060 /usr/local/soft/jdk/jdk1.8.0_191/jre/lib/amd64/server/libjvm.so
7f22fcfb2000-7f22fd048000 r--p 00ce2000 fd:00 19060 /usr/local/soft/jdk/jdk1.8.0_191/jre/lib/amd64/server/libjvm.so
7f22fd048000-7f22fd079000 rw-p 00d78000 fd:00 19060 /usr/local/soft/jdk/jdk1.8.0_191/jre/lib/amd64/server/libjvm.so
7f22fd079000-7f22fd0b4000 rw-p 00000000 00:00 0
7f22fd476000-7f22fd47a000 r--p 001c2000 fd:00 34193 /usr/lib64/libc-2.17.so
7f22fd47a000-7f22fd47c000 rw-p 001c6000 fd:00 34193 /usr/lib64/libc-2.17.so
7f22fd47c000-7f22fd481000 rw-p 00000000 00:00 0
7f22fd481000-7f22fd483000 r-xp 00000000 fd:00 34199 /usr/lib64/libdl-2.17.so
7f22fd483000-7f22fd683000 ---p 00002000 fd:00 34199 /usr/lib64/libdl-2.17.so
7f22fd683000-7f22fd684000 r--p 00002000 fd:00 34199 /usr/lib64/libdl-2.17.so
7f22fd684000-7f22fd685000 rw-p 00003000 fd:00 34199 /usr/lib64/libdl-2.17.so
7f22fd685000-7f22fd69c000 r-xp 00000000 fd:00 67296956 /usr/local/soft/jdk/jdk1.8.0_191/lib/amd64/jli/libjli.so
7f22fd69c000-7f22fd89b000 ---p 00017000 fd:00 67296956 /usr/local/soft/jdk/jdk1.8.0_191/lib/amd64/jli/libjli.so
7f22fd89b000-7f22fd89c000 r--p 00016000 fd:00 67296956 /usr/local/soft/jdk/jdk1.8.0_191/lib/amd64/jli/libjli.so
7f22fd89c000-7f22fd89d000 rw-p 00017000 fd:00 67296956 /usr/local/soft/jdk/jdk1.8.0_191/lib/amd64/jli/libjli.so
7f22fd89d000-7f22fd8b4000 r-xp 00000000 fd:00 1051052 /usr/lib64/libpthread-2.17.so
7f22fd8b4000-7f22fdab3000 ---p 00017000 fd:00 1051052 /usr/lib64/libpthread-2.17.so
7f22fdab3000-7f22fdab4000 r--p 00016000 fd:00 1051052 /usr/lib64/libpthread-2.17.so
7f22fdab4000-7f22fdab5000 rw-p 00017000 fd:00 1051052 /usr/lib64/libpthread-2.17.so
7f22fdab5000-7f22fdab9000 rw-p 00000000 00:00 0
7f22fdab9000-7f22fdadb000 r-xp 00000000 fd:00 33871 /usr/lib64/ld-2.17.so
7f22fdbc6000-7f22fdbce000 rw-s 00000000 fd:00 503948 /tmp/hsperfdata_root/9168
7f22fdbce000-7f22fdbd2000 ---p 00000000 00:00 0
7f22fdbd2000-7f22fdcd3000 rw-p 00000000 00:00 0
7f22fdcd4000-7f22fdcd8000 rw-p 00000000 00:00 0
7f22fdcd8000-7f22fdcd9000 r--p 00000000 00:00 0
7f22fdcd9000-7f22fdcda000 rw-p 00000000 00:00 0
7f22fdcda000-7f22fdcdb000 r--p 00021000 fd:00 33871 /usr/lib64/ld-2.17.so
7f22fdcdb000-7f22fdcdc000 rw-p 00022000 fd:00 33871 /usr/lib64/ld-2.17.so
7f22fdcdc000-7f22fdcdd000 rw-p 00000000 00:00 0
7fff013af000-7fff013d3000 rw-p 00000000 00:00 0 [stack]
7fff013ec000-7fff013ee000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

VM Arguments:
Launcher Type: SUN_STANDARD

Environment Variables:
JAVA_HOME=/usr/local/soft/jdk/jdk1.8.0_191
LD_LIBRARY_PATH=:/opt/hadoop-3.1.2/lib:/opt/hadoop-3.1.2/lib/native
SHELL=/bin/bash

Signal Handlers:
SIGSEGV: [libjvm.so+0xaced60], sa_mask[0]=11111111011111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
SIGBUS: [libjvm.so+0xaced60], sa_mask[0]=11111111011111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
SIGFPE: [libjvm.so+0x907ca0], sa_mask[0]=11111111011111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
SIGPIPE: [libjvm.so+0x907ca0], sa_mask[0]=11111111011111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
SIGXFSZ: [libjvm.so+0x907ca0], sa_mask[0]=11111111011111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
SIGILL: [libjvm.so+0x907ca0], sa_mask[0]=11111111011111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
SIGUSR1: SIG_DFL, sa_mask[0]=00000000000000000000000000000000, sa_flags=none
SIGUSR2: [libjvm.so+0x907b70], sa_mask[0]=00000000000000000000000000000000, sa_flags=SA_RESTART|SA_SIGINFO
SIGHUP: SIG_DFL, sa_mask[0]=00000000000000000000000000000000, sa_flags=none
SIGINT: SIG_DFL, sa_mask[0]=00000000000000000000000000000000, sa_flags=none
SIGTERM: SIG_DFL, sa_mask[0]=00000000000000000000000000000000, sa_flags=none
SIGQUIT: SIG_DFL, sa_mask[0]=00000000000000000000000000000000, sa_flags=none

---------------  S Y S T E M  ---------------

OS:CentOS Linux release 7.5.1804 (Core)

uname:Linux 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24 UTC 2018 x86_64
libc:glibc 2.17 NPTL 2.17
rlimit: STACK 8192k, CORE infinity, NPROC 71318, NOFILE 102400, AS infinity
load average:4.18 4.20 4.11

/proc/meminfo:
MemTotal: 18281924 kB
MemFree: 199928 kB
MemAvailable: 1747332 kB
Buffers: 0 kB
Cached: 1806360 kB
SwapCached: 2488 kB
Active: 15057324 kB
Inactive(anon): 1729572 kB
Active(file): 865640 kB
Inactive(file): 864956 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 5238780 kB
SwapFree: 5196284 kB
Dirty: 11104 kB
Writeback: 0 kB
AnonPages: 15843208 kB
Mapped: 70700 kB
Shmem: 75776 kB
Slab: 202108 kB
SReclaimable: 156948 kB
SUnreclaim: 45160 kB
KernelStack: 14960 kB
PageTables: 38372 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 14379740 kB
HardwareCorrupted: 0 kB
AnonHugePages: 2625536 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 81792 kB
DirectMap2M: 5111808 kB
DirectMap1G:    13631488 kB

container (cgroup) information:
container_type: cgroupv1
cpu_cpuset_cpus: 0
cpu_memory_nodes: 0
active_processor_count: 1
cpu_quota: -1
cpu_period: 100000
cpu_shares: -1
memory_limit_in_bytes: -1
memory_and_swap_limit_in_bytes: -1
memory_soft_limit_in_bytes: -1
memory_usage_in_bytes: 18073210880
memory_max_usage_in_bytes: 0

CPU:total 1 (initial active 1) (1 cores per cpu, 1 threads per core) family 6 model 85 stepping 4, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, avx, avx2, aes, clmul, erms, rtm, 3dnowpref, lzcnt, tsc, tscinvbit, bmi1, bmi2, adx

/proc/cpuinfo:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 85
model name : Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz
stepping : 4
microcode : 0x200004d
cpu MHz : 2294.738
cache size : 16896 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec ibpb ibrs stibp arat pku ospke spec_ctrl intel_stibp arch_capabilities
bogomips : 4589.47
clflush size : 64
cache_alignment : 64
address sizes : 43 bits physical, 48 bits virtual
power management:

Memory: 4k page, physical 18281924k(199928k free), swap 5238780k(5196284k free)

vm_info: Java HotSpot(TM) 64-Bit Server VM (25.191-b12) for linux-amd64 JRE (1.8.0_191-b12), built on Oct 6 2018 05:43:09 by "java_re" with gcc 7.3.0

time: Tue Sep 17 09:54:53 2019
elapsed time: 0 seconds (0d 0h 0m 0s)

Cause analysis

This is clearly a memory shortage: the JVM tried to commit 8502706176 bytes (about 8.5 GB) of heap, while /proc/meminfo above shows only about 1.7 GB of MemAvailable. Check memory usage:

free -h

Solution

# Restart HBase and Hadoop
stop-hbase.sh
stop-all.sh
start-all.sh
start-hbase.sh
# Drop the page cache, dentries and inodes (echo 3 covers levels 1 and 2)
sync
echo 3 > /proc/sys/vm/drop_caches
# Check memory usage again
free -h
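
Dropping caches only buys temporary headroom: the JVM was asking for an 8.5 GB heap on a host with far less memory available, so the same crash will recur under load. The more durable fix is to cap the heap so it fits. A minimal sketch for conf/hbase-env.sh, assuming the oversized request came from HBase's heap settings (the 4 GB figure is illustrative; size it to your host):

# conf/hbase-env.sh: cap the JVM heap (value is an assumption; tune to your RAM)
export HBASE_HEAPSIZE=4G
# Alternatively set -Xms/-Xmx directly; a small -Xms avoids committing
# the whole heap up front at startup
export HBASE_OPTS="$HBASE_OPTS -Xms1g -Xmx4g"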

2. Application is added to the scheduler and is not yet activated

When migrating the snapshot again, the job sat there making no progress.

The web UI showed the following:

Application is added to the scheduler and is not yet activated. Skipping AM assignment as cluster resource is empty. Details : 
AM Partition = <DEFAULT_PARTITION>; AM Resource Request = <memory:2048, vCores:1>; Queue Resource Limit for AM = <memory:0, vCores:0>;
User AM Resource Limit of the queue = <memory:0, vCores:0>; Queue AM Resource Usage = <memory:0, vCores:0>;

hbase shell started normally, but every command failed with:

ERROR: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:2977)
at org.apache.hadoop.hbase.master.MasterRpcServices.getCompletedSnapshots(MasterRpcServices.java:949)
at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)

jps showed that HRegionServer was not running.

Check the HRegionServer log:

tailf /opt/hbase-2.1.4/logs/hbase-root-regionserver-hbase2.log -n 500
# Error output

2019-09-17 10:59:22,539 INFO [RS_LOG_REPLAY_OPS-regionserver/hbase2:16020-0] coordination.ZkSplitLogWorkerCoordination: successfully transitioned task /hbase/splitWAL/WALs%2Fhbase2%2C16020%2C1568617738890-splitting%2Fhbase2%252C16020%252C1568617738890.1568684725860 to final state ERR hbase2,16020,1568688376805
2019-09-17 10:59:22,539 INFO [RS_LOG_REPLAY_OPS-regionserver/hbase2:16020-0] handler.WALSplitterHandler: Worker hbase2,16020,1568688376805 done with task org.apache.hadoop.hbase.coordination.ZkSplitLogWorkerCoordination$ZkSplitTaskDetails@9885422 in 200ms. Status = ERR
2019-09-17 10:59:23,158 INFO [SplitLogWorker-hbase2:16020] coordination.ZkSplitLogWorkerCoordination: worker hbase2,16020,1568688376805 acquired task /hbase/splitWAL/WALs%2Fhbase2%2C16020%2C1568617738890-splitting%2Fhbase2%252C16020%252C1568617738890.1568683832472
2019-09-17 10:59:23,179 INFO [RS_LOG_REPLAY_OPS-regionserver/hbase2:16020-1] wal.WALSplitter: Splitting WAL=hdfs://hbase2:9000/hbase/WALs/hbase2,16020,1568617738890-splitting/hbase2%2C16020%2C1568617738890.1568683832472, length=138158040
2019-09-17 10:59:23,183 INFO [RS_LOG_REPLAY_OPS-regionserver/hbase2:16020-1] util.FSHDFSUtils: Recover lease on dfs file hdfs://hbase2:9000/hbase/WALs/hbase2,16020,1568617738890-splitting/hbase2%2C16020%2C1568617738890.1568683832472
2019-09-17 10:59:23,183 INFO [RS_LOG_REPLAY_OPS-regionserver/hbase2:16020-1] util.FSHDFSUtils: Recovered lease, attempt=0 on file=hdfs://hbase2:9000/hbase/WALs/hbase2,16020,1568617738890-splitting/hbase2%2C16020%2C1568617738890.1568683832472 after 0ms
2019-09-17 10:59:23,201 WARN [RS_LOG_REPLAY_OPS-regionserver/hbase2:16020-1-Writer-1] wal.WALSplitter:
Found old edits file. It could be the result of a previous failed split attempt.
Deleting hdfs://hbase2:9000/hbase/default/tsdb/0d0a4577bfb611d1f8f7b903e296b38f/recovered.edits/0000000000007153973-hbase2%2C16020%2C1568617738890.1568683832472.temp, length=0
2019-09-17 10:59:23,222 WARN [RS_LOG_REPLAY_OPS-regionserver/hbase2:16020-1-Writer-0] wal.WALSplitter: Found old edits file.
It could be the result of a previous failed split attempt.
Deleting hdfs://hbase2:9000/hbase/default/tsdb/2501a70608674eab4974e7f8006dac12/recovered.edits/0000000000007214923-hbase2%2C16020%2C1568617738890.1568683832472.temp, length=0
2019-09-17 10:59:23,235 WARN [RS_LOG_REPLAY_OPS-regionserver/hbase2:16020-1-Writer-2] wal.WALSplitter: Found old edits file. It could be the result of a previous failed split attempt. Deleting hdfs://hbase2:9000/hbase/default/tsdb/8d5fe84d54f4170f35d33dff7b830444/recovered.edits/0000000000006173823-hbase2%2C16020%2C1568617738890.1568683832472.temp, length=0
2019-09-17 10:59:23,411 WARN [Thread-8138] hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /hbase/default/tsdb/0d0a4577bfb611d1f8f7b903e296b38f/recovered.edits/0000000000007153973-hbase2%2C16020%2C1568617738890.1568683832472.temp could only be written to 0 of the 1 minReplication nodes. There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2121)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:295)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2702)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:875)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:561)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)

at org.apache.hadoop.ipc.Client.call(Client.java:1476)
at org.apache.hadoop.ipc.Client.call(Client.java:1413)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy19.addBlock(Unknown Source)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:372)
at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1603)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1388)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:554)
2019-09-17 10:59:23,412 ERROR [RS_LOG_REPLAY_OPS-regionserver/hbase2:16020-1-Writer-1] wal.WALSplitter: Got while writing log entry to log
java.io.IOException: File /hbase/default/tsdb/0d0a4577bfb611d1f8f7b903e296b38f/recovered.edits/0000000000007153973-hbase2%2C16020%2C1568617738890.1568683832472.temp could only be written to 0 of the 1 minReplication nodes. There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2121)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:295)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2702)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:875)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:561)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)

at sun.reflect.GeneratedConstructorAccessor8.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.appendBuffer(WALSplitter.java:1601)
at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.append(WALSplitter.java:1559)
at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.writeBuffer(WALSplitter.java:1084)
at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.doRun(WALSplitter.java:1076)
at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.run(WALSplitter.java:1046)
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /hbase/default/tsdb/0d0a4577bfb611d1f8f7b903e296b38f/recovered.edits/0000000000007153973-hbase2%2C16020%2C1568617738890.1568683832472.temp could only be written to 0 of the 1 minReplication nodes. There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2121)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:295)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2702)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:875)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:561)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)

at org.apache.hadoop.ipc.Client.call(Client.java:1476)
at org.apache.hadoop.ipc.Client.call(Client.java:1413)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy19.addBlock(Unknown Source)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:372)
at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1603)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1388)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:554)
2019-09-17 10:59:23,412 ERROR [RS_LOG_REPLAY_OPS-regionserver/hbase2:16020-1-Writer-1] wal.WALSplitter: Exiting thread

The log shows the splitter repeatedly failing on the WALs of server instance 1568617738890, and the underlying HDFS error ("could only be written to 0 of the 1 minReplication nodes") means no datanode would accept the recovered-edits blocks, so the split can never succeed. Since these WALs cannot be replayed anyway, delete them, clear HBase's state in ZooKeeper, and restart (note that this discards any edits that had not yet been flushed to HFiles):

# Stop HBase
stop-hbase.sh
# Inspect the WALs
hdfs dfs -ls -R /hbase/WALs
# Delete the WALs
hdfs dfs -rm -R /hbase/WALs
# Clear HBase's znode in ZooKeeper
zkCli.sh
rmr /hbase
# Start HBase
start-hbase.sh
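
Before resorting to deletion, it is worth confirming what the datanodes look like from HDFS's point of view; a quick check, assuming the standard Hadoop 3 CLI:

# Summarize live/dead datanodes and their remaining capacity; a node with no
# free space is excluded from block placement, which yields exactly the
# "could only be written to 0 of the 1 minReplication nodes" error above
hdfs dfsadmin -report
# Check HDFS health and look for missing or corrupt blocks under /hbase
hdfs fsck /hbase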

Run jps again.

HRegionServer is now up, and commands in hbase shell work again.

3. Unexpected error starting NodeStatusUpdater

Problem

Every snapshot migration stalled at "2019-09-20 14:08:39,184 INFO  [main] mapreduce.Job: Running job: job_1568959466252_0001". Looking the job up in the web UI showed this diagnostic:

Application is added to the scheduler and is not yet activated. Skipping AM assignment as cluster resource is empty. Details :
AM Partition = <DEFAULT_PARTITION>; AM Resource Request = <memory:2048, vCores:1>;
Queue Resource Limit for AM = <memory:0, vCores:0>; User AM Resource Limit of the queue = <memory:0, vCores:0>;
Queue AM Resource Usage = <memory:0, vCores:0>;

Back on the :8088 page, Active Nodes was 0.

On the :50070 page, however, Live Nodes was 2.

At this point I suspected the two worker nodes, so I ran jps on each of them.
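
A quicker way to see what YARN itself thinks of the NodeManagers (assuming the stock yarn CLI) is:

# List every NodeManager and its state (RUNNING, UNHEALTHY, LOST, ...);
# an empty list matches the "cluster resource is empty" diagnostic above
yarn node -list -all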


Seeing ResourceManager running on hbase0, I remembered the yarn.resourcemanager.hostname setting: when the cluster was first set up I had pointed it at each machine's own hostname, which is wrong; every node must point at the ResourceManager host. So I checked the logs on the two worker nodes:

tailf /opt/soft/hadoop/hadoop-3.1.3/hadoop-root-nodemanager-hbase1.log -n 500

Sure enough, the following error appeared:

2019-09-20 11:53:52,458 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Unexpected error starting NodeStatusUpdater
java.net.ConnectException: Call From hbase1/192.168.0.211 to hbase1:8031 failed on connection exception: java.net.ConnectException:
Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.GeneratedConstructorAccessor28.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515)
at org.apache.hadoop.ipc.Client.call(Client.java:1457)
at org.apache.hadoop.ipc.Client.call(Client.java:1367)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(P

On the master node, check port 8031 (the default ResourceManager resource-tracker port that NodeManagers register against):

netstat -ntual | grep 8031

Solution

On the two worker nodes, fix the yarn.resourcemanager.hostname entry in yarn-site.xml:

vim /opt/soft/hadoop/hadoop-3.1.2/etc/hadoop/yarn-site.xml
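
The property must point at the host that actually runs the ResourceManager, which in this cluster is hbase0; a sketch of the corrected entry:

<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>hbase0</value>
</property>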

Verification

After restarting Hadoop and HBase, check port 8031 on the master node again.

Then rerun the snapshot migration (-mappers sets the number of copy tasks; -bandwidth caps the copy bandwidth in MB/s):

hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot snapshot_tsdb_212 -copy-from hdfs://192.168.0.210:9000/hbase -copy-to hdfs://192.168.0.212:9000/hbase -mappers 20 -bandwidth 1024

This time it succeeded, and the web UI now showed Active Nodes = 2.

PS

One final reminder: unlike a pseudo-distributed setup, hbase.rootdir in hbase-site.xml must be identical on every node. My master node is hbase0, and all three nodes carry the following configuration:

<property>
  <name>hbase.rootdir</name>
  <value>hdfs://hbase0:9000/hbase</value>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>hbase0,hbase1,hbase3</value>
</property>
<property>
  <name>hbase.unsafe.stream.capability.enforce</name>
  <value>false</value>
</property>
<property>
  <name>hbase.wal.provider</name>
  <value>filesystem</value>
</property>
<property>
  <name>hbase.tmp.dir</name>
  <value>/opt/soft/hbase/hbase-2.1.4/tmpdata</value>
</property>
<property>
  <name>hfile.block.cache.size</name>
  <value>0.2</value>
</property>
<property>
  <name>hbase.snapshot.enabled</name>
  <value>true</value>
</property>
<property>
  <name>zookeeper.session.timeout</name>
  <value>180000</value>
</property>
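
To keep the file identical everywhere, push the master's copy out to the other nodes; a sketch, with worker hostnames assumed from the earlier sections:

# Run on hbase0; the worker hostnames are assumptions, substitute your own
for h in hbase1 hbase2; do
  scp /opt/soft/hbase/hbase-2.1.4/conf/hbase-site.xml root@$h:/opt/soft/hbase/hbase-2.1.4/conf/
done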

