HDFS wouldn't start today.
The error message:
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /tmp/hive/root/5f712323-0bef-4a24-81d7-e3ed7b719d37. Name node is in safe mode.
The reported blocks 0 needs additional 240 blocks to reach the threshold 0.9990 of total blocks 241.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached. NamenodeHostName:linux121
# (a pile of stack trace omitted)
Caused by: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /tmp/hive/root/5f712323-0bef-4a24-81d7-e3ed7b719d37. Name node is in safe mode.
The reported blocks 0 needs additional 240 blocks to reach the threshold 0.9990 of total blocks 241.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached. NamenodeHostName:linux121
# (a pile of stack trace omitted)
... 9 more
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /tmp/hive/root/5f712323-0bef-4a24-81d7-e3ed7b719d37. Name node is in safe mode.
The reported blocks 0 needs additional 240 blocks to reach the threshold 0.9990 of total blocks 241.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached. NamenodeHostName:linux121
# (a pile of stack trace omitted)
... 18 more
The symptom is that the directory can't be created: the NameNode is in safe mode. In other words, it has shut itself in.
Cannot create directory /tmp/hive/root/5f712323-0bef-4a24-81d7-e3ed7b719d37. Name node is in safe mode.
Q: So what's the cause?
A: For the NameNode to come out of safe mode normally, at least 0.9990 of the 241 total blocks, i.e. 240 of them, must be reported back by the DataNodes. But for some reason the reported block count is 0 (and the live DataNode count is 0 as well; both zeros are counts, not a node named "DataNode 0"), which leaves it 240 blocks short of the threshold. Since the threshold can't be met, it shut itself in:
The reported blocks 0 needs additional 240 blocks to reach the threshold 0.9990 of total blocks 241.
The number of live datanodes 0 has reached the minimum number 0
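Spelling out the arithmetic behind that message (the truncation matches how the message itself counts, since it asks for 240 more blocks, not 241):
# 241 total blocks × 0.9990 = 240.759, truncated → 240 blocks required
# 240 required − 0 reported = 240 additional blocks needed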
So the gist is that more DataNode blocks went missing than the configured threshold allows, and HDFS automatically dropped into safe mode. Why did that happen? Because my machine lost power a while back.
Q: So how do I coax it back out?
A: Per the log itself, safe mode will turn off automatically once the thresholds are reached:
Safe mode will be turned off automatically once the thresholds have been reached
Which is as good as saying nothing, so off to ask the internet.
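Before doing anything else, you can at least confirm the current state; dfsadmin has a get subcommand for exactly this:
# Check whether the NameNode is in safe mode; prints "Safe mode is ON" or "Safe mode is OFF"
hdfs dfsadmin -safemode get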
First, try to recover the data.
# List every file under the root path /, with its blocks and their locations
hdfs fsck / -files -blocks -locations
This spits out a mountain of output. It's far too much to read in the terminal, so I redirected it to a file to sort through:
# Same check on /, with the result written to recover.txt in the current directory
hdfs fsck / -files -blocks -locations > recover.txt
Opening the file locally, it lists every file along with the blocks I lost. Each of those files can then have its blocks recovered with the recoverLease debug command:
# Recover a file's blocks
# -path: the path of the file to recover. Note: it must be a file, not a directory;
#        entries that end with <dir> in the fsck report are directories and won't work here
# -retries: number of retry attempts, default 1
hdfs debug recoverLease -path <path> [-retries <num-retries>]
# For example, to recover the test file shown in the report:
hdfs debug recoverLease -path /test/input/test.txt
It then prints that the recovery succeeded. One file down, feeling good.
But there are a few hundred files in there… do them by hand? Not a chance. Time to write a script and batch it. Of course, the data needs tidying first: the fsck report has to become a plain list of file paths, which I did as sketched below.
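Here's a quick filter for that cleanup, as a sketch; it assumes the per-file lines in the fsck report contain " bytes, " while directory lines end in <dir> (eyeball your own report before trusting it):
# Keep only the file lines, take the first field (the path), then swap the clean list in
grep ' bytes, ' recover.txt | awk '{print $1}' > recover.clean
mv recover.clean recover.txt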
The result is one file path to recover per line. Then write a shell script: vi recover.sh
#!/bin/bash
# Read the list line by line into the variable `line`
while read -r line
do
    # Skip empty lines
    if [ -n "$line" ]; then
        # Run the recovery for the path on this line
        hdfs debug recoverLease -path "$line"
    fi
# The input is the cleaned recover.txt in my home directory
done < ~/recover.txt
Run the script: chmod +x recover.sh && ./recover.sh
And that's that.
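To see whether anything is still broken after the batch run, fsck can report just the corrupt or missing blocks; -list-corruptfileblocks is a standard fsck option:
# Print the list of corrupt/missing blocks and the files they belong to
hdfs fsck / -list-corruptfileblocks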
So did I fix it in the end? Nope, and I don't even know why (ugh…). But I'm not going to dig into it right now, so I turned to another, blunter method people suggest online:
# Force the NameNode out of safe mode, then find the damaged blocks and delete them
# Step 1: force it to leave safe mode
hdfs dfsadmin -safemode leave
# Step 2: delete the corrupted files (they're gone for good, so be sure)
hdfs fsck / -delete
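A sanity check afterwards, reusing the commands from above (the quoted outputs are what a healthy run should print):
# Should now print: Safe mode is OFF
hdfs dfsadmin -safemode get
# The summary at the end should say the filesystem is HEALTHY
hdfs fsck / | tail -n 5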
And there's an even blunter one…
# Nuclear option: reformat the entire filesystem (resets the NameNode metadata, so every file reference is lost)
hdfs namenode -format
That'll do for now. Tea time.