HDFS Won't Start Today


The error message:

Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /tmp/hive/root/5f712323-0bef-4a24-81d7-e3ed7b719d37. Name node is in safe mode.
The reported blocks 0 needs additional 240 blocks to reach the threshold 0.9990 of total blocks 241.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached. NamenodeHostName:linux121
# (a pile of stack trace omitted)
Caused by: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /tmp/hive/root/5f712323-0bef-4a24-81d7-e3ed7b719d37. Name node is in safe mode.
The reported blocks 0 needs additional 240 blocks to reach the threshold 0.9990 of total blocks 241.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached. NamenodeHostName:linux121
# (a pile of stack trace omitted)
	... 9 more
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /tmp/hive/root/5f712323-0bef-4a24-81d7-e3ed7b719d37. Name node is in safe mode.
The reported blocks 0 needs additional 240 blocks to reach the threshold 0.9990 of total blocks 241.
The number of live datanodes 0 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached. NamenodeHostName:linux121
# (a pile of stack trace omitted)
	... 18 more

The symptom: it cannot create the directory because the NameNode is in safe mode. In other words, it has shut itself in.

Cannot create directory /tmp/hive/root/5f712323-0bef-4a24-81d7-e3ed7b719d37. Name node is in safe mode.

Q: What's the cause?

A: For the NameNode to start up normally, at least 0.9990 of its 241 total blocks, i.e. 240+ blocks, must be reported as present. But right now, for whatever reason, the DataNodes have reported 0 blocks, which is 240 short of that threshold; the message also says the number of live DataNodes is 0. Since the startup threshold cannot be met, the NameNode shuts itself away:

The reported blocks 0 needs additional 240 blocks to reach the threshold 0.9990 of total blocks 241.
The number of live datanodes 0 has reached the minimum number 0

So the gist is that more blocks were lost than the configured threshold allows, and HDFS automatically entered safe mode. Why did this happen in the first place? Because my machine lost power a while back.
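For reference, the two numbers in that message come from standard NameNode settings. A minimal hdfs-site.xml sketch, using the stock Hadoop property names with their default values:

<!-- Fraction of blocks that must be reported before the NameNode exits safe mode -->
<property>
  <name>dfs.namenode.safemode.threshold-pct</name>
  <value>0.999f</value>
</property>

<!-- Minimum number of live DataNodes required to exit safe mode; 0 means no requirement -->
<property>
  <name>dfs.namenode.safemode.min.datanodes</name>
  <value>0</value>
</property>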

Q: How do I coax it back open?

A: According to the message itself, safe mode turns off on its own once the thresholds are reached:

Safe mode will be turned off automatically once the thresholds have been reached
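Incidentally, you can check on, or block until the end of, safe mode with the standard dfsadmin subcommands:

# Print the current safe-mode state
hdfs dfsadmin -safemode get
# Block until the NameNode leaves safe mode on its own
hdfs dfsadmin -safemode wait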

That tells me nothing (with zero live DataNodes, the threshold was never going to be reached on its own), so I had no choice but to ask the internet for help.


First, try to recover the data.

# List the files, blocks, and block locations under the root path /
hdfs fsck / -files -blocks -locations

A ton of output comes out.


Since there is far too much of it, I redirect the output straight into a file to sort through later:

# Same check, but write the results to recover.txt in the current directory
hdfs fsck / -files -blocks -locations > recover.txt
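A quick grep gives a feel for the damage. This assumes, as is usual, that fsck tags lost blocks with the word MISSING:

# Count the lines fsck flagged as MISSING
grep -c "MISSING" recover.txt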

Opening it locally, it is full of entries for my lost blocks.


Then I can try to recover the blocks, one file at a time, with this command:

# Recover a file's blocks
# -path: the path of the file to recover. Note that it must be a file path, not a directory; entries ending in <dir> in the fsck output are not valid arguments here
# -retries: number of retries, default 1
hdfs debug recoverLease -path <path> [-retries <num-retries>]

# For example, to recover /test/input/test.txt:
hdfs debug recoverLease -path /test/input/test.txt


It then reports that the recovery succeeded.


One down, feeling good.

Time for a little happy dance.

But I have a few hundred files up there… do them by hand? Not a chance. So let me write a script and batch it. Of course, I need to tidy up the data first; here is the format of the list I ended up with.

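I built that list by hand, but something like the following one-liner could do it too. This is purely a sketch: it assumes the fsck dump prints each file entry on its own line in the form /path <size> bytes, <n> block(s): ..., so the path is the first field and directory entries (which lack "block(s)") are skipped automatically:

# Pull the path (first field) from lines describing files, stripping any trailing colon
grep "block(s)" recover.txt | awk '{print $1}' | sed 's/:$//' > ~/recover.txt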

One file to recover per line, with blank lines in between (though you don't actually need those). Then write a small shell script: vi recover.sh

#!/bin/bash

# Read the list line by line into the variable line
while read -r line
do
	# If line is not empty...
	if [ -n "$line" ]; then
		# ...run the recovery command with line as the path
		hdfs debug recoverLease -path "$line"
	fi
# The list being read is recover.txt in the home directory
done < ~/recover.txt

Then just run the script: ./recover.sh
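Equivalently, once the list is clean, a one-liner does the same job; this is just an alternative sketch, not what I actually ran:

# Feed every non-empty line of ~/recover.txt to recoverLease, one path per invocation
grep -v '^[[:space:]]*$' ~/recover.txt | xargs -n 1 hdfs debug recoverLease -path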


Did I pull it off in the end? I did not, and I don't even know why (ugh…). I'm not going to dig into it right now though, so let's fall back on a cruder method people suggest online:

# Force the NameNode out of safe mode, then find and delete the corrupted blocks

# Step 1: force-leave safe mode
hdfs dfsadmin -safemode leave

# Step 2: delete the corrupted files
hdfs fsck / -delete
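Afterwards it is worth confirming the result. If all went well, fsck should end its report by declaring the filesystem under / HEALTHY:

# Confirm safe mode is off
hdfs dfsadmin -safemode get
# Re-check the filesystem
hdfs fsck /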

And there is an even cruder one…

# The nuclear option: reformat the entire filesystem (this wipes everything)
hdfs namenode -format
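If you really go down this road, the usual sequence looks roughly like this, assuming the stock Hadoop sbin scripts are on your PATH:

# Stop HDFS first, re-initialize the NameNode metadata, then start clean
stop-dfs.sh
hdfs namenode -format
start-dfs.sh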

That's it for now; off for some tea.

A nice, well-earned cup.
