1. Preliminary Preparation
1.1) Software
- Download Hadoop; this walkthrough uses 3.3.0. Download address: mirrors.hust.edu.cn/apache/hado…
- Prepare a JDK; the version used here is 1.8.0_262. JDK installation itself is not covered here, as there are plenty of guides online.
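As a quick sanity check that the JDK is in place, a minimal sketch (the JAVA_HOME path below is only an example and must point at your actual install directory):
java -version
export JAVA_HOME=/usr/local/jdk1.8.0_262
export PATH=$JAVA_HOME/bin:$PATH
The two export lines only matter if JAVA_HOME is not already set; they can go into /etc/profile alongside the Hadoop variables added later.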
1.2) Passwordless SSH login with a public/private key pair
First run
ssh localhost
If you are prompted for a password, passwordless login is not enabled yet; enter the password and continue with the steps below. If there is no password prompt, you can skip the rest of this subsection.
ssh-keygen -t rsa
Run the command above and keep pressing Enter (or answer y/yes if asked; in my case it was Enter all the way, with no y/yes prompt). Then run the following commands, one after the other and in this order:
cd .ssh
touch authorized_keys
chmod 600 authorized_keys
Append the public key to the authorized_keys file:
cat id_rsa.pub >> authorized_keys
Once that is done, run the following command again to check the result:
ssh localhost
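For reference, the whole key setup above can also be done non-interactively; a minimal sketch, assuming no passphrase is wanted:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh localhost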
2. Setting Up Hadoop
2.1) Unpack the Hadoop archive
It is best to keep the archive in a dedicated directory; that makes the later steps easier.
tar -zxvf hadoop-3.3.0.tar.gz
After extraction, the files end up in the same directory as the archive (seasoned Linux users can skip past that remark). In that directory, create the following directories for later use:
data_tmp
|-data_1
|-data_2
Where exactly you create these directories is a matter of personal preference; just remember your choice, because the example paths later in this article refer to them.
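For reference, one way to create them, assuming you are in the directory that holds the Hadoop archive:
mkdir -p data_tmp/data_1 data_tmp/data_2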
2.2) Update the environment variables
vim /etc/profile
Add the following two lines at the very end of the file:
export HADOOP_HOME=【your Hadoop extraction directory】
export PATH=.:$HADOOP_HOME/bin:$PATH
Save and exit, then make the configuration take effect:
source /etc/profile
Try the following command to see whether it worked:
hadoop version
If version information like the following appears, the environment variables are configured correctly.
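The exact output depends on the build, but it should start roughly like this, followed by a few lines describing the source repository, build user/date, and checksum:
Hadoop 3.3.0
...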
2.3) Edit the core-site.xml file
File path: /tools/hadoop/hadoop-3.3.0/etc/hadoop. Change the file contents to the following:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<!-- Specify the base directory -->
<value>file:【directory containing the archive】/data_tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<!-- Address of the master node in the cluster, with a port of your choosing; in this pseudo-distributed setup the master node is simply this machine -->
<value>hdfs://localhost:【your chosen port】</value>
</property>
</configuration>
2.4) Edit the hdfs-site.xml file
File path: /tools/hadoop/hadoop-3.3.0/etc/hadoop
Change the file contents to the following:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<!-- In a real cluster this would be 3 (the usual replication factor); since this pseudo-distributed setup has only one machine, it has to be 1 -->
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>【directory containing the archive】/data_tmp/data_1</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>【directory containing the archive】/data_tmp/data_2</value>
</property>
</configuration>
2.5) Edit the mapred-site.xml file
File path: /tools/hadoop/hadoop-3.3.0/etc/hadoop. Change the file contents to the following:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
2.6) Edit the yarn-site.xml file
File path: /tools/hadoop/hadoop-3.3.0/etc/hadoop. Change the file contents to the following:
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- Site specific YARN configuration properties -->
<!-- Designate the leader, i.e. the ResourceManager host -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>
<!-- Configure the NodeManager auxiliary service: intermediate map output is handed off to reduce via the shuffle mechanism -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
3. Starting Hadoop
Since this instance is only for personal experimentation and development/debugging, the firewall can simply be turned off.
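For example, on a firewalld-based distribution such as CentOS this would be (a sketch; use your own distribution's tool, e.g. ufw on Ubuntu):
systemctl stop firewalld
systemctl disable firewalld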
3.1) Initialization
Run the following command:
hadoop namenode -format
Note: if this complains that the hadoop command does not exist, check whether the environment variables configured earlier actually took effect. (In Hadoop 3.x the command above is deprecated in favor of hdfs namenode -format; either form works for this purpose.)
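A quick way to sanity-check the variables (a sketch):
echo $HADOOP_HOME
which hadoop
If either comes back empty, re-run source /etc/profile or open a new shell.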
3.2) Startup
Go into the Hadoop extraction directory; the startup scripts are under sbin.
./sbin/start-dfs.sh
./sbin/start-yarn.sh
When these finish, check with the jps command; if the Hadoop daemons all show up, startup succeeded.
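For a pseudo-distributed setup like this one, the jps list should contain roughly the following (process IDs omitted, order may vary):
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Jps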
3.3) Checking in a browser
From Hadoop 3.0.0 onward, the default WebUI port changed from 50070 to 9870.
In a browser, open: 【VM IP】:9870
If the NameNode overview page comes up, it worked; the UI is, as ever, very green. (PS: I honestly have no idea why these big-name developers love green so much.)
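To check reachability from inside the VM first, a sketch (note that the YARN ResourceManager UI is served separately, on port 8088 by default):
curl -I http://localhost:9870
curl -I http://localhost:8088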
4. Common Problems
4.1) Startup errors
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [localhost.localdomain]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
If you run into this problem, adjust the following files (look in the sbin folder under the Hadoop installation directory):
Add the following at the top of both start-dfs.sh and stop-dfs.sh:
#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
Add the following at the top of both start-yarn.sh and stop-yarn.sh:
#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
Save the changes and run the start scripts again:
./sbin/start-dfs.sh
./sbin/start-yarn.sh
4.2) After starting Hadoop, jps shows only one process
The fix for this problem came from the internet; it is recorded here for reference.
After finishing the configuration I started Hadoop straight away and ran jps, and saw only the Jps process itself. At first I ignored the startup output, which kept hinting at configuration errors, and on top of the configuration above I made the following extra changes. (As it turned out, none of them were actually needed, but they are listed here so that my final setup matches this article.)
Edited hadoop-env.sh and set the HADOOP_HOME variable.
Removed the extra / after 9000 in core-site.xml.
Added the following to mapred-site.xml:
<property>
<name>mapreduce.admin.user.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_COMMON_HOME</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_COMMON_HOME</value>
</property>
Then it dawned on me that the real cause was that I had never entered a password: the startup scripts do not prompt for one, so passwordless SSH login must be configured before starting Hadoop.