Download:
hadoop: http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-3.0.0/
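For example, the tarball can be pulled straight from that mirror (the filename matches the archive unpacked below):
wget http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-3.0.0/hadoop-3.0.0.tar.gz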
Preparation:
1. The master node needs passwordless SSH login to the other nodes. This is simple, two commands:
ssh-keygen
ssh-copy-id 10.1.4.58
2. Install the JDK.
3. Configure /etc/hosts (writing it as "ip ip" will keep the datanodes from recognizing the master, as explained below).
4. Disable the firewall (see the example after this list).
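As an illustration for steps 1 and 4 (assuming a CentOS 7 style system, which the root prompts later suggest): the key presumably also needs to be copied to the second slave, and firewalld can be stopped and disabled with systemctl:
ssh-copy-id 10.1.4.59
systemctl stop firewalld
systemctl disable firewalld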
Create a user:
useradd sri_udap
passwd sri_udap
Enter a password.
For convenience, everything is done as root; the new user only serves to keep its directory separate. If permission management is needed later, ownership can be re-assigned then.
Unpack hadoop-3.0.0.tar.gz:
tar -zxvf hadoop-3.0.0.tar.gz
Add the environment variables:
vi /etc/profile
Add the following:
export HADOOP_HOME=/home/sri_udap/app/hadoop-3.0.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source /etc/profile
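To confirm the variables took effect in the current shell, a quick check is:
hadoop version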
Edit the configuration files:
cd /home/sri_udap/app/hadoop-3.0.0/etc/hadoop/
vi hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_121
vi core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://10.1.4.57:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/sri_udap/app/hadoop-3.0.0/temp</value>
    </property>
</configuration>
hdfs-site.xml
<configuration>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>master:50070</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>slave1:50090</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/opt/soft/hadoop-2.7.2/name</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/opt/soft/hadoop-2.7.2/data</value>
    </property>
</configuration>
mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
yarn-site.xml
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>
Create the workers file:
vi workers
Add the following:
10.1.4.58
10.1.4.59
Copy the whole directory to the same location on the other two hosts:
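A minimal sketch with scp (assuming the same /home/sri_udap/app path exists on both slaves):
scp -r /home/sri_udap/app/hadoop-3.0.0 root@10.1.4.58:/home/sri_udap/app/
scp -r /home/sri_udap/app/hadoop-3.0.0 root@10.1.4.59:/home/sri_udap/app/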
Format the NameNode (format it only once; formatting repeatedly can leave the datanodes unrecognized, so if you do need to reformat, delete the data first):
./bin/hdfs namenode -format
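If a reformat really is necessary, first clear the directories configured above (hadoop.tmp.dir, dfs.namenode.name.dir and dfs.datanode.data.dir) on every node, for example:
rm -rf /home/sri_udap/app/hadoop-3.0.0/temp /opt/soft/hadoop-2.7.2/name /opt/soft/hadoop-2.7.2/data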
Start:
[root@10 sbin]# ./start-dfs.sh
It fails with:
Starting namenodes on [10.1.4.57]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [10.1.4.57]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
Add the missing environment variables to hadoop-env.sh:
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
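Note that start-yarn.sh performs the same check when run as root; if YARN is also started as root, the corresponding variables presumably need to be set the same way:
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root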
Start again:
Another error:
Starting namenodes on [10.1.4.57]
Last login: Tue Dec 26 15:06:36 CST 2017 on pts/2
Starting datanodes
10.1.4.58: ERROR: Cannot set priority of datanode process 5752
10.1.4.59: ERROR: Cannot set priority of datanode process 9788
Starting secondary namenodes [10.1.4.58]
Last login: Tue Dec 26 15:10:16 CST 2017 on pts/2
10.1.4.58: secondarynamenode is running as process 5304. Stop it first.
I consulted documentation from every source and could not resolve this problem. It may be related to how the hostnames are written. My /etc/hosts was configured as shown below; with this style the datanodes cannot resolve the master IP at startup (I changed it in a later attempt), and it eventually leaves the cluster with no usable datanodes, so avoid writing it this way:
10.1.4.57 10.1.4.57
10.1.4.58 10.1.4.58
10.1.4.59 10.1.4.59
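A more conventional layout maps each IP to a distinct hostname; for example (the hostnames here are illustrative, chosen to match the master/slave1 names used in hdfs-site.xml and yarn-site.xml above):
10.1.4.57 master
10.1.4.58 slave1
10.1.4.59 slave2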
The attempt failed... Reportedly this problem does not occur in the 2.x releases, so I rolled back the version.
Rolled back to 2.7.2, copied the configuration over, and added two files in the configuration directory etc/hadoop:
vi masters
Content:
10.1.4.57
vi slaves
Content:
10.1.4.58
10.1.4.59
Start:
[root@10 sbin]# ./start-dfs.sh
Starting namenodes on [10.1.4.57]
10.1.4.57: starting namenode, logging to /home/sri_udap/app/hadoop-2.7.2/logs/hadoop-root-namenode-10.1.4.57.out
10.1.4.59: starting datanode, logging to /home/sri_udap/app/hadoop-2.7.2/logs/hadoop-root-datanode-10.1.4.59.out
10.1.4.58: starting datanode, logging to /home/sri_udap/app/hadoop-2.7.2/logs/hadoop-root-datanode-10.1.4.58.out
Starting secondary namenodes [10.1.4.57]
10.1.4.57: starting secondarynamenode, logging to /home/sri_udap/app/hadoop-2.7.2/logs/hadoop-root-secondarynamenode-10.1.4.57.out
[root@10 sbin]# ./start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/sri_udap/app/hadoop-2.7.2/logs/yarn-root-resourcemanager-10.1.4.57.out
10.1.4.59: starting nodemanager, logging to /home/sri_udap/app/hadoop-2.7.2/logs/yarn-root-nodemanager-10.1.4.59.out
10.1.4.58: starting nodemanager, logging to /home/sri_udap/app/hadoop-2.7.2/logs/yarn-root-nodemanager-10.1.4.58.out
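As a quick sanity check, running jps (shipped with the JDK) on each host should list the daemons: given the configuration above, the master should show NameNode, SecondaryNameNode and ResourceManager, and each slave should show DataNode and NodeManager:
jps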
Open http://10.1.4.57:50070 to check the Hadoop cluster status.
View YARN at http://10.1.4.57:8088.
Test:
The test can wait until Hive is installed, since complex Hive statements will generate MapReduce jobs on HDFS.
hdfs dfs -mkdir /input
hdfs dfs -put 1.txt /input
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /input /output
This produces a job: 1.txt is just an arbitrary text file, and we run a word-count job on it.
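To inspect the result once the job finishes, the output can be listed and read back from HDFS (assuming the usual reducer output file name):
hdfs dfs -ls /output
hdfs dfs -cat /output/part-r-00000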