Hadoop Cluster Deployment Guide
# Basic Environment & Software
### Software versions

* jdk-1.8u111-x64.tar.gz
* hbase-1.1.8.tar.gz
* apache-hive-1.1.1.tar.gz
* hadoop-2.6.5.tar.gz
* zookeeper-3.4.6.tar.gz
### Environment

- OS: CentOS release 6.6, x64
- 3 servers:
  - server1: hostname bigdata1.localdomain, ip 169.24.2.100
  - server2: hostname bigdata2.localdomain, ip 169.24.2.102
  - server3: hostname bigdata3.localdomain, ip 169.24.2.103
# Update /etc/hosts

Log in to each server as root and run the following commands:

```bash
echo "169.24.2.100 bigdata1.localdomain bigdata1" >> /etc/hosts
echo "169.24.2.102 bigdata2.localdomain bigdata2" >> /etc/hosts
echo "169.24.2.103 bigdata3.localdomain bigdata3" >> /etc/hosts
```
# Create the hadoop user

Log in to each server as root and run:

```bash
useradd hadoop
passwd hadoop  # set the hadoop user's password
```
Create the Hadoop installation directory and give the hadoop user ownership of it:

```bash
mkdir /hadoop_install
chown -R hadoop:hadoop /hadoop_install
```
# Set up passwordless SSH login

Log in to bigdata1 as the hadoop user and run:

```bash
ssh-keygen -t rsa  # leave the passphrase empty so remote commands do not prompt for one
chmod 700 ~/.ssh/authorized_keys
```
Distribute id_rsa.pub to the other two servers:

```bash
ssh hadoop@bigdata2 "cat >> ~/.ssh/authorized_keys" < ~/.ssh/id_rsa.pub  # this step still prompts for the remote password
ssh hadoop@bigdata2 uname  # verify that no password is required anymore
```

Repeat the same steps on the other two servers.
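To save some typing, the distribution step can be wrapped in a loop. A minimal sketch; the `mkdir -p` and the 600 permission are additions beyond the original commands, included so the step also works when the remote `.ssh` directory does not exist yet:

```bash
# Push the public key to both remote servers in one pass.
# Each first ssh call prompts for the hadoop user's remote password once.
for host in bigdata2 bigdata3; do
  ssh hadoop@$host "mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys" < ~/.ssh/id_rsa.pub
  ssh hadoop@$host uname  # should print Linux without asking for a password
done
```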
# Time synchronization

This is critical: make sure the servers' clocks are synchronized, otherwise HBase will fail to start.
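The original notes do not record which mechanism was used. A minimal sketch for CentOS 6 using ntpd; the pool.ntp.org server is an assumption, substitute an internal NTP server on an isolated network:

```bash
# Run as root on every server (CentOS 6).
yum install -y ntp ntpdate
ntpdate pool.ntp.org   # one-off sync; replace with an internal NTP server if needed
service ntpd start     # keep the clock in sync from now on
chkconfig ntpd on      # start ntpd at boot
date                   # compare the output across all three servers
```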
# Install the base software

All software is installed under /hadoop_install; make sure the system disk has enough free space.
Extract each of the following archives:
- jdk-1.8u111-x64.tar.gz
- hbase-1.1.8.tar.gz
- apache-hive-1.1.1.tar.gz
- hadoop-2.6.5.tar.gz
into the /hadoop_install directory, then make sure the permissions are correct:

```bash
chown -R hadoop:hadoop /hadoop_install
```
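A minimal extraction sketch, assuming the archives were first copied to /tmp (adjust the source path as needed); zookeeper-3.4.6.tar.gz from the software list above is extracted here as well:

```bash
# Extract every archive into /hadoop_install and hand it to the hadoop user.
for f in jdk-1.8u111-x64 hadoop-2.6.5 hbase-1.1.8 apache-hive-1.1.1 zookeeper-3.4.6; do
  tar -xzf /tmp/$f.tar.gz -C /hadoop_install  # /tmp is an assumed staging location
done
chown -R hadoop:hadoop /hadoop_install
```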
### Configure Hadoop
Create the data directory:

```bash
mkdir -p /hadoop
chown hadoop:hadoop /hadoop
```
core-site.xml:

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://169.24.2.100:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
  </property>
</configuration>
```
hdfs-site.xml:

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>hadoop-cluster1</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>169.24.2.100:50090</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/hadoop/hdfs/data/</value>
  </property>
</configuration>
```
yarn-site.xml:

```xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>169.24.2.100</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>169.24.2.100:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>169.24.2.100:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>169.24.2.100:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>169.24.2.100:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>169.24.2.100:8088</value>
  </property>
</configuration>
```
mapred-site.xml:

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <final>true</final>
  </property>
  <property>
    <name>mapreduce.jobtracker.http.address</name>
    <value>169.24.2.100:50030</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>169.24.2.100:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>169.24.2.100:19888</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>http://169.24.2.100:9001</value>
  </property>
</configuration>
```
### Configure HBase

Create the directory /hadoop/hbase:

```bash
mkdir /hadoop/hbase
```
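The original notes stop at the data directory and do not record the hbase-site.xml that was used. A minimal fully-distributed sketch that is consistent with the addresses used elsewhere in this guide might look like the following; every value here is an assumption, so adjust it to the actual cluster:

```xml
<!-- Hypothetical hbase-site.xml sketch; none of these values come from the
     original notes, they are inferred from addresses used in this guide. -->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://169.24.2.100:9000/hbase</value>  <!-- assumes HBase data lives in HDFS -->
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>bigdata1,bigdata2,bigdata3</value>  <!-- the ZooKeeper ensemble configured below -->
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>/hadoop/hbase</value>  <!-- assumes this is what the directory above is for -->
  </property>
</configuration>
```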
### Configure ZooKeeper

zoo.cfg

Create the file yourself if it does not exist (zoo.cfg), with the following contents:

```
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/hadoop/zookeeper
# the port at which the clients will connect
clientPort=2181
server.1=169.24.2.100:2888:3888
server.2=169.24.2.102:2888:3888
server.3=169.24.2.103:2888:3888
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
```
Notes:

- On each machine you must manually create a myid file in the directory that dataDir points to; its content must match that machine's server.x id. For example, for server.1=169.24.2.100, create a myid file on 169.24.2.100 containing 1; the others follow the same pattern, as sketched below.
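A minimal sketch of creating the myid files, using the dataDir from zoo.cfg above:

```bash
# On each server, create the dataDir first, then write that server's own id.
mkdir -p /hadoop/zookeeper
echo 1 > /hadoop/zookeeper/myid   # run this line on 169.24.2.100 (server.1)
echo 2 > /hadoop/zookeeper/myid   # run this line on 169.24.2.102 (server.2)
echo 3 > /hadoop/zookeeper/myid   # run this line on 169.24.2.103 (server.3)
```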
Once configured, run the following on each machine:

```bash
zkServer.sh start
```
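To confirm the ensemble formed correctly, check the status on each node; one node should report itself as leader and the other two as followers:

```bash
zkServer.sh status  # expect "Mode: leader" on one node, "Mode: follower" on the others
```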
### Configure Hive

Every node in the cluster needs the same configuration.

- hive-site.xml is configured as follows:

```xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://169.24.2.40:3306/metastore_db?characterEncoding=UTF-8</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>dev</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>1qaz2wsx</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://169.24.2.100:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
  </property>
</configuration>
```
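The com.mysql.jdbc.Driver class referenced above is not bundled with Hive, so the MySQL JDBC connector jar has to be placed on Hive's classpath. A minimal sketch; both the jar version and the extracted Hive directory name are assumptions:

```bash
# Copy the separately downloaded MySQL JDBC connector into Hive's lib directory.
# Adjust the jar version and the Hive install path to whatever was actually
# extracted under /hadoop_install.
cp mysql-connector-java-5.1.40-bin.jar /hadoop_install/apache-hive-1.1.1/lib/
```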
# Startup order

- Start ZooKeeper on each of the 3 servers:

```bash
zkServer.sh start
```

- Start Hadoop (only on 169.24.2.100):

```bash
start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver
```

- Start HBase (only on 169.24.2.100):

```bash
start-hbase.sh
```

- Start the Hive metastore (only on 169.24.2.100):

```bash
nohup hive --service metastore -p 9083 &
```
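A quick sanity check after startup is to run `jps` on each node and look for the expected daemons. A sketch of what to expect; the exact distribution of worker processes depends on the slaves/regionservers files, which these notes do not record:

```bash
jps
# On 169.24.2.100, expect roughly: NameNode, SecondaryNameNode, ResourceManager,
# JobHistoryServer, HMaster, QuorumPeerMain, and RunJar (the Hive metastore).
# On the worker nodes, expect: DataNode, NodeManager, HRegionServer, QuorumPeerMain.
```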
# Shutdown order

- Stop HBase:

```bash
stop-hbase.sh
```

- Stop the Hive metastore:

```bash
# Find the metastore PID by its listening port and kill it.
netstat -tunlp | grep 9083 | awk '{print $7}' | awk -F '/' '{print $1}' | xargs kill -9
```

- Stop Hadoop:

```bash
mr-jobhistory-daemon.sh stop historyserver
stop-yarn.sh
stop-dfs.sh
```

- Stop ZooKeeper. On each server in the ZooKeeper ensemble, run:

```bash
zkServer.sh stop
```
# Pitfalls encountered

- hbase master is initializing.

  In our case this was caused by the machines' clocks being out of sync.

- Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster

  Kylin reported this error when running a build job on the Hadoop cluster. The cause was that yarn.application.classpath was not configured in yarn-site.xml. Fix it as follows:
```bash
hadoop classpath  # print the classpath to use below
```

Edit $HADOOP_HOME/etc/hadoop/yarn-site.xml, adjusting the paths to your actual installation:
```xml
<property>
  <name>yarn.application.classpath</name>
  <value>/hadoop_install/hadoop-2.6.5/etc/hadoop:/hadoop_install/hadoop-2.6.5/share/hadoop/common/lib/*:/hadoop_install/hadoop-2.6.5/share/hadoop/common/*:/hadoop_install/hadoop-2.6.5/share/hadoop/hdfs:/hadoop_install/hadoop-2.6.5/share/hadoop/hdfs/lib/*:/hadoop_install/hadoop-2.6.5/share/hadoop/hdfs/*:/hadoop_install/hadoop-2.6.5/share/hadoop/yarn/lib/*:/hadoop_install/hadoop-2.6.5/share/hadoop/yarn/*:/hadoop_install/hadoop-2.6.5/share/hadoop/mapreduce/lib/*:/hadoop_install/hadoop-2.6.5/share/hadoop/mapreduce/*:/hadoop_install/hadoop-2.6.5/contrib/capacity-scheduler/*.jar</value>
</property>
```
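After saving the change, restart YARN on 169.24.2.100 so the new classpath takes effect:

```bash
stop-yarn.sh
start-yarn.sh
```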