DolphinScheduler Installation Steps on Linux

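Download the installation package

Download it directly from the official site: https://dolphinscheduler.apache.org/zh-cn/download/download.html

Follow the official documentation: https://dolphinscheduler.apache.org/zh-cn/docs/1.3.2/user_doc/cluster-deployment.html

I downloaded version 1.3.2: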
apache-dolphinscheduler-incubating-1.3.2-dolphinscheduler-bin.tar.gz

Base environment

OS version: CentOS 6.5
Regular user: hadoop
Home directory: /hadoop
JDK 1.8: /hadoop/app/jdk1.8.0_281
MySQL 5.7.27: /hadoop/app/mysql
ZooKeeper 3.5.6: /hadoop/app/zookeeper-3.5.6
Hadoop 2.7.7: /hadoop/app/hadoop-2.7.7

IP addresses and hostnames of the target machines
192.168.100.10 bigdata01
192.168.100.11 bigdata02
192.168.100.12 bigdata03

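Configure passwordless sudo

As root, configure passwordless sudo on every machine:

vi /etc/sudoers

Add:

hadoop ALL=(ALL) NOPASSWD: ALL

Comment out:

# Defaults requiretty

Or, equivalently:

echo 'hadoop  ALL=(ALL)  NOPASSWD: ALL' >> /etc/sudoers

sed -i 's/Defaults    requiretty/#Defaults    requiretty/g' /etc/sudoers

Notes:
DolphinScheduler runs jobs as different Linux users by switching with sudo -u {linux-user}, so the deployment user needs sudo privileges, and they must be passwordless.
If /etc/sudoers contains a "Defaults requiretty" line, it must be commented out.
If you use resource upload, the deployment user also needs read and write permissions on `HDFS or MinIO`.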
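
A quick way to confirm both sudoers changes took effect is to grep the file afterwards. The following is only a minimal sketch: the function names active_requiretty and has_nopasswd_rule are made up for this post, and the patterns assume the exact rule format shown above.

```shell
# Print every active (uncommented) "Defaults requiretty" line in a sudoers-style file.
# Prints nothing if the line is absent or commented out.
active_requiretty() {
    # $1: path to the file to inspect (normally /etc/sudoers, read as root)
    grep -nE '^[[:space:]]*Defaults[[:space:]]+requiretty' "$1"
}

# Succeed only if the file grants passwordless sudo to the given user
# in the "user ALL=(ALL) NOPASSWD: ALL" form used in this post.
has_nopasswd_rule() {
    # $1: file, $2: user name
    grep -qE "^[[:space:]]*$2[[:space:]]+ALL=\(ALL\)[[:space:]]+NOPASSWD:[[:space:]]*ALL" "$1"
}
```

For example, as root: active_requiretty /etc/sudoers should print nothing, and has_nopasswd_rule /etc/sudoers hadoop && echo ok should print ok. Running visudo -c as well checks the file for syntax errors.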

Configure hostnames

As root, add the hostnames on every machine:

vi /etc/hosts

192.168.100.10 bigdata01
192.168.100.11 bigdata02
192.168.100.12 bigdata03
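
Since the same three entries have to exist on every machine, it can help to check the hosts file mechanically. This is a rough sketch; missing_hosts is a hypothetical helper name, and it only checks that a name appears in the file, not that the IP is correct.

```shell
# Print each expected hostname that has no entry in a hosts-format file.
missing_hosts() {
    # $1: hosts file (normally /etc/hosts); remaining arguments: hostnames to look for
    hosts_file=$1
    shift
    for name in "$@"; do
        # Strip comments, then look for the name in any column after the IP address.
        if ! awk -v n="$name" '
                { sub(/#.*/, ""); for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                END { exit !found }' "$hosts_file"; then
            echo "$name"
        fi
    done
}
```

Usage: missing_hosts /etc/hosts bigdata01 bigdata02 bigdata03 prints nothing when all three names are present.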

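Configure passwordless SSH

As the hadoop user, set up passwordless SSH on all three machines:

ssh-keygen -t rsa -m PEM

Press Enter at every prompt to accept the defaults. This creates the public key (id_rsa.pub) and the private key (id_rsa) in the .ssh directory under the current user's home directory.

Distribute the public key:

ssh-copy-id 192.168.100.10

ssh-copy-id 192.168.100.11

ssh-copy-id 192.168.100.12

Note: once this is set up correctly, ssh bigdata01 no longer prompts for a password.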
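
To check all hosts in one go instead of logging in to each by hand, something like the sketch below should work. check_passwordless_ssh is a name invented here; BatchMode and ConnectTimeout are standard OpenSSH client options, and the 5-second timeout is an arbitrary choice.

```shell
# Try a non-interactive login to every host and report the result per host.
check_passwordless_ssh() {
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
            echo "ok   $host"
        else
            echo "FAIL $host"
        fi
    done
}
```

Usage: check_passwordless_ssh bigdata01 bigdata02 bigdata03, run as the hadoop user on each machine. A host may also show FAIL if its host key has not been accepted yet, so log in once interactively first.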

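Configure the Java environment

The JDK is already installed for the hadoop user at /hadoop/app/jdk1.8.0_281.
Symlink its java binary to /bin/java.

An OpenJDK symlink already exists at /bin/java, so replacing it requires root privileges: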

sudo ln -snf /hadoop/app/jdk1.8.0_281/bin/java /bin/java
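
To confirm that /bin/java now resolves to the new JDK rather than the old OpenJDK, the link target can be compared directly. A small sketch, assuming GNU readlink (the -f option); java_link_ok is a hypothetical helper name.

```shell
# Succeed only if the link resolves to the expected JDK binary.
java_link_ok() {
    # $1: link path (e.g. /bin/java)
    # $2: expected target (e.g. /hadoop/app/jdk1.8.0_281/bin/java)
    [ "$(readlink -f "$1")" = "$(readlink -f "$2")" ]
}
```

Usage: java_link_ok /bin/java /hadoop/app/jdk1.8.0_281/bin/java && echo "link ok". Running java -version afterwards shows which JDK is actually picked up.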

Initialize the database

CREATE DATABASE dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;

GRANT ALL PRIVILEGES ON dolphinscheduler.* TO 'root'@'%' IDENTIFIED BY '123';

GRANT ALL PRIVILEGES ON dolphinscheduler.* TO 'root'@'localhost' IDENTIFIED BY '123';

flush privileges;
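
The statements above grant privileges to root. If you would rather use a dedicated account, the same SQL can be generated for any user name and piped into the mysql client. This is only a sketch: ds_init_sql is a made-up helper, the account name in the usage line is hypothetical, and the GRANT ... IDENTIFIED BY form is the MySQL 5.7 syntax already used in this post.

```shell
# Print the initialization SQL for a given account.
ds_init_sql() {
    # $1: database user, $2: password (must not contain a single quote)
    cat <<EOF
CREATE DATABASE IF NOT EXISTS dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '$1'@'%' IDENTIFIED BY '$2';
GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '$1'@'localhost' IDENTIFIED BY '$2';
FLUSH PRIVILEGES;
EOF
}
```

Usage: ds_init_sql dsuser 'some-password' | mysql -uroot -p. Whatever account you choose, the user name and password in the later configuration files have to match it.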

Add the mysql-connector-java driver JAR

Download mysql-connector-java-5.1.49.jar and copy it manually into the lib directory of the extracted DolphinScheduler package.

Edit the configuration file conf/datasource.properties

vi conf/datasource.properties

spring.datasource.driver-class-name=com.mysql.jdbc.Driver
spring.datasource.url=jdbc:mysql://bigdata01:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8&allowMultiQueries=true
spring.datasource.username=root
spring.datasource.password=123
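
The database host in this file should be the same one you later put in dbhost in install_config.conf, so it is worth reading it back. A minimal sketch with two hypothetical helpers, prop_get and jdbc_host_port; it assumes simple key=value lines and a jdbc:mysql:// URL.

```shell
# Print the value of one key from a .properties file (first uncommented match).
prop_get() {
    # $1: properties file, $2: key
    grep -E "^[[:space:]]*$2[[:space:]]*=" "$1" | head -n 1 | cut -d= -f2-
}

# Extract "host:port" from the spring.datasource.url entry.
jdbc_host_port() {
    # $1: properties file
    prop_get "$1" spring.datasource.url | sed -E 's#^jdbc:mysql://([^/]+)/.*#\1#'
}
```

Usage: jdbc_host_port conf/datasource.properties prints the host and port the services will connect to.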

Create the tables and import the base data

sh script/create-dolphinscheduler.sh

Configure runtime parameters

vi conf/env/dolphinscheduler_env.sh

export HADOOP_HOME=/hadoop/app/hadoop-2.7.7
export HADOOP_CONF_DIR=/hadoop/app/hadoop-2.7.7/etc/hadoop
#export SPARK_HOME1=/opt/soft/spark1
#export SPARK_HOME2=/opt/soft/spark2
#export PYTHON_HOME=/opt/soft/python

export JAVA_HOME=/hadoop/app/jdk1.8.0_281

#export HIVE_HOME=/opt/soft/hive
#export FLINK_HOME=/opt/soft/flink
#export DATAX_HOME=/opt/soft/datax/bin/datax.py

export PATH=$JAVA_HOME/bin:$PATH
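
A wrong path in this file typically only surfaces later, when a task fails on a worker, so a quick existence check is cheap insurance. The sketch below lists every uncommented export *_HOME entry whose path does not exist; missing_home_dirs is a hypothetical name, and it assumes unquoted values as in the file above.

```shell
# List every uncommented `export NAME_HOME=path` entry whose path does not exist.
missing_home_dirs() {
    # $1: env file such as conf/env/dolphinscheduler_env.sh
    sed -n -E 's/^[[:space:]]*export[[:space:]]+([A-Z0-9_]+_HOME[0-9]*)=(.*)$/\1 \2/p' "$1" |
    while read -r name path; do
        [ -e "$path" ] || echo "$name=$path"
    done
}
```

Usage: missing_home_dirs conf/env/dolphinscheduler_env.sh prints nothing when every configured path exists on the current machine.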

Edit the parameters in the one-click deployment configuration file conf/config/install_config.conf

vi conf/config/install_config.conf

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# NOTICE :  If the following config has special characters in the variable `.*[]^${}\+?|()@#&`, Please escape, for example, `[` escape to `\[`

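# Database type: mysql or postgresql
dbtype="mysql"

# db config
# Database host and port
dbhost="bigdata01:3306"

# Database user name; must match the account granted privileges during database initialization
username="root"

# Database name
dbname="dolphinscheduler"

# Database password; escape special characters with \. Must match the password set during database initialization
password="123"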

# ZooKeeper address
zkQuorum="bigdata01:2181,bigdata02:2181,bigdata03:2181"

# Directory to install DS into, e.g. /opt/soft/dolphinscheduler; it must differ from the current directory
installPath="/hadoop/app/ds"

# Deployment user: the user set up with passwordless sudo and SSH earlier
# Note: the deployment user needs sudo privileges and permission to operate on HDFS. If HDFS is enabled, you need to create the root directory yourself
deployUser="hadoop"

# Mail configuration, using QQ Mail as an example
# Mail protocol
mailProtocol="SMTP"

# Mail server host
mailServerHost="smtp.qq.com"

# Mail server port
# Note: different protocols and encryption methods use different ports; when SSL/TLS is enabled, make sure the port is correct.
mailServerPort="25"

# mailSender and mailUser can be set to the same value
# Sender
mailSender="xxx@qq.com"

# Mail user
mailUser="xxx@qq.com"

# Mail password
# Note: this is the mail service's authorization code, not the mailbox login password.
mailPassword="xxx"

# Set to true if the mailbox uses TLS, otherwise false
starttlsEnable="true"

# Set to true if the mailbox uses SSL, otherwise false
# Note: starttlsEnable and sslEnable cannot both be true.
sslEnable="false"

# Note: mail server host to trust; use the same value as mailServerHost above
sslTrust="smtp.qq.com"

# Where to upload resource files used by jobs (such as SQL files): HDFS, S3, or NONE.
# To use the local file system on a single machine, set this to HDFS, because HDFS supports the local file system;
# in that case Hadoop does not need to be deployed.
# Choose NONE if you do not need the resource upload feature.
resourceStorageType="HDFS"

# If resources are stored on Hadoop and the cluster's NameNode has HA enabled,
# copy Hadoop's core-site.xml and hdfs-site.xml into the conf directory under the install path (/hadoop/app/ds/conf) and configure the NameNode cluster name;
# if the NameNode is not HA, just replace mycluster with the actual IP or hostname.
# If resourceStorageType is S3, write the S3 address, for example: s3a://dolphinscheduler
# Note: for S3, be sure to create the root directory /dolphinscheduler
defaultFS="hdfs://nn1:8020"

# If resourceStorageType is S3, the following three settings are required; otherwise ignore them
s3Endpoint="http://192.168.xx.xx:9010"
s3AccessKey="xxxxxxxxxx"
s3SecretKey="xxxxxxxxxx"

# If Yarn is not used, keep the default values below.
# If ResourceManager is HA, set this to the IPs or hostnames of the active and standby ResourceManager nodes, e.g. "192.168.xx.xx,192.168.xx.xx";
# if there is a single ResourceManager, set yarnHaIps="".
yarnHaIps="bigdata01,bigdata02"

# If ResourceManager is HA, or Yarn is not used, keep the default value;
# if there is a single ResourceManager, replace yarnIp1 with its actual hostname or IP.
# singleYarnIp="yarnIp1"

# Root path for resource uploads; HDFS and S3 are supported. Because HDFS also supports the local file system, make sure the local directory exists and is readable and writable.
# For HDFS/S3, make sure the directory exists and has read and write permissions; /dolphinscheduler is recommended.
resourceUploadPath="/hadoop/data/dolphinscheduler"

# User with permission to create resourceUploadPath under the HDFS/S3 root path
# Note: if kerberos is enabled, please config hdfsRootUser=
hdfsRootUser="hadoop"

# kerberos config
# whether kerberos starts, if kerberos starts, following four items need to config, otherwise please ignore
kerberosStartUp="false"

# kdc krb5 config file path
krb5ConfPath="$installPath/conf/krb5.conf"

# keytab username
keytabUserName="hdfs-mycluster@ESZ.COM"

# username keytab path
keytabPath="$installPath/conf/hdfs.headless.keytab"

# api server port
apiServerPort="12345"

# Machines to deploy DS services on; use localhost for a local install
# Note: list the hostnames to install on. For a pseudo-distributed setup, write just that one hostname
ips="bigdata01,bigdata02,bigdata03"

# SSH port, default 22
# Note: if the SSH port is not the default, change it here
sshPort="22"

# Machines that run the master service
# Note: list of hostnames for deploying master
masters="bigdata01,bigdata02"

# Machines that run the worker service, each with the worker group it belongs to; "default" in the value below is the group name
# Note: the worker group name must be written for each worker; the default value is "default"
workers="bigdata01:default,bigdata02:default,bigdata03:default"

# Machine that runs the alert service
# Note: list of machine hostnames for deploying alert server
alertServer="bigdata01"

# Machine that runs the backend API service
# Note: list of machine hostnames for deploying api server
apiServers="bigdata01"
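
One easy mistake in this file is naming a host in masters, workers, alertServer or apiServers that is not listed in ips. The sketch below checks for that. It assumes the file consists of plain shell variable assignments, as the listing above suggests, and sources it in a subshell; check_install_hosts is a name invented for this post.

```shell
# Report every host named in masters/workers/alertServer/apiServers that is missing from ips.
check_install_hosts() {
    # $1: path to install_config.conf (must contain a slash, e.g. conf/config/install_config.conf)
    (
        # Source in a subshell so the variables do not leak into the calling shell.
        . "$1"
        for entry in $(echo "$masters,$workers,$alertServer,$apiServers" | tr ',' ' '); do
            host=${entry%%:*}   # strip the ":group" suffix used by workers
            case ",$ips," in
                *",$host,"*) ;;
                *) echo "not in ips: $host" ;;
            esac
        done | sort -u
    )
}
```

Usage: check_install_hosts conf/config/install_config.conf prints nothing when every role host is covered by ips.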

Run the one-click install

sh install.sh

Note:
Copy Hadoop's core-site.xml and hdfs-site.xml into /hadoop/app/ds/conf:

cp /hadoop/app/hadoop-2.7.7/etc/hadoop/core-site.xml /hadoop/app/ds/conf

cp /hadoop/app/hadoop-2.7.7/etc/hadoop/hdfs-site.xml /hadoop/app/ds/conf

Restart the services

/hadoop/app/ds/bin/stop_all.sh

/hadoop/app/ds/bin/start_all.sh

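Process overview

MasterServer: splits DAGs and monitors task status.
WorkerServer/LoggerServer: submit and execute tasks and update task status. LoggerServer lets the REST API fetch logs over RPC.
ApiServer: provides the REST API service that the UI calls.
AlertServer: provides the alerting service.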
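
After a restart, the JDK's jps tool shows which of these processes are actually running on a machine. The helper below is a rough sketch that reads jps output on stdin and prints the expected names that are absent. missing_ds_processes is a hypothetical name, and the process names are passed in as arguments, because the exact names jps reports (in particular for the API server) are an assumption worth checking on your own install.

```shell
# Read `jps` output on stdin and print each expected process name that is absent.
missing_ds_processes() {
    jps_output=$(cat)
    for proc in "$@"; do
        # jps prints "<pid> <main class name>" per line; match the name exactly.
        printf '%s\n' "$jps_output" |
            awk -v p="$proc" '$2 == p { found = 1 } END { exit !found }' ||
            echo "$proc"
    done
}
```

Usage on bigdata02, which per the configuration above should run a master and a worker: jps | missing_ds_processes MasterServer WorkerServer LoggerServer. No output means all of them are up.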

Open the web UI

Initial account and password: admin/dolphinscheduler123

http://192.168.100.10:12345/dolphinscheduler

Done, the installation is complete!