Introduction to Nomad

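Nomad is a tool for managing a cluster of machines and running applications on them.

Features of Nomad:

Docker support: Nomad jobs can use the docker driver to deploy applications to the cluster.
Simple installation: on Linux, Nomad is a single binary that needs no other services for coordination; it combines the resource manager and the scheduler in one system.
Multi-datacenter: jobs can be scheduled across datacenters.
Distributed and highly available, with multiple drivers (Docker, VMs, Java) for running jobs and support for multiple operating systems (Linux, Windows, BSD, OSX).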

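Installing Nomad

The usual approach is to install Vagrant first and use it with a local VirtualBox to create a test environment. While I was learning, however, my local Windows 7 machine was missing some components, so I could not install and use Vagrant.

So I used a Linux virtual machine directly. This environment runs Ubuntu 16.04 with Docker version 17.09.0-ce.

Download the Nomad binary, choosing the package that matches your system.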

# wget https://releases.hashicorp.com/nomad/0.7.0/nomad_0.7.0_linux_amd64.zip

Unzip the package and place the nomad binary in /usr/local/bin.

# unzip -o nomad_0.7.0_linux_amd64.zip -d /usr/local/bin/

# cd /usr/local/bin

# chmod +x nomad

Type nomad in the terminal; if you see Nomad's usage output, the installation succeeded.
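The download step above pins one version and one platform. As a minimal sketch, the release URL can be assembled from those parts. The function name nomad_zip_url is hypothetical, and the URL pattern is inferred from the single 0.7.0 link used in this article, so treat it as an assumption for other versions.

```shell
#!/bin/sh
# Hypothetical helper: build a Nomad release URL from version, OS, and arch.
# The pattern is copied from the 0.7.0 download link used in this article.
nomad_zip_url() {
  version="$1"
  os="$2"
  arch="$3"
  printf 'https://releases.hashicorp.com/nomad/%s/nomad_%s_%s_%s.zip\n' \
    "$version" "$version" "$os" "$arch"
}

# Example (not executed here):
#   wget "$(nomad_zip_url 0.7.0 linux amd64)"
```

Quoting the URL in the wget call keeps the shell from interpreting any special characters in it.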

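Getting Started with Nomad

To keep things simple, we run the Nomad agent in development mode. Dev mode quickly starts both a server and a client on one machine, which is convenient for testing and learning Nomad.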

# nomad agent -dev

==> Starting Nomad agent...

==> Nomad agent configuration:

                Client: true

             Log Level: DEBUG

                Region: global (DC: dc1)

                Server: true

==> Nomad agent started! Log data will stream in below:

    [INFO] serf: EventMemberJoin: nomad.global 127.0.0.1

    [INFO] nomad: starting 4 scheduling worker(s) for [service batch _core]

    [INFO] client: using alloc directory /tmp/NomadClient599911093

    [INFO] raft: Node at 127.0.0.1:4647 [Follower] entering Follower state

    [INFO] nomad: adding server nomad.global (Addr: 127.0.0.1:4647) (DC: dc1)

    [WARN] fingerprint.network: Ethtool not found, checking /sys/net speed file

    [WARN] raft: Heartbeat timeout reached, starting election

    [INFO] raft: Node at 127.0.0.1:4647 [Candidate] entering Candidate state

    [DEBUG] raft: Votes needed: 1

    [DEBUG] raft: Vote granted. Tally: 1

    [INFO] raft: Election won. Tally: 1

    [INFO] raft: Node at 127.0.0.1:4647 [Leader] entering Leader state

    [INFO] raft: Disabling EnableSingleNode (bootstrap)

    [DEBUG] raft: Node 127.0.0.1:4647 updated peer set (2): [127.0.0.1:4647]

    [INFO] nomad: cluster leadership acquired

    [DEBUG] client: applied fingerprints [arch cpu host memory storage network]

    [DEBUG] client: available drivers [docker exec java]

    [DEBUG] client: node registration complete

    [DEBUG] client: updated allocations at index 1 (0 allocs)

    [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 0)

    [DEBUG] client: state updated to ready

In the terminal output, both Server and Client are true, which means the agent is running as a server and a client at the same time.

Nomad Cluster Nodes

# nomad node-status

ID        DC   Name        Class   Drain  Status
fb533fd8  dc1  yc-jumpbox  <none>  false  ready

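The output shows our node ID (a randomly generated UUID), its datacenter, node name, node class, drain mode, and current status. We can see that our node is in the ready state.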
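When scripting against a cluster, it can help to check how many nodes are ready before submitting work. The sketch below counts rows whose last column is "ready". The function name count_ready_nodes is hypothetical, and it assumes the table layout printed above, with Status as the last column.

```shell
#!/bin/sh
# Hypothetical helper: count nodes whose last column (Status) is "ready".
# Reads `nomad node-status` table output on stdin and skips the header row.
count_ready_nodes() {
  awk 'NR > 1 && $NF == "ready" { n++ } END { print n + 0 }'
}

# Example (not executed here):
#   nomad node-status | count_ready_nodes
```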

# nomad server-members

Name               Address     Port  Status  Leader  Protocol  Build  Datacenter  Region
yc-jumpbox.global  10.30.0.52  4648  alive   true    2         0.7.0  dc1         global

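The output shows our own server, the address it is running on, its health state, some version information, and its datacenter and region.

Stopping the Nomad Agent

You can interrupt the agent with Ctrl-C. By default, all signals cause the agent to shut down forcefully.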

Nomad Job

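Jobs are the main thing we interact with when using Nomad.

Example Job

Go to your working directory and run nomad init. It generates example.nomad in the current directory, which is a sample Nomad job file.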

# cd /tmp

# nomad init

Example job file written to example.nomad

To run this job, we use the nomad run command.

# nomad run example.nomad

==> Monitoring evaluation "13ebb66d"

    Evaluation triggered by job "example"

    Allocation "883269bf" created: node "e42d6f19", group "cache"

    Evaluation within deployment: "b0a84e74"

    Evaluation status changed: "pending" -> "complete"

==> Evaluation "13ebb66d" finished with status "complete"

To check the job's status, we use the nomad status command.

# nomad status example

ID            = example

Name          = example

Submit Date   = 12/05/17 10:58:40 UTC

Type          = service

Priority      = 50

Datacenters   = dc1

Status        = running

Periodic      = false

Parameterized = false

Summary

Task Group  Queued  Starting  Running  Failed  Complete  Lost

cache       0       0         1        0       0         0

Latest Deployment

ID          = b0a84e74

Status      = successful

Description = Deployment completed successfully

Deployed

Task Group  Desired  Placed  Healthy  Unhealthy

cache       1        1       1        0

Allocations

ID        Node ID   Task Group  Version  Desired  Status   Created At

883269bf  e42d6f19  cache       0        run      running  12/05/17 10:58:40 UTC

To inspect one of the job's allocations, we use the nomad alloc-status command.

# nomad alloc-status 883269bf

ID                  = 883269bf

Eval ID             = 13ebb66d

Name                = example.cache[0]

Node ID             = e42d6f19

Job ID              = example

Job Version         = 0

Client Status       = running

Client Description  = <none>

Desired Status      = run

Desired Description = <none>

Created At          = 12/05/17 10:58:49 UTC

Deployment ID       = b0a84e74

Deployment Health   = healthy

Task "redis" is "running"

Task Resources

CPU        Memory           Disk     IOPS  Addresses

8/500 MHz  6.3 MiB/256 MiB  300 MiB  0     db: 127.0.0.1:22672

Task Events:

Started At     = 12/05/17 10:58:49 UTC

Finished At    = N/A

Total Restarts = 0

Last Restart   = N/A

Recent Events:

Time                   Type        Description

10/31/17 22:58:49 UTC  Started     Task started by client

10/31/17 22:58:40 UTC  Driver      Downloading image redis:3.2

10/31/17 22:58:40 UTC  Task Setup  Building Task Directory

10/31/17 22:58:40 UTC  Received    Task received by client

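To view the job's logs, we use the nomad logs command. Note that the arguments after logs are the allocation ID and the task name. The allocation ID can be taken from the output of nomad status example, and the task name is defined in the example.nomad job file.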

# nomad logs 883269bf redis

                 _._

            _.-``__ ''-._

       _.-``    `.  `_.  ''-._           Redis 3.2.1 (00000000/0) 64 bit

   .-`` .-```.  ```\/    _.,_ ''-._

  (    '      ,       .-`  | `,    )     Running in standalone mode

  |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379

  |    `-._   `._    /     _.-'    |     PID: 1

   `-._    `-._  `-./  _.-'    _.-'

  |`-._`-._    `-.__.-'    _.-'_.-'|

  |    `-._`-._        _.-'_.-'    |           http://redis.io

   `-._    `-._`-.__.-'_.-'    _.-'

  |`-._`-._    `-.__.-'    _.-'_.-'|

  |    `-._`-._        _.-'_.-'    |

   `-._    `-._`-.__.-'_.-'    _.-'

       `-._    `-.__.-'    _.-'

           `-._        _.-'

               `-.__.-'

...
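Since nomad logs needs an allocation ID, a small filter can pull the IDs out of the nomad status output instead of copying them by hand. This is only a sketch: the function name alloc_ids is hypothetical, and it assumes the "Allocations" table layout shown earlier in this article.

```shell
#!/bin/sh
# Hypothetical helper: print the allocation IDs listed under "Allocations"
# in `nomad status <job>` output read from stdin. Assumes the layout shown
# in this article: a section title, a header row starting with "ID", then rows.
alloc_ids() {
  awk '
    NF == 1 && $1 == "Allocations" { in_allocs = 1; next }  # section title
    in_allocs && $1 == "ID"        { next }                 # header row
    in_allocs && NF > 0            { print $1 }             # one ID per row
  '
}

# Example (not executed here): show the redis task logs of every allocation.
#   for id in $(nomad status example | alloc_ids); do
#     nomad logs "$id" redis
#   done
```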

Modifying the Job

# vim example.nomad

Find count = 1 in the file and change it to count = 3.

After making the change, run nomad plan example.nomad to preview what the scheduler would do.

# nomad plan example.nomad

+/- Job: "example"

+/- Task Group: "cache" (2 create, 1 in-place update)

  +/- Count: "1" => "3" (forces create)

      Task: "redis"

Scheduler dry-run:

- All tasks successfully allocated.

Job Modify Index: 7

To submit the job with version verification run:

nomad run -check-index 7 example.nomad

When running the job with the check-index flag, the job will only be run if the

server side version matches the job modify index returned. If the index has

changed, another user has modified the job and the plan's results are

potentially invalid.

Use the command printed in the plan output to update the job.

# nomad run -check-index 7 example.nomad

==> Monitoring evaluation "93d16471"

    Evaluation triggered by job "example"

    Evaluation within deployment: "0d06e1b6"

    Allocation "3249e320" created: node "e42d6f19", group "cache"

    Allocation "453b210f" created: node "e42d6f19", group "cache"

    Allocation "883269bf" modified: node "e42d6f19", group "cache"

    Evaluation status changed: "pending" -> "complete"

==> Evaluation "93d16471" finished with status "complete"
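The plan and run steps above can be chained in a script by reading the Job Modify Index from the plan output. A minimal sketch follows; the function name plan_modify_index is hypothetical, and it assumes the "Job Modify Index: N" line appears exactly as in the plan output shown in this article.

```shell
#!/bin/sh
# Hypothetical helper: extract the number from the "Job Modify Index: N"
# line of `nomad plan` output read from stdin.
plan_modify_index() {
  sed -n 's/^Job Modify Index: *\([0-9][0-9]*\).*$/\1/p'
}

# Example (not executed here):
#   index=$(nomad plan example.nomad | plan_modify_index)
#   nomad run -check-index "$index" example.nomad
```

As far as I know, nomad plan can exit with a non-zero status when it reports changes, so a script running under set -e may need to allow for that.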

To stop the job, we use the nomad stop command. Afterwards, nomad status shows the job's status as dead (stopped).

# nomad stop example

==> Monitoring evaluation "6d4cd6ca"

    Evaluation triggered by job "example"

    Evaluation within deployment: "f4047b3a"

    Evaluation status changed: "pending" -> "complete"

==> Evaluation "6d4cd6ca" finished with status "complete"

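Building a Simple Nomad Cluster

A Nomad cluster has two parts: servers and clients. Each region needs at least one server, and a cluster of 3 or 5 servers is recommended. The Nomad client is a very lightweight process that registers the host, sends heartbeats, and runs the tasks that the servers assign to it. An agent must run on every node in the cluster so that the servers can assign work to those machines.

Starting the Server

The first step is to create a configuration file for the server. Either download the file from GitHub or paste the following into a file named server.hcl: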

# vim server.hcl

# Increase log verbosity
log_level = "DEBUG"

# Setup datacenter
datacenter = "dc1"

# Setup data dir
data_dir = "/tmp/server1"

# Enable the server
server {
  enabled = true

  # Self-elect, should be 3 or 5 for production
  bootstrap_expect = 1
}

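This is a fairly minimal server configuration file, but it is enough to start an agent in server-only mode and have it elect itself leader. The main change for production is to run more than one server and to set bootstrap_expect to match.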
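To illustrate that production change, here is a sketch that prints the same server configuration with the data directory and bootstrap_expect passed in as parameters. The function name write_server_config is hypothetical. The settings mirror the server.hcl from this article, and a real deployment would use a persistent data directory rather than /tmp.

```shell
#!/bin/sh
# Hypothetical helper: print a server config like the one in this article,
# with data_dir and bootstrap_expect supplied as arguments.
write_server_config() {
  data_dir="$1"
  expect="$2"
  cat <<EOF
log_level = "DEBUG"
datacenter = "dc1"
data_dir = "$data_dir"

server {
  enabled = true
  bootstrap_expect = $expect
}
EOF
}

# Example (not executed here): config for one member of a 3-server cluster.
#   write_server_config /tmp/server1 3 > server.hcl
```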

After creating the file, start the agent in a new terminal tab:

$ sudo nomad agent -config server.hcl

==> WARNING: Bootstrap mode enabled! Potentially unsafe operation.

==> Starting Nomad agent...

==> Nomad agent configuration:

Client: false

Log Level: DEBUG

Region: global (DC: dc1)

Server: true

Version: 0.6.0

==> Nomad agent started! Log data will stream in below:

[INFO] serf: EventMemberJoin: nomad.global 127.0.0.1

[INFO] nomad: starting 4 scheduling worker(s) for [service batch _core]

[INFO] raft: Node at 127.0.0.1:4647 [Follower] entering Follower state

[INFO] nomad: adding server nomad.global (Addr: 127.0.0.1:4647) (DC: dc1)

[WARN] raft: Heartbeat timeout reached, starting election

[INFO] raft: Node at 127.0.0.1:4647 [Candidate] entering Candidate state

[DEBUG] raft: Votes needed: 1

[DEBUG] raft: Vote granted. Tally: 1

[INFO] raft: Election won. Tally: 1

[INFO] raft: Node at 127.0.0.1:4647 [Leader] entering Leader state

[INFO] nomad: cluster leadership acquired

[INFO] raft: Disabling EnableSingleNode (bootstrap)

[DEBUG] raft: Node 127.0.0.1:4647 updated peer set (2): [127.0.0.1:4647]

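We can see that client mode is disabled and that we are running only as a server. This means the server will manage state and make scheduling decisions but will not run any tasks. Now we need some client agents to run tasks!

Starting the Clients

As with the server, we must configure the clients first. Either download the client1 and client2 configurations from GitHub or paste the following into client1.hcl: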

# Increase log verbosity
log_level = "DEBUG"

# Setup data dir
data_dir = "/tmp/client1"

# Enable the client
client {
  enabled = true

  # For demo assume we are talking to server1. For production,
  # this should be like "nomad.service.consul:4647" and a system
  # like Consul used for service discovery.
  servers = ["127.0.0.1:4647"]
}

# Modify our port to avoid a collision with server1
ports {
  http = 5656
}

Copy this file to client2.hcl, change data_dir to "/tmp/client2", and change the http port to 5657. Once both client1.hcl and client2.hcl exist, open a terminal tab for each and start the agents:

# sudo nomad agent -config client1.hcl

==> Starting Nomad agent...

==> Nomad agent configuration:

Client: true

Log Level: DEBUG

Region: global (DC: dc1)

Server: false

Version: 0.6.0

==> Nomad agent started! Log data will stream in below:

[DEBUG] client: applied fingerprints [host memory storage arch cpu]

[DEBUG] client: available drivers [docker exec]

[DEBUG] client: node registration complete

...

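In the output we can see that the agent is running in client mode only. This agent is available to run tasks but does not take part in managing the cluster or making scheduling decisions.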
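The second client needs its own client2.hcl, as described earlier. Rather than editing a copy by hand, the two changes can be applied with sed. This is a sketch: the function name make_client2_config is hypothetical, and it assumes client1.hcl contains the data_dir and http port values shown in this article.

```shell
#!/bin/sh
# Hypothetical helper: derive client2.hcl from client1.hcl by changing
# the data directory and the HTTP port, as described in this article.
make_client2_config() {
  src="$1"
  dst="$2"
  sed -e 's|/tmp/client1|/tmp/client2|' \
      -e 's|http *= *5656|http = 5657|' \
      "$src" > "$dst"
}

# Example (not executed here):
#   make_client2_config client1.hcl client2.hcl
#   sudo nomad agent -config client2.hcl
```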

Using the node-status command, we should see both nodes in the ready state:

# nomad node-status

ID        Datacenter  Name   Class   Drain  Status
fca62612  dc1         nomad  <none>  false  ready
c887deef  dc1         nomad  <none>  false  ready

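We now have a simple three-node cluster running. The only difference between this demo and a full production cluster is that we run a single server instead of three or five.

Submitting a Job

Now that we have a simple cluster, we can use it to schedule a job. We should still have the example.nomad job file from earlier; confirm that count is still set to 3.

Then submit the job with the run command: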

# nomad run example.nomad

==> Monitoring evaluation "8e0a7cf9"

Evaluation triggered by job "example"

Evaluation within deployment: "0917b771"

Allocation "501154ac" created: node "c887deef", group "cache"

Allocation "7e2b3900" created: node "fca62612", group "cache"

Allocation "9c66fcaf" created: node "c887deef", group "cache"

Evaluation status changed: "pending" -> "complete"

==> Evaluation "8e0a7cf9" finished with status "complete"

We can see in the output that the scheduler placed two of the tasks on one client node and the remaining task on the second client.
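To check the placement without reading the output line by line, the "Allocation ... created" lines can be tallied per node. This is a sketch: the function name allocs_per_node is hypothetical, and it assumes the line format printed by nomad run above.

```shell
#!/bin/sh
# Hypothetical helper: count created allocations per node from `nomad run`
# output read on stdin. Splitting on double quotes puts the node ID in $4.
allocs_per_node() {
  awk -F'"' '
    /Allocation ".*" created: node "/ { count[$4]++ }
    END { for (node in count) print node, count[node] }
  ' | sort
}

# Example (not executed here):
#   nomad run example.nomad | allocs_per_node
```

For the run shown above, this would report two allocations on c887deef and one on fca62612.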

We can verify this again with the status command:

# nomad status example

ID = example

Name = example

Submit Date = 07/26/17 16:34:58 UTC

Type = service

Priority = 50

Datacenters = dc1

Status = running

Periodic = false

Parameterized = false

Summary

Task Group  Queued  Starting  Running  Failed  Complete  Lost
cache       0       0         3        0       0         0

Latest Deployment

ID = fc49bd6c

Status = running

Description = Deployment is running

Deployed

Task Group  Desired  Placed  Healthy  Unhealthy
cache       3        3       0        0

Allocations

ID        Eval ID   Node ID   Task Group  Desired  Status   Created At
501154ac  8e0a7cf9  c887deef  cache       run      running  08/08/16 21:03:19 CDT
7e2b3900  8e0a7cf9  fca62612  cache       run      running  08/08/16 21:03:19 CDT
9c66fcaf  8e0a7cf9  c887deef  cache       run      running  08/08/16 21:03:19 CDT

We can see that all of our tasks have been allocated and are running. Once we are satisfied that the job is working, we can tear it down with nomad stop.

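Using the Nomad UI

Nomad 0.7 ships with an integrated UI. Before 0.7 there was no good official UI, so I looked on GitHub and found a community-built one: https://github.com/jippi/hashi-ui.

Opinions will differ, but in my experience hashi-ui is quite good and shows a lot of detail, whereas the official UI does not yet offer as many features.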

Official UI

Download the nomad project from GitHub to your machine; the UI lives at https://github.com/hashicorp/nomad/tree/master/ui
Read the README carefully and install Node.js, Yarn, Ember CLI, and PhantomJS locally.
Install

# cd ui/

# yarn

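After the installation finishes, run ember serve --proxy http://10.30.0.52:4646 (replace 10.30.0.52 with your externally reachable IP and 4646 with your custom port), then open the UI in a browser.

FAQ

The service runs on 127.0.0.1 and cannot be reached from outside?

Specify the bind address or network interface on the command line when starting nomad agent. For example: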

# nomad agent -config server.hcl -bind=0.0.0.0

# nomad agent -config client1.hcl -network-interface=ens160

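When running a service with docker, the container is mapped to a random port on the host?

According to the official documentation, ports for docker tasks are assigned dynamically by default. If you want a static port, you can define it in the job file.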
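When a port is left dynamic, the assigned host address appears in the Addresses column of nomad alloc-status, as in the earlier output line ending in "db: 127.0.0.1:22672". The sketch below extracts that address for a given port label. The function name port_address is hypothetical, and it assumes simple labels preceded by whitespace, as in that output.

```shell
#!/bin/sh
# Hypothetical helper: print the "ip:port" that follows "<label>: " in
# `nomad alloc-status` output read from stdin. The label is used inside a
# regular expression, so it should be a plain word such as "db" or "http".
port_address() {
  label="$1"
  sed -n "s/.*[[:space:]]${label}: \([0-9.]*:[0-9][0-9]*\).*/\1/p"
}

# Example (not executed here):
#   nomad alloc-status 883269bf | port_address db
```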

Simple Job Configuration Files

hello world

# cat hello.nomad

job "hello1" {
  datacenters = ["dc1"]    # datacenters the job may run in

  group "hello2" {         # group name
    task "hello3" {        # task name, usually the name of the service
      driver = "docker"    # use the docker driver

      config {
        image = "hashicorp/http-echo"    # image to run
        args = [                         # arguments passed to the container command
          "-listen", ":5678",
          "-text", "hello world",
        ]
      }

      resources {          # resources for the task
        network {
          mbits = 10       # requires 10 Mbit/s of bandwidth
          port "http" {
            static = 5678  # use a static port
          }
        }
      }
    }
  }
}

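Next, a Redmine setup. I have not yet figured out how to start dependent services with Nomad the way docker-compose does, so MySQL has to be started separately beforehand.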

# cat redmine-example.nomad 

job "redmine" {

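  region = "global"        # region

  datacenters = ["dc1"]    # datacenters the job may run in

  type = "service"         # job type: long-running service (the default); Consul registration comes from the service block below, not from this setting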

  update {

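    max_parallel = 1            # number of allocations updated at the same time

    min_healthy_time = "10s"    # minimum time an allocation must stay healthy before it is marked healthy

    healthy_deadline = "3m"     # deadline for becoming healthy; after it, the allocation is automatically marked unhealthy

    auto_revert = false         # whether the job should automatically revert to the last stable version when a deployment fails

    canary = 0                  # number of canaries created when the job changes; old allocations keep running until healthy canaries are promoted (0 disables canaries)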

  }

  group "redmine" {

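    count = 1                   # number of instances of this group

    restart {

      attempts = 10             # maximum number of restarts within the interval

      interval = "5m"           # window in which attempts are counted; once they are used up, mode decides what happens

      delay = "25s"             # time to wait before restarting a task

      mode = "delay"            # delay the next restart until the next interval begins

    }

    ephemeral_disk {            # ephemeral disk, size in MB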

      size = 300

    }

    task "redmine" {

      driver = "docker"

      env {                # environment variables

        REDMINE_DB_MYSQL = "10.30.0.52"

        REDMINE_DB_POSTGRES = "3306"

        REDMINE_DB_PASSWORD = "passwd"

        REDMINE_DB_USER = "root"

        REDMINE_DB_NAME = "redmine"

      }

      config {

        image = "redmine:yc"

        port_map {           # map the port label "re" to container port 3000

          re = 3000

        }

      }

      logs {

        max_files     = 10    # maximum number of log files

        max_file_size = 15    # maximum size of each log file, in MB

      }

      resources {

        cpu    = 500 # 500 MHz; this block limits the task's CPU, memory, and network

        memory = 256 # 256 MB

        network {

          mbits = 10

          port "re" {}          # dynamic host port for the "re" label mapped above

        }

      }

      service {

        name = "global-redmine-check"        # service name; the check block below is its health check

        tags = ["global", "redmine"]

        port = "re"

        check {

          name     = "alive"

          type     = "tcp"

          interval = "10s"

          timeout  = "2s"

        }

      }

    }

  }

}
