http://azkaban.github.io/azkaban/docs/2.5/

Azkaban was implemented at LinkedIn to solve the problem of Hadoop job dependencies. We had jobs that needed to run in order, from ETL jobs to data analytics products.

Azkaban started as a single-server solution; as the number of Hadoop users grew over the years, it has evolved into a more robust system.

Azkaban consists of 3 key components:

  • Relational Database (MySQL)
  • AzkabanWebServer
  • AzkabanExecutorServer

Relational Database (MySQL)

Azkaban uses MySQL to store much of its state. Both the AzkabanWebServer and the AzkabanExecutorServer access the DB.

How does AzkabanWebServer use the DB?

The web server uses the DB for the following purposes:

  • Project Management - Stores the projects, the permissions on the projects, and the uploaded files.
  • Executing Flow State - Keeps track of executing flows and which Executor is running them.
  • Previous Flows/Jobs - Searches through previous executions of jobs and flows and accesses their log files.
  • Scheduler - Keeps the state of the scheduled jobs.
  • SLA - Keeps all the SLA rules.

How does the AzkabanExecutorServer use the DB?

The executor server uses the DB for the following purposes:

  • Access the Project - Retrieves project files from the DB.
  • Executing Flows/Jobs - Retrieves and updates data for flows and jobs that are executing.
  • Logs - Stores the output logs for jobs and flows in the DB.
  • Interflow Dependency - If a flow is running on a different executor, it will take state from the DB.

There is no reason why MySQL was chosen except that it is a widely used DB. We are looking to implement compatibility with other DBs, although the search requirement on historically run jobs benefits from a relational data store.

AzkabanWebServer

The AzkabanWebServer is the main manager for all of Azkaban. It handles project management, authentication, scheduling, and monitoring of executions, and it also serves as the web user interface.

Using Azkaban is easy. Azkaban uses *.job key-value property files to define the individual tasks in a workflow, and the _dependencies_ property to define the dependency chain between jobs. These job files and associated code can be archived into a *.zip and uploaded to the web server through the Azkaban UI or via curl.
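
As a minimal sketch (the job names, commands, project name, and host below are made up for illustration), a two-job flow in which bar runs only after foo could be defined with two *.job property files:

    # foo.job
    type=command
    command=echo "running foo"

    # bar.job -- runs only after foo succeeds
    type=command
    command=echo "running bar"
    dependencies=foo

The job files (plus any scripts they call) are then zipped and uploaded, either from the Projects page in the UI or with curl against the Ajax API; the endpoint and parameters below follow the Azkaban Ajax API documentation but may vary by version:

    # Zip the job files and upload them to an existing project.
    # session.id comes from a prior authentication call; the host,
    # project name, and session id here are placeholders.
    zip flow.zip foo.job bar.job
    curl -k -i -H "Content-Type: multipart/mixed" -X POST \
         --form 'session.id=<your-session-id>' \
         --form 'ajax=upload' \
         --form 'project=MyProject' \
         --form 'file=@flow.zip;type=application/zip' \
         https://localhost:8443/manager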

AzkabanExecutorServer

Previous versions of Azkaban had both the AzkabanWebServer and the AzkabanExecutorServer features in a single server. The Executor has since been separated into its own server. There were several reasons for splitting these services: it will soon let us scale the number of executions and fail over to other Executors if one fails, and it lets us roll out upgrades of Azkaban with minimal impact on users. As Azkaban's usage grew, we found that upgrading Azkaban became increasingly difficult because all times of the day had become 'peak'.

Uploading Projects

Select the archive containing the workflow files that you want to upload. Currently Azkaban only supports *.zip files. The zip should contain the *.job files and any files needed to run your jobs. Job names must be unique within a project.

Azkaban will validate the contents of the zip to make sure that dependencies are met and that no cyclic dependencies are detected. If it finds any invalid flows, the upload will fail.
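
For illustration (the job names are made up), a zip containing the following pair of job files would be rejected, because a.job and b.job depend on each other and form a cycle:

    # a.job
    type=command
    command=echo "a"
    dependencies=b

    # b.job -- depends back on a, closing a cycle; this upload fails validation
    type=command
    command=echo "b"
    dependencies=a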

Uploads overwrite all files in the project. Any changes made to jobs will be wiped out after a new zip file is uploaded.

After a successful upload, you should see all of your flows listed on the screen.

http://oozie.apache.org/

Apache Oozie Workflow Scheduler for Hadoop

Overview

Oozie is a workflow scheduler system to manage Apache Hadoop jobs.

Oozie Workflow jobs are Directed Acyclic Graphs (DAGs) of actions.

Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability.

Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system specific jobs (such as Java programs and shell scripts).

Oozie is a scalable, reliable and extensible system.
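
To make the "DAG of actions" idea concrete, here is a minimal workflow.xml sketch with a single shell action; the workflow name, action name, and schema versions are illustrative and may differ between Oozie releases:

    <workflow-app xmlns="uri:oozie:workflow:0.5" name="example-wf">
        <start to="say-hello"/>
        <action name="say-hello">
            <shell xmlns="uri:oozie:shell-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <exec>echo</exec>
                <argument>hello from oozie</argument>
            </shell>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Shell action failed</message>
        </kill>
        <end name="end"/>
    </workflow-app>

Each action node is a vertex in the DAG and the ok/error transitions are its edges; a coordinator definition (not shown) would wrap such a workflow with a frequency and input-data dependencies to produce the recurrent behavior described above.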
