http://azkaban.github.io/azkaban/docs/2.5/

There is no reason why MySQL was chosen except that it is a widely used DB. We are looking to implement compatibility with other DB's, although the search requirement on historically running jobs benefits from a relational data store.

【solve the problem of Hadoop job dependencies】

Azkaban was implemented at LinkedIn to solve the problem of Hadoop job dependencies. We had jobs that needed to run in order, from ETL jobs to data analytics products.

Initially a single server solution, with the increased number of Hadoop users over the years, Azkaban has evolved to be a more robust solution.

Azkaban consists of 3 key components:

  • Relational Database (MySQL)
  • AzkabanWebServer
  • AzkabanExecutorServer

Relational Database (MySQL)

Azkaban uses MySQL to store much of its state. Both the AzkabanWebServer and the AzkabanExecutorServer access the DB.

How does AzkabanWebServer use the DB?

The web server uses the db for the following reasons:

  • Project Management - The projects, the permissions on the projects as well as the uploaded files.
  • Executing Flow State - Keep track of executing flows and which Executor is running them.
  • Previous Flow/Jobs - Search through previous executions of jobs and flows as well as access their log files.
  • Scheduler - Keeps the state of the scheduled jobs.
  • SLA - Keeps all the sla rules

How does the AzkabanExecutorServer use the DB?

The executor server uses the db for the following reasons:

  • Access the project - Retrieves project files from the db.
  • Executing Flows/Jobs - Retrieves and updates data for flows and that are executing
  • Logs - Stores the output logs for jobs and flows into the db.
  • Interflow dependency - If a flow is running on a different executor, it will take state from the DB.

There is no reason why MySQL was chosen except that it is a widely used DB. We are looking to implement compatibility with other DB's, although the search requirement on historically running jobs benefits from a relational data store.

AzkabanWebServer

The AzkabanWebServer is the main manager to all of Azkaban. It handles project management, authentication, scheduler, and monitoring of executions. It also serves as the web user interface.

Using Azkaban is easy. Azkaban uses *.job key-value property files to define individual tasks in a work flow, and the _dependencies_ property to define the dependency chain of the jobs. These job files and associated code can be archived into a *.zip and uploaded through the web server through the Azkaban UI or through curl.

AzkabanExecutorServer

Previous versions of Azkaban had both the AzkabanWebServer and the AzkabanExecutorServer features in a single server. The Executor has since been separated into its own server. There were several reasons for splitting these services: we will soon be able to scale the number of executions and fall back on operating Executors if one fails. Also, we are able to roll our upgrades of Azkaban with minimal impact on the users. As Azkaban's usage grew, we found that upgrading Azkaban became increasingly more difficult as all times of the day became 'peak'.

【 no cyclical dependencies detected】

Select the archive file of your workflow files that you want to upload. Currently Azkaban only supports *.zip files. The zip should contain the *.job files and any files needed to run your jobs. Job names must be unique in a project.

Azkaban will validate the contents of the zip to make sure that dependencies are met and that there's no cyclical dependencies detected. If it finds any invalid flows, the upload will fail.

Uploads overwrite all files in the project. Any changes made to jobs will be wiped out after a new zip file is uploaded.

After a successful upload, you should see all of your flows listed on the screen.

http://oozie.apache.org/

Apache Oozie Workflow Scheduler for Hadoop

Overview

Oozie is a workflow scheduler system to manage Apache Hadoop jobs.

Oozie Workflow jobs are Directed Acyclical Graphs (DAGs) of actions.

Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability.

Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system specific jobs (such as Java programs and shell scripts).

Oozie is a scalable, reliable and extensible system.

Azkaban_Oozie_action的更多相关文章

随机推荐

  1. Tallest Cow

    题目描述 FJ's N (1 ≤ N ≤ 10,000) cows conveniently indexed 1..N are standing in a line. Each cow has a p ...

  2. 一款不错的编程字体Source Code Pro

    我以前一直是用的MS自家的是Consolas的字体,这个字体基本上具有编程字体所需的所有要素:等宽.支持ClearType.中文字体大小合适,l和1,o和0很容易区分.非要挑刺的话就是字体比较小,9号 ...

  3. 由ASIHttpRequest里的block引发的思考

    项目发http请求,现在一般的都是用的第三方开源库,当然发异步请求时我们也会写几个回调函数来进行请求返回时的处理.不过前段时间看一个朋友写的代码,里面很用block简单的实现了回调相关的部分.比如: ...

  4. go--互斥锁

    解读: main函数里调用了两次lockPrint方法,这个方法中的println(i, "in lock")这句话,由于是在Mutex的Lock和Unlock之间,所以在第一次调 ...

  5. Manifest值冲突解决方法

    FBI Warning:欢迎转载,但请标明出处:http://blog.csdn.net/codezjx/article/details/38669939,未经本人同意请勿用于商业用途,感谢支持! 整 ...

  6. Android 蓝牙技术 实现终端间数据传输

    蓝牙技术在智能硬件方面有很多用武之地,今天我就为大家分享一下蓝牙技术在Android系统下的使用方法技巧.蓝牙是一种短距离的无线通信技术标准,蓝牙协议分为4层,即核心协议层.电缆替代协议层.电话控制协 ...

  7. myeclipse执行tomcat报错Exception in thread "main" java.lang.OutOfMemoryError: PermGen space

    将myeclipse所配置的tomcat的jdk进行设置:-Xms512m -Xmx512m -XX:MaxNewSize=512m -XX:MaxPermSize=512m,例如以下图:

  8. NYOJ92 图像实用区域 【BFS】

    碰到了一个曾经从未见过的奇怪问题:先上截图: 执行号 用户 题目 结果 时间 内存 语言 提交时间 895360 userid=%E9%95%BF%E6%9C%A8" style=" ...

  9. project修改时间日历

    视图→甘特图 格式→时间表→右键时间表  详细的日程表,然后双击时间即可

  10. 转:css:Position

    http://www.cnblogs.com/polk6/archive/2013/07/26/3214847.html http://blog.sina.com.cn/s/blog_4bcf4a5e ...