This talk was given by Li Jinyong, an active contributor to the DolphinScheduler community and an open-source enthusiast, who works on big data platform architecture in the big data department of Zhengcaiyun (政采云).

Special thanks also go to 示说网 (slidestalk) for its strong support of this live session.

PS: This chapter is fairly long; please read it patiently.

Why split the JSON

In DolphinScheduler 1.3.x and earlier, a workflow's tasks and their relationships were saved as one big JSON blob in the process_definition_json field of the t_ds_process_definition table. If a workflow is large, say 100 or 1,000 tasks, this JSON field becomes very large and has to be parsed every time it is used, which costs a lot of performance, and tasks cannot be reused. The big JSON also left no good way to implement workflow and task versioning; any attempt on top of it would have caused massive data redundancy.

The community therefore launched the JSON split project, with these goals:

  • Fully split the big JSON

  • Add workflow and task versions

  • Introduce a globally unique key (code)

How to design the split tables

A workflow in version 1.3.6

For example, create an a-->b workflow in the current 1.3.6 release. Below is the input-parameter log printed at the controller entry of the processDefinition save interface:

create  process definition, project name: hadoop, process definition name: ab, process_definition_json: {"globalParams":[],"tasks":[{"type":"SHELL","id":"tasks-77643","name":"a","params":{"resourceList":[],"localParams":[{"prop":"yesterday","direct":"IN","type":"VARCHAR","value":"${system.biz.date}"}],"rawScript":"echo ${yesterday}"},"description":"","timeout":{"strategy":"","interval":null,"enable":false},"runFlag":"NORMAL","conditionResult":{"successNode":[""],"failedNode":[""]},"dependence":{},"maxRetryTimes":"0","retryInterval":"1","taskInstancePriority":"MEDIUM","workerGroup":"default","preTasks":[]},{"type":"SHELL","id":"tasks-99814","name":"b","params":{"resourceList":[],"localParams":[{"prop":"today","direct":"IN","type":"VARCHAR","value":"${system.biz.curdate}"}],"rawScript":"echo ${today}"},"description":"","timeout":{"strategy":"","interval":null,"enable":false},"runFlag":"NORMAL","conditionResult":{"successNode":[""],"failedNode":[""]},"dependence":{},"maxRetryTimes":"0","retryInterval":"1","taskInstancePriority":"MEDIUM","workerGroup":"default","preTasks":["a"]}],"tenantId":1,"timeout":0}, desc:  locations:{"tasks-77643":{"name":"a","targetarr":"","nodenumber":"1","x":251,"y":166},"tasks-99814":{"name":"b","targetarr":"tasks-77643","nodenumber":"0","x":533,"y":161}}, connects:[{"endPointSourceId":"tasks-77643","endPointTargetId":"tasks-99814"}]

A workflow with a dependent node (dep is the DEPENDENT task)

Below is the input-parameter log printed at the controller entry of the processDefinition save interface:

 create  process definition, project name: hadoop, process definition name: dep_c, process_definition_json: {"globalParams":[],"tasks":[{"type":"SHELL","id":"tasks-69503","name":"c","params":{"resourceList":[],"localParams":[],"rawScript":"echo 11"},"description":"","timeout":{"strategy":"","interval":null,"enable":false},"runFlag":"NORMAL","conditionResult":{"successNode":[""],"failedNode":[""]},"dependence":{},"maxRetryTimes":"0","retryInterval":"1","taskInstancePriority":"MEDIUM","workerGroup":"default","preTasks":["dep"]},{"type":"DEPENDENT","id":"tasks-22756","name":"dep","params":{},"description":"","timeout":{"strategy":"","interval":null,"enable":false},"runFlag":"NORMAL","conditionResult":{"successNode":[""],"failedNode":[""]},"dependence":{"relation":"AND","dependTaskList":[{"relation":"AND","dependItemList":[{"projectId":1,"definitionId":1,"depTasks":"b","cycle":"day","dateValue":"today"}]}]},"maxRetryTimes":"0","retryInterval":"1","taskInstancePriority":"MEDIUM","workerGroup":"default","preTasks":[]}],"tenantId":1,"timeout":0}, desc:  locations:{"tasks-69503":{"name":"c","targetarr":"tasks-22756","nodenumber":"0","x":597,"y":166},"tasks-22756":{"name":"dep","targetarr":"","nodenumber":"1","x":308,"y":164}}, connects:[{"endPointSourceId":"tasks-22756","endPointTargetId":"tasks-69503"}]

A workflow with a conditions node

Below is the input-parameter log printed at the controller entry of the processDefinition save interface:

create  process definition, project name: hadoop, process definition name: condition_test, process_definition_json: {"globalParams":[],"tasks":[{"type":"SHELL","id":"tasks-68456","name":"d","params":{"resourceList":[],"localParams":[],"rawScript":"echo 11"},"description":"","timeout":{"strategy":"","interval":null,"enable":false},"runFlag":"NORMAL","conditionResult":{"successNode":[""],"failedNode":[""]},"dependence":{},"maxRetryTimes":"0","retryInterval":"1","taskInstancePriority":"MEDIUM","workerGroup":"default","preTasks":[]},{"type":"SHELL","id":"tasks-58183","name":"e","params":{"resourceList":[],"localParams":[],"rawScript":"echo 22"},"description":"","timeout":{"strategy":"","interval":null,"enable":false},"runFlag":"NORMAL","conditionResult":{"successNode":[""],"failedNode":[""]},"dependence":{},"maxRetryTimes":"0","retryInterval":"1","taskInstancePriority":"MEDIUM","workerGroup":"default","preTasks":["cond"]},{"type":"SHELL","id":"tasks-43996","name":"f","params":{"resourceList":[],"localParams":[],"rawScript":"echo 33"},"description":"","timeout":{"strategy":"","interval":null,"enable":false},"runFlag":"NORMAL","conditionResult":{"successNode":[""],"failedNode":[""]},"dependence":{},"maxRetryTimes":"0","retryInterval":"1","taskInstancePriority":"MEDIUM","workerGroup":"default","preTasks":["cond"]},{"type":"CONDITIONS","id":"tasks-38972","name":"cond","params":{},"description":"","timeout":{"strategy":"","interval":null,"enable":false},"runFlag":"NORMAL","conditionResult":{"successNode":["e"],"failedNode":["f"]},"dependence":{"relation":"AND","dependTaskList":[{"relation":"AND","dependItemList":[{"depTasks":"d","status":"SUCCESS"}]}]},"maxRetryTimes":"0","retryInterval":"1","taskInstancePriority":"MEDIUM","workerGroup":"default","preTasks":["d"]}],"tenantId":1,"timeout":0}, desc:  locations:{"tasks-68456":{"name":"d","targetarr":"","nodenumber":"1","x":168,"y":158},"tasks-58183":{"name":"e","targetarr":"tasks-38972","nodenumber":"0","x":573,"y":82},"tasks-43996":{"name":"f","targetarr":"tasks-38972","nodenumber":"0","x":591,"y":288},"tasks-38972":{"name":"cond","targetarr":"tasks-68456","nodenumber":"2","x":382,"y":175}}, connects:[{"endPointSourceId":"tasks-68456","endPointTargetId":"tasks-38972"},{"endPointSourceId":"tasks-38972","endPointTargetId":"tasks-58183"},{"endPointSourceId":"tasks-38972","endPointTargetId":"tasks-43996"}]

From the three cases above, every input parameter at the controller entry has a corresponding column in the t_ds_process_definition table, so the stored row looks roughly as follows.
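A hedged sketch of that mapping, assuming the 1.3.6 column names; the project id is an assumption and the long JSON values are abridged with "...":

INSERT INTO t_ds_process_definition
 (name, project_id, process_definition_json, description, locations, connects)
VALUES
 ('ab',                                                         -- workflow name
  1,                                                            -- id of project 'hadoop' (assumed)
  '{"globalParams":[],"tasks":[...],"tenantId":1,"timeout":0}', -- the whole big json
  '',                                                           -- desc
  '{"tasks-77643":{...},"tasks-99814":{...}}',                  -- node positions
  '[{"endPointSourceId":"tasks-77643","endPointTargetId":"tasks-99814"}]');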

Design approach for the split tables

A workflow is just the visual form of a DAG: tasks are organized through the workflow, and that organization creates relationships between tasks, namely dependencies. Think of a drawing board: the board is the workflow, the figures on it are the tasks, and the relationships between the figures are the dependencies. The core of scheduling is scheduling tasks; dependencies only express the order in which tasks run. Today a schedule is still set on the whole workflow, but after the split it becomes easy to schedule an individual task. This line of thought shapes the split design, which calls for three tables: a workflow definition table, a task definition table, and a task relation table.

  • Workflow definition table: basic information about the workflow, such as global parameters and the position of each node in the DAG

  • Task definition table: detailed information about each task, such as task type, fault-tolerance settings, and priority

  • Task relation table: relationship information between tasks, such as the current node and its upstream nodes

Extending this design to versioning, all that is needed is one log table per table above to store the saved versions.

Workflow definition table

Now look again at the save-interface log of the examples. Besides the existing fields (project, process_definition_name, desc, locations, connects), what remains of the JSON once the tasks are removed is

{"globalParams":[],"tenantId":1,"timeout":0}

So the workflow definition table can be derived:

CREATE TABLE `t_ds_process_definition` (
 `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'self-increasing id',
 `code` bigint(20) NOT NULL COMMENT 'encoding',
 `name` varchar(200) DEFAULT NULL COMMENT 'process definition name',
 `version` int(11) DEFAULT NULL COMMENT 'process definition version',
 `description` text COMMENT 'description',
 `project_code` bigint(20) NOT NULL COMMENT 'project code',
 `release_state` tinyint(4) DEFAULT NULL COMMENT 'process definition release state:0:offline,1:online',
 `user_id` int(11) DEFAULT NULL COMMENT 'process definition creator id',
 `global_params` text COMMENT 'global parameters',
 `flag` tinyint(4) DEFAULT NULL COMMENT '0 not available, 1 available',
 `locations` text COMMENT 'Node location information',
 `connects` text COMMENT 'Node connection information',
 `warning_group_id` int(11) DEFAULT NULL COMMENT 'alert group id',
 `timeout` int(11) DEFAULT '0' COMMENT 'time out, unit: minute',
 `tenant_id` int(11) NOT NULL DEFAULT '-1' COMMENT 'tenant id',
 `create_time` datetime NOT NULL COMMENT 'create time',
 `update_time` datetime DEFAULT NULL COMMENT 'update time',
 PRIMARY KEY (`id`,`code`),
 UNIQUE KEY `process_unique` (`name`,`project_code`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `t_ds_process_definition_log` (
 `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'self-increasing id',
 `code` bigint(20) NOT NULL COMMENT 'encoding',
 `name` varchar(200) DEFAULT NULL COMMENT 'process definition name',
 `version` int(11) DEFAULT NULL COMMENT 'process definition version',
 `description` text COMMENT 'description',
 `project_code` bigint(20) NOT NULL COMMENT 'project code',
 `release_state` tinyint(4) DEFAULT NULL COMMENT 'process definition release state:0:offline,1:online',
 `user_id` int(11) DEFAULT NULL COMMENT 'process definition creator id',
 `global_params` text COMMENT 'global parameters',
 `flag` tinyint(4) DEFAULT NULL COMMENT '0 not available, 1 available',
 `locations` text COMMENT 'Node location information',
 `connects` text COMMENT 'Node connection information',
 `warning_group_id` int(11) DEFAULT NULL COMMENT 'alert group id',
 `timeout` int(11) DEFAULT '0' COMMENT 'time out,unit: minute',
 `tenant_id` int(11) NOT NULL DEFAULT '-1' COMMENT 'tenant id',
 `operator` int(11) DEFAULT NULL COMMENT 'operator user id',
 `operate_time` datetime DEFAULT NULL COMMENT 'operate time',
 `create_time` datetime NOT NULL COMMENT 'create time',
 `update_time` datetime DEFAULT NULL COMMENT 'update time',
 PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Comparing the columns shows that the log table adds only two fields to the main table: operator and operate_time.
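Since every saved version is kept in the log table, a historical version can be read straight back. A minimal sketch (the code value is a made-up example):

SELECT *
FROM t_ds_process_definition_log
WHERE code = 2212345678901
  AND version = 3;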

Task definition table

Task JSON of the ab workflow in the example:

  "tasks": [{
   "type": "SHELL",
   "id": "tasks-77643",
   "name": "a",
   "params": {
     "resourceList": [],
     "localParams": [{
       "prop": "yesterday",
       "direct": "IN",
       "type": "VARCHAR",
       "value": "${system.biz.date}"
     }],
     "rawScript": "echo ${yesterday}"
   },
   "description": "",
   "timeout": {
     "strategy": "",
     "interval": null,
     "enable": false
   },
   "runFlag": "NORMAL",
   "conditionResult": {
     "successNode": [""],
     "failedNode": [""]
   },
   "dependence": {},
   "maxRetryTimes": "0",
   "retryInterval": "1",
   "taskInstancePriority": "MEDIUM",
   "workerGroup": "default",
   "preTasks": []
 }, {
   "type": "SHELL",
   "id": "tasks-99814",
   "name": "b",
   "params": {
     "resourceList": [],
     "localParams": [{
       "prop": "today",
       "direct": "IN",
       "type": "VARCHAR",
       "value": "${system.biz.curdate}"
     }],
     "rawScript": "echo ${today}"
   },
   "description": "",
   "timeout": {
     "strategy": "",
     "interval": null,
     "enable": false
   },
   "runFlag": "NORMAL",
   "conditionResult": {
     "successNode": [""],
     "failedNode": [""]
   },
   "dependence": {},
   "maxRetryTimes": "0",
   "retryInterval": "1",
   "taskInstancePriority": "MEDIUM",
   "workerGroup": "default",
   "preTasks": ["a"]
 }]

Task JSON of the dep_c workflow:

  "tasks": [{
   "type": "SHELL",
   "id": "tasks-69503",
   "name": "c",
   "params": {
     "resourceList": [],
     "localParams": [],
     "rawScript": "echo 11"
   },
   "description": "",
   "timeout": {
     "strategy": "",
     "interval": null,
     "enable": false
   },
   "runFlag": "NORMAL",
   "conditionResult": {
     "successNode": [""],
     "failedNode": [""]
   },
   "dependence": {},
   "maxRetryTimes": "0",
   "retryInterval": "1",
   "taskInstancePriority": "MEDIUM",
   "workerGroup": "default",
   "preTasks": ["dep"]
 }, {
   "type": "DEPENDENT",
   "id": "tasks-22756",
   "name": "dep",
   "params": {},
   "description": "",
   "timeout": {
     "strategy": "",
     "interval": null,
     "enable": false
   },
   "runFlag": "NORMAL",
   "conditionResult": {
     "successNode": [""],
     "failedNode": [""]
   },
   "dependence": {
     "relation": "AND",
     "dependTaskList": [{
       "relation": "AND",
       "dependItemList": [{
         "projectId": 1,
         "definitionId": 1,
         "depTasks": "b",
         "cycle": "day",
         "dateValue": "today"
       }]
     }]
   },
   "maxRetryTimes": "0",
   "retryInterval": "1",
   "taskInstancePriority": "MEDIUM",
   "workerGroup": "default",
   "preTasks": []
 }]

Task JSON of the condition_test workflow:

  "tasks": [{
   "type": "SHELL",
   "id": "tasks-68456",
   "name": "d",
   "params": {
     "resourceList": [],
     "localParams": [],
     "rawScript": "echo 11"
   },
   "description": "",
   "timeout": {
     "strategy": "",
     "interval": null,
     "enable": false
   },
   "runFlag": "NORMAL",
   "conditionResult": {
     "successNode": [""],
     "failedNode": [""]
   },
   "dependence": {},
   "maxRetryTimes": "0",
   "retryInterval": "1",
   "taskInstancePriority": "MEDIUM",
   "workerGroup": "default",
   "preTasks": []
 }, {
   "type": "SHELL",
   "id": "tasks-58183",
   "name": "e",
   "params": {
     "resourceList": [],
     "localParams": [],
     "rawScript": "echo 22"
   },
   "description": "",
   "timeout": {
     "strategy": "",
     "interval": null,
     "enable": false
   },
   "runFlag": "NORMAL",
   "conditionResult": {
     "successNode": [""],
     "failedNode": [""]
   },
   "dependence": {},
   "maxRetryTimes": "0",
   "retryInterval": "1",
   "taskInstancePriority": "MEDIUM",
   "workerGroup": "default",
   "preTasks": ["cond"]
 }, {
   "type": "SHELL",
   "id": "tasks-43996",
   "name": "f",
   "params": {
     "resourceList": [],
     "localParams": [],
     "rawScript": "echo 33"
   },
   "description": "",
   "timeout": {
     "strategy": "",
     "interval": null,
     "enable": false
   },
   "runFlag": "NORMAL",
   "conditionResult": {
     "successNode": [""],
     "failedNode": [""]
   },
   "dependence": {},
   "maxRetryTimes": "0",
   "retryInterval": "1",
   "taskInstancePriority": "MEDIUM",
   "workerGroup": "default",
   "preTasks": ["cond"]
 }, {
   "type": "CONDITIONS",
   "id": "tasks-38972",
   "name": "cond",
   "params": {},
   "description": "",
   "timeout": {
     "strategy": "",
     "interval": null,
     "enable": false
   },
   "runFlag": "NORMAL",
   "conditionResult": {
     "successNode": ["e"],
     "failedNode": ["f"]
   },
   "dependence": {
     "relation": "AND",
     "dependTaskList": [{
       "relation": "AND",
       "dependItemList": [{
         "depTasks": "d",
         "status": "SUCCESS"
       }]
     }]
   },
   "maxRetryTimes": "0",
   "retryInterval": "1",
   "taskInstancePriority": "MEDIUM",
   "workerGroup": "default",
   "preTasks": ["d"]
 }]

These cases show how the JSON of SHELL/DEPENDENT/CONDITIONS nodes is built (other task types look like SHELL); preTasks identifies the upstream dependency nodes. The conditionResult structure is fairly fixed, whereas dependence is complex, and its structure even differs between DEPENDENT and CONDITIONS tasks. For uniformity, we therefore fold conditionResult and dependence as a whole into params, and params maps to the task_params column of the table (an illustrative insert follows the DDL below).

This settles the t_ds_task_definition table:

CREATE TABLE `t_ds_task_definition` (
 `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'self-increasing id',
 `code` bigint(20) NOT NULL COMMENT 'encoding',
 `name` varchar(200) DEFAULT NULL COMMENT 'task definition name',
 `version` int(11) DEFAULT NULL COMMENT 'task definition version',
 `description` text COMMENT 'description',
 `project_code` bigint(20) NOT NULL COMMENT 'project code',
 `user_id` int(11) DEFAULT NULL COMMENT 'task definition creator id',
 `task_type` varchar(50) NOT NULL COMMENT 'task type',
 `task_params` text COMMENT 'job custom parameters',
 `flag` tinyint(2) DEFAULT NULL COMMENT '0 not available, 1 available',
 `task_priority` tinyint(4) DEFAULT NULL COMMENT 'job priority',
 `worker_group` varchar(200) DEFAULT NULL COMMENT 'worker grouping',
 `fail_retry_times` int(11) DEFAULT NULL COMMENT 'number of failed retries',
 `fail_retry_interval` int(11) DEFAULT NULL COMMENT 'failed retry interval',
 `timeout_flag` tinyint(2) DEFAULT '0' COMMENT 'timeout flag:0 close, 1 open',
 `timeout_notify_strategy` tinyint(4) DEFAULT NULL COMMENT 'timeout notification policy: 0 warning, 1 fail',
 `timeout` int(11) DEFAULT '0' COMMENT 'timeout length,unit: minute',
 `delay_time` int(11) DEFAULT '0' COMMENT 'delay execution time,unit: minute',
 `resource_ids` varchar(255) DEFAULT NULL COMMENT 'resource id, separated by comma',
 `create_time` datetime NOT NULL COMMENT 'create time',
 `update_time` datetime DEFAULT NULL COMMENT 'update time',
 PRIMARY KEY (`id`,`code`),
 UNIQUE KEY `task_unique` (`name`,`project_code`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `t_ds_task_definition_log` (
 `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'self-increasing id',
 `code` bigint(20) NOT NULL COMMENT 'encoding',
 `name` varchar(200) DEFAULT NULL COMMENT 'task definition name',
 `version` int(11) DEFAULT NULL COMMENT 'task definition version',
 `description` text COMMENT 'description',
 `project_code` bigint(20) NOT NULL COMMENT 'project code',
 `user_id` int(11) DEFAULT NULL COMMENT 'task definition creator id',
 `task_type` varchar(50) NOT NULL COMMENT 'task type',
 `task_params` text COMMENT 'job custom parameters',
 `flag` tinyint(2) DEFAULT NULL COMMENT '0 not available, 1 available',
 `task_priority` tinyint(4) DEFAULT NULL COMMENT 'job priority',
 `worker_group` varchar(200) DEFAULT NULL COMMENT 'worker grouping',
 `fail_retry_times` int(11) DEFAULT NULL COMMENT 'number of failed retries',
 `fail_retry_interval` int(11) DEFAULT NULL COMMENT 'failed retry interval',
 `timeout_flag` tinyint(2) DEFAULT '0' COMMENT 'timeout flag:0 close, 1 open',
 `timeout_notify_strategy` tinyint(4) DEFAULT NULL COMMENT 'timeout notification policy: 0 warning, 1 fail',
 `timeout` int(11) DEFAULT '0' COMMENT 'timeout length,unit: minute',
 `delay_time` int(11) DEFAULT '0' COMMENT 'delay execution time,unit: minute',
 `resource_ids` varchar(255) DEFAULT NULL COMMENT 'resource id, separated by comma',
 `operator` int(11) DEFAULT NULL COMMENT 'operator user id',
 `operate_time` datetime DEFAULT NULL COMMENT 'operate time',
 `create_time` datetime NOT NULL COMMENT 'create time',
 `update_time` datetime DEFAULT NULL COMMENT 'update time',
 PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
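To make the folding concrete, here is a hedged sketch of how the CONDITIONS task "cond" from the condition_test example might be stored; the 13-digit codes are invented and nullable columns are omitted:

-- conditionResult and dependence folded as a whole into task_params
INSERT INTO t_ds_task_definition
 (code, name, version, project_code, task_type, task_params, create_time)
VALUES
 (3212345678901, 'cond', 1, 1212345678901, 'CONDITIONS',
  '{"conditionResult":{"successNode":["e"],"failedNode":["f"]},"dependence":{"relation":"AND","dependTaskList":[{"relation":"AND","dependItemList":[{"depTasks":"d","status":"SUCCESS"}]}]}}',
  NOW());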

Note the differences between the dev branch and version 1.3.6: dev has replaced description with desc and added delayTime, as the following dev-branch JSON shows:

{
 "globalParams": [],
 "tasks": [{
     "type": "SHELL",
     "id": "tasks-18200",
     "name": "d",
     "code": "",
     "params": {
       "resourceList": [],
       "localParams": [],
       "rawScript": "echo 5"
     },
     "desc": "",
     "runFlag": "NORMAL",
     "conditionResult": {
       "successNode": [
         ""
       ],
       "failedNode": [
         ""
       ]
     },
     "dependence": {},
     "maxRetryTimes": "0",
     "retryInterval": "1",
     "delayTime": "0",
     "timeout": {
       "strategy": "",
       "interval": null,
       "enable": false
     },
     "waitStartTimeout": {},
     "taskInstancePriority": "MEDIUM",
     "workerGroup": "hadoop",
     "preTasks": [],
     "depList": null
   },
   {
     "type": "SHELL",
     "id": "tasks-55225",
     "name": "e",
     "code": "",
     "params": {
       "resourceList": [],
       "localParams": [],
       "rawScript": "echo 6"
     },
     "desc": "",
     "runFlag": "NORMAL",
     "conditionResult": {
       "successNode": [
         ""
       ],
       "failedNode": [
         ""
       ]
     },
     "dependence": {},
     "maxRetryTimes": "0",
     "retryInterval": "1",
     "delayTime": "0",
     "timeout": {
       "strategy": "",
       "interval": null,
       "enable": false
     },
     "waitStartTimeout": {},
     "taskInstancePriority": "MEDIUM",
     "workerGroup": "hadoop",
     "preTasks": [
       "def"
     ],
     "depList": null
   },
   {
     "type": "SHELL",
     "id": "tasks-67639",
     "name": "f",
     "code": "",
     "params": {
       "resourceList": [],
       "localParams": [],
       "rawScript": "echo 7"
     },
     "desc": "",
     "runFlag": "NORMAL",
     "conditionResult": {
       "successNode": [
         ""
       ],
       "failedNode": [
         ""
       ]
     },
     "dependence": {},
     "maxRetryTimes": "0",
     "retryInterval": "1",
     "delayTime": "0",
     "timeout": {
       "strategy": "",
       "interval": null,
       "enable": false
     },
     "waitStartTimeout": {},
     "taskInstancePriority": "MEDIUM",
     "workerGroup": "hadoop",
     "preTasks": [
       "def"
     ],
     "depList": null
   },
   {
     "type": "CONDITIONS",
     "id": "tasks-67387",
     "name": "def",
     "code": "",
     "params": {},
     "desc": "",
     "runFlag": "NORMAL",
     "conditionResult": {
       "successNode": [
         "e"
       ],
       "failedNode": [
         "f"
       ]
     },
     "dependence": {
       "relation": "AND",
       "dependTaskList": [{
         "relation": "AND",
         "dependItemList": [{
             "depTasks": "d",
             "status": "SUCCESS"
           },
           {
             "depTasks": "d",
             "status": "FAILURE"
           }
         ]
       }]
     },
     "maxRetryTimes": "0",
     "retryInterval": "1",
     "delayTime": "0",
     "timeout": {
       "strategy": "",
       "interval": null,
       "enable": false
     },
     "waitStartTimeout": {},
     "taskInstancePriority": "MEDIUM",
     "workerGroup": "hadoop",
     "preTasks": [
       "d"
     ],
     "depList": null
   }
 ],
 "tenantId": 1,
 "timeout": 0
}

Task relation table

preTasks identifies the upstream dependency nodes; in the relation table, the current node is identified as the postTask. Since the current node always exists while an upstream node may not, post_task_code can never be empty, whereas pre_task_code may be (see the example rows after the DDL).

CREATE TABLE `t_ds_process_task_relation` (
 `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'self-increasing id',
 `name` varchar(200) DEFAULT NULL COMMENT 'relation name',
 `process_definition_version` int(11) DEFAULT NULL COMMENT 'process version',
 `project_code` bigint(20) NOT NULL COMMENT 'project code',
 `process_definition_code` bigint(20) NOT NULL COMMENT 'process code',
 `pre_task_code` bigint(20) NOT NULL COMMENT 'pre task code',
 `pre_task_version` int(11) NOT NULL COMMENT 'pre task version',
 `post_task_code` bigint(20) NOT NULL COMMENT 'post task code',
 `post_task_version` int(11) NOT NULL COMMENT 'post task version',
 `condition_type` tinyint(2) DEFAULT NULL COMMENT 'condition type : 0 none, 1 judge 2 delay',
 `condition_params` text COMMENT 'condition params(json)',
 `create_time` datetime NOT NULL COMMENT 'create time',
 `update_time` datetime DEFAULT NULL COMMENT 'update time',
 PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `t_ds_process_task_relation_log` (
 `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'self-increasing id',
 `name` varchar(200) DEFAULT NULL COMMENT 'relation name',
 `process_definition_version` int(11) DEFAULT NULL COMMENT 'process version',
 `project_code` bigint(20) NOT NULL COMMENT 'project code',
 `process_definition_code` bigint(20) NOT NULL COMMENT 'process code',
 `pre_task_code` bigint(20) NOT NULL COMMENT 'pre task code',
 `pre_task_version` int(11) NOT NULL COMMENT 'pre task version',
 `post_task_code` bigint(20) NOT NULL COMMENT 'post task code',
 `post_task_version` int(11) NOT NULL COMMENT 'post task version',
 `condition_type` tinyint(2) DEFAULT NULL COMMENT 'condition type : 0 none, 1 judge 2 delay',
 `condition_params` text COMMENT 'condition params(json)',
 `operator` int(11) DEFAULT NULL COMMENT 'operator user id',
 `operate_time` datetime DEFAULT NULL COMMENT 'operate time',
 `create_time` datetime NOT NULL COMMENT 'create time',
 `update_time` datetime DEFAULT NULL COMMENT 'update time',
 PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
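As an illustration, the relation rows of the condition_test workflow (d -> cond, then cond -> e and cond -> f) might look as follows; all codes are invented examples, and condition_type is kept at 0 (none) for simplicity:

INSERT INTO t_ds_process_task_relation
 (name, process_definition_version, project_code, process_definition_code,
  pre_task_code, pre_task_version, post_task_code, post_task_version,
  condition_type, create_time)
VALUES
 ('', 1, 1212345678901, 2212345678901, 0,             0, 4440000000001, 1, 0, NOW()), -- d has no upstream
 ('', 1, 1212345678901, 2212345678901, 4440000000001, 1, 3212345678901, 1, 0, NOW()), -- d -> cond
 ('', 1, 1212345678901, 2212345678901, 3212345678901, 1, 4440000000002, 1, 0, NOW()), -- cond -> e
 ('', 1, 1212345678901, 2212345678901, 3212345678901, 1, 4440000000003, 1, 0, NOW()); -- cond -> f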

The same one-row-per-edge layout also covers scenarios with complex dependency relationships.

How the API module changes

When the api module performs a save operation:

  • A 13-digit number generated by the snowflake algorithm is used as the process_definition_code; the workflow definition is saved to process_definition (main table) and process_definition_log (log table), both holding identical data, with workflow definition version 1

  • A 13-digit number generated by the snowflake algorithm is used as the task_definition_code; the task definition is saved to task_definition (main table) and task_definition_log (log table), again with identical data, and task definition version 1

  • The workflow-task relations are saved to process_task_relation (main table) and process_task_relation_log (log table). The code and version carried by this table are the workflow's code and version, since tasks are organized through the workflow and the DAG is drawn per workflow. The current DAG node is identified by post_task_code and post_task_version, and its upstream dependency by pre_task_code and pre_task_version; when there is no dependency, pre_task_code and pre_task_version are 0. A sketch of the rows one save would write follows.
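A hedged sketch of a save of the a-->b workflow; all codes stand in for snowflake-generated values, and the identical inserts into the _log tables (version = 1) are omitted:

INSERT INTO t_ds_process_definition
 (code, name, version, project_code, global_params, timeout, tenant_id, create_time)
VALUES (2212345678901, 'ab', 1, 1212345678901, '[]', 0, 1, NOW());

INSERT INTO t_ds_task_definition
 (code, name, version, project_code, task_type, task_params, create_time)
VALUES
 (4440000000001, 'a', 1, 1212345678901, 'SHELL', '{"rawScript":"echo ${yesterday}"}', NOW()),
 (4440000000002, 'b', 1, 1212345678901, 'SHELL', '{"rawScript":"echo ${today}"}', NOW());

-- One row per DAG edge; task a has no upstream, so its pre_task_code is 0.
INSERT INTO t_ds_process_task_relation
 (name, process_definition_version, project_code, process_definition_code,
  pre_task_code, pre_task_version, post_task_code, post_task_version,
  condition_type, create_time)
VALUES
 ('', 1, 1212345678901, 2212345678901, 0,             0, 4440000000001, 1, 0, NOW()),
 ('', 1, 1212345678901, 2212345678901, 4440000000001, 1, 4440000000002, 1, 0, NOW());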

When the api module performs an update, the workflow definition and task definitions update the main-table rows directly, and the updated data is inserted into the log tables. For relations, the main-table rows are deleted first and the new relations inserted; the log table simply gets the new relations appended.

When the api module performs a delete, the main-table rows of the workflow definition, task definitions, and relations are deleted directly; the log tables are left untouched.

When the api module performs a switch, the data of the requested version in the log table is copied straight over the main-table row.
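A minimal sketch of the switch, assuming a plain copy from the log table (column list abridged, code and target version invented):

UPDATE t_ds_process_definition d
JOIN t_ds_process_definition_log l
  ON l.code = d.code AND l.version = 2
SET d.version       = l.version,
    d.name          = l.name,
    d.global_params = l.global_params,
    d.locations     = l.locations,
    d.connects      = l.connects,
    d.timeout       = l.timeout,
    d.update_time   = NOW()
WHERE d.code = 2212345678901;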

How data interaction changes

In phase one of the JSON split, the controller layer of the api module is unchanged: the incoming big JSON is still mapped to a ProcessData object in the service layer. Insert and update operations are persisted through the ProcessService.saveProcessDefiniton() entry point in the common Service module, saving in the order task_definition, process_task_relation, process_definition. On save, if a task already exists and its associated workflow is not online, the task is updated; if the workflow is online, changing the task is not allowed.

For api queries, lookups are still done by workflow id; the ProcessService.genTaskNodeList() entry point in the common Service module assembles the data, again into a ProcessData object, from which the JSON response is generated. The kind of join this assembly implies is sketched below.
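A hedged sketch only, not the project's actual mapper SQL; codes are invented. For one workflow version it fetches every edge together with the exact task-definition version the edge points at:

SELECT r.pre_task_code,
       t.code, t.name, t.task_type, t.task_params, t.version
FROM t_ds_process_task_relation r
JOIN t_ds_task_definition_log t
  ON t.code = r.post_task_code
 AND t.version = r.post_task_version
WHERE r.process_definition_code    = 2212345678901
  AND r.process_definition_version = 1;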

The Server module (Master) likewise obtains the TaskNodeList through ProcessService.genTaskNodeList() in the common Service module to build the scheduling DAG, puts all the information of the current task into the MasterExecThread.readyToSubmitTaskQueue queue, generates taskInstances from it, and dispatches them to workers.

What remains to be done for the JSON split

  • Refactor the controller's external REST API

  • Refactor the DAG in the ui module

  • Add task operation pages to the ui module

processDefinition

Note: the taskRelationJson format is:

[{"name":"","pre_task_code":0,"pre_task_version":0,"post_task_code":123456789,"post_task_version":1,"condition_type":0,"condition_params":{}},{"name":"","pre_task_code":123456789,"pre_task_version":1,"post_task_code":123451234,"post_task_version":1,"condition_type":0,"condition_params":{}}]

Likewise, in the other interfaces the processDefinitionId request parameter is replaced with code.

schedule

taskDefinition (new)

taskDefinitionJson:[{"name":"test","description":"","task_type":"SHELL","task_params":[],"flag":0,"task_priority":0,"worker_group":"default","fail_retry_times":0,"fail_retry_interval":0,"timeout_flag":0,"timeout_notify_strategy":0,"timeout":0,"delay_time":0,"resource_ids":""}]

Related requirement issues

  • [Feature][JsonSplit-api] api module controller design #5498

  • [Feature][JsonSplit-api]processDefinition save/update interface  #5499

  • [Feature][JsonSplit-api]processDefinition switch interface #5501

  • [Feature][JsonSplit-api]processDefinition delete interface #5502

  • [Feature][JsonSplit-api]processDefinition copy interface #5503

  • [Feature][JsonSplit-api]processDefinition export interface #5504

  • [Feature][JsonSplit-api]processDefinition list-paging interface #5505

  • [Feature][JsonSplit-api]processDefinition move interface #5506

  • [Feature][JsonSplit-api]processDefinition queryProcessDefinitionAllByProjectId interface #5507

  • [Feature][JsonSplit-api]processDefinition select-by-id interface #5508

  • [Feature][JsonSplit-api]processDefinition view-tree interface #5509

  • [Feature][JsonSplit-api]schedule create interface #5510

  • [Feature][JsonSplit-api]schedule list-paging interface #5511

  • [Feature][JsonSplit-api]schedule update interface #5512

  • [Feature][JsonSplit-api]taskDefinition save interface #5513

  • [Feature][JsonSplit-api]taskDefinition update interface #5514

  • [Feature][JsonSplit-api]taskDefinition switch interface #5515

  • [Feature][JsonSplit-api]taskDefinition query interface #5516

  • [Feature][JsonSplit-api]taskDefinition delete interface #5517

  • [Feature][JsonSplit-api]WorkFlowLineage interface #5518

  • [Feature][JsonSplit-api]analysis interface #5519

  • [Feature][JsonSplit-api]executors interface #5520

  • [Feature][JsonSplit-api]processInstance interface #5521

  • [Feature][JsonSplit-api]project interface #5522

GitHub issues: https://github.com/apache/dolphinscheduler/issues

示说网 (slidestalk): https://www.slidestalk.com/

