DolphinScheduler JSON Splitting Explained
This session features Li Jinyong, an active contributor to the DolphinScheduler community and open-source enthusiast, currently working on big-data platform architecture in the big-data department of Zhengcaiyun (政采云), who shares the material below.
Special thanks also go to Slidestalk (示说网) for their strong support of this live event.
PS: this chapter is fairly long; please read it patiently.
Why split the JSON
In DolphinScheduler 1.3.x and earlier, a workflow's tasks and task relations were saved as one big JSON in the process_definition_json field of the t_ds_process_definition table. For a large workflow with, say, 100 or 1,000 tasks, that JSON field becomes very large, parsing it at runtime is expensive, and tasks cannot be reused. On top of a big JSON there is also no good way to implement workflow versioning and task versioning without massive data redundancy.
The community therefore launched the JSON-splitting project, with these goals:
Fully split the big JSON
Add workflow and task versioning
Introduce a globally unique key (code)
How to design the split tables
A 1.3.6 workflow
For example, create an a-->b workflow in the current 1.3.6 version. Below is the request-parameter log printed at the controller entry of the processDefinition save interface:
create process definition, project name: hadoop, process definition name: ab, process_definition_json: {"globalParams":[],"tasks":[{"type":"SHELL","id":"tasks-77643","name":"a","params":{"resourceList":[],"localParams":[{"prop":"yesterday","direct":"IN","type":"VARCHAR","value":"${system.biz.date}"}],"rawScript":"echo ${yesterday}"},"description":"","timeout":{"strategy":"","interval":null,"enable":false},"runFlag":"NORMAL","conditionResult":{"successNode":[""],"failedNode":[""]},"dependence":{},"maxRetryTimes":"0","retryInterval":"1","taskInstancePriority":"MEDIUM","workerGroup":"default","preTasks":[]},{"type":"SHELL","id":"tasks-99814","name":"b","params":{"resourceList":[],"localParams":[{"prop":"today","direct":"IN","type":"VARCHAR","value":"${system.biz.curdate}"}],"rawScript":"echo ${today}"},"description":"","timeout":{"strategy":"","interval":null,"enable":false},"runFlag":"NORMAL","conditionResult":{"successNode":[""],"failedNode":[""]},"dependence":{},"maxRetryTimes":"0","retryInterval":"1","taskInstancePriority":"MEDIUM","workerGroup":"default","preTasks":["a"]}],"tenantId":1,"timeout":0}, desc: locations:{"tasks-77643":{"name":"a","targetarr":"","nodenumber":"1","x":251,"y":166},"tasks-99814":{"name":"b","targetarr":"tasks-77643","nodenumber":"0","x":533,"y":161}}, connects:[{"endPointSourceId":"tasks-77643","endPointTargetId":"tasks-99814"}]
A workflow with a dependent node (dep is the dependent node)
Below is the request-parameter log printed at the controller entry of the processDefinition save interface:
create process definition, project name: hadoop, process definition name: dep_c, process_definition_json: {"globalParams":[],"tasks":[{"type":"SHELL","id":"tasks-69503","name":"c","params":{"resourceList":[],"localParams":[],"rawScript":"echo 11"},"description":"","timeout":{"strategy":"","interval":null,"enable":false},"runFlag":"NORMAL","conditionResult":{"successNode":[""],"failedNode":[""]},"dependence":{},"maxRetryTimes":"0","retryInterval":"1","taskInstancePriority":"MEDIUM","workerGroup":"default","preTasks":["dep"]},{"type":"DEPENDENT","id":"tasks-22756","name":"dep","params":{},"description":"","timeout":{"strategy":"","interval":null,"enable":false},"runFlag":"NORMAL","conditionResult":{"successNode":[""],"failedNode":[""]},"dependence":{"relation":"AND","dependTaskList":[{"relation":"AND","dependItemList":[{"projectId":1,"definitionId":1,"depTasks":"b","cycle":"day","dateValue":"today"}]}]},"maxRetryTimes":"0","retryInterval":"1","taskInstancePriority":"MEDIUM","workerGroup":"default","preTasks":[]}],"tenantId":1,"timeout":0}, desc: locations:{"tasks-69503":{"name":"c","targetarr":"tasks-22756","nodenumber":"0","x":597,"y":166},"tasks-22756":{"name":"dep","targetarr":"","nodenumber":"1","x":308,"y":164}}, connects:[{"endPointSourceId":"tasks-22756","endPointTargetId":"tasks-69503"}]
A workflow with a condition node
Below is the request-parameter log printed at the controller entry of the processDefinition save interface:
create process definition, project name: hadoop, process definition name: condition_test, process_definition_json: {"globalParams":[],"tasks":[{"type":"SHELL","id":"tasks-68456","name":"d","params":{"resourceList":[],"localParams":[],"rawScript":"echo 11"},"description":"","timeout":{"strategy":"","interval":null,"enable":false},"runFlag":"NORMAL","conditionResult":{"successNode":[""],"failedNode":[""]},"dependence":{},"maxRetryTimes":"0","retryInterval":"1","taskInstancePriority":"MEDIUM","workerGroup":"default","preTasks":[]},{"type":"SHELL","id":"tasks-58183","name":"e","params":{"resourceList":[],"localParams":[],"rawScript":"echo 22"},"description":"","timeout":{"strategy":"","interval":null,"enable":false},"runFlag":"NORMAL","conditionResult":{"successNode":[""],"failedNode":[""]},"dependence":{},"maxRetryTimes":"0","retryInterval":"1","taskInstancePriority":"MEDIUM","workerGroup":"default","preTasks":["cond"]},{"type":"SHELL","id":"tasks-43996","name":"f","params":{"resourceList":[],"localParams":[],"rawScript":"echo 33"},"description":"","timeout":{"strategy":"","interval":null,"enable":false},"runFlag":"NORMAL","conditionResult":{"successNode":[""],"failedNode":[""]},"dependence":{},"maxRetryTimes":"0","retryInterval":"1","taskInstancePriority":"MEDIUM","workerGroup":"default","preTasks":["cond"]},{"type":"CONDITIONS","id":"tasks-38972","name":"cond","params":{},"description":"","timeout":{"strategy":"","interval":null,"enable":false},"runFlag":"NORMAL","conditionResult":{"successNode":["e"],"failedNode":["f"]},"dependence":{"relation":"AND","dependTaskList":[{"relation":"AND","dependItemList":[{"depTasks":"d","status":"SUCCESS"}]}]},"maxRetryTimes":"0","retryInterval":"1","taskInstancePriority":"MEDIUM","workerGroup":"default","preTasks":["d"]}],"tenantId":1,"timeout":0}, desc: 
locations:{"tasks-68456":{"name":"d","targetarr":"","nodenumber":"1","x":168,"y":158},"tasks-58183":{"name":"e","targetarr":"tasks-38972","nodenumber":"0","x":573,"y":82},"tasks-43996":{"name":"f","targetarr":"tasks-38972","nodenumber":"0","x":591,"y":288},"tasks-38972":{"name":"cond","targetarr":"tasks-68456","nodenumber":"2","x":382,"y":175}}, connects:[{"endPointSourceId":"tasks-68456","endPointTargetId":"tasks-38972"},{"endPointSourceId":"tasks-38972","endPointTargetId":"tasks-58183"},{"endPointSourceId":"tasks-38972","endPointTargetId":"tasks-43996"}]
From the three cases above we can see that every request parameter at the controller entry maps to a column of the t_ds_process_definition table, so the table rows look as shown in the figure below.
Design ideas for the split tables
A workflow is just the visual form of a DAG; tasks are organized through the workflow, and organizing them creates relationships between tasks, i.e. dependencies. Think of a drawing board with shapes on it: the workflow is the board, the shapes are the tasks, and the relationships between the shapes are the dependencies. The core of scheduling is scheduling tasks; dependencies merely express their ordering. Today scheduling is still triggered for the whole workflow, but after the split it becomes easy to schedule individual tasks. This idea drives the split design, and it calls for three tables: a workflow definition table, a task definition table, and a task relation table.
Workflow definition table: describes the workflow's basic information, such as global parameters and the positions of nodes in the DAG
Task definition table: describes a task's details, such as task type, fault-tolerance settings, priority, etc.
Task relation table: describes task relationships, such as the current node and its upstream nodes
Extending this design to versioning simply means adding, for each of the three tables, a log table that stores versions.
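As a concrete illustration, the split can be sketched in a few lines. This is a simplified stand-in, not DolphinScheduler's actual persistence code; the function name and row shapes are made up for the example:

```python
import json

def split_process_definition_json(big_json: str):
    """Split a 1.3.x-style process_definition_json into the three kinds of
    rows described above: one workflow row, task rows, and relation rows."""
    data = json.loads(big_json)
    tasks = data["tasks"]

    # Workflow-level fields stay in the workflow definition row.
    process_row = {
        "global_params": data.get("globalParams", []),
        "tenant_id": data.get("tenantId", -1),
        "timeout": data.get("timeout", 0),
    }

    # Each task becomes one task-definition row.
    task_rows = [{"name": t["name"], "task_type": t["type"]} for t in tasks]

    # Each (preTask -> task) edge becomes one relation row; an entry task
    # with no predecessors still gets a row with an empty pre side.
    relation_rows = []
    for t in tasks:
        for pre in (t.get("preTasks") or [None]):
            relation_rows.append({"pre_task": pre, "post_task": t["name"]})

    return process_row, task_rows, relation_rows
```

Running this on the a-->b workflow from the first case yields one workflow row, two task rows, and two relation rows (entry edge for a, edge a -> b).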
Workflow definition table
Now look at the save-interface logs in the cases above. The existing request fields are project, process_definition_name, desc, locations and connects; once the tasks are removed, what remains of the JSON is
{"globalParams":[],"tenantId":1,"timeout":0}
So the workflow definition table is:
CREATE TABLE `t_ds_process_definition` (
`id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'self-increasing id',
`code` bigint(20) NOT NULL COMMENT 'encoding',
`name` varchar(200) DEFAULT NULL COMMENT 'process definition name',
`version` int(11) DEFAULT NULL COMMENT 'process definition version',
`description` text COMMENT 'description',
`project_code` bigint(20) NOT NULL COMMENT 'project code',
`release_state` tinyint(4) DEFAULT NULL COMMENT 'process definition release state:0:offline,1:online',
`user_id` int(11) DEFAULT NULL COMMENT 'process definition creator id',
`global_params` text COMMENT 'global parameters',
`flag` tinyint(4) DEFAULT NULL COMMENT '0 not available, 1 available',
`locations` text COMMENT 'Node location information',
`connects` text COMMENT 'Node connection information',
`warning_group_id` int(11) DEFAULT NULL COMMENT 'alert group id',
`timeout` int(11) DEFAULT '0' COMMENT 'time out, unit: minute',
`tenant_id` int(11) NOT NULL DEFAULT '-1' COMMENT 'tenant id',
`create_time` datetime NOT NULL COMMENT 'create time',
`update_time` datetime DEFAULT NULL COMMENT 'update time',
PRIMARY KEY (`id`,`code`),
UNIQUE KEY `process_unique` (`name`,`project_code`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `t_ds_process_definition_log` (
`id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'self-increasing id',
`code` bigint(20) NOT NULL COMMENT 'encoding',
`name` varchar(200) DEFAULT NULL COMMENT 'process definition name',
`version` int(11) DEFAULT NULL COMMENT 'process definition version',
`description` text COMMENT 'description',
`project_code` bigint(20) NOT NULL COMMENT 'project code',
`release_state` tinyint(4) DEFAULT NULL COMMENT 'process definition release state:0:offline,1:online',
`user_id` int(11) DEFAULT NULL COMMENT 'process definition creator id',
`global_params` text COMMENT 'global parameters',
`flag` tinyint(4) DEFAULT NULL COMMENT '0 not available, 1 available',
`locations` text COMMENT 'Node location information',
`connects` text COMMENT 'Node connection information',
`warning_group_id` int(11) DEFAULT NULL COMMENT 'alert group id',
`timeout` int(11) DEFAULT '0' COMMENT 'time out,unit: minute',
`tenant_id` int(11) NOT NULL DEFAULT '-1' COMMENT 'tenant id',
`operator` int(11) DEFAULT NULL COMMENT 'operator user id',
`operate_time` datetime DEFAULT NULL COMMENT 'operate time',
`create_time` datetime NOT NULL COMMENT 'create time',
`update_time` datetime DEFAULT NULL COMMENT 'update time',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Comparing the columns, the log table adds only two fields over the main table: operator and operate_time.
Task definition table
Task JSON of the ab workflow in the example
"tasks": [{
"type": "SHELL",
"id": "tasks-77643",
"name": "a",
"params": {
"resourceList": [],
"localParams": [{
"prop": "yesterday",
"direct": "IN",
"type": "VARCHAR",
"value": "${system.biz.date}"
}],
"rawScript": "echo ${yesterday}"
},
"description": "",
"timeout": {
"strategy": "",
"interval": null,
"enable": false
},
"runFlag": "NORMAL",
"conditionResult": {
"successNode": [""],
"failedNode": [""]
},
"dependence": {},
"maxRetryTimes": "0",
"retryInterval": "1",
"taskInstancePriority": "MEDIUM",
"workerGroup": "default",
"preTasks": []
}, {
"type": "SHELL",
"id": "tasks-99814",
"name": "b",
"params": {
"resourceList": [],
"localParams": [{
"prop": "today",
"direct": "IN",
"type": "VARCHAR",
"value": "${system.biz.curdate}"
}],
"rawScript": "echo ${today}"
},
"description": "",
"timeout": {
"strategy": "",
"interval": null,
"enable": false
},
"runFlag": "NORMAL",
"conditionResult": {
"successNode": [""],
"failedNode": [""]
},
"dependence": {},
"maxRetryTimes": "0",
"retryInterval": "1",
"taskInstancePriority": "MEDIUM",
"workerGroup": "default",
"preTasks": ["a"]
}]
Task JSON of the dep_c workflow
"tasks": [{
"type": "SHELL",
"id": "tasks-69503",
"name": "c",
"params": {
"resourceList": [],
"localParams": [],
"rawScript": "echo 11"
},
"description": "",
"timeout": {
"strategy": "",
"interval": null,
"enable": false
},
"runFlag": "NORMAL",
"conditionResult": {
"successNode": [""],
"failedNode": [""]
},
"dependence": {},
"maxRetryTimes": "0",
"retryInterval": "1",
"taskInstancePriority": "MEDIUM",
"workerGroup": "default",
"preTasks": ["dep"]
}, {
"type": "DEPENDENT",
"id": "tasks-22756",
"name": "dep",
"params": {},
"description": "",
"timeout": {
"strategy": "",
"interval": null,
"enable": false
},
"runFlag": "NORMAL",
"conditionResult": {
"successNode": [""],
"failedNode": [""]
},
"dependence": {
"relation": "AND",
"dependTaskList": [{
"relation": "AND",
"dependItemList": [{
"projectId": 1,
"definitionId": 1,
"depTasks": "b",
"cycle": "day",
"dateValue": "today"
}]
}]
},
"maxRetryTimes": "0",
"retryInterval": "1",
"taskInstancePriority": "MEDIUM",
"workerGroup": "default",
"preTasks": []
}]
Task JSON of the condition_test workflow
"tasks": [{
"type": "SHELL",
"id": "tasks-68456",
"name": "d",
"params": {
"resourceList": [],
"localParams": [],
"rawScript": "echo 11"
},
"description": "",
"timeout": {
"strategy": "",
"interval": null,
"enable": false
},
"runFlag": "NORMAL",
"conditionResult": {
"successNode": [""],
"failedNode": [""]
},
"dependence": {},
"maxRetryTimes": "0",
"retryInterval": "1",
"taskInstancePriority": "MEDIUM",
"workerGroup": "default",
"preTasks": []
}, {
"type": "SHELL",
"id": "tasks-58183",
"name": "e",
"params": {
"resourceList": [],
"localParams": [],
"rawScript": "echo 22"
},
"description": "",
"timeout": {
"strategy": "",
"interval": null,
"enable": false
},
"runFlag": "NORMAL",
"conditionResult": {
"successNode": [""],
"failedNode": [""]
},
"dependence": {},
"maxRetryTimes": "0",
"retryInterval": "1",
"taskInstancePriority": "MEDIUM",
"workerGroup": "default",
"preTasks": ["cond"]
}, {
"type": "SHELL",
"id": "tasks-43996",
"name": "f",
"params": {
"resourceList": [],
"localParams": [],
"rawScript": "echo 33"
},
"description": "",
"timeout": {
"strategy": "",
"interval": null,
"enable": false
},
"runFlag": "NORMAL",
"conditionResult": {
"successNode": [""],
"failedNode": [""]
},
"dependence": {},
"maxRetryTimes": "0",
"retryInterval": "1",
"taskInstancePriority": "MEDIUM",
"workerGroup": "default",
"preTasks": ["cond"]
}, {
"type": "CONDITIONS",
"id": "tasks-38972",
"name": "cond",
"params": {},
"description": "",
"timeout": {
"strategy": "",
"interval": null,
"enable": false
},
"runFlag": "NORMAL",
"conditionResult": {
"successNode": ["e"],
"failedNode": ["f"]
},
"dependence": {
"relation": "AND",
"dependTaskList": [{
"relation": "AND",
"dependItemList": [{
"depTasks": "d",
"status": "SUCCESS"
}]
}]
},
"maxRetryTimes": "0",
"retryInterval": "1",
"taskInstancePriority": "MEDIUM",
"workerGroup": "default",
"preTasks": ["d"]
}]
From the cases we can see how the JSON of SHELL/DEPENDENT/CONDITIONS nodes is structured (other task types are similar to SHELL); preTasks identifies the upstream dependency nodes. The conditionResult structure is fairly fixed, while dependence is complex, and DEPENDENT and CONDITIONS tasks even use different dependence structures. For uniformity we therefore fold both conditionResult and dependence into params, which maps to the task_params column.
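A minimal sketch of this consolidation, assuming a hypothetical build_task_params helper (the real conversion lives in DolphinScheduler's service layer):

```python
import json

def build_task_params(task: dict) -> str:
    """Merge a 1.3.x task node's params, conditionResult and dependence
    into the single task_params value of t_ds_task_definition."""
    params = dict(task.get("params", {}))
    # Fold the two variable-structure blocks into params so SHELL,
    # DEPENDENT and CONDITIONS tasks all share one storage shape.
    params["conditionResult"] = task.get("conditionResult", {})
    params["dependence"] = task.get("dependence", {})
    return json.dumps(params)
```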
This settles the t_ds_task_definition table:
CREATE TABLE `t_ds_task_definition` (
`id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'self-increasing id',
`code` bigint(20) NOT NULL COMMENT 'encoding',
`name` varchar(200) DEFAULT NULL COMMENT 'task definition name',
`version` int(11) DEFAULT NULL COMMENT 'task definition version',
`description` text COMMENT 'description',
`project_code` bigint(20) NOT NULL COMMENT 'project code',
`user_id` int(11) DEFAULT NULL COMMENT 'task definition creator id',
`task_type` varchar(50) NOT NULL COMMENT 'task type',
`task_params` text COMMENT 'job custom parameters',
`flag` tinyint(2) DEFAULT NULL COMMENT '0 not available, 1 available',
`task_priority` tinyint(4) DEFAULT NULL COMMENT 'job priority',
`worker_group` varchar(200) DEFAULT NULL COMMENT 'worker grouping',
`fail_retry_times` int(11) DEFAULT NULL COMMENT 'number of failed retries',
`fail_retry_interval` int(11) DEFAULT NULL COMMENT 'failed retry interval',
`timeout_flag` tinyint(2) DEFAULT '0' COMMENT 'timeout flag:0 close, 1 open',
`timeout_notify_strategy` tinyint(4) DEFAULT NULL COMMENT 'timeout notification policy: 0 warning, 1 fail',
`timeout` int(11) DEFAULT '0' COMMENT 'timeout length,unit: minute',
`delay_time` int(11) DEFAULT '0' COMMENT 'delay execution time,unit: minute',
`resource_ids` varchar(255) DEFAULT NULL COMMENT 'resource id, separated by comma',
`create_time` datetime NOT NULL COMMENT 'create time',
`update_time` datetime DEFAULT NULL COMMENT 'update time',
PRIMARY KEY (`id`,`code`),
UNIQUE KEY `task_unique` (`name`,`project_code`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `t_ds_task_definition_log` (
`id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'self-increasing id',
`code` bigint(20) NOT NULL COMMENT 'encoding',
`name` varchar(200) DEFAULT NULL COMMENT 'task definition name',
`version` int(11) DEFAULT NULL COMMENT 'task definition version',
`description` text COMMENT 'description',
`project_code` bigint(20) NOT NULL COMMENT 'project code',
`user_id` int(11) DEFAULT NULL COMMENT 'task definition creator id',
`task_type` varchar(50) NOT NULL COMMENT 'task type',
`task_params` text COMMENT 'job custom parameters',
`flag` tinyint(2) DEFAULT NULL COMMENT '0 not available, 1 available',
`task_priority` tinyint(4) DEFAULT NULL COMMENT 'job priority',
`worker_group` varchar(200) DEFAULT NULL COMMENT 'worker grouping',
`fail_retry_times` int(11) DEFAULT NULL COMMENT 'number of failed retries',
`fail_retry_interval` int(11) DEFAULT NULL COMMENT 'failed retry interval',
`timeout_flag` tinyint(2) DEFAULT '0' COMMENT 'timeout flag:0 close, 1 open',
`timeout_notify_strategy` tinyint(4) DEFAULT NULL COMMENT 'timeout notification policy: 0 warning, 1 fail',
`timeout` int(11) DEFAULT '0' COMMENT 'timeout length,unit: minute',
`delay_time` int(11) DEFAULT '0' COMMENT 'delay execution time,unit: minute',
`resource_ids` varchar(255) DEFAULT NULL COMMENT 'resource id, separated by comma',
`operator` int(11) DEFAULT NULL COMMENT 'operator user id',
`operate_time` datetime DEFAULT NULL COMMENT 'operate time',
`create_time` datetime NOT NULL COMMENT 'create time',
`update_time` datetime DEFAULT NULL COMMENT 'update time',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
// Note the differences between the dev version and 1.3.6: dev has renamed description to desc and added delayTime
{
"globalParams": [],
"tasks": [{
"type": "SHELL",
"id": "tasks-18200",
"name": "d",
"code": "",
"params": {
"resourceList": [],
"localParams": [],
"rawScript": "echo 5"
},
"desc": "",
"runFlag": "NORMAL",
"conditionResult": {
"successNode": [
""
],
"failedNode": [
""
]
},
"dependence": {},
"maxRetryTimes": "0",
"retryInterval": "1",
"delayTime": "0",
"timeout": {
"strategy": "",
"interval": null,
"enable": false
},
"waitStartTimeout": {},
"taskInstancePriority": "MEDIUM",
"workerGroup": "hadoop",
"preTasks": [],
"depList": null
},
{
"type": "SHELL",
"id": "tasks-55225",
"name": "e",
"code": "",
"params": {
"resourceList": [],
"localParams": [],
"rawScript": "echo 6"
},
"desc": "",
"runFlag": "NORMAL",
"conditionResult": {
"successNode": [
""
],
"failedNode": [
""
]
},
"dependence": {},
"maxRetryTimes": "0",
"retryInterval": "1",
"delayTime": "0",
"timeout": {
"strategy": "",
"interval": null,
"enable": false
},
"waitStartTimeout": {},
"taskInstancePriority": "MEDIUM",
"workerGroup": "hadoop",
"preTasks": [
"def"
],
"depList": null
},
{
"type": "SHELL",
"id": "tasks-67639",
"name": "f",
"code": "",
"params": {
"resourceList": [],
"localParams": [],
"rawScript": "echo 7"
},
"desc": "",
"runFlag": "NORMAL",
"conditionResult": {
"successNode": [
""
],
"failedNode": [
""
]
},
"dependence": {},
"maxRetryTimes": "0",
"retryInterval": "1",
"delayTime": "0",
"timeout": {
"strategy": "",
"interval": null,
"enable": false
},
"waitStartTimeout": {},
"taskInstancePriority": "MEDIUM",
"workerGroup": "hadoop",
"preTasks": [
"def"
],
"depList": null
},
{
"type": "CONDITIONS",
"id": "tasks-67387",
"name": "def",
"code": "",
"params": {},
"desc": "",
"runFlag": "NORMAL",
"conditionResult": {
"successNode": [
"e"
],
"failedNode": [
"f"
]
},
"dependence": {
"relation": "AND",
"dependTaskList": [{
"relation": "AND",
"dependItemList": [{
"depTasks": "d",
"status": "SUCCESS"
},
{
"depTasks": "d",
"status": "FAILURE"
}
]
}]
},
"maxRetryTimes": "0",
"retryInterval": "1",
"delayTime": "0",
"timeout": {
"strategy": "",
"interval": null,
"enable": false
},
"waitStartTimeout": {},
"taskInstancePriority": "MEDIUM",
"workerGroup": "hadoop",
"preTasks": [
"d"
],
"depList": null
}
],
"tenantId": 1,
"timeout": 0
}
Task relation table
preTasks identifies the upstream dependency nodes; in the relation table the current node is identified as postTask. Since the current node always exists while an upstream node may not, post can never be empty, whereas preTask may be.
CREATE TABLE `t_ds_process_task_relation` (
`id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'self-increasing id',
`name` varchar(200) DEFAULT NULL COMMENT 'relation name',
`process_definition_version` int(11) DEFAULT NULL COMMENT 'process version',
`project_code` bigint(20) NOT NULL COMMENT 'project code',
`process_definition_code` bigint(20) NOT NULL COMMENT 'process code',
`pre_task_code` bigint(20) NOT NULL COMMENT 'pre task code',
`pre_task_version` int(11) NOT NULL COMMENT 'pre task version',
`post_task_code` bigint(20) NOT NULL COMMENT 'post task code',
`post_task_version` int(11) NOT NULL COMMENT 'post task version',
`condition_type` tinyint(2) DEFAULT NULL COMMENT 'condition type : 0 none, 1 judge 2 delay',
`condition_params` text COMMENT 'condition params(json)',
`create_time` datetime NOT NULL COMMENT 'create time',
`update_time` datetime DEFAULT NULL COMMENT 'update time',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `t_ds_process_task_relation_log` (
`id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'self-increasing id',
`name` varchar(200) DEFAULT NULL COMMENT 'relation name',
`process_definition_version` int(11) DEFAULT NULL COMMENT 'process version',
`project_code` bigint(20) NOT NULL COMMENT 'project code',
`process_definition_code` bigint(20) NOT NULL COMMENT 'process code',
`pre_task_code` bigint(20) NOT NULL COMMENT 'pre task code',
`pre_task_version` int(11) NOT NULL COMMENT 'pre task version',
`post_task_code` bigint(20) NOT NULL COMMENT 'post task code',
`post_task_version` int(11) NOT NULL COMMENT 'post task version',
`condition_type` tinyint(2) DEFAULT NULL COMMENT 'condition type : 0 none, 1 judge 2 delay',
`condition_params` text COMMENT 'condition params(json)',
`operator` int(11) DEFAULT NULL COMMENT 'operator user id',
`operate_time` datetime DEFAULT NULL COMMENT 'operate time',
`create_time` datetime NOT NULL COMMENT 'create time',
`update_time` datetime DEFAULT NULL COMMENT 'update time',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
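To make the pre/post convention concrete, here is a sketch that turns preTasks lists into relation-table-style rows. The helper name and the code values are illustrative assumptions, not the actual implementation:

```python
def build_relation_rows(tasks, name_to_code, process_code, process_version=1):
    """Turn preTasks lists into t_ds_process_task_relation-style rows.
    post_task_code is always set; pre_task_code/version are 0 for entry
    nodes, matching the table's 'post never empty, pre may be' rule."""
    rows = []
    for t in tasks:
        for pre in (t.get("preTasks") or [None]):
            rows.append({
                "process_definition_code": process_code,
                "process_definition_version": process_version,
                "pre_task_code": name_to_code[pre] if pre else 0,
                "pre_task_version": 1 if pre else 0,
                "post_task_code": name_to_code[t["name"]],
                "post_task_version": 1,
            })
    return rows
```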
For scenarios with complex dependencies the model is unchanged: every pre/post pair is simply one more row in the relation table.
How the API module is refactored
When the api module performs a save operation:
A 13-digit number generated by the snowflake algorithm becomes process_definition_code; the workflow definition is saved to process_definition (main table) and process_definition_log (log table), both holding identical data, with workflow definition version 1
A 13-digit snowflake number likewise becomes task_definition_code; task definitions are saved to task_definition (main table) and task_definition_log (log table), again with identical data, with task definition version 1
Workflow task relations are saved to process_task_relation (main table) and process_task_relation_log (log table). The code and version this table stores are the workflow's, because tasks are organized through workflows and the DAG is drawn per workflow. The current DAG node is identified by post_task_code and post_task_version, and its upstream dependency by pre_task_code and pre_task_version; when a node has no dependency, pre_task_code and pre_task_version are 0
When the api module performs an update, workflow definitions and task definitions update the main tables directly, and the updated data is inserted into the log tables; for relations, the main-table rows are deleted first and the new relations inserted, while the relation log table just receives the new relations
When the api module performs a delete, workflow definition, task definition and relation rows are deleted from the main tables directly; log-table data is left untouched
When the api module performs a switch, the corresponding version in the log table simply overwrites the main table
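The code-generation step in the save flow can be sketched with a toy snowflake-style generator: a millisecond timestamp, a machine id and a per-millisecond sequence packed into one integer. This is only an illustration of the idea; DolphinScheduler's real generator and its bit layout differ:

```python
import threading
import time

class SnowflakeLike:
    """Toy snowflake-style unique-code generator (illustrative only).
    Packs timestamp | machine id | sequence into one integer so codes
    are unique across rows and roughly time-ordered."""
    def __init__(self, machine_id: int = 0):
        self.machine_id = machine_id & 0x3FF  # 10 bits for the machine
        self.sequence = 0                     # 12-bit per-millisecond counter
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_code(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF
            else:
                self.sequence = 0
            self.last_ms = now
            return (now << 22) | (self.machine_id << 12) | self.sequence
```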
How data interaction is refactored
In phase one of the JSON split the api module's controller layer is unchanged: the incoming big JSON is still mapped to a ProcessData object in the service layer. Insert and update operations save to the database through the ProcessService.saveProcessDefiniton() entry point in the shared Service module, saving in the order task_definition, process_task_relation, process_definition. On save, if a task already exists and its associated workflow is not online, the task is updated; if the associated workflow is online, changing the task is not allowed.
Queries through the api are still by workflow id; data is assembled through the ProcessService.genTaskNodeList() entry point in the shared Service module, still into a ProcessData object, from which the JSON response is generated.
The Server module (Master) likewise obtains the TaskNodeList via ProcessService.genTaskNodeList() to build the scheduling DAG, and puts each task's full information into the MasterExecThread.readyToSubmitTaskQueue queue so task instances can be generated and dispatched to workers.
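Conceptually, rebuilding the TaskNodeList means recovering each task's upstream set (the old preTasks) from the pre/post relation rows. A sketch of that reconstruction, with hypothetical names:

```python
from collections import defaultdict

def gen_task_node_list(relation_rows):
    """Rebuild each task's predecessor list from pre/post relation rows,
    as the service layer must do after the split. pre_task_code == 0
    marks an entry node with no upstream dependency."""
    deps = defaultdict(list)
    for r in relation_rows:
        deps[r["post_task_code"]]  # ensure every post node appears
        if r["pre_task_code"] != 0:
            deps[r["post_task_code"]].append(r["pre_task_code"])
    return dict(deps)
```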
What remains to be done for the JSON split
Refactor the controller-facing REST API
Refactor the DAG in the ui module
Add task operation pages to the ui module
processDefinition
Note: the taskRelationJson format:
[{"name":"","pre_task_code":0,"pre_task_version":0,"post_task_code":123456789,"post_task_version":1,"condition_type":0,"condition_params":{}},{"name":"","pre_task_code":123456789,"pre_task_version":1,"post_task_code":123451234,"post_task_version":1,"condition_type":0,"condition_params":{}}]
Likewise, the processDefinitionId request parameter of the other interfaces is replaced with code
schedule
taskDefinition (new)
taskDefinitionJson:[{"name":"test","description":"","task_type":"SHELL","task_params":[],"flag":0,"task_priority":0,"worker_group":"default","fail_retry_times":0,"fail_retry_interval":0,"timeout_flag":0,"timeout_notify_strategy":0,"timeout":0,"delay_time":0,"resource_ids":""}]
Related requirement issues
[Feature][JsonSplit-api] api module controller design #5498
[Feature][JsonSplit-api]processDefinition save/update interface #5499
[Feature][JsonSplit-api]processDefinition switch interface #5501
[Feature][JsonSplit-api]processDefinition delete interface #5502
[Feature][JsonSplit-api]processDefinition copy interface #5503
[Feature][JsonSplit-api]processDefinition export interface #5504
[Feature][JsonSplit-api]processDefinition list-paging interface #5505
[Feature][JsonSplit-api]processDefinition move interface #5506
[Feature][JsonSplit-api]processDefinition queryProcessDefinitionAllByProjectId interface #5507
[Feature][JsonSplit-api]processDefinition select-by-id interface #5508
[Feature][JsonSplit-api]processDefinition view-tree interface #5509
[Feature][JsonSplit-api]schedule create interface #5510
[Feature][JsonSplit-api]schedule list-paging interface #5511
[Feature][JsonSplit-api]schedule update interface #5512
[Feature][JsonSplit-api]taskDefinition save interface #5513
[Feature][JsonSplit-api]taskDefinition update interface #5514
[Feature][JsonSplit-api]taskDefinition switch interface #5515
[Feature][JsonSplit-api]taskDefinition query interface #5516
[Feature][JsonSplit-api]taskDefinition delete interface #5517
[Feature][JsonSplit-api]WorkFlowLineage interface #5518
[Feature][JsonSplit-api]analysis interface #5519
[Feature][JsonSplit-api]executors interface #5520
[Feature][JsonSplit-api]processInstance interface #5521
[Feature][JsonSplit-api]project interface #5522
GitHub issues: https://github.com/apache/dolphinscheduler/issues
Slidestalk (示说网): https://www.slidestalk.com/