元数据设计

如上图,Doris 的元数据主要存储4类数据:

  • 用户数据信息。包括数据库、表的 Schema、分片信息等。
  • 各类作业信息。如导入作业,Clone 作业、SchemaChange 作业等。
  • 用户及权限信息
  • 集群及节点信息

元数据目录

元数据目录通过 FE 的配置项 meta_dir 指定。

bdb/ 目录下为 bdbje 的数据存放目录。

image/ 目录下为 image 文件的存放目录。

image.[logid] 是最新的 image 文件。后缀 logid 表明 image 所包含的最后一条日志的 id。

image.ckpt 是正在写入的 image 文件,如果写入成功,会重命名为 image.[logid],并替换掉旧的 image 文件。

VERSION 文件中记录着 cluster_id。cluster_id 唯一标识一个 Doris 集群。是在 leader 第一次启动时随机生成的一个 32 位整型。也可以通过 fe 配置项 cluster_id 来指定一个 cluster id。

ROLE 文件中记录的 FE 自身的角色。只有 FOLLOWER 和 OBSERVER 两种。其中 FOLLOWER 表示 FE 为一个可选举的节点。(注意:即使是 leader 节点,其角色也为 FOLLOWER)

DDL相关源代码阅读

启动MySQL服务

org.apache.doris.qe.QeService

if (nioEnabled) {
mysqlServer = new NMysqlServer(port, scheduler);
} else {
mysqlServer = new MysqlServer(port, scheduler);
}

DDL代码调用过程

org.apache.doris.qe.ConnectProcessor#dispatch 命令识别

switch (command) {
case COM_INIT_DB:
handleInitDb();
break;
case COM_QUIT:
handleQuit();
break;
case COM_QUERY:
handleQuery();
break;
case COM_FIELD_LIST:
handleFieldList();
break;
case COM_PING:
handlePing();
break;
default:
ctx.getState().setError("Unsupported command(" + command + ")");
LOG.warn("Unsupported command(" + command + ")");
break;

org.apache.doris.qe.ConnectProcessor#analyze 词法语法解析

// Parse statement with parser generated by CUP&FLEX
SqlScanner input = new SqlScanner(new StringReader(originStmt), ctx.getSessionVariable().getSqlMode());
SqlParser parser = new SqlParser(input);

从连接中读取原始语句字符串

词法解析文件

• fe/fe-core/src/main/jflex/sql_scanner.flex

• 语法解析文件

• fe/fe-core/src/main/cup/sql_parser.cup

所有语法实现类:

StatementBase     [vim org/apache/doris/analysis/StatementBase.java +33]
├── ExportStmt [vim org/apache/doris/analysis/ExportStmt.java +48]
├── ImportColumnsStmt [vim org/apache/doris/analysis/ImportColumnsStmt.java +21]
├── ImportDeleteOnStmt [vim org/apache/doris/analysis/ImportDeleteOnStmt.java +19]
├── ImportSequenceStmt [vim org/apache/doris/analysis/ImportSequenceStmt.java +19]
├── ImportWhereStmt [vim org/apache/doris/analysis/ImportWhereStmt.java +19]
├── KillStmt [vim org/apache/doris/analysis/KillStmt.java +19]
├── SetStmt [vim org/apache/doris/analysis/SetStmt.java +24]
├── UseStmt [vim org/apache/doris/analysis/UseStmt.java +33]
├── QueryStmt [vim org/apache/doris/analysis/QueryStmt.java +38]
│ ├── SelectStmt [vim org/apache/doris/analysis/SelectStmt.java +65]
│ └── SetOperationStmt [vim org/apache/doris/analysis/SetOperationStmt.java +36]
├── ShowStmt [vim org/apache/doris/analysis/ShowStmt.java +22]
│ ├── AdminShowConfigStmt [vim org/apache/doris/analysis/AdminShowConfigStmt.java +33]
│ ├── AdminShowDataSkewStmt [vim org/apache/doris/analysis/AdminShowDataSkewStmt.java +32]
│ ├── AdminShowReplicaDistributionStmt [vim org/apache/doris/analysis/AdminShowReplicaDistributionStmt.java +34]
│ ├── AdminShowReplicaStatusStmt [vim org/apache/doris/analysis/AdminShowReplicaStatusStmt.java +39]
│ ├── DescribeStmt [vim org/apache/doris/analysis/DescribeStmt.java +54]
│ ├── HelpStmt [vim org/apache/doris/analysis/HelpStmt.java +26]
│ ├── ShowAlterStmt [vim org/apache/doris/analysis/ShowAlterStmt.java +46]
│ ├── ShowAuthorStmt [vim org/apache/doris/analysis/ShowAuthorStmt.java +23]
│ ├── ShowBackendsStmt [vim org/apache/doris/analysis/ShowBackendsStmt.java +30]
│ ├── ShowBackupStmt [vim org/apache/doris/analysis/ShowBackupStmt.java +38]
│ ├── ShowBrokerStmt [vim org/apache/doris/analysis/ShowBrokerStmt.java +30]
│ ├── ShowCharsetStmt [vim org/apache/doris/analysis/ShowCharsetStmt.java +23]
│ ├── ShowClusterStmt [vim org/apache/doris/analysis/ShowClusterStmt.java +34]
│ ├── ShowCollationStmt [vim org/apache/doris/analysis/ShowCollationStmt.java +24]
│ ├── ShowColumnStatsStmt [vim org/apache/doris/analysis/ShowColumnStatsStmt.java +28]
│ ├── ShowColumnStmt [vim org/apache/doris/analysis/ShowColumnStmt.java +28]
│ ├── ShowCreateDbStmt [vim org/apache/doris/analysis/ShowCreateDbStmt.java +36]
│ ├── ShowCreateFunctionStmt [vim org/apache/doris/analysis/ShowCreateFunctionStmt.java +32]
│ ├── ShowCreateRoutineLoadStmt [vim org/apache/doris/analysis/ShowCreateRoutineLoadStmt.java +24]
│ ├── ShowCreateTableStmt [vim org/apache/doris/analysis/ShowCreateTableStmt.java +29]
│ ├── ShowDataStmt [vim org/apache/doris/analysis/ShowDataStmt.java +56]
│ ├── ShowDbIdStmt [vim org/apache/doris/analysis/ShowDbIdStmt.java +29]
│ ├── ShowDbStmt [vim org/apache/doris/analysis/ShowDbStmt.java +27]
│ ├── ShowDeleteStmt [vim org/apache/doris/analysis/ShowDeleteStmt.java +31]
│ ├── ShowDynamicPartitionStmt [vim org/apache/doris/analysis/ShowDynamicPartitionStmt.java +29]
│ ├── ShowEncryptKeysStmt [vim org/apache/doris/analysis/ShowEncryptKeysStmt.java +32]
│ ├── ShowEnginesStmt [vim org/apache/doris/analysis/ShowEnginesStmt.java +23]
│ ├── ShowEventsStmt [vim org/apache/doris/analysis/ShowEventsStmt.java +23]
│ ├── ShowExportStmt [vim org/apache/doris/analysis/ShowExportStmt.java +40]
│ ├── ShowFrontendsStmt [vim org/apache/doris/analysis/ShowFrontendsStmt.java +30]
│ ├── ShowFunctionsStmt [vim org/apache/doris/analysis/ShowFunctionsStmt.java +32]
│ ├── ShowGrantsStmt [vim org/apache/doris/analysis/ShowGrantsStmt.java +32]
│ ├── ShowIndexStmt [vim org/apache/doris/analysis/ShowIndexStmt.java +32]
│ ├── ShowLoadProfileStmt [vim org/apache/doris/analysis/ShowLoadProfileStmt.java +27]
│ ├── ShowLoadStmt [vim org/apache/doris/analysis/ShowLoadStmt.java +42]
│ ├── ShowLoadWarningsStmt [vim org/apache/doris/analysis/ShowLoadWarningsStmt.java +36]
│ ├── ShowMigrationsStmt [vim org/apache/doris/analysis/ShowMigrationsStmt.java +31]
│ ├── ShowOpenTableStmt [vim org/apache/doris/analysis/ShowOpenTableStmt.java +23]
│ ├── ShowPartitionIdStmt [vim org/apache/doris/analysis/ShowPartitionIdStmt.java +29]
│ ├── ShowPartitionsStmt [vim org/apache/doris/analysis/ShowPartitionsStmt.java +49]
│ ├── ShowPluginsStmt [vim org/apache/doris/analysis/ShowPluginsStmt.java +23]
│ ├── ShowProcStmt [vim org/apache/doris/analysis/ShowProcStmt.java +32]
│ ├── ShowProcedureStmt [vim org/apache/doris/analysis/ShowProcedureStmt.java +23]
│ ├── ShowProcesslistStmt [vim org/apache/doris/analysis/ShowProcesslistStmt.java +24]
│ ├── ShowQueryProfileStmt [vim org/apache/doris/analysis/ShowQueryProfileStmt.java +27]
│ ├── ShowRepositoriesStmt [vim org/apache/doris/analysis/ShowRepositoriesStmt.java +25]
│ ├── ShowResourcesStmt [vim org/apache/doris/analysis/ShowResourcesStmt.java +37]
│ ├── ShowRestoreStmt [vim org/apache/doris/analysis/ShowRestoreStmt.java +38]
│ ├── ShowRolesStmt [vim org/apache/doris/analysis/ShowRolesStmt.java +29]
│ ├── ShowRollupStmt [vim org/apache/doris/analysis/ShowRollupStmt.java +28]
│ ├── ShowRoutineLoadStmt [vim org/apache/doris/analysis/ShowRoutineLoadStmt.java +34]
│ ├── ShowRoutineLoadTaskStmt [vim org/apache/doris/analysis/ShowRoutineLoadTaskStmt.java +32]
│ ├── ShowSmallFilesStmt [vim org/apache/doris/analysis/ShowSmallFilesStmt.java +32]
│ ├── ShowSnapshotStmt [vim org/apache/doris/analysis/ShowSnapshotStmt.java +29]
│ ├── ShowSqlBlockRuleStmt [vim org/apache/doris/analysis/ShowSqlBlockRuleStmt.java +31]
│ ├── ShowStatusStmt [vim org/apache/doris/analysis/ShowStatusStmt.java +23]
│ ├── ShowStreamLoadStmt [vim org/apache/doris/analysis/ShowStreamLoadStmt.java +39]
│ ├── ShowSyncJobStmt [vim org/apache/doris/analysis/ShowSyncJobStmt.java +33]
│ ├── ShowTableIdStmt [vim org/apache/doris/analysis/ShowTableIdStmt.java +30]
│ ├── ShowTableStatsStmt [vim org/apache/doris/analysis/ShowTableStatsStmt.java +32]
│ ├── ShowTableStatusStmt [vim org/apache/doris/analysis/ShowTableStatusStmt.java +35]
│ ├── ShowTableStmt [vim org/apache/doris/analysis/ShowTableStmt.java +34]
│ ├── ShowTabletStmt [vim org/apache/doris/analysis/ShowTabletStmt.java +39]
│ ├── ShowTransactionStmt [vim org/apache/doris/analysis/ShowTransactionStmt.java +35]
│ ├── ShowTrashDiskStmt [vim org/apache/doris/analysis/ShowTrashDiskStmt.java +33]
│ ├── ShowTrashStmt [vim org/apache/doris/analysis/ShowTrashStmt.java +36]
│ ├── ShowTriggersStmt [vim org/apache/doris/analysis/ShowTriggersStmt.java +23]
│ ├── ShowUserPropertyStmt [vim org/apache/doris/analysis/ShowUserPropertyStmt.java +42]
│ ├── ShowUserStmt [vim org/apache/doris/analysis/ShowUserStmt.java +25]
│ ├── ShowVariablesStmt [vim org/apache/doris/analysis/ShowVariablesStmt.java +29]
│ ├── ShowViewStmt [vim org/apache/doris/analysis/ShowViewStmt.java +39]
│ ├── ShowWarningStmt [vim org/apache/doris/analysis/ShowWarningStmt.java +23]
│ └── ShowWhiteListStmt [vim org/apache/doris/analysis/ShowWhiteListStmt.java +23]
├── TransactionStmt [vim org/apache/doris/analysis/TransactionStmt.java +22]
│ ├── TransactionBeginStmt [vim org/apache/doris/analysis/TransactionBeginStmt.java +24]
│ ├── TransactionCommitStmt [vim org/apache/doris/analysis/TransactionCommitStmt.java +19]
│ └── TransactionRollbackStmt [vim org/apache/doris/analysis/TransactionRollbackStmt.java +19]
├── UnsupportedStmt [vim org/apache/doris/analysis/UnsupportedStmt.java +22]
│ └── EmptyStmt [vim org/apache/doris/analysis/EmptyStmt.java +19]
└── DdlStmt [vim org/apache/doris/analysis/DdlStmt.java +19]
├── AdminCancelRepairTableStmt [vim org/apache/doris/analysis/AdminCancelRepairTableStmt.java +33]
├── AdminCheckTabletsStmt [vim org/apache/doris/analysis/AdminCheckTabletsStmt.java +33]
├── AdminCleanTrashStmt [vim org/apache/doris/analysis/AdminCleanTrashStmt.java +34]
├── AdminRepairTableStmt [vim org/apache/doris/analysis/AdminRepairTableStmt.java +33]
├── AdminSetConfigStmt [vim org/apache/doris/analysis/AdminSetConfigStmt.java +32]
├── AdminSetReplicaStatusStmt [vim org/apache/doris/analysis/AdminSetReplicaStatusStmt.java +30]
├── AlterClusterStmt [vim org/apache/doris/analysis/AlterClusterStmt.java +29]
├── AlterColumnStatsStmt [vim org/apache/doris/analysis/AlterColumnStatsStmt.java +33]
├── AlterDatabasePropertyStmt [vim org/apache/doris/analysis/AlterDatabasePropertyStmt.java +24]
├── AlterDatabaseQuotaStmt [vim org/apache/doris/analysis/AlterDatabaseQuotaStmt.java +30]
├── AlterDatabaseRename [vim org/apache/doris/analysis/AlterDatabaseRename.java +34]
├── AlterRoutineLoadStmt [vim org/apache/doris/analysis/AlterRoutineLoadStmt.java +34]
├── AlterSqlBlockRuleStmt [vim org/apache/doris/analysis/AlterSqlBlockRuleStmt.java +31]
├── AlterSystemStmt [vim org/apache/doris/analysis/AlterSystemStmt.java +28]
├── AlterTableStatsStmt [vim org/apache/doris/analysis/AlterTableStatsStmt.java +33]
├── AlterTableStmt [vim org/apache/doris/analysis/AlterTableStmt.java +37]
├── CancelLoadStmt [vim org/apache/doris/analysis/CancelLoadStmt.java +26]
├── CreateClusterStmt [vim org/apache/doris/analysis/CreateClusterStmt.java +33]
├── CreateDataSyncJobStmt [vim org/apache/doris/analysis/CreateDataSyncJobStmt.java +36]
├── CreateDbStmt [vim org/apache/doris/analysis/CreateDbStmt.java +31]
├── CreateEncryptKeyStmt [vim org/apache/doris/analysis/CreateEncryptKeyStmt.java +30]
├── CreateFileStmt [vim org/apache/doris/analysis/CreateFileStmt.java +35]
├── CreateFunctionStmt [vim org/apache/doris/analysis/CreateFunctionStmt.java +47]
├── CreateMaterializedViewStmt [vim org/apache/doris/analysis/CreateMaterializedViewStmt.java +43]
├── CreateRepositoryStmt [vim org/apache/doris/analysis/CreateRepositoryStmt.java +28]
├── CreateResourceStmt [vim org/apache/doris/analysis/CreateResourceStmt.java +32]
├── CreateRoleStmt [vim org/apache/doris/analysis/CreateRoleStmt.java +28]
├── CreateRoutineLoadStmt [vim org/apache/doris/analysis/CreateRoutineLoadStmt.java +48]
├── CreateSqlBlockRuleStmt [vim org/apache/doris/analysis/CreateSqlBlockRuleStmt.java +37]
├── CreateTableAsSelectStmt [vim org/apache/doris/analysis/CreateTableAsSelectStmt.java +26]
├── CreateTableLikeStmt [vim org/apache/doris/analysis/CreateTableLikeStmt.java +31]
├── CreateTableStmt [vim org/apache/doris/analysis/CreateTableStmt.java +56]
├── CreateUserStmt [vim org/apache/doris/analysis/CreateUserStmt.java +36]
├── DeleteStmt [vim org/apache/doris/analysis/DeleteStmt.java +35]
├── DropClusterStmt [vim org/apache/doris/analysis/DropClusterStmt.java +31]
├── DropDbStmt [vim org/apache/doris/analysis/DropDbStmt.java +30]
├── DropEncryptKeyStmt [vim org/apache/doris/analysis/DropEncryptKeyStmt.java +28]
├── DropFileStmt [vim org/apache/doris/analysis/DropFileStmt.java +34]
├── DropFunctionStmt [vim org/apache/doris/analysis/DropFunctionStmt.java +27]
├── DropMaterializedViewStmt [vim org/apache/doris/analysis/DropMaterializedViewStmt.java +29]
├── DropRepositoryStmt [vim org/apache/doris/analysis/DropRepositoryStmt.java +27]
├── DropResourceStmt [vim org/apache/doris/analysis/DropResourceStmt.java +27]
├── DropRoleStmt [vim org/apache/doris/analysis/DropRoleStmt.java +28]
├── DropSqlBlockRuleStmt [vim org/apache/doris/analysis/DropSqlBlockRuleStmt.java +30]
├── DropTableStmt [vim org/apache/doris/analysis/DropTableStmt.java +28]
├── DropUserStmt [vim org/apache/doris/analysis/DropUserStmt.java +27]
├── EnterStmt [vim org/apache/doris/analysis/EnterStmt.java +25]
├── GrantStmt [vim org/apache/doris/analysis/GrantStmt.java +39]
├── InsertStmt [vim org/apache/doris/analysis/InsertStmt.java +66]
├── InstallPluginStmt [vim org/apache/doris/analysis/InstallPluginStmt.java +31]
├── LinkDbStmt [vim org/apache/doris/analysis/LinkDbStmt.java +31]
├── LoadStmt [vim org/apache/doris/analysis/LoadStmt.java +45]
├── MigrateDbStmt [vim org/apache/doris/analysis/MigrateDbStmt.java +29]
├── PauseRoutineLoadStmt [vim org/apache/doris/analysis/PauseRoutineLoadStmt.java +26]
├── PauseSyncJobStmt [vim org/apache/doris/analysis/PauseSyncJobStmt.java +22]
├── RecoverDbStmt [vim org/apache/doris/analysis/RecoverDbStmt.java +33]
├── RecoverPartitionStmt [vim org/apache/doris/analysis/RecoverPartitionStmt.java +32]
├── RecoverTableStmt [vim org/apache/doris/analysis/RecoverTableStmt.java +32]
├── ResumeRoutineLoadStmt [vim org/apache/doris/analysis/ResumeRoutineLoadStmt.java +26]
├── ResumeSyncJobStmt [vim org/apache/doris/analysis/ResumeSyncJobStmt.java +22]
├── RevokeStmt [vim org/apache/doris/analysis/RevokeStmt.java +32]
├── SetUserPropertyStmt [vim org/apache/doris/analysis/SetUserPropertyStmt.java +31]
├── StopRoutineLoadStmt [vim org/apache/doris/analysis/StopRoutineLoadStmt.java +23]
├── StopSyncJobStmt [vim org/apache/doris/analysis/StopSyncJobStmt.java +22]
├── SyncStmt [vim org/apache/doris/analysis/SyncStmt.java +22]
├── TruncateTableStmt [vim org/apache/doris/analysis/TruncateTableStmt.java +27]
├── UninstallPluginStmt [vim org/apache/doris/analysis/UninstallPluginStmt.java +28]
├── UpdateStmt [vim org/apache/doris/analysis/UpdateStmt.java +35]
├── AbstractBackupStmt [vim org/apache/doris/analysis/AbstractBackupStmt.java +36]
│ ├── BackupStmt [vim org/apache/doris/analysis/BackupStmt.java +29]
│ └── RestoreStmt [vim org/apache/doris/analysis/RestoreStmt.java +33]
├── BaseViewStmt [vim org/apache/doris/analysis/BaseViewStmt.java +39]
│ ├── AlterViewStmt [vim org/apache/doris/analysis/AlterViewStmt.java +31]
│ └── CreateViewStmt [vim org/apache/doris/analysis/CreateViewStmt.java +33]
└── CancelStmt [vim org/apache/doris/analysis/CancelStmt.java +19]
├── CancelAlterSystemStmt [vim org/apache/doris/analysis/CancelAlterSystemStmt.java +28]
├── CancelAlterTableStmt [vim org/apache/doris/analysis/CancelAlterTableStmt.java +31]
└── CancelBackupStmt [vim org/apache/doris/analysis/CancelBackupStmt.java +30]

org.apache.doris.qe.StmtExecutor#execute(TUniqueId)

analyze(context.getSessionVariable().toThrift()); //语义解析
if (isForwardToMaster()) {
forwardToMaster(); //转发处理
if (masterOpExecutor != null && masterOpExecutor.getQueryId() != null) {
context.setQueryId(masterOpExecutor.getQueryId());
}
return;
} else {
LOG.debug("no need to transfer to Master. stmt: {}", context.getStmtId());
} //命令执行
else if (parsedStmt instanceof DdlStmt) {
handleDdlStmt();
} else if (parsedStmt instanceof ShowStmt) {
handleShow();
} else if (parsedStmt instanceof KillStmt) {
handleKill();
}

语义解析

判断含义的正确性

@Override
public void analyze(Analyzer analyzer) throws UserException { super.analyze(analyzer);
tableName.analyze(analyzer); checkTblPriv(ConnectContext.get(), tableName.getDb(),
tableName.getTbl(), PrivPredicate.CREATE) analyzeEngineName(); keysDesc.analyze(columnDefs); for (ColumnDef columnDef : columnDefs) {
columnDef.analyze(engineName.equals("olap"));
} partitionDesc.analyze(columnDefs, properties); distributionDesc.analyze(columnSet);
}

验证名称是否合法

权限是否正确

分区是否合法

列类型是否合法

转发处理

Master、Follower、Observer

只有Master有元数据的修改能力

所有需要修改元数据的操作,需要转发到Master去执行

转发类型:

FORWARD_NO_SYNC

FORWARD_WITH_SYNC

NO_FORWARD

DDL 采用 FORWARD_WITH_SYNC

命令执行

org.apache.doris.qe.DdlExecutor#execute()

//根据语句类型执行相应的函数

if (ddlStmt instanceof CreateTableStmt) {
catalog.createTable((CreateTableStmt) ddlStmt);
}

支持多种表类型, 除了olap 表, 其余都为映射表

if (engineName.equals("olap")) {
createOlapTable(db, stmt);
return;
} else if (engineName.equals("odbc")) {
createOdbcTable(db, stmt);
return;
} else if (engineName.equals("mysql")) {
createMysqlTable(db, stmt);
return;
} else if (engineName.equals("broker")) {
createBrokerTable(db, stmt);
return;
} else if (engineName.equalsIgnoreCase("elasticsearch") || engineName.equalsIgnoreCase("es")) {
createEsTable(db, stmt);
return;
} else if (engineName.equalsIgnoreCase("hive")) {
createHiveTable(db, stmt);
return;
}

org.apache.doris.catalog.Catalog#createOlapTable

//将语法对象转为元数据对象
String tableName = stmt.getTableName();
LOG.debug("begin create olap table: {}", tableName); // create columns
List<Column> baseSchema = stmt.getColumns();
validateColumns(baseSchema); // create partition info
PartitionDesc partitionDesc = stmt.getPartitionDesc();
PartitionInfo partitionInfo = null; //创建table对象
long tableId = Catalog.getCurrentCatalog().getNextId();
OlapTable olapTable = new OlapTable(tableId, tableName, baseSchema, keysType, partitionInfo,
distributionInfo, indexes); // 创建Partition 对象
if (partitionInfo.getType() == PartitionType.UNPARTITIONED) {
// this is a 1-level partitioned table
// use table name as partition name
String partitionName = tableName;
long partitionId = partitionNameToId.get(partitionName);
// create partition
Partition partition = createPartitionWithIndices()
olapTable.addPartition(partition);
} //添加元数据并进行持久化
Pair<Boolean, Boolean> result = db.createTableWithLock(olapTable, false, stmt.isSetIfNotExists());

org.apache.doris.catalog.Catalog#createPartitionWithIndices

分区是表的实体

// create base index first.
Preconditions.checkArgument(baseIndexId != -1);
MaterializedIndex baseIndex = new MaterializedIndex(baseIndexId, IndexState.NORMAL); // create partition with base index
Partition partition = new Partition(partitionId, partitionName, baseIndex, distributionInfo); // add to index map
Map<Long, MaterializedIndex> indexMap = new HashMap<>();
indexMap.put(baseIndexId, baseIndex); // create rollup index if has
for (long indexId : indexIdToMeta.keySet()) {
if (indexId == baseIndexId) {
continue;
} MaterializedIndex rollup = new MaterializedIndex(indexId, IndexState.NORMAL);
indexMap.put(indexId, rollup);
} for (Map.Entry<Long, MaterializedIndex> entry : indexMap.entrySet()) {
// create tablets
int schemaHash = indexMeta.getSchemaHash();
TabletMeta tabletMeta = new TabletMeta(dbId, tableId, partitionId, indexId, schemaHash, storageMedium);
createTablets(clusterName, index, ReplicaState.NORMAL, distributionInfo, version, versionHash,
replicaAlloc, tabletMeta, tabletIdSet); // add create replica task for olap
short shortKeyColumnCount = indexMeta.getShortKeyColumnCount();
TStorageType storageType = indexMeta.getStorageType();
List<Column> schema = indexMeta.getSchema();
KeysType keysType = indexMeta.getKeysType();
int totalTaskNum = index.getTablets().size() * totalReplicaNum;
MarkedCountDownLatch<Long, Long> countDownLatch = new MarkedCountDownLatch<Long, Long>(totalTaskNum);
AgentBatchTask batchTask = new AgentBatchTask();
for (Tablet tablet : index.getTablets()) {
long tabletId = tablet.getId();
for (Replica replica : tablet.getReplicas()) {
long backendId = replica.getBackendId();
countDownLatch.addMark(backendId, tabletId);
CreateReplicaTask task = new CreateReplicaTask(backendId, dbId, tableId,
partitionId, indexId, tabletId,
shortKeyColumnCount, schemaHash,
version, versionHash,
keysType,
storageType, storageMedium,
schema, bfColumns, bfFpp,
countDownLatch,
indexes,
isInMemory,
tabletType);
task.setStorageFormat(storageFormat);
batchTask.addTask(task);
// add to AgentTaskQueue for handling finish report.
// not for resending task
AgentTaskQueue.addTask(task);
}
}
AgentTaskExecutor.submit(batchTask); }

整体流程:

  • 创建Partition 对象
  • 创建MaterializedIndex对象
  • 对于每个MaterializedIndex对象 创建创建Tablet
  • 创建replica并下发任务到BE

// estimate timeout
long timeout = Config.tablet_create_timeout_second * 1000L * totalTaskNum;
timeout = Math.min(timeout, Config.max_create_table_timeout_second * 1000);
try {
ok = countDownLatch.await(timeout, TimeUnit.MILLISECONDS);
} catch (InterruptedException e) {
LOG.warn("InterruptedException: ", e);
ok = false;
}

等待BE执行任务完成

org.apache.doris.catalog.Database#createTableWithLock

idToTable.put(table.getId(), table);
nameToTable.put(table.getName(), table);
lowerCaseToTableName.put(tableName.toLowerCase(), tableName); if (!isReplay) {
// Write edit log
CreateTableInfo info = new CreateTableInfo(fullQualifiedName, table);
Catalog.getCurrentCatalog().getEditLog().logCreateTable(info);
}
if (table.getType() == TableType.ELASTICSEARCH) {
Catalog.getCurrentCatalog().getEsRepository().registerTable((EsTable) table);
}

将table添加到DataBase对象里

判断是否replay

写入元数据日志

流程总结:

FE与BE交互

FE与BE交互流程图

FE 发送任务

BE执行

BE汇报执行结果

FE汇总结果

AgentBatchTask batchTask = new AgentBatchTask();
for (Tablet tablet : index.getTablets()) {
long tabletId = tablet.getId();
for (Replica replica : tablet.getReplicas()) {
long backendId = replica.getBackendId();
countDownLatch.addMark(backendId, tabletId);
CreateReplicaTask task = new CreateReplicaTask(backendId, dbId, tableId,
partitionId, indexId, tabletId,
shortKeyColumnCount, schemaHash,
version, versionHash,
keysType,
storageType, storageMedium,
schema, bfColumns, bfFpp,
countDownLatch,
indexes,
isInMemory,
tabletType);
task.setStorageFormat(storageFormat);
batchTask.addTask(task);
// add to AgentTaskQueue for handling finish report.
// not for resending task
AgentTaskQueue.addTask(task);
}
}
AgentTaskExecutor.submit(batchTask);

AgentBatchTask:

收集Task并按照Be分组

AgentTaskExecutor:

发送AgentBatchTask

AgentTaskQueue:

处理任务完成的上报

BE任务接收

be/src/agent/agent_server.cpp

接收Task

// resend request when something is wrong(BE may need some logic to guarantee idempotence.
void AgentServer::submit_tasks(TAgentResult& agent_result,
const std::vector<TAgentTaskRequest>& tasks) {
Status ret_st;
// TODO check master_info here if it is the same with that of heartbeat rpc
if (_master_info.network_address.hostname == "" || _master_info.network_address.port == 0) {
Status ret_st = Status::Cancelled("Have not get FE Master heartbeat yet");
ret_st.to_thrift(&agent_result.status);
return;
} for (auto task : tasks) {
VLOG_RPC << "submit one task: " << apache::thrift::ThriftDebugString(task).c_str();
TTaskType::type task_type = task.task_type;
int64_t signature = task.signature; #define HANDLE_TYPE(t_task_type, work_pool, req_member) \
case t_task_type: \
if (task.__isset.req_member) { \
work_pool->submit_task(task); \
} else { \
ret_st = Status::InvalidArgument(strings::Substitute( \
"task(signature=$0) has wrong request member", signature)); \
} \
break; ... ret_st.to_thrift(&agent_result.status);
}

工作线程

while (_is_work) {
TAgentTaskRequest agent_task_req;
TCreateTabletReq create_tablet_req;
{
lock_guard<Mutex> worker_thread_lock(_worker_thread_lock);
while (_is_work && _tasks.empty()) {
_worker_thread_condition_variable.wait();
}
if (!_is_work) {
return;
}
//从队列中取出任务
agent_task_req = _tasks.front();
create_tablet_req = agent_task_req.create_tablet_req;
_tasks.pop_front(); //执行
OLAPStatus create_status = _env->storage_engine()->create_tablet(create_tablet_req); TFinishTaskRequest finish_task_request;
finish_task_request.__set_finish_tablet_infos(finish_tablet_infos);
finish_task_request.__set_backend(_backend);
finish_task_request.__set_report_version(_s_report_version);
finish_task_request.__set_task_type(agent_task_req.task_type);
finish_task_request.__set_signature(agent_task_req.signature);
finish_task_request.__set_task_status(task_status);
//汇报结果
_finish_task(finish_task_request);
}

处理任务汇报

org.apache.doris.service.FrontendServiceImpl#finishTask

org.apache.doris.master.MasterImpl#finishTask

FE、BE通过Thrift协议通信

错误处理

org.apache.doris.task.AgentTaskQueue 存储正在执行的Task

org.apache.doris.master.ReportHandler#handleReport

org.apache.doris.master.ReportHandler#taskReport

BE: Report tasks/olap tablet/disk state to the master server

FE master 处理任务,超时会进行重试

private static void taskReport(long backendId, Map<TTaskType, Set<Long>> runningTasks) {
... // to escape sending duplicate agent task to be
if (task.shouldResend(taskReportTime)) {
batchTask.addTask(task);
}
...
}

元数据持久化

Edit类似WAL

BDBJE 分布式KV存储

元数据持久化:org.apache.doris.catalog.Database#createTableWithLock

public Pair<Boolean, Boolean> createTableWithLock(Table table, boolean isReplay, boolean setIfNotExist) {
...
//更新内存
nameToTable.put(table.getName(), table);
// Write edit log
//构建元数据日志
CreateTableInfo info = new CreateTableInfo(fullQualifiedName, table);
//写入元数据日志
Catalog.getCurrentCatalog().getEditLog().logCreateTable(info);
... }

元数据回放

元数据回放发生在FE leader 给 其他FE节点同步的时候

逐一回放元数据

在内存中复原元数据

org.apache.doris.catalog.Catalog#replayCreateTable

public void replayCreateTable(String dbName, Table table) {
Database db = this.fullNameToDb.get(dbName);
db.createTableWithLock(table, true, false);
...
}

如何实现一个新的语句

fe/fe-core/src/main/cup/sql_parser.cup 语法文件

KW_CREATE opt_external:isExternal KW_TABLE opt_if_not_exists:ifNotExists table_name:name
LPAREN column_definition_list:columns COMMA index_definition_list:indexes RPAREN opt_engine:engineName
opt_keys:keys
opt_comment:tableComment
opt_partition:partition
opt_distribution:distribution
opt_rollup:index
opt_properties:tblProperties
opt_ext_properties:extProperties
{:
RESULT = new CreateTableStmt(ifNotExists, isExternal, name, columns, indexes, engineName, keys, partition,
distribution, tblProperties, extProperties, tableComment, index);
:}

fe/fe-core/src/main/jflex/sql_scanner.flex 词法文件

keywordMap.put("create", new Integer(SqlParserSymbols.KW_CREATE));
keywordMap.put("cross", new Integer(SqlParserSymbols.KW_CROSS));
keywordMap.put("cube", new Integer(SqlParserSymbols.KW_CUBE));
keywordMap.put("current", new Integer(SqlParserSymbols.KW_CURRENT));
keywordMap.put("current_user", new Integer(SqlParserSymbols.KW_CURRENT_USER));
keywordMap.put("data", new Integer(SqlParserSymbols.KW_DATA));
keywordMap.put("database", new Integer(SqlParserSymbols.KW_DATABASE));

词法语法的代码生成:

cd fe/ && mvn clean install –DskipTests
• SqlScanner.java
• SqlParser.java
• SqlParserSymbols.java

实现新语句步骤总结:

  1. 定义词法语法文件
  2. 实现对应的语句类,比如CreateTableStmt
  3. 实现元数据修改的方法,如Catalog.createTable()
  4. 定义对应操作的元数据日志类,如CreateTableInfo
  5. 实现元数据日志的写入
  6. 实现对应的replay方法,如Catalog.replayCreateTable()

[Apache Doris] Apache Doris 元数据设计及DDL操作源码阅读的更多相关文章

  1. Redis源码阅读(二)高可用设计——复制

    Redis源码阅读(二)高可用设计-复制 复制的概念:Redis的复制简单理解就是一个Redis服务器从另一台Redis服务器复制所有的Redis数据库数据,能保持两台Redis服务器的数据库数据一致 ...

  2. Backbone设计思路和关键源码分析

    一. Backbone的江湖地位: backbone作为一个老牌js框架为大规模前端开发提供了新的开发思路:前端MVC模式,这个模式也是前端开发演变过程中的一个重要里程碑,也为MVVM和Redux等开 ...

  3. Apache mahout 源码阅读笔记--协同过滤, PearsonCorrelationSimilarity

    协同过滤源码路径: ~/project/javaproject/mahout-0.9/core/src $tree main/java/org/apache/mahout/cf/taste/ -L 2 ...

  4. ASP.NET MVC+EF框架+EasyUI实现权限管理系列(24)-权限组的设计和实现(附源码)(终结)

    ASP.NET MVC+EF框架+EasyUI实现权限管系列 (开篇)   (1):框架搭建    (2):数据库访问层的设计Demo    (3):面向接口编程   (4 ):业务逻辑层的封装    ...

  5. .net 平台下, Socket通讯协议中间件设计思路(附源码)

    .net 平台下,实现通讯处理有很多方法(见下表),各有利弊: 序号 实现方式 特点 1 WCF 优点:封装好,方便.缺点:难学,不跨平台 2 RocketMQ,SuperSocket等中间件 优点: ...

  6. C++STL内存配置的设计思想与关键源码分析

    说明:我认为要读懂STL中allocator部分的源码,并汲取它的思想,至少以下几点知识你要了解:operator new和operator delete.handler函数以及一点模板知识.否则,下 ...

  7. Apache Hive (七)Hive的DDL操作

    转自:https://www.cnblogs.com/qingyunzong/p/8723271.html 库操作 1.创建库 语法结构 CREATE (DATABASE|SCHEMA) [IF NO ...

  8. Apache CXF实现WebService入门教程(附完整源码)

    Apache CXF实现WebService非常简单实用,只需要几步就可以实现一个简单的web service. 首先我们需要新建一个maven项目,在pom中添加依赖和jetty作为测试的web s ...

  9. Apache mahout 源码阅读笔记-DataModel之UserBaseRecommender

    先来看一下使用流程: 1)拿到DataModel 2)定义相似度计算模型 PearsonCorrelationSimilarity 3)定义用户邻域计算模型 NearestNUserNeighborh ...

随机推荐

  1. c++ 的学习 构造函数1

    1. 构造函数(也叫构造器),在对象创建的时候自动调用,一般用于完成对象的初始化工作 2.一旦自定义了构造函数,必须用其中一个自定义的构造函数来初始化对象 就是有多个的话    根据参数编译器自行选 ...

  2. VUE自学日志01-MVC和MVVM

    一.需要了解的基础概念 Model(M)是指数据模型,泛指后端进行的各种业务逻辑处理和数据操控,主要围绕数据库系统展开.这里的难点主要在于需要和前端约定统一的接口规则. View(V)是视图层,也就是 ...

  3. 关于国密HTTPS 的那些事(二)

    关于国密HTTPS 的那些事(二) 三. 需要解决的问题 前文我们了解了https,并梳理了国密https的流程.那么完成这些流程的目的是什么呢?又是怎么来保护数据的安全性呢?我们继续... 上文我们 ...

  4. 关于django配置好静态文件后打开相关图片页显示404的解决方法

    在url里设置以上代码即可,即可解决图片显示异常(出现此问题的根本原因是django版本)django3后需要加以上代码)

  5. 文本域textarea的一个小细节

    文本域代码在编写时,最好写在一行上,就像: 如果没写在一行上,如: 那么就会在后续生成的页面上输入的时候就会产生一段空白无法删除: 这是写文本框的时候的一个小细节

  6. 浅尝装饰器和AOP

    [写在前面] 参考文章:https://www.cnblogs.com/huxi/archive/2011/03/01/1967600.html[从简单的例子入手进行讲解,由浅入深,很到位] 装饰器部 ...

  7. Linux常用命令,查看树形结构、删除目录(文件夹)、创建文件、删除文件或目录、复制文件或目录(文件夹)、移动、查看文件内容、权限操作

    5.查看树结构(tree) 通常情况下系统未安装该命令,需要yum install -y tree安装 直接使⽤tree显示深度太多,⼀般会使⽤ -L选项⼿⼯设定⽬录深度 格式:tree -L n [ ...

  8. Java:并发笔记-07

    Java:并发笔记-07 说明:这是看了 bilibili 上 黑马程序员 的课程 java并发编程 后做的笔记 6. 共享模型之不可变 本章内容 不可变类的使用 不可变类设计 无状态类设计 6.1 ...

  9. the Agiles Scrum Meeting 2

    会议时间:2020.4.10 21:00 1.每个人的工作 今天已完成的工作 yjy:debug:班级创建了个人项目不能访问班级:教师窗口的前端bug. issues:Bug:教师创建博客时显示项目为 ...

  10. [对对子队]团队任务拆解Alpha

    Alpha阶段主要目标 完成游戏场景的基本实现(不要求美术资源) 完成游戏UI的基本实现(不要求美术资源) 制作第一部分的关卡(顺序语句,制作3-5关) 完成第一部分关卡和游戏基本逻辑的测试 任务分解 ...