Distributed Tracing System Architectures: SkyWalking, Zipkin, and Pinpoint

Full-Link Tracing for .NET and Java Based on Zipkin

https://www.cnblogs.com/zhangs1986/p/8966051.html

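An earlier post comparing the distributed tracing architectures of the major vendors already covered several frameworks. If you want a free option, you can use Zipkin or Pinpoint. One I forgot to mention there is SkyWalking; for details see https://github.com/apache/incubator-skywalking/blob/master/README_ZH.md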

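The tracing has to support both the .NET and Java platforms. On Java, all of these tools are supported natively. For .NET I looked at several open-source components and found that the demos provided for Pinpoint and SkyWalking are all based on .NET Core (for SkyWalking, search GitHub for skywalking-netcore; I have no good recommendation for Pinpoint). That requires a fairly recent runtime, and changing the .NET Framework version of the existing platform is not an option. For Zipkin there are the open-source projects Medidata.zipkinTracerModule, zipkin.net, and zipkin-csharp, which online articles recommend in that order. Testing showed that Medidata.zipkinTracerModule and zipkin.net also target .NET Core and fail with an error when installed from NuGet. zipkin-csharp (https://github.com/openzipkin-attic/zipkin-csharp) finally worked: search NuGet for Zipkin.Core, which currently has only one version.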

Next, look at the code in the bundled demo, zipkin-csharp/examples/ZipkinExample/Program.cs:


using System;
using System.Net;
using System.Threading;
using Zipkin;
using Zipkin.Tracer.Kafka;

namespace ZipkinExample
{
    class Program
    {
        static void Main(string[] args)
        {
            var random = new Random();

            // make sure Zipkin with Scribe client is working
            //var collector = new HttpCollector(new Uri("http://localhost:9411/"));
            var collector = new KafkaCollector(KafkaSettings.Default);

            var traceId = new TraceHeader(traceId: (ulong)random.Next(), spanId: (ulong)random.Next());
            var span = new Span(traceId, new IPEndPoint(IPAddress.Loopback, 9000), "test-service");

            span.Record(Annotations.ClientSend(DateTime.UtcNow));
            Thread.Sleep(100);
            span.Record(Annotations.ServerReceive(DateTime.UtcNow));
            Thread.Sleep(100);
            span.Record(Annotations.ServerSend(DateTime.UtcNow));
            Thread.Sleep(100);
            span.Record(Annotations.ClientReceive(DateTime.UtcNow));

            collector.CollectAsync(span).Wait();
        }
    }
}


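As you can see, the traceId and spanId here are both generated randomly. I recommend generating the IDs yourself. Note that they are of type ulong. The helper below formats only two fractional-second digits (ff, hundredths of a second) because the ID columns in the database are bigint(20) and a longer value would overflow. Other, more robust methods would also work.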


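// Shared instance: creating a new Random per call can repeat values
// when two IDs are requested within the same clock tick.
private static readonly Random IdRandom = new Random();

/// <summary>
/// Builds a 19-digit numeric ID from the current time plus a random suffix.
/// </summary>
/// <returns>An ID that fits in a ulong.</returns>
private static ulong getRandom()
{
    // 16 timestamp digits (yyyyMMddHHmmss + hundredths) followed by 3 random digits.
    return ulong.Parse(DateTime.Now.ToString("yyyyMMddHHmmssff") + IdRandom.Next(100, 999));
}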

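The sizing concern is easy to check with a few lines. The sketch below is in Python purely for illustration; the function names are my own and are not part of zipkin-csharp. It rebuilds the same "16 timestamp digits plus 3 random digits" ID and, as one possible alternative, draws a random ID that always fits a signed 64-bit column.

```python
import random
from datetime import datetime

UINT64_MAX = 2**64 - 1   # largest value a C# ulong can hold
INT64_MAX = 2**63 - 1    # largest value a signed MySQL BIGINT can hold


def time_based_id(now=None, rng=random):
    """Mirror of the C# helper: yyyyMMddHHmmss + hundredths + 3 random digits."""
    now = now or datetime.now()
    hundredths = now.microsecond // 10_000      # equivalent of the "ff" format specifier
    stamp = now.strftime("%Y%m%d%H%M%S") + f"{hundredths:02d}"
    suffix = rng.randrange(100, 999)            # like Random.Next(100, 999): upper bound excluded
    return int(stamp + str(suffix))


def random_id(rng=random):
    """Hypothetical alternative: a random non-zero ID within the signed 64-bit range."""
    return rng.randrange(1, INT64_MAX + 1)


example = time_based_id(datetime(2018, 4, 28, 10, 30, 15, 120_000))
print(len(str(example)), example <= INT64_MAX)
```

With three fractional digits the value would have 20 digits and begin with 20, which is larger than UINT64_MAX (18446744073709551615), so it would not even fit in a ulong.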

For the collector, use HTTP here: comment out the Kafka line and uncomment the HTTP line. Also remove the Wait from collector.CollectAsync(span).Wait();

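Basic Zipkin concepts

Span: the basic unit of work. Each call in a chain (an RPC, a DB query, and so on, with no particular restriction) creates a span, identified by a 64-bit ID. A span usually carries other data as well, such as a description, timestamps, key-value (Annotation) tag information, and a parent-id. The parent-id indicates where in the call chain the span came from. Put simply, a span is the information for one request.

Trace: a tree-like collection of spans that represents one call chain, with a unique identifier, the TraceId.

Annotation: records information about specific events in a request (for example, their time). There are usually four annotations:

cs - Client Send: the client sends the request

sr - Server Receive: the server receives the request

ss - Server Send: the server finishes processing and sends the result to the client

cr - Client Receive: the client receives the server's response

BinaryAnnotation: carries additional information, usually as key-value pairs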
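To make these concepts concrete, here is a minimal Python sketch of one span with the four annotations and a binary annotation. The field names follow Zipkin's v1 JSON span model as I understand it; the helper names, the endpoint values, and the sample tag are assumptions for illustration, not code from any Zipkin client.

```python
import json


def make_span(trace_id, span_id, name, service, events, parent_id=None, tags=None):
    """Build a span dict shaped like the v1 JSON model (IDs are lower-hex strings)."""
    endpoint = {"serviceName": service, "ipv4": "127.0.0.1", "port": 9000}
    span = {
        "traceId": format(trace_id, "016x"),
        "id": format(span_id, "016x"),
        "name": name,
        "annotations": [
            {"timestamp": ts, "value": value, "endpoint": endpoint}
            for value, ts in events
        ],
        "binaryAnnotations": [
            {"key": key, "value": val, "endpoint": endpoint}
            for key, val in (tags or {}).items()
        ],
    }
    if parent_id is not None:
        span["parentId"] = format(parent_id, "016x")
    return span


def timings(span):
    """Derive durations in microseconds from the cs/sr/ss/cr annotations."""
    at = {a["value"]: a["timestamp"] for a in span["annotations"]}
    total = at["cr"] - at["cs"]      # what the client observed
    server = at["ss"] - at["sr"]     # time spent inside the server
    return {"total": total, "server": server, "network": total - server}


start = 1_524_900_000_000_000        # epoch microseconds, arbitrary sample value
events = [("cs", start), ("sr", start + 100_000),
          ("ss", start + 200_000), ("cr", start + 300_000)]
span = make_span(1, 2, "get", "test-service", events, tags={"http.path": "/test"})
print(json.dumps(timings(span)))
```

The 100 ms gaps mirror the Thread.Sleep(100) calls in the demo, so the sketch reports 300 ms in total, of which 100 ms is server time.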

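Starting the server for testing

Download the jar for release-2.7.1, the latest stable version at the time of writing, from https://github.com/openzipkin/zipkin/releases. Records are stored in MySQL here, so create a database named zipkin and then create the tables: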


SET FOREIGN_KEY_CHECKS=0;

-- Table structure for zipkin_annotations
DROP TABLE IF EXISTS zipkin_annotations;
CREATE TABLE zipkin_annotations (
  trace_id_high bigint(20) NOT NULL DEFAULT '0' COMMENT 'If non zero, this means the trace uses 128 bit traceIds instead of 64 bit',
  trace_id bigint(20) NOT NULL COMMENT 'coincides with zipkin_spans.trace_id',
  span_id bigint(20) NOT NULL COMMENT 'coincides with zipkin_spans.id',
  a_key varchar(255) NOT NULL COMMENT 'BinaryAnnotation.key or Annotation.value if type == -1',
  a_value blob COMMENT 'BinaryAnnotation.value(), which must be smaller than 64KB',
  a_type int(11) NOT NULL COMMENT 'BinaryAnnotation.type() or -1 if Annotation',
  a_timestamp bigint(20) DEFAULT NULL COMMENT 'Used to implement TTL; Annotation.timestamp or zipkin_spans.timestamp',
  endpoint_ipv4 int(11) DEFAULT NULL COMMENT 'Null when Binary/Annotation.endpoint is null',
  endpoint_ipv6 binary(16) DEFAULT NULL COMMENT 'Null when Binary/Annotation.endpoint is null, or no IPv6 address',
  endpoint_port smallint(6) DEFAULT NULL COMMENT 'Null when Binary/Annotation.endpoint is null',
  endpoint_service_name varchar(255) DEFAULT NULL COMMENT 'Null when Binary/Annotation.endpoint is null',
  UNIQUE KEY trace_id_high (trace_id_high,trace_id,span_id,a_key,a_timestamp) COMMENT 'Ignore insert on duplicate',
  KEY trace_id_high_2 (trace_id_high,trace_id,span_id) COMMENT 'for joining with zipkin_spans',
  KEY trace_id_high_3 (trace_id_high,trace_id) COMMENT 'for getTraces/ByIds',
  KEY endpoint_service_name (endpoint_service_name) COMMENT 'for getTraces and getServiceNames',
  KEY a_type (a_type) COMMENT 'for getTraces',
  KEY a_key (a_key) COMMENT 'for getTraces',
  KEY trace_id (trace_id,span_id,a_key) COMMENT 'for dependencies job'
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED;

-- Records of zipkin_annotations

-- Table structure for zipkin_dependencies
DROP TABLE IF EXISTS zipkin_dependencies;
CREATE TABLE zipkin_dependencies (
  day date NOT NULL,
  parent varchar(255) NOT NULL,
  child varchar(255) NOT NULL,
  call_count bigint(20) DEFAULT NULL,
  error_count bigint(20) DEFAULT NULL,
  UNIQUE KEY day (day,parent,child)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED;

-- Records of zipkin_dependencies

-- Table structure for zipkin_spans
DROP TABLE IF EXISTS zipkin_spans;
CREATE TABLE zipkin_spans (
  trace_id_high bigint(20) NOT NULL DEFAULT '0' COMMENT 'If non zero, this means the trace uses 128 bit traceIds instead of 64 bit',
  trace_id bigint(20) NOT NULL,
  id bigint(20) NOT NULL,
  name varchar(255) NOT NULL,
  parent_id bigint(20) DEFAULT NULL,
  debug bit(1) DEFAULT NULL,
  start_ts bigint(20) DEFAULT NULL COMMENT 'Span.timestamp(): epoch micros used for endTs query and to implement TTL',
  duration bigint(20) DEFAULT NULL COMMENT 'Span.duration(): micros used for minDuration and maxDuration query',
  UNIQUE KEY trace_id_high (trace_id_high,trace_id,id) COMMENT 'ignore insert on duplicate',
  KEY trace_id_high_2 (trace_id_high,trace_id,id) COMMENT 'for joining with zipkin_annotations',
  KEY trace_id_high_3 (trace_id_high,trace_id) COMMENT 'for getTracesByIds',
  KEY name (name) COMMENT 'for getTraces and getSpanNames',
  KEY start_ts (start_ts) COMMENT 'for getTraces ordering and range'
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED;

-- Records of zipkin_spans


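Startup

Start the server from the directory that contains the jar. Pay attention to the parameters; to store data in Elasticsearch instead, change them as described in the official documentation.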

java -jar zipkin-server-2.7.1.jar --STORAGE_TYPE=mysql --MYSQL_DB=zipkin --MYSQL_USER=root --MYSQL_PASS=123456 --MYSQL_HOST=localhost --MYSQL_TCP_PORT=3306
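If the startup is scripted, it can help to keep the settings in one place. The following is only a sketch in Python: the helper name zipkin_command is hypothetical, and the keys and values are just the ones from the command above.

```python
import shlex


def zipkin_command(jar, settings):
    """Turn a settings dict into the `java -jar <jar> --KEY=value ...` argument list."""
    args = ["java", "-jar", jar]
    args += [f"--{key}={value}" for key, value in settings.items()]
    return args


settings = {
    "STORAGE_TYPE": "mysql",
    "MYSQL_DB": "zipkin",
    "MYSQL_USER": "root",
    "MYSQL_PASS": "123456",      # sample password from the post; replace it in real use
    "MYSQL_HOST": "localhost",
    "MYSQL_TCP_PORT": 3306,
}

command = zipkin_command("zipkin-server-2.7.1.jar", settings)
print(" ".join(shlex.quote(arg) for arg in command))
```

The resulting list can be handed to subprocess.run(command) as is, which avoids shell quoting problems if a password contains special characters.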

The console output shows whether startup succeeded.

After a successful start, open http://localhost:9411/ in a browser.

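At this point the server and the UI are both running. The functionality is still fairly basic; look up further material for detailed usage.

The services used for testing here come from source code shared by another developer: mircoservice, a distributed tracing system (zipkin + springboot), at https://github.com/dreamerkr/mircoservice. The accompanying article, on distributed tracing for microservices with springboot and zipkin, is at https://blog.csdn.net/qq_21387171/article/details/53787019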

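Results after running the four client services with the default configuration:

(1) Start each service, then call service 1 from a browser: http://localhost:8081/service1/test

(2) Open the Zipkin address to see the list of traces, one per request.

Click a trace to see its tree structure, including the time each service took.

Click a span to see its latency details.

You can also view the dependencies between services.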
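The dependency view is essentially a count of which service called which. As a rough Python sketch of that idea (this is a simplification of my own, not Zipkin's actual aggregation job, and the span dicts and service names below are made-up sample data), parent-to-child links can be derived from each span's parent ID, which is the shape of the parent, child, and call_count columns in zipkin_dependencies:

```python
from collections import Counter


def dependency_links(spans):
    """Count parent-service -> child-service calls from a flat list of spans."""
    service_of = {s["id"]: s["service"] for s in spans}
    links = Counter()
    for s in spans:
        parent = service_of.get(s.get("parent_id"))
        # Root spans have no parent; calls within one service are not a dependency.
        if parent is not None and parent != s["service"]:
            links[(parent, s["service"])] += 1
    return links


# Hypothetical trace: service1 calls service2, which calls service3 and service4.
spans = [
    {"id": "a1", "parent_id": None, "service": "service1"},
    {"id": "b2", "parent_id": "a1", "service": "service2"},
    {"id": "c3", "parent_id": "b2", "service": "service3"},
    {"id": "d4", "parent_id": "b2", "service": "service4"},
]

links = dependency_links(spans)
for (parent, child), count in sorted(links.items()):
    print(parent, "->", child, count)
```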

Testing the .NET program

Change the demo code to:


static void Main(string[] args)
{
    var random = new Random();

    // make sure Zipkin with Scribe client is working
    var collector = new HttpCollector(new Uri("http://localhost:9411/"));
    //var collector = new KafkaCollector(KafkaSettings.Default);

    var traceId = new TraceHeader(traceId: (ulong)random.Next(), spanId: (ulong)random.Next());
    var span = new Span(traceId, new IPEndPoint(IPAddress.Loopback, 9000), "zipkinweb");

    span.Record(Annotations.ClientSend(DateTime.UtcNow));
    Thread.Sleep(100);
    span.Record(Annotations.ServerReceive(DateTime.UtcNow));
    Thread.Sleep(100);
    span.Record(Annotations.ServerSend(DateTime.UtcNow));
    Thread.Sleep(100);
    span.Record(Annotations.ClientReceive(DateTime.UtcNow));

    collector.CollectAsync(span);
}


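Run it once and check again; one more entry appears in the list.

Click into it to see the request details and the annotation information.

The JSON can be viewed from the top-right corner.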

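This confirms that the call works from the .NET platform. It also shows that the Zipkin UI gets its data through API requests, with the front end and back end separated. That makes secondary development possible: knowing the data structure, we can build a richer business front end on top of the API or by querying the database ourselves.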
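As a starting point for that kind of secondary development, a custom front end can query the same HTTP API the UI uses. The Python sketch below assumes the server exposes the v2 query endpoint /api/v2/traces with the serviceName, limit, and lookback parameters (check the API of your Zipkin version); the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def traces_url(base, service_name, limit=10, lookback_ms=3_600_000):
    """Build a trace query URL for one service (lookback is in milliseconds)."""
    query = urlencode({"serviceName": service_name,
                       "limit": limit,
                       "lookback": lookback_ms})
    return f"{base.rstrip('/')}/api/v2/traces?{query}"


def fetch_traces(url, timeout=5):
    """Fetch and decode the JSON response; needs a running Zipkin server."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


url = traces_url("http://localhost:9411/", "zipkinweb")
print(url)
```

Calling fetch_traces(url) against the local server started earlier should return the traces recorded for the zipkinweb service, which a custom page can then render however the business needs.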

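One point to stress: on .NET it is best to use Framework 4.5 or later. Judging from the .NET demo, the library provides little encapsulation, which makes it very flexible, but you need to wrap it further yourself to make the instrumentation less intrusive and more performant. Later, depending on performance and data volume, the setup can switch to Kafka for ingestion and ES for storage.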

Author: 欢醉