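An Introduction to Full-Link Tracing with SkyWalking
This article covers the following:
- An introduction to SkyWalking
- Using SkyWalking, which supports many kinds of middleware (HttpClient, Spring MVC, Dubbo, MySQL, and so on)
- Integrating the SkyWalking traceId with logging components (log4j, logback, ELK, and so on)
- Using the SkyWalking alarm module
- How SkyWalking works
- Limitations of SkyWalking
1. Introduction to SkyWalking: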
Overview:
SkyWalking: an open source observability platform to collect, analyze, aggregate and visualize data from services and cloud native infrastructures.
SkyWalking provides an easy way to keep you have a clear view of your distributed system, even across Cloud.
It is more like a modern APM (application performance management) system, specially designed for cloud native, container based and distributed system.
Why use SkyWalking?
SkyWalking provides solutions for observing and monitoring distributed system, in many different scenarios.
First of all, like traditional ways, SkyWalking provides auto instrument agents for service, such as Java, C# and Node.js.
At the same time, it provides manual instrument SDKs for Go(Not yet), C++(Not yet).
Also with more languages required, risks in manipulating codes at runtime, cloud native infrastructures grow more powerful,
SkyWalking could use Service Mesher infra probes to collect data for understanding the whole distributed system.
In general, it provides observability capabilities for service(s), service instance(s), endpoint(s).
Service: a service.
Service Instance: one running instance of a service (a single service is usually started on several nodes).
Endpoint: one interface exposed by a service.
Architecture:
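2. Using SkyWalking:
Step 1: Download the package from the official SkyWalking site, http://skywalking.apache.org/downloads/. The layout of the package is shown in the figure.
Step 2: Start the SkyWalking collector service. The startup script is bin/startup.sh under the installation directory (here E:\apache-skywalking-apm-bin\bin\startup.sh; on Windows the distribution also ships startup.bat). Once it is running, open http://localhost:8080/ to see the SkyWalking UI.
Step 3: Start your project. Copy the skywalking-agent directory to wherever you need it. The agent consists of the whole directory, so do not change its structure. In agent.config, change agent.application_code=xxl-job to your own application name.
Then add the JVM startup argument -javaagent:/path/to/skywalking-agent/skywalking-agent.jar, where the value is the absolute path of skywalking-agent.jar.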
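A quick way to confirm that the -javaagent argument actually reached the JVM is to inspect the arguments the process was started with. The following is a minimal sketch using only the JDK; the class name AgentArgCheck and the helper findAgentJar are made up for this illustration and are not part of SkyWalking.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Optional;

/** Sketch: find the -javaagent argument that points at skywalking-agent.jar. */
final class AgentArgCheck {

    /** Returns the jar path from the first -javaagent argument naming skywalking-agent.jar. */
    static Optional<String> findAgentJar(List<String> jvmArgs) {
        final String prefix = "-javaagent:";
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix) && arg.contains("skywalking-agent.jar")) {
                String value = arg.substring(prefix.length());
                // Agent options, if any, follow the jar path after '='.
                int eq = value.indexOf('=');
                return Optional.of(eq >= 0 ? value.substring(0, eq) : value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Arguments the current JVM was actually started with.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        Optional<String> jar = findAgentJar(jvmArgs);
        System.out.println(jar.map(p -> "SkyWalking agent attached: " + p)
                .orElse("No SkyWalking -javaagent argument found"));
    }
}
```

Running main inside the instrumented service prints the jar path when the argument was picked up, which helps rule out a mistyped path before looking at the UI.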
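After these steps, call one of your project's endpoints and check whether the SkyWalking UI has collected the call information.
The figure below shows the SkyWalking home page, which mainly presents global performance information.
To verify that SkyWalking can discover the system topology (the dependencies between systems), start four services whose endpoint paths are hello/start1, hello/start2, hello/start3 and hello/start4.
The dependencies between the services are: start1 depends on start2, and start2 depends on start3 and start4.
After calling the start1 endpoint, SkyWalking shows the following project topology:
The full-link performance tracing page:
By default SkyWalking monitors call performance for the types DB(1), RPC_FRAMEWORK(2), HTTP(3), MQ(4) and CACHE(5). It also supports custom plugins for components that are not yet supported.
Here is what calls to Dubbo and to a database look like (service start2 calls a database and the Dubbo service of project 4):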
3. Integrating the SkyWalking traceId with logging components (log4j, logback, ELK, and so on):
Taking logback as an example, add the following to the logging XML configuration and the traceId of the current context is automatically written into every log line.
<appender name="console" class="ch.qos.logback.core.ConsoleAppender">
    <layout class="org.apache.skywalking.apm.toolkit.log.logback.v1.x.TraceIdPatternLogbackLayout">
        <pattern>
            %d{yyyy-MM-dd HH:mm:ss} [%thread] %-5level %logger{36} - %tid - %msg%n
        </pattern>
    </layout>
</appender>
The result is shown below: every node in the chain logs the same traceId. Once SkyWalking reveals a slow trace, you can look up its traceId in the logging component and check for abnormal log entries.
Log printed by service 1:
2019-08-14 16:46:22 [http-nio-9091-exec-1] INFO c.z.s.controller.HelloController - TID:47.34.15657723821280001 - service1 logger with traceId
Log printed by service 2:
2019-08-14 16:46:24 [http-nio-9092-exec-9] INFO c.z.s.controller.HelloController - TID:47.34.15657723821280001 - service2 logger with traceId
Log printed by service 3:
2019-08-14 16:46:24 [http-nio-9093-exec-1] INFO c.z.s.controller.HelloController - TID:47.34.15657723821280001 - service3 logger with traceId
Log printed by service 4:
2019-08-14 16:46:24 [http-nio-9094-exec-1] INFO c.z.s.controller.HelloController - TID:47.34.15657723821280001 - service4 logger with traceId
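Because every service writes the same TID, collecting the log lines of one trace comes down to a text match. The sketch below assumes the log format of the sample lines above ("TID:" followed by digits and dots); the class TraceIdLogFilter and its methods are hypothetical helpers written for this illustration, not part of SkyWalking or logback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: pull the SkyWalking trace id out of log lines written with the %tid pattern. */
final class TraceIdLogFilter {

    // Matches "TID:" followed by digits and dots, as in the sample log lines.
    private static final Pattern TID = Pattern.compile("TID:([0-9.]+)");

    /** Returns the trace id found in the line, if there is one. */
    static Optional<String> extractTraceId(String logLine) {
        Matcher m = TID.matcher(logLine);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** Keeps only the lines that belong to the given trace. */
    static List<String> linesForTrace(List<String> logLines, String traceId) {
        List<String> matched = new ArrayList<>();
        for (String line : logLines) {
            if (extractTraceId(line).filter(traceId::equals).isPresent()) {
                matched.add(line);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        String line = "2019-08-14 16:46:22 [http-nio-9091-exec-1] INFO c.z.s.controller.HelloController"
                + " - TID:47.34.15657723821280001 - service1 logger with traceId";
        System.out.println(extractTraceId(line).orElse("no trace id"));
    }
}
```

In practice a log platform such as ELK would do this match with a query on the TID field; the code only shows why a shared traceId makes the lookup trivial.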
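4. Using the SkyWalking alarm module:
The figure below shows the alarm page of the UI. Alarms can be monitored along three dimensions: service, service instance, and endpoint (interface).
Alarm rules can be freely defined in the configuration file apache-skywalking-apm-bin/config/alarm-settings.yml inside the installation package.
The default configuration monitors services and service instances but not endpoints because, as the file itself notes, "Active endpoint related metrics alarm will cost more memory than service and service instance metrics alarm. Because the number of endpoint is much more than service and instance."
The listing below configures the alarm rules. SkyWalking also lets you configure alarm callback endpoints, under webhooks in the configuration file, so that notifications such as SMS or email can be sent promptly.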
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Sample alarm rules.
rules:
  # Rule unique name, must be ended with `_rule`.
  service_resp_time_rule:
    metrics-name: service_resp_time
    op: ">"
    threshold: 1000
    period: 10
    count: 3
    silence-period: 5
    message: Response time of service {name} is more than 1000ms in 3 minutes of last 10 minutes.
  service_sla_rule:
    # Metrics value need to be long, double or int
    metrics-name: service_sla
    op: "<"
    threshold: 8000
    # The length of time to evaluate the metrics
    period: 10
    # How many times after the metrics match the condition, will trigger alarm
    count: 2
    # How many times of checks, the alarm keeps silence after alarm triggered, default as same as period.
    silence-period: 3
    message: Successful rate of service {name} is lower than 80% in 2 minutes of last 10 minutes
  service_p90_sla_rule:
    # Metrics value need to be long, double or int
    metrics-name: service_p90
    op: ">"
    threshold: 1000
    period: 10
    count: 3
    silence-period: 5
    message: 90% response time of service {name} is more than 1000ms in 3 minutes of last 10 minutes
  service_instance_resp_time_rule:
    metrics-name: service_instance_resp_time
    op: ">"
    threshold: 1000
    period: 10
    count: 2
    silence-period: 5
    message: Response time of service instance {name} is more than 1000ms in 2 minutes of last 10 minutes
  # Active endpoint related metrics alarm will cost more memory than service and service instance metrics alarm.
  # Because the number of endpoint is much more than service and instance.
  #
  endpoint_avg_rule:
    metrics-name: endpoint_avg
    op: ">"
    threshold: 1000
    period: 10
    count: 2
    silence-period: 5
    message: Response time of endpoint {name} is more than 1000ms in 2 minutes of last 10 minutes

#webhooks:
#  - http://127.0.0.1/notify/
#  - http://127.0.0.1/go-wechat/
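How period, count and silence-period interact is easier to see in code. The sketch below is a simplified reading of the comments in the configuration file, not SkyWalking's actual alarm implementation: it assumes one metric value per minute, an op of ">", and that an alarm fires when at least count values inside the last period minutes exceed the threshold. The class AlarmRuleSketch is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Simplified model of one alarm rule with op ">" (not SkyWalking's actual implementation). */
final class AlarmRuleSketch {
    private final long threshold;
    private final int period;         // window length, in minutes
    private final int count;          // matches within the window needed to fire
    private final int silencePeriod;  // minutes to stay quiet after firing

    private final Deque<Long> window = new ArrayDeque<>();
    private int silenceLeft = 0;

    AlarmRuleSketch(long threshold, int period, int count, int silencePeriod) {
        this.threshold = threshold;
        this.period = period;
        this.count = count;
        this.silencePeriod = silencePeriod;
    }

    /** Feeds the metric value of one minute; returns true if an alarm fires for that minute. */
    boolean onMinute(long value) {
        window.addLast(value);
        if (window.size() > period) {
            window.removeFirst(); // keep only the last `period` minutes
        }
        if (silenceLeft > 0) {
            silenceLeft--;        // still silenced after a previous alarm
            return false;
        }
        long matches = window.stream().filter(v -> v > threshold).count();
        if (matches >= count) {
            silenceLeft = silencePeriod;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Mirrors service_resp_time_rule: threshold 1000, period 10, count 3, silence-period 5.
        AlarmRuleSketch rule = new AlarmRuleSketch(1000, 10, 3, 5);
        long[] perMinute = {1200, 900, 1500, 1800, 2000};
        for (int minute = 0; minute < perMinute.length; minute++) {
            System.out.println("minute " + minute + " fired=" + rule.onMinute(perMinute[minute]));
        }
    }
}
```

With these sample values the rule fires on the fourth minute, when the third value above 1000 arrives, and the following minute is suppressed by the silence period.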
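5. How SkyWalking works:
The overall SkyWalking architecture has three parts:
- skywalking-collector: the trace data collector. Data can be stored in ElasticSearch; a standalone deployment can also store it in H2, which is not recommended because H2 is only meant for temporary demos
- skywalking-web: the web visualization platform, which displays the stored data
- skywalking-agent: the probe, which collects data and sends it to the collector
The core of SkyWalking is the agent. The figure below shows in detail how the agents run when one call crosses several processes:
The agent supports many kinds of clients and servers. The full list of supported plugins is at https://github.com/apache/skywalking/blob/master/docs/en/setup/service-agent/java-agent/Supported-list.md
Taking the interception of Dubbo requests as an example, SkyWalking's Dubbo plugin works as follows.
The plugin intercepts the invoke method of Dubbo's MonitorFilter class. As DubboInterceptor shows, it obtains Dubbo's RpcContext; before the consumer makes the call it adds the header of SkyWalking's cross-process protocol (sw:traceId) to the context, and the provider then reads it back out.
The plugin definition that selects the class and method to enhance: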
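package org.apache.skywalking.apm.plugin.dubbo;

import net.bytebuddy.description.method.MethodDescription;
import net.bytebuddy.matcher.ElementMatcher;
import org.apache.skywalking.apm.agent.core.plugin.interceptor.ConstructorInterceptPoint;
import org.apache.skywalking.apm.agent.core.plugin.interceptor.InstanceMethodsInterceptPoint;
import org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.ClassInstanceMethodsEnhancePluginDefine;
import org.apache.skywalking.apm.agent.core.plugin.match.ClassMatch;
import org.apache.skywalking.apm.agent.core.plugin.match.NameMatch;

import static net.bytebuddy.matcher.ElementMatchers.named;

public class DubboInstrumentation extends ClassInstanceMethodsEnhancePluginDefine {

    // The Dubbo class whose method is enhanced.
    private static final String ENHANCE_CLASS = "com.alibaba.dubbo.monitor.support.MonitorFilter";
    // The interceptor that runs around the enhanced method.
    private static final String INTERCEPT_CLASS = "org.apache.skywalking.apm.plugin.dubbo.DubboInterceptor";

    @Override
    protected ClassMatch enhanceClass() {
        return NameMatch.byName(ENHANCE_CLASS);
    }

    @Override
    public ConstructorInterceptPoint[] getConstructorsInterceptPoints() {
        return null;
    }

    @Override
    public InstanceMethodsInterceptPoint[] getInstanceMethodsInterceptPoints() {
        return new InstanceMethodsInterceptPoint[] {
            new InstanceMethodsInterceptPoint() {
                @Override
                public ElementMatcher<MethodDescription> getMethodsMatcher() {
                    return named("invoke");
                }

                @Override
                public String getMethodsInterceptor() {
                    return INTERCEPT_CLASS;
                }

                @Override
                public boolean isOverrideArgs() {
                    return false;
                }
            }
        };
    }
}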
The Dubbo interceptor itself is implemented as follows:
package org.apache.skywalking.apm.plugin.dubbo;

import com.alibaba.dubbo.common.URL;
import com.alibaba.dubbo.rpc.Invocation;
import com.alibaba.dubbo.rpc.Invoker;
import com.alibaba.dubbo.rpc.Result;
import com.alibaba.dubbo.rpc.RpcContext;
import java.lang.reflect.Method;
import org.apache.skywalking.apm.agent.core.context.ContextCarrier;
import org.apache.skywalking.apm.agent.core.context.tag.Tags;
import org.apache.skywalking.apm.agent.core.context.CarrierItem;
import org.apache.skywalking.apm.agent.core.context.ContextManager;
import org.apache.skywalking.apm.agent.core.context.trace.AbstractSpan;
import org.apache.skywalking.apm.agent.core.context.trace.SpanLayer;
import org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.EnhancedInstance;
import org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.InstanceMethodsAroundInterceptor;
import org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.MethodInterceptResult;
import org.apache.skywalking.apm.network.trace.component.ComponentsDefine;

/**
 * {@link DubboInterceptor} define how to enhance class {@link com.alibaba.dubbo.monitor.support.MonitorFilter#invoke(Invoker,
 * Invocation)}. the trace context transport to the provider side by {@link RpcContext#attachments}.but all the version
 * of dubbo framework below 2.8.3 don't support {@link RpcContext#attachments}, we support another way to support it.
 *
 * @author zhangxin
 */
public class DubboInterceptor implements InstanceMethodsAroundInterceptor {
    /**
     * <h2>Consumer:</h2> The serialized trace context data will
     * inject to the {@link RpcContext#attachments} for transport to provider side.
     * <p>
     * <h2>Provider:</h2> The serialized trace context data will extract from
     * {@link RpcContext#attachments}. current trace segment will ref if the serialize context data is not null.
     */
    @Override
    public void beforeMethod(EnhancedInstance objInst, Method method, Object[] allArguments,
        Class<?>[] argumentsTypes, MethodInterceptResult result) throws Throwable {
        Invoker invoker = (Invoker)allArguments[0];
        Invocation invocation = (Invocation)allArguments[1];
        RpcContext rpcContext = RpcContext.getContext();
        boolean isConsumer = rpcContext.isConsumerSide();
        URL requestURL = invoker.getUrl();

        AbstractSpan span;

        final String host = requestURL.getHost();
        final int port = requestURL.getPort();
        if (isConsumer) {
            final ContextCarrier contextCarrier = new ContextCarrier();
            span = ContextManager.createExitSpan(generateOperationName(requestURL, invocation), contextCarrier, host + ":" + port);
            //invocation.getAttachments().put("contextData", contextDataStr);
            //@see https://github.com/alibaba/dubbo/blob/dubbo-2.5.3/dubbo-rpc/dubbo-rpc-api/src/main/java/com/alibaba/dubbo/rpc/RpcInvocation.java#L154-L161
            CarrierItem next = contextCarrier.items();
            while (next.hasNext()) {
                next = next.next();
                rpcContext.getAttachments().put(next.getHeadKey(), next.getHeadValue());
            }
        } else {
            ContextCarrier contextCarrier = new ContextCarrier();
            CarrierItem next = contextCarrier.items();
            while (next.hasNext()) {
                next = next.next();
                next.setHeadValue(rpcContext.getAttachment(next.getHeadKey()));
            }

            span = ContextManager.createEntrySpan(generateOperationName(requestURL, invocation), contextCarrier);
        }

        Tags.URL.set(span, generateRequestURL(requestURL, invocation));
        span.setComponent(ComponentsDefine.DUBBO);
        SpanLayer.asRPCFramework(span);
    }

    @Override
    public Object afterMethod(EnhancedInstance objInst, Method method, Object[] allArguments,
        Class<?>[] argumentsTypes, Object ret) throws Throwable {
        Result result = (Result)ret;
        if (result != null && result.getException() != null) {
            dealException(result.getException());
        }

        ContextManager.stopSpan();
        return ret;
    }

    @Override
    public void handleMethodException(EnhancedInstance objInst, Method method, Object[] allArguments,
        Class<?>[] argumentsTypes, Throwable t) {
        dealException(t);
    }

    /**
     * Log the throwable, which occurs in Dubbo RPC service.
     */
    private void dealException(Throwable throwable) {
        AbstractSpan span = ContextManager.activeSpan();
        span.errorOccurred();
        span.log(throwable);
    }

    /**
     * Format operation name. e.g. org.apache.skywalking.apm.plugin.test.Test.test(String)
     *
     * @return operation name.
     */
    private String generateOperationName(URL requestURL, Invocation invocation) {
        StringBuilder operationName = new StringBuilder();
        operationName.append(requestURL.getPath());
        operationName.append("." + invocation.getMethodName() + "(");
        for (Class<?> classes : invocation.getParameterTypes()) {
            operationName.append(classes.getSimpleName() + ",");
        }

        if (invocation.getParameterTypes().length > 0) {
            operationName.delete(operationName.length() - 1, operationName.length());
        }

        operationName.append(")");

        return operationName.toString();
    }

    /**
     * Format request url.
     * e.g. dubbo://127.0.0.1:20880/org.apache.skywalking.apm.plugin.test.Test.test(String).
     *
     * @return request url.
     */
    private String generateRequestURL(URL url, Invocation invocation) {
        StringBuilder requestURL = new StringBuilder();
        requestURL.append(url.getProtocol() + "://");
        requestURL.append(url.getHost());
        requestURL.append(":" + url.getPort() + "/");
        requestURL.append(generateOperationName(url, invocation));
        return requestURL.toString();
    }
}
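Stripped of the Dubbo and SkyWalking types, the interceptor does two things: on the consumer side it writes the trace context into the attachments that travel with the request, and on the provider side it reads the context back. The following self-contained sketch shows that inject/extract idea with a plain map. The header key, the "traceId|spanId" encoding and all class names are assumptions made for this illustration and do not reproduce SkyWalking's real wire format.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of cross-process propagation: the consumer injects, the provider extracts. */
final class PropagationSketch {

    // Hypothetical header key used only in this sketch.
    static final String HEADER_KEY = "trace-context";

    /** Minimal stand-in for a trace context: a trace id plus the id of the calling span. */
    static final class TraceContext {
        final String traceId;
        final int parentSpanId;

        TraceContext(String traceId, int parentSpanId) {
            this.traceId = traceId;
            this.parentSpanId = parentSpanId;
        }

        String serialize() {
            return traceId + "|" + parentSpanId;
        }

        static TraceContext deserialize(String text) {
            int sep = text.lastIndexOf('|');
            return new TraceContext(text.substring(0, sep), Integer.parseInt(text.substring(sep + 1)));
        }
    }

    /** Consumer side: write the context into the attachments that travel with the request. */
    static void inject(TraceContext context, Map<String, String> attachments) {
        attachments.put(HEADER_KEY, context.serialize());
    }

    /** Provider side: read the context back, or return null when the caller sent none. */
    static TraceContext extract(Map<String, String> attachments) {
        String value = attachments.get(HEADER_KEY);
        return value == null ? null : TraceContext.deserialize(value);
    }

    public static void main(String[] args) {
        Map<String, String> attachments = new HashMap<>(); // stands in for the RPC attachments
        inject(new TraceContext("47.34.15657723821280001", 2), attachments);
        TraceContext received = extract(attachments);
        System.out.println(received.traceId + " parentSpan=" + received.parentSpanId);
    }
}
```

Because the provider continues the trace id it received instead of creating a new one, all services in a call chain end up reporting under the same traceId.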
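When the call ends, the span is stopped and its details are sent to the collector (the data collector). This is implemented in the stopSpan(AbstractSpan span) method of the class org.apache.skywalking.apm.agent.core.context.TracingContext.
The implementation of stopSpan is: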
@Override
public boolean stopSpan(AbstractSpan span) {
    AbstractSpan lastSpan = peek();
    if (lastSpan == span) {
        if (lastSpan instanceof AbstractTracingSpan) {
            AbstractTracingSpan toFinishSpan = (AbstractTracingSpan)lastSpan;
            if (toFinishSpan.finish(segment)) {
                pop();
            }
        } else {
            pop();
        }
    } else {
        throw new IllegalStateException("Stopping the unexpected span = " + span);
    }

    finish();

    return activeSpanStack.isEmpty();
}
The actual sending of the data is triggered in the finish method:
/**
 * Finish this context, and notify all {@link TracingContextListener}s, managed by {@link
 * TracingContext.ListenerManager}
 */
private void finish() {
    if (isRunningInAsyncMode) {
        asyncFinishLock.lock();
    }
    try {
        if (activeSpanStack.isEmpty() && running && (!isRunningInAsyncMode || asyncSpanCounter.get() == 0)) {
            TraceSegment finishedSegment = segment.finish(isLimitMechanismWorking());
            /*
             * Recheck the segment if the segment contains only one span.
             * Because in the runtime, can't sure this segment is part of distributed trace.
             *
             * @see {@link #createSpan(String, long, boolean)}
             */
            if (!segment.hasRef() && segment.isSingleSpanSegment()) {
                if (!samplingService.trySampling()) {
                    finishedSegment.setIgnore(true);
                }
            }

            /*
             * Check that the segment is created after the agent (re-)registered to backend,
             * otherwise the segment may be created when the agent is still rebooting and should
             * be ignored
             */
            if (segment.createTime() < RemoteDownstreamConfig.Agent.INSTANCE_REGISTERED_TIME) {
                finishedSegment.setIgnore(true);
            }

            // Notify the listeners of the tracing context; the listeners send the data to the collector.
            TracingContext.ListenerManager.notifyFinish(finishedSegment);

            running = false;
        }
    } finally {
        if (isRunningInAsyncMode) {
            asyncFinishLock.unlock();
        }
    }
}
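The two methods above boil down to a stack of active spans: stopping a span pops it, and only when the stack becomes empty is the finished segment handed to the listeners. The sketch below models just that behaviour with JDK types. Spans are reduced to their operation names, and the class SpanStackSketch and its Consumer-based listener are simplifications invented for this illustration, not SkyWalking classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

/** Sketch of a tracing context: spans form a stack, and the segment is reported once it is empty. */
final class SpanStackSketch {
    private final Deque<String> activeSpans = new ArrayDeque<>();
    private final List<String> finishedSpans = new ArrayList<>();
    private final Consumer<List<String>> listener; // stands in for the finish listeners
    private boolean running = true;

    SpanStackSketch(Consumer<List<String>> listener) {
        this.listener = listener;
    }

    void createSpan(String operationName) {
        activeSpans.push(operationName);
    }

    /** Stops the span on top of the stack; returns true when no active span is left. */
    boolean stopSpan(String operationName) {
        if (!operationName.equals(activeSpans.peek())) {
            throw new IllegalStateException("Stopping the unexpected span = " + operationName);
        }
        finishedSpans.add(activeSpans.pop());
        if (activeSpans.isEmpty() && running) {
            listener.accept(finishedSpans); // the whole segment is handed over in one go
            running = false;
        }
        return activeSpans.isEmpty();
    }

    public static void main(String[] args) {
        SpanStackSketch context = new SpanStackSketch(
                segment -> System.out.println("report segment " + segment));
        context.createSpan("entry: hello/start2");
        context.createSpan("exit: dubbo call");
        context.stopSpan("exit: dubbo call");   // stack not empty yet, nothing is reported
        context.stopSpan("entry: hello/start2"); // stack empty, the segment is reported
    }
}
```

Reporting once per segment rather than once per span keeps the number of messages to the collector proportional to requests, not to the number of instrumented calls inside each request.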
6. Limitations of SkyWalking
Just effect frameworks or libraries.
Because of the changing codes by agents, it also means the codes are already known by agent plugin developers.
So, there is always a supported list in this kind of probes. Like SkyWalking Java agent supported list.
Across thread can't be supported all the time.
Like we said about in process propagation, most codes run in a single thread per request, especially business codes.
But in some other scenarios, they do things in different threads, such as job assignment, task pool or batch process.
Or some languages provide coroutine or similar thing like Goroutine, then developer could run async process with low payload, even been encouraged. In those cases, auto instrument will face problems.
In short:
1. Only known frameworks and libraries are supported. If the middleware you use is not supported yet, you have to write a plugin yourself.
2. Automatic instrumentation does not cover cross-thread scenarios such as job assignment, task pools, or batch processing.
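The cross-thread limitation follows from how per-request context is usually kept: in a thread-local. The sketch below is a generic JDK illustration of the problem and of the usual remedy (capture the context when the task is submitted and restore it on the worker thread). It does not use or reproduce SkyWalking's API; TRACE_ID, withTrace and demo are hypothetical names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sketch: a ThreadLocal trace id is not visible in a pool thread unless it is carried over by hand. */
final class CrossThreadSketch {
    // Hypothetical stand-in for the per-thread tracing context.
    static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    /** Captures the caller's trace id now and restores it around the task on the worker thread. */
    static <T> Callable<T> withTrace(Callable<T> task) {
        final String captured = TRACE_ID.get();
        return () -> {
            String previous = TRACE_ID.get();
            TRACE_ID.set(captured);
            try {
                return task.call();
            } finally {
                if (previous == null) {
                    TRACE_ID.remove();
                } else {
                    TRACE_ID.set(previous);
                }
            }
        };
    }

    /** Returns {trace id seen by a plain task, trace id seen by a wrapped task}. */
    static String[] demo() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            TRACE_ID.set("47.34.15657723821280001");
            Callable<String> readTraceId = TRACE_ID::get;
            Future<String> plain = pool.submit(readTraceId);
            Future<String> wrapped = pool.submit(withTrace(readTraceId));
            return new String[] {plain.get(), wrapped.get()};
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            TRACE_ID.remove();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] seen = demo();
        System.out.println("plain=" + seen[0] + " wrapped=" + seen[1]);
    }
}
```

The plain task sees no trace id because the pool thread has its own empty thread-local, while the wrapped task sees the caller's id. An agent cannot know in general which submitted tasks belong to which request, which is why such hand-over has to be made explicit.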