Build Telemetry for Distributed Services: Introduction to OpenTracing
What is Distributed Tracing?
Distributed tracing, also called distributed request tracing, is a method used to profile and monitor applications, especially those built using a microservices architecture. Distributed tracing helps pinpoint where failures occur and what causes poor performance.
Who Uses Distributed Tracing?
IT and DevOps teams can use distributed tracing to monitor applications. Distributed tracing is particularly well-suited to debugging and monitoring modern distributed software architectures, such as microservices.
Developers can use distributed tracing to help debug and optimize their code.
What is OpenTracing?
It is probably easier to start with what OpenTracing is NOT.
OpenTracing is not a download or a program. Distributed tracing requires that software developers add instrumentation to the code of an application, or to the frameworks used in the application.
OpenTracing is not a standard. The Cloud Native Computing Foundation (CNCF) is not an official standards body. The OpenTracing API project is working towards creating more standardized APIs and instrumentation for distributed tracing.
OpenTracing comprises an API specification, frameworks and libraries that have implemented the specification, and documentation for the project. OpenTracing allows developers to add instrumentation to their application code using APIs that do not lock them into any one particular product or vendor.
For more information about where OpenTracing has already been implemented, see the list of languages and the list of tracers that support the OpenTracing specification.
Concepts and Terminology
All language-specific OpenTracing APIs share some core concepts and terminology. These concepts are so central and important to the project that they have their own repository (github.com/opentracing/specification) and semver scheme.
- The OpenTracing Semantic Specification is a versioned description of the current pan-language OpenTracing standard
- The Semantic Conventions spec describes conventional Span tags and log keys for common semantic scenarios
Both files are versioned and the GitHub repository is tagged according to the rules described by the versioning policy.
Spans
What is a Span?
The “span” is the primary building block of a distributed trace, representing an individual unit of work done in a distributed system.
Each component of the distributed system contributes a span - a named, timed operation representing a piece of the workflow.
Spans can (and generally do) contain “References” to other spans, which allows multiple Spans to be assembled into one complete Trace - a visualization of the life of a request as it moves through a distributed system.
Each span encapsulates the following state according to the OpenTracing specification:
- An operation name
- A start timestamp and finish timestamp
- A set of key:value span Tags
- A set of key:value span Logs
- A SpanContext
Tags
Tags are key:value pairs that enable user-defined annotation of spans in order to query, filter, and comprehend trace data.
Span tags should apply to the whole span. There is a list available at semantic_conventions.md listing conventional span tags for common scenarios. Examples may include tag keys like db.instance to identify a database host, http.status_code to represent the HTTP response code, or error, which can be set to True if the operation represented by the Span fails.
Logs
Logs are key:value pairs that are useful for capturing span-specific logging messages and other debugging or informational output from the application itself. Logs may be useful for documenting a specific moment or event within the span (in contrast to tags which should apply to the span as a whole).
SpanContext
The SpanContext carries data across process boundaries. Specifically, it has two major components:
- An implementation-dependent state to refer to the distinct span within a trace
  - i.e., the implementing Tracer’s definition of spanID and traceID
- Any Baggage Items
  - These are key:value pairs that cross process boundaries.
  - These may be useful to have some data available for access throughout the trace.
Example Span:
    t=0            operation name: db_query               t=x
     +-----------------------------------------------------+
     | · · · · · · · · · ·    Span     · · · · · · · · · · |
     +-----------------------------------------------------+
Tags:
- db.instance:"jdbc:mysql://127.0.0.1:3306/customers"
- db.statement:"SELECT * FROM mytable WHERE foo='bar';"
Logs:
- message:"Can't connect to mysql server on '127.0.0.1'(10061)"
SpanContext:
- trace_id:"abc123"
- span_id:"xyz789"
- Baggage Items:
  - special_id:"vsid1738"
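The span state listed above can be modelled with two small data classes. This is a minimal sketch for illustration only: the `Span` and `SpanContext` classes and their method names are hypothetical stand-ins written for this example, not the classes of any OpenTracing library, and the timestamps are arbitrary numbers.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class SpanContext:
    """State that crosses process boundaries (toy model, not a real API)."""
    trace_id: str
    span_id: str
    baggage: Dict[str, str] = field(default_factory=dict)


@dataclass
class Span:
    """A named, timed operation with tags, logs and a SpanContext (toy model)."""
    operation_name: str
    context: SpanContext
    start_time: float
    finish_time: Optional[float] = None
    # Tags describe the span as a whole.
    tags: Dict[str, Any] = field(default_factory=dict)
    # Each log entry is a (timestamp, fields) pair tied to one moment in the span.
    logs: List[Tuple[float, Dict[str, Any]]] = field(default_factory=list)

    def set_tag(self, key: str, value: Any) -> "Span":
        self.tags[key] = value
        return self

    def log_kv(self, fields: Dict[str, Any], timestamp: float) -> "Span":
        self.logs.append((timestamp, dict(fields)))
        return self

    def finish(self, finish_time: float) -> None:
        self.finish_time = finish_time


# Rebuild the db_query example span shown above.
context = SpanContext("abc123", "xyz789", {"special_id": "vsid1738"})
span = Span("db_query", context, start_time=0.0)
span.set_tag("db.instance", "jdbc:mysql://127.0.0.1:3306/customers")
span.set_tag("db.statement", "SELECT * FROM mytable WHERE foo='bar';")
span.log_kv({"message": "Can't connect to mysql server on '127.0.0.1'(10061)"}, timestamp=0.5)
span.finish(finish_time=1.0)
```

Keeping the baggage on the SpanContext rather than on the Span mirrors the text: tags and logs stay with the span, while baggage travels with the context.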
Scopes and Threading
Introduction
In any given thread there is an “active” span primarily responsible for the work accomplished by the surrounding application code, called the ActiveSpan. The OpenTracing API allows for only one span in a thread to be active at any point in time. This is managed using a Scope, which formalizes the activation and deactivation of a Span.
Other spans that are involved with the same thread will satisfy all of the following conditions:
- Started
- Not finished
- Not “active”
For example, there can be multiple spans on the same thread, if the spans are:
- Waiting for I/O
- Blocked on a child Span
- Off of the critical path
Note that if a Scope exists when the developer creates a new Span, then the Span held by that Scope will act as the new Span's parent, unless the programmer invokes ignoreActiveSpan() at buildSpan() time or specifies the parent context explicitly.
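The parenting rule can be sketched with a thread-local scope manager. Everything in this block (`ToySpan`, `ToyScopeManager`, `ToyTracer`, and the `ignore_active_span` parameter) is a hypothetical model written for this example; real OpenTracing libraries expose their own, language-specific equivalents.

```python
import threading
from contextlib import contextmanager


class ToySpan:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


class ToyScopeManager:
    """Tracks at most one active span per thread (illustrative only)."""

    def __init__(self):
        self._local = threading.local()

    @property
    def active(self):
        return getattr(self._local, "span", None)

    @contextmanager
    def activate(self, span):
        previous = self.active
        self._local.span = span
        try:
            yield span
        finally:
            # Closing the scope restores whatever span was active before.
            self._local.span = previous


class ToyTracer:
    def __init__(self):
        self.scope_manager = ToyScopeManager()

    def start_span(self, name, child_of=None, ignore_active_span=False):
        parent = child_of
        if parent is None and not ignore_active_span:
            # No explicit parent: fall back to the active span, if any.
            parent = self.scope_manager.active
        return ToySpan(name, parent)


tracer = ToyTracer()
root = tracer.start_span("handle_request")
with tracer.scope_manager.activate(root):
    child = tracer.start_span("db_query")  # parent is the active span
    detached = tracer.start_span("audit", ignore_active_span=True)  # no parent
```

Restoring the previous span when the scope closes is one simple way to guarantee that only one span per thread is active at a time.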
Accessing the Current Active Span
Because it is inconvenient to pass an active Span from function to function manually, OpenTracing requires that every Tracer contain a ScopeManager. The ScopeManager API grants access to the active Span through a Scope. This means that a developer can access the currently active Span through a Scope.
Moving a span between threads
Using the ScopeManager API, a developer can transfer spans between threads. A Span’s lifetime might start in one thread and end in another. The ScopeManager API allows for a Span to be transferred to another thread or callback. Passing of scopes to another thread or callback is not supported. For more details, refer to the language-specific documentation.
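A minimal sketch of that hand-off, assuming a thread-local notion of "active": the span object itself is passed to the worker, which activates it in its own scope. The helpers `active_span` and `activate` are hypothetical and written only for this example, and the span is a plain dict standing in for a real Span.

```python
import threading
from contextlib import contextmanager

_local = threading.local()


def active_span():
    """Return the span active in the calling thread, or None."""
    return getattr(_local, "span", None)


@contextmanager
def activate(span):
    """Make `span` active in the calling thread for the duration of the block."""
    previous = active_span()
    _local.span = span
    try:
        yield span
    finally:
        _local.span = previous


span = {"name": "render_report", "finished_in": None}
seen = {}


def worker(handed_over_span):
    # The scope opened in the main thread is not visible in this thread.
    seen["before"] = active_span()
    # The span was handed over, so it is re-activated in a new, local scope.
    with activate(handed_over_span):
        seen["during"] = active_span()
        handed_over_span["finished_in"] = threading.current_thread().name


with activate(span):
    thread = threading.Thread(target=worker, args=(span,), name="worker-1")
    thread.start()
    thread.join()
```

The scope stays bound to the thread that opened it. Only the span crosses the thread boundary, which matches the rule that scopes themselves are not passed along.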
Tags, logs and baggage
Tags
Tags are key:value pairs that enable user-defined annotation of spans in order to query, filter, and comprehend trace data.
Span tags should apply to the whole span. There is a list available at semantic_conventions.md listing conventional span tags for common scenarios. Examples may include tag keys like db.instance to identify a database host, http.status_code to represent the HTTP response code, or error, which can be set to True if the operation represented by the Span fails.
Logs
Logs are key:value pairs that are useful for capturing timed log messages and other debugging or informational output from the application itself. Logs may be useful for documenting a specific moment or event within the span (in contrast to tags which should apply to the span regardless of time).
Baggage Items
The SpanContext carries data across process boundaries. Specifically, it has two major components:
- An implementation-dependent state to refer to the distinct span within a trace
  - i.e., the implementing Tracer’s definition of spanID and traceID
- Any Baggage Items
  - These are key:value pairs that cross process boundaries.
  - These may be useful to have some data available for access throughout the trace.
Tracers
Introduction
OpenTracing provides an open, vendor-neutral standard API for describing distributed transactions, specifically causality, semantics and timing. It provides a general purpose distributed context propagation framework, consisting of API primitives for:
- passing the metadata context in-process
- encoding and decoding the metadata context for transmitting it over the network for inter-process communications
- causality tracking: parent-child, forks, joins
OpenTracing abstracts away the differences among numerous tracer implementations. This means that instrumentation remains the same irrespective of the tracer system being used by the developer. In order to instrument an application using the OpenTracing specification, a compatible OpenTracing tracer must be deployed. A list of the supported tracers is available in the Tracing Systems section below.
Tracer Interface
The Tracer interface creates Spans and understands how to Inject (serialize) and Extract (deserialize) their metadata across process boundaries. It has the following capabilities:
- Start a new Span
- Inject a SpanContext into a carrier
- Extract a SpanContext from a carrier
Each of these will be discussed in more detail below. For implementation purposes, check out the specific language guide.
Setting up a Tracer
A Tracer is the actual implementation that will record the Spans and publish them somewhere. How an application handles the actual Tracer is up to the developer: either consume it directly throughout the application or store it in the GlobalTracer for easier usage with instrumented frameworks.
Different Tracer implementations vary in how and what parameters they receive at initialization time, such as:
- Component name for this application’s traces.
- Tracing endpoint.
- Tracing credentials.
- Sampling strategy.
Once a Tracer instance is obtained, it can be used to manually create Spans, or it can be passed to existing instrumentation for frameworks and libraries.
In order to not force the user to keep around a Tracer, the io.opentracing.util artifact includes a helper GlobalTracer class implementing the io.opentracing.Tracer interface, which, as the name implies, acts as a global instance that can be used from anywhere. It works by forwarding all operations to another underlying Tracer, which will get registered at some future point.
By default, the underlying Tracer is a no-op implementation.
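The forwarding behaviour can be sketched as a small delegate pattern. This is not the io.opentracing.util.GlobalTracer class: `ToyGlobalTracer`, `NoopTracer`, and `RecordingTracer` are hypothetical names used only to show how calls made before registration fall through to a no-op, while later calls reach the registered tracer.

```python
class NoopTracer:
    """Default delegate: accepts calls and records nothing."""

    def start_span(self, operation_name):
        return None


class RecordingTracer:
    """Stand-in for a real tracer implementation (illustrative only)."""

    def __init__(self):
        self.started = []

    def start_span(self, operation_name):
        self.started.append(operation_name)
        return operation_name


class ToyGlobalTracer:
    """Global access point that forwards every call to a registered tracer."""

    _delegate = NoopTracer()

    @classmethod
    def register(cls, tracer):
        cls._delegate = tracer

    @classmethod
    def start_span(cls, operation_name):
        return cls._delegate.start_span(operation_name)


# Before registration, calls are safe but produce nothing.
before = ToyGlobalTracer.start_span("early_call")

# After registration, the same call site reaches the real tracer.
backend = RecordingTracer()
ToyGlobalTracer.register(backend)
after = ToyGlobalTracer.start_span("checkout")
```

The no-op default means instrumented libraries can call the global instance unconditionally, whether or not the application has configured tracing yet.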
Starting a new Trace
A new trace is started whenever a new Span is created without references to a parent Span. When creating a new Span, you need to specify an “operation name”, which is a free-format string that you can use to help you identify the code this Span relates to. The next Span from our new trace will probably be a child Span and can be seen as a representation of a sub-routine that is executed “within” the main Span. This child Span has, therefore, a ChildOf relationship with the parent. Another type of relationship is FollowsFrom, which is used in special cases where the new Span is independent of the parent Span, such as in asynchronous processes.
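The two reference types can be illustrated with a toy span factory. The `start_span` function, its `child_of` and `follows_from` parameters, and the integer IDs below are assumptions made for this sketch; a real tracer chooses its own ID scheme and API.

```python
import itertools
from dataclasses import dataclass, field
from typing import List, Tuple

_ids = itertools.count(1)


@dataclass
class ToySpan:
    operation_name: str
    trace_id: int
    span_id: int
    # Each reference is a (reference_type, referenced_span_id) pair.
    references: List[Tuple[str, int]] = field(default_factory=list)


def start_span(operation_name, child_of=None, follows_from=None):
    parent = child_of if child_of is not None else follows_from
    span_id = next(_ids)
    if parent is None:
        # No reference to another span: this span starts a new trace.
        return ToySpan(operation_name, trace_id=span_id, span_id=span_id)
    kind = "child_of" if child_of is not None else "follows_from"
    # A referencing span joins the trace of the span it refers to.
    return ToySpan(operation_name, parent.trace_id, span_id, [(kind, parent.span_id)])


root = start_span("place_order")
charge = start_span("charge_card", child_of=root)  # work done "within" root
email = start_span("send_receipt_email", follows_from=root)  # independent of root
```

Both `charge` and `email` share the trace of `root`. Only the reference type records whether the new span is a sub-routine of the root span or independent of it.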
Accessing the Active Span
A Tracer can be used for enabling access to the ActiveSpan. ActiveSpans can also be accessed through a scopeManager in some languages. Refer to the specific language guide for more implementation details.
Propagating a Trace with Inject/Extract
In order to trace across process boundaries in distributed systems, services need to be able to continue the trace injected by the client that sent each request. OpenTracing allows this to happen by providing inject and extract methods that encode a span’s context into a carrier. The inject method allows for the SpanContext to be passed on to a carrier. For example, it can pass the trace information into the client’s request so that the server you send it to can continue the trace. The extract method does the exact opposite: it extracts the SpanContext from the carrier. For example, if there was an active request on the client side, the developer must extract the SpanContext using the io.opentracing.Tracer.extract method.
Tracing Systems
The following table lists all currently known OpenTracing Tracers:
Tracing system | Supported languages |
---|---|
CNCF Jaeger | Java, Go, Python, Node.js, C++, C# |
Datadog | Go |
inspectIT | Java |
Instana | Crystal, Go, Java, Node.js, Python, Ruby |
LightStep | Go, Python, JavaScript, Objective-C, Java, PHP, Ruby, C++ |
stagemonitor | Java |
Inject and extract
Programmers adding tracing support across process boundaries must understand the Tracer.Inject(...) and Tracer.Extract(...) capabilities of the OpenTracing specification. They are conceptually powerful, allowing the programmer to write correct and general cross-process propagation code without being bound to a particular OpenTracing implementation; that said, with great power comes great opportunity for confusion. :)
This document provides a concise summary of the design and proper use of Inject and Extract, regardless of the particular OpenTracing language or OpenTracing implementation.
“The Big Picture” for explicit trace propagation
The hardest thing about distributed tracing is the distributed part. Any tracing system needs a way of understanding the causal relationship between activities in many distinct processes, whether they be connected via formal RPC frameworks, pub-sub systems, generic message queues, direct HTTP calls, best-effort UDP packets, or something else entirely.
Some distributed tracing systems (e.g., Project5 from 2003, or WAP5 from 2006 or The Mystery Machine from 2014) infer causal relationships across process boundaries. Of course there is a tradeoff between the apparent convenience of these black-box inference-based approaches and the freshness and quality of the assembled traces. Per the concern about quality, OpenTracing is an explicit distributed tracing instrumentation standard, and as such it is much better-aligned with approaches like X-Trace from 2007, Dapper from 2010, or numerous open-source tracing systems like Zipkin or Jaeger (among others).
Together, Inject and Extract allow for inter-process trace propagation without tightly coupling the programmer to a particular OpenTracing implementation.
Requirements for the OpenTracing propagation scheme
For Inject and Extract to be useful, all of the following must be true:
- Per the above, the OpenTracing user handling cross-process trace propagation must not need to write OpenTracing-implementation-specific code
- OpenTracing implementations must not need special handlers for every known inter-process communication mechanism: that’s far too much work, and it’s not even well-defined
- That said, the propagation mechanism should be extensible for optimizations
The basic approach: Inject, Extract, and Carriers
Any SpanContext in a trace may be Injected into what OpenTracing refers to as a Carrier. A Carrier is an interface or data structure that’s useful for inter-process communication (IPC); that is, the Carrier is something that “carries” the tracing state from one process to another. The OpenTracing specification includes two required Carrier formats, though custom Carrier formats are possible as well.
Similarly, given a Carrier, an injected trace may be Extracted, yielding a SpanContext instance which is semantically identical to the one Injected into the Carrier.
Inject pseudocode example
span_context = ...
outbound_request = ...

# We'll use the (builtin) HTTP_HEADERS carrier format. We
# start by using an empty map as the carrier prior to the
# call to `tracer.inject`.
carrier = {}
tracer.inject(span_context, opentracing.Format.HTTP_HEADERS, carrier)

# `carrier` now contains (opaque) key:value pairs which we pass
# along over whatever wire protocol we already use.
for key, value in carrier.items():
    outbound_request.headers[key] = escape(value)
Extract pseudocode example
inbound_request = ...

# We'll again use the (builtin) HTTP_HEADERS carrier format. Per the
# HTTP_HEADERS documentation, we can use a map that has extraneous data
# in it and let the OpenTracing implementation look for the subset
# of key:value pairs it needs.
#
# As such, we directly use the key:value `inbound_request.headers`
# map as the carrier.
carrier = inbound_request.headers
span_context = tracer.extract(opentracing.Format.HTTP_HEADERS, carrier)

# Continue the trace given span_context. E.g.,
span = tracer.start_span("...", child_of=span_context)
# (If `carrier` held trace data, `span` will now be ready to use.)
Carriers have formats
All Carriers have a format. In some OpenTracing languages, the format must be specified explicitly as a constant or string; in others, the format is inferred from the Carrier’s static type information.
Required Inject/Extract Carrier formats
At a minimum, all platforms require OpenTracing implementations to support two Carrier formats: the “text map” format and the “binary” format.
- The text map Carrier format is a platform-idiomatic map from (unicode) string to string
- The binary Carrier format is an opaque byte array (and presumably more compact and efficient)
What the OpenTracing implementations choose to store in these Carriers is not formally defined by the OpenTracing specification, but the presumption is that they find a way to encode “tracer state” about the propagated SpanContext (e.g., in Dapper this would include a trace_id, a span_id, and a bitmask representing the sampling status for the given trace) as well as any key:value Baggage items.
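As a rough illustration of the two required formats, the sketch below encodes the same tracer state once as a text map and once as a byte array. The key names (`toy-trace-id` and so on), the 17-byte binary layout, and all function names are invented for this example; the specification does not define what an implementation stores, so a real tracer will differ.

```python
import struct

BAGGAGE_PREFIX = "toy-baggage-"


def inject_text_map(trace_id, span_id, sampled, baggage, carrier):
    """Write tracer state into a string-to-string map (hypothetical key names)."""
    carrier["toy-trace-id"] = format(trace_id, "x")
    carrier["toy-span-id"] = format(span_id, "x")
    carrier["toy-sampled"] = "1" if sampled else "0"
    for key, value in baggage.items():
        carrier[BAGGAGE_PREFIX + key] = value


def extract_text_map(carrier):
    """Read tracer state back, ignoring any unrelated entries in the map."""
    baggage = {
        key[len(BAGGAGE_PREFIX):]: value
        for key, value in carrier.items()
        if key.startswith(BAGGAGE_PREFIX)
    }
    return (
        int(carrier["toy-trace-id"], 16),
        int(carrier["toy-span-id"], 16),
        carrier["toy-sampled"] == "1",
        baggage,
    )


def inject_binary(trace_id, span_id, sampled):
    """Pack two unsigned 64-bit IDs and one flags byte, big-endian (baggage omitted)."""
    return struct.pack(">QQB", trace_id, span_id, 1 if sampled else 0)


def extract_binary(blob):
    trace_id, span_id, flags = struct.unpack(">QQB", blob)
    return trace_id, span_id, bool(flags & 1)


# The text map may already hold unrelated entries such as other headers.
text_carrier = {"content-type": "application/json"}
inject_text_map(0xABC123, 0x789, True, {"special_id": "vsid1738"}, text_carrier)
text_state = extract_text_map(text_carrier)

blob = inject_binary(0xABC123, 0x789, True)
binary_state = extract_binary(blob)
```

The binary form here is fixed-width and compact, while the text map is easy to carry in protocols that already transport string headers.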
Interoperability of OpenTracing implementations across process boundaries
There is no expectation that different OpenTracing implementations Inject and Extract SpanContexts in compatible ways. Though OpenTracing is agnostic about the tracing implementation across an entire distributed system, for successful inter-process handoff it’s essential that the processes on both sides of a propagation use the same tracing implementation.
Custom Inject/Extract Carrier formats
Any propagation subsystem (an RPC library, a message queue, etc.) may choose to introduce its own custom Inject/Extract Carrier format. By preferring the custom format but falling back to a required OpenTracing format as needed, the subsystem lets OpenTracing implementations optimize for the custom format without requiring them to support it.
Some pseudocode will make this less abstract. Imagine that we’re the author of the (sadly fictitious) ArrrPC pirate RPC subsystem, and we want to add OpenTracing support to our outbound RPC requests. Minus some error handling, our pseudocode might look like this:
span_context = ...
outbound_request = ...

# First we try our custom Carrier, the outbound_request itself.
# If the underlying OpenTracing implementation cares to support
# it, this call is presumably more efficient in this process
# and over the wire. But, since this is a non-required format,
# we must also account for the possibility that the OpenTracing
# implementation does not support arrrpc.ARRRPC_OT_CARRIER.
try:
    tracer.inject(span_context, arrrpc.ARRRPC_OT_CARRIER, outbound_request)
except opentracing.UnsupportedFormatException:
    # If unsupported, fall back on a required OpenTracing format.
    carrier = {}
    tracer.inject(span_context, opentracing.Format.HTTP_HEADERS, carrier)
    # `carrier` now contains (opaque) key:value pairs which we
    # pass along over whatever wire protocol we already use.
    for key, value in carrier.items():
        outbound_request.headers[key] = escape(value)
More about custom Carrier formats
The precise representation of the “Carrier formats” may vary from platform to platform, but in all cases they should be drawn from a global namespace. Support for a new custom carrier format must not necessitate changes to the core OpenTracing platform APIs, though each OpenTracing platform API must define the required OpenTracing carrier formats (e.g., string maps and binary blobs). For example, if the maintainer of ArrrPC RPC framework wanted to define an “ArrrPC” Inject/Extract format, she or he must be able to do so without sending a PR to OpenTracing maintainers (though of course OpenTracing implementations are not required to support the “ArrrPC” format). There is an end-to-end injector and extractor example below to make this more concrete.
An end-to-end Inject and Extract propagation example
To make the above more concrete, consider the following sequence:
- A client process has a SpanContext instance and is about to make an RPC over a home-grown HTTP protocol
- That client process calls Tracer.Inject(...), passing the active SpanContext instance, a format identifier for a text map, and a text map Carrier as parameters
- Inject has populated the text map in the Carrier; the client application encodes that map within its homegrown HTTP protocol (e.g., as headers)
- The HTTP request happens and the data crosses process boundaries…
- Now in the server process, the application code decodes the text map from the homegrown HTTP protocol and uses it to initialize a text map Carrier
- The server process calls Tracer.Extract(...), passing in a format identifier for a text map and the Carrier from above
- In the absence of data corruption or other errors, the server now has a SpanContext instance that belongs to the same trace as the one in the client
Other examples can be found in the OpenTracing use cases doc.
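The sequence above can be walked through with a toy tracer and a plain dict standing in for the HTTP headers. `ToyTracer`, `ToySpanContext`, the `TEXT_MAP` constant, the header names, and the `client_send` and `server_receive` helpers are all hypothetical; they only mirror the shape of Inject and Extract described in this document.

```python
import uuid
from dataclasses import dataclass

TEXT_MAP = "text_map"


@dataclass(frozen=True)
class ToySpanContext:
    trace_id: str
    span_id: str


class ToyTracer:
    def inject(self, span_context, fmt, carrier):
        if fmt != TEXT_MAP:
            raise ValueError("unsupported format: " + fmt)
        carrier["toy-trace-id"] = span_context.trace_id
        carrier["toy-span-id"] = span_context.span_id

    def extract(self, fmt, carrier):
        if fmt != TEXT_MAP:
            raise ValueError("unsupported format: " + fmt)
        return ToySpanContext(carrier["toy-trace-id"], carrier["toy-span-id"])

    def start_span(self, operation_name, child_of):
        # The new span joins the parent's trace and gets a fresh span ID.
        return {
            "operation_name": operation_name,
            "trace_id": child_of.trace_id,
            "span_id": uuid.uuid4().hex[:16],
            "parent_id": child_of.span_id,
        }


def client_send(tracer, span_context):
    """Client side: inject into a text map, then copy it into the request headers."""
    carrier = {}
    tracer.inject(span_context, TEXT_MAP, carrier)
    headers = {"Content-Type": "text/plain"}
    headers.update(carrier)
    return headers


def server_receive(tracer, headers):
    """Server side: rebuild the carrier from the headers, extract, continue the trace."""
    carrier = dict(headers)
    parent_context = tracer.extract(TEXT_MAP, carrier)
    return tracer.start_span("handle_rpc", child_of=parent_context)


client_context = ToySpanContext("abc123", "xyz789")
wire_headers = client_send(ToyTracer(), client_context)  # crosses the process boundary
server_span = server_receive(ToyTracer(), wire_headers)
```

Client and server use separate `ToyTracer` instances of the same implementation, reflecting the requirement that both sides of a propagation agree on how the SpanContext is encoded.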