分布式应用,会存在各种问题。而要解决这些难题,除了要应用自己做一些监控埋点外,还应该有一些外围的系统进行主动探测,主动发现。

  APM工具就是干这活的,SkyWalking 是国人开源的一款优秀的APM应用,已成为apache的顶级项目。

  

  今天我们就来实践下 SkyWalking 下吧。

  实践目标: 达到监控现有的几个系统,清楚各调用关系,可以找到出性能问题点。

实践步骤:

  1. SkyWalking 服务端安装运行;
  2. 应用端的接入;
  3. 后台查看效果;
  4. 分析排查问题;
  5. 深入了解(如有心情);

1. SkyWalking 服务端安装

  下载应用包:

    # 主下载页
http://skywalking.apache.org/downloads/
# 点开具体下载地址后进行下载,如:
wget http://mirrors.tuna.tsinghua.edu.cn/apache/skywalking/6.5.0/apache-skywalking-apm-6.5.0.tar.gz

  解压安装包:

    tar -xzvf apache-skywalking-apm-6.5..tar.gz

  使用默认配置端口,默认存储方式 h2, 直接启动服务:

    ./bin/startup.sh

  好产品就是这么简单!

  现在服务端就启起来了,可以打开后台地址查看(默认是8080端口): http://localhost:8080    界面如下:

  当然,上面是已存在应用的页面。现在你是看不到任何应用的,因为你还没有接入嘛。

2. 应用端的接入

  我们只以java应用接入方式实践。

  直接使用 javaagent 进行启动即可:

    java -javaagent:/root/skywalking/agent/skywalking-agent.jar -Dskywalking.agent.service_name=app1 -Dskywalking.collector.backend_service=localhost: -jar myapp.jar

  参数说明:

    # 参数解释
skywalking.agent.service_name: 本应用在skywalking中的名称
skywalking.collector.backend_service: skywalking 服务端地址,grpc上报地址,默认端口是
# 上面两个参数也可以使用另外的表现形式
SW_AGENT_COLLECTOR_BACKEND_SERVICES: 与 skywalking.collector.backend_service 含义相同
SW_AGENT_NAME: 与 skywalking.agent.service_name 含义相同

  随便访问几个接口或页面,使监控抓取到数据。

  再回管理页面,已经看到有节点了。截图如上。

  现在我们还可以查看各应用之间的关系了!

  关系清晰吧!一目了然,代码再复杂也不怕了。

  我们还可以追踪具体链路:

  只要知道问题发生的时间点,即可以很快定位到发生问题的接口、系统,快速解决。

3. SkyWalking 配置文件

  如上,我们并没有改任何配置文件,就让系统跑起来了。幸运的同时,我们应该要知道更多!至少配置得知道。

  config/application.yml : 收集器服务端配置

  webapp/webapp.yml : 配置 Web 的端口及获取数据的 OAP(Collector)的IP和端口

  agent/config/agent.config : 配置 Agent 信息,如 Skywalking OAP(Collector)的地址和名称

  下面是 skywalking 的默认配置,我们可以不用更改就能跑起来一个样例!更改以生产化配置!

config/application.yml

cluster:
standalone:
# Please check your ZooKeeper is 3.5+, However, it is also compatible with ZooKeeper 3.4.x. Replace the ZooKeeper 3.5+
# library the oap-libs folder with your ZooKeeper 3.4.x library.
# zookeeper:
# nameSpace: ${SW_NAMESPACE:""}
# hostPort: ${SW_CLUSTER_ZK_HOST_PORT:localhost:}
# #Retry Policy
# baseSleepTimeMs: ${SW_CLUSTER_ZK_SLEEP_TIME:} # initial amount of time to wait between retries
# maxRetries: ${SW_CLUSTER_ZK_MAX_RETRIES:} # max number of times to retry
# # Enable ACL
# enableACL: ${SW_ZK_ENABLE_ACL:false} # disable ACL in default
# schema: ${SW_ZK_SCHEMA:digest} # only support digest schema
# expression: ${SW_ZK_EXPRESSION:skywalking:skywalking}
# kubernetes:
# watchTimeoutSeconds: ${SW_CLUSTER_K8S_WATCH_TIMEOUT:}
# namespace: ${SW_CLUSTER_K8S_NAMESPACE:default}
# labelSelector: ${SW_CLUSTER_K8S_LABEL:app=collector,release=skywalking}
# uidEnvName: ${SW_CLUSTER_K8S_UID:SKYWALKING_COLLECTOR_UID}
# consul:
# serviceName: ${SW_SERVICE_NAME:"SkyWalking_OAP_Cluster"}
# Consul cluster nodes, example: 10.0.0.1:,10.0.0.2:,10.0.0.3:
# hostPort: ${SW_CLUSTER_CONSUL_HOST_PORT:localhost:}
# nacos:
# serviceName: ${SW_SERVICE_NAME:"SkyWalking_OAP_Cluster"}
# hostPort: ${SW_CLUSTER_NACOS_HOST_PORT:localhost:}
# # Nacos Configuration namespace
# namespace: 'public'
# etcd:
# serviceName: ${SW_SERVICE_NAME:"SkyWalking_OAP_Cluster"}
# etcd cluster nodes, example: 10.0.0.1:,10.0.0.2:,10.0.0.3:
# hostPort: ${SW_CLUSTER_ETCD_HOST_PORT:localhost:}
core:
default:
# Mixed: Receive agent data, Level aggregate, Level aggregate
# Receiver: Receive agent data, Level aggregate
# Aggregator: Level aggregate
role: ${SW_CORE_ROLE:Mixed} # Mixed/Receiver/Aggregator
restHost: ${SW_CORE_REST_HOST:0.0.0.0}
restPort: ${SW_CORE_REST_PORT:}
restContextPath: ${SW_CORE_REST_CONTEXT_PATH:/}
gRPCHost: ${SW_CORE_GRPC_HOST:0.0.0.0}
gRPCPort: ${SW_CORE_GRPC_PORT:}
downsampling:
- Hour
- Day
- Month
# Set a timeout on metrics data. After the timeout has expired, the metrics data will automatically be deleted.
enableDataKeeperExecutor: ${SW_CORE_ENABLE_DATA_KEEPER_EXECUTOR:true} # Turn it off then automatically metrics data delete will be close.
dataKeeperExecutePeriod: ${SW_CORE_DATA_KEEPER_EXECUTE_PERIOD:} # How often the data keeper executor runs periodically, unit is minute
recordDataTTL: ${SW_CORE_RECORD_DATA_TTL:} # Unit is minute
minuteMetricsDataTTL: ${SW_CORE_MINUTE_METRIC_DATA_TTL:} # Unit is minute
hourMetricsDataTTL: ${SW_CORE_HOUR_METRIC_DATA_TTL:} # Unit is hour
dayMetricsDataTTL: ${SW_CORE_DAY_METRIC_DATA_TTL:} # Unit is day
monthMetricsDataTTL: ${SW_CORE_MONTH_METRIC_DATA_TTL:} # Unit is month
# Cache metric data for minute to reduce database queries, and if the OAP cluster changes within that minute,
# the metrics may not be accurate within that minute.
enableDatabaseSession: ${SW_CORE_ENABLE_DATABASE_SESSION:true}
storage:
# elasticsearch:
# nameSpace: ${SW_NAMESPACE:""}
# clusterNodes: ${SW_STORAGE_ES_CLUSTER_NODES:localhost:}
# protocol: ${SW_STORAGE_ES_HTTP_PROTOCOL:"http"}
# trustStorePath: ${SW_SW_STORAGE_ES_SSL_JKS_PATH:"../es_keystore.jks"}
# trustStorePass: ${SW_SW_STORAGE_ES_SSL_JKS_PASS:""}
# user: ${SW_ES_USER:""}
# password: ${SW_ES_PASSWORD:""}
# indexShardsNumber: ${SW_STORAGE_ES_INDEX_SHARDS_NUMBER:}
# indexReplicasNumber: ${SW_STORAGE_ES_INDEX_REPLICAS_NUMBER:}
# # Those data TTL settings will override the same settings in core module.
# recordDataTTL: ${SW_STORAGE_ES_RECORD_DATA_TTL:} # Unit is day
# otherMetricsDataTTL: ${SW_STORAGE_ES_OTHER_METRIC_DATA_TTL:} # Unit is day
# monthMetricsDataTTL: ${SW_STORAGE_ES_MONTH_METRIC_DATA_TTL:} # Unit is month
# # Batch process setting, refer to https://www.elastic.co/guide/en/elasticsearch/client/java-api/5.5/java-docs-bulk-processor.html
# bulkActions: ${SW_STORAGE_ES_BULK_ACTIONS:} # Execute the bulk every requests
# flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:} # flush the bulk every seconds whatever the number of requests
# concurrentRequests: ${SW_STORAGE_ES_CONCURRENT_REQUESTS:} # the number of concurrent requests
# resultWindowMaxSize: ${SW_STORAGE_ES_QUERY_MAX_WINDOW_SIZE:}
# metadataQueryMaxSize: ${SW_STORAGE_ES_QUERY_MAX_SIZE:}
# segmentQueryMaxSize: ${SW_STORAGE_ES_QUERY_SEGMENT_SIZE:}
h2:
driver: ${SW_STORAGE_H2_DRIVER:org.h2.jdbcx.JdbcDataSource}
url: ${SW_STORAGE_H2_URL:jdbc:h2:mem:skywalking-oap-db}
user: ${SW_STORAGE_H2_USER:sa}
metadataQueryMaxSize: ${SW_STORAGE_H2_QUERY_MAX_SIZE:}
# mysql:
# properties:
# jdbcUrl: ${SW_JDBC_URL:"jdbc:mysql://localhost:3306/swtest"}
# dataSource.user: ${SW_DATA_SOURCE_USER:root}
# dataSource.password: ${SW_DATA_SOURCE_PASSWORD:root@}
# dataSource.cachePrepStmts: ${SW_DATA_SOURCE_CACHE_PREP_STMTS:true}
# dataSource.prepStmtCacheSize: ${SW_DATA_SOURCE_PREP_STMT_CACHE_SQL_SIZE:}
# dataSource.prepStmtCacheSqlLimit: ${SW_DATA_SOURCE_PREP_STMT_CACHE_SQL_LIMIT:}
# dataSource.useServerPrepStmts: ${SW_DATA_SOURCE_USE_SERVER_PREP_STMTS:true}
# metadataQueryMaxSize: ${SW_STORAGE_MYSQL_QUERY_MAX_SIZE:}
receiver-sharing-server:
default:
receiver-register:
default:
receiver-trace:
default:
bufferPath: ${SW_RECEIVER_BUFFER_PATH:../trace-buffer/} # Path to trace buffer files, suggest to use absolute path
bufferOffsetMaxFileSize: ${SW_RECEIVER_BUFFER_OFFSET_MAX_FILE_SIZE:} # Unit is MB
bufferDataMaxFileSize: ${SW_RECEIVER_BUFFER_DATA_MAX_FILE_SIZE:} # Unit is MB
bufferFileCleanWhenRestart: ${SW_RECEIVER_BUFFER_FILE_CLEAN_WHEN_RESTART:false}
sampleRate: ${SW_TRACE_SAMPLE_RATE:} # The sample rate precision is /. means % sample in default.
slowDBAccessThreshold: ${SW_SLOW_DB_THRESHOLD:default:,mongodb:} # The slow database access thresholds. Unit ms.
receiver-jvm:
default:
receiver-clr:
default:
service-mesh:
default:
bufferPath: ${SW_SERVICE_MESH_BUFFER_PATH:../mesh-buffer/} # Path to trace buffer files, suggest to use absolute path
bufferOffsetMaxFileSize: ${SW_SERVICE_MESH_OFFSET_MAX_FILE_SIZE:} # Unit is MB
bufferDataMaxFileSize: ${SW_SERVICE_MESH_BUFFER_DATA_MAX_FILE_SIZE:} # Unit is MB
bufferFileCleanWhenRestart: ${SW_SERVICE_MESH_BUFFER_FILE_CLEAN_WHEN_RESTART:false}
istio-telemetry:
default:
envoy-metric:
default:
# alsHTTPAnalysis: ${SW_ENVOY_METRIC_ALS_HTTP_ANALYSIS:k8s-mesh}
#receiver_zipkin:
# default:
# host: ${SW_RECEIVER_ZIPKIN_HOST:0.0.0.0}
# port: ${SW_RECEIVER_ZIPKIN_PORT:}
# contextPath: ${SW_RECEIVER_ZIPKIN_CONTEXT_PATH:/}
query:
graphql:
path: ${SW_QUERY_GRAPHQL_PATH:/graphql}
alarm:
default:
telemetry:
none:
configuration:
none:
# apollo:
# apolloMeta: http://106.12.25.204:8080
# apolloCluster: default
# # apolloEnv: # defaults to null
# appId: skywalking
# period:
# nacos:
# # Nacos Server Host
# serverAddr: 127.0.0.1
# # Nacos Server Port
# port:
# # Nacos Configuration Group
# group: 'skywalking'
# # Nacos Configuration namespace
# namespace: ''
# # Unit seconds, sync period. Default fetch every seconds.
# period :
# # the name of current cluster, set the name if you want to upstream system known.
# clusterName: "default"
# zookeeper:
# period : # Unit seconds, sync period. Default fetch every seconds.
# nameSpace: /default
# hostPort: localhost:
# #Retry Policy
# baseSleepTimeMs: # initial amount of time to wait between retries
# maxRetries: # max number of times to retry
# etcd:
# period : # Unit seconds, sync period. Default fetch every seconds.
# group : 'skywalking'
# serverAddr: localhost:
# clusterName: "default"
# consul:
# # Consul host and ports, separated by comma, e.g. 1.2.3.4:,2.3.4.5:
# hostAndPorts: ${consul.address}
# # Sync period in seconds. Defaults to seconds.
# period: #exporter:
# grpc:
# targetHost: ${SW_EXPORTER_GRPC_HOST:127.0.0.1}
# targetPort: ${SW_EXPORTER_GRPC_PORT:}

webapp/webapp.yml

server:
port: collector:
path: /graphql
ribbon:
ReadTimeout:
# Point to all backend's restHost:restPort, split by ,
listOfServers: 127.0.0.1:

agent/config/agent.config

# The agent namespace
# agent.namespace=${SW_AGENT_NAMESPACE:default-namespace} # The service name in UI
agent.service_name=${SW_AGENT_NAME:Your_ApplicationName} # The number of sampled traces per seconds
# Negative number means sample traces as many as possible, most likely %
# agent.sample_n_per_3_secs=${SW_AGENT_SAMPLE:-} # Authentication active is based on backend setting, see application.yml for more details.
# agent.authentication = ${SW_AGENT_AUTHENTICATION:xxxx} # The max amount of spans in a single segment.
# Through this config item, skywalking keep your application memory cost estimated.
# agent.span_limit_per_segment=${SW_AGENT_SPAN_LIMIT:} # Ignore the segments if their operation names end with these suffix.
# agent.ignore_suffix=${SW_AGENT_IGNORE_SUFFIX:.jpg,.jpeg,.js,.css,.png,.bmp,.gif,.ico,.mp3,.mp4,.html,.svg} # If true, skywalking agent will save all instrumented classes files in `/debugging` folder.
# Skywalking team may ask for these files in order to resolve compatible problem.
# agent.is_open_debugging_class = ${SW_AGENT_OPEN_DEBUG:true} # The operationName max length
# agent.operation_name_threshold=${SW_AGENT_OPERATION_NAME_THRESHOLD:} # Backend service addresses.
collector.backend_service=${SW_AGENT_COLLECTOR_BACKEND_SERVICES:127.0.0.1:} # Logging file_name
logging.file_name=${SW_LOGGING_FILE_NAME:skywalking-api.log} # Logging level
logging.level=${SW_LOGGING_LEVEL:DEBUG} # Logging dir
# logging.dir=${SW_LOGGING_DIR:""} # Logging max_file_size, default: * * =
# logging.max_file_size=${SW_LOGGING_MAX_FILE_SIZE:} # The max history log files. When rollover happened, if log files exceed this number,
# then the oldest file will be delete. Negative or zero means off, by default.
# logging.max_history_files=${SW_LOGGING_MAX_HISTORY_FILES:-} # mysql plugin configuration
# plugin.mysql.trace_sql_parameters=${SW_MYSQL_TRACE_SQL_PARAMETERS:false}

4. SkyWalking 架构

  来自官网的图片,感受一下!无须细说,大概原理就是: 针对各种不同客户端实现不同的指标采集,统一通过grpc/http发送到apm服务端,然后经过分析引擎后存储到es/h2/mysql等等存储系统,最后由前端通过查询引擎进行展现。

5. 可以用来干啥

  发现系统耗时或者说瓶颈在哪里。

  发现各系统之间的调用关系。

  监控服务异常。

  排查系统故障。

6. 其他存储系统接入

  h2只是一个内存存储系统,其目的是为了让你能够快速验证快速响应,它还没有强大到足以支撑线上系统运行。

  所以,线上一定得选用某个更可靠存储。

  一般地,ES会是个不错的选择,一来它以搜索速度著称而这正好符合后台查询的需求,二来es是分布式存储,可以避免一定的大数据量问题。

  mysql: 一般地对普通开发同学友好,且单机mysql容易搭建。

  tidb: 与mysql协议完全兼容,分布式存储。

  配置方法如demo所示。。。

分布式应用监控: SkyWalking 快速接入实践的更多相关文章

  1. 分布式应用监控:SkyWalking 快速接入实践

    分布式应用,会存在各种问题.而要解决这些难题,除了要应用自己做一些监控埋点外,还应该有一些外围的系统进行主动探测,主动发现. APM工具就是干这活的,SkyWalking 是国人开源的一款优秀的APM ...

  2. incubator-dolphinscheduler 如何在不写任何新代码的情况下,能快速接入到prometheus和grafana中进行监控

    一.prometheus和grafana 简介 prometheus是由谷歌研发的一款开源的监控软件,目前已经贡献给了apache 基金会托管. 监控通常分为白盒监控和黑盒监控之分. 白盒监控:通过监 ...

  3. 微信热修复tinker及tinker server快速接入

    博客: 安卓之家 掘金: jp1017 微博: 追风917 CSDN: 蒋朋的家 简书: 追风917 当前热修复方案很多,今天研究了下微信的tinker,使用效果还是不错的,配合tinker serv ...

  4. P4语言编程快速开始 实践二

    参考:P4语言编程快速开始 上一篇系列博客:P4语言编程快速开始 实践二 Demo 2 本Demo所做的修改及实现的功能: 为simple_router添加一个计数器(counter),该计数器附加( ...

  5. 微信小程序之快速接入七牛云

    小程序为什么要接入云? 目前,开发者在开发小程序过程中,主要遇到以下几个问题: 小程序发布大小超限 微信官方限制小程序的发布代码不能超过 1MB,而在实际开发过程中,一般的小程序难免会有图片等富媒体文 ...

  6. 如何让微信小程序快速接入七牛云

    如果你确定用七牛运行小程序的话,给大家分享一个九折优惠码:61d1fd4d1 月 9 日 微信小程序正式发布,小程序终于揭开了它神秘的面纱,开发者对小程序的追捧更是热度不减.从小程序的热门应用场景来看 ...

  7. 如何接入银联“快速接入”产品API

    引言:使用银联开放平台的用户或多或少都接触过产品API吧,那么大家对于“快速接入”产品API是否还会存在一些疑问呢?因为我之前对“快速接入”模糊不清,所以整理的一份详细的资料,里面梳理了“快速接入”产 ...

  8. Kubernetes集群的监控报警策略最佳实践

    版权声明:本文为博主原创文章,未经博主同意不得转载. https://blog.csdn.net/M2l0ZgSsVc7r69eFdTj/article/details/79652064 本文为Kub ...

  9. 快速接入PHP微信支付

    微信支付是微信开发中坑最多的一个功能,本文旨在帮助有开发基础的人快速接入微信支付,如果要详细了解微信支付,请看微信支付的开发文档. 再说把开发文档搬到这里来就没必要了.想要快速跑通微信支付的可以继续查 ...

随机推荐

  1. Sockit 硬件接口编程——点亮一个LED

    1.话不多说上代码 #include <stdio.h> #include <stdlib.h> #include <string.h> #include < ...

  2. Python之工作方向

    "python基础-->(函数/面向对象/网络编程(scoket套接字)/并发编程(mutiprocessing)) "运维+web开发-->页面展示(django/f ...

  3. 《Java Spring框架》Spring切面(AOP)配置详解

    1.  Spring 基本概念 AOP(Aspect Oriented Programming)称为面向切面编程,在程序开发中主要用来解决一些系统层面上的问题,比如日志,事务,权限等待,Struts2 ...

  4. c语言入门到精通怎么能少了这7本书籍?

    C语言作为学编程最好的入门语言,对一个初进程序大门的小白来说是很有帮助的,学习编程能培养一个人的逻辑思维,而C语言则是公认的最符合人们对程序的认知的一款计算机语言,很多大学都选择了使用C语言作为大学生 ...

  5. 小白的springboot之路(九)、集成MongoDB

    0.前言 MongoDB是一个高性能.开源的文档型数据库,是当前nosql数据库中最热门的一种,在企业中广泛应用:虽然前段时间更改了开源协议导致被很多企业舍弃,但主要是对云服务商影响较大,对我们来说其 ...

  6. MESSAGE_TYPE_X dump in RSM_DATASTATE_CHECK -6-

    DTP抽数时系统Dump 参考sapnote:2398760 - MESSAGE_TYPE_X dump in RSM_DATASTATE_CHECK -1- to -12- RSM_DATASTAT ...

  7. 安卓JNI精细化讲解,让你彻底了解JNI(二):用法解析

    目录 用法解析 ├── 1.JNI函数 │ ├── 1.1.extern "C" │ ├── 1.2.JNIEXPORT.JNICALL │ ├── 1.3.函数名 │ ├── 1 ...

  8. python基础之字符串讲解(下)

    7.swapspace 这个命令是让大小写翻转 s = 'qwerQ' s3 = s.swapcase() print(s3) 8.title 每个隔开(特殊字符或者数字)的单词首字母大写 s = ' ...

  9. asp.net core web应用以服务的方式安装运行

    目录 一.方案:使用Microsoft.Extensions.Hosting.WindowsServices实现: 一.方案:使用Microsoft.Extensions.Hosting.Window ...

  10. asp.net core中使用cookie身份验证

    配置 在 Startup.ConfigureServices 方法中,创建具有 AddAuthentication 和 AddCookie 方法的身份验证中间件服务: services.AddAuth ...