Metrics in Flink are organized by MetricGroup.

MetricGroup

  1. MetricGroup

This is simply a container for metrics: it can hold subgroups as well as metrics of any kind.

So its main interface methods are for registration:

    /**
     * A MetricGroup is a named container for {@link Metric Metrics} and further metric subgroups.
     *
     * <p>Instances of this class can be used to register new metrics with Flink and to create a nested
     * hierarchy based on the group names.
     *
     * <p>A MetricGroup is uniquely identified by its place in the hierarchy and name.
     */
    public interface MetricGroup {
        <C extends Counter> C counter(String name, C counter);
        <T, G extends Gauge<T>> G gauge(String name, G gauge);
        <H extends Histogram> H histogram(String name, H histogram);
        MetricGroup addGroup(String name);
    }
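
To make the container idea concrete, here is a minimal, self-contained sketch. It does not use Flink's classes: `SimpleGroup` and `SimpleCounter` are hypothetical stand-ins that only mimic the shape of `addGroup` and `counter` from the interface above, and the "reuse an existing subgroup" behavior is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a counter metric; not the real Flink class.
class SimpleCounter {
    private long count;

    void inc() { count++; }

    long getCount() { return count; }
}

// Hypothetical stand-in for a metric group: a named container of counters and subgroups.
class SimpleGroup {
    private final String name;
    private final Map<String, SimpleCounter> counters = new HashMap<>();
    private final Map<String, SimpleGroup> groups = new HashMap<>();

    SimpleGroup(String name) { this.name = name; }

    // Stores the counter under the given name (the first registration wins)
    // and hands the argument back so the caller can keep using it.
    <C extends SimpleCounter> C counter(String name, C counter) {
        counters.putIfAbsent(name, counter);
        return counter;
    }

    // Returns the existing subgroup with this name, or creates a new one.
    SimpleGroup addGroup(String name) {
        return groups.computeIfAbsent(name, SimpleGroup::new);
    }

    String getName() { return name; }

    int counterCount() { return counters.size(); }
}
```

With these stand-ins, `new SimpleGroup("taskmanager").addGroup("Status").addGroup("JVM")` builds a three-level hierarchy, and a counter registered on the innermost group is incremented through the reference that `counter` returns.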

 

  1. AbstractMetricGroup

The key point is that it implements MetricGroup. The logic is simple: both registration and close must take a lock for mutual exclusion.

    /**
     * Abstract {@link MetricGroup} that contains key functionality for adding metrics and groups.
     */
    public abstract class AbstractMetricGroup implements MetricGroup {

        /** The registry that this metrics group belongs to */
        protected final MetricRegistry registry;

        /** All metrics that are directly contained in this group */
        private final Map<String, Metric> metrics = new HashMap<>();

        /** All metric subgroups of this group */
        private final Map<String, AbstractMetricGroup> groups = new HashMap<>();

        /** The metrics scope represented by this group.
         *  For example ["host-7", "taskmanager-2", "window_word_count", "my-mapper" ]. */
        private final String[] scopeComponents; // the namespace

        /** The metrics scope represented by this group, as a concatenated string, lazily computed.
         *  For example: "host-7.taskmanager-2.window_word_count.my-mapper" */
        private String scopeString;

        // (the constructor, the 'closed' flag, and the remaining methods are omitted from this excerpt)

        @Override
        public <C extends Counter> C counter(String name, C counter) {
            addMetric(name, counter);
            return counter;
        }

        /**
         * Adds the given metric to the group and registers it at the registry, if the group
         * is not yet closed, and if no metric with the same name has been registered before.
         *
         * @param name the name to register the metric under
         * @param metric the metric to register
         */
        protected void addMetric(String name, Metric metric) {
            // add the metric only if the group is still open
            synchronized (this) { // take the lock
                if (!closed) {
                    // immediately put without a 'contains' check to optimize the common case (no collision)
                    // collisions are resolved later
                    Metric prior = metrics.put(name, metric);

                    // check for collisions with other metric names
                    if (prior == null) {
                        // no other metric with this name yet
                        registry.register(metric, name, this);
                    }
                    else {
                        // we had a collision. put back the original value
                        metrics.put(name, prior);
                    }
                }
            }
        }
    }
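
The "put first, restore on collision" pattern used by addMetric can be tried in isolation. The sketch below is not Flink code: `FirstWinsRegistry` is a hypothetical class, metrics are plain `Object`s, and the call to the registry is modelled by a simple counter.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the "put, then restore on collision" pattern.
class FirstWinsRegistry {
    private final Map<String, Object> metrics = new HashMap<>();
    private boolean closed;
    private int registrations; // how many metrics were actually registered

    // Returns true if the metric was registered, false if the group is
    // closed or the name is already taken.
    synchronized boolean add(String name, Object metric) {
        if (closed) {
            return false;
        }
        // Optimistic put: in the common case there is no prior entry,
        // so a single map operation is enough.
        Object prior = metrics.put(name, metric);
        if (prior == null) {
            registrations++;
            return true;
        }
        // Collision: put the original value back so the first registration wins.
        metrics.put(name, prior);
        return false;
    }

    synchronized void close() { closed = true; }

    synchronized Object get(String name) { return metrics.get(name); }

    synchronized int registrations() { return registrations; }
}
```

The trade-off is that a collision costs two map writes, while the far more common collision-free case avoids a separate `containsKey` lookup.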

 

MetricReporter

Metrics that have been collected can only be sent out through a reporter:

    /**
     * Reporters are used to export {@link Metric Metrics} to an external backend.
     *
     * <p>Reporters are instantiated via reflection and must be public, non-abstract, and have a
     * public no-argument constructor.
     */
    public interface MetricReporter {

        // ------------------------------------------------------------------------
        //  life cycle
        // ------------------------------------------------------------------------

        /**
         * Configures this reporter. Since reporters are instantiated generically and hence parameter-less,
         * this method is the place where the reporters set their basic fields based on configuration values.
         *
         * <p>This method is always called first on a newly instantiated reporter.
         *
         * @param config The configuration with all parameters.
         */
        void open(MetricConfig config);

        /**
         * Closes this reporter. Should be used to close channels, streams and release resources.
         */
        void close();

        void notifyOfAddedMetric(Metric metric, String metricName, MetricGroup group);
        void notifyOfRemovedMetric(Metric metric, String metricName, MetricGroup group);
    }

 

AbstractReporter implements the MetricReporter interface:

    /**
     * Base interface for custom metric reporters.
     */
    public abstract class AbstractReporter implements MetricReporter, CharacterFilter {
        protected final Logger log = LoggerFactory.getLogger(getClass());

        protected final Map<Gauge<?>, String> gauges = new HashMap<>();
        protected final Map<Counter, String> counters = new HashMap<>();
        protected final Map<Histogram, String> histograms = new HashMap<>();

        @Override
        public void notifyOfAddedMetric(Metric metric, String metricName, MetricGroup group) {
            // the group is only used to obtain the metric's fully qualified name
            final String name = group.getMetricIdentifier(metricName, this);

            synchronized (this) {
                if (metric instanceof Counter) {
                    counters.put((Counter) metric, name);
                } else if (metric instanceof Gauge) {
                    gauges.put((Gauge<?>) metric, name);
                } else if (metric instanceof Histogram) {
                    histograms.put((Histogram) metric, name);
                } else {
                    log.warn("Cannot add unknown metric type {}. This indicates that the reporter " +
                        "does not support this metric type.", metric.getClass().getName());
                }
            }
        }

        @Override
        public void notifyOfRemovedMetric(Metric metric, String metricName, MetricGroup group) {
            synchronized (this) {
                if (metric instanceof Counter) {
                    counters.remove(metric);
                } else if (metric instanceof Gauge) {
                    gauges.remove(metric);
                } else if (metric instanceof Histogram) {
                    histograms.remove(metric);
                } else {
                    log.warn("Cannot remove unknown metric type {}. This indicates that the reporter " +
                        "does not support this metric type.", metric.getClass().getName());
                }
            }
        }
    }
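
AbstractReporter only maintains the metric-to-name maps; a concrete reporter still has to walk those maps and emit the values. As a rough sketch of that last step, assuming a hypothetical `ConsoleGaugeReporter` in which `Supplier<?>` stands in for Flink's Gauge and the full name is passed in directly instead of being derived from a group:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical reporter sketch; Supplier<?> stands in for a gauge metric.
class ConsoleGaugeReporter {
    // Metric object -> fully qualified name, mirroring the map layout shown above.
    private final Map<Supplier<?>, String> gauges = new HashMap<>();

    synchronized void notifyOfAddedMetric(Supplier<?> gauge, String fullName) {
        gauges.put(gauge, fullName);
    }

    synchronized void notifyOfRemovedMetric(Supplier<?> gauge) {
        gauges.remove(gauge);
    }

    // Builds one "name=value" line per tracked gauge. Values are read at
    // report time, not at registration time.
    synchronized String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Supplier<?>, String> entry : gauges.entrySet()) {
            sb.append(entry.getValue())
              .append('=')
              .append(entry.getKey().get())
              .append('\n');
        }
        return sb.toString();
    }
}
```

Keying the map by the metric object rather than by name is what lets removal work without recomputing the fully qualified name.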

 

MetricRegistry

MetricRegistry connects MetricGroups and MetricReporters.

It adds every metric that needs to be reported to the MetricReporters, and starts the thread that reports periodically.

    /**
     * A MetricRegistry keeps track of all registered {@link Metric Metrics}. It serves as the
     * connection between {@link MetricGroup MetricGroups} and {@link MetricReporter MetricReporters}.
     */
    public class MetricRegistry {

        private List<MetricReporter> reporters;
        private ScheduledExecutorService executor;

        private final ScopeFormats scopeFormats;

        private final char delimiter;

        /**
         * Creates a new MetricRegistry and starts the configured reporter.
         */
        public MetricRegistry(Configuration config) {
            // first parse the scope formats, these are needed for all reporters
            ScopeFormats scopeFormats;
            try {
                // read the scope format from the config, i.e. the format of the metrics namespace
                scopeFormats = createScopeConfig(config);
            }
            catch (Exception e) {
                LOG.warn("Failed to parse scope format, using default scope formats", e);
                scopeFormats = new ScopeFormats();
            }
            this.scopeFormats = scopeFormats;

            char delim;
            try {
                // read the delimiter from the config
                delim = config.getString(ConfigConstants.METRICS_SCOPE_DELIMITER, ".").charAt(0);
            } catch (Exception e) {
                LOG.warn("Failed to parse delimiter, using default delimiter.", e);
                delim = '.';
            }
            this.delimiter = delim;

            // second, instantiate any custom configured reporters
            this.reporters = new ArrayList<>();

            // read the list of configured reporters
            final String definedReporters = config.getString(ConfigConstants.METRICS_REPORTERS_LIST, null);

            if (definedReporters == null) {
                // no reporters defined
                // by default, don't report anything
                LOG.info("No metrics reporter configured, no metrics will be exposed/reported.");
                this.executor = null;
            } else {
                // we have some reporters so
                String[] namedReporters = definedReporters.split("\\s*,\\s*");
                for (String namedReporter : namedReporters) { // for each configured reporter

                    DelegatingConfiguration reporterConfig = new DelegatingConfiguration(config, ConfigConstants.METRICS_REPORTER_PREFIX + namedReporter + ".");
                    // the configured reporter class name
                    final String className = reporterConfig.getString(ConfigConstants.METRICS_REPORTER_CLASS_SUFFIX, null);

                    try {
                        // the configured report interval
                        String configuredPeriod = reporterConfig.getString(ConfigConstants.METRICS_REPORTER_INTERVAL_SUFFIX, null);
                        TimeUnit timeunit = TimeUnit.SECONDS;
                        long period = 10;

                        if (configuredPeriod != null) {
                            try {
                                String[] interval = configuredPeriod.split(" ");
                                period = Long.parseLong(interval[0]);
                                timeunit = TimeUnit.valueOf(interval[1]);
                            }
                            catch (Exception e) {
                                LOG.error("Cannot parse report interval from config: " + configuredPeriod +
                                    " - please use values like '10 SECONDS' or '500 MILLISECONDS'. " +
                                    "Using default reporting interval.");
                            }
                        }

                        Class<?> reporterClass = Class.forName(className);
                        MetricReporter reporterInstance = (MetricReporter) reporterClass.newInstance(); // instantiate the reporter

                        MetricConfig metricConfig = new MetricConfig();
                        reporterConfig.addAllToProperties(metricConfig);
                        reporterInstance.open(metricConfig); // open the reporter

                        if (reporterInstance instanceof Scheduled) {
                            if (this.executor == null) {
                                executor = Executors.newSingleThreadScheduledExecutor(); // create the executor
                            }
                            LOG.info("Periodically reporting metrics in intervals of {} {} for reporter {} of type {}.", period, timeunit.name(), namedReporter, className);

                            // schedule the periodic report
                            executor.scheduleWithFixedDelay(
                                new ReporterTask((Scheduled) reporterInstance), period, period, timeunit);
                        }
                        reporters.add(reporterInstance); // add it to the list of reporters
                    }
                    catch (Throwable t) {
                        shutdownExecutor();
                        LOG.error("Could not instantiate metrics reporter" + namedReporter + ". Metrics might not be exposed/reported.", t);
                    }
                }
            }
        }

        // ------------------------------------------------------------------------
        //  Metrics (de)registration
        // ------------------------------------------------------------------------

        /**
         * Registers a new {@link Metric} with this registry.
         *
         * @param metric the metric that was added
         * @param metricName the name of the metric
         * @param group the group that contains the metric
         */
        // Called from AbstractMetricGroup.addMetric: when a metric is added to a group,
        // it is added to the reporters at the same time.
        public void register(Metric metric, String metricName, MetricGroup group) {
            try {
                if (reporters != null) {
                    for (MetricReporter reporter : reporters) {
                        if (reporter != null) {
                            reporter.notifyOfAddedMetric(metric, metricName, group); // add the metric to every reporter
                        }
                    }
                }
            } catch (Exception e) {
                LOG.error("Error while registering metric.", e);
            }
        }

        /**
         * Un-registers the given {@link org.apache.flink.metrics.Metric} with this registry.
         *
         * @param metric the metric that should be removed
         * @param metricName the name of the metric
         * @param group the group that contains the metric
         */
        public void unregister(Metric metric, String metricName, MetricGroup group) {
            try {
                if (reporters != null) {
                    for (MetricReporter reporter : reporters) {
                        if (reporter != null) {
                            reporter.notifyOfRemovedMetric(metric, metricName, group);
                        }
                    }
                }
            } catch (Exception e) {
                LOG.error("Error while registering metric.", e);
            }
        }

        // ------------------------------------------------------------------------

        /**
         * This task is explicitly a static class, so that it does not hold any references to the enclosing
         * MetricsRegistry instance.
         *
         * This is a subtle difference, but very important: With this static class, the enclosing class instance
         * may become garbage-collectible, whereas with an anonymous inner class, the timer thread
         * (which is a GC root) will hold a reference via the timer task and its enclosing instance pointer.
         * Making the MetricsRegistry garbage collectible makes the java.util.Timer garbage collectible,
         * which acts as a fail-safe to stop the timer thread and prevents resource leaks.
         */
        private static final class ReporterTask extends TimerTask {

            private final Scheduled reporter;

            private ReporterTask(Scheduled reporter) {
                this.reporter = reporter;
            }

            @Override
            public void run() {
                try {
                    reporter.report(); // the core of the task is simply to call reporter.report()
                } catch (Throwable t) {
                    LOG.warn("Error while reporting metrics", t);
                }
            }
        }
    }
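
The interval parsing in the constructor above accepts values such as "10 SECONDS" or "500 MILLISECONDS". The sketch below isolates that step in a hypothetical helper class, `ReportInterval`, which is not part of Flink. One deliberate difference: in the constructor above, a value like "500 MILLIS" updates `period` to 500 before `TimeUnit.valueOf` fails, leaving a mix of the configured period and the default unit; this sketch falls back to the complete default instead.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper that isolates the "<number> <TimeUnit>" parsing step.
class ReportInterval {
    final long period;
    final TimeUnit unit;

    ReportInterval(long period, TimeUnit unit) {
        this.period = period;
        this.unit = unit;
    }

    // Parses values such as "10 SECONDS". Falls back to 10 seconds as a whole
    // when the value is missing or malformed.
    static ReportInterval parse(String configured) {
        if (configured != null) {
            try {
                String[] parts = configured.split(" ");
                long period = Long.parseLong(parts[0]);
                TimeUnit unit = TimeUnit.valueOf(parts[1]);
                return new ReportInterval(period, unit);
            } catch (RuntimeException e) {
                // Covers a non-numeric period, an unknown unit name, and a
                // missing second token; fall through to the default.
            }
        }
        return new ReportInterval(10, TimeUnit.SECONDS);
    }
}
```

The parsed pair can then be passed to `ScheduledExecutorService.scheduleWithFixedDelay(task, period, period, unit)`, as the constructor does.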

 

TaskManager

In TaskManager:

  1. associateWithJobManager
    metricsRegistry = new FlinkMetricRegistry(config.configuration)

    taskManagerMetricGroup =
      new TaskManagerMetricGroup(metricsRegistry, this.runtimeInfo.getHostname, id.toString)

    TaskManager.instantiateStatusMetrics(taskManagerMetricGroup)

This creates the metricsRegistry and the TaskManagerMetricGroup.

As you can see, instantiateStatusMetrics only registers the various TaskManager status metrics:

    private def instantiateStatusMetrics(taskManagerMetricGroup: MetricGroup) : Unit = {
      val jvm = taskManagerMetricGroup
        .addGroup("Status")
        .addGroup("JVM")

      instantiateClassLoaderMetrics(jvm.addGroup("ClassLoader"))
      instantiateGarbageCollectorMetrics(jvm.addGroup("GarbageCollector"))
      instantiateMemoryMetrics(jvm.addGroup("Memory"))
      instantiateThreadMetrics(jvm.addGroup("Threads"))
      instantiateCPUMetrics(jvm.addGroup("CPU"))
    }

    private def instantiateClassLoaderMetrics(metrics: MetricGroup) {
      // ManagementFactory provides the MXBeans that expose the JVM metrics
      val mxBean = ManagementFactory.getClassLoadingMXBean

      metrics.gauge[Long, FlinkGauge[Long]]("ClassesLoaded", new FlinkGauge[Long] {
        override def getValue: Long = mxBean.getTotalLoadedClassCount
      })
      metrics.gauge[Long, FlinkGauge[Long]]("ClassesUnloaded", new FlinkGauge[Long] {
        override def getValue: Long = mxBean.getUnloadedClassCount
      })
    }
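
The same MXBean lookup can be written in plain Java. This is a sketch rather than Flink code: `JvmClassLoaderGauges` is a hypothetical class and `Supplier<Long>` stands in for Flink's `Gauge<Long>`; only `ManagementFactory.getClassLoadingMXBean()` and the two getters are real JDK calls.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: builds gauge-like suppliers backed by the JVM's class loading MXBean.
class JvmClassLoaderGauges {
    static Map<String, Supplier<Long>> create() {
        ClassLoadingMXBean mxBean = ManagementFactory.getClassLoadingMXBean();

        // Each supplier queries the MXBean when it is polled, so the values
        // stay current without re-registering anything.
        Map<String, Supplier<Long>> gauges = new LinkedHashMap<>();
        gauges.put("ClassesLoaded", mxBean::getTotalLoadedClassCount);
        gauges.put("ClassesUnloaded", mxBean::getUnloadedClassCount);
        return gauges;
    }
}
```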

 

When submitTask is called:

  1. submitTask
    val taskMetricGroup = taskManagerMetricGroup.addTaskForJob(tdd)

    val task = new Task(
      tdd,
      memoryManager,
      ioManager,
      network,
      bcVarManager,
      selfGateway,
      jobManagerGateway,
      config.timeout,
      libCache,
      fileCache,
      runtimeInfo,
      taskMetricGroup)

You can see that a taskMetricGroup is created for each task

and passed in when the Task object is constructed:

    Environment env = new RuntimeEnvironment(jobId, vertexId, executionId,
            executionConfig, taskInfo, jobConfiguration, taskConfiguration,
            userCodeClassLoader, memoryManager, ioManager,
            broadcastVariableManager, accumulatorRegistry,
            splitProvider, distributedCacheEntries,
            writers, inputGates, jobManager, taskManagerConfig, metrics, this);

    // let the task code create its readers and writers
    invokable.setEnvironment(env);

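In Task, the key step is to put this taskMetricGroup into the RuntimeEnvironment, so that the actual processing logic can obtain the metrics through the RuntimeEnvironment.

StreamTask is one kind of invokable; its base class AbstractInvokable is defined as follows: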

    public abstract class AbstractInvokable {

        /** The environment assigned to this invokable. */
        private Environment environment;

        /**
         * Starts the execution.
         *
         * <p>Must be overwritten by the concrete task implementation. This method
         * is called by the task manager when the actual execution of the task
         * starts.
         *
         * <p>All resources should be cleaned up when the method returns. Make sure
         * to guard the code with <code>try-finally</code> blocks where necessary.
         *
         * @throws Exception
         *         Tasks may forward their exceptions for the TaskManager to handle through failure/recovery.
         */
        public abstract void invoke() throws Exception;

        /**
         * Sets the environment of this task.
         *
         * @param environment
         *        the environment of this task
         */
        public final void setEnvironment(Environment environment) {
            this.environment = environment;
        }

        /**
         * Returns the environment of this task.
         *
         * @return The environment of this task.
         */
        public Environment getEnvironment() {
            return this.environment;
        }
    }

 

So inside StreamTask, metrics can be used like this:

    getEnvironment().getMetricGroup().gauge("lastCheckpointSize", new Gauge<Long>() {
        @Override
        public Long getValue() {
            return StreamTask.this.lastCheckpointSize;
        }
    });
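
Note that the gauge above does not copy the value at registration time; it reads the field every time it is polled. A minimal sketch of that behavior, assuming a hypothetical `CheckpointingTask` class in which `Supplier<Long>` stands in for `Gauge<Long>`:

```java
import java.util.function.Supplier;

// Hypothetical task sketch; not the real StreamTask.
class CheckpointingTask {
    // volatile so that a reporter thread sees updates made by the task thread
    private volatile long lastCheckpointSize;

    // The gauge captures the task instance and reads the field on each call.
    final Supplier<Long> lastCheckpointSizeGauge = () -> lastCheckpointSize;

    void completeCheckpoint(long sizeInBytes) {
        lastCheckpointSize = sizeInBytes;
    }
}
```

Polling `lastCheckpointSizeGauge.get()` before and after `completeCheckpoint(128)` yields 0 and then 128, without registering the gauge again.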
