Metrics are organized into MetricGroups.

MetricGroup

A MetricGroup is simply a container for metrics: it can hold subgroups as well as metrics of any kind.

Its main interface is therefore registration:

/**
* A MetricGroup is a named container for {@link Metric Metrics} and further metric subgroups.
*
* <p>Instances of this class can be used to register new metrics with Flink and to create a nested
* hierarchy based on the group names.
*
* <p>A MetricGroup is uniquely identified by its place in the hierarchy and name.
*/
public interface MetricGroup {
    <C extends Counter> C counter(String name, C counter);
    <T, G extends Gauge<T>> G gauge(String name, G gauge);
    <H extends Histogram> H histogram(String name, H histogram);
    MetricGroup addGroup(String name);
}
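To make the container idea concrete, here is a minimal sketch of a nested group that registers gauges and derives a dotted scope from its position in the hierarchy. `SketchGroup` and all of its methods are hypothetical stand-ins written for illustration, not Flink's classes, and a plain `Supplier` plays the role of a gauge.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a metric group; not Flink's implementation.
final class SketchGroup {
    private final String name;
    private final SketchGroup parent;
    private final Map<String, SketchGroup> groups = new LinkedHashMap<>();
    private final Map<String, Supplier<?>> gauges = new LinkedHashMap<>();

    SketchGroup(String name) {
        this(name, null);
    }

    private SketchGroup(String name, SketchGroup parent) {
        this.name = name;
        this.parent = parent;
    }

    // Returns the subgroup with this name, creating it on first use.
    SketchGroup addGroup(String childName) {
        return groups.computeIfAbsent(childName, n -> new SketchGroup(n, this));
    }

    // Registers the gauge (first registration wins) and hands it back to the caller.
    <T> Supplier<T> gauge(String metricName, Supplier<T> gauge) {
        gauges.putIfAbsent(metricName, gauge);
        return gauge;
    }

    // Dotted path from the root group down to this group.
    String scope() {
        return parent == null ? name : parent.scope() + "." + name;
    }

    // Fully qualified name of a metric registered in this group.
    String identifier(String metricName) {
        return scope() + "." + metricName;
    }
}
```

Chaining `addGroup` calls builds the hierarchy one level at a time, which is the same calling style the TaskManager code uses further down for its `Status` and `JVM` groups.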

 

AbstractMetricGroup

Its main job is to implement MetricGroup. The logic is simple: both registration and close take a lock for mutual exclusion.

/**
 * Abstract {@link MetricGroup} that contains key functionality for adding metrics and groups.
 */
public abstract class AbstractMetricGroup implements MetricGroup {

    /** The registry that this metrics group belongs to */
    protected final MetricRegistry registry;

    /** All metrics that are directly contained in this group */
    private final Map<String, Metric> metrics = new HashMap<>();

    /** All metric subgroups of this group */
    private final Map<String, AbstractMetricGroup> groups = new HashMap<>();

    /** The metrics scope represented by this group.
     *  For example ["host-7", "taskmanager-2", "window_word_count", "my-mapper" ]. */
    private final String[] scopeComponents; // the namespace

    /** The metrics scope represented by this group, as a concatenated string, lazily computed.
     *  For example: "host-7.taskmanager-2.window_word_count.my-mapper" */
    private String scopeString;

    /** Flag indicating whether this group has been closed */
    private volatile boolean closed;

    // (constructor and remaining methods omitted)

    @Override
    public <C extends Counter> C counter(String name, C counter) {
        addMetric(name, counter);
        return counter;
    }

    /**
     * Adds the given metric to the group and registers it at the registry, if the group
     * is not yet closed, and if no metric with the same name has been registered before.
     *
     * @param name the name to register the metric under
     * @param metric the metric to register
     */
    protected void addMetric(String name, Metric metric) {
        // add the metric only if the group is still open
        synchronized (this) { // take the lock
            if (!closed) {
                // immediately put without a 'contains' check to optimize the common case (no collision)
                // collisions are resolved later
                Metric prior = metrics.put(name, metric);

                // check for collisions with other metric names
                if (prior == null) {
                    // no other metric with this name yet
                    registry.register(metric, name, this);
                }
                else {
                    // we had a collision. put back the original value
                    metrics.put(name, prior);
                }
            }
        }
    }
}
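The put-first, restore-on-collision pattern in `addMetric` can be isolated in a few lines. The sketch below is a hypothetical class (`FirstWinsRegistry` is not a Flink type) that stores plain objects instead of metrics; it only illustrates that the first registration under a name wins and that a closed group accepts nothing.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the "put, then undo on collision" pattern.
final class FirstWinsRegistry {
    private final Map<String, Object> metrics = new HashMap<>();
    private boolean closed;

    // Returns true only if the metric was newly registered under this name.
    synchronized boolean add(String name, Object metric) {
        if (closed) {
            return false;
        }
        // Optimistic put: a single map operation in the common, collision-free case.
        Object prior = metrics.put(name, metric);
        if (prior == null) {
            return true;
        }
        // Collision: restore the metric that was registered first.
        metrics.put(name, prior);
        return false;
    }

    synchronized Object get(String name) {
        return metrics.get(name);
    }

    synchronized void close() {
        closed = true;
        metrics.clear();
    }
}
```

The optimistic `put` costs one map operation when names are unique and two when they collide, which favors the common case the source comment mentions.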

 

MetricReporter

Once metrics have been collected, a reporter is needed to send them out:

/**
* Reporters are used to export {@link Metric Metrics} to an external backend.
*
* <p>Reporters are instantiated via reflection and must be public, non-abstract, and have a
* public no-argument constructor.
*/
public interface MetricReporter {

    // ------------------------------------------------------------------------
    //  life cycle
    // ------------------------------------------------------------------------

    /**
     * Configures this reporter. Since reporters are instantiated generically and hence parameter-less,
     * this method is the place where the reporters set their basic fields based on configuration values.
     *
     * <p>This method is always called first on a newly instantiated reporter.
     *
     * @param config The configuration with all parameters.
     */
    void open(MetricConfig config);

    /**
     * Closes this reporter. Should be used to close channels, streams and release resources.
     */
    void close();

    void notifyOfAddedMetric(Metric metric, String metricName, MetricGroup group);
    void notifyOfRemovedMetric(Metric metric, String metricName, MetricGroup group);
}
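The requirement of a no-argument constructor follows from how reporters are created: by class name, through reflection, with configuration handed over afterwards in `open`. A minimal sketch of that lifecycle follows. All three types are hypothetical, a plain `java.util.Properties` stands in for the configuration object, and the classes are package-private only to keep the example in one file (the source requires real reporters to be public).

```java
import java.util.Properties;

// Hypothetical reporter lifecycle, reduced to open/close.
interface SketchLifecycleReporter {
    void open(Properties config);
    void close();
}

// Hypothetical reporter that reads one setting when opened.
final class EchoReporter implements SketchLifecycleReporter {
    String prefix = "";
    boolean opened;

    EchoReporter() {
        // No-argument constructor: everything configurable arrives via open().
    }

    @Override
    public void open(Properties config) {
        prefix = config.getProperty("prefix", "");
        opened = true;
    }

    @Override
    public void close() {
        opened = false;
    }
}

final class ReporterLoader {
    // Instantiates the named class reflectively, then configures it via open().
    static SketchLifecycleReporter load(String className, Properties config) {
        try {
            Class<?> clazz = Class.forName(className);
            SketchLifecycleReporter reporter =
                (SketchLifecycleReporter) clazz.getDeclaredConstructor().newInstance();
            reporter.open(config);
            return reporter;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not instantiate reporter " + className, e);
        }
    }
}
```

Because the loader knows nothing about the concrete class, `open` is the only place a reporter can pick up its settings, which is why the Javadoc above says it is always called first.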

 

AbstractReporter implements the MetricReporter interface:

/**
* Base interface for custom metric reporters.
*/
public abstract class AbstractReporter implements MetricReporter, CharacterFilter {
    protected final Logger log = LoggerFactory.getLogger(getClass());

    protected final Map<Gauge<?>, String> gauges = new HashMap<>();
    protected final Map<Counter, String> counters = new HashMap<>();
    protected final Map<Histogram, String> histograms = new HashMap<>();

    @Override
    public void notifyOfAddedMetric(Metric metric, String metricName, MetricGroup group) {
        // the group is only used to obtain the metric's fully qualified name
        final String name = group.getMetricIdentifier(metricName, this);

        synchronized (this) {
            if (metric instanceof Counter) {
                counters.put((Counter) metric, name);
            } else if (metric instanceof Gauge) {
                gauges.put((Gauge<?>) metric, name);
            } else if (metric instanceof Histogram) {
                histograms.put((Histogram) metric, name);
            } else {
                log.warn("Cannot add unknown metric type {}. This indicates that the reporter " +
                    "does not support this metric type.", metric.getClass().getName());
            }
        }
    }

    @Override
    public void notifyOfRemovedMetric(Metric metric, String metricName, MetricGroup group) {
        synchronized (this) {
            if (metric instanceof Counter) {
                counters.remove(metric);
            } else if (metric instanceof Gauge) {
                gauges.remove(metric);
            } else if (metric instanceof Histogram) {
                histograms.remove(metric);
            } else {
                log.warn("Cannot remove unknown metric type {}. This indicates that the reporter " +
                    "does not support this metric type.", metric.getClass().getName());
            }
        }
    }
}
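What a concrete reporter might do with these maps can be sketched without any Flink types. In the hypothetical `SketchReporter` below, an `AtomicLong` stands in for a counter and a `Supplier` for a gauge. Metrics are the map keys and their full names the values, as above, and `report()` reads the current value of every tracked metric.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical reporter: AtomicLong plays the counter, Supplier plays the gauge.
final class SketchReporter {
    private final Map<AtomicLong, String> counters = new LinkedHashMap<>();
    private final Map<Supplier<?>, String> gauges = new LinkedHashMap<>();

    // Sorts the metric into the map for its type; returns false for unsupported types.
    synchronized boolean added(Object metric, String fullName) {
        if (metric instanceof AtomicLong) {
            counters.put((AtomicLong) metric, fullName);
            return true;
        } else if (metric instanceof Supplier) {
            gauges.put((Supplier<?>) metric, fullName);
            return true;
        }
        return false;
    }

    synchronized void removed(Object metric) {
        counters.remove(metric);
        gauges.remove(metric);
    }

    // Reads every tracked metric at call time and formats it as "name=value".
    synchronized List<String> report() {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<AtomicLong, String> e : counters.entrySet()) {
            lines.add(e.getValue() + "=" + e.getKey().get());
        }
        for (Map.Entry<Supplier<?>, String> e : gauges.entrySet()) {
            lines.add(e.getValue() + "=" + e.getKey().get());
        }
        return lines;
    }
}
```

Keying the maps by metric object rather than by name makes removal a direct lookup on the instance passed to the remove notification; neither `AtomicLong` nor a lambda overrides `equals`, so the lookup is by identity.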

 

MetricRegistry

MetricRegistry connects MetricGroups with MetricReporters.

It adds every metric that needs to be reported to the MetricReporters and starts the thread that reports periodically.

/**
* A MetricRegistry keeps track of all registered {@link Metric Metrics}. It serves as the
* connection between {@link MetricGroup MetricGroups} and {@link MetricReporter MetricReporters}.
*/
public class MetricRegistry {

    private List<MetricReporter> reporters;
    private ScheduledExecutorService executor;

    private final ScopeFormats scopeFormats;

    private final char delimiter;

    /**
     * Creates a new MetricRegistry and starts the configured reporter.
     */
    public MetricRegistry(Configuration config) {
        // first parse the scope formats, these are needed for all reporters
        ScopeFormats scopeFormats;
        try {
            // read the scope format from the config, i.e. the namespace format of the monitoring data
            scopeFormats = createScopeConfig(config);
        }
        catch (Exception e) {
            LOG.warn("Failed to parse scope format, using default scope formats", e);
            scopeFormats = new ScopeFormats();
        }
        this.scopeFormats = scopeFormats;

        char delim;
        try {
            // read the delimiter from the config
            delim = config.getString(ConfigConstants.METRICS_SCOPE_DELIMITER, ".").charAt(0);
        } catch (Exception e) {
            LOG.warn("Failed to parse delimiter, using default delimiter.", e);
            delim = '.';
        }
        this.delimiter = delim;

        // second, instantiate any custom configured reporters
        this.reporters = new ArrayList<>();

        // read the configured reporters
        final String definedReporters = config.getString(ConfigConstants.METRICS_REPORTERS_LIST, null);

        if (definedReporters == null) {
            // no reporters defined
            // by default, don't report anything
            LOG.info("No metrics reporter configured, no metrics will be exposed/reported.");
            this.executor = null;
        } else {
            // we have some reporters so
            String[] namedReporters = definedReporters.split("\\s*,\\s*");
            for (String namedReporter : namedReporters) { // for each configured reporter

                DelegatingConfiguration reporterConfig = new DelegatingConfiguration(config, ConfigConstants.METRICS_REPORTER_PREFIX + namedReporter + ".");
                // the configured reporter class name
                final String className = reporterConfig.getString(ConfigConstants.METRICS_REPORTER_CLASS_SUFFIX, null);

                try {
                    // the configured report interval
                    String configuredPeriod = reporterConfig.getString(ConfigConstants.METRICS_REPORTER_INTERVAL_SUFFIX, null);
                    TimeUnit timeunit = TimeUnit.SECONDS;
                    long period = 10;

                    if (configuredPeriod != null) {
                        try {
                            String[] interval = configuredPeriod.split(" ");
                            period = Long.parseLong(interval[0]);
                            timeunit = TimeUnit.valueOf(interval[1]);
                        }
                        catch (Exception e) {
                            LOG.error("Cannot parse report interval from config: " + configuredPeriod +
                                " - please use values like '10 SECONDS' or '500 MILLISECONDS'. " +
                                "Using default reporting interval.");
                        }
                    }

                    Class<?> reporterClass = Class.forName(className);
                    MetricReporter reporterInstance = (MetricReporter) reporterClass.newInstance(); // instantiate the reporter

                    MetricConfig metricConfig = new MetricConfig();
                    reporterConfig.addAllToProperties(metricConfig);
                    reporterInstance.open(metricConfig); // open the reporter

                    if (reporterInstance instanceof Scheduled) {
                        if (this.executor == null) {
                            executor = Executors.newSingleThreadScheduledExecutor(); // create the executor
                        }
                        LOG.info("Periodically reporting metrics in intervals of {} {} for reporter {} of type {}.", period, timeunit.name(), namedReporter, className);

                        // schedule the periodic report
                        executor.scheduleWithFixedDelay(
                            new ReporterTask((Scheduled) reporterInstance), period, period, timeunit);
                    }
                    reporters.add(reporterInstance); // add to the list of reporters
                }
                catch (Throwable t) {
                    shutdownExecutor();
                    LOG.error("Could not instantiate metrics reporter " + namedReporter + ". Metrics might not be exposed/reported.", t);
                }
            }
        }
    }

    // ------------------------------------------------------------------------
    //  Metrics (de)registration
    // ------------------------------------------------------------------------

    /**
     * Registers a new {@link Metric} with this registry.
     *
     * @param metric the metric that was added
     * @param metricName the name of the metric
     * @param group the group that contains the metric
     */
    public void register(Metric metric, String metricName, MetricGroup group) {
        // called from AbstractMetricGroup.addMetric: a metric added to a group is also added to the reporters
        try {
            if (reporters != null) {
                for (MetricReporter reporter : reporters) {
                    if (reporter != null) {
                        reporter.notifyOfAddedMetric(metric, metricName, group); // add the metric to every reporter
                    }
                }
            }
        } catch (Exception e) {
            LOG.error("Error while registering metric.", e);
        }
    }

    /**
     * Un-registers the given {@link org.apache.flink.metrics.Metric} with this registry.
     *
     * @param metric the metric that should be removed
     * @param metricName the name of the metric
     * @param group the group that contains the metric
     */
    public void unregister(Metric metric, String metricName, MetricGroup group) {
        try {
            if (reporters != null) {
                for (MetricReporter reporter : reporters) {
                    if (reporter != null) {
                        reporter.notifyOfRemovedMetric(metric, metricName, group);
                    }
                }
            }
        } catch (Exception e) {
            LOG.error("Error while unregistering metric.", e);
        }
    }

    // ------------------------------------------------------------------------

    /**
     * This task is explicitly a static class, so that it does not hold any references to the enclosing
     * MetricRegistry instance.
     *
     * This is a subtle difference, but very important: With this static class, the enclosing class instance
     * may become garbage-collectible, whereas with an anonymous inner class, the timer thread
     * (which is a GC root) will hold a reference via the timer task and its enclosing instance pointer.
     * Making the MetricRegistry garbage collectible makes the java.util.Timer garbage collectible,
     * which acts as a fail-safe to stop the timer thread and prevents resource leaks.
     */
    private static final class ReporterTask extends TimerTask {

        private final Scheduled reporter;

        private ReporterTask(Scheduled reporter) {
            this.reporter = reporter;
        }

        @Override
        public void run() {
            try {
                reporter.report(); // the core of the task is simply the call to reporter.report
            } catch (Throwable t) {
                LOG.warn("Error while reporting metrics", t);
            }
        }
    }
}
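The interval handling in the constructor is easy to try in isolation. The sketch below is a hypothetical helper (`ReportInterval` is not a Flink class) that parses strings such as `10 SECONDS` and falls back to 10 seconds when the value is missing or malformed. It deliberately differs from the listing in one detail: the parsed values are committed only after both parts parse, so an input like `500 MILLIS` yields the full default rather than a period of 500 combined with the default unit.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper mirroring the "<number> <TimeUnit>" interval format.
final class ReportInterval {
    final long period;
    final TimeUnit unit;

    private ReportInterval(long period, TimeUnit unit) {
        this.period = period;
        this.unit = unit;
    }

    // Parses e.g. "500 MILLISECONDS"; returns 10 SECONDS for null or malformed input.
    static ReportInterval parse(String configured) {
        long period = 10;
        TimeUnit unit = TimeUnit.SECONDS;
        if (configured != null) {
            try {
                String[] parts = configured.split(" ");
                long parsedPeriod = Long.parseLong(parts[0]);
                TimeUnit parsedUnit = TimeUnit.valueOf(parts[1]);
                // Commit only after both parts parsed successfully.
                period = parsedPeriod;
                unit = parsedUnit;
            } catch (RuntimeException e) {
                // Malformed number, missing unit, or unknown unit: keep the defaults.
            }
        }
        return new ReportInterval(period, unit);
    }
}
```

`TimeUnit.valueOf` only accepts the exact enum constant names, which is why the error message in the listing suggests values like `10 SECONDS` or `500 MILLISECONDS`.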

 

TaskManager

In the TaskManager:

associateWithJobManager
  metricsRegistry = new FlinkMetricRegistry(config.configuration)

  taskManagerMetricGroup =
    new TaskManagerMetricGroup(metricsRegistry, this.runtimeInfo.getHostname, id.toString)

  TaskManager.instantiateStatusMetrics(taskManagerMetricGroup)

This creates the metricsRegistry and the TaskManagerMetricGroup.

As the code below shows, instantiateStatusMetrics does nothing more than register the TaskManager's various status metrics:

private def instantiateStatusMetrics(taskManagerMetricGroup: MetricGroup) : Unit = {
  val jvm = taskManagerMetricGroup
    .addGroup("Status")
    .addGroup("JVM")

  instantiateClassLoaderMetrics(jvm.addGroup("ClassLoader"))
  instantiateGarbageCollectorMetrics(jvm.addGroup("GarbageCollector"))
  instantiateMemoryMetrics(jvm.addGroup("Memory"))
  instantiateThreadMetrics(jvm.addGroup("Threads"))
  instantiateCPUMetrics(jvm.addGroup("CPU"))
}

private def instantiateClassLoaderMetrics(metrics: MetricGroup) {
  // ManagementFactory hands out the MXBeans that expose JVM-level metrics
  val mxBean = ManagementFactory.getClassLoadingMXBean

  metrics.gauge[Long, FlinkGauge[Long]]("ClassesLoaded", new FlinkGauge[Long] {
    override def getValue: Long = mxBean.getTotalLoadedClassCount
  })
  metrics.gauge[Long, FlinkGauge[Long]]("ClassesUnloaded", new FlinkGauge[Long] {
    override def getValue: Long = mxBean.getUnloadedClassCount
  })
}
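The same two class-loader gauges can be sketched in plain Java using only the JDK. `ClassLoaderGauges` is a hypothetical helper and a `Supplier<Long>` stands in for a gauge; the `ClassLoadingMXBean` calls are the standard `java.lang.management` API that the Scala code above relies on.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper: exposes JVM class-loading counts as lazily evaluated gauges.
final class ClassLoaderGauges {
    static Map<String, Supplier<Long>> create() {
        ClassLoadingMXBean mxBean = ManagementFactory.getClassLoadingMXBean();
        Map<String, Supplier<Long>> gauges = new LinkedHashMap<>();
        // Each gauge queries the MXBean when it is read, not when it is registered.
        gauges.put("ClassesLoaded", mxBean::getTotalLoadedClassCount);
        gauges.put("ClassesUnloaded", mxBean::getUnloadedClassCount);
        return gauges;
    }
}
```

Because each gauge holds a reference to the MXBean rather than a snapshot, every read returns the JVM's current count.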

 

When a task is submitted:

submitTask
  val taskMetricGroup = taskManagerMetricGroup.addTaskForJob(tdd)

  val task = new Task(
tdd,
memoryManager,
ioManager,
network,
bcVarManager,
selfGateway,
jobManagerGateway,
config.timeout,
libCache,
fileCache,
runtimeInfo,
taskMetricGroup)

As shown, a taskMetricGroup is created for every task

and passed in when the Task object is constructed:

Environment env = new RuntimeEnvironment(jobId, vertexId, executionId,
executionConfig, taskInfo, jobConfiguration, taskConfiguration,
userCodeClassLoader, memoryManager, ioManager,
broadcastVariableManager, accumulatorRegistry,
splitProvider, distributedCacheEntries,
writers, inputGates, jobManager, taskManagerConfig, metrics, this);

// let the task code create its readers and writers
invokable.setEnvironment(env);

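Inside Task, the key step is placing this taskMetricGroup into the RuntimeEnvironment, so that the actual task logic can obtain the metrics through the RuntimeEnvironment.

StreamTask is one kind of invokable; the AbstractInvokable base class is defined as follows: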

public abstract class AbstractInvokable {

    /** The environment assigned to this invokable. */
    private Environment environment;

    /**
     * Starts the execution.
     *
     * <p>Must be overwritten by the concrete task implementation. This method
     * is called by the task manager when the actual execution of the task
     * starts.
     *
     * <p>All resources should be cleaned up when the method returns. Make sure
     * to guard the code with <code>try-finally</code> blocks where necessary.
     *
     * @throws Exception
     *         Tasks may forward their exceptions for the TaskManager to handle through failure/recovery.
     */
    public abstract void invoke() throws Exception;

    /**
     * Sets the environment of this task.
     *
     * @param environment
     *        the environment of this task
     */
    public final void setEnvironment(Environment environment) {
        this.environment = environment;
    }

    /**
     * Returns the environment of this task.
     *
     * @return The environment of this task.
     */
    public Environment getEnvironment() {
        return this.environment;
    }
}

 

So inside StreamTask, metrics can be used like this:

getEnvironment().getMetricGroup().gauge("lastCheckpointSize", new Gauge<Long>() {
@Override
public Long getValue() {
return StreamTask.this.lastCheckpointSize;
}
});
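The gauge above does not store a value; it reads the task's field each time it is queried. A minimal sketch of that behavior, using a hypothetical `CheckpointingTask` class and a `Supplier<Long>` in place of a gauge:

```java
import java.util.function.Supplier;

// Hypothetical task: the gauge is a live view onto a field, not a copy of it.
final class CheckpointingTask {
    // volatile so a reporter thread sees updates made by the task thread
    private volatile long lastCheckpointSize;

    // Evaluated on every read, so it always reflects the latest checkpoint.
    final Supplier<Long> lastCheckpointSizeGauge = () -> lastCheckpointSize;

    void completeCheckpoint(long sizeInBytes) {
        lastCheckpointSize = sizeInBytes;
    }
}
```

The task never has to push updates: it registers the gauge once and the reporter pulls the current value at each reporting interval. Making the field `volatile` is a choice of this sketch for cross-thread visibility; the snippet above does not show how `lastCheckpointSize` is declared.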
