Flink - metrics
Metrics are organized by MetricGroups.
MetricGroup
- MetricGroup
This is simply a named container for metrics: it can hold sub-groups as well as individual metrics,
so its main interface methods are for registration:
- /**
- * A MetricGroup is a named container for {@link Metric Metrics} and further metric subgroups.
- *
- * <p>Instances of this class can be used to register new metrics with Flink and to create a nested
- * hierarchy based on the group names.
- *
- * <p>A MetricGroup is uniquely identified by its place in the hierarchy and name.
- */
- public interface MetricGroup {
- <C extends Counter> C counter(String name, C counter);
- <T, G extends Gauge<T>> G gauge(String name, G gauge);
- <H extends Histogram> H histogram(String name, H histogram);
- MetricGroup addGroup(String name);
- }
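For reference, user code normally reaches a MetricGroup through the RuntimeContext of a rich function. A minimal Java sketch (the function, group and metric names below are made up for illustration):
- import org.apache.flink.api.common.functions.RichMapFunction;
- import org.apache.flink.configuration.Configuration;
- import org.apache.flink.metrics.Counter;
- public class CountingMapper extends RichMapFunction<String, String> {
-     private transient Counter numProcessed;
-     @Override
-     public void open(Configuration parameters) {
-         // addGroup creates a nested sub-group, counter registers a metric in it
-         this.numProcessed = getRuntimeContext()
-             .getMetricGroup()
-             .addGroup("MyMetrics")
-             .counter("numProcessed");
-     }
-     @Override
-     public String map(String value) {
-         numProcessed.inc();
-         return value;
-     }
- }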
- AbstractMetricGroup
The key is the implementation of MetricGroup. The logic is straightforward: registering metrics and closing the group must be mutually exclusive, so both take a lock.
- /**
- * Abstract {@link MetricGroup} that contains key functionality for adding metrics and groups.
- *
- */
- public abstract class AbstractMetricGroup implements MetricGroup {
- /** The registry that this metrics group belongs to */
- protected final MetricRegistry registry;
- /** All metrics that are directly contained in this group */
- private final Map<String, Metric> metrics = new HashMap<>();
- /** All metric subgroups of this group */
- private final Map<String, AbstractMetricGroup> groups = new HashMap<>();
- /** The metrics scope represented by this group.
- * For example ["host-7", "taskmanager-2", "window_word_count", "my-mapper" ]. */
- private final String[] scopeComponents; // namespace components
- /** The metrics scope represented by this group, as a concatenated string, lazily computed.
- * For example: "host-7.taskmanager-2.window_word_count.my-mapper" */
- private String scopeString;
- @Override
- public <C extends Counter> C counter(String name, C counter) {
- addMetric(name, counter);
- return counter;
- }
- /**
- * Adds the given metric to the group and registers it at the registry, if the group
- * is not yet closed, and if no metric with the same name has been registered before.
- *
- * @param name the name to register the metric under
- * @param metric the metric to register
- */
- protected void addMetric(String name, Metric metric) {
- // add the metric only if the group is still open
- synchronized (this) { // take the lock
- if (!closed) {
- // immediately put without a 'contains' check to optimize the common case (no collision)
- // collisions are resolved later
- Metric prior = metrics.put(name, metric);
- // check for collisions with other metric names
- if (prior == null) {
- // no other metric with this name yet
- registry.register(metric, name, this);
- }
- else {
- // we had a collision. put back the original value
- metrics.put(name, prior);
- }
- }
- }
- }
- }
MetricReporter
Collected metrics need a reporter in order to be exported to an external backend:
- /**
- * Reporters are used to export {@link Metric Metrics} to an external backend.
- *
- * <p>Reporters are instantiated via reflection and must be public, non-abstract, and have a
- * public no-argument constructor.
- */
- public interface MetricReporter {
- // ------------------------------------------------------------------------
- // life cycle
- // ------------------------------------------------------------------------
- /**
- * Configures this reporter. Since reporters are instantiated generically and hence parameter-less,
- * this method is the place where the reporters set their basic fields based on configuration values.
- *
- * <p>This method is always called first on a newly instantiated reporter.
- *
- * @param config The configuration with all parameters.
- */
- void open(MetricConfig config);
- /**
- * Closes this reporter. Should be used to close channels, streams and release resources.
- */
- void close();
- void notifyOfAddedMetric(Metric metric, String metricName, MetricGroup group);
- void notifyOfRemovedMetric(Metric metric, String metricName, MetricGroup group);
- }
AbstractReporter implements the MetricReporter interface:
- /**
- * Base class for custom metric reporters.
- */
- public abstract class AbstractReporter implements MetricReporter, CharacterFilter {
- protected final Logger log = LoggerFactory.getLogger(getClass());
- protected final Map<Gauge<?>, String> gauges = new HashMap<>();
- protected final Map<Counter, String> counters = new HashMap<>();
- protected final Map<Histogram, String> histograms = new HashMap<>();
- @Override
- public void notifyOfAddedMetric(Metric metric, String metricName, MetricGroup group) {
- final String name = group.getMetricIdentifier(metricName, this); // the group is only used to build the metric's fully-qualified name
- synchronized (this) {
- if (metric instanceof Counter) {
- counters.put((Counter) metric, name);
- } else if (metric instanceof Gauge) {
- gauges.put((Gauge<?>) metric, name);
- } else if (metric instanceof Histogram) {
- histograms.put((Histogram) metric, name);
- } else {
- log.warn("Cannot add unknown metric type {}. This indicates that the reporter " +
- "does not support this metric type.", metric.getClass().getName());
- }
- }
- }
- @Override
- public void notifyOfRemovedMetric(Metric metric, String metricName, MetricGroup group) {
- synchronized (this) {
- if (metric instanceof Counter) {
- counters.remove(metric);
- } else if (metric instanceof Gauge) {
- gauges.remove(metric);
- } else if (metric instanceof Histogram) {
- histograms.remove(metric);
- } else {
- log.warn("Cannot remove unknown metric type {}. This indicates that the reporter " +
- "does not support this metric type.", metric.getClass().getName());
- }
- }
- }
- }
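As a rough sketch, a custom reporter only needs to extend AbstractReporter and, if it should be invoked periodically, implement Scheduled. The class below is illustrative only and not part of Flink (package names assume the flink-metrics-core layout of this Flink version); it simply logs every registered counter and gauge on each report() call:
- import java.util.Map;
- import org.apache.flink.metrics.Counter;
- import org.apache.flink.metrics.Gauge;
- import org.apache.flink.metrics.MetricConfig;
- import org.apache.flink.metrics.reporter.AbstractReporter;
- import org.apache.flink.metrics.reporter.Scheduled;
- public class LoggingReporter extends AbstractReporter implements Scheduled {
-     @Override
-     public void open(MetricConfig config) {
-         // nothing to configure for plain logging
-     }
-     @Override
-     public void close() {
-         // no channels or streams to release
-     }
-     @Override
-     public String filterCharacters(String input) {
-         return input; // required by CharacterFilter, no escaping needed here
-     }
-     @Override
-     public void report() {
-         // guard against concurrent notifyOfAdded/RemovedMetric calls
-         synchronized (this) {
-             for (Map.Entry<Counter, String> entry : counters.entrySet()) {
-                 log.info("{} = {}", entry.getValue(), entry.getKey().getCount());
-             }
-             for (Map.Entry<Gauge<?>, String> entry : gauges.entrySet()) {
-                 log.info("{} = {}", entry.getValue(), entry.getKey().getValue());
-             }
-         }
-     }
- }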
MetricRegistry
MetricRegistry connects MetricGroups with MetricReporters.
It registers metrics that need to be reported with the MetricReporters and starts the periodic reporting thread:
- /**
- * A MetricRegistry keeps track of all registered {@link Metric Metrics}. It serves as the
- * connection between {@link MetricGroup MetricGroups} and {@link MetricReporter MetricReporters}.
- */
- public class MetricRegistry {
- private List<MetricReporter> reporters;
- private ScheduledExecutorService executor;
- private final ScopeFormats scopeFormats;
- private final char delimiter;
- /**
- * Creates a new MetricRegistry and starts the configured reporter.
- */
- public MetricRegistry(Configuration config) {
- // first parse the scope formats, these are needed for all reporters
- ScopeFormats scopeFormats;
- try {
- scopeFormats = createScopeConfig(config); // read the scope formats from the configuration, i.e. how the metric namespaces are formatted
- }
- catch (Exception e) {
- LOG.warn("Failed to parse scope format, using default scope formats", e);
- scopeFormats = new ScopeFormats();
- }
- this.scopeFormats = scopeFormats;
- char delim;
- try {
- delim = config.getString(ConfigConstants.METRICS_SCOPE_DELIMITER, ".").charAt(0); // read the delimiter from the configuration
- } catch (Exception e) {
- LOG.warn("Failed to parse delimiter, using default delimiter.", e);
- delim = '.';
- }
- this.delimiter = delim;
- // second, instantiate any custom configured reporters
- this.reporters = new ArrayList<>();
- final String definedReporters = config.getString(ConfigConstants.METRICS_REPORTERS_LIST, null); // read the configured reporters
- if (definedReporters == null) {
- // no reporters defined
- // by default, don't report anything
- LOG.info("No metrics reporter configured, no metrics will be exposed/reported.");
- this.executor = null;
- } else {
- // we have some reporters so
- String[] namedReporters = definedReporters.split("\\s*,\\s*");
- for (String namedReporter : namedReporters) { // for each configured reporter
- DelegatingConfiguration reporterConfig = new DelegatingConfiguration(config, ConfigConstants.METRICS_REPORTER_PREFIX + namedReporter + ".");
- final String className = reporterConfig.getString(ConfigConstants.METRICS_REPORTER_CLASS_SUFFIX, null); // the configured reporter class name
- try {
- String configuredPeriod = reporterConfig.getString(ConfigConstants.METRICS_REPORTER_INTERVAL_SUFFIX, null); // the configured report interval
- TimeUnit timeunit = TimeUnit.SECONDS;
- long period = 10;
- if (configuredPeriod != null) {
- try {
- String[] interval = configuredPeriod.split(" ");
- period = Long.parseLong(interval[0]);
- timeunit = TimeUnit.valueOf(interval[1]);
- }
- catch (Exception e) {
- LOG.error("Cannot parse report interval from config: " + configuredPeriod +
- " - please use values like '10 SECONDS' or '500 MILLISECONDS'. " +
- "Using default reporting interval.");
- }
- }
- Class<?> reporterClass = Class.forName(className);
- MetricReporter reporterInstance = (MetricReporter) reporterClass.newInstance(); // instantiate the reporter via reflection
- MetricConfig metricConfig = new MetricConfig();
- reporterConfig.addAllToProperties(metricConfig);
- reporterInstance.open(metricConfig); //open reporter
- if (reporterInstance instanceof Scheduled) {
- if (this.executor == null) {
- executor = Executors.newSingleThreadScheduledExecutor(); // create the scheduled executor
- }
- LOG.info("Periodically reporting metrics in intervals of {} {} for reporter {} of type {}.", period, timeunit.name(), namedReporter, className);
- executor.scheduleWithFixedDelay(
- new ReporterTask((Scheduled) reporterInstance), period, period, timeunit); // schedule the periodic report
- }
- reporters.add(reporterInstance); // add it to the list of reporters
- }
- catch (Throwable t) {
- shutdownExecutor();
- LOG.error("Could not instantiate metrics reporter" + namedReporter + ". Metrics might not be exposed/reported.", t);
- }
- }
- }
- }
- // ------------------------------------------------------------------------
- // Metrics (de)registration
- // ------------------------------------------------------------------------
- /**
- * Registers a new {@link Metric} with this registry.
- *
- * @param metric the metric that was added
- * @param metricName the name of the metric
- * @param group the group that contains the metric
- */
- public void register(Metric metric, String metricName, MetricGroup group) { // called from AbstractMetricGroup.addMetric: when a metric is added to a group it is also registered with the reporters
- try {
- if (reporters != null) {
- for (MetricReporter reporter : reporters) {
- if (reporter != null) {
- reporter.notifyOfAddedMetric(metric, metricName, group); // notify every reporter of the new metric
- }
- }
- }
- } catch (Exception e) {
- LOG.error("Error while registering metric.", e);
- }
- }
- /**
- * Un-registers the given {@link org.apache.flink.metrics.Metric} with this registry.
- *
- * @param metric the metric that should be removed
- * @param metricName the name of the metric
- * @param group the group that contains the metric
- */
- public void unregister(Metric metric, String metricName, MetricGroup group) {
- try {
- if (reporters != null) {
- for (MetricReporter reporter : reporters) {
- if (reporter != null) {
- reporter.notifyOfRemovedMetric(metric, metricName, group);
- }
- }
- }
- } catch (Exception e) {
- LOG.error("Error while registering metric.", e);
- }
- }
- // ------------------------------------------------------------------------
- /**
- * This task is explicitly a static class, so that it does not hold any references to the enclosing
- * MetricsRegistry instance.
- *
- * This is a subtle difference, but very important: With this static class, the enclosing class instance
- * may become garbage-collectible, whereas with an anonymous inner class, the timer thread
- * (which is a GC root) will hold a reference via the timer task and its enclosing instance pointer.
- * Making the MetricsRegistry garbage collectible makes the java.util.Timer garbage collectible,
- * which acts as a fail-safe to stop the timer thread and prevents resource leaks.
- */
- private static final class ReporterTask extends TimerTask {
- private final Scheduled reporter;
- private ReporterTask(Scheduled reporter) {
- this.reporter = reporter;
- }
- @Override
- public void run() {
- try {
- reporter.report(); // the core of the task is simply calling reporter.report()
- } catch (Throwable t) {
- LOG.warn("Error while reporting metrics", t);
- }
- }
- }
- }
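Putting the parsing above together, configuring a registry with one scheduled reporter would look roughly like the sketch below. The reporter name and class are placeholders, and the MetricRegistry package is assumed to be org.apache.flink.runtime.metrics:
- import org.apache.flink.configuration.ConfigConstants;
- import org.apache.flink.configuration.Configuration;
- import org.apache.flink.runtime.metrics.MetricRegistry; // assumed package
- public class RegistrySetupSketch {
-     public static void main(String[] args) {
-         Configuration config = new Configuration();
-         // the comma-separated list of reporter names
-         config.setString(ConfigConstants.METRICS_REPORTERS_LIST, "my_reporter");
-         // per-reporter keys are built as prefix + reporter name + "." + suffix
-         String prefix = ConfigConstants.METRICS_REPORTER_PREFIX + "my_reporter.";
-         config.setString(prefix + ConfigConstants.METRICS_REPORTER_CLASS_SUFFIX, "com.example.LoggingReporter");
-         config.setString(prefix + ConfigConstants.METRICS_REPORTER_INTERVAL_SUFFIX, "10 SECONDS");
-         // the registry parses the config, instantiates the reporter, opens it and schedules the ReporterTask
-         MetricRegistry registry = new MetricRegistry(config);
-     }
- }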
TaskManager
In the TaskManager,
- associateWithJobManager
- metricsRegistry = new FlinkMetricRegistry(config.configuration)
- taskManagerMetricGroup =
- new TaskManagerMetricGroup(metricsRegistry, this.runtimeInfo.getHostname, id.toString)
- TaskManager.instantiateStatusMetrics(taskManagerMetricGroup)
This creates the metricsRegistry and the TaskManagerMetricGroup.
You can see that instantiateStatusMetrics only registers the various TaskManager status metrics:
- private def instantiateStatusMetrics(taskManagerMetricGroup: MetricGroup) : Unit = {
- val jvm = taskManagerMetricGroup
- .addGroup("Status")
- .addGroup("JVM")
- instantiateClassLoaderMetrics(jvm.addGroup("ClassLoader"))
- instantiateGarbageCollectorMetrics(jvm.addGroup("GarbageCollector"))
- instantiateMemoryMetrics(jvm.addGroup("Memory"))
- instantiateThreadMetrics(jvm.addGroup("Threads"))
- instantiateCPUMetrics(jvm.addGroup("CPU"))
- }
- private def instantiateClassLoaderMetrics(metrics: MetricGroup) {
- val mxBean = ManagementFactory.getClassLoadingMXBean // the MXBeans exposing JVM metrics are obtained from ManagementFactory
- metrics.gauge[Long, FlinkGauge[Long]]("ClassesLoaded", new FlinkGauge[Long] {
- override def getValue: Long = mxBean.getTotalLoadedClassCount
- })
- metrics.gauge[Long, FlinkGauge[Long]]("ClassesUnloaded", new FlinkGauge[Long] {
- override def getValue: Long = mxBean.getUnloadedClassCount
- })
- }
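The remaining instantiate*Metrics helpers follow the same pattern. A Java sketch of a memory gauge backed by the MemoryMXBean (the gauge name is illustrative, not necessarily what Flink's actual instantiateMemoryMetrics registers):
- import java.lang.management.ManagementFactory;
- import java.lang.management.MemoryMXBean;
- import org.apache.flink.metrics.Gauge;
- import org.apache.flink.metrics.MetricGroup;
- public class MemoryMetricsSketch {
-     public static void instantiateMemoryMetrics(MetricGroup metrics) {
-         final MemoryMXBean mxBean = ManagementFactory.getMemoryMXBean();
-         // the value is re-read from the MXBean every time the gauge is reported
-         metrics.gauge("HeapUsed", new Gauge<Long>() {
-             @Override
-             public Long getValue() {
-                 return mxBean.getHeapMemoryUsage().getUsed();
-             }
-         });
-     }
- }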
When submitTask is called,
- submitTask
- val taskMetricGroup = taskManagerMetricGroup.addTaskForJob(tdd)
- val task = new Task(
- tdd,
- memoryManager,
- ioManager,
- network,
- bcVarManager,
- selfGateway,
- jobManagerGateway,
- config.timeout,
- libCache,
- fileCache,
- runtimeInfo,
- taskMetricGroup)
You can see that a taskMetricGroup is created for each task
and passed in when the Task object is constructed:
- Environment env = new RuntimeEnvironment(jobId, vertexId, executionId,
- executionConfig, taskInfo, jobConfiguration, taskConfiguration,
- userCodeClassLoader, memoryManager, ioManager,
- broadcastVariableManager, accumulatorRegistry,
- splitProvider, distributedCacheEntries,
- writers, inputGates, jobManager, taskManagerConfig, metrics, this);
- // let the task code create its readers and writers
- invokable.setEnvironment(env);
In Task, the key step is putting this taskMetricGroup into the RuntimeEnvironment, so the actual task logic can reach the metrics through the RuntimeEnvironment.
StreamTask is one such invokable; AbstractInvokable is defined as follows:
- public abstract class AbstractInvokable {
- /** The environment assigned to this invokable. */
- private Environment environment;
- /**
- * Starts the execution.
- *
- * <p>Must be overwritten by the concrete task implementation. This method
- * is called by the task manager when the actual execution of the task
- * starts.
- *
- * <p>All resources should be cleaned up when the method returns. Make sure
- * to guard the code with <code>try-finally</code> blocks where necessary.
- *
- * @throws Exception
- * Tasks may forward their exceptions for the TaskManager to handle through failure/recovery.
- */
- public abstract void invoke() throws Exception;
- /**
- * Sets the environment of this task.
- *
- * @param environment
- * the environment of this task
- */
- public final void setEnvironment(Environment environment) {
- this.environment = environment;
- }
- /**
- * Returns the environment of this task.
- *
- * @return The environment of this task.
- */
- public Environment getEnvironment() {
- return this.environment;
- }
- }
So inside StreamTask, metrics can be used like this:
- getEnvironment().getMetricGroup().gauge("lastCheckpointSize", new Gauge<Long>() {
- @Override
- public Long getValue() {
- return StreamTask.this.lastCheckpointSize;
- }
- });