一、基本用法

首先看 Profiler 的用法：

with ms.Profiler() as profiler:

    # .... 用户代码

print("Tuning Time:")

print(profiler.table())

二、前端接口设计

其中 Profiler 类的设计是绑定和映射到了 C++ 端的接口上。Profile 提供了 Context 语义，支持 with 语句。

@register_object("meta_schedule.Profiler")

class Profiler(Object):

    """Tuning time profiler."""

    def __init__(self) -> None:

        self.__init_handle_by_constructor__(

            _ffi_api.Profiler,  # type: ignore # pylint: disable=no-member

        )

    def get(self) -> Dict[str, float]:

        """Get the profiling results in seconds"""

        return _ffi_api.ProfilerGet(self)  # type: ignore # pylint: disable=no-member

    def table(self) -> str:

        """Get the profiling results in a table format"""

        return _ffi_api.ProfilerTable(self)  # type: ignore # pylint: disable=no-member

    def __enter__(self) -> "Profiler":

        """Entering the scope of the context manager"""

        _ffi_api.ProfilerEnterWithScope(self)  # type: ignore # pylint: disable=no-member

        return self

    def __exit__(self, ptype, value, trace) -> None:

        """Exiting the scope of the context manager"""

        _ffi_api.ProfilerExitWithScope(self)  # type: ignore # pylint: disable=no-member

    @staticmethod

    def current() -> Optional["Profiler"]:

        """Get the current profiler."""

        return _ffi_api.ProfilerCurrent()  # type: ignore # pylint: disable=no-member

    @staticmethod

    def timeit(name: str):

        """Timeit a block of code"""

        @contextmanager

        def _timeit():

            try:

                f = _ffi_api.ProfilerTimedScope(name)  # type: ignore # pylint: disable=no-member

                yield

            finally:

                if f:

                    f()

        return _timeit()

其中 enter 调用时会执行 ProfilerEnterWithScope 其负责往一个 Stack 式的结构中添加一个 Profiler 对象：

void Profiler::EnterWithScope() {

  ThreadLocalProfilers()->push_back(*this);

  (*this)->total_timer = ProfilerTimedScope("Total");

}

在退出 with 语句时，调用 exit 执行 ProfilerExitWithScope 会调用 ProfilerTimedScope() 对象返回的函数，触发当前层计时结束。

profiler.table() 负责收集所有的耗时统计信息，并按照预定义的格式 format 展示给用户。

// std::unordered_map<std::string, double> stats_sec; << 这家伙是被所有的Profiler共享的

// runtime::PackedFunc total_timer; << 负责外层的整体耗时计算

String ProfilerNode::Table() const {

  CHECK(!stats_sec.empty()) << "ValueError: The stats are empty. Please run the profiler first.";

  CHECK(stats_sec.count("Total"))

      << "ValueError: The total time is not recorded. This method should be called only after "

         "exiting the profiler's with scope.";

  double total = stats_sec.at("Total");

  struct Entry {

    String name;

    double minutes;

    double percentage;

    bool operator<(const Entry& other) const { return percentage > other.percentage; }

  };

  std::vector<Entry> table_entry;

  for (const auto& kv : stats_sec) {

    table_entry.push_back(Entry{kv.first, kv.second / 60.0, kv.second / total * 100.0});

  }

  std::sort(table_entry.begin(), table_entry.end());

  support::TablePrinter p;

  p.Row() << "ID"

          << "Name"

          << "Time (min)"

          << "Percentage";

  p.Separator();

  for (int i = 0, n = table_entry.size(); i < n; ++i) {

    if (i == 0) {

      p.Row() << "" << table_entry[i].name << table_entry[i].minutes << table_entry[i].percentage;

    } else {

      p.Row() << i << table_entry[i].name << table_entry[i].minutes << table_entry[i].percentage;

    }

  }

  p.Separator();

  return p.AsStr();

}

三、后端接口设计

// TVM 框架中某个函数

void Apply(const TaskScheduler& task_scheduler, int task_id,

             const Array<MeasureCandidate>& measure_candidates,

             const Array<BuilderResult>& builder_results,

             const Array<RunnerResult>& runner_results) final {

    auto _ = Profiler::TimedScope("MeasureCallback/AddToDatabase");  // 构造时触发计时开始

    // 框架代码

  } // 出了函数的作用域，_ 对象会被析构，触发计时结束，计算duration，放到全局的table中

TimedScope的实现仅仅是返回了一个ScopedTimer对象，由ScopedTimer对象的析构函数负责触发「计时结束」。

ScopedTimer Profiler::TimedScope(String name) { return ScopedTimer(ProfilerTimedScope(name)); }

PackedFunc ProfilerTimedScope(String name) {

  if (Optional<Profiler> opt_profiler = Profiler::Current()) {           // <---- Profiler::Current()是一个「栈」设计，为了支持「嵌套统计」功能

    return TypedPackedFunc<void()>([profiler = opt_profiler.value(),                  //

                                    tik = std::chrono::high_resolution_clock::now(),  // <--- 创建一个函数deleter回调函数，借助lambda函数传递计时起始点

                                    name = std::move(name)]() {

      auto tok = std::chrono::high_resolution_clock::now();

      double duration =

          std::chrono::duration_cast<std::chrono::nanoseconds>(tok - tik).count() / 1e9;

      profiler->stats_sec[name] += duration;      // <----

    });

  }

  return nullptr;

}

TVM 中的 Profiler 是支持多重任意嵌套的，实现通过了一个 Vector<Profile> 模拟了栈了操作：

std::vector<Profiler>* ThreadLocalProfilers() {

  static thread_local std::vector<Profiler> profilers;   // <---- 支持嵌套，全局stack 结构

  return &profilers;

}

void Profiler::EnterWithScope() {

  ThreadLocalProfilers()->push_back(*this);     // 入栈

  (*this)->total_timer = ProfilerTimedScope("Total");

}

void Profiler::ExitWithScope() {

  ThreadLocalProfilers()->pop_back();   // 出栈

  if ((*this)->total_timer != nullptr) {

    (*this)->total_timer();

    (*this)->total_timer = nullptr;

  }

}

附录：飞桨 Profiler 设计

一、基本用法

如下是一个简单的使用样例：

import paddle.fluid as fluid

import paddle.fluid.profiler as profiler

profiler.start_profiler('GPU')    # <---- 开始记录

for iter in range(10):

    if iter == 2:

        profiler.reset_profiler()

    # except each iteration

profiler.stop_profiler('total', '/tmp/profile')   # <---- 结束并记录结果到文件里

"""

------------------------->     Profiling Report     <-------------------------

Place: CPU

Time unit: ms

Sorted by total time in descending order in the same thread

#Sorted by number of calls in descending order in the same thread

#Sorted by number of max in descending order in the same thread

#Sorted by number of min in descending order in the same thread

#Sorted by number of avg in descending order in the same thread

Event                       Calls       Total       Min.        Max.        Ave.        Ratio.

thread0::conv2d             8           129.406     0.304303    127.076     16.1758     0.983319

thread0::elementwise_add    8           2.11865     0.193486    0.525592    0.264832    0.016099

thread0::feed               8           0.076649    0.006834    0.024616    0.00958112  0.000582432

#### 2) sorted_key = None  ####

# Since the profiling results are printed in the order of first end time of Ops,

# the printed order is feed->conv2d->elementwise_add

------------------------->     Profiling Report     <-------------------------

Place: CPU

Time unit: ms

Sorted by event first end time in descending order in the same thread

Event                       Calls       Total       Min.        Max.        Ave.        Ratio.

thread0::feed               8           0.077419    0.006608    0.023349    0.00967738  0.00775934

thread0::conv2d             8           7.93456     0.291385    5.63342     0.99182     0.795243

thread0::elementwise_add    8           1.96555     0.191884    0.518004    0.245693    0.196998

"""

二、前端接口设计

上面涉及到了两个核心的接口：start_profiler 和 stop_profiler

def start_profiler(state, tracer_option='Default'):

    if state == "GPU":

        prof_state = core.ProfilerState.kCUDA

    # ......

    if tracer_option == "Default":

        prof_tracer_option = core.TracerOption.kDefault

    # .....

    core.set_tracer_option(prof_tracer_option)

    core.enable_profiler(prof_state)

def stop_profiler(sorted_key=None, profile_path='/tmp/profile'):

    core.disable_profiler(key_map[sorted_key], profile_path)

这两个接口内部实现都是通过「修改全局变量开关」来实现的，这些接口均通过 Pybind 映射到C++端：

static TracerOption g_tracer_option = TracerOption::kDefault;  // 全局静态变量

void SetTracerOption(TracerOption option) {

  std::lock_guard<std::mutex> l(profiler_mu);

  g_tracer_option = option;

}

class ProfilerHelper {

 public:

  // The profiler state, the initial value is ProfilerState::kDisabled

  static ProfilerState g_state;

  // 省略

}

三、后端接口设计

C++ 端的主要用法：

  LOG(INFO) << "Usage 2: RecordEvent";

  for (int i = 1; i < 5; ++i) {

    std::string name = "evs_op_" + std::to_string(i);

    RecordEvent record_event(name);

    int counter = 1;

    while (counter != i * 1000) counter++;

  }

可以看出提供的核心数据结构类是 RecordEvent ，也是借助对象的构造和析构来实现的。

// Default tracing level.

// It is Recommended to set the level explicitly.

static constexpr uint32_t kDefaultTraceLevel = 4;

class RecordEvent {

 public:

  static bool IsEnabled();

    explicit RecordEvent(

      const std::string& name,

      const TracerEventType type = TracerEventType::UserDefined,

      uint32_t level = kDefaultTraceLevel,

      const EventRole role = EventRole::kOrdinary);

    void End();

    ~RecordEvent() { End(); }

 private:

   std::string* name_{nullptr};

   uint64_t start_ns_;

   // 省略其他成员

 }

可以看出构造时初始化 start_ns_成员，在析构时调用 End 函数：

void RecordEvent::End() {

  uint64_t end_ns = PosixInNsec();

  HostEventRecorder<CommonEvent>::GetInstance().RecordEvent(

          shallow_copy_name_, start_ns_, end_ns, role_, type_);

}

从实现上看，框架层面所有的 Event 是保存到全局单例 HostEventRecorder 中的，包含了name、时间起点、终点、类型、角色：

template <typename EventType>

class HostEventRecorder {

 public:

  // singleton

  static HostEventRecorder &GetInstance() {

    static HostEventRecorder instance;

    return instance;

  }

TVM 中的 Profiler 设计的更多相关文章

自主数据类型：在TVM中启用自定义数据类型探索
自主数据类型:在TVM中启用自定义数据类型探索介绍在设计加速器时,一个重要的决定是如何在硬件中近似地表示实数.这个问题有一个长期的行业标准解决方案:IEEE 754浮点标准.1.然而,当试图通过构 ...
看懂此文，不再困惑于 JS 中的事件设计
看懂此文,不再困惑于 JS 中的事件设计今天刚在关注的微信公众号看到的文章,关于JS事件的,写的很详细也很容易理解,相关的知识点都有总结到,看完就有种很舒畅的感觉,该串起来的知识点都串起来了.反正一 ...
JS 中的事件设计
看懂此文,不再困惑于 JS 中的事件设计原文出处: aitangyong 抽空学习了下javascript和jquery的事件设计,收获颇大,总结此贴,和大家分享. (一)事件绑定的几种方式 ...
推荐一款基于Angular实现的企业级中后台前端/设计解决方案脚手架
ng-alain 是一个企业级中后台前端/设计解决方案脚手架,我们秉承 Ant Design 的设计价值观,目标也非常简单,希望在Angular上面开发企业后台更简单.更快速.随着『设计者』的不断反馈 ...
Ceph中Bufferlist的设计与使用
转自:https://www.ustack.com/blog/bufferlist/ 如果非要在整个Ceph中,找出一个类最重要,我觉得非Bufferlist莫属了,原因很简单,因为Bufferlis ...
mysql中数据库的设计
软件开发流程(CMMI): 1):项目启动; 2):项目计划: 3):需求分析; 需要得到的结果是什么? 4):系统设计; 该怎么做? 5):系统开发; 6):系统测试; 7):系 ...
在Eclipse中使用WindowBuilder设计Swing程序
在Eclipse中使用WindowBuilder设计Swing程序 Swing程序表示Java的客户端窗体程序,除了通过手动编写代码的方式设计Swing程序之外,Eclipse中还提供了一种W ...
Java实现图形界面的三部曲及IDE中的窗口设计
设计和实现图形用户界面的工作主要有以下几点: • (1)创建组件(Component) • 创建组成界面的各种元素,如按钮.文本框等.• (2)指定布局(Layout) • 根据具体需要排列它们的位置 ...
Norns.Urd 中的一些设计
Norns.Urd 是什么? Norns.Urd 是一个基于emit实现动态代理的轻量级AOP框架. 版本基于 netstandard2.0. 所以哪些.net 版本能用你懂的. 完成这个框架的目的主 ...
TVM中的调度原语
TVM中的调度原语 TVM是一种用于高效内核构造的领域专用语言. 本文将展示如何通过TVM提供的各种原语来调度计算. from __future__ import absolute_import, p ...

随机推荐

ET介绍——更为便捷高效的AI框架-行为机（Behavior Machine）
什么是行为机顾名思义,类比状态机每个节点是一个状态,行为机每个节点是描述一种行为.行为机每个节点之间是互斥的,并且节点相互之间完全不用关心是怎么切换的.这里就不讲状态机跟行为树是怎么做ai的了,这里 ...
JDK14性能管理工具:Jconsole详解
目录简介 JConsole 概览内存线程类 VM信息 MBean 总结简介我们在开发java项目的时候,或多或少都会去用到Java的性能管理工具.有时候是为了提升应用程序的性能,有时候是为 ...
Numpy结构化数组
Numpy结构化数组 Numpy的结构化数组和记录数组为复合的.异构的的数据提供了非常有效的存储. 结构化数组 In [1]: import numpy as np In [2]: name = [' ...
RabbitMQ 11 死信队列
死信队列概述消息队列中的数据,如果迟迟没有消费者来处理,就会一直占用消息队列的空间. 比如抢车票的场景,用户下单高铁票之后,会进行抢座,然后再进行付款,但是如果用户下单之后并没有及时的付款,这张票 ...
default_statistics_target参数对PG和MogDB性能影响测试和分析
default_statistics_target 参数对 PG 和 MogDB 性能影响测试和分析本文出处:https://www.modb.pro/db/230160 前段时间在某客户生产环境优 ...
手工安装部署openGauss3.0一主一备（非om工具安装）
手工安装部署 openGauss3.0 一主一备(非 om 工具安装) 本文出处:https://www.modb.pro/db/425385 一.操作系统配置(centos7.6) 1.关闭防火墙 ...
MySQL的下载、安装和配置
一.MySQL的下载下载地址:http://dev.mysql.com/downloads/mysql 进入下载页面,选择所需版本,这里示范MySQL8.0 图一选择版本,下载MSI(软件安装 ...
k8s之Pod
什么是Pod 通俗的来讲就是以pause为基础容器,其它容器共享pause容器的网络名称空间.主机名以及进程间通信,组成的一个逻辑的容器集合. • Kubernetes Pod是Kubernetes的 ...
动态规划（六）——树形dp
树形dp,又称树状dp,即在树上进行的dp,在设计动态规划算法时,一般就以节点从深到浅(子树从小到大)的顺序作为dp的"阶段",dp的状态表示中,第一维通常是节点编号(代表以该节点 ...
实战：如何编写一个 OpenTelemetry Extensions
前言前段时间我们从 SkyWalking 切换到了 OpenTelemetry ,与此同时之前使用 SkyWalking 编写的插件也得转移到 OpenTelemetry 体系下. 我也写了相关介绍 ...

TVM 中的 Profiler 设计

一、基本用法

二、前端接口设计

三、后端接口设计

附录：飞桨 Profiler 设计

一、基本用法

二、前端接口设计

三、后端接口设计

TVM 中的 Profiler 设计的更多相关文章

随机推荐

热门专题