Python API vs C++ API of TensorRT

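Essentially, the C++ API and the Python API should be close to identical in supporting your needs. The main benefit of the Python API is that data preprocessing and postprocessing are easy, because you have a variety of libraries such as NumPy and SciPy at your disposal. The C++ API should be used in situations where safety is important, for example, in automotive. For more information about the C++ API, see Using The C++ API.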
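As an illustration of the preprocessing convenience mentioned above, here is a minimal sketch that turns an image into the layout many networks expect. The function name preprocess, the 224x224 input size, and the mean/std values are assumptions made for the example; they do not come from TensorRT.

```python
import numpy as np

def preprocess(image_hwc, mean, std):
    """Convert an HWC uint8 image to a normalized, contiguous CHW float32 array."""
    x = image_hwc.astype(np.float32) / 255.0   # scale pixel values to [0, 1]
    x = (x - mean) / std                       # per-channel stats broadcast over H and W
    x = x.transpose(2, 0, 1)                   # HWC -> CHW
    return np.ascontiguousarray(x, dtype=np.float32)

# Hypothetical input: an all-black 224x224 RGB image.
image = np.zeros((224, 224, 3), dtype=np.uint8)
mean = np.array([0.5, 0.5, 0.5], dtype=np.float32)
std = np.array([0.5, 0.5, 0.5], dtype=np.float32)
chw = preprocess(image, mean, std)
print(chw.shape, chw.dtype)
```

The contiguous float32 result can be copied straight into a host buffer of the same size before it is transferred to the GPU.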

For more information about how to optimize performance using Python, see How Do I Optimize My Python Performance? in the TensorRT Best Practices guide.

Procedure

Import TensorRT:

import tensorrt as trt

Implement a logging interface through which TensorRT reports errors, warnings, and informational messages. The following code uses the simple logger included in the TensorRT Python bindings. In this case, informational messages are suppressed and only warnings and errors are reported.

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

The first step in performing inference with TensorRT is to create a TensorRT network from your model.

The easiest way to achieve this is to import the model using the TensorRT parser library (see Importing A Model Using A Parser In Python, Importing From Caffe Using Python, Importing From TensorFlow Using Python, and Importing From ONNX Using Python), which supports serialized models in the following formats:

Caffe (both BVLC and NVCaffe)
ONNX (releases up to ONNX 1.6, and ONNX opsets 7 to 11), and
UFF (used for TensorFlow)

An alternative is to define the model directly using the TensorRT Network API, (see Creating A Network Definition From Scratch Using The Python API). This requires you to make a small number of API calls to define each layer in the network graph and to implement your own import mechanism for the model’s trained parameters.

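The following steps illustrate how to import an ONNX model directly using the OnnxParser and the Python API.

About this task

For more information, see the Introduction To Importing Caffe, TensorFlow And ONNX Models Into TensorRT Using Python (introductory_parser_samples) sample.

Note:

In general, newer versions of the OnnxParser are designed to be backward compatible, so a model file produced by an earlier version of an ONNX exporter should not cause a problem. There can be exceptions when a change was not backward compatible. In that case, convert the earlier ONNX model file to a later supported version. For more information on this subject, see ONNX Model Opset Version Converter.

It is also possible that the user model was generated by an exporting tool that supports later opsets than the ONNX parser shipped with TensorRT. In this case, check whether the latest version of onnx-tensorrt released on GitHub supports the required version. For more information, see the Object Detection With The ONNX TensorRT Backend In Python (yolov3_onnx) sample.

The supported version is defined by the BACKEND_OPSET_VERSION variable in onnx_trt_backend.cpp. Download and build the latest version of the ONNX TensorRT parser from GitHub. Build instructions can be found here: TensorRT backend for ONNX.

In TensorRT 7.0, the ONNX parser supports only full-dimensions mode, which means that the network definition must be created with the explicitBatch flag set. For more information, see Working With Dynamic Shapes.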
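The import steps described above can be sketched as follows. This is a minimal sketch, assuming the TensorRT 7-era Python API (trt.Builder, create_network with the EXPLICIT_BATCH flag, trt.OnnxParser). The function names build_network_from_onnx and collect_parser_errors are hypothetical, and tensorrt is imported lazily so that the error helper can be used on its own.

```python
def collect_parser_errors(parser):
    """Return every error recorded by the parser as a string."""
    return [str(parser.get_error(i)) for i in range(parser.num_errors)]

def build_network_from_onnx(model_path):
    """Parse an ONNX file into a TensorRT network created in explicit-batch mode."""
    import tensorrt as trt  # lazy import: only needed when a model is actually parsed

    logger = trt.Logger(trt.Logger.WARNING)
    # The ONNX parser in TensorRT 7.0 requires the explicit-batch flag.
    explicit_batch = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

    builder = trt.Builder(logger)
    network = builder.create_network(explicit_batch)
    parser = trt.OnnxParser(network, logger)

    with open(model_path, "rb") as model_file:
        if not parser.parse(model_file.read()):
            errors = "; ".join(collect_parser_errors(parser))
            raise RuntimeError("ONNX parse failed: " + errors)

    # Return the parser as well so it stays alive as long as the network is used.
    return builder, network, parser
```

Raising on failure, rather than returning None, is a design choice of this sketch; it keeps a half-populated network from reaching the builder.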

3. Building An Engine In Python

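One of the functions of the builder is to search its catalog of CUDA kernels for the fastest implementation available, so the engine must be built on the same GPU as the one on which the optimized engine will run.

About this task

The IBuilderConfig has many properties that you can set to control such things as the precision at which the network should run, and autotuning parameters such as how many times TensorRT should time each kernel when ascertaining which is fastest (more iterations lead to longer build times, but less susceptibility to noise). You can also query the builder to find out which mixed precision types are natively supported by the hardware.

One particularly important property is the maximum workspace size.

Layer algorithms often require temporary workspace. This parameter limits the maximum size that any layer in the network can use. If insufficient scratch space is provided, TensorRT may not be able to find an implementation for a given layer.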
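The workspace limit is given in bytes, and bit shifts are a common way to write such sizes. The sketch below only shows the arithmetic; the helper name mib_to_bytes is hypothetical and not part of TensorRT.

```python
def mib_to_bytes(mib):
    """Convert mebibytes to bytes using a bit shift (1 MiB = 1 << 20 bytes)."""
    return mib << 20

one_mib = mib_to_bytes(1)      # same value as the literal 1 << 20
one_gib = mib_to_bytes(1024)   # same value as the literal 1 << 30
print(one_mib, one_gib)
```

A value computed this way would be assigned to config.max_workspace_size; how much to allow depends on the memory the GPU can spare at build time.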

For more information about building an engine in Python, see the Introduction To Importing Caffe, TensorFlow And ONNX Models Into TensorRT Using Python (introductory_parser_samples) sample.

Procedure

Build the engine using the builder object:

# `network` is the network definition populated in the previous step.
with trt.Builder(TRT_LOGGER) as builder, builder.create_builder_config() as config:
    # This determines the amount of memory available to the builder when building an optimized engine and should generally be set as high as possible.
    config.max_workspace_size = 1 << 20  # 1 MiB, in bytes
    with builder.build_engine(network, config) as engine:
        # Do inference here.
        pass

When the engine is built, TensorRT makes copies of the weights.

Perform inference. To perform inference, follow the instructions outlined in Performing Inference In Python.

4. Serializing A Model In Python

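From here onwards, you can either serialize the engine or use it directly for inference. Serializing and deserializing a model is an optional step before using it for inference; if desired, the engine object can be used for inference directly.

About this task

When you serialize, you are transforming the engine into a format that can be stored and used later for inference. To use it for inference, you simply deserialize the engine. Because creating an engine from the network definition can be time consuming, you can avoid rebuilding the engine every time the application reruns by serializing it once and deserializing it at inference time. Therefore, after the engine is built, users typically want to serialize it for later use.

Note: Serialized engines are not portable across platforms or TensorRT versions. Engines are also specific to the exact GPU model they were built on (in addition to the platform and the TensorRT version).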
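One way to respect this portability limit is to encode the TensorRT version and GPU model in the file name of a saved engine, so that a mismatched file is never picked up. This is a sketch of that idea, not a TensorRT feature; the function name engine_cache_name and the example values are hypothetical. In practice the version string could come from trt.__version__ and the GPU name from the CUDA driver.

```python
import re

def engine_cache_name(model_name, trt_version, gpu_name):
    """Build a file name that changes whenever the TensorRT version or GPU changes."""
    def slug(text):
        # Keep letters, digits and dots; collapse everything else into dashes.
        return re.sub(r"[^A-Za-z0-9.]+", "-", text).strip("-").lower()
    return f"{slug(model_name)}_trt{slug(trt_version)}_{slug(gpu_name)}.engine"

# Hypothetical values, for illustration only.
name = engine_cache_name("resnet50", "7.0.0.11", "Tesla T4")
print(name)
```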

Serialize the model to a modelstream:

serialized_engine = engine.serialize()

Deserialize modelstream to perform inference. Deserializing requires creation of a runtime object:

with trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(serialized_engine)

It is also possible to save a serialized engine to a file, and read it back from the file:

Serialize the engine and write to a file:

with open("sample.engine", "wb") as f:
    f.write(engine.serialize())

Read the engine from the file and deserialize:

with open("sample.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
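The write and read steps above are often combined into a build-once, reuse-later pattern. The following is a minimal sketch of that pattern, assuming the caller supplies a build function that returns the serialized engine as a bytes-like object (for example, the result of engine.serialize()). The names load_or_build and fake_build are hypothetical, and the demo uses a stand-in builder so that it runs without TensorRT.

```python
import os
import tempfile

def load_or_build(path, build_fn):
    """Return serialized engine bytes, calling build_fn only when no file exists yet."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    data = bytes(build_fn())  # the expensive step: build and serialize the engine
    with open(path, "wb") as f:
        f.write(data)
    return data

# Demo with a stand-in for the real build step.
calls = []
def fake_build():
    calls.append(1)
    return b"serialized-engine-bytes"

with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "sample.engine")
    first = load_or_build(cache_path, fake_build)   # builds and writes the file
    second = load_or_build(cache_path, fake_build)  # reads the file back

print(len(calls))
```

The returned bytes would then be passed to runtime.deserialize_cuda_engine, as in the steps above.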

5. Performing Inference In Python

The following steps illustrate how to perform inference in Python, now that you have an engine.

Procedure

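Allocate some host and device buffers for inputs and outputs. This example assumes that context.all_binding_dimensions == True and that the engine has a single input at binding_index=0 and a single output at binding_index=1: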

import numpy as np
import pycuda.autoinit  # Initializes CUDA and creates a CUDA context on import.
import pycuda.driver as cuda

# `context` must already exist here: create it with engine.create_execution_context() before sizing the buffers.
# Determine dimensions and create page-locked memory buffers (i.e. won't be swapped to disk) to hold host inputs/outputs.
h_input = cuda.pagelocked_empty(trt.volume(context.get_binding_shape(0)), dtype=np.float32)
h_output = cuda.pagelocked_empty(trt.volume(context.get_binding_shape(1)), dtype=np.float32)
# Allocate device memory for inputs and outputs.
d_input = cuda.mem_alloc(h_input.nbytes)
d_output = cuda.mem_alloc(h_output.nbytes)
# Create a stream in which to copy inputs/outputs and run inference.
stream = cuda.Stream()
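The buffer sizes above follow directly from the binding shapes: the element count (what trt.volume computes) multiplied by the element size. Here is a small sketch of that arithmetic in plain Python. The helper name binding_nbytes and the example shapes are assumptions for illustration; math.prod requires Python 3.8 or later.

```python
import math

def binding_nbytes(shape, itemsize=4):
    """Bytes needed for a dense tensor of `shape`; itemsize=4 matches float32."""
    volume = math.prod(shape)  # number of elements in the tensor
    return volume * itemsize

# Hypothetical shapes for an image classifier.
input_bytes = binding_nbytes((1, 3, 224, 224))
output_bytes = binding_nbytes((1, 1000))
print(input_bytes, output_bytes)
```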

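Create some space to store intermediate activation values. Since the engine holds the network definition and trained parameters, additional space is necessary. These are held in an execution context: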

with engine.create_execution_context() as context:
    # Transfer input data to the GPU.
    cuda.memcpy_htod_async(d_input, h_input, stream)
    # Run inference.
    context.execute_async_v2(bindings=[int(d_input), int(d_output)], stream_handle=stream.handle)
    # Transfer predictions back from the GPU.
    cuda.memcpy_dtoh_async(h_output, d_output, stream)
    # Synchronize the stream.
    stream.synchronize()
    # h_output now holds the host output; return it if this code is wrapped in a function.

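An engine can have multiple execution contexts, allowing one set of weights to be used for multiple overlapping inference tasks. For example, you can process images in parallel CUDA streams using one engine and one context per stream. Each context will be created on the same GPU as the engine.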
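The one-context-per-stream arrangement can be sketched as below. This is a minimal sketch under stated assumptions: make_stream_workers is a hypothetical helper, engine is any object exposing create_execution_context() (a TensorRT engine in practice), and stream_factory is a zero-argument callable that returns a stream (PyCUDA's cuda.Stream in practice).

```python
def make_stream_workers(engine, num_streams, stream_factory):
    """Pair each CUDA stream with its own execution context from a single engine."""
    workers = []
    for _ in range(num_streams):
        # Every context shares the engine's weights but has its own activation memory.
        context = engine.create_execution_context()
        workers.append((context, stream_factory()))
    return workers
```

Each (context, stream) pair can then run the copy, execute, and copy-back sequence shown earlier independently of the other pairs, provided each pair also has its own input and output buffers.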
