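What do the colors and abbreviations in NSight's statistics mean? A fairly official explanation from NVIDIA's Jeff Kiel!
Read it together with this diagram: https://dl.dropboxusercontent.com/u/32077444/nsight.pdf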
1) The bars on the Profiler's Summary Page show, for the selected draw call(s), the percentage of time each unit was the bottleneck. This tells you which part of the pipeline to go after for optimization, rather than just trying things and seeing whether the FPS changes. In your case, your shader unit is showing ~75-80%, which means improving your shader source should help the performance of the 5 draw calls in your selected Draw Call Group. Note that a unit doesn't have to be a 100% bottleneck to be worth investigating. Even if it is the bottleneck only 10% of the time, it still kept that call from reaching optimal throughput. So if you are, say, 20% texture bound, you can still try the standard optimizations such as filtering and mipmapping and see how they affect performance.
2) The gaps in the Frame Timings graph are sometimes beyond your control. It can help to run an analysis session on your frames to get a feel for how full your command buffers are and what might be causing a gap (resource uploads, for example). That screen doesn't expose more detail, and without a repro it is hard to tell exactly what caused the gap.
3) You asked about the 3 timing values in the Frame Timings and what might be considered “good”. The 3 values represent 2 ways to measure the draw call timing and 1 calculated value:
a. EPC/Empty Pipeline Cost: This measures each draw call one at a time as it flows from the top to the bottom of the pipe. We add a flush before and after each call, so you can treat this as the absolute cost of the draw call. It ignores pipeline width, resource contention (positive or negative), and so on. This is useful for knowing how much each draw call costs in isolation.
b. FPC/Full Pipeline Cost: We measure this with all draw calls in flight, but bookended by pipeline reports that give us the time from the start of each draw call (first vertex processed) to its end (last fragment retired to the framebuffer). This means any resource contention is taken into account. Examples include hitting the texture unit and either warming or dirtying the cache, or having so many threads in flight that the shader units are fully occupied and cannot start new work. This gives you a “real world” cost for every draw call.
c. IDC/Incremental Draw Cost: This is a calculated value that accounts for any overlap between draw calls. Say you have 2 identical draw calls, each taking up about ½ of the full pipeline width. Their EPC and FPC values are likely to be very close. But because each uses only ½ of the width, the incremental (additional) cost of the second call might actually be 0: it can execute fully in parallel with the first. So the FPS would be the same with 1 draw or 2, and the IDC would be the full cost for the first call and 0 for the second.
4) On the Memory screen, you asked whether there was a breakdown per shader or draw request. That is what the state buckets are for. Pressing the button on the toolbar groups draw calls by shared state (in this case, the shader in question), and you will then see the stats for just those draw calls. You can also group by performance markers, so you can group them pretty much however you want.
5) On the Memory screen, you asked whether the 330k was reads or writes: it is the sum of both. We don't break out reads vs. writes yet, but could consider it as a future enhancement.
6) Your other question on the Memory screen was about the 3.6GB of bandwidth between L2 and memory: that is the number of bytes written. I must confess the number puzzles me. It should basically be the sum of the write operations that go through the L2, and most of those should come from the Framebuffer unit. Access to your app would help me determine whether we have a bug there or just a number that isn't reported.
7) On the Bottleneck screen, you asked about drilling into the shader bottleneck information. We don't support this yet, but it is a feature we have considered and have already laid some of the groundwork for in our CUDA tools. I will add you to the list of people requesting that capability.
8) You asked how the Framebuffer could be a bottleneck when rendering a full-screen quad. In NVIDIA terminology, the Framebuffer unit is basically the memory controller. All memory requests, from the blending unit, texture unit, shader, etc., go through it. Are you doing lots of lookups in draw call 116?
9) Utilization generally shows how much of the available horsepower you used over the time the draw call took. To give details, I would need to know what your workload was and possibly sample additional data. It is possible that the shader unit is underutilized because it was bottlenecked waiting for data inside the shader unit. Causes could include L1 values to return, local memory, or other resource contention.
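Item 3c says IDC is calculated from the overlap between draw calls but does not give the formula. Below is a minimal sketch of one plausible model, which Nsight's actual computation may not match. Each draw is treated as an interval from its first vertex to its last retired fragment, as in the FPC description. Its IDC is the time it adds to the union of the intervals of the draws before it. `DrawTiming`, `full_pipeline_costs` and `incremental_draw_costs` are hypothetical names for illustration, not part of any NVIDIA API.

```python
from dataclasses import dataclass


@dataclass
class DrawTiming:
    """Hypothetical per-draw timestamps (e.g. in microseconds).

    start: first vertex processed; end: last fragment retired to the framebuffer.
    """
    start: float
    end: float


def full_pipeline_costs(draws):
    """FPC per draw: start-to-end time measured with all draws in flight."""
    return [d.end - d.start for d in draws]


def incremental_draw_costs(draws):
    """Assumed IDC model: time each draw adds beyond what earlier draws cover.

    Draws must be ordered by start time. With that ordering, the returned
    values sum to the length of the union of all intervals.
    """
    costs = []
    covered_end = float("-inf")  # latest end time seen so far
    prev_start = float("-inf")
    for d in draws:
        if d.start < prev_start:
            raise ValueError("draws must be ordered by start time")
        prev_start = d.start
        # Only the part of this draw after covered_end is new pipeline time.
        begin = max(d.start, covered_end)
        costs.append(max(0.0, d.end - begin))
        covered_end = max(covered_end, d.end)
    return costs


if __name__ == "__main__":
    # Two identical, fully overlapping draws, as in the example in 3c.
    pair = [DrawTiming(0.0, 10.0), DrawTiming(0.0, 10.0)]
    print(full_pipeline_costs(pair))     # [10.0, 10.0]
    print(incremental_draw_costs(pair))  # [10.0, 0.0]
```

Under this model, two fully overlapping draws reproduce the behavior described in 3c: the same FPC for both, full cost charged to the first and 0 to the second. With partial overlap, e.g. intervals [0, 10] and [5, 12], the second draw would be charged only the 2 units that extend past the first.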