D3D9 GPU Hacks (Repost)
I’ve been trying to catch up on what hacks GPU vendors have exposed in Direct3D9, and it turns out there’s a lot of them!
If you know more hacks or more details, please let me know in the comments!
Most hacks are exposed as custom (“FOURCC”) formats, so to check for support you call CheckDeviceFormat. Here’s the list (Usage column codes: DS=DepthStencil, RT=RenderTarget; Resource column codes: tex=texture, surf=surface).
| Format | Usage | Resource | Description | NVIDIA GeForce | ATI Radeon | Intel |
|---|---|---|---|---|---|---|
| Shadow mapping | | | | | | |
| D3DFMT_D16 | DS | tex | Sample depth buffer directly as shadow map. | 3+ | HD 2xxx+ | 965+ |
| D3DFMT_D24X8 | DS | tex | Sample depth buffer directly as shadow map. | 3+ | HD 2xxx+ | 965+ |
| Depth Buffer As Texture | | | | | | |
| DF16 | DS | tex | Read depth buffer as texture. | | 9500+ | G45+ |
| DF24 | DS | tex | Read depth buffer as texture. | | X1300+ | SB+ |
| INTZ | DS | tex | Read depth buffer as texture. | 8+ | HD 4xxx+ | G45+ |
| RAWZ | DS | tex | Read depth buffer as texture. | 6 & 7 | | |
| Anti-Aliasing related | | | | | | |
| RESZ | RT | surf | Resolve MSAA’d depth stencil surface into non-MSAA’d depth texture. | | HD 4xxx+ | G45+ |
| ATOC | 0 | surf | Transparency anti-aliasing. | 7+ | | SB+ |
| SSAA | 0 | surf | Transparency anti-aliasing. | 7+ | | |
| All ATI SM2.0+ hardware | | | Transparency anti-aliasing. | | 9500+ | |
| n/a | | | Coverage Sampled Anti-Aliasing[6] | 8+ | | |
| Texturing | | | | | | |
| ATI1 | 0 | tex | ATI1n & ATI2n texture compression formats. | 8+ | X1300+ | G45+ |
| ATI2 | 0 | tex | ATI1n & ATI2n texture compression formats. | 6+ | 9500+ | G45+ |
| DF24 | DS | tex | Fetch 4: when sampling 1 channel texture, return four touched texel values[1]. Check for DF24 support. | | X1300+ | SB+ |
| Misc | | | | | | |
| NULL | RT | surf | Dummy render target surface that does not consume video memory. | 6+ | HD 4xxx+ | HD+ |
| NVDB | 0 | surf | Depth Bounds Test. | 6+ | | |
| R2VB | 0 | surf | Render into vertex buffer. | 6 & 7 | 9500+ | |
| INST | 0 | surf | Geometry Instancing on pre-SM3.0 hardware. | | 9500+ | |
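For reference, here is a minimal sketch of how the four-character codes in the table turn into format values. The helper `makeFourCC` and the `kFourcc...` constant names are illustrative (the article itself only uses the name kFourccINST later on); the packing mirrors what the Windows MAKEFOURCC macro does. The CheckDeviceFormat call is left as a comment so the snippet compiles without the DirectX headers, and the variable names in it are assumptions.

```cpp
#include <cstdint>

// Portable equivalent of the Windows MAKEFOURCC macro: the four characters
// are packed little-endian, with the first character in the lowest byte.
constexpr std::uint32_t makeFourCC(char a, char b, char c, char d) {
    return  static_cast<std::uint32_t>(static_cast<unsigned char>(a))
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 8)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 16)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(d)) << 24);
}

// Illustrative constant names; only the four-character codes come from the table.
constexpr std::uint32_t kFourccINTZ = makeFourCC('I', 'N', 'T', 'Z');
constexpr std::uint32_t kFourccNULL = makeFourCC('N', 'U', 'L', 'L');
constexpr std::uint32_t kFourccINST = makeFourCC('I', 'N', 'S', 'T');

// With <d3d9.h> available, a support query for INTZ might look like this
// (d3d is an IDirect3D9*, adapterFormat is the current display format;
// both names are assumptions for this sketch):
//
//   HRESULT hr = d3d->CheckDeviceFormat(
//       D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, adapterFormat,
//       D3DUSAGE_DEPTHSTENCIL,   // "DS" in the Usage column
//       D3DRTYPE_TEXTURE,        // "tex" in the Resource column
//       static_cast<D3DFORMAT>(kFourccINTZ));
//   bool supported = SUCCEEDED(hr);
```

The Usage and Resource columns of the table map onto the fourth and fifth arguments of the call; a usage of 0 in the table simply means passing 0 there.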
Native Shadow Mapping
Native support for shadow map sampling & filtering was introduced ages ago (GeForce 3) by NVIDIA. Turns out ATI also implemented the same feature for its DX10 level cards. Intel also supports it on Intel 965 (aka GMA X3100, the shader model 3 card) and later (G45/X4500/HD) cards.
The usage is quite simple; just create a texture with regular depth/stencil format and render into it. When reading from the texture, one extra component in texture coordinates will be the depth to compare with. Compared & filtered result will be returned.
Also useful:
- Creating NULL color surface to keep D3D runtime happy and save on video memory.
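To make “compared & filtered” a bit more concrete, here is a CPU-side sketch of what such a fetch conceptually returns with linear filtering enabled: each of the four neighbouring texels is compared against the reference depth first, and the 0/1 results are then blended bilinearly. This is only an illustration; the function names are made up, the “less or equal” comparison is an assumption, and real hardware may differ in the details.

```cpp
// 1.0 means "lit" (reference depth is not behind the stored occluder depth).
// Assumption: a "less or equal" comparison.
constexpr float compareDepth(float storedDepth, float referenceDepth) {
    return referenceDepth <= storedDepth ? 1.0f : 0.0f;
}

// d00, d10, d01, d11: shadow map depths at (x0,y0), (x1,y0), (x0,y1), (x1,y1).
// fx, fy: fractional position of the sample between those texels, in [0,1].
// referenceDepth: the extra texture coordinate component to compare against.
constexpr float sampleShadow2x2(float d00, float d10, float d01, float d11,
                                float fx, float fy, float referenceDepth) {
    return (compareDepth(d00, referenceDepth) * (1.0f - fx) +
            compareDepth(d10, referenceDepth) * fx) * (1.0f - fy) +
           (compareDepth(d01, referenceDepth) * (1.0f - fx) +
            compareDepth(d11, referenceDepth) * fx) * fy;
}
```

The point is the order of operations: comparing first and filtering second gives soft shadow edges, whereas filtering the depths first and comparing afterwards would not.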
Depth Buffer as Texture
For some rendering schemes (anything with “deferred”) or some effects (SSAO, depth of field, volumetric fog, …) having access to a depth buffer is needed. If native depth buffer can be read as a texture, this saves both memory and a rendering pass or extra output for MRTs.
Depending on hardware, this can be achieved via INTZ, RAWZ, DF16 or DF24 formats:
- INTZ is for recent (DX10+) hardware. With recent drivers, all three major IHVs expose this. According to ATI [1], it also allows using stencil buffer while rendering. Also allows reading from depth texture while it’s still being used for depth testing (but not depth writing). Looks like this applies to NV & Intel parts as well.
- RAWZ is for GeForce 6 & 7 series only. Depth is specially encoded into four channels of returned value.
- DF16 and DF24 are for ATI and Intel cards, including older cards that don’t support INTZ. Unlike INTZ, these do not allow using the stencil buffer or using the surface for both sampling & depth testing at the same time.
Also useful when using depth textures:
- Creating NULL color surface to keep D3D runtime happy and save on video memory.
- RESZ allows resolving multisampled depth surfaces into non-multisampled depth textures (result will be sample zero for each pixel).
Caveats:
- Using INTZ for both depth/stencil testing and sampling at the same time seems to have performance problems on ATI cards (checked Radeon HD 3xxx to 5xxx, Catalyst 9.10 to 10.5). A workaround is to render to INTZ depth/stencil first, then use RESZ to “blit” it into another surface. Then do sampling from one surface, and depth testing on another.
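Picking between these formats at startup could be sketched as below. The struct and function names are hypothetical, and the preference order is my own assumption based on the notes above (INTZ first since it is the least restricted, RAWZ last since its result needs decoding); the article does not prescribe an order. Each flag would be filled in from a CheckDeviceFormat query for the corresponding FOURCC.

```cpp
enum class DepthTextureFormat { None, INTZ, DF24, DF16, RAWZ };

// One flag per format, filled from CheckDeviceFormat results
// (hypothetical helper struct for this sketch).
struct DepthFormatSupport {
    bool intz;
    bool df24;
    bool df16;
    bool rawz;
};

// Returns the first supported format in an assumed preference order:
// INTZ > DF24 > DF16 > RAWZ, or None if nothing is available.
constexpr DepthTextureFormat pickDepthTextureFormat(const DepthFormatSupport& s) {
    return s.intz ? DepthTextureFormat::INTZ
         : s.df24 ? DepthTextureFormat::DF24
         : s.df16 ? DepthTextureFormat::DF16
         : s.rawz ? DepthTextureFormat::RAWZ
         : DepthTextureFormat::None;
}
```

When the result is None, the fallback is the usual one: write depth into a regular color render target in a separate pass or as an extra MRT output.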
Depth Bounds Test
Direct equivalent of GL_EXT_depth_bounds_test OpenGL extension. See [3] for more information.
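As a reminder of what the test does, here is a tiny sketch of the semantics as described by GL_EXT_depth_bounds_test. The function name is made up. Note that the value being tested is the depth already stored in the depth buffer at the pixel, not the depth of the incoming fragment.

```cpp
// A fragment survives the depth bounds test only if the depth value already
// stored in the depth buffer at its pixel lies within [zMin, zMax].
constexpr bool passesDepthBoundsTest(float storedDepth, float zMin, float zMax) {
    return storedDepth >= zMin && storedDepth <= zMax;
}
```

This is what makes it handy for things like light volumes: pixels whose existing scene depth is outside the range touched by the light can be rejected early.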
Transparency Anti-Aliasing
NVIDIA exposes two controls: transparency multisampling (ATOC) and transparency supersampling (SSAA) [5]. ATI says that all Radeons since 9500 support “alpha to coverage” [1]. Intel supports ATOC with SandyBridge (GMA HD 2000/3000) GPUs.
Render Into Vertex Buffer
Similar to “stream out” or “memexport” in other APIs/platforms. See [2] for more information. Apparently some NVIDIA GPUs (or drivers?) support this as well.
Geometry Instancing
Instancing is supported on all Shader Model 3.0 hardware by Direct3D 9.0c, so no extra hacks are necessary there. ATI has exposed a capability to enable instancing on their Shader Model 2.0 hardware as well. Check for “INST” support, and call dev->SetRenderState (D3DRS_POINTSIZE, kFourccINST); at startup to enable instancing.
I can’t find any document on instancing from AMD now. Other references: [7] and [8].
ATI1n & ATI2n Compressed Texture Formats
Compressed texture formats. ATI1n is known as BC4 format in DirectX 10 land; ATI2n as BC5 or 3Dc. Since they are just DX10 formats, support for this is quite widespread, with NVIDIA exposing it a while ago and Intel exposing it recently (drivers 15.17 or higher).
Thing to keep in mind: when DX9 allocates the mip chain, it checks if the format is a known compressed format and allocates the appropriate space for the smallest mip levels. For example, a 1x1 DXT1 compressed level actually takes up 8 bytes, as the block size is fixed at 4x4 texels. This is true for all block compressed formats. Now, when using the hacked formats, DX9 doesn’t know it’s a block compression format and will only allocate the number of bytes the mip would have taken if it weren’t compressed. For example, a 1x1 ATI1n level will only have 1 byte allocated. What you need to do is stop the mip chain before either dimension shrinks below the block dimensions; otherwise you risk memory corruption.
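A small sketch of that rule: count how many mip levels can be requested before either dimension drops below one 4x4 block, and compute how many bytes a block compressed level really needs. The function names are illustrative; the block sizes are 8 bytes per block for ATI1n (BC4) and 16 bytes per block for ATI2n (BC5).

```cpp
#include <cstdint>

constexpr std::uint32_t kBlockDim = 4;  // block compressed formats use 4x4 texel blocks

// Number of mip levels that are safe to request for a width x height texture,
// so that no level is smaller than one block in either dimension.
// Returns 0 if even the top level is smaller than a block.
constexpr std::uint32_t safeMipLevels(std::uint32_t width, std::uint32_t height) {
    return (width < kBlockDim || height < kBlockDim)
         ? 0u
         : 1u + safeMipLevels(width / 2, height / 2);
}

// Bytes one mip level actually occupies in a block compressed format.
// bytesPerBlock: 8 for ATI1n, 16 for ATI2n.
constexpr std::uint32_t blockCompressedBytes(std::uint32_t width, std::uint32_t height,
                                             std::uint32_t bytesPerBlock) {
    return ((width + kBlockDim - 1) / kBlockDim) *
           ((height + kBlockDim - 1) / kBlockDim) * bytesPerBlock;
}
```

For a 256x256 texture this gives 7 levels (256 down to 4), and the result would be passed as the Levels argument of CreateTexture instead of 0. The second function shows the mismatch from the paragraph above: a 1x1 ATI1n level needs 8 bytes, not the 1 byte the runtime would allocate.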
Another thing to keep in mind: on the Vista+ (WDDM) driver model, textures in these formats will still consume application address space. Most regular textures like DXT5 don’t take up additional address space in WDDM. For some reason ATI1n and ATI2n textures on D3D9 are deemed lockable.
References
All this information was gathered mostly from:
- [1] Advanced DX9 Capabilities for ATI Radeon Cards (pdf)
- [2] ATI R2VB Programming (pdf)
- [3] NVIDIA GPU Programming Guide (pdf)
- [4] ATI Tessellation
- [5] NVIDIA Transparency AA
- [6] NVIDIA Coverage Sampled AA
- [7] Humus’ Instancing Demo
- [8] Arseny’s article on particles
Changelog
- 2013 06 11: One more note on ATI1n/ATI2n format virtual address space issue (thanks JSeb!).
- 2013 04 09: Turns out since sometime 2011 Intel has DF24 and Fetch4 for SandyBridge and later.
- 2011 01 09: Intel implemented ATOC for SandyBridge, and NULL for GMA HD and later.
- 2010 08 25: Intel implemented DF16, INTZ, RESZ for G45+ GPUs!
- 2010 08 25: Added note on INTZ performance issue with ATI cards.
- 2010 08 19: Intel implemented ATI1n/ATI2n support for G45+ GPUs in the latest drivers!
- 2010 07 08: Added note on ATI1n/ATI2n texture formats, with a caveat pointed out by Henning Semler (thanks!)
- 2010 01 06: Hey, shadow map hacks are also supported on Intel 965!
- 2009 12 09: Shadow map hacks are supported on Intel G45!
- 2009 11 21: Added instancing on SM2.0 hardware.
- 2009 11 20: Added Fetch-4, CSAA.
- 2009 11 20: Initial version.
Original link: http://aras-p.info/texts/D3D9GPUHacks.html