对pathtracing的一些个人理解
本人水平有限,若有错误也请指正~
上面说到pathtracing(pt)的一些优点和缺点,优点即其实现很简单,这就是大概为什么当今市面上流行的很多渲染器如今都相继采用pathtracing算法为核心进行实现,但是pathtracing的最大缺点就是收敛速度很慢,其原因就在于全局光照的那个积分式要求在半球面上连续采样,这就需要每次发射n条采样光线,n的数量直接决定了最终图像的质量(对于pt来讲它的图像质量取决于噪点数量的多少,噪点多的话有点像用老式胶片拍出来的照片那样),一般n越大图像过渡越光滑。前人学者们也早已发现了pt的这个收敛慢的缺点,于是有了多种方法来降噪,具体来讲这些降噪手段有以下几种:
1)改进pt算法。如后续比较经典的bidirectional pathtracing(bdpt),metropolis pathtracing
2)保持pt算法不变,但对其内部的采样机制进行改进。比如:
a. 根据BRDF函数进行重要性采样;
b. 采样时所用到的均匀分布,一般都是用c语言自带的rand函数或者c++ random库里面的uniform distribution来完成。但是这种采样方式对于pt算法本身来讲,并不利于最终图像的收敛,
现在的很多渲染器内部都是利用一种叫“低差异序列(Low Discrepancy Sequence)”的采样方式(详见https://zhuanlan.zhihu.com/p/20197323?columnSlug=graphics)。
c. pt算法的一大优势就是屏幕每个像素的采样都是独立的,每次迭代也是互相独立的,这就可以利用并行算法来对算法进行加速。CPU端有OpenMP,而且在诸多流行编译器内都直接内置集成了,
GPU端,N卡有CUDA,A卡可以利用OpenCL,而且渲染速度与GPU数量成线性增长关系,这就有点像暴力解算的意思,GPU端实现的最大区别就是不能递归(虽然N卡早已支持递归,但是
在计算规模很大的情况下每个线程的栈就很浅,再加上本身pt算法的采样光线有可能采样到很深的深度,所以并不是很敢用),递归主要发生在光线与场景求交的过程中,由于对场景采用的
是基于层级的划分(Bounding Volume Hierarchy)本身就有递归的意思。不能递归这点只要自己构造合理数据结构,把递归所用的那些信息尽量在构造BVH的时候自然写入结构内部即可,
可以比较容易的将递归降级为循环。
(ps:也难怪现在很多步进式渲染器都采用pathtracing...实现简单,开发简单,而且要提速的话直接借助并行算法,只不过多配几个GPU就行了,这就节省了一堆人力,成本就低...)
而最近浏览网页也发现了,很多人和企业也倾向于利用CPU渲染,具体请参见http://stackoverflow.com/questions/38029698/why-do-we-use-cpus-for-ray-tracing-instead-of-gpus
里面有个人的回答也是不错:
I'm one of the rendering software architects at a large VFX and animated feature studio with a proprietary renderer (not Pixar, though I was once the rendering software architect there as well, long, long ago).
Almost all high-quality rendering for film (at all the big studios, with all the major renderers) is CPU only. There are a bunch of reasons why this is the case. In no particular order, some of the really compelling ones to give you the flavor of the issues:
GPUs only go fast when everything is in memory. The biggest GPU cards have, what, 12GB or so, and it has to hold everything. Well, we routinely render scenes with 30GB of geometry and that reference 1TB or more of texture. Can't load that into GPU memory, it's literally two orders of magnitude too big. So GPUs are simply unable to deal with our biggest (or even average) scenes. (With CPU renderers, we can page stuff from disk whenever we need. GPUs aren't good at that.)
Don't believe the hype, ray tracing with GPUs is not an obvious win over CPU. GPUs are great at highly coherent work (doing the same things to lots of data at once). Ray tracing is very incoherent (each ray can go a different direction, intersect different objects, shade different materials, access different textures), and so this access pattern degrades GPU performance very severely. It's only very recently that GPU ray tracing could match the best CPU-based ray tracing code, and even though it has surpassed it, it's not by much, not enough to throw out all the old code and start fresh with buggy fragile code for GPUs. And the biggest, most expensive scenes are the ones where GPUs are only marginally faster. Being lots faster on the easy scenes is not really important to us.
If you have 50 or 100 man years of production-hardened code in your CPU-based renderer, you just don't throw it out and start over in order to get a 2x speedup. Software engineering effort, stability, and so on, is more important and a bigger cost factor.
Similarly, if your studio has an investment in a data center holding 20,000 CPU cores, all in the smallest, most power and heat-efficient form factor you can, that's also a sunk cost investment you don't just throw away. Replacing them with new machines containing top of the line GPUs vastly increases the cost of your render farm, and they are bigger and produce more heat, so it literally might not fit in your building.
Amdahl's Law: The actual "rendering" per se is only one stage in generating the scenes, and GPUs don't help with it. Let's say that it takes 1 hour to fully generate and export the scene to the renderer, and 9 hours to "render", and out of that 9 hours, an hour is reading texture, volumes, and other data from disk. So out of the total 10 hours of how the user experiences rendering (push button until final image is ready), 8 hours is potentially sped up with GPUs. So, even if GPU was 10x as fast as CPU for that part, you go from 10 hours to 1+1+0.8 = nearly 3 hours. So 10x GPU speedup only translates to 3x actual gain. If GPU was 1,000,000x faster than CPU for ray tracing, you still have 1+1+tiny, which is only a 5x speedup.
But what's different about games? Why are GPUs good for games but not film?
First of all, when you make a game, remember that it's got to render in real time -- that means your most important constraint is the 60Hz (or whatever) frame rate, and you sacrifice quality or features where necessary to achieve that. In contrast, with film, the unbreakable constraint is making the director and VFX supervisor happy with the quality and look he or she wants, and how long it takes you to get that is (to a degree) secondary.
Also, with a game, you render frame after frame after frame, live in front of every user. But with film, you effectively are rendering ONCE, and what's delivered to theaters is a movie file -- so moviegoers will never know or care if it took you 10 hours per frame, but they will notice if it doesn't look good. So again, there is less of a penalty placed on those renders taking a long time, as long as they look fabulous.
With a game, you don't really know what frames you are going to render, since the player may wander all around the world, view from just about anywhere. You can't and shouldn't try to make it all perfect, you just want it to be good enough all the time. But for a film, the shots are all hand-crafted! A tremendous amount of human time goes into composing, animating, lighting, and compositing every shot, and then you only need to render it once. Think about the economics -- once 10 days of calendar (and salary) has gone into lighting and compositing the shot just right, the advantage of rendering it in an hour (or even a minute) versus overnight, is pretty small, and not worth any sacrifice of quality or achievable complexity of the image.
对pathtracing的一些个人理解的更多相关文章
- 理解CSS视觉格式化
前面的话 CSS视觉格式化这个词可能比较陌生,但说起盒模型可能就恍然大悟了.实际上,盒模型只是CSS视觉格式化的一部分.视觉格式化分为块级和行内两种处理方式.理解视觉格式化,可以确定得到的效果是应 ...
- 彻底理解AC多模式匹配算法
(本文尤其适合遍览网上的讲解而仍百思不得姐的同学) 一.原理 AC自动机首先将模式组记录为Trie字典树的形式,以节点表示不同状态,边上标以字母表中的字符,表示状态的转移.根节点状态记为0状态,表示起 ...
- 理解加密算法(三)——创建CA机构,签发证书并开始TLS通信
接理解加密算法(一)--加密算法分类.理解加密算法(二)--TLS/SSL 1 不安全的TCP通信 普通的TCP通信数据是明文传输的,所以存在数据泄露和被篡改的风险,我们可以写一段测试代码试验一下. ...
- node.js学习(三)简单的node程序&&模块简单使用&&commonJS规范&&深入理解模块原理
一.一个简单的node程序 1.新建一个txt文件 2.修改后缀 修改之后会弹出这个,点击"是" 3.运行test.js 源文件 使用node.js运行之后的. 如果该路径下没有该 ...
- 如何一步一步用DDD设计一个电商网站(一)—— 先理解核心概念
一.前言 DDD(领域驱动设计)的一些介绍网上资料很多,这里就不继续描述了.自己使用领域驱动设计摸滚打爬也有2年多的时间,出于对知识的总结和分享,也是对自我理解的一个公开检验,介于博客园这个平 ...
- 学习AOP之透过Spring的Ioc理解Advisor
花了几天时间来学习Spring,突然明白一个问题,就是看书不能让人理解Spring,一方面要结合使用场景,另一方面要阅读源代码,这种方式理解起来事半功倍.那看书有什么用呢?主要还是扩展视野,毕竟书是别 ...
- ThreadLocal简单理解
在java开源项目的代码中看到一个类里ThreadLocal的属性: private static ThreadLocal<Boolean> clientMode = new Thread ...
- JS核心系列:理解 new 的运行机制
和其他高级语言一样 javascript 中也有 new 运算符,我们知道 new 运算符是用来实例化一个类,从而在内存中分配一个实例对象. 但在 javascript 中,万物皆对象,为什么还要通过 ...
- 深入理解JS 执行细节
javascript从定义到执行,JS引擎在实现层做了很多初始化工作,因此在学习JS引擎工作机制之前,我们需要引入几个相关的概念:执行环境栈.全局对象.执行环境.变量对象.活动对象.作用域和作用域链等 ...
随机推荐
- Android Gradle 指定 Module 打包
Android Gradle 指定 Module 打包 项目中有许多的可以直接独立运行的 Module ,如何在 Gradle 中将签名文件配置好了,那么就不需要普通的手动点击 Generate Si ...
- oracle 归档日志满 报错ORA-00257: archiver error. Connect internal only, until freed
归档日志满导致无法用户无法登陆 具体处理办法 --用户登陆 Microsoft Windows [Version 6.1.7601] Copyright (c) Microsoft Corporati ...
- java面试题—精选30道Java笔试题解答(一)
下面都是我自己的答案非官方,仅供参考,如果有疑问或错误请一定要提出来,大家一起进步啦~~~ 1. 下面哪些是Thread类的方法() A start() B run() C exit() D getP ...
- [ext4]01 磁盘布局 - block分析
ext4文件系统最基本的分配单元是"block"(块). block是由一组连续的sectors来组成,其大小介于1k~4K之间,当然不可能是任意值,只能是2的整数次幂个secto ...
- JavaScript中undefined 和not defined
首先呢,我们来介绍undefined,xx is not defined的区别 (创建一个html文件,在头部编写JavaScript代码) 我们先编写如下代码: <script type=&q ...
- 整合最优雅SSM框架:SpringMVC + Spring + MyBatis
我们看招聘信息的时候,经常会看到这一点,需要具备SSH框架的技能:而且在大部分教学课堂中,也会把SSH作为最核心的教学内容. 但是,我们在实际应用中发现,SpringMVC可以完全替代Struts,配 ...
- 全球移动互联网大会gmic 2017为什么值得参加?
长城会CEO郝义认为,"科学产业化将会推动科学复兴,"而本次GMIC 北京 2017也将首次引入了高规格科学家闭门峰会,专门设置G-Summit全球科学创新峰会,以"科学 ...
- 创建并发布npm包
1.npm官网创建npm账户 npm网站地址:https://www.npmjs.com/ npm网站注册地址:https://www.npmjs.com/signup 2.命令行工具登录npm np ...
- 虚幻UE4中移动端水材质的设置
内容: *概述 *纹理文件 *基本颜色 *法线的设置 *标量参数和材质属性 *场景设置 *最终效果 概述 本教程由52VR翻译自unrealengine官方,在本教程中,我们将教您如何创建可以在移动设 ...
- Linux中安装redis
第一部分:安装redis 1.希望将安装包下载到此目录 /home/local/src 安装过程指令 $ mkdir /home/local/redis $ cd /home/local/src ...