对pathtracing的一些个人理解
本人水平有限,若有错误也请指正~
上面说到pathtracing(pt)的一些优点和缺点,优点即其实现很简单,这就是大概为什么当今市面上流行的很多渲染器如今都相继采用pathtracing算法为核心进行实现,但是pathtracing的最大缺点就是收敛速度很慢,其原因就在于全局光照的那个积分式要求在半球面上连续采样,这就需要每次发射n条采样光线,n的数量直接决定了最终图像的质量(对于pt来讲它的图像质量取决于噪点数量的多少,噪点多的话有点像用老式胶片拍出来的照片那样),一般n越大图像过渡越光滑。前人学者们也早已发现了pt的这个收敛慢的缺点,于是有了多种方法来降噪,具体来讲这些降噪手段有以下几种:
1)改进pt算法。如后续比较经典的bidirectional pathtracing(bdpt),metropolis pathtracing
2)保持pt算法不变,但对其内部的采样机制进行改进。比如:
a. 根据BRDF函数进行重要性采样;
b. 采样时所用到的均匀分布,一般都是用c语言自带的rand函数或者c++ random库里面的uniform distribution来完成。但是这种采样方式对于pt算法本身来讲,并不利于最终图像的收敛,
现在的很多渲染器内部都是利用一种叫“低差异序列(Low Discrepancy Sequence)”的采样方式(详见https://zhuanlan.zhihu.com/p/20197323?columnSlug=graphics)。
c. pt算法的一大优势就是屏幕每个像素的采样都是独立的,每次迭代也是互相独立的,这就可以利用并行算法来对算法进行加速。CPU端有OpenMP,而且在诸多流行编译器内都直接内置集成了,
GPU端,N卡有CUDA,A卡可以利用OpenCL,而且渲染速度与GPU数量成线性增长关系,这就有点像暴力解算的意思,GPU端实现的最大区别就是不能递归(虽然N卡早已支持递归,但是
在计算规模很大的情况下每个线程的栈就很浅,再加上本身pt算法的采样光线有可能采样到很深的深度,所以并不是很敢用),递归主要发生在光线与场景求交的过程中,由于对场景采用的
是基于层级的划分(Bounding Volume Hierarchy)本身就有递归的意思。不能递归这点只要自己构造合理数据结构,把递归所用的那些信息尽量在构造BVH的时候自然写入结构内部即可,
可以比较容易的将递归降级为循环。
(ps:也难怪现在很多步进式渲染器都采用pathtracing...实现简单,开发简单,而且要提速的话直接借助并行算法,只不过多配几个GPU就行了,这就节省了一堆人力,成本就低...)
而最近浏览网页也发现了,很多人和企业也倾向于利用CPU渲染,具体请参见http://stackoverflow.com/questions/38029698/why-do-we-use-cpus-for-ray-tracing-instead-of-gpus
里面有个人的回答也是不错:
I'm one of the rendering software architects at a large VFX and animated feature studio with a proprietary renderer (not Pixar, though I was once the rendering software architect there as well, long, long ago).
Almost all high-quality rendering for film (at all the big studios, with all the major renderers) is CPU only. There are a bunch of reasons why this is the case. In no particular order, some of the really compelling ones to give you the flavor of the issues:
GPUs only go fast when everything is in memory. The biggest GPU cards have, what, 12GB or so, and it has to hold everything. Well, we routinely render scenes with 30GB of geometry and that reference 1TB or more of texture. Can't load that into GPU memory, it's literally two orders of magnitude too big. So GPUs are simply unable to deal with our biggest (or even average) scenes. (With CPU renderers, we can page stuff from disk whenever we need. GPUs aren't good at that.)
Don't believe the hype, ray tracing with GPUs is not an obvious win over CPU. GPUs are great at highly coherent work (doing the same things to lots of data at once). Ray tracing is very incoherent (each ray can go a different direction, intersect different objects, shade different materials, access different textures), and so this access pattern degrades GPU performance very severely. It's only very recently that GPU ray tracing could match the best CPU-based ray tracing code, and even though it has surpassed it, it's not by much, not enough to throw out all the old code and start fresh with buggy fragile code for GPUs. And the biggest, most expensive scenes are the ones where GPUs are only marginally faster. Being lots faster on the easy scenes is not really important to us.
If you have 50 or 100 man years of production-hardened code in your CPU-based renderer, you just don't throw it out and start over in order to get a 2x speedup. Software engineering effort, stability, and so on, is more important and a bigger cost factor.
Similarly, if your studio has an investment in a data center holding 20,000 CPU cores, all in the smallest, most power and heat-efficient form factor you can, that's also a sunk cost investment you don't just throw away. Replacing them with new machines containing top of the line GPUs vastly increases the cost of your render farm, and they are bigger and produce more heat, so it literally might not fit in your building.
Amdahl's Law: The actual "rendering" per se is only one stage in generating the scenes, and GPUs don't help with it. Let's say that it takes 1 hour to fully generate and export the scene to the renderer, and 9 hours to "render", and out of that 9 hours, an hour is reading texture, volumes, and other data from disk. So out of the total 10 hours of how the user experiences rendering (push button until final image is ready), 8 hours is potentially sped up with GPUs. So, even if GPU was 10x as fast as CPU for that part, you go from 10 hours to 1+1+0.8 = nearly 3 hours. So 10x GPU speedup only translates to 3x actual gain. If GPU was 1,000,000x faster than CPU for ray tracing, you still have 1+1+tiny, which is only a 5x speedup.
But what's different about games? Why are GPUs good for games but not film?
First of all, when you make a game, remember that it's got to render in real time -- that means your most important constraint is the 60Hz (or whatever) frame rate, and you sacrifice quality or features where necessary to achieve that. In contrast, with film, the unbreakable constraint is making the director and VFX supervisor happy with the quality and look he or she wants, and how long it takes you to get that is (to a degree) secondary.
Also, with a game, you render frame after frame after frame, live in front of every user. But with film, you effectively are rendering ONCE, and what's delivered to theaters is a movie file -- so moviegoers will never know or care if it took you 10 hours per frame, but they will notice if it doesn't look good. So again, there is less of a penalty placed on those renders taking a long time, as long as they look fabulous.
With a game, you don't really know what frames you are going to render, since the player may wander all around the world, view from just about anywhere. You can't and shouldn't try to make it all perfect, you just want it to be good enough all the time. But for a film, the shots are all hand-crafted! A tremendous amount of human time goes into composing, animating, lighting, and compositing every shot, and then you only need to render it once. Think about the economics -- once 10 days of calendar (and salary) has gone into lighting and compositing the shot just right, the advantage of rendering it in an hour (or even a minute) versus overnight, is pretty small, and not worth any sacrifice of quality or achievable complexity of the image.
对pathtracing的一些个人理解的更多相关文章
- 理解CSS视觉格式化
前面的话 CSS视觉格式化这个词可能比较陌生,但说起盒模型可能就恍然大悟了.实际上,盒模型只是CSS视觉格式化的一部分.视觉格式化分为块级和行内两种处理方式.理解视觉格式化,可以确定得到的效果是应 ...
- 彻底理解AC多模式匹配算法
(本文尤其适合遍览网上的讲解而仍百思不得姐的同学) 一.原理 AC自动机首先将模式组记录为Trie字典树的形式,以节点表示不同状态,边上标以字母表中的字符,表示状态的转移.根节点状态记为0状态,表示起 ...
- 理解加密算法(三)——创建CA机构,签发证书并开始TLS通信
接理解加密算法(一)--加密算法分类.理解加密算法(二)--TLS/SSL 1 不安全的TCP通信 普通的TCP通信数据是明文传输的,所以存在数据泄露和被篡改的风险,我们可以写一段测试代码试验一下. ...
- node.js学习(三)简单的node程序&&模块简单使用&&commonJS规范&&深入理解模块原理
一.一个简单的node程序 1.新建一个txt文件 2.修改后缀 修改之后会弹出这个,点击"是" 3.运行test.js 源文件 使用node.js运行之后的. 如果该路径下没有该 ...
- 如何一步一步用DDD设计一个电商网站(一)—— 先理解核心概念
一.前言 DDD(领域驱动设计)的一些介绍网上资料很多,这里就不继续描述了.自己使用领域驱动设计摸滚打爬也有2年多的时间,出于对知识的总结和分享,也是对自我理解的一个公开检验,介于博客园这个平 ...
- 学习AOP之透过Spring的Ioc理解Advisor
花了几天时间来学习Spring,突然明白一个问题,就是看书不能让人理解Spring,一方面要结合使用场景,另一方面要阅读源代码,这种方式理解起来事半功倍.那看书有什么用呢?主要还是扩展视野,毕竟书是别 ...
- ThreadLocal简单理解
在java开源项目的代码中看到一个类里ThreadLocal的属性: private static ThreadLocal<Boolean> clientMode = new Thread ...
- JS核心系列:理解 new 的运行机制
和其他高级语言一样 javascript 中也有 new 运算符,我们知道 new 运算符是用来实例化一个类,从而在内存中分配一个实例对象. 但在 javascript 中,万物皆对象,为什么还要通过 ...
- 深入理解JS 执行细节
javascript从定义到执行,JS引擎在实现层做了很多初始化工作,因此在学习JS引擎工作机制之前,我们需要引入几个相关的概念:执行环境栈.全局对象.执行环境.变量对象.活动对象.作用域和作用域链等 ...
随机推荐
- CHM文件无法打开或无法搜索
在确保CHM文件本身正常的前提下,检查c:\\windows\hh.exe和C:\\windows\system32\itss.dll和hhctrl.ocx三个文件是否存在. 如不存在,只需要从其他机 ...
- 如何用unity3d实现发送带附件的邮件
以Gmail为例.点击屏幕的Capture按钮得到当前屏幕截图,点击Send按钮将之前的截图作为附件发送邮件. using UnityEngine; using System.Collections; ...
- Elasticsearch搜索之cross_fields分析
cross_fields类型采用了一种以词条为中心(Term-centric)的方法,这种方法和best_fields及most_fields采用的以字段为中心(Field-centric)的方法有很 ...
- codeforces 528D Fuzzy Search
链接:http://codeforces.com/problemset/problem/528/D 正解:$FFT$. 很多字符串匹配的问题都可以用$FFT$来实现. 这道题是要求在左边和右边$k$个 ...
- 针对Mac的DuckHunter攻击演示
0x00 HID 攻击 HID是Human Interface Device的缩写,也就是人机交互设备,说通俗一点,HID设备一般指的是键盘.鼠标等等这一类用于为计算机提供数据输入的设备. DuckH ...
- 蓝桥杯-循环节长度-java
/* (程序头部注释开始) * 程序的版权和版本声明部分 * Copyright (c) 2016, 广州科技贸易职业学院信息工程系学生 * All rights reserved. * 文件名称: ...
- 计算机的启动和Linux的启动
计算机的启动和Linux的启动 一 计算机的启动 计算机的启动过程分为四个阶段,分别是:BIOS.MBR.启动管理程序.加载操作系统内核.操作系统启动. 1.1 BIOS 计算机加电后,第一件 ...
- win10 64位下装Virtual Box安装Linux(centOS)配置联网
第一步:安装VritualBox 百度"VritualBox"下载安装即可: 第二步:下载Linux镜像系统并安装 这里写出我参照的博客,很详细,我就不累赘了! 原文地址:http ...
- zoj 1013 Great Equipment DP
题目链接:http://acm.zju.edu.cn/onlinejudge/showProblem.do?problemId=13 很经典的一个DP的题目 定义dp[i][num1][num2]表示 ...
- jQuery 一个你意想不到的代码神器!
jQuery 一个你意想不到的代码神器! jQuery 选择器.(最简单,最基本) jquery选择器的优势: (1) 简洁的写法,$()函数 (2)支持CSS1到CSS3选择器 (3)完善的处理机制 ...