Rendering in UE4
Intro
Think performance.
Identify the target framerate and aim your whole approach at hitting it.
- Everything needs to be as efficient as possible
- Adjust pipelines to engine and hardware restrictions
- Try to offload parts to pre-calculations
- Use the engine’s pool of techniques to achieve quality at suitable cost
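A target framerate translates directly into a time budget per frame, which every stage described in these notes has to share. A minimal sketch of that arithmetic (the function name is hypothetical, not a UE4 API):

```python
def frame_budget_ms(target_fps: float) -> float:
    """Time available to produce one frame, in milliseconds."""
    return 1000.0 / target_fps

# Print the budget for a few common targets.
for fps in (30, 60, 90):
    print(f"{fps} fps -> {frame_budget_ms(fps):.2f} ms per frame")
```

Whatever the Game thread, Draw thread and GPU each need must fit inside this budget.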
What is rendering?
- CPU and GPU handle different parts of the rendering calculations
- They are interdependent and can bottleneck each other
- Know how the load is distributed between the two
Shading Techniques
- Deferred shading
- Compositing-based, using the G-Buffer
- Shading happens in deferred passes
- Good at rendering dynamic lighting
- More flexible when it comes to disabling features, less flexible when it comes to surface attributes
- Forward shading
Before Rendering
Rendering threads
Rendering is a heavily parallel process. It happens on multiple threads. The main ones are CPU (Game), CPU (Draw) and
GPU, but in reality there are many threads that branch and converge.
UE4 console commands: stat unit, stat unitgraph
CPU – Game thread
Before we can render anything, we first need to know where everything will be.
Calculate all logic and transforms:
- Animations
- Positions of models and objects
- Physics
- AI
- Spawn and destroy, hide and unhide
Results: UE4 now knows all transforms of all models.
CPU – Draw thread
Before we can use the transforms to render the image, we need to know what to include in the rendering. Ignoring this
question can make rendering expensive on the GPU.
Occlusion process – Builds up a list of all visible models/objects, happens per object – not per triangle
4 Stage process
- Distance Culling (manually, LOD Component, Cull Distance Volume)
- Frustum Culling (keeps what is in front of the camera; a wider FOV means more objects to render)
- Precomputed Visibility
- Occlusion Culling
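The first two stages can be sketched per object with bounding spheres. This is only an illustration of the idea, not how UE4 implements it. The `Sphere` type, the function names and the plane convention (unit normals pointing into the frustum) are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Sphere:
    center: tuple   # (x, y, z) world-space centre of the object's bounds
    radius: float

def distance_culled(bounds, camera_pos, max_draw_distance):
    """True if the whole sphere lies beyond the maximum draw distance."""
    dx, dy, dz = (bounds.center[i] - camera_pos[i] for i in range(3))
    dist = (dx * dx + dy * dy + dz * dz) ** 0.5
    return dist - bounds.radius > max_draw_distance

def frustum_culled(bounds, planes):
    """True if the sphere is fully behind any frustum plane.

    Each plane is (nx, ny, nz, d) with a unit normal pointing into the frustum.
    """
    cx, cy, cz = bounds.center
    for nx, ny, nz, d in planes:
        signed_distance = nx * cx + ny * cy + nz * cz + d
        if signed_distance < -bounds.radius:
            return True
    return False

# Hypothetical frustum reduced to a near plane (z = 1) and a far plane (z = 100).
planes = [(0.0, 0.0, 1.0, -1.0), (0.0, 0.0, -1.0, 100.0)]
in_front = Sphere(center=(0.0, 0.0, 50.0), radius=1.0)
behind = Sphere(center=(0.0, 0.0, -10.0), radius=1.0)
print(frustum_culled(in_front, planes), frustum_culled(behind, planes))
```

Both tests work on whole objects, which matches the note that this process happens per object and not per triangle.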
Precomputed Visibility answers more complex occlusion questions, such as which objects are occluded by other objects.
It divides the scene into a grid, and each grid cell remembers what is visible from that location.
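Conceptually, the grid behaves like a lookup table keyed by the cell the camera is in. A minimal sketch under assumed names: the cell size, the table contents and the object IDs are made up, and the real data is baked by the engine.

```python
CELL_SIZE = 100.0  # hypothetical cell size in world units

# Hypothetical baked data: for each grid cell, the set of objects visible from it.
visible_from_cell = {
    (0, 0): {"house", "tree"},
    (1, 0): {"tree"},
}

def cell_of(position):
    """Map a world position to the 2D grid cell that contains it."""
    x, y, _z = position
    return (int(x // CELL_SIZE), int(y // CELL_SIZE))

def precomputed_visible(camera_pos, object_id):
    """Look up visibility; cells without baked data leave the decision to later stages."""
    visible = visible_from_cell.get(cell_of(camera_pos))
    return True if visible is None else object_id in visible

print(precomputed_visible((50.0, 20.0, 0.0), "house"))
print(precomputed_visible((150.0, 20.0, 0.0), "house"))
```

The lookup is cheap at runtime because the expensive visibility question was answered ahead of time.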
Dynamic Occlusion Culling checks the visibility state of every model. It mostly runs on the CPU, but some parts are
handled by the GPU.
Occlusion Performance Implication
- Set up manual culling (e.g. distance culling, precomputed visibility)
- Even things like particles occlude
- Many small objects cause more stress on CPU for culling
- Large models are rarely occluded and thus increase GPU load
- Know your world and balance objects size vs count
Results: UE4 now has a list of models to render.
Geometry Rendering
The GPU now has a list of models and transforms, but if we just render this info out we could cause a lot of
redundant pixel rendering. Similar to excluding objects, we need to exclude pixels, so we need to figure out which pixels
are occluded.
To do this, we generate a depth pass and use it to determine whether a given pixel is in front and visible.
GPU – Prepass / Early-Z Pass
Example: render the teapot first, then render the box.
That strategy does not work on its own. That is why we depend on the depth of the pixels to know whether an object or
pixel is behind or in front of another object, and then decide whether we need to render it.
Question 1. How does the renderer associate the early-Z pass with an actual object in the scene?
It does not associate it object by object. It knows the position of each pixel on the screen, so when it
needs to render a pixel of an object it can compare against the depth stored at that position and either ignore the pixel or keep it.
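The idea can be sketched with a tiny software depth buffer: a first pass stores the nearest depth per pixel, and the second pass shades a fragment only if it matches that depth. The fragment tuples and function names below are hypothetical, and a real GPU does this in hardware.

```python
def depth_prepass(fragments, width, height):
    """Store the nearest depth seen at each pixel (smaller z is closer)."""
    depth = [[float("inf")] * width for _ in range(height)]
    for x, y, z, _name in fragments:
        if z < depth[y][x]:
            depth[y][x] = z
    return depth

def shade_with_early_z(fragments, depth):
    """Run the (expensive) shading step only for fragments that pass the depth test."""
    shaded = []
    for x, y, z, name in fragments:
        if z <= depth[y][x]:
            shaded.append((x, y, name))
    return shaded

# The teapot is submitted first, but the box is closer at pixel (0, 0).
fragments = [
    (0, 0, 5.0, "teapot"),
    (0, 0, 2.0, "box"),
    (1, 0, 5.0, "teapot"),
]
depth = depth_prepass(fragments, width=2, height=1)
print(shade_with_early_z(fragments, depth))
```

The hidden teapot fragment at pixel (0, 0) is never shaded, regardless of submission order, and no per-object bookkeeping is needed.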
Draw calls
- Now we are ready to actually render some geometry. To be efficient, the GPU renders drawcall by
drawcall, not triangle by triangle.
- A drawcall is a group of triangles that share the same properties.
- Drawcalls are prepared by the CPU (Draw) thread
- Distilling rendering info for objects into a GPU state ready for submission
2,000 to 3,000 drawcalls is reasonable, more than 5,000 is getting high, and more than 10,000 is probably a problem. On mobile this number
is far lower (a few hundred at most). The number of drawcalls is determined by the visible objects.
- Drawcalls have a huge impact on the CPU (draw) thread
- Has high overhead for preparing GPU states
- Usually we hit the issues with drawcalls way before issues with tri count.
An analogy for the overhead of drawcalls versus triangles:
copying a single 1 GB file vs. copying 1 million 1 KB files.
Drawcalls performance implications:
- Render your triangles with as few drawcalls as possible
- 50,000 triangles can run worse than 50 million, depending on scene setup (drawcalls)
- When optimizing a scene, know your bottleneck (drawcalls vs. tri count)
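A toy cost model makes the 50,000 versus 50 million claim concrete. Both cost constants are invented purely for illustration and say nothing about real hardware. The model only assumes that the CPU (Draw) cost grows with drawcalls, the GPU cost grows with triangles, and the slower of the two sets the frame time.

```python
# Invented constants for illustration only; profile a real scene instead.
MS_PER_DRAWCALL = 0.005       # hypothetical CPU (Draw) cost per drawcall
MS_PER_MILLION_TRIS = 0.2     # hypothetical GPU cost per million triangles

def estimated_frame_ms(drawcalls, triangles):
    cpu_ms = drawcalls * MS_PER_DRAWCALL
    gpu_ms = triangles / 1_000_000 * MS_PER_MILLION_TRIS
    # CPU and GPU work in parallel, so the slower side is the bottleneck.
    return max(cpu_ms, gpu_ms)

# 50,000 triangles spread over 50,000 drawcalls vs. 50 million triangles in 100 drawcalls.
scattered = estimated_frame_ms(drawcalls=50_000, triangles=50_000)
batched = estimated_frame_ms(drawcalls=100, triangles=50_000_000)
print(f"scattered: {scattered:.1f} ms, batched: {batched:.1f} ms")
```

Under these assumed numbers the low-poly but scattered scene is CPU bound and slower than the heavy but well-batched one.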
Optimizing Drawcalls
Merging objects
To lower the drawcalls, it is better to use fewer, larger models than many small ones. You can overdo this, though, because it impacts
other things negatively:
- Occlusion
- Lightmapping
- Collision calculation
- Memory
A good balance between size and count is a good strategy.
The drawcall count is directly related to how many objects you have and how many unique material IDs they use.
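As a rough rule of thumb, that relationship can be written as one drawcall per material ID on each visible object. The sketch below is a simplification that ignores multiple passes and instancing, and the object names are hypothetical.

```python
def estimate_drawcalls(visible_objects):
    """Rough estimate: one drawcall per material ID on each visible object."""
    return sum(material_ids for _name, material_ids in visible_objects)

# 100 separate rocks with one material each vs. the same rocks merged into one mesh.
separate_rocks = [(f"rock_{i}", 1) for i in range(100)]
merged_rocks = [("rock_cluster", 1)]

print(estimate_drawcalls(separate_rocks), estimate_drawcalls(merged_rocks))
```

Merging only pays off like this when the meshes share a material, which is why the guidelines below single that out.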
Merging guidelines
- Target low poly objects
- Merge only meshes within the same area
- Merge only meshes sharing the same material
- Meshes with no or simple collision are better for merging
- Distant geometry is usually great to merge (fine with culling)
HLODs
Hierarchical Level of Detail
- Regular LODs mean a model becomes lower poly in the distance
- Essentially swaps one object for another, simpler object (fewer materials)
- Hierarchical LOD (HLOD) is a bigger version: it merges objects together in the distance to lower the drawcalls
- Groups objects together into single drawcalls
- Grouping needs to be done manually
Instanced Rendering
- Groups objects together into single drawcalls
- Grouping needs to be done manually
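Conceptually, instancing collects every object that uses the same mesh and material into one batch, so a single drawcall carries a list of transforms. A minimal sketch of that grouping step (the tuple layout and names are assumptions, not the UE4 API):

```python
from collections import defaultdict

def build_instance_batches(objects):
    """Group (mesh, material, transform) entries into one batch per mesh/material pair."""
    batches = defaultdict(list)
    for mesh, material, transform in objects:
        batches[(mesh, material)].append(transform)
    return dict(batches)

# Hypothetical scene: two pines sharing mesh and material, plus one rock.
objects = [
    ("pine", "bark", (0.0, 0.0, 0.0)),
    ("pine", "bark", (5.0, 0.0, 0.0)),
    ("rock", "stone", (2.0, 3.0, 0.0)),
]
batches = build_instance_batches(objects)
print(len(objects), "objects ->", len(batches), "drawcalls")
```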
Vertex Processing
The first step in processing a drawcall.
The vertex shader takes care of this process.
A vertex shader is a small program specialized in vertex processing.
It runs completely on the GPU, so it is fast.
Input is vertex data in 3D space; output is vertex data in screen space.
Vertex shaders – Common tasks
- It converts local vertex positions to world positions
- It handles vertex shading/coloring
- It can apply additional offsets to vertex positions
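The sketch below walks one vertex through these tasks on the CPU, purely for illustration. It makes simplifying assumptions that are not UE4's actual conventions: a translation-only world matrix, a camera at the origin looking down +z with no view matrix, and a single focal length instead of a full projection matrix.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

def local_to_world(local_pos, world_matrix, offset=(0.0, 0.0, 0.0)):
    """Local position -> world position, plus an optional world position offset."""
    x, y, z, _w = mat_vec(world_matrix, [*local_pos, 1.0])
    return (x + offset[0], y + offset[1], z + offset[2])

def world_to_screen(world_pos, focal_length, width, height):
    """Perspective divide, then map [-1, 1] to pixel coordinates (y pointing down)."""
    x, y, z = world_pos
    ndc_x = focal_length * x / z
    ndc_y = focal_length * y / z
    return ((ndc_x * 0.5 + 0.5) * width, (1.0 - (ndc_y * 0.5 + 0.5)) * height)

world = local_to_world((1.0, 0.0, 0.0), translation(4.0, 0.0, 10.0))
print(world, world_to_screen(world, focal_length=1.0, width=1920, height=1080))
```

The `offset` parameter is where an effect such as wind or water displacement would feed in its per-vertex displacement.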
Practical examples of world position offset vertex shaders are
- Cloth
- Water displacement
- Foliage wind animation
Why animate things this way?
It scales to very high vertex counts. Imagine a forest: it could involve millions of vertices to animate.
Vertex shaders do not modify the actual object or affect the scene state. The result is purely a visual effect.
The CPU is not aware of what the vertex shaders do, so things like physics or collisions will not take it into account.
Vertex Shaders Performance Implications
- The more complex the animations performed the slower
- The more vertices affected the slower
- Disable complex vertex shader effects on distant geometry
Rasterizing and G-Buffer
Rasterizing
The GPU is now ready to render pixels. Determining which pixels should be shaded is called rasterizing. It is done drawcall by drawcall,
then tri by tri.
In the illustration (a magnified pixel grid), rasterizing the blue triangle yields the orange pixels it covers. Again, this happens
drawcall by drawcall to be more efficient, and then triangle by triangle in the same order the triangles were submitted to the GPU.
Pixel shaders are responsible for calculating the pixel color. Input is generally interpolated vertex data, texture
samplers, etc.
Rasterizing inefficiency
When rasterizing dense meshes at a distance, they converge to only a few pixels. This wastes vertex processing.
E.g. a 100k-tri object seen from so far away that it is 1 pixel big will only show 1 pixel of its closest triangle!
Overshading
Due to hardware design, the GPU always processes pixels in 2x2 pixel quads. If a triangle is very small or very thin,
it might process 4 pixels while only 1 pixel is actually filled.
In the illustration, the gray grid is the pixels and the orange grid on top is the pixel quads that the GPU processes.
With a tiny triangle like the one in the first picture, ideally only three pixels are covered, so we would only need to process those
three pixels to output the final color. In reality, the GPU needs to process
12 pixels just to render those three pixels. This is our first waste of pixel processing for small
triangles.
Even worse cases exist.
To visualize overshading: Lit -> Optimization View modes -> Quad Overdraw.
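The 3-pixel versus 12-pixel example can be reproduced by counting how many 2x2 quads the covered pixels touch. The pixel coordinates below are hypothetical, chosen so that the three covered pixels fall into three different quads.

```python
def quad_overshading(covered_pixels):
    """Return (pixels processed, processed-to-covered ratio) for 2x2 quad shading."""
    quads = {(x // 2, y // 2) for x, y in covered_pixels}
    processed = len(quads) * 4
    return processed, processed / len(covered_pixels)

# Three covered pixels that straddle three different 2x2 quads.
tiny_triangle = [(1, 1), (2, 1), (2, 2)]
print(quad_overshading(tiny_triangle))
```

The ratio is the wasted work multiplier: every extra pixel in a touched quad still runs the pixel shader.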
Rasterization and Overshading Performance Implications
- Triangles are more expensive to render in great density
- When seen at a distance the density increases
- Thus, reducing triangle count at a distance (lodding / culling) is critical
- Very thin triangles are inefficient because they pass through many 2x2 pixel quads yet only fill a fraction of them
- The more complex the pixel shader is, the more expensive overshading becomes
Results are written out to:
- Multiple G-Buffers in case of deferred shading
- Shaded buffer in case of forward shading
G-Buffer
A G-Buffer is a rendered image encoding special data. These buffers are then used for different purposes, mainly lighting. The frame
is rendered out into multiple G-Buffers.
Custom Depth allows rendering out an additional mask, which in turn can be used for chroma keying.
G-Buffer Performance Implications
The G-Buffer takes a lot of memory and bandwidth, which limits how many different G-Buffer images you can render out.
G-Buffer memory is resolution dependent.
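The resolution dependence is simple arithmetic: width times height times the bytes stored per pixel across all targets. The layout below (five targets at 4 bytes per pixel) is a made-up example and not the actual UE4 G-Buffer layout.

```python
def gbuffer_bytes(width, height, bytes_per_pixel_per_target):
    """Total memory for all G-Buffer targets at the given resolution."""
    return width * height * sum(bytes_per_pixel_per_target)

# Hypothetical layout: five render targets, 4 bytes per pixel each.
layout = [4, 4, 4, 4, 4]
hd = gbuffer_bytes(1920, 1080, layout)
uhd = gbuffer_bytes(3840, 2160, layout)
print(f"1080p: {hd / 2**20:.1f} MiB, 4K: {uhd / 2**20:.1f} MiB")
```

Doubling the resolution on each axis quadruples the memory, and the bandwidth needed to read and write it grows the same way.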
Dynamic Lighting/Shadows
There are two approaches to lighting and shadows: dynamic and static.
UE4 separates the two. Dynamic lighting and shadow data is calculated in real time, which allows dynamic lights and dynamic objects
to be considered. The other type is static lighting and shadows, which are pre-calculated.
Lighting (Deferred Shading)
- Lighting is calculated and applied using pixel shaders
- Dynamic point lights are rendered as spheres
- The spheres act like a mask
- Anything within the sphere receives a pixel shader operation to blend in the dynamic light
Light attributes (e.g. color, falloff, intensity, etc.) are considered in the pixel shader.
The lighting calculation requires position. The depth buffer is used to get each pixel's position in 3D.
The normal buffer is used to apply shading, based on the direction between the normal and the light.
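Put together, the per-pixel work for one point light looks roughly like the sketch below: skip pixels outside the light's sphere, then combine the normal, the direction to the light and a falloff. The function name is hypothetical and the squared linear falloff is an arbitrary stand-in, not UE4's falloff model.

```python
import math

def shade_point_light(pixel_pos, normal, light_pos, light_color, intensity, radius):
    """Lambert-style contribution of one point light to one pixel."""
    to_light = [l - p for l, p in zip(light_pos, pixel_pos)]
    dist = math.sqrt(sum(c * c for c in to_light))
    if dist >= radius:
        return (0.0, 0.0, 0.0)  # outside the light's sphere: no work needed
    # Direction from the surface to the light (fall back to the normal at distance 0).
    direction = [c / dist for c in to_light] if dist > 0.0 else list(normal)
    n_dot_l = max(0.0, sum(n * d for n, d in zip(normal, direction)))
    falloff = (1.0 - dist / radius) ** 2  # stand-in falloff curve (assumption)
    return tuple(c * intensity * n_dot_l * falloff for c in light_color)

# Pixel position would come from the depth buffer, the normal from the normal buffer.
print(shade_point_light((0, 0, 0), (0, 0, 1), (0, 0, 5), (1.0, 1.0, 1.0), 1.0, 10.0))
```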
Shadows
A common technique for rendering shadows is shadow maps. The concept behind shadow maps is checking, for each
pixel, whether it is visible to the given light. To do this we need to render a depth pass, but instead of
rendering it from the camera's point of view we render it from the light's point of view.
To generate a depth pass from the light's point of view, we need to go back to the beginning of the rendering
process: project the geometry and repeat much of the processing and optimization
we went through for the main passes. It is almost as if we render the scene twice, just to calculate the shadows.
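A minimal sketch of the shadow map test, assuming a light that looks straight down from a fixed height and one shadow map texel per integer (x, y) position. The function names, the bias value and the scene points are hypothetical.

```python
def render_shadow_map(points, light_height):
    """Depth pass from the light: nearest distance to the light per (x, y) texel."""
    shadow_map = {}
    for x, y, z in points:
        depth = light_height - z
        key = (int(x), int(y))
        shadow_map[key] = min(depth, shadow_map.get(key, float("inf")))
    return shadow_map

def in_shadow(point, shadow_map, light_height, bias=0.01):
    """A point is shadowed if something closer to the light covers its texel."""
    x, y, z = point
    depth = light_height - z
    return depth > shadow_map.get((int(x), int(y)), float("inf")) + bias

# Ground at z = 0, with a box top at z = 2 above the texel (0, 0).
scene = [(0, 0, 0.0), (0, 0, 2.0), (1, 0, 0.0)]
shadow_map = render_shadow_map(scene, light_height=10.0)
print(in_shadow((0, 0, 0.0), shadow_map, 10.0), in_shadow((1, 0, 0.0), shadow_map, 10.0))
```

The extra pass over the scene in `render_shadow_map` is the part that makes dynamic shadows expensive.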
Process Pros/Cons
- Pros
- Is rendered in real time using the G-Buffer
- Lights can be changed, moved, or added/removed at any time
- Does not need any special model preparation
- Cons
- Shadows especially are performance heavy
Quality Pros/Cons
- Shadows are heavy on performance, so usually render quality is reduced to compensate
- Does not do radiosity/global illumination for majority of content
- Dynamic soft shadows are very hard to do well; dynamic shadows often look sharp or blocky
Dynamic Lighting Performance Implications
- A small dynamic light is relatively cheap in a deferred renderer
- The cost is down to pixel shader operations, so the more pixels the slower it is
- The radius must be as small as possible
- Prevent excessive and regular overlap
In UE4,
- Turn off shadow casting if not needed
- The triangle count of geometry affects shadow performance
- Fade or toggle off shadows when far away
Static Lighting/Shadows
Dynamic lights and shadows are expensive, so part of the work is offloaded to pre-calculation / pre-rendering. This is referred to as
static lights and shadows.
Lighting data is stored mainly in lightmaps.
Lightmaps
A Lightmap is a texture with the lighting and shadows baked into it.
An object usually requires lightmap UV coordinates for this to work; the texture is then multiplied on top of the base color.
Working out lightmaps manually for large scenes is not scalable, so lightmaps are generated by UE4 using Lightmass.
Lightmass
- Standalone application that handles light rendering, baking to lightmaps and integrating into materials
- Ray tracer supporting GI
- Supports distributed rendering over a network
- Light Build Quality, as well as the settings in the Lightmass section of each level, determines bake quality
- It is better to have a Lightmass Importance Volume around parts of the world
Process Pros/Cons
- Super fast at runtime, but increases memory
- Takes a long time to pre-calculate lighting
- Each time something is changed, it must be re-rendered
- Models require lightmap UVs, an additional prep step that takes time
Quality Pros/Cons
- Handles radiosity and global illumination
- Renders realistic shadows, including soft shadows
- Quality is dependent on lightmap resolution and UV layout
- May have seams in the lighting due to the UV layout
Static Lighting Performance Implications
- Static lighting always renders at the same speed
- Lightmap resolution affects memory and file size, not framerate
- Bake times are increased by:
a) Lightmap resolutions
b) Number of models/lights
c) Higher quality settings
d) Lights with a large attenuation radius or source radius
Lighting and Shadows
Mixing
Mixing static and dynamic is often (but not always) the best way to go
- Use static for weak and distant lighting
- Use static to render indirect lighting
- Use dynamic lighting on top of the static to better accentuate the shading and shadows and provide an
interactive layer on top of the static result
Post Processing
Visual effects applied at the very end of the rendering process
Uses the G-Buffers to calculate the effects
Once more relies heavily on pixel shaders
Examples:
Light Bloom, Depth of Field/Blurring, Some types of lens flares, Light Shafts, Vignette, Tone Mapping, Color
correction, Exposure, Motion Blur.
Post Processing Performance Implications
- Affected directly by final resolution
- Affected by shader complexity
- Affected by effect parameters
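A post-process effect is essentially a pixel shader run once per pixel of the final image, so its cost scales with resolution. The sketch below uses the simple Reinhard operator as a stand-in for tone mapping. It is not the tone mapper UE4 ships with, and the function names are hypothetical.

```python
def reinhard(value):
    """Simple Reinhard tone-mapping operator: compresses HDR values into [0, 1)."""
    return value / (1.0 + value)

def tone_map(image):
    """Apply the operator to every channel of every pixel (one 'shader run' per pixel)."""
    return [[tuple(reinhard(c) for c in pixel) for pixel in row] for row in image]

def full_screen_pass_pixels(width, height):
    """Number of pixel shader invocations for one full-screen pass."""
    return width * height

print(tone_map([[(1.0, 3.0, 0.0)]]))
print(full_screen_pass_pixels(3840, 2160) / full_screen_pass_pixels(1920, 1080))
```

Every additional full-screen effect repeats that per-pixel cost, which is why final resolution matters so much here.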