GPU Command Buffer
These are mostly just notes on the GPU command buffer.
The GPU Command Buffer system is the way in which Chrome talks to the GPU, through either OpenGL or OpenGL ES (or OpenGL ES emulated through ANGLE). It is designed to have an API that emulates the OpenGL ES 2.0 API, enforcing the restrictions of that API and working around incompatibilities in drivers and platforms.
Goals:
The #1 goal of the command buffer system is security. Graphics systems in OSes have gaping security holes. Two simple examples: first, you can allocate a texture or a buffer and the memory returned is left as is. That memory is often left over from other applications and could contain passwords, images or other data that should not be visible to the calling app. Second, many API functions are buggy or have poorly designed APIs that make them easy to call in ways that would crash the browser. The #1 goal of the GPU process is to prevent these problems.
The #2 goal is compatibility across systems. From the POV of clients there should be no differences in behavior across systems. In some cases that means enforcing restrictions that are not there on the actual system. Examples include disabling advanced GLSL features. Others involve working around bugs by re-writing shaders or other techniques.
The #3 goal is speed. Speed is why a command buffer implementation was chosen. The client can write commands very quickly with little or no communication with the service, and only once in a while tell the service it has written more commands. For example, another implementation could have used a separate IPC for each OpenGL ES 2.0 function, but that would arguably be too slow. The command buffer gets another speed boost because it effectively parallelizes calls into the OS graphics API. A call like glUniform or glDrawArrays might be very expensive, but because of the command buffer the client just writes a few bytes into the command buffer and is done. The GPU process makes the real OpenGL call in another process, which effectively makes the program multi-core.
Implementation:
The basic implementation is a "command buffer". A client (the render process, a Pepper plugin, etc.) writes commands into some shared memory. It updates a 'put' pointer through IPC, telling the GPU process how far it has written into that buffer. The GPU process, or service, then reads commands from that buffer. For each command it validates the command, its arguments, and whether or not the arguments are appropriate for the current state of the OS's graphics API, and only then makes the actual call into the OS. This means even a compromised renderer running native code and writing its own commands can hopefully not get the GPU process to call the graphics system in such a way as to compromise the system.
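The put/get scheme described above can be sketched as a small ring buffer. This is only an illustration under stated assumptions: the class name RingBufferSketch and its methods are hypothetical, the entries live in an ordinary vector rather than shared memory, and the 'put' value is passed as a function argument rather than over IPC.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of a command ring buffer. In the real
// system the entries live in shared memory and 'put' travels over IPC.
class RingBufferSketch {
 public:
  explicit RingBufferSketch(uint32_t entry_count)
      : entries_(entry_count), put_(0), get_(0) {
    assert(entry_count >= 2);
  }

  // Client side: entries that can be written without overtaking the reader.
  // One slot always stays empty so that put == get means "empty".
  uint32_t FreeEntries() const {
    const uint32_t size = static_cast<uint32_t>(entries_.size());
    return (get_ + size - put_ - 1) % size;
  }

  // Client side: append one entry and advance 'put'. Returns false if full.
  bool Write(uint32_t value) {
    if (FreeEntries() == 0) return false;
    entries_[put_] = value;
    put_ = (put_ + 1) % static_cast<uint32_t>(entries_.size());
    return true;
  }

  // Service side: consume everything up to the 'put' value reported by the
  // client. The reported value is untrusted, so it is range checked first.
  bool ProcessUpTo(uint32_t reported_put, std::vector<uint32_t>* out) {
    if (reported_put >= entries_.size()) return false;  // rogue client
    while (get_ != reported_put) {
      out->push_back(entries_[get_]);
      get_ = (get_ + 1) % static_cast<uint32_t>(entries_.size());
    }
    return true;
  }

  uint32_t put() const { return put_; }
  uint32_t get() const { return get_; }

 private:
  std::vector<uint32_t> entries_;
  uint32_t put_;  // next entry the client will write
  uint32_t get_;  // next entry the service will read
};
```

The client can call Write many times and report put() only occasionally, which is the batching behavior the speed goal relies on.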
When writing new service side code, please keep that in mind. Never design a new command that requires the client to be well behaved. Assume the client can go rogue. For example, make sure the service's bookkeeping will never be wrong no matter what the client does.
API Layers:
Life of a GL call in Chrome
In simple terms:
gl2.h->gles2_c_lib.cc->GLES2Implementation->GLES2CmdHelper...SharedMemory...->GLES2DecoderImpl->ui/gfx/gl/gl_bindings->OpenGL
There is an interface, CommandBuffer, that is responsible for coordinating communication between GLES2CmdHelper and GLES2DecoderImpl. It has methods for creating and deleting shared memory as well as communicating the current state back and forth: specifically, sending the latest 'put' pointer from the client through AsyncFlush() or Flush(), and getting the latest 'get' pointer from the results of Flush().
An implementation of CommandBuffer called CommandBufferService talks directly to GLES2DecoderImpl. If you had a single-threaded, single-process Chrome you could pass an instance of CommandBufferService to GLES2CmdHelper and the idea is that things would just work. In the real multi-process Chrome there is another implementation, CommandBufferProxy, which uses IPC to talk from the client to the service through GpuCommandBufferStub to GpuScheduler to CommandBufferService.
Client side code:
Note: Everything in src/gpu/command_buffer/client and src/gpu/command_buffer/common must compile WITHOUT EXTRA LIBRARIES, as these directories are used in the untrusted Pepper plugin.
These define the public OpenGL ES 2.0 interface
src/third_party/khronos/GLES2/gl2.h
src/third_party/khronos/GLES2/gl2ext.h
This defines the C interface. Most of this is auto generated
src/gpu/command_buffer/client/gles2_c_lib.cc
src/gpu/command_buffer/client/gles2_c_lib_autogen.h
This is the actual client side implementation that writes commands into the command buffer. Most of this is auto generated.
src/gpu/command_buffer/client/gles2_implementation.cc
src/gpu/command_buffer/client/gles2_implementation_autogen.h
This is a mostly auto generated class to help with formatting commands.
src/gpu/command_buffer/client/gles2_cmd_helper.h
src/gpu/command_buffer/client/gles2_cmd_helper_autogen.h
These define the actual format of the commands
src/gpu/command_buffer/common/cmd_buffer_common.h
src/gpu/command_buffer/common/gles2_cmd_format.h
src/gpu/command_buffer/common/gles2_cmd_format_autogen.h
Service side code:
This is the code that reads the commands, validates them and calls OpenGL.
src/gpu/command_buffer/service/gles2_cmd_decoder.cc
src/gpu/command_buffer/service/gles2_cmd_decoder_autogen.cc
3 ways of transferring data
There are 3 ways of transferring data through the command buffer.
#1) In a command itself
Commands can either have a hard coded length (glUniform4f, for example, takes exactly a location and 4 floats) or they can have a variable length (glUniform4fv takes N sets of 4 floats). The data is inserted after the command in the command buffer and the length of the command itself is updated to contain that data.
Advantages:
* Easy. Fire and forget
Disadvantages:
* Commands have a maximum length of 1meg - 1
* Commands can only be as long as the command buffer itself.
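As a rough illustration of data carried inside a command, the sketch below appends a glUniform4fv-style command to a buffer of 32-bit entries: a header, the fixed arguments, then the float data, with the header's size field covering all of it. The header layout (20 size bits, 12 command bits), the function names and the command id are assumptions made for this example and are not taken from the Chromium source.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed field widths for this sketch only (20 + 12 = 32 bits).
constexpr uint32_t kSizeBits = 20;
constexpr uint32_t kMaxSizeInEntries = (1u << kSizeBits) - 1;
constexpr uint32_t kMaxCommandId = (1u << (32 - kSizeBits)) - 1;

// Rounds a byte count up to whole 32-bit entries without overflowing.
constexpr uint32_t BytesToEntries(uint32_t bytes) {
  return bytes / 4 + (bytes % 4 != 0 ? 1 : 0);
}

// Hypothetical header: low bits hold the command size in entries (header
// included), high bits hold the command id.
constexpr uint32_t PackHeader(uint32_t command_id, uint32_t size_in_entries) {
  return (command_id << kSizeBits) | size_in_entries;
}
constexpr uint32_t HeaderSize(uint32_t header) {
  return header & kMaxSizeInEntries;
}
constexpr uint32_t HeaderCommand(uint32_t header) {
  return header >> kSizeBits;
}

// Appends header, location, count and then the float data itself.
// Returns false if the arguments cannot be encoded.
bool AppendUniform4fvImmediate(std::vector<uint32_t>* buffer,
                               uint32_t command_id, int32_t location,
                               const std::vector<float>& values) {
  static_assert(sizeof(float) == 4, "sketch assumes 32-bit floats");
  if (command_id > kMaxCommandId) return false;
  if (values.empty() || values.size() % 4 != 0) return false;
  if (values.size() > kMaxSizeInEntries - 3) return false;  // too long

  const uint32_t data_bytes =
      static_cast<uint32_t>(values.size() * sizeof(float));
  const uint32_t total_entries = 3 + BytesToEntries(data_bytes);
  assert(total_entries <= kMaxSizeInEntries);

  uint32_t location_bits;
  std::memcpy(&location_bits, &location, sizeof(location_bits));

  buffer->push_back(PackHeader(command_id, total_entries));
  buffer->push_back(location_bits);
  buffer->push_back(static_cast<uint32_t>(values.size() / 4));  // count
  const size_t start = buffer->size();
  buffer->resize(start + BytesToEntries(data_bytes));
  std::memcpy(buffer->data() + start, values.data(), data_bytes);
  return true;
}
```

Because the size field has a fixed width, a command of this kind has a hard upper bound on its length, which is the first disadvantage listed above.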
#2) In shared memory
Some commands transfer data in shared memory. For TexImage2D, for example, the client puts the data into shared memory. The command itself has a shared memory id and an offset into that shared memory, as well as either an explicit or implicit size. For TexImage2D the size is implicit.
Advantages:
* Can transfer any size
* Can pre-allocate the shared memory and fill any time (glMapTexSubImage2D for example)
Disadvantages:
* Must check with the service that it has actually used the contents of the shared memory before reusing that memory.
#3) In a bucket
Buckets are a kind of abstraction of #2. You define a bucket by size (1 command), then transfer the data into the bucket through shared memory (n commands), and finally issue the command you really wanted to issue (ShaderSource, CompressedTexImage2D, ...) and reference the bucket.
The problem buckets attempt to solve is this: imagine you are trying to implement TexImage2D, you only have 1meg of shared memory and you are asked to send a 3meg texture. You can't call TexImage2D with your 3meg of data since you only have 1meg, so instead you call TexImage2D with no data to define your texture and then call TexSubImage2D 3 times to transfer your data. GLES2Implementation actually does this.
Now imagine you are trying to implement ShaderSource. You are passed a 3 meg string and you only have 1meg of shared memory. There is no ShaderSubSource function so you can't use the previous solution. Instead, you create a bucket of 3 meg, transfer the data to that bucket 1meg at a time, then issue the ShaderSource command referencing the bucket.
Advantages:
* Do not need a "SubData" command implemented
* Can handle data larger than shared memory
Disadvantages:
* Slower. Data has to be copied out of shared memory and into a bucket.
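The bucket flow (set the size, copy the data in chunks, then reference the bucket) might look roughly like the sketch below. BucketSketch and SendThroughBucket are hypothetical names, a plain vector stands in for the shared memory transfer buffer, and direct function calls stand in for commands.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical service-side bucket: a byte array whose size is set up front.
class BucketSketch {
 public:
  void SetSize(size_t size) { data_.assign(size, 0); }

  // Copies one chunk into the bucket. The range comes from the client, so
  // it is rejected if it does not fit entirely inside the bucket.
  bool SetData(const void* src, size_t offset, size_t size) {
    if (offset > data_.size() || size > data_.size() - offset) return false;
    if (size > 0) std::memcpy(data_.data() + offset, src, size);
    return true;
  }

  const std::vector<char>& data() const { return data_; }

 private:
  std::vector<char> data_;
};

// Client side: moves 'payload' into the bucket through a transfer buffer
// that may be much smaller than the payload. Returns the number of chunk
// transfers issued, or -1 on failure.
int SendThroughBucket(const std::string& payload, size_t transfer_buffer_size,
                      BucketSketch* bucket) {
  if (transfer_buffer_size == 0) return -1;
  std::vector<char> transfer_buffer(transfer_buffer_size);  // "shared memory"
  bucket->SetSize(payload.size());

  int chunks = 0;
  size_t offset = 0;
  while (offset < payload.size()) {
    const size_t chunk =
        std::min(transfer_buffer_size, payload.size() - offset);
    std::memcpy(transfer_buffer.data(), payload.data() + offset, chunk);
    if (!bucket->SetData(transfer_buffer.data(), offset, chunk)) return -1;
    offset += chunk;
    ++chunks;
  }
  return chunks;
}
```

The extra copy from the transfer buffer into the bucket is the cost named in the disadvantage above; once the loop finishes, a command such as ShaderSource would reference the bucket instead of the shared memory.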
Adding a command:
Here are some terse notes for adding a new command:
Texture issues
In order to prevent a user program from reading uninitialized VRAM, all textures must be cleared before being used. In order to increase the speed of programs that upload a lot of textures, this clearing happens lazily. If you call glTexImage(..., null) the command buffer will create the texture level and mark it as uncleared. Before any read or write to that texture the command buffer will clear it.
If you add functions that update textures you need to call the code that clears an uncleared level by calling TextureManager::ClearTextureLevel for the level in question. You can see examples of this in GLES2DecoderImpl::DoTexImage2D, GLES2DecoderImpl::DoCopyTexSubImage2D, etc.
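A minimal sketch of the lazy clearing idea follows. It is not the TextureManager code: TextureSketch and its methods are hypothetical, and stale VRAM is simulated by filling a new level with the byte 0xCD.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical model of one texture with lazily cleared levels.
class TextureSketch {
 public:
  // Models TexImage2D with no data: the level is allocated but its contents
  // are whatever was in memory before (simulated here with 0xCD).
  void DefineLevel(int level, size_t byte_size) {
    levels_[level] = Level{std::vector<uint8_t>(byte_size, 0xCD), false};
  }

  // Clears the level only if it has not been cleared yet.
  bool ClearLevelIfNeeded(int level) {
    auto it = levels_.find(level);
    if (it == levels_.end()) return false;
    if (!it->second.cleared) {
      std::fill(it->second.bytes.begin(), it->second.bytes.end(), 0);
      it->second.cleared = true;
      ++clear_count_;
    }
    return true;
  }

  // Models a partial update: the level must be cleared first, otherwise the
  // bytes outside the updated range would still expose stale memory.
  bool WriteBytes(int level, size_t offset, const std::vector<uint8_t>& src) {
    if (!ClearLevelIfNeeded(level)) return false;
    std::vector<uint8_t>& bytes = levels_[level].bytes;
    if (offset > bytes.size() || src.size() > bytes.size() - offset) {
      return false;
    }
    std::copy(src.begin(), src.end(), bytes.begin() + offset);
    return true;
  }

  // Models any read of the texture.
  bool ReadByte(int level, size_t offset, uint8_t* out) {
    if (!ClearLevelIfNeeded(level)) return false;
    const std::vector<uint8_t>& bytes = levels_[level].bytes;
    if (offset >= bytes.size()) return false;
    *out = bytes[offset];
    return true;
  }

  int clear_count() const { return clear_count_; }

 private:
  struct Level {
    std::vector<uint8_t> bytes;
    bool cleared;
  };
  std::map<int, Level> levels_;
  int clear_count_ = 0;
};
```

Defining a level costs no clear at all; the clear is paid once, at the first read or write, and never again for that level.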
OpenGL Quirks to be aware of
OpenGL ES 2.0 incompatibilities
Client side arrays
Client side arrays refers to the ability to store vertex data in client side memory and have OpenGL reference it directly. The command buffer itself does not support this. All vertex data must be put in an OpenGL buffer.
To keep compatibility the client side class, GLES2Implementation, emulates client side arrays by tracking the OpenGL attribute state and at draw time, copying any client side arrays into a buffer, updating the vertex attributes to use this buffer, issuing the draw call, then restoring the vertex attribute state. This is a very slow operation and because there is no way to know when the client has changed any of its data those buffers must be updated with every draw call. For this reason, and because more modern versions of OpenGL require it, client side array emulation is compiled out for everything except Native Client.
GL_FIXED
OpenGL ES 2.0 is required to support GL_FIXED as an attribute type (an argument to glVertexAttribPointer). Desktop GPUs do not support this.
The command buffer has optional support for this. Turning it on requires calling glEnableFeatureCHROMIUM("pepper3d_support_fixed_attribs"); as one of the first calls into GL. This makes the command buffer keep its own copy of all GL buffers. At DrawXXX time, any attributes that are of type GL_FIXED are pulled out of their respective buffers, converted to float and copied to a temp buffer. The attributes are changed to point to this temp buffer. Then the draw happens and the attributes are reset to their previous state.
Clearly this is slow and requires lots of memory. It is there solely to help port OpenGL ES 2.0 apps to NaCl and to pass the OpenGL ES 2.0 conformance tests.
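The conversion step itself is simple: GL_FIXED is a signed 16.16 fixed-point value, so dividing by 65536 gives the float. The sketch below shows only that step; the function names are hypothetical and the buffer bookkeeping described above is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// GL_FIXED is signed 16.16 fixed point: the low 16 bits are the fraction.
inline float FixedToFloat(int32_t fixed) {
  return static_cast<float>(fixed) / 65536.0f;
}

// Converts a whole attribute array, as would be needed before copying the
// result into a temporary float buffer for the draw call.
std::vector<float> ConvertFixedAttribs(const std::vector<int32_t>& fixed) {
  std::vector<float> result;
  result.reserve(fixed.size());
  for (int32_t value : fixed) {
    result.push_back(FixedToFloat(value));
  }
  return result;
}
```

Doing this for every GL_FIXED attribute on every draw is where the slowness and the extra memory come from.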
Refactoring Ideas:
Separating decoding the command buffer from emulating OpenGL ES 2.0
Currently those 2 responsibilities are mixed together in GLES2DecoderImpl. There has been some discussion of separating them. A few issues off the top of my head:
#1) Validating shared memory
As one example, the command for TexImage2D gets passed a shared memory id, an offset and a size. Before the real glTexImage2D is called the service needs to validate that the id is a valid shared memory id, that the offset and size are wholly contained inside that shared memory, and that the call to glTexImage2D is going to reference only memory inside that shared memory region. Doing that potentially requires knowing various state that would normally not be efficiently queryable given a separated OpenGL ES emulation. It's possible the needed state could be easily exposed through separate functions, or else by changing TexImage2D and similar commands so the size is an explicit 'size' instead of an implicit 'width * height * type * format'.
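The range check described here is easy to get wrong with 32-bit arithmetic, so a sketch may help. Everything below is hypothetical (SharedMemoryRegistrySketch, ComputeImageBytes), and the size computation is deliberately simplified: it ignores row padding from the unpack alignment, which is one example of the GL state the real check depends on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <map>

// Hypothetical registry of shared memory regions known to the service.
class SharedMemoryRegistrySketch {
 public:
  void Register(int32_t id, size_t size) { region_sizes_[id] = size; }

  // True only if the id is known and [offset, offset + size) lies wholly
  // inside that region. The sum is done in 64 bits so it cannot wrap.
  bool IsRangeValid(int32_t id, uint32_t offset, uint32_t size) const {
    auto it = region_sizes_.find(id);
    if (it == region_sizes_.end()) return false;
    const uint64_t end = static_cast<uint64_t>(offset) + size;
    return end <= it->second;
  }

 private:
  std::map<int32_t, size_t> region_sizes_;
};

// Computes the implicit data size of a tightly packed image. Returns false
// if the result does not fit in 32 bits.
bool ComputeImageBytes(uint32_t width, uint32_t height,
                       uint32_t bytes_per_pixel, uint32_t* out) {
  const uint64_t max32 = std::numeric_limits<uint32_t>::max();
  const uint64_t row = static_cast<uint64_t>(width) * bytes_per_pixel;
  if (row > max32) return false;
  const uint64_t total = row * height;  // both factors fit in 32 bits
  if (total > max32) return false;
  *out = static_cast<uint32_t>(total);
  return true;
}
```

A service-side handler would compute the implicit size first and then check the resulting range against the region before touching any of the memory.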
#2) Dealing with resource ids
OpenGL ES 2.0 uses int ids for resources. The client uses one set of ids and the service a different set. A mapping from a client id to the service id is kept by the service. In order to avoid a round trip from client to service to manage those ids, for client contexts that are not sharing resources the ids are completely managed on the client and just their usage is communicated to the service. The service makes up a service id to associate with a given client id as needed. Under the current design this works. If the command buffer code were separated from the OpenGL ES 2.0 emulation code, a new method of managing these ids would need to be inserted, possibly requiring a double mapping: a client id to a command buffer service id, and a command buffer service id to the OpenGL ES 2.0 emulation id. Again, there may be ways to design out that issue.
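The client id to service id mapping can be pictured as a small table kept by the service. The sketch below is hypothetical (IdMapSketch is not a Chromium class) and a counter stands in for the ids that a real glGen* call would hand back.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical service-side table from client-chosen ids to service ids.
class IdMapSketch {
 public:
  // Called the first time a command mentions a client id. The service makes
  // up a service id on demand, so no round trip to the client is needed.
  uint32_t GetOrCreateServiceId(uint32_t client_id) {
    auto it = client_to_service_.find(client_id);
    if (it != client_to_service_.end()) return it->second;
    const uint32_t service_id = next_service_id_++;  // stand-in for glGen*
    client_to_service_.emplace(client_id, service_id);
    return service_id;
  }

  // Looks up an existing mapping without creating one.
  bool Lookup(uint32_t client_id, uint32_t* service_id) const {
    auto it = client_to_service_.find(client_id);
    if (it == client_to_service_.end()) return false;
    *service_id = it->second;
    return true;
  }

  // Forgets a mapping, as a Delete* command would.
  bool Remove(uint32_t client_id) {
    return client_to_service_.erase(client_id) > 0;
  }

 private:
  std::unordered_map<uint32_t, uint32_t> client_to_service_;
  uint32_t next_service_id_ = 1;
};
```

Separating the decoder from the emulation layer would mean stacking two such tables, which is the double mapping mentioned above.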
Moving functionality from GLES2DecoderImpl to the various resource managers
GLES2DecoderImpl is HUGE. Nearly 8000 lines. There's been talk of moving large chunks of functionality to the various resource managers. For example, move all handling of the texture functions (TexImage2D, TexSubImage2D, TexParameter, CopyTexImage2D, CopyTexSubImage2D, CompressedTexImage2D, CompressedTexSubImage2D, GenTexture, DeleteTexture, IsTexture, TexStorage2DEXT) from GLES2DecoderImpl to TextureManager.
I'd love to see that happen. Unfortunately I expect it's not a small amount of work, in particular fixing up all the unit tests.
Still, it seems like it would be a much cleaner implementation to go that route.
Separate command generation from OpenGL ES
build_gles2_cmd_buffer.py has a nearly 1 to 1 mapping of OpenGL ES functions to commands. Ideally the commands in the command buffer would be separate from the OpenGL API so that it would be easier to add any command needed and not have to expose it as an OpenGL ES extension.
Remove legacy code
Originally the command buffer commands were going to be the public API to the GPU process, with the OpenGL ES API as a wrapper. Most game consoles allow you to work directly with command buffers, which is one reason for their performance. Being able to work directly with command buffers means you can pre-compute command buffers and patch them on the fly as needed, which in turn means your code can do the minimal amount of work and therefore gain a lot of speed. Eventually it was decided not to expose the command buffer commands directly, but there is still code based on the original design that could be removed.
low-level commands
Implemented in src/gpu/command_buffer/client/cmd_buffer_helper.cc, src/gpu/command_buffer/common/cmd_buffer_common.h and src/gpu/command_buffer/service/common_decoder.cc are functions that implement JUMP, CALL and RETURN. These functions in turn influence some of the design constraints of the rest of the system. They are not needed unless command buffers are a public interface and could be removed.
3 types of commands
build_gles2_cmd_buffer.py generates 3 versions of many functions, one for each of the data transfer modes above. For example TexImage2D, TexImage2DImmediate and TexImage2DBucket are respectively the transfer buffer implementation of TexImage2D, the data-in-the-command-buffer version of TexImage2D and the bucket version of TexImage2D. When the command buffer was the public interface it seemed important to have all 3, as they each have their pluses and minuses. Now, though, only the commands used by GLES2Implementation are needed. Maybe the code that generates all 3 versions should be retired.
remove _CMD_ID_TABLE in build_gles2_cmd_buffer.py
_CMD_ID_TABLE's sole purpose is to make sure the ids of commands do not change. This was important when commands were going to be a public API. It no longer matters and can be removed, and commands can change ids at any time.
size in entries
Left over from the O3D code, the command buffer works on CommandBufferEntry units. Each unit is 32 bits, and the sizes of commands and command data are calculated in those units. There's a lot of superfluous math involved in converting to and from those units. If instead the code were refactored so that the size of commands was in bytes, all of that extra math code could disappear.