Down to the TLP: How PCI express devices talk (Part II)
http://xillybus.com/tutorials/pci-express-tlp-pcie-primer-tutorial-guide-2
Data Link Layer Packets
Aside from wrapping TLPs with its header (2 bytes) and adding a CRC at the end (LCRC actually, 4 bytes), the Data Link layer runs packets of its own for maintaining reliable transmission. These special packets are Data Link Layer Packets (DLLPs). We’ll list them shortly:
- Ack DLLP for acknowledging successfully received TLPs.
- Nack DLLP for indicating that a TLP arrived corrupted, and that a retransmit is due. Note that there’s also a timeout mechanism in case nothing that looks like a TLP arrives.
- Flow Control DLLPs: InitFC1, InitFC2 and UpdateFC, used to announce credits, as described below.
- Power Management DLLPs.
Flow control
As mentioned before, the data link layer has a Flow Control (FC) mechanism, which makes sure that a TLP is transmitted only when the link partner has enough buffer space to accept it.
I used the term “link partner” and not “destination” deliberately. For example, when a peripheral is connected to the Root Complex through a switch, it runs its flow control mechanism against the switch and not the final destination. In other words, once the TLP is transmitted from the peripheral, it’s still subject to the flow control mechanism between the switch and the Root Complex. If there are more switches on the way, each leg has its own flow control.
The mechanism is not the simplest, and its description in the spec will give you goosebumps. So I’ll try to put it fairly clear.
The flow control mechanism runs independent accounting for 6 (six!) distinct buffer consumers:
- Posted Requests TLP’s headers
- Posted Requests TLP’s data
- Non-Posted Requests TLP’s headers
- Non-Posted Requests TLP’s data
- Completion TLP’s headers
- Completion TLP’s data
These are the six credit types.
The accounting is done in flow control units, which correspond to 4 DWs of traffic (16 bytes), always rounded up to the nearest integer. Since headers are always 3 or 4 DWs in length, every TLP transmitted consumes one unit from the respective header credit. When data is transmitted, the number of consumed units is the number of data DWs in the TLP, divided by four, rounded upwards. So we can imagine data buckets at the receiver of 16 bytes each, on which we are not allowed to mix data from different TLPs. Each bucket is a flow control unit.
Now lets imagine that there’s a doorkeeper at the transmitter, which counts the total number of flow control units consumed since the link establishment, separately for each credit type. This is six numbers to keep track of. This doorkeeper also has the information about the maximum number each of these credit types is allowed to reach. If a certain TLP for transmission would make any of these counted units exceed its limit, it’s not allowed through. Another TLP may be transmitted instead (subject to reordering rules) or the doorkeeper simply waits for the limit to rise.
This is the way the flow control works. When the link is established, both sides exchange their initial limits. As each receiver processes incoming packets, it updates the limits for its link partner, so it can use the buffer space released. UpdateFC FLLP packets are sent periodically to announce the new credit limits.
Well, I overlooked a small detail: Since we’re counting the total number of units since the link started, there’s always a potential for overflow. The PCIe standard allocates a certain number of bits for each credit type counter and its limit (8 bits for header credits, 12 bits for data credits), knowing that they will overflow pretty soon. This overflow is worked around by making the comparison between each counter and its limit with straightforward modulo arithmetic. So given some restrictions on not setting the limit too high above the counter, the flow control mechanism implements the doorkeeper described above.
Bus entities are allowed to announce an infinite credit limit for any or all of the six credit types, meaning that flow control for that specific credit type is disabled. As a matter of fact, endpoints (as opposed to switches and the Root Complex) must advertise an infinite credit for completion headers and data. In other words, an endpoint can’t refuse to accept a completion TLP based upon flow control. So the Requester of a non-posted transactions must take responsibility for being able to accept the completion by verifying that it has enough buffer space when making the request. This also applies to root complexes not allowing peer-to-peer transactions.
Virtual channels
In part I of this guide, I marked the TC fields in the example TLPs green, saying that those fields are almost always zero. TC stands for Traffic Class and is an identifier used to create Virtual Channels. These Virtual Channels are merely separate sets of data buffers having a separate flow control credits and counters. So by choosing a TC other than zero (and setting up the bus entities accordingly) one can have TLPs being subject to independent flow control systems, preventing TLPs belonging to one channel block the traffic of TLPs belonging to another.
The mapping from TC’s to Virtual Channels is done by software for each bus entity. Anyhow, the real-life PCIe elements I’ve seen so far support only one Virtual Channel, VC0, and hence only TC0 is used, which is the minimum required by spec. So unless some special application requires this, TC will remain zero in all TLPs, and this whole issue can be disregarded.
Packet reordering
One of the issues that comes to mind in a packet network, is to what extent the TLPs may arrive in an order different from how they were sent. The Internet Protocol (IP, as in TCP/IP) for example, allows any packet reshuffling on the way. The PCIe specification allows a certain extent of TLP reordering, and in fact in some cases reordering is mandatory to avoid deadlocks.
Fortunately, the legacy PCI compatibility concern was taken into account in this issue as well, unless the “relaxed ordering” bit is set in the TLP, which it rarely is. This is one of the bits in the Attr field, marked green in the TLP examples in part I of this guide. So all in all, one can trust that things will work as if there was a good old bus we were talking with. Those of us who write to a few registers, and then trigger an event by writing to another one, can go on doing it. I turn off the BAR’s Prefetch bit to be on the safe side, even though there’s nothing to imply that it has anything to do with writes.
The spec defines reordering rules in full detail, but it’s not easy to get the bottom line. So I’ll mention a few results of those rules. All here is said assuming relaxed ordering bit is cleared in all transactions. I’m also ignoring I/O space completely (why use it?):
- Posted writes and MSI’s arrive in the order they were sent. Now, all memory writes are posted, and MSIs are in fact (posted) memory writes. So we know for sure that memory writes are executed in order, and that if we issued an MSI after filling a buffer (writes…) it will arrive after the buffer was actually written to.
- A read request will never arrive before a write request or MSI sent before it. As a matter of fact, performing a Read Request is a safe way to wait for a write to complete.
- Write requests may very well come before read requests sent before them. This mechanism prevents deadlock in certain exotic scenarios. Don’t write to a certain memory area while waiting for the read completion to come in.
- Read completions for a certain request (i.e. with the same Tag and Requester ID) arrive in the order they were sent (so they arrive in order with rising addresses). Read completions of different request may be reordered (but who cares).
Other than that, anything can change order or arrival, including read requests which may be reordered among themselves and with read completions.
To relieve any paranoia about an interrupt message arriving before the write operations that preceded it, section 2.2.7 in the spec spells it out:
The Request format used for MSI/MSI-X transactions is identical to the Memory Write Request format defined above, and MSI/MSI-X Requests are indistinguishable from memory writes with regard to ordering, Flow Control, and data integrity.
Zero-length read request
As just mentioned, reading from a bus entity after writing to it, is a safe way to wait for the write operation to finish for real. But why read anything, if we’re not interested in the data? So they made up a zero-length request, which reads nothing. All four Byte Enables are assigned zeroes, meaning nothing is read. As for the completion, section 2.2.5 in the spec says:
If a Read Request of 1 DW specifies that no bytes are enabled to be read (1st DW BE[3:0] field = 0000b), the corresponding Completion must specify a Length of 1 DW, and include a data payload of 1 DW
So we have one DW of rubbish data in the completion. That’s fair enough.
Payload sizes and boundaries
Every TLP carrying data must limit the number of payload data DWs to Max_Payload_Size, which is a number allocated during configuration (typically 128 bytes). This number applies only to payloads, and not to the Length field itself: Memory Read Requests are not restricted in length by Max_Payload_Size (per spec 2.2.2), but are restricted by Max_Read_Request_Size (per spec 2.2.7).
So a Memory Read Request may ask for more data than is allowed in one TLP, and hence multiple TLP completions are inevitable.
Regardless of the Max_Payload_Size restrictions, completions of (memory) read requests may be split into several completion TLPs. The cuts must be in addresses aligned by RCB bytes (Request Completion Boundary, 128 bytes, for Root Complex possibly 64) per spec 2.3.11. If the Request doesn’t cross such an alignment boundary, only a single Completion TLP is allowed. Multiple Memory Read Completions for a single Read Request must return data in increasing address order (which will be kept by the switching network).
And a last remark, citing the spec 2.2.7: Requests must not specify an Address/Length combination which causes a Memory Space access to cross a 4-KB boundary.
That’s it. I hope reading through the PCI Express specification will be easier now. There’s still a lot to read…
Questions & Comments
If you have a remark, would like to ask a question or discuss something, please post a new topic here. Posting is anonymous; no registration is required.
Down to the TLP: How PCI express devices talk (Part II)的更多相关文章
- Down to the TLP: How PCI express devices talk (Part I)
http://xillybus.com/tutorials/pci-express-tlp-pcie-primer-tutorial-guide-1 Down to the TLP: How PCI ...
- PCI Express(四) - The transaction layer
原文出处:http://www.fpga4fun.com/PCI-Express4.html 感觉没什么好翻译的,都比较简单,主要讲了TLP的帧结构 In the transaction layer, ...
- Ubuntu 16.04 RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller” 不能上网
来源:http://forum.ubuntu.org.cn/viewtopic.php?f=116&t=463646 1.执行如下命令 uname -a sudo lspci -knn sud ...
- [中英对照]How PCI Express Works | PCIe工作原理
How PCI Express Works | PCIe工作原理 PCI Express is a high-speed serial connection that operates more li ...
- PCI Express
1.1课题研究背景 在目前高速发展的计算机平台上,应用软件的开发越来越依赖于硬件平台,尤其是随着大数据.云计算的提出,人们对计算机在各个领域的性能有更高的需求.日常生活中的视频和图像信息包含大量的数据 ...
- 了解PCI Express的Posted传输与Non-Posted传输
0.写在前面 本文首发于公众号[两猿社],后续将在公众号内持续更新~ 其实算下来接触PCIe很久了,但是由于之前换工作,一直没有系统的学习和练手项目,现在新项目买了Synopsys的PCIe IP,总 ...
- PCI Express(六) - Simple transactions
原文地址:http://www.fpga4fun.com/PCI-Express6.html Let's try to control LEDs from the PCI Express bus. X ...
- PCI Express(五) - Xilinx wizard
原文地址:http://www.fpga4fun.com/PCI-Express5.html Xilinx makes using PCI express easy - they provide a ...
- PCI Express(三) - A story of packets, stack and network
原文出处:http://www.fpga4fun.com/PCI-Express3.html Packetized transactions PCI express is a serial bus. ...
随机推荐
- 日历的问题C语言,C++(boost),python,Javascript,Java和Matlab实现
今天看到一个很有意思的话题,例的标题叙述性描述,下面: 根据以下信息来计算1901年1月1至2000年12月31适逢星期日每个月的第一天的合伙人数量? a) 1900.1.1星期一 b) 1月,3 ...
- JDK源码学习系列05----LinkedList
JDK源码学习系列05----LinkedList 1.LinkedList简介 LinkedList是基于双向链表实 ...
- Could not load file or assembly'System.Data.SQLite.dll' or one of its depedencies
[问题] 在我本机的开发环境c#连接sqlite3没有问题,但是release版本号移植到其它的机器就提示Could not load file or assembly'System.Data. ...
- 于Unity3D动态创建对象和创建Prefab三种方式的原型对象
于Unity3D动态创建对象和创建Prefab三种方式的原型对象 u3d在动态创建的对象,需要使用prefab 和创建时 MonoBehaviour.Instantiate( GameObject o ...
- swift的struct本节描述结构的类型
<span style="font-size:24px;">struct David { var x = 0;//一个结构的定义,两个字段x,y var y = 0;/ ...
- 采用Sambaserver由win平台,linux平台上传文件
1.构造yum [root@db /]# cd /etc/yum.repos.d/ [root@db yum.repos.d]# vi yum.repo --改动光盘挂载位置,enabled设置为启动 ...
- Windows 8实例教程系列 - 数据绑定高级实例
原文:Windows 8实例教程系列 - 数据绑定高级实例 上篇Windows 8实例教程系列 - 数据绑定基础实例中,介绍Windows 8应用开发数据绑定基础,其中包括一些简单的数据绑定控件的使用 ...
- 妙用perfmon Alert抓dump
抓dump文件,经常是解决众多疑难杂症的不二手段.但是很多时候,我们没办法抓.比如说 几秒内的线程数暴涨200个,然后迅速回落 程序跑了两天,内存涨到某个数字就自己OOM了 原因不外乎都是时间短,没有 ...
- IDF - CTF - 牛刀小试
找学校CTF好地方,IDF实验室CTF训练营(http://ctf.idf.cn/). . 刚接触CTF.来玩下牛刀小试.AK了. . 好爽好爽.. 1.摩斯password 嘀嗒嘀嗒嘀嗒嘀嗒 时针它 ...
- Android开发技巧——PagerAdapter再简单的包
再次内容View的ViewPager该适配器PagerAdapter简包,支持List数据与SparseArray数据.随着更新的浏览功能. 首先,首先贴上顶部抽象类代码: /* * Date: 14 ...