Method and system for early speculative store-load bypass
In an embodiment, the present invention describes a method and apparatus for detecting RAW condition earlier in an instruction pipeline. The store instructions are stored in a special store bypass buffer (SBB) within an instruction decode unit (IDU). The IDU compares the instruction fields that are used for address generation of all 'load' instructions against 'store' instructions within a group of fetched instructions and 'store' instructions previously stored in the SBB. If a match of instruction fields is found, the IDU 'speculates' that the load instruction has dependency on the 'store' instruction. A data cache unit (DCU) validates the dependency of the load instruction 'speculated' by the IDU. If a false dependency is 'speculated' by the IDU, the DCU forces a re-fetch of the load instruction.
FIELD OF THE INVENTION
[0001] Present invention relates to out of order processor architecture, specifically to read-after-write (RAW) bypass in the out of order processor.
DESCRIPTION OF THE RELATED ART
[0002] Generally, in out of order processors, when an instruction attempts to read a location that has been modified, it creates a condition called read-after-write (RAW). In most out of order processors, RAW condition is detected at the Store Queue boundary. Typically, the address (physical or virtual) of a store instruction is compared against the address (physical or virtual) of a load instruction. If a match is found, the data from store instruction is forwarded to the load instruction.
[0003] Typically, the RAW condition is detected before accessing the main memory for data in a cache (e.g., data cache unit or the like). The cache unit includes load and store queues. The load/store addresses are compared in the cache before accessing the main memory. Detecting the RAW condition late in the instruction pipeline (e.g., in the cache, before accessing the main memory or the like) degrades the performance of out of order processors. Further, it requires additional multiplexers to forward data from the store queue to the load instruction which complicates store queue design. Adding additional devices (e.g., multiplexers, comparator logic with each entry of store queue or the like) results in additional power dissipation in the out of order processors. A method and an apparatus are needed to detect the RAW condition earlier in the instruction pipeline.
SUMMARY
[0004] In one embodiment, a method and apparatus for detecting RAW condition earlier in an instruction pipeline is described. The store instructions are stored in a special store bypass buffer (SBB) within an instruction decode unit (IDU). The IDU compares the instruction fields that are used for address generation (e.g., immediate fields, register fields or the like) of all 'load' instructions against 'store' instructions within a group of fetched instructions and 'store' instructions previously stored in the SBB. If a match of instruction fields is found between the 'load' and the 'store' instruction, the IDU 'speculates' that the load instruction has dependency on the 'store' instruction. The IDU transfers the instruction fields' pointers from the 'store' instruction to the 'load' instruction and the 'load' instruction is then scheduled for execution with newly assigned instruction fields. A data cache unit (DCU) validates the dependency of the load instruction 'speculated' by the IDU by comparing the actual address of the load instruction and the store instruction. If a false dependency is 'speculated' by the IDU, the DCU forces a re-fetch of the load instruction.
[0005] In some embodiment, a method for determining dependency of a load instruction is described. The method includes identifying one or more instruction fields of the load instruction, comparing the one or more instruction fields of the load instruction with one or more instruction field of one or more store instructions and determining whether the one or more instruction fields of the load instruction match with the one or more instruction fields of the one or more store instructions. According to some variations of the present invention, the comparing is done using one or more instruction field identifications of the one or more instruction fields. In some variation the comparing is done using contents of the one or more instruction fields.
[0006] In some variation, the method includes declaring the dependency of the load instruction on one of the one or more store instructions if the one or more instruction fields of the load instruction match with the one or more instruction fields of one of the one or more store instructions. In some variation, the method includes declaring the dependency of the load instruction on a most recently fetched store instruction from one of the one or more store instruction whose one or more instruction fields matched with one or more instruction fields of the load instruction if the one or more instruction fields of the load instruction match with the one or more instruction fields of more than one of the one or more store instructions. According to an embodiment of the present invention, one of the instruction fields is an immediate field. According to some variations, one of the instruction fields is a register field. In some variation, the identifying, comparing and determining is done during instruction decoding.
[0007] The method includes fetching a group of instructions. In some variation, the load instruction is one of the group of instructions. According to other variation the present invention, the one or more store instructions are from the group of instructions. According to an embodiment of the present invention, the one or more store instructions are stored in a store bypass buffer.
[0008] The method includes performing a data bypass between the load instruction and one of the one or more store instructions whose one or more instruction fields matched with one or more instruction fields of the load instruction. According to some embodiment of the present invention, performing the data bypass further comprises assigning the one or more instruction field identifications of the one or more instruction fields of the one of the one or more store instructions to the load instruction. The method includes storing the one or more store instructions in the store bypass buffer if the group of instructions includes the one or more store instructions. The method includes forwarding the load instruction for execution.
[0009] The method includes validating the dependency of the load instruction upon one of the one or more store instructions. In some variations, the validating comprises comparing the physical addresses of one or more instruction fields of the load instruction with physical address of the one or more instruction fields of the one or more store instructions. The method includes, if the contents of the one or more instruction fields of the load instruction do not match with the one or more instruction fields of one or more store instructions, requesting a re-fetch of the load instruction.
[0010] The foregoing is a summary and thus contains, by necessity, simplifications, generalizations and omissions of detail; consequently, those skilled in the art will appreciate that the summary is illustrative only and is not intended to be in any way limiting. Other aspects, inventive features, and advantages of the present invention, as defined solely by the claims, will become apparent in the non-limiting detailed description set forth below.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] The present invention may be better understood, and its numerous objects, features, and advantages made apparent to those skilled in the art by referencing the accompanying drawings.
[0012]FIG. 1 illustrates an example of processor architecture according to an embodiment of the present invention.
[0013]FIG. 2 is a flow diagram illustrating an exemplary sequence of operations performed during a process detecting instruction dependency according to an embodiment of the present invention.
[0014]FIG. 3 is a flow diagram illustrating an exemplary sequence of operations performed when a 'dependent' load instruction is received according to an embodiment of the present invention.
[0015] The use of the same reference symbols in different drawings indicates similar or identical items.
DESCRIPTION OF THE PREFERRED EMBODIMENT(S)
[0016] The following is intended to provide a detailed description of an example of the invention and should not be taken to be limiting of the invention itself. Rather, any number of variations may fall within the scope of the invention which is defined in the claims following the description.
[0017] In addition, the following detailed description has been divided into sections in order to highlight the invention described herein; however, those skilled in the art will appreciate that such sections are merely for illustrative focus, and that the invention herein disclosed typically draws its support from multiple sections. Consequently, it is to be understood that the division of the detailed description into separate sections is merely done as an aid to understanding and is in no way intended to be limiting.
[0018] Introduction
[0019] In some embodiment, the present invention describes a method and apparatus for detecting RAW condition earlier in an instruction pipeline. The store instructions are stored in a special store bypass buffer (SBB) within an instruction decode unit (IDU). The IDU compares the instruction fields that are used for address generation (e.g., immediate fields, register fields or the like) of all 'load' instructions against 'store' instructions within a group of fetched instructions and 'store' instructions previously stored in the SBB. If a match of instruction fields is found between the 'load' and the 'store' instruction, the IDU 'speculates' that the load instruction has dependency on the 'store' instruction. The IDU transfers the instruction fields' pointers from the 'store' instruction to the 'load' instruction and the 'load' instruction is then scheduled for execution with newly assigned instruction fields. A data cache unit (DCU) validates the dependency of the load instruction 'speculated' by the IDU by comparing the actual address of the load instruction and the store instruction. If a false dependency is 'speculated' by the IDU, the DCU forces a re-fetch of the load instruction.
DETAILED ARCHITECTURAL DESCRIPTION
[0020]FIG. 1 illustrates an example of a core 100 for an out of order processor according to an embodiment of the present invention. A system may include one or more cores such as core 100. Core 100 includes an Instruction Fetch Unit (IFU) 110. IFU 110 is coupled via a link 115 to an Instruction Decode Unit (ID U)120. IFU 110 fetches a group of instructions to be executed (e.g., in one cycle or the like) and forwards the group of instructions to IDU 120. IDU 120 decodes the group of instructions fetched by IFU 110. IDU 120 includes a Store Bypass Buffer (SBB) 125. SBB125 can be configured as any storage element (e.g., first-in-first-out, first-in-last-out or the like). The size of SBB 125 can be of any length (e.g., same as the number of instructions fetched in a group or the like). In the present example, SBB 125 is configured as first-in-first-out storage element.
[0021] IDU 120 identifies 'store' instructions from the group of instructions fetched by IFU 110 and stores them into SBB 125. IDU assigns a temporary scratch register for each one of the 'store' and 'load' instructions. According to an embodiment of the present invention, IDU assigns a temporary scratch register 'rd' to each 'store' instruction, extends the instruction fields of each load instruction using a temporary register and assigns a temporary register 'rs' to each one of 'load' instruction. The 'rs' field is used to identify the source register for data bypass using 'rd' field of the store instruction once a dependency between a 'store' and a 'load' instruction is declared valid. The use of temporary register fields is one of several means that can be used to bypass data between dependent instructions. Other means such as, indirect addressing, transfer of data pointers between instructions or the like can be used to bypass data between dependent instructions.
[0022] IDU 120 identifies 'load' instructions from the group of fetched instructions and compares the instruction fields of the 'load' instructions with the instruction fields of 'store' instructions within the group of fetched instructions and with 'store' instructions stored in SBB 125. When a match of instruction fields between a load instruction and a 'store' instruction is found, IDU 120 'speculates' that the 'load' instruction is dependent upon the 'store' instruction and declares the dependency of the 'load' instruction upon the 'store' instruction by forwarding the 'rd' field of the matching 'store' on to the newly assigned 'rs' field of the 'load.' IDU 120 then forwards the group of fetched instructions via a link 127 to a Rename Issue Unit (RIU) 130. RIU 130 renames the instruction fields (e.g., the source registers of the instructions or the like), checks the dependencies of instructions and when instructions are ready to be issued, issues the instructions via a link 135 to an Execution Unit (EXU) 140. IDU 120 renames the destination registers of 'store' and 'load' instructions.
[0023] EXU 140 includes a Working Register File (WRF) 142 and an Architectural Register File (ARF) 145. WRF 142 and ARF 145 can be any storage elements. In the present example, ARF 145 includes temporary scratch registers (e.g., register 'rd' or the like). EXU140 executes instructions and stores the results into WRF 142. EXU 140 is coupled to a Commit Unit (CMU) 150 via a common link 160. Link 160 couples various elements of core 100 as shown in FIG. 1. CMU 150 monitors instructions and determines whether the instructions are ready to be committed. When an instruction is ready to be committed, CMU150 writes the associated results from WRF 142 into ARF 145. The functions of RIU 130, WRF 142, ARF 145 and CMU 150 are known in art. CMU 150 is coupled to a Data Cache Unit (DCU) 170 via a link 180. DCU 170 is further coupled to various elements of core 100 via a link 190 as shown in FIG. 1. DCU170 further includes a Load Queue (LDQ) 172 and a Store Queue (STQ) 175. LDQ 172 is responsible for managing load and store requests and STQ 175 is responsible for managing store requests. DCU 170 is coupled via a link 192 to a memory sub-system 195. The functions of DCU 170 are known in art. Conventionally, DCU 170 performs load/store bypass after comparing the physical addresses of load and store destinations.
[0024] Determining Instruction Dependency
[0025] Initially, in the first fetch cycle, IFU 110 fetches a group of instructions. For purposes of illustration, in the present example, IFU 110 fetches a group of three instructions. However, IFU 150 can fetch any number of instructions supported by the architecture of core 100. IDU 120 decodes the group of fetched instructions and determines whether there are any 'load' and 'store' instructions in the group of fetched instructions. If there are 'load' and 'store' instructions in the group of fetched instructions, IDU 120 compares the instruction fields of the 'load' instruction (e.g., fields used in address generation such as register field, immediate field or the like) with the instruction fields of the 'store' instruction in the fetch group and writes the 'store' instructions into SBB125. If a match is found between the instruction fields of the 'load' and a 'store' instruction, IDU 120 'speculates' that the 'load' instruction is dependent upon the 'store' instruction and identifies the 'load' instruction as dependent upon the 'store' instruction. The dependency of 'load' instructions can be identified using various techniques known in art. If there are no 'load' instructions but 'store' instructions in the fetch group, IDU 120 stores those 'store' instructions in SBB125. If there are 'load' instructions but no 'store' instructions in the fetch group then IDU does not force any dependency as SSB 125 in this case is empty. In the present example, the size of SBB 125 is same as the size of the group of fetched instructions (e.g., three or the like). However, SBB 125 can be of any size supported by core 100 architecture.
[0026] IDU 120 then determines whether there are any 'load' instructions in the group of fetched instructions. If there are 'load' instructions in the group of fetched instructions, IDU 120 compares the instruction fields of the 'load' instructions (e.g., register field, immediate field or the like) with the instruction fields of the 'store' instructions in SBB 125. If a match is found between the instruction fields of the 'load' and a 'store' instruction, EDU 120 'speculates' that the 'load' instruction is dependent upon the 'store' instruction and identifies the 'load' instruction as dependent upon the 'store' instruction. The dependency of 'load' instructions can be identified using various techniques known in art.
[0027] Typically, IDU 120 does not analyze the instruction fields of 'load' and 'store' instructions (e.g., actual contents of the instruction fields, physical addresses of the instruction fields or the like). Because the contents of the instruction fields are not analyzed, the dependency of the 'load' instruction is a 'speculation' by IDU 120. 'Speculating' a dependency in IDU 120 and performing a data bypass in RIU 130 and EXU 140 allows a younger dependent instruction to be issued sooner than the conventional execution in the out of order processors. Conventionally, the younger dependent instructions are not issued until the comparison of addresses and data bypass is done in DCU 170.
[0028] In the following fetch cycle, before storing 'store' instructions in SBB 125, IDU 120 first identifies and compares the instruction fields of 'load' instructions with 'store' instructions in the group of fetched instructions and then with the instruction fields of 'store' instructions stored in SBB 125. Thus, in every following fetch cycle, load instructions are compared against 'store' instructions of incoming fetch group and previous fetch group in SBB 125.
[0029] Once IDU 120 identifies a 'load' instruction as dependent upon a 'store' instruction, IDU 120 forwards the assigned 'rd' field of the 'store' instruction on to the newly assigned instruction field 'rs' of the 'load' instruction and the load instruction is executed using the newly assigned instruction fields. The data between the 'load' instruction and the 'store' instruction is by-passed in RIU 130and EXU 140 using 'rd' and 'rs' register fields. Upon receiving the 'dependent' 'load' instruction, DCU 170 determines whether the dependency of the 'load' instruction was validly 'speculated' by IDU 120. The determination of dependency validity can be made using various techniques known in art (e.g., by comparing the physical addresses or the like). If the dependency of the 'load' instruction was validly 'speculated' by IDU 120, DCU 170 maintains the dependency of the 'load' instruction. If DCU 170 determines that dependency of the 'load' instruction was not validly 'speculated' by IDU 120 then DCU 170 forces a re-fetch of the 'load' instruction.
[0030] When IDU 120 identifies a 'load' instruction having its instruction fields (e.g., register, immediate, or the like) common with more than one 'store' instructions (i.e. multiple dependencies), IDU 120 forces the 'load' instruction to be dependent on the most recently fetched ('youngest') 'store' instruction prior to this 'load'. When a 'load' instruction is dependent upon a 'store' instruction that is not part of the group of instructions fetched by IFU 110 or stored in SBB 125 then the dependency is identified by DCU 170 using conventional means.
[0031]FIG. 2 is a flow diagram illustrating an exemplary sequence of operations performed during a process of detecting instruction dependency in an instruction decoding unit (e.g., IDU 120) according to an embodiment of the present invention. While the operations are described in a particular order, the operations described herein can be performed in other sequential orders (or in parallel) as long as dependencies between operations allow. In general, a particular sequence of operations is a matter of design choice and a variety of sequences can be appreciated by persons of skill in art based on the description herein.
[0032] Initially, the process identifies 'load' instructions from a group of instructions fetched by an instruction fetch unit (e.g., IFU 110 or the like) (205). The process then first compares the instruction fields of 'load' instructions (e.g., immediate field, register field or the like) with one or more 'store' instructions within the group of fetched instructions, if any (210). The process then compares the instruction fields of 'load' instructions with the instruction fields of 'store' instructions stored in a 'store' instruction buffer (e.g., SBB 125) (215).
[0033] The process then determines whether there was a match between the instruction fields of 'load' instructions and one or more 'store' instructions (220). If there was no match between the instruction fields of 'load' instructions and 'store' instructions, the process proceeds to store the 'store' instructions in the store bypass buffer (240). If there was match between the instruction fields of 'load' instructions and 'store' instructions, the process determines whether the instruction fields of the 'load' instructions matched with the instruction fields of more than one 'store' instructions within the group of fetched instructions and 'store' instructions stored in the store bypass buffer (225). If the instruction fields of the 'load' instructions matched with instruction fields of more than one 'store' instructions within the group of fetched instructions and 'store' instructions stored in the store bypass buffer, the process forces the 'load' instruction to be dependent on the 'youngest' 'store' instruction prior to this 'load' (230). The process then proceeds to store the 'store' instructions in the store bypass buffer (240).
[0034] If the instruction fields of the 'load' instruction did not match with instruction fields of more than one 'store' instructions from the group of fetched instructions and 'store' instructions stored in the store bypass buffer, the process 'speculates' that the 'load' instruction is dependent upon the 'store' instruction and 'forces' a dependency of the 'load' instruction upon the 'store' instruction that matched the instruction fields of the 'load' instruction (235). According to one embodiment, after forcing the dependency, the process forwards the 'rd' field of the 'store' instruction to the newly assigned 'rs' field of the 'load' instruction.
[0035] The process stores the 'store' instructions, if any, from the group of fetched instructions, into the store bypass buffer (step240). The process then forwards the instruction for execution (e.g., forwarding the instruction to RIU 130 or the like) (250). The process executes the 'load' instruction using the newly assigned instruction fields. The process then performs the data bypass for the 'load' instruction (260). The data bypass can be performed using various techniques known in art.
[0036]FIG. 3 is a flow diagram illustrating an exemplary sequence of operations performed when a 'dependent load' instruction is received according to an embodiment of the present invention. The steps described herein can be performed in any order (e.g., sequentially, parallel or the like). Initially, the process receives a 'dependent' load instruction (e.g., at DCU170 or the like) (310). The dependency of the load instruction can be determined using the process such as the one described in FIG. 2. The process then determines whether the dependency of the 'load' instruction is valid (320). The validity of the dependencies can be determined using various techniques known in art. According to an embodiment of the present invention, the process determines the validity of the dependency by comparing the contents of instruction fields of the 'load' instructions and the 'store' instruction (e.g., actual comparison of immediate instruction fields, physical addresses or the like).
[0037] If dependency of the 'load' instruction is valid, the process completes the execution of instructions (330). If the dependency of the 'load' instruction is not valid, the process forces a re-fetch of the instruction (340). When the process forces a re-fetch, the 'load' instruction is fetched again (e.g., by IFU 110 or the like) and is processed again according to the instruction execution process.
SRC=https://www.google.com.hk/patents/US20040044881
Method and system for early speculative store-load bypass的更多相关文章
- Speculative store buffer
A speculative store buffer is speculatively updated in response to speculative store memory operatio ...
- Failed to issue method call: Unit httpd.service failed to load: No such file or directory.
centos7修改httpd.service后运行systemctl restart httpd.service提示 Failed to issue method call: Unit httpd.s ...
- 关于.ToList(): LINQ to Entities does not recognize the method ‘xxx’ method, and this method cannot be translated into a store expression.
LINQ to Entities works by translating LINQ queries to SQL queries, then executing the resulting quer ...
- Failed to issue method call: Unit mysql.service failed to load: No such file or directory解决的方式
Failed to issue method call: Unit mysql.service failed to load: No such file or directory解决的方式 作者:ch ...
- PatentTips - Method and system for browsing things of internet of things on ip using web platform
BACKGROUND The following disclosure relates to a method and system for enabling a user to browse phy ...
- Ext.store.load callback
var paramsReceivable = {}; paramsReceivable.querytext = Ext.getCmp('hiddquerytext').g ...
- Ext JS 5 关于Store load返回json错误信息或异常的处理
关于在store load的时候服务器返回错误信息或服务器出错的处理.ExtJS4应该也能用,没测试过. 这里有两种情况: 服务器返回错误json错误信息,状态为200 服务器异常,状态为500 一. ...
- Method and system for providing security policy for linux-based security operating system
A system for providing security policy for a Linux-based security operating system, which includes a ...
- Method and system for public-key-based secure authentication to distributed legacy applications
A method, a system, an apparatus, and a computer program product are presented for an authentication ...
随机推荐
- 【学习笔记】后端中的MVC和前端MVVM的关系
- UML建模图实战笔记
一.前言 UML:Unified Modeling Language(统一建模语言),使用UML进行建模的作用有哪些: 可以更好的理解问题 可以及早的发现错误或者被遗漏的点 可以更加方便的进行组员之间 ...
- Mac上安装Homebrew和wget
实际上是使用Homebrew来安装wget 安装Homebrew Homebrew一般称为brew,是Mac OSX上的软件包管理工具,能在Mac中方便的安装软件或者卸载软件, 只需要一个命令, 非常 ...
- Selenium私房菜系列1 -- Selenium简介
一.Selenium是什么? Selenium是ThroughtWorks公司一个强大的开源Web功能测试工具系列,本系列现在主要包括以下4款: 1.Selenium Core:支持DHTML的测试案 ...
- 关于ie的内存泄漏与javascript内存释放
最近做一个公司的业务系统,公司要求能尽可能的与c/s近似,也就是如c/s一样,点击文本框可以弹出此项目的相关内容,进行选择输入. 我使用了弹出窗口,然后在子窗口双击选中项目,把选中的值返回给父 ...
- Apache的HttpClient的使用
Apache的HttpClient可以被用于从客户端发送HTTP请求到服务器端,其中封装了客户端发送http的get和post请求 使用Apache的HttpClient发送GET和POST请求的步骤 ...
- Recyclerview设置间距
首先自定义一个RecyclerViewDivider 继承 RecyclerView.ItemDecoration,实现自定义. public class RecyclerViewDivider ex ...
- Java数据结构和算法(三)--三大排序--冒泡、选择、插入排序
三大排序在我们刚开始学习编程的时候就接触过,也是刚开始工作笔试会遇到的,后续也会学习希尔.快速排序,这里顺便复习一下 冒泡排序: 步骤: 1.从首位开始,比较首位和右边的索引 2.如果当前位置比右边的 ...
- Ubuntu下编辑并编译运行c++程序
一.使用vim编辑c++代码: vim hello.cpp 输入如下代码: #include <iostream> using namespace std; int main() { co ...
- Liskon替换原则
肯定有不少人跟我刚看到这项原则的时候一样,对这个原则的名字充满疑惑.其实原因就是这项原则最早是在1988年,由麻省理工学院的一位姓里的女士(Barbara Liskov)提出来的. 定义1:如果对每一 ...