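SimpleScalar CPU Simulator Source Code Analysis

sim-outorder.c

The main function

fetch -> dispatch -> issue -> writeback -> commit

code text -> fetch queue -> RUU/LSQ (-> ready queue) -> event queue -> event removed -> RUU/LSQ

The simulator first functionally simulates the instructions that are to be fast-forwarded, then simulates cycle by cycle inside a for loop. Within each cycle the pipeline stages are executed in reverse order: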

/* commit entries from RUU/LSQ to architected register file */
ruu_commit();

/* service function unit release events */
ruu_release_fu();

/* ==> may have ready queue entries carried over from previous cycles */

/* service result completions, also readies dependent operations */
/* ==> inserts operations into ready queue --> register deps resolved */
ruu_writeback();

/* try to locate memory operations that are ready to execute */
/* ==> inserts operations into ready queue --> mem deps resolved */
lsq_refresh();

/* issue operations ready to execute from a previous cycle */
/* <== drains ready queue <-- ready operations commence execution */
ruu_issue();

/* decode and dispatch new operations */
/* ==> insert ops w/ no deps or all regs ready --> reg deps resolved */
ruu_dispatch();

/* call instruction fetch unit if it is not blocked */
if (!ruu_fetch_issue_delay)
  ruu_fetch();
else
  ruu_fetch_issue_delay--;

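The stages run in reverse order because sequentially executed code is being used to model hardware that operates concurrently. If the stages ran in forward order, an instruction fetched in a given cycle could be dispatched in that same cycle. In other words, the result of an earlier stage would overwrite state that the next stage was still supposed to read. Running the stages in reverse order avoids this.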
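The ordering argument can be illustrated with a minimal sketch. It assumes a hypothetical two-stage pipeline in which fetch feeds dispatch through a one-entry latch; the names `latch`, `fetch_stage`, `dispatch_stage`, `cycle_reverse` and `cycle_forward` are invented for this illustration and do not appear in sim-outorder.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-entry latch between fetch and dispatch. */
struct latch { bool full; int inst; };

/* Fetch places one instruction in the latch if there is room. */
static void fetch_stage(struct latch *l, int next_inst)
{
    assert(l != NULL);
    if (!l->full) { l->inst = next_inst; l->full = true; }
}

/* Dispatch drains the latch; returns the instruction, or -1 if it was empty. */
static int dispatch_stage(struct latch *l)
{
    assert(l != NULL);
    if (!l->full) return -1;
    l->full = false;
    return l->inst;
}

/* One simulated cycle with the stages called in reverse pipeline order. */
static int cycle_reverse(struct latch *l, int next_inst)
{
    int d = dispatch_stage(l);   /* sees only what fetch produced last cycle */
    fetch_stage(l, next_inst);
    return d;
}

/* The same cycle in forward order: fetch and dispatch collapse into one cycle. */
static int cycle_forward(struct latch *l, int next_inst)
{
    fetch_stage(l, next_inst);
    return dispatch_stage(l);    /* sees the instruction fetched this cycle */
}
```

With `cycle_reverse`, an instruction fetched in cycle N is dispatched in cycle N+1, as in real hardware; with `cycle_forward` it is dispatched in cycle N.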

Key data structures:

/* a reservation station link: this structure links elements of a RUU
   reservation station list; used for ready instruction queue, event queue, and
   output dependency lists; each RS_LINK node contains a pointer to the RUU
   entry it references along with an instance tag, the RS_LINK is only valid if
   the instruction instance tag matches the instruction RUU entry instance tag;
   this strategy allows entries in the RUU can be squashed and reused without
   updating the lists that point to it, which significantly improves the
   performance of (all to frequent) squash events */
struct RS_link {
  struct RS_link *next;       /* next entry in list */
  struct RUU_station *rs;     /* referenced RUU resv station */
  INST_TAG_TYPE tag;          /* inst instance sequence number */
  union {
    tick_t when;              /* time stamp of entry (for eventq) */
    INST_SEQ_TYPE seq;        /* inst sequence */
    int opnum;                /* input/output operand number */
  } x;
};

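RS_link is used in three places: the ready queue (ready_queue), the event queue (event_queue), and the output dependency list of each instruction in a reservation station (that is, all the other reservation stations whose inputs depend on that instruction's output).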
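The tag check described in the RS_link comment can be sketched as follows. This is a simplified illustration, not the simulator's code: `station`, `link`, `link_init`, `link_valid` and `station_squash` are hypothetical stand-ins for the real structures and macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for RUU_station and RS_link. */
struct station { unsigned int tag; };   /* bumped whenever the slot is squashed */
struct link    { struct link *next; struct station *rs; unsigned int tag; };

/* Snapshot the station's current tag into the link when the link is created. */
static void link_init(struct link *l, struct station *rs)
{
    assert(l != NULL && rs != NULL);
    l->next = NULL;
    l->rs = rs;
    l->tag = rs->tag;
}

/* A link is live only while its snapshot still matches the station's tag. */
static bool link_valid(const struct link *l)
{
    assert(l != NULL && l->rs != NULL);
    return l->tag == l->rs->tag;
}

/* Squashing only increments the tag; no list that points here is walked. */
static void station_squash(struct station *rs)
{
    assert(rs != NULL);
    rs->tag++;
}
```

Stale links left behind in the ready queue or event queue are simply skipped when they are dequeued, which is why a squash costs a single increment per RUU entry.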

/* a register update unit (RUU) station, this record is contained in the
   processors RUU, which serves as a collection of ordered reservations
   stations. The reservation stations capture register results and await
   the time when all operands are ready, at which time the instruction is
   issued to the functional units; the RUU is an order circular queue, in which
   instructions are inserted in fetch (program) order, results are stored in
   the RUU buffers, and later when an RUU entry is the oldest entry in the
   machines, it and its instruction's value is retired to the architectural
   register file in program order, NOTE: the RUU and LSQ share the same
   structure, this is useful because loads and stores are split into two
   operations: an effective address add and a load/store, the add is inserted
   into the RUU and the load/store inserted into the LSQ, allowing the add
   to wake up the load/store when effective address computation has finished */
struct RUU_station {
  /* inst info */
  md_inst_t IR;                       /* instruction bits */
  enum md_opcode op;                  /* decoded instruction opcode */
  md_addr_t PC, next_PC, pred_PC;     /* inst PC, next PC, predicted PC */
  int in_LSQ;                         /* non-zero if op is in LSQ */
  int ea_comp;                        /* non-zero if op is an addr comp */
  int recover_inst;                   /* start of mis-speculation? */
  int stack_recover_idx;              /* non-speculative TOS for RSB pred */
  struct bpred_update_t dir_update;   /* bpred direction update info */
  int spec_mode;                      /* non-zero if issued in spec_mode */
  md_addr_t addr;                     /* effective address for ld/st's */
  INST_TAG_TYPE tag;                  /* RUU slot tag, increment to
                                         squash operation */
  INST_SEQ_TYPE seq;                  /* instruction sequence, used to
                                         sort the ready list and tag inst */
  unsigned int ptrace_seq;            /* pipetrace sequence number */
  int slip;

  /* instruction status */
  int queued;                         /* operands ready and queued */
  int issued;                         /* operation is/was executing */
  int completed;                      /* operation has completed execution */

  /* output operand dependency list, these lists are used to
     limit the number of associative searches into the RUU when
     instructions complete and need to wake up dependent insts */
  int onames[MAX_ODEPS];                  /* output logical names (NA=unused) */
  struct RS_link *odep_list[MAX_ODEPS];   /* chains to consuming operations */

  /* input dependent links, the output chains rooted above use these
     fields to mark input operands as ready, when all these fields have
     been set non-zero, the RUU operation has all of its register
     operands, it may commence execution as soon as all of its memory
     operands are known to be read (see lsq_refresh() for details on
     enforcing memory dependencies) */
  int idep_ready[MAX_IDEPS];              /* input operand ready? */
};

/*
 * the create vector maps a logical register to a creator in the RUU (and
 * specific output operand) or the architected register file (if RS_link
 * is NULL)
 */

/* an entry in the create vector */
struct CV_link {
  struct RUU_station *rs;   /* creator's reservation station */
  int odep_num;             /* specific output operand */
};

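Each register corresponds to zero or one CV_link structure. CV stands for create vector: the entry names the most recent reservation station that produces the register's value.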
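A minimal sketch of how such a create vector behaves is given below. It is an illustration under assumptions: the register count `NUM_REGS` and the helpers `cv_set`, `cv_clear_if_creator` and `cv_creator` are hypothetical names, not functions from sim-outorder.c.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_REGS 8   /* hypothetical register count for this sketch */

struct station { int id; };
struct cv_entry { struct station *rs; int odep_num; };

/* Zero-initialised: every register starts out owned by the register file. */
static struct cv_entry create_vector[NUM_REGS];

/* A newly dispatched producer becomes the latest creator of reg. */
static void cv_set(int reg, struct station *rs, int odep_num)
{
    assert(reg >= 0 && reg < NUM_REGS);
    create_vector[reg].rs = rs;
    create_vector[reg].odep_num = odep_num;
}

/* At writeback, clear the entry only if rs is still the latest creator. */
static void cv_clear_if_creator(int reg, const struct station *rs)
{
    assert(reg >= 0 && reg < NUM_REGS);
    if (create_vector[reg].rs == rs) {
        create_vector[reg].rs = NULL;
        create_vector[reg].odep_num = 0;
    }
}

/* NULL means the value lives in the architected register file. */
static struct station *cv_creator(int reg)
{
    assert(reg >= 0 && reg < NUM_REGS);
    return create_vector[reg].rs;
}
```

The conditional clear matters when two in-flight instructions write the same register: the older one completing must not erase the entry that now belongs to the younger one.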

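There are two sets of registers: regs and spec_regs_R (likewise spec_regs_F and spec_regs_C). The former is the real architected register file; the latter holds register values during speculative execution.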

struct regs_t {
  md_gpr_t regs_R;      /* (signed) integer register file */
  md_fpr_t regs_F;      /* floating point register file */
  md_ctrl_t regs_C;     /* control register file */
  md_addr_t regs_PC;    /* program counter */
  md_addr_t regs_NPC;   /* next-cycle program counter */
};

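In non-speculative mode, instructions are actually executed during the decode step of dispatch, including their register reads and writes, and those accesses go to regs. In speculative mode, i.e. when spec_mode is true, the accessors behave as follows: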

#define GPR(N) (BITMAP_SET_P(use_spec_R, R_BMAP_SZ, (N)) \
                ? spec_regs_R[N] \
                : regs.regs_R[N])

#define SET_GPR(N,EXPR) (spec_mode \
                ? ((spec_regs_R[N] = (EXPR)), \
                   BITMAP_SET(use_spec_R, R_BMAP_SZ, (N)), \
                   spec_regs_R[N]) \
                : (regs.regs_R[N] = (EXPR)))

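That is, a register write in speculative mode goes to spec_regs_R and sets the register's bit in the use_spec_R bitmap. A register read returns spec_regs_R if that bit is set, and the ordinary regs value otherwise.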
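The same read/write rule can be written as plain functions. The sketch below is an illustration only: it assumes a hypothetical 32-register file and a single 32-bit word as the bitmap, and the names `arch_regs`, `spec_regs`, `use_spec`, `get_gpr`, `set_gpr` and `spec_recover` are invented here rather than taken from the simulator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_REGS 32  /* hypothetical register count */

static int32_t  arch_regs[NUM_REGS];   /* plays the role of regs.regs_R */
static int32_t  spec_regs[NUM_REGS];   /* plays the role of spec_regs_R */
static uint32_t use_spec;              /* one bit per register, like use_spec_R */
static bool     spec_mode;

/* Read: prefer the speculative copy if this register was written speculatively. */
static int32_t get_gpr(int n)
{
    assert(n >= 0 && n < NUM_REGS);
    return (use_spec & (UINT32_C(1) << n)) ? spec_regs[n] : arch_regs[n];
}

/* Write: in speculative mode, touch only the speculative copy and mark it. */
static void set_gpr(int n, int32_t v)
{
    assert(n >= 0 && n < NUM_REGS);
    if (spec_mode) {
        spec_regs[n] = v;
        use_spec |= UINT32_C(1) << n;
    } else {
        arch_regs[n] = v;
    }
}

/* Recovery after a misprediction: forget every speculative value at once. */
static void spec_recover(void)
{
    use_spec = 0;
    spec_mode = false;
}
```

Because the architected copy is never touched while speculating, recovery reduces to clearing the bitmap; no register values need to be restored.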

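Data flow of each stage:

ruu_fetch: code text -> fetch_data, i.e. instructions are read from the code segment into the fetch queue.

ruu_dispatch: fetch_data -> RUU (-> ready_queue, state queued), i.e. instructions move from the fetch queue into reservation stations. An ordinary arithmetic instruction is issued immediately if its operands are ready, and so is a store whose operands are ready (here "issued" means entering ready_queue). Load/store instructions and long-latency instructions are placed at the front of the ready queue; all others are inserted in instruction-sequence order.

ruu_issue: ready_queue -> event_queue (RS_link nodes pointing at reservation stations), state issued. Ready instructions in ready_queue begin execution. A store completes immediately (the actual memory access happens at commit). A load checks the addresses of earlier stores: on a match the latency is 1; otherwise the cache is accessed and its latency recorded. In both cases an event is scheduled to fire after that latency. For an ordinary arithmetic instruction the latency is its execution time, and an event is scheduled to fire after that latency.

lsq_refresh: LSQ -> ready_queue. Updates the LSQ, handling WAW and WAR. A later store to the same address supersedes an earlier one; if none of the stores ahead of a load has the same address, the load is issued.

ruu_writeback: event_queue -> the event is removed and the rs state is set to completed. If the instruction is a mispredicted branch, every instruction that entered the RUU after it is squashed, the writes to memory and registers made during speculative execution are undone, the top of the return address stack is restored to the index saved for that branch (stack_recover_idx), and the branch penalty is applied. If the branch predictor is configured to update at writeback, it is updated here. The instruction's output dependency lists are cleared, the operands that depend on its outputs are marked ready, and any instruction whose operands are now all ready enters ready_queue.

ruu_release_fu: decrements the busy value of every busy resource in the resource pool by 1.

ruu_commit: removes completed instructions from the RUU and LSQ. For a store, the memory access is now performed for real and the data written to the cache, but no event is generated (every instruction that uses the data at that address has already received it). If the branch predictor is configured to update at commit, it is updated here. (Register writes were already performed at dispatch.)

The branch predictor update can take place at dispatch, at writeback, or at commit.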

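ruu_fetch():

while (the IFQ is not full; at most ruu_decode_width * fetch_speed instructions are fetched)
| /* fetch an instruction at the next predicted fetch address */
| fetch_regs_PC = fetch_pred_PC;
| if (the PC is valid)
| | read the instruction from memory into inst (the cache only models the access; it holds no data)
| | if a cache and TLB exist
| | | simulate the cache and TLB access to obtain the fetch latency lat
| | | if lat != cache_il1_lat
| | | | stall fetch: ruu_fetch_issue_delay += lat - 1;
| else
| | the instruction is a NOP
| if a branch predictor pred exists
| | get the opcode op
| | if op is a control instruction
| | | fetch_pred_PC = the address predicted by the predictor (stack_recover_idx is obtained at the same time)
| | else
| | | fetch_pred_PC = the address of the next sequential instruction
| else
| | fetch_pred_PC = the address of the next sequential instruction
| the current instruction enters the fetch queue:
| fetch_data[fetch_tail].IR = inst;
| fetch_data[fetch_tail].regs_PC = fetch_regs_PC;
| fetch_data[fetch_tail].pred_PC = fetch_pred_PC;
| fetch_data[fetch_tail].stack_recover_idx = stack_recover_idx;
| fetch_data[fetch_tail].ptrace_seq = ptrace_seq++;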

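ruu_dispatch():

while (the fetch queue is not empty, the RUU and LSQ are not full, and the per-cycle limit has not been reached)
| in in-order mode, exit if the operands of the last instruction are not ready
| // take the entry at the head of the fetch queue:
| inst = fetch_data[fetch_head].IR;
| regs.regs_PC = fetch_data[fetch_head].regs_PC;
| pred_PC = fetch_data[fetch_head].pred_PC;
| dir_update_ptr = &(fetch_data[fetch_head].dir_update);
| stack_recover_idx = fetch_data[fetch_head].stack_recover_idx;
| pseq = fetch_data[fetch_head].ptrace_seq;
| regs.regs_NPC = regs.regs_PC + sizeof(md_inst_t);
| // decode and actually execute the instruction
| switch (op)
| // next PC differs from PC+4, i.e. the branch was taken
| br_taken = (regs.regs_NPC != (regs.regs_PC + sizeof(md_inst_t)));
| // predicted PC differs from PC+4, i.e. the branch was predicted taken
| br_pred_taken = (pred_PC != (regs.regs_PC + sizeof(md_inst_t)));
| if (mispredicted under perfect prediction, or a direct jump predicted taken with the wrong target (an obvious error))
| | fix up the next PC and the fetch queue, and set the fetch delay to the branch penalty
| | fetch_redirected = TRUE; // record that fetch has been redirected
| if the opcode is not a NOP
| | fill in the fields of reservation station rs
| | if the operation is a memory access
| | | rs->op = MD_AGEN_OP; // an add that computes the address
| | | rs->ea_comp = TRUE;
| | | fill in the fields of the lsq entry
| | | set the input/output dependencies of rs and lsq
| | | ruu_link_idep(rs, 0, NA); // which register's CV_link rs depends on
| | | ruu_link_idep(rs, 1, in2);
| | | ruu_link_idep(rs, 2, in3);
| | | /* install output after inputs to prevent self reference */
| | | ruu_install_odep(rs, 0, DTMP); // create the register's CV_link
| | | if the operands of rs are ready
| | | | readyq_enqueue(rs);
| | | if the operands of lsq are ready (store instructions only!)
| | | | readyq_enqueue(lsq);
| | else
| | | set the input/output dependencies of rs
| | | if the operands of rs are ready
| | | | readyq_enqueue(rs);
| else NOP: rs = NULL
| if not currently in speculative mode
| | if the instruction is a branch and the predictor is configured to update at dispatch
| | | update the branch predictor tables
| | if (pred_PC != regs.regs_NPC && !fetch_redirected)
| | | // the prediction was wrong and has not been corrected
| | | spec_mode = TRUE; // begin speculative execution
| | | rs->recover_inst = TRUE;
| | | recover_PC = regs.regs_NPC;
end while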
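The three comparisons in the dispatch loop (br_taken, br_pred_taken, and the test that switches to speculative mode) can be isolated into a small pure function. This is a sketch under assumptions: it uses a hypothetical fixed 4-byte instruction size in place of sizeof(md_inst_t), and `branch_check` and `check_branch` are names invented for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INST_SIZE 4u  /* hypothetical fixed size; the listing uses sizeof(md_inst_t) */

struct branch_check {
    bool taken;        /* actual next PC is not the fall-through address */
    bool pred_taken;   /* predicted PC is not the fall-through address */
    bool enter_spec;   /* dispatch must switch to speculative mode */
};

/* pc: address of the instruction; next_pc: architecturally correct next PC;
   pred_pc: PC chosen at fetch; fetch_redirected: decode already fixed the PC. */
static struct branch_check check_branch(uint32_t pc, uint32_t next_pc,
                                        uint32_t pred_pc, bool fetch_redirected)
{
    struct branch_check r;
    assert(pc <= UINT32_MAX - INST_SIZE);   /* avoid wrap-around in the sketch */
    uint32_t fall_through = pc + INST_SIZE;

    r.taken      = (next_pc != fall_through);
    r.pred_taken = (pred_pc != fall_through);
    /* Wrong prediction that decode did not already repair: start speculating. */
    r.enter_spec = (pred_pc != next_pc) && !fetch_redirected;
    return r;
}
```

For example, a taken branch at 0x1000 to 0x2000 that fetch predicted as not taken (0x1004) yields `taken = true`, `pred_taken = false`, `enter_spec = true`; the correct target would then be remembered as recover_PC.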

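ruu_issue:

node = ready_queue; // take over the whole ready_queue list
ready_queue = NULL; // and reset ready_queue to empty
for (while the issue width, default 4, has not been reached)
| if node is valid
| | get rs from node and set rs->queued = FALSE;
| | if it is a store
| | | it completes immediately: set the rs state to completed
| | else // not a store
| | | if the instruction needs a functional unit (fu)
| | | | if a matching fu was obtained: issue it (its event fires after the latency)
| | | | else put rs back on the ready queue
| | | else no fu is needed: post the result straight to the event queue with latency 1
| | end if
| end if (node valid)
| RSLINK_FREE(node); // free the processed node
end for
for (each remaining node)
| put the ready instruction from the node list back on ready_queue
end for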
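The drain-and-requeue pattern of the issue stage can be sketched in isolation. The code below is a simplification under stated assumptions: functional units are modelled as a plain counter `free_fus`, stores and latencies are left out, and `op` and `issue_cycle` are hypothetical names rather than the simulator's own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct op {
    struct op *next;
    bool needs_fu;   /* false: the result is posted directly, no unit needed */
    bool issued;
};

/* Detach the ready list, issue at most `width` operations, and return a new
   ready list holding everything that could not issue this cycle.
   free_fus is the number of functional units available this cycle. */
static struct op *issue_cycle(struct op *ready, int width, int free_fus)
{
    struct op *leftover = NULL, **tail = &leftover;
    int n_issued = 0;
    assert(width >= 0 && free_fus >= 0);

    while (ready != NULL) {
        struct op *o = ready;
        ready = o->next;
        o->next = NULL;

        bool can_issue = n_issued < width && (!o->needs_fu || free_fus > 0);
        if (can_issue) {
            if (o->needs_fu)
                free_fus--;
            o->issued = true;
            n_issued++;
        } else {
            *tail = o;          /* requeue, preserving the original order */
            tail = &o->next;
        }
    }
    return leftover;
}
```

Detaching the list first means that operations put back during the scan cannot be picked up again in the same cycle, which mirrors the reason the listing sets ready_queue to NULL before the loop.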

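lsq_refresh:

std_unknowns[MAX_STD_UNKNOWNS]; an array of addresses whose values are not yet known (a store to them has not completed)
for each operation in the LSQ
| if it is a store
| | if its address is not ready
| | | stop: a store with an unknown address blocks all later loads and stores
| | else if its operands are not ready
| | | record the store's address in std_unknowns
| | else the operands and the address are both ready: clear any matching address from std_unknowns
| end if
| if it is a load with queued == false that has not issued, has not completed, and whose operands are ready
| | check std_unknowns for an address equal to the load's address; if there is none, the load enters the ready queue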
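The scan above translates fairly directly into code. The following is a sketch, not the simulator's implementation: the entry layout `lsq_entry`, the bound `MAX_UNKNOWNS` and the function `lsq_scan` are assumptions made for the illustration, and a load's "operands ready" is modelled simply as its address being ready.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_UNKNOWNS 16  /* hypothetical bound, stands in for MAX_STD_UNKNOWNS */

struct lsq_entry {
    bool     is_store;
    bool     addr_ready;   /* effective address has been computed */
    bool     data_ready;   /* store data operand is available */
    bool     pending;      /* load not yet queued, issued, or completed */
    uint32_t addr;
    bool     made_ready;   /* output: load may enter the ready queue */
};

/* Walk the LSQ oldest-first and mark the loads that may issue.
   Returns the number of loads released. */
static int lsq_scan(struct lsq_entry *q, size_t n)
{
    uint32_t unknown[MAX_UNKNOWNS];
    size_t n_unknown = 0;
    int released = 0;

    for (size_t i = 0; i < n; i++) {
        struct lsq_entry *e = &q[i];
        if (e->is_store) {
            if (!e->addr_ready)
                break;   /* unknown address blocks everything younger */
            if (!e->data_ready) {
                assert(n_unknown < MAX_UNKNOWNS);
                unknown[n_unknown++] = e->addr;   /* address known, value not */
            } else {
                /* This store supplies the value: drop matching addresses. */
                size_t k = 0;
                for (size_t j = 0; j < n_unknown; j++)
                    if (unknown[j] != e->addr)
                        unknown[k++] = unknown[j];
                n_unknown = k;
            }
        } else if (e->pending && e->addr_ready) {
            bool conflict = false;
            for (size_t j = 0; j < n_unknown; j++)
                if (unknown[j] == e->addr) { conflict = true; break; }
            if (!conflict) { e->made_ready = true; released++; }
        }
    }
    return released;
}
```

A load is therefore held back in exactly two situations: an older store has no address yet, or an older store to the same address has no data yet and no later store has supplied it.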

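ruu_writeback:

while (rs = eventq_next_event()) // an event is due in the current sim_cycle
| set the rs state to completed
| if (rs->recover_inst), i.e. the instruction is a mispredicted branch
| | ruu_recover // squash every instruction after it in the RUU and LSQ
| | tracer_recover // discard the speculative register values, the speculative writes to memory, and the fetch queue; leave speculative mode and set fetch_pred_PC = fetch_regs_PC = recover_PC;
| | bpred_recover // pred->retstack.tos = stack_recover_idx;
| | set the branch misprediction delay (3)
| end if
| if the branch predictor is configured to update at this stage (WB), update it
| for each output of the instruction
| | if the create vector entry of the output register still names this instruction as its creator
| | | clear that create vector entry
| | for each instruction that depends on this result, mark the operand ready
| | | if all of its operands are now ready and (it is not a memory instruction, or it is a store)
| | | | add the instruction to the ready queue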

ruu_release_fu: for each resource in the resource pool, if busy is non-zero, decrement it by 1.
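A minimal sketch of this busy-counter scheme follows. The names `res_unit`, `release_fu` and `get_fu` are hypothetical and the pool is reduced to a flat array; the real resource pool also distinguishes resource classes, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>

struct res_unit { int busy; };   /* cycles until the unit is free; 0 = free */

/* Called once per cycle: age every busy unit by one cycle. */
static void release_fu(struct res_unit *pool, size_t n)
{
    assert(pool != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (pool[i].busy > 0)
            pool[i].busy--;
}

/* Claim the first free unit and keep it busy for issue_lat cycles.
   Returns its index, or -1 if every unit is busy. */
static int get_fu(struct res_unit *pool, size_t n, int issue_lat)
{
    assert(pool != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (pool[i].busy == 0) {
            pool[i].busy = issue_lat;
            return (int)i;
        }
    return -1;
}
```

Counting down per cycle avoids a separate release event for each unit: a unit becomes available again as soon as its counter reaches zero.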

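ruu_commit:

while (the RUU is not empty and the commit width has not been exceeded)
| get rs
| if the instruction is an address computation (the LSQ holds the corresponding load/store)
| | if the load/store in the LSQ has not completed, break
| | if the instruction is a store
| | | request a store port, i.e. an fu
| | | if an fu was obtained
| | | | set the fu's busy value to the issue latency (issuelat)
| | | | access the TLB and cache to write the data back
| | | else there is no store port available: break
| | end if
| | remove the first LSQ entry (a load has already completed)
| if the branch predictor is configured to update at this stage (CT), update it
| remove the first RUU entry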

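1. All jump instructions need a functional unit. However, if a direct jump is predicted taken and the predicted target differs from the address encoded in the instruction, the prediction is obviously wrong, so the fetch queue can be flushed and the branch penalty applied as early as the decode stage.

2. Macro expansion in the decode stage: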

switch (op)
{
#define DEFINST(OP,MSK,NAME,OPFORM,RES,CLASS,O1,O2,I1,I2,I3) \
case OP: \
  /* compute output/input dependencies to out1-2 and in1-3 */ \
  out1 = O1; out2 = O2; \
  in1 = I1; in2 = I2; in3 = I3; \
  /* execute the instruction */ \
  SYMCAT(OP,_IMPL); \
  break;
/* ... omitted ... */
#include "machine.def"
default:
}

machine.def contains:

#define LDA_IMPL \
  { \
    SET_GPR(RA, GPR(RB) + SEXT(OFS)); \
  }

DEFINST(LDA, 0x08,
        "lda", "a,o(b)",
        IntALU, F_ICOMP,
        DGPR(RA), DNA, DNA, DGPR(RB), DNA)

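Here the #define DEFINST(...) line is only a macro definition, so by itself it contributes nothing to the body of the switch. The #include "machine.def" line pulls in machine.def, which is equivalent to pasting the contents of that file into the switch statement; the #define LDA_IMPL inside it is likewise only a definition and contributes nothing. The DEFINST(LDA, ...) entry matches the DEFINST definition and expands into a case statement, and the SYMCAT(OP,_IMPL) in that expansion becomes LDA_IMPL, which in turn expands to the body defined in machine.def. The switch statement therefore becomes: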

switch (op)
case LDA:
  out1 = DGPR(RA); out2 = DNA;
  in1 = DNA; in2 = DGPR(RB); in3 = DNA;
  { SET_GPR(RA, GPR(RB) + SEXT(OFS)); };
  break;
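The same technique, often called an X-macro, can be tried out in a self-contained form. The sketch below is an illustration under assumptions: the instruction table is an inline macro `INST_TABLE` rather than an included machine.def, DEFINST takes a single argument instead of eleven, and the opcodes `OP_ADDI` and `OP_NEG`, the local `SYMCAT` definition and the function `execute` are all invented for the example.

```c
#include <assert.h>

/* Stand-in for machine.def: each entry is one DEFINST invocation.
   A real .def file would be pulled in with #include instead. */
#define INST_TABLE \
    DEFINST(OP_ADDI) \
    DEFINST(OP_NEG)

/* Glue two tokens together, as SYMCAT does in the listing above. */
#define SYMCAT(a, b) a##b

/* Per-instruction semantics, named <OP>_IMPL by convention. */
#define OP_ADDI_IMPL  { result = src + imm; }
#define OP_NEG_IMPL   { result = -src; }

/* First expansion of the table: build the opcode enum. */
enum opcode {
#define DEFINST(OP) OP,
    INST_TABLE
#undef DEFINST
    OP_COUNT
};

/* Second expansion of the table: build the switch that executes each opcode. */
static int execute(enum opcode op, int src, int imm)
{
    int result = 0;
    assert(op < OP_COUNT);
    switch (op) {
#define DEFINST(OP) case OP: SYMCAT(OP, _IMPL); break;
        INST_TABLE
#undef DEFINST
    default:
        break;
    }
    return result;
}
```

Calling `execute(OP_ADDI, 40, 2)` runs the body of `OP_ADDI_IMPL` through the generated case label. Redefining DEFINST before each expansion is what lets one table drive both the enum and the switch, which is the property the decode stage relies on.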
