课程回顾

Swarthmore学院16年开的编译系统课,总共10次大作业。本随笔记录了相关的课堂笔记以及第9次大作业。

  • 赋值的副作用:循环元组



    下面的代码展示了Python3是如何处理循环列表(print和==):
    >>> x = [1, 2]
    >>> x[1] = x
    >>> print(x)
    [1, [...]]
    >>> y = [1, 2]
    >>> y[1] = y
    >>> x is y
    False
    >>> x is x
    True
    >>> y is y
    True
    >>> x == y
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    RecursionError: maximum recursion depth exceeded in comparison
  • 内存管理(Memory Management):维护一个free list数据结构,或者将元数据存储到堆中,用来标记空闲的空间。

  • 自动内存管理(Automatic Memory Management)

    维护引用数



    维护标记



  • 标记缩并垃圾回收算法(mark/compact algorithm)

    下图展示了f((7, 8))执行时的堆栈情况,这时候内存空间已满,需要进行垃圾收集。



    下图展示了f((7, 8))(12)执行时的堆栈情况



    下图是回收算法的实现细节

  • General GC

    在eden区域中,使用semispace swapping将live的data拷贝到堆区的下半部分;如果超过了移动次数,将其拷贝到G1区域,依次进行;

    在tenured区域中使用mark/compact算法进行垃圾回收,这个区域中的数据不会频繁的被销毁或移动。

编程作业

Mark

In the first phase, we take the initial heap and stack, and set all the GC words of live data to 1. The live data is all the data reachable from references on the stack, excluding return pointers and base pointers (which don't represent data on the heap). We can do this by looping over the words on the stack, and doing a depth-first traversal of the heap from any reference values (pairs or closures) that we find.

The stack_top and first_frame arguments to mark point to the top of the stack, which contains a previous base pointer value, followed by a return pointer. If f had local variables, then they would cause stack_top to point higher. stack_bottom points at the highest (in terms of number) word on the stack. Thus, we want to loop from stack_top to stack_bottom, traversing the heap from each reference we find. We also want to skip the words corresponding to first_frame and the word after it, and each pair of base pointer and return pointer from there down (if there are multiple function calls active).

Along the way, we also keep track of the highest start address of a live value to use later, which in this case is 64, the address of the closure

int get_heap_index(int* current_addr, int offset, int* heap_start) {
return (*current_addr + offset - (int)heap_start) / 4;
} void mark_dfs(int* current_addr, int* heap_start, int* max_index) {
// pair: [ tag ][ GC word ][ value ][ value ]
if ((*current_addr & 0x00000007) == 1) {
DEBUG_PRINT("pair: %#010x\n", *current_addr - 1);
int index = get_heap_index(current_addr, -1, heap_start);
if (!heap_start[index + 1]) {
heap_start[index + 1] = 0x1;
if (index > *max_index) *max_index = index;
mark_dfs(&heap_start[get_heap_index(current_addr, 7, heap_start)], heap_start, max_index);
mark_dfs(&heap_start[get_heap_index(current_addr, 11, heap_start)], heap_start, max_index);
}
}
// closure: [ tag ][ GC word ][ varcount = N ][ arity ][ code ptr ][[ N vars' data ]][ maybe padding ]
else if ((*current_addr & 0x00000007) == 5) {
DEBUG_PRINT("closure: %#010x\n", *current_addr - 5);
int index = get_heap_index(current_addr, -5, heap_start);
if (!heap_start[index + 1]) {
heap_start[index + 1] = 0x1;
if (index > *max_index) *max_index = index;
int arity = heap_start[index + 3];
for (int i = 0; i < arity; i++) {
mark_dfs(&heap_start[get_heap_index(current_addr, i * 4 + 15, heap_start)], heap_start, max_index);
}
}
}
} int* mark(int* stack_top, int* first_frame, int* stack_bottom, int* heap_start) {
int max_index = 0;
for (int* stack_current = stack_top; stack_current != stack_bottom; stack_current++) {
// skip the words corresponding to first_frame and the word after it,
// and each pair of base pointer and return pointer from there down (if there are multiple function calls active).
if (*stack_current != *first_frame && *stack_current != *(first_frame + 1)) {
DEBUG_PRINT("stack_current: %#010x\n", *stack_current);
mark_dfs(stack_current, heap_start, &max_index);
}
} return heap_start + max_index;
}

Forward

To set up the forwarding of values, we traverse the heap starting from the beginning (heap_start). We keep track of two pointers, one to the next space to use for the eventual location of compacted data, and one to the currently-inspected value.

For each value, we check if it's live, and if it is, set its forwarding address to the current compacted data pointer and increase the compacted pointer by the size of the value. If it is not live, we simply continue onto the next value – we can use the tag and other metadata to compute the size of each value to determine which address to inspect next. The traversal stops when we reach the max_address stored above (so we don't accidentally treat the undefined data in spaces 72-80 as real data).

Then we traverse all of the stack and heap values again to update any internal pointers to use the new addresses.

In this case, the closure is scheduled to move from its current location of 64 to a new location of 16. So its forwarding pointer is set, and both references to it on the stack are also updated. The first tuple is already in its final position (starting at 0), so while its forwarding pointer is set, references to it do not change.

void forward_dfs(int* current_addr, int* heap_start) {
if ((*current_addr & 0x00000007) == 1) {
int index = get_heap_index(current_addr, -1, heap_start);
int gc_tag = heap_start[index + 1];
if (gc_tag != 0x0 && gc_tag != 0x1) {
DEBUG_PRINT("gc_tag: %p\n", (int *)gc_tag);
forward_dfs(&heap_start[get_heap_index(current_addr, 7, heap_start)], heap_start);
forward_dfs(&heap_start[get_heap_index(current_addr, 11, heap_start)], heap_start);
*current_addr = gc_tag;
}
} else if ((*current_addr & 0x00000007) == 5) {
int index = get_heap_index(current_addr, -5, heap_start);
int gc_tag = heap_start[index + 1];
if (gc_tag != 0x0 && gc_tag != 0x1) {
int arity = heap_start[index + 3];
for (int i = 0; i < arity; i++) {
forward_dfs(&heap_start[get_heap_index(current_addr, i * 4 + 15, heap_start)], heap_start);
*current_addr = gc_tag;
}
}
}
} void forward(int* stack_top, int* first_frame, int* stack_bottom, int* heap_start, int* max_address) {
int* p = heap_start; // current compacted data pointer
int* q = p; // current inspected value pointer while (q <= max_address) {
int tag = *q;
int gc_word = *(q + 1);
DEBUG_PRINT("tag: %d\n", tag);
DEBUG_PRINT("gc_word: %d\n", gc_word);
// pair
if ((tag & 0x00000007) == 1) {
if (gc_word) {
*(q + 1) |= (int)p;
p += 4;
}
q += 4;
}
// closure
else if ((tag & 0x00000007) == 5) {
int frees = *(q + 3);
int needed_space = 5 + frees;
int with_padding = needed_space + needed_space % 2;
if (gc_word) {
*(q + 1) |= (int)p;
p += with_padding;
}
q += with_padding;
}
} // traverse all of the stack and heap values again to update any internal pointers to use the new addresses.
for (int* stack_current = stack_top; stack_current != stack_bottom; stack_current++) {
if (*stack_current != *first_frame && *stack_current != *(first_frame + 1)) {
forward_dfs(stack_current, heap_start);
}
}
return;
}

Compact

Finally, we travers the heap, starting from the beginning, and copy the values into their forwarding positions. Since all the internal pointers and stack pointers have been updated already, once the values are copied, the heap becomes consistent again. We track the last compacted address so that we can return the first free address—in this case 40—which will be returned and used as the new start of allocation. While doing so, we also zero out all of the GC words, so that the next time we mark the heap we have a fresh start.

I also highly recommend that you walk the rest of the heap and set the words to some special value. The given tests suggest overwriting each word with the value 0x0cab005e – the “caboose” of the heap. This will make it much easier when debugging to tell where the heap ends, and also stop a runaway algorithm from interpreting leftover heap data as live data accidentally.

int* compact(int* heap_start, int* max_address, int* heap_end) {
int start_index;
int offset;
int size;
for (int i = 0; (heap_start + i) <= max_address; ) {
int tag = *(heap_start + i);
int *gc_tag = heap_start + i + 1; if ((tag & 0x00000007) == 1) {
size = 4;
offset = -1;
} else if ((tag & 0x00000007) == 5) {
offset = -5;
int frees = *(heap_start + i + 3);
int needed_space = 5 + frees;
size = needed_space + needed_space % 2;
} if (*gc_tag == 0x0) {
i += size;
} else {
start_index = get_heap_index(gc_tag, offset, heap_start);
*gc_tag = 0x0;
DEBUG_PRINT("start_index: %d, i: %d\n", start_index, i);
for (int j = start_index; j < start_index + size; j++) {
DEBUG_PRINT("value: %#010x\n", *(heap_start + i + j % 4));
*(heap_start + j) = *(heap_start + i + j % size);
}
i += size;
}
}
for (int i = start_index + size; (heap_start + i) != heap_end; i++) {
*(heap_start + i) = 0x0cab005e;
} return heap_start + size;
}

Test

➜  starter-garbage git:(master) ✗ ./gctest
heap: 0x7a615160, max address: 0x7a615190
0/0x7a615160: 0x1 (1)
1/0x7a615164: 0x0 (0)
2/0x7a615168: 0x7a615171 (2053198193)
3/0x7a61516c: 0x8 (8)
4/0x7a615170: 0x1 (1)
5/0x7a615174: 0x0 (0)
6/0x7a615178: 0xa (10)
7/0x7a61517c: 0xc (12)
8/0x7a615180: 0xcab005e (212533342)
9/0x7a615184: 0xcab005e (212533342)
10/0x7a615188: 0xcab005e (212533342)
11/0x7a61518c: 0xcab005e (212533342)
12/0x7a615190: 0xcab005e (212533342)
13/0x7a615194: 0xcab005e (212533342)
14/0x7a615198: 0xcab005e (212533342)
15/0x7a61519c: 0xcab005e (212533342)
. OK (1 test)

参考资料

starter-garbage

garbageSnake

[swarthmore cs75] Compiler 6 – Garbage Snake的更多相关文章

  1. [swarthmore cs75] Compiler 6 – Fer-de-lance

    课程回顾 Swarthmore学院16年开的编译系统课,总共10次大作业.本随笔记录了相关的课堂笔记以及第8次大作业. First-class function: It treats function ...

  2. [swarthmore cs75] Compiler 5 – Egg-eater

    课程回顾 Swarthmore学院16年开的编译系统课,总共10次大作业.本随笔记录了相关的课堂笔记以及第7次大作业. 抽象语法: 存储方式: 栈中的数据如果最后三位(tag bits)是001表示元 ...

  3. [swarthmore cs75] Compiler 4 – Diamondback

    课程回顾 Swarthmore学院16年开的编译系统课,总共10次大作业.本随笔记录了相关的课堂笔记以及第6次大作业. 函数声明 增加函数声明.函数调用的抽象语法:在转换成anf之前还要检查函数声明和 ...

  4. [swarthmore cs75] Compiler 3 – Cobra

    课程回顾 Swarthmore学院16年开的编译系统课,总共10次大作业.本随笔记录了相关的课堂笔记以及第5次大作业. 增加了bool数据表示和比较运算符的支持,具体语法参考下图: 第一种int和bo ...

  5. [swarthmore cs75] Compiler 2 – Boa

    课程回顾 Swarthmore学院16年开的编译系统课,总共10次大作业.本随笔记录了相关的课堂笔记以及第4次大作业. A-Normal Form 在80年代,函数式语言编译器主要使用Continua ...

  6. [swarthmore cs75] Compiler 1 – Adder

    课程回顾 Swarthmore学院16年开的编译系统课,总共10次大作业.本随笔记录了相关的课堂笔记以及第3次大作业. 编译的过程:首先解析(parse)源代码,然后成抽象语法树(AST),再生成汇编 ...

  7. [swarthmore cs75] inlab1 — Tiny Compiler

    课程回顾 Swarthmore学院16年开的编译系统课,总共10次大作业.本随笔记录了inlab1的实践过程. tiny compiler 这个迷你的编译器可以将一个源文件,编译成可执行的二进制代码. ...

  8. [swarthmore cs75] Lab 1 — OCaml Tree Programming

    课程回顾 Swarthmore学院16年开的编译系统课,总共10次大作业.本随笔记录了相关的课堂笔记以及第2大次作业. 比较两个lists的逻辑: let rec cmp l ll = match ( ...

  9. [swarthmore cs75] Lab 0 Warmup & Basic OCaml

    课程回顾 Swarthmore学院16年开的编译系统课,总共10次大作业.本随笔记录了相关的课堂笔记以及第1次大作业. 什么是编译 编译就是执行Program->Program'转换的过程,如下 ...

随机推荐

  1. 吴裕雄 python matplotlib 绘图示例

    import matplotlib.pyplot as plt plt.scatter([1,2,3,4],[2,3,2,5])plt.title('My first plot')plt.show() ...

  2. 爬虫之scrapy扩展

    针对pipelines的扩展 from scrapy.exceptions import DropItem class CustomPipeline(object): def __init__(sel ...

  3. [ts] Property 'aaa' does not exist on type 'Window' 解决办法

    第一种: (window as any).aaa 第二种: declare global { interface Window { aaa: any; } } window.aaa = window. ...

  4. C++ 数组和字符串

    数组和字符串的基本知识 目录 一.数组的声明 二.字符串 一.数组的声明 1.1.存储在每个元素中的值得类型: 1.2.数组名: 1.3.数组中的元素数. ];//short 数组元素值的类型,a数组 ...

  5. parallel Stream 学习

    首先,我们需要了解下ForkJoinPool.ForkJoin框架是从jdk7中新特性,它同ThreadPoolExecutor一样,也实现了Executor和ExecutorService接口.它使 ...

  6. Uncaught TypeError: form.attr is not a function 解决办法

    前端form表单提交时遇到个问题,一直报错如下 首先说结论:form是个js对象,不是jQuery对象,不能用jquery对象的方法. 代码是: $(document).ready(function( ...

  7. 关于感受野 (Receptive field) 你该知道的事

    Receptive field 可中译为“感受野”,是卷积神经网络中非常重要的概念之一. 我个人最早看到这个词的描述是在 2012 年 Krizhevsky 的 paper 中就有提到过,当时是各种不 ...

  8. python测试开发django-2.templates模板与html页

    前言 Django 中的视图的概念是一类具有相同功能和模板的网页的集合.通俗一点来说,就是你平常打开浏览器,看到浏览器窗口展示出来的页面内容,那就是视图.前面一章通过浏览器访问http://127.0 ...

  9. @ManyToMany的学习

    新建了一个manyToMany的学生-课程表,学生可以选择多个课程,多个课程可以被学生选.尝试如下代码,创建了两个list,然后新建对象,加入list,然后set<list>,报错,这是因 ...

  10. Tomcat start.bat闪退:JRE_HOME环境变量配置不对

    最近在配置Tomcat上遇到startup.bat启动闪退,在网上找了很多方法,都没解决.后来在网上找到两种问题,更改了两次,解决了.现将我遇到的问题分享,希望对遇到同样问题有帮助. 1.很多初学者对 ...