《CS:APP》 Chapter 9 Virtual Memory Notes
Virtual Memory
In order to manage memory more efficiently and with fewer errors, modern systems provide an abstraction of main memory known as virtual memory (VM). Virtual memory is an elegant interaction of hardware exceptions,
hardware address translation, main memory, disk files, and kernel software that provides each process with a large, uniform, and private address space.
With one clean mechanism, virtual memory provides three important capabilities.
(1) It uses main memory efficiently by treating it as a cache for an address space stored on disk, keeping only the active areas in main memory, and transferring data back and forth between disk and memory as needed.
(2) It simplifies memory management by providing each process with a uniform address space.
(3) It protects the address space of each process from corruption by other processes.
Preamble out of the way: I'd still recommend reading the memory management chapters of 《Modern Operating Systems》 (MOS) first.
9.1 Physical and Virtual Addressing
The task of converting a virtual address to a physical one is known as address translation.
9.2 Address Spaces
An address space is an ordered set of nonnegative integer addresses
{0, 1, 2, ...}
If the integers in the address space are consecutive, then we say that it is a linear address space.
In a system with virtual memory, the CPU generates virtual addresses from an address space of N = 2^n addresses called the virtual address space:
{0, 1, 2, ..., N − 1}
The size of an address space is characterized by the number of bits that are needed to represent the largest address. For example, a virtual address space with N = 2^n addresses is called an n-bit address space.
Modern systems typically support either 32-bit or 64-bit virtual address spaces.
A system also has a physical address space that corresponds to the M bytes of physical memory in the system:
{0, 1, 2, ..., M − 1}
M is not required to be a power of 2, but to simplify the discussion we will assume that M = 2^m.
9.3 VM as a Tool for Caching
Conceptually, a virtual memory is organized as an array of N contiguous byte-sized cells stored on disk. Each byte has a unique virtual address that serves as an index into the array.
Unallocated:
Pages that have not yet been allocated (or created) by the VM system. Unallocated blocks do not have any data associated with them, and thus do not occupy any space on disk.
Cached:
Allocated pages that are currently cached in physical memory.
Uncached:
Allocated pages that are not cached in physical memory.
9.3.1 DRAM Cache Organization
Because of the large miss penalty and the expense of accessing the first byte, virtual pages tend to be large, typically 4 KB to 2 MB.
Finally, because of the large access time of disk, DRAM caches always use write-back instead of write-through.
9.3.2 Page Tables
As with any cache, the VM system must have some way to determine if a virtual page is cached somewhere in DRAM. If so, the system must determine which physical page it is cached in. If there is a miss, the system
must determine where the virtual page is stored on disk, select a victim page in physical memory, and
copy the virtual page from disk to DRAM, replacing the victim page.
A page table is a data structure stored in physical memory that maps virtual pages to physical pages.
If the valid bit is not set, then a null address indicates that the virtual page has not yet been allocated. Otherwise, the address points to the start of the virtual page on
disk.
9.3.3 Page Hits
Consider what happens when the CPU reads a word of virtual memory contained in VP 2, which is cached in DRAM (Figure 9.5).
Since the valid bit is set, the address translation hardware knows that VP 2 is cached in memory. So it uses the physical memory address in the PTE (which points to the start of the cached page in PP 1) to
construct the physical address of the word.
9.3.4 Page Faults
In virtual memory parlance, a DRAM cache miss is known as a page fault. Figure 9.6 shows the state of our example page table before the fault. The CPU has referenced a word
in VP 3, which is not cached in DRAM. The address translation hardware reads PTE 3 from memory, infers from the valid bit that VP 3 is not cached, and triggers a page fault exception.
The page fault exception invokes a page fault exception handler in the kernel, which selects a victim page, in this case VP 4 stored in PP 3. If VP 4 has been modified, then the kernel copies it back to disk.
In either case, the kernel modifies the page table entry for VP 4 to reflect the fact that VP 4 is no longer cached in main memory.
Next, the kernel copies VP 3 from disk to PP 3 in memory, updates PTE 3, and then returns. When the handler returns, it restarts the faulting instruction, which resends the faulting virtual address to the address
translation hardware.
But now, VP 3 is cached in main memory, and the page hit is handled normally by the address translation hardware. Figure 9.7 shows the state of our example page table after the page fault.
In virtual memory parlance, blocks are known as pages. The activity of transferring a page between disk and memory is known as swapping or paging. Pages are swapped in (paged in) from disk to DRAM, and swapped out (paged out) from DRAM to disk. The strategy of waiting until the last moment to swap in a page, when a miss occurs, is known as demand paging.
9.4 VM as a Tool for Memory Management
In fact, operating systems provide a separate page table, and thus a separate virtual address space, for each process.
Figure 9.9 shows the basic idea. In the example, the page table for process i maps VP 1 to PP 2 and VP 2 to PP 7. Similarly, the page table for process j maps VP 1 to PP 7 and VP 2 to PP 10. Notice that multiple
virtual pages can be mapped to the same shared physical page.
9.5 VM as a Tool for Memory Protection
A user process should not be allowed to modify its read-only text section. Nor should it be allowed to read or modify any of the code and data structures in the kernel. It should not be allowed to read or write
the private memory of other processes, and it should not be allowed to modify any virtual pages that are shared with other processes, unless all parties explicitly allow it (via calls to explicit interprocess communication system calls).
As we have seen, providing separate virtual address spaces makes it easy to isolate the private memories of different processes. But the address translation mechanism can be extended in a natural way to provide
even finer access control. Since the address translation hardware reads a PTE each time the CPU generates an address, it is straightforward to control access to the contents of a virtual page by adding some additional permission bits to the PTE. Figure 9.10
shows the general idea.
In this example, we have added three permission bits to each PTE. The SUP bit indicates whether processes must be running in kernel (supervisor) mode to access the page.
9.6 Address Translation
A control register in the CPU, the page table base register (PTBR), points to the current page table. The n-bit virtual address has two components: a p-bit virtual page offset (VPO) and an (n − p)-bit
virtual page number (VPN). The MMU uses the VPN to select the appropriate PTE. For example, VPN 0 selects PTE 0, VPN 1
selects PTE 1, and so on. The corresponding physical address is the concatenation of the physical page number (PPN) from the page table entry and the VPO from the virtual address. Notice that since the physical and virtual pages are both P bytes, the physical page offset (PPO) is identical to the VPO.
Figure 9.13(a) shows the steps that the CPU hardware performs when there is a page hit.
Step 1: The processor generates a virtual address and sends it to the MMU.
Step 2: The MMU generates the PTE address and requests it from the cache/main memory.
Step 3: The cache/main memory returns the PTE to the MMU.
Step 4: The MMU constructs the physical address and sends it to cache/main memory.
Step 5: The cache/main memory returns the requested data word to the processor.
Unlike a page hit, which is handled entirely by hardware, handling a page fault requires cooperation between hardware and the operating system kernel (Figure 9.13(b)).
Steps 1 to 3: The same as Steps 1 to 3 in Figure 9.13(a).
Step 4: The valid bit in the PTE is zero, so the MMU triggers an exception, which transfers control in the CPU to a page fault exception handler in the operating system kernel.
Step 5: The fault handler identifies a victim page in physical memory, and if that page has been modified, pages it out to disk.
Step 6: The fault handler pages in the new page and updates the PTE in memory.
Step 7: The fault handler returns to the original process, causing the faulting instruction to be restarted. The CPU resends the offending virtual address to the MMU. Because the virtual page is now cached in physical memory, there is a hit, and after the MMU
performs the steps in Figure 9.13(a), the main memory returns the requested word to the processor.
9.6.2 Speeding up Address Translation with a TLB
However, many systems try to eliminate even this cost by including a small cache of PTEs in the MMU called a translation lookaside buffer (TLB).
This excerpt is a bit out of context: the point is to eliminate the latency of address translation by adding a small dedicated piece of hardware, the TLB, that caches page table lookups.
9.6.3 Multi-Level Page Tables
The common approach for compacting the page table is to use a hierarchy of page tables instead. The idea is easiest to understand with a concrete example.
Each PTE in the level 1 table is responsible for mapping a 4 MB chunk of the virtual address space, where each chunk consists of 1024 contiguous pages.
Each PTE in a level 2 page table is responsible for mapping a 4 KB page of virtual memory, just as before when we looked at single-level page tables.
This scheme reduces memory requirements in two ways.
First, if a PTE in the level 1 table is null, then the corresponding level 2 page table does not even have to exist. This represents a significant potential savings, since most of the 4 GB virtual address space
for a typical program is unallocated. Second, only the level 1 table needs to be in main memory at all times. The level 2 page tables can be created and paged in and out by the VM system as they are needed, which reduces pressure on main memory. Only
the most heavily used level 2 page tables need to be cached in main memory.
See the book for the full worked demo of translating a virtual address into a physical address; it is well worth studying.
9.7 Case Study: The Intel Core i7/Linux Memory System
9.7.1 Core i7 Address Translation
The Core i7 uses a four-level page table hierarchy. Each process has its own private page table hierarchy.
The CPU first generates a VA and looks it up in the TLB. On a TLB miss, the MMU walks the page table in memory to find the PA that the VA maps to. With the PA in hand, the cache is checked next: on a hit the data is read using the CT | CI | CO fields of the PA; on a miss the next lower level of the memory hierarchy, e.g. main memory, is consulted.
9.7.2 Linux Virtual Memory System
Linux maintains a separate virtual address space for each process of the form shown in Figure 9.26.
Linux Virtual Memory Areas
Linux organizes the virtual memory as a collection of areas (also called segments). An area is a contiguous chunk of existing (allocated) virtual memory whose pages are related in some way. For example, the
code segment, data segment, heap, shared library segment, and user stack are all distinct areas.
vm_start: Points to the beginning of the area
vm_end: Points to the end of the area
vm_prot: Describes the read/write permissions for all of the pages contained in the area
vm_flags: Describes (among other things) whether the pages in the area are shared with other processes or private to this process
vm_next: Points to the next area struct in the list
9.8 Memory Mapping
9.8.4 User-level Memory Mapping with the mmap Function
Unix processes can use the mmap function to create new areas of virtual memory and to map objects into these areas.
/***************************************************************
 * code writer : EOF
 * code date   : 2014.07.27
 * e-mail      : jasonleaster@gmail.com
 * code purpose:
 *     practice for function mmap
 *     void *mmap(void *start, size_t length, int prot,
 *                int flags, int fd, off_t offset)
 ***************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>   /* needed for fstat() and struct stat */

void mmapcopy(int fd, size_t size)
{
    /* Map the file read-only and copy its contents to stdout. */
    char *bufp = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (bufp == MAP_FAILED) {
        perror("mmap");
        exit(1);
    }
    write(STDOUT_FILENO, bufp, size);
    munmap(bufp, size);
}

int main(int argc, char *argv[])
{
    struct stat st;
    int fd;

    if (argc != 2) {
        printf("usage: %s <filename>\n", argv[0]);
        return 0;
    }
    fd = open(argv[1], O_RDONLY, 0);
    if (fd < 0 || fstat(fd, &st) < 0) {
        perror(argv[1]);
        return 1;
    }
    mmapcopy(fd, st.st_size);
    close(fd);
    return 0;
}
9.9 Dynamic Memory Allocation
9.9.1 The malloc and free Functions
Figure 9.34 shows how an implementation of malloc and free might manage a (very) small heap of 16 words for a C program. Each box represents a 4-byte word. The heavy-lined rectangles correspond to allocated blocks (shaded) and free blocks (unshaded). Initially, the heap consists of a single 16-word double-word aligned free block.
Figure 9.34(a): The program asks for a four-word block. malloc responds by carving out a four-word block from the front of the free block and returning a pointer to the first word of the block.
Figure 9.34(b): The program requests a five-word block. malloc responds by allocating a six-word block from the front of the free block. In this example, malloc pads the block with an extra word in order to keep the free block aligned on a double-word boundary.
Figure 9.34(c): The program requests a six-word block and malloc responds by carving out a six-word block from the free block.
Figure 9.34(d): The program frees the six-word block that was allocated in Figure 9.34(b). Notice that after the call to free returns, the pointer p2 still points to the freed block. It is the responsibility of the application not to use p2 again until it is reinitialized by a new call to malloc.
Figure 9.34(e): The program requests a two-word block. In this case, malloc allocates a portion of the block that was freed in the previous step and returns a pointer to this new block.
The block sizes that malloc actually requests are covered in a separate blog post:
link:
http://blog.csdn.net/cinmyheart/article/details/38174421
9.9.10 Coalescing Free Blocks
When the allocator frees an allocated block, there might be other free blocks that are adjacent to the newly freed block. Such adjacent free blocks can cause a phenomenon known as false fragmentation, where
there is a lot of available free memory chopped up into small, unusable free blocks. For example, Figure 9.38 shows the result of freeing the block that was allocated in Figure 9.37. The result is two adjacent free blocks with payloads of three words each.
As a result, a subsequent request for a payload of four words would fail, even though the
aggregate size of the two free blocks is large enough to satisfy the request.
To combat false fragmentation, any practical allocator must merge adjacent free blocks in a process known as coalescing.
That is it for this chapter.
A simple user-space malloc implementation, garbage collection, and common malloc-related bugs will be covered in separate blog posts.