Why is address space allocation granularity 64K?

You may have wondered why VirtualAlloc allocates memory at 64K boundaries even though
page granularity is 4K.

You have the Alpha AXP processor to thank for that.

On the Alpha AXP, there is no “load 32-bit integer” instruction. To load a 32-bit
integer, you actually load two 16-bit integers and combine them.

So if allocation granularity were finer than 64K, a DLL that got relocated in memory
would require two fixups per relocatable address: one to the upper 16 bits and one
to the lower 16 bits. And things get worse if this changes a carry or borrow between
the two halves. (For example, moving an address 4K from 0x1234F000 to 0x12350000
forces both the low and high parts of the address to change. Even though the
amount of motion was far less than 64K, it still affected the high part because
of the carry.)
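To make the carry concrete, here is a minimal Python sketch of the naive scheme. It assumes the address is simply cut into unsigned upper and lower 16-bit halves. The helper `split_unsigned` is hypothetical and does not come from Windows or the Alpha toolchain.

```python
def split_unsigned(addr: int) -> tuple[int, int]:
    """Naive split of a 32-bit address into (high16, low16), both unsigned."""
    return (addr >> 16) & 0xFFFF, addr & 0xFFFF

before = split_unsigned(0x1234F000)
after = split_unsigned(0x1234F000 + 0x1000)  # relocate by only 4K

# Both halves differ: the low half wraps to zero and the carry bumps the high half.
print(f"before: hi={before[0]:#06x} lo={before[1]:#06x}")
print(f"after:  hi={after[0]:#06x} lo={after[1]:#06x}")
```

Under this scheme, a 4K move forces the loader to patch both instructions.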

But wait, there’s more.

The Alpha AXP actually combines two signed 16-bit integers to form a 32-bit
integer. For example, to load the value 0x1234ABCD, you would first use the LDAH instruction
to load the value 0x1235 into the high word of the destination register. Then you
would use the LDA instruction to add the signed value -0x5433. (This works because
0x5433 = 0x10000 - 0xABCD.) The result is the desired value of 0x1234ABCD.

LDAH t1, 0x1235(zero) // t1 = 0x12350000
LDA t1, -0x5433(t1) // t1 = t1 - 0x5433 = 0x1234ABCD
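The signed split can be sketched in Python as follows. `split_signed` is a hypothetical helper. It models only the 32-bit arithmetic and ignores the 64-bit sign extension discussed later in the article. When the low 16 bits fall in the upper half of the 64K block, the LDA displacement becomes negative. The LDAH value then rises by one to compensate.

```python
def split_signed(addr: int) -> tuple[int, int]:
    """Split a 32-bit value into (hi, lo) with (hi << 16) + lo == addr,
    where lo is a signed 16-bit displacement in [-0x8000, 0x7FFF]."""
    lo = addr & 0xFFFF
    if lo >= 0x8000:        # upper half of the 64K block: LDA must subtract
        lo -= 0x10000
    hi = (addr - lo) >> 16  # LDAH loads one more to make up for it
    return hi, lo

hi, lo = split_signed(0x1234ABCD)
print(f"LDAH t1, {hi:#x}(zero)")  # LDAH t1, 0x1235(zero)
print(f"LDA  t1, {lo:#x}(t1)")    # LDA  t1, -0x5433(t1)
```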

So if a relocation caused an address to move between the “lower half” of a 64K block
and the “upper half”, additional fixing-up would have to be done to ensure that the
arithmetic for the top half of the address was adjusted properly. Since compilers
like to reorder instructions, that LDAH instruction could be far, far away, so the
relocation record for the bottom half would have to have some way of finding the matching
top half.
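The following is a rough model of what a relocation does to such a pair. It assumes the loader simply recomputes the signed split for the moved address. `relocate` is a hypothetical name, and real Alpha relocation records are not modeled here. The function reports which of the two instructions would need patching.

```python
def split_signed(addr):
    """Signed LDAH/LDA split: (hi << 16) + lo == addr, lo in [-0x8000, 0x7FFF]."""
    lo = addr & 0xFFFF
    if lo >= 0x8000:
        lo -= 0x10000
    return (addr - lo) >> 16, lo

def relocate(addr, delta):
    """Report which halves of the LDAH/LDA pair change when addr moves by delta."""
    old_hi, old_lo = split_signed(addr)
    new_hi, new_lo = split_signed(addr + delta)
    return {"LDAH": new_hi != old_hi, "LDA": new_lo != old_lo}

# 0x12347F00 sits just below the midpoint of its 64K block.
print(relocate(0x12347F00, 0x200))    # small move crosses the midpoint: both change
print(relocate(0x12347F00, 0x30000))  # 64K-multiple move: only LDAH changes
```

With a delta that is a multiple of 64K, the LDA displacement never changes. The fix-up therefore reduces to adding `delta >> 16` to the LDAH immediate, with no need to find the matching half.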

What’s more, the compiler is clever: if it needs to compute addresses for two variables
that are in the same 64K region, it shares the LDAH instruction between them. If it
were possible to relocate by a value that wasn’t a multiple of 64K, the compiler
could no longer perform this optimization, since after the relocation the two
variables might no longer belong to the same 64K block.
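Here is a simplified Python check of when two addresses can share one LDAH. It assumes "sharing" means both addresses have the same upper value under the signed split. `can_share_ldah` and the variable addresses are hypothetical.

```python
def split_signed(addr):
    """Signed LDAH/LDA split: (hi << 16) + lo == addr, lo in [-0x8000, 0x7FFF]."""
    lo = addr & 0xFFFF
    if lo >= 0x8000:
        lo -= 0x10000
    return (addr - lo) >> 16, lo

def can_share_ldah(a, b):
    """True if both addresses get the same LDAH immediate under the signed split."""
    return split_signed(a)[0] == split_signed(b)[0]

x, y = 0x12340100, 0x12347000  # two hypothetical variables in one 64K region
print(can_share_ldah(x, y))                      # True
print(can_share_ldah(x + 0x10000, y + 0x10000))  # True: moved by a 64K multiple
print(can_share_ldah(x + 0x1000, y + 0x1000))    # False: a 4K move splits them
```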

Forcing memory allocations at 64K granularity solves all these problems.

If you have been paying really close attention, you’d have seen that this also explains
why there is a 64K “no man’s land” near the 2GB boundary. Consider the method for
computing the value 0x7FFFABCD: Since the lower 16 bits are in the upper half of the
64K range, the value needs to be computed by subtraction rather than addition. The
naïve solution would be to use

LDAH t1, 0x8000(zero) // t1 = 0x80000000, right?
LDA t1, -0x5433(t1) // t1 = t1 - 0x5433 = 0x7FFFABCD, right?

Except that this doesn’t work. The Alpha AXP is a 64-bit processor, and LDAH sign-extends
its result to 64 bits. Since 0x8000 does not fit in a 16-bit signed integer, the immediate
is really -0x8000, a negative number.
What actually happens is

LDAH t1, -0x8000(zero) // t1 = 0xFFFFFFFF`80000000
LDA t1, -0x5433(t1) // t1 = t1 - 0x5433 = 0xFFFFFFFF`7FFFABCD
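The 64-bit behavior can be reproduced with a small Python model. `sext16`, `ldah`, and `lda` are hypothetical stand-ins for the instructions. The model assumes both instructions sign-extend their 16-bit displacement and add in 64-bit arithmetic.

```python
MASK64 = (1 << 64) - 1

def sext16(value):
    """Interpret the low 16 bits of value as a signed 16-bit integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def ldah(rb, disp):
    """Model of LDAH ra, disp(rb): rb + sign-extended disp * 65536, mod 2**64."""
    return (rb + sext16(disp) * 0x10000) & MASK64

def lda(rb, disp):
    """Model of LDA ra, disp(rb): rb + sign-extended disp, mod 2**64."""
    return (rb + sext16(disp)) & MASK64

t1 = ldah(0, 0x8000)  # the 16-bit field 0x8000 really means -0x8000
t1 = lda(t1, -0x5433)
print(f"{t1:#018x}")  # 0xffffffff7fffabcd
```

The same two functions reproduce the earlier 0x1234ABCD example correctly, because 0x1235 is a positive 16-bit value.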

You need to add a third instruction to clear the high 32 bits. The clever trick for
this is to add zero and tell the processor to treat the result as a 32-bit integer
and sign-extend it to 64 bits. Since bit 31 of 0x7FFFABCD is clear, the sign extension
fills the upper 32 bits with zeros.

ADDL t1, zero, t1 // t1 = t1 + 0, with L suffix
                  // L suffix means sign extend result from 32 bits to 64
                  // t1 = 0x00000000`7FFFABCD
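A Python model of that ADDL step follows. `addl` is a hypothetical helper that assumes ADDL keeps the low 32 bits of the sum and sign-extends them to 64 bits. The second call shows that the same instruction would fill the upper bits with ones if bit 31 were set.

```python
MASK64 = (1 << 64) - 1

def addl(ra, rb):
    """Model of ADDL: add, keep the low 32 bits, sign-extend to 64 bits."""
    low32 = (ra + rb) & 0xFFFFFFFF
    if low32 & 0x80000000:  # bit 31 set: negative as a 32-bit integer
        low32 -= 1 << 32
    return low32 & MASK64

t1 = addl(0xFFFFFFFF7FFFABCD, 0)
print(f"{t1:#018x}")                   # 0x000000007fffabcd
print(f"{addl(0x80001234, 0):#018x}")  # 0xffffffff80001234
```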

If addresses within 64K of the 2GB boundary were permitted, then every memory address
computation would have to insert that third ADDL instruction just in case the address
got relocated to the “danger zone” near the 2GB boundary.
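The following sketch finds which addresses below 2GB the two-instruction sequence cannot reach on its own. It assumes the signed split used above. `needs_addl` is a hypothetical name. In this simplified model, only the upper 32K of the final 64K block strictly needs the fix. The 64K figure in the text presumably reflects the 64K granularity at which address space is handed out.

```python
def needs_addl(addr):
    """For a 32-bit address below 2GB: True if LDAH would need the
    unencodable immediate 0x8000, so a third ADDL instruction is required."""
    lo = addr & 0xFFFF
    if lo >= 0x8000:
        lo -= 0x10000
    hi = (addr - lo) >> 16
    return hi > 0x7FFF  # LDAH's immediate is a signed 16-bit value

block = range(0x7FFF0000, 0x80000000)  # the last 64K below 2GB
bad = [a for a in block if needs_addl(a)]
print(f"{bad[0]:#x}..{bad[-1]:#x}, {len(bad)} addresses")
```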

This was an awfully high price to pay to get access to that last 64K of address space
(three instructions instead of two, a 50% penalty on every address computation, to protect
against a case that in practice would never happen), so roping off that area as
permanently invalid was a more prudent choice.
