
While implementing the x64 built-in assembler for Delphi 64bit, I got to “know” the AMD64/EM64T architecture a lot more. The good thing about the x64 architecture is that it really builds on the existing instruction format and design. However, unlike the move from 16bit to 32bit, where most existing instruction encodings were automatically promoted to using 32bit arguments, the x64 design takes a different approach.

One myth about the x64 instructions is that “everything’s wider.” That’s not the case. In fact, many addressing modes that were taken as absolute addresses in 32bit (actually offsets within a segment, but the segments are 4G in 32bit) are now 32bit relative offsets. There are very few addressing modes which use a full 64bit absolute address. Most addressing modes are 32bit offsets relative to one of the 64bit registers. One interesting addressing mode that is “implied” in many instruction encodings is RIP-relative addressing. RIP is the 64bit equivalent of the 32bit EIP or the 16bit IP, the Instruction Pointer. It holds the address from which the CPU will fetch the next instruction for execution. Most hard-coded addresses within many instructions are now relative offsets from the current RIP register. This is probably the biggest thing you have to wrap your head around when moving from 32bit assembler.
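To make the RIP-relative calculation concrete, here is a minimal Python sketch of the arithmetic. The function names and the addresses are hypothetical, chosen only for illustration; the one architectural fact it relies on is that the 32bit displacement is sign-extended and added to the address of the next instruction.

```python
MASK64 = (1 << 64) - 1


def sign_extend32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed two's-complement number."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def rip_relative_target(instr_addr: int, instr_len: int, disp32: int) -> int:
    """Effective address of a RIP-relative operand.

    The displacement is measured from the *next* instruction, i.e. the
    current instruction's address plus its encoded length.
    """
    next_rip = instr_addr + instr_len
    return (next_rip + sign_extend32(disp32)) & MASK64


# Hypothetical numbers: a 7-byte instruction located far above the 4G mark.
target = rip_relative_target(0x0000_7FF6_1234_1000, 7, 0x0000_2FF9)
print(hex(target))
```

Only 32 bits are stored in the instruction, yet the resulting address is a full 64bit value because the 64bit RIP supplies the upper bits.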

Even though many instructions will implicitly use the RIP-relative addressing mode, there are some instruction addressing modes that continue to use a 32bit offset and are not RIP-relative. This can really bite you when doing simple mechanical translations from 32bit to 64bit. These are the SIB forms with a 32bit (or even 8bit) offset. What can happen is that you end up forming an address that can only address 32bits, and is thus limited to addressing items below the 4G boundary! And this is a perfectly legal instruction! To demonstrate this, consider the following 32bit assembler that we’ll translate to 64bits.

  var
    TestArray: array[0..255] of Word;

  function GetValue(Index: Integer): Word;
  asm
    MOV AX,[EAX * 2 + TestArray]   // Index arrives in EAX; the Word result is returned in AX
  end;

Let’s now translate this for use in 64bit using a simple mechanical translation.

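  var
    TestArray: array[0..255] of Word;

  function GetValue(Index: Integer): Word;
  asm
    MOVSX RAX,ECX                  // Index arrives in ECX; sign-extend it to 64bits
    MOV AX,[RAX * 2 + TestArray]   // rejected: needs a 32bit absolute address fixup
  end;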

Pretty straightforward, right? Not so fast there, partner. Let’s see:
I know that I need to use a full 64bit register for the offset, but
since Integer is still 32bits, I need to “sign-extend” it to 64bits. The
venerable MOVSX (Move with sign extension) instruction “promotes” the
signed 32bit offset to 64bits while preserving the sign. Nope, that’s
not a problem. The only thing I changed in the next instruction was EAX
to RAX, so how could that be a problem? Well, when you compile this code
you’ll get a rather strange error message:

[DCC Error] Project7.dpr(18): E2577 Assembler instruction requires a 32bit absolute address fixup which is invalid for 64bit

Huh? Remember the little note above about the SIB instruction form?
Because the RAX (or EAX in 32bit) register is being scaled (the * 2),
this instruction must use the SIB (Scale-Index-Base) instruction form.
When using the SIB form, RIP isn’t considered when calculating the actual
address. Additionally, the offset encoded in the instruction can still
only be 8 or 32bits. There are no 64bit offsets.
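The SIB address calculation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a model of any particular assembler: `sib_address` is a hypothetical helper and the address used for TestArray is made up. It shows that the result is built only from the base, the scaled index, and the sign-extended offset, with RIP nowhere in the sum.

```python
MASK64 = (1 << 64) - 1


def sign_extend32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed two's-complement number."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def sib_address(index: int, scale: int, disp32: int, base: int = 0) -> int:
    """base + index * scale + sign-extended disp32, wrapped to 64 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("SIB scale must be 1, 2, 4 or 8")
    return (base + index * scale + sign_extend32(disp32)) & MASK64


# MOV AX,[RAX * 2 + TestArray] with RAX = 3 and a hypothetical
# TestArray address of 0x00405000 encoded as the 32bit offset.
print(hex(sib_address(index=3, scale=2, disp32=0x0040_5000)))
```

With no base register, everything above the low 32 bits of the final address has to come from the index, which is why the encoded offset alone cannot describe a variable located at an arbitrary 64bit address.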

In 32bit, the compiler would generate a “fixup” to ensure that the
encoding of the instruction offset field to the global “TestArray”
variable was properly “fixed up” at runtime should the image happen to
be relocated to another address. This is a 32bit absolute address. The
64bit version of this instruction, while actually a truly valid
instruction, would only have 32bits in which to place the address of
“TestArray.” The “fixup” generated would have to remain 32bit. This
could lead to creating an image that, were it ever relocated above the 4G
boundary, would likely crash at best or read the wrong memory address
at worst!
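A quick way to see the danger is to check whether a given address survives being squeezed into a 32bit field. The sketch below is illustrative only; `fits_in_disp32` and the image base addresses are hypothetical. It also takes into account that the processor sign-extends the 32bit offset in 64bit mode, so the usable range is in fact narrower than a full 4G.

```python
MASK64 = (1 << 64) - 1


def truncate_to_disp32(address: int) -> int:
    """Address the CPU would actually form from the low 32 bits of `address`."""
    low = address & 0xFFFFFFFF
    signed = low - (1 << 32) if low & 0x80000000 else low
    return signed & MASK64


def fits_in_disp32(address: int) -> bool:
    """True if the address is unchanged after a trip through a 32bit fixup."""
    return truncate_to_disp32(address) == address


VAR_OFFSET = 0x5000  # hypothetical offset of TestArray inside the image

for image_base in (0x0040_0000, 0x1_4000_0000):
    address = image_base + VAR_OFFSET
    print(hex(address), fits_in_disp32(address), hex(truncate_to_disp32(address)))
```

For the second image base the instruction would silently read from the truncated address instead of the real one, which is exactly the failure the compiler error is protecting you from.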

Ok, so now what? There is a SIB form that we can use to work
around this problem, but it requires burning another register. The good
news is that we now have another 8 registers with which to work. So if
you have a rather complicated chunk of 32bit assembler code that burns
up all the existing usable 32bit registers, you now have another group
of registers that can help solve this problem without having to rework
the code even more. So here’s how to fix this for 64bit:

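  var
    TestArray: array[0..255] of Word;

  function GetValue(Index: Integer): Word;
  asm
    MOVSX RAX,ECX            // Index arrives in ECX; sign-extend it to 64bits
    LEA R10,[TestArray]      // RIP-relative: R10 receives the full 64bit address
    MOV AX,[RAX * 2 + R10]   // SIB form with a register base, no absolute fixup
  end;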

Here, I used the volatile R10 register (R8 and R9 are used for
parameter passing) to get the absolute address of TestArray using the
LEA instruction. While the “address” portion of this instruction is
still 32bits, it is taken as RIP-relative. In other words, this value is
the “distance” from the next instruction to the variable TestArray in
memory. After this instruction, R10 now contains a true 64bit address of
the TestArray variable. I must still use the SIB form in the next
instruction, but instead of a hard-coded “offset” I use the value in
R10. Yes, there is still an implicit offset of 0, which uses the 8bit
offset form.
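The data flow of the three instructions can be traced with a small Python emulation. It is a sketch under loud assumptions: the addresses `ARRAY_ADDR` and `LEA_NEXT_RIP` are invented, the array contents are arbitrary test values, and the function only mimics the arithmetic, not the real instruction encodings.

```python
import struct

MASK64 = (1 << 64) - 1
ARRAY_ADDR = 0x0000_0001_4000_5000    # hypothetical address of TestArray, above 4G
LEA_NEXT_RIP = 0x0000_0001_4000_1010  # hypothetical address of the instruction after LEA

# Simulated contents of TestArray: 256 little-endian Words holding 1000..1255.
test_array = struct.pack("<256H", *range(1000, 1256))


def sign_extend32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed two's-complement number."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def get_value(ecx: int) -> int:
    rax = sign_extend32(ecx) & MASK64                       # MOVSX RAX,ECX
    disp32 = (ARRAY_ADDR - LEA_NEXT_RIP) & 0xFFFFFFFF       # offset stored in the LEA
    r10 = (LEA_NEXT_RIP + sign_extend32(disp32)) & MASK64   # LEA R10,[TestArray]
    address = (r10 + rax * 2) & MASK64                      # MOV AX,[RAX * 2 + R10]
    return struct.unpack_from("<H", test_array, address - ARRAY_ADDR)[0]


print(get_value(0), get_value(255))
```

The only value stored in the instruction stream is the 32bit distance between the code and the variable, and that distance stays the same wherever the image is loaded, so no absolute fixup is required.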

You can see that mindless, mechanical translations of assembler code
are likely to cause you some grief due to some of the subtle changes in
instruction behaviors. For this very reason, we strongly recommend you
use all Object Pascal code instead of resorting to assembler when
possible. This will not only better ensure that your code will more
likely move unchanged to other processor architectures (think ARM here
folks), but you’ll not have to worry about such assembler gotchas in the
future. If you’re using assembler code because “it’s faster,” I would
encourage you to look closely at the algorithm used. There are many
cases where the proper algorithm written in Object Pascal will yield
greater gains than a simple translation to assembler using the same
algorithm. Yes, there are some things which you simply must do in
assembler (strange, off-beat calling conventions, “LOCK” instructions
for concurrency, etc.), but I would contend that many assembler
functions can be moved back to Object Pascal with little impact on
performance.
