
The Windows x64 ABI (Application Binary Interface) presents some new challenges for assembly programming that don’t exist for x86. A couple of the changes that must be taken into account can be seen as very positive. First of all, there is now one and only one OS-specified calling convention. We certainly could have devised our own calling convention, like the register-based one used in x86; however, since the system calling convention was already register-based, that would have been an unnecessary complication. The other significant change is that the stack must always remain aligned on 16-byte boundaries. This seems a little onerous at first, but I’ll explain how and why it’s necessary, along with how it can actually make calling other functions from assembly code more efficient and sometimes even faster than on x86. For a detailed description of the calling convention, register usage and reservations, etc., please see this. Another thing that I’ll discuss is exceptions and why all of this is necessary.

For any given function there are three parts we’re going to talk about:
the prolog, body, and epilog. The prolog and epilog contain all the
setup and tear-down of the function’s “frame”. The prolog is where all
the space on the stack is reserved for local variables and, unlike
how the x86 compiler works, for the maximum amount of
parameter space needed by any of the function calls within the body. The
epilog does the reverse and releases the reserved stack space just prior
to returning to the caller. The body of a function is where the user’s
code is placed, either in Pascal or, as we’ll see, as the
assembler code you write.

You may be wondering why the prolog is reserving parameter space in
addition to the space needed for local variables. Why not just push the
parameters on the stack right before calling a function? While there is
technically nothing keeping the compiler from placing parameters for a
function call on the stack immediately before a call, this will have the
effect of making the exception tables larger. As I mentioned above,
exceptions in x64 are not implemented the same as in x86, which was a
stack-based linked list of records. In x64, exceptions are done using
extra data generated by the compiler that describes the stack changes
for a given function and where the handlers/finally blocks are located.
By only modifying the stack within the prolog and epilog, “unwinding”
the stack is easier and more accurate. Another side benefit is that when
passing stack parameters to functions, the space is already available
so the data merely needs to be “MOV”ed onto the stack without the need
for a PUSH. The stack also remains properly aligned, so no extra
finagling of the RSP register is necessary.
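
The alignment bookkeeping described above can be modeled with a few lines of arithmetic. The Python sketch below is only an illustration, not the Delphi compiler's actual algorithm. The function names are hypothetical, and it assumes 8-byte parameter slots, at least four reserved slots for the register parameters, and a prolog whose only PUSH is PUSH RBP unless told otherwise.

```python
SLOT = 8             # every parameter slot and every pushed register is 8 bytes
ALIGNMENT = 16       # RSP must be a multiple of 16 whenever a CALL is made
MIN_PARAM_SLOTS = 4  # callers reserve space even for the four register parameters


def param_area_bytes(max_params: int) -> int:
    """Bytes reserved for outgoing parameters of the widest call in the body."""
    return max(max_params, MIN_PARAM_SLOTS) * SLOT


def prolog_sub_rsp(local_bytes: int, max_params: int, pushed_regs: int = 1) -> int:
    """Bytes a prolog would subtract from RSP after its PUSH instructions.

    On entry RSP is 8 bytes off a 16-byte boundary because the CALL pushed
    the return address. `pushed_regs` counts the 8-byte PUSHes in the prolog
    (1 for a plain PUSH RBP). Assumed model, for illustration only.
    """
    fixed = SLOT + pushed_regs * SLOT            # return address + pushed registers
    needed = local_bytes + param_area_bytes(max_params)
    padding = (-(fixed + needed)) % ALIGNMENT    # round the whole frame up to 16
    return needed + padding


if __name__ == "__main__":
    print(prolog_sub_rsp(local_bytes=8, max_params=6))
```

In this model, 8 bytes of locals plus a widest call of six parameters needs 56 bytes, and 8 bytes of padding are added so that the body runs with RSP on a 16-byte boundary. Because the reservation happens once in the prolog, every call in the body can simply MOV its arguments into place.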

Directives

Delphi for 64-bit Windows introduced several new assembler directives or
“pseudo-instructions”: .NOFRAME, .PARAMS, .PUSHNV, and .SAVENV. These
directives allow you to control how the compiler sets up the context
frame and ensure that the proper exception table information is
generated.

.NOFRAME

Some functions never make calls to other functions. These are called
“leaf” functions because they don’t do any further “branching” out to
other functions, so like a tree, they represent the “leaf”. For functions
such as this, having a full stack frame may be extra overhead you want
to eliminate. While the compiler does try to eliminate the stack frame if
it can, there are times that it simply cannot automatically figure this
out. If you are certain a frame is unnecessary, you can use this
directive as a hint to the compiler.

.PARAMS <max params>

This one may be a little confusing because it does not refer to
the parameters passed into the current function. Rather, this directive
should be placed near the top of the function (preferably before any
actual CPU instructions) with a single ordinal parameter that tells the
compiler the maximum number of parameters that will be needed
by any of the function calls within the body. This allows the compiler
to reserve extra, properly aligned stack space for passing
parameters to other functions. This number should reflect the maximum
number of parameters across all called functions and should include even
those parameters that are passed in registers. If you’re going to call a
function that takes 6 parameters, then you should use “.PARAMS 6”.

When you use the .PARAMS directive, a pseudo-variable @Params becomes
available to simplify passing parameters to other functions. It’s fairly
easy to load up a few registers and make a call, but the x64 calling
convention also requires that callers reserve space on the stack even
for register parameters. The .PARAMS directive ensures this is the case,
so you should still use the .PARAMS directive even if you’re going to
call a function in which all parameters are passed in registers. The
@Params pseudo-variable is an array of bytes, which allows you to address
every byte of the parameters. Each parameter takes up 8 bytes (64 bits),
so you need to scale the parameter index by 8, with the first parameter
at offset 0. You generally don’t use the first 4 parameter slots
since those parameters must be passed in registers, so you’ll start at
parameter index 4, which is byte offset 4*8. Because the elements are
bytes, a cast or size override is still necessary, such as
“DWORD PTR @Params[4*8]” or “@Params[4*8].Byte”.
Using the @Params pseudo-variable saves the programmer from having
to manually calculate the offsets based on alignments and local
variables.
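
Since @Params is addressed in bytes while parameters are counted in 8-byte slots, the index scaling is easy to get wrong. The following Python helper is a small sketch of that arithmetic. The function names are hypothetical, and the operand text it produces simply mirrors the “DWORD PTR @Params[4*8]” form quoted above; it is not output from any Delphi tool.

```python
SLOT_BYTES = 8        # each parameter occupies one 64-bit slot
REGISTER_PARAMS = 4   # the first four parameters travel in registers


def params_offset(index: int) -> int:
    """Byte offset of the zero-based parameter `index` inside @Params."""
    if index < 0:
        raise ValueError("parameter index must be non-negative")
    return index * SLOT_BYTES


def params_operand(index: int, size: str = "QWORD") -> str:
    """Operand text for a stack parameter, in the form used in this article."""
    if index < REGISTER_PARAMS:
        raise ValueError("the first four parameters are passed in registers")
    return f"{size} PTR @Params[{params_offset(index)}]"


if __name__ == "__main__":
    # Fifth and sixth parameters of a call made under ".PARAMS 6"
    print(params_operand(4, "DWORD"))
    print(params_operand(5))
```

The fifth parameter (index 4) therefore lives at byte offset 32, directly after the 32 bytes set aside for the four register parameters.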

.PUSHNV <GPReg>, .SAVENV <XMMReg>

According to the x64 calling convention and register usage spec, there
are some registers which are considered non-volatile. This means that
certain registers are guaranteed to have the same value after a function
call as they had before the function call. This doesn’t mean these
registers are not available for use; it just means the called function
must ensure they are properly preserved and restored. The best place to
preserve a value is on the stack, but that means space must be
reserved for it. These directives both ensure that
the compiler includes space for the register in the generated prolog
code and actually place the register’s value in that reserved location.
They also ensure that the function epilog properly restores the register
before cleaning up the local frame. .PUSHNV works with the 64-bit
general purpose registers RAX..R15 and .SAVENV works with the 128-bit
XMM0..XMM15 SSE2 registers. See the above link for a description of
which registers are considered non-volatile. Even though you can specify
any register, volatile or non-volatile, as a parameter to these
directives, only those registers which are actually non-volatile will be
preserved. For instance, .PUSHNV R11 will assemble just fine, but no
changes to the frame will be made, whereas .PUSHNV R12 will place a
PUSH R12 instruction right after the PUSH RBP instruction in the prolog.
The compiler will also continue to ensure that the stack remains
aligned. Remember when I talked about why the stack must remain 16-byte
aligned? One key reason is that many SSE2 instructions which operate on
128-bit memory entities require that the memory access be aligned on a
16-byte boundary. Because the compiler ensures this is the case, the
space reserved by the .SAVENV directive is guaranteed to be 16-byte
aligned.
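
As a quick reference for which directive applies to which register, the classification can be written down as a small lookup. The Python sketch below uses the non-volatile register list from the Windows x64 calling convention. The function name is hypothetical, and leaving RBP and RSP out of the table is an assumption made here because the frame itself manages them.

```python
# Non-volatile registers under the Windows x64 calling convention.
# RBP and RSP are also non-volatile, but they are left out here on the
# assumption that the prolog and epilog already take care of them.
NONVOLATILE_GP = {"RBX", "RDI", "RSI", "R12", "R13", "R14", "R15"}
NONVOLATILE_XMM = {f"XMM{n}" for n in range(6, 16)}   # XMM6..XMM15


def directive_for(register: str):
    """Return the directive line that preserves `register`, or None if volatile."""
    name = register.upper()
    if name in NONVOLATILE_XMM:
        return f".SAVENV {name}"
    if name in NONVOLATILE_GP:
        return f".PUSHNV {name}"
    return None   # volatile: the directive would leave the frame unchanged


if __name__ == "__main__":
    for reg in ("R11", "R12", "XMM5", "XMM6"):
        print(reg, "->", directive_for(reg))
```

This matches the behavior described above: R12 needs .PUSHNV, while R11 is volatile and needs nothing.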

Writing assembler code in the new x64 world can be daunting and
frustrating due to the very strict requirements on stack alignment and
exception metadata. By using the above directives, you are signaling
your intentions to the one thing that is pretty darn good at ensuring
all those requirements are met: the compiler. You should always ensure
the directives are placed at the top of the assembler function body
before any actual CPU instructions. This makes sure the compiler has all
the information, and everything is already calculated, by the time it
begins to see the actual CPU instructions and needs to know the offset
from RBP at which a local variable is located. Also, by ensuring that
all stack manipulations happen within the prolog and epilog, the system
will be able to properly “unwind” the stack past a properly written
assembler function. Without this data, the OS unwind process could
become lost and, at best, skip exception handlers or, at worst, call the
wrong one and lead to further corruption. If the unwind process gets
lost enough, the OS may simply kill the process without any warning,
similar to what stack overflows do in 32-bit (and 64-bit).
