Made of Bugs

amd64 and va_arg

OCT 3RD, 2010

A while back, I was poking around LLVM bugs, and discovered, to my surprise, that LLVM doesn't support the va_arg intrinsic, used by functions to accept a variable number of arguments, at all on amd64. It turns out that clang and llvm-gcc, the compilers that use LLVM as a backend, have their own implementations in the frontend, so this isn't as big a deal as it might sound, but it was still a surprise to me.

Figuring that this might just be something no one got around to, and couldn't actually be that hard, I pulled out my copy of the amd64 ABI specification, thinking that maybe I could throw together a patch and fix this issue.

Maybe half an hour of reading later, I stopped in terror and gave up, repenting of my foolish ways to go work on something else. va_arg on amd64 is a hairy, hairy beast, and probably not something I was going to hack together in an evening. And so instead I decided to blog about it.

The problem: Argument passing on amd64

On i386, because of the dearth of general-purpose registers, the calling convention passes all arguments on the stack. This makes the va_arg implementation easy: a va_list is simply a pointer into the stack, and va_arg just advances the va_list by the size of the type to be retrieved and returns the value at the old position. In fact, the i386 ABI reference simply specifies va_arg in terms of a single line of code:

    #define va_arg(list, mode) ((mode *)(list = (char *)list + sizeof(mode)))[-1]
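
To see that one-liner in action without relying on any particular stack layout, here is a minimal sketch that runs the same pointer arithmetic over an ordinary array. The names SKETCH_VA_ARG and sum_packed_ints are made up for this illustration, and the array merely stands in for stack slots holding int arguments back to back.

```c
#include <stddef.h>

/* Same shape as the i386 ABI macro, renamed (hypothetical name) so it
 * cannot clash with the real va_arg from <stdarg.h>. */
#define SKETCH_VA_ARG(list, mode) \
    ((mode *)((list) = (char *)(list) + sizeof(mode)))[-1]

/* Sums `count` ints stored back to back. The array plays the role of the
 * stack argument area; `cursor` plays the role of the va_list. */
int sum_packed_ints(int *slots, size_t count)
{
    char *cursor = (char *)slots;
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        /* Advance the cursor past one int, then read the int just skipped. */
        total += SKETCH_VA_ARG(cursor, int);
    }
    return total;
}
```

Calling sum_packed_ints on an array holding 1, 2 and 3 walks the cursor forward twelve bytes in total and returns their sum.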

On amd64, the problem is much more complicated. To start, amd64 specifies that up to 6 integer arguments and up to 8 floating-point arguments are passed to functions in registers, to take advantage of amd64's larger number of registers. So va_arg has to deal with the fact that some arguments may have been passed in registers, and some on the stack.

(One could imagine simplifying the problem by stipulating a different calling convention for variadic functions, but unfortunately, for historical reasons and otherwise, C requires that code be able to call functions even if their prototype is not visible, which means the compiler doesn't necessarily know if it's calling a variadic function at any given call site. [edited to add: caf points out in the comments that C99 actually explicitly does not require this property. But I speculate that the ABI designers wanted to preserve this property from i386 because it has historically worked, and so existing code depended on it]).

That's not all, however. Not only can integer arguments be passed by registers, but small structs (16 bytes or fewer) can also be passed in registers. A sufficiently small struct, for the purposes of the calling convention, is essentially broken up into its component members, which are passed as though they were separate arguments – unless only some of them would fit into registers, in which case the whole struct is passed on the stack.

So va_arg, given a struct as an argument, has to be able to figure out whether it was passed in registers or on the stack, and possibly even re-assemble it into temporary space.

The implementation

Given all those constraints, the required implementation is fairly straightforward, but incredibly complex compared to any other platform I know of.

To start, any function that is known to use va_start is required to, at the start of the function, save all registers that may have been used to pass arguments onto the stack, into the "register save area", for future access by va_start and va_arg. This is an obvious step, and I believe pretty standard on any platform with a register calling convention. The registers are saved as integer registers followed by floating point registers. As an optimization, when calling a varargs function, the caller is required to set %al (the low byte of %rax) to an upper bound on the number of SSE registers used to hold arguments, which allows the varargs callee to avoid touching the FPU at all if there are no floating point arguments.

va_list, instead of being a pointer, is a structure that keeps track of four different things:

    typedef struct {
        unsigned int gp_offset;
        unsigned int fp_offset;
        void *overflow_arg_area;
        void *reg_save_area;
    } va_list[1];

reg_save_area points at the base of the register save area initialized at the start of the function. fp_offset and gp_offset are offsets into that register save area, indicating the next unused floating point and general-purpose register, respectively. Finally, overflow_arg_area points at the next stack-passed argument to the function, for arguments that didn't fit into registers.
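
As a rough illustration of how those four fields might be initialized, here is a sketch of what va_start has to compute. The type model_va_list and the function model_va_start are hypothetical stand-ins, not the real <stdarg.h> machinery, and the slot sizes assume 8 bytes per general-purpose register and 16 bytes per SSE register, with only %xmm0 through %xmm7 carrying arguments.

```c
#include <stddef.h>

/* Hypothetical mirror of the ABI's va_list element; not the real type. */
typedef struct {
    unsigned int gp_offset;
    unsigned int fp_offset;
    void *overflow_arg_area;
    void *reg_save_area;
} model_va_list;

enum {
    GP_SLOT_SIZE = 8,                  /* assumed size of one saved GP register */
    FP_SLOT_SIZE = 16,                 /* assumed size of one saved SSE register */
    GP_AREA_SIZE = 6 * GP_SLOT_SIZE    /* 48 bytes: %rdi through %r9 */
};

/* named_gp / named_fp: how many registers of each class the function's
 * named (non-variadic) parameters already consumed. */
void model_va_start(model_va_list *ap, void *reg_save_area,
                    void *first_stack_arg,
                    unsigned named_gp, unsigned named_fp)
{
    ap->reg_save_area = reg_save_area;
    ap->overflow_arg_area = first_stack_arg;
    /* Skip the slots that belong to named parameters. */
    ap->gp_offset = named_gp * GP_SLOT_SIZE;
    /* Floating-point slots start right after the six GP slots. */
    ap->fp_offset = GP_AREA_SIZE + named_fp * FP_SLOT_SIZE;
}
```

For a printf-like function with a single named pointer parameter, this would start with gp_offset at 8 and fp_offset at 48.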

Here's an ASCII art diagram of the stack frame during the execution of a varargs function, after the register save area has been established. Note that the spec allows a function to put the register save area anywhere in its frame it wants, so I've shown potential storage both above and below it.

    |      ...       |        [high addresses]
    +----------------+
    |    argument    |
    |     passed     |
    |  on stack (2)  |
    +----------------+ <---- overflow_arg_area
    |    argument    |
    |     passed     |
    |  on stack (1)  |
    +----------------+
    | return address |
    +----------------+
    |      ...       |        (possible local storage for func)
    +----------------+
    |     %xmm15     |  \
    +----------------+   |
    |     %xmm14     |   |                           ___
    +----------------+   |                            |
    |      ...       |    \  register                 |
    +----------------+     } save                     |
    |     %xmm0      |    /  area                     |
    +----------------+   |                            |
    |      %r9       |   |                            |
    +----------------+   |                            | fp_offset
    |      %r8       |   |            ___             |
    +----------------+   |             |              |
    |      ...       |   |             |              |
    +----------------+   |             | gp_offset    |
    |      %rsi      |   |             |              |
    +----------------+   |             |              |
    |      %rdi      |  /              |              |
    +----------------+ <---------------+--------------+--- reg_save_area
    |      ...       |        (potentially more storage)
    +----------------+ <----------- %rsp
    |      ...       |        [low addresses]

Because va_arg must determine whether the requested type was passed in registers, it needs compiler support, and can't be implemented as a simple macro like on i386. The amd64 ABI reference specifies va_arg using a list of eleven different steps that the macro must perform. I'll try to summarize them here.

First off, va_arg determines whether the requested type could be passed in registers. If not, va_arg behaves much like it does on i386, using the overflow_arg_area member of the va_list (plus some complexity to deal with alignment).
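
The alignment handling on the stack path can be sketched with plain integer arithmetic. This is an illustration under stated assumptions, not the ABI's wording: the helper names align_up and overflow_slot are invented here, and the rule modeled is that a type needing more than 8-byte alignment is first rounded up to a 16-byte boundary, and that the cursor is left on an 8-byte boundary afterwards.

```c
#include <stddef.h>
#include <stdint.h>

/* Round addr up to the next multiple of alignment (a power of two). */
uintptr_t align_up(uintptr_t addr, uintptr_t alignment)
{
    return (addr + alignment - 1) & ~(alignment - 1);
}

/* Given the current overflow_arg_area value in *cursor, returns the
 * address the argument would be read from and advances *cursor past it. */
uintptr_t overflow_slot(uintptr_t *cursor, size_t size, size_t align)
{
    uintptr_t addr = *cursor;
    if (align > 8) {
        /* Over-aligned types start on a 16-byte boundary. */
        addr = align_up(addr, 16);
    }
    /* Every stack argument occupies a whole number of 8-byte slots. */
    *cursor = align_up(addr + size, 8);
    return addr;
}
```

Starting from a cursor of 0x1008, a 16-byte value with 16-byte alignment would be read from 0x1010, leaving the cursor at 0x1020.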

Next, assuming the argument can be passed in registers, va_arg determines how many floating-point and general-purpose registers would be used to pass the requested type. It compares those values with the gp_offset and fp_offset fields in the va_list. If the additional registers would cause either value to overflow the number of registers used for parameter-passing for that type, then the argument was passed on the stack, and va_arg bails out and uses overflow_arg_area.

If we've made it this far, the argument was passed in registers. va_arg fetches the argument using reg_save_area and the appropriate offsets, and then updates gp_offset and fp_offset as appropriate.
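
The register-or-stack decision described above can be sketched for the two simplest cases, a 64-bit integer and a double. Everything here is a model for illustration: model_va_list, GP_LIMIT, FP_LIMIT and the fetch functions are hypothetical names, the limits assume six 8-byte general-purpose slots followed by eight 16-byte SSE slots, and the stack path ignores over-aligned types.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the ABI's va_list element; not the real type. */
typedef struct {
    unsigned int gp_offset;
    unsigned int fp_offset;
    void *overflow_arg_area;
    void *reg_save_area;
} model_va_list;

#define GP_LIMIT 48u    /* assumed: 6 GP registers * 8 bytes */
#define FP_LIMIT 176u   /* assumed: GP_LIMIT + 8 SSE registers * 16 bytes */

/* Stack path: copy the value out and step over whole 8-byte slots. */
static void fetch_overflow(model_va_list *ap, void *out, size_t size)
{
    memcpy(out, ap->overflow_arg_area, size);
    ap->overflow_arg_area =
        (char *)ap->overflow_arg_area + ((size + 7) / 8) * 8;
}

int64_t model_va_arg_i64(model_va_list *ap)
{
    int64_t value;
    if (ap->gp_offset + 8u > GP_LIMIT) {
        /* No GP register left: the argument was passed on the stack. */
        fetch_overflow(ap, &value, sizeof value);
    } else {
        memcpy(&value, (char *)ap->reg_save_area + ap->gp_offset,
               sizeof value);
        ap->gp_offset += 8u;
    }
    return value;
}

double model_va_arg_double(model_va_list *ap)
{
    double value;
    if (ap->fp_offset + 16u > FP_LIMIT) {
        /* No SSE register left: the argument was passed on the stack. */
        fetch_overflow(ap, &value, sizeof value);
    } else {
        /* A double occupies the low 8 bytes of its 16-byte slot. */
        memcpy(&value, (char *)ap->reg_save_area + ap->fp_offset,
               sizeof value);
        ap->fp_offset += 16u;
    }
    return value;
}
```

With gp_offset already at 40, the first integer fetch would come from the last general-purpose slot and every later one from overflow_arg_area, while fp_offset advances independently.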

Note that if the argument was passed in a mix of floating-point and general-purpose registers, or requires a large alignment, this means that va_arg must copy it out of the register save area into temporary space in order to assemble the value.
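
For a sense of what triggers that mixed case, here is a small portable example using the real <stdarg.h>. The struct mixed and the function scaled_total are invented for this illustration. On amd64, a 16-byte struct like this one would typically have its integer half passed in a general-purpose register and its double half in an SSE register, so the compiler's va_arg has to stitch the two halves back together; on other platforms the same source simply takes whatever path that ABI defines.

```c
#include <stdarg.h>

/* One integer member and one floating-point member, 16 bytes on amd64. */
struct mixed {
    long count;
    double scale;
};

/* Takes n structs by value through the variadic part and returns the
 * sum of count * scale over all of them. */
double scaled_total(int n, ...)
{
    va_list ap;
    double total = 0.0;

    va_start(ap, n);
    for (int i = 0; i < n; i++) {
        /* Each fetch may pull from both register classes or the stack. */
        struct mixed m = va_arg(ap, struct mixed);
        total += (double)m.count * m.scale;
    }
    va_end(ap);
    return total;
}
```

Compiling this for amd64 and reading the generated assembly for the loop body is a quick way to see the comparisons, branches and copies discussed here.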

So, in the worst case, va_arg on a type that embeds both a floating-point and an integer type must do two comparisons, a conditional branch, and then update two fields in the va_list and copy multiple values out of the register save area into a temporary object to return. That's quite a lot more work than the i386 version does. Note that I don't mean to suggest this is a performance concern – I don't have any benchmarks to back this up, but I would be shocked if this is measurable in any reasonable code. But I was surprised by how complex this operation is.
