A Bug Involving Inline Optimization and Calling Conventions

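I haven't updated this blog in a long time (when will cnblogs finally update its admin backend?). A few days ago, while working on a Linux 0.11 lab [1], I ran into a bizarre bug, so here is a brief record of the debugging process.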

Symptom

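The lab asks for a simple semaphore implementation [2] in Linux 0.11. After I modified the kernel code, however, the test program failed every time. For example: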

/* pc_test.c */
#define   __LIBRARY__
#include <stdio.h>
#include <stdlib.h>
#include <semaphore.h>
#include <unistd.h>

_syscall2(long, sem_open, const char *, name, unsigned int, value);
_syscall1(int, sem_unlink, const char *, name);

int main(void)
{
    sem_t *mutex;
    if ((mutex = (sem_t *) sem_open("mutex", 1)) == (sem_t *)-1)
    {
        perror("opening mutex semaphore");
        return EXIT_FAILURE;
    }
    sem_unlink("mutex");
    return EXIT_SUCCESS;
}

The result is a segmentation fault:

Locating the Fault

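By instrumenting sem.c, the core kernel code that implements the semaphores, I eventually narrowed the segmentation fault down to find_sem, the function that looks up an existing semaphore: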

/*
The commented-out part below is the linked-list struct I defined in semaphore.h

#define MAXSEMNAME 128
struct sem_t
{
    char m_name[MAXSEMNAME+1];
    unsigned long m_value;
    struct sem_t * m_prev;
    struct sem_t * m_next;
    struct task_struct * m_wait;
};
typedef struct sem_t sem_t;
#define SEM_FAILED ((sem_t *)-1)
*/

// Data structure optimization is possible here
sem_t _semHead={.m_name = "_semHead", .m_value = 0, .m_prev = NULL,\
                 .m_next = NULL, .m_wait = NULL};

sem_t *find_sem(const char* name)
{
    sem_t *tmpSemP = &_semHead;
    while (tmpSemP->m_next != NULL)
    {
        if (strcmp((tmpSemP->m_name), name) == 0)
        {
            return tmpSemP;
        }
        tmpSemP = tmpSemP->m_next;
    }
    return tmpSemP;
}

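Since this function dereferences a pointer in the form P->member, the most likely culprit is the value of P itself. So I added printk calls around the operations on P to check whether its value goes wrong: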

sem_t *find_sem(const char* name)
{
    printk("Now we are in find_sem\n"); // DEBUG
    sem_t *tmpSemP = &_semHead;
    while (tmpSemP->m_next != NULL)
    {
        printk("find_sem: tmpSemp before strcmp: %p\n", tmpSemP); // DEBUG
        if (strcmp((tmpSemP->m_name), name) == 0)
        {
            printk("find_sem: tmpSemp after strcmp: %p\n", tmpSemP); // DEBUG
            printk("find_sem: return...\n\n"); // DEBUG
            return tmpSemP;
        }
        printk("find_sem: tmpSemp after strcmp: %p\n\n", tmpSemP); // DEBUG
        tmpSemP = tmpSemP->m_next;
    }
    printk("find_sem: return...\n\n"); // DEBUG
    return tmpSemP;
}

After rebuilding the kernel and running pc_test.c above again, something strange happened:

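As the output shows, the first entry into find_sem does not fault: on the first sem_open call the kernel holds no semaphores yet, so the condition tmpSemP->m_next != NULL is false and the loop body never runs. On the second and third entries, however, the value of tmpSemP differs before and after strcmp(tmpSemP->m_name, name). C passes function arguments by value, so if the compiler really compiled strcmp as an ordinary C function, it would receive a copy of the address of m_name and could not possibly change tmpSemP. The conclusion so far is that the strcmp defined in string.h is most likely at fault.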
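The pass-by-value argument can be illustrated with a small user-space sketch. The names compare_by_value and caller_pointer_survives are hypothetical and do not come from the kernel source; the comparison function is a plain C stand-in that advances its own parameter copies, much as lodsb and scasb advance esi and edi.

```c
#include <assert.h>

/* Hypothetical stand-in for a strcmp compiled as an ordinary C function.
 * It advances cs and ct, but these are the callee's own copies. */
static int compare_by_value(const char *cs, const char *ct)
{
    while (*cs != '\0' && *cs == *ct) {
        cs++;   /* only the local copy moves */
        ct++;
    }
    return (unsigned char)*cs - (unsigned char)*ct;
}

/* Returns 1 if the caller's pointer still holds its original value
 * after the comparison, which C's by-value semantics require. */
static int caller_pointer_survives(const char *p, const char *name)
{
    const char *before = p;
    (void)compare_by_value(p, name);
    return p == before;
}
```

Under ordinary C semantics caller_pointer_survives can only ever return 1, which is why a changing tmpSemP points at something outside those semantics, such as the inline assembly.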

Reproduction

To make analysis and debugging easier, I extracted the key code from string.h, semaphore.h and find_sem in sem.c, trimmed it down, and reproduced the bug in user space:

/* test.c */
#include <stdio.h>

// string.h
inline int strcmp(const char * cs,const char * ct)
{
register int __res ;
__asm__("cld\n"
	"1:\tlodsb\n\t"
	"scasb\n\t"
	"jne 2f\n\t"
	"testb %%al,%%al\n\t"
	"jne 1b\n\t"
	"xorl %%eax,%%eax\n\t"
	"jmp 3f\n"
	"2:\tmovl $1,%%eax\n\t"
	"jl 3f\n\t"
	"negl %%eax\n"
	"3:"
	:"=a" (__res):"D" (cs),"S" (ct));
return __res;
}

//semaphore.h
typedef struct sem_t
{
    char m_name[128];
    struct sem_t *m_next;
} sem_t;

//sem.c
int main(void)
{
    sem_t _semRear={.m_name = "_semRear", .m_next = (sem_t *)0};
    sem_t _semHead={.m_name = "_semHead", .m_next = &_semRear};
    sem_t *tmpSemP = &_semHead;
    char name[] = "test";
    while (tmpSemP->m_next != (sem_t *)0)
    {
        printf("1. tempSemP: %p\n", tmpSemP);
        if(!strcmp((tmpSemP->m_name), name))
            return 0;
        printf("2. tempSemP: %p\n", tmpSemP);
        tmpSemP = tmpSemP->m_next;
    }
    return 0;
}

The bug reproduces:

Analysis

Let's first look at how strcmp is implemented:

extern inline int strcmp(const char * cs,const char * ct)
{
register int __res ;			// register variable
__asm__("cld\n"					// clear the direction flag
	"1:\tlodsb\n\t"				// load ds:[esi] into al, esi++
	"scasb\n\t"					// compare al with es:[edi], edi++
	"jne 2f\n\t"				// if not equal, jump forward to label 2
	"testb %%al,%%al\n\t"		// test al
	"jne 1b\n\t"				// if al is not 0, jump back to label 1
	"xorl %%eax,%%eax\n\t"		// if al is 0, clear eax (return value 0)
	"jmp 3f\n"					// jump forward to label 3 and return
	"2:\tmovl $1,%%eax\n\t"		// set eax to 1
	"jl 3f\n\t"					// if al was the smaller value in the comparison, return the positive value (1)
	"negl %%eax\n\t"			// otherwise eax = -1, return a negative value
	"3:"
	:"=a" (__res):"D" (cs),"S" (ct)); // edi receives cs, esi receives ct, and eax is written to the register variable __res
return __res;					// return __res
}

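As shown, strcmp relies on inlining for performance (both an inline function and inline assembly). So is the code to blame, or the compiler? Let's load the binary into IDA for some static analysis: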

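The compiler faithfully preserved the inline assembly statements. From the arguments to __printf_chk we can tell that the compiler keeps tmpSemP in the edi register both before and after control flows through strcmp. And because the first member of the semaphore struct is m_name: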

//semaphore.h
typedef struct sem_t
{
    char m_name[128];
    struct sem_t *m_next;
} sem_t;

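and m_name is an array, tmpSemP->m_name and tmpSemP have the same value as addresses. The inline assembly requires edi as the input register for the first argument, so as an optimization the compiler puts tmpSemP in edi up front; that way edi does not need to be reloaded when strcmp is entered.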
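The address equality can be checked directly. The following is a minimal sketch; sem_node_t, name_offset and name_aliases_struct are hypothetical names chosen for illustration (the struct is renamed so it cannot collide with a system sem_t), and the layout copies the reduced struct above.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced copy of the struct above, renamed for this sketch. */
typedef struct sem_node
{
    char m_name[128];
    struct sem_node *m_next;
} sem_node_t;

/* Byte offset of m_name inside the struct; 0 means it is the first member. */
static size_t name_offset(void)
{
    return offsetof(sem_node_t, m_name);
}

/* Returns 1 when p->m_name and p denote the same address. */
static int name_aliases_struct(const sem_node_t *p)
{
    return (const void *)p->m_name == (const void *)p;
}
```

C guarantees that a pointer to a struct, suitably converted, points to its first member, so both checks hold for any object of this type. That is what allows one register to serve as both tmpSemP and the first strcmp argument.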

But the inline assembly clearly contains scasb [3], which modifies edi after the comparison. Doesn't the compiler know that? The GCC documentation on inline assembly [4] says:

asm asm-qualifiers ( AssemblerTemplate
              : OutputOperands
              [ : InputOperands
              [ : Clobbers ] ])

6.47.2.6 Clobbers and Scratch Registers

While the compiler is aware of changes to entries listed in the output operands, the inline asm code may modify more than just the outputs. For example, calculations may require additional registers, or the processor may overwrite a register as a side effect of a particular assembler instruction. In order to inform the compiler of these changes, list them in the clobber list. Clobber list items are either register names or the special clobbers (listed below). Each clobber list item is a string constant enclosed in double quotes and separated by commas.

Clobber descriptions may not in any way overlap with an input or output operand….

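In other words, any register that the assembly modifies but that is not listed among the output operands should be named in Clobbers. GCC assumes that an input-only operand still holds its original value when the asm statement finishes, so without that information the compiler does not know which registers were modified (edi in this bug) and may get things wrong during optimization.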

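Back in the strcmp code, the last line of the asm statement is :"=a" (__res):"D" (cs),"S" (ct)); while scasb and lodsb [5] modify exactly edi and esi. As the documentation above states, clobbers may not overlap with input or output operands, so listing edi and esi in the clobber position produces a compile error: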

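To get the code to compile, (the programmer) simply left edi and esi out of the clobber list. Most of the time this causes no problems, but whenever the compiler's optimizations rely on strcmp leaving edi and esi unchanged, a bug can appear.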
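One way to describe the side effect without a clobber is to declare the two pointers as read-write operands. The sketch below is my own assumption about how that could look, not code from the Linux 0.11 tree: strcmp_fixed is a hypothetical name, the "+D" and "+S" constraints tell GCC that edi and esi are both read and modified, and the "memory" and "cc" clobbers are added because the asm reads through the pointers and changes the flags. A portable fallback is included so the block also builds on non-x86 targets.

```c
#include <assert.h>

/* Hypothetical corrected variant: same instructions as the original,
 * but the pointer registers are declared as read-write operands. */
static inline int strcmp_fixed(const char *cs, const char *ct)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int res;
    __asm__("cld\n"
        "1:\tlodsb\n\t"             /* al = *ct, advance esi/rsi */
        "scasb\n\t"                 /* compare al with *cs, advance edi/rdi */
        "jne 2f\n\t"
        "testb %%al,%%al\n\t"
        "jne 1b\n\t"
        "xorl %%eax,%%eax\n\t"      /* strings equal: result 0 */
        "jmp 3f\n"
        "2:\tmovl $1,%%eax\n\t"
        "jl 3f\n\t"                 /* *ct < *cs (signed): result 1 */
        "negl %%eax\n"              /* otherwise: result -1 */
        "3:"
        : "=a" (res), "+D" (cs), "+S" (ct)  /* edi and esi are modified */
        :
        : "memory", "cc");
    return res;
#else
    /* Portable fallback. It compares bytes as unsigned char, while the
     * asm uses a signed jl, so results may differ for bytes >= 0x80. */
    while (*cs != '\0' && *cs == *ct) {
        cs++;
        ct++;
    }
    return ((unsigned char)*cs > (unsigned char)*ct)
         - ((unsigned char)*cs < (unsigned char)*ct);
#endif
}
```

Because cs and ct are the inlined function's own copies, marking them as modified costs the caller nothing beyond keeping tmpSemP somewhere the asm does not overwrite.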

Experiment

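We now have a theoretical explanation of the bug, so let's verify it with an experiment. The bug arises because tmpSemP->m_name and tmpSemP have the same value, which lets the compiler use the same register, edi, both to hold tmpSemP and to pass tmpSemP->m_name. If we reorder the struct members to rule out this particular optimization, the test program should no longer hit the bug. For example: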

typedef struct sem_t
{
    struct sem_t * m_next;
    char m_name[128];
} sem_t;

Running it again, the error is gone:

Looking at it in IDA again:

Here tmpSemP is in ecx rather than edi at the first __printf_chk call, and the second __printf_chk uses the copy of tmpSemP previously kept in edx rather than edi, so that optimization has indeed been avoided.

This raises a new question, though. Under the x86 calling convention, ecx and edx are caller-saved (volatile) registers [6]: a caller cannot rely on the callee to preserve them. So why does GCC use these two registers to hold tmpSemP across the strcmp call?

In fact, the GCC documentation on inline functions contains the following passage [7]:

This combination of inline and extern has almost the effect of a macro. The way to use it is to put a function definition in a header file with these keywords, and put another copy of the definition (lacking inline and extern) in a library file. The definition in the header file will cause most calls to the function to be inlined. If any uses of the function remain, they will refer to the single copy in the library.

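That is, GCC treats a function declared with both inline and extern almost like a macro. Once the body is expanded in place there is no call instruction left, so the calling convention presumably no longer governs register allocation at that point; what the compiler relies on instead is the operand and clobber information of the asm statement.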

Fix

There are two ways to fix this.

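The first is to tell the compiler which registers it cannot rely on (they are volatile), or to drop the assembly altogether and let the compiler handle register allocation. For example, we can create a string_fix.h that implements a strCmp in plain C: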

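#ifndef _STRING_FIX_H_
#define _STRING_FIX_H_

/*
 * This header file is for fixing bugs caused by inline assembly
 * in string.h.
 */

/* static inline avoids multiple-definition link errors when this
 * header is included from more than one source file. */
static inline int strCmp(const char* s1, const char* s2)
{
    /* Advance while both strings agree and s1 has not ended. */
    while(*s1 && (*s1 == *s2))
    {
        s1++;
        s2++;
    }
    /* Compare the first differing bytes as unsigned values. */
    return *(const unsigned char*)s1 - *(const unsigned char*)s2;
}

#endif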

The second is to save and restore the modified registers by hand inside the original inline assembly, for example:

extern inline int strcmp(const char * cs,const char * ct)
{
register int __res ;
__asm__("push %%edi\n\tpush %%esi\n\t"
	"cld\n\t"
	"1:\tlodsb\n\t"
	"scasb\n\t"
	"jne 2f\n\t"
	"testb %%al,%%al\n\t"
	"jne 1b\n\t"
	"xorl %%eax,%%eax\n\t"
	"jmp 3f\n"
	"2:\tmovl $1,%%eax\n\t"
	"jl 3f\n\t"
	"negl %%eax\n\t"
	"3:\n\t"
	"pop %%esi\n\tpop %%edi\n"
	:"=a" (__res):"D" (cs),"S" (ct));
return __res;
}

Testing and follow-up are not shown here.

Afterword

Is this really code written by Linus Torvalds [8]? I tracked down what looks like an authoritative copy of the source [9], and its strcmp reads as follows:



extern inline int strcmp(const char * cs,const char * ct)
{
register int __res __asm__("ax");
__asm__("cld\n"
	"1:\tlodsb\n\t"
	"scasb\n\t"
	"jne 2f\n\t"
	"testb %%al,%%al\n\t"
	"jne 1b\n\t"
	"xorl %%eax,%%eax\n\t"
	"jmp 3f\n"
	"2:\tmovl $1,%%eax\n\t"
	"jl 3f\n\t"
	"negl %%eax\n"
	"3:"
	:"=a" (__res):"D" (cs),"S" (ct):"si","di");
return __res;
}

Linus Torvalds explicitly listed si and di as clobbers. Perhaps the GCC of that era did not yet enforce the rule that clobbers may not overlap with input or output operands.

More likely, people studying the code today simply deleted the clobbers to get it to compile. The following articles, for example, all mention this approach:

Ubuntu15.10邂逅linux0.11

linux环境下编译linux0.11内核

linux0.12 编译过程

This article [10] also points out that other code in Linux 0.1x fails to build with modern compilers because of its clobber lists:

include/linux/sched.h: set_base, set_limit

include/string.h: strcpy, strncpy, strcat, strncat, strcmp, strncmp, strchr, strrchr, strspn, strcspn, strpbrk, strstr, memcpy, memmove, memcmp, memchr

mm/memory.c: copy_page, get_free_page

fs/buffer.c: COPY_BLK

fs/namei.c: match

fs/bitmap.c: clear_block, find_first_zero

kernel/blk_drv/floppy.c: copy_buffer

kernel/blk_drv/hd.c: port_read, port_write

kernel/chr_drv/console.c: scrup, scrdown, csi_J, csi_K, con_write

References

[1] HIT-OSLAB-MANUAL

[2] Semaphore (programming)

[3] scasb

[4] 6.47 How to Use Inline Assembly Language in C Code

[5] lodsb

[6] Register_preservation

[7] 5.34 An Inline Function is As Fast As a Macro

[8] Linus Torvalds

[9] Linux 0.11 source

[10] 64位Debian Sid下编译Linux 0.11内核
