ref:http://kmittal82.wordpress.com/2012/02/17/armthumbthumb-2/

A few months ago I gave a presentation titled “Introduction to the ARM architecture”. One of the best-received sections was the part where I explained the differences between the instruction sets that can run on the ARM architecture: ARM (32-bit), Thumb (16-bit) and Thumb-2 (mixed 16/32-bit). In this post I will try to explain the differences between the three.

Before we begin, let’s look at a very simple test case which we will build for all three instruction sets (thanks to my friend and colleague Stephen Wade for coming up with this test case):

typedef unsigned char uint8;
typedef unsigned short uint16;
/* r0 = x , r1 = a, r2 = b, r3 = c */
uint8 foo(uint8 x, uint8 a, uint16 b, uint16 c)
{
    if (a==2)
    {
        x += (b >> 8);
    }
    else
    {
        x += (c >> 8);
    }
    return x;
}

According to the ARM procedure call standard, the arguments to the function would be passed in registers r0-r3, with the return value passed in r0.
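Before reading the disassembly, it may help to pin down what foo() actually computes. The sketch below repeats the function and adds a few hand-picked calls; the helper name check_foo and the argument values are my own assumptions for illustration, not part of the original test case. The last call shows why the compiler has to trim the result back to 8 bits.

```c
#include <assert.h>

typedef unsigned char uint8;
typedef unsigned short uint16;

/* Same function as in the test case above. */
uint8 foo(uint8 x, uint8 a, uint16 b, uint16 c)
{
    if (a == 2)
    {
        x += (b >> 8);
    }
    else
    {
        x += (c >> 8);
    }
    return x;
}

/* Hypothetical helper with illustrative inputs. */
void check_foo(void)
{
    assert(foo(1, 2, 0x0300, 0x0500) == 4);    /* a == 2: adds the high byte of b */
    assert(foo(1, 3, 0x0300, 0x0500) == 6);    /* a != 2: adds the high byte of c */
    assert(foo(0xff, 2, 0x0200, 0x0000) == 1); /* 0xff + 2 = 0x101, truncated to 8 bits */
}
```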

ARM

Each instruction in the ARM instruction set is 32 bits in size. In exchange, ARM instructions have access to useful features such as conditional execution and an in-line barrel shifter. Without conditional instructions, conditional code has to be handled by branching, which is more expensive. The in-line barrel shifter lets an instruction shift one of its register operands as part of the instruction itself, which eliminates the need for a separate shift instruction. Using RVCT, I compiled the test case for ARM state and dumped the code with fromelf:

0x00000000:    e3510002    ..Q.    CMP      r1,#2
0x00000004:    10800423    #...    ADDNE    r0,r0,r3,LSR #8
0x00000008:    00800422    "...    ADDEQ    r0,r0,r2,LSR #8
0x0000000c:    e20000ff    ....    AND      r0,r0,#0xff
0x00000010:    e12fff1e    ../.    BX       lr

The first thing we notice here is that each instruction is 32 bits (4 bytes) wide, as shown by the instruction encoding (second column from the left). Given that we have 5 such instructions, we have a code size of 5 x 4 = 20 bytes.

In the first instruction, r1 is compared to the integer value 2, which is the first thing that our function foo() does. Simple so far. The more interesting part is the second and third instructions, which demonstrate conditional execution as well as the in-line barrel shifter.

The second instruction in effect says “if the result of the previous comparison was NE (not equal), then shift r3 right by 8 bits, add it to r0 and store the result back in r0”. This corresponds to the “else” part of the if statement in our function. Not only did we manage to avoid a branch, we were also able to do the shift and the addition in one simple instruction. The third instruction is exactly the same, but executes on EQ (equal), uses r2, and deals with the “if” part of the statement.

The fourth instruction is again interesting. Since the return value of the function is of type unsigned char, only the bottom 8 bits are significant, but the addition was done on a full 32-bit register and may have carried into bit 8. So we AND the return value (stored in r0) with #0xff, which zeros the top 24 bits, and now our return value is ready in r0.

The fifth and final instruction simply returns to the caller.
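The five-instruction sequence can be mirrored line by line in C. This is only a rough model under my own assumptions: registers are plain 32-bit values, the variable z stands in for the Z flag, and the name foo_arm_model is hypothetical. It shows what each instruction does to r0, not how a compiler would translate the C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the ARM listing: one C statement per instruction. */
uint32_t foo_arm_model(uint32_t r0, uint32_t r1, uint32_t r2, uint32_t r3)
{
    int z = (r1 == 2);             /* CMP   r1,#2 : Z is set when r1 == 2       */
    if (!z) r0 = r0 + (r3 >> 8);   /* ADDNE r0,r0,r3,LSR #8 : runs only if !Z   */
    if (z)  r0 = r0 + (r2 >> 8);   /* ADDEQ r0,r0,r2,LSR #8 : runs only if Z    */
    r0 = r0 & 0xffu;               /* AND   r0,r0,#0xff : clear the top 24 bits */
    return r0;                     /* BX    lr : result is returned in r0       */
}

/* Illustrative checks with made-up inputs. */
void check_foo_arm_model(void)
{
    assert(foo_arm_model(1, 2, 0x0300, 0x0500) == 4);
    assert(foo_arm_model(1, 3, 0x0300, 0x0500) == 6);
    assert(foo_arm_model(0xff, 2, 0x0200, 0x0000) == 1);
}
```

Both ADD statements are always "visited" in order, exactly as the processor fetches both instructions; the condition only decides which one takes effect.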

Thumb

In Thumb state, each instruction is 16 bits in size, and the only instruction that can be conditional is the branch. There is also no access to the in-line barrel shifter, so separate instructions are needed for shifting bits. What this means in practice is that Thumb code is generally slower to execute than ARM code (since more Thumb instructions may be needed to do the same job), but it can help save code size. Building our example with RVCT and dumping it with fromelf, we see the following code sequence:

_Z3foohhtt
    0x00000000:    2902        .)      CMP      r1,#2
    0x00000002:    d101        ..      BNE      {pc}+0x6 ; 0x8
    0x00000004:    0a11        ..      LSRS     r1,r2,#8
    0x00000006:    e000        ..      B        {pc}+0x4 ; 0xa
    0x00000008:    0a19        ..      LSRS     r1,r3,#8
    0x0000000a:    1808        ..      ADDS     r0,r1,r0
    0x0000000c:    0600        ..      LSLS     r0,r0,#24
    0x0000000e:    0e00        ..      LSRS     r0,r0,#24
    0x00000010:    4770        pG      BX       lr

The first thing we notice is that all the instructions are 16 bits in size, as indicated by the instruction encodings (second column from the left). This gives us a total code size of 9 x 2 = 18 bytes.

When we start analysing the generated code, the drawback of Thumb state is immediately apparent. Only branches can be conditional in Thumb, so the if/else is implemented with branches. The second instruction branches to address 0x8 if the comparison in the first instruction was not equal (i.e. we enter the “else” part of our C++ code). At address 0x8 we notice the second drawback of Thumb state, the lack of an in-line barrel shifter: a separate LSRS instruction shifts r3 right by 8 bits and stores the result in r1. Following that, the value in r1 is added to r0 at address 0xa. Our return value is nearly ready, but we need to zero the top 24 bits. Thumb's AND only takes register operands, not an immediate such as #0xff, so the compiler uses two shifts on r0 instead: first shifting r0 left by 24 bits (which discards the top 24 bits and leaves the low byte at the top of the register), then shifting it right by 24 bits (which moves that byte back down and fills the top 24 bits with zeros). The “if” path is similar: the LSRS at address 0x4 shifts r2 instead of r3, and the unconditional branch at 0x6 skips over the “else” shift to the shared ADDS at 0xa.
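The two-shift sequence at the end of the Thumb listing produces the same result as the AND with 0xff in the ARM build. A small sketch, assuming 32-bit registers modelled as uint32_t; the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* What the ARM build does: AND r0,r0,#0xff */
uint32_t mask_with_and(uint32_t r0)
{
    return r0 & 0xffu;
}

/* What the Thumb build does instead. */
uint32_t mask_with_shifts(uint32_t r0)
{
    r0 = r0 << 24;   /* LSLS r0,r0,#24 : low byte moves to bits 31..24, the rest is lost */
    r0 = r0 >> 24;   /* LSRS r0,r0,#24 : byte moves back down, top 24 bits become zero   */
    return r0;
}

/* Illustrative checks with made-up inputs. */
void check_masks(void)
{
    assert(mask_with_shifts(0x00000101u) == 0x01u);
    assert(mask_with_shifts(0xdeadbeefu) == 0xefu);
    assert(mask_with_shifts(0xdeadbeefu) == mask_with_and(0xdeadbeefu));
}
```

The shifts must be logical (unsigned) shifts, as LSRS is; an arithmetic right shift would copy the sign bit into the top bits instead of zeros.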

Thumb-2

Thumb-2 offers a “best of both worlds” compromise between ARM and Thumb, and aims to deliver the performance of ARM state code with the code density of Thumb state code. Thumb-2 has access to both 16-bit and 32-bit instructions, and even supports conditional execution, albeit in the form of “If-Then” (IT) constructs. Thumb-2 was first introduced as part of ARMv6T2, and has subsequently been made the default Thumb instruction set for ARMv7.

So, let's build our example for Thumb-2 and analyse it as before. Note that I built this with the “-Otime” optimization option in order to generate the IT constructs.

_Z3foohhtt
    0x00000000:    2902        .)      CMP      r1,#2
    0x00000002:    bf14        ..      ITE      NE
    0x00000004:    eb002013    ...     ADDNE    r0,r0,r3,LSR #8
    0x00000008:    eb002012    ...     ADDEQ    r0,r0,r2,LSR #8
    0x0000000c:    b2c0        ..      UXTB     r0,r0
    0x0000000e:    4770        pG      BX       lr

The first thing we notice here is that there is a mix of 32-bit and 16-bit instructions, as is apparent from the instruction encodings (second column from the left). Instead of all instructions being the same width, we have 4 instructions which are 16 bits wide and 2 instructions which are 32 bits wide, giving us a code size of 4 x 2 + 2 x 4 = 16 bytes.
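The three code sizes quoted in this post follow from simple counting. As a sanity check, here is a tiny sketch; the helper name code_size_bytes and the enum constants are my own, and the instruction counts are read off the three listings above.

```c
#include <assert.h>

enum { NARROW_BYTES = 2, WIDE_BYTES = 4 };   /* 16-bit and 32-bit encodings */

/* Hypothetical helper: code size from the number of 16-bit and 32-bit instructions. */
unsigned code_size_bytes(unsigned narrow_count, unsigned wide_count)
{
    return narrow_count * NARROW_BYTES + wide_count * WIDE_BYTES;
}

/* Counts taken from the three listings in this post. */
void check_code_sizes(void)
{
    assert(code_size_bytes(0, 5) == 20);   /* ARM:     5 x 32-bit             */
    assert(code_size_bytes(9, 0) == 18);   /* Thumb:   9 x 16-bit             */
    assert(code_size_bytes(4, 2) == 16);   /* Thumb-2: 4 x 16-bit, 2 x 32-bit */
}
```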

As seen, the second instruction is ITE NE (If-Then-Else, Not Equal). It performs no computation itself; it tells the processor that the next two instructions are conditional: the first executes if the NE condition holds, and the second executes on the opposite condition (EQ). Instructions 3 and 4 then do the same work as their counterparts in the ARM state code. Finally, the UXTB instruction zero-extends the low byte of r0 to a full word, which in this case effectively zeros out the top 24 bits.
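In C terms, UXTB behaves like converting the register value to an 8-bit unsigned type and then widening it back to 32 bits. A one-line sketch, with the hypothetical name uxtb_model and registers again assumed to be uint32_t:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of UXTB r0,r0: keep bits 7..0, zero bits 31..8. */
uint32_t uxtb_model(uint32_t r0)
{
    return (uint32_t)(uint8_t)r0;
}

/* Illustrative checks with made-up inputs. */
void check_uxtb_model(void)
{
    assert(uxtb_model(0x00000101u) == 0x01u);
    assert(uxtb_model(0xdeadbeefu) == 0xefu);
}
```

This is the same truncation that the return type of foo() asks for, done here in a single 16-bit instruction.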

Remember

  • ARM instructions are all 32 bits in size, have access to an in-line barrel shifter, and most of them can be conditional. ARM is best suited to performance-sensitive code where code size does not matter.
  • Thumb instructions are all 16 bits in size, have no access to an in-line barrel shifter, and only branches can be conditional. Thumb is best suited to situations where code footprint needs to be minimised, albeit at the expense of performance (having said that, Thumb code might give better performance depending on the size of the cache and other factors).
  • Thumb-2 offers a “best of both worlds” approach, using both 16-bit and 32-bit instructions. Conditional execution is possible through IT constructs, although there is a limit (at most four) on the number of instructions that can be conditionally executed within one IT block.
