Under the Hood: Dalvik patch for Facebook for Android

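First, some background from a Chinese-language write-up (translated here):

Hacking the Dalvik VM to work around the Android 2.3 DEX/LinearAllocHdr limit

Once an Android project grows large enough (or its code structure gets bad enough), you start hitting problems such as installation failures and crashes because the maximum method count has been exceeded. The typical symptom on Android 2.3 is INSTALL_FAILED_DEXOPT, tied to the 65535 limit.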

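There are two root causes:

  • When packaging, dx limits the number of methods in a single dex file to 65535
  • The Dalvik VM limits the number of methods loaded in memory (methods, class definitions, and constructors) to 65535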

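The problem is easy to reproduce:

  • Write a class, copy a function into it about 60,000 times, and build: the build fails
  • Install the apk on a 2.3 system: the installer reports INSTALL_FAILED_DEXOPT
  • Dynamically load two DEX modules with about 30,000 functions each: the app crashes as soon as they are loaded and run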
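
For the first reproduction step, the copy-pasting can be scripted. The following is a minimal sketch in Python that emits Java source for a class with a chosen number of trivial methods. The function name `generate_java_class`, the `m0`, `m1`, ... method names, and the choice of empty static methods are all assumptions made for illustration; the original post does not say how its test class was written.

```python
def generate_java_class(class_name: str, method_count: int) -> str:
    """Return Java source for a class with `method_count` empty static methods.

    Hypothetical helper for reproducing the method-count problem; the
    generated methods do nothing and exist only to inflate the count.
    """
    lines = [f"public class {class_name} {{"]
    for i in range(method_count):
        # Each method gets a unique name so the class compiles.
        lines.append(f"    public static void m{i}() {{}}")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write a class with 60,000 methods, the figure mentioned in the text.
    with open("Big.java", "w") as out:
        out.write(generate_java_class("Big", 60000))
```

If a single class of that size is rejected by the Java compiler itself, the same generator can be called several times with smaller counts and different class names.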

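Solutions commonly recommended online:

  • Delete code and jar files, especially auto-generated getters/setters and unused classes; ProGuard can strip unused code automatically
  • Versions after Gingerbread raised the LinearAllocHdr allocation from 5 MB to 8 MB, so dropping 2.3 users buys some time
  • Turn modules of the app into plugins that are loaded dynamically from separate dex files. This converts problem 1 into problem 2: if the app loads too much, it will still crash
  • Move Java-layer logic into the JNI layer
  • Hack the Dalvik VM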

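Facebook ran into this problem and wrote a blog post about it (Under the Hood: Dalvik patch for Facebook for Android). Roughly, their solution was to ship a lite version with a large number of features removed, and to write a small patch that hacks the Android Dalvik VM to enlarge the buffer.

Hacking the Dalvik VM appears to be the cleanest approach. Unfortunately, Facebook is vague on the details. Following the information given in the post, you can find the LinearAllocHdr* pointer in vm/Globals.h.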

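I wrote a small JNI program that implements the hack in the following steps:

  1. Obtain *env through a JNI method
  2. Walk memory backward from that pointer to find the block associated with the 65535 value
  3. mmap a new 8 MB region, then update the length and the Hdr's current location to point at the new mapping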

https://github.com/viilaismonster/LinearAllocFix

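A small tool is also provided that decompiles an apk to count *approximately* how many methods the built apk contains.

The tool caches earlier decompilation results by file name. The -diff option compares two apk versions and shows the change in method count per package.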
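
As a rough cross-check that needs no decompilation, the method reference count can also be read straight from each dex header. This is a different technique from the tool described above: it reads the `method_ids_size` field (offset 0x58 in the dex header) and so counts method references, not defined methods. The sketch below is in Python; the function names are hypothetical.

```python
import struct
import zipfile

DEX_MAGIC_PREFIX = b"dex\n"
DEX_HEADER_SIZE = 0x70          # size of the standard dex header
METHOD_IDS_SIZE_OFFSET = 0x58   # header field: number of method_id entries


def count_method_refs(dex_bytes: bytes) -> int:
    """Return the method_ids_size field from a dex file's header."""
    if len(dex_bytes) < DEX_HEADER_SIZE or not dex_bytes.startswith(DEX_MAGIC_PREFIX):
        raise ValueError("not a dex file")
    # Standard dex files store header fields as little-endian 32-bit values.
    return struct.unpack_from("<I", dex_bytes, METHOD_IDS_SIZE_OFFSET)[0]


def count_apk_method_refs(apk_path: str) -> dict:
    """Map each .dex entry inside an apk to its method reference count."""
    counts = {}
    with zipfile.ZipFile(apk_path) as apk:
        for name in apk.namelist():
            if name.endswith(".dex"):
                counts[name] = count_method_refs(apk.read(name))
    return counts
```

Calling `count_apk_method_refs("app.apk")` (with a hypothetical file name) returns one count per dex file, which makes it easy to see how close each file is to the limit.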

 
 
 
The following is reposted from Facebook: https://www.facebook.com/notes/facebook-engineering/under-the-hood-dalvik-patch-for-facebook-for-android/10151345597798920
Posted by David Reiss on Monday, March 4, 2013 at 1:59 PM
Related link: https://github.com/aosp-mirror/platform_dalvik/blob/android-2.3.7_r1/vm/Globals.h#L519
And: https://github.com/aosp-mirror/platform_dalvik/blob/android-2.3.7_r1/vm/LinearAlloc.h#L33
Solution: https://github.com/viilaismonster/LinearAllocFix
 

Facebook is one of the most feature-rich apps available for Android. With features like push notifications, news feed, and an embedded version of Facebook Messenger (a complete app in its own right) all working together in real-time, the complexity and volume of code create technical challenges that few, if any, other Android developers face--especially on older versions of the platform. (Our latest apps support Android versions as old as Froyo--Android version 2.2--which is almost three years old.)

One of these challenges is related to the way Android's runtime engine, the Dalvik Virtual Machine, handles Java methods. Late last year we completed a major rebuild of our Android app (https://www.facebook.com/notes/facebook-engineering/under-the-hood-rebuilding-facebook-for-android/10151189598933920), which involved moving a lot of our code from JavaScript to Java, as well as using newer abstractions that encouraged large numbers of small methods (generally considered a good programming practice). Unfortunately, this caused the number of Java methods in our app to drastically increase.

As we were testing, the problem first showed up as described in this bug (http://code.google.com/p/android/issues/detail?id=22586), which caused our app installation to fail on older Android phones. During standard installation, a program called "dexopt" runs to prepare your app for the specific phone it's being installed on. Dexopt uses a fixed-size buffer (called the "LinearAlloc" buffer) to store information about all of the methods in your app. Recent versions of Android use an 8 or 16 MB buffer, but Froyo and Gingerbread (versions 2.2 and 2.3) only have 5 MB. Because older versions of Android have a relatively small buffer, our large number of methods was exceeding the buffer size and causing dexopt to crash.

After a bit of panic, we realized that we could work around this problem by breaking our app into multiple dex files, using the technique described here (http://android-developers.blogspot.com/2011/07/custom-class-loading-in-dalvik.html), which focuses on using secondary dex files for extension modules, not core parts of the app.

However, there was no way we could break our app up this way--too many of our classes are accessed directly by the Android framework. Instead, we needed to inject our secondary dex files directly into the system class loader. This isn't normally possible, but we examined the Android source code and used Java reflection to directly modify some of its internal structures. We were certainly glad and grateful that Android is open source--otherwise, this change wouldn’t have been possible.

But as we came closer to launching our redesigned app, we ran into another problem. The LinearAlloc buffer doesn't just exist in dexopt--it exists within every running Android program. While dexopt uses LinearAlloc to store information about all of the methods in your dex file, the running app only needs it for methods in classes that you are actually using. Unfortunately, we were now using too many methods for Android versions up to Gingerbread, and our app was crashing shortly after startup.

There was no way to work around this with dex files since all of our classes were being loaded into one process, and we weren’t able to find any information about anyone who had faced this problem before (since it is only possible once you are already using multiple dex files, which is a difficult technique in itself). We were on our own.

We tried various techniques to reclaim space, including aggressive use of ProGuard and source code transformations to reduce our method count. We even built a profiler for LinearAlloc usage to figure out what the biggest consumers were. Nothing we tried had a significant impact, and we still needed to write many more methods to support all of the rich content types in our new and improved news feed and timeline.

As it stood, the release of the much-anticipated Facebook for Android 2.0 was at risk. It seemed like we would have to choose between cutting significant features from the app or only shipping our new version to the newest Android phones (ICS and up). Neither seemed acceptable. We needed a better solution.

Once again, we looked to the Android source code. Looking at the definition of the LinearAlloc buffer (https://github.com/android/platform_dalvik/blob/android-2.3.7_r1/vm/LinearAlloc.h#L33), we realized that if we could only increase that buffer from 5 MB to 8 MB, we would be safe!

That's when we had the idea of using a JNI extension to replace the existing buffer with a larger one. At first, this idea seemed completely insane. Modifying the internals of the Java class loader is one thing, but modifying the internals of the Dalvik VM while it was running our code is incredibly dangerous. But as we pored over the code, analyzing all the uses of LinearAlloc, we began to realize that it should be safe as long as we did it at the start of our program. All we had to do was find the LinearAllocHdr object, lock it, and replace the buffer.
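
The "lock it, and replace the buffer" step can be illustrated with a small simulation. This is only a sketch in Python, not the native JNI code the article describes: `FakeLinearAllocHdr` is a hypothetical stand-in whose fields are simplified and do not reproduce the real struct layout, and carrying the used bytes over to the new region is an assumption made so the simulated header stays consistent.

```python
import mmap
import threading
from dataclasses import dataclass, field

OLD_SIZE = 5 * 1024 * 1024  # Froyo/Gingerbread buffer size quoted in the article
NEW_SIZE = 8 * 1024 * 1024  # size the article says would be safe


@dataclass
class FakeLinearAllocHdr:
    """Hypothetical stand-in for the VM header; not the real struct layout."""
    map_addr: mmap.mmap   # backing region for allocations
    map_length: int       # size of that region in bytes
    cur_offset: int = 0   # offset where the next allocation would go
    lock: threading.Lock = field(default_factory=threading.Lock)


def replace_buffer(hdr: FakeLinearAllocHdr, new_size: int) -> mmap.mmap:
    """Swap in a larger anonymous mapping while holding the header lock.

    Returns the old mapping, left open on purpose: in a real VM, objects
    allocated earlier would still point into it.
    """
    with hdr.lock:
        if new_size <= hdr.map_length:
            raise ValueError("new buffer must be larger than the current one")
        new_map = mmap.mmap(-1, new_size)  # anonymous, zero-filled region
        used = hdr.cur_offset
        if used:
            # Carry over the bytes already in use (assumption of this sketch).
            new_map[:used] = hdr.map_addr[:used]
        old_map = hdr.map_addr
        hdr.map_addr = new_map
        hdr.map_length = new_size
        return old_map
```

Holding the lock for the whole swap mirrors the article's point that the change is only safe if nothing else allocates from the buffer while it is being replaced.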

Finding it turned out to be the hard part. Here’s where it’s stored (https://github.com/android/platform_dalvik/blob/android-2.3.7_r1/vm/Globals.h#L519), buried within the DvmGlobals object, over 700 bytes from the start. Searching the entire object would be risky at best, but fortunately, we had an anchor point: the vmList object just a few bytes before. This contained a value that we could compare to the JavaVM pointer available through JNI.
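
The anchor-point idea can be sketched as a scan over raw bytes: look for a pointer-sized word equal to the known JavaVM pointer, then read the pointer a fixed distance after it. The Python simulation below works on a byte string instead of live process memory. The 32-bit little-endian pointer format, the `delta` parameter, and all function names are assumptions for illustration; the real offsets depend on the device's build of Dalvik.

```python
import struct
from typing import Optional

POINTER_FMT = "<I"  # assumption: 32-bit little-endian pointers
POINTER_SIZE = struct.calcsize(POINTER_FMT)


def read_pointer(blob: bytes, offset: int) -> int:
    """Read one pointer-sized word from the blob at the given offset."""
    return struct.unpack_from(POINTER_FMT, blob, offset)[0]


def find_anchor(blob: bytes, anchor_value: int) -> int:
    """Return the offset of the first aligned word equal to anchor_value, or -1."""
    for offset in range(0, len(blob) - POINTER_SIZE + 1, POINTER_SIZE):
        if read_pointer(blob, offset) == anchor_value:
            return offset
    return -1


def locate_header_pointer(blob: bytes, java_vm_ptr: int, delta: int) -> Optional[int]:
    """Find the anchor, then read the pointer `delta` bytes after it.

    `delta` is a hypothetical distance between the anchor and the header
    pointer. Returns None if the anchor is missing or the read would fall
    outside the blob.
    """
    anchor = find_anchor(blob, java_vm_ptr)
    if anchor < 0:
        return None
    target = anchor + delta
    if target + POINTER_SIZE > len(blob):
        return None
    return read_pointer(blob, target)
```

Comparing against a value that is already known (the JavaVM pointer) is what keeps the search bounded: the code never has to guess what the header pointer itself looks like.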

The plan was finally coming together: find the proper value for vmList, scan the DvmGlobals object to find a match, jump a few more bytes to the LinearAlloc header, and replace the buffer. So we built the JNI extension, embedded it in our app, started it up, and...we saw the app running on a Gingerbread phone for the first time in weeks. The plan had worked.

But for some reason it failed on the Samsung Galaxy S II...

The most popular Gingerbread phone...

Of all time...

It seems that Samsung made a small change to Android that was confusing our code. Other manufacturers might have done the same, so we realized we needed to make our code more robust.

Manual inspection of the GSII revealed that the LinearAlloc buffer was only 4 bytes from where we expected it, so we adjusted our code to look a few bytes to each side if it failed to find the LinearAlloc buffer in the expected location. This required us to parse our process's memory map to ensure we didn't make any invalid memory references (which would crash the app immediately) and also build some strong heuristics to make sure we would recognize the LinearAlloc buffer when we found it. As a last resort, we found a (mostly) safe way to scan the entire process heap to search for the buffer.
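
Checking an address against the process memory map before dereferencing it might look like the following sketch. It assumes the map is read in the Linux `/proc/self/maps` text format (the article does not say which source was used), and the `Region`, `parse_maps`, and `is_readable` names are hypothetical. Python is used here for illustration; the article's code was native.

```python
import re
from typing import List, NamedTuple


class Region(NamedTuple):
    start: int   # first address in the mapping
    end: int     # one past the last address
    perms: str   # permission string such as "r-xp"


# Matches the leading "start-end perms" part of a /proc/<pid>/maps line.
_MAPS_LINE = re.compile(r"^([0-9a-fA-F]+)-([0-9a-fA-F]+)\s+(\S{4})")


def parse_maps(text: str) -> List[Region]:
    """Parse /proc/<pid>/maps style text into a list of regions."""
    regions = []
    for line in text.splitlines():
        match = _MAPS_LINE.match(line)
        if match:
            start = int(match.group(1), 16)
            end = int(match.group(2), 16)
            regions.append(Region(start, end, match.group(3)))
    return regions


def is_readable(regions: List[Region], addr: int, size: int = 4) -> bool:
    """True if [addr, addr + size) lies fully inside one readable region."""
    return any(
        r.start <= addr and addr + size <= r.end and r.perms[0] == "r"
        for r in regions
    )
```

On Linux, `parse_maps(open("/proc/self/maps").read())` gives the regions for the current process. A candidate address is then read only if `is_readable` returns True for it.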

Now we had a version of the code that worked on a few popular phones--but we needed more than just a few. So we bundled our code up into a test app that would run the same procedure we were using for the Facebook app, then just display a large green or red box, indicating success or failure.

We used manual testing, DeviceAnywhere, and a test lab that Google let us borrow to run our test app on 70 different phone models, and fortunately, it worked on every single one!

We released this code with Facebook for Android 2.0 in December. It's now running on hundreds of different phone models, and we have yet to find one where it doesn't work. The great speed improvements in that release would not have been possible without this crazy hack. And needless to say, without Android’s open platform, we wouldn’t have had the opportunity to ship our best version of the app. There’s a lot of opportunity for building on Android, and we’re excited to keep bringing the Facebook experience to more people and devices.
