转:https://labsblog.f-secure.com/2017/06/22/super-awesome-fuzzing-part-one/

An informative guide on using AFL and libFuzzer.

Posted on behalf of Atte Kettunen (Software Security Expert) & Eero Kurimo (Lead Software Engineer) – Security Research and Technologies.


The point of security software is to make a system more secure. When developing software, one definitely doesn’t want to introduce new points of failure, or to increase the attack surface of the system the software is running on. So naturally, we take secure coding practices and software quality seriously. One good example of how we strive to improve software quality and security at F-Secure is our Vulnerability Reward Program that’s been running for almost two years now. And it’s still running, so do participate if you have a chance! Earlier this year, we posted an article detailing what we learned during the first year. It goes without saying that we have many processes in-house to catch potential bugs and vulnerabilities in our software. In this article, we’d like to explain one of the many processes we use in-house to find vulnerabilities before they reach our customers, and our dear bug bounty hunters.

One method for bug hunting that has proven to be very effective is a technique called fuzzing, where the target program is injected with unexpected or malformed data, in order to reveal input handling errors leading to, for example, an exploitable memory corruption. To create fuzz test cases, a typical fuzzer will either mutate existing sample inputs, or generate test cases based on a defined grammar or ruleset. An even more effective way of fuzzing is coverage guided fuzzing, where program execution paths are used to guide the generation of more effective input data for test cases. Coverage guided fuzzing tries to maximize the code coverage of a program, such that every code branch present in the program is tested. With the emergence of [Google’s!!] open source coverage guided fuzzing tools such as American Fuzzy Lop (AFL), LLVM libFuzzer, and HonggFuzz, using coverage guided fuzzing has never been easier or more approachable. You no longer need to master arcane arts, spend countless hours writing test case generator rules, or collecting input samples that cover all functionality of the target. In the simplest cases you can just compile your existing tool with a different compiler, or isolate the functionality you want to fuzz, write just a few lines of code, and then compile and run the fuzzer. The fuzzer will execute thousands or even tens of thousands of test cases per second, and collect a set of interesting results from triggered behaviors in the target.

If you’d want to get started with coverage guided fuzzing yourself, here’s a couple of examples showing how you’d fuzz libxml2, a widely used XML parsing and toolkit library, with two fuzzers we prefer in-house: AFL and LLVM libFuzzer.

Fuzzing with AFL

Using AFL for a real world example is straightforward. On Ubuntu 16.04 Linux you can get fuzzing libxml2 via its xmllint utility with AFL with just seven commands.

First we install AFL and get the source code of libxml2-utils.

  1. $ apt-get install -y afl
  2. $ apt-get source libxml2-utils

Next we configure libxml2 build to use AFL compilers and compile the xmllint utility.

  1. $ cd libxml2/
  2. $ ./configure CC=afl-gcc CXX=afl-g++
  3. $ make xmllint

Lastly we create a sample file with content “<a></a>” for AFL to start with and run the afl-fuzz.

  1. $ echo "" > in/sample
  2. $ LD_LIBRARY_PATH=./.libs/ afl-fuzz -i ./in -o ./out -- ./.libs/lt-xmllint -o /dev/null @@

AFL will continue fuzzing indefinitely, writing inputs that trigger new code coverage in ./out/queue/, crash triggering inputs in ./out/crashes/ and inputs causing hangs in /out/hangs/. For more information on how to interpret the AFL’s status screen, see: http://lcamtuf.coredump.cx/afl/status_screen.txt

Fuzzing with LLVM libFuzzer

Let’s now fuzz libxml2 with the LLVM libFuzzer. To start fuzzing, you’ll first need to introduce a target function, LLVMFuzzerTestOneInput, that receives the fuzzed input buffer from libFuzzer. The code looks like this.

  1. extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  2. DoSomethingInterestingWithMyAPI(Data, Size);
  3. return 0; // Non-zero return values are reserved for future use.
  4. }

For fuzzing libxml2, Google’s fuzzer test suite provides a good example fuzzing function.

  1. // Copyright 2016 Google Inc. All Rights Reserved.
  2. // Licensed under the Apache License, Version 2.0 (the "License");
  3. #include
  4. #include
  5. #include "libxml/xmlversion.h"
  6. #include "libxml/parser.h"
  7. #include "libxml/HTMLparser.h"
  8. #include "libxml/tree.h"
  9. void ignore (void * ctx, const char * msg, ...) {}
  10. extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  11. xmlSetGenericErrorFunc(NULL, &ignore);
  12. if (auto doc = xmlReadMemory(reinterpret_cast(data), size, "noname.xml", NULL, 0))
  13. xmlFreeDoc(doc);
  14. return 0;
  15. }

Before compiling our target function, we need to compile all dependencies with clang and -fsanitize-coverage=trace-pc-guard, to enable SanitizerCoverage coverage tracing. It is a good idea to also use -fsanitize=address,undefined in order to enable both the AddressSanitizer(ASAN) and the UndefinedBehaviorSanitizer(UBSAN) that catch many bugs that otherwise might be hard to find.

  1. $ git clone https://github.com/GNOME/libxml2 libxml2
  2. $ cd libxml2
  3. $ FUZZ_CXXFLAGS="-O2 -fno-omit-frame-pointer -g -fsanitize=address,undefined -fsanitize-coverage=trace-pc-guard"
  4. $ ./autogen.sh
  5. $ CXX="clang++-5.0 $FUZZ_CXXFLAGS" CC="clang-5.0 $FUZZ_CXXFLAGS" CCLD="clang++-5.0 $FUZZ_CXXFLAGS" ./configure
  6. $ make

As of writing this post, libFuzzer is not shipped with precompiled clang-5.0 packages of http://apt.llvm.org/, so you’ll still need to checkout and compile libFuzzer.a yourself as documented in http://llvm.org/docs/LibFuzzer.html#getting-started, but this might change in the near future.

The second step is to compile our target function, with the same flags, and link it with both the libFuzzer runtime and the libxml2 we compiled earlier.

  1. $ clang++-5.0 -std=c++11 $FUZZ_CXXFLAGS -lFuzzer ./libxml-test.cc -I ./include ./.libs/libxml2.a -lz -llzma -o libxml-fuzzer

Now we are ready to run our fuzzer.

  1. $ mkdir ./output
  2. $ ./libxml-fuzzer ./output/

We didn’t use any sample inputs, so libFuzzer starts by generating random data in order to find inputs that trigger new code paths in our libxml2 target function. All inputs that trigger new coverage are stored as sample files in ./output. As libFuzzer runs in-process, if a bug is found, it saves the test case and exits. On a high-end laptop, a single instance of libFuzzer reached over 5000 executions per second, slowing down to around 2000 once it started to generate test cases with more coverage. For more information on how to interpret the output see: http://llvm.org/docs/LibFuzzer.html#output

Creating a corpus

If your target is fast, meaning hundreds or even thousands of executions per second, you can try generating a base corpus out of thin air. With coverage guided fuzzing it is possible to do this even with more complex formats like the AFL author Michał Zalewski did with JPEG-files, but to save time, you should get a good representation of typical files for the application that are as small as possible. The smaller the files, the faster they are to fuzz.

AFL does not give any additional flags to tinker with when generating corpus out of thin air. Just give it a small sample input, for example “<a></a>” as an XML sample, and run AFL like you normally would.

With libFuzzer you have more flags to experiment with. For example, for XML you might want to try with ‘-only_ascii=1‘. One good technique for most formats is to execute multiple short runs while incrementing the maximum sample size of our fuzzer on each round and then merge all the results to form the output corpus.

  1. $ for foo in 4 8 16 32 64 128 256 512; do \
  2. ./libxml-fuzzer -max_len=$foo -runs=500000 ./temp-corpus-dir; \
  3. done
  4. $ ./libxml-fuzzer -merge=1 ./corpus ./temp-corpus-dir

With this approach, we first collect interesting inputs with maximum length of 4 bytes, the second run analyses the 4 byte inputs and uses those as a base for 8 byte inputs and so on. This way we discover “easy” coverage with faster smaller inputs and when we move to larger files we have a better initial set to start with.

To get some numbers for this technique, we did three runs with the example script.

On average, running the corpus generation script took about 18 minutes on our laptop. LibFuzzer was still frequently discovering new coverage at the end of iterations where -max_len was larger than 8 bytes, which suggests that, for those lengths, libFuzzer should be allowed to run longer.

For comparison, we also took the libFuzzer with default settings and ran it for three rounds, which took about 18 minutes.

  1. $ ./libxml-fuzzer -max_total_time=1080 ./temp-corpus-dir
  2. $ ./libxml-fuzzer -merge=1 ./corpus ./temp-corpus-dir;

From these results we see that our runs with the corpus generation script on average executed more test cases, generated a larger set of files, that triggers more coverage and features than the set generated with the default values. This is due to the size of test cases generated by libFuzzer using default settings. Previously libFuzzer used default -max_len of 64 bytes, but at the time of writing libFuzzer was just updated to have a default -max_len of 4096 bytes. In practice sample sets generated by this script have been very working starting points for fuzzing, but no data has been collected how the effects differ in comparison to default setting in long continuous fuzzing.

Corpus generation out of thin air is an impressive feat, but if we compare these results to the coverage from W3C XML test suite we see that it is a good idea to also to include sample files from different sources to your initial corpus, as you’ll get much better coverage before you’ve even fuzzed the target.

  1. $ wget https://www.w3.org/XML/Test/xmlts20130923.tar.gz -O - | tar -xz
  2. $ ./libxml-fuzzer -merge=1 ./samples ./xmlconf
  3. $ ./libxml-fuzzer -runs=0 ./samples
  4. #950 DONE cov: 18067 ft: 74369 corp: 934/2283Kb exec/s: 950 rss: 215Mb

Merging our generated corpus into the W3C test suite increased the block coverage to 18727, so not that much, but we still got a total of 83972 features, increasing the total throughput of these test cases. Both improvements are most probably due to small samples triggering error conditions that were not covered by the W3C test suite.

Trimming your corpus

After fuzzing the target for a while, you’ll end up with a huge set of fuzzed files. A lot of these files are unnecessary, and trimming them to a much smaller set will provide you with the same code coverage of the target. To achieve this, both projects provide corpus minimization tools.

AFL gives you the afl-cmin shell script that you can use to minimize your corpus. For the previous example, to minimize the corpus generated in the ./out directory, you can generate a minimized set of files into the ./output_corpus directory.

  1. $ afl-cmin -i ./out/queue -o ./output_corpus -- ./.libs/lt-xmllint -o /dev/null @@

AFL also offers another tool afl-tmin that can be used to minimize individual files while maintaining the same coverage as observed initially. Be aware that running afl-tmin on a large set of files can take a very long time, so first do couple of iterations with afl-cmin before trying afl-tmin.

LibFuzzer doesn’t have an external trimming tool – it has the corpus minimization feature, called merge, built-in.

  1. $ ./libxml-fuzzer -merge=1 <output directory> <input directory 1> <input directory 2> ... <input directory n>

LibFuzzer merge is a little easier to use since it looks for files recursively from any number of input directories. Another nice feature in libFuzzer merge is the -max_len flag. Using -max_len=X, libFuzzer will only use the first X bytes from each sample file, so you can collect random samples without caring about their sizes. Without the max_len flag, libFuzzer uses a default maximum length of 1048576 bytes when doing a merge.

With libFuzzer merge, you can use the same technique as you did to generate a corpus out of thin air.

  1. $ for foo in 4 8 16 32 64 128 256 512 1024; do
  2. mkdir ./corpus_max_len-$foo;
  3. ./libxml-fuzzer -merge=1 -max_len=$foo ./corpus-max_len-$foo ./corpus-max_len-* <input-directories>;
  4. done
  5. $ mkdir output_corpus;
  6. $ ./libxml-fuzzer -merge=1 ./output_corpus ./corpus-max_len-*;

With this trimming strategy libFuzzer will first collect new coverage triggering 2 byte chunks from each input sample, then merge those samples to 4 byte chunks, and so on, until you have the optimized set out of all the different length chunks.

A simple merge won’t always help you with performance issues. Sometimes your fuzzer can stumble upon very slow code paths, causing collected samples to start decaying your fuzzing throughput. If you don’t mind sacrificing a few code blocks for performance, libFuzzer can be easily used to remove too slow samples from your corpus. When libFuzzer is run with a list of files as an argument instead of a folder, it will execute every file individually and print out execution time for each file.

  1. $ ./libxml-fuzzer /*
  2. INFO: Seed: 3825257193
  3. INFO: Loaded 1 modules (237370 guards): [0x13b3460, 0x149b148),
  4. ./libxml2/libxml-fuzzer: Running 1098 inputs 1 time(s) each.
  5. Running: ./corpus-dir/002ade626996b33f24cb808f9a948919799a45da
  6. Executed ./corpus-dir/002ade626996b33f24cb808f9a948919799a45da in 1 ms
  7. Running: ./corpus-dir/0068e3beeeaecd7917793a4de2251ffb978ef133
  8. Executed ./corpus-dir/0068e3beeeaecd7917793a4de2251ffb978ef133 in 0 ms

With a snippet of awk, this feature can be used to print out names of files that that took too long to run, in our example 100 milliseconds, and then we can just remove those files.

  1. $ ./libxml-fuzzer /* 2>&1 | awk '$1 == "Executed" && $4 > 100 {print $2}' | xargs -r -I '{}' rm '{}'

Running both fuzzers in parallel

Now that you have a good base corpus, and you know how to maintain it, you can kick off some continuous fuzzing runs. You could run your favorite fuzzer alone, or run both fuzzers separately, but if you’ve got enough hardware available you can also easily run multiple fuzzers simultaneously on the same corpus. That way you get to combine best of both worlds while the fuzzers can share all the new coverage they find.

It’s easy to implement a simple script that will run both fuzzers simultaneously, while restarting the fuzzers every hour to refresh their sample corpus.

  1. $ mkdir libfuzzer-output; echo "" > .libfuzzer-output/1
  2. $ while true; do \
  3. afl-fuzz -d -i ./libfuzzer-output/ -o ./afl-output/ -- ./libxml/afl-output/bin/xmllint -o /dev/null @@ 1>/dev/null & \
  4. ./libxml/libxml-fuzzer -max_total_time=3600 ./libfuzzer-output/; \
  5. pkill -15 afl-fuzz; \
  6. sleep 1; \
  7. mkdir ./libfuzzer-merge; \
  8. ./libxml/libxml-fuzzer -merge=1 ./libfuzzer-merge ./libfuzzer-output/ ./afl-output/; \
  9. rm -rf ./afl-output ./libfuzzer-output; \
  10. mv ./libfuzzer-merge ./libfuzzer-output; \
  11. done

Because the example script only runs one hour per iteration, AFL is used in “quick & dirty mode” to skip all the deterministic steps. Even one large file can cause AFL to spend hours, or even days, on deterministic steps, so it it’s more reliable to run AFL without them when running on time budget. Deterministic steps can be run manually, or automatically on another instance that copies new samples to ‘./libfuzzer_output‘.

Dictionaries

You have your corpus, and you’re happily fuzzing and trimming. Where do you go from here?

Both AFL and libFuzzer support user-provided dictionaries. These dictionaries should contain keywords, or other interesting byte patterns, that would be hard for the fuzzer to determine. For some useful examples, take a look at Google libFuzzer’s XML dictionary and this AFL blog post about dictionaries.

Since these tools are quite popular nowadays, some good base dictionaries can be already found online. For example, Google has collected quite a few dictionaries: https://chromium.googlesource.com/chromium/src/+/master/testing/libfuzzer/fuzzers/dicts. Also, AFL source code contains few example dictionaries. If you don’t have the source code, you can check out afl mirror from github: https://github.com/rc0r/afl-fuzz/tree/master/dictionaries

Both AFL and libFuzzer also collect dictionary during execution. AFL collects dictionary when performing deterministic fuzzing steps, while libFuzzer approach is to instrument.

When running libFuzzer with time or test case limit, libFuzzer will output a recommended dictionary upon exit. This feature can be used to collect interesting dictionary entries, but it is recommended to do manual sanity checks over all automatically collected entries. libFuzzer builds those dictionary entries as it discovers new coverage, so those entries often build up towards the final keyword.

  1. "ISO-"
  2. "ISO-1"
  3. "ISO-10"
  4. "ISO-1064"
  5. "ISO-10646-"
  6. "ISO-10646-UCS-2"
  7. "ISO-10646-UCS-4"

We tested dictionaries with three 10 minute runs: without dictionary, with the recommended dictionary from first run and with the Google’s libFuzzer XML dictionary. Results can be seen from the table below.

Surprisingly, there was no significant difference between the results from the run without dictionary and the run with recommended dictionary from the first run, but with a “real” dictionary there is a dramatic change in the amount of coverage discovered during the run.

Dictionaries can really change the effectiveness of fuzzing, at least on short runs, so they are worth the investment. Shortcuts, like the libFuzzer recommended dictionary, can help, but you still need to do the extra manual effort to leverage the potential in dictionaries.

Fuzzing experiment

Our goal was to do a weekend long run on a couple of laptops. We ran two instances of AFL and libFuzzer, fuzzing the above example. The first instance was started without any corpus, and the second one with trimmed corpus from W3C XML Test Suite. The results could then be compared by performing a dry run for minimized corpus from all four sets. Results from these fuzzers are not directly comparable since both fuzzers use different instrumentation to detect executed code paths and features. libFuzzer measures two things for assessing new sample coverage, block coverage, that is isolated blocks of code visited, the and feature coverage, that is a combination of different code path features like transitions between code blocks and hit counts. AFL doesn’t offer direct count for the observed coverage, but we use overall coverage map density in our comparisons. The map density indicates how many branch tuples we have hit, in proportion to how many tuples the coverage map can hold.

Our first run didn’t go quite as expected. After 2 days and 7 hours we were reminded about the downsides of using deterministic fuzzing on large files. Our afl-cmin minimized corpus contained a couple of over 100kB samples that caused AFL to slow down to crawl after processing only under 38% of the first round. It would have taken days for AFL to get through a single file, and we had four of those in our sample set, so we decided to restart instances, after we removed all over 10kB samples. Sadly, on Sunday night at 11PM, “backup first” wasn’t the first thing in our mind and the AFL plot data was accidentally overwritten, so no cool plots from the first round. We managed to save the AFL UI before aborting.

Full results of our 2 day fuzzing campaign can be found from the image/table below.

We had actually never tried to pit these fuzzers against each other before. Both fuzzers were surprisingly even in our experiment. Starting from the W3C samples, the difference between discovered coverage, as measured by libFuzzer, was only 1.4%. Also both fuzzers found pretty much the same coverage. When we merged all the collected files from the four runs, and the original W3C samples, the combined coverage was only 1.5% higher than the coverage discovered by libFuzzer alone. Another notable thing is that without initial samples, even after 2 days, neither libFuzzer or AFL had discovered more coverage than our previous demonstration in generating a corpus out of thin air did repeatedly in 10 minutes.

We also generated a chart from coverage discovery during libFuzzer fuzzing run with the the W3C samples.

Which one should I use?

As we detailed, AFL is really simple to use, and can be started with virtually no setup. AFL takes care of handling found crashes and stuff like that. However, if you don’t have a ready command line tool like xmllint, and would need to write some code to enable fuzzing, it often makes sense to use libFuzzer for superior performance.

In comparison to AFL, libFuzzer has built-in support for sanitizers, such as AddressSanitizer and UndefinedBehaviorSanitizer, which help in finding subtle bugs during fuzzing. AFL has some support for sanitizers, but depending on your target there might be some serious side effects. AFL documentation suggests on running fuzzing without sanitizers and running the output queue separately with sanitizer build, but there is no actual data available to determine whether that technique can catch the same issues as ASAN enabled fuzzing. For more info about AFL and ASAN you can check docs/notes_for_asan.txt from the AFL sources.

In many cases however it makes sense to run both fuzzers, as their fuzzing, crash detection and coverage strategies are slightly different.

If you end up using libFuzzer, you really should check the Google’s great libFuzzer tutorial.

Happy fuzzing!
Atte & Eero

转:Super Awesome Fuzzing, Part One的更多相关文章

  1. 子类继承父类时JVM报出Error:Implicit super constructor People() is undefined for default constructor. Must define an explicit constructor

    当子类继承父类的时候,若父类没有定义带参的构造方法,则子类可以继承父类的默认构造方法 当父类中定义了带参的构造方法,子类必须显式的调用父类的构造方法 若此时,子类还想调用父类的默认构造方法,必须在父类 ...

  2. [LeetCode] Super Ugly Number 超级丑陋数

    Write a program to find the nth super ugly number. Super ugly numbers are positive numbers whose all ...

  3. Maven Super POM

    Maven super POM defines some properties. Three ways to find it ${M2_HOME}/lib/maven-model-builder-3. ...

  4. java基础 super 子类调用父类

    如果希望在子类中,去调用父类的构造方法,要求在子类的构造函数调用 example如下: package test; /* * 如果希望在子类中,去调用父类的构造方法,要求在子类的构造函数调用 * */ ...

  5. Python类中super()和__init__()的关系

    Python类中super()和__init__()的关系 1.单继承时super()和__init__()实现的功能是类似的 class Base(object): def __init__(sel ...

  6. java方法重载(overload)、重写(override);this、super关键简介

    一.方法重载: 条件:必须在一个类中,方法名称相同,参数列表不同(包括:数据类型.顺序.个数),典型案例构 造方重载.  注意:与返回值无关 二.方法重写: 条件: (1)继承某个类或实现某接口 (2 ...

  7. Java super关键字活用

    在实际开发中我们要自定义组件,就需要继承自某个组件类,如果我们自定义的这个组件类也需要像被继承的这个组件类一样,拥有丰富的构造方法. 关键字super的作用就更加显得尤为重要了,你可以在堆砌自己自定义 ...

  8. 深入super,看Python如何解决钻石继承难题 【转】

    原文地址 http://www.cnblogs.com/testview/p/4651198.html 1.   Python的继承以及调用父类成员 python子类调用父类成员有2种方法,分别是普通 ...

  9. 关于[super dealloc]

    销毁一个对象时,需要重写系统的dealloc方法来释放当前类所拥有的对象,在dealloc方法中需要先释放当前类中所有的对象,然后再调用[super dealloc]释放父类中所拥有的对象.如先调用[ ...

随机推荐

  1. 最大公倍数_Greatest Common Divisor

    计算最大公倍数 Static int gcd( int a, int b) { int t; while( b>0) { t = b; b = a % b; a = t; } return a; ...

  2. Jmeter-8-FTP测试

    1. 此处要深刻理解FTP的用法. 2. Get的时候填写的Remote File 路径/, 此处是相对路径. 实际为/home/user/ 3. Local file 此处要写到具体的文件. 4. ...

  3. SpringBoot jar包不支持jsp

    官方原文如下: When running a Spring Boot application that uses an embedded servlet container (and is packa ...

  4. Chrome浏览器启动页被360导航篡改解决方法

    右键Chrome浏览器快捷方式,选择“属性”,在“目标”的结尾处有添加的网址,删了即可. 2 如果在结尾处没有任何网址,可以添加“ -nohome”,这样下次启动时,就会打开一个空白页,也就不会打开被 ...

  5. JAVA list 列表 字典 dict

    import java.util.ArrayList; import java.util.HashMap; import java.util.Map; import java.util.Set; pu ...

  6. Cordova入门

    创建你的第一个App 因为对接要对接酷音,实现h5跨平台调用客户端的保存和分享功能,所以学了下cordova的入门. 安装Cordova CLI Cordova命令行工具作为npm包分发. 安装cor ...

  7. git 配置多用户

    .ssh 下的 config.txt 内容 # 配置github.com Host github.com HostName github.com IdentityFile ~/.ssh/id_rsa_ ...

  8. python中的argparse模块

    argparse干什么用的? 答:参数设置,比如python demo.py -h 诸如此类的. 开始学习这个模块: parser = argparse.ArgumentParser() #使用这个模 ...

  9. Django之项目搭建和配置总结(一)

    安装和创建虚拟环境 参考:linux系统下Python虚拟环境的安装和使用 安装Django包 先进入虚拟环境,在联网下执行: pip install django==1.8.7 1.8.7表示dja ...

  10. Linux从入门到放弃

    Ch.0 几点Linux常识 Linux严格区分大小写,不像windows中命令是不区分大小写的 Linux中所有内容以文件形式保存,包括硬件 Linux不靠扩展名区分文件类型,所有扩展名只是为了方便 ...