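My master's thesis is on Android malware classification, so I spent some time collecting Android malware samples. Some come from public datasets released with earlier papers, and some come from community sites or platforms. This post summarizes them and notes, for each sample set or dataset, the collection period, the number of samples, the corresponding papers, and how to obtain it.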

The sections below list Android malware datasets used in academic research. Some of them are still being updated.

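  1. I have the Drebin dataset and Android malware samples from VirusTotal (March 2018), about 15 GB in total. The VirusTotal dataset is on Google Drive; for Drebin, I have uploaded 2,560 of the 5,560 samples to OneDrive (because of limited space). If you need them, contact me directly and state your institution and identity (sharing through Google Drive requires your Gmail address).
  2. For historical datasets such as Drebin and Genome, ask your advisor and then email the dataset maintainers to request access. For datasets that are no longer shared, you can contact universities and institutions that already hold them; most well-known universities in China have these datasets.
  3. You can apply for VirusTotal samples yourself. There are two kinds of access: the API and the malware folder. The former gives you detailed detection reports for samples; the latter is mainly a large collection of malware samples. The application requires a lot of information, such as your identity, research topic, and details about your school and advisor.
  4. For the password to the Contagio samples, contact the blogger directly.
  5. All samples may be used for academic research only, and please cite the source of the samples.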

VirusTotal Mobile Apps Samples

VirusTotal: Analyze suspicious files and URLs to detect types of malware including viruses, worms, and trojans.

Description: VirusTotal can also be used through a smartphone app. VirusTotal is about empowering the Community in order to build tools that will make the Internet a safer place, as such, we like to credit and feature Community-developed goodies that help the antivirus industry in receiving more files in order to have more visibility into threats. Below you can find links to apps that will allow you to interact with VirusTotal making use of your smartphone, note that these are not developed by VirusTotal itself and so we are not responsible for them.

Sample Volume: N/A

Collected Time: up to date

HomePage: https://www.virustotal.com

Way to get:

  1. If you need a small number of samples, log in to VirusTotal and download them manually.
  2. If you need a large number of samples, email VirusTotal with an academic request. You can choose "access to the Academic API" or "access to a folder of malware".
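As a rough illustration of what API access looks like, here is a minimal Python sketch that requests the detection report for one sample by its SHA-256 hash. It assumes the key you receive works with the public API v3 file-report endpoint and the `x-apikey` header; the function names `build_report_request` and `fetch_report` are made up for this example, and quota and access rules depend on your agreement with VirusTotal.

```python
import json
import re
import urllib.request

# API v3 file-report endpoint; "{}" is replaced by the sample's SHA-256 hash.
VT_FILE_URL = "https://www.virustotal.com/api/v3/files/{}"


def build_report_request(sha256, api_key):
    """Build (but do not send) a request for the detection report of one sample."""
    if not re.fullmatch(r"[0-9a-fA-F]{64}", sha256):
        raise ValueError("expected a 64-character hexadecimal SHA-256 digest")
    return urllib.request.Request(
        VT_FILE_URL.format(sha256.lower()),
        headers={"x-apikey": api_key},  # the API key travels in this header
    )


def fetch_report(sha256, api_key):
    """Send the request and return the decoded JSON report.

    Needs network access and a valid key (assumption: your academic key
    is accepted by this endpoint).
    """
    with urllib.request.urlopen(build_report_request(sha256, api_key)) as resp:
        return json.load(resp)
```

Calling `fetch_report(sample_hash, your_key)` returns the report as a Python dictionary. Building the request separately from sending it makes it easy to check the URL and header without touching the network.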

Contagio Mobile Malware Mini Dump

Description: aka "take a sample, leave a sample". Contagio mobile mini-dump is a part of contagiodump.blogspot.com. Contagio mobile mini-dump offers an upload dropbox for you to share your mobile malware samples. You can also download any samples individually or in one zip.

Sample Volume: N/A

Collected Time: up to date

HomePage: http://contagiominidump.blogspot.hk/

Way to get: free to download from the Contagio blog. You can also download the samples from this link: http://contagiomobile.deependresearch.org/index.html However, the packages need a password to decompress; email the blogger to get it.
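Once you have the password, unpacking can be scripted. The sketch below uses Python's standard `zipfile` module and assumes the archive uses legacy ZIP encryption, which is the only scheme `zipfile` can decrypt; an AES-encrypted archive would need an external tool such as 7-Zip. The function name `extract_samples` and its parameters are hypothetical. Only unpack malware inside an isolated analysis environment.

```python
import zipfile
from pathlib import Path


def extract_samples(archive, dest, password):
    """Extract a password-protected ZIP into dest and return the file names inside it."""
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        # zipfile expects the password as bytes; it handles legacy ZIP encryption only.
        zf.extractall(path=dest_path, pwd=password.encode("utf-8"))
        return [info.filename for info in zf.infolist() if not info.is_dir()]
```

A wrong password makes `extractall` raise `RuntimeError`, so a caller can catch that and ask for the password again.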

Koodous

Description: Koodous is a collaborative platform that combines the power of online analysis tools with social interactions between the analysts over a vast APKs repository.

Sample Volume: N/A

Collected Time: up to date

HomePage: https://koodous.com/

Way to get: register and download manually, or use the API.

The Drebin Dataset

Description: The dataset contains 5,560 applications from 179 different malware families. The samples have been collected in the period of August 2010 to October 2012 and were made available to us by the MobileSandbox project.

Sample Volume: 5,560 applications from 179 different malware families

Collected Time: 2010.8 - 2012.10

Papers:

  1. Daniel Arp, Michael Spreitzenbarth, Malte Huebner, Hugo Gascon, and Konrad Rieck, "Drebin: Efficient and Explainable Detection of Android Malware in Your Pocket", 21st Annual Network and Distributed System Security Symposium (NDSS), February 2014
  2. Michael Spreitzenbarth, Florian Echtler, Thomas Schreck, Felix C. Freiling, Johannes Hoffmann, "MobileSandbox: Looking Deeper into Android Applications", 28th International ACM Symposium on Applied Computing (SAC), March 2013

HomePage: https://www.sec.cs.tu-bs.de/~danarp/drebin/index.html

Way to get: request by email (see the HomePage).

Android Malware Genome Project

(2015/12/21) Due to limited resources and the situation that students involving in this project have graduated, we decide to stop the efforts of malware dataset sharing.

Description: In this project, we focus on the Android platform and aim to systematize or characterize existing Android malware. Particularly, with more than one year effort, we have managed to collect more than 1,200 malware samples that cover the majority of existing Android malware families, ranging from their debut in August 2010 to recent ones in October 2011.

Sample Volume: more than 1,200

Collected Time: 2010.8 - 2011.10

Papers:

Yajin Zhou, Xuxian Jiang, Dissecting Android Malware: Characterization and Evolution. Proceedings of the 33rd IEEE Symposium on Security and Privacy (Oakland 2012). San Francisco, CA, May 2012

HomePage: http://www.malgenomeproject.org/

Way to get: ask someone who has already obtained this dataset. The project homepage lists the universities, research labs and companies that have received it.

Kharon Malware Dataset

Description: The Kharon dataset is a collection of malware totally reversed and documented. This dataset has been constructed to help us to evaluate our research experiments. Its construction has required a huge amount of work to understand the malicious code, trigger it and then construct the documentation. This dataset is now available for research purposes; we hope it will help you to lead your own experiments.

Papers: CIDRE, EPI. Kharon dataset: Android malware under a microscope. Learning from Authoritative Security Experiment Results (2016): 1.

Homepage: http://kharon.gforge.inria.fr/dataset/

AMD Project

Description: AMD contains 24,553 samples, categorized in 135 varieties among 71 malware families ranging from 2010 to 2016. The dataset provides an up-to-date picture of the current landscape of Android malware, and is publicly shared with the community.

Sample Volume: 24,553 samples

Collected Time: 2010 to 2016

Papers:

  1. Li Y, Jang J, Hu X, et al. Android malware clustering through malicious payload mining[C]//International Symposium on Research in Attacks, Intrusions, and Defenses. Springer, Cham, 2017: 192-214.
  2. Wei F, Li Y, Roy S, et al. Deep Ground Truth Analysis of Current Android Malware[C]//International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, Cham, 2017: 252-276.

Homepage: http://amd.arguslab.org

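For more material on Android malware classification, see my GitHub. The project is DroidCC, which collects Android malware detection tools, recent references, third-party app markets, and other resources.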

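If you only want the malware samples, please contact me by email where possible and state your institution and personal identity. Requests that do not state an identity will not be answered.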
