Basic Information

  • Title: A Correlation Study between Automated Program Repair and Test-Suite Metrics
  • Authors: Jooyong Yi, Shin Hwei Tan, Sergey Mechtaev, Marcel Böhme, Abhik Roychoudhury
  • Publication: EMSE'17
  • Conclusion: In general, as traditional test-suite metrics increase, the reliability of repairs tends to increase. This trend is most strongly observed for statement coverage. The results imply that the traditional test-suite metrics proposed for software testing can also be used in automated program repair to improve the reliability of repairs.

Interesting Points

Correlation between Mutation Testing and Automated Program Repair:

To some extent, automated program repair and mutation testing are very similar. Automated program repair can be viewed as “mutating” the original program, this time in an attempt to find a repair. As in mutation testing, mutants that fail at least one test in the provided test-suite are considered buggy (hence, incorrect repairs). This conceptual similarity suggests that the mutation score can plausibly be used to measure the quality of a test-suite not only for mutation testing but also for automated program repair. Just as a higher mutation score is associated with a better fault-detection ability in mutation testing, it appears plausible to associate a higher mutation score with a better ability to guide a reliable repair.
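The analogy can be made concrete with a minimal sketch. It assumes a program is just a Python callable and a test is a predicate over a program; the names `mutation_score` and `plausible_repairs` and the toy mutants are illustrative and do not come from the paper. The mutation score is taken to be the usual ratio of killed mutants to all mutants, without filtering out equivalent mutants.

```python
from typing import Callable, List, Sequence

# Simplifying assumptions: a program maps an int to an int,
# and a test returns True when the given program passes it.
Program = Callable[[int], int]
Test = Callable[[Program], bool]


def is_killed(mutant: Program, tests: Sequence[Test]) -> bool:
    """A mutant is killed when at least one test fails on it."""
    return any(not test(mutant) for test in tests)


def mutation_score(mutants: Sequence[Program], tests: Sequence[Test]) -> float:
    """Fraction of mutants killed by the test-suite."""
    if not mutants:
        raise ValueError("at least one mutant is required")
    killed = sum(1 for mutant in mutants if is_killed(mutant, tests))
    return killed / len(mutants)


def plausible_repairs(candidates: Sequence[Program], tests: Sequence[Test]) -> List[Program]:
    """Repair view of the same check: keep candidates that pass every test."""
    return [c for c in candidates if all(test(c) for test in tests)]


# Toy test-suite for an absolute-value function (hypothetical example).
TESTS: List[Test] = [
    lambda p: p(3) == 3,
    lambda p: p(-2) == 2,
]

# Toy variants of the program; the same list serves as "mutants" and as "repair candidates".
MUTANTS: List[Program] = [
    lambda x: x,                      # fails on negative input
    lambda x: -x,                     # fails on positive input
    lambda x: abs(x),                 # behaves like the intended program
    lambda x: x if x > -1 else -x,    # passes both tests
]

score = mutation_score(MUTANTS, TESTS)
survivors = plausible_repairs(MUTANTS, TESTS)
```

The same predicate drives both views: a variant that survives every test counts against the mutation score, yet it is exactly what a test-driven repair tool would accept as a repair.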

There is not only similarity but also duality between mutation testing and automated program repair. As pointed out by Weimer et al. (2013), “our confidence in mutant testing increases with the set of non-redundant mutants considered, but our confidence in the quality of a program repair gains increases with the set of non-redundant tests.” Note that the mutation score measures the non-redundancy of killed mutants, not the non-redundancy of tests capable of killing mutants. To capture the latter, the authors introduce a new metric called capable-tests ratio, which measures the non-redundancy of capable tests.

Measuring the Quality of Test-Suites and Repairs

This paper mainly explores the correlation between the quality of automated program repair (APR) and the quality of the test-suite.

To evaluate test-suite quality, four traditional metrics (statement coverage, branch coverage, test-suite size, and mutation score) and the capable-tests ratio are used. On the repair side, the research questions below look at the regression ratio of the generated repairs, repairability, and repair time.
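As a rough illustration of two of these metrics, the sketch below computes statement coverage and test-suite size from per-test execution traces. The trace format (a mapping from test name to the set of executed statement ids) and the function name `coverage_ratio` are assumptions made for this example; the paper's own tooling is not described here. Branch coverage could be computed the same way by passing branch ids instead of statement ids.

```python
from typing import Dict, Set


def coverage_ratio(executed_by_test: Dict[str, Set[int]], all_items: Set[int]) -> float:
    """Fraction of items (statements or branches) executed by at least one test."""
    if not all_items:
        raise ValueError("the program must contain at least one item")
    covered: Set[int] = set()
    for executed in executed_by_test.values():
        covered |= executed
    # Ignore ids that do not belong to the program under test.
    return len(covered & all_items) / len(all_items)


# Hypothetical program with 10 statements and a two-test suite.
ALL_STATEMENTS = set(range(1, 11))
TRACES = {
    "t1": {1, 2, 3, 4},
    "t2": {1, 2, 5, 6, 7},
}

statement_coverage = coverage_ratio(TRACES, ALL_STATEMENTS)
test_suite_size = len(TRACES)
```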

RQs and Results

RQ1: Is there a negative correlation between the metrics of a test-suite and the regression ratio of automatically generated repairs? In other words, are generated repairs less likely to cause regressions as test-suite metrics increase?

As the traditional test-suite metrics (statement coverage, branch coverage, test-suite size, and mutation score) increase, the regression ratio of automatically generated repairs generally decreases, showing the promise of using the traditional test-suite metrics to control the regression ratio of automatically generated repairs. Capable-tests ratio does not seem as useful as the traditional metrics in controlling the quality of generated repairs.
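A negative correlation of this kind can be checked with a rank correlation coefficient. The sketch below uses Spearman's rho as an assumption; the paper may use a different coefficient. It also assumes the regression ratio is the fraction of held-out regression tests that fail on the repaired program. All numbers are made up for illustration and are not results from the paper.

```python
from math import sqrt
from typing import List, Sequence


def regression_ratio(failed_regression_tests: int, total_regression_tests: int) -> float:
    """Assumed definition: share of held-out regression tests failing on the repair."""
    if total_regression_tests <= 0:
        raise ValueError("total_regression_tests must be positive")
    return failed_regression_tests / total_regression_tests


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equally long samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up data: one entry per test-suite used to guide a repair.
coverage = [0.2, 0.4, 0.6, 0.8]
regressions = [
    regression_ratio(10, 20),
    regression_ratio(6, 20),
    regression_ratio(7, 20),
    regression_ratio(2, 20),
]

rho = spearman(coverage, regressions)
```

A value of `rho` below zero means that repairs guided by higher-coverage test-suites tended to break fewer regression tests in this toy sample.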

RQ2: Which test-suite metric is most strongly correlated with the regression ratio of automatically generated repairs?

In our experiments, statement coverage is, on average, more strongly correlated with regression ratio than the other metrics we investigate. Our results suggest that, to reduce the regression ratio, increasing statement coverage is more promising than improving the other test-suite metrics.

RQ3: Is there a negative correlation between the metrics of a test-suite and the repairability of automated program repair? In other words, should repairability be sacrificed in an attempt to obtain a higher-quality repair via a higher-quality test-suite?

Our experimental results are inconclusive about the correlation between test-suites and repairability. However, we note that increasing test-suite metrics does not always decrease repairability. In some subjects, positive correlations were observed between test-suite metrics and repairability, indicating that as the test-suite metrics increase, repairability tends to increase.

RQ4: Is there a negative correlation between the metrics of a test-suite and repair time? In other words, should more time be spent in an attempt to obtain a higher-quality repair via a higher-quality test-suite?

Our experimental results are inconclusive about the correlation between test-suites and repair time. However, we note that increasing test-suite metrics does not always increase repair time. In some subjects, negative correlations were observed between test-suite metrics and repair time, indicating that as the test-suite metrics increase, repair time tends to decrease.

Different Repair Algorithm: SEMFIX

Our experimental results from SEMFIX generally coincide with our findings from the GENPROG experiment, despite the differences in repair algorithms and fault localization techniques. The traditional test-suite metrics are, overall, negatively correlated with regression ratio, similar to our GENPROG experimental results. In particular, statement coverage is again shown to be most strongly correlated with regression ratio.
