How should the alignment results for RNA-seq data be interpreted? Many people ask this online, so here is a rough summary.

HISAT2 and Bowtie 2 print an alignment summary in the same format after a run, as shown below:

Alignment summary

When HISAT2 finishes running, it prints messages summarizing what happened. These messages are printed to the "standard error" ("stderr") filehandle. For datasets consisting of unpaired reads, the summary might look like this:

The result for single-end data looks like this:

    20000 reads; of these:
      20000 (100.00%) were unpaired; of these:
        1247 (6.24%) aligned 0 times
        18739 (93.69%) aligned exactly 1 time
        14 (0.07%) aligned >1 times
    93.77% overall alignment rate
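
As a quick check on the last line, the overall rate for unpaired reads is simply the share of reads with at least one alignment. A minimal Python sketch (the function name is made up for illustration; the numbers come from the example above):

```python
def single_end_alignment_rate(total, aligned_once, aligned_multi):
    """Percentage of unpaired reads that aligned at least once."""
    return 100.0 * (aligned_once + aligned_multi) / total


# Counts copied from the unpaired example above.
rate = single_end_alignment_rate(20000, 18739, 14)
print(f"{rate:.3f}%")
```

The result is 93.765%, which the summary reports as 93.77%.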

For datasets consisting of pairs, the summary might look like this:

The result for paired-end data looks like this:

    10000 reads; of these:
      10000 (100.00%) were paired; of these:
        650 (6.50%) aligned concordantly 0 times
        8823 (88.23%) aligned concordantly exactly 1 time
        527 (5.27%) aligned concordantly >1 times
        ----
        650 pairs aligned concordantly 0 times; of these:
          34 (5.23%) aligned discordantly 1 time
        ----
        616 pairs aligned 0 times concordantly or discordantly; of these:
          1232 mates make up the pairs; of these:
            660 (53.57%) aligned 0 times
            571 (46.35%) aligned exactly 1 time
            1 (0.08%) aligned >1 times
    96.70% overall alignment rate

(Copied from the HISAT2 website.)
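
Each "of these" line in the paired-end summary subdivides the count on the line above it. The small sanity check below makes that three-part structure explicit. It is only a sketch: the function and argument names are hypothetical, and the numbers are the ones from the paired-end example above.

```python
def check_paired_summary(total_pairs, conc_0, conc_1, conc_multi,
                         disc_1, neither, mates_total,
                         mate_0, mate_1, mate_multi):
    """Return True if the counts of a paired-end summary nest consistently."""
    return all([
        # Part 1: every pair aligns concordantly 0, exactly 1, or >1 times.
        conc_0 + conc_1 + conc_multi == total_pairs,
        # Part 2: non-concordant pairs minus discordant pairs leaves the rest.
        conc_0 - disc_1 == neither,
        # Part 3: counts switch from pairs to mates (two mates per pair).
        2 * neither == mates_total,
        mate_0 + mate_1 + mate_multi == mates_total,
    ])


# Counts copied from the paired-end example above.
ok = check_paired_summary(10000, 650, 8823, 527, 34, 616, 1232, 660, 571, 1)
print(ok)
```

Note that the first two parts count pairs, while the third counts individual mates, which is why 616 pairs become 1232 mates.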

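There is not much to say about the single-end case, so let's focus on paired-end alignment.

What does "650 (6.50%) aligned concordantly 0 times" mean? The first part of the summary describes concordant alignments in paired-end mode. "Aligned concordantly" means that read1 and read2 both aligned to the genome/transcriptome, and did so in a way that is plausible for a pair.

In "8823 (88.23%) aligned concordantly exactly 1 time", "exactly 1 time" means there is only one such alignment. ">1 times" means read1 and read2 can be aligned together at more than one location.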

The second part covers discordant alignments in paired-end mode.

650 pairs aligned concordantly 0 times; of these: 34 (5.23%) aligned discordantly 1 time

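"Concordantly 0 times" means read1 and read2 could not both be aligned to the genome/transcriptome in a way that is plausible for a pair. "Aligned discordantly 1 time" is the hardest part to understand.

Here is how the Bowtie 2 manual explains discordant alignment: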

Concordant pairs match pair expectations, discordant pairs don’t

A pair that aligns with the expected relative mate orientation and with the expected range of distances between mates is said to align “concordantly”. If both mates have unique alignments, but the alignments do not match paired-end expectations (i.e. the mates aren’t in the expected relative orientation, or aren’t within the expected distance range, or both), the pair is said to align “discordantly”. Discordant alignments may be of particular interest, for instance, when seeking structural variants.

The expected relative orientation of the mates is set using the --ff, --fr, or --rf options. The expected range of inter-mates distances (as measured from the furthest extremes of the mates; also called “outer distance”) is set with the -I and -X options. Note that setting -I and -X far apart makes Bowtie 2 slower. See documentation for -I and -X.

To declare that a pair aligns discordantly, Bowtie 2 requires that both mates align uniquely. This is a conservative threshold, but this is often desirable when seeking structural variants.

By default, Bowtie 2 searches for both concordant and discordant alignments, though searching for discordant alignments can be disabled with the --no-discordant option.

Mixed mode: paired where possible, unpaired otherwise

If Bowtie 2 cannot find a paired-end alignment for a pair, by default it will go on to look for unpaired alignments for the constituent mates. This is called “mixed mode.” To disable mixed mode, set the --no-mixed option.

Bowtie 2 runs a little faster in --no-mixed mode, but will only consider alignment status of pairs per se, not individual mates.

Key point: If both mates have unique alignments, but the alignments do not match paired-end expectations (i.e. the mates aren’t in the expected relative orientation, or aren’t within the expected distance range, or both), the pair is said to align “discordantly”.

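A discordant alignment means that read1 and read2 both align, but not in a way that is plausible for a pair, for one or both of these reasons:

1. The orientation is wrong: in paired-end sequencing the relative orientation of the two mates is fixed.

2. The distance is wrong: the insert between read1 and read2 has a limited length, so the mates cannot lie arbitrarily far apart.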
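
The two conditions above can be sketched as a small classifier. This is a simplified illustration, not how Bowtie 2 or HISAT2 implement it: the `MateAlignment` class and `classify_pair` function are hypothetical, both mates are assumed to lie on the same reference sequence, the orientation is assumed to be --fr, and the distance limits default to 0 and 500 to mirror the documented defaults of -I and -X. It ignores the uniqueness requirement and special cases such as mates that overlap or contain each other.

```python
from dataclasses import dataclass


@dataclass
class MateAlignment:
    start: int   # 0-based leftmost reference position of the mate
    end: int     # exclusive end position on the reference
    strand: str  # "+" or "-"


def classify_pair(m1, m2, min_frag=0, max_frag=500):
    """Label two aligned mates as 'concordant' or 'discordant' (--fr assumed)."""
    plus = [m for m in (m1, m2) if m.strand == "+"]
    minus = [m for m in (m1, m2) if m.strand == "-"]
    # --fr: one mate per strand, with the forward mate upstream of the reverse mate.
    orientation_ok = (
        len(plus) == 1 and len(minus) == 1 and plus[0].start <= minus[0].start
    )
    # Outer distance: measured between the furthest extremes of the two mates.
    outer = max(m1.end, m2.end) - min(m1.start, m2.start)
    distance_ok = min_frag <= outer <= max_frag
    return "concordant" if orientation_ok and distance_ok else "discordant"


print(classify_pair(MateAlignment(100, 200, "+"), MateAlignment(300, 400, "-")))
print(classify_pair(MateAlignment(100, 200, "+"), MateAlignment(5000, 5100, "-")))
print(classify_pair(MateAlignment(100, 200, "+"), MateAlignment(300, 400, "+")))
```

The first pair satisfies both conditions, the second is too far apart, and the third has both mates on the same strand, so only the first is labelled concordant.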

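The third part is the alignment, in single-end mode, of the mates from the remaining pairs (those that aligned neither concordantly nor discordantly 1 time). There is not much to say about it.

The interpretation above comes from SEQanswers, which discusses the following example: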

16182999 reads; of these:
16182999 (100.00%) were paired; of these:
5731231 (35.42%) aligned concordantly 0 times
4522376 (27.95%) aligned concordantly exactly 1 time
5929392 (36.64%) aligned concordantly >1 times
----
5731231 pairs aligned concordantly 0 times; of these:
2381431 (41.55%) aligned discordantly 1 time
----
3349800 pairs aligned 0 times concordantly or discordantly; of these:
6699600 mates make up the pairs; of these:
3814736 (56.94%) aligned 0 times
1883429 (28.11%) aligned exactly 1 time
1001435 (14.95%) aligned >1 times
88.21% overall alignment rate

The Bowtie 2 result summary is divided into 3 sections:
Concordant alignment - In your data, (4522376 + 5929392) = 10451768 pairs align concordantly, which is 64.59% of all pairs.
Discordant alignment - That leaves 5731231 pairs, or 35.41% (100 - 64.59). Of these, 2381431 pairs align discordantly, which is 41.55% of the non-concordant fraction.
The rest - Remember that concordant and discordant alignments are both made in paired-end mode. The remaining pairs either align as singles (i.e. Read1 at one locus and Read2 at a completely different locus, or one mate aligned and the other unaligned) or do not align at all. The number of pairs in this section is Total - (Concord. + Discord.), that is 16182999 - (10451768 + 2381431) = 3349800 pairs.
Since any alignment here is of single mates, this section is counted in mates (pairs x 2).

Now to reach the overall alignment, count the mates in total (i.e. mates aligned in paired and mates aligned in single fashion). That would be -
(10451768 x2)+(2381431 x2)+1883429+1001435 = 28551262 mates
That is 28551262 mates aligned of total (16182999 x2) mates, which is 88.21%.
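
The same arithmetic can be reproduced straight from the summary text. The sketch below is one possible way to do it, assuming the paired-end wording shown in the examples above. The `overall_alignment_rate` function and its regular expressions are illustrative and not part of Bowtie 2 or HISAT2.

```python
import re

SUMMARY = """\
16182999 reads; of these:
16182999 (100.00%) were paired; of these:
5731231 (35.42%) aligned concordantly 0 times
4522376 (27.95%) aligned concordantly exactly 1 time
5929392 (36.64%) aligned concordantly >1 times
----
5731231 pairs aligned concordantly 0 times; of these:
2381431 (41.55%) aligned discordantly 1 time
----
3349800 pairs aligned 0 times concordantly or discordantly; of these:
6699600 mates make up the pairs; of these:
3814736 (56.94%) aligned 0 times
1883429 (28.11%) aligned exactly 1 time
1001435 (14.95%) aligned >1 times
88.21% overall alignment rate
"""


def overall_alignment_rate(summary):
    """Recompute the overall alignment rate (%) from a paired-end summary."""
    def count(pattern):
        match = re.search(pattern, summary, flags=re.MULTILINE)
        if match is None:
            raise ValueError(f"pattern not found: {pattern}")
        return int(match.group(1))

    total_pairs = count(r"^\s*(\d+) reads; of these:")
    conc_1 = count(r"^\s*(\d+) \([\d.]+%\) aligned concordantly exactly 1 time")
    conc_multi = count(r"^\s*(\d+) \([\d.]+%\) aligned concordantly >1 times")
    disc_1 = count(r"^\s*(\d+) \([\d.]+%\) aligned discordantly 1 time")
    mate_1 = count(r"^\s*(\d+) \([\d.]+%\) aligned exactly 1 time")
    mate_multi = count(r"^\s*(\d+) \([\d.]+%\) aligned >1 times")

    # Pairs aligned in paired-end mode contribute two mates each;
    # the rest are counted mate by mate.
    aligned_mates = 2 * (conc_1 + conc_multi + disc_1) + mate_1 + mate_multi
    return 100.0 * aligned_mates / (2 * total_pairs)


rate = overall_alignment_rate(SUMMARY)
print(f"{rate:.2f}% overall alignment rate")
```

For this example the recomputed value matches the 88.21% printed on the last line of the summary.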

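Why distinguish so many alignment types? Wouldn't something simpler do? No.

Once you analyze a real project, you will see how important this alignment-type information is. I will cover that when I have time. See you next time.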
