最近接触的数据都是靶向测序,或者全外测序的数据。对数据的覆盖深度及靶向捕获效率的评估成为了数据质量监控中必不可少的一环。

  以前都是用samtools depth 算出单碱基的深度后,用perl来进行深度及捕获效率的计算。今天无意中看到了bamdst(https://github.com/shiquan/bamdst)这个软件,用起来也很方便,参考GitHub,在此记录使用方法。

  下载并安装:下载安装包并解压后,

cd ./bamdst-master
make

  安装好后,需要准备.bed文件及.bam文件,以软件提供的MT-RNR1.bed和test.bam为例:

./bamdst-master/bamdst -p ./bamdst-master/example/MT-RNR1.bed -o ./test ./bamdst-master/example/test.bam

  输出的结果包含7个文件,为:

-coverage.report
-cumu.plot
-insert.plot
-chromosome.report
-region.tsv.gz
-depth.tsv.gz
-uncover.bed

  其中coverage.report提供的信息很多,具体可参照如下:

 [Total] Raw Reads (All reads) // All reads in the bam file(s).
[Total] QC Fail reads // Reads number failed QC, this flag is marked by other software,like bwa. See flag in the bam structure.
[Total] Raw Data(Mb) // Total reads data in the bam file(s).
[Total] Paired Reads // Paired reads numbers.
[Total] Mapped Reads // Mapped reads numbers.
[Total] Fraction of Mapped Reads // Ratio of mapped reads against raw reads.
[Total] Mapped Data(Mb) // Mapped data in the bam file(s).
[Total] Fraction of Mapped Data(Mb) // Ratio of mapped data against raw data.
[Total] Properly paired // Paired reads with properly insert size. See bam format protocol for details.
[Total] Fraction of Properly paired // Ratio of properly paired reads against mapped reads
[Total] Read and mate paired // Read (read1) and mate read (read2) paired.
[Total] Fraction of Read and mate paired // Ratio of read and mate paired against mapped reads
[Total] Singletons // Read mapped but mate read unmapped, and vice versa.
[Total] Read and mate map to diff chr // Read and mate read mapped to different chromosome, usually because mapping error and structure variants.
[Total] Read1 // First reads in mate paired sequencing
[Total] Read2 // Mate reads
[Total] Read1(rmdup) // First reads after remove duplications.
[Total] Read2(rmdup) // Mate reads after remove duplications.
[Total] forward strand reads // Number of forward strand reads.
[Total] backward strand reads // Number of backward strand reads.
[Total] PCR duplicate reads // PCR duplications.
[Total] Fraction of PCR duplicate reads // Ratio of PCR duplications.
[Total] Map quality cutoff value // Cutoff map quality score, this value can be set by -q. default is 20, because some variants caller like GATK only consider high quality reads.
[Total] MapQuality above cutoff reads // Number of reads with higher or equal quality score than cutoff value.
[Total] Fraction of MapQ reads in all reads // Ratio of reads with higher or equal Q score against raw reads.
[Total] Fraction of MapQ reads in mapped reads // Ratio of reads with higher or equal Q score against mapped reads.
[Target] Target Reads // Number of reads covered target region (specified by bed file).
[Target] Fraction of Target Reads in all reads // Ratio of target reads against raw reads.
[Target] Fraction of Target Reads in mapped reads // Ratio of target reads against mapped reads.
[Target] Target Data(Mb) // Total bases covered target region. If a read covered target region partly, only the covered bases will be counted.
[Target] Target Data Rmdup(Mb) // Total bases covered target region after remove PCR duplications.
[Target] Fraction of Target Data in all data // Ratio of target bases against raw bases.
[Target] Fraction of Target Data in mapped data // Ratio of target bases against mapped bases.
[Target] Len of region // The length of target regions.
[Target] Average depth // Average depth of target regions. Calculated by "target bases / length of regions".
[Target] Average depth(rmdup) // Average depth of target regions after remove PCR duplications.
[Target] Coverage (>0x) // Ratio of bases with depth greater than 0x in target regions, which also means the ratio of covered regions in target regions.
[Target] Coverage (>=4x) // Ratio of bases with depth greater than or equal to 4x in target regions.
[Target] Coverage (>=10x) // Ratio of bases with depth greater than or equal to 10x in target regions.
[Target] Coverage (>=30x) // Ratio of bases with depth greater than or equal to 30x in target regions.
[Target] Coverage (>=100x) // Ratio of bases with depth greater than or equal to 100x in target regions.
[Target] Coverage (>=Nx) // This is addtional line for user self-defined cutoff value, see --cutoffdepth
[Target] Target Region Count // Number of target regions. In normal practise,it is the total number of exomes.
[Target] Region covered > 0x // The number of these regions with average depth greater than 0x.
[Target] Fraction Region covered > 0x // Ratio of these regions with average depth greater than 0x.
[Target] Fraction Region covered >= 4x // Ratio of these regions with average depth greater than or equal to 4x.
[Target] Fraction Region covered >= 10x // Ratio of these regions with average depth greater than or equal to 10x.
[Target] Fraction Region covered >= 30x // Ratio of these regions with average depth greater than or equal to 30x.
[Target] Fraction Region covered >= 100x // Ratio of these regions with average depth greater than or equal to 100x.
[flank] flank size // The flank size will be count. 200 bp in default. Oligos could also capture the nearby regions of target regions.
[flank] Len of region (not include target region) // The length of flank regions (target regions will not be count).
[flank] Average depth // Average depth of flank regions.
[flank] flank Reads // The total number of reads covered the flank regions. Note: some reads covered the edge of target regions, will be count in flank regions also.
[flank] Fraction of flank Reads in all reads // Ratio of reads covered in flank regions against raw reads.
[flank] Fraction of flank Reads in mapped reads // Ration of reads covered in flank regions against mapped reads.
[flank] flank Data(Mb) // Total bases in the flank regions.
[flank] Fraction of flank Data in all data // Ratio of total bases in the flank regions against raw data.
[flank] Fraction of flank Data in mapped data // Ratio of total bases in the flank regions against mapped data.
[flank] Coverage (>0x) // Ratio of flank bases with depth greater than 0x.
[flank] Coverage (>=4x) // Ratio of flank bases with depth greater than or equal to 4x.
[flank] Coverage (>=10x) // Ratio of flank bases with depth greater than or equal to 10x.
[flank] Coverage (>=30x) // Ratio of flank bases with depth greater than or equal to 30x.

  

  

bam文件测序深度统计-bamdst的更多相关文章

  1. SAMTOOLS使用 SAM BAM文件处理

    [怪毛匠子 整理] samtools学习及使用范例,以及官方文档详解 #第一步:把sam文件转换成bam文件,我们得到map.bam文件 system"samtools view -bS m ...

  2. Pysam 处理bam文件

    Pysam可用来处理bam文件 安装: 用 pip 或者 conda即可 使用: Pysam的函数有很多,主要的读取函数有: AlignmentFile:读取BAM/CRAM/SAM文件 Varian ...

  3. SAM/BAM文件处理

    当测序得到的fastq文件map到基因组之后,我们通常会得到一个sam或者bam为扩展名的文件.SAM的全称是sequence alignment/map format.而BAM就是SAM的二进制文件 ...

  4. bam文件softclip , hardclip ,markduplicate的探究

      测序产生的bam文件,有一些reads在cigar值里显示存在softclip,有一些存在hardclip,究竟softclip和hardclip是怎么判断出来的,还有是怎么标记duplicate ...

  5. C++使用htslib库读入和写出bam文件

      有时候我们需要使用C++处理bam文件,比如取出read1或者read2等符合特定条件的序列,根据cigar值对序列指定位置的碱基进行统计或者对序列进行处理并输出等,这时我们可以使用htslib库 ...

  6. 文件格式——Sam&bam文件

    Sam&bam文件 SAM是一种序列比对格式标准, 由sanger制定,是以TAB为分割符的文本格式.主要应用于测序序列mapping到基因组上的结果表示,当然也可以表示任意的多重比对结果.当 ...

  7. 测序深度和覆盖度(Sequencing depth and coverage)

    总是跑数据,却对数据一无所知,这说不过去吧. 看几篇文章吧 Sequencing depth and coverage: key considerations in genomic analyses( ...

  8. 键盘录入一个文件夹路径,统计该文件夹(包含子文件夹)中每种类型的文件及个数,注意:用文件类型(后缀名,不包含.(点),如:"java","txt")作为key, 用个数作为value,放入到map集合中,遍历map集合

    package cn.it.zuoye5; import java.io.File;import java.util.HashMap;import java.util.Iterator;import ...

  9. java基础 File与递归练习 使用文件过滤器筛选将指定文件夹下的小于200K的小文件获取并打印按层次打印(包括所有子文件夹的文件) 多层文件夹情况统计文件和文件夹的数量 统计已知类型的数量 未知类型的数量

    package com.swift.kuozhan; import java.io.File; import java.io.FileFilter; /*使用文件过滤器筛选将指定文件夹下的小于200K ...

随机推荐

  1. R 代码积累

    R 代码积累不定期更新 1.阶乘.递归.reduce.sprintf #NO.1 # 阶乘函数 fact <- function(n){ if(n==0) return(1) #基例在这 els ...

  2. 【洛谷P1159】排行榜

    排行榜 题目链接 看到题解中一个很巧妙的做法: 先确定SAME的位置, 将DOWN的按输入顺序从上往下输出 再将UP的接着从上往下输出 这样便可以保证DOWN的人名次一定下降了 UP的人名次一定上升了 ...

  3. iOS开发之widget实现

    前言     iOS extension的出现,方便了用户查看应用的服务,比如用户可以在Today的widgets中查看应用的简略信息,然后点击进入相关的应用界面.暂且不表网络上现有的widget文章 ...

  4. DOM操作指令整理

    DOM操作指令整理: (1) 创建新节点: createDocumentFragment() 创建一个DOM片段 creatElement() 创建一个具体的元素 creatTextNode() 创建 ...

  5. jzoj5195. 【NOIP2017提高组模拟7.3】A(递推,打表)

    Description

  6. 【PTA 天梯赛训练】词频统计(map+vector)

    请编写程序,对一段英文文本,统计其中所有不同单词的个数,以及词频最大的前10%的单词. 所谓“单词”,是指由不超过80个单词字符组成的连续字符串,但长度超过15的单词将只截取保留前15个单词字符.而合 ...

  7. Java中Lambda表达式的简单使用

    Lambda表达式是Java SE 8中一个重要的新特性.你可以把 Lambda表达式 理解为是一段可以传递的代码 (将代码像数据一样进行传递).可以写出更简洁.更灵活的代码.作为一种更紧凑的代码风格 ...

  8. java的动态验证码单线设计

    1.java的动态验证码我这里将介绍两种方法: 一:根据java本身提供的一种验证码的写法,这种呢只限于大家了解就可以了,因为java自带的模式编写的在实际开发中是没有意义的,所以只供学习一下就可以了 ...

  9. 使用CSS3制作首页登录界面实例

    响应式设计 在这个页面中,使用下面3点来完成响应式设计 1.最大宽度 .设定了一个 max-width 的最大宽度,以便在大屏幕时兼容.: 2.margin : 30px auto; 使其保持时刻居中 ...

  10. pyqt5通过qt designer 设计方式连接多个UI图形界面

    当我们通过pyqt开发时,eric6为我们提供了一个方便的工具:图形化的绘制UI工具--qtdesigner.我们可以通过它开发多个UI,然后利用信号-槽工具,将功能代码附着在上面.也可以将多个界面连 ...