软件地址:

http://www.htslib.org/

功能三大版块 :

Samtools
Reading/writing/editing/indexing/viewing SAM/BAM/CRAM format
BCFtools
Reading/writing BCF2/VCF/gVCF files and calling/filtering/summarising SNP and short indel sequence variants
HTSlib
A C library for reading/writing high-throughput sequencing data
cd samtools-1.x # and similarly for bcftools and htslib
./configure --prefix=/where/to/install
make
make install
export PATH=/where/to/install/bin:$PATH # for sh or bash users

Mapping

To prepare the reference for mapping you must first index   ( BWT:Burrows Wheeler Transform 索引算法,耗时较长,安排好时间)

bwa index <ref.fa>
bwa mem -R '@RG\tID:foo\tSM:bar\tLB:library1' <ref.fa> <read1.fa> <read1.fa> > lane.sam

Ensure that the @RG information here is correct as this information is used by later tools. The SM field must be set to the name of the sample being processed, and LB field to the library.

samtools fixmate -O bam <lane.sam> <lane_fixmate.bam>
samtools sort -O bam -o <lane_sorted.bam> -T </tmp/lane_temp> <lane_fixmate.sam>

In order to reduce the number of miscalls of INDELs in your data it is helpful to realign your raw gapped alignment with the Broad’s GATK Realigner.

java -Xmx2g -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R <ref.fa> -I <lane.bam> -o <lane.intervals> --known <bundle/b38/Mills1000G.b38.vcf>
java -Xmx4g -jar GenomeAnalysisTK.jar -T IndelRealigner -R <ref.fa> -I <lane.bam> -targetIntervals <lane.intervals> --known <bundle/b38/Mills1000G.b38.vcf> -o <lane_realigned.bam>

BQSR from the Broad’s GATK allows you to reduce the effects of analysis artefacts produced by your sequencing machines. It does this in two steps, the first analyses your data to detect covariates and the second compensates for those covariates by adjusting quality scores.

java -Xmx4g -jar GenomeAnalysisTK.jar -T BaseRecalibrator -R <ref.fa> -knownSites >bundle/b38/dbsnp_142.b38.vcf> -I <lane.bam> -o <lane_recal.table>
java -Xmx2g -jar GenomeAnalysisTK.jar -T PrintReads -R <ref.fa> -I <lane.bam> --BSQR <lane_recal.table> -o <lane_recal.bam>

It is helpful at this point to compile all of the reads from each library together into one BAM, which can be done at the same time as marking PCR and optical duplicates. To identify duplicates we currently recommend the use of either the Picard or biobambam’s mark duplicates tool.

java -Xmx2g -jar MarkDuplicates.jar VALIDATION_STRINGENCY=LENIENT INPUT=<lane_1.bam> INPUT=<lane_2.bam> INPUT=<lane_3.bam> OUTPUT=<library.bam>

Once this is done you can perform another merge step to produce your sample BAM files.

samtools merge <sample.bam> <library1.bam> <library2.bam> <library3.bam>
samtools index <sample.bam>

Variant Calling

To convert your BAM file into genomic positions we first use mpileup to produce a BCF file that contains all of the locations in the genome.

bcftools mpileup -Ou -f <ref.fa> <sample1.bam> <sample2.bam> <sample3.bam> | bcftools call -vmO z -o <study.vcf.gz>

To prepare our VCF for querying we next index it using tabix:

tabix -p vcf <study.vcf.gz>

prepare graphs and statistics to assist you in filtering your variants:

bcftools stats -F <ref.fa> -s - <study.vcf.gz> > <study.vcf.gz.stats>
mkdir plots
plot-vcfstats -p plots/ <study.vcf.gz.stats>

filter data

bcftools filter -O z -o <study_filtered..vcf.gz> -s LOWQUAL -i'%QUAL>10' <study.vcf.gz>

Variant filtration is a subject worthy of an article in itself and the exact filters you will need to use will depend on the purpose of your study and quality and depth of the data used to call the variants.

注: 变异过滤要服从具体研究目的以及数据的质量和深度。

samtools 工具的更多相关文章

  1. the grave of my scripts

    不定期更新.......... 1,fetch_seq.py https://github.com/freemao/AHRD/blob/master/fetch_seq.py 提取出你想要得染色体的某 ...

  2. snakemake使用小结

    首先在linux 里配置conda 下载 wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-5.3.1-Linu ...

  3. SAM格式 及 比对工具之 samtools 使用方法

    参考资料: SAMtools(官网) SAM Spec v1.4 (SAM格式 说明书) (重要) samtools-1.3.1 使用手册 (SAMtools软件说明书) samtools常用命令详解 ...

  4. samtools常用命令详解

    samtools的说明文档:http://samtools.sourceforge.net/samtools.shtmlsamtools是一个用于操作sam和bam文件的工具合集.包含有许多命令.以下 ...

  5. 安装生物信息学软件-Samtools

    装完Bowtie2,官方文档给出的栗子说可以玩一玩samtools,所以我入个坑 参考这篇http://m.010lm.com/roll/2016/0620/2343389.html Step 1: ...

  6. samtools常用命令详解(转)

    转自:samtools常用命令详解 samtools的说明文档:http://samtools.sourceforge.net/samtools.shtml samtools是一个用于操作sam和ba ...

  7. 可视化工具之 IGV 使用方法

    整合基因组浏览器(IGV)是一种高性能的可视化工具,用来交互式地探索大型综合基因组数据.它支持各种数据类型,包括array-based的和下一代测序的数据和基因注释. IGV这个工具很牛,发了NB: ...

  8. 推荐一个SAM文件或者bam文件中flag含义解释工具

    SAM是Sequence Alignment/Map 的缩写.像bwa等软件序列比对结果都会输出这样的文件.samtools网站上有专门的文档介绍SAM文件.具体地址:http://samtools. ...

  9. SAMTOOLS使用 SAM BAM文件处理

    [怪毛匠子 整理] samtools学习及使用范例,以及官方文档详解 #第一步:把sam文件转换成bam文件,我们得到map.bam文件 system"samtools view -bS m ...

随机推荐

  1. 1.vue和react的区别

    1.个人感觉Vue好用,react不咋地呀. 2.(网上搜的)Vue的解决方案适用于小型应用,但对于对于大型应用而言不太适合.

  2. leetcode227

    class Solution { public: stack<int> OPD; stack<char> OPR; int calculate(int num1, int nu ...

  3. 客户端如何连接 DataSnap Server 调用服务的方法

    一般http访问的地址是 http://localhost:8099/datasnap/rest/TServerMethods1/EchoString/abc 一.用FDConnection1连接Da ...

  4. 9 个Java 异常处理的规则

    在 Java 中,异常处理是个很麻烦的事情.初学者觉得它很难理解,甚至是经验丰富的开发者也要花费很长时间决定异常是要处理掉和抛出. 所以很多开发团队约定一些原则处理异常.如果你是一个团队的新成员,你可 ...

  5. TEXT 5 Stuff of dreams

    TEXT 5 Stuff of dreams 梦想的精粹 Feb 16th 2006 | CORK AND LONDON From The Economist print edition (译者注:本 ...

  6. spring 整合 struts2 xml配置

    整合之前要搞清楚struts2是什么; struts2:表现层框架  增删改查  作用域  页面跳转   异常处理  ajax 上传下载  excel   调用service spring :IOC/ ...

  7. vmware虚拟机桥接模式不能上网

    方法/步骤     首先我的主机的有线连接是正常的,如下:   但是我的虚拟机的网络连接模式为桥接模式,但是却上不了网,如下:   我们来确认下,我的虚拟机的网络模式,如下:   设置全部都是对的,但 ...

  8. java并发:CAS算法和ABA问题

    CAS算法是硬件对于并发的支持,针对多处理器操作而设计的处理器中的一种特殊指令. CAS用于管理对共享数据的并发访问. java的并发包中,AQS.原子操作类等都是基于CAS实现的. CAS 是一种 ...

  9. 早上突然看明白 shader和材质球的关系

    计算机的世界不外乎 指令+数据 shader即Gpu指令,材质即数据

  10. python模块之time模块

    import time #从1970年1月1号凌晨开始到现在的秒数,是因为这一年unix的第一个商业版本上市了,这个最常用# print(time.time()) # 1491574950.23983 ...