Haibao recommends this assembly software:

http://www.broadinstitute.org/software/discovar/blog/?page_id=98

DISCOVAR (variant caller): suited to calling variants and assembling small genomes.

DISCOVAR de novo: suited to assembling large genomes.

Download:

ftp://ftp.broadinstitute.org/pub/crd/DiscovarDeNovo/latest_source_code/LATEST_VERSION.tar.gz

Installation:

General Instructions for Building Our Software

System Requirements

Software released from the CRD group at the Broad Institute is built and tested on a modern version of Linux for the x86_64 architecture. Our software does not run on 32-bit machines: you must have a 64-bit Linux system. Our users have successfully built and executed our software using a variety of Linux distributions including Ubuntu, RedHat, and SUSE. We expect that any flavor of x86_64 Linux will work fine, as long as it provides the necessary software prerequisites, as described below.

Basic Compiler and Library Requirements

We rely on reasonably up-to-date versions of these software packages:

  • GCC, with its associated g++ compiler for the C++ language, version 4.7 or later.  We're now using C++11 features, and require a modern GCC.
  • For ARACHNE only, the Xerces-C library. The source can be downloaded from the Xerces-C download page. Supply the argument --with-xerces=/path/to/xercesc/installation when you run ./configure.
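
The GCC 4.7 requirement above can be checked before starting a build. The following is a minimal sketch, not part of the Broad instructions: the helper name `gcc_version_ok` is made up for illustration, and it assumes the version string has the form major[.minor[.patch]], as printed by `g++ -dumpversion`.

```shell
# Hypothetical helper: succeed when version $1 (major[.minor[.patch]]) is >= 4.7.
gcc_version_ok() {
    major=${1%%.*}          # text before the first dot
    rest=${1#*.}            # text after the first dot (unchanged if there is no dot)
    if [ "$rest" = "$1" ]; then
        minor=0             # bare major version such as "11"
    else
        minor=${rest%%.*}
    fi
    [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 7 ]; }
}

# Only query the compiler when one is actually on the PATH.
if command -v g++ >/dev/null 2>&1; then
    if gcc_version_ok "$(g++ -dumpversion)"; then
        echo "g++ meets the 4.7 requirement"
    else
        echo "g++ is older than 4.7" >&2
    fi
fi
```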

The Build Procedure

  • Move the package that you downloaded from our FTP site to a location on your system where you'd like to build the software.
  • Extract the contents by typing: tar xzf package.tgz
  • You'll have a new subdirectory in the current directory named after the package and revision. Enter it by typing: cd package
    Some older releases spill all the source into the current directory, rather than creating a new subdirectory. If that's the case, ignore this step.
  • Execute the configuration script by typing: ./configure
    This assumes that you can copy executables to /usr/local/bin. If you cannot, you should instead execute: ./configure --prefix=/path/to/install/directory
    Note: Some older packages lack a configuration script. Consider returning to the FTP site for a more recent version, or just skip this step.
  • Build the software by typing: make all
  • Install the software by typing: make install
  • The executables will be in /usr/local/bin or in /path/to/install/directory/bin. Make sure this directory is on your PATH.

If this goes well, you're ready to go. Consult the manual for the package to learn how to set up your data and what programs to execute.
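
The build procedure above could be collected into one helper, roughly as sketched here. This is an illustration only: the function name `build_package` and its two arguments are hypothetical, and it assumes the tarball either unpacks into a single top-level subdirectory or spills into the current directory, as the instructions describe.

```shell
# Hypothetical helper: build_package TARBALL PREFIX
build_package() {
    tarball=$1
    prefix=$2
    # Guess the top-level directory from the first archive entry.
    top=$(tar tzf "$tarball" 2>/dev/null | head -n 1 | sed 's|^\./||' | cut -d/ -f1)
    tar xzf "$tarball" || return 1
    (
        # Older releases unpack into the current directory; only cd if a subdirectory exists.
        if [ -n "$top" ] && [ -d "$top" ]; then
            cd "$top" || exit 1
        fi
        # Some older packages lack a configure script; skip that step for them.
        if [ -x ./configure ]; then
            ./configure --prefix="$prefix" || exit 1
        fi
        make all && make install
    )
}

# Example call (not executed here):
#   build_package LATEST_VERSION.tar.gz "$HOME/discovar"
#   export PATH="$HOME/discovar/bin:$PATH"
```

Installing under a prefix in the home directory avoids the need for write access to /usr/local/bin.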

Data requirements:

Sequencing data requirements summary
● Illumina MiSeq or HiSeq 2500 genome sequencers
● PCR-free library preparation
● 250 base paired end reads (or longer)
● ~450 base pair fragment size
● ~60x coverage
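
As a rough check on the ~60x figure, the number of read pairs needed can be estimated. This sketch assumes the usual back-of-the-envelope definition, coverage = (read pairs × 2 × read length) / genome size, which is not spelled out in the requirements above; the helper name `pairs_needed` is made up for illustration.

```shell
# Hypothetical helper: pairs_needed GENOME_SIZE_BP COVERAGE READ_LENGTH_BP
# Prints the number of read pairs giving the requested coverage.
pairs_needed() {
    echo $(( $1 * $2 / (2 * $3) ))
}

# A 50 Mb genome at 60x with 250-base paired-end reads:
pairs_needed 50000000 60 250    # prints 6000000
```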

Input files
DISCOVAR requires a BAM file containing the raw reads from the sequencer. For variant calling it also
requires a matching reference FASTA file.

Variant calling command:

DISCOVAR can currently generate variants for small regions, and not the entire genome at once. To
generate variants for a 100 kb region for example, use:
Discovar \
READS=reads.bam \
OUT_HEAD=assembly \
REGIONS=1:50000-150000 \
REFERENCE=genome.fasta
The complete set of variant calls for this region is given in the text file:
assembly.final.variant
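
Because variants are generated one small region at a time, a longer chromosome has to be split into windows first. The sketch below only generates the region strings. It assumes REGIONS takes the form chromosome:start-end (for example 1:50000-150000); the helper name `region_windows` is hypothetical, and the exact coordinate convention expected by DISCOVAR should be checked in its manual.

```shell
# Hypothetical helper: region_windows CHROM LENGTH WINDOW_SIZE
# Prints one CHROM:START-END string per window, clipping the last one.
region_windows() {
    chrom=$1
    length=$2
    size=$3
    start=0
    while [ "$start" -lt "$length" ]; do
        end=$((start + size))
        if [ "$end" -gt "$length" ]; then
            end=$length
        fi
        echo "$chrom:$start-$end"
        start=$end
    done
}

region_windows 1 250000 100000
# prints:
# 1:0-100000
# 1:100000-200000
# 1:200000-250000
```

Each printed string could then be passed as REGIONS= in a separate Discovar run, with a different OUT_HEAD per window so that outputs do not overwrite each other.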

Input files

BAM files
The reads to assemble must be in a BAM file or files. The name of the BAM file is specified with the
required argument READS:
READS=filename
Multiple BAM files are specified using a comma-separated list:
READS=filename1,filename2,...
Alternatively, the BAM files can be specified in a separate file containing a list of BAM filenames, one per
line:
READS=@listfilename

DISCOVAR calls SAMtools internally to extract reads from the BAM.
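
The list-file form of READS can be prepared as in this small sketch. The file names lane1.bam, lane2.bam and bam_list.txt are placeholders invented for the example.

```shell
# Write one BAM filename per line into a list file (placeholder names).
list_file=bam_list.txt
printf '%s\n' lane1.bam lane2.bam > "$list_file"

# The argument that would be passed to Discovar:
reads_arg="READS=@$list_file"
echo "$reads_arg"
```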

Reference file (optional)

This is only required if you are using DISCOVAR as a variant caller. The reference information is used
only for variant calling and not in the assembly process. Specifying a valid FASTA reference file is all
that is required to cause DISCOVAR to generate variants.
To specify a reference FASTA file use the optional argument REFERENCE :
REFERENCE=filename

It should be the same file that was used to generate the alignments in the input BAM file(s), or at least
should share the same coordinate system. The FASTA record names should match those in the BAM
file. Ns are allowed. In addition to the reference FASTA file, DISCOVAR also requires the associated index file (.fai).
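
A .fai index of the kind mentioned here is what `samtools faidx` writes next to a FASTA file (genome.fasta becomes genome.fasta.fai). The helper below is a hedged sketch: the name `ensure_fai` is hypothetical, and it only builds the index when one is not already present.

```shell
# Hypothetical helper: ensure_fai REFERENCE_FASTA
# Succeeds if REFERENCE_FASTA.fai exists or can be created with samtools.
ensure_fai() {
    ref=$1
    if [ -s "$ref.fai" ]; then
        return 0                      # index already present
    fi
    if ! command -v samtools >/dev/null 2>&1; then
        echo "samtools not found; cannot index $ref" >&2
        return 1
    fi
    samtools faidx "$ref"             # writes $ref.fai
}

# Example call (not executed here):
#   ensure_fai genome.fasta
```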

DISCOVAR can currently de novo assemble small genomes (up to 50 Mb), with larger genome support to come soon.

The syntax for DISCOVAR de novo assembly is:
Discovar READS=bamfilenames \
OUT_HEAD=outputfilename \
REGIONS=all

For example, with READS=reads.bam and OUT_HEAD=assembly, this takes as input all the reads in the BAM file reads.bam, generates an assembly, then writes the output to a set of files prefixed with assembly.

Installation:

Installation commands (run inside the unpacked package directory):
$ CC=/opt/centos/devtoolset-1.1/root/usr/bin/gcc CXX=/opt/centos/devtoolset-1.1/root/usr/bin/c++ ./configure
$ make -j 32
$ sudo make install
 
GCC 4.7 is installed under /opt/centos/devtoolset-1.1; for how to install it, see:
 
For paired-end sequencing, run:

DiscovarDeNovo

NUM_THREADS=30 (number of threads)

MAX_MEM_GB=300G (maximum memory)

MEMORY_CHECK=False (with this added, no error was reported)

READS=Project_TongJi_DNAseq_THB/Sample_TongJi-DNA-1/TongJi-DNA-1_CTTGTA_L000_R1.fastq.gz,Project_TongJi_DNAseq_THB/Sample_TongJi-DNA-1/TongJi-DNA-1_CTTGTA_L000_R2.fastq.gz (the two files must be separated by a comma)

OUT_DIR=wortdic (directory where the results are written)
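
Put together on one command line, the arguments above might look like the sketch below. It only assembles and prints the command; the values are copied verbatim from these notes and have not been verified against the DiscovarDeNovo manual, and the variable names are made up for the example.

```shell
# Paired-end FASTQ files, joined with a comma for the READS argument.
sample_dir=Project_TongJi_DNAseq_THB/Sample_TongJi-DNA-1
r1=$sample_dir/TongJi-DNA-1_CTTGTA_L000_R1.fastq.gz
r2=$sample_dir/TongJi-DNA-1_CTTGTA_L000_R2.fastq.gz
reads="$r1,$r2"

# Argument values are taken as written in the notes above.
cmd="DiscovarDeNovo NUM_THREADS=30 MAX_MEM_GB=300G MEMORY_CHECK=False READS=$reads OUT_DIR=wortdic"

# Print the command for inspection; run it with: eval "$cmd"
echo "$cmd"
```

Printing first makes it easy to confirm that the two read files ended up in a single comma-separated READS value before starting a long assembly.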
