Near-optimal RNA-Seq quantification https://pachterlab.github.io/kallisto

Input and output file description: http://bio.math.berkeley.edu/eXpress/manual.html

Paper title:

Pseudoalignment for metagenomic read assignment

Abstract:

We explore connections between metagenomic read assignment and the quantification of transcripts from RNA-Seq data. In particular, we show that the recent idea of pseudoalignment introduced in the RNA-Seq context is suitable in the metagenomics setting. When coupled with the Expectation-Maximization (EM) algorithm, reads can be assigned far more accurately and quickly than is currently possible with state of the art software.

Paper URL:

https://arxiv.org/abs/1510.07371v2

Source code:

https://pachterlab.github.io/kallisto/about

Installation:

wget https://github.com/pachterlab/kallisto/releases/download/v0.43.0/kallisto_linux-v0.43.0.tar.gz

tar -xzf kallisto_linux-v0.43.0.tar.gz

Test:

[biostack@localhost.localdomain test]$ /project/metagenomics_benchmark/kallisto_linux-v0.43.0/kallisto index -i transcripts.idx transcripts.fasta

[biostack@localhost.localdomain test]$ /project/metagenomics_benchmark/kallisto_linux-v0.43.0/kallisto quant -i transcripts.idx -o output reads_1.fastq reads_2.fastq

Here transcripts.idx is the index file written by the first command, and reads_1.fastq and reads_2.fastq are the paired-end input files.

[biostack@localhost.localdomain output]$ more abundance.tsv

target_id     length  eff_length  est_counts  tpm
NM_001168316  2283    2105.9      160.606     12581
NM_174914     2385    2207.9      1500.72     112128
NR_031764     1853    1675.9      102.671     10106.2
NM_004503     1681    1503.9      331.118     36320.7
NM_006897     1541    1363.9      664         80311.3
NM_014212     2037    1859.9      55          4878.25
NM_014620     2300    2122.9      591.166     45937.9
NM_017409     1959    1781.9      47          4351.17
NM_017410     2396    2218.9      42          3122.5
NM_018953     1612    1434.9      227.999     26212.1
NM_022658     2288    2110.9      4881        381446
NM_153633     1666    1488.9      361.044     40002.4
NM_153693     2072    1894.9      73.6719     6413.67
NM_173860     849     671.903     962         236189
NR_003084     1640    1462.9      0.00164208  0.18517
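The tpm column can be recomputed from the other columns. The following is a minimal Python sketch, assuming the usual transcripts-per-million definition: each target's rate est_counts / eff_length, divided by the sum of all rates and multiplied by 10^6. The function name compute_tpm is illustrative and not part of kallisto; the rows are copied from the table above.

```python
def compute_tpm(rows):
    """rows: iterable of (target_id, eff_length, est_counts) tuples."""
    # Per-target rate: estimated counts per effective base.
    rates = {tid: counts / eff_len for tid, eff_len, counts in rows}
    total = sum(rates.values())
    # Normalise so that all targets together sum to one million.
    return {tid: rate / total * 1e6 for tid, rate in rates.items()}


# (target_id, eff_length, est_counts) taken from abundance.tsv above.
ROWS = [
    ("NM_001168316", 2105.9, 160.606),
    ("NM_174914", 2207.9, 1500.72),
    ("NR_031764", 1675.9, 102.671),
    ("NM_004503", 1503.9, 331.118),
    ("NM_006897", 1363.9, 664),
    ("NM_014212", 1859.9, 55),
    ("NM_014620", 2122.9, 591.166),
    ("NM_017409", 1781.9, 47),
    ("NM_017410", 2218.9, 42),
    ("NM_018953", 1434.9, 227.999),
    ("NM_022658", 2110.9, 4881),
    ("NM_153633", 1488.9, 361.044),
    ("NM_153693", 1894.9, 73.6719),
    ("NM_173860", 671.903, 962),
    ("NR_003084", 1462.9, 0.00164208),
]

tpm = compute_tpm(ROWS)
for tid in ("NM_022658", "NM_173860", "NR_003084"):
    print(tid, f"{tpm[tid]:.6g}")
```

Up to the rounding of the printed table values, the results should agree with the tpm column, which also shows why a short target with moderate counts (NM_173860) can have a higher tpm than a long target with more counts.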

Usage:

kallisto

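kallisto is a program for quantifying transcript abundances from RNA-Seq data or, more generally, abundances of target sequences from high-throughput sequencing reads. It is based on the novel idea of pseudoalignment, which rapidly determines which targets a read is compatible with, without the need for alignment. On benchmarks with standard RNA-Seq data, kallisto can build a transcriptome index in less than 10 minutes and quantify (that is, assign) 30 million human reads in less than 3 minutes on a Mac desktop computer. Pseudoalignment preserves the key information needed for quantification, so kallisto is not only fast but also as accurate as existing quantification tools. In fact, because the pseudoalignment procedure is robust to errors in the reads, kallisto significantly outperforms existing tools in many benchmarks.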
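The idea of pseudoalignment can be illustrated with a toy example. The sketch below is not kallisto's implementation (which works on a de Bruijn graph of the transcriptome and also handles reverse complements); it assumes a plain k-mer dictionary, and the names build_kmer_index and pseudoalign as well as the two short target sequences are made up for illustration. A read is assigned the set of targets shared by all of its indexed k-mers, and k-mers missing from the index (for example because of a sequencing error) are skipped.

```python
from collections import defaultdict


def build_kmer_index(targets, k):
    """Map every k-mer to the set of target names that contain it."""
    index = defaultdict(set)
    for name, seq in targets.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def pseudoalign(read, index, k):
    """Return the set of targets compatible with all indexed k-mers of the read."""
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if hits is None:
            continue  # k-mer not in the index: ignore it
        compatible = set(hits) if compatible is None else compatible & hits
    return compatible or set()


# Hypothetical toy targets sharing a common prefix.
TARGETS = {"t1": "ACGTACGGA", "t2": "ACGTACCTT"}
K = 5
INDEX = build_kmer_index(TARGETS, K)

print(pseudoalign("ACGTAC", INDEX, K))    # read from the shared prefix
print(pseudoalign("GTACGGAT", INDEX, K))  # read from t1 with a trailing error
print(pseudoalign("TACCTT", INDEX, K))    # read from t2
```

Reads that are compatible with the same set of targets form one equivalence class; only the class and its read count are needed for the quantification step, not a base-level alignment.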

RNA-Seq data quantified with kallisto can be analyzed with sleuth.

Running kallisto without arguments prints the list of usage options:

kallisto 0.43.0

Usage: kallisto <CMD> [arguments] ..

Where <CMD> can be one of:

    index         Builds a kallisto index
    quant         Runs the quantification algorithm
    pseudo        Runs the pseudoalignment step
    h5dump        Converts HDF5-formatted results to plaintext
    version       Prints version information
    cite          Prints citation information

Running kallisto <CMD> without arguments prints usage information for <CMD>

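The individual commands are described below.

index:

kallisto index builds an index from a FASTA-formatted file of target sequences. The arguments for the index command are: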

kallisto 0.43.0
Builds a kallisto index

Usage: kallisto index [arguments] FASTA-files

Required argument:
-i, --index=STRING          Filename for the kallisto index to be constructed

Optional argument:
-k, --kmer-size=INT         k-mer (odd) length (default: 31, max value: 31)
    --make-unique           Replace repeated target names with unique names

The input file is in FASTA format and may be compressed.

quant:

kallisto quant runs the quantification algorithm. The arguments for the quant command are:

kallisto 0.43.0
Computes equivalence classes for reads and quantifies abundances

Usage: kallisto quant [arguments] FASTQ-files

Required arguments:
-i, --index=STRING            Filename for the kallisto index to be used for
                              quantification
-o, --output-dir=STRING       Directory to write output to

Optional arguments:
    --bias                    Perform sequence based bias correction
-b, --bootstrap-samples=INT   Number of bootstrap samples (default: 0)
    --seed=INT                Seed for the bootstrap sampling (default: 42)
    --plaintext               Output plaintext instead of HDF5
    --single                  Quantify single-end reads
    --fr-stranded             Strand specific reads, first read forward
    --rf-stranded             Strand specific reads, first read reverse
-l, --fragment-length=DOUBLE  Estimated average fragment length
-s, --sd=DOUBLE               Estimated standard deviation of fragment length
                              (default: value is estimated from the input data)
-t, --threads=INT             Number of threads to use (default: 1)
    --pseudobam               Output pseudoalignments in SAM format to stdout
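The quantification that quant performs on the equivalence classes relies on the Expectation-Maximization (EM) algorithm mentioned in the abstract. The following is a simplified Python sketch of such an EM loop, not kallisto's actual code: it assumes each equivalence class is given as a tuple of target names with a read count, ignores bias correction and convergence checks, and uses made-up names (em_abundances, EQ_CLASSES, EFF_LENGTHS) and made-up toy counts.

```python
def em_abundances(eq_classes, eff_lengths, n_iter=200):
    """Estimate expected read counts per target from equivalence classes.

    eq_classes:  list of (targets, count), targets being a tuple of names
    eff_lengths: dict mapping target name -> effective length
    """
    names = sorted(eff_lengths)
    total = sum(count for _, count in eq_classes)
    # Start from a uniform split of all reads.
    alpha = {t: total / len(names) for t in names}
    for _ in range(n_iter):
        new_alpha = {t: 0.0 for t in names}
        for targets, count in eq_classes:
            # E-step: share the class's reads in proportion to
            # current abundance divided by effective length.
            weights = {t: alpha[t] / eff_lengths[t] for t in targets}
            denom = sum(weights.values())
            if denom == 0:
                continue
            for t, w in weights.items():
                # M-step accumulation: expected counts per target.
                new_alpha[t] += count * w / denom
        alpha = new_alpha
    return alpha


# Hypothetical toy data: 30 reads unique to A, 10 unique to B, 20 ambiguous.
EQ_CLASSES = [(("A",), 30), (("B",), 10), (("A", "B"), 20)]
EFF_LENGTHS = {"A": 1000.0, "B": 1000.0}

est = em_abundances(EQ_CLASSES, EFF_LENGTHS)
print({t: round(c, 3) for t, c in est.items()})
```

In this toy case the 20 ambiguous reads end up split 3:1 in favour of A, giving roughly 45 and 15 expected counts, because the uniquely assigned reads already suggest that A is three times as abundant as B.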

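kallisto can process either paired-end or single-end reads. Paired-end is the default mode and expects an even number of FASTQ files, listed as pairs:

kallisto quant -i index -o output pairA_1.fastq pairA_2.fastq pairB_1.fastq pairB_2.fastq

For single-end reads, pass the --single flag together with the -l and -s options, then list the input FASTQ files:

kallisto quant -i index -o output --single -l 200 -s 20 file1.fastq.gz file2.fastq.gz file3.fastq.gz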
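When quant is called from a script, the two modes can be checked before the command is run. The sketch below is one possible Python helper, using only the flags listed in the usage text above; the function name build_quant_command is an assumption for illustration. It only builds the argument list and does not run kallisto.

```python
def build_quant_command(index, output_dir, fastq_files, single=False,
                        fragment_length=None, sd=None, threads=1):
    """Build the argument list for a `kallisto quant` call."""
    fastq_files = list(fastq_files)
    if not fastq_files:
        raise ValueError("at least one FASTQ file is required")
    if single:
        # Single-end mode: fragment length mean and sd must be given.
        if fragment_length is None or sd is None:
            raise ValueError("--single needs both -l and -s")
    elif len(fastq_files) % 2 != 0:
        # Paired-end mode: files are consumed as consecutive pairs.
        raise ValueError("paired-end mode needs an even number of FASTQ files")

    cmd = ["kallisto", "quant", "-i", index, "-o", output_dir,
           "-t", str(threads)]
    if single:
        cmd += ["--single", "-l", str(fragment_length), "-s", str(sd)]
    return cmd + fastq_files


paired = build_quant_command(
    "index", "output",
    ["pairA_1.fastq", "pairA_2.fastq", "pairB_1.fastq", "pairB_2.fastq"])
single = build_quant_command(
    "index", "output", ["file1.fastq.gz"], single=True,
    fragment_length=200, sd=20)
print(" ".join(paired))
print(" ".join(single))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where kallisto is on the PATH.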

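kallisto quant produces three output files by default:

abundance.h5: an HDF5 binary file containing run information, abundance estimates, bootstrap estimates, and so on. This file can be read by sleuth.
abundance.tsv: a plaintext file of the abundance estimates.
run_info.json: a JSON file containing information about the run.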
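The plaintext abundance file is easy to load for downstream checks. The following is a small Python sketch, assuming the tab-separated layout with the header target_id, length, eff_length, est_counts, tpm shown in the test output earlier; read_abundance is an illustrative name, and the sample text holds three rows copied from that output so the example runs without any files.

```python
import csv
import io

# Three rows copied from the abundance.tsv shown in the test run.
SAMPLE_TSV = (
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "NM_001168316\t2283\t2105.9\t160.606\t12581\n"
    "NM_022658\t2288\t2110.9\t4881\t381446\n"
    "NR_003084\t1640\t1462.9\t0.00164208\t0.18517\n"
)


def read_abundance(handle):
    """Parse an abundance.tsv-style file into a list of typed records."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        {
            "target_id": row["target_id"],
            "length": int(row["length"]),
            "eff_length": float(row["eff_length"]),
            "est_counts": float(row["est_counts"]),
            "tpm": float(row["tpm"]),
        }
        for row in reader
    ]


records = read_abundance(io.StringIO(SAMPLE_TSV))
top = max(records, key=lambda r: r["tpm"])
print(top["target_id"], top["tpm"])
```

For a real run, open("output/abundance.tsv", newline="") can be passed in place of the io.StringIO object.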

Notes on optional arguments:

Pseudobam:
--pseudobam writes all pseudoalignments to stdout in SAM format. The stream can be redirected to a file or converted to BAM with samtools.

For example: kallisto quant -i index -o out --pseudobam r1.fastq r2.fastq > out.sam

Or with samtools:

kallisto quant -i index -o out --pseudobam r1.fastq r2.fastq | samtools view -Sb - > out.bam

pseudo

kallisto pseudo runs only the pseudoalignment step and is intended for use with single-cell RNA-Seq. The arguments for the pseudo command are:

kallisto 0.43.0
Computes equivalence classes for reads and quantifies abundances

Usage: kallisto pseudo [arguments] FASTQ-files

Required arguments:
-i, --index=STRING            Filename for the kallisto index to be used for
                              pseudoalignment
-o, --output-dir=STRING       Directory to write output to

Optional arguments:
-u  --umi                     First file in pair is a UMI file
-b  --batch=FILE              Process files listed in FILE
    --single                  Quantify single-end reads
-l, --fragment-length=DOUBLE  Estimated average fragment length
-s, --sd=DOUBLE               Estimated standard deviation of fragment length
                              (default: value is estimated from the input data)
-t, --threads=INT             Number of threads to use (default: 1)
    --pseudobam               Output pseudoalignments in SAM format to stdout

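The format of this command and the meaning of its arguments are the same as for quant. However, pseudo does not run the EM algorithm to quantify abundances. In addition, pseudo has an option for specifying many cells in a batch file, for example:

kallisto pseudo -i index -o output -b batch.txt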
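A batch file for many cells is tedious to write by hand. The sketch below generates one in Python, assuming the layout described in the kallisto manual as far as I recall it: one line per cell with an id followed by that cell's FASTQ files, separated by spaces, and an optional comment line starting with "#". The function name batch_file_text and the cell and file names are hypothetical; check the manual for the installed version before relying on this layout.

```python
def batch_file_text(cells):
    """Render a batch file: one line per cell, id followed by its files.

    cells: dict mapping cell id -> list of FASTQ paths for that cell
    """
    lines = ["#id file1 file2"]  # comment header (assumed to be ignored)
    for cell_id, files in cells.items():
        if not files:
            raise ValueError(f"cell {cell_id!r} has no FASTQ files")
        lines.append(" ".join([cell_id, *files]))
    return "\n".join(lines) + "\n"


# Hypothetical cells and file names, for illustration only.
CELLS = {
    "cell1": ["cell1_1.fastq.gz", "cell1_2.fastq.gz"],
    "cell2": ["cell2_1.fastq.gz", "cell2_2.fastq.gz"],
}

text = batch_file_text(CELLS)
print(text, end="")
```

Writing the returned text to batch.txt gives a file that can be passed to the -b option shown above.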

h5dump

kallisto h5dump converts HDF5-formatted results to plaintext. The arguments for the h5dump command are:

kallisto 0.43.0
Converts HDF5-formatted results to plaintext

Usage:  kallisto h5dump [arguments] abundance.h5

Required argument:
-o, --output-dir=STRING       Directory to write output to
