How to assemble a good annotation file with Cufflinks
Postscript:
Installing Cufflinks:
Download the release package. Take the prebuilt binary rather than the source code:
Linux x86_64 binary:
http://cufflinks.cbcb.umd.edu/downloads/cufflinks-2.2.1.Linux_x86_64.tar.gz
After downloading, unpack the archive and copy the cuff* executables into /usr/local/bin.
Steps:
Step 1: generate a GTF file for each sample
cufflinks -p 30 -o ROOT RF12.merged.bam
cufflinks -p 30 -o LEAF LF12.merged.bam
The resulting GTF files turned out to contain only single-exon transcripts.
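One way to check the single-exon observation is to count exon records per transcript_id in transcripts.gtf. The following is a minimal sketch, not part of the original notes: the helper names and the two example transcripts are made up for illustration, and it assumes the usual 9-column GTF layout with a transcript_id attribute.

```python
import re
from collections import Counter

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def exons_per_transcript(gtf_lines):
    """Count exon records for each transcript_id in an iterable of GTF lines."""
    counts = Counter()
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip comments, blank lines and anything that is not an exon record.
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            counts[match.group(1)] += 1
    return counts


def single_exon_fraction(counts):
    """Fraction of transcripts that have exactly one exon (0.0 if empty)."""
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n == 1) / len(counts)


def gtf_line(feature, start, end, attributes):
    """Build one made-up GTF record for the demonstration below."""
    return "\t".join(
        ["Chr1", "Cufflinks", feature, str(start), str(end), "1000", "+", ".", attributes]
    )


# Hypothetical records: one two-exon transcript and one single-exon transcript.
example = [
    gtf_line("transcript", 100, 900, 'gene_id "CUFF.1"; transcript_id "CUFF.1.1";'),
    gtf_line("exon", 100, 300, 'gene_id "CUFF.1"; transcript_id "CUFF.1.1";'),
    gtf_line("exon", 600, 900, 'gene_id "CUFF.1"; transcript_id "CUFF.1.1";'),
    gtf_line("transcript", 2000, 2500, 'gene_id "CUFF.2"; transcript_id "CUFF.2.1";'),
    gtf_line("exon", 2000, 2500, 'gene_id "CUFF.2"; transcript_id "CUFF.2.1";'),
]

counts = exons_per_transcript(example)
fraction = single_exon_fraction(counts)
```

On a real run, an open file handle for ROOT/transcripts.gtf or LEAF/transcripts.gtf can be passed in place of the example list. A fraction of 1.0 would confirm that every assembled transcript has a single exon.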
Step 2: put the paths of the generated GTF files into a list file (namelist), one per line
/share/bioinfo/miaochenyong/call_snp/testgroup/newGTF/using_new_versin_naturepipe/LEAF/transcripts.gtf
/share/bioinfo/miaochenyong/call_snp/testgroup/newGTF/using_new_versin_naturepipe/ROOT/transcripts.gtf
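The list file can also be generated instead of typed by hand. This is a small sketch under stated assumptions: the function name is hypothetical, each sample directory is assumed to contain the transcripts.gtf that cufflinks wrote with -o, and the demonstration writes to a temporary directory rather than to the real paths.

```python
import tempfile
from pathlib import Path


def write_assembly_list(sample_dirs, list_path):
    """Write one transcripts.gtf path per line and return the paths written."""
    gtf_paths = [(Path(d) / "transcripts.gtf").as_posix() for d in sample_dirs]
    Path(list_path).write_text("\n".join(gtf_paths) + "\n")
    return gtf_paths


# Demonstration in a throwaway directory; LEAF and ROOT mirror the -o names above.
with tempfile.TemporaryDirectory() as tmp:
    list_file = Path(tmp) / "new_namelist"
    written = write_assembly_list(["LEAF", "ROOT"], list_file)
    list_contents = list_file.read_text()
```

Absolute directories work the same way, which is useful because cuffmerge is often started from a different working directory than cufflinks.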
Step 3: run cuffmerge
cuffmerge -g Osativa_204_gene.gtf -s ./Osativa_204.fa -p 50 new_namelist
Result:
/share/bioinfo/miaochenyong/call_snp/testgroup/newGTF/using_new_versin_naturepipe/merged_asm/merged.gtf
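The gene_id values are newly generated by the merge, but a gene_name is kept whenever the gene is present in the reference GTF. The newly discovered genes are all single-exon.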
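To see which merged genes kept a reference gene_name and which are new, the attribute column of merged.gtf can be summarised per gene_id. The sketch below is illustrative only: the function names and the example records (including the gene name) are made up, and it assumes attributes are written as key "value"; pairs.

```python
import re

ATTRIBUTE = re.compile(r'(\w+) "([^"]*)"')


def gene_names_by_id(gtf_lines):
    """Map each gene_id to the set of gene_name values that appear with it."""
    names = {}
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # comment or malformed line
        attrs = dict(ATTRIBUTE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        if gene_id is None:
            continue
        names.setdefault(gene_id, set())
        if "gene_name" in attrs:
            names[gene_id].add(attrs["gene_name"])
    return names


def split_known_and_novel(names):
    """Return (gene_ids with a gene_name, gene_ids without one), both sorted."""
    known = sorted(g for g, n in names.items() if n)
    novel = sorted(g for g, n in names.items() if not n)
    return known, novel


# Hypothetical merged.gtf records: one gene with a reference name, one without.
example = [
    "\t".join(["Chr1", "Cufflinks", "exon", "100", "300", ".", "+", ".",
               'gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; '
               'exon_number "1"; gene_name "LOC_Os01g01010";']),
    "\t".join(["Chr1", "Cufflinks", "exon", "5000", "5400", ".", "+", ".",
               'gene_id "XLOC_000002"; transcript_id "TCONS_00000002"; '
               'exon_number "1";']),
]

known, novel = split_known_and_novel(gene_names_by_id(example))
```

The novel list is the set of loci that would otherwise have no name in the reference annotation.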
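Background:
While doing ASE (allele-specific expression) analysis, I found that many SNP sites did not fall within any gene ID provided by the Osativa_204 annotation. To assign a gene ID to these sites, I decided to assemble an annotation myself with Cufflinks.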
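The lookup that motivates all of this, finding which gene a SNP position falls in, can be sketched as follows. This is an assumption-laden illustration rather than the pipeline actually used: the function names and coordinates are made up, each gene is collapsed to a single span from its first to its last base, and strand and intron/exon structure are ignored.

```python
import re

GENE_ID = re.compile(r'gene_id "([^"]+)"')


def gene_spans(gtf_lines):
    """Collapse GTF records into one (start, end) span per (chrom, gene_id)."""
    spans = {}
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # comment or malformed line
        match = GENE_ID.search(fields[8])
        if not match:
            continue
        key = (fields[0], match.group(1))
        start, end = int(fields[3]), int(fields[4])
        if key in spans:
            old_start, old_end = spans[key]
            start, end = min(old_start, start), max(old_end, end)
        spans[key] = (start, end)
    return spans


def genes_at(spans, chrom, pos):
    """Return the sorted gene_ids whose span covers a 1-based position."""
    return sorted(
        gene for (c, gene), (start, end) in spans.items()
        if c == chrom and start <= pos <= end
    )


# Hypothetical annotation with two genes on Chr1.
example = [
    "\t".join(["Chr1", "Cufflinks", "exon", "100", "300", ".", "+", ".",
               'gene_id "XLOC_000001"; transcript_id "TCONS_00000001";']),
    "\t".join(["Chr1", "Cufflinks", "exon", "600", "900", ".", "+", ".",
               'gene_id "XLOC_000001"; transcript_id "TCONS_00000001";']),
    "\t".join(["Chr1", "Cufflinks", "exon", "2000", "2500", ".", "-", ".",
               'gene_id "XLOC_000002"; transcript_id "TCONS_00000002";']),
]

spans = gene_spans(example)
hit = genes_at(spans, "Chr1", 450)    # inside the intron of XLOC_000001
miss = genes_at(spans, "Chr1", 5000)  # outside every gene
```

The linear scan is fine for a sanity check. For a genome-wide SNP list, sorting the spans per chromosome or using an interval index would be the natural next step.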
Cufflinks:
Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one, taking into account biases in library preparation protocols.
Common uses of the Cufflinks package
Cufflinks includes a number of tools for analyzing RNA-Seq experiments. Some of these tools can be run on their own, while others are pieces of a larger workflow. The complexity of your workflow depends on what you want to achieve with your analysis. For a complete discussion of how Cufflinks can help you with your analysis, please see our protocol paper. The paper includes a diagram (Figure 2) describing how the various parts of the Cufflinks package (and its companion tool TopHat) fit together. As of version 2.2.0, you can also run Cuffquant and Cuffnorm to make large scale analyses easier to handle. The figure below is an updated version of Figure 2 showing how the two utilities released after the protocol paper appeared fit into the workflow:
You can use Cuffquant to pre-compute gene expression levels for each of your samples, which can save time if you have to re-run part of your analysis. Using Cuffquant also makes it easier to spread the load of computation for lots of samples across multiple computers. If you don't want to perform differential expression analysis, you can run Cuffnorm instead of Cuffdiff. Cuffnorm produces simple tables of expression values that you can look at in R (for example) to cluster samples and perform other follow up analysis.
#################################################################
Discovering novel genes and transcripts:
RNA-Seq is a powerful technology for gene and splice variant discovery. You can use Cufflinks to help annotate a new genome or find new genes and splice isoforms of known genes in even well-annotated genomes. Annotating genomes is a complex and difficult process, but we outline a basic workflow that should get you started here. The workflow also includes examples of the commands you'd run to implement each step in the workflow. Suppose we have RNA-Seq reads from human liver, brain, and heart.
- Map the reads for each tissue to the reference genome
We recommend that you use TopHat to map your reads to the reference genome. For this example, we'll assume you have paired-end RNA-Seq data. You can map reads as follows:
tophat -r 50 -o tophat_brain /seqdata/indexes/hg19 brain_1.fq brain_2.fq
tophat -r 50 -o tophat_liver /seqdata/indexes/hg19 liver_1.fq liver_2.fq
tophat -r 50 -o tophat_heart /seqdata/indexes/hg19 heart_1.fq heart_2.fq
The commands above are just examples of how to map reads with TopHat. Please see the TopHat manual for more details on RNA-Seq read mapping.
- Run Cufflinks on each mapping file
The next step is to assemble each tissue sample independently using Cufflinks. Assemble each tissue like so:
cufflinks -o cufflinks_brain tophat_brain/accepted_hits.bam
cufflinks -o cufflinks_liver tophat_liver/accepted_hits.bam
cufflinks -o cufflinks_heart tophat_heart/accepted_hits.bam
- Merge the resulting assemblies
Create a file called assemblies.txt that lists the assembly file for each sample:
cufflinks_brain/transcripts.gtf
cufflinks_liver/transcripts.gtf
cufflinks_heart/transcripts.gtf
Now run the merge script:
cuffmerge -s /seqdata/fastafiles/hg19/hg19.fa assemblies.txt
The final, merged annotation will be in the file merged_asm/merged.gtf. At this point, you can use your favorite browser to explore the structure of your genes, or feed this file into downstream informatic analyses, such as a search for orthologs in other organisms. You can also explore your samples with Cuffdiff and identify genes that are significantly differentially expressed between the three conditions. See the workflows below for more details on how to do this.
- (optional) Compare the merged assembly with known or annotated genes
If you want to discover new genes in a genome that has been annotated, you can use cuffcompare to sort out what is new in your assembly from what is already known. Run cuffcompare like this:
cuffcompare -s /seqdata/fastafiles/hg19/hg19.fa -r known_annotation.gtf merged_asm/merged.gtf
Cuffcompare will produce a number of output files that you can parse to select novel genes and isoforms.
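As one example of parsing that output, the sketch below filters a cuffcompare .tmap table by class code. It rests on assumptions that should be checked against your own files: that the table is tab-separated with a header containing class_code and cuff_id columns, and that the codes of interest are u (intergenic, unknown) and j (potentially novel isoform of a known gene). The function name and the example rows are made up.

```python
import csv
import io

# Class codes treated as "novel" in this sketch (an assumption, adjust as needed).
NOVEL_CODES = frozenset({"u", "j"})


def novel_transcripts(tmap_handle, codes=NOVEL_CODES):
    """Return (cuff_id, class_code) pairs whose class code is in `codes`."""
    reader = csv.DictReader(tmap_handle, delimiter="\t")
    return [
        (row["cuff_id"], row["class_code"])
        for row in reader
        if row["class_code"] in codes
    ]


# Hypothetical table with only the columns this sketch relies on.
example_tmap = io.StringIO(
    "ref_gene_id\tref_id\tclass_code\tcuff_gene_id\tcuff_id\n"
    "GENE_A\tTX_A1\t=\tXLOC_000001\tTCONS_00000001\n"
    "GENE_A\tTX_A1\tj\tXLOC_000001\tTCONS_00000002\n"
    "-\t-\tu\tXLOC_000002\tTCONS_00000003\n"
)

novel = novel_transcripts(example_tmap)
```

Because the columns are looked up by header name, extra columns in a real table are ignored. A missing class_code or cuff_id column raises a KeyError, which makes a wrong file format obvious.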
#################################################################
Identifying differentially expressed and regulated genes
There are two workflows you can choose from when looking for differentially expressed and regulated genes using the Cufflinks package. The first workflow is simpler and is a good choice when you aren't looking for novel genes and transcripts. This workflow requires that you not only have a reference genome, but also a reference gene annotation in GFF format (GFF3 or GTF2 formats are accepted, see details here). The second workflow, which includes steps to discover new genes and new splice variants of known genes, is more complex and requires more computing power. The second workflow can use and augment a reference gene annotation GFF if one is available.
Differential analysis without gene and transcript discovery
- Map the reads for each condition to the reference genome
We recommend that you use TopHat to map your reads to the reference genome. For this example, we'll assume you have paired-end RNA-Seq data. Suppose you have RNA-Seq from a knockdown experiment where you have two biological replicates of a mock condition as a control and two replicates of your knockdown.
Note: Cuffdiff will work much better if you map your replicates independently, rather than pooling the replicates from one condition into a single set of reads.
Note: While a GTF of known transcripts is not strictly required at this stage, providing one will improve alignment sensitivity, and ultimately, the accuracy of Cuffdiff's analysis.
You can map reads as follows:
tophat -r 50 -G annotation.gtf -o tophat_mock_rep1 /seqdata/indexes/hg19 \
mock_rep1_1.fq mock_rep1_2.fq
tophat -r 50 -G annotation.gtf -o tophat_mock_rep2 /seqdata/indexes/hg19 \
mock_rep2_1.fq mock_rep2_2.fq
tophat -r 50 -G annotation.gtf -o tophat_knockdown_rep1 /seqdata/indexes/hg19 \
knockdown_rep1_1.fq knockdown_rep1_2.fq
tophat -r 50 -G annotation.gtf -o tophat_knockdown_rep2 /seqdata/indexes/hg19 \
knockdown_rep2_1.fq knockdown_rep2_2.fq
- Run Cuffdiff
Take the annotated transcripts for your genome (as GFF or GTF) and provide them to cuffdiff along with the BAM files from TopHat for each replicate:
cuffdiff annotation.gtf mock_rep1.bam,mock_rep2.bam \
knockdown_rep1.bam,knockdown_rep2.bam
Differential analysis with gene and transcript discovery
- Complete steps 1-3 in "Discovering novel genes and transcripts", above
Follow the protocol for gene and transcript discovery listed above. Be sure to provide TopHat and the assembly merging script with a reference annotation if one is available for your organism, to ensure the highest possible quality of differential expression analysis.
- Run Cuffdiff
Take the merged assembly produced in step 3 of the discovery protocol and provide it to cuffdiff along with the BAM files from TopHat:
cuffdiff merged_asm/merged.gtf liver1.bam,liver2.bam brain1.bam,brain2.bam
As shown above, replicate BAM files for each condition must be given as a comma-separated list. If you put spaces between replicate files instead of commas, cuffdiff will treat them as independent conditions.
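The comma-versus-space rule is easy to get wrong when the command is typed by hand, so a small builder can help. This is a sketch with a hypothetical function name: it only assembles the argument list in the form shown above, with one comma-joined argument per condition, and does not run cuffdiff.

```python
def cuffdiff_command(annotation, conditions):
    """Build a cuffdiff argument list.

    `conditions` is a list of conditions, each a list of replicate BAM paths.
    Replicates are joined with commas; conditions stay separate arguments.
    """
    argv = ["cuffdiff", annotation]
    for replicates in conditions:
        if not replicates:
            raise ValueError("every condition needs at least one BAM file")
        for path in replicates:
            # A comma or space inside a path would be misread as a separator.
            if "," in path or " " in path:
                raise ValueError(f"unsupported character in path: {path!r}")
        argv.append(",".join(replicates))
    return argv


command = cuffdiff_command(
    "merged_asm/merged.gtf",
    [["liver1.bam", "liver2.bam"], ["brain1.bam", "brain2.bam"]],
)
command_line = " ".join(command)
```

Passing the list to subprocess.run avoids shell quoting entirely, while command_line is convenient for logging what was executed.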
Detailed documentation:
http://cufflinks.cbcb.umd.edu/manual.html