6、RNA-Seq Analysis Pipeline

风中之铃 2024-08-29 12:43:22

Created by Dhivya Arasappan, last modified by Dennis C Wylie on Nov 08, 2015

This pipeline uses an annotated genome to identify differentially expressed genes/transcripts. 10-hour minimum ($470 internal, $600 external) per project.

1. Quality Assessment

Quality of data assessed by FastQC; results of quality assessment will be evaluated prior to downstream analysis.

Deliverables:
- reports generated by FastQC
Tools used:
- FastQC: (Andrews 2010) used to generate quality summaries of data:
  - Per base sequence quality report: useful for deciding if trimming necessary.
  - Sequence duplication levels: evaluation of library complexity. Higher levels of sequence duplication may be expected for high-coverage RNA-Seq data.
  - Overrepresented sequences: evaluation of adapter contamination.
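
To make the per base sequence quality report more concrete, the following is a minimal Python sketch of the underlying calculation: the mean Phred score at each read position. It is not FastQC's implementation. The function names, the in-memory example records, and the Phred+33 encoding are assumptions made for illustration.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (name, sequence, quality_string) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().strip()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().strip()
        yield header[1:], seq, qual


def per_base_mean_quality(records, offset=33):
    """Mean Phred score at each read position (Phred+33 encoding is assumed)."""
    totals, counts = [], []
    for _name, _seq, qual in records:
        for i, ch in enumerate(qual):
            if i == len(totals):
                # First read long enough to reach this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


# Two made-up reads; the second one loses quality toward its 3' end.
EXAMPLE_FASTQ = "@r1\nACGT\n+\nIIII\n@r2\nACG\n+\nI5!\n"
means = per_base_mean_quality(read_fastq(StringIO(EXAMPLE_FASTQ)))
print(means)  # [40.0, 30.0, 20.0, 40.0]
```

A steady drop in these means toward the end of the reads is the kind of pattern that would suggest trimming in the next step.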

2. Fastq Preprocessing

The quality assessment is used to decide whether the raw data require preprocessing; if so, preprocessing is performed.

Deliverables:
- Trimmed/filtered fastq files.
Tools Used:
- Fastx-toolkit: Used to preprocess fastq files.
  - Fastq quality trimmer: Trimming reads based on quality.
  - Fastq quality filter: Filtering reads based on quality.
- Cutadapt: Used to remove adapter sequences from reads.
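
The two quality-based operations listed above can be sketched in Python as follows. This is an illustration of the general idea (trim low-quality bases from the 3' end, then keep only reads in which enough bases reach a minimum quality), not the Fastx-toolkit code itself. The function names, default thresholds, and Phred+33 encoding are assumptions.

```python
def phred_scores(qual, offset=33):
    """Convert a FASTQ quality string to integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual]


def trim_3prime(seq, qual, threshold=20, min_length=1):
    """Drop trailing bases whose quality is below threshold.

    Returns (trimmed_seq, trimmed_qual), or None if the read ends up
    shorter than min_length and should be discarded.
    """
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    if end < min_length:
        return None
    return seq[:end], qual[:end]


def passes_filter(qual, min_quality=20, min_percent=80):
    """True if at least min_percent of the bases have quality >= min_quality."""
    scores = phred_scores(qual)
    if not scores:
        return False
    good = sum(1 for s in scores if s >= min_quality)
    return 100 * good >= min_percent * len(scores)


# Made-up read: the last two bases have Phred scores 20 and 0.
print(trim_3prime("ACGTAC", "IIII5!", threshold=30))  # ('ACGT', 'IIII')
print(passes_filter("IIII5", min_quality=30))         # True (4 of 5 bases pass)
```

Trimming shortens reads while keeping them, whereas filtering removes whole reads, so the order and thresholds chosen affect how many reads reach the mapping step.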

3. Mapping

Mapping to the genome reference is performed using BWA-MEM or TopHat.

Deliverables:
- Mapping results as BAM files, plus mapping statistics.
Tools Used:
- BWA-MEM: (Li 2013) primary aligner used to generate read alignments.
- TopHat: (Kim 2011) aligner used to generate read alignments in a splice-aware manner and identify novel junctions.
- Samtools: (Li 2009) used to generate mapping statistics.
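
As an illustration of what mapping statistics summarize, the sketch below tallies alignment records from SAM-formatted text using the standard SAM flag bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary). It does not reproduce Samtools output; the function name, the categories reported, and the example records are assumptions.

```python
def mapping_stats(sam_lines):
    """Tally primary, mapped, secondary and supplementary records from SAM text lines."""
    stats = {"primary": 0, "mapped": 0, "secondary": 0, "supplementary": 0}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # column 2 of a SAM record is the bitwise FLAG
        if flag & 0x100:
            stats["secondary"] += 1
        elif flag & 0x800:
            stats["supplementary"] += 1
        else:
            stats["primary"] += 1
            if not flag & 0x4:  # 0x4 set means the read is unmapped
                stats["mapped"] += 1
    return stats


# Made-up records: two mapped reads, one unmapped read, one secondary alignment.
EXAMPLE_SAM = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r1\t256\tchr2\t50\t0\t4M\t*\t0\t0\tACGT\tIIII",
]
stats = mapping_stats(EXAMPLE_SAM)
print(stats)
print("mapping rate:", stats["mapped"] / stats["primary"])
```

Counting only primary records in the denominator keeps multi-mapped reads from inflating the mapping rate; other conventions are possible.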

4. Gene/Transcript Counting

Counting the number of reads mapping to annotated intervals to obtain abundance of genes/transcripts.

Deliverables:
- Raw gene/transcript counts
Tools Used:
- HTSeq-count: (Anders 2014) used to count reads overlapping gene intervals.
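
The counting step can be sketched as a simple interval-overlap problem. The Python below is a simplified illustration, not HTSeq-count itself: a read is assigned to a gene only if it overlaps exactly one gene, and is otherwise tallied as ambiguous or as hitting no feature. The function name, the tuple layouts, the half-open coordinates, and the example genes are assumptions; strand and spliced alignments are ignored.

```python
def count_reads(genes, reads):
    """Count reads per gene.

    genes: list of (gene_id, chrom, start, end), half-open intervals
    reads: list of (chrom, start, end), half-open intervals
    """
    counts = {gene_id: 0 for gene_id, _chrom, _start, _end in genes}
    counts["no_feature"] = 0
    counts["ambiguous"] = 0
    for chrom, start, end in reads:
        # Two half-open intervals overlap when each starts before the other ends.
        hits = {
            gene_id
            for gene_id, g_chrom, g_start, g_end in genes
            if g_chrom == chrom and start < g_end and g_start < end
        }
        if not hits:
            counts["no_feature"] += 1
        elif len(hits) == 1:
            counts[hits.pop()] += 1
        else:
            counts["ambiguous"] += 1
    return counts


# Made-up annotation: geneA and geneB overlap on chr1 between 180 and 200.
GENES = [("geneA", "chr1", 100, 200), ("geneB", "chr1", 180, 300), ("geneC", "chr2", 0, 50)]
READS = [
    ("chr1", 110, 160),  # geneA only
    ("chr1", 170, 190),  # overlaps geneA and geneB
    ("chr1", 250, 290),  # geneB only
    ("chr1", 400, 450),  # no gene
    ("chr2", 10, 40),    # geneC only
    ("chr1", 120, 150),  # geneA only
]
print(count_reads(GENES, READS))
```

The linear scan over all genes per read keeps the sketch short; a real tool would use an indexed interval structure to handle millions of reads.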

5. DEG Identification

Normalization and statistical testing to identify differentially expressed genes (DEGs).

Deliverables:
- DEG summary, master file containing fold changes and p-values for every gene, and MA plots.
Tools Used:
- DESeq2: (Love 2014) used to perform normalization and test for differential expression using the negative binomial distribution.
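
For intuition about the normalization step, here is a minimal Python sketch of median-of-ratios size factors, the normalization approach described for DESeq2, together with the two coordinates of a point on an MA plot. It is not DESeq2 code and performs no statistical testing. The function names, the pseudocount, and the example counts are assumptions, and at least one gene is assumed to have nonzero counts in every sample.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping gene -> list of raw counts, one entry per sample.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if any(c == 0 for c in row):
            continue  # a zero count gives no finite log geometric mean
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def ma_point(norm_a, norm_b, pseudocount=1.0):
    """Return (A, M): mean log2 expression and log2 fold change of b versus a."""
    log_a = math.log2(norm_a + pseudocount)
    log_b = math.log2(norm_b + pseudocount)
    return (log_a + log_b) / 2, log_b - log_a


# Made-up counts: sample 2 was sequenced twice as deeply as sample 1.
COUNTS = {"g1": [10, 20], "g2": [100, 200], "g3": [5, 10], "g4": [0, 7]}
sf = size_factors(COUNTS)
normalized = {g: [c / s for c, s in zip(row, sf)] for g, row in COUNTS.items()}
print(sf)
print(ma_point(*normalized["g1"]))
```

In this example the second size factor is twice the first, so after dividing by the size factors a gene such as g1 shows a log2 fold change of zero: the apparent doubling was sequencing depth, not expression.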
