ChIP-seq peak annotation
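narrowPeak/broadPeak
narrowPeak and broadPeak are the two peak file formats that ENCODE offers for download for ChIP-seq data after the reads have been mapped to the human reference genome.
For the storage formats of other types of sequencing data, see FAQformat.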
narrowPeak
The data are stored in the following fields:
1. string chrom: "Reference sequence chromosome or scaffold"
2. uint chromStart: "Start position in chromosome"
3. uint chromEnd: "End position in chromosome"
4. string name: "Name given to a region (preferably unique). Use . if no name is assigned"
5. uint score: "Indicates how dark the peak will be displayed in the browser (0-1000) "
6. char[1] strand: "+ or - or . for unknown"
7. float signalValue: "Measurement of average enrichment for the region"
8. float pValue: "Statistical significance of signal value (-log10). Set to -1 if not used."
9. float qValue: "Statistical significance with multiple-test correction applied (FDR -log10). Set to -1 if not used."
10. int peak: "Point-source called for this peak; 0-based offset from chromStart. Set to -1 if no point-source called."
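As a minimal sketch of how these ten fields map onto a table, the base R code below reads headerless, tab-separated narrowPeak text and derives the absolute summit coordinate as chromStart + peak. The helper name `read_narrow_peak` is a hypothetical choice for illustration, and the two inline records simply reuse values from the ENCFF199BLM preview shown later in this post.

```r
# Column names taken from the narrowPeak field list above
narrow_peak_cols <- c("chrom", "chromStart", "chromEnd", "name", "score",
                      "strand", "signalValue", "pValue", "qValue", "peak")

# Hypothetical helper: forwards its arguments (a file path or text = ...) to read.table
read_narrow_peak <- function(...) {
  read.table(..., header = FALSE, sep = "\t",
             col.names = narrow_peak_cols, stringsAsFactors = FALSE)
}

# Two illustrative records in narrowPeak layout
example_lines <- c(
  "chr10\t100134639\t100134831\tPeak_7974\t178\t.\t4.41261\t6.68330\t2.96490\t132",
  "chr10\t100446376\t100446664\tPeak_4189\t244\t.\t5.13248\t8.41215\t4.37217\t165"
)
peaks_df <- read_narrow_peak(text = example_lines)

# The peak column is an offset from chromStart; -1 means no summit was called
peaks_df$summit <- ifelse(peaks_df$peak >= 0,
                          peaks_df$chromStart + peaks_df$peak,
                          NA)
print(peaks_df[, c("name", "chromStart", "peak", "summit")])
```

For a real file, the same helper would be called with a path, for example `read_narrow_peak("some_file.narrowPeak")`, assuming the file has no header row.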
broadPeak
The data are stored in the following fields:
1. string chrom: "Reference sequence chromosome or scaffold"
2. uint chromStart: "Start position in chromosome"
3. uint chromEnd: "End position in chromosome"
4. string name: "Name given to a region (preferably unique). Use . if no name is assigned"
5. uint score: "Indicates how dark the peak will be displayed in the browser (0-1000) "
6. char[1] strand: "+ or - or . for unknown"
7. float signalValue: "Measurement of average enrichment for the region"
8. float pValue: "Statistical significance of signal value (-log10). Set to -1 if not used."
9. float qValue: "Statistical significance with multiple-test correction applied (FDR -log10). Set to -1 if not used."
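The two layouts differ only in the tenth field: broadPeak has no peak (point-source) column. A small sketch, assuming a table has already been read with one column per field, can use the column count to tell them apart. The function name `peak_format` is hypothetical.

```r
# Field names for the nine-column broadPeak layout listed above
broad_peak_cols <- c("chrom", "chromStart", "chromEnd", "name", "score",
                     "strand", "signalValue", "pValue", "qValue")

# Hypothetical helper: guess the format from the number of columns
peak_format <- function(df) {
  switch(as.character(ncol(df)),
         "9"  = "broadPeak",
         "10" = "narrowPeak",
         "unknown")
}

# Toy tables with the right number of columns, contents are placeholders
nine_cols <- as.data.frame(matrix(0, nrow = 1, ncol = 9))
names(nine_cols) <- broad_peak_cols
ten_cols <- cbind(nine_cols, peak = -1)

print(peak_format(nine_cols))
print(peak_format(ten_cols))
```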
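The peak annotation below is demonstrated only on data in narrowPeak .bed format.
Example data
The demo data come from an ENCODE H3K9me3 ChIP-seq experiment; the sample ID is ENCFF199BLM.
Basic information about the sample: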
- Homo sapiens liver male adult (32 years)
- Target: H3K9me3
- Lab: Bing Ren, UCSD
- Project: Roadmap
A preview of the narrowPeak data:
## seqnames start end name score strand signalValue pValue
## 1 chr10 100134639 100134831 Peak_7974 178 . 4.41261 6.68330
## 2 chr10 100446376 100446664 Peak_4189 244 . 5.13248 8.41215
## 3 chr10 100779568 100779699 Peak_32369 101 . 3.34436 4.50936
## 4 chr10 10088147 10088346 Peak_28509 112 . 4.05830 4.84890
## 5 chr10 101149252 101149594 Peak_6146 211 . 4.89252 7.53305
## 6 chr10 101173156 101173424 Peak_32985 94 . 3.45278 4.33257
## qValue peak
## 1 2.96490 132
## 2 4.37217 165
## 3 1.47374 105
## 4 1.69566 90
## 5 3.67439 174
## 6 1.30993 237
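Because pValue and qValue are stored as -log10 values, a larger number means a more significant peak. The sketch below converts the qValue column of the six preview rows back to an FDR and keeps the peaks below a cutoff. The cutoff of 0.01 is an assumption chosen for illustration, not a value used in this post.

```r
# The six preview rows above, reduced to the columns needed here
preview <- data.frame(
  name   = c("Peak_7974", "Peak_4189", "Peak_32369",
             "Peak_28509", "Peak_6146", "Peak_32985"),
  pValue = c(6.68330, 8.41215, 4.50936, 4.84890, 7.53305, 4.33257),
  qValue = c(2.96490, 4.37217, 1.47374, 1.69566, 3.67439, 1.30993),
  stringsAsFactors = FALSE
)

# Undo the -log10 transform: FDR = 10^(-qValue)
preview$fdr <- 10^(-preview$qValue)

# Illustrative cutoff (assumption): keep peaks with FDR <= 0.01, i.e. qValue >= 2
fdr_cutoff <- 0.01
strong <- preview[preview$fdr <= fdr_cutoff, ]
print(strong)
```

Filtering on the stored column directly (`qValue >= 2`) is equivalent and avoids the conversion.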
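Building the annotation file
Peak annotation needs a reference annotation file that contains the following information:
1. chromosome name
2. transcript start site
3. transcript end site
4. gene name
5. strand (forward or reverse)
The annotation file can be imported directly from an external source or generated with biomaRt.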
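library(ChIPpeakAnno)
library(biomaRt)
# Connect to Ensembl and select the human gene dataset
mart <- useMart("ensembl")
datasets <- listDatasets(mart)  # optional: table of the available datasets
mart <- useDataset("hsapiens_gene_ensembl", mart)
# Attributes to retrieve
props <- c("ensembl_gene_id", "external_gene_name", "transcript_biotype", "chromosome_name", "start_position", "end_position", "strand")
# Restrict to chromosomes 1-22, X and Y, then keep only lincRNA records
# (recent Ensembl releases may label this biotype "lncRNA" instead)
lincRNA <- subset(
  getBM(attributes = props, mart = mart, filters = "chromosome_name", values = c(1:22, "X", "Y")),
  transcript_biotype == "lincRNA"
)
# Prefix the chromosome names with "chr" so they match the narrowPeak data
lincRNA[, "chromosome_name"] <- paste0("chr", lincRNA[, "chromosome_name"])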
The resulting data contain the following information:
## ensembl_gene_id external_gene_name transcript_biotype chromosome_name
## 1 ENSG00000276255 RP5-881P19.7 lincRNA chr1
## 2 ENSG00000234277 LINC01641 lincRNA chr1
## 3 ENSG00000238107 RP11-495P10.5 lincRNA chr1
## 4 ENSG00000274020 LINC01138 lincRNA chr1
## 5 ENSG00000225620 RP11-569A11.2 lincRNA chr1
## 6 ENSG00000237520 RP11-443B7.2 lincRNA chr1
## start_position end_position strand
## 1 228073909 228076550 -1
## 2 227393591 227431035 1
## 3 148295180 148297556 1
## 4 148290889 148519604 -1
## 5 202632428 202632911 1
## 6 234957231 234959989 1
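In this table the strand is coded as 1 or -1, and which end of the interval counts as the TSS depends on it: the start on the forward strand, the end on the reverse strand. A minimal base R sketch using the first three rows of the output above:

```r
# First three rows of the annotation table shown above
anno_preview <- data.frame(
  ensembl_gene_id = c("ENSG00000276255", "ENSG00000234277", "ENSG00000238107"),
  start_position  = c(228073909, 227393591, 148295180),
  end_position    = c(228076550, 227431035, 148297556),
  strand          = c(-1, 1, 1),
  stringsAsFactors = FALSE
)

# TSS: start on the forward strand (1), end on the reverse strand (-1)
anno_preview$tss <- ifelse(anno_preview$strand == 1,
                           anno_preview$start_position,
                           anno_preview$end_position)

# The "+" / "-" notation used by BED-like formats
anno_preview$strand_symbol <- ifelse(anno_preview$strand == 1, "+", "-")
print(anno_preview)
```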
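Running the peak annotation
With this annotation file in hand, we can annotate the peaks according to whatever rule we want. The main package used is ChIPpeakAnno, and the function is annotatePeakInBatch.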
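# Convert the annotation table built above into a RangedData object
# (RangedData is deprecated in recent Bioconductor releases, where GRanges replaces it)
library(ChIPpeakAnno)
myCustomAnno <- RangedData(
  IRanges(
    start = lincRNA[, "start_position"],
    end = lincRNA[, "end_position"],
    names = lincRNA[, "ensembl_gene_id"]),
  space = lincRNA[, "chromosome_name"],
  strand = lincRNA[, "strand"])
# Read the narrowPeak data to annotate (header = TRUE assumes the file has a header row)
bed_file <- read.table("ENCFF199BLM.bed", header = TRUE, sep = "\t", stringsAsFactors = FALSE)
# Convert the peak data to GRanges
peaks <- toGRanges(bed_file, format = "narrowPeak", colNames = colnames(bed_file))
# Annotate as needed; here: overlapping peaks, location measured from the TSS,
# binding region c(-2000, 2000)
anno <- annotatePeakInBatch(peaks, AnnotationData = myCustomAnno,
                            output = "overlapping",
                            FeatureLocForDistance = "TSS",
                            bindingRegion = c(-2000, 2000))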
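To make the idea of an overlap with a window around the TSS concrete, here is a base R sketch on toy data. It illustrates the interval logic only and is not ChIPpeakAnno's implementation. All feature and peak names and coordinates are hypothetical, and the symmetric window of 2000 bp mirrors the c(-2000, 2000) setting above.

```r
# Hypothetical features (genes) and peaks
features <- data.frame(
  id     = c("geneA", "geneB"),
  chrom  = c("chr1", "chr1"),
  start  = c(10000, 50000),
  end    = c(12000, 53000),
  strand = c("+", "-"),
  stringsAsFactors = FALSE
)
peaks_toy <- data.frame(
  name  = c("p1", "p2", "p3"),
  chrom = c("chr1", "chr1", "chr2"),
  start = c(8500, 54500, 9000),
  end   = c(8700, 54800, 9500),
  stringsAsFactors = FALSE
)

# TSS is the start on "+" and the end on "-"; build a window of 2000 bp on each side
tss <- ifelse(features$strand == "+", features$start, features$end)
features$win_start <- tss - 2000
features$win_end   <- tss + 2000

# Two intervals overlap when each one starts before the other ends
overlaps <- do.call(rbind, lapply(seq_len(nrow(peaks_toy)), function(i) {
  p <- peaks_toy[i, ]
  keep <- features$chrom == p$chrom &
    p$start <= features$win_end &
    p$end >= features$win_start
  if (!any(keep)) return(NULL)
  data.frame(peak = p$name, feature = features$id[keep],
             stringsAsFactors = FALSE)
}))
print(overlaps)
```

Here p1 falls in the window upstream of geneA, p2 falls in the window around the reverse-strand TSS of geneB, and p3 is on a chromosome with no features, so it gets no annotation.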
Extract the lincRNA IDs we need from anno:
result_lincRNA <- anno@elementMetadata$feature
head(result_lincRNA)
## [1] "ENSG00000237579" "ENSG00000235180" "ENSG00000232259" "ENSG00000204365"
## [5] "ENSG00000226578" "ENSG00000260137"
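Several peaks can hit the same feature, so the ID vector may contain duplicates, and it holds Ensembl IDs rather than gene names. The sketch below removes duplicates and looks the names up in the annotation table. The `result_ids` vector is a made-up example, and the lookup rows reuse two entries from the annotation output shown earlier.

```r
# Hypothetical annotation result containing a repeated ID
result_ids <- c("ENSG00000234277", "ENSG00000274020", "ENSG00000234277")

# Two rows from the annotation table built with biomaRt above
lookup <- data.frame(
  ensembl_gene_id    = c("ENSG00000234277", "ENSG00000274020"),
  external_gene_name = c("LINC01641", "LINC01138"),
  stringsAsFactors = FALSE
)

# Drop duplicates, then map each ID to its gene name
unique_ids <- unique(result_ids)
gene_names <- lookup$external_gene_name[match(unique_ids, lookup$ensembl_gene_id)]
print(data.frame(id = unique_ids, name = gene_names))
```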