http://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/tcgaBiolinks.html#gdcquery:_searching_tcga_open-access_data

Example:

Working with TCGAbiolinks package

Antonio Colaprico, Tiago Chedraoui Silva, Luciano Garofano, Catharina Olsen, Davide Garolini, Claudia Cava, Isabella Castiglioni, Thais Sarraf Sabedot, Tathiane Maistro Malta, Stefano Pagnotta, Michele Ceccarelli, Gianluca Bontempi, Houtan Noushmehr

2017-12-06

Updates

Recently the TCGA data has been moved from the DCC server to The National Cancer Institute (NCI) Genomic Data Commons (GDC) Data Portal. In this version of the package, we rewrote all the functions that accessed the old TCGA server so that they now access the GDC.

The GDC, which receives, processes, harmonizes, and distributes clinical, biospecimen, and genomic data from multiple cancer research programs, has data from the following programs:

  • The Cancer Genome Atlas (TCGA)
  • Therapeutically Applicable Research to Generate Effective Treatments (TARGET)
  • The Cancer Genome Characterization Initiative (CGCI)

The big change is that the GDC data is harmonized against GRCh38. However, not all data has been harmonized yet. The old TCGA data can be accessed through the GDC Legacy Archive, where most of the data can be found.

More information about the project can be found in the GDC FAQs.

The functions TCGAquery, TCGAdownload, TCGAPrepare, TCGAquery_maf and TCGAquery_clinical were replaced by GDCquery, GDCdownload, GDCprepare, GDCquery_maf and GDCquery_clinical.

The new functions can access both the GDC and the GDC Legacy Archive.

Note: Not all the examples in this vignette were updated.

Introduction

Motivation: The Cancer Genome Atlas (TCGA) provides us with an enormous collection of data sets, spanning not only a large number of cancers but also a large number of experimental platforms. Although the data can be accessed and downloaded from the database, it has not yet been possible to analyse the downloaded data directly within a single R package.

TCGAbiolinks consists of three parts or levels. First, we provide different options to query and download relevant TCGA data from all current platforms, and to pre-process them for commonly used bioinformatics packages in Bioconductor or CRAN. Second, the package allows users to integrate different data types, to run different types of analyses across all platforms (such as differential expression, network inference or survival analysis), and to visualize the results. Third, we added a social level where researchers can find others with similar interests in the bioinformatics community, look for validation of their results in the PubMed literature, and retrieve questions and answers from sites such as support.bioconductor.org, biostars.org and Stack Overflow.

This document describes how to search, download and analyze TCGA data using the TCGAbiolinks package.

Installation

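To install the package, use the code below.

# Install TCGAbiolinks from Bioconductor through BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("TCGAbiolinks")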

For a Graphical User Interface, please see TCGAbiolinksGUI. The GUI is under review and will soon be available in the Bioconductor repository.

Citation

Please cite the TCGAbiolinks package:

  • “TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data.” Nucleic acids research (2015): gkv1507. (Colaprico, Antonio and Silva, Tiago C. and Olsen, Catharina and Garofano, Luciano and Cava, Claudia and Garolini, Davide and Sabedot, Thais S. and Malta, Tathiane M. and Pagnotta, Stefano M. and Castiglioni, Isabella and Ceccarelli, Michele and Bontempi, Gianluca and Noushmehr, Houtan 2016)

Publications related to this package:

  • “TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages”. F1000Research 10.12688/f1000research.8923.1 (Silva, TC and Colaprico, A and Olsen, C and D’Angelo, F and Bontempi, G and Ceccarelli, M and Noushmehr, H 2016)

Also, if you have used ELMER analysis, please cite:

  • Yao, L., Shen, H., Laird, P. W., Farnham, P. J., & Berman, B. P. “Inferring regulatory element landscapes and transcription factor networks from cancer methylomes.” Genome Biol 16 (2015): 105.
  • Yao, Lijing, Benjamin P. Berman, and Peggy J. Farnham. “Demystifying the secret mission of enhancers: linking distal regulatory elements to target genes.” Critical reviews in biochemistry and molecular biology 50.6 (2015): 550-573.
 

GDCquery: Searching TCGA open-access data

 

GDCquery: Searching GDC data for download

You can easily search GDC data using the GDCquery function.

Using filters similar to those of the TCGA portal, the function accepts the following arguments:

  • project: A list of valid projects (see table below)
  • data.category: A valid data category (see the list with getProjectSummary(project))
  • data.type: A data type to filter the files to download
  • sample.type: A sample type to filter the files to download (see table below)
  • workflow.type: GDC workflow type
  • barcode: A list of barcodes to filter the files to download
  • legacy: Search in the legacy repository? Default: FALSE
  • platform: Experimental data platform (e.g. HumanMethylation450, AgilentG4502A_07). Used only for the legacy repository
  • file.type: A string to filter files based on their names. Used only for the legacy repository
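The list above marks platform and file.type as legacy-only arguments. As a minimal sketch, assuming that rule holds for your TCGAbiolinks version, a hypothetical helper check_legacy_args() (not part of the package) can flag those arguments before a query is built:

```r
# Hypothetical helper, not part of TCGAbiolinks: warns when arguments that the
# argument list describes as legacy-only are supplied while legacy = FALSE.
check_legacy_args <- function(legacy = FALSE, platform = NULL, file.type = NULL) {
  # TRUE for each legacy-only argument that was actually supplied
  supplied <- c(platform = !is.null(platform), file.type = !is.null(file.type))
  misused <- names(supplied)[supplied & !legacy]
  if (length(misused) > 0) {
    warning("Used only with legacy = TRUE: ", paste(misused, collapse = ", "),
            call. = FALSE)
  }
  # Return the offending argument names invisibly (character(0) if none)
  invisible(misused)
}
```

For example, check_legacy_args(platform = "Illumina HiSeq") emits a warning, while check_legacy_args(legacy = TRUE, platform = "Illumina HiSeq") returns an empty character vector.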

The next subsections will detail each of the search arguments. Below, we show some search examples:

library(TCGAbiolinks)

#---------------------------------------------------------------
# For available entries and combinations, please see the table below
#---------------------------------------------------------------

# Gene expression aligned against hg38
query <- GDCquery(project = "TARGET-AML",
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification",
                  workflow.type = "HTSeq - Counts")

# All DNA methylation data for TCGA-GBM and TCGA-LGG
query.met <- GDCquery(project = c("TCGA-GBM", "TCGA-LGG"),
                      legacy = TRUE,
                      data.category = "DNA methylation",
                      platform = c("Illumina Human Methylation 450",
                                   "Illumina Human Methylation 27"))

# Using sample type to get only primary solid tumor and solid tissue normal samples
query.mirna <- GDCquery(project = "TCGA-ACC",
                        data.category = "Transcriptome Profiling",
                        data.type = "miRNA Expression Quantification",
                        sample.type = c("Primary solid Tumor", "Solid Tissue Normal"))

# Using legacy to access hg19 data, filtering by barcode
query <- GDCquery(project = "TCGA-GBM",
                  data.category = "DNA methylation",
                  platform = "Illumina Human Methylation 27",
                  legacy = TRUE,
                  barcode = c("TCGA-02-0047-01A-01D-0186-05",
                              "TCGA-06-2559-01A-01D-0788-05"))

# Gene expression aligned against hg19
query.exp.hg19 <- GDCquery(project = "TCGA-GBM",
                           data.category = "Gene expression",
                           data.type = "Gene expression quantification",
                           platform = "Illumina HiSeq",
                           file.type = "normalized_results",
                           experimental.strategy = "RNA-Seq",
                           barcode = c("TCGA-14-0736-02A-01R-2005-01",
                                       "TCGA-06-0211-02A-02R-2005-01"),
                           legacy = TRUE)

# Searching for IDAT files of DNA methylation arrays
query <- GDCquery(project = "TCGA-OV",
                  data.category = "Raw microarray data",
                  data.type = "Raw intensities",
                  experimental.strategy = "Methylation array",
                  legacy = TRUE,
                  file.type = ".idat",
                  platform = "Illumina Human Methylation 450")
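The barcodes used in these examples carry the sample type in the two digits that open their fourth field (01 is a primary solid tumor, 02 a recurrent solid tumor, 11 solid tissue normal). As a minimal base-R sketch, a hypothetical helper decode_sample_type() (not part of TCGAbiolinks) with a deliberately partial code table can decode them locally, for instance to check which sample types a barcode list covers before passing it to barcode:

```r
# Hypothetical helper, not part of TCGAbiolinks. Only a few common TCGA
# sample-type codes are listed; any other code decodes to NA.
sample_type_codes <- c(
  "01" = "Primary Solid Tumor",
  "02" = "Recurrent Solid Tumor",
  "03" = "Primary Blood Derived Cancer - Peripheral Blood",
  "06" = "Metastatic",
  "10" = "Blood Derived Normal",
  "11" = "Solid Tissue Normal"
)

decode_sample_type <- function(barcodes) {
  # The fourth dash-separated field holds the two-digit sample code
  # followed by the vial letter, e.g. "01A"
  fields <- strsplit(barcodes, "-", fixed = TRUE)
  codes <- vapply(fields, function(f) substr(f[4], 1, 2), character(1))
  types <- unname(sample_type_codes[codes])
  data.frame(barcode = barcodes, code = codes, sample.type = types,
             stringsAsFactors = FALSE)
}

res <- decode_sample_type(c("TCGA-02-0047-01A-01D-0186-05",
                            "TCGA-14-0736-02A-01R-2005-01"))
# res$sample.type is c("Primary Solid Tumor", "Recurrent Solid Tumor")
```

The lookup is a plain named vector, so extending it with further codes from the official TCGA code tables only requires adding entries.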
