FASTQ format

每个FASTQ文件中每个序列通常有四行信息：

1：以 '@' 字符开头，后面紧接着的是序列标识符和可选字段的描述(类似FASTA title line).

2：序列

3：以 '+' 字符开头，后面紧接着的是可选字段的描述性信息

4：第二行序列的质量信息

Illumina sequence identifiers

@HWUSI-EAS100R:6:73:941:1973#0/1

sequence identifiers	description
HWUSI-EAS100R	the unique instrument name
6	flowcell lane
73	tile number within the flowcell lane
941	'x'-coordinate of the cluster within the tile
1973	'y'-coordinate of the cluster within the tile
#0	index number for a multiplexed sample (0 for no indexing)
/1	the member of a pair, /1 or /2 (paired-end or mate-pair reads only)

Versions of the Illumina pipeline since 1.4 appear to use #NNNNNN instead of #0 for the multiplex ID, where NNNNNN is the sequence of the multiplex tag.

With Casava 1.8 the format of the '@' line has changed:

@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG

sequence identifiers	description
EAS139	the unique instrument name
136	the run id
FC706VJ	the flowcell id
2	flowcell lane
2104	tile number within the flowcell lane
15343	'x'-coordinate of the cluster within the tile
197393	'y'-coordinate of the cluster within the tile
1	the member of a pair, 1 or 2 (paired-end or mate-pair reads only)
Y	Y if the read is filtered, N otherwise
18	0 when none of the control bits are on, otherwise it is an even number(偶数)
ATCACG	index sequence

将FASTQ 转换为 FASTA 格式:

zcat input_file.fastq.gz | awk 'NR%4==1{printf ">%s\n", substr($0,2)}NR%4==2{print}' > output_file.fa

#printf 命令的语法：format-string 为格式控制字符串，arguments 为参数列表。

printf  format-string  [arguments...]

#substr(s,p) 返回字符串s中从p开始的后缀部分

#substr(s,p,n) 返回字符串s中从p开始长度为n的后缀部分。

FASTQ format的更多相关文章

怎么检测自己fastq的Phred类型 | phred33 phred64
http://wiki.bits.vib.be/index.php/Identify_the_Phred_scale_of_quality_scores_used_in_fastQ # S - San ...
Quality assessment and quality control of NGS data
http://www.molecularevolution.org/resources/activities/QC_of_NGS_data_activity_new table of contents ...
Canu Tutorial（canu指导手册）
链接:Canu Tutorial Canu assembles reads from PacBio RS II or Oxford Nanopore MinION instruments into u ...
het smooth 组装高杂合度二倍体基因组前期数据处理
http://sourceforge.net/projects/het-smooth/ equencing technologies, such as Illumina sequencing, pro ...
去除reads中的pcr 重复，fastquniq
改编: python ~/tools2assemble/run_fastuniq.py SHT-3K-1_1.fq.gz SHT-3K-1_2.fq.gz 好像不支持gz文件,要先解压 http:// ...
Question: Should I use reads with good quality but failed-vendor flag?--biostart for vendor quality
https://www.biostars.org/p/198405/ Quick question is: I have some mapped reads in bam file which hav ...
<二代測序> 下载 NCBI sra 文件
本文近期更新地址: http://blog.csdn.net/tanzuozhev/article/details/51077222 随着測序技术的不断提高.二代測序数据成指数增长. NCBI提供了S ...
利用Bioperl的SeqIO模块解析fastq文件
测序数据中经常会接触到fastq格式的文件,比如说拿到fastq格式的原始数据后希望查看测序碱基的质量并去除低质量碱基.一般而言大家都是用现有的工具,比如说fastqc这个Java写的小程序,确实很好 ...
fasta/fastq格式解读
1)知识简介--------------------------------------------------------1.1)测序质量值首先在了解fastq,fasta之前,了解一下什么是质量 ...

随机推荐

Servlet------>servletDemo 及细节注意
原理图: 前提:我用的命令行都是mac系统下用的,非win jsp实质是一个servlet,所以要先了解servlet,如上页面是一个servletdemo,下面是尝试的步骤 1.先写好Demo.ja ...
git学习——<四>git版本管理
一.git版本管理的优势都说git比svn强大,强大在哪呢? 首先,从部署上说:svn.cvs都是集中式的,一台服务器上部署服务,所有客户端编写的代码都要提交到该服务器上.git是分布式的,所有人都 ...
scrapy之中间件
中间件的简介 1.中间件的作用在scrapy运行的整个过程中,对scrapy框架运行的某些步骤做一些适配自己项目的动作. 例如scrapy内置的HttpErrorMiddleware,可以在http ...
understand EntityManager.joinTransaction()
Join Transaction The EntityManager.joinTransaction() API allows an application managed EntityManager ...
Python之配置文件读写
ConfigParser模块一.创建配置文件在D盘建立一个配置文件,名字为:test.ini 内容如下: [baseconf] host=127.0.0.1 port=3306 user=root ...
mysql5.7新特性探究
一.MySql5.7增加的特性 1.MySql服务方面新特性 1) 初始化方式改变 MySql5.7之前版本初始化方式: scripts/mysql_install_db MySql5.7版本初始化方 ...
基于Flume+Kafka+ Elasticsearch+Storm的海量日志实时分析平台（转）
0背景介绍随着机器个数的增加.各种服务.各种组件的扩容.开发人员的递增,日志的运维问题是日渐尖锐.通常,日志都是存储在服务运行的本地机器上,使用脚本来管理,一般非压缩日志保留最近三天,压缩保留最近1 ...
pytorch rnn
温习一下,写着玩. import torch import torch.nn as nn import numpy as np import torch.optim as optim class RN ...
关于/proc/进程idpid/fd ,根据fd来查找连接
当创建好epoll句柄后,它就是会占用一个fd值,在linux下如果查看/proc/进程id/fd/,是能够看到这个fd的,所以在使用完epoll后,必须调用close()关闭,否则可能导致fd被耗尽 ...
Linux系统——本地yum仓库安装
一.yum仓库概述 yum是基于rpm包管理,能够从指定的服务器自动下载rpm包并且安装,可以自动处理依赖性关系,并且一次安装所有依赖的软件包,无需繁琐地一次次下载.安装. 二.yum仓库安装的方式 ...

FASTQ format

将FASTQ 转换为 FASTA 格式:

FASTQ format的更多相关文章

随机推荐

热门专题