Chromosome coordinate systems: 0-based, 1-based
From:
https://arnaudceol.wordpress.com/2014/09/18/chromosome-coordinate-systems-0-based-1-based/
I had a hard time figuring out that different websites and file formats use different systems to represent genome coordinates.
Basically, the bases can be numbered in two ways: starting at 0 or starting at 1. These are the 0-based and 1-based coordinate systems.
0-based:
ACTGACTG
01234567
1-based:
ACTGACTG
12345678
A range is then called inclusive if the end index is part of the sequence, and exclusive (or half-open) if it is not.
For instance, to represent the subsequence TGAC of ACTGACTG:
0-based inclusive: 2-5
0-based exclusive: 2-6
1-based inclusive: 3-6
1-based exclusive: 3-7
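As a quick check of the notations above, here is a minimal Python sketch that extracts TGAC from ACTGACTG under each convention. The helper names are made up for this illustration; the only thing taken for granted is that Python string slices are themselves 0-based with an exclusive end.

```python
SEQ = "ACTGACTG"

def slice_0_inclusive(seq, start, end):
    """Bases numbered from 0; the end index is part of the range."""
    return seq[start:end + 1]

def slice_0_exclusive(seq, start, end):
    """Bases numbered from 0; the end index is not part of the range."""
    return seq[start:end]

def slice_1_inclusive(seq, start, end):
    """Bases numbered from 1; the end index is part of the range."""
    return seq[start - 1:end]

def slice_1_exclusive(seq, start, end):
    """Bases numbered from 1; the end index is not part of the range."""
    return seq[start - 1:end - 1]

print(slice_0_inclusive(SEQ, 2, 5))  # TGAC
print(slice_0_exclusive(SEQ, 2, 6))  # TGAC
print(slice_1_inclusive(SEQ, 3, 6))  # TGAC
print(slice_1_exclusive(SEQ, 3, 7))  # TGAC
```

All four calls describe the same four bases; only the numbers written down differ.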
I have tried to work out which coordinate system each website or application uses.
The results can be found below. For each source, I provide the URL of the
reference page where I found the information, and a quotation in which the
system is described.
I found most of those links on Biostar (https://www.biostars.org/p/6373/) and on the blog of Casey M. Bergman (http://bergmanlab.smith.man.ac.uk/?p=36), who also wrote an article on this topic: https://www.landesbioscience.com/journals/mge/article/19479/.
- Ensembl: 1-based inclusive, ref: http://www.ensembl.org/Help/Faq?id=286, http://www.ensembl.org/info/docs/api/core/core_tutorial.html
Ensembl, and many other bioinformatics applications, use inclusive
coordinates which start at 1. The first nucleotide of a DNA sequence is
1 and the first amino acid of a protein sequence is also 1. The length
of a sequence is defined as end - start + 1. (Ensembl GTF format = GFF2 = 1-based)
- UCSC: internal representation: 0-based start and 1-based end (that is, 0-based with an exclusive end), display: 1-based, ref: http://www.ensembl.org/Help/Faq?id=286, http://genome.ucsc.edu/FAQ/FAQtracks.html#tracks1
“I am confused about the start coordinates for items in the refGene
table. It looks like you need to add “1” to the starting point in order
to get the same start coordinate as is shown by the Genome Browser. Why
is this the case?”
Our internal database representations of coordinates always have a
zero-based start and a one-based end. We add 1 to the start before
displaying coordinates in the Genome Browser. Therefore, they appear as
one-based start, one-based end in the graphical display. The refGene.txt file is a database file, and consequently is based on the internal representation.
We use this particular internal representation because it
simplifies coordinate arithmetic, i.e. it eliminates the need to add or
subtract 1 at every step. Unfortunately, it does create some confusion
when the internal representation is exposed or when we forget to add 1
before displaying a start coordinate. However, it saves us from much
trickier bugs. If you use a database dump file but would prefer to see
the one-based start coordinates, you will always need to add 1 to each
start coordinate.
If you submit data to the browser in position format
(chr#:##-##), the browser assumes this information is 1-based. If you
submit data in any other format (BED (chr# ## ##) or otherwise), the
browser will assume it is 0-based. You can see this both in our liftOver
utility and in our search bar, by entering the same numbers in position
or BED format and observing the results. Similarly, any data returned
by the browser in position format is 1-based, while data returned in BED
format is 0-based.
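The position/BED distinction described in the quotation can be sketched in a few lines of Python. This is only an illustration under the convention quoted above (position strings are 1-based inclusive, BED triples are 0-based with an exclusive end); the function names are hypothetical and are not part of any UCSC tool.

```python
def position_to_bed(position):
    """Convert 'chr1:11-20' (1-based, inclusive) to ('chr1', 10, 20).

    The result follows the BED convention: 0-based start, exclusive end.
    Thousands separators such as 'chr1:1,000-2,000' are tolerated.
    """
    chrom, span = position.split(":")
    start, end = span.replace(",", "").split("-")
    # Only the start shifts: the 1-based inclusive end and the
    # 0-based exclusive end are the same number.
    return chrom, int(start) - 1, int(end)

def bed_to_position(chrom, chrom_start, chrom_end):
    """Convert a BED triple back to a 1-based inclusive position string."""
    return f"{chrom}:{chrom_start + 1}-{chrom_end}"

print(position_to_bed("chr1:11-20"))      # ('chr1', 10, 20)
print(bed_to_position("chr1", 10, 20))    # chr1:11-20
```

Note that only the start coordinate changes in either direction, which is the "add 1 to each start coordinate" rule from the FAQ.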
- BED format: 0-based exclusive, ref: http://genome.ucsc.edu/FAQ/FAQformat.html#format1
BED format uses zero-based, half-open
coordinates, so the first 25 bases of a sequence are in the range 0-25
(those bases being numbered 0 to 24)
The first three required BED fields are chrom, chromStart and chromEnd.
chromEnd: the ending position of the feature in the chromosome or scaffold. The
chromEnd base is not included in the display of the feature. For
example, the first 100 bases of a chromosome are defined as
chromStart=0, chromEnd=100, and span the bases numbered 0-99.
- MAF: 1-based inclusive, ref: https://wiki.nci.nih.gov/display/TCGA/Mutation+Annotation+Format+%28MAF%29+Specification+-+v2.4
start: Lowest numeric position of the reported variant on the genomic reference sequence. Mutation
start coordinate (1-based coordinate system), end: Highest numeric
genomic position of the reported variant on the genomic reference
sequence. Mutation end coordinate (inclusive, 1-based coordinate
system).
- Other 1-based coordinate formats: SAM, VCF, GFF and Wiggle, ref: http://samtools.github.io/hts-specs/SAMv1.pdf
- Other 0-based coordinate formats: BAM, BCFv2 and PSL, ref: http://samtools.github.io/hts-specs/SAMv1.pdf
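The two length formulas that follow from the quotations above (end - start + 1 for 1-based inclusive ranges, end - start for 0-based ranges with an exclusive end) can be compared with a small Python sketch. The function names are invented for this example, and the numbers are the "first 100 bases" case from the BED description.

```python
def length_inclusive(start, end):
    """Length of a 1-based inclusive range (Ensembl, GFF style)."""
    return end - start + 1

def length_half_open(start, end):
    """Length of a 0-based range with an exclusive end (BED style)."""
    return end - start

def half_open_to_inclusive(start, end):
    """0-based, exclusive end -> 1-based, inclusive end."""
    return start + 1, end

def inclusive_to_half_open(start, end):
    """1-based, inclusive end -> 0-based, exclusive end."""
    return start - 1, end

# First 100 bases of a chromosome, as in the BED description.
bed = (0, 100)
one_based = half_open_to_inclusive(*bed)

print(one_based)                       # (1, 100)
print(length_half_open(*bed))          # 100
print(length_inclusive(*one_based))    # 100
```

The half-open form needs no +1 when computing a length, which is the kind of arithmetic simplification the UCSC FAQ refers to.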