Multiple Sequence Alignment Benchmark Data Set

1. Summary: standard benchmark data sets for sequence alignment: http://www.drive5.com/bench/

This is a collection of multiple alignment benchmarks in a uniform
format that is convenient for further analysis. All files are in
FASTA format, with upper-case letters used to indicate aligned
columns.
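
As a rough illustration of the upper-case convention, the sketch below parses FASTA text and lists the columns that would count as aligned. The helper names (read_fasta, aligned_columns) are hypothetical, and the rule used here (a column counts if it holds at least one upper-case letter and no lower-case letter) is an assumption about how the convention is applied, not something taken from the benchmark tools.

```python
from typing import Dict, List


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {label: sequence}, keeping letter case."""
    records: Dict[str, List[str]] = {}
    label = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            label = line[1:].strip()
            records[label] = []
        elif label is not None:
            records[label].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def aligned_columns(alignment: Dict[str, str]) -> List[int]:
    """Indices of columns with at least one upper-case and no lower-case letter.

    Assumption: this is how "upper-case means aligned" is interpreted here.
    """
    rows = list(alignment.values())
    if len({len(row) for row in rows}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    keep = []
    for index, column in enumerate(zip(*rows)):
        has_upper = any(ch.isupper() for ch in column)
        has_lower = any(ch.islower() for ch in column)
        if has_upper and not has_lower:
            keep.append(index)
    return keep


# Made-up two-sequence alignment, for illustration only.
example = """>seq_a
mkVLA-Gt
>seq_b
--VLSTGs
"""
print(aligned_columns(read_fasta(example)))  # prints [2, 3, 4, 5, 6]
```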

See References below for original sources of benchmark data.

Benchmarks are:

--------------------------1---------------------------

bali2dna
BALIBASE v2, reverse-translated to DNA

bali2dnaf
bali2dna, with frame-shifts induced by random insertions of one
or two nucleotides into the middle 50% of exactly one sequence
in each set.

bali3
BALIBASE v3.

bali3pdb
BALIS, the structural subset of BALIBASE v3.

bali3pdbm
MU-BALIS, i.e. BALIS re-aligned by MUSTANG.

---------------------------2--------------------------

ox
OXBENCH.

oxm
MU-OXBENCH, i.e. OXBENCH re-aligned by MUSTANG.

oxx
OXBENCH-X, i.e. the Extended set in OXBENCH.

---------------------------3--------------------------

prefab4
PREFAB v4.

prefab4ref
PREFAB-R, i.e. the pair-wise reference pairs in PREFAB v4.

prefab4refm
MU-PREFAB-R, i.e. PREFAB-R re-aligned by MUSTANG.

---------------------------4--------------------------

sabre
Consistent multiple alignments constructed from SABMARK v1.65.

sabrem
MU-SABRE, i.e. SABRE re-aligned by MUSTANG.

-----------------------------------------------------
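
The frame-shift procedure described for bali2dnaf above can be sketched as follows. This is only an illustration under stated assumptions: the function name induce_frameshift is hypothetical, a single insertion event is made, "middle 50%" is read as positions between one quarter and three quarters of the sequence length, and the inserted nucleotides are drawn uniformly from ACGT. The original data set may have been built differently.

```python
import random
from typing import Dict, Tuple


def induce_frameshift(
    seqs: Dict[str, str], rng: random.Random
) -> Tuple[Dict[str, str], str, int, str]:
    """Insert one or two random nucleotides into exactly one sequence.

    Returns (new_seqs, label, position, inserted). The input dict is not
    modified. Assumption: the insertion point lies in the middle 50% of
    the chosen sequence, i.e. between len/4 and 3*len/4.
    """
    label = rng.choice(sorted(seqs))
    seq = seqs[label]
    n = len(seq)
    position = rng.randint(n // 4, (3 * n) // 4)
    count = rng.choice((1, 2))  # one or two nucleotides breaks the frame
    inserted = "".join(rng.choice("ACGT") for _ in range(count))
    new_seqs = dict(seqs)
    new_seqs[label] = seq[:position] + inserted + seq[position:]
    return new_seqs, label, position, inserted


# Made-up input set, for illustration only.
rng = random.Random(0)
original = {"s1": "ATGGCCAAGTTT", "s2": "ATGGCAAAGTTC"}
shifted, label, position, inserted = induce_frameshift(original, rng)
print(label, position, inserted, shifted[label])
```

Passing a seeded random.Random instance keeps the result reproducible, which matters if the shifted sets are to be regenerated later.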

Directory structure under each benchmark is:

in/
Input sequences.

ref/
Reference alignments. Upper-case regions indicate conservative
regions that are intended for use in assessment. Lower-case regions
should not be used.

info/
Contains ids.txt (list of set identifiers that are filenames in ref/
and in/), nrseqs.txt (number of sequences in each set), and
pctids.txt (%id in conservative regions in each set).
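
A minimal sketch of walking this layout is shown below. It assumes that ids.txt holds one set identifier per line and that each identifier is used unchanged as the filename in both in/ and ref/; the function name iter_sets and the example path are hypothetical.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_sets(bench_dir: str) -> Iterator[Tuple[str, Path, Path]]:
    """Yield (set_id, input_path, reference_path) for one benchmark.

    Assumption: info/ids.txt lists one identifier per line, and the
    identifier is the filename under both in/ and ref/.
    """
    root = Path(bench_dir)
    ids_file = root / "info" / "ids.txt"
    for line in ids_file.read_text().splitlines():
        set_id = line.strip()
        if not set_id:
            continue  # skip blank lines
        yield set_id, root / "in" / set_id, root / "ref" / set_id


if __name__ == "__main__":
    # "bench/bali3" is a placeholder path for an unpacked benchmark.
    for set_id, input_path, reference_path in iter_sets("bench/bali3"):
        print(set_id, input_path, reference_path)
```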

Download page for qscore: http://www.drive5.com/bench/bench.tar.gz

qscore is a quality scoring program that compares two multiple sequence alignments: an alignment to be evaluated (the "test" alignment) and a second alignment that is believed to be correct (the "reference" alignment). The program outputs the following scores:
- The PREFAB Q score (also known as the Balibase SPS score or the Developer score).
- The Modeler score.
- The Cline et al. shift score.
- The Balibase TC (total column) score.
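
To make two of these scores concrete, the sketch below computes a Q score and a TC score for alignments held as {label: aligned sequence} dictionaries. It is not the qscore implementation. The definitions assumed here are: Q is the number of reference residue pairs that are also aligned in the test alignment, divided by the number of residue pairs in the reference; TC is the fraction of reference columns whose residues all share a single test column. Only reference columns without lower-case letters are scored, and both alignments are assumed to contain the same sequences under the same labels. The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Column = Dict[str, int]  # sequence label -> index of the residue in that sequence


def columns(alignment: Dict[str, str], upper_only: bool = False) -> List[Column]:
    """Convert an alignment into per-column maps of residue indices.

    With upper_only=True, columns holding a lower-case letter are dropped,
    but their residues still advance the per-sequence counters.
    """
    labels = sorted(alignment)
    length = len(alignment[labels[0]]) if labels else 0
    next_index = dict.fromkeys(labels, 0)
    result: List[Column] = []
    for c in range(length):
        column: Column = {}
        has_lower = False
        for label in labels:
            ch = alignment[label][c]
            if ch in "-.":
                continue  # gap character
            column[label] = next_index[label]
            next_index[label] += 1
            has_lower = has_lower or ch.islower()
        if not (upper_only and has_lower):
            result.append(column)
    return result


def pairs(cols: List[Column]) -> Set[Tuple[str, int, str, int]]:
    """All residue pairs that share a column."""
    found = set()
    for column in cols:
        for (a, i), (b, j) in combinations(sorted(column.items()), 2):
            found.add((a, i, b, j))
    return found


def q_and_tc(test: Dict[str, str], ref: Dict[str, str]) -> Tuple[float, float]:
    """Return (Q, TC) under the assumptions stated in the lead-in."""
    ref_cols = [c for c in columns(ref, upper_only=True) if len(c) >= 2]
    test_cols = columns(test)
    ref_pairs = pairs(ref_cols)
    q = len(ref_pairs & pairs(test_cols)) / len(ref_pairs) if ref_pairs else 0.0
    # Map every residue to the test column that contains it.
    where = {item: k for k, col in enumerate(test_cols) for item in col.items()}
    correct = sum(
        1 for col in ref_cols if len({where[item] for item in col.items()}) == 1
    )
    tc = correct / len(ref_cols) if ref_cols else 0.0
    return q, tc


# Made-up alignments, for illustration only.
reference = {"a": "ACGT", "b": "A-GT"}
candidate = {"a": "ACGT", "b": "AG-T"}
print(q_and_tc(candidate, reference))
```

In this toy case two of the three reference pairs and two of the three scored reference columns are reproduced, so both scores come out at 2/3.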


BAliBASE standard database address: http://www.lbgi.fr/balibase/


References
----------

Thompson JD, Koehl P, Ripp R, Poch O (2005) BAliBASE 3.0: latest
developments of the multiple sequence alignment benchmark. Proteins
61: 127-136.

Bahr A, Thompson JD, Thierry JC, Poch O (2001) BAliBASE (Benchmark
Alignment dataBASE): enhancements for repeats, transmembrane
sequences and circular permutations. Nucleic Acids Res 29: 323-326.

Thompson JD, Plewniak F, Poch O (1999) BAliBASE: a benchmark
alignment database for the evaluation of multiple alignment programs.
Bioinformatics 15: 87-88.

Van Walle I, Lasters I, Wyns L (2005) SABmark--a benchmark for
sequence alignment that covers the entire known fold space.
Bioinformatics 21: 1267-1268.

Raghava GP, Searle SM, Audley PC, Barber JD, Barton GJ (2003)
OXBench: a benchmark for evaluation of protein multiple sequence
alignment accuracy. BMC Bioinformatics 4: 47.

Edgar RC (2004) MUSCLE: multiple sequence alignment with high
accuracy and high throughput. Nucleic Acids Res 32: 1792-1797.
