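sequencing: Why second-generation sequencing was used: high throughput, short reads. Why long reads were not used: 1. high algorithmic error rate; 2. long-read sequencing accumulates chimeric-gene errors. Chimeric gene: a hybrid gene formed by recombination, in which gene sequences of different origin and function are spliced together.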

sequencing:

increased total length > N > gap > missing in genome
The reads with a frequency > 1 were called duplicated reads, and we defined the duplication rate as the count of duplicated reads / the count of total reads.
Gaps that still remained after gap filling with further read mapping: most of these regions are tandem repeats and transposons.
What are tandem repeats? The non-interspersed repeat sequences identified using RepeatMasker with the "-noint" option, including Simple_repeat, Satellites, and Low_complexity repeats. Part of the tandem repeats was lost during the panda assembly.
Loose tandem repeats: percent of matches larger than 90% and percent of indels less than 10%.
Exact tandem repeats: percent of matches equal to 100% and percent of indels equal to 0%.
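The duplication-rate definition and the two tandem-repeat thresholds above are simple enough to sketch directly. This is a minimal illustration, not the pipeline used in the paper: it assumes reads are compared as exact sequences and that every copy of a repeated read counts as "duplicated" (the notes do not say whether the first copy is excluded). The function names are hypothetical.

```python
from collections import Counter


def duplication_rate(reads: list[str]) -> float:
    """Duplicated reads / total reads.

    Assumption: a read is 'duplicated' if its exact sequence occurs more than
    once, and all copies (including the first) are counted.
    """
    if not reads:
        return 0.0
    counts = Counter(reads)
    duplicated = sum(c for c in counts.values() if c > 1)
    return duplicated / len(reads)


def classify_tandem_repeat(pct_match: float, pct_indel: float) -> str:
    """Apply the exact/loose tandem-repeat thresholds from the notes.

    Exact repeats also satisfy the loose criteria, so 'exact' is checked first.
    """
    if pct_match == 100.0 and pct_indel == 0.0:
        return "exact"
    if pct_match > 90.0 and pct_indel < 10.0:
        return "loose"
    return "not counted"
```

For example, in the reads ACGT, ACGT, GGCC, TTAA, TTAA, TTAA, five of six reads belong to repeated sequences, giving a duplication rate of 5/6 under this counting convention.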

evaluate-1:

1. Realigned all usable sequencing reads onto the scaffolds using SOAPaligner.

2. GC content:

3. Comparison of assembled scaffolds with 26 panda mRNA gene sequences in GenBank: the known panda mRNA gene sequences were downloaded from GenBank.

4. Sequenced and assembled nine BACs independently using Sanger sequencing technology.

5.
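The GC-content check in item 2 is usually computed over windows along the scaffolds, so that GC content can be compared with read depth from item 1. The sketch below assumes N and other ambiguity codes should be ignored; the window and step sizes are hypothetical defaults, not values from the paper.

```python
import math


def gc_content(seq: str) -> float:
    """GC fraction among unambiguous bases (A/C/G/T); N and other codes are ignored."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    if acgt == 0:
        return math.nan  # window made entirely of gaps/ambiguous bases
    return (s.count("G") + s.count("C")) / acgt


def windowed_gc(seq: str, window: int = 500, step: int = 250) -> list[tuple[int, float]]:
    """GC content of sliding windows, returned as (0-based start, gc) pairs.

    window and step are illustrative values only.
    """
    out = []
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        out.append((start, gc_content(seq[start:start + window])))
    return out
```

Pairing each window's GC value with its mean read depth gives the usual GC-depth plot used to spot GC bias or contamination.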

evaluate-2:

1. Annotation 2. Manual check

estimated the genome size:

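L is the average read length and K is the k-mer length; knum is the total number of k-mers, and kdepth is the expected k-mer depth (the x-coordinate of the main peak of the k-mer frequency curve); bnum is the total number of bases covered by sequencing reads, and bdepth is the expected base depth.
L = average read length = 52 bp; K = 17; kdepth = x-coordinate of the main peak of the k-mer curve = 15.
Why use C-values? We therefore used C-values (haploid DNA content in picograms), as this is proportional to genome size.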
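The notes define the symbols but do not write out the formulas. Assuming the standard k-mer relations were intended (G = knum / kdepth = bnum / bdepth, with kdepth / bdepth = (L - K + 1) / L), they can be sketched as below. The C-value conversion uses the commonly cited 1 pg of DNA ≈ 978 Mb; knum is not given in the notes, so no genome-size value is computed here.

```python
def base_depth_from_kmer_depth(kdepth: float, read_len: int, k: int) -> float:
    """bdepth = kdepth * L / (L - K + 1).

    Each read of length L contributes L bases but only L - K + 1 k-mers,
    so kdepth / bdepth = (L - K + 1) / L.
    """
    if not 0 < k <= read_len:
        raise ValueError("need 0 < K <= L")
    return kdepth * read_len / (read_len - k + 1)


def genome_size_from_kmers(knum: int, kdepth: float) -> float:
    """G = knum / kdepth (equivalently bnum / bdepth)."""
    return knum / kdepth


def genome_size_from_c_value(c_value_pg: float) -> float:
    """Haploid genome size in bp, assuming 1 pg of DNA corresponds to ~0.978e9 bp."""
    return c_value_pg * 0.978e9
```

With the values in the notes (L = 52, K = 17, kdepth = 15), `base_depth_from_kmer_depth(15, 52, 17)` gives 15 × 52 / 36 ≈ 21.7× expected base depth.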

repetitive sequences:

1. The Repbase transposable-element consensus sequences were annotated using mammalian genomes other than the panda.

2. About 70 Mb of transposable-element sequences (3% of the genome) had a <10% divergence rate from the consensus (Supplementary Fig. 9); these are likely to be active transposable elements of recent origin.
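A figure like "70 Mb of TE sequence with <10% divergence" can be tallied from RepeatMasker annotation output. The sketch assumes the usual whitespace-separated RepeatMasker `.out` table (field 1 = percent divergence, fields 5 and 6 = query begin/end, field 10 = repeat class/family); the list of TE class names and the function name are assumptions for illustration.

```python
# TE classes as they appear in RepeatMasker's class/family column (assumed list).
TE_CLASSES = ("LINE", "SINE", "LTR", "DNA", "RC", "Retroposon")


def young_te_length(out_lines, max_div: float = 10.0) -> int:
    """Sum lengths (bp) of TE hits whose divergence from the consensus is < max_div (%).

    Assumed .out layout: field 0 = SW score, field 1 = % divergence,
    fields 5/6 = query begin/end (1-based, inclusive), field 10 = class/family.
    Overlapping hits are not merged, so overlaps would be double-counted.
    """
    total = 0
    for line in out_lines:
        fields = line.split()
        if len(fields) < 11 or not fields[0].isdigit():
            continue  # header, blank or malformed line
        div = float(fields[1])
        begin, end = int(fields[5]), int(fields[6])
        repeat_class = fields[10].split("/")[0]
        if repeat_class in TE_CLASSES and div < max_div:
            total += end - begin + 1
    return total
```

Tandem and low-complexity classes (Simple_repeat, Low_complexity, Satellite) are skipped because they are not transposable elements.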

Comparative genomics-Sequence comparison:

1. Segmental duplications (SDs): highly homologous sequence elements within eukaryotic genomes.
Whole-genome assembly comparison (WGAC) is affected by the assembly; whole-genome shotgun sequence detection (WSSD): considering that the average depth was about 2.47 times that of the whole genome, we estimated that the total length of the duplicated copies was about 34.3 Mb.
2. Conserved sequences among the panda, dog and human genomes. Each of these genomes contained ~1.4 Gb of non-repetitive sequence.
3. Alignment with dog or human as the <target> and panda as the <query>. From the LASTZ documentation: the <target> and <query> are usually just the names of files containing the sequences to be aligned, in either FASTA, Nib, or 2Bit format. However, they can be HSX index files that refer to the sequences indirectly, and they can also specify pre-processing actions such as selecting a subsequence from the file (see Sequence Specifiers for details). With certain options such as --self the <query> is not needed; otherwise, if it is left unspecified, the query sequences are read from stdin (though this does not work with random-access formats like 2Bit). As a special case, the <target> is omitted when the --targetcapsule option is used, since the target sequence is embedded within the capsule file.
4. There were 4–5 times more rearrangements in dog than in panda, which provided evidence that the panda has a lower divergence rate than the dog.
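The WSSD estimate in item 1 rests on one idea: duplicated copies collapsed into a single assembled region attract proportionally more reads, so their true total length is roughly the region length scaled by depth / average depth. The sketch below illustrates only that scaling; the window input format, the 1.5× flagging threshold, and the function name are hypothetical and not taken from the paper.

```python
def duplicated_copy_length(windows, genome_avg_depth: float, fold_threshold: float = 1.5):
    """Return (flagged_bp, estimated_copy_bp) from per-window read depth.

    windows: iterable of (window_length_bp, mean_depth) pairs (hypothetical format).
    A window is flagged as a collapsed duplication when its depth is at least
    fold_threshold x the genome-wide average (threshold is an assumption).
    Estimated copy length for a flagged window = length * depth / average depth.
    """
    if genome_avg_depth <= 0:
        raise ValueError("genome_avg_depth must be positive")
    flagged_bp = 0
    copy_bp = 0.0
    for length, depth in windows:
        if depth >= fold_threshold * genome_avg_depth:
            flagged_bp += length
            copy_bp += length * depth / genome_avg_depth
    return flagged_bp, copy_bp
```

For instance, a 1 kb window at 75× in a genome averaging 30× is flagged and counted as 2.5 kb of duplicated copies.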

annotation:

1. Prediction/annotation: "With synteny" means genes predicted on regions with synteny to the human or dog genome, where fragmented genes were joined by building gene-scaffolds. "Out of synteny" means genes predicted on regions without synteny evidence to the other species. Pseudogenes are those containing more frame errors than a specified threshold.
2. Validation:
3. The panda was similar to the human with respect to all of these key parameters.
4. Figure S11 | Analysis of sequence completeness of the predicted genes.
a. Alignment rate comparison between the annotated panda genes and dog genes using single-copy genes: both panda and dog genes were aligned to human genes, and the alignment rate was calculated for each pair of orthologous genes. On average, the dog and panda genes covered 96.2% and 93.5% of the human gene sequences, respectively.
b. The ratio of missing (unannotated) exons. The annotated panda genes were compared with the human genes, and the ratio of missing length was calculated at the 5' end, the 3' end, and the middle of genes. Unannotated exons were at the 5' or 3' ends of genes; these exons were usually very small.
Conclusion: the quality of the predicted panda genes was comparable to that of other well-annotated mammalian genomes.
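Panels a and b of Figure S11 reduce to interval arithmetic on the human gene: how much of it is covered by the aligned panda gene, and where the uncovered part lies. A minimal sketch, assuming aligned regions are given as 0-based half-open (start, end) intervals on the human gene oriented 5' to 3'; the input format and names are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def coverage_and_missing(gene_len: int, aligned):
    """Coverage fraction plus missing bp at the 5' end, 3' end and middle of a gene.

    aligned: intervals on the human gene covered by the panda ortholog
    (0-based, half-open, 5'->3' orientation assumed).
    """
    merged = merge_intervals(aligned)
    if not merged:
        raise ValueError("no aligned interval")
    covered = sum(end - start for start, end in merged)
    middle = sum(nxt[0] - cur[1] for cur, nxt in zip(merged, merged[1:]))
    return {
        "coverage": covered / gene_len,
        "missing_5prime": merged[0][0],
        "missing_3prime": gene_len - merged[-1][1],
        "missing_middle": middle,
    }
```

Averaging `coverage` over all single-copy orthologs gives a figure comparable to the 93.5% (panda) and 96.2% (dog) values, and summing the three missing categories shows whether gaps concentrate at gene ends.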

Comparative genomics-PSGs prediction:

1. One specific for the panda lineage, one specific for the dog lineage, and one combining evidence from all five species included in the alignment.
2.
3. The panda and the dog lineages share only six PSGs. MWU&PE: consistent with the results from previous genome-wide positive selection scans in mammalian genomes.
4.

Panda-specific characteristics:

diet:

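1. In genetics, Ka/Ks (or dN/dS) is the ratio of the nonsynonymous substitution rate (Ka, substitutions per nonsynonymous site) to the synonymous substitution rate (Ks, substitutions per synonymous site). This ratio indicates whether a protein-coding gene is under selective pressure.
A nucleotide change that does not alter the amino acid is called a synonymous mutation; otherwise it is a nonsynonymous mutation. Synonymous mutations are generally assumed not to be subject to natural selection, whereas nonsynonymous mutations are. In evolutionary analysis it is useful to know the rates at which synonymous and nonsynonymous mutations occur. Commonly used parameters are the synonymous substitution rate (Ks), the nonsynonymous substitution rate (Ka), and their ratio (Ka/Ks). If Ka/Ks > 1, positive selection is inferred; if Ka/Ks = 1, neutral evolution; if Ka/Ks < 1, purifying selection.
A frameshift mutation is caused by an insertion or deletion (indel) of a number of nucleotides that is not divisible by three. Because codons are read in triplets, such an indel changes the reading frame, so the downstream translation is completely different from the original. The earlier in the sequence the insertion or deletion occurs, the more of the protein is altered.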
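The synonymous/nonsynonymous distinction, the frameshift rule, and the Ka/Ks reading can each be checked with the standard genetic code. This is an illustrative sketch with hypothetical function names; it does not estimate Ka or Ks themselves (that requires counting synonymous and nonsynonymous sites, e.g. with a dedicated tool), it only interprets a given ratio.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str) -> str:
    """Translate in frame 0; '*' = stop, 'X' = ambiguous codon; a trailing partial codon is ignored."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def classify_point_mutation(ref_codon: str, alt_codon: str) -> str:
    """Synonymous if both codons encode the same amino acid (or both are stops)."""
    same = CODON_TABLE[ref_codon.upper()] == CODON_TABLE[alt_codon.upper()]
    return "synonymous" if same else "nonsynonymous"


def is_frameshift(indel_len: int) -> bool:
    """An indel shifts the reading frame when its length is not divisible by 3."""
    return indel_len % 3 != 0


def ka_ks_interpretation(ka: float, ks: float) -> str:
    """Map Ka/Ks to the selection regimes listed in the notes."""
    if ks <= 0:
        raise ValueError("Ks must be positive")
    ratio = ka / ks
    if ratio > 1:
        return "positive selection"
    if ratio < 1:
        return "purifying selection"
    return "neutral evolution"
```

For example, `translate("ATGGAAACC")` gives "MET", while inserting "GG" after the start codon (`translate("ATGGGGAAACC")`) gives "MGK": every downstream codon changes. By `is_frameshift`, both the 2-bp ('GG') insertion and the 4-bp ('GTGT') deletion in the diet notes are frameshifts.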

2. The third exon contained a 2-bp ('GG') insertion; the sixth exon contained a 4-bp ('GTGT') deletion.
3. coding not protein: purifying selection: if a DNA mutation is harmful to the organism but not lethal (so it is not eliminated immediately), the mutation will be under purifying selection. Weak selection can lead to different evolutionary fates depending on population size, and a reduced population size can increase the fixation of slightly deleterious mutations.

low fecundity rate:

a. Alignment b. Phylogenies of FSHB genes in different mammalian species
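The population-size argument in item 3 (smaller populations fix more slightly deleterious mutations) can be made concrete with Kimura's fixation probability for a new mutation. This sketch assumes a diploid population whose effective size equals N and additive selection coefficient s (negative = deleterious); it is a textbook approximation used for illustration, not a method from the notes.

```python
import math


def fixation_probability(n: int, s: float) -> float:
    """Kimura's fixation probability for a new mutation at initial frequency 1/(2N).

    u = (1 - exp(-2s)) / (1 - exp(-4Ns)); assumes Ne = N and additive selection.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if s == 0.0:
        return 1.0 / (2 * n)  # neutral limit
    if s > 0:
        return -math.expm1(-2 * s) / -math.expm1(-4 * n * s)
    # For s < 0, multiply numerator and denominator by exp(4Ns) so that
    # exp(-4Ns) is never evaluated (it would overflow for large N|s|).
    return math.exp(4 * n * s) * -math.expm1(-2 * s) / math.expm1(4 * n * s)
```

For s = -0.001, the fixation probability is about 0.0041 when N = 100 (only a little below the neutral 1/(2N) = 0.005), but effectively zero when N = 10,000: the same weakly deleterious mutation behaves almost neutrally in the small population.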

Population genetics:

1. Heterozygosity rate: a. panda vs. human; b. autosomes vs. X chromosome.
2. Substitution matrix of the panda heterozygous SNPs in the whole genome. The transition/transversion ratio is 2.1.
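Both statistics above are simple counts over the SNP calls. A minimal sketch, assuming heterozygous SNPs are given as (ref, alt) base pairs and heterozygosity is measured as heterozygous SNPs per callable base; the names and input format are hypothetical.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
NUCLEOTIDES = PURINES | PYRIMIDINES


def substitution_matrix(snps) -> Counter:
    """Count (ref, alt) pairs, skipping non-ACGT alleles and ref == alt."""
    pairs = ((r.upper(), a.upper()) for r, a in snps)
    return Counter((r, a) for r, a in pairs
                   if r in NUCLEOTIDES and a in NUCLEOTIDES and r != a)


def ts_tv_ratio(snps) -> float:
    """Transitions (A<->G, C<->T) divided by transversions (purine <-> pyrimidine)."""
    matrix = substitution_matrix(snps)
    ts = sum(n for (r, a), n in matrix.items()
             if {r, a} <= PURINES or {r, a} <= PYRIMIDINES)
    tv = sum(matrix.values()) - ts
    if tv == 0:
        raise ValueError("no transversions")
    return ts / tv


def heterozygosity(n_het_snps: int, callable_bp: int) -> float:
    """Heterozygous SNPs per callable base (e.g. compare autosomes with chrX)."""
    return n_het_snps / callable_bp
```

Random mutation alone would give Ts/Tv = 0.5 (there are twice as many possible transversions as transitions), so a ratio such as 2.1 reflects the known excess of transitions.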
