Problem

To allow for the presence of its varying forms, a protein motif is represented by a shorthand as follows: [XY] means "either X or Y" and {X} means "any amino acid except X." For example, the N-glycosylation motif is written as N{P}[ST]{P}.

You can see the complete description and features of a particular protein by its access ID "uniprot_id" in the UniProt database, by inserting the ID number into

http://www.uniprot.org/uniprot/uniprot_id

Alternatively, you can obtain a protein sequence in FASTA format by following

http://www.uniprot.org/uniprot/uniprot_id.fasta

For example, the data for protein B5ZC00 can be found at http://www.uniprot.org/uniprot/B5ZC00.

Given: At most 15 UniProt Protein Database access IDs.

Return: For each protein possessing the N-glycosylation motif, output its given access ID followed by a list of locations in the protein string where the motif can be found.

Sample Dataset

A2Z669
B5ZC00
P07204_TRBM_HUMAN
P20840_SAG1_YEAST

Sample Output

B5ZC00
85 118 142 306 395
P07204_TRBM_HUMAN
47 115 116 382 409
P20840_SAG1_YEAST
79 109 135 248 306 348 364 402 485 501 614
#coding=utf-8
import urllib2
import re
list = ['A2Z669','B5ZC00','P07204_TRBM_HUMAN','P20840_SAG1_YEAST'] for one in list:
name = one.strip('\n')
url = 'http://www.uniprot.org/uniprot/'+name+'.fasta'
req = urllib2.Request(url)
response = urllib2.urlopen(req)
the_page = response.read()
start = the_page.find('\nM')
seq = the_page[start+1:].replace('\n','')
seq = ' '+seq
regex = re.compile(r'N(?=[^P][ST][^P])')
index = 0
out = []
'''
out = [m.start() for m in re.finditer(regex, seq)]
''' index = 0
while(index<len(seq)):
index += 1 if re.search(regex,seq[index:]) == None:
break #print S[index:]
if re.match(regex,seq[index:]) != None:
out.append(index) if out != []:
print name
print ' '.join([ str(i) for i in out])

  

16 Finding a Protein Motif的更多相关文章

  1. 14 Finding a Shared Motif

    Problem A common substring of a collection of strings is a substring of every member of the collecti ...

  2. 09 Finding a Motif in DNA

    Problem Given two strings ss and tt, tt is a substring of ss if tt is contained as a contiguous coll ...

  3. ERROR: openstack Error finding address for http://10.16.37.215:9292/v1/images: [Errno 32] Broken pipe

    Try to set: no_proxy=10.16.37.215 this should help 转自: http://askubuntu.com/questions/575938/error-i ...

  4. DNA motif 搜索算法总结

    DNA motif 搜索算法总结 2011-09-15 ~ ADMIN 翻译自:A survey of DNA motif finding algorithms, Modan K Das et. al ...

  5. P6 EPPM Installation and Configuration Guide 16 R1 April 2016

    P6 EPPM Installation and Configuration Guide 16 R1         April 2016 Contents About Installing and ...

  6. P6 Professional Installation and Configuration Guide (Microsoft SQL Server Database) 16 R1

    P6 Professional Installation and Configuration Guide (Microsoft SQL Server Database) 16 R1       May ...

  7. LOJ Finding LCM(math)

    1215 - Finding LCM Time Limit: 2 second(s) Memory Limit: 32 MB LCM is an abbreviation used for Least ...

  8. 越狱Season 1- Episode 16

    Season 1, Episode 16 -Burrows:Don't be. It's not your fault. 不要,不是你的错 -Fernando: Know what I like? 知 ...

  9. 查找EBS中各种文件版本(Finding File Versions in the Oracle Applications EBusiness Suite - Checking the $HEADER)

    Finding File Versions in the Oracle Applications EBusiness Suite - Checking the $HEADER (文档 ID 85895 ...

随机推荐

  1. 在.NET中实现Actor模型的不同方式

    上周,<实现领域驱动设计>(Implementing Domain-Driven Design)一书的作者Vaughn Vernon,发布了Dotsero,这是一个使用C#编写的.基于.N ...

  2. Javascript 的数据是什么数据类型?

    Javascript 中的数据都是以 64 位浮点 float 存储的. 所有语言对浮点的精度是很难确定的. 如下代码可以实验到问题. <script> var a = 0.4; var ...

  3. Tcl 和 Raft 发明人的软件设计哲学

    John Ousterhout(斯坦福大学教授,Tcl 语言.Raft 协议的发明人...真的是超级牛人,Title 好多好多,这里就列几个大家熟悉的),在 Google 做了一次演讲,题目就叫 「A ...

  4. tomcat自启动脚本

    1.#cd /etc/rc.d/init.d2.#vi tomcat3.把下面的代码保存为tomcat文件,并让它成为可执行文件 chmod 755 tomcat. #!/bin/sh # # /et ...

  5. 自动下载google reader里面的星标文章

    1. google reader马上就要关闭了,最后一次看看俺的浏览记录吧 最近 30 天的统计信息 全部订阅: 367 已读条目: 151 已点击的条目:41 个 加星标条目: 16 已发电子邮件条 ...

  6. GCC参数详解 二

    1简介 2简单编译 2.1预处理 2.2编译为汇编代码(Compilation) 2.3汇编(Assembly) 2.4连接(Linking) 3多个程序文件的编译 4检错 5库文件连接 5.1编译成 ...

  7. 使用GridFsTemplate在mongodb中存取文件

    spring-data-mongodb之gridfs   mongodb除了能够存储大量的数据外,还内置了一个非常好用的文件系统.基于mongodb集群的优势,GridFS当然也是分布式的,而且备份也 ...

  8. centos7.3给squid搭建代理服务器添加认证nginx

    1先安装 nginx 这里是教程 点击查看 2 然后 使用命令 创建用户 htpasswd -c /etc/nginx/passwd.db baker 输入密码  提示添加完毕 3 查看加密后的用户和 ...

  9. c++ new 与malloc有什么区别

    前言 几个星期前去面试C++研发的实习岗位,面试官问了个问题: new与malloc有什么区别? 这是个老生常谈的问题.当时我回答new从自由存储区上分配内存,malloc从堆上分配内存:new/de ...

  10. WPF MVVM

    MVVM全称:Model-View-ViewModel 优点:MVVM的引入使程序实现了松耦合设计,UI层与业务逻辑可以并行设计 1.Model:对现实世界的抽象 比如:需要做一个学校的管理系统,学校 ...