16 Finding a Protein Motif
Problem
To allow for the presence of its varying forms, a protein motif is represented by a shorthand as follows: [XY] means "either X or Y" and {X} means "any amino acid except X." For example, the N-glycosylation motif is written as N{P}[ST]{P}.
You can see the complete description and features of a particular protein by its access ID "uniprot_id" in the UniProt database, by inserting the ID number into
http://www.uniprot.org/uniprot/uniprot_id
Alternatively, you can obtain a protein sequence in FASTA format by following
http://www.uniprot.org/uniprot/uniprot_id.fasta
For example, the data for protein B5ZC00 can be found at http://www.uniprot.org/uniprot/B5ZC00.
Given: At most 15 UniProt Protein Database access IDs.
Return: For each protein possessing the N-glycosylation motif, output its given access ID followed by a list of locations in the protein string where the motif can be found.
Sample Dataset
A2Z669
B5ZC00
P07204_TRBM_HUMAN
P20840_SAG1_YEAST
Sample Output
B5ZC00
85 118 142 306 395
P07204_TRBM_HUMAN
47 115 116 382 409
P20840_SAG1_YEAST
79 109 135 248 306 348 364 402 485 501 614
#coding=utf-8
import urllib2
import re
list = ['A2Z669','B5ZC00','P07204_TRBM_HUMAN','P20840_SAG1_YEAST'] for one in list:
name = one.strip('\n')
url = 'http://www.uniprot.org/uniprot/'+name+'.fasta'
req = urllib2.Request(url)
response = urllib2.urlopen(req)
the_page = response.read()
start = the_page.find('\nM')
seq = the_page[start+1:].replace('\n','')
seq = ' '+seq
regex = re.compile(r'N(?=[^P][ST][^P])')
index = 0
out = []
'''
out = [m.start() for m in re.finditer(regex, seq)]
''' index = 0
while(index<len(seq)):
index += 1 if re.search(regex,seq[index:]) == None:
break #print S[index:]
if re.match(regex,seq[index:]) != None:
out.append(index) if out != []:
print name
print ' '.join([ str(i) for i in out])
16 Finding a Protein Motif的更多相关文章
- 14 Finding a Shared Motif
Problem A common substring of a collection of strings is a substring of every member of the collecti ...
- 09 Finding a Motif in DNA
Problem Given two strings ss and tt, tt is a substring of ss if tt is contained as a contiguous coll ...
- ERROR: openstack Error finding address for http://10.16.37.215:9292/v1/images: [Errno 32] Broken pipe
Try to set: no_proxy=10.16.37.215 this should help 转自: http://askubuntu.com/questions/575938/error-i ...
- DNA motif 搜索算法总结
DNA motif 搜索算法总结 2011-09-15 ~ ADMIN 翻译自:A survey of DNA motif finding algorithms, Modan K Das et. al ...
- P6 EPPM Installation and Configuration Guide 16 R1 April 2016
P6 EPPM Installation and Configuration Guide 16 R1 April 2016 Contents About Installing and ...
- P6 Professional Installation and Configuration Guide (Microsoft SQL Server Database) 16 R1
P6 Professional Installation and Configuration Guide (Microsoft SQL Server Database) 16 R1 May ...
- LOJ Finding LCM(math)
1215 - Finding LCM Time Limit: 2 second(s) Memory Limit: 32 MB LCM is an abbreviation used for Least ...
- 越狱Season 1- Episode 16
Season 1, Episode 16 -Burrows:Don't be. It's not your fault. 不要,不是你的错 -Fernando: Know what I like? 知 ...
- 查找EBS中各种文件版本(Finding File Versions in the Oracle Applications EBusiness Suite - Checking the $HEADER)
Finding File Versions in the Oracle Applications EBusiness Suite - Checking the $HEADER (文档 ID 85895 ...
随机推荐
- 在.NET中实现Actor模型的不同方式
上周,<实现领域驱动设计>(Implementing Domain-Driven Design)一书的作者Vaughn Vernon,发布了Dotsero,这是一个使用C#编写的.基于.N ...
- Javascript 的数据是什么数据类型?
Javascript 中的数据都是以 64 位浮点 float 存储的. 所有语言对浮点的精度是很难确定的. 如下代码可以实验到问题. <script> var a = 0.4; var ...
- Tcl 和 Raft 发明人的软件设计哲学
John Ousterhout(斯坦福大学教授,Tcl 语言.Raft 协议的发明人...真的是超级牛人,Title 好多好多,这里就列几个大家熟悉的),在 Google 做了一次演讲,题目就叫 「A ...
- tomcat自启动脚本
1.#cd /etc/rc.d/init.d2.#vi tomcat3.把下面的代码保存为tomcat文件,并让它成为可执行文件 chmod 755 tomcat. #!/bin/sh # # /et ...
- 自动下载google reader里面的星标文章
1. google reader马上就要关闭了,最后一次看看俺的浏览记录吧 最近 30 天的统计信息 全部订阅: 367 已读条目: 151 已点击的条目:41 个 加星标条目: 16 已发电子邮件条 ...
- GCC参数详解 二
1简介 2简单编译 2.1预处理 2.2编译为汇编代码(Compilation) 2.3汇编(Assembly) 2.4连接(Linking) 3多个程序文件的编译 4检错 5库文件连接 5.1编译成 ...
- 使用GridFsTemplate在mongodb中存取文件
spring-data-mongodb之gridfs mongodb除了能够存储大量的数据外,还内置了一个非常好用的文件系统.基于mongodb集群的优势,GridFS当然也是分布式的,而且备份也 ...
- centos7.3给squid搭建代理服务器添加认证nginx
1先安装 nginx 这里是教程 点击查看 2 然后 使用命令 创建用户 htpasswd -c /etc/nginx/passwd.db baker 输入密码 提示添加完毕 3 查看加密后的用户和 ...
- c++ new 与malloc有什么区别
前言 几个星期前去面试C++研发的实习岗位,面试官问了个问题: new与malloc有什么区别? 这是个老生常谈的问题.当时我回答new从自由存储区上分配内存,malloc从堆上分配内存:new/de ...
- WPF MVVM
MVVM全称:Model-View-ViewModel 优点:MVVM的引入使程序实现了松耦合设计,UI层与业务逻辑可以并行设计 1.Model:对现实世界的抽象 比如:需要做一个学校的管理系统,学校 ...