16 Finding a Protein Motif
Problem
To allow for the presence of its varying forms, a protein motif is represented by a shorthand as follows: [XY] means "either X or Y" and {X} means "any amino acid except X." For example, the N-glycosylation motif is written as N{P}[ST]{P}.
You can see the complete description and features of a particular protein by its access ID "uniprot_id" in the UniProt database, by inserting the ID number into
http://www.uniprot.org/uniprot/uniprot_id
Alternatively, you can obtain a protein sequence in FASTA format by following
http://www.uniprot.org/uniprot/uniprot_id.fasta
For example, the data for protein B5ZC00 can be found at http://www.uniprot.org/uniprot/B5ZC00.
Given: At most 15 UniProt Protein Database access IDs.
Return: For each protein possessing the N-glycosylation motif, output its given access ID followed by a list of locations in the protein string where the motif can be found.
Sample Dataset
A2Z669
B5ZC00
P07204_TRBM_HUMAN
P20840_SAG1_YEAST
Sample Output
B5ZC00
85 118 142 306 395
P07204_TRBM_HUMAN
47 115 116 382 409
P20840_SAG1_YEAST
79 109 135 248 306 348 364 402 485 501 614
#coding=utf-8
import urllib2
import re
list = ['A2Z669','B5ZC00','P07204_TRBM_HUMAN','P20840_SAG1_YEAST'] for one in list:
name = one.strip('\n')
url = 'http://www.uniprot.org/uniprot/'+name+'.fasta'
req = urllib2.Request(url)
response = urllib2.urlopen(req)
the_page = response.read()
start = the_page.find('\nM')
seq = the_page[start+1:].replace('\n','')
seq = ' '+seq
regex = re.compile(r'N(?=[^P][ST][^P])')
index = 0
out = []
'''
out = [m.start() for m in re.finditer(regex, seq)]
''' index = 0
while(index<len(seq)):
index += 1 if re.search(regex,seq[index:]) == None:
break #print S[index:]
if re.match(regex,seq[index:]) != None:
out.append(index) if out != []:
print name
print ' '.join([ str(i) for i in out])
16 Finding a Protein Motif的更多相关文章
- 14 Finding a Shared Motif
Problem A common substring of a collection of strings is a substring of every member of the collecti ...
- 09 Finding a Motif in DNA
Problem Given two strings ss and tt, tt is a substring of ss if tt is contained as a contiguous coll ...
- ERROR: openstack Error finding address for http://10.16.37.215:9292/v1/images: [Errno 32] Broken pipe
Try to set: no_proxy=10.16.37.215 this should help 转自: http://askubuntu.com/questions/575938/error-i ...
- DNA motif 搜索算法总结
DNA motif 搜索算法总结 2011-09-15 ~ ADMIN 翻译自:A survey of DNA motif finding algorithms, Modan K Das et. al ...
- P6 EPPM Installation and Configuration Guide 16 R1 April 2016
P6 EPPM Installation and Configuration Guide 16 R1 April 2016 Contents About Installing and ...
- P6 Professional Installation and Configuration Guide (Microsoft SQL Server Database) 16 R1
P6 Professional Installation and Configuration Guide (Microsoft SQL Server Database) 16 R1 May ...
- LOJ Finding LCM(math)
1215 - Finding LCM Time Limit: 2 second(s) Memory Limit: 32 MB LCM is an abbreviation used for Least ...
- 越狱Season 1- Episode 16
Season 1, Episode 16 -Burrows:Don't be. It's not your fault. 不要,不是你的错 -Fernando: Know what I like? 知 ...
- 查找EBS中各种文件版本(Finding File Versions in the Oracle Applications EBusiness Suite - Checking the $HEADER)
Finding File Versions in the Oracle Applications EBusiness Suite - Checking the $HEADER (文档 ID 85895 ...
随机推荐
- wiremock docker 运行
使用docker 模式 docker-compose yaml version: '3.3' services: service1: image: rodolpheche/wiremock ports ...
- Linux常用命令总结--不断补充
首先介绍一个很有用的命令:history 查看linux机器上历史命令. 在Linux下查看内存我们一般用free命令:free -m 查看硬盘状况:df -h 查看cpu信息:less /proc ...
- oracle12c之 表空间维护总结
1.1.创建永久表空间 In the CDB:SQL> CONNECT system@cdb1SQL> CREATE TABLESPACE cdb_users DATAFILE'/home ...
- Could not write content: No serializer found for class and no properties discovered to create BeanSerializer (to avoid exception, disable SerializationFeature.FAIL_ON_EMPTY_BEANS) )
1. 问题 org.springframework.http.converter.HttpMessageNotWritableException: Could not write content: N ...
- TCL数据类型
原始数据类型在Tcl中是字符串,我们常常可以找到字符串和引用在Tcl语言中.这些原始数据类型依次创建复合数据类型列表和关联数组.在Tcl中,数据类型可以表示不仅是简单Tcl的对象,但也可以代表相同的句 ...
- linux中强大的screen命令
今天发现了一个“宝贝”,就是Linux的screen命令,对于远程登录来说,不仅提供了类似于nohup的功能,而且提供了我非常喜欢的“多个桌面”的功能. 平常开一个putty远程登录,经常需要在两个程 ...
- springboot中对yaml文件的解析
一.YAML是“YAML不是一种标记语言”的外语缩写 (见前方参考资料原文内容):但为了强调这种语言以数据做为中心,而不是以置标语言为重点,而用返璞词重新命名.它是一种直观的能够被电脑识别的数据序列化 ...
- Solr进行Distinct 获取Count
今天碰到一个问题,数据之前入solr的时候并没有计算条数,现在需要计算出某几个表中去重后的总数. 由于solr的ISearch并没有相关的Distinct功能.想到一个解决方案是用Solr的Facet ...
- SpringBoot2.0实现静态资源版本控制
写在最前面 犹记毕业第一年时,公司每次发布完成后,都会在一个群里通知[版本更新,各部门清理缓存,有问题及时反馈]之类的话.归根结底就是资源缓存的问题,浏览器会将请求到的静态资源,如JS.CSS等文件缓 ...
- asp.net利用QQ邮箱发送邮件,关键在于开启pop并设置授权码为发送密码
public static bool SendEmail(string mailTo, string mailSubject, string mailContent) { ...