Problem

To allow for the presence of its varying forms, a protein motif is represented by a shorthand as follows: [XY] means "either X or Y" and {X} means "any amino acid except X." For example, the N-glycosylation motif is written as N{P}[ST]{P}.

You can see the complete description and features of a particular protein by its access ID "uniprot_id" in the UniProt database, by inserting the ID number into

http://www.uniprot.org/uniprot/uniprot_id

Alternatively, you can obtain a protein sequence in FASTA format by following

http://www.uniprot.org/uniprot/uniprot_id.fasta

For example, the data for protein B5ZC00 can be found at http://www.uniprot.org/uniprot/B5ZC00.

Given: At most 15 UniProt Protein Database access IDs.

Return: For each protein possessing the N-glycosylation motif, output its given access ID followed by a list of locations in the protein string where the motif can be found.

Sample Dataset

A2Z669
B5ZC00
P07204_TRBM_HUMAN
P20840_SAG1_YEAST

Sample Output

B5ZC00
85 118 142 306 395
P07204_TRBM_HUMAN
47 115 116 382 409
P20840_SAG1_YEAST
79 109 135 248 306 348 364 402 485 501 614
#coding=utf-8
import urllib2
import re
list = ['A2Z669','B5ZC00','P07204_TRBM_HUMAN','P20840_SAG1_YEAST'] for one in list:
name = one.strip('\n')
url = 'http://www.uniprot.org/uniprot/'+name+'.fasta'
req = urllib2.Request(url)
response = urllib2.urlopen(req)
the_page = response.read()
start = the_page.find('\nM')
seq = the_page[start+1:].replace('\n','')
seq = ' '+seq
regex = re.compile(r'N(?=[^P][ST][^P])')
index = 0
out = []
'''
out = [m.start() for m in re.finditer(regex, seq)]
''' index = 0
while(index<len(seq)):
index += 1 if re.search(regex,seq[index:]) == None:
break #print S[index:]
if re.match(regex,seq[index:]) != None:
out.append(index) if out != []:
print name
print ' '.join([ str(i) for i in out])

  

16 Finding a Protein Motif的更多相关文章

  1. 14 Finding a Shared Motif

    Problem A common substring of a collection of strings is a substring of every member of the collecti ...

  2. 09 Finding a Motif in DNA

    Problem Given two strings ss and tt, tt is a substring of ss if tt is contained as a contiguous coll ...

  3. ERROR: openstack Error finding address for http://10.16.37.215:9292/v1/images: [Errno 32] Broken pipe

    Try to set: no_proxy=10.16.37.215 this should help 转自: http://askubuntu.com/questions/575938/error-i ...

  4. DNA motif 搜索算法总结

    DNA motif 搜索算法总结 2011-09-15 ~ ADMIN 翻译自:A survey of DNA motif finding algorithms, Modan K Das et. al ...

  5. P6 EPPM Installation and Configuration Guide 16 R1 April 2016

    P6 EPPM Installation and Configuration Guide 16 R1         April 2016 Contents About Installing and ...

  6. P6 Professional Installation and Configuration Guide (Microsoft SQL Server Database) 16 R1

    P6 Professional Installation and Configuration Guide (Microsoft SQL Server Database) 16 R1       May ...

  7. LOJ Finding LCM(math)

    1215 - Finding LCM Time Limit: 2 second(s) Memory Limit: 32 MB LCM is an abbreviation used for Least ...

  8. 越狱Season 1- Episode 16

    Season 1, Episode 16 -Burrows:Don't be. It's not your fault. 不要,不是你的错 -Fernando: Know what I like? 知 ...

  9. 查找EBS中各种文件版本(Finding File Versions in the Oracle Applications EBusiness Suite - Checking the $HEADER)

    Finding File Versions in the Oracle Applications EBusiness Suite - Checking the $HEADER (文档 ID 85895 ...

随机推荐

  1. wiremock docker 运行

    使用docker 模式 docker-compose yaml version: '3.3' services: service1: image: rodolpheche/wiremock ports ...

  2. Linux常用命令总结--不断补充

    首先介绍一个很有用的命令:history  查看linux机器上历史命令. 在Linux下查看内存我们一般用free命令:free -m 查看硬盘状况:df -h 查看cpu信息:less /proc ...

  3. oracle12c之 表空间维护总结

    1.1.创建永久表空间 In the CDB:SQL> CONNECT system@cdb1SQL> CREATE TABLESPACE cdb_users DATAFILE'/home ...

  4. Could not write content: No serializer found for class and no properties discovered to create BeanSerializer (to avoid exception, disable SerializationFeature.FAIL_ON_EMPTY_BEANS) )

    1. 问题 org.springframework.http.converter.HttpMessageNotWritableException: Could not write content: N ...

  5. TCL数据类型

    原始数据类型在Tcl中是字符串,我们常常可以找到字符串和引用在Tcl语言中.这些原始数据类型依次创建复合数据类型列表和关联数组.在Tcl中,数据类型可以表示不仅是简单Tcl的对象,但也可以代表相同的句 ...

  6. linux中强大的screen命令

    今天发现了一个“宝贝”,就是Linux的screen命令,对于远程登录来说,不仅提供了类似于nohup的功能,而且提供了我非常喜欢的“多个桌面”的功能. 平常开一个putty远程登录,经常需要在两个程 ...

  7. springboot中对yaml文件的解析

    一.YAML是“YAML不是一种标记语言”的外语缩写 (见前方参考资料原文内容):但为了强调这种语言以数据做为中心,而不是以置标语言为重点,而用返璞词重新命名.它是一种直观的能够被电脑识别的数据序列化 ...

  8. Solr进行Distinct 获取Count

    今天碰到一个问题,数据之前入solr的时候并没有计算条数,现在需要计算出某几个表中去重后的总数. 由于solr的ISearch并没有相关的Distinct功能.想到一个解决方案是用Solr的Facet ...

  9. SpringBoot2.0实现静态资源版本控制

    写在最前面 犹记毕业第一年时,公司每次发布完成后,都会在一个群里通知[版本更新,各部门清理缓存,有问题及时反馈]之类的话.归根结底就是资源缓存的问题,浏览器会将请求到的静态资源,如JS.CSS等文件缓 ...

  10. asp.net利用QQ邮箱发送邮件,关键在于开启pop并设置授权码为发送密码

    public static bool SendEmail(string mailTo, string mailSubject, string mailContent)        {         ...