Repeated DNA Sequences

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].
 
用位图算法可以减少内存,代码如下:
int map_exist[ *  / ];
int map_pattern[ * / ]; #define set(map,x) \
(map[x >> ] |= ( << (x & 0x1F))) #define test(map,x) \
(map[x >> ] & ( << (x & 0x1F))) int dnamap[]; char** findRepeatedDnaSequences(char* s, int* returnSize) {
*returnSize = ;
if (s == NULL) return NULL;
int len = strlen(s);
if (len <= ) return NULL; memset(map_exist, , sizeof(int)* ( * / ));
memset(map_pattern, , sizeof(int)* ( * / )); dnamap['A' - 'A'] = ; dnamap['C' - 'A'] = ;
dnamap['G' - 'A'] = ; dnamap['T' - 'A'] = ; char ** ret = malloc(sizeof(char*));
int curr = ;
int size = ;
int key;
int i = ; while (i < )
key = (key << ) | dnamap[s[i++] - 'A'];
while (i < len){
key = ((key << ) & 0xFFFFF) | dnamap[s[i++] - 'A'];
if (test(map_pattern, key)){
if (!test(map_exist, key)){
set(map_exist, key);
if (curr == size){
size *= ;
ret = realloc(ret, sizeof(char*)* size);
}
ret[curr] = malloc(sizeof(char)* );
memcpy(ret[curr], &s[i-], );
ret[curr][] = '\0';
++curr;
} }
else{
set(map_pattern, key);
}
} ret = realloc(ret, sizeof(char*)* curr);
*returnSize = curr;
return ret;
}

该算法用时 6ms 左右, 非常快

 

LeetCode-Repeated DNA Sequences (位图算法减少内存)的更多相关文章

  1. [LeetCode] Repeated DNA Sequences 求重复的DNA序列

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  2. [LeetCode] Repeated DNA Sequences hash map

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  3. [Leetcode] Repeated DNA Sequences

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  4. LeetCode() Repeated DNA Sequences 看的非常的过瘾!

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  5. LeetCode 187. 重复的DNA序列(Repeated DNA Sequences)

    187. 重复的DNA序列 187. Repeated DNA Sequences 题目描述 All DNA is composed of a series of nucleotides abbrev ...

  6. lc面试准备:Repeated DNA Sequences

    1 题目 All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: &quo ...

  7. [LeetCode] 187. Repeated DNA Sequences 求重复的DNA序列

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  8. Leetcode:Repeated DNA Sequences详细题解

    题目 All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: " ...

  9. 【leetcode】Repeated DNA Sequences(middle)★

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

随机推荐

  1. microsoft office安装选择

    office分为零售版和批量授权版 零售版(文件名以cn开头)需要提供序列号才可以安装,而批量授权版(文件名以SW开头)可以先安装试用一段时间.

  2. 在Java中>、>>、>>>三者的区别

    Java,是由Sun Microsystems公司于1995年5月推出的Java程序设计语言和Java平台的总称.用Java实现的HotJava浏览器(支持Java applet)显示了Java的魅力 ...

  3. 【转】maven导出项目依赖的jar包

    本文转自:http://my.oschina.net/cloudcoder/blog/212648 一.导出到默认目录 targed/dependency 从Maven项目中导出项目依赖的jar包:进 ...

  4. mysql cluster 运行的必备条件

    1.由于同步复制一共需要4次消息传递,故mysql  cluster的数据更新速度比单机mysql要慢.所以mysql cluster要求运行在千兆以上的局域网内,节点可以采用双网卡,节点组之间采用直 ...

  5. 转mysql存储引擎memory,ndb,innodb之选择

    1 mysql的innodb和cluster的NDB引擎都支持事务,在有共同的特性外,也有不同之处:以mysql cluster NDB 7.3和MySQL 5.6之InnoDB为例:ndb7.3基于 ...

  6. WPF 打印控件 无弹框打印。

    WPF中打印用到了 PrintDialog类. 其中设置打印属性的是PrintTicket,管理打印机的是PrintQueue. 实例如下: public class PrintDialogHelpe ...

  7. 1.前端笔记之html

    title: 1.前端笔记之HTML date: 2016-04-04 23:21:52 tags: Python categories: Python --- 作者:刘耀 **出处:http://w ...

  8. ShortestPath:Silver Cow Party(POJ 3268)

    牛的聚会 题目大意:一群牛在一块农田的不同的点,现在他们都要去到同一个地方开会,然后现在从那个地方回到原来的位置,点与点之间的连线都是单向的,并且通过一个路径需要一定时间,问你现在哪只牛需要最多的时间 ...

  9. Java关于队列的自我实现

    1.循环队列的封装 package com.pinjia.shop.common.collection; /** * Created by wangwei on 2016/12/29. * 循环队列的 ...

  10. ip

    D组播地址 主机号 用于识别该网络中的主机. IP地址分为五类,A类保留给政府机构,B类分配给中等规模的公司,C类分配给任何需要的人,D类用于组播,E类用于实验,各类可容纳的地址数目不同. A.B.C ...