Question

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return: ["AAAAACCCCC", "CCCCCAAAAA"].

Solution -- Bit Manipulation

Original idea is to use a set to store each substring. Time complexity is O(n) and space cost is O(n). But for details of space cost, a char is 2 bytes, so we need 20 bytes to store a substring and therefore (20n) space.

If we represent DNA substring by integer, the space is cut down to  (4n).

 public List<String> findRepeatedDnaSequences(String s) {
List<String> result = new ArrayList<String>(); int len = s.length();
if (len < 10) {
return result;
} Map<Character, Integer> map = new HashMap<Character, Integer>();
map.put('A', 0);
map.put('C', 1);
map.put('G', 2);
map.put('T', 3); Set<Integer> temp = new HashSet<Integer>();
Set<Integer> added = new HashSet<Integer>(); int hash = 0;
for (int i = 0; i < len; i++) {
if (i < 9) {
//each ACGT fit 2 bits, so left shift 2
hash = (hash << 2) + map.get(s.charAt(i));
} else {
hash = (hash << 2) + map.get(s.charAt(i));
//make length of hash to be 20
hash = hash & (1 << 20) - 1; if (temp.contains(hash) && !added.contains(hash)) {
result.add(s.substring(i - 9, i + 1));
added.add(hash); //track added
} else {
temp.add(hash);
}
} } return result;
}

Repeated DNA Sequences 解答的更多相关文章

  1. lc面试准备:Repeated DNA Sequences

    1 题目 All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: &quo ...

  2. LeetCode 187. 重复的DNA序列(Repeated DNA Sequences)

    187. 重复的DNA序列 187. Repeated DNA Sequences 题目描述 All DNA is composed of a series of nucleotides abbrev ...

  3. [LeetCode] Repeated DNA Sequences 求重复的DNA序列

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  4. [Leetcode] Repeated DNA Sequences

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  5. leetcode 187. Repeated DNA Sequences 求重复的DNA串 ---------- java

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  6. 【leetcode】Repeated DNA Sequences(middle)★

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  7. LeetCode() Repeated DNA Sequences 看的非常的过瘾!

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  8. Repeated DNA Sequences

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  9. Java for LeetCode 187 Repeated DNA Sequences

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

随机推荐

  1. 【POJ1003】Hangover(二分搜索)

    直接用库函数二分即可. #include <iostream> #include <cstring> #include <cstdlib> #include < ...

  2. hdu 3711 Binary Number(暴力 模拟)

    Problem Description For non-negative integers x and y, f(x, y) , )=,f(, )=, f(, )=. Now given sets o ...

  3. MySQL查询随机数据的4种方法和性能对比

    从MySQL随机选取数据也是我们最常用的一种发发,其最简单的办法就是使用”ORDER BY RAND()”,本文介绍了包括ORDER BY RAND()的4种获取随机数据的方法,并分析了各自的优缺点. ...

  4. ajax的封装

    ajax是前端工程中与后台进行数据交互的一门重要技术,通过在后台与服务器进行少量数据交换,AJAX 可以使网页实现异步更新.这意味着可以在不重新加载整个网页的情况下,对网页的某部分进行更新.jquer ...

  5. Android.mk具体解释

    概述     Android.mk文件用来向编译系统描写叙述怎样编译你的源码.更确切地说,该文件事实上就是一个小型的Makefile.由于该文件会被NDK的编译工具解析多次,因此应该尽量降低源码中声明 ...

  6. vue-resource插件使用

    本文的主要内容如下: 介绍vue-resource的特点 介绍vue-resource的基本使用方法 基于this.$http的增删查改示例 基于this.$resource的增删查改示例 基于int ...

  7. 一个Demo就懂的Angular之directive

    <body> <div ng-controller="myCtrl"> <hello-word></hello-word> < ...

  8. c# 调用 友盟api

    今天要使用友盟的推送API来给我的app进行推送信息,调试了好久,老是返回500错误,最终在友盟的技术人员支持下完成了此操作,在此多谢友盟技术和客服人员. 把发方法和注意事项贴出来供大家参考. pub ...

  9. Android中Binder的基础知识点

    Android Binder基础知识点 一 传统IPC和Binder机制的比较 传统IPC: 1)收方无法获得对方进程可靠的UID/PID,从而无法鉴别对方身份. 2)接入点开放,无法建立私有通道. ...

  10. C++服务器设计(五):多设备类型及消息事件管理

    在传统的服务器系统中,服务器仅针对接收到的客户端消息进行解析,并处理后回复响应.在该过程中服务器并不会主动判断客户端类型.但在现实中,往往存在多种类型的客户端设备,比如物联网下的智能家居系统,就存在智 ...