Problem:

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

Analysis:

This problem has an ingenious solution.
If you have not encountered it before, it is hard to come up with on your own.

Idea:
Since we only have four characters, "A", "C", "G" and "T", we can map each character to its own 2-bit code (note: not its ASCII code).
Each subsequence is 10 characters long, so after mapping it takes up only 20 bits. Since an int in Java takes up 32 bits, a subsequence fits into a single int, which we call its integer hash code.
Another benefit of this mapping is that whenever a new character is added, we can update the hash code with bit-shift operations.

1. Prepare the HashMap for the mapping.
HashMap<Character, Integer> map = new HashMap<Character, Integer>();
map.put('A', 0);
map.put('C', 1);
map.put('G', 2);
map.put('T', 3);

2. Move the subsequence window and compute the related hash code.
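int hash = 0;
for (int i = 0; i < s.length(); i++) {
    if (i < 9) {
        // The window is not full yet: just append the new 2-bit code.
        hash = (hash << 2) + map.get(s.charAt(i));
    } else {
        hash = (hash << 2) + map.get(s.charAt(i));
        // Keep only the lowest 20 bits, i.e. the latest 10 characters.
        hash = hash & ((1 << 20) - 1);
        ...
    }
}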
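
To make the window update concrete, here is a minimal sketch that compares the rolling hash against a hash recomputed from scratch for every 10-letter window. The class name RollingHashSketch and the helpers code, encode and rollingHashes are hypothetical names introduced for illustration only; they are not part of the solution in this post. The sketch also uses a switch instead of the HashMap, which is a swapped-in simplification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration; not part of the original solution.
final class RollingHashSketch {
    static final int WINDOW = 10;
    // Twenty 1-bits: 2 bits per character times 10 characters.
    static final int MASK = (1 << (2 * WINDOW)) - 1;

    // Maps a nucleotide to its 2-bit code: A=0, C=1, G=2, T=3.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    // Encodes a single window from scratch, two bits per character.
    static int encode(String window) {
        int hash = 0;
        for (int i = 0; i < window.length(); i++) {
            hash = (hash << 2) | code(window.charAt(i));
        }
        return hash;
    }

    // Returns the hash of every 10-letter window, updated incrementally.
    static List<Integer> rollingHashes(String s) {
        List<Integer> hashes = new ArrayList<>();
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            // Shift in the new character, then drop anything above 20 bits.
            hash = ((hash << 2) | code(s.charAt(i))) & MASK;
            if (i >= WINDOW - 1) {
                hashes.add(hash);
            }
        }
        return hashes;
    }

    public static void main(String[] args) {
        String s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT";
        List<Integer> hashes = rollingHashes(s);
        for (int i = 0; i < hashes.size(); i++) {
            String window = s.substring(i, i + WINDOW);
            boolean same = hashes.get(i) == encode(window);
            System.out.println(window + " -> " + hashes.get(i) + (same ? "" : " (mismatch)"));
        }
    }
}
```

Applying the mask while the window is still shorter than 10 characters is harmless, because the hash cannot exceed 20 bits until the tenth character has been shifted in.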
Note: once the sliding window reaches 10 characters, we take the hash code of the window. The trick is to '&' the hash with twenty 1-bits, which keeps only the lowest 20 bits.
2.1 Get twenty 1-bits.
((1 << 20) - 1)
The idea is simple: for example, 4 - 1 = 100 - 1 = 011 in binary.
2.2 Use the '&' operator to keep those bits.
hash = hash & ((1 << 20) - 1);

Errors:

When you put a <key, value> pair into a HashMap and the new value depends on the value already stored there, you must first check whether the key exists.
if (counted.containsKey(hash))
    counted.put(hash, counted.get(hash) + 1);
else
    counted.put(hash, 1);
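
As a side note, if Java 8 or later is available (an assumption; the post does not state a Java version), the same check-then-put logic can be written with Map.merge, which inserts the value when the key is absent and otherwise combines it with the stored one. The class CountSketch and the method countAll below are hypothetical names used only for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class for illustration; assumes Java 8+ for Map.merge.
final class CountSketch {
    // Counts how many times each hash value occurs in the input.
    static Map<Integer, Integer> countAll(int[] hashes) {
        Map<Integer, Integer> counted = new HashMap<>();
        for (int hash : hashes) {
            // Absent key: store 1. Present key: store oldValue + 1.
            counted.merge(hash, 1, Integer::sum);
        }
        return counted;
    }

    public static void main(String[] args) {
        System.out.println(countAll(new int[] {7, 3, 7, 7}));
    }
}
```

Calling counted.get(hash) + 1 on a key that is not in the map would throw a NullPointerException during unboxing, which is why the existence check (or merge) is needed.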

Solution:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        ArrayList<String> ret = new ArrayList<String>();
        if (s.length() < 10)
            return ret;
        // Map each nucleotide to a 2-bit code.
        HashMap<Character, Integer> map = new HashMap<Character, Integer>();
        map.put('A', 0);
        map.put('C', 1);
        map.put('G', 2);
        map.put('T', 3);
        // counted: hash of a 10-letter window -> number of times seen so far.
        HashMap<Integer, Integer> counted = new HashMap<Integer, Integer>();
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            if (i < 9) {
                hash = (hash << 2) + map.get(s.charAt(i));
            } else {
                hash = (hash << 2) + map.get(s.charAt(i));
                hash = hash & ((1 << 20) - 1);
                if (counted.containsKey(hash) && counted.get(hash) == 1) {
                    // Second occurrence: report the window exactly once.
                    ret.add(s.substring(i - 9, i + 1));
                    counted.put(hash, 2);
                } else {
                    if (counted.containsKey(hash))
                        counted.put(hash, counted.get(hash) + 1);
                    else
                        counted.put(hash, 1);
                }
            }
        }
        return ret;
    }
}

Actually, since we only care about whether a subsequence has appeared at least twice, we can use two HashSets to avoid the ugly code above.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        ArrayList<String> ret = new ArrayList<String>();
        if (s.length() < 10)
            return ret;
        // Map each nucleotide to a 2-bit code.
        HashMap<Character, Integer> map = new HashMap<Character, Integer>();
        map.put('A', 0);
        map.put('C', 1);
        map.put('G', 2);
        map.put('T', 3);
        // appeared: hashes seen at least once; counted: hashes already reported.
        HashSet<Integer> appeared = new HashSet<Integer>();
        HashSet<Integer> counted = new HashSet<Integer>();
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            if (i < 9) {
                hash = (hash << 2) + map.get(s.charAt(i));
            } else {
                hash = (hash << 2) + map.get(s.charAt(i));
                hash = hash & ((1 << 20) - 1);
                if (appeared.contains(hash) && !counted.contains(hash)) {
                    ret.add(s.substring(i - 9, i + 1));
                    counted.add(hash);
                } else {
                    appeared.add(hash);
                }
            }
        }
        return ret;
    }
}
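
The two-set idea can be tightened a little further by relying on the boolean that Set.add returns (false when the element was already present). The following is only a sketch of that variant, not the solution above: the class name RepeatedDnaSketch, the helper code, and the use of "ACGT".indexOf in place of the HashMap are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical variant for illustration; not the original author's code.
final class RepeatedDnaSketch {
    // Position in "ACGT" doubles as the 2-bit code: A=0, C=1, G=2, T=3.
    static int code(char c) {
        int v = "ACGT".indexOf(c);
        if (v < 0) {
            throw new IllegalArgumentException("not a nucleotide: " + c);
        }
        return v;
    }

    static List<String> findRepeatedDnaSequences(String s) {
        List<String> ret = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();      // hashes seen at least once
        Set<Integer> reported = new HashSet<>();  // hashes already added to ret
        int mask = (1 << 20) - 1;
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            hash = ((hash << 2) | code(s.charAt(i))) & mask;
            if (i < 9) {
                continue; // window not full yet
            }
            // seen.add is false on a repeat; reported.add is true only the first time.
            if (!seen.add(hash) && reported.add(hash)) {
                ret.add(s.substring(i - 9, i + 1));
            }
        }
        return ret;
    }

    public static void main(String[] args) {
        // Example from the problem statement.
        System.out.println(findRepeatedDnaSequences("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
        // prints [AAAAACCCCC, CCCCCAAAAA]
    }
}
```

Strings shorter than 10 characters need no special case here, because the loop never reaches a full window and the empty list is returned.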
