Problem:

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

Analysis:

This problem has an ingenious solution.
If you have not encountered it before, it is hard to come up with on your own.

Idea:
Since there are only four characters, "A", "C", "G" and "T", we can map each character to a unique 2-bit code. (Note: this is not the ASCII code.)
Each subsequence is 10 characters long, so after mapping it takes up only 20 bits. Since an int in Java has 32 bits, a whole subsequence fits in a single Integer, which we use as its hash code. Because the mapping is one-to-one, two windows share a hash code only if they are the same sequence.

Another benefit of this mapping is that whenever a new character enters the window, we can update the hash code with a bit shift instead of recomputing it from scratch.

1. Prepare the HashMap for the mapping.

    HashMap<Character, Integer> map = new HashMap<Character, Integer>();
    map.put('A', 0);
    map.put('C', 1);
    map.put('G', 2);
    map.put('T', 3);

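To make the 2-bit idea concrete, here is a minimal sketch that packs a whole 10-letter window into an int and unpacks it again. The class name `DnaEncoding` and the helpers `code`, `encode` and `decode` are hypothetical names used only for illustration; a `switch` stands in for the HashMap lookup.

```java
class DnaEncoding {
    // Map a nucleotide to its 2-bit code: A=00, C=01, G=10, T=11.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("Not a nucleotide: " + c);
        }
    }

    // Pack a sequence into an int, first letter in the highest used bits.
    // Assumes at most 16 letters so the result fits in 32 bits.
    static int encode(String seq) {
        int hash = 0;
        for (int i = 0; i < seq.length(); i++) {
            hash = (hash << 2) | code(seq.charAt(i));
        }
        return hash;
    }

    // Reverse of encode: read 2 bits at a time from the low end.
    static String decode(int hash, int length) {
        char[] out = new char[length];
        for (int i = length - 1; i >= 0; i--) {
            out[i] = "ACGT".charAt(hash & 3);
            hash >>>= 2;
        }
        return new String(out);
    }
}
```

Under this mapping, `DnaEncoding.encode("AAAAACCCCC")` is 341 (binary 0101010101 in the low ten bits), and `DnaEncoding.decode(341, 10)` gives the original string back, which shows that the hash loses no information.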
2. Slide the subsequence window and compute the related hash code.

    int hash = 0;
    for (int i = 0; i < s.length(); i++) {
        if (i < 9) {
            hash = (hash << 2) + map.get(s.charAt(i));
        } else {
            hash = (hash << 2) + map.get(s.charAt(i));
            hash = hash & ((1 << 20) - 1);
            ...
        }
    }

Note: once the sliding window reaches 10 characters, we take the hash code of the window. The trick is to '&' the hash with a mask of twenty 1-bits, which keeps the low 20 bits (the last 10 characters) and drops the character that just left the window.

2.1 Get twenty '1' bits.
    ((1 << 20) - 1)
The idea is not hard: in binary, 4 - 1 = 100 - 1 = 011.

2.2 Use the '&' operator to keep those bits.
    hash = hash & ((1 << 20) - 1);

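The shift-and-mask step can be checked in isolation. The sketch below is only an illustration under stated assumptions: `RollingWindowDemo`, `MASK`, `code`, `roll` and `encode` are hypothetical names, and `"ACGT".indexOf` replaces the HashMap lookup, which assumes the input contains only the four nucleotide letters.

```java
class RollingWindowDemo {
    // Twenty 1-bits: keeps exactly the last 10 two-bit codes.
    static final int MASK = (1 << 20) - 1;

    // 2-bit code of a nucleotide (assumes c is one of A, C, G, T).
    static int code(char c) {
        return "ACGT".indexOf(c);
    }

    // Append one character and drop whatever falls outside the 20-bit window.
    static int roll(int hash, char next) {
        return ((hash << 2) | code(next)) & MASK;
    }

    // Hash of a window computed from scratch, for comparison with roll().
    static int encode(String window) {
        int hash = 0;
        for (int i = 0; i < window.length(); i++) {
            hash = roll(hash, window.charAt(i));
        }
        return hash;
    }
}
```

For example, starting from `encode("AAAAACCCCC")` and calling `roll(hash, 'A')` yields the same value as `encode("AAAACCCCCA")`: the leading "A" is shifted above bit 19 and removed by the mask.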
Errors:
When the value you put into a HashMap depends on the value already stored under that key, you must first check whether the key exists.

    if (counted.containsKey(hash))
        counted.put(hash, counted.get(hash) + 1);
    else
        counted.put(hash, 1);
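
As an alternative to the explicit check, Java 8 and later offer `Map.merge`, which handles the missing-key case itself. The following is a small sketch, not the code used in the solution; `HashCounter` and `countAll` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class HashCounter {
    // Count how often each hash occurs. merge() stores 1 when the key is
    // absent and otherwise combines the old value with 1 via Integer::sum.
    static Map<Integer, Integer> countAll(int[] hashes) {
        Map<Integer, Integer> counted = new HashMap<>();
        for (int hash : hashes) {
            counted.merge(hash, 1, Integer::sum);
        }
        return counted;
    }
}
```

With this form there is no separate branch for the first occurrence, so the error described above cannot happen.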

Solution:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;

    public class Solution {
        public List<String> findRepeatedDnaSequences(String s) {
            ArrayList<String> ret = new ArrayList<String>();
            if (s.length() < 10)
                return ret;
            // 2-bit code for each nucleotide
            HashMap<Character, Integer> map = new HashMap<Character, Integer>();
            map.put('A', 0);
            map.put('C', 1);
            map.put('G', 2);
            map.put('T', 3);

            // hash of a 10-letter window -> number of times it has been seen
            HashMap<Integer, Integer> counted = new HashMap<Integer, Integer>();
            int hash = 0;
            for (int i = 0; i < s.length(); i++) {
                if (i < 9) {
                    // window not full yet: only accumulate
                    hash = (hash << 2) + map.get(s.charAt(i));
                } else {
                    hash = (hash << 2) + map.get(s.charAt(i));
                    hash = hash & ((1 << 20) - 1); // keep the last 10 characters
                    if (counted.containsKey(hash) && counted.get(hash) == 1) {
                        // second occurrence: report the window exactly once
                        ret.add(s.substring(i - 9, i + 1));
                        counted.put(hash, 2);
                    } else {
                        if (counted.containsKey(hash))
                            counted.put(hash, counted.get(hash) + 1);
                        else
                            counted.put(hash, 1);
                    }
                }
            }
            return ret;
        }
    }

Actually, since we only care whether a subsequence has appeared at least twice, we can use two HashSets instead of the counting map and avoid the clumsy branching above.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.List;

    public class Solution {
        public List<String> findRepeatedDnaSequences(String s) {
            ArrayList<String> ret = new ArrayList<String>();
            if (s.length() < 10)
                return ret;
            // 2-bit code for each nucleotide
            HashMap<Character, Integer> map = new HashMap<Character, Integer>();
            map.put('A', 0);
            map.put('C', 1);
            map.put('G', 2);
            map.put('T', 3);
            // appeared: hashes seen at least once; counted: hashes already reported
            HashSet<Integer> appeared = new HashSet<Integer>();
            HashSet<Integer> counted = new HashSet<Integer>();

            int hash = 0;
            for (int i = 0; i < s.length(); i++) {
                if (i < 9) {
                    // window not full yet: only accumulate
                    hash = (hash << 2) + map.get(s.charAt(i));
                } else {
                    hash = (hash << 2) + map.get(s.charAt(i));
                    hash = hash & ((1 << 20) - 1); // keep the last 10 characters
                    if (appeared.contains(hash) && !counted.contains(hash)) {
                        ret.add(s.substring(i - 9, i + 1));
                        counted.add(hash);
                    } else {
                        appeared.add(hash);
                    }
                }
            }
            return ret;
        }
    }
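
The two-set approach can be tightened a little further by relying on the boolean that `Set.add` returns (false when the element was already present). The version below is a hedged sketch of that variation, not the author's code: the class name `RepeatedDnaSketch` is hypothetical, `"ACGT".indexOf` replaces the HashMap, and the input is assumed to contain only A, C, G and T.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RepeatedDnaSketch {
    static List<String> findRepeatedDnaSequences(String s) {
        List<String> ret = new ArrayList<>();
        Set<Integer> appeared = new HashSet<>(); // hashes seen at least once
        Set<Integer> counted = new HashSet<>();  // hashes already reported
        int mask = (1 << 20) - 1;                // twenty 1-bits
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            // Masking before the window is full is harmless: the hash is
            // still shorter than 20 bits at that point.
            hash = ((hash << 2) | "ACGT".indexOf(s.charAt(i))) & mask;
            // add() returns false if the hash was already in the set, so the
            // window is reported on its second occurrence and only once.
            if (i >= 9 && !appeared.add(hash) && counted.add(hash)) {
                ret.add(s.substring(i - 9, i + 1));
            }
        }
        return ret;
    }
}
```

Calling `RepeatedDnaSketch.findRepeatedDnaSequences("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT")` returns ["AAAAACCCCC", "CCCCCAAAAA"], matching the example in the problem statement.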
