[LeetCode#187]Repeated DNA Sequences
Problem:
- All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
- Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
- For example,
- Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",
- Return:
- ["AAAAACCCCC", "CCCCCAAAAA"].
Analysis:
- This problem has a clever solution based on bit manipulation.
- If you have not seen the trick before, it is hard to come up with on your own.
- Idea:
- Since there are only four characters, "A", "C", "G" and "T", we can map each character to its own 2-bit code. (Note: this is not the ASCII code.)
- Each subsequence is 10 characters long, so after mapping it takes up only 20 bits. (Since an int in Java has 32 bits, a subsequence fits into a single Integer, which we use as its hash code.)
- Another benefit of this mapping is that whenever a new character is appended, we can update the hash code with a bit shift.
- 1. Prepare the HashMap for the mapping.
- HashMap<Character, Integer> map = new HashMap<Character, Integer> ();
- map.put('A', 0);
- map.put('C', 1);
- map.put('G', 2);
- map.put('T', 3);
- 2. Move the subsequence window and compute the related hash code.
- int hash = 0;
- for (int i = 0; i < s.length(); i++) {
-     if (i < 9) {
-         hash = (hash << 2) + map.get(s.charAt(i));
-     } else {
-         hash = (hash << 2) + map.get(s.charAt(i));
-         hash = hash & ((1 << 20) - 1);
-         ...
-     }
- }
- Note: once the sliding window reaches 10 characters, we need the hash code of exactly that window. The trick is to '&' the hash with a mask of twenty 1 bits, which keeps only the lowest 20 bits and drops the character that just left the window.
- 2.1 Get twenty '1' bits.
- ((1 << 20) - 1)
- The idea is not hard. For example, 4 - 1 in binary is 100 - 1 = 011.
- 2.2 Use the '&' operator to get the bits.
- hash = hash & ((1 << 20) - 1);
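The shift-and-mask step can be tried in isolation. The following is a minimal sketch, not part of the original solution: the class name `RollingHashDemo` and the helpers `code`, `roll` and `hashOfLastWindow` are hypothetical names introduced only for illustration, and a `switch` replaces the HashMap lookup to keep the example short.

```java
public class RollingHashDemo {
    // Twenty 1 bits: keeps exactly the 10 most recent 2-bit codes.
    static final int MASK = (1 << 20) - 1;

    // Hypothetical helper: maps a nucleotide to its 2-bit code (not its ASCII code).
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    // Pushes one character into the window hash and drops whatever
    // has been shifted beyond the lowest 20 bits.
    static int roll(int hash, char c) {
        return ((hash << 2) | code(c)) & MASK;
    }

    // Hash of the last 10 characters of s (of all of s if it is shorter).
    static int hashOfLastWindow(String s) {
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            hash = roll(hash, s.charAt(i));
        }
        return hash;
    }

    public static void main(String[] args) {
        // prints 11111111111111111111 (twenty 1s)
        System.out.println(Integer.toBinaryString(MASK));
        // The leading 'G' has left the window, so both hashes agree: prints true
        System.out.println(hashOfLastWindow("AAAAACCCCC") == hashOfLastWindow("GAAAAACCCCC"));
    }
}
```

Because the mask is applied on every step, the first 9 characters need no special case here: masking a value that already fits in 20 bits leaves it unchanged.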
- Errors:
- When you put a <key, value> pair into a HashMap and the new value depends on the value already stored there, you must first check whether the key exists.
- if (counted.containsKey(hash))
-     counted.put(hash, counted.get(hash)+1);
- else
-     counted.put(hash, 1);
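On Java 8 or later, the same check-then-update pattern can be written with `Map.merge`, which handles the missing-key case itself. This is only a sketch of that alternative; the class `CountDemo` and the method `countAll` are hypothetical names used for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class CountDemo {
    // Counts how often each hash value occurs.
    static Map<Integer, Integer> countAll(int[] hashes) {
        Map<Integer, Integer> counted = new HashMap<>();
        for (int h : hashes) {
            // Stores 1 if the key is absent, otherwise adds 1 to the stored count.
            counted.merge(h, 1, Integer::sum);
        }
        return counted;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> counted = countAll(new int[] {7, 7, 3});
        System.out.println(counted.get(7)); // prints 2
        System.out.println(counted.get(3)); // prints 1
    }
}
```

`counted.put(h, counted.getOrDefault(h, 0) + 1)` is an equivalent one-liner if `merge` feels less readable.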
Solution:
- public class Solution {
-     public List<String> findRepeatedDnaSequences(String s) {
-         ArrayList<String> ret = new ArrayList<String> ();
-         if (s.length() < 10)
-             return ret;
-         HashMap<Character, Integer> map = new HashMap<Character, Integer> ();
-         map.put('A', 0);
-         map.put('C', 1);
-         map.put('G', 2);
-         map.put('T', 3);
-         HashMap<Integer, Integer> counted = new HashMap<Integer, Integer> ();
-         int hash = 0;
-         for (int i = 0; i < s.length(); i++) {
-             if (i < 9) {
-                 hash = (hash << 2) + map.get(s.charAt(i));
-             } else {
-                 hash = (hash << 2) + map.get(s.charAt(i));
-                 hash = hash & ((1 << 20) - 1);
-                 if (counted.containsKey(hash) && counted.get(hash) == 1) {
-                     ret.add(s.substring(i-9, i+1));
-                     counted.put(hash, 2);
-                 } else {
-                     if (counted.containsKey(hash))
-                         counted.put(hash, counted.get(hash)+1);
-                     else
-                         counted.put(hash, 1);
-                 }
-             }
-         }
-         return ret;
-     }
- }
- Actually, since we only care whether a subsequence has appeared at least twice, we can use two HashSets and avoid the clumsy counting code above.
- public class Solution {
-     public List<String> findRepeatedDnaSequences(String s) {
-         ArrayList<String> ret = new ArrayList<String> ();
-         if (s.length() < 10)
-             return ret;
-         HashMap<Character, Integer> map = new HashMap<Character, Integer> ();
-         map.put('A', 0);
-         map.put('C', 1);
-         map.put('G', 2);
-         map.put('T', 3);
-         HashSet<Integer> appeared = new HashSet<Integer> ();
-         HashSet<Integer> counted = new HashSet<Integer> ();
-         int hash = 0;
-         for (int i = 0; i < s.length(); i++) {
-             if (i < 9) {
-                 hash = (hash << 2) + map.get(s.charAt(i));
-             } else {
-                 hash = (hash << 2) + map.get(s.charAt(i));
-                 hash = hash & ((1 << 20) - 1);
-                 if (appeared.contains(hash) && !counted.contains(hash)) {
-                     ret.add(s.substring(i-9, i+1));
-                     counted.add(hash);
-                 } else {
-                     appeared.add(hash);
-                 }
-             }
-         }
-         return ret;
-     }
- }
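To run the two-set idea outside the LeetCode judge, a self-contained variant with explicit imports might look like the sketch below. It is an illustration rather than the original submission: the class name `RepeatedDnaDemo` and the helper `code` are hypothetical, and the sketch relies on `Set.add` returning `false` when the element is already present, which folds the `contains` checks into the `add` calls.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RepeatedDnaDemo {
    // Hypothetical helper: 2-bit code for each nucleotide.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    static List<String> findRepeatedDnaSequences(String s) {
        List<String> ret = new ArrayList<>();
        Set<Integer> appeared = new HashSet<>(); // windows seen at least once
        Set<Integer> counted = new HashSet<>();  // windows already reported
        int mask = (1 << 20) - 1;
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            hash = ((hash << 2) | code(s.charAt(i))) & mask;
            if (i < 9) {
                continue; // window not yet 10 characters long
            }
            // appeared.add() is false on a repeat; counted.add() is true
            // only the first time that repeat is reported.
            if (!appeared.add(hash) && counted.add(hash)) {
                ret.add(s.substring(i - 9, i + 1));
            }
        }
        return ret;
    }

    public static void main(String[] args) {
        // prints [AAAAACCCCC, CCCCCAAAAA]
        System.out.println(findRepeatedDnaSequences("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```

Strings shorter than 10 characters never get past the `i < 9` guard, so the explicit length check from the solutions above is not needed in this variant.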