原题链接在这里:https://leetcode.com/problems/find-duplicate-file-in-system/description/

题目:

Given a list of directory info including directory path, and all the files with contents in this directory, you need to find out all the groups of duplicate files in the file system in terms of their paths.

A group of duplicate files consists of at least two files that have exactly the same content.

A single directory info string in the input list has the following format:

"root/d1/d2/.../dm f1.txt(f1_content) f2.txt(f2_content) ... fn.txt(fn_content)"

It means there are n files (f1.txtf2.txt ... fn.txt with content f1_contentf2_content ... fn_content, respectively) in directory root/d1/d2/.../dm. Note that n >= 1 and m >= 0. If m = 0, it means the directory is just the root directory.

The output is a list of group of duplicate file paths. For each group, it contains all the file paths of the files that have the same content. A file path is a string that has the following format:

"directory_path/file_name.txt"

Example 1:

Input:
["root/a 1.txt(abcd) 2.txt(efgh)", "root/c 3.txt(abcd)", "root/c/d 4.txt(efgh)", "root 4.txt(efgh)"]
Output:
[["root/a/2.txt","root/c/d/4.txt","root/4.txt"],["root/a/1.txt","root/c/3.txt"]]

Note:

  1. No order is required for the final output.
  2. You may assume the directory name, file name and file content only has letters and digits, and the length of file content is in the range of [1,50].
  3. The number of files given is in the range of [1,20000].
  4. You may assume no files or directories share the same name in the same directory.
  5. You may assume each given directory info represents a unique directory. Directory path and file info are separated by a single blank space.

Follow-up beyond contest:

  1. Imagine you are given a real file system, how will you search files? DFS or BFS?
  2. If the file content is very large (GB level), how will you modify your solution?
  3. If you can only read the file by 1kb each time, how will you modify your solution?
  4. What is the time complexity of your modified solution? What is the most time-consuming part and memory consuming part of it? How to optimize?
  5. How to make sure the duplicated files you find are not false positive?

题解:

用HashMap<String, List<String>> hm 来保存每个file 的content 和对应的path集合.

每个input string 按照 path fileName1(content1) fileName2(content2) 格式输入. 所以先按照空格断开,后面的都是文件名加上内容,再用"("断开提取内容.

最后看内容对应文件数大于1的就是有duplicate.

Time Complexity: O(paths.length * x). x为input string的平均长度.

Space:O(paths.length * x). hm size.

AC Java:

 class Solution {
public List<List<String>> findDuplicate(String[] paths) {
List<List<String>> res = new ArrayList<List<String>>();
HashMap<String, List<String>> hm = new HashMap<String, List<String>>(); for(String path : paths){
String [] pathArr = path.split("\\s+");
for(int i = 1; i<pathArr.length; i++){
String content = pathArr[i].substring(pathArr[i].indexOf("("));
String fileName = pathArr[i].substring(0, pathArr[i].indexOf("("));
List<String> list = hm.getOrDefault(content, new ArrayList<String>());
list.add(pathArr[0] + "/" + fileName);
hm.put(content, list);
}
} for(String key : hm.keySet()){
if(hm.get(key).size() > 1){
res.add(hm.get(key));
}
} return res;
}
}

LeetCode Find Duplicate File in System的更多相关文章

  1. [LeetCode] Find Duplicate File in System 在系统中寻找重复文件

    Given a list of directory info including directory path, and all the files with contents in this dir ...

  2. LC 609. Find Duplicate File in System

    Given a list of directory info including directory path, and all the files with contents in this dir ...

  3. 【LeetCode】609. Find Duplicate File in System 解题报告(Python & C++)

    作者: 负雪明烛 id: fuxuemingzhu 个人博客: http://fuxuemingzhu.cn/ 目录 题目描述 题目大意 解题方法 日期 题目地址:https://leetcode.c ...

  4. 【leetcode】609. Find Duplicate File in System

    题目如下: Given a list of directory info including directory path, and all the files with contents in th ...

  5. [Swift]LeetCode609. 在系统中查找重复文件 | Find Duplicate File in System

    Given a list of directory info including directory path, and all the files with contents in this dir ...

  6. [leetcode-609-Find Duplicate File in System]

    https://discuss.leetcode.com/topic/91430/c-clean-solution-answers-to-follow-upGiven a list of direct ...

  7. 609. Find Duplicate File in System

    Given a list of directory info including directory path, and all the files with contents in this dir ...

  8. Find Duplicate File in System

    Given a list of directory info including directory path, and all the files with contents in this dir ...

  9. HDU 3269 P2P File Sharing System(模拟)(2009 Asia Ningbo Regional Contest)

    Problem Description Peer-to-peer(P2P) computing technology has been widely used on the Internet to e ...

随机推荐

  1. nginx日志分割总结

    nginx日志自己不会进行分个,所有日志都会累积的记录在 access.log,error.log 中,当请求量大,一天就能到几百兆,如果不进行分给,对日志的查看和写入性能都有影响. 1. 编写脚本n ...

  2. JAVA基础补漏--ArrayList

    今天在写代码的时候,index定义的时候用了Integer,在list.remove(index)的时候,总是不成功,后来发现如果用Integer定义的时候,index不再是基础数据类型,被识别为re ...

  3. 第十节课-RNN介绍

    2017-08-21 这次的课程介绍了RNN的相关知识: 首先是RNN的几种模型: 分别又不同的应用场景,包括机器翻译,视频的分类... RNN的解释: 主要的特点就是用到了上一个隐含状态的信息,所以 ...

  4. Java基本数据类型与相应的封装类

    基本数据类型   封装类 int Integer short   Short float   Float double   Double long   Long boolean   Boolean b ...

  5. Windows安装Ubuntu桌面操作系统到移动硬盘中以及错误解决

    用到的工具:U盘一个(usb3.0,你懂的),移动硬盘(我这个是笔记本里面取出来的机械硬盘装上的盒子) 第一步:下载Ubuntu系统iso镜像文件 下载Ubuntu系统iso镜像文件,由于我是新手,下 ...

  6. hdu 5776 sum 前缀和

    sum Time Limit: 2000/1000 MS (Java/Others)    Memory Limit: 131072/131072 K (Java/Others)Total Submi ...

  7. apue.3e源码下载及编译

    下载地址:http://www.apuebook.com/code3e.html 编译方法:http://blog.sina.com.cn/s/blog_94977c890102vdms.html

  8. TCP中间件_个人方案

    按照功能分类,不管是直接的 insert/delete/update/select语句 还是 调用存储过程,基本的功能 就是 增删改查.又分为两大类: (1).查询(会返回结果集的),(2).非查询( ...

  9. Mongodb笔记(二) Index

    版本:mongodb3.4; Index  : 如果mongodb不能使用索引进行排序,就会将数据放入内存中进行排序,而当内存使用超过32MB时,就会报错. 在创建索引时,应确保索引的选择力,避免多余 ...

  10. canvas画的北斗七星和大熊座

    用canvas画的北斗七星和大熊座,主要用到的知识点是:canvas.定时器. html代码: <body> <canvas id="canvas" width= ...