Given a list of directory info, including the directory path and all the files with contents in that directory, you need to find all the groups of duplicate files in the file system in terms of their paths.

A group of duplicate files consists of at least two files that have exactly the same content.

A single directory info string in the input list has the following format:

"root/d1/d2/.../dm f1.txt(f1_content) f2.txt(f2_content) ... fn.txt(fn_content)"

It means there are n files (f1.txt, f2.txt, ..., fn.txt with content f1_content, f2_content, ..., fn_content, respectively) in directory root/d1/d2/.../dm. Note that n >= 1 and m >= 0. If m = 0, it means the directory is just the root directory.

The output is a list of groups of duplicate file paths. Each group contains all the file paths of the files that have the same content. A file path is a string that has the following format:

"directory_path/file_name.txt"

Example 1:

Input:
["root/a 1.txt(abcd) 2.txt(efgh)", "root/c 3.txt(abcd)", "root/c/d 4.txt(efgh)", "root 4.txt(efgh)"]
Output:
[["root/a/2.txt","root/c/d/4.txt","root/4.txt"],["root/a/1.txt","root/c/3.txt"]]

Note:

  1. No order is required for the final output.
  2. You may assume the directory name, file name and file content only have letters and digits, and the length of file content is in the range of [1,50].
  3. The number of files given is in the range of [1,20000].
  4. You may assume no files or directories share the same name in the same directory.
  5. You may assume each given directory info represents a unique directory. Directory path and file info are separated by a single blank space.

Follow-up beyond contest:

  1. Imagine you are given a real file system, how will you search files? DFS or BFS?
  2. If the file content is very large (GB level), how will you modify your solution?
  3. If you can only read the file by 1kb each time, how will you modify your solution?
  4. What is the time complexity of your modified solution? What is the most time-consuming part and memory consuming part of it? How to optimize?
  5. How to make sure the duplicated files you find are not false positive?
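For follow-ups 2 and 3, a common answer is to fingerprint each file incrementally instead of loading it whole: read 1 KB at a time, fold each chunk into a running hash, group files by (size, hash), and byte-compare only the files inside a bucket to rule out false positives (follow-up 5). Below is a minimal sketch, with FNV-1a standing in for a real digest such as MD5 or SHA-256; the fingerprint function is illustrative, not part of the contest problem.

#include <cstdint>
#include <fstream>
#include <string>

// Hash a file by reading it 1 KB at a time (follow-up 3), so a GB-sized file
// (follow-up 2) never has to fit in memory. Equal fingerprints are only
// duplicate *candidates*: a byte-by-byte comparison is still needed to rule
// out hash-collision false positives (follow-up 5).
uint64_t fingerprint(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    uint64_t h = 14695981039346656037ULL;     // FNV-1a 64-bit offset basis
    char buf[1024];                           // 1 KB read window
    while (in.read(buf, sizeof(buf)) || in.gcount() > 0) {
        for (std::streamsize i = 0; i < in.gcount(); i++) {
            h ^= static_cast<unsigned char>(buf[i]);
            h *= 1099511628211ULL;            // FNV-1a 64-bit prime
        }
    }
    return h;
}

Grouping by file size first avoids hashing most files at all: only size collisions need a fingerprint, and only fingerprint collisions need a full comparison.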

Runtime: 84 ms, faster than 49.48% of C++ online submissions for Find Duplicate File in System.

Simple string parsing: split each directory info string on blanks, map each file's content to its full path, and output every group of paths with at least two members.

#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
public:
    unordered_map<string, vector<string>> mp;  // file content -> all paths with that content

    void process(const string& s){
        // Split the directory info string on blanks: filecontent[0] is the
        // directory path, the remaining tokens are "name.txt(content)".
        vector<string> filecontent;
        vector<string> emptycontent;  // paths of empty files; collected but never grouped
        int idx = 0;
        for(int i = 0; i < s.size(); i++){
            if(s[i] == ' '){
                filecontent.push_back(s.substr(idx, i - idx));
                idx = i + 1;
            }
        }
        filecontent.push_back(s.substr(idx));
        for(int i = 1; i < filecontent.size(); i++){
            for(int j = 0; j < filecontent[i].size(); j++){
                if(filecontent[i][j] == '('){
                    if(filecontent[i][j + 1] == ')'){
                        // "name.txt()" has empty content.
                        emptycontent.push_back(filecontent[0] + "/" + filecontent[i].substr(0, j));
                    } else {
                        // The content sits between '(' and ')'; key the map on it
                        // and record the full path "directory_path/name.txt".
                        auto tmp = filecontent[i].substr(j + 1, filecontent[i].size() - j - 2);
                        mp[tmp].push_back(filecontent[0] + "/" + filecontent[i].substr(0, j));
                    }
                }
            }
        }
    }

    vector<vector<string>> findDuplicate(vector<string>& paths) {
        vector<vector<string>> ret;
        for(int i = 0; i < paths.size(); i++){
            process(paths[i]);
        }
        // Any content shared by two or more paths forms a duplicate group.
        for(auto it = mp.begin(); it != mp.end(); it++){
            if(it->second.size() >= 2){
                ret.push_back(it->second);
            }
        }
        return ret;
    }
};
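
A minimal driver (assumed, not part of the original submission) that runs the solution on Example 1 and prints each duplicate group, one per line:

#include <iostream>

int main() {
    vector<string> paths = {"root/a 1.txt(abcd) 2.txt(efgh)",
                            "root/c 3.txt(abcd)",
                            "root/c/d 4.txt(efgh)",
                            "root 4.txt(efgh)"};
    Solution sol;
    for (const auto& group : sol.findDuplicate(paths)) {
        for (const auto& path : group) cout << path << ' ';
        cout << '\n';
    }
    return 0;
}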
