[LeetCode] Short Encoding of Words 单词集的短编码

Given a list of words, we may encode it by writing a reference string S and a list of indexes A.

For example, if the list of words is ["time", "me", "bell"], we can write it as S = "time#bell#" and indexes = [0, 2, 5].

Then for each index, we will recover the word by reading from the reference string from that index until we reach a "#" character.

What is the length of the shortest reference string S possible that encodes the given words?

Example:

Input: words = ["time", "me", "bell"]

Output: 10

Explanation: S = "time#bell#" and indexes = [0, 2, 5].

Note:

1 <= words.length <= 2000.
1 <= words[i].length <= 7.
Each word has only lowercase letters.

这道题给了我们一个单词数组，让我们对其编码，不同的单词之间加入#号，每个单词的起点放在一个坐标数组内，终点就是#号，能合并的单词要进行合并，问输入字符串的最短长度。题意不难理解，难点在于如何合并单词，我们观察题目的那个例子，me和time是能够合并的，只要标清楚其实位置，time的起始位置是0，me的起始位置是2，那么根据#号位置的不同就可以顺利的取出me和time。需要注意的是，如果me换成im，或者tim的话，就不能合并了，因为我们是要从起始位置到#号之前所有的字符都要取出来。搞清楚了这一点之后，我们在接着观察，由于me是包含在time中的，所以我们处理的顺序应该是先有time#，然后再看能否包含me，而不是先生成了me#之后再处理time，所以我们可以得出结论，应该先处理长单词，那么就给单词数组按长度排序一下就行，自己重写一个comparator就行。然后我们遍历数组，对于每个单词，我们都在编码字符串查找一下，如果没有的话，直接加上这个单词，再加一个#号，如果有的话，就可以得到出现的位置。比如在time#中查找me，得到found=2，然后我们要验证该单词后面是否紧跟着一个#号，所以我们直接访问found+word.size()这个位置，如果不是#号，说明不能合并，我们还是要加上这个单词和#号。最后返回编码字符串的长度即可，参见代码如下：

解法一：

class Solution {

public:

    int minimumLengthEncoding(vector<string>& words) {

        string str = "";

        sort(words.begin(), words.end(), [](string& a, string& b){return a.size() > b.size();});

        for (string word : words) {

            int found = str.find(word);

            if (found == string::npos || str[found + word.size()] != '#') {

                str += word + "#";

            }

        }

        return str.size();

    }

};

我们再来看一种不用自定义comparator的方法，根据之前的分析，我们知道其实是在找单词的后缀，比如me就是time的后缀。我们希望将能合并的单词排在一起，比较好处理，而后缀又不好排序。那么我们就将其转为前缀，做法就是给每个单词翻转一下，time变成emit，me变成em，这样我们只要用默认的字母顺序排，就可以得到em，emit的顺序，那么能合并的单词就放到一起了，而且一定是当前的合并到后面一个，那么就好做很多了。我们只要判读当前单词是否是紧跟着的单词的前缀，是的话就加0，不是的话就要加上当前单词的长度并再加1，多加的1是#号。判断前缀的方法很简单，直接在后面的单词中取相同长度的前缀比较就行了。由于我们每次都要取下一个单词，为了防止越界，只处理到倒数第二个单词，那么就要把最后一个单词的长度加入结果res，并再加1即可，参见代码如下：

解法二：

class Solution {

public:

    int minimumLengthEncoding(vector<string>& words) {

        int res = , n = words.size();

        for (int i = ; i < n; ++i) reverse(words[i].begin(), words[i].end());

        sort(words.begin(), words.end());

        for (int i = ; i < n - ; ++i) {

            res += (words[i] == words[i + ].substr(, words[i].size())) ?   : words[i].size() + ;

        }

        return res + words.back().size() + ;

    }

};

接下来的这种方法也很巧妙，用了一个HashSet，将所有的单词先放到这个HashSet中。原理是对于每个单词，我们遍历其所有的后缀，比如time，那么就遍历ime，me，e，然后看HashSet中是否存在这些后缀，有的话就删掉，那么HashSet中的me就会被删掉，这样保证了留下来的单词不可能再合并了，最后再加上每个单词的长度到结果res，并且同时要加上#号的长度，参见代码如下：

解法三：

class Solution {

public:

    int minimumLengthEncoding(vector<string>& words) {

        int res = ;

        unordered_set<string> st(words.begin(), words.end());

        for (string word : st) {

            for (int i = ; i < word.size(); ++i) {

                st.erase(word.substr(i));

            }

        }

        for (string word : st) res += word.size() + ;

        return res;

    }

};

参考资料：

https://leetcode.com/problems/short-encoding-of-words/

https://leetcode.com/problems/short-encoding-of-words/discuss/125825/Easy-to-understand-Java-solution

https://leetcode.com/problems/short-encoding-of-words/discuss/125822/C%2B%2B-4-lines-reverse-and-sort

https://leetcode.com/problems/short-encoding-of-words/discuss/125811/C%2B%2BJavaPython-Easy-Understood-Solution-with-Explanation

LeetCode All in One 题目讲解汇总(持续更新中...)

[LeetCode] Short Encoding of Words 单词集的短编码的更多相关文章

LC 820. Short Encoding of Words
Given a list of words, we may encode it by writing a reference string S and a list of indexes A. For ...
2.keras实现-->字符级或单词级的one-hot编码 VS 词嵌入
1. one-hot编码 # 字符集的one-hot编码 import string samples = ['zzh is a pig','he loves himself very much','p ...
【LeetCode】820. 单词的压缩编码 Short Encoding of Words（Python）
作者: 负雪明烛 id: fuxuemingzhu 个人博客: http://fuxuemingzhu.cn/ 题目地址:https://leetcode-cn.com/problems/short- ...
[Swift]LeetCode820. 单词的压缩编码 | Short Encoding of Words
Given a list of words, we may encode it by writing a reference string S and a list of indexes A. For ...
【leetcode】820. Short Encoding of Words
题目如下: 解题思路:本题考查就是找出一个单词是不是另外一个单词的后缀,如果是的话,就可以Short Encode.所以,我们可以把words中每个单词倒置后排序,然后遍历数组,每个元素只要和其后面相 ...
[LeetCode] Concatenated Words 连接的单词
Given a list of words (without duplicates), please write a program that returns all concatenated wor ...
[LeetCode] Valid Word Square 验证单词平方
Given a sequence of words, check whether it forms a valid word square. A sequence of words forms a v ...
[LeetCode] Valid Word Abbreviation 验证单词缩写
Given a non-empty string s and an abbreviation abbr, return whether the string matches with the give ...
[LeetCode] Shortest Word Distance 最短单词距离
Given a list of words and two words word1 and word2, return the shortest distance between these two ...

随机推荐

MongoDB 分片集群技术
在了解分片集群之前,务必要先了解复制集技术! 1.1 MongoDB复制集简介一组Mongodb复制集,就是一组mongod进程,这些进程维护同一个数据集合.复制集提供了数据冗余和高等级的可靠性,这 ...
Win7 x64位打开VirtualBox报错处理。
错误代码如下: Failed to instantiate CLSID_VirtualBox w/ IVirtualBox, but CLSID_VirtualBox w/ IUnknown work ...
移动端bug集合
移动端软键盘遮挡输入框&IOS移动端点击输入框字体放大&IOS点击闪烁移动端软键盘遮挡输入框 ——> 转载自: https://www.jb51.net/article/1 ...
🍓vue & react 一些重要但没必要死记硬背的东西
Invalid character found in the request target. The valid characters are defined in RFC 7230 and RFC
解决Invalid character found in the request target. The valid characters are defined in RFC 7230 and RF ...
POJ 1251 Jungle Roads （最小生成树）
题目: Description The Head Elder of the tropical island of Lagrishan has a problem. A burst of foreign ...
转-filebeat 源码分析
背景在基于elk的日志系统中,filebeat几乎是其中必不可少的一个组件,例外是使用性能较差的logstash file input插件或自己造个功能类似的轮子:). 在使用和了解filebeat ...
PHP实用工具类
2018年10月25日 20:21:09 组装SQL语句适用MySQL, 链式调用, 组装后可传参选择是否查询数据库项目地址: 码云生成数据字典适用MySQL, 链式调用, 可生成word, ...
WPF样式中TargetType 属性 (Property) 和 x:Key 属性 (Attribute)
如第一个示例所示,如果将 TargetType 属性设置为 TextBlock 而不为样式分配 x:Key,样式就会应用于所有 TextBlock 元素.这种情况下,x:Key 隐式设置为 {x:Ty ...
洛谷P5206 [WC2019]数树 [容斥，DP，生成函数，NTT]
传送门 Orz神仙题,让我长了许多见识. 长式子警告思路 y=1 由于y=1时会导致后面一些式子未定义,先抓出来. printf("%lld",opt==0?1:(opt==1? ...

[LeetCode] Short Encoding of Words 单词集的短编码

[LeetCode] Short Encoding of Words 单词集的短编码的更多相关文章

随机推荐

热门专题