Moderate 加入空格使得可辨别单词数量最多 @CareerCup
递归题目,注意结合了memo的方法和trie的应用
package Moderate; import java.util.Hashtable; import CtCILibrary.AssortedMethods;
import CtCILibrary.Trie; /**
* Oh, no! You have just completed a lengthy document when you have an unfortu-
nate Find/Replace mishap. You have accidentally removed all spaces, punctuation,
and capitalization in the document. A sentence like "I reset the computer. It still
didn't boot!" would become "iresetthecomputeritstilldidntboot". You figure that you
can add back in the punctation and capitalization later, once you get the individual
words properly separated. Most of the words will be in a dictionary, but some strings,
like proper names, will not.
Given a dictionary (a list of words), design an algorithm to find the optimal way of
"unconcatenating" a sequence of words. In this case, "optimal" is defined to be the
parsing which minimizes the number of unrecognized sequences of characters.
For example, the string "jesslookedjustliketimherbrother" would be optimally parsed
as "JESS looked just like TIM her brother". This parsing has seven unrecognized char-
acters, which we have capitalized for clarity. 给一个string,把string内的所有标点,空格都去掉。然后要求找到把空格加回去使得不可辨别的
单词数量达到最少的方法(判断是否可以辨别是通过提供一个字典来判断) *
*/
public class S17_14 { public static String sentence;
public static Trie dictionary; /* incomplete code */
public static Result parse(int wordStart, int wordEnd, Hashtable<Integer, Result> cache) {
if (wordEnd >= sentence.length()) {
return new Result(wordEnd - wordStart, sentence.substring(wordStart).toUpperCase());
}
if (cache.containsKey(wordStart)) {
return cache.get(wordStart).clone();
}
String currentWord = sentence.substring(wordStart, wordEnd + 1);
boolean validPartial = dictionary.contains(currentWord, false);
boolean validExact = validPartial && dictionary.contains(currentWord, true); /* break current word */
Result bestExact = parse(wordEnd + 1, wordEnd + 1, cache);
if (validExact) {
bestExact.parsed = currentWord + " " + bestExact.parsed;
} else {
bestExact.invalid += currentWord.length();
bestExact.parsed = currentWord.toUpperCase() + " " + bestExact.parsed;
} /* extend current word */
Result bestExtend = null;
if (validPartial) {
bestExtend = parse(wordStart, wordEnd + 1, cache);
} /* find best */
Result best = Result.min(bestExact, bestExtend);
cache.put(wordStart, best.clone());
return best;
} public static int parseOptimized(int wordStart, int wordEnd, Hashtable<Integer, Integer> cache) {
if (wordEnd >= sentence.length()) {
return wordEnd - wordStart;
}
if (cache.containsKey(wordStart)) {
return cache.get(wordStart);
} String currentWord = sentence.substring(wordStart, wordEnd + 1);
boolean validPartial = dictionary.contains(currentWord, false); /* break current word */
int bestExact = parseOptimized(wordEnd + 1, wordEnd + 1, cache);
if (!validPartial || !dictionary.contains(currentWord, true)) {
bestExact += currentWord.length();
} /* extend current word */
int bestExtend = Integer.MAX_VALUE;
if (validPartial) {
bestExtend = parseOptimized(wordStart, wordEnd + 1, cache);
} /* find best */
int min = Math.min(bestExact, bestExtend);
cache.put(wordStart, min);
return min;
} public static int parseSimple(int wordStart, int wordEnd) {
if (wordEnd >= sentence.length()) {
return wordEnd - wordStart;
} String word = sentence.substring(wordStart, wordEnd + 1); /* break current word */
int bestExact = parseSimple(wordEnd + 1, wordEnd + 1);
if (!dictionary.contains(word, true)) {
bestExact += word.length();
} /* extend current word */
int bestExtend = parseSimple(wordStart, wordEnd + 1); /* find best */
return Math.min(bestExact, bestExtend);
} public static String clean(String str) {
char[] punctuation = {',', '"', '!', '.', '\'', '?', ','};
for (char c : punctuation) {
str = str.replace(c, ' ');
}
return str.replace(" ", "").toLowerCase();
} public static void main(String[] args) {
dictionary = AssortedMethods.getTrieDictionary();
sentence = "As one of the top companies in the world, Google will surely attract the attention of computer gurus. This does not, however, mean the company is for everyone.";
sentence = clean(sentence);
System.out.println(sentence);
//Result v = parse(0, 0, new Hashtable<Integer, Result>());
//System.out.println(v.parsed);
int v = parseOptimized(0, 0, new Hashtable<Integer, Integer>());
System.out.println(v);
} static class Result {
public int invalid = Integer.MAX_VALUE;
public String parsed = "";
public Result(int inv, String p) {
invalid = inv;
parsed = p;
} public Result clone() {
return new Result(this.invalid, this.parsed);
} public static Result min(Result r1, Result r2) {
if (r1 == null) {
return r2;
} else if (r2 == null) {
return r1;
} return r2.invalid < r1.invalid ? r2 : r1;
}
} }
Moderate 加入空格使得可辨别单词数量最多 @CareerCup的更多相关文章
- Storm监控文件夹变化 统计文件单词数量
监控指定文件夹,读取文件(新文件动态读取)里的内容,统计单词的数量. FileSpout.java,监控文件夹,读取新文件内容 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ...
- python核心编程正则表达式练习题1-2匹配由单个空格分隔的任意单词对,也就是性和名
# 匹配由单个空格分隔的任意单词对,也就是姓和名 import re patt = '[A-Za-z]+ [A-Za-z]+' # 方法一 +加号操作符匹配它左边的正则表达式至少出现一次的情况 # p ...
- Python GitHub上星星数量最多的项目
GitHub上星星数量最多的项目 """ most_popular.py 查看GitHub上获得星星最多的项目都是用什么语言写的 """ i ...
- Java基础IO类之字符串流(查字符串中的单词数量)与管道流
一.字符串流 定义:字符串流(StringReader),以一个字符为数据源,来构造一个字符流. 作用:在Web开发中,我们经常要从服务器上获取数据,数据返回的格式通常一个字符串(XML.JSON), ...
- go语言小练习——给定英语文章统计单词数量
给定一篇英语文章,要求统计出所有单词的个数,并按一定次序输出.思路是利用go语言的map类型,以每个单词作为关键字存储数量信息,代码实现如下: package main import ( " ...
- 练习1-21:编写程序entab,将空格串替换为最少数量的制表符和空格。。。(C程序设计语言 第2版)
#include <stdio.h> #define N 5 main() { int i, j, c, lastc; lastc = 'a'; i = j = ; while ((c=g ...
- hadoop-mapreduce-(1)-统计单词数量
编写map程序 package com.cvicse.ump.hadoop.mapreduce.map; import java.io.IOException; import org.apache.h ...
- 在Linux系统下有一个目录/usr/share/dict/ 这个目录里包含了一个词典的文本文件,我们可以利用这个文件来辨别单词是否为词典中的单词。
#!/bin/bash s=`cat /usr/share/dict/linux.words` for i in $s; do if [ $1 = $i ];then echo "$1 在字 ...
- Python的 counter内置函数,统计文本中的单词数量
counter是 colletions内的一个类 可以理解为一个简单的计数 import collections str1=['a','a','b','d'] m=collections.Counte ...
随机推荐
- 哈哈,CSDN又支持Windows Live Writer了
从10年开始写CSDN博客,后面不支持WLW了,就不怎么写了,话说自带的编辑器确实不怎么样,不过又支持了,那就哈哈,重新开工了. 关于如何配置的,跟以前一样,详情如下所示: http://blog.c ...
- 【CF39E】【博弈论】What Has Dirichlet Got to Do with That?
Description You all know the Dirichlet principle, the point of which is that if n boxes have no less ...
- 使用BeanUtils组件
使用BeanUtils组件 前提 1:导入commons-beanutils-1.8.3.jar //根据 本人使用的是1.8.3的版本 2:导入日志包 //就是loggin ...
- asp.net在应用母版的页面下采用了ModalPopupExtender弹出窗中应用autocomplete
autocomplete是jqueryUI的一个插件,可以实现自动填充的功能. 要点:1.应用了母版页,所以取页面上控件的ID时与一般方法不同 2.由于用了ajax的updatepanel,所以会出现 ...
- MySQL Procedure(MySQL存储过程)[转]
------Creating Stored Procedures in MySQL------ --Make sure you have version 5 of MySQL: SELECT VE ...
- js 强制转换
强制转换为布尔类型: <script> var text =Boolean(0) //=>以下转换的类型都为false text = Boolean(0.0) text = Bool ...
- ajax验证用户名和找回密码参考
// JavaScript Document function chkname(form){ var user = form.user.value; if(user == ''){ alert('请输 ...
- apache rewrite .htaccess 站点内容重定向实例
<IfModule mod_rewrite.c> Options +FollowSymlinks RewriteEngine On RewriteCond %{REQUEST_FILENA ...
- 解读为什么有符号的char可表示范围是-128~+127
问:为什么有符号的char可表示范围是-128~+127? 要明白这个问题,首先要明白一下几点: 对于char和int计算机中以补码形式存在. 严格来说计算机就是傻逼,它只知道某个位上是0还是1. 我 ...
- A题 - A + B Problem
Time Limit:1000MS Memory Limit:32768KB 64bit IO Format:%I64d & %I64u Description Cal ...