Moderate: insert spaces so that as many words as possible are recognizable @CareerCup
A recursion problem; note how it combines memoization with a trie.
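The listing below calls `dictionary.contains(word, false)` and `dictionary.contains(word, true)` on `CtCILibrary.Trie`, which is not shown in this post. As a rough illustration only, here is a minimal trie with the same two-mode lookup. The class name `SimpleTrie` and its internals are hypothetical, and the meaning of the `exact` flag (prefix match when `false`, whole-word match when `true`) is an assumption inferred from how the listing uses it.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for CtCILibrary.Trie; covers only the lookup used in this post. */
class SimpleTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminates; // true if a dictionary word ends at this node
    }

    private final Node root = new Node();

    SimpleTrie(Iterable<String> words) {
        for (String w : words) {
            insert(w);
        }
    }

    void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.terminates = true;
    }

    /**
     * exact == false: true if some dictionary word starts with s.
     * exact == true:  true only if s itself is a dictionary word.
     */
    boolean contains(String s, boolean exact) {
        Node node = root;
        for (char c : s.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return false; // no word continues along this path
            }
        }
        return !exact || node.terminates;
    }
}
```

The prefix mode is what lets the recursion stop extending a candidate word as soon as no dictionary word can start with it.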
package Moderate;

import java.util.Hashtable;

import CtCILibrary.AssortedMethods;
import CtCILibrary.Trie;

/**
 * Oh, no! You have just completed a lengthy document when you have an unfortunate
 * Find/Replace mishap. You have accidentally removed all spaces, punctuation,
 * and capitalization in the document. A sentence like "I reset the computer. It still
 * didn't boot!" would become "iresetthecomputeritstilldidntboot". You figure that you
 * can add back in the punctuation and capitalization later, once you get the individual
 * words properly separated. Most of the words will be in a dictionary, but some strings,
 * like proper names, will not.
 *
 * Given a dictionary (a list of words), design an algorithm to find the optimal way of
 * "unconcatenating" a sequence of words. In this case, "optimal" is defined to be the
 * parsing which minimizes the number of unrecognized sequences of characters.
 *
 * For example, the string "jesslookedjustliketimherbrother" would be optimally parsed
 * as "JESS looked just like TIM her brother". This parsing has seven unrecognized
 * characters, which we have capitalized for clarity.
 *
 * In short: take a string with all punctuation and spaces stripped out, then find the
 * way of adding the spaces back that minimizes the unrecognized words, where a provided
 * dictionary decides what is recognized. (The code below counts unrecognized characters.)
 */
public class S17_14 {

    public static String sentence;
    public static Trie dictionary;

    /* incomplete code */
    /* Best parse of sentence from wordStart on; the current word spans wordStart..wordEnd. */
    public static Result parse(int wordStart, int wordEnd, Hashtable<Integer, Result> cache) {
        if (wordEnd >= sentence.length()) {
            // Ran off the end: whatever is left counts as unrecognized.
            return new Result(wordEnd - wordStart, sentence.substring(wordStart).toUpperCase());
        }
        if (cache.containsKey(wordStart)) {
            return cache.get(wordStart).clone();
        }
        String currentWord = sentence.substring(wordStart, wordEnd + 1);
        boolean validPartial = dictionary.contains(currentWord, false); // prefix of some word
        boolean validExact = validPartial && dictionary.contains(currentWord, true); // a whole word

        /* break current word */
        Result bestExact = parse(wordEnd + 1, wordEnd + 1, cache);
        if (validExact) {
            bestExact.parsed = currentWord + " " + bestExact.parsed;
        } else {
            bestExact.invalid += currentWord.length();
            bestExact.parsed = currentWord.toUpperCase() + " " + bestExact.parsed;
        }

        /* extend current word */
        Result bestExtend = null;
        if (validPartial) {
            bestExtend = parse(wordStart, wordEnd + 1, cache);
        }

        /* find best */
        Result best = Result.min(bestExact, bestExtend);
        cache.put(wordStart, best.clone());
        return best;
    }

    public static int parseOptimized(int wordStart, int wordEnd, Hashtable<Integer, Integer> cache) {
        if (wordEnd >= sentence.length()) {
            return wordEnd - wordStart;
        }
        if (cache.containsKey(wordStart)) {
            return cache.get(wordStart);
        }

        String currentWord = sentence.substring(wordStart, wordEnd + 1);
        boolean validPartial = dictionary.contains(currentWord, false);

        /* break current word */
        int bestExact = parseOptimized(wordEnd + 1, wordEnd + 1, cache);
        if (!validPartial || !dictionary.contains(currentWord, true)) {
            bestExact += currentWord.length();
        }

        /* extend current word */
        int bestExtend = Integer.MAX_VALUE;
        if (validPartial) {
            bestExtend = parseOptimized(wordStart, wordEnd + 1, cache);
        }

        /* find best */
        int min = Math.min(bestExact, bestExtend);
        cache.put(wordStart, min);
        return min;
    }

    public static int parseSimple(int wordStart, int wordEnd) {
        if (wordEnd >= sentence.length()) {
            return wordEnd - wordStart;
        }

        String word = sentence.substring(wordStart, wordEnd + 1);

        /* break current word */
        int bestExact = parseSimple(wordEnd + 1, wordEnd + 1);
        if (!dictionary.contains(word, true)) {
            bestExact += word.length();
        }

        /* extend current word */
        int bestExtend = parseSimple(wordStart, wordEnd + 1);

        /* find best */
        return Math.min(bestExact, bestExtend);
    }

    /* Strip punctuation and spaces, then lowercase. */
    public static String clean(String str) {
        char[] punctuation = {',', '"', '!', '.', '\'', '?'};
        for (char c : punctuation) {
            str = str.replace(c, ' ');
        }
        return str.replace(" ", "").toLowerCase();
    }

    public static void main(String[] args) {
        dictionary = AssortedMethods.getTrieDictionary();
        sentence = "As one of the top companies in the world, Google will surely attract the attention of computer gurus. This does not, however, mean the company is for everyone.";
        sentence = clean(sentence);
        System.out.println(sentence);
        //Result v = parse(0, 0, new Hashtable<Integer, Result>());
        //System.out.println(v.parsed);
        int v = parseOptimized(0, 0, new Hashtable<Integer, Integer>());
        System.out.println(v);
    }

    static class Result {
        public int invalid = Integer.MAX_VALUE;
        public String parsed = "";

        public Result(int inv, String p) {
            invalid = inv;
            parsed = p;
        }

        public Result clone() {
            return new Result(this.invalid, this.parsed);
        }

        public static Result min(Result r1, Result r2) {
            if (r1 == null) {
                return r2;
            } else if (r2 == null) {
                return r1;
            }
            return r2.invalid < r1.invalid ? r2 : r1;
        }
    }
}
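The listing above needs `CtCILibrary` to run. To try the idea in isolation, here is a self-contained sketch of the same memoized recursion, keyed only by the start index. It is not the listing's method: it swaps the trie for a plain `Set<String>` (so it gives up the prefix-based early exit), and the names `ReSpaceSketch`, `Parse` and `split` are made up for this example.

```java
import java.util.Locale;
import java.util.Set;

class ReSpaceSketch {
    /** Fewest unrecognized characters for a suffix, plus one parsing that achieves it. */
    static final class Parse {
        final int invalid;
        final String text;

        Parse(int invalid, String text) {
            this.invalid = invalid;
            this.text = text;
        }
    }

    static Parse split(String s, Set<String> dict) {
        return split(s, dict, 0, new Parse[s.length()]);
    }

    private static Parse split(String s, Set<String> dict, int start, Parse[] memo) {
        if (start >= s.length()) {
            return new Parse(0, "");
        }
        if (memo[start] != null) {
            return memo[start]; // suffix already solved
        }
        Parse best = null;
        // Try longer first chunks before shorter ones; with the strict "<" below,
        // ties then keep an unrecognized run together instead of splitting it.
        for (int end = s.length(); end > start; end--) {
            String word = s.substring(start, end);
            boolean known = dict.contains(word);
            Parse rest = split(s, dict, end, memo);
            int total = (known ? 0 : word.length()) + rest.invalid;
            if (best == null || total < best.invalid) {
                String head = known ? word : word.toUpperCase(Locale.ROOT);
                String text = rest.text.isEmpty() ? head : head + " " + rest.text;
                best = new Parse(total, text);
            }
        }
        memo[start] = best;
        return best;
    }
}
```

Calling `ReSpaceSketch.split("jesslookedjustliketimherbrother", dict)` with a dictionary holding only `looked`, `just`, `like`, `her` and `brother` yields 7 unrecognized characters and the text `JESS looked just like TIM her brother`, matching the example in the problem statement. Each start index is solved once, so the recursion examines O(n²) substrings.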