Word Frequency Counting in Java

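Requirements:

1. Read a file;

2. Record every word that appears and how often it appears;

3. Sort the words by frequency in descending order;

4. Output the results.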

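Overview:

1. The input file path is hard-coded; to make debugging easy, just paste the article or paragraphs you want to count into that text file;
2. Only English text is supported;
3. Words are output in descending order of frequency.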

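Implementation:

1. Read the file with FileReader and BufferedReader;

2. Split the text into words with StringTokenizer;

3. Store the counts in a HashMap;

4. Write a custom Comparator class that sorts the words by value (their count);

5. Output the results.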

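Reading the file from the default path:

         String filename = "E:/Test.txt"; // hard-coded default path
         // throws FileNotFoundException: the enclosing method must declare or catch it
         FileReader fk = new FileReader(filename);
         BufferedReader br = new BufferedReader(fk);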
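A minimal sketch of the same reading step, assuming Java 8 or later. It closes the reader automatically with try-with-resources, opens the file with an explicit UTF-8 charset (FileReader(String) uses the platform default charset, which before Java 18 varies by system and matters for curly quotes such as ’ in the sample text), and appends a space at every line break so the last word of one line never fuses with the first word of the next. The class ReadText and the methods readAll and readFile are hypothetical names used only for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ReadText {
    // Reads every line and joins them with a space, so "people" at the end of
    // one line and "who" at the start of the next stay two separate words.
    static String readAll(Reader source) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append(' ');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Opens the file with an explicit charset instead of the platform default.
    static String readFile(String filename) {
        try {
            return readAll(Files.newBufferedReader(Paths.get(filename), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Demo on an in-memory string; readFile("E:/Test.txt") would read the real file.
        System.out.println(readAll(new StringReader("importance of people\nwho have touched")));
        // prints "importance of people who have touched " (with a trailing space)
    }
}
```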

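Counting word frequencies:

         String file = "";                                             // the whole article
         HashMap<String, Integer> hm = new HashMap<String, Integer>(); // word -> count
         String s;
         while((s = br.readLine()) != null) {
             file += s; // read the whole article into the String file; lines are joined with NO separator
         }
         br.close();
         // Delimiters used to split the string. ';' is not one of them, and the straight quote '
         // splits contractions (you're -> you, re); both effects are visible in the results below.
         StringTokenizer st = new StringTokenizer(file, " ,.!?\"'");
         while(st.hasMoreTokens()) {
             String word = st.nextToken();
             Integer value = hm.get(word);
             if(value != null) {
                 hm.put(word, value + 1); // word seen before: increment its count
             } else {
                 hm.put(word, 1);         // first occurrence of this word
             }
         }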
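The results below show two side effects of this code: because lines are joined without a separator, the last word of one line fuses with the first word of the next ("peoplewho"), and because ';' is not a delimiter, tokens such as "go;be" and "dream;go" survive. A minimal sketch of one possible refinement, assuming Java 8 or later for Map.merge: it adds ';', ':' and the line-break characters to the delimiter set, and, as a design choice of this sketch rather than the original program, keeps the straight apostrophe inside words so "you're" is counted as one word. CountWords, DELIMS and count are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class CountWords {
    // Whitespace (including line breaks) plus common punctuation, now with ';' and ':'.
    // The apostrophe is deliberately not a delimiter, so contractions stay whole.
    static final String DELIMS = " \t\r\n,.!?;:\"";

    static Map<String, Integer> count(String text) {
        Map<String, Integer> hm = new HashMap<>();
        StringTokenizer st = new StringTokenizer(text, DELIMS);
        while (st.hasMoreTokens()) {
            // merge() stores 1 for a new word, otherwise adds 1 to the existing count
            hm.merge(st.nextToken(), 1, Integer::sum);
        }
        return hm;
    }

    public static void main(String[] args) {
        System.out.println(count("Dream what you want to dream;go where you want to go;be what you want to be"));
    }
}
```

HashMap iteration order is unspecified, so the printed map is only useful for a quick look; sorting is handled separately.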

Comparator class:

 import java.util.Comparator;
 import java.util.TreeMap;

 // Orders words by their count in the map, highest count first.
 public class ByValueComparator implements Comparator<String> {
     TreeMap<String, Integer> treemap;

     public ByValueComparator(TreeMap<String, Integer> tm) {
         this.treemap = tm;
     }

     @Override
     public int compare(String o1, String o2) {
         if(!treemap.containsKey(o1) || !treemap.containsKey(o2)) {
             return 0;
         }
         // Integer.compare compares the int values. Using == on two Integer objects compares
         // references, which is only guaranteed to work for cached values from -128 to 127.
         return Integer.compare(treemap.get(o2), treemap.get(o1)); // o2 first: descending order
     }
 }
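A more compact alternative to the comparator class, assuming Java 8 or later: Comparator.comparing builds an ascending comparator from the map lookup, reversed() turns it into descending order, and thenComparing breaks ties alphabetically, so words with equal counts always come out in a predictable order. It works with any Map, not only a TreeMap. SortByCount, byCountDesc and sortedWords are hypothetical names; the sample counts in main are taken from the results below.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SortByCount {
    // Higher counts first; equal counts fall back to natural (alphabetical) order.
    static Comparator<String> byCountDesc(Map<String, Integer> counts) {
        Comparator<String> byCount = Comparator.comparing(counts::get); // ascending by count
        return byCount.reversed().thenComparing(Comparator.naturalOrder());
    }

    static List<String> sortedWords(Map<String, Integer> counts) {
        List<String> words = new ArrayList<>(counts.keySet());
        words.sort(byCountDesc(counts));
        return words;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("to", 19);
        counts.put("you", 32);
        counts.put("who", 9);
        counts.put("those", 9);
        for (String w : sortedWords(counts)) {
            System.out.println(w + " " + counts.get(w));
        }
        // prints: you 32, to 19, those 9, who 9 (one per line)
    }
}
```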

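Outputting the results:

        // needs java.util.ArrayList, Collections, List and TreeMap
        TreeMap<String, Integer> tm = new TreeMap<String, Integer>(hm); // sorted copy of hm
        ByValueComparator bvc = new ByValueComparator(tm);
        List<String> ll = new ArrayList<String>(tm.keySet());
        Collections.sort(ll, bvc); // sort the words by count, descending
        for(String str : ll){
            System.out.println(str + "——" + tm.get(str));
        }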
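Another option, again a sketch assuming Java 8 or later, is to sort the map's entries instead of its keys: each (word, count) pair moves as a unit, so there is no TreeMap copy, no separate comparator class, and no second lookup when printing. PrintByCount and formatted are hypothetical names, and the tab-separated output format is an arbitrary choice of this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PrintByCount {
    // Sorts (word, count) entries by count descending, then by word,
    // and returns one "word<TAB>count" line per entry.
    static List<String> formatted(Map<String, Integer> hm) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(hm.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.comparingByKey()));
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            lines.add(e.getKey() + "\t" + e.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, Integer> hm = new HashMap<>();
        hm.put("have", 7);
        hm.put("the", 8);
        hm.put("and", 7);
        formatted(hm).forEach(System.out::println);
        // prints the 8, and 7, have 7 (tab-separated, one per line)
    }
}
```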

Test run with a sample text:

There are moments in life when you miss someone so much that you just want to pick them from your dreams and hug them for real! Dream what you want to dream;go where you want to go;be what you want to be,because you have only one life and one chance to do all the things you want to do.

May you have enough happiness to make you sweet,enough trials to make you strong,enough sorrow to keep you human,enough hope to make you happy? Always put yourself in others’shoes.If you feel that it hurts you,it probably hurts the other person, too.

The happiest of people don’t necessarily have the best of everything;they just make the most of everything that comes along their way.Happiness lies for those who cry,those who hurt, those who have searched,and those who have tried,for only they can appreciate the importance of people

who have touched their lives.Love begins with a smile,grows with a kiss and ends with a tear.The brightest future will always be based on a forgotten past, you can’t go on well in lifeuntil you let go of your past failures and heartaches.

When you were born,you were crying and everyone around you was smiling.Live your life so that when you die,you're the one who is smiling and everyone around you is crying.

Please send this message to those people who mean something to you,to those who have touched your life in one way or another,to those who make you smile when you really need it,to those that make you see the brighter side of things when you are really down,to those who you want to let them know that you appreciate their friendship.And if you don’t, don’t worry,nothing bad will happen to you,you will just miss out on the opportunity to brighten someone’s day with this message.

Results:

you——32

to——19

who——9

those——9

the——8

have——7

and——7

of——6

make——6

that——6

want——6

your——4

with——4

when——4

one——4

life——4

a——4

in——4

enough——4

for——3

don’t——3

just——3

it——3

on——3

them——3

their——3

will——3

what——2

were——2

way——2

touched——2

this——2

things——2

so——2

smiling——2

smile——2

really——2

people——2

past——2

only——2

miss——2

message——2

let——2

is——2

hurts——2

go——2

everyone——2

do——2

crying——2

be——2

around——2

are——2

appreciate——2

The——2

another——1

always——1

along——1

all——1

When——1

There——1

Please——1

May——1

Love——1

Live——1

If——1

Happiness——1

Dream——1

And——1

Always——1

die——1

day——1

cry——1

comes——1

chance——1

can’t——1

can——1

brightest——1

brighter——1

brighten——1

born——1

best——1

begins——1

because——1

based——1

bad——1

happen——1

grows——1

go;be——1

future——1

from——1

friendship——1

forgotten——1

feel——1

failures——1

everything;they——1

everything——1

ends——1

dreams——1

dream;go——1

down——1

know——1

kiss——1

keep——1

importance——1

if——1

hurt——1

human——1

hug——1

hope——1

heartaches——1

happy——1

happiness——1

happiest——1

or——1

opportunity——1

nothing——1

need——1

necessarily——1

much——1

most——1

moments——1

mean——1

lives——1

lifeuntil——1

lies——1

side——1

send——1

see——1

searched——1

real——1

re——1

put——1

probably——1

pick——1

person——1

peoplewho——1

out——1

others’shoes——1

other——1

tried——1

trials——1

too——1

they——1

tear——1

sweet——1

strong——1

sorrow——1

something——1

someone’s——1

someone——1

yourself——1

worry——1

where——1

well——1

was——1
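The counts above are case-sensitive: "The" (2) and "the" (8), "Always" and "always", "Happiness" and "happiness" are listed as different words. If case should be ignored (an assumption; the original requirements do not say so), a minimal sketch is to lower-case every token before counting. It uses Locale.ROOT so the result does not depend on the machine's locale (for example the Turkish dotless i). CaseInsensitiveCount and count are hypothetical names, and the delimiter string is a parameter so any of the delimiter sets discussed above can be passed in.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.StringTokenizer;

public class CaseInsensitiveCount {
    static Map<String, Integer> count(String text, String delims) {
        Map<String, Integer> hm = new HashMap<>();
        StringTokenizer st = new StringTokenizer(text, delims);
        while (st.hasMoreTokens()) {
            // Normalize case so "The" and "the" share one entry.
            String word = st.nextToken().toLowerCase(Locale.ROOT);
            hm.merge(word, 1, Integer::sum);
        }
        return hm;
    }

    public static void main(String[] args) {
        Map<String, Integer> hm = count("The happiest of people don't necessarily have the best", " ,.!?;\"");
        System.out.println(hm.get("the")); // prints 2
    }
}
```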

Code repository: https://coding.net/u/regretless/p/WordFrequencyCount/git
