JAVA实验--统计文章中单词的个数并排序

分析：

1）要统计单词的个数，就自己的对文章中单词出现的判断的理解来说是：当出现一个非字母的字符的时候，对前面的一部分字符串归结为单词

2）对于最后要判断字母出现的个数这个问题，我认为应该是要用到map比较合适吧，因为map中有键-值的关系，可以把字符串设置为键，把出现的个数设置为整型，这样就能够建立起一一对应的关系，不用再判断所在的位置

根据上面自己的理解，今天我写了以下的一部分代码，对哈利波特第一集的这部分文章进行了单词的统计的测试，测试的结果相对良好，没有问题。

package pipei;

//洪鼎淇 20173627 信1705-3

import java.io.File;

import java.io.FileInputStream;

import java.io.FileOutputStream;

import java.io.IOException;

import java.io.InputStreamReader;

import java.io.OutputStreamWriter;

import java.util.HashMap;

import java.util.Map;

//哈利波特单词统计

public class Pipei {

    public Map<String,Integer> map1=new HashMap<String,Integer>();

    public static void main(String arg[]) {

        String sz[];

        Integer num[];

        final int MAXNUM=10; //统计的单词出现最多的前n个的个数

        sz=new String[MAXNUM+1];

        num=new Integer[MAXNUM+1];

        Pipei pipei=new Pipei();

        int account =1;

        //Vector<String> ve1=new Vector<String>();

        try {

            pipei.daoru();

        } catch (IOException e) {

            // TODO 自动生成的 catch 块

            e.printStackTrace();

        }

        System.out.println("英文单词的出现情况如下:");

        int g_run=0;

        for(g_run=0;g_run<MAXNUM+1;g_run++)

        {

            account=1;

            for(Map.Entry<String,Integer> it : pipei.map1.entrySet())

            {

                if(account==1)

                {

                    sz[g_run]=it.getKey();

                    num[g_run]=it.getValue();

                    account=2;

                }

                if(account==0)

                {

                    account=1;

                    continue;

                }

                if(num[g_run]<it.getValue())

                {

                    sz[g_run]=it.getKey();

                    num[g_run]=it.getValue();

                }

                //System.out.println("英文单词: "+it.getKey()+" 该英文单词出现次数: "+it.getValue());

            }

            pipei.map1.remove(sz[g_run]);

        }

        int g_count=1;

        String tx1=new String();

        for(int i=0;i<g_run;i++)

        {

            if(sz[i]==null)

                continue;

            if(sz[i].equals(""))

                continue;

            tx1+="出现次数第"+(g_count)+"多的单词为:"+sz[i]+"\t\t\t出现次数: "+num[i]+"\r\n";

            System.out.println("出现次数第"+(g_count)+"多的单词为:"+sz[i]+"\t\t\t出现次数: "+num[i]);

            g_count++;

        }

        try {

            pipei.daochu(tx1);

        } catch (IOException e) {

            // TODO 自动生成的 catch 块

            e.printStackTrace();

        }

    }

    public void daoru() throws IOException

    {

        File a=new File("1.Harry Potter and the Sorcerer's Stone.txt");

        FileInputStream b = new FileInputStream(a);

        InputStreamReader c=new InputStreamReader(b,"UTF-8");

        String string2=new String();

        while(c.ready())

        {

            char string1=(char) c.read();

            if(!isWord(string1))

            {

                if(map1.containsKey(string2))

                {

                    Integer num1=map1.get(string2)+1;

                    map1.put(string2,num1);

                }

                else

                {

                    Integer num1=1;

                    map1.put(string2,num1);

                }

                string2="";

            }

            else

            {

                string2+=string1;

            }

        }

        if(!string2.isEmpty())

        {

            if(map1.containsKey(string2))

            {

                Integer num1=map1.get(string2)+1;

                map1.put(string2,num1);

            }

            else

            {

                Integer num1=1;

                map1.put(string2,num1);

            }

            string2="";

        }

        c.close();

        b.close();

    }

    public void daochu(String txt) throws IOException

    {

        File fi=new File("tongji.txt");

        FileOutputStream fop=new FileOutputStream(fi);

        OutputStreamWriter ops=new OutputStreamWriter(fop,"UTF-8");

        ops.append(txt);

        ops.close();

        fop.close();

    }

    public boolean isWord(char a)

    {

        if(a<='z'&&a>='a'||a<='Z'&&a>='A')

            return true;

        return false;

    }

}

测试截图：

这是出现的单词的截图情况，对于其中出现s的情况的分析，s其实是单词里面的缩写，一般往往有It's 这种情况的出现，为了避免这种情况，我对文中的某一部分进行修改(修改部分如下图)，将It's看成一部分

isWord函数判断是否为字母的这一部分将单引号也作为一个符号放进去

修改完之后的测试如下：

搞定！！

JAVA实验--统计文章中单词的个数并排序的更多相关文章

统计文件中单词的个数---Shell及python版
最近在看shell中有个题目为统计单词的个数,使用了awk功能,代码如下 #!/bin/bash ];then echo "Usage:basename $0 filename" ...
C语言算法--统计字符串中单词的个数
#include <stdio.h> #include <string.h> #include <stdlib.h> int main(void) { int le ...
Python 统计文本中单词的个数
1.读文件,通过正则匹配 def statisticWord(): line_number = 0 words_dict = {} with open (r'D:\test\test.txt',enc ...
使用tuple统计文件中单词的个数
name = input("Enter file:") if len(name) < 1 : name = "input.txt" fhand = ope ...
java统计文本中单词出现的个数
package com.java_Test; import java.io.File; import java.util.HashMap; import java.util.Iterator; imp ...
c程序设计语言_习题1-13_统计输入中单词的长度，并且根据不同长度出现的次数绘制相应的直方图
Write a program to print a histogram of the lengths of words in its input. It is easy to draw the hi ...
学c语言做练习之统计文件中字符的个数
统计文件中字符的个数(采用命令行参数) #include<stdio.h> #include<stdlib.h> int main(int argc, char *argv[] ...
统计无向图中三角形的个数，复杂度m*sqrt(m).
统计无向图中三角形的个数,复杂度m*sqrt(m). #include<stdio.h> #include<vector> #include<set> #inclu ...
C语言统计一篇英文短文中单词的个数
//凯鲁嘎吉 - 博客园 http://www.cnblogs.com/kailugaji/ #include<stdio.h> #define N 1000 void main(){ ] ...

随机推荐

docker client和daemom
client 模式 docker命令对应的源文件是docker/docker.go, docker [options] command [arg...] 其中options参数为flag,任何时候执行 ...
Bug的分类和管理流程
1.按照严重程度划分定义:是指Bug对软件质量的破坏程度,即BUG的存在将对软件的功能和性能产生怎样的影响分类:系统崩溃.严重.一般.次要.建议 2.按优先级划分定义:表示处理和修正软件缺陷的现 ...
clone对象或数组
function clone(obj) { var o; if (typeof obj == "object") { if (obj === null) { o = null; } ...
Linux-04 Linux中Tomcat和MySQL的安装
1.下载apache-tomcat-7.0.79-tar.tar2.解压到当前用户目录,改名为tomcat [hduser@node1 ~]$ tar -zxvf apache-tomcat-7.0. ...
《3+1团队》【Alpha】Scrum meeting 1
项目内容这个作业属于哪个课程任课教师博客主页链接这个作业的要求在哪里作业链接地址团队名称 3+1团队团队博客地址 https://home.cnblogs.com/u/3-1group ...
mybaits2-Dao开发
项目结构: 1.创建project,导入相关依赖(前提).配置db.properties与mybaits-config #mysql驱动 db.driver=com.mysql.jdbc.Driver ...
【讲●解】KMP算法
KMP算法我们小组负责讲这个... 术语与规定为了待会方便,所以不得不做一些看起来很拖沓的术语,但这些规定能让我们更好地理解$KMP$甚至$AC$自动机. 字符串匹配形式化定义如下: 假设 ...
nginx如何防止高负载造成服务器崩溃
nginx-http-sysguard模块一.作用防止因nginx并发访问量过高或者遭受攻击造成服务器宕机,可根据负载设置界面跳转. 二.安装配置 1.下载模块软件包 wget https:/ ...
.NET中的缓存实现
软件开发中最常用的模式之一是缓存,这是一个简单但非常有效的概念,想法是重用操作结果,执行繁重的操作时,我们会将结果保存在缓存容器中,下次我们需要该结果时,我们将从缓存容器中取出它,而不是再次执行繁重的 ...
什么是Service Mesh？
转至大佬宋净明的博客:https://jimmysong.io/posts/what-is-a-service-mesh/ Service mesh 又译作 “服务网格”,作为服务间通信的基础设施层. ...

JAVA实验--统计文章中单词的个数并排序

JAVA实验--统计文章中单词的个数并排序的更多相关文章

随机推荐

热门专题