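Date: 2020.02.04

Blog post no.: 143

Tuesday

   [If you want to use the code from this post, please leave a comment below first and then go ahead (just let me know).]

  All related posts:

  a. [Basic preparation]

  b. [Word cloud + data import]

  c. [Topology data]

  d. [Data repair]

  e. [Explanation repair + hot-word references]

  f. [JSP demo + page navigation]

  g. [Hot-word classification + directory generation] (this post)

  h. [Hot-word relationship graph + report generation]

  i. [App development]

  j. [Security hardening]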


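  In my requirements list, the items I have already finished are the ones highlighted in yellow. Only four remain: hot-word classification, directory generation, the hot-word relationship graph, and data report export. These are the most urgent ones, so, phew, time to roll up my sleeves and get to work!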

    

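   1. Hot-word classification

    My teacher said to follow the categories used by the major platforms, so I'll simply use the categories cnblogs itself uses (I really can't follow how the machine-learning approaches work; I'm nowhere near even the entry level). cnblogs News divides its news into the following categories: Internet, IT industry, software development, open source, computer hardware, games, startups, mobile, science, and other. I'll label the data scraped from each category's news with that category. (Looks like I have to re-scrape the data again.)

    A few notes before I start scraping. This should be the last change. I also noticed that every category has 100 pages, which... effectively means every category has that many, so I can't guarantee the result is free of errors. To cut down the data volume, I'm raising the frequency threshold from 15 to 20; otherwise how would I ever finish scraping? My plan is to write this post over today and tomorrow, then write a summary post tomorrow, and that will wrap up this little goal. I may add a WeChat mini-program or an app at the end; we'll see.

    So what data do we need to scrape for these 10 categories?

    First, fetch the starting URL https://news.cnblogs.com/ with a request that carries headers and scrape the links of the 10 categories above. Then work out how the links inside each category are built and start new crawls from them. Every hot word scraped from a category gets a "hot-word type" tag appended. This means modifying the KeyWords class: add a kind attribute and rewrite the __toString() member function, then update every place that uses KeyWords. (News needs no changes.)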
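    As a rough illustration of that change, here is a minimal sketch of a KeyWords-style class with the extra kind field. The field names (word, num, title, link), the tab-separated layout, and the use of Python's standard __str__ in place of the __toString() mentioned above are all my assumptions; this is not the actual class from this series.

```python
class KeyWords:
    """Hypothetical hot-word record; the field names are assumed."""

    def __init__(self, word, num, title, link, kind):
        self.word = word
        self.num = num
        self.title = title
        self.link = link
        self.kind = kind  # new: the news category the word was scraped from

    def __str__(self):
        # One tab-separated record, with kind appended as the last field.
        return "\t".join([self.word, str(self.num), self.title, self.link, self.kind])


# Placeholder values, only to show the resulting record layout.
kw = KeyWords("Python", 23, "Some news title", "https://example.com/news/1", "互联网类")
print(kw)
```

    Putting kind last means code that reads the file by splitting on tabs keeps its existing field positions and only gains one extra column.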

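    How the category page URLs are built:

      First, the original news URL: https://news.cnblogs.com/

      Next, taking "Internet" as the example: https://news.cnblogs.com/n/c1101

      Then the address of page 100: https://news.cnblogs.com/n/c1101?page=100

      It's easy to see that a category URL is the original URL plus the href of that category's a tag, so the two pieces have to be joined to form the URL to crawl.

    But while scraping I hit a problem: I couldn't scrape the category links themselves. So I had to collect them by hand. With only 10 entries it hardly matters. I was being lazy and wanted the page to do it for me, but it seems cnblogs wants me to be diligent. Ha ha!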

    The links:

      Internet (互联网类): https://news.cnblogs.com/n/c1101

      IT industry (IT业界类): https://news.cnblogs.com/n/c1102

      Software development (软件开发类): https://news.cnblogs.com/n/c1103

      Open source (开源类): https://news.cnblogs.com/n/c1109

      Computer hardware (电脑硬件类): https://news.cnblogs.com/n/c1111

      Games (游戏类): https://news.cnblogs.com/n/c1110

      Startups (创业类): https://news.cnblogs.com/n/c1112

      Mobile (手机相关类): https://news.cnblogs.com/n/c1113

      Science (科学类): https://news.cnblogs.com/n/c1114

      Other (其他类): https://news.cnblogs.com/n/c1199

    In the Surapity class, create a dictionary that stores each category name and its link.
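    A minimal sketch of such a dictionary, together with a helper that expands one category into its page URLs. The names BASE_URL, KIND_PATHS and page_urls are mine, not taken from the Surapity class, and I am assuming that ?page=1 follows the same pattern as the ?page=100 address shown earlier.

```python
BASE_URL = "https://news.cnblogs.com"

# Category name -> path, copied from the list above.
KIND_PATHS = {
    "互联网类": "/n/c1101",
    "IT业界类": "/n/c1102",
    "软件开发类": "/n/c1103",
    "开源类": "/n/c1109",
    "电脑硬件类": "/n/c1111",
    "游戏类": "/n/c1110",
    "创业类": "/n/c1112",
    "手机相关类": "/n/c1113",
    "科学类": "/n/c1114",
    "其他类": "/n/c1199",
}


def page_urls(kind, pages=100):
    """Yield the URL of every page (1..pages) of one category."""
    for page in range(1, pages + 1):
        yield f"{BASE_URL}{KIND_PATHS[kind]}?page={page}"


urls = list(page_urls("互联网类"))
print(urls[0])
print(urls[-1])
```

    Keeping the category name as the dictionary key means the same string can later be written out as the kind of every hot word found on those pages.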

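    The crawl took a long time, from 4:51 pm until 1:44 am the next day, and the process was too bumpy to sum up in a few words.

    Along the way several pages made the crawler terminate, for example "Apple Watch UI动效解析" in the Other category. Ugh: every attempt got stuck there. There is no greater pain for a programmer!!!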
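    One common way to keep a single bad page from ending a long crawl is to give every request a timeout and to record failures instead of letting the exception propagate. The sketch below only illustrates that idea; fetch and crawl_all are hypothetical helpers, not the crawler used in this series, and I have not checked it against the page that caused the trouble.

```python
import urllib.request


def fetch(url, timeout=10):
    """Download one page; raises on network errors or when the timeout expires."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl_all(urls, fetcher=fetch):
    """Fetch every URL, skipping (and recording) the ones that fail."""
    pages, failed = {}, []
    for url in urls:
        try:
            pages[url] = fetcher(url)
        except Exception as exc:  # one bad page must not stop the whole run
            failed.append((url, repr(exc)))
    return pages, failed
```

    The failed list can be retried or inspected after the run, so an overnight crawl still yields the pages that did work.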

    The base data comes to 17469 records in total, in a file of about 1.96 MB.

    Now let's build the data table (first modify fileR.py):

import codecs

def makeSql():
    file_path = "../../testFile/frc/words_sql.txt"
    # Read all tab-separated records written by the crawler
    fr = open("../../testFile/frc/word.txt", mode='r', encoding='utf-8')
    tmp = fr.readlines()
    fr.close()
    # "w" truncates the old file; keep it open for the whole loop
    f = codecs.open(file_path, "w", 'utf-8')
    for line in tmp:
        # Drop the line ending so the last field (kind) carries no newline,
        # and double any single quote so it cannot end the SQL string early
        group = [x.replace("'", "''") for x in line.rstrip("\r\n").split("\t")]
        f.write("Insert into words values ('"+group[0]+"',"+group[1]+",'"+group[2]+"','"+group[3]+"','"+group[4]+"');"+"\n")
    f.close()

makeSql()

fileR.py

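    Run it and import the data the same way as before. Unfortunately I had just cleaned up my C: drive with a PC-cleanup tool (电脑管家), and afterwards Navicat broke, really broke (it can no longer create queries; if I find a fix I'll write another post about it). So, no fuss: I imported directly from the text file.

    Create the keywords table (or view) the same way as in the post before last; it gives the total count of each hot word.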

CREATE TABLE keywords
AS
(
    SELECT
        word AS word,
        SUM(num) AS num
    FROM
        words
    GROUP BY word
    ORDER BY num DESC
)

CreateKeywordsTable.sql

    

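    Ha ha ha! The hot-word frequencies are over ten thousand now! I hope my computer holds up. Keep crawling! (But it's already 2 am, so I'll set an alarm for 2 hours from now and let the topology data crawl by itself.)

    For the WebConnector class there is one thing I want to stress: for this crawl I commented out the following code:

# This call removes every sentence containing "年", "月" or "日" (year, month, day) along with everything after it. It was meant to strip unneeded parts of the explanation, but that now seems unnecessary. The more data the better!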
tpl = StrSpecialDealer.ut_date(tpl)

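    When I woke up in the morning I found a big problem: the computer had put itself to sleep. Sigh. Lesson learned, I hope!

    When the computer is crawling overnight, make sure sleep is turned off. In Settings it looks like this:

    The topology data is done too, after roughly another 5 hours. The catch is that while the crawler runs I can't use the computer for anything else (especially screenshot tools: start one and the crawler is sure to crash).

    At last the data is complete, so now we start processing it!

    The data has been aggregated and processed by category (in other words, there is nothing left for Python to do). That completes hot-word classification.

  2. Hot-word directory generation

    We need to show the top 10 entries of each category and build the first page from them.

    You can create new views or write one big long SQL statement directly. I'm lazy, so I went with the long statement.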

package com.servlet;

import java.io.IOException;
import java.sql.SQLException;
import java.util.List;

import javax.servlet.ServletException;
import javax.servlet.ServletOutputStream;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.json.JSONArray;
import org.json.JSONObject;

import com.dblink.basic.utils.SqlUtils;
import com.dblink.basic.utils.sqlKind.MySql_s;
import com.dblink.basic.utils.user.UserInfo;
import com.dblink.bean.BeanGroup;
import com.dblink.sql.DBLink;

@SuppressWarnings("unused")
public class ServletForMoreInfo extends HttpServlet{
    private static final long serialVersionUID = 1L;
    //----------------------------------------------------------------------//
    public void doPost(HttpServletRequest request,HttpServletResponse response) throws ServletException, IOException
    {
        request.setCharacterEncoding("utf-8");
        response.setCharacterEncoding("utf-8");
        response.setContentType("application/json");
        response.setHeader("Cache-Control", "no-cache");
        response.setHeader("Pragma", "no-cache");

        String kind = request.getParameter("kind");
        if(kind==null)
            kind = "";
        // Minimal escaping so the value cannot break out of the SQL string literal
        String safeKind = kind.replace("\\", "\\\\").replace("'", "''");

        JSONArray jsonArray = new JSONArray();
        JSONObject jsonObj = new JSONObject();

        DBLink dbLink = new DBLink(new SqlUtils(new MySql_s("rc"),new UserInfo("root","123456")));
        BeanGroup bg = null;
        try {
            // MySQL requires an alias (t) for the derived table
            bg = dbLink.getSelect("Select word As word , SUM(num) As num From ( Select * From words Where kind = '"+safeKind+"' ) As t Group By word Order By num DESC Limit 0,10 ").beans;
            int leng = bg.size();
            jsonObj.put("Length",leng);
            // wordkind.js reads KindNess to find the <div> of this category
            jsonObj.put("KindNess",kind);
            jsonArray.put(jsonObj);
            for(int i=0;i<leng;++i)
            {
                JSONObject jsonObject = new JSONObject();
                jsonObject.put("word",bg.get(i).get(0));
                jsonObject.put("num",bg.get(i).get(1));
                jsonArray.put(jsonObject);
            }
        } catch (SQLException e) {
            e.printStackTrace();
        }
        dbLink.free();

        ServletOutputStream os = response.getOutputStream();
        os.write(jsonArray.toString().getBytes("utf-8"));
        os.flush();
        os.close();
    }
    //---------------------------------------------------------------------------------//
}

ServletForMoreInfo.java
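    The per-category top-10 query can be tried in isolation before wiring it into a servlet. The sketch below uses Python's built-in sqlite3 module with an in-memory database instead of the MySQL database from this post, and passes the category as a bound parameter rather than concatenating it into the SQL text. The five-column layout of words (word, num, title, link, kind) and the sample rows are my assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT, num INTEGER, title TEXT, link TEXT, kind TEXT)")
rows = [
    ("Python", 30, "t1", "l1", "互联网类"),
    ("Python", 25, "t2", "l2", "互联网类"),
    ("5G", 40, "t3", "l3", "互联网类"),
    ("GPU", 50, "t4", "l4", "电脑硬件类"),
]
conn.executemany("INSERT INTO words VALUES (?, ?, ?, ?, ?)", rows)


def top_words(conn, kind, limit=10):
    """Total count per word inside one category, highest first."""
    sql = (
        "SELECT word, SUM(num) AS num FROM words "
        "WHERE kind = ? GROUP BY word ORDER BY num DESC LIMIT ?"
    )
    return conn.execute(sql, (kind, limit)).fetchall()


print(top_words(conn, "互联网类"))
```

    Filtering with WHERE before GROUP BY gives the same totals as selecting from a filtered subquery, and the bound parameter removes the need to escape the category name by hand.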

    If you created a view for each of the 10 categories, you can add the following servlet (otherwise, replace the view name with the Select statement that would define the view):

package com.servlet;

import java.io.IOException;
import java.sql.SQLException;
import java.util.List;

import javax.servlet.ServletException;
import javax.servlet.ServletOutputStream;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.json.JSONArray;
import org.json.JSONObject;

import com.dblink.basic.utils.SqlUtils;
import com.dblink.basic.utils.sqlKind.MySql_s;
import com.dblink.basic.utils.user.UserInfo;
import com.dblink.bean.BeanGroup;
import com.dblink.sql.DBLink;

@SuppressWarnings("unused")
public class ServletForKindKeyWords extends HttpServlet{
    private static final long serialVersionUID = 1L;
    //----------------------------------------------------------------------//
    public void doPost(HttpServletRequest request,HttpServletResponse response) throws ServletException, IOException
    {
        request.setCharacterEncoding("utf-8");
        response.setCharacterEncoding("utf-8");
        response.setContentType("application/json");
        response.setHeader("Cache-Control", "no-cache");
        response.setHeader("Pragma", "no-cache");

        // Both values go into the SQL text as-is, so only trusted pages should call this servlet
        String table = request.getParameter("table");
        String sql_rest = request.getParameter("sql");

        JSONArray jsonArray = new JSONArray();
        JSONObject jsonObj = new JSONObject();

        DBLink dbLink = new DBLink(new SqlUtils(new MySql_s("rc"),new UserInfo("root","123456")));
        BeanGroup bg = null;
        try {
            bg = dbLink.getSelect("Select * From "+table+" "+sql_rest).beans;
            int leng = bg.size();
            int maxSize = dbLink.getSelect("Select * From "+table+" ").beans.size();
            // Number of 30-row pages, rounded up
            int page = (maxSize+29)/30;
            jsonObj.put("Length",leng);
            jsonObj.put("MaxSize",maxSize);
            jsonObj.put("Page",page);
            jsonObj.put("KindNess",table);
            jsonArray.put(jsonObj);
            for(int i=0;i<leng;++i)
            {
                JSONObject jsonObject = new JSONObject();
                jsonObject.put("word",bg.get(i).get(0));
                jsonObject.put("num",bg.get(i).get(1));
                jsonObject.put("exp",bg.get(i).get(2));
                jsonArray.put(jsonObject);
            }
        } catch (SQLException e) {
            e.printStackTrace();
        }
        dbLink.free();

        ServletOutputStream os = response.getOutputStream();
        os.write(jsonArray.toString().getBytes("utf-8"));
        os.flush();
        os.close();
    }
    //---------------------------------------------------------------------------------//
}

ServletForKindKeyWords.java
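    The Page value is meant to be the number of 30-row pages needed to hold MaxSize rows, which is a ceiling division. A small Python sketch of that arithmetic (page_count is a hypothetical helper name; the page size of 30 comes from the Limit clause used by the js code):

```python
PAGE_SIZE = 30  # rows per page, matching the "Limit ...,30" clause


def page_count(total_rows, page_size=PAGE_SIZE):
    """Number of pages needed for total_rows rows (ceiling division)."""
    return (total_rows + page_size - 1) // page_size


for total in (0, 1, 30, 31, 17469):
    print(total, page_count(total))
```

    Dividing by the fixed page size, rather than by the number of rows on the current page, keeps the result stable on the last page and avoids a division by zero when a query returns no rows.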

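    Next comes the js part:

      Show the categories first, then load the data into each one using the nested layout:

  Clicking 获取本类更多热词 ("get more hot words in this category") takes you to that category's page!

  Like this:

  The new js code: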

 function makePageToKind()
{
var Area = '';
Area += '<div class="row">';
Area += ' <div class="col-md-12">';
Area += ' <h2>热词目录</h2>';
Area += ' </div>';
Area += '</div>';
Area += '<hr />';
Area += '<br>';
Area += '<br>';
Area += '<div id="MessageArea">';
Area += '</div>';
document.getElementById("page-inner").innerHTML = Area;
madeAllKindP();
}
// Maps each category name (also the value sent to the servlet) to the id of its <div>.
var KIND_IDS = {
"互联网类": "hlw",
"IT业界类": "ityj",
"软件开发类": "rjkf",
"开源类": "ky",
"电脑硬件类": "dnyj",
"游戏类": "yx",
"创业类": "cy",
"手机相关类": "sjxg",
"科学类": "kx",
"其他类": "qt"
};
function madeAllKindP()
{
var Area = '';
Area += '<div>';
Area += ' <ul>';
for(var name in KIND_IDS)
{
Area += ' <li>';
Area += ' <b>'+name+'</b>';
Area += ' <div id="'+KIND_IDS[name]+'"></div>';
Area += ' </li>';
}
Area += ' </ul>';
Area += '</div>';
document.getElementById("MessageArea").innerHTML = Area;
// Load the top 10 hot words of every category
for(var kindName in KIND_IDS)
{
makeNextStepOfGroupK(kindName);
}
}
function getKindWordsByKindName(word)
{
// Unknown names give "" just like the original if/else chain
return KIND_IDS[word] || "";
}
function makeNextStepOfGroupK(word_t)
{
var xmlHttp = null;
try{
xmlHttp = new XMLHttpRequest();
} catch (e1) {
try {
xmlHttp = new ActiveXObject("Microsoft.XMLHTTP");
} catch (e2) {
alert("Your browser does not support XMLHTTP!");
return;
}
}
xmlHttp.onreadystatechange = function() {
if (xmlHttp.readyState == 4) {
if (xmlHttp.status == 200)
{
var Area = "&nbsp;&nbsp;";
var s = xmlHttp.responseText;
var InformationSet = JSON.parse(s);
var leng = InformationSet[0].Length;
// Fall back to the requested category if the response carries no KindNess field
var kindness = InformationSet[0].KindNess || word_t;
for(var i=1;i<=leng;++i)
{
var word_s = InformationSet[i].word;
var num = InformationSet[i].num;
Area += "&nbsp;&nbsp;";
Area += "<a href='#' title='在本类型中引用次数:"+num+"' onclick='toSomeWhere(\""+word_s+"\")'>"+word_s+"</a>";
Area += "&nbsp;&nbsp;";
}
Area += "&nbsp;&nbsp;";
Area += "&nbsp;&nbsp;";
Area += "<a href='#' onclick='makePageToOneKind(\""+kindness+"\")'>获取本类更多热词...</a>";
Area += "&nbsp;&nbsp;";
Area += "&nbsp;&nbsp;";
var id_t = getKindWordsByKindName(kindness);
document.getElementById(id_t).innerHTML = Area;
}
}
};
var url ="../com/servlet/ServletForMoreInfo";
var server = "kind="+encodeURIComponent(word_t);
xmlHttp.open("POST", url, true);
xmlHttp.setRequestHeader("Content-Type","application/x-www-form-urlencoded");
xmlHttp.send(server);
}
function makePageToOneKind(kind)
{
var Area = '';
Area += '<div class="row">';
Area += ' <div class="col-md-12">';
Area += ' <h2>'+kind+'</h2>';
Area += ' </div>';
Area += '</div>';
Area += '<hr />';
Area += '<br>';
Area += '<div style="background:rgb(0,153,255);margin-left:20px;margin-right:20px;height:25px;">';
Area += ' <div style="margin-left:10px;margin-right:10px;margin-top:5px;margin-bottom:5px;">';
Area += ' <b style="float:left;">热词表</b>';
Area += ' <div style="float:right;">';
Area += ' <select id="sty" onchange="simpleReset_Kind(\''+kind+'\')">';
Area += ' <option value="0" selected>按照词频顺序</option>';
Area += ' <option value="1">按照字母表顺序</option>';
Area += ' </select>';
Area += '&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;';
Area += ' <select id="order" onchange="simpleReset_Kind(\''+kind+'\')">';
Area += ' <option value="0" selected>降序</option>';
Area += ' <option value="1">增序</option>';
Area += ' </select>';
Area += '&nbsp;&nbsp;';
Area += ' </div>';
Area += ' </div>';
Area += '</div>';
Area += '<br>';
Area += '<br>';
Area += '<div id="MessageArea">';
Area += '</div>';
document.getElementById("page-inner").innerHTML = Area;
simpleReset_Kind(kind);
}
function simpleReset_Kind(kind)
{
wordPage = 1;
resetAndFresh_Kind(kind);
}
function XReset_Kind(p,kind)
{
wordPage = p;
wordPage = parseInt(""+wordPage);
resetAndFresh_Kind(kind);
}
function makeSurePage_Kind(kind)
{
wordPage = document.getElementById("selPage").value;
wordPage = parseInt(""+wordPage);
resetAndFresh_Kind(kind);
}
function resetAndFresh_Kind(kind)
{
var sty = document.getElementById("sty").value;
var order = document.getElementById("order").value;
var xmlHttp = null;
try{
xmlHttp = new XMLHttpRequest();
} catch (e1) {
try {
xmlHttp = new ActiveXObject("Microsoft.XMLHTTP");
} catch (e2) {
alert("Your browser does not support XMLHTTP!");
return;
}
}
xmlHttp.onreadystatechange = function() {
if (xmlHttp.readyState == 4) {
if (xmlHttp.status == 200)
{
var Area = "";
var s = xmlHttp.responseText;
var InformationSet = JSON.parse(s);
var leng = InformationSet[0].Length;
var max = InformationSet[0].MaxSize;
var pageNum = InformationSet[0].Page;
// The category name comes from the enclosing function's "kind" parameter
Area += "<table class='WhatATable' style='margin-left:200px;float:left;'>";
Area += "<tr>";
Area += "<th style='width:100px;'>热词</th>";
Area += "<th style='width:100px;'>词频</th>";
Area += "<th style='width:100px;'>详细信息链接</th>";
Area += "</tr>";
if(leng<10)
{
for (var i=1;i<=leng;++i)
{
Area += "<tr>";
Area += " <td>";
Area += InformationSet[i].word;
Area += " </td>";
Area += " <td>";
Area += InformationSet[i].num;
Area += " </td>";
Area += " <td>";
Area += " <a href='#' onclick='toSomeWhere(\""+InformationSet[i].word+"\")'>详细信息</a>";
Area += " </td>";
Area += "</tr>";
}
}
else
{
for (var i=1;i<=10;++i)
{
Area += "<tr>";
Area += " <td>";
Area += InformationSet[i].word;
Area += " </td>";
Area += " <td>";
Area += InformationSet[i].num;
Area += " </td>";
Area += " <td>";
Area += " <a href='#' onclick='toSomeWhere(\""+InformationSet[i].word+"\")'>详细信息</a>";
Area += " </td>";
Area += "</tr>";
}
}
Area += "</table>";
if(leng>10)
{
Area += "<table class='WhatATable' style='margin-left:10px;float:left;'>";
Area += "<tr>";
Area += "<th style='width:100px;'>热词</th>";
Area += "<th style='width:100px;'>词频</th>";
Area += "<th style='width:100px;'>详细信息链接</th>";
Area += "</tr>";
if(leng<=20)
{
for (var i=11;i<=leng;++i)
{
Area += "<tr>";
Area += " <td>";
Area += InformationSet[i].word;
Area += " </td>";
Area += " <td>";
Area += InformationSet[i].num;
Area += " </td>";
Area += " <td>";
Area += " <a href='#' onclick='toSomeWhere(\""+InformationSet[i].word+"\")'>详细信息</a>";
Area += " </td>";
Area += "</tr>";
}
}
else
{
for (var i=11;i<=20;++i)
{
Area += "<tr>";
Area += " <td>";
Area += InformationSet[i].word;
Area += " </td>";
Area += " <td>";
Area += InformationSet[i].num;
Area += " </td>";
Area += " <td>";
Area += " <a href='#' onclick='toSomeWhere(\""+InformationSet[i].word+"\")'>详细信息</a>";
Area += " </td>";
Area += "</tr>";
}
}
Area += "</table>";
}
if(leng>20)
{
Area += "<table class='WhatATable' style='margin-left:10px;float:left;'>";
Area += "<tr>";
Area += "<th style='width:100px;'>热词</th>";
Area += "<th style='width:100px;'>词频</th>";
Area += "<th style='width:100px;'>详细信息链接</th>";
Area += "</tr>";
for (var i=21;i<=leng;++i)
{
Area += "<tr>";
Area += " <td>";
Area += InformationSet[i].word;
Area += " </td>";
Area += " <td>";
Area += InformationSet[i].num;
Area += " </td>";
Area += " <td>";
Area += " <a href='#' onclick='toSomeWhere(\""+InformationSet[i].word+"\")'>详细信息</a>";
Area += " </td>";
Area += "</tr>";
}
Area += "</table>";
}
Area += "<div style='clear:both;'></div>";
Area += "<br>";
Area += "<br>";
Area += "<br>";
Area += "<br>";
Area += "<p style='margin-left:30px;margin-right:30px;'>";
Area += "&nbsp;<button onclick='simpleReset_Kind(\""+kind+"\")'>起始页</button>&nbsp;";
var start = ((wordPage-4)>=1)?wordPage-4:1;
var end = ((wordPage+4)<=pageNum)?(wordPage+4):pageNum;
// Show "..." only when the window does not start at page 1
if(start!=1)
{
Area += "&nbsp;...&nbsp;";
}
for(var i=start;i<=end;++i)
{
Area += "&nbsp;<button onclick='XReset_Kind("+i+",\""+kind+"\")'>"+i+"</button>&nbsp;";
}
// Show "..." only when the window does not reach the last page
if(end!=pageNum)
{
Area += "&nbsp;...&nbsp;";
}
Area += "&nbsp;<button onclick='XReset_Kind("+pageNum+",\""+kind+"\")'>结束页</button>&nbsp;";
Area += "&nbsp;&nbsp;<b>选择页数跳转</b>&nbsp;&nbsp;";
Area += "<select id='selPage' onchange='makeSurePage_Kind(\""+kind+"\")'>";
for(var i=1;i<=pageNum;++i)
{
Area += "<option value='"+i+"'>"+i+"</option>";
}
Area += "</select>";
Area += "</p>";
document.getElementById("MessageArea").innerHTML = Area;
surePage_Kind();
}
}
};
var url ="../com/servlet/ServletForKindKeyWords";
var server = "sql=";
// Sort by word frequency
if(sty==0)
{
server += " order by num ";
}
// Sort alphabetically
else if(sty==1)
{
server += " order by word ";
}
// Descending order (ascending is the SQL default)
if(order==0)
{
server += " DESC ";
}
server += (" Limit "+((wordPage-1)*30)+",30 ");
server += "&table="+encodeURIComponent(kind);
xmlHttp.open("POST", url, true);
xmlHttp.setRequestHeader("Content-Type","application/x-www-form-urlencoded");
xmlHttp.send(server);
}
function surePage_Kind(kind)
{
document.getElementById("selPage").selectedIndex = wordPage-1;
}

wordkind.js
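    The pager in resetAndFresh_Kind shows up to four page buttons on either side of the current page, clamped to the valid range, with "..." on a side where pages are hidden. The same rule as a small Python sketch (page_window is a hypothetical name; the radius of 4 is taken from the js code):

```python
def page_window(current, page_count, radius=4):
    """Return (start, end, leading_dots, trailing_dots) for the pager buttons."""
    start = max(current - radius, 1)
    end = min(current + radius, page_count)
    return start, end, start != 1, end != page_count


print(page_window(1, 20))
print(page_window(10, 20))
print(page_window(20, 20))
```

    Computing the window in one place makes the edge cases easy to check: the first page has no leading dots, the last page has no trailing dots, and a page in the middle has both.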

 var wordPage = 1;
function makePageToWord()
{
var Area = '';
Area += '<div class="row">';
Area += '<div class="col-md-12">';
Area += '<h2>全部热词</h2>';
Area += '</div>';
Area += '</div>';
Area += '<hr />';
Area += '<br>';
Area += '<div style="background:rgb(0,153,255);margin-left:20px;margin-right:20px;height:25px;">';
Area += ' <div style="margin-left:10px;margin-right:10px;margin-top:5px;margin-bottom:5px;">';
Area += ' <b style="float:left;">热词表</b>';
Area += ' <div style="float:right;">';
Area += ' <select id="sty" onchange="simpleReset()">';
Area += ' <option value="0" selected>按照词频顺序</option>';
Area += ' <option value="1">按照字母表顺序</option>';
Area += ' </select>';
Area += '&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;';
Area += ' <select id="order" onchange="simpleReset()">';
Area += ' <option value="0" selected>降序</option>';
Area += ' <option value="1">增序</option>';
Area += ' </select>';
Area += '&nbsp;&nbsp;';
Area += ' </div>';
Area += ' </div>';
Area += '</div>';
Area += '<br>';
Area += '<br>';
Area += '<div id="MessageArea">';
Area += '</div>';
document.getElementById("page-inner").innerHTML = Area;
simpleReset();
}
function simpleReset()
{
wordPage = 1;
resetAndFresh();
}
function XReset(p)
{
wordPage = p;
wordPage = parseInt(""+wordPage);
resetAndFresh();
}
function resetAndFresh()
{
var sty = document.getElementById("sty").value;
var order = document.getElementById("order").value;
var xmlHttp = null;
try{
xmlHttp = new XMLHttpRequest();
} catch (e1) {
try {
xmlHttp = new ActiveXObject("Microsoft.XMLHTTP");
} catch (e2) {
alert("Your browser does not support XMLHTTP!");
return;
}
}
xmlHttp.onreadystatechange = function() {
if (xmlHttp.readyState == 4) {
if (xmlHttp.status == 200)
{
var Area = "";
var s = xmlHttp.responseText;
var InformationSet = JSON.parse(s);
var leng = InformationSet[0].Length;
var max = InformationSet[0].MaxSize;
var pageNum = InformationSet[0].Page;
Area += "<table class='WhatATable' style='margin-left:200px;float:left;'>";
Area += "<tr>";
Area += "<th style='width:100px;'>热词</th>";
Area += "<th style='width:100px;'>词频</th>";
Area += "<th style='width:100px;'>详细信息链接</th>";
Area += "</tr>";
if(leng<10)
{
for (var i=1;i<=leng;++i)
{
Area += "<tr>";
Area += " <td>";
Area += InformationSet[i].word;
Area += " </td>";
Area += " <td>";
Area += InformationSet[i].num;
Area += " </td>";
Area += " <td>";
Area += " <a href='#' onclick='toSomeWhere(\""+InformationSet[i].word+"\")'>详细信息</a>";
Area += " </td>";
Area += "</tr>";
}
}
else
{
for (var i=1;i<=10;++i)
{
Area += "<tr>";
Area += " <td>";
Area += InformationSet[i].word;
Area += " </td>";
Area += " <td>";
Area += InformationSet[i].num;
Area += " </td>";
Area += " <td>";
Area += " <a href='#' onclick='toSomeWhere(\""+InformationSet[i].word+"\")'>详细信息</a>";
Area += " </td>";
Area += "</tr>";
}
}
Area += "</table>";
if(leng>10)
{
Area += "<table class='WhatATable' style='margin-left:10px;float:left;'>";
Area += "<tr>";
Area += "<th style='width:100px;'>热词</th>";
Area += "<th style='width:100px;'>词频</th>";
Area += "<th style='width:100px;'>详细信息链接</th>";
Area += "</tr>";
if(leng<=20)
{
for (var i=11;i<=leng;++i)
{
Area += "<tr>";
Area += " <td>";
Area += InformationSet[i].word;
Area += " </td>";
Area += " <td>";
Area += InformationSet[i].num;
Area += " </td>";
Area += " <td>";
Area += " <a href='#' onclick='toSomeWhere(\""+InformationSet[i].word+"\")'>详细信息</a>";
Area += " </td>";
Area += "</tr>";
}
}
else
{
for (var i=11;i<=20;++i)
{
Area += "<tr>";
Area += " <td>";
Area += InformationSet[i].word;
Area += " </td>";
Area += " <td>";
Area += InformationSet[i].num;
Area += " </td>";
Area += " <td>";
Area += " <a href='#' onclick='toSomeWhere(\""+InformationSet[i].word+"\")'>详细信息</a>";
Area += " </td>";
Area += "</tr>";
}
}
Area += "</table>";
}
if(leng>20)
{
Area += "<table class='WhatATable' style='margin-left:10px;float:left;'>";
Area += "<tr>";
Area += "<th style='width:100px;'>热词</th>";
Area += "<th style='width:100px;'>词频</th>";
Area += "<th style='width:100px;'>详细信息链接</th>";
Area += "</tr>";
for (var i=21;i<=leng;++i)
{
Area += "<tr>";
Area += " <td>";
Area += InformationSet[i].word;
Area += " </td>";
Area += " <td>";
Area += InformationSet[i].num;
Area += " </td>";
Area += " <td>";
Area += " <a href='#' onclick='toSomeWhere(\""+InformationSet[i].word+"\")'>详细信息</a>";
Area += " </td>";
Area += "</tr>";
}
Area += "</table>";
}
Area += "<div style='clear:both;'></div>";
Area += "<br>";
Area += "<br>";
Area += "<br>";
Area += "<br>";
Area += "<p style='margin-left:30px;margin-right:30px;'>";
Area += "&nbsp;<button onclick='simpleReset()'>起始页</button>&nbsp;";
var start = ((wordPage-4)>=1)?wordPage-4:1;
var end = ((wordPage+4)<=pageNum)?(wordPage+4):pageNum;
// Show "..." only when the window does not start at page 1
if(start!=1)
{
Area += "&nbsp;...&nbsp;";
}
for(var i=start;i<=end;++i)
{
Area += "&nbsp;<button onclick='XReset("+i+")'>"+i+"</button>&nbsp;";
}
// Show "..." only when the window does not reach the last page
if(end!=pageNum)
{
Area += "&nbsp;...&nbsp;";
}
Area += "&nbsp;<button onclick='XReset("+pageNum+")'>结束页</button>&nbsp;";
Area += "&nbsp;&nbsp;<b>选择页数跳转</b>&nbsp;&nbsp;";
Area += "<select id='selPage' onchange='makeSurePage()'>";
for(var i=1;i<=pageNum;++i)
{
Area += "<option value='"+i+"'>"+i+"</option>";
}
Area += "</select>";
Area += "</p>";
document.getElementById("MessageArea").innerHTML = Area;
surePage();
}
}
};
var url ="../com/servlet/ServletForAllKeyWords";
var server = "sql=";
// Sort by word frequency
if(sty==0)
{
server += " order by num ";
}
// Sort alphabetically
else if(sty==1)
{
server += " order by word ";
}
// Descending order (ascending is the SQL default)
if(order==0)
{
server += " DESC ";
}
server += (" Limit "+((wordPage-1)*30)+",30 ");
xmlHttp.open("POST", url, true);
xmlHttp.setRequestHeader("Content-Type","application/x-www-form-urlencoded");
xmlHttp.send(server);
}
function toSomeWhere(word)
{
var Area = '';
Area += '<div class="row">';
Area += ' <div class="col-md-12">';
Area += ' <h2>'+word+'</h2>';
Area += ' </div>';
Area += '</div>';
Area += '<hr />';
Area += '<br>';
Area += '<div id="MessageArea">';
Area += '</div>';
document.getElementById("page-inner").innerHTML = Area;
var xmlHttp = null;
try{
xmlHttp = new XMLHttpRequest();
} catch (e1) {
try {
xmlHttp = new ActiveXObject("Microsoft.XMLHTTP");
} catch (e2) {
alert("Your browser does not support XMLHTTP!");
return;
}
}
xmlHttp.onreadystatechange = function() {
if (xmlHttp.readyState == 4) {
if (xmlHttp.status == 200)
{
var Area = "";
var s = xmlHttp.responseText;
var InformationSet = JSON.parse(s);
var word = InformationSet[1].word;
var num = InformationSet[1].num;
var exp = InformationSet[1].exp;
Area += "<p><b id='word' style='font-size:120%;'>"+word+"</b></p>";
Area += "<p style='color:rgb(200,200,200);'>&nbsp;&nbsp;&nbsp;引用次数:"+num+"</p>";
Area += "<p style='font:\"楷体\";font-size:90%;'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;";
if(exp=="")
{
Area += "目前百度百科上并没有相关解释信息...";
}
else
{
Area += exp;
}
Area += "</p>";
Area += "<br>";
Area += "<div id='finalDIV'></div>";
document.getElementById("MessageArea").innerHTML = Area;
getLinksForKey(word);
}
}
};
var url ="../com/servlet/ServletForAllKeyWords";
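// Encode the fragment so characters such as & or + inside the word survive the POST
var server = "sql="+encodeURIComponent(" where word='"+word+"'");
xmlHttp.open("POST", url, true);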
xmlHttp.setRequestHeader("Content-Type","application/x-www-form-urlencoded");
xmlHttp.send(server);
}
function getLinksForKey(word)
{
var xmlHttp = null;
try{
xmlHttp = new XMLHttpRequest();
} catch (e1) {
try {
xmlHttp = new ActiveXObject("Microsoft.XMLHTTP");
} catch (e2) {
alert("Your browser does not support XMLHTTP!");
return;
}
}
xmlHttp.onreadystatechange = function() {
if (xmlHttp.readyState == 4) {
if (xmlHttp.status == 200)
{
var Area = "";
Area += "<br>";
Area += "<br>";
Area += "<b style='font-size:120%;'>引用网页:</b>";
Area += "<br>";
Area += "<br>";
Area += "<ul>";
var s = xmlHttp.responseText;
var InformationSet = JSON.parse(s);
var leng = InformationSet[0].Length;
for(var i=1;i<=leng;++i)
{
var word = InformationSet[i].word;
var num = InformationSet[i].num;
var title = InformationSet[i].title;
var link = InformationSet[i].link;
Area += "<li>";
Area += "<a href='"+link+"' title='引用次数:"+num+"'>"+title+"</a>"
Area += "</li>";
}
Area += "</ul>";
document.getElementById("finalDIV").innerHTML = Area;
}
}
};
var url ="../com/servlet/ServletForLinkData";
var server = "word="+encodeURIComponent(word);
xmlHttp.open("POST", url, true);
xmlHttp.setRequestHeader("Content-Type","application/x-www-form-urlencoded");
xmlHttp.send(server);
}
function surePage()
{
document.getElementById("selPage").selectedIndex = wordPage-1;
}
function makeSurePage()
{
wordPage = document.getElementById("selPage").value;
wordPage = parseInt(""+wordPage);
resetAndFresh();
}

word.js
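    Both word.js and wordkind.js send a ready-made SQL fragment (order by ... Limit ...) to the server. An alternative, shown here only as a sketch and not as what this project does, is to send just the two select values and the page number and let the server build the fragment from a whitelist. It is written in Python for brevity even though the server side here is a Java servlet; build_tail and the two lookup tables are hypothetical names.

```python
SORT_COLUMNS = {"0": "num", "1": "word"}  # value of the "sty" select
SORT_ORDERS = {"0": "DESC", "1": "ASC"}   # value of the "order" select
PAGE_SIZE = 30


def build_tail(sty, order, page):
    """Build ' ORDER BY ... LIMIT ...' from whitelisted options only."""
    column = SORT_COLUMNS.get(str(sty), "num")      # unknown input -> default
    direction = SORT_ORDERS.get(str(order), "DESC")
    page = max(int(page), 1)                        # non-numeric input raises ValueError
    offset = (page - 1) * PAGE_SIZE
    return f" ORDER BY {column} {direction} LIMIT {offset},{PAGE_SIZE}"


print(build_tail("0", "0", 1))
print(build_tail("1", "1", 3))
```

    Because only values looked up in the two dictionaries and an integer offset ever reach the SQL text, nothing typed or forged on the client side can add extra clauses.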

  Update the references in web.xml:

 <?xml version="1.0" encoding="UTF-8"?>
<web-app xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://xmlns.jcp.org/xml/ns/javaee" xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee http://xmlns.jcp.org/xml/ns/javaee/web-app_4_0.xsd" id="WebApp_ID" version="4.0">
<display-name>HotWord</display-name>
<servlet>
<description>This is the description of my J2EE component</description>
<display-name>This is the display name of my J2EE component</display-name>
<servlet-name>ServletForWords</servlet-name>
<servlet-class>com.servlet.ServletForWords</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>ServletForWords</servlet-name>
<url-pattern>/com/servlet/ServletForWords</url-pattern>
</servlet-mapping>
<servlet>
<description>This is the description of my J2EE component</description>
<display-name>This is the display name of my J2EE component</display-name>
<servlet-name>ServletForAllKeyWords</servlet-name>
<servlet-class>com.servlet.ServletForAllKeyWords</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>ServletForAllKeyWords</servlet-name>
<url-pattern>/com/servlet/ServletForAllKeyWords</url-pattern>
</servlet-mapping>
<servlet>
<description>This is the description of my J2EE component</description>
<display-name>This is the display name of my J2EE component</display-name>
<servlet-name>ServletForLinkData</servlet-name>
<servlet-class>com.servlet.ServletForLinkData</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>ServletForLinkData</servlet-name>
<url-pattern>/com/servlet/ServletForLinkData</url-pattern>
</servlet-mapping>
<servlet>
<description>This is the description of my J2EE component</description>
<display-name>This is the display name of my J2EE component</display-name>
<servlet-name>ServletForMoreInfo</servlet-name>
<servlet-class>com.servlet.ServletForMoreInfo</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>ServletForMoreInfo</servlet-name>
<url-pattern>/com/servlet/ServletForMoreInfo</url-pattern>
</servlet-mapping>
<servlet>
<description>This is the description of my J2EE component</description>
<display-name>This is the display name of my J2EE component</display-name>
<servlet-name>ServletForKindKeyWords</servlet-name>
<servlet-class>com.servlet.ServletForKindKeyWords</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>ServletForKindKeyWords</servlet-name>
<url-pattern>/com/servlet/ServletForKindKeyWords</url-pattern>
</servlet-mapping>
<welcome-file-list>
<welcome-file>index.html</welcome-file>
<welcome-file>index.htm</welcome-file>
<welcome-file>index.jsp</welcome-file>
<welcome-file>default.html</welcome-file>
<welcome-file>default.htm</welcome-file>
<welcome-file>default.jsp</welcome-file>
</welcome-file-list>
</web-app>

web.xml

  Update the jsp page code:

 <%@ page language="java" contentType="text/html; charset=utf-8"
pageEncoding="utf-8"%>
<!DOCTYPE html>
<html><!-- xmlns="http://www.w3.org/1999/xhtml" -->
<head>
<!--<meta charset="utf-8" />-->
<meta name="viewport" content="width=device-width, initial-scale=1.0" charset="utf-8"/>
<title>热词分析</title>
<!-- BOOTSTRAP STYLES-->
<link href="../assets/css/bootstrap.css" rel="stylesheet" />
<!-- FONTAWESOME STYLES-->
<link href="../assets/css/font-awesome.css" rel="stylesheet" />
<!-- CUSTOM STYLES-->
<link href="../assets/css/custom.css" rel="stylesheet" />
<!-- PERSONAL FONTS-->
<link href='../cssFiles/basic.css' rel='stylesheet' type='text/css' />
<!-- GOOGLE FONTS-->
<link href='http://fonts.googleapis.com/css?family=Open+Sans' rel='stylesheet' type='text/css' />
</head>
<script src="../jsFiles/jquery/jquery-3.4.1.min.js" charset="utf-8"></script>
<script src="../jsFiles/echarts/echarts.min.js" charset="utf-8"></script>
<script src="../jsFiles/echarts/echarts-wordcloud-master/dist/echarts-wordcloud.min.js" charset="utf-8"></script>
<!-- <script src="../jsFiles/echarts/echarts-wordcloud-master/dist/echarts-wordcloud.min.js" charset="utf-8"></script> -->
<script src="../jsFiles/basic.js" charset="utf-8"></script>
<script src='../jsFiles/echarts/echarts.simple.js'></script>
<script src="../jsFiles/word.js" charset="utf-8"></script>
<script src="../jsFiles/wordkind.js" charset="utf-8"></script>
<script src="../jsFiles/cloud.js" charset="utf-8"></script>
<body>
<div id="wrapper">
<div class="navbar navbar-inverse navbar-fixed-top">
<div class="adjust-nav">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".sidebar-collapse">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand"><i class="fa fa-square-o "></i>&nbsp;欢迎您使用本热词分析系统</a>
</div>
</div>
</div>
<!-- /. NAV TOP -->
<div class="navbar-default navbar-side"> <!-- nav role="navigation" -->
<div class="sidebar-collapse">
<ul class="nav" id="main-menu">
<li class="text-center user-image-back">
<img src="../assets/img/find_user.png" class="img-responsive" />
</li>
<li>
<a href="#" onclick="makePageToMain()"><i class="fa fa-table "></i>主页</a>
</li>
<li>
<a href="#" onclick="makePageToWord()"><i class="fa fa-key "></i>全部热词</a>
</li>
<li>
<a href="#" onclick="makePageToKind()"><i class="fa fa-key "></i>热词目录</a>
</li>
<li>
<a href="#"><i class="fa fa-edit "></i>热词需求<span class="fa arrow"></span></a>
<ul class="nav nav-second-level">
<li>
<a href="#" onclick="makePageToCl()">热词云图</a>
</li>
<li>
<a href="#" onclick="makePageToRe()">热词关系图</a>
</li>
</ul>
</li>
</ul>
</div>
</div>
<!-- /. NAV SIDE -->
<div id="page-wrapper" >
<div id="page-inner">
<div class="row">
<div class="col-md-12">
<h2>主页</h2>
</div>
</div>
<!-- /. ROW -->
<hr />
<!-- /. ROW -->
<br>
<br>
<div id="MessageArea">
<br>
<h3>欢迎您使用本热词分析系统</h3>
</div>
</div>
<!-- /. PAGE INNER -->
</div>
<!-- /. PAGE WRAPPER -->
</div>
<!-- /. WRAPPER -->
<!-- SCRIPTS -AT THE BOTOM TO REDUCE THE LOAD TIME-->
<!-- JQUERY SCRIPTS -->
<script src="../assets/js/jquery-1.10.2.js"></script>
<!-- BOOTSTRAP SCRIPTS -->
<script src="../assets/js/bootstrap.min.js"></script>
<!-- METISMENU SCRIPTS -->
<script src="../assets/js/jquery.metisMenu.js"></script>
<!-- CUSTOM SCRIPTS -->
<script src="../assets/js/custom.js"></script>
</body>
</html>

index.jsp

  As for the remaining parts, I've thought it over and will write them up separately!
