Project code: link: http://pan.baidu.com/s/1qXVcfCw password: apw1

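01 Review: indexes

  Definition: an index is a structure that keeps the values of one or more columns of a database table in sorted order

  Purpose: to speed up queries on the table's records

  Characteristic: it trades space for time; the extra storage makes queries faster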
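The following is a minimal plain-Java sketch that makes the space-for-time trade-off concrete. The names are illustrative, and this is not how any particular database stores its indexes. The "index" is an extra sorted copy of one column that remembers each value's row number. A lookup can then binary-search it in O(log n) instead of scanning all n rows.

```java
import java.util.Arrays;

/** Illustrative only: a sorted copy of one column plus row numbers, searched by binary search. */
class ColumnIndex {
    private final String[] keys;  // column values in sorted order: the extra space we pay for
    private final int[] rowIds;   // rowIds[i] is the row number that holds keys[i]

    ColumnIndex(String[] column) {
        Integer[] order = new Integer[column.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort the row numbers by the value each row has in the column
        Arrays.sort(order, (a, b) -> column[a].compareTo(column[b]));
        keys = new String[column.length];
        rowIds = new int[column.length];
        for (int i = 0; i < order.length; i++) {
            keys[i] = column[order[i]];
            rowIds[i] = order[i];
        }
    }

    /** Indexed lookup, O(log n): row number of value, or -1 if it is absent. */
    int find(String value) {
        int pos = Arrays.binarySearch(keys, value);
        return pos >= 0 ? rowIds[pos] : -1;
    }

    /** Lookup without an index, O(n): scans every row. */
    static int scan(String[] column, String value) {
        for (int i = 0; i < column.length; i++) {
            if (column[i].equals(value)) {
                return i;
            }
        }
        return -1;
    }
}
```

For example, new ColumnIndex(new String[]{"wang", "li", "zhang"}).find("zhang") returns row 2. The two extra arrays are the space being paid, and the binary search is the time being saved.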

02 Trying out Baidu search, and a diagram of how it works

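03 What is Lucene

  Lucene is an open-source full-text search engine toolkit released by the Apache Software Foundation. It was written by the veteran full-text search expert Doug Cutting. It is an architecture for a full-text search engine: it provides complete index creation and index querying, plus part of a text-analysis engine. Lucene aims to give software developers a simple, easy-to-use toolkit for adding full-text search to a target system, or for building a complete full-text search engine on top of it. Lucene is a classic ancestor in the full-text search field: many of today's search engines are built on it, and the ideas carry over.

In short: Lucene is a keyword-based text search tool. It only searches the text that your own application puts into its index, for example the content inside one website. It is not a crawler, so on its own it cannot search across websites.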

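04 Where Lucene is typically used

  On its own, Lucene is not an Internet-wide search engine like Baidu. It is typically used for text search inside one site or application, for example inside CRM, RAX or ERP systems. The underlying ideas are the same, though.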

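05 What Lucene stores

  Lucene stores a set of binary (compressed) files plus some control files on the hard disk.

Together these are called the index (索引库). The index consists of two parts:

(1) Original records

The raw text stored in the index, e.g. "小平非常的聪明" ("Xiaoping is very clever")

(2) Vocabulary

Each original record is split into terms according to some splitting strategy (the analyzer). With Lucene's StandardAnalyzer, Chinese text is split into single characters. The terms are stored in a table that later searches look up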
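Below is a minimal plain-Java sketch of these two parts. It assumes a naive analyzer that treats every non-whitespace character as a term, much as StandardAnalyzer splits Chinese text into single characters. The class and method names are illustrative, not Lucene's API. The original records get sequential numbers, and the vocabulary maps each term to the numbers of the records that contain it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy index: original records plus a vocabulary (term -> record numbers). */
class ToyIndex {
    private final List<String> records = new ArrayList<>();               // original records; position = record number
    private final Map<String, Set<Integer>> vocabulary = new TreeMap<>(); // term -> numbers of records containing it

    /** Stores the raw text and indexes every non-whitespace character as a term. */
    int add(String text) {
        int no = records.size();
        records.add(text);
        for (char c : text.toCharArray()) {
            if (Character.isWhitespace(c)) {
                continue;
            }
            vocabulary.computeIfAbsent(String.valueOf(c), k -> new TreeSet<>()).add(no);
        }
        return no;
    }

    /** Looks the term up in the vocabulary, then fetches the matching original records. */
    List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        for (int no : vocabulary.getOrDefault(term, new TreeSet<>())) {
            hits.add(records.get(no));
        }
        return hits;
    }
}
```

Suppose you call add("小平非常的聪明") and then search("聪"). The vocabulary lookup yields that record's number, and the number leads back to the original text.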

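06 Why some in-site search uses Lucene rather than SQL alone

  (1) SQL can only search database tables; it cannot search text files on disk directly

  (2) SQL has no relevance ranking

  (3) SQL search results have no keyword highlighting

  (4) SQL needs a database, and the database itself has a large memory overhead, e.g. Oracle

  (5) SQL search is sometimes slow, and very slow when the database is remote, e.g. Oracle. A LIKE '%keyword%' query generally cannot use an ordinary index and has to scan the table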

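07 Flow of the code that uses Lucene

  Creating the index:

    1)  Create the JavaBean object

    2)  Create a Document object

    3)  Put all property values of the JavaBean into the Document; the field names may match the JavaBean property names or differ from them

    4)  Create an IndexWriter object

    5)  Write the Document into the index through the IndexWriter

    6)  Close the IndexWriter

  Querying the index by keyword:

    1)  Create an IndexSearcher object

    2)  Create a QueryParser object

    3)  Create a Query object that wraps the keyword

    4)  Use the IndexSearcher to fetch the top 100 matching records from the index (or fewer, if fewer match)

    5)  Get the internal document numbers of the matches

    6)  Use the IndexSearcher to load the Document for each number

    7)  Read all fields out of each Document, copy them back into a JavaBean, and add it to a collection for later use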

*****08 Lucene quick start

  Step 1: create a Java web project named lucene-day01

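  Step 2: add the Lucene jars

    lucene-core-3.0.2.jar [Lucene core]

    lucene-analyzers-3.0.2.jar [analyzers]

    lucene-highlighter-3.0.2.jar [highlights the matched keywords in search results]

    lucene-memory-3.0.2.jar [in-memory index (MemoryIndex); the highlighter depends on it]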

  Step 3: create the package structure

    cn.itcast.javaee.lucene.entity

    cn.itcast.javaee.lucene.firstapp

    cn.itcast.javaee.lucene.secondapp

    cn.itcast.javaee.lucene.crud

    cn.itcast.javaee.lucene.fy

    cn.itcast.javaee.lucene.utils

    ...

  Step 4: create the JavaBean class

  public class Article {
    private Integer id;      // ID
    private String title;    // title
    private String content;  // content

    public Article() {}

    public Article(Integer id, String title, String content) {
      this.id = id;
      this.title = title;
      this.content = content;
    }

    public Integer getId() {
      return id;
    }

    public void setId(Integer id) {
      this.id = id;
    }

    public String getTitle() {
      return title;
    }

    public void setTitle(String title) {
      this.title = title;
    }

    public String getContent() {
      return content;
    }

    public void setContent(String content) {
      this.content = content;
    }
  }

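  Step 5: create FirstLucene.java with two business methods, createIndexDB() and findIndexDB(). Only createIndexDB() is shown here; a findIndexDB() appears in SecondLucene in section 10

  import java.io.File;
  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.store.Directory;
  import org.apache.lucene.store.FSDirectory;
  import org.apache.lucene.util.Version;
  import org.junit.Test;

  public class FirstLucene {
    @Test
    public void createIndexDB() throws Exception {
      Article article = new Article(1, "培训", "传智是一个Java培训机构");   // 1) the JavaBean
      Document document = new Document();                                // 2) the Document; 3) copy the properties into it
      document.add(new Field("id", article.getId().toString(), Field.Store.YES, Field.Index.ANALYZED));
      document.add(new Field("title", article.getTitle(), Field.Store.YES, Field.Index.ANALYZED));
      document.add(new Field("content", article.getContent(), Field.Store.YES, Field.Index.ANALYZED));
      Directory directory = FSDirectory.open(new File("E:/LuceneDBDBDBDBDBDBDBDBDB"));
      Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
      IndexWriter indexWriter = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.LIMITED);  // 4)
      indexWriter.addDocument(document);   // 5) write the Document into the index
      indexWriter.close();                 // 6) close the writer, which commits the change
    }
  }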

*****09 Creating a LuceneUtil helper class that uses reflection to wrap common operations

  import java.io.File;
  import java.lang.reflect.Method;
  import org.apache.commons.beanutils.BeanUtils;
  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.document.Field.Index;
  import org.apache.lucene.document.Field.Store;
  import org.apache.lucene.index.IndexWriter.MaxFieldLength;
  import org.apache.lucene.store.Directory;
  import org.apache.lucene.store.FSDirectory;
  import org.apache.lucene.util.Version;

  public class LuceneUtil {
    private static Directory directory;
    private static Analyzer analyzer;
    private static Version version;
    private static MaxFieldLength maxFieldLength;

    // Create the shared index directory, analyzer and settings once, when the class is loaded
    static {
      try {
        directory = FSDirectory.open(new File("E:/LuceneDBDBDBDBDBDBDBDBDB"));
        version = Version.LUCENE_30;
        analyzer = new StandardAnalyzer(version);
        maxFieldLength = MaxFieldLength.LIMITED;
      } catch (Exception e) {
        throw new RuntimeException(e);
      }
    }

    public static Directory getDirectory() {
      return directory;
    }

    public static Analyzer getAnalyzer() {
      return analyzer;
    }

    public static Version getVersion() {
      return version;
    }

    public static MaxFieldLength getMaxFieldLength() {
      return maxFieldLength;
    }

    // JavaBean -> Document: call getXxx() for every declared field and store the value as a string
    public static Document javabean2document(Object obj) throws Exception {
      Document document = new Document();
      Class<?> clazz = obj.getClass();
      for (java.lang.reflect.Field field : clazz.getDeclaredFields()) {
        String fieldName = field.getName();
        // e.g. "title" -> "getTitle"
        String methodName = "get" + fieldName.substring(0, 1).toUpperCase() + fieldName.substring(1);
        Method method = clazz.getMethod(methodName);
        Object value = method.invoke(obj);
        if (value == null) {
          continue;   // a Lucene Field does not accept null values
        }
        document.add(new Field(fieldName, value.toString(), Store.YES, Index.ANALYZED));
      }
      return document;
    }

    // Document -> JavaBean: BeanUtils converts the stored strings back to the property types
    public static Object document2javabean(Document document, Class<?> clazz) throws Exception {
      Object obj = clazz.newInstance();
      for (java.lang.reflect.Field field : clazz.getDeclaredFields()) {
        String fieldName = field.getName();
        String fieldValue = document.get(fieldName);
        BeanUtils.setProperty(obj, fieldName, fieldValue);
      }
      return obj;
    }
  }
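You can try the getter/setter reflection used by javabean2document and document2javabean without Lucene. The sketch below is a hypothetical stand-in:

- A Map<String, String> plays the role of Lucene's Document.
- Book is an invented demo bean.
- Only Integer and String properties are converted back; BeanUtils.setProperty handles many more types.

```java
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

class BeanMapper {
    /** Hypothetical bean used only for this demo. */
    public static class Book {
        private Integer id;
        private String title;
        public Book() {}
        public Book(Integer id, String title) { this.id = id; this.title = title; }
        public Integer getId() { return id; }
        public void setId(Integer id) { this.id = id; }
        public String getTitle() { return title; }
        public void setTitle(String title) { this.title = title; }
    }

    /** Bean -> map by calling getXxx() for each declared field; null values are skipped. */
    static Map<String, String> toMap(Object bean) {
        Map<String, String> map = new LinkedHashMap<>();
        try {
            for (Field field : bean.getClass().getDeclaredFields()) {
                if (field.isSynthetic()) {
                    continue;   // e.g. fields injected by coverage tools
                }
                String name = field.getName();
                Object value = bean.getClass().getMethod(accessor("get", name)).invoke(bean);
                if (value != null) {
                    map.put(name, value.toString());
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(e);
        }
        return map;
    }

    /** Map -> bean by calling setXxx(); only Integer and String property types are converted. */
    static <T> T fromMap(Map<String, String> map, Class<T> clazz) {
        try {
            T bean = clazz.getDeclaredConstructor().newInstance();
            for (Field field : clazz.getDeclaredFields()) {
                String raw = map.get(field.getName());
                if (field.isSynthetic() || raw == null) {
                    continue;
                }
                Object value = field.getType() == Integer.class ? Integer.valueOf(raw) : raw;
                clazz.getMethod(accessor("set", field.getName()), field.getType()).invoke(bean, value);
            }
            return bean;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** ("get", "title") -> "getTitle" */
    private static String accessor(String prefix, String fieldName) {
        return prefix + fieldName.substring(0, 1).toUpperCase() + fieldName.substring(1);
    }
}
```

The order of getDeclaredFields() is not guaranteed by the JDK, so code should look values up by name rather than rely on field order.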

*****10 Refactoring FirstLucene.java into SecondLucene.java with the LuceneUtil helper

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.queryParser.QueryParser;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.ScoreDoc;
  import org.apache.lucene.search.TopDocs;
  import org.junit.Test;

  public class SecondLucene {
    // Create the index: JavaBean -> Document -> IndexWriter
    @Test
    public void createIndexDB() throws Exception {
      Article article = new Article(1, "Java培训", "传智是一个Java培训机构");
      Document document = LuceneUtil.javabean2document(article);
      IndexWriter indexWriter = new IndexWriter(LuceneUtil.getDirectory(), LuceneUtil.getAnalyzer(), LuceneUtil.getMaxFieldLength());
      indexWriter.addDocument(document);
      indexWriter.close();
    }

    // Query the index: keyword -> Query -> top hits -> Documents -> JavaBeans
    @Test
    public void findIndexDB() throws Exception {
      List<Article> articleList = new ArrayList<Article>();
      String keywords = "传";
      QueryParser queryParser = new QueryParser(LuceneUtil.getVersion(), "content", LuceneUtil.getAnalyzer());
      Query query = queryParser.parse(keywords);
      IndexSearcher indexSearcher = new IndexSearcher(LuceneUtil.getDirectory());
      TopDocs topDocs = indexSearcher.search(query, 100);   // at most the top 100 hits
      for (int i = 0; i < topDocs.scoreDocs.length; i++) {
        ScoreDoc scoreDoc = topDocs.scoreDocs[i];
        int no = scoreDoc.doc;   // internal document number
        Document document = indexSearcher.doc(no);
        Article article = (Article) LuceneUtil.document2javabean(document, Article.class);
        articleList.add(article);
      }
      indexSearcher.close();
      for (Article article : articleList) {
        System.out.println(article.getId() + ":" + article.getTitle() + ":" + article.getContent());
      }
    }
  }

*****11 CRUD operations with the LuceneUtil helper

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.queryParser.QueryParser;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.ScoreDoc;
  import org.apache.lucene.search.TopDocs;
  import org.junit.Test;

  public class LuceneCRUD {
    // Create: add one document
    @Test
    public void addIndexDB() throws Exception {
      Article article = new Article(1, "培训", "传智是一个Java培训机构");
      Document document = LuceneUtil.javabean2document(article);
      IndexWriter indexWriter = new IndexWriter(LuceneUtil.getDirectory(), LuceneUtil.getAnalyzer(), LuceneUtil.getMaxFieldLength());
      indexWriter.addDocument(document);
      indexWriter.close();
    }

    // Update: delete every document whose "id" term matches, then add the new document
    @Test
    public void updateIndexDB() throws Exception {
      Integer id = 1;
      Article article = new Article(id, "培训", "广州传智是一个Java培训机构");
      Document document = LuceneUtil.javabean2document(article);
      Term term = new Term("id", id.toString());
      IndexWriter indexWriter = new IndexWriter(LuceneUtil.getDirectory(), LuceneUtil.getAnalyzer(), LuceneUtil.getMaxFieldLength());
      indexWriter.updateDocument(term, document);
      indexWriter.close();
    }

    // Delete: remove every document whose "id" term matches
    @Test
    public void deleteIndexDB() throws Exception {
      Integer id = 1;
      Term term = new Term("id", id.toString());
      IndexWriter indexWriter = new IndexWriter(LuceneUtil.getDirectory(), LuceneUtil.getAnalyzer(), LuceneUtil.getMaxFieldLength());
      indexWriter.deleteDocuments(term);
      indexWriter.close();
    }

    // Delete all documents in the index
    @Test
    public void deleteAllIndexDB() throws Exception {
      IndexWriter indexWriter = new IndexWriter(LuceneUtil.getDirectory(), LuceneUtil.getAnalyzer(), LuceneUtil.getMaxFieldLength());
      indexWriter.deleteAll();
      indexWriter.close();
    }

    // Read: search by keyword
    @Test
    public void searchIndexDB() throws Exception {
      List<Article> articleList = new ArrayList<Article>();
      String keywords = "传智";
      QueryParser queryParser = new QueryParser(LuceneUtil.getVersion(), "content", LuceneUtil.getAnalyzer());
      Query query = queryParser.parse(keywords);
      IndexSearcher indexSearcher = new IndexSearcher(LuceneUtil.getDirectory());
      TopDocs topDocs = indexSearcher.search(query, 100);
      for (int i = 0; i < topDocs.scoreDocs.length; i++) {
        ScoreDoc scoreDoc = topDocs.scoreDocs[i];
        int no = scoreDoc.doc;
        Document document = indexSearcher.doc(no);
        Article article = (Article) LuceneUtil.document2javabean(document, Article.class);
        articleList.add(article);
      }
      indexSearcher.close();
      for (Article article : articleList) {
        System.out.println(article.getId() + ":" + article.getTitle() + ":" + article.getContent());
      }
    }
  }
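IndexWriter.updateDocument(term, document) first deletes every document containing the term and then adds the new document. Lucene makes that pair atomic for readers. As a consequence, an update whose term matches nothing simply adds the document.

The plain-Java sketch below models only that observable behaviour. It uses a hypothetical in-memory store, it is not Lucene's implementation, and it compares whole field values instead of analyzed terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for an index; each document is a field-name -> value map. */
class ToyStore {
    final List<Map<String, String>> docs = new ArrayList<>();

    void addDocument(Map<String, String> doc) {
        docs.add(doc);
    }

    /** Like deleteDocuments(new Term(field, value)): removes every document whose field equals value. */
    int deleteDocuments(String field, String value) {
        int before = docs.size();
        docs.removeIf(doc -> value.equals(doc.get(field)));
        return before - docs.size();
    }

    /** Like updateDocument(term, doc): delete the matches (possibly none), then add the new document. */
    void updateDocument(String field, String value, Map<String, String> doc) {
        deleteDocuments(field, value);
        addDocument(doc);
    }
}
```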

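*****12 Pagination, part 1: synchronous paging (full page reload) with JSP + JS + Servlet + Lucene

  Step 1: create ArticleDao.java

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.queryParser.QueryParser;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.ScoreDoc;
  import org.apache.lucene.search.TopDocs;

  public class ArticleDao {
    // Total number of hits for the keyword
    public Integer getAllObjectNum(String keywords) throws Exception {
      QueryParser queryParser = new QueryParser(LuceneUtil.getVersion(), "content", LuceneUtil.getAnalyzer());
      Query query = queryParser.parse(keywords);
      IndexSearcher indexSearcher = new IndexSearcher(LuceneUtil.getDirectory());
      // totalHits counts every match, however few top hits are requested
      TopDocs topDocs = indexSearcher.search(query, 1);
      indexSearcher.close();
      return topDocs.totalHits;
    }

    // Hits [start, start + size) for the keyword, converted back to Articles
    public List<Article> findAllObjectWithFY(String keywords, Integer start, Integer size) throws Exception {
      List<Article> articleList = new ArrayList<Article>();
      QueryParser queryParser = new QueryParser(LuceneUtil.getVersion(), "content", LuceneUtil.getAnalyzer());
      Query query = queryParser.parse(keywords);
      IndexSearcher indexSearcher = new IndexSearcher(LuceneUtil.getDirectory());
      // Fetch just enough top hits to cover the requested page
      TopDocs topDocs = indexSearcher.search(query, start + size);
      // scoreDocs holds at most start + size entries, so clamp to its length
      int middle = Math.min(start + size, topDocs.scoreDocs.length);
      for (int i = start; i < middle; i++) {
        ScoreDoc scoreDoc = topDocs.scoreDocs[i];
        int no = scoreDoc.doc;
        Document document = indexSearcher.doc(no);
        Article article = (Article) LuceneUtil.document2javabean(document, Article.class);
        articleList.add(article);
      }
      indexSearcher.close();
      return articleList;
    }
  }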

  Step 2: create PageBean.java

  import java.util.ArrayList;
  import java.util.List;

  public class PageBean {
    private Integer allObjectNum;     // total number of hits
    private Integer allPageNum;       // total number of pages
    private Integer currPageNum;      // current page number, starting at 1
    private Integer perPageNum = 2;   // hits per page
    private List<Article> articleList = new ArrayList<Article>();   // hits on the current page

    public PageBean() {}

    public Integer getAllObjectNum() {
      return allObjectNum;
    }

    // Setting the total also derives the number of pages (rounded up)
    public void setAllObjectNum(Integer allObjectNum) {
      this.allObjectNum = allObjectNum;
      if (this.allObjectNum % this.perPageNum == 0) {
        this.allPageNum = this.allObjectNum / this.perPageNum;
      } else {
        this.allPageNum = this.allObjectNum / this.perPageNum + 1;
      }
    }

    public Integer getAllPageNum() {
      return allPageNum;
    }

    public void setAllPageNum(Integer allPageNum) {
      this.allPageNum = allPageNum;
    }

    public Integer getCurrPageNum() {
      return currPageNum;
    }

    public void setCurrPageNum(Integer currPageNum) {
      this.currPageNum = currPageNum;
    }

    public Integer getPerPageNum() {
      return perPageNum;
    }

    public void setPerPageNum(Integer perPageNum) {
      this.perPageNum = perPageNum;
    }

    public List<Article> getArticleList() {
      return articleList;
    }

    public void setArticleList(List<Article> articleList) {
      this.articleList = articleList;
    }
  }

  Step 3: create ArticleService.java

  import java.util.List;

  public class ArticleService {
    private ArticleDao articleDao = new ArticleDao();

    // Builds one page of results for the keyword
    public PageBean fy(String keywords, Integer currPageNum) throws Exception {
      PageBean pageBean = new PageBean();
      pageBean.setCurrPageNum(currPageNum);
      // Setting the total also computes the number of pages
      Integer allObjectNum = articleDao.getAllObjectNum(keywords);
      pageBean.setAllObjectNum(allObjectNum);
      // Page n starts at hit (n - 1) * size
      Integer size = pageBean.getPerPageNum();
      Integer start = (pageBean.getCurrPageNum() - 1) * size;
      List<Article> articleList = articleDao.findAllObjectWithFY(keywords, start, size);
      pageBean.setArticleList(articleList);
      return pageBean;
    }
  }
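You can check the paging arithmetic used by PageBean, ArticleService and the DAO in isolation. Here is a small plain-Java sketch with illustrative class and method names. The page count is the ceiling of total / perPage, and page n starts at (n - 1) * perPage. The DAO then reads up to min(start + perPage, number of hits).

```java
/** Paging arithmetic from PageBean/ArticleService/ArticleDao, extracted for illustration. */
class Paging {
    /** Same result as the "% == 0 ? / : / + 1" branch in PageBean.setAllObjectNum. */
    static int pageCount(int total, int perPage) {
        return (total + perPage - 1) / perPage;
    }

    /** Zero-based index of the first hit on page currPage (pages start at 1). */
    static int start(int currPage, int perPage) {
        return (currPage - 1) * perPage;
    }

    /** Exclusive end index, clamped to the number of hits, as in findAllObjectWithFY. */
    static int end(int currPage, int perPage, int hits) {
        return Math.min(start(currPage, perPage) + perPage, hits);
    }
}
```

With 5 hits and 2 per page there are 3 pages. The last page covers hits 4 up to (but excluding) 5, i.e. a single article.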

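  Step 4: create ArticleServlet.java

  import java.io.IOException;
  import javax.servlet.ServletException;
  import javax.servlet.http.HttpServlet;
  import javax.servlet.http.HttpServletRequest;
  import javax.servlet.http.HttpServletResponse;

  public class ArticleServlet extends HttpServlet {
    public void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
      try {
        // Must be called before any parameter is read
        request.setCharacterEncoding("UTF-8");
        // Page number; defaults to 1 when the parameter is missing or empty
        String strCurrPageNum = request.getParameter("currPageNum");
        Integer currPageNum = (strCurrPageNum == null || strCurrPageNum.length() == 0) ? 1 : Integer.parseInt(strCurrPageNum);
        String keywords = request.getParameter("keywords");
        ArticleService articleService = new ArticleService();
        PageBean pageBean = articleService.fy(keywords, currPageNum);
        request.setAttribute("pageBean", pageBean);
        // Keep the keyword so the JSP can show it again
        request.setAttribute("keywords", keywords);
        request.getRequestDispatcher("/list.jsp").forward(request, response);
      } catch (Exception e) {
        e.printStackTrace();
      }
    }
  }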

  Step 5: import the EasyUI script and style directories (the JSP below references themes/, js/ and locale/)

  Step 6: create list.jsp under WebRoot

  <%@ page language="java" pageEncoding="UTF-8"%>
  <%@ taglib uri="http://java.sun.com/jsp/jstl/core" prefix="c" %>
  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
  <html>
  <head>
  <link rel="stylesheet" href="themes/default/easyui.css" type="text/css"></link>
  <link rel="stylesheet" href="themes/icon.css" type="text/css"></link>
  <script type="text/javascript" src="js/jquery.min.js"></script>
  <script type="text/javascript" src="js/jquery.easyui.min.js"></script>
  <script type="text/javascript" src="locale/easyui-lang-zh_CN.js"></script>
  </head>
  <body>

  <!-- Input area -->
  <form action="${pageContext.request.contextPath}/ArticleServlet?currPageNum=1" method="POST">
  Keyword: <input type="text" name="keywords" value="传智" maxlength=""/>
  <input type="button" value="Submit"/>
  </form>

  <!-- Display area -->
  <table border="" align="center" width="70%">
  <tr>
  <th>ID</th>
  <th>Title</th>
  <th>Content</th>
  </tr>
  <c:forEach var="article" items="${pageBean.articleList}">
  <tr>
  <td>${article.id}</td>
  <td>${article.title}</td>
  <td>${article.content}</td>
  </tr>
  </c:forEach>
  </table>

  <!-- Pagination component -->
  <center>
  <div id="pp" style="background:#efefef;border:1px solid #ccc;width:600px"></div>
  </center>
  <script type="text/javascript">
  // Configure the EasyUI pager; selecting a page resubmits the form for that page
  $("#pp").pagination({
    total : ${pageBean.allObjectNum},
    pageSize : ${pageBean.perPageNum},
    showPageList : false,
    showRefresh : false,
    pageNumber : ${pageBean.currPageNum},
    onSelectPage : function(pageNumber){
      $("form").attr("action", "${pageContext.request.contextPath}/ArticleServlet?currPageNum=" + pageNumber);
      $("form").submit();
    }
  });
  </script>
  <script type="text/javascript">
  // "Submit" button: post the form; its action asks for page 1
  $(":button").click(function(){
    $("form").submit();
  });
  </script>
  </body>
  </html>

  Step 7 (alternative): create list2.jsp under WebRoot, a plain pager without EasyUI. To use it, forward to /list2.jsp instead of /list.jsp in ArticleServlet

  <%@ page language="java" pageEncoding="UTF-8"%>
  <%@ taglib uri="http://java.sun.com/jsp/jstl/core" prefix="c" %>
  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
  <html>
  <head>
  <title>Paged keyword search over all records</title>
  </head>
  <body>

  <!-- Input area -->
  <form action="${pageContext.request.contextPath}/ArticleServlet" method="POST">
  <!-- Page number read by ArticleServlet ("currPageNum"); a new search starts at page 1 -->
  <input id="currPageNOID" type="hidden" name="currPageNum" value="1">
  <table border="" align="center">
  <tr>
  <th>Keyword:</th>
  <th><input type="text" name="keywords" maxlength="" value="${requestScope.keywords}"/></th>
  <th><input type="submit" value="Search this site"/></th>
  </tr>
  </table>
  </form>

  <!-- Output area -->
  <table border="" align="center" width="60%">
  <tr>
  <th>ID</th>
  <th>Title</th>
  <th>Content</th>
  </tr>
  <c:forEach var="article" items="${requestScope.pageBean.articleList}">
  <tr>
  <td>${article.id}</td>
  <td>${article.title}</td>
  <td>${article.content}</td>
  </tr>
  </c:forEach>
  <!-- Pager -->
  <tr>
  <td colspan="3" align="center">
  <a onclick="fy(1)" style="text-decoration:none;cursor:pointer">
  [First]
  </a>
  <c:choose>
  <c:when test="${requestScope.pageBean.currPageNum-1>0}">
  <a onclick="fy(${requestScope.pageBean.currPageNum-1})" style="text-decoration:none;cursor:pointer">
  [Previous]
  </a>
  </c:when>
  <c:otherwise>
  Previous
  </c:otherwise>
  </c:choose>
  <c:choose>
  <c:when test="${requestScope.pageBean.currPageNum+1<=requestScope.pageBean.allPageNum}">
  <a onclick="fy(${requestScope.pageBean.currPageNum+1})" style="text-decoration:none;cursor:pointer">
  [Next]
  </a>
  </c:when>
  <c:otherwise>
  Next
  </c:otherwise>
  </c:choose>
  <a onclick="fy(${requestScope.pageBean.allPageNum})" style="text-decoration:none;cursor:pointer">
  [Last]
  </a>
  </td>
  </tr>
  </table>

  <script type="text/javascript">
  // Put the requested page into the hidden field and resubmit the search form
  function fy(currPageNum){
    document.getElementById("currPageNOID").value = currPageNum;
    document.forms[0].submit();
  }
  </script>

  </body>
  </html>

-------------------------------------------------------------------------------------------------------

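13 Optimizing the index

  1) What the index is

  The index is Lucene's main storage structure. It has two parts: the original-record table and the vocabulary

  Original-record table: stores the original records; Lucene assigns each stored document a unique number

  Vocabulary: stores the terms produced by the analyzer, together with the numbers of the records in which each term occurs

  2) Why optimize the index

    By default, each time documents are added and the IndexWriter is closed, a new segment is written. That segment is stored as a binary compound file with the extension .cfs. As more and more Documents are stored, the number of .cfs files keeps growing along with the size of the index. Having many files to open slows down searching.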

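  3) Optimization options

    1. Merge the .cfs files. The merged .cfs file is still a binary compound file; merging reduces both the size and the number of files:

      indexWriter.addDocument(document); indexWriter.optimize(); indexWriter.close();

    2. Set a merge factor so that .cfs files are merged automatically. By default (merge factor 10), ten .cfs files are merged into one. Set the factor before adding documents:

       indexWriter.setMergeFactor(3); indexWriter.addDocument(document); indexWriter.close();

    3. Use RAMDirectory, an in-memory index, which solves the speed problem of reading index files from disk. It trades space for time, but it is not persistent. At startup, load the on-disk index into the in-memory index; on exit, save the in-memory index back to disk, making sure the content is not duplicated.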

  Article article = new Article(1, "培训", "传智是一家Java培训机构");
  Document document = LuceneUtil.javabean2document(article);

  // The on-disk index, and an in-memory copy of it loaded at "startup"
  Directory fsDirectory = FSDirectory.open(new File("E:/indexDBDBDBDBDBDBDBDB"));
  Directory ramDirectory = new RAMDirectory(fsDirectory);

  // create = true: the disk index is rebuilt from the RAM copy, so its old content is not duplicated
  IndexWriter fsIndexWriter = new IndexWriter(fsDirectory, LuceneUtil.getAnalyzer(), true, LuceneUtil.getMaxFieldLength());
  IndexWriter ramIndexWriter = new IndexWriter(ramDirectory, LuceneUtil.getAnalyzer(), LuceneUtil.getMaxFieldLength());

  // Work against the fast in-memory index
  ramIndexWriter.addDocument(document);
  ramIndexWriter.close();

  // On "exit": write the in-memory index back to disk
  fsIndexWriter.addIndexesNoOptimize(ramDirectory);
  fsIndexWriter.close();
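The merge factor from option 2 can be pictured with a deliberately simplified model. The class is hypothetical; Lucene 3.0's actual LogMergePolicy groups segments into size levels and has further limits. Each flush writes one small segment. Whenever mergeFactor segments of the same size pile up, they are merged into one bigger segment. A smaller merge factor therefore keeps fewer files around, at the cost of merging more often.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of merge-factor merging (not Lucene's LogMergePolicy code). */
class MergeSimulation {
    /** Segment sizes (in documents) after 'flushes' one-document flushes. */
    static List<Integer> simulate(int flushes, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        List<Integer> segments = new ArrayList<>();
        for (int i = 0; i < flushes; i++) {
            segments.add(1);                    // each flush writes one new small segment (.cfs)
            mergeTail(segments, mergeFactor);
        }
        return segments;
    }

    /** While the newest mergeFactor segments all have the same size, merge them into one. */
    private static void mergeTail(List<Integer> segments, int mergeFactor) {
        while (segments.size() >= mergeFactor) {
            int n = segments.size();
            int size = segments.get(n - 1);
            for (int j = n - mergeFactor; j < n; j++) {
                if (segments.get(j) != size) {
                    return;                     // not enough same-sized segments yet
                }
            }
            for (int j = 0; j < mergeFactor; j++) {
                segments.remove(segments.size() - 1);
            }
            segments.add(size * mergeFactor);   // one merged segment replaces mergeFactor small ones
        }
    }
}
```

With mergeFactor = 3, seven single-document flushes leave three segments of sizes 3, 3 and 1. With the default of 10, nine flushes still leave nine separate segments.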

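14 Analyzers

  1) What an analyzer is

    An analyzer uses an algorithm to split Chinese or English text into terms, so that they can be matched when a user later searches with a keyword

  2) Why analyzers are needed

    What a user types is usually a single keyword taken from somewhere inside a piece of text, which differs from the full content of the original records. A search engine still has to find the relevant content, so an analyzer is used to match the original records as closely as possible

  3) How an analyzer works

  Split the text into terms -> remove stop words (and banned words) -> convert any English letters to lower case, so that search is case-insensitive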
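The three steps can be sketched in plain Java for English text. This is a toy analyzer with a tiny hypothetical stop-word list, not Lucene's StandardAnalyzer. One design choice differs from the order listed above: this sketch lower-cases each token before the stop-word check, so that "The" is caught as well. Lucene 3.0's StandardAnalyzer likewise runs its LowerCaseFilter before its StopFilter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy analyzer: split -> lower-case -> drop stop words. Not Lucene's StandardAnalyzer. */
class ToyAnalyzer {
    // A tiny, hypothetical stop-word list; real analyzers ship much larger ones
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("a", "an", "the", "is", "of"));

    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // Split on every run of characters that are neither letters nor digits
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue;
            }
            // Lower-case so that search is case-insensitive
            String term = token.toLowerCase(Locale.ROOT);
            // Drop stop words
            if (STOP_WORDS.contains(term)) {
                continue;
            }
            terms.add(term);
        }
        return terms;
    }
}
```

For instance, "The Lucene index is FAST!" is reduced to the terms lucene, index and fast.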

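  4) Demo: testing an analyzer

  import java.io.StringReader;
  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.tokenattributes.TermAttribute;

  public class AnalyzerTest {
    // Prints the analyzer class and every term it produces for the given text
    private static void testAnalyzer(Analyzer analyzer, String text) throws Exception {
      System.out.println("Analyzer in use: " + analyzer.getClass());
      TokenStream tokenStream = analyzer.tokenStream("content", new StringReader(text));
      tokenStream.addAttribute(TermAttribute.class);
      while (tokenStream.incrementToken()) {
        TermAttribute termAttribute = tokenStream.getAttribute(TermAttribute.class);
        System.out.println(termAttribute.term());
      }
    }
  }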

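  5) Using the third-party IKAnalyzer: the first choice for Chinese

    Requirement: filter out "说", "的" and "呀" from the example above, and treat "传智播客" as a single keyword

  Step 1: add the IKAnalyzer core jar, IKAnalyzer3.2.0Stable.jar

  Step 2: copy IKAnalyzer.cfg.xml, stopword.dic and xxx.dic into the src directory of the MyEclipse project, then configure them. When editing these files, leave the first line blank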
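Dictionary-based segmenters keep a word such as 传智播客 whole because it appears in their dictionary, and they drop entries listed as stop words. The sketch below is for illustration only: it uses the classic forward-maximum-matching technique, not IKAnalyzer's actual algorithm. The dictionary and stop-word sets passed in are hypothetical. At each position it takes the longest dictionary word, falls back to a single character, and skips stop words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Forward maximum matching with a stop-word filter (illustrative, not IKAnalyzer's code). */
class MaxMatchSegmenter {
    private final Set<String> dictionary;
    private final Set<String> stopWords;
    private final int maxWordLength;

    MaxMatchSegmenter(Set<String> dictionary, Set<String> stopWords) {
        this.dictionary = dictionary;
        this.stopWords = stopWords;
        int max = 1;
        for (String word : dictionary) {
            max = Math.max(max, word.length());
        }
        this.maxWordLength = max;
    }

    List<String> segment(String text) {
        List<String> words = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            // Try the longest candidate first; shrink until it is a dictionary word or a single character
            int len = Math.min(maxWordLength, text.length() - i);
            while (len > 1 && !dictionary.contains(text.substring(i, i + len))) {
                len--;
            }
            String word = text.substring(i, i + len);
            if (!stopWords.contains(word)) {
                words.add(word);
            }
            i += len;
        }
        return words;
    }
}
```

With 传智播客 in the dictionary and 说, 的 and 呀 as stop words, the four characters of 传智播客 come out as one term and the three stop words disappear. This is the behaviour the requirement above asks of IKAnalyzer.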

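15 Highlighting search results

  What highlighting is

In the search results, the characters that match the keyword are shown in red.

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.queryParser.QueryParser;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.ScoreDoc;
  import org.apache.lucene.search.TopDocs;
  import org.apache.lucene.search.highlight.*;
  import org.junit.Test;

  public class HighlightTest {
    @Test
    public void highlightIndexDB() throws Exception {
      String keywords = "培训";
      List<Article> articleList = new ArrayList<Article>();
      QueryParser queryParser = new QueryParser(LuceneUtil.getVersion(), "content", LuceneUtil.getAnalyzer());
      Query query = queryParser.parse(keywords);
      IndexSearcher indexSearcher = new IndexSearcher(LuceneUtil.getDirectory());
      TopDocs topDocs = indexSearcher.search(query, 1000000);

      // Wrap each matched term in <font color='red'>...</font>
      Formatter formatter = new SimpleHTMLFormatter("<font color='red'>", "</font>");
      Scorer scorer = new QueryScorer(query);
      Highlighter highlighter = new Highlighter(formatter, scorer);

      for (int i = 0; i < topDocs.scoreDocs.length; i++) {
        ScoreDoc scoreDoc = topDocs.scoreDocs[i];
        int no = scoreDoc.doc;
        Document document = indexSearcher.doc(no);
        // getBestFragment returns null if no part of the field matches the query
        String highlighterContent = highlighter.getBestFragment(LuceneUtil.getAnalyzer(), "content", document.get("content"));
        if (highlighterContent != null) {
          document.getField("content").setValue(highlighterContent);
        }
        Article article = (Article) LuceneUtil.document2javabean(document, Article.class);
        articleList.add(article);
      }
      indexSearcher.close();
      for (Article article : articleList) {
        System.out.println(article.getId() + ":" + article.getTitle() + ":" + article.getContent());
      }
    }
  }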
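The visible effect of SimpleHTMLFormatter can be reproduced in plain Java: every occurrence of the keyword is wrapped in the pre/post tags. This sketch uses a hypothetical helper that matches the raw keyword string case-insensitively. Lucene's Highlighter, in contrast, works on analyzed terms and scores fragments.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Wraps each occurrence of keyword in pre/post tags (raw string matching, no analyzer). */
class ToyHighlighter {
    static String highlight(String text, String keyword, String pre, String post) {
        if (keyword.isEmpty()) {
            return text;
        }
        // Pattern.quote treats the keyword literally; CASE_INSENSITIVE mirrors case-insensitive search
        Matcher m = Pattern.compile(Pattern.quote(keyword), Pattern.CASE_INSENSITIVE).matcher(text);
        StringBuilder sb = new StringBuilder();
        int last = 0;
        while (m.find()) {
            sb.append(text, last, m.start()).append(pre).append(m.group()).append(post);
            last = m.end();
        }
        return sb.append(text.substring(last)).toString();
    }
}
```

Calling highlight(content, "培训", "<font color='red'>", "</font>") produces the same kind of markup that the Lucene example writes back into the content field.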

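16 Search result summaries

  1) What a search result summary is

If a result's content is too long, we only want to show a short excerpt of it (here about 4 characters). A summary only works together with highlighting, because the Fragmenter is set on the Highlighter.

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.queryParser.QueryParser;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.ScoreDoc;
  import org.apache.lucene.search.TopDocs;
  import org.apache.lucene.search.highlight.*;
  import org.junit.Test;

  public class SummaryTest {
    @Test
    public void summaryIndexDB() throws Exception {
      String keywords = "培训";
      List<Article> articleList = new ArrayList<Article>();
      QueryParser queryParser = new QueryParser(LuceneUtil.getVersion(), "content", LuceneUtil.getAnalyzer());
      Query query = queryParser.parse(keywords);
      IndexSearcher indexSearcher = new IndexSearcher(LuceneUtil.getDirectory());
      TopDocs topDocs = indexSearcher.search(query, 1000000);

      Formatter formatter = new SimpleHTMLFormatter("<font color='red'>", "</font>");
      Scorer scorer = new QueryScorer(query);
      Highlighter highlighter = new Highlighter(formatter, scorer);

      // Split the text into fragments of about 4 characters; getBestFragment returns the best one
      Fragmenter fragmenter = new SimpleFragmenter(4);
      highlighter.setTextFragmenter(fragmenter);

      for (int i = 0; i < topDocs.scoreDocs.length; i++) {
        ScoreDoc scoreDoc = topDocs.scoreDocs[i];
        int no = scoreDoc.doc;
        Document document = indexSearcher.doc(no);
        // null when nothing in the field matches the query
        String highlighterContent = highlighter.getBestFragment(LuceneUtil.getAnalyzer(), "content", document.get("content"));
        if (highlighterContent != null) {
          document.getField("content").setValue(highlighterContent);
        }
        Article article = (Article) LuceneUtil.document2javabean(document, Article.class);
        articleList.add(article);
      }
      indexSearcher.close();
      for (Article article : articleList) {
        System.out.println(article.getId() + ":" + article.getTitle() + ":" + article.getContent());
      }
    }
  }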
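The idea behind a summary can be shown in plain Java: keep a window of about size characters around the keyword and discard the rest, then highlight the window. This is a hypothetical helper and a simplification. Lucene's SimpleFragmenter cuts the whole text into fixed-size fragments and the Highlighter returns the best-scoring one, whereas this sketch simply centres the window on the first match.

```java
/** Returns an excerpt of about size characters around the first match (illustrative only). */
class ToySummary {
    static String fragment(String text, String keyword, int size) {
        int hit = text.indexOf(keyword);
        if (hit < 0 || text.length() <= size) {
            // No match, or the text is already short: show its beginning
            return text.substring(0, Math.min(size, text.length()));
        }
        // Centre the window on the keyword, then shift it so it stays inside the text
        int begin = Math.max(0, hit + keyword.length() / 2 - size / 2);
        begin = Math.min(begin, text.length() - size);
        return text.substring(begin, begin + size);
    }
}
```

Applying the highlighting step to the returned excerpt gives the short, red-marked summary described above.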

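*****17 Sorting search results

  1) What sorting search results means

    Results are displayed in order of the value of one or more fields, from high to low or from low to high

  2) Many factors affect a website's position in search-engine rankings

    head/meta tags; clean, tidy page markup; page execution speed; using a div+css layout; ...

  3) In Lucene, the display order depends on the relevance score: ScoreDoc.score

    By default, Lucene sorts by relevance score: higher scores come first, lower scores later. If two scores are equal, the results keep the order in which they were inserted into the index

  4) Setting a document's relevance boost in Lucene

  IndexWriter indexWriter = new IndexWriter(LuceneUtil.getDirectory(), LuceneUtil.getAnalyzer(), LuceneUtil.getMaxFieldLength());
  // Index-time boost: a higher boost raises this document's relevance score (set it before addDocument)
  document.setBoost(20F);
  indexWriter.addDocument(document);
  indexWriter.close();

  5) Sorting by a single field in Lucene

  // Sort by "id" as an int; reverse = true means descending
  Sort sort = new Sort(new SortField("id", SortField.INT, true));
  // No filter (null); return up to 1000000 hits in the given order
  TopDocs topDocs = indexSearcher.search(query, null, 1000000, sort);

  6) Sorting by multiple fields in Lucene

  // First by "count" descending, then by "id" descending when counts are equal
  Sort sort = new Sort(new SortField("count", SortField.INT, true), new SortField("id", SortField.INT, true));
  TopDocs topDocs = indexSearcher.search(query, null, 1000000, sort);

  With multiple sort fields, the second field only takes effect when the first field's values are equal. Numeric sort fields are recommended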
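Multi-field sorting behaves like a chained comparator: the second key is consulted only when the first keys are equal. Below is a plain-Java analogue of new Sort(new SortField("count", SortField.INT, true), new SortField("id", SortField.INT, true)). It uses a hypothetical Hit type instead of Lucene documents.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class MultiSort {
    /** Hypothetical search hit with two numeric fields. */
    static final class Hit {
        final int id;
        final int count;
        Hit(int id, int count) { this.id = id; this.count = count; }
    }

    /** count descending, then id descending (reverse = true for both sort fields). */
    static final Comparator<Hit> ORDER =
            Comparator.comparingInt((Hit h) -> h.count).reversed()
                      .thenComparing(Comparator.comparingInt((Hit h) -> h.id).reversed());

    /** Returns the ids in sorted order, leaving the input list untouched. */
    static List<Integer> sortedIds(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(ORDER);
        List<Integer> ids = new ArrayList<>();
        for (Hit h : copy) {
            ids.add(h.id);
        }
        return ids;
    }
}
```

Suppose the hits are (id 1, count 5), (id 2, count 9) and (id 3, count 5). They come out as ids 2, 3, 1: count decides first, and id only breaks the tie between the two hits with count 5.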

*****18 Conditional search

  1) What conditional search is

    A search that matches the keyword against one or more specified fields

  2) Single-field search

    QueryParser queryParser = new QueryParser(LuceneUtil.getVersion(), "content", LuceneUtil.getAnalyzer());

  3) Multi-field search (recommended in projects)

    QueryParser queryParser = new MultiFieldQueryParser(LuceneUtil.getVersion(), new String[]{"content", "title"}, LuceneUtil.getAnalyzer());
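MultiFieldQueryParser expands a keyword into one clause per listed field, combined with OR by default. A document therefore matches if any of those fields matches. Here is a plain-Java analogue of that OR semantics over hypothetical map-based documents. It uses plain substring matching, with no analyzer and no scoring.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class MultiFieldMatch {
    /** Returns the docs in which at least one of the given fields contains the keyword (OR semantics). */
    static List<Map<String, String>> search(List<Map<String, String>> docs, String keyword, String... fields) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            for (String field : fields) {
                String value = doc.get(field);
                if (value != null && value.contains(keyword)) {
                    hits.add(doc);
                    break;   // one matching field is enough
                }
            }
        }
        return hits;
    }
}
```

Searching only "content" misses documents whose keyword appears in "title"; adding "title" to the field list finds them too. This is why the multi-field form is preferred.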

*****19 Converting Map<String,Object> to JSON text with a third-party library

   Add the third-party jars:

》commons-beanutils-1.7.0.jar

》commons-collections-3.1.jar

》commons-lang-2.5.jar

》commons-logging-1.1.1.jar

》ezmorph-1.0.3.jar

》json-lib-2.1-jdk15.jar

(1) JavaBean -> JSON (JSONArray wraps the single bean in [ ])

》JSONArray jsonArray = JSONArray.fromObject(city);

》String jsonJAVA = jsonArray.toString();

(2) List<JavaBean> -> JSON

》JSONArray jsonArray = JSONArray.fromObject(cityList);

》String jsonJAVA = jsonArray.toString();

(3) List<String> -> JSON

》JSONArray jsonArray = JSONArray.fromObject(stringList);

》String jsonJAVA = jsonArray.toString();

(4) Map<String,Object> -> JSON [key point]

  List<User> userList = new ArrayList<User>();
  userList.add(new User(100, "哈哈", 1000));
  userList.add(new User(200, "呵呵", 2000));
  userList.add(new User(300, "嘻嘻", 3000));

  // LinkedHashMap keeps "total" before "rows"
  Map<String, Object> map = new LinkedHashMap<String, Object>();
  map.put("total", userList.size());
  map.put("rows", userList);

  // JSONArray wraps the map in [ ... ]
  JSONArray jsonArray = JSONArray.fromObject(map);
  String jsonJAVA = jsonArray.toString();
  System.out.println(jsonJAVA);

  // Strip the outer [ and ] to get the {"total":...,"rows":[...]} object that DataGrid expects
  // (JSONObject.fromObject(map).toString() produces that object directly)
  jsonJAVA = jsonJAVA.substring(1, jsonJAVA.length() - 1);
  System.out.println(jsonJAVA);
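EasyUI's DataGrid expects an object of the form {"total":N,"rows":[...]}. For readers without json-lib, here is a dependency-free sketch that builds that text by hand for flat rows. The helper names are hypothetical, and escaping covers only backslashes and double quotes. Unlike json-lib, which keeps numbers numeric, it writes every value as a JSON string.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DataGridJson {
    /** Builds {"total":total,"rows":[{...},...]} from flat rows; every value is written as a JSON string. */
    static String toJson(int total, List<Map<String, String>> rows) {
        StringBuilder sb = new StringBuilder("{\"total\":").append(total).append(",\"rows\":[");
        for (int i = 0; i < rows.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('{');
            int j = 0;
            for (Map.Entry<String, String> e : rows.get(i).entrySet()) {
                if (j++ > 0) {
                    sb.append(',');
                }
                sb.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
            }
            sb.append('}');
        }
        return sb.append("]}").toString();
    }

    /** Minimal escaping of backslash and double quote (control characters are not handled). */
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Convenience: row("id", "100", "name", "a") -> ordered map {id=100, name=a}. */
    static Map<String, String> row(String... keyValues) {
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            m.put(keyValues[i], keyValues[i + 1]);
        }
        return m;
    }
}
```

No outer [ ] ever appears, so no substring step is needed.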

*****20 Creating a DataGrid dynamically from JSON text

  <table id="dg"></table>
  <script type="text/javascript">
  $('#dg').datagrid({
    url : 'data/datagrid_data.json',
    columns : [[
      {field:'code', title:'ID', width:100},
      {field:'name', title:'Name', width:100},
      {field:'price', title:'Salary', width:100}
    ]]
  });
  </script>

*****21 Creating a DataGrid dynamically from JSON returned by a Servlet

  <table id="dg"></table>
  <script type="text/javascript">
  $('#dg').datagrid({
    url : '/lucene-day02/JsonServlet',
    columns : [[
      {field:'code', title:'ID', width:100},
      {field:'name', title:'Name', width:100},
      {field:'price', title:'Salary', width:100}
    ]]
  });
  </script>

  Servlet:

  public void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
    request.setCharacterEncoding("UTF-8");

    Integer currPageNO = null;
    try {
      // DataGrid sends a "page" parameter: the requested page number
      currPageNO = Integer.parseInt(request.getParameter("page"));
    } catch (Exception e) {
      // Missing or invalid "page": fall back to the first page
      currPageNO = 1;
    }
    // DataGrid also sends a "rows" parameter: the number of records per page
    //Integer rows = Integer.parseInt(request.getParameter("rows"));
    //System.out.println(currPageNO + ":" + rows);

    UserService userService = new UserService();
    PageBean pageBean = userService.fy(currPageNO);

    Map<String, Object> map = new LinkedHashMap<String, Object>();
    map.put("total", pageBean.getAllRecordNO());
    map.put("rows", pageBean.getUserList());

    // Strip the outer [ ] so DataGrid receives {"total":...,"rows":[...]}
    JSONArray jsonArray = JSONArray.fromObject(map);
    String jsonJAVA = jsonArray.toString();
    jsonJAVA = jsonJAVA.substring(1, jsonJAVA.length() - 1);

    System.out.println(jsonJAVA);
    response.setContentType("text/html;charset=UTF-8");
    response.getWriter().write(jsonJAVA);
    response.getWriter().flush();
    response.getWriter().close();
  }
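With pagination on, DataGrid sends page (1-based) and rows (page size) with every request. The servlet above falls back to page 1 by catching any exception. A narrower alternative is a small helper, hypothetical like the one below, that falls back only on a missing, non-numeric or non-positive value. In the servlet it would be called as PagingParams.positiveInt(request.getParameter("page"), 1).

```java
/** Parses request parameters such as DataGrid's "page" and "rows" defensively (hypothetical helper). */
class PagingParams {
    /** Returns raw as a positive int, or defaultValue if raw is null, not a number, or less than 1. */
    static int positiveInt(String raw, int defaultValue) {
        if (raw == null) {
            return defaultValue;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value >= 1 ? value : defaultValue;
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }
}
```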

*****22 Pagination, part 2: asynchronous paging with JSP + jQuery + EasyUI + Servlet + Lucene

  Step 1: create ArticleDao.java

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.queryParser.QueryParser;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.ScoreDoc;
  import org.apache.lucene.search.TopDocs;

  public class ArticleDao {
    // Total number of hits; totalHits counts every match even though only 3 top hits are requested
    public Integer getAllObjectNum(String keywords) throws Exception {
      QueryParser queryParser = new QueryParser(LuceneUtil.getVersion(), "content", LuceneUtil.getAnalyzer());
      Query query = queryParser.parse(keywords);
      IndexSearcher indexSearcher = new IndexSearcher(LuceneUtil.getDirectory());
      TopDocs topDocs = indexSearcher.search(query, 3);
      indexSearcher.close();
      return topDocs.totalHits;
    }

    // Hits [start, start + size) for the keyword, converted back to Articles
    public List<Article> findAllObjectWithFY(String keywords, Integer start, Integer size) throws Exception {
      List<Article> articleList = new ArrayList<Article>();
      QueryParser queryParser = new QueryParser(LuceneUtil.getVersion(), "content", LuceneUtil.getAnalyzer());
      Query query = queryParser.parse(keywords);
      IndexSearcher indexSearcher = new IndexSearcher(LuceneUtil.getDirectory());
      // Request only as many top hits as the page needs, instead of a huge fixed number
      TopDocs topDocs = indexSearcher.search(query, start + size);
      int middle = Math.min(start + size, topDocs.scoreDocs.length);
      for (int i = start; i < middle; i++) {
        ScoreDoc scoreDoc = topDocs.scoreDocs[i];
        int no = scoreDoc.doc;
        Document document = indexSearcher.doc(no);
        Article article = (Article) LuceneUtil.document2javabean(document, Article.class);
        articleList.add(article);
      }
      indexSearcher.close();
      return articleList;
    }
  }

  Step 2: create PageBean.java

  import java.util.ArrayList;
  import java.util.List;

  public class PageBean {
    private Integer allObjectNum;     // total number of hits
    private Integer allPageNum;       // total number of pages
    private Integer currPageNum;      // current page number, starting at 1
    private Integer perPageNum = 2;   // hits per page; must match pageSize in the JSP
    private List<Article> articleList = new ArrayList<Article>();

    public PageBean() {}

    public Integer getAllObjectNum() {
      return allObjectNum;
    }

    // Setting the total also derives the number of pages (rounded up)
    public void setAllObjectNum(Integer allObjectNum) {
      this.allObjectNum = allObjectNum;
      if (this.allObjectNum % this.perPageNum == 0) {
        this.allPageNum = this.allObjectNum / this.perPageNum;
      } else {
        this.allPageNum = this.allObjectNum / this.perPageNum + 1;
      }
    }

    public Integer getAllPageNum() {
      return allPageNum;
    }

    public void setAllPageNum(Integer allPageNum) {
      this.allPageNum = allPageNum;
    }

    public Integer getCurrPageNum() {
      return currPageNum;
    }

    public void setCurrPageNum(Integer currPageNum) {
      this.currPageNum = currPageNum;
    }

    public Integer getPerPageNum() {
      return perPageNum;
    }

    public void setPerPageNum(Integer perPageNum) {
      this.perPageNum = perPageNum;
    }

    public List<Article> getArticleList() {
      return articleList;
    }

    public void setArticleList(List<Article> articleList) {
      this.articleList = articleList;
    }
  }

  Step 3: create ArticleService.java

  public class ArticleService {
    private ArticleDao articleDao = new ArticleDao();

    public PageBean fy(String keywords, Integer currPageNum) throws Exception {
      PageBean pageBean = new PageBean();
      pageBean.setCurrPageNum(currPageNum);
      Integer allObjectNum = articleDao.getAllObjectNum(keywords);
      pageBean.setAllObjectNum(allObjectNum);
      Integer size = pageBean.getPerPageNum();
      Integer start = (pageBean.getCurrPageNum() - 1) * size;
      List<Article> articleList = articleDao.findAllObjectWithFY(keywords, start, size);
      pageBean.setArticleList(articleList);
      return pageBean;
    }
  }

步四:创建ArticleServlet.java类

  1. public class UserServlet extends HttpServlet {
  2. public void doPost(HttpServletRequest request, HttpServletResponse response)throws ServletException, IOException {
  3. try {
  4. //获取当前页号,默认1
  5. String strCurrPageNO = request.getParameter("page");
  6. if(strCurrPageNO == null){
  7. strCurrPageNO = "1";
  8. }
  9. Integer currPageNO = Integer.parseInt(strCurrPageNO);
  10. //获取关健字
  11. String keywords = request.getParameter("keywords");
  12. //创建业务对象
  13. UserService userService = new UserService();
  14. //调用业务层
  15. PageBean pageBean = userService.fy(keywords,currPageNO);
  16. //以下代码生成DateGrid需要的JSON文本
  17. Map<String,Object> map = new LinkedHashMap<String,Object>();
  18. //总记录数
  19. map.put("total",pageBean.getAllRecordNO());
  20. //该页显示的内容
  21. map.put("rows",pageBean.getUserList());
  22. JSONArray jsonArray = JSONArray.fromObject(map);
  23. String jsonJAVA = jsonArray.toString();
  24. jsonJAVA = jsonJAVA.substring(1,jsonJAVA.length()-1);
  25. //以下代码是将json文本输出到浏览器给DateGrid组件
  26. response.setContentType("text/html;charset=UTF-8");
  27. response.getWriter().write(jsonJAVA);
  28. response.getWriter().flush();
  29. response.getWriter().close();
  30. } catch (Exception e) {
  31. }
  32. }
  33. }

步五:导入EasyUI相关的js包的目录

  步六:在WebRoot目录下创建list.jsp

  <%@ page language="java" pageEncoding="UTF-8"%>
  <%@ taglib uri="http://java.sun.com/jsp/jstl/core" prefix="c" %>
  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
  <html>
  <head>
  <link rel="stylesheet" href="themes/default/easyui.css" type="text/css"></link>
  <link rel="stylesheet" href="themes/icon.css" type="text/css"></link>
  <script type="text/javascript" src="js/jquery.min.js"></script>
  <script type="text/javascript" src="js/jquery.easyui.min.js"></script>
  <script type="text/javascript" src="locale/easyui-lang-zh_CN.js"></script>
  </head>

  <body>

  Enter a keyword:
  <input type="text" size="20" id="name"/>
  <input type="button" value="Search" id="find"/>

  <table id="dg" style="width:500px"></table>

  <script type="text/javascript">
  // Bind a click handler to the "Search" button
  $("#find").click(function(){
      // Read the keyword
      var name = $("#name").val();
      // Trim whitespace from both ends
      name = $.trim(name);
      // Reload the grid from page 1, sending the keyword as an extra request parameter
      $("#dg").datagrid("load",{
          "keywords" : name
      });
  });
  </script>

  <script type="text/javascript">
  // Create the DataGrid dynamically
  $("#dg").datagrid({
      // Timestamp appended as a cache-buster
      url:'${pageContext.request.contextPath}/ArticleServlet?id=' + new Date().getTime(),
      fitColumns : true,
      singleSelect : true,
      // Field names must match the Article properties serialized by the servlet
      // (assumes Article exposes id, title and content)
      columns:[[
          {field:'id',title:'ID',width:100,align:'center'},
          {field:'title',title:'Title',width:100,align:'center'},
          {field:'content',title:'Content',width:100,align:'center'}
      ]],
      pagination : true,
      pageNumber : 1,
      pageSize : 2,
      pageList:[2]
  });
  </script>

  </body>

  </html>
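With `pagination : true`, the EasyUI DataGrid posts a `page` parameter (current page) and a `rows` parameter (page size) with every request. The servlet above reads only `page`, and `Integer.parseInt` throws on a non-numeric value. The sketch below shows one defensive way to read such parameters; the class `PagingParams` and method `parsePositive` are hypothetical, and the strings would come from `request.getParameter("page")` and `request.getParameter("rows")` in the servlet.

```java
public class PagingParams {
    // Parses a request parameter as a positive int, falling back to
    // defaultValue when it is missing, blank, non-numeric, or less than 1
    public static int parsePositive(String raw, int defaultValue) {
        if (raw == null || raw.trim().isEmpty()) {
            return defaultValue;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value >= 1 ? value : defaultValue;
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }

    public static void main(String[] args) {
        System.out.println(parsePositive(null, 1));  // prints 1
        System.out.println(parsePositive("3", 1));   // prints 3
        System.out.println(parsePositive("abc", 1)); // prints 1
        System.out.println(parsePositive("0", 2));   // prints 2
    }
}
```

Falling back to a default rather than throwing keeps the grid usable when the first request arrives without a `page` value.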

-------------------------------------------------------------------------------------------------------

QueryParser queryParser = new QueryParser(LuceneUtil.getVersion(), "content", LuceneUtil.getAnalyzer());
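`ArticleDao.findAllObjectWithFY` is not shown in this part of the post. One common way to page with the Lucene 3.0 API is to request the top `start + size` hits with `IndexSearcher.search(query, n)`, keep only `topDocs.scoreDocs` from index `start` to the end of the returned array, and load each hit with `searcher.doc(scoreDoc.doc)`. The pure-Java sketch below models only that slicing step, with an `int[]` of document numbers standing in for `scoreDocs`. The class `HitWindow` is hypothetical, and this is an assumption about how the DAO might work, not its actual code.

```java
import java.util.ArrayList;
import java.util.List;

public class HitWindow {
    // Returns hits[start .. min(start + size, hits.length)), i.e. one page.
    // In Lucene 3.0 the array would be topDocs.scoreDocs from
    // searcher.search(query, start + size), whose length is at most start + size.
    public static List<Integer> page(int[] hits, int start, int size) {
        List<Integer> out = new ArrayList<Integer>();
        int end = Math.min(start + size, hits.length);
        for (int i = start; i < end; i++) {
            out.add(hits[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Pretend these are the document numbers of five hits, best first
        int[] hits = {10, 11, 12, 13, 14};
        int size = 2;
        for (int page = 1; page <= 3; page++) {
            int start = (page - 1) * size;
            System.out.println("page " + page + ": " + page(hits, start, size));
        }
    }
}
```

Running `main` prints `[10, 11]`, `[12, 13]` and `[14]` for pages 1 to 3. Asking Lucene for `start + size` hits means later pages re-collect the earlier ones, which is acceptable for the small result sets of a site-internal search.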
