A general-purpose search utility class in C# built on Lucene.Net: SearchEngineUtil

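　　Lately I have been busy with various company projects (most of them microservice projects based on Spring Cloud), so it has been a while since I last shared a summary of my recent technical research. In fact I have been studying the internals of the Spring, Spring Boot, and Spring Cloud frameworks in depth, while also keeping an eye on the progress of .NET Core and its latest features. I subscribe to related columns on Geek Time (极客时间) and read or watch them carefully whenever I have time after work, and I have bought some printed books as well. In short, for the past year I have been absorbing the best of what others have written through WeChat technical accounts (.NET, Java, algorithms, front end, and so on), Geek Time, and technical books, steadily improving my own skills. As the saying goes, learning is like rowing upstream: if you do not advance, you fall back. I learn on the job and then apply what I have learned back on the job. Writing articles to share is a form of summary, and it is also the best application of "reviewing the old to learn the new".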

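　　Enough preamble; on to the topic of this article: writing a general-purpose search utility class based on Lucene.Net, SearchEngineUtil. For what Lucene is, see its Baidu Baike entry. The key point is that Lucene is a full-text search engine toolkit that provides a complete query engine and indexing engine, and Lucene.NET is a port of it to C# and the .NET runtime. Official site: http://lucenenet.apache.org/ . I will not go into detailed usage, since the official site and the web already have plenty of material. However, the API of the native Lucene.Net SDK is fairly complex and not very convenient to use, so I wrapped the common operations (add, delete, update, query, and paged query) while preserving flexibility, which makes working with Lucene.Net somewhat simpler. The code itself is not complicated; the complete SearchEngineUtil code is as follows: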

using Lucene.Net.Analysis.PanGu;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;
using NLog;
using PanGu;
using PanGu.HighLight;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Text;

namespace CN.Zuowenjun.Blog.Common
{
    /// <summary>
    /// Utility class for the Lucene search engine
    /// Author: zuowenjun
    /// </summary>
    public class SearchEngineUtil
    {
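        /// <summary>
        /// Creates the index if necessary and adds one record to it
        /// </summary>
        /// <typeparam name="TIndex">Type of the object being indexed</typeparam>
        /// <param name="indexDir">Directory that holds the index files</param>
        /// <param name="indexData">Object to index</param>
        /// <param name="setDocFiledsAction">Callback that copies the object's values into the document's fields</param>
        public static void AddIndex<TIndex>(string indexDir, TIndex indexData, Action<Document, TIndex> setDocFiledsAction)
        {
            // Create the index directory if it does not exist yet
            if (!System.IO.Directory.Exists(indexDir))
            {
                System.IO.Directory.CreateDirectory(indexDir);
            }
            FSDirectory directory = FSDirectory.Open(new DirectoryInfo(indexDir), new NativeFSLockFactory());
            bool isUpdate = IndexReader.IndexExists(directory);
            if (isUpdate)
            {
                // If the index directory is locked (for example, the program crashed while indexing), unlock it first.
                // This assumes no other writer is currently working on the same directory.
                if (IndexWriter.IsLocked(directory))
                {
                    IndexWriter.Unlock(directory);
                }
            }
            // Create a new index when none exists yet (!isUpdate), otherwise append to the existing one
            using (IndexWriter writer = new IndexWriter(directory, new PanGuAnalyzer(), !isUpdate, IndexWriter.MaxFieldLength.UNLIMITED))
            {
                Document document = new Document();
                setDocFiledsAction(document, indexData);
                writer.AddDocument(document);
                writer.Optimize(); // Optimize the index (merges segments, so it is costly when called after every add)
            }
        }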

        /// <summary>
        /// Deletes records from the index
        /// </summary>
        /// <param name="indexDir">Directory that holds the index files</param>
        /// <param name="keyFiledName">Name of the key field</param>
        /// <param name="keyFileValue">Value of the key field identifying the records to delete</param>
        public static void DeleteIndex(string indexDir, string keyFiledName, object keyFileValue)
        {
            FSDirectory directory = FSDirectory.Open(new DirectoryInfo(indexDir), new NativeFSLockFactory());
            if (!IndexReader.IndexExists(directory))
            {
                return;
            }
            using (IndexWriter iw = new IndexWriter(directory, new PanGuAnalyzer(), IndexWriter.MaxFieldLength.UNLIMITED))
            {
                iw.DeleteDocuments(new Term(keyFiledName, keyFileValue.ToString()));
                // Deleted documents are not removed from disk right away; a .del file is written instead.
                // Optimize() is what actually purges them. Until then, the UndeleteAll method can restore them.
                iw.Optimize();
            }
        }

        /// <summary>
        /// Updates a record in the index
        /// </summary>
        /// <param name="indexDir">Directory that holds the index files</param>
        /// <param name="keyFiledName">Name of the key field</param>
        /// <param name="keyFileValue">Value of the key field identifying the record to replace</param>
        /// <param name="doc">New document that replaces the matching record</param>
        public static void UpdateIndex(string indexDir, string keyFiledName, object keyFileValue, Document doc)
        {
            FSDirectory directory = FSDirectory.Open(new DirectoryInfo(indexDir), new NativeFSLockFactory());
            if (!IndexReader.IndexExists(directory))
            {
                return;
            }
            using (IndexWriter iw = new IndexWriter(directory, new PanGuAnalyzer(), IndexWriter.MaxFieldLength.UNLIMITED))
            {
                // UpdateDocument deletes the documents matching the term, then adds the new document
                iw.UpdateDocument(new Term(keyFiledName, keyFileValue.ToString()), doc);
                iw.Optimize();
            }
        }

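        /// <summary>
        /// Checks whether the specified document exists in the index
        /// </summary>
        /// <param name="indexDir">Directory that holds the index files</param>
        /// <param name="keyFiledName">Name of the key field</param>
        /// <param name="keyFileValue">Value of the key field to look for</param>
        /// <returns>true if at least one document contains the key term</returns>
        public static bool ExistsDocument(string indexDir, string keyFiledName, object keyFileValue)
        {
            FSDirectory directory = FSDirectory.Open(new DirectoryInfo(indexDir), new NativeFSLockFactory());
            if (!IndexReader.IndexExists(directory))
            {
                return false;
            }
            // Open a read-only reader and dispose it afterwards so the file handles are released
            using (var reader = IndexReader.Open(directory, true))
            {
                return reader.DocFreq(new Term(keyFiledName, keyFileValue.ToString())) > 0;
            }
        }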

        /// <summary>
        /// Queries the index and returns the matching records
        /// </summary>
        /// <typeparam name="TResult">Type of the result objects</typeparam>
        /// <param name="indexDir">Directory that holds the index files</param>
        /// <param name="buildQueryAction">Callback that fills the BooleanQuery and returns the keywords to highlight, keyed by property name</param>
        /// <param name="getSortFieldsFunc">Callback that returns the sort fields, or null for the default order</param>
        /// <param name="buildResultFunc">Callback that converts a matching Document into a result object</param>
        /// <param name="needHighlight">Whether to highlight the keywords in the results</param>
        /// <param name="topCount">Maximum number of records to return; 0 means all</param>
        /// <returns>The matching records, or an empty list if the index does not exist</returns>
        public static List<TResult> SearchIndex<TResult>(string indexDir, Func<BooleanQuery, IDictionary<string, string>> buildQueryAction,
            Func<IEnumerable<SortField>> getSortFieldsFunc, Func<Document, TResult> buildResultFunc, bool needHighlight = true, int topCount = 0)
        {
            FSDirectory directory = FSDirectory.Open(new DirectoryInfo(indexDir), new NoLockFactory());
            if (!IndexReader.IndexExists(directory))
            {
                return new List<TResult>();
            }
            // Dispose the reader and searcher when done so the file handles are released
            using (IndexReader reader = IndexReader.Open(directory, true))
            using (IndexSearcher searcher = new IndexSearcher(reader))
            {
                BooleanQuery bQuery = new BooleanQuery();
                var keyWords = buildQueryAction(bQuery);
                Sort sort = null;
                var sortFields = getSortFieldsFunc();
                if (sortFields != null)
                {
                    sort = new Sort();
                    sort.SetSort(sortFields.ToArray());
                }
                topCount = topCount > 0 ? topCount : int.MaxValue; // When no TOP value is given, use the maximum to fetch everything
                TopDocs resultDocs = null;
                if (sort != null)
                {
                    resultDocs = searcher.Search(bQuery, null, topCount, sort);
                }
                else
                {
                    resultDocs = searcher.Search(bQuery, null, topCount);
                }
                Dictionary<string, PropertyInfo> highlightProps = null;
                List<TResult> results = new List<TResult>();
                if (resultDocs != null)
                {
                    // Never read past the hits that were actually returned
                    int count = Math.Min(topCount, resultDocs.ScoreDocs.Length);
                    for (int i = 0; i < count; i++)
                    {
                        Document doc = searcher.Doc(resultDocs.ScoreDocs[i].Doc);
                        var model = buildResultFunc(doc);
                        if (needHighlight)
                        {
                            model = SetHighlighter(keyWords, model, ref highlightProps);
                        }
                        results.Add(model);
                    }
                }
                return results;
            }
        }

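        /// <summary>
        /// Queries the index and returns one page of matching records
        /// </summary>
        /// <typeparam name="TResult">Type of the result objects</typeparam>
        /// <param name="indexDir">Directory that holds the index files</param>
        /// <param name="buildQueryAction">Callback that fills the BooleanQuery and returns the keywords to highlight, keyed by property name</param>
        /// <param name="getSortFieldsFunc">Callback that returns the sort fields, or null for the default order</param>
        /// <param name="buildResultFunc">Callback that converts a matching Document into a result object</param>
        /// <param name="pageSize">Number of records per page</param>
        /// <param name="page">Page number, starting at 1</param>
        /// <param name="totalCount">Total number of matching records</param>
        /// <param name="needHighlight">Whether to highlight the keywords in the results</param>
        /// <returns>The records of the requested page, or an empty list if nothing matches</returns>
        public static List<TResult> SearchIndexByPage<TResult>(string indexDir, Func<BooleanQuery, IDictionary<string, string>> buildQueryAction,
            Func<IEnumerable<SortField>> getSortFieldsFunc, Func<Document, TResult> buildResultFunc, int pageSize, int page, out int totalCount, bool needHighlight = true)
        {
            FSDirectory directory = FSDirectory.Open(new DirectoryInfo(indexDir), new NoLockFactory());
            if (!IndexReader.IndexExists(directory))
            {
                totalCount = 0;
                return new List<TResult>();
            }
            // Dispose the reader and searcher when done so the file handles are released
            using (IndexReader reader = IndexReader.Open(directory, true))
            using (IndexSearcher searcher = new IndexSearcher(reader))
            {
                BooleanQuery bQuery = new BooleanQuery();
                var keyWords = buildQueryAction(bQuery);
                Sort sort = null;
                var sortFields = getSortFieldsFunc();
                if (sortFields != null)
                {
                    sort = new Sort();
                    sort.SetSort(sortFields.ToArray());
                }
                // First pass: a collector of size 1 is enough to obtain the total hit count
                TopScoreDocCollector docCollector = TopScoreDocCollector.Create(1, true);
                searcher.Search(bQuery, docCollector);
                totalCount = docCollector.TotalHits;
                List<TResult> results = new List<TResult>();
                if (totalCount <= 0)
                {
                    return results; // Empty list rather than null, consistent with the missing-index case above
                }
                // Second pass: fetch every hit up to the end of the requested page
                TopDocs resultDocs = null;
                if (sort != null)
                {
                    resultDocs = searcher.Search(bQuery, null, pageSize * page, sort);
                }
                else
                {
                    resultDocs = searcher.Search(bQuery, null, pageSize * page);
                }
                Dictionary<string, PropertyInfo> highlightProps = null;
                if (resultDocs != null)
                {
                    int indexStart = (page - 1) * pageSize;
                    // Never read past the hits that were actually returned
                    int indexEnd = Math.Min(indexStart + pageSize, resultDocs.ScoreDocs.Length);
                    for (int i = indexStart; i < indexEnd; i++)
                    {
                        Document doc = searcher.Doc(resultDocs.ScoreDocs[i].Doc);
                        var model = buildResultFunc(doc);
                        if (needHighlight)
                        {
                            model = SetHighlighter(keyWords, model, ref highlightProps);
                        }
                        results.Add(model);
                    }
                }
                return results;
            }
        }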

        /// <summary>
        /// Highlights the keywords in a result object
        /// </summary>
        /// <typeparam name="T">Type of the result object</typeparam>
        /// <param name="dicKeywords">Keywords to highlight, keyed by property name</param>
        /// <param name="model">Result object whose string properties are highlighted</param>
        /// <param name="props">Cache of the reflected properties, reused across calls</param>
        /// <returns>The result object with highlighted property values</returns>
        private static T SetHighlighter<T>(IDictionary<string, string> dicKeywords, T model, ref Dictionary<string, PropertyInfo> props)
        {
            if (dicKeywords == null)
            {
                return model;
            }
            SimpleHTMLFormatter simpleHTMLFormatter = new SimpleHTMLFormatter("<font color=\"red\">", "</font>");
            Highlighter highlighter = new Highlighter(simpleHTMLFormatter, new Segment());
            highlighter.FragmentSize = 250;
            Type modelType = typeof(T);
            foreach (var item in dicKeywords)
            {
                if (!string.IsNullOrWhiteSpace(item.Value))
                {
                    if (props == null)
                    {
                        props = new Dictionary<string, PropertyInfo>();
                    }
                    if (!props.ContainsKey(item.Key))
                    {
                        props[item.Key] = modelType.GetProperty(item.Key, BindingFlags.IgnoreCase | BindingFlags.Public | BindingFlags.Instance);
                    }
                    var modelProp = props[item.Key];
                    // Skip keys that do not map to a public string property on the model
                    if (modelProp != null && modelProp.PropertyType == typeof(string))
                    {
                        var currentValue = modelProp.GetValue(model) as string;
                        if (!string.IsNullOrEmpty(currentValue))
                        {
                            string newValue = highlighter.GetBestFragment(item.Value, currentValue);
                            if (!string.IsNullOrEmpty(newValue))
                            {
                                modelProp.SetValue(model, newValue);
                            }
                        }
                    }
                }
            }
            return model;
        }

        /// <summary>
        /// Splits the keyword into words
        /// </summary>
        /// <param name="keyword">Raw keyword text entered by the user</param>
        /// <returns>The segmented words separated by spaces, each followed by a boost (word^boost)</returns>
        public static string GetKeyWordsSplitBySpace(string keyword)
        {
            PanGuTokenizer ktTokenizer = new PanGuTokenizer();
            StringBuilder result = new StringBuilder();
            ICollection<WordInfo> words = ktTokenizer.SegmentToWordInfos(keyword);
            foreach (WordInfo word in words)
            {
                if (word == null)
                {
                    continue;
                }
                // The boost is 3 raised to the word's rank, so higher-ranked words weigh more
                result.AppendFormat("{0}^{1}.0 ", word.Word, (int)Math.Pow(3, word.Rank));
            }
            return result.ToString().Trim();
        }

        /// <summary>
        /// [Helper] Creates a PanGu query object
        /// </summary>
        /// <param name="field">Name of the field to search</param>
        /// <param name="keyword">Keyword text</param>
        /// <param name="needSplit">Whether to segment the keyword first; pass false if it is already segmented</param>
        /// <returns>The parsed query</returns>
        public static Query CreatePanGuQuery(string field, string keyword, bool needSplit = true)
        {
            if (needSplit)
            {
                keyword = GetKeyWordsSplitBySpace(keyword);
            }
            QueryParser parse = new QueryParser(Lucene.Net.Util.Version.LUCENE_30, field, new PanGuAnalyzer());
            parse.DefaultOperator = QueryParser.Operator.OR;
            Query query = parse.Parse(keyword);
            return query;
        }

        /// <summary>
        /// [Helper] Creates a PanGu multi-field query object
        /// </summary>
        /// <param name="keyword">Keyword text</param>
        /// <param name="needSplit">Whether to segment the keyword first; pass false if it is already segmented</param>
        /// <param name="fields">Names of the fields to search</param>
        /// <returns>The parsed query</returns>
        public static Query CreatePanGuMultiFieldQuery(string keyword, bool needSplit, params string[] fields)
        {
            if (needSplit)
            {
                keyword = GetKeyWordsSplitBySpace(keyword);
            }
            QueryParser parse = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_30, fields, new PanGuAnalyzer());
            parse.DefaultOperator = QueryParser.Operator.OR;
            Query query = parse.Parse(keyword);
            return query;
        }
    }
}

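　Besides the Lucene.Net NuGet package, the class also references the PanGu word segmenter and its related components, because in most cases our content contains Chinese. I will not walk through the code above in detail, since the comments are fairly clear. Some practical usage is shown below.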
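One detail of the class that is easy to miss is the string that GetKeyWordsSplitBySpace hands to the query parser: each segmented word is followed by `^` and a boost equal to 3 raised to the word's rank. The following is a minimal sketch of just that formatting step, without any PanGu dependency. The `BoostSketch` class, the `BuildBoostedQuery` method, and the sample words and ranks are hypothetical stand-ins for what the segmenter would return.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class BoostSketch
{
    // Each pair is a hypothetical stand-in for PanGu's WordInfo: Key = word, Value = rank.
    public static string BuildBoostedQuery(IEnumerable<KeyValuePair<string, int>> words)
    {
        var result = new StringBuilder();
        foreach (var word in words)
        {
            // Same format as in the class above: word^boost, where boost = 3^rank
            result.AppendFormat("{0}^{1}.0 ", word.Key, (int)Math.Pow(3, word.Value));
        }
        return result.ToString().Trim();
    }
}
```

Under these assumptions, passing ("lucene", rank 2) and ("search", rank 1) yields `lucene^9.0 search^3.0`, which the query parser reads as two terms with boosts 9 and 3.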

Creating the index:

SearchEngineUtil.AddIndex(GetSearchIndexDir(), post, (doc, data) => BuildPostSearchDocument(data, doc));

        private Document BuildPostSearchDocument(Post post, Document doc = null)
        {
            if (doc == null)
            {
                doc = new Document(); // Create the Document
            }
            // Id is stored and indexed as one exact term, so it can serve as the key for delete and update
            doc.Add(new Field("Id", post.Id.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
            // Title and Summary are stored and analyzed, which makes them full-text searchable
            doc.Add(new Field("Title", post.Title, Field.Store.YES, Field.Index.ANALYZED));
            doc.Add(new Field("Summary", post.Summary, Field.Store.YES, Field.Index.ANALYZED));
            // CreateTime and Author are stored for display only and are not searchable
            doc.Add(new Field("CreateTime", post.CreateTime.ToString("yyyy/MM/dd HH:mm"), Field.Store.YES, Field.Index.NO));
            doc.Add(new Field("Author", post.IsOriginal ? (post.Creator ?? userQueryService.FindByName(post.CreateBy)).NickName : post.SourceBy, Field.Store.YES, Field.Index.NO));
            return doc;
        }

Deleting from the index:

SearchEngineUtil.DeleteIndex(GetSearchIndexDir(), "Id", post.Id);

Updating the index:

SearchEngineUtil.UpdateIndex(GetSearchIndexDir(), "Id", post.Id, BuildPostSearchDocument(post));

Paged query:

var keyword = SearchEngineUtil.GetKeyWordsSplitBySpace("梦在旅途 中国梦");
var searchResult = SearchEngineUtil.SearchIndexByPage(indexDir, (bQuery) =>
{
    // The keyword is already segmented above, so needSplit is false here
    var query = SearchEngineUtil.CreatePanGuMultiFieldQuery(keyword, false, "Title", "Summary");
    bQuery.Add(query, Occur.SHOULD);
    // Properties to highlight and the keywords to highlight in each
    return new Dictionary<string, string> {
        { "Title", keyword }, { "Summary", keyword }
    };
}, () =>
{
    // Sort by Id in descending order
    return new[] { new SortField("Id", SortField.INT, true) };
}, doc =>
{
    return new PostSearchInfoDto
    {
        Id = doc.Get("Id"),
        Title = doc.Get("Title"),
        Summary = doc.Get("Summary"),
        Author = doc.Get("Author"),
        CreateTime = doc.Get("CreateTime")
    };
}, pageSize, pageNo, out totalCount);
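The paging inside SearchIndexByPage comes down to simple index arithmetic: fetch the top pageSize * page hits, then keep only the slice that belongs to the requested page. The sketch below reproduces that slice calculation on a plain list so it can be run without Lucene. `PagingSketch` and `GetPage` are hypothetical names, and pages are assumed to be 1-based with page >= 1, as in the class above.

```csharp
using System;
using System.Collections.Generic;

public static class PagingSketch
{
    // Returns the items of a 1-based page from a list of hits that is already sorted.
    public static List<T> GetPage<T>(IList<T> hits, int pageSize, int page)
    {
        var results = new List<T>();
        int indexStart = (page - 1) * pageSize;
        // Clamp the end of the slice so the last page may be shorter than pageSize
        int indexEnd = Math.Min(indexStart + pageSize, hits.Count);
        for (int i = indexStart; i < indexEnd; i++)
        {
            results.Add(hits[i]);
        }
        return results;
    }
}
```

With five hits and a page size of 2, page 3 contains only the last hit and page 4 is empty. Because the search has to fetch every hit up to the end of the requested page, deeper pages cost more than early ones.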

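The remaining methods, such as checking whether a given document exists in the index and querying the index for matching documents without paging, are not shown here. If you are interested, copy the class into your own project and try it out.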
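For completeness, here is a hedged sketch of those two calls, ExistsDocument and the non-paged SearchIndex. It assumes the SearchEngineUtil class above and the Lucene.Net and PanGu packages it depends on are referenced, and it reuses the "Id" and "Title" fields from the indexing example. The `PostHit` result type and the `SearchUsageSketch` class are hypothetical and exist only for illustration.

```csharp
using System.Collections.Generic;
using CN.Zuowenjun.Blog.Common;
using Lucene.Net.Search;

// Hypothetical result type for this sketch
public class PostHit
{
    public string Id { get; set; }
    public string Title { get; set; }
}

public static class SearchUsageSketch
{
    // Returns up to ten posts whose Title matches the keyword, in relevance order.
    public static List<PostHit> TopTitles(string indexDir, string keyword)
    {
        return SearchEngineUtil.SearchIndex(indexDir, bQuery =>
        {
            // CreatePanGuQuery segments the keyword itself (needSplit defaults to true)
            bQuery.Add(SearchEngineUtil.CreatePanGuQuery("Title", keyword), Occur.MUST);
            return new Dictionary<string, string> { { "Title", keyword } };
        },
        () => null, // no sort fields, so the default relevance order is used
        doc => new PostHit { Id = doc.Get("Id"), Title = doc.Get("Title") },
        needHighlight: false, topCount: 10);
    }

    // Checks whether a post with the given Id has been indexed.
    public static bool IsIndexed(string indexDir, int postId)
    {
        // "Id" is indexed as NOT_ANALYZED, so an exact term lookup is enough
        return SearchEngineUtil.ExistsDocument(indexDir, "Id", postId);
    }
}
```

Highlighting is switched off here to keep the sketch small; passing `needHighlight: true` would wrap the matched words in the Title property with the red font tags set up in SetHighlighter.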

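Here is how the search looks in my own project (my personal blog, newly redesigned and still under development):

One last note: Lucene is not a complete full-text search engine, but understanding it does help when learning Elasticsearch or Solr. In real production projects today, the higher-level Elasticsearch or Solr is what usually gets used.

(I wrote the code in this article early this year and am only sharing it today.)

I like wrapping commonly used components. Past examples include:

MongoDbCsharpHelper, a CRUD helper class wrapping the official MongoDB C# driver

A reusable RabbitMQ ConnectionPool built on the RabbitMQ.Client component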
