How to Build a Search Page with Elasticsearch and .NET
Although SQL Server's Full-Text Search is good for searching text that is within a database, there are better ways of implementing search if the text is less well structured or comes from a wide variety of sources or formats. Ryszard takes Elasticsearch and seven million questions from Stack Overflow to show you how to get started with one of the most popular search engines around.
We need search engines to query and analyse the massive amounts of data that many organizations are required to access. Storing the data is not the hard part; the problem is finding what we need in it. Large organizations store many types of structured and unstructured content, such as documents in different formats, e-mails, CMS pages or Microsoft Office files, and they want their employees and clients to be able to search and analyse all of it through one user interface.
At the same time, Internet users, who are used to Google-like search, expect every bespoke search to be just as fast and precise: they want autocomplete, they assume that the search tolerates misspellings, and they expect to be able to use filters and many other advanced search features.
Many .NET developers might now ask: 'Why would we need other search engines when we are happy with SQL Server's Full-Text Search feature?' The answer is that it might be enough for simple searches, but other search engines are a better choice when we need to index and search unstructured data from different sources or when we need custom functionality such as spellchecking, hit-highlighting, autocomplete or advanced scoring.
This is where search engines come into play. To get more familiar with the way they work, I will show how to build a search page that queries the dump of Stack Overflow questions. The dump contains a considerable amount of data (7 million questions), and it is easy for developers to judge the relevance of the search results. The search page will have the following features:
- Full text search
- Grouping by tags
- Autocomplete
Why Elasticsearch?
Elasticsearch is an open source search engine, written in Java and based on Lucene. It is currently the most popular search engine.
It offers greater scalability than SQL Server's Full-Text Search: after all, Stack Exchange initially grew on SQL Server Full-Text Search, but the limitations of its features and performance forced them to migrate to Elasticsearch for their search requirements.
I decided to test Elasticsearch because it does not require that we create an up-front schema file and it exposes Web-friendly APIs (REST and JSON).
NEST
To interact with Elasticsearch, we will use NEST 2.3.0, which is one of the two official .NET clients for Elasticsearch. NEST is a high-level client that maps closely to the Elasticsearch API: all the request and response objects have been mapped. NEST lets you build queries either with a fluent syntax, which resembles the structure of the raw JSON requests sent to the API, or with object initializer syntax.
To build the web page, I will use a Single Page Application (SPA) approach with AngularJS as the MVVM framework. The client side makes AJAX requests to ASP.NET Web API 2, and the Web API 2 controller uses NEST to communicate with Elasticsearch.
Code snippets in this article show only the service implementation. The Web API 2 code is just boilerplate, so I decided to skip it, as well as the AngularJS code, which you can replace with your favourite UI framework. The whole code is available on GitHub.
The page after applying HTML and styles may look like this:

Installation of Elasticsearch
Elasticsearch is very easy to install. Just go to its web page, download it, unzip it and install it in three simple steps. Once it is installed, Elasticsearch should be available by default at http://localhost:9200.
It exposes an HTTP API, so it is possible to use cURL to make requests, but I recommend using Sense, which is a Chrome extension. Sense offers syntax highlighting, autocomplete, formatting and code folding. The Elasticsearch reference contains samples in cURL format; for example, the request to get high-level statistics for all our indices looks like this:
```shell
curl localhost:9200/_stats
```
but Sense offers a nice copy and paste feature that translates cURL requests to the proper Sense syntax:
```
GET /_stats
```
Search index population
Elasticsearch is document-oriented, meaning that it stores entire documents in its index. First of all we need to create a client to communicate with Elasticsearch.
```csharp
var node = new Uri("http://localhost:9200");
var settings = new ConnectionSettings(node);
settings.DefaultIndex("stackoverflow");   // index used when a request does not name one
var client = new ElasticClient(settings);
```
Next, let's create a class representing our document.
```csharp
using System;
using System.Collections.Generic;
using Nest;

public class Post
{
    public string Id { get; set; }
    public DateTime? CreationDate { get; set; }
    public int? Score { get; set; }
    public int? AnswerCount { get; set; }
    public string Body { get; set; }
    public string Title { get; set; }

    [String(Index = FieldIndexOption.NotAnalyzed)]   // store tags as exact terms
    public IEnumerable<string> Tags { get; set; }

    [Completion]                                     // field used by the completion suggester
    public IEnumerable<string> Suggest { get; set; }
}
```
Although Elasticsearch can resolve the document type and its fields dynamically at index time, you can override field mappings or use attributes on fields to support more advanced usage. In this example we decorated our POCO class with some attributes (which I explain later), so we need to create the mappings with AutoMap.
```csharp
var indexDescriptor = new CreateIndexDescriptor("stackoverflow")
    .Mappings(ms => ms
        .Map<Post>(m => m.AutoMap()));   // infer mappings from Post, honouring its attributes
```
Then we can create our index, called stackoverflow, and put the mappings:
```csharp
client.CreateIndex("stackoverflow", i => indexDescriptor);
```
Now that we have defined our mappings and created an index, we can seed it with documents. Elasticsearch does not offer any handler to import specific file formats such as XML or CSV, but because it has client libraries for different languages, it is easy to build our own importer. As the Stack Overflow dump is in XML format, we will use the .NET XmlReader class to read question rows, map them to instances of Post and add the objects to a collection. The Suggest field should also be populated with the same values as Tags, which will be explained later in the article.
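The importer itself is not shown in the article. As a rough sketch of the approach just described, the code below streams row elements with XmlReader and maps questions to Post objects. It assumes the layout of the public Posts.xml dump (attributes such as Id, PostTypeId, CreationDate, Score, AnswerCount, Title, Body and Tags, with PostTypeId 1 marking a question and tags stored as "<c#><linq>"), so check those against the dump you download. The Post class here is a simplified copy without the NEST attributes, and PostsXmlReader is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Xml;

// Simplified copy of the article's Post class, without the NEST attributes.
public class Post
{
    public string Id { get; set; }
    public DateTime? CreationDate { get; set; }
    public int? Score { get; set; }
    public int? AnswerCount { get; set; }
    public string Body { get; set; }
    public string Title { get; set; }
    public IEnumerable<string> Tags { get; set; }
    public IEnumerable<string> Suggest { get; set; }
}

// Hypothetical importer that streams questions out of a Posts.xml-style dump.
public static class PostsXmlReader
{
    public static IEnumerable<Post> ReadQuestions(TextReader source)
    {
        using (var reader = XmlReader.Create(source))
        {
            while (reader.Read())
            {
                // Each post is a <row .../> element; skip the root and anything else.
                if (reader.NodeType != XmlNodeType.Element || reader.Name != "row")
                    continue;

                // Assumption: PostTypeId "1" marks a question (answers use "2").
                if (reader.GetAttribute("PostTypeId") != "1")
                    continue;

                var tags = ParseTags(reader.GetAttribute("Tags"));
                yield return new Post
                {
                    Id = reader.GetAttribute("Id"),
                    CreationDate = ParseDate(reader.GetAttribute("CreationDate")),
                    Score = ParseInt(reader.GetAttribute("Score")),
                    AnswerCount = ParseInt(reader.GetAttribute("AnswerCount")),
                    Title = reader.GetAttribute("Title"),
                    Body = reader.GetAttribute("Body"),   // XmlReader un-escapes &lt; and friends
                    Tags = tags,
                    Suggest = tags                          // autocomplete works on the tag values
                };
            }
        }
    }

    // Assumption: once un-escaped, tags look like "<c#><unit-testing>".
    private static List<string> ParseTags(string raw) =>
        (raw ?? string.Empty)
            .Split(new[] { '<', '>' }, StringSplitOptions.RemoveEmptyEntries)
            .ToList();

    private static int? ParseInt(string value) =>
        int.TryParse(value, NumberStyles.Integer, CultureInfo.InvariantCulture, out var n) ? n : (int?)null;

    private static DateTime? ParseDate(string value) =>
        DateTime.TryParse(value, CultureInfo.InvariantCulture, DateTimeStyles.None, out var d) ? d : (DateTime?)null;
}
```

Because ReadQuestions yields one Post at a time, a loader such as LoadPostsFromFile could wrap it and feed the indexing loop without holding all seven million questions in memory.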
Next, we need to iterate over batches of 1-10k objects and call the IndexMany method on the client:
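```csharp
int batchSize = 1000;
IEnumerable<Post> data = LoadPostsFromFile(path);   // reads questions from the XML dump
// Batch() splits the sequence into chunks; it is an extension method
// (MoreLINQ provides one, for example), not part of NEST.
foreach (var batch in data.Batch(batchSize))
{
    client.IndexMany<Post>(batch, "stackoverflow");  // one bulk request per batch
}
```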
On my machine (a quad-core i7 with 16 GB of RAM and a hard disk drive), it took around two seconds to index each batch. Depending on the size and structure of your documents, you can increase the batch size until performance drops drastically.
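The indexing loop calls a Batch extension method on IEnumerable<Post>. Batch is not part of the .NET base class library (.NET 6 added the similar Enumerable.Chunk, and the MoreLINQ library provides Batch). If you would rather not take a dependency, a minimal sketch of such a helper could look like this; the class name is hypothetical:

```csharp
using System;
using System.Collections.Generic;

public static class EnumerableBatchExtensions
{
    // Splits a sequence into consecutive chunks of at most `size` items.
    // The source is read lazily, so the whole dump never has to be in memory.
    public static IEnumerable<IReadOnlyList<T>> Batch<T>(this IEnumerable<T> source, int size)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        return BatchIterator(source, size);   // validate eagerly, iterate lazily
    }

    private static IEnumerable<IReadOnlyList<T>> BatchIterator<T>(IEnumerable<T> source, int size)
    {
        var bucket = new List<T>(size);
        foreach (var item in source)
        {
            bucket.Add(item);
            if (bucket.Count == size)
            {
                yield return bucket;
                bucket = new List<T>(size);   // fresh list: the caller may still hold the old one
            }
        }
        if (bucket.Count > 0)
            yield return bucket;              // final, possibly smaller, batch
    }
}
```

Allocating a new list for every batch, instead of clearing and reusing one, keeps each batch valid even if the caller holds on to it after asking for the next one.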
Full text search
Now that our document database is populated, let's define the search service interface:
```csharp
public interface ISearchService<T>
{
    SearchResult<T> Search(string query, int page, int pageSize);
    SearchResult<T> SearchByCategory(string query, IEnumerable<string> tags, int page, int pageSize);
    IEnumerable<string> Autocomplete(string query, int count);
}
```
and a search result class:
```csharp
public class SearchResult<T>
{
    public int Total { get; set; }
    public int Page { get; set; }
    public IEnumerable<T> Results { get; set; }
    public int ElapsedMilliseconds { get; set; }
}
```
The Search method will execute a multi_match query against the user input. The multi_match query is useful when we want to run the query against multiple fields. Using it, we can see how relevant the Elasticsearch results are with the default configuration.
First of all, we call the Query method, which is a container for whichever specific query we want to execute. Inside it, we call MultiMatch and, on its descriptor, call Query with the actual search phrase and Fields with the list of fields that we want to search against. In our case these are Title, Body and Tags.
```csharp
var result = client.Search<Post>(x => x         // search Post documents
    .Query(q => q                                // define the query
        .MultiMatch(mp => mp                     // of type multi_match
            .Query(query)                        // pass the user's text
            .Fields(f => f                       // fields to search against
                .Fields(f1 => f1.Title, f2 => f2.Body, f3 => f3.Tags))))
    .From((page - 1) * pageSize)                 // From is a document offset, not a page index
    .Size(pageSize));                            // limit to page size

return new SearchResult<Post>
{
    Total = (int)result.Total,
    Page = page,
    Results = result.Documents,
    ElapsedMilliseconds = result.Took
};
```
The raw request to Elasticsearch will look like:
```
GET stackoverflow/post/_search
{
  "query": {
    "multi_match": {
      "query": "elasticsearch",
      "fields": ["title", "body", "tags"]
    }
  }
}
```
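NEST generates this JSON for us. To see how paging ends up in the request, here is a sketch that builds an equivalent body by hand with System.Text.Json (assuming .NET Core 3.0 or later), for example to paste into Sense or send with HttpClient. Note that from is a document offset, so a 1-based page number has to be multiplied by the page size. RawQueryBuilder is a hypothetical name:

```csharp
using System;
using System.Text.Json;

public static class RawQueryBuilder
{
    // Builds a multi_match request body similar to the one NEST sends.
    // `page` is 1-based; Elasticsearch's `from` counts documents, not pages.
    public static string MultiMatch(string text, int page, int pageSize)
    {
        if (page < 1) throw new ArgumentOutOfRangeException(nameof(page));

        var body = new
        {
            query = new
            {
                multi_match = new
                {
                    query = text,
                    fields = new[] { "title", "body", "tags" }
                }
            },
            from = (page - 1) * pageSize,   // documents to skip
            size = pageSize                 // documents to return
        };
        return JsonSerializer.Serialize(body);
    }
}
```

For example, MultiMatch("elasticsearch", 3, 10) produces a body with "from":20 and "size":10, that is, documents 21 to 30.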
Grouping by tags
Once our search returns results, we will group them by tags so that users can refine their search. To group results by category, we will use bucket aggregations. They let us build buckets of documents, where each document either falls into a given bucket or not, depending on that bucket's criterion. As we want to aggregate by tags, which is a string field, we will use the terms aggregation.
Let's look at the attribute on the Tags field:
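Conceptually, a terms aggregation creates one bucket per distinct value of a field and counts the matching documents in each. The in-memory LINQ sketch below illustrates that idea for tags; it is not how Elasticsearch computes it (the real aggregation runs per shard, so its counts can be approximate), and TermsAggregationSketch is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TermsAggregationSketch
{
    // Counts how many documents carry each tag and keeps the `size` largest buckets.
    public static Dictionary<string, long> CountByTag(IEnumerable<IEnumerable<string>> tagsPerDocument, int size)
    {
        return tagsPerDocument
            .SelectMany(tags => tags.Distinct())          // a document counts once per tag
            .GroupBy(tag => tag)                          // one bucket per distinct tag
            .Select(g => new { Tag = g.Key, Count = g.LongCount() })
            .OrderByDescending(b => b.Count)              // largest buckets first
            .ThenBy(b => b.Tag, StringComparer.Ordinal)   // deterministic tie-break in this sketch
            .Take(size)
            .ToDictionary(b => b.Tag, b => b.Count);
    }
}
```

Given three posts tagged c#/linq, c# and java/c#/linq, CountByTag(docs, 2) returns c# = 3 and linq = 2; java is dropped because only the two largest buckets are kept, much as Size(10) limits the real aggregation.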
```csharp
[String(Index = FieldIndexOption.NotAnalyzed)]
public IEnumerable<string> Tags { get; set; }
```
It tells Elasticsearch to index the field's values without analyzing them: the field remains searchable, but each value is stored exactly as it is. Thanks to that, Elasticsearch will not split the 'unit-testing' tag into 'unit' and 'testing', for example.
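To make the difference concrete, here is a deliberately simplified sketch: an analyzed string is lowercased and split into terms, while a not_analyzed value stays a single exact term. The splitting rule used here (break on anything that is not a letter or digit) only approximates Lucene's standard analyzer, which follows Unicode text segmentation rules; AnalysisSketch is a hypothetical name:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AnalysisSketch
{
    // Rough stand-in for an analyzed field: lowercase, then split on
    // runs of characters that are neither letters nor decimal digits.
    public static string[] Analyzed(string value) =>
        Regex.Split(value.ToLowerInvariant(), @"[^\p{L}\p{Nd}]+")
             .Where(term => term.Length > 0)   // drop empty pieces left by the split
             .ToArray();

    // A not_analyzed field keeps the whole value as one exact term.
    public static string[] NotAnalyzed(string value) => new[] { value };
}
```

With this sketch, Analyzed("unit-testing") yields unit and testing, and Analyzed("c#") yields just c, so a terms aggregation over an analyzed field would report word fragments instead of the real tag names.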
Now, we can extend the search result class with a dictionary containing the tag name and the number of posts decorated with this tag.
```csharp
public Dictionary<string, long> AggregationsByTags { get; set; }
```
Next, we need to add an aggregation of type terms to our query and give it a name:
```csharp
var result = client.Search<Post>(x => x
    .Query(q => q
        .MultiMatch(mp => mp
            .Query(query)
            .Fields(f => f
                .Fields(f1 => f1.Title, f2 => f2.Body, f3 => f3.Tags))))
    .Aggregations(a => a                         // aggregate results
        .Terms("by_tags", t => t                 // terms aggregation named "by_tags"
            .Field(f => f.Tags)                  // on the Tags field
            .Size(10)))                          // limit the number of buckets
    .From((page - 1) * pageSize)                 // document offset of the requested page
    .Size(pageSize));
```
The search response now contains the aggregation results, so we use the newly added property to return them to the caller:
```csharp
AggregationsByTags = result.Aggs.Terms("by_tags").Buckets   // one bucket per tag
    .ToDictionary(x => x.Key, y => y.DocCount)
```
The next step is to allow users to select one or more tags and use them as a filter. Let's add a new method to the interface. It will enable us to pass the selected tags to the search method.
```csharp
SearchResult<T> SearchByCategory(string query, IEnumerable<string> tags, int page, int pageSize);
```
In the method implementation, first of all we need to map the tags into an array of filters.
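```csharp
// In NEST 2.x, filters are ordinary queries, so each tag becomes a term query descriptor.
var filters = tags
    .Select(c => new Func<QueryContainerDescriptor<Post>, QueryContainer>(x => x
        .Term(f => f.Tags, c)));   // exact match on one selected tag
```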
Then, we need to build our search as a bool query. Bool queries combine multiple queries with must or should clauses. The queries inside these clauses are used to find matching documents and to calculate their relevance scores.
Then we can append a filter clause, which contains another bool query that filters the result set. Queries in a filter clause only include or exclude documents; they do not affect the score.
```csharp
var result = client.Search<Post>(x => x
    .Query(q => q
        .Bool(b => b
            .Must(m => m                         // clause that must match (scored)
                .MultiMatch(mp => mp             // our initial search query
                    .Query(query)
                    .Fields(f => f
                        .Fields(f1 => f1.Title, f2 => f2.Body, f3 => f3.Tags))))
            .Filter(f => f                       // filter the results (not scored)
                .Bool(b1 => b1
                    .Must(filters)))))           // every selected tag is required
    .Aggregations(a => a
        .Terms("by_tags", t => t
            .Field(f => f.Tags)
            .Size(10)))
    .From((page - 1) * pageSize)                 // document offset of the requested page
    .Size(pageSize));
```
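For reference, the request body behind this query looks roughly like the one built by the sketch below (aggregations omitted for brevity): the multi_match query sits in must and contributes to the score, while one term clause per selected tag sits inside filter. It assumes System.Text.Json (.NET Core 3.0 or later), uses @bool because bool is a C# keyword, and BoolQueryBuilder is a hypothetical name:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

public static class BoolQueryBuilder
{
    // Approximates the body of the SearchByCategory request, without aggregations.
    public static string Build(string text, IEnumerable<string> tags, int page, int pageSize)
    {
        var body = new
        {
            query = new
            {
                @bool = new                      // serialized as "bool"
                {
                    must = new                   // scored part of the query
                    {
                        multi_match = new { query = text, fields = new[] { "title", "body", "tags" } }
                    },
                    filter = new                 // yes/no part, does not affect the score
                    {
                        @bool = new
                        {
                            // One term clause per tag; inside `must` they are combined with AND.
                            must = tags.Select(t => new { term = new { tags = t } }).ToArray()
                        }
                    }
                }
            },
            from = (page - 1) * pageSize,
            size = pageSize
        };
        return JsonSerializer.Serialize(body);
    }
}
```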
Aggregations operate within the scope of the query, so the tag counts they return reflect only the documents in the filtered set.
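The following in-memory sketch mirrors that behaviour: a bool filter made of must term clauses keeps only posts that carry every selected tag (a logical AND), and the tag counts are then computed over that filtered set only. It is an illustration, not NEST or Elasticsearch code, and FilteredFacetSketch is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FilteredFacetSketch
{
    // Keeps documents that contain all selected tags, then counts tags within that subset.
    public static Dictionary<string, long> CountWithinFilter(
        IEnumerable<string[]> tagsPerDocument, IReadOnlyCollection<string> selected)
    {
        return tagsPerDocument
            .Where(tags => selected.All(s => tags.Contains(s)))   // AND of the tag filters
            .SelectMany(tags => tags.Distinct())                   // count each tag once per post
            .GroupBy(tag => tag)
            .ToDictionary(g => g.Key, g => g.LongCount());
    }
}
```

Selecting c# and linq from posts tagged c#/linq, c#/asp.net, java/linq and c#/linq/ef leaves two posts, so the counts are c# = 2, linq = 2 and ef = 1.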
Autocomplete
One of the features that we frequently use in search forms is autocomplete, sometimes called 'typeahead' or 'search as you type'.

Searching big sets of text data by only a few characters is not a trivial task. Elasticsearch provides us with the completion suggester which works on a special field that is indexed in a way that enables very fast searching.
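As an intuition for why prefix lookups can be fast, consider a sorted list of terms: all terms sharing a prefix sit next to each other, so a single binary search finds the first candidate. The completion suggester uses a more compact in-memory structure (a finite state transducer), so treat the sketch below purely as an illustration of prefix-based suggestions; TagPrefixIndex is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy prefix index over a sorted, de-duplicated, lowercased tag list.
public sealed class TagPrefixIndex
{
    private readonly string[] _sorted;

    public TagPrefixIndex(IEnumerable<string> tags)
    {
        _sorted = tags.Select(t => t.ToLowerInvariant())
                      .Distinct()
                      .OrderBy(t => t, StringComparer.Ordinal)
                      .ToArray();
    }

    public IEnumerable<string> Suggest(string prefix, int count)
    {
        prefix = prefix.ToLowerInvariant();

        // Find the first entry >= prefix; all matches follow it contiguously.
        int i = Array.BinarySearch(_sorted, prefix, StringComparer.Ordinal);
        if (i < 0) i = ~i;   // not found: ~i is the insertion point

        var results = new List<string>();
        while (i < _sorted.Length && results.Count < count
               && _sorted[i].StartsWith(prefix, StringComparison.Ordinal))
        {
            results.Add(_sorted[i]);
            i++;
        }
        return results;
    }
}
```

For the tags javascript, java, jquery, json and c#, Suggest("ja", 5) returns java and javascript.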
We need to decide which field or fields we want autocomplete to operate on and what results will be suggested. Elasticsearch lets us define both the input and the output so that, for example, user text can be matched against a title or an author and return a term, the whole post or a subset of its fields.
For simplicity, in our case we will search user input against the tags and display matched tags as well. It will work as a dictionary of tags. That is why we decorated our Post class with a special attribute.
```csharp
[Completion]
public IEnumerable<string> Suggest { get; set; }
```
A field decorated with Completion may contain Input, Output, Payload (which can store any arbitrary object) and Weight, which ranks suggestions. We will use only the mandatory Input, so the property type is simply a collection of strings.
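In Elasticsearch 2.x, a completion field value can be sent either as a full object (input, output, payload, weight) or, when only inputs are needed, as a plain array. The sketch below builds both shapes with System.Text.Json (assuming .NET Core 3.0 or later) so you can compare them; the output text, payload and weight values are made up for illustration, and CompletionFieldSketch is a hypothetical name:

```csharp
using System.Text.Json;

public static class CompletionFieldSketch
{
    // Full object form: inputs to match, displayed output, arbitrary payload, ranking weight.
    public static string FullForm() =>
        JsonSerializer.Serialize(new
        {
            suggest = new
            {
                input = new[] { "elasticsearch", "nest" },
                output = "Elasticsearch .NET client",   // illustrative value
                payload = new { postId = "12345" },      // illustrative value
                weight = 34                              // illustrative value
            }
        });

    // Shorthand form: just the inputs, as a JSON array.
    public static string Shorthand(params string[] inputs) =>
        JsonSerializer.Serialize(new { suggest = inputs });
}
```

Because Suggest is declared as IEnumerable<string>, our documents carry the shorthand array form; supporting Output, Payload or Weight would require a richer property type.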
Now we can implement an autocomplete method:
```csharp
var result = client.Suggest<Post>(x => x           // use the suggest API
    .Completion("tag-suggestions", c => c        // completion suggester named "tag-suggestions"
        .Text(query)                             // pass the user's text
        .Field(f => f.Suggest)                   // work against the completion field
        .Size(count)));                          // limit the number of suggestions

return result.Suggestions["tag-suggestions"]
    .SelectMany(x => x.Options)
    .Select(y => y.Text);
```
The method returns a collection of terms that match the query. The result of a particular suggestion is a collection of suggestion options. We could order them by frequency or weight (which we did not define) before returning the suggested text.
Summary
This article demonstrated how to build a full text search functionality that includes grouping results by tags and an autocomplete feature.
We have seen that installing and configuring Elasticsearch is very easy. The default configuration options are just right to start working with. Elasticsearch does not need a schema file and exposes a friendly JSON-based HTTP API for its configuration, index population and searching. The engine is optimized to work with large amounts of data.
We used a high-level .NET client to communicate with Elasticsearch, so it fits nicely into a .NET project. It allowed us to define our index using POCO classes with little configuration work. We chose the fluent syntax to build queries, but object initializer syntax is also available.
Finally, we extended our search with two features without much effort. Having implemented a search service, we can now hook it up to either Web API with AngularJS or ASP.NET MVC.
Elasticsearch is an advanced search engine with many features and its own query DSL. Before building a production search site, we would need more analysis of how to store and query our data, as well as some query fine-tuning, but I hope this article has helped you learn the basics through examples.