可以执行全文搜索的原因 Elasticsearch full-text search Kibana RESTful API with JSON over HTTP elasticsearch_action es 模糊查询
https://www.elastic.co/guide/en/elasticsearch/guide/current/getting-started.html
Elasticsearch is a real-time distributed search and analytics engine. It allows you to explore your data at a speed and at a scale never before possible. It is used for full-text search, structured search, analytics, and all three in combination:
- Wikipedia uses Elasticsearch to provide full-text search with highlighted search snippets, and search-as-you-type and did-you-mean suggestions.
- The Guardian uses Elasticsearch to combine visitor logs with social -network data to provide real-time feedback to its editors about the public’s response to new articles.
- Stack Overflow combines full-text search with geolocation queries and uses more-like-this to find related questions and answers.
- GitHub uses Elasticsearch to query 130 billion lines of code.
But Elasticsearch is not just for mega-corporations. It has enabled many startups like Datadog and Klout to prototype ideas and to turn them into scalable solutions. Elasticsearch can run on your laptop, or scale out to hundreds of servers and petabytes of data.
No individual part of Elasticsearch is new or revolutionary. Full-text search has been done before, as have analytics systems and distributed databases. The revolution is the combination of these individually useful parts into a single, coherent, real-time application. It has a low barrier to entry for the new user, but can keep pace with you as your skills and needs grow.
If you are picking up this book, it is because you have data, and there is no point in having data unless you plan to do something with it.
Unfortunately, most databases are astonishingly inept at extracting actionable knowledge from your data. Sure, they can filter by timestamp or exact values, but can they perform full-text search, handle synonyms, and score documents by relevance? Can they generate analytics and aggregations from the same data? Most important, can they do this in real time without big batch-processing jobs?
This is what sets Elasticsearch apart: Elasticsearch encourages you to explore and utilize your data, rather than letting it rot in a warehouse because it is too difficult to query.
【Lucene】
Elasticsearch is an open-source search engine built on top of Apache Lucene™, a full-text search-engine library. Lucene is arguably the most advanced, high-performance, and fully featured search engine library in existence today—both open source and proprietary.
But Lucene is just a library. To leverage its power, you need to work in Java and to integrate Lucene directly with your application. Worse, you will likely require a degree in information retrieval to understand how it works. Lucene is very complex.
Elasticsearch is also written in Java and uses Lucene internally for all of its indexing and searching, but it aims to make full-text search easy by hiding the complexities of Lucene behind a simple, coherent, RESTful API.
However, Elasticsearch is much more than just Lucene and much more than “just” full-text search. It can also be described as follows:
- A distributed real-time document store where every field is indexed and searchable
- A distributed search engine with real-time analytics
- Capable of scaling to hundreds of servers and petabytes of structured and unstructured data
And it packages up all this functionality into a standalone server that your application can talk to via a simple RESTful API, using a web client from your favorite programming language, or even from the command line.
It is easy to get started with Elasticsearch. It ships with sensible defaults and hides complicated search theory away from beginners. It just works, right out of the box. With minimal understanding, you can soon become productive.
As your knowledge grows, you can leverage more of Elasticsearch’s advanced features. The entire engine is configurable and flexible. Pick and choose from the advanced features to tailor Elasticsearch to your problem domain.
【work on an abstraction layer to make Lucene easier for Java programmers to add search to their applications】
The Mists of Time
Many years ago, a newly married unemployed developer called Shay Banon followed his wife to London, where she was studying to be a chef. While looking for gainful employment, he started playing with an early version of Lucene, with the intent of building his wife a recipe search engine.
Working directly with Lucene can be tricky, so Shay started work on an abstraction layer to make it easier for Java programmers to add search to their applications. He released this as his first open source project, called Compass.
Later Shay took a job working in a high-performance, distributed environment with in-memory data grids. The need for a high-performance, real-time, distributed search engine was obvious, and he decided to rewrite the Compass libraries as a standalone server called Elasticsearch.
The first public release came out in February 2010. Since then, Elasticsearch has become one of the most popular projects on GitHub with commits from over 300 contributors. A company has formed around Elasticsearch to provide commercial support and to develop new features, but Elasticsearch is, and forever will be, open source and available to all.
Shay’s wife is still waiting for the recipe search…
Once you’ve extracted the archive file, Elasticsearch is ready to run. To start it up in the foreground:
cd elasticsearch-<version>
./bin/elasticsearch
Add |
|
If you’re running Elasticsearch on Windows, simply run |
Test it out by opening another terminal window and running the following:
curl 'http://localhost:9200/?pretty'
You should see a response like this:
{
"name" : "Tom Foster",
"cluster_name" : "elasticsearch",
"version" : {
"number" : "2.1.0",
"build_hash" : "72cd1f1a3eee09505e036106146dc1949dc5dc87",
"build_timestamp" : "2015-11-18T22:40:03Z",
"build_snapshot" : false,
"lucene_version" : "5.3.1"
},
"tagline" : "You Know, for Search"
}
This means that you have an Elasticsearch node up and running, and you can start experimenting with it. A node is a running instance of Elasticsearch. A cluster is a group of nodes with the same cluster.name
that are working together to share data and to provide failover and scale. (A single node, however, can form a cluster all by itself.) You can change the cluster.name
in the elasticsearch.yml
configuration file that’s loaded when you start a node.
Elasticsearch is running in the foreground.
Installing Sense
Sense is a Kibana app that provides an interactive console for submitting requests to Elasticsearch directly from your browser. Many of the code examples in the online version of this book include a View in Sense link. When clicked, it opens up a working example of the code in the Sense console. You do not have to install Sense, but it will make this book much more interactive by allowing you to experiment with the code samples on your local Elasticsearch cluster.
To install and run Sense:
To install and run Sense:
Run the following command in the Kibana directory to download and install the Sense app:
./bin/kibana plugin --install elastic/sense
Windows:
bin\kibana.bat plugin --install elastic/sense
.You can download Sense from https://download.elastic.co/elastic/sense/sense-latest.tar.gz to install it on an offline machine.
Start Kibana.
./bin/kibana
Windows:
bin\kibana.bat
.- Open Sense your web browser by going to
http://localhost:5601/app/sense
.
https://www.elastic.co/guide/en/elasticsearch/guide/current/_talking_to_elasticsearch.html#_restful_api_with_json_over_http
Java APIedit
If you are using Java, Elasticsearch comes with two built-in clients that you can use in your code:
【数据在集群的哪个节点】
- Node client
- The node client joins a local cluster as a non data node. In other words, it doesn’t hold any data itself, but it knows what data lives on which node in the cluster, and can forward requests directly to the correct node.
- 【转发请求】
- Transport client
- The lighter-weight transport client can be used to send requests to a remote cluster. It doesn’t join the cluster itself, but simply forwards requests to a node in the cluster.
Both Java clients talk to the cluster over port 9300, using the native Elasticsearch transport protocol. The nodes in the cluster also communicate with each other over port 9300. If this port is not open, your nodes will not be able to form a cluster.
节点客户端(Node client)节点客户端作为一个非数据节点加入到本地集群中。换句话说,它本身不保存任何数据,但是它知道数据在集群中的哪个节点中,并且可以把请求转发到正确的节点。
传输客户端(Transport client)轻量级的传输客户端可以可以将请求发送到远程集群。它本身不加入集群,但是它可以将请求转发到集群中的一个节点上。
A request to Elasticsearch consists of the same parts as any HTTP request:
curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'
The parts marked with < >
above are:
|
The appropriate HTTP method or verb: |
|
Either |
|
The hostname of any node in your Elasticsearch cluster, or |
|
The port running the Elasticsearch HTTP service, which defaults to |
|
API Endpoint (for example |
|
Any optional query-string parameters (for example |
|
A JSON-encoded request body (if the request needs one.) |
Document Oriented
Objects in an application are seldom just a simple list of keys and values. More often than not, they are complex data structures that may contain dates, geo locations, other objects, or arrays of values.
Sooner or later you’re going to want to store these objects in a database. Trying to do this with the rows and columns of a relational database is the equivalent of trying to squeeze your rich, expressive objects into a very big spreadsheet: you have to flatten the object to fit the table schema—usually one field per column—and then have to reconstruct it every time you retrieve it.
Elasticsearch is document oriented, meaning that it stores entire objects or documents. It not only stores them, but also indexes the contents of each document in order to make them searchable. In Elasticsearch, you index, search, sort, and filter documents—not rows of columnar data. This is a fundamentally different way of thinking about data and is one of the reasons Elasticsearch can perform complex full-text search.
应用中的一个个对象,不仅仅是键值对的数据结构。
【document oriented】
Elasticsearch是以文档document为中心的,来存储整个应用中的对象、文档。Elasticsearch为每一个文档的内容(contents of each documents)建立索引,使得他们便于被检索到。
【可以执行全文搜索的原因 full-text search】
索引、搜索、排序、过滤的操作对象是文档,而不是传统的列式数据
基础入门 | Elasticsearch: 权威指南 | Elastic https://elasticsearch.cn/book/elasticsearch_definitive_guide_2.x/getting-started.html
Elasticsearch 中没有一个单独的组件是全新的或者是革命性的。全文搜索很久之前就已经可以做到了, 就像早就出现了的分析系统和分布式数据库。 革命性的成果在于将这些单独的,有用的组件融合到一个单一的、一致的、实时的应用中。它对于初学者而言有一个较低的门槛, 而当你的技能提升或需求增加时,它也始终能满足你的需求。
信息检索的概念、分布式系统原理、Query DSL
ElasticSearch 模糊匹配查询 - CSDN博客 http://blog.csdn.net/hemuxiao/article/details/72869105
目前的需求输入:王? 女 济南 20-30
- 能够查询以王开头的人的名字
- 性别为女性
- 地址为济南
- *年龄为20-30
{
"from": 0,
"query": {
"bool": {
"minimum_should_match": 1,
"must": [
{
"match": {
"XB": "2 "
}
},
{
"wildcard": {
"XM": "王*"
}
}
],
"should": [
{
"range": {
"CSRQ": {
"gte": "2010-09-01",
"lte": "2014-09-01"
}
}
}
]
}
},
"size": 10
}
模糊查询
curl '1.2.21.10 : 9200/dvote/kwaddress/_search?pretty=true' -d '
{
"from": ,
"query": {
"bool": {
"minimum_should_match": ,
"must": [
{
"wildcard": {
"title": "*网*"
}
}
]
}
},
"size":
}
'
https://elasticsearch.cn/book/elasticsearch_definitive_guide_2.x/_distributed_nature.html
分布式特性
在本章开头,我们提到过 Elasticsearch 可以横向扩展至数百(甚至数千)的服务器节点,同时可以处理PB级数据。我们的教程给出了一些使用 Elasticsearch 的示例,但并不涉及任何内部机制。Elasticsearch 天生就是分布式的,并且在设计时屏蔽了分布式的复杂性。
Elasticsearch 在分布式方面几乎是透明的。教程中并不要求了解分布式系统、分片、集群发现或其他的各种分布式概念。可以使用笔记本上的单节点轻松地运行教程里的程序,但如果你想要在 100 个节点的集群上运行程序,一切依然顺畅。
Elasticsearch 尽可能地屏蔽了分布式系统的复杂性。这里列举了一些在后台自动执行的操作:
- 分配文档到不同的容器 或 分片 中,文档可以储存在一个或多个节点中
- 按集群节点来均衡分配这些分片,从而对索引和搜索过程进行负载均衡
- 复制每个分片以支持数据冗余,从而防止硬件故障导致的数据丢失
- 将集群中任一节点的请求路由到存有相关数据的节点
- 集群扩容时无缝整合新节点,重新分配分片以便从离群节点恢复
当阅读本书时,将会遇到有关 Elasticsearch 分布式特性的补充章节。这些章节将介绍有关集群扩容、故障转移(集群内的原理) 、应对文档存储(分布式文档存储) 、执行分布式搜索(执行分布式检索) ,以及分区(shard)及其工作原理(分片内部原理) 。
Distributed Nature | Elasticsearch: The Definitive Guide [2.x] | Elastic https://www.elastic.co/guide/en/elasticsearch/guide/current/_distributed_nature.html
Distributed Nature
At the beginning of this chapter, we said that Elasticsearch can scale out to hundreds (or even thousands) of servers and handle petabytes of data. While our tutorial gave examples of how to use Elasticsearch, it didn’t touch on the mechanics at all. Elasticsearch is distributed by nature, and it is designed to hide the complexity that comes with being distributed.
The distributed aspect of Elasticsearch is largely transparent. Nothing in the tutorial required you to know about distributed systems, sharding, cluster discovery, or dozens of other distributed concepts. It happily ran the tutorial on a single node living inside your laptop, but if you were to run the tutorial on a cluster containing 100 nodes, everything would work in exactly the same way.
Elasticsearch tries hard to hide the complexity of distributed systems. Here are some of the operations happening automatically under the hood:
- Partitioning your documents into different containers or shards, which can be stored on a single node or on multiple nodes
- Balancing these shards across the nodes in your cluster to spread the indexing and search load
- Duplicating each shard to provide redundant copies of your data, to prevent data loss in case of hardware failure
- Routing requests from any node in the cluster to the nodes that hold the data you’re interested in
- Seamlessly integrating new nodes as your cluster grows or redistributing shards to recover from node loss
As you read through this book, you’ll encounter supplemental chapters about the distributed nature of Elasticsearch. These chapters will teach you about how the cluster scales and deals with failover (Life Inside a Cluster), handles document storage (Distributed Document Store), executes distributed search (Distributed Search Execution), and what a shard is and how it works (Inside a Shard).
【面向对象编程语言如此流行的原因】
面向对象编程语言如此流行的原因之一是对象帮我们表示和处理现实世界具有潜在的复杂的数据结构的实体,到目前为止,一切都很完美!
但是当我们需要存储这些实体时问题来了,传统上,我们以行和列的形式存储数据到关系型数据库中,相当于使用电子表格。 正因为我们使用了这种不灵活的存储媒介导致所有我们使用对象的灵活性都丢失了。
但是否我们可以将我们的对象按对象的方式来存储? 这样我们就能更加专注于 使用 数据,而不是在电子表格的局限性下对我们的应用建模。 我们可以重新利用对象的灵活性。
一个 对象 是基于特定语言的内存的数据结构。 为了通过网络发送或者存储它,我们需要将它表示成某种标准的格式。 JSON 是一种以人可读的文本表示对象的方法。 它已经变成 NoSQL 世界交换数据的事实标准。当一个对象被序列化成为 JSON,它被称为一个 JSON 文档 。
可以执行全文搜索的原因 Elasticsearch full-text search Kibana RESTful API with JSON over HTTP elasticsearch_action es 模糊查询的更多相关文章
- Elasticsearch入坑指南之RESTful API
Elasticsearch入坑指南之RESTful API Tags:Elasticsearch ES为开发者提供了非常丰富的基于Http协议的Rest API,通过简单的Rest请求,就可以实现非常 ...
- 使用ElasticSearch服务从MySQL同步数据实现搜索即时提示与全文搜索功能
最近用了几天时间为公司项目集成了全文搜索引擎,项目初步目标是用于搜索框的即时提示.数据需要从MySQL中同步过来,因为数据不小,因此需要考虑初次同步后进行持续的增量同步.这里用到的开源服务就是Elas ...
- 使用ElasticSearch实现搜索时即时提示与全文搜索功能
<!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8&quo ...
- MySQL全文搜索
http://www.yiibai.com/mysql/full-text-search.html 在本节中,您将学习如何使用MySQL全文搜索功能. MySQL全文搜索提供了一种实现各种高级搜索技术 ...
- 如何在Web前端实现CAD图文字全文搜索功能之技术分享
现状 在CAD看图过程中我们经常会需要用到查找文字的功能,在AutoCAD软件查找一个文字时,可以通过打开左下角输入命令find,输入查找的文字,然后设置查找范围,就可以搜索到需要查询的文字.但在We ...
- java使用elasticsearch进行模糊查询-已在项目中实际应用
java使用elasticsearch进行模糊查询 使用环境上篇文章本人已书写过,需要maven坐标,ES连接工具类的请看上一篇文章,以下是内容是笔者在真实项目中运用总结而产生,并写的是主要方法和思路 ...
- 一种安全云存储方案设计(下)——基于Lucene的云端搜索与密文基础上的模糊查询
一种安全的云存储方案设计(未完整理中) 一篇老文了,现在看看错漏颇多,提到的一些技术已经跟不上了.仅对部分内容重新做了一些修正,增加了一些机器学习的内容,然并卵. 这几年来,云产品层出不穷,但其安全性 ...
- ElasticSearch 2 (14) - 深入搜索系列之全文搜索
ElasticSearch 2 (14) - 深入搜索系列之全文搜索 摘要 在看过结构化搜索之后,我们看看怎样在全文字段中查找相关度最高的文档. 全文搜索两个最重要的方面是: 相关(relevance ...
- Elasticsearch系列---深入全文搜索
概要 本篇介绍怎样在全文字段中搜索到最相关的文档,包含手动控制搜索的精准度,搜索条件权重控制. 手动控制搜索的精准度 搜索的两个重要维度:相关性(Relevance)和分析(Analysis). 相关 ...
随机推荐
- 模仿锤子手机的bigbang效果
<!DOCTYPE html> <html style="height: 100%"> <head> <meta http-equiv=& ...
- unix网络编程第一章demo
之前一直以为time_wait状态就是主动关闭的那一方产生.然后这个端口一直不可以用.实际我发现服务端监听一个端口.客户端发来连接后.传输数据后.服务端关闭客户端套接字后.用netstat -nat ...
- 捕获错误并发邮件 register_shutdown_function
/** * 脚本程序异常捕获 */ function handleError() { global $config; $error = error_get_last(); if (isset($err ...
- zoj 2760(网络流+floyed)
How Many Shortest Path Time Limit: 10 Seconds Memory Limit: 32768 KB Given a weighted directed ...
- AC日记——[Sdoi2013]森林 bzoj 3123
3123: [Sdoi2013]森林 Time Limit: 20 Sec Memory Limit: 512 MBSubmit: 3216 Solved: 944[Submit][Status] ...
- 洛谷—— P1849 [USACO12MAR]拖拉机Tractor
https://www.luogu.org/problemnew/show/P1849 题目描述 After a long day of work, Farmer John completely fo ...
- Android Studio 删除项目
在项目上右键 点击“Open Module Settings”,然后你会看到你的项目排成一列,如果想删除哪个,点击项目,然后在左上角,点击“-”号,然后返回后发现这个项目变为灰色,点击项目右键,看到“ ...
- tensorflow搭建神经网络基本流程
定义添加神经层的函数 1.训练的数据2.定义节点准备接收数据3.定义神经层:隐藏层和预测层4.定义 loss 表达式5.选择 optimizer 使 loss 达到最小 然后对所有变量进行初始化,通过 ...
- bash帮助文档简单学习;bash手册翻译
关于bash的四种工作方式的不同,可以参考:http://feihu.me/blog/2014/env-problem-when-ssh-executing-command-on-remote/,但是 ...
- VC2012编译protobuf出错处理
近来要学习protobuf的协议生成.须要从网上下载它的代码,从这个SVN地址下载: 个.因此编译提示上面的出错.仅仅须要把std;;tuple里的个数定义为10个就可以.,因此不支持5个以上的參数输 ...