Definition

https://www.techopedia.com/definition/17113/full-text-search

A full-text search is a comprehensive search method that compares every word of the search request against every word within the document or database. Web search engines and document editing software make extensive use of the full-text search technique in functions for searching a text database stored on the Web or on the local drive of a computer; it lets the user find a word or phrase anywhere within the database or document.

Full-text search is the most common technique used in Web search engines and Web pages. Each page is searched and indexed, and if any matches are found, they are displayed via the indexes. Parts of the original text are displayed against the user’s query and then the full text. Full-text search reduces the hassle of searching for a word in huge amounts of metadata, such as the World Wide Web and commercial-scale databases. Full-text search became popular in the late 1990s, when the Internet began to become a part of everyday life.
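The "searched and indexed" step described above is commonly implemented with an inverted index. The following is a minimal sketch, not any particular engine's implementation; the tokenizer, the function names (tokenize, buildIndex, lookup) and the sample pages are all assumptions made for illustration.

```javascript
// Split text into lowercase alphanumeric words (a deliberately naive tokenizer).
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Build an inverted index: word -> Set of ids of the documents containing it.
function buildIndex(docs) {
  const index = new Map();
  docs.forEach((text, id) => {
    for (const word of tokenize(text)) {
      if (!index.has(word)) index.set(word, new Set());
      index.get(word).add(id);
    }
  });
  return index;
}

// Look a word up in the index; returns sorted document ids (empty if absent).
function lookup(index, word) {
  const ids = index.get(word.toLowerCase());
  return ids ? [...ids].sort((a, b) => a - b) : [];
}

// Hypothetical sample pages, indexed once and then queried by word.
const pages = [
  'Full-text search compares every word',
  'Web search engines index each page',
  'A page is indexed once and searched many times',
];
const pageIndex = buildIndex(pages);
console.log(lookup(pageIndex, 'search')); // ids 0 and 1
console.log(lookup(pageIndex, 'page'));   // ids 1 and 2
```

Because the index is built once, each later query is a map lookup rather than a scan of every page. Note that this sketch does no stemming, so "searched" and "search" are different words.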

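Characteristics

Unstructured

The data being searched is unstructured or semi-structured. This differs from SQL, where queries are written against named columns. An unstructured database that is given full-text search capability, such as ES or MongoDB, is called a full-text search database.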

https://stackoverflow.com/questions/tagged/full-text-search

Full text search involves searching documents, usually involving unstructured text, as opposed to searching text fields in a structured database.

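For example, there is a JavaScript full-text search implementation in which a document added to the search can be an object with a multi-level nested structure.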

https://github.com/frankred/node-full-text-search-light

You can also add objects or arrays to the search. Every child value will be added to the search, no matter if it's an array or object.

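// Add objects
// `search` is assumed to be an existing full-text-search-light instance.
var obj = {
  name: 'Alexandra',
  age: 27,
  student: true,
  hobbies: ['Tennis', 'Football', 'Party'],
  car: {
    make: 'Volvo',
    year: 2012,
    topspeed: 280
  }
};

// Every child value (name, age, hobbies, car.make, ...) becomes searchable.
search.add(obj);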
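To make "every child value will be added" concrete, here is one way the leaf values of a nested object could be collected before indexing. This is a sketch under assumptions, not the library's actual code; collectValues and matchesAny are hypothetical helper names.

```javascript
// Recursively collect every primitive leaf value of a nested object or array,
// converted to a string. null and undefined values are skipped.
function collectValues(value, out = []) {
  if (value === null || value === undefined) return out;
  if (Array.isArray(value)) {
    for (const item of value) collectValues(item, out);
  } else if (typeof value === 'object') {
    for (const key of Object.keys(value)) collectValues(value[key], out);
  } else {
    out.push(String(value));
  }
  return out;
}

// Case-insensitive substring match against any collected value.
function matchesAny(values, term) {
  const needle = term.toLowerCase();
  return values.some((v) => v.toLowerCase().includes(needle));
}

const person = {
  name: 'Alexandra',
  age: 27,
  student: true,
  hobbies: ['Tennis', 'Football', 'Party'],
  car: { make: 'Volvo', year: 2012, topspeed: 280 },
};

const leafValues = collectValues(person);
console.log(leafValues);
console.log(matchesAny(leafValues, 'volvo')); // true: found in car.make
```

The search term carries no field name: a hit on 'volvo' is found whether the value sits at the top level or inside the nested car object.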

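Input

The search input is one or more words or phrases, with no specific field name given.

The result is the set of documents that match the search condition.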

https://www.baeldung.com/elasticsearch-full-text-search-rest-api

Full-text search runs linguistic queries against documents. A query consists of one or more words or phrases, and it returns the documents that match the search condition.

ElasticSearch is a search engine based on Apache Lucene, a free and open-source information retrieval software library. It provides a distributed, full-text search engine with an HTTP web interface and schema-free JSON documents.

Compare Full-Text Search queries to the LIKE predicate

https://docs.microsoft.com/en-us/sql/relational-databases/search/full-text-search?view=sql-server-ver15

In contrast to full-text search, the LIKE Transact-SQL predicate works on character patterns only. Also, you cannot use the LIKE predicate to query formatted binary data. Furthermore, a LIKE query against a large amount of unstructured text data is much slower than an equivalent full-text query against the same data. A LIKE query against millions of rows of text data can take minutes to return; whereas a full-text query can take only seconds or less against the same data, depending on the number of rows that are returned.
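The phrase "works on character patterns only" can be illustrated outside SQL. The sketch below contrasts a substring scan, which behaves roughly like LIKE '%art%', with a word-level match; it says nothing about speed and is not how SQL Server implements either feature. The sample rows and the function names are assumptions.

```javascript
// Hypothetical rows of text.
const sampleRows = ['Party at the art gallery', 'A smart cart', 'Modern art'];

// Character-pattern matching: any row containing the substring qualifies.
function substringScan(rows, pattern) {
  const needle = pattern.toLowerCase();
  return rows.filter((row) => row.toLowerCase().includes(needle));
}

// Word-level matching: the row must contain the term as a whole word.
function wordScan(rows, word) {
  const target = word.toLowerCase();
  return rows.filter((row) =>
    (row.toLowerCase().match(/[a-z0-9]+/g) || []).includes(target)
  );
}

console.log(substringScan(sampleRows, 'art')); // all three rows ("Party", "smart", "cart" contain "art")
console.log(wordScan(sampleRows, 'art'));      // only the two rows with the word "art"
```

The substring scan reports 'A smart cart' as a hit even though the word "art" never occurs in it, which is the kind of false positive that word-based full-text matching avoids.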

A full-text index includes one or more character-based columns in a table. These columns can have any of the following data types: char, varchar, nchar, nvarchar, text, ntext, image, xml, or varbinary(max) and FILESTREAM. Each full-text index indexes one or more columns from the table, and each column can use a specific language.

Full-text queries perform linguistic searches against text data in full-text indexes by operating on words and phrases based on the rules of a particular language such as English or Japanese. Full-text queries can include simple words and phrases or multiple forms of a word or phrase. A full-text query returns any documents that contain at least one match (also known as a hit). A match occurs when a target document contains all the terms specified in the full-text query, and meets any other search conditions, such as the distance between the matching terms.
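The match rule quoted above (all terms present, plus optional conditions such as the distance between terms) can be sketched as follows. This is an illustration only; SQL Server's own proximity semantics are richer, and the names toWords, containsAllTerms and withinDistance are hypothetical.

```javascript
// Split text into lowercase words; the array index is the word position.
function toWords(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// True when every query term occurs somewhere in the document.
function containsAllTerms(text, terms) {
  const words = new Set(toWords(text));
  return terms.every((t) => words.has(t.toLowerCase()));
}

// True when some occurrence of termA and some occurrence of termB
// are at most maxGap word positions apart.
function withinDistance(text, termA, termB, maxGap) {
  const a = termA.toLowerCase();
  const b = termB.toLowerCase();
  const posA = [];
  const posB = [];
  toWords(text).forEach((w, i) => {
    if (w === a) posA.push(i);
    if (w === b) posB.push(i);
  });
  return posA.some((i) => posB.some((j) => Math.abs(i - j) <= maxGap));
}

const sample = 'A full-text query returns documents that contain a match';
console.log(containsAllTerms(sample, ['query', 'match']));        // true
console.log(containsAllTerms(sample, ['query', 'index']));        // false: "index" is absent
console.log(withinDistance(sample, 'query', 'documents', 2));     // true: positions 3 and 5
console.log(withinDistance(sample, 'query', 'match', 2));         // false: positions 3 and 9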

Django and full-text search

https://www.cnblogs.com/lexus/archive/2012/06/08/2541277.html

13th February 2009, 11:18 am

Lately I’ve been searching for a simple solution for full-text Model search using Django. Every task up to this point just seemed so easy, so I was a bit surprised to discover there’s no quick, clean and preferred way to go about adding site search functionality in the framework.

So far, the information I read seems to suggest existing solutions are:

  • Based on a dedicated full-text search module
    • djangosearch
      • Supposed to become the official search contrib. Rather recent history (during 2008).
      • It’s a framework over existing, dedicated full-text indexing engines
    • django-sphinx
      • Wrapper around the Sphinx full-text search engine
  • Based on a database engine’s full-text capability (i.e. you must create full-text indexes with the appropriate DB commands)
    • For the MySQL backend, the framework already supports a “fieldname__search” syntax, which translates into a MATCH AGAINST query in SQL.
      • Supports basic boolean operators
      • Reference (look at the conclusion of the article)
    • For PostgreSQL, depending on the version of the engine, there are solutions, but they seem complex relative to the MySQL approach
  • Simplest, but very inefficient: based on a simple LIKE %keyword% query
    • Uses the “fieldname__icontains” filter syntax
    • That’s what I used temporarily to get the feature going in my prototype
Other approaches are mentioned in this thread on StackOverflow.

ES examples

https://cloud.tencent.com/developer/article/1350622

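1. Basic match query

There are two ways to run a full-text match query:

  • Use the Search Lite API, which reads all query parameters from the URL
  • Use a full JSON request body, which lets you use the complete Elasticsearch DSL

Below is a basic match query that finds records in which any field contains "guide":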

GET /bookdb_index/book/_search?q=guide

[Results]
"hits": [
  {
    "_index": "bookdb_index",
    "_type": "book",
    "_id": "1",
    "_score": 0.28168046,
    "_source": {
      "title": "Elasticsearch: The Definitive Guide",
      "authors": ["clinton gormley", "zachary tong"],
      "summary": "A distibuted real-time search and analytics engine",
      "publish_date": "2015-02-07",
      "num_reviews": 20,
      "publisher": "manning"
    }
  },
  {
    "_index": "bookdb_index",
    "_type": "book",
    "_id": "4",
    "_score": 0.24144039,
    "_source": {
      "title": "Solr in Action",
      "authors": ["trey grainger", "timothy potter"],
      "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
      "publish_date": "2014-04-05",
      "num_reviews": 23,
      "publisher": "manning"
    }
  }
]

Below is the full request-body version of the same query, which produces the same results:

{
  "query": {
    "multi_match": {
      "query": "guide",
      "fields": ["_all"]
    }
  }
}

The multi_match keyword is used here in place of match, as a shorthand for running the same query against multiple fields.

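2. Multi-field search

As we have already seen, to search on more than one field (e.g. to return results where the same query string matches in either the title or the summary field), you can use the multi_match query: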

POST /bookdb_index/book/_search
{
  "query": {
    "multi_match": {
      "query": "elasticsearch guide",
      "fields": ["title", "summary"]
    }
  }
}

[Results]
"hits": {
  "total": 3,
  "max_score": 0.9448582,
  "hits": [
    {
      "_index": "bookdb_index",
      "_type": "book",
      "_id": "1",
      "_score": 0.9448582,
      "_source": {
        "title": "Elasticsearch: The Definitive Guide",
        "authors": [
          "clinton gormley",
          "zachary tong"
        ],
        "summary": "A distibuted real-time search and analytics engine",
        "publish_date": "2015-02-07",
        "num_reviews": 20,
        "publisher": "manning"
      }
    },
    {
      "_index": "bookdb_index",
      "_type": "book",
      "_id": "3",
      "_score": 0.17312013,
      "_source": {
        "title": "Elasticsearch in Action",
        "authors": [
          "radu gheorge",
          "matthew lee hinman",
          "roy russo"
        ],
        "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
        "publish_date": "2015-12-03",
        "num_reviews": 18,
        "publisher": "manning"
      }
    },
    {
      "_index": "bookdb_index",
      "_type": "book",
      "_id": "4",
      "_score": 0.14965448,
      "_source": {
        "title": "Solr in Action",
        "authors": [
          "trey grainger",
          "timothy potter"
        ],
        "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
        "publish_date": "2014-04-05",
        "num_reviews": 23,
        "publisher": "manning"
      }
    }
  ]
}

Note: the third hit matches because "guide" is found in its summary field.
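To see which field is responsible for each hit, the three documents from the results can be checked locally. The sketch below only reports which query terms appear in which field; it does not reproduce Elasticsearch's analysis or scoring, and fieldTerms and matchedFields are hypothetical helper names. The titles and summaries are copied verbatim from the results.

```javascript
// Titles and summaries copied from the three hits shown in the results.
const bookDocs = [
  {
    id: '1',
    title: 'Elasticsearch: The Definitive Guide',
    summary: 'A distibuted real-time search and analytics engine',
  },
  {
    id: '3',
    title: 'Elasticsearch in Action',
    summary: 'build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms',
  },
  {
    id: '4',
    title: 'Solr in Action',
    summary: 'Comprehensive guide to implementing a scalable search engine using Apache Solr',
  },
];

// Naive tokenizer: lowercase alphanumeric words.
function fieldTerms(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// For each requested field, list the query terms that occur in it.
// Fields with no matching term are left out of the result.
function matchedFields(doc, query, fields) {
  const queryTerms = fieldTerms(query);
  const result = {};
  for (const field of fields) {
    const words = new Set(fieldTerms(String(doc[field] || '')));
    const found = queryTerms.filter((t) => words.has(t));
    if (found.length > 0) result[field] = found;
  }
  return result;
}

for (const doc of bookDocs) {
  console.log(doc.id, matchedFields(doc, 'elasticsearch guide', ['title', 'summary']));
}
```

For document 4 the only match is "guide" in summary, while document 1 matches both terms in title and document 3 matches "elasticsearch" in both fields.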
