The test data in this article is the www title information of more than one million domains.

  • First, create the counter table:
CREATE TABLE `sph_counter` (
`index_id` tinyint(1) NOT NULL,
`max_id` int(11) NOT NULL,
PRIMARY KEY (`index_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
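The counter table holds one row per index: `index_id` identifies the index and `max_id` records the largest document id seen when the main index was built. The snippet below is a minimal sketch of that high-water-mark idea. It uses Python's built-in sqlite3 in place of MySQL (an assumption made only so the example runs anywhere), runs the same `REPLACE INTO ... SELECT 1, MAX(id)` statement the configuration uses, and assumes index id 1; `record_high_water_mark` is a hypothetical helper name.

```python
import sqlite3

# Stand-in for MySQL: an in-memory SQLite database (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sph_counter (index_id INTEGER PRIMARY KEY, max_id INTEGER NOT NULL)"
)
conn.execute("CREATE TABLE mx_domain_wwwinfo (id INTEGER PRIMARY KEY, title TEXT)")


def record_high_water_mark(conn: sqlite3.Connection, index_id: int = 1) -> int:
    """Run the REPLACE that the main source issues in sql_query_pre and
    return the max_id now stored for index_id."""
    conn.execute(
        "REPLACE INTO sph_counter SELECT ?, MAX(id) FROM mx_domain_wwwinfo",
        (index_id,),
    )
    (max_id,) = conn.execute(
        "SELECT max_id FROM sph_counter WHERE index_id = ?", (index_id,)
    ).fetchone()
    return max_id


conn.executemany(
    "INSERT INTO mx_domain_wwwinfo (id, title) VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c")],
)
first = record_high_water_mark(conn)   # main index built over ids 1..3
conn.execute("INSERT INTO mx_domain_wwwinfo (id, title) VALUES (4, 'd')")
second = record_high_water_mark(conn)  # the next rebuild also sees id 4
rows = conn.execute("SELECT COUNT(*) FROM sph_counter").fetchone()[0]
print(first, second, rows)  # 3 4 1
```

Because `index_id` is the primary key, `REPLACE` overwrites the existing row instead of adding a new one, so the table always holds a single value per index.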
  • Modify the configuration in csft.complex.conf:
# Test configuration: delta index and range queries, implemented in a single file
# Data source
source src
{
# data source type. mandatory, no default value
# known types are mysql, pgsql, mssql, xmlpipe, xmlpipe2, odbc
type = mysql
#####################################################################
## SQL settings (for 'mysql' and 'pgsql' types)
#####################################################################
# some straightforward parameters for SQL source types
sql_host = localhost
sql_user = root
sql_pass = 201671zhuang
sql_db = whomx
sql_port = 3306 # optional, default is 3306
sql_query_pre = SET NAMES utf8
sql_query_pre = SET SESSION query_cache_type=OFF
sql_query_pre = REPLACE INTO sph_counter SELECT 1, MAX(id) FROM mx_domain_wwwinfo
# Fetch the data in ranges instead of pulling the whole table in one query
sql_query_range = SELECT MIN(id), (SELECT max_id FROM sph_counter WHERE index_id = 1) FROM mx_domain_wwwinfo
# Width of each id range (ids covered per query)
sql_range_step =
sql_query = \
SELECT * \
FROM mx_domain_wwwinfo \
WHERE id>=$start AND id<=$end
sql_query_info = SELECT * FROM mx_domain_wwwinfo WHERE id=$id
}
# Delta (incremental) data source, inheriting from src
source moresrc : src
{
sql_query_pre = SET NAMES utf8
sql_query_pre = SET SESSION query_cache_type=OFF
# Delta index: start from the max_id recorded in sph_counter
sql_query_range = SELECT (SELECT max_id FROM sph_counter WHERE index_id = 1), MAX(id) FROM mx_domain_wwwinfo
sql_range_step =
sql_query = \
SELECT * \
FROM mx_domain_wwwinfo \
WHERE id>=$start AND id<=$end
sql_query_post = \
REPLACE INTO sph_counter SELECT 1, MAX(id) FROM mx_domain_wwwinfo
# After the data has been fetched, update the counter row in sph_counter
}
# The rest of the configuration is the same as before
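With `sql_query_range` and `sql_range_step`, indexer first fetches a minimum and a maximum id, then runs `sql_query` several times, substituting consecutive id windows for `$start` and `$end`, so that no single query reads the whole table. The function below is a simplified model of that substitution, not Sphinx's own code; `ranged_queries` is a hypothetical name, and the bounds 1 and 2345 and the step of 1000 are illustrative values chosen for this sketch.

```python
from typing import List, Tuple


def ranged_queries(min_id: int, max_id: int, step: int) -> List[Tuple[int, int]]:
    """Split [min_id, max_id] into inclusive ($start, $end) windows of `step` ids."""
    if step <= 0:
        raise ValueError("sql_range_step must be positive")
    windows = []
    start = min_id
    while start <= max_id:
        end = min(start + step - 1, max_id)  # the last window is clipped at max_id
        windows.append((start, end))
        start = end + 1
    return windows


SQL_QUERY = "SELECT * FROM mx_domain_wwwinfo WHERE id>=$start AND id<=$end"

for start, end in ranged_queries(1, 2345, 1000):
    # One query per window: ids 1-1000, 1001-2000, 2001-2345
    print(SQL_QUERY.replace("$start", str(start)).replace("$end", str(end)))
```

In the main source above, the upper bound comes from `sph_counter` rather than from `MAX(id)`, so rows inserted while the main index is being built fall outside its last window and are left for the delta source.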
  • Rebuild the main index
root@timeless-HP-Pavilion-g4-Notebook-PC:/usr/local/coreseek/bin# /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft.complex.conf --all --rotate
Coreseek Fulltext 4.1 [ Sphinx 2.0.-dev (r2922)]
Copyright (c) -,
Beijing Choice Software Technologies Inc (http://www.coreseek.com)
using config file '/usr/local/coreseek/etc/csft.complex.conf'...
indexing index 'src'...
WARNING: Attribute count is : switching to none docinfo
collected docs, 117.1 MB
sorted 20.5 Mhits, 100.0% done
total docs, bytes
total 39.557 sec, bytes/sec, 37941.06 docs/sec
indexing index 'moresrc'...
WARNING: Attribute count is : switching to none docinfo
collected docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total docs, bytes
total 0.001 sec, bytes/sec, 632.51 docs/sec
total reads, 0.047 sec, 3240.3 kb/call avg, 1.2 msec/call avg
total writes, 0.216 sec, 834.8 kb/call avg, 0.8 msec/call avg
rotating indices: succesfully sent SIGHUP to searchd (pid=).
  • Add test data to the mx_domain_wwwinfo table, then update the delta index
root@timeless-HP-Pavilion-g4-Notebook-PC:/usr/local/coreseek/bin# /usr/local/coreseek/bin/indexer moresrc -c /usr/local/coreseek/etc/csft.complex.conf --rotate
Coreseek Fulltext 4.1 [ Sphinx 2.0.-dev (r2922)]
Copyright (c) -,
Beijing Choice Software Technologies Inc (http://www.coreseek.com)
using config file '/usr/local/coreseek/etc/csft.complex.conf'...
indexing index 'moresrc'...
WARNING: Attribute count is : switching to none docinfo
collected docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total docs, bytes
total 0.013 sec, bytes/sec, 74.83 docs/sec
total reads, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
total writes, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
rotating indices: succesfully sent SIGHUP to searchd (pid=).
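The delta source reuses the counter: its range starts at the stored `max_id` and ends at the current `MAX(id)`, and `sql_query_post` then moves the stored value forward. The sketch below replays that sequence against an in-memory SQLite database (an assumption standing in for MySQL; `delta_pass` is a hypothetical helper, and range stepping is omitted for brevity). Because the delta's `sql_query` uses `id>=$start`, the row whose id equals the stored `max_id` is fetched again on every pass, even though it is already in the previous index.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sph_counter (index_id INTEGER PRIMARY KEY, max_id INTEGER NOT NULL)")
conn.execute("CREATE TABLE mx_domain_wwwinfo (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO mx_domain_wwwinfo (id) VALUES (?)", [(i,) for i in range(1, 6)])
# Main index build: sql_query_pre records MAX(id) = 5 for index_id 1.
conn.execute("REPLACE INTO sph_counter SELECT 1, MAX(id) FROM mx_domain_wwwinfo")


def delta_pass(conn: sqlite3.Connection, index_id: int = 1) -> List[int]:
    """Simulate one 'moresrc' run and return the ids it would index."""
    # sql_query_range: (stored max_id, current MAX(id))
    start, end = conn.execute(
        "SELECT (SELECT max_id FROM sph_counter WHERE index_id = ?), MAX(id) "
        "FROM mx_domain_wwwinfo",
        (index_id,),
    ).fetchone()
    # sql_query with $start and $end substituted
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM mx_domain_wwwinfo WHERE id >= ? AND id <= ? ORDER BY id",
        (start, end),
    )]
    # sql_query_post: move the high-water mark forward
    conn.execute(
        "REPLACE INTO sph_counter SELECT ?, MAX(id) FROM mx_domain_wwwinfo",
        (index_id,),
    )
    return ids


conn.executemany("INSERT INTO mx_domain_wwwinfo (id) VALUES (?)", [(6,), (7,)])
first_pass = delta_pass(conn)   # rows 6 and 7 are new; row 5 is the boundary row
second_pass = delta_pass(conn)  # no new rows; only the boundary row 7
print(first_pass, second_pass)  # [5, 6, 7] [7]
```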
  • Merge the delta index into the main index
root@timeless-HP-Pavilion-g4-Notebook-PC:/usr/local/coreseek/bin# /usr/local/coreseek/bin/indexer --merge src moresrc  -c /usr/local/coreseek/etc/csft.complex.conf --rotate
Coreseek Fulltext 4.1 [ Sphinx 2.0.-dev (r2922)]
Copyright (c) -,
Beijing Choice Software Technologies Inc (http://www.coreseek.com)
using config file '/usr/local/coreseek/etc/csft.complex.conf'...
read 11.5 of 11.5 MB, 100.0% done
merged 2268.5 Kwords
merged in 2.724 sec
total reads, 0.075 sec, 1.5 kb/call avg, 0.0 msec/call avg
total writes, 0.110 sec, 253.4 kb/call avg, 0.2 msec/call avg
rotating indices: succesfully sent SIGHUP to searchd (pid=).
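In `indexer --merge src moresrc`, the first index (`src`) is the destination and the second (`moresrc`) is merged into it; only `src` is rewritten. The sketch below models each index as a dictionary from document id to title, to show why re-fetching a document that is already in the main index does no harm: a duplicate id appears only once in the merged result. That the delta's copy wins for a duplicate id is an assumption based on how the Sphinx manual describes merging, not something visible in the log above; `merge_indexes` is a hypothetical name.

```python
from typing import Dict


def merge_indexes(dst: Dict[int, str], src: Dict[int, str]) -> Dict[int, str]:
    """Toy model of `indexer --merge DST SRC`: every document of SRC is copied
    into DST, and for an id present in both the SRC copy is kept (assumed)."""
    merged = dict(dst)  # leave the caller's main index untouched
    merged.update(src)  # delta documents overwrite duplicates
    return merged


main_index = {1: "title 1", 2: "title 2", 5: "title 5"}
delta_index = {5: "title 5", 6: "title 6", 7: "title 7"}
merged = merge_indexes(main_index, delta_index)
print(sorted(merged))  # [1, 2, 5, 6, 7]
```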
  • Test the delta index (searching for 济南, "Jinan")
root@timeless-HP-Pavilion-g4-Notebook-PC:/usr/local/coreseek/bin# ./search -c /usr/local/coreseek/etc/csft.complex.conf 济南
Coreseek Fulltext 4.1 [ Sphinx 2.0.-dev (r2922)]
Copyright (c) -,
Beijing Choice Software Technologies Inc (http://www.coreseek.com)
using config file '/usr/local/coreseek/etc/csft.complex.conf'...
index 'src': query '济南 ': returned matches of total in 0.001 sec displaying matches:
(20 matching documents are listed here, each with the fields document, weight, id, domain_id, title and addtime; in the terminal the titles appear only as "?" characters)
words:
1. '济南': 11041 documents, 22216 hits
# The delta index takes effect:
index 'moresrc': query '济南 ': returned 1 matches of 1 total in 0.000 sec
displaying matches:
1. document=1501042, weight=1500
id=1501042
domain_id=4
title=???? ??
addtime=
words:
1. '济南': 1 documents, 2 hits

The delta index works: the newly added document is returned by the 'moresrc' index.
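The three indexer commands used above can be wrapped in a small script so that the delta update and merge can be repeated as new rows arrive. A minimal sketch, assuming the script is triggered by an external scheduler such as cron (not covered in this article); the helper names are hypothetical, the paths, index names and flags are the ones from the commands above, and `run` only prints each command unless `dry_run=False`.

```python
import shlex
import subprocess
from typing import List

# Paths and index names taken from the commands in this article.
INDEXER = "/usr/local/coreseek/bin/indexer"
CONF = "/usr/local/coreseek/etc/csft.complex.conf"


def rebuild_main() -> List[str]:
    """Full rebuild of every index (src's sql_query_pre also resets sph_counter)."""
    return [INDEXER, "-c", CONF, "--all", "--rotate"]


def update_delta() -> List[str]:
    """Index only the rows added since the recorded max_id."""
    return [INDEXER, "moresrc", "-c", CONF, "--rotate"]


def merge_delta() -> List[str]:
    """Fold the delta index 'moresrc' into the main index 'src'."""
    return [INDEXER, "--merge", "src", "moresrc", "-c", CONF, "--rotate"]


def run(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command, or execute it and raise if indexer exits non-zero."""
    if dry_run:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # One incremental cycle: update the delta, then merge it into the main index.
    for cmd in (update_delta(), merge_delta()):
        run(cmd)
```

Running the merge only after the delta update succeeds matters here: `check=True` stops the cycle if indexer fails, so a half-built delta is never merged.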

  
