Indexing MongoDB data with Elasticsearch

Three steps:

1. Set up a single-node replica set
2. Install the mongodb-river plugin
3. Create the river meta and verify that it works

Step 1: set up a single-node MongoDB replica set

1. Edit /etc/mongodb.conf
Add two settings:

replSet=rs0 # name of the replica set

oplogSize=100 # size of the oplog in MB (very large values are not supported)

Start mongodb: bin/mongod --fork --logpath /data/db/mongodb.log -f /etc/mongodb.conf

2. Initialize the replica set

root# bin/mongo

>rs.initiate( {"_id" : "rs0", "version" : 1, "members" : [ { "_id" : 0, "host" : "127.0.0.1:27017" } ]})

3. Once the replica set is initialized, exit the mongo shell and log in again; the prompt changes to:

rs0:PRIMARY>

Step 2: install the mongodb-river plugin

Plugin project: https://github.com/richardwilly98/elasticsearch-river-mongodb
Install command:

bin/plugin --install com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb/2.0.0

Then start elasticsearch; on a normal start it logs messages like the following:

root# bin/elasticsearch

...

[2014-03-14 19:28:34,179][INFO ][plugins] [Super Rabbit] loaded [mongodb-river], sites [river-mongodb]

[2014-03-14 19:28:41,032][INFO ][org.elasticsearch.river.mongodb.MongoDBRiver] Starting river mongodb_test

[2014-03-14 19:28:41,087][INFO ][org.elasticsearch.river.mongodb.MongoDBRiver] MongoDB River Plugin - version[2.0.0] - hash[a0c23f1] - time[2014-02-23T20:40:05Z]

[2014-03-14 19:28:41,087][INFO ][org.elasticsearch.river.mongodb.MongoDBRiver] starting mongodb stream. options: secondaryreadpreference [false], drop_collection [false], include_collection [], throttlesize [], gridfs [false], filter [null], db [test], collection [page], script [null], indexing to [test]/[page]

[2014-03-14 19:28:41,303][INFO ][org.elasticsearch.river.mongodb.MongoDBRiver] MongoDB version - 2.2.7

Step 3: create the river meta

1. Create the mongodb connection

root# curl -XPUT "localhost:9200/_river/mongodb_mytest/_meta" -d '
{
  "type": "mongodb",
  "mongodb": {
    "host": "localhost",
    "port": "27017",
    "db": "testdb",
    "collection": "testcollection"
  },
  "index": {
    "name": "testdbindex",
    "type": "testcollection"
  }
}'

{"_index":"_river","_type":"mongodb_mytest","_id":"_meta","_version":1,"created":true}

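A response with created set to true means the river was created. It can also be inspected with curl "http://localhost:9200/_river/mongodb_mytest/_meta".

The meta has three main parts:

type: the river type, here "mongodb"
mongodb: the mongodb connection settings
index: the elasticsearch index and type that receive the mongodb data

Here mongodb_mytest is the ${es.river.name}. Give every river its own name: a second PUT under the same name overwrites the existing one.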
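The three-part _meta document and the river URL can also be assembled in code. The following is only a minimal sketch, assuming Java 11+ for java.net.http; the class name RiverMetaSketch and the helpers buildMeta and metaUrl are hypothetical names introduced for illustration, and the string concatenation does no JSON escaping, so it is only suitable for simple values like the ones used in this article.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper, not part of the river plugin or of elasticsearch.
class RiverMetaSketch {

    // Builds the three-part _meta document: type, mongodb, index.
    // No JSON escaping is done; values must not contain quotes or backslashes.
    static String buildMeta(String host, int port, String db, String collection,
                            String indexName, String indexType) {
        return "{"
            + "\"type\":\"mongodb\","
            + "\"mongodb\":{"
            + "\"host\":\"" + host + "\","
            + "\"port\":\"" + port + "\","
            + "\"db\":\"" + db + "\","
            + "\"collection\":\"" + collection + "\"},"
            + "\"index\":{"
            + "\"name\":\"" + indexName + "\","
            + "\"type\":\"" + indexType + "\"}"
            + "}";
    }

    // The river name is part of the URL, so two rivers with the same name
    // share one _meta document.
    static String metaUrl(String esBase, String riverName) {
        return esBase + "/_river/" + riverName + "/_meta";
    }

    public static void main(String[] args) throws Exception {
        String body = buildMeta("localhost", 27017, "testdb", "testcollection",
                                "testdbindex", "testcollection");
        HttpRequest request = HttpRequest
            .newBuilder(URI.create(metaUrl("http://localhost:9200", "mongodb_mytest")))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(body))
            .build();
        // Needs a running elasticsearch node with the river plugin installed.
        HttpResponse<String> response =
            HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```

Keeping the river name as a separate argument makes it harder to reuse a name by accident when several collections are indexed.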

2. Insert data into mongodb

rs0:PRIMARY> db.testcollection.save({name:"stone"})

3. Run a query

root# curl -XGET 'http://localhost:9200/testdbindex/_search?q=name:stone'

{"took":2,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.30685282,"hits":[{"_index":"testdbindex","_type":"testcollection","_id":"5322eb23fdfc233ffcfa02bb","_score":0.30685282, "_source" : {"_id":"5322eb23fdfc233ffcfa02bb","name":"stone"}}]}}

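A known issue (I could not reproduce it in my tests: documents that already existed in mongodb before the meta was created were indexed too. The original author's workaround is kept below anyway.)

"Changes made after the river is created show up in elasticsearch, but changes made before that are not in the oplog and cannot be synchronized. The workaround is to iterate over the collection to be exported, re-insert its documents into another collection, and point the river at the new collection; every change to the new collection is then recorded in the oplog."

The collection can be iterated with a cursor (the projection {html:0} leaves out the html field):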

var myCursor = db.oldcollection.find( { }, {html:0} );

myCursor.forEach(function(myDoc) {db.newcollection.save(myDoc); });

Appendix: mongodb & mongodb-river (elasticsearch) deployment

Some elasticsearch usage examples follow (an index corresponds to a database, a type to a table).

1. Fetch a single document

curl -XGET 'http://localhost:9200/testdbindex/testcollection/532a45ad94af83f0122292cf'

{"_index":"testdbindex","_type":"testcollection","_id":"532a45ad94af83f0122292cf","_version":1,"found":true, "_source" : {"_id":"532a45ad94af83f0122292cf","name":"stone"}}

2. Fetch several documents

curl 'localhost:9200/testdbindex/testcollection/_mget' -d '{

    "ids":["532a40f51d82291684692d1d","532a45ad94af83f0122292cf"]

}'  

3. Fetch only selected fields (similar to columns in a relational database)

curl -XGET 'http://localhost:9200/testdbindex/testcollection/532a40f51d82291684692d1d?fields=title'

4. Search

curl -XGET 'http://localhost:9200/testdbindex/testcollection/_search' -d '{

    "query":{

        "term" : {"name":"penjin"}

    }

}'

5. Search for name=stone across all types

curl -XGET 'http://localhost:9200/testdbindex/_search?q=name:stone'

6. Search within the type testcollection only

curl -XGET 'http://localhost:9200/testdbindex/testcollection/_search?q=name:stone'
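The request URLs in items 1, 5 and 6 all follow the same index/type layout: a document lives at /index/type/id, and a search runs either on /index/_search (all types) or on /index/type/_search (one type). A minimal sketch of that convention follows; the class EsPathSketch and its methods are hypothetical helpers, and it assumes Java 10+ for URLEncoder.encode(String, Charset). Unlike the curl examples, it percent-encodes the query string.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that only builds URL strings; it sends nothing.
class EsPathSketch {

    // A single document is addressed as /index/type/id.
    static String docUrl(String base, String index, String type, String id) {
        return base + "/" + index + "/" + type + "/" + id;
    }

    // Pass type == null to search across every type of the index.
    static String searchUrl(String base, String index, String type,
                            String field, String value) {
        String path = (type == null) ? index : index + "/" + type;
        // Percent-encode "field:value" so special characters survive in the URL.
        String q = URLEncoder.encode(field + ":" + value, StandardCharsets.UTF_8);
        return base + "/" + path + "/_search?q=" + q;
    }

    public static void main(String[] args) {
        String base = "http://localhost:9200";
        System.out.println(docUrl(base, "testdbindex", "testcollection",
                                  "532a45ad94af83f0122292cf"));
        System.out.println(searchUrl(base, "testdbindex", null, "name", "stone"));
        System.out.println(searchUrl(base, "testdbindex", "testcollection", "name", "stone"));
    }
}
```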

7. Count matching documents

curl -XGET 'http://localhost:9200/testdbindex/testcollection/_count?q=name:stone'

curl -XGET 'http://localhost:9200/testdbindex/_count?q=name:stone'

curl -XGET 'http://localhost:9200/testdbindex/blogs/_count' -d '

{

    "query" : {

        "term" : { "name" : "stone" }

    }

}'

8. Complex queries

/**
* 1. start offset and number of results
* 2. sort order
* 3. fields to return
* 4. query condition
*/

curl -XGET 'http://localhost:9200/testdbindex/blogs/_search' -d '

{

    "from" : 0, "size" : 10,

    "sort" : [

        { "name" : "desc" }

    ],

    "fields" : ["name"],

    "query" : {

        "term" : { "name" : "stone" }

    }

}'
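The four parts of this request body (paging, sorting, returned fields, query condition) map directly onto parameters. The following is a rough sketch of how such a body could be assembled; SearchBodySketch, build and fromForPage are hypothetical names, not an elasticsearch API, and the code does no JSON escaping.

```java
// Hypothetical helper that assembles the search body used above as a string.
class SearchBodySketch {

    // from/size control paging, sort the ordering, fields the returned
    // columns, and the term query is the filter condition.
    static String build(int from, int size, String sortField, String sortOrder,
                        String returnField, String termField, String termValue) {
        return "{"
            + "\"from\":" + from + ","
            + "\"size\":" + size + ","
            + "\"sort\":[{\"" + sortField + "\":\"" + sortOrder + "\"}],"
            + "\"fields\":[\"" + returnField + "\"],"
            + "\"query\":{\"term\":{\"" + termField + "\":\"" + termValue + "\"}}"
            + "}";
    }

    // Offset of a zero-based page: page 0 starts at 0, page 1 at size, and so on.
    static int fromForPage(int page, int size) {
        return page * size;
    }

    public static void main(String[] args) {
        // Third page of ten results per page, same sort and filter as above.
        System.out.println(build(fromForPage(2, 10), 10, "name", "desc",
                                 "name", "name", "stone"));
    }
}
```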

/**
* Depends on the analyzer (tokenization)
*/

curl -XGET 'http://localhost:9200/testdbindex/blogs/_search' -d '

{

    "query" : {

        "match" : {

            "_all" : "stone"

        }

    }

}'

/**
* Similar to a SQL LIKE clause
*/

curl -XGET 'http://localhost:9200/testdbindex/blogs/_search' -d '

{

    "query" : {

        "fuzzy_like_this" : {

            "fields" : ["name"],

            "like_text" : "ston",

            "max_query_terms" : 12

        }

    }

}'

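9. For more advanced queries, see the official elasticsearch documentation

As the indexed data grows, the elasticsearch data directory can become very large. If disk space has to be reclaimed, delete the index; in most cases the better answer is to add disk capacity.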

root# curl -XDELETE 'http://localhost:9200/testdbindex'

root# curl -XDELETE 'http://localhost:9200/_river' (this line is not needed)

{"acknowledged":true}

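Queries and other operations are also straightforward from Java using the jars (this depends on elasticsearch.jar and lucene-core.jar, both found in the lib directory of the unpacked es distribution).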

package com.ciaos;

import java.util.Map.Entry;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;

public class EsDemo {

    private static TransportClient client = null;

    // Connect over the transport port (9300), not the HTTP port (9200).
    public static void getConnection() {
        client = new TransportClient().addTransportAddress(
                new InetSocketTransportAddress("127.0.0.1", 9300));
    }

    public static void searchIndex() {
        QueryBuilder qb = QueryBuilders.termQuery("name", "stone");
        // With SearchType.SCAN the first response carries only a scroll id, no hits.
        SearchResponse scrollResp = client.prepareSearch("testdbindex")
                .setSearchType(SearchType.SCAN)
                .setScroll(new TimeValue(60000))
                .setQuery(qb.buildAsBytes())
                .setSize(100).execute().actionGet();
        // Keep scrolling until a batch comes back empty.
        while (true) {
            scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
                    .setScroll(new TimeValue(600000)).execute().actionGet();
            boolean hitsRead = false;
            for (SearchHit hit : scrollResp.getHits()) {
                hitsRead = true;
                // Print every field of the document source.
                for (Entry<String, Object> field : hit.getSource().entrySet()) {
                    System.out.println(field.getKey() + " : " + field.getValue());
                }
            }
            if (!hitsRead) {
                break;
            }
        }
    }

    public static void main(String[] args) {
        getConnection();
        searchIndex();
        client.close();
    }
}

Output:

_id : 532a49e294af83f0122292d3

name : stone

_id : 532a45ad94af83f0122292cf

name : stone
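The termination logic of the scroll loop in EsDemo (stop as soon as a batch contains no hits) can be tried out without a cluster. The sketch below replaces the elasticsearch client with a plain iterator of batches; ScrollLoopSketch, drain and the doc-N values are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the scroll loop; no elasticsearch classes involved.
class ScrollLoopSketch {

    // Collects batches until the first empty one, mirroring the hitsRead check.
    static List<String> drain(Iterator<List<String>> batches) {
        List<String> all = new ArrayList<>();
        while (batches.hasNext()) {
            List<String> batch = batches.next();
            if (batch.isEmpty()) {
                break; // an empty batch means the scroll is exhausted
            }
            all.addAll(batch);
        }
        return all;
    }

    public static void main(String[] args) {
        // Two non-empty batches followed by the empty one that ends the loop.
        Iterator<List<String>> batches = Arrays.asList(
                Arrays.asList("doc-1", "doc-2"),
                Arrays.asList("doc-3"),
                Collections.<String>emptyList()).iterator();
        System.out.println(drain(batches));
    }
}
```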
