A First Look at Lucene

There are already plenty of introductions to Lucene online, so writing yet another one would probably be redundant.

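Before using Lucene, I had a series of questions:

Why is Lucene faster than a database?
What is an inverted index, and how does it achieve that speed?
What do Lucene's data structures look like, and what mainly drives its CPU and memory consumption?
What do Lucene's indexing and query flows look like?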
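
The inverted-index question can be made concrete with a toy sketch. The code below is only an illustration of the idea (a map from each term to the ids of the documents that contain it), not how Lucene actually stores its data. The class name InvertedIndexSketch and its methods are made up for this example and are not Lucene APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: term -> sorted set of document ids.
// Hypothetical illustration only; not a Lucene class.
final class InvertedIndexSketch
{
	private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
	private int nextDocId = 0;

	// Lowercases the text, splits it at non-letters, and records the
	// document id under every term that occurs in it.
	int addDocument(String text)
	{
		int docId = nextDocId++;
		for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+"))
		{
			if (!term.isEmpty())
			{
				postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
			}
		}
		return docId;
	}

	// A single-term search is one map lookup; no document is scanned.
	List<Integer> search(String term)
	{
		Set<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
		return ids == null ? Collections.<Integer>emptyList() : new ArrayList<>(ids);
	}

	public static void main(String[] args)
	{
		InvertedIndexSketch index = new InvertedIndexSketch();
		index.addDocument("Lucene is a search library");      // doc 0
		index.addDocument("A database scans rows");           // doc 1
		index.addDocument("Lucene builds an inverted index"); // doc 2
		System.out.println(index.search("lucene"));   // [0, 2]
		System.out.println(index.search("database")); // [1]
	}
}
```

The work of splitting text into terms is paid once at indexing time, so a search only has to look up the term. That is the basic reason a term lookup can beat scanning every row for a substring.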

Here are two articles I recommend for a deeper understanding of Lucene.

For the first, see the section comparing Lucene with databases:

http://www.chedong.com/tech/lucene.html

For the second, parts one and two give a partial picture of how Lucene works:

http://blog.csdn.net/forfuture1978/article/details/5668956

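I have read a little of 《Lucene 原理与代码分析》 (Lucene Principles and Code Analysis), but it is fairly difficult.

For now I am working through the book Lucene in Action (《lucene实战》). I am using Lucene 4.7, which may differ somewhat from 3.0.

Below is the example from the first section.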

package com.mitchz.lucence;

import java.io.File;
import java.io.FileFilter;
import java.io.FileReader;
import java.io.IOException;

import org.apache.lucene.analysis.core.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

/**
 * Indexes the readable .txt files of a directory.
 *
 * @author mitchz
 * @version 1.0
 * @since 2014-04-30
 * @category com.mitchz.lucence
 */
public class Indexer
{
	private IndexWriter writer;

	public Indexer(String indexDir) throws IOException
	{
		// Open (or create) the index in a directory on disk.
		Directory dir = FSDirectory.open(new File(indexDir));
		IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_47,
				new SimpleAnalyzer(Version.LUCENE_47));
		writer = new IndexWriter(dir, config);
	}

	public int index(String dataDir, FileFilter filter) throws Exception
	{
		File[] files = (new File(dataDir)).listFiles();
		if (files == null)
		{
			// listFiles() returns null if dataDir is not a readable directory.
			throw new IOException("Cannot list files in " + dataDir);
		}
		for (File file : files)
		{
			// Skip directories, hidden files, unreadable files, and files the filter rejects.
			if (!file.isDirectory() && !file.isHidden() && file.canRead()
					&& (filter == null || filter.accept(file)))
			{
				indexFile(file);
			}
		}
		// Number of documents in the index, including any added by earlier runs.
		return writer.numDocs();
	}

	// Accepts only files whose name ends with .txt (case-insensitive).
	private static class TextFilesFilter implements FileFilter
	{
		@Override
		public boolean accept(File path)
		{
			return path.getName().toLowerCase().endsWith(".txt");
		}
	}

	protected Document getDocument(File file) throws Exception
	{
		Document doc = new Document();
		// File body: tokenized by the analyzer and indexed, but not stored.
		doc.add(new TextField("contents", new FileReader(file)));
		// File name and path: indexed as a single term each, and stored for display.
		doc.add(new StringField("filename", file.getName(), Field.Store.YES));
		doc.add(new StringField("fullpath", file.getCanonicalPath(), Field.Store.YES));
		return doc;
	}

	protected void indexFile(File file) throws Exception
	{
		System.out.println("Indexing " + file.getCanonicalPath());
		Document doc = getDocument(file);
		writer.addDocument(doc);
	}

	protected void close() throws IOException
	{
		// Closing the writer commits the pending changes.
		writer.close();
	}

	public static void main(String[] args) throws Exception
	{
		if (args.length != 2)
		{
			throw new IllegalArgumentException("Usage: java " + Indexer.class.getName()
					+ " <index dir> <data dir>");
		}
		String indexDir = args[0];
		String dataDir = args[1];
		System.out.println("indexDir:" + indexDir);
		System.out.println("dataDir:" + dataDir);

		long start = System.currentTimeMillis();
		Indexer indexer = new Indexer(indexDir);
		int numIndexed;
		try
		{
			numIndexed = indexer.index(dataDir, new TextFilesFilter());
		}
		finally
		{
			indexer.close();
		}
		long end = System.currentTimeMillis();
		System.out.println("Indexing " + numIndexed + " files took " + (end - start)
				+ " milliseconds");
	}
}

package com.mitchz.lucence;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.core.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

/**
 * Searches the index built by Indexer and prints the matching file names.
 *
 * @author mitchz
 * @version 1.0
 * @since 2014-04-30
 * @category com.mitchz.lucence
 */
public class Searcher
{
	public static void main(String[] args) throws IOException, ParseException
	{
		if (args.length != 2)
		{
			throw new IllegalArgumentException("Usage: java " + Searcher.class.getName()
					+ " <index dir> <query>");
		}
		String indexDir = args[0];
		String q = args[1];
		search(indexDir, q);
	}

	public static void search(String indexDir, String q) throws IOException,
			ParseException
	{
		Directory dir = FSDirectory.open(new File(indexDir));
		DirectoryReader dirReader = DirectoryReader.open(dir);
		try
		{
			IndexSearcher is = new IndexSearcher(dirReader);
			// Parse the query with the same analyzer that was used for indexing.
			QueryParser parser = new QueryParser(Version.LUCENE_47, "contents",
					new SimpleAnalyzer(Version.LUCENE_47));
			Query query = parser.parse(q);

			long start = System.currentTimeMillis();
			// Fetch the 10 highest-scoring hits.
			TopDocs hits = is.search(query, 10);
			long end = System.currentTimeMillis();

			System.out.println("Found " + hits.totalHits + " document(s) (in "
					+ (end - start) + " milliseconds) that matched query '" + q + "':");
			for (ScoreDoc scoreDoc : hits.scoreDocs)
			{
				// Load the stored fields of the matching document.
				Document doc = is.doc(scoreDoc.doc);
				System.out.println(doc.get("filename"));
			}
		}
		finally
		{
			// Release the index files held by the reader and the directory.
			dirReader.close();
			dir.close();
		}
	}
}
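
The two classes above treat fields differently: "contents" is a TextField that passes through SimpleAnalyzer, while "filename" and "fullpath" are StringFields indexed as one term each. The pure-Java sketch below only imitates that difference and does not call Lucene. It assumes that SimpleAnalyzer breaks text at non-letter characters and lowercases the pieces, and that a StringField keeps its whole value as a single term. The class AnalyzerSketch and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in that imitates, without calling Lucene,
// how the two field types in the example turn text into terms.
final class AnalyzerSketch
{
	// Imitates SimpleAnalyzer: break at non-letters, lowercase each token.
	static List<String> simpleAnalyze(String text)
	{
		List<String> tokens = new ArrayList<>();
		StringBuilder current = new StringBuilder();
		for (int i = 0; i < text.length(); i++)
		{
			char c = text.charAt(i);
			if (Character.isLetter(c))
			{
				current.append(Character.toLowerCase(c));
			}
			else if (current.length() > 0)
			{
				// A non-letter ends the current token.
				tokens.add(current.toString());
				current.setLength(0);
			}
		}
		if (current.length() > 0)
		{
			tokens.add(current.toString());
		}
		return tokens;
	}

	// Imitates StringField: the whole value becomes exactly one term.
	static List<String> asSingleTerm(String value)
	{
		return Collections.singletonList(value);
	}

	public static void main(String[] args)
	{
		System.out.println(simpleAnalyze("Hello, Lucene 4.7!")); // [hello, lucene]
		System.out.println(asSingleTerm("README.txt"));          // [README.txt]
	}
}
```

Under these assumptions, digits and punctuation never reach the "contents" index, and a query term matches regardless of case because both the indexed text and the query go through the same analyzer.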

The Maven configuration is as follows:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>

	<groupId>com.mitchz</groupId>
	<artifactId>lucence-test</artifactId>
	<version>0.0.1-SNAPSHOT</version>
	<packaging>jar</packaging>

	<name>lucence-test</name>
	<url>http://maven.apache.org</url>

	<properties>
		<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
	</properties>

	<dependencies>
		<dependency>
			<groupId>junit</groupId>
			<artifactId>junit</artifactId>
			<version>3.8.1</version>
			<scope>test</scope>
		</dependency>
		<dependency>
			<groupId>org.apache.lucene</groupId>
			<artifactId>lucene-core</artifactId>
			<version>4.7.0</version>
		</dependency>
		<dependency>
			<groupId>org.apache.lucene</groupId>
			<artifactId>lucene-analyzers-common</artifactId>
			<version>4.7.0</version>
		</dependency>
		<dependency>
			<groupId>org.apache.lucene</groupId>
			<artifactId>lucene-queryparser</artifactId>
			<version>4.7.0</version>
		</dependency>
	</dependencies>
</project>
