Solr Caching

Original source: http://my.oschina.net/u/1026644/blog/123957

Solr builds several caches on top of Lucene. The cache types it currently provides are:

(1)filterCache

(2)documentCache

(3)fieldValueCache

(4)queryResultCache

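Each cache type serves a specific kind of query request. This article looks at how these caches are used in Solr from the following angles:

(1) Cache lifecycle

(2) Cache usage scenarios

(3) Cache configuration

(4) Cache hit monitoring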

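1 Cache lifecycle

The lifecycle of every cache is managed by SolrIndexSearcher. Whenever the SolrIndexSearcher that owns a cache is rebuilt, the cache objects currently in use become invalid. Several things can cause a SolrIndexSearcher to be reopened.

(1) A commit after incremental updates, DirectUpdateHandler2.commit(CommitUpdateCommand cmd). The body of that method is as follows: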

    if (cmd.optimize) {
      optimizeCommands.incrementAndGet();
    } else {
      commitCommands.incrementAndGet();
      if (cmd.expungeDeletes) expungeDeleteCommands.incrementAndGet();
    }

    Future[] waitSearcher = null;
    // whether to wait for the new SolrIndexSearcher to open; a new searcher
    // usually does some preparatory work first, such as warming its caches
    if (cmd.waitSearcher) {
      waitSearcher = new Future[1];
    }

    boolean error=true;
    iwCommit.lock();
    try {
      log.info("start "+cmd);

      if (cmd.optimize) { // whether to optimize the index; incremental updates usually do not
        openWriter();
        writer.optimize(cmd.maxOptimizeSegments);
      } else if (cmd.expungeDeletes) {
        openWriter();
        // physically removes documents that are only marked as deleted; optimize
        // removes them too, but this method is much faster than optimize
        writer.expungeDeletes();
      }

      closeWriter(); // close the writer that was opened for the incremental update

      callPostCommitCallbacks();
      if (cmd.optimize) { // this part runs if listeners are registered
        callPostOptimizeCallbacks();
      }
      // open a new searcher in the sync block to avoid opening it
      // after a deleteByQuery changed the index, or in between deletes
      // and adds of another commit being done.
      // This is the key call that reopens the searcher; its parameters
      // decide whether the IndexReader is newly opened or reopened.
      core.getSearcher(true,false,waitSearcher);

      // reset commit tracking
      tracker.didCommit(); // feeds some of the status monitoring exposed through MBeans

      log.info("end_commit_flush");

      error=false;
    }
    finally { // after the commit, reset some monitoring counters to 0
      iwCommit.unlock();
      addCommands.set(0);
      deleteByIdCommands.set(0);
      deleteByQueryCommands.set(0);
      numErrors.set(error ? 1 : 0);
    }

    // if we are supposed to wait for the searcher to be registered, then we should do it
    // outside of the synchronized block so that other update operations can proceed.
    if (waitSearcher!=null && waitSearcher[0] != null) {
       try {
        waitSearcher[0].get(); // wait for the searcher to finish its preparation, e.g. cache warming
      } catch (InterruptedException e) {
        SolrException.log(log,e);
      } catch (ExecutionException e) {
        SolrException.log(log,e);
      }
    }
  }

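The most important call in it is

core.getSearcher(true,false,waitSearcher);

Its parameters have the following meanings:

Parameter 1, boolean forceNew: whether to open a new searcher object.

Parameter 2, boolean returnSearcher: whether to return the newest searcher object.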

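Parameter 3, final Future[] waitSearcher: whether to wait for the searcher's preparatory work. The thread that calls this method waits until the searcher has finished preparing, so if the searcher manages many caches with large autowarm counts, the call can take a long time to return. (A note on warming, since the term may be unfamiliar. Suppose a cache already holds many entries. When a new IndexSearcher is built, its caches start out empty and have to accumulate data again, so search performance drops sharply for a while and the hit ratio recovers only gradually as the caches refill. That is usually unacceptable. What we can do instead is take the keys held by the old cache and, before the rebuilt SolrIndexSearcher is returned, reload the values for those keys from disk into the new cache. The SolrIndexSearcher that gets exposed then has a cache that was populated behind the scenes rather than an empty one.)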
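The warming idea described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not Solr's implementation: `LruCache`, `WarmingSketch`, and the `regenerate` function are hypothetical names, and the sketch assumes that the most recently used keys are the ones worth warming. Note that values are recomputed for the new cache rather than copied, because the old values may no longer match the new index.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical LRU cache used only for this illustration; it is not Solr's LRUCache.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    LruCache(int maxSize) {
        super(16, 0.75f, true); // access order: iteration runs from least to most recently used
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize; // evict the least recently used entry when over capacity
    }
}

class WarmingSketch {
    // Populates newCache with up to autowarmCount of the most recently used keys of oldCache.
    // Each value is regenerated (for example by re-running the query against the new index).
    static <K, V> void warm(LruCache<K, V> oldCache, LruCache<K, V> newCache,
                            int autowarmCount, Function<K, V> regenerate) {
        List<K> keys = new ArrayList<>(oldCache.keySet()); // least recently used first
        int start = Math.max(0, keys.size() - autowarmCount);
        for (K key : keys.subList(start, keys.size())) {
            newCache.put(key, regenerate.apply(key));
        }
    }

    public static void main(String[] args) {
        LruCache<String, Integer> oldCache = new LruCache<>(10);
        oldCache.put("status:1", 100);
        oldCache.put("type:book", 42);
        oldCache.put("price:[0 TO 10]", 7);

        LruCache<String, Integer> newCache = new LruCache<>(10);
        // Stand-in for "re-execute the query": here the value is just the key length.
        warm(oldCache, newCache, 2, key -> key.length());
        System.out.println("warmed keys: " + newCache.keySet());
    }
}
```

With an autowarm count of 2, only the two most recently used keys are carried over, which is the trade-off the autowarmCount setting controls: more warmed entries mean a better initial hit ratio but a slower searcher switch.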

getSearcher() contains two code sections that matter most for caches. The first builds the new SolrIndexSearcher:

      newestSearcher = getNewestSearcher(false);
      String newIndexDir = getNewIndexDir();
      File indexDirFile = new File(getIndexDir()).getCanonicalFile();
      File newIndexDirFile = new File(newIndexDir).getCanonicalFile();

      // reopenReaders is configured in solrconfig.xml; if it is false,
      // a brand-new IndexReader is opened every time
      if (newestSearcher != null && solrConfig.reopenReaders
          && indexDirFile.equals(newIndexDirFile)) {
        IndexReader currentReader = newestSearcher.get().getReader();
        IndexReader newReader = currentReader.reopen(); // index directory unchanged: reopen the IndexReader
        if (newReader == currentReader) {
          currentReader.incRef();
        }
        tmp = new SolrIndexSearcher(this, schema, "main", newReader, true, true); // build the new SolrIndexSearcher
      } else { // otherwise obtain an IndexReader from the configured IndexReaderFactory
        IndexReader reader = getIndexReaderFactory().newReader(getDirectoryFactory().open(newIndexDir), true);
        tmp = new SolrIndexSearcher(this, schema, "main", reader, true, true); // build the new SolrIndexSearcher
      }

Next, look at the cache-related part of the SolrIndexSearcher constructor:

    if (cachingEnabled) { // caching is enabled when the last constructor argument is true
      ArrayList<SolrCache> clist = new ArrayList<SolrCache>();
      fieldValueCache = solrConfig.fieldValueCacheConfig==null ? null : solrConfig.fieldValueCacheConfig.newInstance();
      if (fieldValueCache!=null) clist.add(fieldValueCache); // solrconfig declares <fieldValueCache ...>: build a new cache
      filterCache= solrConfig.filterCacheConfig==null ? null : solrConfig.filterCacheConfig.newInstance();
      if (filterCache!=null) clist.add(filterCache); // solrconfig declares <filterCache ...>: build a new cache
      queryResultCache = solrConfig.queryResultCacheConfig==null ? null : solrConfig.queryResultCacheConfig.newInstance();
      if (queryResultCache!=null) clist.add(queryResultCache); // solrconfig declares <queryResultCache ...>: build a new cache
      documentCache = solrConfig.documentCacheConfig==null ? null : solrConfig.documentCacheConfig.newInstance();
      if (documentCache!=null) clist.add(documentCache); // solrconfig declares <documentCache ...>: build a new cache

      if (solrConfig.userCacheConfigs == null) {
        cacheMap = noGenericCaches;
      } else { // user-defined caches
        cacheMap = new HashMap<String,SolrCache>(solrConfig.userCacheConfigs.length);
        for (CacheConfig userCacheConfig : solrConfig.userCacheConfigs) {
          SolrCache cache = null;
          if (userCacheConfig != null) cache = userCacheConfig.newInstance();
          if (cache != null) {
            cacheMap.put(cache.name(), cache);
            clist.add(cache);
          }
        }
      }

      cacheList = clist.toArray(new SolrCache[clist.size()]);
    }

The second warms the new searcher's caches from the caches of the old searcher:

        future = searcherExecutor.submit(
                new Callable() {
                  public Object call() throws Exception {
                    try {
                      newSearcher.warm(currSearcher);
                    } catch (Throwable e) {
                      SolrException.logOnce(log,null,e);
                    }
                    return null;
                  }
                }
        );
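The interplay between this background warming task and the waitSearcher flag can be illustrated with plain java.util.concurrent classes. The following is a minimal sketch, not Solr code: `WaitSearcherSketch` and `openSearcher` are hypothetical names, and the warm-up is just a Runnable standing in for newSearcher.warm(currSearcher).

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

class WaitSearcherSketch {
    // Submits the warm-up to a background executor. The caller blocks until
    // warming has finished only when waitSearcher is true.
    static Future<?> openSearcher(ExecutorService executor, Runnable warmUp, boolean waitSearcher) {
        Future<?> future = executor.submit(warmUp);
        if (waitSearcher) {
            try {
                future.get(); // blocks for as long as the warm-up takes
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for warm-up", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("warm-up failed", e.getCause());
            }
        }
        return future;
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        AtomicBoolean warmed = new AtomicBoolean(false);
        try {
            openSearcher(executor, () -> warmed.set(true), true);
            System.out.println("warmed before return: " + warmed.get());
        } finally {
            executor.shutdown();
        }
    }
}
```

A caller that passes false gets control back immediately and keeps serving from the old searcher while warming continues, which is why a large autowarm count mainly delays the visibility of new data rather than blocking every update.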

Now look at the warm(SolrIndexSearcher old) method (how each cache is actually warmed will be covered in a separate article):

  public void warm(SolrIndexSearcher old) throws IOException {
    // Make sure this is first!  filters can help queryResults execute!
    boolean logme = log.isInfoEnabled();
    long warmingStartTime = System.currentTimeMillis();
    // warm the caches in order...
    for (int i=0; i<cacheList.length; i++) { // iterate over all configured caches and warm each one from old to new
      if (logme) log.info("autowarming " + this + " from " + old + "\n\t" + old.cacheList[i]);
      this.cacheList[i].warm(this, old.cacheList[i]);
      if (logme) log.info("autowarming result for " + this + "\n\t" + this.cacheList[i]);
    }
    warmupTime = System.currentTimeMillis() - warmingStartTime; // total time spent warming
  }

That completes cache creation in SolrIndexSearcher. Caches are destroyed together with the searcher when it is closed; see SolrIndexSearcher.close():

  public void close() throws IOException {
    if (cachingEnabled) {
      StringBuilder sb = new StringBuilder();
      sb.append("Closing ").append(name);
      for (SolrCache cache : cacheList) {
        sb.append("\n\t");
        sb.append(cache);
      }
      log.info(sb.toString()); // logs cache state, e.g. current hit ratio, cumulative hit ratio, size
    } else {
      log.debug("Closing " + name);
    }
    core.getInfoRegistry().remove(name);

    // super.close();
    // can't use super.close() since it just calls reader.close() and that may only be called once
    // per reader (even if incRef() was previously called).
    if (closeReader) reader.decRef(); // decrement the reader's reference count

    for (SolrCache cache : cacheList) {
      cache.close(); // close the cache
    }

    // do this at the end so it only gets done if there are no exceptions
    numCloses.incrementAndGet();
  }

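That covers the full logic by which SolrIndexSearcher manages its caches.

2 Cache usage scenarios

(1) filterCache

This cache mainly serves queries that use fq: the result of each fq is put into the cache. If the business has many relatively fixed query conditions, for example a fixed status value or a fixed range, they can be expressed as fq so that their results are cached. A query may carry several fq parameters and each one is cached, but note that the results of multiple fq parameters are combined by intersection.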
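The behavior described above, caching one document set per fq and intersecting them, can be sketched as follows. This is a simplified illustration, not Solr's filterCache: `FilterCacheSketch` is a hypothetical class, document sets are modeled as java.util.BitSet, and the function that "executes" a filter is a stand-in for running the filter query against the index.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a filter cache: maps each fq string to the set of matching doc ids.
class FilterCacheSketch {
    private final Map<String, BitSet> cache = new HashMap<>();
    private final Function<String, BitSet> executeFilter; // runs one filter against the index
    private int executions = 0;                           // how often a filter was really executed

    FilterCacheSketch(Function<String, BitSet> executeFilter) {
        this.executeFilter = executeFilter;
    }

    int executions() {
        return executions;
    }

    // Returns the cached doc set for one fq, executing the filter only on a cache miss.
    BitSet docSet(String fq) {
        BitSet docs = cache.get(fq);
        if (docs == null) {
            executions++;
            docs = executeFilter.apply(fq);
            cache.put(fq, docs);
        }
        return docs;
    }

    // Combines several fq clauses by intersection without modifying the cached sets.
    BitSet intersect(List<String> fqs, int maxDoc) {
        BitSet result = new BitSet(maxDoc);
        result.set(0, maxDoc); // start with all documents
        for (String fq : fqs) {
            result.and(docSet(fq));
        }
        return result;
    }

    public static void main(String[] args) {
        final int maxDoc = 6;
        // Toy "index": status:1 matches even doc ids, any other fq matches doc ids below 3.
        FilterCacheSketch filters = new FilterCacheSketch(fq -> {
            BitSet docs = new BitSet(maxDoc);
            for (int doc = 0; doc < maxDoc; doc++) {
                boolean match = fq.equals("status:1") ? doc % 2 == 0 : doc < 3;
                if (match) docs.set(doc);
            }
            return docs;
        });
        System.out.println(filters.intersect(Arrays.asList("status:1", "type:book"), maxDoc));
        System.out.println(filters.intersect(Arrays.asList("type:book", "status:1"), maxDoc));
        System.out.println("filter executions: " + filters.executions());
    }
}
```

Because each fq is cached on its own, the second request reuses both cached sets even though the clauses arrive in a different order. This is the reason to split fixed conditions into separate fq parameters instead of folding them into the main query.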

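There is one more, very important, case. If useFilterForSortedQuery=true is set in Solr, filterCache is not null, and the query carries a sort (the code below excludes sorting by score), the following code block is entered: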

    if ((flags & (GET_SCORES|NO_CHECK_FILTERCACHE))==0 && useFilterForSortedQuery && cmd.getSort() != null && filterCache != null) {
      useFilterCache=true;
      SortField[] sfields = cmd.getSort().getSort();
      for (SortField sf : sfields) {
        if (sf.getType() == SortField.SCORE) {
          useFilterCache=false;
          break;
        }
      }
    }

    // disable useFilterCache optimization temporarily
    if (useFilterCache) {
      // now actually use the filter cache.
      // for large filters that match few documents, this may be
      // slower than simply re-executing the query.
      if (out.docSet == null) { // getDocSet also caches the result of the main query in filterCache
        out.docSet = getDocSet(cmd.getQuery(),cmd.getFilter());
        DocSet bigFilt = getDocSet(cmd.getFilterList()); // if fq is present, its result is cached in filterCache
        if (bigFilt != null) out.docSet = out.docSet.intersection(bigFilt); // intersection of the two result sets
      }
      // todo: there could be a sortDocSet that could take a list of
      // the filters instead of anding them first...
      // perhaps there should be a multi-docset-iterator
      superset = sortDocSet(out.docSet,cmd.getSort(),supersetMaxDoc); // sort
      out.docList = superset.subset(cmd.getOffset(),cmd.getLen()); // return a result set of size len

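(2) documentCache caches the Document objects returned in results. In general, unless queries are fairly fixed, its hit ratio will not be high.

(3) fieldValueCache caches the counting data for multiValued=true fields when the facet component is used. It should definitely be enabled when faceting on multi-valued fields. It mainly caches the following (see the implementation of UnInvertedField):

maxTermCounts: the maximum term count

numTermsInField: the number of terms in the field

bigTerms: the terms whose docFreq exceeds the threshold

tnums: a two-dimensional array recording terms and their term numbers

Each time FacetComponent runs its process method, the call chain is SimpleFacets.getFacetCounts() -> getFacetFieldCounts() -> getTermCounts(facetValue) -> UnInvertedField.getUnInvertedField(field, searcher). That last method looks like this: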

  public static UnInvertedField getUnInvertedField(String field, SolrIndexSearcher searcher) throws IOException {
    SolrCache cache = searcher.getFieldValueCache();
    if (cache == null) {
      return new UnInvertedField(field, searcher); // no cache configured: build and return directly
    }

    UnInvertedField uif = (UnInvertedField)cache.get(field);
    if (uif == null) { // first access: initialize the UnInvertedField for this field
      synchronized (cache) {
        uif = (UnInvertedField)cache.get(field);
        if (uif == null) {
          uif = new UnInvertedField(field, searcher);
          cache.put(field, uif);
        }
      }
    }

    return uif;
  }
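The method above is a "build once per field, then reuse" pattern implemented with check, lock, and re-check. For readers who want to experiment with the same idea outside Solr, here is a minimal sketch that uses ConcurrentHashMap.computeIfAbsent instead of explicit locking. It swaps in a different technique from the Solr source, and `PerFieldCacheSketch` and its `build` function are hypothetical names standing in for the cache and for the expensive UnInvertedField construction.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical per-field cache: the expensive structure is built at most once per field name.
class PerFieldCacheSketch<V> {
    private final ConcurrentHashMap<String, V> cache = new ConcurrentHashMap<>();
    private final Function<String, V> build;                   // expensive construction step
    private final AtomicInteger builds = new AtomicInteger();  // counts real constructions

    PerFieldCacheSketch(Function<String, V> build) {
        this.build = build;
    }

    // Returns the cached value for the field, building it only if it is absent.
    V get(String field) {
        return cache.computeIfAbsent(field, f -> {
            builds.incrementAndGet();
            return build.apply(f);
        });
    }

    int builds() {
        return builds.get();
    }

    public static void main(String[] args) {
        PerFieldCacheSketch<String> cache = new PerFieldCacheSketch<>(f -> "uninverted:" + f);
        cache.get("tags");
        cache.get("tags");      // served from the cache, no second build
        cache.get("category");
        System.out.println("builds: " + cache.builds());
    }
}
```

Either way, the first facet request on a multi-valued field pays the construction cost and later requests reuse the result, which is why an empty fieldValueCache after a searcher switch is so noticeable for facet queries.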

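(4) queryResultCache caches query results. It is used mainly in the getDocListC() method of SolrIndexSearcher and caches result sets keyed by QueryResultKey. In other words, queries with equal QueryResultKey values can hit the cache, so let us see how the equals method of QueryResultKey decides whether two keys are equal: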

  public boolean equals(Object o) {
    if (o==this) return true;
    if (!(o instanceof QueryResultKey)) return false;
    QueryResultKey other = (QueryResultKey)o;

    // fast check of the whole hash code... most hash tables will only use
    // some of the bits, so if this is a hash collision, it's still likely
    // that the full cached hash code will be different.
    if (this.hc != other.hc) return false;

    // check for the thing most likely to be different (and the fastest things)
    // first.
    if (this.sfields.length != other.sfields.length) return false; // compare the number of sort fields
    if (!this.query.equals(other.query)) return false; // compare the query
    if (!isEqual(this.filters, other.filters)) return false; // compare the fq list

    for (int i=0; i<sfields.length; i++) {
      SortField sf1 = this.sfields[i];
      SortField sf2 = other.sfields[i];
      if (!sf1.equals(sf2)) return false; // compare each sort field
    }

    return true;
  }

As the code above shows, a lookup hits the queryResultCache only when the query, the filter queries, and the sort fields all match.
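To see what that rule means in practice, the following self-contained sketch models a composite cache key in the same spirit. It is an illustration only: `ResultKeySketch` is a hypothetical class that represents the query, filters, and sort fields as plain strings, and it compares the filter list in order, which is an assumption of this sketch rather than a statement about Solr's isEqual.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical composite key: two keys are equal only if query, filters, and sort all match.
final class ResultKeySketch {
    private final String query;
    private final List<String> filters;
    private final String[] sortFields;
    private final int hash; // precomputed once, used as a fast first check in equals

    ResultKeySketch(String query, List<String> filters, String... sortFields) {
        this.query = query;
        this.filters = new ArrayList<>(filters);
        this.sortFields = sortFields.clone();
        int h = query.hashCode();
        h = 31 * h + this.filters.hashCode();
        h = 31 * h + Arrays.hashCode(this.sortFields);
        this.hash = h;
    }

    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public boolean equals(Object o) {
        if (o == this) return true;
        if (!(o instanceof ResultKeySketch)) return false;
        ResultKeySketch other = (ResultKeySketch) o;
        if (hash != other.hash) return false; // cheap rejection before the field comparisons
        return query.equals(other.query)
            && filters.equals(other.filters)
            && Arrays.equals(sortFields, other.sortFields);
    }

    public static void main(String[] args) {
        Map<ResultKeySketch, int[]> cache = new HashMap<>();
        cache.put(new ResultKeySketch("title:solr", Arrays.asList("status:1"), "price asc"),
                  new int[] {3, 7, 9});

        ResultKeySketch same = new ResultKeySketch("title:solr", Arrays.asList("status:1"), "price asc");
        ResultKeySketch otherSort = new ResultKeySketch("title:solr", Arrays.asList("status:1"), "price desc");
        System.out.println("same key hits: " + (cache.get(same) != null));
        System.out.println("different sort hits: " + (cache.get(otherSort) != null));
    }
}
```

The practical consequence is that requests which differ only in sort order or in one fq value each occupy their own entry, so highly variable queries fill the queryResultCache without producing hits.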

3 Cache configuration

To use Solr's four caches, configure the following in solrconfig.xml:

<query>
        <filterCache           size="300"       initialSize="10"       autowarmCount="300"/>
        <queryResultCache      size="300"       initialSize="10"       autowarmCount="300"/>
        <fieldValueCache       size="300"       initialSize="10"       autowarmCount="300"/>
        <documentCache         size="5000"      initialSize="512"      autowarmCount="300"/>
        <!-- key setting that decides whether filterCache can be used for sorted queries -->
        <useFilterForSortedQuery>true</useFilterForSortedQuery>
        <!-- controls the window size of cached query results -->
        <queryResultWindowSize>50</queryResultWindowSize>
        <!-- whether to enable lazy field loading -->
        <enableLazyFieldLoading>false</enableLazyFieldLoading>
</query>

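Here size is the maximum size of the cache and initialSize is its initial size. autowarmCount is the most important parameter: it is the number of entries that a background thread warms into the new cache each time a new SolrIndexSearcher is built.

Is a larger autowarm count always better? It depends on the situation. In a real-time application, many implementations obtain a reopened in-memory IndexReader at very short intervals, and Solr's default implementation opens a new SolrIndexSearcher each time. The more entries the caches have to warm, the slower the new SolrIndexSearcher opens, which badly hurts real-time behavior.

But if the count is set very low, every new SolrIndexSearcher starts with nearly empty caches, and fq and facet queries will hardly ever hit the cache. Real-time applications need to pay particular attention to this trade-off.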

4 Cache hit monitoring

The statistics can be viewed on the admin page:

http://localhost:8080/XXXX/XXXX/admin/stats.jsp

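On that page, lookups is the number of lookups against the current cache, hitratio is its hit ratio, inserts is the number of insertions, evictions is the number of entries evicted from the cache, size is the number of entries currently cached, and warmupTime is the time spent warming the current cache. The values prefixed with cumulative are the running totals for that cache type: lookups, hits, hit ratio, inserts, and evictions.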
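The relationship between these counters is simple arithmetic: the hit ratio is hits divided by lookups. The sketch below shows one way such counters could be kept. It is a hypothetical illustration (`CacheStatsSketch` is not a Solr class) and makes no claim about how Solr computes or rounds the values shown on the stats page.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical counters mirroring the lookups / hits / hitratio figures of a cache.
class CacheStatsSketch {
    private final AtomicLong lookups = new AtomicLong();
    private final AtomicLong hits = new AtomicLong();

    // Records one cache lookup and whether it was answered from the cache.
    void record(boolean hit) {
        lookups.incrementAndGet();
        if (hit) hits.incrementAndGet();
    }

    long lookups() {
        return lookups.get();
    }

    long hits() {
        return hits.get();
    }

    // Fraction of lookups served from the cache; 0.0 when nothing has been looked up yet.
    double hitRatio() {
        long total = lookups.get();
        return total == 0 ? 0.0 : (double) hits.get() / total;
    }

    public static void main(String[] args) {
        CacheStatsSketch stats = new CacheStatsSketch();
        stats.record(false); // first request misses and fills the cache
        stats.record(true);
        stats.record(true);
        stats.record(true);
        System.out.println("lookups=" + stats.lookups() + " hits=" + stats.hits()
                + " hitratio=" + stats.hitRatio());
    }
}
```

When reading the real page, a low hitratio combined with a high evictions count suggests the cache is too small for the query mix, while a low hitratio with few evictions suggests the queries simply do not repeat.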
