When the information to be searched is stored in a database, Solr cannot query the database directly. Instead, a Solr component must first index that information on the search server, after which it can be served to clients.

1. SolrDispatchFilter

The role of SolrDispatchFilter is to map the request URL onto a handler defined in solrconfig.xml. For example, a request to /solr/mycore/select is resolved to the SolrCore named mycore and then to the handler registered at /select in that core's solrconfig.xml.

The actions it can take are:

  enum Action {
    PASSTHROUGH, FORWARD, RETURN, RETRY, ADMIN, REMOTEQUERY, PROCESS
  }

PASSTHROUGH: pass the request on through the webapp to the Restlet layer.

FORWARD: forward the rewritten URL (without the path prefix and core/collection name) to the Restlet layer.

RETURN: return control with no further processing; this is typically produced after an error has been set and the response already written.

RETRY: retry the request; it is set when no working core is found.

ADMIN: the request targets the admin API and is handled by handleAdminRequest() (see call() below).

REMOTEQUERY: the request is proxied to a remote core via remoteQuery(coreUrl + path, ...) (see call() below).

PROCESS: the request is processed locally by the matching request handler.

Note: the "core" here is a SolrCore obtained from the CoreContainer.

SolrDispatchFilter indirectly implements javax.servlet.Filter; its core logic lives in doFilter():

  public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain, boolean retry) throws IOException, ServletException {
    if (!(request instanceof HttpServletRequest)) return;

    AtomicReference<ServletRequest> wrappedRequest = new AtomicReference<>();
    if (!authenticateRequest(request, response, wrappedRequest)) { // the response and status code have already been sent
      return;
    }
    if (wrappedRequest.get() != null) {
      request = wrappedRequest.get();
    }
    if (cores.getAuthenticationPlugin() != null) {
      log.debug("User principal: {}", ((HttpServletRequest) request).getUserPrincipal());
    }

    // No need to even create the HttpSolrCall object if this path is excluded.
    if (excludePatterns != null) {
      String servletPath = ((HttpServletRequest) request).getServletPath();
      for (Pattern p : excludePatterns) {
        Matcher matcher = p.matcher(servletPath);
        if (matcher.lookingAt()) {
          chain.doFilter(request, response);
          return;
        }
      }
    }

    HttpSolrCall call = getHttpSolrCall((HttpServletRequest) request, (HttpServletResponse) response, retry);
    try {
      Action result = call.call();
      switch (result) {
        case PASSTHROUGH:
          chain.doFilter(request, response);
          break;
        case RETRY:
          doFilter(request, response, chain, true);
          break;
        case FORWARD:
          request.getRequestDispatcher(call.getPath()).forward(request, response);
          break;
      }
    } finally {
      call.destroy();
    }
  }

SolrDispatchFilter then delegates to HttpSolrCall's call() method for the actual handling.

2. Handling the request with HttpSolrCall

The HttpSolrCall constructor:

  public HttpSolrCall(SolrDispatchFilter solrDispatchFilter, CoreContainer cores,
                      HttpServletRequest request, HttpServletResponse response, boolean retry) {
    this.solrDispatchFilter = solrDispatchFilter;
    this.cores = cores;
    this.req = request;
    this.response = response;
    this.retry = retry;
    this.requestType = RequestType.UNKNOWN;
    queryParams = SolrRequestParsers.parseQueryString(req.getQueryString());
  }
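
The last line already splits the raw query string into SolrParams. A minimal standalone sketch of that parsing step, assuming the solr-core jar is on the classpath and using a made-up query string:

import org.apache.solr.common.params.SolrParams;
import org.apache.solr.servlet.SolrRequestParsers;

public class QueryStringSketch {
  public static void main(String[] args) {
    // Same call the constructor makes on req.getQueryString(); the string here is hypothetical.
    SolrParams qp = SolrRequestParsers.parseQueryString("q=*:*&rows=10&fl=id,name");
    System.out.println(qp.get("q"));    // *:*
    System.out.println(qp.get("rows")); // 10
  }
}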

The call() method carries out the complete request handling:

  /**
   * This method processes the request.
   */
  public Action call() throws IOException {
    MDCLoggingContext.reset();
    MDCLoggingContext.setNode(cores);

    if (cores == null) {
      sendError(503, "Server is shutting down or failed to initialize");
      return RETURN;
    }

    if (solrDispatchFilter.abortErrorMessage != null) {
      sendError(500, solrDispatchFilter.abortErrorMessage);
      return RETURN;
    }

    try {
      init();
      /* Authorize the request if
         1. Authorization is enabled, and
         2. The requested resource is not a known static file
      */
      if (cores.getAuthorizationPlugin() != null) {
        AuthorizationContext context = getAuthCtx();
        log.info(context.toString());
        AuthorizationResponse authResponse = cores.getAuthorizationPlugin().authorize(context);
        if (!(authResponse.statusCode == HttpStatus.SC_ACCEPTED) && !(authResponse.statusCode == HttpStatus.SC_OK)) {
          sendError(authResponse.statusCode,
              "Unauthorized request, Response code: " + authResponse.statusCode);
          return RETURN;
        }
      }

      HttpServletResponse resp = response;
      switch (action) {
        case ADMIN:
          handleAdminRequest();
          return RETURN;
        case REMOTEQUERY:
          remoteQuery(coreUrl + path, resp);
          return RETURN;
        case PROCESS:
          final Method reqMethod = Method.getMethod(req.getMethod());
          HttpCacheHeaderUtil.setCacheControlHeader(config, resp, reqMethod);
          // unless we have been explicitly told not to, do cache validation
          // if we fail cache validation, execute the query
          if (config.getHttpCachingConfig().isNever304() ||
              !HttpCacheHeaderUtil.doCacheHeaderValidation(solrReq, req, reqMethod, resp)) {
            SolrQueryResponse solrRsp = new SolrQueryResponse();
            /* even for HEAD requests, we need to execute the handler to
             * ensure we don't get an error (and to make sure the correct
             * QueryResponseWriter is selected and we get the correct
             * Content-Type)
             */
            SolrRequestInfo.setRequestInfo(new SolrRequestInfo(solrReq, solrRsp));
            execute(solrRsp);
            HttpCacheHeaderUtil.checkHttpCachingVeto(solrRsp, resp, reqMethod);
            Iterator<Map.Entry<String, String>> headers = solrRsp.httpHeaders();
            while (headers.hasNext()) {
              Map.Entry<String, String> entry = headers.next();
              resp.addHeader(entry.getKey(), entry.getValue());
            }
            QueryResponseWriter responseWriter = core.getQueryResponseWriter(solrReq);
            if (invalidStates != null) solrReq.getContext().put(CloudSolrClient.STATE_VERSION, invalidStates);
            writeResponse(solrRsp, responseWriter, reqMethod);
          }
          return RETURN;
        default:
          return action;
      }
    } catch (Throwable ex) {
      sendError(ex);
      // walk the entire cause chain to search for an Error
      Throwable t = ex;
      while (t != null) {
        if (t instanceof Error) {
          if (t != ex) {
            SolrDispatchFilter.log.error("An Error was wrapped in another exception - please report complete stacktrace on SOLR-6161", ex);
          }
          throw (Error) t;
        }
        t = t.getCause();
      }
      return RETURN;
    } finally {
      MDCLoggingContext.clear();
    }
  }

3. Getting the handler

RequestHandlerBase looks up the handler:

  /**
   * Get the request handler registered to a given name.
   *
   * This function is thread safe.
   */
  public static SolrRequestHandler getRequestHandler(String handlerName, PluginBag<SolrRequestHandler> reqHandlers) {
    if (handlerName == null) return null;
    SolrRequestHandler handler = reqHandlers.get(handlerName);
    int idx = 0;
    if (handler == null) {
      for (; ; ) {
        idx = handlerName.indexOf('/', idx + 1);
        if (idx > 0) {
          String firstPart = handlerName.substring(0, idx);
          handler = reqHandlers.get(firstPart);
          if (handler == null) continue;
          if (handler instanceof NestedRequestHandler) {
            return ((NestedRequestHandler) handler).getSubHandler(handlerName.substring(idx));
          }
        } else {
          break;
        }
      }
    }
    return handler;
  }
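
To make the prefix walk concrete, here is a standalone sketch (plain Java, not Solr code) of the same strategy with a plain Map: try the full path first, then walk the '/'-separated prefixes from shortest to longest until a registered handler is found. The handler registrations below are hypothetical:

import java.util.HashMap;
import java.util.Map;

public class HandlerLookupSketch {

  static String lookup(String name, Map<String, String> handlers) {
    String h = handlers.get(name);                          // exact match first
    int idx = 0;
    while (h == null && (idx = name.indexOf('/', idx + 1)) > 0) {
      h = handlers.get(name.substring(0, idx));             // e.g. "/admin" for "/admin/system"
    }
    return h;
  }

  public static void main(String[] args) {
    Map<String, String> handlers = new HashMap<>();
    handlers.put("/select", "SearchHandler");               // hypothetical registrations
    handlers.put("/admin", "AdminHandlers");                // plays the role of a NestedRequestHandler prefix

    System.out.println(lookup("/select", handlers));        // SearchHandler (exact hit)
    System.out.println(lookup("/admin/system", handlers));  // AdminHandlers (prefix hit)
    System.out.println(lookup("/unknown", handlers));       // null
  }
}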

4. Processing the request with handleRequest

RequestHandlerBase's handleRequest() method processes the request:

  public void handleRequest(SolrQueryRequest req, SolrQueryResponse rsp) {
    numRequests.incrementAndGet();
    TimerContext timer = requestTimes.time();
    try {
      if (pluginInfo != null && pluginInfo.attributes.containsKey(USEPARAM))
        req.getContext().put(USEPARAM, pluginInfo.attributes.get(USEPARAM));
      SolrPluginUtils.setDefaults(this, req, defaults, appends, invariants);
      req.getContext().remove(USEPARAM);
      rsp.setHttpCaching(httpCaching);
      handleRequestBody(req, rsp);
      // count timeouts
      NamedList header = rsp.getResponseHeader();
      if (header != null) {
        Object partialResults = header.get("partialResults");
        boolean timedOut = partialResults == null ? false : (Boolean) partialResults;
        if (timedOut) {
          numTimeouts.incrementAndGet();
          rsp.setHttpCaching(false);
        }
      }
    } catch (Exception e) {
      if (e instanceof SolrException) {
        SolrException se = (SolrException) e;
        if (se.code() == SolrException.ErrorCode.CONFLICT.code) {
          // TODO: should we allow this to be counted as an error (numErrors++)?
        } else {
          SolrException.log(SolrCore.log, e);
        }
      } else {
        SolrException.log(SolrCore.log, e);
        if (e instanceof SyntaxError) {
          e = new SolrException(SolrException.ErrorCode.BAD_REQUEST, e);
        }
      }
      rsp.setException(e);
      numErrors.incrementAndGet();
    } finally {
      timer.stop();
    }
  }
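
The SolrPluginUtils.setDefaults() call above layers the handler's configured defaults/appends/invariants under or over the user's request parameters. A rough sketch of the defaults layer alone, using SolrParams.wrapDefaults (the parameter values are made up, and the real method additionally handles appends and invariants):

import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.params.SolrParams;

public class DefaultsSketch {
  public static void main(String[] args) {
    ModifiableSolrParams userParams = new ModifiableSolrParams();
    userParams.set("q", "title:solr");   // what the client sent

    ModifiableSolrParams defaults = new ModifiableSolrParams();
    defaults.set("q", "*:*");            // what solrconfig.xml configured as defaults
    defaults.set("rows", "10");

    // User parameters win; defaults only fill the gaps.
    SolrParams merged = SolrParams.wrapDefaults(userParams, defaults);
    System.out.println(merged.get("q"));    // title:solr
    System.out.println(merged.get("rows")); // 10
  }
}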

5. The concrete request lands in each handler's handleRequestBody() method; taking DataImportHandler as an example:

  @Override
  @SuppressWarnings("unchecked")
  public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
      throws Exception {
    rsp.setHttpCaching(false);

    //TODO: figure out why just the first one is OK...
    ContentStream contentStream = null;
    Iterable<ContentStream> streams = req.getContentStreams();
    if (streams != null) {
      for (ContentStream stream : streams) {
        contentStream = stream;
        break;
      }
    }
    SolrParams params = req.getParams();
    NamedList defaultParams = (NamedList) initArgs.get("defaults");
    RequestInfo requestParams = new RequestInfo(req, getParamsMap(params), contentStream);
    String command = requestParams.getCommand();

    if (DataImporter.SHOW_CONF_CMD.equals(command)) {
      String dataConfigFile = params.get("config");
      String dataConfig = params.get("dataConfig");
      if (dataConfigFile != null) {
        dataConfig = SolrWriter.getResourceAsString(req.getCore().getResourceLoader().openResource(dataConfigFile));
      }
      if (dataConfig == null) {
        rsp.add("status", DataImporter.MSG.NO_CONFIG_FOUND);
      } else {
        // Modify incoming request params to add wt=raw
        ModifiableSolrParams rawParams = new ModifiableSolrParams(req.getParams());
        rawParams.set(CommonParams.WT, "raw");
        req.setParams(rawParams);
        ContentStreamBase content = new ContentStreamBase.StringStream(dataConfig);
        rsp.add(RawResponseWriter.CONTENT, content);
      }
      return;
    }

    rsp.add("initArgs", initArgs);
    String message = "";

    if (command != null) {
      rsp.add("command", command);
    }
    // If importer is still null
    if (importer == null) {
      rsp.add("status", DataImporter.MSG.NO_INIT);
      return;
    }

    if (command != null && DataImporter.ABORT_CMD.equals(command)) {
      importer.runCmd(requestParams, null);
    } else if (importer.isBusy()) {
      message = DataImporter.MSG.CMD_RUNNING;
    } else if (command != null) {
      if (DataImporter.FULL_IMPORT_CMD.equals(command)
          || DataImporter.DELTA_IMPORT_CMD.equals(command) ||
          IMPORT_CMD.equals(command)) {
        importer.maybeReloadConfiguration(requestParams, defaultParams);
        UpdateRequestProcessorChain processorChain =
            req.getCore().getUpdateProcessorChain(params);
        UpdateRequestProcessor processor = processorChain.createProcessor(req, rsp);
        SolrResourceLoader loader = req.getCore().getResourceLoader();
        DIHWriter sw = getSolrWriter(processor, loader, requestParams, req);

        if (requestParams.isDebug()) {
          if (debugEnabled) {
            // Synchronous request for the debug mode
            importer.runCmd(requestParams, sw);
            rsp.add("mode", "debug");
            rsp.add("documents", requestParams.getDebugInfo().debugDocuments);
            if (requestParams.getDebugInfo().debugVerboseOutput != null) {
              rsp.add("verbose-output", requestParams.getDebugInfo().debugVerboseOutput);
            }
          } else {
            message = DataImporter.MSG.DEBUG_NOT_ENABLED;
          }
        } else {
          // Asynchronous request for normal mode
          if (requestParams.getContentStream() == null && !requestParams.isSyncMode()) {
            importer.runAsync(requestParams, sw);
          } else {
            importer.runCmd(requestParams, sw);
          }
        }
      } else if (DataImporter.RELOAD_CONF_CMD.equals(command)) {
        if (importer.maybeReloadConfiguration(requestParams, defaultParams)) {
          message = DataImporter.MSG.CONFIG_RELOADED;
        } else {
          message = DataImporter.MSG.CONFIG_NOT_RELOADED;
        }
      }
    }
    rsp.add("status", importer.isBusy() ? "busy" : "idle");
    rsp.add("importResponse", message);
    rsp.add("statusMessages", importer.getStatusMessages());
  }
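
For reference, such a request can be issued from a SolrJ client. This is a hedged sketch, assuming a SolrJ 6.x-style builder, a core named mycore, and DataImportHandler registered at /dataimport in solrconfig.xml (all three are assumptions, not taken from the code above):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class DihFullImportSketch {
  public static void main(String[] args) throws Exception {
    // Core name and handler path are hypothetical.
    SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();
    ModifiableSolrParams p = new ModifiableSolrParams();
    p.set("command", "full-import");   // handleRequestBody() branches on this value
    p.set("clean", "true");            // optional: clear the index before importing
    QueryRequest req = new QueryRequest(p);
    req.setPath("/dataimport");        // send to DataImportHandler instead of /select
    System.out.println(req.process(client));
    client.close();
  }
}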

6. The data import operation

It comes in two flavors, full import and delta import:

  void runCmd(RequestInfo reqParams, DIHWriter sw) {
    String command = reqParams.getCommand();
    if (command.equals(ABORT_CMD)) {
      if (docBuilder != null) {
        docBuilder.abort();
      }
      return;
    }
    if (!importLock.tryLock()) {
      LOG.warn("Import command failed . another import is running");
      return;
    }
    try {
      if (FULL_IMPORT_CMD.equals(command) || IMPORT_CMD.equals(command)) {
        doFullImport(sw, reqParams);
      } else if (command.equals(DELTA_IMPORT_CMD)) {
        doDeltaImport(sw, reqParams);
      }
    } finally {
      importLock.unlock();
    }
  }

Taking the full import as an example:

  public void doFullImport(DIHWriter writer, RequestInfo requestParams) {
    LOG.info("Starting Full Import");
    setStatus(Status.RUNNING_FULL_DUMP);
    try {
      DIHProperties dihPropWriter = createPropertyWriter();
      setIndexStartTime(dihPropWriter.getCurrentTimestamp());
      docBuilder = new DocBuilder(this, writer, dihPropWriter, requestParams);
      checkWritablePersistFile(writer, dihPropWriter);
      docBuilder.execute();
      if (!requestParams.isDebug())
        cumulativeStatistics.add(docBuilder.importStatistics);
    } catch (Exception e) {
      SolrException.log(LOG, "Full Import failed", e);
      docBuilder.handleError("Full Import failed", e);
    } finally {
      setStatus(Status.IDLE);
      DocBuilder.INSTANCE.set(null);
    }
  }

7. EntityProcessorWrapper and the SQL implementation class SqlEntityProcessor

EntityProcessorWrapper.init() sets up the common state and then delegates to the wrapped processor:

  @Override
  public void init(Context context) {
    rowcache = null;
    this.context = context;
    resolver = (VariableResolver) context.getVariableResolver();
    if (entityName == null) {
      onError = resolver.replaceTokens(context.getEntityAttribute(ON_ERROR));
      if (onError == null) onError = ABORT;
      entityName = context.getEntityAttribute(ConfigNameConstants.NAME);
    }
    delegate.init(context);
  }

During initialization this triggers SqlEntityProcessor's own init():

  public void init(Context context) {
    super.init(context);
    dataSource = context.getDataSource();
  }

ContextImpl supplies the data source via getDataSource():

  @Override
  public DataSource getDataSource() {
    if (ds != null) return ds;
    if (epw == null) { return null; }
    if (epw != null && epw.getDatasource() == null) {
      epw.setDatasource(dataImporter.getDataSourceInstance(epw.getEntity(), epw.getEntity().getDataSourceName(), this));
    }
    if (epw != null && epw.getDatasource() != null && docBuilder != null && docBuilder.verboseDebug &&
        Context.FULL_DUMP.equals(currentProcess())) {
      // debug is not yet implemented properly for deltas
      epw.setDatasource(docBuilder.getDebugLogger().wrapDs(epw.getDatasource()));
    }
    return epw.getDatasource();
  }

DataImporter resolves the data source configuration:

  public DataSource getDataSourceInstance(Entity key, String name, Context ctx) {
    Map<String, String> p = requestLevelDataSourceProps.get(name);
    if (p == null)
      p = config.getDataSources().get(name);
    if (p == null)
      p = requestLevelDataSourceProps.get(null); // for default data source
    if (p == null)
      p = config.getDataSources().get(null);
    if (p == null)
      throw new DataImportHandlerException(SEVERE,
          "No dataSource :" + name + " available for entity :" + key.getName());
    String type = p.get(TYPE);
    DataSource dataSrc = null;
    if (type == null) {
      dataSrc = new JdbcDataSource();
    } else {
      try {
        dataSrc = (DataSource) DocBuilder.loadClass(type, getCore()).newInstance();
      } catch (Exception e) {
        wrapAndThrow(SEVERE, e, "Invalid type for data source: " + type);
      }
    }
    try {
      Properties copyProps = new Properties();
      copyProps.putAll(p);
      Map<String, Object> map = ctx.getRequestParameters();
      if (map.containsKey("rows")) {
        int rows = Integer.parseInt((String) map.get("rows"));
        if (map.containsKey("start")) {
          rows += Integer.parseInt((String) map.get("start"));
        }
        copyProps.setProperty("maxRows", String.valueOf(rows));
      }
      dataSrc.init(ctx, copyProps);
    } catch (Exception e) {
      wrapAndThrow(SEVERE, e, "Failed to initialize DataSource: " + key.getDataSourceName());
    }
    return dataSrc;
  }
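
The property map p above normally comes from a <dataSource> element in the DIH configuration. A hedged sketch of the properties that typically reach JdbcDataSource.init() for a MySQL source (the attribute names are the usual DIH ones; the connection values are made up):

import java.util.Properties;

public class JdbcPropsSketch {
  public static void main(String[] args) {
    // Mirrors <dataSource type="JdbcDataSource" driver="..." url="..." user="..." password="..."/>
    Properties props = new Properties();
    props.setProperty("driver", "com.mysql.jdbc.Driver");          // hypothetical driver
    props.setProperty("url", "jdbc:mysql://localhost:3306/mydb");  // hypothetical URL
    props.setProperty("user", "solr");
    props.setProperty("password", "secret");
    // getDataSourceInstance() copies these into copyProps, possibly adds maxRows,
    // and finally calls dataSrc.init(ctx, copyProps).
    System.out.println(props);
  }
}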

8. Querying the results

JdbcDataSource runs the SQL statement through its inner ResultSetIterator:

  public ResultSetIterator(String query) {
    try {
      Connection c = getConnection();
      stmt = c.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
      stmt.setFetchSize(batchSize);
      stmt.setMaxRows(maxRows);
      LOG.debug("Executing SQL: " + query);
      long start = System.nanoTime();
      if (stmt.execute(query)) {
        resultSet = stmt.getResultSet();
      }
      LOG.trace("Time taken for sql :"
          + TimeUnit.MILLISECONDS.convert(System.nanoTime() - start, TimeUnit.NANOSECONDS));
      colNames = readFieldNames(resultSet.getMetaData());
    } catch (Exception e) {
      wrapAndThrow(SEVERE, e, "Unable to execute query: " + query);
    }
    if (resultSet == null) {
      rSetIterator = new ArrayList<Map<String, Object>>().iterator();
      return;
    }
    rSetIterator = new Iterator<Map<String, Object>>() {
      @Override
      public boolean hasNext() {
        return hasnext();
      }

      @Override
      public Map<String, Object> next() {
        return getARow();
      }

      @Override
      public void remove() { /* do nothing */ }
    };
  }

Solr supports both full and delta index builds from a database. The code above traces the full import from end to end; the delta import follows the same path, so it is not repeated here.
