Handling the max_result_window problem in ElasticSearch queries

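I needed to export a spreadsheet of Hindi articles. The export rules were:

1. All Hindi articles (every color and every status)

2. Read count greater than 300

3. Sorted by read/recommend ratio, keeping the top 3000 articles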

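Notes:

1. The article information and the read/recommend counts are stored in two separate ES instances.

2. There are 300,000+ Hindi articles in total (no more than 400,000).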

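Approach:

Fetch article UUIDs from Topic-Es 500 at a time, query UserLog-Es for the read and recommend counts of those 500 UUIDs, add the articles with more than 300 reads to a List, and export the List to Excel.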
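A minimal sketch of the selection step in this approach, assuming the "read/recommend ratio" means reads divided by recommendations (the post does not define it) and that the counts for each batch of 500 UUIDs have already been fetched from UserLog-Es. `TopArticles` and `Stats` are hypothetical names. Instead of collecting every qualifying article in a List and sorting at the end, this variant keeps a bounded min-heap of at most 3000 entries, so memory stays fixed while scanning 300k+ articles.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopArticles {
    // Hypothetical per-article counters as they might be read back from UserLog-Es.
    public static final class Stats {
        private final String uuid;
        private final long reads;
        private final long recommends;

        public Stats(String uuid, long reads, long recommends) {
            this.uuid = uuid;
            this.reads = reads;
            this.recommends = recommends;
        }

        public String uuid() { return uuid; }
        public long reads() { return reads; }

        // Assumed meaning of the ratio: reads / recommendations (0 if never recommended).
        public double ratio() {
            return recommends == 0 ? 0.0 : (double) reads / recommends;
        }
    }

    private static final long MIN_READS = 300; // export rule 2: read count must be > 300
    private final int limit;
    // Min-heap ordered by ratio: the head is the weakest article currently kept.
    private final PriorityQueue<Stats> heap =
            new PriorityQueue<>(Comparator.comparingDouble(Stats::ratio));

    public TopArticles(int limit) {
        this.limit = limit;
    }

    // Feed one batch (e.g. the stats for 500 UUIDs); only the best `limit` articles survive.
    public void offerBatch(List<Stats> batch) {
        for (Stats s : batch) {
            if (s.reads() <= MIN_READS) {
                continue;
            }
            if (heap.size() < limit) {
                heap.add(s);
            } else if (s.ratio() > heap.peek().ratio()) {
                heap.poll(); // evict the current weakest article
                heap.add(s);
            }
        }
    }

    // Final list for the Excel export, highest ratio first.
    public List<Stats> result() {
        List<Stats> out = new ArrayList<>(heap);
        out.sort(Comparator.comparingDouble(Stats::ratio).reversed());
        return out;
    }
}
```

Usage: create `new TopArticles(3000)`, call `offerBatch(...)` once per 500-UUID batch, then write `result()` to Excel.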

Problems:

1. QueryPhaseExecutionException[Result window is too large, from + size must be less than or equal to: [10000] but was [10100].

    Failed to execute phase [dfs], all shards failed; shardFailures {[aPdAdh6fTlOzXsE7-rJ71Q][holga_index][0]: RemoteTransportException[[node-01][10.25.167.4:9300][indices:data/read/search[phase/dfs]]]; nested: QueryPhaseExecutionException[Result window is too large, from + size must be less than or equal to: [10000] but was [10100]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.]; }{[aPdAdh6fTlOzXsE7-rJ71Q][holga_index][1]: RemoteTransportException[[node-01][10.25.167.4:9300][indices:data/read/search[phase/dfs]]]; nested: QueryPhaseExecutionException[Result window is too large, from + size must be less than or equal to: [10000] but was [10100]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.]; }{[aPdAdh6fTlOzXsE7-rJ71Q][holga_index][2]: RemoteTransportException[[node-01][10.25.167.4:9300][indices:data/read/search[phase/dfs]]]; nested: QueryPhaseExecutionException[Result window is too large, from + size must be less than or equal to: [10000] but was [10100]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.]; }

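The error reproduced on every test run: as soon as a from/size query pages past the first 10,000 hits (that is, from + size > 10,000), it is thrown. The code used was: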

// Deep paging with from/size: rejected once from + size > index.max_result_window (default 10000)
searchRequestBuilder.setQuery(query).addSort(SortBuilders.fieldSort("add_time").order(SortOrder.DESC)).setFrom(index).setSize(100);
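The numbers in the error follow directly from this paging code: with `setSize(100)`, the request at offset `from = 10000` covers hits 10000 to 10099, so from + size = 10100, which exceeds the default `index.max_result_window` of 10000. A small illustration of that check (the class and method names are made up for the example):

```java
public class ResultWindow {
    // Default value of the index-level setting index.max_result_window.
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    // A from/size request is rejected once from + size exceeds the result window.
    static boolean fitsWindow(int from, int size, int maxResultWindow) {
        return from + size <= maxResultWindow;
    }

    public static void main(String[] args) {
        int size = 100; // page size used by setSize(100)
        for (int from : new int[] {9_900, 10_000}) {
            String verdict = fitsWindow(from, size, DEFAULT_MAX_RESULT_WINDOW) ? "ok" : "rejected";
            System.out.println("from=" + from + ", from+size=" + (from + size) + " -> " + verdict);
        }
        // prints:
        // from=9900, from+size=10000 -> ok
        // from=10000, from+size=10100 -> rejected
    }
}
```

The error text notes the limit can be raised through `index.max_result_window`, but every deeper from/size page costs each shard more work, which is why the scroll API is the better fit for exporting 300k+ documents.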

Solving this requires the scroll API. The fix:

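// Scroll instead of from/size. TimeValue is the scroll context keep-alive in milliseconds (60 s here, an illustrative value), not a hit count
searchRequestBuilder.setQuery(query).addSort(SortBuilders.fieldSort("add_time").order(SortOrder.DESC)).setSize(500).setScroll(new TimeValue(60000));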
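The scroll request only opens the scroll and returns the first 500 hits; the remaining pages are fetched by repeatedly sending the most recent scroll id until a page comes back empty. To keep the sketch runnable without a cluster, the ES client is replaced by a hypothetical `ScrollClient` interface with an in-memory fake. In the TransportClient API the two calls correspond roughly to `prepareSearch(...).setScroll(...).get()` for the first page and `client.prepareSearchScroll(scrollId).setScroll(keepAlive).get()` for each following page, and the scroll context can be released afterwards with `client.prepareClearScroll().addScrollId(scrollId).get()`.

```java
import java.util.ArrayList;
import java.util.List;

public class ScrollSketch {
    // One page of scroll results plus the id needed to fetch the next page.
    static final class Page {
        final String scrollId;
        final List<String> hits; // article UUIDs in this sketch

        Page(String scrollId, List<String> hits) {
            this.scrollId = scrollId;
            this.hits = hits;
        }
    }

    // Hypothetical stand-in for the ES client (not a real ES interface).
    interface ScrollClient {
        Page start(int size);       // ~ prepareSearch(...).setSize(size).setScroll(keepAlive).get()
        Page next(String scrollId); // ~ prepareSearchScroll(scrollId).setScroll(keepAlive).get()
    }

    // Drain the scroll: request pages with the latest scroll id until an empty page is returned.
    static List<String> readAll(ScrollClient client, int size) {
        List<String> all = new ArrayList<>();
        Page page = client.start(size);
        while (!page.hits.isEmpty()) {
            all.addAll(page.hits);             // e.g. look up this batch in UserLog-Es here
            page = client.next(page.scrollId); // always pass the id from the most recent response
        }
        return all;
    }

    // In-memory fake so the loop can be run without a cluster.
    static ScrollClient inMemory(List<String> docs) {
        return new ScrollClient() {
            private int pos;
            private int size;

            @Override
            public Page start(int size) {
                this.size = size;
                this.pos = 0;
                return next("scroll-1");
            }

            @Override
            public Page next(String scrollId) {
                int end = Math.min(pos + size, docs.size());
                List<String> hits = new ArrayList<>(docs.subList(pos, end));
                pos = end;
                return new Page(scrollId, hits);
            }
        };
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 1234; i++) {
            docs.add("uuid-" + i);
        }
        System.out.println(readAll(inMemory(docs), 500).size()); // 1234 (pages of 500, 500, 234)
    }
}
```

Each scroll request also renews the keep-alive, so the value passed to `setScroll` only needs to cover the time spent processing one page, not the whole export.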

2. The supplied data appears to be in the Office 2007+ XML. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)

Exception in thread "main" org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
    at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:152)
    at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:140)
    at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:302)
    at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:87)
    at com.mkit.export.main.ExportExcel.write2File(ExportExcel.java:86)
    at com.mkit.export.main.ExportExcel.main(ExportExcel.java:35)

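This error occurs because the Excel file being read was an .xlsx file (Office 2007+ format), but it was opened through HSSF (POIFSFileSystem + HSSFWorkbook), and HSSF only supports the Office 2003 .xls format. The HSSF code below is correct only for .xls files: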

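 import java.io.FileInputStream;
 import org.apache.poi.hssf.usermodel.HSSFSheet;
 import org.apache.poi.hssf.usermodel.HSSFWorkbook;
 import org.apache.poi.poifs.filesystem.POIFSFileSystem;

 FileInputStream fs = new FileInputStream("d://aa.xls"); // Office 2003 (.xls) file
 POIFSFileSystem ps = new POIFSFileSystem(fs);           // OLE2 container; an .xlsx file fails here with OfficeXmlFileException
 HSSFWorkbook wb = new HSSFWorkbook(ps);                 // HSSFWorkbook: Office 2003 (.xls); XSSFWorkbook: Office 2007 (.xlsx)
 HSSFSheet sheet = wb.getSheetAt(0);                     // get the first sheet, since one Excel file may contain several
 int lastRowNum = sheet.getLastRowNum();
 System.out.println("Last row: " + lastRowNum);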
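The stack trace shows the exception being thrown while POIFSFileSystem parses the file header (HeaderBlock): an .xls file is an OLE2 compound document whose first 8 bytes are D0 CF 11 E0 A1 B1 1A E1, while an .xlsx file is a ZIP archive starting with "PK\3\4". A minimal JDK-only sketch of that check (`ExcelFormat` is a hypothetical helper, not part of POI). In practice, POI's `WorkbookFactory.create(InputStream)` performs this detection itself and returns an `HSSFWorkbook` or `XSSFWorkbook` behind the common `Workbook` interface.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class ExcelFormat {
    enum Kind { XLS_OLE2, XLSX_OOXML, UNKNOWN }

    // OLE2 compound-document signature: what POIFSFileSystem / HSSF expect (.xls).
    static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local-file-header signature "PK\3\4": an Office 2007+ file (.xlsx) is a ZIP archive.
    static final byte[] ZIP_SIGNATURE = { 0x50, 0x4B, 0x03, 0x04 };

    // Read up to n bytes; InputStream.read may return fewer bytes than requested per call.
    static byte[] readHeader(InputStream in, int n) throws IOException {
        byte[] buf = new byte[n];
        int off = 0;
        while (off < n) {
            int r = in.read(buf, off, n - off);
            if (r < 0) {
                break; // end of stream: input shorter than n bytes
            }
            off += r;
        }
        return Arrays.copyOf(buf, off);
    }

    static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    static Kind detect(byte[] header) {
        if (startsWith(header, OLE2_SIGNATURE)) {
            return Kind.XLS_OLE2;   // read with HSSFWorkbook
        }
        if (startsWith(header, ZIP_SIGNATURE)) {
            return Kind.XLSX_OOXML; // read with XSSFWorkbook
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) throws IOException {
        byte[] fakeXlsx = { 0x50, 0x4B, 0x03, 0x04, 0x14, 0x00, 0x06, 0x00 };
        System.out.println(detect(readHeader(new ByteArrayInputStream(fakeXlsx), 8))); // XLSX_OOXML
    }
}
```

For `XLS_OLE2` the HSSF code above works unchanged; for `XLSX_OOXML`, reopen the file (the header bytes have already been consumed) and use `new XSSFWorkbook(inputStream)`.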
