org.apache.poi.openxml4j.exceptions.OLE2NotOfficeXmlFileException: The supplied data appears to be in the OLE2 Format. You are calling the part of POI that deals with OOXML (Office Open XML) Documents
Exception: org.apache.poi.openxml4j.exceptions.OLE2NotOfficeXmlFileException: The supplied data appears to be in the OLE2 Format. You are calling the part of POI that deals with OOXML (Office Open XML) Documents. You need to call a different part of POI to process this data (eg HSSF instead of XSSF)
1. Scenario
The project needs to read the contents of Word documents and uses Apache POI to read Word, PowerPoint, and Excel files. During development, reading a file threw the following exception: org.apache.poi.openxml4j.exceptions.OLE2NotOfficeXmlFileException: The supplied data appears to be in the OLE2 Format. You are calling the part of POI that deals with OOXML (Office Open XML) Documents. You need to call a different part of POI to process this data (eg HSSF instead of XSSF)
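2. Analysis
PowerPoint saves presentations in two formats: .ppt (the binary OLE2 format of Office 97-2003) and .pptx (the XML-based format introduced with Office 2007). Apache POI supports each format with a different class.
The failing code used the XMLSlideShow class to read a .ppt document. XMLSlideShow only supports the .pptx format, so POI throws the exception above.
Incorrect example: constructing an XMLSlideShow from a .ppt file.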
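The pairing of file type and POI class can be written down as a small lookup. The sketch below is only an illustration: PoiReaderLookup and readerClassFor are hypothetical names that are not part of POI, while the class names in the map are the usual POI entry points for each format. It uses only the JDK (Java 9+ for Map.of), so it returns class names as strings instead of loading POI.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper (not part of POI): maps a file extension to the POI class that reads it. */
final class PoiReaderLookup {

    // OLE2 (binary) formats on the left of each pair, OOXML formats on the right.
    private static final Map<String, String> READER_BY_EXTENSION = Map.of(
            "xls",  "org.apache.poi.hssf.usermodel.HSSFWorkbook",
            "xlsx", "org.apache.poi.xssf.usermodel.XSSFWorkbook",
            "doc",  "org.apache.poi.hwpf.HWPFDocument",
            "docx", "org.apache.poi.xwpf.usermodel.XWPFDocument",
            "ppt",  "org.apache.poi.hslf.usermodel.HSLFSlideShow",
            "pptx", "org.apache.poi.xslf.usermodel.XMLSlideShow");

    /** Returns the fully qualified name of the POI class expected to read the given file. */
    static String readerClassFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            throw new IllegalArgumentException("No file extension: " + fileName);
        }
        // Compare case-insensitively so "DECK.PPT" is handled like "deck.ppt".
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        String readerClass = READER_BY_EXTENSION.get(extension);
        if (readerClass == null) {
            throw new IllegalArgumentException("Unsupported extension: " + extension);
        }
        return readerClass;
    }

    public static void main(String[] args) {
        System.out.println(readerClassFor("44.ppt"));
        System.out.println(readerClassFor("33.pptx"));
    }
}
```

The extension is only a hint. A file that was renamed without being converted still has its original internal format, so choosing the class by extension alone can end in the same exception.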
3. Reading ppt and pptx documents in detail
The following examples show the correct way to read each of the two formats:
Reading ppt
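// Use the HSLFSlideShow class to read a .ppt document
// --------- ppt -----------
import java.io.File;
import java.io.FileInputStream;
import org.apache.poi.hslf.usermodel.HSLFSlideShow;
import org.apache.poi.sl.extractor.SlideShowExtractor;

File file = new File("E:\\search-file\\44.ppt");
// try-with-resources closes the stream and the slide show even if reading fails
try (FileInputStream fis = new FileInputStream(file);
     HSLFSlideShow document = new HSLFSlideShow(fis)) {
    SlideShowExtractor<?, ?> extractor = new SlideShowExtractor<>(document);
    log.info("extractor.getText:{}", extractor.getText());
} catch (Exception e) {
    // log is assumed to be an SLF4J-style logger, as in the line above
    log.error("Failed to read {}", file, e);
}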
Using the wrong format here, that is, passing a .pptx file to HSLFSlideShow, raises: org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
Reading pptx
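// Use the XMLSlideShow class to read a .pptx document
// --------- pptx -----------
import java.io.File;
import java.io.FileInputStream;
import org.apache.poi.sl.extractor.SlideShowExtractor;
import org.apache.poi.xslf.usermodel.XMLSlideShow;

File file = new File("E:\\search-file\\33.pptx");
// try-with-resources closes the stream and the slide show even if reading fails
try (FileInputStream fis = new FileInputStream(file);
     XMLSlideShow document = new XMLSlideShow(fis)) {
    SlideShowExtractor<?, ?> extractor = new SlideShowExtractor<>(document);
    log.info("extractor.getText:{}", extractor.getText());
} catch (Exception e) {
    // log is assumed to be an SLF4J-style logger, as in the line above
    log.error("Failed to read {}", file, e);
}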
Passing an OLE2 file to an OOXML class, for example a .ppt file to XMLSlideShow or a .doc file to XWPFDocument, raises: org.apache.poi.openxml4j.exceptions.OLE2NotOfficeXmlFileException: The supplied data appears to be in the OLE2 Format. You are calling the part of POI that deals with OOXML (Office Open XML) Documents. You need to call a different part of POI to process this data (eg HSSF instead of XSSF)
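Both exceptions come from a mismatch between the file's real container format and the class used to open it. One way to guard against this is to look at the first bytes of the file before choosing a class. The sketch below assumes Java 11+ and uses only the JDK; OfficeFormatSniffer and its members are hypothetical names, not POI API. It relies on two well-known signatures: OLE2 files start with the bytes D0 CF 11 E0 A1 B1 1A E1, and OOXML files are ZIP archives that start with "PK" 03 04.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical helper (not part of POI): guesses the container format from the leading bytes. */
final class OfficeFormatSniffer {

    /** ZIP only means "ZIP container": OOXML files are ZIP archives, but so are many other files. */
    enum Format { OLE2, ZIP, UNKNOWN }

    private static final byte[] OLE2_MAGIC = {
            (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
            (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1 };
    private static final byte[] ZIP_MAGIC = { 'P', 'K', 0x03, 0x04 };

    /** Classifies a buffer holding the first bytes of a file. */
    static Format detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Format.OLE2;   // .doc, .xls, .ppt: use HWPF / HSSF / HSLF
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return Format.ZIP;    // .docx, .xlsx, .pptx: use XWPF / XSSF / XSLF
        }
        return Format.UNKNOWN;
    }

    /** Reads just enough of the file to classify it. */
    static Format detect(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return detect(in.readNBytes(OLE2_MAGIC.length));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(data, 0, prefix.length, prefix, 0, prefix.length);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: OfficeFormatSniffer <file>");
            return;
        }
        System.out.println(detect(Paths.get(args[0])));
    }
}
```

With a check like this, the caller can pick HSLFSlideShow for an OLE2 result and XMLSlideShow for a ZIP result instead of trusting the extension. Recent POI versions also provide org.apache.poi.poifs.filesystem.FileMagic for the same purpose; check the javadocs of your version before relying on it.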
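4. Summary
Apache POI is a powerful tool with a wide range of features. For details on specific usage, see the official Apache POI API documentation:
https://poi.apache.org/apidocs/index.html
Check which version of Apache POI you are using and consult the javadocs for that version.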