异常:org.apache.poi.openxml4j.exceptions.OLE2NotOfficeXmlFileException: The supplied data appears to be in the OLE2 Format. You are calling the part of POI that deals with OOXML (Office Open XML) Documents. You need to call a different part of POI to process this data (eg HSSF instead of XSSF)

1、场景

项目中需要使用到读取 word 文档中的内容,使用的工具是 apache poi 来实现 word 、ppt 、excel 等文件的读取。在开发过程中,读取文件的过程中,出现了异常: org.apache.poi.openxml4j.exceptions.OLE2NotOfficeXmlFileException: The supplied data appears to be in the OLE2 Format. You are calling the part of POI that deals with OOXML (Office Open XML) Documents. You need to call a different part of POI to process this data (eg HSSF instead of XSSF)


2、分析

office中,ppt 文档的保存是有 ppt(office 2003-2007) 和 pptx 两种格式的。在 apche poi 中,对 不同格式的 ppt 文档是不同类进行支持的。

图示中使用的是 XMLSlideShow 类读取 ppt 格式的文档,而 XMLSlideShow 是只支持 pptx 格式的文档的读取,所以会报错。

错误示例:


3、ppt 和 pptx 文档读取详解

现在对读取两种格式的ppt的读取,做正确的示例代码详解:

读取 ppt

// 使用 HSLFSlideShow 类读取 ppt 格式文档

// --------- ppt -----------
File file = new File("E:\\search-file\\44.ppt");
FileInputStream fis = null;
HSLFSlideShow document = null;
SlideShowExtractor extractor = null;
try {
fis = new FileInputStream(file);
document = new HSLFSlideShow(fis);
extractor = new SlideShowExtractor(document);
log.info("extractor.getText:{}", extractor.getText());
} catch (Exception e) {
e.printStackTrace();
}

格式使用错误就会报错:org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)

读取 pptx

// 使用 XMLSlideShow 类读取 pptx 格式的文档

// --------- pptx -----------
File file = new File("E:\\search-file\\33.pptx");
FileInputStream fis = null;
XMLSlideShow document = null;
SlideShowExtractor extractor = null;
try {
fis = new FileInputStream(file);
document = new XMLSlideShow(fis);
extractor = new SlideShowExtractor(document);
log.info("extractor.getText:{}", extractor.getText());
} catch (Exception e) {
e.printStackTrace();
}

XWPFDocument 类读取 doc 格式文档使用错误会报错:org.apache.poi.openxml4j.exceptions.OLE2NotOfficeXmlFileException: The supplied data appears to be in the OLE2 Format. You are calling the part of POI that deals with OOXML (Office Open XML) Documents. You need to call a different part of POI to process this data (eg HSSF instead of XSSF)

4、总结

apache poi 工具还是很强大的,功能非常多,对具体使用也可参考 apache poi 的官方文档:

https://poi.apache.org/apidocs/index.html

请注意自己使用的 apache poi 的版本,参考对应版本的 javadocs

org.apache.poi.openxml4j.exceptions.OLE2NotOfficeXmlFileException: The supplied data appears to be in the OLE2 Format. You are calling the part of POI that deals with OOXML (Office Open XML) Documents的更多相关文章

  1. org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML.

    org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Offic ...

  2. java解析excel2003和excel2007:The supplied data appears to be in the office 2007+XML Polonly supports OLE2 office documents

    上传excel解析存到数据库时报: org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears ...

  3. hadoop错误org.apache.hadoop.yarn.exceptions.YarnException Unauthorized request to start container

    错误: 14/04/29 02:45:07 INFO mapreduce.Job: Job job_1398704073313_0021 failed with state FAILED due to ...

  4. Why Apache Spark is a Crossover Hit for Data Scientists [FWD]

    Spark is a compelling multi-purpose platform for use cases that span investigative, as well as opera ...

  5. org.apache.kafka.common.errors.SerializationException: Error deserializing... Caused by: org.apache.kafka.common.errors.SerializationException: Size of data received by IntegerDeserializer is not 4

    原因,最近开发的kafka消息接收,突然报如下错: org.apache.kafka.common.errors.SerializationException: Error deserializing ...

  6. java.lang.IllegalAccessError: tried to access method org.apache.poi.util.POILogger.log from class org.apache.poi.openxml4j.opc.ZipPackage

    代码说简单也简单,说复杂那还真是寸步难行. 之前好好的excel导出功能,本地启动调试的时候突然就不行了,一直报上面的错. 一直在本地折腾了半天,去测试环境上看,又是好的,可以正常导出excel. 搜 ...

  7. spark on yarn 动态资源分配报错的解决:org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:spark_shuffle does not exist

    组件:cdh5.14.0 spark是自己编译的spark2.1.0-cdh5.14.0 第一步:确认spark-defaults.conf中添加了如下配置: spark.shuffle.servic ...

  8. org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService: mapreduce_shuffle do

    在yarn-site.xml 配置文件中增加: <property> <name>yarn.nodemanager.aux-services</name> < ...

  9. Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/exceptions/YarnException

    这个是Flink 1.11.1  使用yarn-session 出现的错误:原因是在Flink1.11 之后不再提供flink-shaded-hadoop-*” jars 需要在yarn-sessio ...

  10. 根据xlsx模板生成excel数据文件发送邮件代码

    package mail; import java.io.File; import java.io.FileInputStream; import java.io.FileNotFoundExcept ...

随机推荐

  1. Traefik 2.0 实现自动化 HTTPS

    文章转载自:https://mp.weixin.qq.com/s?__biz=MzU4MjQ0MTU4Ng==&mid=2247484457&idx=1&sn=35112e98 ...

  2. 自定义mapping与常见参数

    PUT test { "mappings": { "dynamic": true, "properties": { "firstn ...

  3. SpringBoot项目的CI配置 # 安全变量

    运行GitLab Runner容器 参考Run GitLab Runner in a container - Docker image installation and configuration 执 ...

  4. .NET6 JWT(生成Token令牌)

    一.Net 6环境下的.net core项目里如何使用JWT. 第一步,在Nuget引入JWT.Microsoft.AspNetCore.Authentication.JwtBearer这两个NuGe ...

  5. C#并发编程-4 同步

    如果程序用到了并发技术,那就要特别留意这种情况:一段代码需要修改数据,同时其他代码需要访问同一个数据. 这种情况就需要考虑同步地访问数据. 如果下面三个条件都满足,就必须用同步来保护共享的数据. 多段 ...

  6. 洛谷P1120 小木棍 (搜索+剪枝)

    搜索的经典题. 我们要求木根的最小长度,就要是木根的数量尽可能多,可以发现木根的长度一定可以整除所有小木棒的总长度,从小到大枚举这个可能的长度,第一次有解的就是答案. 关心的状态:当前正在拼哪根木棍, ...

  7. Js实现一键复制小功能

    function copyToClipboard(textToCopy) { // navigator clipboard 需要https等安全上下文 if (navigator.clipboard ...

  8. 《吐血整理》高级系列教程-吃透Fiddler抓包教程(28)-Fiddler如何抓取Android7.0以上的Https包-下篇

    1.简介 虽然依旧能抓到大部分Android APP的HTTP/HTTPS包,但是别高兴的太早,有的APP为了防抓包,还做了很多操作:① 二次加密有的APP,在涉及到关键数据通信时,会将正文二次加密后 ...

  9. 1.关于SPring Boot项目的创建

    一.引入依赖 <parent> <groupId>org.springframework.boot</groupId> <artifactId>spri ...

  10. scrapy操作mysql/批量下载图片

    1.操作mysql items.py meiju.py 3.piplines.py 4.settings.py -------------------------------------------- ...