Batch script for converting doc/docx (Word) or image files to PDF

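1. In our production environment the file viewer could only display PDF files, yet roughly seven thousand of our ten thousand files were Word documents or images. So I wrote a script to convert them in batch and upload the results to OSS, which gives the front end the data it needs.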

2. Environment setup: this uses the aspose-words-18.6-jdk16-crack.jar toolkit. The jar is not provided here; it can be found with an online search.

3. Java Maven project: JDK 1.8, Maven 3.6.

4. Output produced with the aspose-words-18.6-jdk16-crack.jar toolkit carries a watermark. To remove it, place the following configuration file under resources:

<?xml version="1.0" encoding="UTF-8" ?>

<License>

    <Data>

        <Products>

            <Product>Aspose.Total for Java</Product>

            <Product>Aspose.Words for Java</Product>

        </Products>

        <EditionType>Enterprise</EditionType>

        <SubscriptionExpiry>20991231</SubscriptionExpiry>

        <LicenseExpiry>20991231</LicenseExpiry>

        <SerialNumber>8bfe198c-7f0c-4ef8-8ff0-acc3237bf0d7</SerialNumber>

    </Data>

    <Signature>sNLLKGMUdF0r8O1kKilWAGdgfs2BvJb/2Xp8p5iuDVfZXmhppo+d0Ran1P9TKdjV4ABwAgKXxJ3jcQTqE/2IRfqwnPf8itN8aFZlV3TJPYeD3yWE7IT55Gz6EijUpC7aKeoohTb4w2fpox58wWoF3SNp6sK6jDfiAUGEHYJ9pjU=</Signature>

</License>

license.xml

5. Utility class:

package org.utiles.dongl.tools;

import com.aspose.words.License;
import com.aspose.words.SaveFormat;
import com.itextpdf.text.*;
import com.itextpdf.text.pdf.PdfWriter;
import org.apache.log4j.Logger;
import org.utiles.dongl.comment.WordTranPDF;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.*;
import java.util.List;

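/**
 * @ClassName: FileTranPDFTool
 * @Description TODO
 * @Author: 东霖
 * @Date: 2022/7/23 10:50
 * @Version 1.0
 **/
public class FileTranPDFTool {
    private static Logger logger = Logger.getLogger(FileTranPDFTool.class);

    /**
     * Loads license.xml from the classpath and applies it to Aspose.Words.
     *
     * @return true if the license was applied, false otherwise
     */
    public static boolean getLicense() {
        boolean result = false;
        // license.xml must sit at the classpath root (src/main/resources in a Maven project,
        // WEB-INF/classes in a web app). ClassLoader resource names take no leading slash or backslash.
        try (InputStream is = WordTranPDF.class.getClassLoader().getResourceAsStream("license.xml")) {
            if (is == null) {
                logger.error("license.xml not found on the classpath");
                return false;
            }
            License aposeLic = new License();
            aposeLic.setLicense(is);
            result = true;
        } catch (Exception e) {
            e.printStackTrace();
        }
        return result;
    }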

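    /**
     * Converts a single image to a one-page PDF whose page size matches the image.
     * Supported types: jpg/tif/..
     *
     * @param source path of the source image
     * @param target path of the PDF to write
     */
    public static void ImageToPDF(String source, String target) {
        Document document = new Document();
        // Remove the page margins so the image fills the page
        document.setMargins(0, 0, 0, 0);
        FileOutputStream fos = null;
        try {
            // Load the image first and read its width and height
            Image image = Image.getInstance(source);
            float imageHeight = image.getScaledHeight();
            float imageWidth = image.getScaledWidth();
            fos = new FileOutputStream(target);
            PdfWriter.getInstance(document, fos);
            // Make the page the same size as the image; set before open() so it applies to the first page
            Rectangle rectangle = new Rectangle(imageWidth, imageHeight);
            document.setPageSize(rectangle);
            // Open the document
            document.open();
            // Center the image
            image.setAlignment(Image.ALIGN_CENTER);
            // Add the image to the first page
            document.add(image);
        } catch (Exception ioe) {
            System.out.println(ioe.getMessage());
        } finally {
            // Close the document only if it was actually opened
            if (document.isOpen()) {
                document.close();
            }
            // fos is still null if the image could not be loaded or the target could not be created
            if (fos != null) {
                try {
                    fos.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }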

    /**
     * Converts a Word document to another format with Aspose.Words.
     *
     * @param inPath  path of the source Word document
     * @param outPath path of the output file
     * @return true if the conversion succeeded
     */
    public static boolean doc2pdf(String inPath, String outPath) {
        if (!getLicense()) { // Validate the license first; without it the output is watermarked
            return false;
        }
        FileOutputStream os = null;
        try {
            File file = new File(outPath); // Create the empty output file
            os = new FileOutputStream(file);
            com.aspose.words.Document doc = new com.aspose.words.Document(inPath); // inPath is the Word document to convert
            // Aspose.Words converts between DOC, DOCX, OOXML, RTF, HTML, OpenDocument, PDF, EPUB, XPS and SWF.
            // NOTE: this call saves DOCX (a doc-to-docx run); for PDF output use SaveFormat.PDF,
            // as in the commented-out line, and give outPath a .pdf extension.
//            doc.save(os, SaveFormat.PDF);
            doc.save(os, SaveFormat.DOCX);
        } catch (Exception e) {
            e.printStackTrace();
            return false;
        } finally {
            if (os != null) {
                try {
                    os.flush();
                    os.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
        return true;
    }

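    /**
     * Lists the names of the files in a directory.
     *
     * @param foldPath absolute path of the directory
     * @return the file names
     */
    public static List<String> listFileName(String foldPath) {
        List<String> listFiles = new ArrayList<>();
        // Create the File object for the directory
        File f = new File(foldPath);
        // listFiles() returns null if the path is not a readable directory; fail fast in that case
        File[] files = Objects.requireNonNull(f.listFiles(), "not a readable directory: " + foldPath);
        for (File file : files) {
            listFiles.add(file.getName());
        }
        return listFiles;
    }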

    /**
     * Deletes the given file.
     *
     * @param filePath path of the file to delete
     * @return true if the file was deleted
     */
    public static boolean deleteByFilePath(String filePath) {
        File file = new File(filePath);
        return file.delete();
    }

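    /**
     * Lists the files in a directory as a map of file path to file name.
     *
     * @param oldPath absolute path of the directory to scan; this is also the directory files are deleted from
     * @return map of file path to file name
     */
    public static Map<String, String> listFileNameAndPath(String oldPath) {
        Map<String, String> listFiles = new HashMap<>();
        // Create the File object for the directory
        File f = new File(oldPath);
        // listFiles() returns null if the path is not a readable directory; fail fast in that case
        File[] files = Objects.requireNonNull(f.listFiles(), "not a readable directory: " + oldPath);
        for (File file : files) {
            listFiles.put(file.getPath(), file.getName());
        }
        return listFiles;
    }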

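    /**
     * Moves every zero-size file in a directory to another folder.
     *
     * @param foldPath    directory to scan
     * @param newFoldPath folder that zero-size files are moved to (must end with a path separator)
     * @return number of files moved
     */
    public static Integer getFileSize(String foldPath,String newFoldPath) {
        int j = 0; // number of files moved so far
        // Create the File object for the directory
        File file = new File(foldPath);
        File[] files = Objects.requireNonNull(file.listFiles(), "not a readable directory: " + foldPath);
        for (int i = 0; i < files.length; i++) {
            if (files[i].length()==0){
                // Move the zero-size file itself into the target folder
                Boolean aBoolean = WriteToFileExample.moveFileToTarget(files[i].getPath(), newFoldPath+files[i].getName(),null);
                if (Boolean.TRUE.equals(aBoolean)){
                    j++;
                    logger.info("Moved: "+files[i].getPath()+" to "+newFoldPath);
                }
                System.out.println(files[i].getPath());
            }
        }
        return j;
    }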

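    /**
     * Deletes files from the old directory whose base name (name without extension)
     * also appears in the comparison directory.
     *
     * @param oldFileNames map of file path to file name, as returned by listFileNameAndPath
     * @param newPath      comparison directory
     * @return number of files deleted
     */
    public static Integer deleteByFileName(Map<String, String> oldFileNames, String newPath) {
        int j = 0;
        List<String> newListNames = listFileName(newPath);
        for (Map.Entry<String, String> entry : oldFileNames.entrySet()) {
            String value = entry.getValue();
            int valueDot = value.lastIndexOf(".");
            // Base name without extension; a name without a dot is compared whole
            String valueBase = valueDot < 0 ? value : value.substring(0, valueDot);
            for (int i = 0; i < newListNames.size(); i++) {
                String s = newListNames.get(i);
                int sDot = s.lastIndexOf(".");
                String sBase = sDot < 0 ? s : s.substring(0, sDot);
                if (valueBase.equals(sBase)) {
                    boolean b = deleteByFilePath(entry.getKey());
                    if (b){
                        j++;
                        logger.info("Deleted file: "+entry.getKey()+", total so far: "+j);
                    }else{
                        logger.error("File does not exist: "+entry.getKey());
                    }
                    break; // one match is enough; no need to try deleting the same file again
                }
            }
        }
        return j;
    }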

    public static void main(String[] args) {
        // Compare two directories and delete the duplicates
        Map<String, String> map = listFileNameAndPath("D:\\OSS\\ghwb\\word");
        int b = deleteByFileName(map, "D:\\OSS\\ghwb\\ghksj - 副本");
        // Convert a Word document
        doc2pdf("D:\\OSS\\ghwb\\13c5ad939a0b2001.doc",
                "D:\\OSS\\ghwb\\doc2docx\\13c5ad939a0b2001.docx");
        // Move zero-size files to the given folder
//        getFileSize("D:\\OSS\\ghwb\\ghksj_3_copy","D:\\OSS\\ghwb\\test");
    }

}

WordORImageTranPDF
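
deleteByFileName matches files by name with the extension stripped. A minimal sketch of that comparison as a standalone helper is shown below. The names `FileNames`, `baseName` and `sameBaseName` are hypothetical and not part of the original utility class, and the sketch assumes that a name without an extension (or a dot-file such as `.gitignore`) should be compared whole.

```java
final class FileNames {
    private FileNames() {}

    /** Returns the name without its last extension; names with no extension are returned unchanged. */
    static String baseName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        // dot > 0 skips both "no dot" (-1) and dot-files such as ".gitignore" (0)
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }

    /** True when two file names differ at most in their extension. */
    static boolean sameBaseName(String a, String b) {
        return baseName(a).equals(baseName(b));
    }
}
```

With this helper, `report.doc` and `report.pdf` count as the same document, while a name without a dot no longer triggers the `substring(0, -1)` exception.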

6. Batch logic:

package org.utiles.dongl.comment;

import org.apache.log4j.Logger;
import org.utiles.dongl.tools.FileTranPDFTool;
import org.utiles.dongl.tools.WriteToFileExample;

import java.io.*;
import java.util.HashMap;
import java.util.Map;

import static org.utiles.dongl.tools.FileTranPDFTool.doc2pdf;

/**
 * @ClassName: WordTranPDF
 * @Description TODO
 * @Author: 东霖
 * @Date: 2022/7/22 8:55
 * @Version 1.0
 **/
public class WordTranPDF {
    private static Logger logger = Logger.getLogger(WordTranPDF.class);

    /**
     * Scans a directory and builds the conversion work list.
     *
     * @param inFilePath     directory to scan
     * @param replacePathOld directory keyword in the source path to be replaced
     * @param replacePathNew replacement keyword, which yields the output directory
     * @param wjjl           file in which files that cannot be converted are recorded
     * @param pdfToPath      directory that existing PDF files are moved to
     * @return map of "sourcePath&type" (type is "word" or "image") to target path
     */
    public static Map<String, String> getFilePathName(String inFilePath,String replacePathOld
            ,String replacePathNew,String wjjl,String pdfToPath) {
        Map<String, String> fileList = new HashMap<>();
        // Create the File object for the directory
        File f = new File(inFilePath);
        // listFiles() returns null if the path is not a readable directory
        File[] files = f.listFiles();
        if (files == null) {
            return fileList;
        }
        for (int i = 0; i < files.length; i++) {
            String name = files[i].getName();
            String path = files[i].getPath();
            // Match on ".ext" (with the dot) so that only real extensions count
            if (name.endsWith(".docx") || name.endsWith(".doc")
                    || name.endsWith(".wps") || name.endsWith(".rtf"))
            {
                // NOTE: the target extension here is docx (a doc-to-docx run); for PDF output
                // use the commented-out line and SaveFormat.PDF in doc2pdf.
//                String str=path.substring(0,path.lastIndexOf(".")+1)+"pdf";
                String str=path.substring(0,path.lastIndexOf(".")+1)+"docx";
                fileList.put(path+"&"+"word",str.replace(replacePathOld,replacePathNew));
            } else if (name.endsWith(".png") || name.endsWith(".jpg") || name.endsWith(".gif")
                    || name.endsWith(".jpeg") || name.endsWith(".tif"))
            {
                String str=path.substring(0,path.lastIndexOf(".")+1)+"pdf";
                fileList.put(path+"&"+"image", str.replace(replacePathOld,replacePathNew));
            }else if(name.endsWith(".pdf")) {
                // Already a PDF: move it to the output folder as-is
                WriteToFileExample.moveFileToTarget(path,pdfToPath+name,"");
                logger.info("Moved: "+path+" to "+pdfToPath);
            }else{
                // Anything else is recorded as unconvertible
                WriteToFileExample.writeFileSQL("Cannot convert file: "+path,wjjl);
            }
        }
        return fileList;
    }

    public static void start(Map<String, String> hashMap) throws InterruptedException {
        long old = System.currentTimeMillis();
        int j = 0;
        for (Map.Entry<String, String> entry : hashMap.entrySet()) {
            // Keys have the form "sourcePath&type"; split on the last '&' so paths containing '&' still work
            String key = entry.getKey();
            int sep = key.lastIndexOf('&');
            String sourcePath = key.substring(0, sep);
            String type = key.substring(sep + 1);
            if (type.equals("word")) {
                System.out.println(entry.getValue());
                doc2pdf(sourcePath, entry.getValue());
                Thread.sleep(15); // brief pause between files
            } else if (type.equals("image")) {
                FileTranPDFTool.ImageToPDF(sourcePath, entry.getValue());
                Thread.sleep(15); // brief pause between files
            }
            j++;
            logger.info("Processed file #" + j + ", file name: " + key);
        }
        long now = System.currentTimeMillis();
        logger.info("Conversion finished, elapsed: " + ((now - old) / 1000.0) + " seconds");
        logger.info("Files processed: " + j);
    }

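    public static void main(String[] args) throws InterruptedException {
        /**
         * inFilePath: path of the folder whose files are to be converted
         * replacePathOld: converted files are written to a new folder; this is the parent-directory keyword to replace
         * replacePathNew: the new parent directory
         * wjjl: location and name of the file that records files which cannot be converted
         * pdfToPath: where to keep files that are already PDFs and need no conversion; they are moved from the source folder to this folder
         */
        Map<String, String> filePathName = getFilePathName("D:\\OSS\\ghwb\\doc11",
                "doc11","doc2docx",
                "D:\\OSS\\ghwb\\"+System.currentTimeMillis()+".txt"
        ,"D:\\OSS\\yjbg\\gjxxzx\\ghksj_copy\\");
        start(filePathName);
    }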

}
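
getFilePathName does two things for each file: it decides the file type from the extension, and it derives the output path by swapping the extension and the parent-directory keyword. A minimal sketch of both steps as pure functions, which can be tested without touching the file system, might look like the following. `FileKinds`, `Kind`, `classify` and `toPdfPath` are hypothetical names; the case-insensitive matching is an assumption that the original code does not make (it matches lower-case extensions only).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class FileKinds {
    enum Kind { WORD, IMAGE, PDF, UNSUPPORTED }

    // Extension lists taken from getFilePathName
    private static final Set<String> WORD_EXT =
            new HashSet<>(Arrays.asList("doc", "docx", "wps", "rtf"));
    private static final Set<String> IMAGE_EXT =
            new HashSet<>(Arrays.asList("png", "jpg", "jpeg", "gif", "tif"));

    private FileKinds() {}

    /** Classifies a file name by its extension, ignoring case (an assumption of this sketch). */
    static Kind classify(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Kind.UNSUPPORTED; // no extension at all
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        if (WORD_EXT.contains(ext)) return Kind.WORD;
        if (IMAGE_EXT.contains(ext)) return Kind.IMAGE;
        if (ext.equals("pdf")) return Kind.PDF;
        return Kind.UNSUPPORTED;
    }

    /** Swaps the extension for ".pdf" and replaces the directory keyword, as getFilePathName does. */
    static String toPdfPath(String sourcePath, String oldDirKeyword, String newDirKeyword) {
        int dot = sourcePath.lastIndexOf('.');
        String stem = dot < 0 ? sourcePath : sourcePath.substring(0, dot);
        return (stem + ".pdf").replace(oldDirKeyword, newDirKeyword);
    }
}
```

As in the original, the keyword replacement applies to the whole path, so the keyword should be distinctive enough not to occur in file names.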

7. That is the batch script for Word and image files. Unit-test the methods in the utility class first, then run the batch logic.
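
When testing the utility methods, one cheap check is whether each output really is a PDF: a PDF file starts with the bytes `%PDF-`, and a failed conversion often leaves a zero-size file behind (the case getFileSize looks for). The sketch below shows such a check under those assumptions. `PdfSanityCheck` and `looksLikePdf` are hypothetical names, and the check only inspects the header; it does not validate the rest of the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class PdfSanityCheck {
    private PdfSanityCheck() {}

    /** Returns true if the file begins with the PDF header "%PDF-"; empty or short files return false. */
    static boolean looksLikePdf(Path file) throws IOException {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        byte[] head = new byte[magic.length];
        try (InputStream in = Files.newInputStream(file)) {
            int read = 0;
            // read() may return fewer bytes than requested, so loop until the header is filled
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n < 0) {
                    return false; // file ended before the header was complete
                }
                read += n;
            }
        }
        return Arrays.equals(head, magic);
    }
}
```

Running this over the output folder after a batch gives a quick list of files to reconvert before anything is uploaded to OSS.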
