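[LibreOffice] Converting Office documents to PDF, HTML, JPG and other formats with LibreOffice
LibreOffice has a rich feature set and can stand in for Microsoft Office in many scenarios.
1. Converting Word to PDF on Windows
1. Install LibreOffice
Download the installer from the official site and run it. https://donate.libreoffice.org/
After installation, the program directory contains several executables. The most important ones are:
soffice.exe --- the general launcher; double-clicking it lets you create documents of many types.
sweb.exe --- an HTML editor that can edit many kinds of files; in spirit it is probably closest to Notepad++.
scalc.exe --- similar to Excel; works with spreadsheets.
simpress.exe --- similar to PowerPoint.
swriter.exe --- similar to Word; edits text documents (and can of course open .docx files).
sbase.exe --- works with databases; it can connect through JDBC or ODBC and is handy when no other GUI database tool is available.
2. Configure the environment variable (add LibreOffice's program directory to PATH so that the command can be called from any directory)
Run soffice to test the setup: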
C:\Users\liqiang>
LibreOffice 6.0.6.2 0c292870b25a325b5ed35f6b45599d2ea4458e77
Usage: soffice [argument...]
argument - switches, switch parameters and document URIs (filenames).
Using without special arguments:
Opens the start center, if it is used without any arguments.
{file} Tries to open the file (files) in the components
suitable for them.
{file} {macro:///Library.Module.MacroName}
Opens the file and runs specified macros from
the file.
Getting help and information:
--help | -h | -? Shows this help and quits.
--helpwriter Opens built-in or online Help on Writer.
--helpcalc Opens built-in or online Help on Calc.
--helpdraw Opens built-in or online Help on Draw.
--helpimpress Opens built-in or online Help on Impress.
--helpbase Opens built-in or online Help on Base.
--helpbasic Opens built-in or online Help on Basic scripting
language.
--helpmath Opens built-in or online Help on Math.
--version Shows the version and quits.
--nstemporarydirectory
(MacOS X sandbox only) Returns path of the temporary
directory for the current user and exits. Overrides
all other arguments.
General arguments:
--quickstart[=no] Activates[Deactivates] the Quickstarter service.
--nolockcheck Disables check for remote instances using one
installation.
--infilter={filter} Force an input filter type if possible. For example:
--infilter="Calc Office Open XML"
--infilter="Text (encoded):UTF8,LF,,,"
--pidfile={file} Store soffice.bin pid to {file}.
--display {display} Sets the DISPLAY environment variable on UNIX-like
platforms to the value {display} (only supported by a
start script).
User/programmatic interface control:
--nologo Disables the splash screen at program start.
--minimized Starts minimized. The splash screen is not displayed.
--nodefault Starts without displaying anything except the splash
screen (do not display initial window).
--invisible Starts in invisible mode. Neither the start-up logo nor
the initial program window will be visible. Application
can be controlled, and documents and dialogs can be
controlled and opened via the API. Using the parameter,
the process can only be ended using the taskmanager
(Windows) or the kill command (UNIX-like systems). It
cannot be used in conjunction with --quickstart.
--headless Starts in "headless mode" which allows using the
application without GUI. This special mode can be used
when the application is controlled by external clients
via the API.
--norestore Disables restart and file recovery after a system crash.
--safe-mode Starts in a safe mode, i.e. starts temporarily with a
fresh user profile and helps to restore a broken
configuration.
--accept={UNO-URL} Specifies an UNO-URL connect-string to create an UNO
acceptor through which other programs can connect to
access the API. UNO-URL is string the such kind
uno:connection-type,params;protocol-name,params;ObjectName.
--unaccept={UNO-URL} Closes an acceptor that was created with --accept. Use
--unaccept=all to close all open acceptors.
--language={lang} Uses specified language, if language is not selected
yet for UI. The lang is a tag of the language in IETF
language tag.
Developer arguments:
--terminate_after_init
Exit after initialization complete (no documents loaded).
--eventtesting Exit after loading documents.
New document creation arguments:
The arguments create an empty document of specified kind. Only one of them may
be used in one command line. If filenames are specified after an argument,
then it tries to open those files in the specified component.
--writer Creates an empty Writer document.
--calc Creates an empty Calc document.
--draw Creates an empty Draw document.
--impress Creates an empty Impress document.
--base Creates a new database.
--global Creates an empty Writer master (global) document.
--math Creates an empty Math document (formula).
--web Creates an empty HTML document.
File open arguments:
The arguments define how following filenames are treated. New treatment begins
after the argument and ends at the next argument. The default treatment is to
open documents for editing, and create new documents from document templates.
-n Treats following files as templates for creation of new
documents.
-o Opens following files for editing, regardless whether
they are templates or not.
--pt {Printername} Prints following files to the printer {Printername},
after which those files are closed. The splash screen
does not appear. If used multiple times, only last
{Printername} is effective for all documents of all
--pt runs. Also, --printer-name argument of
--print-to-file switch interferes with {Printername}.
-p Prints following files to the default printer, after
which those files are closed. The splash screen does
not appear. If the file name contains spaces, then it
must be enclosed in quotation marks.
--view Opens following files in viewer mode (read-only).
--show Opens and starts the following presentation documents
of each immediately. Files are closed after the showing.
Files other than Impress documents are opened in
default mode , regardless of previous mode.
--convert-to OutputFileExtension[:OutputFilterName]
[--outdir output_dir] [--convert-images-to]
Batch convert files (implies --headless). If --outdir
isn't specified, then current working directory is used
as output_dir. If --convert-images-to is given, its
parameter is taken as the target MIME format for *all*
images written to the output format. If --convert-to is
used more than once, the last value of OutputFileExtension
[:OutputFilterName] is effective. If --outdir is used more
than once, only its last value is effective. For example:
--convert-to pdf *.odt
--convert-to epub *.doc
--convert-to pdf:writer_pdf_Export --outdir /home/user *.doc
--convert-to "html:XHTML Writer File:UTF8" *.doc
--convert-to "txt:Text (encoded):UTF8" *.doc
--print-to-file [--printer-name printer_name] [--outdir output_dir]
Batch print files to file. If --outdir is not specified,
then current working directory is used as output_dir.
If --printer-name or --outdir used multiple times, only
last value of each is effective. Also, {Printername} of
--pt switch interferes with --printer-name.
--cat Dump text content of the following files to console
(implies --headless). Cannot be used with --convert-to.
--script-cat Dump text content of any scripts embedded in the files to console
(implies --headless). Cannot be used with --convert-to.
-env:<VAR>[=<VALUE>] Set a bootstrap variable. For example: to set
a non-default user profile path:
-env:UserInstallation=file:///tmp/test
Ignored switches:
-psn Ignored (MacOS X only).
-Embedding Ignored (COM+ related; Windows only).
--nofirststartwizard Does nothing, accepted only for backward compatibility.
--protector {arg1} {arg2}
Used only in unit tests and should have two arguments.
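The -env:UserInstallation switch from the help text above becomes useful when conversions are started from a program, because several soffice processes sharing one user profile may get in each other's way. The following is only a sketch under assumptions: soffice is on PATH, and the class name IsolatedProfileCommand and its build method are hypothetical names introduced for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of LibreOffice): builds a soffice command line
// that points the process at its own user profile directory.
class IsolatedProfileCommand {

    // Only assembles the arguments; nothing is executed here.
    static List<String> build(Path profileDir, String format, Path source, Path outDir) {
        // Path.toUri() produces a file:/// URL, the form used in the help text.
        String profileArg = "-env:UserInstallation=" + profileDir.toAbsolutePath().toUri();
        return Arrays.asList(
                "soffice",                 // assumption: soffice is on PATH
                profileArg,
                "--headless",
                "--convert-to", format,
                "--outdir", outDir.toString(),
                source.toString());
    }

    public static void main(String[] args) {
        // Assumption: any writable directory will do as a throwaway profile location.
        Path profile = Paths.get(System.getProperty("java.io.tmpdir"), "lo-profile-1");
        List<String> cmd = build(profile, "pdf", Paths.get("tt.docx"), Paths.get("out"));
        System.out.println(String.join(" ", cmd));
    }
}
```

Passing the arguments as a list rather than one long string also sidesteps quoting problems when a path contains spaces.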
4. Converting to PDF from the command line
Convert into the current directory:
liqiang@root MINGW64 ~/Desktop/新建文件夹 ()
$ soffice --headless --convert-to pdf ./Java开发-太原科技大学-软件工程-乔利强.docx
convert C:\Users\liqiang\Desktop\新建文件夹 ()\Java开发-太原科技大学-软件工程-乔利强.docx -> C:\Users\liqiang\Desktop\新建文件夹 ()\Java开发-太原科技大学-软件工程-乔利强.pdf using filter : writer_pdf_Export
func=xmlSecCheckVersionExt:file=..\src\xmlsec.c:line=:obj=unknown:subj=unknown:error=:invalid version:mode=abi compatible;expected minor version=;real minor version=;expected subminor version=;real subminor version=
liqiang@root MINGW64 ~/Desktop/新建文件夹 ()
$ ls
Java开发-太原科技大学-软件工程-乔利强.docx
Java开发-太原科技大学-软件工程-乔利强.pdf
To convert into a specific directory, add the --outdir parameter.
5. Converting Word to PDF from a Java program (it simply runs the command above as a child process)
import java.io.IOException;
import java.io.InputStream;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public final class Test {

    private static final Logger logger = LoggerFactory.getLogger(Test.class);

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        String srcPath = "C:/Users/liqiang/Desktop/ww/tt.docx";
        String desPath = "C:/Users/liqiang/Desktop/ww";
        String osName = System.getProperty("os.name");
        if (osName.contains("Windows")) {
            String command = "soffice --headless --convert-to pdf " + srcPath + " --outdir " + desPath;
            exec(command);
        }
        long end = System.currentTimeMillis();
        logger.debug("Elapsed: {} ms", end - start);
    }

    public static boolean exec(String command) {
        Process process; // handle for controlling the child process and querying its state
        try {
            logger.debug("exec cmd : {}", command);
            // exec() asks the JVM to start a child process for the command
            // and returns the Process object that represents it.
            process = Runtime.getRuntime().exec(command);
            // These two calls give access to the child's stderr and stdout.
            // They are not read here; a command that prints a lot should have them drained.
            InputStream errorStream = process.getErrorStream();
            InputStream inputStream = process.getInputStream();
        } catch (IOException e) {
            logger.error(" exec {} error", command, e);
            return false;
        }

        int exitStatus;
        try {
            // Wait for the child process to finish; the return value is its exit code, 0 meaning success.
            exitStatus = process.waitFor();
            // A second way to read the exit code once the process has ended
            int i = process.exitValue();
            logger.debug("i----" + i);
        } catch (InterruptedException e) {
            logger.error("InterruptedException exec {}", command, e);
            return false;
        }

        if (exitStatus != 0) {
            logger.error("exec cmd exitStatus {}", exitStatus);
        } else {
            logger.debug("exec cmd exitStatus {}", exitStatus);
        }
        process.destroy(); // release the child process
        return exitStatus == 0;
    }
}
The command can also be launched through the Windows shell as cmd /c soffice ..... .
It is also advisable to name the export filter after pdf, for example --convert-to pdf:writer_pdf_Export; naming the filter explicitly may help in cases where the plain pdf target fails to convert.
Result:
2018-10-25 21:56:35 [Test]-[DEBUG] exec cmd : soffice --headless --convert-to pdf C:/Users/liqiang/Desktop/ww/tt.docx --outdir C:/Users/liqiang/Desktop/ww
2018-10-25 21:56:45 [Test]-[DEBUG] i----0
2018-10-25 21:56:45 [Test]-[DEBUG] exec cmd exitStatus 0
2018-10-25 21:56:45 [Test]-[DEBUG] Elapsed: 9980 ms
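Two things can go wrong when a conversion is started with Runtime.exec(String): the command string is split on whitespace, so a path containing spaces falls apart, and output that nobody reads can fill the pipe and stall the child. A possible alternative is sketched below with ProcessBuilder, an argument list, output redirected to a temporary file and a timeout. The class CommandRunner and its Result type are hypothetical names, and decoding the log as UTF-8 is an assumption (a Windows console may use another code page).

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs an external command, captures its output and enforces a timeout.
class CommandRunner {

    // Exit code (-1 on timeout or interruption) plus the merged stdout/stderr text.
    static final class Result {
        final int exitCode;
        final String output;
        Result(int exitCode, String output) { this.exitCode = exitCode; this.output = output; }
    }

    static Result run(List<String> command, long timeoutSeconds) {
        File log = null;
        Process process = null;
        try {
            log = File.createTempFile("cmd-", ".log");
            process = new ProcessBuilder(command)
                    .redirectErrorStream(true)   // merge stderr into stdout
                    .redirectOutput(log)         // a file cannot fill up the way a pipe does
                    .start();
            boolean finished = process.waitFor(timeoutSeconds, TimeUnit.SECONDS);
            if (!finished) {
                process.destroyForcibly().waitFor(); // kill and reap the child
            }
            // Assumption: the child writes UTF-8 text.
            String output = new String(Files.readAllBytes(log.toPath()), StandardCharsets.UTF_8);
            return new Result(finished ? process.exitValue() : -1, output);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            if (process != null) process.destroyForcibly();
            Thread.currentThread().interrupt();
            return new Result(-1, "interrupted");
        } finally {
            if (log != null) log.delete();
        }
    }

    public static void main(String[] args) {
        // Demo with the JVM's own launcher so the sketch runs without LibreOffice installed.
        // For a conversion, pass a list such as:
        //   soffice, --headless, --convert-to, pdf, --outdir, <dir>, <file>
        String javaBin = Paths.get(System.getProperty("java.home"), "bin", "java").toString();
        Result r = run(Arrays.asList(javaBin, "-version"), 30);
        System.out.println("exit=" + r.exitCode);
        System.out.println(r.output);
    }
}
```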
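2. Converting Word to PDF on Linux (CentOS as the example)
1. Install LibreOffice on Linux
1. Download
We install with yum, so the first step is to download the RPM packages. Three archives are needed: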
wget http://mirrors.ustc.edu.cn/tdf/libreoffice/stable/6.0.6/rpm/x86_64/LibreOffice_6.0.6_Linux_x86-64_rpm.tar.gz
wget http://mirrors.ustc.edu.cn/tdf/libreoffice/stable/6.0.6/rpm/x86_64/LibreOffice_6.0.6_Linux_x86-64_rpm_sdk.tar.gz
wget http://mirrors.ustc.edu.cn/tdf/libreoffice/stable/6.0.6/rpm/x86_64/LibreOffice_6.0.6_Linux_x86-64_rpm_langpack_zh-CN.tar.gz
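The same links can also be opened in a browser on Windows to download the tar.gz archives. For a different version, only the version number in the URL has to change, for example 6.0.3 instead of 6.0.6.
If wget cannot fetch the files on the server, another option is to download them on Windows first and then upload them to the Linux server.
2. Upload to the server and extract
Extract each archive with tar -xvf xxxxxx.tar.gz. The result looks like this: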
[root@VM_0_12_centos libreoffice]# ll
total
drwxr-xr-x root root Jul : LibreOffice_6.0.6.2_Linux_x86-64_rpm
drwxr-xr-x root root Jul : LibreOffice_6.0.6.2_Linux_x86-64_rpm_langpack_zh-CN
drwxr-xr-x root root Jul : LibreOffice_6.0.6.2_Linux_x86-64_rpm_sdk
-rw-r--r-- root root Oct : LibreOffice_6.0.6_Linux_x86-64_rpm_langpack_zh-CN.tar.gz
-rw-r--r-- root root Oct : LibreOffice_6.0.6_Linux_x86-64_rpm_sdk.tar.gz
-rw-r--r-- root root Oct : LibreOffice_6.0.6_Linux_x86-64_rpm.tar.gz
3. Install the RPM files with yum localinstall *.rpm
[root@VM_0_12_centos RPMS]# pwd
/opt/libreoffice/LibreOffice_6.0.6.2_Linux_x86-64_rpm/RPMS
[root@VM_0_12_centos RPMS]# yum localinstall *.rpm
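The RPMS directory holds the RPM files to install; change into it and install them with a wildcard. (This has to be done for each of the three extracted archives.)
4. Test LibreOffice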
[root@VM_0_12_centos RPMS]# libreoffice6.0 -help
Warning: -help is deprecated. Use --help instead.
LibreOffice 6.0.6.2 0c292870b25a325b5ed35f6b45599d2ea4458e77
Usage: soffice [argument...]
(the rest of the help text is identical to the Windows output shown earlier)
After installation the command is named libreoffice6.0.
5. Create an alias so that LibreOffice is easier to call
[root@VM_0_12_centos ~]# alias libreoffice='libreoffice6.0'
[root@VM_0_12_centos ~]# alias
alias cp='cp -i'
alias egrep='egrep --color=auto'
alias fgrep='fgrep --color=auto'
alias grep='grep --color=auto'
alias l.='ls -d .* --color=auto'
alias libreoffice='libreoffice6.0'
alias ll='ls -l --color=auto'
alias ls='ls --color=auto'
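One caveat about the alias: it is a feature of the interactive shell, so a program that starts LibreOffice directly (for example through Runtime.exec or ProcessBuilder) does not see it and needs the real executable name. A small sketch of looking that name up on PATH follows; the class OfficeLocator is a hypothetical helper, and the candidate names are assumptions taken from the commands used in this article.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper: finds the LibreOffice launcher by scanning PATH,
// because shell aliases are invisible to child processes started from Java.
class OfficeLocator {

    // Walks the PATH entries in order; inside each directory the candidate
    // names are tried in the order given. Returns the first executable file found.
    static Optional<Path> find(String pathEnv, List<String> candidates) {
        if (pathEnv == null) {
            return Optional.empty();
        }
        for (String dir : pathEnv.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            for (String name : candidates) {
                Path p = Paths.get(dir, name);
                if (Files.isRegularFile(p) && Files.isExecutable(p)) {
                    return Optional.of(p);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumption: one of these names exists on the machine.
        List<String> names = Arrays.asList("libreoffice6.0", "libreoffice", "soffice");
        System.out.println(find(System.getenv("PATH"), names)
                .map(Path::toString)
                .orElse("not found on PATH"));
    }
}
```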
2. Test Word-to-PDF conversion on the Linux command line (the parameters are largely the same as on Windows)
[root@VM_0_12_centos tmpFile]# ls
tt.docx
[root@VM_0_12_centos tmpFile]# libreoffice6.0 --convert-to pdf:writer_pdf_Export ./tt.docx
func=xmlSecCheckVersionExt:file=xmlsec.c:line=:obj=unknown:subj=unknown:error=:invalid version:mode=abi compatible;expected minor version=;real minor version=;expected subminor version=;real subminor version=
convert /root/tmpFile/tt.docx -> /root/tmpFile/tt.pdf using filter : writer_pdf_Export
[root@VM_0_12_centos tmpFile]# ls
tt.docx tt.pdf
[root@VM_0_12_centos tmpFile]#
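Copying the generated PDF back to Windows and opening it shows that the Chinese text is garbled.
3. Fixing garbled Chinese text in the converted PDF
1. Check the font directories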
[root@VM_0_12_centos tmpFile]# cat /etc/fonts/fonts.conf | grep fon
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<!-- /etc/fonts/fonts.conf file to configure system font access -->
<fontconfig>
problems to the fontconfig bugzilla system located at fontconfig.org
Note that the normal 'make install' procedure for fontconfig is to
replace any existing fonts.conf file with the new version. Place
<dir>/usr/share/fonts</dir>
<dir>/usr/share/X11/fonts/Type1</dir>
<dir>/usr/share/X11/fonts/TTF</dir>
<dir>/usr/local/share/fonts</dir>
<dir prefix="xdg">fonts</dir>
<dir>~/.fonts</dir>
<include ignore_missing="yes">/etc/fonts/conf.d</include>
<cachedir>/var/cache/fontconfig</cachedir>
<cachedir prefix="xdg">fontconfig</cachedir>
<cachedir>~/.fontconfig</cachedir>
in fonts. All other blank chars are assumed to be broken and
</fontconfig>
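The output shows that fonts are stored under /usr/share/fonts.
2. Upload the SimSun font (simsun.ttc) from C:\Windows\Fonts on Windows to the Linux server, copy it into the font directory above and give it read and write permissions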
[root@VM_0_12_centos libreoffice]# ll | grep simsun.ttc
-rw-r--r-- root root Oct : simsun.ttc
cp simsun.ttc /usr/share/fonts
cd /usr/share/fonts
Set the permissions (the default permissions usually work; set them manually only if they do not)
chmod simsun.ttc
3. Refresh the font cache
fc-cache -fv
Converting to PDF again now renders the Chinese text correctly.
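Before starting conversions from a program it can be worth checking that the font file really is in one of the font directories. The sketch below does only that; it does not query fontconfig. The class FontCheck is a hypothetical helper, and the directory list in main is an assumption copied from the fonts.conf output above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical helper: looks for a font file below a list of font directories.
class FontCheck {

    // Returns the first regular file whose name matches fileName (case-insensitive).
    static Optional<Path> locate(String fileName, List<Path> fontDirs) {
        for (Path dir : fontDirs) {
            if (!Files.isDirectory(dir)) {
                continue; // directory missing on this machine
            }
            try (Stream<Path> files = Files.walk(dir)) {
                Optional<Path> hit = files
                        .filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().equalsIgnoreCase(fileName))
                        .findFirst();
                if (hit.isPresent()) {
                    return hit;
                }
            } catch (IOException | UncheckedIOException e) {
                // Unreadable directory: skip it and keep looking elsewhere.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumption: these are the directories listed in /etc/fonts/fonts.conf.
        List<Path> dirs = Arrays.asList(
                Paths.get("/usr/share/fonts"),
                Paths.get("/usr/local/share/fonts"),
                Paths.get(System.getProperty("user.home"), ".fonts"));
        System.out.println(locate("simsun.ttc", dirs)
                .map(Path::toString)
                .orElse("simsun.ttc not found"));
    }
}
```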
4. Calling LibreOffice from a Java program on Linux to convert to PDF
The source file and the output directory are passed in as arguments to main.
(1) Start with a simple test program
import java.io.IOException;

public class Test {

    public static void main(String[] args) {
        String filePath = args[0]; // source document
        String destDir = args[1];  // output directory
        String osName = System.getProperty("os.name");
        System.out.println(filePath);
        System.out.println(destDir);
        System.out.println(osName);
        String cmd = "libreoffice6.0 --convert-to pdf:writer_pdf_Export " + filePath + " --outdir " + destDir;
        System.out.println(cmd);
        try {
            // Starts the conversion but does not wait for it to finish
            Runtime.getRuntime().exec(cmd);
        } catch (IOException e) {
            System.err.println(e.getMessage());
        }
    }
}
Compile and run it on Linux:
[root@VM_0_12_centos tmpFile]# javac Test.java
[root@VM_0_12_centos tmpFile]# java Test ./tt.docx ./
./tt.docx
./
Linux
libreoffice6.0 --convert-to pdf:writer_pdf_Export ./tt.docx --outdir ./
[root@VM_0_12_centos tmpFile]# ls
Test.class Test.java tt.docx tt.pdf
(2) Next, extend the program to measure the conversion time (the thread waits until the conversion has finished)
import java.io.IOException;

public class Test {

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        String filePath = args[0]; // source document
        String destDir = args[1];  // output directory
        String osName = System.getProperty("os.name");
        System.out.println(filePath);
        System.out.println(destDir);
        System.out.println(osName);
        String cmd = "libreoffice6.0 --convert-to pdf:writer_pdf_Export " + filePath + " --outdir " + destDir;
        System.out.println(cmd);
        try {
            Process process = Runtime.getRuntime().exec(cmd);
            try {
                // Block until the child process exits and read its exit status
                int status = process.waitFor();
                // Release the process
                process.destroy();
                System.out.println("status -> " + status);
            } catch (InterruptedException e) {
                System.err.println(e.getMessage());
            }
        } catch (IOException e) {
            System.err.println(e.getMessage());
        }
        long end = System.currentTimeMillis();
        System.out.println("Elapsed: " + (end - start) + "ms");
    }
}
Compile and run it again on Linux:
[root@VM_0_12_centos tmpFile]# java Test ./tt.docx ./
./tt.docx
./
Linux
libreoffice6.0 --convert-to pdf:writer_pdf_Export ./tt.docx --outdir ./
status ->
Elapsed: 1463ms
[root@VM_0_12_centos tmpFile]# ls
Test.class Test.java tt.docx tt.pdf
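In the listings above the result is named after the source file with the new extension (tt.docx becomes tt.pdf), and a tt.pdf from an earlier run would still be present even if a later run failed. A caller may therefore want to check the output file itself rather than rely on the exit status alone. A minimal sketch, with OutputCheck and both method names being hypothetical:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: predicts and verifies the file that a conversion should produce.
class OutputCheck {

    // Replaces the last extension of the source file name and resolves it in outDir.
    static Path expectedOutput(Path source, Path outDir, String extension) {
        String name = source.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return outDir.resolve(base + "." + extension);
    }

    // True if the file exists, is non-empty and was modified at or after startMillis.
    static boolean isFreshResult(Path output, long startMillis) {
        return Files.isRegularFile(output)
                && output.toFile().length() > 0
                && output.toFile().lastModified() >= startMillis;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // ... run the conversion here ...
        Path pdf = expectedOutput(Paths.get("./tt.docx"), Paths.get("."), "pdf");
        System.out.println(pdf + " fresh: " + isFreshResult(pdf, start));
    }
}
```

File timestamps can be coarse on some file systems, so comparing against a start time rounded down to the second may be the safer choice in practice.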
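That completes PDF conversion with LibreOffice on both Windows and Linux; in my experience this approach is fairly stable. Along the way we also saw how to call a local program through Runtime and wait for it in a single thread.
All tar packages used in this article and the simsun.ttc font can be downloaded from: http://qiaoliqiang.cn/fileDown/linuxlibreoffice.zip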
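Supplement: Word can also be converted to HTML. Test command:
soffice.exe --headless --convert-to html .\通用功能需求收集20180723.docx
Supplement: Word can be converted to JPG:
soffice.exe --headless --convert-to jpg .\通用功能需求收集20180723.docx
Supplement: Word can be converted to TXT:
soffice.exe --headless --convert-to txt .\通用功能需求收集20180723.docx
Supplement: Excel and PowerPoint files can likewise be converted to PDF, HTML and JPG. The Excel test below showed two caveats. Cell borders are dropped unless borders are explicitly set in the Excel styles, and overly long content wraps; the workaround is to narrow the columns or reduce their number.
Conversion commands:
soffice.exe --headless --convert-to jpg ./test.xls
soffice.exe --headless --convert-to html ./test.xls
soffice.exe --headless --convert-to pdf ./test.xls
Results: (1) the converted JPG, (2) the converted HTML, (3) the converted PDF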
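According to the help text, when --convert-to is given more than once only the last value takes effect, so producing several formats from one source means one soffice call per format. A sketch of generating those calls follows; MultiFormatCommands is a hypothetical helper and soffice on PATH is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: one soffice argument list per target format.
class MultiFormatCommands {

    // Only builds the commands; running them is left to the caller.
    static List<List<String>> build(String source, String outDir, List<String> formats) {
        List<List<String>> commands = new ArrayList<>();
        for (String format : formats) {
            commands.add(Arrays.asList(
                    "soffice",            // assumption: soffice is on PATH
                    "--headless",
                    "--convert-to", format,
                    "--outdir", outDir,
                    source));
        }
        return commands;
    }

    public static void main(String[] args) {
        // The formats are the ones tried in this article.
        for (List<String> cmd : build("./test.xls", "./out", Arrays.asList("jpg", "html", "pdf"))) {
            System.out.println(String.join(" ", cmd));
        }
    }
}
```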
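Supplement: a problem when copying the installation directory directly
When I copied an already-installed directory to another machine and tried to use it, it failed with errors about missing VCRUNTIME140.dll and MSVCP140.dll. Copying these DLLs over from a machine that has them fixed the problem. Note that the DLLs come from both C:\Windows\System32 and C:\Windows\SysWOW64. The files in the two directories have the same names but different sizes, so each one must be copied into the matching directory.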
Supplement: Java can also convert through jodconverter. I used jodconverter 2.2, which depends on an OpenOffice or LibreOffice installation.
The code is as follows:
import java.io.File;
import java.io.IOException;

import com.artofsolving.jodconverter.DocumentConverter;
import com.artofsolving.jodconverter.openoffice.connection.OpenOfficeConnection;
import com.artofsolving.jodconverter.openoffice.connection.SocketOpenOfficeConnection;
import com.artofsolving.jodconverter.openoffice.converter.OpenOfficeDocumentConverter;

public class Office2Pdf {

    // Converts an Office file (Word, Excel, ...) to PDF
    public static void WordToPDF(String startFile, String overFile) throws IOException {
        // Source file
        File inputFile = new File(startFile);
        if (!inputFile.exists()) {
            System.out.println("Source file does not exist!");
            return;
        }

        // Output file; create its parent directory if it is missing
        File outputFile = new File(overFile);
        if (!outputFile.getParentFile().exists()) {
            outputFile.getParentFile().mkdirs();
        }

        // Start the office service process.
        // Adjust the path below to wherever soffice.exe is installed on your machine.
        String command = "D:/zdc8/lo/program/soffice.exe -headless -accept=\"socket,host=127.0.0.1,port=8300;urp;\"";
        Process p = Runtime.getRuntime().exec(command);
        try {
            // Connect to the office service
            OpenOfficeConnection connection = new SocketOpenOfficeConnection("127.0.0.1", 8300);
            connection.connect();
            try {
                // Convert
                DocumentConverter converter = new OpenOfficeDocumentConverter(connection);
                converter.convert(inputFile, outputFile);
            } finally {
                // Close the connection
                connection.disconnect();
            }
        } finally {
            // Stop the service process, even if connecting or converting failed
            p.destroy();
        }
    }

    public static void main(String[] args) {
        String start = "C:\\Users\\Administrator\\Desktop\\123.xlsx";
        String over = "C:\\Users\\Administrator\\Desktop\\123.xlsx.pdf";
        try {
            WordToPDF(start, over);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
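The code above connects to port 8300 immediately after starting soffice, so the connection attempt can fail if the service has not started listening yet. One possible remedy is to poll the port before calling connect(). This is a sketch using only the JDK; PortWaiter and waitForPort are hypothetical names, and the host and port in main are the values from the code above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: waits until a TCP port accepts connections.
class PortWaiter {

    // Retries a plain TCP connect until it succeeds or timeoutMillis has elapsed.
    static boolean waitForPort(String host, int port, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), 500);
                return true; // something is listening
            } catch (IOException e) {
                try {
                    Thread.sleep(200); // not ready yet, retry shortly
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Call this between Runtime.exec(...) and connection.connect().
        boolean ready = waitForPort("127.0.0.1", 8300, 10_000);
        System.out.println("office service ready: " + ready);
    }
}
```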
To keep tracked changes (revision marks) out of the converted file, decompile jodconverter-2.2.2.jar, take the class OpenOfficeDocumentConverter.java and modify its loadAndExport method as follows (the added code is the block that sets RedlineDisplayType):
private void loadAndExport(String inputUrl, Map/* <String,Object> */ loadProperties, String outputUrl,
        Map/* <String,Object> */ storeProperties) throws OpenOfficeException {
    XComponent document;
    try {
        document = loadDocument(inputUrl, loadProperties);
    } catch (ErrorCodeIOException errorCodeIOException) {
        throw new OpenOfficeException(
                "conversion failed: could not load input document; OOo errorCode: " + errorCodeIOException.ErrCode,
                errorCodeIOException);
    } catch (Exception otherException) {
        throw new OpenOfficeException("conversion failed: could not load input document", otherException);
    }
    if (document == null) {
        throw new OpenOfficeException("conversion failed: could not load input document");
    }

    // Added: switch off the display of tracked changes before exporting
    XPropertySet mxDocProps = (XPropertySet) UnoRuntime.queryInterface(XPropertySet.class, document);
    try {
        mxDocProps.setPropertyValue("RedlineDisplayType", RedlineDisplayType.NONE);
    } catch (Exception e) {
        throw new OpenOfficeException("dispose RedlineDisplay failed", e);
    }

    refreshDocument(document);

    try {
        storeDocument(document, outputUrl, storeProperties);
    } catch (ErrorCodeIOException errorCodeIOException) {
        throw new OpenOfficeException(
                "conversion failed: could not save output document; OOo errorCode: " + errorCodeIOException.ErrCode,
                errorCodeIOException);
    } catch (Exception otherException) {
        throw new OpenOfficeException("conversion failed: could not save output document", otherException);
    }
}
Supplement: kkFileView is an online file preview tool built on LibreOffice and jodconverter; it is powerful and easy to use.
Git repository: https://github.com/kekingcn/kkFileView
Blog: https://my.oschina.net/keking/blog/3064732