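Recently, while using POI to write a large amount of data to Excel, I found that Excel 2003 supports at most 65,536 rows per sheet, and that large data sets easily cause an OutOfMemoryError (OOM). I checked the API documentation: for the 2003 format (.xls), each sheet holds at most 65,536 rows, so if the data volume far exceeds that, it should be split across multiple sheets. Since POI 3.8 there is also SXSSFWorkbook, which supports writing large volumes of data to Excel, but for now it only supports the Excel 2007 (.xlsx) format.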
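Splitting records across sheets comes down to simple index arithmetic. The following is a minimal sketch in plain Java (no POI calls) of how a zero-based record index could be mapped to a sheet number and a row number. The class name SheetSplitter and its method names are hypothetical, and the sketch assumes no header row is reserved on each sheet.

```java
/** Hypothetical helper: maps a zero-based record index to a (sheet, row) position. */
final class SheetSplitter {
    // Maximum number of rows per sheet in the Excel 97-2003 (.xls) format.
    static final int XLS_MAX_ROWS = 65_536;

    /** Zero-based index of the sheet that holds the given record. */
    static int sheetIndex(long recordIndex, int rowsPerSheet) {
        return (int) (recordIndex / rowsPerSheet);
    }

    /** Zero-based row number of the record inside its sheet. */
    static int rowIndex(long recordIndex, int rowsPerSheet) {
        return (int) (recordIndex % rowsPerSheet);
    }

    /** Number of sheets needed to hold all records (rounds up). */
    static int sheetsNeeded(long totalRecords, int rowsPerSheet) {
        return (int) ((totalRecords + rowsPerSheet - 1) / rowsPerSheet);
    }

    public static void main(String[] args) {
        long total = 200_000;
        System.out.println("sheets needed: " + sheetsNeeded(total, XLS_MAX_ROWS));
        System.out.println("record 65536 -> sheet " + sheetIndex(65_536, XLS_MAX_ROWS)
                + ", row " + rowIndex(65_536, XLS_MAX_ROWS));
    }
}
```

In an export loop, a new sheet would be created whenever rowIndex returns 0. If each sheet needs a header row, rowsPerSheet would have to be reduced by one and the row number shifted accordingly.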

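HSSF is the POI project's pure Java implementation of the Excel 97(-2007) file format (.xls).
XSSF is the POI project's pure Java implementation of the Excel 2007 OOXML (.xlsx) file format.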

Starting with POI 3.8, a low-memory-footprint API built on top of XSSF is provided: SXSSF.

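SXSSF achieves its low memory footprint by using a sliding window to limit how many rows can be accessed, whereas XSSF gives access to all rows. Older rows that have left the window become inaccessible, because they have been written to disk.
In auto-flush mode you specify how many rows the window holds, which keeps that many rows in memory. Once the limit is reached, creating a new row moves the row with the lowest index out of the window and onto disk.
Alternatively, the window can be left to grow without a limit. In that case it is trimmed as needed by explicit calls to flushRows(int keepRows).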

SXSSF (Streaming Usermodel API)

SXSSF (package: org.apache.poi.xssf.streaming) is an API-compatible streaming extension of XSSF to be used when very large spreadsheets have to be produced, and heap space is limited. SXSSF achieves its low memory footprint by limiting access to the rows that are within a sliding window, while XSSF gives access to all rows in the document. Older rows that are no longer in the window become inaccessible, as they are written to the disk.

You can specify the window size at workbook construction time via new SXSSFWorkbook(int windowSize) or you can set it per-sheet via SXSSFSheet#setRandomAccessWindowSize(int windowSize)

When a new row is created via createRow() and the total number of unflushed records would exceed the specified window size, then the row with the lowest index value is flushed and cannot be accessed via getRow() anymore.

The default window size is 100 and defined by SXSSFWorkbook.DEFAULT_WINDOW_SIZE.

A windowSize of -1 indicates unlimited access. In this case all records that have not been flushed by a call to flushRows() are available for random access.
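To make the window semantics concrete, here is a toy model in plain Java. It is not POI code and makes no claim about POI's internals; it only mimics the behaviour described above, under the assumption that rows are created in ascending order. The class RowWindowModel and its methods are hypothetical names chosen for illustration.

```java
import java.util.TreeMap;

/** Toy model of a row window. NOT POI code; it only mimics the documented behaviour. */
final class RowWindowModel {
    private final int windowSize;                        // -1 means unlimited
    private final TreeMap<Integer, String> rows = new TreeMap<>();
    private int flushedCount = 0;

    RowWindowModel(int windowSize) {
        this.windowSize = windowSize;
    }

    /** Adds a row; if the window is now too large, the lowest-indexed rows are flushed. */
    void createRow(int rownum, String data) {
        rows.put(rownum, data);
        if (windowSize >= 0 && rows.size() > windowSize) {
            flushRows(windowSize);
        }
    }

    /** Keeps the last `keep` rows and drops the rest, lowest index first. */
    void flushRows(int keep) {
        while (rows.size() > keep) {
            rows.pollFirstEntry();   // a real implementation would write the row to a temp file here
            flushedCount++;
        }
    }

    /** Returns the row data, or null if the row was flushed or never created. */
    String getRow(int rownum) {
        return rows.get(rownum);
    }

    int flushedCount() {
        return flushedCount;
    }

    public static void main(String[] args) {
        RowWindowModel sheet = new RowWindowModel(100);
        for (int r = 0; r < 1000; r++) {
            sheet.createRow(r, "row " + r);
        }
        System.out.println(sheet.getRow(899));      // null: flushed
        System.out.println(sheet.getRow(900));      // row 900: still in the window
        System.out.println(sheet.flushedCount());   // 900
    }
}
```

With a window of -1 the model never flushes on its own, so every row stays accessible until flushRows is called explicitly.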

Note that SXSSF allocates temporary files that you must always clean up explicitly, by calling the dispose method.

SXSSFWorkbook defaults to using inline strings instead of a shared strings table. This is very efficient, since no document content needs to be kept in memory, but is also known to produce documents that are incompatible with some clients. With shared strings enabled, all unique strings in the document have to be kept in memory. Depending on your document content, this could use a lot more resources than with shared strings disabled.

Carefully review your memory budget and compatibility needs before deciding whether to enable shared strings or not.

The example below writes a sheet with a window of 100 rows. When the row count reaches 101, the row with rownum=0 is flushed to disk and removed from memory; when the row count reaches 102, the row with rownum=1 is flushed, and so on.

import java.io.FileOutputStream;
import java.io.IOException;

import junit.framework.Assert;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.util.CellReference;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

public class SxssfAutoFlushExample {
    public static void main(String[] args) throws IOException {
        SXSSFWorkbook wb = new SXSSFWorkbook(100); // keep 100 rows in memory; rows beyond that are flushed to disk
        Sheet sh = wb.createSheet();
        for (int rownum = 0; rownum < 1000; rownum++) {
            Row row = sh.createRow(rownum);
            for (int cellnum = 0; cellnum < 10; cellnum++) {
                Cell cell = row.createCell(cellnum);
                String address = new CellReference(cell).formatAsString();
                cell.setCellValue(address);
            }
        }

        // Rows with rownum < 900 are flushed and not accessible
        for (int rownum = 0; rownum < 900; rownum++) {
            // The sliding window already wrote these rows to disk and freed their memory, so getRow returns null
            Assert.assertNull(sh.getRow(rownum));
        }

        // the last 100 rows are still in memory
        for (int rownum = 900; rownum < 1000; rownum++) {
            Assert.assertNotNull(sh.getRow(rownum)); // still inside the window, so not yet flushed
        }

        // try-with-resources closes the stream even if write() fails
        try (FileOutputStream out = new FileOutputStream("D:\\sxssf.xlsx")) {
            wb.write(out);
        }

        // dispose of temporary files backing this workbook on disk
        wb.dispose();
    }
}

The next example turns off auto-flushing (windowSize=-1) and the code manually controls how portions of data are written to disk

import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.util.CellReference;
import org.apache.poi.xssf.streaming.SXSSFSheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

public class SxssfManualFlushExample {
    public static void main(String[] args) throws IOException {
        SXSSFWorkbook wb = new SXSSFWorkbook(-1); // turn off auto-flushing and accumulate all rows in memory
        Sheet sh = wb.createSheet();
        for (int rownum = 0; rownum < 1000; rownum++) {
            Row row = sh.createRow(rownum);
            for (int cellnum = 0; cellnum < 10; cellnum++) {
                Cell cell = row.createCell(cellnum);
                String address = new CellReference(cell).formatAsString();
                cell.setCellValue(address);
            }

            // manually control how rows are flushed to disk
            if (rownum % 100 == 0) {
                ((SXSSFSheet) sh).flushRows(100); // retain the last 100 rows and flush all others

                // ((SXSSFSheet) sh).flushRows() is a shortcut for ((SXSSFSheet) sh).flushRows(0),
                // which flushes all rows
            }
        }

        // try-with-resources closes the stream even if write() fails
        try (FileOutputStream out = new FileOutputStream("/temp/sxssf.xlsx")) {
            wb.write(out);
        }

        // dispose of temporary files backing this workbook on disk
        wb.dispose();
    }
}

SXSSF flushes sheet data to temporary files (one temp file per sheet), and these temporary files can grow very large. For example, for 20 MB of CSV data the temp XML can exceed a gigabyte. If the size of the temp files is an issue, you can tell SXSSF to use gzip compression:

  SXSSFWorkbook wb = new SXSSFWorkbook();
  wb.setCompressTempFiles(true); // temp files will be gzipped
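To get a feel for why gzip helps here, the following plain-Java sketch compresses some repetitive, XML-like row text and prints the sizes before and after. It does not call POI, and the markup it generates is only an illustrative stand-in for sheet XML, not the exact content SXSSF writes to its temp files. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

/** Illustrates how well repetitive row markup compresses. Not POI code. */
final class TempFileCompressionDemo {

    /** Builds XML-like text for rowCount rows of 10 cells each (format is illustrative only). */
    static byte[] sampleRowsXml(int rowCount) {
        StringBuilder sb = new StringBuilder();
        for (int r = 1; r <= rowCount; r++) {
            sb.append("<row r=\"").append(r).append("\">");
            for (char col = 'A'; col <= 'J'; col++) {
                sb.append("<c r=\"").append(col).append(r)
                  .append("\" t=\"inlineStr\"><is><t>").append(col).append(r)
                  .append("</t></is></c>");
            }
            sb.append("</row>\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    /** Gzips a byte array in memory and returns the compressed bytes. */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = sampleRowsXml(10_000);
        byte[] packed = gzip(raw);
        System.out.println("raw bytes:     " + raw.length);
        System.out.println("gzipped bytes: " + packed.length);
    }
}
```

The trade-off is extra CPU time for compressing on write and decompressing when the final workbook is assembled, so compression is mainly worth enabling when temp disk space is the bottleneck.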

The content above comes from the API documentation plus my own summary. For details, see the original documentation: http://poi.apache.org/spreadsheet/how-to.html
