Working with Excel .xlsx files using DocumentFormat.OpenXml
1. Getting started
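DocumentFormat.OpenXml is Microsoft's official component for working with the newer file formats of the three main Office applications (.xlsx, .docx, .pptx). Its distinguishing feature is that it defines (probably) every object in the Open XML format, so it can fine-tune file content and formatting precisely. As a result it is not as easy to pick up as EPPlus, and its performance depends heavily on the skill of whoever is using it.
The DocumentFormat.OpenXml API is very close to manipulating the XML directly, so before using it on Excel files you need to know the XML document structure of an Excel workbook: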
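(The structure diagram originally shown here came from a source I no longer remember.) WorkbookPart contains four important child nodes:
- WorksheetPart: holds the actual sheet data and is the most complex part of the structure. Besides Columns and SheetData, the child nodes of Worksheet also include the merged-cell collection MergeCells (missing from the diagram);
- Sheets: stores each sheet's id and name (Sheet1, Sheet2...). There is an Excel pitfall here: when the workbook has several sheets, calling Sheets.First() directly may return the last sheet, so it is safer to search by Name;
- WorkbookStylesPart: stores the styles;
- SharedStringTablePart (missing from the diagram above): the shared string collection. Strings are stored here by default and each Cell references one by its array index, which is also why an .xlsx holding something like 10,000 rows of "一二三亖" is smaller than the equivalent .txt.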
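Since an .xlsx file is a ZIP package of XML parts, one quick way to get familiar with the structure described above is to list the entries of the package. The following is only a minimal sketch: it uses System.IO.Compression from the base class library rather than DocumentFormat.OpenXml, and the class name XlsxPartLister is made up for illustration.

```csharp
using System;
using System.IO.Compression;

public static class XlsxPartLister
{
    // Prints the internal path and uncompressed size of every entry in the package.
    public static void ListParts(string xlsxPath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(xlsxPath))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                Console.WriteLine($"{entry.FullName} ({entry.Length} bytes)");
            }
        }
    }
}
```

Calling XlsxPartLister.ListParts with the path of a workbook saved by Excel typically shows entries such as xl/workbook.xml, xl/worksheets/sheet1.xml, xl/styles.xml and xl/sharedStrings.xml, which correspond to the parts listed above. On .NET Framework, ZipFile additionally needs a reference to the System.IO.Compression.FileSystem assembly.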
- using System;
- using System.Collections.Generic;
- using System.Linq;
- using System.Xml;
- using DocumentFormat.OpenXml;
- using DocumentFormat.OpenXml.Packaging;
- using DocumentFormat.OpenXml.Spreadsheet;
- namespace EOpenXml
- {
- public static class OpenXmlExcelExtentions
- {
- public static Sheet GetSheet(this WorkbookPart workbookPart, string sheetName)
- {
- return workbookPart.Workbook
- .GetFirstChild<Sheets>()
- .Elements<Sheet>().Where(s => s.Name == sheetName).FirstOrDefault();
- }
- /// <summary>
- /// Given a worksheet and a row index, return the row.
- /// </summary>
- /// <param name="sheetData"></param>
- /// <param name="rowIndex"></param>
- /// <returns></returns>
- public static Row GetRow(this SheetData sheetData, uint rowIndex)
- {
- return sheetData.
- Elements<Row>().Where(r => r.RowIndex == rowIndex).FirstOrDefault();
- }
- public static Cell GetCell(this SheetData sheetData, string columnName, uint rowIndex)
- {
- Row row = GetRow(sheetData, rowIndex);
- if (row == null)
- return null;
- // Case-insensitive match on the cell reference, e.g. "B" + 3 -> "B3".
- return row.Elements<Cell>().Where(c => string.Compare
- (c.CellReference.Value, columnName +
- rowIndex, true) == 0).FirstOrDefault();
- }
- // https://msdn.microsoft.com/en-us/library/office/cc861607.aspx
- // Given a column name, a row index, and a WorksheetPart, inserts a cell into the worksheet.
- // If the cell already exists, returns it.
- public static Cell GetOrCreateCell(this SheetData sheetData, string columnName, uint rowIndex)
- {
- string cellReference = columnName + rowIndex;
- // If the worksheet does not contain a row with the specified row index, insert one.
- Row row;
- if (sheetData.Elements<Row>().Where(r => r.RowIndex == rowIndex).Count() != 0)
- {
- row = sheetData.Elements<Row>().Where(r => r.RowIndex == rowIndex).First();
- }
- else
- {
- row = new Row() { RowIndex = rowIndex };
- sheetData.Append(row);
- }
- return row.GetOrCreateCell(cellReference);
- }
- public static Cell GetOrCreateCell(this Row row, string cellReference)
- {
- // If there is not a cell with the specified column name, insert one.
- if (row.Elements<Cell>().Where(c => c?.CellReference?.Value == cellReference).Count() > 0)
- {
- return row.Elements<Cell>().Where(c => c.CellReference.Value == cellReference).First();
- }
- else
- {
- // Cells must be in sequential order according to CellReference. Determine where to insert the new cell.
- Cell refCell = null;
- foreach (Cell cell in row.Elements<Cell>())
- {
- if (cell.CellReference.Value.Length == cellReference.Length)
- {
- if (string.Compare(cell.CellReference.Value, cellReference, true) > 0)
- {
- refCell = cell;
- break;
- }
- }
- }
- Cell newCell = new Cell() { CellReference = cellReference };
- row.InsertBefore(newCell, refCell);
- return newCell;
- }
- }
- public static string GetValue(this Cell cell, SharedStringTablePart shareStringPart)
- {
- if (cell == null)
- return null;
- string cellvalue = cell.InnerText;
- if (cell.DataType != null)
- {
- if (cell.DataType == CellValues.SharedString)
- {
- int id = -1;
- if (Int32.TryParse(cellvalue, out id))
- {
- SharedStringItem item = GetItem(shareStringPart, id);
- if (item.Text != null)
- {
- //code to take the string value
- cellvalue = item.Text.Text;
- }
- else if (item.InnerText != null)
- {
- cellvalue = item.InnerText;
- }
- else if (item.InnerXml != null)
- {
- cellvalue = item.InnerXml;
- }
- }
- }
- }
- return cellvalue;
- }
- public static string GetValue(this Cell cell, string[] shareStringPartValues)
- {
- if (cell == null)
- return null;
- string cellvalue = cell.InnerText;
- if (cell.DataType != null)
- {
- if (cell.DataType == CellValues.SharedString)
- {
- int id = -1;
- if (Int32.TryParse(cellvalue, out id))
- {
- cellvalue = shareStringPartValues[id];
- }
- }
- }
- return cellvalue;
- }
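- // shareStringItemIndex == -1 means "no shared string index given"; styleIndex == 0 means "leave the style untouched".
- public static Cell SetValue(this Cell cell, object value = null, SharedStringTablePart shareStringPart = null, int shareStringItemIndex = -1, uint styleIndex = 0)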
- {
- if (value == null)
- {
- cell.CellValue = new CellValue();
- if (shareStringItemIndex != -1)
- {
- cell.CellValue = new CellValue(shareStringItemIndex.ToString());
- cell.DataType = new EnumValue<CellValues>(CellValues.SharedString);
- }
- }
- else if (value is string str)
- {
- if (shareStringPart == null)
- {
- cell.CellValue = new CellValue(str);
- cell.DataType = new EnumValue<CellValues>(CellValues.String);
- }
- else
- {
- // Insert the text into the SharedStringTablePart.
- int index = shareStringPart.GetOrInsertItem(str, false);
- // Set the value of cell
- cell.CellValue = new CellValue(index.ToString());
- cell.DataType = new EnumValue<CellValues>(CellValues.SharedString);
- }
- }
- else if (value is int || value is short || value is long ||
- value is float || value is double || value is uint ||
- value is ulong || value is ushort || value is decimal)
- {
- cell.CellValue = new CellValue(value.ToString());
- cell.DataType = new EnumValue<CellValues>(CellValues.Number);
- }
- else if (value is DateTime date)
- {
- cell.CellValue = new CellValue(date.ToString("yyyy-MM-dd")); // ISO 8601
- cell.DataType = new EnumValue<CellValues>(CellValues.Date);
- }
- else if (value is XmlDocument xd)
- {
- if (shareStringPart == null)
- {
- throw new Exception("Param [shareStringPart] can't be null when value type is XmlDocument.");
- }
- else
- {
- int index = shareStringPart.GetOrInsertItem(xd.OuterXml, true);
- // Set the value of cell
- cell.CellValue = new CellValue(index.ToString());
- cell.DataType = new EnumValue<CellValues>(CellValues.SharedString);
- }
- }
- if (styleIndex != 0)
- cell.StyleIndex = styleIndex;
- return cell;
- }
- // https://msdn.microsoft.com/en-us/library/office/gg278314.aspx
- // Given text and a SharedStringTablePart, creates a SharedStringItem with the specified text
- // and inserts it into the SharedStringTablePart. If the item already exists, returns its index.
- public static int GetOrInsertItem(this SharedStringTablePart shareStringPart, string content, bool isXml)
- {
- // If the part does not contain a SharedStringTable, create one.
- if (shareStringPart.SharedStringTable == null)
- {
- shareStringPart.SharedStringTable = new SharedStringTable();
- }
- int i = 0;
- // Iterate through all the items in the SharedStringTable. If the text already exists, return its index.
- foreach (SharedStringItem item in shareStringPart.SharedStringTable.Elements<SharedStringItem>())
- {
- if ((!isXml && item.InnerText == content) || (isXml && item.OuterXml == content))
- {
- return i;
- }
- i++;
- }
- // The text does not exist in the part. Create the SharedStringItem and return its index.
- if (isXml)
- shareStringPart.SharedStringTable.AppendChild(new SharedStringItem(content));
- else
- shareStringPart.SharedStringTable.AppendChild(new SharedStringItem(new Text(content)));
- shareStringPart.SharedStringTable.Save();
- return i;
- }
- private static SharedStringItem GetItem(this SharedStringTablePart shareStringPart, int id)
- {
- return shareStringPart.SharedStringTable.Elements<SharedStringItem>().ElementAt(id);
- }
- /// <summary>
- /// https://docs.microsoft.com/en-us/office/open-xml/how-to-merge-two-adjacent-cells-in-a-spreadsheet
- /// </summary>
- /// <param name="worksheet"></param>
- /// <returns></returns>
- public static MergeCells GetOrCreateMergeCells(this Worksheet worksheet)
- {
- MergeCells mergeCells;
- if (worksheet.Elements<MergeCells>().Count() > 0)
- {
- mergeCells = worksheet.Elements<MergeCells>().First();
- }
- else
- {
- mergeCells = new MergeCells();
- // Insert a MergeCells object into the specified position.
- if (worksheet.Elements<CustomSheetView>().Count() > 0)
- {
- worksheet.InsertAfter(mergeCells, worksheet.Elements<CustomSheetView>().First());
- }
- else if (worksheet.Elements<DataConsolidate>().Count() > 0)
- {
- worksheet.InsertAfter(mergeCells, worksheet.Elements<DataConsolidate>().First());
- }
- else if (worksheet.Elements<SortState>().Count() > 0)
- {
- worksheet.InsertAfter(mergeCells, worksheet.Elements<SortState>().First());
- }
- else if (worksheet.Elements<AutoFilter>().Count() > 0)
- {
- worksheet.InsertAfter(mergeCells, worksheet.Elements<AutoFilter>().First());
- }
- else if (worksheet.Elements<Scenarios>().Count() > 0)
- {
- worksheet.InsertAfter(mergeCells, worksheet.Elements<Scenarios>().First());
- }
- else if (worksheet.Elements<ProtectedRanges>().Count() > 0)
- {
- worksheet.InsertAfter(mergeCells, worksheet.Elements<ProtectedRanges>().First());
- }
- else if (worksheet.Elements<SheetProtection>().Count() > 0)
- {
- worksheet.InsertAfter(mergeCells, worksheet.Elements<SheetProtection>().First());
- }
- else if (worksheet.Elements<SheetCalculationProperties>().Count() > 0)
- {
- worksheet.InsertAfter(mergeCells, worksheet.Elements<SheetCalculationProperties>().First());
- }
- else
- {
- worksheet.InsertAfter(mergeCells, worksheet.Elements<SheetData>().First());
- }
- worksheet.Save();
- }
- return mergeCells;
- }
- /// <summary>
- /// Given the names of two adjacent cells, merges the two cells.
- /// Create the merged cell and append it to the MergeCells collection.
- /// When two cells are merged, only the content from one cell is preserved:
- /// the upper-left cell for left-to-right languages or the upper-right cell for right-to-left languages.
- /// </summary>
- /// <param name="mergeCells"></param>
- /// <param name="cell1Name"></param>
- /// <param name="cell2Name"></param>
- public static void MergeTwoCells(this MergeCells mergeCells, string cell1Name, string cell2Name)
- {
- MergeCell mergeCell = new MergeCell() { Reference = new StringValue(cell1Name + ":" + cell2Name) };
- mergeCells.Append(mergeCell);
- }
- public static IEnumerable<string> GetItemValues(this SharedStringTablePart shareStringPart)
- {
- foreach (var item in shareStringPart.SharedStringTable.Elements<SharedStringItem>())
- {
- if (item.Text != null)
- {
- //code to take the string value
- yield return item.Text.Text;
- }
- else if (item.InnerText != null)
- {
- yield return item.InnerText;
- }
- else if (item.InnerXml != null)
- {
- yield return item.InnerXml;
- }
- else
- {
- yield return null;
- }
- };
- }
- public static XmlDocument GetCellAssociatedSharedStringItemXmlDocument(this SheetData sheetData, string columnName, uint rowIndex, SharedStringTablePart shareStringPart)
- {
- Cell cell = GetCell(sheetData, columnName, rowIndex);
- if (cell == null)
- return null;
- if (cell.DataType == CellValues.SharedString)
- {
- int id = -1;
- if (Int32.TryParse(cell.InnerText, out id))
- {
- SharedStringItem ssi = shareStringPart.GetItem(id);
- var doc = new XmlDocument();
- doc.LoadXml(ssi.OuterXml);
- return doc;
- }
- }
- return null;
- }
- }
- }
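The sheet-order pitfall mentioned in section 1 can be avoided by resolving a sheet through its name and relationship id instead of relying on enumeration order. The sketch below shows one way to do that; the class and method names (SheetLookupExample, FindSheetData) are hypothetical, while GetPartById, Sheet.Id and Sheet.Name are the SDK members it relies on.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

public static class SheetLookupExample
{
    // Returns the SheetData of the sheet with the given name,
    // or null when no such worksheet exists.
    public static SheetData FindSheetData(WorkbookPart workbookPart, string sheetName)
    {
        Sheet sheet = workbookPart.Workbook
            .Descendants<Sheet>()
            .FirstOrDefault(s => s.Name != null && s.Name.Value == sheetName);
        if (sheet == null || sheet.Id == null)
            return null;

        // Sheet.Id is a relationship id pointing at the part that holds the data.
        // The cast yields null for parts that are not worksheets (e.g. chart sheets).
        var worksheetPart = workbookPart.GetPartById(sheet.Id.Value) as WorksheetPart;
        return worksheetPart?.Worksheet.GetFirstChild<SheetData>();
    }
}
```

With this helper, a call such as SheetLookupExample.FindSheetData(doc.WorkbookPart, "Sheet1") returns the same sheet regardless of the order in which the parts are stored in the package.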
2. Inserting data
- private static void GenerateExcel()
- {
- using (MemoryStream mem = new MemoryStream())
- {
- using (var temp = File.OpenRead(@"E:\template.xlsx"))
- {
- temp.CopyTo(mem);
- }
- using (SpreadsheetDocument doc = SpreadsheetDocument.Open(mem, true))
- {
- WorkbookPart wbPart = doc.WorkbookPart;
- Worksheet worksheet = wbPart.WorksheetParts.First().Worksheet;
- //statement to get the sheetdata which contains the rows and cell in table
- SheetData sheetData = worksheet.GetFirstChild<SheetData>();
- SharedStringTablePart shareStringPart;
- if (wbPart.GetPartsOfType<SharedStringTablePart>().Any())
- shareStringPart = wbPart.GetPartsOfType<SharedStringTablePart>().First();
- else
- shareStringPart = wbPart.AddNewPart<SharedStringTablePart>();
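- // Assume the first row of the template is the title and stays untouched,
- // and that every style we need is defined on the cells of the second row.
- var secondRow = sheetData.GetRow(2);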
- uint[] lineStyles = secondRow.Elements<Cell>().Select(c => c.StyleIndex.Value).ToArray();
- sheetData.RemoveChild(secondRow);
- // Starting from the second row, insert 1000 rows of 4 columns each
- uint currentRowIndex = 2;
- for (int i = 0; i < 1000; i++)
- {
- Row row = new Row();
- row.RowIndex = currentRowIndex; // set the row number
- row.AppendChild(new Cell().SetValue(1, shareStringPart, styleIndex: lineStyles[0])); // sample integer value
- row.AppendChild(new Cell().SetValue(DateTime.Now, shareStringPart, styleIndex: lineStyles[1]));
- row.AppendChild(new Cell().SetValue(3.1415926535, shareStringPart, styleIndex: lineStyles[2]));
- row.AppendChild(new Cell().SetValue("通商宽衣", shareStringPart, styleIndex: lineStyles[3])); // this is the slow part
- sheetData.AppendChild(row);
- currentRowIndex++;
- }
- wbPart.Workbook.Save();
- doc.SaveAs($@"E:\Temp_{DateTime.Now.ToString("yyMMddHHmm")}.xlsx");
- doc.Close();
- }
- mem.Close();
- }
- }
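The code above produces a standard .xlsx file that Excel opens without any format error prompts. There is room for optimization, though: every string insertion loops over the whole shared string table. Calling shareStringPart.GetItemValues().ToArray() once and keeping the items in an array or a Dictionary<string,int> is much faster, and if you already know what the table contains you can skip the duplicate check altogether. You can also take the blunt approach and store strings as CellValues.InlineString, but when many strings repeat, the two approaches produce files of very different sizes.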
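The Dictionary<string,int> idea can be sketched as a small cache that reads the shared string table once and then answers lookups in constant time. This is an illustration under assumptions, not part of the extension class above: SharedStringCache and its members are hypothetical names, and it handles plain-text items only (not the XML items that GetOrInsertItem supports with isXml set to true).

```csharp
using System.Collections.Generic;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

public sealed class SharedStringCache
{
    private readonly SharedStringTable _table;
    private readonly Dictionary<string, int> _indexByText = new Dictionary<string, int>();
    private int _count;

    public SharedStringCache(SharedStringTablePart part)
    {
        if (part.SharedStringTable == null)
            part.SharedStringTable = new SharedStringTable();
        _table = part.SharedStringTable;

        // Index the existing items once; for duplicates keep the first index.
        foreach (SharedStringItem item in _table.Elements<SharedStringItem>())
        {
            if (!_indexByText.ContainsKey(item.InnerText))
                _indexByText[item.InnerText] = _count;
            _count++;
        }
    }

    // Returns the index of the text, appending a new item only when it is missing.
    public int GetOrAdd(string text)
    {
        if (_indexByText.TryGetValue(text, out int index))
            return index;

        _table.AppendChild(new SharedStringItem(new Text(text)));
        index = _count++;
        _indexByText[text] = index;
        return index;
    }

    // Writes the table back to its part; call once after all insertions.
    public void Save()
    {
        _table.Save();
    }
}
```

Inside the insertion loop, the slow call could then become new Cell().SetValue(shareStringItemIndex: cache.GetOrAdd("通商宽衣"), styleIndex: lineStyles[3]), which uses the existing SetValue branch for a null value with a known shared string index. Saving the table once at the end, rather than after every insertion, is a design choice of this sketch.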
3. Fast traversal
- public static void Read()
- {
- using (var sd = SpreadsheetDocument.Open(@"E:\temp.xlsx", false))
- {
- WorkbookPart wbPart = sd.WorkbookPart;
- // The document is opened read-only, so a missing shared string part cannot be created here;
- // fall back to an empty array instead.
- SharedStringTablePart shareStringPart = wbPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
- string[] shareStringItemValues = shareStringPart?.SharedStringTable == null
- ? new string[0]
- : shareStringPart.GetItemValues().ToArray();
- WorksheetPart worksheetPart = wbPart.WorksheetParts.First();
- uint dataRowStart = 2; // first data row (row 1 is the title)
- OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);
- while (reader.Read())
- {
- if (reader.ElementType == typeof(Worksheet))
- {
- reader.ReadFirstChild();
- }
- if (reader.ElementType == typeof(Row))
- {
- Row r = (Row)reader.LoadCurrentElement();
- if (r.RowIndex < dataRowStart)
- continue;
- foreach (Cell c in r.Elements<Cell>())
- {
- if (c.CellReference != null && c.CellReference.HasValue)
- {
- string cv = c.GetValue(shareStringItemValues);
- Console.WriteLine(cv);
- if (c.CellReference.Value == "B" + r.RowIndex)
- Console.WriteLine("Just read column B");
- }
- }
- }
- }
- sd.Close();
- }
- }
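The column check in the loop above builds the string "B" + r.RowIndex for every cell. When several columns need to be distinguished, it can be more convenient to split the cell reference into its column letters and row number. The helper below is a minimal sketch with hypothetical names (CellReferenceHelper, Split, ColumnIndex); it assumes plain A1-style references without '$' signs or sheet prefixes.

```csharp
using System;
using System.Globalization;

public static class CellReferenceHelper
{
    // Splits an A1-style reference such as "B12" into its column ("B") and row (12).
    public static (string Column, uint Row) Split(string cellReference)
    {
        string upper = cellReference.ToUpperInvariant();
        int i = 0;
        while (i < upper.Length && upper[i] >= 'A' && upper[i] <= 'Z')
            i++;

        if (i == 0 || i == upper.Length)
            throw new FormatException("Not an A1-style cell reference: " + cellReference);

        string column = upper.Substring(0, i);
        uint row = uint.Parse(upper.Substring(i), CultureInfo.InvariantCulture);
        return (column, row);
    }

    // Converts column letters (upper case A-Z) to a 1-based index: A = 1, Z = 26, AA = 27.
    public static int ColumnIndex(string column)
    {
        int index = 0;
        foreach (char ch in column)
            index = index * 26 + (ch - 'A' + 1);
        return index;
    }
}
```

In the read loop, CellReferenceHelper.Split(c.CellReference.Value).Column == "B" would replace the string concatenation, and ColumnIndex turns the letters into a position that can index into an array of column handlers.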
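4. Summary
DocumentFormat.OpenXml is not friendly, but it is transparent about what it does. It is best to wrap it in your own helpers before using it; once you get used to it, I believe it becomes a pleasure to work with.
That's all.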
这里说的是Unity通过StartCoroutine开启IEnumerator协程里的yield相关 1.yield return 0,yield return null 等待下一帧接着执行下面的内容 ...