Open XML SDK class structure

March 27, 2012 by Vincent

I’ve gotten a few questions on the class structure of the Open XML SDK. There are articles on Open XML itself, where you work directly with XML files and tags, and zip them up yourself. Basically you work with WordprocessingML (Word), SpreadsheetML (Excel), PresentationML (PowerPoint) and possibly DrawingML (for images and stuff). Eric White did a lot of stuff on this.

There are also articles on the use of Open XML SDK itself. However, the articles I’ve found tend to give you code samples and some explanation of why you do those things, but didn’t really explain the deep “why”.

The fundamental question I was asked was “How are the Open XML SDK classes related to each other?“. A related question is “Why do I have to use that particular class?”.

While I’m familiar with the spreadsheet portion of the SDK, I believe the general class structure applies to the WordprocessingML and PresentationML parts too. So I’ll use the SpreadsheetML part as the example.

I also didn’t find any article giving names to the class categories I’m going to tell you, so I’m going to make up my own. If there’s an official Microsoft article on this, let me know. Generally speaking, there are 4 types of SDK classes:

  • Relationship Part Classes (henceforth referred to as RPCs)
  • Root Classes (henceforth referred to as RCs)
  • Content Classes (henceforth referred to as CCs)
  • Special Classes

Before I continue, keep in mind that Open XML documents are basically a bunch of XML files zipped together. Just like a relational database, certain XML files are related to each other (just like certain database tables are related to each other). Which brings us to the first type of class.

Relationship Part Classes

RPCs are the glue that holds certain SDK classes together. They are most easily recognisable by their class type name. For example, WorkbookPart and WorksheetPart.

However, not all SDK classes with names that end with “Part” are RPCs. For example, TablePart is not an RPC (it’s actually a Content Class). The corresponding RPC is actually TableDefinitionPart.

The most important part of an RPC is that it carries a relationship ID, and this relationship ID is used to tie relevant classes together. An RPC is also different in that it has as a property, a Root Class.

Root Classes

Remember that Open XML documents are a bunch of XML files zipped together? Well, an RC represents one of those XML files.

For example, the RPC WorksheetPart has as its RC, the Worksheet class. The Worksheet class holds information that basically translates into an XML file, typically sheet1.xml (and sheet2.xml and so on). The Worksheet class contains your worksheet cell data information.

Content Classes

If RCs represent an XML file, then CCs are basically XML tags.

For example, the Worksheet class contains the SheetData class, which contains the Row class(es), which in turn contains the Cell class(es). The corresponding XML tags are “worksheet”, “sheetData”, “row” and “c”.

Yes, an RC represents an XML file, and also translates to be the first XML tag of that XML file. That’s why it’s called a Root Class, because it also represents the root element (of the underlying XML structure/document).

Special Classes

These aren’t really that special. As far as I know, there are only 3 classes under this category: WordprocessingDocument, SpreadsheetDocument and PresentationDocument.

Those 3 classes form the starting point of any code relying on the Open XML SDK. You can consider them as Super Relationship Part Classes, because their properties are mainly RPCs.

An illustration

You might still be confused at this point (I don’t blame you…). Here’s a diagram for a simple Open XML spreadsheet:

In green, we have the Special Class SpreadsheetDocument as the ultimate root.

In blue, we have the RPCs, 1 WorkbookPart class and 2 WorksheetPart classes. The SpreadsheetDocument class has the WorkbookPart class as a property. The WorkbookPart class contains a collection of WorksheetPart classes.

In grey, we have the RCs, 1 Workbook class and 2 Worksheet classes. The Workbook class is the RC of WorkbookPart class. The Worksheet classes are RCs of corresponding WorksheetPart classes. The Workbook class represents the workbook.xml file and the Worksheet classes (typically) represent sheet1.xml and sheet2.xml files.

In orange, we have the CCs. The Workbook class contains the Sheets class, which in turn contains 2 Sheet classes. The Sheet classes have a property holding the relationship ID of the corresponding WorksheetPart classes, which is how they’re tied to the Worksheet classes.

One of the most confusing parts…

After working with the Open XML SDK for a while, you might find yourself asking these questions:

  • Why are there so many classes?
  • Why are some of these classes devoid of any meaningful functionality?
  • Why are some of these classes duplicates of each other?

When I was first using the SDK, I felt the same way when I first used the .NET Framework:Being overwhelmed. There were many namespaces, with many classes in them, and I didn’t know which class to use for a specific purpose until I looked it up and wrote a small test program for it. Having a comprehensive help database/file for the .NET Framework was a really good idea.

And so it was with the Open XML SDK. I mean, it’s a spreadsheet. I can see a couple dozen of classes. Maybe. It turns out to be a lot of classes. That’s why there are so many code samples out there, but you don’t really know why you need to use that particular class.

And there are classes without any meaningful properties or functions. They inherit from a base Open XML class, and that’s it. For example, the Sheets class.

Then there are classes with identical properties. For example, the InlineString class, the SharedStringItem class and CommentText class. Or the Color, BackgroundColor, ForegroundColor and TabColor classes.

The answer is the same for all the above questions. The Open XML SDK is meant to abstract away the XML file structure.

There are duplicate classes because each class eventually translates into an XML tag (if it’s a CC or RC). XML requires a different tag for different purposes, hence the Open XML SDK has different classes. Even though the classes are identical in programming functionality, they become rendered as different XML tags.

There are classes without any seemingly meaningful properties or functions because their sole purpose is to have children. (Ooh Open XML SDK joke!) The Sheets class has as children, the Sheet classes. In XML, they’re correspondingly the “sheets” tag with “sheet” tags as children. The final XML tags have no XML attributes, hence the corresponding SDK classes also have no properties. Tada!

And finally, there are so many classes, because frankly speaking, you need one SDK class corresponding to each individually different XML tag. There are a lot of XML tags used in Open XML, hence so many classes. And that’s before we add in the Relationship Part Classes.

If you have any questions, leave them in a comment or contact me. And I’ll see if I can answer them in an article.

转自:http://polymathprogrammer.com/2012/03/27/open-xml-sdk-class-structure/

【转】Open XML SDK class structure的更多相关文章

  1. Csharp: create word file using Open XML SDK 2.5

    using System; using System.Collections.Generic; using System.ComponentModel; using System.Data; usin ...

  2. 下载和编译 Open XML SDK

    我们需要一些工具来开始 Open XML 的开发. 开发工具 推荐的开发工具是 Visual Studio 社区版. 开发工具:Visual Studio Community 2013 下载地址:ht ...

  3. Working with WordprocessingML documents (Open XML SDK)

    Last modified: January 13, 2012 Applies to: Office 2013 | Open XML This section provides conceptual ...

  4. User Word Automation Services and Open XML SDK to generate word files in SharePoint2010

    SharePoint 2010 has established a new service called "Word Automation Services" to operate ...

  5. Csharp: read excel file using Open XML SDK 2.5

    using System; using System.Collections.Generic; using System.ComponentModel; using System.Data; usin ...

  6. Open Xml SDK Word模板开发最佳实践(Best Practice)

    1.概述 由于前面的引文已经对Open Xml SDK做了一个简要的介绍. 这次来点实际的——Word模板操作. 从本质上来讲,本文的操作都是基于模板替换思想的,即,我们通过替换Word模板中指定元素 ...

  7. Open Xml SDK 引文

    什么是Open Xml SDK? 什么是Open Xml? 首先,我们得知道,Open Xml为何物? 我们还是给她起个名字——就叫 “开放Xml”,以方便我们中文的阅读习惯.之所以起开放这个名字,因 ...

  8. Open XML SDK 在线编程黑客松

    2015年2月10日-3月20日,开源社 成员 微软开放技术,GitCafe,极客学院联合举办" Open XML SDK 在线编程黑客松 ",为专注于开发提高生产力的应用及服务的 ...

  9. Embedding Documents in Word 2007 by Using the Open XML SDK 2.0 for Microsoft Office

    Download the sample code This visual how-to article presents a solution that creates a Word 2007 doc ...

随机推荐

  1. UML要点总结(一)

    UML中的事物 UML事物包含结构事物.行为事物.组织事物和辅助事物. 结构事物: 类.接口.用例.协作.活动类.组件和节点. 行为事物: 也称动作事物,交互和状态机. 组织事物: 也称分组事物,仅仅 ...

  2. AHCI vs NVMe

    http://www.hkepc.com/13139 儘管現時有不少高階 SSD 產品改用 PCIe 接口,以突破 SATA 接口的頻寬瓶頸,但控制器設計與 SATA  接口 SSD 一樣,採用老舊的 ...

  3. careercup-栈与队列 3.1

    3.1 描述如何只用一个数组来实现三个栈. 解答 我们可以很容易地用一个数组来实现一个栈,压栈就往数组里插入值,栈顶指针加1: 出栈就直接将栈顶指针减1:取栈顶值就把栈顶指针指向的单元的值返回: 判断 ...

  4. 节点类(CCNode)

    节点与渲染树 回顾前面的介绍,我们已经知道了精灵.层和场景如何构成一个游戏的框架.精灵属于层,层属于场景,玩家与精灵互动,并导致游戏画面在不同场景中切换.把每个环节拼接在一起,我们得到了一个完整的关系 ...

  5. Java基础知识强化之集合框架笔记60:Map集合之TreeMap(TreeMap<Student,String>)的案例

    1. TreeMap(TreeMap<Student,String>)的案例 2. 案例代码: (1)Student.java: package cn.itcast_04; public ...

  6. Android(java)学习笔记177:BroadcastReceiver之 应用程序安装和卸载 的广播接收者

           国内的主流网络公司(比如网易.腾讯.百度等等),他们往往采用数据挖掘技术获取用户使用信息,从而采用靶向营销.比如电脑上,我们浏览网页的时候,往往会发现网页上会出现我们之前经常浏览内容的商 ...

  7. Android UI 开发

    今天主要学习了Android UI开发的几个知识 6大布局 样式和主题→自定义样式.主题 JUnit单元测试 Toast弹窗功能简介 6大布局 RelativeLayout LinearLayout ...

  8. Linux文件/目录权限整理

  9. Spring整合CXF步骤,Spring实现webService,spring整合WebService

    Spring整合CXF步骤 Spring实现webService, spring整合WebService >>>>>>>>>>>> ...

  10. Andriod中WebView加载登录界面获取Cookie信息并同步保存,使第二次不用登录也可查看个人信息。

    Android使用WebView加载登录的html界面,则通过登录成功获取Cookie并同步,可以是下一次不用登录也可以查看到个人信息,注:如果初始化加载登录,可通过缓存Cookie信息来验证是否要加 ...