3.1 XML Introduction

An XML document is made up of elements.

  • An element can have attributes (key/value pairs) and child nodes.
  • Child nodes can be elements or text.

XML is widely used for structured documents, configuration files, communication protocols, data interchange, and so on.


3.3  Parsing an XML Document

1. There are three kinds of parsers:

  • 1.1 DOM (Document Object Model) -- produces a tree structure
  • 1.2 SAX (Simple API for XML) -- notifies you whenever it encounters another feature (such as the start or end of an element)
  • 1.3 "pull parser" (StAX) -- you program a loop that fetches each feature

2. Unless the data set is huge, it is simplest to use the DOM. Here is how:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();

// usage 1): parse a file
File f = ...;
Document doc = builder.parse(f);

// usage 2): parse a URL (DocumentBuilder has no parse(URL) overload; pass the URL as a string)
URL u = ...;
Document doc = builder.parse(u.toString());

// usage 3): parse an InputStream
InputStream in = ...;
Document doc = builder.parse(in);
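
As a rough, self-contained sketch of usage 3, the class below parses XML held in a string by wrapping it in an InputStream. The class name DomParseDemo and the helper parse are hypothetical names chosen for illustration, and the checked exceptions are wrapped only to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class DomParseDemo {
    // Hypothetical helper: parses XML text and returns the DOM tree.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
            return builder.parse(in);
        } catch (Exception e) {
            // Parser configuration, I/O, and well-formedness errors all end up here.
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<config><entry id=\"background\"/></config>");
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```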

3. Analyzing the DOM

3.1 To analyze the DOM tree, start with the root element:

Element elem = doc.getDocumentElement();

3.2 When you have any element, you care about three pieces of information:

  • 1) Tag name
  • 2) Attributes
  • 3) Children

3.2.1 Get the tag name by calling:  elem.getTagName()

3.2.2 This code walks through all attributes:

NamedNodeMap attributes = elem.getAttributes();
for (int i = 0; i < attributes.getLength(); i++) {
    Node attribute = attributes.item(i);
    String name = attribute.getNodeName();
    String value = attribute.getNodeValue();
    ...
}

3.2.3 This code walks through all children:

NodeList children = elem.getChildNodes();
for (int i = 0; i < children.getLength(); i++) {
    Node child = children.item(i);
    ...
}
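
The three pieces of information (tag name, attributes, children) can be combined into one recursive walk. The following is only a sketch: DomWalkDemo, parseRoot, and describe are hypothetical names, and the output format (one indented line per element) is an arbitrary choice for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomWalkDemo {
    // Hypothetical helper: parses XML text and returns the root element.
    static Element parseRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Returns one line per element: indentation, tag name, then name=value pairs.
    static String describe(Element elem, int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) sb.append("  ");
        sb.append(elem.getTagName());
        NamedNodeMap attributes = elem.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attribute = attributes.item(i);
            sb.append(' ').append(attribute.getNodeName())
              .append('=').append(attribute.getNodeValue());
        }
        sb.append('\n');
        NodeList children = elem.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {          // skip text and other node types
                sb.append(describe((Element) child, depth + 1));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Element root = parseRoot("<config><entry id=\"background\"/></config>");
        System.out.print(describe(root, 0));
    }
}
```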

4. Node Types

4.1 Most applications need to process only Element and Text nodes. Unless your XML document has a DTD or schema, the DOM includes all whitespace as Text nodes.

4.2 You can filter out the text nodes like this:

Node child = children.item(i);
if (child instanceof Element) {
    Element childElement = (Element) child;
    ...
}

4.3 To get the text child of an element (for example, a <font> element that contains only text), call:

Text textNode = (Text) element.getFirstChild();
String text = textNode.getData();
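
To make the whitespace issue concrete, here is a small sketch that counts all child nodes, filters out everything except elements, and reads the text of one element. ChildFilterDemo, SAMPLE, childElements, and firstText are hypothetical names; the sample document has no DTD, so the whitespace between tags shows up as Text nodes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

class ChildFilterDemo {
    // Sample input with line breaks and indentation between the tags.
    static final String SAMPLE = "<construct>\n  <int>55</int>\n  <int>200</int>\n</construct>";

    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Keeps only the children that are elements, dropping whitespace Text nodes.
    static List<Element> childElements(Element parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) result.add((Element) child);
        }
        return result;
    }

    // Returns the data of the first child if it is a Text node, else an empty string.
    static String firstText(Element elem) {
        Node first = elem.getFirstChild();
        return (first instanceof Text) ? ((Text) first).getData() : "";
    }

    public static void main(String[] args) {
        Element root = parseRoot(SAMPLE);
        System.out.println("all child nodes: " + root.getChildNodes().getLength());
        System.out.println("element children: " + childElements(root).size());
        System.out.println("first text: " + firstText(childElements(root).get(0)));
    }
}
```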

3.4 Validation

1. Many XML documents have specific rules about valid elements and attributes.

2. The Java API supports two mechanisms for describing these rules:

  • Document type definitions (DTD)
  • XML Schema

3. When a parser validates a document, it checks that the document conforms to the rules.

4. To turn on DTD validation, call

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(true);
factory.setIgnoringElementContentWhitespace(true);

Then use an XML document that references a DTD:

<?xml version="1.0"?>
<!DOCTYPE config SYSTEM "config.dtd">
<config>
  <entry id="background">
    <construct class="java.awt.Color">
      <int>55</int>
      <int>200</int>
      <int>100</int>
    </construct>
  </entry>
  <entry id="currency">
    <factory class="java.util.Currency">
      <string>USD</string>
    </factory>
  </entry>
</config>
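
The sketch below shows DTD validation end to end. To stay self-contained it assumes a tiny DTD embedded in the DOCTYPE (an internal subset) instead of the external config.dtd, whose contents are not shown in these notes. DtdValidationDemo and rootChildCount are hypothetical names. An ErrorHandler is installed so that validity errors are thrown rather than silently reported.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class DtdValidationDemo {
    // Assumed DTD for illustration only: a config holds entries, each with a required id.
    static final String DTD =
          "<!DOCTYPE config [\n"
        + "  <!ELEMENT config (entry*)>\n"
        + "  <!ELEMENT entry (#PCDATA)>\n"
        + "  <!ATTLIST entry id CDATA #REQUIRED>\n"
        + "]>\n";

    // Returns the number of child nodes of the root, or -1 if the document is rejected.
    static int rootChildCount(String body) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            factory.setIgnoringElementContentWhitespace(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                public void error(SAXParseException e) throws SAXParseException { throw e; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            Document doc = builder.parse(new InputSource(new StringReader(DTD + body)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        String valid = "<config>\n  <entry id=\"background\">green</entry>\n</config>";
        String invalid = "<config><entry>green</entry></config>";   // id is missing
        System.out.println(rootChildCount(valid));
        System.out.println(rootChildCount(invalid));
    }
}
```

With the DTD in place, the parser knows that config contains only elements, so the whitespace between the tags is not added to the tree.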

5. For XML Schema, use the following code. Unfortunately, the parser doesn't discard whitespace in this case.

factory.setNamespaceAware(true);
final String JAXP_SCHEMA_LANGUAGE = "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";
factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);

Then use an XML document that references an XML Schema:

<?xml version="1.0"?>
<config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="config.xsd">
  <entry id="background">
    <construct class="java.awt.Color">
      <int>55</int>
      <int>200</int>
      <int>100</int>
    </construct>
  </entry>
  <entry id="currency">
    <factory class="java.util.Currency">
      <string>USD</string>
    </factory>
  </entry>
</config>
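
For a self-contained experiment, the sketch below uses a different route from the factory attribute shown above: the javax.xml.validation API (SchemaFactory, Schema, Validator), with the schema supplied as a string. The schema is an assumption made up for this example (a config element holding int children); it is not the config.xsd referenced in the document. SchemaValidationDemo and isValid are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class SchemaValidationDemo {
    // Assumed schema for illustration: <config> contains one or more <int> elements.
    static final String XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"config\">"
        + "<xs:complexType><xs:sequence>"
        + "<xs:element name=\"int\" type=\"xs:int\" maxOccurs=\"unbounded\"/>"
        + "</xs:sequence></xs:complexType>"
        + "</xs:element>"
        + "</xs:schema>";

    // Returns true if the XML text conforms to the schema above.
    static boolean isValid(String xml) {
        try {
            SchemaFactory schemaFactory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = schemaFactory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, validation errors are thrown as SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<config><int>55</int><int>200</int></config>"));
        System.out.println(isValid("<config><int>red</int></config>"));
    }
}
```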

3.5 XPath

1. It can be tedious to analyze the DOM tree by visiting descendants. XPath is a standard way of locating node sets. For example, /html/body/table describes all tables in an XHTML document.

2. To use XPath, you need a factory to create an XPath object, then call its evaluate method to get a string result:

// create a factory, then an XPath object
XPathFactory xpFactory = XPathFactory.newInstance();
XPath path = xpFactory.newXPath();

// call evaluate
String title = path.evaluate("/html/head/title", doc);

3. To get the result as a node list, node, or number, call

// node list
NodeList nodes = (NodeList) path.evaluate("/html/body/table", doc, XPathConstants.NODESET);

// node
Node node = (Node) path.evaluate("/html/body/table", doc, XPathConstants.NODE);

// number
int count = ((Number) path.evaluate("count(/html/body/table)", doc, XPathConstants.NUMBER)).intValue();

4. You don't have to start a search at the top of the document:

String result = path.evaluate(expression, node);
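
The calls above can be tried on a small in-memory document. This is a sketch under stated assumptions: XPathDemo, PAGE, and the helper methods are hypothetical, and the sample page (without namespaces) was made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class XPathDemo {
    // Made-up sample page with two tables.
    static final String PAGE =
          "<html><head><title>Demo</title></head>"
        + "<body><table id=\"a\"/><table id=\"b\"/></body></html>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // String result: the text of the title element.
    static String title(Document doc) {
        try {
            XPath path = XPathFactory.newInstance().newXPath();
            return path.evaluate("/html/head/title", doc);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Number result: how many tables the body contains.
    static int tableCount(Document doc) {
        try {
            XPath path = XPathFactory.newInstance().newXPath();
            Number n = (Number) path.evaluate("count(/html/body/table)", doc,
                    XPathConstants.NUMBER);
            return n.intValue();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Relative search: find the first table, then evaluate "@id" starting at that node.
    static String firstTableId(Document doc) {
        try {
            XPath path = XPathFactory.newInstance().newXPath();
            Node table = (Node) path.evaluate("/html/body/table", doc, XPathConstants.NODE);
            return path.evaluate("@id", table);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(PAGE);
        System.out.println(title(doc) + " " + tableCount(doc) + " " + firstTableId(doc));
    }
}
```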

3.6 Namespaces

1. With namespaces, XML documents can use elements from two grammars. The following document contains XHTML and SVG. Note the xmlns:prefix declaration in the root element: an unprefixed element (body) is XHTML, and the SVG elements have an svg prefix.

<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:svg="http://www.w3.org/2000/svg">
  <body>
    <svg:svg width="100" height="100">
      <svg:circle cx="50" cy="50" r="40" stroke="green" stroke-width="4"/>
    </svg:svg>
  </body>
</html>

2. To turn on namespace processing in Java, call

factory.setNamespaceAware(true);

3. Now, getNodeName yields the qualified name such as svg:circle in our example.

4. There are methods to get the namespace URI and the unprefixed tag name:

Node.getNamespaceURI()   // Gets the namespace URI, such as http://www.w3.org/2000/svg
Node.getLocalName()      // Gets the unprefixed name, such as circle
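
A short sketch of these calls on the XHTML/SVG document, using a namespace-aware factory. NamespaceDemo, PAGE, and firstCircle are hypothetical names for the example; the lookup uses getElementsByTagNameNS, which takes a namespace URI and a local name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NamespaceDemo {
    static final String SVG_NS = "http://www.w3.org/2000/svg";

    static final String PAGE =
          "<html xmlns=\"http://www.w3.org/1999/xhtml\" xmlns:svg=\"" + SVG_NS + "\">"
        + "<body><svg:svg width=\"100\" height=\"100\">"
        + "<svg:circle cx=\"50\" cy=\"50\" r=\"40\"/>"
        + "</svg:svg></body></html>";

    // Parses PAGE with namespace processing on and returns the first SVG circle.
    static Element firstCircle() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(PAGE)));
            return (Element) doc.getElementsByTagNameNS(SVG_NS, "circle").item(0);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element circle = firstCircle();
        System.out.println(circle.getNodeName());      // qualified name
        System.out.println(circle.getLocalName());     // unprefixed name
        System.out.println(circle.getNamespaceURI());  // namespace URI
    }
}
```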

3.7 Streaming parser

3.7.1 SAX Parser

1. Streaming parsers are useful for parsing huge documents.

2. Instead of building a tree structure (as the DOM parser in section 3.3 does), the SAX parser reports events. You supply a handler with methods:

  • startElement and endElement
  • characters
  • startDocument and endDocument

3. In the callback methods, you get the element name and attributes, or the text content.

4. Start the parsing process like this:

SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
saxParser.parse(source, handler);
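
The following sketch fills in the handler that the snippet above leaves open. It extends DefaultHandler and records each event in a list; SaxDemo and events are hypothetical names. Note that a parser may deliver one run of text in several characters calls, so real code usually accumulates text in a buffer; the short sample here is assumed to arrive in one piece.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxDemo {
    // Parses the XML text and returns a log of the events that were reported.
    static List<String> events(String xml) {
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                log.add("start " + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) log.add("text " + text);   // skip pure whitespace
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser saxParser = factory.newSAXParser();
            saxParser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(events("<entry><string>USD</string></entry>"));
    }
}
```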

3.7.2 StAX Parser

1. The StAX parser is a "pull parser". Instead of installing an event handler, you iterate through the events:

InputStream in = ...;

XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader parser = factory.createXMLStreamReader(in);
while (parser.hasNext()) {
    int eventType = parser.next();
    ...
}

2. Then branch on the event type (START_ELEMENT, END_ELEMENT, CHARACTERS, START_DOCUMENT, END_DOCUMENT, and so on).

3.1 Element (START_ELEMENT) -- Analyze an element like this:

String name = parser.getLocalName();    // the local name; call getName() to get a QName with namespace data
int attrCount = parser.getAttributeCount();
for (int i = 0; i < attrCount; i++) {
    String attrName = parser.getAttributeLocalName(i);
    String attrValue = parser.getAttributeValue(i);
    ...   // process attrName and attrValue
}

 You can also look up an attribute by name:

String value = parser.getAttributeValue(null, attributeName);

3.2 CHARACTERS -- call parser.getText() to get the text.
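
Putting the loop and the event branches together gives roughly the following. It is a sketch, not the only way to structure the loop: StaxReadDemo and events are hypothetical names, the input comes from a StringReader instead of an InputStream to keep the example self-contained, and coalescing is switched on so that adjacent text arrives as one CHARACTERS event.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxReadDemo {
    // Pulls events from the parser and returns a log of elements, attributes, and text.
    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLStreamReader parser = factory.createXMLStreamReader(new StringReader(xml));
            while (parser.hasNext()) {
                int eventType = parser.next();
                switch (eventType) {
                    case XMLStreamConstants.START_ELEMENT: {
                        StringBuilder sb = new StringBuilder("start " + parser.getLocalName());
                        for (int i = 0; i < parser.getAttributeCount(); i++) {
                            sb.append(' ').append(parser.getAttributeLocalName(i))
                              .append('=').append(parser.getAttributeValue(i));
                        }
                        log.add(sb.toString());
                        break;
                    }
                    case XMLStreamConstants.END_ELEMENT:
                        log.add("end " + parser.getLocalName());
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        if (!parser.isWhiteSpace()) log.add("text " + parser.getText());
                        break;
                    default:
                        break;   // END_DOCUMENT, comments, and so on are ignored here
                }
            }
            parser.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(events("<entry id=\"currency\"><string>USD</string></entry>"));
    }
}
```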


3.8 Building XML Documents

1. Building a DOM tree (steps 1.1-1.3)

1.1 Don't write XML with print statements. Instead, build a DOM tree with the following code:

Document doc = builder.newDocument();
Element rootElement = doc.createElement(rootName);
Element childElement = doc.createElement(childName);
Text textNode = doc.createTextNode(textContents);

1.2 Set attributes like this:

rootElement.setAttribute(name, value);

1.3 Attach the children to the parents:

doc.appendChild(rootElement);
rootElement.appendChild(childElement);
childElement.appendChild(textNode);

Then write the document (steps 1.4-1.6)

1.4 There is no easy way to write a DOM tree. The easiest approach is with the LSSerializer interface. You get an instance with this magic incantation:

DOMImplementation impl = doc.getImplementation();
DOMImplementationLS implLS = (DOMImplementationLS) impl.getFeature("LS", "3.0");
LSSerializer ser = implLS.createLSSerializer();

1.5 If you want spaces and line breaks, set this flag:

ser.getDomConfig().setParameter("format-pretty-print", true);

1.6 You can also save the document to a file:

LSOutput out = implLS.createLSOutput();
out.setEncoding("UTF-8");
out.setByteStream(Files.newOutputStream(path));
ser.write(doc, out);

Or you can turn the document into a string:

String str = ser.writeToString(doc);
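
Steps 1.1 through 1.6 can be strung together as in the sketch below, which builds a one-entry document and serializes it to a string. DomBuildDemo and build are hypothetical names, and the element names are sample values. The "xml-declaration" parameter is turned off here only so that the result contains just the elements; leave it at its default if you want the XML header.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

class DomBuildDemo {
    // Builds <config><entry id="currency">USD</entry></config> and returns it as text.
    static String build() {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.newDocument();

            // Create the nodes.
            Element rootElement = doc.createElement("config");
            Element childElement = doc.createElement("entry");
            Text textNode = doc.createTextNode("USD");
            childElement.setAttribute("id", "currency");

            // Attach the children to the parents.
            doc.appendChild(rootElement);
            rootElement.appendChild(childElement);
            childElement.appendChild(textNode);

            // Serialize with the Load and Save API.
            DOMImplementationLS implLS =
                    (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
            LSSerializer ser = implLS.createLSSerializer();
            ser.getDomConfig().setParameter("xml-declaration", false);
            return ser.writeToString(doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not build document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```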

2. (Easier) Writing an XML Document with StAX

2.1 It is wasteful to build a DOM tree just to write a document. The StAX API lets you write an XML document directly.

2.2 Construct an XMLStreamWriter from an OutputStream instance:

XMLOutputFactory factory = XMLOutputFactory.newInstance();
XMLStreamWriter writer = factory.createXMLStreamWriter(out);

2.3 To produce the XML header, call:

writer.writeStartDocument();

2.4 Then start the first element:

writer.writeStartElement(name);

2.5 Add attributes by calling:

writer.writeAttribute(name, value);

2.6 Now you can add child elements by calling writeStartElement again, or write characters with:

writer.writeCharacters(text);

2.7 Call writeEndElement to end an element and writeEndDocument at the end of the document.

2.8 You can write a self-closing tag (such as <img .../>) with the writeEmptyElement method.

2.9 When you are all done, close the XMLStreamWriter -- it isn't auto-closable.
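
The writer calls from 2.2 through 2.9 fit together as in this sketch. It writes to a StringWriter rather than an OutputStream so the result can be inspected; StaxWriteDemo and write are hypothetical names, and the element names and values are sample data. The text "R&D" is included to show that the writer escapes special characters, which is one reason not to produce XML with print statements.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class StaxWriteDemo {
    // Writes a small document and returns it as a string.
    static String write() {
        StringWriter out = new StringWriter();
        try {
            XMLOutputFactory factory = XMLOutputFactory.newInstance();
            XMLStreamWriter writer = factory.createXMLStreamWriter(out);
            writer.writeStartDocument();            // XML header
            writer.writeStartElement("config");     // root element
            writer.writeStartElement("entry");      // child element
            writer.writeAttribute("id", "dept");
            writer.writeCharacters("R&D");          // written as R&amp;D
            writer.writeEndElement();               // closes entry
            writer.writeEmptyElement("img");        // self-closing tag
            writer.writeAttribute("src", "logo.png");
            writer.writeEndElement();               // closes config
            writer.writeEndDocument();
            writer.close();                         // not AutoCloseable, so close by hand
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(write());
    }
}
```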
