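dom4j Basics Tutorial [Repost]

Reposted from http://blog.csdn.net/whatlonelytear/article/details/42234937, with extensive formatting improvements and additions.

Dom4j is an easy-to-use, open-source library for working with XML, XPath, and XSLT on the Java platform. It uses the Java Collections Framework and fully supports DOM, SAX, and JAXP.

Basic dom4j usage

DOM4J is very easy to use. If you understand the basic XML DOM model, you can use it. Its bundled guide is only a single HTML page, although that page covers the essentials fairly completely. Very little Chinese-language material about it exists.

I once read an article on IBM developerWorks (see the appendix) that compared the performance of several XML parsing packages. DOM4J performed very well and ranked at or near the top in many of the tests. (DOM4J's official documentation also cites this comparison.) For that reason I chose DOM4J as the XML parsing tool for this project.
In China, JDOM is the more popular parser. Each has its strengths, but DOM4J's defining feature is its heavy use of interfaces, and this is the main reason it is considered more flexible than JDOM. Its core interfaces are organized as follows: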

interface java.lang.Cloneable
    interface org.dom4j.Node
        interface org.dom4j.Attribute
        interface org.dom4j.Branch
            interface org.dom4j.Document
            interface org.dom4j.Element
        interface org.dom4j.CharacterData
            interface org.dom4j.CDATA
            interface org.dom4j.Comment
            interface org.dom4j.Text
        interface org.dom4j.DocumentType
        interface org.dom4j.Entity
        interface org.dom4j.ProcessingInstruction

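The hierarchy makes the relationships clear at a glance: most of these interfaces inherit from Node. If you know these relationships, you will avoid ClassCastException when casting nodes in your own code.
Below are some examples (partly taken from the documentation bundled with DOM4J) that briefly show how to use it.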
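An element's content can contain elements, text, comments, and other node types together. The usual way to avoid that ClassCastException is therefore to check the interface with instanceof before casting. The sketch below assumes dom4j 2.x, where Element.content() returns List<Node>. The class name NodeTypeDemo and the sample XML are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import org.dom4j.Comment;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.Node;
import org.dom4j.Text;

public class NodeTypeDemo {
    // Describes each direct content node, checking its interface before treating it as that type
    static List<String> describeContent(Element parent) {
        List<String> out = new ArrayList<>();
        for (Node node : parent.content()) { // elements, text, comments... mixed together
            if (node instanceof Element) {
                out.add("element:" + node.getName());
            } else if (node instanceof Comment) {
                out.add("comment:" + node.getText());
            } else if (node instanceof Text) {
                out.add("text:" + node.getText());
            } else {
                out.add("other:" + node.getNodeTypeName());
            }
        }
        return out;
    }

    public static void main(String[] args) throws DocumentException {
        Element root = DocumentHelper.parseText("<r>hi<!--note--><child/></r>").getRootElement();
        System.out.println(describeContent(root));
    }
}
```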

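1. Reading and parsing an XML document

Reading and writing XML documents relies mainly on the org.dom4j.io package. SAXReader in that package parses XML from files, streams, URLs, and other sources, and DOMReader converts an existing W3C DOM tree. Both return the same org.dom4j.Document interface, and this is the benefit of relying on interfaces.

// Read XML from a file: takes a file name and returns the parsed document
public Document read(String fileName) throws DocumentException {
    SAXReader reader = new SAXReader();
    Document document = reader.read(new File(fileName));
    return document;
}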

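The read method of SAXReader is overloaded. It can read from many different sources, such as InputStream, Reader, File, and URL. The resulting Document object represents the entire XML document.
In my experience, when the source is a byte source (File, InputStream, URL), characters are decoded using the encoding declared in the XML header. A Reader delivers characters that are already decoded, so the declaration has no effect on it. If you get garbled characters, make sure the encoding names match everywhere.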
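To see the declared encoding at work, the sketch below parses bytes that are encoded as ISO-8859-1 and declared as ISO-8859-1. It assumes a standard JAXP SAX parser underneath SAXReader. The class and method names are illustrative, and \u00e9 (é) is written as an escape so the source stays ASCII-only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.io.SAXReader;

public class EncodingDemo {
    // Parses raw XML bytes; the parser decodes them with the charset named in the XML declaration
    static Document parseBytes(byte[] xmlBytes) throws DocumentException {
        SAXReader reader = new SAXReader();
        return reader.read(new ByteArrayInputStream(xmlBytes));
    }

    public static void main(String[] args) throws DocumentException {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>caf\u00e9</name>";
        // Encode the bytes with the same charset that the declaration names
        byte[] latin1 = xml.getBytes(StandardCharsets.ISO_8859_1);
        Document doc = parseBytes(latin1);
        System.out.println(doc.getRootElement().getText().equals("caf\u00e9"));
    }
}
```

If the bytes were produced with a different charset than the declaration claims, non-ASCII text would come out garbled or fail to parse. That is the mismatch the paragraph above warns about.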

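2. Getting the root node

After reading, the second step is to get the root element. As anyone familiar with XML knows, all XML processing starts from the root element.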

public Element getRootElement(Document doc){

    return doc.getRootElement();

}

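3. Traversing the XML tree

DOM4J provides at least three ways to traverse nodes: iterators, recursion (a complete recursive example appears near the end of this article), and the Visitor pattern.

1) Iterators

// Iterate over all child elements
for (Iterator<Element> i = root.elementIterator(); i.hasNext(); ) {
    Element element = i.next();
    // do something
}

// Iterate over the child elements named "foo"
for (Iterator<Element> i = root.elementIterator("foo"); i.hasNext(); ) {
    Element foo = i.next();
    // do something
}

// Iterate over attributes
for (Iterator<Attribute> i = root.attributeIterator(); i.hasNext(); ) {
    Attribute attribute = i.next();
    // do something
}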
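In dom4j 2.x the same traversals can also be written with for-each loops, because elements(), elements(String), and attributes() return typed lists. This assumes dom4j 2.x. In 1.6.1 these methods return raw Lists, and the loops below would not compile. The class name and sample XML are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import org.dom4j.Attribute;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;

public class ForEachDemo {
    // Trimmed text of every direct child element named "foo"
    static List<String> fooTexts(Element root) {
        List<String> texts = new ArrayList<>();
        for (Element foo : root.elements("foo")) {
            texts.add(foo.getTextTrim());
        }
        return texts;
    }

    // "name=value" for every attribute of the given element
    static List<String> attributePairs(Element element) {
        List<String> pairs = new ArrayList<>();
        for (Attribute attr : element.attributes()) {
            pairs.add(attr.getName() + "=" + attr.getValue());
        }
        return pairs;
    }

    public static void main(String[] args) throws DocumentException {
        Document doc = DocumentHelper.parseText(
                "<root a=\"1\" b=\"2\"><foo> x </foo><bar/><foo>y</foo></root>");
        Element root = doc.getRootElement();
        System.out.println(fooTexts(root)); // [x, y]
        System.out.println(attributePairs(root));
    }
}
```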

Checking for a leaf node

if (element.isTextOnly()) { // leaf node: only text content, no child elements
    // do something
}
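A quick way to see what isTextOnly() reports is to test it on an element with only text and on an element with a child element. This is a minimal sketch. The helper name isLeaf is illustrative and not part of dom4j.

```java
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;

public class LeafDemo {
    // True when the root element holds only character data and no child elements
    static boolean isLeaf(String xml) throws DocumentException {
        return DocumentHelper.parseText(xml).getRootElement().isTextOnly();
    }

    public static void main(String[] args) throws DocumentException {
        System.out.println(isLeaf("<a>text</a>")); // true
        System.out.println(isLeaf("<a><b/></a>")); // false
    }
}
```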

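3) The Visitor pattern

The most exciting feature is DOM4J's support for the Visitor pattern, which can greatly reduce the amount of code and keep it clear and easy to understand. Anyone who knows design patterns will recognize Visitor as one of the GoF patterns. Its core idea is double dispatch: one Visitor object visits many Visitable objects, and each Visitable's accept(visitor) method calls back the visit() overload for its own type. Here is DOM4J's Visitor pattern (the quick-start guide does not cover it).
All you need is a class that implements the Visitor interface. Extending VisitorSupport, as below, is the easiest way to do that.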

public class MyVisitor extends VisitorSupport {

    public void visit(Element element){

        System.out.println(element.getName());

    }

    public void visit(Attribute attr){

        System.out.println(attr.getName());

    }

}

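Usage: root.accept(new MyVisitor());

The Visitor interface provides many overloads of visit(), so each kind of XML object is visited through its own method. The example above gives simple implementations for Element and Attribute, the two most commonly used. VisitorSupport is DOM4J's default adapter for the Visitor interface (the Default Adapter pattern). It provides empty implementations of every visit(*) method to keep your code short.
Note that this Visitor traverses all child nodes automatically: root.accept(new MyVisitor()) visits root, its attributes, and all of its descendants. The first time I used it, I assumed I had to traverse the tree myself and called the Visitor inside my own recursion. As you can imagine, nodes were then visited more than once.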
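The automatic traversal is easy to confirm with a visitor that only counts what it sees. Calling accept() once on the root reaches every element and attribute in the subtree without any manual recursion. This is a sketch, and the class name CountingVisitor is illustrative.

```java
import org.dom4j.Attribute;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.VisitorSupport;

public class CountingVisitor extends VisitorSupport {
    int elements = 0;
    int attributes = 0;

    @Override
    public void visit(Element element) {
        elements++;
    }

    @Override
    public void visit(Attribute attr) {
        attributes++;
    }

    // Parses the text and lets a single accept() call walk the whole tree
    static CountingVisitor count(String xml) throws DocumentException {
        Document doc = DocumentHelper.parseText(xml);
        CountingVisitor visitor = new CountingVisitor();
        doc.getRootElement().accept(visitor); // no manual recursion
        return visitor;
    }

    public static void main(String[] args) throws DocumentException {
        CountingVisitor v = count("<a x=\"1\"><b/><c y=\"2\"><d/></c></a>");
        System.out.println(v.elements + " elements, " + v.attributes + " attributes");
    }
}
```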

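4. XPath support

DOM4J has good XPath support. To access a node, for example, you can select it directly with an XPath expression. (XPath evaluation in dom4j requires the Jaxen library on the classpath.)

public void bar(Document document) {
    List<Node> list = document.selectNodes("//foo/bar");
    Node node = document.selectSingleNode("//foo/bar/author");
    if (node != null) { // selectSingleNode returns null when nothing matches
        String name = node.valueOf("@name");
    }
}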

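For example, the following code finds all hyperlinks in an XHTML document. (As written, //a only matches a elements that are in no namespace. If the document declares the XHTML default namespace, the expression has to be made namespace-aware.)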

public void findLinks(Document document) throws DocumentException {

    List list = document.selectNodes( "//a/@href" );

    for (Iterator iter = list.iterator(); iter.hasNext(); ) {

        Attribute attribute = (Attribute) iter.next();

        String url = attribute.getValue();

    }

}
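XPath predicates and relative expressions are often more useful than the plain paths above. The sketch below selects one employee by attribute value and then reads values relative to that node with valueOf() and numberValueOf(). It assumes dom4j 2.x (selectNodes returns List<Node>) with Jaxen on the classpath. The sample XML reuses the employee data from section 9, and the class name is illustrative.

```java
import java.util.List;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Node;

public class XPathDemo {
    static final String XML =
            "<employees>"
            + "<employee id=\"0001\" name=\"wanglp\"><age>25</age></employee>"
            + "<employee id=\"0002\" name=\"fox\"><age>24</age></employee>"
            + "</employees>";

    static String describe(String xml) throws DocumentException {
        Document doc = DocumentHelper.parseText(xml);
        // Absolute path: all employee elements
        List<Node> employees = doc.selectNodes("/employees/employee");
        // Predicate: the employee whose id attribute is 0002
        Node fox = doc.selectSingleNode("/employees/employee[@id='0002']");
        // Expressions evaluated relative to the selected node
        String name = fox.valueOf("@name");
        Number age = fox.numberValueOf("age");
        return employees.size() + " " + name + " " + age.intValue();
    }

    public static void main(String[] args) throws DocumentException {
        System.out.println(describe(XML)); // 2 fox 24
    }
}
```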

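5. Converting between strings and XML

You often need to convert a string to XML, or XML to a string:

// XML to string (document is an existing org.dom4j.Document)
String xml = document.asXML();

// String to XML (throws DocumentException if the text is not well-formed)
String text = "<person> <name>James</name> </person>";
Document parsed = DocumentHelper.parseText(text);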
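In practice, two details matter. parseText() rejects text that is not well-formed by throwing DocumentException. Also, asXML() on a Document writes an XML declaration first, while asXML() on an Element serializes only that element. The sketch below shows both. The helper tryParse is illustrative, not a dom4j method.

```java
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;

public class StringXmlDemo {
    // Parses text into a Document, or returns null if it is not well-formed XML
    static Document tryParse(String text) {
        try {
            return DocumentHelper.parseText(text);
        } catch (DocumentException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        Document doc = tryParse("<person><name>James</name></person>");
        // Element.asXML(): just the element, no declaration
        System.out.println(doc.getRootElement().asXML());
        // Document.asXML(): starts with the <?xml ...?> declaration
        System.out.println(doc.asXML());
        // Mismatched tags are rejected
        System.out.println(tryParse("<person><name>James</person>") == null);
    }
}
```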

6. Transforming XML with XSLT

public Document styleDocument(Document document,String stylesheet) throws Exception {

    // load the transformer using JAXP

    TransformerFactory factory = TransformerFactory.newInstance();

    Transformer transformer = factory.newTransformer( new StreamSource( stylesheet ) ) ;

    // now lets style the given document

    DocumentSource source = new DocumentSource( document );

    DocumentResult result = new DocumentResult();

    transformer.transform( source, result );

    // return the transformed document

    Document transformedDoc = result.getDocument();

    return transformedDoc;

}
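The method above loads the stylesheet from a path or URI. The sketch below follows the same DocumentSource/DocumentResult pattern but reads a small XSLT 1.0 stylesheet from a string, so it is self-contained. It assumes the JDK's built-in JAXP XSLT processor. The stylesheet and class name are illustrative.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.io.DocumentResult;
import org.dom4j.io.DocumentSource;

public class XsltDemo {
    // Turns <people><person name="..."/>...</people> into <names><n>...</n>...</names>
    static final String XSL =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:template match=\"/\">"
            + "<names><xsl:for-each select=\"people/person\">"
            + "<n><xsl:value-of select=\"@name\"/></n>"
            + "</xsl:for-each></names>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    static Document transform(Document input, String xsl) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        DocumentResult result = new DocumentResult(); // collects output as a dom4j Document
        transformer.transform(new DocumentSource(input), result);
        return result.getDocument();
    }

    public static void main(String[] args) throws Exception {
        Document in = DocumentHelper.parseText(
                "<people><person name=\"James\"/><person name=\"Bob\"/></people>");
        System.out.println(transform(in, XSL).getRootElement().asXML());
    }
}
```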

7. Creating XML

Creating the XML usually comes before writing the file, and it is as easy as using a StringBuffer.

public Document createDocument() {
    Document document = DocumentHelper.createDocument();
    Element root = document.addElement("root");
    Element author1 = root.addElement("author")
            .addAttribute("name", "James")
            .addAttribute("location", "UK")
            .addText("James Strachan");
    Element author2 = root.addElement("author")
            .addAttribute("name", "Bob")
            .addAttribute("location", "US")
            .addText("Bob McWhirter");
    return document;
}

8. Writing to a file

A simple way to output a Document, or any Node, is to write it to a file with XMLWriter's write method:

// Write to a given file
XMLWriter writer = new XMLWriter(new FileWriter("output.xml"));
writer.write(document);
writer.close();
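One caveat ties back to the encoding note in section 1. Before Java 18, FileWriter uses the platform's default charset (GBK on a Chinese Windows system, for example), but XMLWriter declares UTF-8 by default, so the bytes and the declaration can disagree. A minimal sketch of one way around this: pass an OutputStream together with an OutputFormat so that XMLWriter does the encoding itself. The class name and temp-file handling are illustrative.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.XMLWriter;

public class WriteUtf8Demo {
    // Writes the document as UTF-8 bytes, matching the encoding named in the declaration
    static void writeUtf8(Document document, File file) throws IOException {
        OutputFormat format = new OutputFormat();
        format.setEncoding("UTF-8");
        try (OutputStream out = new FileOutputStream(file)) {
            XMLWriter writer = new XMLWriter(out, format);
            writer.write(document);
            writer.flush(); // push XMLWriter's buffered output into the stream
        }
    }

    public static void main(String[] args) throws IOException {
        Document doc = DocumentHelper.createDocument();
        doc.addElement("name").setText("caf\u00e9");
        File file = File.createTempFile("dom4j-demo", ".xml");
        writeUtf8(doc, file);
        System.out.println(new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8));
        file.delete();
    }
}
```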

Pretty-printed output

// Pretty-print format
OutputFormat format = OutputFormat.createPrettyPrint();
XMLWriter xmlWriter = null;
StringWriter sw = new StringWriter();
try {
    xmlWriter = new XMLWriter(sw, format); // sw can be replaced with System.out to print directly
    xmlWriter.write(doc);
    xmlWriter.flush();
} catch (IOException e) {
    e.printStackTrace();
}
String text = sw.toString();

Compact output

// Compact format
format = OutputFormat.createCompactFormat();
writer = new XMLWriter(System.out, format);
writer.write(document);
writer.flush(); // XMLWriter buffers output written to an OutputStream
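The two presets above can be wrapped in one helper that returns a String, which makes the difference easy to compare. This is a sketch, and the helper name toXml is illustrative. createPrettyPrint() uses a two-space indent and newlines. createCompactFormat() writes no indentation or newlines between elements.

```java
import java.io.IOException;
import java.io.StringWriter;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.XMLWriter;

public class FormatDemo {
    // Serializes a document to a String using either the pretty or the compact preset
    static String toXml(Document doc, boolean pretty) throws IOException {
        OutputFormat format = pretty
                ? OutputFormat.createPrettyPrint()    // 2-space indent, newlines, trimmed text
                : OutputFormat.createCompactFormat(); // no indentation or newlines
        StringWriter sw = new StringWriter();
        XMLWriter writer = new XMLWriter(sw, format);
        writer.write(doc);
        writer.flush();
        return sw.toString();
    }

    public static void main(String[] args) throws DocumentException, IOException {
        Document doc = DocumentHelper.parseText("<a><b>x</b><c/></a>");
        System.out.println(toXml(doc, true));
        System.out.println(toXml(doc, false));
    }
}
```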

9. Generating and parsing a Document

import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Iterator;

import org.dom4j.Attribute;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;
import org.dom4j.io.XMLWriter;

/**
 * Generating and parsing XML documents with dom4j
 *
 * @author wanglp 2012-2-23
 */
public class Dom4jDemo {

    /**
     * Writes an XML document with dom4j
     */
    public void createXml(File file) {
        // The XML declaration <?xml version="1.0" encoding="UTF-8"?> is added automatically
        // DocumentHelper is dom4j's factory class for creating documents and nodes
        Document document = DocumentHelper.createDocument();
        // addElement() creates the root element "employees"
        Element root = document.addElement("employees");
        // addComment() adds the comment "An XML Note" inside the root element
        root.addComment("An XML Note");
        // addProcessingInstruction() adds a processing instruction inside the root element
        root.addProcessingInstruction("target", "text");
        // Add the first employee element under the root
        Element empElem = root.addElement("employee");
        // addAttribute() adds the id and name attributes
        empElem.addAttribute("id", "0001");
        empElem.addAttribute("name", "wanglp");
        // Add a sex child element and set its text with setText()
        Element sexElem = empElem.addElement("sex");
        sexElem.setText("m");
        // Add an age child element and set its text
        Element ageElem = empElem.addElement("age");
        ageElem.setText("25");
        // Add the second employee element in the same way
        Element emp2Elem = root.addElement("employee");
        emp2Elem.addAttribute("id", "0002");
        emp2Elem.addAttribute("name", "fox");
        Element sex2Elem = emp2Elem.addElement("sex");
        sex2Elem.setText("f");
        Element age2Elem = emp2Elem.addElement("age");
        age2Elem.setText("24");
        // addDocType() can add a document type declaration:
        // document.addDocType("employees", null, "file://E:/Dtds/dom4j.dtd");
        // which adds the following to the XML document:
        // <!DOCTYPE employees SYSTEM "file://E:/Dtds/dom4j.dtd">
        // A DOCTYPE is required if the document is to be validated against a DTD.
        try {
            XMLWriter output = new XMLWriter(new FileWriter(file));
            output.write(document);
            output.close();
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }

    /**
     * Reads an XML document with dom4j
     */
    public void parserXml(File file) {
        Document document;
        // Parse the XML file with SAXReader
        SAXReader saxReader = new SAXReader();
        try {
            document = saxReader.read(file);
        } catch (DocumentException e) {
            e.printStackTrace();
            return; // nothing to traverse if parsing failed
        }
        // To parse a string instead:
        // document = DocumentHelper.parseText(fileString);
        // Get the root element
        Element root = document.getRootElement();
        // Print the element name
        System.out.println("<" + root.getName() + ">");
        // Iterate over the employee children of the root
        Iterator<Element> iter = root.elementIterator("employee");
        while (iter.hasNext()) {
            // The current employee element
            Element empEle = iter.next();
            System.out.println("<" + empEle.getName() + ">");
            // Iterate over its attributes
            Iterator<Attribute> attrList = empEle.attributeIterator();
            while (attrList.hasNext()) {
                Attribute attr = attrList.next();
                System.out.println(attr.getName() + "=" + attr.getValue());
            }
            // Iterate over all child elements of employee
            Iterator<Element> eleIte = empEle.elementIterator();
            while (eleIte.hasNext()) {
                Element ele = eleIte.next();
                System.out.println("<" + ele.getName() + ">" + ele.getTextTrim());
            }
            // Read the value of the sex child element directly
            // String sex = empEle.elementTextTrim("sex");
            // System.out.println("sex:" + sex);
        }
        System.out.println("</" + root.getName() + ">");
    }

    public static void main(String[] args) {
        Dom4jDemo dom4j = new Dom4jDemo();
        File file = new File("e:/dom4j.xml");
        // dom4j.createXml(file); // run this first to create the file
        dom4j.parserXml(file);
    }
}
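The commented-out elementTextTrim("sex") call above shows that a known child's value can be read without iterating. The sketch below puts the direct accessors side by side: element(), attributeValue(), elementTextTrim(), and elementText(), which returns null when the child does not exist. The class name and the extra whitespace in age are illustrative.

```java
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;

public class ElementTextDemo {
    static final String XML =
            "<employees><employee id=\"0001\" name=\"wanglp\">"
            + "<sex>m</sex><age> 25 </age></employee></employees>";

    static String summary(String xml) throws DocumentException {
        Element emp = DocumentHelper.parseText(xml).getRootElement().element("employee");
        String id = emp.attributeValue("id");       // attribute value by name
        String age = emp.elementTextTrim("age");    // child text with whitespace trimmed
        String missing = emp.elementText("salary"); // null: no such child element
        return id + " " + age + " " + missing;
    }

    public static void main(String[] args) throws DocumentException {
        System.out.println(summary(XML)); // 0001 25 null
    }
}
```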

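10. Adding comments to a document

After node.add(newComment), the comment appears at the end of the node's content. To put it in the right position, first iterate over the node's children and clear them all. Then add back the new list built during that iteration, which contains both the Comment and the Element nodes. You cannot add or remove nodes while iterating, or a ConcurrentModificationException will be thrown.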

Comment newComment = new DefaultComment("↓"+cnName.toString()+"↓");

node.add(newComment);

From my own project: XmlCommentTool.java in the xmljsoncomment project.
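An alternative to clearing and rebuilding the children is possible if you rely on dom4j's documented behavior that Branch.content() returns a list backed by the element. In that case, a comment can be inserted at a specific index. The sketch below assumes that behavior (dom4j 2.x). It avoids the ConcurrentModificationException by first taking a snapshot of the target elements and only then modifying the content list. The method name commentBefore and the comment text are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import org.dom4j.Comment;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.Node;

public class CommentBeforeDemo {
    // Inserts a comment directly before each child element named childName
    static void commentBefore(Element parent, String childName) {
        // Snapshot first, so the content list is not modified while it is being iterated
        List<Element> targets = new ArrayList<>(parent.elements(childName));
        for (Element child : targets) {
            List<Node> content = parent.content(); // backed by the element
            int index = content.indexOf(child);
            Comment comment = DocumentHelper.createComment(" " + child.attributeValue("name") + " ");
            content.add(index, comment);
        }
    }

    public static void main(String[] args) throws DocumentException {
        Element root = DocumentHelper.parseText(
                "<employees><employee name=\"wanglp\"/><employee name=\"fox\"/></employees>")
                .getRootElement();
        commentBefore(root, "employee");
        System.out.println(root.asXML());
    }
}
```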

Recursively traversing all child nodes

Adapted from: http://blog.csdn.net/chenleixing/article/details/44353491

import java.util.List;

import org.dom4j.Attribute;
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;

public class Dom4jIterator {

    /**
     * Recursively traverses all child nodes, starting from the given node
     *
     * @author chenleixing
     */
    public static void iteratorNodes(Element node) {
        System.out.println("--------------------");
        // Name, text content, and attributes of the current node
        System.out.println("Current node name: " + node.getName());
        System.out.println("Current node text: " + node.getTextTrim());
        List<Attribute> listAttr = node.attributes(); // all attributes of the current node
        for (Attribute attr : listAttr) {
            String name = attr.getName();   // attribute name
            String value = attr.getValue(); // attribute value
            System.out.println("Attribute name: " + name + ", value: " + value);
        }
        // Recurse into every direct child element
        List<Element> listElement = node.elements(); // all first-level child elements
        for (Element e : listElement) {
            iteratorNodes(e); // recursion
        }
    }

    public static void main(String[] args) throws Exception {
        // Parse an XML string
        String xmlStr = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><Order property ='a'><Cid>456</Cid><Pwd>password</Pwd><Pid>product order no.</Pid><Prices><Price>price 01</Price><Price>price 02</Price></Prices></Order>";
        System.out.println(xmlStr);
        Document document = DocumentHelper.parseText(xmlStr);
        Element element = document.getRootElement();
        iteratorNodes(element);
    }
}
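For very deep documents, the same depth-first, pre-order walk can be done with an explicit stack instead of recursion, so the depth of the XML never becomes the depth of the Java call stack. This is an illustrative alternative, not part of the adapted article. It assumes dom4j 2.x (elements() returns List<Element>) and collects element names so the visiting order is easy to check.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;

public class IterativeWalk {
    // Depth-first, pre-order traversal using an explicit stack
    static List<String> elementNames(Element root) {
        List<String> names = new ArrayList<>();
        Deque<Element> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Element current = stack.pop();
            names.add(current.getName());
            List<Element> children = current.elements();
            // Push children in reverse so the first child is visited first
            for (int i = children.size() - 1; i >= 0; i--) {
                stack.push(children.get(i));
            }
        }
        return names;
    }

    public static void main(String[] args) throws DocumentException {
        Element root = DocumentHelper.parseText(
                "<Order><Cid>456</Cid><Pwd/><Pid/><Prices><Price/><Price/></Prices></Order>")
                .getRootElement();
        System.out.println(elementNames(root)); // [Order, Cid, Pwd, Pid, Prices, Price, Price]
    }
}
```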

So, isn't DOM4J simple? Some more complex features, such as ElementHandler, are not covered here. If this has caught your interest, give DOM4J a try.

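Related

Parsing HTML character references such as &#x with dom4j

Creating indented, line-broken XML with dom4j and writing it to a string

Recommended articles:

Complete Dom4j tutorial: working with XML [Repost]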