XML File Parsing: SAX Parsing

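DOM parsing reads the entire document into memory, builds a DOM tree, and then parses the XML through the interfaces DOM provides. For small files this is convenient. For large XML files, however, holding the whole tree in memory makes this approach much less efficient, which is where SAX parsing comes in. SAX is event-driven: it turns the XML document into a series of events, and a separate event handler decides how to process each one. Parsing happens while the document is being read. The rest of this article walks through a concrete SAX implementation.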
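
To make the "series of events" idea concrete, here is a minimal sketch that records the order in which the callbacks fire for a tiny document. The class name EventTrace and the helper trace are made up for illustration; SAXParserFactory, SAXParser and DefaultHandler are the standard JAXP/SAX classes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name; it only records the order of SAX callbacks.
class EventTrace {

    /** Parses the XML text and returns the document/element events in order. */
    static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() {
                events.add("startDocument");
            }

            @Override
            public void endDocument() {
                events.add("endDocument");
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                events.add("start:" + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(trace("<world><name>China</name></world>"));
    }
}
```

Nothing is kept in memory except what the handler chooses to store, which is the main difference from DOM.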

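1. Main objects

SAXParserFactory: the parser factory

SAXParser: the parser, obtained from the factory

ContentHandler, DTDHandler, ErrorHandler, EntityResolver: the event-handling interfaces

DefaultHandler: implements the four interfaces above with do-nothing defaults; in practice you simply extend DefaultHandler and override the callbacks you need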
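
The way these objects fit together can be sketched as follows: the factory creates the parser, and the parser pushes events into a DefaultHandler subclass. SaxQuickStart, CountingHandler and countElements are illustrative names, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, used only for this sketch.
class SaxQuickStart {

    /** Hypothetical handler: overrides a single callback and counts elements. */
    static class CountingHandler extends DefaultHandler {
        int elements = 0;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            elements++;
        }
    }

    /** Parses the XML text and returns how many elements were seen. */
    static int countElements(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance(); // 1. factory
            SAXParser parser = factory.newSAXParser();                 // 2. parser
            CountingHandler handler = new CountingHandler();           // 3. handler
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            parser.parse(new ByteArrayInputStream(bytes), handler);    // 4. parse
            return handler.elements;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countElements("<world><country id=\"1\"/></world>"));
    }
}
```

Callbacks that are not overridden keep the empty implementations inherited from DefaultHandler, so the handler stays small.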

2. The XML document

This is the same XML file used in the previous DOM parsing article:

<?xml version="1.0" encoding="UTF-8"?>

<world>

    <comuntry id="1">

        <name>China</name>

        <capital>Beijing</capital>

        <population>1234</population>

        <area>960</area>

    </comuntry>

    <comuntry id="2">

        <name id="">America</name>

        <capital>Washington</capital>

        <population>234</population>

        <area>900</area>

    </comuntry>

    <comuntry id="3">

        <name >Japan</name>

        <capital>Tokyo</capital>

        <population>234</population>

        <area>60</area>

    </comuntry>

    <comuntry id="4">

        <name >Russia</name>

        <capital>Moscow</capital>

        <population>34</population>

        <area>1960</area>

    </comuntry>

</world>
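
As a quick illustration of reading this document with SAX, the sketch below collects the id attribute of every comuntry element (the element name is spelled exactly as in the file above). CountryIdReader and readIds are made-up names, and the sketch assumes the default, non-namespace-aware JAXP parser, where the element name arrives in qName.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, used only for this sketch.
class CountryIdReader {

    /** Returns the id attribute of each comuntry element, in document order. */
    static List<String> readIds(String xml) {
        List<String> ids = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // "comuntry" is the element name as spelled in the sample file.
                if ("comuntry".equals(qName)) {
                    ids.add(atts.getValue("id"));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return ids;
    }

    public static void main(String[] args) {
        String xml = "<world>"
                + "<comuntry id=\"1\"><name>China</name></comuntry>"
                + "<comuntry id=\"2\"><name>America</name></comuntry>"
                + "</world>";
        System.out.println(readIds(xml));
    }
}
```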

3. The main interfaces

EntityResolver:

package org.xml.sax;

import java.io.IOException;

public interface EntityResolver {

    /**

     * Allow the application to resolve external entities.

     *

     * <p>The parser will call this method before opening any external

     * entity except the top-level document entity.  Such entities include

     * the external DTD subset and external parameter entities referenced

     * within the DTD (in either case, only if the parser reads external

     * parameter entities), and external general entities referenced

     * within the document element (if the parser reads external general

     * entities).  The application may request that the parser locate

     * the entity itself, that it use an alternative URI, or that it

     * use data provided by the application (as a character or byte

     * input stream).</p>

     *

     * <p>Application writers can use this method to redirect external

     * system identifiers to secure and/or local URIs, to look up

     * public identifiers in a catalogue, or to read an entity from a

     * database or other input source (including, for example, a dialog

     * box).  Neither XML nor SAX specifies a preferred policy for using

     * public or system IDs to resolve resources.  However, SAX specifies

     * how to interpret any InputSource returned by this method, and that

     * if none is returned, then the system ID will be dereferenced as

     * a URL.  </p>

     *

     * <p>If the system identifier is a URL, the SAX parser must

     * resolve it fully before reporting it to the application.</p>

     *

     * @param publicId The public identifier of the external entity

     *        being referenced, or null if none was supplied.

     * @param systemId The system identifier of the external entity

     *        being referenced.

     * @return An InputSource object describing the new input source,

     *         or null to request that the parser open a regular

     *         URI connection to the system identifier.

     * @exception org.xml.sax.SAXException Any SAX exception, possibly

     *            wrapping another exception.

     * @exception java.io.IOException A Java-specific IO exception,

     *            possibly the result of creating a new InputStream

     *            or Reader for the InputSource.

     * @see org.xml.sax.InputSource

     */

    public abstract InputSource resolveEntity (String publicId,

                                               String systemId)

        throws SAXException, IOException;

}
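
One common use of this interface is to keep the parser from fetching an external DTD over the network. The sketch below is one way to do that, under the assumption that handing back an empty input source is acceptable for your documents: it records each requested system ID and returns an empty InputSource. OfflineResolverDemo and requestedSystemIds are illustrative names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, used only for this sketch.
class OfflineResolverDemo {

    /** Parses the XML text and returns the system IDs the parser asked to resolve. */
    static List<String> requestedSystemIds(String xml) {
        List<String> requested = new ArrayList<>();
        // Answer every external entity with empty content instead of opening the URI.
        EntityResolver offline = (publicId, systemId) -> {
            requested.add(systemId);
            return new InputSource(new StringReader(""));
        };
        try {
            XMLReader reader =
                    SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setEntityResolver(offline);
            reader.setContentHandler(new DefaultHandler());
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return requested;
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE world SYSTEM \"http://example.invalid/world.dtd\">"
                + "<world/>";
        System.out.println(requestedSystemIds(xml));
    }
}
```

Returning null instead of an InputSource would tell the parser to open the system ID itself, as the Javadoc above describes.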

DTDHandler:

package org.xml.sax;

/**

 * Receive notification of basic DTD-related events.

 *

 * <blockquote>

 * <em>This module, both source code and documentation, is in the

 * Public Domain, and comes with <strong>NO WARRANTY</strong>.</em>

 * See <a href='http://www.saxproject.org'>http://www.saxproject.org</a>

 * for further information.

 * </blockquote>

 *

 * <p>If a SAX application needs information about notations and

 * unparsed entities, then the application implements this

 * interface and registers an instance with the SAX parser using

 * the parser's setDTDHandler method.  The parser uses the

 * instance to report notation and unparsed entity declarations to

 * the application.</p>

 *

 * <p>Note that this interface includes only those DTD events that

 * the XML recommendation <em>requires</em> processors to report:

 * notation and unparsed entity declarations.</p>

 *

 * <p>The SAX parser may report these events in any order, regardless

 * of the order in which the notations and unparsed entities were

 * declared; however, all DTD events must be reported after the

 * document handler's startDocument event, and before the first

 * startElement event.

 * (If the {@link org.xml.sax.ext.LexicalHandler LexicalHandler} is

 * used, these events must also be reported before the endDTD event.)

 * </p>

 *

 * <p>It is up to the application to store the information for

 * future use (perhaps in a hash table or object tree).

 * If the application encounters attributes of type "NOTATION",

 * "ENTITY", or "ENTITIES", it can use the information that it

 * obtained through this interface to find the entity and/or

 * notation corresponding with the attribute value.</p>

 *

 * @since SAX 1.0

 * @author David Megginson

 * @see org.xml.sax.XMLReader#setDTDHandler

 */

public interface DTDHandler {

    /**

     * Receive notification of a notation declaration event.

     *

     * <p>It is up to the application to record the notation for later

     * reference, if necessary;

     * notations may appear as attribute values and in unparsed entity

     * declarations, and are sometime used with processing instruction

     * target names.</p>

     *

     * <p>At least one of publicId and systemId must be non-null.

     * If a system identifier is present, and it is a URL, the SAX

     * parser must resolve it fully before passing it to the

     * application through this event.</p>

     *

     * <p>There is no guarantee that the notation declaration will be

     * reported before any unparsed entities that use it.</p>

     *

     * @param name The notation name.

     * @param publicId The notation's public identifier, or null if

     *        none was given.

     * @param systemId The notation's system identifier, or null if

     *        none was given.

     * @exception org.xml.sax.SAXException Any SAX exception, possibly

     *            wrapping another exception.

     * @see #unparsedEntityDecl

     * @see org.xml.sax.Attributes

     */

    public abstract void notationDecl (String name,

                                       String publicId,

                                       String systemId)

        throws SAXException;

    /**

     * Receive notification of an unparsed entity declaration event.

     *

     * <p>Note that the notation name corresponds to a notation

     * reported by the {@link #notationDecl notationDecl} event.

     * It is up to the application to record the entity for later

     * reference, if necessary;

     * unparsed entities may appear as attribute values.

     * </p>

     *

     * <p>If the system identifier is a URL, the parser must resolve it

     * fully before passing it to the application.</p>

     *

     * @exception org.xml.sax.SAXException Any SAX exception, possibly

     *            wrapping another exception.

     * @param name The unparsed entity's name.

     * @param publicId The entity's public identifier, or null if none

     *        was given.

     * @param systemId The entity's system identifier.

     * @param notationName The name of the associated notation.

     * @see #notationDecl

     * @see org.xml.sax.Attributes

     */

    public abstract void unparsedEntityDecl (String name,

                                             String publicId,

                                             String systemId,

                                             String notationName)

        throws SAXException;

}
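
As the Javadoc says, it is up to the application to store these declarations. A minimal sketch of doing so might look like this; DtdDeclarations, its fields and the sample DTD are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.DTDHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class: remembers notation names and unparsed entities.
class DtdDeclarations implements DTDHandler {
    final List<String> notations = new ArrayList<>();
    // Maps each unparsed entity name to the notation it refers to.
    final Map<String, String> entityToNotation = new LinkedHashMap<>();

    @Override
    public void notationDecl(String name, String publicId, String systemId) {
        notations.add(name);
    }

    @Override
    public void unparsedEntityDecl(String name, String publicId,
                                   String systemId, String notationName) {
        entityToNotation.put(name, notationName);
    }

    /** Parses the XML text and returns the declarations reported by the parser. */
    static DtdDeclarations collect(String xml) {
        DtdDeclarations decls = new DtdDeclarations();
        try {
            XMLReader reader =
                    SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setDTDHandler(decls);
            reader.setContentHandler(new DefaultHandler());
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return decls;
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE doc [\n"
                + "<!NOTATION gif PUBLIC \"-//EXAMPLE//NOTATION GIF//EN\">\n"
                + "<!ENTITY logo SYSTEM \"http://example.invalid/logo.gif\" NDATA gif>\n"
                + "]>\n<doc/>";
        DtdDeclarations decls = collect(xml);
        System.out.println(decls.notations + " " + decls.entityToNotation);
    }
}
```

The sample XML in this article has no DTD, so none of these callbacks would fire for it.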

ContentHandler:

package org.xml.sax;

/**

 * Receive notification of the logical content of a document.

 *

 * <blockquote>

 * <em>This module, both source code and documentation, is in the

 * Public Domain, and comes with <strong>NO WARRANTY</strong>.</em>

 * See <a href='http://www.saxproject.org'>http://www.saxproject.org</a>

 * for further information.

 * </blockquote>

 *

 * <p>This is the main interface that most SAX applications

 * implement: if the application needs to be informed of basic parsing

 * events, it implements this interface and registers an instance with

 * the SAX parser using the {@link org.xml.sax.XMLReader#setContentHandler

 * setContentHandler} method.  The parser uses the instance to report

 * basic document-related events like the start and end of elements

 * and character data.</p>

 *

 * <p>The order of events in this interface is very important, and

 * mirrors the order of information in the document itself.  For

 * example, all of an element's content (character data, processing

 * instructions, and/or subelements) will appear, in order, between

 * the startElement event and the corresponding endElement event.</p>

 *

 * <p>This interface is similar to the now-deprecated SAX 1.0

 * DocumentHandler interface, but it adds support for Namespaces

 * and for reporting skipped entities (in non-validating XML

 * processors).</p>

 *

 * <p>Implementors should note that there is also a

 * <code>ContentHandler</code> class in the <code>java.net</code>

 * package; that means that it's probably a bad idea to do</p>

 *

 * <pre>import java.net.*;

 * import org.xml.sax.*;

 * </pre>

 *

 * <p>In fact, "import ...*" is usually a sign of sloppy programming

 * anyway, so the user should consider this a feature rather than a

 * bug.</p>

 *

 * @since SAX 2.0

 * @author David Megginson

 * @see org.xml.sax.XMLReader

 * @see org.xml.sax.DTDHandler

 * @see org.xml.sax.ErrorHandler

 */

public interface ContentHandler

{

    /**

     * Receive an object for locating the origin of SAX document events.

     *

     * <p>SAX parsers are strongly encouraged (though not absolutely

     * required) to supply a locator: if it does so, it must supply

     * the locator to the application by invoking this method before

     * invoking any of the other methods in the ContentHandler

     * interface.</p>

     *

     * <p>The locator allows the application to determine the end

     * position of any document-related event, even if the parser is

     * not reporting an error.  Typically, the application will

     * use this information for reporting its own errors (such as

     * character content that does not match an application's

     * business rules).  The information returned by the locator

     * is probably not sufficient for use with a search engine.</p>

     *

     * <p>Note that the locator will return correct information only

     * during the invocation SAX event callbacks after

     * {@link #startDocument startDocument} returns and before

     * {@link #endDocument endDocument} is called.  The

     * application should not attempt to use it at any other time.</p>

     *

     * @param locator an object that can return the location of

     *                any SAX document event

     * @see org.xml.sax.Locator

     */

    public void setDocumentLocator (Locator locator);

    /**

     * Receive notification of the beginning of a document.

     *

     * <p>The SAX parser will invoke this method only once, before any

     * other event callbacks (except for {@link #setDocumentLocator

     * setDocumentLocator}).</p>

     *

     * @throws org.xml.sax.SAXException any SAX exception, possibly

     *            wrapping another exception

     * @see #endDocument

     */

    public void startDocument ()

        throws SAXException;

    /**

     * Receive notification of the end of a document.

     *

     * <p><strong>There is an apparent contradiction between the

     * documentation for this method and the documentation for {@link

     * org.xml.sax.ErrorHandler#fatalError}.  Until this ambiguity is

     * resolved in a future major release, clients should make no

     * assumptions about whether endDocument() will or will not be

     * invoked when the parser has reported a fatalError() or thrown

     * an exception.</strong></p>

     *

     * <p>The SAX parser will invoke this method only once, and it will

     * be the last method invoked during the parse.  The parser shall

     * not invoke this method until it has either abandoned parsing

     * (because of an unrecoverable error) or reached the end of

     * input.</p>

     *

     * @throws org.xml.sax.SAXException any SAX exception, possibly

     *            wrapping another exception

     * @see #startDocument

     */

    public void endDocument()

        throws SAXException;

    /**

     * Begin the scope of a prefix-URI Namespace mapping.

     *

     * <p>The information from this event is not necessary for

     * normal Namespace processing: the SAX XML reader will

     * automatically replace prefixes for element and attribute

     * names when the <code>http://xml.org/sax/features/namespaces</code>

     * feature is <var>true</var> (the default).</p>

     *

     * <p>There are cases, however, when applications need to

     * use prefixes in character data or in attribute values,

     * where they cannot safely be expanded automatically; the

     * start/endPrefixMapping event supplies the information

     * to the application to expand prefixes in those contexts

     * itself, if necessary.</p>

     *

     * <p>Note that start/endPrefixMapping events are not

     * guaranteed to be properly nested relative to each other:

     * all startPrefixMapping events will occur immediately before the

     * corresponding {@link #startElement startElement} event,

     * and all {@link #endPrefixMapping endPrefixMapping}

     * events will occur immediately after the corresponding

     * {@link #endElement endElement} event,

     * but their order is not otherwise

     * guaranteed.</p>

     *

     * <p>There should never be start/endPrefixMapping events for the

     * "xml" prefix, since it is predeclared and immutable.</p>

     *

     * @param prefix the Namespace prefix being declared.

     *  An empty string is used for the default element namespace,

     *  which has no prefix.

     * @param uri the Namespace URI the prefix is mapped to

     * @throws org.xml.sax.SAXException the client may throw

     *            an exception during processing

     * @see #endPrefixMapping

     * @see #startElement

     */

    public void startPrefixMapping (String prefix, String uri)

        throws SAXException;

    /**

     * End the scope of a prefix-URI mapping.

     *

     * <p>See {@link #startPrefixMapping startPrefixMapping} for

     * details.  These events will always occur immediately after the

     * corresponding {@link #endElement endElement} event, but the order of

     * {@link #endPrefixMapping endPrefixMapping} events is not otherwise

     * guaranteed.</p>

     *

     * @param prefix the prefix that was being mapped.

     *  This is the empty string when a default mapping scope ends.

     * @throws org.xml.sax.SAXException the client may throw

     *            an exception during processing

     * @see #startPrefixMapping

     * @see #endElement

     */

    public void endPrefixMapping (String prefix)

        throws SAXException;

    /**

     * Receive notification of the beginning of an element.

     *

     * <p>The Parser will invoke this method at the beginning of every

     * element in the XML document; there will be a corresponding

     * {@link #endElement endElement} event for every startElement event

     * (even when the element is empty). All of the element's content will be

     * reported, in order, before the corresponding endElement

     * event.</p>

     *

     * <p>This event allows up to three name components for each

     * element:</p>

     *

     * <ol>

     * <li>the Namespace URI;</li>

     * <li>the local name; and</li>

     * <li>the qualified (prefixed) name.</li>

     * </ol>

     *

     * <p>Any or all of these may be provided, depending on the

     * values of the <var>http://xml.org/sax/features/namespaces</var>

     * and the <var>http://xml.org/sax/features/namespace-prefixes</var>

     * properties:</p>

     *

     * <ul>

     * <li>the Namespace URI and local name are required when

     * the namespaces property is <var>true</var> (the default), and are

     * optional when the namespaces property is <var>false</var> (if one is

     * specified, both must be);</li>

     * <li>the qualified name is required when the namespace-prefixes property

     * is <var>true</var>, and is optional when the namespace-prefixes property

     * is <var>false</var> (the default).</li>

     * </ul>

     *

     * <p>Note that the attribute list provided will contain only

     * attributes with explicit values (specified or defaulted):

     * #IMPLIED attributes will be omitted.  The attribute list

     * will contain attributes used for Namespace declarations

     * (xmlns* attributes) only if the

     * <code>http://xml.org/sax/features/namespace-prefixes</code>

     * property is true (it is false by default, and support for a

     * true value is optional).</p>

     *

     * <p>Like {@link #characters characters()}, attribute values may have

     * characters that need more than one <code>char</code> value.  </p>

     *

     * @param uri the Namespace URI, or the empty string if the

     *        element has no Namespace URI or if Namespace

     *        processing is not being performed

     * @param localName the local name (without prefix), or the

     *        empty string if Namespace processing is not being

     *        performed

     * @param qName the qualified name (with prefix), or the

     *        empty string if qualified names are not available

     * @param atts the attributes attached to the element.  If

     *        there are no attributes, it shall be an empty

     *        Attributes object.  The value of this object after

     *        startElement returns is undefined

     * @throws org.xml.sax.SAXException any SAX exception, possibly

     *            wrapping another exception

     * @see #endElement

     * @see org.xml.sax.Attributes

     * @see org.xml.sax.helpers.AttributesImpl

     */

    public void startElement (String uri, String localName,

                              String qName, Attributes atts)

        throws SAXException;

    /**

     * Receive notification of the end of an element.

     *

     * <p>The SAX parser will invoke this method at the end of every

     * element in the XML document; there will be a corresponding

     * {@link #startElement startElement} event for every endElement

     * event (even when the element is empty).</p>

     *

     * <p>For information on the names, see startElement.</p>

     *

     * @param uri the Namespace URI, or the empty string if the

     *        element has no Namespace URI or if Namespace

     *        processing is not being performed

     * @param localName the local name (without prefix), or the

     *        empty string if Namespace processing is not being

     *        performed

     * @param qName the qualified XML name (with prefix), or the

     *        empty string if qualified names are not available

     * @throws org.xml.sax.SAXException any SAX exception, possibly

     *            wrapping another exception

     */

    public void endElement (String uri, String localName,

                            String qName)

        throws SAXException;

    /**

     * Receive notification of character data.

     *

     * <p>The Parser will call this method to report each chunk of

     * character data.  SAX parsers may return all contiguous character

     * data in a single chunk, or they may split it into several

     * chunks; however, all of the characters in any single event

     * must come from the same external entity so that the Locator

     * provides useful information.</p>

     *

     * <p>The application must not attempt to read from the array

     * outside of the specified range.</p>

     *

     * <p>Individual characters may consist of more than one Java

     * <code>char</code> value.  There are two important cases where this

     * happens, because characters can't be represented in just sixteen bits.

     * In one case, characters are represented in a <em>Surrogate Pair</em>,

     * using two special Unicode values. Such characters are in the so-called

     * "Astral Planes", with a code point above U+FFFF.  A second case involves

     * composite characters, such as a base character combining with one or

     * more accent characters. </p>

     *

     * <p> Your code should not assume that algorithms using

     * <code>char</code>-at-a-time idioms will be working in character

     * units; in some cases they will split characters.  This is relevant

     * wherever XML permits arbitrary characters, such as attribute values,

     * processing instruction data, and comments as well as in data reported

     * from this method.  It's also generally relevant whenever Java code

     * manipulates internationalized text; the issue isn't unique to XML.</p>

     *

     * <p>Note that some parsers will report whitespace in element

     * content using the {@link #ignorableWhitespace ignorableWhitespace}

     * method rather than this one (validating parsers <em>must</em>

     * do so).</p>

     *

     * @param ch the characters from the XML document

     * @param start the start position in the array

     * @param length the number of characters to read from the array

     * @throws org.xml.sax.SAXException any SAX exception, possibly

     *            wrapping another exception

     * @see #ignorableWhitespace

     * @see org.xml.sax.Locator

     */

    public void characters (char ch[], int start, int length)

        throws SAXException;

    /**

     * Receive notification of ignorable whitespace in element content.

     *

     * <p>Validating Parsers must use this method to report each chunk

     * of whitespace in element content (see the W3C XML 1.0

     * recommendation, section 2.10): non-validating parsers may also

     * use this method if they are capable of parsing and using

     * content models.</p>

     *

     * <p>SAX parsers may return all contiguous whitespace in a single

     * chunk, or they may split it into several chunks; however, all of

     * the characters in any single event must come from the same

     * external entity, so that the Locator provides useful

     * information.</p>

     *

     * <p>The application must not attempt to read from the array

     * outside of the specified range.</p>

     *

     * @param ch the characters from the XML document

     * @param start the start position in the array

     * @param length the number of characters to read from the array

     * @throws org.xml.sax.SAXException any SAX exception, possibly

     *            wrapping another exception

     * @see #characters

     */

    public void ignorableWhitespace (char ch[], int start, int length)

        throws SAXException;

    /**

     * Receive notification of a processing instruction.

     *

     * <p>The Parser will invoke this method once for each processing

     * instruction found: note that processing instructions may occur

     * before or after the main document element.</p>

     *

     * <p>A SAX parser must never report an XML declaration (XML 1.0,

     * section 2.8) or a text declaration (XML 1.0, section 4.3.1)

     * using this method.</p>

     *

     * <p>Like {@link #characters characters()}, processing instruction

     * data may have characters that need more than one <code>char</code>

     * value. </p>

     *

     * @param target the processing instruction target

     * @param data the processing instruction data, or null if

     *        none was supplied.  The data does not include any

     *        whitespace separating it from the target

     * @throws org.xml.sax.SAXException any SAX exception, possibly

     *            wrapping another exception

     */

    public void processingInstruction (String target, String data)

        throws SAXException;

    /**

     * Receive notification of a skipped entity.

     * This is not called for entity references within markup constructs

     * such as element start tags or markup declarations.  (The XML

     * recommendation requires reporting skipped external entities.

     * SAX also reports internal entity expansion/non-expansion, except

     * within markup constructs.)

     *

     * <p>The Parser will invoke this method each time the entity is

     * skipped.  Non-validating processors may skip entities if they

     * have not seen the declarations (because, for example, the

     * entity was declared in an external DTD subset).  All processors

     * may skip external entities, depending on the values of the

     * <code>http://xml.org/sax/features/external-general-entities</code>

     * and the

     * <code>http://xml.org/sax/features/external-parameter-entities</code>

     * properties.</p>

     *

     * @param name the name of the skipped entity.  If it is a

     *        parameter entity, the name will begin with '%', and if

     *        it is the external DTD subset, it will be the string

     *        "[dtd]"

     * @throws org.xml.sax.SAXException any SAX exception, possibly

     *            wrapping another exception

     */

    public void skippedEntity (String name)

        throws SAXException;

}
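
One detail from the characters() Javadoc is easy to miss: the parser may deliver the text of a single element in several chunks. A safe pattern, sketched below with made-up names (CapitalReader, readCapitals), is to append every chunk to a buffer and read the buffer only in endElement.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, used only for this sketch.
class CapitalReader {

    /** Returns the text of every capital element, in document order. */
    static List<String> readCapitals(String xml) {
        List<String> capitals = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private boolean inCapital = false;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if ("capital".equals(qName)) {
                    inCapital = true;
                    text.setLength(0); // start a fresh buffer for this element
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called more than once per element, so append, never assign.
                if (inCapital) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("capital".equals(qName)) {
                    capitals.add(text.toString().trim());
                    inCapital = false;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return capitals;
    }

    public static void main(String[] args) {
        String xml = "<world>"
                + "<comuntry id=\"1\"><capital>Beijing</capital></comuntry>"
                + "<comuntry id=\"3\"><capital>Tokyo</capital></comuntry>"
                + "</world>";
        System.out.println(readCapitals(xml));
    }
}
```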

ErrorHandler:

package org.xml.sax;

/**

 * Basic interface for SAX error handlers.

 *

 * <blockquote>

 * <em>This module, both source code and documentation, is in the

 * Public Domain, and comes with <strong>NO WARRANTY</strong>.</em>

 * See <a href='http://www.saxproject.org'>http://www.saxproject.org</a>

 * for further information.

 * </blockquote>

 *

 * <p>If a SAX application needs to implement customized error

 * handling, it must implement this interface and then register an

 * instance with the XML reader using the

 * {@link org.xml.sax.XMLReader#setErrorHandler setErrorHandler}

 * method.  The parser will then report all errors and warnings

 * through this interface.</p>

 *

 * <p><strong>WARNING:</strong> If an application does <em>not</em>

 * register an ErrorHandler, XML parsing errors will go unreported,

 * except that <em>SAXParseException</em>s will be thrown for fatal errors.

 * In order to detect validity errors, an ErrorHandler that does something

 * with {@link #error error()} calls must be registered.</p>

 *

 * <p>For XML processing errors, a SAX driver must use this interface

 * in preference to throwing an exception: it is up to the application

 * to decide whether to throw an exception for different types of

 * errors and warnings.  Note, however, that there is no requirement that

 * the parser continue to report additional errors after a call to

 * {@link #fatalError fatalError}.  In other words, a SAX driver class

 * may throw an exception after reporting any fatalError.

 * Also parsers may throw appropriate exceptions for non-XML errors.

 * For example, {@link XMLReader#parse XMLReader.parse()} would throw

 * an IOException for errors accessing entities or the document.</p>

 *

 * @since SAX 1.0

 * @author David Megginson

 * @see org.xml.sax.XMLReader#setErrorHandler

 * @see org.xml.sax.SAXParseException

 */

public interface ErrorHandler {

    /**

     * Receive notification of a warning.

     *

     * <p>SAX parsers will use this method to report conditions that

     * are not errors or fatal errors as defined by the XML

     * recommendation.  The default behaviour is to take no

     * action.</p>

     *

     * <p>The SAX parser must continue to provide normal parsing events

     * after invoking this method: it should still be possible for the

     * application to process the document through to the end.</p>

     *

     * <p>Filters may use this method to report other, non-XML warnings

     * as well.</p>

     *

     * @param exception The warning information encapsulated in a

     *                  SAX parse exception.

     * @exception org.xml.sax.SAXException Any SAX exception, possibly

     *            wrapping another exception.

     * @see org.xml.sax.SAXParseException

     */

    public abstract void warning (SAXParseException exception)

        throws SAXException;

    /**

     * Receive notification of a recoverable error.

     *

     * <p>This corresponds to the definition of "error" in section 1.2

     * of the W3C XML 1.0 Recommendation.  For example, a validating

     * parser would use this callback to report the violation of a

     * validity constraint.  The default behaviour is to take no

     * action.</p>

     *

     * <p>The SAX parser must continue to provide normal parsing

     * events after invoking this method: it should still be possible

     * for the application to process the document through to the end.

     * If the application cannot do so, then the parser should report

     * a fatal error even if the XML recommendation does not require

     * it to do so.</p>

     *

     * <p>Filters may use this method to report other, non-XML errors

     * as well.</p>

     *

     * @param exception The error information encapsulated in a

     *                  SAX parse exception.

     * @exception org.xml.sax.SAXException Any SAX exception, possibly

     *            wrapping another exception.

     * @see org.xml.sax.SAXParseException

     */

    public abstract void error (SAXParseException exception)

        throws SAXException;

    /**

     * Receive notification of a non-recoverable error.

     *

     * <p><strong>There is an apparent contradiction between the

     * documentation for this method and the documentation for {@link

     * org.xml.sax.ContentHandler#endDocument}.  Until this ambiguity

     * is resolved in a future major release, clients should make no

     * assumptions about whether endDocument() will or will not be

     * invoked when the parser has reported a fatalError() or thrown

     * an exception.</strong></p>

     *

     * <p>This corresponds to the definition of "fatal error" in

     * section 1.2 of the W3C XML 1.0 Recommendation.  For example, a

     * parser would use this callback to report the violation of a

     * well-formedness constraint.</p>

     *

     * <p>The application must assume that the document is unusable

     * after the parser has invoked this method, and should continue

     * (if at all) only for the sake of collecting additional error

     * messages: in fact, SAX parsers are free to stop reporting any

     * other events once this method has been invoked.</p>

     *

     * @param exception The error information encapsulated in a

     *                  SAX parse exception.

     * @exception org.xml.sax.SAXException Any SAX exception, possibly

     *            wrapping another exception.

     * @see org.xml.sax.SAXParseException

     */

    public abstract void fatalError (SAXParseException exception)

        throws SAXException;

}
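
The warning in the Javadoc is worth acting on: without a registered ErrorHandler, recoverable errors go unreported. The sketch below shows one possible policy, not the only one: collect warnings and errors, and rethrow fatal errors so that parsing stops. CollectingErrorHandler and check are illustrative names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class: records every problem the parser reports.
class CollectingErrorHandler implements ErrorHandler {
    final List<String> problems = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        problems.add("warning: line " + e.getLineNumber());
    }

    @Override
    public void error(SAXParseException e) {
        problems.add("error: line " + e.getLineNumber());
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        problems.add("fatal: line " + e.getLineNumber());
        throw e; // the document is unusable, so stop parsing
    }

    /** Parses the XML text and returns the problems that were reported. */
    static List<String> check(String xml) {
        CollectingErrorHandler handler = new CollectingErrorHandler();
        try {
            XMLReader reader =
                    SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setErrorHandler(handler);
            reader.setContentHandler(new DefaultHandler());
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // Already recorded by fatalError(); nothing more to do here.
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return handler.problems;
    }

    public static void main(String[] args) {
        System.out.println(check("<world><name>China</world>")); // mismatched tags
        System.out.println(check("<world/>"));
    }
}
```

A mismatched end tag is a well-formedness violation, so it reaches fatalError rather than error.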

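Above is the source of the four basic event-handling interfaces; reading the code and its Javadoc shows what each callback is responsible for.

4. Implementing SAX parsing. There are two parts: defining the parsing rules (the event handler) and reading the file.

Event handler, MyHandler.java: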

import java.io.IOException;

import org.xml.sax.Attributes;

import org.xml.sax.InputSource;

import org.xml.sax.Locator;

import org.xml.sax.SAXException;

import org.xml.sax.SAXParseException;

import org.xml.sax.helpers.DefaultHandler;

public class MyHandler extends DefaultHandler {

    /**

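     * Begin the scope of a prefix-to-URI namespace mapping.

     * The information from this event is not necessary for normal namespace processing:

     * when the http://xml.org/sax/features/namespaces feature is true (the SAX default),

     * the SAX XML reader automatically replaces prefixes in element and attribute names.

     * Note that a parser created through SAXParserFactory is not namespace-aware

     * unless setNamespaceAware(true) has been called on the factory.

     * Parameters:

     *    prefix : the namespace prefix being declared

     *    uri : the namespace URI the prefix is mapped to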

     */

    @Override

    public void startPrefixMapping(String prefix, String uri)

            throws SAXException {

        // TODO Auto-generated method stub

         System.out.println("(startPrefixMapping)start prefix_mapping : xmlns:"+prefix+" = "

                    +"\""+uri+"\"");

    }

    /**

     * End the scope of a prefix-to-URI mapping.

     * @param prefix  the prefix that was being mapped

     */

    @Override

    public void endPrefixMapping(String prefix) throws SAXException {

        // TODO Auto-generated method stub

        System.out.println("(endPrefixMapping)end prefix_mapping : "+prefix);

    }

    /**

     * Receive notification of the end of the document.

     */

    @Override

    public void endDocument() throws SAXException {

        // TODO Auto-generated method stub

        System.out.println("(endDocument)doument is ended");

    }

    /**

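     * Receive notification of the end of an element.

     * Parameters:

     *    uri : the namespace URI of the element

     *    localName : the local name of the element (without prefix)

     *    qName : the qualified name of the element (with prefix)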

     */

    @Override

    public void endElement(String uri, String localName, String qName)

            throws SAXException {

        // TODO Auto-generated method stub

        System.out.println("(endElement)end element : "+qName+"("+uri+")");

    }

    /**

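     * Receive notification of ignorable whitespace in element content.

     * Parameters:

     *     ch : the characters from the XML document

     *     start : the start position in the array

     *     length : the number of characters to read from the array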

     */

    @Override

    public void ignorableWhitespace(char[] ch, int start, int length)

            throws SAXException {

        // TODO Auto-generated method stub

        StringBuilder buffer = new StringBuilder();

        for(int i = start ; i < start+length ; i++){

            switch(ch[i]){

                case '\\':buffer.append("\\\\");break;

                case '\r':buffer.append("\\r");break;

                case '\n':buffer.append("\\n");break;

                case '\t':buffer.append("\\t");break;

                case '\"':buffer.append("\\\"");break;

                default : buffer.append(ch[i]);

            }

        }

        System.out.println("(ignorableWhitespace)ignorable whitespace("+length+"): "+buffer.toString());

    }

    /**

     * Receive an object for locating the origin of SAX document events.

     * Parameters:

     *     locator : an object that can return the location of any SAX document event

     */

    @Override

    public void setDocumentLocator(Locator locator) {

        // TODO Auto-generated method stub

        System.out.println("(setDocumentLocator)set document_locator : (lineNumber = "+locator.getLineNumber()

                +",columnNumber = "+locator.getColumnNumber()

                +",systemId = "+locator.getSystemId()

                +",publicId = "+locator.getPublicId()+")");

    }

    /**

     * Receive notification of the beginning of the document.

     */

    @Override

    public void startDocument() throws SAXException {

        // TODO Auto-generated method stub

        System.out.println("(startDocument)document is startting");

    }

    /**

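     * Receive notification of the start of an element.

     * Parameters:

     *    uri : the namespace URI of the element

     *    localName : the local name of the element (without prefix)

     *    qName : the qualified name of the element (with prefix)

     *    attributes : the attributes attached to the element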

     */

    @Override

    public void startElement(String uri, String localName, String qName,

            Attributes attributes) throws SAXException {

        // TODO Auto-generated method stub

         System.out.println("(startElement)start element : "+qName+"("+uri+")");

    }

    /**

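     * Receive notification of a notation declaration event.

     * Parameters:

     *     name - the notation name.

     *     publicId - the notation's public identifier, or null if none was given.

     *     systemId - the notation's system identifier, or null if none was given.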

     */

    @Override

    public void notationDecl(String name, String publicId, String systemId)

            throws SAXException {

        // TODO Auto-generated method stub

        System.out.println("(notationDecl)notation declare : (name = "+name

                +",publicId = "+publicId

                +",systemId = "+systemId+")");

    }

    /**

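     * Allow the application to resolve external entities.

     * The parser calls this method before opening any external entity except the top-level document entity.

     * Parameters:

     *     publicId : the public identifier of the external entity being referenced, or null if none was supplied.

     *     systemId : the system identifier of the external entity being referenced.

     * Returns:

     *     an InputSource object describing the new input source, or null

     *     to request that the parser open a regular URI connection to the system identifier.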

     */

    @Override

    public InputSource resolveEntity(String publicId, String systemId)

            throws IOException, SAXException {

        // TODO Auto-generated method stub

        return super.resolveEntity(publicId, systemId);

    }

    /**

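     * Receive notification of a skipped entity.

     * Parameters:

     * name : the name of the skipped entity. If it is a parameter entity, the name begins with '%';

     *            if it is the external DTD subset, it is the string "[dtd]"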

     */

    @Override

    public void skippedEntity(String name) throws SAXException {

        // TODO Auto-generated method stub

        System.out.println("(skippedEntity)the name of the skipped entity : "+name);

    }

    /**

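     * Receive notification of an unparsed entity declaration event.

     * Parameters:

     *     name - the name of the unparsed entity.

     *     publicId - the entity's public identifier, or null if none was given.

     *     systemId - the entity's system identifier.

     *     notationName - the name of the associated notation.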

     */

    @Override

    public void unparsedEntityDecl(String name, String publicId,

            String systemId, String notationName) throws SAXException {

        // TODO Auto-generated method stub

          System.out.println("(unparsedEntityDecl)unparsed entity declare : (name = "+name

                    +",publicId = "+publicId

                    +",systemId = "+systemId

                    +",notationName = "+notationName+")");

    }

    /**

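     * Receive notification of a processing instruction.

     * Parameters:

     *     target : the processing instruction target

     *     data : the processing instruction data, or null if none was supplied.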

     */

    @Override

    public void processingInstruction(String target, String data)

            throws SAXException {

        // TODO Auto-generated method stub

         System.out.println("(processingInstruction)process instruction : (target = \""

                    +target+"\",data = \""+data+"\")");

    }

    /**

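     * Receive notification of character data.

     * In DOM terms, the slice of ch from start to start+length corresponds to the

     * nodeValue of a Text node, although the parser may deliver one run of text

     * in several separate calls.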

     */

    @Override

    public void characters(char[] ch, int start, int length)

            throws SAXException {

        // TODO Auto-generated method stub

          StringBuilder buffer = new StringBuilder();

            for(int i = start ; i < start+length ; i++){

                switch(ch[i]){

                    case '\\':buffer.append("\\\\");break;

                    case '\r':buffer.append("\\r");break;

                    case '\n':buffer.append("\\n");break;

                    case '\t':buffer.append("\\t");break;

                    case '\"':buffer.append("\\\"");break;

                    default : buffer.append(ch[i]);

                }

            }

            System.out.println("(characters)characters("+length+"): "+buffer.toString());

    }

    /**

     * Handle a recoverable error.

     */

    @Override

    public void error(SAXParseException e) throws SAXException {

        // TODO Auto-generated method stub

         System.err.println("(error)Error ("+e.getLineNumber()+","

                    +e.getColumnNumber()+") : "+e.getMessage());

    }

    /**

     * Handle a fatal (non-recoverable) error.

     */

    @Override

    public void fatalError(SAXParseException e) throws SAXException {

        // TODO Auto-generated method stub

         System.err.println("(fatalError)FatalError ("+e.getLineNumber()+","

                    +e.getColumnNumber()+") : "+e.getMessage());

    }

    /**

     * Handle a warning.

     */

    @Override

    public void warning(SAXParseException e) throws SAXException {

        // TODO Auto-generated method stub

         System.err.println("(warning)("+e.getLineNumber()+","

                    +e.getColumnNumber()+") : "+e.getMessage());

    }

}

Running the parse:

SAXParse.java

import java.io.File;

import java.io.FileInputStream;

import java.io.FileNotFoundException;

import java.io.IOException;

import javax.xml.parsers.ParserConfigurationException;

import javax.xml.parsers.SAXParser;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;

import org.xml.sax.SAXException;

import org.xml.sax.XMLReader;

/**

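 * 1. Obtain a SAX parser factory instance

 * 2. Get a SAX parser from the factory

 * 3. Turn the XML document to be parsed into an input stream so that the SAX parser can read it

 * 4. Parse the XML document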

 */

public class SAXParse {

    /**

     * @param args

     */

    public static void main(String[] args) {

        // TODO Auto-generated method stub

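        // Obtain the SAX parser factory

        SAXParserFactory factory = SAXParserFactory.newInstance();

        // Create the parser

        SAXParser parser =null;

        try {

            parser = factory.newSAXParser();

            XMLReader xmlReader = parser.getXMLReader();

            InputSource input = new InputSource(new FileInputStream(new File("world.xml")));

            // MyHandler implements all four handler interfaces. Register it for each role;

            // with setContentHandler alone, its DTD, entity and error callbacks are never invoked.

            MyHandler handler = new MyHandler();

            xmlReader.setContentHandler(handler);

            xmlReader.setDTDHandler(handler);

            xmlReader.setEntityResolver(handler);

            xmlReader.setErrorHandler(handler);

            xmlReader.parse(input);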

        } catch (ParserConfigurationException | SAXException e) {

            // TODO Auto-generated catch block

            e.printStackTrace();

        }catch (FileNotFoundException e) {

            // TODO Auto-generated catch block

            e.printStackTrace();

        } catch (IOException e) {

            // TODO Auto-generated catch block

            e.printStackTrace();

        }  

    }

}

5. Output:

(setDocumentLocator)set document_locator : (lineNumber = 1,columnNumber = 1,systemId = null,publicId = null)

(startDocument)document is startting

(startElement)start element : world()

(characters)characters(2): \n\t

(startElement)start element : comuntry()

(characters)characters(3): \n\t\t

(startElement)start element : name()

(characters)characters(5): China

(endElement)end element : name()

(characters)characters(3): \n\t\t

(startElement)start element : capital()

(characters)characters(7): Beijing

(endElement)end element : capital()

(characters)characters(3): \n\t\t

(startElement)start element : population()

(characters)characters(4): 1234

(endElement)end element : population()

(characters)characters(3): \n\t\t

(startElement)start element : area()

(characters)characters(3): 960

(endElement)end element : area()

(characters)characters(2): \n\t

(endElement)end element : comuntry()

(characters)characters(2): \n\t

(startElement)start element : comuntry()

(characters)characters(3): \n\t\t

(startElement)start element : name()

(characters)characters(7): America

(endElement)end element : name()

(characters)characters(3): \n\t\t

(startElement)start element : capital()

(characters)characters(10): Washington

(endElement)end element : capital()

(characters)characters(3): \n\t\t

(startElement)start element : population()

(characters)characters(3): 234

(endElement)end element : population()

(characters)characters(3): \n\t\t

(startElement)start element : area()

(characters)characters(3): 900

(endElement)end element : area()

(characters)characters(2): \n\t

(endElement)end element : comuntry()

(characters)characters(2): \n\t

(startElement)start element : comuntry()

(characters)characters(3): \n\t\t

(startElement)start element : name()

(characters)characters(5): Japan

(endElement)end element : name()

(characters)characters(3): \n\t\t

(startElement)start element : capital()

(characters)characters(5): Tokyo

(endElement)end element : capital()

(characters)characters(3): \n\t\t

(startElement)start element : population()

(characters)characters(3): 234

(endElement)end element : population()

(characters)characters(3): \n\t\t

(startElement)start element : area()

(characters)characters(2): 60

(endElement)end element : area()

(characters)characters(2): \n\t

(endElement)end element : comuntry()

(characters)characters(2): \n\t

(startElement)start element : comuntry()

(characters)characters(3): \n\t\t

(startElement)start element : name()

(characters)characters(6): Russia

(endElement)end element : name()

(characters)characters(3): \n\t\t

(startElement)start element : capital()

(characters)characters(6): Moscow

(endElement)end element : capital()

(characters)characters(3): \n\t\t

(startElement)start element : population()

(characters)characters(2): 34

(endElement)end element : population()

(characters)characters(3): \n\t\t

(startElement)start element : area()

(characters)characters(4): 1960

(endElement)end element : area()

(characters)characters(2): \n\t

(endElement)end element : comuntry()

(characters)characters(1): \n

(endElement)end element : world()

(endDocument)doument is ended
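In the output above every element is printed with an empty namespace URI, and startPrefixMapping never appears. That is expected: world.xml declares no namespaces, and a parser obtained from SAXParserFactory is not namespace-aware unless setNamespaceAware(true) is called. The sketch below illustrates the difference on a hypothetical one-element document; the class name NamespaceDemo and the URI urn:example:world are invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class NamespaceDemo {

    /** Parses xml and returns a log of prefix-mapping and start-element events. */
    static List<String> events(String xml, boolean namespaceAware) {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startPrefixMapping(String prefix, String uri) {
                log.add("prefix " + prefix + " -> " + uri);
            }

            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes attributes) {
                log.add("start {" + uri + "}" + localName);
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            // SAXParser.parse registers the DefaultHandler for all handler roles.
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        String xml = "<w:world xmlns:w=\"urn:example:world\"/>";
        System.out.println("namespace-aware:     " + events(xml, true));
        System.out.println("not namespace-aware: " + events(xml, false));
    }
}
```

With namespace awareness switched on, the handler receives the prefix mapping and the element's URI and local name; with it off, code should rely on qName only, which is what MyHandler prints.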

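6. That completes the SAX parse. This is a very simple read-and-print pass; a real application needs a handler tailored to its own requirements.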
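As one possible shape for such a tailored handler, here is a minimal sketch that collects each country's id and name instead of printing every event. The class name CountryCollector and the "id:name" output format are assumptions made for illustration; the element names (including the spelling comuntry) follow world.xml. Text is accumulated in a StringBuilder because a parser may deliver one text node through several characters() calls.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class CountryCollector extends DefaultHandler {

    private final List<String> countries = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private String currentId;

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) {
        text.setLength(0); // start collecting text for the new element
        if ("comuntry".equals(qName)) {
            currentId = attributes.getValue("id");
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called more than once per text node
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("name".equals(qName)) {
            countries.add(currentId + ":" + text.toString().trim());
        }
    }

    /** Parses the XML text and returns entries of the form "id:name". */
    static List<String> collect(String xml) {
        CountryCollector handler = new CountryCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.countries;
    }

    public static void main(String[] args) {
        String xml = "<world>\n"
                + "  <comuntry id=\"1\"><name>China</name></comuntry>\n"
                + "  <comuntry id=\"2\"><name>America</name></comuntry>\n"
                + "</world>";
        System.out.println(collect(xml));
    }
}
```

The handler keeps only the state it needs (the current id and the text buffer), which is what lets SAX process large files without building a tree in memory.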
