Wednesday 13 April 2011

Line and Column Numbers in an XML DOM Document.

If an application reads data from XML configuration files, it can be useful to report the filename, line number and column when a problem is found with the data.

You might expect all XML parsers to provide access to this sort of information as a matter of course.  The standard SAX parser does (as we will see), but the DOM parser does not, unless the XML actually fails to parse.  The standard DOM parser probably uses a SAX parser under the hood, but the API denies us access to it.
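
To make the SAX side of that claim concrete, here is a minimal sketch of a handler that records the line the parser has reached for each start tag. The class name SaxLocationDemo and the method startTagLines are made up for this illustration; only the Locator callback itself is standard SAX. Note that the Locator reports the position where the event ends, so for a start tag this is the line on which the tag closes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the article's code.
final class SaxLocationDemo {

    /** Returns one "name@line" entry per start tag, in document order. */
    static List<String> startTagLines(String xml) {
        final List<String> found = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                // Called once by the parser before any other event.
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName,
                    String qName, Attributes atts) {
                // A parser is encouraged, but not required, to supply a Locator.
                int line = (locator == null) ? -1 : locator.getLineNumber();
                found.add(qName + "@" + line);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return found;
    }

    public static void main(String[] args) {
        String xml = "<config>\n  <title>x</title>\n</config>";
        System.out.println(startTagLines(xml));
    }
}
```

With a DOM parser there is no equivalent callback, which is the gap the rest of this post works around.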

Switching from DOM to SAX is a high price to pay for better error reporting.  Besides, you may need DOM-based tools such as XSLT.  You could switch to a third-party parser, or reach the underlying SAX parser in some sneaky, non-standard and unsupported way, or you can just use the following trick.

Use the javax.xml.transform API.

The great thing about the XSL Transformation API is that it accepts a number of different types of input source and output destination.  Possible options are:
  • input and output streams
  • files
  • SAX parser input sources
  • DOM fragments and documents
  • JAXB object models

So we can read XML using a SAX parser and get a resulting DOM.

In our case we don't actually want to apply an XSL transformation.  However, the API will provide us with a Transformer that just copies its input to the output form without altering the data.  So here we have a tool that can convert XML from one form to another.  The following examples convert XML from a file into a DOM and back.

Reading an XML file into a DOM:
    TransformerFactory transformerFactory
            = TransformerFactory.newInstance();
    // Do not share transformers between threads
    Transformer nullTransformer = transformerFactory.newTransformer();

    Source fileSource = new StreamSource(new File("input.xml"));
    DOMResult domResult = new DOMResult();
    nullTransformer.transform(fileSource, domResult);

    Document dom = (Document) domResult.getNode();

Writing an XML DOM to a file:
    Source domSource = new DOMSource(dom);
    Result fileResult = new StreamResult(new File("output.xml"));
    nullTransformer.transform(domSource, fileResult);
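
For experimenting without files, the same identity transformer can read from and write to strings. The following is a small sketch under that assumption; the class name IdentityRoundTrip and its two methods are invented for this example, and each call creates its own Transformer so nothing is shared between threads.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

// Hypothetical helper class, not part of the article's code.
final class IdentityRoundTrip {

    /** Copies XML text into a new DOM using the identity transformer. */
    static Document parse(String xml) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            DOMResult result = new DOMResult();
            identity.transform(new StreamSource(new StringReader(xml)), result);
            return (Document) result.getNode();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not read XML", e);
        }
    }

    /** Copies a DOM back out to XML text, without the XML declaration. */
    static String serialize(Document dom) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(dom), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
    }

    public static void main(String[] args) {
        Document dom = parse("<a><b>hi</b></a>");
        System.out.println(dom.getDocumentElement().getNodeName());
        System.out.println(serialize(dom));
    }
}
```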

So how do we obtain the line number information for DOM nodes?

The trick is to use a SAX parser and attach the location information it provides to the created element nodes as they are added to the DOM.  Here is a SAX filter that does exactly this:
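import java.util.Stack;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.UserDataHandler;
import org.w3c.dom.events.Event;
import org.w3c.dom.events.EventListener;
import org.w3c.dom.events.EventTarget;
import org.w3c.dom.events.MutationEvent;
import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.LocatorImpl;
import org.xml.sax.helpers.XMLFilterImpl;

public class LocationAnnotator extends XMLFilterImpl {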

    private Locator locator;
    private Element lastAddedElement;
    private Stack<Locator> locatorStack = new Stack<Locator>();
    private UserDataHandler dataHandler = new LocationDataHandler();

    LocationAnnotator(XMLReader xmlReader, Document dom) {
        super(xmlReader);

        // Add listener to DOM, so we know which node was added.
        EventListener modListener = new EventListener() {
            @Override
            public void handleEvent(Event e) {
                EventTarget target = ((MutationEvent) e).getTarget();
                // Text and other non-element nodes fire this event too.
                if (target instanceof Element) {
                    lastAddedElement = (Element) target;
                }
            }
        };
        ((EventTarget) dom).addEventListener("DOMNodeInserted",
                modListener, true);
    }

    @Override
    public void setDocumentLocator(Locator locator) {
        super.setDocumentLocator(locator);
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName,
            String qName, Attributes atts) throws SAXException {
        super.startElement(uri, localName, qName, atts);

        // Keep snapshot of start location,
        // for later when end of element is found.
        locatorStack.push(new LocatorImpl(locator));
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {

        // The mutation event is fired while the element end is added,
        // so lastAddedElement will be set by the time this call returns.
        super.endElement(uri, localName, qName);
      
        if (locatorStack.size() > 0) {
            Locator startLocator = locatorStack.pop();
          
            LocationData location = new LocationData(
                    startLocator.getSystemId(),
                    startLocator.getLineNumber(),
                    startLocator.getColumnNumber(),
                    locator.getLineNumber(),
                    locator.getColumnNumber());
          
            lastAddedElement.setUserData(
                    LocationData.LOCATION_DATA_KEY, location,
                    dataHandler);
        }
    }

    // Ensure location data copied to any new DOM node.
    private class LocationDataHandler implements UserDataHandler {

        @Override
        public void handle(short operation, String key, Object data,
                Node src, Node dst) {
          
            if (src != null && dst != null) {
                LocationData locationData = (LocationData)
                        src.getUserData(LocationData.LOCATION_DATA_KEY);
              
                if (locationData != null) {
                    dst.setUserData(LocationData.LOCATION_DATA_KEY,
                            locationData, dataHandler);
                }
            }
        }
    }
}
Next is the LocationData class, whose instances the filter attaches to each DOM element node:
public class LocationData {

    public static final String LOCATION_DATA_KEY = "locationDataKey";

    private final String systemId;
    private final int startLine;
    private final int startColumn;
    private final int endLine;
    private final int endColumn;

    public LocationData(String systemId, int startLine,
            int startColumn, int endLine, int endColumn) {
        super();
        this.systemId = systemId;
        this.startLine = startLine;
        this.startColumn = startColumn;
        this.endLine = endLine;
        this.endColumn = endColumn;
    }

    public String getSystemId() {
        return systemId;
    }

    public int getStartLine() {
        return startLine;
    }

    public int getStartColumn() {
        return startColumn;
    }

    public int getEndLine() {
        return endLine;
    }

    public int getEndColumn() {
        return endColumn;
    }

    @Override
    public String toString() {
        return getSystemId() + "[line " + startLine + ":"
                + startColumn + " to line " + endLine + ":"
                + endColumn + "]";
    }
}
The final piece of code shows how to wire up all the pieces:
    /*
     * During application startup
     */
    DocumentBuilderFactory documentBuilderFactory
            = DocumentBuilderFactory.newInstance();
    TransformerFactory transformerFactory
            = TransformerFactory.newInstance();
    Transformer nullTransformer
            = transformerFactory.newTransformer();

    /*
     * Create an empty document to be populated within a DOMResult.
     */
    DocumentBuilder docBuilder
            = documentBuilderFactory.newDocumentBuilder();
    Document doc = docBuilder.newDocument();
    DOMResult domResult = new DOMResult(doc);

    /*
     * Create SAX parser/XMLReader that will parse XML. If factory
     * options are not required then this can be shortened to:
     *      xmlReader = XMLReaderFactory.createXMLReader();
     */
    SAXParserFactory saxParserFactory
            = SAXParserFactory.newInstance();
    // saxParserFactory.setNamespaceAware(true);
    // saxParserFactory.setValidating(true);
    SAXParser saxParser = saxParserFactory.newSAXParser();
    XMLReader xmlReader = saxParser.getXMLReader();

    /*
     * Create our filter to wrap the SAX parser, that captures the
     * locations of elements and annotates their nodes as they are
     * inserted into the DOM.
     */
    LocationAnnotator locationAnnotator
            = new LocationAnnotator(xmlReader, doc);

    /*
     * Create the SAXSource to use the annotator.
     */
    // A system ID should be a URI, so convert the path rather than pass it raw.
    String systemId = new File("example.xml").toURI().toString();
    InputSource inputSource = new InputSource(systemId);
    SAXSource saxSource
            = new SAXSource(locationAnnotator, inputSource);

    /*
     * Finally read the XML into the DOM.
     */
    nullTransformer.transform(saxSource, domResult);

    /*
     * Find one of the element nodes in our DOM and output the location
     * information.
     */
    Node n = doc.getElementsByTagName("title").item(0);
    LocationData locationData = (LocationData)
            n.getUserData(LocationData.LOCATION_DATA_KEY);
    System.out.println(locationData);
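
Only element nodes are annotated, so when the problem is in a text node or an attribute it helps to walk up to the nearest annotated element before building the error message. The following is a sketch of such a lookup, not part of the original code: the class LocationLookup and its methods are made up, and a plain String stands in for the LocationData object so that the block compiles on its own.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper class, not part of the article's code.
final class LocationLookup {

    // Same value as LocationData.LOCATION_DATA_KEY in the article.
    static final String LOCATION_DATA_KEY = "locationDataKey";

    /** Returns the annotation on the node or its nearest annotated ancestor, or null. */
    static Object nearestLocation(Node node) {
        Node n = node;
        while (n != null) {
            Object data = n.getUserData(LOCATION_DATA_KEY);
            if (data != null) {
                return data;
            }
            // Attributes have no parent node; step to the owning element instead.
            n = (n instanceof Attr) ? ((Attr) n).getOwnerElement() : n.getParentNode();
        }
        return null;
    }

    /** Prefixes a problem description with the best location available. */
    static String describe(Node node, String problem) {
        Object location = nearestLocation(node);
        return (location == null ? "unknown location" : location.toString())
                + ": " + problem;
    }

    /** Builds <config><title lang="en">x</title></config> with only <title> annotated. */
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element config = doc.createElement("config");
            Element title = doc.createElement("title");
            title.setAttribute("lang", "en");
            title.appendChild(doc.createTextNode("x"));
            config.appendChild(title);
            doc.appendChild(config);
            // A String stands in for the article's LocationData object.
            title.setUserData(LOCATION_DATA_KEY,
                    "example.xml[line 2:10 to line 2:28]", null);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = sampleDocument();
        Node title = doc.getElementsByTagName("title").item(0);
        System.out.println(describe(title.getFirstChild(), "unexpected value"));
        System.out.println(describe(title.getAttributes().getNamedItem("lang"),
                "unsupported language"));
    }
}
```

In the real application the lookup would return the LocationData instance, and its toString() already produces a suitable prefix.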

Although XML files can include other XML files by enabling XInclude on the SAXParserFactory, this does not currently give correct locations within included files.  See XERCESJ-1247.
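
For reference, this is roughly how XInclude would be switched on, as a sketch using the standard JAXP setters. The class and method names are invented for the example, and whether the flag is honoured depends on the parser implementation in use.

```java
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper class, not part of the article's code.
final class XIncludeFactory {

    /** Creates a factory whose parsers should process xi:include elements. */
    static SAXParserFactory newXIncludeAwareFactory() {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // XInclude elements live in a namespace, so namespace awareness is needed too.
        factory.setNamespaceAware(true);
        factory.setXIncludeAware(true);
        return factory;
    }

    public static void main(String[] args) {
        SAXParserFactory factory = newXIncludeAwareFactory();
        System.out.println("XInclude aware: " + factory.isXIncludeAware());
    }
}
```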

3 comments:

  1. Thanks for this code as it proved to be very useful. Upon some testing I saw that it doesn't attach location data to non-child nodes. I addressed this by replacing lastAddedElement with a stack instead.

    public class LocationAnnotator extends XMLFilterImpl {

        private Locator locator;
        private Stack<Locator> locatorStack = new Stack<Locator>();
        private Stack<Element> elementStack = new Stack<Element>();
        private UserDataHandler dataHandler = new LocationDataHandler();

        LocationAnnotator(XMLReader xmlReader, Document dom) {
            super(xmlReader);

            // Add listener to DOM, so we know which node was added.
            EventListener modListener = new EventListener() {
                @Override
                public void handleEvent(Event e) {
                    EventTarget target = e.getTarget();
                    elementStack.push((Element) target);
                }
            };
            ((EventTarget) dom).addEventListener("DOMNodeInserted",
                    modListener, true);
        }
        ...
    }

    Then in endElement(), also pop off the elementStack to give you the Element which should have the location data attached.

  2. Very elegant workaround.
    Thanx.

    When we call doc.adoptNode(node), the UserDataHandler is called only
    for the node object itself, so the line number information for its
    subtree is lost.  I changed my UserDataHandler to warn the developer
    of that.  Using doc.importNode(node, true) has the desired effect for me.

    userDataHandler = new UserDataHandler() {
        @Override
        public void handle(short operation, String key, Object data,
                Node src, Node dest) {
            switch (operation) {
                case NODE_ADOPTED:
                    src.setUserData(key, data, this);
                    throw new RuntimeException(
                            "When using doc.adoptNode(node), userData will be lost, "
                            + "use node = doc.importNode(node,true) instead");

                case NODE_IMPORTED:
                case NODE_CLONED:
                case NODE_RENAMED:
                    dest.setUserData(key, data, this);
                    break;

                case NODE_DELETED:
                    break;
            }
        }
    };
