Tuesday 22 March 2011

Vendor independent XML processing

A few years ago, when I was "low on work" at a particular customer, I created a little tool set for XML processing. I called it Darwin XML Editor and XML Tester. I now find the latter name not so well chosen, so today I would call it XML Tools.

The purpose of the tool set was to figure out how to process XML in Java. I already had a private XML-based solution that gathered my frequently used browser links in an XML file and converted it, using an attached XSLT stylesheet, to an HTML page. This HTML page presented the links in poplists and used JavaScript to load the selected link.
It would be nice to have a hierarchical editor that showed my XML in an explorer-like view and let me edit particular attributes in a table format.
That editor turned out to be handy for me. But the main gain was a toolkit that allowed me to easily load XML as a text file and parse it, perform XPath queries, traverse nodes, and so on. In several projects I used it to read XML-based config files. In many of those cases I probably could have used Apache Commons.
But it is handy to have your own toolkit, especially when you have repetitive lists of entities with properties.

My project was originally written using the Oracle XML Parser. That was handy, because I use JDeveloper and that comes with the parser. For some time I had been planning to see whether I could make it vendor independent.

The results can be found here. It's open source, whatever that may mean. But it would be nice to leave the credit-references in place or refer to the credits. That would be nice for my ego...

I will provide here some highlights of the changes.
I used the following sites as my references:


Parsing

It starts with parsing. The "Oracle way" I used was:
public void parse() {

        try {
            String text = super.getText();
            if (text != null) {
                resetError();
                DOMParser parser = new DOMParser();
                InputSource inputStream = new InputSource();
                inputStream.setCharacterStream(new StringReader(text));
                parser.parse(inputStream);
                xmlDoc = parser.getDocument();

                xmlRoot = xmlDoc.getDocumentElement();
                nodeSelected = new NodeSelected((Node)xmlRoot);
                parsed = true;
                log("File: " + super.getFilePath() + " is successfully parsed");
            } else {
                log("File: " + super.getFilePath() + " empty or not loaded!");
            }

        } catch (SAXException e) {
            setErrorCode(EC_ERROR);
            setError("Error parsing XML: " + e.toString());
            error(e);
        } catch (IOException e) {
            setErrorCode(EC_ERROR);
            setError("Error reading XML: " + e.toString());
            error(e);
        }
    }

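With the following imports (the Oracle ones, plus the standard classes the method uses):
import java.io.IOException;
import java.io.StringReader;

import oracle.xml.parser.v2.DOMParser;
import oracle.xml.parser.v2.XMLDocument;
import oracle.xml.parser.v2.XMLNode;
import oracle.xml.parser.v2.XSLException;

import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;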

The vendor independent way I used is:
/**
     * Create a new DocumentBuilder.
     * @return
     * @throws ParserConfigurationException
     */
    private DocumentBuilder newDocBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory domFactory = 
            DocumentBuilderFactory.newInstance();
        domFactory.setNamespaceAware(true); // never forget this!
        DocumentBuilder docBuilder;
        docBuilder = domFactory.newDocumentBuilder();
        return docBuilder;
    }

    /**
     * Create an InputSource from the xml-text
     * @return
     */
    public InputSource getInputSource() {
        String text = super.getText();
        InputSource inputStream = null;
        if (text != null) {
            inputStream = new InputSource();
            inputStream.setCharacterStream(new StringReader(text));
        }
        return inputStream;
    }

    /**
     * Parse the XML in the file
     */
    public void parse() {
        try {
            InputSource inputSource = getInputSource();
            if (inputSource != null) {
                resetError();
                DocumentBuilder docBuilder = newDocBuilder();
                Document doc = docBuilder.parse(inputSource);
                setDoc(doc);
                parsed = true;
                log("File: " + super.getFilePath() + " is successfully parsed");
            } else {
                log("File: " + super.getFilePath() + " empty or not loaded!");
            }
        } catch (ParserConfigurationException e) {
            setErrorCode(EC_ERROR);
            setError("Error creating parser: " + e.toString());
            error(e);
        } catch (SAXException e) {
            setErrorCode(EC_ERROR);
            setError("Error parsing XML: " + e.toString());
            error(e);
        } catch (IOException e) {
            setErrorCode(EC_ERROR);
            setError("Error reading XML: " + e.toString());
            error(e);
        }
    }
with the following imports:
import java.io.IOException;
import java.io.StringReader;

import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

Note that this code is part of an XMLFile class that extends a TextFile class, which provides the methods to load the file into a String attribute (text).
Here I extracted the conversion from the text attribute to an InputSource object into a separate method. I also put the creation of the parser (the DocumentBuilder) in a separate method, because I need it as well to create an empty Document.
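
As an illustration of why the DocumentBuilder creation is worth having in one place, here is a minimal, self-contained sketch. The class name DocBuilderSketch and the methods parseText and newDocument are made up for this example and are not part of the XmlFiles project; only the JAXP calls themselves are standard.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, for illustration only.
public class DocBuilderSketch {

    // One place that configures the factory, like newDocBuilder() above.
    static DocumentBuilder newDocBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setNamespaceAware(true);
        return domFactory.newDocumentBuilder();
    }

    // Parse XML that is already held in a String.
    static Document parseText(String text)
            throws ParserConfigurationException, SAXException, IOException {
        return newDocBuilder().parse(new InputSource(new StringReader(text)));
    }

    // Create an empty Document and give it a root element.
    static Document newDocument(String rootName) throws ParserConfigurationException {
        Document doc = newDocBuilder().newDocument();
        Element root = doc.createElement(rootName);
        doc.appendChild(root);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document parsed = parseText("<links><link name=\"blog\"/></links>");
        System.out.println(parsed.getDocumentElement().getNodeName());

        Document empty = newDocument("links");
        System.out.println(empty.getDocumentElement().getNodeName());
    }
}
```

Both paths go through the same factory settings, so a parsed document and a freshly created one behave the same with respect to namespaces.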

XPath Expressions

The "Oracle way" is pretty straightforward:

/**
     * Select Nodes using Xpath
     * @param xpath
     * @return NodeList 
     */
    public NodeList selectNodes(String xpath) throws XSLException {
        XMLDocument xmlDoc = getXmlDoc();
        NodeList nl = xmlDoc.selectNodes(xpath);
        return nl;
    }
To make it namespace aware you need a namespace resolver, which is a simple class based on a HashMap that implements an interface:
package com.darwinit.xmlfiles.xml;
/**
 * Namespace Resolver: Helper class to do namespace aware XPath queries. 
 *
 * @author Martien van den Akker
 * @author Darwin IT Professionals
 *
 * @remark: changed to JSE 1.4 code because of BPEL PM 10.1.2
 */
import java.util.HashMap;

import oracle.xml.parser.v2.NSResolver;

public class XMLNSResolver implements NSResolver{
  private HashMap<String, String> nsMap = new HashMap<String, String>();
    
    public XMLNSResolver() {
    }

   public void addNS(String abbrev, String namespace){
       nsMap.put(abbrev, namespace);
   }
    public String resolveNamespacePrefix(String string) {
        return nsMap.get(string);
    }
}
Then using the namespace resolver the code would be something like:

/**
     * Select nodes using Xpath with Namespace included
     * @param xpath
     * @return Nodelist 
     */
    public NodeList selectNodesNS(String xpath) throws XSLException {
        XMLDocument xmlDoc = getXmlDoc();
        XMLNSResolver nsRes = getNsRes();
        // Use the document fetched above, together with the namespace resolver
        NodeList nl = xmlDoc.selectNodes(xpath, nsRes);
        return nl;
    }

The vendor-independent way is a little more complex, but it gives you some more flexibility.
/**
     * Evaluate xpath expression 
     * 
     * @param xpathExpr the xpath expression
     * @param returnType the return type that is expected.
     * http://www.ibm.com/developerworks/library/x-javaxpathapi.html:
     * XPathConstants.NODESET => node-set maps to an org.w3c.dom.NodeList
     * XPathConstants.BOOLEAN => boolean maps to a java.lang.Boolean
     * XPathConstants.NUMBER => number maps to a java.lang.Double
     * XPathConstants.STRING => string maps to a java.lang.String
     * XPathConstants.NODE
     * 
     * @throws XPathExpressionException
     */
    public   Object evaluate(String xpathExpr, 
                    QName returnType) throws XPathExpressionException {
        XPathFactory factory = XPathFactory.newInstance();
        XPath xpath = factory.newXPath();
        XMLNSResolver nsRes = getNsRes();
        if (nsRes != null) {
            xpath.setNamespaceContext(nsRes);
        }
        XPathExpression expr = xpath.compile(xpathExpr);
        Document doc = getDoc();
        Object resultObj = expr.evaluate(doc, returnType);
        return resultObj;
    }

    /**
     * Evaluate xpath expression to a double (when a number is expected from the xpath expression)
     * 
     * @throws XPathExpressionException
     */
    public Double evaluateDouble(String xpathExpr) throws XPathExpressionException {
        Double result = null;
        Object resultObj = evaluate(xpathExpr, XPathConstants.NUMBER);
        if (resultObj instanceof Double) {
            result = (Double)resultObj;
        }
        return result;
    }

    /**
     * Select Nodes using Xpath
     * 
     * @param xpath
     * @return NodeList
     * @throws XPathExpressionException 
     */
    public NodeList selectNodes(String xpath) throws XPathExpressionException {
        NodeList nl = (NodeList)evaluate(xpath, XPathConstants.NODESET);
        return nl;
    }
The namespace resolver changed slightly: it now implements a different interface, javax.xml.namespace.NamespaceContext. But the idea is the same:
package com.darwinit.xmlfiles.xml;

import java.util.HashMap;

import java.util.Iterator;

import javax.xml.namespace.NamespaceContext;


/**
 * Namespace Resolver: Helper class to do namespace aware XPath queries. 
 *
 * @author Martien van den Akker
 * @author Darwin IT Professionals
 *
 */
public class XMLNSResolver implements NamespaceContext {
    private HashMap<String, String> nsMap = new HashMap<String, String>();
   /**
     * Constructor
     */
    public XMLNSResolver() {
    }

    /**
     * Add a Namespace
     * @param prefix
     * @param namespace
     */
    public void addNS(String prefix, String namespace) {
        nsMap.put(prefix, namespace);
    }

    /**
     * Resolve a namespace from a prefix
     * @param prefix
     * @return
     */
    public String resolveNamespacePrefix(String prefix) {
        return nsMap.get(prefix);
    }

    /**
     * Resolve a namespace from a prefix
     * @param prefix
     * @return
     */
    public String getNamespaceURI(String prefix) {
        return resolveNamespacePrefix(prefix);
    }

    /**
     * Get the prefix that is registered for a NamespaceURI 
     * However, not necessary for xpath processing
     * @param namespaceURI
     * @return
     */
    public String getPrefix(String namespaceURI) {
        throw new UnsupportedOperationException();
    }

    /**
     * Get an iterator with the prefix registered for a NamespaceURI
     * @param namespaceURI
     * @return
     */
    public Iterator getPrefixes(String namespaceURI) {
        return null;
    }
}


The differences are that you first have to 'compile' an XPath expression, and that evaluating the XPath on a Document requires you to pass the expected result data type (see the comments on the evaluate method).
The evaluation then returns an Object that you have to cast to the Java class that corresponds to the XPath result type.
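
To make the compile, evaluate and cast steps concrete, here is a small stand-alone sketch. The class XPathSketch, its nested MapNamespaceContext, the urn:example:links namespace and the sample XML are all invented for this example. Unlike the resolver above, this context returns XMLConstants.NULL_NS_URI for an unknown prefix, which is what the NamespaceContext documentation asks for; take it as one possible variant, not as the code from my project.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, for illustration only.
public class XPathSketch {

    // Minimal prefix-to-URI map, comparable to XMLNSResolver above.
    static class MapNamespaceContext implements NamespaceContext {
        private final Map<String, String> nsMap = new HashMap<String, String>();

        void addNS(String prefix, String uri) {
            nsMap.put(prefix, uri);
        }

        public String getNamespaceURI(String prefix) {
            String uri = nsMap.get(prefix);
            return uri != null ? uri : XMLConstants.NULL_NS_URI;
        }

        // Not needed for XPath evaluation.
        public String getPrefix(String uri) {
            throw new UnsupportedOperationException();
        }

        public Iterator<String> getPrefixes(String uri) {
            return Collections.<String>emptyList().iterator();
        }
    }

    // Made-up sample document in a made-up namespace.
    static final String XML =
        "<l:links xmlns:l=\"urn:example:links\">"
      + "<l:link name=\"blog\" rank=\"2\"/>"
      + "<l:link name=\"docs\" rank=\"5\"/>"
      + "</l:links>";

    static Document parse(String text) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(text)));
    }

    static XPath newXPath() {
        MapNamespaceContext ns = new MapNamespaceContext();
        ns.addNS("l", "urn:example:links");
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(ns);
        return xpath;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        XPath xpath = newXPath();

        // Each return type constant maps to its own Java class.
        NodeList nodes = (NodeList) xpath.compile("/l:links/l:link")
                .evaluate(doc, XPathConstants.NODESET);
        Double total = (Double) xpath.compile("sum(/l:links/l:link/@rank)")
                .evaluate(doc, XPathConstants.NUMBER);
        String first = (String) xpath.compile("/l:links/l:link[1]/@name")
                .evaluate(doc, XPathConstants.STRING);
        Boolean hasDocs = (Boolean) xpath.compile("boolean(/l:links/l:link[@name='docs'])")
                .evaluate(doc, XPathConstants.BOOLEAN);

        System.out.println(nodes.getLength() + " " + total + " " + first + " " + hasDocs);
    }
}
```

The prefix used in the XPath expression only has to match the one registered in the context; it does not have to be the prefix used in the document itself.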

XSLT

The last thing is transforming XML using XSLT. The "Oracle way" I used was:

package com.darwinit.xmlfiles.xml;

import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Hashtable;

import oracle.xml.parser.v2.XMLDocument;
import oracle.xml.parser.v2.XSLException;
import oracle.xml.parser.v2.XSLProcessor;
import oracle.xml.parser.v2.XSLStylesheet;


public class XmlTransformer {
    private void pl(String text) {
        System.out.println(text);
    }

    public XmlTransformer() {
    }

    public String transform(XMLDocument xslDoc, XMLDocument xmlDoc) {     
     return transform (xslDoc, xmlDoc, null);
    }    
    
    /**
     * Transform the xmlDoc using xslDoc into String
     * @param xslDoc
     * @param xmlDoc
     * @param parameters contains parameter that are passed to the xslt
     * @return String output
     */
    public String transform(XMLDocument xslDoc, XMLDocument xmlDoc
                           , Hashtable<String, String> parameters) {
        XSLProcessor xslProcessor = new XSLProcessor();
        XSLStylesheet xslt;
        String result = "";
        try {
            StringWriter sw = new StringWriter();
            PrintWriter pw = new PrintWriter(sw);
            xslt = xslProcessor.newXSLStylesheet(xslDoc);
            
            xslProcessor.setXSLTVersion(XSLProcessor.XSLT20);
            xslProcessor.showWarnings(true);
            xslProcessor.setErrorStream(System.err);
            
            if (parameters != null) {
            
             for (String key : parameters.keySet()) {
              String value = parameters.get(key);
              xslProcessor.setParam("", key, value);
             }
            }                      
            
            xslProcessor.processXSL(xslt, xmlDoc, pw);
            pw.flush();
            pw.close();
            sw.close();
            result = sw.toString();
        } catch (XSLException e) {

            pl(e.toString());
        } catch (IOException e) {
            pl(e.toString());
        }
        return result;
    }   
    /**
     * Clean up text from XML Leftovers
     * @param text
     * @return
     */
    public String cleanText(final String text){
        String result = text;
        result = result.replaceAll("&lt;","<");
        return result;
    }
    /**
     * Transform the xmlDoc using xslDoc into String
     * Cleanup XML code leftovers
     * @param xslDoc
     * @param xmlDoc
     * @return String output
     */
    public String transform2Ascii(XMLDocument xslDoc, XMLDocument xmlDoc) {
      String result = transform(xslDoc, xmlDoc);
      result = cleanText(result);
      return result;
    }

}

And the vendor-independent version of the same class:
package com.darwinit.xmlfiles.xml;

import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Hashtable;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;

import com.darwinit.xmlfiles.log.LocalStaticLogger;

/**
 * XmlTransformer: Class implementing functionality to Transform XML using XSLT.
 * 
 * See also http://www.ling.helsinki.fi/kit/2004k/ctl257/JavaXSLT/Ch05.html
 * 
 * @author Martien van den Akker
 * @author Darwin IT Professionals
 */

public abstract class XmlTransformer {
 public static final String className = "XmlTransformer";
 private static LocalStaticLogger lgr;

 /**
  * Transform the xmlDoc using xslDoc into String
  * 
  * @param xslDoc
  * @param xmlDoc
  * @return
  */
 public static String transform(Document xslDoc, Document xmlDoc) {
  return transform(xslDoc, xmlDoc, null);
 }

 /**
  * Transform the xmlDoc using xslDoc into String, with parameters
  * 
  * @param xslDoc
  * @param xmlDoc
  * @param parameters
  *            contains parameter that are passed to the xslt
  * @return String output
  */
 @SuppressWarnings("static-access")
 public static String transform(Document xslDoc, Document xmlDoc,
   Hashtable<String, String> parameters) {
  final String methodName = "transform";
  lgr.logStart(className, methodName);
  String result = "";
  try {
   DOMSource xsltSource = new DOMSource(xslDoc);
   DOMSource xmlSource = new DOMSource(xmlDoc);
   StringWriter sw = new StringWriter();
   PrintWriter pw = new PrintWriter(sw);
   StreamResult streamResult = new StreamResult(pw);
   // Get the transformer factory
   TransformerFactory transFact = TransformerFactory.newInstance();
   // Get a transformer for this particular stylesheet
   Transformer trans = transFact.newTransformer(xsltSource);
   // Add parameters
   if (parameters != null) {
    for (String key : parameters.keySet()) {
     String value = parameters.get(key);
     trans.setParameter(key, value);
    }
   }
   // Do the transformation
   trans.transform(xmlSource, streamResult);
   pw.flush();
   pw.close();
   sw.close();
   // Get the result string
   result = sw.toString();
  } catch (IOException e) {
   lgr.error(className, methodName, e);
  } catch (TransformerException e) {
   lgr.error(className, methodName, e);
  }
  lgr.logEnd(className, methodName);
  return result;
 }

 /**
  * Clean up text from XML Leftovers
  * 
  * @param text
  * @return
  */
 public static String cleanText(final String text) {
  String result = text;
  result = result.replaceAll("&lt;", "<");
  return result;
 }

 /**
  * Transform the xmlDoc using xslDoc into String Cleanup XML code leftovers
  * 
  * @param xslDoc
  * @param xmlDoc
  * @return String output
  */
 public static String transform2Ascii(Document xslDoc, Document xmlDoc) {
  String result = transform(xslDoc, xmlDoc);
  result = cleanText(result);
  return result;
 }

}
The code is not much different. Most important is that you have to wrap the input, XSLT and result objects in particular interface objects (Source and Result implementations). The examples I found worked with Files as Source and Result objects. But I wanted Document objects for the XSLT and the input, and I wanted the result in a String, to be able to process it in any way I want.
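
A compact way to see the Document-in, String-out wrapping in action is the sketch below. The class TransformSketch, the greeting parameter and the tiny stylesheet are invented for this example. Note that the stylesheet is parsed with a namespace-aware factory; in my understanding that is needed for a DOM-based stylesheet to be recognised as XSLT.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, for illustration only.
public class TransformSketch {

    // Made-up stylesheet: writes the 'greeting' parameter followed by /person/@name.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:param name=\"greeting\"/>"
      + "<xsl:template match=\"/\">"
      + "<xsl:value-of select=\"$greeting\"/>"
      + "<xsl:text> </xsl:text>"
      + "<xsl:value-of select=\"/person/@name\"/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static Document parse(String text) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // the xsl: elements must keep their namespace
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(text)));
    }

    static String transform(Document xslDoc, Document xmlDoc, String greeting) throws Exception {
        // DOMSource wraps both Documents, StreamResult wraps the writer.
        Transformer trans = TransformerFactory.newInstance().newTransformer(new DOMSource(xslDoc));
        trans.setParameter("greeting", greeting);
        StringWriter sw = new StringWriter();
        trans.transform(new DOMSource(xmlDoc), new StreamResult(sw));
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        Document xsl = parse(XSL);
        Document xml = parse("<person name=\"world\"/>");
        System.out.println(transform(xsl, xml, "Hello"));
    }
}
```

A StreamResult accepts a Writer directly, so the StringWriter can be passed as is; the extra PrintWriter in the class above is not strictly required.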

I also added the possibility to pass XSLT parameters. I took that from a project I worked on, and I see that I haven't yet put it in the class in my XmlFiles project.

Conclusion

I hope this helps in understanding how to work with XML parsers. Of course this is all rudimentary (I have several large books on the subject in the cupboard), but the above covers pretty much everything I need. The code in my XmlFiles project is basically built around this code. On top of these methods I have several other helper methods to traverse nodes or to retrieve them in a particular way.

I thought it might be helpful to put the Oracle native methods side by side with the vendor-independent (JAXP) code. In doing so I would not state that one way is better than the other. The vendor-independent code clearly has the advantage that you can simply replace the parser, just by changing the class path. The code will work with the Oracle XML Parser just as well as with the Apache Xerces-J parser.
The reason I did it the Oracle way before is simply that I ran into those examples first when I started this. There are probably still good reasons to use the native APIs.
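
If you want to check which implementation JAXP actually picked up from the class path, a few lines are enough. The class name JaxpProviders is made up for this sketch; the output depends entirely on your JDK and class path, so I give no expected output here.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.xpath.XPathFactory;

// Hypothetical demo class, for illustration only.
public class JaxpProviders {

    // Returns the class names of the factory implementations JAXP resolved.
    static String[] providers() {
        return new String[] {
            DocumentBuilderFactory.newInstance().getClass().getName(),
            XPathFactory.newInstance().getClass().getName(),
            TransformerFactory.newInstance().getClass().getName()
        };
    }

    public static void main(String[] args) {
        for (String name : providers()) {
            System.out.println(name);
        }
    }
}
```

Run it once with only the JDK on the class path and once with another parser added, and compare the names that are printed.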

Oracle also has a PL/SQL wrapper around the XML Parser, so the Oracle XML parsing code has a PL/SQL counterpart. It might be nice to do this the PL/SQL way sometime.
But just as the code above isn't rocket science, it isn't new either. XML parsing in PL/SQL has been possible from Oracle 8i onwards, so for roughly ten years already.
