January 16, 2008

An Introduction to Streaming Transformations for XML

There is an interesting article by Oliver Becker, Paul Brown and Petr Cimprich about Streaming Transformations for XML.

...XML transformation language that operates on streams of SAX events. STX resembles XSLT 1.0, the tree-driven transformation language for XML, but STX offers unique features and advantages for some applications.

XSLT's popularity has grown over the past three years, both aiding and riding on the adoption of XML. In comparison to API-level programming with the Document Object Model (DOM), XSLT provides a loosely typed, declarative environment tailored for tree-oriented transformation of XML documents, which has achieved wide adoption as a general-purpose XML manipulation tool despite the proscription:

...XSLT is not intended as a completely general-purpose XML transformation language. Rather it is designed primarily for the kinds of transformations that are needed when XSLT is used as part of XSL. — XSLT 1.0
DOM versus SAX
"Which is better, DOM or SAX?" is a common question for newcomers on XML-related discussion lists. (And one could legitimately also ask "DOM or JDOM or DOM4J or XOM?" and "SAX or XNI?")

DOM provides an overall view of an XML document through tree traversal and manipulation. DOM is heavyweight, however, in that it typically imposes a memory footprint of around five times the size of the underlying XML text for simple documents. DOM also imposes a significant time overhead for creating the necessary objects.
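To make the "overall view" concrete, here is a minimal sketch using the JDK's built-in DOM parser (javax.xml.parsers). The catalog/book/title vocabulary and the helper name lastTitle are hypothetical, chosen only for illustration. The whole document is parsed into an in-memory tree first, which is what allows jumping straight to the last title; the sketch does not measure the memory overhead described above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DomExample {
    // Builds the complete tree in memory, then navigates it with random access.
    public static String lastTitle(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // Every <title> is reachable at once because the full tree already exists.
            NodeList titles = doc.getElementsByTagName("title");
            Element last = (Element) titles.item(titles.getLength() - 1);
            return last.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<catalog><book><title>A</title></book>"
                   + "<book><title>B</title></book></catalog>";
        System.out.println(lastTitle(xml)); // prints B
    }
}
```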

SAX provides a sequential view of an XML document through a stream of events. SAX-based programs typically maintain some amount of state information that encapsulates already-received events, but SAX processing requires a negligible amount of memory (typically only the representation of the current event and the buffer for the parser).
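The "state information that encapsulates already-received events" can be sketched with the JDK's SAX parser. In this hypothetical handler (TitleCollector and the title element name are illustrative assumptions), the only state kept is a flag-like buffer that is non-null while the parser is inside a title element, plus the list of results; no tree is ever built.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxExample {
    // Collects the text of every <title> element as the events stream past.
    static class TitleCollector extends DefaultHandler {
        final List<String> titles = new ArrayList<>();
        private StringBuilder current; // non-null only while inside a <title>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (qName.equals("title")) current = new StringBuilder();
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // A parser may split one text node across several calls, so append.
            if (current != null) current.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("title") && current != null) {
                titles.add(current.toString());
                current = null; // forget the event data once it has been used
            }
        }
    }

    public static List<String> titles(String xml) {
        try {
            TitleCollector handler = new TitleCollector();
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.titles;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<catalog><book><title>A</title></book>"
                   + "<book><title>B</title></book></catalog>";
        System.out.println(titles(xml)); // prints [A, B]
    }
}
```

Unlike the DOM version, this handler can only look backwards through whatever it chose to remember; it cannot revisit an earlier element.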

SAX is the event-oriented sibling of the DOM API. (See the sidebar for a short discussion of DOM versus SAX.) STX provides a streaming analog for XSLT by adopting some of the now-familiar concepts from XSLT (e.g., matching based on templates and an XPath 1.0-like expression language) but using SAX as the underlying interface to the XML document. In line with the proscription about XSLT, STX is neither a general-purpose XML transformation language, nor is it an attempt to improve, extend, or replace XSLT.
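The following sketch is not STX and does not use Joost; it only illustrates, with plain JAXP, the kind of event-at-a-time transformation that STX packages behind XSLT-like templates. A hypothetical SAX filter (RenameFilter) acts as a single "template rule" that renames one element and copies every other event through, and the JDK's identity Transformer serializes the filtered event stream. The element names and the rename rule are assumptions for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

public class StreamingRename {
    // One "template rule": rename a matching element, pass all other events unchanged.
    static class RenameFilter extends XMLFilterImpl {
        private final String from;
        private final String to;

        RenameFilter(XMLReader parent, String from, String to) {
            super(parent);
            this.from = from;
            this.to = to;
        }

        private String map(String name) { return name.equals(from) ? to : name; }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            super.startElement(uri, map(localName), map(qName), atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            super.endElement(uri, map(localName), map(qName));
        }
    }

    public static String rename(String xml, String from, String to) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader parser = spf.newSAXParser().getXMLReader();
            // The transformer pulls events through the filter; no tree is built.
            SAXSource source = new SAXSource(new RenameFilter(parser, from, to),
                                             new InputSource(new StringReader(xml)));
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(source, new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rename("<catalog><book><title>A</title></book></catalog>",
                                  "title", "name"));
    }
}
```

Hand-coding each rule as Java like this quickly becomes tedious; a declarative template language over the same event stream is the niche STX aims to fill.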

Like SAX, STX is a completely free, grassroots effort by the XML community, initiated by Petr Cimprich; the specification and a mailing list are hosted on SourceForge. The current version of the STX specification contains a list of other contributors. There are currently two STX processor implementations:

  • Joost, a Java-based processor by Oliver Becker
  • STX::XML, a Perl-based processor by Petr Cimprich.

[Read full article from XML.com]
