public class XmlParser
extends Object
implements ContentHandler
A helper class for parsing XML into a tree of Node instances, offering a simple way of processing XML. This parser does not preserve the XML InfoSet; if you need that, try W3C DOM, dom4j, JDOM, XOM, etc. It ignores comments and processing instructions and converts each element in the XML into a Node with attributes and child Nodes and Strings. This simple model is sufficient for most simple use cases of processing XML. Parsing is eager: each parse operation consumes the SAX event stream and builds a complete Node tree before returning.
Example usage:
import groovy.xml.XmlParser
def xml = '<root><one a1="uno!"/><two>Some text!</two></root>'
def rootNode = new XmlParser().parseText(xml)
assert rootNode.name() == 'root'
assert rootNode.one[0].@a1 == 'uno!'
assert rootNode.two.text() == 'Some text!'
rootNode.children().each { assert it.name() in ['one','two'] }
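The tree model described above (comments and processing instructions dropped, text kept as String children next to child Nodes) can be seen in a minimal sketch. The sample document and variable names are illustrative only.

```groovy
import groovy.xml.XmlParser

// Illustrative document with a comment, a processing instruction, and mixed content.
def xml = '<doc><!-- a comment --><?target data?><p>Hello <b>big</b> world</p></doc>'
def doc = new XmlParser().parseText(xml)

// The comment and the processing instruction leave no trace in the tree.
assert doc.children().size() == 1

// Mixed content becomes an ordered list of Strings and Nodes.
def kids = doc.p[0].children()
assert kids.size() == 3
assert kids[0] instanceof String
assert kids[1] instanceof Node
assert kids[1].name() == 'b'
assert kids[2] instanceof String
```

Code that needs the comment or the processing instruction itself would have to use one of the InfoSet-preserving libraries named earlier.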
| Constructor and description |
|---|
| XmlParser() Creates a non-validating and namespace-aware XmlParser which does not allow DOCTYPE declarations in documents. |
| XmlParser(boolean validating, boolean namespaceAware) Creates an XmlParser which does not allow DOCTYPE declarations in documents. |
| XmlParser(boolean validating, boolean namespaceAware, boolean allowDocTypeDeclaration) Creates an XmlParser. |
| XmlParser(XMLReader reader) Creates a parser backed by the supplied SAX reader. |
| XmlParser(SAXParser parser) Creates a parser backed by the supplied SAX parser. |
| Type Params | Return Type | Name and description |
|---|---|---|
| | protected void | addTextToNode() Transfers buffered character data into the current node when an element boundary is reached. |
| | public void | characters(char[] buffer, int start, int length) Buffers character data until the enclosing element boundary is reached. |
| | protected Node | createNode(Node parent, Object name, Map attributes) Creates a new node with the given parent, name, and attributes. |
| | public void | endDocument() Completes the current parse and clears the internal element stack. |
| | public void | endElement(String namespaceURI, String localName, String qName) Flushes buffered text and pops the current element when its end tag is seen. |
| | public void | endPrefixMapping(String prefix) Receives namespace prefix scope end notifications. |
| | public DTDHandler | getDTDHandler() Returns the SAX DTD handler configured on the underlying reader. |
| | public Locator | getDocumentLocator() Returns the document locator last provided by SAX. |
| | protected Object | getElementName(String namespaceURI, String localName, String qName) Return a name given the namespaceURI, localName and qName. |
| | public EntityResolver | getEntityResolver() Returns the SAX entity resolver configured on the underlying reader. |
| | public ErrorHandler | getErrorHandler() Returns the SAX error handler configured on the underlying reader. |
| | public boolean | getFeature(String uri) Looks up a SAX feature on the underlying reader. |
| | public Object | getProperty(String uri) Looks up a SAX property on the underlying reader. |
| | protected XMLReader | getXMLReader() Returns the configured XML reader after registering this parser as its content handler. |
| | public void | ignorableWhitespace(char[] buffer, int start, int len) Receives ignorable whitespace and optionally preserves it as text content. |
| | public boolean | isAllowDocTypeDeclaration() Determine if DOCTYPE declarations are allowed. |
| | public boolean | isKeepIgnorableWhitespace() Returns the current keep ignorable whitespace setting. |
| | public boolean | isNamespaceAware() Determine if namespace handling is enabled. |
| | public boolean | isTrimWhitespace() Returns the current trim whitespace setting. |
| | public boolean | isValidating() Determine if the parser validates documents. |
| | public Node | parse(File file) Parses the content of the given file as XML turning it into a tree of Nodes. |
| | public Node | parse(Path path) Parses the content of the file at the given path as XML turning it into a tree of Nodes. |
| | public Node | parse(InputSource input) Parse the content of the specified input source into a tree of Nodes. |
| | public Node | parse(InputStream input) Parse the content of the specified input stream into a tree of Nodes. |
| | public Node | parse(Reader in) Parse the content of the specified reader into a tree of Nodes. |
| | public Node | parse(String uri) Parse the content of the specified URI into a tree of Nodes. |
| <T> | public T | parseAs(Class<T> type, Reader reader) Parse XML from a reader into a typed object. |
| <T> | public T | parseAs(Class<T> type, InputStream stream) Parse XML from an input stream into a typed object. |
| <T> | public T | parseAs(Class<T> type, File file) Parse XML from a file into a typed object. |
| <T> | public T | parseAs(Class<T> type, Path path) Parse XML from a path into a typed object. |
| | public Node | parseText(String text) A helper method to parse the given text as XML. |
| <T> | public T | parseTextAs(Class<T> type, String text) Parse the content of the specified XML text into a typed object. |
| | public void | processingInstruction(String target, String data) Receives processing instruction callbacks. |
| | public void | setAllowDocTypeDeclaration(boolean allowDocTypeDeclaration) Enable and/or disable DOCTYPE declaration support. |
| | public void | setDTDHandler(DTDHandler dtdHandler) Sets the SAX DTD handler on the underlying reader. |
| | public void | setDocumentLocator(Locator locator) Stores the locator supplied by SAX for later diagnostics or subclass use. |
| | public void | setEntityResolver(EntityResolver entityResolver) Sets the SAX entity resolver on the underlying reader. |
| | public void | setErrorHandler(ErrorHandler errorHandler) Sets the SAX error handler on the underlying reader. |
| | public void | setFeature(String uri, boolean value) Enables or disables a SAX feature on the underlying reader. |
| | public void | setKeepIgnorableWhitespace(boolean keepIgnorableWhitespace) Sets the keep ignorable whitespace setting value. |
| | public void | setNamespaceAware(boolean namespaceAware) Enable and/or disable namespace handling. |
| | public void | setProperty(String uri, Object value) Sets a SAX property on the underlying reader. |
| | public void | setTrimWhitespace(boolean trimWhitespace) Sets the trim whitespace setting value. |
| | public void | setValidating(boolean validating) Enable and/or disable validation. |
| | public void | skippedEntity(String name) Receives skipped entity notifications. |
| | public void | startDocument() Resets the current root node before SAX events for a new document begin. |
| | public void | startElement(String namespaceURI, String localName, String qName, Attributes list) Creates a new Node for the current element and pushes it onto the parse stack. |
| | public void | startPrefixMapping(String prefix, String namespaceURI) Receives namespace prefix mapping notifications. |
Creates a non-validating and namespace-aware XmlParser which does not allow DOCTYPE declarations in documents.
Parser options can be configured via setters, or equivalently via Groovy named parameters (which invoke those setters), before the first parse call:
// Using Groovy named parameters:
def parser = new XmlParser(namespaceAware: false, trimWhitespace: true)
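For comparison, here is a minimal sketch of the same configuration written with explicit setters. The sample XML and variable names are illustrative. The final assertion relies on trimWhitespace stripping leading and trailing whitespace from text content.

```groovy
import groovy.xml.XmlParser

// Setter-based equivalent of: new XmlParser(namespaceAware: false, trimWhitespace: true)
def configured = new XmlParser()
configured.namespaceAware = false   // calls setNamespaceAware(false)
configured.trimWhitespace = true    // calls setTrimWhitespace(true)

// The matching getters report the configured values.
assert !configured.namespaceAware
assert configured.trimWhitespace

// With trimming enabled, surrounding whitespace is removed from text content.
def greeting = configured.parseText('<greeting>  hello  </greeting>')
assert greeting.text() == 'hello'
```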
Creates an XmlParser which does not allow DOCTYPE declarations in documents.
validating - true if the parser should validate documents as they are parsed; false otherwise.
namespaceAware - true if the parser should provide support for XML namespaces; false otherwise.
Creates an XmlParser with explicit control over validation, namespace awareness, and DOCTYPE declaration support.
validating - true if the parser should validate documents as they are parsed; false otherwise.
namespaceAware - true if the parser should provide support for XML namespaces; false otherwise.
allowDocTypeDeclaration - true if the parser should provide support for DOCTYPE declarations; false otherwise.
Creates a parser backed by the supplied SAX reader.
reader - the XML reader whose features, properties, and handlers will be used
Transfers buffered character data into the current node when an element boundary is reached. Subclasses may override to customize text normalization or whitespace preservation during parsing.
Buffers character data until the enclosing element boundary is reached.
buffer - the character buffer supplied by SAX
start - the start offset in the buffer
length - the number of characters to read
Creates a new node with the given parent, name, and attributes. The default implementation returns an instance of groovy.util.Node.
parent - the parent node, or null if the node being created is the root node
name - an Object representing the name of the node (typically an instance of QName)
attributes - a Map of attribute names to attribute values
Completes the current parse and clears the internal element stack.
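Since createNode is protected and called for every element, a subclass can use it as a hook for node creation. The sketch below uses a hypothetical subclass, CountingXmlParser, that only counts the nodes it creates and otherwise delegates to the default implementation.

```groovy
import groovy.xml.XmlParser

// Hypothetical subclass (not part of Groovy): counts every node the parser creates.
class CountingXmlParser extends XmlParser {
    int created = 0

    @Override
    protected Node createNode(Node parent, Object name, Map attributes) {
        created++
        // Delegate to the default implementation, which builds a groovy.util.Node.
        return super.createNode(parent, name, attributes)
    }
}

def counting = new CountingXmlParser()
def tree = counting.parseText('<a><b/><c x="1"/></a>')

// One node each for <a>, <b>, and <c>.
assert counting.created == 3
assert tree.c[0].@x == '1'
```

The same hook could return a Node subclass instead, provided it still attaches itself to the given parent.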
Flushes buffered text and pops the current element when its end tag is seen.
namespaceURI - the namespace URI, or an empty string if namespaces are unavailable
localName - the local element name
qName - the qualified element name as reported by SAX
Receives namespace prefix scope end notifications. The default implementation performs no action.
prefix - the prefix leaving scope
Returns the SAX DTD handler configured on the underlying reader.
Returns null if none has been set.
Returns the document locator last provided by SAX.
Returns null if parsing has not started.
Return a name given the namespaceURI, localName and qName.
namespaceURI - the namespace URI
localName - the local name
qName - the qualified name
Returns the SAX entity resolver configured on the underlying reader.
Returns null if none has been set.
Returns the SAX error handler configured on the underlying reader.
Returns null if none has been set.
Looks up a SAX feature on the underlying reader.
uri - the fully qualified SAX feature URI
Returns true if the feature is enabled.
Looks up a SAX property on the underlying reader.
uri - the fully qualified SAX property URI
Returns the configured XML reader after registering this parser as its content handler. Subclasses may override to customize reader preparation before parsing begins.
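As a rough illustration of getFeature, the sketch below queries the standard SAX2 namespaces feature on two parsers. The feature URI comes from the SAX specification, not from XmlParser. The expected values assume the underlying reader reflects the namespaceAware flag passed to the constructor, as the JDK's bundled parser does.

```groovy
import groovy.xml.XmlParser

// Standard SAX2 feature URI (defined by SAX itself).
def namespacesFeature = 'http://xml.org/sax/features/namespaces'

def awareParser = new XmlParser()               // namespace-aware by default
def unawareParser = new XmlParser(false, false) // validating = false, namespaceAware = false

// Assumption: the underlying reader mirrors the constructor's namespaceAware flag.
assert awareParser.getFeature(namespacesFeature)
assert !unawareParser.getFeature(namespacesFeature)
```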
Receives ignorable whitespace and optionally preserves it as text content.
buffer - the character buffer supplied by SAX
start - the start offset in the buffer
len - the number of characters to read
Determine if DOCTYPE declarations are allowed.
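The effect of the DOCTYPE setting can be sketched with the two constructor forms. The sample document is illustrative, and the example assumes the underlying SAX implementation enforces the restriction (the JDK's bundled parser does) by failing the parse with a SAXException.

```groovy
import groovy.xml.XmlParser
import org.xml.sax.SAXException

// Illustrative document carrying an internal DOCTYPE declaration.
def withDoctype = '<!DOCTYPE root [<!ELEMENT root (#PCDATA)>]><root>ok</root>'

// Default parser: DOCTYPE declarations are not allowed, so parsing is expected to fail.
def rejected = false
try {
    new XmlParser().parseText(withDoctype)
} catch (SAXException ignored) {
    rejected = true
}
assert rejected

// Third constructor argument allowDocTypeDeclaration = true lets the same document through.
def lenient = new XmlParser(false, true, true)
def accepted = lenient.parseText(withDoctype)
assert accepted.text() == 'ok'
```

Leaving DOCTYPE support off is the safer choice for untrusted input, which is presumably why it is the default.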
Returns the current keep ignorable whitespace setting.
Determine if namespace handling is enabled.
Returns the current trim whitespace setting.
Determine if the parser validates documents.
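How the namespace setting changes element names can be seen in a short sketch. The sample document and the urn:example namespace are made up for illustration. The second half assumes a non-namespace-aware SAX reader reports only the qualified name, as the JDK's bundled parser does.

```groovy
import groovy.xml.XmlParser

def namespaced = '<p:root xmlns:p="urn:example"><p:item/></p:root>'

// Namespace-aware (the default): the node name is a QName carrying URI and local part.
def qualifiedName = new XmlParser().parseText(namespaced).name()
assert qualifiedName.namespaceURI == 'urn:example'
assert qualifiedName.localPart == 'root'

// Not namespace-aware: the node name is the raw qualified name as a String.
def plainName = new XmlParser(false, false).parseText(namespaced).name()
assert plainName instanceof String
assert plainName == 'p:root'
```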
Parses the content of the given file as XML turning it into a tree of Nodes.
file - the File containing the XML to be parsed
Parses the content of the file at the given path as XML turning it into a tree of Nodes.
path - the path of the File containing the XML to be parsed
Parse the content of the specified input source into a tree of Nodes.
input - the InputSource for the XML to parse
Parse the content of the specified input stream into a tree of Nodes.
Note that using this method will not provide the parser with any URI for which to find DTDs etc.
input - an InputStream containing the XML to be parsed
Parse the content of the specified reader into a tree of Nodes.
Note that using this method will not provide the parser with any URI for which to find DTDs etc.
in - a Reader to read the XML to be parsed
Parse the content of the specified URI into a tree of Nodes.
uri - a String containing a URI pointing to the XML to be parsed
Parse XML from a reader into a typed object. Requires jackson-databind on the classpath for type conversion.
type - the target type
reader - the reader of XML
T - the target type
Parse XML from an input stream into a typed object. Requires jackson-databind on the classpath for type conversion.
type - the target type
stream - the input stream of XML
T - the target type
Parse XML from a file into a typed object. Requires jackson-databind on the classpath for type conversion.
type - the target type
file - the XML file
T - the target type
Parse XML from a path into a typed object. Requires jackson-databind on the classpath for type conversion.
type - the target type
path - the path to the XML file
T - the target type
A helper method to parse the given text as XML.
text - the XML text to parse
Parse the content of the specified XML text into a typed object. Requires jackson-databind on the classpath for type conversion. Supports @JsonProperty and @JsonFormat annotations.
type - the target type
text - the XML text to parse
T - the target type
Receives processing instruction callbacks. The default implementation ignores processing instructions.
target - the processing instruction target
data - the processing instruction data
Enable and/or disable DOCTYPE declaration support. Must be set before the first parse call.
allowDocTypeDeclaration - the new desired value
Sets the SAX DTD handler on the underlying reader.
dtdHandler - the DTD handler to receive notation and unparsed entity callbacks
Stores the locator supplied by SAX for later diagnostics or subclass use.
locator - the document locator for the current parse
Sets the SAX entity resolver on the underlying reader.
entityResolver - the resolver to use for external entities
Sets the SAX error handler on the underlying reader.
errorHandler - the handler to receive parser warnings and errors
Enables or disables a SAX feature on the underlying reader.
uri - the fully qualified SAX feature URI
value - the value to apply
Sets the keep ignorable whitespace setting value.
keepIgnorableWhitespace - the desired new value
Enable and/or disable namespace handling. Must be set before the first parse call.
namespaceAware - the new desired value
Sets a SAX property on the underlying reader.
uri - the fully qualified SAX property URI
value - the value to apply
Sets the trim whitespace setting value.
trimWhitespace - the desired setting value
Enable and/or disable validation. Must be set before the first parse call.
validating - the new desired value
Receives skipped entity notifications. The default implementation performs no action.
name - the skipped entity name
Resets the current root node before SAX events for a new document begin.
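Because the root node is reset at the start of each document, one parser instance can be used for several documents in sequence. A minimal sketch with illustrative documents:

```groovy
import groovy.xml.XmlParser

def shared = new XmlParser()

// Each parse starts from a fresh root, so results do not leak between documents.
def first = shared.parseText('<first><x/></first>')
def second = shared.parseText('<second><y/><z/></second>')

assert first.name() == 'first'
assert first.children().size() == 1
assert second.name() == 'second'
assert second.children().size() == 2
```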
Creates a new Node for the current element and pushes it onto the parse stack.
namespaceURI - the namespace URI, or an empty string if namespaces are unavailable
localName - the local element name
qName - the qualified element name as reported by SAX
list - the element attributes
Receives namespace prefix mapping notifications. The default implementation does not retain separate prefix state.
prefix - the declared prefix
namespaceURI - the namespace URI bound to the prefix
Copyright © 2003-2026 The Apache Software Foundation. All rights reserved.