This is a class modeled on the PullParser module found in the Perl HTML::Parser
distribution. It moves the API from an event-based model to a token-based model. Instead of processing data
as it is passed from the parser to callbacks, a script using XML_PullParser requests "tokens"
from various "tokenizing" functions, most notably XML_PullParser_getToken and
XML_PullParser_getElement.
Tokens are arrays representing XML structures, which become
available in the order in which they appear in the document being parsed.
In addition to the tokenizers, a rich set of accessors is provided to extract data from
the elements and attributes bundled in the tokens.
There are also techniques and class methods for selecting elements and attributes, and for testing
their position and relevance. Finally, there are
package-level functions to set the contexts that affect the operations of the module.
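The pull model described above can be sketched outside of PHP. The fragment below is only an illustration, assuming Python's standard-library `xml.etree.ElementTree.XMLPullParser` as a stand-in; it does not use or reproduce the XML_PullParser API, and the `pull_tokens` helper and the sample document are hypothetical. It shows the caller asking for the next token when it is ready for one, rather than being called back by the parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this sketch.
DOC = "<ENTRY><domain>example.com</domain><alias>www.example.com</alias></ENTRY>"


def pull_tokens(xml_text):
    """Yield (event, tag, text) tuples in document order, one per request."""
    parser = ET.XMLPullParser(events=("start", "end"))
    parser.feed(xml_text)
    parser.close()  # queued events remain readable after close()
    for event, elem in parser.read_events():
        # Element text is only reliable once the closing tag has been seen.
        text = (elem.text or "").strip() if event == "end" else ""
        yield (event, elem.tag, text)


stream = pull_tokens(DOC)
first = next(stream)          # the script decides when to take a token
remaining = list(stream)      # ...and can drain the rest whenever it likes
tokens = [first] + remaining

for token in tokens:
    print(token)
```

Each tuple here plays the role of a "token": a small array-like value describing one XML structure, delivered in the order in which it appears in the document.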
XML_PullParser is not as purely a "token" parser as
HTML::PullParser. The Perl module focuses on the
individual tag as it comes on stream, which makes it suited to large blocks
of text with a great many embedded tags, whereas XML_PullParser
is oriented towards nested structures, which makes it suited to the kinds of database
structures that much XML is used for. The current DocBook paragraph is a good example
of where the Perl module has the advantage:
Example 1
<classname>XML_PullParser </classname> is not as purely a "token" parser as
<classname>HTML::PullParser </classname>. The Perl module focuses on the
individual tag as it comes on stream, which makes it suited to large blocks
of text with a great many embedded tags, whereas <classname>XML_PullParser </classname>
is oriented towards nested structures. . .
If Perl's HTML::PullParser were to format Example 1,
the <classname> tags would be announced at the points at which they occur in the stream,
and so re-casting <classname> to bold italics, as here, would be a simple matter of exchanging <b> <i>
for <classname> whenever the <classname> tag came on stream. XML_PullParser,
on the other hand, would output an entire structure enclosed by either <blockquote> or <programlisting>.
To convert the classnames to bold, it is then necessary to review this structure and apply a replacement
function like preg_replace to each element that calls for re-casting.
XML_PullParser has a function which does just this:
XML_PullParser_getTextMarkedUp.
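The replacement step can be sketched as follows. This is a minimal illustration, assuming Python's `re.sub` in place of PHP's preg_replace; the `recast_classnames` helper is hypothetical and is not a description of how XML_PullParser_getTextMarkedUp works internally.

```python
import re

# Matches one <classname>...</classname> pair, ignoring padding around the name.
CLASSNAME = re.compile(r"<classname>\s*(.*?)\s*</classname>", re.DOTALL)


def recast_classnames(fragment):
    """Replace each <classname> pair in an already-extracted structure
    with <b><i>...</i></b>."""
    return CLASSNAME.sub(r"<b><i>\1</i></b>", fragment)


paragraph = (
    "<classname>XML_PullParser </classname> is not as purely a token parser as "
    "<classname>HTML::PullParser </classname>."
)
recast = recast_classnames(paragraph)
print(recast)
```

The point of the sketch is the order of operations: the whole structure is retrieved first, and the markup inside it is rewritten afterwards in a single pass.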
This page is very likely being generated on the fly from the original XML, using
XML_PullParser_getTextMarkedUp, and certainly over the web there's no
noticeable performance deficit. Nevertheless, the strength of XML_PullParser
is with structures like Example 2.
Example 2
<ENTRY>
<ipaddress> 172.20.19.6 </ipaddress>
<domain> example.com </domain>
<server ip="192.168.10.1"> example_1.com </server>
<server ip="192.168.10.2"> example_2.com </server>
<server ip="192.168.10.3"> example_3.com </server>
<alias> www.example.com </alias>
</ENTRY>
<ENTRY>
<ipaddress> 172.20.19.7 </ipaddress>
•
•
<alias> www.example.org </alias>
</ENTRY>
In a database-like file with a set of entries like this, XML_PullParser would
loop through the file grabbing up an entire <ENTRY> structure with each iteration
and provide immediate, direct access to each of its elements. The next two sections
introduce the coding for such tasks and try to give a taste of how XML_PullParser works.
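As a rough picture of that loop, here is a sketch assuming Python's standard-library `xml.etree.ElementTree.iterparse` rather than XML_PullParser itself. The `<HOSTS>` root element, the `entries` helper, and the dictionary layout are assumptions made for the illustration; the lines elided in Example 2 are simply left out of the sample data.

```python
import io
import xml.etree.ElementTree as ET

# Assumption: the entries sit inside a single root element, here called
# <HOSTS>. Example 2 does not show the root, so the name is hypothetical.
DATA = """<HOSTS>
<ENTRY>
  <ipaddress> 172.20.19.6 </ipaddress>
  <domain> example.com </domain>
  <server ip="192.168.10.1"> example_1.com </server>
  <server ip="192.168.10.2"> example_2.com </server>
  <server ip="192.168.10.3"> example_3.com </server>
  <alias> www.example.com </alias>
</ENTRY>
<ENTRY>
  <ipaddress> 172.20.19.7 </ipaddress>
  <alias> www.example.org </alias>
</ENTRY>
</HOSTS>"""


def entries(source):
    """Yield one dict per complete <ENTRY> structure, in document order."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "ENTRY":
            continue
        yield {
            "ipaddress": elem.findtext("ipaddress", "").strip(),
            "domain": elem.findtext("domain", "").strip(),
            "servers": [(s.get("ip"), (s.text or "").strip())
                        for s in elem.findall("server")],
            "alias": elem.findtext("alias", "").strip(),
        }
        elem.clear()  # release the entry once the caller has moved on


records = list(entries(io.BytesIO(DATA.encode("utf-8"))))
for record in records:
    print(record["ipaddress"], record["alias"], len(record["servers"]))
```

Each iteration hands over one whole entry, so the fields and the `ip` attributes of the servers can be read directly without tracking individual tags as they stream past.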