There are two implementations of XML_PullParser, one for use with files, the other for use with documents passed to the constructor as strings:
resource XML_PullParser(string $filename, [array $tags], [array $child_tags]);
resource XML_PullParser_doc(string $doc, [array $tags], [array $child_tags]);
Both implementations expect two arrays, $tags and $child_tags, which may either be passed into the constructor or predeclared using two utility functions:
array XML_PullParser_declareElements(mixed $tags)
array XML_PullParser_declareChildElements(mixed $tags)
Keeping to the DNS example, we would have something like this:
Listing 5
$tags = array("entry");
$child_tags = array("ipaddress", "server", "domain", "alias");
$parser = new XML_PullParser("DNS.xml", $tags, $child_tags);
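The predeclaring route mentioned above might look like the following sketch. It assumes that the two utility functions are called as plain functions before the parser is constructed, and that each accepts an array for its mixed $tags parameter. Neither detail is confirmed here, so treat this as an illustration rather than definitive usage.

```php
<?php
// Assumes the XML_PullParser library has already been included.

// Assumption: the utility functions accept an array of element names.
XML_PullParser_declareElements(array("entry"));
XML_PullParser_declareChildElements(array("ipaddress", "server", "domain", "alias"));

// With the elements predeclared, only the file name is passed in.
$parser = new XML_PullParser("DNS.xml");
```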
It's important to understand the relationship of the $tags array to the $child_tags array. Both act as selectors, identifying elements of interest. The elements selected by the $tags array are returned by XML_PullParser_getToken, while those selected by the $child_tags array are returned by XML_PullParser_getElement. The tokens returned by these functions are arrays mapped to the structure of each selected element; this includes all children, character data and attributes. The order in which elements are declared in the $tags and $child_tags arrays has no bearing on the order in which the tokens are returned: as explained below, tokens are returned in document order.
Generally speaking, the $tags are the parent elements and the $child_tags
are their children. That's the case in the DNS example, where
entry is the parent to all of the other elements. But this doesn't prevent the
$child_tags from themselves having children.
The most important consideration is that a child element and its parent cannot both be declared in the $tags array. More precisely, this prohibition applies to a parent and any of its descendants, i.e. its children, its children's children, and so on. Such a declaration corrupts the parsing process. The same prohibition does not apply to the $child_tags array, which may declare both parents and children.
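Since such a declaration corrupts the parse, it can help to check a sample document before settling on a $tags array. The sketch below is not part of XML_PullParser: findNestedTagConflicts is a hypothetical helper built on PHP's DOM extension, and the XML fragment is made-up sample data shaped like the DNS example. It lists every pair of declared names where the second occurs as a descendant of the first.

```php
<?php
// Hypothetical helper, not part of XML_PullParser: returns pairs
// array($outer, $inner) where an <$inner> element occurs somewhere
// inside an <$outer> element of the given document.
function findNestedTagConflicts(string $xml, array $tags): array
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    $xpath = new DOMXPath($doc);

    $conflicts = array();
    foreach ($tags as $outer) {
        foreach ($tags as $inner) {
            if ($outer === $inner) {
                continue; // only compare different element names
            }
            // Count <$inner> elements that have an <$outer> ancestor.
            if ($xpath->evaluate("count(//{$outer}//{$inner})") > 0) {
                $conflicts[] = array($outer, $inner);
            }
        }
    }
    return $conflicts;
}

// Made-up sample data in the shape of the DNS example.
$xml = '<dns><entry><ipaddress>192.168.0.1</ipaddress>'
     . '<server>ns1</server></entry></dns>';

// "entry" contains "ipaddress", so this pair is reported.
print_r(findNestedTagConflicts($xml, array("entry", "ipaddress")));

// Siblings do not conflict, so this prints an empty array.
print_r(findNestedTagConflicts($xml, array("ipaddress", "server")));
```

The helper only inspects the one document it is given, so an empty result is a hint for that sample, not a guarantee for every document of the same type.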
When XML_PullParser_getToken is called, the resulting token is stored internally in the array $converted_token (see note 1). It's the value of this array that is returned by XML_PullParser_getToken.
When XML_PullParser_getElement is called, it draws on the same raw data structure out of which the $converted_token is constructed and stores the result internally in the array $current_element. It's the value of this array that is returned by XML_PullParser_getElement.
When XML_PullParser_getElement is called, it's passed a single parameter, the $name of an element. The array it returns consists of all the elements named $name found in $converted_token. The element specified by $name has to have been declared in the $child_tags array (see note 2).
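Put together, the two calls might be used as in the sketch below, which keeps to the DNS example. It assumes that DNS.xml exists and that the library has already been included. The exact shape of the returned arrays is not shown here, so the sketch simply dumps them with print_r.

```php
<?php
// Assumes the XML_PullParser library has already been included.
$tags = array("entry");
$child_tags = array("ipaddress", "server", "domain", "alias");
$parser = new XML_PullParser("DNS.xml", $tags, $child_tags);

// Each call fetches the next <entry> and makes it the current token.
while ($token = $parser->XML_PullParser_getToken()) {
    // "server" is declared in $child_tags, so it can be requested here;
    // the result is drawn from the current token.
    $servers = $parser->XML_PullParser_getElement("server");
    print_r($servers);
}
```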
A call to XML_PullParser_getToken is the prerequisite for all the other tokenizing functions, including XML_PullParser_getElement. The reason for this is simple: the raw data structure from which these other arrays are constructed, either directly or indirectly, does not come into existence until XML_PullParser_getToken is called. This data structure reflects the document order of elements. Therefore, XML_PullParser_getToken returns its tokens in document order, independently of the order in which the elements are declared in the $tags array.
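The effect of document order can be seen without the library at all. The sketch below uses PHP's built-in XMLReader, not XML_PullParser, together with a hypothetical helper, selectInDocumentOrder, to pick out elements by name from a made-up fragment. The names are declared as ipaddress then server, but the results follow the order in which the elements appear in the document.

```php
<?php
// Hypothetical helper using PHP's XMLReader (not XML_PullParser):
// collects "name: text" for every element whose name is in $tags,
// in the order the elements are encountered in the document.
function selectInDocumentOrder(string $xml, array $tags): array
{
    $reader = new XMLReader();
    $reader->XML($xml);

    $found = array();
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT
            && in_array($reader->name, $tags, true)) {
            $found[] = $reader->name . ": " . $reader->readString();
        }
    }
    $reader->close();
    return $found;
}

// Made-up sample data: <server> precedes <ipaddress> in the document.
$xml = '<dns><entry><server>ns1</server>'
     . '<ipaddress>192.168.0.1</ipaddress></entry></dns>';

// Declared order is ipaddress, server; the output order is server, ipaddress.
print_r(selectInDocumentOrder($xml, array("ipaddress", "server")));
```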
A final observation. The $child_tags array, while required, can be empty.
The $tags array is always required.
There are, in fact, many cases where it might make sense not to pass in any child tags. In the case of the DNS structure, if we were interested only in making a list of the IP addresses, we could do the following:
Listing 6
1. $child_tags = array();
2. $tags = array("ipaddress");
3. $parser = new XML_PullParser("DNS.xml",$tags,$child_tags);
4.
5. while($token = $parser->XML_PullParser_getToken())
6. {
7. echo "IP address: " . $parser->XML_PullParser_getText('ipaddress') ."\n";
8. }
Line 1 initializes an empty $child_tags array, while on line 2 ipaddress
is assigned to the $tags array. Both are passed into the constructor (line 3).
This same strategy would hold if we had a database of movies and just wanted a list of titles, etc.
| Notes |
|---|
| 1. Throughout the documentation, the term "current token" is used. This refers to the $converted_token, as described above. |
| 2. In many functions that need a tokenized array to search for element data, the array parameter is optional; if it's absent, the function first looks in the $current_element and, if nothing is found there, uses the $converted_token. The structure of both these arrays is illustrated in Appendix 1. |