XML_PullParser
A token-based interface to the PHP expat XML library
version 1.3.2
Myron Turner
Instantiating the XML_PullParser Object


There are two implementations of XML_PullParser, one for use with files and the other for documents passed to the constructor as strings:

resource XML_PullParser(string $filename, [array $tags], [array $child_tags]);

resource XML_PullParser_doc(string $doc, [array $tags], [array $child_tags]);

Both implementations expect two arrays, $tags and $child_tags, which may either be passed into the constructor or predeclared using two utility functions:

array XML_PullParser_declareElements(mixed $tags)

array XML_PullParser_declareChildElements(mixed $tags)

Keeping to the DNS example, we would have something like this:

Listing 5
        $tags = array("entry");
        $child_tags = array("ipaddress", "server", "domain", "alias");
        $parser = new XML_PullParser("DNS.xml", $tags, $child_tags);

It's important to understand the relationship of the $tags array to the $child_tags array. Both act as selectors, identifying elements of interest. The elements selected by the $tags array are returned by XML_PullParser_getToken, while those selected by the $child_tags array are returned by XML_PullParser_getElement. The tokens returned by these functions are arrays that mirror the structure of each selected element, including all of its children, character data, and attributes. The order in which elements are declared in the $tags and $child_tags arrays has no bearing on the order in which the tokens are returned: as explained below, tokens are returned in document order.

Generally speaking, the $tags are the parent elements and the $child_tags are their children. That's the case in the DNS example, where entry is the parent to all of the other elements. But this doesn't prevent the $child_tags from themselves having children.

The most important consideration is that a child element and its parent cannot both be declared in the $tags array. More precisely, this prohibition applies to a parent and any of its descendants: its children, their children, and so on. Such a declaration corrupts the parsing process. The same prohibition does not apply to the $child_tags array, in which both parents and their children may be declared.
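One way to check a $tags array against this rule before parsing is to scan a sample document with PHP's built-in expat functions. The sketch below is only an illustration: find_nested_tags is a hypothetical helper, not part of XML_PullParser, and the sample document is invented.

```php
<?php
/**
 * Hypothetical helper (not part of XML_PullParser): returns a list of
 * "ancestor > descendant" strings for every pair of declared tags that
 * occur nested inside one another in the given document. A tag nested
 * inside itself is reported as well.
 */
function find_nested_tags(string $xml, array $tags): array
{
    $open = array();      // declared tags that are currently open
    $conflicts = array(); // keys are "ancestor > descendant"

    $parser = xml_parser_create();
    // Keep element names as written; XML names are case-sensitive.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$open, &$conflicts, $tags) {
            if (in_array($name, $tags, true)) {
                // Every declared tag still open is an ancestor of $name.
                foreach ($open as $ancestor) {
                    $conflicts["$ancestor > $name"] = true;
                }
                $open[] = $name;
            }
        },
        function ($p, $name) use (&$open, $tags) {
            if (in_array($name, $tags, true)) {
                array_pop($open);
            }
        }
    );

    if (xml_parse($parser, $xml, true) !== 1) {
        throw new InvalidArgumentException(
            xml_error_string(xml_get_error_code($parser))
        );
    }
    return array_keys($conflicts);
}

// Invented sample document following the DNS example.
$doc = '<dns><entry><ipaddress>192.0.2.10</ipaddress><alias>www</alias></entry></dns>';

$bad = find_nested_tags($doc, array("entry", "ipaddress")); // parent and child together
$ok  = find_nested_tags($doc, array("entry"));              // parent only
```

An empty result means that none of the declared tags are nested in that particular document. It says nothing about other documents that use the same vocabulary.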

When XML_PullParser_getToken is called, the resulting token is stored internally in the array $converted_token (see Note 1). It's the value of this array that is returned by XML_PullParser_getToken. When XML_PullParser_getElement is called, it draws on the same raw data structure from which $converted_token is constructed and stores the result internally in the array $current_element. It's the value of this array that is returned by XML_PullParser_getElement. XML_PullParser_getElement takes a single parameter, the $name of an element, and the array it returns consists of all the elements of $name found in $converted_token. The element specified by $name has to have been declared in the $child_tags array (see Note 2).
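The calling sequence just described might look like the following sketch. It assumes that XML_PullParser_doc is instantiated with new in the same way as XML_PullParser, and that the library has already been loaded. The sample document and the class_exists guard are illustrative additions, not taken from the library.

```php
<?php
// Invented sample data: an <entry> parent with the child elements
// named in the DNS example.
$doc = <<<XML
<dns>
  <entry>
    <ipaddress>192.0.2.10</ipaddress>
    <server>ns1</server>
    <domain>example.org</domain>
    <alias>www</alias>
    <alias>mail</alias>
  </entry>
</dns>
XML;

$tags = array("entry");
$child_tags = array("ipaddress", "server", "domain", "alias");

// Guard (an addition for this sketch) so the code does nothing
// when the library has not been loaded.
if (class_exists('XML_PullParser_doc')) {
    $parser = new XML_PullParser_doc($doc, $tags, $child_tags);

    // getToken comes first: it fetches the next <entry> token.
    while ($token = $parser->XML_PullParser_getToken()) {
        // getElement then collects every <alias> found in that token.
        $aliases = $parser->XML_PullParser_getElement('alias');
        print_r($aliases);
    }
}
```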

A call to XML_PullParser_getToken is the prerequisite for all the other tokenizing functions, including XML_PullParser_getElement. The reason for this is simple: the raw data structure from which these other arrays are constructed, either directly or indirectly, does not come into existence until XML_PullParser_getToken is called. This data structure reflects the document order of elements. XML_PullParser_getToken therefore returns its tokens in document order, regardless of the order in which the elements are declared in the $tags array.
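The point about document order can be illustrated with PHP's built-in expat functions alone. The sketch below does not use XML_PullParser or reproduce its internals. select_in_document_order is a hypothetical helper, and it assumes that the selected elements are never nested inside one another.

```php
<?php
/**
 * Hypothetical helper (not part of XML_PullParser): collects the
 * selected elements as array(name, text) pairs in the order in which
 * expat reports them, which is document order.
 */
function select_in_document_order(string $xml, array $tags): array
{
    $found = array();
    $current = null; // index in $found of the selected element being read

    $parser = xml_parser_create();
    // Keep element names as written; XML names are case-sensitive.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$found, &$current, $tags) {
            if (in_array($name, $tags, true)) {
                $found[] = array($name, '');
                $current = count($found) - 1;
            }
        },
        function ($p, $name) use (&$current, $tags) {
            if (in_array($name, $tags, true)) {
                $current = null;
            }
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$found, &$current) {
            // Character data may arrive in several pieces, so append.
            if ($current !== null) {
                $found[$current][1] .= $data;
            }
        }
    );

    if (xml_parse($parser, $xml, true) !== 1) {
        throw new InvalidArgumentException(
            xml_error_string(xml_get_error_code($parser))
        );
    }
    return $found;
}

// Invented sample document following the DNS example.
$doc = '<dns>'
     . '<entry><ipaddress>192.0.2.10</ipaddress><alias>www</alias></entry>'
     . '<entry><ipaddress>192.0.2.20</ipaddress><alias>mail</alias></entry>'
     . '</dns>';

// "alias" is declared before "ipaddress", the reverse of document order.
$found = select_in_document_order($doc, array("alias", "ipaddress"));
$names = array_column($found, 0);
// $names is array("ipaddress", "alias", "ipaddress", "alias")
```

The selector array only decides which elements are kept. The sequence in which they come back is fixed by the parser's pass through the document.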

A final observation. The $child_tags array, while required, can be empty. The $tags array is always required. There are, in fact, many cases where it might make sense not to pass in any child tags. In the case of the DNS structure, if we were interested only in making a list of the IP addresses we could do the following:

Listing 6
        1.   $child_tags = array();
        2.   $tags = array("ipaddress");
        3.   $parser = new XML_PullParser("DNS.xml",$tags,$child_tags);
        4.
        5.   while($token = $parser->XML_PullParser_getToken())
        6.   { 
        7.       echo "IP address: " . $parser->XML_PullParser_getText('ipaddress') ."\n";  
        8.   }
       

Line 1 initializes an empty $child_tags array, while on line 2 ipaddress is assigned to the $tags array. Both are passed into the constructor (line 3). This same strategy would hold if we had a database of movies and just wanted a list of titles, etc.

Notes
1. Throughout the documentation, the term "current token" is used. This refers to the $converted_token, as described above.
2. In many functions which need a tokenized array to search for element data, the array parameter is optional; if it's absent, the function will first look in the $current_element and, if that's not found, it will use the $converted_token. The structure of both these arrays is illustrated in Appendix 1.