XML_PullParser
A token-based interface to the PHP expat XML library
version 1.3.2
Myron Turner
Strategies 4: Nested Selecting


Until now, we have been using a very straightforward XML example. But what if the DNS structure looked something like the following?

Example 1
<ENTRY> 
<ipaddress>172.20.19.6  </ipaddress> 
<domain> example.com  </domain> 
<server ip="192.168.10.1">
example_1.com 
<registrant>mturner.org </registrant>
</server> 
<server ip="192.168.10.2"> example_2.com  </server> 
<server ip="192.168.10.3"> example_3.com  </server> 
<alias> www.example.com  </alias> 
</ENTRY> 

If we used the code in Listing 10 to parse this structure for server names and ip addresses, we would not get what we want.


/*
 Result
        Name: example_1.com
                IP: 192.168.10.1
        Name: mturner.org
                IP:
        Name: example_2.com
                IP: 192.168.10.2
        Name: example_3.com
                IP: 192.168.10.3
*/

This result is, in fact, technically correct. It reflects the way in which XML_PullParser_getSequence works:

    $parser->XML_PullParser_getElement('server');    
    $seq =  $parser->XML_PullParser_getSequence();

XML_PullParser_getElement creates a tokenized array of all the server elements, their children, and the attributes of parents and children (see note 1). Among these children is registrant. So the sequence array correctly reports that the second element is the child element registrant, and this is fed to XML_PullParser_getText, which correctly returns "mturner.org" as $name. The code also correctly reports that the IP field is blank, because the registrant element has no ip attribute.
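
To see why a nested child turns up alongside its parent, it can help to look at what the underlying expat layer reports. The following is a minimal sketch that uses PHP's built-in xml_parse_into_struct rather than XML_PullParser itself; the helper name listElementNames and the shortened document are assumptions made for illustration only.

```php
<?php
// Sketch only: this calls PHP's bundled expat functions directly,
// not XML_PullParser. listElementNames() is a hypothetical helper.
function listElementNames(string $xml): array
{
    $parser = xml_parser_create();
    $values = [];
    xml_parse_into_struct($parser, $xml, $values);

    $names = [];
    foreach ($values as $node) {
        // 'open' and 'complete' entries mark the start of an element;
        // 'close' and 'cdata' entries are skipped.
        if ($node['type'] === 'open' || $node['type'] === 'complete') {
            $names[] = $node['tag'];
        }
    }
    return $names;
}

// Shortened version of Example 1, written without whitespace between tags.
$xml = '<ENTRY>'
     . '<server ip="192.168.10.1">example_1.com<registrant>mturner.org</registrant></server>'
     . '<server ip="192.168.10.2">example_2.com</server>'
     . '</ENTRY>';

$names = listElementNames($xml);
echo implode("\n", $names), "\n";
```

Run as is, this should list ENTRY, SERVER, REGISTRANT, SERVER. The registrant element appears between the first and second server in document order, and the names are uppercased because expat's case folding is on by default.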

There are a number of ways to deal with this issue. The most obvious way would be to test for whether the elements in the sequence are the correct ones, which is a simple matter, since XML_PullParser_getSequence provides the name of each element in its array:

                 // each() was removed in PHP 8; key() and current() read the same pair
                 $server = key($seq[$i]);
                 $which  = current($seq[$i]);
                 if($server != 'SERVER') continue;
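
Expanded into a complete loop, the name test might look like the sketch below. The $seq array here is a hand-built stand-in: its shape (one name => occurrence pair per entry) is an assumption inferred from how the listings use it, not taken from the class documentation. The sketch reads each pair with key() and current(), since each() was removed in PHP 8.

```php
<?php
// Hypothetical stand-in for the array returned by XML_PullParser_getSequence();
// the one-pair-per-entry shape (name => occurrence) is an assumption.
$seq = [
    ['SERVER' => 1],
    ['REGISTRANT' => 1],
    ['SERVER' => 2],
    ['SERVER' => 3],
];

$wanted = [];
for ($i = 0; $i < count($seq); $i++) {
    // Read the single name => occurrence pair of this entry.
    $server = key($seq[$i]);
    $which  = current($seq[$i]);

    // Skip REGISTRANT and any other element that is not a server.
    if ($server != 'SERVER') continue;

    $wanted[] = [$server, $which];
}

print_r($wanted);
```

In real code the body of the loop would pass $server and $which on to XML_PullParser_getText and XML_PullParser_getAttributes instead of collecting them.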

Another solution is Listing 12.

Listing 12
    while($token = $parser->XML_PullParser_getToken())
    {
        // Collect the server elements, then strip out their child elements
        $servers = $parser->XML_PullParser_getElement('server');
        $servers = $parser->XML_PullParser_childXCL($servers);
        $seq = $parser->XML_PullParser_getSequence($servers);

        for($i=0; $i < count($seq); $i++) {
            // each() was removed in PHP 8; key() and current() read the same pair
            $server = key($seq[$i]);
            $which  = current($seq[$i]);

            $name = $parser->XML_PullParser_getText($server,$which);
            echo "Name: $name \n";

            $ip = $parser->XML_PullParser_getAttributes($server,$which);
            echo "\tIP: " . $parser->XML_PullParser_getAttrVal('ip', $ip) . "\n";
        }
    }

/*
 Result
    Name:
    example_1.com

            IP: 192.168.10.1
    Name:  example_2.com
            IP: 192.168.10.2
    Name:  example_3.com
            IP: 192.168.10.3
*/

Listing 12 has excluded the registrant element by the use of this function:

array XML_PullParser_childXCL (array $parent, [mixed $args = ""])

Its purpose is to exclude specified child elements from a parent (see note 2). When no elements are specified, it removes all child elements, leaving only the parent. It does not affect the current token or $current_element.

In Listing 10 it wasn't necessary to pass an array to XML_PullParser_getSequence, because it defaulted to the internal array created by XML_PullParser_getElement (see note 1). In the present case, however, we have to pass XML_PullParser_getSequence the stripped-down array created by XML_PullParser_childXCL. It's this stripped-down array that forms the basis for the sequencing array $seq.

In the Result section of Listing 12 there's a small glitch in the output: there are extra newlines before and after "example_1.com". This is in fact a reflection of the document:

     <server ip="192.168.10.1">
    example_1.com 
     <registrant>mturner.org </registrant>
     </server>

The newlines would disappear from the output if we put the entire unit on one line:

     <server ip="192.168.10.1">example_1.com <registrant>mturner.org </registrant> </server>

But since this isn't always possible, one solution is to pass the results from XML_PullParser_getText through the PHP trim function. A second solution is to let XML_PullParser do this for you by calling this package level function with a parameter of true:

void XML_PullParser_trimCdata (boolean $bool)

Because it's not a class method, you can call it before creating the parser object.
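
The first option, trimming by hand, can be sketched with a plain string. The value of $name below is only a stand-in for untrimmed text of the kind XML_PullParser_getText returns for the first server element; no parser is involved.

```php
<?php
// Stand-in for untrimmed element text: a leading newline, then the name
// followed by a space and another newline.
$name = "\nexample_1.com \n";

// trim() strips whitespace, including newlines, from both ends.
$clean = trim($name);

echo "Name: " . $clean . "\n";   // prints "Name: example_1.com"
```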

Using XML_PullParser_childXCL is one way to deal with the problem of the registrant element. Another is to drop XML_PullParser_getSequence altogether and work directly with the parameters to XML_PullParser_getText.

Listing 13
    XML_PullParser_trimCdata(true);
    while($token = $parser->XML_PullParser_getToken())
    {
        $parser->XML_PullParser_getElement('server');
        $n=1;
        // Ask for the text of the 1st, 2nd, ... server element until getText returns nothing
        while($server = $parser->XML_PullParser_getText('server',$n)) {
            $ip = $parser->XML_PullParser_getAttributes('server',$n);
            echo "Name: $server\n";
            echo "\tIP: " . $parser->XML_PullParser_getAttrVal('ip', $ip) . "\n";

            $n++;
        }
    }

/*
 Result 
        Name: example_1.com
                IP: 192.168.10.1
        Name:  example_2.com
                IP: 192.168.10.2
        Name:  example_3.com
                IP: 192.168.10.3
*/

This was run with XML_PullParser_trimCdata set to true, so the extra line-feeds have been cleaned up. More importantly, the solution is itself cleaner, requiring less code. Whereas the sequencing array includes registrant among its list of elements, requiring us to make an adjustment, here the "server" name is fed directly to both the text and attribute functions.

In our example, we are interested in only one element. But in situations where many elements are involved and there are few registrant-type twists, the sequencing array can be an efficient and effective technique.

Notes
1. See the earlier discussion of XML_PullParser_getElement and the class documentation.
2. See the class documentation.