eXcavator
An XML Query Facility for XML_PullParser
version 1.0.6
Myron Turner
Formatted Output

eXcavator provides three methods for formatting text. All three work on internally stored arrays which hold the results of the query. The method that does the actual formatting is eXcavator_getFormattedText. The other two are utilities for visualizing the data that eXcavator_getFormattedText will format. They are intended less for the actual execution of a script (although they can be used there) than for debugging and inspecting the special arrays which underlie the text formatting facility. The three methods are:

  1. string eXcavator_getFormattedText($which, $element, $pattern)
    This method will be described in detail below.
  2. array eXcavator_showSchemeAsArray($which, $element)
    $which is an index into the array which holds the results of the query; the index of the first result is zero. $element is the name of the target element which will be returned. Use the PHP function print_r() to view the returned array. See the commentary below for further detail.
  3. string eXcavator_showSchemeAsString($which, $element)
    The parameters are the same as for eXcavator_showSchemeAsArray above. But this method returns a string which lists only the element names, not their data. See Appendix 2 for examples.


eXcavator_getFormattedText($which, $element, $pattern)
Parameters:
$which: the index number of the target result in the results array.
$element: the context-element for which data is to be displayed.
$pattern: a template for formatting the data

eXcavator_getFormattedText creates a formatted string for $element and any of its dependents. The result that holds $element is identified by $which. $pattern has this general form:

optional-text {selector} optional-text {selector} . . .

The optional text is reproduced verbatim from the template. The selectors locate and identify data. If data is found, it replaces the curly braces and the enclosed selector (see Note 1). If there is no data at the location, an empty string replaces the braces. But if there is an error, the braces and selector are not replaced: they remain in the returned string.
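As a rough illustration of these replacement rules, the sketch below mimics them with a plain PHP array. It is not eXcavator's implementation: the function fill_template, the flat $data array, and the decision to treat an unknown selector as the error case are all assumptions made for this example.

```php
<?php
// Hypothetical stand-in for the substitution step; NOT eXcavator's own code.
// $data maps selector strings to values. A null value stands for "no data".
function fill_template(string $pattern, array $data): string
{
    return preg_replace_callback(
        '/\{\s*([^{}]+?)\s*\}/',          // one {selector}, optional padding
        function (array $m) use ($data): string {
            $selector = $m[1];
            if (!array_key_exists($selector, $data)) {
                return $m[0];             // assumed error case: leave braces in place
            }
            return (string) ($data[$selector] ?? '');  // no data: empty string
        },
        $pattern
    );
}

$data = [
    'first_name'             => 'Michael',
    'first_name;middle_init' => 'M',
    'last_name'              => 'Taylor',
    'nickname'               => null,     // present, but holds no data
];

echo fill_template('Name: {first_name} {first_name;middle_init}. {last_name}', $data), "\n";
// prints: Name: Michael M. Taylor
echo fill_template('[{nickname}] [{bogus}]', $data), "\n";
// prints: [] [{bogus}]
```

The second call shows the two failure modes side by side: the selector with no data collapses to an empty string, while the unrecognized selector survives untouched, which makes mistakes in a template easy to spot in the output.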

Basic Selectors
The basic selector consists of an element name:
{first_name}
or of the element name followed by a semi-colon and the name of an attribute:
{first_name;middle_init}

In the latter case, the selector points to the middle_init attribute of the first_name element. Here is a pattern which will output the full name of each vehicle owner:
'Name: {first_name} {first_name;middle_init}. {last_name}'

Let's put this pattern into a small script:

Listing 4
 $condition = "owner[[CDATA]]";

 $eXc = new eXcavator($doc, eXcavator_STRING);
 $eXc->eXcavator_Query($condition);
 $pattern =  'Name: {first_name} {first_name;middle_init}. {last_name}';

  for ($i = 0; $i < $eXc->eXcavator_getResultCount(); $i++) {
       echo  $eXc->eXcavator_getFormattedText($i, 'owner', $pattern) . "\n";
  }

/* Result
Name: Michael M. Taylor
Name: Douglas J. Jones
*/

The condition asks for all the owners in our XML document. In the for loop, we use eXcavator_getResultCount to get the number of results, which in this case will be two, since our XML document has two owner elements, one for each of the two vehicles. The first parameter to eXcavator_getFormattedText is the index into the results array; the second parameter is the context-element, which in this case is owner; and the last is the template. The result of this loop is the two names, formatted and output according to $pattern. The curly braces and selectors have been replaced by the requested data.

Note on the Context-Element
Usually, the context-element is the element for which the result has been fetched, as in Listing 4 above, where owner is both the context for the query's result and for the $element parameter to eXcavator_getFormattedText. But the code in Listing 4 would also have worked if the query had been 'vehicle[[CDATA]]'. This works because owner is a descendent of vehicle. The reverse, however, would not work. If the query is 'owner[[CDATA]]', the $element parameter to eXcavator_getFormattedText cannot be 'vehicle'. This is because vehicle is out of scope. The result that has been saved is owner and what is outside that element is unknown.

Selector Addressing
Addressing first_name and last_name is straightforward because each of these appears only once in owner. But owner has two city elements and, if we were to save the entire contents of each vehicle, we would need a way of addressing the street elements in both owner and dealer, as well as several other duplicate element names. There are two ways to address these elements, direct and indirect. But whichever method is used, scope and context determine the organization of the data arrays and, therefore, the formulation of all addresses.

Direct Addressing
Direct addressing is the technique which is described above in the section on Basic Selectors and used in Listing 4. Each element is addressed solely through the use of its name, which is a key in the internal array that holds the data (see Note 2). When there is more than one element of the same name, the elements are numbered, beginning with the second element found. The numbers start at 1. Each number is appended to three underscores, as in '___1', and this number string is then appended to the element name. For example, vehicle has three city elements, one for dealer and two for owner. If a query asks for the return of an entire vehicle element, these cities would appear in the result array as:
CITY, CITY___1, and CITY___2

A template using these names is no different from the template in Listing 4. The names are inserted into curly braces:
Dealer's City:  {CITY}    Owner's City:  {CITY___1}, City 2: {CITY___2}

But the numbering of city would change if, instead of asking for vehicle, the query narrows its focus to owner:
owner[[last_name=>Taylor]]

The result array would now contain only two city elements: CITY and CITY___1 (see Note 3). Duplicate names are numbered sequentially in document order. But each result is stored in its own array, and its duplicates are therefore numbered independently of their position in the overall XML document.
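The numbering rule can be sketched in a few lines of PHP. This is only an illustration of the scheme as described here, not code from eXcavator; the function name direct_keys and its input format (a flat list of element names in document order) are assumptions.

```php
<?php
// Illustrative sketch of the duplicate-numbering rule; NOT eXcavator's code.
// Input: element names in document order, as found within ONE result.
// Output: the key each element would be addressed by under direct addressing.
function direct_keys(array $names): array
{
    $seen = [];   // how many duplicates of each name have been numbered so far
    $keys = [];
    foreach ($names as $name) {
        if (!isset($seen[$name])) {
            $seen[$name] = 0;
            $keys[] = $name;                          // first occurrence: bare name
        } else {
            $seen[$name]++;
            $keys[] = $name . '___' . $seen[$name];   // later ones: ___1, ___2, ...
        }
    }
    return $keys;
}

// A vehicle result holds three city elements.
echo implode(', ', direct_keys(['CITY', 'CITY', 'CITY'])), "\n";
// prints: CITY, CITY___1, CITY___2

// An owner result holds only two, so the numbering starts over.
echo implode(', ', direct_keys(['CITY', 'CITY'])), "\n";
// prints: CITY, CITY___1
```

Because the counter is reset for each result, the same city element can be CITY___1 in one query and CITY in another, which is the book-keeping cost of direct addressing.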


Indirect Addressing
Direct addressing is ideal for documents without duplicate element names, or for documents where duplicates can easily be kept track of without a great deal of book-keeping. Otherwise, indirect addressing tends to be more intuitive and simpler to use. Indirect addressing allows us to address element blocks, like owner, independently of the context-element that governs the result. That is, if the query is 'vehicle[[color=>green]]', the context-element is vehicle. But with indirect addressing, we can create an address which temporarily resets the context to owner.

This is the syntax:
{ temporary-context-element=>target-element[target-element-index] }
The only requirement is that temporary-context-element must be a descendent of the context-element (see Note 4). Assuming that the query is 'vehicle[[color=>green]]', then this is a valid address:
{ owner=>city[1] }
The index is not dependent on the query. There are two city elements in owner and they will always be owner=>city[1] and owner=>city[2]. This is in contrast to direct addressing where, as we saw, the numbering of elements changes with the structure of the array which maps the result.

In indirect addressing, the indexing of elements is always relative to the context which appears to the left of the arrow operator. It doesn't matter that our query returns a result for vehicle, where there are three city elements. If owner is set as the temporary context, then all that matters is how many city elements occur in owner.

Because all addresses are relative to the temporary-context-element, the call to eXcavator_getFormattedText must have the temporary-context-element as its $element parameter, as in the following example, where the context element for the query is vehicle, but owner is passed into the method:


 $condition = "vehicle[[color=>white]]";
 $eXc = new eXcavator($doc, eXcavator_STRING);
 $eXc->eXcavator_Query($condition);

 $pattern =  '{owner=>first_name} {owner=>first_name;middle_init} {owner=>last_name}';
 echo  $eXc->eXcavator_getFormattedText(0, 'owner', $pattern) . "\n";

/*
 Result:
 Douglas J Jones
*/


Context-element as temporary context
There is one exception to the rule that temporary-context-element must be a descendent of the context-element. The context-element itself can serve as a temporary context-element. If vehicle is the context-element, this would be valid:
{ vehicle=>color[1] }
While this makes it possible to use the indirect addressing syntax with the context-element, keep in mind that all duplicate elements descended from the context-element are included in its address space. Keeping to our example, there would now be three city elements: the two descending from owner and the one descending from dealer. These would be addressed as:
{ vehicle=>city[1] }, { vehicle=>city[2] }, { vehicle=>city[3] }


Indirect addressing with attributes
Indirect addresses with attributes are the same as for direct addresses, apart from the difference in indexing syntax:
{ owner=>first_name[1];middle_init }
This, however, will not work:
{ vehicle=>vehicle;make }
The element on the left side of the arrow cannot be the same as the element on the right side. This case requires direct addressing.


Indirect addressing and the first descendent
It is permissible to omit the index from an indirect address, in which case the address will access the first instance of the named element. This means that '{owner=>city}' and '{owner=>city[1]}' refer to the same element. Omitting the index in these cases is a particularly clean way of addressing elements which appear in more than one context but which are not duplicated within their own contexts. For instance, zip appears in both owner and dealer. A query which has vehicle as its context-element could refer to these as '{owner=>zip}' and '{dealer=>zip}'.
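To make the parts of an indirect address concrete, here is a small sketch that splits one into its components. It is a hypothetical helper written for this page, not part of eXcavator, and the regular expression reflects only the syntax rules stated in this section: an optional index that defaults to 1, an optional attribute after a semi-colon, and the restriction that the names on either side of the arrow must differ.

```php
<?php
// Hypothetical parser for the indirect address syntax; NOT eXcavator's code.
// Returns the parts of the address, or null if it does not fit the rules
// described in this section.
function parse_indirect(string $address): ?array
{
    $body = trim($address, "{} \t\n");    // tolerate braces and padding
    $re = '/^(\w+(?:\[\d+\])?)\s*=>\s*(\w+)(?:\[(\d+)\])?(?:\s*;\s*(\w+))?$/';
    if (!preg_match($re, $body, $m)) {
        return null;                      // not an indirect address
    }
    if ($m[1] === $m[2]) {
        return null;                      // left and right names may not match
    }
    return [
        'context'   => $m[1],
        'target'    => $m[2],
        // an omitted index means the first instance
        'index'     => (isset($m[3]) && $m[3] !== '') ? (int) $m[3] : 1,
        'attribute' => (isset($m[4]) && $m[4] !== '') ? $m[4] : null,
    ];
}

print_r(parse_indirect('{ owner=>first_name[1];middle_init }'));
print_r(parse_indirect('{owner=>zip}'));              // index defaults to 1
var_dump(parse_indirect('{ vehicle=>vehicle;make }')); // NULL: same name on both sides
```

The helper only checks the shape of an address. Whether the temporary context is actually a descendent of the context-element can be decided only against a real query result.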


Addressing multiple contexts of the same name
In our example XML document there are three name elements. Two are descendents of dealer and one of owner. If we wanted to use name as our context-element in order to address last_name and first_name, we would need a way to indicate which name element we are interested in. One way to do this is to use direct addressing:

  $pattern =  '{first_name} {first_name;middle_init}  {last_name}' . "\n";
 echo  $eXc->eXcavator_getFormattedText(0, 'name___2', $pattern);

The other way to do this is with indirect addresses, and these can take two forms:
{ name___2=>last_name }, { name___2=>first_name }
{ name[3]=>last_name }, { name[3]=>first_name }
While it may initially seem surprising to find name___2 in the indirect addressing mode, it has to be remembered that it is in fact the actual direct address for the third name element, and using it is no different from using owner in our other examples. That is, owner is a direct address, and if there were two elements of that name, then one would be owner and the other owner___1, and we would be correct in addressing the second instance as owner___1 in any addressing situation.

When dealing with multiple contexts of the same name, the $element parameter of eXcavator_getFormattedText must point to the relevant context-element and this can be either a direct or an indirect address. Either of the following would be correct:

   $eXc->eXcavator_getFormattedText(0,"name___2", $pattern);
   $eXc->eXcavator_getFormattedText(0,"name[3]", $pattern);
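The relationship between the two forms is simple arithmetic: the bracketed index counts from 1, while the underscore suffix starts at 1 with the second element. A minimal sketch of that mapping follows; the function indexed_to_direct is hypothetical and is not something eXcavator exposes.

```php
<?php
// Hypothetical helper showing how name[3] corresponds to name___2.
// NOT part of eXcavator; for illustration only.
function indexed_to_direct(string $address): string
{
    if (!preg_match('/^(\w+)\[(\d+)\]$/', $address, $m)) {
        return $address;                  // no [n] suffix: already a direct name
    }
    $n = (int) $m[2];
    if ($n < 1) {
        throw new InvalidArgumentException('Indexes start at 1');
    }
    // [1] is the bare name; [2] is ___1, [3] is ___2, and so on.
    return $n === 1 ? $m[1] : $m[1] . '___' . ($n - 1);
}

echo indexed_to_direct('name[3]'), "\n";   // prints: name___2
echo indexed_to_direct('name[1]'), "\n";   // prints: name
echo indexed_to_direct('owner'), "\n";     // prints: owner
```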

This addressing paradigm is very flexible, and the various formats can be mixed. All of the following are valid and do the same job:

 $eXc = new eXcavator($doc, eXcavator_STRING);
 $condition = "vehicle[[color=>white]]";
 $eXc->eXcavator_Query($condition);

 $pattern =  '1. {name[3]=>first_name} {name[3]=>first_name;middle_init} {name[3]=>last_name[1]}';
 echo  $eXc->eXcavator_getFormattedText(0, 'name[3]', $pattern)  . "\n";

 $pattern = '2. {name___2=>first_name[1]} {name___2=>first_name[1];middle_init} {name___2=>last_name[1]}';    
 echo  $eXc->eXcavator_getFormattedText(0, 'name___2', $pattern) ."\n";

 $pattern =  '3. {first_name} {first_name;middle_init}  {last_name}';
 echo  $eXc->eXcavator_getFormattedText(0, 'name___2', $pattern)  . "\n";

 $pattern =  '4. {name[3]=>first_name} {name[3]=>first_name;middle_init} {name[3]=>last_name[1]}';
 echo  $eXc->eXcavator_getFormattedText(0, 'name___2', $pattern)  . "\n";

  $pattern =  '5. {owner=>first_name} {owner=>first_name;middle_init} {owner=>last_name}';
  echo  $eXc->eXcavator_getFormattedText(0, 'owner', $pattern)  . "\n";

/*
 Result:

 1. Douglas J Jones
 2. Douglas J Jones
 3. Douglas J  Jones
 4. Douglas J Jones
 5. Douglas J Jones

*/


Example of Formatted Text
Listing 5 is sample code that illustrates some of the features we have been describing.

Listing 5

 1. $condition = "vehicle[[color=>white]]";

 2. $eXc = new eXcavator($doc, eXcavator_STRING);
 3. $eXc->eXcavator_Query($condition);

 4.  $pattern = "Vehicle: {vehicle;make}  Color: {vehicle=>color[1]}\n";
 5.  echo  $eXc->eXcavator_getFormattedText(0, 'vehicle', $pattern) . "\n";

 6.  $pattern =  'Owner: {first_name} {owner=>first_name[1];middle_init}. {last_name}' . "\n"
 7.		. "Street: {owner=>street[1]}\n"
 8.		. "City: {owner=>city[1]},{owner=>city[2]}\n"
 9.		. "Zip: {owner=>zip}\n";

10.  echo  $eXc->eXcavator_getFormattedText(0, 'owner', $pattern) . "\n";


/* Result

Vehicle: Honda  Color: white

Owner: Douglas J. Jones
Street: 200 Winnipegosis Ave
City: St Adolphe,Winnipeg
Zip: R3L 1Z5


*/

Line Notes on Listing 5
1 The query establishes vehicle as the context-element.
4 make is accessed using direct addressing. The address { vehicle=>vehicle;make } would not work, since the name to the right of the arrow cannot be the same as the name to its left. We use an indirect address for color, just to illustrate its use. But we could, and probably would, use direct addressing in this case.
6 This line uses a direct address for both first_name and last_name because owner contains only one of each of these elements. It uses indirect addressing to extract the middle initial from first_name in order to illustrate the method for addressing attributes with indirect addresses.
9 This illustrates the use of indirect addressing without an index number to access the first, or only, element of a given name.


HTML Table Example
The following example uses the file "vehicles.xml", which is a list of vehicles with different specifications. It creates an HTML table of the vehicles priced under $32,000, demonstrating the elegance and efficiency of eXcavator in combination with the resources of PHP.

Listing 6

$condition = "vehicle[[price => CDATA  < 32000]]";
$eXc = new eXcavator('vehicles.xml', eXcavator_FILE);
$eXc->eXcavator_Query($condition);

$pattern = " <tr> <td> {vehicle;year }  <td> {vehicle;make}  <td> {vehicle;model} "
        . " <td> {color}  <td> {price}  <td> {carfax;buyback}\n";

echo " <h3>Vehicles under \$32,000 </h3>\n";
echo " <Table cellspacing=0> <th>Year <th>Make <th>Model <th>Color <th>Price <th>Buyback\n";

$element = "vehicle";

 for ($i = 0; $i < $eXc->eXcavator_getResultCount(); $i++) {
    echo  $eXc->eXcavator_getFormattedText($i, $element, $pattern);
 }

 echo " </Table>\n";   // close the table opened before the loop


array eXcavator_showSchemeAsArray($which, $element)
Parameters:
$which: the index number of the target result in the results array.
$element: the context-element for which data is to be displayed.

This method returns only the first element named $element. If there is more than one element of the same name for which a result is desired, then each subsequent element of that name must be addressed with its numeric index. If, for instance, an element named book has three authors, a request for book will return an array that includes author, author___1, and author___2. But if the request is for author, then only the first author element, with its children, will be returned. Separate requests would have to be made for author___1 and author___2.

In effect, the $element parameter is aimed primarily at the top-level, parent elements that provide the overall structural patterns to a query result, rather than at their descendents, even though it will respond to requests for properly indexed descendent elements.

Examples of the output from this method are illustrated in Appendix 2. The returned array is a copy of the array which eXcavator_getFormattedText uses when formatting data.



Notes
1. The use of curly braces to separate data from user text is a feature of XQuery and bears a resemblance to Smarty.
2. See Appendix 2 for examples of the array and the method descriptions above.
3. This is illustrated in Appendix 2 in the array returned by eXcavator_showSchemeAsArray and the string returned by eXcavator_showSchemeAsString.
4. The exception to this is the context-element itself, which can be placed to the left of the arrow operator and serve as a temporary context. (See above: Context-element as temporary context.)