eXcavator
An XML Query Facility for XML_PullParser
version 1.0.6
Myron Turner
Introduction

What eXcavator Is
eXcavator is an XML query processing class that sits on top of XML_PullParser. Its aim is to give PHP developers a facility for integrating XML queries into both command-line and web applications. Queries are constructed using a small query language that will not be totally unfamiliar to programmers, and it will look even more familiar to users who know something about XPath and XQuery.

Executing a query involves only three method calls: calling the constructor, passing the query string to eXcavator_Query(), and then retrieving and outputting the results using one of several methods. One of these, eXcavator_getFormattedText(), provides a facility for inserting query results into programmer-defined template strings to create formatted output.

Since there are resemblances between XPath and the query language used by eXcavator, it's worth saying at the outset why eXcavator does not use XPath. Most importantly, unlike XPath, eXcavator uses a model for parsing XML that is only loosely hierarchical and that is not based on a tree structure. In addition, while it shares some symbolic and syntactical features with XPath, its queries are based solely on syntactical constructs, whereas XPath provides functions that can be used in conjunction with syntax rules for locating and manipulating data. [1]


eXcavator and XML_PullParser
Using eXcavator requires no knowledge of XML_PullParser. Knowing something about XML_PullParser can nevertheless be useful.

Because eXcavator uses XML_PullParser, all methods and functions of XML_PullParser are accessible. The package level functions can simply be called directly. For instance, XML_PullParser, following the expat parser, is case-insensitive by default. But this can be changed by calling the package level function XML_PullParser_caseSensitive:

XML_PullParser_caseSensitive(TRUE)

All of the object methods of XML_PullParser are accessible through the $_parser variable. So, if you have an instance of eXcavator referenced through the variable $eXc, then you can refer to any XML_PullParser method:

$eXc->_parser->XML_PullParser_Method()

As noted below under Methods, for instance, the result of eXcavator_getResultAsXMLDoc can be passed to an instance of XML_PullParser for further processing. This can result in very efficient and clean XML_PullParser code. [2]


Methods
Creating the Class and Executing the Query
  1. eXcavator($doc, $doc_type)
    Constructor
    $doc is either a string or the name of an XML file.
    $doc_type is one of eXcavator_STRING or eXcavator_FILE which lets the parser know whether it is receiving a string or a file.
  2. eXcavator_Query($query)
    This method is called immediately after the constructor. The $query string is constructed using the eXcavator syntax rules, which are detailed elsewhere in this document.
Extracting the Data
  1. string eXcavator_getFormattedText($which, $element, $pattern)
    This method takes a formatting $pattern and uses it to format the data pointed to by $which and $element. It is described in detail later in the manual.
    Parameter(s):
    integer $which: A query may produce more than one result; $which is the number of the desired result. Results are numbered from zero, and eXcavator_getResultCount() returns the number of results found by the query.
    string $element: the name of the element structure which holds the desired data.
    string $pattern: patterns are based on a simple syntax that inserts text from the query result into a user-supplied string.
  2. array eXcavator_getResultAsData()
    This method returns each result of the query as an associative array which can be passed to a function that might, for instance, format it for output or store the data in a file. The exact form of this array is dealt with in the manual page on output functions.
  3. string eXcavator_getResultAsString($html=False, $comment=False)
    This method returns the result of the query as a string formatted as XML but not guaranteed to be well-formed.
    Parameters:
    boolean $html: defaults to FALSE. If set to TRUE the returned string will be suitable for display in a browser.
    boolean $comment: defaults to FALSE. If set to TRUE each result found will be preceded by a comment with the number of the result.
  4. array eXcavator_getResultArray()
    This returns the array which eXcavator creates as it saves the data which satisfies the query. It has the same structure as an array returned by XML_PullParser_getChildren: a numerically indexed array in which each indexed element holds a query result in the form of an XML_PullParser tokenized array.
  5. string eXcavator_getResultAsXMLDoc()
    This method returns the result of the query as a string which is formatted as well-formed XML. This can be passed to an instance of XML_PullParser and further processed.
Utilities
  1. void eXcavator_free()
    This method frees the resources used by XML_PullParser. First, it frees the handle returned by PHP's XML library. Second, if the XML document passed to the constructor is a file, it closes the file. It is necessary to call this method if the script creates more than one instance of eXcavator or if the instance of eXcavator is to be followed by an instance of XML_PullParser. Any use of eXcavator in a loop which creates successive instances of the class requires a call to this method before each new instance is created.
  2. integer eXcavator_getResultCount()
    This returns the number of results found by the query. The results of a query are stored internally in an array, one element dedicated to each result. For this reason, when referencing results, as in any of the methods above which take a $which parameter, the first result is number zero.
  3. array eXcavator_showSchemeAsArray($which, $element)
    An informational facility. It returns an array that is mapped to the requested element structure, showing the elements it holds and their contents, both character data and attributes. It can be useful when creating patterns for eXcavator_getFormattedText. (See the detailed discussion of that method.)
    Parameter(s):
    integer $which: number of the desired result.
    string $element: the name of the element structure which holds the desired data.
  4. string eXcavator_showSchemeAsString($which, $element)
    An informational facility for use with eXcavator_getFormattedText. It returns a string which lists the names of all the elements held in the designated element structure.
    Parameter(s):
    integer $which: number of the desired result.
    string $element: the name of the element structure which holds the desired data.
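
The requirement stated under eXcavator_free(), that each instance be freed before the next one is created, can be sketched as follows. This is only an illustration that uses the methods listed above: the function name count_matches() and whatever file names the caller passes in are hypothetical, and the sketch assumes eXcavator.inc is on the include path.

```php
<?php
// Sketch only: assumes eXcavator.inc is on the include path.
// count_matches() is a hypothetical helper, not part of eXcavator.
require_once "eXcavator.inc";

// Run the same query against several XML files, one instance per file,
// and return the number of results found in each file.
function count_matches($files, $query)
{
    $counts = array();
    foreach ($files as $file) {
        $eXc = new eXcavator($file, eXcavator_FILE);
        $eXc->eXcavator_Query($query);
        $counts[$file] = $eXc->eXcavator_getResultCount();

        // Release the parser handle and close the file
        // before the next instance is created.
        $eXc->eXcavator_free();
    }
    return $counts;
}
```

Calling eXcavator_free() at the end of each pass, rather than at the top of the next one, keeps the cleanup next to the instance it belongs to and also covers the last file in the list.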


A Query Session
The first step in initiating a session is to read in the eXcavator module:

require_once "eXcavator.inc";

eXcavator, in turn, reads in a helper class named eXcavator_QueryParser, which parses the query string. This class is found in eXcavator_QueryParser.inc, which should be in the same directory as eXcavator.inc.

Once the module has been read in, creating an instance of eXcavator and executing a query involves three basic steps.
  1. Constructor
    The constructor is called with two parameters. The first is either the name of a file or a string that holds the XML document. The second is one of two defined constants. If the XML document is held as a string, the second parameter is eXcavator_STRING; if the document is held in a file, it is eXcavator_FILE. The Synopsis below illustrates the two ways to create an instance of eXcavator.
  2. Query
    eXcavator_Query() is called with a single parameter, the query string.
  3. Output
    One of several different methods is called to retrieve and output the results.

The API to eXcavator is clearly quite simple. Any complexities that this module may have will be in the query syntax, which is treated in the next manual section. The Synopsis below lays out some basics and introduces some of the query syntax. The rest of the manual will deal with eXcavator in detail.

Synopsis
require_once "eXcavator.inc";

// use file for XML data
$file = "db.xml";
$eXc =  new eXcavator($file, eXcavator_FILE);
$query = 'vehicle[[@year > = 2004]]'; // get all vehicles from 2004 and later
$eXc->eXcavator_Query($query);
echo $eXc->eXcavator_getResultAsString(true);   // true gets string as HTML for browser


// use string for XML data
$eXc->eXcavator_free();                        // free the first instance before creating another
$eXc =  new eXcavator($doc, eXcavator_STRING); // parse XML stored as string in $doc
$query = 'vehicle[[color => CDATA = white]]'; // get all white vehicles
$eXc->eXcavator_Query($query);
echo $eXc->eXcavator_getResultAsString();   // print result to terminal, not as HTML



// request all vehicles with color either green or white
$eXc->eXcavator_Query("vehicle[[color=> CDATA = green _OR_  CDATA = white ]]");

// output the year, make and model attributes of the vehicle element
$pattern =  "Year: {vehicle;year }    Make: {vehicle;make }    Model: {vehicle;model}"

// output the text from the color and price elements in vehicle
. "    Color: {color}     Price: {price}\n"

// output the names from the owner element
. "Owner:  {owner=>first_name} {owner=>first_name;middle_init} {owner=>Last_name}\n\n" ;

$element = "vehicle";

for($which = 0; $which  < $eXc->eXcavator_getResultCount(); $which++) {
     $text = $eXc->eXcavator_getFormattedText($which, $element, $pattern);
     echo $text;
}

/*
Formatted output:

Year: 2004    Make: Acura    Model: 3.2TL    Color: green     Price: 33900
Owner:  Michael M Taylor

Year: 2005    Make: Honda    Model: Accord    Color: white     Price: 28500
Owner:  Douglas J Jones

*/


Some notes on the Synopsis examples

1. The Queries
In the above Synopsis, year is an attribute of vehicle and color is the name of an element that contains character data. Character data is symbolized as CDATA and attributes are marked with an @ prefix. The latter is consistent with XPath. In the final example, the query asks for any vehicle structures that have a color element with text equal to either green or white. The arrow operator, inside the square brackets, tells the processor that the preceding name, i.e. color, is the name of an element that occurs in the element structure to the left of the brackets, namely vehicle.
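
Because a query is an ordinary PHP string, it can be assembled at run time. The sketch below rebuilds the green-or-white query from the Synopsis. The helper build_or_query() is hypothetical and not part of eXcavator, and it makes two assumptions that the text above does not confirm: that the processor is indifferent to the exact spacing inside the brackets, and that values can be inserted without any escaping. The Synopsis shows only two alternatives joined by _OR_, so longer chains are likewise untested here.

```php
<?php
// Hypothetical helper, not part of eXcavator: builds a query that matches
// $parent structures whose $child element has text equal to any of $values.
function build_or_query($parent, $child, $values)
{
    $tests = array();
    foreach ($values as $value) {
        // One character-data test per value, in the form used in the Synopsis.
        $tests[] = "CDATA = " . $value;
    }
    return $parent . "[[" . $child . "=> " . implode(" _OR_ ", $tests) . "]]";
}

echo build_or_query("vehicle", "color", array("green", "white")), "\n";
// vehicle[[color=> CDATA = green _OR_ CDATA = white]]
```

The returned string would then be passed to eXcavator_Query() in the same way as the literal queries in the Synopsis.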

2. The $pattern passed to eXcavator_getFormattedText
   { vehicle;year } extracts the year attribute of the vehicle element; the same syntax extracts the make and model attributes;
   { color } and { price } extract the text from the color and price elements;
   {owner=>first_name} uses the arrow operator to tell the method to look for first_name in owner;
   the output contains two sets of results, one for $which = 0 and the second for $which = 1.

Notes
1. Since PHP programmers already have access to a rich set of data manipulation functions, eXcavator concentrates solely on data retrieval. XPath's other functions are dedicated to its tree structure and so are not relevant to eXcavator. Similarly, XQuery supplies a programming language for templating and data manipulation, resources that are built in to PHP.
2. See Listing 2 for a sample script.