What eXcavator Is
eXcavator is an XML query processing class that sits on top of
XML_PullParser. Its aim is to provide
PHP developers with a facility for integrating XML queries into both command line and web applications.
Queries are constructed using a small query language that will not be totally unfamiliar
to programmers. It will look even more familiar to users who know something about XPath
and XQuery.
Executing a query involves only three method calls: calling the constructor,
passing the query string to eXcavator_Query(), and then retrieving and outputting the results
using one of several methods. One of these, eXcavator_getFormattedText(),
inserts query results into programmer-defined template strings
to create formatted output.
Since there are resemblances between XPath and the query language used by eXcavator,
it's worth saying at the outset why eXcavator does not use XPath.
Most importantly, unlike XPath, eXcavator uses a model for parsing XML
that is only loosely hierarchical and that is not based on a tree structure.
In addition, while it shares some symbolic and syntactical features with XPath, its queries are
based solely on syntactical constructs, whereas XPath provides functions that can be used
in conjunction with syntax rules for locating and manipulating data. 1
eXcavator and XML_PullParser
The first thing to be said is that using eXcavator
requires no knowledge of XML_PullParser. Knowing something about XML_PullParser
can, however, be useful.
Because eXcavator uses XML_PullParser,
all methods and functions of XML_PullParser are accessible.
The package level functions can simply be called directly. For instance,
XML_PullParser, following the expat parser, is case-insensitive
by default. But this can be changed by calling the package level function
XML_PullParser_caseSensitive:
XML_PullParser_caseSensitive(TRUE);
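The case-insensitive default comes from case folding in PHP's xml extension (the library whose parser handle eXcavator_free() releases), controlled there by XML_OPTION_CASE_FOLDING. The sketch below uses that extension directly, not XML_PullParser, to show what the default means in practice: element names are reported upper-cased unless folding is switched off. The helper name reportedTagNames() is hypothetical.

```php
<?php
// Sketch only: PHP's built-in xml extension, not XML_PullParser itself.
// Returns the element names the parser reports for $xml.
function reportedTagNames(string $xml, bool $caseFolding): array
{
    $parser = xml_parser_create();
    // Case folding is on by default, which upper-cases element names.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding ? 1 : 0);
    $values = [];
    xml_parse_into_struct($parser, $xml, $values);
    xml_parser_free($parser);

    $names = [];
    foreach ($values as $v) {
        // Collect each element once, from its opening (or empty) tag.
        if ($v['type'] === 'open' || $v['type'] === 'complete') {
            $names[] = $v['tag'];
        }
    }
    return $names;
}

$xml = '<vehicle year="2004"><color>green</color></vehicle>';
print_r(reportedTagNames($xml, true));  // VEHICLE, COLOR
print_r(reportedTagNames($xml, false)); // vehicle, color
```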
All of the object methods of XML_PullParser are accessible through
the $_parser member variable. So, if you have an instance of eXcavator
referenced through the variable $eXc, then you can call any
XML_PullParser method:
$eXc->_parser->XML_PullParser_Method();
As noted below under Methods, for instance, it is possible to pass the results from
eXcavator_getResultAsXMLDoc to an instance of XML_PullParser
for further processing. This can result in very efficient and clean
XML_PullParser code. 2
Methods
Creating the Class and Executing the Query
- eXcavator($doc, $doc_type)
Constructor
$doc is either a string or the name of an XML file.
$doc_type is one of eXcavator_STRING or
eXcavator_FILE, which lets the parser know whether it is
receiving a string or a file.
- eXcavator_Query($query)
This method is called immediately after the constructor.
The $query string is constructed using the eXcavator syntax rules,
which are detailed elsewhere in this document.
- string eXcavator_getFormattedText($which, $element, $pattern)
This method takes a formatting $pattern and uses it
to format the data pointed to by $which and $element.
This method is described in detail later in the manual.
Parameter(s):
integer $which:
A query may produce more than one result; $which is the number of the
desired result. eXcavator_getResultCount() returns the
number of results found by the query. Results are numbered from zero.
string $element:
the name of the element structure which holds the desired data.
string $pattern:
patterns are based on a simple syntax that inserts text from the query result
into a user-supplied string.
- array eXcavator_getResultAsData()
This method returns each result of the query as an associative array which can be passed
to a function that might, for instance, format it for output or store the data in a file.
The exact form of this array is dealt with in the manual page on
output functions.
- string eXcavator_getResultAsString($html=False, $comment=False)
This method returns the result of the query as a string formatted as XML
but not guaranteed to be well-formed.
Parameters:
boolean $html: defaults to FALSE. If set to TRUE,
the returned string will be suitable for display in a browser.
boolean $comment: defaults to FALSE. If set to
TRUE, each result found will be preceded by a comment with the number
of the result.
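The manual does not show the exact text these two flags produce. As a rough, hypothetical approximation (the function renderResults() and the comment wording are assumptions, not eXcavator's code), $comment can be pictured as labeling each result with its zero-based number, and $html as escaping the markup so that a browser displays the tags instead of interpreting them:

```php
<?php
// Hypothetical approximation of the two flags of
// eXcavator_getResultAsString(); not eXcavator's own code.
// $fragments is assumed to be a list of XML strings, one per result.
function renderResults(array $fragments, bool $html = false, bool $comment = false): string
{
    $out = '';
    foreach ($fragments as $i => $xml) {
        if ($comment) {
            // Label each result with its zero-based number.
            $out .= "<!-- result $i -->\n";
        }
        $out .= $xml . "\n";
    }
    // For a browser, escape the markup so the tags are shown as text.
    return $html ? '<pre>' . htmlspecialchars($out) . '</pre>' : $out;
}

$results = ['<color>green</color>', '<color>white</color>'];
echo renderResults($results, false, true); // terminal output with numbered comments
echo renderResults($results, true);        // escaped output for a web page
```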
- array eXcavator_getResultArray()
This returns the array which eXcavator creates
as it saves the data which satisfies the query. It has the same structure as
an array returned by XML_PullParser_getChildren. It is a
numerically indexed array in which each indexed element holds a query result
in the form of an XML_PullParser tokenized array.
- string eXcavator_getResultAsXMLDoc()
This method returns the result of the query as a string which is formatted as well-formed
XML. This can be passed to an instance of XML_PullParser and further
processed.
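The manual does not say why eXcavator_getResultAsString() may return XML that is not well-formed. One plausible reason, offered here only as an assumption, is that several results placed side by side have no single root element. The sketch below uses PHP's DOM extension and a hypothetical isWellFormed() helper to show that such a string fails to parse until it is wrapped in a root:

```php
<?php
// Illustration with PHP's DOM extension, independent of eXcavator.
// Assumption: side-by-side results lack a single root element,
// which is one way a string of XML can fail to be well-formed.
function isWellFormed(string $xml): bool
{
    $previous = libxml_use_internal_errors(true); // collect errors quietly
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore prior setting
    return $ok;
}

$fragments = '<vehicle year="2004"/><vehicle year="2005"/>';
var_dump(isWellFormed($fragments));                      // bool(false): two roots
var_dump(isWellFormed("<results>$fragments</results>")); // bool(true): one root
```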
- void eXcavator_free()
This method frees the resources used by XML_PullParser.
First, it frees the handle returned by PHP's XML library.
Second, if the XML document passed in to the constructor is a file,
it closes the file. It is necessary to call this method
if the script creates more than one instance of
eXcavator or if the instance of
eXcavator is to be followed by an
instance of XML_PullParser.
Any use of eXcavator in a loop which creates
successive instances of the class requires a call to this method
before each new instance is created.
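The create, use, free discipline described above can be sketched at the level of PHP's xml extension, whose parser handle eXcavator_free() releases. The loop below (countElements() is a hypothetical helper) creates a fresh parser for each document and frees it before moving on to the next; with eXcavator itself, the counterpart is calling eXcavator_free() before each new instance is constructed.

```php
<?php
// Sketch of the create/parse/free pattern using PHP's xml extension
// directly. Counts the element start tags in each document, using a
// fresh parser per document and freeing it before the next one.
function countElements(array $docs): array
{
    $counts = [];
    foreach ($docs as $xml) {
        $n = 0;
        $parser = xml_parser_create();
        xml_set_element_handler(
            $parser,
            function ($p, $name, $attrs) use (&$n) { $n++; }, // start tag
            function ($p, $name) {}                           // end tag
        );
        xml_parse($parser, $xml, true);
        xml_parser_free($parser); // release the handle before the next pass
        $counts[] = $n;
    }
    return $counts;
}

print_r(countElements(['<a><b/></a>', '<a/>']));
```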
- integer eXcavator_getResultCount()
This returns the number of results found by the query. The results of a query
are stored internally in an array, one element dedicated to each result. For this reason,
when referencing results, as in any of the methods above which take a $which
parameter, the first result is number zero.
- array eXcavator_showSchemeAsArray($which, $element)
An informational facility. It returns an array that is mapped to the requested element structure,
showing the elements it holds and their contents, both character data and attributes. It can
be useful when creating patterns for
eXcavator_getFormattedText. (See the detailed discussion of that
method.)
Parameter(s):
integer $which: number of the desired result.
string $element:
the name of the element structure which holds the desired data.
- string eXcavator_showSchemeAsString($which, $element)
An informational facility for use with eXcavator_getFormattedText.
It returns a string which lists the names of all the elements held in the
designated element structure.
Parameter(s):
integer $which: number of the desired result.
string $element:
the name of the element structure which holds the desired data.
A Query Session
The first step in initiating a session is to read in the eXcavator module:
require_once "eXcavator.inc";
eXcavator, in turn, reads in a helper class named
eXcavator_QueryParser, which parses the query string.
This class is found in eXcavator_QueryParser.inc,
which should be in the same directory as eXcavator.inc.
Once the module has been read in, creating and executing
eXcavator involves three basic steps.
- Constructor
The constructor is called with two parameters. The first is either the name of a file or a string that
holds the XML document. The second is one of two defined constants.
If the XML document is held as a string, the second parameter is eXcavator_STRING.
If the document is held in a file, it is eXcavator_FILE.
The Synopsis below illustrates the two ways to create an instance of eXcavator.
- Query
eXcavator_Query() is called with a single parameter: the query string.
- Output
One of several different methods is called to retrieve and output the results.
The API to eXcavator is clearly quite simple. Any complexities that this module may have
will be in the query syntax, which is treated in the next manual section. The Synopsis below lays out some
basics and introduces some of the query syntax. The rest of the manual will deal with eXcavator
in detail.
Synopsis
require_once "eXcavator.inc";
// use file for XML data
$file = "db.xml";
$eXc = new eXcavator($file, eXcavator_FILE);
$query = 'vehicle[[@year >= 2004]]'; // get all vehicles from 2004 and later
$eXc->eXcavator_Query($query);
echo $eXc->eXcavator_getResultAsString(true); // true gets string as HTML for browser
$eXc->eXcavator_free(); // required before creating a second instance
// use string for XML data
$eXc = new eXcavator($doc, eXcavator_STRING); // parse XML stored as string in $doc
$query = 'vehicle[[color => CDATA = white]]'; // get all white vehicles
$eXc->eXcavator_Query($query);
echo $eXc->eXcavator_getResultAsString(); // print result to terminal, not as HTML
// request all vehicles with color either green or white
$eXc->eXcavator_Query("vehicle[[color=> CDATA = green _OR_ CDATA = white ]]");
// output the year, make and model attributes of the vehicle element
$pattern = "Year: {vehicle;year } Make: {vehicle;make } Model: {vehicle;model}"
// output the text from the color and price elements in vehicle
. " Color: {color} Price: {price}\n"
// output the names from the owner element
. "Owner: {owner=>first_name} {owner=>first_name;middle_init} {owner=>Last_name}\n\n" ;
$element = "vehicle";
for($which = 0; $which < $eXc->eXcavator_getResultCount(); $which++) {
$text = $eXc->eXcavator_getFormattedText($which, $element, $pattern);
echo $text;
}
/*
Formatted output:
Year: 2004 Make: Acura Model: 3.2TL Color: green Price: 33900
Owner: Michael M Taylor
Year: 2005 Make: Honda Model: Accord Color: white Price: 28500
Owner: Douglas J Jones
*/
Some notes on the Synopsis examples
1. The Queries
In the above Synopsis, year is an attribute of vehicle
and color is the name of an element that contains character data.
Character data is symbolized as CDATA and attributes are marked with
an @ prefix. The latter is consistent with XPath.
In the final example, the query asks for any vehicle structures that have a color element with text equal to
either green or white. The arrow operator, inside the square brackets, tells the
processor that the preceding name, i.e. color, is the name of an element that occurs in the element structure
to the left of the brackets, namely vehicle.
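For readers who know XPath, the two Synopsis queries have rough XPath counterparts. The sketch below runs them with PHP's SimpleXML extension against a small document invented for illustration (only the Acura and Honda entries echo the Synopsis output). It is a comparison aid only; eXcavator does not evaluate its queries this way.

```php
<?php
// Rough XPath counterparts of the Synopsis queries, run with SimpleXML.
// Not how eXcavator works; the sample document is invented.
$doc = <<<XML
<vehicles>
  <vehicle year="2003" make="Ford" model="Focus"><color>red</color></vehicle>
  <vehicle year="2004" make="Acura" model="3.2TL"><color>green</color></vehicle>
  <vehicle year="2005" make="Honda" model="Accord"><color>white</color></vehicle>
</vehicles>
XML;

$xml = new SimpleXMLElement($doc);

// eXcavator: vehicle[[@year >= 2004]]
$recent = $xml->xpath('//vehicle[@year >= 2004]');

// eXcavator: vehicle[[color => CDATA = green _OR_ CDATA = white]]
$greenOrWhite = $xml->xpath('//vehicle[color = "green" or color = "white"]');

foreach ($greenOrWhite as $v) {
    // Attributes via array access, child element text via property access.
    echo $v['year'], ' ', $v['make'], ' ', $v->color, "\n";
}
```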
2. The $pattern passed to eXcavator_getFormattedText
• { vehicle;year } extracts the year attribute of the vehicle element;
the same syntax extracts the make and model attributes;
• { color } and { price } extract the text from
the color and price elements;
• {owner=>first_name} uses the arrow operator to
tell the method to look for first_name
in owner;
• {owner=>first_name;middle_init} combines the two operators: it extracts the
middle_init attribute of the first_name element in owner;
• the output contains two sets of results, one for
$which = 0 and the second for $which = 1.
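The placeholder rules above can be sketched as a small substitution routine. This is a minimal sketch, not eXcavator's implementation: the function fillPattern() and the layout of $result (element paths as keys, each holding character data and attributes) are assumptions made for illustration. Lookups are case-folded to mirror XML_PullParser's case-insensitive default, which is why {owner=>Last_name} still finds last_name here.

```php
<?php
// Minimal sketch of filling {element}, {element;attribute} and
// {parent=>child} placeholders. The $result layout is an assumption,
// not eXcavator's internal array.
function fillPattern(string $pattern, array $result): string
{
    return preg_replace_callback(
        '/\{\s*([^};]+?)\s*(?:;\s*([^}\s]+)\s*)?\}/',
        function (array $m) use ($result): string {
            // Normalize "owner => first_name" to "owner=>first_name" and
            // fold case, since XML_PullParser is case-insensitive by default.
            $path = strtolower(preg_replace('/\s*=>\s*/', '=>', $m[1]));
            $node = $result[$path] ?? null;
            if ($node === null) {
                return ''; // unknown element: insert nothing
            }
            if (isset($m[2]) && $m[2] !== '') {
                return $node['attrs'][strtolower($m[2])] ?? ''; // attribute
            }
            return $node['text']; // character data
        },
        $pattern
    );
}

// One hypothetical result, keyed by lower-case element path.
$result = [
    'vehicle' => ['text' => '', 'attrs' => ['year' => '2004', 'make' => 'Acura', 'model' => '3.2TL']],
    'color' => ['text' => 'green', 'attrs' => []],
    'price' => ['text' => '33900', 'attrs' => []],
    'owner=>first_name' => ['text' => 'Michael', 'attrs' => ['middle_init' => 'M']],
    'owner=>last_name' => ['text' => 'Taylor', 'attrs' => []],
];

echo fillPattern("Year: {vehicle;year } Make: {vehicle;make} Color: {color}\n", $result);
echo fillPattern("Owner: {owner=>first_name} {owner=>first_name;middle_init} {owner=>Last_name}\n", $result);
```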
Notes
1. Since PHP programmers already have access to a rich set of data manipulation functions,
eXcavator concentrates solely on data retrieval.
XPath's other functions are dedicated to its tree structure and so are not relevant to
eXcavator. Similarly, XQuery supplies a programming
language for templating and data manipulation, resources built in to PHP.
2. See Listing 2 for a sample script.