Unless otherwise noted, the examples in this manual are based on the XML document in
the appendix.
Basic Query Unit
The basic syntactical expression in eXcavator is an element name followed
by a condition in square brackets:
element_name[condition]
If the condition evaluates to TRUE, then the query is moved along to the
next syntactical unit. The following is a valid query:
color[green]
If eXcavator finds a color element which holds "green", this expression
will evaluate to TRUE, but it will not do anything in itself. That is, it
will return TRUE, but it will not return any data. To have it return data,
we have to enclose the condition in double square brackets:
color[[green]]
Listing 1 below is a complete query session using this query:
Listing 1
$eXc = new eXcavator($doc, eXcavator_STRING);
$eXc->eXcavator_Query("color[[green]]");
echo $eXc->eXcavator_getResultAsString() . "\n";
/* Result
<COLOR>green</COLOR>
*/
There are a number of things to notice about the query in Listing 1.
- The expression [[green]] resolves to a test for equality. Moreover, the test is
case-insensitive. This is in keeping with the default behavior of XML_PullParser and
the expat XML parser. To change the default behavior, call XML_PullParser_caseSensitive(TRUE).
The result tells us that there is indeed a green car in our inventory.
- The query returns everything between the color element's START and END tags. If there were
other XML data between these START and END tags, that would be returned as well.
The CDATA keyword
We could have written the test for green using the CDATA keyword:
color[[CDATA = green]]
CDATA stands for any character data that is found in the
specified element. This expression makes clear that the test for "green" is
a test for equality, for which [[green]] is a short-cut. The CDATA keyword has
another and very important use. When used in a condition by itself, it matches
ANY character data found in the subject element:
color[[CDATA]]
Had we used this expression in Listing 1, then the query would have returned the color for
each of the vehicles in our inventory. The call to eXcavator_getResultAsString
would have yielded the following result:
<COLOR>green</COLOR>
<COLOR>white</COLOR>
Accessing Attributes
Attributes are designated with the @ prefix. For instance,
the vehicle element in the example document has three attributes: make, model, and year.
Assuming we had a large database of vehicles and wanted to locate all vehicles
of model years earlier than 2005, we could use the less-than operator and
write the following expression:
vehicle[[@year <2005]]
This would return all the vehicle elements which satisfy the condition. The returned
values would include the vehicle element and all its descendents;
in other words, they include the START and END tags of the vehicle
element and everything in between.
For an illustration of this, see the example in the section on Expression Chaining.
It is also possible to write an expression which tests for the presence of an attribute,
regardless of its value:
first_name[[@middle_init]]
A query using this expression would find all first_name elements
which have a middle_init attribute. The actual output from
our document, using eXcavator_getResultAsString, would be:
<FIRST_NAME MIDDLE_INIT = "M">Michael</FIRST_NAME>
<FIRST_NAME MIDDLE_INIT = "J">Douglas</FIRST_NAME>
Strings
If a string consists of only a single word, then quotes are not required:
color[green]
But if a string has more than one word, it must be enclosed in double quotes:
dealer:zip [["R3N 2B2"]]
This is true for attribute values as well. This means that a query string
which includes a quoted condition string is most easily enclosed in single quotes
(inside a double-quoted PHP string, the inner double quotes would have to be escaped):
$eXc->eXcavator_Query('dealer:zip [["R3N 2B2"]]');
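The quoting rule is a matter of PHP string syntax rather than of eXcavator itself. The following minimal sketch uses only core PHP (no eXcavator calls) to show that a single-quoted query string and a double-quoted one with escaped inner quotes are the same string; the variable names are illustrative only.

```php
<?php
// The query as written above: single quotes outside, double quotes inside.
$single = 'dealer:zip [["R3N 2B2"]]';

// The same query in a double-quoted PHP string; the inner quotes need escaping.
$escaped = "dealer:zip [[\"R3N 2B2\"]]";

// Both literals produce the same query string.
var_dump($single === $escaped); // bool(true)
```

Single quotes are usually the less error-prone choice, since PHP performs no variable interpolation inside them.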
Expression Chaining
Unlike most XML technology, XML_PullParser does not view the document as a tree.
Rather it reconfigures the XML document to a "flat" array, that is, a numerically indexed array
in which each array element holds either the name of a tag or data.
For this reason, eXcavator can access elements outside of a strict
hierarchical sequence, without having to work its way from root to branch.
This, for instance, would be a valid query:
last_name[[CDATA]]
With this query, eXcavator would return all the last names it finds.
The parser does not have to know the complete genealogy
of last_name in order to find it.
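To get a feel for what a "flat" view of a document looks like, the sketch below uses PHP's built-in xml_parse_into_struct function. This is an illustration only: it is not XML_PullParser's internal array, whose exact layout is not described here, and the one-line XML fragment is a made-up sample. It also shows PHP's default case folding, which reports tag names in upper case.

```php
<?php
// A made-up fragment, written without whitespace between tags so that the
// resulting array contains no whitespace-only entries.
$xml = '<owner><name><last_name>Jones</last_name></name></owner>';

$parser = xml_parser_create();
// Case folding is enabled by default, so tag names come back in upper case.
xml_parse_into_struct($parser, $xml, $values);

// $values is a flat, numerically indexed array with one entry per tag event.
foreach ($values as $i => $node) {
    $value = isset($node['value']) ? $node['value'] : '';
    printf("%d  %-10s %-9s %s\n", $i, $node['tag'], $node['type'], $value);
}
```

In such a layout an element like last_name can be found by scanning the array for its tag name, without first walking down from the root.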
Nevertheless, XML is by its nature hierarchical. And it's only by respecting its hierarchical
character that one can locate data with precision.
It's usually necessary to set the context for a search. For instance,
a query for name[[CDATA]] would return all the names found in the document, both
those of the owners and those of the dealers. The solution to this is expression
chaining, which uses the colon as a separator. In expression chaining
the element to the right of the colon must always be a descendent of the element
to the left of the colon; it does not have to be a first-generation child of the
left-hand element. Chaining allows sequences of elements, as long as the sequence
respects the principle that each element to the right of a colon must
be a descendent of the element to its left.
element_1:descendent_of_1...:descendent_of_n-1
Let's look at an example. The expression
owner:name[[CDATA]]
would return all the name elements that are descendents of owner.
In the example document there would be two results,
one for the owner named Taylor, the other for Jones. The data returned would
include everything between the name START and END tags, as in the
following instance:
<NAME>
<LAST_NAME>Jones</LAST_NAME>
<FIRST_NAME MIDDLE_INIT = "J">Douglas</FIRST_NAME>
<ADDRESS>
<STREET>200 Winnipegosis Ave</STREET>
<CITY>St Adolphe</CITY>
<CITY>Winnipeg</CITY>
<ZIP>R3L 1Z5</ZIP>
</ADDRESS>
</NAME>
Chaining allows for narrowing of focus. An important and powerful mechanism here is that it allows
each element in the chain to have its own condition. Suppose we want to locate the owner of the Honda.
The vehicle's make is an attribute of the vehicle
element. So, we could find the owner using this query:
vehicle[@make=Honda]:owner[[CDATA]]
eXcavator locates the vehicle with a
make attribute that equals "Honda", but since the condition is
not in double square brackets, it doesn't save the vehicle element.
Instead, it passes the query on to the next condition in the chain. If that evaluates to
TRUE, i.e. if there is character data in the owner element,
it saves the owner element and all its descendents.
Had the first condition been in double brackets,
it would have saved both a copy of the vehicle element
and a copy of owner, making for unnecessary duplication,
since the owner data is a subset of the vehicle element.
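The difference between single and double brackets in a chain can be checked with a short session modelled on Listing 1. This is a sketch under assumptions: $doc is a hypothetical, cut-down stand-in for the appendix document (its layout, models and years are guesses), and the script assumes that the eXcavator class has already been loaded, since the include path depends on the installation. Only the calls shown in Listing 1 are used.

```php
<?php
// Hypothetical stand-in for the appendix document; the real layout may differ.
$doc = <<<XML
<inventory>
  <vehicle make="Honda" model="Civic" year="2004">
    <color>green</color>
    <owner>
      <name>
        <last_name>Jones</last_name>
        <first_name middle_init="J">Douglas</first_name>
      </name>
    </owner>
  </vehicle>
  <vehicle make="Ford" model="Focus" year="2006">
    <color>white</color>
    <owner>
      <name>
        <last_name>Taylor</last_name>
        <first_name middle_init="M">Michael</first_name>
      </name>
    </owner>
  </vehicle>
</inventory>
XML;

// The same chain twice: first saving only owner, then saving vehicle as well.
$queries = array(
    'vehicle[@make=Honda]:owner[[CDATA]]',
    'vehicle[[@make=Honda]]:owner[[CDATA]]',
);

// Assumption: eXcavator has been included elsewhere; skip the run if not.
if (class_exists('eXcavator')) {
    foreach ($queries as $query) {
        // A fresh object per query, as a cautious choice for this sketch.
        $eXc = new eXcavator($doc, eXcavator_STRING);
        $eXc->eXcavator_Query($query);
        echo $query . "\n" . $eXc->eXcavator_getResultAsString() . "\n\n";
    }
} else {
    echo "eXcavator is not loaded; no queries were run.\n";
}
```

Comparing the two outputs should make the duplication described above visible: the second query is expected to include the vehicle element in addition to owner.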
Context and Context Element
Because eXcavator does not use strict hierarchical structures,
a chained element does not have to be a first-generation child of the element to
its left. As we have noted above, it needs only to be a descendent.
Therefore, this would be a valid query expression:
owner:street[[CDATA]]
street is a third-generation descendent of owner.
This query would return all the street data enclosed by owner elements.
Here owner is the context-element. eXcavator looks at each owner
element to see whether it has a descendent named street, and if it
finds one with character data, it returns the data.
As far as the eXcavator evaluation engine is concerned, all of these are equal in status:
owner:name,owner:last_name,owner:address,owner:street
It does not matter that these elements represent three generations of parents, children
and siblings. The only factor of significance is that they are all governed by the same context-element,
owner. In a chained sequence of elements, each element to the
left of the colon is the context and hence the context-element for the element to the right of
the colon.
If an element is the first element in the chain, i.e. the left-most element, then its context
is implicitly the root element. But any possible parent/grand-parent elements which it might have are out of scope.
Scope plays an important role when formatting text with eXcavator_getFormattedText.
eXcavator saves only the data which has been requested with the double brackets.
Therefore, the data from any contexts which precede the double brackets has not been saved
and is consequently out of scope:
vehicle[color=>green]:owner[[CDATA]]
The formatting method will be able to access the street element of owner,
but not the color or carfax elements, which appear only under vehicle.
The Arrow Operator
The arrow operator is used to meet a specific need in query processing.
Assume that there is more than one owner with a Honda, and that we'd like to get
the owner information for the owner named Jones. The following expression would
return only the last name:
owner:last_name[[Jones]]
And we already have that. If, instead, we wrote the expression as follows:
owner[[CDATA]]:last_name[[Jones]]
the evaluation engine would answer TRUE every time it encountered character
data in an owner element, so in effect we'd get all the owners in our list.
And then when it encountered Jones, it would turn out a single last_name
element with the name Jones. The only way to control the output is to
place owner in a context where it evaluates to FALSE in every case
except one: the case in which Jones occurs in last_name.
The solution to this problem is the ARROW operator, which enables us to place an element name
inside the condition:
owner[[last_name=>Jones]]
This expression tells eXcavator to look in each owner element
until it finds one with last_name equal to "Jones".
Only then will it return TRUE. The expression to the right of the ARROW
operator can be any valid eXcavator expression. For
instance, this could be rewritten as
[[last_name=>CDATA = Jones]]
or, if we wanted everyone but Jones:
[[last_name=>CDATA != Jones]] (see Note 1)
There is only one exception. The ARROW operator does not support the unqualified CDATA construct:
owner[[last_name=>CDATA]]
To get this result use the following:
owner[[last_name=>CDATA != ""]]
The Arrow Operator and Focussing in on Descendent Elements
Let us assume that we want to return an entire vehicle where the
dealer is in a particular city. We could use the following:
vehicle[[city=>CDATA _INCL "St Adolphe"]]
But this would match any city element which includes "St Adolphe",
so that if there were owners who lived in St. Adolphe, their vehicle
elements would also be returned. The way to deal with this is as follows:
vehicle[[dealer/address/city=>CDATA _INCL "St Adolphe"]]
This would also work:
vehicle[[dealer/city=>CDATA _INCL "St Adolphe"]]
Note: Each element in the descendent list has to be a descendent of the previous element in the
list.
The Comma Operator
The comma operator separates a list of parallel elements, each of which is a
descendent of the same context-element.
A context-element must always be explicitly supplied. Here is an example
query using the comma operator:
owner:last_name[[CDATA]],first_name[[CDATA]],street[[CDATA]]
This would also work, since all three of these elements are descendents of name:
owner:name:last_name[[CDATA]],first_name[[CDATA]],street[[CDATA]]
Using eXcavator_getResultAsXMLDoc to display the output from this query,
we get:
<?xml version = "1.0"?>
<__root__>
<LAST_NAME>Taylor</LAST_NAME>
<FIRST_NAME MIDDLE_INIT = "M">Michael</FIRST_NAME>
<STREET>323 Oak Bay</STREET>
<LAST_NAME>Jones</LAST_NAME>
<FIRST_NAME MIDDLE_INIT = "J">Douglas</FIRST_NAME>
<STREET>200 Winnipegosis Ave</STREET>
</__root__>
There are a number of points to keep in mind when using the comma operator.
- The results appear in the order in which they occur in the list. If street
had been placed first in the list, then it would appear first in the results.
- The evaluation engine will stop processing a list when it comes upon an expression which
evaluates to FALSE. So, if we only wanted the first name and street address
of owners with the last name of Jones, we could start the list with
owner:last_name[[Jones]].
But the following sequence could lead to unexpected results:
owner:first_name[[CDATA]],last_name[[Jones]],street[[CDATA]]
As eXcavator works its way through a database with this query,
it will pump out all the first names that it finds, because CDATA will always evaluate to TRUE
and because there is nothing in front of it that evaluates to FALSE.
But the last_name element will evaluate to TRUE only when the evaluation engine
encounters "Jones", at which point eXcavator will save that element and its data.
Having encountered "Jones", it then goes on to the next expression,
which is street[[CDATA]], and it saves that element.
In every other case, the last_name expression evaluates to FALSE and eXcavator
goes on to the next owner, saving neither last_name nor street.
So, we end up with a long list of first names and one entry each for
Jones's last name and street address.
- The elements in the list must be precisely parallel, so that this would not give us
the street, even though address:street is a descendent of owner:
owner:name:last_name[[CDATA]],first_name[[CDATA]],address:street[[CDATA]]
This, however, would give us the names and the address elements:
owner:name:last_name[[CDATA]],first_name[[CDATA]],address[[CDATA]]
- If a context-element is not explicitly supplied, the query will fail.
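The effect of list order described in the second point can be tried out by running the two orderings side by side. As before, this is a sketch under assumptions: $doc is a hypothetical, cut-down stand-in for the appendix document (the wrapping owners element is invented; names and streets are taken from the output shown above), and the script assumes that the eXcavator class has already been loaded. Only methods named in this section are called.

```php
<?php
// Hypothetical stand-in for the owner data in the appendix document.
$doc = <<<XML
<owners>
  <owner>
    <name>
      <last_name>Taylor</last_name>
      <first_name middle_init="M">Michael</first_name>
      <address><street>323 Oak Bay</street></address>
    </name>
  </owner>
  <owner>
    <name>
      <last_name>Jones</last_name>
      <first_name middle_init="J">Douglas</first_name>
      <address><street>200 Winnipegosis Ave</street></address>
    </name>
  </owner>
</owners>
XML;

$queries = array(
    // Restrictive test first: only the Jones entry should get past it.
    'owner:last_name[[Jones]],first_name[[CDATA]],street[[CDATA]]',
    // Unrestricted test first: every first name is saved before Jones is tested.
    'owner:first_name[[CDATA]],last_name[[Jones]],street[[CDATA]]',
);

// Assumption: eXcavator has been included elsewhere; skip the run if not.
if (class_exists('eXcavator')) {
    foreach ($queries as $query) {
        // A fresh object per query, as a cautious choice for this sketch.
        $eXc = new eXcavator($doc, eXcavator_STRING);
        $eXc->eXcavator_Query($query);
        echo $query . "\n" . $eXc->eXcavator_getResultAsXMLDoc() . "\n\n";
    }
} else {
    echo "eXcavator is not loaded; no queries were run.\n";
}
```

Placing the most restrictive expression first keeps the unrestricted CDATA tests from saving data for owners who are later rejected.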
The _OR_ and _AND_ Operators
Beginning with version 1.0.2 of eXcavator, these operators have an enhanced
functionality which is detailed in the next section of the manual.
Notes
1. The full range of operators and expressions is detailed on the Operators page.