XML_PullParser_getTextStripped takes one optional parameter, which can be either a tokenized array or the name of an element.
If a name is passed in, or if no parameter is passed in, it assumes that the subject of
the request is either the $current_element or the current token.
If a token is passed in, it uses that token. Its defining characteristic is
that it does not observe element boundaries. It returns a concatenated string made up of all the
text found in the token. This includes the text of all children and descendent elements.
It includes as well all white space separating element from element, and white space includes
new-lines. The default delimiter between the concatenated members of this string is a single
space character. This can be changed using
string XML_PullParser_setDelimiter (string $delimiter)
The returned string is the old delimiter, which can then be restored, if necessary, with
a second call to XML_PullParser_setDelimiter.
A second modifier is
void XML_PullParser_trimCdata (boolean $bool)
Setting this to true will cause all text extracted from each element to be passed through
the PHP trim function.
In the following example the <emphasis> element is concatenated with the
first <News_item> element and both with the second <News_item>; they are separated
by the default delimiter, a single space:
XML:
<News_item>
There was a <emphasis>big</emphasis> rainstorm last night.
</News_item>
<News_item> It rained cats and dogs </News_item>
Result: There was a big rainstorm last night. It rained cats and dogs
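The concatenation described above can be approximated in a few lines of plain PHP. This is only a rough sketch: `sketch_text_stripped` is a hypothetical function, not part of XML_PullParser, and it splits on tags with a crude regular expression instead of tokenizing. It also trims every fragment and drops blank ones, so it corresponds to the output with trimming and blank exclusion switched on, not to the default behavior, which keeps the white space.

```php
<?php
// Hypothetical stand-in for illustration only; not XML_PullParser's own code.
// Splits on tags with a crude regex, so it suits only simple, well-formed snippets.
function sketch_text_stripped(string $xml, string $delimiter = ' '): string
{
    // Every run of character data between two tags becomes one fragment.
    $fragments = preg_split('/<[^>]+>/', $xml);

    // Trim each fragment and drop the ones that held only white space.
    $fragments = array_filter(array_map('trim', $fragments), 'strlen');

    // Element boundaries are ignored: everything is joined into one string.
    return implode($delimiter, $fragments);
}

$xml = "<News_item>\nThere was a <emphasis>big</emphasis> rainstorm last night.\n</News_item>\n"
     . "<News_item> It rained cats and dogs </News_item>";

echo sketch_text_stripped($xml), "\n";        // default single-space delimiter
echo sketch_text_stripped($xml, ';;'), "\n";  // same fragments, custom delimiter
```

The second call shows why the delimiter matters: with ';;' the individual fragments stay recoverable after concatenation.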
2. XML_PullParser_getTextArray
This method is a front-end to XML_PullParser_getTextStripped.
It returns an array of the strings in the element specified in the parameter, which
is required and which is either a tokenized array or a string and is treated exactly the same
as the parameter to XML_PullParser_getTextStripped.
This method takes advantage of the fact that XML_PullParser_getTextStripped ignores
element boundaries and returns a concatenated string of texts separated by a pre-set delimiter.
It changes the delimiter to ';;' by calling XML_PullParser_setDelimiter(';;');
then it creates the array by calling explode on the string. It then resets the delimiter to
its old value. Obviously, this means that if the text in a database itself contains a double semi-colon,
this method will split it in the wrong place, but its logic is easy enough to duplicate with a different delimiter.
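The swap, explode and restore sequence can be sketched with a miniature class. Everything here is hypothetical: `DelimiterSketch` holds ready-made text fragments instead of parsing XML, and only its `setDelimiter` contract (set the new delimiter, return the old one) is taken from the description above. It also shows the double semi-colon problem and one way around it, under the assumption that the caller can pick a separator the data never contains.

```php
<?php
// Hypothetical miniature, for illustration only: it holds ready-made text
// fragments instead of parsing XML, and is not XML_PullParser's own code.
class DelimiterSketch
{
    private string $delimiter = ' ';
    private array $fragments;

    public function __construct(array $fragments)
    {
        $this->fragments = $fragments;
    }

    // Mirrors the documented contract: set the new delimiter, return the old one.
    public function setDelimiter(string $delimiter): string
    {
        $old = $this->delimiter;
        $this->delimiter = $delimiter;
        return $old;
    }

    public function getTextStripped(): string
    {
        return implode($this->delimiter, $this->fragments);
    }

    // Swap the delimiter, explode the joined string, then restore the delimiter.
    public function getTextArray(string $separator = ';;'): array
    {
        $old = $this->setDelimiter($separator);
        $parts = explode($separator, $this->getTextStripped());
        $this->setDelimiter($old);
        return $parts;
    }
}

$ok = new DelimiterSketch(['Gone With The wind', 'How Green Was My Valley', 'Jurassic Park']);
print_r($ok->getTextArray());

// A text that itself contains ';;' is split in the wrong place ...
$clash = new DelimiterSketch(['a;;b', 'c']);
print_r($clash->getTextArray());        // three entries instead of two

// ... so the same steps can be repeated with a separator the data cannot contain.
print_r($clash->getTextArray("\x1F"));  // two entries
```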
Let's assume the following database to demonstrate XML_PullParser_getTextArray.
Example 1: Movies.xml
<Movies>
<Movie>
<Title>Gone With The wind</Title>
<date>1939</date>
<leading_lady>Vivien Leigh</leading_lady>
<leading_man>Clark Gable</leading_man>
</Movie>
<Movie>
<Title>How Green Was My Valley</Title>
<date>1941</date>
<leading_lady>Maureen O'Hara</leading_lady>
<leading_man>Walter Pidgeon</leading_man>
</Movie>
<Movie>
<Title>Jurassic Park</Title>
<date>1993</date>
<leading_lady>Laura Dern</leading_lady>
<leading_man>Sam Neil</leading_man>
</Movie>
</Movies>
To get all the titles from Movies.xml, all that's necessary is the following call:
$parser->XML_PullParser_getTextArray("Title")
The technique is demonstrated in Listing 17:
Listing 17
$tags = array("Movies");
$child_tags = array();

$parser = new XML_PullParser("Movies.xml", $tags, $child_tags);

$token = $parser->XML_PullParser_getToken();

$text_array = $parser->XML_PullParser_getTextArray("Title");
print_r($text_array);
/*
Result
Array
(
[0] => Gone With The wind
[1] => How Green Was My Valley
[2] => Jurassic Park
)
*/
One precautionary note. Given the current coding, the following call
will not return the expected result:
$parser->XML_PullParser_getTextArray("Movie")
The expected result is:
Array
(
[0] => Gone With The wind
[1] => 1939
[2] => Vivien Leigh
[3] => Clark Gable
[4] => How Green Was My Valley
[5] => 1941
[6] => Maureen O'Hara
[7] => Walter Pidgeon
[8] => Jurassic Park
[9] => 1993
[10] => Laura Dern
[11] => Sam Neil
)
But instead we get:
Array
(
[0] =>
[1] =>
[2] => Gone With The wind
[3] =>
[4] => 1939
[5] =>
[6] => Vivien Leigh
[7] =>
•
•
•
)
The empty array elements represent the white space, new-lines included, that separates one element
from the next in Movies.xml: every text value is preceded by a blank entry, and the two blanks at [0] and [1]
come from the line breaks after the <Movies> and <Movie> start tags. What's required here is a call to
XML_PullParser_excludeBlanksStrict with a value of true. That
gets rid of all the blank elements and gives the expected result.
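To see the effect of removing the blanks, the same clean-up can be applied by hand to an array that has already been returned. The helper `drop_blank_entries` below is hypothetical and is not how XML_PullParser_excludeBlanksStrict is implemented; the input array is a hand-built copy of the first few entries of the unwanted result, assuming each blank entry holds a single new-line.

```php
<?php
// Hand-built copy of the first entries of the unwanted result shown above
// (assumption: each blank entry holds just a new-line).
$with_blanks = ["\n", "\n", 'Gone With The wind', "\n", '1939', "\n", 'Vivien Leigh', "\n"];

// Hypothetical helper: drop entries that contain nothing but white space, then re-index.
function drop_blank_entries(array $texts): array
{
    $kept = array_filter($texts, function ($text) {
        return trim($text) !== '';
    });
    return array_values($kept);
}

print_r(drop_blank_entries($with_blanks));
```

Re-indexing with array_values matters here, because array_filter keeps the original keys and would otherwise leave gaps in the numbering.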
3. XML_PullParser_getText
All calls to this method are eventually passed on to XML_PullParser_getTextStripped.
XML_PullParser_getText identifies and prepares the element which will be passed in
to XML_PullParser_getTextStripped, and that method then returns all the text found
in the element in accordance with the rules that govern its return values.
XML_PullParser_getText takes three optional parameters, $el,
which is a tokenized element (an array) or its name (a string), a
$which value, and the boolean $xcl. In its default state, none of these parameters are passed in and
it uses either the $current_element or the current token, whichever is currently operative,
together with a $which value of zero and an $xcl value of FALSE.
The following listing demonstrates the use of the defaults;
it uses the DNS example we've worked with throughout.
Listing 18
1. $tags = array("entry");
2. $child_tags = array("server","domain");
3.
4. $parser = new XML_PullParser("DNS.xml",$tags,$child_tags);
5.
6. $parser->XML_PullParser_getToken();
7. echo $parser->XML_PullParser_getText() . "\n";
8.
9. $el = $parser->XML_PullParser_getElement("server");
10. echo $parser->XML_PullParser_getText() . "\n";
11.
12.
13. $parser->XML_PullParser_getElement("domain");
14. echo $parser->XML_PullParser_getText() . "\n";
/*
Result
172.20.19.6
example.com
example_1.com
example_2.com
example_3.com
www.example.com
example_1.com example_2.com example_3.com
example.com
*/
Line 6 retrieves the entire Entry element and all of its children, and these
are output on line 7, giving us the first block of the Result section. This consists of everything
included in the element and all of the white space, which is why the result appears on separate
lines. Had we called XML_PullParser_excludeBlanks(true) the result would have
appeared as a single line of text:
172.20.19.6 example.com example_1.com example_2.com example_3.com www.example.com
The result from the call to XML_PullParser_getElement('server') in line 9 appears
on a single line, because XML_PullParser_getElement incorporates into the token
only the server elements. In this case, any whitespace found within the elements
themselves would appear in the result but not the whitespace separating element from element.
It's the latter, with its new-lines, which causes the texts derived from the
$converted_token created by XML_PullParser_getToken
to be printed on separate lines.
The call to XML_PullParser_getElement('domain') in line 13
yields
example.com
because there is only one domain element in the XML document. Had there been more than
one domain element we would have to use the $which parameter to
single out the desired domain element. The same mechanism applies, of course, to
the server elements.
A Closer Look at the Parameters to XML_PullParser_getText
The element parameter ($el) passed in to XML_PullParser_getText can be either a string,
which is the name of an element, or a tokenized array.
- If the element parameter is the name of an element,
then either the $current_element or the current token will be searched for the named element,
depending on which is currently operative.
The method returns the which_th instance of that element. If $which = 0,
it will return the texts from all instances of the named element found in the token.
- If the element parameter is a tokenized array, the method will
return the character data from the which_th element found in the array.
If $which = 0, it will return all the character data found in the array.
This is the rule which governs the output of line 7 in Listing 18 above.
That is, no parameters are passed into the method, so that the default token becomes the entire
<ENTRY> array and $which defaults to zero. Therefore, all the
character data found in the default token is returned: all parents and all descendents.
The difference between the two sets of returned values arises out of what the method knows. In the first
case, it knows the name of the element and can therefore search the default token for one or more instances
of the named element. In the second case, it doesn't have the name of an element. Therefore, if
it's passed a $which value of 1, it returns the character data of the first element,
regardless of its name.
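The $which convention itself, zero for every instance and a one-based position otherwise, can be sketched independently of the parser. The function `pick_which` is hypothetical and works on a plain array of texts; returning null for a position that does not exist is an assumption made for this sketch, not documented behavior of XML_PullParser_getText.

```php
<?php
// Hypothetical helper illustrating the $which convention only;
// it is not part of XML_PullParser.
function pick_which(array $texts, int $which = 0, string $delimiter = ' '): ?string
{
    if ($which === 0) {
        return implode($delimiter, $texts);  // zero selects every instance
    }
    // 1 selects the first instance, 2 the second, and so on.
    // Assumption: a position that does not exist yields null.
    return $texts[$which - 1] ?? null;
}

// Texts of the three server elements from the Result section of Listing 18.
$servers = ['example_1.com', 'example_2.com', 'example_3.com'];

echo pick_which($servers), "\n";     // all three, joined by the delimiter
echo pick_which($servers, 2), "\n";  // only the second
```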
The third parameter to XML_PullParser_getText is the boolean $xcl.
This parameter plays a part only in the handling of arrays, that is, where
$el is a tokenized array or one of the two default tokens. It defaults to FALSE. But when it is set to TRUE,
the subject array is filtered through XML_PullParser_childXCL.
This means that all descendent elements are removed and that we are left with an array consisting
solely of the parent or of elements with the same name as the first top-level element
and which are themselves not descendents of any other element.
| Note |
|---|
| Prior to release 1.2.1, if the $el parameter was the name of the default token, NULL was returned. In current releases, if $el is the name of the default token, the behavior is the same as the behavior when an array is passed in as $el. |
4. XML_PullParser_getTextMarkedUp
This function is designed for converting streams of XML to HTML. It converts XML elements to
HTML tags. Otherwise, its functionality is essentially the same as
that of XML_PullParser_getTextStripped, with one exception: it is not subject to the
CDATA modifiers.
It takes two parameters. The first is the $markup array, which maps XML elements to HTML tags;
the second is an optional element parameter consisting of either a tokenized array or the name of an element.
The element parameter behaves exactly as it does in XML_PullParser_getTextStripped.
The advantage of placing the optional element parameter last is that it can be omitted when one of the
two default tokens is being used. All that is needed then is to
pass in the $markup array.
The markup array uses four helper methods:
- array XML_PullParser_getCSSSpans (array $markup)
- array XML_PullParser_getHTMLTags (array $markup)
- array XML_PullParser_getStyledSpans (array $markup, array $attributes)
- array XML_PullParser_getStyledTags (array $markup, array $attributes)
All the parameters are associative arrays. In the two "Spans" methods, the $markup
arrays map XML element names to HTML class names:
array("code"=>"code", "emphasis"=>"bold_italic", "classname"=>"cname")
These will create <SPAN> tags with the class attribute set to the
mapped value:
<span class="cname">XML_PullParser</span>
In the two "Tags" methods, the $markup arrays map XML element names to
standard HTML tag names:
array("code"=>"code", "emphasis"=>"b", "classname"=>"i")
The $attributes parameter of the two "Styled" methods allows for additional
attributes to be inserted in the HTML tags. For the most part these will be style
attributes, but technically they can be anything. The $attributes parameters
are also associative arrays:
array("style"=>"font-size: 10pt; text-decoration:underline",
"style"=>"background-color:blue; color: yellow;", "style"=>"color: #999999")
The $attributes array has to be sequentially parallel to the $markup
array, so that if the above styles were applied to the tags example, the first tag would
get the first style, the second tag the second style, etc:
<code style="font-size: 10pt; text-decoration:underline">$markup</code>
<b style="background-color:blue; color: yellow;">This is BOLD yellow on Blue</b>
The $markup arrays can be concatenated:
$markup = $parser->XML_PullParser_getCSSSpans(array(. . . .));
$markup += $parser->XML_PullParser_getHTMLTags(array(. . . .));
$markup += $parser->XML_PullParser_getStyledTags (array(. . . .), array(. . . .));
$text = $parser->XML_PullParser_getTextMarkedUp($markup);
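One detail of this concatenation is worth keeping in mind: on PHP arrays, += is the union operator, so when the same key occurs on both sides the entry that is already in $markup is kept and the later one is ignored. The sketch below uses plain arrays with placeholder values as stand-ins for what the helper methods return; the real structure of those return values is not shown in this section, so the assumption that they are keyed by XML element name is labeled as such.

```php
<?php
// Plain arrays standing in for the helper methods' return values.
// Assumption: the real arrays are keyed by XML element name; the values
// here are placeholders. Only the behavior of += is being shown.
$markup  = ['code' => 'first mapping for code', 'emphasis' => 'mapping for emphasis'];
$markup += ['code' => 'second mapping for code', 'classname' => 'mapping for classname'];

// 'code' keeps its first mapping; 'classname' is added as a new key.
print_r($markup);
```

If that assumption holds, an element that should be styled by a later helper call has to be left out of the earlier ones, or the calls have to be reordered.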
A final point. This manual was written in conformance with the Docbook specification.
XML_PullParser_getTextMarkedUp has built-in support for the Docbook ulink element
and will automatically convert a ulink to an HTML A tag:
<ulink url="http://XML_PullParse.org/manual.html">Manual</ulink>
<A href="http://XML_PullParse.org/manual.html">Manual</A>