XML_PullParser is a token-based interface to the PHP expat XML library.
It moves the API of the PHP XML facility from an event-based model to a token-based model. Instead of processing data as it is passed from the parser to callbacks, a script using PullParser requests "tokens" from XML_PullParser_getToken(). Tokens are arrays representing XML structures, which become available in the order in which they appear in the document being parsed.
Methods are provided to get tokens and extract their data. The API consists of the methods and functions with the XML_PullParser_ prefix. Methods beginning with the underscore are internal.
All class methods which return tokens return NULL when tokens are not available and so can be used in while loops:
while($token = $parser->XML_PullParser_getToken()) {
    // process $token
}
Similarly, all data accessors return NULL, the empty string, or an empty array when no data is available.
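The loop idiom relies only on the NULL-on-exhaustion contract. A minimal sketch of that contract, independent of the library: the TokenQueue class below is a hypothetical stand-in for the parser, and the token contents are invented for illustration.

```php
<?php
// Hypothetical stand-in for a pull parser: hands out queued tokens one at a
// time and returns NULL once none are left. Not part of XML_PullParser.
class TokenQueue
{
    private $tokens;

    public function __construct(array $tokens)
    {
        $this->tokens = $tokens;
    }

    public function getToken()
    {
        // array_shift() returns NULL on an empty array, which ends the loop.
        return array_shift($this->tokens);
    }
}

$queue = new TokenQueue([['SERVER', 'ns1'], ['SERVER', 'ns2']]);
$seen = [];
while ($token = $queue->getToken()) {
    $seen[] = $token[1];
}
```

Because the loop tests the return value for truthiness, any accessor used this way must never hand back an empty array as a valid token.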
Documentation and examples are available in this Class Documentation created using phpDocumentor; in the manual pages, which were created from XML files using XML_PullParser; and in the sample files, which are complete PHP files based on the manual code listings.
Located in /XML_PullParser.inc (line 454)
| Class | Description |
|---|---|
| XML_PullParser_doc |
Created in XML_PullParser_setAttrLoop and used as a stack by XML_PullParser_nextAttr to return the next attribute.
This stack remains in memory until the next call to XML_PullParser_setAttrLoop or XML_PullParser_setAttrLoop_elcd
The stack pointer can be reset to the top of the stack using XML_PullParser_resetAttrLoopPtr
Elements declared in both the $tags array and the $child_tags array.
A separate stack is maintained for these and they are accessed through XML_PullParser_getEscapedToken
$_next_element_array == $current_element; the assignment is made in XML_PullParser_getElement.
It is used by XML_PullParser_nextElement as a stack for returning the next element; when the final element is removed, the stack is exhausted.
Used by methods which silently reset the $current_element to an alternate token.
Excludes specified child elements from parent array and returns resulting array
If no elements are specified for exclusion, it excludes all child elements and returns the parent alone, from which all child elements have been removed.
This method assumes that the first element that it encounters is the parent and all others are its descendent elements. The descendent elements may themselves have descendent elements.
If there is more than one top-level element of the same name as the parent, these are also included in the returned array. This might be the case, for instance, in a tokenized array returned originally by XML_PullParser_getElement.
Clear the escaped token stack
Removes all blank CDATA array elements from $token
Tests all CDATA packets against the PERL regex '/\w/' and deletes from the token array any CDATA element which does not meet this test.
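The effect of the '/\w/' test can be sketched on a plain list of strings. The $cdata list below is hypothetical; the layout of CDATA inside a real token is internal to the class.

```php
<?php
// Hypothetical list of CDATA packets; the real token layout is internal.
$cdata = ["ns1.example.net", "\n   ", "", "--", "192.168.0.1"];

// Keep only packets that contain at least one word character (/\w/).
$kept = array_values(array_filter($cdata, function ($text) {
    return preg_match('/\w/', $text) === 1;
}));
```

Note that a packet made up only of punctuation, such as "--", fails the test just as whitespace does.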
Free the parser object.
Needed only if restarting the parsing process.
Returns an associative array of attribute-name/attribute-value pairs found in the specified element.
Example:
Array
(
[IP2] => 192.168.0.1
[TECH] => tech@example.net
[IP3] => 192.168.0.2
[IP4] => 192.168.0.3
)
When namespace support is not in effect, individual names and values can be extracted from this array with the standard PHP constructs each and foreach; or, where $name is known, the value can be read directly: $value = $array[$name]. It is safest, however, to use XML_PullParser_getAttrVal, which guarantees conformance with the current case-folding setting.
When namespace support is invoked, XML_PullParser_getAttrVal should always be used, because attribute names are re-formatted into keys that include the namespace URIs.
Get the value of an attribute.
This method guarantees conformance with the current case-folding setting.
It should always be used to access attribute values when namespace support is in effect.
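Why a case-aware accessor is safer than indexing the array directly can be sketched as follows. The attrVal function is a hypothetical illustration, not the library's implementation, and it ignores namespaces.

```php
<?php
// Hypothetical helper, not the library's code: look up an attribute by name,
// folding the requested name to upper case when case folding is on.
function attrVal(array $attrs, $name, $caseFolding = true)
{
    $key = $caseFolding ? strtoupper($name) : $name;
    return isset($attrs[$key]) ? $attrs[$key] : null;
}

$attrs = ['IP2' => '192.168.0.1', 'TECH' => 'tech@example.net'];
$folded   = attrVal($attrs, 'tech');         // found through the folded key
$unfolded = attrVal($attrs, 'tech', false);  // exact match only: not found
```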
This method excavates all the attributes of a specified element.
It is guaranteed to return all attribute names and values even in cases where a parent has more than one child of the same name with same-named attributes. For instance:
<directory>
<file name = "tutorial.doc" role = "doc" />
<file name = "classes.php" role = "php" />
<file name = "constants.inc" role = "php" />
</directory>
The above would return the following array:
Array
(
[0] => Array
(
[NAME] => tutorial.doc
[ROLE] => doc
)
[1] => Array
(
[NAME] => classes.php
[ROLE] => php
)
[2] => Array
(
[NAME] => constants.inc
[ROLE] => php
)
)
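The reason for returning one array per element can be shown with plain PHP arrays. This sketch only mirrors the shape of the result above; it does not call the parser.

```php
<?php
// Attribute sets of three same-named children, as in the <file> example.
$files = [
    ['NAME' => 'tutorial.doc',  'ROLE' => 'doc'],
    ['NAME' => 'classes.php',   'ROLE' => 'php'],
    ['NAME' => 'constants.inc', 'ROLE' => 'php'],
];

// Merging into a single associative array: later values overwrite earlier ones.
$flat = [];
foreach ($files as $attrs) {
    $flat = array_merge($flat, $attrs);
}

// Keeping one array per element preserves every value.
$names = array_column($files, 'NAME');
```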
array($child=>$parent). $child and $parent can be the same.
Get the value of an attribute if it falls within the current namespace definition
This method was designed primarily for internal use but may have applicability in some scripting situations. Generally, however, XML_PullParser_getAttrVal should be used to get attribute values.
attribute-name=>attribute-value
attribute-name is the name supplied by XML_PullParser_NS, which is an internally constructed key. The keys can be derived from the attribute arrays supplied by XML_PullParser_getAttributes, XML_PullParser_nextAttr and XML_PullParser_getAttrValues.
Retrieves a child element and its dependents from its parent
This method extracts individual child elements from either $el or, if $el is not specified, from the $current_element or, failing that, from the current token.
$which specifies which instance of the child element to extract; the instances are treated as a sequence, in the order of appearance in the element's array, which follows the order of appearance in the XML document.
If $child is the name of $el and $which == 1, then $el will be returned.
This method will extract all the children named by $child from $el. If $el is not specified, then the $current_element is used. If the $current_element is not set, because XML_PullParser_getElement has not been called, then it searches the current token returned by XML_PullParser_getToken.
Fetches children from parent using a string to specify the parent element.
See XML_PullParser_getChildren for the method which fetches children from a parent specified as an array.
This method uses either $current_element or current token as the array from which to derive the child elements. If $current_element is not set then the current token is used.
This takes an associative array of XML element tags and CSS class names and converts it to an array structure suitable for use in XML_PullParser_getTextMarkedUp:
array(xml_element =>css_class_name, xml_element =>css_class_name, . . )
For instance:
array("code"=>"code", "emphasis"=>"boldface_italic");
Gets tokenized arrays of elements specified in the $child_tags array.
This method is second only to XML_PullParser_getToken in importance. XML_PullParser_getToken returns elements specified in the tags array, XML_PullParser::$tags, whereas this method returns elements specified in the child tags array, XML_PullParser::$child_tags. The array it returns consists of all the elements named $el found in $converted_token and all of their dependents.
get the name of the element array $el or the element name portion of the internal string representation of the element
Returns a single escaped token on each call.
An escaped token is an element which is declared in both the $tags array and the $child_tags array. A separate stack is created for these tokens. Each time XML_PullParser_getEscapedToken returns a token, the token is popped off the stack, until the stack is exhausted, at which point it returns NULL, making this method suitable for use in a loop.
The stack is persistent. If it is not exhausted and if the file being processed is larger than $read_length, tokens will be added to the stack when the next chunk of the file is parsed.
To clear the stack at any point, call XML_PullParser_clearEscapedTokens.
An escaped token can be accessed by XML_PullParser_getEscapedToken at any time, as long as it is still on the stack. It can also be accessed in normal document order by XML_PullParser_getElement. But escaped tokens are not returned by XML_PullParser_getToken. However, if an escaped element is the child of the current token, then it can be accessed in the usual ways, e.g. XML_PullParser_getChild.
This takes an associative array of XML element tags and HTML tags and converts it to an array structure suitable for use in XML_PullParser_getTextMarkedUp:
array(xml_element =>html_tag_name, xml_element =>html_tag_name, . . )
For instance:
array("code"=>"code", "emphasis"=>"b", "classname"=>"i");
Gets unqualified attribute name from the internally created attribute key
The attribute key is created internally for namespace-qualified attributes from both the attribute name and the namespace
Extracts the namespace URI from an internally constructed key for either attributes or elements
Element namespaces are held along with attributes in the attribute array assigned to an element:
Array
(
[HTTP://EXAMPLE.COM/DNS.TXT/|IP] => 192.168.10.3
[_ns_] => Array
(
[HTTP://EXAMPLE.COM/DNS.TXT/] => 1
)
)
These arrays are returned by XML_PullParser_getAttributes or can be extracted from the arrays returned by XML_PullParser_getAttrValues and XML_PullParser_nextAttr.
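Judging from the listing above, a qualified key joins the namespace URI and the attribute name with a vertical bar. Under that assumption, the two parts can be separated as in this sketch. The splitNsKey function is hypothetical; in a script, the class's own accessors should be preferred.

```php
<?php
// Assumption: keys look like "<namespace URI>|<attribute name>", as in the
// listing above. splitNsKey is a hypothetical helper, not a library method.
function splitNsKey($key)
{
    $pos = strrpos($key, '|');
    if ($pos === false) {
        return [null, $key];  // unqualified name: no namespace part
    }
    return [substr($key, 0, $pos), substr($key, $pos + 1)];
}

list($uri, $name) = splitNsKey('HTTP://EXAMPLE.COM/DNS.TXT/|IP');
list($noUri, $plain) = splitNsKey('IP');
```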
Get listing of all elements in sequence, including those of dependents, found in the array $el.
If $el is not supplied, then this function looks for the $current_element, and if that is not found it uses the current token.
The list consists of an associative array in which the keys are the names of the elements and the values the sequence number in the array being scanned. For instance, if the $current_element is set to "SERVER", then:
Array
(
[0] => Array
(
[SERVER] => 1
)
[1] => Array
(
[SERVER] => 2
)
[2] => Array
(
[SERVER] => 3
)
)
This array is used to sequence through elements with functions that take a position number, for instance XML_PullParser_getText and XML_PullParser_getAttributes.
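Walking such a sequence array takes two nested loops, since each entry is a one-item array of name and position. The sketch below uses a hand-written array in the shape shown above and only records the pairs; with a live parser they would be passed on to an accessor that takes a position number.

```php
<?php
// Sequence array in the shape shown above (values are 1-based positions).
$sequence = [['SERVER' => 1], ['SERVER' => 2], ['SERVER' => 3]];

$calls = [];
foreach ($sequence as $entry) {
    foreach ($entry as $name => $position) {
        // With a live parser, $name and $position would be handed to an
        // accessor that takes a position number; here they are only recorded.
        $calls[] = [$name, $position];
    }
}
```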
XML_PullParser_getSequence silently resets the $current_element to $el. To set it back to its original value after the sequence array has finished its work, call:
$parser->XML_PullParser_resetCurrentElement($parser->_save_current_element)
Adds attributes to spans and classes to spans and is used with XML_PullParser_getTextMarkedUp
This method takes two parameters, $attributes and $markup. Both are associative arrays. The $markup array is the same as the XML_PullParser_getCSSSpans array:
array(xml_element =>css_class_name, xml_element =>css_class_name, . . )
The $attributes array is an associative array of this format:
array(html_attribute =>attribute_value, . . )
If the first element in the $markup array has the following form:
("emphasis"=>"bold_text")
and if its counterpart in the $attributes array has this format:
("style"=>"font-size: 10pt")
the tag would become:
<span class="bold_text" style="font-size: 10pt">
The $markup array always defaults to
class="markup"
The attribute/value pair of the $attributes array can be any valid markup.
The two arrays must be sequentially parallel, so that $markup-1 is modified by $attributes-1, etc. There cannot be duplicate keys, since the last duplicate overwrites the previous.
It returns an array dedicated for use in XML_PullParser_getTextMarkedUp; it can be combined with arrays returned by XML_PullParser_getStyledTags, XML_PullParser_getCSSSpans, and XML_PullParser_getHTMLTags.
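The positional pairing of the two arrays, and the duplicate-key caveat, can be sketched in plain PHP. This is an illustration of the rule only; the array actually returned by the method has an internal format that is not reproduced here, and the sample values are invented.

```php
<?php
// Hypothetical inputs: entry N of $markup is modified by entry N of $attributes.
$markup     = ['emphasis' => 'bold_text', 'code' => 'code'];
$attributes = ['style' => 'font-size: 10pt', 'title' => 'listing'];

$classes    = array_values($markup);
$attrNames  = array_keys($attributes);
$attrValues = array_values($attributes);

$tags = [];
foreach (array_keys($markup) as $i => $element) {
    $tags[$element] = sprintf(
        '<span class="%s" %s="%s">',
        $classes[$i],
        $attrNames[$i],
        $attrValues[$i]
    );
}

// Duplicate keys collapse: only the last "style" survives, so the two
// arrays would no longer line up.
$dupes = ['style' => 'color: red', 'style' => 'color: blue'];
```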
This function is almost identical to XML_PullParser_getStyledSpans, except that it modifies standard HTML tags, so that one could convert
<b> to <b class="title">
Get character data from the specified element.
If no parameters are passed to this method, the subject of the search defaults to $current_element or, if that has not been set, to the current token. All the character data of the default array is returned. This is in keeping with rule #1 below, where $which = 0 and $el is an array. (It is also implied by rule #5.)
All requests to this function are preprocessed here and ultimately passed on to XML_PullParser_getTextStripped, which means that its output is subject to the CDATA modifiers: XML_PullParser_excludeBlanks, XML_PullParser_trimCdata, and XML_PullParser_excludeBlanksStrict.
Gets an array of strings consisting of the character data specified by the parameter $el, where $el is either a string naming the element or an array holding the element.
The return value is a numerically indexed array of strings which reflects the structure of the element referenced. If the element specified by $el is, for instance, a structure such as
<Movies>
<Movie>
<Title>Gone With The Wind</Title>
<date>1939</date>
<leading_lady>Vivien Leigh</leading_lady>
<leading_man>Clark Gable</leading_man>
</Movie>
<Movie>
<Title>How Green Was My Valley</Title>
<date>1941</date>
<leading_lady>Maureen O'Hara</leading_lady>
.
</Movie>
<Movie>
<Title>Jurassic Park</Title>
.
.
</Movie>
</Movies>
XML_PullParser_getTextArray("Title") will return an array of titles.
Array
(
[0] => Gone With The Wind
[1] => How Green Was My Valley
[2] => Jurassic Park
)
But XML_PullParser_getTextArray("Movies") will return an array consisting of all the character data between the <Movies> and </Movies> tags:
Array
(
[0] => Gone With The Wind
[1] => 1939
[2] => Vivien Leigh
[3] => Clark Gable
[4] => How Green Was My Valley
[5] => 1941
.
.
[8] => Jurassic Park
.
.
)
Because this method uses XML_PullParser_getTextStripped to retrieve the character data, all character data is returned, including character data from dependent child elements.
It is useful to call XML_PullParser_excludeBlanks, otherwise the array returned will include empty elements where they appear in the XML.
If $el is a string, it searches the $current_element for the specified element and failing that the current token. If $el is an array, the array should be specific to the text required, i.e. a container consisting of Start and End tags within which the text data resides.
This method will mark up text, essentially for redisplay as in HTML, using the $mark_up array for determining which XML elements are to be marked up and how they are to be marked up.
The $mark_up array should be created using the functions provided: XML_PullParser_getHTMLTags, XML_PullParser_getCSSSpans, XML_PullParser_getStyledSpans, and XML_PullParser_getStyledTags.
$mark_up = $parser->XML_PullParser_getCSSSpans(
array("code"=>"code", "emphasis"=>"emphasis")
);
$mark_up += $parser->XML_PullParser_getHTMLTags(array("classname"=>"b"));
$text = $parser->XML_PullParser_getTextMarkedUp($mark_up);
NOTE: The tags marked up by this function cannot be empty, i.e. they must have both an open tag and a closing tag.
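The listing above combines mark-up arrays with PHP's array union operator. When both sides share a key, the union keeps the left-hand entry and silently ignores the right-hand one, so an XML element should be declared in only one of the arrays being combined. A small sketch with placeholder values (the real arrays have an internal format):

```php
<?php
// Placeholder values; only the behaviour of the union operator is shown.
$mark_up  = ['code' => 'first', 'emphasis' => 'first'];
$mark_up += ['emphasis' => 'second', 'classname' => 'second'];
```

After the union, "emphasis" still holds its first value and "classname" has been added.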
In other respects, this function works essentially the same as XML_PullParser_getTextStripped with one difference: it is not subject to the CDATA modifiers $XML_PullParser_XCLUDE_BLANKS, $XML_PullParser_XCLUDE_BLANKS_STRICT, $XML_PullParser_TRIM_CDATA
This method is designed to return all the character data contained within the START and END tags of an element, regardless of whether or not the texts are enclosed by child elements.
Example:
<News_item>There was a <b>big</b> rainstorm last night</News_item>
This would resolve to: There was a big rainstorm last night
The default delimiter which separates the text from contiguous elements is a single space. This can be reset in XML_PullParser_setDelimiter, making it possible to gobble up the text from a known sequence of elements and split out the results.
<maintainer>
<user>foo_33</user>
<name>Joe Foo</name>
<email>Joe Foo@shaw.ca</email>
<role>lead</role>
</maintainer>
With the delimiter set to a semicolon, this resolves to:
foo_33;Joe Foo;Joe;Foo@shaw.ca;lead
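Splitting out a delimited result is then a single explode call. The string below is a hypothetical example rather than real parser output, and the approach assumes the delimiter never occurs inside the character data itself.

```php
<?php
// Hypothetical delimited string, standing in for text gathered from a known
// sequence of child elements with ";" set as the delimiter.
$text = 'foo_33;Joe Foo;lead';

// Assumes the delimiter never occurs inside the character data itself.
list($user, $name, $role) = explode(';', $text);
```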
The text returned from this function is also subject to the CDATA modifiers: XML_PullParser_excludeBlanks, XML_PullParser_trimCdata, and XML_PullParser_excludeBlanksStrict.
XML_PullParser_getToken initializes and returns the next top-level element and all of its children for use with the class data access methods. The top-level elements are those declared in the tags array: XML_PullParser::$tags.
This method is the workhorse of XML_PullParser. It is repeatedly called, most typically in a while loop, to fetch the next token off the token stack. Each token consists of an element declared in the tags array and all of its dependent child elements. The tags array is pre-declared in XML_PullParser_declareElements or passed in through the constructor.
The companion to this method is XML_PullParser_getElement, which returns elements declared in the child tags array: XML_PullParser::$child_tags.
Legacy method
The PHP XML parser, by default, converts all tag names to upper case, called case-folding.
This method returns TRUE if case folding is in effect. To put case sensitivity into effect call package-level XML_PullParser_caseSensitive.
Returns the child array if $name is a child of $el.
Used to determine whether $name is a child of $el. The returned array is equivalent to TRUE. If $name is not a child of $el then this method returns NULL.
if($parser->XML_PullParser_isChildOf($name,$el) ) {
// code here
}
$el can be either the name of an element or an array holding the element; if it is the name of an element, then the $current_element is used and, lacking that, the current token.
Note: this method will also return an array if $name is the name of $el.
Determine whether element $el is an element of type $name
Get the next attribute from attribute loop
The array returned by this method has this structure:
[0] element name
[1] associative array of all the attributes in this element; the names of the attributes are the keys
[2] the character data enclosed by the element, if the array is created by XML_PullParser_setAttrLoop_elcd; Or, the empty string if created by XML_PullParser_setAttrLoop
Example:
<server ip="192.168.0.1" tech="tech@example.net"> ns1.example.net</server>
The above yields this array:
Array
(
[0] => SERVER
[1] => Array
(
[IP] => 192.168.0.1
[TECH] => tech@example.net
)
[2] => ns1.example.net OR ""
)
Code to use this function:
$servers=$parser->XML_PullParser_getElement('server');
$attrs = $parser->XML_PullParser_setAttrLoop();
$n =1;
while($at = $parser->XML_PullParser_nextAttr()) {
$server_name = $parser->XML_PullParser_getText($servers,$n);
$n++;
echo "$at[0]: $server_name\n";
foreach($at[1] as $attr_name => $attr_value) {
echo "$attr_name => $attr_value\n";
}
}
Designed to work in loops using the internal array created by XML_PullParser_getElement.
This function removes each next element from the next element stack, and returns it until the stack is exhausted. The stack is a copy of $current_element.
This is useful only where there is more than one instance of an element:
172.20.19.6 example.com example_1.com example_2.com example_3.com www.example.com
Used in this situation, XML_PullParser_nextElement() will return each next server element in document order, making it possible to get at the text and the attributes. Note: This method is not used for accessing child elements of the elements saved by XML_PullParser_getElement. For that we have to use XML_PullParser_getSequence() or XML_PullParser_getChild(), or else include the child elements in the child tags array
The array it returns is a tokenized array that can be passed to the class methods which accept them. By default, this array is filtered through XML_PullParser_childXCL, which means that all children of the parent element are removed. This guarantees that the result returned when requesting text and attributes is for the element named in the parameter to XML_PullParser_getElement. But this also means that it is not suitable for use in applications which need to slurp together text from parent and all its children, as in a marked-up paragraph, since all the mark-up would be deleted in favor of the parent element.
The default behavior can be turned off by passing in a False value as a parameter, in which case the results are not filtered through XML_PullParser_childXCL
The idiom for its use is:
$parser->XML_PullParser_getElement('element_name');
while($next = $parser->XML_PullParser_nextElement()) {
$data = $parser->XML_PullParser_getText($next);
}
Pushes the current token back on the stack so that it can be re-read
XML_PullParser_clearPbackStack() should be called if XML_PullParser_pushbackToken() returns false. XML_PullParser_clearPbackStack() will return the pushed back token and prevent the possibility of an infinite loop.
Resets attribute loop pointer back to zero, so that the attributes loop can be re-read, starting at the top
Sets current element to a new value.
Creates an array of all attributes located in $el and its children.
Use XML_PullParser::XML_PullParser_nextAttr() to get the attributes.
For fuller description and example of use see XML_PullParser_nextAttr
This method does the same thing as XML_PullParser_setAttrLoop_elcd
The one difference is that this method requires that $el be declared in the $child_tags array. XML_PullParser_setAttrLoop_elcd is an improvement on the code in this method and should be used unless incompatibilities between the current and the previous version of XML_PullParser_setAttrLoop_elcd arise.
XML_PullParser_setAttrLoop_elcd is a wrapper for XML_PullParser_setAttrLoop.
This method differs from XML_PullParser_setAttrLoop in that it captures the text associated with each element, in addition to the attribute and element name. For an illustration of the array structure that it creates see XML_PullParser_nextAttr.
Like XML_PullParser_setAttrLoop, this method uses XML_PullParser_nextAttr to loop through the attribute array. The difference is that in the array returned by XML_PullParser_nextAttr, the third array element (index 2) holds the element's character data instead of the empty string. For details see XML_PullParser_nextAttr.
This is an improvement on the old code for this method, which is still available as XML_PullParser_setAttrLoop_cdata. XML_PullParser_setAttrLoop_cdata should be used only if incompatibilities between the current and the previous version of XML_PullParser_setAttrLoop_elcd arise. In that event, please notify the developer at Myron_Turner_(at)_Shaw_(dot)_ca.
Creates the current namespace definition
It takes a single parameter, a string consisting of one or more namespace URIs. They must be exactly as defined in the XML document. If there is a trailing forward slash in the URI, then this must be included. If more than one namespace is passed in, they must be separated by the vertical bar:
$parser->XML_PullParser_setCurrentNS("http://room535.org/movies/title/|"
. "http://room535.org/movies/mov/|http://room535.org/movies/star/");
This method returns FALSE, without setting the namespace definition, if namespace support has not been invoked in advance using XML_PullParser_NamespaceSupport. Otherwise it sets the new namespace definition and returns the previously set definition, which is suitable for passing back into the method. If there is no previous namespace definition, it sets the definition to $ns and returns TRUE.
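When the URIs are held in a list, the definition string can be assembled with implode rather than by hand. This sketch only builds and inspects the string; passing it to the parser is left out, and the variable names are illustrative.

```php
<?php
// Namespace URIs exactly as they appear in the XML document, including the
// trailing slashes.
$uris = [
    'http://room535.org/movies/title/',
    'http://room535.org/movies/mov/',
    'http://room535.org/movies/star/',
];

// Join with the vertical bar to form the definition string.
$definition = implode('|', $uris);

// The same separator recovers the individual URIs.
$roundTrip = explode('|', $definition);
```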
Sets the delimiter for XML_PullParser_getTextStripped and returns the old delimiter
This method converts the array returned by XML_PullParser_getChildren into a valid tokenized array
It takes either one or two parameters.
Unsets current element and returns its value;
Unsets the current namespace definition
When namespace support is in effect and the current namespace definition is unset, XML_PullParser behaves as though the XML document had no namespaces.
PHP XML Callback: Internal
Converts the token returned by _getTokenRaw to a form compatible with the tokens returned by XML_PullParser_getElement.
This method is essentially internal and is called by XML_PullParser_getToken.
Unless the raw token is converted, the PullParser data accessors are not available: XML_PullParser_getAttributes, XML_PullParser_getText, XML_PullParser_getTextStripped, XML_PullParser_getChild, XML_PullParser_getChildren.
Initialize the XML Parser: Internal
PHP XML Callback: Internal
Returns the next available raw token and initializes a number of internal data structures.
Its return value cannot be used with the PullParser data access functions.
For full functionality, use PullParser::XML_PullParser_getToken.
This is used to test whether an element or an attribute falls within the current namespace definition.
This is mainly for internal use, particularly insofar as it is applied to elements. But it can be used by the programmer to determine whether an attribute resides within the current namespace definition. This can be done first by extracting the namespace URI from the attribute's name, using XML_PullParser_getNS_URI, and then passing into _is_current_NS() the URI and the attribute's value as key=>value array:
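// Both calls are methods, so they are invoked on the parser object.
$name = $parser->XML_PullParser_getNS_URI($name);
if($parser->_is_current_NS(array($name=>$value)) ) {
    // the attribute falls within the current namespace definition
}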
Helper function to XML_PullParser_childXCL
An internal function used to initialize a number of internal structures.
It is called by default by XML_PullParser_getToken.
Get next chunk of data from XML Parser: Internal
PHP XML Callback: Internal
Documentation generated on Thu, 07 Dec 2006 12:08:03 -0500 by phpDocumentor 1.3.0RC6