Class XML_PullParser

Description

XML_PullParser is a token-based interface to the PHP expat XML library.

It moves the API of the PHP XML facility from an event-based model to a token-based model. Instead of processing data as it is passed from the parser to callbacks, a script using XML_PullParser requests "tokens" from XML_PullParser_getToken(). Tokens are arrays representing XML structures, which become available in the order in which they appear in the document being parsed.

Methods are provided to get tokens and extract their data. The API consists of the methods and functions with the XML_PullParser_ prefix. Methods whose names begin with an underscore are internal.

All class methods which return tokens return NULL when no tokens are available, and so can be used in while loops:

          while ($token = $parser->XML_PullParser_getToken()) {
              // process $token
          }

Similarly, all data accessors return NULL, an empty string, or an empty array when no data is available.
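The token model described above can be illustrated with a small, self-contained sketch. The class MiniPullParser is hypothetical and is not part of XML_PullParser: it assumes that a token is the run of entries that PHP's built-in xml_parse_into_struct() produces for one declared element and its descendants, and it parses the whole document at once rather than in chunks of $read_length. It only demonstrates the "call getToken() until NULL" contract.

```php
<?php
// Hypothetical sketch of the pull model; this is NOT XML_PullParser's code.
// It parses the whole string with PHP's xml_parse_into_struct() and then
// hands out one "token" per getToken() call: the entries of one declared
// element plus all of its descendants. It returns null when none are left.
final class MiniPullParser
{
    /** @var array<int, array<int, array<string, mixed>>> queued tokens */
    private array $tokens = [];

    /** @param string[] $tags element names to report on (case-folded) */
    public function __construct(string $xml, array $tags)
    {
        $parser = xml_parser_create();        // case folding is on by default
        xml_parse_into_struct($parser, $xml, $entries);
        $current = null;                      // token being collected
        $startLevel = 0;                      // nesting level of its root
        foreach ($entries as $entry) {
            $isStart = $entry['type'] === 'open' || $entry['type'] === 'complete';
            if ($current === null && $isStart && in_array($entry['tag'], $tags, true)) {
                $current = [];
                $startLevel = $entry['level'];
            }
            if ($current === null) {
                continue;                     // outside any declared element
            }
            $current[] = $entry;
            $isEnd = $entry['type'] === 'close' || $entry['type'] === 'complete';
            if ($isEnd && $entry['level'] === $startLevel) {
                $this->tokens[] = $current;   // root closed: token complete
                $current = null;
            }
        }
    }

    /** Returns the next token, or null when the queue is exhausted. */
    public function getToken(): ?array
    {
        return array_shift($this->tokens);
    }
}
```

Used in the same while-loop pattern, new MiniPullParser('<dns><server>a</server><server>b</server></dns>', array('SERVER')) yields two tokens and then null. The tag names are upper case because PHP's parser folds case by default.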

Documentation and examples are available in this Class Documentation created using phpDocumentor; in the manual pages, which were created from XML files using XML_PullParser; and in the sample files, which are complete PHP files based on the manual code listings.

Located in /XML_PullParser.inc (line 454)

Direct descendants
  • XML_PullParser_doc
Variable Summary
Method Summary
XML_PullParser XML_PullParser (string $file, [array $tags = Null], [array $child_tags = Null])
array XML_PullParser_childXCL (array $parent, [mixed $args = ""])
array XML_PullParser_deleteBlanks ( $token)
array XML_PullParser_getAttributes (mixed $name, [mixed $which = 1], [array $el = ""])
string XML_PullParser_getAttrVal (string $name, array $attr_array)
array XML_PullParser_getAttrValues (array $ar)
string XML_PullParser_getAttr_NS (string $name, array $attr_array)
array XML_PullParser_getChild (string $child, [integer $which = 1], [array $el = ""])
array XML_PullParser_getChildren (string $child, [array $el = ""])
array XML_PullParser_getChildrenFromName (string $name, string $el)
array XML_PullParser_getCSSSpans (array $markup)
array XML_PullParser_getElement (string $el)
string XML_PullParser_getElementName (mixed $el)
array XML_PullParser_getHTMLTags (array $markup)
string XML_PullParser_getNS_AttrName (mixed $str)
string XML_PullParser_getNS_URI (mixed $str, [string $name = Null])
array XML_PullParser_getSequence ([array $el = ""], [mixed $args = ""])
array XML_PullParser_getStyledSpans (array $markup, array $attributes)
array XML_PullParser_getStyledTags (array $markup, array $attributes)
mixed XML_PullParser_getText ([mixed $el = ""], [integer $which = 0], [boolean $xcl = false])
array XML_PullParser_getTextArray (mixed $el)
string XML_PullParser_getTextMarkedUp (array $mark_up, [mixed $el = ""])
string XML_PullParser_getTextStripped ([mixed $el = ""])
mixed XML_PullParser_isChildOf (string $name, [mixed $el = ""])
bool XML_PullParser_isTypeOf (string $name, array $el)
array XML_PullParser_nextElement ([boolean $xcl = true])
array XML_PullParser_resetCurrentElement (array $cur_el)
array XML_PullParser_setAttrLoop ([array $el = ""], [boolean $assignText = false])
array XML_PullParser_setAttrLoop_cdata ([ $el = ""])
array XML_PullParser_setAttrLoop_elcd ([array $el = ""])
mixed XML_PullParser_setCurrentNS (string $ns)
string XML_PullParser_setDelimiter (string $delimiter)
array XML_PullParser_tokenFromChildren (mixed $child, [mixed $el = ""])
void _aligned ( $token)
void _characterData ( $parser,  $data)
array _convertToken (array $token)
array _createParser (mixed $file)
void _endElement ( $parser,  $name)
void _externalEntityParser ( $parser,  $openEntityNames,  $base,  $systemId,  $publicId)
array _getTokenRaw ()
boolean _is_current_NS (array $ns_array)
void _markUnmarkedStartTags ( $el)
void _nullify ( $temp,  $pos)
void _processToken ()
void _readData ()
void _startElement ( $parser,  $name, [ $attrs = NULL])
Variables
array $accumulator (line 490)
  • var: internal
array $child_tags (line 484)
  • var:

    An array of child elements of interest, which is used by the constructor.

    It can be passed into the constructor as an array of names or set using XML_PullParser_declareChildElements

  • see: XML_PullParser_declareChildElements()
array $converted_token (line 569)
  • var: This holds a copy of the token returned by XML_PullParser_getToken. It is the fall-back for several of the data accessors when no other usable token, including $current_element, is available for their searches. It is often referred to in this documentation and in the manual as "current token."
array $current_element (line 558)
array $current_slice (line 515)
  • var: internal
array $current_tag_array (line 523)
  • var: internal
resource $fp (line 509)
  • var: file pointer to xml document
array $push_back_stk (line 582)
  • var: internal
integer $read_length (line 590)
string $stream (line 576)
  • var: the data stream
array $tags (line 469)
  • var: The list of tag names of elements on which XML_PullParser will report. The format of this array is:
         array ("element_1", "element_2", . . "element_n")
    It can be passed into the constructor or predeclared using XML_PullParser_declareElements.
  • see: XML_PullParser_declareElements()
array $top_level_tags (line 497)
  • var: internal
resource $xml_parser (line 503)
  • var: a resource handle for referencing the XML instance
mixed $XML_PullParser_currentNS = Null (line 617)
  • var: holds the current namespace definition; it is a string if only one namespace is declared, or an array if more than one is declared
mixed $_attr_loop_array = array() (line 665)

Created in XML_PullParser_setAttrLoop and used as a stack by XML_PullParser_nextAttr to return the next attribute.

This stack remains in memory until the next call to XML_PullParser_setAttrLoop or XML_PullParser_setAttrLoop_elcd.

The stack pointer can be reset to the top of the stack using XML_PullParser_resetAttrLoopPtr.

integer $_attr_loop_pos = 0 (line 633)
mixed $_escaped_tags (line 608)

Elements declared in both the $tags array and the $child_tags array.

A separate stack is maintained for these and they are accessed through XML_PullParser_getEscapedToken.

mixed $_in_stream = false (line 668)
mixed $_next_element_array = array() (line 648)

$_next_element_array == $current_element; the assignment is made in XML_PullParser_getElement.

It is used by XML_PullParser_nextElement as a stack for returning the next element. When the final element is removed, the stack is exhausted.

mixed $_save_current_element (line 675)

Used by methods which silently reset the $current_element to an alternate token.

string $_strippedCdataDelimiter = ' ' (line 599)
Methods
Constructor XML_PullParser (line 3576)
XML_PullParser XML_PullParser (string $file, [array $tags = Null], [array $child_tags = Null])
  • string $file: path/to/xml-file
  • array $tags: array of elements on which to report back via tokens
  • array $child_tags: array of child elements on which to report
XML_PullParser_childXCL (line 1502)

Excludes specified child elements from the parent array and returns the resulting array.

If no elements are specified for exclusion, it excludes all child elements and returns the parent alone, from which all child elements have been removed.

This method assumes that the first element that it encounters is the parent and all others are its descendent elements. The descendent elements may themselves have descendent elements.

If there is more than one top-level element of the same name as the parent, these are also included in the returned array. This might be the case, for instance, in a tokenized array returned originally by XML_PullParser_getElement.

array XML_PullParser_childXCL (array $parent, [mixed $args = ""])
  • array $parent
  • mixed $args: names of elements to be excluded, can be either array or variable length list of strings
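The exclusion rule can be sketched as follows. The function childXclSketch() is hypothetical, and the token format it expects (entries from PHP's xml_parse_into_struct(), with 'tag', 'type' and 'level' keys) is an assumption that may not match XML_PullParser's internal arrays. An empty exclusion list removes every descendant, leaving the parent alone.

```php
<?php
// Hypothetical sketch of child exclusion; not the library's implementation.
// Assumes tokens use the entry format of PHP's xml_parse_into_struct():
// each entry has 'tag', 'type' ('open', 'complete', 'cdata' or 'close')
// and 'level'. The first entry is taken to be the parent.
function childXclSketch(array $parent, array $exclude = []): array
{
    $topLevel = $parent[0]['level'];
    $skipLevel = null;        // level of the excluded element being skipped
    $result = [];
    foreach ($parent as $entry) {
        if ($skipLevel !== null) {
            // Drop everything up to and including the excluded element's close tag.
            if ($entry['type'] === 'close' && $entry['level'] === $skipLevel) {
                $skipLevel = null;
            }
            continue;
        }
        $isDescendant = $entry['level'] > $topLevel;
        // An empty exclusion list means "exclude every descendant".
        $excluded = $isDescendant
            && ($exclude === [] || in_array($entry['tag'], $exclude, true));
        if (!$excluded) {
            $result[] = $entry;   // parent-level entries are always kept
        } elseif ($entry['type'] === 'open') {
            $skipLevel = $entry['level'];   // skip its children too
        }
        // Excluded 'complete' and 'cdata' entries are simply dropped.
    }
    return $result;
}
```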
XML_PullParser_clearEscapedTokens (line 1281)

Clear the escaped token stack

void XML_PullParser_clearEscapedTokens ()
XML_PullParser_clearPbackStack (line 1366)
array XML_PullParser_clearPbackStack ()
XML_PullParser_deleteBlanks (line 3465)

Removes all blank CDATA array elements from $token

Tests all CDATA packets against the Perl-compatible regex '/\w/' and deletes from the token array any CDATA element which does not match it.

array XML_PullParser_deleteBlanks ( $token)
  • $token
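A minimal sketch of the same test, assuming the entry format of PHP's xml_parse_into_struct(), in which text that follows a child element is stored in a separate entry of type 'cdata'. The name deleteBlanksSketch() is hypothetical.

```php
<?php
// Hypothetical sketch of blank-CDATA removal; not the library's code.
// Assumes xml_parse_into_struct()-style entries, where text that follows a
// child element is stored in a separate entry of type 'cdata'.
function deleteBlanksSketch(array $token): array
{
    $kept = array_filter($token, function (array $entry): bool {
        if ($entry['type'] !== 'cdata') {
            return true;      // element entries are always kept
        }
        // Keep a CDATA entry only if it contains at least one word character.
        return preg_match('/\w/', $entry['value'] ?? '') === 1;
    });
    return array_values($kept);   // renumber the keys 0..n-1
}
```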
XML_PullParser_free (line 3560)

Free the parser object.

Needed only if restarting the parsing process.

void XML_PullParser_free ()

Redefined in descendants as:
XML_PullParser_getAttributes (line 2048)

Returns an associative array of attribute-name/attribute-value pairs found in the specified element.

Example:

  Array
  (
      [IP2] => 192.168.0.1
      [TECH] => tech@example.net
      [IP3] => 192.168.0.2
      [IP4] => 192.168.0.3
  )

When namespace support is not in effect, individual names and values can be extracted from this array with the standard PHP constructs each and foreach or, where $name is known, read directly: $value = $array[$name]. It is safest, however, to use XML_PullParser_getAttrVal, which guarantees conformance with the current case-folding setting.

When namespace support is in effect, XML_PullParser_getAttrVal should always be used, because attribute names are reformatted into keys that include the namespace URIs.

  • return: Returns an associative array of name/value pairs where the keys are the attribute names. If $which is an integer, the result is taken from the $which_th instance of the element $name. If $which is set to false, it returns all the attributes found in $el and its child elements; in that case, if more than one attribute has the same name, the keys collide and only the last value is kept, because keys in an associative array are unique. Returns Null if no attributes are found. See XML_PullParser_getAttrValues for a method which preserves duplicate names.
array XML_PullParser_getAttributes (mixed $name, [mixed $which = 1], [array $el = ""])
  • mixed $name: name of the element to be searched; it can be the name of $el itself. $name can also be an array returned from one of the tokenizing functions, e.g. XML_PullParser_getElement, XML_PullParser_getChild, XML_PullParser_getToken. In this case, the tokenized array is used for $el and the name of its topmost element is used for $name. This is particularly useful in a loop with XML_PullParser_nextElement, because the return value of XML_PullParser_nextElement can be passed directly into XML_PullParser_getAttributes as $name.
  • mixed $which: (optional) integer or boolean. If $which is an integer, the function looks for the $which_th element of $name. If it is set to false, $name is ignored and all attributes in $el are returned, but $name must still be passed in as a Null parameter.
    NOTE: The false option is not suitable for cases where there is more than one attribute of the same name.
  • array $el: (optional) the element where $name is found. If $el is Null, the function first tries $current_element, then the current token. $el is ignored if $name is an array.
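The two $which modes can be sketched as follows, assuming xml_parse_into_struct()-style entries in which start entries carry an 'attributes' array. attributesSketch() is a hypothetical stand-in, not the library's method; it shows how $which = false lets a repeated attribute name overwrite earlier values, which is the limitation the NOTE above warns about.

```php
<?php
// Hypothetical sketch of the $which rules; not the library's implementation.
// Assumes xml_parse_into_struct()-style entries, whose start entries ('open'
// or 'complete') carry an 'attributes' array when the element has any.
function attributesSketch(array $entries, ?string $name, $which = 1): ?array
{
    if ($which === false) {
        // $name is ignored: merge every element's attributes. Because keys
        // are unique, a repeated attribute name keeps only its last value.
        $merged = [];
        foreach ($entries as $entry) {
            $merged = array_merge($merged, $entry['attributes'] ?? []);
        }
        return $merged === [] ? null : $merged;
    }
    $seen = 0;
    foreach ($entries as $entry) {
        $isStart = $entry['type'] === 'open' || $entry['type'] === 'complete';
        if ($isStart && $entry['tag'] === $name && ++$seen === $which) {
            return $entry['attributes'] ?? null;   // the $which-th instance
        }
    }
    return null;
}
```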
XML_PullParser_getAttrVal (line 2552)

Get the value of an attribute.

This method guarantees conformance with the current case-folding setting.

It should always be used to access attribute values when namespace support is in effect.

  • return: the requested value for $name or Null if not found.
string XML_PullParser_getAttrVal (string $name, array $attr_array)
XML_PullParser_getAttrValues (line 2628)

This method excavates all the attributes of a specified element.

It is guaranteed to return all attribute names and values even in cases where a parent has more than one child of the same name with same-named attributes. For instance:

       <directory>
           <file name = "tutorial.doc" role = "doc" />
           <file name = "classes.php" role = "php" />
           <file name = "constants.inc" role = "php" />
      </directory>

The above would return the following array:

	Array
	(
	    [0] => Array
	        (
	            [NAME] => tutorial.doc
	            [ROLE] => doc
	        )

	    [1] => Array
	        (
	            [NAME] => classes.php
	            [ROLE] => php
	        )

	    [2] => Array
	        (
	            [NAME] => constants.inc
	            [ROLE] => php
	        )

	)

array XML_PullParser_getAttrValues (array $ar)
  • array $ar: an associative array holding a single element, in which the key is the name of an XML child element enclosing any number of attributes, and the value is either the name of the parent element (string) or a tokenized array which is the parent of that child element, e.g.
     array($child=>$parent).
    $child and $parent can be the same.
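Returning one attribute array per element is what keeps duplicates intact. A hypothetical sketch, assuming xml_parse_into_struct()-style entries and, for brevity, ignoring the $parent half of the array($child=>$parent) argument:

```php
<?php
// Hypothetical sketch; not the library's implementation. Assumes
// xml_parse_into_struct()-style entries and, for brevity, ignores the
// parent half of the array($child => $parent) argument.
function attrValuesSketch(array $entries, string $child): array
{
    $result = [];
    foreach ($entries as $entry) {
        $isStart = $entry['type'] === 'open' || $entry['type'] === 'complete';
        if ($isStart && $entry['tag'] === $child && isset($entry['attributes'])) {
            // One array per element, so same-named attributes never collide.
            $result[] = $entry['attributes'];
        }
    }
    return $result;
}
```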
XML_PullParser_getAttr_NS (line 1085)

Get the value of an attribute if it falls within the current namespace definition

This method was designed primarily for internal use but may have applicability in some scripting situations. Generally, however, XML_PullParser_getAttrVal should be used to get attribute values.

  • return: the value of the attribute or NULL if the attribute name is not qualified by a namespace
string XML_PullParser_getAttr_NS (string $name, array $attr_array)
  • string $name: name of the attribute without its namespace qualification
  • array $attr_array: an associative array consisting of the attribute's name and value, formed as follows:
             attribute-name=>attribute-value
    attribute-name is the name supplied by XML_PullParser_NS, which is an internally constructed key. The keys can be derived from the attribute arrays supplied by XML_PullParser_getAttributes, XML_PullParser_nextAttr and XML_PullParser_getAttrValues
XML_PullParser_getChild (line 1420)

Retrieves a child element and its dependents from its parent

This method extracts individual child elements from either $el or, if $el is not specified, from the $current_element or, failing that, from the current token.

$which specifies which instance of the child element to extract; the instances are treated as a sequence, in the order of appearance in the element's array, which follows the order of appearance in the XML document.

If $child is the name of $el and $which == 1, then $el will be returned.

  • return: Returns child array if found, or Null if child not found in $el or False if $el was not passed in and no $current_element or current token is found.
array XML_PullParser_getChild (string $child, [integer $which = 1], [array $el = ""])
  • string $child: name of the element to be extracted
  • integer $which: (optional) instance of the child to be extracted; defaults to 1, the first instance of the child
  • array $el: (optional) element from which the child is to be extracted
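The instance-counting rule for $which can be sketched like this. childSketch() is hypothetical and assumes xml_parse_into_struct()-style entries; it counts start tags named $child in document order and returns the requested instance together with its descendants.

```php
<?php
// Hypothetical sketch; not the library's implementation. Assumes
// xml_parse_into_struct()-style entries ('tag', 'type', 'level').
// Returns the entries of the $which-th <$child> and its descendants,
// in document order, or null if there is no such instance.
function childSketch(array $entries, string $child, int $which = 1): ?array
{
    $seen = 0;
    $slice = null;
    $rootLevel = 0;
    foreach ($entries as $entry) {
        if ($slice === null) {
            $isStart = $entry['type'] === 'open' || $entry['type'] === 'complete';
            if (!$isStart || $entry['tag'] !== $child || ++$seen !== $which) {
                continue;                 // not the requested instance yet
            }
            $slice = [];
            $rootLevel = $entry['level'];
        }
        $slice[] = $entry;
        $isEnd = $entry['type'] === 'close' || $entry['type'] === 'complete';
        if ($isEnd && $entry['level'] === $rootLevel) {
            return $slice;                // reached the child's own end tag
        }
    }
    return null;
}
```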
XML_PullParser_getChildren (line 1605)

This method will extract all the children named by $child from $el. If $el is not specified, then the $current_element is used. If the $current_element is not set, because XML_PullParser_getElement has not been called, then it searches the current token returned by XML_PullParser_getToken.

  • return: Returns requested array or False if !$el
array XML_PullParser_getChildren (string $child, [array $el = ""])
  • string $child: name of the child element to be searched for
  • array $el: optional array holding the child elements
XML_PullParser_getChildrenFromName (line 1657)

Fetches children from parent using a string to specify the parent element.

See XML_PullParser_getChildren for the method which fetches children from a parent specified as an array.

This method uses either $current_element or current token as the array from which to derive the child elements. If $current_element is not set then the current token is used.

array XML_PullParser_getChildrenFromName (string $name, string $el)
  • string $name: name of the children sought
  • string $el: name of the parent element where children reside
XML_PullParser_getCSSSpans (line 2804)

This takes an associative array of XML element tags and CSS class names and converts it to an array structure suitable for use in XML_PullParser_getTextMarkedUp:

    array(xml_element =>css_class_name, xml_element =>css_class_name, . . )

For instance:

     array("code"=>"code", "emphasis"=>"boldface_italic");

array XML_PullParser_getCSSSpans (array $markup)
  • array $markup
XML_PullParser_getCurrentElement (line 2532)
array XML_PullParser_getCurrentElement ()
XML_PullParser_getElement (line 3137)

Gets tokenized arrays of elements specified in the $child_tags array.

This method is second only to XML_PullParser_getToken in importance. XML_PullParser_getToken returns elements specified in the tags array, XML_PullParser::$tags, whereas this method returns elements specified in the child tags array, XML_PullParser::$child_tags. The array it returns consists of all the elements named $el found in $converted_token, together with all of their dependents.

  • return: Returns requested array or False if !$el
array XML_PullParser_getElement (string $el)
  • string $el
XML_PullParser_getElementName (line 3227)

Gets the name of the element array $el, or the element-name portion of the internal string representation of the element.

string XML_PullParser_getElementName (mixed $el)
  • mixed $el
XML_PullParser_getEscapedToken (line 1319)

Returns a single escaped token on each call.

An escaped token is an element which is declared in both the $tags array and the $child_tags array. A separate stack is created for these tokens. Each time XML_PullParser_getEscapedToken returns a token the token is popped off the stack, until the stack is exhausted, at which point it returns Null, making this method suitable for use in a loop.

The stack is persistent. If it is not exhausted and if the file being processed is larger than $read_length, tokens will be added to the stack when the next chunk of the file is parsed.

To clear the stack at any point, call XML_PullParser_clearEscapedTokens.

An escaped token can be accessed by XML_PullParser_getEscapedToken at any time, as long as it is still on the stack. It can also be accessed in normal document order by XML_PullParser_getElement. But escaped tokens are not returned by XML_PullParser_getToken. However, if an escaped element is the child of the current token, then it can be accessed in the usual ways, e.g. XML_PullParser_getChild.

array XML_PullParser_getEscapedToken ()
XML_PullParser_getHTMLTags (line 2834)

This takes an associative array of XML element tags and HTML tags and converts it to an array structure suitable for use in XML_PullParser_getTextMarkedUp:

    array(xml_element =>html_tag_name, xml_element =>html_tag_name, . . )

For instance:

     array("code"=>"code", "emphasis"=>"b", "classname"=>"i");

array XML_PullParser_getHTMLTags (array $markup)
  • array $markup
XML_PullParser_getNS_AttrName (line 1008)

Gets unqualified attribute name from the internally created attribute key

The attribute key is created internally for namespace-qualified attributes from both the attribute name and the namespace

  • return: the attribute name extracted from the key
string XML_PullParser_getNS_AttrName (mixed $str)
XML_PullParser_getNS_URI (line 964)

Extracts the namespace URI from an internally constructed key for either attributes or elements

Element namespaces are held along with attributes in the attribute array assigned to an element:

  Array
  (
     [HTTP://EXAMPLE.COM/DNS.TXT/|IP] => 192.168.10.3
     [_ns_] => Array
        (
            [HTTP://EXAMPLE.COM/DNS.TXT/] => 1
        )
  )

These arrays are returned by XML_PullParser_getAttributes or can be extracted from the arrays returned by XML_PullParser_getAttrValues and XML_PullParser_nextAttr

  1. If the parameter is a string, this method assumes that it is the internal name of an attribute, as in the example above: HTTP://EXAMPLE.COM/DNS.TXT/|IP
  2. If the parameter is an array and $name is not specified, it assumes that the element's own namespace is being sought. This is held in '_ns_';
  3. If the parameter is an array and $name is specified, it looks for the attribute of that $name.

  • return: namespace URI or NULL if not found
string XML_PullParser_getNS_URI (mixed $str, [string $name = Null])
  • mixed $str: the internally constructed attribute name or an attribute array,
  • string $name: optional name of the attribute; this is the unqualified name, i.e. without either the namespace URI or the namespace prefix
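Assuming the key format shown in the example above, with a '|' separating the namespace URI from the unqualified attribute name, both extractions reduce to splitting on the last '|', and the element's own namespace is the key of the '_ns_' sub-array. The three function names are hypothetical and are not XML_PullParser methods.

```php
<?php
// Hypothetical sketch; not the library's implementation. Assumes the
// internal key format "URI|NAME" and that an element's own namespace
// is the key of the '_ns_' sub-array of its attribute array.
function nsUriFromKey(string $key): ?string
{
    $bar = strrpos($key, '|');
    return $bar === false ? null : substr($key, 0, $bar);   // text before '|'
}

function nsAttrNameFromKey(string $key): string
{
    $bar = strrpos($key, '|');
    return $bar === false ? $key : substr($key, $bar + 1);  // text after '|'
}

function elementNsUri(array $attributes): ?string
{
    return isset($attributes['_ns_']) ? array_key_first($attributes['_ns_']) : null;
}
```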
XML_PullParser_getSequence (line 3390)

Get listing of all elements in sequence, including those of dependents, found in the array $el.

If $el is not supplied, this function looks for the $current_element and, if that is not found, uses the current token.

The list consists of an array of single-entry associative arrays, in which the keys are the names of the elements and the values the sequence numbers in the array being scanned. For instance, if the $current_element is set to "SERVER", then:

    Array
   (
       [0] => Array
            (
                [SERVER] => 1
            )

        [1] => Array
            (
                [SERVER] => 2
            )

        [2] => Array
            (
                [SERVER] => 3
            )

    )

This array is used to sequence through elements with functions that take a position number, for instance XML_PullParser_getText and XML_PullParser_getAttributes.

XML_PullParser_getSequence silently resets the $current_element to $el. To set it back to its original value after the sequence array has finished its work, call:

    $parser->XML_PullParser_resetCurrentElement($parser->_save_current_element);

array XML_PullParser_getSequence ([array $el = ""], [mixed $args = ""])
  • array $el: (optional) array to parse, required only if $args is not present
  • mixed $args: (optional) variable length list or array of element names to include in returned array; those not in list will be ignored. If this parameter is not passed in, then the sequence array will include all the elements in $el or the default array
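A hypothetical sketch of how such a sequence array could be built from xml_parse_into_struct()-style entries. It assumes that each number is the instance count of that element name, which matches the SERVER example above and is what lets the numbers serve as position numbers for methods such as XML_PullParser_getText. sequenceSketch() is not a library method.

```php
<?php
// Hypothetical sketch; not the library's implementation. Assumes
// xml_parse_into_struct()-style entries. Each start tag contributes one
// [name => n] pair, where n counts instances of that name so far.
function sequenceSketch(array $entries, array $only = []): array
{
    $counts = [];
    $sequence = [];
    foreach ($entries as $entry) {
        if ($entry['type'] !== 'open' && $entry['type'] !== 'complete') {
            continue;                     // only start tags are counted
        }
        $tag = $entry['tag'];
        if ($only !== [] && !in_array($tag, $only, true)) {
            continue;                     // not in the requested list
        }
        $counts[$tag] = ($counts[$tag] ?? 0) + 1;
        $sequence[] = [$tag => $counts[$tag]];
    }
    return $sequence;
}
```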
XML_PullParser_getStyledSpans (line 2740)

Adds CSS classes and attributes to spans, for use with XML_PullParser_getTextMarkedUp.

This method takes two parameters, $attributes and $markup. Both are associative arrays. The $markup array is the same as the XML_PullParser_getCSSSpans array:

    array(xml_element =>css_class_name, xml_element =>css_class_name, . . )
The $attributes array is an associative array of this format:

    array(html_attribute =>attribute_value, . . )

If the first element in the $markup array has the following form:

  ("emphasis"=>"bold_text")

and if its counterpart in the $attributes array has this format:

  ("style"=>"font-size: 10pt")

The tag would become:

<span class="bold_text" style="font-size: 10pt">

The $markup array always defaults to

       class="markup"

The attribute/value pair of the $attributes array can be any valid markup.

The two arrays must be sequentially parallel, so that $markup-1 is modified by $attributes-1, etc. There cannot be duplicate keys, since the last duplicate overwrites the previous.

It returns an array dedicated for use in XML_PullParser_getTextMarkedUp; it can be combined with arrays returned by XML_PullParser_getStyledTags, XML_PullParser_getCSSSpans, and XML_PullParser_getHTMLTags.

array XML_PullParser_getStyledSpans (array $markup, array $attributes)
  • array $markup
  • array $attributes
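The positional pairing can be sketched as follows. The real structure returned by XML_PullParser_getStyledSpans is not documented here, so styledSpansSketch() is hypothetical and assumes an array of [opening tag, closing tag] pairs keyed by XML element name. Attribute values are not escaped in this sketch.

```php
<?php
// Hypothetical sketch; the real return format of getStyledSpans() is not
// documented here, so an [opening tag, closing tag] pair per XML element
// is assumed. The two arrays are paired by position, not by key.
function styledSpansSketch(array $markup, array $attributes): array
{
    $attrNames = array_keys($attributes);
    $result = [];
    $i = 0;
    foreach ($markup as $element => $cssClass) {
        $extra = '';
        if (isset($attrNames[$i])) {
            // The i-th attribute modifies the i-th markup entry.
            $attr = $attrNames[$i];
            $extra = sprintf(' %s="%s"', $attr, $attributes[$attr]);
        }
        $result[$element] = [sprintf('<span class="%s"%s>', $cssClass, $extra), '</span>'];
        $i++;
    }
    return $result;
}
```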
XML_PullParser_getStyledTags (line 2768)

This function is almost identical to XML_PullParser_getStyledSpans, except that it modifies standard HTML tags, so that one could convert

<b> to <b class="title">

array XML_PullParser_getStyledTags (array $markup, array $attributes)
  • array $markup: associative array
  • array $attributes: associative array
XML_PullParser_getText (line 1882)

Get character data from the specified element.

If no parameters are passed to this method, the subject of the search defaults to $current_element or, if that has not been set, to the current token. All the character data of the default array is returned. This is in keeping with rule #1 below, where $which = 0 and $el is an array. (It is also implied by rule #5.)

  1. If $el is an array and $which==0, the array is passed to XML_PullParser_getTextStripped and all character data enclosed by the parent START and END tags is returned;
    [A.] if the element has child elements with text, that text, too, will be returned.
    [B.] if there is more than one element of the same name bound into the array, all character data from all elements of the same name will be returned.

  2. If $el is an array and $which > 0, the array is passed to XML_PullParser_getTextArray and the requested string is returned, using $which as an index into the array of strings returned by XML_PullParser_getTextArray. This array includes the text of parent and all descendents. To exclude descendents from this array set $xcl to true. Then the $which_th string will be selected only from elements named $el and not from any of its descendents.
    It is often advisable to call XML_PullParser_excludeBlanks in advance.

  3. If $el is a string and the name of a child element, then either the $current_element, or if that's not set, the current token, is searched and the character data returned depends on the value of $which:
    [A.] if $which has a value > 0, then the character data from the $which_th instance of $el is returned but not the character data of its children;
    [B.] if $which retains its default value of zero, then the character data of all elements named $el is returned but not the character data of their children

  4. (Since release 1.2.1) If $el is a string and is the name of the default token, then the behavior is the same as when $el is an array.

  5. If $el is Null, then the character data of the default token is returned, including that of child elements.

  6. If $el is an array, or the name of the default token (see point 4 above), then the result can be filtered through XML_PullParser_childXCL by setting $xcl to TRUE.

  7. Returns NULL or an empty string if text is not found in $el

  8. Returns FALSE if $el does not resolve to an array and neither $current_element nor current token is found

All requests to this function are preprocessed here and ultimately passed on to XML_PullParser_getTextStripped, which means that its output is subject to the CDATA modifiers: XML_PullParser_excludeBlanks, XML_PullParser_trimCdata, and XML_PullParser_excludeBlanksStrict.

mixed XML_PullParser_getText ([mixed $el = ""], [integer $which = 0], [boolean $xcl = false])
  • mixed $el: optional name of element or a tokenized array
  • integer $which: position of child element within the parent (default = 0)
  • boolean $xcl: (default=false) when set to true the array $el or the default token is filtered through XML_PullParser_childXCL()
XML_PullParser_getTextArray (line 1770)

Gets an array of strings consisting of the character data specified by the parameter $el, where $el is either a string naming the element or an array holding the element.

The return value is a numerically indexed array of strings which reflects the structure of the element referenced. If the element specified by $el is, for instance, a structure such as

          <Movies>
             <Movie>
               <Title>Gone With The Wind</Title>
               <date>1939</date>
               <leading_lady>Vivien Leigh</leading_lady>
               <leading_man>Clark Gable</leading_man>
             </Movie>
             <Movie>
               <Title>How Green Was My Valley</Title>
               <date>1941</date>
               <leading_lady>Maureen O'Hara</leading_lady>
                          .
             </Movie>
             <Movie>
               <Title>Jurassic Park</Title>
                          .
                          .
             </Movie>
          </Movies>

XML_PullParser_getTextArray("Title") will return an array of titles.

        Array
          (
              [0] => Gone With The Wind
              [1] => How Green Was My Valley
              [2] => Jurassic Park
          )

But XML_PullParser_getTextArray("Movies") will return an array consisting of all the character data between the <Movies> and </Movies>:

        Array
          (
              [0] => Gone With The Wind
              [1] => 1939
              [2] => Vivien Leigh
              [3] => Clark Gable
              [4] => How Green Was My Valley
              [5] => 1941
                        .
                        .
              [8] => Jurassic Park
                        .
                        .
          )

Because this method uses XML_PullParser_getTextStripped to retrieve the character data, all character data is returned, including character data from dependent child elements.

It is useful to call XML_PullParser_excludeBlanks, otherwise the array returned will include empty elements where they appear in the XML.

If $el is a string, it searches the $current_element for the specified element and failing that the current token. If $el is an array, the array should be specific to the text required, i.e. a container consisting of Start and End tags within which the text data resides.

array XML_PullParser_getTextArray (mixed $el)
  • mixed $el: the name of the element or an array holding the element
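A hypothetical sketch of the collection rule, assuming xml_parse_into_struct()-style entries and case-folded tag names. Whitespace-only strings are dropped, roughly as XML_PullParser_excludeBlanks would do. textArraySketch() is not a library method.

```php
<?php
// Hypothetical sketch; not the library's implementation. Assumes
// xml_parse_into_struct()-style entries. Collects, in document order, the
// character data inside every element named $tag, descendants included.
// Whitespace-only strings are skipped, which mimics excludeBlanks.
function textArraySketch(array $entries, string $tag): array
{
    $texts = [];
    $insideLevel = null;   // level of the matching element we are inside
    foreach ($entries as $entry) {
        $isStart = $entry['type'] === 'open' || $entry['type'] === 'complete';
        if ($insideLevel === null && $isStart && $entry['tag'] === $tag) {
            $insideLevel = $entry['level'];
        }
        if ($insideLevel === null) {
            continue;          // outside every matching element
        }
        if (isset($entry['value']) && trim($entry['value']) !== '') {
            $texts[] = $entry['value'];
        }
        $isEnd = $entry['type'] === 'close' || $entry['type'] === 'complete';
        if ($isEnd && $entry['level'] === $insideLevel) {
            $insideLevel = null;   // left the matching element
        }
    }
    return $texts;
}
```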
XML_PullParser_getTextMarkedUp (line 2879)

This method will mark up text, essentially for redisplay as in HTML, using the $mark_up array for determining which XML elements are to be marked up and how they are to be marked up.

The $mark_up array should be created using the functions provided: XML_PullParser_getHTMLTags, XML_PullParser_getCSSSpans, XML_PullParser_getStyledSpans, and XML_PullParser_getStyledTags.

 $mark_up = $parser->XML_PullParser_getCSSSpans(
         array("code"=>"code", "emphasis"=>"emphasis")
     );
 $mark_up += $parser->XML_PullParser_getHTMLTags(array("classname"=>"b"));

 $text = $parser->XML_PullParser_getTextMarkedUp($mark_up);

NOTE: The tags marked up by this function cannot be empty, i.e. they must have both an open tag and a closing tag.

In other respects, this function works essentially the same as XML_PullParser_getTextStripped with one difference: it is not subject to the CDATA modifiers $XML_PullParser_XCLUDE_BLANKS, $XML_PullParser_XCLUDE_BLANKS_STRICT, $XML_PullParser_TRIM_CDATA

string XML_PullParser_getTextMarkedUp (array $mark_up, [mixed $el = ""])
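The substitution can be sketched as below. markUpSketch() is hypothetical, and it assumes that $map associates each case-folded element name with an [opening tag, closing tag] pair; the actual $mark_up structure built by XML_PullParser_getHTMLTags and its siblings may differ. Elements not listed in $map contribute only their text.

```php
<?php
// Hypothetical sketch; not the library's implementation. Assumes
// xml_parse_into_struct()-style entries and a $map of
// 'ELEMENT' => [openingTag, closingTag]. Unmapped elements add text only.
function markUpSketch(array $entries, array $map): string
{
    $out = '';
    foreach ($entries as $entry) {
        [$open, $close] = $map[$entry['tag']] ?? ['', ''];
        $text = $entry['value'] ?? '';
        switch ($entry['type']) {
            case 'open':      // start tag, plus any text before the first child
                $out .= $open . $text;
                break;
            case 'complete':  // element without children
                $out .= $open . $text . $close;
                break;
            case 'cdata':     // text that follows a child element
                $out .= $text;
                break;
            case 'close':
                $out .= $close;
                break;
        }
    }
    return $out;
}
```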
XML_PullParser_getTextStripped (line 3034)

This method is designed to return all the character data contained within the START and END tags of an element, regardless of whether or not the texts are enclosed by child elements.

Example:

      <News_item>There was a <b>big</b> rainstorm last night</News_item>
This would resolve to: There was a big rainstorm last night

The default delimiter which separates the text from contiguous elements is a single space. This can be reset in XML_PullParser_setDelimiter, making it possible to gobble up the text from a known sequence of elements and split out the results.

                <maintainer>
                       <user>foo_33</user>
                       <name>Joe Foo</name>
                       <email>Joe Foo@shaw.ca</email>
                       <role>lead</role>
                </maintainer>

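  XML_PullParser_excludeBlanks(true);          // skip whitespace-only text between elements
  $parser->XML_PullParser_setDelimiter(';');
  $result = $parser->XML_PullParser_getTextStripped($maintainer);
  list($user, $name, $email, $role) = explode(';', $result);

$result would be:
              foo_33;Joe Foo;Joe Foo@shaw.ca;lead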

The text returned from this function is also subject to the CDATA modifiers:

  1. If the package level function XML_PullParser_excludeBlanks is called with a true value, XML_PullParser_getTextStripped will skip over instances of character data which contain only new lines, spaces, tabs, and carriage returns. This is aimed at XML_PullParser_getText and XML_PullParser_getTextArray, where nesting of elements and text can cause instance numbering and array counts to be misaligned.
  2. If XML_PullParser_excludeBlanksStrict is called with a true value, XML_PullParser_getTextStripped will reject any CDATA packet which does not contain at least one member of the regular expression character class "\w". That class matches letters, digits and the underscore ([A-Za-z0-9_] in the default locale); the hyphen is not included.
  3. If XML_PullParser_trimCdata is called with a true value, all CDATA packets will be trimmed using the PHP trim() function.
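The three modifiers can be sketched as a filter over a list of CDATA packets. The function `filter_cdata()` and its flag names are hypothetical. The order of application (trim first, then the two exclusion tests) is an assumption made for this sketch and is not taken from the library.

```php
<?php
// Hypothetical sketch of the three CDATA modifiers described above.
// Applying trim() before the exclusion tests is an assumption.
function filter_cdata(array $packets, bool $xcludeBlanks, bool $xcludeBlanksStrict, bool $trimCdata): array
{
    $kept = [];
    foreach ($packets as $packet) {
        if ($trimCdata) {
            $packet = trim($packet);  // modifier 3: PHP trim()
        }
        if ($xcludeBlanks && preg_match('/^[ \t\r\n]*$/', $packet)) {
            continue;  // modifier 1: only newlines, spaces, tabs, carriage returns
        }
        if ($xcludeBlanksStrict && !preg_match('/\w/', $packet)) {
            continue;  // modifier 2: no "\w" character at all
        }
        $kept[] = $packet;
    }
    return $kept;
}

$packets = ["\n  ", 'foo', ' -- ', '  bar  '];
print_r(filter_cdata($packets, true, false, false));  // drops only "\n  "
print_r(filter_cdata($packets, true, true, true));    // keeps 'foo' and 'bar'
```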

  • return: Returns a string concatenated from all the character data enclosed within the subject element, including the character data enclosed within its child elements. Returns Null if the element is not found. White space counts as CDATA and will not yield Null unless XML_PullParser_trimCdata is called.
string XML_PullParser_getTextStripped ([mixed $el = ""])
  • mixed $el: (optional) array or string specifying element to be parsed; if $el is not set then it is assumed that subject of the text request is the $current_element or, lacking that, the current token
XML_PullParser_getToken (line 1201)

XML_PullParser_getToken initializes and returns the next top-level element and all of its children for use with the class data access methods. The top-level elements are those declared in the tags array: XML_PullParser::$tags.

This method is the workhorse of XML_PullParser. It is repeatedly called, most typically in a while loop, to fetch the next token off the token stack. Each token consists of an element declared in the tags array and all of its dependent child elements. The tags array is pre-declared in XML_PullParser_declareElements or passed in through the constructor.

The companion to this method is XML_PullParser_getElement, which returns elements declared in the child tags array: XML_PullParser::$child_tags.

array XML_PullParser_getToken ()
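The following is a minimal sketch of the event-to-token idea behind getToken. It is not XML_PullParser's code. The class `TinyPullParser` is hypothetical. Expat callbacks build a token for each declared top-level tag and queue it, and getToken() pulls tokens off in document order, returning null when none remain. The token shape [name, attributes, text] is an assumption. The sketch also parses the whole string at once, whereas the class reads its input in chunks (see _readData).

```php
<?php
// Hypothetical sketch, not XML_PullParser's implementation.
final class TinyPullParser
{
    private array $tags;            // declared top-level tags, case-folded
    private array $queue = [];      // completed tokens awaiting getToken()
    private ?array $current = null; // token under construction: [name, attrs, text]
    private int $depth = 0;         // nesting depth inside $current

    public function __construct(string $xml, array $tags)
    {
        // expat folds tag names to upper case by default, so fold the tags too.
        $this->tags = array_map('strtoupper', $tags);
        $p = xml_parser_create();
        xml_set_element_handler($p, [$this, 'start'], [$this, 'end']);
        xml_set_character_data_handler($p, [$this, 'cdata']);
        if (!xml_parse($p, $xml, true)) {
            throw new RuntimeException(xml_error_string(xml_get_error_code($p)));
        }
    }

    public function start($p, string $name, array $attrs): void
    {
        if ($this->current !== null) {
            $this->depth++;  // a child of the token being built
        } elseif (in_array($name, $this->tags, true)) {
            $this->current = [$name, $attrs, ''];
            $this->depth = 1;
        }
    }

    public function end($p, string $name): void
    {
        if ($this->current !== null && --$this->depth === 0) {
            $this->queue[] = $this->current;  // token complete: queue it
            $this->current = null;
        }
    }

    public function cdata($p, string $data): void
    {
        if ($this->current !== null) {
            $this->current[2] .= $data;  // text of the element and its children
        }
    }

    public function getToken(): ?array
    {
        return array_shift($this->queue);  // null once the queue is empty
    }
}

$xml = '<servers><server ip="10.0.0.1">ns1</server><server ip="10.0.0.2">ns2</server></servers>';
$pp = new TinyPullParser($xml, ['server']);
while ($token = $pp->getToken()) {
    echo "$token[0] {$token[1]['IP']}: $token[2]\n";
}
```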
XML_PullParser_isCaseEnfolded (line 3548)

Legacy method

boolean XML_PullParser_isCaseEnfolded ()
XML_PullParser_isCaseFolded (line 3536)

The PHP XML parser, by default, converts all tag names to upper case, called case-folding.

This method returns TRUE if case folding is in effect. To make tag names case-sensitive, call the package-level function XML_PullParser_caseSensitive.

boolean XML_PullParser_isCaseFolded ()
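The case folding described here comes from PHP's expat extension itself. The snippet below demonstrates it directly with xml_parser_get_option and xml_parser_set_option (XML_OPTION_CASE_FOLDING). The helper `element_names()` is hypothetical and exists only to collect the tag names that expat reports.

```php
<?php
// Collects element names as reported by PHP's expat parser.
// element_names() is a hypothetical helper for this illustration.
function element_names(string $xml, bool $caseFolding): array
{
    $names = [];
    $p = xml_parser_create();
    xml_parser_set_option($p, XML_OPTION_CASE_FOLDING, $caseFolding);
    xml_set_element_handler(
        $p,
        function ($parser, $name, $attrs) use (&$names): void { $names[] = $name; },
        function ($parser, $name): void {}
    );
    xml_parse($p, $xml, true);
    return $names;
}

$p = xml_parser_create();
var_dump((bool) xml_parser_get_option($p, XML_OPTION_CASE_FOLDING)); // bool(true): on by default

print_r(element_names('<News_item><b>big</b></News_item>', true));   // NEWS_ITEM, B
print_r(element_names('<News_item><b>big</b></News_item>', false));  // News_item, b
```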
XML_PullParser_isChildOf (line 3298)

Returns the child array if $name is a child of $el.

Used to determine whether $name is a child of $el. The returned array is equivalent to TRUE. If $name is not a child of $el then this method returns NULL.

   if($parser->XML_PullParser_isChildOf($name,$el) ) {
        // code here
   }

$el can be either the name of an element or an array holding the element. If it is the name of an element, then the $current_element is used and, lacking that, the current token.

Note: this method also returns an array if $name is the name of $el itself.

  • return: Returns the child array or Null if child not found
mixed XML_PullParser_isChildOf (string $name, [mixed $el = ""])
  • string $name
  • mixed $el
XML_PullParser_isTypeOf (line 3257)

Determine whether element $el is an element of type $name

bool XML_PullParser_isTypeOf (string $name, array $el)
  • array $el
  • string $name
XML_PullParser_nextAttr (line 2266)

Get the next attribute from attribute loop

The array returned by this method has this structure:
[0] element name
[1] associative array of all the attributes in this element; the names of the attributes are the keys
[2] the character data enclosed by the element, if the array is created by XML_PullParser_setAttrLoop_elcd; or the empty string, if it is created by XML_PullParser_setAttrLoop

Example:
<server ip="192.168.0.1" tech="tech@example.net">ns1.example.net</server>

The above yields this array:

  Array
  (
   [0] => SERVER
   [1] => Array
       (
           [IP] => 192.168.0.1
           [TECH] => tech@example.net
       )
   [2] => ns1.example.net  OR  ""
  )

Code to use this function:

     $servers = $parser->XML_PullParser_getElement('server');
     $attrs = $parser->XML_PullParser_setAttrLoop();

     $n = 1;
     while ($at = $parser->XML_PullParser_nextAttr()) {
         $server_name = $parser->XML_PullParser_getText($servers, $n);
         $n++;
         echo "$at[0]: $server_name\n";
         foreach ($at[1] as $attr_name => $attr_value) {
             echo "$attr_name => $attr_value\n";
         }
     }

array XML_PullParser_nextAttr ()
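The attribute loop can be pictured as an array of [name, attributes, text] entries plus a read pointer. The sketch below builds such an array with PHP's expat functions. The class `AttrLoopSketch` and its method names are hypothetical. It assumes that only elements carrying attributes get an entry, and that the text slot holds the element's own character data when `$withText` is true (as described for XML_PullParser_setAttrLoop_elcd) and the empty string otherwise.

```php
<?php
// Hypothetical sketch of an attribute loop; not XML_PullParser's code.
final class AttrLoopSketch
{
    private array $entries = [];  // [name, attrs, text] per element with attributes
    private array $stack = [];    // entry index (or null) for each open element
    private int $ptr = 0;         // read pointer used by nextAttr()

    public function __construct(string $xml, private bool $withText = false)
    {
        $p = xml_parser_create();
        xml_set_element_handler($p, [$this, 'start'], [$this, 'end']);
        xml_set_character_data_handler($p, [$this, 'cdata']);
        if (!xml_parse($p, $xml, true)) {
            throw new RuntimeException(xml_error_string(xml_get_error_code($p)));
        }
    }

    public function start($p, string $name, array $attrs): void
    {
        if ($attrs) {
            $this->entries[] = [$name, $attrs, ''];
            $this->stack[] = count($this->entries) - 1;
        } else {
            $this->stack[] = null;  // element without attributes: no entry
        }
    }

    public function end($p, string $name): void
    {
        array_pop($this->stack);
    }

    public function cdata($p, string $data): void
    {
        $top = end($this->stack);  // false when no element is open
        if ($this->withText && is_int($top)) {
            $this->entries[$top][2] .= $data;  // the element's own text only
        }
    }

    // Counterpart of nextAttr(): next entry, or null when exhausted.
    public function nextAttr(): ?array
    {
        return $this->entries[$this->ptr++] ?? null;
    }

    // Counterpart of resetAttrLoopPtr(): re-read from the top.
    public function resetPtr(): void
    {
        $this->ptr = 0;
    }
}

$xml = '<servers><server ip="192.168.0.1" tech="tech@example.net">ns1.example.net</server></servers>';
$loop = new AttrLoopSketch($xml, true);
while ($at = $loop->nextAttr()) {
    echo "$at[0]: $at[2]\n";
    foreach ($at[1] as $attr_name => $attr_value) {
        echo "$attr_name => $attr_value\n";
    }
}
```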
XML_PullParser_nextElement (line 2171)

Designed to work in loops using the internal array created by XML_PullParser_getElement.

Each call removes the next element from the element stack and returns it, until the stack is exhausted. The stack is a copy of $current_element.

This is useful only where there is more than one instance of an element:

    
    
    172.20.19.6
    example.com
    example_1.com
    example_2.com
    example_3.com
    www.example.com
    
    

Used in this situation, XML_PullParser_nextElement() returns each server element in turn, in document order, making it possible to get at its text and attributes. Note: this method is not used for accessing the child elements of the elements saved by XML_PullParser_getElement. For that, use XML_PullParser_getSequence() or XML_PullParser_getChild(), or else include the child elements in the child tags array.

The array it returns is a tokenized array that can be passed to the class methods which accept them. By default, this array is filtered through XML_PullParser_childXCL, which means that all children of the parent element are removed. This guarantees that the text and attributes returned are those of the element named in the parameter to XML_PullParser_getElement. However, it also means that the array is not suitable for applications which need to slurp together the text of the parent and all its children, as in a marked-up paragraph, since all the mark-up would be deleted in favor of the parent element.

The default behavior can be turned off by passing in a False value as a parameter, in which case the results are not filtered through XML_PullParser_childXCL.

The idiom for its use is:

    $parser->XML_PullParser_getElement('element_name');
    while ($next = $parser->XML_PullParser_nextElement()) {
        $data = $parser->XML_PullParser_getText($next);
    }

array XML_PullParser_nextElement ([boolean $xcl = true])
  • boolean $xcl: defaults to True; False turns off filtering through XML_PullParser_childXCL
XML_PullParser_pushbackToken (line 1346)

Pushes the current token back on the stack so that it can be re-read

XML_PullParser_clearPbackStack() should be called if XML_PullParser_pushbackToken() returns false. XML_PullParser_clearPbackStack() will return the pushed back token and prevent the possibility of an infinite loop.

bool XML_PullParser_pushbackToken ()
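Pushback can be sketched as a small stack that getToken consults before reading anything new. The class `PushbackStream` is hypothetical and works on a prepared list of tokens. The library's actual conditions for pushbackToken() returning false are not documented here. In this sketch it returns false only when there is no current token to push back, which is an assumption.

```php
<?php
// Hypothetical sketch of token pushback; not XML_PullParser's code.
final class PushbackStream
{
    private ?array $last = null;     // most recently returned token
    private array $pushedBack = [];  // tokens waiting to be re-read

    public function __construct(private array $tokens) {}

    public function getToken(): ?array
    {
        // A pushed-back token is re-read before any new token.
        $this->last = array_pop($this->pushedBack) ?? array_shift($this->tokens);
        return $this->last;
    }

    public function pushbackToken(): bool
    {
        if ($this->last === null) {
            return false;  // assumption: nothing to push back
        }
        $this->pushedBack[] = $this->last;
        $this->last = null;  // the same read cannot be pushed back twice
        return true;
    }

    // Counterpart of XML_PullParser_clearPbackStack(): empties the
    // pushback stack and returns the token that was on it.
    public function clearPbackStack(): ?array
    {
        $token = array_pop($this->pushedBack);
        $this->pushedBack = [];
        return $token;
    }
}

$stream = new PushbackStream([['A'], ['B']]);
$first = $stream->getToken();   // ['A']
$stream->pushbackToken();
$again = $stream->getToken();   // ['A'] again
$next = $stream->getToken();    // ['B']
```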
XML_PullParser_resetAttrLoopPtr (line 2496)

Resets the attribute loop pointer to zero, so that the attribute loop can be re-read from the top.

void XML_PullParser_resetAttrLoopPtr ()
XML_PullParser_resetCurrentElement (line 2520)

Sets current element to a new value.

array XML_PullParser_resetCurrentElement (array $cur_el)
  • array $cur_el
XML_PullParser_setAttrLoop (line 2428)

Creates an array of all attributes located in $el and its children.

Use XML_PullParser::XML_PullParser_nextAttr() to get the attributes.

For a fuller description and an example of use, see XML_PullParser_nextAttr.

array XML_PullParser_setAttrLoop ([array $el = ""], [boolean $assignText = false])
  • array $el: (optional) if $el is not set then it is assumed that subject of the request is the $current_element or, lacking that, the current token
  • boolean $assignText: (internal)
XML_PullParser_setAttrLoop_cdata (line 2294)

This method does the same thing as XML_PullParser_setAttrLoop_elcd.

The one difference is that this method requires that $el be declared in the child tags array. XML_PullParser_setAttrLoop_elcd is an improvement on the code in this method and should be used unless incompatibilities between the current and the previous version of XML_PullParser_setAttrLoop_elcd arise.

array XML_PullParser_setAttrLoop_cdata ([ $el = ""])
  • $el
XML_PullParser_setAttrLoop_elcd (line 2366)

XML_PullParser_setAttrLoop_elcd is a wrapper for XML_PullParser_setAttrLoop.

This method differs from XML_PullParser_setAttrLoop in that it captures the text associated with each element, in addition to the attribute and element name. For an illustration of the array structure that it creates see XML_PullParser_nextAttr.

Like XML_PullParser_setAttrLoop, this method uses XML_PullParser_nextAttr to loop through the attribute array. The difference is that in the array returned by XML_PullParser_nextAttr, the element at index [2] holds the element's character data instead of the empty string. For details see XML_PullParser_nextAttr.

This is an improvement on the old code for this method, which is still available as XML_PullParser_setAttrLoop_cdata. XML_PullParser_setAttrLoop_cdata should be used only if incompatibilities between the current and the previous version of XML_PullParser_setAttrLoop_elcd arise. In that event, please notify the developer at Myron_Turner_(at)_Shaw_(dot)_ca.

array XML_PullParser_setAttrLoop_elcd ([array $el = ""])
  • array $el: optional tokenized array
XML_PullParser_setCurrentNS (line 879)

Creates the current namespace definition

It takes a single parameter, a string consisting of one or more namespace URIs. They must be exactly as defined in the XML document; if a URI has a trailing forward slash, the slash must be included. If more than one namespace is passed in, they must be separated by the vertical bar:

    $parser->XML_PullParser_setCurrentNS("http://room535.org/movies/title/|"
      . "http://room535.org/movies/mov/|http://room535.org/movies/star/");

If namespace support has not been enabled in advance with XML_PullParser_NamespaceSupport, this method returns FALSE and does not set the namespace definition. Otherwise it sets the new namespace definition and returns the previously set definition, which is suitable for passing back into the method. If there is no previous namespace definition, it sets the definition to $ns and returns TRUE.

mixed XML_PullParser_setCurrentNS (string $ns)
  • string $ns
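The return-value contract above can be modelled in a few lines. The class `NsDefinition` and its methods are hypothetical stand-ins for XML_PullParser_setCurrentNS and XML_PullParser_unsetCurrentNS. They cover only the bookkeeping: FALSE without namespace support, the previous definition (or TRUE) otherwise, and "|" as the URI separator.

```php
<?php
// Hypothetical model of the setCurrentNS / unsetCurrentNS return values.
final class NsDefinition
{
    private ?string $current = null;

    public function __construct(private bool $namespaceSupport) {}

    public function set(string $ns): string|bool
    {
        if (!$this->namespaceSupport) {
            return false;          // support not enabled: nothing is set
        }
        $previous = $this->current;
        $this->current = $ns;
        return $previous ?? true;  // old definition, or true if there was none
    }

    public function clear(): ?string
    {
        $previous = $this->current;
        $this->current = null;     // behave as if there were no namespaces
        return $previous;
    }

    // URIs are separated by "|" and must match the document exactly.
    public function uris(): array
    {
        return $this->current === null ? [] : explode('|', $this->current);
    }
}

$ns = new NsDefinition(true);
$ns->set('http://room535.org/movies/title/|http://room535.org/movies/mov/');  // true
$old = $ns->set('http://room535.org/movies/star/');  // returns the title|mov definition
$ns->set($old);                                      // restore it
print_r($ns->uris());
```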
XML_PullParser_setDelimiter (line 3115)

Sets the delimiter for XML_PullParser_getTextStripped and returns the old delimiter

string XML_PullParser_setDelimiter (string $delimiter)
  • string $delimiter
XML_PullParser_tokenFromChildren (line 3501)

This method converts the array returned by XML_PullParser_getChildren into a valid tokenized array

It takes either one or two parameters.

  1. Two Parameters: the name of a child element and a tokenized array.
    In this case it extracts the children from the token using XML_PullParser_getChildren and then converts the resulting array to a valid token
  2. One Parameter: an array that has already been processed by XML_PullParser_getChildren.
    In this case, it converts this array to a valid token

array XML_PullParser_tokenFromChildren (mixed $child, [mixed $el = ""])
XML_PullParser_unsetCurrentElement (line 2506)

Unsets the current element and returns its value.

array XML_PullParser_unsetCurrentElement ()
XML_PullParser_unsetCurrentNS (line 919)

Unsets the current namespace definition

When namespace support is in effect and the current namespace definition is unset, XML_PullParser behaves as though the XML document had no namespaces.

  • return: previously set namespace definition
string XML_PullParser_unsetCurrentNS ()
_aligned (line 1162)
void _aligned ( $token)
  • $token
_characterData (line 680)

PHP XML Callback: Internal

void _characterData ( $parser,  $data)
  • $parser
  • $data
_convertToken (line 1136)

Converts the token returned by _getTokenRaw to a form compatible with the tokens returned by XML_PullParser_getElement.

This method is essentially internal and is called by XML_PullParser_getToken.

Unless the raw token is converted, the PullParser data accessors are not available: XML_PullParser_getAttributes, XML_PullParser_getText, XML_PullParser_getTextStripped, XML_PullParser_getChild, XML_PullParser_getChildren.

array _convertToken (array $token)
  • array $token
_createParser (line 794)

Initialize the XML Parser: Internal

  • return: (resource: xml parser handle, resource: file handle)
array _createParser (mixed $file)

Redefined in descendants as:
_endElement (line 739)

PHP XML Callback: Internal

void _endElement ( $parser,  $name)
  • $parser
  • $name
_externalEntityParser (line 764)
void _externalEntityParser ( $parser,  $openEntityNames,  $base,  $systemId,  $publicId)
  • $parser
  • $openEntityNames
  • $base
  • $systemId
  • $publicId
_getCurrentPosition (line 1113)
void _getCurrentPosition ()
_getCurrentSlice (line 3329)
void _getCurrentSlice ()
_getCurrentTagArray (line 3523)
void _getCurrentTagArray ()
_getTokenRaw (line 1228)

Returns the next available raw token and initializes a number of internal data structures.

Its return value cannot be used with the PullParser data access functions.

For full functionality, use XML_PullParser::XML_PullParser_getToken.

array _getTokenRaw ()
_is_current_NS (line 1039)

This is used to test whether an element or an attribute falls within the current namespace definition.

This is mainly for internal use, particularly as applied to elements. But the programmer can also use it to determine whether an attribute resides within the current namespace definition. First extract the namespace URI from the attribute's name using XML_PullParser_getNS_URI. Then pass the URI and the attribute's value into _is_current_NS() as a key=>value array:

     $uri = XML_PullParser_getNS_URI($name);
     if ($parser->_is_current_NS(array($uri => $value))) {
         // the attribute is in the current namespace definition
     }

boolean _is_current_NS (array $ns_array)
  • array $ns_array: a single-element array of the type key=>value, where the key is the namespace string and the value is either an attribute value or TRUE
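The membership test described for _is_current_NS() can be sketched as follows. The function `is_current_ns()` is hypothetical. It assumes the current definition is the "|"-separated string given to XML_PullParser_setCurrentNS and compares URIs exactly, which is why a missing trailing slash makes the test fail.

```php
<?php
// Hypothetical stand-in for the _is_current_NS() membership test:
// $ns_array is a one-element [namespace URI => value] array.
function is_current_ns(array $ns_array, string $definition): bool
{
    $uri = array_key_first($ns_array);  // null for an empty array
    return $uri !== null && in_array($uri, explode('|', $definition), true);
}

$def = 'http://room535.org/movies/title/|http://room535.org/movies/star/';
var_dump(is_current_ns(['http://room535.org/movies/star/' => 'lead'], $def)); // bool(true)
var_dump(is_current_ns(['http://room535.org/movies/star' => 'lead'], $def));  // bool(false): no trailing slash
```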
_markUnmarkedStartTags (line 1378)
void _markUnmarkedStartTags ( $el)
  • $el
_nullify (line 1575)

Helper function to XML_PullParser_childXCL

void _nullify ( $temp,  $pos)
  • $temp
  • $pos
_processToken (line 3170)

An internal function used to initialize a number of internal structures.

It is called by default by XML_PullParser_getToken.

void _processToken ()
_readData (line 820)

Get next chunk of data from XML Parser: Internal

void _readData ()

Redefined in descendants as:
_startElement (line 691)

PHP XML Callback: Internal

void _startElement ( $parser,  $name, [ $attrs = NULL])
  • $parser
  • $name
  • $attrs

Documentation generated on Thu, 07 Dec 2006 12:08:03 -0500 by phpDocumentor 1.3.0RC6