Small and Simple XML-Parser in REXX

Автор: Johannes Stelzer

Дата: 08.10.2008

Источник: Joshi's Blog

I've written a very small and simple XML-parser in Rexx.
The parser is based on the COBOL-way of parsing.

The parser queues XML-events (e.g. "START-OF-ELEMENT"), on the external queue. The calling procedure reads those events and processes them.

For example: xmltest.rexx


/* REXX */
parse arg infile
 call 'parseXML' infile

 do while queued() > 0
    parse pull xmlevent
    parse pull xmltext
    say  xmlevent : xmltext
 end
exit 0


The parser doesn't check the XML on being well-formed or against some xsd or dtd.
And it doesn't know namepaces...

But the code can be easily adapted to run under zOS without any dependency.
That's why i wrote it.

The Code:


/* Rexx xml parser
Copyright (c) 2008 Johannes Stelzer

This software is provided 'as-is', without any express or implied warranty.
In no event will the authors be held liable for any damages arising from
the use of this software.

Permission is granted to anyone to use this software for any purpose,
including commercial applications, and to alter it and redistribute it freely,
subject to the following restrictions:

   1. The origin of this software must not be misrepresented; you must not
      claim that you wrote the original software. If you use this software in
   a product, an acknowledgment in the product documentation would be
   appreciated but is not required.
   2. Altered source versions must be plainly marked as such, and must not be
      misrepresented as being the original software.
   3. This notice may not be removed or altered from any source distribution.
*/
   parse arg xmlfile
   call throwEvent 'START-OF-DOCUMENT', xmlfile

   do while lines(xmlfile) > 0
      xmlline = linein(xmlfile)
      call parseline xmlline
   end

   call throwEvent 'END-OF-DOCUMENT', xmlfile
exit 0

throwEvent: procedure
   parse arg xmlevent, xmltext
   queue xmlevent
   queue xmltext
return

parseLine: procedure
   parse arg xmlline

   do while pos('>',xmlline) > 0
      parse var xmlline vorher '<' token '>' +1 xmlline
      token = '<' || token || '>'

      if vorher <> '' then
         call throwEvent 'CONTENT-CHARACTERS', parseEntities(vorher)

      select
        when index(token,'<?') > 0 then do
              parse var token '<?' name '?>'
              call throwEvent 'PROCESSING-INSTRUCTION', strip(name)
        end

        when index(token,'<!--') > 0 then do
             parse var token '<!--' innen '-->'
             call throwEvent 'COMMENT', space(innen)
        end

        when index(token,'</') > 0 then do
             parse var token '</' innen '>'
             call throwEvent 'END-OF-ELEMENT', space(innen)
        end

        when index(token,'/>') > 0 then do
             parse var token '<' name attributes '/>'
             call throwEvent 'START-OF-ELEMENT', strip(name)
             call parseAttributes attributes
             call throwEvent 'END-OF-ELEMENT', strip(name)
        end
        otherwise
             parse var token '<' name attributes '>'
             call throwEvent 'START-OF-ELEMENT',  strip(name)
             call parseAttributes attributes
      end
   end

   if xmlline <> '' then
      call throwEvent 'CONTENT-CHARACTERS', parseEntities(xmlline)
return

parseAttributes: procedure
   parse arg attributes
   attributes = space(attributes)

   do while attributes <> ''
      parse var attributes name '=' '"' val '"' attributes
      call throwEvent 'ATTRIBUTE-NAME', strip(name)
      call throwEvent 'ATTRIBUTE-VALUE', parseEntities(val)
   end
return

parseEntities: procedure
   parse arg text
   out = ''

   do while text <> ''
      parse var text vor '&'+1 entity ';'+1 text
      if entity <> '' then do
         select
/* XML-entities */
            when entity = 'lt'     then entity = '<'
            when entity = 'gt'     then entity = '>'
            when entity = 'amp'    then entity = '&'
            when entity = 'apos'   then entity = "'"
            when entity = 'quot'   then entity = '"'
            otherwise
                 entity = '&' || entity || ';'
         end
      end
      out = out || vor || entity
   end
return out