Advanced REXX Programming Topics

Author: Charles Daney

The entire contents of this paper are Copyright (c) 1994-5 by
Charles Daney - All Rights Reserved. Permission is granted to disseminate
this paper electronically as long as no changes are made. The paper may
not be republished in hard copy form without explicit permission from
the author, whose email address is 75300.2450@compuserve.com.

This paper, future updates (if any), and much additional information
on the REXX language are available from Quercus Systems Web pages -
https://www.quercus-sys.com.

Introduction
------------

REXX is one of the great under-appreciated treasures of OS/2. Although
REXX as a programming language is now 15 years old and REXX has been
implemented on just about every major computer platform, many OS/2 users
will be encountering it for the first time. And those who encounter it
for the first time in OS/2 are often introduced to REXX as just a
modern replacement for the primitive Microsoft-designed "batch" language.

In fact, REXX is a very good "batch" or "procedure" language, in that it
can be used to automate repetitive sequences of operating system
commands that need to be used together in a "batch". But REXX brings all
of the features of a full-fledged programming language to this task -
variables, arithmetic, input/output, and control structures. These
features make it possible to do things easily in REXX which could be
done in the old batch language (if they could be done at all) only with
very arcane and convoluted techniques. Many computer columnists and book
authors at one time wrote endless streams of words describing yet another
clever trick to make the batch language perform tasks it simply wasn't
designed to do - tasks that are almost trivial in REXX.

Many OS/2 users are introduced to REXX for the first time when they come
across a simple REXX procedure to manipulate features of the Workplace
Shell - such as creating folders, adding background bitmaps, modifying
OS/2 configuration parameters, or changing system fonts. Other users
encounter REXX for the first time as a tool for writing installation
programs for various applications. These are all excellent examples of
REXX used as a "batch" language.

Unfortunately, this usage of REXX may have obscured the fact that, since
REXX is a complete programming language, it can actually do much more.
Recently, several tools for visual programming have become very popular
in OS/2 - VX-REXX from Watcom, VisPro/REXX from Hockware, and GpfRexx
from Gpf Systems. As the product names imply, each of these products uses
REXX as its underlying development language. So OS/2 users who first
encountered REXX as a better batch language also discovered that it is
the foundation of powerful tools for visual programming.

It is possible, using tools like these, for people with little detailed
knowledge of GUI programming (or perhaps of any kind of programming) to
quickly learn to develop professional-looking applications that use the
full toolbox of OS/2 Presentation Manager elements, such as dialogs,
push buttons, list boxes, notebooks, and containers. The user interface
builders do most of the work of automatically generating REXX code to
create the desired interface. It is necessary for the developer to write
only a relatively small amount of REXX code to create a customized
application. This makes it natural to use REXX for personal programming
or application prototyping in a GUI environment.

But there are even more surprises for OS/2 users as far as REXX is
concerned. In fact, OS/2 users are just now discovering that many OS/2
applications have adopted REXX as their primary or alternate "scripting"
language. The list of such "REXX-enabled" applications now includes:

    Text editors
    ---- -------
    KEDIT (Mansfield Software)
    SPF/2 (Command Technology)
    Tritus SPF (Tritus)
    EPM (IBM)
    SourceLink (One Up Corporation)

    Communication software
    ------------- --------
    REXXTERM (Quercus Systems)
    PMCOMM (Multinet)
    TE/2 (Oberon Software)
    Extra! (Attachmate)
    Communications Manager (IBM)
    TCP/IP (IBM)

    Word processors
    ---- ----------
    Ami Pro (Lotus)
    DeScribe (DeScribe)

    Database tools
    -------- -----
    DB2/2 (IBM)
    dbfREXX (dSoft Development)
    REXXBASE (American Coders)
    QELIB (Q+E Software)
    XDB-QMT (XDB Systems)

    Other
    -----
    1-2-3 (Lotus)
    MMPM/2 (IBM)
    Deskman/2 (Development Technologies)
    Fax/PM (Microformatic)
    Chron (Hilbert Computing)

Clearly something is going on here - REXX is being adopted as a
universal macro language by sophisticated OS/2 applications. The most
obvious benefit of this is that users no longer need to learn a new
language to control each application - one common language can do it
all. In the past, the very term "macro language" has been daunting to
many users because such languages have tended to be obscure and
difficult to use since they were designed with only a limited purpose in
mind. And this is on top of the fact that they were different for each
application. REXX makes a big improvement here. Although REXX does have
a real learning curve, the language is still fairly natural and
intuitive. Most importantly, once it has been mastered, one doesn't have
to learn it all over again for a new application.

There is a second major advantage to having REXX as a universal macro
language. This is the fact that it puts REXX in the position of being
able to communicate with any REXX-enabled application. And hence, every
such application can communicate with any other one through REXX. So
REXX becomes a sophisticated, programmable, interprocess communication
tool.

The bottom line of all this is that REXX is not just a better batch
language, or a tool for visual programming and rapid application
development, or a universal macro language. It is all of these things at
the same time, so perhaps the best way to think of REXX is as a gateway
to most of the facilities of OS/2 and among all REXX-enabled
applications. REXX gives programmers access to most OS/2 services, like
the Workplace Shell, multimedia, interprocess communication, and the
file system. But because REXX is also accessible from applications, it
makes all these services available to the applications, and it makes
the services of each application available to others. A REXX-aware
application does not (necessarily) need to provide its own support
for multimedia, serial communications, or database services, because
it can utilize REXX scripts to get at these capabilities. REXX can
be thought of as "application glue".

With all this as prologue, it is not difficult to understand why so many
OS/2 users have already taken the time to become at least passingly
familiar with REXX (and many more will in the near future). Now, REXX
was designed to be easy to use, but there's no point in pretending it
is effortless, since it isn't. Prospective users of REXX should be
prepared for a learning curve and should allow some time to get
reasonably proficient with it - anywhere from a few days to a few weeks
depending on the individual and his or her background. The good news is
that, once proficiency is gained, the investment can be reused again and
again because REXX is so versatile.

In the rest of this paper, we will assume that the reader has already
attained some level of comfort in understanding and using REXX, because
the purpose here is to provide an introduction to some advanced topics
in REXX. Just as with the initial learning of the language, one will be
repaid over and over by the mastery of some of these advanced
techniques, because they are (potentially) equally useful in a complex
VX-REXX based application, an Ami Pro macro, or a personal utility
batch program.

Just in case you don't feel you already have a good grounding in the
fundamentals of REXX, we'll mention a few resources you can turn to for
review. (See the Bibliography for full details.) First, online
documentation for REXX comes with OS/2. But hard copy, including a
tutorial that isn't online, can be purchased from IBM. _The REXX
Language_ by Mike Cowlishaw (who invented the language) is a very good
language definition and complete reference on the rules. (A lot of IBM's
own documentation is straight out of this book.) The present author's
book, _Programming in REXX_, is recommended for readers who want a more
detailed explanation of REXX concepts and general programming
techniques. These two books deal with REXX in general and do not refer
to OS/2 specifically.

There are already several books that do cover REXX explicitly from an
OS/2 perspective. Two of the best are the _OS/2 2.1 REXX Handbook_ by
Hal German and _Application Development Using OS/2 REXX_ by Tony Rudd.
Finally, every even halfway serious REXX programmer should have a copy
of the _REXX Reference Summary Handbook_ by Dick Goran. This last is not
a textbook but rather a handy reference summary filled with the
essential information about REXX, several REXX function libraries, and
details of parameters used to control the Workplace Shell.

Data Structure, Program Structure
---- ---------- ------- ---------

It's possible to become proficient with REXX simply by learning the
relatively simple syntax rules of the language, studying a few good
sample programs, and proceeding to actually use REXX in creating
prototypes or full applications that are useful to yourself personally
or to your company. Any one of the visual REXX programming tools
mentioned before can be highly recommended to assist in the learning
process. They do a lot of the work for you and teach you many of the
standard REXX rules and techniques as you go. They also include
their own debuggers, which can show you exactly how REXX programs
operate.

However, because REXX is a full and complete programming language,
becoming really good at it requires that you learn a few key concepts as
well. Some of these concepts are shared by other programming languages,
but there are others that are more or less unique to REXX itself.

The concepts we want to focus on have to do with data structure and
program structure. In both cases, there are significant differences in
how these structures are used in REXX as compared to other languages.

Data structures have to do with how program data is organized. Like
other languages, REXX has variables. But REXX is rather different in how
it organizes collections of data. Unlike most languages, REXX does not
have arrays in the usual sense, but it does have a much more powerful
structure called "compound" variables. Although such compound variables
can be a little more awkward to work with than ordinary arrays, they can
also do a lot more. A compound variable is something like an array that
may have non-numeric subscripts. Sometimes this is called an
"associative" array, because data items can be retrieved by their
"association" with other data items. Another way to think of this is
that it's like looking up a record in a database on the basis of some
key value, e. g. a person's name or a book's title.

Another thing that is definitely lacking in REXX is the notion of a
"structure" in the narrow sense of a C structure or Cobol record. This
can be a major stumbling block for would-be advanced users of REXX.
C structures are often used for one of two things. They may
represent "records", which are collections of related data, such as
information related to a particular employee. There are relatively
simple (though somewhat clumsy) ways to simulate records in REXX using
compound variables.

C structures can also be employed to build up more complex data objects
using techniques involving linked lists, trees, and so forth. Such
intermediate level data structures can in turn be used to represent
fairly high-level data abstractions like sets, collections, tables, and
the like. It becomes rather awkward to use REXX compound variables to
simulate linked lists directly, and even harder to use these to
implement the high-level abstractions.

But fortunately, it is often possible to succeed by rethinking the whole
problem in terms of native REXX facilities. Many high-level abstractions
can be handled in REXX without working with lower-level representations
at all. A lot of the discussion here will explain how to do this.

When we come to discuss program structure here we will not say very much
about traditional concepts of "structured" programming - loops, subroutines
and functions, handling of alternative cases. The REXX facilities in this
respect are largely similar to what exists in other languages like C or
PL/I. Instead, we will look at how REXX applications are organized in
one or more files and how these semi-independent pieces can communicate
and share data. This is an issue that often does not even arise with other
languages, where programs are traditionally all linked into a single
executable file.

REXX applications, in contrast, often consist of separate program units
that are never explicitly linked together at all. Sometimes the relation
between such units is the simple one of caller and callee, in much the
same way that subroutines and functions can be used within a single
source file. But sometimes the relation can be more complicated,
particularly in OS/2 with its sophisticated multithreading abilities.
The relationship of different program parts can become even more complex
when visual REXX programming tools are used, since there may be one or
more pieces of REXX code associated with each event that can occur for a
specific interface element - such as a button press.

Keeping persistent data associated with a single REXX program, and
sharing data between independent REXX programs, can become a very tricky
problem, but a solvable one. And this is the other issue we plan to
address here.

There is, moreover, a significant interaction between data structure and
program structure. Specifically, you have to consider how to represent
data in REXX not only in light of how it will be used internally, but
also in terms of how it may be shared between independent REXX programs.

Compound variables
-------- ---------

Arrays are the most commonly used data structure in programming.
Although REXX does not have arrays as such, compound variables can
usually be used like an array, although with some occasional
syntactical awkwardness.

To review, a compound variable is one whose name is derived from a
compound symbol, that is, a symbol which begins with a legal symbol
character other than a digit or a period, contains at least one period,
and has at least one character following the first period. The following
are all legal compound symbols:

    array.i
    restaurant..address
    a.b.c.d.e.f.g.h.i.j.k.l.m.n.o.p.q.r.s.t.u.v.w.x.y.z

The part of a compound symbol up to and including the first period is
called the "stem". It is always used literally. All other simple symbols
occurring in a compound symbol, i. e. the symbols delimited by the
periods, are replaced with their current values in forming the name of a
compound variable. In this respect they are analogous to the subscripts
of arrays in other languages. Indeed, it is possible to think of a
symbol like A.i.j.k as equivalent to an array element, which would be
expressed as A[i][j][k] in C.

There are, however, significant differences between such "arrays" in
REXX and arrays in other languages. Some of these differences represent
advantages of REXX, but others are disadvantages. On the positive side,
because of REXX's dynamic memory management, it is never necessary to
declare in advance how large an array will be. It simply grows as
needed, and (usually) does not consume storage for "unused" elements of
the array.

In fact, a REXX array does not even have a specific "dimension" like an
array in other languages. This is because the periods in a compound
symbol have syntactic meaning only in the symbol. Once the name has been
derived by substituting all values of simple symbols, there are really
only two parts to it: the stem and the "tail". For instance if we have

    i = 3
    j = 4

then the symbol A.i.j actually consists of just the stem, which is "A."
and the tail, which is "3.4". At this point, the fact that the tail
still contains a period is irrelevant. So, if we also have

    x = 34
    y = 10
    z = x/y

then the symbol A.z refers to exactly the same piece of data, because z
has the value "3.4". Useful programs can actually be written that take
advantage of this ambiguity of the "dimension" of a REXX array.
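
The equivalence is easy to check for yourself. The following is a
minimal sketch; the stored string and the extra variable "tail" are
arbitrary choices made only for this illustration:

    /* Two different-looking symbols that name the same variable */
    i = 3
    j = 4
    A.i.j = 'stored through A.i.j'

    x = 34
    y = 10
    z = x/y                     /* z has the value "3.4" */
    say A.z                     /* displays: stored through A.i.j */

    tail = '3.4'                /* any variable holding "3.4" will do */
    say A.tail                  /* displays the same string */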

The tail of a REXX variable can consist of completely arbitrary data,
including blanks and unprintable ASCII characters. In contrast to the
usual situation in REXX, blanks are significant in a variable tail. Thus
if

    x = ""
    y = " "
    z = "  "

then A.x, A.y, and A.z refer to three completely different data items,
even though x, y, and z are "equal" when compared with the normal
comparison operators. This is another respect in which REXX "arrays"
differ from those in other languages.
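
A short experiment makes the point. This is only a sketch, and the
values stored in each element are invented for the illustration:

    /* Blanks are significant in a tail */
    x = ""
    y = " "
    z = "  "
    A.x = 'null tail'
    A.y = 'one blank'
    A.z = 'two blanks'

    say (x = y) (y = z)         /* displays: 1 1 (normal comparison) */
    say (x == y)                /* displays: 0   (strict comparison) */
    say A.x                     /* displays: null tail  */
    say A.y                     /* displays: one blank  */
    say A.z                     /* displays: two blanks */

The strict comparison operator (==) treats the three values as
different, just as the variable tails do.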

In some ways, however, the REXX array notation is not as powerful, or at
least as convenient, as the notation of other languages. In particular,
it is not possible to have expressions in a REXX "subscript". For
instance, A.i+j is the sum of A.i and j, instead of an array element
with the subscript i+j. Even parentheses cannot be used to circumvent
this problem, since A.(i+j) is actually a function call to a function
named "A.".

This notation is usually the most inconvenient when you simply want to
use another compound symbol as a subscript. So if i.j is a value you
wish to use as a "subscript", you cannot just refer to A.i.j. You must
assign i.j to a simple variable first:

    x = i.j
    say A.x
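
The same technique handles an arithmetic expression in a "subscript".
As a minimal sketch (the variable k and the sample data are chosen just
for this illustration), evaluate the expression into a simple variable
and then use that variable in the tail:

    /* Using a computed value as a "subscript" */
    i = 2
    j = 3
    A.5 = 'fifth element'

    k = i + j                   /* evaluate the expression first */
    say A.k                     /* displays: fifth element */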

Despite these syntactical inconveniences, the great power of REXX's
notation lies in the fact that "subscripts" can be non-numeric. This
allows you to build data structures which easily associate data values
with data names. Suppose, for instance, that you want to work with a
database of books. In REXX you can do this by having a number of arrays,
each of which is subscripted by the name of the book. The names of these
arrays might be "author", "date", "publisher", "ISBN", and so forth.
Then if the name of a particular book is stored in the variable "title",
you can retrieve all of the other information directly by referring to
author.title, publisher.title, etc. Because of this direct association
from a name to a value, such data structures are sometimes called
"associative arrays".

As far as the language user is concerned, there is no search process at
all involved in looking up the author of a given book. In reality, of
course, REXX does need to do a search to find each piece of data. The
advantage is that this search process is all built-in and transparent to
the user.
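
A minimal sketch of such a lookup follows. It uses the stem names
"author." and "publisher." suggested above; the two sample titles are
simply illustrative data:

    /* Associative lookup keyed by book title */
    title = 'Programming in REXX'
    author.title    = 'Charles Daney'
    publisher.title = 'McGraw-Hill'

    title = 'The REXX Language'
    author.title    = 'Mike Cowlishaw'

    title = 'Programming in REXX'
    say author.title            /* displays: Charles Daney */
    say publisher.title         /* displays: McGraw-Hill   */

No loop or explicit search appears anywhere in the program.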

To continue with the example, the collection of variables indexed by the
book name is, in effect, a data structure much like a table. The rows of
the table are labelled by book names, and the columns of the table have
labels like "author", "publisher", etc. This is very typical of a more
complex data structure in REXX: while it's conceptually a single object,
it is actually composed of a number of REXX compound variables that have
been "subscripted" in the same way.

Concretely, you would select appropriate compound variable stem names
for each "column" of the table. For example:

    book_author.
    book_publisher.
    book_date.
    book_isbn.

Perhaps, if you do some programming in C where case is significant, you
might prefer to use:

    BookAuthor.
    BookPublisher.
    BookDate.
    BookISBN.

Just remember that case is ignored in REXX stem names, so that
"bookauthor." (for example) is the same name as "BookAuthor.".

In practice, you will have data on a number of books, and before you
can use it in a program, it has to be loaded from somewhere. The data
may normally be kept in a flat file, for instance. Unless you are
importing the data from another source that has already determined a
file format, you are relatively free to structure the data file any way
you want. Let's say you decide to identify each data element with a
tag so that your file contains lines like this:

    Title:          Programming in REXX
    Author:         Charles Daney
    Publisher:      McGraw-Hill
    Date:           1992
    ISBN:           0-07-015305-1

You may put some sort of delimiter between the lines corresponding
to a single book. Or you may just assume that every time you find a
line beginning with "Title:" it starts the data for a new book. Then
you could use this code to read in the data:

    do while lines(input) \= 0              /* until end of file */
        parse value linein(input) with label ':' data
        data = strip(data)                  /* drop surrounding blanks */
        label = translate(label)            /* fold label to upper case */
        select
            when label = 'TITLE' then
                title = data                /* key for the lines below */
            when label = 'AUTHOR' then
                book_author.title = data
            when label = 'PUBLISHER' then
                book_publisher.title = data
            when label = 'DATE' then
                book_date.title = data
            when label = 'ISBN' then
                book_isbn.title = data
            otherwise nop                   /* ignore unknown labels */
            end
        end
    call lineout input                      /* close the file */

This example is fairly straightforward, but we will come to some ways to
simplify it later. As an aside, note the use of the STRIP function to
remove leading and trailing blanks. We did not assume that the "data"
part of the record was a single word, since we may very well want to
have embedded blanks. And we used the TRANSLATE function to make sure we
were working with the label in upper case. Also note that this example
(as with most of the rest) has minimal error checking. We don't check
that the labels are valid, for instance, except to provide an OTHERWISE
case that does nothing in the SELECT statement.
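
If you did want to check the labels, one possibility is a separate pass
over the file using the WORDPOS function. This is only a sketch under
some assumptions: the file name "books.dat", the variable "valid", and
the wording of the message are all invented for the illustration.

    /* Sketch: report labels that are not recognized */
    input = 'books.dat'         /* hypothetical data file name */
    valid = 'TITLE AUTHOR PUBLISHER DATE ISBN'
    do while lines(input) \= 0
        parse value linein(input) with label ':' data
        label = translate(strip(label))
        if label = '' then      /* blank line or missing label */
            iterate
        if wordpos(label, valid) = 0 then
            say 'Unrecognized label ignored:' label
        end
    call lineout input          /* close the file */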

Since a table is a typical sort of two dimensional array, you may be
wondering whether it could be represented by using a single compound
variable with two "subscripts". You might think of using a stem "book."
where one subscript is one of the column labels such as "author"
while the other is the book name. In other words, a table entry would be
referred to as "book.field.book_name", where "field" would be a variable
with a value like "AUTHOR".

There are several pitfalls with this approach, which is why we didn't
suggest it to begin with. The first is the fact that you will frequently
want to refer to the elements of a particular column by putting the
column name in explicitly:

    say book.author.title

in order to display the author of a book if you were given a title. This
would actually work, provided there is no variable named "author" that
has been assigned a value, since the default value of an unassigned
variable is its own name in upper case ("AUTHOR"). However, if some time
previously you had

    author = "William Shakespeare"

then you would be attempting to reference a compound variable whose tail
begins with "William Shakespeare" instead of "AUTHOR".

There are various ways around this problem. For instance, you can always
use an extra variable in the symbol:

    field = 'AUTHOR'
    say book.field.title

But this is cumbersome, and you also have to be very careful about
alphabetic case, which is significant in the tail of a compound
variable. I. e.

    field = 'Author'
    say book.field.title

wouldn't work unless you always use 'Author' instead of 'AUTHOR'.
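
The following sketch pulls these pitfalls together. The title "Hamlet"
and the value 'someone' are invented for the illustration. When a
compound variable has never been assigned, REXX displays its derived
name, which makes the mismatches easy to see:

    /* Pitfalls of a literal-looking name in a tail */
    title = 'Hamlet'
    field = 'AUTHOR'
    book.field.title = 'William Shakespeare'

    say book.author.title       /* AUTHOR is unassigned, so this   */
                                /* displays: William Shakespeare   */
    field = 'Author'
    say book.field.title        /* displays: BOOK.Author.Hamlet    */

    author = 'someone'
    say book.author.title       /* displays: BOOK.someone.Hamlet   */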

Note that one thing you can't do is to use a literal in the compound
variable name:

    say book.'AUTHOR'.title

doesn't work, since the expression following SAY is the concatenation
of (the value of) book., the literal 'AUTHOR', and ".TITLE".

Even if you are careful never to assign anything to "author", there is a
performance penalty with using it, because REXX will try to look up a
value anyway. One thing you could do to get around that is to pick
somewhat odd names for the columns, like '0AUTHOR'. This is actually
legal and will work, and because a symbol beginning with a digit is a
constant, REXX will not attempt any unwanted substitution:

    say book.0author.title

but it certainly isn't elegant.

There is perhaps just one thing that can be said for the path we have
been exploring, where we use just one stem name for this collection
of data: you can refer to the whole collection more compactly when you
want to pass its name to a subroutine, or use it with EXPOSE or DROP.
E. g.

    drop book.

makes the whole collection undefined, whereas otherwise you would need
to be much more verbose:

    drop book_author. book_publisher. book_date. book_isbn.

and it could be a lot harder to maintain code that uses names in
this form, since you must make a lot of explicit references to all of
the columns of the table.

However, there is one thing that can be done to ameliorate this
problem a little. REXX allows you to use a whole list of names with
DROP and EXPOSE if you assign the list to another variable:

    book_stuff = 'book_author. book_publisher. book_date. book_isbn.'
    drop (book_stuff)
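
The same kind of list works with EXPOSE. Here is a minimal sketch; the
subroutine name "show_author" is invented for the illustration. Note
that each name in the list ends with a period, so that it refers to a
whole stem rather than to a simple variable:

    /* Exposing a group of stems through a list variable */
    book_stuff = 'book_author. book_publisher. book_date. book_isbn.'
    title = 'Programming in REXX'
    book_author.title = 'Charles Daney'
    call show_author title
    exit

    show_author: procedure expose (book_stuff)
        parse arg title
        say title 'is by' book_author.title
        return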

But it's clumsy even so. Still, there is hope. It is possible to avoid
having to refer explicitly to every "column" of the table in some cases.
Earlier we said there was a more compact way to write the code for
reading in the book data. It is based on the fact that the stem part of
a compound variable name can be a variable or computed string if we use
the VALUE function. Here is how the code to read in book data might look
with this approach:

    do while lines(input) \= 0
        parse value linein(input) with label ':' data
        if label = '' then
            iterate
        data = strip(data)
        label = translate(label)
        if label = 'TITLE' then
            title = data
        else
            call value 'book_'label'.title', data
        end
    call lineout input

This is a lot more compact, and doesn't need to be changed if arbitrary
new types of book information (i. e. columns) are added. VALUE is a
tricky function to understand, but it's very convenient once you get the
hang of it. If it isn't clear, try running the code using TRACE R to get
a feel for what is happening here. (Note that we also test for an empty
label, which skips blank lines that might occur at the end of the file
or elsewhere.)
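
It may help to see VALUE in isolation first. In this minimal sketch
(the sample data is for illustration only) the same computed name is
used once to assign a value and once to retrieve it:

    /* VALUE with a computed stem name */
    title = 'Programming in REXX'
    label = 'AUTHOR'

    /* two arguments: assign a new value to the named variable */
    call value 'book_'label'.title', 'Charles Daney'

    say book_author.title               /* displays: Charles Daney */

    /* one argument: just retrieve the current value */
    say value('book_'label'.title')     /* displays: Charles Daney */

The string passed to VALUE is "book_AUTHOR.title". VALUE treats it as a
compound symbol, so the current value of "title" is substituted in the
tail in the usual way.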

There is another, entirely different, difficulty with this table
structure as we have outlined it so far. As we have presented it, the
code and data structure are well-designed for taking a complete book
title and retrieving information about it, such as the book's author,
publisher, etc. All you have to do is reference the data using the
book title as an associative key.

You can even tell easily whether a new book to be added to the database
is already included:

    if symbol('book_author.title') = 'VAR' then
        say 'Already in database:' title

But it is an altogether different matter if you want to use the data
some other way. Perhaps you want to list all books in the database that
have a certain author or a certain character string in their title. Here
we see a major disadvantage of non-numeric subscripts in REXX: there is
no simple way (as there is with numbers) to iterate through "all values"
of the subscript.

This is a problem that arises repeatedly when using non-numeric
subscripts. There just isn't any way to find, in standard REXX, all the
variables having a given stem. (Ironically, programs written in other
languages that interface to REXX can do this through the shared variable
interface.) But there are ways around the problem. One way to accomplish
this objective is to make a compromise with the conceptual simplicity of
pure associative arrays and re-introduce numerically subscripted arrays.

In our example, what we do is use the stem book_title. to store book
titles. At the same time we build the rest of the table, we also set
book_title.i to the title of book number i. Then, in any circumstance
where we have to search through the whole list of books, we iterate
through values of book_title.i for i=1 to whatever the largest book
number is. Using the name stored in book_title.i we can then retrieve
any of the other information, since it is "subscripted" with that name.

Here's how we might write the code to read in the data with this
technique:

    count = 0
    do while lines(input) \= 0
        parse value linein(input) with label ':' data
        if label = '' then
            iterate
        data = strip(data)
        label = translate(label)
        if label = 'TITLE' then do
            count = count + 1
            book_title.count = data
            title = data
            end
        else
            call value 'book_'label'.title', data
        end
    call lineout input
    book_title.0 = count

Notice that we set the .0 element of the compound variable to the
number of elements in the "array". This is a very common convention
one finds in REXX, though it isn't an official part of the language.
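
As a minimal sketch of the convention (the stem "list." and the sample
data are chosen just for this illustration), here is how such an array
might be built, extended, and processed:

    /* The ".0 element holds the count" convention */
    list.1 = 'KEDIT'
    list.2 = 'SPF/2'
    list.3 = 'EPM'
    list.0 = 3

    n = list.0 + 1              /* append one more element */
    list.n = 'SourceLink'
    list.0 = n

    do i = 1 to list.0
        say i':' list.i         /* first line displayed: 1: KEDIT */
        end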

Now we have a means of looking up book information either directly
through the book name or by processing the whole table of information
sequentially. But there is yet another problem to consider. It depends
on the details of how REXX compound variables are actually implemented.
Most implementations store the data in binary trees. For processing
efficiency, each node of the tree contains the string value of the
"subscript". If the same string is used as a subscript on many
variables, it may therefore appear many times in storage. With names
that are long (such as the names of books), this can waste a lot of
storage.

One last variation of the sample code fixes this:

    count = 0
    do while lines(input) \= 0
        parse value linein(input) with label ':' data
        if label = '' then
            iterate
        data = strip(data)
        label = translate(label)
        if label = 'TITLE' then do
            count = count + 1
            index.data = count
            end
        call value 'book_'label'.count', data
        end
    call lineout input
    book_title.0 = count

Again the solution is to make another compromise and use numeric
subscripts on most arrays. Since each book is associated with a unique
index (the index in the "book_title" array), we might just as well use
this numeric subscript to index all of the individual columns in the
table. Then, to retain the ability to do associative lookups we
introduce one more array, which will be the only one actually
subscripted with the book title. We might call this array "index", and
provide that index.title is the common numeric subscript for all the
other arrays (the table's columns) which hold particular kinds of data
about books. This data now includes the book's title, and each row is
instead labeled by a number.

Then it will be true that if i = index.title, book_title.i = title. And
all the other information is referenced as book_author.i,
book_publisher.i, etc. So it is still possible to do associative
retrieval. Suppose we want to display all information about a book in
response to a query. We might use the code fragment:

    i = index.title
    say 'Data for' title':' 'Author='book_author.i',',
        'Publisher='book_publisher.i',' 'Pub. date='book_date.i

But now it is also very easy to produce a report on all books in the
database:

    do i = 1 to book_title.0
        say "Book:" book_title.i', Author:' book_author.i
        end

Or you could insert any selection logic you want into the loop if
you need to limit the search based on date or publisher or whatever. For
instance, if you want only books that contain a certain phrase in the
title:

    do i = 1 to book_title.0
        if pos(phrase, book_title.i) = 0 then
            iterate
        say "Book:" book_title.i', Author:' book_author.i
        end

That's not a bad solution. We now have the means to do both associative
retrieval and sequential search in our "database". Notice that our
database is kept entirely in memory in REXX variables, using natural
REXX data structures. Unless the database is very large (perhaps a
megabyte or more), this isn't a problem in OS/2.

Even so, we might wonder whether it isn't possible to do better. It
seems at least a little inelegant to have to do a sequential scan of the
database every time we want to find something that has not been
explicitly indexed by keeping a separate array. Particularly since
complex conditions may run somewhat slowly. But it's not really possible
to do much better with standard OS/2 REXX.

There are, however, third-party extensions to REXX that are available for
adding all kinds of new capabilities to the language. One of these, Quercus
Systems' REXXLIB, has a number of functions for working with REXX arrays
and compound variables.

One of the functions, called CVTAILS, is capable of scanning all the tails
of a given compound variable searching for matches on a particular string.
Using CVTAILS, the above loop would reduce to:

    call cvtails 'index.', 'list.', phrase
    do n=1 to list.0
        title = list.n
        i = index.title
        say "Book:" book_title.i', Author:' book_author.i
        end

What CVTAILS does is to construct another array (in the 'list.'
variable) whose values are the selected titles. Although we
still need a loop, it runs only over the list of actual "hits" we found
in order to display them.

In order to understand this example, recall that the index. compound
variable was "subscripted" by actual book titles, and the value of
each item was the numeric index in a "conventional" array.

CVTAILS could have been used without a search phrase to actually
generate the list of all tails of a compound variable without having
had to set this up explicitly. (In the present case, this corresponds
to the "book_title." array.)

Let's consider a slightly different problem. What if we wanted to find
all the book titles by a particular author? We could do:

    do i = 1 to book_title.0
        if book_author.i \= name then
            iterate
        say "Book:" book_title.i', Author:' book_author.i
        end

All we've changed is to search a different column of the table. Can this
be done with CVTAILS? No, because the author name was not kept in an
index variable. But there is another function, CVSEARCH, in REXXLIB that
will do what we want:

    call cvsearch 'book_author.', 'list.', name
    do n=1 to list.0
        i = list.n
        say "Book:" book_title.i', Author:' book_author.i
        end

Suppose we wanted to get fancier and produce a report of books sorted
alphabetically on the title. This could be a challenging exercise, since
REXX does not have a built-in sort routine. But let's suppose for a minute
that it did, called ARRAYSORT. This takes a REXX array (i. e. a compound
variable indexed from 1 to whatever) and sorts it based on the value of
each item. Or more generally, on one or more subfields within the value.
One could write such a function in REXX itself, though it is not a trivial
exercise, and the performance would probably be a little slow.
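
To give an idea of what is involved, here is a minimal sketch of a
plain insertion sort written in REXX. It is not the ARRAYSORT being
supposed here: the name sort_list is made up, it works only on the one
stem (list.) that it exposes, it sorts on the whole value rather than
on subfields, and its running time grows with the square of the number
of items:

    /* hypothetical sample data, for this sketch only */
    list.1 = 'pear'
    list.2 = 'apple'
    list.3 = 'fig'
    list.0 = 3
    call sort_list
    do i = 1 to list.0
        say list.i
        end
    exit

    /* insertion sort of list.1 through list.n, with n in list.0 */
    sort_list: procedure expose list.
    do i = 2 to list.0
        item = list.i
        j = i - 1
        do while j >= 1
            /* strict comparison: character by character */
            if list.j <<= item then
                leave
            k = j + 1
            list.k = list.j         /* shift larger item up */
            j = j - 1
            end
        k = j + 1
        list.k = item
        end
    return

The test on list.j is kept inside the loop body rather than combined
with "j >= 1", because REXX evaluates both operands of "&" and we do
not want list.0 (the count) to be compared as if it were data.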

We still have to adapt such a routine to be used with the way we have
stored the data. Suppose you just did:

    call arraysort 'book_title.'

in order to rearrange the 'book_title.' array. The problem is that this
sorts only one column of the table. All other columns of the table would
be unaffected, so all connection between titles and the other
information would be lost. The relevance of this to the issue of exactly
how we choose to store the data is that if we had continued to subscript
all columns by the actual title there would not have been a problem. To
reiterate, you have to consider how you will use the data at the time
you decide how it will be stored.

But there are relatively simple expedients that can be employed if you
decide you want to continue storing the data in numerically-subscripted
arrays. You could, for instance, make a copy of the array of titles, and
sort the copy:

    do i = 0 to book_title.0
        sorted_book_title.i = book_title.i
        end
    call arraysort 'sorted_book_title.'

REXXLIB contains two functions, ARRAYCOPY and CVCOPY, either of which
could be used to make the copy without using a loop:

    call arraycopy 'book_title.', 'sorted_book_title.'
    call arraysort 'sorted_book_title.'

Then you could iterate through 'sorted_book_title.' and retrieve the
proper subscript for each column using the 'index.' array:

    do n = 1 to sorted_book_title.0
        title = sorted_book_title.n
        i = index.title
        say "Book:" title', Author:' book_author.i
        end

But there are additional alternatives. For instance, you could construct
a new array that contains both the title and its original numeric index:

    do i = 1 to book_title.0
        sorted_list.i = right(i, 5) || book_title.i
        end
    sorted_list.0 = book_title.0
    call arraysort 'sorted_list.',,, 6

The extra argument to ARRAYSORT is the position in the string at which
sorting is to begin. It skips over the first 5 positions which contain
the index number.

Then to use this array:

    do n = 1 to sorted_list.0
        parse var sorted_list.n i 6 title
        i = strip(i)    /* drop the blank padding added by RIGHT */
        say "Book:" title', Author:' book_author.i
        end

There are many other techniques that could be used as well, and the
choice of which to use depends on the nature of the problem, or your own
personal taste. It can be hard to predict ahead of time which methods
will perform best. So if this matters, the best advice is to actually
benchmark different approaches.

Incidentally, ARRAYSORT is another function that is included in REXXLIB.

We're going to move on soon to the other important "structure" issue in
REXX programming, namely overall program structure. But first let's look
at another data structure issue that turns out to be relevant to program
structure.

Up until now we have been supposing that our book database is stored in
a flat ASCII file. It could just as well have been stored in a more
conventional database file maintained by DB2/2 or some other database
manager. If we used a conventional database manager, then many of the
issues of data retrieval and sorting might be handled by the database
manager itself. This would be convenient, since we wouldn't have to
program the operations in REXX.

However, there are drawbacks to using a conventional database manager.
For instance, a DBMS can be difficult to set up and use. Installation
alone takes time, and then there is the problem of defining the files
to be used and different record structures for each application. And,
if you want to distribute your application to others, you need some
assurance that others have the database software - perhaps you will
have to supply it to them (which can get expensive).

If you are using one of the REXX GUI application builders, this job may
be easier, since they all have various tools for using a number of OS/2
database management systems.

However, we think that in fact a very large number of "database"
applications can be programmed solely in REXX without an extra DBMS,
simply by using the techniques presented here. Any time you have a set
of data - whether it is about books, or people, or your multimedia
CD-ROM collection - you have a database. This collection of data can be
kept entirely in memory while you are processing it (if it isn't too
large) by representing it in REXX variables. In other words, the
aggregate of all of the variable values used in a REXX program (or suite
of programs) constitutes a database.

Of course, you need some way to make this data persistent, so that it
endures beyond the execution of any associated program, even though it
may (and often will) be modified in part by the program. And this is
certainly one thing that a conventional DBMS would do for you. But
suppose we don't want to use a DBMS. Are there alternatives in REXX
itself to storing everything in flat files?

Of course there are. For instance, we could store the data right inside
the program. Returning to our book database example, let's first
consider how we would do it in another language - say C. In the first
place, C has real "structures". You might have this to represent a
single book record (bear with us if you don't know C):

    struct book_record {
        char *title;
        char *author; };

(We'll omit other parts of the record for brevity.) You could then
define all your data as a table of such structures:

    struct book_record book_table[] = {
        { "Star Maker", "Olaf Stapledon" },
        { "Brightness Falls from the Air", "James Tiptree" } };

How does one do a similar thing in REXX? Well, right away we run into
the fact that REXX doesn't have data structure declarations analogous to
what is in C or in many other languages. Consequently, it isn't possible
to define data statically. If you're going to keep the data in the
program, you pretty much have to do it with a series of assignments that
are performed at run time:

    book_title.1 = "Star Maker"
    book_author.1 = "Olaf Stapledon"
    book_title.2 = "Brightness Falls from the Air"
    book_author.2 = "James Tiptree"

And don't forget to set the total size of each array:

    book_title.0 = 2
    book_author.0 = 2

And create the index array too:

    do i = 1 to book_title.0
        title = book_title.i
        index.title = i
        end

That looks like a lot more work than you have to do in C, and it is also
more work than you have to do in order to read the data from a flat
file. So what advantage could there be to keeping the data in the
program itself? Well, if we could find a better way to do this in REXX,
it might actually be less work to type in. Recall that we used labels in
our ASCII file in order to identify different elements of a record.
There's a lot of extra typing just for all those labels.

Here's an alternative. Create a function that can be called with
arguments that identify each record element. Name the function
"make_record" (for instance). Then you could put a series of calls in
your program:

    call make_record "Star Maker", "Olaf Stapledon"
    call make_record "Brightness Falls from the Air", "James Tiptree"

That's not really much more trouble than it is in C, since you can
probably use features of your favorite text editor to insert the first
part of each line. Here's what make_record looks like:

    make_record: procedure expose (book_stuff)
    n = book_title.0 + 1
    book_title.0 = n
    book_author.0 = n
    title = arg(1)
    book_title.n = title
    book_author.n = arg(2)
    index.title = n
    return
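
Two things have to be in place before the first call: the variable
book_stuff must hold the names to be exposed, and the counts must
start at zero (otherwise "book_title.0 + 1" fails, since an unset
compound variable has its own name as its value). A minimal sketch of
a complete program, assuming book_stuff has not already been defined
differently elsewhere:

    /* names shared with make_record; an assumption of this sketch */
    book_stuff = 'book_title. book_author. index.'
    book_title.0 = 0
    book_author.0 = 0

    call make_record "Star Maker", "Olaf Stapledon"
    call make_record "Brightness Falls from the Air", "James Tiptree"
    say book_title.0 'records loaded'
    exit

    make_record: procedure expose (book_stuff)
    n = book_title.0 + 1
    book_title.0 = n
    book_author.0 = n
    title = arg(1)
    book_title.n = title
    book_author.n = arg(2)
    index.title = n
    return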

Of course, this method of initializing your REXX data structures has
even more run-time overhead than the series of assignments. But not
much. I will mention later one way to mostly eliminate this overhead.

I wouldn't necessarily recommend you always use this approach instead of
entering your data in a flat file. To some extent this is another matter
of taste. Especially for relatively small collections of data (100
records? 1000 records?) it can be very convenient to keep it inside the
program that needs it, instead of in a separate file. You might well
have some data that warrants this treatment because it is tightly
coupled to the program and not very meaningful outside of it - tabulated
numerical data, for instance. Also, if you are using a REXX GUI tool
that builds a .EXE file, your data will automatically be incorporated in
this file, and it may even be encrypted (if that is important to you).

Perhaps you want to write a data-entry program, using one of the REXX
GUI tools, to help you actually enter the data you have (if it must be
done manually). You will probably wind up having to write a function
like make_record anyhow.

Further, this suggests an interesting possibility. How about having
functions in your program to access data items as well as store them?
If you frequently need to fetch the author of a particular title, it
would probably be nice to have a function like this:

    author: procedure expose (book_stuff)
    parse arg title
    n = index.title
    return book_author.n

Though this is only a few lines of code, it can be cumbersome to have to
rewrite it every time you need it. This is an alternative that allows
associative retrieval of authors given a title, without having to
construct an author index.

Or you could get even more general, and write a function that would
retrieve any field of the book record:

    book_data: procedure expose (book_stuff)
    parse arg field_name, title
    return value('book_'field_name'.'index.title)

Which is used like so:

    field = 'author'
    Say 'The' field 'of "'title'" is' book_data(field, title)

Note how this looks syntactically like a 2-dimensional array reference.

One of the big advantages of this as a technique is that it encapsulates
the details of the data structure used inside the access functions. So
you are free to change these structures as you see fit, without having
to rewrite a lot of code. This is a big advantage because you will
probably want to experiment with different data structures in an
application as your understanding of the problem evolves. You might even
choose at some point to move the data into a DBMS, yet most of the
program wouldn't need to be aware of this.

We noted earlier that there might be a lot of overhead upon startup if a
REXX program has to initialize each data record with a procedure call -
certainly this is so in comparison with a language like C where the data
can be stored statically. There may be just as much overhead, or more,
if the program has to read the data from a flat file, since each line of
the file has to be parsed for content. It would be good if there were
some way of storing the data in a file that could be loaded into REXX
variables with a minimum of overhead.

There is such a way. One of the facilities available in some REXX
function libraries is the ability to store a group of variables in an
external file with a single call and to reload the variables with
another call. In REXXLIB, for instance, there are actually a couple of
ways to do this. VARWRITE is the function that writes the data, and
VARREAD reads it. These functions can deal with either selected named
variables (or stems), or all of the variables in a given file or
program.

So, if you have initialized the book database as suggested above,

    call varwrite filename, 'i', 'book_title.', 'book_author.',,
        'book_publisher.', 'book_isbn.', 'book_date.', 'index.'

would be enough to save all of the information. (The second argument,
'i', means that the remaining arguments are a list of the variables to
be included in the operation.) Reloading it would be even easier:

    call varread filename

Given this, you might consider putting all of the data initialization
calls into a separate REXX program that stores the data in a file with
VARWRITE. The main program can then load this data any time it is needed
with VARREAD, and the start-up overhead is minimized. If the data
doesn't change very often, you wouldn't need to write a special
data-entry program for it, since you could just edit the file creation
program.


Multiple-file program structure
------------- ------- ---------

We have just indicated a special case where the separation of a single
REXX application into more than one source code file makes sense. I. e.,
when you have one program to initialize some data, and a different one
that uses the data. There are many more cases when an application might
be divided into two or more source files. Certainly, if there is a lot
of code in the application, it is much easier to maintain the code
(especially if more than one programmer is involved), if multiple files
are used. Or there may be a lot of code in the form of subroutines that
needs to be used in different places. It is obviously desirable to keep
only one copy of the subroutine code, which can be invoked as needed.

Another reason that it is sometimes necessary, or at least desirable,
to keep REXX code in separate source files has to do with the REXX GUI
tools. Each one has different requirements, but in general different
application windows may most easily be created and maintained when their
associated REXX code is kept in separate files.

Whatever the reason, using REXX code that resides in more than one
source file is a fact of life with applications of any appreciable size.
It turns out that this presents several problems. So the rest of this
paper will deal with various problems and solutions for working with
REXX code in multiple files. This is what we mean by "program structure"
(rather than issues having to do with how code might be structured
within a single file).

The main problem that arises with multiple REXX source files is the
difficulty of sharing data among them. There is no feature in the REXX
language itself that provides globally shared data in a more or less
transparent manner. In most other languages it is at least possible to
have static data that can be accessed by different source files that
have been linked together. With REXX, on the other hand, separate source
files are always in the same position as separate .EXE files when it
comes to sharing data.

There is a second problem we will touch on later - the sharing of code
(subroutines) among the different files in a large application.

In a nutshell, here are some of the ways that data can be shared or
passed around among a number of separate REXX programs:

    1. data files on disk (or virtual disk)
    2. OS/2 .INI files
    3. OS/2 "environment variables"
    4. data sharing mechanisms provided by the REXX GUI tools and other
       third-party REXX add-ons
    5. REXX external data queues
    6. other interprocess communication facilities such as named pipes

Possibly the most straightforward data sharing technique simply uses
data files. The files might be managed by a DBMS, which provides the
greatest amount of functionality, as well as safeguards to guarantee the
integrity of data in case of concurrent access by multiple programs. Or
the files could be flat files or files created and accessed by functions
like VARREAD and VARWRITE, such as we have already discussed. In this
case, the REXX program may have to assume responsibility for properly
handling concurrent access if there is a possibility that multiple
threads might need to access the data (while at least one thread might
be updating it). This can be done with OS/2 semaphores. There is no
support for semaphores in OS/2 REXX as delivered. But it is available in
some of the REXX GUI packages and some of the third-party REXX libraries
like REXXLIB.

Semaphores are easy to use, given a package that supports them. The
type of semaphore that has to be used is called a "mutual exclusion"
semaphore. Only one thread at a time can have "ownership" of a semaphore.
A thread gains ownership of a semaphore simply by requesting it. However,
if another thread already owns it, the second thread has to wait until
the semaphore is released.

Before a semaphore can be used at all it has to be created. Every
semaphore has a name, which resembles a file name that begins with
"\SEM32". Once it has been created, other threads simply refer to this
semaphore by its name. Using REXXLIB functions, the call to create a
semaphore looks like this:

    call mutexsem_create "\sem32\my_semaphore"

Suppose we want to have exclusive access to a file while it is being
updated. Although the file system itself provides some protection against
interference between different threads accessing a file, a program that
is merely trying to read the file may not be able to tell that a failed
access at a given moment is due to file system serialization. It may be
easier to use semaphores explicitly. To protect a file this way you
might have:

    /* wait until resource is free */
    call mutexsem_request "\sem32\my_semaphore"
    do ...
        /* do something with the resource */
        end
    call mutexsem_release "\sem32\my_semaphore"

The resource in question here doesn't need to be a file. It could be
anything that might conceivably be shared between threads, such as an
external data queue or a shared variable. (We'll discuss these shortly.)

OS/2 .INI files present a special case of data files, because there is a
special access function provided in OS/2 REXX as delivered: SysIni. This
interface is at a fairly high level, and provides for several logical
levels of data. Separate .INI files can be used for different
applications. Even within the same file, data can be organized in a
two-level hierarchy of major categories and individual "keys". The
categories are called "applications", but in practice you would probably
want to use at least one separate .INI file for each application.

REXX programs can use the system .INI file (OS2.INI), but this should
usually be avoided for performance reasons, as well as to avoid possible
name-space conflicts. Another problem with using OS2.INI is that it is
vulnerable to corruption since it is used by so many other applications
as well as by OS/2 itself. OS2.INI is sometimes difficult to back up,
and it can be completely lost when system problems occur or OS/2 is
reinstalled.

.INI files can be used simply for communication among any number of
independent processes in the system, but they are best used when data
needs to be persistent. You should definitely avoid using OS2.INI if you
have a lot of data or if you are not concerned about data persistence.

OS/2 guarantees that access to the .INI files themselves is properly
protected with respect to concurrent updates. That is, calls to SysIni
(for the same file) are atomic. However, if you have to make a series of
calls to read or write a number of different data elements, you should
use semaphores (or an equivalent technique) to ensure consistency of the
data.

One of the nice things about .INI files is that the SysIni function
allows you to retrieve more than one piece of data at a time. The highest
level of data organization in a .INI file is called the "application",
and you can retrieve the names of all applications in the file like this:

    call sysini filename, 'ALL:', 'applist.'

which puts the names into the array 'applist.', with the number of
items in applist.0.

For any specific application name, the second level of information is
called a "key". You can retrieve the names of all keys for an application
like this:

    call sysini filename, appname, 'ALL:', 'keylist.'

That yields only the names of the keys. You then have to make separate
calls to SysIni to retrieve the value associated with each key.
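
A minimal sketch of that second step follows. The file name and
application name are made up for illustration, and we assume that
the RexxUtil functions still need to be registered. SysIni returns
the string 'ERROR:' when a request fails, so it is worth checking
before trusting keylist.0:

    call RxFuncAdd 'SysLoadFuncs', 'RexxUtil', 'SysLoadFuncs'
    call SysLoadFuncs

    filename = 'books.ini'          /* hypothetical file name   */
    appname = 'Star Maker'          /* hypothetical application */

    call sysini filename, appname, 'ALL:', 'keylist.'
    if result = 'ERROR:' then
        say 'Cannot list keys for' appname 'in' filename
    else do k = 1 to keylist.0
        say keylist.k '=' sysini(filename, appname, keylist.k)
        end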

This framework is not especially well-suited for dealing with arrays,
such as repeated records of a database. But it can be done. For
instance, you could store the book database in one .INI file by making
each title a separate "application", since the title is the unique "key"
that identifies each record. (But watch out for possible duplication of
titles!) Then for each title there would be separate keys for "author",
"publisher", "date", and "isbn". (We'll assume, as we have all along,
that there is only one author per book - or else we keep all author
names somehow in the same data item.)

Given that assumption, then this code could write a .INI file with our
book database:

    do i = 1 to book_title.0
        call sysini filename, book_title.i, 'author', book_author.i
        call sysini filename, book_title.i, 'publisher', book_publisher.i
        call sysini filename, book_title.i, 'date', book_date.i
        call sysini filename, book_title.i, 'isbn', book_isbn.i
        end

And this code could read all the information back into variables:

    call sysini filename, 'ALL:', 'book_title.'
    do i = 1 to book_title.0
        call sysini filename, book_title.i, 'ALL:', 'list.'
        do j = 1 to list.0
            call value 'book_'list.j'.'i,,
                sysini(filename, book_title.i, list.j)
            end
        end

Note that in this example we have allowed that there might be additional
"keys" for any book besides the standard ones we have been using for
illustration. (This example doesn't set the .0 element of arrays except
for book_title., and it doesn't build the index. compound variable.)

Although this code is quite a bit more complex than the equivalent using
VARWRITE and VARREAD shown above, it does use only facilities delivered
with OS/2. The code is also a little simpler than the equivalent for
writing and reading the data in a flat file.

There are other ways to store our book database in a .INI file, such as
including the numeric subscripts in individual keys. Whether you would
want to use a different approach, of course, depends ultimately on how
you most often need to use the data.

Another nice thing about .INI files is that they can be used by programs
in any language that can access OS/2 API functions, as well as from
REXX. So they offer one way to communicate between programs written in
REXX and those in other languages.

OS/2 environment variables present one of the easiest techniques for
sharing data between programs. REXX programs can read or write environment
variables by using the VALUE function:

    call value 'dirname', 'c:\myfiles', 'os2environment'

sets the environment variable called DIRNAME, and

    x = value('dirname', , 'os2environment')

retrieves it.

OS/2 environment variables are specific to one particular process within
the system. They can therefore be used to share data among REXX programs
that have a calling relationship or are otherwise part of the same
process. But they can't be used to exchange data across processes.

Another problem with environment variables is that there isn't any support
at all for easy use of arrays. You could create separate environment
variables with names like

    book_title.1
    book_title.2

and so forth, but you would have to make a separate call to VALUE to
read or write each data item.
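
A minimal sketch of what that looks like in practice, using made-up
sample data. The count is stored under the name book_title.0 so that a
reader knows how many items to fetch, and the copy. stem is just an
illustrative name for the receiving array:

    /* hypothetical sample data, for this sketch only */
    book_title.1 = 'Star Maker'
    book_title.2 = 'Brightness Falls from the Air'
    book_title.0 = 2

    /* write: one VALUE call per element, including the count */
    do i = 0 to book_title.0
        call value 'book_title.'i, book_title.i, 'os2environment'
        end

    /* read back: again one VALUE call per element */
    copy.0 = value('book_title.0', , 'os2environment')
    do i = 1 to copy.0
        copy.i = value('book_title.'i, , 'os2environment')
        say i':' copy.i
        end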

Finally, as with most other data sharing techniques, you have to develop
your own access control mechanisms using semaphores to handle concurrent
update problems.

Each of the REXX GUI tools provides its own method of sharing REXX
variables between separate code files. VisPro/REXX, for instance, allows
you to define variables associated with each "form" (roughly speaking,
a window). These variables are accessible to all event procedures for
the form, and to all subforms.

In the first release of VisPro this technique was necessary, since REXX
variables were not global to an application by default. In release 2.0
of VisPro ordinary REXX variables did become global by default, and this
extra mechanism can instead be used as a way of keeping private data
associated with a form.

Variables to be handled this way are defined by entering their names in
one page of the settings notebook for the form. The nice thing about
this mechanism is that you can easily share all elements of a compound
variable simply by entering the stem name in the settings notebook. The
variables are accessed within the program just by referring to them in
the normal REXX way.

The downside to the way that VisPro handles this is that there is some
runtime overhead associated with making private copies of such global
variables when forms are opened or closed.

In VX-REXX there are methods called PutVar and GetVar that can be used
to set and retrieve global variable values. This is similar to how the
VALUE function is used for working with environment variables, and it is
less convenient than the way things are done in VisPro, since these
variables cannot be accessed in the normal REXX way. GetVar and PutVar
must be used. There is the further restriction that only compound
variables that are "arrays" (i. e. having positive integral tails with
the number of such elements in the .0 element) can be used this way.
The advantage of this approach is in reduced overhead in opening and
closing windows.

GpfRexx has a set of functions (QueryStem, QueryStemElement,
RemoveGlobal, RemoveStem, RemoveStemElement, SetGlobal, SetStem, and
SetStemElement) for managing global variables. This is basically like
the VX-REXX facilities, except that arbitrary compound variables can be
handled.

In all three cases, the "global" variable facilities actually apply only
to a single program (.EXE file). So they can be used to share data among
the different parts of one application, but not at all between
applications. Furthermore, the data is not persistent beyond the
lifetime of each invocation of the application.

Other third party tools can overcome some of these restrictions. Quercus
Systems' Personal REXX has a utility command called GLOBALV that is
capable of creating and maintaining true system-wide global variables.
Such global variables can even be made persistent for as long as OS/2
is running, or (optionally) even indefinitely by keeping the data in
special disk files.

GLOBALV also supports a two-level hierarchy for structuring data. The
first level is called a "table", and within each table can be any
number of individual variables. There is not, however, an exact and
automatic mapping onto REXX compound variables.

GLOBALV is patterned closely on a command of the same name available
in VM/CMS. This makes it possible for programs that use it to be portable
among systems that implement GLOBALV (which means, currently, DOS, OS/2,
VM/CMS, and Windows).

As with most other data sharing techniques, GLOBALV protects its own
internal data structures with respect to concurrent update. But
consistency of related data items remains the programmer's
responsibility.

Just as with files, GLOBALV makes it possible to share data among any
processes in OS/2. This may be important, because the REXX language does
not contain any multi-tasking capabilities of its own. Although there
are multi-threading functions provided in the REXX GUI tools and certain
third-party add-on function packages, the only way to achieve
concurrency in OS/2 REXX as it is delivered is to use multiple
processes. One might well choose to implement an application as multiple
processes running REXX code in order to use the multitasking capabilities
of OS/2. In this case, some means for communication between the processes
becomes necessary.

There are, of course, alternatives to GLOBALV for interprocess
communication. The two we will discuss here are REXX external data
queues and named pipes. There are other IPC mechanisms (such as a native
OS/2 "queue" mechanism, which is distinct from a REXX queue). But the
various techniques differ among themselves mostly by their syntax and
their performance.

There are two kinds of REXX external data queues: the unnamed "session"
queue and named queues. The "session" queue is unique to a particular
REXX session. This is broader in scope than a single OS/2 process, since
it will usually include all descendant processes as well. For instance,
one invocation of the CMD.EXE command shell, together with any commands
it may run, represents one session. This means that one REXX program can
save data in the session queue, and it will still be available to later
REXX programs until it has been fully consumed. This is true even if
the data is placed in the queue by a REXX-aware application (.EXE file)
or a .EXE file built by one of the GUI tools. The session queue
is not automatically destroyed when the .EXE file terminates. Instead, it
survives, and its data may persist, until the original copy of CMD.EXE is
closed.

Programs running in other sessions, however, have their own private
session queue and cannot access a different session queue. The privacy
of the session queue is an advantage, in that truly independent REXX
programs cannot interfere unintentionally with each other through the
session queue. But it also means that they can't use the session queue
to communicate with each other.

The other type of external data queue is a "named" queue. This name is
truly system-wide, and such queues can be used for communication between
"unrelated" programs, as long as they all use the same name for the
queue. It is the programmer's responsibility to ensure that unique names
are created when appropriate, and that every application which needs to
use a particular named queue has a means of finding out the correct
name.

It is safest to use the unnamed session queue only for communication
among different REXX files that have some calling relation to each other
or that are part of the same application which has been created by one
of the GUI tools. While data in the session queue will normally persist
between different invocations of a macro by a REXX-enabled application,
it may not be a good idea to rely on this, since the data will be lost
if the application (or the system) unexpectedly terminates. On the other
hand, this is appropriate behavior for transient, temporary data. So
one good use of the session queue is as a large "scratch pad" data area
that can hold data for use by several related REXX programs.

In particular, the session queue is a good way to pass relatively large
amounts of data to a subprocedure. One of the most frequently asked
questions about REXX is how to pass an array (especially a large one)
to an external subroutine and how to return an array. The answer is that
it just isn't possible directly. (For internal subroutines, of course, it
is possible to share compound variables by the use of the EXPOSE keyword
on a PROCEDURE statement.)
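
As a minimal sketch of the internal case, the following program shares a
compound variable with an internal routine. The stem name list. and the
routine name reverse_list are made up for illustration.

```rexx
/* Sharing a compound variable with an internal subroutine */
list.0 = 3
list.1 = 'alpha'
list.2 = 'beta'
list.3 = 'gamma'

call reverse_list            /* the routine works on list. directly */

do i = 1 to list.0
    say i':' list.i
    end
exit

/* Hypothetical helper: reverses the exposed stem in place.          */
/* Only list. is shared; i, j and temp are local to the routine.     */
reverse_list: procedure expose list.
    do i = 1 to list.0 % 2
        j = list.0 - i + 1
        temp = list.i
        list.i = list.j
        list.j = temp
        end
    return
```

Nothing comparable exists for a routine in a separate file, which is why
the queue technique described next is useful.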

To pass an array to a subroutine through the queue you just QUEUE or
PUSH each element. The data queue is maintained as a double-ended list.
Data may be removed only from the front (or top) of the list, with the
PULL instruction, but it can be added to the list either at the end
(bottom) or the front of the list, depending on whether you use QUEUE or
PUSH, respectively. These two possibilities are referred to as "first-in
first-out" (FIFO) and "last-in first-out" (LIFO).
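
A sketch of the FIFO approach might look like the following. It assumes
the queue is empty to begin with, and it uses an internal routine only
to keep the example in one file; the receiving code would be the same in
an external routine. The names item., got. and show_items are
illustrative.

```rexx
/* Passing an "array" to a routine through the session queue (FIFO) */
item.0 = 3
item.1 = 'first'
item.2 = 'second'
item.3 = 'third'

queue item.0                 /* the element count goes first */
do i = 1 to item.0
    queue item.i             /* QUEUE adds at the end of the queue */
    end

call show_items
exit

/* Hypothetical receiver: rebuilds the array from the queue */
show_items: procedure
    parse pull count         /* PARSE PULL keeps case; PULL uppercases */
    do i = 1 to count
        parse pull got.i
        say i':' got.i
        end
    return
```

Sending the count first lets the receiver know exactly how many items
belong to it, so it never reads past its own data.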

The FIFO case is normally the easiest to conceptualize, so it is the one
you would probably choose most frequently. The main reason to choose
LIFO is to avoid problems when the same queue is used in a nested
fashion by more than one level of subroutine. That is, if any given
routine adds data only to the front of the queue before calling a
subroutine, and if the callee only removes as much as it needs, then it
is possible to nest such calls, even recursively, without disturbing
data placed on the queue for a different purpose.
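
The following sketch illustrates the LIFO discipline. It assumes the
queue is empty when the program starts, and the routine name greet is
made up for the example.

```rexx
/* LIFO use of the queue: the caller pushes, and the callee pulls   */
/* exactly what it was given, leaving older data untouched          */
queue 'older data'           /* something already waiting in the queue */

push 'world'                 /* pushed in reverse order ...          */
push 'hello'                 /* ... so 'hello' ends up on top        */
call greet

parse pull leftover
say 'Still in the queue:' leftover
exit

greet: procedure
    parse pull word1         /* 'hello' */
    parse pull word2         /* 'world' */
    say word1 word2
    return
```

Because the callee removes only the two items pushed for it, the older
item is still at the front of the queue when control returns.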

If the queue needs to be used for several, or many, unrelated purposes,
it is probably safer to simply use multiple named queues, even if only
one session is actually involved. But one problem with using a named
queue this way is that you have to be careful that you construct a name
that will be unique, so that if the same application is invoked more
than once simultaneously then there is no interference between the
multiple invocations. You also need to have a means for the intended
user of the queue to find out what the actual name to be used is.

One other problem with named queues is simply that they are a little
inconvenient to work with. You first have to create the queue with a
call to the RXQUEUE built-in function. Then you have to check that
the name was not already in use (as indicated by RXQUEUE returning a
name other than the one you asked for). Finally, you have to make
the new queue the default queue by another call to RXQUEUE:

    qname = rxqueue('c', proposed_name)
    if proposed_name \= qname then  /* this queue must already exist */
        call rxqueue 'd', qname     /* destroy the unwanted new queue */
    call rxqueue 's', proposed_name

The last call here makes the named queue become the default. This is a
necessary step, since there is no way to indicate on the QUEUE or PUSH
instruction what queue is meant - it is always the "default" queue. Names
that are valid for data queues are just like names that are valid for
REXX variables - they must begin with a letter (or certain characters
like "!", "_") and not be longer than 250 characters.

This procedure can be simplified slightly if you let REXX choose a name
for you when creating the queue. This is probably the best approach to
use when you want to have a queue that is used only for temporary
scratch space. You might do this for communication between separate
REXX programs, or you might do it within a single program (even a single
source file) just to avoid any possibility of conflicting use of the
queue. To have REXX assign the name, just omit any name when you create
the queue:

    qname = rxqueue('c')
    call rxqueue 's', qname

Any time you do create a queue, you are responsible for eventually
destroying it, particularly if you are only using it for scratch space.
By the very nature of a named queue, it is persistent (as long as OS/2
is running), so it isn't automatically destroyed when the program (or
even the program's session) terminates. The 'd' option passed to RXQUEUE
destroys a queue:

    call rxqueue 'd', qname

Incidentally, a queue is not destroyed when all data is removed from it.
A queue can be empty, as it will be when it is initially created.
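
Putting these steps together, the life cycle of a scratch queue might be
sketched as follows. The sketch relies on the 'set' option of RXQUEUE
returning the name of the previous default queue, so that the default
can be restored before the scratch queue is destroyed. (Only the first
letter of each option is significant, so 'create' is the same as 'c'.)

```rexx
/* Scratch queue life cycle: create, select, use, restore, delete */
qname = rxqueue('create')            /* let REXX pick a unique name */
oldq  = rxqueue('set', qname)        /* SET returns the previous default */

queue 'some temporary data'
say 'Items in' qname':' queued()

parse pull data                      /* consume what was stored */

call rxqueue 'set', oldq             /* restore the caller's default queue */
call rxqueue 'delete', qname         /* clean up the queue we created */
```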

You can tell how many data items are in a queue with the QUEUED built-in
function. This function takes no arguments - it works only on the
current default queue. The fastest way to remove all items from a queue
is just to PULL until QUEUED becomes 0:

    do while queued() \= 0
        pull
        end

Note that it is legal to use PULL with nothing else on the line.
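
If the contents are to be saved rather than discarded, the same loop can
copy each item into a compound variable. This is only a sketch; the stem
name line. is arbitrary, and the two QUEUE instructions are there just
to give the loop something to read.

```rexx
/* Empty the default queue into a stem instead of discarding the data */
queue 'Mixed Case'
queue 'more data'

n = 0
do while queued() > 0
    n = n + 1
    parse pull line.n        /* PARSE PULL does not uppercase the data */
    end
line.0 = n                   /* record the number of items saved */
say line.0 'items saved'
```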

Named queues are often used with applications that are structured as
"client-server". This can be done only so long as the server and all
clients are on the same computer, since queues are not supported across
a network. (Named pipes, to be discussed shortly, are good for network
use.) In this case, there will probably be at least one queue whose
name is known "publicly", through which clients can contact the server.

When a REXX queue is used in a "public" way like this, it's a very good
idea to use semaphores to control access to it, just as with other forms
of interprocess communication. This is particularly true if several
separate data items have to be placed on the queue, because it would
probably not work to have messages from different clients intermixed.

In a client-server situation, it is necessary for the server to have
some way to wait for data to appear on a particular queue, and for the
clients to wait for data to be returned (in the same queue or a different
one). The PULL instruction, which is ordinarily used to read from a queue,
is not appropriate for this situation, because it is defined to read from
the keyboard if no data is available in the default queue. One could
continually "poll" for data in the queue with the QUEUED function, but
this will eat up CPU cycles.

The solution is to use a poorly-documented feature of the REXX I/O
system. This is the fact that you can use a stream name of "QUEUE:" with
the LINEIN and LINEOUT functions. On output (LINEOUT), the effect is the
same as the QUEUE instruction. But on input,

    data = linein('QUEUE:')

has the effect of suspending the REXX program until data becomes
available in the queue, without wasting CPU time.
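
A very small server built on this feature might be sketched as follows.
The queue name DEMOSERVER and the STOP request are invented for the
example, and the name is written in uppercase on the assumption that
RXQUEUE returns queue names in uppercase.

```rexx
/* Minimal queue "server": sleeps until a client queues a request */
qname = 'DEMOSERVER'                 /* hypothetical public queue name */
created = rxqueue('create', qname)
if created \== qname then do         /* name in use: REXX made another */
    call rxqueue 'delete', created   /* discard the unwanted queue */
    say 'Queue' qname 'already exists'
    exit 1
    end
oldq = rxqueue('set', qname)         /* remember the previous default */

do forever
    request = linein('QUEUE:')       /* waits here without polling */
    if request == 'STOP' then
        leave
    say 'Request received:' request
    end

call rxqueue 'set', oldq             /* restore the default queue */
call rxqueue 'delete', qname
```

A client in another session would select the same queue with the 'set'
option of RXQUEUE, QUEUE its request, and then restore its own previous
default queue.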

Just about any form of interprocess communication that can be done with
a REXX external data queue can also be done with a named pipe. In fact,
a lot more is possible, because named pipes work across a network, and
also because programs that do not even have special support for pipes
can use them simply by treating them as a file. This makes it possible
for an OS/2 REXX program to communicate with a program running in a
DOS session, for instance.

The problem with named pipes is that REXX as delivered with OS/2 has
very little explicit support for named pipes, although a REXX program,
like any other, can treat an already existing pipe as if it were a file.
When used as a file, a pipe always has a name of the form "\PIPE\xxx"
(when the pipe has been created by a process on the same computer) or
"\\server\PIPE\xxx" when the pipe has been created by a process on a
network-connected computer called "server".

There is exactly one function in standard OS/2 REXX for working with
pipes: SysWaitNamedPipe. You have to use this in case a pipe is "busy",
i.e., already in use, which is indicated by an error in the STREAM
function used to open the pipe:

    parse value stream(pipename, 'c', 'open') with state ':' retc
    if retc = 231 then                       /* 231 = pipe is busy */
        call syswaitnamedpipe pipename, -1   /* -1 = wait indefinitely */

Various function packages available from third parties provide more
complete support for named pipes. REXXLIB, in particular, has such
support. The functions provided there make it possible to do just
about anything that is possible with named pipes, except that only
one instance of a pipe can be open at a time, because of the single
threaded nature of REXX.

As an example of the use of named pipes, consider the problem of
debugging a complex REXX program, especially one that is constructed
using one of the GUI tools. Although these tools include debuggers,
sometimes the nature of the problem is such that the easiest technique
to use is for the program to send messages to a message log when
certain abnormal conditions are detected. The GUI tools provide for
this through a "console I/O window" which receives the output of SAY
instructions. However, if you aren't using a GUI tool, you have to
have another way to handle this. It may not be reasonable to mix SAY
debugging output with normal output of the program.

The solution is for the program being debugged to send output through a
named pipe to a simple server running in another process. The server can
display the information in its own window, or possibly even analyze it
for the occurrence of specific events. Here is how the server code might
look, using named pipe functions provided by REXXLIB:

    pipe = '\pipe\echo'
    call nmpipe_create pipe, 'm', 'm', 'w'
    do i=1
        call nmpipe_connect pipe
        say 'Connect RC =' result
        if result \= 0 then
            exit
        do forever
            message = nmpipe_read(pipe)
            if message = 'end' | message == '' then do
                call nmpipe_disconnect pipe
                iterate i
                end
            say 'Message received: "'message'"'
            end
        end

The pipe is created with NMPIPE_CREATE. NMPIPE_CONNECT is called so that
the server can wait for a client to open its side of the pipe. After
this occurs, a series of calls to NMPIPE_READ retrieve all data sent
from the client, until either a null string or the string "end" is
received. This is taken to mean that the client is done (perhaps died)
or has closed its end of the pipe.

The client side of this is even simpler. The client might simply open
the pipe implicitly by calling LINEOUT with some data. Subsequent calls
to LINEOUT send additional messages, and when the client is all done, it
just calls LINEOUT with only the pipe name in order to close it.
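
A sketch of such a client, assuming the server shown above is already
running and waiting on the same pipe name, could be as short as this.
The message texts are placeholders.

```rexx
/* Client side of the message log: write lines to the server's pipe */
pipe = '\pipe\echo'                  /* same name the server created */

call lineout pipe, 'Starting run'    /* the first LINEOUT opens the pipe */
call lineout pipe, 'Unexpected value in record 17'
call lineout pipe                    /* no data: closes the pipe */
```

Closing the pipe is what lets the server notice that the client has
finished and go back to waiting for the next connection.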

The last topic we're going to look at in this paper is the question of
structuring a large application as a number of separate REXX source
files. What we've just covered is a variety of techniques for sharing
data among such files, which is a problem since "global" variable values
of a calling program are not "inherited" by external subprocedures. But
there are other issues that arise too.

For one thing, state information of a running REXX program is not
inherited, either, in a call to an external routine: trace settings,
NUMERIC and ADDRESS settings, condition handlers, and timers. However,
this state information is inherited on internal procedure calls. This
can cause problems for you if you break a large program up into a main
routine and several external procedures, since these procedures may no
longer have the same state settings as before.
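
The internal case can be seen in a short sketch like the following,
which uses the NUMERIC DIGITS setting as the example. The routine name
show_digits is illustrative.

```rexx
/* NUMERIC settings are inherited by an internal routine */
numeric digits 20
call show_digits                     /* the callee sees 20 digits */
say 'Caller still has' digits()      /* the callee's change was undone */
exit

show_digits:
    say 'Callee starts with' digits()
    numeric digits 5                 /* local change, restored on RETURN */
    return
```

If show_digits were moved to a separate file, it would start with the
default setting of 9 digits instead of the caller's 20.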

In spite of this, there are some features of a calling REXX program that
are inherited by a callee. You should be aware of this, since it is not
only unexpected (in light of the fact that most things aren't
inherited), but also quite undocumented as well. Most importantly, open
file information is inherited. This means, in particular, that a
subroutine inherits position information about all open files, and the
file can be left in a different position when the subprocedure returns.

Other information of somewhat lesser importance that is always inherited
includes the name of the current default external data queue. Also, if
you use any external function libraries that keep their own state
information, this will most likely be maintained independently of
external procedure calls.

Apart from the question of how various kinds of environmental
information behave with respect to external procedure calls, the main
issue you face in using external procedures is simply the extra overhead
of finding and loading them. Although it doesn't take much time to find
and load an external REXX procedure once, this can be a big factor if it
has to be done 1000 times in a loop. Many REXX applications run
surprisingly slowly for just this reason.

Yet there are good reasons for wanting to break up a large program -
the standard concerns of modularity, modifiability, sharing of code,
and so forth. Fortunately, there is an alternative that allows keeping
code in separate files, without the overhead of searching for them on
disk and loading them when required.

As supplied with OS/2, REXX supports a feature called "macro spaces".
Basically this is a capability for loading program files into shared
memory and keeping them resident as long as desired so that the search
and load overhead can be bypassed. All or parts of this shared code
space can be saved in disk files (in "tokenized" form) in order to create
(in effect) REXX subroutine libraries. Such libraries can be saved or
distributed and reloaded as a whole when appropriate.

Among other things, macro space libraries provide an answer to a
question that often arises with programmers who have to distribute REXX
code. For a variety of reasons (protection against modification, hiding
of confidential information, etc.) it is often desirable not to
distribute source code. Since the REXX code in a macro space is saved in
the "tokenized" form, one gets an immediate solution to this problem.
(Note that IBM does not guarantee tokenized code will continue to
operate in future releases of OS/2. If you do this, be prepared to
redistribute your code if an incompatible change does occur.)

So macro spaces can help manage two distinct problems: the performance
of calls to external routines and the need for a way to avoid exposing
source code. Unfortunately, there is one hitch: REXX as distributed
doesn't provide REXX-callable functions for working with a macro space.
Third party libraries, again, provide the solution.

REXXLIB has a set of functions for managing macro spaces. The first
step is to load all of the source files you want into the macro space.
If you don't want to create a permanent macro library, that is all you
have to do. Otherwise, you make one more call that causes the library
to be written to disk.

For instance, suppose that the compound variable names. contains the
list of names of procedures required. Suppose the extension of all files
is ".CMD". Then you would use:

    do i = 1 to names.0
        call macroadd names.i, names.i".cmd", 'B'
        end
    call macrosave 'mymacros.mac', 'names.'

Although the file names used in this example are the same as the routine
names (except for the extension), this does not need to be the case. You
could associate any procedure name you want with any file.

The third argument of MACROADD is either 'B' ("before") or 'A'
("after"), which indicates whether REXX will search for that particular
external procedure name in the macro space either before or after it
searches on disk. There would be no performance advantage of macro
spaces if you didn't use 'B', so it is the default. But you might want
to use 'A' to create a kind of "default" procedure which would be
executed only if the name wasn't found on disk. (This is a rather risky
thing to do, since your application will probably fail if an unrelated
REXX procedure with this name just happens to exist on disk. But it also
makes it possible to supply overriding procedures if that is desirable.)

When a procedure has been loaded into the macro space with the 'B'
option, REXX finds it before it searches the disk, and before it finds
functions registered in .DLL and .EXE files. This makes it possible for
you to override function names that you would otherwise not have
control over. However, the macro code is still executed in exactly the
same way any external procedure is. Consequently, you can't pass
compound variable stems to a macro space routine any more than you can
any other external REXX code. And all the other considerations listed
previously for calling external routines still apply. Also, you can't
override the names of built-in functions, since these are considered
to be internal.

In order to use a macro space library you have created at an earlier
time, it takes only one call to load it:

    call macroload 'mymacros.mac'

If the macro library is one you use regularly, you should probably just
load this in a STARTUP.CMD file. Unless you have a really large library,
the memory usage isn't too significant, and it will be swapped out pretty
soon anyhow if it's not actually used.

Although the macro space facility is a nice, little-known feature of
OS/2 REXX, it has a few problems (apart from the fact there's no
REXX-callable interface supplied by IBM). For one thing, the macro
space is global to the whole OS/2 system. Macros loaded by one process
immediately become visible to all other running processes. The same is
true of any other change made to the macro space, such as deletion of
routines. Of course, this isn't so different from the fact that a new
REXX program added to your disk immediately becomes available to all
processes as well. You just have to be careful, and you should probably
establish naming conventions to avoid unintended name collisions.

Also, you can't get a list of macros that have already been loaded into
the macro space. But you can query whether any particular name has been
loaded. If you are especially security conscious, you might want to
think about doing this to be sure a "trojan horse" REXX routine hasn't
been loaded into the macro space to replace a routine you rely upon.
(MACROQUERY is the REXXLIB function that provides this service.)
