Guide to Writing CGI Scripts in REXX and Perl

Source: manmrk.net

Last Update: July 24, 1998.


Contents

  • Introduction
  • Getting the Input to the Script
  • Decoding Forms Input
  • Sending Document Back to Client
  • Diagnostics and Reporting Errors
  • Two Simple REXX WWW CGI Scripts
  • Other Sources of Interest

Introduction

This Guide is aimed at people who wish to write their own WWW executable scripts using WWW's Common Gateway Interface (CGI). Though the main emphasis is on REXX, many examples are also provided in Perl.

There are some simple software libraries to facilitate writing CGI scripts. cgi-lib.rxx is a REXX library of functions, available at SLAC by using the REXX statement
CALL PUTENV 'REXXPATH=/afs/slac/www/slac/www/tool/cgi-rexx'
to include the library at execution time. cgi-lib.pl is a similar library in Perl written by Steve Brenner (there is an executable copy of this library at SLAC in /afs/slac/g/www/cgi-lib/cgi-lib.pl). NCSA has a very useful set of Perl CGI handler subroutines that are available via anonymous FTP. Another set of Perl CGI server-side scripts, written by Brigitte Jellinek, is available under the GNU public license. Also available are the source code for the www.stanford.edu scripts and programs and an index to Perl WWW programs gathered by Earl Hood. Finally, see the Web Development Center.

Since there are security and other risks associated with executing user scripts in a WWW server, the reader may wish to first view a document providing information on a SLAC Security Wrapper for users' CGI scripts. Besides improving security, this wrapper also simplifies the task of writing a CGI script for a beginner.

Before embarking on writing a script, you may also want to check out some rough notes on SLAC Web Utilities Provided by CGI Scripts.

The CGI is an interface for running external programs, or gateways, under an information server. Currently, the supported information servers are HTTP (the HyperText Transfer Protocol used by the WWW) servers.

Gateway programs are executable programs (e.g. UNIX scripts) which can be run by themselves (though you would only want to do so for debugging purposes). They have been made executable to allow them to run under various (possibly very different) information servers interchangeably. Gateway programs conforming to this specification can be written in any language, including REXX or Perl, which produces an executable file.

Getting the Input to the Script

The input may be sent to the script in several ways, depending on the client's Uniform Resource Locator (URL) or a HyperText Markup Language (HTML) Form:

  • QUERY_STRING Environment Variable

    QUERY_STRING is defined as anything which follows the first ? in the URL used to access your gateway. This information could be added by an HTML ISINDEX document, or by an HTML Form (with the GET action). It could also be manually embedded in an HTML hypertext link, or anchor, which references your gateway. This string will usually be an information query, e.g. what the user wants to search for in databases, or perhaps the encoded results of your feedback Form. It can be accessed in REXX by using String=GETENV('QUERY_STRING')
    or in Perl by using $string=$ENV{'QUERY_STRING'};

    This string is encoded in the standard URL format, which changes spaces to + and encodes special characters with %xx hexadecimal encoding. You will need to decode it in order to use it. You can review the cgi-lib.rxx REXX PROCEDURE DeWeb or the Perl code fragment giving examples of how to decode the special characters.

    If your server is not decoding results from a Form, you will also get the query string decoded for you onto the command line. This means that the query string will be available in REXX via the PARSE ARG command, or in the Perl $ARGV[n] array.

    For example, if you have a URL https://www.slac.stanford.edu/cgi-bin/foo?hello+world and you use the REXX command PARSE ARG Arg1 Arg2 then Arg1 will contain "hello" and Arg2 will contain "world" (i.e. the + sign is replaced with a space).
    In Perl, whose arrays are indexed from zero, $ARGV[0] contains "hello" and $ARGV[1] contains "world". If you choose to use the command line to access the input, you need to do less processing on the data before using it.

  • PATH_INFO Environment Variable

    Much of the time, you will want to send data to your gateways which the client shouldn't muck with. Such information could be the name of the Form which generated the results they are sending.

    CGI allows for extra information to be embedded in the URL for your gateway which can be used to transmit extra context-specific information to the scripts. This information is usually made available as "extra" information after the path of your gateway in the URL. This information is not encoded by the server in any way. It can be accessed in REXX by using String=GETENV('PATH_INFO'), or in Perl by using $string=$ENV{'PATH_INFO'};

    To illustrate this, let's say I have a CGI script which is accessible to my server with the name foo. When I access foo from a particular document, I want to tell foo that I'm currently in the English language directory, not the Pig Latin directory. In this case, I could access my script in an HTML document as:

    <A HREF="/cgi-bin/foo/language=english">foo</A>

    When the server executes foo, it will give me PATH_INFO of /language=english, and my program can decode this and act accordingly.

    The PATH_INFO and the QUERY_STRING may be combined. For example, the URL:
    https://www/cgi-bin/htimage/usr/www/img/map?404,451
    will cause the server to run the script called htimage. It would pass the remaining path information "/usr/www/img/map" to htimage in the PATH_INFO environment variable, and pass "404,451" in the QUERY_STRING variable. In this case, htimage is a script for implementing active maps supplied with the CERN HTTPD.

  • Standard Input

    If your Form has METHOD="POST" in its FORM tag, your CGI program will receive the encoded Form input on standard input (stdin in Unix). The server will NOT send you an EOF at the end of the data; instead you should use the environment variable CONTENT_LENGTH to determine how much data you should read from stdin. You can accomplish this in REXX by using In=CHARIN(,1,GETENV('CONTENT_LENGTH')), or in Perl by using read(STDIN,$in,$ENV{'CONTENT_LENGTH'});

     

    If you wish to pass the standard input onto another script that you will call later, then you may wish to review the cgi-lib.rxx REXX PROCEDURE ReadPost.
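The PATH_INFO example above leaves the decoding to the script. A minimal Perl sketch of that step follows, assuming the extra path has the form /name=value as in /language=english. The name parse_path_info is hypothetical and is not part of cgi-lib.pl.

```perl
use strict;
use warnings;

# Split extra path information such as "/language=english" into a name
# and a value. parse_path_info is a hypothetical helper for illustration.
sub parse_path_info {
    my ($path_info) = @_;
    return () unless defined $path_info;
    $path_info =~ s{^/}{};                            # drop the leading slash
    my ($name, $value) = split /=/, $path_info, 2;    # split on the first "=" only
    return () unless defined $name && defined $value; # not of the form name=value
    return ($name, $value);
}

my ($name, $value) = parse_path_info($ENV{'PATH_INFO'});
if (defined $name && $name eq 'language') {
    print "Language directory: $value\n";
}
```

A path such as /usr/www/img/map contains no equals sign, so the sketch returns an empty list and the script can treat the whole string as a path instead.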

You can review the REXX Code Fragment giving an example of how to read the various forms of input into your script.

The REXX PROCEDUREs ReadForm together with MethGet and MethPost, all available in cgi-lib.rxx, may be used to simplify the task of reading input from a Form.
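For Perl, the choice between QUERY_STRING and standard input might be sketched as below. This is an illustration rather than the code in cgi-lib.pl: read_raw_input is a hypothetical name, and the sketch assumes the server sets the standard CGI variable REQUEST_METHOD (not discussed above) to GET or POST.

```perl
use strict;
use warnings;

# Return the raw, still URL-encoded input for the current request.
# read_raw_input is a hypothetical helper; REQUEST_METHOD is assumed
# to be set by the server as in the CGI specification.
sub read_raw_input {
    my ($fh) = @_;                       # filehandle carrying the POST data
    my $method = $ENV{'REQUEST_METHOD'} || 'GET';
    if ($method eq 'POST') {
        my $length = $ENV{'CONTENT_LENGTH'} || 0;
        my $in = '';
        # The server sends no EOF, so read exactly CONTENT_LENGTH bytes.
        read($fh, $in, $length) if $length > 0;
        return $in;
    }
    # GET and ISINDEX requests: everything after the first "?" in the URL.
    return defined $ENV{'QUERY_STRING'} ? $ENV{'QUERY_STRING'} : '';
}

my $raw = read_raw_input(\*STDIN);
```

Passing the filehandle as an argument keeps the routine easy to try from the command line with a test file in place of STDIN.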

Decoding Forms Input

When you write a Form, each of your input items has a name tag. When the user places data in these items in the Form, that information is encoded into the Form data. The data the user gives to an input item is called its value.

Form data is a stream of name=value pairs separated by the ampersand (&) character. Each name=value pair is URL encoded, i.e. spaces are changed into plus signs and some characters are encoded into hexadecimal. To decode the Form data you must first parse the Form data block into separate name=value pairs tossing out the ampersands. Then you must parse each name=value pair into the separate name and value. Use the first equal sign you encounter to split the data. If there is more than one, then something is wrong with the data. Again toss out the equals signs. Finally undo the URL encoding of each name and value.

You can review the REXX or the Perl code fragment giving examples of decoding the Form input.

When using the name and value information in the script, you need to be aware that:

  • nothing dictates the order in which the name=value pairs will be concatenated;
  • not every name and value defined in the form is necessarily sent by the client, for example if nothing is selected in a scrolling list then neither the name nor the value will be sent;
  • more than one value may be sent for a given name, for example if a scrolling list allows the selection of several options.
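The decoding steps and the caveats above can be sketched in Perl as follows. The names url_decode and parse_form are hypothetical and this is not the code from cgi-lib.pl. Each name maps to a list of values so that repeated names are kept, and the code makes no assumption about the order or presence of any name.

```perl
use strict;
use warnings;

# Undo URL encoding: "+" becomes a space, then %xx becomes the
# character with that hexadecimal code.
sub url_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Parse "name=value&name=value" into a hash of array references,
# so that a name sent more than once keeps all of its values.
sub parse_form {
    my ($raw) = @_;
    my %form;
    foreach my $pair (split /&/, $raw) {
        next if $pair eq '';
        my ($name, $value) = split /=/, $pair, 2;   # split on the first "=" only
        $value = '' unless defined $value;
        push @{ $form{ url_decode($name) } }, url_decode($value);
    }
    return %form;
}

my %form = parse_form('name=hello+world&os=unix&os=vm&note=50%25+done');
foreach my $name (sort keys %form) {
    print "$name: ", join(', ', @{ $form{$name} }), "\n";
}
```

With the sample string shown, the loop prints "name: hello world", "note: 50% done" and "os: unix, vm". Translating + to a space before decoding the %xx sequences matters, since an encoded plus sign (%2B) must survive as a literal plus.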

Sending Document Back to Client

CGI programs can return a myriad of document types. They can send back an image to the client, an HTML document, a plaintext document, a PostScript document or perhaps even an audio clip of your bodily functions. They can also return references to other documents (to save space we will ignore this latter case here; more information may be found in NCSA's CGI Primer). The client must know what kind of document you're sending it so it can present it accordingly. In order for the client to know this, your CGI program must tell the server what type of document it is returning.

In order to tell the server what kind of document you are sending back, CGI requires you to place a short header on your output. This header is ASCII text, consisting of lines separated by either linefeeds or carriage returns followed by linefeeds. Your script must output at least two such lines before its data will be sent directly back to the client. These lines are used to indicate the MIME type of the following document.

Some common MIME types relevant to WWW are:

  • A "text" Content-Type which is used to represent textual information in a number of character sets and formatted text description languages in a standardised manner. The two most likely subtypes are:
    • text/plain: text with no special formatting requirements.
    • text/html: text with embedded HTML commands
  • An "application" Content-Type, which is used to transmit application data or binary data. Two frequently used subtypes are:
    • application/postscript: the data is in PostScript, and should be fed to a PostScript interpreter.
    • application/octet-stream: the data is in some unknown binary format, such as the results of a file transfer.
  • An "image" Content-Type for transmitting still image (picture) data. There are many possible subtypes, but the ones most often used on WWW are:
    • image/gif: an image in the GIF format.
    • image/x-xbitmap: an image in the X Bitmap format.
    • image/jpeg: an image in the JPEG format.

In order to tell the server your output's content type, the first line of your output should read:
Content-type: type/subtype
where type/subtype is the MIME type and subtype for your output.

Next, you have to send the second line. With the current specification, THE SECOND LINE SHOULD BE BLANK. This means that it should have nothing on it except a linefeed. Once the server retrieves this line, it knows that you're finished telling the server about your output and will now begin the actual output. If you skip this line, the server will attempt to parse your output trying to find further information about your request and you will become very unhappy.

You can review a REXX Code Fragment giving an example of handling the Content-type information.
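In Perl, the two header lines described above might be produced as in this minimal sketch. The helper name cgi_header is hypothetical, and the small HTML page is only there to show where the document body starts.

```perl
use strict;
use warnings;

# Build the CGI header: one Content-type line followed by a blank line.
# cgi_header is a hypothetical helper name used for illustration.
sub cgi_header {
    my ($type) = @_;
    return "Content-type: $type\n\n";
}

print cgi_header('text/html');
# Everything printed from here on is the document itself.
print "<HTML><HEAD><TITLE>Hello</TITLE></HEAD>\n";
print "<BODY><P>Hello, world.</P></BODY></HTML>\n";
```

The second "\n" produces the blank line; omitting it is the most common reason for the server to reject a script's output.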

After these two lines have been output, any output to stdout (e.g. a REXX SAY command) will be included in the document sent to the client. This output must be consistent with the Content-type header. For example, if the header specified Content-type: text/html then the following output must include HTML formatting, such as using <BR> or <P> for starting new lines or <PRE> to remove HTML's automatic formatting.

Diagnostics and Reporting Errors

Since stdout is included in the document sent to the client, diagnostics output with the SAY command will appear in the document. You can review a REXX Code Fragment giving an example of diagnostic reporting.

If errors are encountered (e.g. no input provided, invalid characters found, too many arguments specified, an invalid command requested, or invalid syntax or an undefined variable encountered in the REXX script), the script should provide detailed information on what is wrong. It may be very useful to report the settings of the various WWW environment variables as well.

The CGIerror, CGIdie and MyURL REXX PROCEDUREs in cgi-lib.rxx provide some assistance for error reporting. In addition review the REXX code fragments using CGIerror and using CGIdie and also typical CGIerror output and CGIdie output.
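A comparable error report in Perl might look like the sketch below. It is not a port of CGIerror or CGIdie: error_page and html_escape are hypothetical names, and the list of environment variables shown is an arbitrary selection of standard CGI variables.

```perl
use strict;
use warnings;

# Escape characters that are special in HTML so that values display
# literally instead of being interpreted as markup.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Build a complete HTML error page, header included, that reports the
# message and the settings of a few CGI environment variables.
sub error_page {
    my ($message) = @_;
    my $page = "Content-type: text/html\n\n";
    $page .= "<HTML><HEAD><TITLE>CGI Error</TITLE></HEAD><BODY>\n";
    $page .= "<H1>CGI Error</H1>\n<P>" . html_escape($message) . "</P>\n<PRE>\n";
    foreach my $var (qw(REQUEST_METHOD QUERY_STRING PATH_INFO CONTENT_LENGTH)) {
        my $value = defined $ENV{$var} ? $ENV{$var} : '(not set)';
        $page .= "$var = " . html_escape($value) . "\n";
    }
    $page .= "</PRE>\n</BODY></HTML>\n";
    return $page;
}

print error_page('No input provided');
```

Escaping the values before echoing them back keeps user-supplied input in QUERY_STRING or PATH_INFO from being rendered as HTML in the error page.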

Two Simple REXX WWW CGI Scripts

To get your Web server to execute a CGI script you must:

  • Write the script. To simplify this, you may wish to take advantage of the cgi-lib.rxx library of functions, including some introduced previously on this page. A couple of simple, but complete examples may help:
    1. source of a script to enable a UNIX finger function.
    2. source of a minimal HTTP Form and Script.
  • Make the script executable by your Web server. At SLAC on Unix this is done using the chmod command, e.g.
    • chmod o+x /u/sf/cottrell/bin/cgi1.rxx
      chmod u+x /u/sf/cottrell/bin/cgi1.rxx
  • Get your Web-Master to add a rule to the Web server's rules file to allow the Web server to execute your script. More information on the W3C server's rules file may be found by looking at Configuration File of W3C httpd, as well as a simple example of some of the mapping statements usable in the rules file.

The Web-Master will want to ensure that the Security Aspects of your script have been addressed before adding your script to the Rules file.

 

Other Sources of Interest

  • Hard Copy:
    • The book HTML & CGI Unleashed has much useful information on writing CGI scripts in C, Perl and REXX.
    • The book Introduction to CGI/PERL by Steve Brenner & Edwin Aoki is a useful introduction to writing CGI scripts in Perl.
  • Writing World-Wide Web CGI Scripts in REXX presented at the Spring 1996 SHARE Technical Conference, March 7, 1996, Anaheim California.
  • The NetRexx Language Page provides information on an experimental project by Mike Cowlishaw (the author of REXX) to create a Rexx front end to Java.
  • Also tune into the newsgroup comp.infosystems.www.authoring.cgi, which covers the development of Common Gateway Interface (CGI) scripts as they relate to Web page authoring. Possible subjects include how to handle the results of forms, how to generate images on the fly, and how to put together other interactive Web offerings.
  • The World Wide Web (Frequently Asked Questions, with Answers) answers many, many questions about the World Wide Web in general.
  • If you are using Perl and you have a general Perl question that isn't really a CGI-specific question, check out the Perl FAQ.
  • If you will be writing scripts for Windows NT, see Somarsoft - Windows NT Security Issues.

Acknowledgements

Much of the text on the Common Gateway Interface and Forms comes from NCSA documents. Useful information and text was also obtained from The World-Wide Web: How Servers Work, by Mark Handley and John Crowcroft, published in ConneXions, February 1995.