Get Usenet News Articles Using REXX, Network News Transfer Protocol and Sockets

Author: Dave Briccetti

Date: 1995

Source: Wayback Machine

This tip demonstrates retrieving selected news articles from an NNTP server using REXX, NNTP, and TCP Sockets.

If you are a REXX, NNTP, or Sockets expert and you see any errors or possible improvements to this tip, please share your knowledge with me.

As a software developer and contract programmer, I like to keep up with available contracts by searching Usenet newsgroups, especially ba.jobs.contract. I got tired of starting up my newsreader, opening the group, searching for "OS/2," and then weeding out those postings which say "W2ONLY." I wanted to click on an object and have it all done for me automatically. So I wrote this REXX program.

Newsreaders often get the news articles from a program known as a News Server, which runs on a host somewhere. Your Internet service provider or system administrator should give you the name of the server to use. Newsreaders communicate with the news server using the Network News Transfer Protocol, or NNTP.

To use this program, which is called GetNews, type the following from the command line (or set up program objects to supply your frequently used options):

GetNews NewsGroup SearchString [ExcludeString]

NewsGroup is the name of the newsgroup you want to search, such as ba.jobs.contract.

SearchString is the string you are searching for, such as OS/2. All articles containing your search string in the Subject: line will be considered for retrieval.

ExcludeString is an optional string, which, if found in the body of the article, will cause the article to be skipped.

Here is an example:

GetNews ba.jobs.contract OS/2 W2ONLY

This will retrieve all articles from ba.jobs.contract with subjects containing "OS/2" and where the string "W2ONLY" does not appear in the body of the article.

Here's part of the result of running the program as in the above example, with commentary interspersed. The lines set off by extra space (GROUP, XPAT, BODY, and QUIT) are the NNTP commands we are sending.

200 shellx.best.com InterNetNews NNRP server INN 1.4 22-Dec-93 ready (posting ok).

This is what the news server says when we connect to it. The 200 code tells us that all is well and we can continue.
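
As a small illustration of how the first digit of a reply code can be interpreted, here is a sketch based on the reply classes described in RFC 977. ReplyClass is a hypothetical helper name, not part of GetNews, which only checks for a leading "2".

```rexx
/* Sketch: classify an NNTP status line by the first digit of its  */
/* reply code.  ReplyClass is a hypothetical helper, not part of   */
/* the GetNews program.                                            */
reply = '200 shellx.best.com InterNetNews NNRP server INN 1.4 22-Dec-93 ready (posting ok).'
say ReplyClass(reply)               /* displays: ok */
exit

ReplyClass: procedure
parse arg code .                    /* first word is the reply code */
select
    when left(code, 1) = '1' then class = 'informational'
    when left(code, 1) = '2' then class = 'ok'
    when left(code, 1) = '3' then class = 'continue'
    when left(code, 1) = '4' then class = 'failed'
    when left(code, 1) = '5' then class = 'error'
    otherwise class = 'unknown'
end
return class
```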

 

GROUP ba.jobs.contract

 

Here we select the newsgroup.

211 3614 72340 75980 ba.jobs.contract

This response (code 211) tells us that the server accepted the GROUP command, and gives the number of articles, the starting article number, and the ending article number of the group.
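
The fields of this reply can be pulled apart with a single parse instruction. The following is a minimal sketch using the reply shown above; the variable names are chosen for illustration. Note that RFC 977 describes the article count as an estimate, which would explain why it need not equal last - first + 1.

```rexx
/* Sketch: split a GROUP reply into its fields. */
reply = '211 3614 72340 75980 ba.jobs.contract'
parse var reply code count first last group .

if code = 211 then
    say group 'has about' count 'articles, numbered' first 'to' last
else
    say 'GROUP failed:' reply
```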

 

XPAT SUBJECT 1- *OS/2*

 

We request a list of all articles whose subject contains the string "OS/2."

221 subject matches follow.
72392 US-CA-San Fran-MGR-OS/2 WARP-Recruiter
72440 US-CA-San Fran-OS/2 Testing Engineer-RecruiterChen & McGinley, Inc.
72491 US-CA-San Fran-QA-OS/2, Testing-Recruiter
72563 US-CA-Oakland-LAN-LAN WAN TCP/IP OS/2 NT-Recruiter
72567 US-CA-Oakland-LAN-LAN WAN TCP/IP OS/2 NT-Recruiter
.

The search results appear, followed by a line containing only a period, which identifies the end of the data.
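
One way to walk through such a multi-line reply is to peel off lines at each CR/LF until the lone period is reached. This is only a sketch on a hard-coded sample reply, not the approach GetNews itself takes (it uses the GetFirstLine and StripFirstLine routines shown later). The stem names artnum. and subject. are assumptions made for the example.

```rexx
/* Sketch: split a period-terminated NNTP reply into lines. */
CRLF = '0d0a'x
reply = '221 subject matches follow.' || CRLF ||,
        '72392 US-CA-San Fran-MGR-OS/2 WARP-Recruiter' || CRLF ||,
        '72491 US-CA-San Fran-QA-OS/2, Testing-Recruiter' || CRLF ||,
        '.' || CRLF

parse var reply status (CRLF) rest      /* peel off the status line */
count = 0
do while rest \== ''
    parse var rest line (CRLF) rest
    if line == '.' then leave           /* lone period ends the data */
    count = count + 1
    parse var line artnum.count subject.count
end
artnum.0 = count                        /* number of matches found */

do i = 1 to artnum.0
    say artnum.i '->' subject.i
end
```

With the sample reply above, the loop collects two matches, for articles 72392 and 72491.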

 

BODY 72392

 

Now we ask for the body of the first message.

222 72392 <80652395268020@dice.com> body
SEARCH KEYS: TYPE:MGR   TERM:CON   W2ONLY   STATE:CA    AREA:415 

POSITION ID: ARCSF.011  
DATE POSTED: 12/01/95

POSITION TITLE     : PROJECT MANAGER                                    
SKILLS REQUIREMENTS: OS/2 WARP                                          

LOCATION  : SAN MATEO
START DATE: 12/20/95       
PAY RATE  : NEGOTIABLE  (+ benefits
LENGTH    : 1 year         

COMMENTS: Manage large-scale, multi-site rollout of high powered        
          superstation desktops.  Very interesting, high profile        
          position.                                                     
.
And here it is, again ending with the period.
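
This particular body contains the string W2ONLY, so with the example command it would be skipped. The test can be sketched as follows, using only the first line of the body for brevity. The second half, a case-insensitive comparison using translate(), is a hypothetical variation and not something GetNews does.

```rexx
/* Sketch of the exclusion test on one article body. */
body = 'SEARCH KEYS: TYPE:MGR   TERM:CON   W2ONLY   STATE:CA    AREA:415'
ExcludeString = 'W2ONLY'

if ExcludeString = '' | pos(ExcludeString, body) = 0 then
    verdict = 'keep'
else
    verdict = 'skip'
say 'Article 72392:' verdict        /* displays: Article 72392: skip */

/* Hypothetical variation: fold both strings to uppercase with     */
/* translate() so that "w2only" would also match.                  */
if pos(translate('w2only'), translate(body)) = 0 then
    loose = 'keep'
else
    loose = 'skip'
say 'Case-insensitive test:' loose
```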

 

QUIT

 

We indicate we're finished.

205

The server accepts the command and ends the session.

 

The complete REXX program follows. You can also get it in zipped form.

 

/* ===========================================================================

Get Usenet News Articles Matching Search Criteria, Using Network News Transfer Protocol
RFC 977 (https://www.cis.ohio-state.edu/htbin/rfc/rfc977.html) and the Draft Common NNTP Extensions (ftp://ftp.internic.net/internet-drafts/draft-barber-nntp-imp-01.txt)

Written by a novice REXX programmer

Dave Briccetti, December 1995
daveb@davebsoft.com, https://www.davebsoft.com

May be used for any purpose

=========================================================================== */

parse arg NewsGroup SearchString ExcludeString

if NewsGroup = '' | SearchString = '' then
do
    say 'usage: GetNews NewsGroup SearchString [ExcludeString]'
    say 'example: GetNews ba.jobs.contract OS/2 W2ONLY'
    say '  shows all postings in ba.jobs.contract with OS/2 in the'
    say '  subject line and without the string ''W2ONLY'' in the body'
    exit
end

OutFile = 'results.' || NewsGroup  /* Change this if you don't have long file names */
/*OutFile = 'search.txt'*/      
NewsServer    = 'your.news.server' /* News server */
SearchField   = 'subject'          /* Article header field to search */

TRUE                    = 1
FALSE                   = 0
REPLYTYPE_OK            = '2'   /* NNTP reply code first byte */

/* Load the REXX Socket interface */
call RxFuncAdd 'SockLoadFuncs', 'rxSock', 'SockLoadFuncs'
call SockLoadFuncs

'@if exist' OutFile 'del' OutFile

if EstablishProtocol() = FALSE then
    exit

/* Get the postings */
call GetPostings socket, NewsGroup, SearchField, ,
    SearchString, ExcludeString, OutFile

/* End the protocol with QUIT */
CmdReply = TransactCommand(socket, 'QUIT', 1, '0d0a'x)

/* Close the socket */
call SockSoClose socket

exit


/* ======================================================================== */
EstablishProtocol:
/* ======================================================================== */

socket = ConnectToNewsServer(NewsServer)   
if socket <= 0 then
do
    say 'Could not connect to news server'
    return FALSE
end

CmdReply = GetCmdReply(socket, '0d0a'x)
say CmdReply

if left(CmdReply, 1) \= REPLYTYPE_OK then
do
    say 'Could not establish protocol'
    return FALSE
end

return TRUE


/* ======================================================================== */
GetPostings: procedure
/* ======================================================================== */

parse arg socket, NewsGroup, SearchField, SearchString, ExcludeString, OutFile
CRLF = '0d0a'x
Dot = CRLF || '.' || CRLF
REPLYTYPE_OK            = '2'   /* NNTP reply code first byte */

CmdReply = TransactCommand(socket, 'GROUP' NewsGroup, 1, CRLF)

parse var CmdReply code num first last group

if left(code, 1) = REPLYTYPE_OK then
do
    CmdReply = TransactCommand(socket, ,
        'XPAT' SearchField '1- *' || SearchString || '*', 0, Dot)
    if left(CmdReply, 1) \= REPLYTYPE_OK then
    do
        say 'xpat failed'
        return FALSE
    end
    CmdReply = StripFirstLine(CmdReply)
    call lineout OutFile, CmdReply

    do while length(CmdReply) > 5
        line = GetFirstLine(CmdReply)
        if line = '' then
            CmdReply = ''
        else
        do
            CmdReply = StripFirstLine(CmdReply)
            parse var line num rest
            body = TransactCommand(socket, 'BODY' num, 0, Dot)
            if ExcludeString = '' | (pos(ExcludeString, body) = 0) then
            do
                From = HeaderLine(socket, 'from')
                call lineout OutFile, From
                Subject = HeaderLine(socket, 'subject')
                call lineout OutFile, Subject
                BodyStripped = StripFirstLine(body)
                call lineout OutFile, BodyStripped
            end
        end
    end
end

return


/* ======================================================================== */
ConnectToNewsServer: procedure
/* ======================================================================== */

parse arg NewsServer
socket = 0

/* Open a socket to the news server.  (The Sock* functions are
   documented in the REXX Socket book in the Information folder
   in the OS/2 System folder.) */

call SockInit
if SockGetHostByName(NewsServer, 'host.!') = 0 then
    say 'Could not get host by name' errno h_errno
else
do
    socket = SockSocket('AF_INET','SOCK_STREAM',0)
    address.!family = 'AF_INET'
    address.!port = 119          /* the standard NNTP port */
    address.!addr = host.!addr
    if SockConnect(socket, 'address.!') = -1 then
    do
        say 'Could not connect socket' errno h_errno
        call SockSoClose socket
        socket = -1     /* report the failure to the caller */
    end
end
return socket


/* ======================================================================== */
GetCmdReply: procedure
/* ======================================================================== */

parse arg socket, EndString

/* Receive the response to the command into a variable.  Use
   more than one socket read if necessary to collect the whole
   response. */
  
if SockRecv(socket, 'CmdReply', 1000) < 0 then do
    say 'Error reading from socket' errno h_errno
    exit
end

ReadCount = 1
MaxParts = 10

do while ReadCount < MaxParts & right(CmdReply, length(EndString)) \= EndString
    if SockRecv(socket, 'CmdReplyExtra', 1000) < 0 then do
        say 'Error reading from socket'
        exit
    end
    CmdReply = CmdReply || CmdReplyExtra
    ReadCount = ReadCount + 1
end

return CmdReply


/* ======================================================================== */
TransactCommand:
/* ======================================================================== */

parse arg socket, Cmd, SayCmd, EndString

/* Send a command to the NNTP server, echoing it to the display
   if requested */

if SayCmd then
    say Cmd
   
rc = SockSend(socket, Cmd || '0d0a'x)
reply = GetCmdReply(socket, EndString)
if SayCmd then
    say reply
return reply


/* ======================================================================== */
GetFirstLine: procedure
/* ======================================================================== */

parse arg TextBlock
p = pos('0a'x, TextBlock)
if p > 0 then
    line = left(TextBlock,p)
else
    line = ''
   
return line

           
/* ======================================================================== */
StripFirstLine: procedure
/* ======================================================================== */

parse arg TextBlock
p = pos('0a'x, TextBlock)
if p > 0 then
    StrippedTextBlock = right(TextBlock,length(TextBlock)-p)
else
    StrippedTextBlock = ''
   
return StrippedTextBlock

           
/* ======================================================================== */
HeaderLine: procedure           
/* ======================================================================== */

parse arg socket, linetype

CRLF = '0d0a'x
Dot = CRLF || '.' || CRLF

XhdrResponse = TransactCommand(socket, 'xhdr' linetype, 0, Dot)

hl = StripFirstLine(XhdrResponse)   /* Strip off the first line */
hl = GetFirstLine(hl)               /* Take the first line of what remains */
hl = delword(hl,1,1)                /* Delete the article number */
return hl