Last modified on 01/19/12 19:39:34
                             TRANSPORT.C                4/24/98 

Table of contents:
  I.  Introduction
  1.  Overview of transport routines
      1.1   Transport.h structures used by the calling program.
      1.2   Initializing/terminating access to shared memory.
      1.3   Writing messages to shared memory.
      1.4   Retrieving messages from shared memory.
      1.5   Buffering messages in a private memory region.	
      1.6   Communicating with the shared memory header flag.
      1.7   Error reporting by transport functions.
  2.  Function calls
      2.1   tport_create
      2.2   tport_destroy
      2.3   tport_attach
      2.4   tport_detach
      2.5   tport_putmsg
      2.6   tport_getmsg
      2.7   tport_copyto 
      2.8   tport_copyfrom
      2.9   tport_buffer
      2.10  tport_bufthr
      2.11  tport_putflag
      2.12  tport_getflag
      2.13  tport_syserr
      2.14  tport_buferror 
  3.  Programming tips
  4.  Bug fixes and program modifications
      4.1   Mishandled shared memory pointer wraps in tport_putmsg.
      4.2   Missing argument to shmctl.
      4.3   Speed enhancement using memcpy.
      4.4   Making tport_putmsg multi-thread safe.
      4.5   Mishandled shared memory pointer resets in tport_getmsg.
      4.6   Minor crack in tport_getmsg and tport_copyfrom.
      4.7   Logo-tracking problem with GET_TOOBIG messages, 
	    tport_getmsg and tport_copyfrom.
      4.8   Tracking problem when no messages of requested logo
            are ever returned, tport_getmsg and tport_copyfrom.
      4.9   Variable name changed to allow use of C++ compilers.
      4.10  Semaphore operations problem in tport_putmsg and tport_copyto 
            (Solaris version).


I.  INTRODUCTION

Transport.c contains a set of functions for accessing System V IPC shared 
memory regions under SunOS 4.1.1 and Solaris 2.4.  These routines, with 
exactly the same function calls, have also been ported to OS/2 and Windows NT. 
  void  tport_create();
  void  tport_destroy();
  void  tport_attach();
  void  tport_detach();
  int   tport_putmsg();
  int   tport_getmsg();
  int   tport_copyto(); 
  int   tport_copyfrom();
  void  tport_putflag();
  int   tport_getflag();
  void  tport_syserr();

In June 1995, a set of functions was added to transport.c to create
multi-threaded, message-buffering applications under Solaris 2.4, OS/2, and NT
(SunOS does not support multi-threaded applications):
  int   tport_buffer();
  void *tport_bufthr();
  void  tport_buferror(); 

On Solaris, source files using transport functions should include these lines:
    #include <earthworm.h>   /* required by multi-thread transport functions */
    #include <transport.h>

On OS/2, source files using transport functions should include these lines
(the first 3 lines must be before the transport.h include line):
    #define INCL_DOSMEMMGR                       
    #define INCL_DOSSEMAPHORES              
    #include <os2.h>             
    #include <earthworm.h>   /* required by multi-thread transport functions */
    #include <transport.h>


1.  OVERVIEW OF TRANSPORT ROUTINES

In the following paragraphs, anything written in all capital letters is
defined in transport.h.   The following topics are explained in more detail
below:
  1.1   Transport.h structures used by the calling program.
  1.2   Initializing/terminating access to shared memory.
  1.3   Writing messages to shared memory.
  1.4   Retrieving messages from shared memory.
  1.5   Buffering messages in a private memory region.	
  1.6   Communicating with the shared memory header flag.
  1.7   Error reporting by transport functions.


1.1   Transport.h structures used by the calling program.

	Many constants and five structure types are defined in transport.h. 
Two of the structure types are used as arguments to transport functions.  The 
other defined structure types are used internally by the transport functions;
for more information on those, please read the comments in transport.h.  The
first structure type used as an argument in transport calls is a shared memory
information structure:

   Solaris version:
       typedef struct {                    
             SHM_HEAD  *addr;     /* pointer to beginning of memory region */
             long       key;      /* key to shared memory region           */
             long       mid;      /* shared memory region identifier       */
             long       sid;      /* associated semaphore identifier       */
       } SHM_INFO;      

   OS/2 version:
       typedef struct {                    
             SHM_HEAD  *addr;     /* pointer to beginning of memory region */
             long       key;      /* key to shared memory region           */
             PVOID      objAlloc; /* pointer to memory object              */
             HMTX       hmtx;     /* mutex semaphore handle                */
       } SHM_INFO;                       

All the values in this structure are set within function tport_create or
tport_attach.  It contains all the information needed to identify and use the
memory region in all other transport function calls.                  

The second structure type used as an argument is the message logo structure:
       typedef struct {                     
             unsigned char  type;    /* message is of this type       */
             unsigned char  mod;     /* was created by this module id */
             unsigned char  instid;   /* at this installation          */
       } MSG_LOGO;           
This structure describes the message it is associated with.  A single 
MSG_LOGO structure is passed as an argument to tport_putmsg.  tport_getmsg 
takes an array of MSG_LOGO structures as a list of requested logos and it 
sets values in an individual MSG_LOGO structure to identify the retrieved
message.


1.2   Initializing/terminating access to shared memory.

	Four of the transport functions deal with getting a program ready to 
use or to finish with a shared memory region.  tport_create() creates the 
memory region given a unique "key" to identify the region and the size (in 
bytes) of the region. The created memory region consists of 2 parts: a header 
section (SHM_HEAD) for keeping track of pointers, etc., and a circular buffer 
area for storing variable-length messages.  The region should be made large 
enough compared to the size of the messages it holds to give each message a 
reasonable residence time in the memory before it is overwritten.  All 
information needed to identify and use the memory region in other transport 
function calls is stored in a shared memory information structure (SHM_INFO). 
To access an existing shared memory region, a program must first attach to it 
by passing tport_attach() the region's unique key.  tport_attach then sets up 
the shared memory information structure. Note: A program should call EITHER 
tport_create() to create and attach to a memory region OR tport_attach() to 
attach to an existing region. It should never call both.
	Just before exiting, a program that had attached to a memory region 
should detach from it using tport_detach() and one that had created it should 
destroy it using tport_destroy().  
	None of these four functions has a return value; if a system error 
occurs, they will write a message to stdout and exit.


1.3   Writing messages to shared memory.

	Messages are written to a shared memory region using tport_putmsg() or 
tport_copyto(), given the region's shared memory information structure.  When 
one tport_putmsg or tport_copyto is writing to memory, no other tport_putmsg or 
tport_copyto can access the same region.  Both functions write a transport 
layer header (TPORT_HEAD) in front of each message in shared memory. The first
byte of this header is always set to FIRST_BYTE to signal the beginning of a 
new message.  The header also includes the length of the following message, 
its "message logo" (MSG_LOGO; its message type, module id and installation id), 
and a sequence number.  If tport_copyto is used, the sequence number is passed 
as an argument to the function, and sequencing from another source can be 
preserved.  If tport_putmsg is used, the sequence number is assigned and 
tracked by tport_putmsg; any previous sequencing of messages will be lost.  
tport_putmsg has a limit to the number of different logos for which it can 
keep track of sequence numbers (NTRACK_PUT).  If this limit is exceeded, 
tport_putmsg will not write messages with new logos to memory; it will return 
PUT_NOTRACK, write a warning to stdout and continue.  tport_copyto has no 
tracking limits.  tport_putmsg and tport_copyto are multi-thread safe (they 
can be used by multiple threads of the same process without problems).
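
The per-logo sequence numbering described above can be pictured with a small
stand-alone sketch.  It is not the code in transport.c: DEMO_NTRACK_PUT,
DEMO_PUT_TRACK and demo_next_seq are hypothetical names, the limit of 3 is
chosen only to keep the example short, and the sequence number is assumed to
be an unsigned char (as in the tport_copyto argument list) that starts at 0.

```c
#include <assert.h>
#include <stddef.h>

/* Same fields as the MSG_LOGO structure shown in section 1.1. */
typedef struct {
    unsigned char type;
    unsigned char mod;
    unsigned char instid;
} MSG_LOGO;

/* Hypothetical, deliberately small limit for this sketch;
   the real NTRACK_PUT is defined in transport.h.          */
#define DEMO_NTRACK_PUT 3

/* Hypothetical tracking table: one sequence counter per logo seen. */
typedef struct {
    MSG_LOGO      logo[DEMO_NTRACK_PUT];
    unsigned char seq[DEMO_NTRACK_PUT];   /* next sequence # per logo */
    int           nlogo;                  /* number of logos tracked  */
} DEMO_PUT_TRACK;

/* Return the sequence number to stamp on the next message of this
   logo (0-255, wrapping), or -1 if the logo is new and the table is
   full (the case in which tport_putmsg returns PUT_NOTRACK).        */
int demo_next_seq(DEMO_PUT_TRACK *t, const MSG_LOGO *logo)
{
    int i;

    assert(t != NULL && logo != NULL);

    /* Look for the logo among the combinations already seen */
    for (i = 0; i < t->nlogo; i++) {
        if (t->logo[i].type   == logo->type &&
            t->logo[i].mod    == logo->mod  &&
            t->logo[i].instid == logo->instid)
            return t->seq[i]++;
    }

    /* New combination: store it, if there is room */
    if (t->nlogo == DEMO_NTRACK_PUT)
        return -1;
    t->logo[t->nlogo] = *logo;
    t->seq[t->nlogo]  = 0;
    i = t->nlogo++;
    return t->seq[i]++;
}
```

With a zeroed DEMO_PUT_TRACK, two calls for the same logo yield 0 and then 1,
and a fourth distinct logo yields -1 while already-tracked logos keep counting.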


1.4   Retrieving messages from shared memory.

	Messages of a given logo are retrieved from a shared memory region 
using tport_getmsg() or tport_copyfrom(). A single logo can be requested or 
an array of logos can be requested. Additionally, any or all components (type,
module, instid) of the requested message logo(s) can be wildcarded (set to 
WILD).  tport_getmsg or tport_copyfrom will return when it has found the first
message which matches any of the requested logos.  Both functions also keep
track of the sequence number they expect to see for the next message of each
logo; therefore, tport_getmsg or tport_copyfrom can tell if they have missed 
any messages.  If tport_getmsg misses messages, it returns GET_MISS; if
tport_copyfrom misses messages, it returns either GET_MISS_LAPPED (if memory 
was over-written by tport_putmsg or tport_copyto) or GET_MISS_SEQGAP (if a 
gap in sequence numbers was passed along by tport_copyto).  There is a limit
(NTRACK_GET) to the number of logos for which tport_getmsg or tport_copyfrom 
can track sequence numbers.  If this limit is exceeded, both functions will 
still return a message matching any requested logo, but they won't know if 
they have missed any; they will return GET_NOTRACK, write a warning to stdout 
and continue.  Both functions write the message logo, length (bytes), and 
message to addresses in their argument lists.  tport_copyfrom has one 
additional address argument to which it writes the TPORT_HEAD sequence number 
of the returned message.  Since both functions have their own private tracking 
variables, it is very important that each module use only one of these 
functions to grab messages for a given region-logo combination.  Otherwise, 
the module may see the same message twice!  tport_getmsg and tport_copyfrom 
are not multi-thread safe; they cannot be used safely by two threads of the 
same process.
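
Wildcard matching of requested logos against a message logo can be sketched
as follows.  This is an illustration only: DEMO_WILD stands in for the WILD
constant from transport.h (its value of 0 here is an assumption), and
demo_logo_matches and demo_find_request are hypothetical helper names, not
functions in transport.c.

```c
#include <assert.h>
#include <stddef.h>

/* Same fields as the MSG_LOGO structure shown in section 1.1. */
typedef struct {
    unsigned char type;
    unsigned char mod;
    unsigned char instid;
} MSG_LOGO;

/* Stand-in for WILD; the value is assumed for this sketch. */
#define DEMO_WILD 0

/* One logo component matches if the request is wild or equal. */
static int demo_field_matches(unsigned char want, unsigned char have)
{
    return want == DEMO_WILD || want == have;
}

/* Nonzero if message logo "have" satisfies requested logo "want". */
int demo_logo_matches(const MSG_LOGO *want, const MSG_LOGO *have)
{
    assert(want != NULL && have != NULL);
    return demo_field_matches(want->type,   have->type) &&
           demo_field_matches(want->mod,    have->mod)  &&
           demo_field_matches(want->instid, have->instid);
}

/* Return the index of the first requested logo that the message
   logo satisfies, or -1 if it matches none of them.              */
int demo_find_request(const MSG_LOGO *getlogo, short nget,
                      const MSG_LOGO *have)
{
    short i;

    assert(getlogo != NULL && nget >= 0);
    for (i = 0; i < nget; i++) {
        if (demo_logo_matches(&getlogo[i], have))
            return i;
    }
    return -1;
}
```

A request of { 5, DEMO_WILD, 1 } therefore accepts type 5 messages from any
module at installation 1, but rejects the same type from installation 2.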


1.5   Buffering messages in a private memory region.
	
	Several functions have been added to transport.c to give modules a
multi-threaded message-buffering capability.  After attaching to or creating 
a public shared memory region and creating (tport_create) a private shared 
memory region, a module can call tport_buffer() to start the buffering thread,
passing it 2 shared memory information structures (public and private), an 
array of logos, and the module id and installation id of the calling module.  
tport_buffer creates a thread, tport_bufthr(), which uses tport_copyfrom and 
tport_copyto to transfer all messages of the given logo(s) from the public 
region to the private region.  All sequence numbers from the public region 
are preserved in the private region.  The buffering-thread reports errors by 
calling tport_buferror(), which writes error messages, labeled with the main 
thread's module id and installation, to the public region using tport_putmsg.  
The main thread must retrieve all of its buffered messages from the private 
region using tport_getmsg. [tport_copyfrom and tport_getmsg are not multi-
thread safe, and since the buffering-thread is hard-wired to call 
tport_copyfrom, the main thread must use tport_getmsg.]  The buffering-thread
will exit when the shared memory header flag in the public region is set to
TERMINATE.  The main thread must destroy its private buffering region 
(tport_destroy) before it exits.


1.6   Communicating with the shared memory header flag.

	Two transport functions deal only with the flag in the shared memory 
header structure.  This flag is included as a means of communication between 
different programs accessing the same region.  For instance, if the flag is 
set to a certain value, all attached programs should detach and terminate. To 
change the value of the flag in a given region, use tport_putflag().  To find 
out the current value of the flag, use tport_getflag().


1.7   Error reporting by transport functions.

	Transport routines report errors by use of one of 2 functions, 
tport_syserr() or tport_buferror().  Both are meant for internal use only by 
the other transport functions.  tport_syserr is called when a system error has
occurred; it writes a message to stdout and exits.  tport_buferror is called by 
tport_bufthr (the buffering-thread) when return values from other transport 
routines indicate a problem.  tport_buferror writes an error message, tagged 
with the main thread's module id and installation id, to the public shared 
memory region using tport_putmsg and then it returns.




2.  FUNCTION CALLS
          
Below are the function calls, return values and comment lines from the 
transport.c source code. They provide a general description of each 
function's purpose and its program flow. 

  2.1   tport_create
  2.2   tport_destroy
  2.3   tport_attach
  2.4   tport_detach
  2.5   tport_putmsg
  2.6   tport_getmsg
  2.7   tport_copyto 
  2.8   tport_copyfrom
  2.9   tport_buffer
  2.10  tport_bufthr
  2.11  tport_putflag
  2.12  tport_getflag
  2.13  tport_syserr
  2.14  tport_buferror 


2.1   tport_create:  create a shared memory region & its semaphore, attach
		     to it and initialize shared memory header values.

void tport_create( SHM_INFO *region,   /* info structure for memory region  */
                   long      nbytes,   /* size of shared memory region      */
                   long      memkey )  /* key to shared memory region       */ 

Arguments used as passed:  nbytes, memkey

Arguments reset by function:  *region 

Return Value: None. If any system error occurs during its execution, 
              tport_create writes a message to stdout and exits.

Program flow:
/* Destroy memory region if it already exists */
/* Create shared memory region */
/* Attach to shared memory region */
/* Initialize shared memory region header */
/* Make semaphore for this shared memory region & set semval = SHM_FREE */
/* set values in the shared memory information structure */



2.2   tport_destroy:  destroy a shared memory region.

void tport_destroy( SHM_INFO *region )  /* info structure for memory region */

Arguments used as passed:  region

Arguments reset by function:  none 

Return Value: None. If any system error occurs during its execution, 
              tport_destroy writes a message to stdout and exits.

Program flow:
/* Set kill flag, give other attached programs time to terminate */
/* Detach from shared memory region */
/* Destroy semaphore set for shared memory region */
/* Destroy shared memory region */



2.3   tport_attach:  map to an existing shared memory region.

void tport_attach( SHM_INFO *region,   /* info structure for memory region  */
                   long      memkey )  /* key to shared memory region       */

Arguments used as passed:  memkey

Arguments reset by function:  *region 

Return Value: None. If any system error occurs during its execution, 
              tport_attach writes a message to stdout and exits.

Program flow:
/* attach to header; find out size memory region; detach */
/* reattach to the entire memory region; get semaphore */
/* set values in the shared memory information structure */



2.4   tport_detach:  detach from a shared memory region. 

void tport_detach( SHM_INFO *region )   /* info structure for memory region  */

Arguments used as passed:  region

Arguments reset by function:  none 

Return Value: None. If any system error occurs during its execution, 
              tport_detach writes a message to stdout and exits.



2.5   tport_putmsg:  write a message into a shared memory region.

int tport_putmsg( SHM_INFO *region,   /* info structure for memory region   */
                  MSG_LOGO *putlogo,  /* type,module,instid of incoming msg */
                  long      length,   /* size of incoming message           */
                  char     *msg )     /* pointer to incoming message        */

Arguments used as passed:  region, putlogo, length, msg

Arguments reset by function:  none 

Return values: PUT_OK if it put the message in memory with no problems.
               PUT_NOTRACK if it did not put the message in memory because
                   its sequence number tracking limit (NTRACK_PUT) was 
                   exceeded.
               PUT_TOOBIG if it did not put the message in memory because
                   it was too long to fit in the region.

If a system error occurs while tport_putmsg is executing or if a
pointer into the memory region gets lost (doesn't point to a FIRST_BYTE),
tport_putmsg writes a message to stdout and exits.

Program flow:
/* First time around, init the sequence counters, semaphore controls */
/* Set up pointers for shared memory, etc. */
/* First, see if the incoming message will fit in the memory region */
/* Change semaphore; let others know you're using tracking structure & memory */
/* Next, find incoming logo in list of combinations already seen */
/* Incoming logo is a new combination; store it, if there's room */
/* Store everything you need in the transport header */
/* First see if keyin will wrap; if so, reset both keyin and keyold */
/* Then see if there's enough room for new message in shared memory */
/*      If not, "delete" oldest messages until there's room         */
/* Now copy transport header into shared memory by chunks... */
/* ...and copy message into shared memory by chunks */
/* Finished with shared memory, let others know via semaphore */



2.6   tport_getmsg:  read a message out of shared memory.

int tport_getmsg( SHM_INFO  *region,   /* info structure for memory region  */
                  MSG_LOGO  *getlogo,  /* requested logo(s)                 */
                  short      nget,     /* number of logos in getlogo        */
                  MSG_LOGO  *logo,     /* logo of retrieved msg             */
                  long      *length,   /* size of retrieved message         */
                  char      *msg,      /* retrieved message                 */
                  long       maxsize ) /* max length for retrieved message  */

Arguments used as passed:  region, getlogo, nget, maxsize

Arguments reset by function:  *logo, *length, *msg 

Return values: GET_OK   if it got a message of requested logo(s).
               GET_NONE if there were no new messages of requested logo(s).
               GET_MISS if it got a message, but missed some.  Messages could
		   be missed for one of 3 reasons:
		   1) memory was overwritten before the message was retrieved.
		   2) message was lost before being written to memory and a  
		      sequence # gap was passed to memory by tport_copyto.
 		   3) previous message of returned logo was skipped because
		      it was longer than maxsize.
               GET_NOTRACK if it got a message, but couldn't tell if it
                   had missed any because its sequence # tracking limit
                   (NTRACK_GET) was exceeded.
               GET_TOOBIG if it found a message of requested logo(s) but
                   it was too long to fit in caller's buffer. No message
                   returned, but length and logo of the "toobig" message
                   are returned.

If a pointer into the memory region gets lost (doesn't point to a FIRST_BYTE),
tport_getmsg writes a message to stdout and exits.

Program flow:
/* Get the pointers set up */
/* First time around, initialize sequence counters, outpointers */
/* find latest starting index to look for any of the requested logos */
/* See if keyin and keyold were wrapped and reset by tport_putmsg; */
/*       If so, reset trak[xx].keyout and go back to findkey       */
/* Find next message from requested type, module, instid */
   /* make sure you haven't been lapped by tport_putmsg */
   /* load next header; make sure you weren't lapped */
   /* make sure it starts at beginning of a header */
   /* see if this msg matches any requested type */
/* Found a message of requested logo; retrieve it! */ 
        /* complain if retrieved msg is too big */      
        /* copy message by chunks to caller's address */
        /* see if we got run over by tport_putmsg while copying msg */
        /* if we did, go back and try to get a msg cleanly          */
        /* set other returned variables */
        /* find logo in tracked list */
        /* new logo, track it if there's room */   
        /* check if sequence #'s match; update sequence # */
        /* Ok, we're finished grabbing this one */
/* If you got here, there were no messages of requested logo(s) */
/* update outpointer ->msg after retrieved one for all requested logos */



2.7   tport_copyto:  put a message into a shared memory region; preserve the 
		     sequence number (passed as an argument) as the transport  
		     layer sequence number.

int tport_copyto( SHM_INFO     *region,  /*info structure for memory region   */
                  MSG_LOGO     *putlogo, /*type,module,instid of incoming msg */
                  long          length,  /*size of incoming message           */
                  char         *msg,     /*pointer to incoming message        */
                  unsigned char seq )    /*preserve as sequence# in TPORT_HEAD*/

Arguments used as passed:  region, putlogo, length, msg, seq

Arguments reset by function:  none 

Return values: PUT_OK if it put the message in memory with no problems.
               PUT_TOOBIG if it did not put the message in memory because
                   it was too long to fit in the region.

If a system error occurs while tport_copyto is executing or if a
pointer into the memory region gets lost (doesn't point to a FIRST_BYTE),
tport_copyto writes a message to stdout and exits.

Program flow:
/* First time around, initialize semaphore controls */
/* Set up pointers for shared memory, etc. */
/* First, see if the incoming message will fit in the memory region */
/* Store everything you need in the transport header */
/* Change semaphore to let others know you're using memory */
/* First see if keyin will wrap; if so, reset both keyin and keyold */
/* Then see if there's enough room for new message in shared memory */
/*      If not, "delete" oldest messages until there's room         */
/* Now copy transport header into shared memory by chunks... */
/* ...and copy message into shared memory by chunks */
/* Finished with shared memory, let others know via semaphore */



2.8   tport_copyfrom:  get a message out of public shared memory; save the
		       sequence number from the transport layer.

int tport_copyfrom( SHM_INFO  *region,   /* info structure for memory region */
                    MSG_LOGO  *getlogo,  /* requested logo(s)                */
                    short      nget,     /* number of logos in getlogo       */
                    MSG_LOGO  *logo,     /* logo of retrieved message        */
                    long      *length,   /* size of retrieved message        */
                    char      *msg,      /* retrieved message                */
                    long       maxsize,  /* max length for retrieved message */
                    unsigned char *seq ) /* TPORT_HEAD seq# of retrieved msg */

Arguments used as passed:  region, getlogo, nget, maxsize

Arguments reset by function:  *logo, *length, *msg, *seq

Return values: GET_OK   if it got a message of requested logo(s).
               GET_NONE if there were no new messages of requested logo(s).
               GET_MISS_LAPPED if it got a message, but missed some due to 
		   msgs being overwritten (by tport_putmsg or tport_copyto)  
		   before it got to them.
 	       GET_MISS_SEQGAP if it got a message, but noticed a gap in the
		   sequence numbers in the ring.  This means one of 2 things:
		   1) a msg was lost before being placed in shared memory and 
		   the sequence gap was transferred into shared memory by 
		   tport_copyto.
		   2) the previous message of the returned logo was skipped 
		   because it was longer than maxsize.
               GET_NOTRACK if it got a message, but couldn't tell if it
                   had missed any because its sequence # tracking limit
                   (NTRACK_GET) was exceeded.
               GET_TOOBIG if it found a message of requested logo(s) but
                   it was too long to fit in caller's buffer. No message
                   returned, but length and logo of the "toobig" message
                   are returned.

If a pointer into the memory region gets lost (doesn't point to a FIRST_BYTE),
tport_copyfrom writes a message to stdout and exits.

Program flow:
Same as tport_getmsg program flow (see section 2.6).



2.9   tport_buffer:  initialize the input buffering thread.

int tport_buffer( SHM_INFO  *region1,      /* transport ring             */
                  SHM_INFO  *region2,      /* private ring               */
                  MSG_LOGO  *getlogo,      /* array of logos to copy     */
                  short      nget,         /* number of logos in getlogo */
                  unsigned   maxMsgSize,   /* size of message buffer     */
                  unsigned char module,    /* module id of main thread   */
                  unsigned char instid )   /* inst id of main thread     */

Arguments used as passed:  region1, region2, getlogo, nget, maxMsgSize, 
			   module, instid

Arguments reset by function:  none

Return values:   0 if there were no errors.
		-1 if there was an error allocating the internal message buffer,
		   or if there was an error creating the thread.
		
Program flow:
/* Allocate internal message buffer */
/* Copy function arguments to global variables */
/* Start the input buffer thread, tport_bufthr */
/* Yield to the buffer thread */



2.10  tport_bufthr:  thread to buffer input from one transport region to another.

void *tport_bufthr( void *dummy )

Arguments:  dummy (not used)

Return values:  none

Program flow:
This function is an infinite loop which will exit only when the termination
flag is set in the public shared memory region's header:
/* Check the flag in the public region; exit if it's set to TERMINATE */
/* Try to copy a message from the public memory region with tport_copyfrom */
/* Handle return values from tport_copyfrom */
/* If you did get a message, copy it to private ring with tport_copyto */



2.11  tport_putflag:  set the flag in a shared memory header.       

void tport_putflag( SHM_INFO *region,  /* shared memory info structure  */
                    short     flag )   /* value to set header flag to   */

Arguments used as passed:  region, flag

Arguments reset by function:  none 

Return Value: none 



2.12  tport_getflag:  return the value of the flag from a shared memory header.

int tport_getflag( SHM_INFO *region )   /* shared memory info structure */
  
Arguments used as passed:  region

Arguments reset by function:  none 

Return value: The value of the shared memory header flag.



2.13  tport_syserr:  print a system error and exit.

void tport_syserr( char *msg,   /* message to print */
                   long  key )  /* key of memory region that had an error  */

Arguments used as passed:  msg, key

Arguments reset by function:  none 

Return Value: None. In fact it never returns, but always exits after
              writing the error message to stdout.



2.14  tport_buferror:  build an ascii earthworm error message and put it 
		       in the public memory region using tport_putmsg.

void tport_buferror( short  ierr,       /* 2-byte error word       */
                     char  *note  )     /* string describing error */

Arguments used as passed:  ierr, note

Arguments reset by function:  none 

Return Value:  none



3.  PROGRAMMING TIPS
     
Here are some tips for writing and running programs using transport.c:

Region key(s) should be defined in a .h file which is included by all 
programs that will access the region(s). One program should create the 
memory region(s) (tport_create); other programs accessing those regions 
will attach to them (tport_attach). The "creator" can also be a "putter" 
or "getter" or it can be a program with no purpose other than
creating/destroying memory regions.

When deciding how large to make a memory region (tport_create), remember
that the transport layer uses a portion of the memory region for its own
bookkeeping.  The region size is NOT required to be an even multiple of the
size of the messages it will contain.  However, suppose a user wants the
region to be exactly large enough to store NUMRING messages of size MSGSIZE.
To include space for transport bookkeeping too, the region size should be:
   sizeof(SHM_HEAD) + NUMRING * ( sizeof(TPORT_HEAD) + MSGSIZE ) 
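
As a worked illustration of this formula, the helper below takes the two
header sizes as parameters so that it compiles without transport.h.  The
name demo_region_size is hypothetical; in a real program the first two
arguments would be sizeof(SHM_HEAD) and sizeof(TPORT_HEAD).

```c
#include <assert.h>
#include <stddef.h>

/* Region size (bytes) needed to hold exactly numring messages of
   msgsize bytes each, plus the transport layer's own bookkeeping. */
size_t demo_region_size(size_t shm_head_size,    /* sizeof(SHM_HEAD)   */
                        size_t tport_head_size,  /* sizeof(TPORT_HEAD) */
                        size_t numring,          /* messages to hold   */
                        size_t msgsize)          /* bytes per message  */
{
    assert(numring > 0);
    return shm_head_size + numring * (tport_head_size + msgsize);
}
```

For example, with made-up header sizes of 32 and 16 bytes, 100 messages of
1000 bytes each need 32 + 100 * (16 + 1000) = 101632 bytes.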

At run time, the "creator" must be started first.  A few seconds should 
be allowed for the regions to be set up before starting "attachers". 
Otherwise the "attachers" will exit immediately because they can't find 
the memory regions.

Any program accessing shared memory should periodically look at the flag 
in the memory's header structure (tport_getflag).  If the flag is set to
TERMINATE, any "attacher" should detach from memory (tport_detach) and 
exit, and the "creator" should destroy the memory region(s) (tport_destroy) 
and exit. 
 
To initiate such a polite termination of all programs, one program
must set that termination flag (tport_putflag).  A "killer" program,
whose only purpose is to attach to a region and set the flag, is a useful
tool for keyboard-initiated exits.
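
The polite-termination pattern can be sketched without the real transport
library.  Everything prefixed DEMO_ or demo_ below is a stand-in invented for
this illustration: DEMO_HEAD holds only the flag, demo_putflag and
demo_getflag mimic tport_putflag and tport_getflag, and the value given to
DEMO_TERMINATE is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_TERMINATE 200            /* stand-in for TERMINATE        */

typedef struct { short flag; } DEMO_HEAD;       /* header: flag only   */
typedef struct { DEMO_HEAD *addr; } DEMO_INFO;  /* region info         */

/* What a "killer" program does: set the header flag. */
void demo_putflag(DEMO_INFO *region, short flag)
{
    assert(region != NULL && region->addr != NULL);
    region->addr->flag = flag;
}

/* What every attached program does periodically: read the flag. */
int demo_getflag(DEMO_INFO *region)
{
    assert(region != NULL && region->addr != NULL);
    return region->addr->flag;
}

/* Main loop of an "attacher": look at the flag before each pass of
   work and stop as soon as it is set to DEMO_TERMINATE.  Returns the
   number of passes completed.  A real program would call
   tport_detach() (or tport_destroy() if it is the creator) and exit
   after leaving the loop.                                            */
int demo_attacher_loop(DEMO_INFO *region, int max_passes)
{
    int done = 0;

    while (done < max_passes) {
        if (demo_getflag(region) == DEMO_TERMINATE)
            break;
        /* ... put or get messages here ... */
        done++;
    }
    return done;
}
```

Checking the flag at the top of every pass keeps the delay between the
killer setting the flag and the attacher noticing it to at most one pass.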

Simple examples of these types of programs reside in the same directory
as transport.c.  They are:
  putter1.c   creates regions and writes messages as module 1.
  putter2.c   attaches to regions and writes messages as module 2.
  getter.c    attaches to regions and retrieves messages, printing them.
  killer.c    sets terminate flag to stop all programs.
  keys.h      include file defining shared memory region keys.
  go          simple script to start the programs.
  Makefile

Note: Transport.c was designed to work in programs which run continuously.
If, however, a putter or getter is a transient beast that is run only
intermittently, the getter may return the "GET_MISS" status without actually
missing any messages.  This is due to the fact that every time a putter or 
getter starts up, its sequence # trackers are set to 0.



4.  BUG FIXES AND PROGRAM MODIFICATIONS
   
   4.1   Mishandled shared memory pointer wraps in tport_putmsg.
   4.2   Missing argument to shmctl.
   4.3   Speed enhancement using memcpy.
   4.4   Making tport_putmsg multi-thread safe.
   4.5   Mishandled shared memory pointer resets in tport_getmsg.
   4.6   Minor crack in tport_getmsg and tport_copyfrom.
   4.7   Logo-tracking problem with GET_TOOBIG messages, 
	 tport_getmsg and tport_copyfrom.
   4.8   Tracking problem when no messages of requested logo
         are ever returned, tport_getmsg and tport_copyfrom.
   4.9   Variable name changed to allow use of C++ compilers.
   4.10  Semaphore operations problem in tport_putmsg and tport_copyto 
         (Solaris version).


4.1   Mishandled shared memory pointer wraps.  
Problem: tport_putmsg mishandled wraps in the shared memory header's
         unsigned long keyin and keyold.  This caused the transport layer to
         lose its place in the memory ring and die.

The Fix: After resetting keyin and keyold, check to make sure that keyin is
         larger than keyold.  If not, make keyin = keyin + keymax.
         Change made in tport_putmsg on 10/24/94 by Lynn Dietz.

         I also changed transport.c so that it writes warning and error
         messages to stdout (instead of stderr as it was doing) so that the
         messages can easily be redirected to a log file.
         Change made in transport.c on 10/24/94 by Lynn Dietz.
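
The keyin/keyold fix in 4.1 can be sketched roughly as follows.  This is
not the code from transport.c: the struct RING_KEYS and the function
reset_keys are hypothetical names, keymax is assumed to be the size of the
ring in bytes, and the ring is assumed to hold at least one message when the
reset happens (an empty ring, keyin == keyold, is not treated specially).

```c
#include <assert.h>

/* Hypothetical stand-in for the shared memory header; only the three
   fields named in section 4.1 are modeled here. */
typedef struct {
    unsigned long keymax;  /* assumed: size of the ring in bytes          */
    unsigned long keyin;   /* key to the position of the next message     */
    unsigned long keyold;  /* key to the oldest complete message          */
} RING_KEYS;

/* Reset both keys into the range [0, keymax), then restore the invariant
   keyin > keyold by adding keymax to keyin if the reset broke it. */
void reset_keys(RING_KEYS *k)
{
    assert(k != 0 && k->keymax > 0);

    k->keyin  %= k->keymax;
    k->keyold %= k->keymax;

    if (k->keyin <= k->keyold)      /* keyin fell behind keyold */
        k->keyin += k->keymax;
}
```

With keymax = 1000, keyin = 2050 and keyold = 1980, the reset alone would
give keyin = 50 and keyold = 980; the extra check moves keyin to 1050, so
the 70 bytes between the two keys are preserved.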


4.2   Missing argument to shmctl.
Problem: tport_create and tport_destroy each have a call to shmctl(). 
	 Shmctl() takes 3 arguments, but I only had the first two passed.  
	 The compiler under SunOS never complained about it, but the
	 Solaris compiler 3.0.1 did.

The Fix: I added the 3rd argument (struct shmid_ds shmbuf) to both of
	 the shmctl() calls.
         Change made in transport.c on 3/28/95 by Lynn Dietz.
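
For reference, a minimal sketch of three-argument shmctl() calls on a
System V shared memory segment.  The function name create_query_destroy,
the use of IPC_PRIVATE, and the choice of the IPC_STAT and IPC_RMID commands
are assumptions made for this illustration; the text above does not say
which commands tport_create and tport_destroy pass.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Create a private region of nbytes, query it, then remove it.
   Returns the segment size reported by the system, or 0 on failure. */
size_t create_query_destroy(size_t nbytes)
{
    struct shmid_ds shmbuf;          /* the third argument to shmctl() */
    size_t reported = 0;
    int    shmid;

    assert(nbytes > 0);

    shmid = shmget(IPC_PRIVATE, nbytes, IPC_CREAT | 0600);
    if (shmid == -1)
        return 0;

    /* IPC_STAT fills shmbuf with the current state of the segment */
    if (shmctl(shmid, IPC_STAT, &shmbuf) == 0)
        reported = (size_t)shmbuf.shm_segsz;

    /* IPC_RMID does not use the buffer, but all three arguments
       are still passed to match the prototype */
    if (shmctl(shmid, IPC_RMID, &shmbuf) == -1)
        return 0;

    return reported;
}
```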


4.3   Speed enhancement using memcpy.
Problem: I noticed that coaxtoring, a program that just reads messages 
         from ethernet and puts them into shared memory using tport_putmsg, 
         took a big chunk of the cpu on a Sparc2 when handling large 
         messages (>50,000 bytes).  Suspect that something isn't optimized. 

The Fix: I changed how tport_putmsg and tport_getmsg copy messages from
         one address to another. A byte-by-byte for loop was replaced with
         one or two (if the message was wrapped around the end of the ring)
         calls to memcpy().  This sped up the coaxtoring program by 20-30%.
         Change made in transport.c on 6/20/95 by Lynn Dietz.
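
The one-or-two memcpy() idea can be illustrated with a small sketch.  The
function ring_write and its parameters are hypothetical and do not come from
transport.c; the sketch only shows how a message that runs past the end of
a ring is split into two block copies instead of a byte-by-byte loop.

```c
#include <assert.h>
#include <string.h>

/* Copy len bytes of msg into a ring of ring_size bytes, starting at
   offset start.  One memcpy() is used when the message fits before the
   end of the ring, two when it wraps around.
   Returns the number of memcpy() calls made. */
int ring_write(char *ring, size_t ring_size, size_t start,
               const char *msg, size_t len)
{
    size_t first;

    assert(ring_size > 0 && start < ring_size && len <= ring_size);

    first = ring_size - start;          /* room left before the ring's end */
    if (len <= first) {
        memcpy(ring + start, msg, len);
        return 1;
    }
    memcpy(ring + start, msg, first);          /* fill up to the end       */
    memcpy(ring, msg + first, len - first);    /* remainder from the start */
    return 2;
}
```

Reading a message back out of the ring would use the same split with the
source and destination roles exchanged.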


4.4   Making tport_putmsg multi-thread safe.
Problem: Previously, the semaphore was set in tport_putmsg after the incoming
	 logo was found in the tracking list.  If more than one thread of the
	 same process was using tport_putmsg, they could have competed for
 	 access to the tracking structure, potentially causing duplicated
	 sequence numbers or other errors.

The Fix: tport_putmsg now sets the semaphore before it looks for the logo in
	 the tracking structure.  Since only one tport_putmsg can access the 
	 tracking structure at a time, multiple threads of one process can 
	 safely use the same routine.
 	 Change made in transport.c on 6/27/95 by Lynn Dietz.


4.5   Mishandled shared memory pointer resets in tport_getmsg.
Problem: Each tport_getmsg() and tport_copyfrom() must reset its tracking 
         pointers (trak[xx].keyout) after shared memory header keyin & keyold 
         are wrapped and reset (by tport_putmsg or tport_copyto).  Sometimes,
         keyout was mistakenly reset to a number less than keyold, causing 
         the getter to grab messages from the ring starting with the oldest 
         complete message in the ring.  This results in a "missed message" 
         error, because of a gap in transport sequence numbers.  It also
         causes some messages to be processed twice.

The Fix: After resetting a keyout value in tport_getmsg() and tport_copyfrom(),
         first see if it still points to the FIRST_BYTE of a message.  
         If it does, make sure the value of keyout lies between keyold and 
         keyin.  If it doesn't point to a FIRST_BYTE, the getter was lapped 
         by a putter; reset keyout to keyold.
         Change made in transport.c on 1/17/96 by Lynn Dietz
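
A rough sketch of the keyout check described in 4.5.  The function
reset_keyout is hypothetical, keymax is assumed to be the ring size in bytes,
and the FIRST_BYTE value is a placeholder for this sketch.  How transport.c
brings keyout into the keyold..keyin range is not spelled out above; adding
keymax, and falling back to keyold if that still fails, is an assumption.

```c
#include <assert.h>

#define FIRST_BYTE 111   /* placeholder marker value for this sketch */

/* Given header keys that have already been wrapped and reset, return the
   getter's corrected keyout. */
unsigned long reset_keyout(const unsigned char *ring, unsigned long keymax,
                           unsigned long keyin, unsigned long keyold,
                           unsigned long keyout)
{
    assert(ring != 0 && keymax > 0 && keyold <= keyin);

    keyout %= keymax;

    /* Not at the start of a message: the getter was lapped by a putter */
    if (ring[keyout] != FIRST_BYTE)
        return keyold;

    /* Still at a message start: make keyout lie between keyold and keyin */
    if (keyout < keyold)
        keyout += keymax;
    if (keyout > keyin)          /* assumed fallback, see the note above */
        return keyold;

    return keyout;
}
```

For a 10-byte ring with messages starting at offsets 7 and 3, keyold = 7
and keyin = 15, an old keyout of 33 becomes 13 (the message at offset 3),
while an old keyout of 25 lands in the middle of a message and is reset
to 7.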


4.6   Minor crack in tport_getmsg and tport_copyfrom.
Problem: When reading shared memory, both tport_getmsg and tport_copyfrom use
	 this logic: make sure I haven't been lapped by a putter, grab a 
	 TPORT_HEAD from the ring, make sure that the TPORT_HEAD starts with 
	 a FIRST_BYTE.  On very rare occasions, a putter will overwrite the
	 first byte (or the TPORT_HEAD) between the getter's lap-check and its 
	 grabbing the header from the ring.  In this case, the getter will
	 complain that the header doesn't begin with a FIRST_BYTE and it will
	 exit.

The Fix: Add another lap-check just after tport_getmsg and tport_copyfrom
	 grab a TPORT_HEAD from the ring.  Their logic now looks like this:
         make sure I haven't been lapped by a putter, grab a TPORT_HEAD from 
	 the ring, make sure I haven't been lapped by a putter, make sure 
	 that the TPORT_HEAD starts with a FIRST_BYTE.  Note: another lap-
	 check is done after each message is grabbed from the ring.

 Change: In a move totally unrelated to the above problem, I changed 
	 the word "WARNING" to "NOTICE" in all references to wraps of
	 keyin/keyout/keyget to reflect the fact that this is really a 
	 normal, albeit rare, occurrence. 
         Changes made in transport.c on 6/12/96 by Lynn Dietz


4.7   Logo-tracking problem with GET_TOOBIG messages, 
      tport_getmsg and tport_copyfrom.
Problem: Whenever tport_getmsg or tport_copyfrom find a message that matches
	 the requested logo(s) but is too long for the target address, they 
	 return the logo and length of the message, but they never enter the
	 logo-tracking part of the routine.  This causes a problem only if the
	 very first message is GET_TOOBIG; since no logos are being tracked, 
	 these functions don't record the fact that they've looked at this 
	 GET_TOOBIG message already.  On the next call, they look at the same 
	 GET_TOOBIG message, and thus get stuck looking at this same message 
	 forever... (which may put you into an infinite loop depending on
	 how you handled the return codes).

The Fix: Modify the program flow of tport_getmsg and tport_copyfrom such that
	 after a TOOBIG message is found, they enter the logo-tracking part
	 of the routine.  Also make sure that the return code does NOT get
	 changed from GET_TOOBIG!
         Changes made in transport.c on 6/12/96 by Lynn Dietz


4.8   Tracking problem when no messages of requested logo are ever returned, 
      tport_getmsg and tport_copyfrom.
Symptom: If a module never finds a message of any requested logo in a given
	 memory region, that module eventually becomes a CPU hog.  We know 
	 something is wrong because the module has nothing to process; it 
	 should be doing a loop something like: call tport_getmsg, get a 
	 return code of GET_NONE, sleep a little bit, try again.  Where is 
	 the CPU going?

Problem: The problem is essentially the same as that described in section 4.7.
         No entries exist in the logo-tracking list until a message of a
         requested logo is actually found in shared memory.  If no such 
	 message has been found, tport_getmsg and tport_copyfrom have no way
         to record the position in shared memory of the last message that
	 they considered (and rejected).  So on every single call, tport_getmsg 
	 or tport_copyfrom start at the oldest complete message in memory and 
	 look at every single one (even though they've probably seen most of 
	 them already...) before concluding that none of them match their 
 	 request.  If the memory region is large and there are a lot of 
	 little messages in it, this can take a lot of CPU!

The Fix: Modify tport_getmsg and tport_copyfrom so that the first thing they
	 do is verify that each of the requested logos is entered in the 	
 	 logo-tracking list.  This way, even if none of the requested logos 
	 is found, there is a place to record the position of the last message 
	 that was considered for each requested logo.  (The sequence number
	 tracking for each logo remains "inactive" until the first message
	 with that logo is found).  On subsequent calls, tport_getmsg and
	 tport_copyfrom will only look at messages they haven't seen before.
         Changes made in transport.c on 6/18/96 by Lynn Dietz
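
The effect of entering requested logos in the tracking list up front can be
shown with a simplified model.  Everything here is hypothetical: a plain
array of integers stands in for the messages in shared memory, one integer
stands in for a MSG_LOGO, and TRAK, TRAK_LIST, find_or_add and get_next are
names invented for this sketch, not the structures used in transport.c.

```c
#include <assert.h>

#define MAX_TRAK 8

typedef struct {
    int logo;     /* simplified: one integer stands in for a MSG_LOGO    */
    int next;     /* index of the next message not yet examined          */
    int active;   /* 0 until a message with this logo has been found     */
} TRAK;

typedef struct {
    TRAK trak[MAX_TRAK];
    int  ntrak;
} TRAK_LIST;

/* Find the tracker for logo, entering it in the list if it is missing */
static TRAK *find_or_add(TRAK_LIST *list, int logo)
{
    int i;

    for (i = 0; i < list->ntrak; i++)
        if (list->trak[i].logo == logo)
            return &list->trak[i];

    assert(list->ntrak < MAX_TRAK);
    list->trak[list->ntrak].logo   = logo;
    list->trak[list->ntrak].next   = 0;
    list->trak[list->ntrak].active = 0;
    return &list->trak[list->ntrak++];
}

/* Look for the next message with the requested logo.  Returns its index,
   or -1 if there is none (the GET_NONE case).  *examined reports how
   many messages were inspected on this call. */
int get_next(TRAK_LIST *list, const int *msgs, int nmsgs,
             int logo, int *examined)
{
    TRAK *t = find_or_add(list, logo);  /* register before searching */

    *examined = 0;
    while (t->next < nmsgs) {
        int i = t->next++;              /* position is recorded even on a miss */
        (*examined)++;
        if (msgs[i] == logo) {
            t->active = 1;              /* sequence tracking would start here */
            return i;
        }
    }
    return -1;
}
```

In this model, a first call for a logo that never appears inspects every
message, and a second call inspects none, which is the behavior the fix is
after.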


4.9   Variable name changed to allow use of C++ compilers.
Problem: We had used "class" as the variable name for the installation in 
         the MSG_LOGO structure.  However, "class" is also a keyword in C++,
         so if you want to use a C++ compiler, you cannot use "class" as
         a variable name. 

The Fix: Change all references to "class" to "instid" to allow this software
         to be compiled with a C++ compiler.
         Changes made in transport.c and transport.h on 3/13/97 by Lynn Dietz


4.10  Semaphore operations problem in tport_putmsg and tport_copyto (Solaris).
Symptom: All modules attached to a given transport ring (running on a Solaris 
         system) suddenly die with a message like: 
           "ERROR: tport_getmsg; keyget not at FIRST_BYTE, Region xxxx"
         This message implies that the transport ring is corrupted. The symptom 
         was first noticed when Doug Neuhauser ran his transport-based UCB code 
         on a dual-processor Ultra.  A dual-processor X86 Solaris machine has 
         also exhibited this symptom while running Earthworm v3.1 code.  

Problem: Many thanks go to Doug Neuhauser for tracking down the bug!  In both 
         tport_putmsg and tport_copyto, the structure sembuf sops, used as
         an argument to the semaphore operation function semop(), had been 
         declared as a static struct. In multi-threaded code, you can have two 
         simultaneous invocations of tport_putmsg(), e.g. one for a heartbeat 
         and one for data.  Each one will overwrite the values of sops for the
         other thread. This bug shows up readily on a multi-processor machine
         on which two threads can really run simultaneously. It could also 
         presumably occur on a single-processor machine, but we have not
         experienced it yet. This bug can manifest itself with these symptoms:
         1) a corrupted transport ring, from both threads writing to the 
            ring at the same time, and 
         2) deadlock, where both threads are waiting for the semaphore.

The Fix: Remove "static" from the declaration of "struct sembuf sops;" in 
         tport_putmsg and tport_copyto.  Also, pull the initialization of 
         sops structure members out of one-time-only initialization loops.
         Changes made in solaris/transport.c on 4/24/98 by Lynn Dietz
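
A minimal sketch of the pattern described in the fix, assuming a semaphore
set with a single semaphore at index 0.  The function sem_change is a
hypothetical helper, not a routine from transport.c, and the flag value of 0
is an assumption.  The point is only that sops has automatic storage and
that its members are set on every call, so each thread works on its own
copy.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Apply delta to semaphore 0 of the set semid: -1 to take the
   semaphore, +1 to release it.  Returns 0 on success, -1 on error. */
int sem_change(int semid, short delta)
{
    struct sembuf sops;      /* automatic, NOT static: private to this call */

    assert(delta == 1 || delta == -1);

    /* Members are initialized on every call, not once at startup */
    sops.sem_num = 0;
    sops.sem_op  = delta;
    sops.sem_flg = 0;        /* assumed: block until the operation succeeds */

    return semop(semid, &sops, 1);
}
```

With a static sops, two threads calling this function at the same moment
could each see the other's sem_op value, which is the failure described
in the Problem paragraph above.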

     
For more information contact:  Lynn Dietz
                               dietz@andreas.wr.usgs.gov
                               415-329-5520