The main idea behind the Event Transfer System software is to create an extremely fast method of transferring events from process to process. An event is simply a memory buffer that can be filled with any data users wants to share with each other.
In a nutshell, the ET system consists of a single process which memory maps a file into its memory space. This mapped memory contains all events as well as other important data. The number and size of the events are set by the ET system creator with all events being the same size. Users of the ET system that are on the same node as the ET system will transparently map the same memory which allows for quick communication between processes and forms the foundation of the event transfer system. Users of the ET system on different nodes will transparently communicate with the system over the network using TCP/IP as the system has a server built in. The ET system is responsible for the transfer of all events from user to user, or more accurately, from station to station.
The ET system consists of stations, each of which are essentially two lists: 1) an input list of events to be read, and 2) an output list of events that have been read and are ready to be sent to the next station. These stations, in turn, are themselves arranged into an ordered list. Events pass from station to station until they reach the last station in the list and are then returned to the first station.
The first station is special and for lack of a better name is called, GRAND_CENTRAL station. It is a repository of unused events which it gives to event producers who ask for them. This station is created automatically when starting up an ET system. All other stations are created by users and may be placed in any order after GRAND_CENTRAL.
User processes can use functions from an ET system library to connect to (open) the ET system. Once open, the user can proceed to create stations and then make attachments to those or other stations. Once attached to a particular station, one can read events from and write events to it. The above steps can also be reversed by detaching from stations, removing stations, and closing the ET system in that order.
In the process of reading or "getting" an event, the user grabs one from a station's input list and similarly, in the process of writing or "putting" it, the user places it into the station's output list. All output lists have enough space to contain all events in the system. Thus, a user can always put events since there will always be room. However, reading events from a station may block if there are no events to be read.
In the following document, processes which write data into event buffers thereby creating data are called producers, while processes which are interested in reading, analyzing, and even modifying data produced by others are called consumers.
Take a closer look by examining the figure below. It shows the flow of events with everything occurring completely in the ET system process. One advantage of doing things this way is that crashed user processes will not affect the flow of events, avoiding bringing the whole system to a grinding halt.
The way that this is accomplished is that ET is multithreaded. Each station has its own event transfer thread - or conductor - which is waiting for output events. When an event is written, it wakes up the conductor which reads all events in the list, determines which events go where, and writes them in blocks to each station. The conductor also releases the specially allocated memory associated with temporary events (more on temp events later).
The use of threads have made complete error recovery possible 99.9% of the time. The system and user processes each have a thread which sends out a heartbeat (increments an integer in shared memory). The system monitors each process and each process monitors the system in yet another thread. If the system dies, user processes automatically return from any function calls that are currently pending and can make a function call to find out if the system is still alive or can wait until it resurrects. Likewise, if a user's heartbeat stops, the system kills the user and erases any trace of it from the system. All events tied up by the dead user process are returned to the system. Users can tell a station to take those events and send them to either: 1) the station's input list, 2) the station's output list, or 3) Grand_Central station (essentially dumping them).
Safety features include tracking an event's owner - the process that currently has control over it. Keeping tabs on who has an event prevents the user from writing the same event twice or writing events into the system which it doesn't own and thereby creating serious problems.
Temporary events are called such because occasionally, a user will need an event to hold a large amount of data - larger than the space that was allocated for an event when the ET system was started and the event size was determined. In such cases, a request for a large event will cause a file to be memory mapped with all the requested space. When all users are done with it, this temporary event will be disposed of - freeing up its memory. This is all transparent to the user.
Events can be either high or low priority. High priority events that are placed into the system are always placed at the head of stations' input and output lists. That is, they are placed below other high priority, but above all the low priority items.
The ET system consists of one process and allows no environmental variables to affect its behavior. In addition, there are no global or static variables in the code, making it reentrant. This allows one to use more than one ET system at the same time. Multiple systems peacefully coexist.
Currently ET systems will run on the Linux, Solaris, and MacOS operating systems.
From start to finish, events flow something like this. An ET system is started up with a unique filename to identify it. Once a system exists, a process can open the system which maps ET memory into its own space or talks over the network to its server. At this point the user can begin to use it.
To do anything interesting, a user must attach to a station that it created or that already exists and receive a unique identifier which it can then use to read and write events. They can be read or written either singly or in blocks (i.e. arrays).
A process can attach to many different stations, and it will receive a unique identifier for each station that it attaches to. Processes which wish to be producers can do so by attaching to GRAND_CENTRAL and requesting new events. Alternatively, any attached processes can request new events and write them into their own stations.
After attaching to a station, one can also detach from that station. This is a necessary prerequisite - all attachments must be detached - should one want to remove a station. GRAND_CENTRAL station (the first and automatically created station) can never be removed.
Should the ET system ever die, this can be detected. It is also possible to wait until the system restarts by calling a single routine.