Remote ET

Remote Node Operation Overview

It is possible to have an ET system on one machine and its consumers on another (called remote consumers). Remote consumers can call all the routines that local ones can. Of course, the speed of transferring events over the network is quite a bit slower than the speed of accessing shared memory.

Each ET system has a built-in server: multiple threads in the ET system's process make the system accessible from another computer. One or more of these threads respond to the UDP packets sent by remote consumers that are trying to find an ET system of a particular name somewhere on the network, usually by broadcasting or multicasting. There is one thread listening on each local subnet broadcast address, one thread on each multicast address used, and one thread on each local network interface address. The response contains the port number of the socket that the TCP server thread is listening on, its host's name, and all of its IP addresses (among other things). This TCP server thread, once connected with a consumer, handles all receiving and sending of events and other information with that consumer.

When an ET system is started up, its configuration can be set by using the et_system_config_... set of routines. A call to et_system_config_setserverport sets the port number of the ET system's TCP server thread in that particular configuration. If the port is unavailable when actually starting the ET system by a call to et_system_start using that same configuration, the process will exit with an error message. Note that if the server port is not explicitly set in this way, then by default ET_SERVER_PORT (defined as 11111 in et.h) is used as the port number. Similarly, a call to et_system_config_setport sets the port number of the threads listening for the UDP packets.

System Discovery

In order to have things work seamlessly, the user needs to make some decisions. First of all, the decision needs to be made whether consumers connect to ET systems using a direct TCP connection by specifying host and port, or possibly by using broadcasting and/or multicasting. When a consumer does not know the host, broadcasting and/or multicasting is the only option. The response to the UDP packet contains the port number and host name of the TCP server thread to which a connection may be made.

Direct Connection

There are times when using either broadcasting or multicasting is inconvenient or impossible. For example, if an ET system and a consumer are on different subnets, broadcasting from one to the other is stopped by any routers unless such are reprogrammed to allow broadcasting to get through - a hassle in any case. In situations such as these, a direct connection can be made.

Remote consumers need to know the TCP server's port number and the host name that the ET system resides on. Then using et_open_config_setserverport the port can be set, using et_open_config_sethost the host can be set, and using et_open_config_setcast a direct connection can be specified with ET_DIRECT.

Broadcasting

Broadcasting is done to IP addresses which in dotted-decimal form (e.g. 128.7.6.35) can be represented as {netid, subnetid, hostid}. The only type of broadcast address used in ET systems is subnet-directed and is of the form {netid, subnetid, -1} where -1 simply means that that part of the address is composed of all 1's in binary. For example, if 128.7.6 is the subnet with a mask of 255.255.255.0, then 128.7.6.255 is the broadcast address for that subnet. A broadcast will be received by all machines on that particular subnet. You may find the broadcast address(es) of your subnet(s) by using the command "ifconfig -a".
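The subnet-directed broadcast address can be computed from any host address and its subnet mask by keeping the network bits and setting every host bit to 1. The following is a minimal C sketch of that arithmetic only. The helper names ipv4_pack, subnet_broadcast, and ipv4_format are hypothetical and are not part of the ET library.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helpers for illustration; none of these are ET library calls. */

/* Pack four dotted-decimal octets into a 32-bit value, most significant octet first. */
uint32_t ipv4_pack(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | (uint32_t)d;
}

/* Subnet-directed broadcast address: keep the network bits, set all host bits to 1. */
uint32_t subnet_broadcast(uint32_t addr, uint32_t mask)
{
    return (addr & mask) | ~mask;
}

/* Write addr in dotted-decimal form into buf (16 bytes is always enough). */
void ipv4_format(uint32_t addr, char *buf, size_t len)
{
    snprintf(buf, len, "%u.%u.%u.%u",
             (unsigned)((addr >> 24) & 0xFF), (unsigned)((addr >> 16) & 0xFF),
             (unsigned)((addr >> 8) & 0xFF), (unsigned)(addr & 0xFF));
}
```

For the host 128.7.6.35 with a mask of 255.255.255.0, subnet_broadcast gives 128.7.6.255.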

An ET system automatically responds to broadcasts on all its local subnets and no configuration is necessary or possible.

An ET consumer, by default, broadcasts on all its local subnets to find ET systems. Broadcasting can also be selected explicitly by using the et_open_config_setcast routine to set the configuration to ET_BROADCAST or ET_BROADANDMULTICAST. Call et_open_config_addbroadcast to add a specific broadcast address to the list of active broadcast addresses. Use it with a value of ET_SUBNET_ALL to add all the local broadcast addresses to the list (the default, remember), and use it with the value ET_SUBNET_DEFAULT to add the broadcast address associated with the hostname returned as a result of executing "uname". Likewise, call et_open_config_removebroadcast to remove addresses from the active list, with ET_SUBNET_ALL removing all broadcast addresses.

Multicasting

In multicasting, a consumer sends out a packet to a special multicast IP address. The listeners (ET systems) sign up to receive any packets sent to that address, and only computers hosting such listeners will receive the packets - not all machines on the subnet as is the case in broadcasting. Multicasting has the ability to go beyond the local subnet and thus is more flexible than broadcasting. The following table lists all available multicast addresses as well as "TTL" values (reproduced from Unix Network Programming, Volume 1 by Richard Stevens):

Scope	IPv6 Scope	IPv4 TTL Scope	IPv4 Administrative Scope
node-local	1	0
link-local	2	1	224.0.0.0 to 224.0.0.255
site-local	5	<32	239.255.0.0 to 239.255.255.255
organization-local	8		239.192.0.0 to 239.195.255.255
global	14	<255	224.0.1.0 to 238.255.255.255

Although this author is NOT an expert ..., the use of TTL values and ranges of addresses is meant to set the range or the scope of the multicasts. Setting the TTL value for scoping is accepted and even recommended practice, with a default value of one meaning the local subnet only. However, administrative scoping is preferred when possible. The range 239.0.0.0 to 239.255.255.255 is the administratively scoped IPv4 multicast space. "Addresses in this range are assigned locally by an organization but are not guaranteed to be unique across organizational boundaries. An organization must configure its boundary routers (multicast routers at the boundary of the organization) not to forward multicast packets destined to any of these addresses".

In short, pick an address between 239.0.0.0 and 239.255.255.255 for use at one particular site. If this is confusing, talk to your system administrator and ask for a safe multicast address for your use. The default TTL value used in ET is 32 while the default multicast address is ET_MULTICAST_ADDR which is defined in et.h as 239.200.0.0.
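Before handing an address to the multicast configuration routines, it can be useful to check that it really is a multicast address and that it falls inside the administratively scoped block 239.0.0.0 to 239.255.255.255. The sketch below does this with the POSIX inet_pton call. The function names parse_ipv4, is_ipv4_multicast, and is_admin_scoped_multicast are hypothetical helpers written for this illustration, not ET routines.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical helpers for illustration; none of these are ET library calls. */

/* Parse dotted-decimal text; returns 1 and stores the host-order value on success. */
int parse_ipv4(const char *text, uint32_t *out)
{
    struct in_addr a;
    if (inet_pton(AF_INET, text, &a) != 1)
        return 0;
    *out = ntohl(a.s_addr);
    return 1;
}

/* 1 if text is any IPv4 multicast address (224.0.0.0 to 239.255.255.255), else 0. */
int is_ipv4_multicast(const char *text)
{
    uint32_t v;
    return parse_ipv4(text, &v) && (v & 0xF0000000u) == 0xE0000000u;
}

/* 1 if text lies in the administratively scoped block (239.0.0.0 to 239.255.255.255), else 0. */
int is_admin_scoped_multicast(const char *text)
{
    uint32_t v;
    return parse_ipv4(text, &v) && (v & 0xFF000000u) == 0xEF000000u;
}
```

With these helpers, the default ET multicast address 239.200.0.0 passes both checks, while 224.0.0.1 is multicast but not administratively scoped.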

An ET system can respond to multicasts on up to ET_MAXADDRESSES (defined in et_private.h as 10) multicast addresses. Add an address with et_system_config_addmulticast and remove it with et_system_config_removemulticast.

An ET consumer can multicast by using the et_open_config_setcast routine to set the configuration to a setting of ET_MULTICAST or ET_BROADANDMULTICAST. Call et_open_config_addmulticast to add a specific multicast address to the list of active addresses. Likewise, call et_open_config_removemulticast to remove addresses from the list. Use et_open_config_setTTL to set the TTL value of the multicast.

Both broadcasting and multicasting may be done simultaneously by specifying ET_BROADANDMULTICAST as an argument for et_open_config_setcast.

Port Selection for Broad/Multicasting

In addition to choosing broadcasting and/or multicasting and choosing the address, the user must also choose the port number for these communications. The Internet Assigned Numbers Authority (IANA) states that the port numbers from 0 to 1023 are controlled and assigned by the IANA. Thus, these are unavailable. The ports 1024 to 49151 are NOT controlled by the IANA and are available for use, but the IANA registers and lists the uses of these ports as a convenience to the internet community. For example, ports 6000 to 6063 are assigned for an X window server for both TCP and UDP. Generally, the higher numbered ports are less likely to be used. Finally, ports 49152 to 65535 are called dynamic or private or ephemeral ports. The IANA says nothing about these.
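The three port ranges can be captured in a small helper that a program might use to sanity-check a port before configuring it. This is only a sketch of the ranges listed above. The enum port_class and the function classify_port are hypothetical names and do not exist in the ET library.

```c
/* Hypothetical names for illustration; not part of the ET library. */
enum port_class {
    PORT_INVALID,     /* outside 0..65535 */
    PORT_WELL_KNOWN,  /* 0..1023, controlled and assigned by the IANA */
    PORT_REGISTERED,  /* 1024..49151, listed by the IANA but available for use */
    PORT_DYNAMIC      /* 49152..65535, dynamic, private, or ephemeral */
};

/* Return the range that a candidate port number falls into. */
enum port_class classify_port(long port)
{
    if (port < 0 || port > 65535)
        return PORT_INVALID;
    if (port <= 1023)
        return PORT_WELL_KNOWN;
    if (port <= 49151)
        return PORT_REGISTERED;
    return PORT_DYNAMIC;
}
```

The ET default of 11111 falls in the registered range, as does the X window server example of 6000.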

Use the routine et_system_config_setport to configure an ET system to listen for broad/multicasts on a particular port. Use et_open_config_setport to configure a consumer to send broadcasts and et_open_config_setmultiport to send multicasts on a particular port. The port numbers used by the consumer must be the same as those used by the ET system for things to work. By default, if not set explicitly, they are set to ET_BROADCAST_PORT and ET_MULTICAST_PORT respectively (both defined as 11111 in et.h).

Defaults & Macros

When defining a configuration to use in opening an ET system, the defaults are to use broadcasting only to port ET_BROADCAST_PORT (defined as 11111 in et.h) on all local subnet addresses. If the automatic finding of subnets fails, a value of ET_BROADCAST_ADDR is used (defined as "129.57.29.255" in et.h - the author's personal subnet). The macro ET_MULTICAST_PORT is also similarly defined to be 11111, while the macro ET_MULTICAST_ADDR is defined to be "239.200.0.0". The value of ET_MULTICAST_TTL is 32. All of these macros are only defined for the users' convenience.

Examples

When setting up an ET system, very little needs to be done to allow it to be discovered by broadcasting consumers:

et_sys_id id;
et_sysconfig config;

/* initialize configuration */
et_system_config_init(&config);
/* listen to broadcasts by default */
/* start ET system */
et_system_start(&id, config);
/* release configuration's allocated memory */
et_system_config_destroy(config);

When setting up an ET system for both broadcasting and multicasting, try the following:

et_sys_id id;
et_sysconfig config;

et_system_config_init(&config);
/* already listening for broadcasts */
/* listen for multicasts to these 2 addresses */
et_system_config_addmulticast(config, ET_MULTICAST_ADDR);
et_system_config_addmulticast(config, "239.111.222.0");
et_system_start(&id, config);
et_system_config_destroy(config);

When setting up an ET system with specified ports, try the following:

et_sys_id id;
et_sysconfig config;

et_system_config_init(&config);
/* remote users broad/multicast to this port */
et_system_config_setport(config, ET_BROADCAST_PORT);
/* set port of tcp server thread */
et_system_config_setserverport(config, 11222);
et_system_start(&id, config);
et_system_config_destroy(config);

When setting up a consumer to open an ET system on an unknown host which may be anywhere (local or remote), and it's trying to find that system using broadcasting, then include the following code:

et_sys_id id;
et_openconfig config;

et_open_config_init(&config);
/* broadcasting by default */
/* ET is on an unknown host */
et_open_config_sethost(config, ET_HOST_ANYWHERE);
et_open(&id, "et_name", config);
et_open_config_destroy(config);

When setting up a consumer that knows the ET system is on a different host, and is trying to find it using multicasting on port ET_MULTICAST_PORT at address ET_MULTICAST_ADDR, then include the following code:

et_sys_id id;
et_openconfig config;

et_open_config_init(&config);
/* ET is remote */
et_open_config_sethost(config, ET_HOST_REMOTE);
/* use multicast to find ET system */
et_open_config_setcast(config, ET_MULTICAST);
/* remote users multicast to this port */
et_open_config_setmultiport(config, ET_MULTICAST_PORT);
/* remote users multicast to this address */
et_open_config_addmulticast(config, ET_MULTICAST_ADDR);
et_open(&id, "et_name", config);
et_open_config_destroy(config);

When setting up a consumer that knows the name of the host running the ET system (ethost.mylab.org) but nothing else, and is trying to find that system using both broadcasting and multicasting at address 239.235.89.12, then include the following code:

et_sys_id id;
et_openconfig config;

et_open_config_init(&config);
/* ET is running on ethost.mylab.org */
et_open_config_sethost(config, "ethost.mylab.org");
/* use broad and multicasting to find ET system */
et_open_config_setcast(config, ET_BROADANDMULTICAST);
/* remote users multicast to this address */
et_open_config_addmulticast(config, "239.235.89.12");
et_open(&id, "et_name", config);
et_open_config_destroy(config);

When setting up a consumer to open an ET system on a known host (129.182.54.67), and trying to directly connect to it on server port 12345 (bypassing all UDP communications), then include the following code:

et_sys_id id;
et_openconfig config;

et_open_config_init(&config);
/* ET is on 129.182.54.67 */
et_open_config_sethost(config, "129.182.54.67");
/* use a direct connection to the ET system */
et_open_config_setcast(config, ET_DIRECT);
/* ET system's server is on this port */
et_open_config_setserverport(config, 12345);
et_open(&id, "et_name", config);
et_open_config_destroy(config);

Network Interface Selection

There are occasions when the ET consumer wants to select which network interfaces it wants to use when communicating with the ET system. Often hosts have multiple interfaces – perhaps on different subnets and with different speeds. It is not unusual that a slower interface is used for control information while a faster one is used for data transfer. The two parts to this problem that must be considered are the ET system interfaces and those on the consumer.

ET System’s Network Interfaces

Linux and MacOS, and to a lesser extent Solaris, have what is called a “Weak End” model of network communication. This means all of a host’s IP addresses are considered to belong to the host in general and not to a particular network interface. This can create a problem with ARP tables – tables which associate a specific IP address with a specific MAC hardware address. When an ARP request gets sent out, by default on Linux, a particular interface may respond with all the IP addresses on its host and the ARP table may end up with the interface’s MAC address associated with an incorrect IP address. Thus a TCP packet may arrive at the correct host but on an incorrect network interface – one associated with a different IP address. What happens at this point is that Linux merely forwards the packet to the socket even though, strictly speaking, it came in the “wrong” interface.

How could this affect an ET system and its consumers? Say an ET system exists on a host with 2 interfaces, one fast and the other slow. It is possible that a consumer would select the host it wants to connect to by specifying the IP address of the fast interface (using et_open_config_sethost). Because of the ARP table’s incorrect mapping, the ET consumer’s TCP packets would end up being delivered to the slow interface on that host. They would still reach their intended destination but over the slow network connection.

The correction for this problem is fairly simple. It’s possible to correct the ARP table (even across reboots) by making the following changes in Linux to the /etc/sysctl.conf file. Simply add the following 2 lines and reboot:

# Allow ARP reply only if the target IP address is local address configured on the incoming interface

net.ipv4.conf.default.arp_ignore = 1

To make the change without rebooting, in a console, type similar lines for each network interface:

% sysctl net.ipv4.conf.eth0.arp_ignore=1

% ifconfig eth0 down

% ifconfig eth0 up

Once the ARP table is correct, TCP packets will be delivered to the correct interface.

ET Consumer’s Network Interfaces

Specifying which network interface gets used by a consumer’s host to communicate with the ET system is simple. Before opening the ET system, call et_open_config_setinterface and supply it with a dotted-decimal format IP address associated with the interface of interest. If no interface is specified, the operating system makes the choice.

Remote Programming Details

Errors

As mentioned previously, ET_ERROR_NOREMOTE is the error returned when calling a routine which is not supported for remote use. Currently, however, there are no routines which return this error. Some remote user errors are given by ET_ERROR_REMOTE - those errors which are unique to a remote user and do not occur locally. In practice, this error is returned when memory cannot be allocated by the remote end. If there are errors in reading or writing over the network, the errors generated will be ET_ERROR_READ or ET_ERROR_WRITE.

Remote Behavior on a Local Host

It is possible to tell consumers to run the code that a remote consumer runs even if it is running on the same computer as the ET system. In this case, all communication with the ET system is done through sockets with no usage of the shared memory. This is done by calling et_open_config_setmode with the ET_HOST_AS_REMOTE option. The default mode is ET_HOST_AS_LOCAL.

Modifying Events

After opening an ET system, creating a station, and attaching to it, users are ready to start reading events. There are a few details to keep in mind when doing so remotely.

Remote users can gain quite a bit of efficiency by minimizing communication with the ET system. The minimizing of communication is done transparently and is the default mode of operation. That is, when a remote user calls et_event(s)_get, the ET system copies the events and sends them over the network to the user but also immediately puts them back into the ET system with a call to et_event(s)_put. There may be times, however, when a user first wishes to modify the events and then send them back over the network to the ET system. To aid in this effort an extra flag is introduced, ET_MODIFY. By ORing this flag to ET_SLEEP, ET_TIMED, or ET_ASYNC, the user announces an intention to modify the requested event. Thus, when the ET server initially gets the event for the remote user, it does NOT put it back into the ET system immediately afterwards. It waits until the user has called et_event(s)_put before doing that. Without this flag, the server puts the events back into the ET system immediately.

There may be occasions when the remote user doesn't want to modify the data but only the header information such as the priority, control words, and such. In that case it makes no sense to send all the data back to the ET system when putting the event back. By using the flag ET_MODIFY_HEADER instead of ET_MODIFY, only the header information will be sent back - speeding up communication greatly.

Multi-Threading

If a remote consumer is a multi-threaded program, no special precautions are necessary as the ET library is thread-safe. However, if more than one thread uses the same ET system id (et_open called only once), there will be a bottleneck as only one remote ET library function call at a time can be made. To avoid this problem, each thread that wants access to the ET system needs to do its own et_open and thus communicate on its own socket to its own server thread. This should speed things up.

Swapping Data

Transferring data between machines where one is big endian (the most significant byte is placed in the lowest memory address) and the other is little endian (the least significant byte is placed in the lowest memory address) requires the data to be "swapped". Since in general a user may not be knowledgeable about the machine on which a particular event was originally produced, a simple call to the function et_event_needtoswap(et_event *pe, int *swap) will reveal whether the data needs to be swapped or not. If the return value placed in swap is ET_NOSWAP, no swapping is necessary; however, if the return value is ET_SWAP, then the opposite is true.
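To make the big endian and little endian definitions concrete, the sketch below detects the local host's byte order by looking at the byte stored at the lowest address of a known 32-bit value. It illustrates the concept only and is not how the ET library itself decides. The names byte_order, host_byte_order, and needs_swap are hypothetical and deliberately distinct from the ET_ENDIAN_* and ET_SWAP macros in et.h.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical labels for this sketch; these are NOT the ET_ENDIAN_* macros. */
enum byte_order { ORDER_BIG, ORDER_LITTLE };

/* Detect the local byte order from the lowest-addressed byte of a known value. */
enum byte_order host_byte_order(void)
{
    const uint32_t probe = 0x01020304u;
    unsigned char first;

    memcpy(&first, &probe, 1);
    /* Big endian stores the most significant byte (0x01) at the lowest address. */
    return (first == 0x01) ? ORDER_BIG : ORDER_LITTLE;
}

/* Data needs swapping exactly when it was written in the other byte order. */
int needs_swap(enum byte_order data_order, enum byte_order local_order)
{
    return data_order != local_order;
}
```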

The ET system automatically keeps track of the endianness of an event's data. However, the user may want to forcibly set the data's endianness for some reason. In that case, a call to et_event_setendian(et_event *pe, int endian) can be made. The endianness can be set to ET_ENDIAN_BIG, ET_ENDIAN_LITTLE, ET_ENDIAN_LOCAL (same endian as local host), ET_ENDIAN_NOTLOCAL (opposite endian as local host), or ET_ENDIAN_SWITCH (switch the endian from whatever it is). This routine does NOT swap the data but simply keeps track of the data's endianness in the event's header. A user may also read the endianness of an event's data by a call to et_event_getendian(et_event *pe, int *endian). It returns either ET_ENDIAN_BIG or ET_ENDIAN_LITTLE.

The routine et_event_CODAswap(et_event *pe) was provided for those who need to swap data in CODA format. However, it cannot swap all types of CODA anymore; thus, it has been deprecated. To do a complete swap use the routine provided in the evio library, evioswap().

Users of data formats other than CODA format must provide their own swapping routines.
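As a starting point for such a routine, the sketch below swaps a buffer that is assumed to consist entirely of 32-bit words. That assumption is for illustration only: a real data format with mixed field widths needs to be swapped field by field. The functions swap32 and swap32_buffer are hypothetical helpers, not ET or evio library calls.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of the ET or evio libraries. */

/* Reverse the byte order of one 32-bit word. */
uint32_t swap32(uint32_t w)
{
    return ((w & 0x000000FFu) << 24) | ((w & 0x0000FF00u) << 8) |
           ((w & 0x00FF0000u) >> 8)  | ((w & 0xFF000000u) >> 24);
}

/* Swap nwords 32-bit words from src into dest; src and dest may be the same buffer. */
void swap32_buffer(const uint32_t *src, uint32_t *dest, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        dest[i] = swap32(src[i]);
}
```

Because each word is read before it is written, the routine works in place, which matters if it is later wrapped in a function that may receive the same event as both source and destination.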

Another routine of interest is et_system_getlocality(et_sys_id id, int *locality). This returns the value ET_REMOTE in the variable locality if the ET system is remote, ET_LOCAL if it is local, and ET_LOCAL_NOSHARE if it is local but is using an operating system which does not allow sharing of pthread mutexes across processes (e.g. Linux).

Transferring Events Between Two ET Systems

While it is certainly possible for a user to copy events from one ET system and place them in another with "normal" ET function calls, the ET system provides a more efficient way to do this. By using ET's bridging software, unnecessary copying of the data may be eliminated from the procedure. Regardless of whether the ET systems are on the same or different computers, or whether the process running the bridging routine is on one or the other or on yet a third machine, the transfer should take place smoothly. It will save time except perhaps when both ET systems and the bridging process are on the same machine, in which case only a single copy of the data is made - no different than when using the "normal" ET function calls. A call to the following function will take care of all the details:

et_events_bridge(et_sys_id id_from, et_sys_id id_to, et_att_id att_from, et_att_id att_to, int num, int *ntransferred, et_bridgeconfig bconfig).

The arguments are respectively: the ID of the ET system from which the events are copied, the ID of the ET system to which the events are going, the attachment to a station on the "from" ET system, the attachment to a station on the "to" ET system (usually an attachment to GRAND_CENTRAL), the total number of events desired to be transferred, the total number of events that were actually transferred at the routine's return, and a configuration argument that will be described shortly. The configuration argument may be NULL in which case defaults are used.

The configuration for bridging events is very similar to the configuration for opening a system or creating a system. There are a number of functions used to create and define the config argument. It is initialized by a call to et_bridge_config_init (et_bridgeconfig *config). When the user is finished using the configuration, et_bridge_config_destroy (et_bridgeconfig config) must be called in order to properly release all memory used.

After initialization, calls can be made to functions which set various properties of the specific configuration. Calls to these setting functions will fail unless the configuration is first initialized. The functions used to SET these properties are listed below along with an explanation for each:

1. et_bridge_config_setmodefrom(et_bridgeconfig config, int val) : setting val to ET_SLEEP, ET_TIMED, or ET_ASYNC determines the mode of getting events from the "from" ET system. The default is ET_SLEEP.

2. et_bridge_config_setmodeto(et_bridgeconfig config, int val) : setting val to ET_SLEEP, ET_TIMED, or ET_ASYNC determines the mode of getting new events from the "to" ET system. The default is ET_SLEEP.

3. et_bridge_config_setchunkfrom(et_bridgeconfig config, int val) : setting val sets the maximum number of events to get from the "from" ET system in a single call to et_events_get - the chunk size if you will. The default is 100.

4. et_bridge_config_setchunkto(et_bridgeconfig config, int val) : setting val sets the maximum number of new events to get from the "to" ET system in a single call to et_events_new - the chunk size if you will. The default is 100.

5. et_bridge_config_settimeoutfrom(et_bridgeconfig config, struct timespec val) : setting val sets the time to wait for the "from" ET system when the mode is set to ET_TIMED. The default is 0 sec.

6. et_bridge_config_settimeoutto(et_bridgeconfig config, struct timespec val) : setting val sets the time to wait for the "to" ET system when the mode is set to ET_TIMED. The default is 0 sec.

7. et_bridge_config_setfunc(et_bridgeconfig config, ET_SWAP_FUNCPTR func) : setting func to a function pointer (function name) means that the function will be called to swap data whenever it's determined to be necessary. Using this feature is a convenient way of swapping data while it's being moved from one ET system to another with no intervention from the user needed. The function must be of the form: int func(et_event *src, et_event *dest, int bytes, int same_endian). It returns ET_OK if successful, otherwise ET_ERROR. The arguments consist of: src which is a pointer to the event whose data is to be swapped, dest which is a pointer to the event where the swapped data goes, bytes which tells the length of the data in bytes, and same_endian which is a flag equaling one if the machine and the data are of the same endian and zero otherwise. This function must be able to work with src and dest being the same event. With this as a prototype, the user can write a routine which swaps data in the appropriate manner. Notice that the first two arguments are pointers to events and not data buffers. This allows the writer of such a routine to have access to any of the event's header information. In general, such functions should NOT call et_event_setendian in order to change the registered endian value of the data. This is already taken care of in et_events_bridge. The default is NULL which means no swapping is done.

8. DEPRECATED et_bridge_CODAswap(et_event *src, et_event *dest, int bytes, int same_endian) : this was a function that could be used in et_bridge_config_setfunc if the user wanted to swap CODA format data. Currently, however, it only returns an error as swapping of evio data must now be handled using the evio library.

There are corresponding et_bridge_config_get... functions to get the configuration values of everything except the swapping function.
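The two timeout setters in the list above take a struct timespec by value, which has to be filled in with separate seconds and nanoseconds fields. A small conversion helper can make this less error-prone. The following is a sketch only: timespec_from_ms is a hypothetical name, not an ET routine, and it assumes a non-negative timeout given in milliseconds.

```c
#include <time.h>

/* Hypothetical helper for illustration; not part of the ET library.
 * Converts a non-negative timeout in milliseconds to a struct timespec. */
struct timespec timespec_from_ms(long ms)
{
    struct timespec ts;

    ts.tv_sec  = ms / 1000;               /* whole seconds */
    ts.tv_nsec = (ms % 1000) * 1000000L;  /* remaining milliseconds as nanoseconds */
    return ts;
}
```

The result could then be passed as the val argument of et_bridge_config_settimeoutfrom or et_bridge_config_settimeoutto when the corresponding mode is ET_TIMED.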