CNetworkQueue - SW (Nov 2006 to Mar 2008)

RC3

Latest Changes and Updates

Introduction

The Network Queue is a middleware layer used to establish communication between two or more network devices. The main goal is to provide a fast and easy way to connect different embedded computing devices to each other or to a control system server.

Overview

The wide deployment of embedded systems with a direct Ethernet/IP interface at particle accelerators is ongoing. A defined way is needed to exchange information between such embedded systems and mature control system protocols. As earlier attempts to port control system protocol kernels directly to the embedded systems were ultimately unsuccessful, a defined, stable and portable way of exchanging information between full-featured control system servers and embedded systems was needed, with respect to endianness, RISC/CISC architecture, ease of use, portability and stability. It was agreed to implement a so-called Network Queue to transport "messages" from the local peer to the remote peer and vice versa. The following sections outline the basics of the Network Queue implementation.

The first planned use cases are Neutron Detector data transmission, the Klystron Interlock and the Mover/Rotational Steerer application. It should be fairly easy to adapt the source code to other use cases.

nq_overview_v1.gif

Architectural Overview

The image above shows the main outline of the Network Queue architecture. Dedicated tasks or threads (the model is chosen based on what the platform provides) are used for decoupled sending and receiving of network packets. Inside the send task, a logical message is extracted from the send queue, encoded into a network packet and sent to the remote peer via the UDP socket. Inside the receive task, a network packet is received if available, decoded into a logical message and put into the receive queue. The receive and send queues decouple the tasks from the main application. Event signalling on the queues is done via semaphores, so no additional CPU time is wasted on polling. The depth of these queues (like all other parameters) can be set in the platform-independent Declarations.h file.

Inside the main application, an instance of the main class, CNetworkQueue, is used to access the API of the Network Queue. This API consists of classes that contain methods to send a message, to receive a message, to construct a logical message, to apply net data to a logical message and to react to errors. In addition, a special synchronous blocking function is provided as an RPC call. On the remote peer side, the same source code is used as on the local peer side. This minimizes redundancy while increasing code stability and ease of maintenance.

The platform-independent source code is written such that it compiles warning-free (even at -Wall) and error-free on every supported platform. Platform dependencies are confined to the "wrapper files" Platform.cpp and Platform.h, which implement the required operating system functionality: socket handling, task resp. thread management, 'counting up to N' semaphores, mutexes, error handling, sleeping, data types and endianness converters.
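
As a rough illustration of the decoupled send path described above, the following sketch shows what the body of the send task could look like. All identifiers used here (m_sendSemaphore, m_sendQueue, encodeToPacket, m_socket, releaseMessage, NQPacket) are assumptions made for illustration only; the actual names live in the Network Queue sources.

    // Illustrative sketch only: identifiers are hypothetical, the real
    // implementation lives inside CNetworkQueue and Platform.cpp.
    void CNetworkQueue::sendTaskLoop()
    {
        for (;;)
        {
            // Block on the queue semaphore until a message is available,
            // so no CPU time is wasted on polling.
            m_sendSemaphore.wait();

            CNQMessage* pMsg = m_sendQueue.pop();   // extract logical message
            if (pMsg == 0)
                continue;

            NQPacket packet;
            pMsg->encodeToPacket(packet);           // marshal to little endian
            m_socket.sendTo(packet, pMsg->destinationPeer());

            releaseMessage(pMsg);                   // return instance to the pool
        }
    }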

Network transport, marshalling

On the network, the UDP packet always contains little endian data. While a logical message is encoded, packet marshalling is performed. Directly inside the receive thread at the remote peer, the packet is unmarshalled and the logical message is decoded. This way, marshalling and network transport are fully transparent to the user. During marshalling, the logical message is put into the physical packet layout (little endian) and wrapped by a packet header and packet tail. This wrapping content is checked on receipt at the remote peer to make sure only valid packets are delivered further. Packet filtering is implemented on the UDP sockets: a packet received from an unknown peer is discarded, and a packet addressed to an unknown peer is rejected before sending. Additional peers have to be registered with the Network Queue class instance. As an additional line of defence for the network transport, the marshalled header and tail allow a rough integrity check of each network packet.
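
A minimal sketch of what such a wrapped packet layout could look like is shown below. The field names, sizes and semantics are assumptions made for illustration only and do not describe the actual wire format.

    // Hypothetical on-the-wire layout; all fields are stored little endian.
    // Field names and their meaning are illustrative assumptions.
    struct NQPacketHeader
    {
        unsigned int   magic;        // fixed marker, checked on receive
        unsigned int   payloadSize;  // number of payload bytes that follow
        unsigned short messageType;  // e.g. data, ACK, request, response
        unsigned short sequence;     // used to match ACKs and responses
    };

    struct NQPacketTail
    {
        unsigned int checksum;       // rough integrity check of the payload
        unsigned int magic;          // fixed end marker, checked on receive
    };

    // On a big endian peer, every field is converted before it is put on the
    // wire, e.g. via an endianness changer from Platform.h:
    // header.payloadSize = toLittleEndian32(header.payloadSize);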

Class Layout

nq_classdiagram.png

Class (and Class Relationship) Overview

The user code interfaces with the Network Queue API through two classes: first CNQMessage, which abstracts message creation, encoding, marshalling, decoding and disposal; second CNetworkQueue, which is the core class of the Network Queue. CNQMessage and CNetworkQueue are friends, so they are able to access each other's private members, for performance reasons. The CNetworkQueue API consists of several parts:

The basic API provides functionality to open/create a local Network Queue endpoint and to cleanly close it. Moreover, methods are provided to add or remove a remote peer to/from the list of allowed peers. Only peers that have been registered in this list are able to talk to this local endpoint. A peer is defined by a unique combination of IPv4 address and port number. In addition, methods to query extended error information and statistical information are provided.
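
A usage sketch of this basic API is given below. The method names and signatures (open, addPeer, removePeer, close) are assumptions made for illustration; the authoritative names are found in the Doxygen API documentation.

    // Hypothetical usage of the basic API; method names are assumptions.
    #include "NetworkQueue.h"

    void basicLifecycleExample()
    {
        CNetworkQueue queue;

        // Create the local endpoint on a given UDP port.
        if (!queue.open(3050))
            return;                       // consult extended error info here

        // Register the only remote peers that are allowed to talk to us.
        queue.addPeer("192.168.1.20", 3050);
        queue.addPeer("192.168.1.21", 3050);

        // ... exchange messages ...

        // Unregister a peer and cleanly shut the endpoint down.
        queue.removePeer("192.168.1.21", 3050);
        queue.close();
    }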

Inside CNetworkQueue there exists a pool of a defined number of CNQMessage instances, which can be requested, referenced and finally put back into the pool. This functionality is provided by API methods. The purpose of this pool is to minimize dynamic memory allocation, saving CPU cycles and stack segment consumption on slow embedded devices. Using this part of the API, the user programmer can request a CNQMessage instance in order to fill it with the contents that need to be transferred to a remote peer. Afterwards, the CNQMessage instance can be handed to either the synchronous or the asynchronous API to be sent out. Note that when sending and receiving CNQMessage instances using the synchronous or asynchronous API methods, it is important to put these instances back into the pool after use; otherwise the pool may eventually run empty. The number of CNQMessage instances in the pool is adjustable at compile time by a constant inside NQDeclarations.h.
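
The intended request / fill / send / release cycle could look roughly like the following sketch; requestMessage, sendAsync, releaseMessage and the message setters are hypothetical names standing in for the real pool and send methods.

    // Hypothetical pool usage; the real method names may differ.
    CNQMessage* pMsg = queue.requestMessage();   // take an instance from the pool
    if (pMsg != 0)
    {
        pMsg->setDestination("192.168.1.20", 3050);
        pMsg->appendUInt32(42);                  // fill with user payload
        queue.sendAsync(pMsg);                   // hand over to the send queue
        queue.releaseMessage(pMsg);              // always put it back to the pool
    }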

The asynchronous API provides a method to send (fire and forget) an already constructed CNQMessage object. Moreover, a method is provided to probe (with immediate return) whether a new CNQMessage has been received from any remote peer. As a third method, one can check whether a new CNQMessage has arrived or will arrive soon from a remote peer (probe with timeout). The timeout blocking is depicted by a 'b' surrounded by a circle (see legend).
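
On the receiving side, the asynchronous methods might be used as in the sketch below; probe, probeWait, receive and readUInt32 are again hypothetical names for the methods described above.

    // Hypothetical asynchronous receive; method names are assumptions.
    if (queue.probeWait(100 /* ms timeout */))   // or queue.probe() for immediate return
    {
        CNQMessage* pMsg = queue.receive();      // fetch the decoded message
        if (pMsg != 0)
        {
            unsigned int value = pMsg->readUInt32();
            // ... handle the message contents ...
            queue.releaseMessage(pMsg);          // return the instance to the pool
        }
    }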

The synchronous API consists of two methods. The first one sends a user-defined CNQMessage and awaits an acknowledge packet (ACK) from the remote peer within a defined timeout period. Execution flow only continues once the ACK has been received successfully or the timeout has expired. Note that even if the ACK was not received locally, the original CNQMessage might have been received successfully at the remote end and only the ACK packet was lost somewhere. The second method of the synchronous API provides functionality to emulate something like a function call: one can send a 'request'-tagged CNQMessage and wait for a 'response'-tagged CNQMessage that was sent specifically as the answer to the original 'request' message. A timeout period must be specified at the function call so that execution is not blocked for an unlimited amount of time.
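
The two synchronous methods might be used roughly as follows; sendWithAck and call are hypothetical names for the ACK-confirmed send and the request/response style call.

    // Hypothetical synchronous API usage; names and signatures are assumptions.
    CNQMessage* pRequest = queue.requestMessage();
    // ... fill pRequest with the request contents ...

    // 1) Send and wait for the remote ACK (or the timeout).
    bool acked = queue.sendWithAck(pRequest, 500 /* ms timeout */);
    // acked only tells whether the ACK arrived in time, see the note above.

    // 2) Emulated function call: send a 'request'-tagged message and block
    //    until the matching 'response'-tagged message arrives or the timeout
    //    expires.
    CNQMessage* pResponse = queue.call(pRequest, 500 /* ms timeout */);
    if (pResponse != 0)
    {
        // ... evaluate the response ...
        queue.releaseMessage(pResponse);
    }
    queue.releaseMessage(pRequest);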

Inside the CNetworkQueue class, a couple of packet queues are used to decouple/buffer sending and receiving from calls to the API of the class. Inside the queues, CNQMessage instances are placed temporarily until they are required by the other end.

Before the contents of a CNQMessage are sent out, it is checked whether the destination is in the list of allowed peers, depicted by the CHK! markers in the diagram. After this check succeeds, the message is placed in the send queue. The decoupled send thread/task then sends out the accumulated packets in the send queue as fast as it can.

UDP packets are received inside the receive thread, where they are examined to verify that the sender (ip:port) is in the list of allowed peers; if it is not, the packet is discarded. Note that any packet which contains neither a validly encoded CNQMessage nor an ACK packet is also discarded. In the next two stages, the packet is examined to determine whether it is an ACK or a response awaited by a blocking call.

If either of these rules matches, the packet is treated specially and is not routed to the asynchronous receive packet queue. For an ACK, the ACK packet queue is examined and, if a matching blocking send() is found waiting, its semaphore is signalled; if nothing is found, a warning message is printed in debug mode and the ACK packet is discarded. For a so-called awaited response, the packet is placed as a CNQMessage instance into a dedicated queue and the semaphore of the blocking function call is signalled to release the block.
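
In code, the dispatch inside the receive thread could look roughly like the sketch below; all identifiers are illustrative assumptions and do not name the actual internals.

    // Illustrative receive dispatch; identifiers are hypothetical.
    if (!isAllowedPeer(senderIp, senderPort) || !isValidPacket(packet))
    {
        return;                                   // silently drop the packet
    }
    if (packet.isAck())
    {
        // Wake up a blocking send() waiting for this ACK, if there is one.
        if (!signalWaitingSender(packet.sequence()))
            debugWarning("unexpected ACK discarded");
    }
    else if (packet.isAwaitedResponse())
    {
        // Hand the response to the blocked request/response call.
        m_responseQueue.push(decode(packet));
        m_responseSemaphore.post();
    }
    else
    {
        // Regular message: decode and place into the asynchronous receive queue.
        m_receiveQueue.push(decode(packet));
        m_receiveSemaphore.post();
    }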

Directory Contents

Source code explanation

The source code is thoroughly commented inline. API documentation is directly incorporated into the appropriate source and header files using Doxygen commands. Regarding platform-specific documentation, Platform.cpp and Platform.h are reasonably documented inline. Note that detailed Doxygen documentation is only available for the API parts of the Network Queue.

Source code naming conventions

To make the structure and behaviour of the source code easier to learn and understand, certain conventions are followed throughout all source files:

Platform Specifics

Windows-platform

The implementation of the Network Queue started on this platform. During development, MS Visual C++ v6 SP6 was used. Threads, semaphores and mutexes are implemented using the raw Win32 API.

Linux-platform

The implementation of the Network Queue was done on a 64-bit platform (SL4). GCC v3.2.3 was used as compiler and linker. The POSIX API was used for threads (decoupled sending and receiving), semaphores and mutexes. The required "counting up to max" semaphore is implemented by hand because such an element is not directly supported by the POSIX API.
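
A hand-written "counting up to N" semaphore on top of a POSIX mutex and condition variable could look roughly like the following sketch; the class and member names are illustrative and are not the actual Platform.cpp code.

    // Hypothetical "counting up to N" semaphore built from POSIX primitives.
    #include <pthread.h>

    class CCountingSemaphore
    {
    public:
        CCountingSemaphore(unsigned int maxCount)
            : m_count(0), m_maxCount(maxCount)
        {
            pthread_mutex_init(&m_mutex, 0);
            pthread_cond_init(&m_cond, 0);
        }

        ~CCountingSemaphore()
        {
            pthread_cond_destroy(&m_cond);
            pthread_mutex_destroy(&m_mutex);
        }

        // Signal: increment the count, but never beyond the configured maximum.
        void post()
        {
            pthread_mutex_lock(&m_mutex);
            if (m_count < m_maxCount)
                ++m_count;
            pthread_cond_signal(&m_cond);
            pthread_mutex_unlock(&m_mutex);
        }

        // Wait: block until the count is non-zero, then decrement it.
        void wait()
        {
            pthread_mutex_lock(&m_mutex);
            while (m_count == 0)
                pthread_cond_wait(&m_cond, &m_mutex);
            --m_count;
            pthread_mutex_unlock(&m_mutex);
        }

    private:
        pthread_mutex_t m_mutex;
        pthread_cond_t  m_cond;
        unsigned int    m_count;
        unsigned int    m_maxCount;
    };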

NIOS2-LWIP-ucOS/II platform

The implementation of the Network Queue for the Nios2 platform was done using the Nios2 SDK and IDE v5.1 with the latest patches as of January 2007. The Nios2 SDK v5.1 includes LWIP v1.1.0 and ucOS/II v2.77. Testing was done on a Klystron Interlock reference board (CntrlM #5), which embeds an Altera Nios2 softcore processor running on an Altera Cyclone II FPGA.

As for the implementation details on the Nios2 platform, the mutex element was created as a binary semaphore. A dedicated mutex object is available in ucOS/II, but to avoid introducing an additional source dependency, implementing the mutex with this type was not considered worthwhile. The "counting up to max" semaphore is implemented by hand due to the lack of support for this element in the ucOS/II 2.77 API.

Tasks (using the ucOS/II and LWIP APIs) are used as the physical means to implement the decoupled send and receive functionality. The platform-independent semaphores and mutexes are implemented using the ucOS/II API.

Note:
The LWIP task stack size (set inside the Nios2 IDE System Library Properties) should be at least 65536 bytes (16384 DWORDs)! Experience from the prototyping and testing period shows that a smaller value may cause irregular behaviour and hardly reproducible stack-related bugs that are very hard to debug.

NIOS2-NicheStack-ucOS/II platform

The second implementation of the Network Queue for the Nios2 platform was done using the Nios2 SDK and IDE v6.1 with the latest patches as of October 2007. The Nios2 SDK v6.1 includes the InterNiche NicheStack and ucOS/II v2.77. Testing was done on a Klystron Interlock reference board (CntrlM #5), which embeds an Altera Nios2 softcore processor running on an Altera Cyclone II FPGA.

As for the implementation details on the Nios2 platform, details regarding ucOS/II can be found in the previous section. The second implementation is an adaptation to the InterNiche NicheStack for Altera Nios2. Altera decided to deprecate the LWIP stack, and the design guideline is to use NicheStack from now on. The changes needed for NicheStack were rather straightforward. Two special findings need to be known and understood:

Regarding performance, NicheStack performs considerably better than LWIP. In measurements with the Altera Nios2 based board mentioned above, the observed speed factor is about 1:4 to 1:6.

Solaris-platform

The implementation of the Network Queue was done on 32-bit platforms (Solaris 5.8 and 5.10). GCC v3.4.3 and Sun CC (Sun Studio 11) were both verified and used as compiler and linker. The POSIX API was used for threads (decoupled sending and receiving), semaphores and mutexes. The required "counting up to max" semaphore is implemented by hand because such an element is not directly supported by the POSIX API.
