New Choices for System Architects
This marriage of switched fabrics and DDS real-time middleware offers architects new flexibility in adding capabilities that were once quite difficult to achieve. Many of the features offered by switched fabrics have complementary capabilities in the DDS-compliant middleware. For example, switched fabrics typically offer rich error-management features, such as the ability to recognize, report, and route around failed paths. With DDS-compliant software, system designers can also take advantage of DDS error-reporting facilities.
A key feature of switched fabrics is support for multiple paths between nodes (Fig. 3). With such support, systems can easily implement multiple physical interconnects combined with sophisticated error management. Likewise, with DDS, applications can take advantage of redundant publishers that have different strengths or bandwidths. When a higher-strength publisher fails, the DDS middleware automatically switches in one with lower strength. In addition to fault tolerance, this can also help with load balancing on heavily used networks (Fig. 4).
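The strength-based failover described above can be modeled in a few lines. This is a plain-Python sketch, not a DDS API: the `Publisher` class, its fields, and `active_publisher` are hypothetical names chosen for illustration. In the DDS specification, the corresponding behavior is governed by the OWNERSHIP and OWNERSHIP_STRENGTH QoS policies.

```python
from dataclasses import dataclass


@dataclass
class Publisher:
    """Hypothetical stand-in for a redundant producer of one topic."""
    name: str
    strength: int
    alive: bool = True


def active_publisher(publishers):
    """Return the live publisher with the highest strength, or None."""
    live = [p for p in publishers if p.alive]
    return max(live, key=lambda p: p.strength, default=None)


primary = Publisher("producer 1", strength=10)
backup = Publisher("producer 2", strength=5)
topic_a = [primary, backup]

before = active_publisher(topic_a).name   # primary wins while it is alive
primary.alive = False                     # simulate a crash of the primary
after = active_publisher(topic_a).name    # backup takes over
print(before, "->", after)
```

Consumers in this model never choose a producer themselves; the selection rule does it for them, which is what keeps the failover transparent to the application.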
Figure 3. DDS publish-subscribe involves direct, anonymous communication between producers and consumers of data. Topic A has a primary producer 1 and a backup producer 2. Note that nominally, when producer 1 is active, consumers do not receive data from the backup producer.
Switched-fabric specifications already provide for a hot-plug or hot-swap capability. This hardware capability can be combined with a "virtual" hot-plug capability at the application level using DDS middleware. Unlike traditional tightly coupled client/server architectures, DDS middleware allows producers and consumers to be dynamically added or removed in an operational system.
Figure 4. DDS provides for automatic failover. When the primary producer 1 of topic A crashes, the middleware automatically switches to the backup producer 2 of topic A. The consumers get unin
Many switched fabrics provide sophisticated features that allow, for instance, bandwidth-reserved, isochronous transactions across the fabric, something that is not supported by, say, Ethernet. Corresponding to the hardware QoS facilities, DDS-compliant middleware can offer a number of QoS policies that make predictability at the application level possible. For instance, the TRANSPORT_PRIORITY policy allows developers to manage how they prioritize one data flow over another.
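To make the idea of prioritizing one data flow over another concrete, here is a toy send queue in plain Python. It is only an illustration of the concept: the `PrioritySendQueue` class and the sample names are hypothetical, and in DDS itself TRANSPORT_PRIORITY is a hint whose effect depends on the middleware and the underlying transport.

```python
import heapq
import itertools


class PrioritySendQueue:
    """Toy send queue: higher priority leaves first, FIFO within a priority."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker that preserves arrival order

    def put(self, priority, sample):
        # heapq is a min-heap, so negate the priority to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._order), sample))

    def get(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


send_queue = PrioritySendQueue()
send_queue.put(1, "telemetry-1")
send_queue.put(9, "fire-control")
send_queue.put(1, "telemetry-2")
sent = [send_queue.get() for _ in range(len(send_queue))]
print(sent)
```

The high-priority sample overtakes the two low-priority ones that arrived around it, while samples of equal priority keep their arrival order.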
The Roadmap for Distributed Data Services
The existence of DDS as a standard specification endorsed by the Department of Defense (DoD) paves the way for addressing the challenge of distributing data among a myriad of defense systems. DDS is now mandated for data distribution by the Navy Open Architecture Computing Environment (Navy OACE) and the DoD Information Technology Standards Registry (DISR), and has already been adopted by programs such as FCS (Future Combat Systems), DD(X), LCS (Littoral Combat Ship), and SSDS (Ship Self Defense System).
But despite the existence of a standard specification, the value of the solution is highly dependent upon its implementation. The specification defines certain features and capabilities, but not how they should be implemented.
A carefully designed middleware architecture can reduce the likelihood of a fault, limit the damage of a fault if it does occur, help detect faults immediately, protect the middleware from errors in application code, and isolate applications from errors in other applications. That architecture can also deliver significant advantages in the performance and flexibility of network distributed data communications.
For example, the DDS specification defines how a publish-subscribe communication model should work for a distributed real-time network. The DDS specification defines DataWriters for publishing and DataReaders for subscribing to a single topic on a user-defined data type. This in itself is standard and straightforward, but how it's implemented can significantly impact network performance and scalability.
A robust implementation improves both performance and scalability by defining an architecture that supplies each DataWriter or DataReader with a queue that buffers messages bound for another endpoint through a transport. This architecture supports direct end-to-end messaging, since each endpoint (a DataReader or DataWriter) in each application communicates directly with a sister set of endpoints. Each endpoint has a dedicated set of buffers to hold messages in transit to other endpoints.
Such a queuing architecture provides for an optimized transfer of messages from DataWriter to DataReader, no matter where each resides on the network. Also, because the endpoints queue and buffer transmissions to other endpoints, this architecture can easily scale to large and complex networks with predictable delivery times.
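The per-endpoint queuing described above might look roughly like the following model. It is a sketch under stated assumptions, not any vendor's implementation: `WriterEndpoint`, `ReaderEndpoint`, `match`, and `flush` are hypothetical names, and the "transport" is simply a direct append to the reader's queue.

```python
from collections import deque


class ReaderEndpoint:
    """Hypothetical DataReader stand-in with its own receive queue."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()


class WriterEndpoint:
    """Hypothetical DataWriter stand-in with one outbound buffer per matched reader."""

    def __init__(self, name):
        self.name = name
        self.outbound = {}  # reader name -> (reader, deque of pending messages)

    def match(self, reader):
        self.outbound[reader.name] = (reader, deque())

    def write(self, message):
        # Buffer the message separately for every matched reader.
        for _, pending in self.outbound.values():
            pending.append(message)

    def flush(self):
        # Deliver directly to each reader; no central broker is involved.
        for reader, pending in self.outbound.values():
            while pending:
                reader.queue.append(pending.popleft())


writer = WriterEndpoint("track-writer")
display = ReaderEndpoint("display")
logger = ReaderEndpoint("logger")
writer.match(display)
writer.match(logger)
writer.write("track 1")
writer.write("track 2")
writer.flush()
print(list(display.queue), list(logger.queue))
```

Because each writer keeps a separate buffer per destination, a slow reader in this model delays only its own buffer rather than traffic bound for other endpoints.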
In a similar manner, DDS defines the concept of a "DomainParticipant," which is the fundamental container entity that can participate in a publish-subscribe network. A DomainParticipant can contain many DataReaders and DataWriters. Typical applications may use only one domain, and therefore have one DomainParticipant. However, applications are free to create several DomainParticipants, so that multiple instances of this entity can exist simultaneously.
Multiple execution threads are a way to optimize responsiveness and performance, while also allowing the system to scale across a broad fabric-based network. One possible approach is to use several dedicated threads for each DomainParticipant, in this manner:
Event thread: Manages both timing delays and periodic events, such as protocol heartbeats, deadlines, and liveliness.
Database cleanup thread: Purges old information from the internal data structures, such as publication declarations and subscription requests.
Receive threads: Process the data packets received from the underlying network transports. One receive thread is created per transport "port," which represents a transport-specific resource for receiving incoming messages.
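The thread layout listed above can be sketched with Python's standard threading module. Everything here is an assumption made for illustration: the `Participant` class, the port names, the 10 ms timer period, and the in-memory queues standing in for transports. It shows only the division of labor, not how a real DDS implementation schedules its threads.

```python
import queue
import threading
import time


class Participant:
    """Hypothetical stand-in for a DomainParticipant and its worker threads."""

    def __init__(self, ports):
        self._stop = threading.Event()
        self._lock = threading.Lock()
        self.heartbeats = 0
        self.received = []
        self.declarations = {"stale-publication": 0.0}  # name -> expiry time
        self.ports = {port: queue.Queue() for port in ports}
        self.threads = [
            threading.Thread(target=self._event_loop, name="event"),
            threading.Thread(target=self._cleanup_loop, name="db-cleanup"),
        ]
        for port in ports:
            self.threads.append(threading.Thread(
                target=self._receive_loop, args=(port,), name=f"recv-{port}"))

    def start(self):
        for t in self.threads:
            t.start()

    def stop(self):
        self._stop.set()
        for t in self.threads:
            t.join()

    def _event_loop(self):
        # Periodic events such as heartbeats; wait() doubles as the timer.
        while not self._stop.wait(0.01):
            self.heartbeats += 1

    def _cleanup_loop(self):
        # Purge declarations whose expiry time has passed.
        while not self._stop.wait(0.01):
            now = time.monotonic()
            with self._lock:
                for name in [n for n, t in self.declarations.items() if t <= now]:
                    del self.declarations[name]

    def _receive_loop(self, port):
        # One thread per transport port drains that port's packets.
        while not self._stop.is_set():
            try:
                packet = self.ports[port].get(timeout=0.01)
            except queue.Empty:
                continue
            with self._lock:
                self.received.append((port, packet))


participant = Participant(ports=["udp-7400", "shmem-0"])
participant.start()
participant.ports["udp-7400"].put("sample A")
participant.ports["shmem-0"].put("sample B")
time.sleep(0.2)          # give the worker threads time to run
participant.stop()
thread_names = [t.name for t in participant.threads]
print(thread_names)
```

With two ports, the participant in this model runs four threads: one event thread, one cleanup thread, and one receive thread per port.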
When the application provides new data to the DDS middleware, the message passes all the way through to the network in a single operation. In the user's thread context, the message is serialized, deposited into the writer queue, encapsulated into a wire-protocol packet, and passed to the transport for delivery.
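The four steps of that send path (serialize, queue, encapsulate, hand off) can be illustrated as follows. This is a minimal sketch: the `Writer` class, the `DEMO` tag, and the header layout are made up for the example and are not the DDS wire protocol, and the "transport" is just a Python list.

```python
import struct
from collections import deque

MAGIC = b"DEMO"  # made-up 4-byte protocol tag, not a real wire format


class Writer:
    """Hypothetical writer whose write() runs start to finish in the caller's thread."""

    def __init__(self, transport_send):
        self.queue = deque()          # keeps samples for later follow-on processing
        self.transport_send = transport_send
        self.sequence = 0

    def write(self, x, y):
        payload = struct.pack("!dd", x, y)            # 1. serialize
        self.sequence += 1
        self.queue.append((self.sequence, payload))   # 2. deposit in writer queue
        header = MAGIC + struct.pack("!IH", self.sequence, len(payload))
        packet = header + payload                     # 3. encapsulate
        self.transport_send(packet)                   # 4. hand to the transport
        return packet


wire = []                      # stands in for the network
writer = Writer(wire.append)
packet = writer.write(1.5, -2.0)

# Decode on the "far side" to show the round trip.
seq, length = struct.unpack("!IH", packet[4:10])
x, y = struct.unpack("!dd", packet[10:10 + length])
print(seq, length, x, y)
```

No other thread is involved in `write()`: the caller's thread carries the sample all the way to the transport, and the copy left in the writer queue is what later follow-on processing would draw on.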
In the common case, the entire operation's critical path takes no inter-application locks and suffers no context switches. The event thread is only involved if the initial transport operation fails, or if it must execute follow-on processing (such as ensuring reliable delivery). The event thread has ready access to the message, since it's already stored in the writer queue.
When the transport receives a new packet, the appropriate receive thread processes the packet, retrieves the message, stores it in the reader queue, and immediately executes the listener callback. Here too, the common-case critical path involves no inter-application locks or context switches. If the application requires the message to be handled with user threads, it can do so with DDS WaitSets. Both flexibility and performance are optimized, even as the network scales.
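The two ways of handling received data, a listener callback run by the receive thread or a blocking wait in a user thread, can be contrasted in a small model. The `Reader` class and its methods are hypothetical; the blocking path uses a Python condition variable as an analogy for a DDS WaitSet rather than reproducing the WaitSet API.

```python
import threading
from collections import deque


class Reader:
    """Hypothetical reader that supports a listener or a blocking wait."""

    def __init__(self, listener=None):
        self.queue = deque()
        self.listener = listener
        self._available = threading.Condition()

    def on_packet(self, message):
        # Called by the receive thread: store first, then notify.
        with self._available:
            self.queue.append(message)
            self._available.notify_all()
        if self.listener is not None:
            self.listener(message)   # runs in the receive thread's context

    def wait_and_take(self, timeout=1.0):
        # WaitSet-style: the caller's own thread blocks until data arrives.
        with self._available:
            self._available.wait_for(lambda: self.queue, timeout=timeout)
            taken = list(self.queue)
            self.queue.clear()
            return taken


callback_threads = []
listening = Reader(
    listener=lambda m: callback_threads.append(threading.current_thread().name))
waiting = Reader()


def receive():
    listening.on_packet("sample 1")
    waiting.on_packet("sample 2")


receive_thread = threading.Thread(target=receive, name="recv-0")
receive_thread.start()
taken = waiting.wait_and_take()    # handled in the main (user) thread
receive_thread.join()
print(callback_threads, taken)
```

The listener path avoids a thread handoff but runs application code inside the receive thread; the waiting path pays for a wakeup in exchange for keeping that code in a thread the application controls.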