
High-Availability RTOSs Deliver Five-Nines Reliability

Sept. 27, 2022
To work, multiprocessor systems and hot-swap hardware require high-availability RTOSs.

Article updated 9/26/22

New real-time operating-system (RTOS) enhancements make it possible to meet 99.999% availability and real-time requirements at the same time. Transaction processing, process control, communications switching, and air-traffic control are just a few examples of applications where downtime cannot be tolerated. Companies such as MontaVista, Enea, BlackBerry QNX Systems, Red Hat, Ubuntu, Lynx Software Technologies, and Aptiv/Wind River have added high-availability services to the list of modules that can be incorporated into an RTOS.
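To put the five-nines figure in perspective, an availability target translates directly into a yearly downtime budget: downtime = (1 - availability) x minutes per year. The C sketch below does that arithmetic, assuming an average 365.25-day year; the function name is illustrative only. At 99.999%, the budget works out to roughly 5.26 minutes per year.

```c
#include <assert.h>

/* Minutes in an average year, assuming 365.25 days. */
#define MINUTES_PER_YEAR (365.25 * 24.0 * 60.0)

/* Maximum downtime per year, in minutes, permitted by an availability
 * given as a fraction (e.g., 0.99999 for "five nines"). */
double max_downtime_minutes(double availability)
{
    assert(availability >= 0.0 && availability <= 1.0);
    return (1.0 - availability) * MINUTES_PER_YEAR;
}
```

By the same arithmetic, three nines (99.9%) allows about 526 minutes, or nearly 8.8 hours, of downtime per year.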

The technology of high-availability systems isn't new. IBM, Sun, Microsoft, and others have done it for years. Custom embedded systems have often utilized high-availability techniques through customized software instead of standardized OS support.

High-availability hardware isn't new either, but hardware of this type, such as RAID disk and tape subsystems, is showing up in more embedded and real-time systems. Standard CompactPCI Serial systems, like those from ADLINK, provide hot-swap board support. Likewise, network interconnects, including Ethernet and InfiniBand, give developers a choice of implementation methods. Today, off-the-shelf hardware can provide high-availability support with an off-the-shelf RTOS.

Available high-availability hardware systems generally feature:

  • Hot-swapping capability. This is available on computer boards such as CompactPCI Serial boards, as well as on disk and tape drives.
  • Multiprocessor links. Popular buses like InfiniBand and PCI Express as well as networks like Ethernet include this feature.
  • A RAID (redundant array of independent disks) architecture, as found in disk and tape drives.

It's important to recognize the roles redundant hardware and hot-swapping play in a high-availability system. A number of hardware technologies are available to implement high-availability systems.

Software support for high-availability systems is cropping up in a number of places (Fig. 1). Now, even an application programming interface (API) exists for CompactPCI Serial.

Checkpointing, transaction support, and application heartbeat support are just some of the features being used with real-time systems. The APIs aren't always standardized across vendors, though; heartbeat support, for instance, is implemented differently in each OS.

Checkpointing is the ability to save enough information from a process to restart it if it fails. Heartbeat support detects when a process fails, typically by requiring each process to signal periodically that it is still running.
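A minimal C sketch of the checkpointing idea follows. The state structure and the checkpoint_save()/checkpoint_restore() functions are hypothetical, not any vendor's API, and a static variable stands in for the failure-resilient storage (shared memory, disk, or a peer node) a real checkpoint would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical application state: everything needed to resume work. */
typedef struct {
    unsigned long next_sequence;  /* next transaction number to process */
    double        running_total;  /* accumulated result so far */
} app_state_t;

/* Stand-in for storage that survives the failed process. */
static app_state_t saved_state;
static bool        have_checkpoint = false;

/* Save enough of the process state to restart from this point. */
void checkpoint_save(const app_state_t *state)
{
    assert(state != NULL);
    memcpy(&saved_state, state, sizeof saved_state);
    have_checkpoint = true;
}

/* On restart, resume from the last checkpoint if one exists;
 * otherwise begin from a clean initial state. */
void checkpoint_restore(app_state_t *state)
{
    assert(state != NULL);
    if (have_checkpoint) {
        memcpy(state, &saved_state, sizeof *state);
    } else {
        state->next_sequence = 0;
        state->running_total = 0.0;
    }
}
```

Anything the process did after its last checkpoint is lost on a failure and must be redone, so checkpoint frequency is a trade-off between runtime overhead and recovery work.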

Modularity is still the key aspect of high availability in an RTOS. One example is Wind River's VxWorks, whose high-availability services are partitioned to closely match the structure of the OS itself (Fig. 2).

Another example is Lynx MOSA.ic, which adds high-availability support to Linux-compatible and Linux operating systems. Its high-availability additions have a modular construction similar to that of VxWorks.

Hardware may steal the limelight in numerous circuit designs, but high-availability hardware won't work without the correct software. More importantly, high-availability applications need to operate regardless of the kind of hardware available in the system. In particular, applications must continue working with other applications in the system, even if one application fails due to errant coding, a lack of resources, or other software-related problems.

In some cases, software failover support can be provided transparently. That's how many message-based systems operate.

In general, a high-availability system should have the following software services:

  • Heartbeat support for each server and each application.
  • Event management capability for change notification.
  • Alarm management for error handling.
  • Transaction capability for checkpointing and rollback/restart.
  • Clustering for server management and application links.
  • Reliable storage support for RAID and for journaling file systems.
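Event management of this kind is often built as a publish/subscribe registry: services register handlers for the event types they care about, and the high-availability layer notifies them when something changes, such as a board being hot-swapped. The C sketch below assumes a fixed-size, single-threaded registry; the event names, ha_subscribe(), ha_publish(), and the sample count_events() handler are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event types a high-availability service might publish. */
typedef enum { EV_BOARD_INSERTED, EV_BOARD_REMOVED, EV_APP_FAILED, EV_COUNT } ha_event_t;

typedef void (*ha_handler_t)(ha_event_t event, int source_id, void *ctx);

#define MAX_SUBSCRIBERS 8

typedef struct {
    ha_handler_t handler;
    void        *ctx;     /* caller data passed back to the handler */
} subscription_t;

static subscription_t subscribers[EV_COUNT][MAX_SUBSCRIBERS];
static size_t         subscriber_count[EV_COUNT];

/* Register interest in one event type; returns 0 on success, -1 if full. */
int ha_subscribe(ha_event_t event, ha_handler_t handler, void *ctx)
{
    assert(event < EV_COUNT && handler != NULL);
    if (subscriber_count[event] == MAX_SUBSCRIBERS)
        return -1;
    subscribers[event][subscriber_count[event]++] = (subscription_t){handler, ctx};
    return 0;
}

/* Notify every subscriber of a change; returns how many were notified. */
size_t ha_publish(ha_event_t event, int source_id)
{
    assert(event < EV_COUNT);
    for (size_t i = 0; i < subscriber_count[event]; i++)
        subscribers[event][i].handler(event, source_id, subscribers[event][i].ctx);
    return subscriber_count[event];
}

/* Example subscriber: counts notifications in the int that ctx points to. */
void count_events(ha_event_t event, int source_id, void *ctx)
{
    (void)event;
    (void)source_id;
    ++*(int *)ctx;
}
```

A production version would likely also need locking if events can be published from more than one thread.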

With QNX, applications communicate with each other using a messaging system that is part of the RTOS' core services. The QNX message system supports transparent message-based services independent of its new high-availability support. The QNX link manager can detect a failed application and redirect messages to an alternate application (Fig. 3).

The link manager can utilize alternate paths between applications and start up a new application if necessary. Changes are handled based on an application's description of a link. QNX uses messaging for all major services, and messages move transparently across node boundaries (Fig. 3). Of course, this redirection works equally well between applications on the same node.
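The failover behavior described above can be illustrated with a short C sketch. It assumes each link is described by a primary and an alternate server, and it only counts deliveries instead of sending real messages. The types and link_send() are hypothetical and do not reflect the QNX API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical server endpoint. A real system would hold a channel or
 * connection handle; here we track only liveness and deliveries. */
typedef struct {
    int  id;
    bool alive;
    int  delivered;
} endpoint_t;

/* Hypothetical link description supplied by the application: where to
 * send messages normally, and where to send them if that server fails. */
typedef struct {
    endpoint_t *primary;
    endpoint_t *alternate;
} link_t;

/* Deliver a message over a link, failing over to the alternate server
 * when the primary is down. Returns the id of the endpoint that received
 * the message, or -1 if no endpoint is available. */
int link_send(link_t *link, const char *msg)
{
    assert(link != NULL && msg != NULL);
    (void)msg;  /* payload is not actually transmitted in this sketch */
    endpoint_t *targets[2] = { link->primary, link->alternate };
    for (size_t i = 0; i < 2; i++) {
        endpoint_t *ep = targets[i];
        if (ep != NULL && ep->alive) {
            ep->delivered++;  /* stand-in for the real message send */
            return ep->id;
        }
    }
    return -1;
}
```

A fuller implementation would also start a replacement server when both endpoints are down, as the link manager can.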

IBM, Microsoft, and Sun Microsystems have extensive clustering solutions. Although these tend to be used in high-end installations, the same techniques are applicable to embedded environments.

APIs for this type of clustering support are OS-specific. Applications must take advantage of these APIs, and applications that work together are tightly integrated.

Exceptions, such as a failed service or application, must be handled explicitly. High-availability support typically provides services like checkpointing and transaction rollback.

RTOS high-availability modularity allows developers to choose the kinds of services needed to support their particular requirements. This may include hardware support such as hot-swap recognition, device-failure detection, detection of environmental problems like overheating, or the use of reliable storage.

Alternatively, it might be limited to event and alarm support. Even basic heartbeat monitoring can help bring a system into high-availability land if applications are written to handle faults.

Certainly, additional high-availability modules should make the programmer's job easier. For this reason, high-availability technologies from high-end systems, such as clustering, are finding their way into embedded systems.

Some high-availability technologies already exist in many RTOSs. Those from QNX are an example. This message-based RTOS provides transparent message redirection as part of the regular RTOS implementation. Additional support addresses features typically not found in a basic RTOS, such as transaction-oriented checkpoint support.

In this case, a checkpointed task provides data and restart information as part of a checkpoint that's managed by the QNX high-availability monitor. If the task terminates or fails to respond in a set time, the monitor will start a new task.
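The monitor's behavior, restarting a task that terminates or stops responding within a set time, can be sketched as a polling pass over heartbeat timestamps. In the C sketch below, the task record, the millisecond clock value supplied by the caller, and restart_task() are all hypothetical; restart_task() only updates bookkeeping where a real monitor would spawn a new task that resumes from its last checkpoint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record the monitor keeps for each supervised task. */
typedef struct {
    const char   *name;
    unsigned long last_heartbeat_ms; /* time of the most recent heartbeat */
    unsigned long timeout_ms;        /* silence longer than this = failure */
    bool          terminated;        /* set if the task has exited */
    unsigned      restarts;          /* how many times it was restarted */
} supervised_task_t;

/* Called by the task (or on its behalf) to report that it is alive. */
void heartbeat(supervised_task_t *t, unsigned long now_ms)
{
    assert(t != NULL);
    t->last_heartbeat_ms = now_ms;
}

/* Stand-in for spawning a replacement task. */
static void restart_task(supervised_task_t *t, unsigned long now_ms)
{
    t->restarts++;
    t->terminated = false;
    t->last_heartbeat_ms = now_ms; /* give the new instance a full timeout */
}

/* One monitor pass: restart any task that has exited or has been silent
 * longer than its timeout. Returns the number of tasks restarted. */
size_t monitor_poll(supervised_task_t *tasks, size_t n, unsigned long now_ms)
{
    size_t restarted = 0;
    for (size_t i = 0; i < n; i++) {
        supervised_task_t *t = &tasks[i];
        /* Unsigned subtraction stays correct across clock wraparound. */
        bool timed_out = now_ms - t->last_heartbeat_ms > t->timeout_ms;
        if (t->terminated || timed_out) {
            restart_task(t, now_ms);
            restarted++;
        }
    }
    return restarted;
}
```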

Using features like checkpointing becomes significantly easier with off-the-shelf components if the RTOS vendor provides support for the boards used in the system. The latest crop of high-availability add-ons, such as those available from Wind River and QNX, has the necessary support.

Meeting the five-nines requirement isn't the only reason to consider high-availability support. Simply providing a more reliable product is justification enough to consider a high-availability-enabled RTOS, or else to build that support from scratch.

Yes, high-availability RTOS integration is just beginning.

This article appeared in Electronic Design, Oct 29, 2001.

Need More Information?
Green Hills Software Inc.
(805) 965-6044
www.ghs.com

IBM Corp.
(800) IBM-4YOU
www.ibm.com

Lane 15 Software Inc.
(512) 502-9898
www.lane15.com

Lynuxworks Inc.
(408) 979-3900
www.lynuxworks.com

Monta Vista
(408) 328-9200
www.mvista.com

Microsoft Corp.
(425) 882-8080
www.microsoft.com

Enea
(408) 392-9300
www.enea.com

PCI Industrial Computer Manufacturers Group
(781) 246-9318
www.pcimg.org

QNX Systems
(800) 676-0566
www.qnx.com

Red Hat Inc.
(919) 547-0012
www.redhat.com

Sun Microsystems Inc.
(800) 786-7638
www.sun.com

Wind River Systems
(800) 545-WIND
www.windriver.com

About the Author

William G. Wong | Senior Content Director - Electronic Design and Microwaves & RF

I am Editor of Electronic Design focusing on embedded, software, and systems. As Senior Content Director, I also manage Microwaves & RF and I work with a great team of editors to provide engineers, programmers, developers and technical managers with interesting and useful articles and videos on a regular basis. Check out our free newsletters to see the latest content.


I earned a Bachelor of Electrical Engineering at the Georgia Institute of Technology and a Master's in Computer Science from Rutgers University. I still do a bit of programming using everything from C and C++ to Rust and Ada/SPARK. I do a bit of PHP programming for Drupal websites. I have posted a few Drupal modules.

I still get hands-on with software and electronic hardware. Some of this can be found in our Kit Close-Up video series. You can also see me in many of our TechXchange Talk videos. I am interested in a range of projects from robotics to artificial intelligence.
