Multi-processor systems-on-chip (MPSoCs) are increasingly the hardware of choice among embedded-systems designers. Driving this trend are market demands for increased performance and capacity, lower power consumption, and lower bill-of-materials (BOM) costs. Consolidating hardware and subsystems increases the complexity of the associated embedded software, which must also manage subsystem hardware isolation features such as safety zones, making multicore enablement and use even more challenging.
Emerging from these technological advances is the concept of mixed safety-critical systems. A mixed safety-critical system requires the execution of several applications of different safety-integrity levels (SIL) or criticalities, such as safety-critical and non-safety-critical, on a single MPSoC. One example is an in-vehicle gateway in which one side of the gateway needs to interact with an Automotive Safety Integrity Level (ASIL) system, while the other side of the gateway passes data to other systems (maybe even to the cloud) that don’t have a functional safety requirement.
In the past, users had to either build separate hardware systems to meet the functional safety requirement or certify the entire system (including the parts that didn’t impact safety functions). Now, with standards like the OpenAMP Framework, users can take advantage of MPSoC features to separate the safe world from the unsafe world, yet maintain communication between the two through inter-processor communication technologies such as RPMsg.
OpenAMP Introduction
The electronics industry has seen a dramatic rise in the number of embedded heterogeneous hardware platforms to address the diverse requirements of today’s electronic devices, and to help lower BOM costs. This new SoC hardware brings with it increased system demands that range from supporting real-time behavior, to the provisioning of rich user interfaces (UIs). Thus, CPUs with different capabilities are now clustered together to optimally handle these diverse tasks.
On the software side, this is accomplished by incorporating dissimilar software environments suitable to the capabilities of the CPUs present in the system. The different software contexts work in conjunction to provide the desired functionality. This collaboration usually entails communication enablement between the software components and the management of system resources.
Developing such systems requires an asymmetric multiprocessing (AMP) software architecture. The OpenAMP framework enables this architecture by providing management of system resources and communication between the participating software contexts.
OpenAMP establishes a communications channel between the master operating system and remote operating systems through RPMsg, allowing data to be passed across the channel. In addition, it defines the Remoteproc feature for lifecycle management. For example, an OS or application stack can be started and stopped on remote processors, thereby reducing power consumption. This article focuses on inter-processor communication using RPMsg.
General Safety Architecture
Before developing a mixed safety-critical system, a number of architecture decisions must be made. Which components need to be safety certified, and which components do not? How is the system going to provide proper separation? What is the boot flow for the system? How is the system going to communicate between the safe and non-safe domains? A discussion of these decisions follows.
Safe vs. Non-Safe Domains
To take full advantage of the cost savings in functional safety certification, the architect must define which components need to be in the safety domain and which components can reside outside of it.
The general requirements for the safe domain are:
- Establish the isolation perimeter in the system.
- Detect and handle systemic faults in software.
- Monitor hardware operating parameters such as temperature, voltage, clock, errors, and other system variables.
- Verify the correct operation of hardware and run diagnostics.
- Run a safety-certified RTOS.
The non-safe domain is usually responsible for providing a rich execution environment, such as a user interface or upstream connectivity, and doesn’t have to fulfill any certification requirements (Fig. 1).
Separation
One of the key factors in mixed safety-critical systems is the use of isolation to separate the different software components from each other. The isolation is essential to prevent fault propagation and interference from the non-safe domain. Hardware-assisted separation capabilities provided by many MPSoC architectures help obtain the required separation between the safe domain and the non-safe domain. This includes the separation of processing blocks, memory blocks, peripherals, and system functions.
For systems with a dedicated system controller for power management, the controller must be a part of the safe domain. Furthermore, the controller must enforce correct permissions so that requests from the non-safe domain are properly scrutinized. Incorrect permissions can provide a bypass channel to access safe resources.
Another important decision is when to enforce isolation. The possible options are boot time, post-boot time (initialization), and runtime. For systems with complex boot scenarios, it’s recommended to enforce isolation early.
Booting
Another key factor for mixed safety-critical systems is how the system boots. How the system is brought up affects domain separation and a safe, secure boot. The architect needs to understand what must be configured first. Authenticated boot should be sufficient to ensure the integrity of components.
Secure boot using encrypted images is required for secrecy. A chain of trust is desirable to prevent malicious content from creeping into the system. Finally, there may be other system-specific requirements; for example, the device may be required to produce some communication from the application within a certain time after power-on.
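The chain-of-trust idea can be illustrated with a small model. This is a minimal sketch, assuming a plain SHA-256 hash chain in which each stage carries the expected digest of the stage it starts; production secure boot typically relies on signed images, and the names `Stage`, `build_chain`, and `verify_chain` are hypothetical, not part of any vendor tool.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Stage:
    """One boot image plus the digest it expects of the image it will start."""
    name: str
    payload: bytes
    next_digest: bytes = b""  # empty for the last stage in the chain

    def measure(self) -> bytes:
        # The measurement covers the embedded digest too, so the expected
        # digest of the next stage cannot be swapped without detection.
        return hashlib.sha256(self.payload + self.next_digest).digest()


def build_chain(images: Sequence[Tuple[str, bytes]]) -> Tuple[List[Stage], bytes]:
    """Link images given in boot order; return the stages and the root digest."""
    stages: List[Stage] = []
    next_digest = b""
    for name, payload in reversed(images):
        stage = Stage(name, payload, next_digest)
        stages.append(stage)
        next_digest = stage.measure()
    stages.reverse()
    return stages, next_digest


def verify_chain(stages: Sequence[Stage], root_digest: bytes) -> Optional[str]:
    """Return the name of the first stage that fails, or None if all pass."""
    expected = root_digest
    for stage in stages:
        if not hmac.compare_digest(stage.measure(), expected):
            return stage.name  # stop here: never start an unverified image
        expected = stage.next_digest
    return None
```

Only the root digest needs to live in trusted storage in this model; every later image is checked by the stage before it, so a modified image is caught before it runs.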
Communication
The system architect needs to consider how the design will safely pass information and data between the safe and non-safe domains. Mixed safety-critical systems must use extra care for the IPC to ensure that the safe and non-safe domains stay separated, and that the non-safe domain can’t contaminate or disrupt the safe domain.
Two important issues emerge when it comes to reliable communication between the safe and non-safe domains:
- Buffer validation: Buffer parameters such as address, size, and permissions must be validated before being used by the safe domain. Thoroughly check bounds on the buffer and discard any buffer that’s outside of the valid range. Buffer validation must be paired with the proper error response to give the user insight into the system and possibly detect malicious activity.
- Mitigate interrupt flooding: The non-safe world could potentially flood the communication channel with interrupts. The load may violate the temporal isolation requirements of the system if no special handling is provided. Provide a mechanism to throttle the interrupts on the non-safe side or support polling mode on the safe side.
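The buffer-validation rule can be sketched as a single check that the safe domain runs before touching any buffer offered by the non-safe side. This is an illustrative model only: the shared-memory window, the size limit, and the names `validate_buffer`, `Verdict`, and `rejections` are assumptions made for the example, not OpenAMP definitions.

```python
from collections import Counter
from enum import Enum

# Hypothetical shared-memory window and size limit, for illustration only.
SHARED_BASE = 0x40000000
SHARED_SIZE = 0x00040000
MAX_BUFFER_SIZE = 512


class Verdict(Enum):
    OK = "ok"
    BAD_SIZE = "bad size"
    OUT_OF_RANGE = "out of range"
    BAD_PERMISSION = "bad permission"


rejections = Counter()  # simple error record the safe domain can report


def validate_buffer(addr: int, size: int, writable: bool, need_write: bool) -> Verdict:
    """Check a buffer offered by the non-safe domain before the safe domain uses it."""
    if size <= 0 or size > MAX_BUFFER_SIZE:
        verdict = Verdict.BAD_SIZE
    elif addr < SHARED_BASE or addr + size > SHARED_BASE + SHARED_SIZE:
        # Python integers do not wrap; a C version must also guard
        # addr + size against arithmetic overflow.
        verdict = Verdict.OUT_OF_RANGE
    elif need_write and not writable:
        verdict = Verdict.BAD_PERMISSION
    else:
        verdict = Verdict.OK
    if verdict is not Verdict.OK:
        rejections[verdict] += 1  # the error response: count and discard
    return verdict
```

Counting rejections per cause is one simple form of the error response described above: a sudden rise in out-of-range buffers is a signal worth surfacing to the user.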
OpenAMP is ideal for solving the problem of communication between different criticality domains. RPMsg is the Inter-Processor Communication (IPC) component included with OpenAMP that provides the capability to communicate across operating systems on the heterogeneous hardware (Fig. 2).
RPMsg uses VirtIO as a shared-memory-based transport abstraction. The VirtIO code in OpenAMP includes bounds checking for buffers and can be easily extended to support polling mode and interrupt throttling.
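One way to picture interrupt throttling combined with a polling fallback is a per-window budget on the safe side. The sketch below is a hypothetical model, not OpenAMP code: `NotificationThrottle`, its `budget` and `window` parameters, and the explicit `now` timestamps are all assumptions chosen to keep the example deterministic.

```python
class NotificationThrottle:
    """Service at most `budget` notifications per `window` seconds.

    Once the budget is spent, the notification source is treated as masked
    and the safe side drains the queue from its periodic tick instead.
    """

    def __init__(self, budget: int, window: float) -> None:
        self.budget = budget
        self.window = window
        self.window_start = 0.0
        self.used = 0
        self.masked = False

    def _roll(self, now: float) -> None:
        # Start a fresh window (and unmask) once the current one has elapsed.
        if now - self.window_start >= self.window:
            self.window_start = now
            self.used = 0
            self.masked = False

    def on_interrupt(self, now: float) -> bool:
        """Return True if this notification should be serviced immediately."""
        self._roll(now)
        if self.masked:
            return False
        if self.used >= self.budget:
            self.masked = True  # budget exhausted: defer work to polling
            return False
        self.used += 1
        return True

    def on_tick(self, now: float) -> bool:
        """Return True if the periodic tick should poll the queue."""
        was_masked = self.masked
        self._roll(now)
        return was_masked  # drain whatever arrived while masked
```

The safe domain's worst-case load from the channel is then bounded by the budget plus one poll per tick, regardless of how often the non-safe side signals.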
VirtIO has its roots in hypervisor-based virtualization. It’s used as a standard I/O virtualization mechanism, providing virtual device configuration and data exchange between the guest driver (front end) and the virtual device in the hypervisor (back end). VirtIO also defines a communication abstraction known as virtqueues, which transfer data between guest and host. RPMsg uses the same abstraction to exchange data between the master and the remote.
Internally, each virtqueue maintains a ring buffer known as a vring. The vring resides in shared memory and contains a ring of buffer descriptors. Each descriptor holds a pointer to a buffer exchanged between the master and the remote, along with its read/write permissions.
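To make the descriptor layout concrete, the sketch below models a vring buffer descriptor as the 16-byte record defined by the VirtIO specification (64-bit address, 32-bit length, 16-bit flags, 16-bit next index). Little-endian byte order is assumed here, and the helper names `VringDesc` and `read_descriptors` are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import List

VRING_DESC_F_NEXT = 1   # descriptor chains to the one named in `next`
VRING_DESC_F_WRITE = 2  # buffer is write-only for the device side

_DESC = struct.Struct("<QIHH")  # addr, len, flags, next (assumed little-endian)


@dataclass(frozen=True)
class VringDesc:
    addr: int        # address of the buffer in shared memory
    length: int      # buffer length in bytes
    flags: int = 0   # VRING_DESC_F_* bits
    next: int = 0    # index of the next descriptor when chained

    def pack(self) -> bytes:
        return _DESC.pack(self.addr, self.length, self.flags, self.next)

    @property
    def device_writable(self) -> bool:
        return bool(self.flags & VRING_DESC_F_WRITE)


def read_descriptors(table: bytes, count: int) -> List[VringDesc]:
    """Decode `count` descriptors from the start of a descriptor table."""
    return [VringDesc(*_DESC.unpack_from(table, i * _DESC.size)) for i in range(count)]
```

Because the address and length come straight from shared memory, a safe-domain consumer would validate every decoded descriptor before dereferencing it.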
Use Cases and Application
The following example demonstrates a mixed safety-critical system on the Xilinx Zynq UltraScale+ MPSoC platform. The Xilinx MPSoC provides an abundance of resources to develop mixed safety-critical systems. The platform comprises a cluster of Arm Cortex-A53 cores known as the Application Processing Unit (APU) and a cluster of Arm Cortex-R5 cores known as the Real-time Processing Unit (RPU). The RPU is ideally suited to execute functional safety-critical applications.
The system contains a dedicated control processor—the Platform Management Unit (PMU)—to perform system monitoring and power management. There’s also memory and peripherals protection hardware, the Xilinx Memory Protection Unit (XMPU), and the Xilinx Peripheral Protection Unit (XPPU), to implement the isolation. Furthermore, safe and secure boot options are available to set up isolation.
The following example mimics a patient monitoring system, where the functional safety-certified software context on the RPU obtains the sensor data and controls the medicine dose delivered to the patient. The sensor data contains the patient vitals. The non-safe domain consists of the high-level operating system, which displays the data on an LCD and provides internet connectivity. The design decisions are made in light of the discussion in the last section.
Safe and Non-Safe Domains
The system is split into two domains:
The Safe Domain
The safe domain comprises the RPU cluster, the PMU, the Configuration and Security Unit (CSU), the System Control Register, and peripherals. System monitoring and verification are performed by the PMU.
The Non-Safe Domain
The non-safe domain consists of the APU and related peripherals. Linux is used on the APU to provide a rich UI to display patient vitals.
The safe and non-safe domains communicate through OpenAMP RPMsg.
Isolation
The design consists of three subsystems. These subsystems are created in the Xilinx Vivado HLS Processor Configuration Wizard (PCW). The PCW generates the code to configure the XMPU and XPPU to enforce the desired isolation. The initialization code is executed by the Xilinx First Stage Bootloader (FSBL) on the RPU. In this design, the APU system is a non-secure subsystem (colored maroon), while the PMU and RPU subsystems are both considered safe systems (colored green).
Any attempt by non-safe subsystem components to access secure system registers or memory is blocked by the XPPU and XMPU. Restricting access in this manner requires configuring the isolation in the Xilinx Vivado IDE and generating a hardware project.
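The effect of this kind of protection can be modeled as a lookup from bus master and address to an allow or deny decision. The sketch below is a conceptual illustration only: the memory map, region names, and the default-deny policy are assumptions, and it does not reflect the actual XMPU or XPPU register programming.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class Region:
    name: str
    start: int
    size: int
    readers: FrozenSet[str]  # masters allowed to read
    writers: FrozenSet[str]  # masters allowed to write

    def contains(self, addr: int) -> bool:
        return self.start <= addr < self.start + self.size


# Hypothetical memory map, for illustration only.
REGIONS = (
    Region("safe_ram", 0x10000000, 0x00100000,
           frozenset({"RPU", "PMU"}), frozenset({"RPU", "PMU"})),
    Region("shared_ipc", 0x40000000, 0x00040000,
           frozenset({"RPU", "APU"}), frozenset({"RPU", "APU"})),
    Region("linux_ram", 0x50000000, 0x10000000,
           frozenset({"APU"}), frozenset({"APU"})),
)


def access_allowed(master: str, addr: int, write: bool,
                   regions: Sequence[Region] = REGIONS) -> bool:
    """Decide whether `master` may read or write `addr`."""
    for region in regions:
        if region.contains(addr):
            return master in (region.writers if write else region.readers)
    return False  # assumption: addresses outside every region are denied
```

In this model only the shared IPC window is reachable from both domains, which is the property that keeps RPMsg traffic from becoming a path into safe memory.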
To restrict the PMU from servicing requests from the non-safe APU subsystem, the FSBL provides a compile-time configurable permissions object. The FSBL passes this object to the PMU firmware during initialization to configure the permissions.
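A permissions object of this kind can be pictured as a table of bitmasks, one per managed resource, naming the masters whose requests the controller will honor. The following is a hypothetical model: the `Master` flags, node names, and `pmu_accepts` function are invented for illustration and do not mirror the real PMU configuration object format.

```python
from enum import IntFlag


class Master(IntFlag):
    NONE = 0
    APU = 1  # non-safe domain
    RPU = 2  # safe domain


# Fixed at build time in this model; each entry lists who may make
# power-management requests against that node.
PM_PERMISSIONS = {
    "rpu_cluster": Master.RPU,
    "apu_cluster": Master.APU | Master.RPU,
    "system_reset": Master.RPU,
}


def pmu_accepts(master: Master, node: str) -> bool:
    """Return True if the controller should service this master's request."""
    allowed = PM_PERMISSIONS.get(node, Master.NONE)  # unknown nodes: deny
    return bool(allowed & master)
```

With this table the APU can manage its own cluster but cannot power down or reset anything the safe domain depends on.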
Booting
The boot sequence starts on the PMU (Fig. 3). The PMU initializes, then releases, the Configuration and Security Unit (CSU). The CSU next authenticates the PMU firmware contained in the boot.bin image and loads it. After that, the CSU authenticates, loads, and starts the FSBL on the RPU.
The FSBL performs many tasks:
- Provides the PMU configuration object to the PMU firmware, which contains the power-management permissions granted to various subsystem masters.
- Starts the hardware isolation by programming both the XPPU and the XMPU using the configuration generated by the Vivado tools.
- Authenticates, loads, and starts the Nucleus Cert on the RPU.
- Authenticates, loads, and starts Arm Trusted Firmware and U-Boot on the APU.
U-Boot then authenticates and programs the programmable logic bitstream, and finally starts Linux on the APU.
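The ordering constraint in this boot flow, that isolation is in place before anything in the non-safe domain runs, can be written down and checked. The sketch below restates the sequence described above as data; the `Step` type and `first_violation` helper are hypothetical names used only for this illustration.

```python
from typing import NamedTuple, Optional, Sequence


class Step(NamedTuple):
    actor: str
    action: str
    enforces_isolation: bool = False
    starts_non_safe: bool = False


BOOT_SEQUENCE = (
    Step("PMU", "initialize and release the CSU"),
    Step("CSU", "authenticate and load the PMU firmware"),
    Step("CSU", "authenticate, load, and start the FSBL on the RPU"),
    Step("FSBL", "pass the PMU configuration object to the PMU firmware"),
    Step("FSBL", "program the XPPU and XMPU", enforces_isolation=True),
    Step("FSBL", "authenticate, load, and start the RTOS on the RPU"),
    Step("FSBL", "authenticate, load, and start Arm Trusted Firmware and U-Boot on the APU",
         starts_non_safe=True),
    Step("U-Boot", "authenticate and program the programmable logic bitstream"),
    Step("U-Boot", "start Linux on the APU", starts_non_safe=True),
)


def first_violation(sequence: Sequence[Step]) -> Optional[Step]:
    """Return the first non-safe start that happens before isolation, if any."""
    isolated = False
    for step in sequence:
        if step.enforces_isolation:
            isolated = True
        if step.starts_non_safe and not isolated:
            return step
    return None
```

A check like this is cheap to run in a build or review step whenever the boot flow changes.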
Communication
OpenAMP RPMsg is used to provide communication between the Linux and RTOS domains. Internally, shared memory is employed for buffers and Xilinx’s IPI block handles notifications. Communication between the operating systems via RPMsg requires two data queues in shared memory to allow for bidirectional data transfer. To support asynchronous notification, an interrupt resource is allocated to each operating system, and the Xilinx IPI component signals data availability between the subsystems.
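The arrangement of two queues plus a notification per direction can be modeled in a few lines. This is a simplified sketch, not the RPMsg API: `Endpoint`, `make_channel`, and the `irq_pending` flag (standing in for the IPI) are assumptions, and real RPMsg passes fixed-size buffers through vrings rather than Python objects.

```python
from collections import deque
from typing import Deque, List, Tuple


class Endpoint:
    """One side of a bidirectional channel with its own receive queue."""

    def __init__(self, name: str, ring_size: int) -> None:
        self.name = name
        self.ring_size = ring_size
        self.rx: Deque[bytes] = deque()
        self.irq_pending = False       # stands in for the IPI notification
        self.peer: "Endpoint" = self   # replaced by make_channel()

    def send(self, payload: bytes) -> bool:
        """Queue a message for the peer; return False if its ring is full."""
        if len(self.peer.rx) >= self.peer.ring_size:
            return False
        self.peer.rx.append(bytes(payload))
        self.peer.irq_pending = True   # signal data availability
        return True

    def service_irq(self) -> List[bytes]:
        """Acknowledge the notification and drain the receive queue."""
        self.irq_pending = False
        messages = list(self.rx)
        self.rx.clear()
        return messages


def make_channel(ring_size: int = 4) -> Tuple[Endpoint, Endpoint]:
    """Create a connected (linux, rtos) endpoint pair."""
    linux, rtos = Endpoint("linux", ring_size), Endpoint("rtos", ring_size)
    linux.peer, rtos.peer = rtos, linux
    return linux, rtos
```

For example, the RTOS side could call `rtos.send(b"hr=72")` to publish a vitals sample, and the Linux side would pick it up in `linux.service_irq()`. The bounded ring means a sender that outpaces its peer gets a refusal instead of growing the queue without limit.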
Other Considerations in Utilizing OpenAMP
Remember that while OpenAMP provides the basic software components needed to enable the development of applications for AMP systems, OpenAMP is only a framework. The concepts in this article still require the user to modify portions of the OpenAMP components to ensure safe communication between the different domains.
The Mentor Embedded Multicore Framework Cert has already made these changes, helping users speed up implementation of their mixed safety-critical systems. It’s built on the OpenAMP standard, supported on a number of leading silicon platforms, and operational in production electronics today.
Summary
Multi-processor SoCs deliver a new level of consolidated peripheral and CPU clusters that can help engineers lower BOM costs and accelerate design implementation. In safety-critical applications, proper use of this hardware requires new levels of system management addressed through the OpenAMP Framework.
Using a mixed safety-critical medical example, we explored how a Xilinx Zynq UltraScale+ MPSoC and OpenAMP provide a solution platform that can be used to implement mixed safety-critical solutions in a robust and cost-effective manner. For more information on Mentor’s Embedded Multicore Framework Cert, visit www.mentor.com/embedded-software.
Jeff Hancock is Senior Product Manager and Etsam Anjum is Senior Software Engineer at Mentor Embedded Platform Solutions, Siemens Digital Industries Software.