Carrier-Grade Reliability—A “Must-Have” for NFV Success

Telecom service providers across the globe are developing strategies to migrate away from their traditional, physical networks and toward a new architecture based on virtualization technologies. The new approach, called network functions virtualization (NFV), is necessary from both technical and business perspectives. It will help improve service flexibility, increase operational efficiency, make it easier to differentiate services, and drastically reduce costs.

As service providers begin considering their deployment strategies, one of the first issues they will face is a platform decision: should they adopt a carrier-class or an enterprise-class virtualization solution for their NFV infrastructure? Many operators associate NFV with enterprise-class software and might be tempted to go this route, but enterprise software can’t deliver some of the most essential reliability and cost characteristics needed for carrier-grade networks.


The purpose of this article is to clarify the vital advantages and reduced risks that carrier-class infrastructure provides compared to enterprise approaches. It also delineates some of the key technical requirements an NFV platform must meet to achieve carrier-grade reliability.

Always On: The Essential Value of Carrier-Class Reliability

Telecom networks must be “always on” because society, businesses, and industries depend on reliable connections for both routine and critical communications. In fact, a dial tone (or its VoIP equivalent) is so reliable that people assume it will always be there. The lack of a connection, on the other hand, is so unusual that it always surprises a customer.

Because always-on reliability is essentially mandatory, telecom service providers have built their networks—and their reputations and revenue streams—on a foundation of “carrier-class reliability.” A carrier-class network guarantees network availability 99.9999% (6-nines) of the time, allowing no more than 32 seconds of downtime per year. This level of service is typically required by high-value enterprise customers who often will pay a premium for service-level agreements that specify 99.9999% network availability.

Carrier-Class NFV Minimizes Risks, Costs

While 6-nines is the reliability standard in telecom, enterprise-class IT services usually guarantee network availability 99.9% of the time (3-nines). This class of service implies about 526 minutes (8.76 hours) of downtime per year.
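The downtime figures quoted for 3-nines and 6-nines follow directly from the availability percentage. The short Python sketch below shows the arithmetic; the function and constant names are illustrative, and a 365-day year is assumed.

```python
# Illustrative helper (names are assumptions, not from the article).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime implied by an availability percentage."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


three_nines_minutes = downtime_minutes_per_year(99.9)          # about 525.6 minutes
six_nines_seconds = downtime_minutes_per_year(99.9999) * 60.0  # about 31.5 seconds

print(f"3-nines: {three_nines_minutes:.1f} min/year ({three_nines_minutes / 60:.2f} h)")
print(f"6-nines: {six_nines_seconds:.1f} s/year")
```

Rounded up, these values give the 526 minutes and 32 seconds used in this article.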

Downtime is costly, and the penalties imposed on a service provider for network outages or failures are severe: an estimated $11,000 per minute per server in refunds paid to customers covered by service-level agreements (SLAs). Because typical systems use thousands of servers, SLA costs skyrocket if downtime isn’t kept to a minimum.

For example, if a service provider employs enterprise-class virtualization software and a single server is down for 526 minutes per year, the company’s lost revenues could amount to $5.786 million per year. If 1000 servers are affected by 526 minutes of downtime per year, the company’s losses could amount to $5.786 billion per year. These estimated costs represent the direct revenue losses attributed to the outages. They don’t include the long-term revenues lost if affected customers switch to another service provider that promises greater service availability. In addition, the refunds might not fully compensate customers for the financial losses and other adverse business consequences they incur from outages.

A carrier-grade network will have significantly less impact on service providers and customers. A carrier-grade server that’s down for 32 seconds per year (roughly 0.53 minutes) would incur losses of about $5,830 at $11,000 per minute. If the service provider has 1000 servers, the costs could reach $5.83 million. While not negligible, that’s about one-thousandth of the cost incurred by an enterprise service. And customers are less likely to switch providers when the downtime is kept to less than 32 seconds per year.
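The cost comparison above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the $11,000 per-minute penalty is the estimate quoted in this article, the helper name is illustrative, and the carrier-grade case assumes that 32 seconds is rounded to 0.53 minutes, which is what yields the $5830 figure (using exactly 32/60 of a minute would give about $5,867).

```python
# Estimated SLA refund rate quoted in the article (USD per minute per server).
PENALTY_PER_MINUTE_PER_SERVER = 11_000


def annual_sla_penalty(downtime_minutes: float, servers: int = 1) -> float:
    """Yearly SLA refunds for a fleet, given per-server downtime in minutes."""
    return downtime_minutes * PENALTY_PER_MINUTE_PER_SERVER * servers


enterprise_one = annual_sla_penalty(526)           # 5,786,000
enterprise_fleet = annual_sla_penalty(526, 1000)   # 5,786,000,000
carrier_one = annual_sla_penalty(0.53)             # about 5,830
carrier_fleet = annual_sla_penalty(0.53, 1000)     # about 5.83 million

print(f"Enterprise, 1 server:     ${enterprise_one:,.0f}")
print(f"Enterprise, 1000 servers: ${enterprise_fleet:,.0f}")
print(f"Carrier, 1 server:        ${carrier_one:,.0f}")
print(f"Carrier, 1000 servers:    ${carrier_fleet:,.0f}")
print(f"Carrier / enterprise:     {carrier_fleet / enterprise_fleet:.4f}")
```

The final ratio comes out at roughly 0.001, which is the "one-thousandth" comparison made in the text.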

The table presents the differences in downtime and costs that service providers can expect with carrier- and enterprise-grade infrastructure.

[Table: downtime and SLA costs for carrier-grade versus enterprise-grade infrastructure. Image not reproduced.]

Achieve NFV-Driven Carrier-Grade Reliability

Fortunately, NFV can provide carrier-class reliability to deliver 6-nines guarantees of always-on connectivity to service providers and their customers. Such reliability is attained by complying with very stringent availability, security, performance, and management requirements, including TL 9000 standards and metrics that define the core attributes of a carrier-class system. However, because many types of software elements used across the architecture must meet these requirements, an NFV platform must be designed from the ground up to succeed. During platform planning and development, several key considerations must be addressed:

Availability

A telecom network must provide virtual-machine (VM) redundancy over a geographic range of at least 500 km to allow continued operation in the event of a natural disaster, such as a hurricane or earthquake. When faults occur, the VM infrastructure has to recover in less than 500 ms. The network should not drop calls or lose data during failovers.

Numerous components will play a part in guaranteeing this level of availability. The hypervisor, for example, must minimize the duration of downtime during the live migration of VMs from one system to another. The standard implementation of a KVM hypervisor, however, doesn’t provide the response time required to minimize downtime during orchestration operations for power management, software upgrades, or reliability spare reconfiguration. Therefore, it must be optimized to meet the required response times.

Furthermore, to respond to failures of physical or virtual elements within an NFV platform, the management software must be able to detect failed controllers, hosts, or VMs immediately. Then it must implement hot data synchronization to avoid dropped calls or loss of data in the event of a failover.

If degraded, the system has to act automatically to recover failed components and restore sparing capability. To do this, the platform must provide a full range of carrier-grade availability APIs (for hot sync and VM monitoring, among other capabilities) that are compatible with the needs of the OSS, orchestration systems, and virtualized network functions (VNFs) deployed by the service provider. In general, a system’s software design must also ensure that no single point of failure can bring down a network component and that no “silent” VM failures go undetected.
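One common way to catch failed or "silent" elements is heartbeat monitoring: every controller, host, or VM reports in periodically, and anything that stays quiet longer than a timeout is flagged for recovery. The Python sketch below illustrates the idea only. The class, its method names, and the 100 ms timeout are assumptions made for this example (chosen to sit well inside the 500 ms recovery budget mentioned above), not the API of any particular NFV platform.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HeartbeatMonitor:
    """Hypothetical detector: flags elements whose heartbeats have stopped."""

    timeout_s: float = 0.1  # assumed detection window, well under 500 ms
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record(self, element_id: str, now: float) -> None:
        """Note that a heartbeat from element_id arrived at time now (seconds)."""
        self.last_seen[element_id] = now

    def failed(self, now: float) -> List[str]:
        """Return elements that have been silent for longer than the timeout."""
        return sorted(
            element
            for element, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


# Timestamps are passed in explicitly so the example is deterministic.
monitor = HeartbeatMonitor(timeout_s=0.1)
monitor.record("vm-1", now=0.00)
monitor.record("vm-2", now=0.00)
monitor.record("vm-1", now=0.08)   # vm-1 keeps reporting; vm-2 goes silent

suspects = monitor.failed(now=0.15)
print(suspects)  # ['vm-2']
```

In a real deployment the detection result would feed the recovery and hot-sync mechanisms described above; those are outside the scope of this sketch.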

Security

Telecom networks have stringent, carrier-grade security requirements that go beyond typical enterprise installations. These capabilities must be designed into the platform as a set of coordinated, fully embedded features, not implemented as a collection of capabilities added onto enterprise-class software.

All observable traffic in a 4G network must be encrypted, and visible user data can’t be stored in the system. For an NFV data center or cloud deployment, operators will need to implement multi-tenant isolation and security to ensure that subscribers can’t access one another’s traffic or data. Also, the network must fully implement AAA security protocols (authentication, authorization, and accounting) to prevent unauthorized access, hacking, or terrorist attacks. Rate limiting, overload protection, and denial-of-service (DoS) protection are essential to secure critical network and inter-VM connections. Other requirements include full protection for the program store and hypervisor; secure, isolated VM networks; and secure password management. Prevention of OpenStack component spoofing is crucial, too.
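Rate limiting, one of the protections listed above, is often implemented with a token bucket: each connection earns tokens at a fixed rate, each request spends one, and requests are rejected when the bucket is empty. The following Python sketch shows the technique in its simplest form. The class name and the rate and burst values are illustrative assumptions, not settings taken from any specific platform.

```python
class TokenBucket:
    """Minimal token-bucket rate limiter (illustrative sketch)."""

    def __init__(self, rate_per_s: float, burst: int) -> None:
        self.rate = rate_per_s          # tokens added per second
        self.capacity = float(burst)    # maximum tokens the bucket can hold
        self.tokens = float(burst)      # start with a full bucket
        self.last = 0.0                 # time of the previous call, in seconds

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time now may proceed."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# A burst of three requests at t=0 against a bucket that allows a burst of two.
bucket = TokenBucket(rate_per_s=10, burst=2)
decisions = [bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.0)]
later = bucket.allow(0.5)  # the bucket has refilled by now

print(decisions, later)  # [True, True, False] True
```

The same structure can be applied per tenant or per inter-VM link so that one misbehaving source cannot starve the others.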

Performance

A carrier-grade network must achieve both high throughput and very low latency for critical real-time applications. In an NFV architecture, throughput depends on the performance of the host virtual switch (vSwitch), which determines the bandwidth delivered to guest VMs. It has to deliver very high bandwidth to the guest VMs over secure tunnels. At the same time, the vSwitch must make minimal use of CPU resources, because the service provider utilizes these processor resources to run revenue-generating services and applications. All VM data-plane processing functions must be accelerated to maximize the revenue-generating payload per watt.

In terms of latency, the platform must guarantee a deterministic interrupt latency of 10 µs or less to make virtualization feasible for the most demanding CPE and access functions. Finally, live migration of VMs must occur with an outage time of less than 150 ms, using a “share nothing” model in which all of a subscriber’s data and state are transferred as part of the migration. The “share nothing” model, which is preferred over the shared-storage model used in enterprise software, ensures that legacy applications are fully supported without having to be rewritten for deployment in NFV.
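A deterministic latency requirement differs from an average one: it is the worst observed case, not the mean, that has to stay within budget. The Python sketch below makes that distinction concrete using the 10 µs and 150 ms limits stated above. The function names and the sample measurements are hypothetical and included purely for illustration.

```python
from statistics import mean
from typing import Sequence

INTERRUPT_LATENCY_BUDGET_US = 10.0   # limit stated in the article
LIVE_MIGRATION_BUDGET_MS = 150.0     # limit stated in the article


def meets_latency_budget(samples_us: Sequence[float]) -> bool:
    """Deterministic check: every sample, including the worst, must fit."""
    return max(samples_us) <= INTERRUPT_LATENCY_BUDGET_US


def meets_migration_budget(outage_ms: float) -> bool:
    """Live-migration outage must be strictly under the budget."""
    return outage_ms < LIVE_MIGRATION_BUDGET_MS


# Hypothetical measurements, in microseconds.
steady = [3.2, 4.1, 9.8]
spiky = [3.2, 4.1, 12.5]   # the mean is under 10 µs, but one sample is not

print(meets_latency_budget(steady))                   # True
print(mean(spiky) < 10.0, meets_latency_budget(spiky))  # True False
print(meets_migration_budget(120.0))                  # True
```

The second case shows why an average-based check is not enough: a platform can look fine on average and still miss the deterministic bound.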

Network Management

It’s essential that a carrier-grade system eliminate both unscheduled and planned downtime for network maintenance. To prevent this downtime, it must support hitless software upgrades and hitless patches. The backup and recovery system has to be fully integrated with the platform software. Finally, support must be implemented for “northbound” APIs that interface the infrastructure platform to the OSS/BSS and NFV orchestration software, including SNMP, NETCONF, XML, REST APIs, OpenStack plug-ins, and ACPI.


Conclusion

Carrier-grade networks maintain strict reliability requirements, but with careful planning and the assistance of telecom-system engineering experts, service providers can build this capability into their NFV deployments. With carrier-grade systems, service providers can confidently commercialize their NFV services, knowing that they will meet business and technology objectives while satisfying their customers. Without this assurance for NFV, service providers risk the loss of high-value customers, whose churn from the network could offset the many business benefits provided by NFV.

Charlie Ashton, director of business development at Wind River, is responsible for initiatives in the networking and telecommunications markets. He has held leadership roles in both engineering and marketing at software, semiconductor, and systems companies, including 6WIND, Green Hills Software, Timesys, Motorola (now Freescale Semiconductor), AMCC (now AppliedMicro), AMD, and Dell.