When it comes to processing wire-speed packets with a time-to-market that ASIC system designers will envy, network processors promise to beat the pants off conventional processors. Envelope-pushing network-hardware designers simply have to decide which network processor to use. But network-processor architectures are so varied that designers need to choose carefully.
Designers also must determine which network processors will address their final product's market. The term network processor has been applied to a range of products, from Ethernet-to-DSL controllers to terabit switches. Many favored processors address mid- to high-end switching with speeds that range from OC-3 (155-Mbit/s) enterprise solutions to OC-192 (10-Gbit/s) channels used by ISPs and carriers. This range includes OC-12 (622 Mbits/s), OC-48 (2.4 Gbits/s), and Gigabit Ethernet. OC-768 is a long-term target for many network-processor vendors.
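The optical-carrier rates quoted above follow from the SONET convention that OC-n runs at n times the 51.84-Mbit/s OC-1 base rate. The short Python sketch below simply tabulates that arithmetic; the function name is an illustrative choice, and the figures are nominal line rates rather than usable payload bandwidth.

```python
# Nominal SONET line rates: OC-n is n times the OC-1 base rate.
OC1_MBITS = 51.84  # OC-1 base rate in Mbit/s


def oc_rate_mbits(n):
    """Return the nominal line rate of OC-n in Mbit/s (illustrative helper)."""
    if n < 1:
        raise ValueError("OC level must be a positive integer")
    return n * OC1_MBITS


# Levels mentioned in the article.
for level in (3, 12, 48, 192, 768):
    print(f"OC-{level}: {oc_rate_mbits(level):,.2f} Mbit/s")
```

The rounded figures in the text (155 Mbits/s, 622 Mbits/s, 2.4 Gbits/s, 10 Gbits/s) are these nominal rates expressed loosely.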
Operating at wire speeds is critical. When comparing alternatives, though, designers must weigh network-processor vendors' performance claims against the services their products need to support. Many vendors provide performance numbers based on basic routing and switching policies addressed by layers 2 through 4 of the International Organization for Standardization's (ISO's) Open Systems Interconnection (OSI) reference model.
The more demanding services at layers 5 through 7 include address-content switching, URL switching, security, load balancing, service-level agreements (SLAs), network-address translation (NAT), multiprotocol label switching (MPLS), and voice-over-IP (VoIP). Supporting these services often reduces the maximum bandwidth that the network processor can sustain. Some solutions utilize multiple network processors while others restrict designs to a single network processor.
Though the enterprise network-processor space is at the low end of the performance scale, it's usually at the high end of the service scale. Some solutions approach the commodity level. For example, Switchcore's CXE-16 is a 16-port Gigabit Ethernet switch/router on a chip. Just add some Rambus RDRAM, some optional content-addressable memory (CAM), and an external control processor for a complete solution. There is room to add value, but not nearly as much as with some of the more expensive alternatives.
The IXP1200 Internet Exchange Processor by Intel is the quintessential network processor. Its six integrated programmable microengines each have hardware context support (32 registers) for four threads, for a total of 24 active threads (Fig. 1). An on-chip 200-MHz StrongARM processor coordinates system activities, although a PCI-bus interface provides integration with an external control processor.
The idea behind the hardware context switch is to maintain high utilization of resources, such as the built-in coprocessors and memory-access subsystem. Writing code that exploits it can be a complex programming task. A single IXP1200 may suffice, but multiple chips can be combined using a variety of bus architectures to increase throughput and functionality. This is another area that lets designers differentiate their product from the competition when providing layers 2 through 7 services.
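The utilization argument can be illustrated with an idealized model. The Python sketch below assumes that each thread alternates a fixed burst of computation with a fixed memory stall and that switching contexts costs nothing. The cycle counts are hypothetical and are not IXP1200 measurements; only the engine and thread counts come from the text.

```python
# Idealized latency-hiding model; all cycle counts are hypothetical.
MICROENGINES = 6         # from the article
THREADS_PER_ENGINE = 4   # from the article


def engine_utilization(threads, compute_cycles, stall_cycles):
    """Fraction of cycles one engine spends on useful work.

    Assumes every thread repeats compute_cycles of work followed by
    stall_cycles of memory wait, and that a context switch is free.
    """
    if threads < 1 or compute_cycles < 1 or stall_cycles < 0:
        raise ValueError("invalid model parameters")
    busy = threads * compute_cycles
    period = compute_cycles + stall_cycles
    return min(1.0, busy / period)


total_threads = MICROENGINES * THREADS_PER_ENGINE  # 24 active threads

# Show how extra hardware contexts cover a 30-cycle stall (assumed value).
for n in range(1, THREADS_PER_ENGINE + 1):
    u = engine_utilization(n, compute_cycles=10, stall_cycles=30)
    print(f"{n} thread(s): {u:.0%} utilization")
```

Under these assumed numbers, one thread keeps an engine busy a quarter of the time, while four threads cover the stall completely. A longer stall or shorter compute burst would leave idle cycles even with four contexts, which is part of why tuning such code is hard.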
The current offering from Allayer Technologies addresses layers 2 through 4. The AL100 uses a ring-of-switches (ROX) bus architecture. The bus supports up to four network processors, plus additional coprocessors that offer switch management and other services. Multiple processors provide incremental improvement and migration to the next-generation AL3000. The 12.8-Gbit/s ROX-II bus will supply a major leap in performance.
Under the watchful eye of an on-chip PowerPC core, IBM Microelectronics packs 16 programmable protocol processors into its network processor. A PCI control-bus interface provides access to an external control processor. The protocol processors are paired to share on-chip hardware coprocessors that accelerate tree searching and frame manipulation. IBM's design utilizes less-expensive DDR DRAM while supporting OC-48 rates. As with most network processors, the amount of traffic that the device can handle depends upon the type of analysis performed on each frame. Higher-level protocols force the processors to look deeper into the data, which reduces throughput.
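The tradeoff between depth of analysis and throughput can be framed as a per-packet cycle budget. The Python sketch below is a rough estimate under stated assumptions: a worst case of back-to-back 40-byte packets, no framing overhead, perfect load balancing across processors, and a 133-MHz clock. The packet size and clock rate are hypothetical values chosen for illustration; only the OC-48 rate and the count of 16 protocol processors come from the text.

```python
# Rough per-packet cycle budget at wire speed; inputs marked "assumed"
# are hypothetical and not vendor figures.
OC1_BITS_PER_S = 51.84e6  # SONET OC-1 base rate


def packets_per_second(line_rate_bps, packet_bytes):
    """Packet arrival rate for back-to-back packets, ignoring framing."""
    if line_rate_bps <= 0 or packet_bytes <= 0:
        raise ValueError("rate and packet size must be positive")
    return line_rate_bps / (packet_bytes * 8)


def cycles_per_packet(clock_hz, processors, pps):
    """Aggregate processor cycles available for each arriving packet."""
    if clock_hz <= 0 or processors < 1 or pps <= 0:
        raise ValueError("invalid budget parameters")
    return clock_hz * processors / pps


oc48_bps = 48 * OC1_BITS_PER_S
pps = packets_per_second(oc48_bps, packet_bytes=40)   # assumed packet size
budget = cycles_per_packet(clock_hz=133e6,            # assumed clock rate
                           processors=16,             # from the article
                           pps=pps)
print(f"{pps / 1e6:.2f} Mpackets/s, about {budget:.0f} cycles per packet")
```

Every instruction spent parsing higher-layer headers comes out of this fixed budget, so deeper inspection either lowers the sustainable packet rate or requires more processors.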
The CS2000 reconfigurable communication processors (RCPs) from Chameleon Systems offer an interesting alternative to the fixed configuration of most network processors (see "Scalable, Reconfigurable Processor Adjusts Logic For Top Performance," Electronic Design, May 15, p. 66). For data processing, they contain a dozen identical but configurable tiles, which are organized into four slices (Fig. 2). Each tile has a control unit, seven 32-bit datapaths, two 16- by 24-bit single-cycle multipliers, and four 32-bit, 128-word memory blocks. The CS2000 also has 16 DMA engines, and its components are tied to the internal 128-bit RoadRunner system bus.
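The per-tile figures can be rolled up into chip-level totals with simple arithmetic. The Python sketch below uses only the counts given above; the class and field names are illustrative, and the even split of three tiles per slice is an assumption inferred from 12 tiles in four slices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileSpec:
    """Per-tile resources as listed in the article (names are illustrative)."""
    datapaths: int = 7          # 32-bit datapaths
    multipliers: int = 2        # 16- by 24-bit single-cycle multipliers
    memory_blocks: int = 4      # local memory blocks
    words_per_block: int = 128
    word_bits: int = 32

    def memory_bytes(self):
        """Local memory per tile in bytes."""
        return self.memory_blocks * self.words_per_block * self.word_bits // 8


TILES = 12
SLICES = 4

tile = TileSpec()
totals = {
    "tiles_per_slice": TILES // SLICES,  # assumes an even split
    "datapaths": TILES * tile.datapaths,
    "multipliers": TILES * tile.multipliers,
    "memory_bytes": TILES * tile.memory_bytes(),
}

for name, value in totals.items():
    print(f"{name}: {value}")
```

On these figures, the fabric offers 84 datapaths, 24 multipliers, and 24 kbytes of local tile memory.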
The ability to reconfigure all or part of the CS2000 on-the-fly impacts the overall system design as well as individual algorithms. Unfortunately, configuration switching isn't instantaneous, thereby limiting the throughput that can be handled under these circumstances. Even so, it allows efficient processing algorithms to be implemented with customizable hardware that usually performs better than software designed to perform the same job.
The high-end network-processor space maximizes performance. Compared to enterprise solutions, though, functionality is often sacrificed for speed. Still, features like quality of service (QoS) are usually included.
Formerly known as Agere Inc., Lucent Network Processors uses a pair of chips to handle OC-48 rates: the Fast Pattern Processor (FPP) and the Routing Switch Processor (RSP). Many OC-48 products actually support four OC-12 ports. Lucent indicates that its design should scale to OC-192 rates. It seems a good bet given that the FPP consists of only 4 million transistors, versus 53 million for C-Port. The company keeps costs down by utilizing PC133 SDRAM for off-chip memory. The PCI interface handles external management.
The FPP is programmed in a functional programming language that lends itself to the chip's architecture, which differs from that of a more conventional processor or a state machine. Functional programming may be new to many designers. Lucent, however, supplies programming samples, multiprotocol routing, and the segmentation and reassembly (SAR) needed for IP over asynchronous transfer mode (ATM), one of the target markets.