Task-Optimized Architecture Lets Network Processor Handle 10-Gbit/s Wire-Speed Applications

Oct. 30, 2000

As network speeds move from 1-Gbit/s transmission rates to 10 Gbits/s, software-controlled RISC processors can no longer deliver the performance needed to handle packets at wire speed. But by developing a task-optimized processing (TOP) approach, designers at EZchip Technologies in Migdal Haemek, Israel, have overcome many of the speed limitations. They've developed a network processor that can handle ISO OSI reference model layers 1 through 7 at 10-Gbit/s wire speeds.

The TOP approach starts with a core building block, the TOPcore. It contains multiple task-optimized processors that operate in parallel, along with dedicated blocks of memory to support them. These TOP processors deliver roughly a tenfold throughput improvement over even application-targeted RISC network-processor chips. To achieve this, EZchip's designers crafted a series of TOP engines.

A TOP engine is dedicated to each packet-processing task. Each engine has a customized instruction set and datapath to handle its intended task with maximum throughput. This reduces the number of clock cycles needed per task. It also lets the engines operate in parallel to get more work done every cycle.

Typically, packet processors handle four main operations: parsing, searching, resolving, and modifying. In the basic TOP system architecture, packet data enters the chip and flows first to a TOPparse engine, then to a TOPsearch engine, then to a TOPresolve engine, and finally to a TOPmodify engine before leaving the chip for its destination.
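
The four-stage flow can be sketched in a few lines of Python. This is purely illustrative: the packet layout, table contents, and function interfaces are assumptions for the sketch, not EZchip's actual design.

```python
# Hypothetical sketch of the parse -> search -> resolve -> modify pipeline.
# Field offsets, the forwarding table, and port names are all illustrative.

def top_parse(packet):
    """Extract header fields (here: a fixed-offset protocol byte)."""
    return {"raw": packet, "proto": packet[0]}

def top_search(ctx):
    """Look the extracted key up in a forwarding table."""
    table = {0x04: "port2", 0x06: "port7"}  # illustrative entries
    ctx["lookup"] = table.get(ctx["proto"], "drop")
    return ctx

def top_resolve(ctx):
    """Choose the output port/queue from the search result."""
    ctx["out_port"] = ctx["lookup"]
    return ctx

def top_modify(ctx):
    """Rewrite the packet before it leaves (here: decrement a TTL-like byte)."""
    raw = bytearray(ctx["raw"])
    raw[1] -= 1  # assumes the field is nonzero in this sketch
    return bytes(raw), ctx["out_port"]

def process(packet):
    return top_modify(top_resolve(top_search(top_parse(packet))))
```

In hardware these stages run concurrently on different packets; the chained calls above only show the per-packet order of operations.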

The TOPparse block identifies and extracts the various headers and fields within the packet. It handles all seven layers of the ISO OSI reference model, including fields with dynamic offsets and lengths. The TOPsearch engine performs the table lookups required for layer-2 switching, layer-3 routing, layer-4 session switching, and layer-5 through -7 content switching and policy enforcement. Special support enables wire-speed layer-5 through -7 processing of items such as text strings (URLs, for example), which are often long and vary in size.
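
A "dynamic offset" simply means the location of one field depends on the contents of another. A standard example (not specific to EZchip's hardware) is the IPv4 header, whose IHL field determines where the layer-4 header begins:

```python
def ipv4_l4_offset(packet: bytes) -> int:
    # The IPv4 header carries its own length in the IHL nibble
    # (low 4 bits of the first byte, counted in 32-bit words), so the
    # layer-4 header starts at a dynamic offset of IHL * 4 bytes.
    ihl = packet[0] & 0x0F
    return ihl * 4
```

A header with no options (first byte 0x45) puts layer 4 at byte 20; each option word pushes it 4 bytes further.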

Traffic Accounting Info Gathered

Next, the TOPresolve engine assigns the packet to its appropriate output port and queue. It forwards the packet to multiple ports for multicast applications as well. This block also gathers traffic accounting information on a per-flow basis, which then permits the network manager to analyze network usage and determine billing charges.
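
Per-flow accounting of the kind described can be modeled as counters keyed on a flow identifier. A minimal sketch, using the classic 5-tuple as the flow key (an assumption; the article doesn't specify how TOPresolve identifies flows):

```python
from collections import defaultdict

# Per-flow packet and byte counters, keyed on the 5-tuple
# (source addr, dest addr, source port, dest port, protocol).
flow_stats = defaultdict(lambda: {"packets": 0, "bytes": 0})

def account(src, dst, sport, dport, proto, length):
    """Update the counters for the flow this packet belongs to."""
    stats = flow_stats[(src, dst, sport, dport, proto)]
    stats["packets"] += 1
    stats["bytes"] += length
```

A billing or usage-analysis tool would then periodically read out and reset these counters.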

Finally, the TOPmodify block adjusts the packet contents in accordance with the results of the previous stages. It modifies VLAN assignments and other relevant fields. Additionally, it performs network address translation while managing quality-of-service (QoS) priority settings and other functions. Multiple copies of each block are integrated on the chip, along with a large amount of embedded local memory to reduce access time (see the figure).
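
Network address translation is a representative modify-stage task: rewriting an address also forces the header checksum to be refreshed. A generic IPv4 sketch (again, not EZchip's implementation) looks like this:

```python
def ipv4_checksum(header: bytes) -> int:
    """Standard IPv4 header checksum: ones'-complement sum of 16-bit
    words, with the checksum field itself (bytes 10-11) treated as zero."""
    total = 0
    for i in range(0, len(header), 2):
        word = (header[i] << 8) | header[i + 1]
        if i == 10:          # skip the checksum field itself
            word = 0
        total += word
    while total >> 16:       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def nat_rewrite_src(header: bytearray, new_src: bytes) -> None:
    """Rewrite the source address (bytes 12-15) and refresh the checksum."""
    header[12:16] = new_src
    header[10:12] = ipv4_checksum(bytes(header)).to_bytes(2, "big")
```

After the rewrite, summing every 16-bit word of the header (including the checksum) and folding the carries yields 0xFFFF, which is how a receiver validates it.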

The local memory gives the TOP engines an extremely wide, 512-bit interface. An interface this wide wouldn't be possible if the memory weren't physically integrated on the chip, and it greatly improves the data bandwidth.

Software commands, downloaded from a system's host processor, control the TOPcore operation. Any change in network policy or other functions can be implemented with a simple software update. The software controls the superpipelines and superscalar pipeline of the TOPcore engines. The packet-processing tasks are pipelined. Packets pass from the TOPparse to the TOPsearch, to the TOPresolve, and finally to the TOPmodify blocks.

Superpipelining accelerates the execution of the operations. It can be scaled as integration levels increase. Multiple instruction pipelines operate in parallel in a superscalar architecture to let several different instructions execute in parallel during a single cycle.
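
The throughput benefit of pipelining is easy to see with a back-of-the-envelope count (a generic sketch, not a model of the TOPcore's actual timing):

```python
def pipeline_stage_times(num_packets: int, num_stages: int = 4) -> int:
    # Once the pipeline is full, one packet completes per stage-time,
    # so N packets finish in (num_stages - 1) + N stage-times instead
    # of the N * num_stages a purely serial engine would need.
    return (num_stages - 1) + num_packets
```

With four stages, 100 packets finish in 103 stage-times rather than 400, and the advantage grows with traffic volume.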

For operations such as parsing a URL in an HTTP/RTSP packet, the TOPcore solution requires about 60 clock cycles, where a generic RISC processor could require as many as 400. To search URL tables, the TOPcore can use just six cycles; a RISC processor might need up to 200. And for tasks like resolving a multicast routing decision, the TOPcore needs just eight cycles, while a RISC engine might require as many as 80.

The company expects to sample the EZswitch chip near the end of this year. For more information, point your browser to www.ezchip.com, or call (972) 644-9966.
