Electronic Design

  


[Design Application]
Use Embedded RISC Processing To Boost Router Power
When CISC Stops Dead In Its Tracks, Special RISC Instructions And Versatile Computational Add-Ons Can Increase Router Performance.

Contributing Author  |   ED Online ID #7598  |   June 22, 1998


Network routers are increasingly performance hungry because local-area networks (LANs) and wide-area networks (WANs) currently operate faster than ever. Ten-Mbit/s Ethernet technology has been the mainstay for the past two decades. But now the industry is moving en masse into Fast Ethernet (100 Mbits/s), and even gigabit Ethernet (1000 Mbits/s).

As a result, the number and throughput of datapaths coming in and out of a router are increasing. The system engineer must, therefore, design in more processing performance for the system to perform routing functions efficiently. Take, for example, a 25-MHz system moving data between a 10-Mbit/s Ethernet LAN and a WAN. This level of processing is adequate for up to 6000 packets per second (pps) or, assuming packets of 200 bytes or more, about 10 Mbits/s, which is the limit of 10-Mbit/s Ethernet.
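The packet-rate figure can be checked with quick arithmetic. The short Python sketch below converts a packet rate and packet size into payload throughput. The function name is ours, chosen for illustration, and the calculation ignores Ethernet framing overhead.

```python
def throughput_mbps(packets_per_second, packet_bytes):
    """Payload throughput in Mbits/s, ignoring framing overhead."""
    return packets_per_second * packet_bytes * 8 / 1_000_000


# 6000 pps of 200-byte packets, the figures quoted in the text
print(throughput_mbps(6000, 200))
```

At 200 bytes per packet, 6000 pps works out to roughly 9.6 Mbits/s, so slightly larger packets bring the load up to the 10-Mbit/s limit.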

However, if this LAN is migrated to 100-Mbit/s technology, with a corresponding WAN capability, throughput rises considerably, and with it the need for a faster processor with performance add-ons. To date, conventional CISC processors have been used as router CPUs. However, embedded RISC processing is making inroads into this CISC-dominated design arena. There are several design considerations and trade-offs in choosing embedded RISC over CISC for next-generation routers, which today splinter into various levels of design requirements, from high-end, high-performance models to newer, low-end access routers.

There's been an ongoing popular belief among CISC devotees that CISC processing is faster than RISC processing. The truth of the matter is that a typical CISC processor can perform a task in one instruction, while that same task would require two-to-three RISC instructions. However, a CISC processor requires multiple clock cycles per instruction: typically, at least three clock cycles of throughput execution time for the simplest instructions, and on the order of 12 to 24 clock cycles for more complex instructions. Conversely, a RISC processor takes a single clock cycle for each instruction.

Therefore, two-to-three RISC instructions taking at most three clock cycles are more attractive from a performance point of view than one CISC instruction taking at least three clock cycles, and frequently many more. Also, because RISC instructions are simpler, the processor can run at substantially higher clock rates. By combining single-cycle execution with high clock rates, the RISC processor can provide more than three times the processing power of a CISC processor in a typical application.

But comparing actual processor performance is more difficult. There is a wide variety of CISC processors with an assortment of instructions and instruction timings. Furthermore, different applications will use the various instructions in different ways. If your application seldom uses the "XYZ" instruction, it may not be worth paying extra for a processor that executes "XYZ" in few clock cycles. Different applications, processors, and programming styles will always generate different results.

Let's compare the implementation of a simple ring-buffer put routine on a CISC processor and a RISC processor (Table 1). The put routine is a simplified version of a common routine used to implement a ring buffer. Here, a value in a data register is stored in the next location of a ring buffer in memory, and the next location pointer is incremented and wrapped back to the start of the buffer, if necessary.
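For readers who want the logic of the put routine spelled out, here is a minimal high-level sketch in Python. It is not the Table 1 listing, which uses processor instructions. The class and attribute names are ours, and, like the simplified routine described above, it makes no check for a full buffer.

```python
class RingBuffer:
    """Illustrative model of the simplified ring buffer described in the text."""

    def __init__(self, size):
        self.slots = [0] * size   # buffer storage in "memory"
        self.next_index = 0       # the "next location" pointer

    def put(self, value):
        # Store the value in the next location of the ring buffer.
        self.slots[self.next_index] = value
        # Increment the pointer, wrapping back to the start if necessary.
        self.next_index += 1
        if self.next_index == len(self.slots):
            self.next_index = 0


buf = RingBuffer(4)
for word in (10, 20, 30, 40, 50):
    buf.put(word)
print(buf.slots, buf.next_index)
```

The fifth put wraps around and overwrites the oldest entry, which is the behavior the pointer-wrap step implies when nothing drains the buffer.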

No special instructions were used for this comparison. The code looks at general instructions used to implement the put routine on a 68000-style processor, and on a MIPS RISC-style processor. Estimated clock execution times (throughput for pipelined processors) are provided for each implementation. For the CISC implementation, the estimated execution clocks for both a low-end processor and a high-end processor (in parentheses) are given. Both the RISC and the higher-end CISC timings assume full caching of data and instructions.

From this example, it can be seen that the high-end CISC processor executes in nearly the same number of clock cycles as the RISC processor. This is a result of using the simplest instructions with fast execution times to implement a very simple routine. Still, it can be seen in lines four and five that the CISC processor implements the required operation with two instructions requiring five-to-six clock cycles, where the RISC processor requires three instructions and three clock cycles. More-complex instructions show a wider difference, both in the number of RISC instructions needed for the equivalent operation and in the number of clock cycles the CISC instruction takes to execute.

The other number, not shown in Table 1, is the difference in maximum clock frequency of these processors. The high-end CISC processor here typically tops out around 40 MHz, while RISC processor equivalents exceeding 100 MHz are readily available. When the system designer evaluates processors from a performance standpoint, it's difficult to make exact comparisons without implementing the entire design in each processor. There are, however, some general guidelines. The RISC processor will typically execute in fewer clock cycles, require more instructions for the equivalent application, and will be available at much higher clock frequencies.
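To see how cycle counts and clock frequency combine, the sketch below converts the cycle counts quoted for lines four and five of Table 1 into execution time. It assumes, purely for illustration, a 40-MHz CISC part and a 100-MHz RISC part, the round figures mentioned above. The function name is ours.

```python
def execution_time_ns(clock_cycles, clock_mhz):
    """Time to execute a given number of clock cycles, in nanoseconds."""
    return clock_cycles * 1000 / clock_mhz


risc_ns = execution_time_ns(3, 100)       # three single-cycle instructions
cisc_best_ns = execution_time_ns(5, 40)   # two instructions, five cycles
cisc_worst_ns = execution_time_ns(6, 40)  # two instructions, six cycles

print(risc_ns, cisc_best_ns, cisc_worst_ns)
print(cisc_best_ns / risc_ns, cisc_worst_ns / risc_ns)
```

Under these assumed clock rates, the RISC sequence finishes in 30 ns against 125 to 150 ns for the CISC sequence, a ratio of roughly four to five. Actual results depend on the specific processors and on the full instruction mix.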

In a router design, data packets are handled by software with a minimum of hardware intervention. Legacy-software-based high-end routers can process about 500,000 to one million 64-byte packets per second. A single gigabit Ethernet interface can pump over 1.4 million such packets in each direction. This tells the system designer that future-generation Layer 3 routers and switches will not be able to depend solely on software for packet filtering and forwarding. Or, put another way, conventional CISC processing has little likelihood of solving this problem.
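The 1.4-million figure follows from standard Ethernet framing. On the wire, each frame is preceded by 8 bytes of preamble and followed by a 12-byte minimum interframe gap. The sketch below, with names of our own choosing, computes the resulting maximum frame rate for a given link speed.

```python
PREAMBLE_BYTES = 8          # preamble plus start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12   # minimum idle time between frames


def max_frames_per_second(link_bits_per_second, frame_bytes=64):
    """Maximum whole frames per second in one direction on an Ethernet link."""
    bits_on_wire = (frame_bytes + PREAMBLE_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bits_per_second // bits_on_wire


for name, speed in (("10 Mbits/s", 10_000_000),
                    ("100 Mbits/s", 100_000_000),
                    ("1000 Mbits/s", 1_000_000_000)):
    print(name, max_frames_per_second(speed))
```

For gigabit Ethernet this gives about 1.49 million 64-byte frames per second in each direction, well above the 500,000 to one million packets per second quoted for software-based routers.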

