Customized Processor Extension Speeds Network Cryptology

Collapsing several conventional instructions into one custom instruction yields a performance increase of 92× for 3DES.

Date Posted: September 16, 2002 12:00 AM

Implementation: The desrnd instruction was added to the processor using the standard extension-instruction architecture. The core pipeline could modify the values in the extension registers one at a time; instead, the DES module updates all of them simultaneously as it executes the desrnd instruction. The DES module can perform either a single DES round or two successive DES rounds per cycle with one desrnd instruction.

These extensions also support the ARC processor's scoreboarding capability. Before executing a desrnd instruction, the program can load the C, D, and L registers with LD instructions, load the value of register R into a core register, then move that value from the core register into R. This is necessary because the desrnd instruction writes results into all four extension registers simultaneously.

The IPSec software was modified to use the new instruction if the programmer specifies a conditional compiler directive. There are two levels of software integration:

  • C module: A simple modular integration replaces the key-generation and encryption modules with functional (nearly empty) duplicates. These duplicate modules employ code macros to perform the key generation and encryption/decryption. (Although they retain the 16-element key table, they use only the first element because the desrnd instruction rotates the keys.)
  • In-line assembly: A cleaner, higher-performance in-line assembly implementation inserts the desrnd instruction (also via code macros) directly into the DES- and 3DES-encryption/decryption modules. The revised modules were incorporated into a test program, and the FPGA-based ARCangel prototyping system exercised them against the baseline code for an additional 500 million data/key pairs.
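The remark that only the first key-table element is needed can be illustrated with a small software model of the key rotation. The sketch below is an assumption-laden illustration, not the ARC implementation: it models only the rotation of the 28-bit C and D key halves using the standard DES left-shift schedule from FIPS 46-3, and it leaves out the subkey selection and the round function. The function names are hypothetical.

```python
# Standard DES key-schedule left-shift amounts for rounds 1..16 (FIPS 46-3).
SHIFTS = [1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1]
MASK28 = (1 << 28) - 1


def rotl28(value, count):
    """Rotate a 28-bit value left by `count` bits."""
    count %= 28
    return ((value << count) | (value >> (28 - count))) & MASK28


def rotate_key_halves(c, d, round_index):
    """Return the C and D halves after the rotation for one round (0-based).

    Hypothetical model: how desrnd performs this step in hardware is not
    described in the article.
    """
    shift = SHIFTS[round_index]
    return rotl28(c, shift), rotl28(d, shift)


def run_sixteen_rounds(c, d):
    """Apply the rotation for all 16 rounds and return the final halves."""
    for round_index in range(16):
        c, d = rotate_key_halves(c, d, round_index)
    return c, d
```

Because the 16 shift amounts sum to 28, the halves return to their starting values after a full block. This suggests why software may only need to supply the initial key state when the rotation is done by the instruction itself.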

Verification: To verify the modified DES implementation, the DES algorithm (as defined in FIPS 46-3) was first modeled in software, simulating the operation of the new desrnd instruction and registers via function calls. The developers extracted actual modules from the IPSec stack and added them to the shell code driving the simulated instruction. Executing these modules in parallel with the simulated instruction would verify the correctness of the new implementation.

After the developers tested and verified a few hand-built data/key combinations, they modified the shell to continuously generate random data and keys. They compared more than 5 million data/key combinations over a period of several days. The developers used the same software as a guide while designing the hardware implementation. This gave step-by-step intermediate values to verify correctness during early simulation.
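The random comparison described above follows a general pattern: feed identical random inputs to a reference and a candidate implementation and stop at the first disagreement. The following is a minimal sketch of that pattern only. The two toy functions are placeholders standing in for the extracted IPSec modules and the simulated instruction, and every name here is hypothetical.

```python
import random


def compare_implementations(reference, candidate, trials, seed=0):
    """Run both functions on random 64-bit data/key pairs.

    Returns None if every result matches, otherwise a tuple describing
    the first mismatch: (trial, data, key, expected, actual).
    """
    rng = random.Random(seed)  # fixed seed so a failure can be reproduced
    for trial in range(trials):
        data = rng.getrandbits(64)
        key = rng.getrandbits(64)
        expected = reference(data, key)
        actual = candidate(data, key)
        if expected != actual:
            return (trial, data, key, expected, actual)
    return None


# Toy stand-ins (NOT DES): placeholders for the real reference and
# simulated-instruction code paths.
def toy_reference(data, key):
    return (data ^ key) & 0xFFFFFFFFFFFFFFFF


def toy_candidate(data, key):
    return (key ^ data) & 0xFFFFFFFFFFFFFFFF


def toy_broken(data, key):
    # Deliberately wrong in the lowest bit, to show a reported mismatch.
    return toy_reference(data, key) ^ 1
```

Returning the failing data/key pair, rather than just a pass/fail flag, makes it possible to replay the case against step-by-step intermediate values.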

When they had verified that hardware simulation matched the results of the software, the developers applied individual test vectors supplied in NIST Special Publication 800-17 (MOVS Requirements and Procedures) to the hardware simulation. This provided additional confirmation of the implementation's correctness.

Next, the developers synthesized and loaded an image of the customized processor into the Xilinx FPGA of an ARCangel development system. They then modified the DES simulation to run on this system and tested over 500 million random key/data pairs. The results of the DES simulation and the customized ARC processor were in complete agreement.

Results—Performance Increase: Figure 3 illustrates the performance increase achieved (relative to the baseline for single 3DES encryption) by using the desrnd instruction with simulated TCP/IP data. It also compares two ways of using the desrnd instruction in a program: from a C module and from an in-line assembly-language routine.

Results—Data Rates: Tables 3 and 4 show the 3DES data rates calculated from CPU cycles, assuming a CPU speed of 200 MHz and a DES block size of 64 bits. We used the following formula:

data rate = CPU speed × block size/cycle count
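As a worked instance of this formula, the short sketch below computes a data rate for the stated 200-MHz clock and 64-bit block. The cycle count of 128 is a made-up value chosen only to make the arithmetic easy to follow; it is not taken from Table 3 or Table 4.

```python
def data_rate_bits_per_second(cpu_hz, block_bits, cycles_per_block):
    """data rate = CPU speed x block size / cycle count."""
    if cycles_per_block <= 0:
        raise ValueError("cycles_per_block must be positive")
    return cpu_hz * block_bits / cycles_per_block


CPU_HZ = 200_000_000        # 200 MHz, as assumed in the article
BLOCK_BITS = 64             # DES block size
HYPOTHETICAL_CYCLES = 128   # illustrative only, not a measured value

rate = data_rate_bits_per_second(CPU_HZ, BLOCK_BITS, HYPOTHETICAL_CYCLES)
rate_mbps = rate / 1e6      # 100.0 Mbits/s for these inputs
```

Halving the cycle count doubles the data rate, which is why performing two DES rounds per cycle matters for throughput.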

The data rates for 3DES were calculated via two different software implementations. One employed the desrnd instruction in a C module, while the other implemented it with an in-line assembly-language routine. The data rates also were calculated for one DES round per cycle (Table 3, again) and two DES rounds per cycle (Table 4, again).

Results—Code Size: Although the primary goal of the DES extension was to accelerate performance, a fringe benefit of collapsing several conventional instructions into a single custom instruction is smaller code size. The actual reduction is about 4 kbytes, or approximately 63% of the low-level DES-CBC code, important for applications that have limited memory resources (Table 5).

To sum up, extending a user-customizable processor improves the performance of compute-intensive algorithms, such as those required in the IPSec protocol. In this instance, an increase of 92× was achieved with 3DES. Any application that needs high-performance DES processing, such as Secure Socket Layer (SSL), Transport Layer Security (TLS), and Encrypted PPP, could potentially benefit from the custom DES extension to ARC's processor.
