Electronic Design

  
[Design Application]
Customized Processor Extension Speeds Network Cryptology
Collapsing several conventional instructions into one custom instruction yields a performance increase of 92× for 3DES.

Peter Davies, Steve Robsky  |   ED Online ID #2752  |   September 16, 2002


As public data networks, online commerce, and smart cards become more popular, the need for secure data transmission grows. But the complex computations required to encrypt and decrypt data can quickly become a performance bottleneck. When servers spend more time handling the overhead of secure connections, network delays increase, lowering productivity and leaving customers dissatisfied. Fast encryption and decryption therefore provide a distinct competitive advantage.

The Data Encryption Standard (DES) is the U.S. government-approved cryptographic algorithm specified in Federal Information Processing Standards (FIPS) Publication 46-3, issued by the National Institute of Standards and Technology (NIST). The most widely used encryption algorithm in the world, it's utilized by banks for electronic fund transactions and by government agencies for communication systems.

The primary goal of the project described in this article was to reduce the CPU's workload for Internet Protocol Security (IPSec) encryption and decryption. This frees up CPU resources for other tasks without increasing the CPU's clock frequency. Alternatively, it could enable the CPU to run at a lower clock frequency to reduce power consumption, simplify system design, and cut costs without sacrificing performance.

To do this, we developed a DES extension for the user-customizable ARC processor, using IPSec protocol software together with software-development and analysis tools. To analyze and profile the compute-intensive routines in the DES algorithm, we started from a public-domain C implementation.

The result was an overall performance increase of 47× with single DES, and 92× with triple DES (3DES). These extraordinary gains substantially reduce the CPU's cycle count when executing these tasks. Another benefit is a 63% decrease in code size.

Data-Encryption Standard: The DES algorithm works on 64-bit data blocks in 16 repeated cycles, or rounds, under the control of a 56-bit encryption key. Each round uses a sub-key generated from the original key, and this regular, repeated structure makes the algorithm well suited to hardware acceleration. The data supplied to the algorithm is called plaintext, and the resulting data is known as ciphertext. To encipher a 64-bit data block, the DES algorithm performs three functions (Fig. 1):

  1. Initial permutation (IP)
  2. Complex key-dependent computations
  3. Final or inverse initial permutation (IP⁻¹)
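
The first and third steps can be sketched in a few lines of Python. This is only an illustration, not the code profiled in the project: the helper names `permute` and `split_halves` are hypothetical, the IP table is generated from the regular pattern of the table published in FIPS 46-3 (bits numbered 1 to 64 from the most significant bit), and the key-dependent rounds in between are left out entirely.

```python
# Initial permutation table as published in FIPS 46-3, generated from its
# regular pattern. Entries are 1-based source bit positions (bit 1 = MSB).
IP = [base - 8 * j for base in (58, 60, 62, 64, 57, 59, 61, 63) for j in range(8)]

# The final permutation is the inverse of IP: FP[m-1] = k where IP[k-1] = m.
FP = [IP.index(m) + 1 for m in range(1, 65)]


def permute(block, table, in_bits=64):
    """Return a word whose i-th bit (MSB first) is input bit table[i]."""
    out = 0
    for src in table:
        out = (out << 1) | ((block >> (in_bits - src)) & 1)
    return out


def split_halves(block):
    """Split a 64-bit block into the 32-bit L and R halves used by the rounds."""
    return (block >> 32) & 0xFFFFFFFF, block & 0xFFFFFFFF


plaintext = 0x0123456789ABCDEF           # arbitrary example block
permuted = permute(plaintext, IP)        # step 1: initial permutation
left, right = split_halves(permuted)     # inputs to the 16 rounds (not shown)
restored = permute(permuted, FP)         # step 3 applied directly undoes step 1
```

Because IP and IP⁻¹ are fixed bit rearrangements, they cost many shift-and-mask operations in software but amount to simple wiring in hardware.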

To decipher the data, the program uses the same algorithm and key that enciphered the data. However, it alters the addressing of the key bits so the deciphering process is the reverse of the enciphering process.
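
A small sketch can show why the same algorithm deciphers the data when the sub-keys are applied in reverse order. It assumes a 16-round structure in which each round swaps the halves and XORs one half with a function of the other. The round function `toy_f` and the sub-key values are made-up placeholders, not the DES cipher function or a real key schedule; only the round structure is the point.

```python
MASK32 = 0xFFFFFFFF


def toy_f(right, subkey):
    """Placeholder round function. This is NOT the DES cipher function f."""
    return ((right * 0x9E3779B1) ^ subkey ^ (right >> 7)) & MASK32


def feistel(left, right, subkeys):
    """Run one round per sub-key, then swap the halves as DES does at the end."""
    for k in subkeys:
        left, right = right, left ^ toy_f(right, k)
    return right, left


# Made-up 32-bit sub-keys standing in for the 16 values a key schedule produces.
subkeys = [(0x0F1E2D3C * (i + 1)) & MASK32 for i in range(16)]

pt = (0x01234567, 0x89ABCDEF)                 # example (L, R) halves
ct = feistel(*pt, subkeys)                    # encipher: sub-keys 1..16
back = feistel(*ct, list(reversed(subkeys)))  # decipher: sub-keys 16..1
```

The round function never has to be inverted: each deciphering round recomputes the same value and XORs it away, so one datapath can serve both directions.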

The key-dependent computation can be defined in terms of the cipher function f and the key schedule, KS. The key schedule produces 16 sub-keys based on the 56-bit secret key.
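
In FIPS 46-3, the key schedule splits the 56 selected key bits into two 28-bit halves, C and D, rotates both left by one or two positions before each round, and then selects 48 bits of the result as that round's sub-key. The sketch below models only the rotation step. The two bit-selection tables (permuted choice 1 and 2) are omitted, the starting values are arbitrary, and the function names are hypothetical.

```python
# Left-rotation amounts for rounds 1..16, as specified in FIPS 46-3.
SHIFTS = [1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1]
MASK28 = (1 << 28) - 1


def rotl28(value, count):
    """Rotate a 28-bit value left by count bit positions."""
    return ((value << count) | (value >> (28 - count))) & MASK28


def key_halves_per_round(c0, d0):
    """Return the (C, D) pair each round's sub-key would be selected from."""
    c, d = c0, d0
    states = []
    for shift in SHIFTS:
        c, d = rotl28(c, shift), rotl28(d, shift)
        states.append((c, d))
    return states


c0, d0 = 0x0F0F0F1, 0x2AAAAAB   # arbitrary 28-bit example values, not a real key
states = key_halves_per_round(c0, d0)
```

The sixteen shifts add up to 28, so C and D return to their starting values after the last round.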

Initial Investigation: The starting point for the acceleration project was to analyze some DES software that's optimized for general-purpose 32-bit microprocessors. The original idea was to create a new instruction that would perform an entire DES round in one clock cycle.

Sixteen iterations of this instruction would perform nearly all of the work needed to encrypt one 64-bit block. Additional instructions would perform the initial and final permutations on the data block and execute the initial contraction and permutation on the key data. These instructions would employ four custom extension registers (named C, D, L, and R) to hold the key and data block values during processing.

Further investigation showed that the ARC processor could perform the data permutations as part of the round operation. Also, the key permutation could happen automatically while loading the extension registers. Therefore, the final implementation consists of one new extension instruction operating on four custom 32-bit extension registers.

