Factors To Consider When Choosing The Right DSP For The Job

DSP Performance Isn't Just About MIPS. Application-Specific Issues Can Strongly Affect A Chip's Performance.


Contributing Author

June 08, 1998


The recently introduced Texas Instruments TMS320C67x and the well-established Analog Devices ADSP-2106x SHARC processors are the two highest-performance floating-point DSPs on the market today. Which of the two provides the higher system performance? As we shall see, the answer depends on the kind of task you're trying to perform. Keep in mind that Analog Devices (ADI) will be releasing its next-generation SHARC processors, and Texas Instruments (TI) has an aggressive plan to increase the speed of the 'C67x range.

System engineers must select the device that provides the most effective solution to meet the requirements of their DSP application. While the obvious step is to compare the raw processing power of the two processors, this comparison will give little indication of expected system performance, especially in highly demanding multiprocessing applications.

Choosing the most suitable DSP platform, from a systems perspective, requires an analysis of many aspects of the application. First, the I/O data rates and channel density must be reviewed to determine the bandwidth in and out of the system.
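As a rough illustration of that first step, the sketch below estimates the bandwidth into a system from its channel count and sample format. The function name and the example figures (16 channels, 1-MHz sampling, 16-bit samples) are hypothetical and are not taken from any application discussed here.

```python
def io_bandwidth_bytes_per_s(channels: int, sample_rate_hz: int, bytes_per_sample: int) -> int:
    """Aggregate data rate for a set of identical input (or output) channels."""
    return channels * sample_rate_hz * bytes_per_sample


# Hypothetical front end: 16 channels, 1-MHz sampling, 16-bit samples.
inbound = io_bandwidth_bytes_per_s(channels=16, sample_rate_hz=1_000_000, bytes_per_sample=2)
print(f"Inbound bandwidth: {inbound / 1e6:.1f} Mbytes/s")
```

The same calculation applies to the output side; the sum of the two gives a first estimate of the bandwidth the chosen DSP must sustain.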

The next step involves the mapping of DSP algorithms to DSP devices. This may be complex, and requires an understanding of I/O data paths, memory management, interprocessor communication capability, and synchronization mechanisms. While the resolution of these issues determines the best technical solution, other factors also require consideration. For example, time-to-market is influenced by the availability of third-party library support, and the characteristics of the development tools accompanying each processor.

A comparison of the two components logically begins with an analysis of the features of each device. Rather than a comprehensive feature list, this section summarizes the features that differentiate the performance of each (see the table). Full specifications are available in the data sheets provided by each vendor. As a detailed specification was not available for the 'C67x at the time of this writing, some parameters (e.g. power consumption) are not addressed here.

From the table, it is clear that the 'C6701 outperforms the 21060 in single-processor, low- and medium-bandwidth configurations. Using a conservative estimate of the sustained computational capacity of the 'C6701, its raw performance exceeds the 21060 by more than five to one.

However, the 21060, although less powerful, has other distinct advantages. Applications requiring large internal memory resources, either program or data, benefit from a configurable internal memory that is four times that of the 'C6701. In addition, multiprocessing applications can take advantage of the efficient native multiprocessing support of the 21060 processor. Finally, the 21060 has a higher cumulative I/O bandwidth than the 'C6701.

Of course, the 'C6701 has substantial I/O bandwidth and, with the assistance of external hardware, it may also be used effectively in multiprocessing architectures. This is investigated in the multiprocessing section.

Local Memory Support Is Key
It is clear that the SHARC gains the upper hand when it comes to internal memory capacity. However, it is rare that an entire application and its associated data can be accommodated in internal memory for either of these devices. It is, therefore, worth investigating the external memory options available in each case, along with the performance each delivers.

High-Performance Memory
Algorithm developers often need high-performance external memory, and in some circumstances it is critical to the application. For example, high performance is required when code must be executed directly from external memory, and when critical variables (e.g. filter tap coefficients) are stored externally due to a lack of internal resources. Both the SHARC and the 'C67x support high-performance external memory.

A SHARC processor is easily interfaced to asynchronous SRAM (ASRAM), accessible in a single 25-ns clock cycle. Of course, ASRAM is both expensive and low in density, with a practical maximum capacity of 512k by 32 per cluster in most commercial-off-the-shelf (COTS) implementations.

The 'C67x directly supports SBSRAM, SDRAM, and ASRAM as high-performance resources. SBSRAM is currently available at 133 MHz, supporting an access every two 6-ns clock cycles of the DSP. It will likely be available at 166 MHz by the time the DSP is shipping, allowing for single-cycle access. The pipeline delay of SBSRAM should be taken into account in throughput considerations, as it adds another three cycles to each first access. The consequence is that critical sections of code must be run from internal DSP memory, because loading a single 256-bit fetch packet (eight 32-bit instructions) from any external memory takes more than eight clock cycles. As with ASRAM, SBSRAM is expensive and low in density, with a typical allocation of approximately 128k by 32 per DSP in COTS 'C6x boards.
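The "more than 8 clock cycles" figure can be checked with a short calculation. The sketch below is a minimal estimate, assuming a 32-bit external bus (consistent with the "by 32" memory organization quoted above) and assuming the three-cycle pipeline delay is paid once per fetch; the function name is illustrative.

```python
DSP_CYCLE_NS = 6           # 'C67x clock cycle quoted in the text
PIPELINE_CYCLES = 3        # SBSRAM delay on the first access
BUS_WIDTH_BITS = 32        # assumption: 32-bit external memory bus
FETCH_BITS = 256           # one instruction fetch


def fetch_cycles(cycles_per_access: int) -> int:
    """DSP cycles needed to read one 256-bit fetch from external SBSRAM."""
    accesses = FETCH_BITS // BUS_WIDTH_BITS          # eight 32-bit reads
    return PIPELINE_CYCLES + accesses * cycles_per_access


for label, cycles_per_access in (("133-MHz SBSRAM", 2), ("166-MHz SBSRAM", 1)):
    cycles = fetch_cycles(cycles_per_access)
    print(f"{label}: {cycles} cycles ({cycles * DSP_CYCLE_NS} ns) per 256-bit fetch")
```

Under these assumptions, even single-cycle SBSRAM needs more than eight cycles per fetch, which is why time-critical code belongs in internal memory.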

In summary, the SBSRAM interface of the 'C67x gives it a major performance advantage when accessing external memory: four times the throughput of a SHARC accessing ASRAM. However, this can only be realized for multiple consecutive external accesses, where the pipeline delay becomes negligible. Furthermore, in cases where consecutive instructions must be accessed from external memory, the theoretical performance of the 'C67x can be reduced from 1328 to 166 MIPS. The SHARC sustains its 40-MIPS rate whether it executes from internal or external memory.
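The figures in this summary follow from simple arithmetic, sketched below. The 166-MHz clock, the three-cycle pipeline delay, and the 25-ns SHARC cycle come from the text; the eight instructions per 256-bit fetch, the one-word-per-cycle external rate, and the function names are assumptions made for illustration.

```python
C67X_CLOCK_MHZ = 166
INSTRUCTIONS_PER_FETCH = 8       # assumption: eight 32-bit instructions per 256-bit fetch
SHARC_CLOCK_MHZ = 1000 // 25     # one 25-ns cycle -> 40 MHz
PIPELINE_CYCLES = 3              # SBSRAM delay on the first access of a burst


def c67x_peak_mips() -> int:
    """Peak rate when all eight instructions issue every cycle from internal memory."""
    return C67X_CLOCK_MHZ * INSTRUCTIONS_PER_FETCH


def c67x_external_mips() -> int:
    """Rate when only one 32-bit instruction arrives per cycle from external memory."""
    return C67X_CLOCK_MHZ


def burst_efficiency(accesses: int) -> float:
    """Fraction of cycles that transfer data in a burst of single-cycle accesses."""
    return accesses / (accesses + PIPELINE_CYCLES)


print(f"'C67x peak: {c67x_peak_mips()} MIPS, from external memory: {c67x_external_mips()} MIPS")
print(f"External access rate ratio, 'C67x to SHARC: {C67X_CLOCK_MHZ / SHARC_CLOCK_MHZ:.2f}")
for n in (1, 8, 64):
    print(f"Burst of {n:2d} accesses: {burst_efficiency(n):.0%} of peak throughput")
```

The efficiency figures show why the fourfold advantage applies only to long runs of consecutive accesses: an isolated access spends most of its time in the pipeline delay.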

High-Density Memory Support
In data-driven applications (e.g. imaging and radar), the DSP requires high-density memory for temporary storage of data. Usually, memory access is sequential due to the correlated nature of the data.

With the addition of some external logic, the SHARC can be interfaced to low-cost, bulk DRAM, with one or two 25-ns wait states. It is fairly typical to find COTS configurations with 64 Mbytes or more of DRAM per cluster. The 'C67x, on the other hand, supports a glueless connection to SDRAM.

As with SBSRAM, there is a pipeline latency of three cycles on the first access, but each subsequent sequential access takes two 6-ns clock cycles. Paging and refresh delays also need to be considered, as these will result in non-deterministic delays of ten cycles or more. In spite of this, SDRAM clearly has an advantage over DRAM when making sequential accesses to large sets of data.
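To put rough numbers on that comparison, the sketch below estimates the time to read a sequential block from each memory type. The cycle times, wait states, and latencies come from the text. The 256-word SDRAM page size and the flat ten-cycle page penalty are assumptions chosen for illustration, refresh is ignored, and the function names are hypothetical.

```python
C67X_CYCLE_NS = 6
SHARC_CYCLE_NS = 25
SDRAM_LATENCY_CYCLES = 3       # pipeline latency on the first access
SDRAM_CYCLES_PER_WORD = 2      # each sequential access
PAGE_PENALTY_CYCLES = 10       # assumption: flat cost per page crossing


def sdram_block_ns(words: int, page_words: int = 256) -> int:
    """Time for a 'C67x to read `words` sequential words from SDRAM.

    Assumes the block starts on a page boundary; page_words is hypothetical.
    """
    crossings = (words - 1) // page_words
    cycles = (SDRAM_LATENCY_CYCLES
              + words * SDRAM_CYCLES_PER_WORD
              + crossings * PAGE_PENALTY_CYCLES)
    return cycles * C67X_CYCLE_NS


def dram_block_ns(words: int, wait_states: int) -> int:
    """Time for a SHARC to read `words` words from DRAM with the given wait states."""
    return words * SHARC_CYCLE_NS * (1 + wait_states)


block = 1024
print(f"'C67x SDRAM:          {sdram_block_ns(block)} ns")
print(f"SHARC DRAM, 1 wait:   {dram_block_ns(block, 1)} ns")
print(f"SHARC DRAM, 2 waits:  {dram_block_ns(block, 2)} ns")
```

Even with a page penalty charged at every assumed page boundary, the SDRAM transfer completes several times faster than the DRAM transfer for a block of this size.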

