Electronic Design

  
[Lab Bench Online]
Multicore Heavy Hitters

William Wong  |   ED Online ID #20640  |   February 4, 2009


The hex-core Xeon from Intel (see the fig.), Nvidia’s GPU (see the fig.), and XMOS’ XC-1 (see the fig.) are a great starting point for surveying the multicore platforms available for general embedded development. I also took a look at some of the software support for these platforms, such as Intel’s Threading Building Blocks (TBB) for x86 platforms, plus CUDA and OpenCL for GPUs. OpenCL, the Open Computing Language, actually has a broader scope than GPUs alone.

At the top is the server workhorse from Intel, its hex-core Xeon. This particular platform may be superseded by the time this article is published, but its successor will simply be a faster, more power-efficient platform. The four-chip system Intel loaned me allowed testing of TBB and virtual machine environments with 24 cores to manage.

The next multicore platform was Nvidia’s GTX 260 video adapter. I provide a short overview of this board as a video adapter, but the main reason for testing it was to check out CUDA and OpenCL. CUDA opened up Nvidia’s graphics adapters to programmers, allowing their use in non-graphics applications in addition to providing closer access to the hardware for video applications.

Finally, there is the tiny XMOS XC-1. This chip is so cool it doesn’t need a heat sink even while running 32 hardware-managed threads on its four cores. Compare this to the robust cooling systems of the two other platforms in this article and you can get a feel for multicore in mobile devices.

Lots of Xeon Cores
Intel’s development platform belongs in a rack well away from prying eyes and ears. Turn it on and it sounds like an aircraft carrier, thanks to the pair of hot-swappable power supply/cooling systems. That’s par for the course for a 3U rack-mount system, but it made the labs a bit noisy.

The box also highlights the trend to 2.5-in. hard drives in the corporate server environment. The smaller form factor allows more drives to be part of a RAID system, and it is leading to changes in RAID support and in the arrangement of drives. The row of drives on this system uses only a fraction of the interior space, leaving more room for the motherboard and four hex-core processors. But in many instances a rack system may just be storing drives, leaving lots of wasted space if the drives are only in front (as mine were).

I used the platform to check out a range of software, from development tools to virtual machine management. The first chunk of software I looked at was Intel’s TBB, running under Windows since Windows Server 2008 was already installed on the system. I’ve covered TBB before in an earlier article, so I won’t get into the details here. Suffice it to say that having 24 cores available to TBB means applications that parallelize well are very, very fast.

What has changed lately is the release of Intel’s Parallel Studio. This includes TBB along with a lot of other tools, plus integration with Microsoft’s Visual Studio. It was usable with Microsoft tools, but it was a bit of a challenge. Among the new items in Parallel Studio is Parallel Advisor, which came in handy when starting a project. Parallel Inspector incorporates additional debugging support that is integrated with Visual Studio. Finally, there is Parallel Amplifier, which uses Intel’s Thread Profiler and the VTune Performance Analyzer.

That’s a lot of software, and I did not get to exercise all of it to any great degree, but I can see where it will be invaluable for developers. The tools provide significantly more insight into the operation of a TBB application even if you are using a runtime library rather than writing your own parallel code.
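To give a feel for the kind of work a library like TBB takes off a developer's hands, here is a minimal sketch of splitting a loop across cores by hand. It uses plain standard C++ threads rather than TBB itself, and the function name `parallel_sum` and its `workers` parameter are hypothetical names chosen for illustration, not part of any Intel API. TBB automates this sort of range splitting and adds load balancing on top, which this sketch does not attempt.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: sum a vector by handing each worker thread one
// contiguous chunk. This is hand-rolled range splitting, not TBB code.
long long parallel_sum(const std::vector<int>& data, unsigned workers) {
    if (workers == 0) workers = 1;  // hardware_concurrency() may report 0

    std::vector<long long> partial(workers, 0);
    std::vector<std::thread> pool;
    pool.reserve(workers);

    // Round up so that workers * chunk covers the whole vector.
    const std::size_t chunk = (data.size() + workers - 1) / workers;

    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(data.size(), w * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        assert(begin <= end);  // chunk bounds never cross

        // Each thread writes only its own slot, so no locking is needed.
        pool.emplace_back([&data, &partial, w, begin, end] {
            partial[w] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0LL);
        });
    }

    for (auto& t : pool) t.join();  // wait for every chunk to finish

    // Combine the per-thread results on the calling thread.
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Calling `parallel_sum(values, std::thread::hardware_concurrency())` would use one worker per hardware thread the runtime reports. A fixed, even split like this works for uniform work such as a sum; uneven workloads are where a scheduler like the one in TBB earns its keep.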

After putting TBB through its paces, I overwrote Windows Server with a couple of Linux installations, including CentOS and Ubuntu, just to see what they recognized and to try out the system with lots of virtual machines. Xen was the virtual machine manager (VMM) of choice, since I have a number of systems configured for Xen already and I could grab a couple of images to use as test subjects.

The system hummed merrily away, limited primarily by the amount of RAM that Intel sent along with it. With 24 cores, you want as much RAM as you can afford.

VMM is something many IT specialists already have experience with, but embedded developers tend to be dealing with a much smaller number of cores. Still, a standard single-processor desktop with VMM support can handle many virtual machines, so you can exercise the management of such platforms easily. Things become more interesting with 24 cores, but it tends to be more of the same.

TBB and CUDA run on Linux as well. I tried CUDA while I had Linux installed; it is the same version that I looked at later with Nvidia’s GPU. I did not get a chance to check out OpenCL, which may be something for the future. The latest version of CUDA actually supports OpenCL. What was more interesting was installing CUDA on a virtual machine, then using the same image on the Intel system and later on another server with a similar configuration but only a single multicore chip.

The only difference I found between the systems was performance when dealing with large datasets. Of course, these were all arbitrary items like hi-res images. The bottom line: this system is clearly one of the fastest around.

Working with such a high-end server was a new experience for me. It is one that many IT managers are used to, as most corporate environments have racks and racks of systems of this caliber, but it is unlikely that many embedded developers will have the same opportunity. Still, the number of cores on a single chip continues to climb, and embedded dual- and quad-core systems are growing in number. Developers will need to be aware of the depth and breadth of such systems before it is too late. Debugging is one area where the change will be more radical, but that will have to be left to another article.

