Fig. 1

NVIDIA’s Volta Architecture Gives DRIVE PX Pegasus Its Smarts

Oct. 20, 2017
DRIVE PX Pegasus is NVIDIA’s driverless car platform, hosting a pair of Xavier system-on-chips incorporating NVIDIA’s new Volta GPU architecture.

DRIVE PX Pegasus is NVIDIA’s driverless car platform (Fig. 1). The platform is designed to meet safety Level 5, self-driving car autonomy requirements as well as ISO 26262 and ASIL D functional safety specifications. The board delivers more than 320 trillion operations/s (TOPS) and is powered by a pair of Xavier system-on-chips (SoC). The Xavier incorporates the latest Volta GPU architecture designed to support deep neural networks (DNN) and machine learning.

DRIVE PX Pegasus is designed to handle up to 16 dedicated, high-speed sensor inputs. It also supports 10-Gbit Ethernet, Flexray, and CAN. It has a memory bandwidth greater than 1 Tbyte/s. Its built-in security support is designed to authenticate over-the-air (OTA) updates.

The Xavier AI system-on-chip (SoC) is designed to deliver 20 TOPS while consuming under 20 W. The CPU consists of eight 64-bit ARMv8-compatible cores. The GPU has 512 cores. There is also a Computer Vision Accelerator (CVA) and an 8K high dynamic range (HDR) visual processing (VP) unit.

The bulk of the Xavier AI SoC is taken up with the Volta GPU that has been augmented to handle AI applications—DNN in particular. The GPU alone requires more than 21 billion transistors. The GPU includes 640 Tensor Cores that deliver over 100 Teraflops/s (TFLOPS). That is five times what the NVIDIA Pascal architecture delivers.

The Volta streaming multiprocessor architecture (Fig. 3) is designed to be more efficient than the previous Pascal design while boosting FP32 and FP64 performance and addressing. The hierarchy includes multiple Graphics Processing Clusters (GPC), Texture Processing Clusters (TPC), Streaming Multiprocessors (SM), and memory controllers. Overall there are six GPCs, 42 TPCs, 84 SMs, and eight 512-bit memory controllers that connect to the high bandwidth memory version 2 (HBM2).

Each SM is broken into four blocks (Fig. 4) that include 64 FP32 cores, 32 FP64 cores, 64 32-bit integer cores, and eight Tensor cores. The floating point and integer cores can operate independently. Each block has up to 32 active threads.

Each Tensor core provides a 4-by-4-by-4 matrix processing array (Fig. 5). The system implements a fused multiply-add (FMA) operation. Arrays A and B are FP16 while C and D may be FP16 or FP32. Each core can execute 1024 floating point operations per cycle.

Multiple Tensor cores can be combined concurrently. This allows threads to provide a larger 16-by-16-by-16 matrix operation. NVIDIA’s CUDA exposes these operations as Warp-Level Matrix Operations within the CUDA C++ API. There are C++ interfaces provided for specialized matrix load, matrix multiply and accumulate, and matrix store operations to efficiently program these cores.

Multiple Xavier chips can be linked using Volta’s NVLink (Fig. 6). NVIDIA’s high-speed interconnect technology is designed to connect multiple CPU and GPU subsystems together.  

NVIDIA now has a family of DRIVE PX platforms, including the top-end DRIVE PX Pegasus. The other platforms in the family include the DRIVE PX Xavier with a single Xavier chip, plus the DRIVE PX Parker AutoChauffeur and DRIVE PX Parker AutoCruise that use NVIDIA’s older SoCs.

About the Author

William G. Wong | Senior Content Director - Electronic Design and Microwaves & RF

I am Editor of Electronic Design focusing on embedded, software, and systems. As Senior Content Director, I also manage Microwaves & RF and I work with a great team of editors to provide engineers, programmers, developers and technical managers with interesting and useful articles and videos on a regular basis. Check out our free newsletters to see the latest content.

You can send press releases for new products for possible coverage on the website. I am also interested in receiving contributed articles for publishing on our website. Use our template and send to me along with a signed release form. 

Check out my blog, AltEmbedded on Electronic Design, as well as his latest articles on this site that are listed below. 

You can visit my social media via these links:

I earned a Bachelor of Electrical Engineering at the Georgia Institute of Technology and a Masters in Computer Science from Rutgers University. I still do a bit of programming using everything from C and C++ to Rust and Ada/SPARK. I do a bit of PHP programming for Drupal websites. I have posted a few Drupal modules.  

I still get a hand on software and electronic hardware. Some of this can be found on our Kit Close-Up video series. You can also see me on many of our TechXchange Talk videos. I am interested in a range of projects from robotics to artificial intelligence. 

Sponsored Recommendations

Comments

To join the conversation, and become an exclusive member of Electronic Design, create an account today!