NVIDIA’s support for machine learning using GPGPUs has been extensive. Its latest Turing architecture-based GPU, the RTX 8000, combines ray-tracing support with machine-learning (ML) acceleration. The Turing architecture includes Tensor Cores to accelerate ML applications.
NVIDIA’s T4 Tensor Core GPU, built on the Turing architecture with its Tensor Core support, targets hyperscale deployment where high-performance inference is needed (Fig. 1). The PCI Express (PCIe) card needs only 75 W, and its small form factor allows for very dense system designs.
1. NVIDIA’s T4 is designed for hyperscale deployment and includes Turing Tensor Cores to accelerate machine-learning applications.
The T4’s Tensor Cores support INT4, INT8, FP16, and FP32 data types, enabling developers to optimize performance while minimizing size and the amount of computation needed for a particular deep-neural-network (DNN) model. The system is also designed to address video applications. It can analyze 38 full-HD video streams in real time.
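To illustrate the trade-off between precision and model size, the following minimal sketch applies symmetric linear quantization to a handful of weights. It is a generic textbook technique and not necessarily what NVIDIA's tools do; the function names and the sample weights are hypothetical.

```python
def quantize_symmetric(values, bits=8):
    """Map floats to signed integers of the given width using one shared scale."""
    qmax = 2 ** (bits - 1) - 1               # 127 for INT8, 7 for INT4
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0     # avoid dividing by zero for all-zero input
    # Round to the nearest integer step and clamp to the symmetric range.
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integer codes."""
    return [q * scale for q in quantized]


# Hypothetical FP32 weights, chosen only for illustration.
weights = [0.82, -0.41, 0.05, -1.27, 0.63]

for bits in (8, 4):
    codes, scale = quantize_symmetric(weights, bits)
    restored = dequantize(codes, scale)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"INT{bits}: codes={codes} scale={scale:.4f} max error={worst:.4f}")
```

Narrower integers cut storage and arithmetic cost, but the rounding error grows with the step size, which is why the choice of data type depends on the model.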
Each T4 has 2560 CUDA cores, 320 Turing Tensor Cores, and 16 GB of GDDR6 with a bandwidth over 320 GB/s. The board includes a x16 PCIe interface, and is rated at 260 INT4 TOPS, 130 INT8 TOPS, 65 FP16 TFLOPS and 8.1 FP32 TFLOPS.
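The rated figures can be turned into rough efficiency and footprint numbers. The sketch below divides each peak rate by the card's 75-W rating and estimates weight storage for a hypothetical 25-million-parameter model; the model size and function names are assumptions for illustration, and peak ratings are upper bounds rather than measured results.

```python
BOARD_POWER_W = 75  # T4 board power quoted in the article

# Peak ratings from the article: precision -> (tera-operations per second, bits per value)
PEAK_THROUGHPUT = {
    "INT4": (260.0, 4),
    "INT8": (130.0, 8),
    "FP16": (65.0, 16),
    "FP32": (8.1, 32),
}

HYPOTHETICAL_PARAMS = 25_000_000  # assumed model size, not from the article


def efficiency(rate_tera, power_w=BOARD_POWER_W):
    """Peak tera-operations per second per watt."""
    return rate_tera / power_w


def weight_footprint_mb(params, bits):
    """Storage for the weights alone, in megabytes (10**6 bytes)."""
    return params * bits / 8 / 1e6


for name, (rate, bits) in PEAK_THROUGHPUT.items():
    print(
        f"{name}: {efficiency(rate):.2f} tera-ops/s per watt, "
        f"{weight_footprint_mb(HYPOTHETICAL_PARAMS, bits):.1f} MB of weights"
    )
```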
NVIDIA’s DRIVE AGX line (Fig. 2) brings the same architecture to automotive applications that had been served by NVIDIA’s DRIVE PX series. The top-end NVIDIA DRIVE AGX PEGASUS, which combines two NVIDIA Xavier processors and two Tensor Core-based GPUs, delivers 320 TOPS. The more compact NVIDIA DRIVE AGX Xavier has a single processor but needs only 30 W. Both are available as development kits.
2. The NVIDIA AGX line brings TensorRT acceleration to automotive applications.
These systems run DRIVE Software 1.0, which targets autonomous systems. The DriveNet DNN support allows vehicles to detect and classify objects in the surrounding environment and track them from one frame to the next. With the LaneNet and OpenRoadNet support, the system can identify lane markings and detect drivable spaces.
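Frame-to-frame tracking can be pictured as matching each known object to the new detection that overlaps it most. The sketch below uses greedy intersection-over-union (IoU) matching, a common generic approach; it is not a description of how DriveNet works internally, and the function names, boxes, and threshold are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, threshold=0.3):
    """Greedily pair track IDs with detection indices, best overlap first."""
    pairs = sorted(
        (
            (iou(box, det), track_id, index)
            for track_id, box in tracks.items()
            for index, det in enumerate(detections)
        ),
        reverse=True,
    )
    matches, used = {}, set()
    for score, track_id, index in pairs:
        if score < threshold:
            break                      # remaining pairs overlap even less
        if track_id in matches or index in used:
            continue                   # each track and detection is used once
        matches[track_id] = index
        used.add(index)
    return matches


# Hypothetical boxes: two tracked objects and three detections in the next frame.
tracks = {1: (100, 100, 200, 200), 2: (400, 120, 480, 220)}
detections = [(405, 125, 485, 225), (110, 105, 210, 205), (700, 300, 760, 380)]

matches = associate(tracks, detections)
new_objects = [i for i in range(len(detections)) if i not in matches.values()]
print("matches:", matches, "unmatched detections:", new_objects)
```

Detections left unmatched would typically start new tracks, while tracks left unmatched for several frames would be dropped.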
The DRIVE IX SDK also includes support for processing input from driver-facing cameras. It can recognize a driver’s facial expression to detect whether they are drowsy or paying attention to the road.
In addition, the software comes with a data-recording tool. Consequently, developers and manufacturers can collect real-time, time-stamped data from various sensors for training, testing, and system validation.
NVIDIA is also targeting medical instrumentation, where ML can assist with imaging and analysis. The Clara AGX (Fig. 3) system is a combination of hardware and software. The Clara SDK provides developers with a set of GPU-accelerated libraries for computing, graphics, and AI designed for medical applications such as image processing and rendering, and computational workflows for CT, MRI, and ultrasound. The tools leverage Clara containers and Kubernetes, allowing applications to scale.
3. Clara AGX is designed for medical instrumentation.
The Jetson AGX Xavier platform is designed for mobile applications such as robotics (Fig. 4). The platform’s SoC includes a 512-core Volta GPGPU with Tensor Cores along with an 8-core ARM v8.2 64-bit CPU cluster with 8 MB of L2 cache and 4 MB of L3 cache. The module packs 16 GB of 256-bit LPDDR4x with a bandwidth of 137 GB/s and 32 GB of eMMC 5.1 flash memory. Non-volatile storage can be expanded using an M.2 Key M (NVMe) or M.2 Key E interface, an SD/UFS socket, an eSATAp interface, or a USB 3.0 Type A connection.
4. The Jetson AGX Xavier development kit targets robotic and compact machine-learning applications.
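The quoted memory bandwidth follows from the bus width and the per-pin transfer rate. The sketch below works the arithmetic in both directions; the 4266-MT/s figure is an assumption (a standard LPDDR4x speed grade) rather than a number stated here, and the function names are hypothetical.

```python
def peak_bandwidth_gb_s(bus_width_bits, transfers_per_second):
    """Peak bandwidth in GB/s: bytes moved per transfer times transfers per second."""
    return bus_width_bits / 8 * transfers_per_second / 1e9


def implied_transfer_rate(bus_width_bits, bandwidth_gb_s):
    """Transfers per second needed to reach a given bandwidth on a given bus."""
    return bandwidth_gb_s * 1e9 / (bus_width_bits / 8)


BUS_WIDTH_BITS = 256          # from the article
QUOTED_BANDWIDTH_GB_S = 137   # from the article
ASSUMED_RATE = 4266e6         # assumed LPDDR4x speed grade, transfers per second

rate = implied_transfer_rate(BUS_WIDTH_BITS, QUOTED_BANDWIDTH_GB_S)
print(f"Implied rate: {rate / 1e6:.0f} MT/s")
print(f"Bandwidth at assumed rate: {peak_bandwidth_gb_s(BUS_WIDTH_BITS, ASSUMED_RATE):.1f} GB/s")
```

Under that assumption, the result lands within about half a percent of the quoted 137 GB/s, so the published figure appears to be a rounded peak value.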
The system also incorporates a pair of NVDLA deep-learning accelerators and a 7-way VLIW vision processor. Hardware encode/decode support can handle two 4Kp60 streams as well as a pair of High Efficiency Video Coding (HEVC) streams at 4Kp60. The module measures 105 by 105 mm.
The Jetson AGX Xavier can handle 16 CSI-2 camera connections. It has two x8 PCI Express Gen 4 interfaces, plus a Gigabit Ethernet interface and two USB-C interfaces. One USB-C port features DisplayPort support, while the other provides Closed-System Debug and flashing support. A 40-pin header exposes serial and GPIO signals. The system offers High-Definition Audio, HDMI, and DisplayPort outputs.
Versions are available that use as little as 10 W, with the top end using 30 W. The developer kit is priced at $2,499.