One benefit of using the cloud is having vast computing and storage resources available at any time. The challenge is deploying and managing them. This is where Kubernetes enters the picture.
Kubernetes is an open-source system for automating the deployment, scaling, and management of containerized applications. Google uses it to run billions of containers every week, and many others do the same. It makes managing cloud resources practical even with a small operations team, since reusing configurations is far easier than configuring systems individually. Another advantage is that these containers can run almost anywhere, because Kubernetes presents a consistent abstraction over the underlying cloud infrastructure.
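As a rough illustration of that configuration reuse, the sketch below uses the official Kubernetes Python client to define a small Deployment once and submit it to whatever cluster the local kubeconfig points at. The image name and labels here are hypothetical placeholders, not anything from NVIDIA's demo.

```python
# Sketch: define a containerized app once, then deploy it to any cluster.
# Assumes the official Python client (pip install kubernetes) and a reachable
# cluster; "example.com/flower-classifier:1.0" is a hypothetical image.
from kubernetes import client, config

config.load_kube_config()  # reads the local kubeconfig
apps = client.AppsV1Api()

deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="flower-classifier"),
    spec=client.V1DeploymentSpec(
        replicas=1,
        selector=client.V1LabelSelector(match_labels={"app": "flower-classifier"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "flower-classifier"}),
            spec=client.V1PodSpec(
                containers=[
                    client.V1Container(
                        name="classifier",
                        image="example.com/flower-classifier:1.0",
                    )
                ]
            ),
        ),
    ),
)

# The same object can be submitted, unchanged, to any conforming cluster.
apps.create_namespaced_deployment(namespace="default", body=deployment)
```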
1. NVIDIA demonstrated Kubernetes running on its GPGPUs.
NVIDIA’s demo (Fig. 1) is impressive for two reasons. First, it allows Kubernetes to manage GPGPU resources alongside CPUs. This GPU awareness enables more effective management of GPU workloads, such as training machine-learning (ML) models.
Second, users can take advantage of NVIDIA’s large-scale GPGPU solutions like the latest DGX-2 (Fig. 2). The DGX-2 links 16 32-GB Tesla V100 GPGPUs through a dozen of NVIDIA’s 18-port NVSwitch chips, giving the GPGPUs a shared 512-GB HBM2 memory space. This large compute engine will be deployed in the cloud with Kubernetes support.
The NVIDIA demonstration in Fig. 1 scanned pictures of flowers, using machine-learning algorithms running on GPU compute resources to identify the type of flower in each picture. This is an application that scales well by adding more parallel instances of the same workload. The demo starts with a single GPU and shows an obvious performance increase as more GPU-equipped systems are brought online.
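A container in such a demo claims a GPU the same way it claims CPU or memory: by declaring it in its resource limits. Below is a minimal sketch in the same Python-client style, assuming a cluster where NVIDIA's plug-in has registered the `nvidia.com/gpu` resource; the pod and image names are hypothetical.

```python
# Sketch: a pod that requests one GPU alongside CPU and memory. Assumes the
# NVIDIA device plug-in is installed so "nvidia.com/gpu" is schedulable.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

pod = client.V1Pod(
    api_version="v1",
    kind="Pod",
    metadata=client.V1ObjectMeta(name="flower-worker"),  # hypothetical name
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="worker",
                image="example.com/flower-classifier:1.0",  # hypothetical image
                resources=client.V1ResourceRequirements(
                    limits={"cpu": "2", "memory": "4Gi", "nvidia.com/gpu": "1"}
                ),
            )
        ],
    ),
)

core.create_namespaced_pod(namespace="default", body=pod)
```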
2. NVIDIA’s DGX-2 combines 16 32-GB Tesla V100 GPGPUs into a single GPU system using NVSwitch chips.
Kubernetes was managing the systems, and it’s easy to see how this could scale to hundreds or more GPGPU-equipped systems. The really impressive part of the demo, however, came when some systems were removed and automatically replaced by other nodes within the cloud. This kind of load leveling and resilience is part of the Kubernetes solution, and it now applies to GPU-enabled containers and resources as well.
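Both the scaling and the recovery come down to Kubernetes reconciling a declared replica count with reality. A sketch of scaling the hypothetical deployment from earlier is shown below; if a node disappears, the same reconciliation mechanism reschedules its pods on other nodes without any extra code.

```python
# Sketch: scale the hypothetical "flower-classifier" Deployment to 8 replicas.
# Kubernetes maintains this count, rescheduling pods if nodes drop out.
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

apps.patch_namespaced_deployment_scale(
    name="flower-classifier",
    namespace="default",
    body={"spec": {"replicas": 8}},
)
```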
So far, Amazon Web Services (AWS) and Google Cloud Platform (GCP) provide this type of GPU-enabled Kubernetes support. Microsoft Azure certification is in the works.
The GPU support is implemented as a Kubernetes device plug-in. The plug-in makes GPUs a first-class resource that’s scheduled and managed like other system resources. Future enhancements include GPU monitoring as well as support for different GPUs.
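Because the plug-in advertises GPUs through each node's resource inventory, they show up right next to CPU and memory. A quick sketch of inspecting per-node GPU capacity with the Python client, again assuming the `nvidia.com/gpu` resource name:

```python
# Sketch: list how many "nvidia.com/gpu" resources each node advertises.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

for node in core.list_node().items:
    gpus = node.status.allocatable.get("nvidia.com/gpu", "0")
    print(f"{node.metadata.name}: {gpus} GPU(s) allocatable")
```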
Such support should prove very useful for ML training and inference in the cloud. It can work in private, public, or hybrid clouds, as well as large embedded environments. Large-scale automotive and robotic simulations could benefit from this support, too.