Google Cloud Blog

Efficiently serve optimized AI models with NVIDIA NIM microservices on GKE

Google Cloud and NVIDIA have announced the availability of NVIDIA NIM on Google Kubernetes Engine (GKE), allowing users to deploy NIM microservices directly from the GKE console. NVIDIA NIM containerized microservices optimize inference for common AI models and expose standard APIs for seamless integration into generative AI applications and workflows.

Together, NVIDIA NIM and GKE unlock new potential for AI model inference, helping organizations deliver low latency and high throughput while benefiting from the scale and operational efficiency of GKE. Users can deploy the latest NIM-optimized models on GKE with just a few clicks, expanding upon the previously available Helm-based deployment.

To get started with NVIDIA NIM on GKE, navigate to Google Kubernetes Engine in the Google Cloud console, select NVIDIA NIM, and launch it to configure your deployment. After deployment, connect to your NIM endpoint and send a test inference with a curl command.
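As a minimal sketch of that test inference: NIM LLM microservices expose an OpenAI-compatible API, so after port-forwarding the service you can send a chat completion request with curl. The service name, port, and model name below are placeholders for your deployment, not values from this post.

```shell
# Forward the NIM Kubernetes service to localhost.
# "my-nim-service" is a placeholder; use the service name from your deployment.
kubectl port-forward service/my-nim-service 8000:8000 &

# Send a test chat completion to the OpenAI-compatible endpoint.
# The model name must match the NIM model you deployed.
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta/llama3-8b-instruct",
        "messages": [{"role": "user", "content": "Hello! Briefly introduce yourself."}],
        "max_tokens": 64
      }'
```

A successful response returns a JSON body with a `choices` array containing the model's reply, confirming the endpoint is serving traffic.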