Google Cloud has introduced NVIDIA L4 GPU support for Cloud Run, now in preview, enabling developers to run real-time AI inference workloads. The upgrade is particularly beneficial for applications serving open generative AI models, such as Google's Gemma and Meta's Llama. Key features include fast autoscaling, scale-to-zero, and pay-per-use pricing, making Cloud Run well suited to variable user traffic and cost optimization.
With this new capability, developers can deploy lightweight models for tasks like custom chatbots and document summarization, or more compute-intensive applications like image recognition and 3D rendering. The NVIDIA GPUs enhance performance by speeding up AI inference processes, offering low latency and efficient scaling, with Cloud Run's infrastructure managing the underlying complexities.
Early adopters, such as L'Oréal and Chaptr, have praised the GPU integration for its low startup times, scalability, and ease of use. GPU support is currently available in the us-central1 region, with expansion to Europe and Asia planned by year-end.
To deploy a service with NVIDIA GPUs on Cloud Run, developers can specify GPU requirements via the command line or in the Google Cloud console. Additionally, Cloud Run now supports functions with GPU attachments, simplifying event-driven AI inference tasks.
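As a sketch of the command-line path, a preview-era deployment might look like the following. The service name, project ID, and image path are placeholders, and the GPU flags reflect the gcloud beta surface available at launch:

```shell
# Sketch: deploy a Cloud Run service with one NVIDIA L4 GPU (preview).
# "gemma-chat", PROJECT_ID, and the image path are placeholders;
# us-central1 is the only region with GPU support during the preview.
gcloud beta run deploy gemma-chat \
  --image=us-docker.pkg.dev/PROJECT_ID/my-repo/gemma-serve:latest \
  --region=us-central1 \
  --gpu=1 \
  --gpu-type=nvidia-l4 \
  --no-cpu-throttling \
  --max-instances=4
```

Capping `--max-instances` is one way to keep autoscaling costs bounded while the service still scales to zero when idle.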
cloud.google.com