RSS Cloud Blog

Announcing smaller machine types for A3 High VMs

Organizations increasingly run inference for their AI/ML models on GPUs, and they need finer granularity in the number of GPUs per virtual machine to keep costs low while scaling with user demand. Google Cloud offers A3 High VMs, powered by NVIDIA H100 80GB GPUs, in machine types with 1, 2, 4, or 8 GPUs.

These machine types are available through Vertex AI, Google Kubernetes Engine (GKE), and Google Compute Engine. The 1-, 2-, and 4-GPU A3 High machine types are also available as Spot VMs and through Dynamic Workload Scheduler (DWS) Flex Start mode.

GKE provides a cost-efficient, highly scalable, open platform for training and serving AI workloads, and GKE Autopilot reduces operational cost and offers workload-level SLAs. Vertex AI is a fully managed, unified AI development platform for building and using predictive and generative AI. With the new A3 High GPU machine types, Model Garden customers can deploy hundreds of open models cost-effectively and with strong performance. Customers can also use these machine types to reduce latency and improve user experience.

With A3 High VMs available in smaller machine types, customers get the granularity needed to scale with user demand while keeping costs in check, in line with Google Cloud's aim of flexible, cost-effective, high-performance inference for AI and ML models.
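To make the granularity point concrete, here is a minimal Python sketch that picks the smallest A3 High GPU count whose combined GPU memory covers a given requirement. The GPU counts (1, 2, 4, 8) and the 80 GB per GPU figure come from the announcement; the helper name `smallest_gpu_count` and the memory-only sizing rule are illustrative assumptions, not Google Cloud guidance. Real sizing also depends on runtime overhead, parallelism strategy, and throughput targets.

```python
# GPU counts offered by the A3 High machine types named in the announcement.
A3_HIGH_GPU_COUNTS = (1, 2, 4, 8)
GPU_MEMORY_GB = 80  # NVIDIA H100 80GB


def smallest_gpu_count(required_memory_gb: float) -> int:
    """Return the smallest offered GPU count whose total memory covers the need.

    Hypothetical helper: it compares raw memory totals only and ignores
    runtime overhead, parallelism constraints, and throughput targets.
    """
    if required_memory_gb <= 0:
        raise ValueError("required_memory_gb must be positive")
    for count in A3_HIGH_GPU_COUNTS:
        if count * GPU_MEMORY_GB >= required_memory_gb:
            return count
    raise ValueError("requirement exceeds a single 8-GPU A3 High VM")


if __name__ == "__main__":
    # Assumed example requirements in GB, chosen only for illustration.
    for need in (40, 150, 300):
        print(f"{need} GB -> {smallest_gpu_count(need)} GPU(s)")
```

Under this simple rule, a workload that fits in 80 GB can run on a single-GPU machine type instead of paying for a full 8-GPU VM, which is the cost argument the announcement makes.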
cloud.google.com