AI & ML News

Boosting Salesforce Einstein’s code generating model performance with Amazon SageMaker

Salesforce, a cloud-based software company, is working toward artificial general intelligence (AGI) for business. Its Salesforce Einstein suite of AI technologies integrates with the Customer Success Platform to improve productivity and client engagement, offering over 60 features spanning machine learning, natural language processing, computer vision, and automatic speech recognition. The Salesforce Einstein AI Platform team focuses on improving the performance and capabilities of the AI models behind these offerings, particularly large language models (LLMs).

Hosting LLMs presented several challenges: hosting the models securely, handling a large volume of inference requests, and meeting throughput and latency requirements. After evaluating a range of tools and services, both open-source and paid, the team chose Amazon SageMaker for its access to GPUs, scalability, flexibility, and performance optimizations. SageMaker offered multiple serving engines, advanced batching strategies, an efficient routing strategy, high-end GPUs, and rapid iteration and deployment.

Using SageMaker's optimizations, the Einstein team significantly reduced latency and improved throughput for their LLMs. They also identified an opportunity to improve resource efficiency by hosting multiple LLMs on a single GPU instance. Their feedback helped shape the inference component feature, which now lets Salesforce and other SageMaker users utilize GPU resources more effectively.
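The inference component feature mentioned above packs multiple models onto one endpoint by declaring per-model compute requirements. As a rough sketch (endpoint, model names, and resource figures are hypothetical, not Salesforce's actual configuration), the `create_inference_component` requests for two models sharing one GPU instance might look like this:

```python
# Sketch: hosting two LLMs on one SageMaker endpoint via inference
# components. All names and resource numbers below are illustrative.
# The actual API call needs AWS credentials and an existing endpoint,
# so it is left commented out at the bottom.

def inference_component_request(name, endpoint, model, gpus, memory_mb, copies=1):
    """Build a create_inference_component request that reserves a
    slice of the endpoint's instance (GPUs, memory) for one model."""
    return {
        "InferenceComponentName": name,
        "EndpointName": endpoint,
        "VariantName": "AllTraffic",
        "Specification": {
            "ModelName": model,
            "ComputeResourceRequirements": {
                "NumberOfAcceleratorDevicesRequired": gpus,
                "MinMemoryRequiredInMb": memory_mb,
            },
        },
        # CopyCount controls how many replicas of this model to run.
        "RuntimeConfig": {"CopyCount": copies},
    }

# Two hypothetical models packed onto the same endpoint:
requests = [
    inference_component_request("codegen-small", "einstein-llm-endpoint",
                                "codegen-model-small", gpus=1, memory_mb=8192),
    inference_component_request("codegen-large", "einstein-llm-endpoint",
                                "codegen-model-large", gpus=1, memory_mb=16384),
]

# To create them for real (requires AWS credentials):
# import boto3
# sm = boto3.client("sagemaker")
# for req in requests:
#     sm.create_inference_component(**req)
```

Because each component declares its own accelerator and memory needs, SageMaker can place both models on a single multi-GPU instance instead of dedicating one instance per model.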
aws.amazon.com