Towards Data Science | Medium

LLMs for Coding in 2024: Price, Performance, and the Battle for the Best

The landscape of Large Language Models (LLMs) for coding has become increasingly competitive, with Alibaba, Anthropic, Google, Meta, Mistral, OpenAI, and xAI all offering their own models. To evaluate these models, we can look at two signals: their performance on coding benchmarks such as HumanEval, and their observed real-world performance as reflected in their Elo scores. OpenAI's models dominate in performance: their top model outperforms the best non-OpenAI model by 46 Elo points and by 3.9% on HumanEval. Interestingly, Google's models perform significantly better in real-world use than their benchmark scores suggest, with the newest Gemini 1.5 Pro showing the largest such gap. Alibaba and Mistral, by contrast, tend to release models that overfit the benchmark, scoring better on benchmarks than they perform in practice.

When both performance and price are considered, OpenAI and Google models make up the Pareto front: OpenAI offers the highest performance, while Google focuses on lighter-weight, cheaper models. Over time, models are getting better and cheaper, and proprietary models continue to dominate the market. Even minor model updates can significantly affect performance.
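To make two of the quantities above more concrete, here is a minimal Python sketch. It assumes the Elo scores follow the standard logistic model (base 10, 400-point scale), under which a 46-point gap means the higher-rated model is expected to win roughly 57% of head-to-head comparisons. It also shows one way to compute a price/performance Pareto front: a model is on the front if no other model is both no more expensive and no worse, while being strictly better on at least one of the two axes. The `Model` type, the assumed price unit, and the model names and numbers are hypothetical placeholders, not data from the article.

```python
from typing import NamedTuple


class Model(NamedTuple):
    """A hypothetical leaderboard entry (placeholder, not real data)."""
    name: str
    price: float  # assumed unit, e.g. USD per million tokens; lower is better
    score: float  # e.g. an Elo rating or HumanEval pass rate; higher is better


def elo_expected_score(rating_gap: float) -> float:
    """Expected score of the higher-rated side, assuming the standard
    base-10, 400-point Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** (-rating_gap / 400.0))


def dominates(a: Model, b: Model) -> bool:
    """True if `a` is no more expensive and no worse than `b`,
    and strictly better on at least one of the two axes."""
    no_worse = a.price <= b.price and a.score >= b.score
    strictly_better = a.price < b.price or a.score > b.score
    return no_worse and strictly_better


def pareto_front(models: list[Model]) -> list[Model]:
    """Return the models not dominated by any other model, cheapest first."""
    front = [m for m in models if not any(dominates(o, m) for o in models)]
    return sorted(front, key=lambda m: m.price)


if __name__ == "__main__":
    # Under the assumed Elo model, a 46-point gap gives ~0.566.
    print(f"{elo_expected_score(46):.3f}")

    # Placeholder entries for illustration only; not real prices or scores.
    models = [
        Model("model_a", price=10.0, score=1300),
        Model("model_b", price=1.0, score=1250),
        Model("model_c", price=5.0, score=1240),  # dominated by model_b
        Model("model_d", price=0.5, score=1200),
    ]
    for m in pareto_front(models):
        print(m.name, m.price, m.score)
```

The pairwise dominance check is quadratic in the number of models, which is more than fast enough for a leaderboard of a few dozen entries; sorting the front by price makes the cost/quality trade-off read left to right.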