Kimi K2.7-Code cuts thinking tokens 30% — but practitioners say the benchmarks don't check out

Moonshot AI has released Kimi K2.7-Code, an open-source update to its K2 coding model. The model is built on a trillion-parameter mixture-of-experts architecture and exposes an OpenAI-compatible API, so existing client code can be pointed at it with minimal changes. Moonshot AI says K2.7-Code reasons more leanly and performs better than its predecessor, using roughly 30% fewer thinking tokens. For teams running agentic workflows, where thinking tokens are spent on every step, that reduction is expected to lower inference costs. According to Moonshot AI, the model writes implementations directly rather than wrapping existing libraries, which is meant to help it generalize across programming languages and task types. The company also reports substantial gains on its own proprietary benchmarks, such as Kimi Code Bench v2 and Program Bench.

Independent evaluations paint a more nuanced picture. One researcher found K2.7-Code more "honest" in its code generation but not necessarily more capable, noting that some of the generated code failed. Others have questioned Moonshot AI's reliance on proprietary benchmarks to support its performance claims.

Even so, the token-efficiency gain applies immediately to enterprises already running K2.6. Before switching, teams can test K2.7-Code on their own workloads to check whether the gains hold up in practice.
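The roughly 30% figure is Moonshot AI's own claim, so it is worth measuring on your own tasks. Below is a minimal Python sketch of that arithmetic. It assumes you have already run the same set of tasks on K2.6 and K2.7-Code and recorded a thinking-token count per task. How you extract those counts depends on what the API's usage data reports, which the article does not specify. The function names, token counts, task volume, and $2.50-per-million-token price are hypothetical placeholders, not Moonshot figures.

```python
from statistics import median


def thinking_token_reduction(baseline: list[int], candidate: list[int]) -> float:
    """Median per-task fractional drop in thinking tokens, candidate vs. baseline.

    baseline[i] and candidate[i] must come from the same task i, run once on
    each model. Tasks where the baseline used zero thinking tokens are skipped.
    """
    if len(baseline) != len(candidate):
        raise ValueError("baseline and candidate need one paired count per task")
    ratios = [1 - c / b for b, c in zip(baseline, candidate) if b > 0]
    if not ratios:
        raise ValueError("no tasks with nonzero baseline thinking tokens")
    return median(ratios)


def monthly_thinking_cost(tokens_per_task: float, tasks_per_month: int,
                          usd_per_million_tokens: float) -> float:
    """Estimated monthly spend on thinking tokens alone, in USD."""
    return tokens_per_task * tasks_per_month * usd_per_million_tokens / 1_000_000


if __name__ == "__main__":
    # Hypothetical thinking-token counts for the same three tasks on each model.
    k26 = [12_000, 8_000, 20_000]
    k27 = [8_400, 6_000, 13_000]
    reduction = thinking_token_reduction(k26, k27)

    # Hypothetical monthly volume and price; substitute your own numbers.
    tasks, price = 50_000, 2.50
    before = monthly_thinking_cost(sum(k26) / len(k26), tasks, price)
    after = monthly_thinking_cost(sum(k27) / len(k27), tasks, price)
    print(f"median per-task reduction: {reduction:.1%}")
    print(f"estimated monthly thinking cost: ${before:,.2f} -> ${after:,.2f}")
```

Taking the median of per-task ratios keeps a few unusually long tasks from dominating the reduction figure. The cost projection uses the measured averages instead, so comparing the two is a quick sanity check.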

https://venturebeat.com/technology/kimi-k2-7-code-cuts-thinking-tokens-30-practitioners-say-benchmarks-dont-check-out

RSS Hunter • Jun 12