The study used formal verification with the Z3 SMT solver to analyze 3,500 code samples generated by various large language models. Over half of the samples contained at least one proven vulnerability, and many had exploitable weaknesses. No model produced vulnerable code in fewer than 48% of its samples, and industry-standard security tools missed most of the identified flaws. The models often recognized their own errors during review, yet they still produced a significant amount of buggy code.
reddit.com
