Llama 2 avoids errors by staying quiet; GPT-4 gives long, if useless, samples

The article covers a study by computer scientists at the University of California San Diego on how reliably and robustly large language models (LLMs) generate code. The researchers evaluated four code-capable LLMs with an API-misuse checker called RobustAPI, gathering 1,208 coding questions from Stack Overflow that involve 24 commonly used Java APIs and prompting each model with three different question formats. The results showed high rates of API misuse across the board, with OpenAI's GPT-3.5 and GPT-4 exhibiting the highest failure rates. Meta's Llama 2, by contrast, failed less than one percent of the time. The study highlights the importance of assessing the reliability of generated code and the room LLMs still have to improve at producing clean, correct code.
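The failures RobustAPI counts are cases of API misuse: code that calls a Java API without the guards or cleanup the API expects, so it compiles but can break at runtime. As a hypothetical sketch (not an example taken from the study's 24 APIs), the snippet below shows the kind of missing-check pattern such a checker would flag for java.util.Iterator.

```java
import java.util.Iterator;
import java.util.List;

public class ApiMisuseExample {

    // Misuse: calling Iterator.next() without checking hasNext() first.
    // This compiles, but throws NoSuchElementException on an empty list --
    // the sort of missing-guard pattern an API-misuse checker flags.
    static String firstItemUnsafe(List<String> items) {
        Iterator<String> it = items.iterator();
        return it.next();
    }

    // Correct usage: guard the call with hasNext() and handle the empty case.
    static String firstItemSafe(List<String> items) {
        Iterator<String> it = items.iterator();
        if (it.hasNext()) {
            return it.next();
        }
        return "";
    }

    public static void main(String[] args) {
        System.out.println(firstItemSafe(List.of("a", "b"))); // prints "a"
        System.out.println(firstItemSafe(List.of()));         // prints ""
    }
}
```

The unsafe variant is the kind of answer an LLM might produce for a Stack Overflow-style question: it works on the happy path but omits the check the API documentation calls for.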