Large language models struggle to generate clean code

The article discusses a study of the reliability and robustness of code that large language models (LLMs) generate in response to Java coding questions. The study evaluated four code-capable LLMs, including OpenAI's GPT-3.5 and GPT-4, and found high rates of API misuse in their output. It argued that code reliability must be assessed beyond semantic correctness, because code can return the expected result while still misusing an API, and it emphasized using static analysis to ensure full coverage. Llama 2, an open model, performed best, with a failure rate of less than one percent.
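To make "API misuse" concrete, here is a minimal, hypothetical Java sketch of code that is semantically correct on typical inputs yet misuses an API. It is not taken from the study or its benchmark. The class name `FirstElement` and both methods are illustrative assumptions, and the example assumes Java 9 or later for `List.of`.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Optional;

// Hypothetical illustration (not from the study): two ways to fetch the
// first element of a list through an Iterator.
public class FirstElement {

    // Misuse: calls next() without first checking hasNext(). On a non-empty
    // list it returns the right answer, so a test covering only that case
    // passes, but on an empty list it throws NoSuchElementException.
    static <T> T firstUnchecked(List<T> items) {
        Iterator<T> it = items.iterator();
        return it.next();
    }

    // Guarded version: checks hasNext() before calling next() and reports
    // absence explicitly (a null element is also reported as empty).
    static <T> Optional<T> firstChecked(List<T> items) {
        Iterator<T> it = items.iterator();
        return it.hasNext() ? Optional.ofNullable(it.next()) : Optional.empty();
    }

    public static void main(String[] args) {
        List<String> words = List.of("alpha", "beta");
        System.out.println(firstUnchecked(words));               // alpha
        System.out.println(firstChecked(words).get());           // alpha
        System.out.println(firstChecked(List.of()).isPresent()); // false
        try {
            firstUnchecked(List.of());
        } catch (NoSuchElementException e) {
            System.out.println("unchecked version threw NoSuchElementException");
        }
    }
}
```

Both methods agree whenever the list is non-empty, which is why a check based only on expected outputs can miss the problem; a static check that looks for `next()` calls not guarded by `hasNext()` would flag the first method regardless of which inputs are tested.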

Original article: Perhaps AI is going to take away coding jobs of those who trust this tech too much


About Paul Welty

Dr. Paul J. Welty is Vice Provost for Academic Innovation at Emory University and Interim Executive Director of Emory Continuing Education. He leads large-scale initiatives integrating AI and emerging technologies to enhance learning, faculty development, and institutional strategy.

His leadership has driven groundbreaking projects including Emory's first faculty information and action system, AI-driven workforce development programs, and the establishment of the Emory Center for AI Learning. Under his direction, Emory Continuing Education has launched workforce development initiatives, increased organizational productivity, and expanded access to technology-driven education.

A recognized thought leader on AI's implications for education and work, Dr. Welty regularly contributes to public discourse through research and invited presentations. Recent speaking engagements include Simuvaction 2025 (Quebec City) and the ACEN Learning Symposium keynote on "AI at Work."

Dr. Welty holds a Ph.D. in Philosophy from Emory University with a focus on ethics and phenomenology. His career spans technology consulting, academic leadership, and entrepreneurship, consistently driving initiatives that define AI's role in organizational and educational transformation.

This blog examines the intersection of philosophy, technology, and the future of work through analysis that combines conceptual rigor with practical implementation experience.