The article discusses a study by computer scientists at the University of California, San Diego on the reliability and robustness of code generated by large language models (LLMs). The researchers evaluated four code-capable LLMs using an API checker called RobustAPI. They gathered 1,208 coding questions from StackOverflow involving 24 common Java APIs and tested the LLMs with three different types of questions. The results showed high rates of API misuse, with OpenAI's GPT-3.5 and GPT-4 exhibiting the highest failure rates. Meta's Llama 2, by contrast, had a failure rate of less than one percent, though, as the article's subheading notes, largely because it often produced no code at all. The study underscores the importance of assessing the reliability of generated code and the need to improve LLMs' ability to produce code that uses APIs correctly.
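To make "API misuse" concrete, here is a minimal illustrative sketch in Java. It is an assumed example of the general kind of pattern an API checker might flag, not a rule taken from RobustAPI, and the class and method names (`ApiMisuseDemo`, `firstMisuse`, `firstGuarded`) are hypothetical. The misuse calls `Iterator.next()` without first checking `hasNext()`, so it compiles and often works, but throws `NoSuchElementException` on an empty collection. The guarded version checks first.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration of an API misuse pattern; not from RobustAPI.
public class ApiMisuseDemo {

    // Misuse: calls next() without checking hasNext().
    // Throws NoSuchElementException when the list is empty.
    static String firstMisuse(List<String> items) {
        Iterator<String> it = items.iterator();
        return it.next();
    }

    // Guarded use: checks hasNext() before calling next(),
    // returning a fallback value for an empty list.
    static String firstGuarded(List<String> items, String fallback) {
        Iterator<String> it = items.iterator();
        return it.hasNext() ? it.next() : fallback;
    }

    public static void main(String[] args) {
        // The guarded version handles the empty case.
        System.out.println(firstGuarded(List.of(), "none"));

        // The misuse only fails at runtime, on the input it did not anticipate.
        try {
            firstMisuse(List.of());
        } catch (NoSuchElementException e) {
            System.out.println("misuse threw NoSuchElementException");
        }
    }
}
```

Both methods return the same result on a non-empty list. This is why such misuse can look correct in a quick test and still fail in production.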
Llama 2 avoids errors by staying quiet, GPT-4 gives long, if useless, samples
About Me
Visionary leader driving digital transformation across higher education and Fortune 500 companies. Pioneered AI integration at Emory University, including GenAI and AI agents, while spearheading faculty information systems and student entrepreneurship initiatives. Led crisis management during the pandemic, transitioning 200+ courses online and revitalizing continuing education through AI-driven improvements. Designed, built, and launched the Emory Center for Innovation. Combines a Ph.D. in Philosophy with deep tech expertise to navigate the ethical implications of emerging technologies. International experience includes a DAAD fellowship in Germany. Proven track record in thought leadership, workforce development, and driving profitability in diverse sectors.