The Reality Check on AI Agents: A Troubling Picture Emerges
AI-summarised brief · reviewed before publication
Venture capital investment in artificial intelligence (AI) surged to $131.5 billion in 2024, a 52 percent increase over 2023. In the last three months of 2024, more than half of all venture capital worldwide went to AI companies. One of the most hyped areas of the technology is "AI agents," software products designed to complete multi-part tasks on behalf of their human users. Tech companies and corporations have promoted these agents enthusiastically, claiming they will "replace knowledge work" and bring about a "fundamental shift in how businesses operate."
Recent research, however, suggests that AI agents are not living up to those promises. In May, researchers at Carnegie Mellon University released a paper showing that even the top-performing AI agent, Google's Gemini 2.5 Pro, failed to complete real-world office tasks 70 percent of the time. Even when partially completed tasks were factored in, Gemini's failure rate dropped only to 61.7 percent. Most competing agents performed significantly worse. OpenAI's GPT-4o had a failure rate of 91.4 percent, and Meta's Llama-3.1-405b had a failure rate of 92.6 percent. Amazon's Nova-Pro-v1 fared worse still, failing 98.3 percent of its office tasks.
Further evidence of the challenges facing AI agents comes from a recent report by Gartner, a technology consulting firm. The report predicts that more than 40 percent of AI agent projects initiated by businesses will be cancelled by 2027 because of out-of-control costs, vague business value, and unpredictable security risks. According to the report, "most agentic AI projects right now are early stage experiments or proof of concepts that are not yet ready for prime time." Together, these findings cast a shadow over the hype surrounding AI agents and raise questions about the technology's near-term viability.