Who is to blame when AI agents screw up?

Admin May 22, 2025

Over the past year, senior software engineer Jay Prakash Thakur has spent his nights and weekends prototyping AI agents that could, in the near future, order meals and engineer mobile apps almost entirely on their own. While his agents have proved surprisingly capable, they have also exposed new legal questions awaiting companies that try to capitalize on Silicon Valley’s hottest new technology.

Agents are AI programs that can act mostly independently, allowing companies to automate tasks such as answering customer questions or paying invoices. While ChatGPT and similar chatbots can draft emails or analyze bills on request, Microsoft and other tech giants expect agents to handle more complex functions and, most importantly, to do so with minimal human supervision.

The tech industry’s most ambitious plans involve multi-agent systems, with dozens of agents one day teaming up to replace entire workforces. For companies, the benefits are obvious: savings on time and labor costs. Demand for the technology is already rising. Technology market researcher Gartner estimates that agentic AI will resolve 80% of common customer service inquiries by 2029. Fiverr, a service where businesses can book freelance coders, reports that interest in AI agents on its platform has soared 18,347% in recent months.

Thakur is a self-taught coder based in California who wants to be at the forefront of the emerging field. His day job at Microsoft has nothing to do with agents, but he has been tinkering with AutoGen, Microsoft’s software for building agents, since he worked at Amazon in 2024. Thakur said he has developed multi-agent prototypes using AutoGen with only a small amount of programming. Last week, Amazon launched a similar agent development tool called Strands. Google offers what it calls an Agent Development Kit.

Because agents are meant to act autonomously, Thakur’s biggest concern is who bears responsibility when their mistakes cause financial losses. He believes that assigning blame could become contentious when agents from different companies miscommunicate within a single large system. He compared the challenge of reviewing error logs from various agents to reconstructing a conversation from different people’s notes. “It is usually impossible to determine the responsibility,” Thakur said.

Joseph Fireman, a senior legal counsel at OpenAI, said at a recent legal conference hosted by the Media Law Resource Center in San Francisco that aggrieved parties tend to go after those with the deepest pockets. This means companies like his will need to be prepared to take some responsibility when agents cause harm, even when a kid messing around with an agent may be the one to blame. (The thinking goes that if that person were at fault, they would probably not be a worthwhile target financially.) Fireman said he doubts anyone is hoping to collect from a consumer sitting at a computer in their mom’s basement. The insurance industry has begun rolling out coverage for AI chatbot issues to help companies cover the costs of mishaps.

Onion rings

Thakur’s experiments involve stringing agents together in systems that require as little human intervention as possible. One project he pursued aimed to replace fellow software developers with two agents. One was trained to search for the specialized tools needed to make an app, and the other summarized their usage policies. In the future, a third agent could use the identified tools and follow the summarized policies to develop a completely new application, Thakur said.

When Thakur tested his prototype, the search agent found a tool that “supports unlimited requests per minute for enterprise users” (meaning high-paying customers can rely on it as much as they need). However, in trying to distill the key information, the summarization agent dropped the crucial qualification “per minute for enterprise users.” It incorrectly told the coding agent, which did not qualify as an enterprise user, that it could write a program that made unlimited requests to the external service. Because this was a test, no harm was done. Had it happened in real life, the truncated guidance could have caused the entire system to break down unexpectedly.
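The failure described above, where a summary silently drops a scoping phrase, is the kind of error a simple automated check can sometimes catch before the text reaches the next agent. The following is only a minimal sketch and not Thakur’s method: the pattern list and the function names `find_qualifiers` and `dropped_qualifiers` are hypothetical, and a real system would need a far more robust comparison than a few regular expressions.

```python
import re

# Hypothetical patterns for scoping phrases that a summary should preserve.
QUALIFIER_PATTERNS = [
    r"per (?:second|minute|hour|day)",
    r"for [\w-]+ (?:users|customers|plans)",
    r"up to \d+",
]


def find_qualifiers(text):
    """Return the set of scoping phrases found in text, lowercased."""
    found = set()
    for pattern in QUALIFIER_PATTERNS:
        found.update(m.group(0) for m in re.finditer(pattern, text.lower()))
    return found


def dropped_qualifiers(source, summary):
    """Return qualifiers present in source but missing from summary, sorted."""
    return sorted(find_qualifiers(source) - find_qualifiers(summary))


# The policy text quoted in the article, and a summary that loses its scope.
source = "Supports unlimited requests per minute for enterprise users."
summary = "Supports unlimited requests."

missing = dropped_qualifiers(source, summary)
if missing:
    # Flag the summary for review instead of passing it to the coding agent.
    print("Summary dropped qualifiers:", missing)
```

A check like this only flags summaries for human review. It does not settle who is responsible when one slips through.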
