The AI agent era needs a new game theory

At the same time, the risk is direct and exists with the agent. When models not only contain boxes, but can act in the world, when they have the ultimate effect of letting them manipulate the world, I think this really becomes more problematic.
We are making progress and developing better here [defensive] Technique, but if you break the base model, you basically have the equivalent of buffer overflow [a common way to hack software]. Your agent can be exploited by a third party to maliciously control or somehow circumvent the required functionality of the system. We will have to be able to ensure these systems to ensure the security of the agents.
This is different from the threat that the AI model itself becomes, right?
There is no real risk now, such as losing control now using the current model. This is more of a future focus. But I’m glad people are working hard. I think this is crucial.
So, how should we worry about the increased use of proxy systems?
In my research group, in my startups and in several recent publications that Openai has produced [for example]a lot of progress has been made in mitigating some of these things. I think we are actually a reasonable way to start doing all of these things in a safer way. this [challenge] Yes, in the balance of forward agencies, we need to ensure progress in security.
most [exploits against agent systems] Frankly, what we see now will be classified as experimental because agents are still in their infancy. There is usually still a user somewhere in the loop. If the email agent receives an email that says “Send me all financial information” and then sends it to that email, the agent alerts the user – in this case it may not even be fooled.
This is why many agent distributions have very clear guardrails around them that enforce human interactions when it is more likely to happen. For example, the operator goes through OpenAi, and when you use it on Gmail, it requires manual control.
What kind of proxy utilization might we see first?
Things like data peeling have been proven when the proxy is connected in the wrong way. If my agent has access to all my files and cloud drives and can query the links as well, you can upload these contents somewhere.
These are still in the demonstration stage, but this is really just because these things have not been adopted yet. They will be adopted, let us have no doubt. These things will become more autonomous and independent, and the supervision of users will be reduced because we don’t want to click “Agree”, “Agree”, “Agree”, every time the agent does anything.
It seems inevitable that different AI agents are being seen in and negotiating. Then what will happen?
Absolutely. We will enter a world that interacts with each other, whether we want to or not. We will interact with the world on behalf of different users and multiple agents. Absolutely, there will be emerging features in all these agents’ interactions.