Mediating Interactions Between AI and APIs
How can AI agents interact seamlessly with third-party APIs?
AI systems aren't that intelligent when they have to call external APIs. It's true that AI can easily understand your intents. However, it doesn't really know what API to call to meet your demands. There are a few approaches to match intents with APIs, but none of them are standardized. Can there be a standard way to mediate the interaction between AI and APIs? Let's find out what is possible.
This article is brought to you with the help of our supporter: Speakeasy.
Speakeasy provides you with the tools to craft truly developer-friendly integration experiences for your APIs: idiomatic, strongly typed, lightweight & customizable SDKs in 8+ languages, Terraform providers & always-in-sync docs. Increase API user adoption with friction-free integrations.
I first met Daniel Bentes in July 2024. I had published an article sharing how current AI systems can't reliably understand what external services to use based on your intent alone. Daniel left an intriguing comment saying that there could be an intermediary layer "allowing services to register their intents and actions, and enabling AI agents to discover and execute these intents." I asked Daniel to share more, but at that time, an idea was all there was, nothing more.
Daniel is a Norwegian technologist from Oslo. He’s been working at the intersection of product and technology since the early 2000s. He was the creator and co-founder of Orbit Technology, a company building workspace management solutions. Before that, he worked at companies like Telia and Schibsted. I believe Daniel’s diverse background is what’s behind his ability to think outside the box. And that is exactly what he did after seeing the shortcomings of how AI systems interact with APIs.
Fast-forwarding to August 2024, I received a message from Daniel saying there was already a document with a draft of a protocol. I got even more intrigued as I obviously wanted to know more. I began researching the work Daniel had put together and found it fascinating. The work is called the "Unified Intent Mediator Protocol," and its goal is to standardize the way AI agents interact with third-party APIs. According to its documentation, "UIM enhances efficiency, scalability, and reliability for AI-driven applications."
The protocol is split into five key concepts. The first concept, intents, defines what actions third-party APIs can expose. The goal is that an AI system can identify what you, the user, are interested in doing and can then search through existing intents to find a match. The AI system can perform the search using natural language, meaning it could literally take a user's input and use it to find matching intents. What I find interesting is that each intent isn't just a pointer to an API operation. On the contrary, it carries a full description of what it does and how it is called.
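To make the idea of searching intents with natural language more concrete, here is a minimal sketch in Python. It is not taken from the protocol: the `Intent` class, the `search_intents` function, the identifiers, and the simple word-overlap scoring are all assumptions made for illustration. A real implementation would likely use something more capable than keyword matching.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    """Hypothetical, simplified view of an intent (not the protocol's schema)."""
    uid: str
    name: str
    description: str
    tags: list[str] = field(default_factory=list)


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into words, dropping basic punctuation."""
    words = (word.strip(".,!?").lower() for word in text.split())
    return {word for word in words if word}


def search_intents(user_input: str, intents: list[Intent]) -> list[Intent]:
    """Rank intents by how many words they share with the user's input."""
    query = tokenize(user_input)
    scored = []
    for intent in intents:
        vocabulary = (
            tokenize(intent.name)
            | tokenize(intent.description)
            | {tag.lower() for tag in intent.tags}
        )
        score = len(query & vocabulary)
        if score > 0:
            scored.append((score, intent))
    # Highest overlap first; intents with no overlap are left out.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [intent for _, intent in scored]


# Made-up catalog of intents exposed by two imaginary services.
CATALOG = [
    Intent("shop.example:search-products:v1", "Search products",
           "Find products by keyword", ["shopping", "search"]),
    Intent("travel.example:book-flight:v1", "Book flight",
           "Reserve a seat on a flight", ["travel", "booking"]),
]

matches = search_intents("I want to book a flight to Oslo", CATALOG)
for intent in matches:
    print(intent.uid)
```

The point of the sketch is only that the user's own words are the search query. How matching is scored is left open here.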
The second concept of the protocol has to do with intent metadata and parameters. Each intent comes with a comprehensive list of attributes that are all defined in the intent specification. Intents have, among other attributes, an identifier, a name and description, tags, and endpoint information. Intent definitions are part of a wider agent definition stored inside an agents.json document. All this information is used by the system to make requests to third-party APIs.
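Here is a rough sketch of what reading intent definitions from an agents.json document could look like. The attributes mirror the ones listed above (identifier, name, description, tags, endpoint), but the exact key names, the `parameters` shape, and the example URL are my own assumptions, not the protocol's actual schema.

```python
import json

# Hypothetical agents.json content. Key names are illustrative assumptions.
AGENTS_JSON = """
{
  "agent": {"name": "Example Shop", "description": "An imaginary online shop"},
  "intents": [
    {
      "id": "shop.example:search-products:v1",
      "name": "Search products",
      "description": "Find products by keyword",
      "tags": ["shopping", "search"],
      "endpoint": {"url": "https://api.shop.example/products/search", "method": "GET"},
      "parameters": [{"name": "query", "type": "string", "required": true}]
    }
  ]
}
"""

# Attributes the article mentions for every intent (assumed key names).
REQUIRED_ATTRIBUTES = ("id", "name", "description", "tags", "endpoint")


def load_intents(document: str) -> dict[str, dict]:
    """Parse the document and index its intents by identifier."""
    data = json.loads(document)
    intents = {}
    for intent in data.get("intents", []):
        missing = [key for key in REQUIRED_ATTRIBUTES if key not in intent]
        if missing:
            raise ValueError(f"intent is missing attributes: {missing}")
        intents[intent["id"]] = intent
    return intents


intents = load_intents(AGENTS_JSON)
search = intents["shop.example:search-products:v1"]
print(search["endpoint"]["method"], search["endpoint"]["url"])
```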
And that's exactly what the third concept is about. The execute method is a concept that links what users request to intents and then to third-party APIs. It not only executes the external requests but also does other things like input validation, authentication, and response formatting. The protocol also offers guidance on how to handle third-party throttling, retry requests after errors, and reuse previously cached data. Altogether, it looks like a robust integration definition capable of handling most cases. What about authorizations and usage policies? The protocol also offers guidance on those.
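The responsibilities of the execute method can be sketched roughly as follows. This is my own simplified illustration, not the protocol's reference implementation: the `execute` signature, the `ThrottledError` exception, the injected `transport` callable, and the response envelope are all hypothetical. Authentication is left out to keep the sketch short.

```python
import time


class ThrottledError(Exception):
    """Hypothetical error a transport raises when the API rate-limits a call."""


def execute(intent, params, transport, cache, max_retries=3, backoff_seconds=0.0):
    """Validate input, reuse cached data, call the API with retries, format the result."""
    # 1. Input validation: every required parameter must be present.
    for spec in intent["parameters"]:
        if spec.get("required") and spec["name"] not in params:
            raise ValueError(f"missing required parameter: {spec['name']}")

    # 2. Cache lookup (assumes parameter values are hashable).
    cache_key = (intent["id"], tuple(sorted(params.items())))
    if cache_key in cache:
        return cache[cache_key]

    # 3. Call the third-party API, retrying with exponential backoff when throttled.
    for attempt in range(1, max_retries + 1):
        try:
            raw = transport(intent["endpoint"], params)
        except ThrottledError:
            if attempt == max_retries:
                raise
            time.sleep(backoff_seconds * 2 ** (attempt - 1))
            continue
        # 4. Response formatting: wrap the raw payload in a uniform envelope.
        result = {"intent": intent["id"], "data": raw}
        cache[cache_key] = result
        return result
    raise ValueError("max_retries must be at least 1")


# Made-up intent definition and a fake transport that is throttled once.
SEARCH_INTENT = {
    "id": "shop.example:search-products:v1",
    "endpoint": {"url": "https://api.shop.example/products/search", "method": "GET"},
    "parameters": [{"name": "query", "type": "string", "required": True}],
}
calls = {"count": 0}


def flaky_transport(endpoint, params):
    calls["count"] += 1
    if calls["count"] == 1:
        raise ThrottledError("rate limited")
    return [{"title": f"Result for {params['query']}"}]


cache = {}
first = execute(SEARCH_INTENT, {"query": "lamp"}, flaky_transport, cache)
second = execute(SEARCH_INTENT, {"query": "lamp"}, flaky_transport, cache)
print(first, calls["count"])
```

In this run the first call is throttled once and then succeeds on the retry, and the second call is served from the cache without touching the transport again.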
Ensuring secure and compliant interactions is what the fourth key concept documents. Policy Adherence Tokens, or PATs for short, are responsible for encapsulating usage policies, permissions, and obligations. The protocol uses the Open Digital Rights Language (ODRL) to define policies. Each one is an agreement between an AI agent and an API. In short, it specifies what an AI agent is and isn't allowed to do and what its obligations are. PATs are sent with each request to the third-party API as an Authorization bearer token. PATs can even contain things like billing information and data licensing information.
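As a loose illustration of how a policy and a PAT could fit together, here is a small Python sketch. The policy is shaped after ODRL's permission, prohibition, and obligation rules, but the concrete policy, the `is_allowed` helper, the rule that prohibitions win over permissions, and the token value are all assumptions of mine. How a PAT is actually issued and encoded is defined by the protocol, not by this sketch.

```python
# Hypothetical ODRL-shaped policy agreed between an AI agent and an API.
POLICY = {
    "uid": "https://api.shop.example/policies/1",
    "permission": [
        {"action": "execute", "target": "shop.example:search-products:v1"},
    ],
    "prohibition": [
        {"action": "distribute", "target": "shop.example:search-products:v1"},
    ],
    "obligation": [
        {"action": "compensate", "target": "shop.example:search-products:v1"},
    ],
}


def has_rule(rules: list[dict], action: str, target: str) -> bool:
    """Return True if any rule covers this action on this target."""
    return any(r["action"] == action and r["target"] == target for r in rules)


def is_allowed(policy: dict, action: str, target: str) -> bool:
    """Assumed semantics: a prohibition always wins, otherwise a permission is needed."""
    if has_rule(policy.get("prohibition", []), action, target):
        return False
    return has_rule(policy.get("permission", []), action, target)


def authorization_header(pat: str) -> dict[str, str]:
    """Build the header that carries the PAT as a bearer token."""
    return {"Authorization": f"Bearer {pat}"}


TARGET = "shop.example:search-products:v1"
PAT = "example-pat-value"  # placeholder, not a real token

if is_allowed(POLICY, "execute", TARGET):
    headers = authorization_header(PAT)
    print(headers)
```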
The final key concept of the protocol is the AI agent. The documentation defines an AI agent as "an application or service that utilizes intents to interact with web services." AI agents are responsible for finding available intents and APIs, handling PATs and managing the relationship with APIs, and finally executing those intents following any existing policies.
Overall, the "Unified Intent Mediator Protocol" looks promising. At this point, Daniel is looking for feedback through the protocol's GitHub repo. If you find any of these topics interesting and you feel you can contribute, please go ahead and open an issue or a discussion. I feel many of the concepts in the protocol can also be applied in other situations involving APIs, so I'll continue researching it.