You use agentic AI when there's a clear goal that involves executing highly specialized tasks. But what happens when there's more than one clear goal, or when you can split a goal into smaller, specialized ones? Then a single AI agent is not enough, and you'll need a way to integrate several agents into a richer solution. Keep reading to learn what's possible and how to achieve it.
This article is brought to you with the help of our supporter, Apideck.
Apideck helps you integrate with multiple APIs through a single connection. Instead of building and maintaining integrations one by one, use our Unified APIs to connect with hundreds of leading providers across Accounting, HRIS, CRM, Ecommerce, and more.
All connections are built for secure, real-time data exchange, so you can trust that the data you access is always accurate and protected.
Ship integrations 10x faster, reduce dev costs, and accelerate your product roadmap without the integration bottleneck.
To be useful, AI agents need to connect to at least one API. They can do it directly or through one of several mechanisms that simplify the process. LangChain was the first to offer such a solution, in late 2022: with its APIChain class, you can build solutions where LLMs make requests to HTTP APIs. At roughly the same time, OpenAI began working on a series of solutions to interface ChatGPT with external APIs. First came their plugin system, built around a .well-known/ai-plugin.json file that described how an LLM could find which API to use to fulfill the user's needs. Then they moved to function calling, and finally, since 2024, they offer what they call assistant tools. It was also in 2024 that Anthropic, a competitor to OpenAI, released the Model Context Protocol (MCP) openly. MCP changed the rules of the game because it introduced an abstraction layer that sits between the LLM and the APIs it needs to call. Since it's open source, anyone can build on it and help improve it over time. All of this is brilliant if all you want is to connect a single agent to one or more APIs. But what if you want to design your solution around multiple agents that collaborate to reach a solution?
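To make the idea concrete, here's a minimal sketch in Python of the pattern these mechanisms all build on: a tool description the LLM can read, plus a dispatcher that turns the model's tool call into a real HTTP request. The tool name, billing endpoint, and fields are hypothetical, and the JSON-Schema shape only loosely follows the style popularized by function calling.

```python
import requests

# Hypothetical tool description the LLM can choose from. The name, description,
# and parameters follow a JSON-Schema-like shape; none of these values come from
# a real provider.
get_invoice_tool = {
    "type": "function",
    "function": {
        "name": "get_invoice",
        "description": "Fetch an invoice by its identifier from the billing API.",
        "parameters": {
            "type": "object",
            "properties": {
                "invoice_id": {
                    "type": "string",
                    "description": "Identifier of the invoice to fetch",
                },
            },
            "required": ["invoice_id"],
        },
    },
}


def call_tool(name: str, arguments: dict) -> dict:
    """Dispatch a tool call emitted by the LLM to the actual HTTP API."""
    if name == "get_invoice":
        # The billing endpoint is a placeholder for whatever API the agent wraps.
        response = requests.get(
            f"https://billing.example.com/invoices/{arguments['invoice_id']}",
            timeout=10,
        )
        response.raise_for_status()
        return response.json()
    raise ValueError(f"Unknown tool: {name}")
```

Whether the glue is a plugin manifest, function calling, or an MCP server, the job is the same: describe what the API can do in a machine-readable way, then route the model's structured request to a real HTTP call.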
Before analyzing how to make several agents collaborate, I want to understand why I'd want to do that in the first place. Well, if you see an AI agent as a program that can dynamically adapt itself to reach its goal, then it's only natural to think it would try to get help from one or more similar programs. After all, collaboration is one form of organization that increases the odds of succeeding at a given task. One tactic involving collaboration is to split the work between specialized agents so that the outcome of each task is the best possible. Another is to have agents that use different LLMs and different APIs, swapping them depending on cost; here, the goal is to execute the whole workflow as cheaply as possible. I'm sure there are other tactics, but I feel these two illustrate well why you'd want to use multiple agents. So, how can you make those agents talk to each other?
The solution for making agents work together is called the Agent2Agent Protocol or, in short, A2A. At least, that's what Google and a few dozen partners, with names such as Atlassian, Elastic, Oracle, and New Relic, believe. While solutions like MCP simplify the connection between an agent and the APIs it consumes, A2A aims to streamline the communication between multiple agents. In fact, both types of solutions are meant to work together. According to Google, A2A "complements Anthropic's MCP, which provides helpful tools and context to agents." The protocol provides solutions to four key areas of agent interaction: capability discovery, task management, collaboration, and UX negotiation. I find it interesting that the protocol creators thought of providing a full end-to-end experience instead of focusing exclusively on the technology. In the end, what matters is what you can do with the A2A protocol, not how it works.
Starting with capability discovery is important because it guarantees that agents can find other agents even before they communicate with each other. I find this area particularly interesting because it unlocks the ability to map a particular intent to one or more agents. The way it works is based on a mechanism well known in the Web industry: providers can use the .well-known path on their domain to host an "agent card." This is a JSON document that includes attributes like the name of the agent, its description, the URL where other agents can invoke it, authentication information, and a list of its skills. The agent card is made available at the /.well-known/agent.json location, following the convention described in RFC 8615. With this rather simple mechanism, anyone can automatically find out whether a provider offers an agent and, if so, retrieve all of its attributes.
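Here's a minimal sketch of what that discovery step could look like in Python. The domain is hypothetical, and the fields read from the card are just the ones mentioned above; the authoritative list of attributes lives in the A2A specification.

```python
import requests


def discover_agent(domain: str) -> dict:
    """Fetch a provider's agent card from the RFC 8615 well-known location."""
    response = requests.get(f"https://{domain}/.well-known/agent.json", timeout=10)
    response.raise_for_status()
    return response.json()


# Hypothetical provider domain used purely for illustration.
card = discover_agent("agents.example.com")

print(card.get("name"), "-", card.get("description"))
print("Invoke at:", card.get("url"))
for skill in card.get("skills", []):
    print("Skill:", skill.get("name"))
```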
After discovery takes place, agents communicate using JSON-RPC. It's interesting that A2A uses the same underlying API style as MCP. The advantage is that all communication uses schema-driven data, which means that all the involved parties know exactly what to expect. Remember that one big reason to use multiple agents is to split a large objective into smaller, specialized tasks. So it's important that communication focuses on the input parameters and outputs of each task rather than on their format, and that's only possible when the format itself is agreed upon. Standardizing all aspects of communication is, therefore, fundamental.
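To illustrate, here's a rough sketch of handing a task to another agent over JSON-RPC 2.0 at the URL advertised in its agent card. The agent URL is hypothetical, and the method name and payload shape are illustrative rather than a verbatim copy of the A2A specification.

```python
import uuid

import requests


def send_task(agent_url: str, text: str) -> dict:
    """Send a task to a remote agent using a JSON-RPC 2.0 envelope."""
    payload = {
        "jsonrpc": "2.0",
        "id": str(uuid.uuid4()),          # JSON-RPC request identifier
        "method": "tasks/send",            # illustrative method name
        "params": {
            "id": str(uuid.uuid4()),      # task identifier
            "message": {
                "role": "user",
                "parts": [{"type": "text", "text": text}],
            },
        },
    }
    response = requests.post(agent_url, json=payload, timeout=30)
    response.raise_for_status()
    return response.json()


# Hypothetical endpoint taken from a previously discovered agent card.
result = send_task("https://agents.example.com/a2a", "Summarize last quarter's invoices.")
```

Because both the envelope and the task payload are schema-driven, the calling agent only needs to reason about what goes in and what comes out, not about how each peer formats its messages.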
One area of A2A that took me longer to understand is UX negotiation. Essentially, it's the ability agents have to generate artifacts of different media types while executing a whole workflow. Each task can receive multiple types of content that can be manipulated throughout the workflow. Imagine you want a piece of text dictated and then rendered with a specific voice and tone. One agent would receive the text as input and output the dictation. In the following step, another agent would receive the dictation audio as input and return a version modified according to the requested voice and tone. This can happen with any content type and can even pause the workflow until a human user provides the requested information.
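As a rough illustration of that dictation example, here's how the messages flowing through such a workflow might carry parts of different media types. The part shapes and the audio URL are hypothetical, meant only to show the idea of mixing content types, not the exact A2A schema.

```python
# Step 1: the first agent receives plain text and produces an audio artifact.
dictation_request = {
    "role": "user",
    "parts": [
        {"type": "text", "text": "Read this paragraph aloud."},
    ],
}

# Step 2: a second agent receives the audio produced earlier, plus instructions
# about the requested voice and tone, and returns a modified recording.
voice_adjustment_request = {
    "role": "user",
    "parts": [
        {"type": "text", "text": "Re-render this with a calm, formal voice."},
        {
            "type": "file",
            "file": {
                "mimeType": "audio/mpeg",
                "uri": "https://example.com/draft-dictation.mp3",  # placeholder artifact
            },
        },
    ],
}
```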
In summary, integrating multiple AI agents is possible thanks to the combination of A2A, MCP, and APIs. While A2A and MCP work on top of JSON-RPC, which provides a layer of trust through standardization, the real functionality lies behind existing APIs of any kind.