"Do androids dream of electric sheep?" Asked Philip K. Dick in his 1968 homonymous novel. According to the story, the only way to tell an android apart from a human was through an empathy test. Empathy, as it turns out, is a good way to detect human behavior. In the case of androids, other types of detection aren't possible because they have a body that looks like a human. Luckily for us, we're not yet at that point. AI agents don't have bodies. However, they communicate. And, sometimes, they do it in a way that sounds a lot like a human. What languages do they use? Stay with me to find out.
This article is brought to you with the help of our supporter: Speakeasy.
Further expanding its best-in-class API tooling, Speakeasy now empowers teams with open standards. The platform simplifies OpenAPI Overlay adoption so you can focus on building. Click the button below to check out the playground.
To understand how AI agents communicate, we first need to know what they are. Let's first dissect the meaning of "agent," and then see how it relates to the concept of artificial intelligence. According to Canadian-American psychologist Albert Bandura, agents are individuals who actively influence their own functioning. Agents are self-regulated, meaning they can adapt themselves and regulate their motivation and behavior. They're also proactive in the sense that they can create plans and take action to achieve their goals. We, human beings, are agents at our core: we have the power to control ourselves and the environment around us in an attempt to achieve our objectives. Depending on specific life-related factors, the level of agency we have can vary, but the agency itself is undeniable. What about machines?
Machines can be as agentic as we want them to be. There are interesting differences between machines that are merely programmed to do something and the ones we consider agentic. The classification works like a spectrum: on one end you have fully programmable machines, and on the other you have fully agentic ones. What determines how agentic a machine is comes down to flexibility, decision-making, handling change, scalability, and interaction with external functions, e.g., APIs. So, while a programmed machine strictly follows what it's been programmed to do, an agentic one can adapt dynamically to its environment. Programmed machines work on a set of predefined conditions, while agentic ones rely on AI-driven reasoning to decide what to do next. Whenever there's a change, someone needs to update a programmed machine, while agentic ones can evolve and update themselves. Programmed machines are, evidently, limited to the scenarios they've been programmed for, while agentic ones can expand and adapt dynamically to whatever scenario they encounter. Finally, while programmed machines can only interact with specific external functions and APIs, agentic ones dynamically decide which functions and APIs to consume.
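To make the contrast concrete, here is a minimal Python sketch. Everything in it is hypothetical: the tools (fetch_weather, send_email) and the choose_next_action "reasoning" are illustrative stand-ins, not a real agent framework or API.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical tools; names and behavior are illustrative stand-ins, not a real API.
def fetch_weather(city: str) -> str:
    return f"Sunny in {city}"

def send_email(recipient: str, body: str) -> str:
    return f"Sent '{body}' to {recipient}"

def programmed_machine(city: str) -> str:
    # Programmed: a fixed, predefined sequence that a person must update by hand.
    forecast = fetch_weather(city)
    return send_email("team@example.com", forecast)

@dataclass
class Action:
    tool: str
    args: tuple

def choose_next_action(goal: str, history: list[str]) -> Action | None:
    # Stand-in for AI-driven reasoning: a real agent would ask a model which
    # tool to call next; here the decision is faked for illustration.
    if not history:
        return Action("fetch_weather", ("Lisbon",))
    if len(history) == 1:
        return Action("send_email", ("team@example.com", history[-1]))
    return None  # goal considered met: stop

def agentic_machine(goal: str) -> list[str]:
    # Agentic: the machine decides dynamically which function or API to call next.
    tools: dict[str, Callable[..., str]] = {
        "fetch_weather": fetch_weather,
        "send_email": send_email,
    }
    history: list[str] = []
    while (action := choose_next_action(goal, history)) is not None:
        history.append(tools[action.tool](*action.args))
    return history

print(programmed_machine("Lisbon"))
print(agentic_machine("Let the team know about tomorrow's weather"))
```

The point is the control flow: the programmed version hard-codes the sequence, while the agentic version asks a decision step which tool to call next, so new tools or goals don't require rewriting the loop.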
What we call "AI agents" are, in fact, and more generically, agentic machines. They act in a way that resembles how humans behave. How do they communicate, though? Or, more importantly, how do we communicate with them? Those questions leave us with the definition of language, in the context of communicating between machines and between us and them. Language as a concept is not what's important. Because agentic machines can use translation functions, they can understand any language. What's important is not the syntax of the language but, instead, its semantics. What's important is the details of how the communication happens. Do we tell an agentic machine what we want it to do? Or, how we want it to perform? Or, do we tell it what we need, and it will figure out what it has to do and how to get it done? What about the communication between agentic machines? Do they "program" each other? Or, do they follow the same approach as we do? So many questions worth pondering.
Let's start with how we, humans, communicate with AI agents. Ideally, we want to tell machines what we need, not what we want them to do. This style of communication is what HCI researcher Jakob Nielsen calls "intent-based outcome specification." According to Nielsen, "the user tells the computer the desired result but does not specify how this outcome should be accomplished." Is this how humans currently interact with AI agents? Not really. Most AI prompts tell machines what to do. And, in many situations, prompts even explain how to get things done. This type of interaction can be even worse than using a GUI because commands take longer to execute and are prone to errors of interpretation. So, why do we insist on using it? Probably because we're coming from a reality of command-based interactions and we assume that AI agents work the same way.
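As an illustration, here are two prompts asking for the same result; the wording is made up for this example and not taken from any real system.

```python
# Hypothetical prompts, written for this example only, asking for the same result.

# Command-based: the user spells out what to do and how to do it.
command_prompt = (
    "Open the sales spreadsheet, filter the Q3 rows, compute the average "
    "deal size per region, and paste the results into a new tab named 'Q3'."
)

# Intent-based: the user states the desired outcome and lets the agent
# decide the steps (Nielsen's intent-based outcome specification).
intent_prompt = "I need to know how average deal size varied across regions in Q3."
```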
So, if AI agents can understand our intent or command-based communication, how do they speak with each other? Here, the choice of language changes with the capabilities of the machines. If one AI agent needs to communicate with another one, it will have to use a prompt, because that's what agents understand. In that case, the first agent needs to craft a prompt, probably using an intent-based communication style, and send it to the second agent. On the other hand, if an AI agent needs to communicate with a non-AI machine, it will use the specific language that the second machine understands. If the AI agent communicates with a REST API, for example, it will use HTTP and make the necessary requests to perform the operations it needs. It's natural that AI agents are better at communicating using prompts. After all, that's the type of communication they're built on. So, whenever they need to cross into the world of non-AI machines, they need help with "translation." This is what things like MCP or, before that, OpenAI's x-oaiMeta have been doing.
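Here is a small sketch of those two "languages." The message shapes, endpoint, and helper names are assumptions for illustration only; they don't reflect MCP, OpenAI's format, or any specific framework.

```python
import json

# Assumed message shapes for illustration only: the endpoint, payload, and
# helper names are hypothetical, not MCP or any specific framework.

def talk_to_ai_agent(intent: str) -> dict:
    # Agent-to-agent: wrap the intent in a natural-language prompt.
    return {"role": "user", "content": f"Please handle this request: {intent}"}

def talk_to_rest_api(order: dict) -> dict:
    # Agent-to-non-AI machine: translate the intent into the concrete HTTP
    # request the receiving REST API understands (method, path, JSON body).
    return {
        "method": "POST",
        "url": "https://api.example.com/v1/orders",  # hypothetical endpoint
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(order),
    }

print(talk_to_ai_agent("order 3 reams of A4 paper for the Lisbon office"))
print(talk_to_rest_api({"item": "A4 paper", "quantity": 3, "office": "Lisbon"}))
```

The first function stays in natural language; the second has to commit to a concrete method, path, and JSON body, which is exactly the kind of translation the bridging layers mentioned above aim to handle.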
In summary, prompts are the most popular communication style. However, people still follow a command-based approach instead of telling AI agents what they need and adopting an intent-based one. Communication between machines varies: it follows a prompt style whenever all participants are AI-based, and the style of the receiving end, e.g., a REST API, otherwise. We're still a long way from having all participants communicate in the same language and, until then, translation devices like MCP will thrive.