Data Models, Types, or Schemas?
What do you call the definition of an API input and output payload?
Naming things is hard. I've recently been involved in conversations about the definition of API input and output data. While the topic didn't seem too complicated, I immediately noticed an unpleasant struggle: different people were using different names to refer to the definition of data. To some people, "data model" was the correct term, while others used "data types" instead. There were even others who used just the word "schema." I'm one of them, and finding these alternatives intrigued me to the point of wanting to share my opinion here. Read on.
This article is brought to you with the help of our supporter: Speakeasy.
There is only one way to determine if something is correct: being pedantic about its meaning. At least, that's how I feel whenever I find myself in an argument about the definition of something. The other option is to defend your taste even if you know you're wrong in the first place. I prefer being pedantic, if you know what I mean. So, let's see what each of the options means from a purely dictionary perspective.
A data model is nothing more than a representation, often simplified, of the way particular information exists and circulates. Data models exist to help people and machines make sense of the way data is organized. One of the key elements of a data model is a definition of each of the properties it represents. For example, say you're creating a data model of a person. You'd want to define what a person's name looks like: how many characters it can have, what kind of words it can be made of, and so on. The same thing happens with, for instance, a person's age. You'd want to define how to represent it, whether in years, months, or something else. All these constraints and definitions constitute a data model. The bad thing about the word "model" is that it's very popular, just not in the way we'd wish. Almost all the time, people use "model" in the context of AI, as in a large language model.
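To make the person example concrete, here is a minimal sketch of such a data model in Python. The class name, the field names, and every limit (100 characters, 150 years, the allowed characters) are assumptions I picked for illustration, not rules coming from any specification.

```python
from dataclasses import dataclass

# Hypothetical constraints, chosen only for illustration.
MAX_NAME_LENGTH = 100
MAX_AGE_YEARS = 150


@dataclass(frozen=True)
class Person:
    name: str       # letters, spaces, hyphens, and apostrophes only
    age_years: int  # age represented in whole years

    def __post_init__(self) -> None:
        # Constraint on how many characters a name can have.
        if not 1 <= len(self.name) <= MAX_NAME_LENGTH:
            raise ValueError("name must have between 1 and 100 characters")
        # Constraint on what a name can be made of.
        if not all(ch.isalpha() or ch in " -'" for ch in self.name):
            raise ValueError("name contains unsupported characters")
        # Constraint on how age is represented.
        if not 0 <= self.age_years <= MAX_AGE_YEARS:
            raise ValueError("age_years must be between 0 and 150")
```

Creating `Person(name="Ada Lovelace", age_years=36)` succeeds, while an empty name or a negative age raises a `ValueError`. The model is the combination of the properties and the constraints attached to them.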
Data type is the second option we have. We're lucky because there's a direct definition of "data type" in the dictionary. It says that a data type is a specific kind of data item, as defined by the values it can store. In a way, data types are what we were referring to before when we were trying to come up with the properties of a person. You define each property of a person by a data type. A person's name, age, gender, and so on are defined by data types. Still, I think it's safe to say we can discard this option. People, at least the ones I interact with the most, usually associate "data type" with primitive types such as numbers, strings, and booleans. Yes, API input and output data can be as simple as a string, but that's not the most common case. Input and output payloads are often objects formed by primitive data types and even by other objects and arrays.
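Here is a small sketch of what I mean by a payload that goes beyond primitives, written with Python type hints. `Address`, `PersonPayload`, and all the field names are hypothetical and exist only for this example.

```python
from typing import List, TypedDict


class Address(TypedDict):
    city: str
    country: str


class PersonPayload(TypedDict):
    name: str             # primitive: string
    age: int              # primitive: number
    subscribed: bool      # primitive: boolean
    address: Address      # another object
    nicknames: List[str]  # an array of primitives


payload: PersonPayload = {
    "name": "Ada Lovelace",
    "age": 36,
    "subscribed": True,
    "address": {"city": "London", "country": "UK"},
    "nicknames": ["Ada"],
}
```

Each leaf is a primitive data type, but the thing being defined is the whole structure, and that is where "data type" starts to feel too narrow.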
That leaves us with the third option: data schema. Schema is a word whose meaning is very similar to the word model. In fact, the two could be used interchangeably. The problem is that people use the word schema to refer to multiple things, so sometimes there's confusion about what it really means. For instance, some people call an OpenAPI document a schema. To other people, a schema is a way to validate a JSON document (in reference to JSON Schema). However, schema is the word the OpenAPI specification officially uses to refer to how it lets you configure payloads. According to the OpenAPI specification version 3.1.1, a schema is "a formal description of syntax and structure." It's through its Schema Object that you can define input and output payloads. According to the specification, payloads can be "objects, but also primitives and arrays." The AsyncAPI specification version 3.0.0 follows a similar approach. According to its reference, a Schema Object "allows the definition of input and output data types." GraphQL also uses the word schema to refer to "what data can be queried from the API." Even gRPC follows a path that is somewhat similar to the other specifications. Protocol buffers have "a fully reflective schema that you can use to implement self-description."
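To illustrate how a schema describes a payload made of objects, primitives, and arrays, here is a rough sketch in Python. The `person_schema` dictionary borrows the `type`, `required`, `properties`, and `items` keywords from JSON Schema, which OpenAPI 3.1 builds on. The `validate` function is a toy I wrote for this example. It covers only those four keywords and is not a real JSON Schema implementation.

```python
from typing import Any, Dict, List

# Maps the schema "type" keyword to Python types.
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "boolean": bool,
}

# Hypothetical schema for a person payload.
person_schema: Dict[str, Any] = {
    "type": "object",
    "required": ["name", "age"],
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "nicknames": {"type": "array", "items": {"type": "string"}},
    },
}


def validate(value: Any, schema: Dict[str, Any], path: str = "$") -> List[str]:
    """Return a list of error messages; an empty list means the value conforms."""
    expected = schema.get("type")
    if expected is not None:
        # In Python, bool is a subclass of int, so reject True/False as integers.
        is_bool_as_int = expected == "integer" and isinstance(value, bool)
        if not isinstance(value, _TYPES[expected]) or is_bool_as_int:
            return [f"{path}: expected {expected}"]
    errors: List[str] = []
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}.{key}: missing required property")
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value:
                errors.extend(validate(value[key], sub_schema, f"{path}.{key}"))
    elif isinstance(value, list) and "items" in schema:
        for index, item in enumerate(value):
            errors.extend(validate(item, schema["items"], f"{path}[{index}]"))
    return errors
```

Calling `validate({"name": "Ada", "age": 36}, person_schema)` returns an empty list, while `validate({"name": "Ada", "age": "36"}, person_schema)` returns `["$.age: expected integer"]`. The same dictionary describes the object, its primitive properties, and the array inside it.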
It sure looks like "data schema" is the best way to refer to the definition of API input and output payloads. I'll stick to it from now on.