Orphan, Shadow, and Zombie APIs
Why do we use such complicated terminology when referring to API states?
APIs that go rogue are almost impossible to tame. Managing the APIs you already know can be challenging enough, so imagine the difficulty of controlling the ones you don't even know exist. What I find even more confusing is the terminology we use to refer to the different types of unknown APIs. Read on if, like me, you often feel confused when someone says an API is an orphan, shadow, or zombie.
This article is brought to you with the help of our supporter: Speakeasy.
Speakeasy provides you with the tools to craft truly developer-friendly integration experiences for your APIs: idiomatic, strongly typed, lightweight & customizable SDKs in 8+ languages, Terraform providers & always-in-sync docs. Increase API user adoption with friction-free integrations.
Calling an API "rogue" isn't something I hear every day, thankfully. In the existing terminology, a rogue API is one with unpredictable and often dangerous behavior. You wouldn't want to come across too many rogue APIs, would you? But if you do, you at least want to know how rogue each one is. Is an API rogue because its behavior is erratic? Is it dangerous in a way that can damage the business? Or is it a threat from a security perspective?
Thankfully (for many people, though not for me), there's a way to categorize how rogue an API is. There are three broad categories you can use to identify rogue APIs: shadow, orphan, and zombie. Each one focuses on a particular characteristic. However, some APIs fall into more than one category, which makes them even more rogue. I never like referring to APIs as rogue. Even less do I like thinking of an API as a zombie, or as belonging to any of the other categories, for that matter.
Let's get into the definitions of the three categories, starting with "orphan." APIs that fall into this category are documented, published, and deployed but don't receive any traffic. They can be full APIs or just individual operations. What matters is that no one is using them. If no one uses a whole API or some of its operations, you're wasting resources and unnecessarily increasing the perceived complexity of your documentation. To begin with, you have to maintain code that no one is using, which increases the chances of bugs and security risks. Then, you end up allocating infrastructure that's never consumed, infrastructure you could allocate to more important projects instead. Finally, your API documentation is more complex than it needs to be because it includes operations that no one uses.
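To make the definition concrete, here's a minimal sketch of how you could flag orphan operations. It assumes you already have two things: a catalog of documented operations as (method, path template) pairs, and a list of requests that a gateway or log pipeline has already mapped to those operations. `CATALOG`, `observed_operations`, and the `find_orphans` helper are all hypothetical. The idea is simply that any documented operation with zero hits over the observation window is a candidate orphan.

```python
from collections import Counter

# Hypothetical catalog: documented operations as (HTTP method, path template) pairs.
CATALOG = {
    ("GET", "/users/{id}"),
    ("POST", "/users"),
    ("GET", "/reports/legacy"),
}

# Hypothetical traffic, already mapped to catalog operations by a gateway or log pipeline.
observed_operations = [
    ("GET", "/users/{id}"),
    ("GET", "/users/{id}"),
    ("POST", "/users"),
]


def find_orphans(catalog, observed):
    """Return documented operations that received no requests in the observed window."""
    hits = Counter(observed)
    # Counter returns 0 for keys it has never seen, so unused operations show up as zero.
    return sorted(op for op in catalog if hits[op] == 0)


for method, path in find_orphans(CATALOG, observed_operations):
    print(f"No traffic: {method} {path}")
```

A zero count only means there was no traffic in the window you observed, so it's worth looking at a long enough period before deciding to retire anything.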
In summary, APIs that just sit there with no consumers are a bad thing because they waste precious resources. I totally agree with this view. What I don't buy is the name of the category. In my mind, I associate the word orphan with something that doesn't have a parent. An orphan is a living being with no parents, not something idle, waiting for someone to use it. I would call these APIs "inactive" or perhaps "unused." But orphan? That feels like the wrong analogy to me. When I hear orphan, I think of an API that's not being managed, one that might not even have any documentation. There's a different name for those, in fact. They're called shadow APIs.
APIs that exist but aren't actively managed are called "shadow." This category includes APIs that don't have any documentation, don't have any documented authorization, and aren't instrumented. These APIs exist, and consumers are using them. However, they're not on the list of APIs that are actively managed and monitored. These APIs are hard to detect, and they're even harder to document if you don't have access to their source code. So, how do you find them if they're not instrumented? One approach that's been gaining popularity uses eBPF, a Linux kernel technology that runs small, sandboxed programs inside the kernel without requiring changes to the application's code. By tapping into network connections at that level, you might be able to see requests to endpoints that aren't part of your API catalog.
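Capturing traffic with eBPF is beyond the scope of this article, but the comparison step that follows is easy to sketch. The example below assumes a sensor has already exported observed requests as (method, path) pairs, and that your catalog lists operations as path templates such as `/users/{id}`. The data and the helper names (`template_to_regex`, `find_shadow_candidates`) are hypothetical. Each template becomes a regular expression in which a parameter matches exactly one path segment, and any observed request that matches no documented operation is reported as a shadow API candidate.

```python
import re

# Hypothetical catalog: documented operations as (HTTP method, path template) pairs.
CATALOG = [
    ("GET", "/users/{id}"),
    ("POST", "/users"),
]

# Hypothetical requests exported by a network sensor (for example, one built on eBPF).
observed_requests = [
    ("GET", "/users/42"),
    ("POST", "/users"),
    ("GET", "/internal/debug/cache?flush=1"),
    ("DELETE", "/users/42"),
]


def template_to_regex(template):
    """Compile a template like /users/{id} so that each {param} matches exactly one path segment."""
    parts = []
    for segment in template.strip("/").split("/"):
        if segment.startswith("{") and segment.endswith("}"):
            parts.append("[^/]+")  # a path parameter: any single segment
        else:
            parts.append(re.escape(segment))  # a literal segment
    return re.compile("/" + "/".join(parts))


COMPILED_CATALOG = [(method, template_to_regex(path)) for method, path in CATALOG]


def find_shadow_candidates(requests, compiled_catalog):
    """Return observed (method, path) pairs that match no documented operation."""
    unknown = set()
    for method, raw_path in requests:
        path = raw_path.split("?", 1)[0]  # ignore the query string when matching
        if not any(m == method and rx.fullmatch(path) for m, rx in compiled_catalog):
            unknown.add((method, path))
    return sorted(unknown)


for method, path in find_shadow_candidates(observed_requests, COMPILED_CATALOG):
    print(f"Not in catalog: {method} {path}")
```

Matching on templates matters here: without it, every concrete ID like `/users/42` would look like an undocumented endpoint. Anything the sketch reports is only a candidate; someone still has to confirm whether it's a real, unmanaged API.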
I'm not fond of the term "shadow" in this context because it makes me feel these APIs are somehow related to others. Perhaps a better way to identify these APIs would be to call them "invisible" or even "concealed." But shadow? That doesn't convey that these APIs aren't known at all. While some of these invisible APIs have never been documented, others have been marked as deprecated but still have traffic flowing through them.
So-called "zombie" APIs are the ones that existed at some point and were explicitly marked as deprecated. However, somehow, no one ever deactivated them during the process. The APIs kept running, using infrastructure and other resources. Because consumers couldn't be bothered to migrate to better options, they kept making requests as before. The only difference is that no one is taking care of these APIs anymore. It's as if they've been abandoned but never disabled for good. To me, these aren't "zombies." Being a zombie means that you died and then came back to life. These APIs have never died, have they?
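What separates these APIs from the "orphan" ones defined earlier is that they still receive traffic after being deprecated. Here's a minimal sketch of that check, assuming a hypothetical catalog where each operation optionally records a deprecation date, and request records that are already mapped to operations along with the day each request happened. The `Operation` class, the sample data, and `find_zombies` are all illustrative, not part of any particular tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Operation:
    method: str
    path: str
    deprecated_on: Optional[date] = None  # None means the operation isn't deprecated


# Hypothetical catalog entries with optional deprecation dates.
CATALOG = [
    Operation("GET", "/v1/orders/{id}", deprecated_on=date(2023, 1, 31)),
    Operation("GET", "/v2/orders/{id}"),
    Operation("GET", "/v1/invoices", deprecated_on=date(2023, 6, 30)),
]

# Hypothetical request records: (method, path template, day of the request).
request_log = [
    ("GET", "/v1/orders/{id}", date(2024, 3, 2)),
    ("GET", "/v2/orders/{id}", date(2024, 3, 2)),
    ("GET", "/v1/invoices", date(2023, 5, 1)),
]


def find_zombies(catalog, records):
    """Return (operation, last_seen) for deprecated operations called after their deprecation date."""
    last_seen = {}
    for method, path, day in records:
        key = (method, path)
        # Keep only the most recent day each operation was called.
        if key not in last_seen or day > last_seen[key]:
            last_seen[key] = day

    zombies = []
    for op in catalog:
        seen = last_seen.get((op.method, op.path))
        if op.deprecated_on is not None and seen is not None and seen > op.deprecated_on:
            zombies.append((op, seen))
    return zombies


for op, seen in find_zombies(CATALOG, request_log):
    print(f"Deprecated on {op.deprecated_on}, still called on {seen}: {op.method} {op.path}")
```

In this sample, `/v1/invoices` is deprecated but its last request predates the deprecation date, so it isn't flagged; only the operation that kept receiving traffic afterward is.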
A better name for these APIs, in my opinion, would be "orphan." These APIs are still running, but no one is looking after them. Their parents stopped caring for them and even announced their estrangement, but never went as far as actually shutting them down. It could be that the teams in charge of these APIs were disbanded, or that priorities shifted elsewhere. Whatever the reason, one thing is true: these APIs were left running and have never been killed.
In conclusion, I feel the names "orphan," "shadow," and "zombie" aren't the best fit for what they refer to. A better option would be to call an orphan API "inactive," a shadow API "invisible," and a zombie API "orphan." I know, naming things is complicated. But using ambiguous names doesn't help anyone. What do you think?