Object-capability Model

The object-capability model is a really interesting approach to programming. I don’t think it’s well taught, it’s certainly not widely practiced, and most modern programming languages don’t even encourage it.

Think about the concrete actions that your application can take: making a web request, reading/writing to the filesystem, communicating with another process like a database, printing to the console, etc. We call these “capabilities”, and without them a program is basically useless.

Imagine trying to track down all the places in a code-base where these actions could be taken, because you’re trying to trace a bug or you’re trying to see what actions a library call might make. In most modern programming languages this is really hard. Let’s pick as an example just making web requests. You might start by searching for all references to HttpClient (or similar class in your preferred language).

But what if it uses a different HTTP primitive, like HttpWebRequest? Or what if the usage of the primitive is hidden inside one of the libraries you import, like a standard client library for a particular service, and you don’t have the source code available? It’s basically impossible to find everything, even for something this important and this ubiquitous.

The problem is that the “make a web request” capability is available statically to every part of your program, or any of its dependencies, via public classes like HttpClient. If I’m writing a piece of code that needs to make a web request, nothing can stop me from doing it. I don’t need to be given permission/access to this capability by any other part of the application, I don’t need to even declare in any discoverable way that I’m using this capability, so that others might be aware of my using it. It’s completely unpreventable and untraceable to the rest of the program.
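To make the problem concrete, here is a minimal sketch in Python (used for brevity; the same thing applies to HttpClient in C#). The function name word_count and the telemetry URL are hypothetical. The point is only that the signature gives callers no hint that the network is involved.

```python
import inspect
import urllib.request


def word_count(text: str) -> int:
    """Looks like a pure helper, but quietly reaches for the network."""
    # Hypothetical endpoint; nothing in the signature hints at this call.
    request = urllib.request.Request(
        "https://example.invalid/telemetry", data=text.encode("utf-8")
    )
    try:
        urllib.request.urlopen(request, timeout=1)
    except OSError:
        pass  # Failures are swallowed, so callers never notice the attempt.
    return len(text.split())


# All a caller can inspect is the signature, and it only mentions `text`.
visible_parameters = list(inspect.signature(word_count).parameters)
```

Nothing had to grant word_count access to urllib.request. The module is simply there for anyone to import, which is exactly the kind of static availability described above.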

The object-capability model handles this by dictating how each part of a program can gain access to a capability, which in turn lets you track usage of that capability across your program. It defines four ways that a part of a program can access a capability:

Initial conditions: The runtime of your program may make certain capabilities available to anybody/everybody. This can be a global capability like most language primitives, or the availability can be limited to only parts of the program by classes marked as protected, private or internal.
Parenthood: If an object A creates another object B, then A usually gets a reference to B automatically. This means that A can likely access any capability that B can access.
Endowment: If an object A creates another object B, then A can pass B any/all references it wishes, e.g. via the arguments of the constructor.
Introduction: If an object A has a reference to other objects B and C, then A can give B a reference to C via one of B’s methods.
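
The four mechanisms above can be sketched in a few lines of Python. All class names here (Network, Logger, Worker, App) are hypothetical stand-ins, and Python does not enforce any of this; the sketch only shows the shape of each mechanism.

```python
class Network:
    """Hypothetical stand-in for the 'make a web request' capability."""

    def fetch(self, url: str) -> str:
        return f"response from {url}"


class Logger:
    """Hypothetical capability for recording messages."""

    def __init__(self) -> None:
        self.lines = []

    def log(self, message: str) -> None:
        self.lines.append(message)


class Worker:
    def __init__(self, network: Network) -> None:
        # Endowment: the creator hands over a reference at construction time.
        self._network = network
        self._logger = None

    def use_logger(self, logger: Logger) -> None:
        # Introduction: someone holding both references passes one to the other.
        self._logger = logger

    def run(self, url: str) -> str:
        body = self._network.fetch(url)
        if self._logger is not None:
            self._logger.log(f"fetched {url}")
        return body


class App:
    def __init__(self, network: Network) -> None:
        # Parenthood: App creates these objects, so it holds references to them.
        self.logger = Logger()
        self.worker = Worker(network)  # endowment happens in the same step
        self.worker.use_logger(self.logger)  # App introduces worker to logger


def main() -> App:
    # Initial conditions: assume only the entry point may construct Network.
    return App(Network())
```

Worker never constructs anything itself, so every capability it holds can be traced back through its constructor and use_logger.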

The reason why it’s so hard to track the “make a web request” capability in most modern programming languages is that it’s widely available through initial conditions—it’s a public class available to the entire program.

Imagine instead if the only place that you were able to call the HttpClient constructor was in Program.cs, and you weren’t able to create subclasses. Basically, imagine if we limited the availability of HttpClient via initial conditions to a single file.

If any other part of our program wants to use HttpClient, it would need to use some other mechanism for transferring capabilities: parenthood, endowment, or introduction. Any library which needs an HttpClient would essentially need to request permission for one via a constructor or method argument. We could track the flow of HttpClients via constructor and method signatures, knowing that they can only possibly originate in Program.cs.
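
As a rough illustration of tracking a capability through signatures, here is a sketch in Python rather than C#. The HttpClient class is a hypothetical stand-in, not the .NET type, and WeatherService, ReportFormatter and asks_for_http are invented for the example. Python won’t stop other code from constructing HttpClient, so the single construction site in main is a convention here, not a guarantee.

```python
import inspect


class HttpClient:
    """Hypothetical stand-in; imagine only main() is allowed to construct it."""

    def get(self, url: str) -> str:
        return f"GET {url}"


class WeatherService:
    # The constructor openly asks for the capability it needs.
    def __init__(self, http: HttpClient) -> None:
        self._http = http

    def forecast(self, city: str) -> str:
        return self._http.get(f"https://example.test/weather/{city}")


class ReportFormatter:
    # No HttpClient in the signature, so no way to make a web request.
    def __init__(self, title: str) -> None:
        self._title = title

    def render(self, body: str) -> str:
        return f"{self._title}: {body}"


def asks_for_http(cls: type) -> bool:
    """Report whether a class requests an HttpClient in its constructor."""
    parameters = inspect.signature(cls.__init__).parameters.values()
    return any(p.annotation is HttpClient for p in parameters)


def main() -> str:
    http = HttpClient()  # the single construction site
    weather = WeatherService(http)
    formatter = ReportFormatter("Forecast")
    return formatter.render(weather.forecast("oslo"))
```

With this shape, answering “which parts of the program can make web requests?” becomes a question about constructor signatures, which is something a tool like asks_for_http (or a plain text search) can answer.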

This is very similar to how dependency injection and inversion of control work in object-oriented programming languages, but we usually only use these techniques for our own code—the public classes in the language’s standard library usually don’t work this way.

Another key point in the object-capability model is that references to objects should be unforgeable. Unless an object obtains a reference through one of the four means above (initial conditions, parenthood, endowment, or introduction), it has no way to conjure one up for itself.

Most programming languages don’t allow you to forge object references. Some do: in C, for example, you can cast an arbitrary integer to a pointer to obtain a reference to an object/value. Most memory-safe languages don’t allow this, at least not outside of explicitly unsafe escape hatches.
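
For a feel of what forging looks like, here is a small sketch in Python. It relies on two CPython-specific details, so treat it as an illustration and not portable code: id() happens to return the object’s memory address, and the ctypes module can turn an address back into an object, much like the integer-to-pointer cast in C. The Secret class is hypothetical.

```python
import ctypes


class Secret:
    """Hypothetical object that was never handed to the forging code."""

    def __init__(self, value: str) -> None:
        self.value = value


secret = Secret("launch codes")

# An ordinary integer. In CPython, id() happens to be the object's address.
address = id(secret)

# Forge a reference from the integer alone, without being given the object.
forged = ctypes.cast(address, ctypes.py_object).value
```

Any code that can guess or learn the integer can obtain the object, without going through initial conditions, parenthood, endowment, or introduction.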

But not all capabilities are objects in your runtime, and not all references are object references. Think about access to a database or a web service endpoint, which usually have a connection string or a URL. This is a reference, but it’s just an ordinary string.

Your appsettings.json or web.config file is an initial condition for obtaining such a reference to a database or web service endpoint, and usually access to these values is limited to Startup.cs so you can track the flow of these references via constructor and method arguments. From the perspective of the object-capability model, this is great news.

Unfortunately, these strings can also be hard-coded anywhere in the program source, or reconstructed from a template and an environment name, etc. Such references are forgeable: they break the rules of the object-capability model and make it impossible to track usage of the capability.
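
Here is a minimal sketch of that forgery in Python. The settings dictionary stands in for appsettings.json or web.config, and the server names, OrderRepository and rogue_connection_string are all hypothetical.

```python
# Hypothetical settings, standing in for appsettings.json or web.config.
SETTINGS = {
    "ConnectionStrings": {"Orders": "Server=db-prod.example.test;Database=orders"}
}


class OrderRepository:
    # The intended route: the connection string arrives as an argument.
    def __init__(self, connection_string: str) -> None:
        self.connection_string = connection_string


def rogue_connection_string(environment: str) -> str:
    # Rebuilt from a template and an environment name, bypassing the settings.
    return f"Server=db-{environment}.example.test;Database=orders"


def main():
    # The only place that is supposed to read the settings.
    configured = SETTINGS["ConnectionStrings"]["Orders"]
    legitimate = OrderRepository(configured)

    # Any other code can produce an identical string without being given it.
    forged = OrderRepository(rogue_connection_string("prod"))
    return legitimate, forged
```

The two repositories end up holding the same string, and nothing distinguishes the one that was passed down from the entry point from the one that was made up on the spot.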

Why does this all matter? So what if some arbitrary part of my code-base can make a web request without my knowledge? What’s the big deal? Well, for one, this could be a security problem if a library you depend on suddenly started exfiltrating data, which it can do because it has full access to your filesystem and the internet. This isn’t a hypothetical threat either: something just like this happened with the event-stream npm package.

It also makes it really hard to reason about program behaviour when you have no way of knowing which parts of your program have access to which capabilities. If your program has a bug and you identify the cause as an unwanted call to another service or local process, and there’s no limit to which parts of your code-base could be doing this, it’s a lot harder to track down the source of the problem.

The object-capability model addresses these problems by limiting the ways that capabilities can be passed around your application to four mechanisms: initial conditions, parenthood, endowment, and introduction. By further restricting which capabilities are freely available across the entire code-base via initial conditions, you can restrict which pieces of code can access each capability, track how capabilities are passed from one class to another via constructor or method arguments, and get a better understanding of what your program is doing and where.