System Initiative

(Photo is from Flickr, under CC BY-SA 2.0)

System Initiative is a transformative new power tool that reimagines what’s possible from DevOps. It’s an intelligent automation platform that allows DevOps teams to build detailed interactive simulations of their infrastructure and use them to rapidly update their production environments. If you haven’t seen it yet, go watch our demo video, then come back here and be nerdy with me. I’ll wait.

When Alex Ethier, Mahir Lupinacci, and I started building System Initiative in 2019, we were guided by a conviction that to help DevOps teams reach their full potential, our industry needed a fundamental rethinking of how the technology stack works. We started building and iterating relentlessly: each design we thought might be better would eventually turn out to be flawed in some critical way - usually, it simply wouldn’t be powerful enough to evolve into something that could work for the gnarliest problems in the world. It took us 2 1/2 years to discover and then build a system we believe has the potential to be an order of magnitude better than the status quo.

Here are the five breakthroughs we discovered along the way:

Full fidelity modeling

It was clear that DevOps work was ultimately suffering from a user experience problem. Our first take on how to make it better was to simply stop making people specify the same thing a million times. Could we make it so that a user specifies only the high-level constraints they care about and let the system solve for the rest?

The first prototype modeled asking AWS for a server, with all of its required configuration, from a minimal expression. You could say something like:

Server.create({
  Constraints: {
    "MemoryGIB": 4000,
    "Cpu": { "cores": ["gte", 4], "instructionSets": "arxiv" },
    "OperatingSystem": { "platform": "ubuntu" },
    "SshKey": { "name": "Super Root Key" },
  },
  Properties: {
    "Name": "starsky",
  }
})

And we would use a constraint solver to choose the right instance size, look up the correct AMI, and create any dependent AWS objects you might need (such as an SSH key). We discovered that this prototype only worked if the models were incredibly detailed - the more detailed we were, the more interesting constraints you could specify, and the easier it was to work with. These full-fidelity models also knew how to take action - they could create themselves, delete themselves, etc. It was pretty cool!
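To make the idea of solving for an instance size concrete, here is a minimal sketch. It is not the prototype's actual solver, which this post does not describe: the catalog, its numbers, and the `chooseInstance` helper are all hypothetical, and "cheapest match wins" is just one plausible selection rule.

```typescript
// Hypothetical catalog entry; names and numbers are illustrative, not real AWS data.
interface InstanceType {
  name: string;
  memoryGiB: number;
  cores: number;
  hourlyCost: number;
}

// A constraint is an operator plus a bound, echoing ["gte", 4] in the example above.
type Op = "gte" | "lte" | "eq";
type Constraint = [Op, number];

interface ServerConstraints {
  memoryGiB?: Constraint;
  cores?: Constraint;
}

function satisfies(value: number, constraint?: Constraint): boolean {
  if (!constraint) return true; // unconstrained properties always match
  const [op, bound] = constraint;
  switch (op) {
    case "gte": return value >= bound;
    case "lte": return value <= bound;
    case "eq": return value === bound;
  }
}

// Assumed rule: pick the cheapest instance that satisfies every constraint.
function chooseInstance(
  catalog: InstanceType[],
  constraints: ServerConstraints,
): InstanceType | undefined {
  return catalog
    .filter((i) => satisfies(i.memoryGiB, constraints.memoryGiB))
    .filter((i) => satisfies(i.cores, constraints.cores))
    .sort((a, b) => a.hourlyCost - b.hourlyCost)[0];
}

const catalog: InstanceType[] = [
  { name: "small", memoryGiB: 2, cores: 2, hourlyCost: 0.02 },
  { name: "medium", memoryGiB: 8, cores: 4, hourlyCost: 0.08 },
  { name: "large", memoryGiB: 16, cores: 8, hourlyCost: 0.16 },
];

// The user states only what they care about; the solver fills in the rest.
const choice = chooseInstance(catalog, {
  memoryGiB: ["gte", 4],
  cores: ["gte", 4],
});
const impossible = chooseInstance(catalog, { cores: ["gte", 64] });
```

Even in this toy form, the user never sees why "medium" was chosen unless the tool shows them, and the only way to change the answer is to add more constraints.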

Unfortunately, the trouble with this approach was that people needed a way to see what the system would “guess”, and correct it by adding more constraints. You could narrow down the correct solution, but in the end, people care more about the configuration being right than being minimally specified. When you compared the core loop of using our initial prototype to using existing Infrastructure as Code or Configuration Management tools, it wasn’t really that much better. It was a more intelligent form of automation, to be sure, but the user experience was still basically “write code, see it fail, tweak it, apply it”. Only now with a bunch of magic in between that you could only influence rather than control directly.

Visual Interfaces

Luckily, our co-founder Alex Ethier had a long career in the visual fx industry before he became an entrepreneur. Alex knew that any system composed of complex models and relationships could be expressed with a visual interface, similar to the ones found in tools like Blender or Unreal Blueprints. He designed one that he thought would do the trick, and we got to work building it.

His design solved many of the issues with our constraint-solving prototype. For example - we could now make expressing the relationships between models as easy as drawing a line on a diagram. We realized that you no longer needed the constraint solver - it was faster and easier to simply express what you needed visually, and let our full-fidelity models infer the configuration. We could then generate whatever code we needed directly from the model.
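As a rough illustration of what "drawing a line" could mean underneath, the sketch below treats each line as an edge between two components and infers configuration from it. The component shapes, the `infer` function, and the JSON output are assumptions made for this example, loosely modeled on Kubernetes objects. They are not System Initiative's real schema or code generator.

```typescript
// Hypothetical component kinds, loosely modeled on Kubernetes objects.
interface Secret {
  kind: "Secret";
  name: string;
}
interface Deployment {
  kind: "Deployment";
  name: string;
  image: string;
  secretRefs: string[];
}
type Component = Secret | Deployment;

// An edge is the "line on the diagram": `from` feeds configuration into `to`.
interface Edge {
  from: string;
  to: string;
}

// Assumed inference rule: a Secret connected to a Deployment becomes a secret reference.
function infer(components: Component[], edges: Edge[]): Component[] {
  return components.map((component) => {
    if (component.kind !== "Deployment") return component;
    const secretRefs = edges
      .filter((edge) => edge.to === component.name)
      .map((edge) => components.find((c) => c.name === edge.from))
      .filter((c): c is Secret => c !== undefined && c.kind === "Secret")
      .map((secret) => secret.name);
    return { ...component, secretRefs };
  });
}

// "Code generation" here is just serializing the inferred model.
function generate(component: Component): string {
  return JSON.stringify(component, null, 2);
}

const inferred = infer(
  [
    { kind: "Secret", name: "db-password" },
    { kind: "Deployment", name: "web", image: "nginx", secretRefs: [] },
  ],
  [{ from: "db-password", to: "web" }],
);
const manifest = generate(inferred[1]);
```

The user only supplies the edge. The reference to the secret is derived from it, so there is nothing to mistype.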

The very first time this worked end to end was with a model of deploying a simple web application to Kubernetes. It had a deployment, some secrets, volume mounts, services, etc. I built it from scratch in a very early version of System Initiative in 5 minutes and 38 seconds. I didn’t look at any documentation. I didn’t make any typos. I was simply guided by the 1:1 model and a visual interface that let me compose it. It was glorious enough that I remember exactly how long it took. (If you’re curious, similar tasks take considerably less time in today’s version!)

Integrated change control

We could combine multiple models, infer much of their configuration, and automatically generate code for you. But we still needed to figure out how to apply it to real infrastructure - and have multiple working versions simultaneously.

Simply outputting traditional Infrastructure as Code to a git repository was an easy answer, but it brought with it a ton of user experience pain. CD pipelines take a long time to execute. Code review is a lot of things, but “conducive to collaboration” isn’t on the list (and so much of the user experience pain in DevOps work is caused by our inability to collaborate with the experts we need). Anything we generated would need to be refactorable and able to integrate into the many complex code bases that already exist.

Our answer was to integrate change control directly into our models. We would make the models themselves aware that they could be in multiple states at the same time. That way, you could visually model what you thought would work without impacting real-world resources. We could then show you not only what you propose but how it is different from the real-world system.

It also eliminated the need for an infrastructure pipeline. You would simply express your configuration and the actions to take in a change set. When you apply that change set, System Initiative will reconcile your intent with the real-world state. It was like git branches but with all the semantic knowledge afforded by our rich modeling. We could even eliminate the need to do things like ‘rebase’ your branch - when a change set is applied, we can automatically update every open change set to reflect it.
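One way to picture change sets that never need a rebase is to store each one as an overlay of edits on top of a shared head. This is a sketch under that assumption: the `Workspace` and `ChangeSet` classes are hypothetical names, and reconciliation with real cloud resources is left out entirely.

```typescript
type Props = Record<string, string>;

class ChangeSet {
  // Only the edits are stored; everything else is read through to head.
  readonly edits = new Map<string, Props>();
  constructor(readonly name: string) {}

  set(component: string, props: Props): void {
    this.edits.set(component, { ...this.edits.get(component), ...props });
  }
}

class Workspace {
  // "head" stands in for the model of the real-world state.
  private head = new Map<string, Props>();

  open(name: string): ChangeSet {
    return new ChangeSet(name);
  }

  // What a component looks like inside a change set: head plus that set's edits.
  view(cs: ChangeSet, component: string): Props {
    return { ...this.head.get(component), ...cs.edits.get(component) };
  }

  // How the proposal differs from head, one line per changed property.
  diff(cs: ChangeSet): string[] {
    const lines: string[] = [];
    cs.edits.forEach((props, component) => {
      const current: Props = this.head.get(component) ?? {};
      Object.keys(props).forEach((key) => {
        if (current[key] !== props[key]) {
          lines.push(`${component}.${key}: ${current[key] ?? "(unset)"} -> ${props[key]}`);
        }
      });
    });
    return lines;
  }

  // Applying merges the edits into head; other open overlays see it on their next read.
  apply(cs: ChangeSet): void {
    cs.edits.forEach((props, component) => {
      this.head.set(component, { ...this.head.get(component), ...props });
    });
    cs.edits.clear();
  }
}

const ws = new Workspace();
const resize = ws.open("resize");
const rename = ws.open("rename");
resize.set("server", { size: "large" });
rename.set("server", { name: "starsky" });

const proposed = ws.diff(resize); // what "resize" would change
ws.apply(resize);
const renameView = ws.view(rename, "server"); // already includes size: "large"
```

Because `rename` holds only its own edits, applying `resize` shows up in its view without any merge step. A real system also has to handle two change sets editing the same property, which this sketch ignores.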

Fast feedback loops

Once we could propose changes to our models in a hypothetical way, we realized that we needed to give you much faster feedback about whether your configuration would work. Why wait until deployment time to learn that you have a problem?

The answer was to start “qualifying” our models in advance: every time you update a value on a model, System Initiative will validate the value and then qualify the entire model, allowing us to catch errors immediately. For example, when we model running a Docker container, we qualify that the image name you specified exists in a registry you have access to. How many times have you written infrastructure as code, but typo-ed the container name? How long did it take for you to realize it? When you combine qualifications with the visual interface, we suddenly give you near real-time feedback on your configuration. Essentially integrated test coverage!
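A minimal sketch of the qualify-on-every-update loop might look like the following. The `Qualified` wrapper and the `imageExists` check are hypothetical, and the in-memory `knownImages` set is a stand-in for a real registry lookup, which would be an asynchronous network call.

```typescript
interface QualificationResult {
  ok: boolean;
  message: string;
}
type Qualification<T> = (model: T) => QualificationResult;

interface ContainerModel {
  image: string;
}

// Stand-in for a registry the user has access to (assumption for this sketch).
const knownImages = new Set(["nginx:1.25", "redis:7"]);

const imageExists: Qualification<ContainerModel> = (model) =>
  knownImages.has(model.image)
    ? { ok: true, message: `image ${model.image} found` }
    : { ok: false, message: `image ${model.image} not found in registry` };

// Wraps a model so that every update re-runs every qualification.
class Qualified<T extends object> {
  results: QualificationResult[] = [];

  constructor(private model: T, private checks: Qualification<T>[]) {
    this.requalify();
  }

  update(patch: Partial<T>): QualificationResult[] {
    this.model = { ...this.model, ...patch };
    return this.requalify();
  }

  private requalify(): QualificationResult[] {
    this.results = this.checks.map((check) => check(this.model));
    return this.results;
  }
}

// The typo ("ngnix") is flagged as soon as the value is set, not at deploy time.
const container = new Qualified<ContainerModel>({ image: "ngnix:1.25" }, [imageExists]);
const firstResult = container.results[0];
const fixedResult = container.update({ image: "nginx:1.25" })[0];
```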

At this point, we knew we had something special. We could write highly detailed, intelligent models, describe them in a visual interface, and get fast feedback on their fitness. Sounds like a product worth having, eh? For the briefest of moments, we thought we had it nailed.

A hypergraph of functions

When we started creating more models, we realized our approach to writing the models wasn’t scalable. To write a model, you had to swap out of the platform into a development environment, write a combination of TypeScript and Rust code, compile System Initiative, then switch back to the visual interface and see if your model was good or bad. This is basically what you have to do if you decide to extend any infrastructure as code tool with new functionality - hop out of whatever DSL they have offered, and extend it in their implementation language.

But it was a lot to ask if all you wanted to do was write a custom qualification that encoded your security or compliance policy. If you made a mistake or your model didn’t behave how you wanted, the turnaround time was simply too long.

We realized that without ubiquitous, integrated customization, no matter how much effort we put into the user experience, System Initiative would fall short. It simply wouldn’t be powerful enough to solve the problems we knew we would encounter when working with larger companies and more complex environments. We needed to give users the ability to customize every aspect of how the models worked, from within System Initiative itself, so that they could not only use change sets to set up hypothetical configurations but also test new models and tweaks to existing ones.

The solution was to rebuild the model around a giant hypergraph of functions. We stopped having values written directly to our models’ properties. Instead, we model every property, validation, qualification, and action as a TypeScript function. These functions take inputs from other elements on the graph, and they emit the correct value to store on the graph. When any input to a function has changed, System Initiative will automatically update all the dependent values.

This system was difficult to build but easy to use. When everything is a function whose inputs come from change set aware models, we can automatically recalculate the values when things have changed (like a very dope spreadsheet). We could add new function types easily, and their behavior was predictable. We added an integrated function editor (essentially an IDE), and now any user can program System Initiative from within System Initiative.
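The spreadsheet comparison can be sketched in a few lines. This is a toy, assuming an acyclic graph and a naive push-based update; the `FunctionGraph` class and its method names are invented for illustration and say nothing about how System Initiative stores or schedules its functions. A function that reads several inputs plays the role of a hyperedge here.

```typescript
type Value = string | number;
type Fn = (inputs: Value[]) => Value;

interface GraphNode {
  inputs: string[]; // ids of the nodes this function reads
  fn: Fn;
  value?: Value;
}

class FunctionGraph {
  private nodes = new Map<string, GraphNode>();

  // Register a function node and compute its value if its inputs are ready.
  define(id: string, inputs: string[], fn: Fn): void {
    this.nodes.set(id, { inputs, fn });
    this.recompute(id);
  }

  // A plain value is just a function with no inputs.
  set(id: string, value: Value): void {
    this.define(id, [], () => value);
  }

  get(id: string): Value | undefined {
    return this.nodes.get(id)?.value;
  }

  // Naive propagation: assumes no cycles and may recompute a node more than once.
  private recompute(id: string): void {
    const node = this.nodes.get(id);
    if (!node) return;
    const args = node.inputs.map((input) => this.get(input));
    if (args.some((arg) => arg === undefined)) return; // inputs not ready yet
    node.value = node.fn(args as Value[]);
    this.nodes.forEach((other, otherId) => {
      if (other.inputs.indexOf(id) !== -1) this.recompute(otherId);
    });
  }
}

const graph = new FunctionGraph();
graph.set("image", "nginx");
graph.set("tag", "1.25");
graph.define("imageRef", ["image", "tag"], ([image, tag]) => `${image}:${tag}`);
graph.define("containerName", ["imageRef"], ([ref]) => `web-${ref}`);

// Changing one input updates every value that depends on it.
graph.set("tag", "1.26");
const imageRef = graph.get("imageRef");
const containerName = graph.get("containerName");
```

A production version would need cycle detection and a topological ordering so each node is recomputed only once per change.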

System Initiative is a power tool

We think that the biggest barriers to a better DevOps workflow, and therefore to unlocking better outcomes, all live in the user experience. The combination of full fidelity models, described with a visual interface, integrated change control, and fast feedback loops - sitting on top of a hypergraph of functions - lets us deliver that experience without sacrificing any of the power or control that we get from traditional infrastructure as code tooling. We didn’t just make the ‘interface simple’ - we streamlined the experience of doing the work with a power tool that is seamlessly extensible to the needs of the most complex environments.

It’s early days - we have a lot more to build. But I’m so excited to work with you to take these foundations we’ve iterated on into real-world environments and see how they can be scaled and improved.

Five breakthroughs on the path to System Initiative