Skip to content
Ryan de Melo
Go back

Platform Engineering: Paving Roads vs Building Cages

The fastest way to kill an internal platform is to make it mandatory before it is good. I have watched it happen, and I have nearly done it myself.

A few years back I owned an ML platform that a couple dozen product teams ended up building on. Feature store, model serving, the vector and embedding pieces as they came along, the deploy path that took a model from a notebook to something serving real traffic. The thing people remember about it now is that it got used. What they do not see is how close it came to being the thing everyone routed around.

Here is the part nobody tells you. The day your platform team starts measuring adoption by mandate compliance is the day you have stopped building a road and started building a cage. The two look identical on an org chart. One team owns the path, other teams are on it, the dashboard is green. The difference is entirely in why people are on the path. On a road they chose it because it was faster. In a cage they are there because the fence went up and the old route got a ticket queue bolted to it.

The road was already there

The teams I served were not waiting for me to invent a workflow. They already had one. They pulled features out of warehouse tables by hand, trained on a borrowed GPU, and shipped a model by copying a pickle file somewhere a server could reach it. It was ugly and it mostly worked. (The pickle file living in an S3 bucket nobody owned was a war crime, but it shipped models, which is more than most platforms can say in their first quarter.)

A paved road follows the path people already walk. That sounds obvious and almost nobody does it. The temptation is to design the workflow you wish they had, the clean one from the architecture diagram, and then stand at the trailhead explaining why their cow path is wrong. You will lose. People do not abandon a route that works for one that is correct in theory. They abandon it for one that is faster on the exact task they were already doing.

So the feature store did not start as a feature store. It started as one query that fetched the features three teams kept rewriting, returned the same answer, and was faster to call than to rewrite. That was the whole pitch. Not governance, not lineage, not a single source of truth. Faster than the thing you are doing right now, on the thing you are already doing.

When the road becomes a fence

There is a specific moment a platform team turns into a gatekeeper, and it is rarely a decision anyone announces. It arrives as a reasonable-sounding sentence in a planning doc. “All new models must deploy through the platform.” Said out loud it sounds like maturity. What it actually means is that you have given up on being chosen and decided to be enforced.

The tell is the conversation that follows. Someone has a model that does not fit the golden path, an odd serving shape, a latency need the platform cannot hit, a dependency you never supported. Before the mandate, that is a feature request and a happy customer telling you exactly where the road needs to extend. After the mandate, that same person is a problem to be managed. They want an exception. You are now in the business of granting or denying exceptions, which is to say you are now a checkpoint, and every checkpoint has a queue, and every queue grows a shadow copy of your platform built by people who got tired of waiting.

I have been the bottleneck. Not because the platform was bad but because we made ourselves the only door, and then could not open it fast enough. The mandate did not increase real adoption. It converted enthusiastic users into resentful ones and pushed the clever ones underground, where they rebuilt the exact thing we were trying to standardize, just without us watching.

The only honest test

The golden path versus mandate tension never fully goes away, and I do not think it should. Some things genuinely have to be enforced. How you handle data residency, how you log access to a model that touches money, the audit trail a regulator will ask for: those are not opinions, and pretending they are golden paths people may opt into is how you end up explaining yourself to a regulator. Fine. Enforce those. But be honest that you are building a fence, build the smallest one that does the job, and pave a genuinely good road right up to it so the fence is the last thing people hit, not the first.

For everything else, there is one test, and it is the only one I trust. Is the paved road faster than asking a human, and faster than rolling your own?

That is a brutal bar and it is supposed to be. It means your self-service path has to beat a Slack message to a helpful colleague, which is the real competition, not some other platform. It means it has to beat a motivated engineer’s afternoon of doing it themselves, because that engineer will absolutely spend the afternoon rather than learn your abstraction if your abstraction is slower or more confusing than the raw thing underneath. If you cannot win on speed, you do not have a platform. You have a tax with a logo.

The platform that got used passed that test, mostly, and where it failed the test it lost, fairly, to a human or a hand-rolled script, and that loss was the most useful signal I got all year. It told me exactly where to pave next. The version of me that wanted to write a mandate would have buried that signal under compliance and called the dashboard a success.

Measure your platform by what people would do if you removed the requirement to use it. If they would stay, you built a road. If they would scatter, you built a cage, and the only thing your mandate was protecting was your own headcount. Which one do you actually have?


Share this post:

Previous Post
Llama 2 Is Here. Should You Self-Host?
Next Post
The Real Cost of a Customer Data Platform