Archive for May 2011

All systems suck

I’ve been thinking a lot about this idea lately.  I’ve spent a lot of years as an engineer and consultant fixing other people’s systems that suck, writing my own systems that suck, and working on legacy systems, that, well, suck.

Don’t let anyone fool you.  All systems suck, to a greater or lesser extent.

If it’s an old system, there’s the part of the code that everybody is afraid to work on: the fragile code that is easier to replace than maintain or refactor.  Sometimes this seems hard, or nobody really understands it.  These parts of the code are almost always surrounded by an SEP field.  If you’re unfamiliar with the term, it means “Somebody Else’s Problem”.  Items with an SEP field are completely invisible to the average human.

New systems have the parts that haven’t been built yet, so you’ll hear things like “This will be so awesome once we build feature X”.  That sucks.

There’s also the prototype that made it into production, a common problem.  Something somebody knocked together over a weekend, whether it was because of lack of time, or because of their utter brilliance, is probably going to suck in ways you just haven’t worked out yet.

All systems, old and crufty or new and shiny, have bottlenecks, where a bottleneck is defined as the slow part, the part that will break first when the system is under excessive load.  This is also part of your system that sucks.

If someone claims their system has no bugs, I have news for you: their system sucks.  And they are overly optimistic (or naive).  (Possibly they just suck as an engineer, too.)

In our heads as engineers we have the Platonic Form of a system: the system that doesn’t suck, that never breaks, that runs perfectly and quietly without anyone looking at it.  We work tirelessly to make our systems approach that system.

Even if you produce this Platonically perfect system, it will begin to suck as soon as you release it.  As data grows and changes, there will start to be parts of the system that don’t work right, or that don’t work fast enough.  Users will find ways to make your system suck in ways you hadn’t even anticipated.  When you need to add features to your perfect system, they will detract from its perfection, and make it suck more.

Here’s the punchline: sucking is like scaling.  You just have to keep on top of it, keep fixing and refactoring and improving and rewriting as you go.  Sometimes you can manage the suck in a linear fashion with bug fixes and refactoring, and sometimes you need a phase change where you re-do parts or all of the system to recover from suckiness.

This mess is what makes engineering and ops different from pure mathematics.  Embrace the suck.  It’s what gets me up in the mornings.