This is an updated re-post of an article I wrote on the PKC Blog in June 2018
There’s always a reason to rebuild.
Perhaps you’re a CEO of a startup that has had some success and your engineers are clamoring to re-platform and do a rewrite from scratch. Perhaps you’re an executive or IT lead and you’re counting the cost of pulling the trigger on a rewrite of a legacy application. Perhaps you’re a lead engineer in the midst of a rebuild and are having second thoughts (am I crazy?). Regardless of where you’re at, you’ve likely realized that talk of rebuilds, like talk of tax reform or anarchy, is just a tad bit dangerous—you never know what kind of danger you’ll end up in if you actually convince yourself to go through with it.
But despite the fact that rebuilding goes against wisdom of some pretty well-respected folks (well, really just Joel Spolsky), rebuilds do happen. Sometimes they’re even the right choice. But even when they happen for the right reasons, rebuilds are mucho risky, and you should enter any rebuild with the foreknowledge that you very well may fail.
So, having “been there and done that,” I present, for your consideration, five red flags that conclusively show that no, you are not some crazy person, and yes your rebuild is doomed. 😉
Red Flag #1: No clear executive vision for the value of a rebuild
The rebuild must add new value. In fact, if your rebuild isn’t looking like it will demonstrate value in the first six months, you should view that as an existential problem. Why? Because rebuilds and major refactoring work are way too resource intensive to be done willy-nilly—systems that took 30 man years to write don’t get rewritten in 6 months—so unless you’re Google X and can afford to not see a RIO for 3+ years, you’d better reconsider.
If you’re an executive and you yourself aren’t sure what the value add is, it’s probably time to pull the plug, or if you’re sure you want this, do a design sprint with real users to capture this vision clearly and then re-evaluate if the rebuild is on track with what your users want.
Red Flag #2: You’re going for the big cutover rewrite
A cutover is easier. Switching that DNS record and BOOM seeing instant, sweeping changes in every facet of the app is aesthetically pleasing. The big cutover rewrite is the siren call of all rebuilds. It only works on very simple systems. There is a ton of prior work on why a more incremental approach to rebuilding is better, which generally boils down to the fact that a big cutover is the rebuild-equivalent of the waterfall approach.
If you’re an executive, you need to call your developers to the mat on this, because they will push hard for a big cutover. Of course, they’d never use those words, instead opting for phrases like “clean slate” or “re-architecting”, but it’s the same thing. Ask them—when can we see some of this code in production? If the answer is more than 3 months away…chances are they’re doing a cutover.
Red Flag #3: The rebuild has slower feature velocity than the legacy system
It’s pretty simple: if you’re simultaneously improving the existing legacy product while doing the rebuild in order to hedge your bets and maintain competitive advantage (a good idea), but your rebuild isn’t able to add those same features at a faster rate, the rebuild will never be done.
Your rebuild team needs to be composed of your superstar developers, who are aware and up to date on current technologies1because it’s really sad when your rebuild replaces 20 year old obsolete technology with 10 year old obsolete technology and who have the brainpower to grasp the complexity of what the legacy system has become (this interestingly seems to follow from a mix of Kernighan’s Law2“Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.” and the Peter Principle3“People in a hierarchy tend to rise to their ‘level of incompetence.’”).
Also, make sure your rebuild team(s) can sustainably make progress faster! Don’t base your judgment of velocity of the rebuild on the first 2-3 months of a rebuild. The start of a project always has great velocity because the complexity hasn’t been tackled yet, but it slows over time:
Red Flag #4: You aren’t working with people who were experts in the old system.
Any expert helps—the former developer of the old system, Marge from Accounting, or even a power user. These people are essential to a rebuild, because they’ll know all the gotchas that went into the app in the first place. Without these people, you’ll fall victim to Tesler’s Law, and end up producing a system that has less value than the legacy version.
Think of them as human test suites. They’ll discover subtle implementation errors in the rebuild within minutes that you would probably never catch.
Involve these people. Make them feel like they’re instrumental to the rebuild (because they are). Get their feedback early and often.
Red Flag #5: You’re planning to remove features because they’re hard
Let’s imagine you’re rebuilding a system that’s had some success and is actively providing value to users. It’s easy to fall into the trap of doing a rebuild with less features in the name of “streamlining” the build. But does that make sense? Yes, the legacy app may be slow and ugly. But think about it—your users are willing to put up with slow and ugly to use this system! This legacy code should be a source of awe and respect, not disdain. If you remove features that they were using, your users will hate you for it.4Along a similar vein, some of the first features to get mistakenly axed in a rebuild are instrumentation and tests. Those are usually there for a reason, especially the integration and business-logic tests.
Does this mean you should blindly copy the legacy system’s features? Of course not! Some features that were necessary with the legacy technology have become ossified. There’s tons of managed services provided by AWS/GCP/Azure that can take the place of old “built here” features. Also, a few features are probably failed and unused by any customer. Get rid of those. Just do not remove jobs to be done / entire stories, no matter how tempting they are.
If you feel like you’re stuck having to choose between copying an especially onerous old feature via a rebuild and adding new features, you have some options. One option is employ Martin Fowler’s strangler pattern, whereby you create new functionality within a rebuild that can simultaneously be integrated with the old functionality, thus preserving those features without having to recreate them.
Let’s Wrap it Up
If this article left you with one impression, I hope it’s the absolute necessity of maintaining a fanatical focus on delivering real, present value in production as soon as possible5this is also known in some circles as Time to Value: https://en.wikipedia.org/wiki/Time_to_value.
It’s critical for every rebuild to begin with a thorough business case to unearth and carefully sequence all the value-adding features. The core methodology of this could be a design sprint, customer focus groups, business canvas, jobs-to-be-done analysis, or anything else, really. The point is to do whatever possible to ensure that the new product still fits the market need in the way you expect and leaves room for innovation during the rebuild.