“In the brain of all brilliant minds resides in the corner a fool.”

Aristotle

Writing about “Best Practices” can get boring, so I thought I’d take a break this week and write about some bad engineering practices that I’ve found the absolute hardest to undo once they take hold. Real foot-guns, you could say.

Each of these is a bit controversial in its own way, and that’s how it should be; I’d welcome any counter-views. The prose in this post is a bit more irreverent than usual. In most cases, I’m poking fun at myself (both past and present!), since I’ve been guilty of every one of these foot-guns, and frankly, I still struggle with a lot of them. Hopefully this post will generate some “motivation through transparency” 🙂

Engineering Foot-gun #1—Writing clever code instead of clear code

[Image: xkcd comic, https://xkcd.com/1691. Caption: “It’s because optimizing is fun.”]

It’s unreasonably fun to write clever code. We even came up with a clever name for it: elegant code.[1] Who could say no to elegant code without appearing barbaric and ignorant? If instead we called it “smarty-pants code” or “tricky code,” maybe we’d stand a chance of keeping away from it. But no, it’s elegant code, and elegance is just amazing and irresistible, all the time.

[1] Truly elegant code (here’s a wonderful example: https://norvig.com/spell-correct.html) is simultaneously clever and easy to follow. But a lot of great ideas have truly awful failure modes (e.g. anarchy, communism, monarchy), and poorly executed elegant code has horrible failure modes too, and is more common than I’d like to admit.

Here are some guilty pleasures that I’ve indulged in the relatively few times I’ve been cursed with a clever idea:

  • “I wonder if I can do this in a one-liner?” This never results in clear code.
  • fancy pointer-arithmetic routines
  • clever indexing
  • fancy map/reduces
  • recursion
  • even *gasp* ternary operators (rule of thumb: if the syntax causes you to pause for even a second to mentally confirm it works as you expect, it’s not worth it)

But my favorite self-argument about why it’s ok to inflict future pain on one’s unsuspecting colleagues is this one: “Well, it’s worth it because the code is more efficient.”

Efficient? If I’m honest with myself, half the time this isn’t even true, because the weird hack I’ve just written takes the code off the happy path for whatever optimizations the compiler of my language of choice knows how to do. And the other half of the time, the faster code I wrote gets called once per API call, so the user never notices the few nanoseconds I cleverly shaved off (purely for their benefit, of course!).
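Here’s a small, hypothetical illustration of both the habit and the efficiency excuse, in Python (the word-counting task and the function names are invented for this sketch, not taken from any real codebase): first the kind of one-liner I’m tempted to write, then the boring loop.

```python
from functools import reduce


def word_counts_clever(text: str) -> dict[str, int]:
    # The "elegant" one-liner: fold over the words, building a brand-new dict at every step.
    return reduce(lambda acc, w: {**acc, w: acc.get(w, 0) + 1}, text.lower().split(), {})


def word_counts_clear(text: str) -> dict[str, int]:
    # The boring version: one dict, updated in place, readable at a glance.
    counts: dict[str, int] = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


sample = "the cat and the hat"
print(word_counts_clear(sample))  # {'the': 2, 'cat': 1, 'and': 1, 'hat': 1}
print(word_counts_clever(sample) == word_counts_clear(sample))  # True
```

The one-liner isn’t even the efficient option: `{**acc, ...}` copies the entire accumulated dictionary for every word, so its cost grows with the number of words times the number of distinct words, while the loop does roughly constant work per word.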

If I’m really honest, this habit is hard to break because it’s hard to let go of a clever thought without sharing it. And after all, what harm does some occasionally difficult-to-understand code really do? Well, a lot, actually. One second-order effect of the Great Resignation is that more software engineers are reading more lines of code for the first time than ever before. This means that the value of writing clear code has never been higher.[2]

[2] There’s a separate rant-motif I chose not to include in the main text of this post, about how several of these stubborn habits are things I picked up in college. I’ll include these footnote-rants at the end of relevant sections for those interested. In the case of #1, clever code: in an academic setting, you’re richly rewarded for cleverness that comes at the expense of clarity. Clarity is appreciated (usually after a hundred years, when the cleverness of something old is no longer enough to keep it around), but cleverness will get you tenure.

Engineering Foot-gun #2—Not being willing to “throw it all away”

“Never form attachments to your code,” they said. “Avoid the Sunk Cost Fallacy,” they said. Hah! Easier said than done. I’ve found that it’s sometimes really hard to abandon an approach and start from scratch. This is especially baffling because the few times in my life when I’ve had the misfortune of completely destroying a draft paper or a day or two of code, rewriting it has been nothing short of pleasurable.

But somehow, deliberately deciding to erase working code is hard. Rather than start over, I keep adding layer after layer of hacks to my existing code until it resembles an unrecognizable Frankenstein ruin, much like this orc from The Lord of the Rings:

[Image: Gothmog, https://lotr.fandom.com/wiki/Gothmog_(Lieutenant_of_Morgul). Caption: “My code, after I try to save it four or five times.”]

I think it comes from some subconscious part of me that wants to believe the first solution I thought of must have been the perfect one! If only I were so lucky, or so smart.

Engineering Foot-gun #3—Creating abstractions prematurely

I think that as hunter-gatherers, we were programmed to be absolutely paranoid about preparing for the future. That makes sense when you don’t know where your next meal is coming from. But this attitude is disastrous when you’re coding. Measured by the effort it takes to change something relative to the effort it took to build it, refactoring software is the cheapest activity humankind has ever invented. Doing it, even often, is not a bad thing.
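A hypothetical Python sketch of what this looks like in practice (the notification example and every name in it are invented for illustration): the speculative version builds an interface, an implementation, and a factory for a second notification channel that doesn’t exist yet, while the plain version is a single function. If a second channel ever does show up, turning that function into an interface at that point is an ordinary refactor.

```python
from abc import ABC, abstractmethod


# Premature: an interface, an implementation, and a factory, all for exactly one channel.
class Notifier(ABC):
    @abstractmethod
    def send(self, user: str, message: str) -> str: ...


class EmailNotifier(Notifier):
    def send(self, user: str, message: str) -> str:
        # Hypothetical stand-in for actually sending an email.
        return f"email to {user}: {message}"


class NotifierFactory:
    _registry: dict[str, type[Notifier]] = {"email": EmailNotifier}

    @classmethod
    def create(cls, kind: str = "email") -> Notifier:
        return cls._registry[kind]()


# Enough for today: one function that does the one thing we actually need.
def notify(user: str, message: str) -> str:
    # Same hypothetical stand-in as above.
    return f"email to {user}: {message}"


print(NotifierFactory.create().send("ana", "build passed"))  # email to ana: build passed
print(notify("ana", "build passed"))  # email to ana: build passed
```

Both produce the same result today; the difference is how much code a new colleague has to read to find that out.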

Every time I start griping about refactoring code, I try to remind myself to go talk to Bonanno Pisano, the architect of the Tower of Pisa. Or to the scientists and engineers involved in the Herculean effort required to apply corrective fixes to the Hubble Space Telescope. These folks had it rough! Refactoring, even major refactoring, is simple by comparison, and these days we have a lot of great advice on how to go about it.[3]

[3] Academia is basically built around recognizing the creation of abstractions as valuable. This isn’t a bad thing in an academic setting (usually), but in the workplace, you don’t get extra points for creating an AbstractSingletonBuilderFactory: abstractions are only as valuable as the output they produce or the simplicity they bring to the overall system.

Engineering Foot-gun #4—Not properly respecting the complexity of distributed systems

I always thought the root cause analyses (RCAs) we did for outages involving queues and microservices were tinged with irony. The causes sounded so…familiar:

  • “our users experienced delays in jobs X because of downstream service queues failing to process”
  • “a sequence of larger-than-expected messages on the queue kept retrying before going to the DLQ (dead-letter queue), causing the queue to hang when responding to new enqueue requests”
  • “we lost messages because the queue was full”

What’s familiar about these? They’re the same problems I was trying to solve by moving to a distributed system![4]

[4] OK, typical caveat: clearly there are times when distributed systems are necessary, and clearly, messaging queues are necessary, especially if your load profile is extremely variable or if you need gigabytes or terabytes of space to hold the messages. RAM is truly great, but it can’t solve everything.

So let’s talk about distributed message queues. Why do I use queues? Well, because I want to drop off a message in some persistent place and return immediately, without waiting till it’s done processing. That is a great property. I have a buffer, and some breathing room if things go wrong.

But I haven’t actually done anything about the underlying problem. In the terms of Goldratt’s Theory of Constraints, I haven’t removed the constraint. All I’ve done is introduce another system (actually two, if you include the network) where that constraint can manifest. After all, the queue itself can get overwhelmed and stop processing my requests…

The siren song of distributed message queues is strong. It feels like the Elegant Thing to Do. But why not wait until this really is a BAD problem before taking on the problems of a distributed system? And if it really is a major problem, why not look at threading or in-memory queues? Or at parallelizing the receiver’s execution?
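For the in-process alternatives, here’s a minimal Python sketch, assuming the receiver’s work can run in the same process as the sender: a bounded `queue.Queue` drained by a few worker threads. The class name `InProcessWorkQueue`, the `handler` callback, and the default sizes are hypothetical. Note that it doesn’t escape the Theory of Constraints either; it just makes the constraint show up locally, as a blocking `put()`, instead of across a network. (In CPython, threads mainly help when the work is I/O-bound; CPU-bound work would need processes instead.)

```python
import logging
import queue
import threading
from typing import Any, Callable

_STOP = object()  # sentinel telling a worker thread to exit


class InProcessWorkQueue:
    """A bounded in-memory buffer drained by a small pool of worker threads."""

    def __init__(self, handler: Callable[[Any], None], num_workers: int = 4, max_pending: int = 100) -> None:
        self._handler = handler
        self._pending: queue.Queue = queue.Queue(maxsize=max_pending)
        self._workers = [threading.Thread(target=self._run, daemon=True) for _ in range(num_workers)]
        for worker in self._workers:
            worker.start()

    def submit(self, job: Any) -> None:
        # Returns as soon as the job is buffered, without waiting for it to be processed.
        # When the buffer is full, put() blocks: the constraint hasn't gone away,
        # it just shows up as back-pressure on the caller instead of lost messages.
        self._pending.put(job)

    def close(self) -> None:
        # Sentinels go in after every submitted job (FIFO), one per worker,
        # so all jobs are picked up before the workers exit.
        for _ in self._workers:
            self._pending.put(_STOP)
        for worker in self._workers:
            worker.join()

    def _run(self) -> None:
        while True:
            job = self._pending.get()
            if job is _STOP:
                return
            try:
                self._handler(job)
            except Exception:
                # A real system needs a retry or dead-letter policy; this sketch just logs.
                logging.exception("job %r failed", job)


# Usage: square 20 numbers on 3 workers with room for at most 10 pending jobs.
results = []
results_lock = threading.Lock()


def square(n: int) -> None:
    with results_lock:
        results.append(n * n)


work = InProcessWorkQueue(square, num_workers=3, max_pending=10)
for n in range(20):
    work.submit(n)
work.close()
print(sorted(results) == [n * n for n in range(20)])  # True
```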

I think part of the reason I have trouble shaking this one is that the costs are paid down the road. Usually, messaging queues handle the small amount of load you initially put on them beautifully; it’s only as the system matures that the faults of a distributed system start to manifest.

Engineering Foot-gun #5—Waiting too long to ask for help

The big problem with waiting too long to ask for help is that I end up solving my own problem without help. Wait, what? Why is this bad? Isn’t solving things myself best?

It actually isn’t. The most surprising and impactful technical things I’ve ever learned came after asking for help. I say surprising, because rarely was the thing I learned a direct answer to my question. It was usually completely unexpected.

Notice that I also said “after” asking for help. That’s because a lot of great conversations happen when discussing a problem with other people after it’s been solved. Take a look at these cases:

“Oh, you don’t actually have to do what you’re trying to do at all, here’s a really simple core function that does exactly what you were trying to do in this whole piece of code”

Or, “Yeah, so you’re looking for X. Just so you know, this is actually a specific case of this general class of problem called…”

Or, “Ah, yes you can fix it this way. Also, did you know this really interesting concept that this made me think of?”

A couple of weeks ago, I wrote about “Why experience is primarily about removing your Unknown Unknowns.” This is exactly what’s at stake in breaking this habit. Interestingly, now that I code less and lead more, I think this one is even more true than it was before.[5]

[5] Academia values originality above almost anything else, and you can’t be original if you ask for help. While being original in the workplace does have real (sometimes tremendous) value, impact always trumps originality.

Let me know what bad engineering habits you’re trying to break! 🙂