My favorite term for the sort of engineering work we need to do to avoid things getting "too bad" is one I learned from Camille Fournier: "sustaining engineering."
In a previous life as a director, I spent some time digging into the sources of what the team called "technical debt." I wanted to know what we might be able to plan for and what was always going to be a surprise. I ended up with three broad categories:
- Debt in the most limited sense of the original metaphor: we accept some less-than-optimal solution in exchange for faster time-to-market. This is ideally something we explicitly discuss with other stakeholders and make trade-offs together, like "the sprint after launch we will do some of the refactoring work while we're gathering initial customer data." This kind of debt fits nicely into Martin Fowler's 2x2 for technical debt. It is easy to assess metrics like impact/fix cost/contagion, and there are ways to contain it.
- "Debt" that comes from software aging: new library or framework or language runtime versions, updated best practices, changing tools, and the like. It is hard to fit this into Fowler's quadrants. It seems deliberate and prudent to use Linux servers, or CPython, or React. And yet the "debt" is, effectively, infinite. I have come to call this "technical depreciation," rather than debt: the raw materials of the software we create have a limited effective lifespan and need regular maintenance. This is what I typically call maintenance or sustaining engineering work.
- "Debt" that arises when a foundational assumption or requirement changes. These changes come from outside the source code: we reverse a business requirement, or we have a new regulation, etc. The identifying characteristic is that our original decision was not debt; it was a good and reasonable choice at the time. For example, one customer-facing website my team owned went from having a single, consistent look and feel to having visually distinct subsections due to a shift in the business's strategy. A technical decision about how CSS assets were built and associated with each page, which had been a good one, suddenly became a problem. This usually requires some kind of refactor or rearchitecture. Calling this "debt" is not only wrong but can be damaging to the team.
These categories are distinct and disjoint: a decision to prioritize time-to-market over quality or maintainability has little to do with the reality that new versions of libraries come out or the new requirements imposed by the CCPA. As Chelsea Troy points out, labeling them all as "technical debt" tends to obfuscate more than clarify, and can have other negative outcomes.
For my original goal of assessing what we could plan for and what we could not, each category has its own answer:
- For proper borrow-against-the-future debt, yes, we can, as long as it's deliberate. In theory, it can be a short-term plan made with other stakeholders, e.g., we will do X this week to ship and defer Y to next week, even though that will make Y bigger by a predictable amount.
- Depreciation is predictable—not always in the details, but broadly speaking. We can account for this work with Marty Cagan's version of 20% time. We dedicate, on average, some of our team's capacity to sustaining engineering work, which the engineering team should plan and prioritize.
- Can we plan for changing a foundational assumption? No, not really. Well, maybe sometimes, kind of, in a sense, if we're lucky. Perhaps Reverend Lovejoy said it best: "short answer: yes, with an 'if'; long answer: no, with a 'but.'"
The asterisk on the last answer is one of the harder challenges in software engineering, because it requires a great deal of business context. If we can predict which types of decisions are likely to change, we can make early investments to preemptively make those types of change easier for our future selves. Using the CSS example from earlier, if we thought that splitting apart the visual identity was likely, even if it had never happened before, we might apply the zero-one-infinity rule to support building and using multiple CSS assets, even if we only had one at the moment. We can combine the likelihood of the change with the predicted cost of the preemption and decide if it's worth it. There's even a pithy aphorism for when we decide against the investment: you ain't gonna need it.
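As a rough illustration of weighing likelihood against cost, here is a minimal sketch of that comparison. The function name, its parameters, and the numbers in the usage note are all hypothetical, and the inputs are guesses we would have to make with business context; this is one simple way to frame the trade, not a formula from any of the sources mentioned here.

```python
def worth_preempting(
    p_change: float,
    cost_now: float,
    cost_later: float,
    cost_later_if_prepared: float = 0.0,
) -> bool:
    """Return True when investing now has a lower expected cost than waiting.

    p_change: our estimate (0..1) that the requirement will change.
    cost_now: up-front cost of the preemptive investment.
    cost_later: cost of the rework if the change arrives and we did not prepare.
    cost_later_if_prepared: remaining cost if the change arrives and we did prepare.
    All costs are in the same arbitrary unit (e.g. engineer-days).
    """
    if not 0.0 <= p_change <= 1.0:
        raise ValueError("p_change must be between 0 and 1")
    expected_if_we_wait = p_change * cost_later
    expected_if_we_invest = cost_now + p_change * cost_later_if_prepared
    return expected_if_we_invest < expected_if_we_wait
```

With made-up numbers: if supporting multiple CSS assets costs 3 days now, the later rework would cost 10 days unprepared or 1 day prepared, then a 50% chance of change favors investing, while a 10% chance favors "you ain't gonna need it."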
Labeling all three categories as "tech debt," as Chelsea says, makes it easy to incorrectly assume what the speaker means. Even worse, it tends to erase the distinction, which leads to our 20% engineering time being eaten up by work that is properly part of delivering business value, not sustaining work. When we allow the paydown of proper "debt" or rearchitecting for new requirements to happen in place of sustaining engineering work, we're both hiding the actual cost and making an inadvertent trade.
Proper debt must be deliberate and planned for the trade to be acceptable. We won't do this product work now, so we will do it after launch. This might look like splitting and reordering tickets in a product backlog, or shrinking estimates but filing new refactor tickets in the backlog.
Sustaining engineering work can and should be planned by engineering. While some weeks may have, e.g., more library releases than others, we can measure our maintenance load over time to ensure it is stable or decreasing. If we have something like Cagan's 20% time deal, we must protect this time from the other two categories, and using distinct language for it helps.
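One possible way to check whether maintenance load is "stable or decreasing" is to fit a straight line to the hours spent each week and look at its slope. The sketch below assumes we already record weekly sustaining-engineering hours somewhere; the function name and the data shape are hypothetical, and a least-squares slope is just one reasonable choice of measure.

```python
from statistics import mean


def load_trend(weekly_hours: list[float]) -> float:
    """Least-squares slope of maintenance hours per week.

    A result at or below zero suggests the load is stable or decreasing;
    a positive result suggests it is growing.
    """
    n = len(weekly_hours)
    if n < 2:
        raise ValueError("need at least two weeks of data")
    xs = range(n)  # week index: 0, 1, 2, ...
    x_bar = mean(xs)
    y_bar = mean(weekly_hours)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, weekly_hours))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator
```

A single busy week of library releases will nudge the slope without meaning much, so this is better read over a quarter or more than week to week.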
Rearchitecting to support a new requirement is part of implementing that requirement. It should be planned as part of the product work, not separately. Calling this kind of work "tech debt" is a dangerous lie: doing so misrepresents the level of effort needed to change the requirement, and retroactively paints our decisions as short-sighted or mistaken. Over time, this leads stakeholders to underestimate the cost of big changes and erodes the engineers' confidence in their decisions.