In the mid-2000s (so, in ancient history) a friend of mine went to an interview at a big tech company and was asked to write the JPEG compression algorithm on a whiteboard. He didn't know that algorithm off the top of his head, which he admitted, and then made up a compression algorithm on the spot. He said that he knew it wasn't great, but it should work.
The interviewer failed him out. He took the whiteboard and wrote the JPEG algorithm himself and said that's what he was looking for.
There's plenty to take from this story, but one less obvious piece is: this is not how knowledge works in programming.
It is very rare that we are tasked with implementing something that is purely algorithmic, like JPEG encoding or scrypt or a sorting algorithm. For all the various sorting options out there, the correct answer in >99% of real world cases is to use whatever method is built-in to the language or runtime and move on. I can think of twice-ish, in the past 20 years, that I've needed to implement some type of parser. The first time I was younger and brute-forced my way through it until I realized what I was doing. The second time, older and wiser, I used a Parsing Expression Grammar (PEG) parser generator and just wrote out the grammar and tools to work with the resulting syntax tree.
I "know" how to write a parser or implement something inefficient like bubble sort or insertion sort (though not how to compress images). I also know that, most of the time, I shouldn't do that.
Most of these purely-algorithmic problems are, as a practical matter, extremely complex. Parsing code is hard to follow in even trivial cases, requiring state machines and typically involving multiple concurrent processes. JPEG encoding makes use of perception science, linear algebra, and additional compression techniques. At the end of a day working on the parser, even with the PEG generator, my brain was fried. There's a limit to human capacity to hold complex, abstract systems in our heads. While that limit may be higher or lower for some folks, it is always finite.
Fortunately, a solution lies in "structured programming" itself: a subroutine.
When a purely-algorithmic problem does come up—or, in practice, part of a business need is reducible to a purely-algorithmic problem—we can find libraries in most languages to help.
Human language is often too ambiguous to describe complex—or even simple—algorithms. That's why writing standards is hard, and why we restrict human language with tools like RFC 2119 (the definitions of "SHOULD, MUST" etc that are referenced in nearly every other IETF RFC). Human language also has the idea of a subroutine. Instead of saying "then we take the bytes from the bitmap, and for each 3-byte sequence after the header bytes, save the first byte as
R, the second byte as
G, and the third byte as
B; then we calculate
Y = Kr * R + Kg * G + Kb * B, where
Kr is..." we can say "we convert the image to a JPEG."
As Alan Turing said, the subroutine, whether in human or computer language, "buries" the details. We can't see them, but we don't need to. In more modern terms, we'd say the details are encapsulated.
So What Does This Mean for Teams?
A few of my teams over the years have attempted to make documentation a consistent practice—even going so far as to set goals around it. This is so accepted and seen as virtuous that it's hard even to add nuance or suggest that "more documentation" may not always be helpful.
Documentation has its own inherent challenges. It tends to fall out of date unless maintaining it is someone's explicit responsibility (and even then...). This is the source of a lot of disagreement over code comments. It is yet another thing to maintain, which takes some of the team's time.
It is also, as we've seen, difficult to write effective documentation of complex algorithms. We have tools like pseudocode, flow diagrams, and BNF and its derivatives—each of which requires learning something, besides the thing we're documenting, to use—to help where human language is weaker.
Finally, complex algorithms are difficult to hold in a human head all at once. Adding all of the math of color-space conversion and discrete cosine transforms to whatever our problem and business rules are is impossible for most people.
So we encapsulate it. Encapsulating means more than organizing the code, it means "burying" the details. The code embodies the knowledge—and often all sorts of background knowledge, like that human eyes are more sensitive to variations in brightness than hue, or that Fourier transforms can be done quickly—and allows us to rely on that knowledge without actually knowing it. Code can even incorporate more knowledge over time, as we learn about its performance characteristics and discover shortcuts, or handle new types of errors.
Creating this kind of encapsulation is not just a method of sharing the knowledge it took to create, it's among the most effective methods. When it's done well, the people using the knowledge don't need to know it. That lets them spend less time learning the minutia, and hold more relevant details in their heads and focus on new value. This is true whether we're talking about general algorithms from computer science, or the details of business rules.
Don't stop documenting things. Instead, consider how encapsulated code—whether in a packaged library, or as a deep module within a larger project—can contribute to the goal of sharing knowledge. Is the that is difficult to document, or keeps falling out of date, something that would be better served by being a (possibly annotated) library? Is this something team members should need to learn in order to use, or should need to hold in their heads, or is it something we can encapsulate?
An Addendum on Generality and Interfaces
Sometimes I get into trouble with the word "general." In my math background, "general" means "everything that's not special." Saying a problem can be solved for certain instances "but not in the general case" means it can't be solved for arbitrary input. But in colloquial English, "general" usually means, well, "usually." I don't generally go to that store, except when I do. So when I say "general" in technical settings, I usually mean "everything," where a lot of people hear "most things."
Here, I mean "everything."
There are times to solve a general problem and times to solve a specific instance of it. On one team, we were working on syntax for filter predicates based on Google's Common Expression Language (CEL). (E.g. give me a list of users where
organization = "foobarbaz".) We knew our first use for filtering was going to be
ORing together a number of potential values for one field, like
org = "foo" OR org = "baz". So, to get the product into production quickly, we implemented support for exactly that, and no extra syntax.
const org_values = filter_string. split("OR"). map(v => v.split("="). trim(" \"") )
Independently, we worked on a more general solution that would handle any filter syntax (
(org = "foo" OR org = "bar") AND joined < "2021-01-01") and represent that predicate in a usable way. We put this work in because the general solution:
- would be reusable by other teams, and
- would support future use cases we didn't know yet.
There are trade-offs to saving time by implementing a specific case over the general. The most obvious is that it isn't reusable—each specific case will need its own implementation, where a general solution can often be dropped in and handle any particular case. So if something is a specific example of a common need, building (towards) general solutions is a better investment.
A less obvious trade-off is that you may miss important aspects of an API that could encapsulate a more general solution. In the previous example,
org_values represents a list of possible values for
org. However, because
OR is implicit, there's no way of representing
AND, and because the values have to apply to
org, there's no way of representing
joined < "2021-01-01". Having the "parse the filter" method return a list of values for a single field, instead of a data structure that can handle more general cases, meant that while we could be faster to implement it, we then had to do more refactoring work to accommodate the more robust data structure later.
In the case of our filters, a more general output from parsing the filter would have had to explicitly include the operators (
=) and the operands (
"foo"), even if we threw three fourths of those away. Something like
That's a lot more complexity, because it has space for the inherent complexity of the potential filter expressions. While we had a sense of what this would need to look like, we weren't sure until we got into the general implementation. At that point, we had to restructure our consumer, which couldn't assume the
= parts anymore.
This doesn't always happen, though. Sometimes—especially if you know what an implementation in another language looks like—you can get the API right, or at least "right enough." Let's say I wanted to, for some reason, encode protocol buffers myself, but I only needed to handle a specific case:
This isn't a great API for a general implementation of the encoder, which should take an arbitrary data structure (within the limits of what protobuf can handle). But at least the output is usable as is, and we can replace our implementation with the more general version:
The takeaway here is that while there are definitely times to go with the faster partial implementation of something, the trade-offs can be more costly than they seem at first glance. If enough is known about the problem domain in general, it's possible to reduce the costs later by building interfaces, if not implementations, that support the general case.