Changing the Meaning: Breaking Semantic Changes

A zoomed out photo of a dictionary open to a random page. The text is mostly blurred.
"dictionary" by Stock Catalog on Flickr.

The subtlest kind of breaking change is the kind that doesn't affect a value's type but instead affects what that value means. These changes can be hard to notice when you're making them, and even harder to debug when they happen. They can lead to downstream effects like broken reports—or worse, simply wrong ones.

I'll walk through an example of how a change can be semantically breaking and how we can avoid the breaking behavior, or decompose it to ensure a window of compatibility.

Let's Throw a Party

Let's start with a system for managing real world events—think Partiful, Facebook or Discord events, Paperless Post, etc. Maybe it's a feature of a bigger product or is the whole product. The major pieces we need are:

  • A model of an Event (name, description, time, place, plus-one rules, etc)
  • A model of a Person or account (name, contact info)
  • Relationships between an Event and multiple People for hosts and guests

When we built this service (which might be an internal subsystem or part of the backend of an app) we created a few REST endpoints to support the UI we had at the time:

  • GET /event/<uuid> - gets details about an event
  • PUT /event/<uuid> - updates event details
  • GET /event/<uuid>/attendees - gets a list of everyone attending an event
  • POST /event/<uuid>/attendees - RSVPs to an event
  • PUT /event/<uuid>/attendee/<person-id> - updates an RSVP

This worked fine for a while, but now we're noticing some pain points. At first, we didn't have places where we really needed to separate "Guests" and "Hosts," but we increasingly do. We've had some bugs where Hosts were able to RSVP "no" to their own events, and had to put additional checks in place in the POST and PUT handlers.

Signs that Something is Wrong

A good sign that something is not defined well is that we need to start doing a lot of "unless" checks. In the RSVP endpoints, we might have ended up with code like:

def create_rsvp(request, event_id):
    event = Event.get(event_id)
    user = request.user
    payload = request.json

    # make sure the user isn't a host and saying no
    if user in event.hosts and payload["rsvp"] == "no":
        return JSONResponse({"error": "must attend own event"}, status=400)

By itself, that's not too much of a code smell. But then the UI also starts needing to make exceptions:

const GuestList = ({ attendees, hostIDs }) => {
  return (
    <ul className="guest-list">
      {attendees.map((attendee) => {
        if (hostIDs.includes(attendee.id)) {
          return null;
        }
        return <li>{attendee.name}</li>;
      })}
    </ul>
  );
};

Or if we want to be able to sort, keeping Hosts at the beginning:

const SortableAttendees = ({ attendees, hostIDs, orderBy }) => {
  const hosts = [];
  const guests = [];
  
  attendees.forEach((attendee) => {
    if (hostIDs.includes(attendee.id)) {
      hosts.push(attendee);
    } else {
      guests.push(attendee);
    }
  });
  
  switch (orderBy) {
    case "name":
      hosts.sort(orderByName);
      guests.sort(orderByName);
      break;
    case "rsvpDate":
      hosts.sort(orderByRSVP);
      guests.sort(orderByRSVP);
      break;
    default:
  }

  const allAttendees = hosts.concat(guests);

  return (
    <ol>
      {allAttendees.map((attendee) => (
        <li key={attendee.id}>{attendee.name}</li>
      ))}
    </ol>
  );
};

Once we start needing to split this list or add "special" cases over and over—to the point where they're clearly not that special—that's a good sign that something is up with our definition.

Changing the Definition of "Attendee"

It's reasonable to decide in this situation that our definition of "attendees" as everyone attending an event instead of as the guests attending an event is no longer working for us. (It's tempting to say that our definition was a mistake—I even described it that way in the first draft here. But I prefer to assume we made the best decision we could at the time. What's more important is that, right now, it's causing more work-arounds than benefits.)

But attendees is a list of people! Its type is Person[] or Array<Person>. No problem, we'll just stop including the hosts:

class Event(models.Model):
    @property
    def attendees(self):
        # before: return self.hosts.all() + self.guests.all()
        return self.guests.all()

def get_attendees(request, event_id):
    event = Event.get(event_id)

    return JSONResponse({"attendees": event.attendees})

After all, it seems like the front-end should handle this fine, right? The types didn't change, so all of our components still work. We'll stop hitting our "special case" branches, and we can remove those slowly.

Hold on...

Except, wait. <SortableAttendees /> assumes you have everyone in the same list. Well, ok, we'll change it. We can pass in the hosts as well, instead of just host IDs:

<SortableAttendees attendees={event.hosts.concat(event.attendees)} />

But, hold on, that'll put the hosts in the list twice until the backend change goes out. No worries, we'll deploy them at the same time. Well, the front-end takes a little longer than the back-end to deploy. So maybe we'll use a feature flag and switch the front-end and the back-end at the same time? Then only people who already had the front-end loaded should see...

And weren't we also using attendees for some analytics, where reports had to do:

num_guests = len(attendees) - len(hosts)

in some Airflow code? If we take hosts out of attendees, all our reports will under-count guests!
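
To make the under-count concrete, here's a small sketch with made-up data. The names and the report_num_guests helper are hypothetical stand-ins for the real report code. The formula stays the same; only the meaning of attendees shifts underneath it.

```python
# Hypothetical data: two hosts and three guests for one event.
hosts = ["ana", "ben"]
guests = ["cy", "dee", "eli"]


def report_num_guests(attendees, hosts):
    """The existing report logic: assumes attendees includes the hosts."""
    return len(attendees) - len(hosts)


# Old meaning: attendees is everyone attending (hosts + guests).
old_attendees = hosts + guests
# New meaning: attendees is only the guests.
new_attendees = list(guests)

before = report_num_guests(old_attendees, hosts)
after = report_num_guests(new_attendees, hosts)

print(before)  # 3
print(after)   # 1
```

Both calls pass the same type, a list of people, and neither raises an error. The second number is simply wrong.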

Semantics are Breaking

Once a meaning has been set—the definition of members of a set or list, or the acceptable range of a quantity, the units of a quantity, maximum lengths or sizes, etc—systems quickly begin to grow around that definition and work with it. Changing the meaning of data, even when it seems backwards compatible from a types perspective, changes how consumers need to interact with that data.

Alternatives and Decomposing the Change

Since we can't make this shift in a single deploy, we need to either 1) find a way to decompose this into a three-legged deploy, or 2) find a way to avoid changing the definition at all.

Decomposing a Definitional Change

Whether or not decomposing a change like this is a viable option will depend on how big the team is, how wide the use of the definition is, and how long of a time frame is acceptable.

Following the general principle of a three-legged deploy, we'll want to:

  1. Provide a new thing,
  2. Update consumers to use the new thing,
  3. Remove the old thing.

One option here might be to extend the "Person" type in the response to include a "host" boolean or "attendee type" value:

interface Attendee extends Person {
  isHost?: boolean;
}

// change the type of "attendees" from Person[] to...
type Attendees = Attendee[];

(This is not always a safe change. In TypeScript, or in plain JSON, it's OK to extend a type like this, because TypeScript treats arrays as covariant: anything that takes a parameter like Person[] will also accept Attendee[], since Attendee extends Person. And because isHost is optional, the substitution can actually go both ways. Whether this holds for you depends on the languages involved.)
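
On the backend, step 1 might look something like the sketch below. It assumes a simple Person dataclass and a hypothetical serialize_attendees helper, neither of which comes from the real service. The list still contains everyone, so existing consumers keep working, but each entry now says whether it is a host.

```python
from dataclasses import dataclass, asdict


@dataclass
class Person:
    id: int
    name: str


def serialize_attendees(hosts, guests):
    """Step 1: keep returning everyone, but label who is a host."""
    host_ids = {person.id for person in hosts}
    payload = []
    for person in list(hosts) + list(guests):
        entry = asdict(person)
        # Additive field: old consumers can ignore it.
        entry["isHost"] = person.id in host_ids
        payload.append(entry)
    return payload


payload = serialize_attendees(
    hosts=[Person(1, "Ana")],
    guests=[Person(2, "Cy")],
)
print(payload)
```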

The new isHost value will make our lives a lot easier. For step 2, we can start updating consumers, like the UI:

type OldParams = { attendees: Attendee[]; hostIDs: number[]; orderBy: string };
type NewParams = { attendees: Attendee[]; hosts: Person[]; orderBy: string };

const SortableAttendees = (options: NewParams | OldParams) => {
  // guests are the attendees that are not flagged as hosts
  const guests = options.attendees.filter((attendee) => !attendee.isHost);
  // ...

and the Airflow code:

num_guests = sum(
    # use getattr() because we know this is temporary
    1 for attendee in attendees if not getattr(attendee, "is_host", False)
)

Being explicit here gives us a way to start moving our APIs toward the new definition. <SortableAttendees /> can take either a list of Hosts or a list of the Hosts' IDs. num_guests is no longer subtracting anything; its definition is now consistent with the new state we want to move to.

In step 3, once we're no longer relying on Hosts being in Attendees anywhere, we can remove the Hosts. Finally, when all entries in Attendees have isHost set to false, we can return to the Person[] type, and remove isHost entirely.
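
One way to build confidence before step 3 is to run the updated consumer logic against each shape the data will take along the way. The sketch below uses SimpleNamespace objects as stand-ins for real attendee records, and count_guests is a hypothetical name for the Airflow logic. Note that the getattr() default only gives the right answer once step 1 has shipped, so the order of the steps matters.

```python
from types import SimpleNamespace as NS


def count_guests(attendees):
    """Count attendees that are not flagged as hosts."""
    return sum(1 for a in attendees if not getattr(a, "is_host", False))


# After step 1: hosts are still in the list, and every entry is labeled.
after_step_1 = [
    NS(name="Ana", is_host=True),
    NS(name="Cy", is_host=False),
    NS(name="Dee", is_host=False),
]
# After step 3: hosts are gone, the flag is still present.
after_step_3 = [NS(name="Cy", is_host=False), NS(name="Dee", is_host=False)]
# After the final cleanup: the flag has been removed entirely.
after_cleanup = [NS(name="Cy"), NS(name="Dee")]

counts = [count_guests(p) for p in (after_step_1, after_step_3, after_cleanup)]
print(counts)  # [2, 2, 2]
```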

Expand the Interface instead of Changing it

However, even though it's possible, that may be too much work. It may be hard to coordinate making forward-compatible changes like getattr(attendee, "is_host", False) across several different teams.

So perhaps the best alternative in this scenario is to avoid changing the definition at all. Instead, we can create a new API endpoint, /guests. We can update our front-end to use the new API endpoint over time, making the necessary change to each component.
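
As a rough sketch of what that could look like, here is a framework-free version with plain classes and functions. The Event class, the handler names, and the full /event/<uuid>/guests path are assumptions for illustration. The important part is that attendees keeps its existing meaning and guests is purely additive.

```python
class Event:
    def __init__(self, hosts, guests):
        self.hosts = list(hosts)
        self.guests = list(guests)

    @property
    def attendees(self):
        # Unchanged: still everyone attending the event.
        return self.hosts + self.guests


def get_attendees(event):
    """Body of the existing GET /event/<uuid>/attendees handler."""
    return {"attendees": event.attendees}


def get_guests(event):
    """Body of a new, hypothetical GET /event/<uuid>/guests handler."""
    return {"guests": event.guests}


event = Event(hosts=["Ana"], guests=["Cy", "Dee"])
print(get_attendees(event))  # {'attendees': ['Ana', 'Cy', 'Dee']}
print(get_guests(event))     # {'guests': ['Cy', 'Dee']}
```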

This is almost a 3-legged change, except that we don't really worry about the last step. "Attendees" is still "everyone attending the event," and "Guests" is now "all the Guests."

If Event data is published to a message broker like Kafka, we might extend those messages with the new Guests list, but we won't ever remove the Attendees list, because it would break the contract of those messages.
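
A sketch of that additive message, assuming a JSON payload and a hypothetical build_event_message helper. The field names here are illustrative, and the call that actually publishes to the broker is left out.

```python
import json


def build_event_message(event_id, hosts, guests):
    """Build the message body; publishing it is out of scope here."""
    return json.dumps(
        {
            "event_id": event_id,
            # Existing field: meaning unchanged, still everyone attending.
            "attendees": hosts + guests,
            # New field: added alongside, never replacing "attendees".
            "guests": guests,
        }
    )


message = build_event_message("evt-1", hosts=["Ana"], guests=["Cy"])
decoded = json.loads(message)

# An old consumer that only knows about "attendees" is unaffected.
print(decoded["attendees"])  # ['Ana', 'Cy']
# A new consumer can read "guests" directly.
print(decoded["guests"])     # ['Cy']
```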

Later, we might see that there are no more requests to /attendees, at which point we can decide to deprecate and ultimately remove it. That's a separate change for future us, though.

Avoiding the Problem

Once we find ourselves in this situation, there's going to be some work involved, one way or another. It would be great if we could avoid it next time. We can't always predict where our needs will change, but we may be able to give our future selves a hand by making different design decisions and different promises early on.

In the case of Event Attendees, in retrospect, we ended up needing special cases because we didn't give ourselves a good way to distinguish two classes of entity: Hosts and Guests. Since we already know they are different, we should give the entries in our combined list some signifier, even if we don't think we need it right now.

Imagine that we always had the isHost flag. The <SortableAttendees /> list could take fewer parameters. It might need to do the same list splitting—to encapsulate the "sorted by hosts first" behavior—but it could do so in a more explicit and more natural way. The Airflow code might have never had to subtract anything, and could instead have been written more simply:

num_guests = sum(1 for a in attendees if not a.is_host)

For a completely different example, consider that we have been recording a user's height in inches. We don't plan to expand into Canada or the EU right now, but since this is a quantity that we know has different possible units, we can include the unit from the get-go:

type Height = number; // no units, inches is implicit in the definition

// instead, specify the unit, even if there's only one possibility now
interface Height {
  size: number;
  units: "IN" | "CM";
}

This doesn't guarantee that no one will misuse the height values—in fact, most consumers probably will ignore the units attribute. But it does mean that when the business strategy shifts and we do expand into Canada, our changes can be much smaller. We've already needed to set the units explicitly, and the data is already present for us to read wherever the height is.

// with Height = number
// will turn 173cm into 14' 5"
const HeightDisplay = ({ height }) => {
  const feet = Math.floor(height / 12);
  const inches = height % 12;
  return <span className="height">{`${feet}' ${inches}"`}</span>;
};

// with Height = { size: number; units: "IN" }

// we might ignore units for now, but at least if we're
// doing something like <HeightDisplay {...height} />
const HeightDisplay = ({ size }) => {
  const feet = Math.floor(size / 12);
  const inches = size % 12;
  return <span className="height">{`${feet}' ${inches}"`}</span>;
};

// then we're at least *encouraged* to consider fallbacks
const HeightDisplay = ({ size, units }) => {
  if (units === "IN") {
    const feet = Math.floor(size / 12);
    const inches = size % 12;
    return <span className="height">{`${feet}' ${inches}"`}</span>;
  }
  return <span className="height">{`${size}${units.toLowerCase()}`}</span>;
};
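
The same idea carries over to backend or reporting code. As a rough sketch in Python, assuming a hypothetical Height dataclass and to_cm helper (neither is from a real codebase), carrying the unit with the value lets a consumer normalize explicitly and fail loudly on a unit it doesn't recognize, instead of silently treating centimeters as inches.

```python
from dataclasses import dataclass

CM_PER_INCH = 2.54


@dataclass(frozen=True)
class Height:
    size: float
    units: str  # "IN" or "CM"


def to_cm(height):
    """Normalize a height to centimeters based on its declared unit."""
    if height.units == "CM":
        return height.size
    if height.units == "IN":
        return height.size * CM_PER_INCH
    # An unknown unit is an error, not something to guess at.
    raise ValueError(f"unknown height unit: {height.units!r}")


metric = to_cm(Height(size=173, units="CM"))
imperial = to_cm(Height(size=68, units="IN"))
```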

It's not a perfect shield against a change breaking something in the future. But by being explicit, and by not making implicit promises—e.g. that height will always be in inches or that price will always be in USD—we have given ourselves room and mechanisms to grow. And we've given consumers a heads up that this could change, so it may be in their best interest to assume it will.