Digging into "that" Python error
I think this is my most popular tweet ever:
>>> foo = ([],)
>>> foo[0] += [1]
TypeError: 'tuple' object does not support item assignment
>>> foo
<<< ([1],)
— James Socol (@jamessocol) February 12, 2015
I’ve known about this little quirk for a while, but I shared it because it still amuses me. It shocks and confuses people. Some people tried to make sense of it
@jamessocol Like this:
i=([],)
f=i[0]
f+=[1]
i
([1],)
Surely that's equivalent, & it works. Like you said, corner case.
— Ramon Ferrandis (@ramoncreager) February 14, 2015
And some were just appalled
@jamessocol Thanks for sharing. This is the first Python quirk which makes me think of PHP/JavaScript.
— Eason Chen (@easoncxz) February 14, 2015
So what actually is going on here? To answer, we need to take a dive into the world of CPython opcodes and spend some time with the dis
module.
Let’s start by disassembling and digging into a very simple set of statements:
And the output:
The left-most number is the line number (line 1 is def f1():
) then an offset we’ll ignore, then the opcode, then some information about the arguments to the opcode.
All of these opcodes operate on “the stack,” and most often on the value on the top of the stack, TOS
. We load values onto the stack from constants, functions, and in-scope variables, and we can store values into in-scope variables. Sometimes we’ll deal with a couple of values at once, which we’ll call TOSn
, where n
is the depth in the stack. Since the stack is, after all, a stack, we can’t operate on TOS1
unless we’re also operating on TOS
.
Line 2 has two operations: LOAD_CONST
, which pushes the constant 0
onto the stack. Then STORE_FAST
takes the value off the top of the stack and stores it in the local variable a
, which is a human name for variable number 0
.
Line 3 is a little more interesting: first we load the value from a
onto the stack (LOAD_FAST
), then load the constant 1
on top of it (LOAD_CONST
again). The INPLACE_ADD
operation pops the top two stack values, executes __iadd__
, and pushes the result back onto the stack. Then STORE_FAST
takes the value off the top of the stack and stores it back into a
.
The last two opcodes are the implicit return None
from this function, so we’ll ignore those.
There are a couple of interesting things to note here: one is that INPLACE_ADD
is actually less “in-place” than we were lead to believe. It still loads the value onto the stack and then stores a new value. It is not single atomic operation, but rather 4 operations. Another is to recall that number types, like strings and tuples, are immutable.
Immutability is going to be very important, so keep that in mind. There is no integer method that changes the value of an integer. Any time we operate on an integer value, we need to store the result again. __iadd__
returns a new value that replaces the current one, it doesn’t change the current one. There is no unary increment (++
) operator in Python.
Let’s change f1
slightly and see what happens:
Line 2 is the same: push 0
onto the stack, then pop it and store it in local variable 0/a
.
Line 3 starts off the same: we still load a
and push it onto the stack, then load the constant 1 and push it onto the stack. But this time we call BINARY_ADD
which pops the top two values, adds them with __add__
, and pushing the result. Then STORE_FAST
pops and stores it in a
. Finally the implicit return we’ll ignore.
OK, integers are fun, but what about a more interesting type, like a list?
Start with the LHS of line 2: BUILD_LIST 0
takes zero objects off the top of the stack, builds a list object, and pushes that list onto the stack. Then we STORE_FAST
which puts that list into a
.
On line 3, because we’re doing an “inplace” add again, we follow the same road map: LOAD_FAST
pushes the value from the variable stored on the RHS back onto the stack (the list). Then we evaluate the LHS: LOAD_CONST
to push 1
onto the stack, call BUILD_LIST 1
to create a list containing the top value from the list, and push the new list onto the stack. Then call INPLACE_ADD
which, for lists, executes the extend operation and pushes the new list onto the stack. Finally, pop the new list and store it in our local variable.
Now let’s compare that with calling .extend()
:
On line 3, we push the list from a
onto the stack, then call LOAD_ATTR
to pop it off the stack, and then push a reference to the extend
attribute onto the stack. LOAD_CONST
to push 1
onto stack, BUILD_LIST
replaces it with a list [1]
then CALL_FUNCTION 1
which pops 1 value from the stack to use as a function argument, then pops the function, and finally calls the function, which, in this case, extends the list with that value, then pushes the result onto the stack.
What happens next is interesting. We do not call STORE_FAST
. Lists are mutable, and the value of the list has already changed. We just call POP_TOP
to throw away the top stack value.
At this point, we can basically understand how this bizarre error both works and fails at the “same time,” but there’s one more type of operation we should cover.
Line 2 builds a list just like we’ve been doing: LOAD_CONST
to put 5
on the stack, BUILD_LIST 1
to build a list with 1 value from the stack, and STORE_FAST
to pop the list from the top of the stack and store it into a
.
Line 3 LOAD_FAST
s the list onto the stack, then LOAD_CONST
the index (0) from the LHS onto the stack. DUP_TOPX 2
takes the top two values from stack and duplicates them, in order, which makes the stack a 0 a 0
.
BINARY_SUBSCR
takes the top two values from the stack, evaluates TOS1[TOS]
, and pushes it back onto the stack, so the stack is now a 0 5
. LOAD_CONST
puts the value 1
onto the stack, and INPLACE_ADD
pops the two integers, adds, then, and pushes 6
back (a 0 6
).
ROT_THREE
rotates the top three stack objects, so we reorder the stack to 6 a 0
. We need to do this to call STORE_SUBSCR
, which implements TOS1[TOS] = TOS2
, or a[0] = 6
. STORE_SUBSCR
replaces STORE_FAST
in the other examples, and it changes the stored value of TOS1
, so it only makes sense for mutable types.
OK! Finally, we’ve got everything we need to look at the our favorite bizarre behavior:
Line 2: build a list and push it. ([]
) Pop it and use it to build a tuple, then push the tuple (([],)
) Pop the tuple and store it.
Line 3:
LOAD_FAST
Pusha
onto the stack (([],)
)LOAD_CONST
Push0
onto the stack (([],) 0
)DUP_TOPX 2
Duplicate the two stack values (([],) 0 ([],) 0
)BINARY_SUBSCR
Pop 2, calculateTOS1[TOS]
, push the result (([],) 0 []
Remember that the list inside the tuple and the list are the same object.)LOAD_CONST
Push1
onto the stack (([],) 0 [] 1
)BUILD_LIST 1
Pop the top, build[TOS]
, and push the result (([],) 0 [] [1]
)INPLACE_ADD
Pop the top two, executeTOS1.__iadd__(TOS)
, extending the list (([1],) 0 [1]
) At this point the extend operation has already happened.ROT_THREE
([1] ([1],) 0
)STORE_SUBSTR
Attempt to pop the top three and storeTOS1[TOS] = TOS2
.
STORE_SUBSTR
fails because tuples are immutable, which causes the TypeError
. But, as we learned, +=
is not an atomic operation. Part of it (the actual addition/extend) can succeed while another part (store) fails.
Without using dis.dis
, can we tell how Ramon’s example is different?
i = ([],)
builds a list, pushes it onto the stack, pops it to build a tuple, pushes the tuple, and finally pops the tuple stores it in the local variablei
.f = i[0]
is going to do something likeLOAD_FAST
to get the tuple onto the stack,LOAD_CONST
to get0
onto the stack,BINARY_SUBSCR
to pop them and pushTOS1[TOS]
, a reference to the list, onto the stack, thenSTORE_FAST
to pop it and store the reference in local variablef
.f += [1]
willLOAD_FAST
the list from the RHS,LOAD_CONST
to push1
onto the stack,BUILD_LIST 1
to pop1
onto the stack, build a list, and push it onto the stack.INPLACE_ADD
to pop the top two, add (extend) them, and push the result onto the stack, and finallySTORE_FAST
to pop the extended list intof
.
Since f
and i[0]
point to the same list object, i
is now ([1],)
. i
hasn’t mutated but its mutable member has, and nothing ever tried to do a STORE_SUBSCR
on an immutable object.
This is specifically CPython (I used 2.7.5) and the opcode representation is not guaranteed or part of the spec. It’s internal and subject to change, but it’s also an interesting introduction to an important aspect of how Python works.
The original, specific example is contrived and bad code, specifically constructed to fail. But it’s not so bad or so contrived that no one would ever do it, which is why it’s a particularly interesting one.