Testing with Legacy Data in Django

As we incrementally move SUMO over to Django, we often run into problems with legacy data—I’d say all of our problems so far, in fact. This one took several hours to figure out, so I thought I’d share.

Our basic problem is that we want to test our search facility, which is built on Sphinx, including indexing, searching directly, and searching through the web interface. Fortunately, Dave Dash on the AMO team tackled this in general, and I was largely able to follow his lead. But then I ran into our legacy data issues.

*Before I even start describing my solution, I want to point out that, yes, I tried basically everything else. I did not go so far as changing the schema of our legacy tables to make this test pass, but I was right up at the brink. I do not recommend following this example unless you’ve exhausted your options.

In order to run our search indexer, and get accurate results, we needed a number of tables to join the the documents we’re indexing (wiki pages, in this case) to their category information.

My first attempt at making these pass was to use the initial-SQL method in Django. The downside was that the tests would only pass consistently if they were run with FORCE_DB. I was willing to accept that, but then I started reading more about the initial-SQL facility, and realized this would get run on any syncdb operation, which was unacceptable.

Then Dave suggested I just create some legacy models so I could build fixtures for them. Well, it was more complex than setting FORCE_DB=1, but I gave it a shot.

It turns out that our old CMS, TikiWiki, has a schema so idiosyncratic that it largely cannot be represented by models in Django.

First, I couldn’t find any way to represent a relationship between two of the tables. This part is TikiWiki’s fault: it joins them on a unique string column, rather than something nice, like an integer. I could deal with that, though, as I realized I didn’t care if I couldn’t join them in Django, as long as Sphinx could in pure SQL.

But then, I needed to create a model for a join table so I could use it for through, because the old schema used completely wrong names for the foreign key columns, and I hit my first major stumbling point with Django: every model needs a primary key and it doesn’t support multi-column primary keys. I would have needed to alter the table schema to make this work, jeopardizing compatibility with the legacy code.

Lesson: Django will create join tables for you. Don’t argue with it.

So what to do?

I figured that I could probably just execute raw SQL statements in the setup and teardown methods of my test. So I took the queries out of the initial-SQL files and moved them into setup, and added some DROP TABLE statements to the teardown method.

Then I hit a final snag: CREATE TABLE IF NOT EXIST issues a “Note”-level warning if the table does exist. Django (or Nose?) considers any SQL warning an error, and it seems there’s no way to prevent that, not even for a single test case.

Fortunately, you can tell MySQL not to issue notes. With a couple of extra queries, I suppressed them and made my tests pass consistently. Thanks to wdoekes for sharing his solution!

from django.dbimport connection
cursor = connection.cursor()
cursor.execute(‘SET @[OLD_SQL_NOTES](http://twitter.com/OLD_SQL_NOTES)=@@SQL_NOTES, SQL_NOTES=0;’)
# … statements that can issues notes …
cursor.execute(‘SET [email protected]_SQL_NOTES;’)