This post has been a long time in the making. A year ago, I started work on my Twanguages code. This was code to analyse a corpus of Twitter messages, and try to discern patterns about language use, geography, and character encoding.  I decided to use the Django web framework and the Python language for the Twanguages analysis code.  I know Python, but I was learning Django for the first time.

Django is really, really marvellous. When I tried this expression and got back exactly the records I was expecting,

from django.db.models import Count
q2 = TwUser.objects.annotate(ntweets=Count('twstatus')).filter(ntweets__gt=1)

I wrote in my log, “I think I just fell in love. Power and concision in a tool, awesome.”

But Django gave me fits.  It has its share of quirks to trap the unwary novice. Eventually I began writing notes about “Django gotchas” in my log.  Some of them are Django being difficult, or inadequate. Some are me being a clueless novice, and Django not rescuing me from my folly. But all of them were obstacles.  I share them in the hopes of helping another Django novice.

Here are my Django gotchas. They are ranked from most distressing to most benign. They apply to Django 1.1, the current version at the time. (As of August 2010, the current version is 1.2.1.) A couple of gotchas were addressed by Django 1.2, so I moved them down to a section of their own. The rest presumably still apply to Django 1.2, but I haven’t gone back to check.

  1. API fails unhelpfully. I wrote a simple query expression like:
    S2 = models.TwStatus.objects.get( key )

    I got a lot of weird errors, e.g. “ValueError: too many values to unpack” (where key is a string) and “TypeError: ‘long’ object is not iterable” (where key is a long). I had made a mistake, of course; the call to get() should have a keyword argument of “id__exact” or the like, not a positional argument. The correct spelling is this:

    S2 = models.TwStatus.objects.get( id__exact=key )

    The gotcha is that Django’s .get() isn’t written defensively. It isn’t very robust to programmer errors. Instead of checking parameters and giving clear error messages, it lets bad parameters through, only to have them fail obscurely deep in the framework. If defensive programming of the Django API would slow it down too much in production, I’d love to have a debug mode I could invoke during development.

  2. Don’t try tracing .get() and .save(). I tried tracing through Django’s get() and save() methods to diagnose the time zone problems. Boy, was that a jungle! It seems the Query objects sometimes contain a cached set of records as their object value (not as attributes). I was unable to follow the code well enough to understand what it did. I guess if the code must work wonders, it may not have the luxury of simplicity. But if tracing get() and save() isn’t a straightforward way to understand how they work, then what is the novice’s alternative?
  3. Django model fields code doesn’t behave well under Eclipse+PyDev. I was unable to step through the code to find a bug in my Field class instantiation. I made a plea for help on the Django-users mailing list, and the helpful Karen Tracey gave me some good advice: “In those cases its usually easier to rely on something more basic, like pdb, to figure out what is going on. It may not be as pretty, but it can get the job done without going off into the weeds. …” (I later found out that Tracey wrote a book of advice like this: Django 1.1 Testing and Debugging.)
  4. CharField character set isn’t DRY. Django is largely unicode-savvy about the strings it passes in and out, yet it doesn’t make UTF-8 the default character set for the CharField columns and databases it creates. UTF-8 would be a better default. Also, since Django creates the database tables with syncdb, it would be nice if there were a way for it to set the character set for each table. By not providing this, Django lets down its Don’t Repeat Yourself (DRY) design philosophy: the charset of the backing store for a field should be defined where everything else about the field is defined.
  5. Type difference after .save(). When you simply instantiate a model with field values of Python int, long, str, or unicode type, then retrieve a field value, it comes back as the same type that was passed in. But .save() the record, then retrieve the field value, and it comes back as whichever type Django prefers. What was a str is now a unicode. What was a long, but has a value that fits in an int, comes back as an int. What was an int, but has a value requiring a long, comes back as a long. The type you get back depends on the type passed in, whether the record was saved, and which back end it was saved to. Maybe I was only bothered because I was trying to write strongly typed code in a duck-typing language, and I should instead have unclenched my C++-trained retentiveness about types.
  6. Unicode-savvy throughout, except when it’s not. I found a place in my substring-handling code where Django wasn’t decoding the UTF-8 strings it read from the database. Why didn’t Django do this for me? Ah! I was using utf8_bin (case-sensitive) collation, and Django’s note on utf8_bin collation under MySQL says, “if you are using MySQLdb 1.2.2, the database backend in Django will then return bytestrings (instead of unicode strings) for any character fields it receives from the database. This is a strong variation from Django’s normal practice of always returning unicode strings. It is up to you, the developer, to handle the fact that you will receive bytestrings if you configure your table(s) to use utf8_bin collation. Django itself should work smoothly with such columns, but your code must be prepared to call django.utils.encoding.smart_unicode() at times if it really wants to work with consistent data”. It turns out this is “Officially Hard(tm)” to improve; see Django bug #8340. Just to compound the issue, some of the Django-supplied test cases fail if collation is utf8_bin under MySQL.
  7. Can’t give Django Test Loader a test suite to run. It’s great that Django has a TestLoader class to automate unit testing. But it only accepts individual TestCase classes, not test suites. This makes it harder to reuse collections of test cases developed elsewhere. My workaround was to import the TestCase classes by name in the application’s tests.py or models.py. (That might make a good future blog post, actually.)
  8. Unhelpful error “IntegrityError: workshop_twstatus.user_id may not be NULL” on .save(). I got this when trying to save a pair of model objects linked by a foreign key (a Twitter message of class TwStatus, with a .user field pointing to a TwUser instance). I called myStatus.save(), which failed with the IntegrityError message above. The problem was that I expected models.Model.__init__(**kwargs), when my foreign key attribute was given a subsidiary dictionary, to instantiate the related object from that dictionary. All it did was drop the subsidiary dictionary and substitute NULL, with no helpful error message, of course. It was up to me to use the subsidiary dictionary to instantiate an instance of the ForeignKey’s class, and attach this to the foreign key attribute. The wording of the “Saving ForeignKey and ManyToManyField fields” section of the Django documentation wasn’t all that clear. It didn’t say, “you need to separately create and call .save() on the object pointed to by the ForeignKey.” I suppose I could propose a wording change for the documentation.
  9. Inheriting __metaclass__ leads to infinite loop. An interesting problem happened when I was defining models_utf8bin.CharField and friends. If I defined __metaclass__ = models.SubfieldBase in the helper class To_Python_Smart_Unicode instead of in my derived CharField, then when I ran the tests an infinite loop occurred at class definition time. I tried to figure out why that happened, but I understand too little about how Python defines classes. My workaround was to give each derived class its own __metaclass__ = models.SubfieldBase definition.
  10. Django test harness demands too much from some hosting environments (e.g. Dreamhost). The Django test runner can’t be told to use an existing database; it insists on creating a new one. But some shared hosting environments forbid this, mine with Dreamhost for one. There you can only create a database through the hosting control panel, and Django may connect only to a database that already exists. Thus I can’t run Django’s test harness in my production hosting environment, which makes it harder for me to validate that Django will work with my production database server.
  11. Django won’t write datetimes with timezones to the database. The MySQL backend for Django 1.1 rejects any Python datetime with a timezone value, and the date-time values it writes to MySQL are naive, i.e. they carry no time zone. But it was important for my application to know what time zone my time stamps were in. I wrote my own Django Field type: a datetime field that converted every Python datetime to UTC, stripped off the time zone value, and passed the result as a naive value to Django’s datetime field. This was awkward. I expect to publish the code for this field sometime.
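For gotcha 1, here is a minimal sketch of the development-time check wished for there. `strict_lookups` is a hypothetical decorator, not part of Django: it refuses plain positional values such as a string or integer key and suggests a keyword lookup instead. It assumes lookups are otherwise passed as keyword arguments; any non-scalar positional argument (such as a query object) is passed through untouched.

```python
import functools

# Plain values that should never be passed positionally to a get()-style lookup.
_SCALAR_TYPES = (str, bytes, int, float)


def strict_lookups(lookup):
    """Hypothetical debug-mode wrapper (not part of Django).

    Rejects scalar positional arguments, which would otherwise slip through
    and fail obscurely deep inside the query code.
    """
    @functools.wraps(lookup)
    def wrapper(*args, **kwargs):
        for arg in args:
            if isinstance(arg, _SCALAR_TYPES):
                raise TypeError(
                    "%s() got positional value %r; use a keyword lookup "
                    "such as id__exact=%r or pk=%r"
                    % (lookup.__name__, arg, arg, arg))
        return lookup(*args, **kwargs)
    return wrapper


# Possible use during development (TwStatus as in gotcha 1):
#   get_status = strict_lookups(models.TwStatus.objects.get)
#   get_status(key)            # raises TypeError with a clear message
#   get_status(id__exact=key)  # behaves like the normal .get()
```

The wrapper costs one type check per positional argument, so it could be switched on only when DEBUG is set.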
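Gotcha 4 leaves the character set up to the database server. One possible workaround, sketched here assuming a MySQL back end, is to generate `ALTER TABLE ... CONVERT TO CHARACTER SET` statements for the tables syncdb created and run them once afterwards. The helper name is hypothetical, and `workshop_twuser` assumes Django’s default app-label-plus-model table naming. The utf8_bin default mirrors the collation in gotcha 6, with the caveats described there.

```python
def charset_statements(tables, charset="utf8", collation="utf8_bin"):
    """Build one MySQL statement per table (hypothetical helper).

    CONVERT TO CHARACTER SET changes the table's default character set and
    converts all of its existing character columns in a single step.
    """
    return [
        "ALTER TABLE `%s` CONVERT TO CHARACTER SET %s COLLATE %s;"
        % (table, charset, collation)
        for table in tables
    ]


for sql in charset_statements(["workshop_twuser", "workshop_twstatus"]):
    print(sql)
```

This still keeps the charset outside the model definition, so it only softens the DRY complaint rather than fixing it.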
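The remedy quoted in gotcha 6 is to pass values through `django.utils.encoding.smart_unicode()`. Below is a simplified stand-in, written for Python 3 so it runs on its own (where Python 2 had str and unicode, Python 3 has bytes and str). It is not Django’s implementation, only the core idea: decode byte strings as UTF-8 and leave text alone.

```python
def to_text(value, encoding="utf-8", errors="strict"):
    """Return value as text (simplified stand-in for smart_unicode).

    Byte strings are decoded; text passes through; anything else is str()-ed.
    """
    if isinstance(value, str):
        return value
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode(encoding, errors)
    return str(value)


print(to_text(b"caf\xc3\xa9"))  # café
```

Applying this at the point where rows come back from a utf8_bin column keeps the rest of the code working with text only.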
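The gotcha 7 workaround of importing TestCase classes by name into tests.py can be automated. This sketch uses only the standard `unittest` module. `export_test_classes` is a hypothetical helper that walks a suite and publishes each TestCase class it finds into a namespace, such as the `globals()` of tests.py, where a class-based loader can discover it. One caveat: the loader will then run every test method in each class, not only the methods the original suite selected.

```python
import unittest


def iter_test_cases(suite):
    """Yield the individual TestCase instances inside a (nested) TestSuite."""
    for item in suite:
        if isinstance(item, unittest.TestSuite):
            yield from iter_test_cases(item)
        else:
            yield item


def export_test_classes(suite, namespace):
    """Copy each TestCase class found in suite into namespace.

    Names already present in namespace are left alone. Returns the list of
    names that were added.
    """
    exported = []
    for case in iter_test_cases(suite):
        cls = type(case)
        if cls.__name__ not in namespace:
            namespace[cls.__name__] = cls
            exported.append(cls.__name__)
    return exported


# In an app's tests.py (shared_suite is a hypothetical suite built elsewhere):
#   export_test_classes(shared_suite(), globals())
```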
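Gotcha 8 comes down to save order: the TwUser that a TwStatus points to has to be created and saved before the status itself. Here is a hedged sketch, assuming tweets arrive as dictionaries shaped like Twitter API JSON, with the author nested under a `user` key. The hypothetical `split_status` pulls that nested dictionary out so each model can be built from its own fields. The Django calls in the comment are the ordinary `Model(**kwargs)` constructor and `.save()`; that the field names match the JSON keys is an assumption.

```python
def split_status(status):
    """Split a tweet dict into (user_fields, status_fields).

    status_fields drops the nested 'user' dict, which Model.__init__ will not
    turn into a TwUser instance for you. The input dict is not modified.
    """
    status_fields = dict(status)
    user_fields = status_fields.pop("user")
    return user_fields, status_fields


# With Django models (field names assumed to match the JSON keys):
#   user_fields, status_fields = split_status(tweet)
#   user = TwUser(**user_fields)
#   user.save()                                   # parent row first
#   status = TwStatus(user=user, **status_fields)
#   status.save()                                 # user_id is now set
```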
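On gotcha 9, one plausible cause (an assumption, not confirmed against the Django 1.1 source) is that SubfieldBase wraps `contribute_to_class` with a call of the form `super(self.__class__, self)`. That is a well-known Python pitfall: `self.__class__` is the runtime class, so in a subclass the call resolves back to the same parent method forever. If the metaclass is defined only on the most-derived class, only one wrapper exists, which would explain why the workaround helps. This plain-Python sketch uses made-up class names to reproduce the recursion and shows the usual fix of naming the class explicitly.

```python
class Field:
    def contribute_to_class(self):
        return "Field.contribute_to_class ran"


class BuggyHelper(Field):
    def contribute_to_class(self):
        # self.__class__ is the runtime class, not BuggyHelper, so for a
        # subclass instance this super() call lands back on this very method.
        return super(self.__class__, self).contribute_to_class()


class BuggyCharField(BuggyHelper):
    pass


class FixedHelper(Field):
    def contribute_to_class(self):
        # Naming the class always continues the lookup after FixedHelper.
        return super(FixedHelper, self).contribute_to_class()


class FixedCharField(FixedHelper):
    pass


print(FixedCharField().contribute_to_class())  # Field.contribute_to_class ran
try:
    BuggyCharField().contribute_to_class()
except RecursionError:
    print("BuggyCharField recursed until Python gave up")
```

In plain Python the runaway call ends in a RecursionError rather than looping forever.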
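The gotcha 11 workaround converts every datetime to UTC and strips the time zone before Django sees it, then reattaches UTC when reading. Here is a minimal sketch of just those two conversions, using Python 3’s `datetime.timezone.utc`. It is not the Field code described in the item, the function names are hypothetical, and rejecting naive input is an assumed policy (treating naive values as UTC would be another reasonable choice).

```python
from datetime import datetime, timedelta, timezone


def to_db_naive_utc(value):
    """Convert an aware datetime to a naive datetime that means UTC."""
    if value.tzinfo is None or value.utcoffset() is None:
        # A naive value's time zone is unknown, so refuse to guess.
        raise ValueError("expected a timezone-aware datetime, got %r" % (value,))
    return value.astimezone(timezone.utc).replace(tzinfo=None)


def from_db_naive_utc(value):
    """Reattach UTC to a naive datetime read back from the database."""
    return value.replace(tzinfo=timezone.utc)


utc_minus_7 = timezone(timedelta(hours=-7))  # illustrative offset
stamp = datetime(2009, 10, 1, 9, 30, tzinfo=utc_minus_7)
stored = to_db_naive_utc(stamp)
print(stored)                              # 2009-10-01 16:30:00
print(from_db_naive_utc(stored) == stamp)  # True
```

The original offset is lost in storage; only the instant survives, which is enough when every stored value is known to be UTC.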

Fixed in Django 1.2

I wrote my gotcha list based on Django 1.1, which I was using in October 2009 for the main development of Twanguages.  In the interim, the Django folks released version 1.2.  It fixed some of my gotchas. To celebrate, I pulled them out of my gotchas list into this section.

  1. No model validation in Django 1.1. Maybe this was my C++-trained retentiveness about types again, but I really wanted to write code to validate that my data structures matched their models, and I wanted the Django framework to help. Karen Tracey came to my rescue again, with a pointer to “Django Tip: Poor Man’s Model Validation”, by Malcolm Tredinnick (offline at the moment). That was helpful. Even better, Django 1.2 supports model validation. Hooray!
  2. Django 1.1 doesn’t support a BigInt type by default. By BigInt I mean an integer capable of holding a value larger than 2^31-1 (2,147,483,647), the largest value Django guarantees for an IntegerField. See the interesting back-and-forth in Django ticket #399, “Bigint field object needed”. It looks like they held off on adding a bigint field type to Django because it wouldn’t behave consistently on all database back-ends, i.e. it might have errors sometimes. It turns out that after four years of discussion, a BigIntegerField was finally added to Django 1.2. Hooray!
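The BigInt item is easier to see with the ranges written out. Django documents IntegerField values from -2147483648 to 2147483647 as safe on all supported databases, and BigIntegerField as covering -9223372036854775808 to 9223372036854775807. The hypothetical helper below simply names the smaller of the two field types that can hold a given value.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def integer_field_for(value):
    """Name the smallest Django integer field type whose range holds value."""
    if INT32_MIN <= value <= INT32_MAX:
        return "IntegerField"
    if INT64_MIN <= value <= INT64_MAX:
        return "BigIntegerField"
    raise OverflowError("%d does not fit in a signed 64-bit column" % value)


print(integer_field_for(2**31 - 1))  # IntegerField
print(integer_field_for(2**31))      # BigIntegerField
```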