2012-09-15 | python, django
While doing maintenance/bug fixing work on a website built with Django, a came across an issue with the commenting system. The problem was, that the order of the comments was not right sometimes. Comments which were placed as a comment on another comment were occasionally even displayed before their subject. Another strange thing about it was that it mostly happened at times with higher traffic.
The application which contained the commenting functionality contained the following model definition:
class Comment(models.Model): comment = models.TextField() published = models.DateTimeField(default=datetime.datetime.now()) user = models.ForeignKey(User, related_name="user_comment") content_type = models.ForeignKey(ContentType) object_id = models.PositiveIntegerField() content_object = generic.GenericForeignKey('content_type', 'object_id') client_ip = models.CharField(max_length=15)
Can you spot the bug?
It is right there on line 3 (where the
published field is defined).
The problem here was the call to
datetime.datetime.now() right there in the
When the models.py of the application in question is loaded, the class definition of Comment is evaluated too. Each field of the Comment class receives an instance of a Field class.
published field receives an instance of the
DateTimeField. No problem so
far. The instance is created using the result of
default. And right there things are wrong.
Because we actually call a function here, the result is used. This means the default value of the field is the "datetime" of the moment when the class definition is being evaluated. The python interpreter remembers the resulting class as long as it is running.
In case of the website, the python interpreter was invoked using Apache with mod_wsgi. This means that a python instance will be reused for several requests. And all of those requests will use the same default time for published...
The fix for the bug was the removal of two characters. I had to change the line from this:
published = models.DateTimeField(default=datetime.datetime.now())
published = models.DateTimeField(default=datetime.datetime.now)
Now the parentheses
now are missing from the definition, a
reference to the
now function is being passed as the default.
When Django tries to retrieve the default value of a field (for instance when
the model is being saved) it now calls the
instead of using the result of it when the models were loaded the first time.
Other places which are only evaluated once during an interpreter session will probably expose similar issues.
Take a basic function definition for example:
def example(items=): items.append("test") return items
What will a call to
example() return? What will be returned a second or
>>> def example(items=): ... items.append("test") ... return items ... >>> example() ['test'] >>> example() ['test', 'test'] >>> example() ['test', 'test', 'test'] >>>
You can even get more confusing issues when you combine this behavior with more side-effects such as database queries.
Once, I saw something like this within a Django application:
class ExampleForm(forms.Form): ages = [(u'', u'Age group')] ages.extend([([c.id, c.label]) for c in Age.objects.all()]) age = forms.ChoiceField(choices=ages, required=False)
This will probably work. The models.py will validate and the application will start just fine. But what will happen when an age group is added within the Django admin interface while the front-end application is running?
Try it. You will see that the list of age groups will not change as long as Django is not restarted/reloaded...
Just take care where you declare something and think about when your declaration will be evaluated, how it will change within the application's lifetime and if this is really what you want.
So just think about it.