==========================
Serializing Django objects
==========================

Django's serialization framework provides a mechanism for "translating" Django
models into other formats. Usually these other formats will be text-based and
used for sending Django data over a wire, but it's possible for a
serializer to handle any format (text-based or not).

.. seealso::

    If you just want to get some data from your tables into a serialized
    form, you could use the :djadmin:`dumpdata` management command.

Serializing data
----------------

At the highest level, serializing data is a very simple operation::

    from django.core import serializers
    data = serializers.serialize("xml", SomeModel.objects.all())

The arguments to the ``serialize`` function are the format to serialize the data
to (see `Serialization formats`_) and a
:class:`~django.db.models.query.QuerySet` to serialize. (Actually, the second
argument can be any iterator that yields Django model instances, but it'll
almost always be a QuerySet.)

.. function:: django.core.serializers.get_serializer(format)

You can also use a serializer object directly::

    XMLSerializer = serializers.get_serializer("xml")
    xml_serializer = XMLSerializer()
    xml_serializer.serialize(queryset)
    data = xml_serializer.getvalue()

This is useful if you want to serialize data directly to a file-like object
(which includes an :class:`~django.http.HttpResponse`)::

    with open("file.xml", "w") as out:
        xml_serializer.serialize(SomeModel.objects.all(), stream=out)

.. note::

    Calling :func:`~django.core.serializers.get_serializer` with an unknown
    :ref:`format <serialization-formats>` will raise a
    ``django.core.serializers.SerializerDoesNotExist`` exception.


Subset of fields
~~~~~~~~~~~~~~~~

If you only want a subset of fields to be serialized, you can
specify a ``fields`` argument to the serializer::

    from django.core import serializers
    data = serializers.serialize('xml', SomeModel.objects.all(), fields=('name', 'size'))

In this example, only the ``name`` and ``size`` attributes of each model will
be serialized.

.. note::

    Depending on your model, you may find that it is not possible to
    deserialize a model that only serializes a subset of its fields. If a
    serialized object doesn't specify all the fields that are required by a
    model, the deserializer will not be able to save deserialized instances.

Inherited Models
~~~~~~~~~~~~~~~~

If you have a model that is defined using an :ref:`abstract base class `, you don't have to do anything special to serialize
that model. Just call the serializer on the object (or objects) that you want to
serialize, and the output will be a complete representation of the serialized
object.

However, if you have a model that uses :ref:`multi-table inheritance `, you also need to serialize all of the base classes for the model.
This is because only the fields that are locally defined on the model will be
serialized. For example, consider the following models::

    from django.db import models

    class Place(models.Model):
        name = models.CharField(max_length=50)

    class Restaurant(Place):
        serves_hot_dogs = models.BooleanField()

If you only serialize the ``Restaurant`` model::

    data = serializers.serialize('xml', Restaurant.objects.all())

the fields on the serialized output will only contain the ``serves_hot_dogs``
attribute. The ``name`` attribute of the base class will be ignored.

In order to fully serialize your ``Restaurant`` instances, you will need to
serialize the ``Place`` models as well::

    all_objects = list(Restaurant.objects.all()) + list(Place.objects.all())
    data = serializers.serialize('xml', all_objects)

Deserializing data
------------------

Deserializing data is also a fairly simple operation::

    for obj in serializers.deserialize("xml", data):
        do_something_with(obj)

As you can see, the ``deserialize`` function takes the same format argument as
``serialize``, a string or stream of data, and returns an iterator.

However, here it gets slightly complicated. The objects returned by the
``deserialize`` iterator *aren't* simple Django objects. Instead, they are
special ``DeserializedObject`` instances that wrap a created -- but unsaved --
object and any associated relationship data.

Calling ``DeserializedObject.save()`` saves the object to the database.

.. note::

    If the ``pk`` attribute in the serialized data doesn't exist or is
    null, a new instance will be saved to the database.

.. versionchanged:: 1.6

    In previous versions of Django, the ``pk`` attribute had to be present
    on the serialized data or a ``DeserializationError`` would be raised.

Because the wrapped object is not saved until you call ``save()``,
deserializing is a non-destructive operation even if the data in your
serialized representation doesn't match what's currently in the database.
Usually, working with these ``DeserializedObject`` instances looks something
like::

    for deserialized_object in serializers.deserialize("xml", data):
        # object_should_be_saved() stands for your own validation logic.
        if object_should_be_saved(deserialized_object):
            deserialized_object.save()

In other words, the usual use is to examine the deserialized objects to make
sure that they are "appropriate" for saving before doing so. Of course, if you
trust your data source you could just save the object and move on.

The Django object itself can be inspected as ``deserialized_object.object``.
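
The loop above leaves ``object_should_be_saved()`` up to you. As a minimal
sketch, assuming a hypothetical policy that accepts only instances of an
allow-listed set of model classes that carry an explicit primary key, such a
check might inspect ``deserialized_object.object`` like this. The names
``ALLOWED_MODEL_NAMES`` and ``object_should_be_saved`` are illustrative, not
part of Django:

```python
from types import SimpleNamespace

# Hypothetical policy: the model classes this data source may write to.
ALLOWED_MODEL_NAMES = {"Book", "Person"}


def object_should_be_saved(deserialized_object):
    """Hypothetical check for the deserialize loop.

    Only the wrapped model instance (``deserialized_object.object``) is
    inspected; nothing is written to the database here.
    """
    obj = deserialized_object.object
    if type(obj).__name__ not in ALLOWED_MODEL_NAMES:
        return False
    # A missing or null pk means save() would insert a new row; this
    # particular policy only accepts records with an explicit primary key.
    return getattr(obj, "pk", None) is not None


if __name__ == "__main__":
    # Plain stand-ins: SimpleNamespace plays the DeserializedObject wrapper
    # (only ``.object`` is used) and Book plays a model instance.
    class Book:
        pk = 1

    print(object_should_be_saved(SimpleNamespace(object=Book())))  # True
```

Comparing class names keeps the sketch free of Django imports; in a real
project, ``isinstance(obj, (Book, Person))`` against your actual model classes
is the more robust test.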

If fields in the serialized data do not exist on a model, a
``DeserializationError`` will be raised unless the ``ignorenonexistent``
argument is passed in as ``True``::

    serializers.deserialize("xml", data, ignorenonexistent=True)

.. _serialization-formats:

Serialization formats
---------------------

Django supports a number of serialization formats, some of which require you
to install third-party Python modules:

==========  ==============================================================
Identifier  Information
==========  ==============================================================
``xml``     Serializes to and from a simple XML dialect.

``json``    Serializes to and from JSON_.

``yaml``    Serializes to and from YAML (YAML Ain't a Markup Language).
            This serializer is only available if PyYAML_ is installed.
==========  ==============================================================

.. _json: http://json.org/
.. _PyYAML: http://www.pyyaml.org/

XML
~~~

The basic XML serialization format is quite simple::

    <?xml version="1.0" encoding="utf-8"?>
    <django-objects version="1.0">
        <object pk="4b678b301dfd8a4e0dad910de3ae245b" model="sessions.session">
            <field type="DateTimeField" name="expire_date">2013-01-16T08:16:59.844560+00:00</field>
            <!-- ... -->
        </object>
    </django-objects>

The whole collection of objects that is either serialized or deserialized is
represented by a ``<django-objects>`` tag which contains multiple
``<object>`` elements. Each such object has two attributes: "pk" and "model",
the latter being represented by the name of the app ("sessions") and the
lowercase name of the model ("session") separated by a dot.

Each field of the object is serialized as a ``<field>`` element carrying the
attributes "type" and "name". The text content of the element represents the
value that should be stored.

Foreign keys and other relational fields are treated a little bit differently::

    <object pk="27" model="auth.permission">
        <!-- ... -->
        <field to="contenttypes.contenttype" name="content_type" rel="ManyToOneRel">9</field>
        <!-- ... -->
    </object>

In this example we specify that the ``auth.Permission`` object with the PK 27
has a foreign key to the ``contenttypes.ContentType`` instance with the PK 9.

ManyToMany relations are exported for the model that binds them. For
instance, the ``auth.User`` model has such a relation to the
``auth.Permission`` model::

    <object pk="1" model="auth.user">
        <!-- ... -->
        <field to="auth.permission" name="user_permissions" rel="ManyToManyRel">
            <object pk="46"></object>
            <object pk="47"></object>
        </field>
    </object>

This example links the given user with the permission models with PKs 46 and 47.
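
Because the format is plain XML, it can also be read with Python's standard
:mod:`xml.etree.ElementTree`, for example to inspect a fixture without loading
it. The sketch below assumes the element and attribute names that Django's XML
serializer emits (``django-objects``, ``object``, ``field``, and ``rel``/``to``
on relational fields). The sample combines the examples from this section, and
the user's ``pk`` of ``1`` is an illustrative value. This is not how Django's
own deserializer works, and it leaves every value as a string:

```python
import xml.etree.ElementTree as ET

# Sample built from this section's examples; the user's pk is illustrative.
SAMPLE = """<django-objects version="1.0">
    <object pk="4b678b301dfd8a4e0dad910de3ae245b" model="sessions.session">
        <field type="DateTimeField" name="expire_date">2013-01-16T08:16:59.844560+00:00</field>
    </object>
    <object pk="27" model="auth.permission">
        <field to="contenttypes.contenttype" name="content_type" rel="ManyToOneRel">9</field>
    </object>
    <object pk="1" model="auth.user">
        <field to="auth.permission" name="user_permissions" rel="ManyToManyRel">
            <object pk="46"></object>
            <object pk="47"></object>
        </field>
    </object>
</django-objects>"""


def summarize(xml_text):
    """Map (model, pk) to {field name: value} for each serialized object.

    Plain and foreign-key fields keep the element's text (a foreign key is
    just the related PK); many-to-many fields become the list of nested
    <object> pks.
    """
    root = ET.fromstring(xml_text)
    summary = {}
    for obj in root.findall("object"):
        fields = {}
        for field in obj.findall("field"):
            if field.get("rel") == "ManyToManyRel":
                fields[field.get("name")] = [o.get("pk") for o in field.findall("object")]
            else:
                fields[field.get("name")] = field.text
        summary[(obj.get("model"), obj.get("pk"))] = fields
    return summary


print(summarize(SAMPLE)[("auth.user", "1")])  # {'user_permissions': ['46', '47']}
```

Converting these strings to Python values is left to the real deserializer,
which knows the model's field definitions. The sketch also ignores cases this
section does not cover, such as null values or natural keys.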

JSON
~~~~

When staying with the same example data as before, it would be serialized as
JSON in the following way::

    [
        {
            "pk": "4b678b301dfd8a4e0dad910de3ae245b",
            "model": "sessions.session",
            "fields": {
                "expire_date": "2013-01-16T08:16:59.844Z",
                ...
            }
        }
    ]

The formatting here is a bit simpler than with XML. The whole collection is
just represented as an array and the objects are represented by JSON objects
with three properties: "pk", "model" and "fields". "fields" is again an object
containing each field's name and value as property and property-value
respectively.

Foreign keys just have the PK of the linked object as property value.
ManyToMany relations are serialized for the model that defines them and are
represented as a list of PKs.

Date and datetime related types are treated in a special way by the JSON
serializer to make the format compatible with `ECMA-262`_. In the example
above, ``expire_date`` is written with millisecond precision and a ``Z``
suffix (``2013-01-16T08:16:59.844Z``) rather than with the microsecond
precision and ``+00:00`` offset used in the XML output.

Be aware that not all Django output can be passed unmodified to :mod:`json`.
In particular, :ref:`lazy translation objects ` need a
`special encoder`_ written for them. Something like this will work::

    import json
    from django.utils.functional import Promise
    from django.utils.encoding import force_text
    from django.core.serializers.json import DjangoJSONEncoder

    class LazyEncoder(DjangoJSONEncoder):
        def default(self, obj):
            # Evaluate lazy translation objects to plain text.
            if isinstance(obj, Promise):
                return force_text(obj)
            return super(LazyEncoder, self).default(obj)

    # Usage: json.dumps(data, cls=LazyEncoder)

.. _special encoder: http://docs.python.org/library/json.html#encoders-and-decoders
.. _ecma-262: http://www.ecma-international.org/ecma-262/5.1/#sec-15.9.1.15

YAML
~~~~

YAML serialization looks quite similar to JSON. The object list is serialized
as a sequence of mappings with the keys "pk", "model" and "fields".
Each field is again a mapping with the key being the name of the field and
the value being the field's value::

    -   fields: {expire_date: !!timestamp '2013-01-16 08:16:59.844560+00:00'}
        model: sessions.session
        pk: 4b678b301dfd8a4e0dad910de3ae245b

Referential fields are again just represented by the PK or sequence of PKs.

.. _topics-serialization-natural-keys:

Natural keys
------------

The default serialization strategy for foreign keys and many-to-many relations
is to serialize the value of the primary key(s) of the objects in the
relation. This strategy works well for most objects, but it can cause
difficulty in some circumstances.

Consider the case of a list of objects that have a foreign key referencing
:class:`~django.contrib.contenttypes.models.ContentType`. If you're going to
serialize an object that refers to a content type, then you need to have a way
to refer to that content type to begin with. Since ``ContentType`` objects are
automatically created by Django during the database synchronization process,
the primary key of a given content type isn't easy to predict; it will
depend on how and when :djadmin:`syncdb` was executed. This is true for all
models which automatically generate objects, notably including
:class:`~django.contrib.auth.models.Permission`,
:class:`~django.contrib.auth.models.Group`, and
:class:`~django.contrib.auth.models.User`.

.. warning::

    You should never include automatically generated objects in a fixture or
    other serialized data. By chance, the primary keys in the fixture
    may match those in the database and loading the fixture will
    have no effect. In the more likely case that they don't match, the fixture
    loading will fail with an :class:`~django.db.IntegrityError`.

There is also the matter of convenience. An integer id isn't always
the most convenient way to refer to an object; sometimes, a
more natural reference would be helpful.

It is for these reasons that Django provides *natural keys*.
A natural key is a tuple of values that can be used to uniquely identify an
object instance without using the primary key value.

Deserialization of natural keys
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Consider the following two models::

    from django.db import models

    class Person(models.Model):
        first_name = models.CharField(max_length=100)
        last_name = models.CharField(max_length=100)

        birthdate = models.DateField()

        class Meta:
            unique_together = (('first_name', 'last_name'),)

    class Book(models.Model):
        name = models.CharField(max_length=100)
        author = models.ForeignKey(Person)

Ordinarily, serialized data for ``Book`` would use an integer to refer to
the author. For example, in JSON, a ``Book`` might be serialized as::

    ...
    {
        "pk": 1,
        "model": "store.book",
        "fields": {
            "name": "Mostly Harmless",
            "author": 42
        }
    }
    ...

This isn't a particularly natural way to refer to an author. It
requires that you know the primary key value for the author; it
also requires that this primary key value is stable and predictable.

However, if we add natural key handling to ``Person``, the fixture becomes
much more humane. To add natural key handling, you define a default
Manager for ``Person`` with a ``get_by_natural_key()`` method. In the case
of a ``Person``, a good natural key might be the pair of first and last
name::

    from django.db import models

    class PersonManager(models.Manager):
        def get_by_natural_key(self, first_name, last_name):
            return self.get(first_name=first_name, last_name=last_name)

    class Person(models.Model):
        objects = PersonManager()

        first_name = models.CharField(max_length=100)
        last_name = models.CharField(max_length=100)

        birthdate = models.DateField()

        class Meta:
            unique_together = (('first_name', 'last_name'),)

Now books can use that natural key to refer to ``Person`` objects::

    ...
    {
        "pk": 1,
        "model": "store.book",
        "fields": {
            "name": "Mostly Harmless",
            "author": ["Douglas", "Adams"]
        }
    }
    ...


When you try to load this serialized data, Django will use the
``get_by_natural_key()`` method to resolve ``["Douglas", "Adams"]``
into the primary key of an actual ``Person`` object.

.. note::

    Whatever fields you use for a natural key must be able to uniquely
    identify an object. This will usually mean that your model will
    have a uniqueness clause (either ``unique=True`` on a single field, or
    ``unique_together`` over multiple fields) for the field or fields
    in your natural key. However, uniqueness doesn't need to be
    enforced at the database level. If you are certain that a set of
    fields will be effectively unique, you can still use those fields
    as a natural key.

Serialization of natural keys
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

So how do you get Django to emit a natural key when serializing an object?
First, you need to add another method -- this time to the model itself::

    class Person(models.Model):
        objects = PersonManager()

        first_name = models.CharField(max_length=100)
        last_name = models.CharField(max_length=100)

        birthdate = models.DateField()

        def natural_key(self):
            return (self.first_name, self.last_name)

        class Meta:
            unique_together = (('first_name', 'last_name'),)

That method should always return a natural key tuple -- in this
example, ``(first name, last name)``. Then, when you call
``serializers.serialize()``, you provide a ``use_natural_keys=True``
argument::

    >>> serializers.serialize('json', [book1, book2], indent=2, use_natural_keys=True)

When ``use_natural_keys=True`` is specified, Django will use the
``natural_key()`` method to serialize any reference to objects of the
type that defines the method.

If you are using :djadmin:`dumpdata` to generate serialized data, you
use the :djadminopt:`--natural` command line flag to generate natural keys.

.. note::

    You don't need to define both ``natural_key()`` and
    ``get_by_natural_key()``.

    If you don't want Django to output natural keys during serialization,
    but you want to retain the ability to load natural keys, then you can
    opt to not implement the ``natural_key()`` method.

    Conversely, if (for some strange reason) you want Django to output
    natural keys during serialization, but *not* be able to load those
    key values, just don't define the ``get_by_natural_key()`` method.

Dependencies during serialization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since natural keys rely on database lookups to resolve references, it
is important that the data exists before it is referenced. You can't make
a "forward reference" with natural keys -- the data you're referencing
must exist before you include a natural key reference to that data.

To accommodate this limitation, calls to :djadmin:`dumpdata` that use
the :djadminopt:`--natural` option will serialize any model with a
``natural_key()`` method before serializing standard primary key objects.

However, this may not always be enough. If your natural key refers to
another object (by using a foreign key or natural key to another object
as part of a natural key), then you need to be able to ensure that
the objects on which a natural key depends occur in the serialized data
before the natural key requires them.

To control this ordering, you can define dependencies on your
``natural_key()`` methods. You do this by setting a ``dependencies``
attribute on the ``natural_key()`` method itself.

For example, let's add a natural key to the ``Book`` model from the
example above::

    class Book(models.Model):
        name = models.CharField(max_length=100)
        author = models.ForeignKey(Person)

        def natural_key(self):
            return (self.name,) + self.author.natural_key()

The natural key for a ``Book`` is a combination of its name and its
author. This means that ``Person`` must be serialized before ``Book``.
To define this dependency, we add one extra line to ``Book``::

    class Book(models.Model):
        name = models.CharField(max_length=100)
        author = models.ForeignKey(Person)

        def natural_key(self):
            return (self.name,) + self.author.natural_key()
        # "app_label.modelname" of each model this natural key depends on.
        natural_key.dependencies = ['store.person']

This definition ensures that all ``Person`` objects are serialized before
any ``Book`` objects. In turn, any object referencing ``Book`` will be
serialized after both ``Person`` and ``Book`` have been serialized.
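
To make the effect of ``dependencies`` concrete, here is a rough sketch of the
ordering idea. It uses plain Python stand-ins rather than Django models, reads
only the ``dependencies`` attribute of each class's ``natural_key``, and orders
the labels with the standard library's ``graphlib.TopologicalSorter`` (Python
3.9+). It is not Django's own sorting code. The ``LABEL`` attribute, the
``store`` app label (taken from the JSON examples above) and the ``Review``
model are hypothetical:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


# Plain stand-ins for models; LABEL mimics the "app_label.modelname"
# strings that natural_key.dependencies refers to (hypothetical attribute).
class Person:
    LABEL = "store.person"

    def natural_key(self):
        return (self.first_name, self.last_name)


class Book:
    LABEL = "store.book"

    def natural_key(self):
        return (self.name,) + self.author.natural_key()
    natural_key.dependencies = ["store.person"]


class Review:  # hypothetical model whose natural key includes a Book's
    LABEL = "store.review"

    def natural_key(self):
        return (self.reviewer,) + self.book.natural_key()
    natural_key.dependencies = ["store.book"]


def serialization_order(model_classes):
    """Return labels so each model comes after everything it depends on."""
    graph = {}
    for cls in model_classes:
        natural_key = getattr(cls, "natural_key", None)
        graph[cls.LABEL] = set(getattr(natural_key, "dependencies", []))
    # static_order() yields predecessors first and raises
    # graphlib.CycleError if the dependencies form a loop.
    return list(TopologicalSorter(graph).static_order())


print(serialization_order([Review, Book, Person]))
# ['store.person', 'store.book', 'store.review']
```

Because ``Review`` depends on ``Book``, which in turn depends on ``Person``,
the only valid order is the chain shown, regardless of the order in which the
classes are passed in.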