==========================
Serializing Django objects
==========================

Django's serialization framework provides a mechanism for "translating" Django
models into other formats. Usually these other formats will be text-based and
used for sending Django data over a wire, but it's possible for a
serializer to handle any format (text-based or not).

.. seealso::

    If you just want to get some data from your tables into a serialized
    form, you could use the :djadmin:`dumpdata` management command.

Serializing data
================

At the highest level, you can serialize data like this::

    from django.core import serializers

    data = serializers.serialize("xml", SomeModel.objects.all())

The arguments to the ``serialize`` function are the format to serialize the data
to (see `Serialization formats`_) and a
:class:`~django.db.models.query.QuerySet` to serialize. (Actually, the second
argument can be any iterable that yields Django model instances, such as a
list, but it'll almost always be a QuerySet.)

.. function:: django.core.serializers.get_serializer(format)

You can also use a serializer object directly::

    XMLSerializer = serializers.get_serializer("xml")
    xml_serializer = XMLSerializer()
    xml_serializer.serialize(queryset)
    data = xml_serializer.getvalue()

This is useful if you want to serialize data directly to a file-like object
(which includes an :class:`~django.http.HttpResponse`)::

    with open("file.xml", "w") as out:
        xml_serializer.serialize(SomeModel.objects.all(), stream=out)

.. note::

    Calling :func:`~django.core.serializers.get_serializer` with an unknown
    :ref:`format <serialization-formats>` will raise a
    ``django.core.serializers.SerializerDoesNotExist`` exception.

.. _subset-of-fields:

Subset of fields
----------------

If you only want a subset of fields to be serialized, you can
specify a ``fields`` argument to the serializer::

    from django.core import serializers

    data = serializers.serialize('xml', SomeModel.objects.all(), fields=['name', 'size'])

In this example, only the ``name`` and ``size`` attributes of each model will
be serialized. The primary key is always serialized as the ``pk`` element in the
resulting output; it never appears in the ``fields`` part.
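
The following plain-Python sketch mimics the *shape* of such a record without
using Django. ``serialize_subset()`` and the sample dictionary are hypothetical
stand-ins for a model instance, not part of Django's API::

    import json


    def serialize_subset(model_label, pk, values, fields):
        """Build one JSON-style record that keeps only the requested fields."""
        return {
            "model": model_label,
            # The primary key sits at the top level of the record...
            "pk": pk,
            # ...and only the selected attributes go into "fields".
            "fields": {name: values[name] for name in fields},
        }


    instance = {"name": "Widget", "size": 3, "colour": "red"}
    record = serialize_subset("app.somemodel", 1, instance, ["name", "size"])
    print(json.dumps(record))
    # {"model": "app.somemodel", "pk": 1, "fields": {"name": "Widget", "size": 3}}

The ``colour`` value is dropped, and ``pk`` never appears inside ``fields``.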

.. note::

    Depending on your model, you may find that it is not possible to
    deserialize a model that only serializes a subset of its fields. If a
    serialized object doesn't specify all the fields that are required by a
    model, the deserializer will not be able to save deserialized instances.

Inherited models
----------------

If you have a model that is defined using an :ref:`abstract base class
<abstract-base-classes>`, you don't have to do anything special to serialize
that model. Call the serializer on the object (or objects) that you want to
serialize, and the output will be a complete representation of the serialized
object.

However, if you have a model that uses :ref:`multi-table inheritance
<multi-table-inheritance>`, you also need to serialize all of the base classes
for the model. This is because only the fields that are locally defined on the
model will be serialized. For example, consider the following models::

    from django.db import models


    class Place(models.Model):
        name = models.CharField(max_length=50)


    class Restaurant(Place):
        serves_hot_dogs = models.BooleanField(default=False)

If you only serialize the ``Restaurant`` model::

    data = serializers.serialize('xml', Restaurant.objects.all())

the fields on the serialized output will only contain the ``serves_hot_dogs``
attribute. The ``name`` attribute of the base class will be ignored.

In order to fully serialize your ``Restaurant`` instances, you will need to
serialize the ``Place`` models as well::

    all_objects = [*Restaurant.objects.all(), *Place.objects.all()]
    data = serializers.serialize('xml', all_objects)

Deserializing data
==================

Deserializing data is very similar to serializing it::

    for obj in serializers.deserialize("xml", data):
        do_something_with(obj)

As you can see, the ``deserialize`` function takes the same format argument as
``serialize``, a string or stream of data, and returns an iterator.

However, here it gets slightly complicated. The objects returned by the
``deserialize`` iterator *aren't* regular Django objects. Instead, they are
special ``DeserializedObject`` instances that wrap a created, but unsaved,
object and any associated relationship data.

Calling ``DeserializedObject.save()`` saves the object to the database.

.. note::

    If the ``pk`` attribute in the serialized data doesn't exist or is
    null, a new instance will be saved to the database.

Because nothing is written until you call ``save()``, deserializing is a
non-destructive operation even if the data in your serialized representation
doesn't match what's currently in the database. Usually, working with these
``DeserializedObject`` instances looks something like::

    for deserialized_object in serializers.deserialize("xml", data):
        if object_should_be_saved(deserialized_object):
            deserialized_object.save()

In other words, the usual use is to examine the deserialized objects to make
sure that they are "appropriate" for saving before doing so. If you trust your
data source, you can instead save the object directly and move on.

The Django object itself can be inspected as ``deserialized_object.object``.

If fields in the serialized data do not exist on a model, a
``DeserializationError`` will be raised unless the ``ignorenonexistent``
argument is passed in as ``True``::

    serializers.deserialize("xml", data, ignorenonexistent=True)
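
Conceptually, ``ignorenonexistent=True`` drops unknown keys instead of
rejecting the whole record. The stdlib-only sketch below illustrates that
behavior. ``filter_fields()``, ``KNOWN_FIELDS`` and the local
``DeserializationError`` class are hypothetical and are not Django's
implementation::

    class DeserializationError(Exception):
        """Local stand-in for Django's exception of the same name."""


    # Field names defined on a hypothetical model.
    KNOWN_FIELDS = {"name", "size"}


    def filter_fields(fields, ignorenonexistent=False):
        unknown = set(fields) - KNOWN_FIELDS
        if unknown and not ignorenonexistent:
            raise DeserializationError(f"unknown fields: {sorted(unknown)}")
        # Keep only the fields the model actually defines.
        return {key: value for key, value in fields.items() if key in KNOWN_FIELDS}


    data = {"name": "Widget", "size": 3, "legacy_flag": True}
    print(filter_fields(data, ignorenonexistent=True))
    # {'name': 'Widget', 'size': 3}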

.. _serialization-formats:

Serialization formats
=====================

Django supports a number of serialization formats, some of which require you
to install third-party Python modules:

========== ==============================================================
Identifier Information
========== ==============================================================
``xml``    Serializes to and from a simple XML dialect.

``json``   Serializes to and from JSON_.

``jsonl``  Serializes to and from JSONL_.

``yaml``   Serializes to YAML (YAML Ain't a Markup Language). This
           serializer is only available if PyYAML_ is installed.
========== ==============================================================

.. _json: https://json.org/
.. _jsonl: https://jsonlines.org/
.. _PyYAML: https://pyyaml.org/

XML
---

The basic XML serialization format looks like this::

    <?xml version="1.0" encoding="utf-8"?>
    <django-objects version="1.0">
        <object pk="123" model="sessions.session">
            <field type="DateTimeField" name="expire_date">2013-01-16T08:16:59.844560+00:00</field>
            <!-- ... -->
        </object>
    </django-objects>

The whole collection of objects that is either serialized or deserialized is
represented by a ``<django-objects>``-tag which contains multiple
``<object>``-elements. Each such object has two attributes: "pk" and "model",
the latter being represented by the name of the app ("sessions") and the
lowercase name of the model ("session") separated by a dot.

Each field of the object is serialized as a ``<field>``-element with the
attributes "type" and "name". The text content of the element represents the
value that should be stored.

Foreign keys and other relational fields are treated a little bit differently::

    <object pk="27" model="auth.permission">
        <!-- ... -->
        <field to="contenttypes.contenttype" name="content_type" rel="ManyToOneRel">9</field>
        <!-- ... -->
    </object>

In this example we specify that the ``auth.Permission`` object with the PK 27
has a foreign key to the ``contenttypes.ContentType`` instance with the PK 9.

ManyToMany-relations are exported for the model that defines them. For instance,
the ``auth.User`` model has such a relation to the ``auth.Permission`` model::

    <object pk="1" model="auth.user">
        <!-- ... -->
        <field to="auth.permission" name="user_permissions" rel="ManyToManyRel">
            <object pk="46"></object>
            <object pk="47"></object>
        </field>
    </object>

This example links the given user with the permission models with PKs 46 and 47.
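
To see how these pieces fit together, the following sketch reads XML in this
format with Python's standard :mod:`xml.etree.ElementTree` module. It only
illustrates the structure described above; it is not how Django's XML
deserializer works, and the sample document (including the ``username`` field)
is an abridged, hypothetical example::

    import xml.etree.ElementTree as ET

    SAMPLE = """<django-objects version="1.0">
      <object pk="1" model="auth.user">
        <field type="CharField" name="username">alice</field>
        <field to="auth.permission" name="user_permissions" rel="ManyToManyRel">
          <object pk="46"></object>
          <object pk="47"></object>
        </field>
      </object>
    </django-objects>"""

    root = ET.fromstring(SAMPLE)
    records = []
    for obj in root.findall("object"):  # direct children of <django-objects> only
        fields = {}
        for field in obj.findall("field"):
            if field.get("rel") == "ManyToManyRel":
                # Many-to-many values are nested <object> elements carrying PKs.
                fields[field.get("name")] = [o.get("pk") for o in field.findall("object")]
            else:
                # Plain and foreign-key fields keep their value as element text.
                fields[field.get("name")] = field.text
        records.append({"model": obj.get("model"), "pk": obj.get("pk"), "fields": fields})

    print(records)

Note that every value comes back as a string; Django converts it using the
target field's type.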

.. admonition:: Control characters

    If the content to be serialized contains control characters that are not
    accepted in the XML 1.0 standard, the serialization will fail with a
    :exc:`ValueError` exception. Read also the W3C's explanation of `HTML,
    XHTML, XML and Control Codes
    <https://www.w3.org/International/questions/qa-controls>`_.
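
If you want to detect such characters before serializing, a check along these
lines can help. This is a stdlib-only sketch based on the ``Char`` production
of the XML 1.0 specification. ``find_invalid_xml_chars()`` is a hypothetical
helper, not part of Django::

    import re

    # Anything outside the XML 1.0 "Char" production: tab, LF and CR are
    # allowed, as are U+0020-U+D7FF, U+E000-U+FFFD and U+10000-U+10FFFF.
    INVALID_XML_CHARS = re.compile(
        r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
    )


    def find_invalid_xml_chars(text):
        """Return (index, hex code point) pairs for characters XML 1.0 rejects."""
        return [(m.start(), hex(ord(m.group()))) for m in INVALID_XML_CHARS.finditer(text)]


    print(find_invalid_xml_chars("ok\tline\x0bbad"))
    # [(7, '0xb')]

The tab is accepted, but the vertical tab (U+000B) is reported.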

.. _serialization-formats-json:

JSON
----

Using the same example data as before, the JSON serialization looks like
this::

    [
        {
            "pk": "4b678b301dfd8a4e0dad910de3ae245b",
            "model": "sessions.session",
            "fields": {
                "expire_date": "2013-01-16T08:16:59.844Z",
                ...
            }
        }
    ]

The formatting here is a bit simpler than with XML. The whole collection
is just represented as an array and the objects are represented by JSON objects
with three properties: "pk", "model" and "fields". "fields" is again an object
containing each field's name and value as property and property-value
respectively.

Foreign keys have the PK of the linked object as property value.
ManyToMany-relations are serialized for the model that defines them and are
represented as a list of PKs.
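
Because the output is plain JSON, you can inspect it with the standard
:mod:`json` module. The sketch below uses a made-up ``store.book`` record to
show where each kind of value ends up. The ``tags`` many-to-many field is
hypothetical::

    import json

    payload = """[
      {"model": "store.book", "pk": 1,
       "fields": {"name": "Mostly Harmless", "author": 42, "tags": [3, 7]}}
    ]"""

    for record in json.loads(payload):
        fields = record["fields"]
        # A foreign key is stored as the PK of the related object...
        author_pk = fields["author"]
        # ...and a many-to-many relation as a list of PKs.
        tag_pks = fields["tags"]
        print(record["model"], record["pk"], author_pk, tag_pks)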

Be aware that not all Django output can be passed unmodified to :mod:`json`.
For example, if you have some custom type in an object to be serialized, you'll
have to write a custom :mod:`json` encoder for it. Something like this will
work::

    from django.core.serializers.json import DjangoJSONEncoder


    class LazyEncoder(DjangoJSONEncoder):
        def default(self, obj):
            if isinstance(obj, YourCustomType):
                return str(obj)
            return super().default(obj)

You can then pass ``cls=LazyEncoder`` to the ``serializers.serialize()``
function::

    from django.core.serializers import serialize

    serialize('json', SomeModel.objects.all(), cls=LazyEncoder)

Also note that GeoDjango provides a :doc:`customized GeoJSON serializer
</ref/contrib/gis/serializers>`.

``DjangoJSONEncoder``
~~~~~~~~~~~~~~~~~~~~~

.. class:: django.core.serializers.json.DjangoJSONEncoder

The JSON serializer uses ``DjangoJSONEncoder`` for encoding. A subclass of
:class:`~json.JSONEncoder`, it handles these additional types:

:class:`~datetime.datetime`
    A string of the form ``YYYY-MM-DDTHH:mm:ss.sssZ`` or
    ``YYYY-MM-DDTHH:mm:ss.sss+HH:MM`` as defined in `ECMA-262`_.

:class:`~datetime.date`
    A string of the form ``YYYY-MM-DD`` as defined in `ECMA-262`_.

:class:`~datetime.time`
    A string of the form ``HH:MM:ss.sss`` as defined in `ECMA-262`_.

:class:`~datetime.timedelta`
    A string representing a duration as defined in ISO-8601. For example,
    ``timedelta(days=1, hours=2, seconds=3.4)`` is represented as
    ``'P1DT02H00M03.400000S'``.

:class:`~decimal.Decimal`, ``Promise`` (``django.utils.functional.lazy()`` objects), :class:`~uuid.UUID`
    A string representation of the object.

.. _ecma-262: https://262.ecma-international.org/5.1/#sec-15.9.1.15
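
The following stdlib-only sketch reproduces the documented string forms for
:class:`~datetime.datetime` and :class:`~datetime.timedelta` values. It was
written to match the examples above and is an approximation, not Django's
implementation. ``ecma_datetime()`` and ``iso_duration()`` are hypothetical
names::

    import datetime


    def ecma_datetime(value):
        """Format a datetime like YYYY-MM-DDTHH:mm:ss.sssZ (or +HH:MM)."""
        text = value.isoformat()
        if value.microsecond:
            # Keep milliseconds only: drop the last three digits of the fraction.
            text = text[:23] + text[26:]
        if text.endswith("+00:00"):
            text = text[:-6] + "Z"
        return text


    def iso_duration(value):
        """Format a non-negative timedelta as an ISO 8601 duration string."""
        minutes, seconds = divmod(value.seconds, 60)
        hours, minutes = divmod(minutes, 60)
        fraction = f".{value.microseconds:06d}" if value.microseconds else ""
        return f"P{value.days}DT{hours:02d}H{minutes:02d}M{seconds:02d}{fraction}S"


    utc = datetime.timezone.utc
    print(ecma_datetime(datetime.datetime(2013, 1, 16, 8, 16, 59, 844560, tzinfo=utc)))
    # 2013-01-16T08:16:59.844Z
    print(iso_duration(datetime.timedelta(days=1, hours=2, seconds=3.4)))
    # P1DT02H00M03.400000S

Negative durations are deliberately left out of this sketch.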

.. _serialization-formats-jsonl:

JSONL
-----

*JSONL* stands for *JSON Lines*. With this format, objects are separated by new
lines, and each line contains a valid JSON object. JSONL serialized data looks
like this::

    {"pk": "4b678b301dfd8a4e0dad910de3ae245b", "model": "sessions.session", "fields": {...}}
    {"pk": "88bea72c02274f3c9bf1cb2bb8cee4fc", "model": "sessions.session", "fields": {...}}
    {"pk": "9cf0e26691b64147a67e2a9f06ad7a53", "model": "sessions.session", "fields": {...}}

JSONL can be useful for populating large databases, since the data can be
processed line by line, rather than being loaded into memory all at once.
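
The line-by-line property is what makes streaming possible. Here is a minimal
stdlib-only sketch. An in-memory :class:`io.StringIO` stands in for a large
file, and the records are abridged versions of the example above.
``iter_records()`` is a hypothetical helper::

    import io
    import json

    stream = io.StringIO(
        '{"pk": "4b678b301dfd8a4e0dad910de3ae245b", "model": "sessions.session", "fields": {}}\n'
        '{"pk": "88bea72c02274f3c9bf1cb2bb8cee4fc", "model": "sessions.session", "fields": {}}\n'
    )


    def iter_records(lines):
        """Yield one decoded object per non-blank line, one line at a time."""
        for line in lines:
            if line.strip():
                yield json.loads(line)


    count = 0
    for record in iter_records(stream):
        # Each record is handled as soon as its line has been read.
        count += 1
        print(record["model"], record["pk"])

Because each line is a complete JSON document, a single malformed line can be
reported without discarding the rest of the file.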

YAML
----

YAML serialization looks quite similar to JSON. The object list is serialized
as a sequence of mappings with the keys "pk", "model" and "fields". The value of
"fields" is again a mapping, with each key being the name of a field and each
value being that field's value::

    - model: sessions.session
      pk: 4b678b301dfd8a4e0dad910de3ae245b
      fields:
        expire_date: 2013-01-16 08:16:59.844560+00:00

Referential fields are again represented by the PK or sequence of PKs.

.. _topics-serialization-natural-keys:

Natural keys
============

The default serialization strategy for foreign keys and many-to-many relations
is to serialize the value of the primary key(s) of the objects in the relation.
This strategy works well for most objects, but it can cause difficulty in some
circumstances.

Consider the case of a list of objects that have a foreign key referencing
:class:`~django.contrib.contenttypes.models.ContentType`. If you're going to
serialize an object that refers to a content type, then you need to have a way
to refer to that content type to begin with. Since ``ContentType`` objects are
automatically created by Django during the database synchronization process,
the primary key of a given content type isn't easy to predict; it will
depend on how and when :djadmin:`migrate` was executed. This is true for all
models which automatically generate objects, notably including
:class:`~django.contrib.auth.models.Permission`,
:class:`~django.contrib.auth.models.Group`, and
:class:`~django.contrib.auth.models.User`.

.. warning::

    You should never include automatically generated objects in a fixture or
    other serialized data. By chance, the primary keys in the fixture
    may match those in the database and loading the fixture will
    have no effect. In the more likely case that they don't match, the fixture
    loading will fail with an :class:`~django.db.IntegrityError`.

There is also the matter of convenience. An integer id isn't always
the most convenient way to refer to an object; sometimes, a
more natural reference would be helpful.

It is for these reasons that Django provides *natural keys*. A natural
key is a tuple of values that can be used to uniquely identify an
object instance without using the primary key value.

Deserialization of natural keys
-------------------------------

Consider the following two models::

    from django.db import models


    class Person(models.Model):
        first_name = models.CharField(max_length=100)
        last_name = models.CharField(max_length=100)

        birthdate = models.DateField()

        class Meta:
            unique_together = [['first_name', 'last_name']]


    class Book(models.Model):
        name = models.CharField(max_length=100)
        author = models.ForeignKey(Person, on_delete=models.CASCADE)

Ordinarily, serialized data for ``Book`` would use an integer to refer to
the author. For example, in JSON, a ``Book`` might be serialized as::

    ...
    {
        "pk": 1,
        "model": "store.book",
        "fields": {
            "name": "Mostly Harmless",
            "author": 42
        }
    }
    ...

This isn't a particularly natural way to refer to an author. It
requires that you know the primary key value for the author; it also
requires that this primary key value is stable and predictable.

However, if we add natural key handling to ``Person``, the fixture becomes
much more humane. To add natural key handling, you define a default
Manager for ``Person`` with a ``get_by_natural_key()`` method. In the case
of a ``Person``, a good natural key might be the pair of first and last
name::

    from django.db import models


    class PersonManager(models.Manager):
        def get_by_natural_key(self, first_name, last_name):
            return self.get(first_name=first_name, last_name=last_name)


    class Person(models.Model):
        first_name = models.CharField(max_length=100)
        last_name = models.CharField(max_length=100)
        birthdate = models.DateField()

        objects = PersonManager()

        class Meta:
            unique_together = [['first_name', 'last_name']]

Now books can use that natural key to refer to ``Person`` objects::

    ...
    {
        "pk": 1,
        "model": "store.book",
        "fields": {
            "name": "Mostly Harmless",
            "author": ["Douglas", "Adams"]
        }
    }
    ...

When you try to load this serialized data, Django will use the
``get_by_natural_key()`` method to resolve ``["Douglas", "Adams"]``
into the primary key of an actual ``Person`` object.
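
You can picture the resolution step with a plain-Python sketch. The natural key
list is unpacked into ``get_by_natural_key()``-style arguments and mapped to a
primary key. An in-memory dictionary stands in for the database table, and
``PEOPLE``, ``get_by_natural_key()`` and ``resolve_author()`` here are
hypothetical, not Django's code::

    # (first_name, last_name) -> primary key, standing in for the Person table.
    PEOPLE = {("Douglas", "Adams"): 42}


    def get_by_natural_key(first_name, last_name):
        # Mirrors PersonManager.get_by_natural_key(), but returns only the PK.
        return PEOPLE[(first_name, last_name)]


    def resolve_author(fields):
        value = fields["author"]
        # A list or tuple is treated as a natural key; anything else as a raw PK.
        if isinstance(value, (list, tuple)):
            return get_by_natural_key(*value)
        return value


    print(resolve_author({"name": "Mostly Harmless", "author": ["Douglas", "Adams"]}))
    # 42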

.. note::

    Whatever fields you use for a natural key must be able to uniquely
    identify an object. This will usually mean that your model will
    have a uniqueness clause (either ``unique=True`` on a single field, or
    ``unique_together`` over multiple fields) for the field or fields
    in your natural key. However, uniqueness doesn't need to be
    enforced at the database level. If you are certain that a set of
    fields will be effectively unique, you can still use those fields
    as a natural key.

Deserialization of objects with no primary key will always check whether the
model's manager has a ``get_by_natural_key()`` method and if so, use it to
populate the deserialized object's primary key.

Serialization of natural keys
-----------------------------

So how do you get Django to emit a natural key when serializing an object?
First, you need to add another method, this time to the model itself::

    class Person(models.Model):
        first_name = models.CharField(max_length=100)
        last_name = models.CharField(max_length=100)
        birthdate = models.DateField()

        objects = PersonManager()

        class Meta:
            unique_together = [['first_name', 'last_name']]

        def natural_key(self):
            return (self.first_name, self.last_name)

That method should always return a natural key tuple; in this
example, ``(first_name, last_name)``. Then, when you call
``serializers.serialize()``, you provide ``use_natural_foreign_keys=True``
or ``use_natural_primary_keys=True`` arguments::

    >>> serializers.serialize('json', [book1, book2], indent=2,
    ...      use_natural_foreign_keys=True, use_natural_primary_keys=True)

When ``use_natural_foreign_keys=True`` is specified, Django will use the
``natural_key()`` method to serialize any foreign key reference to objects
of the type that defines the method.
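
In plain-Python terms, the substitution looks roughly like this. The
``Person`` and ``Book`` classes below are simple dataclasses standing in for
the models above, and ``serialize_book()`` is a hypothetical helper, not
Django code::

    from dataclasses import dataclass


    @dataclass
    class Person:
        pk: int
        first_name: str
        last_name: str

        def natural_key(self):
            return (self.first_name, self.last_name)


    @dataclass
    class Book:
        pk: int
        name: str
        author: Person


    def serialize_book(book, use_natural_foreign_keys=False):
        author = book.author
        if use_natural_foreign_keys and hasattr(author, "natural_key"):
            # Emit the natural key (as a JSON-friendly list) instead of the PK.
            author_ref = list(author.natural_key())
        else:
            author_ref = author.pk
        return {"model": "store.book", "pk": book.pk,
                "fields": {"name": book.name, "author": author_ref}}


    book = Book(1, "Mostly Harmless", Person(42, "Douglas", "Adams"))
    print(serialize_book(book)["fields"]["author"])  # 42
    print(serialize_book(book, use_natural_foreign_keys=True)["fields"]["author"])  # ['Douglas', 'Adams']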

When ``use_natural_primary_keys=True`` is specified, Django will not provide the
primary key in the serialized data of this object since it can be calculated
during deserialization::

    ...
    {
        "model": "store.person",
        "fields": {
            "first_name": "Douglas",
            "last_name": "Adams",
            "birthdate": "1952-03-11"
        }
    }
    ...

This is useful when you need to load serialized data into an existing
database, cannot guarantee that the serialized primary key values aren't
already in use, and don't need the deserialized objects to keep the same
primary keys.

If you are using :djadmin:`dumpdata` to generate serialized data, use the
:option:`dumpdata --natural-foreign` and :option:`dumpdata --natural-primary`
command line flags to generate natural keys.

.. note::

    You don't need to define both ``natural_key()`` and
    ``get_by_natural_key()``. If you don't want Django to output
    natural keys during serialization, but you want to retain the
    ability to load natural keys, then you can opt to not implement
    the ``natural_key()`` method.

    Conversely, if (for some strange reason) you want Django to output
    natural keys during serialization, but *not* be able to load those
    key values, just don't define the ``get_by_natural_key()`` method.

.. _natural-keys-and-forward-references:

Natural keys and forward references
-----------------------------------

Sometimes when you use :ref:`natural foreign keys
<topics-serialization-natural-keys>` you'll need to deserialize data where
an object has a foreign key referencing another object that hasn't yet been
deserialized. This is called a "forward reference".

For instance, suppose you have the following objects in your fixture::

    ...
    {
        "model": "store.book",
        "fields": {
            "name": "Mostly Harmless",
            "author": ["Douglas", "Adams"]
        }
    },
    ...
    {
        "model": "store.person",
        "fields": {
            "first_name": "Douglas",
            "last_name": "Adams"
        }
    },
    ...

In order to handle this situation, you need to pass
``handle_forward_references=True`` to ``serializers.deserialize()``. This will
set the ``deferred_fields`` attribute on the ``DeserializedObject`` instances.
You'll need to keep track of ``DeserializedObject`` instances where this
attribute isn't ``None`` and later call ``save_deferred_fields()`` on them.

Typical usage looks like this::

    objs_with_deferred_fields = []

    for obj in serializers.deserialize('xml', data, handle_forward_references=True):
        obj.save()
        if obj.deferred_fields is not None:
            objs_with_deferred_fields.append(obj)

    for obj in objs_with_deferred_fields:
        obj.save_deferred_fields()

For this to work, the ``ForeignKey`` on the referencing model must have
``null=True``.
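
You can illustrate the two-pass idea behind ``handle_forward_references``
without Django. In this sketch, records whose natural key cannot be resolved
yet are stored with ``None`` for the foreign key (which is why the field must
be nullable), then patched once every record has been loaded. ``load()`` and
its dictionaries are hypothetical and much simpler than Django's
implementation::

    def load(records):
        people = {}    # natural key -> pk, filled as Person records are stored
        books = {}     # pk -> stored book row
        deferred = []  # (book pk, natural key) pairs to resolve later
        next_pk = 1    # one counter hands out PKs for every model, for simplicity
        for record in records:
            fields = record["fields"]
            if record["model"] == "store.person":
                people[(fields["first_name"], fields["last_name"])] = next_pk
            else:
                key = tuple(fields["author"])
                author_pk = people.get(key)
                if author_pk is None:
                    # Forward reference: store a null FK now and fix it later.
                    deferred.append((next_pk, key))
                books[next_pk] = {"name": fields["name"], "author": author_pk}
            next_pk += 1
        # Second pass: every Person now exists, so the deferred keys resolve.
        for book_pk, key in deferred:
            books[book_pk]["author"] = people[key]
        return books


    fixture = [
        {"model": "store.book", "fields": {"name": "Mostly Harmless", "author": ["Douglas", "Adams"]}},
        {"model": "store.person", "fields": {"first_name": "Douglas", "last_name": "Adams"}},
    ]
    print(load(fixture))
    # {1: {'name': 'Mostly Harmless', 'author': 2}}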

Dependencies during serialization
---------------------------------

It's often possible to avoid explicitly having to handle forward references by
taking care with the ordering of objects within a fixture.

To help with this, calls to :djadmin:`dumpdata` that use the :option:`dumpdata
--natural-foreign` option will serialize any model with a ``natural_key()``
method before serializing standard primary key objects.

However, this may not always be enough. If your natural key refers to
another object (by using a foreign key or natural key to another object
as part of a natural key), then you need to be able to ensure that
the objects on which a natural key depends occur in the serialized data
before the natural key requires them.

To control this ordering, you can define dependencies on your
``natural_key()`` methods. You do this by setting a ``dependencies``
attribute on the ``natural_key()`` method itself.

For example, let's add a natural key to the ``Book`` model from the
example above::

    class Book(models.Model):
        name = models.CharField(max_length=100)
        author = models.ForeignKey(Person, on_delete=models.CASCADE)

        def natural_key(self):
            return (self.name,) + self.author.natural_key()

The natural key for a ``Book`` is a combination of its name and its
author. This means that ``Person`` must be serialized before ``Book``.
To define this dependency, we add one extra line::

    def natural_key(self):
        return (self.name,) + self.author.natural_key()
    natural_key.dependencies = ['example_app.person']

This definition ensures that all ``Person`` objects are serialized before
any ``Book`` objects. In turn, any object referencing ``Book`` will be
serialized after both ``Person`` and ``Book`` have been serialized.
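
Conceptually, ``dependencies`` turns model ordering into a topological sort.
The sketch below uses :mod:`graphlib` (Python 3.9+) to order model labels so
that every model comes after the models its natural key depends on. It only
illustrates the idea, since Django has its own sorting logic. The
``DEPENDENCIES`` mapping and the ``example_app.review`` model are
hypothetical::

    from graphlib import TopologicalSorter

    # model label -> labels that must be serialized first (natural_key.dependencies)
    DEPENDENCIES = {
        "example_app.book": {"example_app.person"},
        "example_app.review": {"example_app.book"},  # hypothetical model referencing Book
        "example_app.person": set(),
    }

    order = list(TopologicalSorter(DEPENDENCIES).static_order())
    print(order)
    # ['example_app.person', 'example_app.book', 'example_app.review']

A circular dependency would make ``static_order()`` raise
``graphlib.CycleError``, which is also why natural keys that depend on each
other cannot be ordered this way.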