mirror of
https://github.com/django/django.git
synced 2025-01-25 17:49:52 +00:00
64aba7a8ab
Thanks Pablo Oubiña for the report.
366 lines
14 KiB
Plaintext
366 lines
14 KiB
Plaintext
============================
|
|
Database access optimization
|
|
============================
|
|
|
|
Django's database layer provides various ways to help developers get the most
|
|
out of their databases. This document gathers together links to the relevant
|
|
documentation, and adds various tips, organized under a number of headings that
|
|
outline the steps to take when attempting to optimize your database usage.
|
|
|
|
Profile first
|
|
=============
|
|
|
|
As general programming practice, this goes without saying. Find out :ref:`what
|
|
queries you are doing and what they are costing you
|
|
<faq-see-raw-sql-queries>`. You may also want to use an external project like
|
|
django-debug-toolbar_, or a tool that monitors your database directly.
|
|
|
|
Remember that you may be optimizing for speed or memory or both, depending on
|
|
your requirements. Sometimes optimizing for one will be detrimental to the
|
|
other, but sometimes they will help each other. Also, work that is done by the
|
|
database process might not have the same cost (to you) as the same amount of
|
|
work done in your Python process. It is up to you to decide what your
|
|
priorities are, where the balance must lie, and profile all of these as required
|
|
since this will depend on your application and server.
|
|
|
|
With everything that follows, remember to profile after every change to ensure
|
|
that the change is a benefit, and a big enough benefit given the decrease in
|
|
readability of your code. **All** of the suggestions below come with the caveat
|
|
that in your circumstances the general principle might not apply, or might even
|
|
be reversed.
|
|
|
|
.. _django-debug-toolbar: https://github.com/django-debug-toolbar/django-debug-toolbar/
|
|
|
|
Use standard DB optimization techniques
|
|
=======================================
|
|
|
|
...including:
|
|
|
|
* Indexes_. This is a number one priority, *after* you have determined from
|
|
profiling what indexes should be added. Use
|
|
:attr:`Field.db_index <django.db.models.Field.db_index>` or
|
|
:attr:`Meta.index_together <django.db.models.Options.index_together>` to add
|
|
these from Django. Consider adding indexes to fields that you frequently
|
|
query using :meth:`~django.db.models.query.QuerySet.filter()`,
|
|
:meth:`~django.db.models.query.QuerySet.exclude()`,
|
|
:meth:`~django.db.models.query.QuerySet.order_by()`, etc. as indexes may help
|
|
to speed up lookups. Note that determining the best indexes is a complex
|
|
database-dependent topic that will depend on your particular application.
|
|
The overhead of maintaining an index may outweigh any gains in query speed.
|
|
|
|
.. _Indexes: https://en.wikipedia.org/wiki/Database_index
|
|
|
|
* Appropriate use of field types.
|
|
|
|
We will assume you have done the obvious things above. The rest of this document
|
|
focuses on how to use Django in such a way that you are not doing unnecessary
|
|
work. This document also does not address other optimization techniques that
|
|
apply to all expensive operations, such as :doc:`general purpose caching
|
|
</topics/cache>`.
|
|
|
|
Understand ``QuerySet``\s
|
|
=========================
|
|
|
|
Understanding :doc:`QuerySets </ref/models/querysets>` is vital to getting good
|
|
performance with simple code. In particular:
|
|
|
|
Understand ``QuerySet`` evaluation
|
|
----------------------------------
|
|
|
|
To avoid performance problems, it is important to understand:
|
|
|
|
* that :ref:`QuerySets are lazy <querysets-are-lazy>`.
|
|
|
|
* when :ref:`they are evaluated <when-querysets-are-evaluated>`.
|
|
|
|
* how :ref:`the data is held in memory <caching-and-querysets>`.
|
|
|
|
Understand cached attributes
|
|
----------------------------
|
|
|
|
As well as caching of the whole ``QuerySet``, there is caching of the result of
|
|
attributes on ORM objects. In general, attributes that are not callable will be
|
|
cached. For example, assuming the :ref:`example Weblog models
|
|
<queryset-model-example>`::
|
|
|
|
>>> entry = Entry.objects.get(id=1)
|
|
>>> entry.blog # Blog object is retrieved at this point
|
|
>>> entry.blog # cached version, no DB access
|
|
|
|
But in general, callable attributes cause DB lookups every time::
|
|
|
|
>>> entry = Entry.objects.get(id=1)
|
|
>>> entry.authors.all() # query performed
|
|
>>> entry.authors.all() # query performed again
|
|
|
|
Be careful when reading template code - the template system does not allow use
|
|
of parentheses, but will call callables automatically, hiding the above
|
|
distinction.
|
|
|
|
Be careful with your own custom properties - it is up to you to implement
|
|
caching when required, for example using the
|
|
:class:`~django.utils.functional.cached_property` decorator.
|
|
|
|
Use the ``with`` template tag
|
|
-----------------------------
|
|
|
|
To make use of the caching behavior of ``QuerySet``, you may need to use the
|
|
:ttag:`with` template tag.
|
|
|
|
Use ``iterator()``
|
|
------------------
|
|
|
|
When you have a lot of objects, the caching behavior of the ``QuerySet`` can
|
|
cause a large amount of memory to be used. In this case,
|
|
:meth:`~django.db.models.query.QuerySet.iterator()` may help.
|
|
|
|
Do database work in the database rather than in Python
|
|
======================================================
|
|
|
|
For instance:
|
|
|
|
* At the most basic level, use :ref:`filter and exclude <queryset-api>` to do
|
|
filtering in the database.
|
|
|
|
* Use :class:`F expressions <django.db.models.F>` to filter
|
|
based on other fields within the same model.
|
|
|
|
* Use :doc:`annotate to do aggregation in the database
|
|
</topics/db/aggregation>`.
|
|
|
|
If these aren't enough to generate the SQL you need:
|
|
|
|
Use ``RawSQL``
|
|
--------------
|
|
|
|
A less portable but more powerful method is the
|
|
:class:`~django.db.models.expressions.RawSQL` expression, which allows some SQL
|
|
to be explicitly added to the query. If that still isn't powerful enough:
|
|
|
|
Use raw SQL
|
|
-----------
|
|
|
|
Write your own :doc:`custom SQL to retrieve data or populate models
|
|
</topics/db/sql>`. Use ``django.db.connection.queries`` to find out what Django
|
|
is writing for you and start from there.
|
|
|
|
Retrieve individual objects using a unique, indexed column
|
|
==========================================================
|
|
|
|
There are two reasons to use a column with
|
|
:attr:`~django.db.models.Field.unique` or
|
|
:attr:`~django.db.models.Field.db_index` when using
|
|
:meth:`~django.db.models.query.QuerySet.get` to retrieve individual objects.
|
|
First, the query will be quicker because of the underlying database index.
|
|
Also, the query could run much slower if multiple objects match the lookup;
|
|
having a unique constraint on the column guarantees this will never happen.
|
|
|
|
So using the :ref:`example Weblog models <queryset-model-example>`::
|
|
|
|
>>> entry = Entry.objects.get(id=10)
|
|
|
|
will be quicker than:
|
|
|
|
>>> entry = Entry.objects.get(headline="News Item Title")
|
|
|
|
because ``id`` is indexed by the database and is guaranteed to be unique.
|
|
|
|
Doing the following is potentially quite slow:
|
|
|
|
>>> entry = Entry.objects.get(headline__startswith="News")
|
|
|
|
First of all, ``headline`` is not indexed, which will make the underlying
|
|
database fetch slower.
|
|
|
|
Second, the lookup doesn't guarantee that only one object will be returned.
|
|
If the query matches more than one object, it will retrieve and transfer all of
|
|
them from the database. This penalty could be substantial if hundreds or
|
|
thousands of records are returned. The penalty will be compounded if the
|
|
database lives on a separate server, where network overhead and latency also
|
|
play a factor.
|
|
|
|
Retrieve everything at once if you know you will need it
|
|
========================================================
|
|
|
|
Hitting the database multiple times for different parts of a single 'set' of
|
|
data that you will need all parts of is, in general, less efficient than
|
|
retrieving it all in one query. This is particularly important if you have a
|
|
query that is executed in a loop, and could therefore end up doing many database
|
|
queries, when only one was needed. So:
|
|
|
|
Use ``QuerySet.select_related()`` and ``prefetch_related()``
|
|
------------------------------------------------------------
|
|
|
|
Understand :meth:`~django.db.models.query.QuerySet.select_related` and
|
|
:meth:`~django.db.models.query.QuerySet.prefetch_related` thoroughly, and use
|
|
them:
|
|
|
|
* in view code,
|
|
|
|
* and in :doc:`managers and default managers </topics/db/managers>` where
|
|
appropriate. Be aware when your manager is and is not used; sometimes this is
|
|
tricky so don't make assumptions.
|
|
|
|
Don't retrieve things you don't need
|
|
====================================
|
|
|
|
Use ``QuerySet.values()`` and ``values_list()``
|
|
-----------------------------------------------
|
|
|
|
When you just want a ``dict`` or ``list`` of values, and don't need ORM model
|
|
objects, make appropriate usage of
|
|
:meth:`~django.db.models.query.QuerySet.values()`.
|
|
These can be useful for replacing model objects in template code - as long as
|
|
the dicts you supply have the same attributes as those used in the template,
|
|
you are fine.
|
|
|
|
Use ``QuerySet.defer()`` and ``only()``
|
|
---------------------------------------
|
|
|
|
Use :meth:`~django.db.models.query.QuerySet.defer()` and
|
|
:meth:`~django.db.models.query.QuerySet.only()` if there are database columns
|
|
you know that you won't need (or won't need in most cases) to avoid loading
|
|
them. Note that if you *do* use them, the ORM will have to go and get them in
|
|
a separate query, making this a pessimization if you use it inappropriately.
|
|
|
|
Also, be aware that there is some (small extra) overhead incurred inside
|
|
Django when constructing a model with deferred fields. Don't be too aggressive
|
|
in deferring fields without profiling as the database has to read most of the
|
|
non-text, non-VARCHAR data from the disk for a single row in the results, even
|
|
if it ends up only using a few columns. The ``defer()`` and ``only()`` methods
|
|
are most useful when you can avoid loading a lot of text data or for fields
|
|
that might take a lot of processing to convert back to Python. As always,
|
|
profile first, then optimize.
|
|
|
|
Use ``QuerySet.count()``
|
|
------------------------
|
|
|
|
...if you only want the count, rather than doing ``len(queryset)``.
|
|
|
|
Use ``QuerySet.exists()``
|
|
-------------------------
|
|
|
|
...if you only want to find out if at least one result exists, rather than ``if
|
|
queryset``.
|
|
|
|
But:
|
|
|
|
.. _overuse_of_count_and_exists:
|
|
|
|
Don't overuse ``count()`` and ``exists()``
|
|
------------------------------------------
|
|
|
|
If you are going to need other data from the QuerySet, just evaluate it.
|
|
|
|
For example, assuming an Email model that has a ``body`` attribute and a
|
|
many-to-many relation to User, the following template code is optimal:
|
|
|
|
.. code-block:: html+django
|
|
|
|
{% if display_inbox %}
|
|
{% with emails=user.emails.all %}
|
|
{% if emails %}
|
|
<p>You have {{ emails|length }} email(s)</p>
|
|
{% for email in emails %}
|
|
<p>{{ email.body }}</p>
|
|
{% endfor %}
|
|
{% else %}
|
|
<p>No messages today.</p>
|
|
{% endif %}
|
|
{% endwith %}
|
|
{% endif %}
|
|
|
|
|
|
It is optimal because:
|
|
|
|
1. Since QuerySets are lazy, this does no database queries if 'display_inbox'
|
|
is False.
|
|
|
|
#. Use of :ttag:`with` means that we store ``user.emails.all`` in a variable
|
|
for later use, allowing its cache to be re-used.
|
|
|
|
#. The line ``{% if emails %}`` causes ``QuerySet.__bool__()`` to be called,
|
|
which causes the ``user.emails.all()`` query to be run on the database, and
|
|
at the least the first line to be turned into an ORM object. If there aren't
|
|
any results, it will return False, otherwise True.
|
|
|
|
#. The use of ``{{ emails|length }}`` calls ``QuerySet.__len__()``, filling
|
|
out the rest of the cache without doing another query.
|
|
|
|
#. The :ttag:`for` loop iterates over the already filled cache.
|
|
|
|
In total, this code does either one or zero database queries. The only
|
|
deliberate optimization performed is the use of the :ttag:`with` tag. Using
|
|
``QuerySet.exists()`` or ``QuerySet.count()`` at any point would cause
|
|
additional queries.
|
|
|
|
Use ``QuerySet.update()`` and ``delete()``
|
|
------------------------------------------
|
|
|
|
Rather than retrieve a load of objects, set some values, and save them
|
|
individual, use a bulk SQL UPDATE statement, via :ref:`QuerySet.update()
|
|
<topics-db-queries-update>`. Similarly, do :ref:`bulk deletes
|
|
<topics-db-queries-delete>` where possible.
|
|
|
|
Note, however, that these bulk update methods cannot call the ``save()`` or
|
|
``delete()`` methods of individual instances, which means that any custom
|
|
behavior you have added for these methods will not be executed, including
|
|
anything driven from the normal database object :doc:`signals </ref/signals>`.
|
|
|
|
Use foreign key values directly
|
|
-------------------------------
|
|
|
|
If you only need a foreign key value, use the foreign key value that is already on
|
|
the object you've got, rather than getting the whole related object and taking
|
|
its primary key. i.e. do::
|
|
|
|
entry.blog_id
|
|
|
|
instead of::
|
|
|
|
entry.blog.id
|
|
|
|
Don't order results if you don't care
|
|
-------------------------------------
|
|
|
|
Ordering is not free; each field to order by is an operation the database must
|
|
perform. If a model has a default ordering (:attr:`Meta.ordering
|
|
<django.db.models.Options.ordering>`) and you don't need it, remove
|
|
it on a ``QuerySet`` by calling
|
|
:meth:`~django.db.models.query.QuerySet.order_by()` with no parameters.
|
|
|
|
Adding an index to your database may help to improve ordering performance.
|
|
|
|
Insert in bulk
|
|
==============
|
|
|
|
When creating objects, where possible, use the
|
|
:meth:`~django.db.models.query.QuerySet.bulk_create()` method to reduce the
|
|
number of SQL queries. For example::
|
|
|
|
Entry.objects.bulk_create([
|
|
Entry(headline="Python 3.0 Released"),
|
|
Entry(headline="Python 3.1 Planned")
|
|
])
|
|
|
|
...is preferable to::
|
|
|
|
Entry.objects.create(headline="Python 3.0 Released")
|
|
Entry.objects.create(headline="Python 3.1 Planned")
|
|
|
|
Note that there are a number of :meth:`caveats to this method
|
|
<django.db.models.query.QuerySet.bulk_create>`, so make sure it's appropriate
|
|
for your use case.
|
|
|
|
This also applies to :class:`ManyToManyFields
|
|
<django.db.models.ManyToManyField>`, so doing::
|
|
|
|
my_band.members.add(me, my_friend)
|
|
|
|
...is preferable to::
|
|
|
|
my_band.members.add(me)
|
|
my_band.members.add(my_friend)
|
|
|
|
...where ``Bands`` and ``Artists`` have a many-to-many relationship.
|