mirror of
https://github.com/django/django.git
synced 2024-12-30 13:05:45 +00:00
467 lines
17 KiB
Plaintext
467 lines
17 KiB
Plaintext
============================
|
|
Database access optimization
|
|
============================
|
|
|
|
Django's database layer provides various ways to help developers get the most
|
|
out of their databases. This document gathers together links to the relevant
|
|
documentation, and adds various tips, organized under a number of headings that
|
|
outline the steps to take when attempting to optimize your database usage.
|
|
|
|
Profile first
|
|
=============
|
|
|
|
As general programming practice, this goes without saying. Find out :ref:`what
|
|
queries you are doing and what they are costing you <faq-see-raw-sql-queries>`.
|
|
Use :meth:`.QuerySet.explain` to understand how specific ``QuerySet``\s are
|
|
executed by your database. You may also want to use an external project like
|
|
django-debug-toolbar_, or a tool that monitors your database directly.
|
|
|
|
Remember that you may be optimizing for speed or memory or both, depending on
|
|
your requirements. Sometimes optimizing for one will be detrimental to the
|
|
other, but sometimes they will help each other. Also, work that is done by the
|
|
database process might not have the same cost (to you) as the same amount of
|
|
work done in your Python process. It is up to you to decide what your
|
|
priorities are, where the balance must lie, and profile all of these as required
|
|
since this will depend on your application and server.
|
|
|
|
With everything that follows, remember to profile after every change to ensure
|
|
that the change is a benefit, and a big enough benefit given the decrease in
|
|
readability of your code. **All** of the suggestions below come with the caveat
|
|
that in your circumstances the general principle might not apply, or might even
|
|
be reversed.
|
|
|
|
.. _django-debug-toolbar: https://github.com/jazzband/django-debug-toolbar/
|
|
|
|
Use standard DB optimization techniques
|
|
=======================================
|
|
|
|
...including:
|
|
|
|
* Indexes_. This is a number one priority, *after* you have determined from
|
|
profiling what indexes should be added. Use
|
|
:attr:`Meta.indexes <django.db.models.Options.indexes>` or
|
|
:attr:`Field.db_index <django.db.models.Field.db_index>` to add these from
|
|
Django. Consider adding indexes to fields that you frequently query using
|
|
:meth:`~django.db.models.query.QuerySet.filter()`,
|
|
:meth:`~django.db.models.query.QuerySet.exclude()`,
|
|
:meth:`~django.db.models.query.QuerySet.order_by()`, etc. as indexes may help
|
|
to speed up lookups. Note that determining the best indexes is a complex
|
|
database-dependent topic that will depend on your particular application.
|
|
The overhead of maintaining an index may outweigh any gains in query speed.
|
|
|
|
.. _Indexes: https://en.wikipedia.org/wiki/Database_index
|
|
|
|
* Appropriate use of field types.
|
|
|
|
We will assume you have done the things listed above. The rest of this document
|
|
focuses on how to use Django in such a way that you are not doing unnecessary
|
|
work. This document also does not address other optimization techniques that
|
|
apply to all expensive operations, such as :doc:`general purpose caching
|
|
</topics/cache>`.
|
|
|
|
Understand ``QuerySet``\s
|
|
=========================
|
|
|
|
Understanding :doc:`QuerySets </ref/models/querysets>` is vital to getting good
|
|
performance with simple code. In particular:
|
|
|
|
Understand ``QuerySet`` evaluation
|
|
----------------------------------
|
|
|
|
To avoid performance problems, it is important to understand:
|
|
|
|
* that :ref:`QuerySets are lazy <querysets-are-lazy>`.
|
|
|
|
* when :ref:`they are evaluated <when-querysets-are-evaluated>`.
|
|
|
|
* how :ref:`the data is held in memory <caching-and-querysets>`.
|
|
|
|
Understand cached attributes
|
|
----------------------------
|
|
|
|
As well as caching of the whole ``QuerySet``, there is caching of the result of
|
|
attributes on ORM objects. In general, attributes that are not callable will be
|
|
cached. For example, assuming the :ref:`example Weblog models
|
|
<queryset-model-example>`::
|
|
|
|
>>> entry = Entry.objects.get(id=1)
|
|
>>> entry.blog # Blog object is retrieved at this point
|
|
>>> entry.blog # cached version, no DB access
|
|
|
|
But in general, callable attributes cause DB lookups every time::
|
|
|
|
>>> entry = Entry.objects.get(id=1)
|
|
>>> entry.authors.all() # query performed
|
|
>>> entry.authors.all() # query performed again
|
|
|
|
Be careful when reading template code - the template system does not allow use
|
|
of parentheses, but will call callables automatically, hiding the above
|
|
distinction.
|
|
|
|
Be careful with your own custom properties - it is up to you to implement
|
|
caching when required, for example using the
|
|
:class:`~django.utils.functional.cached_property` decorator.
|
|
|
|
Use the ``with`` template tag
|
|
-----------------------------
|
|
|
|
To make use of the caching behavior of ``QuerySet``, you may need to use the
|
|
:ttag:`with` template tag.
|
|
|
|
Use ``iterator()``
|
|
------------------
|
|
|
|
When you have a lot of objects, the caching behavior of the ``QuerySet`` can
|
|
cause a large amount of memory to be used. In this case,
|
|
:meth:`~django.db.models.query.QuerySet.iterator()` may help.
|
|
|
|
Use ``explain()``
|
|
-----------------
|
|
|
|
:meth:`.QuerySet.explain` gives you detailed information about how the database
|
|
executes a query, including indexes and joins that are used. These details may
|
|
help you find queries that could be rewritten more efficiently, or identify
|
|
indexes that could be added to improve performance.
|
|
|
|
Do database work in the database rather than in Python
|
|
======================================================
|
|
|
|
For instance:
|
|
|
|
* At the most basic level, use :ref:`filter and exclude <queryset-api>` to do
|
|
filtering in the database.
|
|
|
|
* Use :class:`F expressions <django.db.models.F>` to filter
|
|
based on other fields within the same model.
|
|
|
|
* Use :doc:`annotate to do aggregation in the database
|
|
</topics/db/aggregation>`.
|
|
|
|
If these aren't enough to generate the SQL you need:
|
|
|
|
Use ``RawSQL``
|
|
--------------
|
|
|
|
A less portable but more powerful method is the
|
|
:class:`~django.db.models.expressions.RawSQL` expression, which allows some SQL
|
|
to be explicitly added to the query. If that still isn't powerful enough:
|
|
|
|
Use raw SQL
|
|
-----------
|
|
|
|
Write your own :doc:`custom SQL to retrieve data or populate models
|
|
</topics/db/sql>`. Use ``django.db.connection.queries`` to find out what Django
|
|
is writing for you and start from there.
|
|
|
|
Retrieve individual objects using a unique, indexed column
|
|
==========================================================
|
|
|
|
There are two reasons to use a column with
|
|
:attr:`~django.db.models.Field.unique` or
|
|
:attr:`~django.db.models.Field.db_index` when using
|
|
:meth:`~django.db.models.query.QuerySet.get` to retrieve individual objects.
|
|
First, the query will be quicker because of the underlying database index.
|
|
Also, the query could run much slower if multiple objects match the lookup;
|
|
having a unique constraint on the column guarantees this will never happen.
|
|
|
|
So using the :ref:`example Weblog models <queryset-model-example>`::
|
|
|
|
>>> entry = Entry.objects.get(id=10)
|
|
|
|
will be quicker than:
|
|
|
|
>>> entry = Entry.objects.get(headline="News Item Title")
|
|
|
|
because ``id`` is indexed by the database and is guaranteed to be unique.
|
|
|
|
Doing the following is potentially quite slow:
|
|
|
|
>>> entry = Entry.objects.get(headline__startswith="News")
|
|
|
|
First of all, ``headline`` is not indexed, which will make the underlying
|
|
database fetch slower.
|
|
|
|
Second, the lookup doesn't guarantee that only one object will be returned.
|
|
If the query matches more than one object, it will retrieve and transfer all of
|
|
them from the database. This penalty could be substantial if hundreds or
|
|
thousands of records are returned. The penalty will be compounded if the
|
|
database lives on a separate server, where network overhead and latency also
|
|
play a factor.
|
|
|
|
Retrieve everything at once if you know you will need it
|
|
========================================================
|
|
|
|
Hitting the database multiple times for different parts of a single 'set' of
|
|
data that you will need all parts of is, in general, less efficient than
|
|
retrieving it all in one query. This is particularly important if you have a
|
|
query that is executed in a loop, and could therefore end up doing many database
|
|
queries, when only one was needed. So:
|
|
|
|
Use ``QuerySet.select_related()`` and ``prefetch_related()``
|
|
------------------------------------------------------------
|
|
|
|
Understand :meth:`~django.db.models.query.QuerySet.select_related` and
|
|
:meth:`~django.db.models.query.QuerySet.prefetch_related` thoroughly, and use
|
|
them:
|
|
|
|
* in :doc:`managers and default managers </topics/db/managers>` where
|
|
appropriate. Be aware when your manager is and is not used; sometimes this is
|
|
tricky so don't make assumptions.
|
|
|
|
* in view code or other layers, possibly making use of
|
|
:func:`~django.db.models.prefetch_related_objects` where needed.
|
|
|
|
Don't retrieve things you don't need
|
|
====================================
|
|
|
|
Use ``QuerySet.values()`` and ``values_list()``
|
|
-----------------------------------------------
|
|
|
|
When you only want a ``dict`` or ``list`` of values, and don't need ORM model
|
|
objects, make appropriate usage of
|
|
:meth:`~django.db.models.query.QuerySet.values()`.
|
|
These can be useful for replacing model objects in template code - as long as
|
|
the dicts you supply have the same attributes as those used in the template,
|
|
you are fine.
|
|
|
|
Use ``QuerySet.defer()`` and ``only()``
|
|
---------------------------------------
|
|
|
|
Use :meth:`~django.db.models.query.QuerySet.defer()` and
|
|
:meth:`~django.db.models.query.QuerySet.only()` if there are database columns
|
|
you know that you won't need (or won't need in most cases) to avoid loading
|
|
them. Note that if you *do* use them, the ORM will have to go and get them in
|
|
a separate query, making this a pessimization if you use it inappropriately.
|
|
|
|
Don't be too aggressive in deferring fields without profiling as the database
|
|
has to read most of the non-text, non-VARCHAR data from the disk for a single
|
|
row in the results, even if it ends up only using a few columns. The
|
|
``defer()`` and ``only()`` methods are most useful when you can avoid loading a
|
|
lot of text data or for fields that might take a lot of processing to convert
|
|
back to Python. As always, profile first, then optimize.
|
|
|
|
Use ``QuerySet.count()``
|
|
------------------------
|
|
|
|
...if you only want the count, rather than doing ``len(queryset)``.
|
|
|
|
Use ``QuerySet.exists()``
|
|
-------------------------
|
|
|
|
...if you only want to find out if at least one result exists, rather than ``if
|
|
queryset``.
|
|
|
|
But:
|
|
|
|
.. _overuse_of_count_and_exists:
|
|
|
|
Don't overuse ``count()`` and ``exists()``
|
|
------------------------------------------
|
|
|
|
If you are going to need other data from the QuerySet, evaluate it immediately.
|
|
|
|
For example, assuming an Email model that has a ``subject`` attribute and a
|
|
many-to-many relation to User, the following code is optimal::
|
|
|
|
if display_emails:
|
|
emails = user.emails.all()
|
|
if emails:
|
|
print('You have', len(emails), 'emails:')
|
|
for email in emails:
|
|
print(email.subject)
|
|
else:
|
|
print('You do not have any emails.')
|
|
|
|
It is optimal because:
|
|
|
|
#. Since QuerySets are lazy, this does no database queries if
|
|
``display_emails`` is ``False``.
|
|
|
|
#. Storing ``user.emails.all()`` in the ``emails`` variable allows its result
|
|
cache to be re-used.
|
|
|
|
#. The line ``if emails`` causes ``QuerySet.__bool__()`` to be called, which
|
|
causes the ``user.emails.all()`` query to be run on the database. If there
|
|
aren't any results, it will return ``False``, otherwise ``True``.
|
|
|
|
#. The use of ``len(emails)`` calls ``QuerySet.__len__()``, reusing the result
|
|
cache.
|
|
|
|
#. The ``for`` loop iterates over the already filled cache.
|
|
|
|
In total, this code does either one or zero database queries. The only
|
|
deliberate optimization performed is using the ``emails`` variable. Using
|
|
``QuerySet.exists()`` for the ``if`` or ``QuerySet.count()`` for the count
|
|
would each cause additional queries.
|
|
|
|
Use ``QuerySet.update()`` and ``delete()``
|
|
------------------------------------------
|
|
|
|
Rather than retrieve a load of objects, set some values, and save them
|
|
individual, use a bulk SQL UPDATE statement, via :ref:`QuerySet.update()
|
|
<topics-db-queries-update>`. Similarly, do :ref:`bulk deletes
|
|
<topics-db-queries-delete>` where possible.
|
|
|
|
Note, however, that these bulk update methods cannot call the ``save()`` or
|
|
``delete()`` methods of individual instances, which means that any custom
|
|
behavior you have added for these methods will not be executed, including
|
|
anything driven from the normal database object :doc:`signals </ref/signals>`.
|
|
|
|
Use foreign key values directly
|
|
-------------------------------
|
|
|
|
If you only need a foreign key value, use the foreign key value that is already on
|
|
the object you've got, rather than getting the whole related object and taking
|
|
its primary key. i.e. do::
|
|
|
|
entry.blog_id
|
|
|
|
instead of::
|
|
|
|
entry.blog.id
|
|
|
|
Don't order results if you don't care
|
|
-------------------------------------
|
|
|
|
Ordering is not free; each field to order by is an operation the database must
|
|
perform. If a model has a default ordering (:attr:`Meta.ordering
|
|
<django.db.models.Options.ordering>`) and you don't need it, remove
|
|
it on a ``QuerySet`` by calling
|
|
:meth:`~django.db.models.query.QuerySet.order_by()` with no parameters.
|
|
|
|
Adding an index to your database may help to improve ordering performance.
|
|
|
|
Use bulk methods
|
|
================
|
|
|
|
Use bulk methods to reduce the number of SQL statements.
|
|
|
|
Create in bulk
|
|
--------------
|
|
|
|
When creating objects, where possible, use the
|
|
:meth:`~django.db.models.query.QuerySet.bulk_create()` method to reduce the
|
|
number of SQL queries. For example::
|
|
|
|
Entry.objects.bulk_create([
|
|
Entry(headline='This is a test'),
|
|
Entry(headline='This is only a test'),
|
|
])
|
|
|
|
...is preferable to::
|
|
|
|
Entry.objects.create(headline='This is a test')
|
|
Entry.objects.create(headline='This is only a test')
|
|
|
|
Note that there are a number of :meth:`caveats to this method
|
|
<django.db.models.query.QuerySet.bulk_create>`, so make sure it's appropriate
|
|
for your use case.
|
|
|
|
Update in bulk
|
|
--------------
|
|
|
|
When updating objects, where possible, use the
|
|
:meth:`~django.db.models.query.QuerySet.bulk_update()` method to reduce the
|
|
number of SQL queries. Given a list or queryset of objects::
|
|
|
|
entries = Entry.objects.bulk_create([
|
|
Entry(headline='This is a test'),
|
|
Entry(headline='This is only a test'),
|
|
])
|
|
|
|
The following example::
|
|
|
|
entries[0].headline = 'This is not a test'
|
|
entries[1].headline = 'This is no longer a test'
|
|
Entry.objects.bulk_update(entries, ['headline'])
|
|
|
|
...is preferable to::
|
|
|
|
entries[0].headline = 'This is not a test'
|
|
entries[0].save()
|
|
entries[1].headline = 'This is no longer a test'
|
|
entries[1].save()
|
|
|
|
Note that there are a number of :meth:`caveats to this method
|
|
<django.db.models.query.QuerySet.bulk_update>`, so make sure it's appropriate
|
|
for your use case.
|
|
|
|
Insert in bulk
|
|
--------------
|
|
|
|
When inserting objects into :class:`ManyToManyFields
|
|
<django.db.models.ManyToManyField>`, use
|
|
:meth:`~django.db.models.fields.related.RelatedManager.add` with multiple
|
|
objects to reduce the number of SQL queries. For example::
|
|
|
|
my_band.members.add(me, my_friend)
|
|
|
|
...is preferable to::
|
|
|
|
my_band.members.add(me)
|
|
my_band.members.add(my_friend)
|
|
|
|
...where ``Bands`` and ``Artists`` have a many-to-many relationship.
|
|
|
|
When inserting different pairs of objects into
|
|
:class:`~django.db.models.ManyToManyField` or when the custom
|
|
:attr:`~django.db.models.ManyToManyField.through` table is defined, use
|
|
:meth:`~django.db.models.query.QuerySet.bulk_create()` method to reduce the
|
|
number of SQL queries. For example::
|
|
|
|
PizzaToppingRelationship = Pizza.toppings.through
|
|
PizzaToppingRelationship.objects.bulk_create([
|
|
PizzaToppingRelationship(pizza=my_pizza, topping=pepperoni),
|
|
PizzaToppingRelationship(pizza=your_pizza, topping=pepperoni),
|
|
PizzaToppingRelationship(pizza=your_pizza, topping=mushroom),
|
|
], ignore_conflicts=True)
|
|
|
|
...is preferable to::
|
|
|
|
my_pizza.toppings.add(pepperoni)
|
|
your_pizza.toppings.add(pepperoni, mushroom)
|
|
|
|
...where ``Pizza`` and ``Topping`` have a many-to-many relationship. Note that
|
|
there are a number of :meth:`caveats to this method
|
|
<django.db.models.query.QuerySet.bulk_create>`, so make sure it's appropriate
|
|
for your use case.
|
|
|
|
Remove in bulk
|
|
--------------
|
|
|
|
When removing objects from :class:`ManyToManyFields
|
|
<django.db.models.ManyToManyField>`, use
|
|
:meth:`~django.db.models.fields.related.RelatedManager.remove` with multiple
|
|
objects to reduce the number of SQL queries. For example::
|
|
|
|
my_band.members.remove(me, my_friend)
|
|
|
|
...is preferable to::
|
|
|
|
my_band.members.remove(me)
|
|
my_band.members.remove(my_friend)
|
|
|
|
...where ``Bands`` and ``Artists`` have a many-to-many relationship.
|
|
|
|
When removing different pairs of objects from :class:`ManyToManyFields
|
|
<django.db.models.ManyToManyField>`, use
|
|
:meth:`~django.db.models.query.QuerySet.delete` on a
|
|
:class:`~django.db.models.Q` expression with multiple
|
|
:attr:`~django.db.models.ManyToManyField.through` model instances to reduce
|
|
the number of SQL queries. For example::
|
|
|
|
from django.db.models import Q
|
|
PizzaToppingRelationship = Pizza.toppings.through
|
|
PizzaToppingRelationship.objects.filter(
|
|
Q(pizza=my_pizza, topping=pepperoni) |
|
|
Q(pizza=your_pizza, topping=pepperoni) |
|
|
Q(pizza=your_pizza, topping=mushroom)
|
|
).delete()
|
|
|
|
...is preferable to::
|
|
|
|
my_pizza.toppings.remove(pepperoni)
|
|
your_pizza.toppings.remove(pepperoni, mushroom)
|
|
|
|
...where ``Pizza`` and ``Topping`` have a many-to-many relationship.
|