===========
Aggregation
===========

.. currentmodule:: django.db.models

The topic guide on :doc:`Django's database-abstraction API </topics/db/queries>`
described the way that you can use Django queries that create,
retrieve, update and delete individual objects. However, sometimes you will
need to retrieve values that are derived by summarizing or *aggregating* a
collection of objects. This topic guide describes the ways that aggregate values
can be generated and returned using Django queries.

Throughout this guide, we'll refer to the following models. These models are
used to track the inventory for a series of online bookstores:

.. _queryset-model-example:

.. code-block:: python

    from django.db import models


    class Author(models.Model):
        name = models.CharField(max_length=100)
        age = models.IntegerField()


    class Publisher(models.Model):
        name = models.CharField(max_length=300)


    class Book(models.Model):
        name = models.CharField(max_length=300)
        pages = models.IntegerField()
        price = models.DecimalField(max_digits=10, decimal_places=2)
        rating = models.FloatField()
        authors = models.ManyToManyField(Author)
        publisher = models.ForeignKey(Publisher, on_delete=models.CASCADE)
        pubdate = models.DateField()


    class Store(models.Model):
        name = models.CharField(max_length=300)
        books = models.ManyToManyField(Book)

Cheat sheet
===========

In a hurry? Here's how to do common aggregate queries, assuming the models
above:

.. code-block:: pycon

    # Total number of books.
    >>> Book.objects.count()
    2452

    # Total number of books with publisher=BaloneyPress
    >>> Book.objects.filter(publisher__name="BaloneyPress").count()
    73

    # Average price across all books.
    >>> from django.db.models import Avg
    >>> Book.objects.aggregate(Avg("price"))
    {'price__avg': 34.35}

    # Max price across all books.
    >>> from django.db.models import Max
    >>> Book.objects.aggregate(Max("price"))
    {'price__max': Decimal('81.20')}

    # Difference between the highest priced book and the average price of all books.
    >>> from django.db.models import FloatField
    >>> Book.objects.aggregate(
    ...     price_diff=Max("price", output_field=FloatField()) - Avg("price")
    ... )
    {'price_diff': 46.85}

    # All the following queries involve traversing the Book<->Publisher
    # foreign key relationship backwards.

    # Each publisher, each with a count of books as a "num_books" attribute.
    >>> from django.db.models import Count
    >>> pubs = Publisher.objects.annotate(num_books=Count("book"))
    >>> pubs
    <QuerySet [<Publisher: BaloneyPress>, <Publisher: SalamiPress>, ...]>
    >>> pubs[0].num_books
    73

    # Each publisher, with a separate count of books with a rating above and below 5
    >>> from django.db.models import Q
    >>> above_5 = Count("book", filter=Q(book__rating__gt=5))
    >>> below_5 = Count("book", filter=Q(book__rating__lte=5))
    >>> pubs = Publisher.objects.annotate(below_5=below_5).annotate(above_5=above_5)
    >>> pubs[0].above_5
    23
    >>> pubs[0].below_5
    12

    # The top 5 publishers, in order by number of books.
    >>> pubs = Publisher.objects.annotate(num_books=Count("book")).order_by("-num_books")[:5]
    >>> pubs[0].num_books
    1323

Generating aggregates over a ``QuerySet``
=========================================

Django provides two ways to generate aggregates. The first way is to generate
summary values over an entire ``QuerySet``. For example, say you wanted to
calculate the average price of all books available for sale. Django's query
syntax provides a means for describing the set of all books:

.. code-block:: pycon

    >>> Book.objects.all()

What we need is a way to calculate summary values over the objects that
belong to this ``QuerySet``. This is done by appending an ``aggregate()``
clause onto the ``QuerySet``:

.. code-block:: pycon

    >>> from django.db.models import Avg
    >>> Book.objects.all().aggregate(Avg("price"))
    {'price__avg': 34.35}

The ``all()`` is redundant in this example, so this could be simplified to:

.. code-block:: pycon

    >>> Book.objects.aggregate(Avg("price"))
    {'price__avg': 34.35}

The argument to the ``aggregate()`` clause describes the aggregate value that
we want to compute - in this case, the average of the ``price`` field on the
``Book`` model. A list of the aggregate functions that are available can be
found in the :ref:`QuerySet reference <aggregation-functions>`.

``aggregate()`` is a terminal clause for a ``QuerySet`` that, when invoked,
returns a dictionary of name-value pairs. The name is an identifier for the
aggregate value; the value is the computed aggregate. The name is
automatically generated from the name of the field and the aggregate function.
If you want to manually specify a name for the aggregate value, you can do so
by providing that name when you specify the aggregate clause:

.. code-block:: pycon

    >>> Book.objects.aggregate(average_price=Avg("price"))
    {'average_price': 34.35}

If you want to generate more than one aggregate, you add another argument to
the ``aggregate()`` clause. So, if we also wanted to know the maximum and
minimum price of all books, we would issue the query:

.. code-block:: pycon

    >>> from django.db.models import Avg, Max, Min
    >>> Book.objects.aggregate(Avg("price"), Max("price"), Min("price"))
    {'price__avg': 34.35, 'price__max': Decimal('81.20'), 'price__min': Decimal('12.99')}

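
To make the shape of the returned dictionary concrete, here is a minimal
plain-Python sketch that computes the same three summaries over a list. It does
not use Django at all: the ``prices`` list and the ``default_key()`` helper are
hypothetical stand-ins introduced only for illustration, and the key format is
assumed to follow the ``<field>__<function>`` pattern visible in the output
above.

```python
from decimal import Decimal

# Hypothetical prices standing in for the ``price`` column of ``Book``.
prices = [Decimal("12.99"), Decimal("34.35"), Decimal("81.20")]


def default_key(field_name, function_name):
    """Build a key in the '<field>__<function>' style shown above."""
    return f"{field_name}__{function_name.lower()}"


# One dictionary entry per requested summary, as aggregate() returns.
summary = {
    default_key("price", "Avg"): sum(prices) / len(prices),
    default_key("price", "Max"): max(prices),
    default_key("price", "Min"): min(prices),
}

print(sorted(summary))  # ['price__avg', 'price__max', 'price__min']
```

Passing a keyword argument such as ``average_price=Avg("price")`` corresponds
to choosing the dictionary key yourself instead of relying on the derived one.
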

Generating aggregates for each item in a ``QuerySet``
=====================================================

The second way to generate summary values is to generate an independent
summary for each object in a :class:`.QuerySet`. For example, if you are
retrieving a list of books, you may want to know how many authors contributed
to each book. Each Book has a many-to-many relationship with the Author; we
want to summarize this relationship for each book in the ``QuerySet``.

Per-object summaries can be generated using the
:meth:`~.QuerySet.annotate` clause. When an ``annotate()`` clause is
specified, each object in the ``QuerySet`` will be annotated with the
specified values.

The syntax for these annotations is identical to that used for the
:meth:`~.QuerySet.aggregate` clause. Each argument to ``annotate()`` describes
an aggregate that is to be calculated. For example, to annotate books with the
number of authors:

.. code-block:: pycon

    # Build an annotated queryset
    >>> from django.db.models import Count
    >>> q = Book.objects.annotate(Count("authors"))
    # Interrogate the first object in the queryset
    >>> q[0]
    <Book: The Definitive Guide to Django>
    >>> q[0].authors__count
    2
    # Interrogate the second object in the queryset
    >>> q[1]
    <Book: Practical Django Projects>
    >>> q[1].authors__count
    1

As with ``aggregate()``, the name for the annotation is automatically derived
from the name of the aggregate function and the name of the field being
aggregated. You can override this default name by providing an alias when you
specify the annotation:

.. code-block:: pycon

    >>> q = Book.objects.annotate(num_authors=Count("authors"))
    >>> q[0].num_authors
    2
    >>> q[1].num_authors
    1

Unlike ``aggregate()``, ``annotate()`` is *not* a terminal clause. The output
of the ``annotate()`` clause is a ``QuerySet``; this ``QuerySet`` can be
modified using any other ``QuerySet`` operation, including ``filter()``,
``order_by()``, or even additional calls to ``annotate()``.

.. _combining-multiple-aggregations:

Combining multiple aggregations
-------------------------------

Combining multiple aggregations with ``annotate()`` will :ticket:`yield the
wrong results <10060>` because joins are used instead of subqueries:

    >>> book = Book.objects.first()
    >>> book.authors.count()
    2
    >>> book.store_set.count()
    3
    >>> q = Book.objects.annotate(Count('authors'), Count('store'))
    >>> q[0].authors__count
    6
    >>> q[0].store__count
    6

For most aggregates, there is no way to avoid this problem; however, the
:class:`~django.db.models.Count` aggregate has a ``distinct`` parameter that
may help:

    >>> q = Book.objects.annotate(Count('authors', distinct=True), Count('store', distinct=True))
    >>> q[0].authors__count
    2
    >>> q[0].store__count
    3

.. admonition:: If in doubt, inspect the SQL query!

    In order to understand what happens in your query, consider inspecting the
    ``query`` property of your ``QuerySet``.

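
The row multiplication behind these numbers can be reproduced outside Django.
The sketch below uses Python's standard ``sqlite3`` module with hand-written
SQL; the table and column names are hypothetical simplifications of the example
models, and the SQL only approximates the shape of a query with two joins, not
the exact statement the ORM emits. One book with two authors and three stores
joins to 2 x 3 = 6 rows, so a plain ``COUNT`` reports 6 for both relations,
while ``COUNT(DISTINCT ...)`` recovers 2 and 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE book (id INTEGER PRIMARY KEY);
    CREATE TABLE book_authors (book_id INTEGER, author_id INTEGER);
    CREATE TABLE store_books (store_id INTEGER, book_id INTEGER);
    INSERT INTO book VALUES (1);
    INSERT INTO book_authors VALUES (1, 10), (1, 11);
    INSERT INTO store_books VALUES (100, 1), (101, 1), (102, 1);
    """
)

# Both relations are joined in the same query, so every author row is
# paired with every store row for the book.
JOINS = """
    FROM book
    LEFT JOIN book_authors ON book_authors.book_id = book.id
    LEFT JOIN store_books ON store_books.book_id = book.id
    GROUP BY book.id
"""

plain = conn.execute(
    "SELECT COUNT(book_authors.author_id), COUNT(store_books.store_id)" + JOINS
).fetchone()
distinct = conn.execute(
    "SELECT COUNT(DISTINCT book_authors.author_id),"
    " COUNT(DISTINCT store_books.store_id)" + JOINS
).fetchone()

print(plain)  # (6, 6)
print(distinct)  # (2, 3)
```

The same reasoning explains why ``distinct`` does not rescue aggregates such as
a sum or an average: the duplicated rows carry duplicated values, and removing
duplicate values is not the same as removing duplicate rows.
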

Joins and aggregates
====================

So far, we have dealt with aggregates over fields that belong to the
model being queried. However, sometimes the value you want to aggregate
will belong to a model that is related to the model you are querying.

When specifying the field to be aggregated in an aggregate function, Django
will allow you to use the same :ref:`double underscore notation
<field-lookups-intro>` that is used when referring to related fields in
filters. Django will then handle any table joins that are required to retrieve
and aggregate the related value.

For example, to find the price range of books offered in each store,
you could use the annotation:

.. code-block:: pycon

    >>> from django.db.models import Max, Min
    >>> Store.objects.annotate(min_price=Min("books__price"), max_price=Max("books__price"))

This tells Django to retrieve the ``Store`` model, join (through the
many-to-many relationship) with the ``Book`` model, and aggregate on the
price field of the book model to produce a minimum and maximum value.

The same rules apply to the ``aggregate()`` clause. If you wanted to
know the lowest and highest price of any book that is available for sale
in any of the stores, you could use the aggregate:

.. code-block:: pycon

    >>> Store.objects.aggregate(min_price=Min("books__price"), max_price=Max("books__price"))

Join chains can be as deep as you require. For example, to extract the
age of the youngest author of any book available for sale, you could
issue the query:

.. code-block:: pycon

    >>> Store.objects.aggregate(youngest_age=Min("books__authors__age"))

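
As a rough illustration of the joins involved in the store price-range
queries, the following standalone sketch uses the standard ``sqlite3`` module.
The schema, the store names and the SQL text are assumptions made for this
example; they mirror the idea of joining ``Store`` to ``Book`` through the
many-to-many table rather than reproducing what Django generates. Grouping by
store corresponds to the ``annotate()`` form, and dropping the ``GROUP BY``
corresponds to the ``aggregate()`` form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE store (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, price REAL);
    CREATE TABLE store_books (store_id INTEGER, book_id INTEGER);
    INSERT INTO store VALUES (1, 'Downtown'), (2, 'Airport');
    INSERT INTO book VALUES (1, 12.99), (2, 34.35), (3, 81.20);
    INSERT INTO store_books VALUES (1, 1), (1, 2), (2, 2), (2, 3);
    """
)

JOINS = """
    FROM store
    LEFT JOIN store_books ON store_books.store_id = store.id
    LEFT JOIN book ON book.id = store_books.book_id
"""

# One row per store: comparable to annotate(min_price=..., max_price=...).
per_store = conn.execute(
    "SELECT store.name, MIN(book.price), MAX(book.price)"
    + JOINS
    + "GROUP BY store.id ORDER BY store.id"
).fetchall()

# A single summary row: comparable to aggregate(min_price=..., max_price=...).
overall = conn.execute("SELECT MIN(book.price), MAX(book.price)" + JOINS).fetchone()

print(per_store)  # [('Downtown', 12.99, 34.35), ('Airport', 34.35, 81.2)]
print(overall)  # (12.99, 81.2)
```
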

Following relationships backwards
---------------------------------

In a way similar to :ref:`lookups-that-span-relationships`, aggregations and
annotations on fields of models or models that are related to the one you are
querying can include traversing "reverse" relationships. The lowercase name
of related models and double-underscores are used here too.

For example, we can ask for all publishers, annotated with their respective
total book stock counters (note how we use ``'book'`` to specify the
``Publisher`` -> ``Book`` reverse foreign key hop):

.. code-block:: pycon

    >>> from django.db.models import Avg, Count, Min, Sum
    >>> Publisher.objects.annotate(Count("book"))

(Every ``Publisher`` in the resulting ``QuerySet`` will have an extra attribute
called ``book__count``.)

We can also ask for the oldest book of any of those managed by every publisher:

.. code-block:: pycon

    >>> Publisher.objects.aggregate(oldest_pubdate=Min("book__pubdate"))

(The resulting dictionary will have a key called ``'oldest_pubdate'``. If no
such alias were specified, it would be the rather long ``'book__pubdate__min'``.)

This doesn't apply just to foreign keys. It also works with many-to-many
relations. For example, we can ask for every author, annotated with the total
number of pages considering all the books the author has (co-)authored (note
how we use ``'book'`` to specify the ``Author`` -> ``Book`` reverse
many-to-many hop):

.. code-block:: pycon

    >>> Author.objects.annotate(total_pages=Sum("book__pages"))

(Every ``Author`` in the resulting ``QuerySet`` will have an extra attribute
called ``total_pages``. If no such alias were specified, it would be the rather
long ``book__pages__sum``.)

Or ask for the average rating of all the books written by author(s) we have on
file:

.. code-block:: pycon

    >>> Author.objects.aggregate(average_rating=Avg("book__rating"))

(The resulting dictionary will have a key called ``'average_rating'``. If no
such alias were specified, it would be the rather long ``'book__rating__avg'``.)

Aggregations and other ``QuerySet`` clauses
===========================================

``filter()`` and ``exclude()``
------------------------------

Aggregates can also participate in filters. Any ``filter()`` (or
``exclude()``) applied to normal model fields will have the effect of
constraining the objects that are considered for aggregation.

When used with an ``annotate()`` clause, a filter has the effect of
constraining the objects for which an annotation is calculated. For example,
you can generate an annotated list of all books that have a title starting
with "Django" using the query:

.. code-block:: pycon

    >>> from django.db.models import Avg, Count
    >>> Book.objects.filter(name__startswith="Django").annotate(num_authors=Count("authors"))

When used with an ``aggregate()`` clause, a filter has the effect of
constraining the objects over which the aggregate is calculated.
For example, you can generate the average price of all books with a
title that starts with "Django" using the query:

.. code-block:: pycon

    >>> Book.objects.filter(name__startswith="Django").aggregate(Avg("price"))

.. _filtering-on-annotations:

Filtering on annotations
~~~~~~~~~~~~~~~~~~~~~~~~

Annotated values can also be filtered. The alias for the annotation can be
used in ``filter()`` and ``exclude()`` clauses in the same way as any other
model field.

For example, to generate a list of books that have more than one author,
you can issue the query:

.. code-block:: pycon

    >>> Book.objects.annotate(num_authors=Count("authors")).filter(num_authors__gt=1)

This query generates an annotated result set, and then generates a filter
based upon that annotation.

If you need two annotations with two separate filters you can use the
``filter`` argument with any aggregate. For example, to generate a list of
authors with a count of highly rated books:

.. code-block:: pycon

    >>> from django.db.models import Count, Q
    >>> highly_rated = Count("book", filter=Q(book__rating__gte=7))
    >>> Author.objects.annotate(num_books=Count("book"), highly_rated_books=highly_rated)

Each ``Author`` in the result set will have the ``num_books`` and
``highly_rated_books`` attributes. See also :ref:`conditional-aggregation`.

.. admonition:: Choosing between ``filter`` and ``QuerySet.filter()``

    Avoid using the ``filter`` argument with a single annotation or
    aggregation. It's more efficient to use ``QuerySet.filter()`` to exclude
    rows. The aggregation ``filter`` argument is only useful when using two or
    more aggregations over the same relations with different conditionals.

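
The difference between a per-aggregate condition and a row filter can be seen
in a small standalone sketch. It uses the standard ``sqlite3`` module with a
hypothetical, simplified schema and author names, and it expresses the
condition with a portable ``CASE`` expression; this is an illustration of the
idea, not a claim about the SQL Django produces for the ``filter`` argument.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, rating REAL);
    CREATE TABLE book_authors (book_id INTEGER, author_id INTEGER);
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book VALUES (1, 9.0), (2, 6.5);
    INSERT INTO book_authors VALUES (1, 1), (2, 1), (2, 2);
    """
)

# Two counts over the same join: one unconditional, one that only counts
# rows whose rating is at least 7 (CASE yields NULL otherwise, and COUNT
# skips NULLs).
rows = conn.execute(
    """
    SELECT author.name,
           COUNT(book.id) AS num_books,
           COUNT(CASE WHEN book.rating >= 7 THEN book.id END) AS highly_rated_books
    FROM author
    LEFT JOIN book_authors ON book_authors.author_id = author.id
    LEFT JOIN book ON book.id = book_authors.book_id
    GROUP BY author.id
    ORDER BY author.id
    """
).fetchall()

print(rows)  # [('Ann', 2, 1), ('Bob', 1, 0)]
```

Note that Bob still appears, with a conditional count of zero. Excluding rows
with a ``WHERE`` clause instead would drop Bob entirely and would also change
``num_books``, which is why the two approaches answer different questions.
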

Order of ``annotate()`` and ``filter()`` clauses
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When developing a complex query that involves both ``annotate()`` and
``filter()`` clauses, pay particular attention to the order in which the
clauses are applied to the ``QuerySet``.

When an ``annotate()`` clause is applied to a query, the annotation is computed
over the state of the query up to the point where the annotation is requested.
The practical implication of this is that ``filter()`` and ``annotate()`` are
not commutative operations.

Given:

* Publisher A has two books with ratings 4 and 5.
* Publisher B has two books with ratings 1 and 4.
* Publisher C has one book with rating 1.

Here's an example with the ``Count`` aggregate:

.. code-block:: pycon

    >>> a, b = Publisher.objects.annotate(num_books=Count("book", distinct=True)).filter(
    ...     book__rating__gt=3.0
    ... )
    >>> a, a.num_books
    (<Publisher: A>, 2)
    >>> b, b.num_books
    (<Publisher: B>, 2)

    >>> a, b = Publisher.objects.filter(book__rating__gt=3.0).annotate(num_books=Count("book"))
    >>> a, a.num_books
    (<Publisher: A>, 2)
    >>> b, b.num_books
    (<Publisher: B>, 1)

Both queries return a list of publishers that have at least one book with a
rating exceeding 3.0, hence publisher C is excluded.

In the first query, the annotation precedes the filter, so the filter has no
effect on the annotation. ``distinct=True`` is required to avoid a :ref:`query
bug <combining-multiple-aggregations>`.

The second query counts the number of books that have a rating exceeding 3.0
for each publisher. The filter precedes the annotation, so the filter
constrains the objects considered when calculating the annotation.

Here's another example with the ``Avg`` aggregate:

.. code-block:: pycon

    >>> a, b = Publisher.objects.annotate(avg_rating=Avg("book__rating")).filter(
    ...     book__rating__gt=3.0
    ... )
    >>> a, a.avg_rating
    (<Publisher: A>, 4.5)  # (5+4)/2
    >>> b, b.avg_rating
    (<Publisher: B>, 2.5)  # (1+4)/2

    >>> a, b = Publisher.objects.filter(book__rating__gt=3.0).annotate(
    ...     avg_rating=Avg("book__rating")
    ... )
    >>> a, a.avg_rating
    (<Publisher: A>, 4.5)  # (5+4)/2
    >>> b, b.avg_rating
    (<Publisher: B>, 4.0)  # 4/1 (book with rating 1 excluded)

The first query asks for the average rating of all a publisher's books, for
publishers that have at least one book with a rating exceeding 3.0. The second
query asks for the average of a publisher's books' ratings, considering only
those ratings exceeding 3.0.

It's difficult to intuit how the ORM will translate complex querysets into SQL
queries, so when in doubt, inspect the SQL with ``str(queryset.query)`` and
write plenty of tests.

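
The publisher A, B and C scenario above can be replayed with the standard
``sqlite3`` module. The schema and both SQL statements are assumptions written
by hand for this sketch: the first approximates "annotate, then filter" with
one join feeding the aggregates and a second join applying the condition, and
the second approximates "filter, then annotate" with a single filtered join.
They are meant to show why the results differ, not to reproduce the ORM's
exact output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE publisher (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, publisher_id INTEGER, rating REAL);
    INSERT INTO publisher VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO book VALUES (1, 1, 4), (2, 1, 5), (3, 2, 1), (4, 2, 4), (5, 3, 1);
    """
)

# Annotation first: "counted" sees every book of the publisher, while the
# separate "matched" join only decides which publishers are kept.
annotate_then_filter = conn.execute(
    """
    SELECT p.name, COUNT(DISTINCT counted.id), AVG(counted.rating)
    FROM publisher p
    LEFT JOIN book counted ON counted.publisher_id = p.id
    JOIN book matched ON matched.publisher_id = p.id
    WHERE matched.rating > 3.0
    GROUP BY p.id
    ORDER BY p.id
    """
).fetchall()

# Filter first: the same filtered rows are the ones being aggregated.
filter_then_annotate = conn.execute(
    """
    SELECT p.name, COUNT(b.id), AVG(b.rating)
    FROM publisher p
    JOIN book b ON b.publisher_id = p.id
    WHERE b.rating > 3.0
    GROUP BY p.id
    ORDER BY p.id
    """
).fetchall()

print(annotate_then_filter)  # [('A', 2, 4.5), ('B', 2, 2.5)]
print(filter_then_annotate)  # [('A', 2, 4.5), ('B', 1, 4.0)]
```

In both cases publisher C drops out, because it has no book rated above 3.0;
only the values attached to publisher B change with the order of operations.
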

``order_by()``
--------------

Annotations can be used as a basis for ordering. When you
define an ``order_by()`` clause, the aggregates you provide can reference
any alias defined as part of an ``annotate()`` clause in the query.

For example, to order a ``QuerySet`` of books by the number of authors
that have contributed to the book, you could use the following query:

.. code-block:: pycon

    >>> Book.objects.annotate(num_authors=Count("authors")).order_by("num_authors")

``values()``
------------

Ordinarily, annotations are generated on a per-object basis - an annotated
``QuerySet`` will return one result for each object in the original
``QuerySet``. However, when a ``values()`` clause is used to constrain the
columns that are returned in the result set, the method for evaluating
annotations is slightly different. Instead of returning an annotated result
for each result in the original ``QuerySet``, the original results are
grouped according to the unique combinations of the fields specified in the
``values()`` clause. An annotation is then provided for each unique group;
the annotation is computed over all members of the group.

For example, consider an author query that attempts to find out the average
rating of books written by each author:

    >>> Author.objects.annotate(average_rating=Avg('book__rating'))

This will return one result for each author in the database, annotated with
their average book rating.

However, the result will be slightly different if you use a ``values()`` clause:

.. code-block:: pycon

    >>> Author.objects.values("name").annotate(average_rating=Avg("book__rating"))

In this example, the authors will be grouped by name, so you will only get
an annotated result for each *unique* author name. This means if you have
two authors with the same name, their results will be merged into a single
result in the output of the query; the average will be computed as the
average over the books written by both authors.

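
The merging of same-named authors is ordinary SQL grouping, which the
following standalone ``sqlite3`` sketch illustrates. The schema, the author
names and the hand-written SQL are hypothetical; grouping by the primary key
stands in for the plain ``annotate()`` call, and grouping by ``name`` stands in
for ``values("name").annotate(...)``.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, rating REAL);
    CREATE TABLE book_authors (book_id INTEGER, author_id INTEGER);
    INSERT INTO author VALUES (1, 'J. Smith'), (2, 'J. Smith'), (3, 'K. Lee');
    INSERT INTO book VALUES (1, 4.0), (2, 2.0), (3, 5.0);
    INSERT INTO book_authors VALUES (1, 1), (2, 2), (3, 3);
    """
)

JOINS = """
    FROM author
    LEFT JOIN book_authors ON book_authors.author_id = author.id
    LEFT JOIN book ON book.id = book_authors.book_id
"""

# One group per author row: the two J. Smiths stay separate.
per_author = conn.execute(
    "SELECT author.name, AVG(book.rating)" + JOINS + "GROUP BY author.id ORDER BY author.id"
).fetchall()

# One group per distinct name: the two J. Smiths are averaged together.
per_name = conn.execute(
    "SELECT author.name, AVG(book.rating)" + JOINS + "GROUP BY author.name ORDER BY author.name"
).fetchall()

print(per_author)  # [('J. Smith', 4.0), ('J. Smith', 2.0), ('K. Lee', 5.0)]
print(per_name)  # [('J. Smith', 3.0), ('K. Lee', 5.0)]
```
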

Order of ``annotate()`` and ``values()`` clauses
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As with the ``filter()`` clause, the order in which ``annotate()`` and
``values()`` clauses are applied to a query is significant. If the
``values()`` clause precedes the ``annotate()``, the annotation will be
computed using the grouping described by the ``values()`` clause.

However, if the ``annotate()`` clause precedes the ``values()`` clause,
the annotations will be generated over the entire query set. In this case,
the ``values()`` clause only constrains the fields that are generated on
output.

For example, if we reverse the order of the ``values()`` and ``annotate()``
clauses from our previous example:

.. code-block:: pycon

    >>> Author.objects.annotate(average_rating=Avg("book__rating")).values(
    ...     "name", "average_rating"
    ... )

This will now yield one unique result for each author; however, only
the author's name and the ``average_rating`` annotation will be returned
in the output data.

You should also note that ``average_rating`` has been explicitly included
in the list of values to be returned. This is required because of the
ordering of the ``values()`` and ``annotate()`` clauses.

If the ``values()`` clause precedes the ``annotate()`` clause, any annotations
will be automatically added to the result set. However, if the ``values()``
clause is applied after the ``annotate()`` clause, you need to explicitly
include the aggregate column.

.. _aggregation-ordering-interaction:

Interaction with ``order_by()``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fields that are mentioned in the ``order_by()`` part of a queryset are used
when selecting the output data, even if they are not otherwise specified in the
``values()`` call. These extra fields are used to group "like" results together
and they can make otherwise identical result rows appear to be separate. This
shows up, particularly, when counting things.

By way of example, suppose you have a model like this::

    from django.db import models


    class Item(models.Model):
        name = models.CharField(max_length=10)
        data = models.IntegerField()

If you want to count how many times each distinct ``data`` value appears in an
ordered queryset, you might try this::

    items = Item.objects.order_by("name")
    # Warning: not quite correct!
    items.values("data").annotate(Count("id"))

...which will group the ``Item`` objects by their common ``data`` values and
then count the number of ``id`` values in each group. Except that it won't
quite work. The ordering by ``name`` will also play a part in the grouping, so
this query will group by distinct ``(data, name)`` pairs, which isn't what you
want. Instead, you should construct this queryset::

    items.values("data").annotate(Count("id")).order_by()

...clearing any ordering in the query. You could also order by, say, ``data``
without any harmful effects, since that is already playing a role in the
query.

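
The effect of the extra ordering column on the grouping can be demonstrated
with plain SQL. The sketch below uses the standard ``sqlite3`` module and a
hypothetical ``item`` table with three rows; the two statements are
hand-written approximations of "group by ``data`` and ``name``" versus "group
by ``data`` only", and the second one orders by ``data`` purely to make the
printed result deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, data INTEGER);
    INSERT INTO item (name, data) VALUES ('a', 1), ('b', 1), ('c', 2);
    """
)

# Ordering by name pulls name into the grouping: every (data, name) pair
# becomes its own group, so each count is 1.
grouped_with_name = conn.execute(
    "SELECT data, COUNT(id) FROM item GROUP BY data, name ORDER BY name"
).fetchall()

# With the name ordering cleared, rows are grouped by data alone.
grouped_by_data = conn.execute(
    "SELECT data, COUNT(id) FROM item GROUP BY data ORDER BY data"
).fetchall()

print(grouped_with_name)  # [(1, 1), (1, 1), (2, 1)]
print(grouped_by_data)  # [(1, 2), (2, 1)]
```
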

This behavior is the same as that noted in the queryset documentation for
:meth:`~django.db.models.query.QuerySet.distinct` and the general rule is the
same: normally you won't want extra columns playing a part in the result, so
clear out the ordering, or at least make sure it's restricted only to those
fields you also select in a ``values()`` call.

.. note::

    You might reasonably ask why Django doesn't remove the extraneous columns
    for you. The main reason is consistency with ``distinct()`` and other
    places: Django **never** removes ordering constraints that you have
    specified (and we can't change those other methods' behavior, as that
    would violate our :doc:`/misc/api-stability` policy).

Aggregating annotations
-----------------------

You can also generate an aggregate on the result of an annotation. When you
define an ``aggregate()`` clause, the aggregates you provide can reference
any alias defined as part of an ``annotate()`` clause in the query.

For example, if you wanted to calculate the average number of authors per
book you first annotate the set of books with the author count, then
aggregate that author count, referencing the annotation field:

.. code-block:: pycon

    >>> from django.db.models import Avg, Count
    >>> Book.objects.annotate(num_authors=Count("authors")).aggregate(Avg("num_authors"))
    {'num_authors__avg': 1.66}

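
Conceptually this is a two-step computation: a per-book count, followed by an
average over those counts. The standalone sketch below expresses that with the
standard ``sqlite3`` module as an inner grouped query wrapped by an outer
aggregate. The tables and data are hypothetical (three books with 2, 1 and 2
authors), and the nested SQL is only an assumption about how such a query can
be written by hand, not a statement of what the ORM generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE book (id INTEGER PRIMARY KEY);
    CREATE TABLE book_authors (book_id INTEGER, author_id INTEGER);
    INSERT INTO book VALUES (1), (2), (3);
    INSERT INTO book_authors VALUES (1, 10), (1, 11), (2, 10), (3, 11), (3, 12);
    """
)

# Inner query: one num_authors value per book (the annotation).
# Outer query: the average of those values (the aggregate).
(average_authors,) = conn.execute(
    """
    SELECT AVG(num_authors)
    FROM (
        SELECT COUNT(book_authors.author_id) AS num_authors
        FROM book
        LEFT JOIN book_authors ON book_authors.book_id = book.id
        GROUP BY book.id
    )
    """
).fetchone()

print(round(average_authors, 2))  # 1.67
```
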