2009-01-15 11:06:34 +00:00
|
|
|
.. _topics-db-aggregation:
|
|
|
|
|
|
|
|
=============
|
|
|
|
Aggregation
|
|
|
|
=============
|
|
|
|
|
|
|
|
.. versionadded:: 1.1
|
|
|
|
|
|
|
|
.. currentmodule:: django.db.models
|
|
|
|
|
|
|
|
The topic guide on :ref:`Django's database-abstraction API <topics-db-queries`
|
|
|
|
described the way that you can use Django queries that create,
|
|
|
|
retrieve, update and delete individual objects. However, sometimes you will
|
|
|
|
need to retrieve values that are derived by summarizing or *aggregating* a
|
|
|
|
collection of objects. This topic guide describes the ways that aggregate values
|
|
|
|
can be generated and returned using Django queries.
|
|
|
|
|
|
|
|
Throughout this guide, we'll refer to the following models. These models are
|
|
|
|
used to track the inventory for a series of online bookstores:
|
|
|
|
|
|
|
|
.. _queryset-model-example:
|
|
|
|
|
|
|
|
.. code-block:: python
|
|
|
|
|
|
|
|
class Author(models.Model):
|
|
|
|
name = models.CharField(max_length=100)
|
|
|
|
age = models.IntegerField()
|
|
|
|
friends = models.ManyToManyField('self', blank=True)
|
|
|
|
|
|
|
|
class Publisher(models.Model):
|
|
|
|
name = models.CharField(max_length=300)
|
|
|
|
num_awards = models.IntegerField()
|
|
|
|
|
|
|
|
class Book(models.Model):
|
|
|
|
isbn = models.CharField(max_length=9)
|
|
|
|
name = models.CharField(max_length=300)
|
|
|
|
pages = models.IntegerField()
|
|
|
|
price = models.DecimalField(max_digits=10, decimal_places=2)
|
|
|
|
rating = models.FloatField()
|
|
|
|
authors = models.ManyToManyField(Author)
|
|
|
|
publisher = models.ForeignKey(Publisher)
|
2009-01-15 22:26:00 +00:00
|
|
|
pubdate = models.DateField()
|
2009-01-15 11:06:34 +00:00
|
|
|
|
|
|
|
class Store(models.Model):
|
|
|
|
name = models.CharField(max_length=300)
|
|
|
|
books = models.ManyToManyField(Book)
|
|
|
|
|
|
|
|
|
|
|
|
Generating aggregates over a QuerySet
|
|
|
|
=====================================
|
|
|
|
|
|
|
|
Django provides two ways to generate aggregates. The first way is to generate
|
|
|
|
summary values over an entire ``QuerySet``. For example, say you wanted to
|
|
|
|
calculate the average price of all books available for sale. Django's query
|
|
|
|
syntax provides a means for describing the set of all books::
|
|
|
|
|
|
|
|
>>> Book.objects.all()
|
|
|
|
|
|
|
|
What we need is a way to calculate summary values over the objects that
|
|
|
|
belong to this ``QuerySet``. This is done by appending an ``aggregate()``
|
|
|
|
clause onto the ``QuerySet``::
|
|
|
|
|
|
|
|
>>> from django.db.models import Avg
|
|
|
|
>>> Book.objects.all().aggregate(Avg('price'))
|
|
|
|
{'price__avg': 34.35}
|
|
|
|
|
|
|
|
The ``all()`` is redundant in this example, so this could be simplified to::
|
|
|
|
|
|
|
|
>>> Book.objects.aggregate(Avg('price'))
|
|
|
|
{'price__avg': 34.35}
|
|
|
|
|
|
|
|
The argument to the ``aggregate()`` clause describes the aggregate value that
|
|
|
|
we want to compute - in this case, the average of the ``price`` field on the
|
|
|
|
``Book`` model. A list of the aggregate functions that are available can be
|
|
|
|
found in the :ref:`QuerySet reference <aggregation-functions>`.
|
|
|
|
|
|
|
|
``aggregate()`` is a terminal clause for a ``QuerySet`` that, when invoked,
|
|
|
|
returns a dictionary of name-value pairs. The name is an identifier for the
|
|
|
|
aggregate value; the value is the computed aggregate. The name is
|
|
|
|
automatically generated from the name of the field and the aggregate function.
|
|
|
|
If you want to manually specify a name for the aggregate value, you can do so
|
|
|
|
by providing that name when you specify the aggregate clause::
|
|
|
|
|
|
|
|
>>> Book.objects.aggregate(average_price=Avg('price'))
|
|
|
|
{'average_price': 34.35}
|
|
|
|
|
|
|
|
If you want to generate more than one aggregate, you just add another
|
|
|
|
argument to the ``aggregate()`` clause. So, if we also wanted to know
|
|
|
|
the maximum and minimum price of all books, we would issue the query::
|
|
|
|
|
2009-01-23 11:12:06 +00:00
|
|
|
>>> from django.db.models import Avg, Max, Min, Count
|
2009-01-15 11:06:34 +00:00
|
|
|
>>> Book.objects.aggregate(Avg('price'), Max('price'), Min('price'))
|
|
|
|
{'price__avg': 34.35, 'price__max': Decimal('81.20'), 'price__min': Decimal('12.99')}
|
|
|
|
|
|
|
|
Generating aggregates for each item in a QuerySet
|
|
|
|
=================================================
|
|
|
|
|
|
|
|
The second way to generate summary values is to generate an independent
|
|
|
|
summary for each object in a ``Queryset``. For example, if you are retrieving
|
|
|
|
a list of books, you may want to know how many authors contributed to
|
|
|
|
each book. Each Book has a many-to-many relationship with the Author; we
|
|
|
|
want to summarize this relationship for each book in the ``QuerySet``.
|
|
|
|
|
|
|
|
Per-object summaries can be generated using the ``annotate()`` clause.
|
|
|
|
When an ``annotate()`` clause is specified, each object in the ``QuerySet``
|
|
|
|
will be annotated with the specified values.
|
|
|
|
|
|
|
|
The syntax for these annotations is identical to that used for the
|
2009-01-15 23:15:32 +00:00
|
|
|
``aggregate()`` clause. Each argument to ``annotate()`` describes an
|
2009-01-15 11:06:34 +00:00
|
|
|
aggregate that is to be calculated. For example, to annotate Books with
|
|
|
|
the number of authors::
|
|
|
|
|
|
|
|
# Build an annotated queryset
|
|
|
|
>>> q = Book.objects.annotate(Count('authors'))
|
|
|
|
# Interrogate the first object in the queryset
|
|
|
|
>>> q[0]
|
|
|
|
<Book: The Definitive Guide to Django>
|
|
|
|
>>> q[0].authors__count
|
|
|
|
2
|
|
|
|
# Interrogate the second object in the queryset
|
|
|
|
>>> q[1]
|
|
|
|
<Book: Practical Django Projects>
|
|
|
|
>>> q[1].authors__count
|
|
|
|
1
|
|
|
|
|
|
|
|
As with ``aggregate()``, the name for the annotation is automatically derived
|
|
|
|
from the name of the aggregate function and the name of the field being
|
|
|
|
aggregated. You can override this default name by providing an alias when you
|
|
|
|
specify the annotation::
|
|
|
|
|
|
|
|
>>> q = Book.objects.annotate(num_authors=Count('authors'))
|
|
|
|
>>> q[0].num_authors
|
|
|
|
2
|
|
|
|
>>> q[1].num_authors
|
|
|
|
1
|
|
|
|
|
|
|
|
Unlike ``aggregate()``, ``annotate()`` is *not* a terminal clause. The output
|
|
|
|
of the ``annotate()`` clause is a ``QuerySet``; this ``QuerySet`` can be
|
|
|
|
modified using any other ``QuerySet`` operation, including ``filter()``,
|
|
|
|
``order_by``, or even additional calls to ``annotate()``.
|
|
|
|
|
|
|
|
Joins and aggregates
|
|
|
|
====================
|
|
|
|
|
|
|
|
So far, we have dealt with aggregates over fields that belong to the
|
|
|
|
model being queries. However, sometimes the value you want to aggregate
|
|
|
|
will belong to a model that is related to the model you are querying.
|
|
|
|
|
|
|
|
When specifying the field to be aggregated in an aggregate functions,
|
|
|
|
Django will allow you to use the same
|
|
|
|
:ref:`double underscore notation <field-lookups-intro>` that is used
|
|
|
|
when referring to related fields in filters. Django will then handle
|
|
|
|
any table joins that are required to retrieve and aggregate the
|
|
|
|
related value.
|
|
|
|
|
|
|
|
For example, to find the price range of books offered in each store,
|
|
|
|
you could use the annotation::
|
|
|
|
|
2009-01-15 23:15:32 +00:00
|
|
|
>>> Store.objects.annotate(min_price=Min('books__price'), max_price=Max('books__price'))
|
2009-01-15 11:06:34 +00:00
|
|
|
|
|
|
|
This tells Django to retrieve the Store model, join (through the
|
|
|
|
many-to-many relationship) with the Book model, and aggregate on the
|
|
|
|
price field of the book model to produce a minimum and maximum value.
|
|
|
|
|
|
|
|
The same rules apply to the ``aggregate()`` clause. If you wanted to
|
|
|
|
know the lowest and highest price of any book that is available for sale
|
|
|
|
in a store, you could use the aggregate::
|
|
|
|
|
2009-01-16 00:02:35 +00:00
|
|
|
>>> Store.objects.aggregate(min_price=Min('books__price'), max_price=Max('books__price'))
|
2009-01-15 11:06:34 +00:00
|
|
|
|
2009-01-15 14:07:53 +00:00
|
|
|
Join chains can be as deep as you require. For example, to extract the
|
2009-01-15 11:06:34 +00:00
|
|
|
age of the youngest author of any book available for sale, you could
|
|
|
|
issue the query::
|
|
|
|
|
|
|
|
>>> Store.objects.aggregate(youngest_age=Min('books__authors__age'))
|
|
|
|
|
|
|
|
Aggregations and other QuerySet clauses
|
|
|
|
=======================================
|
|
|
|
|
|
|
|
``filter()`` and ``exclude()``
|
|
|
|
------------------------------
|
|
|
|
|
|
|
|
Aggregates can also participate in filters. Any ``filter()`` (or
|
|
|
|
``exclude()``) applied to normal model fields will have the effect of
|
|
|
|
constraining the objects that are considered for aggregation.
|
|
|
|
|
|
|
|
When used with an ``annotate()`` clause, a filter has the effect of
|
|
|
|
constraining the objects for which an annotation is calculated. For example,
|
|
|
|
you can generate an annotated list of all books that have a title starting
|
|
|
|
with "Django" using the query::
|
|
|
|
|
|
|
|
>>> Book.objects.filter(name__startswith="Django").annotate(num_authors=Count('authors'))
|
|
|
|
|
|
|
|
When used with an ``aggregate()`` clause, a filter has the effect of
|
|
|
|
constraining the objects over which the aggregate is calculated.
|
|
|
|
For example, you can generate the average price of all books with a
|
|
|
|
title that starts with "Django" using the query::
|
|
|
|
|
|
|
|
>>> Book.objects.filter(name__startswith="Django").aggregate(Avg('price'))
|
|
|
|
|
|
|
|
Filtering on annotations
|
|
|
|
~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
|
|
|
|
Annotated values can also be filtered. The alias for the annotation can be
|
|
|
|
used in ``filter()`` and ``exclude()`` clauses in the same way as any other
|
|
|
|
model field.
|
|
|
|
|
|
|
|
For example, to generate a list of books that have more than one author,
|
|
|
|
you can issue the query::
|
|
|
|
|
|
|
|
>>> Book.objects.annotate(num_authors=Count('authors')).filter(num_authors__gt=1)
|
|
|
|
|
|
|
|
This query generates an annotated result set, and then generates a filter
|
|
|
|
based upon that annotation.
|
|
|
|
|
|
|
|
Order of ``annotate()`` and ``filter()`` clauses
|
|
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
|
|
|
|
When developing a complex query that involves both ``annotate()`` and
|
|
|
|
``filter()`` clauses, particular attention should be paid to the order
|
|
|
|
in which the clauses are applied to the ``QuerySet``.
|
|
|
|
|
|
|
|
When an ``annotate()`` clause is applied to a query, the annotation is
|
|
|
|
computed over the state of the query up to the point where the annotation
|
|
|
|
is requested. The practical implication of this is that ``filter()`` and
|
|
|
|
``annotate()`` are not transitive operations -- that is, there is a
|
|
|
|
difference between the query::
|
|
|
|
|
|
|
|
>>> Publisher.objects.annotate(num_books=Count('book')).filter(book__rating__gt=3.0)
|
|
|
|
|
|
|
|
and the query::
|
|
|
|
|
|
|
|
>>> Publisher.objects.filter(book__rating__gt=3.0).annotate(num_books=Count('book'))
|
|
|
|
|
|
|
|
Both queries will return a list of Publishers that have at least one good
|
|
|
|
book (i.e., a book with a rating exceeding 3.0). However, the annotation in
|
|
|
|
the first query will provide the total number of all books published by the
|
|
|
|
publisher; the second query will only include good books in the annotated
|
|
|
|
count. In the first query, the annotation precedes the filter, so the
|
|
|
|
filter has no effect on the annotation. In the second query, the filter
|
|
|
|
preceeds the annotation, and as a result, the filter constrains the objects
|
|
|
|
considered when calculating the annotation.
|
|
|
|
|
|
|
|
``order_by()``
|
|
|
|
--------------
|
|
|
|
|
|
|
|
Annotations can be used as a basis for ordering. When you
|
|
|
|
define an ``order_by()`` clause, the aggregates you provide can reference
|
|
|
|
any alias defined as part of an ``annotate()`` clause in the query.
|
|
|
|
|
|
|
|
For example, to order a ``QuerySet`` of books by the number of authors
|
|
|
|
that have contributed to the book, you could use the following query::
|
|
|
|
|
|
|
|
>>> Book.objects.annotate(num_authors=Count('authors')).order_by('num_authors')
|
|
|
|
|
|
|
|
``values()``
|
|
|
|
------------
|
|
|
|
|
|
|
|
Ordinarily, annotations are generated on a per-object basis - an annotated
|
|
|
|
``QuerySet`` will return one result for each object in the original
|
|
|
|
``Queryset``. However, when a ``values()`` clause is used to constrain the
|
|
|
|
columns that are returned in the result set, the method for evaluating
|
|
|
|
annotations is slightly different. Instead of returning an annotated result
|
|
|
|
for each result in the original ``QuerySet``, the original results are
|
|
|
|
grouped according to the unique combinations of the fields specified in the
|
|
|
|
``values()`` clause. An annotation is then provided for each unique group;
|
|
|
|
the annotation is computed over all members of the group.
|
|
|
|
|
|
|
|
For example, consider an author query that attempts to find out the average
|
|
|
|
rating of books written by each author:
|
|
|
|
|
2009-01-15 22:33:55 +00:00
|
|
|
>>> Author.objects.annotate(average_rating=Avg('book__rating'))
|
2009-01-15 11:06:34 +00:00
|
|
|
|
|
|
|
This will return one result for each author in the database, annotate with
|
|
|
|
their average book rating.
|
|
|
|
|
|
|
|
However, the result will be slightly different if you use a ``values()`` clause::
|
|
|
|
|
2009-01-15 22:33:55 +00:00
|
|
|
>>> Author.objects.values('name').annotate(average_rating=Avg('book__rating'))
|
2009-01-15 11:06:34 +00:00
|
|
|
|
|
|
|
In this example, the authors will be grouped by name, so you will only get
|
|
|
|
an annotated result for each *unique* author name. This means if you have
|
|
|
|
two authors with the same name, their results will be merged into a single
|
|
|
|
result in the output of the query; the average will be computed as the
|
|
|
|
average over the books written by both authors.
|
|
|
|
|
|
|
|
The annotation name will be added to the fields returned
|
|
|
|
as part of the ``ValuesQuerySet``.
|
|
|
|
|
2009-01-15 12:35:41 +00:00
|
|
|
Order of ``annotate()`` and ``values()`` clauses
|
2009-01-15 11:06:34 +00:00
|
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
|
|
|
|
As with the ``filter()`` clause, the order in which ``annotate()`` and
|
|
|
|
``values()`` clauses are applied to a query is significant. If the
|
|
|
|
``values()`` clause precedes the ``annotate()``, the annotation will be
|
|
|
|
computed using the grouping described by the ``values()`` clause.
|
|
|
|
|
|
|
|
However, if the ``annotate()`` clause precedes the ``values()`` clause,
|
|
|
|
the annotations will be generated over the entire query set. In this case,
|
|
|
|
the ``values()`` clause only constrains the fields that are generated on
|
|
|
|
output.
|
|
|
|
|
|
|
|
For example, if we reverse the order of the ``values()`` and ``annotate()``
|
|
|
|
clause from our previous example::
|
|
|
|
|
2009-01-15 22:33:55 +00:00
|
|
|
>>> Author.objects.annotate(average_rating=Avg('book__rating')).values('name')
|
2009-01-15 11:06:34 +00:00
|
|
|
|
|
|
|
This will now yield one unique result for each author; however, only
|
|
|
|
the author's name and the ``average_rating`` annotation will be returned
|
|
|
|
in the output data.
|
|
|
|
|
|
|
|
Aggregating annotations
|
|
|
|
-----------------------
|
|
|
|
|
|
|
|
You can also generate an aggregate on the result of an annotation. When you
|
|
|
|
define an ``aggregate()`` clause, the aggregates you provide can reference
|
|
|
|
any alias defined as part of an ``annotate()`` clause in the query.
|
|
|
|
|
|
|
|
For example, if you wanted to calculate the average number of authors per
|
|
|
|
book you first annotate the set of books with the author count, then
|
|
|
|
aggregate that author count, referencing the annotation field::
|
|
|
|
|
|
|
|
>>> Book.objects.annotate(num_authors=Count('authors')).aggregate(Avg('num_authors'))
|
|
|
|
{'num_authors__avg': 1.66}
|