mirror of
https://github.com/django/django.git
synced 2024-12-25 02:26:12 +00:00
251 lines
9.8 KiB
Plaintext
251 lines
9.8 KiB
Plaintext
================
|
|
Full text search
|
|
================
|
|
|
|
The database functions in the ``django.contrib.postgres.search`` module ease
|
|
the use of PostgreSQL's `full text search engine
|
|
<https://www.postgresql.org/docs/current/textsearch.html>`_.
|
|
|
|
For the examples in this document, we'll use the models defined in
|
|
:doc:`/topics/db/queries`.
|
|
|
|
.. seealso::
|
|
|
|
For a high-level overview of searching, see the :doc:`topic documentation
|
|
</topics/db/search>`.
|
|
|
|
.. currentmodule:: django.contrib.postgres.search
|
|
|
|
The ``search`` lookup
|
|
=====================
|
|
|
|
.. fieldlookup:: search
|
|
|
|
A common way to use full text search is to search a single term against a
|
|
single column in the database. For example::
|
|
|
|
>>> Entry.objects.filter(body_text__search='Cheese')
|
|
[<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]
|
|
|
|
This creates a ``to_tsvector`` in the database from the ``body_text`` field
|
|
and a ``plainto_tsquery`` from the search term ``'Cheese'``, both using the
|
|
default database search configuration. The results are obtained by matching the
|
|
query and the vector.
|
|
|
|
To use the ``search`` lookup, ``'django.contrib.postgres'`` must be in your
|
|
:setting:`INSTALLED_APPS`.
|
|
|
|
``SearchVector``
|
|
================
|
|
|
|
.. class:: SearchVector(*expressions, config=None, weight=None)
|
|
|
|
Searching against a single field is great but rather limiting. The ``Entry``
|
|
instances we're searching belong to a ``Blog``, which has a ``tagline`` field.
|
|
To query against both fields, use a ``SearchVector``::
|
|
|
|
>>> from django.contrib.postgres.search import SearchVector
|
|
>>> Entry.objects.annotate(
|
|
... search=SearchVector('body_text', 'blog__tagline'),
|
|
... ).filter(search='Cheese')
|
|
[<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]
|
|
|
|
The arguments to ``SearchVector`` can be any
|
|
:class:`~django.db.models.Expression` or the name of a field. Multiple
|
|
arguments will be concatenated together using a space so that the search
|
|
document includes them all.
|
|
|
|
``SearchVector`` objects can be combined together, allowing you to reuse them.
|
|
For example::
|
|
|
|
>>> Entry.objects.annotate(
|
|
... search=SearchVector('body_text') + SearchVector('blog__tagline'),
|
|
... ).filter(search='Cheese')
|
|
[<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]
|
|
|
|
See :ref:`postgresql-fts-search-configuration` and
|
|
:ref:`postgresql-fts-weighting-queries` for an explanation of the ``config``
|
|
and ``weight`` parameters.
|
|
|
|
``SearchQuery``
|
|
===============
|
|
|
|
.. class:: SearchQuery(value, config=None, search_type='plain')
|
|
|
|
``SearchQuery`` translates the terms the user provides into a search query
|
|
object that the database compares to a search vector. By default, all the words
|
|
the user provides are passed through the stemming algorithms, and then it
|
|
looks for matches for all of the resulting terms.
|
|
|
|
If ``search_type`` is ``'plain'``, which is the default, the terms are treated
|
|
as separate keywords. If ``search_type`` is ``'phrase'``, the terms are treated
|
|
as a single phrase. If ``search_type`` is ``'raw'``, then you can provide a
|
|
formatted search query with terms and operators. Read PostgreSQL's `Full Text
|
|
Search docs`_ to learn about differences and syntax. Examples:
|
|
|
|
.. _Full Text Search docs: https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES
|
|
|
|
>>> from django.contrib.postgres.search import SearchQuery
|
|
>>> SearchQuery('red tomato') # two keywords
|
|
>>> SearchQuery('tomato red') # same results as above
|
|
>>> SearchQuery('red tomato', search_type='phrase') # a phrase
|
|
>>> SearchQuery('tomato red', search_type='phrase') # a different phrase
|
|
>>> SearchQuery("'tomato' & ('red' | 'green')", search_type='raw') # boolean operators
|
|
|
|
``SearchQuery`` terms can be combined logically to provide more flexibility::
|
|
|
|
>>> from django.contrib.postgres.search import SearchQuery
|
|
>>> SearchQuery('meat') & SearchQuery('cheese') # AND
|
|
>>> SearchQuery('meat') | SearchQuery('cheese') # OR
|
|
>>> ~SearchQuery('meat') # NOT
|
|
|
|
See :ref:`postgresql-fts-search-configuration` for an explanation of the
|
|
``config`` parameter.
|
|
|
|
``SearchRank``
|
|
==============
|
|
|
|
.. class:: SearchRank(vector, query, weights=None)
|
|
|
|
So far, we've returned the results for which any match between the vector and
|
|
the query are possible. It's likely you may wish to order the results by some
|
|
sort of relevancy. PostgreSQL provides a ranking function which takes into
|
|
account how often the query terms appear in the document, how close together
|
|
the terms are in the document, and how important the part of the document is
|
|
where they occur. The better the match, the higher the value of the rank. To
|
|
order by relevancy::
|
|
|
|
>>> from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
|
|
>>> vector = SearchVector('body_text')
|
|
>>> query = SearchQuery('cheese')
|
|
>>> Entry.objects.annotate(rank=SearchRank(vector, query)).order_by('-rank')
|
|
[<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]
|
|
|
|
See :ref:`postgresql-fts-weighting-queries` for an explanation of the
|
|
``weights`` parameter.
|
|
|
|
.. _postgresql-fts-search-configuration:
|
|
|
|
Changing the search configuration
|
|
=================================
|
|
|
|
You can specify the ``config`` attribute to a :class:`SearchVector` and
|
|
:class:`SearchQuery` to use a different search configuration. This allows using
|
|
different language parsers and dictionaries as defined by the database::
|
|
|
|
>>> from django.contrib.postgres.search import SearchQuery, SearchVector
|
|
>>> Entry.objects.annotate(
|
|
... search=SearchVector('body_text', config='french'),
|
|
... ).filter(search=SearchQuery('œuf', config='french'))
|
|
[<Entry: Pain perdu>]
|
|
|
|
The value of ``config`` could also be stored in another column::
|
|
|
|
>>> from django.db.models import F
|
|
>>> Entry.objects.annotate(
|
|
... search=SearchVector('body_text', config=F('blog__language')),
|
|
... ).filter(search=SearchQuery('œuf', config=F('blog__language')))
|
|
[<Entry: Pain perdu>]
|
|
|
|
.. _postgresql-fts-weighting-queries:
|
|
|
|
Weighting queries
|
|
=================
|
|
|
|
Every field may not have the same relevance in a query, so you can set weights
|
|
of various vectors before you combine them::
|
|
|
|
>>> from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
|
|
>>> vector = SearchVector('body_text', weight='A') + SearchVector('blog__tagline', weight='B')
|
|
>>> query = SearchQuery('cheese')
|
|
>>> Entry.objects.annotate(rank=SearchRank(vector, query)).filter(rank__gte=0.3).order_by('rank')
|
|
|
|
The weight should be one of the following letters: D, C, B, A. By default,
|
|
these weights refer to the numbers ``0.1``, ``0.2``, ``0.4``, and ``1.0``,
|
|
respectively. If you wish to weight them differently, pass a list of four
|
|
floats to :class:`SearchRank` as ``weights`` in the same order above::
|
|
|
|
>>> rank = SearchRank(vector, query, weights=[0.2, 0.4, 0.6, 0.8])
|
|
>>> Entry.objects.annotate(rank=rank).filter(rank__gte=0.3).order_by('-rank')
|
|
|
|
Performance
|
|
===========
|
|
|
|
Special database configuration isn't necessary to use any of these functions,
|
|
however, if you're searching more than a few hundred records, you're likely to
|
|
run into performance problems. Full text search is a more intensive process
|
|
than comparing the size of an integer, for example.
|
|
|
|
In the event that all the fields you're querying on are contained within one
|
|
particular model, you can create a functional index which matches the search
|
|
vector you wish to use. The PostgreSQL documentation has details on
|
|
`creating indexes for full text search
|
|
<https://www.postgresql.org/docs/current/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX>`_.
|
|
|
|
``SearchVectorField``
|
|
---------------------
|
|
|
|
.. class:: SearchVectorField
|
|
|
|
If this approach becomes too slow, you can add a ``SearchVectorField`` to your
|
|
model. You'll need to keep it populated with triggers, for example, as
|
|
described in the `PostgreSQL documentation`_. You can then query the field as
|
|
if it were an annotated ``SearchVector``::
|
|
|
|
>>> Entry.objects.update(search_vector=SearchVector('body_text'))
|
|
>>> Entry.objects.filter(search_vector='cheese')
|
|
[<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]
|
|
|
|
.. _PostgreSQL documentation: https://www.postgresql.org/docs/current/textsearch-features.html#TEXTSEARCH-UPDATE-TRIGGERS
|
|
|
|
Trigram similarity
|
|
==================
|
|
|
|
Another approach to searching is trigram similarity. A trigram is a group of
|
|
three consecutive characters. In addition to the :lookup:`trigram_similar`
|
|
lookup, you can use a couple of other expressions.
|
|
|
|
To use them, you need to activate the `pg_trgm extension
|
|
<https://www.postgresql.org/docs/current/pgtrgm.html>`_ on PostgreSQL. You can
|
|
install it using the
|
|
:class:`~django.contrib.postgres.operations.TrigramExtension` migration
|
|
operation.
|
|
|
|
``TrigramSimilarity``
|
|
---------------------
|
|
|
|
.. class:: TrigramSimilarity(expression, string, **extra)
|
|
|
|
Accepts a field name or expression, and a string or expression. Returns the
|
|
trigram similarity between the two arguments.
|
|
|
|
Usage example::
|
|
|
|
>>> from django.contrib.postgres.search import TrigramSimilarity
|
|
>>> Author.objects.create(name='Katy Stevens')
|
|
>>> Author.objects.create(name='Stephen Keats')
|
|
>>> test = 'Katie Stephens'
|
|
>>> Author.objects.annotate(
|
|
... similarity=TrigramSimilarity('name', test),
|
|
... ).filter(similarity__gt=0.3).order_by('-similarity')
|
|
[<Author: Katy Stevens>, <Author: Stephen Keats>]
|
|
|
|
``TrigramDistance``
|
|
-------------------
|
|
|
|
.. class:: TrigramDistance(expression, string, **extra)
|
|
|
|
Accepts a field name or expression, and a string or expression. Returns the
|
|
trigram distance between the two arguments.
|
|
|
|
Usage example::
|
|
|
|
>>> from django.contrib.postgres.search import TrigramDistance
|
|
>>> Author.objects.create(name='Katy Stevens')
|
|
>>> Author.objects.create(name='Stephen Keats')
|
|
>>> test = 'Katie Stephens'
|
|
>>> Author.objects.annotate(
|
|
... distance=TrigramDistance('name', test),
|
|
... ).filter(distance__lte=0.7).order_by('distance')
|
|
[<Author: Katy Stevens>, <Author: Stephen Keats>]
|