1
0
mirror of https://github.com/django/django.git synced 2025-01-10 18:36:05 +00:00
django/docs/ref/databases.txt
Malcolm Tredinnick f2b389b354 Added documentation to explain the gains and losses when using utf8_bin
collation in MySQL. This should help people to make a reasonably informed
decision. Usually, leaving the MySQL collation alone will be the best solution,
but if you must change it, this gives a start to the information you need and
pointers to the appropriate place in the MySQL docs.

There's a small chance I also got all the necessary Sphinx markup correct, too
(it builds without errors, but I may have missed some chances for glory and
linkage).

Fixed #2335, #8506.


git-svn-id: http://code.djangoproject.com/svn/django/trunk@8568 bcc190cf-cafb-0310-a4f2-bffc1f526a37
2008-08-26 01:59:25 +00:00

361 lines
15 KiB
Plaintext

.. _ref-databases:
===============================
Notes about supported databases
===============================
Django attempts to support as many features as possible on all database
backends. However, not all database backends are alike, and we've had to make
design decisions on which features to support and which assumptions we can make
safely.
This file describes some of the features that might be relevant to Django
usage. Of course, it is not intended as a replacement for server-specific
documentation or reference manuals.
MySQL notes
===========
Django expects the database to support transactions, referential integrity,
and Unicode support (UTF-8 encoding). Fortunately, MySQL_ has all these
features as available as far back as 3.23. While it may be possible to use
3.23 or 4.0, you'll probably have less trouble if you use 4.1 or 5.0.
MySQL 4.1
---------
`MySQL 4.1`_ has greatly improved support for character sets. It is possible to
set different default character sets on the database, table, and column.
Previous versions have only a server-wide character set setting. It's also the
first version where the character set can be changed on the fly. 4.1 also has
support for views, but Django currently doesn't use views.
MySQL 5.0
---------
`MySQL 5.0`_ adds the ``information_schema`` database, which contains detailed
data on all database schema. Django's ``inspectdb`` feature uses this
``information_schema`` if it's available. 5.0 also has support for stored
procedures, but Django currently doesn't use stored procedures.
.. _MySQL: http://www.mysql.com/
.. _MySQL 4.1: http://dev.mysql.com/doc/refman/4.1/en/index.html
.. _MySQL 5.0: http://dev.mysql.com/doc/refman/5.0/en/index.html
Storage engines
---------------
MySQL has several `storage engines`_ (previously called table types). You can
change the default storage engine in the server configuration.
The default engine is MyISAM_. The main drawback of MyISAM is that it doesn't
currently support transactions or foreign keys. On the plus side, it's
currently the only engine that supports full-text indexing and searching.
The InnoDB_ engine is fully transactional and supports foreign key references.
The BDB_ engine, like InnoDB, is also fully transactional and supports foreign
key references. However, its use seems to be deprecated.
`Other storage engines`_, including SolidDB_ and Falcon_, are on the horizon.
For now, InnoDB is probably your best choice.
.. _storage engines: http://dev.mysql.com/doc/refman/5.0/en/storage-engines.html
.. _MyISAM: http://dev.mysql.com/doc/refman/5.0/en/myisam-storage-engine.html
.. _BDB: http://dev.mysql.com/doc/refman/5.0/en/bdb-storage-engine.html
.. _InnoDB: http://dev.mysql.com/doc/refman/5.0/en/innodb.html
.. _Other storage engines: http://dev.mysql.com/doc/refman/5.1/en/storage-engines-other.html
.. _SolidDB: http://forge.mysql.com/projects/view.php?id=139
.. _Falcon: http://dev.mysql.com/doc/falcon/en/index.html
MySQLdb
-------
`MySQLdb`_ is the Python interface to MySQL. Version 1.2.1p2 or later is
required for full MySQL support in Django.
.. note::
If you see ``ImportError: cannot import name ImmutableSet`` when trying to
use Django, your MySQLdb installation may contain an outdated ``sets.py``
file that conflicts with the built-in module of the same name from Python
2.4 and later. To fix this, verify that you have installed MySQLdb version
1.2.1p2 or newer, then delete the ``sets.py`` file in the MySQLdb
directory that was left by an earlier version.
.. _MySQLdb: http://sourceforge.net/projects/mysql-python
Creating your database
----------------------
You can `create your database`_ using the command-line tools and this SQL::
CREATE DATABASE <dbname> CHARACTER SET utf8;
This ensures all tables and columns will use UTF-8 by default.
.. _create your database: http://dev.mysql.com/doc/refman/5.0/en/create-database.html
.. _mysql-collation:
Collation settings
~~~~~~~~~~~~~~~~~~
The collation setting for a column controls the order in which data is sorted
as well as what strings compare as equal. It can be set on a database-wide
level and also per-table and per-column. This is `documented thoroughly`_ in
the MySQL documentation. In all cases, you set the collation by directly
manipulating the database tables; Django doesn't provide a way to set this on
the model definition.
.. _documented thoroughly: http://dev.mysql.com/doc/refman/5.0/en/charset.html
By default, with a UTF-8 database, MySQL will use the
``utf8_general_ci_swedish`` collation. This results in all string equality
comparisons being done in a *case-insensitive* manner. That is, ``"Fred"`` and
``"freD"`` are considered equal at the database level. If you have a unique
constraint on a field, it would be illegal to try to insert both ``"aa"`` and
``"AA"`` into the same column, since they compare as equal (and, hence,
non-unique) with the default collation.
In many cases, this default will not be a problem. However, if you really want
case-sensitive comparisons on a particular column or table, you would change
the column or table to use the ``utf8_bin`` collation. The main thing to be
aware of in this case is that if you are using MySQLdb 1.2.2, the database backend in Django will then return
bytestrings (instead of unicode strings) for any character fields it returns
receive from the database. This is a strong variation from Django's normal
practice of *always* returning unicode strings. It is up to you, the
developer, to handle the fact that you will receive bytestrings if you
configure your table(s) to use ``utf8_bin`` collation. Django itself should work
smoothly with such columns, but if your code must be prepared to call
``django.utils.encoding.smart_unicode()`` at times if it really wants to work
with consistent data -- Django will not do this for you (the database backend
layer and the model population layer are separated internally so the database
layer doesn't know it needs to make this conversion in this one particular
case).
If you're using MySQLdb 1.2.1p2, Django's standard
:class:`~django.db.models.CharField` class will return unicode strings even
with ``utf8_bin`` collation. However, :class:`~django.db.models.TextField`
fields will be returned as an ``array.array`` instance (from Python's standard
``array`` module). There isn't a lot Django can do about that, since, again,
the information needed to make the necessary conversions isn't available when
the data is read in from the database. This problem was `fixed in MySQLdb
1.2.2`_, so if you want to use :class:`~django.db.models.TextField` with
``utf8_bin`` collation, upgrading to version 1.2.2 and then dealing with the
bytestrings (which shouldn't be too difficult) is the recommended solution.
Should you decide to use ``utf8_bin`` collation for some of your tables with
MySQLdb 1.2.1p2, you should still use ``utf8_collation_ci_swedish`` (the
default) collation for the :class:`django.contrib.sessions.models.Session`
table (usually called ``django_session`` and the table
:class:`django.contrib.admin.models.LogEntry` table (usually called
``django_admin_log``). Those are the two standard tables that use
:class:`~django.db.model.TextField` internally.
.. _fixed in MySQLdb 1.2.2: http://sourceforge.net/tracker/index.php?func=detail&aid=1495765&group_id=22307&atid=374932
Connecting to the database
--------------------------
Refer to the :ref:`settings documentation <ref-settings>`.
Connection settings are used in this order:
1. :setting:`DATABASE_OPTIONS`.
2. :setting:`DATABASE_NAME`, :setting:`DATABASE_USER`,
:setting:`DATABASE_PASSWORD`, :setting:`DATABASE_HOST`,
:setting:`DATABASE_PORT`
3. MySQL option files.
In other words, if you set the name of the database in ``DATABASE_OPTIONS``,
this will take precedence over ``DATABASE_NAME``, which would override
anything in a `MySQL option file`_.
Here's a sample configuration which uses a MySQL option file::
# settings.py
DATABASE_ENGINE = "mysql"
DATABASE_OPTIONS = {
'read_default_file': '/path/to/my.cnf',
}
# my.cnf
[client]
database = DATABASE_NAME
user = DATABASE_USER
password = DATABASE_PASSWORD
default-character-set = utf8
Several other MySQLdb connection options may be useful, such as ``ssl``,
``use_unicode``, ``init_command``, and ``sql_mode``. Consult the
`MySQLdb documentation`_ for more details.
.. _MySQL option file: http://dev.mysql.com/doc/refman/5.0/en/option-files.html
.. _MySQLdb documentation: http://mysql-python.sourceforge.net/
Creating your tables
--------------------
When Django generates the schema, it doesn't specify a storage engine, so
tables will be created with whatever default storage engine your database
server is configured for. The easiest solution is to set your database server's
default storage engine to the desired engine.
If you're using a hosting service and can't change your server's default
storage engine, you have a couple of options.
* After the tables are created, execute an ``ALTER TABLE`` statement to
convert a table to a new storage engine (such as InnoDB)::
ALTER TABLE <tablename> ENGINE=INNODB;
This can be tedious if you have a lot of tables.
* Another option is to use the ``init_command`` option for MySQLdb prior to
creating your tables::
DATABASE_OPTIONS = {
# ...
"init_command": "SET storage_engine=INNODB",
# ...
}
This sets the default storage engine upon connecting to the database.
After your tables have been created, you should remove this option.
* Another method for changing the storage engine is described in
AlterModelOnSyncDB_.
.. _AlterModelOnSyncDB: http://code.djangoproject.com/wiki/AlterModelOnSyncDB
.. _oracle-notes:
Oracle notes
============
Django supports `Oracle Database Server`_ versions 9i and higher. Oracle
version 10g or later is required to use Django's ``regex`` and ``iregex`` query
operators. You will also need the `cx_Oracle`_ driver, version 4.3.1 or newer.
.. _`Oracle Database Server`: http://www.oracle.com/
.. _`cx_Oracle`: http://cx-oracle.sourceforge.net/
In order for the ``python manage.py syncdb`` command to work, your Oracle
database user must have privileges to run the following commands:
* CREATE TABLE
* CREATE SEQUENCE
* CREATE PROCEDURE
* CREATE TRIGGER
To run Django's test suite, the user needs these *additional* privileges:
* CREATE USER
* DROP USER
* CREATE TABLESPACE
* DROP TABLESPACE
Connecting to the database
--------------------------
Your Django settings.py file should look something like this for Oracle::
DATABASE_ENGINE = 'oracle'
DATABASE_NAME = 'xe'
DATABASE_USER = 'a_user'
DATABASE_PASSWORD = 'a_password'
DATABASE_HOST = ''
DATABASE_PORT = ''
If you don't use a ``tnsnames.ora`` file or a similar naming method that
recognizes the SID ("xe" in this example), then fill in both
:setting:`DATABASE_HOST` and :setting:`DATABASE_PORT` like so::
DATABASE_ENGINE = 'oracle'
DATABASE_NAME = 'xe'
DATABASE_USER = 'a_user'
DATABASE_PASSWORD = 'a_password'
DATABASE_HOST = 'dbprod01ned.mycompany.com'
DATABASE_PORT = '1540'
You should supply both :setting:`DATABASE_HOST` and :setting:`DATABASE_PORT`, or leave both
as empty strings.
Tablespace options
------------------
A common paradigm for optimizing performance in Oracle-based systems is the
use of `tablespaces`_ to organize disk layout. The Oracle backend supports
this use case by adding ``db_tablespace`` options to the ``Meta`` and
``Field`` classes. (When you use a backend that lacks support for tablespaces,
Django ignores these options.)
.. _`tablespaces`: http://en.wikipedia.org/wiki/Tablespace
A tablespace can be specified for the table(s) generated by a model by
supplying the ``db_tablespace`` option inside the model's ``class Meta``.
Additionally, you can pass the ``db_tablespace`` option to a ``Field``
constructor to specify an alternate tablespace for the ``Field``'s column
index. If no index would be created for the column, the ``db_tablespace``
option is ignored::
class TablespaceExample(models.Model):
name = models.CharField(max_length=30, db_index=True, db_tablespace="indexes")
data = models.CharField(max_length=255, db_index=True)
edges = models.ManyToManyField(to="self", db_tablespace="indexes")
class Meta:
db_tablespace = "tables"
In this example, the tables generated by the ``TablespaceExample`` model
(i.e., the model table and the many-to-many table) would be stored in the
``tables`` tablespace. The index for the name field and the indexes on the
many-to-many table would be stored in the ``indexes`` tablespace. The ``data``
field would also generate an index, but no tablespace for it is specified, so
it would be stored in the model tablespace ``tables`` by default.
**New in the Django development version:** Use the :setting:`DEFAULT_TABLESPACE`
and :setting:`DEFAULT_INDEX_TABLESPACE` settings to specify default values for
the db_tablespace options. These are useful for setting a tablespace for the
built-in Django apps and other applications whose code you cannot control.
Django does not create the tablespaces for you. Please refer to `Oracle's
documentation`_ for details on creating and managing tablespaces.
.. _`Oracle's documentation`: http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_7003.htm#SQLRF01403
Naming issues
-------------
Oracle imposes a name length limit of 30 characters. To accommodate this, the
backend truncates database identifiers to fit, replacing the final four
characters of the truncated name with a repeatable MD5 hash value.
NULL and empty strings
----------------------
Django generally prefers to use the empty string ('') rather than NULL, but
Oracle treats both identically. To get around this, the Oracle backend
coerces the ``null=True`` option on fields that permit the empty string as a
value. When fetching from the database, it is assumed that a NULL value in
one of these fields really means the empty string, and the data is silently
converted to reflect this assumption.
``TextField`` limitations
-------------------------
The Oracle backend stores ``TextFields`` as ``NCLOB`` columns. Oracle imposes
some limitations on the usage of such LOB columns in general:
* LOB columns may not be used as primary keys.
* LOB columns may not be used in indexes.
* LOB columns may not be used in a ``SELECT DISTINCT`` list. This means that
attempting to use the ``QuerySet.distinct`` method on a model that
includes ``TextField`` columns will result in an error when run against
Oracle. A workaround to this is to keep ``TextField`` columns out of any
models that you foresee performing ``distinct()`` queries on, and to
include the ``TextField`` in a related model instead.