Mirror of https://github.com/django/django.git (synced 2025-10-24 14:16:09 +00:00)
Committed by: Tim Graham
Parent: 888c1e9bfe
Commit: d1bab24e01
@@ -3,8 +3,8 @@ Unicode data
 ============
 
 Django natively supports Unicode data everywhere. Providing your database can
-somehow store the data, you can safely pass around Unicode strings to
-templates, models and the database.
+somehow store the data, you can safely pass around strings to templates,
+models, and the database.
 
 This document tells you what you need to know if you're writing applications
 that use data or templates that are encoded in something other than ASCII.
@@ -35,11 +35,10 @@ able to store certain characters in the database, and information will be lost.
 .. _section 2: https://docs.oracle.com/database/121/NLSPG/ch2charset.htm#NLSPG002
 .. _section 11: https://docs.oracle.com/database/121/NLSPG/ch11charsetmig.htm#NLSPG011
 
-All of Django's database backends automatically convert Unicode strings into
+All of Django's database backends automatically convert strings into
 the appropriate encoding for talking to the database. They also automatically
-convert strings retrieved from the database into Python Unicode strings. You
-don't even need to tell Django what encoding your database uses: that is
-handled transparently.
+convert strings retrieved from the database into strings. You don't even need
+to tell Django what encoding your database uses: that is handled transparently.
 
 For more, see the section "The database API" below.
 
@@ -48,7 +47,7 @@ General string handling
 
 Whenever you use strings with Django -- e.g., in database lookups, template
 rendering or anywhere else -- you have two choices for encoding those strings.
-You can use normal Unicode strings or bytestrings (starting with a 'b').
+You can use normal strings or bytestrings (starting with a 'b').
 
 .. warning::
 
@@ -74,13 +73,13 @@ using your application -- and if that person chooses a different setting, your
 code must still continue to work. Ergo, it cannot rely on that setting.
 
 In most cases when Django is dealing with strings, it will convert them to
-Unicode strings before doing anything else. So, as a general rule, if you pass
-in a bytestring, be prepared to receive a Unicode string back in the result.
+strings before doing anything else. So, as a general rule, if you pass
+in a bytestring, be prepared to receive a string back in the result.
 
 Translated strings
 ------------------
 
-Aside from Unicode strings and bytestrings, there's a third type of string-like
+Aside from strings and bytestrings, there's a third type of string-like
 object you may encounter when using Django. The framework's
 internationalization features introduce the concept of a "lazy translation" --
 a string that has been marked as translated but whose actual translation result
@@ -93,7 +92,7 @@ Normally, you won't have to worry about lazy translations. Just be aware that
 if you examine an object and it claims to be a
 ``django.utils.functional.__proxy__`` object, it is a lazy translation.
 Calling ``str()`` with the lazy translation as the argument will generate a
-Unicode string in the current locale.
+string in the current locale.
 
 For more details about lazy translation objects, refer to the
 :doc:`internationalization </topics/i18n/index>` documentation.
@@ -102,17 +101,17 @@ Useful utility functions
 ------------------------
 
 Because some string operations come up again and again, Django ships with a few
-useful functions that should make working with Unicode and bytestring objects
+useful functions that should make working with string and bytestring objects
 a bit easier.
 
 Conversion functions
 ~~~~~~~~~~~~~~~~~~~~
 
 The ``django.utils.encoding`` module contains a few functions that are handy
-for converting back and forth between Unicode and bytestrings.
+for converting back and forth between strings and bytestrings.
 
 * ``smart_text(s, encoding='utf-8', strings_only=False, errors='strict')``
-  converts its input to a Unicode string. The ``encoding`` parameter
+  converts its input to a string. The ``encoding`` parameter
   specifies the input encoding. (For example, Django uses this internally
   when processing form input data, which might not be UTF-8 encoded.) The
   ``strings_only`` parameter, if set to True, will result in Python
@@ -126,7 +125,7 @@ for converting back and forth between Unicode and bytestrings.
   cases. The difference is when the first argument is a :ref:`lazy
   translation <lazy-translations>` instance. While ``smart_text()``
   preserves lazy translations, ``force_text()`` forces those objects to a
-  Unicode string (causing the translation to occur). Normally, you'll want
+  string (causing the translation to occur). Normally, you'll want
   to use ``smart_text()``. However, ``force_text()`` is useful in
   template tags and filters that absolutely *must* have a string to work
   with, not just something that can be converted to a string.
@@ -139,8 +138,8 @@ for converting back and forth between Unicode and bytestrings.
   but the difference is needed in a few places within Django's internals.
 
 Normally, you'll only need to use ``force_text()``. Call it as early as
-possible on any input data that might be either Unicode or a bytestring, and
-from then on, you can treat the result as always being Unicode.
+possible on any input data that might be either a string or a bytestring, and
+from then on, you can treat the result as always being a string.
 
 .. _uri-and-iri-handling:
 
@@ -225,13 +224,13 @@ double-quoting problems.
 Models
 ======
 
-Because all strings are returned from the database as Unicode strings, model
+Because all strings are returned from the database as ``str`` objects, model
 fields that are character based (CharField, TextField, URLField, etc.) will
 contain Unicode values when Django retrieves data from the database. This
 is *always* the case, even if the data could fit into an ASCII bytestring.
 
 You can pass in bytestrings when creating a model or populating a field, and
-Django will convert it to Unicode when it needs to.
+Django will convert it to strings when it needs to.
 
 Taking care in ``get_absolute_url()``
 -------------------------------------
@@ -263,7 +262,7 @@ non-ASCII characters would have been removed in quoting in the first line.)
 The database API
 ================
 
-You can pass either Unicode strings or UTF-8 bytestrings as arguments to
+You can pass either strings or UTF-8 bytestrings as arguments to
 ``filter()`` methods and the like in the database API. The following two
 querysets are identical::
 
@@ -273,11 +272,12 @@ querysets are identical::
 Templates
 =========
 
-You can use either Unicode or bytestrings when creating templates manually::
+You can use either strings or UTF-8 bytestrings when creating templates
+manually::
 
     from django.template import Template
     t1 = Template(b'This is a bytestring template.')
-    t2 = Template('This is a Unicode template.')
+    t2 = Template('This is a string template.')
 
 But the common case is to read templates from the filesystem, and this creates
 a slight complication: not all filesystems store their data encoded as UTF-8.
@@ -294,13 +294,13 @@ Template tags and filters
 
 A couple of tips to remember when writing your own template tags and filters:
 
-* Always return Unicode strings from a template tag's ``render()`` method
+* Always return strings from a template tag's ``render()`` method
   and from template filters.
 
 * Use ``force_text()`` in preference to ``smart_text()`` in these
   places. Tag rendering and filter calls occur as the template is being
   rendered, so there is no advantage to postponing the conversion of lazy
-  translation objects into strings. It's easier to work solely with Unicode
+  translation objects into strings. It's easier to work solely with
   strings at that point.
 
 .. _unicode-files:
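
Note on the "convert early" advice in the ``-139,8`` hunk: the idea of normalizing input that may be either a string or a bytestring can be sketched in plain Python. This is only an illustration, not Django's implementation. ``to_text`` is a hypothetical stand-in for ``force_text()``, it assumes UTF-8 input by default, and it ignores lazy translation objects.

```python
def to_text(value, encoding="utf-8", errors="strict"):
    """Hypothetical helper: return ``value`` as a ``str``.

    Bytestrings are decoded with ``encoding``; anything else goes through
    ``str()``. This mirrors the behaviour the docs describe, not Django's code.
    """
    if isinstance(value, bytes):
        return value.decode(encoding, errors)
    return str(value)


# Normalize once at the boundary, then treat the result as a string.
raw_inputs = [b"caf\xc3\xa9", "café", 42]
normalized = [to_text(item) for item in raw_inputs]
```

After this step every element of ``normalized`` is a ``str``, so later code does not need to branch on the input type.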