Removed obsolete and potentially problematic IDNA 2003 ("punycode")
encoding of international domain names in smart_urlquote() and Urlizer,
which are used (only) by AdminURLFieldWidget and the urlize/urlizetrunc
template filters. Changed to use percent-encoded UTF-8, which defers
IDNA details to the browser (like other URLs rendered by Django).
Backport of 29ba75e6e57414f0e6f9528d08a520b8b931fb28 from main.
Thanks Saravana Kumar for the report, and Sarah Boyce and Mariusz
Felisiak for the reviews.
Co-authored-by: Natalia <124304+nessita@users.noreply.github.com>
To use the simple `filename="..."` form, the value must conform to the
official grammar from RFC6266[^1]:
filename-parm = "filename" "=" value
value = <value, defined in [RFC2616], Section 3.6>
; token | quoted-string
The `quoted-string` definition comes from RFC 9110[^2]:
```
quoted-string = DQUOTE *( qdtext / quoted-pair ) DQUOTE
qdtext = HTAB / SP / %x21 / %x23-5B / %x5D-7E / obs-text
The backslash octet ("\") can be used as a single-octet quoting
mechanism within quoted-string and comment constructs. Recipients that
process the value of a quoted-string MUST handle a quoted-pair as if
it were replaced by the octet following the backslash.
quoted-pair = "\" ( HTAB / SP / VCHAR / obs-text )
A sender SHOULD NOT generate a quoted-pair in a quoted-string except
where necessary to quote DQUOTE and backslash octets occurring within
that string.
```
That is, quoted strings are able to express horizontal tabs, space
characters, and everything in the range from 0x21 to 0x7e, expect for
0x22 (`"`) and 0x5C (`\`), which can still be expressed but must be
escaped with their own `\`.
We ignore the case of `obs-text`, which is defined as the range
0x80-0xFF, since its presence is there for permissive parsing of
accidental high-bit characters, and it should not be generated by
conforming implementations.
Transform this character range into a regex and apply it in addition
to the "is ASCII" check. This ensures that all simple filenames are
expressed in the simple format, and that all filenames with newlines
and other control characters are properly expressed with the
percent-encoded `filename*=...`form.
[^1]: https://datatracker.ietf.org/doc/html/rfc6266#section-4.1
[^2]: https://datatracker.ietf.org/doc/html/rfc9110#name-quoted-strings
Urlizer was not properly encoding email addresses containing punctuation
in generated mailto links. Per RFC 6068, fixed by percent encoding
(urllib.parse.quote) the local and domain address parts.
Over the years we've had multiple instances of hit and misses when
emitting warnings: either setting the wrong stacklevel or not setting
it at all.
This work adds assertions for the existing warnings that were declaring
the correct stacklevel, but were lacking tests for it.
This makes it easier to work with lazy iterators used for callables,
etc. when extracting items or comparing to lists, e.g. during testing.
Also added `BaseChoiceIterator.__iter__()` to make it clear that
subclasses must implement this and added `__all__` to the module.
Co-authored-by: Adam Johnson <me@adamj.eu>
Co-authored-by: Natalia Bidart <124304+nessita@users.noreply.github.com>
Previously, the proxy class was prepared lazily:
lazy_identity = lazy(identity, int)
lazy_identity(10) # prepared here
lazy_identity(10)
This has a slight advantage that if the lazy doesn't end up getting
used, the preparation work is skipped, however that's not very likely.
Besides this laziness, it is also inconsistent in that the methods which
are wrapped directly (__str__ etc.) are prepared already when __proxy__
is defined, and there is a weird half-initialized state.
This change it so that everything is prepared already on the first line
of the example above.
If the result type is bytes, then calling bytes() on it does nothing.
If the result type is not bytes, we should not cast to bytes, just
because the return value may be bytes.
This removes __text_cast() as it's the same as __cast().
_delegate_bytes and __delegate_text are mutually exclusive so the
`if self._delegate_bytes` branch in __cast() is unreachable.
Co-Authored-By: David Sanders <shang.xiao.sanders@gmail.com>