To use the simple `filename="..."` form, the value must conform to the
official grammar from RFC6266[^1]:
filename-parm = "filename" "=" value
value = <value, defined in [RFC2616], Section 3.6>
; token | quoted-string
The `quoted-string` definition comes from RFC 9110[^2]:
```
quoted-string = DQUOTE *( qdtext / quoted-pair ) DQUOTE
qdtext = HTAB / SP / %x21 / %x23-5B / %x5D-7E / obs-text
The backslash octet ("\") can be used as a single-octet quoting
mechanism within quoted-string and comment constructs. Recipients that
process the value of a quoted-string MUST handle a quoted-pair as if
it were replaced by the octet following the backslash.
quoted-pair = "\" ( HTAB / SP / VCHAR / obs-text )
A sender SHOULD NOT generate a quoted-pair in a quoted-string except
where necessary to quote DQUOTE and backslash octets occurring within
that string.
```
That is, quoted strings are able to express horizontal tabs, space
characters, and everything in the range from 0x21 to 0x7e, expect for
0x22 (`"`) and 0x5C (`\`), which can still be expressed but must be
escaped with their own `\`.
We ignore the case of `obs-text`, which is defined as the range
0x80-0xFF, since its presence is there for permissive parsing of
accidental high-bit characters, and it should not be generated by
conforming implementations.
Transform this character range into a regex and apply it in addition
to the "is ASCII" check. This ensures that all simple filenames are
expressed in the simple format, and that all filenames with newlines
and other control characters are properly expressed with the
percent-encoded `filename*=...`form.
[^1]: https://datatracker.ietf.org/doc/html/rfc6266#section-4.1
[^2]: https://datatracker.ietf.org/doc/html/rfc9110#name-quoted-strings
- Updated references to RFC 1123 to RFC 5322
- Only partial as RFC 5322 sort of sub-references RFC 1123.
- Updated references to RFC 2388 to RFC 7578
- Except RFC 2388 Section 5.3 which has no equivalent.
- Updated references to RFC 2396 to RFC 3986
- Updated references to RFC 2616 to RFC 9110
- Updated references to RFC 3066 to RFC 5646
- Updated references to RFC 7230 to RFC 9112
- Updated references to RFC 7231 to RFC 9110
- Updated references to RFC 7232 to RFC 9110
- Updated references to RFC 7234 to RFC 9111
- Tidied up style of text when referring to RFC documents
Refs Python CVE-2022-0391. Django is not affected, but others who
incorrectly use internal function url_has_allowed_host_and_scheme()
with unsanitized input could be at risk.
- Replaced datetime.utcnow() with datetime.now().
- Replaced datetime.utcfromtimestamp() with datetime.fromtimestamp().
- Replaced datetime.utctimetuple() with datetime.timetuple().
- Replaced calendar.timegm() and datetime.utctimetuple() with datetime.timestamp().
Previously, when running the Django test suite with warnings enabled,
the following was emitted:
/usr/lib64/python3.7/urllib/parse.py:915: BytesWarning: str() on a bytearray instance
v = quote_via(str(v), safe, encoding, errors)
This occurred due to the bytearray() being passed to
urllib.parse.urlencode() which eventually calls str() on it. The test
does not represent desired real world behavior. Rather than test for and
assert strange unspecified behavior that emits a warning, remove it.
This was also discussed in PR #11374.