diff --git a/CHANGES/5930.feature b/CHANGES/5930.feature
new file mode 100644
index 00000000000..17cecee40d9
--- /dev/null
+++ b/CHANGES/5930.feature
@@ -0,0 +1 @@
+Switched ``chardet`` to ``charset-normalizer`` for guessing the HTTP payload body encoding -- :user:`Ousret`.
diff --git a/CONTRIBUTORS.txt b/CONTRIBUTORS.txt
index f94ab0f7438..f42331eeaef 100644
--- a/CONTRIBUTORS.txt
+++ b/CONTRIBUTORS.txt
@@ -7,6 +7,7 @@ Adam Horacek
 Adam Mills
 Adrian Krupa
 Adrián Chaves
+Ahmed Tahri
 Alan Tse
 Alec Hanefeld
 Alejandro Gómez
diff --git a/README.rst b/README.rst
index 6abb34bef56..143c3a59baf 100644
--- a/README.rst
+++ b/README.rst
@@ -164,14 +164,14 @@ Requirements
 
 - Python >= 3.7
 - async-timeout_
-- chardet_
+- charset-normalizer_
 - multidict_
 - yarl_
 
 Optionally you may install the cChardet_ and aiodns_ libraries (highly
 recommended for sake of speed).
 
-.. _chardet: https://pypi.python.org/pypi/chardet
+.. _charset-normalizer: https://pypi.org/project/charset-normalizer
 .. _aiodns: https://pypi.python.org/pypi/aiodns
 .. _multidict: https://pypi.python.org/pypi/multidict
 .. _yarl: https://pypi.python.org/pypi/yarl
diff --git a/aiohttp/client_reqrep.py b/aiohttp/client_reqrep.py
index 8c64db600e6..41602afe703 100644
--- a/aiohttp/client_reqrep.py
+++ b/aiohttp/client_reqrep.py
@@ -70,7 +70,7 @@
 try:
     import cchardet as chardet
 except ImportError:  # pragma: no cover
-    import chardet  # type: ignore[no-redef]
+    import charset_normalizer as chardet  # type: ignore[no-redef]
 
 
 __all__ = ("ClientRequest", "ClientResponse", "RequestInfo", "Fingerprint")
diff --git a/docs/client_reference.rst b/docs/client_reference.rst
index ed935a2da1a..86bad7f0c95 100644
--- a/docs/client_reference.rst
+++ b/docs/client_reference.rst
@@ -1374,10 +1374,10 @@ Response object
       specified *encoding* parameter.
 
       If *encoding* is ``None`` content encoding is autocalculated
-      using ``Content-Type`` HTTP header and *chardet* tool if the
+      using ``Content-Type`` HTTP header and *charset-normalizer* tool if the
       header is not provided by server.
 
-      :term:`cchardet` is used with fallback to :term:`chardet` if
+      :term:`cchardet` is used with fallback to :term:`charset-normalizer` if
       *cchardet* is not available.
 
       Close underlying connection if data reading gets an error,
@@ -1389,14 +1389,14 @@ Response object
 
       :return str: decoded *BODY*
 
-      :raise LookupError: if the encoding detected by chardet or cchardet is
+      :raise LookupError: if the encoding detected by cchardet is
                           unknown by Python (e.g. VISCII).
 
       .. note::
 
          If response has no ``charset`` info in ``Content-Type`` HTTP
-         header :term:`cchardet` / :term:`chardet` is used for content
-         encoding autodetection.
+         header :term:`cchardet` / :term:`charset-normalizer` is used for
+         content encoding autodetection.
 
          It may hurt performance. If page encoding is known passing
          explicit *encoding* parameter might help::
@@ -1411,7 +1411,7 @@ Response object
       a ``read`` call will be done,
 
       If *encoding* is ``None`` content encoding is autocalculated
-      using :term:`cchardet` or :term:`chardet` as fallback if
+      using :term:`cchardet` or :term:`charset-normalizer` as fallback if
       *cchardet* is not available.
 
       if response's `content-type` does not match `content_type` parameter
@@ -1449,11 +1449,11 @@ Response object
       Automatically detect content encoding using ``charset`` info in
       ``Content-Type`` HTTP header. If this info is not exists or there
       are no appropriate codecs for encoding then :term:`cchardet` /
-      :term:`chardet` is used.
+      :term:`charset-normalizer` is used.
 
       Beware that it is not always safe to use the result of this function to
       decode a response. Some encodings detected by cchardet are not known by
-      Python (e.g. VISCII).
+      Python (e.g. VISCII). *charset-normalizer* is not concerned by that issue.
 
       :raise RuntimeError: if called before the body has been read,
                            for :term:`cchardet` usage
diff --git a/docs/glossary.rst b/docs/glossary.rst
index c2da11817af..1de13dc7d04 100644
--- a/docs/glossary.rst
+++ b/docs/glossary.rst
@@ -45,11 +45,12 @@
       Any object that can be called. Use :func:`callable` to check
       that.
 
-   chardet
+   charset-normalizer
 
-      The Universal Character Encoding Detector
+      The Real First Universal Charset Detector.
+      Open, modern and actively maintained alternative to Chardet.
 
-      https://pypi.python.org/pypi/chardet/
+      https://pypi.org/project/charset-normalizer/
 
    cchardet
 
diff --git a/docs/index.rst b/docs/index.rst
index 0f627bd170f..6be4898e029 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -34,7 +34,7 @@ Library Installation
    $ pip install aiohttp
 
 You may want to install *optional* :term:`cchardet` library as faster
-replacement for :term:`chardet`:
+replacement for :term:`charset-normalizer`:
 
 .. code-block:: bash
 
@@ -51,7 +51,7 @@ This option is highly recommended:
 Installing speedups altogether
 ------------------------------
 
-The following will get you ``aiohttp`` along with :term:`chardet`,
+The following will get you ``aiohttp`` along with :term:`charset-normalizer`,
 :term:`aiodns` and ``Brotli`` in one bundle. No need to type
 separate commands anymore!
 
@@ -148,11 +148,11 @@ Dependencies
 
 - Python 3.7+
 - *async_timeout*
-- *chardet*
+- *charset-normalizer*
 - *multidict*
 - *yarl*
 - *Optional* :term:`cchardet` as faster replacement for
-  :term:`chardet`.
+  :term:`charset-normalizer`.
 
   Install it explicitly via:
 
diff --git a/docs/spelling_wordlist.txt b/docs/spelling_wordlist.txt
index e7a608fd658..c4182c2f06d 100644
--- a/docs/spelling_wordlist.txt
+++ b/docs/spelling_wordlist.txt
@@ -124,6 +124,7 @@ canonicalization
 canonicalize
 cchardet
 ceil
+Chardet
 charset
 charsetdetect
 chunked
@@ -228,6 +229,7 @@ namespace
 netrc
 nginx
 noop
+normalizer
 nowait
 optimizations
 os
diff --git a/requirements/base.txt b/requirements/base.txt
index add03f1d06a..4c995d352d6 100644
--- a/requirements/base.txt
+++ b/requirements/base.txt
@@ -6,7 +6,7 @@ async-timeout==4.0.0a3
 asynctest==0.13.0; python_version<"3.8"
 Brotli==1.0.9
 cchardet==2.1.7
-chardet==4.0.0
+charset-normalizer==2.0.4
 frozenlist==1.2.0
 gunicorn==20.1.0
 typing_extensions==3.7.4.3
diff --git a/requirements/dev.txt b/requirements/dev.txt
index bd79518db7d..df6a12a9ad4 100644
--- a/requirements/dev.txt
+++ b/requirements/dev.txt
@@ -46,7 +46,7 @@ cfgv==3.2.0
     # via
     #   -r requirements/lint.txt
     #   pre-commit
-chardet==4.0.0
+charset-normalizer==2.0.4
     # via
     #   -r requirements/base.txt
     #   requests
diff --git a/setup.py b/setup.py
index d9c7ef68a04..a73d331ea07 100644
--- a/setup.py
+++ b/setup.py
@@ -50,7 +50,7 @@
     raise RuntimeError("Unable to determine version.")
 
 install_requires = [
-    "chardet>=2.0,<5.0",
+    "charset-normalizer>=2.0,<3.0",
     "multidict>=4.5,<7.0",
     "async_timeout>=4.0a2,<5.0",
     'asynctest==0.13.0; python_version<"3.8"',
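
Reviewer note: the documentation hunks in this patch describe a lookup order for the response encoding: use the ``charset`` declared in ``Content-Type`` when Python has a codec for it, otherwise ask a detector, and treat detector results Python does not know (e.g. VISCII) as unusable. The sketch below illustrates that order using only the standard library. It is an assumption-laden illustration, not aiohttp's implementation: `charset_from_header`, `is_known_codec`, `pick_encoding`, the `detect` callable and the `utf-8` default are all hypothetical, and `detect` merely stands in for whichever detector (cchardet or charset-normalizer) happens to be installed.

```python
import codecs
from email.message import Message
from typing import Callable, Optional


def charset_from_header(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def is_known_codec(name: str) -> bool:
    """True if Python has a codec registered under this name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


def pick_encoding(
    content_type: str,
    body: bytes,
    detect: Callable[[bytes], Optional[str]],  # hypothetical detector hook
    default: str = "utf-8",  # assumed last-resort value, not from the patch
) -> str:
    # 1. Prefer the charset declared by the server, if Python can decode it.
    declared = charset_from_header(content_type)
    if declared and is_known_codec(declared):
        return declared
    # 2. Otherwise fall back to content-based detection.
    guessed = detect(body)
    if guessed and is_known_codec(guessed):
        return guessed
    # 3. Detector gave nothing usable (e.g. a name with no Python codec).
    return default
```

Passing the detector in as a callable keeps the sketch independent of any third-party package; checking the detector's answer with `codecs.lookup` is one way to avoid the `LookupError` that the documentation warns about when decoding.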