Django

Ticket #5778 (closed: fixed)

Opened 1 year ago

Last modified 1 year ago

Email subjects not encoded properly

Reported by: Thomas Petazzoni <thomas.petazzoni@enix.org>
Assigned to: nobody
Milestone:
Component: Core framework
Version: SVN
Keywords:
Cc:
Triage Stage: Accepted
Has patch: 0
Needs documentation: 0
Needs tests: 0
Patch needs improvement: 0

Description

When providing a UTF-8 encoded subject to the EmailMessage class constructor, the subject is sent directly in UTF-8, without being encoded in Quoted-Printable or Base64. When the recipient's MUA runs on a machine using UTF-8, it "works", but for recipients whose machines use ISO-8859-x or another non-UTF-8 charset, the subject appears broken.

I think the problem comes from the implementation of the __setitem__ method of the SafeMIMEText class. It only uses the Header() class when str(force_unicode(val)) raises an exception, which it doesn't in my case (I suppose because my subject is properly UTF-8 encoded). However, I'd say it should *always* use Header(), which properly turns a UTF-8 string into a quoted-printable string.
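As an illustration of the "always use Header()" proposal, here is a minimal Python 3 sketch using the standard library's email.header module. It is not Django's SafeMIMEText code, and the helper name encode_subject_always is hypothetical. The encoded result contains only ASCII and decodes back to the original subject, which is what a MUA needs in order to display it regardless of its local charset.

```python
from email.header import Header, decode_header


def encode_subject_always(subject: str, charset: str = "utf-8") -> str:
    """Hypothetical helper: always wrap the subject in an RFC 2047 encoded word."""
    return Header(subject, charset).encode()


subject = 'Nouvelle question "Et ça marche bien é ?"'
encoded = encode_subject_always(subject)
print(encoded)  # an ASCII-only =?utf-8?...?= encoded word

# decode_header() returns (bytes, charset) pairs for encoded words,
# or (str, None) for plain text; join the pieces back into one string.
decoded = "".join(
    chunk.decode(cs or "ascii") if isinstance(chunk, bytes) else chunk
    for chunk, cs in decode_header(encoded)
)
print(decoded == subject)
```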

I'm running Django trunk at r6526.

Don't hesitate to ask for further details if needed.

Attachments

django-mail-encoding-header-fix (2.7 kB) - added by Thomas Petazzoni <thomas.petazzoni@enix.org> on 10/18/07 16:03:41.
Ugly patch that fixes the problem for me

Change History

10/18/07 16:03:41 changed by Thomas Petazzoni <thomas.petazzoni@enix.org>

  • attachment django-mail-encoding-header-fix added.

Ugly patch that fixes the problem for me

10/19/07 05:25:33 changed by mtredinnick

  • needs_better_patch changed.
  • stage changed from Unreviewed to Accepted.
  • needs_tests changed.
  • needs_docs changed.

Yes. Good catch.

I'll have to check the behaviour of Header; from memory, it might need some tweaking. The point is that when I was writing the current code, there were times when headers were being pointlessly encoded even when they could be represented directly (particularly ASCII text). So, provided it doesn't try to wrap ASCII up in anything fancy, this is the right fix. Otherwise, we need to check that the data really is non-ASCII before making a Header() out of it.

10/19/07 08:19:25 changed by tpetazzoni

You're right, it does some pointless encoding when the string is pure ASCII, for example:

From: =?utf-8?q?Trivialibre?= <trivialibre@enix.org>

In that case, the =?utf-8?q? stuff is useless.

So, instead of verifying whether the string is Unicode (with force_unicode), the code should probably check whether the string is 7-bit ASCII only. Do you want me to provide an improved fix, or are you going to do it?
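A minimal Python 3 sketch of the suggested check, assuming the standard library's email.header.Header as the encoder (the helper name encode_header_value is hypothetical, not Django's API). Pure 7-bit ASCII passes through unchanged, and only non-ASCII values become an encoded word.

```python
from email.header import Header


def encode_header_value(value: str, charset: str = "utf-8") -> str:
    """Hypothetical helper: only build an encoded word for non-ASCII values.

    The explicit ASCII encode does not depend on the interpreter's
    default encoding.
    """
    try:
        value.encode("ascii")
    except UnicodeEncodeError:
        # Non-ASCII: emit an RFC 2047 encoded word.
        return Header(value, charset).encode()
    # Pure ASCII: send as-is, avoiding a pointless =?utf-8?q?...?= wrapper.
    return value


print(encode_header_value("Trivialibre"))        # Trivialibre
print(encode_header_value("Et ça marche bien"))  # an =?utf-8?...?= encoded word
```

On Python 3.7+, value.isascii() is an equivalent test for the ASCII check.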

10/19/07 22:10:26 changed by mtredinnick

I was thinking about this some more and I'm not sure I completely understand the problem any longer. force_unicode() forces the input to a unicode object and str() will raise an error for any data that isn't ASCII.

So, for example

>>> str(force_unicode('\xc3\x85ngstr\xc3\xb6m'))    # A UTF-8 bytestring
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc5' in position 0: ordinal not in range(128)

The only way I could see this failing is if somebody had changed Python's default encoding, which is well advertised as being something that shouldn't be done, for exactly this sort of reason.

What is an example of a header string that is causing the problem? And what does sys.getdefaultencoding() return?

10/20/07 02:29:39 changed by tpetazzoni

From a raw Python shell, with PYTHONPATH=/path/to/django:

thomas@toulibre:/srv/www/trivialibre.humanoidz.org$ python
Python 2.4.4 (#2, Apr  5 2007, 20:11:18) 
[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> print sys.getdefaultencoding()
ascii
>>> from django.utils.encoding import force_unicode
>>> str(force_unicode('\xc3\x85ngstr\xc3\xb6m'))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc5' in position 0: ordinal not in range(128)
>>> str(force_unicode('Nouvelle question "Et ça marche bien é ?"'))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 22: ordinal not in range(128)

So here, it works properly. Now, from a Python shell run using "manage.py shell", still with PYTHONPATH=/path/to/django/:

thomas@toulibre:/srv/www/trivialibre.humanoidz.org$ ./trivialibre/tvl/manage.py shell
Python 2.4.4 (#2, Apr  5 2007, 20:11:18) 
[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> import sys
>>> print sys.getdefaultencoding()
utf-8
>>> from django.utils.encoding import force_unicode
>>> str(force_unicode('\xc3\x85ngstr\xc3\xb6m'))
'\xc3\x85ngstr\xc3\xb6m'
>>> str(force_unicode('Nouvelle question "Et ça marche bien é ?"'))
'Nouvelle question "Et \xc3\xa7a marche bien \xc3\xa9 ?"'
>>> 

The second string tested above is the one I was using for my tests. But yours also perfectly shows the problem.
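The two shell sessions above differ only in what sys.getdefaultencoding() returns. The following Python 3 sketch emulates the old test to show why that matters: in Python 2, str() on a unicode object encodes it with the default encoding, whereas Python 3 has no such implicit conversion, so the "default encoding" is passed in explicitly here. The function name looks_ascii_py2_style is hypothetical.

```python
def looks_ascii_py2_style(value: str, default_encoding: str) -> bool:
    """Emulate the old check: treat the value as ASCII if str() would not raise.

    In Python 2, str(u'...') encodes with sys.getdefaultencoding();
    here that encoding is a parameter instead.
    """
    try:
        value.encode(default_encoding)
    except UnicodeEncodeError:
        return False
    return True


subject = "\u00c5ngstr\u00f6m"  # 'Ångström'

# Stock interpreter (default encoding 'ascii'): non-ASCII is detected.
print(looks_ascii_py2_style(subject, "ascii"))  # False
# Default encoding changed to 'utf-8': the check passes, so Header() is skipped.
print(looks_ascii_py2_style(subject, "utf-8"))  # True
```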

10/20/07 02:34:50 changed by mtredinnick

Oh, that's tricky and not very nice behaviour. :-(

Okay, now I'm convinced. Will fix it in a minute.

10/20/07 02:53:55 changed by mtredinnick

  • status changed from new to closed.
  • resolution set to fixed.

(In [6551]) Fixed #5778 -- Changed the way we detect if a string is non-ASCII when creating email headers. This fixes a problem that was showing up on some (but not all) systems.

10/21/07 09:15:33 changed by tpetazzoni

Tested, works perfectly. Thanks!

