email package status in 3.X?


lutz-10
I'm updating the current Programming Python for 3.X, as well as
its fairly large email client examples (GUI- and web-based) that
rely on the email package heavily.

I've gotten to the point where I need to decode the bytes of a
message fetched with poplib into the Unicode strings expected
by the email parser, and have run into a dilemma--because decoding
may require headers inspection, it appears that scripts need to
parse in order to decode, but need to decode in order to parse.

I know this is being discussed and may be addressed soon, but
because the email package is crucial to this book's largest
examples, I'm looking for a bit more information on this:

--What's the current ETA on a new version of the email parser
which handles byte strings?  The web suggests it might be 3.2,
3.3, or even 3.4.  It seems to still be in early stages.

--How backward compatible will the new email be?  I'm assuming
it will handle bytes but be otherwise very similar, but 3.x set
quite a precedent for changes, and changes break books.

Any updates on this would be appreciated; for better or worse,
email is a major dependency for one of the flagship Python books
out there.  Since postponing the update probably isn't an option,
I'm leaning towards decoding per a user-configurable default
(latin1 or utf8?) for now, but that's less than ideal.
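
A minimal sketch of that stopgap, assuming the message arrives as a list
of bytes lines such as the one poplib's retr() returns.  The function
name decode_fetched_lines and the default encoding order are illustrative
choices, not part of the email package:

```python
import email

def decode_fetched_lines(lines, encodings=("ascii", "utf-8", "latin-1")):
    """Join raw bytes lines and decode them with the first encoding that fits.

    `lines` is assumed to be a list of bytes objects, one per message line.
    `encodings` stands in for the user-configurable default (hypothetical).
    """
    raw = b"\n".join(lines)
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # No candidate fit: fall back to replacement characters rather than fail.
    return raw.decode("ascii", errors="replace")


# Hand-made lines standing in for a fetched message.
lines = [b"Subject: hi", b"", b"caf\xe9"]
text = decode_fetched_lines(lines)
msg = email.message_from_string(text)
```

This guesses a single encoding for the whole message, so a part whose
declared charset differs from the guess can still be mis-decoded; that
is the "less than ideal" part.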

Thanks,
--Mark Lutz    

(feel free to cross-post if this belongs elsewhere, and please
respond to my email address directly)

_______________________________________________
Email-SIG mailing list
[hidden email]
Your options: http://mail.python.org/mailman/options/email-sig/lists%40nabble.com

Re: email package status in 3.X?

R. David Murray
On Fri, 07 May 2010 07:15:19 -0400, [hidden email] wrote:
> --What's the current ETA on a new version of the email parser
> which handles byte strings?  The web suggests it might be 3.2,
> 3.3, or even 3.4.  It seems to still be in early stages.

My best guess at this point (it's an informed guess, but still very much
a guess) is that email6 will be available in 3.3, and I am hoping there
will be a pypi package available for testing it under 3.1/3.2 some time
before the end of this year.

> --How backward compatible will the new email be?  I'm assuming
> it will handle bytes but be otherwise very similar, but 3.x set
> quite a precedent for changes, and changes break books.

Sorry to be the bearer of bad news from your point of view, but there
are indeed likely to be a number of fairly significant changes.  The plan
is to provide a backward compatibility layer, but that probably doesn't
help you much since you'd presumably rather discuss the "official" API.

> Any updates on this would be appreciated; for better or worse,
> email is a major dependency for one of the flagship Python books
> out there.  Since postponing the update probably isn't an option,
> I'm leaning towards decoding per a user-configurable default
> (latin1 or utf8?) for now, but that's less than ideal.

Email is a major dependency for a number of things, and IMO is perhaps
the biggest thing blocking Python3 adoption that the Python development
community has any control over.  Unfortunately there is currently a
distinct lack of volunteer time to work on it.

Several of us are working on ways to support and speed up email6
development.  There is a GSoC student who will be doing some work,
with me as mentor, and I am hoping to get funding to be able to spend
a significant number of hours on the package on a contract-programming
basis as well.  There are structures the PSF needs to put in place before
I can do fundraising for that, however.  If you know anyone who might
want to just pay me for it straight out, let me know :)

As for what you do *now*...unfortunately I don't know of any answer that
works, otherwise we'd have implemented it.

--
R. David Murray                                      www.bitdance.com

Re: email package status in 3.X?

lutz-10
In reply to this post by lutz-10
Thanks very much for your reply.

I'm probably going to have to go ahead and finish the book
with the email package as it is now, and include a lot of
caveats about the problems that a new version may fix in the
future.  I can also post updated example code if/when possible.

I realize everybody on this list probably knows this already,
but email in 3.X not only doesn't support the Unicode/bytes
dichotomy, it was also broken by it.  Beyond the pre-parse
decode issue, its mail text generation really only works for
all-text mails.  Generating text of an email with any sort of
binary part doesn't work at all now, because the base64 text
is still bytes, and the Generator expects str.  I've coded a
custom encoder to pass to MIMEImage that works around this
by decoding to ASCII, but it's not a great story to have to
tell the tens of thousands of readers of this book, many of
whom will be evaluating 3.X in general.
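
For concreteness, a workaround of this kind might look roughly like the
following.  It is a sketch, not the code from the book: the name
encode_base64_as_str and the sample bytes are made up, and it relies
only on MIMEImage accepting an _encoder callable that receives the
message part.

```python
import base64
from email.mime.image import MIMEImage

def encode_base64_as_str(part):
    """Hypothetical encoder: store the base64 payload as str, not bytes."""
    raw = part.get_payload(decode=True)                 # raw image bytes
    encoded = base64.encodebytes(raw).decode("ascii")   # base64 text as str
    part.set_payload(encoded)
    part["Content-Transfer-Encoding"] = "base64"


# Made-up bytes standing in for real image data; the subtype is passed
# explicitly so nothing has to be guessed from the content.
image_bytes = b"\x89PNG\r\n\x1a\n" + bytes(range(256))
part = MIMEImage(image_bytes, _subtype="png", _encoder=encode_base64_as_str)
message_text = part.as_string()
```

Because the payload is plain ASCII str by the time text is generated,
the generator never sees bytes.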

It's unfortunate, IMHO, that the powers that be chose to ship
Python 3.0 with a badly broken email package.  This probably
could have been avoided with a short period of concerted effort
by pydev, and I think it does leave 3.X with a bit of a black
eye.  Two years later, the 3.0 I/O speed issue has been fixed
but this has not?  Odd, that.  I'm also not convinced that
poplib, smtplib, or ftplib in 3.X completely address the brave
new Unicode world either, but time and 3.X users will tell.

Then again, such is life in realistic software development.
At the end of the day, I suppose this isn't a bad lesson for
readers to learn.  As for funding, I don't have any specific
ideas, but this project should clearly be a top priority.

Thanks again,
--Mark Lutz



-----Original Message-----

>From: "R. David Murray" <[hidden email]>
>Sent: May 9, 2010 5:31 PM
>To: [hidden email]
>Cc: [hidden email]
>Subject: Re: [Email-SIG] email package status in 3.X?

Re: email package status in 3.X?

R. David Murray
On Mon, 10 May 2010 14:02:46 -0400, [hidden email] wrote:
> I realize everybody on this list probably knows this already,
> but email in 3.X not only doesn't support the Unicode/bytes
> dichotomy, it was also broken by it.  Beyond the pre-parse
> decode issue, its mail text generation really only works for
> all-text mails.  Generating text of an email with any sort of
> binary part doesn't work at all now, because the base64 text
> is still bytes, and the Generator expects str.  I've coded a

There's an open bug report for this, and it can be addressed with a fix
in the current package (I just bumped the prio to critical to make sure
I get it into the next release).

> custom encoder to pass to MIMEImage that works around this
> by decoding to ASCII, but it's not a great story to have to
> tell the tens of thousands of readers of this book, many of
> whom will be evaluating 3.X in general.
>
> It's unfortunate, IMHO, that the powers that be chose to ship
> Python 3.0 with a badly broken email package.  This probably
> could have been avoided with a short period of concerted effort
> by pydev, and I think it does leave 3.X with a bit of a black
> eye.  Two years later, the 3.0 I/O speed issue has been fixed
> but this has not?  Odd, that.  I'm also not convinced that

Well, speeding up IO was a matter of rewriting an already designed and
implemented python-based package in C, and volunteers with an interest
stepped forward to do that job.  Fixing email involves designing and
implementing a new version of email that can handle the separation
between bytes and unicode correctly.  (You will note that the 2.x package
did not do so, and that fact is the source of many still-open bugs.)
Unfortunately, none of the email experts involved in Python development
had any time available to do this work, and until I expressed interest
at the end of last year no new volunteers had come forward to write code.

> poplib, smtplib, or ftplib in 3.X completely address the brave
> new Unicode world either, but time and 3.X users will tell.

I am afraid that you are correct.  We've found and fixed a few things,
but I'm pretty sure there are more waiting to be found.

If you have time to file bugs for anything you come across, that would
be most helpful.

> Then again, such is life in realistic software development.

Particularly in the primarily-volunteer open source world.

> At the end of the day, I suppose this isn't a bad lesson for
> readers to learn.  As for funding, I don't have any specific
> ideas, but this project should clearly be a top priority.

Thanks.  I've forwarded your note to the PSF board, as a reminder of
how important this is ;)

--
R. David Murray                                      www.bitdance.com

Re: email package status in 3.X?

Matthew Dixon Cowles
In reply to this post by lutz-10
Mark,

> I realize everybody on this list probably knows this already,
> but email in 3.X not only doesn't support the Unicode/bytes
> dichotomy, it was also broken by it.

Yes, it's a shame that it has worked out that way. I think it's
because email is an almost uniquely hard problem when you try to make
a sharp distinction between text and bytes.

When you receive an email, what have you got? It's supposed to be
ASCII, but of course it often isn't. What character set should you
assume that those eight-bit characters are in? The program that's
using the module probably does want to try to guess since it probably
wants to make as much sense as possible out of an incorrectly formed
email. The same goes for mis-specified encodings, both in headers and
in MIME parts.

So you probably need to provide multiple ways of getting at headers
and the MIME parts that claim to be text. You'll want to be able to
get at the original data (probably as bytes for safety) and the text
version if one can be created.
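
One way to picture that dual access, as a sketch only: the helper below
returns both the raw bytes of a part and a text version when one can be
produced.  The name bytes_and_text and the fallback charsets are
invented for illustration, and email.message_from_bytes is a parser
entry point from later Python 3 releases (3.2 and up), so this assumes
a newer email package than the one under discussion.

```python
import email

def bytes_and_text(part, fallbacks=("utf-8", "latin-1")):
    """Return (raw_bytes, text_or_None) for a non-multipart message part.

    The declared charset is tried first, then the hypothetical fallbacks.
    """
    raw = part.get_payload(decode=True)          # original data as bytes
    declared = part.get_content_charset()        # may be None or wrong
    candidates = ([declared] if declared else []) + list(fallbacks)
    for charset in candidates:
        try:
            return raw, raw.decode(charset)
        except (LookupError, UnicodeDecodeError):
            continue
    return raw, None                             # no text version possible


# A mis-labelled message: claims us-ascii but carries an eight-bit byte.
wire = (b"Subject: test\r\n"
        b"Content-Type: text/plain; charset=us-ascii\r\n"
        b"\r\n"
        b"caf\xe9\r\n")
msg = email.message_from_bytes(wire)
raw, text = bytes_and_text(msg)
```

The caller always gets the bytes back, so a wrong guess about the text
never loses the original data.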

And so forth.

Happily passing eight-bit strings around with the assumption that the
user would make the correct sense of them mapped onto email really
well. Trying to make a strict distinction between bytes and text
turns out to be a bit of a mess in this context.

But you probably already knew all that as well.

Regards,
Matt
