email package status in 3.X

email package status in 3.X

R. David Murray
On Mon May 10 20:02:46 CEST 2010 Mark Lutz wrote:

> I'm probably going to have to go ahead and finish the book
> with the email package as it is now, and include a lot of
> caveats about the problems that a new version may fix in the
> future.  I can also post updated example code if/when possible.
>
> I realize everybody on this list probably knows this already,
> but email in 3.X not only doesn't support the Unicode/bytes
> dichotomy, it was also broken by it.  Beyond the pre-parse
> decode issue, its mail text generation really only works for
> all-text mails.  Generating text of an email with any sort of
> binary part doesn't work at all now, because the base64 text
> is still bytes, and the Generator expects str.  I've coded a
> custom encoder to pass to MIMEImage that works around this
> by decoding to ASCII, but it's not a great story to have to
> tell the tens of thousands of readers of this book, many of
> whom will be evaluating 3.X in general.

This bug should now be fixed in both the py3k branch and the 3.1
maint branch.  This means the fix will be in 3.1.3, as well as 3.2a1.
Hopefully that will be in time for your book, since 3.2a1 is due June
27th and I'm guessing the 3.1.3 release will be some time not too far
off that time frame as well.  FYI I also fixed a related bug that made
using utf-8 as a charset problematic.  Unfortunately I suspect there
may be some other charset issues waiting to be discovered.

If you have come across any other bugs that don't already have
issues in the tracker please file bug reports.  Anything that
can be fixed in the current package I will endeavor to fix
before the next release.  Feel free also to indicate bugs which
should be given priority.

--
R. David Murray                                      www.bitdance.com
_______________________________________________
Email-SIG mailing list
[hidden email]
Your options: http://mail.python.org/mailman/options/email-sig/lists%40nabble.com

Re: email package status in 3.X

lutz-10
Thanks, David; that's great news.  I'll update the book draft
accordingly.

For the record, despite the issues, I was able to complete a fairly
full-featured email client GUI with the email package as it currently
is.  This includes parsing and generating arbitrary attachments, as
well as encoding on sends and decoding on fetches for both text payloads
and I18N mail headers. The package is still quite powerful as is.  It
does take a bit of digging to figure out how to use its many tools,
but the book will probably help on this front, especially the
upcoming edition's more complete application.

In other words, some of my concern may have been a bit premature.  
I hope that in the future we'll either strive for compatibility
or keep the current version around; it's a lot of very useful code.

In fact, I recommend that any new email package be named distinctly,
and that the current package be retained for a number of releases to
come.  After all the breakages that 3.X introduced in general, doing
the same to any email-based code seems a bit too much, especially
given that the current package is largely functional as is.  To me,
after having just used it extensively, fixing its few issues seems
a better approach than starting from scratch.

As far as other issues, the things I found are described below my
signature.  I don't know what the utf-8 issue is that you refer
to; I'm able to parse and send with this encoding as is without
problems (both payloads and headers), but I'm probably not using the
interfaces you fixed, and this may be the same as one of the items listed.

Another thought: it might be useful to use the book's email client
as a sort of test case for the package; it's much more rigorous in
the new edition because it now has to be, given 3.X's Unicode model
(it's about 4,900 lines of code, though not all is email-related).
I'd be happy to donate the code as soon as I find out what the
copyright will be this time around; it will be at O'Reilly's site
this Fall in any event.

Thanks,
--Mark Lutz  (http://learning-python.com, http://rmi.net/~lutz)


Major issues I found...
------------------------------------------------------------------
1) Str required for parsing, but bytes returned from poplib

The initial decode from bytes to str of full mail text; in
retrospect, probably not a major issue, since original email
standards called for ASCII.  An 8-bit encoding like Latin-1 is
probably sufficient for most conforming mails.  For the book,
I try a set of different encodings, beginning with an optional
configuration module setting, then ascii, latin-1, and utf-8;
this is probably overkill, but a GUI has to be defensive.
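
A minimal sketch of that fallback sequence, assuming the raw message is available as one bytes object (for example, after joining the lines returned by poplib).  The name decode_mail_bytes and its preferred parameter are hypothetical and are not the book's code:

```python
import email

# Hypothetical helper: try an optional configured encoding first,
# then the fallbacks named above, in that order.
def decode_mail_bytes(raw, preferred=None):
    candidates = ([preferred] if preferred else []) + ['ascii', 'latin-1', 'utf-8']
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            continue    # wrong guess or unknown codec name: try the next one
    # Not reached with the list above (latin-1 accepts any byte); kept as a guard.
    return raw.decode('ascii', errors='replace'), 'ascii'

raw = b'Subject: test\n\nbody caf\xe9\n'    # stand-in for text fetched by poplib
text, used = decode_mail_bytes(raw)
msg = email.message_from_string(text)       # the parser is given str, not bytes
```

One consequence of this ordering: Latin-1 maps every possible byte to a character, so the utf-8 entry is only ever tried when it is supplied as the preferred encoding.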

----------------------------------------------------------------

2) Binary attachments encoding

The binary attachments byte-to-str issue that you've just
fixed.  As I mentioned, I worked around this by passing in a
custom encoder that calls the original and runs an extra decode
step.  Here's what my fix looked like in the book; your patch
may do better, and I will minimally add a note about the 3.1.3
and 3.2 fix for this:

def fix_encode_base64(msgobj):
     from email.encoders import encode_base64
     encode_base64(msgobj)                # what email does normally: leaves bytes
     bytes = msgobj.get_payload()         # bytes fails in email pkg on text gen
     text  = bytes.decode('ascii')        # decode to unicode str so text gen works
     ...plus line splitting logic omitted...
     msgobj.set_payload('\n'.join(lines))

>>> from email.mime.image import MIMEImage
>>> from mailtools.mailSender import fix_encode_base64      # use custom workaround
>>> bytes = open('monkeys.jpg', 'rb').read()
>>> m = MIMEImage(bytes, _encoder=fix_encode_base64)        # convert to ascii str
>>> print(m.as_string()[:500])

-------------------------------------------------------------------

3) Type-dependent text part encoding

There's a str/bytes confusion issue related to Unicode encodings
in text payload generation: some encodings require the payload to
be str, but others expect bytes.  Unfortunately, this means that
clients need to know how the package will react to the encoding
that is used, and special-case based upon that.  

For example, I needed to pass in str for ASCII and Latin-1 (the
former is unencoded and the latter gets QP MIME treatment), but
must pass a bytes for UTF-8 (which triggers Base64).  That's less
than ideal for a client trying to attach arbitrary text parts
generically from filenames.  Here's the obscure workaround I came
up with; the bodytext is str when fetched from an edit window,
but may also be loaded from an attachment file.  This may or may
not have been reported, and it's entirely possible that there's
a better solution that I've missed.

def fix_text_required(encodingname):
    """
    4E: workaround for str/bytes combination errors in email package;  MIMEText
    requires different types for different Unicode encodings in Python 3.1, due
    to the different ways it MIME-encodes some types of text;  see Chapter 13;
    the only other alternative is using generic Message and repeating much code;
    """
    from email.charset import Charset, BASE64, QP
    charset = Charset(encodingname)   # how email knows what to do for encoding
    bodyenc = charset.body_encoding   # utf8, others require bytes input data
    return bodyenc in (None, QP)      # ascii, latin1, others require str

# on mail sends...
# email needs either str xor bytes specifically;
if fix_text_required(bodytextEncoding):
    if not isinstance(bodytext, str):
        bodytext = bodytext.decode(bodytextEncoding)
else:
    if not isinstance(bodytext, bytes):
        bodytext = bodytext.encode(bodytextEncoding)

# later
msg.set_payload(bodytext, charset=bodytextEncoding)
...or...
msg = MIMEText(bodytext, _charset=bodytextEncoding)
mainmsg.attach(msg)

# attachments
# build sub-Message of appropriate kind
maintype, subtype = contype.split('/', 1)
if maintype == 'text':                       # 4E: text needs encoding
    if fix_text_required(fileencode):        # requires str or bytes
        data = open(filename, 'r', encoding=fileencode)
    else:
        data = open(filename, 'rb')
    msg = MIMEText(data.read(), _subtype=subtype, _charset=fileencode)
    data.close()

-------------------------------------------------------------------

There are some additional cases that now require decoding per mail
headers today due to the str/bytes split, but these are just a
normal artifact of supporting Unicode character sets in general,
and seem like issues for package clients to resolve (e.g., the bytes
returned for decoded payloads in 3.X didn't play well with existing
str-based text processing code written for 2.X).

-------------------------------------------------------------------



Re: email package status in 3.X

R. David Murray
On Thu, 10 Jun 2010 09:21:52 -0400, [hidden email] wrote:
> In other words, some of my concern may have been a bit premature.  
> I hope that in the future we'll either strive for compatibility
> or keep the current version around; it's a lot of very useful code.

The plan is to have a compatibility layer that will accept calls based
on the old API and forward appropriately to the new API.  So far I'm
thinking I can succeed in doing this in a fairly straightforward manner,
but I won't know for sure until I get some more pieces in place.

> In fact, I recommend that any new email package be named distinctly,

I'm going to avoid that if I can (though the PyPI package will be
named email6 when we publish it for public testing).  If, however,
it turns out that I can't correctly support both the old and the
new API, then I'll have to do that.

> and that the current package be retained for a number of releases to
> come.  After all the breakages that 3.X introduced in general, doing
> the same to any email-based code seems a bit too much, especially
> given that the current package is largely functional as is.  To me,
> after having just used it extensively, fixing its few issues seems
> a better approach than starting from scratch.

Well, the thing is, as you found, existing 2.x code needs to be fixed to
correctly handle the distinction between strings and bytes no matter what.
The goal is to make it easier to write correct programs, while providing
the compatibility layer to make porting smoother.  But I doubt that any
non-trivial 2.x email program will port without significant changes,
even if the compatibility layer is close to 100% compatible with the
current Python3 email package, simply because the previous conflation
of text and bytes must be untangled in order to work correctly in
Python3, and email involves lots of transitions between text and bytes.

As for "starting from scratch", it is true that the current plan involves
considerable changes in the recommended API (in the direction of greater
flexibility and power), but I'm hoping that significant portions of the
code will carry forward with minor changes, and that this will make it
easier to support the old API.

> As far as other issues, the things I found are described below my
> signature.  I don't know what the utf-8 issue is that you refer
> to; I'm able to parse and send with this encoding as is without
> problems (both payloads and headers), but I'm probably not using the
> interfaces you fixed, and this may be the same as one of the items listed.

It is, see below.

> Another thought: it might be useful to use the book's email client
> as a sort of test case for the package; it's much more rigorous in
> the new edition because it now has to be, given 3.X's Unicode model
> (it's about 4,900 lines of code, though not all is email-related).
> I'd be happy to donate the code as soon as I find out what the
> copyright will be this time around; it will be at O'Reilly's site
> this Fall in any event.

That would be great.  I am planning to write my own sample app to
demonstrate the new API, but if I can use yours to test the compatibility
layer that will help a lot, since I otherwise have no Python3 email
application to test against unless I port something from Python2.

> Major issues I found...
> ------------------------------------------------------------------
> 1) Str required for parsing, but bytes returned from poplib
>
> The initial decode from bytes to str of full mail text; in
> retrospect, probably not a major issue, since original email
> standards called for ASCII.  An 8-bit encoding like Latin-1 is
> probably sufficient for most conforming mails.  For the book,
> I try a set of different encodings, beginning with an optional
> configuration module setting, then ascii, latin-1, and utf-8;
> this is probably overkill, but a GUI has to be defensive.

This works (mostly) for conforming email, but some important Python email
applications need to deal with non-conforming email.  That's where the
inability to parse bytes directly really causes problems.

> 2) Binary attachments encoding
>
> The binary attachments byte-to-str issue that you've just
> fixed.  As I mentioned, I worked around this by passing in a
> custom encoder that calls the original and runs an extra decode
> step.  Here's what my fix looked like in the book; your patch
> may do better, and I will minimally add a note about the 3.1.3
> and 3.2 fix for this:

Yeah, our patch was a lot simpler since we could fix the encoding inside
the loop producing the encoded lines :)
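
A quick way to check the fixed behaviour, as a sketch only: it assumes a Python release that includes the fix described above, and the byte string is stand-in data rather than a real image, so the subtype is passed explicitly instead of being guessed from the content:

```python
from email.mime.image import MIMEImage

data = bytes(range(256)) * 4                 # stand-in binary payload, not a real JPEG
part = MIMEImage(data, _subtype='jpeg')      # default encoder: encode_base64
text = part.as_string()                      # text generation, no custom encoder

# The stored payload is base64 text (str); decoding it recovers the bytes.
recovered = part.get_payload(decode=True)
```

With the fix in place, no _encoder argument is needed for the message text to be generated.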

> 3) Type-dependent text part encoding
>
> There's a str/bytes confusion issue related to Unicode encodings
> in text payload generation: some encodings require the payload to
> be str, but others expect bytes.  Unfortunately, this means that
> clients need to know how the package will react to the encoding
> that is used, and special-case based upon that.  

This was the UTF-8 bug I fixed.  I shouldn't have called it "the UTF-8
bug", because it applies equally to the other charsets that use base64,
as you note.  I called it that because UTF-8 was where the problem was
noticed and is mentioned in the title of the bug report.

I had a suspicion that the quoted-printable encoding wasn't being done
correctly either, so to hear that it is working for you is good news.
There may still be bugs to find there, though.

So, in the next releases of Python all MIMEText input should be string,
and it will fail if you pass bytes.  I consider this as email previously
not living up to its published API, but do you think I should hack
in a way for it to accept bytes too, for backward compatibility in the
3 line?
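
To illustrate the str-only input described here, a minimal sketch, assuming a release with this fix applied; the sample strings and the roundtrip helper are made up for the example:

```python
from email.mime.text import MIMEText

# Hypothetical sample text for three charsets; every input is str.
samples = {
    'ascii':   'plain text',
    'latin-1': 'caf\xe9',
    'utf-8':   'caf\xe9 \u2603',
}

parts = {name: MIMEText(text, _charset=name) for name, text in samples.items()}

def roundtrip(part, charset):
    """Undo the transfer encoding, then decode the bytes back to str."""
    return part.get_payload(decode=True).decode(charset)

results = {name: roundtrip(part, name) for name, part in parts.items()}
```

The caller no longer has to choose between str and bytes based on how the package will MIME-encode a given charset.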

> There are some additional cases that now require decoding per mail
> headers today due to the str/bytes split, but these are just a
> normal artifact of supporting Unicode character sets in general,
> and seem like issues for package clients to resolve (e.g., the bytes
> returned for decoded payloads in 3.X didn't play well with existing
> str-based text processing code written for 2.X).

I'm not following you here.  Can you give me some more specific
examples?  Even if these "normal artifacts" must remain with
the current API, I'd like to make things as easy as practical when
using the new API.

Thanks for all your feedback!

--David

Re: email package status in 3.X

Barry Warsaw
On Jun 10, 2010, at 10:18 AM, R. David Murray wrote:

>That would be great.  I am planning to write my own sample app to
>demonstrate the new API, but if I can use yours to test the compatibility
>layer that will help a lot, since I otherwise have no Python3 email
>application to test against unless I port something from Python2.

I would support/help with a port of Mailman 3 to Python 3.  It's
non-trivial, but would make a good test case.  The dependency stack may make
that difficult.

-Barry


Re: email package status in 3.X

R. David Murray
On Thu, 10 Jun 2010 10:42:14 -0400, Barry Warsaw <[hidden email]> wrote:

> On Jun 10, 2010, at 10:18 AM, R. David Murray wrote:
>
> >That would be great.  I am planning to write my own sample app to
> >demonstrate the new API, but if I can use yours to test the compatibility
> >layer that will help a lot, since I otherwise have no Python3 email
> >application to test against unless I port something from Python2.
>
> I would support/help with a port of Mailman 3 to Python 3.  It's
> non-trivial, but would make a good test case.  The dependency stack may make
> that difficult.

I realized after I sent that email that I should have said "until",
since that's one of the testing goals (seeing how applications
port both to the compatibility layer and to the new API).

Mailman is at the top of the list of test ports, but as you say
dependencies may have to be dealt with first.  I'm certainly glad
you are willing to help, since that will doubtless make it go
faster :)

--
R. David Murray                                      www.bitdance.com

Re: email package status in 3.X

lutz-10
In reply to this post by R. David Murray
Hi David,

All sounds good, and thanks again for all your work on this.

I appreciate the difficulties of moving this package to 3.X
in a backward-compatible way.  My suggestions stem from the fact
that it does work as is today, albeit in a less than ideal way.

That, and I'm seeing that Python 3.X in general is still having
a great deal of trouble gaining traction in the "real world"
almost 2 years after its release, and I'd hate to see further
disincentives for people to migrate.  This is a bigger issue
than both the email package and this thread, of course.

> > 3) Type-dependent text part encoding
> >
> ...
> So, in the next releases of Python all MIMEText input should be string,
> and it will fail if you pass bytes.  I consider this as email previously
> not living up to its published API, but do you think I should hack
> in a way for it to accept bytes too, for backward compatibility in the
> 3 line?

Decoding can probably be safely delegated to package clients.
Typical email clients will probably have str for display of the
main text.  They may wish to read attachments in binary mode, but
can always read in text mode instead or decode manually, because
they need a known encoding to send the part correctly (my client
has to ask or use configurations in some cases).

B/W compatibility probably isn't a concern; I suspect that my
temporary workaround will still work with your patch anyhow,
and this code didn't work at all for some encodings before.

> > There are some additional cases that now require decoding per mail
> > headers today due to the str/bytes split, but these are just a
> > normal artifact of supporting Unicode character sets in general,
> > and seem like issues for package clients to resolve (e.g., the bytes
> > returned for decoded payloads in 3.X didn't play well with existing
> > str-based text processing code written for 2.X).
>
> I'm not following you here.  Can you give me some more specific
> examples?  Even if these "normal artifacts" must remain with
> the current API, I'd like to make things as easy as practical when
> using the new API.

This was just a general statement about things in my own code that
didn't jibe with the 3.X string model.  For instance, line wrapping
logic assumed str; tkinter text widgets do much better rendering str
than the bytes fetched for decoded payloads; and my Pyedit text editor
component had to be overhauled to handle display/edit/save of payloads
of arbitrary encodings.  If I remember any more specific issues with
the email package itself, I'll forward your way.

I'll watch for an opportunity to get the book's new PyMailGUI
client code to you as a candidate test case, but please ping
me about it later if I haven't acted on this.  It works well,
but largely because of all the work that went into the email
package underlying it.

Thanks,
--Mark Lutz  (http://learning-python.com, http://rmi.net/~lutz)



Re: email package status in 3.X

lutz-10
In reply to this post by R. David Murray
Come to think of it, here was another oddness I just recalled: this
may have been reported already, but header decoding returns mixed types
depending upon the structure of the header.  Converting to a str for
display isn't too difficult to handle, but this seems a bit inconsistent
and contrary to Python's type neutrality:

>>> from email.header import decode_header
>>> S1 = 'Man where did you get that assistant?'
>>> S2 = '=?utf-8?q?Man_where_did_you_get_that_assistant=3F?='
>>> S3 = 'Man where did you get that =?UTF-8?Q?assistant=3F?='

# str: don't decode()
>>> decode_header(S1)
[('Man where did you get that assistant?', None)]

# bytes: do decode()
>>> decode_header(S2)
[(b'Man where did you get that assistant?', 'utf-8')]

# bytes: do decode(), using raw-unicode-escape applied in package
>>> decode_header(S3)
[(b'Man where did you get that', None), (b'assistant?', 'utf-8')]

I can work around this with the following code, but it
feels a bit too tightly coupled to the package's internal details
(further evidence that email.* can be made to work as is today,
even if it may be seen as less than ideal aesthetically):

parts = email.header.decode_header(rawheader)
decoded = []
for (part, enc) in parts:                      # for all substrings
    if enc is None:                            # part unencoded?
        if not isinstance(part, bytes):        # str: full hdr unencoded
            decoded += [part]                  # else do unicode decode
        else:
            decoded += [part.decode('raw-unicode-escape')]
    else:
        decoded += [part.decode(enc)]
return ' '.join(decoded)
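
For reference, a self-contained variant of the same idea that can be run against the three sample headers above.  This is a sketch under assumptions: the function name decode_header_text is hypothetical, and whitespace is collapsed at the end in case a release keeps the trailing space on an unencoded part, which also means intentional runs of spaces in a header are not preserved:

```python
from email.header import decode_header

def decode_header_text(rawheader):
    """Return a header as str, whether its parts come back as str or bytes."""
    pieces = []
    for part, enc in decode_header(rawheader):
        if isinstance(part, bytes):
            # encoded parts carry their charset; unencoded bytes parts do not
            part = part.decode(enc or 'raw-unicode-escape')
        pieces.append(part)
    # join, then collapse whitespace runs so part boundaries do not matter
    return ' '.join(' '.join(pieces).split())

S1 = 'Man where did you get that assistant?'
S2 = '=?utf-8?q?Man_where_did_you_get_that_assistant=3F?='
S3 = 'Man where did you get that =?UTF-8?Q?assistant=3F?='
decoded = [decode_header_text(s) for s in (S1, S2, S3)]
```

Testing the part's type rather than the charset keeps the caller from depending on which header structures yield str and which yield bytes.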

Thanks,
--Mark Lutz  (http://learning-python.com, http://rmi.net/~lutz)


> -----Original Message-----
> From: [hidden email]
> To: "R. David Murray" <[hidden email]>
> Subject: Re: email package status in 3.X
> Date: Sat, 12 Jun 2010 16:52:32 -0000
>
> Hi David,
>
> All sounds good, and thanks again for all your work on this.
>
> I appreciate the difficulties of moving this package to 3.X
> in a backward-compatible way.  My suggestions stem from the fact
> that it does work as is today, albeit in a less than ideal way.
>
> That, and I'm seeing that Python 3.X in general is still having
> a great deal of trouble gaining traction in the "real world"
> almost 2 years after its release, and I'd hate to see further
> disincentives for people to migrate.  This is a bigger issue
> than both the email package and this thread, of course.
>
> > > 3) Type-dependent text part encoding
> > >
> > ...
> > So, in the next releases of Python all MIMEText input should be string,
> > and it will fail if you pass bytes.  I consider this as email previously
> > not living up to its published API, but do you think I should hack
> > in a way for it to accept bytes too, for backward compatibility in the
> > 3 line?
>
> Decoding can probably be safely delegated to package clients.
> Typical email clients will probably have str for display of the
> main text.  They may wish to read attachments in binary mode, but
> can always read in text mode instead or decode manually, because
> they need a known encoding to send the part correctly (my client
> has to ask or use configurations in some cases).
>
> B/W compatibility probably isn't a concern; I suspect that my
> temporary workaround will still work with your patch anyhow,
> and this code didn't work at all for some encodings before.
>
> > > There are some additional cases that now require decoding per mail
> > > headers today due to the str/bytes split, but these are just a
> > > normal artifact of supporting Unicode character sets in general,
> > > and seem like issues for package clients to resolve (e.g., the bytes
> > > returned for decoded payloads in 3.X didn't play well with existing
> > > str-based text processing code written for 2.X).
> >
> > I'm not following you here.  Can you give me some more specific
> > examples?  Even if these "normal artifacts" must remain with
> > the current API, I'd like to make things as easy as practical when
> > using the new API.
>
> This was just a general statement about things in my own code that
> didn't jibe with the 3.X string model.  For instance, line wrapping
> logic assumed str; tkinter text widgets do much better rendering str
> than the bytes fetched for decoded payloads; and my Pyedit text editor
> component had to be overhauled to handle display/edit/save of payloads
> of arbitrary encodings.  If I remember any more specific issues with
> the email package itself, I'll forward your way.
>
> I'll watch for an opportunity to get the book's new PyMailGUI
> client code to you as a candidate test case, but please ping
> me about it later if I haven't acted on this.  It works well,
> but largely because of all the work that went into the email
> package underlying it.
>
> Thanks,
> --Mark Lutz  (http://learning-python.com, http://rmi.net/~lutz)
>
>
> > -----Original Message-----
> > From: "R. David Murray" <[hidden email]>
> > To: [hidden email]
> > Subject: Re: email package status in 3.X
> > Date: Thu, 10 Jun 2010 10:18:48 -0400
> >
> > On Thu, 10 Jun 2010 09:21:52 -0400, [hidden email] wrote:
> > > In other words, some of my concern may have been a bit premature.  
> > > I hope that in the future we'll either strive for compatibility
> > > or keep the current version around; it's a lot of very useful code.
> >
> > The plan is to have a compatibility layer that will accept calls based
> > on the old API and forward appropriately to the new API.  So far I'm
> > thinking I can succeed in doing this in a fairly straightforward manner,
> > but I won't know for sure until I get some more pieces in place.
> >
> > > In fact, I recommend that any new email package be named distinctly,
> >
> > I'm going to avoid that if I can (though the PyPI package will be
> > named email6 when we publish it for public testing).  If, however,
> > it turns out that I can't correctly support both the old and the
> > new API, then I'll have to do that.
> >
> > > and that the current package be retained for a number of releases to
> > > come.  After all the breakages that 3.X introduced in general, doing
> > > the same to any email-based code seems a bit too much, especially
> > > given that the current package is largely functional as is.  To me,
> > > after having just used it extensively, fixing its few issues seems
> > > a better approach than starting from scratch.
> >
> > Well, the thing is, as you found, existing 2.x code needs to be fixed to
> > correctly handle the distinction between strings and bytes no matter what.
> > The goal is to make it easier to write correct programs, while providing
> > the compatibility layer to make porting smoother.  But I doubt that any
> > non-trivial 2.x email program will port without significant changes,
> > even if the compatibility layer is close to 100% compatible with the
> > current Python3 email package, simply because the previous conflation
> > of text and bytes must be untangled in order to work correctly in
> > Python3, and email involves lots of transitions between text and bytes.
> >
> > As for "starting from scratch", it is true that the current plan involves
> > considerable changes in the recommended API (in the direction of greater
> > flexibility and power), but I'm hoping that significant portions of the
> > code will carry forward with minor changes, and that this will make it
> > easier to support the old API.
> >
> > > As far as other issues, the things I found are described below my
> > > signature.  I don't know what the utf-8 issue is that you refer
> > > to; I'm able to parse and send with this encoding as is without
> > > problems (both payloads and headers), but I'm probably not using the
> > > interfaces you fixed, and this may be the same as one of the items listed.
> >
> > It is, see below.
> >
> > > Another thought: it might be useful to use the book's email client
> > > as a sort of test case for the package; it's much more rigorous in
> > > the new edition because it now has to be given 3.X's Unicode model
> > > (it's about 4,900 lines of code, though not all is email-related).
> > > I'd be happy to donate the code as soon as I find out what the
> > > copyright will be this time around; it will be at O'Reilly's site
> > > this Fall in any event.
> >
> > That would be great.  I am planning to write my own sample app to
> > demonstrate the new API, but if I can use yours to test the compatibility
> > layer that will help a lot, since I otherwise have no Python3 email
> > application to test against unless I port something from Python2.
> >
> > > Major issues I found...
> > > ------------------------------------------------------------------
> > > 1) Str required for parsing, but bytes returned from poplib
> > >
> > > The initial decode from bytes to str of full mail text; in
> > > retrospect, probably not a major issue, since original email
> > > standards called for ASCII.  An 8-bit encoding like Latin-1 is
> > > probably sufficient for most conforming mails.  For the book,
> > > I try a set of different encodings, beginning with an optional
> > > configuration module setting, then ascii, latin-1, and utf-8;
> > > this is probably overkill, but a GUI has to be defensive.
> >
> > This works (mostly) for conforming email, but some important Python email
> > applications need to deal with non-conforming email.  That's where the
> > inability to parse bytes directly really causes problems.
> >
> > > 2) Binary attachments encoding
> > >
> > > The binary attachments byte-to-str issue that you've just
> > > fixed.  As I mentioned, I worked around this by passing in a
> > > custom encoder that calls the original and runs an extra decode
> > > step.  Here's what my fix looked like in the book; your patch
> > > may do better, and I will minimally add a note about the 3.1.3
> > > and 3.2 fix for this:
> >
> > Yeah, our patch was a lot simpler since we could fix the encoding inside
> > the loop producing the encoded lines :)
> >
> > > 3) Type-dependent text part encoding
> > >
> > > There's a str/bytes confusion issue related to Unicode encodings
> > > in text payload generation: some encodings require the payload to
> > > be str, but others expect bytes.  Unfortunately, this means that
> > > clients need to know how the package will react to the encoding
> > > that is used, and special-case based upon that.  
> >
> > This was the UTF-8 bug I fixed.  I shouldn't have called it "the UTF-8
> > bug", because it applies equally to the other charsets that use base64,
> > as you note.  I called it that because UTF-8 was where the problem was
> > noticed and is mentioned in the title of the bug report.
> >
> > I had a suspicion that the quoted-printable encoding wasn't being done
> > correctly either, so to hear that it is working for you is good news.
> > There may still be bugs to find there, though.
> >
> > So, in the next releases of Python all MIMEText input should be string,
> > and it will fail if you pass bytes.  I consider this as email previously
> > not living up to its published API, but do you think I should hack
> > in a way for it to accept bytes too, for backward compatibility in the
> > 3 line?
> >
> > > There are some additional cases that now require decoding per mail
> > > headers today due to the str/bytes split, but these are just a
> > > normal artifact of supporting Unicode character sets in general,
> > > and seem like issues for package clients to resolve (e.g., the bytes
> > > returned for decoded payloads in 3.X didn't play well with existing
> > > str-based text processing code written for 2.X).
> >
> > I'm not following you here.  Can you give me some more specific
> > examples?  Even if these "normal artifacts" must remain with
> > the current API, I'd like to make things as easy as practical when
> > using the new API.
> >
> > Thanks for all your feedback!
> >
> > --David
> >
>
>
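The fallback decoding described under issue 1 in the quoted message (an optional configured encoding, then ascii, latin-1, and utf-8) might look roughly like the sketch below. This is not the book's code; the function name and parameters are assumptions made for illustration.

```python
def decode_mail_bytes(raw, configured=None,
                      fallbacks=('ascii', 'latin-1', 'utf-8')):
    """Return (text, encoding) for the first encoding that decodes raw."""
    candidates = ([configured] if configured else []) + list(fallbacks)
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            # LookupError covers an unknown or misspelled codec name.
            continue
    raise ValueError('no candidate encoding could decode the message')

# latin-1 maps every byte value, so with the default order the utf-8
# entry is only reached if latin-1 is removed or reordered.
raw = 'Subject: caf\u00e9\r\n\r\nbody\r\n'.encode('utf-8')
text, used = decode_mail_bytes(raw, configured='utf-8')
print(used)
```

The resulting str can then be handed to email.message_from_string. Releases from 3.2 onward also provide email.message_from_bytes, which parses bytes directly and avoids the guess.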
>
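The workaround described under issue 2 in the quoted message (a custom encoder that calls the original and then runs an extra decode step) could be sketched as follows. The book's actual code was trimmed from the quote, so this is only a guess at its shape, and the name encode_base64_to_str is hypothetical. On releases that include the fix (3.1.3 and 3.2 onward) the stock encoder already stores str, so the extra step does nothing.

```python
import base64
from email import encoders
from email.mime.image import MIMEImage

def encode_base64_to_str(msg):
    """Base64-encode the payload, then make sure it is stored as str."""
    encoders.encode_base64(msg)          # the original encoder
    payload = msg.get_payload()
    if isinstance(payload, bytes):       # only true on unfixed releases
        msg.set_payload(payload.decode('ascii'))

# Stand-in bytes, not a real image; _subtype is given so nothing is guessed.
data = b'\x89PNG\r\n\x1a\n' + bytes(range(32))
part = MIMEImage(data, _subtype='png', _encoder=encode_base64_to_str)
text = part.as_string()                  # Generator needs a str payload
```

Base64 output is pure ASCII, which is why decoding with 'ascii' is safe here.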
>
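For issue 3 in the quoted message, where MIMEText input is expected to be str and decoding is left to the client, a client-side helper might look like this sketch. The helper name and its source_encoding parameter are assumptions for illustration, not part of the email package.

```python
from email.mime.text import MIMEText

def make_text_part(body, charset='utf-8', source_encoding=None):
    """Build a text part, decoding bytes to str before MIMEText sees them."""
    if isinstance(body, bytes):
        # The caller has to know how the bytes were encoded; fall back to
        # the outgoing charset only when nothing better is available.
        body = body.decode(source_encoding or charset)
    return MIMEText(body, _charset=charset)

# Text read in binary mode from a Latin-1 file, sent out as UTF-8.
raw = 'Gr\u00fc\u00dfe aus K\u00f6ln\n'.encode('latin-1')
part = make_text_part(raw, source_encoding='latin-1')
print(part['Content-Type'])
```

Keeping the decode in one place means the rest of the client never has to special-case on which charsets the package treats differently.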



_______________________________________________
Email-SIG mailing list
[hidden email]
Your options: http://mail.python.org/mailman/options/email-sig/lists%40nabble.com

Re: email package status in 3.X

lutz-10
In reply to this post by R. David Murray
[copied to pydev from email-sig because of the broader scope]

Well, it looks like I've stumbled onto the "other shoe" on this
issue--that the email package's problems are also apparently
behind the fact that CGI binary file uploads don't work in 3.1
(http://bugs.python.org/issue4953).  Yikes.

I trust that people realize this is a show-stopper for broader
Python 3.X adoption.  Why 3.0 was rolled out anyhow is beyond
me; it seems that it would have been better if Python developers
had gotten their own code to work with 3.X, before expecting the
world at large to do so.

FWIW, after rewriting Programming Python for 3.1, 3.x still feels
a lot like a beta to me, almost 2 years after its release.  How
did this happen?  Maybe nobody is using 3.X enough to care, but
I have a feeling that issues like this are part of the reason why.

No offense to people who obviously put in an incredible amount of
work on 3.X.  As someone who remembers 0.X, though, it's hard not
to find the current situation a bit disappointing.

--Mark Lutz  (http://learning-python.com, http://rmi.net/~lutz)






Re: [Python-Dev] email package status in 3.X

Nick Coghlan
On Thu, Jun 17, 2010 at 6:48 AM,  <[hidden email]> wrote:

> I trust that people realize this is a show-stopper for broader
> Python 3.X adoption.  Why 3.0 was rolled out anyhow is beyond
> me; it seems that it would have been better if Python developers
> had gotten their own code to work with 3.X, before expecting the
> world at large to do so.
>
> FWIW, after rewriting Programming Python for 3.1, 3.x still feels
> a lot like a beta to me, almost 2 years after its release.  How
> did this happen?  Maybe nobody is using 3.X enough to care, but
> I have a feeling that issues like this are part of the reason why.
>
> No offense to people who obviously put in an incredible amount of
> work on 3.X.  As someone who remembers 0.X, though, it's hard not
> to find the current situation a bit disappointing.

Agreed, but the binary/text distinction in 2.x (or rather, the lack
thereof) makes the unicode handling situation so hopelessly confused
that there is a lot of 2.x code (including in the standard library)
that silently mixes the two, often without really testing the
consequences (as clearly happened here).

3.x was rolled out anyway because the vast majority of it works.
Obviously people affected by the problems specific to the email
package and any other binary vs text parsing problems that are still
lingering are out of luck at the moment, but leaving 3.x sitting on a
shelf indefinitely would hardly have inspired anyone to clean it up.
My personal perspective is that a lot of that code was likely already
broken in hard-to-detect ways when dealing with mixed encodings -
releasing 3.x just made the associated errors significantly easier to
detect.

If we end up being able to add your email client code to the standard
library's unit test suite, that should help the situation immensely.

Regards,
Nick.

--
Nick Coghlan   |   [hidden email]   |   Brisbane, Australia

Re: [Python-Dev] email package status in 3.X

Barry Warsaw
In reply to this post by lutz-10
On Jun 16, 2010, at 08:48 PM, [hidden email] wrote:

>Well, it looks like I've stumbled onto the "other shoe" on this
>issue--that the email package's problems are also apparently
>behind the fact that CGI binary file uploads don't work in 3.1
>(http://bugs.python.org/issue4953).  Yikes.
>
>I trust that people realize this is a show-stopper for broader
>Python 3.X adoption.

We know it, we have extensively discussed how to fix it, we have IMO a good
design, and we even have someone willing and able to tackle the problem.  We
need to find a sufficient source of funding to enable him to do the work it
will take, and so far that's been the biggest stumbling block.  It will take a
focused and determined effort to see this through, and it's obvious that
volunteers cannot make it happen.  I include myself in the latter category, as
I've tried and failed at least twice to do it in my spare time.

-Barry


Re: [Python-Dev] email package status in 3.X

Brett Cannon-2
On Thu, Jun 17, 2010 at 08:43, Barry Warsaw <[hidden email]> wrote:

> On Jun 16, 2010, at 08:48 PM, [hidden email] wrote:
>
>>Well, it looks like I've stumbled onto the "other shoe" on this
>>issue--that the email package's problems are also apparently
>>behind the fact that CGI binary file uploads don't work in 3.1
>>(http://bugs.python.org/issue4953).  Yikes.
>>
>>I trust that people realize this is a show-stopper for broader
>>Python 3.X adoption.
>
> We know it, we have extensively discussed how to fix it, we have IMO a good
> design, and we even have someone willing and able to tackle the problem.  We
> need to find a sufficient source of funding to enable him to do the work it
> will take, and so far that's been the biggest stumbling block.  It will take a
> focused and determined effort to see this through, and it's obvious that
> volunteers cannot make it happen.  I include myself in the latter category, as
> I've tried and failed at least twice to do it in my spare time.

And in general I think this is the reason some modules have not
transitioned as well as others: there are only so many of us. The
stdlib passes its test suite, but obviously some unit tests do not
cover enough of the code in the ways people need it covered.

As for using Python 3 for my code, I do and have since Python 3 became
more-or-less usable. I just happen to not work with internet-related
stuff in my day-to-day work.

Plus we have needed to maintain FOUR branches for a while. That is a
nasty time sink when you are having to port bug fixes and such. It
also means that python-dev has been focused on making sure Python 2.7
is a solid release instead of getting to focus on the stdlib in Python
3. This is a nasty chicken-and-egg issue; we could ignore Python 2 and
focus on Python 3, but then the community would complain about us not
supporting the transition from 2 to 3 better, but obviously focusing
on 2 has led to 3 not getting enough TLC.

Once Python 2.7 is done and out the door the entire situation for
Python 3 should start to improve as python-dev as a whole will have a
chance to begin to focus solely on Python 3.

Re: [Python-Dev] email package status in 3.X

Steve Holden-5
In reply to this post by Barry Warsaw
Barry Warsaw wrote:

> On Jun 16, 2010, at 08:48 PM, [hidden email] wrote:
>
>> Well, it looks like I've stumbled onto the "other shoe" on this
>> issue--that the email package's problems are also apparently
>> behind the fact that CGI binary file uploads don't work in 3.1
>> (http://bugs.python.org/issue4953).  Yikes.
>>
>> I trust that people realize this is a show-stopper for broader
>> Python 3.X adoption.
>
> We know it, we have extensively discussed how to fix it, we have IMO a good
> design, and we even have someone willing and able to tackle the problem.  We
> need to find a sufficient source of funding to enable him to do the work it
> will take, and so far that's been the biggest stumbling block.  It will take a
> focused and determined effort to see this through, and it's obvious that
> volunteers cannot make it happen.  I include myself in the latter category, as
> I've tried and failed at least twice to do it in my spare time.
>
> -Barry
>
Lest the readership think that the PSF is unaware of this issue, allow
me to point out that we have already partially funded this effort, and
are still offering R. David Murray some further matching funds if he can
raise sponsorship to complete the effort (on which he has made a very
promising start).

We are also attempting to enable tax-deductible fund raising to increase
the likelihood of David's finding support. Perhaps we need to think
about a broader campaign to increase the quality of the Python 3
libraries. I find it very annoying that the #python IRC group still has
"Don't use Python 3" in its topic.  They adamantly refuse to remove it
until there is better library support, and they are the guys who see the
issues day in day out so it is hard to argue with them (and I don't
think an autocratic decision-making process would be appropriate).

regards
 Steve
--
Steve Holden           +1 571 484 6266   +1 800 494 3119
See Python Video!       http://python.mirocommunity.org/
Holden Web LLC                 http://www.holdenweb.com/
UPCOMING EVENTS:        http://holdenweb.eventbrite.com/
"All I want for my birthday is another birthday" -
                                     Ian Dury, 1942-2000

_______________________________________________
Email-SIG mailing list
[hidden email]
Your options: http://mail.python.org/mailman/options/email-sig/lists%40nabble.com

Re: [Python-Dev] email package status in 3.X

Arc Riley-2
David and his Google Summer of Code student, Shashwat Anand.

You can read Shashwat's weekly progress updates at http://l0nwlf.in/ or subscribe to http://twitter.com/l0nwlf for more micro updates.

We have more than 30 paid students working on Python 3 tasks this year. Most of them are participating under the PSF umbrella, but a few are with third-party projects such as Mercurial, porting those packages to Py3.

Given all this "on the horizon" work, I think the Py3 package situation will look a lot brighter by Python 3.2's release.


On Thu, Jun 17, 2010 at 10:32 PM, Steve Holden <[hidden email]> wrote:

> Lest the readership think that the PSF is unaware of this issue, allow
> me to point out that we have already partially funded this effort, and
> are still offering R. David Murray some further matching funds if he can
> raise sponsorship to complete the effort (on which he has made a very
> promising start).
>
> We are also attempting to enable tax-deductible fund raising to increase
> the likelihood of David's finding support.


Re: [Python-Dev] email package status in 3.X

Stephen J. Turnbull
In reply to this post by lutz-10
[hidden email] writes:

 > FWIW, after rewriting Programming Python for 3.1, 3.x still feels
 > a lot like a beta to me, almost 2 years after its release.

Email, of course, is a big wart.  But guess what?  Python 2's email
module doesn't actually work!  Sure, the program runs most of the
time, but every program that depends on email must acquire inches of
armorplate against all the things that can go wrong.  You simply can't
rely on it to DTRT except in a pre-MIME, pre-HTML, ASCII-only world.
Although they're often addressing general problems, these hacks are
*not* integrated back into the email module in most cases, but remain
app-specific voodoo.

If you live in Kansas, sure, you can concentrate on dodging tornados
and completely forget about Unicode and MIME and text/bogus content.
For the rest of the world, though, the problem is not Python 3.  It's
STD 11 (which still points at RFC 822, dated 1982!)  It's really
inappropriate to point at the email module, whose developers are
trying *not* to punt on conformance and robustness, when even the IETF
can only "run in circles, scream and shout"!

Maybe there are other problems with Python 3 that deserve to be
pointed at, but given the general scarcity of resources I think the
email module developers are working on the right things.  Unlike many
other modules, email really needs to be rewritten from the ground
(Python 3) up, because of the centrality of bytes/unicode confusion to
all email problems.  Python 3 completely changes the assumptions
there; a Python 2-style email module really can't work properly.
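To illustrate the bytes/unicode point, here is a minimal sketch, not taken from any message in this thread. It assumes a later Python 3 in which `email.message_from_bytes` exists (it was added in 3.2, after this discussion); the `RAW` message and the `decode_body` helper are hypothetical names made up for the example. The idea it shows is that mail arrives as bytes, and the charset needed to turn the body into text is only known after the headers have been parsed.

```python
from email import message_from_bytes

# Hypothetical wire data: an 8-bit Latin-1 body that is NOT valid UTF-8.
RAW = (
    b"From: sender@example.com\r\n"
    b"Subject: bytes first\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: text/plain; charset="iso-8859-1"\r\n'
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"caf\xe9\r\n"
)


def decode_body(raw):
    """Parse raw message bytes and decode the body with its declared charset."""
    msg = message_from_bytes(raw)
    # decode=True undoes the transfer encoding and returns bytes, not str.
    body_bytes = msg.get_payload(decode=True)
    # The charset comes from the Content-Type header; fall back to ASCII.
    charset = msg.get_content_charset() or "ascii"
    return body_bytes.decode(charset, errors="replace")
```

Decoding the whole message as text before parsing, as a str-only parser has to, would require guessing the charset up front, which is the pre-parse decode issue raised at the start of the thread.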

Then on top of that, today we know a lot more about handling issues
like text/html content and MIME in general than when the Python 2
email module was designed.  New problems have arisen over the period
of Python 3 development, like "domain keys", which email doesn't
handle out of the box AFAIK, but email for Python 3 should IMHO.

Should Python 3 have been held back until email was fixed?  Dunno, but
I personally am very glad it was not; where I have a choice, I always
use Python 3 now, and have yet to run into a problem.  I expect that
to change if I can find the time to get involved in email and Mailman
3 development, of course.<wink>


Re: [Python-Dev] email package status in 3.X

Barry Warsaw
In reply to this post by Steve Holden-5
On Jun 18, 2010, at 11:32 AM, Steve Holden wrote:

>Lest the readership think that the PSF is unaware of this issue, allow
>me to point out that we have already partially funded this effort, and
>are still offering R. David Murray some further matching funds if he can
>raise sponsorship to complete the effort (on which he has made a very
>promising start).

Right, sorry, I didn't mean to imply the PSF isn't doing anything.  More that
we need a coordinated effort among all the companies and organizations that
use Python to help fund Python 3 library development (and not just in the
stdlib).  I think the PSF is best suited to coordinating and managing those
efforts, and through its tax-exempt status, collecting and distributing
donations specifically targeted to Python 3 work.

-Barry


Re: [Python-Dev] email package status in 3.X

lutz-10
In reply to this post by R. David Murray
Replying en masse to save bandwidth here...

Barry Warsaw <[hidden email]> writes:
> We know it, we have extensively discussed how to fix it, we have IMO a good
> design, and we even have someone willing and able to tackle the problem.  We
> need to find a sufficient source of funding to enable him to do the work it
> will take, and so far that's been the biggest stumbling block.  It will take a
> focused and determined effort to see this through, and it's obvious that
> volunteers cannot make it happen.  I include myself in the latter category, as
> I've tried and failed at least twice to do it in my spare time.

All understood, and again, not to disparage anyone here.  My
comments are directed to the development community at large
to underscore the grave p/r problems 3.X faces.

I realize email parsing is a known issue; I also realize that
most people evaluating 3.X today won't care that it is.  Most
will care only that the new version of a language reportedly
used by Google and YouTube still doesn't support CGI uploads
a year and a half after its release.  As an author, that's a
downright horrible story to have to tell the world.


"Stephen J. Turnbull" <[hidden email]> writes:
> Email, of course, is a big wart.  But guess what?  Python 2's email
> module doesn't actually work!

Yes it does (see next point).

> If you live in Kansas, sure, you can concentrate on dodging tornados
> and completely forget about Unicode and MIME and text/bogus content.
> For the rest of the world, though, the problem is not Python 3

Yes it is, and Kansas is a lot bigger than you seem to think.

I want to reiterate that I was able to build a feature rich
email client with the email package as it exists in 3.1.  This
includes support on both the receiving and sending sides for HTML,
arbitrary attachments, and decoding and encoding of both text
payloads and headers according to email, MIME, and Unicode/I18N
standards.  It's an amazingly useful package, and does work as is
in 3.X.  The two main issues I found have been recently fixed.  
It's unfortunate that this package is also the culprit behind CGI
breakage, but it's not clear why it became a critical path for so
much utility in the first place.
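For readers who want to see what that feature list looks like in code, here is a minimal sketch. It is not the client described above. It assumes a Python 3 release that includes the base64 fix mentioned earlier in the thread, and the helper names (`build_message`, `summarize`), the addresses, and the attachment filename are all hypothetical. It builds a message with plain-text and HTML bodies, a binary attachment, and a non-ASCII Subject header, then parses the generated text back.

```python
from email import message_from_string
from email.header import Header, decode_header, make_header
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(subject, body_text, body_html, attachment_bytes):
    """Build a multipart message: text + HTML alternatives and one binary part."""
    msg = MIMEMultipart("mixed")
    # Header() applies RFC 2047 encoding to a non-ASCII subject.
    msg["Subject"] = Header(subject, "utf-8")
    msg["From"] = "sender@example.com"      # placeholder address
    msg["To"] = "recipient@example.com"     # placeholder address

    alternatives = MIMEMultipart("alternative")
    alternatives.attach(MIMEText(body_text, "plain", "utf-8"))
    alternatives.attach(MIMEText(body_html, "html", "utf-8"))
    msg.attach(alternatives)

    # MIMEApplication base64-encodes the bytes by default.
    part = MIMEApplication(attachment_bytes)
    part.add_header("Content-Disposition", "attachment", filename="blob.bin")
    msg.attach(part)
    return msg


def summarize(wire_text):
    """Parse generated message text; return the decoded subject and attachments."""
    parsed = message_from_string(wire_text)
    subject = str(make_header(decode_header(parsed["Subject"])))
    attachments = [
        part.get_payload(decode=True)       # bytes after base64 decoding
        for part in parsed.walk()
        if part.get_filename() is not None
    ]
    return subject, attachments
```

Calling `build_message(...).as_string()` is the text-generation step that failed for binary parts before the fix; passing the result to `summarize` round-trips both the header and the attachment bytes.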

The package might not be aesthetically ideal, but to me it
seems that an utterly incompatible overhaul of this in the name
of supporting potentially very different data streams is a huge
functional overload.  And to those people in Kansas who live
outside the pydev clique, replacing it with something different
at this point will look as if an incompatible Python is already
incompatible with releases in its own line.  Why in the world
would anyone base a new project on that sort of thrashing?

For my part, I've had to add far too many notes to the upcoming
edition of Programming Python about major pieces of functionality
that worked in 2.X but no longer do in 3.X.  That's disappointing
to me personally, but it will probably seem a lot worse to the
book's tens of thousands of readers.  Yet this is the reality
that 3.X has created for itself.

> Should Python 3 have been held back until email was fixed?  Dunno, but
> I personally am very glad it was not; where I have a choice, I always
> use Python 3 now, and have yet to run into a problem.

I guess we'll just have to disagree on that.  IMHO, Python 3 shot
itself in the foot by releasing in half-baked form.  And the 3.0
I/O speed issue (remember that?) came very close to blowing its
leg clean off.

The reality out there in Kansas today is that 3.X is perceived as
so bad that it could very well go the way of POP4 if its story does
not improve.  I don't know what sort of Python world will be left
behind in the wake, but I do know it will probably be much smaller.


Steve Holden <[hidden email]> writes:

> Lest the readership think that the PSF is unaware of this issue, allow
> me to point out that we have already partially funded this effort, and
> are still offering R. David Murray some further matching funds if he can
> raise sponsorship to complete the effort (on which he has made a very
> promising start).
>
> We are also attempting to enable tax-deductible fund raising to increase
> the likelihood of David's finding support. Perhaps we need to think
> about a broader campaign to increase the quality of the Python 3
> libraries. I find it very annoying that the #python IRC group still has
> "Don't use Python 3" in its topic.  They adamantly refuse to remove it
> until there is better library support, and they are the guys who see the
> issues day in day out so it is hard to argue with them (and I don't
> think an autocratic decision-making process would be appropriate).

I'm all for people getting paid for work they do, but with all
due respect, I think this underscores part of the problem in
the Python world today.  If funding had been as stringent a
prerequisite in the 90s, I doubt there would be a Python today.
It was about the fun and the code, not the bucks and the
bureaucracy.  As far as I can recall, there was no notion of
creating a task force to get things done.

Of course, this may just be the natural evolutionary pattern of
human enterprises.  As it is today, though, the Python community
has a formal diversity statement, but it still does not have a
fully functional 3.X almost two years after the fact.  I doubt
that I'm the only one who sees the irony in that.

Again, I mean no disrespect to people contributing to Python
today on so many fronts, and I have no answers to offer here.
For better or worse, though, this is a personal issue to me too.
After spending much of the last 2 years updating the best selling
Python books for all the changes this group has seen fit to make,
I believe I can say with some authority that 3.X still faces a
very uncertain future.

--Mark Lutz  (http://learning-python.com, http://rmi.net/~lutz)




Re: [Python-Dev] email package status in 3.X

Michael Foord-5
On 18/06/2010 16:09, [hidden email] wrote:

> Replying en masse to save bandwidth here...
>
> Barry Warsaw<[hidden email]>  writes:
>    
>> We know it, we have extensively discussed how to fix it, we have IMO a good
>> design, and we even have someone willing and able to tackle the problem.  We
>> need to find a sufficient source of funding to enable him to do the work it
>> will take, and so far that's been the biggest stumbling block.  It will take a
>> focused and determined effort to see this through, and it's obvious that
>> volunteers cannot make it happen.  I include myself in the latter category, as
>> I've tried and failed at least twice to do it in my spare time.
>>      
> All understood, and again, not to disparage anyone here.  My
> comments are directed to the development community at large
> to underscore the grave p/r problems 3.X faces.
>
> I realize email parsing is a known issue; I also realize that
> most people evaluating 3.X today won't care that it is.  Most
> will care only that the new version of a language reportedly
> used by Google and YouTube still doesn't support CGI uploads
> a year and a half after its release.  As an author, that's a
> downright horrible story to have to tell the world.
>
>    

Really? How widely used is the CGI module these days? Maybe there is a
reason nobody appeared to notice...


> [snip...]
>> Should Python 3 have been held back until email was fixed?  Dunno, but
>> I personally am very glad it was not; where I have a choice, I always
>> use Python 3 now, and have yet to run into a problem.
>>      
> I guess we'll just have to disagree on that.  IMHO, Python 3 shot
> itself in the foot by releasing in half-baked form.  And the 3.0
> I/O speed issue (remember that?) came very close to blowing its
> leg clean off.
>
>    

Whilst I agree that there are plenty of issues to work on, and I don't
underestimate the difficulty of some of them, I think "half-baked" is
very much overblown. Whilst you have a lot to say about how much of a
problem this is I don't understand what you are suggesting be *done*?

Python 3.0 was *declared* to be an experimental release, and by most
standards 3.1 (in terms of the core language and functionality) was a
solid release.

Any reasonable expectation about Python 3 adoption predicted that it
would take years, and would include going through a phase of difficulty
and disappointment...

All the best,

Michael Foord

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer.



Re: [Python-Dev] email package status in 3.X

lutz-10
In reply to this post by R. David Murray
> Python 3.0 was *declared* to be an experimental release, and by most
> standards 3.1 (in terms of the core language and functionality) was a
> solid release.
>
> Any reasonable expectation about Python 3 adoption predicted that it
> would take years, and would include going through a phase of difficulty
> and disappointment...

Declaring something to be a turd doesn't change the fact that
it's a turd.  I have a feeling that most people outside this
list would have much rather avoided the difficulty and
disappointment altogether.

Let's be honest here; 3.X was released to the community in part
as an extended beta.  That's not a problem, unless you drop the
word "beta".  And if you're still not buying that, imagine the sort
of response you'd get if you tried to sell software that billed
itself as "experimental", and promised a phase of "disappointment".  
Why would you expect the Python world to react any differently?

> Whilst I agree that there are plenty of issues to work on, and I don't
> underestimate the difficulty of some of them, I think "half-baked" is
> very much overblown. Whilst you have a lot to say about how much of a
> problem this is I don't understand what you are suggesting be *done*?

I agree that 3.X isn't all bad, and I very much hope it succeeds.  And
no, I have no answers; I'm just reporting the perception from downwind.

So here it is: The prevailing view is that 3.X developers foisted things
on users that they did not fully work through themselves.  Unicode is
prime among these: for all the talk here about how 2.X was broken in
this regard, the implications of the 3.X string solution remain to be
fully resolved in the 3.X standard library to this day.  What is a
common Python user to make of that?

--Mark Lutz  (http://learning-python.com, http://rmi.net/~lutz)




Re: [Python-Dev] email package status in 3.X

PJ Eby
At 05:22 PM 6/18/2010 +0000, [hidden email] wrote:
>So here it is: The prevailing view is that 3.X developers foisted things
>on users that they did not fully work through themselves.  Unicode is
>prime among these: for all the talk here about how 2.X was broken in
>this regard, the implications of the 3.X string solution remain to be
>fully resolved in the 3.X standard library to this day.  What is a
>common Python user to make of that?

Certainly, this was my impression as well, after all the Web-SIG
discussions regarding the state of the stdlib in 3.x with respect to
URL parsing, joining, opening, etc.

To be honest, I'm waiting to see some sort of tutorial(s) for using
3.x that actually addresses these kinds of stdlib usage issues, so
that I don't have to think about it or futz around with
experimenting, possibly to find that some things can't be done at all.

IOW, 3.x has broken TOOOWTDI for me in some areas.  There may be
obvious ways to do it, but, as per the Zen of Python, "that way may
not be obvious at first unless you're Dutch".  ;-)

Since at the moment Python 3 offers me only cosmetic improvements
over 2.x (apart from argument annotations), it's hard to get excited
enough about it to want to muck about with porting anything to it, or
even trying to learn about all the ramifications of the changes.  :-(


Re: [Python-Dev] email package status in 3.X

Jesse Noller
On Fri, Jun 18, 2010 at 4:48 PM, P.J. Eby <[hidden email]> wrote:

> At 05:22 PM 6/18/2010 +0000, [hidden email] wrote:
>>
>> So here it is: The prevailing view is that 3.X developers foisted things
>> on users that they did not fully work through themselves.  Unicode is
>> prime among these: for all the talk here about how 2.X was broken in
>> this regard, the implications of the 3.X string solution remain to be
>> fully resolved in the 3.X standard library to this day.  What is a
>> common Python user to make of that?
>
> Certainly, this was my impression as well, after all the Web-SIG discussions
> regarding the state of the stdlib in 3.x with respect to URL parsing,
> joining, opening, etc.

Nothing is set in stone; if something is incredibly painful, or worse
yet broken, then someone needs to file a bug, bring it to this list,
or bring up a patch. This is code we're talking about - nothing is set
in stone, and if something is criminally broken it needs to be first
identified, and then fixed.

> To be honest, I'm waiting to see some sort of tutorial(s) for using 3.x that
> actually addresses these kinds of stdlib usage issues, so that I don't have
> to think about it or futz around with experimenting, possibly to find that
> some things can't be done at all.

I guess tutorial welcome, rather than patch welcome then ;)

> IOW, 3.x has broken TOOOWTDI for me in some areas.  There may be obvious
> ways to do it, but, as per the Zen of Python, "that way may not be obvious
> at first unless you're Dutch".  ;-)

What areas?  We need specifics, which can then be:

1> Shot down.
2> Turned into bugs, so they can be fixed.
3> Documented in the core documentation.

jesse