API thoughts

R. David Murray
This is a long email, for which my apologies.  I hope you all will
manage to find some time to read it and provide feedback, as it speaks
to fundamental design issues.

My subconscious seems to have been very busy last night, since in the
shower this morning it presented me with a whole bunch of thoughts about
the email API.  This was triggered, I think, by Barry's question about
__version__, my response that we might want an 'api version' declaration,
and some comments made during the email 5.1 discussion by Steven D'Aprano
(I think) about how Message is really the idealized representation of
an email message.

Let me start by saying that I think we can all agree that the fundamental
design of the email package is excellent:  we have a Parser which handles
taking input from the outside world and turning it into a Message, and
we have a Generator which handles taking a Message and turning it into
something the outside world can handle.  In the focus of the original
development the "outside world" was, sensibly, RFC 822/2822 encoded
byte streams.

The idealized message consists of some meta information (sender,
recipient, date, etc.) and a body.  The body, the content, can be
arbitrarily complex.  The purpose of the message is to convey some of
that meta information and all of the arbitrarily complex body content
from the sender to the recipient.

Everything else is an implementation detail :)

So, if we are writing a program and we want to compose such a message, it
makes sense that we can build up this idealized message from its component
pieces by attaching objects representing those pieces to the Message.
At that stage we care nothing about how it needs to be transformed to
get from point A to point B.

If we want to look at a message, we again don't care about how it was
transformed to get from point A to point B, we just want to be able to
access the content in its original form.

In today's "outside world" we have more to worry about than just
RFC822/2822/5322.  The "outside world" could be an http transmission
medium.  It could (if we re-design things right:) be a SIP session.
It could be a disk-based data store, where an RFC822-like message format
is being used to store data.  I'm sure there are other contexts as well.

So keeping the external representation concerns separate from the
idealized message model makes sense.

The email4/5 API doesn't do this as successfully as it could, especially
in a Python3 context.  The application program dealing with the idealized
message doesn't really care what character set any given piece of a header
is encoded in, it really just wants to deal with complete unicode strings.
The application program also really doesn't care about the MIME type of a
piece of content, it just wants to manage an object that has methods that
allow it to manipulate that image, or that audio file, or what have you.
Of course, it also needs to know what type of object it is handling in an
incoming message, but the mime type is only one piece of the information
that determines that (albeit usually the most important one).

(Yes, some applications *do* care about internal details...but those
are special cases and we can provide additional APIs that allow access
at that level for those applications that need it, as we have discussed
previously.)

We propose to create a new API to make all of this easier for
the application programmer.  What doesn't change is the fundamental
structure of the package:  a message in some transmission format is
fed to a Parser, which produces a Message object.  A Message object
can be fed to a Generator, which produces a transmission format object.
Now, I lost sight of this a bit while I was working on the email6 header
classes, as Barry at least will remember, but I do think it is important,
and I want to keep it in the forefront of my mind as I work on adding
the proposed policy framework.

So, and here is the point of this email, how does the policy framework
integrate into this design?

I said that the policy pulls together the tunable bits of the email
package's algorithms.  What does this mean?  What are the tunable
bits?  Here are some candidates:

    maximum header line length on serialization
    line ending character on serialization
    whether or not to raise an exception if a defect is encountered
        during parsing
    how much transformation of untouched original data is permissible
        when re-serializing a message
    can the serialized form contain any non-ASCII data?
    what classes to use to represent various MIME types.
   
These are all decisions that can be made one way or another by an
application program using the current package.  Often, however, modifying
the default is not easy or convenient.  Note that the last one can only
be decided by an application program when constructing a message, not
when parsing one.

Here are some other things that it might be useful to be able to
control:

    what string to use as the continuation whitespace when needing
        to add some
    what classes to use to represent various structured headers
    what exactly counts as a defect
    should headers be RFC2047 encoded on serialization, or
        should another encoding be used?[*]

[*] There are current real-world use cases for this:  there are nntp
    servers that use utf-8 for headers, and the http protocol uses
    latin-1 (or sometimes, I think, utf-8)

This list breaks down into items that affect the Parser, ones that affect
the Generator, and ones that affect both the Parser and the Message.
(Well, the "how much transformation" affects all three in the sense that
the data has to be preserved by both the Parser and the Message in order
for the Generator to be able to implement it, but I think we can take
it as a given that we are going to preserve that data.)

The pieces that are shared between the Parser and the Message are really
about the Message:  how are the sub-objects represented?  How are the
structured headers represented?  So we could consider that the Parser
is a *consumer* of those pieces of policy, but that they are defined on
the Message, not on the Parser.

What this means is that the policies controlling each of the major
components (parser, message, generator) are in principle independent.

The design of the policy framework envisions having, for example, an
'HTTP' policy that would, say, expect and generate latin-1 encoded
headers, and generate headers without line breaks, using CRLF for the
line termination.  Initially I thought one would declare a policy
and that the Message object would remember that policy, but that you
could override it when, say, calling the generator.
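
As a rough illustration of the kind of object described here, a minimal
sketch follows.  The class name IOPolicy, its field names, and the defaults
are all hypothetical; none of this is an existing email package API.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class IOPolicy:
    """Hypothetical bundle of the tunable knobs listed above."""

    max_line_length: Optional[int] = 78  # None means "never wrap headers"
    linesep: str = "\n"                  # line ending used on serialization
    raise_on_defect: bool = False        # raise instead of recording defects
    header_charset: str = "ascii"        # charset expected/produced in headers

    def clone(self, **overrides):
        """Return a copy with some knobs changed; the original is untouched."""
        return replace(self, **overrides)


# An 'HTTP' policy as described: latin-1 headers, no line breaks, CRLF.
HTTP = IOPolicy(max_line_length=None, linesep="\r\n", header_charset="latin-1")

# Overriding a single knob for one call, without mutating the shared HTTP policy.
strict_http = HTTP.clone(raise_on_defect=True)
```

Making the policy immutable and deriving variants with clone() is one way
to let a caller override a setting for a single generator call while the
named policy stays a coherent, shareable unit.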

Re-thinking it now, though, I think there are actually two distinct
components here: the I/O policy(s), and the Message construction policy.
That is, the things that the HTTP policy cares about are all Parser or
Generator controls.  The only thing the Message (should) care about is
how to represent its components.  The Message is thus independent of any
policy *except* the header/mime classes, while the Parser and Generator
can be consumers of the header/mime class policy used to construct the
Message.  It nevertheless makes sense to group the parser and generator
policy controls together, since that is how we conceptually think of them
('HTTP' implies a coherent set of input and output policies).

So, I think the "policy framework" is actually two things:  the
header/mime-types registry, and the Parser/Generator policies.  Let's have
'policy' refer to only the I/O policy, and call the other the email
class registry.

This narrower definition of policy is a straightforward enhancement
of the current API.  It makes these "knobs" more easily controlled,
and makes it easier to add new knobs without complicating the API.
I propose that I write up this policy API as a distinct proposal/patch
(with the work I've already done, this is more than half completed).
This would add policy keywords to the Parser and Generator classes,
and probably to the as_string method of Message.
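
For comparison, the email package in recent Python 3 releases does accept
a policy keyword in these three places.  The sketch below uses that released
API; whether it matches this proposal in every detail is not something this
thread establishes, and the sample message is made up.

```python
import io
from email import policy
from email.generator import Generator
from email.parser import Parser

# Made-up sample message, for illustration only.
RAW = "From: a@example.com\nTo: b@example.com\nSubject: hello\n\nbody\n"

# The parser consumes a policy...
msg = Parser(policy=policy.default).parsestr(RAW)

# ...and the generator can be handed a different one for output.
buf = io.StringIO()
Generator(buf, policy=policy.SMTP).flatten(msg)
wire = buf.getvalue()  # serialized using the SMTP policy's CRLF line endings

# as_string takes the same keyword.
also_wire = msg.as_string(policy=policy.SMTP)
```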

The real meat of email6, then, is the header/mime-types registry, and
the changes in the API of the resulting Message objects.  The parser
currently accepts a _factory argument that specifies the object to be used
in creating the Message.   I propose that we deprecate this argument,
but that any code using it gets the old behavior of the parser (using
_factory to create the class for any new sub-objects).  Then we introduce
a new argument, 'factory'.  This new argument would expect a callable
that takes a mime-type as its argument, and returns an appropriate class.
The parser would be re-written so that it could use this factory, and
the backward compatibility case would be trivial to implement.
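
A minimal sketch of what such a 'factory' callable could look like, assuming
a simple dictionary-backed registry.  The class names and the lookup order
(full MIME type first, then main type, then a generic fallback) are
hypothetical choices made for the example, not part of the proposal.

```python
class Part:
    """Fallback class for content the registry does not know about."""


class TextPart(Part):
    """Stand-in for a class that manages textual content."""


class ImagePart(Part):
    """Stand-in for a class that manages image content."""


# Hypothetical registry keyed by full MIME type or by main type alone.
_REGISTRY = {"text": TextPart, "image": ImagePart}


def factory(mime_type):
    """Return the class to use for a part of the given MIME type."""
    mime_type = mime_type.lower()
    maintype = mime_type.split("/", 1)[0]
    return _REGISTRY.get(mime_type) or _REGISTRY.get(maintype, Part)
```

With this, factory("image/png") yields ImagePart and an unregistered type
such as "application/pdf" falls back to Part.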

In theory the classes returned by the registry/factory are arbitrary,
but in practice we will need to define the minimal API that they
should provide.  By specifying the API separately from the concrete
implementation in email6, we will allow third parties to write classes
that can play well with programs expecting to operate on email6 Messages.
This will allow, for example, an MUA to provide custom classes to enhance
presentation, while still allowing the message to be submitted to smtplib
for transmission.

I guess I'm proposing, then, that there be an API version definition,
with two values as of Python3.3: email5 API, and email6 API.  We'll
figure out how we name and interrogate these formally later.

The Header registry in this vision is accessed through the Message class.
I have various thoughts about how this will work, but I'm going to leave
those for later, since this email is already long enough.  I also have
some additional thoughts about backward compatibility, but it is going
to require some experimentation to see if they are realistic.

--David
_______________________________________________
Email-SIG mailing list
[hidden email]
Your options: http://mail.python.org/mailman/options/email-sig/lists%40nabble.com

Re: API thoughts

Glenn Linderman-3
On 3/1/2011 12:40 PM, R. David Murray wrote:
> This is a long email, for which my apologies.  I hope you all will
> manage to find some time to read it and provide feedback, as it speaks
> to fundamental design issues.

Indeed.  Good to discuss before designing with ready-mix.

> Everything else is an implementation detail :)

Agreed.

> We propose to create a new API to make all of this easier for
> the application programmer.

YES!!

> [*] There are current real-world use cases for this:  there are nntp
>     servers that use utf-8 for headers, and the http protocol uses
>     latin-1 (or sometimes, I think, utf-8)

All the tunables listed are relevant.  The HTTP protocol standard claims to use Latin-1 + RFC 2047 encoding for non-Latin-1 characters; in practice, the browser implementations apparently use nearly _any_ encoding for headers!!!  For <form> responses, when there is actually user-specified data involved, they use the encoding defined for the page containing the form, as the encoding of the MIME headers sent back.  The "standard headers" seem to be ASCII, and somewhat immune to choice of encoding, except perhaps for those few encodings that are not ASCII supersets. (I have no clue how such are handled, if they are.  Anyone want to write an EBCDIC page containing a <form> for testing?)

This is useful, as it reduces the amount of character escaping likely to be required: the designer of the page chooses a character set that can represent the page and is likely in the language of the intended recipient, who is likely to fill out the form using the same language.

It would be more useful, if the browsers included a(n ASCII) header that specified the encoding of subsequent headers: they do not.  Therefore, the server that receives the headers must somehow "know" the proper encoding.  For the situation where the CGI (or equivalent) script both generates the page containing the <form> and receives the form data, this is simple.  For the situation where the same web application designer creates the page containing the <form> and the CGI receiving the form data, and explicitly or implicitly declares the same encoding for both, this is functional, but there is the danger of someone changing the static pages to conform to a new standard encoding without realizing the consequences on the associated CGI scripts.  It is also rather hard to create "form filling" applications that can send form data to a server bypassing the access of the form itself... such applications must also "know" the proper encoding, and such applications are much more likely to be generated outside the realm of the original development environment, and much less likely to be involved in any planning to change encodings inside the application <form>s and CGIs.

To support reading byte-stream HTTP headers, therefore, it is critical that the email API accept an encoding from the application which "knows" the encoding; presently cgi.py has to pre-decode incoming headers because email does not have such a parameter.  On the other hand, maybe cgi.py shouldn't use email header parsing at all... since browsers don't use RFC 2047 encoding in practice, the parsing of headers without such is straightforward.
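
The pre-decoding workaround described above can be sketched as follows.  The
helper name parse_http_headers and the sample header block are hypothetical;
the caller is assumed to "know" the encoding, for example because it also
generated the page containing the form.

```python
from email.parser import HeaderParser


def parse_http_headers(raw, encoding):
    """Decode a raw header block with a caller-supplied charset, then parse.

    The email parser is never told the encoding; the bytes are turned into
    text first and only then handed over.
    """
    text = raw.decode(encoding)
    return HeaderParser().parsestr(text)


# Made-up header block as a browser might send it, encoded as latin-1.
raw = (
    'Content-Disposition: form-data; name="city"\r\n'
    "X-City: Zürich\r\n"
    "\r\n"
).encode("latin-1")

headers = parse_http_headers(raw, "latin-1")
```

An encoding parameter on the parser itself would remove the need for the
separate decode step.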

Further, HTTP data streams can be extremely large, and thus time-consuming to obtain over the wire.  CGI applications cannot afford to keep large blocks of data in RAM during receipt, thus if email wishes to support CGI, it needs features for placing large blocks of data on disk instead of in RAM during the parsing phase; cgi.py presently has to preparse headers, to separate them from the data streams, which it then handles on its own, because of this issue.

Hence, cgi.py does sufficient preparsing and private handling of HTTP data streams, that it seems that the only real benefit it gains from using email at all, is the handling of the complex RFC 2047 decoding... which in practice isn't used in HTTP data streams!

In any case, if email wants to promulgate itself as the "one true way" to process HTTP data streams, as well as SMTP and NNTP data streams, then it needs to address the issues above.

There is, by the way, room for improvement in the cgi.py handler for HTTP data streams; presently all large MIME objects are written to disk (but small ones are kept as string or byte streams), but it isn't necessarily the right disk, and the data must then be again copied, byte by byte, to its final file system location.  I see that as abhorrent overhead.  There is presently no provision for hooks that ask the CGI application what to do with the data being received, while it is being received, nor for policies to assist with better heuristics, with the goal in mind that a properly and completely received MIME object could then be renamed to its final location rather than copied.

> I guess I'm proposing, then, that there be an API version definition,
> with two values as of Python3.3: email5 API, and email6 API.  We'll
> figure out how we name and interrogate these formally later.

Question: While it is pretty clear that enhanced behaviors are required to benefit new applications that use email, and while some new APIs may be incompatible with some existing APIs, might it be possible to design the new API, and then build a compatibility layer that looks like the old API on top?  Such that there would be policies for the new APIs that would work like the old APIs to ease the implementation of such a layer?  I'm not sure I fully understand the use of _factory or factory parameters, but for APIs that have _factory and grow a factory, could not the presence of which parameter imply any variant functionality?

(OK, this question comes after not looking at the email API during all the GSOC and your implementation efforts since the last big round of discussion, but your proposals here seem to sound like it would be more possible with your current thinking than with your previous thinking.)

> The Header registry in this vision is accessed through the Message class.
> I have various thoughts about how this will work, but I'm going to leave
> those for later, since this email is already long enough.  I also have
> some additional thoughts about backward compatibility, but it is going
> to require some experimentation to see if they are realistic.

Consider me an interested observer; I'll enjoy reading, thinking, and commenting about these ideas too, but sadly am unlikely to implement an email client this year :(  But I have aspirations to do so, because none of the existing email clients exactly suit my preferences... (everyone should write an editor and an email client, no?  I've done the former several times... what I want, though, is emacs-python, instead of emacs-lisp).

Glenn

Re: API thoughts

R. David Murray
On Tue, 01 Mar 2011 13:58:50 -0800, Glenn Linderman <[hidden email]> wrote:
> To support reading byte-stream HTTP headers, therefore, it is critical
> that the email API accept an encoding from the application which "knows"
> the encoding; presently cgi.py has to pre-decode incoming headers
> because email does not have such a parameter.  On the other hand, maybe
> cgi.py shouldn't use email header parsing at all... since browsers don't
> use RFC 2047 encoding in practice, the parsing of headers without such
> is straightforward.

I think it could make sense for the default input character set to be
a policy parameter for the parser.  Maybe not in the first version,
though :)

Yes, it is simple(r) to parse headers if you don't have to worry about
RFC2047, but why duplicate code if you don't need to?  This assumes,
of course, that email6 does what cgi.py and similar programs need,
but I'll try to keep my eye on that.

> Further, HTTP data streams can be extremely large, and thus
> time-consuming to obtain over the wire.  CGI applications cannot afford
> to keep large blocks of data in RAM during receipt, thus if email wishes
> to support CGI, it needs features for placing large blocks of data on
> disk instead of in RAM during the parsing phase; cgi.py presently has to
> preparse headers, to separate them from the data streams, which it then
> handles on its own, because of this issue.

It is already in the plan to add disk caching support to the base email
API, so this will get addressed.  You may even be the one who suggested
designing the API as a general "storage" API so that different back-ends
can be hooked up.  In any case, that's what I've got in mind.

> There is, by the way, room for improvement in the cgi.py handler for
> HTTP data streams; presently all large MIME objects are written to disk
> (but small ones are kept as string or byte streams), but it isn't
> necessarily the right disk, and the data must then be again copied, byte
> by byte, to its final file system location.  I see that as abhorrent
> overhead.  There is presently no provision for hooks that ask the CGI
> application what to do with the data being received, while it is being
> received, nor for policies to assist with better heuristics, with the
> goal in mind that a properly and completely received MIME object could
> then be renamed to its final location rather than copied.

I think the hookable storage back end addresses this, but the concrete
implementation (eventually) provided by email ought to support it as well.
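
One way a storage back end could avoid the copy is to spool the data onto
the destination filesystem from the start and rename it once it is complete.
This is only a sketch under that assumption: DiskStore and its save() method
are hypothetical names, not part of cgi.py or email.

```python
import os
import tempfile


class DiskStore:
    """Hypothetical storage hook that spools a part next to its destination."""

    def __init__(self, dest_dir):
        self.dest_dir = dest_dir

    def save(self, chunks, final_name):
        """Write chunks to a temp file in dest_dir, then rename into place."""
        fd, tmp_path = tempfile.mkstemp(dir=self.dest_dir)
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                tmp.write(chunk)
        final_path = os.path.join(self.dest_dir, final_name)
        # Same directory, hence same filesystem: a rename, not a byte copy.
        os.replace(tmp_path, final_path)
        return final_path
```

Because the temporary file is created in the destination directory, the
final step is a rename, and a part that was only partially received never
appears under its final name.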

> > I guess I'm proposing, then, that there be an API version definition,
> > with two values as of Python3.3: email5 API, and email6 API.  We'll
> > figure out how we name and interrogate these formally later.
>
> Question: While it is pretty clear that enhanced behaviors are required
> to benefit new applications that use email, and while some new APIs may
> be incompatible with some existing APIs, might it be possible to design
> the new API, and then build a compatibility layer that looks like the
> old API on top?  Such that there would be policies for the new APIs that
> would work like the old APIs to ease the implementation of such a

Yes, this is what was behind my comment that I had further ideas
about backward compatibility.  One way is what Barry and I already
discussed:  a wrapper to put around an email6 object that would support
the email5 API.  Another approach is to have the email6 message itself
support the legacy API.  I haven't looked at every method, but most
of them would be supportable.  The tricky bit is headers:  an email6
Message will return Header objects, whereas an email5 application will
generally expect to get strings.  (It shouldn't!  But many will.  Even the
email package itself expects to get strings when it accesses headers.)
My wild thought at this point is:  what if Header subclassed string?
With the exception of a few structured headers such as address headers,
this might actually work pretty well.  But experimentation with some
at least semi-real-world examples would be needed to prove out the
concept.
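
The "Header subclasses string" idea can be tried out with a few lines.  This
is a minimal sketch; the class body and the extra 'name' attribute are
assumptions made for the experiment, not the email6 design.

```python
class Header(str):
    """Hypothetical header value that is also a plain string."""

    def __new__(cls, value, name=""):
        self = super().__new__(cls, value)
        self.name = name  # extra information for email6-aware code
        return self


subject = Header("Lunch on Friday?", name="Subject")

# email5-style code keeps treating the value as a string...
lowered = subject.lower()

# ...while email6-aware code can reach the additional attribute.
field = subject.name
```

Note that str methods such as lower() return plain str objects, so the extra
attributes are lost as soon as the value is transformed.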

> layer?  I'm not sure I fully understand the use of _factory or factory
> parameters, but for APIs that have _factory and grow a factory, could
> not the presence of which parameter imply any variant functionality?

I'm not sure what you are asking here.  In what I outlined for the parser
API, you'd get an email5-API object if you used _factory or nothing,
and an email6 API object if you used factory, so yes, in that sense
the parameter determines the API.  But what about a library that is
accepting a Message object?  It needs a way to detect whether or not
it has been passed an email5 API message, or an email6 one.
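
How the version is named and interrogated is still open, so the following is
purely a sketch of the detection problem.  The attribute name
email_api_version and both stand-in classes are hypothetical.

```python
class LegacyMessage:
    """Stand-in for an email5-style message with no version marker."""


class NewMessage:
    """Stand-in for an email6-style message that declares its API."""

    email_api_version = 6  # hypothetical attribute name


def api_version(msg):
    """Return the API generation a message object claims to implement.

    Objects without the (hypothetical) marker are treated as email5.
    """
    return getattr(msg, "email_api_version", 5)
```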

> (OK, this question comes after not looking at the email API during all
> the GSOC and your implementation efforts since the last big round of
> discussion, but your proposals here seem to sound like it would be more
> possible with your current thinking than with your previous thinking.)

Well, in my previous thinking I was intending on doing much the same thing
as far as backward compatibility went (having a policy that provided an
email5 compatible object), I just hadn't talked about it much :)  The
biggest difference now is that email5 will be the default, at least in
the Python3.3 release.

> Consider me an interested observer; I'll enjoy reading, thinking, and
> commenting about these ideas too, but sadly am unlikely to implement an
> email client this year :(  But I have aspirations to do so, because none
> of the existing email clients exactly suit my preferences... (everyone
> should write an editor and an email client, no?  I've done the former
> several times... what I want, though, is emacs-python, instead of
> emacs-lisp).

Thanks for your attention and comments.  I haven't implemented an editor
yet (VIM + Python has been good enough so far), but I have implemented
parts of an email client, and intend to finish that project as part of
working on email6, as an API test bed.

--David

Re: API thoughts

R. David Murray
On Tue, 01 Mar 2011 17:59:10 -0500, "R. David Murray" <[hidden email]> wrote:

> On Tue, 01 Mar 2011 13:58:50 -0800, Glenn Linderman <[hidden email]> wrote:
> > To support reading byte-stream HTTP headers, therefore, it is critical
> > that the email API accept an encoding from the application which "knows"
> > the encoding; presently cgi.py has to pre-decode incoming headers
> > because email does not have such a parameter.  On the other hand, maybe
> > cgi.py shouldn't use email header parsing at all... since browsers don't
> > use RFC 2047 encoding in practice, the parsing of headers without such
> > is straightforward.
>
> I think it could make sense for the default input character set to be
> a policy parameter for the parser.  Maybe not in the first version,
> though :)

Just to clarify:  in the first version I check in.  I'd expect to decide
about that part of the API not too far into the development process,
and certainly well before 3.3.

--David

Re: API thoughts

R. David Murray
In reply to this post by R. David Murray
On Tue, 01 Mar 2011 16:55:58 -0800, Glenn Linderman <[hidden email]> wrote:

> On 3/1/2011 2:59 PM, R. David Murray wrote:
> > On Tue, 01 Mar 2011 13:58:50 -0800, Glenn Linderman<[hidden email]>  wrote:
> Another reason is if the existing code handles many cases that are not
> needed, and cannot be optimized for the case that is needed.  A "fast
> path" reimplementation can eliminate the cases that are not needed, and
> speed the result.  That, of course, depends on the internals of the
> parsing of headers in the email package, and how much overhead RFC 2047
> adds to that, which I haven't investigated and don't know.  Happily,
> when uploading big files, headers are a  tiny fraction of time spent.  
> Sadly, when using large fill-in-the-blanks forms, header parsing can be
> a significant fraction of the time spent.

I think the overhead if there are no encoded words in the header should
be minimal (probably a re scan, but possibly not even that, we'll see).
This could also be controlled by the policy (ie: the HTTP policy could
cause the header parser to skip the check-for-rfc2047-encoded-words
step).
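
A sketch of that fast path, assuming a boolean knob named check_rfc2047
(hypothetical) and a regular-expression pre-scan.  The decoding itself uses
the existing email.header functions.

```python
import re
from email.header import decode_header, make_header

# Cheap pre-check for something shaped like an RFC 2047 encoded word.
_ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")


def header_text(value, check_rfc2047=True):
    """Return the header value as text, decoding encoded words if asked to.

    With check_rfc2047=False (as an HTTP policy might request) the value is
    returned untouched and no scan is done at all.
    """
    if not check_rfc2047 or not _ENCODED_WORD.search(value):
        return value
    return str(make_header(decode_header(value)))
```

For a header with no encoded words the only cost is the single regular
expression search, and the policy knob removes even that.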

> Presently, the cgi.py stream API only provides an open-file-like handle
> to the data... so it can be read, written, and sought, but not assigned
> to a specific filesystem, renamed, or moved using os facilities.  So a
> broader API seems to be necessary for cgi.py; if that were available in
> email, that would be helpful for cgi.py.

Yeah, additions to the cgi API are probably required to support this
properly.

> Hmm.  And while it might be more complex to handle structured headers,
> in fact they come in as character sequences, so a mapping to string is
> not impossible.  The real issue is if those headers had another API in
> email5 (I could look that up, I guess), but perhaps that API could also
> be supported along with a subclass of string.

They don't.  The issue is that what we would like is for the email6 API
for the address header to be that it looks like a list of Address objects.
So msg['To'][0] would yield an address object.  But if we also want the
header to look like a string, that won't work, because as a string that
should yield the first character of the body of the header.

Now, a sensible application would process the list of addresses in a To
header by passing it to util.getaddresses, but you can bet that there
are applications that don't do that.

A compromise would be to have an 'addresses' method that returned the
list of addresses.  Perhaps this would even be sensible in the context of
email6 by itself:  it would mean that all headers had a uniform base API
(they act like strings) and all structured information is accessed via
special methods.

> OK, what I was asking boils down to if the Message object can support
> both APIs, the application doesn't need to care.  New applications would
> probably want to use the new APIs, of course.  But they could be passed
> between old and new applications (or fragments thereof) if they support
> both.  It certainly wouldn't hurt to introduce the concept of a version
> for the object, although in itself, that would only be accessible via a
> new API, so old applications wouldn't think to use it...

Yeah, that would be an ideal world.  Let's see how close we can get :)

--David

Re: API thoughts

R. David Murray
On Tue, 01 Mar 2011 18:36:47 -0800, Glenn Linderman <[hidden email]> wrote:

> On 3/1/2011 5:45 PM, R. David Murray wrote:
> > On Tue, 01 Mar 2011 16:55:58 -0800, Glenn Linderman<[hidden email]>  wrote:
> >> On 3/1/2011 2:59 PM, R. David Murray wrote:
> >>> On Tue, 01 Mar 2011 13:58:50 -0800, Glenn Linderman<[hidden email]>   wrote:
> >> Hmm.  And while it might be more complex to handle structured headers,
> >> in fact they come in as character sequences, so a mapping to string is
> >> not impossible.  The real issue is if those headers had another API in
> >> email5 (I could look that up, I guess), but perhaps that API could also
> >> be supported along with a subclass of string.
> > They don't.  The issue is that what we would like is for the email6 API
> > for the address header to be that it looks like a list of Address objects.
> > So msg['To'][0] would yield an address object.  But if we also want the
> > header to look like a string, that won't work, because as a string that
> > should yield the first character of the body of the header.
> >
> > Now, a sensible application would process the list of addresses in a To
> > header by passing it to util.getaddresses, but you can bet that there
> > are applications that don't do that.
> >
> > A compromise would be to have an 'addresses' method that returned the
> > list of addresses.  Perhaps this would even be sensible in the context of
> > email6 by itself:  it would mean that all headers had a uniform base API
> > (they act like strings) and all structured information is accessed via
> > special methods.
>
> While  msg['To']  producing a structured result might not be possible
> when subclassing string, you mention one possible alternative, an
> additional method... seems like you mean msg['To'].addresses()?  It
> would also be possible to make  msg.p['To'] for parsed/structured
> results.  I'm not sure which would be easier to implement, or more
> flexible under the covers to do caching of parsed/structured results.  
> Of course there are several headers dealing with lists of addresses, as
> you are well aware, so  msg.addresses() wouldn't work without some
> specification of the header.

Yes, exactly msg['To'].addresses (might as well use a property).
I think I prefer this to a separate retrieval method, since not all
headers are structured headers, and it is not clear what the "parsed"
version of a non-structured header would be (a plain string?).
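
The compromise can be sketched like this: the header behaves as a string,
and the structured view is reached through a property.  The AddressHeader
class is hypothetical, and plain (name, address) tuples from the existing
email.utils.getaddresses stand in for the Address objects discussed above.

```python
from email.utils import getaddresses


class AddressHeader(str):
    """Hypothetical address header: a string with a structured view."""

    @property
    def addresses(self):
        """Return the parsed addresses as (realname, address) tuples."""
        return getaddresses([str(self)])


to = AddressHeader("Alice <alice@example.com>, bob@example.com")

first_char = to[0]      # string behaviour: the first character, 'A'
parsed = to.addresses   # structured behaviour: a list of address tuples
```

Indexing keeps its string meaning, so email5-style code is not surprised,
and code that wants the addresses asks for them explicitly.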

--David

Re: API thoughts

Steffen Daode Nurpmeso-2
In reply to this post by R. David Murray
I've also read the updated EMAIL-SIG DesignThoughts.

But if "what goes in .defects[]" will be configurable i would hope
for a generic is_malformed() and maybe is_processable() or the
like, i.e. state versus (translatable?) user-info.
(The more i think about it the more i agree with David (i hope
i don't lie about that) that it's a waste of time to try to
convert malformed data to a compliant state, especially if the
package is - by design - capable to spit out the data the very
same way it came in.  Someone will take care - and throw it away.)

I also go for lazy parsing when designing an email package.
(Pluggable) File-based backend.

Besides that all of this, and including the things David explained
in the issue tracker, sounds like smoked tofu to me. ;-)

Unfortunately my non-hate mail seems to have been mistreated as
spam 8-}, therefore i wrote all of the above just to thank David
once again for making the email and mailbox packages usable
already in Python 3.2.  Thanks.

Re: API thoughts

Barry Warsaw
In reply to this post by R. David Murray
On Mar 01, 2011, at 03:40 PM, R. David Murray wrote:

>So, and here is the point of this email, how does the policy framework
>integrate into this design?
[...]

>This list breaks down into items that affect the Parser, ones that affect
>the Generator, and ones that affect both the Parser and the Message.
>(Well, the "how much transformation" affects all three in the sense that
>the data has to be preserved by both the Parser and the Message in order
>for the Generator to be able to implement it, but I think we can take
>it as a given that we are going to preserve that data.)
>
>The pieces that are shared between the Parser and the Message are really
>about the Message:  how are the sub-objects represented?  How are the
>structured headers represented?  So we could consider that the Parser
>is a *consumer* of those pieces of policy, but that they are defined on
>the Message, not on the Parser.
>
>What this means is that the policy controlling each of the major
>components (parser, message, generator) are in principle independent.
[...]
>Re-thinking it now, though, I think there are actually two distinct
>components here: the I/O policy(s), and the Message construction policy.
[...]
>So, I think the "policy framework" is actually two things:  the
>header/mime-types registry, and the Parser/Generator policies.  Let's have
>'policy' refer to only the I/O policy, and call the other the email
>class registry.

+1

This makes a lot of sense, and I'm glad you've been thinking about this more
deeply than I have since we last bandied it about.  At the time, I thought a
single policy hierarchy would probably be fine, but you've laid out a good
argument for keeping them separate, and in fact not even calling the latter a
'policy'.  Here's another distinction:

Policy objects should be composable.  This would allow for a standard library
of policies that could be mixed and matched for specific applications, and
might even include some higher level policies like 'CGI' or 'NNTP'.  E.g. my
applications might combine a standard 'don't-check-rfc-2047' policy with a
'use-only-CRLF' and 'die-on-defect'.

I wonder too, how sophisticated policy objects really need to be.  Are they
just bags of attributes with some defaults, properties for access, maybe some
validation, and composability?

As for the registry, I don't think you need anything near that.  You just need
to say "when you see this mime-type, create an object using this callable".
Multiple registrations might be useful, but I don't think composability is.
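
Something as small as the following would do, as a rough sketch.  The names
registry, register, create and TextPart are made up for illustration; only
email.message.Message is real.

```python
from email.message import Message


class TextPart(Message):
    """Hypothetical class for text/plain parts."""


registry = {}  # maps a mime-type string to a callable that builds the object


def register(mime_type, factory):
    registry[mime_type] = factory


def create(mime_type):
    # Fall back to a plain Message for unregistered types.
    return registry.get(mime_type, Message)()


register('text/plain', TextPart)
part = create('text/plain')
other = create('application/octet-stream')
```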

>The real meat of email6, then, is the header/mime-types registry, and
>the changes in the API of the resulting Message objects.  The parser
>currently accepts a _factory argument that specifies the object to be used
>in creating the Message.   I propose that we deprecate this argument,
>but that any code using it gets the old behavior of the parser (using
>_factory to create the class for any new sub-objects).  Then we introduce
>a new argument, 'factory'.  This new argument would expect a callable
>that takes a mime-type as its argument, and returns an appropriate class.
>The parser would be re-written so that it could use this factory, and
>the backward compatibility case would be trivial to implement.

+1.  The underscore name in _factory is a historical wart that's not needed
any more.  I'm not even sure it makes much sense any more in Message
subclasses.  It *does* still make sense in e.g. add_header() where there's a
potential name collision between the arguments and the **params.  We should
evaluate these more carefully given today's API and clean this up if possible
(modulo all b/c considerations).
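
To illustrate the add_header() case with the existing API: because the
signature is add_header(_name, _value, **_params), a MIME parameter that
happens to be called "name" can be passed as a keyword without clashing
with the positional arguments.

```python
from email.message import Message

msg = Message()
# 'name' here is a MIME parameter, not the header name; the leading
# underscores on _name and _value are what keep the two apart.
msg.add_header('Content-Disposition', 'attachment', name='report.txt')
print(msg['Content-Disposition'])
```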

>In theory the classes returned by the registry/factory are arbitrary,
>but in practice we will need to define the minimal API that they
>should provide.  By specifying the API separately from the concrete
>implementation in email6, we will allow third parties to write classes
>that can play well with programs expecting to operate on email6 Messages.
>This will allow, for example, an MUA to provide custom classes to enhance
>presentation, while still allowing the message to be submitted to smtplib
>for transmission.

+1

>I guess I'm proposing, then, that there be an API version definition,
>with two values as of Python3.3: email5 API, and email6 API.  We'll
>figure out how we name and interrogate these formally later.
>
>The Header registry in this vision is accessed through the Message class.
>I have various thoughts about how this will work, but I'm going to leave
>those for later, since this email is already long enough.  I also have
>some additional thoughts about backward compatibility, but it is going
>to require some experimentation to see if they are realistic.

Cool.  Really great stuff David.

-Barry


Re: API thoughts

Barry Warsaw
In reply to this post by Glenn Linderman-3
On Mar 01, 2011, at 01:58 PM, Glenn Linderman wrote:

>(everyone should write an editor and an email client, no?

Is there really any difference?

http://www.catb.org/~esr/jargon/html/Z/Zawinskis-Law.html

That's also the proof that the email package is the most important one in
Python because it will eventually be used by every Python application ever
written.

-Barry


Re: API thoughts

Barry Warsaw
In reply to this post by R. David Murray
On Mar 01, 2011, at 08:45 PM, R. David Murray wrote:

>They don't.  The issue is that what we would like is for the email6 API
>for the address header to be that it looks like a list of Address objects.
>So msg['To'][0] would yield an address object.  But if we also want the
>header to look like a string, that won't work, because as a string that
>should yield the first character of the body of the header.

Here's where things get really interesting because you won't actually know
what msg[header][0] could return for any arbitrary value of 'header'.

For structured headers like To, msg['To'] can return an ordered sequence of
address objects, but what about msg['Received'] or msg['X-Happy-Fun-Ball']?
The same will go for anything like .addresses.

I'm not sure what the implications of this for the API are, but it's important
to keep in mind (I know RDM knows this) that structured headers need extra
parsing and will have more sophisticated objects representing them.

-Barry


Re: API thoughts

R. David Murray
In reply to this post by Steffen Daode Nurpmeso-2
On Wed, 02 Mar 2011 21:40:39 +0100, Steffen Daode Nurpmeso <[hidden email]> wrote:
> But if "what goes in .defects[]" will be configurable i would hope
> for a generic is_malformed() and maybe is_processable() or the
> like, i.e. state versus (translatable?) user-info.

I'm not sure what you are asking for here.  I think "if msg.is_malformed()"
is spelled "if msg.defects".  That is, if the defects list is non-empty,
the message is technically malformed.  Of course, that information by
itself isn't necessarily useful, which is why defects is a list
of defects.  "is_processable" lies in the eyes of the application.
What defects is it capable of dealing with?  The email package
can't know that.  So, again, that's why defects is a list.
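
For example, with the current package (a small sketch; the two message
texts are made up for the demonstration):

```python
import email
from email import errors

# A multipart Content-Type with no boundary parameter is one of the
# defects the parser records instead of raising an exception.
bad = email.message_from_string(
    'Content-Type: multipart/mixed\n\nbody text\n')
good = email.message_from_string('Subject: hello\n\nbody text\n')

for msg in (bad, good):
    if msg.defects:   # non-empty list: technically malformed
        print('malformed:', [type(d).__name__ for d in msg.defects])
    else:
        print('no defects recorded')
```

Whether a given defect makes the message unprocessable is then a decision
the application makes by inspecting the entries in the list.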

Let me clarify what I mean by the policy controlling "what, exactly, is
a defect".  The idea here is that when parsing an email, each deviance
from the RFCs counts as a defect (the current email package, by the way,
only detects a small number of such defects!).  But when parsing, say,
an http stream, non-ascii characters in headers are perfectly legal.
So it seems to make sense that the HTTP policy would change what counts
as a defect during the operation of the parser.
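
As a toy illustration only (EmailRules, HTTPRules and header_defects are
hypothetical names, not the policy classes under development), the same
check can yield a defect under one policy and nothing under another:

```python
class EmailRules:
    """Hypothetical policy: header values must be ASCII."""
    allow_non_ascii_headers = False


class HTTPRules(EmailRules):
    """Hypothetical policy: non-ASCII header values are legal."""
    allow_non_ascii_headers = True


def header_defects(value, policy):
    """Return the defects found in one header value under `policy`."""
    defects = []
    has_non_ascii = any(ord(ch) > 127 for ch in value)
    if has_non_ascii and not policy.allow_non_ascii_headers:
        defects.append('non-ASCII character in header')
    return defects


email_result = header_defects('caf\u00e9', EmailRules)
http_result = header_defects('caf\u00e9', HTTPRules)
```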

> (The more i think about it the more i agree with David (i hope
> i don't lie about that) that it's a waste of time to try to
> convert malformed data to a compliant state, especially if the
> package is - by design - capable to spit out the data the very
> same way it came in.  Someone will take care - and throw it away.)

Well, I think we may provide some tools to do such "fixups" when it is
possible and the application wants it.  But they should be app-requested
transformations, not automatic ones.

> Unfortunately my non-hate mail seems to have been mistreated as
> spam 8-}, therefore i wrote all of the above just to thank David
> once again for making the email and mailbox packages usable
> already in Python 3.2.  Thanks.

You are welcome :)

--David

Re: API thoughts

R. David Murray
In reply to this post by Barry Warsaw
On Wed, 02 Mar 2011 15:46:24 -0500, Barry Warsaw <[hidden email]> wrote:

> On Mar 01, 2011, at 03:40 PM, R. David Murray wrote:
> >So, I think the "policy framework" is actually two things:  the
> >header/mime-types registry, and the Parser/Generator policies.  Let's have
> >'policy' refer to only the I/O policy, and call the other the email
> >class registry.
>
> +1
>
> This makes a lot of sense, and I'm glad you've been thinking about this more
> deeply than I have since we last bandied it about.  At the time, I thought a
> single policy hierarchy would probably be fine, but you've laid out a good
> argument for keeping them separate, and in fact not even calling the latter
> a 'policy'.  Here's another distinction:
>
> Policy objects should be composable.  This would allow for a standard library
> of policies that could be mixed and matched for specific applications, and
> might even include some higher level policies like 'CGI' or 'NNTP'.  E.g. my
> applications might combine a standard 'don't-check-rfc-2047' policy with a
> 'use-only-CRLF' and 'die-on-defect'.

Yes, my current implementation of policy objects allows you to say
things like:

    policy = HTTP + Strict

where HTTP is the obvious and 'Strict' is a policy that sets the "raise
on defect" flag.
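
A stripped-down sketch of how such composition can work.  This is an
illustration under assumed names (linesep, raise_on_defect), not the actual
implementation: each policy stores only the settings it overrides, and "+"
layers the right-hand overrides on top of the left-hand ones.

```python
class Policy:
    # Class attributes act as the defaults.
    linesep = '\n'
    raise_on_defect = False

    def __init__(self, **overrides):
        for name, value in overrides.items():
            if not hasattr(type(self), name):
                raise TypeError('unknown policy setting: %r' % name)
            setattr(self, name, value)

    def __add__(self, other):
        # Settings explicitly set on the right-hand policy win.
        merged = dict(vars(self))
        merged.update(vars(other))
        return Policy(**merged)


HTTP = Policy(linesep='\r\n')
Strict = Policy(raise_on_defect=True)
policy = HTTP + Strict
```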

> I wonder too, how sophisticated policy objects really need to be.  Are they
> just bags of attributes with some defaults, properties for access, maybe some
> validation, and composability?

Pretty much.  I think they will also contain some callable methods,
to provide hooks where a policy subclass can implement a custom policy.
My current implementation has such a hook for registering defects, which
would allow a custom policy to, for example, log the defects in addition
to or instead of putting them into the defects list.
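
Roughly like this, as a sketch with assumed names (the register_defect hook
and the Part stand-in are illustrative, not a fixed API):

```python
import logging

log = logging.getLogger('email.defects')


class Policy:
    """Hypothetical base policy with a defect-registration hook."""

    def register_defect(self, obj, defect):
        # Default behaviour: record the defect on the object.
        obj.defects.append(defect)


class LoggingPolicy(Policy):
    def register_defect(self, obj, defect):
        # Log the defect in addition to recording it.
        log.warning('defect: %s', defect)
        super().register_defect(obj, defect)


class Part:
    """Stand-in for a parsed message part."""

    def __init__(self):
        self.defects = []


part = Part()
LoggingPolicy().register_defect(part, 'missing boundary')
```

Dropping the super() call would give the "instead of" variant.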

> As for the registry, I don't think you need anything near that.  You just need
> to say "when you see this mime-type, create an object using this callable".
> Multiple registrations might be useful, but I don't think composability is.

Well, I'm thinking that a minimal sort of composability *is* useful.
One of the annoying things about class hierarchies is that if you want to
add a feature to the base class, you have to make new subclasses for *all*
of the classes in the hierarchy (unless you monkey patch).  What I was
thinking of was to have the registry have a 'base class' slot that got
used as the base class for all the mime-type classes, composed on the fly
at instantiation time (and similarly for the headers).  That way if you
wanted to add features to all the classes in the hierarchy, you could
register your custom 'base class' and not need to touch anything else.
But since the API for the registry is now a callable, and especially if
we specify it as returning callables, then doing such composition could
be left to the application (perhaps with a recipe in the docs).
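
Such a recipe might look something like this.  Registry, TextPart and
Audited are hypothetical names; the point is only that the concrete class
is composed with type() at lookup time, so swapping the base affects every
registered type at once.

```python
class Registry:
    """Hypothetical registry whose classes share a replaceable base."""

    def __init__(self, base=object):
        self.base = base   # the 'base class' slot
        self.mixins = {}   # mime-type -> class with type-specific behaviour

    def register(self, mime_type, cls):
        self.mixins[mime_type] = cls

    def __call__(self, mime_type):
        # Build the concrete class on the fly, so a changed base is
        # picked up by every registered type.
        mixin = self.mixins[mime_type]
        return type(mixin.__name__, (mixin, self.base), {})


class TextPart:
    def kind(self):
        return 'text'


class Audited:
    """Application-supplied base adding a feature to all classes."""

    def audit(self):
        return 'audited ' + self.kind()


registry = Registry()
registry.register('text/plain', TextPart)
registry.base = Audited
part = registry('text/plain')()
```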

Composing registries can thus also be left to the application.  email6
itself should have only one, I think, or if there are two the other will
be the email5 back-compat registry and there'd be no reason to compose
with it.

I'm not sure what you mean by multiple registrations.  Can you give
an example?

> >The real meat of email6, then, is the header/mime-types registry, and
> >the changes in the API of the resulting Message objects.  The parser
> >currently accepts a _factory argument that specifies the object to be used
> >in creating the Message.   I propose that we deprecate this argument,
> >but that any code using it gets the old behavior of the parser (using
> >_factory to create the class for any new sub-objects).  Then we introduce
> >a new argument, 'factory'.  This new argument would expect a callable
> >that takes a mime-type as its argument, and returns an appropriate class.
> >The parser would be re-written so that it could use this factory, and
> >the backward compatibility case would be trivial to implement.
>
> +1.  The underscore name in _factory is a historical wart that's not needed
> any more.  I'm not even sure it makes much sense any more in Message
> subclasses.  It *does* still make sense in e.g. add_header() where there's a
> potential name collision between the arguments and the **params.  We should
> evaluate these more carefully given today's API and clean this up if possible
> (modulo all b/c considerations).

Ah, so *that's* what those underscores are for.  I always wondered.
Yeah, I think we can do a lot of cleanup here.

> Cool.  Really great stuff David.

Thanks.

--David

Re: API thoughts

Steffen Daode Nurpmeso-2
In reply to this post by R. David Murray
On Wed, Mar 02, 2011 at 07:50:20PM -0500, R. David Murray wrote:

> That is, if the defects list is non-empty,
> the message is technically malformed.  Of course, that information by
> itself isn't necessarily useful, which is why defects is a list
> of defects.
> "is_processable" lies in the eyes of the application.
> What defects is it capable of dealing with?  The email package
> can't know that.  So, again, that's why defects is a list.
>
> Let me clarify what I mean by the policy controlling "what, exactly, is
> a defect".  The idea here is that when parsing an email, each deviance
> from the RFCs counts as a defect (the current email package, by the way,
> only detects a small number of such defects!).  But when parsing, say,
> an http stream, non-ascii characters in headers are perfectly legal.
> So it seems to make sense that the HTTP policy would change what counts
> as a defect during the operation of the parser.

So i would hope for '.all_defects[]' and (policy-adjusted)
'.defects[]'.  I would hope for
'.had_header_defects(policy_only=True)',
'.had_payload_defects(policy_only=True)'.

Doing so would fill the huge hole in between 'not len(defects)'
and the detailed inspection of a defects list which consists of
a highly differentiated tree of classes.

The parser has to parse, and does encounter, all of these anyway,
and an application cannot re-collect this (dropped) information
except with expensive effort, i.e. at least choosing a different,
stricter policy followed by another parse of the bogus mail.

In the end it is my belief that a framework should bring light
onto all aspects of a thing, such that no other framework is ever
needed, but especially not on a lower level (except the framework
is so designed that it allows replacement of its own low-level
interface, say).
And i don't think there can be a higher level interface than
message_from_(bytes|string)().

Re: API thoughts

R. David Murray
On Thu, 03 Mar 2011 16:28:32 +0100, Steffen Daode Nurpmeso <[hidden email]> wrote:

> On Wed, Mar 02, 2011 at 07:50:20PM -0500, R. David Murray wrote:
> > That is, if the defects list is non-empty,
> > the message is technically malformed.  Of course, that information by
> > itself isn't necessarily useful, which is why defects is a list
> > of defects.
> > "is_processable" lies in the eyes of the application.
> > What defects is it capable of dealing with?  The email package
> > can't know that.  So, again, that's why defects is a list.
> >
> > Let me clarify what I mean by the policy controlling "what, exactly, is
> > a defect".  The idea here is that when parsing an email, each deviance
> > from the RFCs counts as a defect (the current email package, by the way,
> > only detects a small number of such defects!).  But when parsing, say,
> > an http stream, non-ascii characters in headers are perfectly legal.
> > So it seems to make sense that the HTTP policy would change what counts
> > as a defect during the operation of the parser.
>
> So i would hope for '.all_defects[]' and (policy-adjusted)
> '.defects[]'.  I would hope for
> '.had_header_defects(policy_only=True)',
> '.had_payload_defects(policy_only=True)'.

Well, what is a defect for an HTTP parse is not the same as what is
a defect for an email parse, so I don't know what "all defects" would
consist of.  The recovery decisions the parser makes can also be affected
by the policy, so there can't, as far as I can see, be a single list of
"all defects" that applies to all parses.

Currently the email package does not report header defects.  When it does,
my plan is that each Header will have its own defect list, and likewise
each message body (using a recursive definition).  How the defects list
on the Message object interacts with this is an interesting API question
worthy of discussion.  Perhaps we do, after all, have some sort of
"has_defects" method that queries the constituent parts, and perhaps a
function that returns a list of parts with defects, possibly divided
between headers and body as you suggest.
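
With today's API, where only message parts carry a defects list, such
helpers could be sketched as below.  The names parts_with_defects and
has_defects are invented for illustration, and per-header defect lists are
not covered because they do not exist yet.

```python
import email


def parts_with_defects(msg):
    """Return every part of `msg` (including msg itself) with defects."""
    return [part for part in msg.walk() if part.defects]


def has_defects(msg):
    return bool(parts_with_defects(msg))


# A multipart type without a boundary parameter is recorded as a defect.
bad = email.message_from_string(
    'Content-Type: multipart/mixed\n\nbody text\n')
good = email.message_from_string('Subject: hello\n\nbody text\n')
print(has_defects(bad), has_defects(good))
```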

> Doing so would fill the huge hole in between 'not len(defects)'
> and the detailed inspection of a defects list which consists of
> a highly differentiated tree of classes.

Yeah, the number of different defect classes involved in this scheme
worries me a little bit.

> The parser has to parse- and does encounter all of these anyway,
> and an application cannot re-collect this (dropped) information
> except with expensive effort, i.e. at least choosing a different,
> stricter policy followed by another parse of the bogus mail.

Why recollect?  The list is there (and, as I indicated above, will be
associated with the part that contains the error).  The list of defects
will be *all* the defects detected by that policy: all RFC deviance
(well, perhaps not quite all...see below).  Defects don't normally raise
errors, so there's no reason not to look for all of the relevant ones
(and indeed, we are probably only detecting the ones that actually affect
the parsing).

That is, if you parse an HTTP stream, encountering a non-ASCII character
is *not* a defect.  It doesn't make any sense to me to report an
"if this were an email this would be a defect" defect.  And if the
header for some strange reason included an RFC2047 encoded word that
was invalidly formed...well, in an HTTP parse that would *technically*
violate the RFC, but in practice it really means that the data should
just be passed through as is.  That is, it's not a defect, and we
would be wasting time even *looking* for RFC2047 encoded words.
(Unless someone finds a browser or server that generates them!)

In other words, in the base package I don't think there are "strict"
and "less strict" parsing policies; rather there are *different* parsing
policies depending on the context.  As far as I can see, it makes no sense
to parse an HTTP stream, and then reparse it as if it were an email stream.
Now, it might be useful to design a "very_strict" policy that did extra
work looking for RFC defects that a normal parse wouldn't detect (I can't
think of any off the top of my head, but the email RFCs are so complex
that I'm sure there are some), but in that case if you parsed it with
the less-strict (normal) policy those defects would *not* be noticed
by the parser.  In any case, I think such a validating parser/policy is
out of scope for the current package.

--David

Re: API thoughts

Barry Warsaw
In reply to this post by R. David Murray
On Mar 02, 2011, at 08:23 PM, R. David Murray wrote:

>Pretty much.  I think they will also contain some callable methods,
>to provide hooks where a policy subclass can implement a custom policy.
>My current implementation has such a hook for registering defects, which
>would allow a custom policy to, for example, log the defects in addition
>to or instead of putting them into the defects list.

Makes sense.

>I'm not sure what you mean by multiple registrations.  Can you give
>an example?

I really meant multiple registries, mostly thinking about how to avoid some
global state.  But Python already has some global registries, so maybe that's
not too bad in this case.

-Barry


Re: API thoughts

R. David Murray
On Thu, 03 Mar 2011 21:55:59 -0500, Barry Warsaw <[hidden email]> wrote:
> On Mar 02, 2011, at 08:23 PM, R. David Murray wrote:
> >I'm not sure what you mean by multiple registrations.  Can you give
> >an example?
>
> I really meant multiple registries, mostly thinking about how to avoid some
> global state.  But Python already has some global registries, so maybe that's
> not too bad in this case.

Ah, yes.  Well, so far my thought is that there is a global registry
for the email package itself, but since email package access to that
registry will be through the 'factory', there is nothing that says that
has to be the only registry used by an application.  The existence of
the email package global registry will allow the addition of classes
to the "default" registry by libraries (if we dare :) and applications,
while access through the factory means that an application is free
to manage a completely independent registry if it prefers.  Or perhaps
it is better to think about the default email package registry as
just that, the *default* registry, since I think its only specialness
will be that it is the registry that is used by default.
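
In sketch form, with made-up names (DEFAULT_REGISTRY, make_factory and
MyText are assumptions for illustration), the parser would only ever see a
factory callable, and which registry sits behind it is up to the caller:

```python
from email.message import Message

DEFAULT_REGISTRY = {}   # hypothetical package-wide default registry


def make_factory(registry=DEFAULT_REGISTRY):
    """Return a callable mapping a mime-type to a class."""
    def factory(mime_type):
        # Unregistered types fall back to the plain Message class.
        return registry.get(mime_type, Message)
    return factory


class MyText(Message):
    """Hypothetical application-specific class."""


default_factory = make_factory()
private_factory = make_factory({'text/plain': MyText})
```

Here private_factory never touches the default registry, so an application
using it leaves the global state alone.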

But that's just my current thought, if anyone can think of a better
design I'm all ears.

I should note that one design concern I have in all this is that it so
far looks like importing email will, under this registry design, end up
importing pretty much *all* of the email classes (and there will be more
of them than in the current package).  I'm so far ignoring that issue,
treating it as a premature optimization, but if anyone has any clever
ideas or other thoughts, let me know.

--David

Re: API thoughts

Barry Warsaw
On Mar 04, 2011, at 08:33 AM, R. David Murray wrote:

>Ah, yes.  Well, so far my thought is that there is a global registry
>for the email package itself, but since email package access to that
>registry will be through the 'factory', there is nothing that says that
>has to be the only registry used by an application.  The existence of
>the email package global registry will allow the addition of classes
>to the "default" registry by libraries (if we dare :) and applications,
>while access through the factory means that an application is free
>to manage a completely independent registry if it prefers.  Or perhaps
>it is better to think about the default email package registry as
>just that, the *default* registry, since I think its only specialness
>will be that it is the registry that is used by default.

I think that's a great place to start.

>But that's just my current thought, if anyone can think of a better
>design I'm all ears.
>
>I should note that one design concern I have in all this is that it so
>far looks like importing email will, under this registry design, end up
>importing pretty much *all* of the email classes (and there will be more
>of them than in the current package).  I'm so far ignoring that issue,
>treating it as a premature optimization, but if anyone has any clever
>ideas or other thoughts, let me know.

Yeah, that's a problem.  Maybe we (the Python community) should invest in good
lazy importing support for Python 3.3?  I know that this has been reinvented
several times already.
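
Short of general lazy-import support, the registry itself could defer the
imports.  A rough sketch, assuming a hypothetical LazyRegistry that stores
'module:ClassName' strings and resolves them with importlib.import_module()
on first lookup:

```python
import importlib


class LazyRegistry:
    """Hypothetical registry that imports a class on first lookup."""

    def __init__(self):
        self._entries = {}   # mime-type -> 'module:ClassName' or a class

    def register(self, mime_type, dotted):
        self._entries[mime_type] = dotted

    def __call__(self, mime_type):
        entry = self._entries[mime_type]
        if isinstance(entry, str):
            module_name, _, class_name = entry.partition(':')
            module = importlib.import_module(module_name)
            entry = getattr(module, class_name)
            self._entries[mime_type] = entry   # cache the resolved class
        return entry


registry = LazyRegistry()
registry.register('text/plain', 'email.mime.text:MIMEText')
cls = registry('text/plain')   # the module is imported here at the latest
```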

-Barry


Re: API thoughts

Steffen Daode Nurpmeso-2
In reply to this post by R. David Murray
I was never involved in discussions, so that the topics i address
may have been defined for EMAIL 6 already etc.,
but because i've not found anything in the archives of the list
back in 2010 i add yet another feature request which really
worries me.

I find the interface a bit inconsistent in respect to
replace_header() (replaces the first header found), __delitem__()
(drops them all), __setitem__() (appends) in any case.
(I personally would throw these __accessor__ things away, they
taste a bit strange when used to access email payload.)

And i would provide a series of functions which can be used
to get/set/modify header fields and bodies:
i would check whether the argument is a list and if so, it would mean
"all bodies of a field".  This is of course very hard to implement
if it's done gracefully, i.e. with modification-detection,
order-preservation etc.

Another, easier to implement, idea would be (yet) an(other)
iterator which supports in-place editing.  Perfect: it could yield
a (to be invented) class which offers methods like .field(),
.bodies() (all [bodies] - maybe even as sub-iterator),
.remove_field() etc...
Doing it like this would offer the possibility to easily detect
in-place editing of header bodies etc...

All of these are just suggestions and my very personal point of
view, of course.
But one thing is true, and that's that it is currently really hard
to remove or replace just one body of a field, especially if there
are multiple bodies for a field.

-- Steffen Daode

Re: API thoughts

Barry Warsaw
On Mar 07, 2011, at 09:06 PM, Steffen Daode Nurpmeso wrote:

>I find the interface a bit inconsistent in respect to
>replace_header() (replaces the first header found), __delitem__()
>(drops them all), __setitem__() (appends) in any case.
>(I personally would throw these __accessor__ things away, they
>taste a bit strange when used to access email payload.)

I personally like this part of the API, and I think it's held up well under
years of use.  In general you don't care about header order, so using various
combinations of del, .get_all(), and __setitem__ works fine.  The semantics of
the message-as-dict API, header ordering, the various header methods, etc. were
thought out and discussed, and I don't have a problem with them.
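
For reference, the semantics under discussion, shown with the existing
Message class (the header values are arbitrary):

```python
from email.message import Message

msg = Message()
msg['Received'] = 'first'
msg['Received'] = 'second'     # __setitem__ appends, never overwrites
appended = msg.get_all('Received')

msg.replace_header('Received', 'changed')   # only the first match changes
replaced = msg.get_all('Received')

del msg['Received']            # __delitem__ removes every occurrence
deleted = msg.get_all('Received')

print(appended, replaced, deleted)
```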

>And i would provide a series of functions which can be used
>to get/set/modify header fields and bodies:
>i would check whether the argument is a list and if so, it would mean
>"all bodies of a field".  This is of course very hard to implement
>if it's done gracefully, i.e. with modification-detection,
>order-preservation etc.
>
>Another, easier to implement, idea would be (yet) an(other)
>iterator which supports in-place editing.  Perfect: it could yield
>a (to be invented) class which offers methods like .field(),
>.bodies() (all [bodies] - maybe even as sub-iterator),
>.remove_field() etc...
>Doing it like this would offer the possibility to easily detect
>in-place editing of header bodies etc...
>
>All of these are just suggestions and my very personal point of
>view, of course.
>But one thing is true, and that's that it is currently really hard
>to remove or replace just one body of a field, especially if there
>are multiple bodies for a field.

Well, replacing one header while retaining the original order is a bit difficult,
but I've rarely had to do that.  Still, it would probably make sense to add such
functionality -- *if* it can be done without complicating the API or the
implementation.  I think it could too, by adding an index argument to
.replace_header(), and using .get_all() to get an ordered list of the headers
of interest.
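
Until something like that exists, the effect can be approximated through
the public API alone.  A sketch, assuming a hypothetical helper named
replace_nth_header and a plain Message, where __setitem__ always appends:

```python
from email.message import Message


def replace_nth_header(msg, name, index, value):
    """Replace the index-th occurrence of header `name`, keeping order."""
    items = msg.items()   # every (name, value) pair, in original order
    seen = 0
    for pos, (key, _) in enumerate(items):
        if key.lower() == name.lower():
            if seen == index:
                items[pos] = (key, value)
                break
            seen += 1
    else:
        raise IndexError('no occurrence %d of %r' % (index, name))
    # Rebuild the header list: drop everything, then re-add in order.
    for key in msg.keys():
        del msg[key]
    for key, val in items:
        msg[key] = val


msg = Message()
msg['Received'] = 'a'
msg['Subject'] = 's'
msg['Received'] = 'b'
replace_nth_header(msg, 'received', 1, 'B')
print(msg.items())
```

Rebuilding every header is wasteful, which is part of why an index
argument on .replace_header() itself would be the nicer solution.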

Cheers,
-Barry


Re: API thoughts

Steffen Daode Nurpmeso-2
Barry Warsaw wrote:
> I personally like this part of the API, and I think it's held up well under
> years of use.

:-)
msg[f] is indeed and really an elegant and understand-at-a-glance
way to access headers.
(Possible restriction: it would be graceful if it would return and
take a list.)

> Well, replace one header retaining original order is a bit difficult, but I've
> rarely had to do that.
[...]
> I think it could too, by adding an index argument to .replace_header(),
> and using .get_all() to get an ordered list of the headers of interest.

... and give me a way to also delete just one body of a field and
i'll be lucky.
Maybe simply 'Message._headers = {normalized_field: [bodies]}'?
But, why not .delete_all_of(0, 2, 5), realized by a walk in equal
spirit to .get_all().

(My thought was that a new Proxy class can be added very easily,
requiring only one new method in Message and
without affecting the remaining interface,
whatever status David's local EMAIL 6 branch is currently in and
whatever approach he will have chosen in the end.

Anyway, and unless i missed something, this is the current way:

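    def _bewitch_msg(self):
        """Handle Python 3.2.0/3.3a0 issue 11401 email/message.py error"""
        # Needs "import sys" at module level.  Skip the workaround on
        # versions assumed to carry the fix: 3.2.x after 3.2.0 and
        # anything after 3.3a1.
        version = sys.hexversion
        if version > 0x030300A1 or 0x030200F1 < version < 0x03030000:
            return

        # Snapshot the field names; the loop body rewrites the headers.
        for f in list(self._msg):
            had_repl = False
            new_ab = []
            for b in self._msg.get_all(f):
                if not b:
                    # Replace an empty body with a single space.
                    had_repl = True
                    b = ' '
                new_ab.append(b)
            if had_repl:
                # del drops every body of f; re-adding appends them again.
                del self._msg[f]
                for b in new_ab:
                    self._msg[f] = b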

At best the very same could be achieved (faster and with smaller
memory footprint):

        for p in self._msg.proxy_iter():
            for (idx, body) in p:
                if not len(body):
                    p[idx] = ' '
)