header folding

R. David Murray
Well, my big project still hasn't kicked off, so I'm still working on email6.

I just posted a new blog post:

    http://www.bitdance.com/blog/2011/07/25_01_email6_pypi_release/

The PyPI release is old news here.  The interesting part of the post
for this group is the discussion of the new header folding API at
the end.  Basically, BaseHeader gets a 'wrap' method, and there is
a new policy control, 'refold_source' (I'll probably rename it to
'rewrap_source', since I expect to apply it also to message bodies).
The policy control has three values: 'none', 'long', and 'all'.  'none'
means never touch the source, always use it.  'long' means refold a header
if any of the source's component lines are longer than max_line_length.
'all' means refold everything.

Email5.1 wraps long lines, but leaves short lines alone.  Under 'long',
this code refolds the whole header if there is a long line in it.
I think that is more RFC compliant, and I don't think it will cause any
problems if used.

The default for refold_source is 'none'.  I'm considering this a bug
fix, since a stated goal of the email package is to reproduce the source
accurately if possible.

(Currently the new code still calls Header to do the folding; writing
the new folder is my next task.)
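
As a rough illustration of the three settings, here is a sketch against the
email.policy module in today's Python standard library, which exposes a
refold_source control with the same three values.  It is not the email6 code
discussed in this post, and the standard library documents 'long' as its
default rather than 'none'.  The sample message and the regenerate() helper
are made up for the example.

```python
from email import message_from_string
from email.policy import EmailPolicy

# Hypothetical source message: the Subject was folded early by its sender,
# even though the whole header would fit on one 78-character line.
RAW = (
    "Subject: a short header that was\n"
    " folded early by the sender\n"
    "\n"
    "body\n"
)

def regenerate(raw, refold_source):
    """Parse raw, then serialize it again under the given refold_source."""
    policy = EmailPolicy(refold_source=refold_source)
    return message_from_string(raw, policy=policy).as_string()

for setting in ("none", "long", "all"):
    print(setting, repr(regenerate(RAW, setting)))
```

Under 'none' and 'long' the sender's early fold should survive, because no
source line exceeds max_line_length; under 'all' the header is unfolded and
folded again, so it should come out on a single line.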

--
R. David Murray           http://www.bitdance.com
_______________________________________________
Email-SIG mailing list
[hidden email]
Your options: http://mail.python.org/mailman/options/email-sig/lists%40nabble.com

header folding

Stephen J. Turnbull
R. David Murray writes:

 > the end.  Basically, BaseHeader gets a 'wrap' method, and there is
 > a new policy control, 'refold_source' (I'll probably rename it to
 > 'rewrap_source', since I expect to apply it also to message
 > bodies).

This bothers me.  Folding and wrapping are two different things.

Folding is about invertibly reformatting a single logical line to make
machines happy during transmission; what wrapping "does" is not 100%
clear to me but it's about making people happy.  (I put "does" in
quotes because it's not obvious to me that the source of wrapped text
necessarily is a single anything, nor that wrapping need be
invertible.)

I grant that people and many MUAs take a different point of view about
header folding, but clearly the RFCs have moved away from placing any
importance on presentation aspects toward specifying an invertible
transformation exactly.  On the other hand, I think that wrapping
should place emphasis on presentation.
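
To make "invertible" concrete, here is a minimal sketch of folding and
unfolding one logical header line.  The fold() below is deliberately naive
(it only breaks before existing spaces and ignores the finer RFC 5322 rules);
both function names are assumptions for the example, not the email package's
API.

```python
import re

def fold(line, limit=78):
    """Naively fold one logical line by inserting CRLF before existing spaces."""
    parts = []
    while len(line) > limit:
        # Rightmost space at or before the limit; starting the search at
        # index 1 guarantees progress when the line begins with a space.
        cut = line.rfind(" ", 1, limit + 1)
        if cut == -1:
            break  # nowhere to fold; leave the remainder long
        parts.append(line[:cut])
        line = line[cut:]  # the continuation line keeps its leading space
    parts.append(line)
    return "\r\n".join(parts)

def unfold(folded):
    """Undo folding: drop each CRLF that is immediately followed by space or tab."""
    return re.sub(r"\r\n(?=[ \t])", "", folded)

header = "Subject: " + " ".join(["folding"] * 20)
folded = fold(header)
print(folded)
print(unfold(folded) == header)
```

Because fold() only inserts line breaks in front of whitespace that was
already there, unfold() recovers the logical line exactly.  A presentation
oriented wrapper is free to add, drop, or normalize whitespace and so need
not have this property.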

Re: header folding

R. David Murray
On Tue, 26 Jul 2011 13:03:11 +0900, "Stephen J. Turnbull" <[hidden email]> wrote:

> R. David Murray writes:
>
>  > the end.  Basically, BaseHeader gets a 'wrap' method, and there is
>  > a new policy control, 'refold_source' (I'll probably rename it to
>  > 'rewrap_source', since I expect to apply it also to message
>  > bodies).
>
> This bothers me.  Folding and wrapping are two different things.
>
> Folding is about invertibly reformatting a single logical line to make
> machines happy during transmission, what wrapping "does" is not 100%
> clear to me but it's about making people happy.  (I put "does" in
> quotes because it's not obvious to me that the source of wrapped text
> necessarily is a single anything, nor that wrapping need be
> invertible.)
>
> I grant that people and many MUAs take a different point of view about
> header folding, but clearly the RFCs have moved away from placing any
> importance on presentation aspects toward specifying an invertible
> transformation exactly.  On the other hand, I think that wrapping
> should place emphasis on presentation.

Hmm.  Makes sense to me.  So you'd rather the method were called "fold"
and that refold_source remains the name of the policy control.

What's the word for what is done when a text message is made to have
a line length of less than 78 by using quoted printable (or base64)
encoding?  Is that also folding?  If there's no existing term in common
use, folding would make sense to me.  So I have no objection to using
'fold' consistently in the api and code for these operations.

Can anyone see a use case for controlling folding of headers separately
from folding of message bodies?  I haven't thought of one, which is why
I'm thinking one policy knob controls both.

--David

Re: header folding

Barry Warsaw
On Jul 26, 2011, at 08:38 AM, R. David Murray wrote:

>On Tue, 26 Jul 2011 13:03:11 +0900, "Stephen J. Turnbull" <[hidden email]> wrote:
>> R. David Murray writes:
>>
>>  > the end.  Basically, BaseHeader gets a 'wrap' method, and there is
>>  > a new policy control, 'refold_source' (I'll probably rename it to
>>  > 'rewrap_source', since I expect to apply it also to message
>>  > bodies).
>>
>> This bothers me.  Folding and wrapping are two different things.
>>
>> Folding is about invertibly reformatting a single logical line to make
>> machines happy during transmission, what wrapping "does" is not 100%
>> clear to me but it's about making people happy.  (I put "does" in
>> quotes because it's not obvious to me that the source of wrapped text
>> necessarily is a single anything, nor that wrapping need be
>> invertible.)
>>
>> I grant that people and many MUAs take a different point of view about
>> header folding, but clearly the RFCs have moved away from placing any
>> importance on presentation aspects toward specifying an invertible
>> transformation exactly.  On the other hand, I think that wrapping
>> should place emphasis on presentation.
>
>Hmm.  Makes sense to me.  So you'd rather the method were called "fold"
>and that refold_source remains the name of the policy control.

Stephen makes a good point, one I agree with.

>What's the word for what is done when a text message is made to have
>a line length of less than 78 by using quoted printable (or base64)
>encoding?  Is that also folding?  If there's no existing term in common
>use, folding would make sense to me.  So I have no objection to using
>'fold' consistently in the api and code for these operations.

Haven't we used 'splitting' as a term for this, at least internally, in
previous versions?  That's at least what I think of, and I do think we could
have two knobs to control the different functionality:

- To 'split' a line means to take a line longer than a specified maximum, and
  make it fit into the maximum line length, splitting at whitespace or other
  semantic separators.

- To 'fill' a header means to take the logical contents of the header and
  recombine and resplit it so that each line is as close to the maximum line
  length as possible.  My analogy here is Emacs's M-q (fill-paragraph).
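
The difference between the two definitions above can be sketched with the
standard library's textwrap module standing in for the real folding code.
The helper names split_long() and fill() are assumptions for this example,
and the sketch ignores the leading whitespace that header continuation lines
need.

```python
import textwrap

def split_long(lines, width):
    """'split': break only the lines that are too long; leave the rest alone."""
    out = []
    for line in lines:
        out.extend(textwrap.wrap(line, width) if len(line) > width else [line])
    return out

def fill(lines, width):
    """'fill': recombine all the lines, then re-break them near the maximum."""
    return textwrap.wrap(" ".join(lines), width)

source = ["aaa bbb", "ccc ddd eee fff ggg", "hh"]
print(split_long(source, 11))  # short first and last lines are untouched
print(fill(source, 11))        # everything is re-broken
```

With this input, splitting produces four lines and filling produces three,
since filling is allowed to pull words up onto the short first line.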

What then is "folding" or "wrapping"?  Maybe no different than the above.

>Can anyone see a use case for controlling folding of headers separately
>from folding of message bodies?  I haven't thought of one, which is why
>I'm thinking one policy knob controls both.

You might have a message body that contains code, in which case you might want
to fill the headers (using the terminology above), but not fill the body.

Cheers,
-Barry


Re: header folding

Stephen J. Turnbull
In reply to this post by R. David Murray
R. David Murray writes:

 > Hmm.  Makes sense to me.  So you'd rather the method were called "fold"
 > and that refold_source remains the name of the policy control.

Yes.

 > What's the word for what is done when a text message is made to have
 > a line length of less than 78 by using quoted printable (or base64)
 > encoding?

RFC 2045 discusses "insertion of soft line breaks"; it doesn't mention
a term like "folding".  "Folding" seems like a good term to me,
though.  Note that the RFC 2045 definition of quoted-printable says
that physical line length MUST be 76 characters or less, including any
terminating = but not the CRLF pair that separates lines.
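
For illustration, the standard library's quopri module can show soft line
breaks at work.  This is only a sketch with a made-up payload; it says
nothing about how the email package itself chooses to encode bodies.

```python
import quopri

# One 200-character physical line with no line break of its own.
text = b"x" * 200

encoded = quopri.encodestring(text)
lines = encoded.split(b"\n")

print(encoded.decode("ascii"))
print("longest encoded line:", max(len(line) for line in lines))

# Decoding removes the soft line breaks again, so the operation is invertible.
assert quopri.decodestring(encoded) == text
```

Each encoded line that is continued ends in "=", and that "=" counts toward
the 76-character limit.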

 > Can anyone see a use case for controlling folding of headers
 > separately from folding of message bodies?  I haven't thought of
 > one, which is why I'm thinking one policy knob controls both.

The RFCs' treatments differ somewhat.  RFC 5322 has both a MUST NOT
and a SHOULD NOT exceed limit on line length (998 and 78 characters,
not including the CRLF, respectively).  RFC 2045 quoted-printable has
only the MUST NOT limit of 76 (but the difference in limits is not a
big deal).
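
The two RFC 5322 limits can be written down as a small checker.  This is a
sketch only; classify_line() and the constant names are assumptions for the
example, and lengths are counted in characters of an already decoded string.

```python
MUST_LIMIT = 998    # RFC 5322: a line MUST NOT exceed this, excluding CRLF
SHOULD_LIMIT = 78   # RFC 5322: a line SHOULD NOT exceed this, excluding CRLF

def classify_line(line):
    """Classify one physical line against the RFC 5322 length limits."""
    n = len(line.rstrip("\r\n"))  # the terminating CRLF is not counted
    if n > MUST_LIMIT:
        return "violates MUST"
    if n > SHOULD_LIMIT:
        return "exceeds SHOULD"
    return "ok"

for sample in ("x" * 78, "x" * 79 + "\r\n", "x" * 999):
    print(len(sample.rstrip("\r\n")), classify_line(sample))
```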

It's not clear to me what exactly the policy knob you're talking about
is for body text.  There is no policy really allowed if quoted-
printable is being used.  So the policy knob is whether to use
quoted-printable to limit physical line length?

The only reason I can think of for having separate controls is that
many MUAs mishandle quoted-printable in the body text.  Patches don't
apply, one-time-key URLs in links get broken and fail to be
recognized.  On the other hand, header-folding rarely has such
consequences in my experience.


Re: header folding

R. David Murray
On Wed, 27 Jul 2011 16:18:36 +0900, "Stephen J. Turnbull" <[hidden email]> wrote:
> It's not clear to me what exactly the policy knob you're talking about
> is for body text.  There is no policy really allowed if quoted-
> printable is being used.  So the policy knob is whether to use
> quoted-printable to limit physical line length?

Well, I have *not* looked at this in detail yet.  By default nothing
is changed (refold_source='none').  My preliminary thought was that if
refold_source is 'long', and we come across a body that is wider than
the RFC limit (or if the application wants to reformat to a different
limit), we could reconstruct the body and refold it to the new limit.
Perhaps this is not practical/useful; as I say I haven't gotten there
yet :)

> The only reason I can think of for having separate controls is that
> many MUAs mishandle quoted-printable in the body text.  Patches don't
> apply, one-time-key URLs in links get broken and fail to be
> recognized.  On the other hand, header-folding rarely has such
> consequences in my experience.

That's an interesting point.  So perhaps I should rename the control
'header_source_refold'.  I hate making the name longer, but anything
less would be ambiguous, and I've already got other controls with long
names :(.  On the other hand, we could also provide a separate control
for whether or not quoted printable bodies in particular were folded,
and consider both controls when deciding what to do with a particular
quoted printable body.  I favor the latter at the moment.

--
R. David Murray           http://www.bitdance.com

Re: header folding

Stephen J. Turnbull
R. David Murray writes:

 > That's an interesting point.  So perhaps I should rename the control
 > 'header_source_refold'.

I don't have a strong opinion, but I tend to think it's
unnecessary.

 > On the other hand, we could also provide a separate control
 > for whether or not quoted printable bodies in particular were
 > folded,

If the body is already known to be quoted-printable, you don't really
have a choice.  Folding lines longer than 76 characters after
quoted-printable encoding is required by RFC 2045.  Of course you can
do more folding than necessary (eg, fold an 85-character line at 35
and 70 characters), but that doesn't seem very useful to me.

It seems to me that the policy question (if it exists) is "We have an
all-ASCII body with 'long lines'.  Shall we encode in quoted-printable
only for the purpose of folding the long lines?"

Re: header folding

R. David Murray
On Wed, 27 Jul 2011 23:07:33 +0900, "Stephen J. Turnbull" <[hidden email]> wrote:

> R. David Murray writes:
>
>  > That's an interesting point.  So perhaps I should rename the control
>  > 'header_source_refold'.
>
> I don't have a strong opinion, but I tend to think it's
> unnecessary.
>
>  > On the other hand, we could also provide a separate control
>  > for whether or not quoted printable bodies in particular were
>  > folded,
>
> If the body is already known to be quoted-printable, you don't really
> have a choice.  Folding lines longer than 76 characters after
> quoted-printable encoding is required by RFC 2045.  Of course you can

Right, I realized what I said didn't make sense after I hit send :)

> do more folding than necessary (eg, fold an 85-character line at 35
> and 70 characters), but that doesn't seem very useful to me.

Well, the use case I was thinking of was fixing up non-conformant output
from another MUA (quoted printable but with overlong lines).  I don't
know if such exists in the wild, but I would expect that it does;
everything else seems to :)  Still, it may be a YAGNI, since any such
are most likely to be spammers.

> It seems to me that the policy question (if it exists) is "We have an
> all-ASCII body with 'long lines'.  Shall we encode in quoted-printable
> only for the purpose of folding the long lines?"

Yes, that would be a similar case:  we have a body that doesn't conform
to the "SHOULD" limit of 78; if refold_source is 'long', should we
use QP to fold it?  But this question also arises if the application
is attaching a text part with lines longer than 78 characters.  As you
suggested it might be the case that we don't want to QP encode such text.
That question, QP encoding only to fold text parts with long lines, thus
seems to be a separate policy control (and I do think we want one for it).
So if we have 'refold_source' set to 'long', an unencoded text part with
long lines would get QP encoded if and only if this new policy setting
that we haven't named yet is set to fold such parts using QP.
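
The interaction of the two controls might be sketched like this.  Everything
here is hypothetical: BodyPolicy, choose_cte(), and the name
qp_fold_long_text are stand-ins, since the second control has no name yet.

```python
from dataclasses import dataclass

@dataclass
class BodyPolicy:
    """Hypothetical policy object, for illustration only."""
    refold_source: str = "none"       # 'none', 'long', or 'all'
    qp_fold_long_text: bool = False   # stand-in for the not-yet-named control
    max_line_length: int = 78

def choose_cte(body, policy):
    """Pick a transfer encoding for an unencoded, all-ASCII text part."""
    has_long_line = any(
        len(line) > policy.max_line_length for line in body.splitlines()
    )
    # Only the 'long' case is spelled out in the thread; under any other
    # setting this sketch leaves the body untouched.
    if has_long_line and policy.refold_source == "long" and policy.qp_fold_long_text:
        return "quoted-printable"
    return "7bit"

long_body = "x" * 200 + "\n"
print(choose_cte(long_body, BodyPolicy("long", True)))   # both controls agree
print(choose_cte(long_body, BodyPolicy("long", False)))  # QP folding disabled
```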

--
R. David Murray           http://www.bitdance.com

Re: header folding

R. David Murray
In reply to this post by Barry Warsaw
On Tue, 26 Jul 2011 11:07:03 -0400, Barry Warsaw <[hidden email]> (by way of Barry Warsaw <[hidden email]>) wrote:

> On Jul 26, 2011, at 08:38 AM, R. David Murray wrote:
> >What's the word for what is done when a text message is made to have
> >a line length of less than 78 by using quoted printable (or base64)
> >encoding?  Is that also folding?  If there's no existing term in common
> >use, folding would make sense to me.  So I have no objection to using
> >'fold' consistently in the api and code for these operations.
>
> Haven't we used 'splitting' as a term for this, at least internally, in
> previous versions?  That's at least what I think of, and I do think we could
> have two knobs to control the different functionality:

'split' and 'wrap' seem to be used somewhat interchangeably in the
current code and docs.  I'm now consistently using 'fold' in the new
code.

> - To 'split' a line means to take a line longer than a specified maximum, and
>   make it fit into the maximum line length, splitting at whitespace or other
>   semantic separators.

My current code doesn't do this anywhere.  The old code does.

> - To 'fill' a header means to take the logical contents of the header and
>   recombine and resplit it so that each line is as close to the maximum line
>   length as possible.  My analogy here is Emacs's M-q (fill-paragraph).

Neither my current code nor the old code does exactly this anywhere.

> What then is "folding" or "wrapping"?  Maybe no different than the above.

Folding is an RFC term-of-art that implies the specific RFC rules for
making sure a semantic unit (header, body) has lines that are shorter
than the RFC defined maximum length.

Wrapping is much more like your 'filling', but probably a less precise
term, as filling does imply maximizing line lengths, while wrapping
to my ears does not have that connotation as a requirement.

'refolding', as I've implemented it, consists of taking an existing folded
header, unfolding it, and then folding it according to the RFC rules and
recommendations.  This may or may not put the maximum possible number
of characters on a line, depending on whether the header is structured
or unstructured and the content of said header.  And it may or may not
exactly reproduce the original header, depending on how closely the
original folder and I agree on our interpretation of the RFC rules :)
(Which is why headers are only refolded by explicit request.)
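
The unfold-then-fold sequence can be sketched with the header machinery in
today's standard library (email.policy.default and its header_factory).
This illustrates the idea only, not the code described in this post, and
refold() is a made-up helper name.

```python
from email.policy import default

def refold(name, folded_value, policy=default):
    """Unfold a source header value, then fold it again under policy."""
    # Unfolding: drop the line breaks and keep the whitespace after them.
    unfolded = "".join(folded_value.splitlines())
    # Folding: let the header class for this field name apply its own rules.
    return policy.header_factory(name, unfolded).fold(policy=policy)

print(repr(refold("Subject", "one\n two\n three")))

long_value = " ".join(["word"] * 40)
print(refold("Subject", long_value))
```

The result is not guaranteed to match the way the original sender folded the
header, which is the reason for refolding only on request.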

So, I agree with Stephen, I think 'folding' is the correct term to
use here.

> >Can anyone see a use case for controlling folding of headers separately
> >from folding of message bodies?  I haven't thought of one, which is why
> >I'm thinking one policy knob controls both.
>
> You might have a message body that contains code, in which case you might want
> to fill the headers (using the terminology above), but not fill the body.

This is similar to the case we've already discussed, about excluding
a text body from being QP encoded.  I think we don't currently do
any paragraph reflow, but it might be an interesting facility to add :)

--
R. David Murray           http://www.bitdance.com

Re: header folding

Barry Warsaw
On Jul 27, 2011, at 04:56 PM, R. David Murray wrote:

>Wrapping is much more like your 'filling', but probably a less precise
>term, as filling does imply maximizing line lengths, while wrapping
>to my ears does not have that connotation as a requirement.

Is it just the guarantee of maximizing line lengths that's missing?

>'refolding', as I've implemented it, consists of taking an existing folded
>header, unfolding it, and then folding it according to the RFC rules and
>recommendations.  This may or may not put the maximum possible number
>of characters on a line, depending on whether the header is structured
>or unstructured and the content of said header.  And it may or may not
>exactly reproduce the original header, depending on how closely the
>original folder and I agree on our interpretation of the RFC rules :)
>(Which is why headers are only refolded by explicit request.)
>
>So, I agree with Stephen, I think 'folding' is the correct term to
>use here.

Okay.  To me 'folding' is closer to 'splitting', while 'wrapping' is closer
to 'filling' since in what you describe above, there is an 'unfolding' operation
that happens first.  Note too that Emacs's filling doesn't guarantee maximal
line lengths (i.e. fill-column) either since long words can cause previous
lines to be shorter.  That seems analogous to your description above.

-Barry


Re: header folding

Stephen J. Turnbull
In reply to this post by Barry Warsaw
Barry Warsaw <[hidden email]> writes:

 > That's at least what I think of, and I do think we could
 > have two knobs to control the different functionality:
 >
 > - To 'split' a line means to take a line longer than a specified maximum, and
 >   make it fit into the maximum line length, splitting at whitespace or other
 >   semantic separators.

In the case of headers, "folding" is hallowed usage (going back to at
least RFC 733), and is very precisely defined by RFC 5322.  If we are
going to do something non-RFC conformant (yeah, right, we might do
that, eh?), "splitting" would be better.  If our implementation is
intended to be conformant, I think "folding" is preferable both for
familiarity and ease of reference ("look it up in RFC 5322").

I think the generalization to bodies is reasonable, although I haven't
found any RFC usage of "folding" in that context in a quick look.

 > - To 'fill' a header means to take the logical contents of the
 > header and recombine and resplit it so that each line is as close
 > to the maximum line length as possible.  My analogy here is Emacs's
 > M-q (fill-paragraph).

 > What then is [...] "wrapping"?  Maybe no different than the above.

In my dialect, what you describe as "filling" is (at least
potentially) far more sophisticated than what I mean by "wrapping".
Wrapping moves forward through each line and at the maximum length
backtracks to the rightmost break point in the line, breaking there,
then continuing the process in the tail line.  This could and often in
my experience does result in very uneven lines.

However, I don't think we're talking about filling here.  Filling IMHO
should be implemented by the email module, but it should be called
explicitly by the client, not imposed internally on the basis of a
global policy.

Consider the following ugly header (which is somewhat unlikely to
actually appear in a real use case, although it could easily result
from cut-and-paste into an MUA's to field):

To: Amie Cawinski <[hidden email]>, Ichabod
 Tallman <[hidden email]>

(there is no trailing whitespace on either line).  IMO, there are two
plausible fillings (assuming a limit of 78 characters) here:

To: Amie Cawinski <[hidden email]>, Ichabod Tallman <[hidden email]>

and

To: Amie Cawinski <[hidden email]>,
    Ichabod Tallman <[hidden email]>

of which the second will be uglified by an RFC-5322-conformant
processor into:

To: Amie Cawinski <[hidden email]>,    Ichabod Tallman <[hidden email]>

(note the extra space after the comma).  I personally don't consider
either of

To: Amie Cawinski <[hidden email]>,
 Ichabod Tallman <[hidden email]>

To: Amie Cawinski <[hidden email]>,
<TAB>Ichabod Tallman <[hidden email]>

plausible as a presentation, but YMMV.  So filling (to me) is about
presentation, not protocol conformance.
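
The effect of unfolding on these variants can be reproduced with a short
sketch.  The addresses are placeholders (the real ones are hidden in the
archive) and unfold() is a made-up helper, not the email package's API.

```python
import re

def unfold(folded):
    """Drop each line break that is immediately followed by a space or tab."""
    return re.sub(r"\r?\n(?=[ \t])", "", folded)

# Continuation indented four spaces so the names line up on screen.
aligned = (
    "To: Amie Cawinski <amie@example.com>,\n"
    "    Ichabod Tallman <ichabod@example.com>"
)
# Continuation indented with the single space that folding requires.
minimal = (
    "To: Amie Cawinski <amie@example.com>,\n"
    " Ichabod Tallman <ichabod@example.com>"
)

print(unfold(aligned))  # keeps all four spaces after the comma
print(unfold(minimal))  # one space after the comma
```

Unfolding removes only the line break, so whatever whitespace was used to
line things up for presentation ends up inside the logical header line.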

Anyway, I don't see how we can justify making *these* choices for the
user on the basis of a policy that really is about conservative
compliance to a wire protocol standard.  For example, I personally do
not "fill" 81-character subject headers; it's just too ugly.  However,
I might want my mail program to conservatively "fold" them, especially
for certain correspondents known to be stuck behind weird MTAs or MUAs.

 > You might have a message body that contains code, in which case you
 > might want to fill the headers (using the terminology above), but
 > not fill the body.

That's another example of why control for filling has to be flexible
(and why IMHO filling should be called explicitly by the client).

However, if the receiving MUA is RFC 2045-conformant, the user cannot
tell that quoted-printable folding was used.

Re: header folding

Glenn Linderman-3
In reply to this post by R. David Murray
On 7/26/2011 5:38 AM, R. David Murray wrote:

> What's the word for what is done when a text message is made to have
> a line length of less than 78 by using quoted printable (or base64)
> encoding?  Is that also folding?  If there's no existing term in common
> use, folding would make sense to me.  So I have no objection to using
> 'fold' consistently in the api and code for these operations.

To me, "fold" means to divide _a_ long line into multiple short lines (less than line length).  (Barry calls this split, it seems.)

To me, "wrap" means to divide and join as necessary a set of lines (sometimes/often a paragraph) to achieve some number of similar length lines, not to exceed a line length limit, with possibly a shorter one at the end.

To me, "fill" means to divide and join as necessary a set of lines (sometimes/often a paragraph) to use as few lines as possible without exceeding a line length limit, usually resulting in a shorter one at the end. (Barry seems to have this same definition.)

For all the above, all divisions and joinings happen at white space sequences, and white space sequences are considered irrelevant in composition, and are generally reduced to a single space or newline as a side effect.

I think that if these terms are defined in the RFCs, that those definitions should be preferred to mine.

Some set of definitions needs to be agreed upon, before sensible communication can be made about what various algorithms should actually do, and what policy settings might be named, and what algorithms they would invoke.


Re: header folding

Stephen J. Turnbull
Glenn Linderman writes:

 > To me, "wrap" means to divide and join as necessary a set of lines
 > (sometimes/often a paragraph) to achieve some number of similar length
 > lines, not to exceed a line length limit, with possibly a shorter one at
 > the end.

Typically such usage is in contexts where a paragraph is represented
as a single physical line, though.  Your "set" is not part of "wrap"
in my dialect.

 > I think that if these terms are defined in the RFCs, that those
 > definitions should be preferred to mine.

"Fold" is defined per RFC 5322.  The others don't seem to be.

I think "fold" should be used for the well-defined operation of header
folding (RFC 5322) and also for the well-defined operation of
"inserting a soft linebreak" in quoted-printable bodies (RFC 2045).
I'm happy with whatever usage others prefer for the other operations.