2011-03-28 Policy Framework First Draft

Last week turned out to be mostly about tests and bugs. As per my last
post, I moved the tests into a test package. Then I went on to add a
bunch of `additional tests`_ developed by Michael Henry at the PyCon sprints.
More tests are always good before starting to modify code, right?

Michael's tests revealed a couple of bugs, though, so I then went on to
apply the `fix`_ for those bugs, which included a `rewritten algorithm`_
for encoding strings as quoted-printable. I adapted the algorithm
proposed by Michael, then discovered a different and probably `better
algorithm`_ had already been proposed a while back and gotten lost in the
tracker. That proposed patch was against the email package in Python 2,
though, and the corresponding code in Python 3 has a different interface,
so the patch wasn't easily adapted. Since there are other changes
that need to be made to the quoted printable encoder, I have deferred
implementing the better algorithm until I get as far as touching that
code for the email6 work.

There was also a `bug`_ in the Email5 API that I wanted to fix before
starting to make API changes. When you deal with "dirty" headers in
Email5.1, you may get back a ``Header`` object when querying a header.
Now, the normal way to deal with crazy headers in Email5 is to pass them
to ``decode_header``, which returns pairs of the original bytes from the
wire and their character sets. But ``decode_header`` wasn't accepting a
``Header`` object for decoding. My first approach was to try shifting back to
returning strings even when the header was "dirty", by wrapping them up
in encoded words with the ``unknown-8bit`` charset. That more or less
worked, but doing it that way would mean making some other changes
to methods such as ``get_param`` to handle headers that had gotten
re-encoded into encoded words. This was far from optimal. The reporter
of the bug pointed out that I had carefully documented that ``Message``
would return a ``Header`` if the source header had unencoded non-ASCII
bytes in it, which made changing this behavior in a bug fix release
a non-starter. So I gave in and just fixed ``decode_header`` to handle
``Header`` objects. Since *all* headers in email6 will be a (new type of)
``Header`` object, programmers may as well get used to dealing with them.
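
As a rough illustration of the fixed behavior, the sketch below parses a message whose subject contains a raw non-ASCII byte and hands the resulting ``Header`` object straight to ``decode_header``. It assumes a current Python 3 standard library, where ``email.message_from_bytes`` uses the backward-compatible policy by default; the sample message is made up for the example.

```python
import email
from email.header import Header, decode_header

# A made-up message with a "dirty" header: a raw latin-1 byte (0xE9)
# that is not wrapped in an RFC 2047 encoded word.
raw = b"Subject: caf\xe9\n\nbody\n"
msg = email.message_from_bytes(raw)

subject = msg["Subject"]
# With the default (compatibility) policy, the dirty value comes back as a
# Header object rather than a plain str.
is_header = isinstance(subject, Header)

# decode_header accepts the Header object and returns (bytes, charset) pairs.
pairs = decode_header(subject)
print(is_header, pairs)
```

The charset reported for the undecodable bytes should be ``unknown-8bit``, since the parser has no way of knowing what encoding the sender intended.
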

For email6 itself, there is now a `feature branch`_ where I will do
the patch development for email6 before applying the changes to the
main cpython repository. The branch is named ``email6``, of course.
Anyone may browse or clone this repository to take a look at the current
state of development.

And that current state is that I have checked in the first draft of
the Policy framework. This consists of a new module, `policy.py`_,
the associated documentation, `policy.rst`_, and a set of tests.

The basic idea is that a ``Policy`` object is an immutable container
for a bunch of attributes and callback hooks. You can call a ``Policy``
object to get a new one with some of the defaults changed. And you can
add them together, with the non-default settings from the right operand
overriding those from the left operand.
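
To make the idea concrete, here is a minimal toy model of such an object. It is only a sketch of the concept, not the code in `policy.py`: the class name ``SketchPolicy`` is hypothetical, and the attribute names and defaults (``linesep``, ``max_line_length``, ``raise_on_defect``) are assumptions chosen for illustration.

```python
class SketchPolicy:
    """Toy model of an immutable, callable, addable policy object."""

    # Illustrative attribute names and defaults (assumptions, not the draft's).
    _defaults = {"linesep": "\n", "max_line_length": 78, "raise_on_defect": False}

    def __init__(self, **overrides):
        unknown = set(overrides) - set(self._defaults)
        if unknown:
            raise TypeError("unknown policy attributes: %s" % sorted(unknown))
        # Store the explicitly supplied settings; bypass our read-only __setattr__.
        object.__setattr__(self, "_overrides", dict(overrides))

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for policy attributes.
        if name != "_overrides":
            if name in self._overrides:
                return self._overrides[name]
            if name in self._defaults:
                return self._defaults[name]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        raise AttributeError("policy objects are read-only")

    def __call__(self, **changes):
        # Calling a policy returns a new one with some settings changed.
        return type(self)(**{**self._overrides, **changes})

    def __add__(self, other):
        # Explicit settings from the right operand override the left's.
        return type(self)(**{**self._overrides, **other._overrides})


default = SketchPolicy()
smtp = default(linesep="\r\n")
strict = default(raise_on_defect=True)
strict_smtp = smtp + strict
print(repr(strict_smtp.linesep), strict_smtp.raise_on_defect)
```

Keeping only the explicitly supplied settings in each instance is what makes addition simple: the result is the left operand's settings with the right operand's laid on top.
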
So far the predefined policies include *default*, *SMTP*, *HTML*, and
*Strict*.

*default* may get renamed *email6*. I'd prefer 'default', since that's
what I'd like it to be by the time we get to Python 3.4. The actual
default policy when I start adding the parameter to other classes and
functions will be *email5*, though, so the name *default* for email6 is
probably not going to work.

The *SMTP* policy is just like default, but generates "wire format" line
separators (``\r\n``). *HTML* is like *SMTP*, but does not wrap headers.
*Strict* sets a flag that will (once I implement it) cause the parser to
raise errors when it encounters defects instead of just keeping track
of them. Using *Strict* is where you can see the utility of adding
policies together:

>>> StrictSMTP = SMTP + Strict

You could use StrictSMTP to parse an incoming SMTP message where you
wanted your program to blow up if the message was invalid. (When would
you ever want that? I don't know, but someone probably will!)
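
For a runnable version of the same idea, the sketch below uses the ``email.policy`` module as it exists in the current Python 3 standard library, which may differ from the draft described here (for instance, the stock strict policy there is spelled ``policy.strict``). The defective sample message is made up for the example.

```python
from email import errors, message_from_string, policy

# Combine two stock policies; settings from the right operand win.
strict_smtp = policy.SMTP + policy.strict

# Made-up defective message: claims to be multipart but gives no boundary.
source = "Content-Type: multipart/mixed\n\nno boundary here\n"

try:
    message_from_string(source, policy=strict_smtp)
    outcome = "parsed"
except errors.MessageDefect as exc:
    # With raise_on_defect set, the parser raises the defect it found.
    outcome = type(exc).__name__
print(outcome)
```
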

So far I've only defined one hook, ``register_defect``. You could
subclass ``Policy`` and define your own ``register_defect`` method that
would, say, log all defects to a log file, thus giving you some idea of
the quality of the email being processed by your program, even if you
did nothing else with the defect info.
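
A defect-logging subclass might look something like the sketch below. It is written against the ``email.policy`` module in the current Python 3 standard library, where the hook has the signature ``register_defect(obj, defect)``; the draft's signature may differ. The ``LoggingPolicy`` class, the logger name, and the sample message are all hypothetical.

```python
import logging
from email import message_from_string
from email.policy import EmailPolicy

log = logging.getLogger("mail.defects")  # hypothetical logger name


class LoggingPolicy(EmailPolicy):
    """Log every defect, then record it on the object as usual."""

    def register_defect(self, obj, defect):
        log.warning("email defect: %s", type(defect).__name__)
        super().register_defect(obj, defect)


# Made-up defective message: multipart content type with no boundary.
source = "Content-Type: multipart/mixed\n\nno boundary here\n"
msg = message_from_string(source, policy=LoggingPolicy())
defect_names = [type(d).__name__ for d in msg.defects]
print(defect_names)
```

Calling the base class hook keeps the defects available on the message, so the logging is purely additive.
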