Possible change to logging.handlers.SysLogHandler

Vinay Sajip
There is a problem with the way logging.handlers.SysLogHandler works
when presented with Unicode messages. According to RFC 5424, Unicode
is supposed to be sent encoded as UTF-8 and preceded by a BOM.
However, the current handler implementation puts the BOM at the start
of the formatted message, and this is wrong in scenarios where you
want to put some additional structured data in front of the
unstructured message part; the BOM is supposed to go after the
structured part (which, therefore, has to be ASCII) and before the
unstructured part. In that scenario, the handler's current behaviour
does not strictly conform to RFC 5424.
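
To make the placement concrete, here is a minimal Python 3 sketch of the byte layout described above: an ASCII part, a space, the UTF-8 BOM, then the UTF-8 encoded message. The function name and the sample header and structured-data values are hypothetical illustrations, not part of SysLogHandler.

```python
import codecs


def build_rfc5424_payload(ascii_part, message):
    """Return the octets for one syslog message.

    ascii_part: header plus structured data, assumed to be pure ASCII.
    message:    the unstructured text, which may contain any Unicode.
    """
    # Encoding with 'ascii' raises UnicodeEncodeError if the structured
    # part is not ASCII, which is the assumption made in the text above.
    prefix = ascii_part.encode("ascii")
    # The BOM goes after the structured part and before the message text.
    return prefix + b" " + codecs.BOM_UTF8 + message.encode("utf-8")


# Hypothetical sample values, for illustration only.
sample_prefix = '<14>1 2012-04-06T13:06:00Z myhost myapp - - [example@32473 user="vinay"]'
payload = build_rfc5424_payload(sample_prefix, "caf\u00e9 opened")
```

Putting the BOM at the very start of `payload` instead would place it in front of the header, which is the behaviour being questioned here.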

The issue is described in [1]. The BOM was originally added / position
changed in response to [2] and [3].

It is not possible to achieve conformance with the current
implementation of the handler, unless you subclass the handler and
override the whole emit() method. This is not ideal. For 3.3, I will
refactor the implementation to expose a method which creates the byte
string which is sent over the wire to the syslog daemon. This method
can then be overridden for specific use cases where needed.
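
As a rough illustration of that refactoring idea (a sketch only, written for Python 3, and not the actual SysLogHandler code), the handler could route everything through one overridable method. The class names, the `build_payload` method and the `sent` list are all hypothetical; the list stands in for the socket so that the example runs without a syslog daemon.

```python
import codecs
import logging


class OctetSinkHandler(logging.Handler):
    """Hypothetical handler that collects payloads instead of sending them."""

    def __init__(self):
        super().__init__()
        self.sent = []  # stands in for the socket to the syslog daemon

    def build_payload(self, record):
        # Default: the whole formatted message as UTF-8 octets, no BOM.
        return self.format(record).encode("utf-8")

    def emit(self, record):
        try:
            self.sent.append(self.build_payload(record))
        except Exception:
            self.handleError(record)


class BomBeforeMessageHandler(OctetSinkHandler):
    """Hypothetical subclass: ASCII prefix, then BOM, then the message."""

    def __init__(self, ascii_prefix):
        super().__init__()
        self.ascii_prefix = ascii_prefix

    def build_payload(self, record):
        prefix = self.ascii_prefix.encode("ascii")
        body = self.format(record).encode("utf-8")
        return prefix + b" " + codecs.BOM_UTF8 + body
```

With this shape, a subclass only replaces `build_payload` and leaves `emit()` alone, which is the point of exposing such a method.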

However, for 2.7 and 3.2, removing the BOM insertion would bring the
implementation into conformance to the RFC, though the entire message
would have to be regarded as just a set of octets. A Unicode message
would still be encoded using UTF-8, but the BOM would be left out.

I am thinking of removing the BOM insertion in 2.7 and 3.2 - although
it is a change in behaviour, the current behaviour does seem broken
with regard to RFC 5424 conformance. However, as some might disagree
with that assessment and view it as a backwards-incompatible behaviour
change, I thought I should post this to get some opinions about
whether this change is viewed as objectionable.

Regards,

Vinay Sajip

[1] http://bugs.python.org/issue14452
[2] http://bugs.python.org/issue7077
[3] http://bugs.python.org/issue8795
_______________________________________________
Python-Dev mailing list
[hidden email]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/lists%2B1324100855712-1801473%40n6.nabble.com

Re: Possible change to logging.handlers.SysLogHandler

Gregory P. Smith-3


On Fri, Apr 6, 2012 at 1:06 PM, Vinay Sajip <[hidden email]> wrote:
> [...]
> I am thinking of removing the BOM insertion in 2.7 and 3.2 - although
> it is a change in behaviour, the current behaviour does seem broken
> with regard to RFC 5424 conformance. However, as some might disagree
> with that assessment and view it as a backwards-incompatible behaviour
> change, I thought I should post this to get some opinions about
> whether this change is viewed as objectionable.

Given the existing brokenness I personally think that removing the BOM insertion (because it is incorrect) in 2.7 and 3.2 is fine if you cannot find a way to make it correct in 2.7 and 3.2 without breaking existing APIs.

Could a private method that creates the byte string, and correctly adds the BOM, not be added and used in 2.7 and 3.2?



Re: Possible change to logging.handlers.SysLogHandler

Vinay Sajip
Gregory P. Smith <greg <at> krypto.org> writes:

> Given the existing brokenness I personally think that removing the BOM
> insertion (because it is incorrect) in 2.7 and 3.2 is fine if you cannot find
> a way to make it correct in 2.7 and 3.2 without breaking existing APIs.

Thanks for the feedback.
 
> Could a private method that creates the byte string, and correctly adds the
> BOM, not be added and used in 2.7 and 3.2?

The problem is that given a format string, the code would not know where to
insert the BOM. According to the RFC, it's supposed to go just before the
unstructured message part, but that's format-string and hence
application-dependent. So some new API will need to be exposed, though I haven't
thought through exactly what that will be (for example, it could be a new
place-holder for the BOM in the format-string, or some new public methods which
are meant to be overridden and so not private).
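
To illustrate the first of those options (a sketch only, for Python 3, and not a settled API), the format string itself could carry U+FEFF at the point where the unstructured message begins. Encoding the formatted text as UTF-8 then yields the BOM octets at exactly that position. The format string, the `format_and_encode` helper and the sample record below are hypothetical.

```python
import codecs
import logging

# U+FEFF marks the boundary between the ASCII prefix and the Unicode message.
BOM_FORMAT = "%(name)s: \ufeff%(message)s"


def format_and_encode(record, fmt=BOM_FORMAT):
    """Format a record and return the UTF-8 octets that would be sent."""
    text = logging.Formatter(fmt).format(record)
    return text.encode("utf-8")


# Hypothetical sample record, for illustration only.
record = logging.makeLogRecord({"name": "myapp", "msg": "caf\u00e9 opened"})
payload = format_and_encode(record)

# Everything before the BOM is the application-chosen ASCII prefix.
prefix, bom, message = payload.partition(codecs.BOM_UTF8)
```

Here the application decides where the BOM goes, so the handler does not need to guess.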

Regards,

Vinay Sajip


Re: Possible change to logging.handlers.SysLogHandler

Vinay Sajip
Gregory P. Smith <greg <at> krypto.org> writes:

> Given the existing brokenness I personally think that removing the BOM
> insertion (because it is incorrect) in 2.7 and 3.2 is fine if you cannot find
> a way to make it correct in 2.7 and 3.2 without breaking existing APIs.

I have an idea for a change which won't require changing any public APIs; though
it does change the behaviour so that BOM insertion doesn't happen any more,
anyone who needs a BOM can have it by a simple update to their format string.
The idea is outlined here:

http://bugs.python.org/issue14452#msg158030

Comments would be appreciated.

Regards,

Vinay Sajip
