Re: bytes / unicode

Re: bytes / unicode

Antoine Pitrou
On Wed, 23 Jun 2010 14:23:33 -0400
Tres Seaver <[hidden email]> wrote:
>
> Perhaps such decisions need revisiting in light of subsequent experience
> / pain / learning.  E.g:
>
> - the repeated inability of the web-sig to converge on appropriate
>   semantics for a Python3-compatible version of the WSGI spec;
>
> - the subsequent quirkiness of the Python3 wsgiref implementation;

The way wsgiref was adapted is admittedly suboptimal. It was totally
broken at first, and PJE didn't want to look very deeply into it. We
therefore had to settle on a series of small modifications that seemed
rather reasonable, but without any in-depth discussion of what WSGI had
to look like under Python 3 (since it was not our job and responsibility).

Therefore, I don't think wsgiref should be taken as a guide to what
a cleaned up, Python 3-specific WSGI must look like.

> - the slow adoption / porting rate of major web frameworks and libraries
>   to Python 3.

Some of the major web frameworks and libraries have a ton of
dependencies, which would explain why they really haven't bothered yet.

I don't think you can claim, though, that Python 3 makes things
significantly harder for these frameworks. The proof is that many of
them already give the user unicode strings in Python 2.x. They must
have somehow got the decoding right.

Regards

Antoine.


_______________________________________________
Web-SIG mailing list
[hidden email]
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: http://mail.python.org/mailman/options/web-sig/lists%40nabble.com

Re: bytes / unicode

Henry Precheur
On Wed, Jun 23, 2010 at 09:36:45PM +0200, Antoine Pitrou wrote:
> I don't think you can claim, though, that Python 3 makes things
> significantly harder for these frameworks. The proof is that many of
> them already give the user unicode strings in Python 2.x. They must
> have somehow got the decoding right.

Well... Frameworks usually 'simplify' the problem by partly ignoring it.
By default they assume the data in the request is UTF-8. You can specify
an alternative encoding in most of them. Django [1], Werkzeug [2], and
WebOb [3] do that.
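
To make the "assume UTF-8, allow an override" idea concrete, here is a
minimal sketch using only the standard library. The helper name
parse_form and its charset parameter are made up for illustration; this
is not how any of the frameworks above actually implement it.

```python
from urllib.parse import parse_qs


def parse_form(query_string: str, charset: str = "utf-8") -> dict:
    """Hypothetical helper: decode a percent-encoded query string.

    UTF-8 is assumed unless the caller names another charset, which
    mirrors the per-request encoding override described above.
    """
    # parse_qs percent-decodes to bytes and then decodes them with the
    # given encoding; undecodable bytes become U+FFFD by default.
    return parse_qs(query_string, encoding=charset)


print(parse_form("name=caf%C3%A9"))                  # UTF-8 request
print(parse_form("name=caf%E9", charset="latin-1"))  # latin-1 request
```

Note that the caller still has to know which charset to pass; the
default only helps when the client really sent UTF-8.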

The problem with this approach is that you still have to deal with weird
requests where one thing is UTF-8 and another is latin-1. Sometimes
you can even have two different encodings in a single header, such as
Cookie. There's no general solution to this problem; it has to be
solved on a case-by-case basis.
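
One common case-by-case workaround is to try UTF-8 first and fall back
to latin-1. The sketch below assumes that policy; the function name
decode_best_effort is hypothetical, and the fallback is a guess about
the sender rather than a real fix, since latin-1 accepts any byte
sequence and will happily produce the wrong text.

```python
def decode_best_effort(raw: bytes, encodings=("utf-8", "latin-1")) -> str:
    """Hypothetical helper: return the first successful decoding of raw."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            # Not valid in this encoding, try the next candidate.
            continue
    # Only reached if the caller passed a list without latin-1.
    return raw.decode("latin-1", errors="replace")


# The same header value sent by two different clients.
print(decode_best_effort("café".encode("utf-8")))
print(decode_best_effort("café".encode("latin-1")))
```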

There was a big discussion a while ago on web-sig. I think the consensus
was that WSGI for Python 3 should assume that the data is encoded in
latin-1 since it's the default encoding according to the RFC.
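
The appeal of latin-1 here is that it maps every byte to exactly one
code point, so nothing is lost: an application that knows better can
re-encode and decode again. A rough sketch of that round trip, assuming
the server decodes raw header bytes as latin-1 (recover_text is a
made-up name, not part of any spec):

```python
def recover_text(native: str, charset: str = "utf-8") -> str:
    """Hypothetical helper: undo a latin-1 decode and apply the real charset."""
    return native.encode("latin-1").decode(charset)


raw = "café".encode("utf-8")     # bytes as they arrived on the wire
native = raw.decode("latin-1")   # what the server would hand to the app

# The latin-1 step is lossless, so the original bytes are recoverable.
assert native.encode("latin-1") == raw

print(repr(native))              # mojibake until the app re-decodes it
print(recover_text(native))
```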


[1] http://docs.djangoproject.com/en/dev/ref/request-response/#django.http.HttpRequest.encoding
[2] http://werkzeug.pocoo.org/documentation/dev/unicode.html#request-and-response-objects
[3] http://pythonpaste.org/webob/reference.html#unicode-variables

--
  Henry Prêcheur