Move to bless Graham's WSGI 1.1 as official spec

Re: Move to bless Graham's WSGI 1.1 as official spec

and-py
Manlio Perillo wrote:

> Words of *TEXT MAY contain characters from character sets other than
> ISO-8859-1 [22] only when encoded according to the rules of RFC 2047

Yeah, this is, unfortunately, a lie. The rules of RFC 2047 apply only to
RFC*822-family 'atoms' and not elsewhere; indeed, RFC2047 itself
specifically denies that an encoded-word can go in a quoted-string.

RFC2047 encoded-words are not on-topic in an HTTP header(*); this has
been confirmed by newer development work on HTTPbis by Reschke et al.
(http://tools.ietf.org/wg/httpbis/).

The "correct" way of escaping header parameters in an RFC*822-family
protocol would be RFC2231's complex encoding scheme, but HTTP is
explicitly not an 822-family protocol despite sharing many of the same
constructs. See
http://tools.ietf.org/html/draft-reschke-rfc2231-in-http-06 for a
strategy for how 2231 should interact with HTTP, but note that for now
RFC2231-in-HTTP simply does not exist in any deployed tools.
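
For readers unfamiliar with RFC 2231, its extended parameter syntax puts a charset and an optional language in front of a percent-encoded value (charset'language'value) and marks the parameter with a trailing "*" on its name. A minimal Python 3 sketch of that shape, assuming UTF-8 and an empty language tag; the helper names are hypothetical and, as noted above, no deployed HTTP tool understood this syntax at the time:

```python
from urllib.parse import quote, unquote


def encode_ext_value(value, charset="utf-8", language=""):
    """Build charset'language'percent-encoded-octets (RFC 2231 style)."""
    # safe="" escapes everything except ASCII letters, digits and "_.-~".
    encoded = quote(value, safe="", encoding=charset)
    return charset + "'" + language + "'" + encoded


def decode_ext_value(ext_value):
    """Split off charset and language, then percent-decode with that charset."""
    charset, _language, encoded = ext_value.split("'", 2)
    return unquote(encoded, encoding=charset, errors="strict")


# Content-Disposition is only an illustrative carrier for the "*" parameter.
print("attachment; filename*=" + encode_ext_value("naïve.txt"))
# attachment; filename*=utf-8''na%C3%AFve.txt
```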

So for now there is basically nothing useful WSGI can do other than
provide direct, byte-oriented (even if wrapped in 8859-1 unicode
strings) access to headers.
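
To make "byte-oriented, even if wrapped in 8859-1 unicode strings" concrete: ISO-8859-1 maps each octet to the code point with the same number, so it is a lossless wrapper. A short Python 3 sketch (the sample header value is made up):

```python
# Every possible octet value, 0x00 through 0xFF.
raw = bytes(range(256))

# Decoding never fails and never merges two different octets.
text = raw.decode("latin-1")
assert all(ord(ch) == b for ch, b in zip(text, raw))

# Encoding back restores the original octets exactly.
assert text.encode("latin-1") == raw

# A UTF-8 header value therefore arrives as "mojibake" that the
# application can re-decode once it knows the real charset.
header = "café".encode("utf-8").decode("latin-1")
print(header)                                    # cafÃ©
print(header.encode("latin-1").decode("utf-8"))  # café
```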

--
And Clover
mailto:[hidden email]
http://www.doxdesk.com/

_______________________________________________
Web-SIG mailing list
[hidden email]
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: http://mail.python.org/mailman/options/web-sig/lists%40nabble.com

Re: Move to bless Graham's WSGI 1.1 as official spec

Manlio Perillo-3
In reply to this post by Henry Precheur
Henry Precheur wrote:

> On Thu, Dec 03, 2009 at 09:15:06PM +0100, Manlio Perillo wrote:
>> There is something that I don't understand.
>>
>> Some HTTP headers, like Accept-Language, contain data described as
>> `token`, where:
>>
>> token          = 1*<any CHAR except CTLs or separators>
>>
>> So a token, IMHO, is an opaque string, and it SHOULD NOT be decoded.
>> In Python 3.x it SHOULD be a byte string.
>
> I think this is more an issue that frameworks should deal with. By
> decoding every header value as latin-1:
>
> * It keeps WSGI simple. Simple is good.
>

It is just as simple as using byte strings, IMHO.
It is not about simplicity; it is convenient because of (if I understand
correctly) how 2to3 converts code.
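
Manlio's point that a token can stay opaque is easy to act on, because the RFC 2616 token rule quoted above is defined purely in terms of US-ASCII octets and can be checked without decoding anything. A minimal sketch; the helper name is hypothetical:

```python
# RFC 2616, section 2.2: the separator characters, as a set of octet values.
SEPARATORS = frozenset(b'()<>@,;:\\"/[]?={} \t')


def is_token(value: bytes) -> bool:
    """Check token = 1*<any CHAR except CTLs or separators> on raw octets."""
    if not value:
        return False  # "1*" requires at least one octet
    for octet in value:
        # CHAR is 0-127; CTL is 0-31 and 127, so only 32-126 can qualify.
        if not 32 <= octet <= 126 or octet in SEPARATORS:
            return False
    return True


print(is_token(b"en-US"))    # True
print(is_token(b"caf\xe9"))  # False: 0xE9 is not a US-ASCII CHAR
```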

> * WSGI sticks to what RFC 2616 (Hypertext Transfer Protocol -- HTTP/1.1)
>   says. WSGI is about HTTP, but that doesn't necessarily include all
>   other standards extending HTTP.
>

HTTP never says that whole headers are to be considered latin-1 text, IMHO.

> * It's possible to convert latin-1 strings to bytes without losing data.
>

Yes, but it is quite stupid to first convert to Unicode and then convert
back to a byte string.

It is true, however, that this does not happen often; only for:

- WSGI applications that implement an HTTP proxy
- WSGI applications that need to support HTTP Digest Authentication
- WSGI applications that store encoded data in cookies
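
For the proxy case in the list above, a minimal sketch, assuming a WSGI 1.1-style server that exposes each request header as an ISO-8859-1-decoded str under an HTTP_* key: before forwarding, the proxy has to encode every value back to octets. The function name is hypothetical, and rebuilding names from environ keys is lossy (original casing is gone, and CONTENT_TYPE/CONTENT_LENGTH, which CGI-style environs store without the HTTP_ prefix, are skipped):

```python
def forwardable_headers(environ):
    """Rebuild request header lines, as bytes, from the HTTP_* keys of environ.

    Assumes the server decoded each raw header value as ISO-8859-1, so
    encoding with the same codec recovers the original octets exactly.
    """
    out = []
    for key, value in environ.items():
        # CONTENT_TYPE and CONTENT_LENGTH lack the HTTP_ prefix; skipped here.
        if not key.startswith("HTTP_"):
            continue
        # HTTP_ACCEPT_LANGUAGE -> Accept-Language (the original casing is lost).
        name = key[len("HTTP_"):].replace("_", "-").title()
        out.append(name.encode("latin-1") + b": " + value.encode("latin-1") + b"\r\n")
    return b"".join(out)


environ = {"HTTP_COOKIE": "caf\u00c3\u00a9=1", "PATH_INFO": "/"}
print(forwardable_headers(environ))  # b'Cookie: caf\xc3\xa9=1\r\n'
```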


Regards  Manlio

Re: Move to bless Graham's WSGI 1.1 as official spec

Manlio Perillo-3
In reply to this post by and-py
And Clover wrote:

> Manlio Perillo wrote:
>
>> Words of *TEXT MAY contain characters from character sets other than
>> ISO-8859-1 [22] only when encoded according to the rules of RFC 2047
>
> Yeah, this is, unfortunately, a lie. The rules of RFC 2047 apply only to
> RFC*822-family 'atoms' and not elsewhere; indeed, RFC2047 itself
> specifically denies that an encoded-word can go in a quoted-string.
>
> RFC2047 encoded-words are not on-topic in an HTTP header(*); this has
> been confirmed by newer development work on HTTPbis by Reschke et al.
> (http://tools.ietf.org/wg/httpbis/).
>

Thanks.
HTTPbis seems to fix all these problems:

"Historically, HTTP has allowed field content with text in the ISO-
8859-1 [ISO-8859-1] character encoding and supported other character
sets only through use of [RFC2047] encoding.  In practice, most HTTP
header field values use only a subset of the US-ASCII character
encoding [USASCII].  Newly defined header fields SHOULD limit their
field values to US-ASCII characters.  Recipients SHOULD treat other
(obs-text) octets in field content as opaque data."


This is the new rule for `quoted-string`:

quoted-string  = DQUOTE *( qdtext / quoted-pair ) DQUOTE
qdtext         = OWS / %x21 / %x23-5B / %x5D-7E / obs-text
               ; OWS / <VCHAR except DQUOTE and "\"> / obs-text
obs-text       = %x80-FF

quoted-pair    = "\" ( WSP / VCHAR / obs-text )
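
A minimal sketch of a parser for the draft grammar quoted above, working on raw header bytes and passing obs-text octets through untouched, in line with the "treat as opaque data" rule. The function name is hypothetical, and the grammar is the 2009 draft's, so details may differ from later revisions:

```python
def parse_quoted_string(data: bytes, pos: int = 0) -> tuple[bytes, int]:
    """Parse a quoted-string starting at data[pos].

    Returns (unescaped content, index just past the closing DQUOTE).
    Octets 0x80-0xFF (obs-text) are kept as-is, as opaque data.
    """
    if data[pos:pos + 1] != b'"':
        raise ValueError("quoted-string must start with DQUOTE")
    out = bytearray()
    i = pos + 1
    while i < len(data):
        octet = data[i]
        if octet == 0x22:  # closing DQUOTE
            return bytes(out), i + 1
        if octet == 0x5C:  # backslash starts a quoted-pair
            if i + 1 >= len(data):
                break
            nxt = data[i + 1]
            # quoted-pair = "\" ( WSP / VCHAR / obs-text )
            if nxt in (0x20, 0x09) or 0x21 <= nxt <= 0x7E or nxt >= 0x80:
                out.append(nxt)
                i += 2
                continue
            raise ValueError(f"invalid quoted-pair at offset {i}")
        # qdtext = WSP / %x21 / %x23-5B / %x5D-7E / obs-text
        # (DQUOTE and "\" were already handled above).
        if octet in (0x20, 0x09) or 0x21 <= octet <= 0x7E or octet >= 0x80:
            out.append(octet)
            i += 1
            continue
        raise ValueError(f"invalid octet 0x{octet:02X} at offset {i}")
    raise ValueError("unterminated quoted-string")


print(parse_quoted_string(b'"a \\"b\\" c"; q=1'))  # (b'a "b" c', 11)
```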


> The "correct" way of escaping header parameters in an RFC*822-family
> protocol would be RFC2231's complex encoding scheme, but HTTP is
> explicitly not an 822-family protocol despite sharing many of the same
> constructs. See
> http://tools.ietf.org/html/draft-reschke-rfc2231-in-http-06 for a
> strategy for how 2231 should interact with HTTP, but note that for now
> RFC2231-in-HTTP simply does not exist in any deployed tools.
>

It seems reasonable.

> So for now there is basically nothing useful WSGI can do other than
> provide direct, byte-oriented (even if wrapped in 8859-1 unicode
> strings) access to headers.
>

Yes, this is what I think.
I have some doubts about wrapping the headers in 8859-1 unicode strings,
but luckily there is surrogateescape.
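
The two options differ in what a non-ASCII octet turns into. A short Python 3 comparison; surrogateescape is the PEP 383 error handler, and the sample value is made up:

```python
raw = b"caf\xe9"  # a made-up header value with one non-ASCII octet (0xE9)

# ISO-8859-1: every octet becomes a real character, right or wrong.
as_latin1 = raw.decode("latin-1")
print(repr(as_latin1))   # 'café'

# ASCII + surrogateescape (PEP 383): each non-ASCII octet N becomes the
# lone surrogate U+DC00+N, which is never valid text on its own.
as_escaped = raw.decode("ascii", "surrogateescape")
print(repr(as_escaped))  # 'caf\udce9'

# Both map back to the original octets with the matching codec.
assert as_latin1.encode("latin-1") == raw
assert as_escaped.encode("ascii", "surrogateescape") == raw

# A strict UTF-8 encode refuses the surrogate, so accidental use is loud.
try:
    as_escaped.encode("utf-8")
except UnicodeEncodeError:
    print("lone surrogate rejected by strict UTF-8")
```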



Regards  Manlio

Re: Move to bless Graham's WSGI 1.1 as official spec

Henry Precheur
In reply to this post by Manlio Perillo-3
On Fri, Dec 04, 2009 at 10:17:09AM +0100, Manlio Perillo wrote:
> It is just as simple as using byte strings, IMHO.

No, it's not. There were lots of discussions regarding this on the
mailing list. One of the main issues is that the standard library
supports bytes poorly. urllib, for example, expects strings, not bytes.

> > * WSGI sticks to what RFC 2616 (Hypertext Transfer Protocol -- HTTP/1.1)
> >   says. WSGI is about HTTP, but that doesn't necessarily include all
> >   other standards extending HTTP.
> >
>
> HTTP never says that whole headers are to be considered latin-1 text, IMHO.

It does:

  When no explicit charset parameter is provided by the sender, media
  subtypes of the "text" type are defined to have a default charset value
  of "ISO-8859-1" when received via HTTP.

  http://tools.ietf.org/html/rfc2616#section-3.7.1

> Yes, but it is quite stupid to first convert to Unicode and then convert
> back to a byte string.

99% of the time latin-1 will work. And converting from Unicode to bytes
is not costly.

6 months ago I was a big fan of bytes, but bytes create more problems
than they solve.

--
  Henry Prêcheur

Re: Move to bless Graham's WSGI 1.1 as official spec

Manlio Perillo-3
Henry Precheur wrote:
> On Fri, Dec 04, 2009 at 10:17:09AM +0100, Manlio Perillo wrote:
>> It is just as simple as using byte strings, IMHO.
>
> No, it's not. There were lots of discussions regarding this on the
> mailing list. One of the main issues is that the standard library
> supports bytes poorly. urllib, for example, expects strings, not bytes.
>

I read last month's discussions 3 days ago!
The quote function supports byte strings, as an example.

Which functions do not work with byte strings?
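
As an example of the byte-friendly side: in current Python 3 releases, urllib.parse.quote accepts a bytes object directly, and unquote_to_bytes reverses it without any charset being chosen. A short sketch with a made-up value:

```python
from urllib.parse import quote, unquote_to_bytes

raw = b"caf\xc3\xa9"   # made-up opaque octets, e.g. taken from a header

escaped = quote(raw)   # bytes in, ASCII str out; no charset is chosen
print(escaped)         # caf%C3%A9

# The reverse direction also stays in bytes.
assert unquote_to_bytes(escaped) == raw
```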

>>> * WSGI sticks to what RFC 2616 (Hypertext Transfer Protocol -- HTTP/1.1)
>>>   says. WSGI is about HTTP, but that doesn't necessarily include all
>>>   other standards extending HTTP.
>>>
>> HTTP never says that whole headers are to be considered latin-1 text, IMHO.
>
> It does:
>
>   When no explicit charset parameter is provided by the sender, media
>   subtypes of the "text" type are defined to have a default charset value
>   of "ISO-8859-1" when received via HTTP.
>
>   http://tools.ietf.org/html/rfc2616#section-3.7.1
>

This is not correct.

First of all, HTTP never says that whole headers are of type TEXT.
Only specific components are of type TEXT.

Moreover, HTTPbis has finally clarified this: TEXT is no longer used;
instead, non-ASCII characters are to be considered opaque.

Do you really want to define the new WSGI specification to be "against"
the new (possible) HTTP spec?

Of course it will work; but since some code in the standard library
needs to be fixed (wsgiref.util.application_uri, for example),
maybe it is better to fix it to work with byte strings.

Just my two cents.

> [...]


Regards  Manlio

Re: Move to bless Graham's WSGI 1.1 as official spec

Henry Precheur
On Fri, Dec 04, 2009 at 07:40:55PM +0100, Manlio Perillo wrote:
> Which functions do not work with byte strings?

Just to make things clear, I was talking about Python 3.

All the functions I tried not ending with _from_bytes raise an exception
with bytes. This includes urllib.parse.parse_qs & urllib.parse.urlparse
which are rather critical ...
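
With the latin-1 convention Henry describes, a str-only parser such as parse_qs can still be used without losing octets. A minimal sketch, assuming the server supplied QUERY_STRING as an ISO-8859-1-decoded str and that the client really sent UTF-8 (both are assumptions; the server cannot know the latter):

```python
from urllib.parse import parse_qs

# Assumed QUERY_STRING: raw octets decoded as ISO-8859-1 by the server.
query = b"name=caf%C3%A9&lang=en".decode("latin-1")

# Decoding the percent-escapes as ISO-8859-1 too keeps every octet ...
params = parse_qs(query, encoding="latin-1")
print(params["name"])  # ['cafÃ©']

# ... so the application can apply the charset it believes is right.
name = params["name"][0].encode("latin-1").decode("utf-8")
print(name)            # café
```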

> First of all, HTTP never says that whole headers are of type TEXT.
> Only specific components are of type TEXT.

If parts of a header contain latin-1 characters, that means its
encoding is latin-1 (at least partially).

> Moreover, HTTPbis has finally clarified this: TEXT is no longer used;
> instead, non-ASCII characters are to be considered opaque.

Yes, but the HTTPbis draft also says:

   Historically, HTTP has allowed field content with text in the
   ISO-8859-1 character encoding.

And WSGI is not about HTTP in a distant future, it's about HTTP right
now.

> Do you really want to define the new WSGI specification to be "against"
> the new (possible) HTTP spec?

I don't know why it would be "against" it. WSGI aims to handle HTTP in
the real world. The release of the HTTPbis spec won't take all the
garbage out of the web. There will still be latin-1 strings in
headers passed around for the next 10 years.

--
  Henry Prêcheur

Re: Move to bless Graham's WSGI 1.1 as official spec

Manlio Perillo-3
Henry Precheur wrote:
> On Fri, Dec 04, 2009 at 07:40:55PM +0100, Manlio Perillo wrote:
>> Which functions do not work with byte strings?
>
> Just to make things clear, I was talking about Python 3.
>

I know.

Unfortunately I don't have Python 3 installed; I'm just reading the code.

> All the functions I tried not ending with _from_bytes raise an exception
> with bytes. This includes urllib.parse.parse_qs & urllib.parse.urlparse
> which are rather critical ...
>

Ah, ok.
Can you show me the traceback of parse_qs? Thanks.


>> First of all, HTTP never says that whole headers are of type TEXT.
>> Only specific components are of type TEXT.
>
> If parts of a header contain latin-1 characters, that means its
> encoding is latin-1 (at least partially).
>

This is not completely true.

> [...]

> And WSGI is not about HTTP in a distant future, it's about HTTP right
> now.
>
>> Do you really want to define the new WSGI specification to be "against"
>> the new (possible) HTTP spec?
>
> I don't know why it would be "against" it.

Well, that is why I put it in quotes.
What I mean is that, IMHO:

- Using Unicode strings in WSGI is an abuse of Unicode strings
- This abuse is not justified by the HTTP spec


> [...]


Regards  Manlio

Re: Move to bless Graham's WSGI 1.1 as official spec

Malthe Borch-2
In reply to this post by and-py
On 12/4/09 12:50 AM, And Clover wrote:
> So for now there is basically nothing useful WSGI can do other than
> provide direct, byte-oriented (even if wrapped in 8859-1 unicode
> strings) access to headers.

You could argue that this is perhaps a good reason to replace
``environ`` with something that interprets the headers according to how
HTTP is actually used in the real world.

It may be that WSGI should use bytes everywhere and the recommended
usage would be via a decorator (which could cache computations on the
environ dictionary):

e.g. the raw application handler versus one decorated with an imaginary
``webob`` function.

   def app(environ, start_response):
       ...

   @webob
   def app(request):
       ...
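
A minimal sketch of how such a decorator could look over a bytes-valued environ, decoding headers only on request and caching the result. `Request`, `request_app` and the "example.request" environ key are hypothetical stand-ins for the imaginary ``webob`` function above, not the API of the WebOb library; passing status and headers to start_response as native str, as PEP 3333 later specified, is also an assumption:

```python
import functools


class Request:
    """Hypothetical request wrapper over an environ whose header values are bytes."""

    def __init__(self, environ):
        self.environ = environ
        self._decoded = {}  # (environ key, charset) -> decoded str or None

    def header(self, name, charset="latin-1"):
        """Return request header `name` decoded with `charset`, or None if absent."""
        key = "HTTP_" + name.upper().replace("-", "_")
        if (key, charset) not in self._decoded:
            raw = self.environ.get(key)
            self._decoded[(key, charset)] = None if raw is None else raw.decode(charset)
        return self._decoded[(key, charset)]


def request_app(func):
    """Adapt func(request) -> (status, headers, body) to the WSGI calling convention."""

    @functools.wraps(func)
    def wsgi_app(environ, start_response):
        # Keep one Request per environ, so cached decodings are shared by
        # every decorated layer that sees the same request.
        request = environ.get("example.request")
        if request is None:
            request = environ["example.request"] = Request(environ)
        status, headers, body = func(request)
        start_response(status, headers)
        return [body]

    return wsgi_app


@request_app
def app(request):
    lang = request.header("Accept-Language") or "en"
    return "200 OK", [("Content-Type", "text/plain; charset=utf-8")], lang.encode("utf-8")
```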

It is often said that WSGI should be practical, but in actual usage, I
think most developers use a request/response abstraction layer.

Middlewares are usually shrink-wrapped library code that could handle a
bytes-based environ dict (they'd have to explicitly decode the headers
of interest).
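
A minimal sketch of such a middleware over a bytes-valued environ: it decodes only the one header it cares about and leaves everything else opaque. `LanguageMiddleware` and the "example.language" environ key are hypothetical:

```python
class LanguageMiddleware:
    """Hypothetical middleware; assumes environ header values are bytes."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        raw = environ.get("HTTP_ACCEPT_LANGUAGE", b"")
        # Language ranges are US-ASCII, so a strict ASCII decode is enough;
        # anything else is treated as absent rather than guessed at.
        try:
            header = raw.decode("ascii")
        except UnicodeDecodeError:
            header = ""
        # Keep the first language range, without any ";q=" weight.
        first = header.split(",")[0].split(";")[0].strip()
        environ["example.language"] = first or "en"
        return self.app(environ, start_response)
```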

\malthe
