urllib.unquote in paste.httpserver prevents slashes in path segments

Florian Friesdorf-2

I think paste.httpserver.WSGIHandlerMixin.wsgi_setup should not
urllib.unquote the path [1] before setting it in the wsgi environment
[2]. The only pre-processing performed on the path between [1] and [2]
is concerned with slashes '/'. Unquoting at that point makes it impossible to
have urllib.quoted slashes within one path segment.

At least pyramid without routing fully relies on
``environ['PATH_INFO']`` [3]; by commenting out [1] I succeeded in having
slashes in path segments, and they are handled by pyramid in [4]f.

However, webob.request.BaseRequest would need to be adjusted wherever
PATH_INFO from the environment is used (e.g. [5]).

Reasoning: The path stored in environ['PATH_INFO'] is still a path;
therefore it must not be urllib.unquoted. The unquoting must happen
after the path is split up into segments ([4]).
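
To illustrate the ordering issue, here is a minimal sketch in Python 3,
where urllib.parse.unquote plays the role of the Python 2 urllib.unquote
discussed here. The function names and the sample path are made up for
the illustration; this is not the code from paste or pyramid.

```python
from urllib.parse import unquote


def unquote_then_split(raw_path):
    """Unquote the whole path first, then split it into segments."""
    return unquote(raw_path).split('/')


def split_then_unquote(raw_path):
    """Split the still-quoted path first, then unquote each segment."""
    return [unquote(segment) for segment in raw_path.split('/')]


raw = '/item/some%2Fvalue/view'  # hypothetical request path

# The quoted slash becomes a segment separator:
print(unquote_then_split(raw))   # ['', 'item', 'some', 'value', 'view']

# The quoted slash stays inside one segment:
print(split_then_unquote(raw))   # ['', 'item', 'some/value', 'view']
```

Only the second order keeps "some/value" together as a single segment.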

[1] https://bitbucket.org/ianb/paste/src/4f5cfde87603/paste/httpserver.py#cl-180
[2] https://bitbucket.org/ianb/paste/src/4f5cfde87603/paste/httpserver.py#cl-217
[3] https://github.com/Pylons/pyramid/blob/master/pyramid/traversal.py#L594
[4] https://github.com/Pylons/pyramid/blob/master/pyramid/traversal.py#L495
[5] https://bitbucket.org/ianb/webob/src/c0bb5309cfca/webob/request.py#cl-265

--
Florian Friesdorf <[hidden email]>
  GPG FPR: 7A13 5EEE 1421 9FC2 108D  BAAF 38F8 99A3 0C45 F083
Jabber/XMPP: [hidden email]
IRC: chaoflow on freenode,ircnet,blafasel,OFTC

_______________________________________________
Web-SIG mailing list
[hidden email]
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: http://mail.python.org/mailman/options/web-sig/lists%40nabble.com

Re: urllib.unquote in paste.httpserver prevents slashes in path segments

ianb
It's implied by WSGI itself that the path be unquoted; there's no fix short of changing the specification.


Re: urllib.unquote in paste.httpserver prevents slashes in path segments

and-py
On Thu, 2011-03-17 at 19:10 +0100, Florian Friesdorf wrote:
> I think paste.httpserver.WSGIHandlerMixin.wsgi_setup should not
> urllib.unquote the path before setting it in the wsgi environment

I'm afraid it must. This is something the WSGI specification inherits
from CGI.

Yes, it was a terrible decision to have SCRIPT_NAME and PATH_INFO
automatically unescaped, as it loses the distinction between ‘%2F’ and
‘/’, and has resulted in endless problems with non-ASCII characters that
could otherwise have been handled perfectly well as %-sequences.
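
As an illustration of the non-ASCII side of this, the sketch below
emulates what a Python 3 WSGI server following PEP 3333 does: the
unescaped bytes are exposed as a string decoded as ISO-8859-1, and the
application has to undo that step itself. The helper names are
hypothetical, and treating the client's bytes as UTF-8 is an assumption.

```python
from urllib.parse import unquote


def server_path_info(raw_path):
    """Emulate the server side: unescape, exposing the bytes as latin-1 text."""
    return unquote(raw_path, encoding='latin-1')


def recover_utf8(path_info):
    """Undo the latin-1 step, assuming the client sent UTF-8 bytes."""
    return path_info.encode('latin-1').decode('utf-8')


path_info = server_path_info('/caf%C3%A9')   # hypothetical request path
print(repr(path_info))                       # '/cafÃ©'
print(recover_utf8(path_info))               # /café
```

Had the %-sequences been passed through untouched, the application
could have decoded them once, with the encoding it expects.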

But that decision was taken a couple of decades ago and there's not
really much we can do about it now. CGI may be an anachronism, but it is
still widely used and its assumptions are still felt through Apache, IIS
and WSGI.

> By urllib.unquoting it is not possible to
> have urllib.quoted slashes within one path segment.

Correct. And neither Apache nor IIS allows %2F to be used within a path
segment either, so really if you want to write a portable web app you
simply have to avoid them (along with %00 and %5C). It is not currently
practical to include any arbitrary byte sequence in a URL path segment,
even though by the URL specification you should be able to.
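
One way to follow this advice when generating URLs is to refuse such
values up front. The sketch below is only an illustration: the helper
names are made up, and the set of rejected characters is taken directly
from the three escapes named above ('/', NUL and backslash).

```python
from urllib.parse import quote

# Characters whose %-escapes (%2F, %00, %5C) should be avoided in a segment.
UNSAFE_CHARS = ('/', '\x00', '\\')


def is_portable_segment(segment):
    """Return True if the segment survives server-side unescaping intact."""
    return not any(ch in segment for ch in UNSAFE_CHARS)


def build_path(segments):
    """Join segments into a quoted path, rejecting non-portable values."""
    for segment in segments:
        if not is_portable_segment(segment):
            raise ValueError('non-portable path segment: %r' % (segment,))
    return '/' + '/'.join(quote(segment, safe='') for segment in segments)


print(build_path(['item', 'some value', 'view']))   # /item/some%20value/view
```

Calling build_path(['item', 'some/value']) raises ValueError instead of
silently producing a URL that a CGI-style server would split differently.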

It's annoying, it's inelegant, it's limiting. But none of our attempts
to extend or replace it for non-CGI-based servers (see past list
discussion on path-info-raw or standardising REQUEST_URI) have come to
any acceptable conclusion. We are stuck with it for the foreseeable.
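
For servers that do happen to pass the raw request URI along, an
application can prefer it and fall back to PATH_INFO otherwise. This is
a sketch under assumptions: REQUEST_URI is not part of WSGI, whether a
given server sets it has to be checked, and the sketch ignores any
SCRIPT_NAME prefix the raw URI may include. The function name is
hypothetical.

```python
from urllib.parse import unquote


def path_segments(environ):
    """Return path segments, keeping quoted slashes when the raw URI is known."""
    raw = environ.get('REQUEST_URI')   # non-standard key, may be absent
    if raw is not None:
        raw_path = raw.split('?', 1)[0]            # drop the query string
        return [unquote(seg) for seg in raw_path.split('/')]
    # Standard fallback: PATH_INFO is already unescaped, so a quoted
    # slash can no longer be told apart from a separator.
    return environ.get('PATH_INFO', '').split('/')


environ = {
    'REQUEST_URI': '/item/some%2Fvalue/view?x=1',
    'PATH_INFO': '/item/some/value/view',
}
print(path_segments(environ))   # ['', 'item', 'some/value', 'view']
```

An application written this way is only as portable as its fallback
branch, which is the limitation described above.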

--
And Clover
mailto:[hidden email]
http://www.doxdesk.com
gtalk:chat?jid=[hidden email]

Re: urllib.unquote in paste.httpserver prevents slashes in path segments

ianb
I'll just add that *if* you can design your URL space (you didn't just inherit one), and you want to distinguish path segments from values that contain '/', you can use URLs like:
  /item/{some/value}/view

And then use the matching {}'s to figure out that "some/value" is one path segment.  This makes it possible, for instance, to use GData (where XML namespaces can show up in the URL, and they contain /'s, but they need to be treated as a single value).  It's not perfect, but it does work.
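
A minimal sketch of such brace-aware splitting might look like the
following. The function name is made up, and the handling of nested or
unbalanced braces is one possible choice rather than anything
prescribed here.

```python
def split_braced_path(path):
    """Split path on '/', keeping anything inside {...} as one segment."""
    if path.startswith('/'):
        path = path[1:]
    segments, current, depth = [], [], 0
    for ch in path:
        if ch == '{':
            depth += 1
            if depth == 1:
                continue            # drop the outermost opening brace
        elif ch == '}':
            if depth == 0:
                raise ValueError('unbalanced "}" in path')
            depth -= 1
            if depth == 0:
                continue            # drop the outermost closing brace
        elif ch == '/' and depth == 0:
            segments.append(''.join(current))
            current = []
            continue
        current.append(ch)          # ordinary character, or '/' inside braces
    if depth:
        raise ValueError('unbalanced "{" in path')
    segments.append(''.join(current))
    return segments


print(split_braced_path('/item/{some/value}/view'))
# ['item', 'some/value', 'view']
```

Because this works on the already-unquoted PATH_INFO, it does not depend
on the server preserving %2F.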


Re: urllib.unquote in paste.httpserver prevents slashes in path segments

Florian Friesdorf-2
On Thu, 17 Mar 2011 15:10:56 -0500, Ian Bicking <[hidden email]> wrote:
> It's implied by WSGI itself that the path be unquoted; there's no fix short
> of changing the specification.

What is WSGI's solution for path segments containing slashes?

Re: urllib.unquote in paste.httpserver prevents slashes in path segments

Florian Friesdorf-2
On Fri, 18 Mar 2011 10:36:27 +0100, Florian Friesdorf <[hidden email]> wrote:
> On Thu, 17 Mar 2011 15:10:56 -0500, Ian Bicking <[hidden email]> wrote:
> > It's implied by WSGI itself that the path be unquoted; there's no fix short
> > of changing the specification.
>
> What is WSGI's solution for path segments containing slashes?

Please ignore this post - my mail client played tricks on me and I did not
see your further postings before writing it.
