py3k, cgi, email, and form-data

Robert Brewer-4

There's a major change in functionality in the cgi module between Python
2 and Python 3 which I've just run across: the behavior of
FieldStorage.read_multi, specifically when an HTTP app accepts a file
upload within a multipart/form-data payload.

In Python 2, each part would be read in sequence within its own
FieldStorage instance. This allowed file uploads to be shunted to a
TemporaryFile (via make_file) as needed:

    klass = self.FieldStorageClass or self.__class__
    part = klass(self.fp, {}, ib,
                 environ, keep_blank_values, strict_parsing)
    # Throw first part away
    while not part.done:
        headers = rfc822.Message(self.fp)
        part = klass(self.fp, headers, ib,
                     environ, keep_blank_values, strict_parsing)
        self.list.append(part)

In Python 3 (svn revision 72466), the whole request body is read into
memory first via fp.read(), and then broken into separate parts in a
second step:

    klass = self.FieldStorageClass or self.__class__
    parser = email.parser.FeedParser()
    # Create bogus content-type header for proper multipart parsing
    parser.feed('Content-Type: %s; boundary=%s\r\n\r\n' % (self.type, ib))
    parser.feed(self.fp.read())
    full_msg = parser.close()
    # Get subparts
    msgs = full_msg.get_payload()
    for msg in msgs:
        fp = StringIO(msg.get_payload())
        part = klass(fp, msg, ib, environ, keep_blank_values,
                     strict_parsing)
        self.list.append(part)

This makes the cgi module in Python 3 somewhat crippled for handling
multipart/form-data file uploads of any significant size, and since
the client is the one determining the size, it opens a server up to an
unexpected Denial of Service vector.
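
A minimal sketch of the kind of bounded, chunked copy that keeps an upload out of memory. It does not parse multipart data; it only illustrates reading in fixed-size chunks into a temporary file and refusing oversized bodies. The name spool_body and the max_bytes and chunk_size defaults are hypothetical and are not part of the cgi module.

```python
import io
import tempfile


def spool_body(fp, content_length, max_bytes=10 * 1024 * 1024,
               chunk_size=64 * 1024):
    """Copy at most content_length bytes from fp into a spooled temp file."""
    # Reject the request up front instead of trusting the client's size.
    if content_length > max_bytes:
        raise ValueError("request body too large")
    # Stays in memory up to 1 MiB, then rolls over to a file on disk.
    out = tempfile.SpooledTemporaryFile(max_size=1024 * 1024)
    remaining = content_length
    while remaining > 0:
        chunk = fp.read(min(chunk_size, remaining))
        if not chunk:
            break  # the client sent less than it declared
        out.write(chunk)
        remaining -= len(chunk)
    out.seek(0)
    return out


# Usage: an in-memory stream stands in for the request body.
payload = b"x" * 200000
spooled = spool_body(io.BytesIO(payload), len(payload))
```

Only one chunk is held in memory at a time, so peak memory use depends on chunk_size and the spool threshold rather than on the size the client chooses to send.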

I *think* the FeedParser is designed to accept incremental writes,
but I haven't yet found a way to do any kind of incremental reads
from it in order to shunt the fp.read out to a tempfile again.
I'm secretly hoping Barry has a one-liner fix for this. ;)
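
For reference, a sketch of what incremental writes to FeedParser look like, assuming a text-mode stream and a known boundary. The helper name parse_in_chunks and its chunk_size parameter are hypothetical. Note that this only avoids the single large fp.read(); the parsed message is still built entirely in memory and is only available after close(), so it does not solve the incremental-read problem described above.

```python
import io
from email.parser import FeedParser


def parse_in_chunks(fp, boundary, chunk_size=8192):
    """Feed a multipart body to FeedParser one chunk at a time."""
    parser = FeedParser()
    # Same bogus header trick as in cgi.py, so the body parses as multipart.
    parser.feed('Content-Type: multipart/form-data; boundary=%s\r\n\r\n'
                % boundary)
    while True:
        chunk = fp.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)
    # The complete message object only exists once the parser is closed.
    return parser.close()


# Usage: a tiny form-data body, fed seven characters at a time.
body = (
    '--XX\r\n'
    'Content-Disposition: form-data; name="a"\r\n'
    '\r\n'
    'hello\r\n'
    '--XX--\r\n'
)
message = parse_in_chunks(io.StringIO(body), 'XX', chunk_size=7)
parts = message.get_payload()
```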


Robert Brewer
[hidden email]


_______________________________________________
Web-SIG mailing list
[hidden email]
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: http://mail.python.org/mailman/options/web-sig/lists%40nabble.com

Re: py3k, cgi, email, and form-data

Graham Dumpleton-2
2009/5/12 Robert Brewer <[hidden email]>:

> This makes the cgi module in Python 3 somewhat crippled for handling
> multipart/form-data file uploads of any significant size (and since
> the client is the one determining the size, opens a server up for an
> unexpected Denial of Service vector).
>
> I *think* the FeedParser is designed to accept incremental writes,
> but I haven't yet found a way to do any kind of incremental reads
> from it in order to shunt the fp.read out to a tempfile again.
> I'm secretly hoping Barry has a one-liner fix for this. ;)

FWIW, Werkzeug gave up on the 'cgi' module for form parsing and implements its own.

Not sure whether this issue in Python 3.0 was one of the reasons or
not. I know one of the reasons was that cgi.FieldStorage is not
WSGI 1.0 compliant. One of the main reasons that no one actually
adheres to WSGI 1.0 is the 'cgi' module. This still hasn't
been addressed by a proper amendment to the WSGI 1.0 specification or a
new WSGI 1.1 specification to allow a size hint to readline().

The Werkzeug form processing module is properly WSGI 1.0 compliant,
meaning that Werkzeug is possibly the only major WSGI framework to be
WSGI compliant.
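
To illustrate the readline() point: as PEP 333 reads, wsgi.input.readline() takes no size argument, while cgi.FieldStorage passes one. A minimal sketch of a wrapper that offers readline(size) to a parser while calling only read(size) on the underlying stream follows. The class name HintedInput and its parameters are hypothetical; this is not Werkzeug's implementation.

```python
import io


class HintedInput:
    """Provide readline(size) using only read(n) on the wrapped stream."""

    def __init__(self, stream, content_length, chunk_size=8192):
        self._stream = stream
        self._remaining = content_length  # never read past the declared body
        self._chunk_size = chunk_size
        self._buffer = b""

    def _fill(self):
        """Pull one more chunk into the buffer; return False at end of body."""
        if self._remaining <= 0:
            return False
        chunk = self._stream.read(min(self._chunk_size, self._remaining))
        if not chunk:
            self._remaining = 0
            return False
        self._remaining -= len(chunk)
        self._buffer += chunk
        return True

    def readline(self, size=-1):
        while True:
            limit = len(self._buffer) if size < 0 else min(size, len(self._buffer))
            newline = self._buffer.find(b"\n", 0, limit)
            if newline >= 0:
                end = newline + 1  # a complete line is buffered
                break
            if 0 <= size <= len(self._buffer):
                end = size  # size hint reached before any newline
                break
            if not self._fill():
                end = limit  # end of body: return whatever is left
                break
        line, self._buffer = self._buffer[:end], self._buffer[end:]
        return line


# Usage: bytes beyond the declared 12-byte body are never consumed.
wrapped = HintedInput(io.BytesIO(b"ab\ncdefgh\nijEXTRA"), 12, chunk_size=4)
lines = [wrapped.readline(), wrapped.readline(3), wrapped.readline(),
         wrapped.readline(), wrapped.readline()]
```

Bounding reads by content_length matters here because reading past the declared body on a real server socket may block.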

Graham

Re: py3k, cgi, email, and form-data

Robert Brewer-4
Graham Dumpleton wrote:

> FWIW, Werkzeug gave up on the 'cgi' module for form parsing and implements
> its own.
>
> Not sure whether this issue in Python 3.0 was one of the reasons or
> not. I know one of the reasons was that cgi.FieldStorage is not
> WSGI 1.0 compliant. One of the main reasons that no one actually
> adheres to WSGI 1.0 is the 'cgi' module. This still hasn't
> been addressed by a proper amendment to the WSGI 1.0 specification or a
> new WSGI 1.1 specification to allow a size hint to readline().
>
> The Werkzeug form processing module is properly WSGI 1.0 compliant,
> meaning that Werkzeug is possibly the only major WSGI framework to be
> WSGI compliant.

FWIW, I just added a replacement for the cgi module to CherryPy over the weekend for the same reasons. It's in the python3 branch but will get backported to CherryPy 3.2 for Python 2.x.


Robert Brewer
[hidden email]