Draft PEP: WSGI 1.1

Dirkjan Ochtman
Mostly taking Graham's list of issues and incorporating it into PEP 333.

Latest revision: http://hg.xavamedia.nl/peps/file/tip/wsgi-1.1.txt

Let's have comments here (comments in the form of diffs are
particularly welcome, of course). Remember, the idea is not to change
or improve WSGI right now, but only to improve the spec, improving
interoperability and enabling Python 3 support.

Graham, I hope I did a good job with your suggestions. (Since so much
of this is yours, I've just listed you as the second author.) I tried
to clarify exactly what you meant by "native strings", can you check
that out?

Cheers,

Dirkjan

--- pep-0333.txt 2010-04-15 14:46:02.000000000 +0200
+++ wsgi-1.1.txt 2010-04-15 14:51:39.000000000 +0200
@@ -1,114 +1,124 @@
-PEP: 333
-Title: Python Web Server Gateway Interface v1.0
+PEP: 0000
+Title: Python Web Server Gateway Interface 1.1
 Version: $Revision$
 Last-Modified: $Date$
-Author: Phillip J. Eby <[hidden email]>
+Author: Dirkjan Ochtman <[hidden email]>,
+        Graham Dumpleton <[hidden email]>
 Discussions-To: Python Web-SIG <[hidden email]>
 Status: Draft
 Type: Informational
 Content-Type: text/x-rst
-Created: 07-Dec-2003
-Post-History: 07-Dec-2003, 08-Aug-2004, 20-Aug-2004, 27-Aug-2004
+Created: 15-04-2010
+Post-History: Not yet


 Abstract
 ========

-This document specifies a proposed standard interface between web
-servers and Python web applications or frameworks, to promote web
-application portability across a variety of web servers.
+This document specifies a revision of the proposed standard interface
+between web servers and Python web applications or frameworks, to
+promote web application portability across a variety of web servers.


 Rationale and Goals
 ===================

-Python currently boasts a wide variety of web application frameworks,
-such as Zope, Quixote, Webware, SkunkWeb, PSO, and Twisted Web -- to
-name just a few [1]_.  This wide variety of choices can be a problem
-for new Python users, because generally speaking, their choice of web
-framework will limit their choice of usable web servers, and vice
-versa.
-
-By contrast, although Java has just as many web application frameworks
-available, Java's "servlet" API makes it possible for applications
-written with any Java web application framework to run in any web
-server that supports the servlet API.
-
-The availability and widespread use of such an API in web servers for
-Python -- whether those servers are written in Python (e.g. Medusa),
-embed Python (e.g. mod_python), or invoke Python via a gateway
-protocol (e.g. CGI, FastCGI, etc.) -- would separate choice of
-framework from choice of web server, freeing users to choose a pairing
-that suits them, while freeing framework and server developers to
-focus on their preferred area of specialization.
-
-This PEP, therefore, proposes a simple and universal interface between
-web servers and web applications or frameworks: the Python Web Server
-Gateway Interface (WSGI).
-
-But the mere existence of a WSGI spec does nothing to address the
-existing state of servers and frameworks for Python web applications.
-Server and framework authors and maintainers must actually implement
-WSGI for there to be any effect.
-
-However, since no existing servers or frameworks support WSGI, there
-is little immediate reward for an author who implements WSGI support.
-Thus, WSGI **must** be easy to implement, so that an author's initial
-investment in the interface can be reasonably low.
-
-Thus, simplicity of implementation on *both* the server and framework
-sides of the interface is absolutely critical to the utility of the
-WSGI interface, and is therefore the principal criterion for any
-design decisions.
-
-Note, however, that simplicity of implementation for a framework
-author is not the same thing as ease of use for a web application
-author.  WSGI presents an absolutely "no frills" interface to the
-framework author, because bells and whistles like response objects and
-cookie handling would just get in the way of existing frameworks'
-handling of these issues.  Again, the goal of WSGI is to facilitate
-easy interconnection of existing servers and applications or
-frameworks, not to create a new web framework.
-
-Note also that this goal precludes WSGI from requiring anything that
-is not already available in deployed versions of Python.  Therefore,
-new standard library modules are not proposed or required by this
-specification, and nothing in WSGI requires a Python version greater
-than 2.2.2.  (It would be a good idea, however, for future versions
-of Python to include support for this interface in web servers
-provided by the standard library.)
-
-In addition to ease of implementation for existing and future
-frameworks and servers, it should also be easy to create request
-preprocessors, response postprocessors, and other WSGI-based
-"middleware" components that look like an application to their
-containing server, while acting as a server for their contained
-applications.
-
-If middleware can be both simple and robust, and WSGI is widely
-available in servers and frameworks, it allows for the possibility
-of an entirely new kind of Python web application framework: one
-consisting of loosely-coupled WSGI middleware components.  Indeed,
-existing framework authors may even choose to refactor their
-frameworks' existing services to be provided in this way, becoming
-more like libraries used with WSGI, and less like monolithic
-frameworks.  This would then allow application developers to choose
-"best-of-breed" components for specific functionality, rather than
-having to commit to all the pros and cons of a single framework.
-
-Of course, as of this writing, that day is doubtless quite far off.
-In the meantime, it is a sufficient short-term goal for WSGI to
-enable the use of any framework with any server.
-
-Finally, it should be mentioned that the current version of WSGI
-does not prescribe any particular mechanism for "deploying" an
-application for use with a web server or server gateway.  At the
-present time, this is necessarily implementation-defined by the
-server or gateway.  After a sufficient number of servers and
-frameworks have implemented WSGI to provide field experience with
-varying deployment requirements, it may make sense to create
-another PEP, describing a deployment standard for WSGI servers and
-application frameworks.
+WSGI 1.0, specified in PEP 333, did a great job in making it easier
+for web applications and web servers to interface with each other.
+It has become very much the standard it was meant to be and an
+important part of the Python web development infrastructure.
+
+After several implementations were built by different developers,
+it inevitably turned out that the specification wasn't perfect. It
+left out some details that were implemented by all the web server
+interfaces because they were critical for many applications (or
+application frameworks). Additionally, the specification was written
+before Python 3.x was specified, resulting in a lack of clear
+specification on what to do with unicode strings.
+
+While there are some ideas around to improve WSGI further in less
+compatible ways, we feel that there is value to be had in first
+specifying a minor revision of the specification, which is largely
+compatible with existing implementations. Further simplification
+and experimentation are therefore deferred to a 2.0 version.
+
+
+Differences with WSGI 1.0
+=========================
+
+Descriptive changes
+-------------------
+
+The following changes were made to realign the spec with
+implementations 'in the wild'.
+
+1. The 'readline()' function of 'wsgi.input' must optionally take
+   a size hint. This is required because many applications use
+   cgi.FieldStorage, which uses this functionality.
+
+2. The 'wsgi.input' functions for reading input must return an empty
+   string as end of input stream marker. This is required for support
+   of HTTP 1.1 request pipelining. A correctly implemented WSGI
+   middleware already has to cope with an empty string as end
+   sentinel anyway to detect premature end of input.
+
+3. Any WSGI application or middleware should not itself return, or
+   consume from a wrapped WSGI component, more data than specified by
+   the Content-Length response header if defined. Middleware that
+   does this is arguably broken and can generate incorrect data.
+   This is just a clarification of obligations.
+
+4. The WSGI adapter must not pass on to the server any data above
+   what the Content-Length response header defines, if supplied.
+   Doing this is technically a violation of HTTP. This is another
+   clarification of obligations.
+
+
+String handling changes
+-----------------------
+
+The following changes were made to make WSGI work on Python 3.x.
+
+1. The application is passed an instance of a Python dictionary
+   containing what is referred to as the WSGI environment. All keys
+   in this dictionary are native strings. For CGI variables, all names
+   are going to be ISO-8859-1 and so where native strings are
+   unicode strings, that encoding is used for the names of CGI
+   variables.
+
+2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI
+   environment, the value of the variable should be a native string.
+
+3. For the CGI variables contained in the WSGI environment, the values
+   of the variables are native strings. Where native strings are
+   unicode strings, ISO-8859-1 encoding would be used such that the
+   original character data is preserved and as necessary the unicode
+   string can be converted back to bytes and thence decoded to unicode
+   again using a different encoding.
+
+4. The WSGI input stream 'wsgi.input' contained in the WSGI environment
+   and from which request content is read, should yield byte strings.
+
+5. The status line specified by the WSGI application should be a byte
+   string. Where native strings are unicode strings, the native string
+   type can also be returned in which case it would be encoded as
+   ISO-8859-1.
+
+6. The list of response headers specified by the WSGI application should
+   contain tuples consisting of two values, where each value is a byte
+   string. Where native strings are unicode strings, the native string
+   type can also be returned in which case it would be encoded as
+   ISO-8859-1.
+
+7. The iterable returned by the application and from which response
+   content is derived, should yield byte strings. Where native strings
+   are unicode strings, the native string type can also be returned in
+   which case it would be encoded as ISO-8859-1.
+
+8. The value passed to the 'write()' callback returned by
+   'start_response()' should be a byte string. Where native strings
+   are unicode strings, a native string type can also be supplied, in
+   which case it would be encoded as ISO-8859-1.


 Specification Overview
@@ -447,6 +457,13 @@
 Streaming`_ section below for more on how application output must be
 handled.)

+Further on, several places specify constraints upon string types used
+in the WSGI API. The term native string is used to mean the 'str' class
+in both Python 2.x and 3.x. The spec tries to ensure optimal
+compatibility and ease of use by allowing implementations running on
+Python 3.x to encode strings (which are Unicode strings with no
+specified encoding) as ISO-8859-1 where a 3.x string is passed in.
+
 The server or gateway should treat the yielded strings as binary byte
 sequences: in particular, it should ensure that line endings are
 not altered.  The application is responsible for ensuring that the
@@ -489,12 +506,22 @@
 ``environ`` Variables
 ---------------------

+All keys in this dictionary are native strings. For CGI variables,
+all names are going to be ISO-8859-1 and so where native strings are
+unicode strings, that encoding is used for the names of CGI variables.
+
 The ``environ`` dictionary is required to contain these CGI
 environment variables, as defined by the Common Gateway Interface
 specification [2]_.  The following variables **must** be present,
 unless their value would be an empty string, in which case they
 **may** be omitted, except as otherwise noted below.

+The values for CGI variables are native strings. Where native strings
+are unicode strings, ISO-8859-1 encoding would be used such that the
+original character data is preserved and as necessary the unicode
+string can be converted back to bytes and thence decoded to unicode
+again using a different encoding.
+
 ``REQUEST_METHOD``
   The HTTP request method, such as ``"GET"`` or ``"POST"``.  This
   cannot ever be an empty string, and so is always required.
@@ -575,13 +602,14 @@
 =====================  ===============================================
 Variable               Value
 =====================  ===============================================
-``wsgi.version``       The tuple ``(1,0)``, representing WSGI
+``wsgi.version``       The tuple ``(1, 0)``, representing WSGI
                        version 1.0.

 ``wsgi.url_scheme``    A string representing the "scheme" portion of
                        the URL at which the application is being
                        invoked.  Normally, this will have the value
-                       ``"http"`` or ``"https"``, as appropriate.
+                       ``"http"`` or ``"https"``, as appropriate. The
+                       value is a native string.

 ``wsgi.input``         An input stream (file-like object) from which
                        the HTTP request body can be read.  (The server
@@ -646,7 +674,7 @@
 Method               Stream      Notes
 ===================  ==========  ========
 ``read(size)``       ``input``   1
-``readline()``       ``input``   1,2
+``readline(hint)``   ``input``   1,2
 ``readlines(hint)``  ``input``   1,3
 ``__iter__()``       ``input``
 ``flush()``          ``errors``  4
@@ -661,11 +689,12 @@
    ``Content-Length``, and is allowed to simulate an end-of-file
    condition if the application attempts to read past that point.
    The application **should not** attempt to read more data than is
-   specified by the ``CONTENT_LENGTH`` variable.
+   specified by the ``CONTENT_LENGTH`` variable. All read functions
+   are required to return an empty string as the end of input stream
+   marker. They must yield byte strings.

-2. The optional "size" argument to ``readline()`` is not supported,
-   as it may be complex for server authors to implement, and is not
-   often used in practice.
+2. The optional "size" argument to ``readline()`` is required for
+   the implementer, but optional for callers.

 3. Note that the ``hint`` argument to ``readlines()`` is optional for
    both caller and implementer.  The application is free not to
@@ -692,12 +721,15 @@
 ---------------------------------

 The second parameter passed to the application object is a callable
-of the form ``start_response(status,response_headers,exc_info=None)``.
+of the form ``start_response(status, response_headers, exc_info=None)``.
 (As with all WSGI callables, the arguments must be supplied
 positionally, not by keyword.)  The ``start_response`` callable is
 used to begin the HTTP response, and it must return a
 ``write(body_data)`` callable (see the `Buffering and Streaming`_
-section, below).
+section, below). Values passed to the ``write(body_data)`` callable
+should be byte strings. Where native strings are unicode strings, a
+native strings type can also be supplied, in which case it would be
+encoded as ISO-8859-1.

 The ``status`` argument is an HTTP "status" string like ``"200 OK"``
 or ``"404 Not Found"``.  That is, it is a string consisting of a
@@ -705,14 +737,20 @@
 single space, with no surrounding whitespace or other characters.
 (See RFC 2616, Section 6.1.1 for more information.)  The string
 **must not** contain control characters, and must not be terminated
-with a carriage return, linefeed, or combination thereof.
+with a carriage return, linefeed, or combination thereof. This
+value should be a byte string. Where native strings are unicode
+strings, the native string type can also be returned, in which
+case it would be encoded as ISO-8859-1.

 The ``response_headers`` argument is a list of ``(header_name,
 header_value)`` tuples.  It must be a Python list; i.e.
-``type(response_headers) is ListType``, and the server **may** change
+``type(response_headers) is list``, and the server **may** change
 its contents in any way it desires.  Each ``header_name`` must be a
 valid HTTP header field-name (as defined by RFC 2616, Section 4.2),
-without a trailing colon or other punctuation.
+without a trailing colon or other punctuation. Both the header_name
+and the header_value should be byte strings. Where native strings
+are unicode strings, the native string type can also be returned,
+in which case it would be encoded as ISO-8859-1.

 Each ``header_value`` **must not** include *any* control characters,
 including carriage returns or linefeeds, either embedded or at the end.
@@ -809,6 +847,14 @@
 Handling the ``Content-Length`` Header
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+If an application or middleware layer chooses to return a
+Content-Length header, it should not return more data than specified
+by the header value. Any wrapping middleware layer should not
+consume more data than specified in the header value from the
+wrapped component (either middleware or application). Any WSGI
+adapter must similarly not pass on data above what the
+Content-Length response header value defines.
+
 If the application does not supply a ``Content-Length`` header, a
 server or gateway may choose one of several approaches to handling
 it.  The simplest of these is to close the client connection when
@@ -1569,55 +1615,13 @@
    developers.


-Proposed/Under Discussion
-=========================
-
-These items are currently being discussed on the Web-SIG and elsewhere,
-or are on the PEP author's "to-do" list:
-
-* Should ``wsgi.input`` be an iterator instead of a file?  This would
-  help for asynchronous applications and chunked-encoding input
-  streams.
-
-* Optional extensions are being discussed for pausing iteration of an
-  application's ouptut until input is available or until a callback
-  occurs.
-
-* Add a section about synchronous vs. asynchronous apps and servers,
-  the relevant threading models, and issues/design goals in these
-  areas.
-
-
 Acknowledgements
 ================

-Thanks go to the many folks on the Web-SIG mailing list whose
-thoughtful feedback made this revised draft possible.  Especially:
+Thanks go to many folks on the Web-SIG mailing list for helping the work
+on clarifying and improving this specification. In particular:

-* Gregory "Grisha" Trubetskoy, author of ``mod_python``, who beat up
-  on the first draft as not offering any advantages over "plain old
-  CGI", thus encouraging me to look for a better approach.
-
-* Ian Bicking, who helped nag me into properly specifying the
-  multithreading and multiprocess options, as well as badgering me to
-  provide a mechanism for servers to supply custom extension data to
-  an application.
-
-* Tony Lownds, who came up with the concept of a ``start_response``
-  function that took the status and headers, returning a ``write``
-  function.  His input also guided the design of the exception handling
-  facilities, especially in the area of allowing for middleware that
-  overrides application error messages.
-
-* Alan Kennedy, whose courageous attempts to implement WSGI-on-Jython
-  (well before the spec was finalized) helped to shape the "supporting
-  older versions of Python" section, as well as the optional
-  ``wsgi.file_wrapper`` facility.
-
-* Mark Nottingham, who reviewed the spec extensively for issues with
-  HTTP RFC compliance, especially with regard to HTTP/1.1 features that
-  I didn't even know existed until he pointed them out.
-
+* Phillip J. Eby, for writing/editing the 1.0 specification.

 References
 ==========
@@ -1643,8 +1647,6 @@

 This document has been placed in the public domain.

-
-
 ..
    Local Variables:
    mode: indented-text
_______________________________________________
Web-SIG mailing list
[hidden email]
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: http://mail.python.org/mailman/options/web-sig/lists%40nabble.com

Re: Draft PEP: WSGI 1.1

Manlio Perillo-3
Dirkjan Ochtman wrote:
> [...]
> --- pep-0333.txt 2010-04-15 14:46:02.000000000 +0200
> +++ wsgi-1.1.txt 2010-04-15 14:51:39.000000000 +0200
> @@ -1,114 +1,124 @@
> [...]


>  Abstract
>  ========
>
> [...]
> -Thus, simplicity of implementation on *both* the server and framework
> -sides of the interface is absolutely critical to the utility of the
> -WSGI interface, and is therefore the principal criterion for any
> -design decisions.
> -
> -Note, however, that simplicity of implementation for a framework
> -author is not the same thing as ease of use for a web application
> -author.  WSGI presents an absolutely "no frills" interface to the
> -framework author, because bells and whistles like response objects and
> -cookie handling would just get in the way of existing frameworks'
> -handling of these issues.  Again, the goal of WSGI is to facilitate
> -easy interconnection of existing servers and applications or
> -frameworks, not to create a new web framework.
> -

This, and the rest of the abstract, should not entirely be removed, IMHO.

> [...]
> -
> -Finally, it should be mentioned that the current version of WSGI
> -does not prescribe any particular mechanism for "deploying" an
> -application for use with a web server or server gateway.  At the
> -present time, this is necessarily implementation-defined by the
> -server or gateway.  After a sufficient number of servers and
> -frameworks have implemented WSGI to provide field experience with
> -varying deployment requirements, it may make sense to create
> -another PEP, describing a deployment standard for WSGI servers and
> -application frameworks.

This should not be removed.

> [...]
> +
> +Differences with WSGI 1.0
> +=========================
> +
> +Descriptive changes
> +-------------------
> +
> +The following changes were made to realign the spec with
> +implementations 'in the wild'.
> +

This text feels wrong to me.

> +1. The 'readline()' function of 'wsgi.input' must optionally take
> +   a size hint. This is required because many applications use
> +   cgi.FieldStorage, which uses this functionality.
> +

What values are supported for size?
Are values -1 and None supported?
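
To make the question concrete, here is a minimal sketch of one possible
reading, in which ``None`` and negative values both mean "no hint". The
``LimitedInput`` class and that interpretation are assumptions for
illustration; the draft does not specify either.

```python
import io


class LimitedInput:
    """Hypothetical wsgi.input wrapper that never reads past Content-Length."""

    def __init__(self, stream, content_length):
        self._stream = stream
        self._remaining = content_length

    def readline(self, size=-1):
        # Assumption: None and negative sizes both mean "no size hint".
        if size is None or size < 0:
            size = self._remaining
        size = min(size, self._remaining)
        line = self._stream.readline(size)
        self._remaining -= len(line)
        return line  # b"" marks the end of the input stream


# The declared length is 13 bytes, so the trailing b"EXTRA" is never returned.
body = LimitedInput(io.BytesIO(b"first\nsecond\nEXTRA"), 13)
lines = [body.readline(3), body.readline(), body.readline(None), body.readline(-1)]
```

With this reading, a caller such as cgi.FieldStorage can pass a hint,
and callers that pass nothing keep the 1.0 behaviour.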

> [...]
> +3. Any WSGI application or middleware should not itself return, or
> +   consume from a wrapped WSGI component

This is not very clear.
What is the meaning of "consume from a wrapped WSGI component"?

> , more data than specified by
> +   the Content-Length response header if defined. Middleware that
> +   does this is arguably broken and can generate incorrect data.
> +   This is just a clarification of obligations.
> +
> [...]
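
One possible reading of "consume from a wrapped WSGI component" in
item 3 is sketched below: a middleware that stops iterating over the
wrapped application once the declared Content-Length has been reached.
The name ``limit_to_content_length`` is hypothetical, and the sketch
assumes header names and values are native strings.

```python
def limit_to_content_length(app):
    """Hypothetical middleware: pass on at most Content-Length bytes."""

    def wrapper(environ, start_response):
        declared = {}

        def capture(status, headers, exc_info=None):
            # Remember the declared length, if the wrapped component set one.
            for name, value in headers:
                if name.lower() == "content-length":
                    declared["length"] = int(value)
            return start_response(status, headers, exc_info)

        def body():
            iterable = app(environ, capture)
            sent = 0
            try:
                for chunk in iterable:
                    limit = declared.get("length")
                    if limit is not None:
                        chunk = chunk[: limit - sent]
                    sent += len(chunk)
                    if chunk:
                        yield chunk
                    if limit is not None and sent >= limit:
                        break  # stop consuming from the wrapped component
            finally:
                close = getattr(iterable, "close", None)
                if close is not None:
                    close()

        return body()

    return wrapper


def sample_app(environ, start_response):
    # Deliberately returns more data than it declares.
    start_response("200 OK", [("Content-Length", "5")])
    return [b"hel", b"lo!!", b"never"]


seen = []
result = b"".join(
    limit_to_content_length(sample_app)({}, lambda s, h, e=None: seen.append(s))
)
```
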
> +
> +String handling changes
> +-----------------------
> +
> +The following changes were made to make WSGI work on Python 3.x.
> +
> +1. The application is passed an instance of a Python dictionary
> +   containing what is referred to as the WSGI environment. All keys
> +   in this dictionary are native strings. For CGI variables, all names
> +   are going to be ISO-8859-1

"going to be ISO-8859-1" should be expressed in more precise terms.

Moreover, you should probably first define what a "native string" is, or
you should add a note that it is defined later in the document.

> and so where native strings are
> +   unicode strings, that encoding is used for the names of CGI
> +   variables.
> +
> +2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI
> +   environment, the value of the variable should be a native string.
> +
> +3. For the CGI variables contained in the WSGI environment, the values
> +   of the variables are native strings. Where native strings are
> +   unicode strings, ISO-8859-1 encoding would be used such that the

What is the precise meaning of *would*, here?

> +   original character data is preserved and as necessary the unicode
> +   string can be converted back to bytes and thence decoded to unicode
> +   again using a different encoding.
> +
> +4. The WSGI input stream 'wsgi.input' contained in the WSGI environment
> +   and from which request content is read, should yield byte strings.
> +

"yield" should be replaced with "return".

And, again, why are you using *should*, here? Is an implementation
allowed to return a native string?

See my previous comment about "native string"; the same applies to the
use of "byte string" here.

> [...]

> @@ -575,13 +602,14 @@
>  =====================  ===============================================
>  Variable               Value
>  =====================  ===============================================
> -``wsgi.version``       The tuple ``(1,0)``, representing WSGI
> +``wsgi.version``       The tuple ``(1, 0)``, representing WSGI
>                         version 1.0.
>

Should be (1, 1), not (1, 0).

> [...]
>
> -Proposed/Under Discussion
> -=========================
> -

I see no real reason to remove this section.

> [...]

Moreover, should the section
"Supporting Older (<2.2) Versions of Python" be removed?

> -
>  Acknowledgements
>  ================
>

Since WSGI 1.1 contains only corrections for WSGI 1.0, I see no reason
to remove the original contributors to WSGI 1.0.


> [...]


Regards  Manlio

Re: Draft PEP: WSGI 1.1

Manlio Perillo-3
In reply to this post by Dirkjan Ochtman
Dirkjan Ochtman wrote:
> Mostly taking Graham's list of issues and incorporating it into PEP 333.
>
> Latest revision: http://hg.xavamedia.nl/peps/file/tip/wsgi-1.1.txt
>
> Let's have comments here (comments in the form of diffs are
> particularly welcome, of course). Remember, the idea is not to change
> or improve WSGI right now, but only to improve the spec, improving
> interoperability and enabling Python 3 support.
>

> [...]

Another comment.
The run_with_cgi sample function should be changed, since it probably
does not work correctly on Python 3.x.

I'm not sure, since sys.stdout.write accepts a native string; how that
string is encoded, however, is platform-specific (although the current
text of WSGI 1.1 seems to allow this).
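
A sketch of one way to sidestep the platform-specific encoding on
Python 3.x is to write bytes to the binary layer under sys.stdout. The
helper name ``write_cgi_response`` is hypothetical and this is not the
PEP's run_with_cgi; it assumes sys.stdout has not been replaced by an
object without a ``buffer`` attribute.

```python
import io
import sys


def write_cgi_response(status, headers, body_chunks, out=None):
    """Hypothetical helper: emit a CGI response as bytes only."""
    if out is None:
        # Assumption: the standard text stream exposes its binary buffer.
        out = sys.stdout.buffer
    out.write(b"Status: " + status.encode("iso-8859-1") + b"\r\n")
    for name, value in headers:
        line = "%s: %s\r\n" % (name, value)
        out.write(line.encode("iso-8859-1"))
    out.write(b"\r\n")
    for chunk in body_chunks:
        out.write(chunk)  # body chunks are expected to be byte strings
    out.flush()


# Capture the output in memory instead of writing to the real stdout.
sink = io.BytesIO()
write_cgi_response("200 OK", [("Content-Type", "text/plain")], [b"hi\n"], out=sink)
```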


I would like to do some tests with CGI, Python 3.2, IIS and Windows.


Regards  Manlio

Re: Draft PEP: WSGI 1.1

and-py
In reply to this post by Dirkjan Ochtman
Dirkjan Ochtman wrote:

> 1. The application is passed an instance of a Python dictionary
>    containing what is referred to as the WSGI environment. All keys
>    in this dictionary are native strings. For CGI variables, all names
>    are going to be ISO-8859-1 and so where native strings are
>    unicode strings, that encoding is used for the names of CGI
>    variables.

Perhaps explain where those ISO-8859-1 bytes might come from:

     ...are native strings. Where native strings are Unicode, any
     keys derived from byte-oriented sources (such as custom headers
     in the HTTP request reflected in the CGI environment variables)
     should be decoded using the ISO-8859-1 encoding.

> 3. For the CGI variables contained in the WSGI environment, the values
>    of the variables are native strings. Where native strings are
>    unicode strings, ISO-8859-1 encoding would be used such that the
>    original character data is preserved and as necessary the unicode
>    string can be converted back to bytes and thence decoded to unicode
>    again using a different encoding.

Good. The only problem that remains with this is that in certain
environments (notably: all IIS use, not just CGI) a WSGI gateway cannot
fully comply with this requirement.

a. disallow environments that cannot be sure they are preserving the
original byte data from declaring that they support wsgi.version 1.1?

b. add an extra wsgi.something flag for a WSGI server to add, to specify
that it is sure that the original bytes have been preserved? (ie. so
wsgiref's CGI handler would have to declare it wasn't sure when running
under Windows.)

c. just let WSGI gateways silently ignore the ISO-8859-1 requirement if
they can't honour it and let the application spend its time trying to
unravel the mess (status quo).

(Can wsgiref be fixed to use ISO-8859-1 in time for Python 3.2?)
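
For reference, the round trip that point 3 relies on can be sketched as
follows. The ``redecode`` helper is hypothetical, and UTF-8 is only an
assumed example of the "different encoding" the application might want.

```python
def redecode(value, encoding="utf-8"):
    """Hypothetical helper: recover the original bytes, then decode again."""
    return value.encode("iso-8859-1").decode(encoding)


raw = "/café".encode("utf-8")        # bytes as sent by the client
native = raw.decode("iso-8859-1")    # what environ would carry on Python 3.x
recovered = redecode(native)
```

Because ISO-8859-1 maps every byte value to exactly one code point, the
encode step always returns the original bytes. That is the property a
gateway loses when it cannot see those bytes in the first place.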

> 7. The iterable returned by the application and from which response
>    content is derived, should yield byte strings. Where native strings
>    are unicode strings, the native string type can also be returned in
>    which case it would be encoded as ISO-8859-1.

> 8. The value passed to the 'write()' callback returned by
>    'start_response()' should be a byte string. Where native strings
>    are unicode strings, a native string type can also be supplied, in
>    which case it would be encoded as ISO-8859-1.

Weren't we going to only allow US-ASCII for the output? (These threads
are always so far apart I can never remember what conclusion we
reached... if any.)

Whilst ISO-8859-1 is in the HTTP standard for headers, and required to
preserve bytes in input, it's much, much less likely that the response
body is going to be ISO-8859-1. It could maybe be cp1252, but more
likely the author wanted UTF-8.

If we must support Unicode strings for response body output at all, I'd
prefer to be conservative here and spit a UnicodeEncodeError straight
away, rather than quietly mangle characters U+0080 to U+00FF.
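
A minimal sketch of the conservative behaviour, assuming a gateway-side
helper (the name ``to_body_bytes`` is hypothetical): byte strings pass
through untouched, and native strings are accepted only if they are pure
US-ASCII.

```python
def to_body_bytes(chunk):
    """Hypothetical gateway helper for response body chunks."""
    if isinstance(chunk, bytes):
        return chunk
    # Raises UnicodeEncodeError for any character from U+0080 upwards.
    return chunk.encode("ascii")


# The same character gives different bytes under the two candidate encodings,
# which is why a silent ISO-8859-1 default can mangle intended UTF-8 output.
latin1_bytes = "é".encode("iso-8859-1")
utf8_bytes = "é".encode("utf-8")

try:
    to_body_bytes("é")
    rejected = False
except UnicodeEncodeError:
    rejected = True
```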

Manlio Perillo wrote:

> The run_with_cgi sample function should be changed, since it probably
> does not work correctly, on Python 3.x.

Yes, the 'URL Reconstruction' fragment will be wrong too, since it uses
urllib.quote() to encode the path part. quote() defaults to UTF-8 rather
than the ISO-8859-1 WSGI 1.1 requires.
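
The difference can be seen with Python 3's ``urllib.parse.quote``, whose
``encoding`` parameter defaults to UTF-8. The path below is an assumed
example: the client sent "/café" as UTF-8 and the gateway decoded it as
ISO-8859-1, as the draft requires.

```python
from urllib.parse import quote

# PATH_INFO as a native string on Python 3.x under the draft's rules.
path_info = "/café".encode("utf-8").decode("iso-8859-1")

default_quoted = quote(path_info)                        # double-encodes
latin1_quoted = quote(path_info, encoding="iso-8859-1")  # original bytes
```

Only the second form reproduces the bytes the client actually sent.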

--
And Clover
mailto:[hidden email]
http://www.doxdesk.com/


Re: Draft PEP: WSGI 1.1

Manlio Perillo-3
And Clover wrote:

> [...]
>> 8. The value passed to the 'write()' callback returned by
>>    'start_response()' should be a byte string. Where native strings
>>    are unicode strings, a native string type can also be supplied, in
>>    which case it would be encoded as ISO-8859-1.
>
> Weren't we going to only allow US-ASCII for the output? (These threads
> are always so far apart I can never remember what conclusion we
> reached... if any.)
>

By the way, yesterday I wrote some tests for Python 3.x and I found a
possible problem (only indirectly related to WSGI, however).

The example consists of a simple client -> proxy -> server chain, where
the client and server run on Python 2.5 and the proxy on Python 3.2
(compiled from tip, some time ago).

Here is the proxy:
http://paste.pocoo.org/show/202212/

The application fails if a cookie contains a non-ASCII character.
The reason is that, for reasons I do not understand, http.client encodes
request headers using US-ASCII instead of ISO-8859-1.

The offending code is:
http://hg.python.org/cpython/file/7dcb7a2fb54d/Lib/http/client.py#l912
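
The effect of the two codecs on a forwarded header can be illustrated without http.client at all. This is only a sketch under assumptions: the cookie value is made up, and `encode_header` is a hypothetical helper standing in for whatever the client library does internally:

```python
raw_cookie = b"name=caf\xe9"              # ISO-8859-1 bytes received from the client
cookie = raw_cookie.decode("iso-8859-1")  # header value as a str inside the proxy


def encode_header(value, encoding):
    """Encode a header value for the outgoing request (hypothetical helper)."""
    return value.encode(encoding)


forwarded = encode_header(cookie, "iso-8859-1")  # round-trips to the original bytes
try:
    encode_header(cookie, "ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False                              # the failure described above
```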



Regards  Manlio

Re: Draft PEP: WSGI 1.1

Graham Dumpleton-2
In reply to this post by Dirkjan Ochtman
I haven't read what you have done yet, but if you haven't done so
already, make sure you read:

 http://bitbucket.org/ianb/wsgi-peps/src/

This is Ian's and Armin's previous attempt at a new specification,
though it tried to go further than what you are doing.

Also read:

 http://blog.dscpl.com.au/2009/09/roadmap-for-python-wsgi-specification.html

That post explains what I mean by native strings.

Graham


Re: Draft PEP: WSGI 1.1

Graham Dumpleton-2
On 16 April 2010 11:41, Graham Dumpleton <[hidden email]> wrote:
> I haven't read what you have done yet

And still haven't. Don't know when I will get a chance to do so.

Two points from a quick scan of emails.

1. The following section of the PEP needs to be updated:

"""
Apart from the handling of ``close()``, the semantics of returning a
file wrapper from the application should be the same as if the
application had returned ``iter(filelike.read, '')``.  In other words,
transmission should begin at the current position within the "file"
at the time that transmission begins, and continue until the end is
reached.
"""

It can't say to read until the end of the file is reached, because
Content-Length must be honoured: if Content-Length is less than what
is available in the remainder of the file, less must be returned, as
per descriptive changes (3) and (4).
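
One way a server could honour this is sketched below. It is an illustration only: `limited_file_iter` and its `block_size` default are hypothetical names, not anything taken from the PEP or from an existing server:

```python
import io


def limited_file_iter(filelike, content_length, block_size=8192):
    """Yield blocks from filelike, but never more than content_length bytes."""
    remaining = content_length
    while remaining > 0:
        data = filelike.read(min(block_size, remaining))
        if not data:
            # The file ended before Content-Length was satisfied.
            break
        remaining -= len(data)
        yield data


source = io.BytesIO(b"0123456789")
body = b"".join(limited_file_iter(source, 4, block_size=3))
position_after = source.tell()
```

Transmission still starts at the current file position, but it stops once the declared length has been sent, leaving the rest of the file unread.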

2. Regarding the question about readline() arguments and whether -1 or
None is allowed: I would say no, they are not. The argument must be a
positive integer, or no argument must be supplied at all.

Different implementations use -1 or None as the default value of the
argument to detect when no argument was supplied. One can't rely on
one or the other being used, though, so supplying those values
explicitly cannot be assumed to mean the same thing as supplying no
argument. In other words, supplying anything other than a positive
integer, or no argument at all, is undefined.
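
For a caller that wants to stay within the defined behaviour, the rule can be sketched as a small wrapper. `portable_readline` is a hypothetical helper for illustration, and the `io.BytesIO` object merely stands in for `wsgi.input`:

```python
import io


def portable_readline(stream, size=None):
    """Call stream.readline() using only the argument forms that are defined."""
    if size is None:
        # Never forward None or -1: omit the argument entirely instead.
        return stream.readline()
    if not isinstance(size, int) or size <= 0:
        raise ValueError("size must be a positive integer or omitted")
    return stream.readline(size)


stream = io.BytesIO(b"abc\ndef\n")
first = portable_readline(stream)      # whole line, no argument passed
second = portable_readline(stream, 2)  # at most two bytes of the next line
```

Here `None` is only this helper's own marker for "not supplied"; it is never passed on to the underlying stream.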

The same issue arises with read(), except that technically only a
positive integer can be supplied and the argument is not optional.
Any implementation that implements wsgi.input as a proper file-like
object is going to accept no argument to mean read all input, but
this is outside the WSGI specification, and calling read() with no
argument is undefined.
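
An application can avoid the undefined call by always passing a positive size bounded by CONTENT_LENGTH. A minimal sketch, in which `read_request_body` and its `block_size` default are hypothetical names and `io.BytesIO` stands in for the server's input stream:

```python
import io


def read_request_body(environ, block_size=65536):
    """Read the request body using read(size) with positive sizes only."""
    try:
        remaining = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        remaining = 0
    stream = environ["wsgi.input"]
    chunks = []
    while remaining > 0:
        data = stream.read(min(block_size, remaining))
        if not data:
            # Empty string: end of input reached earlier than declared.
            break
        chunks.append(data)
        remaining -= len(data)
    return b"".join(chunks)


environ = {"CONTENT_LENGTH": "5", "wsgi.input": io.BytesIO(b"hello world")}
body = read_request_body(environ)
```

The loop also relies on the empty-string end marker from descriptive change (2) to detect a premature end of input.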

Graham

> but if you have done so
> already, ensure you read:
>
>  http://bitbucket.org/ianb/wsgi-peps/src/
>
> This is Ian's and Armin's previous go at new specification. It though
> tried to go further than what you are doing.
>
> Also read:
>
>  http://blog.dscpl.com.au/2009/09/roadmap-for-python-wsgi-specification.html
>
> I explain what I mean by native strings in that.
>
> Graham
>
> On 15 April 2010 22:54, Dirkjan Ochtman <[hidden email]> wrote:
>> Mostly taking Graham's list of issues and incorporating it into PEP 333.
>>
>> Latest revision: http://hg.xavamedia.nl/peps/file/tip/wsgi-1.1.txt
>>
>> Let's have comments here (comments in the form of diffs are
>> particularly welcome, of course). Remember, the idea is not to change
>> or improve WSGI right now, but only to improve the spec, improving
>> interoperability and enabling Python 3 support.
>>
>> Graham, I hope I did a good job with your suggestions. (Since so much
>> of this is yours, I've just listed you as the second author.) I tried
>> to clarify exactly what you meant by "native strings", can you check
>> that out?
>>
>> Cheers,
>>
>> Dirkjan
>>
>> --- pep-0333.txt        2010-04-15 14:46:02.000000000 +0200
>> +++ wsgi-1.1.txt        2010-04-15 14:51:39.000000000 +0200
>> @@ -1,114 +1,124 @@
>> -PEP: 333
>> -Title: Python Web Server Gateway Interface v1.0
>> +PEP: 0000
>> +Title: Python Web Server Gateway Interface 1.1
>>  Version: $Revision$
>>  Last-Modified: $Date$
>> -Author: Phillip J. Eby <[hidden email]>
>> +Author: Dirkjan Ochtman <[hidden email]>,
>> +        Graham Dumpleton <[hidden email]>
>>  Discussions-To: Python Web-SIG <[hidden email]>
>>  Status: Draft
>>  Type: Informational
>>  Content-Type: text/x-rst
>> -Created: 07-Dec-2003
>> -Post-History: 07-Dec-2003, 08-Aug-2004, 20-Aug-2004, 27-Aug-2004
>> +Created: 15-04-2010
>> +Post-History: Not yet
>>
>>
>>  Abstract
>>  ========
>>
>> -This document specifies a proposed standard interface between web
>> -servers and Python web applications or frameworks, to promote web
>> -application portability across a variety of web servers.
>> +This document specifies a revision of the proposed standard interface
>> +between web servers and Python web applications or frameworks, to
>> +promote web application portability across a variety of web servers.
>>
>>
>>  Rationale and Goals
>>  ===================
>>
>> -Python currently boasts a wide variety of web application frameworks,
>> -such as Zope, Quixote, Webware, SkunkWeb, PSO, and Twisted Web -- to
>> -name just a few [1]_.  This wide variety of choices can be a problem
>> -for new Python users, because generally speaking, their choice of web
>> -framework will limit their choice of usable web servers, and vice
>> -versa.
>> -
>> -By contrast, although Java has just as many web application frameworks
>> -available, Java's "servlet" API makes it possible for applications
>> -written with any Java web application framework to run in any web
>> -server that supports the servlet API.
>> -
>> -The availability and widespread use of such an API in web servers for
>> -Python -- whether those servers are written in Python (e.g. Medusa),
>> -embed Python (e.g. mod_python), or invoke Python via a gateway
>> -protocol (e.g. CGI, FastCGI, etc.) -- would separate choice of
>> -framework from choice of web server, freeing users to choose a pairing
>> -that suits them, while freeing framework and server developers to
>> -focus on their preferred area of specialization.
>> -
>> -This PEP, therefore, proposes a simple and universal interface between
>> -web servers and web applications or frameworks: the Python Web Server
>> -Gateway Interface (WSGI).
>> -
>> -But the mere existence of a WSGI spec does nothing to address the
>> -existing state of servers and frameworks for Python web applications.
>> -Server and framework authors and maintainers must actually implement
>> -WSGI for there to be any effect.
>> -
>> -However, since no existing servers or frameworks support WSGI, there
>> -is little immediate reward for an author who implements WSGI support.
>> -Thus, WSGI **must** be easy to implement, so that an author's initial
>> -investment in the interface can be reasonably low.
>> -
>> -Thus, simplicity of implementation on *both* the server and framework
>> -sides of the interface is absolutely critical to the utility of the
>> -WSGI interface, and is therefore the principal criterion for any
>> -design decisions.
>> -
>> -Note, however, that simplicity of implementation for a framework
>> -author is not the same thing as ease of use for a web application
>> -author.  WSGI presents an absolutely "no frills" interface to the
>> -framework author, because bells and whistles like response objects and
>> -cookie handling would just get in the way of existing frameworks'
>> -handling of these issues.  Again, the goal of WSGI is to facilitate
>> -easy interconnection of existing servers and applications or
>> -frameworks, not to create a new web framework.
>> -
>> -Note also that this goal precludes WSGI from requiring anything that
>> -is not already available in deployed versions of Python.  Therefore,
>> -new standard library modules are not proposed or required by this
>> -specification, and nothing in WSGI requires a Python version greater
>> -than 2.2.2.  (It would be a good idea, however, for future versions
>> -of Python to include support for this interface in web servers
>> -provided by the standard library.)
>> -
>> -In addition to ease of implementation for existing and future
>> -frameworks and servers, it should also be easy to create request
>> -preprocessors, response postprocessors, and other WSGI-based
>> -"middleware" components that look like an application to their
>> -containing server, while acting as a server for their contained
>> -applications.
>> -
>> -If middleware can be both simple and robust, and WSGI is widely
>> -available in servers and frameworks, it allows for the possibility
>> -of an entirely new kind of Python web application framework: one
>> -consisting of loosely-coupled WSGI middleware components.  Indeed,
>> -existing framework authors may even choose to refactor their
>> -frameworks' existing services to be provided in this way, becoming
>> -more like libraries used with WSGI, and less like monolithic
>> -frameworks.  This would then allow application developers to choose
>> -"best-of-breed" components for specific functionality, rather than
>> -having to commit to all the pros and cons of a single framework.
>> -
>> -Of course, as of this writing, that day is doubtless quite far off.
>> -In the meantime, it is a sufficient short-term goal for WSGI to
>> -enable the use of any framework with any server.
>> -
>> -Finally, it should be mentioned that the current version of WSGI
>> -does not prescribe any particular mechanism for "deploying" an
>> -application for use with a web server or server gateway.  At the
>> -present time, this is necessarily implementation-defined by the
>> -server or gateway.  After a sufficient number of servers and
>> -frameworks have implemented WSGI to provide field experience with
>> -varying deployment requirements, it may make sense to create
>> -another PEP, describing a deployment standard for WSGI servers and
>> -application frameworks.
>> +WSGI 1.0, specified in PEP 333, did a great job in making it easier
>> +for web applications and web servers to interface with each other.
>> +It has become very much the standard it was meant to be and an
>> +important part of the Python web development infrastructure.
>> +
>> +After several implementations were built by different developers,
>> +it inevitably turned out that the specification wasn't perfect. It
>> +left out some details that were implemented by all the web server
>> +interfaces because they were critical for many applications (or
>> +application frameworks). Additionally, the specification was written
>> +before Python 3.x was specified, resulting in a lack of clear
>> +specification on what to do with unicode strings.
>> +
>> +While there are some ideas around to improve WSGI further in less
>> +compatible ways, we feel that there is value to be had in first
>> +specifying a minor revision of the specification, which is largely
>> +compatible with existing implementations. Further simplification
>> +and experimentation are therefore deferred to a 2.0 version.
>> +
>> +
>> +Differences with WSGI 1.0
>> +=========================
>> +
>> +Descriptive changes
>> +-------------------
>> +
>> +The following changes were made to realign the spec with
>> +implementations 'in the wild'.
>> +
>> +1. The 'readline()' function of 'wsgi.input' must optionally take
>> +   a size hint. This is required because many applications use
>> +   cgi.FieldStorage, which uses this functionality.
>> +
>> +2. The 'wsgi.input' functions for reading input must return an empty
>> +   string as end of input stream marker. This is required for support
>> +   of HTTP 1.1 request pipelining. A correctly implemented WSGI
>> +   middleware already has to cope with an empty string as end
>> +   sentinel anyway to detect premature end of input.
>> +
>> +3. Any WSGI application or middleware should not itself return, or
>> +   consume from a wrapped WSGI component, more data than specified by
>> +   the Content-Length response header if defined. Middleware that
>> +   does this is arguably broken and can generate incorrect data.
>> +   This is just a clarification of obligations.
>> +
>> +4. The WSGI adapter must not pass on to the server any data above
>> +   what the Content-Length response header defines, if supplied.
>> +   Doing this is technically a violation of HTTP. This is another
>> +   clarification of obligations.
>> +
>> +
>> +String handling changes
>> +-----------------------
>> +
>> +The following changes were made to make WSGI work on Python 3.x.
>> +
>> +1. The application is passed an instance of a Python dictionary
>> +   containing what is referred to as the WSGI environment. All keys
>> +   in this dictionary are native strings. For CGI variables, all names
>> +   are going to be ISO-8859-1 and so where native strings are
>> +   unicode strings, that encoding is used for the names of CGI
>> +   variables.
>> +
>> +2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI
>> +   environment, the value of the variable should be a native string.
>> +
>> +3. For the CGI variables contained in the WSGI environment, the values
>> +   of the variables are native strings. Where native strings are
>> +   unicode strings, ISO-8859-1 encoding would be used such that the
>> +   original character data is preserved and as necessary the unicode
>> +   string can be converted back to bytes and thence decoded to unicode
>> +   again using a different encoding.
>> +
>> +4. The WSGI input stream 'wsgi.input' contained in the WSGI environment
>> +   and from which request content is read, should yield byte strings.
>> +
>> +5. The status line specified by the WSGI application should be a byte
>> +   string. Where native strings are unicode strings, the native string
>> +   type can also be returned in which case it would be encoded as
>> +   ISO-8859-1.
>> +
>> +6. The list of response headers specified by the WSGI application should
>> +   contain tuples consisting of two values, where each value is a byte
>> +   string. Where native strings are unicode strings, the native string
>> +   type can also be returned in which case it would be encoded as
>> +   ISO-8859-1.
>> +
>> +7. The iterable returned by the application and from which response
>> +   content is derived, should yield byte strings. Where native strings
>> +   are unicode strings, the native string type can also be returned in
>> +   which case it would be encoded as ISO-8859-1.
>> +
>> +8. The value passed to the 'write()' callback returned by
>> +   'start_response()' should be a byte string. Where native strings
>> +   are unicode strings, a native string type can also be supplied, in
>> +   which case it would be encoded as ISO-8859-1.
>>
>>
>>  Specification Overview
>> @@ -447,6 +457,13 @@
>>  Streaming`_ section below for more on how application output must be
>>  handled.)
>>
>> +Further on, several places specify constraints upon string types used
>> +in the WSGI API. The term native string is used to mean the 'str' class
>> +in both Python 2.x and 3.x. The spec tries to ensure optimal
>> +compatibility and ease of use by allowing implementations running on
>> +Python 3.x to encode strings (which are Unicode strings with no
>> +specified encoding) as ISO-8859-1 where a 3.x string is passed in.
>> +
>>  The server or gateway should treat the yielded strings as binary byte
>>  sequences: in particular, it should ensure that line endings are
>>  not altered.  The application is responsible for ensuring that the
>> @@ -489,12 +506,22 @@
>>  ``environ`` Variables
>>  ---------------------
>>
>> +All keys in this dictionary are native strings. For CGI variables,
>> +all names are going to be ISO-8859-1 and so where native strings are
>> +unicode strings, that encoding is used for the names of CGI variables.
>> +
>>  The ``environ`` dictionary is required to contain these CGI
>>  environment variables, as defined by the Common Gateway Interface
>>  specification [2]_.  The following variables **must** be present,
>>  unless their value would be an empty string, in which case they
>>  **may** be omitted, except as otherwise noted below.
>>
>> +The values for CGI variables are native strings. Where native strings
>> +are unicode strings, ISO-8859-1 encoding would be used such that the
>> +original character data is preserved and as necessary the unicode
>> +string can be converted back to bytes and thence decoded to unicode
>> +again using a different encoding.
>> +
>>  ``REQUEST_METHOD``
>>   The HTTP request method, such as ``"GET"`` or ``"POST"``.  This
>>   cannot ever be an empty string, and so is always required.
>> @@ -575,13 +602,14 @@
>>  =====================  ===============================================
>>  Variable               Value
>>  =====================  ===============================================
>> -``wsgi.version``       The tuple ``(1,0)``, representing WSGI
>> +``wsgi.version``       The tuple ``(1, 0)``, representing WSGI
>>                        version 1.0.
>>
>>  ``wsgi.url_scheme``    A string representing the "scheme" portion of
>>                        the URL at which the application is being
>>                        invoked.  Normally, this will have the value
>> -                       ``"http"`` or ``"https"``, as appropriate.
>> +                       ``"http"`` or ``"https"``, as appropriate. The
>> +                       value is a native string.
>>
>>  ``wsgi.input``         An input stream (file-like object) from which
>>                        the HTTP request body can be read.  (The server
>> @@ -646,7 +674,7 @@
>>  Method               Stream      Notes
>>  ===================  ==========  ========
>>  ``read(size)``       ``input``   1
>> -``readline()``       ``input``   1,2
>> +``readline(hint)``   ``input``   1,2
>>  ``readlines(hint)``  ``input``   1,3
>>  ``__iter__()``       ``input``
>>  ``flush()``          ``errors``  4
>> @@ -661,11 +689,12 @@
>>    ``Content-Length``, and is allowed to simulate an end-of-file
>>    condition if the application attempts to read past that point.
>>    The application **should not** attempt to read more data than is
>> -   specified by the ``CONTENT_LENGTH`` variable.
>> +   specified by the ``CONTENT_LENGTH`` variable. All read functions
>> +   are required to return an empty string as the end of input stream
>> +   marker. They must yield byte strings.
>>
>> -2. The optional "size" argument to ``readline()`` is not supported,
>> -   as it may be complex for server authors to implement, and is not
>> -   often used in practice.
>> +2. The optional "size" argument to ``readline()`` is required for
>> +   the implementer, but optional for callers.
>>
>>  3. Note that the ``hint`` argument to ``readlines()`` is optional for
>>    both caller and implementer.  The application is free not to
>> @@ -692,12 +721,15 @@
>>  ---------------------------------
>>
>>  The second parameter passed to the application object is a callable
>> -of the form ``start_response(status,response_headers,exc_info=None)``.
>> +of the form ``start_response(status, response_headers, exc_info=None)``.
>>  (As with all WSGI callables, the arguments must be supplied
>>  positionally, not by keyword.)  The ``start_response`` callable is
>>  used to begin the HTTP response, and it must return a
>>  ``write(body_data)`` callable (see the `Buffering and Streaming`_
>> -section, below).
>> +section, below). Values passed to the ``write(body_data)`` callable
>> +should be byte strings. Where native strings are unicode strings, a
>> +native strings type can also be supplied, in which case it would be
>> +encoded as ISO-8859-1.
>>
>>  The ``status`` argument is an HTTP "status" string like ``"200 OK"``
>>  or ``"404 Not Found"``.  That is, it is a string consisting of a
>> @@ -705,14 +737,20 @@
>>  single space, with no surrounding whitespace or other characters.
>>  (See RFC 2616, Section 6.1.1 for more information.)  The string
>>  **must not** contain control characters, and must not be terminated
>> -with a carriage return, linefeed, or combination thereof.
>> +with a carriage return, linefeed, or combination thereof. This
>> +value should be a byte string. Where native strings are unicode
>> +strings, the native string type can also be returned, in which
>> +case it would be encoded as ISO-8859-1.
>>
>>  The ``response_headers`` argument is a list of ``(header_name,
>>  header_value)`` tuples.  It must be a Python list; i.e.
>> -``type(response_headers) is ListType``, and the server **may** change
>> +``type(response_headers) is list``, and the server **may** change
>>  its contents in any way it desires.  Each ``header_name`` must be a
>>  valid HTTP header field-name (as defined by RFC 2616, Section 4.2),
>> -without a trailing colon or other punctuation.
>> +without a trailing colon or other punctuation. Both the header_name
>> +and the header_value should be byte strings. Where native strings
>> +are unicode strings, the native string type can also be returned,
>> +in which case it would be encoded as ISO-8859-1.
>>
>>  Each ``header_value`` **must not** include *any* control characters,
>>  including carriage returns or linefeeds, either embedded or at the end.
>> @@ -809,6 +847,14 @@
>>  Handling the ``Content-Length`` Header
>>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> +If an application or middleware layer chooses to return a
>> +Content-Length header, it should not return more data than specified
>> +by the header value. Any wrapping middleware layer should not
>> +consume more data than specified in the header value from the
>> +wrapped component (either middleware or application). Any WSGI
>> +adapter must similarly not pass on data above what the
>> +Content-Length response header value defines.
>> +
>>  If the application does not supply a ``Content-Length`` header, a
>>  server or gateway may choose one of several approaches to handling
>>  it.  The simplest of these is to close the client connection when
>> @@ -1569,55 +1615,13 @@
>>    developers.
>>
>>
>> -Proposed/Under Discussion
>> -=========================
>> -
>> -These items are currently being discussed on the Web-SIG and elsewhere,
>> -or are on the PEP author's "to-do" list:
>> -
>> -* Should ``wsgi.input`` be an iterator instead of a file?  This would
>> -  help for asynchronous applications and chunked-encoding input
>> -  streams.
>> -
>> -* Optional extensions are being discussed for pausing iteration of an
>> -  application's ouptut until input is available or until a callback
>> -  occurs.
>> -
>> -* Add a section about synchronous vs. asynchronous apps and servers,
>> -  the relevant threading models, and issues/design goals in these
>> -  areas.
>> -
>> -
>>  Acknowledgements
>>  ================
>>
>> -Thanks go to the many folks on the Web-SIG mailing list whose
>> -thoughtful feedback made this revised draft possible.  Especially:
>> +Thanks go to many folks on the Web-SIG mailing list for helping the work
>> +on clarifying and improving this specification. In particular:
>>
>> -* Gregory "Grisha" Trubetskoy, author of ``mod_python``, who beat up
>> -  on the first draft as not offering any advantages over "plain old
>> -  CGI", thus encouraging me to look for a better approach.
>> -
>> -* Ian Bicking, who helped nag me into properly specifying the
>> -  multithreading and multiprocess options, as well as badgering me to
>> -  provide a mechanism for servers to supply custom extension data to
>> -  an application.
>> -
>> -* Tony Lownds, who came up with the concept of a ``start_response``
>> -  function that took the status and headers, returning a ``write``
>> -  function.  His input also guided the design of the exception handling
>> -  facilities, especially in the area of allowing for middleware that
>> -  overrides application error messages.
>> -
>> -* Alan Kennedy, whose courageous attempts to implement WSGI-on-Jython
>> -  (well before the spec was finalized) helped to shape the "supporting
>> -  older versions of Python" section, as well as the optional
>> -  ``wsgi.file_wrapper`` facility.
>> -
>> -* Mark Nottingham, who reviewed the spec extensively for issues with
>> -  HTTP RFC compliance, especially with regard to HTTP/1.1 features that
>> -  I didn't even know existed until he pointed them out.
>> -
>> +* Phillip J. Eby, for writing/editing the 1.0 specification.
>>
>>  References
>>  ==========
>> @@ -1643,8 +1647,6 @@
>>
>>  This document has been placed in the public domain.
>>
>> -
>> -
>>  ..
>>    Local Variables:
>>    mode: indented-text
>>
>
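The ISO-8859-1 round trip described in string handling change 3 of the quoted draft can be sketched as follows. This is only an illustration under assumptions: the sample path and the variable names are made up, and a real server would do the first decode while populating ``environ``.

```python
# Sketch only: how a Python 3 server might expose raw request bytes as a
# native string, and how an application can recover the original bytes.
raw = "/caf\u00e9".encode("utf-8")   # bytes as they arrived on the wire

# Server side: ISO-8859-1 maps every byte to exactly one code point,
# so no information is lost, whatever the real encoding was.
path_info = raw.decode("iso-8859-1")

# Application side: re-encode to get the original bytes back, then decode
# with the encoding the application actually expects.
recovered = path_info.encode("iso-8859-1")
text = recovered.decode("utf-8")
```

The intermediate ``path_info`` string is not meaningful text unless the request really was ISO-8859-1; it is just a lossless carrier for the bytes.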
_______________________________________________
Web-SIG mailing list
[hidden email]
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: http://mail.python.org/mailman/options/web-sig/lists%40nabble.com

Re: Draft PEP: WSGI 1.1

Paul Davis-11
On Thu, Apr 15, 2010 at 10:08 PM, Graham Dumpleton
<[hidden email]> wrote:

> On 16 April 2010 11:41, Graham Dumpleton <[hidden email]> wrote:
>> I haven't read what you have done yet
>
> And still haven't. Don't know when I will get a chance to do so.
>
> Two points from a quick scan of emails.
>
> 1. The following section of PEP needs to be updated:
>
> """
> Apart from the handling of ``close()``, the semantics of returning a
> file wrapper from the application should be the same as if the
> application had returned ``iter(filelike.read, '')``.  In other words,
> transmission should begin at the current position within the "file"
> at the time that transmission begins, and continue until the end is
> reached.
> """
>
> It can't say to read until the end of the file is reached, as
> Content-Length must be honoured: less must be returned if
> Content-Length is less than what is available in the remainder of the
> file, as per descriptive changes (3) and (4).
>
> In respect of the question about readline() arguments and whether -1
> or None is allowed: I would say no, they are not. The argument must be
> a positive integer, or no argument supplied at all.
>
> Different implementations use -1 or None as the value of a default
> argument to know when an argument wasn't supplied. One can't rely
> on one or the other being used, though, and so can't assume that
> supplying those values explicitly means the same thing as supplying
> no argument. In other words, supplying anything but a positive
> integer or no argument at all is undefined.
>
> The same issue arises with read(), except that only a positive integer
> can technically be supplied and the argument is not optional. Although
> any implementation which implements wsgi.input as a proper file-like
> object is going to accept no argument to mean read all input, this
> is outside of the WSGI specification and calling with no argument is
> undefined.
>
> Graham

I happened to have just started hitting the body reading functions on
an HTTP parser I've been working on. I'd be interested to hear a
response on what happens when the various read functions are called
with a size hint of zero.

I realize that zero is not a positive integer but I'm not quite sure
what the recommended return value would be. I can see None and -1
being obvious flags for "no size hint", but zero is a tad weird. I
want to say that it'd either return "" (which could sorta kinda
violate #2) or raise an exception. I really haven't got any reason to
prefer one over the other though.
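To make the ambiguity concrete, here is a rough sketch of a bounded input stream. ``LimitedInput`` and ``at_eof()`` are hypothetical names for illustration, not part of WSGI or any server, and returning an empty byte string for a zero size is just one of the two options above, not a recommendation.

```python
import io


class LimitedInput:
    """Hypothetical wsgi.input wrapper bounded by CONTENT_LENGTH."""

    def __init__(self, stream, content_length):
        self._stream = stream
        self._remaining = content_length

    def read(self, size):
        if size < 0:
            raise ValueError("size must not be negative")
        if size == 0:
            # The ambiguous case: b"" here does NOT mean end of input.
            return b""
        data = self._stream.read(min(size, self._remaining))
        self._remaining -= len(data)
        return data

    def at_eof(self):
        # Illustration helper only; WSGI defines no such method.
        return self._remaining == 0


body = LimitedInput(io.BytesIO(b"hello world"), 5)
zero = body.read(0)              # empty, yet input is not exhausted
eof_after_zero = body.at_eof()
first = body.read(100)           # bounded by the declared length
end = body.read(100)             # empty again: the real end marker
```

A caller that treats any empty result as the end-of-input sentinel would stop after the first call here, which is why raising an exception for zero is the other candidate.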

As an aside, I think that "honoring Content-Length" should probably be
rephrased to a "middleware should not break HTTP" coupled with a page
that lists common ways that middleware breaks HTTP. I reckon it's the
same reasoning as for 333's dictation that hop-by-hop headers are server
only, though there are plenty of other ways I could violate RFC 2616
as a middleware author without violating WSGI. Pie in the sky, the
common ways would be included with wsgiref's validate decorator.
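For what "honoring Content-Length" could look like in practice, here is a minimal sketch of a wrapper that never passes on more body data than the header declares. The names ``limit_to_content_length`` and ``demo_app`` are made up for this example, and it ignores the ``write()`` callable entirely, so treat it as an outline rather than a complete middleware.

```python
def limit_to_content_length(app):
    """Hypothetical middleware: truncate the body to Content-Length."""

    def wrapper(environ, start_response):
        state = {"limit": None}

        def capture(status, headers, exc_info=None):
            # Remember the declared length, if the application gave one.
            for name, value in headers:
                if name.lower() == "content-length":
                    state["limit"] = int(value)
            return start_response(status, headers, exc_info)

        result = app(environ, capture)
        try:
            for chunk in result:
                limit = state["limit"]
                if limit is None:
                    yield chunk          # no declared length: pass through
                    continue
                if limit <= 0:
                    break                # declared length already sent
                chunk = chunk[:limit]
                state["limit"] = limit - len(chunk)
                yield chunk
        finally:
            # Always give the wrapped iterable a chance to clean up.
            if hasattr(result, "close"):
                result.close()

    return wrapper


def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Length", "5")])
    return [b"hel", b"lo world"]


sent = b"".join(
    limit_to_content_length(demo_app)({}, lambda s, h, e=None: None)
)
```

Whether the excess data should be silently dropped, as here, or reported as an application error is a separate question that the draft does not settle.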

Paul

>> but if you have done so
>> already, ensure you read:
>>
>>  http://bitbucket.org/ianb/wsgi-peps/src/
>>
>> This is Ian's and Armin's previous go at new specification. It though
>> tried to go further than what you are doing.
>>
>> Also read:
>>
>>  http://blog.dscpl.com.au/2009/09/roadmap-for-python-wsgi-specification.html
>>
>> I explain what I mean by native strings in that.
>>
>> Graham
>>
>> On 15 April 2010 22:54, Dirkjan Ochtman <[hidden email]> wrote:
>>> Mostly taking Graham's list of issues and incorporating it into PEP 333.
>>>
>>> Latest revision: http://hg.xavamedia.nl/peps/file/tip/wsgi-1.1.txt
>>>
>>> Let's have comments here (comments in the form of diffs are
>>> particularly welcome, of course). Remember, the idea is not to change
>>> or improve WSGI right now, but only to improve the spec, improving
>>> interoperability and enabling Python 3 support.
>>>
>>> Graham, I hope I did a good job with your suggestions. (Since so much
>>> of this is yours, I've just listed you as the second author.) I tried
>>> to clarify exactly what you meant by "native strings", can you check
>>> that out?
>>>
>>> Cheers,
>>>
>>> Dirkjan
>>>
>>> --- pep-0333.txt        2010-04-15 14:46:02.000000000 +0200
>>> +++ wsgi-1.1.txt        2010-04-15 14:51:39.000000000 +0200
>>> @@ -1,114 +1,124 @@
>>> -PEP: 333
>>> -Title: Python Web Server Gateway Interface v1.0
>>> +PEP: 0000
>>> +Title: Python Web Server Gateway Interface 1.1
>>>  Version: $Revision$
>>>  Last-Modified: $Date$
>>> -Author: Phillip J. Eby <[hidden email]>
>>> +Author: Dirkjan Ochtman <[hidden email]>,
>>> +        Graham Dumpleton <[hidden email]>
>>>  Discussions-To: Python Web-SIG <[hidden email]>
>>>  Status: Draft
>>>  Type: Informational
>>>  Content-Type: text/x-rst
>>> -Created: 07-Dec-2003
>>> -Post-History: 07-Dec-2003, 08-Aug-2004, 20-Aug-2004, 27-Aug-2004
>>> +Created: 15-04-2010
>>> +Post-History: Not yet
>>>
>>>
>>>  Abstract
>>>  ========
>>>
>>> -This document specifies a proposed standard interface between web
>>> -servers and Python web applications or frameworks, to promote web
>>> -application portability across a variety of web servers.
>>> +This document specifies a revision of the proposed standard interface
>>> +between web servers and Python web applications or frameworks, to
>>> +promote web application portability across a variety of web servers.
>>>
>>>
>>>  Rationale and Goals
>>>  ===================
>>>
>>> -Python currently boasts a wide variety of web application frameworks,
>>> -such as Zope, Quixote, Webware, SkunkWeb, PSO, and Twisted Web -- to
>>> -name just a few [1]_.  This wide variety of choices can be a problem
>>> -for new Python users, because generally speaking, their choice of web
>>> -framework will limit their choice of usable web servers, and vice
>>> -versa.
>>> -
>>> -By contrast, although Java has just as many web application frameworks
>>> -available, Java's "servlet" API makes it possible for applications
>>> -written with any Java web application framework to run in any web
>>> -server that supports the servlet API.
>>> -
>>> -The availability and widespread use of such an API in web servers for
>>> -Python -- whether those servers are written in Python (e.g. Medusa),
>>> -embed Python (e.g. mod_python), or invoke Python via a gateway
>>> -protocol (e.g. CGI, FastCGI, etc.) -- would separate choice of
>>> -framework from choice of web server, freeing users to choose a pairing
>>> -that suits them, while freeing framework and server developers to
>>> -focus on their preferred area of specialization.
>>> -
>>> -This PEP, therefore, proposes a simple and universal interface between
>>> -web servers and web applications or frameworks: the Python Web Server
>>> -Gateway Interface (WSGI).
>>> -
>>> -But the mere existence of a WSGI spec does nothing to address the
>>> -existing state of servers and frameworks for Python web applications.
>>> -Server and framework authors and maintainers must actually implement
>>> -WSGI for there to be any effect.
>>> -
>>> -However, since no existing servers or frameworks support WSGI, there
>>> -is little immediate reward for an author who implements WSGI support.
>>> -Thus, WSGI **must** be easy to implement, so that an author's initial
>>> -investment in the interface can be reasonably low.
>>> -
>>> -Thus, simplicity of implementation on *both* the server and framework
>>> -sides of the interface is absolutely critical to the utility of the
>>> -WSGI interface, and is therefore the principal criterion for any
>>> -design decisions.
>>> -
>>> -Note, however, that simplicity of implementation for a framework
>>> -author is not the same thing as ease of use for a web application
>>> -author.  WSGI presents an absolutely "no frills" interface to the
>>> -framework author, because bells and whistles like response objects and
>>> -cookie handling would just get in the way of existing frameworks'
>>> -handling of these issues.  Again, the goal of WSGI is to facilitate
>>> -easy interconnection of existing servers and applications or
>>> -frameworks, not to create a new web framework.
>>> -
>>> -Note also that this goal precludes WSGI from requiring anything that
>>> -is not already available in deployed versions of Python.  Therefore,
>>> -new standard library modules are not proposed or required by this
>>> -specification, and nothing in WSGI requires a Python version greater
>>> -than 2.2.2.  (It would be a good idea, however, for future versions
>>> -of Python to include support for this interface in web servers
>>> -provided by the standard library.)
>>> -
>>> -In addition to ease of implementation for existing and future
>>> -frameworks and servers, it should also be easy to create request
>>> -preprocessors, response postprocessors, and other WSGI-based
>>> -"middleware" components that look like an application to their
>>> -containing server, while acting as a server for their contained
>>> -applications.
>>> -
>>> -If middleware can be both simple and robust, and WSGI is widely
>>> -available in servers and frameworks, it allows for the possibility
>>> -of an entirely new kind of Python web application framework: one
>>> -consisting of loosely-coupled WSGI middleware components.  Indeed,
>>> -existing framework authors may even choose to refactor their
>>> -frameworks' existing services to be provided in this way, becoming
>>> -more like libraries used with WSGI, and less like monolithic
>>> -frameworks.  This would then allow application developers to choose
>>> -"best-of-breed" components for specific functionality, rather than
>>> -having to commit to all the pros and cons of a single framework.
>>> -
>>> -Of course, as of this writing, that day is doubtless quite far off.
>>> -In the meantime, it is a sufficient short-term goal for WSGI to
>>> -enable the use of any framework with any server.
>>> -
>>> -Finally, it should be mentioned that the current version of WSGI
>>> -does not prescribe any particular mechanism for "deploying" an
>>> -application for use with a web server or server gateway.  At the
>>> -present time, this is necessarily implementation-defined by the
>>> -server or gateway.  After a sufficient number of servers and
>>> -frameworks have implemented WSGI to provide field experience with
>>> -varying deployment requirements, it may make sense to create
>>> -another PEP, describing a deployment standard for WSGI servers and
>>> -application frameworks.
>>> +WSGI 1.0, specified in PEP 333, did a great job in making it easier
>>> +for web applications and web servers to interface with each other.
>>> +It has become very much the standard it was meant to be and an
>>> +important part of the Python web development infrastructure.
>>> +
>>> +After several implementations were built by different developers,
>>> +it inevitably turned out that the specification wasn't perfect. It
>>> +left out some details that were implemented by all the web server
>>> +interfaces because they were critical for many applications (or
>>> +application frameworks). Additionally, the specification was written
>>> +before Python 3.x was specified, resulting in a lack of clear
>>> +specification on what to do with unicode strings.
>>> +
>>> +While there are some ideas around to improve WSGI further in less
>>> +compatible ways, we feel that there is value to be had in first
>>> +specifying a minor revision of the specification, which is largely
>>> +compatible with existing implementations. Further simplification
>>> +and experimentation are therefore deferred to a 2.0 version.
>>> +
>>> +
>>> +Differences with WSGI 1.0
>>> +=========================
>>> +
>>> +Descriptive changes
>>> +-------------------
>>> +
>>> +The following changes were made to realign the spec with
>>> +implementations 'in the wild'.
>>> +
>>> +1. The 'readline()' function of 'wsgi.input' must optionally take
>>> +   a size hint. This is required because many applications use
>>> +   cgi.FieldStorage, which uses this functionality.
>>> +
>>> +2. The 'wsgi.input' functions for reading input must return an empty
>>> +   string as end of input stream marker. This is required for support
>>> +   of HTTP 1.1 request pipelining. A correctly implemented WSGI
>>> +   middleware already has to cope with an empty string as end
>>> +   sentinel anyway to detect premature end of input.
>>> +
>>> +3. Any WSGI application or middleware should not itself return, or
>>> +   consume from a wrapped WSGI component, more data than specified by
>>> +   the Content-Length response header if defined. Middleware that
>>> +   does this is arguably broken and can generate incorrect data.
>>> +   This is just a clarification of obligations.
>>> +
>>> +4. The WSGI adapter must not pass on to the server any data above
>>> +   what the Content-Length response header defines, if supplied.
>>> +   Doing this is technically a violation of HTTP. This is another
>>> +   clarification of obligations.
>>> +
>>> +
>>> +String handling changes
>>> +-----------------------
>>> +
>>> +The following changes were made to make WSGI work on Python 3.x.
>>> +
>>> +1. The application is passed an instance of a Python dictionary
>>> +   containing what is referred to as the WSGI environment. All keys
>>> +   in this dictionary are native strings. For CGI variables, all names
>>> +   are going to be ISO-8859-1 and so where native strings are
>>> +   unicode strings, that encoding is used for the names of CGI
>>> +   variables.
>>> +
>>> +2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI
>>> +   environment, the value of the variable should be a native string.
>>> +
>>> +3. For the CGI variables contained in the WSGI environment, the values
>>> +   of the variables are native strings. Where native strings are
>>> +   unicode strings, ISO-8859-1 encoding would be used such that the
>>> +   original character data is preserved and as necessary the unicode
>>> +   string can be converted back to bytes and thence decoded to unicode
>>> +   again using a different encoding.
>>> +
>>> +4. The WSGI input stream 'wsgi.input' contained in the WSGI environment
>>> +   and from which request content is read, should yield byte strings.
>>> +
>>> +5. The status line specified by the WSGI application should be a byte
>>> +   string. Where native strings are unicode strings, the native string
>>> +   type can also be returned in which case it would be encoded as
>>> +   ISO-8859-1.
>>> +
>>> +6. The list of response headers specified by the WSGI application should
>>> +   contain tuples consisting of two values, where each value is a byte
>>> +   string. Where native strings are unicode strings, the native string
>>> +   type can also be returned in which case it would be encoded as
>>> +   ISO-8859-1.
>>> +
>>> +7. The iterable returned by the application and from which response
>>> +   content is derived, should yield byte strings. Where native strings
>>> +   are unicode strings, the native string type can also be returned in
>>> +   which case it would be encoded as ISO-8859-1.
>>> +
>>> +8. The value passed to the 'write()' callback returned by
>>> +   'start_response()' should be a byte string. Where native strings
>>> +   are unicode strings, a native string type can also be supplied, in
>>> +   which case it would be encoded as ISO-8859-1.
>>>
>>>
>>>  Specification Overview
>>> @@ -447,6 +457,13 @@
>>>  Streaming`_ section below for more on how application output must be
>>>  handled.)
>>>
>>> +Further on, several places specify constraints upon string types used
>>> +in the WSGI API. The term native string is used to mean the 'str' class
>>> +in both Python 2.x and 3.x. The spec tries to ensure optimal
>>> +compatibility and ease of use by allowing implementations running on
>>> +Python 3.x to encode strings (which are Unicode strings with no
>>> +specified encoding) as ISO-8859-1 where a 3.x string is passed in.
>>> +
>>>  The server or gateway should treat the yielded strings as binary byte
>>>  sequences: in particular, it should ensure that line endings are
>>>  not altered.  The application is responsible for ensuring that the
>>> @@ -489,12 +506,22 @@
>>>  ``environ`` Variables
>>>  ---------------------
>>>
>>> +All keys in this dictionary are native strings. For CGI variables,
>>> +all names are going to be ISO-8859-1 and so where native strings are
>>> +unicode strings, that encoding is used for the names of CGI variables.
>>> +
>>>  The ``environ`` dictionary is required to contain these CGI
>>>  environment variables, as defined by the Common Gateway Interface
>>>  specification [2]_.  The following variables **must** be present,
>>>  unless their value would be an empty string, in which case they
>>>  **may** be omitted, except as otherwise noted below.
>>>
>>> +The values for CGI variables are native strings. Where native strings
>>> +are unicode strings, ISO-8859-1 encoding would be used such that the
>>> +original character data is preserved and as necessary the unicode
>>> +string can be converted back to bytes and thence decoded to unicode
>>> +again using a different encoding.
>>> +
>>>  ``REQUEST_METHOD``
>>>   The HTTP request method, such as ``"GET"`` or ``"POST"``.  This
>>>   cannot ever be an empty string, and so is always required.
>>> @@ -575,13 +602,14 @@
>>>  =====================  ===============================================
>>>  Variable               Value
>>>  =====================  ===============================================
>>> -``wsgi.version``       The tuple ``(1,0)``, representing WSGI
>>> +``wsgi.version``       The tuple ``(1, 0)``, representing WSGI
>>>                        version 1.0.
>>>
>>>  ``wsgi.url_scheme``    A string representing the "scheme" portion of
>>>                        the URL at which the application is being
>>>                        invoked.  Normally, this will have the value
>>> -                       ``"http"`` or ``"https"``, as appropriate.
>>> +                       ``"http"`` or ``"https"``, as appropriate. The
>>> +                       value is a native string.
>>>
>>>  ``wsgi.input``         An input stream (file-like object) from which
>>>                        the HTTP request body can be read.  (The server
>>> @@ -646,7 +674,7 @@
>>>  Method               Stream      Notes
>>>  ===================  ==========  ========
>>>  ``read(size)``       ``input``   1
>>> -``readline()``       ``input``   1,2
>>> +``readline(hint)``   ``input``   1,2
>>>  ``readlines(hint)``  ``input``   1,3
>>>  ``__iter__()``       ``input``
>>>  ``flush()``          ``errors``  4
>>> @@ -661,11 +689,12 @@
>>>    ``Content-Length``, and is allowed to simulate an end-of-file
>>>    condition if the application attempts to read past that point.
>>>    The application **should not** attempt to read more data than is
>>> -   specified by the ``CONTENT_LENGTH`` variable.
>>> +   specified by the ``CONTENT_LENGTH`` variable. All read functions
>>> +   are required to return an empty string as the end of input stream
>>> +   marker. They must yield byte strings.
>>>
>>> -2. The optional "size" argument to ``readline()`` is not supported,
>>> -   as it may be complex for server authors to implement, and is not
>>> -   often used in practice.
>>> +2. The optional "size" argument to ``readline()`` is required for
>>> +   the implementer, but optional for callers.
>>>
>>>  3. Note that the ``hint`` argument to ``readlines()`` is optional for
>>>    both caller and implementer.  The application is free not to
>>> @@ -692,12 +721,15 @@
>>>  ---------------------------------
>>>
>>>  The second parameter passed to the application object is a callable
>>> -of the form ``start_response(status,response_headers,exc_info=None)``.
>>> +of the form ``start_response(status, response_headers, exc_info=None)``.
>>>  (As with all WSGI callables, the arguments must be supplied
>>>  positionally, not by keyword.)  The ``start_response`` callable is
>>>  used to begin the HTTP response, and it must return a
>>>  ``write(body_data)`` callable (see the `Buffering and Streaming`_
>>> -section, below).
>>> +section, below). Values passed to the ``write(body_data)`` callable
>>> +should be byte strings. Where native strings are unicode strings, a
>>> +native strings type can also be supplied, in which case it would be
>>> +encoded as ISO-8859-1.
>>>
>>>  The ``status`` argument is an HTTP "status" string like ``"200 OK"``
>>>  or ``"404 Not Found"``.  That is, it is a string consisting of a
>>> @@ -705,14 +737,20 @@
>>>  single space, with no surrounding whitespace or other characters.
>>>  (See RFC 2616, Section 6.1.1 for more information.)  The string
>>>  **must not** contain control characters, and must not be terminated
>>> -with a carriage return, linefeed, or combination thereof.
>>> +with a carriage return, linefeed, or combination thereof. This
>>> +value should be a byte string. Where native strings are unicode
>>> +strings, the native string type can also be returned, in which
>>> +case it would be encoded as ISO-8859-1.
>>>
>>>  The ``response_headers`` argument is a list of ``(header_name,
>>>  header_value)`` tuples.  It must be a Python list; i.e.
>>> -``type(response_headers) is ListType``, and the server **may** change
>>> +``type(response_headers) is list``, and the server **may** change
>>>  its contents in any way it desires.  Each ``header_name`` must be a
>>>  valid HTTP header field-name (as defined by RFC 2616, Section 4.2),
>>> -without a trailing colon or other punctuation.
>>> +without a trailing colon or other punctuation. Both the header_name
>>> +and the header_value should be byte strings. Where native strings
>>> +are unicode strings, the native string type can also be returned,
>>> +in which case it would be encoded as ISO-8859-1.
>>>
>>>  Each ``header_value`` **must not** include *any* control characters,
>>>  including carriage returns or linefeeds, either embedded or at the end.
>>> @@ -809,6 +847,14 @@
>>>  Handling the ``Content-Length`` Header
>>>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>
>>> +If an application or middleware layer chooses to return a
>>> +Content-Length header, it should not return more data than specified
>>> +by the header value. Any wrapping middleware layer should not
>>> +consume more data than specified in the header value from the
>>> +wrapped component (either middleware or application). Any WSGI
>>> +adapter must similarly not pass on data above what the
>>> +Content-Length response header value defines.
>>> +
>>>  If the application does not supply a ``Content-Length`` header, a
>>>  server or gateway may choose one of several approaches to handling
>>>  it.  The simplest of these is to close the client connection when
>>> @@ -1569,55 +1615,13 @@
>>>    developers.
>>>
>>>
>>> -Proposed/Under Discussion
>>> -=========================
>>> -
>>> -These items are currently being discussed on the Web-SIG and elsewhere,
>>> -or are on the PEP author's "to-do" list:
>>> -
>>> -* Should ``wsgi.input`` be an iterator instead of a file?  This would
>>> -  help for asynchronous applications and chunked-encoding input
>>> -  streams.
>>> -
>>> -* Optional extensions are being discussed for pausing iteration of an
>>> -  application's ouptut until input is available or until a callback
>>> -  occurs.
>>> -
>>> -* Add a section about synchronous vs. asynchronous apps and servers,
>>> -  the relevant threading models, and issues/design goals in these
>>> -  areas.
>>> -
>>> -
>>>  Acknowledgements
>>>  ================
>>>
>>> -Thanks go to the many folks on the Web-SIG mailing list whose
>>> -thoughtful feedback made this revised draft possible.  Especially:
>>> +Thanks go to many folks on the Web-SIG mailing list for helping the work
>>> +on clarifying and improving this specification. In particular:
>>>
>>> -* Gregory "Grisha" Trubetskoy, author of ``mod_python``, who beat up
>>> -  on the first draft as not offering any advantages over "plain old
>>> -  CGI", thus encouraging me to look for a better approach.
>>> -
>>> -* Ian Bicking, who helped nag me into properly specifying the
>>> -  multithreading and multiprocess options, as well as badgering me to
>>> -  provide a mechanism for servers to supply custom extension data to
>>> -  an application.
>>> -
>>> -* Tony Lownds, who came up with the concept of a ``start_response``
>>> -  function that took the status and headers, returning a ``write``
>>> -  function.  His input also guided the design of the exception handling
>>> -  facilities, especially in the area of allowing for middleware that
>>> -  overrides application error messages.
>>> -
>>> -* Alan Kennedy, whose courageous attempts to implement WSGI-on-Jython
>>> -  (well before the spec was finalized) helped to shape the "supporting
>>> -  older versions of Python" section, as well as the optional
>>> -  ``wsgi.file_wrapper`` facility.
>>> -
>>> -* Mark Nottingham, who reviewed the spec extensively for issues with
>>> -  HTTP RFC compliance, especially with regard to HTTP/1.1 features that
>>> -  I didn't even know existed until he pointed them out.
>>> -
>>> +* Phillip J. Eby, for writing/editing the 1.0 specification.
>>>
>>>  References
>>>  ==========
>>> @@ -1643,8 +1647,6 @@
>>>
>>>  This document has been placed in the public domain.
>>>
>>> -
>>> -
>>>  ..
>>>    Local Variables:
>>>    mode: indented-text
>>>
>>
> _______________________________________________
> Web-SIG mailing list
> [hidden email]
> Web SIG: http://www.python.org/sigs/web-sig
> Unsubscribe: http://mail.python.org/mailman/options/web-sig/paul.joseph.davis%40gmail.com
>

Re: Draft PEP: WSGI 1.1

Graham Dumpleton-2
On 16 April 2010 13:29, Paul Davis <[hidden email]> wrote:

> On Thu, Apr 15, 2010 at 10:08 PM, Graham Dumpleton
> <[hidden email]> wrote:
>> On 16 April 2010 11:41, Graham Dumpleton <[hidden email]> wrote:
>>> I haven't read what you have done yet
>>
>> And still haven't. Don't know when I will get a chance to do so.
>>
>> Two points from a quick scan of emails.
>>
>> 1. The following section of PEP needs to be updated:
>>
>> """
>> Apart from the handling of ``close()``, the semantics of returning a
>> file wrapper from the application should be the same as if the
>> application had returned ``iter(filelike.read, '')``.  In other words,
>> transmission should begin at the current position within the "file"
>> at the time that transmission begins, and continue until the end is
>> reached.
>> """
>>
>> It can't say to read until the end of the file is reached, because
>> Content-Length must be honoured: less data is returned if
>> Content-Length is smaller than what remains in the file, as per
>> descriptive changes (3) and (4).
>>
>> 2. As for the question about readline() arguments and whether -1 or
>> None is allowed: I would say they are not. The argument must be a
>> positive integer, or not supplied at all.
>>
>> Different implementations use -1 or None as the value of a default
>> argument to detect that no argument was supplied. One can't rely on
>> either being used, though, so supplying those values explicitly is
>> not guaranteed to mean the same thing as supplying no argument. In
>> other words, supplying anything but a positive integer or no argument
>> at all is undefined.
>>
>> The same issue arises with read(), except that technically only a
>> positive integer can be supplied and the argument is not optional.
>> Although any implementation which implements wsgi.input as a proper
>> file-like object is going to accept no argument to mean read all
>> input, this is outside of the WSGI specification and calling with no
>> argument is undefined.
>>
>> Graham
>
> I happened to have just started hitting the body reading functions on
> an HTTP parser I've been working on. I'd be interested to hear a
> response on what happens when the various read functions are called
> with a size hint of zero.
>
> I realize that zero is not a positive integer but I'm not quite sure
> what the recommended return value would be. I can see None and -1
> being obvious flags for "no size hint", but zero is a tad weird. I
> want to say that it'd either return "" (which could sorta kinda
> violate #2) or raise an exception. I really haven't got any reason to
> prefer one over the other though.

I almost mentioned 0 as argument in my previous email, but I got a bit
scared off by it also.

In all these things, one has to be guided by what a standard file-like
object does in Python, i.e.:

>>> import sys
>>> sys.stdin.read(0)
''

So, although an empty string would normally indicate no more content
can be read, an argument of 0 has to be seen as a special exception to
that rule, with no choice but that an empty string is returned.
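
The same can be checked on an in-memory stream. A minimal sketch,
assuming io.BytesIO as a stand-in for wsgi.input (a real server
supplies its own stream object, and the sample body is made up):

```python
import io

# io.BytesIO stands in for wsgi.input here (an assumption for the demo).
stream = io.BytesIO(b"name=value")

# A size of 0 gives an empty byte string even though data remains.
zero_read = stream.read(0)

# Only an empty result from a positive-size read signals end of input.
body = stream.read(10)
at_end = stream.read(1)
```

So a caller that treats every empty result as end of input would be
wrong only if it ever asks for 0 bytes itself.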

Graham

> As an aside, I think that "honoring Content-Length" should probably be
> rephrased as "middleware should not break HTTP", coupled with a page
> that lists common ways that middleware breaks HTTP. I reckon it's the
> same reasoning as for 333's dictation that hop-by-hop headers are server
> only, though there are plenty of other ways I could violate RFC 2616
> as a middleware author without violating WSGI. Pie in the sky, the
> common ways would be included with wsgiref's validate decorator.
>
> Paul
>
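
On the Content-Length point, a rough sketch of what descriptive changes
(3) and (4) ask of a middleware or adapter might look like the
following. The helper name limit_to_content_length and the sample
chunks are made up for illustration; nothing like it is in the PEP.

```python
def limit_to_content_length(chunks, content_length):
    """Yield byte chunks, never passing on more than content_length bytes."""
    remaining = content_length
    for chunk in chunks:
        if remaining <= 0:
            # The declared length has been sent; stop before emitting more.
            break
        if len(chunk) > remaining:
            # Truncate the chunk that would overrun the declared length.
            chunk = chunk[:remaining]
        remaining -= len(chunk)
        yield chunk

# Hypothetical application output that overruns a declared length of 8.
sent = list(limit_to_content_length([b"hello ", b"world", b"!"], 8))
```

Note that this only caps the output; it does nothing about an
application that returns less than it declared.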
>>> but if you have done so
>>> already, ensure you read:
>>>
>>>  http://bitbucket.org/ianb/wsgi-peps/src/
>>>
>>> This is Ian's and Armin's previous go at a new specification, though
>>> it tried to go further than what you are doing.
>>>
>>> Also read:
>>>
>>>  http://blog.dscpl.com.au/2009/09/roadmap-for-python-wsgi-specification.html
>>>
>>> I explain what I mean by native strings in that.
>>>
>>> Graham
>>>
>>> On 15 April 2010 22:54, Dirkjan Ochtman <[hidden email]> wrote:
>>>> Mostly taking Graham's list of issues and incorporating it into PEP 333.
>>>>
>>>> Latest revision: http://hg.xavamedia.nl/peps/file/tip/wsgi-1.1.txt
>>>>
>>>> Let's have comments here (comments in the form of diffs are
>>>> particularly welcome, of course). Remember, the idea is not to change
>>>> or improve WSGI right now, but only to improve the spec, improving
>>>> interoperability and enabling Python 3 support.
>>>>
>>>> Graham, I hope I did a good job with your suggestions. (Since so much
>>>> of this is yours, I've just listed you as the second author.) I tried
>>>> to clarify exactly what you meant by "native strings", can you check
>>>> that out?
>>>>
>>>> Cheers,
>>>>
>>>> Dirkjan
>>>>
>>>> --- pep-0333.txt        2010-04-15 14:46:02.000000000 +0200
>>>> +++ wsgi-1.1.txt        2010-04-15 14:51:39.000000000 +0200
>>>> @@ -1,114 +1,124 @@
>>>> -PEP: 333
>>>> -Title: Python Web Server Gateway Interface v1.0
>>>> +PEP: 0000
>>>> +Title: Python Web Server Gateway Interface 1.1
>>>>  Version: $Revision$
>>>>  Last-Modified: $Date$
>>>> -Author: Phillip J. Eby <[hidden email]>
>>>> +Author: Dirkjan Ochtman <[hidden email]>,
>>>> +        Graham Dumpleton <[hidden email]>
>>>>  Discussions-To: Python Web-SIG <[hidden email]>
>>>>  Status: Draft
>>>>  Type: Informational
>>>>  Content-Type: text/x-rst
>>>> -Created: 07-Dec-2003
>>>> -Post-History: 07-Dec-2003, 08-Aug-2004, 20-Aug-2004, 27-Aug-2004
>>>> +Created: 15-04-2010
>>>> +Post-History: Not yet
>>>>
>>>>
>>>>  Abstract
>>>>  ========
>>>>
>>>> -This document specifies a proposed standard interface between web
>>>> -servers and Python web applications or frameworks, to promote web
>>>> -application portability across a variety of web servers.
>>>> +This document specifies a revision of the proposed standard interface
>>>> +between web servers and Python web applications or frameworks, to
>>>> +promote web application portability across a variety of web servers.
>>>>
>>>>
>>>>  Rationale and Goals
>>>>  ===================
>>>>
>>>> -Python currently boasts a wide variety of web application frameworks,
>>>> -such as Zope, Quixote, Webware, SkunkWeb, PSO, and Twisted Web -- to
>>>> -name just a few [1]_.  This wide variety of choices can be a problem
>>>> -for new Python users, because generally speaking, their choice of web
>>>> -framework will limit their choice of usable web servers, and vice
>>>> -versa.
>>>> -
>>>> -By contrast, although Java has just as many web application frameworks
>>>> -available, Java's "servlet" API makes it possible for applications
>>>> -written with any Java web application framework to run in any web
>>>> -server that supports the servlet API.
>>>> -
>>>> -The availability and widespread use of such an API in web servers for
>>>> -Python -- whether those servers are written in Python (e.g. Medusa),
>>>> -embed Python (e.g. mod_python), or invoke Python via a gateway
>>>> -protocol (e.g. CGI, FastCGI, etc.) -- would separate choice of
>>>> -framework from choice of web server, freeing users to choose a pairing
>>>> -that suits them, while freeing framework and server developers to
>>>> -focus on their preferred area of specialization.
>>>> -
>>>> -This PEP, therefore, proposes a simple and universal interface between
>>>> -web servers and web applications or frameworks: the Python Web Server
>>>> -Gateway Interface (WSGI).
>>>> -
>>>> -But the mere existence of a WSGI spec does nothing to address the
>>>> -existing state of servers and frameworks for Python web applications.
>>>> -Server and framework authors and maintainers must actually implement
>>>> -WSGI for there to be any effect.
>>>> -
>>>> -However, since no existing servers or frameworks support WSGI, there
>>>> -is little immediate reward for an author who implements WSGI support.
>>>> -Thus, WSGI **must** be easy to implement, so that an author's initial
>>>> -investment in the interface can be reasonably low.
>>>> -
>>>> -Thus, simplicity of implementation on *both* the server and framework
>>>> -sides of the interface is absolutely critical to the utility of the
>>>> -WSGI interface, and is therefore the principal criterion for any
>>>> -design decisions.
>>>> -
>>>> -Note, however, that simplicity of implementation for a framework
>>>> -author is not the same thing as ease of use for a web application
>>>> -author.  WSGI presents an absolutely "no frills" interface to the
>>>> -framework author, because bells and whistles like response objects and
>>>> -cookie handling would just get in the way of existing frameworks'
>>>> -handling of these issues.  Again, the goal of WSGI is to facilitate
>>>> -easy interconnection of existing servers and applications or
>>>> -frameworks, not to create a new web framework.
>>>> -
>>>> -Note also that this goal precludes WSGI from requiring anything that
>>>> -is not already available in deployed versions of Python.  Therefore,
>>>> -new standard library modules are not proposed or required by this
>>>> -specification, and nothing in WSGI requires a Python version greater
>>>> -than 2.2.2.  (It would be a good idea, however, for future versions
>>>> -of Python to include support for this interface in web servers
>>>> -provided by the standard library.)
>>>> -
>>>> -In addition to ease of implementation for existing and future
>>>> -frameworks and servers, it should also be easy to create request
>>>> -preprocessors, response postprocessors, and other WSGI-based
>>>> -"middleware" components that look like an application to their
>>>> -containing server, while acting as a server for their contained
>>>> -applications.
>>>> -
>>>> -If middleware can be both simple and robust, and WSGI is widely
>>>> -available in servers and frameworks, it allows for the possibility
>>>> -of an entirely new kind of Python web application framework: one
>>>> -consisting of loosely-coupled WSGI middleware components.  Indeed,
>>>> -existing framework authors may even choose to refactor their
>>>> -frameworks' existing services to be provided in this way, becoming
>>>> -more like libraries used with WSGI, and less like monolithic
>>>> -frameworks.  This would then allow application developers to choose
>>>> -"best-of-breed" components for specific functionality, rather than
>>>> -having to commit to all the pros and cons of a single framework.
>>>> -
>>>> -Of course, as of this writing, that day is doubtless quite far off.
>>>> -In the meantime, it is a sufficient short-term goal for WSGI to
>>>> -enable the use of any framework with any server.
>>>> -
>>>> -Finally, it should be mentioned that the current version of WSGI
>>>> -does not prescribe any particular mechanism for "deploying" an
>>>> -application for use with a web server or server gateway.  At the
>>>> -present time, this is necessarily implementation-defined by the
>>>> -server or gateway.  After a sufficient number of servers and
>>>> -frameworks have implemented WSGI to provide field experience with
>>>> -varying deployment requirements, it may make sense to create
>>>> -another PEP, describing a deployment standard for WSGI servers and
>>>> -application frameworks.
>>>> +WSGI 1.0, specified in PEP 333, did a great job in making it easier
>>>> +for web applications and web servers to interface with each other.
>>>> +It has become very much the standard it was meant to be and an
>>>> +important part of the Python web development infrastructure.
>>>> +
>>>> +After several implementations were built by different developers,
>>>> +it inevitably turned out that the specification wasn't perfect. It
>>>> +left out some details that were implemented by all the web server
>>>> +interfaces because they were critical for many applications (or
>>>> +application frameworks). Additionally, the specification was written
>>>> +before Python 3.x was specified, resulting in a lack of clear
>>>> +specification on what to do with unicode strings.
>>>> +
>>>> +While there are some ideas around to improve WSGI further in less
>>>> +compatible ways, we feel that there is value to be had in first
>>>> +specifying a minor revision of the specification, which is largely
>>>> +compatible with existing implementations. Further simplification
>>>> +and experimentation are therefore deferred to a 2.0 version.
>>>> +
>>>> +
>>>> +Differences with WSGI 1.0
>>>> +=========================
>>>> +
>>>> +Descriptive changes
>>>> +-------------------
>>>> +
>>>> +The following changes were made to realign the spec with
>>>> +implementations 'in the wild'.
>>>> +
>>>> +1. The 'readline()' function of 'wsgi.input' must optionally take
>>>> +   a size hint. This is required because many applications use
>>>> +   cgi.FieldStorage, which uses this functionality.
>>>> +
>>>> +2. The 'wsgi.input' functions for reading input must return an empty
>>>> +   string as end of input stream marker. This is required for support
>>>> +   of HTTP 1.1 request pipelining. A correctly implemented WSGI
>>>> +   middleware already has to cope with an empty string as end
>>>> +   sentinel anyway to detect premature end of input.
>>>> +
>>>> +3. Any WSGI application or middleware should not itself return, or
>>>> +   consume from a wrapped WSGI component, more data than specified by
>>>> +   the Content-Length response header if defined. Middleware that
>>>> +   does this is arguably broken and can generate incorrect data.
>>>> +   This is just a clarification of obligations.
>>>> +
>>>> +4. The WSGI adapter must not pass on to the server any data above
>>>> +   what the Content-Length response header defines, if supplied.
>>>> +   Doing this is technically a violation of HTTP. This is another
>>>> +   clarification of obligations.
>>>> +
>>>> +
>>>> +String handling changes
>>>> +-----------------------
>>>> +
>>>> +The following changes were made to make WSGI work on Python 3.x.
>>>> +
>>>> +1. The application is passed an instance of a Python dictionary
>>>> +   containing what is referred to as the WSGI environment. All keys
>>>> +   in this dictionary are native strings. For CGI variables, all names
>>>> +   are going to be ISO-8859-1 and so where native strings are
>>>> +   unicode strings, that encoding is used for the names of CGI
>>>> +   variables.
>>>> +
>>>> +2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI
>>>> +   environment, the value of the variable should be a native string.
>>>> +
>>>> +3. For the CGI variables contained in the WSGI environment, the values
>>>> +   of the variables are native strings. Where native strings are
>>>> +   unicode strings, ISO-8859-1 encoding would be used such that the
>>>> +   original character data is preserved and as necessary the unicode
>>>> +   string can be converted back to bytes and thence decoded to unicode
>>>> +   again using a different encoding.
>>>> +
>>>> +4. The WSGI input stream 'wsgi.input' contained in the WSGI environment
>>>> +   and from which request content is read, should yield byte strings.
>>>> +
>>>> +5. The status line specified by the WSGI application should be a byte
>>>> +   string. Where native strings are unicode strings, the native string
>>>> +   type can also be returned in which case it would be encoded as
>>>> +   ISO-8859-1.
>>>> +
>>>> +6. The list of response headers specified by the WSGI application should
>>>> +   contain tuples consisting of two values, where each value is a byte
>>>> +   string. Where native strings are unicode strings, the native string
>>>> +   type can also be returned in which case it would be encoded as
>>>> +   ISO-8859-1.
>>>> +
>>>> +7. The iterable returned by the application and from which response
>>>> +   content is derived, should yield byte strings. Where native strings
>>>> +   are unicode strings, the native string type can also be returned in
>>>> +   which case it would be encoded as ISO-8859-1.
>>>> +
>>>> +8. The value passed to the 'write()' callback returned by
>>>> +   'start_response()' should be a byte string. Where native strings
>>>> +   are unicode strings, a native string type can also be supplied, in
>>>> +   which case it would be encoded as ISO-8859-1.
>>>>
>>>>
>>>>  Specification Overview
>>>> @@ -447,6 +457,13 @@
>>>>  Streaming`_ section below for more on how application output must be
>>>>  handled.)
>>>>
>>>> +Further on, several places specify constraints upon string types used
>>>> +in the WSGI API. The term native string is used to mean the 'str' class
>>>> +in both Python 2.x and 3.x. The spec tries to ensure optimal
>>>> +compatibility and ease of use by allowing implementations running on
>>>> +Python 3.x to encode strings (which are Unicode strings with no
>>>> +specified encoding) as ISO-8859-1 where a 3.x string is passed in.
>>>> +
>>>>  The server or gateway should treat the yielded strings as binary byte
>>>>  sequences: in particular, it should ensure that line endings are
>>>>  not altered.  The application is responsible for ensuring that the
>>>> @@ -489,12 +506,22 @@
>>>>  ``environ`` Variables
>>>>  ---------------------
>>>>
>>>> +All keys in this dictionary are native strings. For CGI variables,
>>>> +all names are going to be ISO-8859-1 and so where native strings are
>>>> +unicode strings, that encoding is used for the names of CGI variables.
>>>> +
>>>>  The ``environ`` dictionary is required to contain these CGI
>>>>  environment variables, as defined by the Common Gateway Interface
>>>>  specification [2]_.  The following variables **must** be present,
>>>>  unless their value would be an empty string, in which case they
>>>>  **may** be omitted, except as otherwise noted below.
>>>>
>>>> +The values for CGI variables are native strings. Where native strings
>>>> +are unicode strings, ISO-8859-1 encoding would be used such that the
>>>> +original character data is preserved and as necessary the unicode
>>>> +string can be converted back to bytes and thence decoded to unicode
>>>> +again using a different encoding.
>>>> +
>>>>  ``REQUEST_METHOD``
>>>>   The HTTP request method, such as ``"GET"`` or ``"POST"``.  This
>>>>   cannot ever be an empty string, and so is always required.
>>>> @@ -575,13 +602,14 @@
>>>>  =====================  ===============================================
>>>>  Variable               Value
>>>>  =====================  ===============================================
>>>> -``wsgi.version``       The tuple ``(1,0)``, representing WSGI
>>>> +``wsgi.version``       The tuple ``(1, 0)``, representing WSGI
>>>>                        version 1.0.
>>>>
>>>>  ``wsgi.url_scheme``    A string representing the "scheme" portion of
>>>>                        the URL at which the application is being
>>>>                        invoked.  Normally, this will have the value
>>>> -                       ``"http"`` or ``"https"``, as appropriate.
>>>> +                       ``"http"`` or ``"https"``, as appropriate. The
>>>> +                       value is a native string.
>>>>
>>>>  ``wsgi.input``         An input stream (file-like object) from which
>>>>                        the HTTP request body can be read.  (The server
>>>> @@ -646,7 +674,7 @@
>>>>  Method               Stream      Notes
>>>>  ===================  ==========  ========
>>>>  ``read(size)``       ``input``   1
>>>> -``readline()``       ``input``   1,2
>>>> +``readline(hint)``   ``input``   1,2
>>>>  ``readlines(hint)``  ``input``   1,3
>>>>  ``__iter__()``       ``input``
>>>>  ``flush()``          ``errors``  4
>>>> @@ -661,11 +689,12 @@
>>>>    ``Content-Length``, and is allowed to simulate an end-of-file
>>>>    condition if the application attempts to read past that point.
>>>>    The application **should not** attempt to read more data than is
>>>> -   specified by the ``CONTENT_LENGTH`` variable.
>>>> +   specified by the ``CONTENT_LENGTH`` variable. All read functions
>>>> +   are required to return an empty string as the end of input stream
>>>> +   marker. They must yield byte strings.
>>>>
>>>> -2. The optional "size" argument to ``readline()`` is not supported,
>>>> -   as it may be complex for server authors to implement, and is not
>>>> -   often used in practice.
>>>> +2. The optional "size" argument to ``readline()`` is required for
>>>> +   the implementer, but optional for callers.
>>>>
>>>>  3. Note that the ``hint`` argument to ``readlines()`` is optional for
>>>>    both caller and implementer.  The application is free not to
>>>> @@ -692,12 +721,15 @@
>>>>  ---------------------------------
>>>>
>>>>  The second parameter passed to the application object is a callable
>>>> -of the form ``start_response(status,response_headers,exc_info=None)``.
>>>> +of the form ``start_response(status, response_headers, exc_info=None)``.
>>>>  (As with all WSGI callables, the arguments must be supplied
>>>>  positionally, not by keyword.)  The ``start_response`` callable is
>>>>  used to begin the HTTP response, and it must return a
>>>>  ``write(body_data)`` callable (see the `Buffering and Streaming`_
>>>> -section, below).
>>>> +section, below). Values passed to the ``write(body_data)`` callable
>>>> +should be byte strings. Where native strings are unicode strings, a
>>>> +native strings type can also be supplied, in which case it would be
>>>> +encoded as ISO-8859-1.
>>>>
>>>>  The ``status`` argument is an HTTP "status" string like ``"200 OK"``
>>>>  or ``"404 Not Found"``.  That is, it is a string consisting of a
>>>> @@ -705,14 +737,20 @@
>>>>  single space, with no surrounding whitespace or other characters.
>>>>  (See RFC 2616, Section 6.1.1 for more information.)  The string
>>>>  **must not** contain control characters, and must not be terminated
>>>> -with a carriage return, linefeed, or combination thereof.
>>>> +with a carriage return, linefeed, or combination thereof. This
>>>> +value should be a byte string. Where native strings are unicode
>>>> +strings, the native string type can also be returned, in which
>>>> +case it would be encoded as ISO-8859-1.
>>>>
>>>>  The ``response_headers`` argument is a list of ``(header_name,
>>>>  header_value)`` tuples.  It must be a Python list; i.e.
>>>> -``type(response_headers) is ListType``, and the server **may** change
>>>> +``type(response_headers) is list``, and the server **may** change
>>>>  its contents in any way it desires.  Each ``header_name`` must be a
>>>>  valid HTTP header field-name (as defined by RFC 2616, Section 4.2),
>>>> -without a trailing colon or other punctuation.
>>>> +without a trailing colon or other punctuation. Both the header_name
>>>> +and the header_value should be byte strings. Where native strings
>>>> +are unicode strings, the native string type can also be returned,
>>>> +in which case it would be encoded as ISO-8859-1.
>>>>
>>>>  Each ``header_value`` **must not** include *any* control characters,
>>>>  including carriage returns or linefeeds, either embedded or at the end.
>>>> @@ -809,6 +847,14 @@
>>>>  Handling the ``Content-Length`` Header
>>>>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>
>>>> +If an application or middleware layer chooses to return a
>>>> +Content-Length header, it should not return more data than specified
>>>> +by the header value. Any wrapping middleware layer should not
>>>> +consume more data than specified in the header value from the
>>>> +wrapped component (either middleware or application). Any WSGI
>>>> +adapter must similarly not pass on data above what the
>>>> +Content-Length response header value defines.
>>>> +
>>>>  If the application does not supply a ``Content-Length`` header, a
>>>>  server or gateway may choose one of several approaches to handling
>>>>  it.  The simplest of these is to close the client connection when
>>>> @@ -1569,55 +1615,13 @@
>>>>    developers.
>>>>
>>>>
>>>> -Proposed/Under Discussion
>>>> -=========================
>>>> -
>>>> -These items are currently being discussed on the Web-SIG and elsewhere,
>>>> -or are on the PEP author's "to-do" list:
>>>> -
>>>> -* Should ``wsgi.input`` be an iterator instead of a file?  This would
>>>> -  help for asynchronous applications and chunked-encoding input
>>>> -  streams.
>>>> -
>>>> -* Optional extensions are being discussed for pausing iteration of an
>>>> -  application's ouptut until input is available or until a callback
>>>> -  occurs.
>>>> -
>>>> -* Add a section about synchronous vs. asynchronous apps and servers,
>>>> -  the relevant threading models, and issues/design goals in these
>>>> -  areas.
>>>> -
>>>> -
>>>>  Acknowledgements
>>>>  ================
>>>>
>>>> -Thanks go to the many folks on the Web-SIG mailing list whose
>>>> -thoughtful feedback made this revised draft possible.  Especially:
>>>> +Thanks go to many folks on the Web-SIG mailing list for helping the work
>>>> +on clarifying and improving this specification. In particular:
>>>>
>>>> -* Gregory "Grisha" Trubetskoy, author of ``mod_python``, who beat up
>>>> -  on the first draft as not offering any advantages over "plain old
>>>> -  CGI", thus encouraging me to look for a better approach.
>>>> -
>>>> -* Ian Bicking, who helped nag me into properly specifying the
>>>> -  multithreading and multiprocess options, as well as badgering me to
>>>> -  provide a mechanism for servers to supply custom extension data to
>>>> -  an application.
>>>> -
>>>> -* Tony Lownds, who came up with the concept of a ``start_response``
>>>> -  function that took the status and headers, returning a ``write``
>>>> -  function.  His input also guided the design of the exception handling
>>>> -  facilities, especially in the area of allowing for middleware that
>>>> -  overrides application error messages.
>>>> -
>>>> -* Alan Kennedy, whose courageous attempts to implement WSGI-on-Jython
>>>> -  (well before the spec was finalized) helped to shape the "supporting
>>>> -  older versions of Python" section, as well as the optional
>>>> -  ``wsgi.file_wrapper`` facility.
>>>> -
>>>> -* Mark Nottingham, who reviewed the spec extensively for issues with
>>>> -  HTTP RFC compliance, especially with regard to HTTP/1.1 features that
>>>> -  I didn't even know existed until he pointed them out.
>>>> -
>>>> +* Phillip J. Eby, for writing/editing the 1.0 specification.
>>>>
>>>>  References
>>>>  ==========
>>>> @@ -1643,8 +1647,6 @@
>>>>
>>>>  This document has been placed in the public domain.
>>>>
>>>> -
>>>> -
>>>>  ..
>>>>    Local Variables:
>>>>    mode: indented-text
>>>>
>>>

Re: Draft PEP: WSGI 1.1

Graham Dumpleton-2
On 16 April 2010 15:19, Paul J Davis <[hidden email]> wrote:

>
>
> On Apr 15, 2010, at 11:53 PM, Graham Dumpleton <[hidden email]> wrote:
>
>> On 16 April 2010 13:29, Paul Davis <[hidden email]> wrote:
>>> On Thu, Apr 15, 2010 at 10:08 PM, Graham Dumpleton
>>> <[hidden email]> wrote:
>>>> On 16 April 2010 11:41, Graham Dumpleton <[hidden email]> wrote:
>>>>> I haven't read what you have done yet
>>>>
>>>> And still haven't. Don't know when I will get a chance to do so.
>>>>
>>>> Two points from a quick scan of emails.
>>>>
>>>> 1. The following section of PEP needs to be updated:
>>>>
>>>> """
>>>> Apart from the handling of ``close()``, the semantics of returning a
>>>> file wrapper from the application should be the same as if the
>>>> application had returned ``iter(filelike.read, '')``.  In other words,
>>>> transmission should begin at the current position within the "file"
>>>> at the time that transmission begins, and continue until the end is
>>>> reached.
>>>> """
>>>>
>>>> It can't say to read until the end of the file is reached, because
>>>> Content-Length must be honoured: less must be returned if Content-Length
>>>> is less than what remains in the file, as per descriptive changes
>>>> (3) and (4).
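
To make the point concrete, here is a minimal sketch of a file wrapper iterator that stops at Content-Length instead of reading to the end of the file. The name iter_file_limited and its block_size parameter are hypothetical and appear in neither PEP 333 nor the draft; this only illustrates the behaviour described above.

```python
import io


def iter_file_limited(filelike, content_length, block_size=8192):
    """Yield blocks from filelike, starting at its current position,
    but never more than content_length bytes in total.

    Hypothetical helper: not part of PEP 333 or the WSGI 1.1 draft.
    """
    remaining = content_length
    while remaining > 0:
        # Never ask for more than is still allowed by Content-Length.
        data = filelike.read(min(block_size, remaining))
        if not data:
            # The file ended before Content-Length was reached.
            break
        remaining -= len(data)
        yield data


# Usage: the file holds 10 bytes, transmission starts at offset 2,
# and Content-Length is 5, so only bytes 2 to 6 are sent.
f = io.BytesIO(b"0123456789")
f.seek(2)
blocks = list(iter_file_limited(f, 5, block_size=2))
```

Whatever is left in the file after those bytes is simply not read.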
>>>>
>>>> 2. Regarding the question about readline() arguments and whether -1 or
>>>> None is allowed: I would say they are not. The argument must be a
>>>> positive integer, or not supplied at all.
>>>>
>>>> Different implementations use -1 or None as the default value of the
>>>> argument to detect that no argument was supplied. One can't rely on
>>>> either being used, though, so supplying those values explicitly does
>>>> not necessarily mean the same thing as supplying no argument. In
>>>> other words, supplying anything but a positive integer or no argument
>>>> at all is undefined.
>>>>
>>>> The same issue arises with read(), except that only a positive integer
>>>> can technically be supplied and the argument is not optional. Although
>>>> any implementation which implements wsgi.input as a proper file-like
>>>> object is going to accept no argument to mean read all input, this
>>>> is outside of the WSGI specification and calling with no argument is
>>>> undefined.
>>>>
>>>> Graham
>>>
>>> I happened to have just started hitting the body reading functions on
>>> an HTTP parser I've been working on. I'd be interested to hear a
>>> response on what happens when the various read functions are called
>>> with a size hint of zero.
>>>
>>> I realize that zero is not a positive integer but I'm not quite sure
>>> what the recommended return value would be. I can see None and -1
>>> being obvious flags for "no size hint", but zero is a tad weird. I
>>> want to say that it'd either return "" (which could sorta kinda
>>> violate #2) or raise an exception. I really haven't got any reason to
>>> prefer one over the other though.
>>
>> I almost mentioned 0 as argument in my previous email, but I got a bit
>> scared off by it also.
>>
>> In all these things, one has to be guided by what a standard file like
>> object does in Python. Ie.,
>>
>> >>> import sys
>> >>> sys.stdin.read(0)
>> ''
>>
>> So, although an empty string would normally indicate no more content
>> can be read, an argument of 0 has to be seen as a special exception to
>> that rule, with no choice but that empty string is returned.
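
The size-argument rules discussed so far can be sketched as a wrapper around the raw request stream. This is only an illustration under assumptions: the class name StrictInput is hypothetical, and raising ValueError for -1 or None is one possible choice, since the discussion only says that such calls are undefined.

```python
import io


class StrictInput:
    """Hypothetical wsgi.input wrapper: bounded by CONTENT_LENGTH,
    with read(size) mandatory and the readline() size hint optional."""

    def __init__(self, stream, content_length):
        self._stream = stream
        self._remaining = content_length

    @staticmethod
    def _check(size):
        # Anything other than a non-negative integer is rejected here;
        # the discussion above only calls such values undefined.
        if isinstance(size, bool) or not isinstance(size, int) or size < 0:
            raise ValueError("size must be a non-negative integer")

    def read(self, size):
        self._check(size)
        # read(0) falls through to the stream and yields an empty string,
        # matching what a standard file-like object does.
        data = self._stream.read(min(size, self._remaining))
        self._remaining -= len(data)
        return data

    def readline(self, *args):
        if len(args) > 1:
            raise TypeError("readline() takes at most one argument")
        limit = self._remaining
        if args:
            self._check(args[0])
            limit = min(args[0], limit)
        data = self._stream.readline(limit)
        self._remaining -= len(data)
        return data


# Usage: the request body is 6 bytes; the rest belongs to a later request.
body = StrictInput(io.BytesIO(b"ab\ncd\nEXTRA"), 6)
first = body.read(0)        # empty, but not end of input
line = body.readline()      # first line of the body
```

Once the 6 bytes have been consumed, every read call returns an empty string as the end of input marker.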
>>
>> Graham
>>
>
> I'm inclined to agree. As a quick follow-up that's semi-tangentially related: what about the case where an app doesn't consume the entire request body in a keep-alive context? I've been running on the assumption that the server would discard it, but I wonder whether silence could cause more confusion than an exception, or worse, enable some sort of attack based on sneaking in a hidden request. I only mention it because of the chunked-encoding-might-mean-zero-length-body assumption.

The underlying HTTP server, or WSGI server if they are one and the
same, should really ensure that any request content not consumed is
read in and discarded if it is going to allow a follow on request.

From memory, wsgiref on top of Python's basic HTTP server doesn't do this.

Not sure whether this obligation should be part of specification given
that it is really an HTTP thing and if an HTTP server implementing
HTTP/1.1 doesn't do that, it is arguably broken.
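
As a rough sketch of what reading in and discarding means for a request with a known Content-Length (the helper name discard_unread_body is made up for illustration, and chunked request bodies are not handled):

```python
import io


def discard_unread_body(stream, remaining, block_size=65536):
    """Read and throw away up to `remaining` unread request body bytes.

    Returns the number of bytes actually discarded. Hypothetical helper,
    assuming the server tracks how much of the body the app consumed.
    """
    discarded = 0
    while discarded < remaining:
        data = stream.read(min(block_size, remaining - discarded))
        if not data:
            # The client closed the connection before sending everything.
            break
        discarded += len(data)
    return discarded


# Usage: an 8 byte body followed by the next request on the same connection.
conn = io.BytesIO(b"BODYBODYGET / HTTP/1.1\r\n")
conn.read(3)                                  # the app only read 3 bytes
skipped = discard_unread_body(conn, 8 - 3)    # server drains the other 5
next_request = conn.read(3)
```

After draining, the next read starts at the following request line rather than in the middle of the old body.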

Graham

> Paul
>
>
>>> As an aside, I think that "honoring Content-Length" should probably be
>>> rephrased as "middleware should not break HTTP", coupled with a page
>>> that lists common ways that middleware breaks HTTP. I reckon it's the
>>> same reasoning as for 333's dictation that hop-by-hop headers are server
>>> only, though there are plenty of other ways I could violate RFC 2616
>>> as a middleware author without violating WSGI. Pie in the sky, the
>>> common ways would be included with wsgiref's validate decorator.
>>>
>>> Paul
>>>
>>>>> but if you have done so
>>>>> already, ensure you read:
>>>>>
>>>>>  http://bitbucket.org/ianb/wsgi-peps/src/
>>>>>
>>>>> This is Ian's and Armin's previous go at a new specification. It
>>>>> tried to go further than what you are doing, though.
>>>>>
>>>>> Also read:
>>>>>
>>>>>  http://blog.dscpl.com.au/2009/09/roadmap-for-python-wsgi-specification.html
>>>>>
>>>>> I explain what I mean by native strings in that.
>>>>>
>>>>> Graham
>>>>>