PEP 444 / WSGI2 Proposal: Filters to supplement middleware.

PEP 444 / WSGI2 Proposal: Filters to supplement middleware.

Alice Bevan–McGregor
Howdy!

There's one issue I've seen repeated a lot in working with WSGI1 and
that is the use of middleware to process incoming data, but not
outgoing, and vice-versa; middleware which filters the output in some
way, but cares not about the input.

Wrapping middleware around an application is simple and effective, but
costly in terms of stack allocation overhead; it also makes debugging a
bit more of a nightmare as the stack trace can be quite deep.

My updated draft PEP 444[1] includes a section describing Filters, both
ingress (input filtering) and egress (output filtering).  The API is
trivially simple, optional (as filters can be easily adapted as
middleware if the host server doesn't support filters) and easy to
implement in a server.  (The Marrow HTTP/1.1 server implements them as
two for loops.)

Basically an input filter accepts the environment dictionary and can
mutate it.  Ingress filters take a single positional argument that is
the environ.  The return value is ignored.  (This is questionable; it
may sometimes be good to have ingress filters return responses.  Not
sure about that, though.)

An egress filter accepts the status, headers, body tuple from the
application and returns a status, headers, and body tuple of its own
which then replaces the response.  An example implementation is:

        for filter_ in ingress_filters:
            filter_(environ)
       
        response = application(environ)
       
        for filter_ in egress_filters:
            response = filter_(*response)
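
To make the loops above concrete, here is a minimal runnable sketch. The filter names, the `example.request_id` key, and the `handle` helper are made up for illustration and are not taken from the PEP draft or from Marrow; the application follows the draft's style of returning a (status, headers, body) tuple.

```python
def add_request_id(environ):
    """Ingress filter: mutates environ in place; the return value is ignored."""
    environ.setdefault('example.request_id', 'req-1')  # hypothetical key


def mark_filtered(status, headers, body):
    """Egress filter: returns a replacement (status, headers, body) tuple."""
    return status, headers + [(b'X-Filtered', b'yes')], body


def application(environ):
    text = environ['example.request_id'].encode('ascii')
    return b'200 OK', [(b'Content-Type', b'text/plain')], [text]


def handle(environ, application, ingress_filters=(), egress_filters=()):
    """What a server might do per request: two flat loops, no nesting."""
    for filter_ in ingress_filters:
        filter_(environ)

    response = application(environ)

    for filter_ in egress_filters:
        response = filter_(*response)

    return response


status, headers, body = handle({}, application,
                               [add_request_id], [mark_filtered])
```

Neither filter holds a reference to the application, so a traceback raised inside one of them is only one frame below the server's request handler.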

I'd love to get some input on this.  Questions, comments, criticisms,
or better ideas are welcome!

        — Alice.

[1] https://github.com/GothAlice/wsgi2/blob/master/pep-0444.rst


_______________________________________________
Web-SIG mailing list
[hidden email]
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: http://mail.python.org/mailman/options/web-sig/lists%40nabble.com

Re: PEP 444 / WSGI2 Proposal: Filters to supplement middleware.

Robert Brewer-4
Alice Bevan–McGregor

> There's one issue I've seen repeated a lot in working with WSGI1 and
> that is the use of middleware to process incoming data, but not
> outgoing, and vice-versa; middleware which filters the output in some
> way, but cares not about the input.
>
> Wrapping middleware around an application is simple and effective, but
> costly in terms of stack allocation overhead; it also makes debugging a
> bit more of a nightmare as the stack trace can be quite deep.
>
> My updated draft PEP 444[1] includes a section describing Filters, both
> ingress (input filtering) and egress (output filtering).  The API is
> trivially simple, optional (as filters can be easily adapted as
> middleware if the host server doesn't support filters) and easy to
> implement in a server.  (The Marrow HTTP/1.1 server implements them as
> two for loops.)
>
> Basically an input filter accepts the environment dictionary and can
> mutate it.  Ingress filters take a single positional argument that is
> the environ.  The return value is ignored.  (This is questionable; it
> may sometimes be good to have ingress filters return responses.  Not
> sure about that, though.)
>
> An egress filter accepts the status, headers, body tuple from the
> application and returns a status, headers, and body tuple of its own
> which then replaces the response.  An example implementation is:
>
> for filter_ in ingress_filters:
>    filter_(environ)
>
> response = application(environ)
>
> for filter_ in egress_filters:
>    response = filter_(*response)

That looks amazingly like the code for CherryPy Filters circa 2005. In version 2 of CherryPy, "Filters" were the canonical extension method (for the framework, not WSGI, but the same lessons apply). It was still expensive in terms of stack allocation overhead, because you had to call () each filter to see if it was "on". It would be much better to find a way to write something like:

    for f in ingress_filters:
        if f.on:
            f(environ)

It was also fiendishly difficult to get executed in the right order: if you had a filter that was both ingress and egress, the natural tendency for core developers and users alike was to append each to each list, but this is almost never the correct order. But even if you solve the issue of static composition, there's still a demand for programmatic composition ("if X then add Y after it"), and even decomposition ("find the caching filter my framework added automatically and turn it off"), and list.insert()/remove() isn't stellar at that. Calling the filter to ask it whether it is "on" also leads filter developers down the wrong path; you really don't want to have Filter A trying to figure out if some other, conflicting Filter B has already run (or will run soon) that demands Filter A return without executing anything. You really, really want the set of filters to be both statically defined and statically analyzable.

Finally, you want the execution of filters to be configurable per URI and also configurable per controller. So the above should be rewritten again to something like:

    for f in ingress_filters(controller):
        if f.on(environ['path_info']):
            f(environ)

It was for these reasons that CherryPy 3 ditched its version 2 "filters" and replaced them with "hooks and tools" in version 3. You might find more insight by studying the latest cherrypy/_cptools.py


Robert Brewer
[hidden email]

Re: PEP 444 / WSGI2 Proposal: Filters to supplement middleware.

Alice Bevan–McGregor
> That looks amazingly like the code for CherryPy Filters circa 2005. In
> version 2 of CherryPy, "Filters" were the canonical extension method
> (for the framework, not WSGI, but the same lessons apply). It was still
> expensive in terms of stack allocation overhead, because you had to
> call () each filter to see if it was "on". It would be much better to
> find a way to write something like:
>
>     for f in ingress_filters:
>         if f.on:
>             f(environ)

.on will need to be an @property in most cases, still not avoiding
stack allocation and, in fact, doubling the overhead per filter.  
Statically disabled filters should not be added to the filter list.

> It was also fiendishly difficult to get executed in the right order: if
> you had a filter that was both ingress and egress, the natural tendency
> for core developers and users alike was to append each to each list,
> but this is almost never the correct order.

If something is both an ingress and egress filter, it should be
implemented as middleware instead.  Nothing can prevent developers from
doing bad things if they really try.  Appending to ingress and
prepending to egress would be the "right" thing to simulate middleware
behaviour with filters, but again, don't do that.  ;)
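
For anyone who wants to see why "append to ingress, prepend to egress" matches middleware nesting, here is a small sketch. The `make_pair` and `register` helpers are hypothetical and exist only to record the call order.

```python
def make_pair(name, log):
    """Build an ingress/egress pair that only records when it runs."""
    def ingress(environ):
        log.append('in:' + name)

    def egress(status, headers, body):
        log.append('out:' + name)
        return status, headers, body

    return ingress, egress


def register(pair, ingress_filters, egress_filters):
    ingress, egress = pair
    ingress_filters.append(ingress)    # runs after earlier ingress filters
    egress_filters.insert(0, egress)   # runs before earlier egress filters


log = []
ingress_filters, egress_filters = [], []
register(make_pair('A', log), ingress_filters, egress_filters)
register(make_pair('B', log), ingress_filters, egress_filters)

environ = {}
for f in ingress_filters:
    f(environ)

response = (b'200 OK', [], [b''])  # stand-in for application(environ)

for f in egress_filters:
    response = f(*response)
```

The recorded order is A in, B in, B out, A out, which is what you would get from middleware A wrapped around middleware B. Appending to both lists would instead run A's egress half before B's.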

> But even if you solve the issue of static composition, there's still a
> demand for programmatic composition ("if X then add Y after it"), and
> even decomposition ("find the caching filter my framework added
> automatically and turn it off"), and list.insert()/remove() isn't
> stellar at that.

I have plans for (and a partial implementation of) an init.d-style
"needs/uses/provides" declaration and automatic dependency graphing.
WebCore, for example, adds the declarations to existing middleware
layers to sort the middleware.
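
A rough sketch of what such a declaration and sort could look like follows. This is not WebCore's code; the `Layer` class, its attribute names, and the reading of "needs" as a hard dependency and "uses" as an optional one are all assumptions. It relies on `graphlib` from the standard library (Python 3.9 and later).

```python
from graphlib import TopologicalSorter


class Layer:
    """Hypothetical declaration attached to a filter or middleware layer."""

    def __init__(self, name, provides=(), needs=(), uses=()):
        self.name = name
        self.provides = set(provides)  # features this layer offers
        self.needs = set(needs)        # features that must be present
        self.uses = set(uses)          # features used only if present


def sort_layers(layers):
    """Return layer names ordered so providers come before consumers."""
    providers = {}
    for layer in layers:
        for feature in layer.provides:
            providers.setdefault(feature, []).append(layer.name)

    graph = {}
    for layer in layers:
        missing = layer.needs - set(providers)
        if missing:
            raise LookupError('%s needs %s' % (layer.name, sorted(missing)))
        deps = set()
        for feature in layer.needs | layer.uses:
            deps.update(providers.get(feature, ()))
        deps.discard(layer.name)
        graph[layer.name] = deps  # node -> set of predecessors

    return list(TopologicalSorter(graph).static_order())


layers = [
    Layer('auth', provides=['auth'], needs=['session']),
    Layer('session', provides=['session']),
    Layer('debug', uses=['profiling']),  # optional feature, nobody provides it
]
order = sort_layers(layers)
```

Only the relative order of dependent layers is fixed; independent layers such as `debug` may land anywhere in the result.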

> Calling the filter to ask it whether it is "on" also leads filter
> developers down the wrong path; you really don't want to have Filter A
> trying to figure out if some other, conflicting Filter B has already
> run (or will run soon) that demands Filter A return without executing
> anything. You really, really want the set of filters to be both
> statically defined and statically analyzable.

Unfortunately, most, if not all filters need to check for request
headers and response headers to determine the capability to run.  E.g.
compression checks environ.get('HTTP_ACCEPT_ENCODING', '').lower() for
'gzip', and checks the response to determine if a 'Content-Encoding'
header has already been specified.
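
As a sketch of those two checks, assuming the egress filter is also handed the environ (the loop in my first message only passes status, headers, body), and assuming header names and values are bytes:

```python
import gzip


def gzip_egress(environ, status, headers, body):
    """Egress filter sketch: compress only when both checks pass."""
    # Naive check: ignores q-values such as "gzip;q=0".
    accepted = environ.get('HTTP_ACCEPT_ENCODING', '').lower()
    if 'gzip' not in accepted:
        return status, headers, body

    # Leave the response alone if something already encoded it.
    if any(name.lower() == b'content-encoding' for name, _ in headers):
        return status, headers, body

    # Buffers the whole body; a real filter would likely stream instead.
    compressed = gzip.compress(b''.join(body))
    headers = [(n, v) for n, v in headers if n.lower() != b'content-length']
    headers.append((b'Content-Encoding', b'gzip'))
    headers.append((b'Content-Length', str(len(compressed)).encode('ascii')))
    return status, headers, [compressed]
```

Both checks depend on per-request data, which is why this kind of filter cannot decide statically whether it is "on".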

> Finally, you want the execution of filters to be configurable per URI
> and also configurable per controller. So the above should be rewritten
> again to something like:
>
>     for f in ingress_filters(controller):
>         if f.on(environ['path_info']):
>             f(environ)
>
> It was for these reasons that CherryPy 3 ditched its version 2
> "filters" and replaced them with "hooks and tools" in version 3.

This is possible by wrapping multiple applications, say, in the filter
middleware adapter with differing filter setups, then using the
separate wrapped applications with some form of dispatch.  You could
also utilize filters as decorators.  This is an implementation detail
left up to the framework utilizing WSGI2, however.  WSGI2 itself has no
concept of "controllers".
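
Roughly, and purely as a sketch (the `with_filters` and `dispatch_by_prefix` names are invented here, not part of the PEP draft or any library), that could look like:

```python
def with_filters(application, ingress_filters=(), egress_filters=()):
    """Adapt filter lists into a single middleware-style callable."""
    def wrapped(environ):
        for filter_ in ingress_filters:
            filter_(environ)
        response = application(environ)
        for filter_ in egress_filters:
            response = filter_(*response)
        return response
    return wrapped


def dispatch_by_prefix(routes, default):
    """Very small dispatcher: the first matching path prefix wins."""
    def dispatcher(environ):
        path = environ.get('PATH_INFO', '')
        for prefix, app in routes:
            if path.startswith(prefix):
                return app(environ)
        return default(environ)
    return dispatcher


def hello(environ):
    return b'200 OK', [], [b'hello']


def not_found(environ):
    return b'404 Not Found', [], [b'']


def tag_admin(status, headers, body):
    return status, headers + [(b'X-Area', b'admin')], body


site = dispatch_by_prefix(
    [('/admin', with_filters(hello, egress_filters=[tag_admin])),
     ('/', with_filters(hello))],
    not_found,
)
```

Each wrapped application carries its own filter setup, and the dispatcher knows nothing about filters at all.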

None of this prevents the simplified stack from being useful during
exception handling, though.  ;)  What I was really trying to do is
reduce the level of nesting on each request and make what used to be
middleware more explicit in its purpose.

> You might find more insight by studying the latest cherrypy/_cptools.py

I'll give it a gander, though I firmly believe filter management (as
middleware stack management) is the domain of a framework on top of
WSGI2, not the domain of the protocol.

        — Alice.



Re: PEP 444 / WSGI2 Proposal: Filters to supplement middleware.

ianb
In reply to this post by Alice Bevan–McGregor
On Sun, Dec 12, 2010 at 9:59 PM, Alice Bevan–McGregor <[hidden email]> wrote:
> Howdy!
>
> There's one issue I've seen repeated a lot in working with WSGI1 and that is the use of middleware to process incoming data, but not outgoing, and vice-versa; middleware which filters the output in some way, but cares not about the input.
>
> Wrapping middleware around an application is simple and effective, but costly in terms of stack allocation overhead; it also makes debugging a bit more of a nightmare as the stack trace can be quite deep.
>
> My updated draft PEP 444[1] includes a section describing Filters, both ingress (input filtering) and egress (output filtering).  The API is trivially simple, optional (as filters can be easily adapted as middleware if the host server doesn't support filters) and easy to implement in a server.  (The Marrow HTTP/1.1 server implements them as two for loops.)

It's not clear to me how this can be composed or abstracted.

@webob.dec.wsgify does kind of handle this with its request/response pattern; in a simplified form it's like:

def wsgify(func):
    def replacement(environ):
        req = Request(environ)
        resp = func(req)
        return resp(environ)
    return replacement

This allows you to do an output filter like:

@wsgify
def output_filter(req):
    resp = some_app(req.environ)
    fiddle_with_resp(resp)
    return resp

(Most output filters also need the request.)  And an input filter like:

@wsgify
def input_filter(req):
    fiddle_with_req(req)
    return some_app


But while it handles the input filter case, it doesn't try to generalize this or move application composition into the server.  An application is an application and servers are imagined but not actually concrete.  If you handle filters at the server level you have to have some way of registering these filters, and it's unclear what order they should be applied.  At import?  Does the server have to poke around in the app it is running?  How can it traverse down if you have dispatching apps (like paste.urlmap or Routes)?

You can still implement this locally of course, as a class that takes an app and input and output filters.


--
Ian Bicking  |  http://blog.ianbicking.org


Re: PEP 444 / WSGI2 Proposal: Filters to supplement middleware.

Alice Bevan–McGregor
Ian,

> It's not clear to me how this can be composed or abstracted.

Filters themselves have no knowledge of the application, in a similar
vein to middleware not knowing if the next layer in the onion skin is
the final application, or another bit of well-behaved middleware,
except that filters do not get a reference to an "inner" application at
all.  (They are linear, not nested.)

> (Most output filters also need the request.)

You are quite correct; I'll update the PEP.  marrow.server.http already
passes environ to egress filters in addition to the status_bytes,
headers_list, body_iter data.
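
A sketch of the adjusted loop, assuming environ is passed as the first argument (the argument order here is a guess, not a statement of what marrow.server.http does), with a hypothetical egress filter that needs to see the request:

```python
def handle(environ, application, ingress_filters=(), egress_filters=()):
    """Per-request loop where egress filters also receive environ."""
    for filter_ in ingress_filters:
        filter_(environ)

    status_bytes, headers_list, body_iter = application(environ)

    for filter_ in egress_filters:
        status_bytes, headers_list, body_iter = filter_(
            environ, status_bytes, headers_list, body_iter)

    return status_bytes, headers_list, body_iter


def strip_body_for_head(environ, status_bytes, headers_list, body_iter):
    """Hypothetical egress filter: drop the body on HEAD requests."""
    if environ.get('REQUEST_METHOD') == 'HEAD':
        return status_bytes, headers_list, []
    return status_bytes, headers_list, body_iter


def application(environ):
    return b'200 OK', [(b'Content-Type', b'text/plain')], [b'payload']


head = handle({'REQUEST_METHOD': 'HEAD'}, application,
              egress_filters=[strip_body_for_head])
get = handle({'REQUEST_METHOD': 'GET'}, application,
             egress_filters=[strip_body_for_head])
```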

> But while it handles the input filter case, it doesn't try to
> generalize this or move application composition into the server.

A large proportion of the filters I was able to imagine are
conditionless: there would be no "path" within your application
(controller or otherwise) that would need to modify the majority of
them.  As an example, egress compression.  (And even then, my example
egress compression filter offers a documented mechanism to disable it
on a per-request basis.)

> An application is an application and servers are imagined but not
> actually concrete.

Could you elaborate?  (Define "concrete" in this context.)

> If you handle filters at the server level you have to have some way of
> registering these filters, and it's unclear what order they should be
> applied.  At import?  Does the server have to poke around in the app it
> is running?  How can it traverse down if you have dispatching apps
> (like paste.urlmap or Routes)?

Filters are unaffected by, and unaware of, dispatch.  They are defined
at the same time your application middleware stack is constructed, and
passed (in the current implementation) to the HTTPServer protocol as a
list at the same time as your wrapped application stack.

> You can still implement this locally of course, as a class that takes
> an app and input and output filters.

If you -do- need "region specific" filtering, you can ostensibly wrap
multiple final applications in filter management middleware, as you
say.  That's a fairly advanced use-case regardless of filtering.

I would love to see examples of what people might implement as filters
(i.e. middleware that does ONE of ingress or egress processing, not
both).  From CherryPy I see things like:

 * BaseURLFilter (ingress Apache base path adjustments)
 * DecodingFilter (ingress request parameter decoding)
 * EncodingFilter (egress response header and body encoding)
 * GzipFilter (already mentioned)
 * LogDebugInfoFilter (egress insertion of page generation time into
HTML stream)
 * TidyFilter (egress piping of response body to Tidy)
 * VirtualHostFilter (similar to BaseURLFilter)

None of these (with the possible exception of LogDebugInfoFilter) I
could imagine needing to be path-specific.

        — Alice.



Re: PEP 444 / WSGI2 Proposal: Filters to supplement middleware.

ianb
On Tue, Dec 14, 2010 at 12:54 PM, Alice Bevan–McGregor <[hidden email]> wrote:

> > An application is an application and servers are imagined but not actually concrete.
>
> Could you elaborate?  (Define "concrete" in this context.)

WSGI applications never directly touch the server.  They are called by the server, but have no reference to the server.  Servers in turn take an app and parameters specific to their serveryness (which may or may not even involve HTTP), but it's good we've gotten them out of the realm of application composition (early on WSGI servers frequently handled mounting apps at locations in the path, but that's been replaced with dispatching middleware).  An application wrapped with middleware is also a single object you can hand around; we don't have an object that represents all of "application, list of pre-filters, list of post-filters".

 

> > If you handle filters at the server level you have to have some way of registering these filters, and it's unclear what order they should be applied.  At import?  Does the server have to poke around in the app it is running?  How can it traverse down if you have dispatching apps (like paste.urlmap or Routes)?
>
> Filters are unaffected by, and unaware of, dispatch.  They are defined at the same time your application middleware stack is constructed, and passed (in the current implementation) to the HTTPServer protocol as a list at the same time as your wrapped application stack.
>
> > You can still implement this locally of course, as a class that takes an app and input and output filters.
>
> If you -do- need "region specific" filtering, you can ostensibly wrap multiple final applications in filter management middleware, as you say.  That's a fairly advanced use-case regardless of filtering.
>
> I would love to see examples of what people might implement as filters (i.e. middleware that does ONE of ingress or egress processing, not both).  From CherryPy I see things like:
>
> * BaseURLFilter (ingress Apache base path adjustments)
> * DecodingFilter (ingress request parameter decoding)
> * EncodingFilter (egress response header and body encoding)
> * GzipFilter (already mentioned)
> * LogDebugInfoFilter (egress insertion of page generation time into HTML stream)
> * TidyFilter (egress piping of response body to Tidy)
> * VirtualHostFilter (similar to BaseURLFilter)
>
> None of these (with the possible exception of LogDebugInfoFilter) I could imagine needing to be path-specific.

GzipFilter is wonky at best (it interacts oddly with range requests and etags).  Prefix handling is useful (e.g., paste.deploy.config.PrefixMiddleware), and usually global and unconfigured.  Debugging and logging stuff often needs per-path configuration, which can mean multiple instances applied after dispatch.  Encoding and Decoding don't apply to WSGI.  Tidy is intrusive and I think questionable on a global level.  I don't think the use cases are there.  Tightly bound pre-filters and post-filters are particularly problematic.  This all seems like a lot of work to avoid a few stack frames in a traceback.

--
Ian Bicking  |  http://blog.ianbicking.org
