Emulating req.write() in WSGI


Emulating req.write() in WSGI

Aaron Fransen-2
One of the nice things about mod_python is the req.write() function.

Although I realize it's somewhat of an abuse of the HTTP protocol, it's handy being able to periodically update the client browser with a status message for a long-running job.

So handy in fact that I have a number of applications that rely fairly heavily on it as a means of keeping the client (person) happy instead of just showing them the default "browser busy" notification.

There are a couple of workarounds, neither of which are ideal:
1. Take them immediately to a secondary page, then submit the actual job automatically on that second page.
2. Instead of using HTTP POST, use an HTTP Request Object (i.e. Ajax).

Both of them involve significantly more development effort than an equivalent req.write().

Is there a way to emulate the periodic-write functionality in WSGI?
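
For context, the kind of mod_python handler being emulated looks roughly like this (a sketch only; the handler name, messages and delays are illustrative):

import time
from mod_python import apache

def handler(req):
    req.content_type = 'text/html'
    req.write('<html><body>Starting long job...')
    for year in (2009, 2010):
        req.write('<br>Fetching sales for %d...' % year)  # status update goes out now
        time.sleep(2)                                      # stand-in for the real work
    req.write('<br>Done.</body></html>')
    return apache.OK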


Re: Emulating req.write() in WSGI

PJ Eby
At 01:01 PM 6/28/2010 -0600, Aaron Fransen wrote:

> [...]

Each string yielded (or passed to the write() callable returned by
start_response) is supposed to be sent straight through to the client.

As long as your WSGI stack is actually conformant to the protocol,
that's all you need to do.
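
A minimal sketch of the two styles, assuming a conforming server (the chunk text and delays here are illustrative only):

import time

def app_yield(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    for i in range(5):
        yield 'step %d done\n' % i      # each yielded chunk should go straight out
        time.sleep(1)

def app_write(environ, start_response):
    write = start_response('200 OK', [('Content-Type', 'text/plain')])
    for i in range(5):
        write('step %d done\n' % i)     # likewise for the write() callable
        time.sleep(1)
    return []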


Re: Emulating req.write() in WSGI

Aaron Fransen-2


On Mon, Jun 28, 2010 at 3:11 PM, P.J. Eby <[hidden email]> wrote:
[...]

Each string yielded (or passed to the write() callable returned by start_response) is supposed to be sent straight through to the client.

As long as your WSGI stack is actually conformant to the protocol, that's all you need to do.


Using mod_wsgi on Apache doesn't seem to exhibit that behavior.

Experimentation with the write() functionality variously produces *only* the helper text or *only* the final result page; it doesn't incrementally update the user. This behaviour appears to depend on whether the Content-Length header field is included.

Using yield has not produced better results either: it seems to produce the yielded output and then, as far as what's presented to the browser, exit the program completely (yet there are no errors in the log to speak of).

I'll experiment with yield some more to see if I can more sharply define what's going on.


Re: Emulating req.write() in WSGI

PJ Eby
At 03:43 PM 6/28/2010 -0600, Aaron Fransen wrote:
>Using mod_wsgi on Apache doesn't seem to exhibit that behavior.

You may need "WSGIOutputBuffering Off" in your config; see:

http://code.google.com/p/modwsgi/wiki/ConfigurationDirectives#WSGIOutputBuffering

Another possibility is that you've got some middleware or something
else buffering between your app and mod_wsgi, I suppose.


Re: Emulating req.write() in WSGI

Graham Dumpleton-2
On 29 June 2010 08:41, P.J. Eby <[hidden email]> wrote:

> At 03:43 PM 6/28/2010 -0600, Aaron Fransen wrote:
>>
>> Using mod_wsgi on Apache doesn't seem to exhibit that behavior.
>
> You may need "WSGIOutputBuffering Off" in your config; see:
>
> http://code.google.com/p/modwsgi/wiki/ConfigurationDirectives#WSGIOutputBuffering
>
> Another possibility is that you've got some middleware or something else
> buffering between your app and mod_wsgi, I suppose.

Actually, that directive doesn't exist any more, and even when it did
the default was unbuffered. I have removed it from the documentation.

If they are experiencing delays when using write(), then possibly they
have an Apache output filter installed which is buffering up response
content and delaying it. For example, the CONTENT_LENGTH or DEFLATE
output filters.

In other words, mod_wsgi doesn't delay it.
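
If an output filter such as mod_deflate does turn out to be the
culprit, one possible workaround (untested here; the path is
illustrative) is to switch compression off for the streaming URL:

# Hypothetical Apache fragment: keep mod_deflate from buffering a streaming location.
<Location /stream>
    SetEnv no-gzip 1
</Location>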

Graham

Re: Emulating req.write() in WSGI

Graham Dumpleton-2
In reply to this post by Aaron Fransen-2
On 29 June 2010 05:01, Aaron Fransen <[hidden email]> wrote:
> One of the nice things about mod_python is the req.write() function.

One thing I should warn you about with req.write() in Apache is that
when streaming data the way you seem to be, it will accumulate memory
against the request for each write call, and that memory will not be
reused, although it is released again at the end of the request.

The problem here isn't actually in mod_python but in the underlying
Apache ap_rwrite() call.

What this function does is that for each call to it, it creates what
is called a bucket to hold the data to be written. The memory for this
bucket is allocated from the per request memory pool each time. This
bucket is then passed down the Apache output filter chain and
eventually the data gets written out.

Now, because the code doesn't attempt to reuse the bucket, that memory
then remains unused, but still allocated against the memory pool, with
the memory pool only being destroyed at the end of the request.

The outcome of this is that if you had a long running request which
continually wrote out response data in small bits using req.write(),
for each call there is a small increase in amount of memory taken from
the per request memory pool with it not being reused. Thus if the
request were running for a very long time, you will see a gradual
increase in overall memory usage of the process. When the request
finishes, the memory is reclaimed and reused, but you have by then
already set the high ceiling on ongoing process memory in use.

Anyway, I thought I should just warn you about this. In part this issue
may even be why mod_python got a reputation for memory bloat in some
situations. That is, the fundamental way of returning response data
could cause an unnecessary increase in process size if called many
times for a request.
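
If you are stuck with req.write() for now and the updates don't all
need to go out individually, one way to soften the effect described
above is to batch many small writes into fewer, larger ones so that
fewer buckets are allocated. A rough sketch; generate_chunks() here is
a hypothetical producer of small strings:

from mod_python import apache

def handler(req):
    req.content_type = 'text/html'
    buf, buffered = [], 0
    for chunk in generate_chunks(req):   # hypothetical source of small strings
        buf.append(chunk)
        buffered += len(chunk)
        if buffered >= 8192:             # flush in ~8KB batches, not per chunk
            req.write(''.join(buf))
            buf, buffered = [], 0
    if buf:
        req.write(''.join(buf))
    return apache.OK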

Graham

Re: Emulating req.write() in WSGI

Aaron Fransen-2


On Mon, Jun 28, 2010 at 5:42 PM, Graham Dumpleton <[hidden email]> wrote:
[...]


Fortunately we're not talking about a huge amount of data here, basically just a couple of notices to keep the user happy (less than 1K usually).

When using yield, it's as if the module where the yield statement runs is completely ignored. The page returned is a "default" page generated by the application. Errors are being trapped, but none are being generated; it just exits without any kind of notice.

When using write() without a Content-Length header, nothing shows on the browser.

When using write() with a Content-Length header, the first update shows (and only after the entire page has been generated), but none of the subsequent ones nor the final page.

When using write() with a Content-Length header set large enough to encompass the entire final result, the final result page shows, but none of the informational messages leading up to the generation of the page appear.

I haven't really done anything to the base wsgi installation; just set it up in daemon mode.



Re: Emulating req.write() in WSGI

Aaron Fransen-2


On Tue, Jun 29, 2010 at 7:37 AM, Aaron Fransen <[hidden email]> wrote:


[...]


Couple more things I've been able to discern.

The first happened after I "fixed" the HTML code. Originally, under mod_python, I guess I was cheating more than a little bit by sending <html></html> code blocks twice: once for the incremental notices, once for the final content. Once I changed the code to send a single properly formed block, the entire document showed up as expected; however, it still did not send any part of the HTML incrementally.

Watching the line with Wireshark, all of the data was transmitted at the same time, so nothing was sent to the browser incrementally.

(This is using the write() functionality, I haven't tried watching the line with yield yet.)


Re: Emulating req.write() in WSGI

PJ Eby
At 10:14 AM 6/29/2010 -0600, Aaron Fransen wrote:

>Couple more things I've been able to discern.
>
>The first happened after I "fixed" the html code. Originally under
>mod_python, I guess I was cheating more than a little bit by sending
><html></html> code blocks twice, once for the incremental notices,
>once for the final content. Once I changed the code to send a single
>properly parsed block, the entire document showed up as expected,
>however it still did not send any part of the html incrementally.
>
>Watching the line with Wireshark, all of the data was transmitted at
>the same time, so nothing was sent to the browser incrementally.

So, you're not sending a multipart/x-mixed-replace ("server push")
transmission?


Re: Emulating req.write() in WSGI

PJ Eby
At 12:33 PM 6/29/2010 -0600, Aaron Fransen wrote:
>I was sending text/html (I probably should have used multipart
>before) ... should I try multipart now, even with having everything
>in a single stream?

Heck if I know.  I just assumed that what you're doing would be
unlikely to work, whereas multipart has at least been previously
documented as working with Apache (at least for nph scripts).  Dunno
if mod_wsgi'll do that or not.

Actually, what I'd do in your place is try a "nph-" CGI in Python
(using a wsgiref CGIHandler with its 'origin_server' attribute set to
True), have it send multipart, and see if that works.  If it doesn't
work, then it's probably a problem with your app.

If it *does* work, but the same app doesn't work under mod_wsgi, then
it's a mod_wsgi issue; possibly related to configuration.  From what
Graham's said, mod_wsgi shouldn't be buffering anything, which means
it has to either be Apache or your app that's buffering.  If it's
Apache, doing a proper nph+multipart ought to fix it, unless there's
something else going on in the Apache configuration.
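
A rough sketch of that experiment (script name, boundary and part
contents are made up, and this is untested):

#!/usr/bin/env python
# nph-stream.py -- "nph-" style CGI test using wsgiref's CGIHandler.
import time
from wsgiref.handlers import CGIHandler

def application(environ, start_response):
    start_response('200 OK', [('Content-Type',
        'multipart/x-mixed-replace; boundary=xstringx')])
    for i in range(5):
        yield '--xstringx\r\nContent-Type: text/plain\r\n\r\n%d\r\n' % i
        time.sleep(1)
    yield '--xstringx--\r\n'

handler = CGIHandler()
handler.origin_server = True   # emit a full HTTP status line, as nph- scripts must
handler.run(application)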



Re: Emulating req.write() in WSGI

Graham Dumpleton-2
In reply to this post by Aaron Fransen-2
On 29 June 2010 23:37, Aaron Fransen <[hidden email]> wrote:

> Fortunately we're not talking about a huge amount of data here, basically
> just a couple of notices to keep the user happy (less than 1K usually).
>
> When using yield, it's as if the module where the yield command is run is
> completely ignored. The page returned is a "default" page generated by the
> application. Errors are being trapped, but none are being generated, it's
> just exiting without any kind of notice.
>
> When using write() without a Content-Length header, nothing shows on the
> browser.
>
> When using write() with a Content-Length header, the first update shows (and
> only after the entire page has been generated), but none of the subsequent
> ones nor the final page.
>
> When using write() with a Content-Length header set large enough to
> encompass the entire final result, the final result page shows, but none of
> the informational messages leading up to the generation of the page appear.

These statements concern me.

The Content-Length header should not be set if you are sending a
response of unknown length. Further, you definitely cannot return/write
more response data than is specified by Content-Length. Doing so breaks
HTTP, and mod_wsgi will actually deliberately discard anything returned
beyond what Content-Length specifies.

Can you clarify this? Are you setting Content-Length to a value less
than the amount of data you could actually return?

Graham

Re: Emulating req.write() in WSGI

Graham Dumpleton-2
In reply to this post by Aaron Fransen-2
On 30 June 2010 02:14, Aaron Fransen <[hidden email]> wrote:

> Couple more things I've been able to discern.
>
> The first happened after I "fixed" the html code. Originally under
> mod_python, I guess I was cheating more than a little bit by sending
> <html></html> code blocks twice, once for the incremental notices, once for
> the final content. Once I changed the code to send a single properly parsed
> block, the entire document showed up as expected, however it still did not
> send any part of the html incrementally.
>
> Watching the line with Wireshark, all of the data was transmitted at the
> same time, so nothing was sent to the browser incrementally.
>
> (This is using the write() functionality, I haven't tried watching the line
> with yield yet.)

Use a variation of the WSGI middleware wrapper in:

  http://code.google.com/p/modwsgi/wiki/DebuggingTechniques#Tracking_Request_and_Response

using it to 'print' the returned data to the Apache log, then tail the
Apache error log to see when that data is output. Alternatively, change
the code there to record a timestamp against each chunk of data written
to the file recording the response content.

This will show what data is returned by the WSGI application, before
mod_wsgi truncates anything greater than the Content-Length specified,
and also show whether it is your WSGI application which is delaying
output somehow, or whether Apache output filters are doing it.
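
For example, a hypothetical variation of that wrapper which stamps each
chunk to wsgi.errors (note it only sees chunks from the returned
iterable, not data sent via the write() callable):

import time

class TimestampFilter(object):
    # Log a timestamp and size for each chunk the wrapped application yields.
    def __init__(self, application):
        self.application = application

    def __call__(self, environ, start_response):
        errors = environ['wsgi.errors']
        for chunk in self.application(environ, start_response):
            errors.write('%.3f yielded %d bytes\n' % (time.time(), len(chunk)))
            errors.flush()
            yield chunk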

Graham

Re: Emulating req.write() in WSGI

Aaron Fransen-2


On Tue, Jun 29, 2010 at 6:17 PM, Graham Dumpleton <[hidden email]> wrote:
[...]

I've actually tried a variation on this already, using a built-in logging facility in the application that writes date/time values and comments to an external log file; for the WSGI tests I also included some time.sleep() statements to force a delay in the application.

To give you an idea of the flow, here's essentially what's going on:

def application(environ, start_response):
    mydict = {}
    mydict['environ'] = environ
    mydict['startresponse'] = start_response
    # run program in another .py file that has been imported
    RunTest(mydict)
    return []   # a WSGI application must still return an iterable

Then in the other module (which imports time) you would have something like:

def RunTest(mydict):
    status = '200 OK'
    response_headers = [('Content-type', 'text/html')]
    writeobj = mydict['startresponse'](status, response_headers)
    writeobj('<html><body>Fetching sales for 2009...')
    time.sleep(2)
    writeobj('<br>Fetching sales for 2010...')

    ...then finally...

    writeobj('5000 results returned.</body></html>')
    return

This is obviously a truncated (and fake) example, but it gives you an idea of the flow.



Re: Emulating req.write() in WSGI

Graham Dumpleton-2
On 30 June 2010 21:35, Aaron Fransen <[hidden email]> wrote:

> [...]

Now go try the following two examples as illustrated instead.

In both cases, do not use a web browser; instead, telnet to the port of
the web server and enter the HTTP GET request directly. If you are not
using a VirtualHost, use something like:

  telnet localhost 80
  GET /stream-yield.wsgi HTTP/1.0

If using a VirtualHost, use something like:

  telnet localhost 80
  GET /stream-yield.wsgi HTTP/1.1
  Host: tests.example.com

Make sure an additional blank line is entered to indicate the end of
the headers.

First example uses yield.

# stream-yield.wsgi

import time

def application(environ, start_response):
    status = '200 OK'

    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)

    for i in range(10):
      yield '%d\n' % i
      time.sleep(1)

Second example uses write:

# stream-write.wsgi

import time

def application(environ, start_response):
    status = '200 OK'

    response_headers = [('Content-type', 'text/plain')]
    write = start_response(status, response_headers)

    for i in range(10):
      write('%d\n' % i)
      time.sleep(1)

    return []
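
The scripts can be mapped to those URLs with something like the
following (paths here are illustrative only):

# Hypothetical mod_wsgi mapping for the two test scripts above.
WSGIScriptAlias /stream-yield.wsgi /var/www/wsgi-scripts/stream-yield.wsgi
WSGIScriptAlias /stream-write.wsgi /var/www/wsgi-scripts/stream-write.wsgi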

For me, using the stock operating-system-supplied Apache on Mac OS X, I
see a line returned every second.

If I use Safari as a web browser, in both cases the browser only shows
the response after all the data has been written and the socket
connection closed. If I use Firefox, however, the lines display as the
data comes in.

This delay in display is thus possibly just the behaviour of a specific
browser delaying the display until the socket is closed.

The example for multipart/x-mixed-replace which others mention is:

import time

def application(environ, start_response):
    status = '200 OK'

    response_headers = [('Content-Type',
                         'multipart/x-mixed-replace; boundary=xstringx')]
    start_response(status, response_headers)

    yield '--xstringx\n'

    for i in range(10):

        yield 'Content-type: text/plain\n'
        yield '\n'
        yield '%d\n' % i
        yield '--xstringx\n'

        time.sleep(1)

With telnet you will see the various sections, but Safari again only
shows the result at the end, and even then it only shows the data
lines, i.e. the numbers and not the multipart framing. So it
understands the multipart format but doesn't support x-mixed-replace.
It was always the case that only certain browsers supported that MIME
type. Firefox doesn't seem to like it at all and seems to give up and
not display anything, not even replacing the previously displayed page
contents.

What this means is that you can't rely on browsers to handle multipart
mixed-replace on their own. If you were really going to use that
format, you would want JavaScript/AJAX code on the client to process
it. The same applies to the progressive display of plain text content
streamed over time.

In summary, you really want to be using some JavaScript/AJAX on the
browser side to get uniform behaviour across browsers.

Graham

Re: Emulating req.write() in WSGI

Graham Dumpleton-2
On 30 June 2010 22:26, Graham Dumpleton <[hidden email]> wrote:

> [...]

Based on some Googling, it seems that Firefox may have got rid of
support for multipart mixed-replace in version 3.0.

Graham

Re: Emulating req.write() in WSGI

Aaron Fransen-2
In reply to this post by Graham Dumpleton-2


On Wed, Jun 30, 2010 at 6:26 AM, Graham Dumpleton <[hidden email]> wrote:
[...]

I can see that this could potentially get very ugly very quickly.

Using stock Apache on the current Ubuntu server, the yield version produced a response error, and the write() version (over the telnet interface) returned only the 0 and then disconnected. Similar behaviour in Firefox.

How odd that nobody's come up with a simple streaming/update schema (at least to my mind).

It would have been nice to be able to provide some kind of in-stream feedback for long running jobs, but it looks like I'm going to have to abandon that approach. The only issue with either of the other solutions is that each subsequent request depends on data provided by the prior, so the amount of traffic going back & forth could potentially become a problem.

Alternatively I could simply create a session database that saves the required objects then each subsequent request simply fetches the required one from the table and...

Well, you can see why streaming seemed like such a simple solution! Back to the drawing board, as it were.

Thanks all.


Re: Emulating req.write() in WSGI

Éric Araujo
> How odd that nobody's come up with a simple streaming/update schema (at
> least to my mind).

Well, if I understand HTTP correctly, it wasn’t designed to do that.

> Well, you can see why streaming seemed like such a simple solution! Back to
> the drawing board, as it were.

Perhaps new things like Comet¹, HTML 5’s Server-Sent Events, HTML 5’s
WebSockets, or even XMPP over HTTP.

Regards


Re: Emulating req.write() in WSGI

Éric Araujo
Forgot the footnote:

¹ http://en.wikipedia.org/wiki/Comet_%28programming%29


Re: Emulating req.write() in WSGI

Graham Dumpleton-2
In reply to this post by Aaron Fransen-2
On 30 June 2010 22:55, Aaron Fransen <[hidden email]> wrote:

> [...]
>
> I can see that this could potentially get very ugly very quickly.
>
> Using stock Apache on the current Ubuntu server, using yield produced a
> response error

What error? If you aren't going to debug it enough to work out what
the error is in the browser or the Apache error logs, and post it here
for comment so we can say what may be wrong on your system, then we
can't exactly help you much, can we?

> and using write() (over the telnet interface) returned the 0
> only and disconnected. Similar behavior in Firefox.

All the scripts I provided are conforming WSGI applications and work
on mod_wsgi. If you are having issues, it is likely to be the way your
Apache/Python is set up or how you configured mod_wsgi to host the
scripts. Again, because you are providing no details about how you
configured mod_wsgi, we can't help you work out what is wrong with your
system.

> How odd that nobody's come up with a simple streaming/update schema (at
> least to my mind).

For response content they have, and it can be made to work. Just
because you can't get it working, or don't understand what we are
saying about the need for a JavaScript/AJAX style client (e.g. Comet)
to make use of it rather than relying on browser functionality that
doesn't exist, doesn't change that. Request content streaming is a
different matter, as I will explain below, but as far as I can see you
haven't even mentioned that yet.

> It would have been nice to be able to provide some kind of in-stream
> feedback for long running jobs, but it looks like I'm going to have to
> abandon that approach. The only issue with either of the other solutions is
> that each subsequent request depends on data provided by the prior, so the
> amount of traffic going back & forth could potentially become a problem.
>
> Alternatively I could simply create a session database that saves the
> required objects then each subsequent request simply fetches the required
> one from the table and...
>
> Well, you can see why streaming seemed like such a simple solution! Back to
> the drawing board, as it were.

I'll try one last time to summarise a few issues for you, although
based on your attitude so far, I don't think it will change your
opinion or help your understanding.

1. Streaming of responses from a WSGI application works fine using
either yield or write(). If it doesn't work for a specific WSGI hosting
mechanism, then that implementation may not be conforming to the WSGI
requirements. Specifically, after each yield and/or write() an implicit
flush is required. This should ensure that the data is written to the
HTTP client connection and/or that the return of such data to the
client occurs in parallel to further actions occurring in that request.

2. A WSGI middleware component that caches response data can stuff
this up. One can't outright prohibit a WSGI middleware from holding on
to response data, although for each yield or write() it is technically
supposed to still pass at least an empty string down the chain, so that
control gets back to the underlying WSGI implementation, which may use
such windows to swap the request context it is operating on and so
allow a measure of concurrency in situations where threads are not
being used (see the sketch after point 4 below).

3. Where a WSGI adapter on top of an existing web server is used, e.g.
the various options that exist for Apache and nginx, an output filter
configured into the web server may also stuff this up. For example, an
output filter that compresses response data may buffer response data
into large blocks before compressing them and returning them.

4. Although response content can be streamed subject to the above
caveats, streaming of request content is a totally different matter.
First off, WSGI requires that the request content have a
Content-Length specified, so technically an HTTP client can't leave
out Content-Length and use chunked request content instead. Further,
the way many web servers and WSGI servers are implemented would
prohibit streaming of request content anyway. This is because many
implementations, especially where proxying occurs, eg. cgi, fastcgi,
scgi, ajp, uwsgi, mod_proxy (??) and mod_wsgi daemon mode, expect
the whole request content to be read in and written across the proxy
connection before any attempt is made to start reading data returned
from the web application. The request content therefore cannot be
open ended in length, because most implementations will never switch
from reading that content to expecting a response from the
application. Thus it isn't possible to use WSGI as a both-way
streaming mechanism where some request content is written, some
response content is returned, and then the client sends more request
content based on that, and so on.
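
To make points 1 and 2 concrete, here is a rough sketch (untested,
and the names streaming_app and PassThroughMiddleware are only
illustrative):

    import time

    def streaming_app(environ, start_response):
        # Point 1: a conforming server flushes each yielded string
        # through to the client before the next one is produced.
        start_response('200 OK', [('Content-Type', 'text/plain')])
        for i in range(10):
            yield 'tick %d\n' % i   # handed to the server immediately
            time.sleep(1)

    class PassThroughMiddleware(object):
        # Point 2: a well behaved middleware forwards every chunk as
        # it arrives, even empty strings, rather than accumulating
        # the whole response before passing it down the chain.
        def __init__(self, application):
            self.application = application

        def __call__(self, environ, start_response):
            for chunk in self.application(environ, start_response):
                yield chunk   # forward immediately; never buffer

A middleware that instead collects the chunks into a single string,
or an output filter as in point 3, is what breaks the streaming
behaviour even when the application itself is fine.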

So what does this all mean? First up, response content streaming
should be able to be made to work. Request content streaming,
however, isn't technically allowed by WSGI, so if you need that you
are out of luck if you want to conform to the WSGI specification.
Second, with mod_wsgi embedded mode you can step slightly outside
strict WSGI conformance and have request content streaming, but you
are then bound to Apache/mod_wsgi, and whether you want to do that
is debatable for the reasons below.
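
For completeness, reading request content in a conforming
application looks something like the rough sketch below (echo_app is
only an illustrative name). Note that the read is bounded by
CONTENT_LENGTH, which is why an open ended request stream doesn't
fit the model:

    def echo_app(environ, start_response):
        # WSGI request content is bounded by CONTENT_LENGTH; there is
        # no portable way to consume an open ended (chunked) body.
        try:
            length = int(environ.get('CONTENT_LENGTH') or 0)
        except ValueError:
            length = 0
        body = environ['wsgi.input'].read(length)
        start_response('200 OK', [('Content-Type', 'text/plain'),
                                  ('Content-Length', str(len(body)))])
        return [body]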

The bigger problem with both-way streaming or long polling
applications that hold open the same HTTP request is that WSGI
servers tend to use processes and threads for concurrency. When you
use those mechanisms they tie up a process or thread for the whole
time. So if you have lots of concurrent requests you need huge
numbers of processes and/or threads, which usually just isn't
practical because of resource usage such as memory. For that reason,
on the server one would usually use special purpose web servers for
these types of applications and talk HTTP directly, avoiding WSGI
because of its blocking nature. Those servers instead use an event
driven model, or some other mechanism which allows concurrency
without requiring a process or thread per request.

In short, this is what Comet and the dedicated servers for it are
about: allowing large numbers of concurrent long-lived requests with
minimal resources. Being dedicated systems also lets them avoid the
limitations of higher level web application interfaces such as CGI,
FASTCGI, SCGI, AJP etc., which expect to read the whole request
content before dealing with any response from the web application
handling the request.

Anyway, hopefully that explains things better. You can do what you
want; you just need to select the correct tool for the job.

Graham
_______________________________________________
Web-SIG mailing list
[hidden email]
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: http://mail.python.org/mailman/options/web-sig/lists%40nabble.com
Reply | Threaded
Open this post in threaded view
|

Re: Emulating req.write() in WSGI

Aaron Fransen-2
Apologies, Graham; I'm not actually trying to appear dense, but clearly I'm not one of the world's bright lights when it comes to web interfaces.

My installation is literally a base installation of the latest Ubuntu server platform. The only configuration at play is this:

    WSGIDaemonProcess node9 user=www-data group=www-data processes=2 threads=25
    WSGIProcessGroup node9
    WSGIScriptAlias /run /var/www/run/run.py

The error that occurs when using telnet and yield is:

[Mon Jul 05 06:30:24 2010] [error] [client 127.0.0.1] mod_wsgi (pid=2716): Target WSGI script '/var/www/run/run.py' cannot be loaded as Python module.
[Mon Jul 05 06:30:24 2010] [error] [client 127.0.0.1] mod_wsgi (pid=2716): Exception occurred processing WSGI script '/var/www/run/run.py'.
[Mon Jul 05 06:30:24 2010] [error] [client 127.0.0.1] SyntaxError: 'return' with argument inside generator (run.py, line 14)

using this code:

    import time

    def application(environ, start_response):
        status = '200 OK'
        response_headers = [('Content-type', 'text/plain')]
        start_response(status, response_headers)
        for x in range(0, 10):
            yield 'hey %s' % x
            time.sleep(1)

The error occurs when I use "return []" as opposed to a bare "return"; I now see this is a consequence of the yield statement itself, which turns the function into a generator.
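
Put another way (an illustrative fragment, not my actual run.py), Python 2 only allows a bare return inside a generator:

    def application(environ, start_response):
        start_response('200 OK', [('Content-type', 'text/plain')])
        if environ['REQUEST_METHOD'] == 'HEAD':
            # 'return []' here raises: SyntaxError: 'return' with
            # argument inside generator
            return    # a bare return is the only form allowed
        yield 'hey'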

Using the yield version above, the telnet interface returns immediately with:

HTTP/1.1 200 OK
Date: Mon, 05 Jul 2010 12:30:45 GMT
Server: Apache/2.2.14 (Ubuntu)
Vary: Accept-Encoding
Connection: close
Content-Type: text/plain

0
Connection closed by foreign host.

In fact, using yield or write() produces the same result.

If I'm not getting the results I should be, then obviously I'm doing something wrong.

I understand the danger of having a long-running web process (hence the reason I have a lot of virtual machines running mod_python in the live environment right now), but unfortunately it's something I don't seem to be able to work around at the moment.

Thanks to all.

On Wed, Jun 30, 2010 at 5:19 PM, Graham Dumpleton <[hidden email]> wrote:
On 30 June 2010 22:55, Aaron Fransen <[hidden email]> wrote:
>
> I can see that this could potentially get very ugly very quickly.
>
> Using stock Apache on the current Ubuntu server, using yield produced a
> response error

What error? If you aren't going to debug it enough to even work out
what the error is in the browser or the Apache error logs, and post
it here for comment so we can say what may be wrong on your system,
then we can't exactly help you much, can we?

> and using write() (over the telnet interface) returned the 0
> only and disconnected. Similar behavior in Firefox.

All the scripts I provided you are conforming WSGI applications and
work on mod_wsgi. If you are having issues, then it is likely going
to be the way your Apache/Python is set up or how you configured
mod_wsgi to host the scripts. Again, because you are providing no
details about how you configured mod_wsgi, we can't help you work
out what is wrong with your system.



_______________________________________________
Web-SIG mailing list
[hidden email]
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: http://mail.python.org/mailman/options/web-sig/lists%40nabble.com