Re: [sage-devel] Re: Jupyter notebook by default?

Jason Grout
(cross-posting to ipython-dev)

Jon,

At the recent San Francisco meetings, we talked about this.  What do you think about:

1. keeping track of the size of the io messages sent from any specific kernel execution
2. When the total size of io reaches some specific size (user-configurable), transmitting a special "throwing away output, but here's how to save the output to a file if you want in the future, or how to increase the limit" message
3. keep a running buffer of the last bit of output attempted to be sent, and send it when the execution finishes (so basically a ring buffer that overwrites the oldest message)

This:

* allows small output through
* provides an explanatory message
* provides the last bit of output as well

One thing to figure out: a limit on size of output that is text may not be appropriate for output that is images, etc.
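Here is a minimal Python sketch of the three steps above. It assumes output arrives as text chunks. The class name `OutputLimiter`, its `limit` and `tail_chunks` parameters, and the notice wording are all hypothetical and are not an existing Jupyter API. A `collections.deque` with `maxlen` acts as the ring buffer, because appending to a full deque drops the oldest chunk.

```python
from collections import deque


class OutputLimiter:
    """Hypothetical per-execution output limiter; not an existing Jupyter API."""

    NOTICE = ("[output limit reached: further output is being discarded; "
              "raise the limit or redirect output to a file]")

    def __init__(self, limit=10_000, tail_chunks=5):
        self.limit = limit                     # bytes allowed through per execution
        self.sent = 0                          # bytes forwarded so far (step 1)
        self.overflowed = False
        self.tail = deque(maxlen=tail_chunks)  # ring buffer: appending drops the oldest chunk

    def write(self, chunk):
        """Return the list of messages to forward to the frontend for this chunk."""
        if self.overflowed:
            self.tail.append(chunk)            # step 3: remember only the most recent output
            return []
        size = len(chunk.encode("utf-8"))
        if self.sent + size <= self.limit:
            self.sent += size
            return [chunk]
        self.overflowed = True                 # step 2: send the explanatory notice once
        self.tail.append(chunk)
        return [self.NOTICE]

    def finish(self):
        """Call when execution ends: flush the buffered tail (empty if nothing overflowed)."""
        return list(self.tail)
```

The frontend sees the first chunks, then one notice, then the last few chunks once execution finishes. Small outputs never reach the limit, so `finish()` returns an empty list for them.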

Thanks,

Jason


On Tue, Jan 5, 2016 at 12:11 PM, Jason Grout <[hidden email]> wrote:

---------- Forwarded message ----------
From: Jonathan Frederic <[hidden email]>
Date: Tue, Jan 5, 2016 at 11:42 AM
Subject: Re: [sage-devel] Re: Jupyter notebook by default?
To: Jason Grout <[hidden email]>
Cc: sage-devel <[hidden email]>


Jason,

Thanks for pulling me in on this.  

William,

I agree, getting a bunch of people to agree on stuff can seem impossible.  However, you mention Sage offers a couple options to mitigate output overflows, can you point me to those options?  The Jupyter Notebook should provide multiple options too - this will also make it easier for everyone to agree.

Also, in your experience, which of these options works best?

I was thinking initially of doing something simple, like hard limiting data/time, then printing an error if that's exceeded.  In the Jupyter Notebook, we have to worry about
- Too many messages sent on the websocket
- The notebook JSON file growing too large and consequently becoming impossible to open
- Too much data being appended to the DOM, crashing the browser


Thanks!
-Jon

On Tue, Jan 5, 2016 at 10:19 AM, Jason Grout <[hidden email]> wrote:


On Tuesday, January 5, 2016 at 8:17:45 AM UTC-7, William wrote:

One example of a subtle feature in Sage (notebook and worksheets) not
in Jupyter, which I was just reminded of, is output limiting.  In Sage
there are numerous rules/options to deal with people doing stuff like:

while True:
    print("hi!")

... which is exactly what students will tend to do by accident...
Jupyter doesn't deal with this, but it might not be too hard to
implement in theory.  One of the main problems is figuring out what
the arbitrary rate limiting defaults "should" be; it's arbitrary, and
depends a lot on whether everything is local, over the web, etc. so
getting a bunch of people to agree is hard, which might mean they will
never implement anything.


William,

Jon Frederic in the Jupyter dev meeting happening right now said that he will be working on output limiting as one of his next things.

Jason




_______________________________________________
IPython-dev mailing list
[hidden email]
https://mail.scipy.org/mailman/listinfo/ipython-dev
Reply | Threaded
Open this post in threaded view
|

Re: [sage-devel] Re: Jupyter notebook by default?

William Stein


On Tue, Jan 5, 2016 at 11:19 AM, Jason Grout <[hidden email]> wrote:
(cross-posting to ipython-dev)

Jon,

At the recent San Francisco meetings, we talked about this.  What do you think about:

1. keeping track of the size of the io messages sent from any specific kernel execution
2. When the total size of io reaches some specific size (user-configurable), transmitting a special "throwing away output, but here's how to save the output to a file if you want in the future, or how to increase the limit" message
3. keep a running buffer of the last bit of output attempted to be sent, and send it when the execution finishes (so basically a ring buffer that overwrites the oldest message)

This:

* allows small output through
* provides an explanatory message
* provides the last bit of output as well

One thing to figure out: a limit on size of output that is text may not be appropriate for output that is images, etc.

The above strategy is a good start, but I've found it to be too naive in practice.  For example, in SMC there is a different limit on the amount of output that will be rendered with MathJax versus the amount of stdout output.  This is because MathJax rendering is vastly more resource-intensive than text rendering.  Also, if your graphics images are just the contents of messages in base64 (say), they can be relatively large but still quick to render.
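One way to express separate limits per kind of output is a budget keyed by MIME type. This is a minimal sketch under stated assumptions: the MIME keys, the byte numbers, and the `MimeBudget` class are hypothetical illustrations, not SMC's or Jupyter's actual limits or code.

```python
# Hypothetical per-MIME-type allowances in bytes (illustrative numbers only).
DEFAULT_BUDGETS = {
    "text/plain": 1_000_000,  # plain stdout is cheap to render
    "text/latex": 20_000,     # MathJax typesetting is expensive, so allow much less
    "image/png": 5_000_000,   # base64 images are large but render quickly
}


class MimeBudget:
    """Track output per MIME type and refuse output once that type's budget is spent."""

    def __init__(self, budgets=None, fallback=100_000):
        self.budgets = dict(DEFAULT_BUDGETS if budgets is None else budgets)
        self.fallback = fallback  # allowance for MIME types without an explicit entry
        self.used = {}

    def allow(self, mime, payload):
        """Charge len(payload) to `mime`; return False (and charge nothing) if over budget."""
        limit = self.budgets.get(mime, self.fallback)
        total = self.used.get(mime, 0) + len(payload)
        if total > limit:
            return False
        self.used[mime] = total
        return True
```

Because the budgets are passed in, a deployment serving remote users could construct `MimeBudget` with smaller numbers than a purely local one, which touches on the local versus remote distinction below.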

You might also want to distinguish between local users and people using Jupyter via a remote server.  If everything is running on your laptop, the network situation is completely different from that of a remote server talking to a cell phone.

I know you wrote "user-configurable" above, but it's a possibly bad sign when user configuration is required. 

Have fun at the Jupyter dev meeting!

William

 


Re: [sage-devel] Re: Jupyter notebook by default?

Volker Braun
In reply to this post by Jason Grout
IMHO output capture into a web browser isn't really different from the scrollback buffer of a terminal. We obviously enjoy the infinite scrollback, but we do not want an unbounded drawing surface in the terminal (= DOM nodes in the web browser). And certainly nobody wants a piece of their output discarded in a long-running computation.

The technical implementation is virtual scrolling: this is what the terminal does, and this is how the browser should do it, too.
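The core of virtual scrolling is deciding which rows to materialize for the current scroll position. The sketch below assumes a fixed pixel height per output row and a small `overscan` margin; the function name and parameters are hypothetical, and a real frontend would do this in browser JavaScript rather than Python.

```python
def visible_window(scroll_top, viewport_height, row_height, total_rows, overscan=5):
    """Return (first, last) row indices to render; `last` is exclusive.

    At most viewport_height // row_height + 1 + 2 * overscan rows are returned,
    so the number of DOM nodes stays fixed however many output lines exist.
    """
    if total_rows <= 0:
        return (0, 0)
    top_row = scroll_top // row_height                 # first row touching the viewport
    visible = viewport_height // row_height + 1        # rows that fit, plus a partial one
    first = max(0, top_row - overscan)
    last = min(total_rows, top_row + visible + overscan)
    return (first, last)
```

Scrolling only changes which rows are bound to the existing nodes; the size of the window does not grow with the total output.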



Re: [sage-devel] Re: Jupyter notebook by default?

MinRK

On Wed, Jan 6, 2016 at 10:05 AM, Volker Braun <[hidden email]> wrote:

IMHO output capture into a web browser isn't really different from the scrollback buffer of a terminal. We obviously enjoy the infinite scrollback but do not want an unbounded drawing surface in the terminal (= dom nodes in the web browser). And certainly nobody wants a piece of their output discarded in a long-running computation. 

The technical implementation is virtual scrolling, this is what the terminal does and this is how the browser should do it, too.

Jon mentioned that there are a few levels at which large output can cause problems. The lowest bar is putting the output on the page, which is by far the easiest to hit and leaves the browser unresponsive. This is the level that can be addressed by virtual scrolling or by truncating output in the UI. Fortunately, it’s also the easiest one to implement.

If we truncate instead of virtual-scroll, then we have a choice for whether truncated output is included in the document or not, which alleviates the problem of opening notebooks that have a problematic amount of output. But it’s putting that on the page that’s ~always the problem, not loading the notebook JSON itself, so I’m somewhat less concerned about that.
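To illustrate the choice described above, here is a hypothetical helper that truncates one output for display and separately decides what would be saved in the notebook document. The function name, the `max_lines` and `store_full` parameters, and the marker text are assumptions made for this sketch.

```python
def truncate_output(lines, max_lines=100, store_full=False):
    """Return (displayed, stored) line lists for one output.

    `displayed` is what goes on the page; `stored` is what would be written into
    the notebook JSON. With store_full=False the saved document is truncated too.
    """
    if len(lines) <= max_lines:
        return list(lines), list(lines)
    hidden = len(lines) - max_lines
    displayed = list(lines[:max_lines]) + [f"... {hidden} more lines truncated ..."]
    stored = list(lines) if store_full else list(displayed)
    return displayed, stored
```

Keeping `store_full=False` protects later opens of the file, at the cost of losing the hidden lines from the saved notebook.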

The next level where it can cause problems is the output coming over the network in the first place. We can throttle this in the notebook server, as was implemented for 4.2 months ago. Again, this moves the bar for when output causes trouble, but isn’t a complete solution. Dumping truncated output to a file is complicated a bit by the separations we have in place, but it should be doable.
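Throttling in the server can be pictured as a sliding-window limit on both message count and bytes. This is a minimal sketch, not the notebook server's code: the class name, the `max_msgs`, `max_bytes`, and `window` parameters, and the choice to drop rather than queue excess messages are assumptions. The clock is injectable so the behaviour can be tested without sleeping.

```python
import time
from collections import deque


class IOPubThrottle:
    """Hypothetical sliding-window throttle for output messages.

    A message is forwarded only if, counting it, no more than `max_msgs` messages
    and `max_bytes` bytes have been forwarded within the last `window` seconds.
    """

    def __init__(self, max_msgs=1000, max_bytes=1_000_000, window=1.0, clock=time.monotonic):
        self.max_msgs = max_msgs
        self.max_bytes = max_bytes
        self.window = window
        self.clock = clock
        self.recent = deque()      # (timestamp, size) of forwarded messages
        self.bytes_in_window = 0

    def allow(self, size):
        now = self.clock()
        # Forget messages that have slid out of the window.
        while self.recent and now - self.recent[0][0] >= self.window:
            _, old_size = self.recent.popleft()
            self.bytes_in_window -= old_size
        if len(self.recent) >= self.max_msgs or self.bytes_in_window + size > self.max_bytes:
            return False
        self.recent.append((now, size))
        self.bytes_in_window += size
        return True
```

As noted above, this only moves the bar: it keeps the websocket from being flooded but says nothing about what happens to the dropped output.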

-MinRK




Re: [sage-devel] Re: Jupyter notebook by default?

MinRK
In reply to this post by Volker Braun


On Wed, Jan 6, 2016 at 1:02 PM, Volker Braun <[hidden email]> wrote:
On Wednesday, January 6, 2016 at 11:55:36 AM UTC+1, Min RK wrote:

If we truncate instead of virtual-scroll, then we have a choice for whether truncated output is included in the document or not, which alleviates the problem of opening notebooks that have a problematic amount of output


There is no fundamental problem with large amounts of output (really, any content), and there is essentially only a single way to do it right:

The view (dom) needs only a fixed number of dom nodes for a virtual scroll.

The in-browser view model can lazily load the current scroll position, with a suitable cache. Fixed amount of browser JS memory.

The server can just mmap the output file, or alternatively seek around in the file. With a suitable index. Fixed amount of server-side memory.

Files aren't used for output. The filesystem should only be involved, if at all, in the exceptional case of output overflow.
 

The kernel has to block if the notebook server can't append output fast enough; that's normal flow control, just like in a pipe. Fixed memory usage in the kernel.
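The pipe-style flow control described here can be modelled with a bounded queue: `put()` blocks while the queue is full, so a fast producer is paused until the slower consumer catches up. In this sketch a thread stands in for the kernel and the main loop stands in for the notebook server. That mapping, and the names `run_pipeline` and `capacity`, are assumptions for illustration only.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of output


def run_pipeline(n_messages, capacity=8):
    """Send n_messages through a bounded channel; return (received, largest backlog seen)."""
    channel = queue.Queue(maxsize=capacity)
    received = []
    max_backlog = 0

    def kernel():
        for i in range(n_messages):
            channel.put(f"hi! {i}")   # blocks while `capacity` messages are pending
        channel.put(_DONE)

    producer = threading.Thread(target=kernel)
    producer.start()
    while True:                       # the "server" side drains the channel
        max_backlog = max(max_backlog, channel.qsize())
        msg = channel.get()
        if msg is _DONE:
            break
        received.append(msg)
    producer.join()
    return received, max_backlog
```

No output is lost, and the pending backlog never exceeds `capacity`, which is the fixed kernel-side memory bound mentioned above.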


Re: [sage-devel] Re: Jupyter notebook by default?

Volker Braun
On Wednesday, January 6, 2016 at 1:17:12 PM UTC+1, Min RK wrote:
Files aren't used for output. The filesystem should only be involved, if at all, in the exceptional case of output overflow.

Everything is a file of sorts... mmap is just RAM with filesystem backing.

You can put large stuff that you don't continuously access into RAM (which will then be paged out to swap), or you can put it into a temp file (either tmpfs, backed by swap, or disk). Whatever you call it, large data that is not continuously accessed must end up on the disk, because that's what the disk is good for.
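Combining this with the earlier suggestion to "seek around in the file" with "a suitable index", here is a minimal sketch of an output store that keeps the text in a temporary file and only a list of line offsets in memory. The class name and methods are hypothetical. The sketch assumes each appended line contains no newline of its own. The offset index still grows by one integer per line, so memory is small rather than strictly fixed.

```python
import io
import tempfile


class SpilledOutput:
    """Hypothetical output store: text lives on disk, only a byte-offset index in memory."""

    def __init__(self):
        self.file = tempfile.TemporaryFile()   # binary "w+b" file, deleted when closed
        self.offsets = []                      # byte offset where each line starts

    def append(self, line):
        """Append one line (assumed to contain no embedded newline)."""
        self.file.seek(0, io.SEEK_END)         # always write at the end of the file
        self.offsets.append(self.file.tell())
        self.file.write(line.encode("utf-8") + b"\n")

    def lines(self, start, stop):
        """Read lines [start, stop) by seeking straight to the first one."""
        stop = min(stop, len(self.offsets))
        if start >= stop:
            return []
        self.file.seek(self.offsets[start])
        return [self.file.readline().decode("utf-8").rstrip("\n") for _ in range(start, stop)]

    def __len__(self):
        return len(self.offsets)

    def close(self):
        self.file.close()
```

A virtual-scrolling view would call `lines(first, last)` for just the rows on screen, so only that window is ever read back from disk.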




Re: [sage-devel] Re: Jupyter notebook by default?

MinRK
In reply to this post by Jason Grout

Thanks Jason for cross-posting.

Since the issue of funding was brought up, I think supporting projects like this is exactly the sort of thing we should be doing with the funding we have, whether the work sits on the Jupyter or Sage side (I assume there will be both). It’s a bit tricky to keep track of all the points in an email thread, but if we could aggregate the things that are blockers and the things that would be nice, especially changes you need from Jupyter, we should be able to start ticking boxes.

A summary of what I’ve seen so far:

  • sage interacts
  • language cells
  • document conversion from sagenb to ipynb
  • low-level output capturing
  • gracefully handling large output

Some comments:

Re: language cells, I assume this refers to things like %%bash, %%R, and %%cython. While these look similar, there is a significant difference in how they are implemented. For instance, the R magic (provided by rpy2) runs an R interpreter in-memory and talks to it, capturing output, etc. Many of the others, such as bash, ruby, and perl, come from some “script magic” machinery in IPython, which populates the default magics with shortcuts for running a script in a given interpreter. They are essentially shortcuts for cat <cell> | <interpreter>. It’s not a fundamental limitation, or anything dire like that. If Sage has an implementation of running code in a persistent alternate interpreter, then it should not be much work to expose that as magics, since cell magics are just Python functions called with two string arguments (the rest of the line and the cell), and they can be defined at any time, for instance:

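from IPython import get_ipython

def mymagic(line, cell):
    # line: the rest of the %%mymagic line; cell: the body of the cell
    do_stuff_with(cell)  # placeholder for whatever should run the cell

get_ipython().register_magic_function(mymagic, 'cell')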
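To make the cat <cell> | <interpreter> comparison concrete, here is a minimal sketch of what such a script-style magic boils down to. The function name `run_cell_in` is hypothetical, and this is not IPython's actual script-magic code. Because it starts a fresh process for every cell, no state survives between cells, which is exactly the difference from a persistent alternate interpreter.

```python
import subprocess
import sys


def run_cell_in(interpreter, cell):
    """Roughly `cat <cell> | <interpreter>`: feed the cell on stdin and return its stdout.

    `interpreter` is a command list such as ["bash"] or [sys.executable].
    A new process is started for every call, so nothing persists between cells.
    """
    result = subprocess.run(interpreter, input=cell, capture_output=True, text=True)
    return result.stdout
```

Inside IPython it could be wrapped in a two-argument function and registered with the same `register_magic_function(..., 'cell')` call shown above; a persistent version would instead keep one interpreter process alive and talk to it across cells.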

Re: output capturing, Thomas Kluyver and I were at CERN last month working on the Cling kernel, and one of the things we did was C-level capturing of output. Now that we have that working, integrating it into the IPython kernel should not be much work, and if it’s really important, libraries can use the same technique themselves without waiting for IPython to catch up.
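C-level capturing generally works at the file-descriptor level rather than by swapping `sys.stdout`. The sketch below shows the general technique under stated assumptions: the helper name `capture_fd_output` is hypothetical, and it is not the code written for the Cling kernel or IPython. It redirects file descriptor 1 into a temporary file with `os.dup2`, so writes from C code are captured too. One limitation: text still sitting in the C library's own stdio buffer would need an explicit `fflush` before it reaches the descriptor, and this sketch does not do that.

```python
import os
import sys
import tempfile
from contextlib import contextmanager


@contextmanager
def capture_fd_output(fd=1):
    """Capture everything written to file descriptor `fd` (stdout by default).

    Yields a list that receives the captured text when the block exits.
    """
    captured = []
    sys.stdout.flush()                    # push Python-level buffered text out first
    saved = os.dup(fd)                    # keep a copy of the real descriptor
    with tempfile.TemporaryFile() as tmp:
        os.dup2(tmp.fileno(), fd)         # fd now points at the temp file
        try:
            yield captured
        finally:
            sys.stdout.flush()
            os.dup2(saved, fd)            # restore the original descriptor
            os.close(saved)
            tmp.seek(0)
            captured.append(tmp.read().decode("utf-8", errors="replace"))
```

Inside the `with` block, a raw `os.write(1, ...)` stands in for output from a C extension; the kernel would then forward the captured text as ordinary stream output.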

Interacts are perhaps the hardest piece. I think it should be doable to get sage’s own interacts working in the notebook, rather than forcing people to adopt the much more basic interact provided by the IPython widgets.

I can’t speak to the UI transition part of the problem whenever you change defaults, which is a big challenge, but I think we can at least mitigate most of the things on the Jupyter side that are getting in your way.

-MinRK

