
storing variables *in* the notebook

Zoltán Vörös-2
Hi all,

This question has been raised a couple of times at various places, but I
haven't yet found the answer I was looking for. In short, I would like
to store the value of some variables in the notebook itself. The
typical, and almost exclusively mentioned, use-case is something like this:


In [320]: x = some_long_calculation(33)


and then the user would like to be able to re-use x in a new session,
without having to call some_long_calculation() again. Even worse is,
perhaps, the case


In [321]: x = some_long_measurement()


when the value of x cannot, even in principle, be recovered by simply
re-running the notebook, because some_long_measurement() collects
experimental data through a measurement device.

Now, the standard answer to this problem is the %store magic, but that
has at least two problems (one is actually more like a feature). First,
as far as I understand, it saves the variable into a separate file,
therefore, the notebook itself is not "portable" anymore: if I want to
give it to someone, or use it on another computer, then I need the extra
file, but then I could just save the variable in a file in the first place.

Second, if two sessions store the same variable, then, well, then it
will be over-written, which is probably not ideal (but can qualify as a
feature).

So I would like to ask whether it is possible to attach the value of
simple variables to the metadata of the notebook, and to recover it from
there. If there is no infrastructure for this, is it advisable at all,
and what would it take to implement it? Basically, what I am after is
very similar to %store, but the target would be the notebook itself.

Cheers,

Zoltán

_______________________________________________
IPython-dev mailing list
[hidden email]
https://mail.scipy.org/mailman/listinfo/ipython-dev

Re: storing variables *in* the notebook

Thomas Kluyver-2
On 25 January 2017 at 21:33, Zoltán Vörös <[hidden email]> wrote:
> Now, the standard answer to this problem is the %store magic, but that has at least two problems (one is actually more like a feature). First, as far as I understand, it saves the variable into a separate file, therefore, the notebook itself is not "portable" anymore: if I want to give it to someone, or use it on another computer, then I need the extra file, but then I could just save the variable in a file in the first place.

In many cases, we think that the unit of sharing should be a directory containing notebooks and associated data files, rather than a notebook itself. Storing and retrieving data in a notebook would require breaking the abstraction that the code inside a notebook doesn't know about the document it's part of.

ActivePapers is a different take on connecting code and data which does package them in a single file; I believe it has some support for using a Jupyter notebook as part of an ActivePaper:
https://github.com/khinsen/activepapers-python

Thomas


Re: storing variables *in* the notebook

Zoltán Vörös-2
Hi Thomas,


Thanks for the comments! Here are mine.


On 01/25/2017 11:20 PM, Thomas Kluyver wrote:

> On 25 January 2017 at 21:33, Zoltán Vörös <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     Now, the standard answer to this problem is the %store magic, but
>     that has at least two problems (one is actually more like a
>     feature). First, as far as I understand, it saves the variable
>     into a separate file, therefore, the notebook itself is not
>     "portable" anymore: if I want to give it to someone, or use it on
>     another computer, then I need the extra file, but then I could
>     just save the variable in a file in the first place.
>
>
> In many cases, we think that the unit of sharing should be a directory
> containing notebooks and associated data files, rather than a notebook
> itself. Storing and retrieving data in a notebook would require
> breaking the abstraction that the code inside a notebook doesn't know
> about the document it's part of.

But by the same token, by resorting to the %store magic, the code inside
the notebook is linked to something on the file system, and
surreptitiously at that. I am afraid I don't quite see why and how
%store is different in this respect.

To me, one of the main appeals of the notebook is that one can write a
report/log (by this I mean create figures, do data analysis/simulation
and add context, explanation etc.) in a single document, portably, and
without clobbering the file system. I believe, the use case I mentioned
earlier is a logical extension of this concept.

The over-arching theme of the whole IPython project is that data,
analysis, presentation and narrative should not be separated. Metadata
are routinely attached to markdown cells, so why could the same not be
done for the notebook as a whole?

I understand that you do not want people to store GBs of data in the
notebook, but that was not the intent of the original question.


>
> ActivePapers is a different take on connecting code and data which
> does package them in a single file; I believe it has some support for
> using a Jupyter notebook as part of an ActivePaper:
> https://github.com/khinsen/activepapers-python

Thanks for the pointer, I will check it out!

Zoltán

Re: storing variables *in* the notebook

Antonino Ingargiola
Hi Zoltan,

just a simple comment.

If the data is not big, why not copy it verbatim into a code cell? Even as a very long single line, if you do not wish to clutter the visual aspect. That way the notebook would be self-contained.
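
A minimal sketch of this suggestion, assuming the data consists of plain Python literals (numbers, strings, lists, dicts). `repr()` produces text that can be pasted into a code cell, and `ast.literal_eval` checks that the text round-trips. The variable names and values are illustrative only.

```python
import ast

# Stand-in for a small measurement result (hypothetical values).
x = [0.5, 1.25, 2.0, 3.75]

# Text to paste verbatim into a code cell of the notebook.
cell_source = "x = " + repr(x)
print(cell_source)

# Check that the pasted literal reproduces the original value.
restored = ast.literal_eval(repr(x))
assert restored == x
```

For objects that have no literal representation (NumPy arrays, custom classes), this approach would need an explicit conversion such as `tolist()` first.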

My 2-cents.
Antonio


Re: storing variables *in* the notebook

Zoltán Vörös-2
Hi Antonio,


Thanks! Actually, this is how I do it at the moment, and it is certainly
better than nothing. With small variables it is quite OK; medium-sized
items (an array with 1000 elements, say) are a bit more problematic,
though.

Cheers,

Zoltán





Re: storing variables *in* the notebook

Loper, Jackson
In reply to this post by Thomas Kluyver-2
Zoltán -- 

I am very curious to know why you want the data embedded in the ipynb file, instead of storing a file in the same directory.  Is it so you can share files with colleagues?  If so, why not just share the whole directory?  Is it just too bulky?  Just curious as to what your motivation is.  

Anywho, if you really want to do it, and you're in the mood, I think it would be fairly straightforward to make a combined python/javascript plugin that allowed one to conveniently store code cells of the form

  # This is a datacell.  If you do not have the datacell javascript extension, 
  # this code cell may look really really long.  Sorry about that.
  import pickle
  x = pickle.loads(b"\x80\x03}q\x00(X\x07\x00\x00\x00Purposeq\x01X$\x00\x00\x00Very important data just for Kluyverq\x02X\x07\x00\x00\x00Contentq\x03X\r\x00\x00\x00You're great!q\x04u.")

and make them appear in the notebook as a "data cell" that looks like

  x = pickle.loads(<<<content abridged>>>)

Such a data cell would be uneditable, but could be executed.  

I think the simplest way to do this would be to design an ipython widget that, when it comes online, adds such a "data cell" directly after the current one.  Creating a cell should then be as simple as

  datacells.make_data_cell(varname='x',data="Hello world.")

I could be missing something that makes this utterly impossible though.  
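
A rough sketch of the Python half of this idea, covering only the generation of the cell text. The function name `make_data_cell_source` is hypothetical, and the JavaScript side (inserting the cell and abridging its display) is not shown.

```python
import pickle

def make_data_cell_source(varname, data):
    """Return the source text of a code cell that rebuilds `data` when run."""
    payload = pickle.dumps(data)
    # repr() of a bytes object is a valid Python bytes literal.
    return (
        "# This is a datacell.\n"
        "import pickle\n"
        f"{varname} = pickle.loads({payload!r})\n"
    )

source = make_data_cell_source("x", "Hello world.")

# Executing the generated cell text recreates the variable.
namespace = {}
exec(source, namespace)
print(namespace["x"])
```

Because the payload is a pickle, running such a cell from an untrusted notebook can execute arbitrary code, which is worth keeping in mind when sharing.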

Cheers!

Jackson Loper
Division of Applied Math
Brown University




Re: storing variables *in* the notebook

William Stein
Hi Zoltán,

This is an interesting problem and idea.  Not that it would matter to
you, but we'll very likely implement this for SageMathCloud [1] using
our global blob store, which is also how we deal with graphics in a
way that keeps files small and makes copy paste between worksheets
possible.  I've made this issue:

   https://github.com/sagemathinc/smc/issues/1594

William

[1] https://cloud.sagemath.com




--
William (http://wstein.org)

Re: storing variables *in* the notebook

Zoltán Vörös-2
In reply to this post by Loper, Jackson
Hi Jackson,



On 01/26/2017 08:57 PM, Loper, Jackson wrote:
> Zoltán --
>
> I am very curious to know why you want the data embedded in the ipynb
> file, instead of storing a file in the same directory.  Is it so you
> can share files with colleagues?  If so, why not just share the whole
> directory?  Is it just too bulky?  Just curious as to what your
> motivation is.


I guess it always comes down to taste, but I will try to give some
rational arguments all the same.

1. In many cases, I have multiple notebooks in a single directory,
simply because most of the time, I really don't need a separate folder
for just a single file. So, I have all notebooks that belong to a
particular subject in a single folder, and I don't necessarily want to
share all of them with others.

2. In such cases, it would become messy quite soon, if I started to save
variables to separate data files. Suppose you want to save 10 variables,
all differing in shape and type. You can either save them separately,
which looks somewhat stupid, because you'll have then 10 very small
files, plus you have to load them one by one in the new session, or you
pack them by hand into a single file, and unpack them somehow (probably
by pickling and unpickling), when you want to load them.
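
For reference, the "pack by hand into a single file" route described in point 2 might look like the sketch below. The variable names, values and file name are made up for illustration, and a temporary directory stands in for the notebook folder.

```python
import os
import pickle
import tempfile

# Hypothetical variables of differing shape and type.
variables = {
    "gain": 42,
    "label": "run-7",
    "samples": [0.1, 0.2, 0.3],
}

path = os.path.join(tempfile.mkdtemp(), "variables.pkl")

# Pack everything into one file...
with open(path, "wb") as fh:
    pickle.dump(variables, fh)

# ...and, in a later session, unpack it in one step.
with open(path, "rb") as fh:
    restored = pickle.load(fh)

globals().update(restored)  # re-creates gain, label and samples
```

This keeps the number of files down to one, but the file still lives next to the notebook rather than inside it.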



What I would like to point out is that this notion already exists in the
notebook, because %store does exactly what I want. The only snag is that
it ties the data to the user, and not to the notebook that generated the
data. (This is actually quite bad in my opinion, because notebooks
running on different computers will produce different results, simply
because the %store -r magic can assign different values to the same
variable names.) I understand the arguments and development decisions
brought up by Thomas yesterday, but I feel that those arguments are not
valid, or, at least, make it very hard to "unjustify" a
%store_in_notebook magic, or something similar.


>
> Anywho, if you really want to do it, and you're in the mood, I think
> it would be fairly straightforward to make a combined
> python/javascript plugin that allowed one to conveniently store code
> cells of the form
>
>   # This is a datacell.  If you do not have the datacell javascript
> extension,
>   # this code cell may look really really long.  Sorry about that.
>   x =
> pickle.loads(b"\x80\x03}q\x00(X\x07\x00\x00\x00Purposeq\x01X$\x00\x00\x00Very
> important data just for
> Kluyverq\x02X\x07\x00\x00\x00Contentq\x03X\r\x00\x00\x00You're
> great!q\x04u.")
>
> and make them appear in the notebook as a "data cell" that looks like
>
>   x = pickle.loads(<<<content abridged>>>)
>
> Such a data cell would be uneditable, but could be executed.
>
> I think the simplest way to do this would be to design an ipython
> widget that, when it comes online, adds such a "data cell" directly
> after the current one.  Creating a cell should then be as simple as
>
>   datacells.make_data_cell(varname='x',data="Hello world.")
>
> I could be missing something that makes this utterly impossible though.
>

I think this goes far beyond what I had in mind. I think this function,
or whatever it turns out to be, would just be

In [221]: x = long_calculation()  # x is 42
     ...: %store_in_notebook x

and in the new session

In [1]: %store_in_notebook -restore_variables
In [2]: x
Out[2]: 42

I don't think it would have to be a cell that cannot be edited, or
anything like that.

Perhaps the purpose of my first e-mail was to inquire about how one can
write into the notebook metadata from a code cell. I know that I can
load up the metadata editor, but that's not any better than just putting
the data in a markdown cell.

Cheers,
Zoltán


Re: storing variables *in* the notebook

Zoltán Vörös-2
In reply to this post by William Stein
William,


Thanks for the comment, I will keep tabs on this issue.


Cheers,

Zoltán




Re: storing variables *in* the notebook

Matthias Bussonnier
Thanks Zoltan,


You can try to implement a %store_in_notebook magic, but the current
abstraction layers make a lot of assumptions about how things are
running.
If you want to develop something along these lines, we can try to give
you pointers.
If I were you, I would go the route of a custom ContentsManager which
exposes actual folders as notebooks (something akin to ipymd/notedown,
but with folders) and starts kernels in these folders.
Then you "just" zip and share the folder. That would also solve the
problem that notebooks are text-based file formats, which are inherently
bad for binary data and do not support incremental updates.
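
Only the "zip and share the folder" step is sketched here; the custom contents manager itself is not shown. The folder and file names are hypothetical, and empty placeholder files stand in for a real notebook and its data.

```python
import os
import shutil
import tempfile
import zipfile

# Hypothetical project folder holding a notebook and its data file.
workdir = tempfile.mkdtemp()
project = os.path.join(workdir, "experiment")
os.makedirs(project)
for name in ("analysis.ipynb", "x.pkl"):
    with open(os.path.join(project, name), "w") as fh:
        fh.write("")  # empty placeholder files for the sketch

# Bundle the whole folder into experiment.zip for sharing.
archive = shutil.make_archive(
    os.path.join(workdir, "experiment"), "zip",
    root_dir=workdir, base_dir="experiment",
)

# List what ended up in the archive.
with zipfile.ZipFile(archive) as zf:
    names = zf.namelist()
print(names)
```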

One of the problems is that what you are trying to do will not work on
many systems, and it is relatively hard to make it part of Jupyter if it
only works for a limited use case, as we'd like to have a clear
message about what is vetted by the core.
We'll be happy to be proven wrong.

Thanks,
--
Matthias



Re: storing variables *in* the notebook

Jody Klymak-2
In reply to this post by Zoltán Vörös-2

> I think this goes far beyond what I had in mind. I think this function or whatever would just be
>
> In [221]: x = long_calculation()  # x is 42
>                %store_in_notebook x
>
> and in the new session
>
> In [1]: %store_in_notebook -restore_variables
> In [2]: x
> Out [2]: 42

For my taste, I’d just save that result in a file (`pickle` or `shelve`, or netCDF if I wanted to be formal).  It's a lot more transparent what is going on.  
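
As an illustration of the file-based route, here is a small sketch using the standard-library `shelve` module. The file name and the stored value are placeholders, and a temporary directory is used so the example does not touch the working folder.

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "results")

# First session: store the expensive result under a descriptive key.
with shelve.open(path) as db:
    db["x"] = 42  # stand-in for the result of long_calculation()

# Later session: read it back without recomputing.
with shelve.open(path) as db:
    x = db["x"]
```

The explicit `shelve.open` call in the notebook documents where the value comes from, which is the transparency argument made above.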

Imagine this case:  I `%store_in_notebook` the results of a long calculation, and then remove that code from the notebook for some reason.  I might very well wonder a year from now why my notebook is 50 GB, and have no documentation of how it got that way.  

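However, if you do have a whole slew of variables you suddenly want to save, have you tried `dill`? 

import dill
import numpy as np

filename = 'globalsave.pkl'
save = True  # set to False in a new session to restore instead

if save:
    x = np.arange(20)
    dill.dump_session(filename)  # pickle the whole interpreter session
else:
    dill.load_session(filename)  # restore the saved variables, including x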

Cheers,  Jody






Re: storing variables *in* the notebook

Zoltán Vörös-2
In reply to this post by Matthias Bussonnier
Hi Matthias,


On 01/26/2017 10:03 PM, Matthias Bussonnier wrote:
> If I were you I would go the route of a custom ContentManager which
> expose actual folders as notebooks (something akin ipymd/notedown but
> with folders) and start kernels in these folders.
> Then you "just" zip and share the folder. That would also solve the
> fact that notebooks are text-based fileformats, which are inherently
> bad for binary data, andd o not support incremental updates.

If the data are pickled, then one wouldn't have to save anything in
binary format. All I want to do is write a single ASCII line in the
metadata of the notebook. That in itself is actually recommended (from
the help: "We recommend putting custom metadata attributes in an
appropriately named sub-structure, so they don't conflict with those of
others."), and supports incremental updates. This magic command, or
whatever it would be, would simply save the step of having to open the
metadata editor manually and insert the line by hand.

> One of the problem is that what you are trying to do will not work on
> many system and it is relatively hard to make it part of Jupyter if it
> only work on some limited use case as we'd like to have a clear
> message of what is vetted by the core.

I am not sure I see the difficulty: this would be pure Python and pure
JavaScript. You take the variable, pickle it, attach the resulting
string to the notebook metadata under "user_variables", and you are
done. Of course, it is a different question what happens if your
kernel is not Python. Well, then it's a problem, I admit. But there are
other magic commands that are Python-specific, e.g., %prun or the
debugger, so this in itself can't be an obstacle.
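
The pickle-and-attach step described above can be sketched against a saved .ipynb file, which is plain JSON. This is only an illustration under several assumptions: the helper names `store_in_notebook` and `restore_from_notebook` are hypothetical, base64 is added here so that the pickle bytes become an ASCII string, and the code edits the file on disk from outside, so a running notebook front end could overwrite the change on its next save. It does not solve the problem of a kernel finding out which notebook it belongs to.

```python
import base64
import json
import os
import pickle
import tempfile

def store_in_notebook(nb_path, name, value):
    """Pickle `value` and save it as ASCII text in the notebook-level metadata."""
    with open(nb_path, encoding="utf-8") as fh:
        nb = json.load(fh)
    encoded = base64.b64encode(pickle.dumps(value)).decode("ascii")
    nb.setdefault("metadata", {}).setdefault("user_variables", {})[name] = encoded
    with open(nb_path, "w", encoding="utf-8") as fh:
        json.dump(nb, fh, indent=1)

def restore_from_notebook(nb_path):
    """Return a dict of all variables stored by store_in_notebook."""
    with open(nb_path, encoding="utf-8") as fh:
        nb = json.load(fh)
    stored = nb.get("metadata", {}).get("user_variables", {})
    return {k: pickle.loads(base64.b64decode(v)) for k, v in stored.items()}

# Demo on a minimal, empty notebook file.
nb_path = os.path.join(tempfile.mkdtemp(), "demo.ipynb")
with open(nb_path, "w", encoding="utf-8") as fh:
    json.dump({"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 2}, fh)

store_in_notebook(nb_path, "x", 42)
restored = restore_from_notebook(nb_path)
print(restored)
```

Unpickling data from a notebook received from someone else can run arbitrary code, so a restore step like this would only be safe for trusted notebooks.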

I have looked at the documentation, but it seems to me that no function
exposes the notebook metadata, or the cell metadata for that
matter; is that correct? (I don't want to divert the discussion, but
this latter functionality could be used for creating plots/tables etc.
with captions in the notebook. The caption would be displayed in the
notebook as an extra div, and the content of the caption could be
written in the cell metadata. Nbconvert could then strip the div from
the output, take the raw content of the metadata, and insert it in
the LaTeX document. Captions for figures are a long-standing problem in
the notebook.)



Cheers,
Zoltán



Re: storing variables *in* the notebook

Nathan Goldbaum


On Thu, Jan 26, 2017 at 3:32 PM, Zoltán Vörös <[hidden email]> wrote:
> Hi Matthias,
>
> On 01/26/2017 10:03 PM, Matthias Bussonnier wrote:
>> If I were you I would go the route of a custom ContentManager which
>> expose actual folders as notebooks (something akin ipymd/notedown but
>> with folders) and start kernels in these folders.
>> Then you "just" zip and share the folder. That would also solve the
>> fact that notebooks are text-based fileformats, which are inherently
>> bad for binary data, and do not support incremental updates.
>
> If the data are pickled, then one wouldn't have to save anything in binary format. All I want to do is write a single ascii line in the metadata if the notebook. That in itself is actually recommended (From the help: " We recommend putting custom metadata attributes in an appropriately named sub-structure, so they don't conflict with those of others."), and supports incremental updates. This magic command or whatever would simply save the step of having to open the metadata editor manually, and inserting the line by hand.
>
>> One of the problem is that what you are trying to do will not work on
>> many system and it is relatively hard to make it part of Jupyter if it
>> only work on some limited use case as we'd like to have a clear
>> message of what is vetted by the core.
>
> I am not sure I see the difficulty: this would be pure python, pure javascript. You take the variable, pickle it, attach the resulting string to the notebook metadata under "user_variables", and you are done. Of course, it is a different question, what happens, if your kernel is not python. Well, then it's a problem, I admit. But there are other magic commands that are python specific, e.g., prun, or the debugger, so this in itself can't be an obstacle.

But pickles aren't portable. You wouldn't only be able to share this data unless others use the same OS/arch/python version. Seems less flexible than a sidecar file with the data stored in a portable format (e.g. hdf5, csv, etc...).
 

I have looked at the documentation, but it seems to me that no functions could expose the notebook metadata, or the cell metadata for that matter, is that correct? (I don't want to divert the discussion, but this latter functionality could be used for creating plots/tables etc. with caption in the notebook. The caption would be displayed in the notebook as an extra div, and the content of the caption could be written in the cell metadata. Nbconvert could then strip the div from the output, and take the raw content of the metadata, and insert it in the LaTeX document. Captions for figures are a long-standing problem in the notebook.)



Cheers,
Zoltán




Re: storing variables *in* the notebook

Zoltán Vörös-2
In reply to this post by Jody Klymak-2


On 01/26/2017 10:06 PM, Klymak Jody wrote:

>
>> I think this goes far beyond what I had in mind. I think this
>> function or whatever would just be
>>
>> In [221]: x = long_calculation()  # x is 42
>>                %store_in_notebook x
>>
>> and in the new session
>>
>> In [1]: %store_in_notebook -restore_variables
>> In [2]: x
>> Out [2]: 42
>
> For my taste, I’d just save that result in a file (`pickle` or
> `shelve`, or netCDF if I wanted to be formal).  It's a lot more
> transparent what is going on.


But why is it more transparent? By the same token, you could say that
the %%writefile magic command is more obscure than saving the file
explicitly with

with open('file.txt', 'w') as fout: fout.write('text')

Magic commands are abbreviations for common tasks, and therefore obscure :)

>
> Imagine this case:  I `%store_in_notebook` the results of a long
> calculation, and then remove that code from the notebook for some
> reason.  I might very well wonder a year from now why my notebook is
> 50 Gb, and have no documentation of how it got that way.

Or imagine this case: I save the results of a long calculation in a
file, and then remove that code from the notebook for some reason. I
might very well wonder a year from now why there is a 50 Gb file in my
folder, and have no documentation of how it got that way ;)

As William Stein pointed out, when pickling the variables, one would
have to impose some sensible upper bound on the size.
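
Such a bound could be enforced before anything is written to the notebook. The following is only a sketch: the 1 MB default and the function name are hypothetical, chosen for illustration.

```python
import pickle

# Hypothetical default limit; pick whatever is sensible for your notebooks.
MAX_PICKLE_BYTES = 1_000_000


def pickle_with_limit(value, max_bytes=MAX_PICKLE_BYTES):
    """Pickle `value`, refusing to return payloads larger than `max_bytes`."""
    data = pickle.dumps(value)
    if len(data) > max_bytes:
        raise ValueError(
            "pickled size %d bytes exceeds the limit of %d bytes"
            % (len(data), max_bytes)
        )
    return data
```

A caller would catch the `ValueError` and tell the user to save the variable in a separate file instead.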


>
> However, if you do have a whole slew of variables you suddenly want to
> save, did you try `dill`?
>
> import dill
> import numpy as np
>
> filename= 'globalsave.pkl'
>
> if 1:
>     x = np.arange(20)
>     dill.dump_session(filename)
> else:
>     dill.load_session(filename)
>

The question is not how one can save variables in a file, the question
is, how one can avoid having to save variables in a file.

Cheers,
Zoltán

Re: storing variables *in* the notebook

Steve Holden-5
In reply to this post by Jody Klymak-2
Just a quick note about one comment re: distribution of data and notebooks as separate files. At one point, when asked why this would not be convenient, Zoltan said:

1. In many cases, I have multiple notebooks in a single directory, simply because most of the time, I really don't need a separate folder for just a single file. So, I have all notebooks that belong to a particular subject in a single folder, and I don't necessarily want to share all of them with others.

Mindless tasks like "pick these files out of this directory and put them in a tar/zip file" are easily amenable to automation, which would be far less troublesome than modifying a complex architecture to accommodate something not considered in its (in fact, rather careful) design.

Two ways come to mind: the first involves mostly shell interactions (I include the Windows shells available, though at present I am ignorant of them); the second would be a good exercise for a first-year undergraduate, and therefore the kind of thing a working scientist (i.e. one who is there to use programs to advance their research) might like to treat as competence practise (if they have time; that isn't mandatory ;-).

In the first case, for each separate distribution you want to make you can create a directory parallel to the one holding the notebooks and data, and in it create symbolic links to point to the required files. These directories can then be bundled with the standard tar utility, told to follow the links and archive the real files (the -h / --dereference option).

In the second case, each distribution would be represented as a data file containing the paths to the files required, and they would be processed by a Python program that essentially duplicates the same process as above.

I would personally prefer the latter process because, being data driven, the configuration data can be made subject to change control, which with proper configuration metadata included enhances repeatability and allows you to reproduce any distribution on demand.
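
A minimal sketch of that second approach, assuming a manifest that is simply a text file with one path per line (blank lines and lines starting with "#" are ignored). The manifest format and the function names are made up for illustration.

```python
import tarfile
from pathlib import Path


def read_manifest(manifest_path):
    """Return the paths listed in the manifest, skipping blanks and comments."""
    entries = []
    for line in Path(manifest_path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            entries.append(Path(line))
    return entries


def build_distribution(manifest_path, archive_path):
    """Bundle every file named in the manifest into a gzipped tar archive."""
    files = read_manifest(manifest_path)
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in files:
            # Store each file under its base name; files from different
            # folders that share a name would collide in this simple sketch.
            tar.add(str(path), arcname=path.name)
    return files
```

Because each distribution is described by a small text file, the manifests themselves can be kept under version control alongside the notebooks.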

As a jobbing computational scientist who has spent long years discovering wrong ways to do things, I will just point out that trying to push a design beyond its intended limits is likely to impede the development towards the main goal (though many improvements are also the result of user suggestions and requests). Often there are much lower-complexity solutions available that will satisfy your specific needs without imposing their cost on others.

This note is offered in a spirit of scientific sharing. I know that people often struggle to use computers, because the activity of doing so is peripheral to research. I learned to do things this way through long years of experience. Ignorance is not a crime, and fortunately (unlike stupidity) it can be cured in rational people by the application of information.

Anyway, back to work ...

regards
 Steve

Steve Holden

On Thu, Jan 26, 2017 at 9:06 PM, Klymak Jody <[hidden email]> wrote:

>> I think this goes far beyond what I had in mind. I think this
>> function or whatever would just be
>>
>> In [221]: x = long_calculation()  # x is 42
>>                %store_in_notebook x
>>
>> and in the new session
>>
>> In [1]: %store_in_notebook -restore_variables
>> In [2]: x
>> Out [2]: 42
>
> For my taste, I’d just save that result in a file (`pickle` or
> `shelve`, or netCDF if I wanted to be formal).  It's a lot more
> transparent what is going on.
>
> Imagine this case:  I `%store_in_notebook` the results of a long
> calculation, and then remove that code from the notebook for some
> reason.  I might very well wonder a year from now why my notebook is
> 50 Gb, and have no documentation of how it got that way.
>
> However, if you do have a whole slew of variables you suddenly want to
> save, did you try `dill`?
>
> import dill
> import numpy as np
>
> filename= 'globalsave.pkl'
>
> if 1:
>     x = np.arange(20)
>     dill.dump_session(filename)
> else:
>     dill.load_session(filename)
>
> Cheers,  Jody






Re: storing variables *in* the notebook

Kiko
Hi all,

My 2 cents...

Why not store that in the notebook metadata? something like:

x = [1,2,3]
%store_in_notebook x # Would add the info to the notebook metadata

that is added to:

{
  "kernelspec": {...},
  "stored": {
    "x": [1, 2, 3]
  }
}

This way the data can be checked before restoring it in the notebook, and you could restore just one variable or all of them:

%store_in_notebook -restore x # would read the notebook metadata

or

%store_in_notebook -restore_all # would read the notebook metadata

Of course, this would only be useful for very basic types: strings, integers, floats, dicts, lists,...
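
For the restore side, a rough sketch: an .ipynb file is plain JSON, so the stored values can be read back from the saved file with the standard library alone. The "stored" key matches the example above; the function names and the idea of passing in a namespace dict are assumptions for illustration, and the notebook must have been saved after the metadata was written.

```python
import json


def read_stored(notebook_path, key="stored"):
    """Return the dict of stored variables from a saved .ipynb file."""
    with open(notebook_path, encoding="utf-8") as f:
        notebook = json.load(f)
    return notebook.get("metadata", {}).get(key, {})


def restore(notebook_path, namespace, names=None, key="stored"):
    """Copy stored variables into `namespace`; restore all if `names` is None."""
    stored = read_stored(notebook_path, key)
    if names is None:
        names = list(stored)
    for name in names:
        namespace[name] = stored[name]
    return names
```

Inside a notebook one could pass `globals()` as the namespace, after inspecting the result of `read_stored` first.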


Re: storing variables *in* the notebook

Zoltán Vörös-2


On 01/27/2017 10:01 AM, Kiko wrote:

> Hi all,
>
> My 2 cents...
>
> Why not store that in the notebook metadata? something like:
>
> x = [1,2,3]
> %store_in_notebook x # Would add the info to the notebook metadata
>
> that is added to:
>
> {
>   kernelspec: {...}
>   stored:{
>     x: [1,2,3]
>   }
> }
>

I can't tell why, but somehow this idea seems a bit familiar to me...

Re: storing variables *in* the notebook

Kiko



> I can't tell why, but somehow this idea seems a bit familiar to me...

Aha.

OK, I only skimmed the thread, and my 2 cents' worth of ideas are already there...

Sorry for the noise :-P


Re: storing variables *in* the notebook

Kiko


2017-01-27 16:16 GMT+01:00 Kiko <[hidden email]>:



>> I can't tell why, but somehow this idea seems a bit familiar to me...
>
> Aha.
>
> OK, I only skimmed the thread, and my 2 cents' worth of ideas are already there...
>
> Sorry for the noise :-P

Here is an example of how to write to your metadata. This should provide some hints for creating a line magic to do so:

import json

from IPython.display import Javascript

x = [1, 2, 3]

def add_metadata(var_name, var):
    # Relies on the classic notebook's JavaScript object `Jupyter.notebook`.
    js = """
if (Jupyter.notebook.metadata['stored'] === undefined) {{
    Jupyter.notebook.metadata['stored'] = {{}};
}}
Jupyter.notebook.metadata['stored'][{var_name}] = {var_content_serialised};
"""
    # json.dumps gives valid JavaScript literals; repr() would emit True/None.
    return Javascript(js.format(var_name=json.dumps(var_name),
                                var_content_serialised=json.dumps(var)))

add_metadata('x', x)
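
Building on the snippet above, a line magic could look roughly like the sketch below. The magic name `store_in_notebook` is the hypothetical one used in this thread, `build_store_js` is a helper invented here, and the generated JavaScript assumes the classic notebook front end, where a global `Jupyter.notebook` object exists. The magic is only registered when the code runs inside IPython.

```python
import json


def build_store_js(var_name, value, key="stored"):
    """Return JavaScript that writes `value` into the notebook metadata."""
    payload = json.dumps(value)  # raises TypeError for non-JSON types
    return (
        "if (Jupyter.notebook.metadata[{k}] === undefined) {{"
        " Jupyter.notebook.metadata[{k}] = {{}}; }}\n"
        "Jupyter.notebook.metadata[{k}][{n}] = {v};"
    ).format(k=json.dumps(key), n=json.dumps(var_name), v=payload)


try:
    from IPython import get_ipython
    from IPython.core.magic import register_line_magic
    from IPython.display import Javascript, display
except ImportError:  # IPython is not installed
    get_ipython = None

if get_ipython is not None and get_ipython() is not None:

    @register_line_magic
    def store_in_notebook(line):
        """Usage: %store_in_notebook x"""
        shell = get_ipython()
        name = line.strip()
        # Look the variable up in the user's interactive namespace.
        display(Javascript(build_store_js(name, shell.user_ns[name])))
```

Using `json.dumps` restricts the magic to the basic types mentioned earlier and fails loudly for anything else.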



Re: storing variables *in* the notebook

Zoltán Vörös-2

On 01/27/2017 05:53 PM, Kiko wrote:

>
> An example about how to write to your metadata. This should provide
> some hints to create a line magic to do so:
>
> from IPython.display import Javascript
>
> x = [1,2,3]
>
> def add_metadata(var_name, var):
>     js = """
> if (Jupyter.notebook.metadata['stored'] ===
> undefined){{Jupyter.notebook.metadata['stored'] = {{}}}}
> Jupyter.notebook.metadata['stored']['{var_name}']={var_content_serialised}
> """
>     return Javascript(js.format(var_name=var_name,
> var_content_serialised=var.__repr__()))
>
> add_metadata('x', x)

Thanks for the pointer! I had already figured out that I could re-use the
code in the notebook extension 'toc2', and that of the %store magic, but
here you present a more or less complete solution. Thanks!

Zoltán