Notebook format "incompatible" changes

Matthias Bussonnier
Hello Jovyans and other beings from the Jupyter Galilean moons system,


I'm writing to warn you that on Monday (probably in the morning) a relatively important change
in IPython will land on master. In particular, it changes the notebook structure quite a bit.

Cf. https://github.com/ipython/ipython/pull/6045 for more information.

As nothing is ever perfect, please back up your notebooks before upgrading. With this update,
notebooks saved by the new version of IPython won't be compatible with
older versions of IPython anymore, nor will nbviewer be able to render them.

The new notebook format will be back-ported to older (2.x) versions of IPython, and it
will be supported on nbviewer; we will just need a few days to port all the changes.

There will of course be a way to manually downgrade a notebook from v4 to v3 using nbconvert.

The more adventurous can, if they wish, test the branch this weekend.

As usual, be prepared for data loss and bugs, so update your git branches,
refresh submodules, clear your browser caches, roll up your sleeves, and send bug reports.

Cheers, 
-- 
Matthias





_______________________________________________
IPython-dev mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/ipython-dev

Re: Notebook format "incompatible" changes

Brian Granger-3
Matthias,

Thanks for posting here about these changes!

And thanks to the whole team (esp Min) for working on all of this!

Cheers,

Brian

On Sat, Nov 1, 2014 at 3:12 AM, Matthias Bussonnier
<[hidden email]> wrote:

> Cf https://github.com/ipython/ipython/pull/6045 for more information.

--
Brian E. Granger
Cal Poly State University, San Luis Obispo
@ellisonbg on Twitter and GitHub
[hidden email] and [hidden email]

Re: Notebook format "incompatible" changes

Carlos Córdoba
Hi,

Could you explain a bit more what this change is about?

Cheers,
Carlos

Re: Notebook format "incompatible" changes

Matthias Bussonnier
Hi Carlos,

TL;DR at the end.

With notebook format v4, the way notebooks are written to disk will change.
After some time with v3, we know that a few things could have been done better
or in a more general way, so we improved the notebook format.

Once 3.0 and 2.4 are released, this shouldn't change much for the end user.
I'll go over some of the changes in the notebook format and the reasons behind them, but you
should read the IPEP for all the details.


Get rid of Python-specific names.
pyin, pyout, and pyerr are renamed to input, output, and error (roughly; read the IPEP for the exact names).
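
To make the renaming concrete, here is a minimal Python sketch. It assumes that the v3 output types `pyout` and `pyerr` become `execute_result` and `error` in v4 files; the helper name `rename_output_type` is made up for this illustration, and the IPEP remains the reference for the exact names.

```python
# Assumed mapping from v3 output types to their language-neutral v4 names.
# Types that are not listed (e.g. "stream", "display_data") are left as-is.
V3_TO_V4_OUTPUT_TYPE = {
    "pyout": "execute_result",
    "pyerr": "error",
}


def rename_output_type(output):
    """Return a copy of a v3-style output dict with a language-neutral type."""
    renamed = dict(output)  # shallow copy, the caller's dict is not modified
    old_type = renamed.get("output_type")
    if old_type in V3_TO_V4_OUTPUT_TYPE:
        renamed["output_type"] = V3_TO_V4_OUTPUT_TYPE[old_type]
    return renamed


if __name__ == "__main__":
    print(rename_output_type({"output_type": "pyout", "text": "42"}))
```

This only touches the type name; a real converter has more to do (field names, prompt numbers, and so on).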

Make the jpg/png/text keys in outputs uniform by using MIME types.
You can now store application/x-pdf or doc/microsoft-word in a notebook.
If the frontend knows how to read it, it will do something with it, I suppose.
This removes a lot of special casing.
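
As a rough sketch of what keying outputs by MIME type means, the Python below splits a v3-style display output into its MIME-keyed data and everything else. The short-key table and the function name `to_mimebundle` are assumptions made for this example, not the converter's actual code, and it is only meant for display-type outputs.

```python
# Assumed short-key -> MIME-type table, for illustration only.
SHORT_KEY_TO_MIME = {
    "text": "text/plain",
    "html": "text/html",
    "png": "image/png",
    "jpeg": "image/jpeg",
    "svg": "image/svg+xml",
    "latex": "text/latex",
}


def to_mimebundle(v3_output):
    """Split a v3-style display output into (other_fields, mime_keyed_data)."""
    other, data = {}, {}
    for key, value in v3_output.items():
        if key in SHORT_KEY_TO_MIME:
            # Known short key: store the payload under its MIME type.
            data[SHORT_KEY_TO_MIME[key]] = value
        else:
            # Anything else (output_type, metadata, ...) is passed through.
            other[key] = value
    return other, data


if __name__ == "__main__":
    print(to_mimebundle({"output_type": "display_data", "png": "BASE64DATA"}))
```

With MIME types as keys, a frontend no longer needs a special case per format: it can look up whichever types it knows how to render.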

Get rid of worksheets: almost nobody uses them, and there are other, smarter ways to
store them if we wanted to.

Text cells and code cells stored their content under different keys (in v3, 'source' for text cells and 'input' for code cells),
which is a bit silly, as it forces you to do special casing where none is necessary. In v4, both use 'source'.
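
The two structural changes above (no more worksheets, one key for cell content) can be sketched in a few lines of Python. This assumes a v3-like layout of `{"worksheets": [{"cells": [...]}]}` with code cell text stored under `input`; `flatten_cells` is a hypothetical helper, not part of IPython.

```python
def flatten_cells(v3_notebook):
    """Collect cells from all worksheets into one flat list, v4 style.

    Assumes a v3-like layout: {"worksheets": [{"cells": [...]}, ...]},
    with the text of code cells stored under "input".
    """
    cells = []
    for worksheet in v3_notebook.get("worksheets", []):
        for cell in worksheet.get("cells", []):
            cell = dict(cell)  # shallow copy so the input is left untouched
            if cell.get("cell_type") == "code" and "input" in cell:
                # Use the same key as every other cell type.
                cell["source"] = cell.pop("input")
            cells.append(cell)
    return cells


if __name__ == "__main__":
    nb = {"worksheets": [{"cells": [{"cell_type": "code", "input": "1 + 1"}]}]}
    print(flatten_cells(nb))
```

After this step, code that walks a notebook can read `cell["source"]` without checking the cell type first.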

Heading cells are now gone (from the file format). Technology evolves fast, and we can
now detect the #*n in markdown cells, convert them properly to LaTeX, and add anchors
in nbviewer.
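
For illustration, a heading cell can be folded into a markdown cell by prefixing its text with the right number of `#` characters. The sketch below assumes heading cells look like `{"cell_type": "heading", "level": 2, "source": "Title"}` with a single-line string as source; `heading_to_markdown` is a made-up name.

```python
def heading_to_markdown(cell):
    """Turn a v3-style heading cell into an equivalent markdown cell.

    Assumes heading cells look like
    {"cell_type": "heading", "level": 2, "source": "Title"}.
    Cells of any other type are returned unchanged.
    """
    if cell.get("cell_type") != "heading":
        return cell
    level = int(cell.get("level", 1))
    return {
        "cell_type": "markdown",
        "metadata": cell.get("metadata", {}),
        # "#" repeated `level` times is the markdown heading marker.
        "source": "#" * level + " " + cell.get("source", ""),
    }


if __name__ == "__main__":
    print(heading_to_markdown({"cell_type": "heading", "level": 3, "source": "Results"}))
```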

More importantly, we now have a JSON schema that describes the format.
So we can validate a notebook and know that, for example, a prompt number
is either null or a number. v3 implicitly allowed '*' (star), which would happen
if you saved while cells were running; this took both us and external
libraries by surprise, as converting some notebooks caused crashes. This also includes security risks,
which I won't elaborate on, because I won't, but those who know me know I love JavaScript
injection, and you can do nasty things on well-known websites. By ensuring the type of each field
of the notebook, you reduce the attack surface and protect FooBar Corp users from attacks,
even if FooBar Corp does not really respond when you do responsible disclosure and you
finally give up.
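
The prompt-number rule is easy to express as a type check. The sketch below uses only the standard library rather than a real JSON schema validator; the function names are hypothetical, and the `execution_count` key is my assumption for where v4 code cells keep the prompt number.

```python
def is_valid_prompt_number(value):
    """True if value is null (None) or an integer, per the rule described above."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    return value is None or (isinstance(value, int) and not isinstance(value, bool))


def find_bad_prompts(cells, key="execution_count"):
    """Return the indices of code cells whose prompt number has the wrong type."""
    return [
        index
        for index, cell in enumerate(cells)
        if cell.get("cell_type") == "code"
        and not is_valid_prompt_number(cell.get(key))
    ]


if __name__ == "__main__":
    cells = [
        {"cell_type": "code", "execution_count": 1},
        {"cell_type": "code", "execution_count": "*"},  # saved while running
        {"cell_type": "code", "execution_count": None},
    ]
    print(find_bad_prompts(cells))
```

A schema-based validator generalises this idea to every field, so a stray `'*'` is rejected when the file is read or written instead of crashing a converter later.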

I won't describe "All the things", but you see the big picture. 
It's better, faster, stronger.

TL;DR:

Wat? You don't know?! I've heard that IPython notebook format v3 might have been responsible
for the death of a huge number of kittens, due to developers banging their heads on their desks.
It is also probable that the extra computing power needed to run more tests because of its complexity
is in part responsible for global warming. Also, the new format prevents the use of %pylab, which kills
the endangered species of newbies. The naming convention was also really strange for PHP
coders who don't understand the py prefixes.

As we like kittens, newbies, developers, and PHP coders (but not PHP itself [1]), and dislike
global warming, we decided to fix that.

It also comes as ipynb v4+, which has a 5.5-inch screen, and we plan to jump from v8 to v10
directly. We also removed touch-id from the metadata so that police cannot force you to unlock your
ipynb.

Hope that sheds some light on a few of the reasons and on what will change.
Tell us if you have any more questions.


Cheers, 
-- 
M

[1]: But MinRK loves JavaScript.
 




Re: Notebook format "incompatible" changes

Fernando Perez
In reply to this post by Brian Granger-3

On Sat, Nov 1, 2014 at 11:01 AM, Brian Granger <[hidden email]> wrote:
And thanks to the whole team (esp Min) for working on all of this!

+lots. This has been a huge amount of slow, careful, not-very-sexy work.  I am really thankful for the patience the whole team has had working on this, to help us set the notebook format as a solid foundation for long-term sharing and archival of computational work.  

This kind of effort provides few immediate rewards, but can have very significant long-term value. Thanks a lot to everyone who pitched in on that PR...

Cheers,

f


--
Fernando Perez (@fperez_org; http://fperez.org)
fperez.net-at-gmail: mailing lists only (I ignore this when swamped!)
fernando.perez-at-berkeley: contact me here for any direct mail

Re: Notebook format "incompatible" changes

Wes Turner
Thanks!

1. Is there a link to the jsonschema? In the documentation?
2. How feasible would it be to write a JSON-LD context?
3. Where/how can/could Dublin Core metadata be added? It would be great to be able to index these documents with a title and authors.


Re: Notebook format "incompatible" changes

Matthias Bussonnier
Hi, 
On 3 Nov 2014, at 09:32, Wes Turner <[hidden email]> wrote:

Thanks!

1. Is there a link to the jsonschema? In the documentation?


2. How feasible would it be to write a JSON-LD context?

This should have been discussed while we were refactoring the notebook format,
not once it's ready to merge. Metadata is free-form, though, so you can add things like that if you like.

3. Where/how can/could Dublin Core metadata be added? It would be great to be able to index these documents with a title and authors.

Same as above.

The last two issues, about extra metadata, have been discussed extensively; the new format does not change
the outcome of those discussions or the problems that were raised. You can also refer to them.
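
Since metadata is free-form, adding a title and authors is just a matter of writing extra keys into the notebook's `metadata` dict. The Python sketch below shows one possible way; the `dc:title` and `dc:creator` key names are purely illustrative, the notebook format does not define them, and `add_descriptive_metadata` is a hypothetical helper.

```python
import json


def add_descriptive_metadata(notebook, title, authors):
    """Return a copy of a notebook dict with title/authors in its metadata.

    The "dc:title" and "dc:creator" key names are illustrative only;
    the notebook format does not define them.
    """
    updated = dict(notebook)
    metadata = dict(updated.get("metadata", {}))  # copy, do not mutate input
    metadata["dc:title"] = title
    metadata["dc:creator"] = list(authors)
    updated["metadata"] = metadata
    return updated


if __name__ == "__main__":
    nb = {"nbformat": 4, "nbformat_minor": 0, "metadata": {}, "cells": []}
    print(json.dumps(add_descriptive_metadata(nb, "My analysis", ["A. Author"]), indent=1))
```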

Cheers, 
-- 
M





Re: Notebook format "incompatible" changes

Nicholas Bollweg
Re: JSON-LD:

One of the goals of JSON-LD is to be able to add context to a well-defined corpus of data without changing its native format: in this case, the developers mean what they mean, and it's important that it be meant this way... so the challenge is how to extract a useful semantic model of the document out of the existing format.

The simplest approach is to use the context of:
{
  "@context": {
    "@vocab": "http://ipython.org/nbformat/v4/"
  }
}


This will map all object keys, and values which are told to be @ids, as URIs in that namespace: example.
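
To give a feel for what that context does to keys, here is a deliberately naive Python sketch that prefixes every plain key with the vocabulary URI. It is not a JSON-LD processor: real expansion also restructures values and treats keys containing a colon differently, and `expand_keys` is a made-up helper.

```python
VOCAB = "http://ipython.org/nbformat/v4/"


def expand_keys(value, vocab=VOCAB):
    """Recursively prefix plain dict keys with the vocabulary URI.

    A naive stand-in for JSON-LD expansion: keys starting with "@" are
    left alone, and values are not rewritten at all.
    """
    if isinstance(value, dict):
        return {
            (key if key.startswith("@") else vocab + key): expand_keys(item, vocab)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [expand_keys(item, vocab) for item in value]
    return value


if __name__ == "__main__":
    print(expand_keys({"cell_type": "markdown", "metadata": {"collapsed": False}}))
```

Note how the key inside `metadata` lands in the same namespace as `cell_type`, which is the "everything falls into the main namespace" effect.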

(the playground doesn't have a way to do out-of-band expansion contexts, but it wouldn't have to be embedded like that)


A more thorough-going approach might yield some more interesting things, but this is a good starting point, and just a few additions to the above context would get really close to reflecting what is being said in a given notebook.

IPython notebook nbformat v4 JSONSchema: https://github.com/minrk/ipython/blob/nbformat4/IPython/nbformat/v4/nbformat.v4.schema.json

The schema is a great start: I haven't tried it, but there are some tools to automatically generate a context from a schema.

Some things that might be interesting to map to JSON-LD:
  • patternProperties in mimebundle.
    • In this case, though, it's referring to a large but agreed-upon enumeration of values (and not, like package.json's dependencies, an infinite number of package names).
    • with the above context, these would be lumped into the root of the namespace: http://ipython.org/nbformat/v4/text/html
      • this isn't so bad, probably
  • all the enums, e.g. cell_type: markdown
    • with the naive context above, it will map to a string
    • by setting the @type of cell_type to be @id in the context, markdown would expand to the URI
      http://ipython.org/nbformat/v4/markdown
      • this isn't so bad
    • Another option is to treat enums as more xml-like literals of a specific type, by setting @type of cell_type to be something like CellType
      • also not so bad
    • The advantage to having URIs vs. literals (they are both queryable) is that URIs can be the subject of something, and not just the object... not sure what we'd want to say in this case.
  • the wild west of metadata
    • JSON-LD can't tell the difference between cell metadata and notebook metadata
      • this is not so bad, as it is always "isolated" within the context of the <thing>s metadata, and wouldn't "pollute" the parent
    • with the naive context, everything will just fall into the main namespace.
      • this is bad. I don't see anything that can be done about it
  • all the lists
    • in JSON-LD, one can specify @container: @list
      • these are pretty bad in RDF, as it uses a bizarre lisp-like first and rest to represent them
  • nothing in the root that can map to an @id or @type
    • @id: not going there today
    • @type: nbformat is close, but there's nothing but duck typing to say, "I am a notebook"
    • loading these up into a graph would be interesting, as they would all just be blank nodes knocking around
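
The enum idea from the list above can be sketched as a slightly richer context. One caveat, as far as I understand JSON-LD 1.0: for a string value to expand against `@vocab`, the term is typed as `@vocab` (typing it as `@id` resolves against the document base instead), so this example uses `@vocab`. The `expand_term_value` helper is hypothetical and only mimics that one rule.

```python
import json

CONTEXT = {
    "@vocab": "http://ipython.org/nbformat/v4/",
    # Treat cell_type values as vocabulary terms rather than plain strings.
    "cell_type": {"@type": "@vocab"},
}


def expand_term_value(key, value, context=CONTEXT):
    """Expand a string value to a URI if its key is typed as "@vocab"."""
    definition = context.get(key)
    if isinstance(definition, dict) and definition.get("@type") == "@vocab":
        return context["@vocab"] + value
    # Untyped terms keep their literal value.
    return value


if __name__ == "__main__":
    print(json.dumps({"@context": CONTEXT}, indent=2))
    print(expand_term_value("cell_type", "markdown"))
```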
I'll do some more poking around, but think this is worth having!

Re: Notebook format "incompatible" changes

Nathan Goldbaum
In reply to this post by Matthias Bussonnier


On Sun, Nov 2, 2014 at 10:37 AM, Matthias Bussonnier <[hidden email]> wrote:
Hope that sheds some light on a few of the reasons and on what will change.
Tell us if you have any more questions.

You don't mention it here, but nbformat v4 also includes the autoscroll behavior for a cell as part of the notebook metadata, meaning no more resetting autoscroll, and issue 2172 can be fixed!

 


Re: Notebook format "incompatible" changes

Wes Turner
In reply to this post by Nicholas Bollweg
> I'll do some more poking around, but think this is worth having!

Thank you so much!

On Mon, Nov 3, 2014 at 4:51 PM, Nicholas Bollweg <[hidden email]> wrote:
Re: JSON-LD:

One of the goals of JSON-LD is to be able to add context to a well-defined corpus of data without changing its native format: in this case, the developers mean what they mean, and its important that it be meant this way... so the challenge is how to extract a useful semantic model of the document out of the existing format.

The simplest approach is to use the context of:
{
  "@context": {
    "@vocab": "http://ipython.org/nbformat/v4/"
  }
}


This will map all object keys, and values which are told to be @ids, as URIs in that namespace: example.

(the playground doesn't have a way to do out-of-band expansion contexts, but it wouldn't have to be embedded like that)


A more thorough-going approach might yield some more interesting things, but this is a good starting point, and just a few additions to the above context would be get really close to reflecting what is being said in a given notebook.

IPython notebook nbformat v4 JSONSchema: https://github.com/minrk/ipython/blob/nbformat4/IPython/nbformat/v4/nbformat.v4.schema.json

The schema is a great start: I haven't tried it, but there are some tools to automatically generate context from schema:

some things that might be interesting to map to JSON-LD:
  • patternProperties in mimebundle.
    • In this case, though, it's referring to large, but agreed-upon enumeration of values (and not, like package.json's dependencies, an infinite number of package names).
    • with the above context, these would be lumped into the root of the namespace: http://ipython.org/nbformat/v4/text/html
      • this isn't so bad, probably
  • all the enums, i.e. cell_type: markdown
    • with the naive context above, it will map to a string
    • by setting the @type of cell_type to be @id in the context, markdown would expand to the URI
      http://ipython.org/nbformat/v4/markdown
      • this isn't so bad
    • Another option is to treat enums as more xml-like literals of a specific type, by setting @type of cell_type to be something like CellType
      • also not so bad
    • The advantage to having URIs vs. literals (they are both queryable) is that URIs can be the subject of something, and not just the object... not sure what we'd want to say in this case.
  • the wild west of metadata
    • JSON-LD can't tell the difference between cell metadata and notebook metadata
      • this is not so bad, as it is always "isolated" within the context of the <thing>s metadata, and wouldn't "pollute" the parent
    • with the naive context, everything will just fall into the main namespace.
      • this is bad. i don't see anything that can be done about it
  • all the lists
    • in JSON-LD, one can specify @container: @list
      • these are pretty bad in RDF, as it uses a bizarre lisp-like first and rest to represent them
  • nothing in the root that can map to an @id or @type
    • @id: not going there today
    • @type: nbformat is close, but there's nothing but duck typing to say, "I am a notebook"
    • loading these up into a graph would be interesting, as they would all just be blank nodes knocking around
I'll do some more poking around, but think this is worth having!

On Mon, Nov 3, 2014 at 4:21 AM, Wes Turner <[hidden email]> wrote:

On Mon, Nov 3, 2014 at 3:05 AM, Matthias Bussonnier <[hidden email]> wrote:
Hi, 
Le 3 nov. 2014 à 09:32, Wes Turner <[hidden email]> a écrit :

Thanks!

1. Is there a link to the jsonschema? In the documentation?


2. How feasible would it be to write a JSON-LD context?

This should have been discussed while we were refactoring the notebook format, 
not once it's ready to merge. Metadata is free-form, though, so you can add things like that if you like.

3. Where/how can/could Dublin Core metadata be added? It would be great to be able to index these documents with a title and authors.

Same as above.

The last two issues, about extra metadata, have already been discussed extensively, and the new format does not change
those discussions or the problems they raised. You can also refer to them.

Cheers, 
-- 
M





On Sun, Nov 2, 2014 at 5:23 PM, Fernando Perez <[hidden email]> wrote:

On Sat, Nov 1, 2014 at 11:01 AM, Brian Granger <[hidden email]> wrote:
And thanks to the whole team (esp Min) for working on all of this!

+lots. This has been a huge amount of slow, careful, not-very-sexy work.  I am really thankful for the patience the whole team has had working on this, to help us set the notebook format as a solid foundation for long-term sharing and archival of computational work.  

This kind of effort provides few immediate rewards, but can have very significant long-term value. Thanks a lot to everyone who pitched in on that PR...

Cheers,

f


--
Fernando Perez (@fperez_org; http://fperez.org)
fperez.net-at-gmail: mailing lists only (I ignore this when swamped!)
fernando.perez-at-berkeley: contact me here for any direct mail


Re: Notebook format "incompatible" changes

Nicholas Bollweg
News from the JSON-LD front: Everything looks pretty good, except for the mime-types.

Here's the gist. Here's a JSON-LD playground. Here's the cross-post to public-linked-json.

  • A term can't be mapped to two things, say cells.0.cell_type to both the @type and nb4:cell_type. I've opted for @type, as I think the goal here is to make a context that makes content more machine-understandable, rather than something that can round-trip back to the original format.
  • I've manually added a few mime types as lists, but this task is almost impossible, as it's completely arbitrary... this means you won't be able to ask graph questions about your base64-encoded application/x-pdf, unless it has been captured. 
  • I really wish there was a way to get an @id. Changing signature from foaf:sha1 to @id is so tempting, but then all the URIs would, again, be off the main namespace. However, since the signature contains some secret salt, it may not be worthwhile, as it's capturing notebook+user... though maybe that is useful.

I think fully understanding the meaning of a notebook would most likely require some preprocessing, such as an nbconvert exporter :). 
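
As a rough illustration of that kind of preprocessing, here is a minimal Python sketch. It is not an nbconvert exporter and uses no nbconvert API; notebook_to_jsonld, the "Notebook" type name, and the example.org context URL are all hypothetical. It shows two of the points above: aliasing cell_type to the @type keyword, and giving the root an explicit @type instead of relying on duck typing.

```python
import copy


def notebook_to_jsonld(nb, context_url):
    """Return a copy of a notebook dict with JSON-LD keywords added (hypothetical helper)."""
    doc = copy.deepcopy(nb)  # leave the original notebook untouched
    # Second context entry aliases the existing cell_type key to the @type keyword.
    doc["@context"] = [context_url, {"cell_type": "@type"}]
    # Assumed type name; nothing in the format defines it.
    doc["@type"] = "Notebook"
    return doc


nb = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [{"cell_type": "markdown", "metadata": {}, "source": "# Title"}],
}

# Placeholder URL: a real context would have to be published somewhere.
ld = notebook_to_jsonld(nb, "http://example.org/nbformat-v4-context.jsonld")
```

Because the aliasing lives in the context, the cells themselves are unchanged, so the output is still readable as an ordinary notebook by anything that ignores the @-prefixed keys.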

Cheers!


Re: Notebook format "incompatible" changes

Wes Turner

So, if I was to try and add structured data RDF properties, I would add those as keys in the notebook-level metadata object?

I'm assuming I could just merge the nbformatv4 @context with an @context that would allow me to abbreviate additional URI predicates as regular JSON object strings.

OTTOMH:

dcterms:title
dcterms:creator
lang:en
schema:author
schema:description
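
A minimal sketch of what that could look like, in Python. The prefix URIs are the usual ones for Dublin Core terms and schema.org; the metadata values are placeholders, and expand_key is a hypothetical toy that only handles prefix:suffix keys, not a real JSON-LD processor.

```python
# Prefix definitions that would be merged with the nbformat v4 @context.
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "schema": "http://schema.org/",
}

# Notebook-level metadata with abbreviated predicates as plain JSON keys.
# All values below are placeholders.
metadata = {
    "@context": PREFIXES,
    "dcterms:title": "An example notebook",
    "dcterms:creator": "A. Author",
    "schema:description": "Placeholder description",
}


def expand_key(key, prefixes):
    """Toy compact-IRI expansion: 'dcterms:title' -> full URI when the prefix is known."""
    prefix, sep, suffix = key.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + suffix
    return key


expanded = {
    expand_key(key, metadata["@context"]): value
    for key, value in metadata.items()
    if key != "@context"
}
```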

Yesterday, I sent an email to the [hidden email] (schema.org) list asking what subtype of schema:CreativeWork would most accurately describe an IPython notebook, but have not yet received a response.

At the cell-level, it would be cool if we could say, "this is the abstract", "these are figures derived from this dataset with this URI retrieved from this URL" and "this is the conclusion" e.g. for PLoS.

I've looked at PROV for something like recording an additive journal of @datastep transformations ( https://github.com/pydata/pandas/issues/3402 ) but have nothing like an actual implementation. It would be great to see it.

For clinical studies (e.g. RCTs), an ontology of study-control URIs would also be helpful.

Thanks again for this @context, this is great work.

On Nov 4, 2014 9:15 AM, "Nicholas Bollweg" <[hidden email]> wrote:
News from the JSON-LD front: Everything looks pretty good, except for the mime-types.

Here's the gist. Here's a JSON-LD playground. Here's the cross-post to public-linked-json.

  • A term can't be mapped to two things, say cells.0.cell_type to both the @type and nb4:cell_type. I've opted for @type, as I think the goal here is to make a context that makes content more machine-understandable, rather than something that can round-trip back to the original format.
  • I've manually added a few mime types as lists, but this task is almost impossible, as it's completely arbitrary... this means you won't be able to ask graph questions about your base64-encoded application/x-pdf, unless it has been captured. 
  • I really wish there was a way to get an @id. Changing signature from foaf:sha1 to @id is so tempting, but then all the URIs would, again, be off the main namespace. However, since the signature contains some secret salt, it may not be worthwhile, as it's capturing notebook+user... though maybe that is useful.

I think fully understanding the meaning of a notebook would most likely require some preprocessing, such as an nbconvert exporter :). 

Cheers!


Re: Notebook format "incompatible" changes

Nicholas Bollweg

Assuming @context were actually in /@context in the notebook, it could be explicitly amended by using an array of contexts:

"@context": [
  "http://foo/bar/json.ld",
  {"dc": "http://..."}
]

However, i'm not sure this assumption would be appropriate, as, for example, the format doesn't include an explicit $schema... perhaps, again in an nbconvert plugin.

For everything you described, i think you could just add an inline @context to the /metadata or /cells/0/metadata. This will override any other meaning of like-named terms in that object and its children. You can't use all keywords, like @base and @vocab, though.
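
To illustrate how an array of contexts behaves, here is a minimal Python sketch. My understanding is that contexts in an array are applied in order and later term definitions override earlier ones. merge_contexts is a hypothetical toy: it does not fetch anything, and the contents attributed to http://foo/bar/json.ld in the remote dict are invented for the example.

```python
def merge_contexts(contexts, remote=None):
    """Toy merge of a JSON-LD context array: later definitions win (hypothetical helper)."""
    remote = remote or {}
    merged = {}
    for ctx in contexts:
        if isinstance(ctx, str):
            # A string is a context URL; look it up instead of fetching it.
            ctx = remote.get(ctx, {})
        merged.update(ctx)
    return merged


base_url = "http://foo/bar/json.ld"

# Invented stand-in for what the remote context might define.
remote = {
    base_url: {
        "title": "http://example.org/nb/title",
        "cells": {"@id": "http://example.org/nb/cells", "@container": "@list"},
    }
}

# The inline context redefines "title" and leaves "cells" alone.
active = merge_contexts(
    [base_url, {"title": "http://purl.org/dc/terms/title"}],
    remote,
)
```

The same override rule is what lets an inline @context under /metadata change the meaning of like-named terms for that object and its children only.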

Your use cases are great: true to semantic-nerd form, I was thinking about exploiting the data for learning about how people use the notebook, rather than telling other exploiters what a given notebook means. With the explicit metadata schema you describe, all the properties would be about the /metadata or /cells/0/metadata, and not the notebook or cell, but then that was the point of metadata: to keep the "wild west" out of the notebook and cell roots. 

As to the particular ontologies used: that's a whole other kettle of fish! schema.org, doap, foaf, dc, skos?!? I see the exposure of this to users being a "Linked Data" nbextension, which in turn can offer different notebook/cell-level metadata annotations. Once the metadata tag UI lands, we'll have a better baseline for what this should be like.



Re: Notebook format "incompatible" changes

Satrajit Ghosh
hi nicholas and wes,

i had echoed wes' comments over on the pull-request. i think we should move this to an IPEP and continue the discussion there. i would personally like to see this integrated in the next iteration of the notebook format, as it would make the document a first-class citizen in the linked data web. as a start it could be something as simple as adding @context at a global level and allowing modification of the context pointer within the notebook ui.

cheers,

satra

On Wed, Nov 5, 2014 at 10:59 AM, Nicholas Bollweg <[hidden email]> wrote:

Assuming @context were actually in /@context in the notebook, it could be explicitly amended by using an array of contexts:

"@context": [
  "http://foo/bar/json.ld",
  {"dc": "http://..."}
]

However, i'm not sure this assumption would be appropriate, as, for example, the format doesn't include an explicit $schema... perhaps, again in an nbconvert plugin.

For everything you described, i think you could just add an inline @context to the /metadata or /cells/0/metadata. This will override any other meaning of like-named terms in that object and its children. You can't use all keywords, like @base and @vocab, though.

Your use cases are great: true to semantic-nerd form, I was thinking about exploiting the data for learning about how people use the notebook, rather than telling other exploiters what a given notebook means. With the explicit metadata schema you describe, all the properties would be about the /metadata or /cells/0/metadata, and not the notebook or cell, but then that was the point of metadata: to keep the "wild west" out of the notebook and cell roots. 

As to the particular ontologies used: that's a whole other kettle of fish! schema.org, doap, foaf, dc, skos?!? I see the exposure of this to users being a "Linked Data" nbextension, which in turn can offer different notebook/cell-level metadata annotations. Once the metadata tag UI lands, we'll have a better baseline for what this should be like.

