what about OpenDocument?


Chad Whitacre
Dear all,

Here's a crazy idea: what would it look like to maintain the Python
documentation in OpenDocument format?


Pros
====

   - robust authoring environment: OpenOffice
   - drastically lower barrier to entry for authors
   - custom markup == custom OOo stylesheet (!)
   - open XML-based format means easy-to-write tools
   - trivial PDF conversion
   - powerful typesetting
   - nice boost for ODF/OOo


Cons
====

   - HTML version would take work; default not good enough
   - harder to diff/version?
   - um ..., um ...


"Hey, I'd like to write documentation for my new Python module."
"Ok, download the OOo template from python.org and go to it!"




chad

_______________________________________________
Doc-SIG maillist  -  [hidden email]
http://mail.python.org/mailman/listinfo/doc-sig

Re: what about OpenDocument?

Chad Whitacre
Dear All,

Attached is the documentation for the random module in ODT format. The
file includes paragraph, character, and list styles for Python objects.
Also attached is a script for playing with this ODT documentation.

Here are my conclusions from this brief foray:

   - Once we had a stylesheet, ODT would make it *much* easier to write
     documentation: everyone knows how to use a word processor.

   - However, the stylesheet itself is almost too easy to change. Any
     tools would depend on it for their API, so it needs to be stable.

   - OOo is not super-flexible when it comes to styles. I don't see that
     you can nest paragraph-level styles, for example.

   - The way that OOo stores information is not the easiest to work with,
     e.g.:

         <func>foo([</func><param>foo</param><func>])</func>

     rather than:

         <func>foo([<param>foo</param>])</func>


Bottom line: this is a hacked-up, watered-down, non-robust version of an
XML solution. Too easy for authors, and too hard for tool-smiths. I'd
say it might be an improvement over the status quo, but not enough of
one to warrant switching.
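
To make the tool-smith complaint concrete, here is a minimal sketch (not
part of the attached script) of stitching one signature back together from
the flat spans. The fragment is hand-written in the shape shown above, and
the "Parameter" style name is an assumption for illustration; only
Function_20_Definition is taken from the attached script.

```python
from xml.dom import minidom

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Hypothetical fragment in the flat shape described above; "Parameter" is an
# assumed style name, not one confirmed by the attached stylesheet.
SAMPLE = (
    '<text:p xmlns:text="%s">'
    '<text:span text:style-name="Function_20_Definition">foo([</text:span>'
    '<text:span text:style-name="Parameter">bar</text:span>'
    '<text:span text:style-name="Function_20_Definition">])</text:span>'
    '</text:p>' % TEXT_NS
)


def signature(paragraph):
    """Return (full signature text, parameter names) for one paragraph."""
    text, params = [], []
    for span in paragraph.getElementsByTagName('text:span'):
        # Join the text nodes directly inside this span.
        value = ''.join(node.data for node in span.childNodes
                        if node.nodeType == node.TEXT_NODE)
        text.append(value)
        if span.getAttribute('text:style-name') == 'Parameter':
            params.append(value)
    return ''.join(text), params


paragraph = minidom.parseString(SAMPLE).documentElement
full, params = signature(paragraph)
print(full, params)
```

Every tool would have to repeat this kind of sibling-walking, because the
parameter span sits next to the function spans instead of inside one.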



chad

#!/usr/bin/env python
"""Sandbox for playing with ODT files as Python doc.
"""
from zipfile import ZipFile
from xml.dom import minidom


class Function:
    """Represent the documentation for a function.
    """

    def __init__(self, name, params=None, description=''):
        self.name = name
        self.params = params or []
        self.description = description

    def __repr__(self):
        return '<doc for %s>' % self.name
    __str__ = __repr__

# Methods and classes are documented the same way for now.
Method = Function
Class = Method


class Module:
    """Represent the documentation for a module.
    """

    def __init__(self, path):
        """Path points to an ODF file; we want the content as an xml.Document.
        """
        # Pass the path itself so that zipfile opens the archive in binary
        # mode; a text-mode file object, as open(path) returns, corrupts
        # binary reads (notably on Windows).
        with ZipFile(path) as odt:
            odt.testzip()
            content = odt.read('content.xml')
        self._dom = minidom.parseString(content)

    def _styled_text(self, style):
        """Generate the leading text of each span that uses the given style.
        """
        for el in self._dom.getElementsByTagName('text:span'):
            if el.getAttribute('text:style-name') == style:
                # Empty spans have no child node to read.
                val = el.firstChild.nodeValue if el.firstChild else None
                if isinstance(val, str):
                    yield val.strip('[()], ')

    def functions(self):
        """Generate functions defined in this module.
        """
        for funcname in self._styled_text("Function_20_Definition"):
            if funcname:
                yield Function(funcname)

    def classes(self):
        """Generate classes defined in this module.
        """
        for text in self._styled_text("Class_20_Definition"):
            classname = text[6:]  # drop the leading "class "
            if classname:
                yield Class(classname)


doc = Module('random.odt')

print()
print("Classes:")
for c in doc.classes():
    print('  ', c)
print()
print("Functions:")
for f in doc.functions():
    print('  ', f)
print()

import code; code.interact(local=locals())

Attachment: random.odt (23K)

Re: what about OpenDocument?

Skip Montanaro-3

    Chad> Attached is the documentation for the random module in ODT
    Chad> format...

I've never heard of OpenDocument or ODT before.  You mentioned "word
processor".  Is it somehow related to OpenOffice?

Skip

Re: what about OpenDocument?

Chad Whitacre
Skip,

Yes, sorry. You'll see that I mention OpenOffice (OOo) in my first post,
although I didn't define ODT explicitly. From Wikipedia:

   The OpenDocument format (ODF), short for the OASIS Open Document
   Format for Office Applications, is an open document file format for
   saving and exchanging editable office documents such as text documents
   (including memos, reports, and books), spreadsheets, charts, and
   presentations. This standard was developed by the OASIS industry
   consortium, based upon the XML-based file format originally created by
   OpenOffice.org.

   http://en.wikipedia.org/wiki/OpenDocument

ODT stands for "OpenDocument Text."



chad





Re: what about OpenDocument?

Laura Creighton

So it is a binary file format?  If so, that will be a problem.  Anything
that produces output you cannot run through unix tools such as grep, and
anything that you cannot edit in your favourite text editor will be a
problem.

Laura

In a message of Fri, 30 Dec 2005 12:55:34 EST, Chad Whitacre writes:

>Skip,
>
>Yes, sorry. You'll see that I mention OpenOffice (OOo) in my first post,
>  although I didn't define ODT explicitly. From Wikipedia:
>
>   The OpenDocument format (ODF), short for the OASIS Open Document
>   Format for Office Applications, is an open document file format for
>   saving and exchanging editable office documents such as text documents
>   (including memos, reports, and books), spreadsheets, charts, and
>   presentations. This standard was developed by the OASIS industry
>   consortium, based upon the XML-based file format originally created by
>   OpenOffice.org.
>
>   http://en.wikipedia.org/wiki/OpenDocument
>
>ODT stands for "OpenDocument Text."

Re: what about OpenDocument?

Christopher Armstrong-2
On 12/31/05, Laura Creighton <[hidden email]> wrote:
> So it is a binary file format?  If so, that will be a problem.  Anything
> that produces output you cannot run through unix tools such as grep, and
> anything that you cannot edit in your favourite text editor will be a
> problem.

Well, all it is is a zip file that contains some XML files and some
subdirectories. content.xml is fairly easy to extract from the zip,
munge, and put back.
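
As a rough sketch of that extract/munge/put-back cycle, using only the
standard zipfile module. The rewrite_content name and the munge callback
are made up for illustration, and the demo archive is a stand-in rather
than a complete ODT.

```python
import os
import tempfile
import zipfile


def rewrite_content(src, dst, munge):
    """Copy the archive at src to dst, passing content.xml through munge()."""
    with zipfile.ZipFile(src) as old, zipfile.ZipFile(dst, 'w') as new:
        for info in old.infolist():
            data = old.read(info.filename)
            if info.filename == 'content.xml':
                data = munge(data)
            # Reuse each member's ZipInfo so order and compression carry over.
            new.writestr(info, data)


# Demo on a throwaway archive (a stand-in, not a complete ODT).
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'in.odt')
dst = os.path.join(workdir, 'out.odt')
with zipfile.ZipFile(src, 'w') as z:
    z.writestr('mimetype', 'application/vnd.oasis.opendocument.text')
    z.writestr('content.xml', '<doc>randint</doc>')

rewrite_content(src, dst, lambda xml: xml.replace(b'randint', b'randrange'))
with zipfile.ZipFile(dst) as z:
    result = z.read('content.xml')
    names = z.namelist()
print(result, names)
```

Keeping the member order matters because, as far as I know, ODF wants the
mimetype entry to come first and be stored uncompressed.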

--
  Twisted   |  Christopher Armstrong: International Man of Twistery
   Radix    |    -- http://radix.twistedmatrix.com
            |  Release Manager, Twisted Project
  \\\V///   |    -- http://twistedmatrix.com
   |o O|    |
w----v----w-+

Re: what about OpenDocument?

Laura Creighton
Yes, this is precisely what many of us dislike.  When we are working
on a project, and somebody changes the docs, we want a nice readable
svn checkin with the diffs, something that we can glance at and say
'yes that's correct' or 'oops, that's wrong'.  When instead you get
'binary file, no diffs available' your documentation starts living in
a world of its own, a world that you have to visit periodically and do
work to keep up with.

Laura

In a message of Sat, 31 Dec 2005 13:19:34 +1100, Christopher Armstrong writes:

>Well, all it is is a zip file that contains some XML files and some
>subdirectories. content.xml is fairly easy to extract from the zip,
>munge, and put back.

Re: what about OpenDocument?

Chad Whitacre
Laura,

Thanks for the feedback. You are correct that ODT is functionally a
binary format from the point of view of the Unix toolset, and you'll
notice that I mention "harder to diff/version?" as a con in my original
post.

The decision I am seeing out of this is between a manual markup language
and an automated one. Examples of the former would be LaTeX, ReST,
HTML+PythonDoc: in these formats the doc authors directly manipulate the
documentation source. ODT would be an example of the latter, which
requires a specialized authoring environment.

The problem with the former is that some folks don't like manually
coding markup, and so we then look for a balance between light markup
and adequacy to the task. The OpenOffice idea was a stab at solving the
problem in a different way: by not manually coding markup at all.
Besides the translucency of the format, my beef is that the markup it
gives us is inadequate, both in terms of encoding information and
accessing it.



chad




Re: what about OpenDocument?

Torsten Bronger
Hallöchen!

Not that I think ODT is a viable alternative but ...

Chad Whitacre <[hidden email]> writes:

> [...] The OpenOffice idea was a stab at solving the problem in a
> different way: by not manually coding markup at all.  Besides the
> translucency of the format,

Can't the main XML file be checked-in parallelly?  Can OOo calculate
a diff between two arbitrary versions?

> my beef is that the markup it gives us is inadequate, both in
> terms of encoding information and accessing it.

Well, OOo separates visual and structural markup very thoroughly, so
I don't understand the issue here.

Tschö,
Torsten.

--
Torsten Bronger, aquisgrana, europa vetus            ICQ 264-296-646


Re: what about OpenDocument?

Chad Whitacre
Torsten,

Thanks for jumping in.


> Can't the main XML file be checked-in parallelly?

Theoretically, yes. But that's an ugly hack, IMO, and we already have
one of those. :^)
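
For what it's worth, here is a rough sketch of what such a parallel
check-in could lean on: pretty-print content.xml so that each element gets
its own line, then run an ordinary text diff. The diffable and xml_diff
names are invented for this example, and the two payloads are toy
stand-ins; a real run would read content.xml out of each ODT with zipfile
first.

```python
import difflib
from xml.dom import minidom


def diffable(xml_bytes):
    """Pretty-print XML so that each element sits on its own line."""
    dom = minidom.parseString(xml_bytes)
    return dom.toprettyxml(indent='  ').splitlines()


def xml_diff(old, new):
    """Return a unified diff of two content.xml payloads as a list of lines."""
    return list(difflib.unified_diff(diffable(old), diffable(new),
                                     'old/content.xml', 'new/content.xml',
                                     lineterm=''))


# Toy payloads standing in for two revisions of content.xml.
old = b'<doc><p>randint(a, b)</p><p>random()</p></doc>'
new = b'<doc><p>randint(a, b)</p><p>uniform(a, b)</p></doc>'
diff = xml_diff(old, new)
for line in diff:
    print(line)
```

That gives a readable diff, but it still shows OOo's markup rather than the
text an author thinks they changed.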


> Can OOo calculate a diff between two arbitrary versions?

I'm not sure what OOo's versioning capabilities are, although I suspect
that versioning info would be stored in a separate part of the ODT and
not in the "main XML file" (which I take to refer to content.xml).


>>my beef is that the markup it gives us is inadequate, both in
>>terms of encoding information and accessing it.
>
> Well, OOo separates visual and structural markup very thoroughly, so
> I don't understand the issue here.

Sure, but my idea was to overload OOo's standard markup to encode
Python-documentation-specific information, like function definitions and
parameter lists. But when you use an OOo stylesheet to define what
amounts to a custom markup language, then the only hook you get within
their actual markup is a single attribute  -- text:style-name -- on only
about four different tags -- text:p, text:span, etc. Again, just another
ugly hack.
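
A small sketch of how thin that hook is: about all a tool can do is tally
which style names appear on which tags. The sample fragment is
hand-written, the root element and the "Function_20_Description" style
are assumptions for illustration, and style_hooks is not part of the
script I attached earlier.

```python
from collections import Counter
from xml.dom import minidom

TEXT_NS = 'urn:oasis:names:tc:opendocument:xmlns:text:1.0'

# Hand-written stand-in for a slice of content.xml; the root element and the
# "Function_20_Description" style are assumed for this example.
SAMPLE = (
    '<root xmlns:text="%s">'
    '<text:p text:style-name="Function_20_Description">'
    '<text:span text:style-name="Function_20_Definition">random()</text:span>'
    ' returns a float.</text:p>'
    '<text:p text:style-name="Text_20_body">See also '
    '<text:span text:style-name="Function_20_Definition">uniform(a, b)'
    '</text:span></text:p>'
    '</root>' % TEXT_NS
)


def style_hooks(dom):
    """Count (tag, style name) pairs on text:p and text:span elements."""
    hooks = Counter()
    for tag in ('text:p', 'text:span'):
        for el in dom.getElementsByTagName(tag):
            hooks[tag, el.getAttribute('text:style-name')] += 1
    return hooks


hooks = style_hooks(minidom.parseString(SAMPLE))
for (tag, style), count in sorted(hooks.items()):
    print(tag, style, count)
```

Everything Python-specific has to be squeezed through that one attribute
value, which is why I call it overloading.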



chad
