[issue10665] Update and expand unicodedata module documentation

STINNER Victor

New submission from Alexander Belopolsky <[hidden email]>:

The unicodedata module documentation has not been updated to reflect the transition to Unicode 6.0.  The attached patch fixes the version and unicode.org links and starts making the documentation rely less on the unicode.org pages for a basic understanding of the provided functionality.

I am posting work in progress to solicit feedback on how much of the Unicode Standard information we would want to present here.

One of the goals of this patch is to provide a standard reference that can be used throughout the library manual for basic Unicode concepts without sending the reader over to unicode.org.

----------
assignee: docs@python
components: Documentation
files: unicodedata-doc.diff
keywords: patch
messages: 123700
nosy: belopolsky, docs@python
priority: normal
severity: normal
status: open
title: Update and expand unicodedata module documentation
versions: Python 3.2
Added file: http://bugs.python.org/file19990/unicodedata-doc.diff

_______________________________________
Python tracker <[hidden email]>
<http://bugs.python.org/issue10665>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/lists%40nabble.com

[issue10665] Update and expand unicodedata module documentation

STINNER Victor

Changes by Alexander Belopolsky <[hidden email]>:


----------
components: +Unicode

[issue10665] Update and expand unicodedata module documentation

STINNER Victor

Alexander Belopolsky <[hidden email]> added the comment:

Added more tables semi-automatically produced from http://www.unicode.org/Public/UNIDATA/PropertyValueAliases.txt

----------
Added file: http://bugs.python.org/file19991/unicodedata-doc.diff

[issue10665] Update and expand unicodedata module documentation

STINNER Victor

Changes by Alexander Belopolsky <[hidden email]>:


Removed file: http://bugs.python.org/file19990/unicodedata-doc.diff

[issue10665] Update and expand unicodedata module documentation

STINNER Victor

Martin v. Löwis <[hidden email]> added the comment:

Please, one issue per report and checkin, and no work-in-progress on the tracker. The issue of factually correcting claims about the unicodedata module and elaborations on how it works are unrelated issues.

----------
nosy: +loewis

[issue10665] Expand unicodedata module documentation

STINNER Victor

Alexander Belopolsky <[hidden email]> added the comment:

On Thu, Dec 9, 2010 at 6:10 PM, Martin v. Löwis <[hidden email]> wrote:
..
> Please, one issue per report and checkin,

The s/5.2/6.0/ issue is hardly worth a tracker ticket.   I've
committed these changes in r87159.  (Sorry for the unrelated changes -
reverted in the next checkin.)

> and no work-in-progress on the tracker.

Why?  I thought "release early, release often" was a good thing.   I
wanted to get early feedback because we certainly don't want to
replicate the Unicode Standard in the Python documentation, but I
think at least for the category() function, which returns cryptic
2-letter codes, we should include a table explaining them.   I am not
so sure about bidirectional() or east_asian_width().
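
For readers following the thread, a minimal sketch of the two-letter codes under discussion. The sample characters and the describe() helper are illustrative choices, not taken from the patch; escapes are used so the snippet stays ASCII-only.

```python
import unicodedata

# Sample characters chosen for illustration only.
samples = ["A", "a", "5", " ", "\u20ac", "\u0301"]

def describe(ch):
    """Return (code point, general category, character name) for one character."""
    return ("U+%04X" % ord(ch), unicodedata.category(ch), unicodedata.name(ch))

for ch in samples:
    # Prints the code point, its two-letter category code, and its name.
    print(*describe(ch))
```

Without a table, codes such as 'Sc' or 'Mn' have to be looked up at unicode.org, which is the gap the proposed documentation is meant to close.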

> The issue of factually correcting claims about the unicodedata module and elaborations on how it works are unrelated issues.

I am changing the title of the issue to make it cover only the latter.

----------
title: Update and expand unicodedata module documentation -> Expand unicodedata module documentation

[issue10665] Expand unicodedata module documentation

STINNER Victor

Changes by Alexander Belopolsky <[hidden email]>:


----------
nosy: +ezio.melotti, haypo, lemburg

[issue10665] Expand unicodedata module documentation

STINNER Victor

Alexander Belopolsky <[hidden email]> added the comment:

In issue10665.diff, I completed the character examples in the general categories table.

----------
Added file: http://bugs.python.org/file20002/issue10665.diff

[issue10665] Expand unicodedata module documentation

STINNER Victor

Martin v. Löwis <[hidden email]> added the comment:

> Why?  I thought "release early, release often" was a good thing.

Create a branch for that, or post an issue on Rietveld. W-I-P IMO
confuses people reviewing the patches, running into the same ones
over-and-over again, only to find out every time "it's not ready yet".
So the natural reaction is to close it as rejected, for it being
incomplete.

Regards,
Martin

----------

[issue10665] Expand unicodedata module documentation

STINNER Victor

Alexander Belopolsky <[hidden email]> added the comment:

On Fri, Dec 10, 2010 at 6:04 PM, Martin v. Löwis <[hidden email]> wrote:
..
>> Why?  I thought "release early, release often" was a good thing.
>
> Create a branch for that, or post an issue on Rietveld.

Martin,

This is a documentation patch affecting a single HTML page.  An svn
branch for something like this is certainly overkill.  Maybe when
branches become more user-friendly with Hg, it will make sense to do
something like this in a branch.   Rietveld is great for code reviews,
but for doc patches it is sometimes desirable to post a rendered page
for review.  In this particular case, however, the reST in the diff is
quite readable.

> W-I-P IMO
> confuses people reviewing the patches, running into the same ones
> over-and-over again, only to find out every time "it's not ready yet".

I posted this patch with a specific question: "how much of the Unicode
Standard information we would want to present?"   The patch included
several tables similar to what a determined reader can find at
unicode.org.   I think it is useful to present this information in the
Python docs for several reasons:

1. It makes the information more readily accessible to someone who
just wants to figure out what the code returned by
unicodedata.category() means.

2. We can present examples using Python notation and focus on what is
relevant to Python users.  For example, what are the digits other than
0-9, or what is the difference between a digit and a decimal.

3. Other parts of the documentation can refer to this information more
easily.  For example, str.isdigit() can refer to 'Nd' general
category.
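
A small sketch of the digit/decimal distinction raised in point 2, using only functions that unicodedata already provides. The number_properties() helper and the sample characters are illustrative, not part of the patch.

```python
import unicodedata

def number_properties(ch):
    """Collect the numeric properties unicodedata exposes for one character.

    Each lookup is given None as its default, so a missing property
    shows up as None instead of raising ValueError.
    """
    return {
        "category": unicodedata.category(ch),
        "decimal": unicodedata.decimal(ch, None),
        "digit": unicodedata.digit(ch, None),
        "numeric": unicodedata.numeric(ch, None),
    }

# ARABIC-INDIC DIGIT THREE, SUPERSCRIPT TWO, VULGAR FRACTION ONE HALF
for ch in ["\u0663", "\u00b2", "\u00bd"]:
    print(unicodedata.name(ch), number_properties(ch))
    print("  isdecimal:", ch.isdecimal(),
          "isdigit:", ch.isdigit(),
          "isnumeric:", ch.isnumeric())
```

SUPERSCRIPT TWO is a digit but not a decimal (category 'No'), while ARABIC-INDIC DIGIT THREE is a decimal in category 'Nd'; this is the kind of example the Python docs could show directly.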

The downside is that we may need to update this info when the Unicode
Standard changes.  Given the pace of change for this info, I don't
think this is a serious burden, and most of the data can be
auto-generated from UCD files.
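
As a rough sketch of what such auto-generation might look like: the parser below reads lines in the semicolon-separated layout that PropertyValueAliases.txt is assumed to use here (property ; short alias ; long alias). The SAMPLE excerpt and the parse_aliases() helper are hypothetical and are not the script used for the patch.

```python
# Illustrative excerpt in the assumed "property ; short ; long" layout;
# a real run would read the file downloaded from unicode.org instead.
SAMPLE = """\
# General_Category (gc)
gc ; Ll ; Lowercase_Letter
gc ; Lu ; Uppercase_Letter
gc ; Nd ; Decimal_Number ; digit
gc ; Zs ; Space_Separator
ea ; W  ; Wide
"""

def parse_aliases(text, prop="gc"):
    """Map short value aliases to readable long names for one property."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        if len(fields) >= 3 and fields[0] == prop:
            table[fields[1]] = fields[2].replace("_", " ")
    return table

# Emit one plain-text row per general category value.
for code, name in sorted(parse_aliases(SAMPLE).items()):
    print("%-4s %s" % (code, name))
```

The rows could then be pasted into a reST table, so an update of the Unicode Standard would mean rerunning the script rather than editing the table by hand.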

> So the natural reaction is to close it as rejected, for it being
> incomplete.

Are you going to reject, say, issue2636 on this basis? :-)   Has *any*
patch ever been rejected as incomplete?

Seriously, I had a specific reason to post an incomplete patch for
review: formatting reST tables is tedious and if others think we
should not include this info in Python docs, I don't want to spend
more time polishing the patch.  On the other hand, an incomplete patch
is helpful because it demonstrates how much information I am proposing
to include.  If I just posted a request without a patch, a natural
reaction would be: do you have a patch?

----------

[issue10665] Expand unicodedata module documentation

STINNER Victor

Ezio Melotti <[hidden email]> added the comment:

The patch contains non-ascii chars that should be avoided (they break `make pdf`).

----------

[issue10665] Expand unicodedata module documentation

STINNER Victor

STINNER Victor <[hidden email]> added the comment:

> The patch contains non-ascii chars that should be avoided (they break `make pdf`).

What is the error? Can't you fix the PDF generator instead?

----------

[issue10665] Expand unicodedata module documentation

STINNER Victor

Ezio Melotti <[hidden email]> added the comment:

I think Georg said that non-ascii chars shouldn't be used directly in the rst files, and one of the reasons is that they break `make pdf` (I haven't tried though).
I added him to the nosy.

----------
nosy: +georg.brandl

[issue10665] Expand unicodedata module documentation

STINNER Victor

Georg Brandl <[hidden email]> added the comment:

The "PDF generator" is PDFLaTeX, whose range of Unicode characters is very limited, so no, I can't fix it.

----------

[issue10665] Expand unicodedata module documentation

STINNER Victor

Ezio Melotti <[hidden email]> added the comment:

FWIW latin-1 chars are fine (e.g. ¶ or è).  The right command to build the pdfs is `make latex` and then `make all-pdf` in build/latex/.

Alexander, now you could also make a remote hg repo, and use it to generate up-to-date patches that can be reviewed directly.

I left a few more comments on rietveld about the last patch.

----------

[issue10665] Expand unicodedata module documentation

STINNER Victor

Ezio Melotti <[hidden email]> added the comment:

Alexander suggested on IRC to use the 'unicode' directive[0], but even if that works in the HTML (only outside code blocks), it still breaks the PDF.
Another alternative that might work is the 'raw' role[1].

[0]: http://docutils.sourceforge.net/docs/ref/rst/directives.html#unicode-character-codes
[1]: http://docutils.sourceforge.net/docs/ref/rst/roles.html#specialized-roles

----------
keywords: +needs review

[issue10665] Expand unicodedata module documentation

STINNER Victor

Alexander Belopolsky <[hidden email]> added the comment:

> The "PDF generator" is PDFLaTeX, whose range of Unicode characters
> is very limited, so no, I can't fix it.

A quick search for pdflatex and unicode turned up this 4-year-old howto:

http://tclab.kaist.ac.kr/ipe/pdftex_2.html

I'll experiment with some recent LaTeX distributions before making further effort to work around current unicode limitations.  For example, XeTeX appears to have good unicode support:

http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=xetex

----------

[issue10665] Expand unicodedata module documentation

STINNER Victor

Marc-Andre Lemburg <[hidden email]> added the comment:

Ezio Melotti wrote:
>
> Ezio Melotti <[hidden email]> added the comment:
>
> Alexander suggested on IRC to use the 'unicode' directive[0], but even if that works in the HTML (only outside code blocks), it still breaks the PDF.
> Another alternative that might work is the 'raw' role[1].
>
> [0]: http://docutils.sourceforge.net/docs/ref/rst/directives.html#unicode-character-codes
> [1]: http://docutils.sourceforge.net/docs/ref/rst/roles.html#specialized-roles

I don't think we should include Unicode code points as literals
in Python source code examples, for much the same reason we
don't want them in the stdlib source code.

Why don't you use the standard literal escapes for the examples
and annotate the code points with the code point names ?
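
One way this suggestion could look in practice, as a sketch: each character is shown as a standard escape and annotated with its code point name. The annotate() helper is hypothetical; only ord() and unicodedata.name() come from Python itself.

```python
import unicodedata

def annotate(text):
    """Return one 'escape  NAME' line per character of text."""
    lines = []
    for ch in text:
        cp = ord(ch)
        # \uXXXX covers the BMP; \UXXXXXXXX is needed beyond it.
        escape = "\\u%04x" % cp if cp <= 0xFFFF else "\\U%08x" % cp
        lines.append("%s  %s" % (escape, unicodedata.name(ch, "<unnamed>")))
    return lines

# The example source itself contains only ASCII characters.
for line in annotate("\u00e9\u20ac\U0001d11e"):
    print(line)
```

Because both the source and the output are plain ASCII, an example written this way does not depend on fonts or on what the PDF toolchain can render.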

----------

[issue10665] Expand unicodedata module documentation

STINNER Victor

Ezio Melotti <[hidden email]> added the comment:

One reason is that unicodedata.lookup actually returns a unicode char, so if we want to show a code snippet that uses unicodedata.lookup we either have to use a unicode literal or limit the chars in the examples to latin1 to make sure it works nice with the PDF generator.

Using escape sequences elsewhere might work, but in some examples it's better to use the actual chars IMHO (except that they don't work with the PDF).

----------

[issue10665] Expand unicodedata module documentation

STINNER Victor

Marc-Andre Lemburg <[hidden email]> added the comment:

Ezio Melotti wrote:
>
> Ezio Melotti <[hidden email]> added the comment:
>
> One reason is that unicodedata.lookup actually returns a unicode char, so if we want to show a code snippet that uses unicodedata.lookup we either have to use a unicode literal or limit the chars in the examples to latin1 to make sure it works nice with the PDF generator.

Why not wrap the calls with a repr() ?

> Using escape sequences elsewhere might work, but in some examples it's better to use the actual chars IMHO (except that they don't work with the PDF).

Sure, it'll look nicer, but it will also make comparing the examples
with the actual output users see on the screen error-prone (e.g. if
the fonts don't have the necessary glyphs).

Copy&paste will also often fail.

I think it's more useful to show examples that more or less always
work, than ones which display all available goodies.
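
A sketch of the wrapping idea, with one caveat: in Python 3, repr() leaves printable non-ASCII characters unescaped, so the built-in ascii() is used below to get an escaped form. The show() helper is illustrative only.

```python
import unicodedata

def show(name):
    """Look up a character by name and return an ASCII-only representation."""
    return ascii(unicodedata.lookup(name))

# Prints an escaped string literal that can be pasted back into Python.
print(show("GREEK SMALL LETTER ALPHA"))
print(show("LATIN SMALL LETTER A"))
```

The output contains only ASCII characters, so it renders the same in HTML and PDF builds and survives copy and paste.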

----------
