Building Python Document 30% faster.


稲田直哉
Hi, all.

I'm a member of the Japanese translation project for the Python documentation.
We finished translating the Python 2.5 documentation last year and are now
working on the Python 2.6 documentation.

Building the documentation feels a little slow to me, so I tried tuning
docutils and Sphinx.

The attached patches make the documentation build about 30% faster
(in my environment, roughly 330 sec -> 220 sec).

I posted sphinx.patch to bitbucket, but I don't know where to post docutils.patch.
Could anyone review these patches?

These patches make the following changes:

1. Use PyStemmer instead of PorterStemmer.
PorterStemmer is implemented in Python and consumes about 50 seconds
during the build.
PyStemmer <http://pypi.python.org/pypi/PyStemmer/1.0.1> is implemented in C
and consumes only 7 seconds.

However, the searchindex.js built with PyStemmer differs from the one built
with PorterStemmer.

2. Avoid building OptionParser many times.
Sphinx calls docutils.core.publish_parts() without the `settings` argument
many times.
This builds docutils.frontend.OptionParser many times, which consumes
29 seconds.

3. Avoid building NestedStateMachine many times.
NestedStateMachine is built and destroyed many times.
Recycling the state machine gives a significant performance gain.

== before ==
    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 25720/459    0.997    0.000  134.085    0.292 tools/docutils/statemachine.py:178(run)
92281/1513    1.420    0.000  133.935    0.089 tools/docutils/statemachine.py:384(check_line)
     25720    0.184    0.000   89.628    0.003 tools/docutils/statemachine.py:129(__init__)
     25720    0.632    0.000   89.444    0.003 tools/docutils/statemachine.py:448(add_states)
    385800    1.665    0.000   88.813    0.000 tools/docutils/statemachine.py:436(add_state)
    385800    2.356    0.000   85.287    0.000 tools/docutils/statemachine.py:928(__init__)
    385800    1.793    0.000   82.931    0.000 tools/docutils/statemachine.py:566(__init__)

== after ==
    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 25720/459    1.051    0.000   68.175    0.149 tools/docutils/statemachine.py:178(run)
92281/1513    1.405    0.000   68.024    0.045 tools/docutils/statemachine.py:384(check_line)
      6862    0.031    0.000   24.241    0.004 tools/docutils/statemachine.py:129(__init__)
      6862    0.174    0.000   24.210    0.004 tools/docutils/statemachine.py:448(add_states)
    102930    0.430    0.000   24.036    0.000 tools/docutils/statemachine.py:436(add_state)
    102930    0.633    0.000   23.162    0.000 tools/docutils/statemachine.py:928(__init__)
    102930    0.549    0.000   22.529    0.000 tools/docutils/statemachine.py:566(__init__)
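
In the profile above, the constructor at statemachine.py:129 runs 25720 times before the patch and 6862 times after it, and each construction triggers 15 add_state calls in both runs (385800 / 25720 = 102930 / 6862 = 15). The recycling idea can be sketched as a small object pool. This is only an illustration under assumptions, not the contents of docutils.patch: ExpensiveMachine, MachinePool and run_nested are hypothetical names, and the real docutils classes are not used.

```python
class ExpensiveMachine:
    """Hypothetical stand-in for a state machine with a costly constructor."""

    constructed = 0  # class-wide counter, kept only to make the effect visible

    def __init__(self, state_names):
        ExpensiveMachine.constructed += 1
        # Building one object per state is the expensive part.
        self.states = {name: object() for name in state_names}
        self.input_lines = None

    def run(self, input_lines):
        self.input_lines = list(input_lines)
        return len(self.input_lines)

    def reset(self):
        # Drop per-run data but keep the already-built states.
        self.input_lines = None


class MachinePool:
    """Hands out idle machines and builds a new one only when none is idle."""

    def __init__(self, state_names):
        self._state_names = tuple(state_names)
        self._idle = []

    def acquire(self):
        if self._idle:
            return self._idle.pop()
        return ExpensiveMachine(self._state_names)

    def release(self, machine):
        machine.reset()
        self._idle.append(machine)


def run_nested(pool, chunks):
    """Process each chunk with a pooled machine instead of a fresh one."""
    results = []
    for chunk in chunks:
        machine = pool.acquire()
        try:
            results.append(machine.run(chunk))
        finally:
            pool.release(machine)
    return results
```

With a pool like this, processing many chunks one after another constructs a single machine. Nested use, where an inner machine is needed while an outer one is still running, would construct one machine per nesting level, which may be why the constructor count in the "after" profile is reduced rather than eliminated.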

_______________________________________________
Doc-SIG maillist  -  [hidden email]
http://mail.python.org/mailman/listinfo/doc-sig

Attachments: sphinx.patch (5K), docutils.patch (2K)

Re: Building Python Document 30% faster.

Georg Brandl-2
稲田直哉 wrote:
> Hi, all.
>
> I'm a member of the Japanese translation project for the Python documentation.
> We finished translating the Python 2.5 documentation last year and are now
> working on the Python 2.6 documentation.
>
> Building the documentation feels a little slow to me, so I tried tuning
> docutils and Sphinx.

Great! I've already started tuning a bit with the docutils Node.traverse()
patch, but did not do much more than that.

> The attached patches make the documentation build about 30% faster
> (in my environment, roughly 330 sec -> 220 sec).
>
> I posted sphinx.patch to bitbucket, but I don't know where to post docutils.patch.
> Could anyone review these patches?

I will, when I have a bit more time.

> These patches make the following changes:
>
> 1. Use PyStemmer instead of PorterStemmer.
> PorterStemmer is implemented in Python and consumes about 50 seconds
> during the build.
> PyStemmer <http://pypi.python.org/pypi/PyStemmer/1.0.1> is implemented in C
> and consumes only 7 seconds.
>
> However, the searchindex.js built with PyStemmer differs from the one built
> with PorterStemmer.

This could be a problem.  The client-side search implemented in JavaScript
uses exactly the same stemmer (which is necessary to be able to find all
words).  In short, if you can find a C implementation of the Porter stemmer
we could include it in Sphinx as an optional extension.
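
The point about using the same stemmer on both sides can be shown with a toy example. This is a minimal sketch under assumptions: stem_a and stem_b are two deliberately different toy stemmers (neither is the Porter algorithm), and build_index and search are hypothetical helpers, not Sphinx functions.

```python
def stem_a(word):
    """Toy stemmer: strip one trailing 's' (not the Porter algorithm)."""
    word = word.lower()
    return word[:-1] if word.endswith("s") else word


def stem_b(word):
    """A different toy stemmer: strip a trailing 'es', else a trailing 's'."""
    word = word.lower()
    if word.endswith("es"):
        return word[:-2]
    return word[:-1] if word.endswith("s") else word


def build_index(docs, stem):
    """Map each stemmed word to the set of document names containing it."""
    index = {}
    for name, text in docs.items():
        for word in text.split():
            index.setdefault(stem(word), set()).add(name)
    return index


def search(index, query, stem):
    """Look up a single query word after stemming it."""
    return index.get(stem(query), set())


docs = {"intro": "Python classes", "tutorial": "defining a class"}
index = build_index(docs, stem_a)

# Same stemmer on both sides: the query stem matches an index key.
print(search(index, "classes", stem_a))  # {'intro'}
# Different stemmer at query time: the stem is not a key, so nothing is found.
print(search(index, "classes", stem_b))  # set()
```

The index stores only stems, so a query is found only if the search side reduces the word to exactly the stem that the build side stored.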

> 2. Avoid building OptionParser many times.
> Sphinx calls docutils.core.publish_parts() without the `settings` argument
> many times.
> This builds docutils.frontend.OptionParser many times, which consumes
> 29 seconds.
>
> 3. Avoid building NestedStateMachine many times.
> NestedStateMachine is built and destroyed many times.
> Recycling the state machine gives a significant performance gain.

I assume that both of these are in the second commit I see on bitbucket?  Both
look like worthwhile optimizations.

Thanks,
Georg

--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


Re: Building Python Document 30% faster.

Michael Foord-5
In reply to this post by 稲田直哉
Hello,

There is a docutils specific mailing list:

docutils users <[hidden email]>

You will need to subscribe from sourceforge, or you can just post your
patch on sourceforge:

http://docutils.sf.net

Another patch was recently submitted by Georg Brandl offering a similar
speedup. No idea if it is in the same area or not.

All the best,


Michael Foord


--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog



Re: Building Python Document 30% faster.

Aahz
On Sat, Apr 04, 2009, Michael Foord wrote:
>
> There is a docutils specific mailing list:
>
> docutils users <[hidden email]>

Actually, there are two docutils mailing lists, and I think that
docutils-develop is probably more appropriate for this.
--
Aahz ([hidden email])           <*>         http://www.pythoncraft.com/

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it."  --Brian W. Kernighan

Re: Building Python Document 30% faster.

稲田直哉
In reply to this post by Georg Brandl-2
Hi Georg.

>> The attached patches make the documentation build about 30% faster
>> (in my environment, roughly 330 sec -> 220 sec).
>>
>> I posted sphinx.patch to bitbucket, but I don't know where to post docutils.patch.
>> Could anyone review these patches?
>
> I will, when I have a bit more time.

Thank you.

>> However, the searchindex.js built with PyStemmer differs from the one built
>> with PorterStemmer.
>
> This could be a problem.  The client-side search implemented in JavaScript
> uses exactly the same stemmer (which is necessary to be able to find all
> words).  In short, if you can find a C implementation of the Porter stemmer
> we could include it in Sphinx as an optional extension.

I see.
The original Porter stemmer is here:
http://tartarus.org/~martin/PorterStemmer/

It is implemented in C. I'll try to make a Python wrapper with SWIG and
compare the resulting searchindex.js. Please wait a while.


>> 2. Avoid building OptionParser many times.
>> Sphinx calls docutils.core.publish_parts() without the `settings` argument
>> many times.
>> This builds docutils.frontend.OptionParser many times, which consumes
>> 29 seconds.
>>
>> 3. Avoid building NestedStateMachine many times.
>> NestedStateMachine is built and destroyed many times.
>> Recycling the state machine gives a significant performance gain.
>
> I assume that both of these are in the second commit I see on bitbucket?  Both
> look like worthwhile optimizations.

The former is on bitbucket:
http://bitbucket.org/methane/sphinx-speedup/changeset/72fa0ceefcae/

The latter is not on bitbucket, because NestedStateMachine is part of
docutils, not Sphinx.

--
Naoki INADA  <[hidden email]>
   KLab Inc.  <http://www.klab.jp>

Re: Building Python Document 30% faster.

稲田直哉
In reply to this post by Aahz
> On Sat, Apr 04, 2009, Michael Foord wrote:
>>
>> There is a docutils specific mailing list:
>>
>> docutils users <[hidden email]>
>
> Actually, there are two docutils mailing lists, and I think that
> docutils-develop is probably more appropriate for this.

OK. I'll subscribe to both.

--
Naoki INADA  <[hidden email]>
   KLab Inc.  <http://www.klab.jp>

Re: Building Python Document 30% faster.

稲田直哉
In reply to this post by 稲田直哉
>>> However, the searchindex.js built with PyStemmer differs from the one built
>>> with PorterStemmer.
>>
>> This could be a problem.  The client-side search implemented in JavaScript
>> uses exactly the same stemmer (which is necessary to be able to find all
>> words).  In short, if you can find a C implementation of the Porter stemmer
>> we could include it in Sphinx as an optional extension.
>
> I see.
> The original Porter stemmer is here:
> http://tartarus.org/~martin/PorterStemmer/
>
> It is implemented in C. I'll try to make a Python wrapper with SWIG and
> compare the resulting searchindex.js. Please wait a while.

I made a Python wrapper!
http://bitbucket.org/methane/porterstemmer/

This is my first extension module, and it is still an alpha version.
But I can build the Python documentation with this porterstemmer, and the
resulting searchindex.js is the same as the original.

--
Naoki INADA  <[hidden email]>
   KLab Inc.  <http://www.klab.jp>

Re: Building Python Document 30% faster.

Georg Brandl-2
In reply to this post by 稲田直哉
Naoki INADA wrote:

>>> 2. Avoid building OptionParser many times.
>>> Sphinx calls docutils.core.publish_parts() without the `settings` argument
>>> many times.
>>> This builds docutils.frontend.OptionParser many times, which consumes
>>> 29 seconds.
>>>
>>> 3. Avoid building NestedStateMachine many times.
>>> NestedStateMachine is built and destroyed many times.
>>> Recycling the state machine gives a significant performance gain.
>>
>> I assume that both of these are in the second commit I see on bitbucket?  Both
>> look like worthwhile optimizations.
>
> The former is on bitbucket:
> http://bitbucket.org/methane/sphinx-speedup/changeset/72fa0ceefcae/

Thanks, merged!  When porterstemmer is mature I'd also like to include it in
the Sphinx distribution as an optional extension.
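
One common way to make a C module optional is to try importing it and fall back to the pure-Python code when it is missing. The sketch below is an assumption about how that could look, not how Sphinx does it: the default module name, its stem attribute, and pure_python_stem are all hypothetical placeholders.

```python
import importlib


def pure_python_stem(word):
    """Placeholder for the slow pure-Python stemmer (not the real algorithm)."""
    return word.lower()


def load_stem_function(module_name="hypothetical_c_porterstemmer"):
    """Return the C extension's stem function if available, else the fallback."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The extension is not installed; keep working with the slow version.
        return pure_python_stem
    # Assume the extension exposes a callable named `stem`.
    return getattr(module, "stem", pure_python_stem)


stem = load_stem_function()
```

Because both implementations are reached through the same name, the rest of the build does not need to know which one was loaded. This only works safely if both produce identical stems, which is the condition discussed earlier in the thread.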

> The latter is not on bitbucket, because NestedStateMachine is part of
> docutils, not Sphinx.

OK, let's see. I'd first try to get the patch into docutils, once it passes the
tests.  However, since most people will be using docutils 0.4 or 0.5, it might
also make sense to make a monkey-patch version for Sphinx, like the traverse one.
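
For readers unfamiliar with the term, a monkey-patch replaces a method on an already-installed library class at run time, so that users of an older release get the new behaviour without upgrading. The following is a generic sketch, not the Sphinx or docutils code: Node is a stand-in class, and _alt_traverse and apply_monkey_patch are hypothetical names.

```python
class Node:
    """Stand-in for a class from an installed third-party library."""

    def __init__(self, children=()):
        self.children = list(children)

    def traverse(self):
        # Original behaviour: recursive pre-order walk.
        result = [self]
        for child in self.children:
            result.extend(child.traverse())
        return result


def _alt_traverse(self):
    """Replacement giving the same pre-order result with an explicit stack."""
    result, stack = [], [self]
    while stack:
        node = stack.pop()
        result.append(node)
        # Push children reversed so they are popped in their original order.
        stack.extend(reversed(node.children))
    return result


def apply_monkey_patch(cls=Node):
    """Install the replacement once; returns False if already applied."""
    if getattr(cls, "_speedup_patched", False):
        return False
    cls.traverse = _alt_traverse
    cls._speedup_patched = True
    return True
```

A real patch would normally also check the installed library version first, so that it is skipped once the fix is available upstream.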

Georg

