Improving Python+MPI import performance

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Improving Python+MPI import performance

Langton, Asher
Hi all,

I work on a Python/C++ scientific code that runs as a number of
independent Python processes communicating via MPI. Unfortunately, as
some of you may have experienced, module importing does not scale well
in Python/MPI applications. For 32k processes on BlueGene/P, importing
100 trivial C-extension modules takes 5.5 hours, compared to 35
minutes for all other interpreter loading and initialization. We
developed a simple pure-Python module (based on knee.py, a
hierarchical import example) that cuts the import time from 5.5 hours
to 6 minutes.

The code is available here:

https://github.com/langton/MPI_Import

Usage, implementation details, and limitations are described in a
docstring at the beginning of the file (just after the mandatory
legalese).

I've talked with a few people who've faced the same problem and heard
about a variety of approaches, which range from putting all necessary
files in one directory to hacking the interpreter itself so it
distributes the module-loading over MPI. Last summer, I had a student
intern try a few of these approaches. It turned out that the problem
wasn't so much the simultaneous module loads, but rather the huge
number of failed open() calls (ENOENT) as the interpreter tries to
find the module files. In the MPI_Import module, we have rank 0
perform the module lookups and then broadcast the locations to the
rest of the processes. For our real-world scientific applications
written in Python and C++, this has meant that we can start a problem
and actually make computational progress before the batch allocation
ends.

If you try out the code, I'd appreciate any feedback you have:
performance results, bugfixes/feature-additions, or alternate
approaches to solving this problem. Thanks!

-Asher
_______________________________________________
Baypiggies mailing list
[hidden email]
To change your subscription options or unsubscribe:
http://mail.python.org/mailman/listinfo/baypiggies
Reply | Threaded
Open this post in threaded view
|

Re: Improving Python+MPI import performance

Minesh B. Amin
Hi Asher,

Simply amazing :)

Reducing the number of failed open() calls is indeed the key.

Cheers!
Minesh

On Thu, 2012-01-12 at 12:23 -0800, Asher Langton wrote:

> Hi all,
>
> I work on a Python/C++ scientific code that runs as a number of
> independent Python processes communicating via MPI. Unfortunately, as
> some of you may have experienced, module importing does not scale well
> in Python/MPI applications. For 32k processes on BlueGene/P, importing
> 100 trivial C-extension modules takes 5.5 hours, compared to 35
> minutes for all other interpreter loading and initialization. We
> developed a simple pure-Python module (based on knee.py, a
> hierarchical import example) that cuts the import time from 5.5 hours
> to 6 minutes.
>
> The code is available here:
>
> https://github.com/langton/MPI_Import
>
>

_______________________________________________
Baypiggies mailing list
[hidden email]
To change your subscription options or unsubscribe:
http://mail.python.org/mailman/listinfo/baypiggies
Reply | Threaded
Open this post in threaded view
|

Re: Improving Python+MPI import performance

Fernando Perez
In reply to this post by Langton, Asher
On Thu, Jan 12, 2012 at 12:23 PM, Asher Langton <[hidden email]> wrote:

> I work on a Python/C++ scientific code that runs as a number of
> independent Python processes communicating via MPI. Unfortunately, as
> some of you may have experienced, module importing does not scale well
> in Python/MPI applications. For 32k processes on BlueGene/P, importing
> 100 trivial C-extension modules takes 5.5 hours, compared to 35
> minutes for all other interpreter loading and initialization. We
> developed a simple pure-Python module (based on knee.py, a
> hierarchical import example) that cuts the import time from 5.5 hours
> to 6 minutes.
>
> The code is available here:
>
> https://github.com/langton/MPI_Import

Excellent!  I suggest you post this to the numpy list, I bet you it
will be of interest to a lot of people...

Cheers,

f
_______________________________________________
Baypiggies mailing list
[hidden email]
To change your subscription options or unsubscribe:
http://mail.python.org/mailman/listinfo/baypiggies