[IPython-User] Sharing data across an IPython Parallel Cluster

G B
Hey--

 I apologize if I'm overlooking something obvious in the literature or documentation.  I'm new to cluster computing, but I just managed to get a small IPython cluster running and I'm jazzed. The cluster runs over SSH and I interface with it through the Notebook.  56 engines-- 8 on the local machine and 16 each on 3 connected workstations.  Hitting shift-enter and hearing all those cooling fans kick on is a rush.

I have two related questions about sharing data through a cluster:

1) Is it possible to move data directly between engines?  Will the transfer happen locally, or does everything have to pass through the controller?

2) I'm looking for a way to more efficiently share data out to the engines.  In my case I have a core data set of about 30MB that each engine will be processing in different ways.  When I do a push() to the engines, it does 48 network transfers of that 30MB (because 8 engines are local).  What I'd like is to do 3 network transfers, one to each remote machine, and then do local memory copies to the engines on each machine.  How would I go about this?  

2.5)  Is there a way to have the cluster determine the optimal distribution strategy?  I don't mean to sidebar the list on what is probably a topic of research-- I'm just wondering if something is already implemented, or maybe a pointer on where to look.

Thanks!
 Greg
 

_______________________________________________
IPython-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/ipython-user

Re: Sharing data across an IPython Parallel Cluster

Martin Luessi
Hi Greg,

I'm not sure about 1) and 2.5); see my reply to 2) below.

Best,

Martin

On Thu, Apr 24, 2014 at 2:23 PM, G B <[hidden email]> wrote:

> Hey--
>
>  I apologize if I'm overlooking something obvious in the literature or
> documentation.  I'm new to cluster computing, but I just managed to get a
> small IPython cluster running and I'm jazzed. I now have a small cluster
> running through SSH that I interface with through the Notebook.  56
> engines-- 8 on the local machine and 16 each on 3 connected workstations.
> Hitting shift-enter and hearing all those cooling fans kick on is a rush.
>
> I have two related questions about sharing data through a cluster:
>
> 1) Is it possible to move data directly between engines?  Will the transfer
> happen locally, or does everything have to pass through the controller?
>
> 2) I'm looking for a way to more efficiently share data out to the engines.
> In my case I have a core data set of about 30MB that each engine will be
> processing in different ways.  When I do a push() to the engines, it does 48
> network transfers of that 30MB (because 8 engines are local).  What I'd like
> is to do 3 network transfers, one to each remote machine, and then do local
> memory copies to the engines on each machine.  How would I go about this?

If you have access to a common file system (e.g. NFS), in my
experience it is best to store the data on disk and load it in the
engines when required. That way you can also easily run jobs where the
data doesn't fit in the memory available on the machine running the
controller. A pretty good example of this can be found here:

https://github.com/ogrisel/parallel_ml_tutorial
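
A minimal sketch of that approach, under some assumptions: the data is
written once to a path that every engine can read, and each engine loads
it from disk itself instead of receiving it through push(). The helper
names (save_shared, load_and_process), the file name, and the placeholder
"work" are hypothetical, and a temporary directory stands in for the
shared mount so the sketch runs without a cluster.

```python
import os
import pickle
import tempfile


def save_shared(data, path):
    """Write the data set once to a location all engines can read."""
    with open(path, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_and_process(path, engine_id, n_engines):
    """Meant to run on an engine: load from disk, then do engine-specific work."""
    import pickle  # imported here so the function is self-contained when shipped to an engine

    with open(path, "rb") as f:
        data = pickle.load(f)
    # Placeholder work (an assumption): each engine sums a different strided slice.
    return engine_id, sum(data[engine_id::n_engines])


# Stand-in for a shared mount such as NFS; on a real cluster this would be
# a directory visible under the same path on every machine.
shared_dir = tempfile.mkdtemp()
path = os.path.join(shared_dir, "core_data.pkl")
save_shared(list(range(1000)), path)

# Local stand-in for the engines. With a running cluster, the dispatch
# could look roughly like this (not executed here):
#   from IPython.parallel import Client
#   rc = Client()
#   n = len(rc.ids)
#   results = rc[:].map_sync(load_and_process, [path] * n, rc.ids, [n] * n)
n_engines = 4
results = [load_and_process(path, i, n_engines) for i in range(n_engines)]
print(results)
```

Only the short path string travels through the controller; the 30MB
payload is read from the file system by each engine.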

>
> 2.5)  Is there a way to have the cluster determine the optimal distribution
> strategy?  I don't mean to sidebar the list on what is probably a topic of
> research-- I'm just wondering if something is already implemented, or maybe
> a pointer on where to look.
>
> Thanks!
>  Greg

Re: Sharing data across an IPython Parallel Cluster

Matthias Bussonnier
Hi,

I would suggest posting on ipython-dev, which is more active.

On 24 Apr 2014, at 20:23, G B wrote:

> Hey--
>
>  I apologize if I'm overlooking something obvious in the literature or documentation.  I'm new to cluster computing, but I just managed to get a small IPython cluster running and I'm jazzed. I now have a small cluster running through SSH that I interface with through the Notebook.  56 engines-- 8 on the local machine and 16 each on 3 connected workstations.  Hitting shift-enter and hearing all those cooling fans kick on is a rush.
>
> I have two related questions about sharing data through a cluster:
>
> 1) Is it possible to move data directly between engines?  Will the transfer happen locally, or does everything have to pass through the controller?

I'm not a heavy user of IPython.parallel, but IIRC moving data directly between engines is not possible.
There was a discussion at EuroSciPy 2013 suggesting that using the torrent protocol to send data to engines would be neat.

--
M

>
> 2) I'm looking for a way to more efficiently share data out to the engines.  In my case I have a core data set of about 30MB that each engine will be processing in different ways.  When I do a push() to the engines, it does 48 network transfers of that 30MB (because 8 engines are local).  What I'd like is to do 3 network transfers, one to each remote machine, and then do local memory copies to the engines on each machine.  How would I go about this?  
>
> 2.5)  Is there a way to have the cluster determine the optimal distribution strategy?  I don't mean to sidebar the list on what is probably a topic of research-- I'm just wondering if something is already implemented, or maybe a pointer on where to look.
>
> Thanks!
>  Greg