
Getting setup on a remote cluster w/ Sun Grid Engine.


Dharhas Pothina

Hi All,


We have managed to parallelize one of our spatial interpolation scripts very easily with the new IPython parallel. Thanks for developing such a great tool; it was fairly easy to get working. Now we are trying to set things up to run on our internal cluster, and I'm having difficulty understanding how to configure things.


What I would like to do is have IPython running on a local machine (Windows & Linux) connect to the cluster, request some nodes through SGE, and run the computation. I'm not quite getting from the documentation what goes where.


I think I understood the PBS example, but I'm still not understanding where I would put the connection information to log into the cluster. I would really appreciate a step-by-step description of which files need to be where, plus any example config files for an SGE setup.


thanks,


- dharhas







_______________________________________________
IPython-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/ipython-user

Re: Getting setup on a remote cluster w/ Sun Grid Engine.

MinRK
On a login node on the cluster:

# create profile with default parallel config files, called sge
[login] $> ipython profile create sge --parallel

Edit IPYTHON_DIR/profile_sge/ipcontroller_config.py, adding the line:

c.HubFactory.ip = '0.0.0.0'

to instruct the controller to listen on all interfaces.

Edit IPYTHON_DIR/profile_sge/ipcluster_config.py, adding the lines:

c.IPClusterEngines.engine_launcher_class = 'SGEEngineSetLauncher'
c.IPClusterStart.controller_launcher_class = 'SGEControllerLauncher'

# optional: specify a queue for all:
c.SGELauncher.queue = 'short'

to instruct ipcluster to use SGE to launch both the engines and the controller.

At this point, you can start 10 engines and a controller with:

[login] $> ipcluster start -n 10 --profile=sge

Now the only file you will need in order to connect to the cluster will be:

IPYTHON_DIR/profile_sge/security/ipcontroller_client.json

Just move that file around, and you will be able to connect clients.
To connect from a laptop, you will probably need to specify a login
node as the ssh server when you do:

from IPython import parallel

rc = parallel.Client('/path/to/ipcontroller_client.json',
sshserver='[hidden email]')
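For completeness, one way the connected client might then be used; a minimal sketch, where the JSON path, login host, and the `square_all` helper are all illustrative placeholders, not values from this thread:

```python
# Hedged sketch: run a trivial synchronous parallel map on a connected view.
def square_all(view, values):
    """Apply x -> x*x across the engines behind `view` via map_sync()."""
    return list(view.map_sync(lambda x: x * x, values))

# To actually connect (not run here; path and host are placeholders):
#   from IPython import parallel
#   rc = parallel.Client('/path/to/ipcontroller_client.json',
#                        sshserver='user@login.example.org')
#   print(square_all(rc.load_balanced_view(), range(4)))
```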

-MinRK



Re: Getting setup on a remote cluster w/ Sun Grid Engine.

Dharhas Pothina

I was able to start the engines, and they were submitted to the queue properly, but I do not have a json file in the corresponding security folder. Do I need to do something to generate it?


- dharhas


Re: Getting setup on a remote cluster w/ Sun Grid Engine.

MinRK
On Wed, Aug 24, 2011 at 15:05, Dharhas Pothina
<[hidden email]> wrote:
>
> I was able to start the engines and they were submitted to the queue
> properly but I do not have a json file in the corresponding security folder.
> Do I need to do something to generate it?

The JSON file is written by ipcontroller, so it will only show up
after the controller has started.


Running multiple ipclusters on remote cluster w/ Sun Grid Engine.

Dharhas Pothina

One more related question. I've got things working as described earlier in this thread, and it looks like each time I start ipcluster it overwrites the ipcontroller_client.json and ipcontroller_engine.json files. Does this mean I can only have one ipcluster running? Or, if I start the ipcluster and then copy the json file to the clients that require it, can I then start another ipcluster job and use the new json file for the new clients?


thanks,


- dharhas


Re: Running multiple ipclusters on remote cluster w/ Sun Grid Engine.

Fernando Perez
Hi Dharhas,

On Fri, Sep 2, 2011 at 6:38 AM, Dharhas Pothina
<[hidden email]> wrote:
> One more related question. I've got things working as described below and it
> looks like each time I start ipcluster it overwrites the
> ipcontroller_client.json and the ipcontroller_engine.json file. Does this
> mean I can only have one ipcluster running? Or if I start the ipcluster and
> then copy the json file to the clients that require them can I then start
> another ipcluster job and use the new json file for the new clients.

If you pass the --reuse flag directly to the ipcontroller script, it
will reuse a connection file.  Let us know if this helps...

Cheers,

f

Re: Running multiple ipclusters on remote cluster w/ Sun Grid Engine.

MinRK


On Mon, Sep 12, 2011 at 14:35, Fernando Perez <[hidden email]> wrote:
> Hi Dharhas,
>
> On Fri, Sep 2, 2011 at 6:38 AM, Dharhas Pothina
> <[hidden email]> wrote:
>> One more related question. I've got things working as described below and it
>> looks like each time I start ipcluster it overwrites the
>> ipcontroller_client.json and the ipcontroller_engine.json file. Does this
>> mean I can only have one ipcluster running?

Currently there is an expectation that controllers are singletons *per profile*.  You can start as many controllers as you like, as long as they are using different profiles.
 

Re: Running multiple ipclusters on remote cluster w/ Sun Grid Engine.

Dharhas Pothina

Hi,


I ended up writing a script that connects to the cluster, makes a copy of an already-created profile under a new unique name, starts ipcluster, waits until the json file is created, retrieves the json file for use in a local client, runs my script, and then cleans up afterwards.


This seems to be working fairly well, except when the local script exits because of an error. In that case, I need to log in and stop the engines, clean up files, etc., manually.
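A hedged sketch of such a driver, with the cleanup in a `finally` block so it also runs when the local script errors out. The `ipcluster start/stop` invocations in the comment follow this thread, but the helper names and the wiring are illustrative assumptions:

```python
import subprocess  # used only in the commented example below
import uuid

def unique_profile_name(base="sge"):
    """A fresh profile name per run, so connection files never collide."""
    return "%s_%s" % (base, uuid.uuid4().hex[:8])

def run_with_cleanup(start, job, stop):
    """Call start(), then job(), and ALWAYS call stop() afterwards --
    even when job() raises -- so engines are not left running after a
    client-side error."""
    start()
    try:
        return job()
    finally:
        stop()

# Wiring it to ipcluster might look like this (not run here):
#   profile = unique_profile_name()
#   start = lambda: subprocess.check_call(
#       ["ipcluster", "start", "-n", "10", "--profile=" + profile])
#   stop = lambda: subprocess.check_call(
#       ["ipcluster", "stop", "--profile=" + profile])
#   run_with_cleanup(start, my_analysis, stop)
```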


- dharhas


Re: Running multiple ipclusters on remote cluster w/ Sun Grid Engine.

Fernando Perez
Hi Dharhas,

On Wed, Sep 14, 2011 at 6:59 AM, Dharhas Pothina
<[hidden email]> wrote:
> I ended up writing a script that connected to the cluster and made a copy of
> an already created profile with a new unique name, started ipcluster, waited
> till the json file was created and then retrieved the json file for use in a
> local client, runs my script and then cleans up afterwards.
>
> This seems to be working fairly well except when the local script exits
> because of an error. In that case, I need to log in and stop the engines,
> clean up files etc manually.

OK.  We probably should remove the assumption of a 1 to 1 mapping
between profiles and running clusters, but that will require a fair
bit of reorganization of code that uses that assumption, so I'm glad
you found a solution for now.

Cheers,

f

Re: Running multiple ipclusters on remote cluster w/ Sun Grid Engine.

MinRK


On Wed, Sep 14, 2011 at 11:13, Fernando Perez <[hidden email]> wrote:
> OK.  We probably should remove the assumption of a 1 to 1 mapping
> between profiles and running clusters, but that will require a fair
> bit of reorganization of code that uses that assumption, so I'm glad
> you found a solution for now.

Yes, it's a pretty big deal that the only thing engines and clients need to know to connect to a cluster is the profile name.  That is lost entirely if we allow multiple clusters with a single profile, since profile name becomes ambiguous.  We would then need to add a second layer of specification for which controller to use within a given profile, e.g.:

ipengine --profile=mysge --controller-id=12345

I think I could add support for exactly this without much code change at all, though.

Feature Request opened on GitHub: https://github.com/ipython/ipython/issues/794
 


Re: Running multiple ipclusters on remote cluster w/ Sun Grid Engine.

Fernando Perez
On Wed, Sep 14, 2011 at 12:12 PM, MinRK <[hidden email]> wrote:
>
> I think I could add support for exactly this without much code change at
> all, though.
> Feature Request opened on
> GitHub: https://github.com/ipython/ipython/issues/794

Great, thanks!

Re: Running multiple ipclusters on remote cluster w/ Sun Grid Engine.

MinRK


On Wed, Sep 14, 2011 at 12:23, Fernando Perez <[hidden email]> wrote:
> On Wed, Sep 14, 2011 at 12:12 PM, MinRK <[hidden email]> wrote:
>> I think I could add support for exactly this without much code change at
>> all, though.
>> Feature Request opened on
>> GitHub: https://github.com/ipython/ipython/issues/794
>
> Great, thanks!

In fact, I've already got it working locally, so it was extremely easy.  The only thing left to do is allow ipcluster to be aware of it, which requires a little bit of thinking.

-MinRK



Re: Running multiple ipclusters on remote cluster w/ Sun Grid Engine.

Dharhas Pothina

It may be more a case of differing nomenclature. To me, a profile/profile name is something you set up once that applies to a class of things; e.g., within SGE we have a parallel environment (or profile) called mpich, and when we tell any script to use that parallel environment it sets things up a certain way. When you actually submit a job to SGE using that profile, it gets a jobid, which is what you can use to track or kill the actual job.


The 1-1 correspondence makes sense if you plan to have the ipcluster running continuously on a certain number of cluster nodes and keep connecting and disconnecting with local IPython clients.


To me, the use case that makes sense is different. We submit a job to run on a certain number of nodes, and after the job is completed the nodes are released for other, non-IPython runs, like our Fortran hydro models. In that case the 'profile' is what tells it how to submit a job to the SGE queue, etc., and the job-id or controller-id is what we use to run or kill the job. Maybe the --controller-id flag could be an optional parameter.


Another feature request is some way of knowing when the engines have all started up; depending on how busy the cluster's SGE queue is, the engines may not start up immediately. Right now, I'm using a while loop that checks for the presence of the json file every 5 seconds. This works but seems inelegant.


Let me know if this use case makes sense, or if I'm missing something about the way these features were designed to be used.


- dharhas




Re: Running multiple ipclusters on remote cluster w/ Sun Grid Engine.

MinRK


On Wed, Sep 14, 2011 at 14:32, Dharhas Pothina <[hidden email]> wrote:

> It may be more a case of differing nomenclature. To me a profile/profile
> name is something you set up once and applies to a class of things, e.g.
> within SGE we have a parallel environment (or profile) called mpich, and
> when we tell any script to use that particular parallel environment it
> sets things up a certain way. When you actually submit a job to SGE using
> that profile it gets a jobid, which is what you can use to track or kill
> the actual job.


There is not a 1:1 correspondence of jobid to IPython cluster.  The controller may or may not be run via SGE, and you can have an arbitrary number of SGE jobs corresponding to engines.  Only in the case of a non-SGE controller and a single group of SGE engines is there 1 jobid per cluster.  SGE controller+engines will have two job IDs, and if you add/remove engines over time, there can be 0-to-many job IDs associated with the cluster, and the active job ids are a function of time.  The *only* constant is the controller (which, again, may or may not have a job id at all), which will ultimately become resumable, so even its job id / pid cannot be assumed to be constant.
 


> The 1-1 correspondence makes sense if you plan to have the ipcluster
> running continuously on a certain number of cluster nodes and keep
> connecting and disconnecting with local ipython clients.


In fact, it makes sense for *all* cases that don't include running multiple simultaneous clusters with identical configuration.


> To me the use case that makes sense is different. We submit a job to run
> on a certain number of nodes, and after the job is completed the nodes
> are released for other non-IPython runs, like our fortran hydro models.
> In that case the 'profile' is what tells it how to submit a job to the
> sge queue, and the job-id or controller-id is what we use to run or kill
> the job. Maybe the --controller-id flag could be an optional parameter.


There is a bit of mismatch in design goals in IPython profiles, due to their evolution.  The entire profile system in IPython was developed for the purpose of consolidating the information about configuring and connecting to a single cluster instance (including repeated runs, but never simultaneous).  This has been expanded and adopted by IPython as a whole, for managing configurations and runtime files, and has come to mean something slightly different as a result.  The parallel code has not been changed to consider these ideas yet.

I think the restriction that a cluster is a singleton per-profile will remain, unless you specify a new cluster_id *for each additional cluster*.  The benefits of this assumption are just far too great to not make it by default.
 


> Another feature request is some way of knowing when the engines have all
> started up; depending on how busy the cluster SGE queue is, the engines
> may not start up immediately. Right now, I'm using a while loop that
> checks for the presence of the json file every 5 seconds. This works but
> seems inelegant.


Yes, this would certainly be useful.  Right now, there is no notion of a queued state for jobs, but it could conceivably be added (pull requests are welcome!).  I should note that polling for the JSON file only detects when the *controller* is running, and has nothing to do with engines.  Engines do not necessarily write any files.

SGE (as with all batch systems) already provides you with queue monitoring tools - there's no need to poll the filesystem, as you can just use qstat directly to see when engines have started.
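Building on that, a client can also poll the controller itself: `Client.ids` lists the engines that have registered, so waiting for n engines is just a loop over `len(rc.ids)`. A generic sketch; `wait_until` is an illustrative helper, and the commented `Client` call is a placeholder:

```python
import time

def wait_until(predicate, timeout=300.0, poll=5.0):
    """Poll predicate() until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met, False on timeout. Checks once
    before sleeping, so an already-true condition returns immediately."""
    deadline = time.time() + timeout
    while True:
        if predicate():
            return True
        if time.time() >= deadline:
            return False
        time.sleep(poll)

# Waiting for 10 engines to register might look like (not run here):
#   from IPython import parallel
#   rc = parallel.Client('/path/to/ipcontroller_client.json')
#   if not wait_until(lambda: len(rc.ids) >= 10):
#       raise RuntimeError("engines did not register in time")
```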
 


> Let me know if this use case makes sense or if I'm missing something in
> the way these features were designed to be used.


I think this use case does make sense, but we are running into the issue that ipcluster is not meant to solve every problem. People wrongly assume that ipcluster is the only (or even primary) way to start an IPython cluster, when it is in fact only a convenient way to start *simple* clusters. I think it is used far more often than it should be.

ipcluster is intended as an extremely basic launcher.  Its purpose is to handle the simple cases of starting zero-to-one controller and one-to-many engines in various environments.  *All it does* is start these other processes with a bit of abstraction regarding what starting/stopping means with respect to qsub, mpi, etc.  It was never meant to handle every case, and never will.  Writing your own scripts, that call ipengine/ipcontroller directly, to submit via qsub will frequently be a better solution than ipcluster.

It is not at all difficult to replicate the subset of what ipcluster does for your environment with *much* simpler code that would ultimately be more useful and controllable for you.

-MinRK
 


- dharhas



>>> MinRK <[hidden email]> 9/14/2011 2:12 PM >>>


On Wed, Sep 14, 2011 at 11:13, Fernando Perez <[hidden email]> wrote:

Hi Dharhas,

On Wed, Sep 14, 2011 at 6:59 AM, Dharhas Pothina

<[hidden email]> wrote:

> I ended up writing a script that connected to the cluster and made a copy of
> an already created profile with a new unique name, started ipcluster, waited
> till the json file was created and then retrieved the json file for use in a
> local client, runs my script and then cleans up afterwards.
>
> This seems to be working fairly well except when the local script exits
> because of an error. In that case, I need to log in and stop the engines,
> clean up files etc manually.

OK. We probably should remove the assumption of a 1 to 1 mapping
between profiles and running clusters, but that will require a fair
bit of reorganization of code that uses that assumption, so I'm glad
you found a solution for now.


Yes, it's a pretty big deal that the only thing engines and clients need to know to connect to a cluster is the profile name. That is lost entirely if we allow multiple clusters with a single profile, since profile name becomes ambiguous. We would then need to add a second layer of specification for which controller to use within a given profile, e.g.:


ipengine --profile=mysge --controller-id=12345


I think I could add support for exactly this without much code change at all, though.


Feature Request opened on GitHub: https://github.com/ipython/ipython/issues/794



Cheers,

f



_______________________________________________
IPython-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/ipython-user



Re: Getting setup on a remote cluster w/ Sun Grid Engine.

Ariel Rokem-2
In reply to this post by MinRK
Hi everyone,

Following up on this thread, I am trying to get this working on the SGE on our local cluster (thankfully, everyone is away at a conference, so I have the cluster pretty much to myself. Good week for experimenting...).

I updated my fork from ipython/master this afternoon and followed the instructions below. I am getting the following behavior:

celadon:~  $ipcluster start --n=10 --profile=sge
[IPClusterStart] Using existing profile dir: u'/home/arokem/.config/ipython/profile_sge'
[IPClusterStart] Starting ipcluster with [daemon=False]
[IPClusterStart] Creating pid file: /home/arokem/.config/ipython/profile_sge/pid/ipcluster.pid
[IPClusterStart] Starting PBSControllerLauncher: ['qsub', u'./sge_controller']
[IPClusterStart] adding job array settings to batch script
ERROR:root:Error in periodic callback
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/zmq/eventloop/ioloop.py", line 423, in _run
    self.callback()
  File "/home/arokem/usr/local/lib/python2.7/site-packages/IPython/parallel/apps/ipclusterapp.py", line 497, in start_controller
    self.controller_launcher.start()
  File "/home/arokem/usr/local/lib/python2.7/site-packages/IPython/parallel/apps/launcher.py", line 1022, in start
    return super(SGEControllerLauncher, self).start(1)
  File "/home/arokem/usr/local/lib/python2.7/site-packages/IPython/parallel/apps/launcher.py", line 936, in start
    self.write_batch_script(n)
  File "/home/arokem/usr/local/lib/python2.7/site-packages/IPython/parallel/apps/launcher.py", line 925, in write_batch_script
    script_as_string = self.formatter.format(self.batch_template, **self.context)
  File "/usr/lib64/python2.7/string.py", line 545, in format
    return self.vformat(format_string, args, kwargs)
  File "/usr/lib64/python2.7/string.py", line 549, in vformat
    result = self._vformat(format_string, args, kwargs, used_args, 2)
  File "/home/arokem/usr/local/lib/python2.7/site-packages/IPython/utils/text.py", line 652, in _vformat
    obj = eval(field_name, kwargs)
  File "<string>", line 1, in <module>
NameError: name 'n' is not defined
[IPClusterStart] Starting 10 engines
[IPClusterStart] Starting 10 engines with SGEEngineSetLauncher: ['qsub', u'./sge_engines']
[IPClusterStart] adding job array settings to batch script
[IPClusterStart] Writing instantiated batch script: ./sge_engines
[IPClusterStart] Job submitted with job id: '430658'
[IPClusterStart] Process 'qsub' started: '430658'
[IPClusterStart] Engines appear to have started successfully

It looks like something goes wrong (the NameError), but the jobs still get submitted: for a brief time qmon acknowledges a list of jobs with that id, but they disappear (somehow get deleted?) almost immediately, and when I try to initialize a parallel.Client with the "sge" profile in an ipython session, I get a "TimeoutError: Hub connection request timed out". I also tried initializing ipcluster with the default profile and running some computations, and I get the expected roughly 7-fold speed-up (on an 8 core machine), so some things do work. Does anyone have any idea what is going wrong with the SGE?

Thanks,

Ariel




On Wed, Aug 24, 2011 at 3:07 PM, MinRK <[hidden email]> wrote:
On Wed, Aug 24, 2011 at 15:05, Dharhas Pothina
<[hidden email]> wrote:
>
> I was able to start the engines and they were submitted to the queue
> properly but I do not have a json file in the corresponding security folder.
> Do I need to do something to generate it.

The JSON file is written by ipcontroller, so it will only show up
after the controller has started.

>
> - dharhas
>
>>>> MinRK <[hidden email]> 8/24/2011 4:44 PM >>>
> On a login node on the cluster:
>
> # create profile with default parallel config files, called sge
> [login] $> ipython profile create sge --parallel
>
> edit IPYTHON_DIR/profile_sge/ipcontroller_config.py, adding the line:
>
> c.HubFactory.ip = '0.0.0.0'
>
> to instruct the controller to listen on all interfaces.
>
> Edit IPYTHON_DIR/profile_sge/ipcluster_config.py, adding the line:
>
> c.IPClusterEngines.engine_launcher_class = 'SGEEngineSetLauncher'
> c.IPClusterStart.controller_launcher_class = 'SGEControllerLauncher'
>
> # optional: specify a queue for all:
> c.SGELauncher.queue = 'short'
> To instruct ipcluster to use SGE to launch the engines and the controller
>
> At this point, you can start 10 engines and a controller with:
>
> [login] $> ipcluster start -n 10 --profile=sge
>
> Now the only file you will need to connect to the cluster will be in:
>
> IPYTHON_DIR/profile_sge/security/ipcontroller_client.json
>
> Just move that file around, and you will be able to connect clients.
> To connect from a laptop, you will probably need to specify a login
> node as the ssh server when you do:
>
> from IPython import parallel
>
> rc = parallel.Client('/path/to/ipcontroller_client.json',
> sshserver='[hidden email]')
>
> -MinRK
>
>
> On Wed, Aug 24, 2011 at 13:18, Dharhas Pothina
> <[hidden email]> wrote:
>> Hi All,
>>
>> We have managed to parallelize one of our spatial interpolation scripts
>> very
>> easily with the new ipython parallel. Thanks for developing such a great
>> tool, it was fairly easy to get working. Now we are trying to set things
>> up
>> to run on our internal cluster and I'm having difficulties understanding
>> how
>> to configure things.
>>
>> What I would like to do is have ipython running on a local machine
>> (windows
>> & linux) connect to the cluster, request some nodes through SGE and run
>> the
>> computation. I'm not quite getting what goes where from the documentation.
>>
>> I think I understood the PBS example but I'm still not understanding where
>> I
>> would put the connection information to log into the cluster. I would
>> really
>> appreciate a step by step of what files need to be where and any example
>> config files for an SGE setup.
>>
>> thanks,
>>
>> - dharhas
>>
>>
>>
>>
>>
Re: Getting setup on a remote cluster w/ Sun Grid Engine.

MinRK


On Mon, Nov 14, 2011 at 19:10, Ariel Rokem <[hidden email]> wrote:
It looks like something goes wrong (the NameError), but then the jobs get submitted and for a brief time, qmon does acknowledge the existence of a list of jobs with that id, but then it disappears (somehow gets deleted?) from qmon almost immediately and when I try to initialize a parallel.Client with the "sge" profile in an ipython session, I get a "TimeoutError: Hub connection request timed out". I also tried initializing ipcluster with the default profile and run some computations and I am getting the approximately 7-fold expected speed-up (on an 8 core machine), so some things do work. Does anyone have any idea what is going wrong with the SGE?

This is a horrible typo that crept in when I did some reorganization in the launchers.  Should be fixed in master.

The TimeoutError in the client generally means that the controller isn't running, or at least isn't where connection files claimed it to be.

 

Thanks,

Ariel




Re: Getting setup on a remote cluster w/ Sun Grid Engine.

MinRK


On Tue, Nov 15, 2011 at 09:57, Ariel Rokem <[hidden email]> wrote:


On Mon, Nov 14, 2011 at 7:45 PM, MinRK <[hidden email]> wrote:


This is a horrible typo that crept in when I did some reorganization in the launchers.  Should be fixed in master.

Yes - fixed. I don't see that NameError anymore. Thanks!

 
The TimeoutError in the client generally means that the controller isn't running, or at least isn't where connection files claimed it to be.


OK - I think the controller really was not there before. Now it is being started, but I am still having trouble getting my engines to persist on the SGE. I see them get created through qmon, as well as the ipcontroller, but the engine jobs are almost immediately deleted from the "running jobs". The controller job persists, and when I initialize a client I don't get a TimeoutError, but rather a client object with an empty ids list. Is that still a problem with the connection files? Are those the ones under ~/.config/ipython/profile_sge/security?

It could be, or it could be an issue of the engines giving up too soon, if the controller isn't ready for them.  What is the output of the engine jobs?

The most likely cases:

1. the engines start before the controller has written the connection files. They will wait up to `IPEngineApp.wait_for_url_file` for that file to exist (default 5s), then give up.
2. same as 1., but old files exist, so the connection info will be stale.  This is normally addressed by setting `IPControllerApp.reuse_files=True`, but I'm not sure that works when the controller is started by SGE, where it won't consistently be on the same host. You may want to manually empty the security dir (IPYTHON_DIR/profile_sge/security) prior to starting the cluster, to prevent this case.
3. connection info is wrong - the Controller is not listening on the right interface, or is listening on localhost only (the default).  This is `HubFactory.ip`
4. regular timeout (controlled by `Engine.timeout`) - the connection info is correct, but the controller does not respond promptly (this value may need to be large when `reuse_files=True` and the controller/engines start simultaneously or out of order).
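For reference, the knobs named in cases 1-4 live in the profile's config files. A sketch with illustrative values follows - the numbers are guesses to tune, not recommendations, and the exact config class for the engine timeout may differ by version:

```python
# In IPYTHON_DIR/profile_sge/ipengine_config.py  (cases 1 and 4)
c = get_config()
c.IPEngineApp.wait_for_url_file = 30  # seconds to wait for the connection file
c.EngineFactory.timeout = 10          # registration timeout ("Engine.timeout" above)

# In IPYTHON_DIR/profile_sge/ipcontroller_config.py  (cases 2 and 3)
c.IPControllerApp.reuse_files = True  # reuse connection info; safe only on a fixed host
c.HubFactory.ip = '0.0.0.0'           # listen on all interfaces, not just localhost
```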

The easiest solution to all this is usually to increase IPClusterStart.delay, which is a delay (in seconds) between starting the Controller and starting the engines when you do `ipcluster start`.  This is less effective in SGE, where the time between calling `ipcluster start` and the jobs actually starting on nodes can be hours - so a few seconds of delay in submitting the batch jobs has no effect.  It may be sufficient if your queue is clear, and jobs start right away.

Depending on your sysadmin, it may make sense to *not* start the Controller with SGE, and only entrust SGE with the engines.  This gives you more control over the order of events.  There is no need *in general* for your Controller and Engines to use the same launchers.  There is a reason they are separate config variables.
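That split could be run roughly like this, assuming the `sge` profile from earlier and an IPython version that provides the `ipcluster engines` subcommand (which starts engines without a controller):

```shell
# On a login node: run the controller yourself, outside SGE
ipcontroller --profile=sge &

# Once the engine connection file appears in
# IPYTHON_DIR/profile_sge/security/, submit only the engines through SGE
ipcluster engines -n 10 --profile=sge
```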

-MinRK


 


 

Thanks,

Ariel




On Wed, Aug 24, 2011 at 3:07 PM, MinRK <[hidden email]> wrote:
On Wed, Aug 24, 2011 at 15:05, Dharhas Pothina
<[hidden email]> wrote:
>
> I was able to start the engines and they were submitted to the queue
> properly but I do not have a json file in the corresponding security folder.
> Do I need to do something to generate it.

The JSON file is written by ipcontroller, so it will only show up
after the controller has started.

>
> - dharhas
>
>>>> MinRK <[hidden email]> 8/24/2011 4:44 PM >>>
> On a login node on the cluster:
>
> # create profile with default parallel config files, called sge
> [login] $> ipython profile create sge --parallel
>
> edit IPYTHON_DIR/profile_sge/ipcontroller_config.py, adding the line:
>
> c.HubFactory.ip = '0.0.0.0'
>
> to instruct the controller to listen on all interfaces.
>
> Edit IPYTHON_DIR/profile_sge/ipcluster_config.py, adding the line:
>
> c.IPClusterEngines.engine_launcher_class = 'SGEEngineSetLauncher'
> c.IPClusterStart.controller_launcher_class = 'SGEControllerLauncher'
>
> # optional: specify a queue for all:
> c.SGELauncher.queue = 'short'
> To instruct ipcluster to use SGE to launch the engines and the controller
>
> At this point, you can start 10 engines and a controller with:
>
> [login] $> ipcluster start -n 10 --profile=sge
>
> Now the only file you will need to connect to the cluster will be in:
>
> IPYTHON_DIR/profile_sge/security/ipcontroller_client.json
>
> Just move that file around, and you will be able to connect clients.
> To connect from a laptop, you will probably need to specify a login
> node as the ssh server when you do:
>
> from IPython import parallel
>
> rc = parallel.Client('/path/to/ipcontroller_client.json',
> sshserver='[hidden email]')
>
> -MinRK
>
>
> On Wed, Aug 24, 2011 at 13:18, Dharhas Pothina
> <[hidden email]> wrote:
>> Hi All,
>>
>> We have managed to parallelize one of our spatial interpolation scripts
>> very
>> easily with the new ipython parallel. Thanks for developing such a great
>> tool, it was fairly easy to get working. Now we are trying to set things
>> up
>> to run on our internal cluster and I'm having difficulties understanding
>> how
>> to configure things.
>>
>> What I would like to do is have ipython running on a local machine
>> (windows
>> & linux) connect to the cluster, request some nodes through SGE and run
>> the
>> computation. I'm not quite getting what goes where from the documentation.
>>
>> I think I understood the PBS example but I'm still not understanding where
>> I
>> would put the connection information to log into the cluster. I would
>> really
>> appreciate a step by step of what files need to be where and any example
>> config files for an SGE setup.
>>
>> thanks,
>>
>> - dharhas
>>
>>
>>
>>
>>
>> _______________________________________________
>> IPython-User mailing list
>> [hidden email]
>> http://mail.scipy.org/mailman/listinfo/ipython-user
>>
>>
> _______________________________________________
> IPython-User mailing list
> [hidden email]
> http://mail.scipy.org/mailman/listinfo/ipython-user
>
> _______________________________________________
> IPython-User mailing list
> [hidden email]
> http://mail.scipy.org/mailman/listinfo/ipython-user
>
>
_______________________________________________
IPython-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/ipython-user


_______________________________________________
IPython-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/ipython-user





_______________________________________________
IPython-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/ipython-user