Using 'saw', I'm trying to understand the CPU usage levels that I'm
observing on the 40 engines (10x2x2 xeon) relative to the load on the
controller CPU (the remote controller is instantiated on yet another host).
In my outermost for-loop, I use executeAll() followed by a pullAll(). I
inserted a time.sleep(5) between them, and observed the CPU load shown
on the attached graph. On this graph, each host's % utilization for
each CPU is summed together, for a max of 4. The mp* hosts run 4
engines each, and the tp host runs the controller, and the most recent
stats appear on the right (axis mis-labeled). Thus, the hump between 10
and 17 corresponds to the executeAll(), between 17 and 21 the sleep(),
and 22 to 25 the pullAll().
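
For reference, the execute / sleep / pull cycle described above can be timed phase by phase with something like the sketch below. StubController is a hypothetical in-process stand-in, not the real RemoteController; only the method names executeAll() and pullAll() come from this thread, and their signatures here are assumptions.

```python
import time


class StubController:
    """Hypothetical stand-in for a RemoteController (not the real API)."""

    def __init__(self, n_engines):
        # One plain dict per "engine" plays the role of its namespace.
        self.namespaces = [{} for _ in range(n_engines)]

    def executeAll(self, source):
        # Run the same source string in every engine namespace.
        for ns in self.namespaces:
            exec(source, ns)

    def pullAll(self, name):
        # Gather one named object from every engine into a list.
        return [ns[name] for ns in self.namespaces]


def timed_iteration(rc, source, name, pause=0.0):
    """Run one execute/sleep/pull cycle; return (results, phase timings)."""
    t0 = time.perf_counter()
    rc.executeAll(source)
    t1 = time.perf_counter()
    time.sleep(pause)  # separates the two humps on a load graph
    t2 = time.perf_counter()
    results = rc.pullAll(name)
    t3 = time.perf_counter()
    timings = {"execute": t1 - t0, "sleep": t2 - t1, "pull": t3 - t2}
    return results, timings


rc = StubController(n_engines=40)
results, timings = timed_iteration(rc, "x = sum(range(1000))", "x", pause=0.01)
```

Logging the three timings on the client next to the per-host CPU graph would make it easier to line up each hump with the call that caused it.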
What is surprising to me is that the load on the controller lasts so
much longer (in time) than the load on the engines caused by the
pullAll(), although they probably have the same area ...
So, I guess there is nothing shocking in the graph, although it would be
interesting to see how things would change if the controller were able
to use more than 1 CPU.
BTW, the low time between 25 and 27 is a disk access, and I plan to
reduce the effect of this by running a tandem set of engines that simply
pull the data into the NFS cache ... we'll see how that goes.
I think what is going on is the following. When you issue a pullAll
command, and that request reaches the engines, the engines each send
back the data. Once that data has been sent to the controller, the
engine load goes back down. But, now think about what the controller
has to do:
1) Receive all the data *from every engine*
2) Collate the resulting data from each engine into a single list of
3) Send the data back to the RemoteController.
Steps 2 and 3 won't happen until after the engine load goes back down,
so I think that is what you are seeing. The other aspect that
amplifies this effect is that each engine only handles 1 object,
whereas the controller handles N objects (for N engines). There is
simply a lot more for the controller to do.
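
A toy model of that asymmetry, as a rough sketch only: each engine serializes one object, while the controller receives N of them, collates them, and serializes the whole list again. The use of pickle and the function names are assumptions made for illustration, not how the controller actually moves data.

```python
import pickle


def engine_reply(payload_size):
    # Each engine serializes exactly one object of payload_size bytes.
    return pickle.dumps(bytes(payload_size))


def controller_pull_all(n_engines, payload_size):
    # 1) Receive the data from every engine.
    replies = [engine_reply(payload_size) for _ in range(n_engines)]
    bytes_received = sum(len(r) for r in replies)
    # 2) Collate the per-engine data into a single list.
    collated = [pickle.loads(r) for r in replies]
    # 3) Send the collated list back to the client.
    outgoing = pickle.dumps(collated)
    return bytes_received, len(outgoing), len(collated)


N_ENGINES = 40
PAYLOAD = 1000
per_engine_bytes = len(engine_reply(PAYLOAD))
bytes_received, bytes_sent, n_objects = controller_pull_all(N_ENGINES, PAYLOAD)
```

In this model the controller touches roughly 2 x N times the bytes a single engine does (N in, N out), all in one process, which is consistent with its load lasting longer than the engines' load.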
One thing that this shows is that the controller can be a bottleneck
for certain types of algorithms. Anytime I end up with such a
bottleneck, I try to see if there are ways of moving more things onto
the engines and avoid the data movement through the controller.
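
As an illustration of moving work onto the engines, the sketch below compares pulling the full data and reducing on the client with reducing on each engine first and pulling only the partial results. StubController is again a hypothetical in-process stand-in with assumed signatures, and the sum is just an example reduction.

```python
class StubController:
    """Hypothetical stand-in for a RemoteController (not the real API)."""

    def __init__(self, n_engines):
        self.namespaces = [{} for _ in range(n_engines)]

    def executeAll(self, source):
        for ns in self.namespaces:
            exec(source, ns)

    def pullAll(self, name):
        return [ns[name] for ns in self.namespaces]


rc = StubController(n_engines=4)
rc.executeAll("data = list(range(100000))")

# Option A: pull everything through the controller, reduce on the client.
full = rc.pullAll("data")
items_moved_a = sum(len(chunk) for chunk in full)
total_a = sum(sum(chunk) for chunk in full)

# Option B: reduce on each engine, pull only one number per engine.
rc.executeAll("partial = sum(data)")
partials = rc.pullAll("partial")
items_moved_b = len(partials)
total_b = sum(partials)
```

Both options give the same total, but option B moves one value per engine through the controller instead of the whole data set.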
With that said, I guess we need to submit the final version of the
slides by tomorrow. Are you on target for that? It is great to have
gotten this working with 40 engines.
> So, I guess there is nothing shocking in the graph, although it would be
> interesting to see how things would change if the controller were able
> to use more than 1 CPU.
You can do this right now pretty easily - but I am not sure it is
worth it. You could start two controllers and have 20 engines connect
to each. Then in your client code you would simply create 2
RemoteControllers and write the algorithm in terms of those. In the
future, we really need to create a MetaRemoteController object that
supports the notion of aggregating multiple RemoteController objects
into a single one. But, even if you do this, it is possible that the
process running the RemoteController will still be a bottleneck. Not
sure if you have time to explore all this.
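
A very rough sketch of what such an aggregating object might look like follows. Nothing like this exists yet: MetaController and StubController are both hypothetical, the executeAll()/pullAll() signatures are assumed, and threads are just one possible way to issue the calls to both controllers at once.

```python
from concurrent.futures import ThreadPoolExecutor


class StubController:
    """Hypothetical stand-in for one RemoteController (not the real API)."""

    def __init__(self, n_engines, first_id=0):
        # Seed each engine namespace with an id so results are distinguishable.
        self.namespaces = [{"engine_id": first_id + i} for i in range(n_engines)]

    def executeAll(self, source):
        for ns in self.namespaces:
            exec(source, ns)

    def pullAll(self, name):
        return [ns[name] for ns in self.namespaces]


class MetaController:
    """Hypothetical aggregate that presents several controllers as one."""

    def __init__(self, controllers):
        self.controllers = list(controllers)

    def _fan_out(self, method, arg):
        # Call the same method on every controller concurrently;
        # pool.map returns the results in controller order.
        with ThreadPoolExecutor(max_workers=len(self.controllers)) as pool:
            return list(pool.map(lambda rc: getattr(rc, method)(arg),
                                 self.controllers))

    def executeAll(self, source):
        self._fan_out("executeAll", source)

    def pullAll(self, name):
        # Concatenate the per-controller lists into one flat list.
        merged = []
        for part in self._fan_out("pullAll", name):
            merged.extend(part)
        return merged


meta = MetaController([StubController(20, first_id=0),
                       StubController(20, first_id=20)])
meta.executeAll("y = engine_id * 2")
ys = meta.pullAll("y")
```

Note that the final merge still happens in the client process, so this only spreads the controller-side work; the client can remain the bottleneck, as mentioned above.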
Let me know if you need anything else.
> Best Regards,
> Glen Mabey