[GE users] Correct accounting with mpich-mx tight integration

Reuti reuti at staff.uni-marburg.de
Thu May 10 17:09:16 BST 2007


Hi,

On 10.05.2007 at 13:51, Chris Rudge wrote:

> I've set up a PE to get tight MPI integration with mpich-mx but wonder
> if there's a way to get walltime accounting correct.
>
> Using sge_mpirun, rsh is replaced with 'qrsh -inherit'. As mpich-mx runs
> an rsh (or now qrsh) to launch every process the PE has to have "job is
> first task" set to false. This appears to mean that, for a 16 cpu job,
> there are 17 lots of accounting done - the 16 MPI processes plus the
> job. This is OK for cpu time but is wrong for walltime.
>
> Is there any way to avoid this with mpich-mx?

Do you mean the accumulation in the qacct command? The wallclock problem
will hit you in other ways too: e.g. Gaussian does not compute all steps
in parallel, although the slots are reserved for you in the cluster. A
simple way to deal with that is a small script which multiplies the
number of granted slots by the wallclock time of the master job.

-- Reuti
