[GE users] Difference between qmod and load thresholds for parallel job?

Reuti reuti at staff.uni-marburg.de
Tue May 31 15:10:48 BST 2005


Jerome,

this looks like suspending the queue will suspend all tasks running 
there, regardless of whether these are real jobs or just qrsh tasks. The 
suspend threshold instead will suspend only (?) the jobs running on this 
machine, thus freeing up the machine for the qrsh tasks running there.
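
To illustrate why the scope of a stop signal matters, here is a minimal Linux-only sketch. It is an analogy under stated assumptions, not a description of how SGE delivers signals: a hypothetical launcher process stands in for mpiexec/qrsh, and the child it spawns stands in for the MPI program. Stopping only the launcher leaves the child running; signalling the whole process group stops both.

```python
import os
import signal
import subprocess
import sys
import time

# Hypothetical stand-ins: LAUNCHER plays the role of mpiexec/qrsh,
# the process it spawns plays the role of the MPI program.
LAUNCHER = (
    "import subprocess, sys\n"
    "w = subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(60)'])\n"
    "print(w.pid, flush=True)\n"
    "w.wait()\n"
)


def proc_state(pid):
    """Return the Linux process state letter from /proc ('T' means stopped)."""
    with open(f"/proc/{pid}/stat") as f:
        return f.read().rsplit(")", 1)[1].split()[0]


def wait_for_state(pid, wanted, timeout=5.0):
    """Poll until the process reaches the wanted state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if proc_state(pid) == wanted:
            return True
        time.sleep(0.05)
    return False


def demo():
    # start_new_session=True makes the launcher a process group leader,
    # so its pid is also the process group id shared with its child.
    launcher = subprocess.Popen(
        [sys.executable, "-c", LAUNCHER],
        stdout=subprocess.PIPE, text=True, start_new_session=True,
    )
    try:
        worker_pid = int(launcher.stdout.readline())

        # Case 1: stop only the launcher; the worker is not signalled.
        os.kill(launcher.pid, signal.SIGSTOP)
        launcher_stopped = wait_for_state(launcher.pid, "T")
        time.sleep(0.2)
        worker_still_running = proc_state(worker_pid) != "T"

        # Case 2: stop the whole process group; the worker stops too.
        os.killpg(launcher.pid, signal.SIGSTOP)
        worker_stopped = wait_for_state(worker_pid, "T")

        return launcher_stopped, worker_still_running, worker_stopped
    finally:
        # SIGKILL also terminates stopped processes, so cleanup is reliable.
        os.killpg(launcher.pid, signal.SIGKILL)
        launcher.wait()
        launcher.stdout.close()


if __name__ == "__main__":
    print(demo())
```

Whether SGE's threshold-triggered suspend behaves like the first case for tasks started via qrsh is an assumption here; the sketch only shows the general mechanism.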

But I'm not sure whether a partial suspend of an MPICH2 job, which you 
are looking for, would be good at all: SGE would stop just some qrsh 
tasks on some machines, while on other machines the other tasks of the 
same job would continue and wait for a reply from the suspended nodes. Not 
really advantageous IMO.
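
A toy sketch of that stall, using Python threads and a barrier as a stand-in for a collective MPI step (this is not MPI code; `run_step` and its parameters are made up for illustration). A "suspended" task never reaches the barrier, so the remaining tasks wait until a timeout and make no progress.

```python
import threading


def run_step(n_tasks, suspended, timeout=0.5):
    """Run one synchronised step; return what happened to each task.

    suspended: set of task ranks that never reach the barrier,
    modelling tasks stopped on a suspended machine.
    """
    barrier = threading.Barrier(n_tasks)
    results = {}

    def task(rank):
        if rank in suspended:
            results[rank] = "suspended"
            return
        try:
            # Every running task waits here for all the others.
            barrier.wait(timeout=timeout)
            results[rank] = "completed"
        except threading.BrokenBarrierError:
            # A peer never arrived, so this task could not proceed.
            results[rank] = "stalled"

    threads = [threading.Thread(target=task, args=(r,)) for r in range(n_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_step(4, set()))   # no task suspended
    print(run_step(4, {2}))     # task 2 suspended
```

In a real MPI job there is usually no such timeout, so the running tasks would simply keep waiting (and possibly keep consuming CPU) until the suspended ones are resumed.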

CU - Reuti

Jerome wrote:
> I'm a new user of Grid Engine (with Rocks 3.3.0).
> I'm trying to use the load/threshold features with a parallel job 
> using a daemonless smpd setup (as indicated in the "Tight Integration 
> of the MPICH2 library into SGE" PDF document).
> As far as I can see, the SIGSTOP signal does stop the "mpiexec" and 
> "qrsh" programs, but the MPI program keeps running, and so the load 
> average keeps increasing. That's not very useful...
> 
> But if I use the queue control panel as the SGE administrator and 
> suspend the queue where this job is running, all goes well. I can see 
> that all the programs (mpiexec, qrsh and the MPI program) are stopped.
> So what's the difference between the administrator's control panel 
> and the load/threshold system?
> 
> Hope that someone can help me.
> Thanks a lot.
> 
> 

