[GE users] scheduler problems mpi job waits for ever

Rene Salmon rsalmon at tulane.edu
Tue Aug 17 15:24:54 BST 2004


Hello,

I need some help setting up my scheduler.  I have a cluster running
SGEEE 5.3p5.  The cluster has one queue per host, and these queues run
both single-CPU jobs and MPI jobs.

The problem is that an MPI job usually requests 6-8 CPUs and ends up
waiting forever for them to become available, because single-CPU jobs
keep skipping ahead of the MPI job and running on whatever single CPUs
are free in the queues.
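
As a rough illustration of the pattern described above, here is a toy
model in Python.  It is not Grid Engine code: the slot count, run time,
and the hold_for_mpi switch are all assumptions made up for the sketch,
and hold_for_mpi is not an SGEEE setting.  It only shows that if every
free slot is refilled with a single-CPU job right away, 6 slots never
become free at the same time.

```python
def simulate(total_slots=10, mpi_need=6, run_time=3, steps=30,
             hold_for_mpi=False):
    """Return the step at which the MPI job could start, or None."""
    # Five single-CPU jobs are already running when the MPI job is queued.
    running = [run_time] * 5
    for step in range(steps):
        # Advance time by one step and drop jobs that have finished.
        running = [t - 1 for t in running if t > 1]
        free = total_slots - len(running)
        if free >= mpi_need:
            return step
        if not hold_for_mpi:
            # Greedy behaviour: every free slot goes to a new single-CPU job.
            running += [run_time] * free
    return None


print(simulate())                   # None: the MPI job never starts
print(simulate(hold_for_mpi=True))  # 2: slots drain once nothing backfills
```

In the greedy case the free count only ever reaches 5, so the 6-slot
request is never satisfied within the simulated window.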



queuename            qtype used/tot. load_avg arch       states
----------------------------------------------------------------------------
compute-0-0.q        BIP   1/2       0.97     lx24-amd64
    309     0 just_200_r btice        r     08/17/2004 00:52:17 MASTER
----------------------------------------------------------------------------
compute-0-1.q        BIP   0/2       0.00     lx24-amd64
----------------------------------------------------------------------------
compute-0-2.q        BIP   2/2       2.04     lx24-amd64
    308     0 full_run   btice        r     08/16/2004 14:23:03 MASTER
    310     0 full_run   btice        r     08/17/2004 03:28:04 MASTER
----------------------------------------------------------------------------
compute-0-3.q        BIP   1/2       1.00     lx24-amd64
    307     0 full_run   btice        r     08/16/2004 05:11:31 MASTER
----------------------------------------------------------------------------
compute-0-4.q        BIP   1/2       1.00     lx24-amd64
    288     0 JasonJob18 jconsta      r     08/15/2004 15:49:38 MASTER

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    285     0 dyn13      bishop       qw    08/13/2004 09:06:15
    311     0 full_run   btice        qw    08/17/2004 01:07:35
    312     0 full_run   btice        qw    08/17/2004 03:37:41




For example, dyn13 is the MPI job that requested 6 CPUs.  As you can see,
it has been waiting since 08/13, while single-CPU jobs that were queued
after it skip ahead and run on the CPUs that are available.
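
To put a number on it, the short Python sketch below adds up the free
slots from the "used/tot." column of the listing above.  It assumes the
listing shows every queue in the cluster; the function name free_slots
is just for illustration.

```python
import re

# Queue lines copied from the qstat listing above.
QSTAT = """\
compute-0-0.q        BIP   1/2       0.97     lx24-amd64
compute-0-1.q        BIP   0/2       0.00     lx24-amd64
compute-0-2.q        BIP   2/2       2.04     lx24-amd64
compute-0-3.q        BIP   1/2       1.00     lx24-amd64
compute-0-4.q        BIP   1/2       1.00     lx24-amd64
"""


def free_slots(text):
    """Return {queue name: free slots} parsed from the used/tot. column."""
    free = {}
    for line in text.splitlines():
        # Queue name, queue type, then used/total slots.
        m = re.match(r"(\S+)\s+\S+\s+(\d+)/(\d+)", line)
        if m:
            used, total = int(m.group(2)), int(m.group(3))
            free[m.group(1)] = total - used
    return free


slots = free_slots(QSTAT)
total_free = sum(slots.values())
print(slots)
print("free slots:", total_free, "requested by dyn13:", 6)
```

With the numbers shown, only 5 slots are free, one short of the 6 that
dyn13 asked for, and each time a slot opens up a single-CPU job takes it.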

Does anyone know how I can keep these MPI jobs from starving?

Thanks
Rene

