[GE users] array jobs mess up fair share ??

Andreas.Haas at Sun.COM
Tue Jul 8 14:02:27 BST 2008


Hi Chris,

I know this phenomenon, but unfortunately I can't explain it.

Please find my question on your setup in

    http://gridengine.sunsource.net/issues/show_bug.cgi?id=2298

My hope is that together we can narrow down the problem step by step.

Regards,
Andreas

On Tue, 8 Jul 2008, Chris Rudge wrote:

> Andreas,
>
> Hopefully the following information taken from sge_share_mon output will
> start to give a clue as to what's going wrong.
>
> My understanding is that the "cpu" value is the cpu time used in
> seconds. In a 15 second interval between reports from sge_share_mon, if
> a project is efficiently using 10 cpus on the cluster, then the cpu
> value would increase by 150 (15 seconds * 10 cpus).
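Chris's reading of the sge_share_mon "cpu" column can be checked with simple arithmetic. The Python sketch below is illustrative only. It assumes, as Chris states, that "cpu" is cumulative CPU seconds, and that each fully busy CPU adds at most one CPU second per wall-clock second. The function names implied_cpus and expected_delta are hypothetical and not part of Grid Engine.

```python
def implied_cpus(delta_cpu_seconds: float, interval_seconds: float) -> float:
    """Average number of busy CPUs implied by growth in cumulative CPU time.

    Assumption: the sge_share_mon "cpu" value is cumulative CPU seconds and
    jobs are fully efficient (one CPU second per wall-clock second per CPU).
    """
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return delta_cpu_seconds / interval_seconds


def expected_delta(cpus: int, interval_seconds: float) -> float:
    """CPU seconds a project should accumulate over one reporting interval."""
    return cpus * interval_seconds


# Worked example from the text: 10 efficient CPUs over a 15 s interval.
print(expected_delta(10, 15))     # 150
print(implied_cpus(150, 15))      # 10.0

# The two project figures reported in the following paragraphs.
print(implied_cpus(1200, 15))     # 80.0
print(implied_cpus(60000, 15))    # 4000.0
```

Under these assumptions, the non-array project's growth of about 1,200 matches its roughly 80 CPUs, while the array-job project's growth of about 60,000 would require 4,000 CPUs, far more than the 264 in the cluster.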
>
> For a project not using array jobs, I can see that this is indeed true -
> give or take an allowance for inefficient parallel jobs. In 15 seconds,
> the cpu value increases by about 1200 and the project is using about 80
> cpus on the cluster.
>
> For the project using array jobs, in a 15 second period the cpu value
> increases by around 60,000 !?! This would suggest they're using about
> 4,000 cpus on the cluster. This is obviously wrong on our 264 cpu
> cluster. I can see that the jobs for this project are:
>
> # qstat -u ajh67,hb100
> job-ID  prior   name       user         state submit/start at     queue            slots ja-task-ID
> ----------------------------------------------------------------------------------------------
>  901068 1.37111 fsi_30_ajh ajh67        r     07/07/2008 12:15:45 default.q@comp24     1
>  901172 1.36839 run_job.sh hb100        r     07/07/2008 17:23:37 default.q@comp24     1 9031
>  901172 1.36839 run_job.sh hb100        r     07/07/2008 17:23:37 default.q@comp24     1 9032
>  901172 1.36839 run_job.sh hb100        r     07/07/2008 17:23:37 default.q@comp24     1 9033
>  901067 1.37111 fsi_23_ajh ajh67        r     07/07/2008 12:15:45 default.q@comp36     1
>  901172 1.36839 run_job.sh hb100        r     07/07/2008 17:23:37 default.q@comp36     1 9034
>  901172 1.36839 run_job.sh hb100        r     07/07/2008 17:23:37 default.q@comp36     1 9035
>  901172 1.36839 run_job.sh hb100        r     07/07/2008 17:23:37 default.q@comp36     1 9036
>  901172 1.36839 run_job.sh hb100        r     07/08/2008 11:16:21 default.q@comp65     1 9037
>  901172 1.36839 run_job.sh hb100        r     07/08/2008 11:16:21 default.q@comp65     1 9038
>  901172 1.36839 run_job.sh hb100        r     07/08/2008 11:17:06 default.q@comp65     1 9039
>  901172 1.36839 run_job.sh hb100        r     07/08/2008 11:17:36 default.q@comp65     1 9040
>  901172 1.26780 run_job.sh hb100        qw    07/07/2008 17:23:28                      1 9041-9060:1
>
> i.e. two serial jobs for user ajh67 and an array job with 30 tasks for
> user hb100, of which 10 are running. Note that these aren't the last 30
> tasks of a 9060-task array job; they are all 30 tasks of an array job
> with task range 9031-9060.
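The qstat listing can be turned into a rough upper bound on how much CPU time the project could accumulate per report. This is a minimal Python sketch, not part of Grid Engine: count_tasks is a hypothetical helper that assumes the first-last:step notation shown in the ja-task-ID column, and the bound assumes each running slot (the 10 hb100 tasks plus ajh67's two serial jobs) contributes at most one CPU second per wall-clock second.

```python
def count_tasks(task_range: str) -> int:
    """Count the tasks in an SGE-style task range such as '9041-9060:1' or '9031'.

    Hypothetical helper: assumes the 'first-last:step' notation that qstat
    prints in its ja-task-ID column, where a single ID means one task.
    """
    span, _, step_text = task_range.partition(":")
    step = int(step_text) if step_text else 1
    first_text, _, last_text = span.partition("-")
    first = int(first_text)
    last = int(last_text) if last_text else first
    if step <= 0 or last < first:
        raise ValueError(f"bad task range: {task_range!r}")
    return (last - first) // step + 1


# Task IDs taken from the qstat listing.
running_tasks = [str(t) for t in range(9031, 9041)]  # hb100 tasks in state "r"
pending_range = "9041-9060:1"                        # the single "qw" line
serial_jobs = 2                                      # ajh67's two serial jobs

n_running = sum(count_tasks(t) for t in running_tasks)
n_pending = count_tasks(pending_range)
print(n_running, n_pending, n_running + n_pending)  # 10 20 30

# Upper bound on CPU seconds per 15 s report, assuming one CPU per slot.
interval = 15
running_slots = n_running + serial_jobs
ceiling = running_slots * interval
reported = 60_000
print(running_slots, ceiling, round(reported / ceiling))  # 12 180 333
```

Under these assumptions, the reported growth of about 60,000 is roughly 333 times what the 12 running slots could account for, which is consistent with Chris's observation that the usage recorded for this project cannot be right.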
>
> Regards,
> Chris
>
>
> On Mon, 2008-07-07 at 18:08 +0200, Andreas.Haas at Sun.COM wrote:
>> Hi Chris,
>>
>> please find my reply in
>>
>>     http://gridengine.sunsource.net/issues/show_bug.cgi?id=2298
>>
>>
>> Regards,
>> Andreas
>>
>
> -- 
> Dr Chris Rudge
> chris.rudge at astro.le.ac.uk
>
> Research Computing Manager
> Dept of Physics & Astronomy
> University of Leicester
> LE1 7RH
>
> web.  www.ukaff.ac.uk
> Tel.  +44 (0)116 2523331
> Fax.  +44 (0)116 2231283
> Mob.  +44 (0)794 1379420
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
>

http://gridengine.info/

Sitz der Gesellschaft: Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
