[GE users] Upper bound for array jobs?

Bernard Li bli at bcgsc.ca
Fri Aug 27 18:15:23 BST 2004


Hi Andy: 

> Anyhow: This problem looks to me like a commlib problem or 
> bug - I don't think it has to do with the size of array jobs.
> 
> How many execution hosts are configured in this cluster? Is 
> there an indication that this happens when many jobs are 
> scheduled in a single scheduler run and qmaster tries to send 
> them to all the execd's?

Currently we have 192 execution hosts which are all running SGE 5.3p6.
I was able to reproduce the problem on our end by simply submitting an
array job with 26,000 tasks.

Basically, the way users submit array jobs (in our case) is they have a
batchfile which lists all the commands to be executed - so each separate
line in the batchfile corresponds to a separate task.  A script will
parse this batchfile and create corresponding script files which will
then be submitted as one big array job.
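
To make the workflow above concrete, here is a minimal sketch of what such
a parsing script could look like. It is not our actual script: the names
write_task_scripts, qsub_command, task.N.sh and run_task.sh are all
hypothetical, and I am assuming that run_task.sh is a small wrapper which
executes task.$SGE_TASK_ID.sh for each task. The only SGE pieces used are
the qsub -t option and the SGE_TASK_ID variable.

```python
import os


def write_task_scripts(batchfile, outdir):
    """Write one script per non-empty batchfile line; return the task count."""
    os.makedirs(outdir, exist_ok=True)
    with open(batchfile) as fh:
        commands = [line.strip() for line in fh if line.strip()]
    # Array task IDs start at 1, so number the script files the same way.
    for task_id, command in enumerate(commands, start=1):
        path = os.path.join(outdir, "task.%d.sh" % task_id)
        with open(path, "w") as out:
            out.write("#!/bin/sh\n%s\n" % command)
        os.chmod(path, 0o755)
    return len(commands)


def qsub_command(n_tasks, wrapper="run_task.sh"):
    """Build the qsub argument list for one array job covering all tasks.

    The wrapper name is an assumption; it is expected to run the script
    matching $SGE_TASK_ID.
    """
    return ["qsub", "-t", "1-%d" % n_tasks, wrapper]
```

With a 26,000-line batchfile this leaves 26,000 small script files in
outdir and a single submission of the form qsub -t 1-26000 run_task.sh.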

Now that I think about it, perhaps this is the cause of the problem?  An
array job with 26,000 tasks means 26,000 separate script files get
created - perhaps commd got stuck trying to communicate all that
information?

Thanks for the help.

Cheers,

Bernard
