[GE users] Upper bound for array jobs?

Andy Schwierskott andy.schwierskott at sun.com
Fri Aug 27 15:31:23 BST 2004


Hi,

> Andy Schwierskott wrote:
>
>> Bernard,
>> 
>>> Sorry for not being clear, what I wanted to ask was what's the upper
>>> bound that SGE can handle (not whether I can limit it).
>>> 
>>> Our problem right now is being outlined here:
>>> 
>>> http://gridengine.sunsource.net/servlets/ReadMsg?msgId=20558&lsistName=users
>> 
>> 
>> I'm getting an error message when trying this URL - not sure if it's
>> temporary.
>
>
> Try this one:
>
> http://tinyurl.com/3ofrj

Thanks;-)

The reason was the typo "lsistName" instead of "listName".

Anyhow: this looks to me like a commlib problem or bug. I don't
think it has to do with the size of array jobs.

How many execution hosts are configured in this cluster? Is there any
indication that this happens when many jobs are scheduled in a single
scheduler run and qmaster tries to send them to all the execds?

Andy


>
>> 
>>> It seems that when there are large numbers of jobs in the queue, commd
>>> simply gets stuck and the program becomes unresponsive. There have been
>>> various comments on the mailing list about commd issues; I wonder if
>>> they are related?
>>> 
>>> Previously I was using 5.3p5, but we have since updated to 5.3p6 and the
>>> problem still persists.
>> 
>> 
>> The size of an array job should not influence commd at all. The protocol
>> between qmaster and execd does not depend on whether a job is an array job
>> or not.
>> 
>> Having 26,000 jobs in the system or having an array job with 26,000
>> tasks should not matter. Are you experiencing any problems?
>> 
>> I remember there have been reports on the mailing list that indicate
>> problems related to array jobs; however, so far we have not been able
>> to reproduce them. We'd need a description of how to reproduce the
>> problem; otherwise it will be quite difficult to look into it.
>> 
>> Could you run any tests with 6.0(u1)? Do you see the same array job
>> problems?
>> 
>> Andy
>> 
>> 
>>> 
>>> Thanks,
>>> 
>>> Bernard
>>> 
>>>> -----Original Message-----
>>>> From: Andy Schwierskott [mailto:andy.schwierskott at sun.com]
>>>> Sent: Thursday, August 26, 2004 1:27
>>>> To: users at gridengine.sunsource.net
>>>> Subject: Re: [GE users] Upper bound for array jobs?
>>>> 
>>>> Bernard,
>>>> 
>>>> see sge_conf(5):
>>>> 
>>>>     max_aj_tasks
>>>> 
>>>> and probably
>>>> 
>>>>     max_aj_instances
>>>> 
>>>> Which problems did you encounter?
>>>> 
>>>> Andy
>>>> 
>>>>> Is there a limit to how big an array job can be?  Have people
>>>>> encountered problems with array jobs with 26,000 tasks?
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Bernard
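
The two sge_conf(5) parameters Andy points to in the quoted reply, max_aj_tasks and max_aj_instances, appear in the global configuration shown by qconf -sconf. The Python sketch below is a hypothetical helper, not part of Grid Engine: it counts the tasks in a qsub -t range of the form n[-m[:s]] and compares that count with max_aj_tasks parsed from qconf -sconf style text. It assumes the plain "name value" layout of that output, uses made-up sample values rather than any release's defaults, and treats a missing or zero max_aj_tasks as "no limit"; check sge_conf(5) for your version before relying on that.

```python
import re
from typing import Optional, Tuple

# Hypothetical excerpt of `qconf -sconf` output. The values are
# illustrative samples, not the defaults of any particular release.
SAMPLE_CONF = """\
#global:
max_aj_instances             2000
max_aj_tasks                 75000
"""


def count_array_tasks(spec: str) -> int:
    """Count the tasks in a `qsub -t` range of the form n[-m[:s]]."""
    match = re.fullmatch(r"(\d+)(?:-(\d+)(?::(\d+))?)?", spec.strip())
    if match is None:
        raise ValueError(f"not a task range: {spec!r}")
    first = int(match.group(1))
    last = int(match.group(2)) if match.group(2) else first
    step = int(match.group(3)) if match.group(3) else 1
    if first < 1 or last < first or step < 1:
        raise ValueError(f"invalid task range: {spec!r}")
    # Task ids are first, first+step, ... up to and including last.
    return (last - first) // step + 1


def parse_conf_value(conf_text: str, name: str) -> Optional[int]:
    """Return the integer value of `name` from "name value" lines, or None."""
    for line in conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == name:
            return int(fields[1])
    return None


def fits_max_aj_tasks(spec: str, conf_text: str) -> Tuple[int, bool]:
    """Return (task count, whether the count is within max_aj_tasks)."""
    tasks = count_array_tasks(spec)
    limit = parse_conf_value(conf_text, "max_aj_tasks")
    # Assumption: a missing entry or a value of 0 means "no limit".
    if not limit:
        return tasks, True
    return tasks, tasks <= limit


if __name__ == "__main__":
    for spec in ("1-26000", "1-26000:2", "1-100000"):
        print(spec, fits_max_aj_tasks(spec, SAMPLE_CONF))
```

Against the sample values, Bernard's 26,000-task range fits, while a 100,000-task range would not. max_aj_instances is parsed the same way, but per sge_conf(5) it caps how many tasks of one array job run at the same time rather than the total number of tasks, so it does not limit how large an array job can be.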
