[GE users] problems with queueing and scheduling after upgrading to 6.2u5

reuti reuti at staff.uni-marburg.de
Fri Mar 19 11:16:46 GMT 2010


On 19.03.2010 at 11:58, Marco Donauer wrote:

> what kind of jobs do you submit?
> Are these long-running jobs, or do you permanently resubmit
> short-running jobs?

These jobs request "-l h_rt=864000" (i.e. 10 days) and usually go to the right queue (there are usually around 100 jobs waiting with this request). But once in a while, within a single scheduling cycle, a bunch of jobs end up in a queue that has an h_rt of 3600 set in its queue definition, and they get killed.

I checked the accounting file, where the command-line options are also recorded, and the user specified the right things for these jobs.
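
A check like this could be scripted roughly as follows. This is only a minimal sketch under stated assumptions, not what was actually run here: it assumes the colon-separated accounting layout with the queue name in the first field and the job number in the sixth, it assumes the recorded request appears somewhere in the line as "h_rt=<seconds>", and both the function name find_misplaced_jobs and the queue_h_rt mapping (queue name to the queue's h_rt in seconds) are hypothetical and have to be supplied by hand.

```python
import re

# Matches a recorded resource request such as "-l h_rt=864000".
H_RT_REQUEST = re.compile(r"h_rt=(\d+)")


def find_misplaced_jobs(lines, queue_h_rt):
    """Return (job_number, queue, requested_h_rt, queue_h_rt) tuples for
    jobs that ran in a queue whose h_rt is below the requested h_rt.

    Assumptions (hypothetical, verify against your own accounting file):
    field 1 is the queue name, field 6 is the job number, and the
    request is recorded in the line as "h_rt=<seconds>".
    """
    misplaced = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split(":")
        if len(fields) < 6:
            continue  # not a complete accounting record
        qname, job_number = fields[0], fields[5]
        match = H_RT_REQUEST.search(line)
        limit = queue_h_rt.get(qname)
        if match is None or limit is None:
            continue  # no h_rt request recorded, or queue limit unknown
        requested = int(match.group(1))
        if requested > limit:
            misplaced.append((job_number, qname, requested, limit))
    return misplaced
```

Called with the open accounting file and something like {"short.q": 3600} (the queue name is made up), it lists the jobs that asked for more run time than the queue they landed in allows.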


> I am thinking about submitting sleeper jobs for this.

I tried this on a test cluster with a bunch of jobs which have been sitting there for days, but so far it hasn't happened there.

-- Reuti


> Marco
> 
> 
> On 19.03.2010 at 11:52, reuti wrote:
>> Hi,
>> 
>> On 19.03.2010 at 11:39, dom wrote:
>> 
>> 
>>> Serge,
>>> 
>>> I'm trying to reproduce this issue and currently it works for me.
>>> It could be that I misunderstood your setup and I'm using a different setup.
>>> Could you send me some more information on this: how you set up your queues (e.g. send the qconf -sq output if possible), your complex list, and how you submit jobs?
>>> This would be very helpful.
>>> 
>> Yes, the issue I entered also can't be reproduced "on command". It happens once in a while and then it runs fine for days again. I set things up to get an email when this happens, but unfortunately the last time it happened was in the middle of the night, and there is no alarm clock connected. Let's wait for the next time...
>> 
>> -- Reuti
>> 
>> 
>> 
>>> Is your high queue subordinating med.q and low.q, or only med.q, which in turn subordinates low.q?
>>> 
>>> Marco
>>> 
>>> 
>>> On 18.03.2010 at 21:04, snosov wrote:
>>> 
>>>> BTW: do you mean the hard wall-clock limit of the job or the queue?
>>>> 
>>>> I meant the h_rt value that is specified for the whole queue.
>>>> 
>>>> 
>>>> 
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=249653
>> 
>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=249661



