[GE users] Is this a bug?

Aaron Turner aaron at cs.york.ac.uk
Wed Dec 10 14:30:07 GMT 2008


reuti wrote:
> On 09.12.2008 at 16:58, Aaron Turner wrote:
> 
>> reuti wrote:
>>> Then it will set h_vmem=0, not infinity; you can of course specify
>>> h_vmem=infinity. As one can expect, the application will crash with
>>> zero memory.
>> Is perhaps the failure mode of the application somehow dragging down the
>> queue? The error states on the queues can be cleared relatively easily,
>> but it does block new jobs until the point that this has been done.
> 
> What exactly is stated in the messages file
> ($SGE_ROOT/default/common/qmaster/messages)? Just "queue xy was put into
> error state because of job xy's failure"?
> 
> If I submit a job with h_vmem=0, the job is killed immediately by SGE,
> as its startup already uses too much memory.

It does kill the job, but seems to set the queue to the error state:

12/03/2008 21:21:47|worker|qmaster|W|job 70451.1 failed on host exechost general before job because: 12/03/2008 21:21:57 [0:15753]: can't set additional group id (uid=0, euid=0): Cannot allocate memory
12/03/2008 21:21:47|worker|qmaster|W|rescheduling job 70451.1
12/03/2008 21:21:47|worker|qmaster|E|queue long marked QERROR as result of job 70451's failure at host exechost
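
In case it helps anyone tracking down which jobs are knocking queues over, here is a rough sketch for pulling the QERROR records out of the messages file. It assumes each record sits on a single line in the pipe-separated layout seen in the excerpt above (timestamp|thread|component|level|message) and that the QERROR wording matches the line above exactly; both are guesses from this one excerpt, and the function name is made up for illustration.

```python
import re

# Assumed message wording, copied from the single QERROR line in the excerpt.
QERROR_RE = re.compile(
    r"queue (\S+) marked QERROR as result of job (\d+)'s failure at host (\S+)"
)


def find_qerror_events(lines):
    """Return (timestamp, queue, job_id, host) for each QERROR record."""
    events = []
    for line in lines:
        # Assumed layout: timestamp|thread|component|level|message
        fields = line.rstrip("\n").split("|", 4)
        if len(fields) != 5:
            continue  # not a record in the assumed layout, skip it
        timestamp, _thread, _component, level, message = fields
        match = QERROR_RE.search(message)
        if level == "E" and match:
            events.append((timestamp, *match.groups()))
    return events
```

Passing it the open messages file (or any iterable of lines) gives one tuple per queue that was put into the error state, which can then be matched against the "failed on host" warnings for the same job.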

Regards,

   Aaron Turner

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=92081
