[GE users] Moved to users list: Re: [GE issues] qsub never returns; though the jobs complete successfully

reuti reuti at staff.uni-marburg.de
Tue May 4 18:45:34 BST 2010


On 04.05.2010 at 19:25, Tatachar, Gopinath [Tech] wrote:

> Yes, the qsub is run with "-sync y", and yes, $SGE_ROOT/default/common is shared

Can you start an strace on the process ID of the running `qsub` while it's hanging? I can't find it in the archive, but IIRC there was a discussion where it turned out that `qsub -sync y` is clever enough to ask the new qmaster.
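
Attaching a trace to a hanging `qsub` could look roughly like the sketch below. This is only an illustration under a few assumptions: strace is installed on the submission host, you are allowed to attach to the process (same user or root), and the helper name `trace_qsub` and the default output path are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper (not part of Grid Engine): attach strace to one
# hanging qsub process and write the trace to a file.
# Usage: trace_qsub PID [OUTPUT_FILE]
trace_qsub() {
    pid=$1
    out=${2:-/tmp/qsub-strace.$pid.txt}   # assumed default location

    # Refuse anything that is not a plain numeric PID.
    case $pid in
        ''|*[!0-9]*)
            echo "usage: trace_qsub PID [OUTPUT_FILE]" >&2
            return 2
            ;;
    esac

    # -f follows threads and child processes, -tt prefixes each line with
    # a timestamp, -p attaches to the running PID, -o writes to a file.
    strace -f -tt -p "$pid" -o "$out"
}

# Example (not run automatically):
#   pgrep -u "$USER" -x qsub    # list the PIDs of your qsub processes
#   trace_qsub 12345            # stop with Ctrl-C after a minute or so
```

The interesting part of the output would be which host and port the process keeps trying to reach, i.e. whether it still talks to the old qmaster host or to the shadow host that took over.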

-- Reuti


> 
> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de] 
> Sent: Tuesday, May 04, 2010 1:20 PM
> To: users
> Cc: Tatachar, Gopinath [Tech]
> Subject: Moved to users list: Re: [GE issues] qsub never returns; though the jobs complete successfully
> 
> On 04.05.2010 at 15:50, gtatachar wrote:
> 
>> Our production qmaster process hung/died, and the failover worked after about 10 to 15 minutes: the qmaster process started on the shadow server.
>> 
>> As a result, our production batch, which had kicked off a number of jobs, never returned. While the jobs ran successfully, qsub never came back; it just sat there waiting. A lot of the qsub commands (launched via scripts in autosys) were left outstanding, still waiting.
>> 
>> Is there any way to get qsub to return, or do we need to kill them and let the autosys jobs go to failure?
> 
> Was the `qsub` used with the "-sync y" option?
> 
> Is $SGE_ROOT/default/common shared with the submission host?
> 
> -- Reuti
> 
> 
>> I would appreciate any information.
>> 
>> thanks
>> Gopi
>> 
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=36&dsMessageId=256093
>> 
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=256134



