[GE users] Moved to users list: Re: [GE issues] qsub never returns; though the jobs complete successfully

reuti reuti at staff.uni-marburg.de
Wed May 5 11:42:45 BST 2010


On 04.05.2010 at 19:25, gtatachar wrote:

> Yes, the qsub is run with "-sync y", and yes, $SGE_ROOT/default/common is shared.

Sure, it will still happen under strace. But does the strace hang, or do you see that qsub keeps trying to connect to the wrong machine?
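To make the "wrong machine" check concrete, here is a minimal sketch that tallies the connect() calls in a saved strace log. It assumes strace's usual text format for IPv4 connect() lines; the function name connect_targets and the file name qsub.trace are only illustrative, and this is not part of Grid Engine.

```python
import re
from collections import Counter

# Matches IPv4 connect() lines in strace's usual text format, e.g.
# connect(3, {sa_family=AF_INET, sin_port=htons(6444),
#   sin_addr=inet_addr("10.0.0.5")}, 16) = -1 ECONNREFUSED (Connection refused)
# (shown wrapped here; strace prints it on one line)
CONNECT_RE = re.compile(
    r'connect\(\d+, \{sa_family=AF_INET, '
    r'sin_port=htons\((?P<port>\d+)\), '
    r'sin_addr=inet_addr\("(?P<addr>[\d.]+)"\)\}, \d+\)\s*=\s*(?P<result>.*)'
)


def connect_targets(lines):
    """Count connect() attempts per (address, port, outcome).

    outcome is "ok" for a return value of 0, otherwise the errno name
    that strace printed (ECONNREFUSED, EINPROGRESS, ...).
    """
    counts = Counter()
    for line in lines:
        match = CONNECT_RE.search(line)
        if not match:
            continue  # not an IPv4 connect() line
        tokens = match.group("result").split()
        if tokens and tokens[0] == "0":
            outcome = "ok"
        elif len(tokens) > 1:
            outcome = tokens[1]  # errno name follows the -1
        else:
            outcome = "unknown"
        counts[(match.group("addr"), int(match.group("port")), outcome)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical log file written by strace; adjust the name as needed.
    with open("qsub.trace") as trace:
        for target, n in connect_targets(trace).most_common():
            print(n, target)
```

The log could be captured from a hanging qsub with something like `strace -f -e trace=network -o qsub.trace -p <PID>`. If nearly all attempts go to the address of the old master rather than the shadow host, that would point to qsub still using the old qmaster location.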

-- Reuti


> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de] 
> Sent: Tuesday, May 04, 2010 1:20 PM
> To: users
> Cc: Tatachar, Gopinath [Tech]
> Subject: Moved to users list: Re: [GE issues] qsub never returns; though the jobs complete successfully
> 
> On 04.05.2010 at 15:50, gtatachar wrote:
> 
>> Our production qmaster process hung/died, and the failover worked after about 10 to 15 minutes: the qmaster process started on the shadow server.
>> 
>> As a result, our production batch, which had kicked off a number of jobs, never returned. While the jobs ran successfully, qsub never came back and just sat there waiting. A lot of the qsub commands (launched via scripts in autosys) were left outstanding.
>> 
>> Is there any way to get qsub to return, or do we need to kill them and let the autosys jobs go to failure?
> 
> Was the `qsub` used with the "-sync y" option?
> 
> The $SGE_ROOT/default/common is shared with the submission host?
> 
> -- Reuti
> 
> 
>> I would appreciate any information.
>> 
>> thanks
>> Gopi
>> 
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=36&dsMessageId=256093
>> 
>> To unsubscribe from this discussion, e-mail: [issues-unsubscribe at gridengine.sunsource.net].
> 
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=256133
> 
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=256246



