[GE users] Moved to users list: Re: [GE issues] qsub never returns; though the jobs complete successfully

gtatachar gopinath.tatachar at gs.com
Tue May 4 18:25:41 BST 2010


Yes, the qsub was run with "-sync y", and yes, $SGE_ROOT/default/common is shared.

-----Original Message-----
From: Reuti [mailto:reuti at staff.uni-marburg.de] 
Sent: Tuesday, May 04, 2010 1:20 PM
To: users
Cc: Tatachar, Gopinath [Tech]
Subject: Moved to users list: Re: [GE issues] qsub never returns; though the jobs complete successfully

On 04.05.2010 at 15:50, gtatachar wrote:

> Our production qmaster process hung/died, and the failover worked after about 10 to 15 minutes: the qmaster process started on the shadow server.
> 
> As a result, our production batch, which had kicked off a number of jobs, never returned. While the jobs ran successfully, qsub never came back; it just sat there waiting. A lot of the qsub commands (launched via scripts in autosys) were left outstanding.
> 
> Is there any way to get qsub to return, or do we need to kill them and have the autosys jobs go to failure?

Was the `qsub` used with the "-sync y" option?

Is $SGE_ROOT/default/common shared with the submission host?

-- Reuti


> I would appreciate any information.
> 
> thanks
> Gopi
> 
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=36&dsMessageId=256093
> 
> To unsubscribe from this discussion, e-mail: [issues-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=256133

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


