[GE users] here is a strange one: submitting to a PE reliably takes out sge_schedd

reuti reuti at staff.uni-marburg.de
Mon Jan 18 22:08:16 GMT 2010


Hi,

On 18.01.2010 at 14:24, elauzier wrote:

> We had this problem show up with our cluster also in late 2009.   
> Chris was involved and witnessed the events.  Here is what we did  
> to look into the issue:
>
> 1.  submitted the job so that it would not dispatch until say 30  
> minutes later, making sure that the problem was not with dispatching.
> Indeed, this showed that it was during the submission process.
>
> 2.  Inspected the user's environment and had him completely log out  
> and then log back in again after cleaning up his env init scripts.
>
> The reason for (2) was that in the LSF world we have seen this also  
> and there can be a parsing issue.  I suspected something in the env  
> caused scheduler's parser to croak.
>
> Well, after we cleaned up the env and had the user log out and then  
> back in again, the problem went away.
>
> If the problem shows up again, then we will request an instrumented  
> binary for further debug...
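
One way to reproduce step 1 is to give the job a future start time, so that it is accepted at submission but cannot be dispatched yet. Below is a minimal sketch, assuming qsub's `-a date_time` option was the mechanism (the quoted text does not say how the delay was achieved; a hold would serve the same purpose). The helper name `delayed_qsub_command` and the script name `job.sh` are hypothetical, and the function only builds the command line, it does not run it.

```python
from datetime import datetime, timedelta


def delayed_qsub_command(script, delay_minutes=30, now=None):
    """Build a qsub command whose job is not eligible to start for a while.

    Assumption: the delay is expressed with `qsub -a`, which takes a
    timestamp in the form [[CC]YY]MMDDhhmm[.SS].
    """
    now = datetime.now() if now is None else now
    start = now + timedelta(minutes=delay_minutes)
    # CCYYMMDDhhmm, e.g. 201001182238
    return ["qsub", "-a", start.strftime("%Y%m%d%H%M"), script]


if __name__ == "__main__":
    # Print the command instead of executing it, so it can be reviewed first.
    print(" ".join(delayed_qsub_command("job.sh")))
```

If sge_schedd goes down right after such a submission, dispatching can be ruled out, since the job is not eligible to run yet.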

a) was this a qsub with anything special (e.g. -V or similar)?

b) were any environment variables set which could alter qsub's
behavior?

c) when the scheduler was re-enabled, was the job processed without
any issue and run at a later point in time? Did qacct also show proper
entries for the job afterwards?
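
For (a) and (b), the submitting shell's environment could be checked before and after the cleanup with something like the sketch below. It is only a rough heuristic: the function name `suspicious_env`, the length threshold, and the choice of what counts as "unusual" (multi-line values, control characters, very long values, anything named `SGE_*`) are my assumptions, not limits documented for qsub or sge_schedd.

```python
import os

# Assumed threshold, not an SGE limit: longer values are merely flagged.
MAX_LEN = 4096


def suspicious_env(env=None, max_len=MAX_LEN):
    """Return {name: [reasons]} for variables whose values look unusual."""
    env = os.environ if env is None else env
    findings = {}
    for name, value in env.items():
        reasons = []
        if "\n" in value or "\r" in value:
            reasons.append("multi-line value")
        # Control characters other than newline, carriage return and tab.
        if any(ord(c) < 32 and c not in "\n\r\t" for c in value):
            reasons.append("control character")
        if len(value) > max_len:
            reasons.append("very long value")
        # Listed for review only; such variables may change client behavior.
        if name.startswith("SGE_"):
            reasons.append("SGE_* variable")
        if reasons:
            findings[name] = reasons
    return findings


if __name__ == "__main__":
    for name, reasons in sorted(suspicious_env().items()):
        print(f"{name}: {', '.join(reasons)}")
```

Running it in the affected user's login shell and again in a clean one, then comparing the two listings, would narrow down which variable (if any) is involved, especially when the job is submitted with -V.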

-- Reuti


> Ed Lauzier
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=239522
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=239614

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
