[GE users] SGE 6 - queues entering error state

Reuti reuti at staff.uni-marburg.de
Thu Aug 10 20:04:31 BST 2006


On 10.08.2006 at 20:24, Bevan C. Bennett wrote:

>
>> can you please post your queue, sge and exechost configuration.
>
> Which parts of it?

The first few lines from the SGE configuration (qconf -sconf), where
the spool directories are defined. Do you also have a local
configuration for some of your hosts (qconf -sconfl)? And is a
prolog/epilog defined in any of them?
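
A minimal sketch of how this information could be collected in one go. It
assumes qconf is on the PATH and the usual SGE environment is set up; the
function names show_settings and report_all are made up for illustration
and are not part of SGE.

```shell
#!/bin/sh
# Print only the spool directory and prolog/epilog lines of one
# configuration. $1 is a host name; "global" is used when it is omitted.
show_settings() {
    qconf -sconf "${1:-global}" |
        grep -E '^(execd_spool_dir|prolog|epilog)[[:space:]]'
}

# Walk over the global configuration and every host that has a local
# configuration (as listed by qconf -sconfl).
report_all() {
    echo "== global =="
    show_settings global
    for host in $(qconf -sconfl); do
        echo "== $host =="
        show_settings "$host"
    done
}
```

Calling report_all on the qmaster or a submit host should show at a glance
whether any host overrides execd_spool_dir, prolog or epilog.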


>> The only thing I see from this is that the "pid" file doesn't belong
>> in /scratch/2313.1.all.q/pid, but in
>> /mnt/local/common/grid-test/default/spool/cobalt/active_jobs/2313.1/pid.
>
> I know... for all my correctly running jobs this is what happens.
> Could the user be accidentally setting some environment variable that
> points SGE to $TMPDIR instead of the spool directory?

Was it an interactive qlogin/qrsh, or a batch qsub/qrsh? Do you see  
this happen only on certain hosts? Is there any prolog/epilog, either  
global or for a queue/host?
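
For the queue side, a rough sketch along the same lines, again assuming
qconf is available; queue_hooks is a made-up name, not an SGE command.

```shell
#!/bin/sh
# For every cluster queue (qconf -sql), print the prolog, epilog and
# tmpdir lines of its configuration (qconf -sq).
queue_hooks() {
    for q in $(qconf -sql); do
        echo "== $q =="
        qconf -sq "$q" | grep -E '^(prolog|epilog|tmpdir)[[:space:]]'
    done
}
```

Running queue_hooks lists, per queue, whether a prolog or epilog is set
and which directory is used as the base for $TMPDIR.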

>
>> Just out of curiosity: which type of jobs are you running, to put
>> $TMPDIR on a shared space? Many jobs benefit from a local $TMPDIR.
>
> /scratch ($TMPDIR) is local to each system.

Sorry, this was a misunderstanding on my side, as you wrote "...is a
world writable directory". Your current implementation is fine, of
course.

-- Reuti
