[GE users] Job Failure Deletes Local Spool Directory

Reuti reuti at staff.uni-marburg.de
Wed Feb 23 13:07:00 GMT 2005


Do you have a different execd_spool_dir entry for each host? If not (and only
in that case!), you could edit the entry by hand in
$SGE_ROOT/default/common/configuration

My entries there are:

qmaster_spool_dir         /var/spool/sge/qmaster
execd_spool_dir           /var/spool/sge

Cheers - Reuti
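Below is a minimal shell sketch of the hand edit described above, for the case where no host has its own execd_spool_dir entry. It assumes the flat "key value" layout shown above. It also assumes the qmaster and execution daemons are already shut down, for example with "qconf -ke all" and "qconf -km" as in Dan's attempt. The function name set_execd_spool_dir is hypothetical and not part of SGE.

```shell
# Hypothetical helper: rewrite the execd_spool_dir value in a flat-file
# SGE configuration such as $SGE_ROOT/default/common/configuration.
# Assumes one "key  value" pair per line and that the cluster (qmaster
# and execds) is shut down while the file is edited.
set_execd_spool_dir() {
    conf_file=$1   # path to the global configuration file
    new_dir=$2     # new spool location, e.g. /var/spool/sge

    # Stop if the parameter is not present; do not guess where to add it.
    if ! grep -q '^execd_spool_dir[[:space:]]' "$conf_file"; then
        echo "execd_spool_dir not found in $conf_file" >&2
        return 1
    fi

    # Keep a backup, then swap only the value, preserving key and spacing.
    cp "$conf_file" "$conf_file.bak" || return 1
    sed "s|^\(execd_spool_dir[[:space:]][[:space:]]*\).*|\1$new_dir|" \
        "$conf_file.bak" > "$conf_file"
}
```

Usage, with the cluster down: set_execd_spool_dir "$SGE_ROOT/default/common/configuration" /var/spool/sge, then start the qmaster and execution daemons again. An entry in a host's local configuration still takes precedence over the global file. That is presumably why the hand edit only makes sense when no host has its own entry.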

Dan Gruhn wrote:
> Okay, that makes sense to me and I want to change my local spool 
> location.  I try to change the spool directory on one of my machines by 
> running:
> 
> qconf -mconf <host>
> 
> and I get the message:
> 
> "Changing parameter "execd_spool_dir" only supported in a shut-down 
> cluster."
> 
> So I run:
> 
> qconf -ke all
> qconf -mconf <host>
> 
> and I get the message:
> 
> "Changing parameter "execd_spool_dir" only supported in a shut-down 
> cluster."
> 
> So, I run:
> 
> qconf -ks
> qconf -km
> qconf -mconf <host>
> 
> and I get the message:
> 
> unable to contact qmaster using port 461 on host "<hostname>"
> 
> So, how do I change this?
> 
> Dan
> 
> On Tue, 2005-02-22 at 10:20, Reuti wrote:
> 
>>I usually create a directory /var/spool/sge and put the SGE stuff there. 
>>/var seems a good place for this. - Reuti
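The following is a small sketch of preparing such a directory on an execution host. The execution daemon recreated /tmp/sgespool by itself after a restart, so it presumably needs write access to the new location too. The optional owner argument stands in for the account the daemon runs as, and both it and the function name make_spool_dir are assumptions.

```shell
# Hypothetical helper: create a persistent local spool directory outside /tmp.
# The owner (the account the execution daemon runs as) is an assumption;
# the chown is skipped when no owner is given, e.g. when testing unprivileged.
make_spool_dir() {
    dir=$1           # e.g. /var/spool/sge
    owner=${2:-}     # optional owner for the directory

    mkdir -p "$dir" || return 1
    chmod 755 "$dir" || return 1
    if [ -n "$owner" ]; then
        chown "$owner" "$dir" || return 1
    fi
}
```

Run it as root on each execution host when an owner is passed, since only root can change ownership.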
>>
>>Dan Gruhn wrote:
>>> Interesting idea, but I don't see any cron jobs that do this.  As a 
>>> test, I have made a file in each /tmp dir to see if that file disappears 
>>> when this happens again.
>>> 
>>> Anyone have any other ideas?
>>> 
>>> Dan
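The canary test described above can be checked with a helper like the one below. The function name and the example paths are hypothetical. If the canary and the spool directory disappear together, a periodic /tmp cleaner is the likely culprit. If only the spool directory vanishes, suspicion falls back on SGE. One caveat: age-based cleaners usually remove only files that have not been accessed for some time, so a freshly created canary may survive for days even when such a job is active.

```shell
# Hypothetical helper: report whether a marker file in /tmp and the execd
# spool directory still exist. Returns non-zero if either one is missing.
check_tmp_canary() {
    canary=$1      # e.g. /tmp/sge_canary, created earlier with touch
    spool=$2       # e.g. /tmp/sgespool
    status=0

    if [ -e "$canary" ]; then
        echo "canary present: $canary"
    else
        echo "canary MISSING: $canary"
        status=1
    fi
    if [ -d "$spool" ]; then
        echo "spool dir present: $spool"
    else
        echo "spool dir MISSING: $spool"
        status=1
    fi
    return $status
}
```

For example, check_tmp_canary /tmp/sge_canary /tmp/sgespool on each host after the next queue error shows which of the two went missing.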
>>> 
>>> On Tue, 2005-02-22 at 09:00, Reuti wrote:
>>> 
>>>>Hi,
>>>>
>>>>Maybe it wasn't done by SGE: is there a cron job running on the machine
>>>>that cleans /tmp from time to time?
>>>>
>>>>Cheers - Reuti
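One way to follow up on this question is to search the cron configuration for anything that mentions /tmp. The default search paths below are assumptions for a typical Linux host, and the function name is hypothetical. On Red Hat-style systems, a daily tmpwatch script that deletes files in /tmp not accessed for several days has traditionally been a common source of such cleanups. It is easy to overlook when scanning the crontab itself.

```shell
# Hypothetical helper: print cron-related files whose contents mention /tmp.
# The default locations are assumptions for a typical Linux host; pass other
# files or directories as arguments to search those instead.
find_tmp_cron_jobs() {
    if [ "$#" -eq 0 ]; then
        set -- /etc/crontab /etc/cron.d /etc/cron.hourly /etc/cron.daily \
               /etc/cron.weekly /etc/cron.monthly /var/spool/cron
    fi
    # -r recurses into directories, -l prints only matching file names,
    # -s suppresses errors for missing or unreadable paths.
    grep -rls '/tmp' "$@"
}
```

Run it as root so that per-user crontabs under /var/spool/cron are readable.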
>>>>
>>>>
>>>>Dan Gruhn wrote:
>>>>> Greetings Everyone,
>>>>> 
>>>>> I am using Fedora Core 1 to run SGE 6.0u3 and have a strange failure mode.
>>>>> I get an administration email of the following:
>>>>> 
>>>>> Subject:  	"N1GE 6.0u3: Job-array task 6574.262 failed "
>>>>> 
>>>>> 
>>>>> Job 6574 caused action: Queue "low.q@class05-lx.group-w-inc.com" set to ERROR
>>>>>  User        = dgruhn
>>>>>  Queue       = low.q@class05-lx.group-w-inc.com
>>>>>  Host        = class05-lx.group-w-inc.com
>>>>>  Start Time  = <unknown>
>>>>>  End Time    = <unknown>
>>>>> failed assumedly before job:can't create directory active_jobs/6574.262: 
>>>>> No such file or directory
>>>>> 
>>>>> 
>>>>> 
>>>>> When I look on the host, I see that the execution daemon is running just 
>>>>> fine, but that my local spool directory (/tmp/sgespool in my case) is 
>>>>> completely gone without a trace.  There is no /tmp/execd error file or 
>>>>> anything.
>>>>> 
>>>>> These hosts are single processor, Pentium(R) 4 CPU 1.80GHz with 512 MB 
>>>>> of RAM.  They are the least capable in my set of hosts.  The error 
>>>>> doesn't happen a lot, but it has happened enough that I'd like to solve 
>>>>> it if possible.  Of course, SGE recovers the job and runs it on another 
>>>>> host, but that queue is out of action until I shut down the execution 
>>>>> daemon and bring it back up.  It then recreates the local spool dir and 
>>>>> all is well.
>>>>> 
>>>>> Has anyone else experienced this, or does anyone have any idea what may
>>>>> be happening?  That is, what in SGE would delete the entire local spool
>>>>> directory tree but leave the execution daemon running?
>>>>> 
>>>>> Any help will be greatly appreciated.
>>>>> 
>>>>> Dan
>>>>> 
>>>>> 
>>>>
>>>>
>>>>---------------------------------------------------------------------
>>>>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>>
>>

