[GE users] Re: Script submit error

reuti reuti at staff.uni-marburg.de
Sun Nov 1 19:43:19 GMT 2009


On 30.10.2009 at 02:36, jsadino wrote:

> Thank you craffi for the excellent suggestions.  First I tried  
> restarting, and that seems to have erased my act_qmaster file.  I  
> made a new one with the line:
> cluster.mrilab.net
> Still didn't solve my problem though.
> The qstat -f command looks fine:
> [FSuser@cluster common]$ qstat -f
> queuename                      qtype used/tot. load_avg arch          states
> ----------------------------------------------------------------------------
> all.q@compute-0-0.local        BIP   0/2       2.00     lx26-amd64
> ----------------------------------------------------------------------------
> all.q@compute-0-2.local        BIP   0/2       2.00     lx26-amd64
> ----------------------------------------------------------------------------
> all.q@compute-0-3.local        BIP   0/2       2.00     lx26-amd64
> ----------------------------------------------------------------------------
> all.q@compute-0-4.local        BIP   0/2       -NA-     lx26-amd64    au
> ----------------------------------------------------------------------------
> all.q@compute-0-7.local        BIP   0/8       8.00     lx26-amd64
> ----------------------------------------------------------------------------
> long.q@compute-0-0.local       BIP   0/2       2.00     lx26-amd64
> ----------------------------------------------------------------------------
> long.q@compute-0-2.local       BIP   0/2       2.00     lx26-amd64
> ----------------------------------------------------------------------------
> long.q@compute-0-3.local       BIP   0/2       2.00     lx26-amd64
> ----------------------------------------------------------------------------
> long.q@compute-0-4.local       BIP   0/2       -NA-     lx26-amd64    au
> ----------------------------------------------------------------------------
> long.q@compute-0-7.local       BIP   0/8       8.00     lx26-amd64
> ----------------------------------------------------------------------------
> short.q@compute-0-0.local      BIP   0/2       2.00     lx26-amd64
> ----------------------------------------------------------------------------
> short.q@compute-0-2.local      BIP   0/2       2.00     lx26-amd64
> ----------------------------------------------------------------------------
> short.q@compute-0-3.local      BIP   0/2       2.00     lx26-amd64
> ----------------------------------------------------------------------------
> short.q@compute-0-4.local      BIP   0/2       -NA-     lx26-amd64    au
> ----------------------------------------------------------------------------
> short.q@compute-0-7.local      BIP   0/8       8.00     lx26-amd64
> ----------------------------------------------------------------------------
> verylong.q@compute-0-0.local   BIP   0/2       2.00     lx26-amd64
> ----------------------------------------------------------------------------
> verylong.q@compute-0-2.local   BIP   0/2       2.00     lx26-amd64
> ----------------------------------------------------------------------------
> verylong.q@compute-0-3.local   BIP   0/2       2.00     lx26-amd64
> ----------------------------------------------------------------------------
> verylong.q@compute-0-4.local   BIP   0/2       -NA-     lx26-amd64    au
> ----------------------------------------------------------------------------
> verylong.q@compute-0-7.local   BIP   0/8       8.00     lx26-amd64
> ----------------------------------------------------------------------------
> veryshort.q@compute-0-0.local  BIP   0/2       2.00     lx26-amd64
> ----------------------------------------------------------------------------
> veryshort.q@compute-0-2.local  BIP   0/2       2.00     lx26-amd64
> ----------------------------------------------------------------------------
> veryshort.q@compute-0-3.local  BIP   0/2       2.00     lx26-amd64
> ----------------------------------------------------------------------------
> veryshort.q@compute-0-4.local  BIP   0/2       -NA-     lx26-amd64    au
> ----------------------------------------------------------------------------
> veryshort.q@compute-0-7.local  BIP   0/8       8.00     lx26-amd64
>
>
> and qsub resolves to the expected location:
> [FSuser@cluster common]$ which qsub
> /opt/gridengine/bin/lx26-amd64/qsub
>
> The qmaster/messages file didn't have any entries for the past
> month.  What was there looks fine.  I know it was working
> successfully on Oct. 1:
> 09/23/2009 12:26:18|qmaster|cluster|E|commlib error: got read error (closing "cluster.mrilab.net/qmon/17177")
> 10/01/2009 15:18:38|qmaster|cluster|E|denied: "1segstatssubmit.sh" is not a valid job name (job cannot start with a digit)
> 10/01/2009 16:01:20|qmaster|cluster|W|job 177755.1 failed on host compute-0-0.local assumedly after job because: job 177755.1 died through signal KILL (9)
> 10/01/2009 16:44:07|qmaster|cluster|W|FSuser - you have no permission to modify queue "all.q@compute-0-4.local"
> 10/28/2009 12:28:01|qmaster|cluster|E|denied: "10.1.1.13" is not a valid job name (job cannot start with a digit)
> 10/28/2009 12:28:50|qmaster|cluster|E|denied: "10.1.1.13" is not a valid job name (job cannot start with a digit)
> 10/28/2009 13:15:24|qmaster|cluster|W|rule "default rule (spool dir)" in spooling context "classic spooling" failed writing an object
> 10/28/2009 13:18:04|qmaster|cluster|W|rule "default rule (spool dir)" in spooling context "classic spooling" failed writing an object
> 10/28/2009 13:18:31|qmaster|cluster|W|rule "default rule (spool dir)" in spooling context "classic spooling" failed writing an object
> 10/28/2009 13:18:36|qmaster|cluster|W|rule "default rule (spool dir)" in spooling context "classic spooling" failed writing an object
> 10/28/2009 13:18:44|qmaster|cluster|W|rule "default rule (spool dir)" in spooling context "classic spooling" failed writing an object
>
> Sorry, I don't know what qrsh hostname really does, but it says ok.
> [FSuser@cluster stats]$ qrsh hostname
> ok

So, the first part of the error message came from SGE ("Unable to
run job:"), but this seems to have a different source. Is there any JSV
(job submission verifier) defined?
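
A JSV can be configured in more than one place, so it may be worth looking in each. The following is only a sketch, assuming a Grid Engine release that supports JSV at all (6.2u2 or later, as far as I recall) and the usual SGE_ROOT/SGE_CELL layout. The function names jsv_from_conf and jsv_from_requests are made up for illustration and are not part of SGE.

```shell
# Hypothetical helper: read a configuration dump on stdin (for example the
# output of `qconf -sconf`) and print the jsv_url line if it is set to
# something other than "none" (compared case-insensitively).
jsv_from_conf() {
    awk '$1 == "jsv_url" && tolower($2) != "none" { print }'
}

# Hypothetical helper: print every line mentioning -jsv in the given
# default-request files, skipping files that do not exist.
jsv_from_requests() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            grep -H -- '-jsv' "$f" || true
        fi
    done
    return 0
}

# Intended use on the submit host (assumed paths, not executed here):
#   qconf -sconf | jsv_from_conf
#   jsv_from_requests "$SGE_ROOT/$SGE_CELL/common/sge_request" "$HOME/.sge_request"
```

If neither call prints anything, no JSV was found in those places; a -jsv option given directly on the qsub command line would not show up here.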


> [FSuser@cluster stats]$
> and qsub ./submittmp.sh still gives the same error.
>
> Don't know if it matters, but that server has two network cards,
> one for remote access and one to connect to our RAID.
> I moved my /tmp folder around recently because my OS drive was getting
> full.  Could that have anything to do with it?

Unless the spool directory of SGE is on /tmp, it shouldn't matter.
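
One way to check where the spool directories live, sketched under the assumption of a standard installation: the qmaster spool directory is recorded as qmaster_spool_dir in $SGE_ROOT/$SGE_CELL/common/bootstrap, and the execd spool directory as execd_spool_dir in the output of `qconf -sconf`. The helper names setting_value and is_under_tmp are made up for illustration.

```shell
# Hypothetical helper: print the value of a "key value" setting read from
# stdin, e.g. qmaster_spool_dir from the bootstrap file.
setting_value() {
    awk -v key="$1" '$1 == key { print $2 }'
}

# Hypothetical helper: succeed if the given path is /tmp or lies below it.
is_under_tmp() {
    case "$1" in
        /tmp|/tmp/*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Intended use on the qmaster host (assumed paths, not executed here):
#   dir=$(setting_value qmaster_spool_dir < "$SGE_ROOT/$SGE_CELL/common/bootstrap")
#   is_under_tmp "$dir" && echo "qmaster spool directory is on /tmp: $dir"
#   dir=$(qconf -sconf | setting_value execd_spool_dir)
#   is_under_tmp "$dir" && echo "execd spool directory is on /tmp: $dir"
```

This is a purely textual check; a spool path that reaches /tmp through a symlink would not be noticed.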

-- Reuti


> Any other ideas would be very helpful.  Thank you in advance!
> Jeff Sadino
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=224112
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=224487



