[GE users] Few newbie questions

Vijay Avarachen vavarachen at gmail.com
Wed Mar 16 16:23:26 GMT 2005


> You should look at the log files in:
> 
> $SGE_ROOT/$SGE_CELL/spool/qmaster/messages
> 
> $SGE_ROOT/$SGE_CELL/spool/<node>/messages
> 
Performing a qstat -j gives me the following:
[root@spduslisclust01 root]# qstat -j
scheduling info:            queue instance "all.q@node1.na.net.dana.com" dropped because it is temporarily not available
                            queue instance "all.q@node2.na.net.dana.com" dropped because it is temporarily not available
                            queue instance "all.q@node3.na.net.dana.com" dropped because it is temporarily not available

Also performing a qstat -explain acAE gives me:
[root@spduslisclust01 root]# qstat -explain acAE
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@node1.na.net.dana.com    BIP   0/2       0.00     lx24-x86      E
        queue all.q marked QERROR as result of job 2's failure at host node1.na.net.dana.com
----------------------------------------------------------------------------
all.q@node2.na.net.dana.com    BIP   0/2       0.00     lx24-x86      E
        queue all.q marked QERROR as result of job 2's failure at host node2.na.net.dana.com
----------------------------------------------------------------------------
all.q@node3.na.net.dana.com    BIP   0/2       0.00     lx24-x86      E
        queue all.q marked QERROR as result of job 12's failure at host node3.na.net.dana.com
----------------------------------------------------------------------------
all.q@node4.na.net.dana.com    BIP   0/2       0.00     lx24-x86
----------------------------------------------------------------------------
all.q@spduslisclust01          BIP   0/2       0.62     lx24-x86

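It seems that ever since job 2 failed (job 12 on node3), the queue
instances on those nodes have been stuck in the error state (E), so the
scheduler no longer considers them.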
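
For anyone wanting to pull the affected queue instances out of such a
listing, here is a minimal sketch. It assumes the column layout of the
qstat output shown above, with queue instances written as queue@host and
the states column left empty for healthy instances. The function name
errored_queues is made up for illustration.

```shell
#!/bin/sh
# Print the queue instances whose states column contains E (error).
# Reads qstat-style output on stdin. A line counts as a queue line when
# its third field looks like a slot count such as 0/2.
errored_queues() {
    awk 'NF >= 6 && $3 ~ /^[0-9]+(\/[0-9]+)+$/ && $6 ~ /E/ { print $1 }'
}

# Possible use on a live cluster (an assumption, not run here):
#   qstat -f | errored_queues
# Once the underlying problem is fixed, the error state can be cleared
# per queue instance with qmod -c.
```

Clearing the state only helps once the cause of the job failure is
resolved; otherwise the next job sent to that host will likely put the
queue instance back into E.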

I must be doing something wrong with the install_execd script on
nodes 1-3, because I noticed that it did not create the spool and
common directories under $SGE_ROOT/$SGE_CELL.  On node4, however,
everything looks good.
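
A quick way to compare the nodes is to check each one for the two
directories. This is only a sketch: the function name missing_cell_dirs
is hypothetical, and it assumes both directories are expected directly
under $SGE_ROOT/$SGE_CELL on every host. Depending on how the execd spool
directory is configured, the spool may legitimately live elsewhere.

```shell
#!/bin/sh
# Print each expected cell subdirectory that does not exist.
# Usage: missing_cell_dirs <sge_root> <cell>
missing_cell_dirs() {
    for d in common spool; do
        [ -d "$1/$2/$d" ] || echo "$1/$2/$d"
    done
}

# Possible use on each execution host (not run here):
#   missing_cell_dirs "$SGE_ROOT" "$SGE_CELL"
```

No output means both directories are present on that host.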
