[GE users] qmaster dies

Kirk Patton kpatton at transmeta.com
Wed Jun 2 23:40:52 BST 2004


Hello,

I am having a strange issue.  The qmaster daemon on my master host died.
The log file says:
Wed Jun  2 15:23:28 2004|qmaster|lsf-k8|E|cant open file users/.teesea3d: No space left on device

However, I cannot find any full partitions on that host:
[root@lsf-k8 users]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/hda2             6.0G  4.1G  1.6G  73% /
/dev/hda1             122M   25M   91M  22% /boot
/dev/hda5             176G   34M  167G   1% /export/home
none                  2.0G     0  2.0G   0% /dev/shm
none                  2.0G   72K  2.0G   1% /tmp
usrlocal-fs:/fs/usrlocal/i386-linux-libc6
                       22G   17G  5.4G  76% /transmeta/i386-linux-libc6
mis-fs:/fs/mis/project-mis
                       17G   16G  874M  95% /home/mis
lsf-fs:/fs/lsf/transmeta-lsf4.0.1
                       11G  2.8G  7.6G  27% /transmeta/lsf4.0.1
cerise:/vol/vol1/sge/sge_5.3p5
                       10G  141M  9.9G   2% /transmeta/sge
eng3-fs:/fs/eng3/home/kpatton
                      376G  287G   86G  78% /home/kpatton
cad-fs:/fs/cad/transmeta-cad
                      108G  105G  2.9G  98% /transmeta/cad

When I start the daemons, I get:
Reading in users:
        User "chris".
        User "gsmith".
        User "jamesd".
        User "kpatton".
        User "teesec60d".
        User "teeser10".
        User "teeser9".
        User "teeser8".
        User "teeser11".
        User "teesea3d".
        User "teeseastro".
        User "teesef4ad".
        User "teeseaat".
        User "teesef4in".
removing reference to no longer existing job 182722 of user "teesea3d"
error: cant open file users/.teesea3d: No space left on device
   starting sge_schedd
error: getting configuration: unable to contact qmaster via "" commd - qmaster not enrolled at commd
error: can't get configuration from qmaster -- backgrounding
   starting sge_shadowd

The files it is complaining about reside on an NFS-shared directory, which has plenty of space:
cerise:/vol/vol1/sge/sge_5.3p5
                       10G  141M  9.9G   2% /transmeta/sge
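
The df -h output above only covers block usage. "No space left on device" can also be returned when a filesystem runs out of inodes or when a quota is hit, so one further check is to look at the inode counters for the same mount. The following is a minimal sketch, not a confirmed diagnosis: the helper name space_report is made up for illustration, the path to inspect is passed in by the caller, and some NFS servers report inode counts as 0, in which case the numbers say nothing. It does not look at quotas.

```python
import os
import sys


def space_report(path):
    """Return free-space and inode counters for the filesystem holding `path`.

    Hypothetical helper for illustration; it only wraps os.statvfs().
    """
    st = os.statvfs(path)
    return {
        # Bytes available to unprivileged processes.
        "free_bytes": st.f_bavail * st.f_frsize,
        # Total and available inodes; may be 0 on some NFS mounts.
        "total_inodes": st.f_files,
        "free_inodes": st.f_favail,
    }


if __name__ == "__main__":
    # Pass the directory to inspect as the first argument; defaults to ".".
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for key, value in space_report(target).items():
        print(f"{key}: {value}")
```

Run against the directory that holds the users/ files, a free_inodes value of 0 alongside a non-zero total_inodes would point at inode exhaustion rather than a full disk.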

Does anyone have an idea how I can track down what is wrong?

Kirk
-- 
Kirk Patton
Unix Administrator
Transmeta Inc.
Tel. 408 919-3055
