[GE users] specific host, specific user

Chris Dagdigian dag at sonsorol.org
Thu Oct 30 14:39:15 GMT 2008


If the job runs successfully on all other hosts I'd look first at the  
host where the job fails to see "what is different" about it.

Common reasons could be:

- different file system permissions on that host
- different UID/GID mappings (NIS or LDAP issue, or /etc/passwd and
/etc/group are out of date)
- setuid or squash options set on filesystems, SELinux, etc.
- user does not exist on that host (that would trigger a queue  
instance E state though)
- missing application dependencies (libraries, modules)
- can't read or write to the location where the input and output files  
are meant to go
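
As a rough way to compare hosts on the identity and permission points
above, here is a minimal Python sketch. It is not part of SGE; the
function names identity_report and access_report are made up for this
example. The idea is to run it as the affected user on a working host
and on the failing host, then compare the results by eye.

```python
import grp
import os
import pwd
import socket


def identity_report(username):
    """Return how this host resolves `username`, or None if it is unknown here."""
    try:
        entry = pwd.getpwnam(username)
    except KeyError:
        # The user cannot be resolved on this host (local files, NIS, LDAP...).
        return None
    group_names = []
    for gid in os.getgrouplist(username, entry.pw_gid):
        try:
            group_names.append(grp.getgrgid(gid).gr_name)
        except KeyError:
            # A GID with no name is itself a hint that mappings differ.
            group_names.append(str(gid))
    return {
        "host": socket.gethostname(),
        "user": username,
        "uid": entry.pw_uid,
        "gid": entry.pw_gid,
        "groups": sorted(group_names),
    }


def access_report(path):
    """Report whether the calling user can read and write `path` on this host."""
    return {
        "path": path,
        "exists": os.path.exists(path),
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
    }
```

Calling identity_report with the user's login name and access_report
with the job's input and output directories on each host, and printing
the results, should show a UID/GID or permission mismatch if there is
one. Note that os.access checks the real UID of the caller, so the
script needs to run as the affected user, not as root.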

The best debug information is the standard output and standard error
from the job itself. If the job produces no output, look in the execd
messages file in the spool directory for that particular host to see
what may be wrong. It may also help to check /var/log/messages or its
equivalent, and especially to check in "/tmp", as that is the SGE
panic log location of last resort.
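
A minimal Python sketch of that log check follows. The helper names
(candidate_logs, matching_lines, recent_files) are made up for this
example, and the spool path assumes the common
$SGE_ROOT/<cell>/spool/<host>/messages layout; your execd spool
directory may be configured elsewhere, so adjust as needed.

```python
import os
import time


def candidate_logs(sge_root, cell, host):
    """Build the log paths to inspect (assumed default layout, see above)."""
    return [
        os.path.join(sge_root, cell, "spool", host, "messages"),
        "/var/log/messages",
    ]


def matching_lines(path, needle, max_lines=50):
    """Return the last `max_lines` lines of `path` that contain `needle`."""
    hits = []
    try:
        with open(path, "r", errors="replace") as handle:
            for line in handle:
                if needle in line:
                    hits.append(line.rstrip("\n"))
    except OSError as exc:
        # Unreadable or missing logs are reported instead of raising.
        return ["<could not read %s: %s>" % (path, exc)]
    return hits[-max_lines:]


def recent_files(directory, within_seconds):
    """List regular files in `directory` modified in the last `within_seconds`."""
    cutoff = time.time() - within_seconds
    found = []
    for name in sorted(os.listdir(directory)):
        full = os.path.join(directory, name)
        try:
            if os.path.isfile(full) and os.path.getmtime(full) >= cutoff:
                found.append(full)
        except OSError:
            continue  # file vanished or is not accessible
    return found
```

Searching each candidate log for the job ID or the user name, and
listing files in /tmp modified around the time the job hung, narrows
down where to read next.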

-Chris



On Oct 30, 2008, at 10:26 AM, Paolo Supino wrote:

> Hi
>
> I'm running a small grid (1 master + 2 compute nodes) with SGE 6.2
> and I'm experiencing the following issue: whenever one specific
> user submits a job, any job sent to one specific host (the same host
> every time) hangs and never finishes. What do I need to look for in
> order to find where the problem is and resolve it?
>
> --
> TIA
> Paolo

