[GE users] only 1 set of Qs will run
harry.mangalam at uci.edu
Thu Nov 27 03:38:17 GMT 2008
I have 2 subclusters, one AMD64, one i32, both running CentOS5.2, both
under control of one SGE 6.2, which is slowly starting to behave.
The i32 nodes are running in a private net, along with the qmaster,
which has a public and private interface.
The AMD64 nodes are running 'remotely' on public IP #s across campus.
Both groups show up correctly on a qhost query.
Both sets of nodes can be passwordlessly ssh'ed into from the login node,
which has both private and public interfaces.
Because of the arch & geographic differences, I've set up different Qs
to feed each subcluster (xxx_i32, xxx_a64)
After a few hiccups, the private net nodes are running both
interactive and batch jobs correctly after being submitted from the
login node, but the remote AMD64 nodes are still refusing to execute jobs.
For example, trying to log into an a64 Q that has been defined to be
----- example start -----
$ qrsh -verbose -q int_a64
local configuration bduc-login.nacs.uci.edu not defined - using global
Your job 140 ("QRLOGIN") has been submitted
waiting for interactive job to be scheduled ...timeout (3 s) expired
while waiting on socket fd 4
Your "qrsh" request could not be scheduled, try again later.
----- example end -----
A qsub of simple.sh to one of the a64 Qs is held in 'qw'
or 'Pending' status until killed.
If I use qmon and click on the job and then the "why?" button, it shows:
scheduling info: (Collecting of scheduling job information is turned
It also USED to show this error:
Error for job 108: can't create directory active_jobs/108.1: Stale NFS
but after I restarted the sge_execd on the nodes, it no longer shows
that error, just the one noted above.
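If the "scheduling info" message quoted above means that the scheduler is not recording why jobs stay pending, one way to check is to look at the schedd_job_info setting in the scheduler configuration. The following is only a sketch, not a confirmed diagnosis: the helper name schedd_job_info_value is made up for illustration, and it assumes `qconf -ssconf` prints one "name value" pair per line.

```shell
# Hypothetical helper: read scheduler configuration text (as printed by
# `qconf -ssconf`) on stdin and print the value of schedd_job_info,
# or "unset" if no such line is present.
schedd_job_info_value() {
    awk '$1 == "schedd_job_info" { print $2; found = 1 }
         END { if (!found) print "unset" }'
}

# Usage on a host with the SGE admin tools in PATH:
#   qconf -ssconf | schedd_job_info_value
# If this prints "false", an admin could run `qconf -msconf`, set
# schedd_job_info to true, resubmit the job, and then look at the
# "scheduling info" section of `qstat -j <job_id>`.
```

With the setting enabled, `qstat -j <job_id>` should list a reason per queue instance for why the job could not be dispatched, which is usually more specific than the qmon message.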
The differences between the Q definitions of the working i32 Qs and
the nonworking a64 Qs are minimal:
$ qconf -sq int_i32 >int_i32.q_config
$ qconf -sq int_a64 >int_a64.q_config
$ diff int_a64.q_config int_i32.q_config
< qname int_a64
< hostlist @int_a64
> qname int_i32
> hostlist @int_i32
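To double-check that nothing else differs, the comparison can be repeated with the two expected attributes (qname and hostlist) filtered out. This is a minimal sketch: the function name diff_queue_configs is hypothetical, and it assumes the files were saved with `qconf -sq` as above, one attribute per line.

```shell
# Hypothetical helper: compare two saved queue configs while ignoring the
# attributes that are expected to differ (qname, hostlist).
# Prints any remaining differences; exit status 0 means none were found.
diff_queue_configs() {
    a_filtered=$(mktemp) || return 2
    b_filtered=$(mktemp) || return 2
    # Drop the qname and hostlist lines from each file before comparing.
    grep -vE '^(qname|hostlist)[[:space:]]' "$1" > "$a_filtered"
    grep -vE '^(qname|hostlist)[[:space:]]' "$2" > "$b_filtered"
    status=0
    diff "$a_filtered" "$b_filtered" || status=$?
    rm -f "$a_filtered" "$b_filtered"
    return "$status"
}

# Usage with the files saved above:
#   diff_queue_configs int_a64.q_config int_i32.q_config && echo "no other differences"
```

Since the queue definitions differ only in their host lists, the host groups themselves (`qconf -shgrp @int_a64` and `qconf -shgrp @int_i32`) are a natural next thing to compare.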
I'm missing something but don't know what...
Harry Mangalam - Research Computing, NACS, E2148, Engineering Gateway,
UC Irvine 92697 949 824-0084(o), 949 285-4487(c)
Good judgment comes from experience;
Experience comes from bad judgment. [F. Brooks.]