[GE users] Protocol error, closed connection when using qrsh for hpux 11i

Nir Dvir nir at chipx.co.il
Mon Jun 7 10:14:58 BST 2004

When I try to use qrsh -l hostname=some-hpux-hostname I receive an
immediate error:


Protocol error, some-hpux-hostname closed connection


This suddenly started happening on all my HP-UX systems, which run
HP-UX 11i and Grid Engine 5.3p2.


None of the hosts reported anything in the spool/hostname/messages, nor
in the qmaster/messages.


I installed a new host and got the same error, but this one did record the
following.


In the qmaster messages:

Mon Jun  7 11:24:00 2004|qmaster|endor|I|starting up 5.3p2 (sgeee)

Mon Jun  7 11:24:34 2004|qmaster|endor|W|job 26099.1 failed on host
jabba assumedly after job because: job 26099.1 died through signal KILL


In the host's messages:

Mon Jun  7 11:18:13 2004|execd|jabba|I|starting up 5.3p2 (sgeee)

Mon Jun  7 11:19:33 2004|execd|jabba|W|can't receive request: WRITE

Mon Jun  7 11:40:57 2004|execd|jabba|W|can't receive request: READ ERROR

Mon Jun  7 11:42:11 2004|execd|jabba|W|can't receive request: READ ERROR
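
For anyone who wants to pull only the warnings out of messages files like
these, here is a minimal sketch. It assumes the pipe-separated layout seen
in the lines above (timestamp|daemon|host|level|text). The function names
are hypothetical and are not part of Grid Engine.

```python
from collections import Counter


def parse_messages_line(line):
    """Split one 'timestamp|daemon|host|level|text' line into a dict.

    Returns None when the line does not have five pipe-separated fields
    (assumed layout, inferred from the log excerpts in this post).
    """
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None
    timestamp, daemon, host, level, text = parts
    return {
        "timestamp": timestamp,
        "daemon": daemon,
        "host": host,
        "level": level,
        "text": text,
    }


def count_warnings(lines):
    """Count warning-level ('W') entries per (daemon, host, text)."""
    counts = Counter()
    for line in lines:
        record = parse_messages_line(line)
        if record is not None and record["level"] == "W":
            counts[(record["daemon"], record["host"], record["text"])] += 1
    return counts


# Sample input copied from the execd messages quoted above.
sample = [
    "Mon Jun  7 11:18:13 2004|execd|jabba|I|starting up 5.3p2 (sgeee)",
    "Mon Jun  7 11:19:33 2004|execd|jabba|W|can't receive request: WRITE",
    "Mon Jun  7 11:40:57 2004|execd|jabba|W|can't receive request: READ ERROR",
    "Mon Jun  7 11:42:11 2004|execd|jabba|W|can't receive request: READ ERROR",
]

for (daemon, host, text), n in count_warnings(sample).items():
    print(n, daemon, host, text)
```

Running the same function over the messages files of a working Solaris or
Linux host would show whether the READ ERROR warnings appear only on the
HP-UX hosts.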


When I use qrsh -l hostname=some-solaris-hostname, it works just fine,
and the same is true for all my Linux hosts.


qsub -l hostname=some-hpux-hostname -cwd ./worker.sh works on all my
HP-UX hosts, so I guess the problem has something to do with qrsh.


rsh some-hpux-hostname ~/worker.sh works too.


I searched the users' archive and found similar errors that were related
to service ports and host resolution, but found nothing there that helped.


Does anyone have any idea why this could suddenly happen, or better yet,
pointers for resolving it?






