[GE users] qrsh in MPICH PE problem

Alessandro Federico alessandro.federico at caspur.it
Wed Nov 24 15:59:25 GMT 2004



Running an MPICH job on two nodes (dual Opteron) with the
rsh-wrapper (which uses qrsh), I get the following errors.

The qmaster logs the messages:

Wed Nov 24 12:48:27 2004|qmaster|poseidon|I|task 1.slacs16 at slacs16.caspur.it of job 3460.1 died through signal HUP
Wed Nov 24 12:48:27 2004|qmaster|poseidon|E|task 1.slacs16 of job 3460 failed - killing job
Wed Nov 24 12:48:33 2004|qmaster|poseidon|I|task 1.slacs01 at slacs01.caspur.it of job 3460.1 died through signal KILL
Wed Nov 24 12:48:33 2004|qmaster|poseidon|W|job 3460.1 failed on host slacs01.caspur.it assumedly after job because: job 3460.1 died through signal KILL (9)
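Each log line above appears to be pipe-delimited: timestamp, daemon, host, severity letter, message. When comparing qmaster and execd logs across several nodes, it can help to split the lines into fields and keep only the warnings and errors. The following is a minimal sketch under that assumption; the field names, `LogEntry`, and `parse_line` are hypothetical helpers, not part of Grid Engine, and the severity letters are only the ones visible in these lines (I, W, E).

```python
from typing import List, NamedTuple


class LogEntry(NamedTuple):
    # Field names are assumed from the layout of the lines quoted above.
    timestamp: str
    daemon: str
    host: str
    severity: str  # letters seen above: I, W, E
    message: str


def parse_line(line: str) -> LogEntry:
    # Split on the first four '|' only, so a '|' inside the message is kept.
    timestamp, daemon, host, severity, message = line.rstrip("\n").split("|", 4)
    return LogEntry(timestamp, daemon, host, severity, message)


def problems(lines: List[str]) -> List[LogEntry]:
    # Keep only warning and error entries.
    return [e for e in map(parse_line, lines) if e.severity in ("W", "E")]


SAMPLE = [
    "Wed Nov 24 12:48:27 2004|qmaster|poseidon|I|task 1.slacs16 at slacs16.caspur.it of job 3460.1 died through signal HUP",
    "Wed Nov 24 12:48:27 2004|qmaster|poseidon|E|task 1.slacs16 of job 3460 failed - killing job",
    "Wed Nov 24 12:48:33 2004|qmaster|poseidon|I|task 1.slacs01 at slacs01.caspur.it of job 3460.1 died through signal KILL",
    "Wed Nov 24 12:48:33 2004|qmaster|poseidon|W|job 3460.1 failed on host slacs01.caspur.it assumedly after job because: job 3460.1 died through signal KILL (9)",
]

if __name__ == "__main__":
    for entry in problems(SAMPLE):
        print(entry.timestamp, entry.daemon, entry.severity, entry.message)
```

Feeding the execd log from each node through the same function makes it easier to line up the qmaster's "died through signal" entries with what the execution host reported at the same time.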

The node slacs16 logs:

Wed Nov 24 12:48:29 2004|execd|slacs16|E|reaping job "3460" ptf complains: Job does not exist

but the directory 

If I run the same job on one node everything is OK.




More information about the gridengine-users mailing list