[GE users] puzzling MPICH behaviour with GE 5.3

Reuti reuti at staff.uni-marburg.de
Mon Aug 23 15:18:39 BST 2004


Two things I would look into:

> compute-0-1
> compute-0-1
> compute-0-3
> Process 1 of 3 on compute-0-3.local
> Process 2 of 3 on compute-0-3.local
> Process 0 of 3 on compute-0-1.local

If `hostname` returns compute-0-1.local, you should change one entry to 
this full name, so that MPICH can remove it from the list during its 
first scan of the machinefile (although at this point that is not the 
reason for the strange distribution across the nodes). If you are not 
using a host_aliases file at all, maybe you can adjust /etc/hosts instead.
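
A minimal sketch of that comparison, assuming the machinefile holds one
plain host name per line, as in the listing quoted above. The function
name match_machinefile is made up for illustration, and the sample names
are copied from the quoted output. On a real node you would pass
socket.gethostname() and the lines of the generated machinefile.

```python
def match_machinefile(hostname, entries):
    """Split machinefile entries into those equal to hostname and those
    that only match its short form (the part before the first dot)."""
    short = hostname.split(".")[0]
    exact = [e for e in entries if e == hostname]
    short_only = [
        e for e in entries
        if e != hostname and e.split(".")[0] == short
    ]
    return exact, short_only


# Sample data taken from the quoted machinefile listing.
entries = ["compute-0-1", "compute-0-1", "compute-0-3"]
exact, short_only = match_machinefile("compute-0-1.local", entries)
print("exact:", exact)
print("short-form only:", short_only)
```

An empty "exact" list together with a non-empty "short-form only" list
means the node name appears in the file only in its short form.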

> Could not find enough machines for architecture LINUX 

For SGE the machine is free and you got two slots, so the job was 
started. What is in /home/mpich/share/machines.LINUX? I think the error 
you got comes from MPICH. Is your mpirun the final script, or does it 
link to something else?
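
One way to answer the last question is to resolve the command through
PATH and follow any symbolic links. This is only a sketch: the helper
name resolve_command is hypothetical, and it assumes the job sees the
same PATH as the shell you run it from.

```python
import os
import shutil


def resolve_command(name):
    """Return (path found on PATH, path after following symlinks),
    or (None, None) if the command is not on PATH."""
    found = shutil.which(name)
    if found is None:
        return None, None
    return found, os.path.realpath(found)


found, target = resolve_command("mpirun")
if found is None:
    print("mpirun is not on PATH")
elif os.path.abspath(found) != target:
    print(found, "links to", target)
else:
    print(found, "is not a symbolic link")
```

This detects symbolic links only. A wrapper script that calls another
mpirun internally would still have to be read by hand.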

Ciao - Reuti
