[GE users] LAM/MPI and SGE : tight_integration

christophe.caron at jouy.inra.fr
Wed Jan 11 18:27:23 GMT 2006


> Hi Christophe,
> Is the PE assigned to a queue?

Yes, it was! Thanks for all the replies about this HOWTO, which seems to be
obsolete now.

So after a break I decided to look at the latest LAM/SGE HOWTO.
I now have LAM 7.1.1 (instead of 7.0.2) and I've applied all the modifications
from the "Tight integration using qrsh" section.

My PE configuration:
$ qconf -sp lam711
pe_name           lam711
slots             8
user_lists        test
xuser_lists       deadlineusers
start_proc_args   /opt/lam711-sge/lam_tight_qrsh/startlam.sh -catch_rsh \
stop_proc_args    /opt/lam711-sge/lam_tight_qrsh/stoplam.sh
allocation_rule   $round_robin
control_slaves    TRUE
job_is_first_task FALSE
urgency_slots     min
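
Regarding the earlier question of whether the PE is assigned to a queue:
one way to check is to look for the PE name in the pe_list line of the
queue configuration printed by qconf -sq. The following is only a minimal
sketch. The helper name pe_in_queue and the queue name all.q are
hypothetical, and the sketch ignores wrapped continuation lines and
per-hostgroup overrides.

```shell
#!/bin/sh
# Hypothetical helper: read `qconf -sq <queue>` output on stdin and succeed
# (exit status 0) if the given PE name appears on the pe_list line.
# Assumption: PE names on that line are separated by spaces and/or commas.
pe_in_queue() {
    awk -v pe="$1" '
        $1 == "pe_list" {
            for (i = 2; i <= NF; i++) {
                n = split($i, parts, ",")
                for (j = 1; j <= n; j++)
                    if (parts[j] == pe) found = 1
            }
        }
        END { exit (found ? 0 : 1) }
    '
}

# Example use on a submit host (all.q is an assumed queue name):
#   qconf -sq all.q | pe_in_queue lam711 && echo "lam711 is attached to all.q"
```

The helper only reads text from standard input, so it can also be tried
on a saved copy of the queue configuration.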

I'm using qrsh with ssh (qrsh ls /tmp works)

Now I seem to have problems dispatching the job to all nodes.
# qsub -pe lam711 8 test_lam.sh
launches lamd on the first node only:
  /usr/local/public/lam/bin/lamd_binary -d -H -P 32849 -n 0 
-o 0 -sessionsuffix sge-122589-undefined

But it does not start on the other nodes, and I get this error:
# more lam.err
The selected RPI failed to initialize during MPI_INIT.  This is a
fatal error; I must abort.

This occurred on host n57 (n2).
The PID of failed process was 12494 (MPI_COMM_WORLD rank: 4)
One of the processes started by mpirun has exited with a nonzero exit
code.  This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.

PID 11964 failed on node n0 ( with exit status 1.
mkdir: No such file or directory
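
The final "mkdir: No such file or directory" line suggests that some
directory could not be created on one of the nodes. As a hedged sketch
for narrowing this down, the job script could log whether the likely
parent directories exist on the host where it runs. This assumes, without
confirmation from the output above, that the LAM session directory is
created under TMPDIR or LAM_MPI_SESSION_PREFIX. The helper name
check_session_dir is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: classify a directory as a candidate location
# for session files. Prints one of: unset, missing, not-writable, ok.
check_session_dir() {
    dir=$1
    if [ -z "$dir" ]; then
        echo "unset"
    elif [ ! -d "$dir" ]; then
        echo "missing"
    elif [ ! -w "$dir" ]; then
        echo "not-writable"
    else
        echo "ok"
    fi
}

# Report the state of each variable on the current host.
# Assumption: these are the variables that decide where the session
# directory is created; this is not confirmed by the error output.
for var in TMPDIR LAM_MPI_SESSION_PREFIX; do
    eval "value=\${$var:-}"
    echo "$(uname -n): $var='$value' -> $(check_session_dir "$value")"
done
```

Running the same lines on each node of the job, for example through qrsh,
would show whether a node is missing the directory that mkdir complains
about.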

I've searched again and again for several hours without any success
(I had other problems, but this is the last one).

Any clue?



Please note my new address: christophe.caron at jouy.inra.fr

  Christophe Caron - INRA
  Mathematique, Informatique et Genome
  Domaine de Vilvert 78350 Jouy-en-Josas
  Web: http://migale.jouy.inra.fr/
  Tel: 01-34-65-28-88  Email: christophe.caron at jouy.inra.fr 
