[GE users] LAM/MPI and SGE : tight_integration

christophe.caron at jouy.inra.fr
Wed Jan 11 18:49:42 GMT 2006


>> 
>> My PE configuration:
>> $ qconf -sp lam711
>> pe_name           lam711
>
> As you used another name for this PE: did you also adjust the lamd_wrapper to
> test against this name? - Reuti

Oops... and great!!
Yes, I forgot to change this name in the lamd_wrapper script :(
Now it seems to work!
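
For anyone hitting the same symptom, the kind of check Reuti means can be
pictured roughly as below. This is only a sketch and not the real
lamd_wrapper: the function name pe_matches and the variable EXPECTED_PE are
made up for illustration. The one thing taken from SGE itself is that the
name of the granted parallel environment is exported to the job as $PE.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the actual lamd_wrapper contents.
# SGE sets $PE to the parallel environment the job was granted.
EXPECTED_PE="lam711"   # assumption: must equal pe_name from `qconf -sp lam711`

pe_matches() {
    # Succeeds only when the job runs under the expected PE.
    [ "${PE:-}" = "$EXPECTED_PE" ]
}

if pe_matches; then
    echo "PE '$PE' matches: take the tight-integration path"
else
    echo "PE '${PE:-unset}' does not match '$EXPECTED_PE': wrapper needs adjusting"
fi
```

If the PE is renamed (here from the howto's default to lam711), the name
compared against in the wrapper has to be changed as well.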

Thanks again!

cc

>
>> slots             8
>> user_lists        test
>> xuser_lists       deadlineusers
>> start_proc_args   /opt/lam711-sge/lam_tight_qrsh/startlam.sh -catch_rsh \
>>                   $pe_hostfile
>> stop_proc_args    /opt/lam711-sge/lam_tight_qrsh/stoplam.sh
>> allocation_rule   $round_robin
>> control_slaves    TRUE
>> job_is_first_task FALSE
>> urgency_slots     min
>> 
>> 
>> I'm using qrsh with ssh (qrsh ls /tmp works)
>> 
>> Now it seems I have some problems dispatching jobs to all nodes.
>> # qsub -pe lam711 8 test_lam.sh
>> launches lamd on the first node:
>>  /usr/local/public/lam/bin/lamd_binary -d -H 192.168.1.56 -P 32849 -n 0 -o 0 -sessionsuffix sge-122589-undefined
>> 
>> 
>> But not on the other nodes, which fail with this error:
>> # more lam.err
>> -----------------------------------------------------------------------------
>> The selected RPI failed to initialize during MPI_INIT.  This is a
>> fatal error; I must abort.
>> 
>> This occurred on host n57 (n2).
>> The PID of failed process was 12494 (MPI_COMM_WORLD rank: 4)
>> -----------------------------------------------------------------------------
>> -----------------------------------------------------------------------------
>> One of the processes started by mpirun has exited with a nonzero exit
>> code.  This typically indicates that the process finished in error.
>> If your process did not finish in error, be sure to include a "return
>> 0" or "exit(0)" in your C code before exiting the application.
>> 
>> PID 11964 failed on node n0 (192.168.1.56) with exit status 1.
>> -----------------------------------------------------------------------------
>> mkdir: No such file or directory
>> 
>> 
>> 
>> I've been searching again and again for hours without any success
>> (I had other problems, but this is the last one).
>> 
>> Any clue?
>> 
>> thanks
>> 
>> cc
>> 
>> 
>> Please note my new address: christophe.caron at jouy.inra.fr
>> 
>> ***********************************************************
>>  Christophe Caron - INRA
>>  Mathematique, Informatique et Genome
>>  Domaine de Vilvert 78350 Jouy-en-Josas
>>  Web: http://migale.jouy.inra.fr/
>>  Tel: 01-34-65-28-88  Email: christophe.caron at jouy.inra.fr 
>> ***********************************************************
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net





