[GE users] MPICH2 job deletion

Alan Carriou Alan.Carriou at jet.uk
Wed Apr 27 16:22:23 BST 2005


Hi Reuti,

I did not set "MPIEXEC_RSH".
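
A minimal sketch of how one could confirm this from inside the job script, rather than relying on memory. The helper name `show_rsh_setting` is hypothetical, and the sketch assumes that the variable, if it were set anywhere, would be visible in the job's environment:

```shell
#!/bin/sh
# Hypothetical helper: report whether MPIEXEC_RSH is set in this environment.
# ${VAR+x} expands to "x" when VAR is set (even if empty), and to nothing otherwise.
show_rsh_setting() {
    if [ -n "${MPIEXEC_RSH+x}" ]; then
        echo "MPIEXEC_RSH=${MPIEXEC_RSH}"
    else
        echo "MPIEXEC_RSH is not set"
    fi
}

show_rsh_setting
```

Called just before the mpiexec line, it writes a single line to the job's output file, so the setting can be checked for the same run that shows the problem.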

 > you set up SGE to use ssh in its config,
Which parameter are you referring to?

Thanks,
Alan

Reuti wrote:
> Hi Alan,
> 
> you set up SGE to use ssh in its config, and/or did you just avoid
> setting "MPIEXEC_RSH=rsh"?
> 
> CU - Reuti
> 
> 
> Alan Carriou wrote:
> 
>> Hi
>>
>> On our grid, we have SGE 6.0u3 and MPICH2 1.0.1.
>> Using the smpd daemonless startup, we have a problem: when we delete
>> a running MPI job, the MPI processes are not killed.
>> The slots are freed, the job is reported as finished, the mpiexec and
>> ssh processes on the first node are killed, but the MPI processes
>> themselves are still alive. This happens both with qdel and qmon. The
>> qmaster/messages file says only:
>>
>> 04/27/2005 15:49:07|qmaster|testgrid-3|W|job 51.1 failed on host 
>> testgrid-4.jet.uk assumedly after job because: job 51.1 died through 
>> signal KILL (9)
>>
>> In case it explains something, we use ssh instead of rsh to connect to
>> other hosts.
>>
>> Using the daemon-based startup, the job deletion works fine. And
>> with both startup methods, the normal end of an MPI job causes no problem.
>>
>> Does anyone have an idea?
>>
>> Thanks,
>> Alan
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net