[GE users] qdel does not delete the job

Reuti reuti at staff.uni-marburg.de
Sat Mar 25 08:14:31 GMT 2006


On 25.03.2006 at 06:16, Srikanth wrote:

> Hi,
>
> As this is a ROCKS cluster, 'rsh' does not work; it uses 'ssh'.
>

Please use the hint in the Howto to direct your MPICH application to
use rsh again; the call will then be caught by SGE's rsh-wrapper.
There is no need for a working rsh in the cluster, as SGE uses its
own rsh daemon, dedicated and started for each qrsh call.

-- Reuti
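A minimal Python sketch, for illustration only, of what "direct your MPICH application to use rsh again" might look like from inside the job. It assumes that the ch_p4 device of MPICH 1.2.x reads the `P4_RSHCOMMAND` environment variable to choose its remote shell, that the PE start script (`startmpi.sh -catch_rsh`) has written `$TMPDIR/machines` and put the SGE rsh-wrapper first in `PATH`, and that SGE sets `NSLOTS` in the job. The helper name `mpich_launch` is hypothetical; check the Howto for the exact settings on your installation.

```python
import os
from typing import Dict, List, Mapping, Tuple


def mpich_launch(program: str,
                 job_env: Mapping[str, str]) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for starting an MPICH run inside an SGE job.

    Hypothetical helper; the variable names are assumptions (see lead-in).
    """
    env = dict(job_env)
    # Assumption: MPICH's ch_p4 device picks its remote shell from
    # P4_RSHCOMMAND. Plain "rsh" is then resolved via PATH, where the
    # SGE rsh-wrapper is expected to come first.
    env["P4_RSHCOMMAND"] = "rsh"
    # Assumption (per the Howto): keeps the p4 slaves from starting a
    # process group of their own, so SGE's kill can reach them.
    env["MPICH_PROCESS_GROUP"] = "no"
    # Host list written by the PE start script (startmpi.sh).
    machines = os.path.join(env["TMPDIR"], "machines")
    argv = ["mpirun", "-np", env["NSLOTS"], "-machinefile", machines, program]
    return argv, env
```

In a job script the result would be passed to `subprocess.run(argv, env=env, check=True)`. Changing the environment inside the job, rather than in a login profile, limits the redirection to runs that SGE started.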


>
> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: Friday, March 24, 2006 9:24 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] qdel does not delete the job
>
> Hi,
>
> On 24.03.2006 at 16:49, Srikanth wrote:
>
>> I am using MPICH 1.2.7-1, which comes by default with ROCKS
>> cluster 4.1.
>>
>> I even tried submitting the job with the environment variable
>> "MPICH_PROCESS_GROUP=no", but it could not kill the slave processes
>> on the compute nodes.
>>
>
> Did you also include the -V in the rsh-wrapper, as outlined in the
> Howto for MPICH? - Reuti
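The `-V` matters because a variable such as `MPICH_PROCESS_GROUP=no` set in the job only reaches the slave processes if the `qrsh -inherit` call made by the wrapper exports the job environment. The SGE wrapper itself is a shell script; the Python below is only a simplified sketch of its argument translation, assuming an `rsh [-n] [-l user] host command...` calling convention and that `SGE_ROOT` and `ARC` identify the installation and architecture. The function `rsh_to_qrsh` is hypothetical and rejects any other rsh option.

```python
import os
from typing import List, Sequence


def rsh_to_qrsh(argv: Sequence[str], sge_root: str, arch: str,
                forward_env: bool = True) -> List[str]:
    """Translate rsh-style arguments into a 'qrsh -inherit' command line.

    Hypothetical sketch: only -n and -l are recognised; the first
    non-option argument is the host, the rest is the remote command.
    """
    args = list(argv)
    nostdin = False
    i = 0
    while i < len(args) and args[i].startswith("-"):
        if args[i] == "-n":
            nostdin = True      # rsh -n: do not read stdin
            i += 1
        elif args[i] == "-l":
            i += 2              # qrsh -inherit runs as the job owner; drop the user
        else:
            raise ValueError(f"unsupported rsh option: {args[i]}")
    if i >= len(args):
        raise ValueError("no target host given")
    host, command = args[i], args[i + 1:]

    qrsh = [os.path.join(sge_root, "bin", arch, "qrsh"), "-inherit"]
    if nostdin:
        qrsh.append("-nostdin")
    if forward_env:
        qrsh.append("-V")       # export the job environment to the slave task
    return qrsh + [host] + command
```

A wrapper built on this would finish with `os.execv(cmd[0], cmd)`, so that qrsh replaces the wrapper process instead of running as its child.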
>
>> Please suggest how to resolve this issue.
>>
>>
>> Sri
>>
>> -----Original Message-----
>> From: Rayson Ho [mailto:rayrayson at gmail.com]
>> Sent: Tuesday, March 21, 2006 6:26 PM
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] qdel does not delete the job
>>
>> Which MPI implementation are you using??
>>
>> You need to use the "tight integration" for the PE:
>> http://gridengine.sunsource.net/howto/howto.html#Tight%20Integration%20of%20Parallel%20Libraries
>>
>> Rayson
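For the tight integration Rayson refers to, the PE must let SGE control the slave tasks. Below is a rough Python sketch that checks the output of `qconf -sp <pe_name>` for two settings assumed here to be what the Howto's tight MPICH integration uses: `control_slaves TRUE`, and a `start_proc_args` that calls `startmpi.sh` with `-catch_rsh` so the rsh-wrapper is installed. The parser assumes one `key value` pair per line and does not handle values continued with a trailing backslash. The function names are hypothetical.

```python
from typing import Dict, List


def parse_pe_config(text: str) -> Dict[str, str]:
    """Parse 'qconf -sp' style output: one 'key value...' pair per line."""
    config: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if parts:
            config[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return config


def tight_integration_problems(config: Dict[str, str]) -> List[str]:
    """List the settings that would prevent SGE from controlling the slaves."""
    problems = []
    if config.get("control_slaves", "").upper() != "TRUE":
        problems.append("control_slaves must be TRUE so slaves start via qrsh -inherit")
    if "-catch_rsh" not in config.get("start_proc_args", ""):
        problems.append("start_proc_args lacks -catch_rsh, so no rsh-wrapper is installed")
    return problems
```

Feeding it the text of `qconf -sp` for the PE used by the job gives an empty list when both settings are present.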
>>
>>
>> On 3/21/06, Srikanth <srikanth at locuz.com> wrote:
>>> I am facing a typical problem in my Rocks Cluster.
>>>
>>> I have installed and am using the SUN Grid roll on the ROCKS Cluster.
>>>
>>> When I use 'qdel' to delete a running job on the cluster, it reports
>>> that the job is deleted, and the job is also cleared from the queue in
>>> the 'qstat -f' output, but instances of the job keep running on all
>>> the nodes of the cluster.
>>>
>>> It was unable to kill all the job processes and unable to clear the
>>> IPC resources listed by 'ipcs'.
>>>
>>> Kindly help me resolve the issue.
>>>
>>>
>>> Regards,
>>>
>>> Sri
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net