[GE users] qrsh fails

Marco Donauer - SUN Microsystems Marco.Donauer at Sun.COM
Fri Jan 27 14:29:54 GMT 2006


Jean-Paul,

is it possible that you didn't execute the setfileperm.sh script?
It is also executed if you answer the question concerning the
file permissions (during installation) in the right way.
For your example it should be answered with "no".

Regards,
Marco
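
For reference, the check requested in the quoted thread below can be
sketched as a small shell function. This is only a sketch under stated
assumptions: the function name check_suid_root and the SGE_UTILBIN
variable are made up for illustration, and the default path is simply
the one that appears in the quoted qrsh output. The background is that
rsh's rcmd() call has to bind a reserved port (below 1024), which needs
root privileges, so the SGE rsh and rlogin binaries must be owned by
root and carry the set-user-ID bit.

```shell
#!/bin/sh
# check_suid_root FILE
# Returns 0 if FILE is owned by uid 0 and has the setuid bit set,
# 1 if ownership or the setuid bit is wrong, 2 if FILE does not exist.
check_suid_root() {
    f=$1
    if [ ! -f "$f" ]; then
        echo "$f: missing"
        return 2
    fi
    # ls -ln prints the numeric owner uid in the third field
    owner_uid=$(ls -ln "$f" | awk '{print $3}')
    if [ "$owner_uid" != "0" ]; then
        echo "$f: owned by uid $owner_uid, not root"
        return 1
    fi
    if [ ! -u "$f" ]; then
        echo "$f: setuid bit not set"
        return 1
    fi
    echo "$f: ok"
    return 0
}

# Assumed location, taken from the qrsh output quoted in this thread.
SGE_UTILBIN=${SGE_UTILBIN:-/gridware/sge/utilbin/lx24-amd64}

for tool in rsh rlogin; do
    check_suid_root "$SGE_UTILBIN/$tool" || true
done
```

If a file is reported as owned by a non-root uid, that matches the
situation described below (binaries owned by sgeadmin). Note that on
Linux changing the owner can clear the setuid bit, so it is worth
re-running the check after any chown.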

Jean-Paul Minet wrote:

> Reuti,
>
> <snip>
>
>>> As root, qrsh is working OK.  As normal user, I get:
>>> minet@lmexec-86 ~ >qrsh -verbose -l mem_free=10M -l num_proc=2 -q all.q@lmexec-75 date
>>> your job 2496 ("date") has been submitted
>>> waiting for interactive job to be scheduled ...
>>> Your interactive job 2496 has been successfully scheduled.
>>> Establishing /gridware/sge/utilbin/lx24-amd64/rsh session to host lmexec-75 ...
>>> rcmd: socket: Permission denied
>>>
>>
>> what is:
>>
>> $ ls -lh /gridware/sge/utilbin/lx24-amd64/rlogin
>> $ ls -lh /gridware/sge/utilbin/lx24-amd64/rsh
>
>
> OK, got it.  rsh/rlogin were owned by sgeadmin :-( (... as installed
> by Sun engineers).  I made them owned by root and it now works.  Thanks
> again for your help.
>
> jp
>
>>
>> saying? - Reuti
>>
>>
>>> Note that qlogin works as normal user:
>>> minet@lmexec-86 ~ >qlogin -verbose -l mem_free=10M -l num_proc=2 -q all.q@lmexec-75
>>> your job 2497 ("QLOGIN") has been submitted
>>> waiting for interactive job to be scheduled ...
>>> Your interactive job 2497 has been successfully scheduled.
>>> Establishing telnet session to host lmexec-75 ...
>>> Trying 192.168.241.75...
>>> Connected to lmexec-75.
>>> Escape character is '^]'.
>>> Welcome to SUSE LINUX Enterprise Server 9 (x86_64) - Kernel 2.6.5-7.97-smp (1).
>>>
>>> This is with SUID on utilbin/rlogin and rsh (as explained in howto's).
>>>
>>> Any hint?
>>>
>>> jp
>>>
>>>
>>>>> All in all, a crap-shoot.
>>>>>
>>>>> David S.
>>>>>
>>>>> On Mon, Jan 16, 2006 at 09:16:50AM +0100, Jean-Paul Minet wrote:
>>>>>
>>>>>> Reuti,
>>>>>>
>>>>>>>> I am trying to get tight integration to work (MPICH 1.2.6 and SGE
>>>>>>>> 6.0u6) and face a problem with qrsh.  Trying to debug it separately
>>>>>>>> from the integration bit, I obtain a "poll: protocol failure in
>>>>>>>> circuit setup" on the host initiating the qrsh (cf. below).  On the
>>>>>>>> target host, I get the following weird messages:
>>>>>>>>
>>>>>>>> Message from syslogd@lmexec-92 at Fri Jan 13 10:47:21 2006 ...
>>>>>>>> lmexec-92 kernel: Oops: 0000 [2] SMP
>>>>>>>>
>>>>>>>> Message from syslogd@lmexec-92 at Fri Jan 13 10:47:21 2006 ...
>>>>>>>> lmexec-92 kernel: CR2: 0000000000000108
>>>>>>>>
>>>>>>>> We use SUSE 9.0 (kernel 2.6.5-7.97-smp) on Sun V20z (bi-opteron).
>>>>>>>>
>>>>>>>
>>>>>>> this looks like a bug in the kernel - was the 2.6.5-7.97-smp kernel
>>>>>>> the latest for 9.0?
>>>>>>
>>>>>> We actually use SLES 9 (enterprise version).  The cluster was
>>>>>> purchased and installed last quarter.  I checked on the Novell site
>>>>>> and didn't see any subsequent release.
>>>>>>
>>>>>>> - Is this on all hosts or only on one specific one?
>>>>>>
>>>>>> Just tried with a few hosts, and the behavior is the same...
>>>>>>
>>>>>>> - Is this new and worked before? As 9.0 isn't the latest of 9.x, I'd
>>>>>>> assume that your cluster is already in operation for some time now.
>>>>>>
>>>>>> It never worked before.  The install is new; SGE is configured and more
>>>>>> or less working, except bits and pieces here and there, among which is
>>>>>> tight integration for the mpich/ethernet interconnect.  I also have
>>>>>> trouble with the infiniband interconnect integration: the patch for
>>>>>> mpich/infiniband and SGE tight integration, available on the HowTo site,
>>>>>> doesn't match the version of mpich supplied and customized by the
>>>>>> Infiniband vendor.  I am awaiting support from the Infiniband vendor to
>>>>>> get the latest mpich/mvapich version installed/customized.
>>>>>>
>>>>>> thnks & rgds
>>>>>>
>>>>>> Jean-Paul
>>>>>>
>>>>>>> -- Reuti
>>>>>>>
>>>>>>>> Would someone have an idea on how to further debug the problem (I
>>>>>>>> have tried using tcpdump between the submit host and the target host,
>>>>>>>> as well as the qmaster host and the target host, to dig into
>>>>>>>> communication bits, but it's getting complicated...)?
>>>>>>>>
>>>>>>>> Thks for any help
>>>>>>>>
>>>>>>>> Jean-paul
>>>>>>>>
>>>>>>>> ---- qrsh command and output ----
>>>>>>>> lemaitre /gridware/sge/bin/lx24-amd64 # qrsh -verbose -l mem_free=10M -l num_proc=2 -q all.q@lmexec-92 date
>>>>>>>> local configuration lemaitre.cism.ucl.ac.be not defined - using global configuration
>>>>>>>> your job 1788 ("date") has been submitted
>>>>>>>> waiting for interactive job to be scheduled ...
>>>>>>>> Your interactive job 1788 has been successfully scheduled.
>>>>>>>> Establishing /gridware/sge/utilbin/lx24-amd64/rsh session to host lmexec-92 ...
>>>>>>>> poll: protocol failure in circuit setup
>>>>>>>> /gridware/sge/utilbin/lx24-amd64/rsh exited with exit code 1
>>>>>>>> reading exit code from shepherd ... 129
>>>>>>>>
>>>>>>>> -- 
>>>>>>>> Jean-Paul Minet
>>>>>>>> Gestionnaire CISM - Institut de Calcul Intensif et de Stockage de Masse
>>>>>>>> Université Catholique de Louvain
>>>>>>>> Tel: (32) (0)10.47.35.67 - Fax: (32) (0)10.47.34.52
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>>>>>
>>>>>>
>>>>>> -- 
>>>>>> Jean-Paul Minet
>>>>>> Gestionnaire CISM - Institut de Calcul Intensif et de Stockage de Masse
>>>>>> Université Catholique de Louvain
>>>>>> Tel: (32) (0)10.47.35.67 - Fax: (32) (0)10.47.34.52
>>>
>>>
>>>
>>> -- 
>>> Jean-Paul Minet
>>> Gestionnaire CISM - Institut de Calcul Intensif et de Stockage de Masse
>>> Université Catholique de Louvain
>>> Tel: (32) (0)10.47.35.67 - Fax: (32) (0)10.47.34.52
>
