[GE users] qrsh /bin/bash error mark all Queue to Error state

Reuti reuti at staff.uni-marburg.de
Thu Jul 3 10:53:16 BST 2008


Hiho,

On 03.07.2008 at 01:26, Angel Arancibia wrote:

> 2008/6/30 Reuti <reuti at staff.uni-marburg.de>:
>> Hi,
>>
>> well, my usual approach is to use the plain qrsh (which uses rsh in the
>> end) for interactive job support inside the cluster. There is no need to
>> have rlogind/telnetd running all the time, as SGE will start one daemon
>> on its own per job ("disable = yes" in /etc/xinetd.d/rlogin or telnet).
>> To restrict the ssh login, you could then add a line to
>> /etc/ssh/sshd_config with:
>>
>> AllowUsers angel
>>
>> or
>>
>> AllowGroups admins
>>
>> or similar.
>>
>> ============================================================
>>
>> Nevertheless, if you need to use ssh in your cluster, you can also change
>> the above-mentioned files and supply new ones for the SGE invocation by
>> means of the options -f for sshd and -F for ssh (sic!), setting all
>> options therein:
>>
>> rsh_daemon    /usr/local/sbin/sshd -f /usr/sge/cluster/special_sshd_config
>> rsh_command   /usr/local/bin/ssh -F /usr/sge/cluster/special_ssh_config
>>
>
> That is what I did, and it worked great, but it has a drawback.

Why do you want to copy something to the node? You can't know in
advance which node you will get, unless you request exactly one node
by name.

> The users are unable to use scp as well. I could work around this by
> telling them to reverse the scp direction: from the node (accessible
> through a qrsh) to the master instead of the usual way (from the master
> to the nodes). But that is a little tricky too, because what happens if
> the nodes are full? The qrsh will never give an interactive shell.
> How do you usually implement this in professional clusters?

The idea behind SGE is that /home is shared across all nodes, so all
files are accessible already. If you want to compute locally on a
node for performance reasons (avoiding network traffic), the approach
we use is to copy the input files for the computation to the
SGE-supplied temporary directory on the node
(/tmp/$JOB_ID.$TASK_ID.$QUEUE or $TMPDIR, which is removed
automatically after the job), and to copy the results back to the
user's /home before the end of the job.

So there is no need to access a node outside of SGE.
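
A minimal sketch of that staging pattern, as a shell function a job
script could call. Only $TMPDIR comes from SGE; the function name
stage_and_run, the file names and the fallback to /tmp are assumptions
made up for illustration, not something SGE provides.

```shell
#!/bin/sh
# Sketch: compute in the node-local scratch directory, then copy the
# result back to the shared directory. All names here are placeholders.

# stage_and_run WORKDIR INPUT OUTPUT COMMAND [ARGS...]
#   WORKDIR  shared directory (e.g. somewhere below /home)
#   INPUT    input file name inside WORKDIR
#   OUTPUT   file name the command is expected to write
stage_and_run() {
    workdir=$1
    input=$2
    output=$3
    shift 3

    # Inside a job, SGE sets TMPDIR to the per-job directory.
    # The /tmp fallback is only for trying this outside SGE.
    scratch=${TMPDIR:-/tmp}

    # Stage the input file onto the local disk of the node.
    cp "$workdir/$input" "$scratch/" || return 1

    # Run the command in a subshell so the caller's cwd is unchanged.
    ( cd "$scratch" && "$@" )
    status=$?

    # Copy the result back before the job ends, if one was written.
    if [ -f "$scratch/$output" ]; then
        cp "$scratch/$output" "$workdir/" || return 1
    fi
    return $status
}
```

In a job script this might be called as
stage_and_run "$HOME/project" input.dat result.dat ./my_program
where the directory, the file names and my_program are again
hypothetical. Nothing has to be cleaned up by hand inside a job, as SGE
removes the temporary directory itself.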

-- Reuti

> Anyway, about the original subject: if someone runs "qrsh
> /bin/bash", it still puts all the queues in E state ... all of them.
> How can I debug this, in order to find an explanation or avoid
> that error? Although the users are few and trustworthy, they could
> make a mistake.
>
> Thanks in advance,
>
> Angel
>
> PS: Please excuse my rough English, it is not my native
> language :)
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net

