[GE users] Limit Number of Jobs on Exec Hosts

reuti reuti at staff.uni-marburg.de
Thu Jan 21 17:44:33 GMT 2010


On 21.01.2010 at 17:33, myiagros wrote:

> I've run into another issue since limiting the slot number. I have  
> tried changing the slots through QMON and through qconf, either way  
> I get errors when trying to submit jobs.
>
> 01/21/2010 11:24:01|schedu|dilithium|E|unable to find job 35 from  
> the scheduler order package
> 01/21/2010 11:24:04|schedu|dilithium|E|could not find job "35" in  
> master list
> 01/21/2010 11:24:04|schedu|dilithium|E|callback function for event  
> "223. EVENT DEL JOB 35.1" failed
> 01/21/2010 11:24:34|worker|dilithium|E|unable to find job 36 from  
> the scheduler order package
> 01/21/2010 11:24:34|worker|dilithium|W|Skipping remaining 0 orders
> 01/21/2010 11:24:34|schedu|dilithium|E|unable to find job 36 from  
> the scheduler order package
> 01/21/2010 11:24:49|schedu|dilithium|E|could not find job "36" in  
> master list
>
> Am I missing something? I was able to submit and run jobs with no  
> errors, once I changed the slots I started getting these errors. In  
> qmon it shows with a popup(with start job immediately selected),  
> "No free slots for interactive job ##".

Interactive jobs in QMON use `qsh`, which is no longer the best
option: it simply starts `xterm` on the execution host, which opens a
direct connection from the execution host back to the submit host.
You have to allow that connection on the submit (i.e. target) machine
with `xhost +`, which is considered unsafe. When the connection fails
(possibly also because of a firewall setting), you get the error you
mentioned.

It's better to run `qrsh xterm` on the command line and set up SGE to
use SSH with X11 forwarding. For this to work it's sufficient to define
"rsh_command" and "rsh_daemon" to use SSH, as outlined here:
http://gridengine.sunsource.net/howto/qrsh_qlogin_ssh.html
Note: when local configurations are defined for the nodes, these may
override the global definition. If you have a homogeneous cluster, you
could even delete all local configurations.
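
As a rough sketch of what the two entries could look like in the
global cluster configuration (opened with `qconf -mconf`): the binary
paths below are assumptions and depend on where ssh and sshd are
installed on your machines, and X11 forwarding must also be permitted
by the SSH daemon on the nodes.

```
# global cluster configuration, edited with: qconf -mconf
# (paths are assumptions; adjust them to your installation)
rsh_command    /usr/bin/ssh -X
rsh_daemon     /usr/sbin/sshd -i
```

To check for local configurations that might override these entries,
`qconf -sconfl` lists the hosts that have one, `qconf -sconf <hostname>`
shows its contents, and `qconf -dconf <hostname>` deletes it.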

-- Reuti



------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=240212

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


