[GE users] Issues submitting jobs from QMON

sumyee sum-yee.lai at hp.com
Wed Jan 20 17:53:32 GMT 2010


After doing a bit more testing, I don't think $SGE_ROOT/$SGE_CELL/common/sge_request is ever read when I submit interactive jobs through QMON.  I initially thought it was, because the JSV script specified in that sge_request file sometimes ran when no JSV URL was specified in QMON itself.  However, none of the other settings from the sge_request file (e.g. priority, log file location) were ever applied.

It seems to me that the JSV specified in $SGE_ROOT/$SGE_CELL/common/sge_request should always be run.  Otherwise, I have to either run server-side JSVs, use wrapper submit scripts, or disable the QMON submit tab and force users to submit jobs from the command line only.
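
To make the wrapper-script option concrete, here is a minimal Python sketch. It assumes the file format described in sge_request(5): each line holds qsub options, and blank lines or lines with `#` in the first column are skipped. The function names are made up for illustration. Placing the defaults ahead of the user's own options is also an assumption; whether a later option really overrides an earlier one for every switch would need checking.

```python
import os
import shlex


def parse_default_requests(text):
    """Return the qsub options found in sge_request-style text."""
    options = []
    for line in text.splitlines():
        # Blank lines and lines with '#' in the first column are skipped.
        if not line.strip() or line.startswith("#"):
            continue
        options.extend(shlex.split(line))
    return options


def build_qsub_command(user_args, defaults):
    """Put the cluster defaults ahead of the user's own options (assumed order)."""
    return ["qsub"] + list(defaults) + list(user_args)


def default_request_path(environ=os.environ):
    """Path of the cluster-wide default request file for the current cell."""
    # The cell name falls back to "default" when SGE_CELL is not set.
    cell = environ.get("SGE_CELL", "default")
    return f"{environ['SGE_ROOT']}/{cell}/common/sge_request"


def main(argv):
    """Print the qsub command line a wrapper would run (nothing is submitted)."""
    with open(default_request_path()) as handle:
        defaults = parse_default_requests(handle.read())
    print(" ".join(shlex.quote(arg) for arg in build_qsub_command(argv, defaults)))
```

Calling `main(["-N", "demo", "job.sh"])` on a submit host only prints the resulting command; a real wrapper would hand the list to `subprocess` instead.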

Sum Yee

-----Original Message-----
From: reuti [mailto:reuti at staff.uni-marburg.de] 
Sent: Thursday, January 14, 2010 2:32 PM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] Issues submitting jobs from QMON


On 14.01.2010 at 22:22, sumyee wrote:

> Hello,
> I am running into a few problems with submitting jobs from QMON on  
> SGE6.2u3.  I can submit both batch and interactive jobs from the  
> command line without errors, but I get different problems when  
> submitting from QMON.
> In the interactive case:
> 1)  The default settings in $SGE_ROOT/$SGE_CELL/common/sge_request
> don't always get read.

For me it's never read when I submit a job this way (an interactive  
job in QMON), and always for a standard batch job submitted from  
QMON. But maybe it was bad luck and sometimes it would succeed.

Can you please file an issue?

> 2)  No X window is launched.  The prolog and epilog run, then I
> get a pop-up window that says:  No free slots for interactive job
> $JOB_NUMBER!  (To submit the interactive job, I just selected the
> Interactive button and gave it a job name.  No other settings were
> changed from default.)  The error messages were logged in the
> messages files.  qacct registers an exit_status of 1.

Correct, it's the same behavior you would see when you issue `qsh`
on the command line. The original idea behind this was to start xterm
on a node and have it connect back to the submitting machine via the
DISPLAY environment variable, hence you would need to allow `xhost +`
on the submitting machine. This is nowadays considered unsafe. Your
definitions of *_daemon/*_command in SGE's configuration are not used
for `qsh`.

AFAICS you can only `qsub` and `qsh` from QMON; `qrsh` isn't
supported. The best option would be to issue `qrsh xterm` to get an
X window using the SSH X11 forwarding you have set up.
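
If users should not have to type this themselves, the call could be wrapped in a small launcher. The following is only a sketch in Python: the function names are invented for illustration, and it assumes `qrsh` and `xterm` are on the PATH and that rsh_command and rsh_daemon point at ssh with X11 forwarding, as in the configuration quoted further down.

```python
import shutil
import subprocess


def qrsh_xterm_command(extra_options=()):
    """Build the argument list for running xterm through qrsh."""
    return ["qrsh", *extra_options, "xterm"]


def launch_xterm(extra_options=()):
    """Run `qrsh xterm` and return its exit status."""
    # Fail early with a clear message if the SGE client tools are missing.
    if shutil.which("qrsh") is None:
        raise FileNotFoundError("qrsh not found on PATH")
    return subprocess.call(qrsh_xterm_command(extra_options))
```

For example, `launch_xterm(["-N", "term"])` would run `qrsh -N term xterm` and block until the xterm is closed.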

-- Reuti

> In the batch case:
> 1)  SGE_O_HOST and SGE_O_WORKDIR are not set.
> Any help will be greatly appreciated!
> Sum Yee
> ----------------------------------------------------------------------------
> My config is as follows:
> $ qconf -sconf global
> #global:
> execd_spool_dir              /usr/local/GridEngine/default/spool
> mailer                       /bin/mail
> xterm                        /usr/bin/X11/xterm
> load_sensor                  /usr/local/GridEngine/custom/src/load/load.py
> prolog                       none
> epilog                       none
> shell_start_mode             unix_behavior
> login_shells                 sh,ksh,csh,tcsh,bash
> min_uid                      0
> min_gid                      0
> user_lists                   none
> xuser_lists                  none
> projects                     none
> xprojects                    none
> enforce_project              false
> enforce_user                 auto
> load_report_time             00:00:40
> max_unheard                  00:05:00
> reschedule_unknown           00:00:00
> loglevel                     log_warning
> administrator_mail           sum-yee.lai at hp.com
> set_token_cmd                none
> pag_cmd                      none
> token_extend_time            none
> shepherd_cmd                 none
> qmaster_params               MAX_DYN_EC=10000
> execd_params                 none
> reporting_params             accounting=true reporting=true \
>                              flush_time=00:00:15 joblog=false sharelog=00:00:00
> finished_jobs                100
> gid_range                    20000-30000
> qlogin_command               builtin
> qlogin_daemon                builtin
> rlogin_command               /usr/bin/ssh -t -X -Y
> rlogin_daemon                /usr/sbin/sshd -i
> rsh_command                  /usr/bin/ssh -t -X -Y
> rsh_daemon                   /usr/sbin/sshd -i
> max_aj_instances             2000
> max_aj_tasks                 75000
> max_u_jobs                   0
> max_jobs                     0
> max_advance_reservations     15
> auto_user_oticket            0
> auto_user_fshare             0
> auto_user_default_project    none
> auto_user_delete_time        86400
> delegated_file_staging       false
> reprioritize                 false
> jsv_url                      none
> jsv_allowed_mod              ac,h,i,e,o,j,M,N,p,w
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=238844
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
