[GE users] Job does not run on the other host.

Reuti reuti at staff.uni-marburg.de
Mon Sep 3 17:04:51 BST 2007


On 03.09.2007 at 16:23, amanyus wrote:

> Is it because I don't use network-shared $SGE_ROOT?

At least $SGE_ROOT/default/common should be shared. Alternatively, you can
copy the settings from this directory to the other machine(s) so that
they reflect the actual setup:

http://gridengine.sunsource.net/howto/nfsreduce.html

So you can continue without sharing the $SGE_ROOT, but as stated:  
both machines must show up in `qconf -sel`.
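
A rough sketch of that check, for illustration only: the host names `sun` and `unknown` are taken from the `qconf -sh` output quoted below, and the helper name `missing_exec_hosts` is made up for this example. It compares a list of expected hosts against saved `qconf -sel` output and prints any host that is not registered as an execution host:

```shell
#!/bin/sh
# Print every expected host that does not appear in an execution host list.
# $1: file holding `qconf -sel` output (one host name per line)
# remaining arguments: the host names you expect to be registered
missing_exec_hosts() {
    list_file=$1
    shift
    for host in "$@"; do
        # -x: match the whole line, -F: fixed string, -q: no output
        grep -qxF "$host" "$list_file" || echo "$host"
    done
}

# Sample data from this thread: only "unknown" is registered.
sel_output=$(mktemp)
printf 'unknown\n' > "$sel_output"
# On a live cluster, replace the printf line with: qconf -sel > "$sel_output"

missing_exec_hosts "$sel_output" sun unknown
rm -f "$sel_output"
```

With the sample data this reports `sun` as missing, which matches the situation described in the thread. An empty result would mean that every expected host is in the execution host list.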

-- Reuti


> Here I ran simple.sh after installing execd on both hosts.
> Each host has also added the other as a submit host and execution host.
>
> bash-3.00$ qstat
> job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
> -----------------------------------------------------------------------------------------------------------------
>       1 0.55500 simple.sh  sgeadmin     r     09/03/2007 22:16:03 all.q@unknown                      1
>
> The queue column shows all.q@unknown.
>
> The job can only run locally. If I submit more jobs, they will
> just wait to run on the local host, which is not what we expect from
> a grid system.
>
>
> On Sep 3, 2007, at 7:33 PM, Reuti wrote:
>
>> On 03.09.2007 at 13:09, Aman Yus wrote:
>>
>>> Yes, the hostname is unknown. /etc/hosts file is accurate and these
>>> hosts can identify each other.
>>
>> But the other machine must also show up in `qconf -sel`. Are you  
>> sharing the $SGE_ROOT with the default/common inside between both  
>> nodes? - Reuti
>>
>>> bash-3.00# hostname
>>> unknown
>>>
>>>
>>> On 9/3/07, Reuti <reuti at staff.uni-marburg.de> wrote:
>>>> Hi,
>>>>
>>>> On 03.09.2007 at 10:29, Aman Yus wrote:
>>>>
>>>>> bash-3.2$ qconf -sh
>>>>> sun
>>>>> unknown
>>>>>
>>>>> bash-3.2$ qconf -sel
>>>>> unknown
>>>>
>>>> This seems to be a problem with the name resolution of one machine. So
>>>> this machine is identified as "unknown" by the command `hostname`?
>>>> Both machines should show up here.
>>>>
>>>> -- Reuti
>>>>
>>>>> bash-3.2$ qconf -sql
>>>>> all.q
>>>>>
>>>>> bash-3.2$ qconf -sconf
>>>>> global:
>>>>> execd_spool_dir              /opt/n1ge6/default/spool
>>>>> mailer                       /bin/mailx
>>>>> xterm                        /usr/openwin/bin/xterm
>>>>> load_sensor                  none
>>>>> prolog                       none
>>>>> epilog                       none
>>>>> shell_start_mode             posix_compliant
>>>>> login_shells                 sh,ksh,csh,tcsh
>>>>> min_uid                      0
>>>>> min_gid                      0
>>>>> user_lists                   none
>>>>> xuser_lists                  none
>>>>> projects                     none
>>>>> xprojects                    none
>>>>> enforce_project              false
>>>>> enforce_user                 auto
>>>>> load_report_time             00:00:40
>>>>> max_unheard                  00:05:00
>>>>> reschedule_unknown           00:00:00
>>>>> loglevel                     log_warning
>>>>> administrator_mail           codept at gmail.com
>>>>> set_token_cmd                none
>>>>> pag_cmd                      none
>>>>> token_extend_time            none
>>>>> shepherd_cmd                 none
>>>>> qmaster_params               none
>>>>> execd_params                 none
>>>>> reporting_params             accounting=true reporting=false \
>>>>>                              flush_time=00:00:15 joblog=false sharelog=00:00:00
>>>>> finished_jobs                100
>>>>> gid_range                    50000-50100
>>>>> qlogin_command               telnet
>>>>> qlogin_daemon                /usr/sbin/in.telnetd
>>>>> rlogin_daemon                /usr/sbin/in.rlogind
>>>>> max_aj_instances             2000
>>>>> max_aj_tasks                 75000
>>>>> max_u_jobs                   0
>>>>> max_jobs                     0
>>>>> auto_user_oticket            0
>>>>> auto_user_fshare             0
>>>>> auto_user_default_project    none
>>>>> auto_user_delete_time        86400
>>>>> delegated_file_staging       false
>>>>> reprioritize                 0
>>>>>
>>>>>
>>>>> qname                 all.q
>>>>> hostlist              @allhosts
>>>>> seq_no                0
>>>>> load_thresholds       np_load_avg=1.75
>>>>> suspend_thresholds    NONE
>>>>> nsuspend              1
>>>>> suspend_interval      00:05:00
>>>>> priority              0
>>>>> min_cpu_interval      00:05:00
>>>>> processors            UNDEFINED
>>>>> qtype                 BATCH INTERACTIVE
>>>>> ckpt_list             NONE
>>>>> pe_list               make
>>>>> rerun                 FALSE
>>>>> slots                 1
>>>>> tmpdir                /tmp
>>>>> shell                 /bin/csh
>>>>> prolog                NONE
>>>>> epilog                NONE
>>>>> shell_start_mode      posix_compliant
>>>>> starter_method        NONE
>>>>> suspend_method        NONE
>>>>> resume_method         NONE
>>>>> terminate_method      NONE
>>>>> notify                00:00:60
>>>>> owner_list            NONE
>>>>> user_lists            NONE
>>>>> xuser_lists           NONE
>>>>> subordinate_list      NONE
>>>>> complex_values        NONE
>>>>> projects              NONE
>>>>> xprojects             NONE
>>>>> calendar              NONE
>>>>> initial_state         default
>>>>> s_rt                  INFINITY
>>>>> h_rt                  INFINITY
>>>>> s_cpu                 INFINITY
>>>>> h_cpu                 INFINITY
>>>>> s_fsize               INFINITY
>>>>>
>>>>> On 9/3/07, Reuti <reuti at staff.uni-marburg.de> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 03.09.2007 at 05:19, amanyus wrote:
>>>>>>
>>>>>>> I've set up a very simple grid system consisting of two hosts.
>>>>>>> HostB has execd installed and was added as an execution host on
>>>>>>> HostA using qconf -ae. The problem is that when I issue qsub simple.sh
>>>>>>> from HostA, it won't run simple.sh on HostB. Jobs only run
>>>>>>> successfully, and locally, when submitted with qsub on HostB itself.
>>>>>>> Tips and guidance are much appreciated. Thanks.
>>>>>>>
>>>>>>> bash-3.00$ qsub simple.sh
>>>>>>> Unable to run job: warning: sgeadmin your job is not allowed to run in any queue
>>>>>>> Your job 3 ("simple.sh") has been submitted.
>>>>>>> Exiting.
>>>>>>> bash-3.00$
>>>>>>>
>>>>>>> even as root,
>>>>>>>
>>>>>>> bash-3.00# qsub simple.sh
>>>>>>> Unable to run job: warning: root your job is not allowed to run in any queue
>>>>>>> Your job 1 ("simple.sh") has been submitted.
>>>>>>> Exiting.
>>>>>>> bash-3.00#
>>>>>>
>>>>>> can you please post the SGE, queue and hostgroup configuration? -
>>>>>> Reuti
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>




