[GE users] Job does not run on the other host.

amanyus amanyus at gmail.com
Mon Sep 3 15:23:58 BST 2007


Is it because I don't use a network-shared $SGE_ROOT?

Here I ran simple.sh after installing execd on both hosts.
Each host has also been added as a submit host and execution host
on the other.

bash-3.00$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
      1 0.55500 simple.sh  sgeadmin     r     09/03/2007 22:16:03 all.q@unknown                      1

The queue column says all.q@unknown.

The job can only run locally. If I submit more jobs, they just wait
to run on the local host, which is not what we expect from a grid
system.
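
For reference, the two things Reuti asks about in the quoted thread
below (does `hostname` return a real name, and do both machines appear
in `qconf -sel`) can be checked with a couple of small shell helpers.
This is only a rough sketch, not an official SGE tool: the function
names is_placeholder_hostname and missing_exec_hosts are made up for
illustration, the list of placeholder names is an assumption, and the
host names sun and unknown in the usage lines are simply the ones that
`qconf -sh` reports in this thread.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of Grid Engine).

# Succeeds (exit 0) if the given name looks like an unset or placeholder
# hostname. The list of names is an assumption; extend it as needed.
is_placeholder_hostname() {
    case "$1" in
        ""|unknown|localhost|localhost.localdomain) return 0 ;;
        *) return 1 ;;
    esac
}

# Reads a host list on stdin, one name per line (the format that
# `qconf -sel` prints), and echoes every expected host passed as an
# argument that does not appear in that list.
missing_exec_hosts() {
    host_list=$(cat)
    for expected in "$@"; do
        if ! printf '%s\n' "$host_list" | grep -qxF -- "$expected"; then
            printf '%s\n' "$expected"
        fi
    done
}
```

On the master host this could be used roughly as follows:

    is_placeholder_hostname "$(hostname)" && echo "hostname looks unset"
    qconf -sel | missing_exec_hosts sun unknown

With the `qconf -sel` output quoted in this thread (only unknown), the
second command would print sun, the host that is missing from the
execution host list. On Solaris the grep options used here may need
/usr/xpg4/bin/grep instead of /usr/bin/grep.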


On Sep 3, 2007, at 7:33 PM, Reuti wrote:

> On 03.09.2007 at 13:09, Aman Yus wrote:
>
>> Yes, the hostname is unknown. The /etc/hosts file is accurate and
>> these hosts can identify each other.
>
> But the other machine must also show up in `qconf -sel`. Are you
> sharing $SGE_ROOT, including the default/common directory inside it,
> between both nodes? - Reuti
>
>> bash-3.00# hostname
>> unknown
>>
>>
>> On 9/3/07, Reuti <reuti at staff.uni-marburg.de> wrote:
>>> Hi,
>>>
>>> On 03.09.2007 at 10:29, Aman Yus wrote:
>>>
>>>> bash-3.2$ qconf -sh
>>>> sun
>>>> unknown
>>>>
>>>> bash-3.2$ qconf -sel
>>>> unknown
>>>
>>> This seems to be a problem with the name resolution of one machine.
>>> So this machine is identified as "unknown" by the `hostname` command?
>>> Both machines should show up here.
>>>
>>> -- Reuti
>>>
>>>> bash-3.2$ qconf -sql
>>>> all.q
>>>>
>>>> bash-3.2$ qconf -sconf
>>>> global:
>>>> execd_spool_dir              /opt/n1ge6/default/spool
>>>> mailer                       /bin/mailx
>>>> xterm                        /usr/openwin/bin/xterm
>>>> load_sensor                  none
>>>> prolog                       none
>>>> epilog                       none
>>>> shell_start_mode             posix_compliant
>>>> login_shells                 sh,ksh,csh,tcsh
>>>> min_uid                      0
>>>> min_gid                      0
>>>> user_lists                   none
>>>> xuser_lists                  none
>>>> projects                     none
>>>> xprojects                    none
>>>> enforce_project              false
>>>> enforce_user                 auto
>>>> load_report_time             00:00:40
>>>> max_unheard                  00:05:00
>>>> reschedule_unknown           00:00:00
>>>> loglevel                     log_warning
>>>> administrator_mail           codept at gmail.com
>>>> set_token_cmd                none
>>>> pag_cmd                      none
>>>> token_extend_time            none
>>>> shepherd_cmd                 none
>>>> qmaster_params               none
>>>> execd_params                 none
>>>> reporting_params             accounting=true reporting=false \
>>>>                              flush_time=00:00:15 joblog=false sharelog=00:00:00
>>>> finished_jobs                100
>>>> gid_range                    50000-50100
>>>> qlogin_command               telnet
>>>> qlogin_daemon                /usr/sbin/in.telnetd
>>>> rlogin_daemon                /usr/sbin/in.rlogind
>>>> max_aj_instances             2000
>>>> max_aj_tasks                 75000
>>>> max_u_jobs                   0
>>>> max_jobs                     0
>>>> auto_user_oticket            0
>>>> auto_user_fshare             0
>>>> auto_user_default_project    none
>>>> auto_user_delete_time        86400
>>>> delegated_file_staging       false
>>>> reprioritize                 0
>>>>
>>>>
>>>> qname                 all.q
>>>> hostlist              @allhosts
>>>> seq_no                0
>>>> load_thresholds       np_load_avg=1.75
>>>> suspend_thresholds    NONE
>>>> nsuspend              1
>>>> suspend_interval      00:05:00
>>>> priority              0
>>>> min_cpu_interval      00:05:00
>>>> processors            UNDEFINED
>>>> qtype                 BATCH INTERACTIVE
>>>> ckpt_list             NONE
>>>> pe_list               make
>>>> rerun                 FALSE
>>>> slots                 1
>>>> tmpdir                /tmp
>>>> shell                 /bin/csh
>>>> prolog                NONE
>>>> epilog                NONE
>>>> shell_start_mode      posix_compliant
>>>> starter_method        NONE
>>>> suspend_method        NONE
>>>> resume_method         NONE
>>>> terminate_method      NONE
>>>> notify                00:00:60
>>>> owner_list            NONE
>>>> user_lists            NONE
>>>> xuser_lists           NONE
>>>> subordinate_list      NONE
>>>> complex_values        NONE
>>>> projects              NONE
>>>> xprojects             NONE
>>>> calendar              NONE
>>>> initial_state         default
>>>> s_rt                  INFINITY
>>>> h_rt                  INFINITY
>>>> s_cpu                 INFINITY
>>>> h_cpu                 INFINITY
>>>> s_fsize               INFINITY
>>>>
>>>> On 9/3/07, Reuti <reuti at staff.uni-marburg.de> wrote:
>>>>> Hi,
>>>>>
>>>>> On 03.09.2007 at 05:19, amanyus wrote:
>>>>>
>>>>>> I've set up a very simple grid system consisting of two hosts.
>>>>>> HostB has execd installed and was added as an execution host on
>>>>>> HostA using qconf -ae. The problem is that when I issue qsub
>>>>>> simple.sh from HostA, it won't run simple.sh on HostB. Jobs only
>>>>>> run successfully when submitted with qsub on HostB itself, where
>>>>>> they run locally. Tips and guidance are much appreciated. Thanks.
>>>>>>
>>>>>> bash-3.00$ qsub simple.sh
>>>>>> Unable to run job: warning: sgeadmin your job is not allowed to run in any queue
>>>>>> Your job 3 ("simple.sh") has been submitted.
>>>>>> Exiting.
>>>>>> bash-3.00$
>>>>>>
>>>>>> even as root,
>>>>>>
>>>>>> bash-3.00# qsub simple.sh
>>>>>> Unable to run job: warning: root your job is not allowed to run in any queue
>>>>>> Your job 1 ("simple.sh") has been submitted.
>>>>>> Exiting.
>>>>>> bash-3.00#
>>>>>
>>>>> Can you please post the SGE, queue and hostgroup configuration? - Reuti
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net



