[GE users] Trying to use starter_method with sge 6.0

Robert Olson olson at mcs.anl.gov
Mon Nov 8 05:09:29 GMT 2004


Hi --

I'm trying to use the starter_method queue property with an SGE 6.0
installation. As far as I can tell, this property is being ignored,
even though the documentation seems to show it should work.

I ended up running sge_execd with debugging on, and the parameter
simply doesn't show up as being read. I know the queue configs
are being seen, since if I change another value in them I see the
change show up here.
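
One way to cross-check what the qmaster has stored for a queue, independent of the execd debug output, is to pull the attribute out of "qconf -sq". This is only a minimal sketch: the helper name queue_attr and the queue name all.q are placeholders, and it assumes one "name value" pair per output line.

```shell
#!/bin/sh
# Print the value of one attribute from "qconf -sq <queue>" style output
# read on stdin. Assumes one "name value" pair per line.
queue_attr() {
    awk -v key="$1" '$1 == key { $1 = ""; sub(/^ +/, ""); print }'
}

# Example (all.q is a placeholder queue name):
#   qconf -sq all.q | queue_attr starter_method
```

If this prints nothing or "NONE", the setting never reached the queue definition, and the problem is on the configuration side rather than in the execd.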

What I'm trying to do is run some portable scripts across different
clusters that have different filesystem layouts; the starter_method
would be a great way to do this. I've tried running a script without
the starter, assuming the execvp() in the shepherd would find it in the
path, but apparently the PATH that the execd is running with is
different from the one I gave it (I started it from my shell).
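
For illustration, a starter script of the kind described here might look roughly like the sketch below. It relies on the behaviour documented in queue_conf(5), where the job and its arguments are passed to the job starter as arguments. The directories in SITE_BIN_DIRS and the helper name site_path are made up for the example; each cluster would substitute its own layout.

```shell
#!/bin/sh
# Hypothetical job starter for use as a queue's starter_method.
# SGE runs this executable in place of the configured shell and passes
# the job and its arguments as "$@".

# Site-specific directories to search first; these paths are placeholders.
SITE_BIN_DIRS="/usr/local/bin:/opt/site/bin"

# Prepend the site directories to an existing PATH value.
site_path() {
    printf '%s:%s\n' "$SITE_BIN_DIRS" "$1"
}

# Only hand off when there is a job command to run.
if [ "$#" -gt 0 ]; then
    PATH=$(site_path "$PATH")
    export PATH
    exec "$@"
fi
```

The queue's starter_method would then point at the absolute path of this script (for example via "qconf -mq <queue>"), so the same job script can be submitted unchanged on clusters with different layouts.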

Anyway, a copy of the debugging output showing the queueing
configuration follows. If anyone has any insights, I'd love to hear
them. (So far SGE has had a mechanism for every problem I've run into,
so I'm hoping this one does too ...)

thanks,
--bob

olson@tg-c052:~/SGE/default/common> env SGE_DEBUG_LEVEL="10 0 0 0 0 0 0 0" ../../bin/lx24-ia64/sge_execd
      0  18953 1024     Getting host by name - Linux
      1  18953 1024     1 names in h_addr_list
      2  18953 1024     0 names in h_aliases
      3  18953 1024     me.who                      >19<
      4  18953 1024     me.sge_formal_prog_name     >execd<
      5  18953 1024     me.qualified_hostname       >tg-c052.uc.teragrid.org<
      6  18953 1024     me.unqualified_hostname     >tg-c052<
      7  18953 1024     me.uid                      >762<
      8  18953 1024     me.gid                      >100<
      9  18953 1024     me.daemonized               >0<
     10  18953 1024     me.user_name                >olson<
     11  18953 1024     me.default_cell             >default<
     12  18953 1024     sge_root            >/home/olson/SGE<
     13  18953 1024     cell_root           >/home/olson/SGE/default<
     14  18953 1024     conf_file           >/home/olson/SGE/default/common/bootstrap<
     15  18953 1024     bootstrap_file      >/home/olson/SGE/default/common/configuration<
     16  18953 1024     act_qmaster_file    >/home/olson/SGE/default/common/act_qmaster<
     17  18953 1024     acct_file           >/home/olson/SGE/default/common/accounting<
     18  18953 1024     reporting_file      >/home/olson/SGE/default/common/reporting<
     19  18953 1024     local_conf_dir      >/home/olson/SGE/default/common/local_conf<
     20  18953 1024     shadow_masters_file >/home/olson/SGE/default/common/shadow_masters<
     21  18953 1024     admin_user          >olson<
     22  18953 1024     default_domain      >uc.teragrid.org<
     23  18953 1024     ignore_fqdn         >false<
     24  18953 1024     spooling_method     >classic<
     25  18953 1024     spooling_lib        >libspoolc<
     26  18953 1024     spooling_params     >/home/olson/SGE/default/common;/home/olson/SGE/default/spool/qmaster<
     27  18953 1024     binary_path         >/home/olson/SGE/bin<
     28  18953 1024     qmaster_spool_dir   >/home/olson/SGE/default/spool/qmaster<
     29  18953 1024     security_mode        >none<
     30  18953 1024     ../libs/gdi/sge_any_request.c 228 starting up communication without threads
     31  18953 1024     Getting host by name - Linux
     32  18953 1024     1 names in h_addr_list
     33  18953 1024     0 names in h_aliases
     34  18953 1024     me.qualified_hostname: tg-c052.uc.teragrid.org
     35  18953 1024     returning port value: 10111
     36  18953 1024     Getting host by name - Linux
     37  18953 1024     1 names in h_addr_list
     38  18953 1024     0 names in h_aliases
     39  18953 1024     returning port value: 10112
     40  18953 1024     Getting host by name - Linux
     41  18953 1024     1 names in h_addr_list
     42  18953 1024     0 names in h_aliases
     43  18953 1024     ../libs/gdi/sge_any_request.c 362 created communication handel for component name "execd"
     44  18953 1024     qualified hostname: tg-c052.uc.teragrid.org
     45  18953 1024     get_configuration: unique for tg-c052.uc.teragrid.org: tg-c052.uc.teragrid.org
     46  18953 1024     requesting global and tg-c052.uc.teragrid.org
     47  18953 1024     reresolve port timeout in 600
     48  18953 1024     returning cached port value: 10111
     49  18953 1024     ../libs/gdi/sge_any_request.c 571 received from: a6.339.sc04.org,1
     50  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "log_warning" for loglevel
     51  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "/home/olson/SGE/default/spool" for execd_spool_dir
     52  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "/usr/bin/mail" for mailer
     53  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "/usr/bin/X11/xterm" for xterm
     54  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "none" for load_sensor
     55  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "/home/olson/SGE/transfer-prolog $fs_stdin_file_staging $fs_stdin_host $fs_stdin_path $fs_stdin_tmp_p" for prolog
     56  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "/home/olson/SGE/transfer-epilog $fs_stdout_file_staging $fs_stdout_host $fs_stdout_path $fs_stdout_t" for epilog
     57  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "posix_compliant" for shell_start_mode
     58  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "sh,ksh,csh,tcsh" for login_shells
     59  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "0" for min_uid
     60  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "0" for min_gid
     61  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "20000-20100" for gid_range
     62  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "00:00:40" for load_report_time
     63  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "false" for enforce_project
     64  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "auto" for enforce_user
     65  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "00:05:00" for max_unheard
     66  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "log_warning" for loglevel
     67  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "none" for administrator_mail
     68  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "none" for set_token_cmd
     69  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "none" for pag_cmd
     70  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "none" for token_extend_time
     71  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "none" for shepherd_cmd
     72  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "none" for qmaster_params
     73  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "none" for execd_params
     74  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "accounting=true reporting=false flush_time=00:00:15 joblog=false sharelog=00:00:00" for reporting_params
     75  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "100" for finished_jobs
     76  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "/usr/libexec/telnetd" for qlogin_daemon
     77  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "telnet" for qlogin_command
     78  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "none" for rsh_daemon
     79  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "none" for rsh_command
     80  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "/usr/libexec/rlogind" for rlogin_daemon
     81  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "none" for rlogin_command
     82  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "00:00:00" for reschedule_unknown
     83  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "2000" for max_aj_instances
     84  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "75000" for max_aj_tasks
     85  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "0" for max_u_jobs
     86  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "0" for max_jobs
     87  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "0" for reprioritize
     88  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "0" for auto_user_oticket
     89  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "0" for auto_user_fshare
     90  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "none" for auto_user_default_project
     91  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "100" for auto_user_delete_time
     92  18953 1024     ../libs/sgeobj/sge_conf.c 294 using "true" for delegated_file_staging
     93  18953 1024     conf.execd_spool_dir        >/home/olson/SGE/default/spool<
     94  18953 1024     conf.mailer                 >/usr/bin/mail<
     95  18953 1024     conf.prolog                 >/home/olson/SGE/transfer-prolog $fs_stdin_file_staging $fs_stdin_host $fs_stdin_path $fs_stdin_tmp_path<
     96  18953 1024     conf.epilog                 >/home/olson/SGE/transfer-epilog $fs_stdout_file_staging $fs_stdout_host $fs_stdout_path $fs_stdout_tmp_path $fs_stderr_file_staging $fs_stderr_host $fs_stderr_path $fs_stderr_tmp_path<
     97  18953 1024     conf.shell_start_mode       >posix_compliant<
     98  18953 1024     conf.login_shells           >sh,ksh,csh,tcsh<
     99  18953 1024     conf.administrator_mail     >none<
    100  18953 1024     conf.min_gid                >0<
    101  18953 1024     conf.min_uid                >0<
    102  18953 1024     conf.load_report_time       >40<
    103  18953 1024     conf.max_unheard            >300<
    104  18953 1024     conf.loglevel               >4<
    105  18953 1024     conf.xterm                  >/usr/bin/X11/xterm<
    106  18953 1024     conf.load_sensor            >none<
    107  18953 1024     conf.enforce_project        >false<
    108  18953 1024     conf.enforce_user           >auto<
    109  18953 1024     conf.set_token_cmd          >none<
    110  18953 1024     conf.pag_cmd                >none<
    111  18953 1024     conf.token_extend_time      >0<
    112  18953 1024     conf.shepherd_cmd           >none<
    113  18953 1024     conf.qmaster_params         >none<
    114  18953 1024     conf.execd_params           >none<
    115  18953 1024     conf.gid_range              >20000-20100<
    116  18953 1024     conf.zombie_jobs            >100<
    117  18953 1024     conf.qlogin_daemon          >/usr/libexec/telnetd<
    118  18953 1024     conf.qlogin_command         >telnet<
    119  18953 1024     conf.rsh_daemon             >none<
    120  18953 1024     conf.rsh_command            >none<
    121  18953 1024     conf.rlogin_daemon          >/usr/libexec/rlogind<
    122  18953 1024     conf.rlogin_command         >none<
    123  18953 1024     conf.reschedule_unknown     >0<
    124  18953 1024     conf.max_aj_instances       >2000<
    125  18953 1024     conf.max_aj_tasks           >75000<
    126  18953 1024     conf.max_u_jobs             >0<
    127  18953 1024     conf.max_jobs               >0<
    128  18953 1024     conf.reprioritize           >0<
    129  18953 1024     conf.auto_user_oticket      >0<
    130  18953 1024     conf.auto_user_fshare       >0<
    131  18953 1024     conf.auto_user_default_project >none<
    132  18953 1024     conf.auto_user_delete_time  >100<
    133  18953 1024     conf.delegated_file_staging >true<
    134  18953 1024     me.qualified_hostname: tg-c052.uc.teragrid.org
    135  18953 1024     chdir("/")----------------------------
    136  18953 1024     Making directories----------------------------
    137  18953 1024     chdir("/home/olson/SGE/default/spool")----------------------------
    138  18953 1024     chdir("tg-c052",me.unqualified_hostname)--------------------------
    139  18953 1024     Making directories----------------------------
    140  18953 1024     use_qidle: 0
    141  18953 1024     ---> 0.000000 0.000000 0.000000 - 0
    142  18953 1024     *****Checking In With qmaster*****
    143  18953 1024     reresolve port timeout in 600
    144  18953 1024     returning cached port value: 10111
    145  18953 1024     ../libs/gdi/sge_any_request.c 571 received from: a6.339.sc04.org,1
    146  18953 1024     ../daemons/common/shutdown.c 65 starting up 6.0u1
    147  18953 1024     ../daemons/execd/execd.c 223 User 'root' did not start the application
    148  18953 1024     ../daemons/execd/execd.c 231 successfully started PDC and PTF

    149  18953 1024     ../daemons/execd/reaper_execd.c 1224 checking for old jobs
    150  18953 1024     ../daemons/execd/reaper_execd.c 1245 no old jobs at startup
    151  18953 1024     ALIVE TEST OF MASTER
    152  18953 1024     ../libs/gdi/sge_any_request.c 688 qmaster is still running
    153  18953 1024     ../libs/gdi/sge_any_request.c 693 endpoint is up since 15 seconds and has status 0
    154  18953 1024     SENDING LOAD AND REPORTS
    155  18953 1024      REPORT_JOB
    156  18953 1024     reresolve port timeout in 600
    157  18953 1024     returning cached port value: 10111
    158  18953 1024     ../libs/gdi/sge_security.c 370 fromcommproc is empty string
    159  18953 1024     ../libs/gdi/sge_security.c 389 standard gdi request to qmaster
    160  18953 1024     receive_message_cach_n_ack() returns: got no message (//0)
    161  18953 1024     No jobs to start
    162  18953 1024     ALIVE TEST OF MASTER
    163  18953 1024     ../libs/gdi/sge_any_request.c 688 qmaster is still running
    164  18953 1024     ../libs/gdi/sge_any_request.c 693 endpoint is up since 16 seconds and has status 0
    165  18953 1024     SENDING LOAD AND REPORTS
    166  18953 1024      REPORT_LOAD
    167  18953 1024     ---> 0.000000 0.000000 0.000000 - 0
    168  18953 1024      REPORT_CONF
    169  18953 1024      REPORT_LICENSE
    170  18953 1024      REPORT_JOB
    171  18953 1024     reresolve port timeout in 598
    172  18953 1024     returning cached port value: 10111
    173  18953 1024     ----> was_communication_error: no error happened (1000)
    174  18953 1024     ====================[ DISPATCH EPOCH ]===========================
    175  18953 1024     ../libs/gdi/sge_security.c 370 fromcommproc is empty string
    176  18953 1024     ../libs/gdi/sge_security.c 389 standard gdi request to qmaster
    177  18953 1024     receive_message_cach_n_ack() returns: got no message (//0)
    178  18953 1024     ====================[ DISPATCH EPOCH ]===========================
    179  18953 1024     ../libs/gdi/sge_security.c 370 fromcommproc is empty string
    180  18953 1024     ../libs/gdi/sge_security.c 389 standard gdi request to qmaster
    181  18953 1024     receive_message_cach_n_ack() returns: got no message (//0)
    182  18953 1024     ====================[ DISPATCH EPOCH ]===========================
    183  18953 1024     ../libs/gdi/sge_security.c 370 fromcommproc is empty string
    184  18953 1024     ../libs/gdi/sge_security.c 389 standard gdi request to qmaster
    185  18953 1024     receive_message_cach_n_ack() returns: got no message (//0)
    186  18953 1024     ====================[ DISPATCH EPOCH ]===========================
    187  18953 1024     ../daemons/common/shutdown.c 96 controlled shutdown 6.0u1
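
To check quickly whether a given parameter shows up in a dump like the one above, the names can be pulled out of the sge_conf.c lines. This is a rough sketch: it assumes the output has been saved with one log entry per line (as the terminal prints it, not as wrapped in this mail), and both the helper name list_conf_keys and the file name execd-debug.log are made up for the example.

```shell
#!/bin/sh
# Extract parameter names from execd debug lines of the form
#   ... sge_conf.c <line> using "<value>" for <name>
# read on stdin, one log entry per line.
list_conf_keys() {
    sed -n 's/.*sge_conf\.c [0-9]* using ".*" for \([a-z_]*\)$/\1/p'
}

# Example (execd-debug.log is a placeholder for the saved output):
#   list_conf_keys < execd-debug.log | grep -x starter_method
```

An empty result from the grep means the parameter was not among those reported at this debug level.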

