[GE users] Transfer queue(s)

Marcel Turcotte turcotte at site.uottawa.ca
Thu Oct 7 18:05:49 BST 2004



Hi!

I would like to implement a transfer queue, as described in this howto:

http://gridengine.sunsource.net/project/gridengine/howto/TransferQueues/transferqueues.html

I am running sge-5_3p2 in a Solaris 9 (SPARC) environment. The masters
and execution hosts are in the same NIS domain and share the same
disk space over NFS.

I've got two cells: homa_cell and clicc_cell, and I would like to
automatically transfer jobs to clicc_cell whenever possible.

homa and simorgh are the masters for homa_cell and clicc_cell,
respectively.

I am calling the transfer queue toclicc, and it is defined as
follows.

homa% qconf -sq toclicc

qname                toclicc
hostname             simorgh.site.uottawa.ca
seq_no               0
load_thresholds      mpk27jobs=5
suspend_thresholds   NONE
nsuspend             1
suspend_interval     00:05:00
priority             1
min_cpu_interval     00:05:00
processors           UNDEFINED
qtype                BATCH
rerun                FALSE
slots                2
tmpdir               /tmp
shell                /bin/csh
shell_start_mode     NONE
prolog               NONE
epilog               NONE
starter_method       /local/sge/scripts/toclicc_starter.sh
suspend_method       /local/sge/scripts/toclicc_suspend.sh
resume_method        /local/sge/scripts/toclicc_resume.sh
terminate_method     /local/sge/scripts/toclicc_terminate.sh
notify               00:00:60
owner_list           NONE
user_lists           NONE
xuser_lists          NONE
subordinate_list     NONE
complex_list         NONE
complex_values       NONE
calendar             NONE
initial_state        enabled
s_rt                 INFINITY
h_rt                 INFINITY
s_cpu                INFINITY
h_cpu                INFINITY
s_fsize              INFINITY
h_fsize              INFINITY
s_data               INFINITY
h_data               INFINITY
s_stack              INFINITY
h_stack              INFINITY
s_core               INFINITY
h_core               INFINITY
s_rss                INFINITY
h_rss                INFINITY
s_vmem               INFINITY
h_vmem               INFINITY

----------------------------------------------------------------------
The host simorgh is defined as follows:

homa% qconf -sconf simorgh
simorgh.site.uottawa.ca:
load_sensor               /local/sge/scripts/clicc_clusterload.sh
qlogin_daemon             /usr/sbin/in.telnetd
load_report_time          00:00:09
rlogin_daemon             /usr/sbin/in.rlogind

where /local/sge/scripts/clicc_clusterload.sh is

----------------------------------------------------------------------
#!/bin/sh

# load sensor to report number of running jobs on a cluster
# NOTE: make sure the host which this script runs on is a submit host
# for the remote cluster
# below should contain path to SGE_ROOT of cluster being queried

SGE_ROOT=/local/sge
SGE_CELL=clicc_cell
export SGE_ROOT SGE_CELL

ARCH=`$SGE_ROOT/util/arch`

PATH=/bin:/usr/bin:$SGE_ROOT/bin/$ARCH

end=false
while [ $end = false ]; do

   jobs=`qstat -s p -g d | awk '$1 ~ /^[0-9]/ {print $0} '| wc -l`

   # ----------------------------------------
   # wait for an input
   #
   read input
   if [ $? != 0 ]; then
      end=true
      break
   fi
   
   if [ "$input" = "quit" ]; then
      end=true
      break
   fi

   echo "begin"
   echo "global:mpk27jobs:$jobs"
   echo "end"

done

----------------------------------------------------------------------
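
The counting step in the script above pipes awk into wc -l, and on some
systems wc pads its result with leading blanks. A minimal sketch of doing
the count in awk alone, which prints a bare integer, follows. The sample
qstat output is made up for illustration (job IDs, user name, and dates
are hypothetical), and whether the padding matters to the execution
daemon is an assumption that has not been verified here.

```shell
#!/bin/sh
# Hypothetical sample of pending-job output; every value below is invented.
sample='job-ID  prior name       user     state submit/start at     queue
-------------------------------------------------------------------------
    101     0 sleeper.sh user1    qw    10/07/2004 17:58:01
    102     0 sleeper.sh user1    qw    10/07/2004 17:58:03'

# Count lines whose first field starts with a digit, entirely in awk.
# n+0 prints 0 when nothing matched, and no padding is emitted.
jobs=`printf '%s\n' "$sample" | awk '$1 ~ /^[0-9]/ {n++} END {print n+0}'`

echo "global:mpk27jobs:$jobs"
```

In the real sensor, the printf of the sample would be replaced by the
qstat call already used in the script.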

Running clicc_clusterload.sh by hand on homa seems to produce the
expected result, i.e. it reports the number of pending jobs in clicc_cell
each time I press Enter:

homa% /local/sge/scripts/clicc_clusterload.sh

begin
global:mpk27jobs:       1
end

begin
global:mpk27jobs:       0
end

...
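
The manual run above can also be driven without a terminal. The sketch
below uses a stand-in function, fake_sensor (a hypothetical name), that
copies the read/report loop of the script but returns a fixed job count
instead of calling qstat, so it runs anywhere. Each empty input line
requests one report, and "quit" ends the loop.

```shell
#!/bin/sh
# Stand-in for the real load sensor: same read/report loop, but with a
# fixed, made-up job count instead of a qstat call.
fake_sensor() {
   jobs=3
   while read input; do
      if [ "$input" = "quit" ]; then
         break
      fi
      echo "begin"
      echo "global:mpk27jobs:$jobs"
      echo "end"
   done
}

# Two empty lines request two reports; "quit" then stops the sensor.
report=`printf '\n\nquit\n' | fake_sensor`
echo "$report"
```

Piping the same printf input into /local/sge/scripts/clicc_clusterload.sh
should exercise the real script in the same way, assuming qstat is
reachable from the host where it runs.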

----------------------------------------------------------------------

However, the queue is not available:

homa% qstat -alarm

queuename            qtype used/tot. load_avg arch      states
----------------------------------------------------------------------------
toclicc              B     0/2       99.99    none      au
        error: no load value for threshold mpk27jobs

I am not sure how mpk27jobs is supposed to be defined. I have added it
as an attribute of the host complex:

homa% qconf -sc host

#name            shortcut   type   value           relop requestable consumable default
#--------------------------------------------------------------------------------------
...

mpk27jobs        mpk27jobs  INT    0               <=    YES         NO         0

----------------------------------------------------------------------

I would really appreciate it if someone could take the time to
describe, in a bit more detail, the procedure for creating a transfer
queue.

Any pointers would be greatly appreciated,
Marcel

