[GE users] Transfer queue(s)

Charu Chaubal Charu.Chaubal at Sun.COM
Thu Oct 7 19:02:28 BST 2004


Hello,

See below:

Marcel Turcotte wrote:
> Hi!
> 
> I would like to implement a transfer queue, as proposed here,
> 
> http://gridengine.sunsource.net/project/gridengine/howto/TransferQueues/transferqueues.html
> 
> I am running sge-5_3p2 in a Solaris 9 (sparc) environment; the
> masters and execution hosts are in the same NIS domain and share
> the same disk space via NFS.
> 
> I've got two cells: homa_cell and clicc_cell, and I would like to
> automatically transfer jobs to clicc_cell whenever possible.
> 
> Homa and simorgh are the masters for homa_cell and clicc_cell,
> respectively.
> 
> I am calling the transfer queue toclicc, and it is defined as
> follows.
> 
> homa% qconf -sq toclicc
> 
> qname                toclicc
> hostname             simorgh.site.uottawa.ca
> seq_no               0
> load_thresholds      mpk27jobs=5
> suspend_thresholds   NONE
> nsuspend             1
> suspend_interval     00:05:00
> priority             1
> min_cpu_interval     00:05:00
> processors           UNDEFINED
> qtype                BATCH
> rerun                FALSE
> slots                2
> tmpdir               /tmp
> shell                /bin/csh
> shell_start_mode     NONE
> prolog               NONE
> epilog               NONE
> starter_method       /local/sge/scripts/toclicc_starter.sh
> suspend_method       /local/sge/scripts/toclicc_suspend.sh
> resume_method        /local/sge/scripts/toclicc_resume.sh
> terminate_method     /local/sge/scripts/toclicc_terminate.sh
> notify               00:00:60
> owner_list           NONE
> user_lists           NONE
> xuser_lists          NONE
> subordinate_list     NONE
> complex_list         NONE
> complex_values       NONE
> calendar             NONE
> initial_state        enabled
> s_rt                 INFINITY
> h_rt                 INFINITY
> s_cpu                INFINITY
> h_cpu                INFINITY
> s_fsize              INFINITY
> h_fsize              INFINITY
> s_data               INFINITY
> h_data               INFINITY
> s_stack              INFINITY
> h_stack              INFINITY
> s_core               INFINITY
> h_core               INFINITY
> s_rss                INFINITY
> h_rss                INFINITY
> s_vmem               INFINITY
> h_vmem               INFINITY
> 
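The four *_method scripts are the heart of a transfer queue: sge_execd
runs the starter method in place of the job itself, so the starter can
forward the job to the remote cell, while the suspend/resume/terminate
methods relay the corresponding signals to it.  The actual
toclicc_starter.sh is not shown in this thread; a hypothetical sketch
of such a starter, loosely following the TransferQueues HOWTO, might
look like this:

----------------------------------------------------------------------
#!/bin/sh
# Hypothetical transfer-queue starter method -- NOT the actual
# toclicc_starter.sh, whose contents are not shown here.
# sge_execd invokes the starter with the job script (and arguments);
# forward the job to the remote cell and poll until it is gone, so
# that the local slot stays occupied in the meantime.

SGE_ROOT=/local/sge
SGE_CELL=clicc_cell          # the remote cell
export SGE_ROOT SGE_CELL

ARCH=`$SGE_ROOT/util/arch`
PATH=/bin:/usr/bin:$SGE_ROOT/bin/$ARCH

# qsub prints 'your job <id> ("<name>") has been submitted' on success
jobid=`qsub "$@" | awk '{print $3}'`

# qstat -j exits non-zero once the job has left the remote system
while qstat -j "$jobid" > /dev/null 2>&1; do
   sleep 30
done
----------------------------------------------------------------------
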
> ----------------------------------------------------------------------
> The host simorgh is defined as follows:
> 
> homa% qconf -sconf simorgh
> simorgh.site.uottawa.ca:
> load_sensor               /local/sge/scripts/clicc_clusterload.sh
> qlogin_daemon             /usr/sbin/in.telnetd
> load_report_time          00:00:09
> rlogin_daemon             /usr/sbin/in.rlogind
> 
> where /local/sge/scripts/clicc_clusterload.sh is
> 
> ----------------------------------------------------------------------
> #!/bin/sh
> 
> # load sensor to report the number of pending jobs on a cluster
> # NOTE: make sure the host which this script runs on is a submit host
> # for the remote cluster
> # below should contain path to SGE_ROOT of cluster being queried
> 
> SGE_ROOT=/local/sge
> SGE_CELL=clicc_cell
> export SGE_ROOT SGE_CELL
> 
> ARCH=`$SGE_ROOT/util/arch`
> 
> PATH=/bin:/usr/bin:$SGE_ROOT/bin/$ARCH
> 
> end=false
> while [ $end = false ]; do
> 
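>    # count pending jobs (-s p): the awk filter keeps only lines that
>    # begin with a job id, so qstat's header lines are not counted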
>    jobs=`qstat -s p -g d | awk '$1 ~ /^[0-9]/ {print $0} '| wc -l`
> 
>    # ----------------------------------------
>    # wait for the next request from sge_execd, which writes a line
>    # to our stdin each load_report_time interval; EOF or "quit"
>    # means shut down
>    #
>    read input
>    if [ $? != 0 ]; then
>       end=true
>       break
>    fi
>    
>    if [ "$input" = "quit" ]; then
>       end=true
>       break
>    fi
> 
>    echo "begin"
>    echo "global:mpk27jobs:$jobs"
>    echo "end"
> 
> done
> 
> ----------------------------------------------------------------------
> 
> Running clicc_clusterload.sh from homa seems to produce the expected
> result, i.e. it shows the number of pending jobs in clicc_cell.
> 
> homa% /local/sge/scripts/clicc_clusterload.sh
> 
> begin
> global:mpk27jobs:       1
> end
> 
> begin
> global:mpk27jobs:       0
> end
> 
> ...
> 
> ----------------------------------------------------------------------
> 
> However, the queue is not available:
> 
> homa% qstat -alarm
> 
> queuename            qtype used/tot. load_avg arch      states
> ----------------------------------------------------------------------------
> toclicc              B     0/2       99.99    none      au
>         error: no load value for threshold mpk27jobs
> 

The load_avg of 99.99 and the "au" (alarm/unknown) state indicate that
the sge_execd daemon is either down or unable to connect to the qmaster
(assuming the system itself is not having problems).  Since the load
sensor is started by sge_execd, no mpk27jobs value ever reaches the
qmaster, hence the alarm.  Please check why this is the case first.
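
A quick way to narrow it down (a sketch; the spool path below assumes
the default 5.3 layout and may differ on your install):

----------------------------------------------------------------------
#!/bin/sh
# Check whether the execd behind the toclicc queue is alive and able
# to talk to the qmaster.  Run this on simorgh.

SGE_ROOT=/local/sge
SGE_CELL=homa_cell           # the cell in which toclicc is defined
export SGE_ROOT SGE_CELL

ARCH=`$SGE_ROOT/util/arch`
PATH=/bin:/usr/bin:$SGE_ROOT/bin/$ARCH

# is the daemon running at all?
ps -ef | grep '[s]ge_execd'

# does the qmaster still receive load reports from the host?
qhost -h simorgh.site.uottawa.ca

# the execd messages file usually names the connection problem
tail -20 $SGE_ROOT/$SGE_CELL/spool/simorgh/messages
----------------------------------------------------------------------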

Regards,
	Charu


> I am not really clear about mpk27jobs.  I've defined it as an
> attribute of the host complex:
> 
> homa% qconf -sc host
> 
> #name            shortcut   type  value  relop  requestable  consumable  default
> #--------------------------------------------------------------------------------
> ...
> mpk27jobs        mpk27jobs  INT   0      <=     YES          NO           0
> 
> ----------------------------------------------------------------------
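
For reference, the three pieces must agree on the attribute name: the
complex entry declares mpk27jobs, the load sensor reports a value for
it, and the queue's load_thresholds consumes it.  A sketch, using only
the names already shown in this thread:

----------------------------------------------------------------------
# host complex entry (edit with: qconf -mc host)
#   name       shortcut   type  value  relop  requestable  consumable  default
    mpk27jobs  mpk27jobs  INT   0      <=     YES          NO          0

# one report cycle printed by the load sensor on its stdout
begin
global:mpk27jobs:0
end

# queue attribute that consumes the value (edit with: qconf -mq toclicc)
load_thresholds      mpk27jobs=5
----------------------------------------------------------------------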
> 
> I would really appreciate it if someone could take the time to
> describe in a bit more detail the procedure for creating a transfer
> queue.
> 
> Any pointers will be greatly appreciated,
> Marcel
> 

-- 
####################################################################
# Charu V. Chaubal              # Phone: (650) 786-7672 (x87672)   #
# Grid Computing Technologist   # Fax:   (650) 786-4591            #
# Sun Microsystems, Inc.        # Email: charu.chaubal at sun.com     #
####################################################################

