[GE users] Node allocation considering network topology

Richard Ems r.ems at gmx.net
Sun Mar 5 13:28:08 GMT 2006



Reuti wrote:
> Strange: what was your qsub command? - Reuti

The first qsub command was:

# qsub -pe mpich* 8 ~/job1.sh

I also tried with

# qsub -pe mpich* 2 ~/job1.sh

but got the same result: "cannot run in PE "mpich_09" because it only
offers 0 slots" (and the same message for mpich_10, mpich_11 and mpich_12).
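A side note on the command itself: `mpich*` is passed unquoted, so the shell would expand it if the current directory happened to contain a file matching that pattern. In this case it reached qsub intact (qstat -j shows "parallel environment:  mpich* range: 2"), but quoting it removes the risk. The sketch below only demonstrates the shell behaviour in a scratch directory; the file name mpich_notes.txt is made up for the demo.

```shell
# Show why the PE wildcard passed to "qsub -pe" is safer quoted.
# mpich_notes.txt is a hypothetical file name used only for this demo.
demo_dir=$(mktemp -d)
touch "$demo_dir/mpich_notes.txt"
old_dir=$(pwd)
cd "$demo_dir" || exit 1
unquoted=$(printf '%s\n' mpich*)   # the shell expands this to mpich_notes.txt
quoted=$(printf '%s\n' 'mpich*')   # quoting keeps the literal pattern mpich*
cd "$old_dir" || exit 1
rm -rf "$demo_dir"
echo "unquoted: $unquoted"
echo "quoted:   $quoted"
# The quoted form of the original submission would be:
#   qsub -pe 'mpich*' 8 ~/job1.sh
```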



# cat job1.sh
echo hostname=`hostname`
sleep 30
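To see which PE and hosts a job actually received, job1.sh could print a little more than the hostname. This is a sketch assuming the standard Grid Engine job environment (PE, NSLOTS and PE_HOSTFILE, whose lines list host, slot count, queue instance and processor range); report_allocation is a hypothetical helper name, and uname -n stands in for hostname for portability.

```shell
# Hypothetical helper for job1.sh: print the execution host plus the
# parallel environment and slots Grid Engine granted to the job.
report_allocation() {
    echo "hostname=$(uname -n)"    # normally the same value hostname prints
    echo "pe=${PE:-unset} slots=${NSLOTS:-unset}"
    # PE_HOSTFILE lines look like: host slots queue_instance processor_range
    if [ -n "${PE_HOSTFILE:-}" ] && [ -r "$PE_HOSTFILE" ]; then
        while read -r host slots queue rest; do
            echo "granted $slots slot(s) on $host via $queue"
        done < "$PE_HOSTFILE"
    fi
}

report_allocation
```

Outside a parallel job (for example when run by hand) the PE and slot fields simply print as "unset" and the hostfile loop is skipped.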



# qstat -g c
CLUSTER QUEUE                   CQLOAD   USED  AVAIL  TOTAL aoACDS  cdsuE
-------------------------------------------------------------------------------
all.q                             -NA-      0      0      0      0      0
cluster09.q                   0.00      0      8      8      0      0
cluster10.q                   0.45      0      7      8      1      0
cluster11.q                   0.47      0      0      8      0      8
cluster12.q                   0.47      0      0      8      0      8
c_para                        0.47     15      0     15     15      0
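The cluster-queue summary can also be filtered mechanically. As a sketch, assuming the column order printed here (cluster queue, CQLOAD, USED, AVAIL, TOTAL, aoACDS, cdsuE), this awk filter lists the cluster queues that still report available slots; on this snapshot that is cluster09.q with 8 and cluster10.q with 7. queues_with_free_slots is a hypothetical name.

```shell
# Hypothetical filter: print "queue avail" for every cluster queue in
# `qstat -g c` output whose AVAIL column (4th field) is above zero.
# The first two lines are the header and the dashed separator.
queues_with_free_slots() {
    awk 'NR > 2 && $4 > 0 { print $1, $4 }'
}

# Live usage (needs a Grid Engine installation):
#   qstat -g c | queues_with_free_slots
```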



# qstat -j 854
==============================================================
job_number:                 854
exec_file:                  job_scripts/854
submission_time:            Sun Mar  5 14:21:05 2006
owner:                      ems
uid:                        501
group:                      users
gid:                        100
sge_o_home:                 /net/fs02/home/ems
sge_o_log_name:             ems
sge_o_path:                 /opt/sge/bin/lx24-x86:/usr/local/bin:/usr/bin:/usr/X11R6/bin:/bin:/usr/games:/opt/gnome/bin:/opt/kde3/bin:/usr/lib/java/bin
sge_o_shell:                /bin/bash
sge_o_workdir:              /net/fs02/home/ems
sge_o_host:                 fs02
account:                    sge
mail_list:                  ems at fs02.
notify:                     FALSE
job_name:                   job1.sh
jobshare:                   0
env_list:
script_file:                /net/fs02/home/ems/job1.sh
parallel environment:  mpich* range: 2
scheduling info:            queue instance "c_para@cn11001" dropped because it is overloaded: np_load_avg=0.480000 (no load adjustment) >= 0.25
                            queue instance "c_para@cn19001" dropped because it is overloaded: np_load_avg=0.475000 (no load adjustment) >= 0.25
                            queue instance "c_para@cn23001" dropped because it is overloaded: np_load_avg=0.435000 (no load adjustment) >= 0.25
                            queue instance "c_para@cn18001" dropped because it is overloaded: np_load_avg=0.490000 (no load adjustment) >= 0.25
                            queue instance "c_para@cn21001" dropped because it is overloaded: np_load_avg=0.450000 (no load adjustment) >= 0.25
                            queue instance "c_para@cn20001" dropped because it is overloaded: np_load_avg=0.465000 (no load adjustment) >= 0.25
                            queue instance "c_para@cn15001" dropped because it is overloaded: np_load_avg=0.490000 (no load adjustment) >= 0.25
                            queue instance "c_para@cn12001" dropped because it is overloaded: np_load_avg=0.485000 (no load adjustment) >= 0.25
                            queue instance "c_para@cn10001" dropped because it is overloaded: np_load_avg=0.485000 (no load adjustment) >= 0.25
                            queue instance "c_para@cn13001" dropped because it is overloaded: np_load_avg=0.450000 (no load adjustment) >= 0.25
                            queue instance "c_para@cn17001" dropped because it is overloaded: np_load_avg=0.470000 (no load adjustment) >= 0.25
                            queue instance "c_para@cn24001" dropped because it is overloaded: np_load_avg=0.455000 (no load adjustment) >= 0.25
                            queue instance "c_para@cn14001" dropped because it is overloaded: np_load_avg=0.480000 (no load adjustment) >= 0.25
                            queue instance "c_para@cn16001" dropped because it is overloaded: np_load_avg=0.465000 (no load adjustment) >= 0.25
                            queue instance "c_para@cn22001" dropped because it is overloaded: np_load_avg=0.375000 (no load adjustment) >= 0.25
                            queue instance "cluster12.q@cn12001" dropped because it is disabled
                            queue instance "cluster12.q@cn12002" dropped because it is disabled
                            queue instance "cluster12.q@cn12003" dropped because it is disabled
                            queue instance "cluster12.q@cn12004" dropped because it is disabled
                            queue instance "cluster12.q@cn12005" dropped because it is disabled
                            queue instance "cluster12.q@cn12006" dropped because it is disabled
                            queue instance "cluster12.q@cn12007" dropped because it is disabled
                            queue instance "cluster12.q@cn12008" dropped because it is disabled
                            queue instance "cluster11.q@cn11001" dropped because it is disabled
                            queue instance "cluster11.q@cn11002" dropped because it is disabled
                            queue instance "cluster11.q@cn11003" dropped because it is disabled
                            queue instance "cluster11.q@cn11004" dropped because it is disabled
                            queue instance "cluster11.q@cn11005" dropped because it is disabled
                            queue instance "cluster11.q@cn11006" dropped because it is disabled
                            queue instance "cluster11.q@cn11007" dropped because it is disabled
                            queue instance "cluster11.q@cn11008" dropped because it is disabled
                            cannot run in queue instance "cluster10.q@cn10005" because PE "mpich_09" is not in pe list
                            cannot run in queue instance "cluster10.q@cn10006" because PE "mpich_09" is not in pe list
                            cannot run in queue instance "cluster10.q@cn10003" because PE "mpich_09" is not in pe list
                            cannot run in queue instance "cluster10.q@cn10007" because PE "mpich_09" is not in pe list
                            cannot run in queue instance "cluster10.q@cn10002" because PE "mpich_09" is not in pe list
                            cannot run in queue instance "cluster10.q@cn10001" because PE "mpich_09" is not in pe list
                            cannot run in queue instance "cluster10.q@cn10008" because PE "mpich_09" is not in pe list
                            cannot run in queue instance "cluster10.q@cn10004" because PE "mpich_09" is not in pe list
                            cannot run in PE "mpich_09" because it only offers 0 slots
                            cannot run in queue instance "cluster09.q@cn09006" because PE "mpich_10" is not in pe list
                            cannot run in queue instance "cluster09.q@cn09005" because PE "mpich_10" is not in pe list
                            cannot run in queue instance "cluster09.q@cn09003" because PE "mpich_10" is not in pe list
                            cannot run in queue instance "cluster09.q@cn09002" because PE "mpich_10" is not in pe list
                            cannot run in queue instance "cluster09.q@cn09007" because PE "mpich_10" is not in pe list
                            cannot run in queue instance "cluster09.q@cn09004" because PE "mpich_10" is not in pe list
                            cannot run in queue instance "cluster09.q@cn09001" because PE "mpich_10" is not in pe list
                            cannot run in queue instance "cluster09.q@cn09008" because PE "mpich_10" is not in pe list
                            cannot run in PE "mpich_10" because it only offers 0 slots
                            cannot run in queue instance "cluster09.q@cn09006" because PE "mpich_11" is not in pe list
                            cannot run in queue instance "cluster09.q@cn09005" because PE "mpich_11" is not in pe list
                            cannot run in queue instance "cluster09.q@cn09003" because PE "mpich_11" is not in pe list
                            cannot run in queue instance "cluster09.q@cn09002" because PE "mpich_11" is not in pe list
                            cannot run in queue instance "cluster09.q@cn09007" because PE "mpich_11" is not in pe list
                            cannot run in queue instance "cluster09.q@cn09004" because PE "mpich_11" is not in pe list
                            cannot run in queue instance "cluster09.q@cn09001" because PE "mpich_11" is not in pe list
                            cannot run in queue instance "cluster09.q@cn09008" because PE "mpich_11" is not in pe list
                            cannot run in queue instance "cluster10.q@cn10005" because PE "mpich_11" is not in pe list
                            cannot run in queue instance "cluster10.q@cn10006" because PE "mpich_11" is not in pe list
                            cannot run in queue instance "cluster10.q@cn10003" because PE "mpich_11" is not in pe list
                            cannot run in queue instance "cluster10.q@cn10007" because PE "mpich_11" is not in pe list
                            cannot run in queue instance "cluster10.q@cn10002" because PE "mpich_11" is not in pe list
                            cannot run in queue instance "cluster10.q@cn10001" because PE "mpich_11" is not in pe list
                            cannot run in queue instance "cluster10.q@cn10008" because PE "mpich_11" is not in pe list
                            cannot run in queue instance "cluster10.q@cn10004" because PE "mpich_11" is not in pe list
                            cannot run in PE "mpich_11" because it only offers 0 slots
                            cannot run in queue instance "cluster09.q@cn09006" because PE "mpich_12" is not in pe list
                            cannot run in queue instance "cluster09.q@cn09005" because PE "mpich_12" is not in pe list
                            cannot run in queue instance "cluster09.q@cn09003" because PE "mpich_12" is not in pe list
                            cannot run in queue instance "cluster09.q@cn09002" because PE "mpich_12" is not in pe list
                            cannot run in queue instance "cluster09.q@cn09007" because PE "mpich_12" is not in pe list
                            cannot run in queue instance "cluster09.q@cn09004" because PE "mpich_12" is not in pe list
                            cannot run in queue instance "cluster09.q@cn09001" because PE "mpich_12" is not in pe list
                            cannot run in queue instance "cluster09.q@cn09008" because PE "mpich_12" is not in pe list
                            cannot run in queue instance "cluster10.q@cn10005" because PE "mpich_12" is not in pe list
                            cannot run in queue instance "cluster10.q@cn10006" because PE "mpich_12" is not in pe list
                            cannot run in queue instance "cluster10.q@cn10003" because PE "mpich_12" is not in pe list
                            cannot run in queue instance "cluster10.q@cn10007" because PE "mpich_12" is not in pe list
                            cannot run in queue instance "cluster10.q@cn10002" because PE "mpich_12" is not in pe list
                            cannot run in queue instance "cluster10.q@cn10001" because PE "mpich_12" is not in pe list
                            cannot run in queue instance "cluster10.q@cn10008" because PE "mpich_12" is not in pe list
                            cannot run in queue instance "cluster10.q@cn10004" because PE "mpich_12" is not in pe list
                            cannot run in PE "mpich_12" because it only offers 0 slots
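With a scheduling-info dump this long, it can help to tally the messages by kind before looking for the cause. The following is a minimal sketch, not a Grid Engine tool: summarize_sched_info is a hypothetical name, and the patterns simply match the message wording shown above (they also work when a mailer has wrapped the long lines).

```shell
# Hypothetical helper: tally the message kinds in the "scheduling info"
# section of `qstat -j <job>` output read from stdin, and list the PEs
# reported as "cannot run in PE".
summarize_sched_info() {
    awk '
        /because it is overloaded/ { overloaded++ }
        /because it is disabled/   { disabled++ }
        /is not in pe list/        { not_in_pe_list++ }
        /cannot run in PE "/ {
            # Splitting on double quotes leaves the PE name in parts[2].
            split($0, parts, "\"")
            rejected = rejected " " parts[2]
        }
        END {
            printf "overloaded=%d disabled=%d not_in_pe_list=%d\n", overloaded, disabled, not_in_pe_list
            printf "rejected_pes:%s\n", rejected
        }'
}

# Live usage (needs a Grid Engine installation):
#   qstat -j 854 | summarize_sched_info
```

Fed the scheduling info of job 854 above, it should report overloaded=15 disabled=16 not_in_pe_list=48 and rejected_pes: mpich_09 mpich_10 mpich_11 mpich_12.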
