[GE users] schedd hangs with infinite loop :-((

Christian Kauhaus ckauhaus at informatik.uni-jena.de
Thu Apr 1 13:35:20 BST 2004


Andy Schwierskott <andy.schwierskott at sun.com>:
>   - did you recently upgrade your glibc version

We upgraded the Debian libc6 package from 2.3.2.ds1-9 to 2.3.2.ds1-11
about one week ago. But everything went quite well for that week,
with about 1500 jobs scheduled.

>   - or did you move the master machine to this new machine

No, the master has been on the same host all the time.

>   - or did you begin to use functional tickets

We have been using functional tickets for a while and never had any
problems with 5.3p5.

>Please send your scheduler config (qconf -ssconf) as well.

# qconf -ssconf
algorithm                  default
schedule_interval          00:00:30
maxujobs                   30
queue_sort_method          seqno
user_sort                  true
job_load_adjustments       np_load_avg=0.9
load_adjustment_decay_time 0:02:00
load_formula               np_load_avg*100000+swap_rate
schedd_job_info            true
sgeee_schedule_interval    00:02:30
halftime                   168
usage_weight_list          cpu=0.5,mem=0.25,io=0.25
compensation_factor        5
weight_user                0.2
weight_project             0.2
weight_jobclass            0.2
weight_department          0.2
weight_job                 0.2
weight_tickets_functional  10000
weight_tickets_share       100000
weight_tickets_deadline    10000
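
To illustrate how the load_formula above weighs its two terms, here is a
minimal sketch of the arithmetic only. The function name, host names and
sample values are invented for illustration; this is not how sge_schedd
evaluates the expression internally.

```python
def load_formula(np_load_avg, swap_rate):
    """Evaluate np_load_avg*100000+swap_rate for one host.

    np_load_avg: load average normalised by processor count
    swap_rate:   custom complex value, in bytes/sec
    """
    return np_load_avg * 100000 + swap_rate


# Hypothetical sample hosts (names and numbers are made up).
hosts = {
    "node01": {"np_load_avg": 0.25, "swap_rate": 0},
    "node02": {"np_load_avg": 0.125, "swap_rate": 50000},
}

# Rank hosts from lowest to highest combined load value.
ranking = sorted(hosts, key=lambda h: load_formula(**hosts[h]))
```

With these sample numbers node02 has the lower CPU load but still ends up
with the higher combined value, because 50000 bytes/sec of swap traffic
counts as much as an np_load_avg of 0.5.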

The complex value 'swap_rate' comes from a custom load sensor script,
since the built-in load sensor does not seem to work on arch glinux. It
is measured in bytes/sec. We need this because some of our machines tend
to run short on memory due to interactive usage.
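
For reference, a load sensor of this kind can be sketched roughly as
follows. This is not the script used here, only a minimal outline of the
usual load sensor dialogue: the execution daemon writes a line to the
sensor's stdin for each request, the sensor answers with a block of
host:name:value lines between "begin" and "end", and it exits on "quit".
The function names and the placeholder measurement are assumptions; the
actual swap measurement is left out.

```python
import sys


def format_report(host, values):
    """Build one report block: begin, host:name:value lines, end."""
    lines = ["begin"]
    for name, value in sorted(values.items()):
        lines.append(f"{host}:{name}:{value}")
    lines.append("end")
    return "\n".join(lines) + "\n"


def run_sensor(host, measure, stdin=sys.stdin, stdout=sys.stdout):
    """Answer every request line with a report until 'quit' arrives."""
    for line in stdin:
        if line.strip() == "quit":
            break
        stdout.write(format_report(host, measure()))
        stdout.flush()  # the daemon reads from a pipe, so do not buffer


def measure_placeholder():
    # Hypothetical stand-in: a real sensor would compute the swap
    # traffic in bytes/sec here.
    return {"swap_rate": 0}
```

A real script would call run_sensor() with the host name as known to
Grid Engine and a measuring function in place of measure_placeholder().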

It is also noteworthy that I actually got sge_schedd running again by
removing all jobs from the directories
$SGE_ROOT/default/spool/qmaster/jobs and
$SGE_ROOT/default/spool/qmaster/job_scripts. Of course I got some angry


Dipl.-Inf. Christian Kauhaus                               <><
Lehrstuhl Rechnerarchitektur und -kommunikation 
Institut fuer Informatik · Ernst-Abbe-Platz 1-2 · D-07743 Jena
Tel.: (+49) 3641 9 46376 · Fax: (+49) 3641 9 46372 · Raum 3217
