[GE users] problems with load sensor

Andreas Haupt ahaupt at ifh.de
Mon Apr 25 14:53:35 BST 2005


Hello,

I wrote my own load sensor. Each time it receives a line feed on stdin, it
produces the following output:

begin
pr360.ifh.de:tmp_used:1386400K
pr360.ifh.de:tmp_free:25244996K
pr360.ifh.de:tmp_total:28056596K
end
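
A minimal sketch of such a sensor loop, assuming the usual load sensor protocol: the execution daemon writes a newline to the sensor's stdin to request a report and the string "quit" to ask it to exit, and each report is a begin/end block of host:name:value lines. It is written in Python rather than Perl purely for illustration; format_report and run_sensor are hypothetical names, not part of SGE, and the host name passed in is assumed to be the one the cluster knows the exec host by. The sketch flushes stdout after every report, because a report that sits in an output buffer never reaches the daemon, so that is one thing worth checking in any sensor.

```python
import shutil


def format_report(host, path="/tmp"):
    """Build one begin/end load report for the file system holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    values = {
        "tmp_used": usage.used,
        "tmp_free": usage.free,
        "tmp_total": usage.total,
    }
    lines = ["begin"]
    for name, num_bytes in values.items():
        # Report in kilobytes with a K suffix, like the output shown above.
        lines.append(f"{host}:{name}:{num_bytes // 1024}K")
    lines.append("end")
    return "\n".join(lines) + "\n"


def run_sensor(stdin, stdout, host, path="/tmp"):
    """Answer every input line with a report until EOF or 'quit'."""
    while True:
        line = stdin.readline()
        if not line or line.strip() == "quit":
            break
        stdout.write(format_report(host, path))
        # Without this flush the report may stay in the pipe buffer.
        stdout.flush()
```

A deployed script would end with a call such as run_sensor(sys.stdin, sys.stdout, socket.gethostname()) after importing sys and socket. It can be tried by hand before registering it: start it in a terminal, press Enter, and a complete block should appear immediately.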

This sensor is registered and even started on the exec host. But I do not 
get the data!

[fuchur] ~ % qconf -sconf pr360.ifh.de
pr360.ifh.de:
load_sensor                  /usr1/scratch/ahaupt/sge/sensor.pl
prolog                       /usr1/scratch/ahaupt/sge/test1
epilog                       /usr1/scratch/ahaupt/sge/test2

The prolog and epilog scripts are not executed either, but that's
another story...

The complexes are configured as follows:

[fuchur] ~ % qconf -sc | grep tmp_
tmp_free            tf         MEMORY      <=    YES         YES        0        0
tmp_total           tt         MEMORY      <=    YES         NO         0        0
tmp_used            tu         MEMORY      >=    NO          NO         0        0
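
To make the columns of these rows explicit, here is a small sketch that splits one row into named fields. It assumes the column order of the "qconf -sc" header in SGE 6.0 (name, shortcut, type, relop, requestable, consumable, default, urgency), which is worth checking against the header line on the installation in question; ComplexEntry and parse_complex_line are hypothetical helper names.

```python
from typing import NamedTuple


class ComplexEntry(NamedTuple):
    """One row of the complex configuration, in assumed column order."""
    name: str
    shortcut: str
    type: str
    relop: str
    requestable: str
    consumable: str
    default: str
    urgency: str


def parse_complex_line(line):
    """Split a whitespace-separated complex row into a ComplexEntry."""
    fields = line.split()
    if len(fields) != len(ComplexEntry._fields):
        raise ValueError(f"expected 8 columns, got {len(fields)}: {line!r}")
    return ComplexEntry(*fields)


entry = parse_complex_line(
    "tmp_free            tf         MEMORY      <=    YES         YES        0        0"
)
print(entry.name, entry.requestable, entry.consumable)
```

Read this way, tmp_free is requestable and marked as consumable, while tmp_total and tmp_used are not consumable.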

[fuchur] ~ % qhost -F tmp_free -h pr360.ifh.de
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
pr360                   ia32            1  0.00  226.8M  120.0M 1024.0M   16.3M

As you can see, tmp_free is not listed. I even submitted a test job
requesting the new complex:

[fuchur] ~ % qsub -l tmp_free=10G -l hostname=pr360 SGE/jobs/test
Your job 79201 ("test") has been submitted.

But it does not get started:

[fuchur] ~ % qstat -j 79201 | grep "unknown resource"
                             (-l hostname=pr360.ifh.de,tmp_free=10G) cannot run in queue instance "pr360-long.q@pr360.ifh.de" because job requests unknown resource (tmp_free)
                             (-l hostname=pr360.ifh.de,tmp_free=10G) cannot run in queue instance "pr360-short.q@pr360.ifh.de" because job requests unknown resource (tmp_free)

I do not understand this, because when I submit a job requesting a resource
that really is unknown, it is rejected right at submission:

[fuchur] ~ % qsub -l blabla=10G -l hostname=pr360 SGE/jobs/test
Unable to run job: unknown resource "blabla".
Exiting.

Everything runs under SGE 6.0u3. Any hints?

Thanks in advance
Andreas

-- 
| Andreas Haupt                      | E-Mail:  andreas.haupt at desy.de
|  DESY Zeuthen                      | WWW:     http://www.desy.de/~ahaupt
|  Platanenallee 6                   | Phone:   +49/33762/7-7359
|  D-15738 Zeuthen                   | Fax:     +49/33762/7-7216
