[GE users] problems with load sensor

Reuti reuti at staff.uni-marburg.de
Mon Apr 25 16:05:37 BST 2005


Hi Andreas,

you can check the messages files
($SGE_ROOT/default/spool/qmaster/messages and
$SGE_ROOT/default/spool/pr360/messages) to see which configuration was used and
whether the load sensor was started at all.
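
A quick way to scan both files is sketched below. The function name
check_sensor_messages is made up for illustration, and grepping for "sensor"
is only an assumption about what the relevant lines contain; adjust the
pattern to what you actually see in the files.

```shell
# Print sensor-related lines from the qmaster and execd messages files.
# Assumption: both files live under one spool directory, as in the paths above.
check_sensor_messages() {
    spool_dir="$1"   # e.g. $SGE_ROOT/default/spool
    host="$2"        # e.g. pr360
    for f in "$spool_dir/qmaster/messages" "$spool_dir/$host/messages"; do
        if [ -r "$f" ]; then
            echo "== $f"
            grep -i "sensor" "$f" || echo "(no line mentions a sensor)"
        else
            echo "== $f (not readable)"
        fi
    done
}
```

It would be called as: check_sensor_messages "$SGE_ROOT/default/spool" pr360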

If the sensor was never started, that would explain why the resource is
unknown to this host. Even once it is running, it will take some time
until the load value shows up in qhost -F.
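
To rule out the script itself, it can help to compare it with a minimal
sensor run by hand. The sketch below follows the load sensor protocol as I
understand it: execd sends a newline on stdin for each report it wants and
the string "quit" on shutdown. Taking the value from df on /tmp and the host
name from uname -n are assumptions; the name has to match the one SGE uses
for the host (pr360.ifh.de in your output).

```shell
#!/bin/sh
# Minimal load sensor sketch (not the original sensor.pl).
# Assumption: tmp_free is the free space df reports for /tmp.

sensor_host=$(uname -n)   # must match the host name SGE knows, e.g. pr360.ifh.de

report() {
    # df -P prints one line per filesystem; field 4 is the available space in KB.
    free_kb=$(df -Pk /tmp | awk 'NR==2 {print $4}')
    echo "begin"
    echo "$sensor_host:tmp_free:${free_kb}K"
    echo "end"
}

sensor_loop() {
    # One report per line read from stdin; stop on "quit" or end of input.
    while read -r line; do
        if [ "$line" = "quit" ]; then
            return 0
        fi
        report
    done
}

# Uncomment when installing the file as the load_sensor script:
# sensor_loop
```

With the last line enabled, pressing Enter on the terminal should print one
begin/end block each time, and typing quit should end the script.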

If you also want to use the load value as a consumable, have a look
at "man complex / Overriding attributes" for how SGE behaves in
this case, and attach a value to the execution host (qconf -me pr360).
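
As a sketch of that last step: in the editor opened by qconf -me pr360, the
capacity goes into the complex_values line of the host configuration. The
25G below is only an assumed example value, roughly the tmp_free your sensor
reports; the other lines of the configuration stay as they are.

```text
hostname              pr360.ifh.de
complex_values        tmp_free=25G
```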

What platform are you running on?

Cheers - Reuti


Andreas Haupt wrote:
> Hello,
> 
> I wrote my own load sensor. It produces the following output after each 
> line feed:
> 
> begin
> pr360.ifh.de:tmp_used:1386400K
> pr360.ifh.de:tmp_free:25244996K
> pr360.ifh.de:tmp_total:28056596K
> end
> 
> This sensor is registered and even started on the exec host. But I do 
> not get the data!
> 
> [fuchur] ~ % qconf -sconf pr360.ifh.de
> pr360.ifh.de:
> load_sensor                  /usr1/scratch/ahaupt/sge/sensor.pl
> prolog                       /usr1/scratch/ahaupt/sge/test1
> epilog                       /usr1/scratch/ahaupt/sge/test2
> 
> The prolog and epilog scripts are not executed either, but that's
> another story...
> 
> The complexes are configured as follows:
> 
> [fuchur] ~ % qconf -sc | grep tmp_
> tmp_free            tf         MEMORY      <=    YES         YES        0 0
> tmp_total           tt         MEMORY      <=    YES         NO         0 0
> tmp_used            tu         MEMORY      >=    NO          NO         0 0
> 
> [fuchur] ~ % qhost -F tmp_free -h pr360.ifh.de
> HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
> -------------------------------------------------------------------------------
> global                  -               -     -       -       -       -       -
> pr360                   ia32            1  0.00  226.8M  120.0M 1024.0M   16.3M
> 
> As you can see it's not there... I even submitted a test job with the 
> new complexes:
> 
> [fuchur] ~ % qsub -l tmp_free=10G -l hostname=pr360 SGE/jobs/test
> Your job 79201 ("test") has been submitted.
> 
> But it won't be started:
> 
> [fuchur] ~ % qstat -j 79201 | grep "unknown resource"
>                             (-l hostname=pr360.ifh.de,tmp_free=10G) cannot
> run in queue instance "pr360-long.q@pr360.ifh.de" because job requests
> unknown resource (tmp_free)
>                             (-l hostname=pr360.ifh.de,tmp_free=10G) cannot
> run in queue instance "pr360-short.q@pr360.ifh.de" because job requests
> unknown resource (tmp_free)
> 
> I do not understand this because if I try to submit jobs with an 
> "unknown resource" the following happens:
> 
> [fuchur] ~ % qsub -l blabla=10G -l hostname=pr360 SGE/jobs/test
> Unable to run job: unknown resource "blabla".
> Exiting.
> 
> Everything runs under sge 6.0u3. Any hints?
> 
> Thanks in advance
> Andreas
> 

