[GE users] problems with load sensor

Andreas Haupt ahaupt at ifh.de
Tue Apr 26 08:04:31 BST 2005


Hello Reuti,

On Mon, 25 Apr 2005, Reuti wrote:

> Hi Andreas,
>
> you can check in the message files ($SGE_ROOT/default/spool/qmaster/messages 
> and $SGE_ROOT/default/spool/pr360/messages), what configuration was used and 
> whether the load sensor was started at all.

Thanks, but as I wrote, the load sensor gets started. In the messages 
files there is also a note that it has been started.
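
For comparison, here is a minimal sketch in Python of the request/report loop a load sensor is expected to run, producing the same begin ... end block that is quoted further down. It is not the sensor.pl from this thread. The function names build_report and serve are made up for illustration, and the choice of /tmp as the measured path is an assumption. The sketch flushes after every report because the output goes to a pipe rather than a terminal.

```python
import shutil


def build_report(host, path="/tmp"):
    """Return the lines of one load report for `path`, sizes in units of 1024 bytes."""
    usage = shutil.disk_usage(path)
    values = {
        "tmp_used": usage.used // 1024,
        "tmp_free": usage.free // 1024,
        "tmp_total": usage.total // 1024,
    }
    lines = ["begin"]
    # One "host:name:value" line per reported load value.
    lines += [f"{host}:{name}:{value}K" for name, value in values.items()]
    lines.append("end")
    return lines


def serve(stdin, stdout, host, path="/tmp"):
    """Write one report per input line; stop on "quit" or end of input."""
    while True:
        line = stdin.readline()
        if not line or line.strip() == "quit":
            break
        stdout.write("\n".join(build_report(host, path)) + "\n")
        # Flush explicitly so the report is not held back in a pipe buffer.
        stdout.flush()
```

To try it as a real sensor one would call serve(sys.stdin, sys.stdout, host) at the end of the script. Which host name to pass is an assumption to verify locally: it should presumably be the name under which the execution host is registered.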

> This would explain why the resource is unknown to this host. But it will
> take some time until the load value shows up in qhost -F.

Yes, I read that as well. But in the meantime I have restarted the exec
daemon and even the master daemon, and nothing changed.

> If you want to have the load value also as a consumable, you may look into 
> "man complex / Overriding attributes" for the behavior of SGE in this case 
> and attach a value to the execution host (qconf -me pr360).

After setting a fixed value for tmp_free with "qconf -me pr360", it showed
up in qhost -F. But after removing that value again, it disappeared from
the qhost output as well.

> What platform are you running on?

Scientific Linux 3.04 (IA32)

Greetings
Andreas

> Andreas Haupt wrote:
>> Hello,
>> 
>> I wrote my own load sensor. It produces the following output after each 
>> line feed:
>> 
>> begin
>> pr360.ifh.de:tmp_used:1386400K
>> pr360.ifh.de:tmp_free:25244996K
>> pr360.ifh.de:tmp_total:28056596K
>> end
>> 
>> This sensor is registered and even started on the exec host. But I do not 
>> get the data!
>> 
>> [fuchur] ~ % qconf -sconf pr360.ifh.de
>> pr360.ifh.de:
>> load_sensor                  /usr1/scratch/ahaupt/sge/sensor.pl
>> prolog                       /usr1/scratch/ahaupt/sge/test1
>> epilog                       /usr1/scratch/ahaupt/sge/test2
>> 
>> The prolog and epilog scripts are not executed either, but that's
>> another story...
>> 
>> The complexes are configured this way:
>> 
>> [fuchur] ~ % qconf -sc | grep tmp_
>> tmp_free            tf         MEMORY      <=    YES         YES        0 0
>> tmp_total           tt         MEMORY      <=    YES         NO         0 0
>> tmp_used            tu         MEMORY      >=    NO          NO         0 0
>> 
>> [fuchur] ~ % qhost -F tmp_free -h pr360.ifh.de
>> HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
>> -------------------------------------------------------------------------------
>> global                  -               -     -       -       -       -       -
>> pr360                   ia32            1  0.00  226.8M  120.0M 1024.0M   16.3M
>> 
>> As you can see it's not there... I even submitted a test job with the new 
>> complexes:
>> 
>> [fuchur] ~ % qsub -l tmp_free=10G -l hostname=pr360 SGE/jobs/test
>> Your job 79201 ("test") has been submitted.
>> 
>> But it won't be started:
>> 
>> [fuchur] ~ % qstat -j 79201 | grep "unknown resource"
>>                             (-l hostname=pr360.ifh.de,tmp_free=10G) cannot run in queue instance "pr360-long.q@pr360.ifh.de" because job requests unknown resource (tmp_free)
>>                             (-l hostname=pr360.ifh.de,tmp_free=10G) cannot run in queue instance "pr360-short.q@pr360.ifh.de" because job requests unknown resource (tmp_free)
>> 
>> I do not understand this, because if I submit a job that requests a
>> resource that really is unknown, it is rejected at submission time:
>> 
>> [fuchur] ~ % qsub -l blabla=10G -l hostname=pr360 SGE/jobs/test
>> Unable to run job: unknown resource "blabla".
>> Exiting.
>> 
>> Everything runs under sge 6.0u3. Any hints?
>> 
>> Thanks in advance
>> Andreas
>> 
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
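
A side note on the numbers in the quoted message: once the load value is known to the host, the request tmp_free=10G would be compared against the reported 25244996K using the "<=" relation from the complex definition. The sketch below works through that comparison in Python. The helper names to_bytes and request_fits are made up for illustration, and the multipliers follow the usual Grid Engine convention for memory values (lowercase suffixes decimal, uppercase suffixes binary), which is worth checking against the man pages of the installed version.

```python
# Multipliers for memory values: lowercase is decimal, uppercase is binary.
MULTIPLIERS = {
    "k": 1000, "K": 1024,
    "m": 1000 ** 2, "M": 1024 ** 2,
    "g": 1000 ** 3, "G": 1024 ** 3,
}


def to_bytes(value):
    """Convert a memory value such as "10G" or "25244996K" to bytes."""
    suffix = value[-1]
    if suffix in MULTIPLIERS:
        return int(float(value[:-1]) * MULTIPLIERS[suffix])
    return int(float(value))


def request_fits(requested, reported):
    """Relation "<=": the request matches when requested <= reported."""
    return to_bytes(requested) <= to_bytes(reported)


# Values taken from the quoted message.
print(request_fits("10G", "25244996K"))
```

So the amount requested is not the obstacle here; the job stays pending because the host does not report tmp_free at all.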

-- 
| Andreas Haupt                      | E-Mail:  andreas.haupt at desy.de
|  DESY Zeuthen                      | WWW:     http://www.desy.de/~ahaupt
|  Platanenallee 6                   | Phone:   +49/33762/7-7359
|  D-15738 Zeuthen                   | Fax:     +49/33762/7-7216
