[GE users] problems with load sensor

Reuti reuti at staff.uni-marburg.de
Tue Apr 26 10:06:52 BST 2005


Andreas Haupt wrote:
> Hello Reuti,
> 
> On Mon, 25 Apr 2005, Reuti wrote:
> 
>> Hi Andreas,
>>
>> you can check the messages files
>> ($SGE_ROOT/default/spool/qmaster/messages and
>> $SGE_ROOT/default/spool/pr360/messages) to see which configuration was
>> used and whether the load sensor was started at all.
> 
> 
> Thanks, but as I wrote, the load sensor gets started. In the messages 
> files there is also a note that it has been started.
> 
>> This would explain why the resource is unknown to this host. But it
>> will take some time until the load value shows up in qhost -F.
> 
> 
> Yes, I read that as well. But in the meantime I have restarted the exec
> daemon and even the master daemon. Nothing happened...
> 
>> If you want to have the load value also as a consumable, you may look 
>> into "man complex / Overriding attributes" for the behavior of SGE in 
>> this case and attach a value to the execution host (qconf -me pr360).

Can you try reporting only the short hostname (pr360) in the load sensor
output instead of the FQDN (pr360.ifh.de), and see what happens? - Reuti
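
A rough sketch of what I mean, in Python rather than your Perl script. It
follows the usual load sensor protocol (wait for a line on stdin, print one
begin/end block, stop on "quit") and reports under the short hostname. That
the short name is what execd expects here is only my guess, and all function
names below are made up for illustration:

```python
#!/usr/bin/env python3
"""Sketch of a load sensor reporting /tmp usage under the short hostname."""
import os
import socket
import sys


def short_hostname(name):
    """Strip the domain part: 'pr360.ifh.de' -> 'pr360'."""
    return name.split(".", 1)[0]


def tmp_values_kb(path="/tmp"):
    """Return (used, free, total) in KB for the filesystem holding `path`."""
    st = os.statvfs(path)

    def to_kb(blocks):
        return blocks * st.f_frsize // 1024

    used = to_kb(st.f_blocks - st.f_bfree)
    free = to_kb(st.f_bavail)  # space available to unprivileged users
    total = to_kb(st.f_blocks)
    return used, free, total


def format_report(host, used, free, total):
    """Build one begin/end block with host:attribute:value lines."""
    lines = [
        "begin",
        f"{host}:tmp_used:{used}K",
        f"{host}:tmp_free:{free}K",
        f"{host}:tmp_total:{total}K",
        "end",
    ]
    return "\n".join(lines) + "\n"


def run(stdin, stdout, host, probe=tmp_values_kb):
    """Emit one report per input line until 'quit' or end of input."""
    for line in iter(stdin.readline, ""):
        if line.strip() == "quit":
            break
        stdout.write(format_report(host, *probe()))
        stdout.flush()  # the reader sits on a pipe, so flush every report


# To use this as the actual sensor script, end the file with:
#     run(sys.stdin, sys.stdout, short_hostname(socket.gethostname()))
```

If the values then show up in qhost -F, the hostname form was the problem;
if not, the sensor output itself is probably fine and we have to look
elsewhere.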

> 
> After setting a fixed value for tmp_free with "qconf -me pr360", it did
> show up in qhost -F. But after removing that complex value again, it
> disappeared from qhost as well.
> 
>> What platform are you running on?
> 
> 
> Scientific Linux 3.04 (IA32)
> 
> Greetings
> Andreas
> 
>> Andreas Haupt wrote:
>>
>>> Hello,
>>>
>>> I wrote my own load sensor. It produces the following output after 
>>> each line feed:
>>>
>>> begin
>>> pr360.ifh.de:tmp_used:1386400K
>>> pr360.ifh.de:tmp_free:25244996K
>>> pr360.ifh.de:tmp_total:28056596K
>>> end
>>>
>>> This sensor is registered and even started on the exec host. But I do 
>>> not get the data!
>>>
>>> [fuchur] ~ % qconf -sconf pr360.ifh.de
>>> pr360.ifh.de:
>>> load_sensor                  /usr1/scratch/ahaupt/sge/sensor.pl
>>> prolog                       /usr1/scratch/ahaupt/sge/test1
>>> epilog                       /usr1/scratch/ahaupt/sge/test2
>>>
>>> The prolog and epilog scripts are not executed either, but that's
>>> another story...
>>>
>>> The complexes are configured as follows:
>>>
>>> [fuchur] ~ % qconf -sc | grep tmp_
>>> tmp_free            tf         MEMORY      <=    YES         YES        0 0
>>> tmp_total           tt         MEMORY      <=    YES         NO         0 0
>>> tmp_used            tu         MEMORY      >=    NO          NO         0 0
>>>
>>> [fuchur] ~ % qhost -F tmp_free -h pr360.ifh.de
>>> HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
>>> -------------------------------------------------------------------------------
>>> global                  -               -     -       -       -       -       -
>>> pr360                   ia32            1  0.00  226.8M  120.0M 1024.0M   16.3M
>>>
>>> As you can see it's not there... I even submitted a test job with the 
>>> new complexes:
>>>
>>> [fuchur] ~ % qsub -l tmp_free=10G -l hostname=pr360 SGE/jobs/test
>>> Your job 79201 ("test") has been submitted.
>>>
>>> But it won't be started:
>>>
>>> [fuchur] ~ % qstat -j 79201 | grep "unknown resource"
>>>                             (-l hostname=pr360.ifh.de,tmp_free=10G) cannot run in queue instance "pr360-long.q@pr360.ifh.de" because job requests unknown resource (tmp_free)
>>>                             (-l hostname=pr360.ifh.de,tmp_free=10G) cannot run in queue instance "pr360-short.q@pr360.ifh.de" because job requests unknown resource (tmp_free)
>>>
>>> I do not understand this, because when I submit a job requesting a
>>> truly unknown resource, it is rejected right at submission:
>>>
>>> [fuchur] ~ % qsub -l blabla=10G -l hostname=pr360 SGE/jobs/test
>>> Unable to run job: unknown resource "blabla".
>>> Exiting.
>>>
>>> Everything runs under SGE 6.0u3. Any hints?
>>>
>>> Thanks in advance
>>> Andreas
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>
> 

