[GE users] Node exclusion by users

Iwona Sakrejda isakrejda at lbl.gov
Wed Aug 9 00:20:17 BST 2006



Reuti wrote:

>> This is part of the problem. I do not want to exclude any
>> nodes a priori. I only want users to have a chance
>> to avoid nodes they think are bad for them....
> 
> 
> Aha, now I got your intention. Is there at least one reasonable person
> available at weekends?
Might be, but not 365x24, so the management does not want users to know,
and I need to present a solution that does not depend on a reasonable
person being present.....

> We had some black holes in the cluster caused by a filled /tmp. There we
> now use the load sensor from
> http://gridengine.sunsource.net/howto/loadsensor.html
> and set up a load_threshold to close the queue if less than 1GB is
> free, to avoid it.
My most common black hole is when /scratch goes read-only
because of a corrupted file system. Testing for read-only is easy,
so it can be handled your way. But the nodes always find new ways
of going bad...
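
For the read-only case, a load sensor along these lines might be enough. This is only a rough sketch, not something we run: the complex name scratch_ro, the /scratch path and the helper names are made up for illustration, and it assumes the usual load sensor protocol (the execd writes a line to the sensor's stdin for each report, "quit" to stop it, and reads back a begin / host:name:value / end block). It also assumes that socket.gethostname() returns the host name the way Grid Engine knows it.

```python
#!/usr/bin/env python3
"""Sketch of a load sensor that reports whether /scratch is read-only."""
import socket
import sys
import tempfile


def is_writable(path):
    """Return True if a temporary file can be created and written under path."""
    try:
        with tempfile.TemporaryFile(dir=path) as handle:
            handle.write(b"x")
            handle.flush()
        return True
    except OSError:
        # Covers a read-only file system, a missing directory, no space, etc.
        return False


def format_report(host, complex_name, value):
    """Build one report block in the begin / host:name:value / end layout."""
    return "begin\n{}:{}:{}\nend\n".format(host, complex_name, value)


def main(path="/scratch", complex_name="scratch_ro"):
    """Answer each request on stdin with one report; stop on 'quit'."""
    host = socket.gethostname()  # assumption: matches the name SGE uses
    for line in sys.stdin:
        if line.strip() == "quit":
            break
        read_only = 0 if is_writable(path) else 1
        sys.stdout.write(format_report(host, complex_name, read_only))
        sys.stdout.flush()

# When installed as a load sensor, the script would end with a call to main().
```

The reported value is 1 when /scratch cannot be written and 0 otherwise, so a threshold on it could close the queue in the same way as the free-space check. The complex definition and the threshold itself are site-specific and not shown here.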

On the other hand, users submit hundreds of jobs, and the jobs
are not always that well tested, so I don't want to close
a node only because it started flushing jobs.

Anyway, between a list of hostgroups and wildcards in hostnames, we'll manage.

Thanks a lot,

Iwona


> 
> HTH - Reuti
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net




