[GE users] Load average problem again

Anand S Bisen vmlinuz at abisen.com
Mon Aug 16 16:21:36 BST 2004


Reuti,

Exactly. Here is how the whole process is set up now. We continuously
generate lots of data files that are pushed to the cluster. As soon as these
data files are received they are submitted to SGE. Each SGE job in turn uses
Perl scripts that do different kinds of processing on these data files, and
it spawns a tree of processes: 1.pl opens the data file and generates tens
of thousands of small files. Then 2.pl is executed on different nodes, each
with its share of the small files. One instance of 2.pl opens one small
file, performs some computation, calls 3.pl and waits for 3.pl to finish;
3.pl is in turn linked with 4.pl. Due to the nature of the scripts they
actually run for a very short time, but the initiation and termination of
all these scripts takes a lot of time if it is done through SGE, and most of
the time the processor is free. At any given point in time there can be tens
of these jobs running on the cluster, but even then the load average becomes
high because of so many small waiting pl scripts, so jobs keep waiting in
the queue for the load to come down.
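
To picture the fan-out step described above (1.pl produces many small files
and each node's 2.pl works on its share), here is a rough Python sketch. It
is only an illustration, not the actual Perl code: split_into_shares, the
round-robin policy and the file names are all assumptions.

```python
from typing import List


def split_into_shares(files: List[str], n_nodes: int) -> List[List[str]]:
    """Deal the small files out round-robin, one share per node."""
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    shares: List[List[str]] = [[] for _ in range(n_nodes)]
    for i, name in enumerate(files):
        shares[i % n_nodes].append(name)
    return shares


if __name__ == "__main__":
    # Hypothetical file names; the real ones are not given in this thread.
    small_files = [f"chunk_{i:05d}.dat" for i in range(10)]
    for node, share in enumerate(split_into_shares(small_files, 4)):
        print(node, share)
```

Each share would then be handled by one worker on its node, which is where
the per-file 2.pl -> 3.pl -> 4.pl chain is started.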

What would be an appropriate configuration for SGE? Right now I have
increased the slots to 28 on each node and set the load threshold
np_load_avg = 50. The interesting thing is that the load average goes up to
50, and at that point CPU usage is around 95% on our dual-processor machines
with 28/28 slots full. But is this the right way of doing things?
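
For reference, np_load_avg in SGE is the load average normalized by the
number of processors the host reports (num_proc), so it is not the same
number as the raw load average. A small Python sketch of the arithmetic,
assuming num_proc is 2 on these machines (with hyperthreading it may be
reported as 4); the function names are made up for illustration:

```python
def normalized_load(load_avg: float, num_proc: int) -> float:
    """Return the load average divided by the processor count."""
    if num_proc < 1:
        raise ValueError("num_proc must be at least 1")
    return load_avg / num_proc


def over_threshold(load_avg: float, num_proc: int, threshold: float) -> bool:
    """True when the normalized load has reached the given threshold."""
    return normalized_load(load_avg, num_proc) >= threshold


if __name__ == "__main__":
    # Raw load average of 50 on an assumed two-processor host.
    print(normalized_load(50.0, 2))        # 25.0
    # Compared against a threshold of 50 on the normalized value.
    print(over_threshold(50.0, 2, 50.0))   # False
```

Under that assumption, a raw load average of 50 corresponds to a normalized
value of 25, which is still below a threshold of 50.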

Thanks

Anand
 

-----Original Message-----
From: Reuti [mailto:reuti at staff.uni-marburg.de] 
Sent: Sunday, August 15, 2004 3:15 AM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] Load average problem again 

Hi,

>our dual Pentium 4 Xeon 40 node cluster. The cluster is working on 
>bioinformatics applications that are developed using perl scripts that 
>call each other and wait for each other to finish. Hence at any given 
>point of time there are many executing scripts that are actually 
>waiting and this increases the load average artificially. If I increase 
>my load_threshold on

Can you provide some more details about your scripts? Do they start up as
serial jobs, then start something in the background and poll for the
results?

Reuti

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net







