[GE users] asking about speeding up load reporting -- johnny layne

Johnny Layne laynejg at vcu.edu
Thu Sep 20 22:56:25 BST 2007



hi everyone,
     I hope that I can get some info from folks experienced with this.  
On a cluster with 116 nodes, we're trying to set things up so that users 
don't oversubscribe memory.  I don't want to depend on users adding "-l" 
options to their qsub scripts, because they just won't do it reliably.  So 
I've set aside a node just for me to play with, and I've been running some 
memory-intensive jobs and watching what happens as I adjust mem_free and 
other values for suspend thresholds.  Setting mem_free in "Suspend 
Thresholds" works really well: when my jobs start using too much memory, 
the suspend script I wrote kicks in.  All of that is fine.

     Now what I'm wondering is: how is it working for people who have 
tried to speed up the reporting of load values?  I just changed 
load_report_time for this 1-node queue from the default of 40 seconds to 
20 seconds, and watched my jobs using top and other methods.  The newest 
memory hog got suspended nicely and restarted OK as usual, and this time 
the suspend-restart-run cycle happened much more cleanly and quickly than 
it did with the 40-second report time.  That's really nice!  However, I'm 
sure that as I double the number of load reports, communication costs on 
the cluster get a lot worse.  Have any of you experimented with this sort 
of thing?  Do you have suggestions for testing report time values to find 
an optimal one, beyond the trial-and-error method I'm using?  Any other 
suggestions along these lines would be welcome too.  
Thanks a lot for any information!
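
     For a rough sense of scale, here is a back-of-envelope sketch in 
Python of what changing the report interval does on 116 nodes.  It is only 
an estimate under assumptions: the function name and the bytes_per_report 
value are hypothetical placeholders (the real report size would have to be 
measured), and the delay figures assume a threshold crossing happens at a 
random point within a report interval, ignoring scheduler and qmaster 
processing time.

```python
# Rough estimate of load-report traffic and threshold detection delay.
# bytes_per_report is an assumed placeholder, not a measured Grid Engine value.
def load_report_estimate(nodes, interval_s, bytes_per_report=2048,
                         link_bits_per_s=1e9):
    reports_per_s = nodes / interval_s            # reports arriving at the master
    bytes_per_s = reports_per_s * bytes_per_report
    return {
        "reports_per_s": reports_per_s,
        "bytes_per_s": bytes_per_s,
        # share of one gigabit link consumed by the reports alone
        "link_fraction": bytes_per_s * 8 / link_bits_per_s,
        # a crossing at a random moment waits half an interval on average
        "mean_delay_s": interval_s / 2,
        "worst_delay_s": interval_s,
    }


if __name__ == "__main__":
    for interval in (40, 20, 10):
        est = load_report_estimate(116, interval)
        print(f"interval={interval:>2}s  "
              f"reports/s={est['reports_per_s']:.1f}  "
              f"link share={est['link_fraction']:.6%}  "
              f"mean delay={est['mean_delay_s']:.0f}s")
```

     Under these assumptions, halving the interval doubles the report rate 
and halves the expected wait before a threshold crossing is noticed.  
Replacing the placeholder report size with a measured one would show 
whether the extra traffic matters on a given network.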

     Oh, and the cluster uses gigabit networking.
     johnny
