[GE users] SGE6 does not backfill
christian.bolliger at id.unizh.ch
Fri Apr 15 16:16:18 BST 2005
How many files are open on your master system (lsof | wc -l)?
Have you also checked the hard file descriptor limit ('ulimit -Hn' or
'limit -h descriptors')?
What happened here is that we lost jobs because of the hard file
descriptor limit (but we have 256 nodes).
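To see how close a single process (e.g. sge_qmaster) is to its descriptor limits on Linux, a small Python sketch like the one below may help. It is an illustration under assumptions, not part of Grid Engine: it is Linux-only, reads the "Max open files" line of /proc/<pid>/limits and counts the entries in /proc/<pid>/fd, and the function names are hypothetical. Inspecting another user's process (rather than the default 'self') typically needs root. Note that 'lsof | wc -l' counts lines for all processes, including memory-mapped files, so it overstates any one process's usage. A process gets EMFILE at its soft limit, but it may raise its soft limit up to the hard limit, which is why the hard limit matters too.

```python
import os
import resource


def _parse_limit(value):
    """Convert a /proc limits field to an int, mapping 'unlimited' to RLIM_INFINITY."""
    return resource.RLIM_INFINITY if value == "unlimited" else int(value)


def read_nofile_limits(pid="self"):
    """Return (soft, hard) 'Max open files' limits of a process (Linux /proc only)."""
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                # e.g. "Max open files   1024   4096   files"
                fields = line.split()
                return _parse_limit(fields[3]), _parse_limit(fields[4])
    raise ValueError(f"no 'Max open files' entry for pid {pid}")


def count_open_fds(pid="self"):
    """Count the descriptors currently open in a process (Linux /proc only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_headroom(pid="self"):
    """Descriptors left before the soft limit, or None if the soft limit is unlimited."""
    soft, _hard = read_nofile_limits(pid)
    if soft == resource.RLIM_INFINITY:
        return None
    return soft - count_open_fds(pid)


if __name__ == "__main__":
    soft, hard = read_nofile_limits()
    print(f"open={count_open_fds()} soft={soft} hard={hard} headroom={fd_headroom()}")
```

To check the master daemon, pass its PID (as root) instead of the default "self".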
Stephan Grell - Sun Germany - SSG - Software Engineer wrote:
>I just checked in a fix for one of the problems we found while evaluating
>your problem report. PE jobs should now start much faster, and the likelihood
>of losing jobs should be nearly 0.
>Could you download the latest changes and test them in your environment?
>Thank you very much for your detailed problem analysis and your help.
>Juha Jäykkä wrote:
>>>For me it's only dropped if there is something running in all slots of
>>>a queue, but not for a reservation. - Reuti
>>Perhaps this is the cause of my trouble then. Any ideas on how to fix this?
>>It's quite frustrating to have a cluster with 23 of its 24 CPUs just
>>doing nothing (current situation) because one job reserves all the CPUs
>>and backfill does not work at all.
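The backfilling behaviour discussed in the quoted thread can be illustrated with a toy model of the usual rule. This is a simplified EASY-style backfill sketch, not Grid Engine's actual scheduler code: a waiting job may use idle slots now only if it ends before the reserved job's start time, or if it runs only on slots the reservation will not need. Job, pick_backfill and all parameters are hypothetical, and the model assumes each job declares a run time; a job with no known run time is treated as running forever.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    slots: int
    runtime: float  # declared run-time limit (assumed); float("inf") if unknown


def pick_backfill(queue, free_slots, now, res_start, spare_at_res):
    """Return names of queued jobs that can start now without delaying a reservation.

    free_slots:   slots idle right now
    res_start:    time at which the reserved job is due to start
    spare_at_res: slots still unused at res_start after the reserved job takes its share
    """
    started = []
    for job in queue:
        if job.slots > free_slots:
            continue  # does not fit into the currently idle slots
        if now + job.runtime <= res_start:
            # ends before the reservation begins, so it cannot delay it
            free_slots -= job.slots
            started.append(job.name)
        elif job.slots <= spare_at_res:
            # still running at res_start, but only on slots the reservation does not need
            free_slots -= job.slots
            spare_at_res -= job.slots
            started.append(job.name)
    return started


if __name__ == "__main__":
    # 24 CPUs, 23 idle, one job has reserved all 24 starting at t=100
    queue = [Job("short", 1, 50), Job("long", 1, 200), Job("no-limit", 1, float("inf"))]
    print(pick_backfill(queue, free_slots=23, now=0, res_start=100, spare_at_res=0))
```

In this model only "short" starts: with a whole-cluster reservation pending, a job is backfilled only if its declared run time ends before the reservation, so a job without a run-time limit is never backfilled. In Grid Engine 6 the run time used for such decisions comes, as far as we know, from the job's h_rt/s_rt limit or the scheduler's default_duration setting (see sched_conf(5) for your version).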
IT Services | http://www.id.unizh.ch/
Central Systems / HPC | http://www.matterhorn.unizh.ch/
University of Zuerich | E-Mail: christian.bolliger at id.unizh.ch
Winterthurerstr. 190 | Tel: +41 (0)1 63 56775
CH-8057 Zuerich; Switzerland | Fax: +41 (0)1 63 54505
Mime/S CA: https://www.ca.unizh.ch/client/