[GE users] More slots scheduled than available on execution host

reuti reuti at staff.uni-marburg.de
Tue Aug 11 23:13:41 BST 2009


Hi,

On 11.08.2009 at 15:03, s_kreidl wrote:

> Hi Reuti,
>
> I restarted the master in the meantime (a couple of times, since I
> also had some problems with advanced reservation), and now it has
> happened again that some of the processes of a large parallel job
> were scheduled onto nodes that were already occupied with
> sequential jobs, thereby exceeding the maximum number of slots
> allowed on the host.
>
> The circumstances are similar to last time:
> 1. We are in a brutal load situation (99.8% load).

Load on the cluster, or on the machine where the qmaster runs? Was
that machine also running out of memory, so that the kernel's OOM
(out-of-memory) killer kicked in? That should appear in
/var/log/messages.
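A minimal Python sketch of that check: scan the kernel log for OOM-killer entries. The log path and the exact message wording ("invoked oom-killer", "Out of memory") differ between distributions and kernel versions, so the pattern below is an assumption, and the function names are hypothetical.

```python
import re
from typing import Iterable, List

# Assumed pattern: kernel OOM messages usually contain one of these phrases,
# but the exact wording depends on the kernel version.
OOM_PATTERN = re.compile(r"oom-killer|out of memory", re.IGNORECASE)


def find_oom_events(lines: Iterable[str]) -> List[str]:
    """Return the log lines that look like OOM-killer activity."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


def scan_log(path: str = "/var/log/messages") -> List[str]:
    """Read a syslog file (default path as in the mail) and return OOM lines."""
    with open(path, errors="replace") as log:
        return find_oom_events(log)
```

Run `scan_log()` as root on the qmaster host (and on the execution hosts in question) and compare any timestamps it returns with the time the parallel job was scheduled.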


> 2. Not all slots of the hosts in question were occupied by
> sequential jobs; that is, there were free slots on the host, just not
> enough.

Did the parallel job correctly use the assigned nodes with its qrsh
command?
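To answer that, one can compare the hosts Grid Engine granted to the job with the hosts where its processes actually started. A sketch in Python, assuming the job script can read the file named by the PE_HOSTFILE environment variable (one line per host, beginning with the host name and the number of granted slots) and that the list of hosts where each process ran has been collected separately, for example by having every rank print its hostname. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def parse_pe_hostfile(lines: Iterable[str]) -> Dict[str, int]:
    """Map each host in a PE hostfile to the number of slots granted there.

    Each non-empty line is assumed to start with "<host> <slots>"; further
    columns (queue instance, processor range) are ignored.
    """
    granted: Dict[str, int] = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        host, slots = fields[0], int(fields[1])
        granted[host] = granted.get(host, 0) + slots
    return granted


def oversubscribed_hosts(granted: Dict[str, int],
                         used_hosts: Iterable[str]) -> Dict[str, int]:
    """Return hosts that ran more processes than slots granted (or none at all)."""
    used = Counter(used_hosts)
    return {host: n for host, n in used.items() if n > granted.get(host, 0)}
```

Inside the job script the granted map could be built with `with open(os.environ["PE_HOSTFILE"]) as f: granted = parse_pe_hostfile(f)`. Short and fully qualified host names will not match unless they are normalized first.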

-- Reuti


> Does anyone have an idea where and how I could start searching for
> the problem? I'm out of ideas here.
>
> Regards,
> Sabine
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=211821
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=211902




More information about the gridengine-users mailing list