[GE users] Failed receiving gdi request

Andreas.Haas at Sun.COM
Thu Aug 3 10:47:52 BST 2006


Hi Sean,

On Tue, 1 Aug 2006, Sean Dilda wrote:

>> Analysing the problem, I came across the following things:
>> * Even after the message 'failed receiving gdi request', the qmaster is 
>> still reachable by a qping.
>
>
> I've seen this error a fair bit.  In my case, I have SGE using classic

I would assume this is not related, but I can't really say for sure.

> spooling that's shared over NFS (for shadow master failover).  Sometimes SGE 
> gets to really liking certain jobs (especially parallel jobs) and starts

How large is this parallel job? In earlier releases there was an issue
with tightly integrated parallel jobs causing massive spooling. So please
let us know which Grid Engine version you are using.

> rewriting their spool information quite a bit.  SGE ends up hanging waiting 
> on these writes and can be slow to respond and will give the 'failed 
> receiving gdi request' error.

Do your execds also spool over NFS? If so, chances are good that you
could overcome the issue by switching to local execd spooling.
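
If you are not sure where a spool directory actually lives, something
along these lines can tell you. It is only a rough sketch, assuming a
Linux host where /proc/mounts is available. The helper names
(fs_type_for_path, is_nfs) are made up for illustration and are not part
of Grid Engine. Mount points containing spaces are not handled.

```python
import os
import sys


def fs_type_for_path(path, mounts_text):
    """Return (mount_point, fs_type) for the longest mount point that
    contains path, or None if nothing matches.

    mounts_text is expected in /proc/mounts layout:
    device mount_point fs_type options dump pass
    """
    path = os.path.normpath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point = fields[1].rstrip("/") or "/"
        fs_type = fields[2]
        contains = (
            mount_point == "/"
            or path == mount_point
            or path.startswith(mount_point + "/")
        )
        # ">=" so that a later mount on the same mount point wins
        if contains and (best is None or len(mount_point) >= len(best[0])):
            best = (mount_point, fs_type)
    return best


def is_nfs(path, mounts_text):
    """True if path sits on a file system whose type starts with 'nfs'."""
    hit = fs_type_for_path(path, mounts_text)
    return hit is not None and hit[1].startswith("nfs")


if __name__ == "__main__":
    # Usage: pass the spool directories to check as arguments.
    with open("/proc/mounts") as f:
        mounts = f.read()
    for p in sys.argv[1:]:
        print(p, fs_type_for_path(os.path.realpath(p), mounts))
```

Run it on an execution host with the execd spool directory as argument.
If the reported file system type is nfs (or nfs4), the execd is spooling
over NFS.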

Regards,
Andreas
