[GE users] Scheduler stops transferring queued jobs after GDI error
reuti at staff.uni-marburg.de
Mon Jan 11 15:08:05 GMT 2010
On 08.01.2010 at 16:33, futurity wrote:
> I was wondering if anyone may be able to help me?
> We're using Grid Engine 6.1u3 and experiencing problems where
> queued jobs aren't transferred from state "qw / queue waiting" to
> machines to be run. This has been ongoing for the last few months.
> At the start the problem occurred only about once every 2 weeks,
> but since the new year it has started to happen multiple times
> a day. Rebooting the host machines doesn't make it happen any
> less frequently.
> When the qmaster is soft stopped and started again, the queued jobs
> then transfer and run fine until the problem reoccurs.
> The sequence of events leading up to the problem is as follows:
> 1. Everything on the grid is working fine.
> 2. A user experiences an error message "error: failed receiving gdi
> 3. Subsequent job submissions appear to work without the gdi error
>    being received.
> 4. Jobs in state "qw" or jobs submitted after step 2 stay in state
>    "qw" and are never transferred.
> We haven't modified our grid configuration for 6 months, possibly a
> year, and it had been running without any problems whatsoever for
> months before this started to happen.
Are the spool directories local or on the file server?
> Disk space is fine (7GB free).
Where: in /tmp, /var? No disk quota in place in /home?
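A minimal sketch for checking free space per location rather than as a single figure, assuming Python 3 is available on the qmaster host. The path list is only an example; add your actual spool directory to it. Note that this reports free space on the file system as a whole and does not show per-user quotas, which would have to be checked separately.

```python
import os
import shutil

# Example locations only (assumption); add the actual spool directory
# of your installation to this list.
CANDIDATE_PATHS = ["/tmp", "/var", "/home"]


def free_space_report(paths):
    """Return a dict mapping each existing directory to its free bytes."""
    report = {}
    for path in paths:
        # Skip paths that do not exist on this host.
        if os.path.isdir(path):
            report[path] = shutil.disk_usage(path).free
    return report


if __name__ == "__main__":
    for path, free in free_space_report(CANDIDATE_PATHS).items():
        print(f"{path}: {free / 2**30:.1f} GiB free")
```

Running it once while the grid works and once in the problem state would show whether one of these locations fills up in between.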
> Top shows that the machine's load is negligible both when the grid is
> working fine and when it is in this problem state.
> Has anyone else experienced this problem or have any other suggestions?
> Would upgrading to 6.1u6 help?
I would wait until the 6.2u5 binaries are available, although I can't
guarantee that it will solve your issue.
> Kind Regards