[GE users] Scheduler stops transferring queued jobs after GDI error

reuti reuti at staff.uni-marburg.de
Mon Jan 11 15:08:05 GMT 2010


Hi,

On 08.01.2010 at 16:33, futurity wrote:

> Hi,
>
> I was wondering if anyone may be able to help me?
>
> We're using Grid Engine 6.1u3 and experiencing problems where
> queued jobs aren't transferred from state "qw / queue waiting" to
> machines to be run. This has been going on for the last few months.
> At first the problem occurred only about once every 2 weeks, but
> since the new year it has started to happen multiple times a day.
> Rebooting the host machines doesn't seem to make it happen any less
> frequently.
>
> When the qmaster is soft stopped and started again, the queued jobs  
> then transfer and run fine until the problem reoccurs.
>
> The sequence of events leading up to the problem is as follows:
> 1. Everything on the grid is working fine.
> 2. A user gets the error message "error: failed receiving gdi
>    request".
> 3. Subsequent job submissions appear to work without the gdi error
>    being received.
> 4. Jobs in state "qw" and jobs submitted after step 2 stay in state
>    "qw" and are never transferred.
>
> We haven't modified our grid configuration for 6 months, possibly a
> year, and it had been running without any problems whatsoever for
> months before this started to happen.

Are the spool directories local or on the file server?


> Disk space is fine (7GB free).

Where: in /tmp, /var? No disk quota in place in /home?
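
A minimal sketch of such a per-filesystem check, assuming Python 3 is available on the qmaster host. The helper name `free_space_report` and the path list are only illustrative; the list should include wherever the spool directories actually live. It reports free space only and does not look at quotas.

```python
import os
import shutil


def free_space_report(paths):
    """Return {path: free space in GiB}; paths that do not exist map to None."""
    report = {}
    for path in paths:
        if os.path.exists(path):
            # disk_usage reports on the filesystem that contains the path
            usage = shutil.disk_usage(path)
            report[path] = usage.free / 2**30
        else:
            report[path] = None
    return report


if __name__ == "__main__":
    # Example paths only; add the qmaster/execd spool directories here.
    for path, free_gib in free_space_report(["/tmp", "/var", "/home"]).items():
        if free_gib is None:
            print(f"{path}: not present")
        else:
            print(f"{path}: {free_gib:.1f} GiB free")
```

Running it while the jobs are stuck in "qw" and again after the qmaster restart would show whether any of these filesystems is close to full at the time of the problem.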


> Top shows that the machine's load is negligible both when the grid
> is working fine and when it is in this problem state.
>
> Has anyone else experienced this problem or has any other suggestions?
>
> Would upgrading to 6.1u6 help?

I would wait for the 6.2u5 binaries to become available, although I
can't guarantee that it will solve your issue.

-- Reuti


> Kind Regards
>
> Neil

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=238113
