[GE users] Queues are in error state

Chakravarthi_Mohan Chakravarthi_Mohan at satyam.com
Tue May 3 11:00:57 BST 2005


Hi,

Before explaining the possible cause of this issue, here is how to
clear the error state from the command line:

1. qmod -cq <Your Queue name> 

This error may occur when you try to run a binary file (i.e. a
compiled executable) on the cluster nodes and submit it as if it were
a job script. To avoid this, tell "qsub" that the command is a binary
by using the "-b" option:

2. qsub -b y <your executable> 

Check the "qsub" manual page for more information.
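
As a rough illustration of the two steps above, here is a minimal
shell sketch. The helper names queues_in_error and binary_flag are
made up for this example and are not part of SGE. It assumes that
"qstat -f" prints the queue state in the sixth column and that your
grep supports the -I option, so please check both on your system.

```shell
#!/bin/sh
# Sketch only: both helper names are hypothetical, not SGE commands.

# Print the names of queue instances whose state column contains "E".
# Reads `qstat -f` output on stdin and assumes the six-column layout
# queuename, qtype, used/tot., load_avg, arch, states.
queues_in_error() {
    awk 'NF >= 6 && $1 ~ /@/ && $6 ~ /E/ { print $1 }'
}

# Print "y" if the file looks like a compiled executable and "n" if it
# looks like a script or other text file. The result is meant to be
# passed to `qsub -b`. Assumption: grep -I skips binary files.
binary_flag() {
    if [ -s "$1" ] && ! LC_ALL=C grep -Iq '' "$1"; then
        echo y
    else
        echo n
    fi
}
```

Typical use would be something like:

    qstat -f | queues_in_error
    qmod -cq <Your Queue name>
    qsub -b "$(binary_flag <your executable>)" <your executable>

Clearing the error state only re-enables the queues, so the job
submission itself still needs to be corrected.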

I hope this helps you.




-----Original Message-----
From: Mark Ellerby [mailto:issmde at leeds.ac.uk] 
Sent: Tuesday, May 03, 2005 3:06 PM
To: users at gridengine.sunsource.net
Subject: [GE users] Queues are in error state

Hi,

We have had SGE 6.0 installed on our Linux beowulf cluster for a couple 
of weeks and it has been working OK. However when I came back to work 
following the long weekend I found most queues to be in error state. 
Having looked through the qmaster/messages file the problem seemed to 
start when job 982 failed. The error message is as follows:

04/30/2005 21:44:28|qmaster|snowdon|W|job 982.1 failed on host
snowdon.leeds.ac.uk general assumedly before job because: can't write
script file "job_scripts/982" wrote only -1 of 4451552 bytes: Bad address

I can't find any record of that job unfortunately, so I can't see what 
it was trying to run.

The strangest thing is, it seems that SGE tried to then run that job on 
pretty much all the queues in the system, putting most of our compute 
nodes out of action (in error state). I can't understand why the 
queueing system would do this, because it doesn't normally do that when 
a job fails.

Could this be a bug in SGE 6.0, or could it be that I've not set up the 
queueing system correctly?

Any help appreciated

Thanks
Mark

-- 
Mark Ellerby                         email: m.d.ellerby at leeds.ac.uk
Information Systems Services         phone: +44 (0)113 3435429
University of Leeds

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net





