[GE users] core duo systems not accepting jobs

craffi dag at sonsorol.org
Mon Jul 13 00:17:01 BST 2009


This is pretty strange. At this point I'd stop trying to figure out
why real jobs are not running on your dual-cores and start
submitting some directed jobs that might flush out some better
errors ...

Try sending some test scripts directly at the dual core hosts and  
queue a few times:

$ qrsh -q x86_64.q@@coreduos /bin/hostname

$ qsub -cwd -q x86_64.q@@coreduos $SGE_ROOT/examples/jobs/simple.sh

And maybe even some 2-way parallel requests just to see what happens:

$ qsub -cwd -pe gauss 2 -q x86_64.q@@coreduos $SGE_ROOT/examples/jobs/simple.sh
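
If it helps, the three submissions above can be repeated in a loop. This is
only a rough sketch: the queue, PE and job path are the ones from the commands
above, while REPEAT and DRY_RUN are hypothetical knobs I made up for
illustration. By default it only prints the commands; set DRY_RUN=0 to
actually run them.

```shell
#!/bin/sh
# Sketch: repeat the directed test submissions a few times.
# QUEUE, PE and the job path come from the commands above.
# REPEAT and DRY_RUN are hypothetical settings, not SGE variables.
QUEUE='x86_64.q@@coreduos'
PE='gauss'
REPEAT=${REPEAT:-3}
DRY_RUN=${DRY_RUN:-1}

# Print one command per line: an interactive check, a serial batch job,
# and a 2-slot request against the "gauss" PE.
test_commands() {
    job="${SGE_ROOT}/examples/jobs/simple.sh"
    echo "qrsh -q $QUEUE /bin/hostname"
    echo "qsub -cwd -q $QUEUE $job"
    echo "qsub -cwd -pe $PE 2 -q $QUEUE $job"
}

# Repeat the whole set REPEAT times; only execute when DRY_RUN=0.
run_tests() {
    i=1
    while [ "$i" -le "$REPEAT" ]; do
        test_commands | while read -r cmd; do
            if [ "$DRY_RUN" = "0" ]; then
                echo "+ $cmd"
                # stdin from /dev/null so qrsh cannot swallow the
                # remaining command lines coming through the pipe
                $cmd < /dev/null
            else
                echo "$cmd"
            fi
        done
        i=$((i + 1))
    done
}

run_tests
```

Whatever error text qrsh or qsub prints on a failed run is the interesting
part, so keep the output.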


You've probably already done this, but it's time to move beyond qstat
and qconf output. Do you see anything in your SGE spool logs for the
qmaster host, the scheduler process, or even the execd messages file
for some of the 2-way systems?
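
A quick way to skim those files is sketched below. It assumes the default
spool layout under $SGE_ROOT/$SGE_CELL/spool and the usual pipe-separated
messages format (date time|component|host|level|text). If your execds spool
to local disk the paths will differ; "qconf -sconf" shows execd_spool_dir.
EXEC_HOST and the name "node01" are placeholders, not hosts from your cluster.

```shell
#!/bin/sh
# Print warning (W), error (E) and critical (C) lines from an SGE messages
# file. Assumes the level is the 4th pipe-separated field.
problems_in() {
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 1; }
    awk -F'|' '$4 == "W" || $4 == "E" || $4 == "C"' "$1"
}

# Assumed default locations; adjust for your installation.
CELL_DIR="${SGE_ROOT}/${SGE_CELL:-default}"
EXEC_HOST="${EXEC_HOST:-node01}"   # placeholder for one of the dual-core hosts

for f in \
    "$CELL_DIR/spool/qmaster/messages" \
    "$CELL_DIR/spool/qmaster/schedd/messages" \
    "$CELL_DIR/spool/$EXEC_HOST/messages"
do
    echo "== $f"
    # Show only the most recent 20 problem lines per file.
    problems_in "$f" | tail -n 20
done
```

Run it right after one of the test submissions so the newest lines line up
with the job you just sent.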


-Chris




On Jul 12, 2009, at 6:40 PM, flengyel wrote:

>
>
>
>
> -----Original Message-----
> From: craffi [mailto:dag at sonsorol.org]
> Sent: Sun 7/12/2009 6:39 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] core duo systems not accepting jobs
>
> Things look pretty good, a few queue instances down in 'au' state and
> one of your x86_64 hosts in load alarm state 'a' with some insane load
> average. Your quad.q hosts are almost totally maxed out.
>
> Indeed.
>
> And you do have a bunch of x86_64.q hosts with free job slots that are
> totally idle.
>
> Right
>
> Commenting only now on the "qstat -j" data you posted I'd zero in on
> this report from the scheduler:
>
> >     cannot run because no access to pe "gauss"
> >     cannot run in PE "gauss" because it only offers 0 slot
>
>
>
> This brings to mind a few guesses:
>
> - Have you run out of "gauss" PE slots? How many are configured in the
> PE object?
>
> 9999
>
> - Is your user allowed to access that PE or is there a quota or ACL
> list that may be blocking them?
>
> Yes. No quota that I am aware of.
>
>
> - Is your user part of the "Research" group? You have access control
> configured on that queue via the "user_lists" parameter in the queue
> config
>
> Yes.
>
> -Chris
>
>
>
>

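The three questions in the quoted message (PE slot count, PE access and
quotas, membership in the "Research" list) can be checked from the command
line roughly as follows. This is a sketch, not a definitive recipe: attr is a
small hypothetical helper, and it does not join continuation lines that end
in a backslash.

```shell
#!/bin/sh
# Print the value of one attribute from "qconf -sp" / "qconf -sq" style
# output (attribute name, whitespace, value). attr is a hypothetical helper.
attr() {
    awk -v key="$1" '$1 == key { sub(/^[^ \t]+[ \t]+/, ""); print }'
}

# Only query the cluster when qconf is actually on the PATH.
if command -v qconf >/dev/null 2>&1; then
    qconf -sp gauss    | attr slots         # total slots in the PE object
    qconf -sp gauss    | attr user_lists    # who may use the PE
    qconf -sp gauss    | attr xuser_lists   # who is excluded from the PE
    qconf -sq x86_64.q | attr user_lists    # access control on the queue
    qconf -su Research                      # members of the "Research" list
    qconf -srqs                             # resource quota sets, if any
fi
```

For example, "qconf -sp gauss | attr slots" should print 9999 if the PE is
configured as described in the thread.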
------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=206723
