[GE users] qconf -tsm

reuti reuti at staff.uni-marburg.de
Tue Apr 13 22:11:40 BST 2010


Hi,

On 13.04.2010 at 18:36, isakrejda wrote:

> Last night I had trouble figuring out why a set of jobs was not entering
> execution, and since it was late and I was tired I took the easy way out
> and ran qconf -tsm. I missed the fact that the number of pending jobs had
> crept up to 10k, and with 1k job slots that one cycle of the scheduler
> took 3h and almost drained the cluster. So I have a few questions.

didn't the `qstat -j <job_id>` output give any result with scheduling
info turned on in the scheduler?
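
For reference, a minimal sketch of how one might check this, assuming `qconf -ssconf` prints the scheduler configuration as one `name value` pair per line and that the `schedd_job_info` parameter described in sched_conf(5) is what controls the scheduling info. The helper name `schedd_job_info_value` is hypothetical, not part of Grid Engine:

```shell
#!/bin/sh
# Hypothetical helper: reads `qconf -ssconf` output on stdin and prints
# the value of the schedd_job_info parameter (e.g. "true", "false",
# or "job_list <ids>").
schedd_job_info_value() {
    awk '$1 == "schedd_job_info" { $1 = ""; sub(/^ +/, ""); print }'
}

# Only query the live configuration if qconf is actually available.
if command -v qconf >/dev/null 2>&1; then
    qconf -ssconf | schedd_job_info_value
fi
```

If this prints `false`, `qstat -j <job_id>` would not be expected to show why the job is waiting.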

-- Reuti


> 1. Is there a way to turn this debugging on for just one job through 1
> cycle of the scheduler? We do not have the option to keep track of why
> a job is waiting turned on, because it puts too much load on the system
> and in the past caused draining of the cluster.
>
> 2. Once I realize that it's going to take too much time, what is the best
> way to interrupt the cycle? If I stop and restart the master, is it going
> to kill the scheduler or wait until the scheduler finishes its cycle? If
> I just kill the qmaster, is there a flag somewhere that would trigger the
> logging of the scheduler cycle again, or is it all in memory?
>
> 3. Is there a way to redirect the output of that debugging run out of the
> <ge_root>/<cell>/common/ directory? The directory is on a fairly heavily
> used common fs. I thought about creating a link to a local fs before
> issuing the command. Would that work? I remember (but haven't done it for
> a while) that if <ge_root>/<cell>/common/schedd_runlog exists, sge
> appends to it. So would that work, and would it help? I wonder whether it
> was pure I/O that slowed the run so much, or whether something else is
> going on when I throw in that qconf -tsm.
>
> Taking the consequences into account, I am reluctant to experiment, so any
> insight would be appreciated.
>
> I am running version 6.2u5.
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=253263
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=253294

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


