[GE users] sched_job_info problems
skylar2 at u.washington.edu
Thu Aug 6 14:47:30 BST 2009
> On 06.08.2009 at 00:48, skylar2 wrote:
>> We're running into problems with GE 6.1u6 where having schedd_job_info
>> enabled makes sge_schedd eat up all the RAM on the system when we have
>> users submit jobs with PE slot range requests. This appears to be an
>> issue that is reported here and supposedly fixed:
>> Disabling schedd_job_info makes the problem go away, but our users
>> depend on schedd_job_info output to debug their jobs. Does anyone know
>> of a workaround for this problem?
> What do you mean by "debug"? Users who are new to a cluster often
> request too many resources and then want to investigate why their
> jobs aren't starting. One test could be:
> $ qalter -w v <jobid>
> instead, to get a reply on whether there are any suitable queues at all,
> or "-w p" to get a reply under the current load.
I ran across that qsub/qalter trick right after I sent my email. It
looks like it'll work for us, but we'll just have to re-educate our users.
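
For anyone finding this thread later, the check could be wrapped roughly as
below. This is only a sketch: check_pending_job is a made-up helper name, and
the QALTER override exists just so the wrapper can be dry-run without a
cluster. The "-w v" and "-w p" modes are the ones mentioned above; "-w p" may
not be available on every release.

```shell
#!/bin/sh
# Sketch only: check_pending_job is a hypothetical helper name.
# QALTER can be overridden (e.g. for a dry run); it defaults to the real qalter.
QALTER="${QALTER:-qalter}"

check_pending_job() {
    jobid="$1"
    # -w v: are there any suitable queues at all, ignoring current usage?
    "$QALTER" -w v "$jobid"
    # -w p: could the job be scheduled under the current load?
    "$QALTER" -w p "$jobid"
}
```

A user would then run "check_pending_job <jobid>" on a pending job instead of
reading the scheduling info from "qstat -j".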
> In special cases I turn
> the schedd_job_info on for some minutes for investigation and turn it
> off again.
I tried this, but by the time you get useful "qstat -j" output, memory
usage has already ballooned, and the only way to get it back down is to
restart the scheduler.
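
For reference, the temporary on/off toggle could be scripted roughly as
follows. This is a sketch under assumptions: set_job_info and toggle_job_info
are made-up names, and it assumes "qconf -ssconf" prints a schedd_job_info
line and that "qconf -Msconf <file>" loads a modified scheduler configuration
on your release. It does nothing about the memory growth described above.

```shell
#!/bin/sh
# Sketch only: both function names below are hypothetical.

# Rewrite the schedd_job_info line of a scheduler config read on stdin.
set_job_info() {
    sed "s/^schedd_job_info[[:space:]].*/schedd_job_info $1/"
}

# Dump the scheduler config, change the setting, and load it back.
# Assumes qconf -ssconf / -Msconf behave this way on your release.
toggle_job_info() {
    tmp="$(mktemp)" || return 1
    qconf -ssconf | set_job_info "$1" > "$tmp" && qconf -Msconf "$tmp"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Usage would be "toggle_job_info true", a few scheduling intervals of
"qstat -j <jobid>", then "toggle_job_info false".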
-- Skylar Thompson (skylar2 at u.washington.edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S048, (206)-685-7354
-- University of Washington School of Medicine