[GE users] Very slow scheduling of array job
theveracious at yahoo.com
Mon Mar 23 13:38:30 GMT 2009
This morning when I checked my queue, there were jobs waiting even though enough cores were available. Looking at the queue instances, I found one marked "E" (error state). The node turned out to be up but overloaded, so I disabled it and cleared the error.

The jobs in state qw are one array job and one MPI job. The scheduler does dispatch tasks of the array job, but very slowly: maybe 12 tasks every 5 minutes or so. The MPI job isn't being scheduled at all; I assume it's waiting until the array job has been fully scheduled. "qstat -j" shows nothing wrong for either the array job or the MPI job, and nothing new appears in qmaster's messages log.

How can I troubleshoot this? What is the most likely cause: the array job itself, or something in SGE such as the scheduler?
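One way to put a number on "very slowly" is to sample the pending task range of the array job (the ja-task-ID column of qstat) twice and compare. The sketch below is a hypothetical helper, not part of SGE; it assumes the range notation first-last:step, possibly comma-separated, that qstat prints for pending array tasks.

```python
def count_tasks(task_range: str) -> int:
    """Count the tasks in an SGE-style range list such as '13-100:1' or '3,5-10:2'.

    Hypothetical helper: assumes the 'first-last:step' notation, comma-separated.
    """
    total = 0
    for part in task_range.split(","):
        span, _, step = part.partition(":")
        if "-" in span:
            first, last = (int(x) for x in span.split("-"))
            # Tasks first, first+step, ..., up to and including last (if reachable).
            total += (last - first) // int(step or 1) + 1
        else:
            # A single task ID such as '7'.
            total += 1
    return total


def dispatch_rate(range_before: str, range_after: str, minutes: float) -> float:
    """Tasks dispatched per minute between two samples of the pending range."""
    return (count_tasks(range_before) - count_tasks(range_after)) / minutes


if __name__ == "__main__":
    print(dispatch_rate("1-100:1", "13-100:1", 5))  # 2.4 tasks per minute
```

For example, if the pending range shrinks from 1-100:1 to 13-100:1 over 5 minutes, the rate is 2.4 tasks per minute, which matches the roughly 12 per 5 minutes observed here.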
I am using SGE 6.2u2. max_jobs, max_u_jobs, max_aj_instances and max_aj_tasks are all set to 0 (i.e., no limit).
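To double-check that those limits really are all 0 on the master, one could feed the output of qconf -sconf (the global cluster configuration) to a small parser. This is a sketch under assumptions: it expects one "name value" pair per line and takes the parameter names from sge_conf(5); values continued over several lines with a backslash are not handled, which should not matter for these four numeric settings. The function names are hypothetical and the sample values are illustrative only.

```python
# Limits from the global cluster configuration that can hold back array jobs.
LIMITS = ("max_jobs", "max_u_jobs", "max_aj_instances", "max_aj_tasks")


def parse_conf(text: str) -> dict:
    """Parse 'name value' lines, as printed by `qconf -sconf`, into a dict."""
    conf = {}
    for line in text.splitlines():
        fields = line.split(None, 1)
        if len(fields) == 2:
            conf[fields[0]] = fields[1].strip()
    return conf


def nonzero_limits(text: str) -> dict:
    """Return the LIMITS entries whose value is not 0 (missing keys are skipped)."""
    conf = parse_conf(text)
    return {name: conf[name] for name in LIMITS if conf.get(name, "0") != "0"}


if __name__ == "__main__":
    # Illustrative sample, not real output from the poster's cluster.
    sample = """\
max_aj_instances             2000
max_aj_tasks                 75000
max_u_jobs                   0
max_jobs                     0
"""
    print(nonzero_limits(sample))
```

An empty dict would confirm that none of the four limits is throttling the array job, pointing the investigation elsewhere.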