Opened 13 years ago
Last modified 10 years ago
#464 new enhancement
IZ2386: Slow scheduling with many queue instances
Reported by: | sgaure | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | sge | Version: | 6.1u2 |
Severity: | | Keywords: | scheduling |
Cc: | | | |
Description
[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=2386]
Issue #: 2386
Platform: All
Reporter: sgaure (sgaure)
Component: gridengine
OS: All
Subcomponent: scheduling
Version: 6.1u2
CC: None defined
Status: NEW
Priority: P3
Resolution:
Issue type: ENHANCEMENT
Target milestone: ---
Assigned to: andreas (andreas)
QA Contact: andreas
URL: http://titan.uio.no
Summary: Slow scheduling with many queue instances
Status whiteboard:
Attachments:
Issue 2386 blocks:
Votes for issue 2386:
Opened: Tue Oct 2 14:13:00 -0700 2007
------------------------

We have problems with scheduling taking too long, i.e. qlogins time out or have to wait half a minute or more, even though there are lots of available slots in the cluster. With max_reservation=1 (instead of the current 0), we are most of the time not able to schedule interactive jobs at all.

We typically have approx 10-20 pending jobs, some of them array jobs with 10-100k tasks. We have set max_pending_tasks_per_job=20.

I've done some profiling with oprofile on sge_schedd and found that it spends most of its time in sge_eval_expression(), sge_is_expression(), sge_strlcpy(), sge_hostcpy() and sge_hostcmp().

We have approx 25 cluster queues on approx 450 nodes, i.e. approx 10000 queue instances. One of the queues is subordinate to all the others. Access is governed by rqs.

Though I have not yet been able to do a gprof (this is a system in full production), it seems very likely that the routine qinstance_list_locate is to blame. It's a linear search in a list of queue instances, with two quite elaborate tests (sge_eval_expression) containing a lot of setup (and copying in sge_hostcmp) for what boils down to (more or less) a strcmp. I guess the thing must be initiated from the cqueue_locate_qinstance() call in so_list_resolve().

For scalability reasons, I strongly suggest this part of the code be rewritten to be more efficient.
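To illustrate the kind of rewrite suggested above, here is a minimal sketch of replacing a linear search over queue instances with a hashed lookup keyed on the hostname. It is not taken from the Grid Engine sources: the `qinstance` struct, the `qi_index` type, and the `qi_hash`, `qi_index_add` and `qi_index_locate` functions are all hypothetical names, and the sketch assumes hostnames are already in a normalized form so that a plain strcmp suffices.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a queue instance; NOT the real SGE structure. */
typedef struct qinstance {
    const char *host;        /* hostname, assumed already normalized */
    int slots;               /* illustrative payload */
    struct qinstance *next;  /* chain for hash collisions */
} qinstance;

#define QI_BUCKETS 16384     /* power of two, roomy for ~10000 instances */

/* One index per cluster queue: hostname -> queue instance. */
typedef struct {
    qinstance *bucket[QI_BUCKETS];
} qi_index;

/* 32-bit FNV-1a hash of a NUL-terminated string. */
uint32_t qi_hash(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s != '\0') {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Insert a queue instance; the index must have been zero-initialized. */
void qi_index_add(qi_index *idx, qinstance *qi)
{
    assert(idx != NULL && qi != NULL && qi->host != NULL);
    uint32_t b = qi_hash(qi->host) & (QI_BUCKETS - 1);
    qi->next = idx->bucket[b];
    idx->bucket[b] = qi;
}

/* Expected O(1) lookup: one hash plus a strcmp per entry in the bucket,
 * instead of an expression evaluation per queue instance in the list. */
qinstance *qi_index_locate(const qi_index *idx, const char *host)
{
    uint32_t b = qi_hash(host) & (QI_BUCKETS - 1);
    for (qinstance *qi = idx->bucket[b]; qi != NULL; qi = qi->next) {
        if (strcmp(qi->host, host) == 0)
            return qi;
    }
    return NULL;
}
```

The index would be allocated with calloc (so all buckets start empty) and filled once when the queue instances are loaded; lookups by wildcard or host group expression would still need the general matching path.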
------- Additional comments from sgaure Wed Oct 3 02:52:48 -0700 2007 -------

Nah, it's not just there, it's all over the place: linear searches in host lists and queue instances, calls into fancy routines with malloc, copying and free, typically ending in a simple strcmp. Over and over and over again, on the same data.

We have problems now, with a mere 450 nodes and a dedicated 4-CPU sge server. In a couple of years we might have 3000 nodes. If sge_schedd could be made parallel, we could set apart a 20-node cluster with 160 CPUs for running it.

A serious effort should be made to make sge scale.

------- Additional comments from sgaure Fri Oct 5 17:28:45 -0700 2007 -------

When my sge_schedd gets one of its fits, the copy in sge_hostcmp is the top CPU hog. I suppose the hostcpy is used to strip domain names etc., but it should be unnecessary to do this over and over again during scheduling; hostnames should be normalized before they're admitted to internal structures in the scheduler. If it's required that users see the exact hostname they supplied, a literal copy should be kept in the job structure.

------- Additional comments from andreas Mon Oct 8 10:29:36 -0700 2007 -------

Thanks for reporting these observations. Efforts to improve the scheduler for setups with very many queue instances are in fact already underway, and 6.1u3 will already be faster. I cannot predict how large the improvement will be with your particular setup, but the gains are significant, in particular when many resource quotas and many hosts are involved.

Implementation-wise, the improvements do not change functions like sge_eval_expression(), but instead aim at reducing the overall number of calls to such functions.
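The normalize-once idea from the Oct 5 comment could look roughly like the following sketch. Everything here is an assumption for illustration: `host_ref`, `host_ref_init` and `host_ref_same` are hypothetical names, and "normalized" is taken to mean lowercased with the domain part stripped, which may not match what sge_hostcpy() actually does in every configuration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define HOST_MAX 256  /* assumed upper bound on hostname length, incl. NUL */

/* Hypothetical pair: the name as the user typed it, plus a form that is
 * normalized exactly once, when the name enters the scheduler's data. */
typedef struct {
    char literal[HOST_MAX];     /* shown back to the user unchanged */
    char normalized[HOST_MAX];  /* lowercase, domain part removed */
} host_ref;

/* Fill in both forms; returns 0 on success, -1 if the name is too long. */
int host_ref_init(host_ref *h, const char *name)
{
    assert(h != NULL && name != NULL);
    size_t len = strlen(name);
    if (len >= HOST_MAX)
        return -1;
    memcpy(h->literal, name, len + 1);

    /* Assumption: normalization = lowercase up to the first dot. */
    size_t i;
    for (i = 0; i < len && name[i] != '.'; i++)
        h->normalized[i] = (char)tolower((unsigned char)name[i]);
    h->normalized[i] = '\0';
    return 0;
}

/* Comparison in the scheduling loop: no copying, no allocation. */
int host_ref_same(const host_ref *a, const host_ref *b)
{
    return strcmp(a->normalized, b->normalized) == 0;
}
```

With this layout the cost of stripping and case folding is paid once per hostname at admission, and every later comparison is a single strcmp on the precomputed form.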