Opened 16 years ago
Closed 10 years ago
#248 closed patch (fixed)
IZ1617: Bad check for jobs when removing execution hosts
Reported by: | uddeborg | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | sge | Version: | 6.0u4 |
Severity: | minor | Keywords: | install |
Cc: |
Description
[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=1617]
Issue #: 1617
Platform: All
Reporter: uddeborg (uddeborg)
Component: gridengine
OS: All
Subcomponent: install
Version: 6.0u4
CC: None defined
Status: REOPENED
Priority: P3
Resolution:
Issue type: PATCH
Target milestone: ---
Assigned to: dom (dom)
QA Contact: dom
URL:
Summary: Bad check for jobs when removing execution hosts
Status whiteboard:
Attachments:

Date/filename: | Description: | Submitted by: |
---|---|---|
Fri May 13 00:28:00 -0700 2005: inst_execd_uninst.sh.patch | Suggested patch for inst_execd_uninst.sh (text/plain) | uddeborg |
Wed Jun 8 08:35:00 -0700 2005: p | Updated patch, taken after the changes made in issue 1627. (text/plain) | uddeborg |

Issue 1617 blocks:
Votes for issue 1617:
Opened: Fri May 13 00:27:00 -0700 2005
------------------------

When trying to remove some execution hosts (inst_sge -ux -host xxx), the script complained that it could not move all jobs in the machine's queues, although I knew for sure it was empty. (And the machine had not been up for several days.)

I took a look at the relevant script, inst_execd_uninst.sh. In several places a list of queues is extracted with this pipe:

    qstat -f | grep $exechost | cut -d" " -f1

There are some problems with this:

1. If one is using names without domains, it is not too unlikely that one machine has a name which is a substring of another machine's name. We have one named "cat" and another named "catoosa", for example. With the above line, a search for the former would also find the latter. To solve that, I suggest that
   - fully resolved names are used, using ResolveHosts.
   - the grep is anchored. The left side can be anchored with an @. If the "cut" is done before the "grep", the right side can be anchored as an end-of-line.
2. The same command is used to figure out if there are any jobs to suspend/reschedule. But the command will also list empty queues. There ought to be a -ne flag to qstat in those cases.

The attached patch is for the 6.0U4 version. Since finished binaries for that version still aren't released, I've tried a similar one on our 6.0U3 system, but have not tested this precise patch.

------- Additional comments from uddeborg Fri May 13 00:28:16 -0700 2005 -------
Created an attachment (id=59)
Suggested patch for inst_execd_uninst.sh

------- Additional comments from uddeborg Wed Jun 8 08:33:12 -0700 2005 -------
The resolution of bug 1627 also fixed the first subproblem reported here. And in a better way than I suggested! The second subproblem remains.

------- Additional comments from uddeborg Wed Jun 8 08:35:14 -0700 2005 -------
Created an attachment (id=61)
Updated patch, taken after the changes made in issue 1627.

------- Additional comments from roland Fri Jun 17 07:17:00 -0700 2005 -------
take care of it

------- Additional comments from roland Mon Jun 20 05:57:26 -0700 2005 -------
The -ne switch only prints queues with scheduled jobs. Empty queues will be ignored. Inside the script we want to suspend/disable... all queues, not only queues with jobs. If we add the -ne switch, a queue could get a job after we have executed the disable/suspend code. In that case the queue would be deleted while the job is running.

------- Additional comments from uddeborg Wed Jul 27 06:55:37 -0700 2005 -------
Point taken. All queues should be suspended. In SuspendQueue(), there should be no "-ne". But in SuspendJobs() and RescheduleJobs() it would still make sense with "-ne", wouldn't it? These functions don't do anything with the queues themselves, only with jobs in them, if any.
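
For reference, the substring problem from point 1 of the description can be reproduced without a cluster. The sketch below is only an illustration, not the patch attached to this issue and not the fix that went in with bug 1627. It assumes that the first column of `qstat -f` output has the form `queue@host`. The sample lines, the `all.q` queue name and the helper function names are made up, and only the host names "cat" and "catoosa" come from the report.

```shell
#!/bin/sh
# Hypothetical stand-in for `qstat -f` output (assumed layout: the first
# column is queue@host). Only the host names come from the report.
sample_qstat() {
cat <<'EOF'
all.q@cat                      BIP   0/2       0.00     lx24-x86
all.q@catoosa                  BIP   1/2       0.10     lx24-x86
EOF
}

# The pipe quoted in the report: grep matches the host name anywhere
# in the line, so "cat" also matches "catoosa".
queues_unanchored() {
    grep "$1" | cut -d" " -f1
}

# Suggested variant: cut first, then anchor the match with "@" on the
# left and end-of-line on the right.
queues_anchored() {
    cut -d" " -f1 | grep "@$1\$"
}

echo "unanchored:"
sample_qstat | queues_unanchored cat
echo "anchored:"
sample_qstat | queues_anchored cat
```

With this sample input the unanchored pipe lists the queues of both hosts, while the anchored one lists only the queue on "cat". Note that dots in a fully qualified host name are still regular-expression wildcards in the grep pattern, so a real script might want to escape them.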
Attachments (2)
Change History (3)
Changed 10 years ago by dlove
Changed 10 years ago by dlove
comment:1 Changed 10 years ago by dlove
- Resolution set to fixed
- Severity set to minor
- Status changed from new to closed
Fixed according to RD-2005-06-20-0 in Changelog