Custom Query (431 matches)
Results (145 - 147 of 431)
Ticket | Resolution | Summary | Owner | Reporter |
---|---|---|---|---|
#621 | fixed | IZ2871: SGE_COMPLEX_ hook should be documented | templedf | |
Description |
[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=2871] Issue #: 2871 Platform: All Reporter: templedf (templedf) Component: gridengine OS: All Subcomponent: man Version: 6.2 CC: None defined Status: NEW Priority: P3 Resolution: Issue type: DEFECT Target milestone: --- Assigned to: andreas (andreas) QA Contact: andreas URL: * Summary: SGE_COMPLEX_ hook should be documented Status whiteboard: Attachments: Issue 2871 blocks: Votes for issue 2871: Opened: Mon Jan 12 14:41:00 -0700 2009 ------------------------ The SGE_COMPLEX_ hook described in the workaround for Issue 409 should be documented in the man page for qsub/qrsh. ------- Additional comments from templedf Mon Jan 12 14:58:27 -0700 2009 ------- I just tested this feature in 6.2, and it doesn't appear to work. Perhaps this should be a qsub issue instead. |
|||
#628 | fixed | IZ2898: ACCT_RESERVED_USAGE ignores slot count | rdickson | |
Description |
[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=2898] Issue #: 2898 Platform: All Reporter: rdickson (rdickson) Component: gridengine OS: All Subcomponent: man Version: 6.1u5 CC: None defined Status: REOPENED Priority: P3 Resolution: Issue type: DEFECT Target milestone: --- Assigned to: pollinger (pollinger) QA Contact: andreas URL: * Summary: ACCT_RESERVED_USAGE ignores slot count Status whiteboard: Attachments: Issue 2898 blocks: Votes for issue 2898: Opened: Thu Jan 29 14:33:00 -0700 2009 ------------------------ The execd_param ACCT_RESERVED_USAGE (and probably SHARETREE_RESERVED_USAGE too) does not handle parallel environments in a reasonable way. For example: Job 231 is just "sleep 60" in a parallel environment. ACCT_RESERVED_USAGE was turned off while it ran. > qacct -j 231 | egrep "wallclock|slots|cpu" slots 4 ru_wallclock 60 cpu 0 We turn on ACCT_RESERVED_USAGE and run it again, job 233. Its accounting record looks like this: > qacct -j 233 | egrep "wallclock|slots|cpu" slots 4 ru_wallclock 60 cpu 60 Seems to me the user ought to be charged for 240 cpu seconds --- 60 seconds on each of 4 slots. ------- Additional comments from crei Thu Feb 26 02:09:51 -0700 2009 ------- When using a parallel environment, the submitter has to specify the number of slots the job will use (qsub -pe): qsub -pe mytestpe 1 $SGE_ROOT/examples/jobs/sleeper.sh 60 Your job 260736 ("Sleeper") has been submitted > qacct -j 260736 | egrep "wallclock|slots|cpu" slots 1 ru_wallclock 60 cpu 0 The man page entry for ACCT_RESERVED_USAGE does not mention any influence on the slots value: ACCT_RESERVED_USAGE If this parameter is set to true, the usage of reserved resources is used for the accounting entries cpu, mem and io instead of the measured usage. Therefore I close this issue and set it to invalid. ------- Additional comments from rdickson Fri Feb 27 06:59:44 -0700 2009 ------- Ok, then I'm going to reopen this issue against the man page.
The man entry, "the usage of reserved resources is used for the accounting entries cpu, mem and io instead of the measured usage," is uninformative. Does ACCT_RESERVED_USAGE have anything to do with resource reservation? (qsub -R y) No, it does not. Does it have anything to do with reserved run time? (qsub -l h_rt=hh:mm:ss)? No, it does not, because the accounting entry made is not the *reserved* h_rt, it is the *measured* wallclock time. The name implies some entity is being reserved, and slots seemed the obvious thing. But apparently I was wrong. Perhaps the feature has a misleading name? I understand it would be futile to suggest renaming it. But please change the man page to describe what it does, and perhaps even what it's good for? |
|||
#629 | fixed | IZ2899: bad RQS syntax can crash/hang qmaster process | craffi | |
Description |
[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=2899] Issue #: 2899 Platform: All Reporter: craffi (craffi) Component: gridengine OS: All Subcomponent: qmaster Version: 6.2u1 CC: None defined Status: NEW Priority: P3 Resolution: Issue type: DEFECT Target milestone: 6.2u3 Assigned to: ernst (ernst) QA Contact: ernst URL: * Summary: bad RQS syntax can crash/hang qmaster process Status whiteboard: Attachments: Issue 2899 blocks: Votes for issue 2899: Opened: Thu Jan 29 20:19:00 -0700 2009 ------------------------ Reproduce: Make 2 project objects: name testProject oticket 0 fshare 0 usage NONE acl NONE xacl NONE name testProject2 oticket 0 fshare 0 usage NONE acl NONE xacl NONE The following BAD RQS syntax will hang or crash Grid Engine. On Linux/x86_64 I was able to crash the qmaster process; on Mac OS X the system seems to hang and clients will eventually see this error: "ERROR: failed receiving gdi request response for mid=4 (got syncron message receive timeout error)." This will cause the problem: { name testRQS description will this crash a qmaster? enabled TRUE limit projects !{testProject,testProject2} queues * hosts * to slots=10 } ------- Additional comments from crei Thu Feb 26 02:37:18 -0700 2009 ------- When I try to create the projects in the current 62u2 (maintrunk) system I get the following error message: error: unknown attribute name "usage" error: error reading file: "/tmp/1782-VUVTwk" cant read project But setting an incorrect resource quota set can still crash qmaster: qconf -arqs { name abc description will we crash the qmaster? enabled TRUE limit projects !{testProject,testProject2} queues * hosts * to slots=10 } 1024 7629 worker000 worker000 takes packet from priority queue.
(packet_queue->counter = 0; packet_queue->waiting = 1) 1025 7629 worker000 GDI ADD resource quota set (host2/qconf/1) (user1/0815/rad/10) 1026 7629 worker000 got new resource quota set 1027 7629 worker000 got new description 1028 7629 worker000 got new enabled 1029 7629 worker000 !!!!!!! sge_resolve_host: WARNING call with old lStringT data type, 1030 7629 worker000 !!!!!!! this data type should be replaced with lHostT data type in 1031 7629 worker000 !!!!!!! the future! Nevertheless, just a warning! Function works fine! => crash !!! Priority P3 is ok since you have to be a manager user to set up an invalid RQS! |
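For ticket #628 (IZ2898), the charge the reporter expected can be written as a small calculation. This is only a sketch of the reporter's arithmetic, not of what ACCT_RESERVED_USAGE actually does; the function name `expected_reserved_cpu_seconds` is hypothetical and not part of Grid Engine.

```python
def expected_reserved_cpu_seconds(wallclock_seconds: int, slots: int) -> int:
    """Reporter's expectation in IZ2898: wallclock time charged once per slot.

    Hypothetical helper for illustration; Grid Engine has no such function.
    """
    if wallclock_seconds < 0 or slots < 1:
        raise ValueError("wallclock_seconds must be >= 0 and slots >= 1")
    return wallclock_seconds * slots


# Values taken from the qacct output for job 233 in the ticket.
wallclock = 60      # ru_wallclock
slots = 4           # slots
recorded_cpu = 60   # cpu, as recorded with ACCT_RESERVED_USAGE enabled

expected_cpu = expected_reserved_cpu_seconds(wallclock, slots)
shortfall = expected_cpu - recorded_cpu
print(f"expected {expected_cpu} s, recorded {recorded_cpu} s, shortfall {shortfall} s")
```

With the ticket's numbers, the recorded cpu value equals the wallclock time alone, which is the gap between the reporter's reading of "reserved usage" and the behavior described in the comments.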