Custom Query (431 matches)
Results (97 - 99 of 431)
Ticket | Resolution | Summary | Owner | Reporter |
---|---|---|---|---|
#360 | fixed | IZ2062: Memory leak in qmaster | olle | |
Description |
[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=2062]
Issue #: 2062
Platform: All
Reporter: olle (olle)
Component: gridengine
OS: Linux
Subcomponent: qmaster
Version: 6.0u8
CC: reuti, uddeborg
Status: REOPENED
Priority: P3
Resolution:
Issue type: DEFECT
Target milestone: 6.0u8
Assigned to: ernst (ernst)
QA Contact: ernst
URL:
* Summary: Memory leak in qmaster
Status whiteboard:
Attachments:
Issue 2062 blocks:
Votes for issue 2062:
Opened: Wed May 24 02:53:00 -0700 2006
------------------------
After five days of uptime our qmaster process is above 10GB and is still growing in size. I have no idea how to debug this on a running cluster in production, and I have not managed to reproduce it on a smaller environment. Any ideas are welcome.
------- Additional comments from uddeborg Wed May 24 06:00:42 -0700 2006 -------
The requirement to enter a comment to add yourself as a CC is a bit silly.
------- Additional comments from reuti Mon Sep 18 06:00:27 -0700 2006 -------
For us this seems to happen, when the accounting file reaches a certain size. Deleting the accounting file and restarting the qmaster solved the problem apparently.
------- Additional comments from olle Mon Sep 18 06:30:33 -0700 2006 -------
Any idea on what size would trigger it? We have a daily rotation of the accounting file and usually less than 30000 lines in one file.
------- Additional comments from andreas Mon Sep 18 07:09:04 -0700 2006 -------
I have really no idea how I could explain it. The qmaster never ever reads in the accounting file. All qmaster does is append a line to accounting file for each record. Actually, if qmaster memory growth can be reproduced, it would be interesting to see whether 'accounting_flush_time' setting in sge_conf(5) has any effect on it ...
------- Additional comments from joga Fri Jan 26 04:43:20 -0700 2007 -------
Has been fixed in 6.0u8.
Problem was: qmaster buffers the accounting records, and writes the buffered data in fixed intervals. When closing the accounting file after writing failed (e.g. when the filesystem was full), the buffer was not deleted. Beginning with 6.0u8, the buffer is always cleared, regardless if the writing succeeded. Of course this may lead to a data loss, if for example the filesystem is full.
------- Additional comments from olle Mon Feb 19 08:44:41 -0700 2007 -------
> Has been fixed in 6.0u8
I think I reported this issue on 6.0u8. It might have been introduced in earlier versions, but it was not fixed in the courtesy binaries of version 6.0u8. |
|||
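The failure mode joga describes in ticket #360 (a buffer that is flushed at intervals but kept when the write fails) can be illustrated with a small sketch. This is not qmaster code: the class `AccountingBuffer`, the flag `clear_on_failure`, and the helper `simulate` are hypothetical names invented for illustration, and the write failure is simulated with a path whose parent directory does not exist rather than with a full filesystem.

```python
import os
import tempfile


class AccountingBuffer:
    """Hypothetical stand-in for an in-memory buffer of accounting records."""

    def __init__(self, path, clear_on_failure=True):
        self.path = path
        # True models the behaviour described for the fix;
        # False models the leak (buffer kept after a failed write).
        self.clear_on_failure = clear_on_failure
        self.records = []

    def append(self, record):
        self.records.append(record)

    def flush(self):
        """Append buffered records to the file; return True on success."""
        try:
            with open(self.path, "a") as fh:
                fh.writelines(line + "\n" for line in self.records)
        except OSError:
            if self.clear_on_failure:
                # Memory stays bounded, but these records are lost.
                self.records.clear()
            return False
        self.records.clear()
        return True


def simulate(clear_on_failure, intervals=3, per_interval=2):
    """Run several flush intervals against an unwritable path.

    Returns the number of records still held in memory afterwards.
    """
    with tempfile.TemporaryDirectory() as base:
        # The "missing" directory is never created, so open() raises OSError.
        bad_path = os.path.join(base, "missing", "accounting")
        buf = AccountingBuffer(bad_path, clear_on_failure)
        for _ in range(intervals):
            for i in range(per_interval):
                buf.append(f"record-{i}")
            buf.flush()
        return len(buf.records)
```

With three intervals of two records each, `simulate(False)` leaves six records in memory, while `simulate(True)` leaves none. The second variant trades unbounded growth for the data loss that joga's comment also mentions.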
#361 | invalid | IZ2065: qstat -r -xml is missing entry | ovid | |
Description |
[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=2065]
Issue #: 2065
Platform: Sun
Reporter: ovid (ovid)
Component: gridengine
OS: All
Subcomponent: clients
Version: 6.0u4
CC: None defined
Status: NEW
Priority: P4
Resolution:
Issue type: DEFECT
Target milestone: ---
Assigned to: roland (roland)
QA Contact: roland
URL:
* Summary: qstat -r -xml is missing entry
Status whiteboard:
Attachments:
Issue 2065 blocks:
Votes for issue 2065:
Opened: Fri May 26 13:48:00 -0700 2006
------------------------
qstat -r -xml does not have a Master queue entry. Here's a fragment of a plain qstat -r output:
sgetest@dt218-123# qstat -r
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
7675 0.50500 STARTED sgetest r 05/26/2006 13:07:38 all.q@dt218-123 1
Full jobname: STARTED
Master queue: all.q@dt218-123
Hard Resources:
Soft Resources:
Hard requested queues: all.q
.....
and here's the corresponding qstat -r -xml output:
sgetest@dt218-123# qstat -r -xml
<?xml version='1.0'?>
<job_info xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<queue_info>
<job_list state="running">
<JB_job_number>7675</JB_job_number>
<JAT_prio>0.50500</JAT_prio>
<JB_name>STARTED</JB_name>
<JB_owner>sgetest</JB_owner>
<state>r</state>
<JAT_start_time>05/26/2006 13:07:38</JAT_start_time>
<queue_name>all.q@dt218-123</queue_name>
<slots>1</slots>
<hard_req_queue>all.q</hard_req_queue>
</job_list>
Notice that there is no Master queue entry in the XML output.
------- Additional comments from ovid Fri May 26 14:56:48 -0700 2006 -------
It also lacks an entry for Full jobname. qstat -r looks like this:
....
7750 0.00000 HOLD sgetest hqw 05/26/2006 14:18:21 1
Full jobname: HOLD
Hard Resources:
Soft Resources:
Hard requested queues: all.q
while qstat -r -xml looks like this:
......
<job_list state="pending">
<JB_job_number>7750</JB_job_number>
<JAT_prio>0.00000</JAT_prio>
<JB_name>HOLD</JB_name>
<JB_owner>sgetest</JB_owner>
<state>hqw</state>
<JB_submission_time>05/26/2006 14:18:21</JB_submission_time>
<queue_name></queue_name>
<slots>1</slots>
<hard_req_queue>all.q</hard_req_queue>
</job_list>
</job_info>
</job_info>
Note there is only one entry for JB_name, but not for Full jobname.
------- Additional comments from ovid Fri May 26 15:52:17 -0700 2006 -------
For Hard Resources, output is inconsistent between qstat -r and qstat -r -xml. For qstat -r, we have:
....
7794 0.55500 AMD64 sgetest r 05/26/2006 15:13:28 all.q@dt218-151 1 5
Full jobname: AMD64
Master queue: all.q@dt218-151
Hard Resources: arch=lx24-amd64 (0.000000)
Soft Resources:
while for qstat -r -xml, we have
.....
<job_list state="running">
<JB_job_number>7794</JB_job_number>
<JAT_prio>0.55500</JAT_prio>
<JB_name>AMD64</JB_name>
<JB_owner>sgetest</JB_owner>
<state>r</state>
<JAT_start_time>05/26/2006 15:13:28</JAT_start_time>
<queue_name>all.q@dt218-151</queue_name>
<slots>1</slots>
<tasks>5</tasks>
<hard_request name="arch" resource_contribution="0.000000">lx24-amd64</hard_request>
</job_list>
Note that here we don't have the "arch=" part. |
|||
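For readers comparing the two output formats in ticket #361, the following sketch parses the XML fragment quoted for job 7794 and lists what it actually contains. It uses only Python's standard `xml.etree.ElementTree`; the helper name `hard_resources` is hypothetical, and the fragment is copied from the report rather than produced by a live `qstat`. It makes no claim about why the ticket was resolved as invalid.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the report (qstat -r -xml, job 7794).
FRAGMENT = """\
<job_list state="running">
  <JB_job_number>7794</JB_job_number>
  <JAT_prio>0.55500</JAT_prio>
  <JB_name>AMD64</JB_name>
  <JB_owner>sgetest</JB_owner>
  <state>r</state>
  <JAT_start_time>05/26/2006 15:13:28</JAT_start_time>
  <queue_name>all.q@dt218-151</queue_name>
  <slots>1</slots>
  <tasks>5</tasks>
  <hard_request name="arch" resource_contribution="0.000000">lx24-amd64</hard_request>
</job_list>
"""

job = ET.fromstring(FRAGMENT)

# Names of the child elements present in the fragment, in document order.
element_names = [child.tag for child in job]


def hard_resources(job_element):
    """Rebuild the plain-text 'name=value (contribution)' form.

    The resource name is taken from the 'name' attribute of each
    hard_request element, the value from its text content.
    """
    return [
        f"{req.get('name')}={req.text} ({req.get('resource_contribution')})"
        for req in job_element.findall("hard_request")
    ]
```

In this fragment the resource name appears as the `name` attribute of `hard_request` instead of as an `arch=` prefix, so `hard_resources(job)` reproduces the plain-text form `arch=lx24-amd64 (0.000000)`. The list `element_names` contains `queue_name` but no element whose name mentions a master queue.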
#362 | fixed | IZ2066: drmaa_run_bulk_jobs() input parameter data types must be consistent with 1.0 specification | andreas | |
Description |
[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=2066]
Issue #: 2066
Platform: Sun
Reporter: andreas (andreas)
Component: gridengine
OS: All
Subcomponent: drmaa
Version: 6.0
CC: None defined
Status: NEW
Priority: P3
Resolution:
Issue type: DEFECT
Target milestone: ---
Assigned to: templedf (templedf)
QA Contact: templedf
URL:
* Summary: drmaa_run_bulk_jobs() input parameter data types must be consistent with 1.0 specification
Status whiteboard:
Attachments:
Issue 2066 blocks:
Votes for issue 2066:
Opened: Wed May 31 07:06:00 -0700 2006
------------------------
DESCRIPTION: The DRMAA 1.0 specification defines the drmaa_run_bulk_jobs parameters as:
start, end - unsigned integer
incr - signed integer
In the DRMAA 0.95 binding, 'start' and 'end' are signed integers. |