[GE users] qmaster using HUGE memory
seandavi at gmail.com
Fri Aug 27 15:50:45 BST 2010
On Mon, Jun 14, 2010 at 7:39 AM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
On Wed, Jun 9, 2010 at 1:08 PM, kjpursley <kevin.pursley at bp.com> wrote:
As a first pass, turn off schedd_job_info. We have seen this use a bunch
I THINK this fixed things for us. In any case, we have not had any problems recently. Thanks for the help.
This post is a bit dated, but I wanted to follow up. Turning off schedd_job_info did fix this issue (6.2u5). However, users miss the job scheduling information. Any other hints that we could try that would get us the scheduling info back? This is not a big system and I wouldn't expect scheduling for a few dozen jobs to require more than 32GB of RAM (what our qmaster has).
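
One option that may be worth testing, going by the sched_conf(5) man page: schedd_job_info accepts not only true and false but also job_list followed by a comma-separated list of job ids, which restricts the bookkeeping to those jobs. The sketch below is not taken from this thread. It assumes the scheduler configuration has been dumped to text with qconf -ssconf, and it rewrites one parameter in that text. The function name set_sched_param, the sample excerpt, and the job ids are all made up for illustration.

```python
def set_sched_param(ssconf_text, name, value):
    """Return a copy of `qconf -ssconf`-style text with one parameter replaced."""
    out, found = [], False
    for line in ssconf_text.splitlines():
        fields = line.split(None, 1)
        if fields and fields[0] == name:
            # Keep the "name <whitespace> value" layout used by qconf.
            out.append(f"{name:<33} {value}")
            found = True
        else:
            out.append(line)
    if not found:
        raise KeyError(f"{name} not present in scheduler configuration")
    return "\n".join(out) + "\n"


# Illustrative two-line excerpt, not the configuration of the cluster discussed here.
SAMPLE = (
    "schedule_interval                 0:0:15\n"
    "schedd_job_info                   true\n"
)

# 1234 and 1240 are hypothetical job ids.
limited = set_sched_param(SAMPLE, "schedd_job_info", "job_list 1234,1240")
disabled = set_sched_param(SAMPLE, "schedd_job_info", "false")
print(limited, end="")
```

The rewritten text would be saved to a file and loaded with qconf -Msconf <file>. Whether the job_list form keeps qmaster memory in check on 6.2u5 is an open question here, not a confirmed result.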
From: seandavi [mailto:seandavi at gmail.com]
Sent: Wednesday, June 09, 2010 12:02 PM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] qmaster using HUGE memory
On Wed, Jun 9, 2010 at 11:19 AM, rems0 <Richard.Ems at cape-horn-eng.com> wrote:
Is your schedd_job_info set to false (qconf -ssconf)?
Just for fun, the whole output. To answer directly, schedd_job_info=true in our setup.
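
For anyone wanting to script the same check, here is a minimal sketch that pulls a single parameter out of qconf -ssconf output. It assumes the output has already been captured as text and that each line has the form "name <whitespace> value". The function name sched_param and the sample excerpt are illustrative and are not the output posted in this thread.

```python
def sched_param(ssconf_text, name):
    """Return the value of one parameter from `qconf -ssconf`-style text, or None."""
    for line in ssconf_text.splitlines():
        fields = line.split(None, 1)  # split into name and the rest of the line
        if len(fields) == 2 and fields[0] == name:
            return fields[1].strip()
    return None


# Illustrative excerpt only; a real dump contains many more parameters.
SAMPLE = """\
algorithm                         default
schedule_interval                 0:0:15
schedd_job_info                   true
"""

print(sched_param(SAMPLE, "schedd_job_info"))
```

On a live system the text could come from subprocess.run(["qconf", "-ssconf"], capture_output=True, text=True).stdout, assuming qconf is on the PATH of an admin or submit host.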
Are you using parallel environments?
Yes. All jobs are simply SMP jobs, though, so no MPI integration.
On 06/09/2010 12:35 PM, seandavi wrote:
> Just a followup....
> The qmaster did finally come back down (after about 20 minutes) to a
> "normal" size of 40m or so. So, I suppose this is expected behavior that I
> just happened to be around to observe, and it may have happened before.
> I'm still curious as to why it might happen. I had just submitted 5
> new jobs, but they were not array jobs or anything else complicated.
> Thanks again,
> On Wed, Jun 9, 2010 at 6:24 AM, Sean Davis <seandavi at gmail.com> wrote:
> Using 6.2u5, I found this AM that jobs were not being scheduled. I
> checked around a bit and it turns out that the qmaster was using
> 30GB of RAM and the machine was thrashing. This is with no array
> jobs scheduled or running, 10 jobs in the queue, and a very small
> cluster with only about 10 nodes. The messages file is bland, I
> think, but I can post an excerpt since last restarting the qmaster
> (I have done that a couple of times). Any suggestions?
> The config looks like:
> execd_spool_dir /import/cluster/sge6_2u5/default/spool
> mailer /bin/mail
> xterm /usr/bin/X11/xterm
> load_sensor none
> prolog none
> epilog none
> shell_start_mode posix_compliant
> login_shells sh,ksh,csh,tcsh
> min_uid 0
> min_gid 0
> user_lists none
> xuser_lists none
> projects none
> xprojects none
> enforce_project false
> enforce_user auto
> load_report_time 00:00:40
> max_unheard 00:05:00
> reschedule_unknown 00:00:00
> loglevel log_warning
> administrator_mail sdavis2 at mail.nih.gov
> set_token_cmd none
> pag_cmd none
> token_extend_time none
> shepherd_cmd none
> qmaster_params none
> execd_params none
> reporting_params accounting=true reporting=true \
> flush_time=00:00:10 joblog=true
> finished_jobs 100
> gid_range 20200-20300
> qlogin_command builtin
> qlogin_daemon builtin
> rlogin_command builtin
> rlogin_daemon builtin
> rsh_command builtin
> rsh_daemon builtin
> max_aj_instances 2000
> max_aj_tasks 5000
> max_u_jobs 0
> max_jobs 0
> max_advance_reservations 0
> auto_user_oticket 0
> auto_user_fshare 100
> auto_user_default_project none
> auto_user_delete_time 86400
> delegated_file_staging false
> reprioritize false
> additional_jvm_args -Xmx2g
> jsv_url none
> jsv_allowed_mod ac,h,i,e,o,j,M,N,p,w
Richard Ems mail: Richard.Ems at Cape-Horn-Eng.com
Cape Horn Engineering S.L.
C/ Dr. J.J. Dómine 1, 5º piso
Tel : +34 96 3242923 / Fax 924