[GE users] Capping logfile size

Parikh, Neal Neal.Parikh at gs.com
Mon Oct 20 17:36:55 BST 2008


I guess there are two separate issues:

(1) Warnings. Every once in a while, some user kicks off code that
appears to produce absurdly large logfiles. These files are so large
that they sometimes fill up the common disk where all the grid
stdout/stderr output is sent. So I'd like to have a warning sent to some
admins and the job owner saying "you're running something that's already
produced over 100 MB of logs, please check" or something to that effect.

(2) Hard limits. If the issue mentioned above continues, and some users
aren't good about fixing their code to not produce such huge logs, then
I was hoping to find some way to just limit the size of the log files,
either by having SGE just stop updating those files after a certain
point, compressing the logfile and rotating it, or something like that.

In both cases, it would definitely be a cluster administrator setting; I
don't want users setting any of this at submission time.

If there is no simple way to do this, I'll find some workaround outside
SGE, but it would have been nice to have the capability.
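
To make the warning in (1) concrete, here is a minimal sketch of how such an alert could be put together outside SGE. It is not anything SGE provides: the function name, its parameters and the addresses are hypothetical, and only the 100 MB figure and the wording come from the description above.

```python
from email.message import EmailMessage

WARN_BYTES = 100 * 1024 * 1024  # the 100 MB figure mentioned in (1)


def build_log_warning(job_id, from_addr, owner_addr, admin_addrs,
                      log_path, size_bytes):
    """Build (but do not send) a warning for the job owner and the admins.

    All parameters are hypothetical; how job ids, owners and log paths are
    discovered is left to the site-specific workaround.
    """
    msg = EmailMessage()
    msg["Subject"] = f"Job {job_id}: log file over {WARN_BYTES // 2**20} MB"
    msg["From"] = from_addr
    msg["To"] = owner_addr
    msg["Cc"] = ", ".join(admin_addrs)
    msg.set_content(
        f"Job {job_id} has already produced {size_bytes / 2**20:.0f} MB "
        f"of logs in {log_path}. Please check.\n"
    )
    return msg


# Delivery is site-specific; with a local mail relay one option would be
# smtplib.SMTP("localhost").send_message(msg).
```

Keeping message construction separate from delivery means the same text can go to any set of pre-specified addresses, not just the job owner.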

Thanks,
Neal

-----Original Message-----
From: Rayson Ho [mailto:rayrayson at gmail.com] 
Sent: Monday, October 20, 2008 12:21 PM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] Capping logfile size

First of all, I would like to find out how you want to limit the file
size: do you want the job owner to set the limit at job submission
time, or do you want the limit to be set globally by the cluster
administrator?

Currently we don't have a direct way to do this, but you can use an
external load sensor (suggested by Reuti).

However, if you open an issue (see
http://gridengine.sunsource.net/issues/ ), then we may be able to add
this feature inside SGE in a future version.

I just read the code. One way to implement this feature is to add some
code to the main loop of execd: iterate through the list of jobs and
check the size of each job's out/err files (JB_stdout_path_list and
JB_stderr_path_list). This should be really simple to do (maybe a few
hours of work), but implemented this way, only the cluster
administrator would be able to set the threshold.
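
The loop described above can be sketched roughly as follows. This is Python for illustration only, not execd code: the Job class, check_jobs and the warned set are hypothetical stand-ins, and the two path fields merely play the role of the JB_stdout_path_list and JB_stderr_path_list entries.

```python
import os
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    owner: str
    stdout_path: str  # stand-in for the JB_stdout_path_list entry
    stderr_path: str  # stand-in for the JB_stderr_path_list entry


def file_size(path):
    """Size in bytes, or 0 if the file cannot be examined (e.g. not created yet)."""
    try:
        return os.path.getsize(path)
    except OSError:
        return 0


def check_jobs(jobs, limit_bytes, warned):
    """One pass over the job list: return (job, path, size) for new offenders.

    `warned` is a set of (job_id, path) pairs already reported, so a file
    is flagged once rather than on every pass of the loop.
    """
    hits = []
    for job in jobs:
        for path in (job.stdout_path, job.stderr_path):
            size = file_size(path)
            key = (job.job_id, path)
            if size > limit_bytes and key not in warned:
                warned.add(key)
                hits.append((job, path, size))
    return hits
```

Because limit_bytes is a single value passed in by whoever runs the loop, the sketch has the same property as the proposal: the threshold is set by the administrator, not per job.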

Rayson



On 10/20/08, Parikh, Neal <Neal.Parikh at gs.com> wrote:
> Thanks. This is close to what I want to do but not quite the same. I
> want to send an alert email about logfile size even if the job is still
> running; it seems like this would only allow me to send the email after
> the job is complete. Is there a way of doing that?
>
> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: Monday, October 20, 2008 10:45 AM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Capping logfile size
>
> Hi,
>
> On 20.10.2008 at 15:44, Parikh, Neal wrote:
>
> > Is it possible to include some automatic monitoring that generates an
> > email alert (to some pre-specified addresses, not just the job owner)
>
> If you would just like to kill the job, you could set s_fsize in the
> queue configuration. Then any further write would fail. But this will
> affect all file accesses of the job, not only the logfile. Reading
> bigger files should be possible though.
>
> > when a job's logfile goes over a certain file size? I want to monitor
> > stdout_path_list and stderr_path_list and would prefer to do it
> > directly within SGE.
>
> If you only want to send a warning mail after the job has finished, you
> could do that in a queue or global epilog and check $SGE_STDOUT_PATH
> and $SGE_STDERR_PATH there.
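
A minimal sketch of such an epilog check, assuming the epilog is a Python script and that a 100 MB threshold is wanted. Only the two environment variable names come from the message above; the function name and the threshold are hypothetical, and the actual mail delivery is left out.

```python
#!/usr/bin/env python3
import os
import sys

LIMIT_BYTES = 100 * 1024 * 1024  # assumed threshold, not an SGE setting


def oversized_outputs(environ, limit_bytes):
    """Return (variable, path, size) for each job output file over the limit."""
    found = []
    for var in ("SGE_STDOUT_PATH", "SGE_STDERR_PATH"):
        path = environ.get(var)
        if not path or not os.path.isfile(path):
            continue  # unset, missing, or not a regular file (e.g. /dev/null)
        size = os.path.getsize(path)
        if size > limit_bytes:
            found.append((var, path, size))
    return found


if __name__ == "__main__":
    for var, path, size in oversized_outputs(os.environ, LIMIT_BYTES):
        # Sending the mail is site-specific; this sketch only reports.
        print(f"warning: {var}={path} is {size} bytes", file=sys.stderr)
```

Since an epilog runs only once the job is over, this covers the after-the-fact warning but not an alert while the job is still running.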
>
> -- Reuti
>
> > Thanks,
> > Neal
> >
> >
---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > For additional commands, e-mail: users-help at gridengine.sunsource.net
> >
> >
>
>
