[GE users] Capping logfile size

Reuti reuti at staff.uni-marburg.de
Mon Oct 20 17:56:07 BST 2008


On 20.10.2008 at 18:36, Parikh, Neal wrote:

> I guess there are two separate issues:
>
> (1) Warnings. Every once in a while, some user kicks off code that
> appears to produce absurdly large logfiles. These files are so large
> that they sometimes fill up the common disk where all the grid
> stdout/stderr output is sent. So I'd like to have a warning sent to
> some admins and the job owner saying "you're running something that's
> already produced over 100 MB of logs, please check" or something to
> that effect.

What about disk quotas for the partition? You can run a daily (or even
hourly) "quotacheck" that sends a warning. If no one responds, the hard
limit will eventually be enforced and no further writes will be
possible. This also keeps other users from being impeded by some odd
scripts.

-- Reuti
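
To make the soft/hard idea concrete, here is a minimal sketch in Python. It is not how filesystem quotas are implemented (those are enforced by the kernel and set up with the usual quota tools); it only mimics the periodic check from a cron job. The function names and the limits passed to them are made up for illustration.

```python
import os
import stat
from collections import defaultdict


def usage_by_owner(log_dir):
    """Sum the sizes of regular files under log_dir, grouped by owner uid."""
    totals = defaultdict(int)
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            try:
                st = os.lstat(os.path.join(root, name))
            except OSError:
                continue  # file vanished between listing and stat
            if stat.S_ISREG(st.st_mode):
                totals[st.st_uid] += st.st_size
    return dict(totals)


def classify(used_bytes, soft_limit, hard_limit):
    """Return 'ok', 'warn' (soft limit reached) or 'over_hard'."""
    if used_bytes >= hard_limit:
        return "over_hard"
    if used_bytes >= soft_limit:
        return "warn"
    return "ok"
```

A cron job could call usage_by_owner() on the common output directory and mail every owner whose state is "warn". Actually refusing writes at the hard limit is something only a real quota can do.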


>
> (2) Hard limits. If the issue mentioned above continues, and some
> users aren't good about fixing their code to not produce such huge
> logs, then I was hoping to find some way to just limit the size of the
> log files, either by having SGE just stop updating those files after a
> certain point, compressing the logfile and rotating it, or something
> like that.
>
> In both cases, it will definitely be a cluster administrator setting,
> I don't want users setting any of this at submission time.
>
> If there is no simple way to do this, I'll find some workaround
> outside SGE, but it would have been nice to have the capability.
>
> Thanks,
> Neal
>
> -----Original Message-----
> From: Rayson Ho [mailto:rayrayson at gmail.com]
> Sent: Monday, October 20, 2008 12:21 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Capping logfile size
>
> First of all, I would like to find out how you want to limit the file
> size: do you want the job owner to set the limit at job submission
> time, or do you want the limit to be set globally by the cluster
> administrator?
>
> Currently we don't have a direct way to do this, but you can use an
> external load sensor (suggested by Reuti).
>
> However, if you open an issue (see
> http://gridengine.sunsource.net/issues/ ), then we may be able to add
> this feature inside SGE in a future version.
>
> I just read the code. One way to implement this feature is to add some
> code in the main loop of execd: iterate through the list of jobs and
> check the size of each job's out/err files (JB_stdout_path_list and
> JB_stderr_path_list). This should be really simple to do (maybe a few
> hours of work), but implemented this way, the threshold limit can only
> be set by the cluster administrator.
>
> Rayson
>
>
>
> On 10/20/08, Parikh, Neal <Neal.Parikh at gs.com> wrote:
>> Thanks. This is close to what I want to do but not quite the same. I
>> want to send an alert email about logfile size even if the job is
>> still running; it seems like this would only allow me to send the
>> email after the job is complete. Is there a way of doing that?
>>
>> -----Original Message-----
>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
>> Sent: Monday, October 20, 2008 10:45 AM
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] Capping logfile size
>>
>> Hi,
>>
>> On 20.10.2008 at 15:44, Parikh, Neal wrote:
>>
>>> Is it possible to include some automatic monitoring that generates an
>>> email alert (to some pre-specified addresses, not just the job owner)
>>
>> If you would just like to kill the job, you could set s_fsize in the
>> queue configuration. Then any further write would fail. But this will
>> affect all file accesses of the job, not only the logfile. Reading
>> bigger files should be possible though.
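
What such a limit does to a writing process can be tried outside SGE. The sketch below assumes that s_fsize ends up as the process's soft file size limit (RLIMIT_FSIZE) and that you are on Linux with a stock Python interpreter, which ignores SIGXFSZ so the failure shows up as an error instead of killing the process. The function name is made up for illustration.

```python
import errno
import os
import resource


def write_with_fsize_limit(path, limit_bytes, payload):
    """Write payload to path while a soft RLIMIT_FSIZE of limit_bytes is active.

    Returns (bytes_written, error_name); error_name is None if everything fit.
    """
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_FSIZE)
    resource.setrlimit(resource.RLIMIT_FSIZE, (limit_bytes, old_hard))
    written = 0
    error_name = None
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        while written < len(payload):
            # A write crossing the limit is cut short; the next one fails.
            written += os.write(fd, payload[written:])
    except OSError as exc:
        error_name = errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        os.close(fd)
        # Restore the previous soft limit (never above the hard limit).
        resource.setrlimit(resource.RLIMIT_FSIZE, (old_soft, old_hard))
    return written, error_name
```

Writing 4096 bytes with limit_bytes=1024 should stop at the limit and report EFBIG. The limit applies to every file the process writes, not only the logfile, which is the drawback mentioned above.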
>>
>>> when a job's logfile goes over a certain file size? I want to monitor
>>> stdout_path_list and stderr_path_list and would prefer to do it
>>> directly within SGE.
>>
>> If you only want to send a warning mail after the job, you could put
>> a check in a queue or global epilog that looks at $SGE_STDOUT_PATH
>> and $SGE_STDERR_PATH.
>>
>> -- Reuti
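
A rough sketch of such an epilog check in Python follows. It assumes the epilog sees SGE_STDOUT_PATH, SGE_STDERR_PATH and JOB_ID in its environment; the function names and the size limit are made up for illustration, and delivering the mail is left out.

```python
import os


def oversized_logs(environ, limit_bytes):
    """Return [(path, size)] for the job's stdout/stderr files above limit_bytes."""
    found = []
    seen = set()
    for var in ("SGE_STDOUT_PATH", "SGE_STDERR_PATH"):
        path = environ.get(var)
        # Skip unset variables, duplicates (merged output) and non-files.
        if not path or path in seen or not os.path.isfile(path):
            continue
        seen.add(path)
        size = os.path.getsize(path)
        if size > limit_bytes:
            found.append((path, size))
    return found


def warning_text(environ, found):
    """Build the body of the warning mail for the oversized files."""
    job_id = environ.get("JOB_ID", "unknown")
    lines = ["Job %s produced large log files:" % job_id]
    lines += ["  %s: %d bytes" % (path, size) for path, size in found]
    return "\n".join(lines)
```

In the epilog script you would call oversized_logs(os.environ, 100 * 1024 * 1024) and, if the list is not empty, pipe warning_text(...) to the local mail command addressed to the admins and the job owner.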
>>
>>> Thanks,
>>> Neal
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>> For additional commands, e-mail: users-help at gridengine.sunsource.net