[GE users] Capping logfile size

Reuti reuti at staff.uni-marburg.de
Tue Oct 21 15:02:24 BST 2008


On 21.10.2008, at 15:50, Parikh, Neal wrote:

> To clarify, I don't care about the user filling up his own space (for
> example his home directory), since that doesn't really happen in
> practice (plus it only affects that user, rather than all the other
> users). What happens is that they don't write code properly, and stderr
> or stdout gets completely flooded, unintentionally, by some infinite
> loop. So there is only one directory, the common logfile directory,
> that I am concerned about. That should be much lower overhead than what
> you were asking about.

If your workflow is to put all logfiles into one directory and that
directory lives on its own partition, you can even set up a disk quota
for this partition with limits different from those on /home, so that one
user's runaway job doesn't affect other users. Disk quotas don't check
the size of a directory; they add up the size of all files belonging to
each user on that filesystem.
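A minimal Python sketch of the accounting described above: it charges every regular file under a directory to its owner's UID and sums the apparent sizes, regardless of which subdirectory a file sits in. It only illustrates the idea; real quotas are enforced by the kernel per filesystem and count allocated blocks rather than `st_size`. The function name `usage_by_owner` is hypothetical.

```python
import os
import stat
from collections import defaultdict


def usage_by_owner(directory: str) -> dict[int, int]:
    """Sum the apparent sizes of all regular files under `directory`,
    grouped by the UID that owns each file (the way a per-user quota
    charges usage, ignoring which directory a file sits in)."""
    totals = defaultdict(int)
    for root, _dirs, files in os.walk(directory):
        for name in files:
            st = os.lstat(os.path.join(root, name))
            if stat.S_ISREG(st.st_mode):  # skip symlinks, FIFOs, sockets
                totals[st.st_uid] += st.st_size
    return dict(totals)
```

Comparing each total against a per-user limit is the part a real quota does for you at write time, which is why it catches a flooding job even though no single directory size is checked.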

-- Reuti


>
> Yes, I will open an issue.
>
> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: Tuesday, October 21, 2008 7:11 AM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Capping logfile size
>
> On 21.10.2008, at 06:51, Ron Chen wrote:
>
>> --- On Tue, 10/21/08, Parikh, Neal <Neal.Parikh at gs.com> wrote:
>>> (2) Hard limits. If the issue mentioned above continues, and some
>>> users aren't good about fixing their code to not produce such huge
>>> logs, then I was hoping to find some way to just limit the size of
>>> the log files, either by having SGE just stop updating those files
>>> after a certain point, compressing the logfile and rotating it, or
>>> something like that.
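The "compress and rotate" idea can be sketched in Python under one assumption: the running job keeps the log open, so the file can't simply be renamed. Instead the current contents are copied into a gzip archive and the live file is truncated, in the style of logrotate's copytruncate option. `rotate_log` and its `keep` parameter are hypothetical; anything the job writes between the copy and the truncate is lost, and the truncate step has the concurrency problem discussed in this thread.

```python
import gzip
import os
import shutil


def rotate_log(path: str, keep: int = 3) -> str:
    """Compress the current contents of `path` into `path.1.gz`, shift older
    archives up (path.2.gz, ...), keep at most `keep` archives, and truncate
    `path` to zero length without renaming it."""
    # Shift existing archives: path.(keep-1).gz -> path.keep.gz, ..., 1 -> 2.
    # The oldest archive is overwritten, so at most `keep` survive.
    for i in range(keep - 1, 0, -1):
        older = f"{path}.{i}.gz"
        if os.path.exists(older):
            os.replace(older, f"{path}.{i + 1}.gz")
    archive = f"{path}.1.gz"
    with open(path, "rb") as src, gzip.open(archive, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # Truncate in place: the job's open descriptor still points at this file.
    os.truncate(path, 0)
    return archive
```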
>>
>> Note that the user can submit a job that fills up the temp
>> directory or the user's home directory, and SGE won't be able to
>> detect that! In order to make it 100% loophole free, SGE will need
>> to trap all the system calls performed by the job, and that's high
>> overhead IMO.
>>
>> I think the truncate(2) system call can be used to reduce the file
>> size, but we will need to discuss how everything fits together, since
>> the job may still be writing new data to those files at the moment SGE
>> truncates them.
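The truncation problem can be made concrete with a small Python experiment (a sketch, POSIX only): a "job" writes 100 bytes, a "monitor" truncates the file, and the job writes 10 more bytes through its still-open descriptor. Without `O_APPEND` the writer keeps its old offset, so the file grows straight back to 110 bytes, the first 100 now being a hole that reads as zero bytes; with `O_APPEND` every write goes to the current end, so the file ends up at 10 bytes. Whether SGE opens job output files with `O_APPEND` is not stated in this thread; `size_after_truncate` is a hypothetical helper.

```python
import os
import tempfile


def size_after_truncate(append: bool) -> int:
    """Write 100 bytes, truncate the file to 0 via its path (as an outside
    monitor would), write 10 more bytes via the original descriptor, and
    return the resulting file size."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if append:
        flags |= os.O_APPEND
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "job.out")
        fd = os.open(path, flags, 0o644)
        try:
            os.write(fd, b"x" * 100)  # the job writes its output
            os.truncate(path, 0)      # the monitor truncates the file
            os.write(fd, b"y" * 10)   # the job keeps writing
        finally:
            os.close(fd)
        return os.path.getsize(path)
```

On Linux, `size_after_truncate(False)` returns 110 and `size_after_truncate(True)` returns 10, which is why truncating a file the job is still writing only helps if the writer appends.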
>>
>> Neal, can you open an issue so that we can track this feature
>> request? Otherwise after a week or two we will forget all this
>> discussion.
>
> This also depends on the OS. AFAIK in NEC's Super-UX you can set the
> user limit "fspace" for the space allocated by all files of a process
> in total (in addition to fsize where it's per file).
>
> -- Reuti
>
>
>>  -Ron
>>
>>
>>>
>>> In both cases, it will definitely be a cluster administrator setting;
>>> I don't want users setting any of this at submission time.
>>>
>>> If there is no simple way to do this, I'll find some workaround
>>> outside SGE, but it would have been nice to have the capability.
>>>
>>> Thanks,
>>> Neal
>>>
>>> -----Original Message-----
>>> From: Rayson Ho [mailto:rayrayson at gmail.com]
>>> Sent: Monday, October 20, 2008 12:21 PM
>>> To: users at gridengine.sunsource.net
>>> Subject: Re: [GE users] Capping logfile size
>>>
>>> First of all, I would like to find out how you want to limit the file
>>> size: do you want the job owner to set the limit at job submission
>>> time, or do you want the limit to be set globally by the cluster
>>> administrator?
>>>
>>> Currently we don't have a direct way to do this, but you can use an
>>> external load sensor (suggested by Reuti).
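An external load sensor of this kind could be sketched in Python as follows. It follows the Grid Engine load sensor protocol: execd writes a line to the sensor's stdin whenever it wants values, the sensor answers with `begin`, one `host:name:value` line per value, and `end`, and it exits when it reads `quit`. The directory `LOG_DIR` and the complex name `logdir_bytes` are hypothetical; the complex would presumably need to be added to the complex configuration (`qconf -mc`) before the value can be used in thresholds.

```python
import os
import socket
import sys
from typing import TextIO

# Hypothetical settings: adjust the directory, and define a matching
# complex (qconf -mc) if the value is to be used in thresholds.
LOG_DIR = "/path/to/common/logdir"
COMPLEX_NAME = "logdir_bytes"


def dir_size(directory: str) -> int:
    """Total apparent size in bytes of the files under `directory`."""
    total = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except FileNotFoundError:
                pass  # a job removed the file while we were walking
    return total


def report(host: str, directory: str) -> str:
    """One load report: begin / host:name:value / end."""
    return f"begin\n{host}:{COMPLEX_NAME}:{dir_size(directory)}\nend\n"


def serve(stdin: TextIO, stdout: TextIO, host: str, directory: str) -> None:
    """Answer each line from execd with a report; stop on "quit"."""
    for line in stdin:
        if line.strip() == "quit":
            break
        stdout.write(report(host, directory))
        stdout.flush()  # execd waits for the report, so never buffer it


def run() -> None:
    """Entry point for the real sensor script (call run() at the bottom).
    socket.gethostname() is an assumption: it must match the host name
    SGE knows this machine by."""
    serve(sys.stdin, sys.stdout, socket.gethostname(), LOG_DIR)
```

A load sensor only reports the number; acting on it (a load threshold, an alarm, a mail) has to be configured separately.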
>>>
>>> However, if you open an issue (see
>>> http://gridengine.sunsource.net/issues/ ), then we may be able to add
>>> this feature inside SGE in a future version.
>>>
>>> I just read the code. One way to implement this feature is to add some
>>> code to the main loop of execd that iterates through the list of jobs
>>> and checks the size of each job's out/err file (JB_stdout_path_list
>>> and JB_stderr_path_list). This should be really simple to do (maybe a
>>> few hours of work), but if we implement it this way, the threshold can
>>> only be set by the cluster administrator.
>>>
>>> Rayson
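The check proposed above, one pass over the running jobs that stats each job's stdout and stderr file, can be sketched in Python. This is not execd's actual code (which is C): the `Job` record and `oversized_outputs` are hypothetical stand-ins for the job list and the JB_stdout_path_list / JB_stderr_path_list fields, and `limit_bytes` plays the role of the administrator-set threshold.

```python
import os
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical stand-in for an execd job list entry.
    job_id: int
    stdout_path: str
    stderr_path: str


def oversized_outputs(jobs: list[Job], limit_bytes: int) -> list[tuple[int, str, int]]:
    """One pass of the check: return (job_id, path, size) for every job
    output file larger than `limit_bytes`. Files that don't exist yet are
    skipped."""
    hits = []
    for job in jobs:
        # stdout and stderr may be the same file (merged with -j y), so
        # dict.fromkeys drops the duplicate while keeping the order.
        for path in dict.fromkeys((job.stdout_path, job.stderr_path)):
            try:
                size = os.stat(path).st_size
            except FileNotFoundError:
                continue
            if size > limit_bytes:
                hits.append((job.job_id, path, size))
    return hits
```

In execd, a pass like this would run once per main-loop iteration; what to do with a hit (mail, suspend, kill) is a separate policy decision.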
>>>
>>>
>>>
>>> On 10/20/08, Parikh, Neal <Neal.Parikh at gs.com> wrote:
>>>> Thanks. This is close to what I want to do, but not quite the same. I
>>>> want to send an alert email about the logfile size even while the job
>>>> is still running; it seems like this would only allow me to send the
>>>> email after the job is complete. Is there a way of doing that?
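Because an epilog only runs after the job has finished, an alert while the job is still running has to come from something that runs periodically, for example a cron job outside SGE. A hedged Python sketch of the mail part, using the standard `email` and `smtplib` modules; the recipient list, sender address and SMTP relay are placeholders, not anything SGE provides.

```python
import os
import smtplib
from email.message import EmailMessage

# Hypothetical values: adjust to the site's alert list and mail relay.
ALERT_RECIPIENTS = ["hpc-admins@example.com"]
SENDER = "sge-monitor@example.com"


def build_alert(job_id: str, path: str, size: int, limit: int) -> EmailMessage:
    """Compose a warning about a running job's oversized output file."""
    msg = EmailMessage()
    msg["From"] = SENDER
    msg["To"] = ", ".join(ALERT_RECIPIENTS)
    msg["Subject"] = f"Job {job_id}: {os.path.basename(path)} is {size} bytes"
    msg.set_content(
        f"Job {job_id} is still running and its output file\n"
        f"  {path}\n"
        f"has grown to {size} bytes (limit {limit} bytes).\n"
    )
    return msg


def send_alert(msg: EmailMessage, relay: str = "localhost") -> None:
    # Assumes an SMTP relay that accepts mail from this host.
    with smtplib.SMTP(relay) as smtp:
        smtp.send_message(msg)
```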
>>>>
>>>> -----Original Message-----
>>>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
>>>> Sent: Monday, October 20, 2008 10:45 AM
>>>> To: users at gridengine.sunsource.net
>>>> Subject: Re: [GE users] Capping logfile size
>>>>
>>>> Hi,
>>>>
>>>> On 20.10.2008, at 15:44, Parikh, Neal wrote:
>>>>
>>>>> Is it possible to include some automatic monitoring that generates
>>>>> an email alert (to some pre-specified addresses, not just the job
>>>>> owner)
>>>>
>>>> If you would just like to kill the job, you could set s_fsize in the
>>>> queue configuration. Then any further write would fail. But this will
>>>> affect all file accesses of the job, not only the logfile. Reading
>>>> bigger files should still be possible, though.
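A per-file limit like s_fsize comes down, as assumed here, to the RLIMIT_FSIZE resource limit of setrlimit(2). This Python sketch demonstrates the effect in the current process on Linux: with a 100-byte limit, a 1000-byte write stores 100 bytes and the next write fails with EFBIG. SIGXFSZ, which the kernel sends when the limit is exceeded and which terminates a process by default, is ignored here so the failure shows up as an error instead of killing the interpreter. `write_with_fsize_limit` is a hypothetical helper, and like s_fsize the limit applies to every file the process writes, not just the log.

```python
import errno
import os
import resource
import signal
import tempfile


def write_with_fsize_limit(limit_bytes: int, payload: bytes) -> int:
    """Temporarily lower the soft RLIMIT_FSIZE to `limit_bytes`, try to write
    `payload` to a temporary file, and return how many bytes were written.
    The previous limit and SIGXFSZ handler are restored afterwards."""
    old_limits = resource.getrlimit(resource.RLIMIT_FSIZE)
    # Ignore SIGXFSZ so exceeding the limit raises OSError(EFBIG) instead.
    old_handler = signal.signal(signal.SIGXFSZ, signal.SIG_IGN)
    written = 0
    try:
        resource.setrlimit(resource.RLIMIT_FSIZE, (limit_bytes, old_limits[1]))
        with tempfile.TemporaryFile() as f:
            fd = f.fileno()
            while written < len(payload):
                try:
                    n = os.write(fd, payload[written:])
                except OSError as exc:
                    if exc.errno == errno.EFBIG:
                        break  # the file hit the size limit
                    raise
                written += n
    finally:
        resource.setrlimit(resource.RLIMIT_FSIZE, old_limits)
        if old_handler is not None:
            signal.signal(signal.SIGXFSZ, old_handler)
    return written
```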
>>>>
>>>>> when a job's logfile goes over a certain file size? I want to
>>>>> monitor stdout_path_list and stderr_path_list and would prefer to do
>>>>> it directly within SGE.
>>>>
>>>> If you only want to send a warning mail after the job, you could put
>>>> it in a queue or global epilog and check $SGE_STDOUT_PATH and
>>>> $SGE_STDERR_PATH there.
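A minimal epilog along these lines, written in Python as a sketch: it reads SGE_STDOUT_PATH and SGE_STDERR_PATH from the environment and reports the files above a threshold. `LIMIT_BYTES` is a hypothetical administrator setting, and using JOB_ID in the message assumes it is present in the epilog environment.

```python
import os

# Hypothetical threshold chosen by the cluster administrator.
LIMIT_BYTES = 100 * 1024 * 1024


def oversized_job_outputs(env=os.environ, limit=LIMIT_BYTES):
    """Return (path, size) for the job's stdout/stderr files above `limit`,
    taking the paths from SGE_STDOUT_PATH and SGE_STDERR_PATH."""
    found = []
    seen = set()
    for var in ("SGE_STDOUT_PATH", "SGE_STDERR_PATH"):
        path = env.get(var)
        if not path or path in seen:  # unset, or stdout/stderr merged
            continue
        seen.add(path)
        try:
            size = os.path.getsize(path)
        except OSError:
            continue  # e.g. the file is not visible from this host
        if size > limit:
            found.append((path, size))
    return found


if __name__ == "__main__":
    for path, size in oversized_job_outputs():
        print(f"warning: job {os.environ.get('JOB_ID', '?')}: {path} is {size} bytes")
```

Where the printed warning ends up depends on how the epilog's own output is handled; in practice it would be mailed to the pre-specified addresses instead.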
>>>>
>>>> -- Reuti
>>>>
>>>>> Thanks,
>>>>> Neal
>>>>>
>>>>>
>>> -------------------------------------------------------------------- 
>>> -
>>>>> To unsubscribe, e-mail:
>>> users-unsubscribe at gridengine.sunsource.net
>>>>> For additional commands, e-mail:
>>> users-help at gridengine.sunsource.net

