[GE users] very large messages file

templedf dan.templeton at sun.com
Fri Jan 29 15:34:51 GMT 2010

While SGE does come with a script to rotate the log files, please be 
aware that it's pretty rudimentary.  There are several better log 
rotation utilities available from third parties.
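As a rough illustration of what size-based rotation involves, here is a minimal Python sketch. It is not SGE's logchecker.sh and not any particular third-party tool. The function name rotate_if_large, the 100 MB threshold, the number of kept copies and the copy-then-truncate strategy are all assumptions. It would need to run regularly (e.g. from cron) on each execution host *before* the disk fills, because the copy step itself needs free space.

```python
import shutil
from pathlib import Path


def rotate_if_large(log_path: str, max_bytes: int = 100 * 1024 * 1024, keep: int = 5) -> bool:
    """Hypothetical size-based rotation (copy-then-truncate).

    If log_path is larger than max_bytes, keep up to `keep` old copies
    (messages.1 is newest) and empty the live file. Returns True if rotated.
    """
    path = Path(log_path)
    if not path.exists() or path.stat().st_size <= max_bytes:
        return False

    # Discard the oldest copy, then shift messages.N -> messages.N+1.
    oldest = path.with_name(f"{path.name}.{keep}")
    if oldest.exists():
        oldest.unlink()
    for i in range(keep - 1, 0, -1):
        src = path.with_name(f"{path.name}.{i}")
        if src.exists():
            src.rename(path.with_name(f"{path.name}.{i + 1}"))

    # Copy the live file to messages.1, then truncate it in place, so a
    # daemon that still has it open keeps writing to the same file.
    shutil.copy2(path, path.with_name(f"{path.name}.1"))
    with open(path, "r+b") as fh:
        fh.truncate(0)
    return True


if __name__ == "__main__":
    # Assumed spool layout taken from this thread; "nodename" is a placeholder.
    rotate_if_large("/opt/sge6-2/ge6.2u3/default/spool/nodename/messages")
```

Copy-then-truncate is chosen here because it does not assume anything about whether the daemon reopens its log file; a plain rename would be simpler if it does.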


On 01/29/10 07:25, reuti wrote:
> Hi,
> On 29.01.2010 at 10:36, cgull wrote:
>> Hi,
>> Last night on a cluster running SGE, a job appeared to run OK, but
>> at the end of the run the following error was written to the log
>> for all of the nodes:
>> "error: commlib error: got read error (closing "nodename/shepherd_ijs/1")"
>> I can then see the following error messages on one of the nodes:
>> "main|nuvo|E|slave shepherd of job 1094.1 exited with exit status = 11"
>> "main|nuvo|E|abnormal termination of shepherd for job 1094.1 task
>> 10.nuvo: "exit_status" file is empty"
>> The next job that tried to run on these machines was then unable
>> to start because the directory had filled up.
>> The hard disk appeared to fill up on a few nodes because the messages
>> file in the directory /opt/sge6-2/ge6.2u3/default/spool/"nodename"
>> grew very large.
>> I am unsure whether the full directory caused the job not to exit
>> cleanly, or whether the unclean exit produced the very large messages
>> file.
>> A few of the nodes had the message
>> main|"nodename"|W|get exit ack for pe task 1."nodename" but task is not in state exiting
>> repeated many times, making the files over 20 GB(!) and filling
>> the disks.
>> Any ideas as to what the actual problem was and how to fix this so
>> that it does not happen again?
>> For now I have removed the very large messages files and restarted
>> the SGE daemons, and jobs are launching and exiting OK.
> There is a ready-made script to rotate SGE's log files:
> $SGE_ROOT/util/logchecker.sh
> -- Reuti
>> Thanks for your time in advance.

