[GE users] db_checkpoint keeps running

Ondrej Bojar bojar at ufal.ms.mff.cuni.cz
Mon Dec 17 23:47:29 GMT 2007


Hi, all.

Our SGE installation has recently started leaving db_checkpoint processes 
running. Because db_checkpoint appears to be launched regularly (every minute 
or so), our master node soon runs out of open file descriptors or process IDs.

Any ideas what could be causing the problem? A broken Berkeley DB? Stale 
lockfiles? Someone's faulty job?

This is one entry from 'ps auxwf' (there are more than 600 such entries now):

sgeadmin  2697  0.0  0.3   9896  3216 ?        S    Dec17   0:00  \_ crond
sgeadmin  2698  0.0  0.0   2556   960 ?        Ss   Dec17   0:00  |   \_ /bin/sh /net/projects/SGE/AMD64_6.1u2/util/bdb_checkpoint.sh /net/projects/SGE/AMD64_6.1u2 default /net/projects/SGE/AMD64_6.1u2/default/spooldb
sgeadmin  2762  0.0  0.0   2556   500 ?        S    Dec17   0:00  |       \_ /bin/sh /net/projects/SGE/AMD64_6.1u2/util/bdb_checkpoint.sh /net/projects/SGE/AMD64_6.1u2 default /net/projects/SGE/AMD64_6.1u2/default/spooldb
sgeadmin  2763  0.0  0.0   3528   792 ?        S    Dec17   0:22  |           \_ /net/projects/SGE/AMD64_6.1u2/utilbin/lx24-x86/db_checkpoint -1 -h /net/projects/SGE/AMD64_6.1u2/default/spooldb
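
For anyone who wants to check their own master node, a count like the one above could be obtained roughly as follows. This is only a sketch: the helper name count_db_checkpoint is made up for illustration, and the grep pattern assumes the binary shows up in the process list as ".../db_checkpoint " followed by its arguments, as in the listing above.

```shell
# Hypothetical helper: count processes whose command line runs the
# db_checkpoint binary. It reads ps-style output on stdin, so it also
# works on a saved listing.
count_db_checkpoint() {
    # Match "/db_checkpoint " so the bdb_checkpoint.sh wrapper lines are
    # not counted; grep -c prints the number of matching lines.
    # "|| true" keeps the exit status clean when the count is 0.
    grep -c '/db_checkpoint ' || true
}

# Example against the live process table (user name as in the listing above):
#   ps -u sgeadmin -o pid,etime,args | count_db_checkpoint
```

The etime column in the example shows how long each leftover process has been alive, which helps to tell a slow checkpoint from one that is really stuck.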

Thanks for any hints, Ondrej.

-- 
Ondrej Bojar (mailto:obo at cuni.cz / bojar at ufal.mff.cuni.cz)
http://www.cuni.cz/~obo
