[GE users] 6.2 qping and deadlocks

Justin Ottley ottley at coredp.com
Thu Oct 30 14:33:15 GMT 2008



Hey all,
I'm debugging a new 6.2 install and noticed that the qping 'info' output
looks like this:

info:   MAIN: E (60015.99) | signaler000: E (60015.58) | 
event_master000: E (0.63) | timer000: E (0.63) | worker000: E (59431.99) 
| worker001: E (59446.31) | listener000: E (3.53) | listener001: E 
(3.53) | scheduler000: E (0.63) | ERROR

pretty much all the time, even under the following conditions:
- no visible problem (jobs get queued and run; commands like qstat,
qconf, etc. work; qmon is functional)
- a clean, minimal install of 6.2 using Berkeley DB RPC (no execd, no
shadowd, no ARCo)
- a clean, minimal install of 6.2 using classic spooling (no execd, no
shadowd, no ARCo)

Does anyone know whether this behavior is normal?
The output of qping on a 6.1 install on the same box shows no such
errors (I acknowledge the qping info format is different in 6.1, but it
shows OK).
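
For anyone who wants to compare such status lines between installs, here is a rough Python sketch that splits the 6.2 'info' line above into per-thread entries. The function name and the regular expression are my own, inferred only from the one sample in this message; I am not assuming anything about what the letter or the number in parentheses means, and other qping versions may print a different layout.

```python
import re

# Sample copied from the qping output above, with the mail line wraps joined.
SAMPLE = (
    "info:   MAIN: E (60015.99) | signaler000: E (60015.58) | "
    "event_master000: E (0.63) | timer000: E (0.63) | worker000: E (59431.99) "
    "| worker001: E (59446.31) | listener000: E (3.53) | listener001: E "
    "(3.53) | scheduler000: E (0.63) | ERROR"
)

# One entry looks like "<thread name>: <letter> (<number>)" in the sample.
ENTRY = re.compile(r"(\w+):\s+(\w)\s+\(([\d.]+)\)")


def parse_qping_info(text):
    """Return (entries, overall) parsed from a qping 'info' status line.

    entries is a list of (name, state, value) tuples; overall is the
    trailing word after the last '|' (e.g. "ERROR"), or None if absent.
    """
    body = text.split("info:", 1)[1] if "info:" in text else text
    entries = []
    overall = None
    for part in (p.strip() for p in body.split("|")):
        match = ENTRY.fullmatch(part)
        if match:
            entries.append((match.group(1), match.group(2), float(match.group(3))))
        elif part:
            # Anything that is not a thread entry is treated as the summary word.
            overall = part
    return entries, overall


if __name__ == "__main__":
    threads, summary = parse_qping_info(SAMPLE)
    for name, state, value in threads:
        print(f"{name:16s} {state} {value}")
    print("overall:", summary)
```

Running it on the sample lists the nine thread entries one per line followed by the trailing summary word, which makes it easier to diff against output from another host.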

In addition, the problem I'm actually having is that my 6.2 / Berkeley DB RPC
install seems to suffer from relatively frequent deadlocks, with errors
of the form:

|E|error writing object with key "JOB:     259" into berkeley database: 
(-30995) DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock

This still happens after restarts of the RPC server and qmaster.
I've run thousands of jobs in a 6.1 install and never saw this error.
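
To put a number on "relatively frequent", a small Python sketch like the following could count these errors per object key. It assumes each error sits on a single line in the log (the wrap above comes from the mail client) and that the wording matches the sample exactly; the function name and the key normalisation are hypothetical choices of mine, not anything Grid Engine provides.

```python
import re
from collections import Counter

# Pattern built from the single error message quoted above.
DEADLOCK = re.compile(
    r'error writing object with key "([^"]+)" into berkeley database:\s+'
    r"\((-?\d+)\)\s+(\w+)"
)


def count_deadlocks(lines):
    """Count DB_LOCK_DEADLOCK write errors per object key.

    lines is any iterable of log lines (for example an open file).
    Runs of spaces inside the key are collapsed, so 'JOB:     259'
    is counted as 'JOB: 259'.
    """
    counts = Counter()
    for line in lines:
        match = DEADLOCK.search(line)
        if match and match.group(3) == "DB_LOCK_DEADLOCK":
            key = " ".join(match.group(1).split())
            counts[key] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '|E|error writing object with key "JOB:     259" into berkeley '
        "database: (-30995) DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock",
    ]
    for key, n in count_deadlocks(sample).most_common():
        print(key, n)
```

Passing an open log file instead of the sample list would give a per-key tally, which might show whether the deadlocks cluster on particular jobs.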

Arch: lx24-x86
Fedora Core 4, Fedora Core 6

Thanks for any help/info/advice,
-justin

