[GE users] Questions about log file: $SGE_ROOT/default/spool/qmaster

Viktor Oudovenko udo at physics.rutgers.edu
Wed May 25 20:10:17 BST 2005


Hi,

Is this kind of log output normal?

05/25/2005 13:34:53|qmaster|rupc-cs04b|E|orders user/project version (2366)
is not uptodate (2367) for user/project "cfennie"

05/25/2005 13:34:53|qmaster|rupc-cs04b|E|orders user/project version (955)
is not uptodate (956) for user/project "karenjoh"

05/25/2005 14:05:08|qmaster|rupc-cs04b|E|tightly integrated parallel task
21840.1 task 3.sub04n68 failed - killing job

05/25/2005 14:08:30|qmaster|rupc-cs04b|E|tightly integrated parallel task
21858.1 task 4.sub04n61 failed - killing job



Actually, three questions:

1) When I modify the policy configuration, I get messages like the first two
entries above.
How can I get rid of them?

2) Each time a parallel job on the parallel or Myrinet queue finishes, I get
error messages like the last two entries above.
Is this normal?
The only trick I use is that in "qmon; queues; execution method" I set
"Terminate Method" to SIGTERM.
It is very helpful for getting rid of the whole job on all slaves, especially
on the Myrinet cluster.

3) The most important question:
One of my users runs a Perl script that calls an MPI command a few times within
the SGE job script. Occasionally the following lines appear in the messages
log, after which the job gets terminated. Any idea what could cause this and
how to avoid it?

05/25/2005 08:58:23|qmaster|rupc-cs04b|E|tightly integrated parallel task
21823.1 task 5.sub04n88 failed - killing job

05/25/2005 09:00:12|qmaster|rupc-cs04b|W|job 21823.1 failed on host sub04n86
assumedly after job because: job 21823.1 died through signal TERM (15)
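
For anyone who wants to pull such entries out of the log for a closer look,
here is a minimal sketch in Python. It assumes the pipe-separated layout seen
in the excerpts above (date and time, daemon, host, severity letter, text) and
treats any line without that layout as a continuation of the previous entry,
since the excerpts here are wrapped by the mail client. The names
parse_messages and SAMPLE are made up for illustration and are not part of SGE.

```python
def parse_messages(lines):
    """Yield (timestamp, daemon, host, level, text) tuples from log lines.

    Assumes each entry starts with four '|'-separated fields followed by
    the message text; other non-blank lines are appended to the text of
    the entry before them.
    """
    entry = None
    for raw in lines:
        line = raw.rstrip("\n")
        parts = line.split("|", 4)
        if len(parts) == 5:
            # A new entry starts; emit the one collected so far.
            if entry is not None:
                yield tuple(entry)
            entry = parts
        elif entry is not None and line.strip():
            # Wrapped continuation of the previous entry's text.
            entry[4] += " " + line.strip()
    if entry is not None:
        yield tuple(entry)


# Sample taken from the excerpt above, including the mail line wrapping.
SAMPLE = """\
05/25/2005 08:58:23|qmaster|rupc-cs04b|E|tightly integrated parallel task
21823.1 task 5.sub04n88 failed - killing job

05/25/2005 09:00:12|qmaster|rupc-cs04b|W|job 21823.1 failed on host sub04n86
assumedly after job because: job 21823.1 died through signal TERM (15)
"""

entries = list(parse_messages(SAMPLE.splitlines()))

# Print only the error-level ("E") entries.
for timestamp, daemon, host, level, text in entries:
    if level == "E":
        print(timestamp, host, text)
```

To run it against the real log, pass an open file object to parse_messages
in place of SAMPLE.splitlines().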

Thank you very much to everyone who can help.

Viktor.

