[GE users] Questions about log file: $SGE_ROOT/default/spool/qmaster

Stephan Grell - Sun Germany - SSG - Software Engineer stephan.grell at sun.com
Fri May 27 08:07:53 BST 2005



Viktor Oudovenko wrote:

>Hi,
>
>Is this kind of log output normal?
>
>05/25/2005 13:34:53|qmaster|rupc-cs04b|E|orders user/project version (2366) is not uptodate (2367) for user/project "cfennie"
>
>05/25/2005 13:34:53|qmaster|rupc-cs04b|E|orders user/project version (955) is not uptodate (956) for user/project "karenjoh"
>
>05/25/2005 14:05:08|qmaster|rupc-cs04b|E|tightly integrated parallel task 21840.1 task 3.sub04n68 failed - killing job
>
>05/25/2005 14:08:30|qmaster|rupc-cs04b|E|tightly integrated parallel task 21858.1 task 4.sub04n61 failed - killing job
>
>
>
>I actually have three questions:
>
>1) When I modify the policy configuration, I get messages like the first two
>lines above. How can I get rid of them?
>  
>
Do not change the configuration while the scheduler is busy. If you do, the
scheduler keeps working on the "old" data, and when its scheduling decision is
sent to the qmaster, the qmaster detects that the decision is out of date.
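A minimal Python model of the check described above may make it easier to follow. It assumes, based on the two numbers printed in the log line, that each order carries the version of the user/project object the scheduler saw, and that qmaster rejects the order when that version no longer matches its own. The classes `UserProject` and `Qmaster` and their methods are hypothetical illustrations, not Grid Engine code.

```python
from dataclasses import dataclass


@dataclass
class UserProject:
    """Hypothetical stand-in for a user/project object held by qmaster."""
    name: str
    version: int = 0


class Qmaster:
    """Toy model: qmaster owns the authoritative objects and their versions."""

    def __init__(self) -> None:
        self.objects: dict[str, UserProject] = {}
        self.log: list[str] = []

    def modify(self, name: str) -> None:
        # Assumption: any configuration change bumps the object's version.
        self.objects[name].version += 1

    def snapshot(self) -> dict[str, int]:
        # What the scheduler copies at the start of a scheduling run.
        return {name: obj.version for name, obj in self.objects.items()}

    def accept_order(self, name: str, order_version: int) -> bool:
        current = self.objects[name].version
        if order_version != current:
            # Stale order: it was computed from old data, so it is dropped.
            self.log.append(
                f"orders user/project version ({order_version}) "
                f'is not uptodate ({current}) for user/project "{name}"'
            )
            return False
        return True


qm = Qmaster()
qm.objects["cfennie"] = UserProject("cfennie", version=2366)

seen = qm.snapshot()   # scheduler starts working on version 2366
qm.modify("cfennie")   # the policy configuration is edited meanwhile
ok = qm.accept_order("cfennie", seen["cfennie"])
print(ok)
print(qm.log[0])
```

In this model, a scheduling run that starts after the change takes a fresh snapshot, so its order carries the new version and is accepted.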

Stephan

>2) Each time a parallel job on the parallel or myrinet queue finishes, I get
>error messages like the last two lines. Is this normal?
>The only special thing I do is setting "Terminate Method" to SIGTERM under
>"qmon; queues; execution method". It is very helpful for getting rid of the
>whole job on all slaves, especially on the myrinet cluster.
>
>3) The most important question:
>One of my users runs a Perl script that calls an MPI command a few times in
>the SGE job script. Occasionally the following lines appear in the messages
>file, after which the job gets terminated. Any idea what could cause this and
>how to avoid it?
>
>05/25/2005 08:58:23|qmaster|rupc-cs04b|E|tightly integrated parallel task 21823.1 task 5.sub04n88 failed - killing job
>
>05/25/2005 09:00:12|qmaster|rupc-cs04b|W|job 21823.1 failed on host sub04n86 assumedly after job because: job 21823.1 died through signal TERM (15)
>
>Thank you very much to everyone who can help.
>
>Viktor.
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>For additional commands, e-mail: users-help at gridengine.sunsource.net
>
>  
>
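For questions 2 and 3 it can help to see how often each kind of error appears. The Python sketch below pulls the relevant entries out of the qmaster messages file. It assumes the pipe-separated layout visible in the excerpts above (`date time|daemon|host|severity|message`); the function names `parse` and `summarize` are hypothetical, and the location of the file under the spool directory named in the subject is an assumption you should check on your installation.

```python
import re
from collections import Counter

# Assumed line layout, taken from the excerpts above:
# MM/DD/YYYY HH:MM:SS|daemon|host|severity|message
LINE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\|(?P<daemon>[^|]*)\|"
    r"(?P<host>[^|]*)\|(?P<sev>[A-Z])\|(?P<msg>.*)$"
)
# Matches e.g. "tightly integrated parallel task 21840.1 task 3.sub04n68 failed"
TASK_FAILED = re.compile(
    r"tightly integrated parallel task (?P<job>\d+\.\d+) task (?P<task>\S+) failed"
)


def parse(lines):
    """Yield one dict per line that matches the assumed layout."""
    for line in lines:
        m = LINE.match(line.rstrip("\n"))
        if m:
            yield m.groupdict()


def summarize(lines):
    """Count stale-order errors and failed parallel tasks per job."""
    stale_orders = 0
    failed_tasks = Counter()
    for rec in parse(lines):
        if rec["sev"] != "E":
            continue  # only error-level entries are of interest here
        if "is not uptodate" in rec["msg"]:
            stale_orders += 1
        elif (m := TASK_FAILED.search(rec["msg"])):
            failed_tasks[m.group("job")] += 1
    return stale_orders, failed_tasks


sample = [
    '05/25/2005 13:34:53|qmaster|rupc-cs04b|E|orders user/project version (2366) '
    'is not uptodate (2367) for user/project "cfennie"',
    "05/25/2005 14:05:08|qmaster|rupc-cs04b|E|tightly integrated parallel task "
    "21840.1 task 3.sub04n68 failed - killing job",
    "05/25/2005 09:00:12|qmaster|rupc-cs04b|W|job 21823.1 failed on host sub04n86 "
    "assumedly after job because: job 21823.1 died through signal TERM (15)",
]
stale, failed = summarize(sample)
print(stale, dict(failed))  # 1 {'21840.1': 1}
```

To run it on a real file, pass an open file object, for example `summarize(open(path))` with `path` pointing at the messages file. The warning-level line in the sample is skipped on purpose; drop the severity filter if you also want to count the "died through signal TERM" entries.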


