[GE users] high CPU load for sge_qmaster

Stephan Grell - Sun Germany - SSG - Software Engineer stephan.grell at sun.com
Mon May 9 16:33:00 BST 2005



You had failing qstat and qrsh clients at that time. That looks like a commlib
problem, or at least it is a second hint in that direction. Do you know why the
clients failed?

Thanks a lot.
Stephan
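
As a rough aid for answering that from the qmaster messages file, here is a minimal Python sketch that counts the "Send response to (program:id:host) failed" entries per client. It assumes the pipe-separated layout visible in the log quoted below (date time|daemon|host|severity|message) with one entry per line, i.e. the messages file as the qmaster writes it rather than the mail-wrapped copy. The function name failed_clients is hypothetical and not part of Grid Engine.

```python
import re
import sys
from collections import Counter

# Matches entries such as:
#   05/03/2005 17:57:22|qmaster|head4|E|Send response to (qrsh:817:head2) failed
# where the client is identified as (program:id:host).
FAILED_RESPONSE = re.compile(r"Send response to \((\w+):(\d+):([\w.-]+)\) failed")


def failed_clients(lines):
    """Count failed responses per (program, id, host) in qmaster log lines."""
    counts = Counter()
    for line in lines:
        # Assumed layout: date time|daemon|host|severity|message
        fields = line.rstrip("\n").split("|", 4)
        if len(fields) != 5 or fields[3] != "E":
            continue  # not an error entry (or a wrapped continuation line)
        match = FAILED_RESPONSE.search(fields[4])
        if match:
            program, client_id, host = match.groups()
            counts[(program, int(client_id), host)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python failed_clients.py /path/to/qmaster/messages
    with open(sys.argv[1]) as log:
        for (program, client_id, host), n in sorted(failed_clients(log).items()):
            print(f"{program}:{client_id} on {host}: {n} failed response(s)")
```

Note that the qdel:815 entries ("can't send asynchronous message ... no valid port number") use different wording and are deliberately not matched by this pattern; they would need a second expression if they should be counted too.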


Sean Dilda wrote:

> Stephan Grell - Sun Germany - SSG - Software Engineer wrote:
>
>> Hi Sean,
>>
>> we got a report that sounds very similar to yours. Do you have any
>> commlib error messages in your qmaster messages file? It could be that
>> the high CPU load is triggered by broken connections. That is just an
>> assumption, though. Do you have data to back up this idea?
>
>
> Thanks for the reminder.  I restarted my sge_qmaster on the 3rd, and 
> my load monitoring showed the load shooting up around 6pm that 
> evening. There was some interesting stuff in the logs that I meant to 
> send, but kept forgetting to.  I should note that I am using CSP.  
> Here are the logs:
>
>
> 05/03/2005 17:57:14|qmaster|head4|I|jes12 has registered the job 
> 140920 for deletion
> 05/03/2005 17:57:20|qmaster|head4|I|Discontinued delete transaction of 
> user "jes12" after job 140920
> 05/03/2005 17:57:20|qmaster|head4|E|can't send asynchronous message to 
> commproc (qdel:815) on host "head2": no valid port number
> 05/03/2005 17:57:20|qmaster|head4|E|Send response to (qstat:816:head2) 
> failed
> 05/03/2005 17:57:20|qmaster|head4|E|sec_respond_announce to 
> (qstat:816:head2) failed
> 05/03/2005 17:57:20|qmaster|head4|E|failed handle announce for 
> (head2:qstat:816)
> 05/03/2005 17:57:20|qmaster|head4|I|task 2.bio-n057 at bio-n057 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:20|qmaster|head4|I|task 2.cbcb-n32 at cbcb-n32 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:20|qmaster|head4|I|task 2.bio-n039 at bio-n039 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:21|qmaster|head4|I|task 1.bio-n039 at bio-n039 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:21|qmaster|head4|I|task 2.core-n28 at core-n28 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:21|qmaster|head4|I|task 1.core-n28 at core-n28 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:21|qmaster|head4|I|task 2.nsoe-n02 at nsoe-n02 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:22|qmaster|head4|I|task 1.nsoe-n02 at nsoe-n02 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:22|qmaster|head4|I|task 1.bio-n027 at bio-n027 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:22|qmaster|head4|I|task 2.cbcb-n19 at cbcb-n19 of job 
> 140920.1 finished
> 05/03/2005 17:57:22|qmaster|head4|I|task 1.cbcb-n15 at cbcb-n15 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:22|qmaster|head4|E|Send response to (qrsh:817:head2) 
> failed
> 05/03/2005 17:57:23|qmaster|head4|E|sec_respond_announce to 
> (qrsh:817:head2) failed
> 05/03/2005 17:57:23|qmaster|head4|E|failed handle announce for 
> (head2:qrsh:817)
> 05/03/2005 17:57:23|qmaster|head4|I|task 2.stat-n33 at stat-n33 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:23|qmaster|head4|E|Send response to (qstat:818:head2) 
> failed
> 05/03/2005 17:57:23|qmaster|head4|E|sec_respond_announce to 
> (qstat:818:head2) failed
> 05/03/2005 17:57:23|qmaster|head4|E|failed handle announce for 
> (head2:qstat:818)
> 05/03/2005 17:57:23|qmaster|head4|E|Send response to 
> (qrsh:819:stat-n13) failed
> 05/03/2005 17:57:23|qmaster|head4|E|sec_respond_announce to 
> (qrsh:819:stat-n13) failed
> 05/03/2005 17:57:23|qmaster|head4|E|failed handle announce for
> (stat-n13:qrsh:819)
> 05/03/2005 17:57:23|qmaster|head4|I|task 2.stat-n21 at stat-n21 of job
> 140920.1 died through signal KILL
> 05/03/2005 17:57:23|qmaster|head4|I|task 1.stat-n21 at stat-n21 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:24|qmaster|head4|I|task 2.bio-n081 at bio-n081 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:24|qmaster|head4|I|task 1.bio-n081 at bio-n081 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:24|qmaster|head4|I|task 1.stat-n15 at stat-n15 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:24|qmaster|head4|I|task 2.bio-n069 at bio-n069 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:25|qmaster|head4|I|task 1.bio-n069 at bio-n069 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:25|qmaster|head4|E|Send response to (qstat:820:head2) 
> failed
> 05/03/2005 17:57:25|qmaster|head4|E|sec_respond_announce to 
> (qstat:820:head2) failed
> 05/03/2005 17:57:25|qmaster|head4|E|failed handle announce for 
> (head2:qstat:820)
> 05/03/2005 17:57:25|qmaster|head4|I|task 1.bio-n050 at bio-n050 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:25|qmaster|head4|I|task 2.stat-n05 at stat-n05 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:25|qmaster|head4|I|task 1.stat-n05 at stat-n05 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:26|qmaster|head4|I|task 2.bio-n097 at bio-n097 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:26|qmaster|head4|I|task 2.bio-n092 at bio-n092 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:26|qmaster|head4|I|task 1.bio-n092 at bio-n092 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:26|qmaster|head4|I|task 1.stat-n27 at stat-n27 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:27|qmaster|head4|I|task 2.cbcb-n26 at cbcb-n26 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:27|qmaster|head4|I|task 1.cbcb-n26 at cbcb-n26 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:27|qmaster|head4|I|task 2.bio-n014 at bio-n014 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:27|qmaster|head4|I|task 1.bio-n014 at bio-n014 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:28|qmaster|head4|I|task 2.bio-n082 at bio-n082 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:28|qmaster|head4|I|task 1.bio-n082 at bio-n082 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:28|qmaster|head4|I|task 1.bio-n066 at bio-n066 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:28|qmaster|head4|I|task 2.bio-n030 at bio-n030 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:29|qmaster|head4|I|task 2.chg-n07 at chg-n07 of job 
> 140920.1 died through signal HUP
> 05/03/2005 17:57:29|qmaster|head4|I|task 1.chg-n07 at chg-n07 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:29|qmaster|head4|I|task 2.stat-n26 at stat-n26 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:29|qmaster|head4|E|execd at nsoe-n06 reports running job 
> (140925.1/master) in queue "lowprio.q at nsoe-n06" that was not supposed 
> to be there - killing
> 05/03/2005 17:57:29|qmaster|head4|I|task 1.chg-n05 at chg-n05 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:30|qmaster|head4|I|task 2.bio-n079 at bio-n079 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:30|qmaster|head4|I|task 1.bio-n079 at bio-n079 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:30|qmaster|head4|I|task 2.stat-n12 at stat-n12 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:30|qmaster|head4|I|task 1.cbcb-n20 at cbcb-n20 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:31|qmaster|head4|I|task 2.bio-n058 at bio-n058 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:31|qmaster|head4|I|task 1.cbcb-n17 at cbcb-n17 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:31|qmaster|head4|I|task 2.chg-n08 at chg-n08 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:33|qmaster|head4|I|task 2.core-n34 at core-n34 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:33|qmaster|head4|I|task 1.core-n34 at core-n34 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:33|qmaster|head4|I|task 1.cbcb-n28 at cbcb-n28 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:34|qmaster|head4|I|task 1.bio-n071 at bio-n071 of job 
> 140920.1 finished
> 05/03/2005 17:57:34|qmaster|head4|I|task 1.core-n40 at core-n40 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:34|qmaster|head4|I|task 1.core-n60 at core-n60 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:34|qmaster|head4|I|task 2.bio-n067 at bio-n067 of job 
> 140920.1 finished
> 05/03/2005 17:57:35|qmaster|head4|I|task 1.bio-n067 at bio-n067 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:35|qmaster|head4|I|task 2.stat-n23 at stat-n23 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:35|qmaster|head4|I|task 1.stat-n23 at stat-n23 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:35|qmaster|head4|I|task 2.stat-n19 at stat-n19 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:36|qmaster|head4|I|task 1.stat-n19 at stat-n19 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:36|qmaster|head4|E|can't send asynchronous message to 
> commproc (qdel:815) on host "head2": no valid port number
> 05/03/2005 17:57:36|qmaster|head4|I|task 1.bio-n057 at bio-n057 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:36|qmaster|head4|I|task 1.cbcb-n32 at cbcb-n32 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:37|qmaster|head4|I|task 2.bio-n027 at bio-n027 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:37|qmaster|head4|I|task 1.cbcb-n19 at cbcb-n19 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:37|qmaster|head4|I|task 2.cbcb-n15 at cbcb-n15 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:37|qmaster|head4|E|Send response to (qrsh:817:head2) 
> failed
> 05/03/2005 17:57:37|qmaster|head4|E|sec_respond_announce to 
> (qrsh:817:head2) failed
> 05/03/2005 17:57:37|qmaster|head4|E|failed handle announce for 
> (head2:qrsh:817)
> 05/03/2005 17:57:38|qmaster|head4|I|task 1.stat-n33 at stat-n33 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:38|qmaster|head4|E|Send response to (qstat:818:head2) 
> failed
> 05/03/2005 17:57:38|qmaster|head4|E|sec_respond_announce to 
> (qstat:818:head2) failed
> 05/03/2005 17:57:38|qmaster|head4|E|failed handle announce for 
> (head2:qstat:818)
> 05/03/2005 17:57:38|qmaster|head4|E|Send response to 
> (qrsh:819:stat-n13) failed
> 05/03/2005 17:57:38|qmaster|head4|E|sec_respond_announce to 
> (qrsh:819:stat-n13) failed
> 05/03/2005 17:57:38|qmaster|head4|E|failed handle announce for 
> (stat-n13:qrsh:819)
> 05/03/2005 17:57:38|qmaster|head4|I|task 2.stat-n15 at stat-n15 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:38|qmaster|head4|E|Send response to (qstat:820:head2) 
> failed
> 05/03/2005 17:57:38|qmaster|head4|E|sec_respond_announce to 
> (qstat:820:head2) failed
> 05/03/2005 17:57:38|qmaster|head4|E|failed handle announce for 
> (head2:qstat:820)
> 05/03/2005 17:57:38|qmaster|head4|I|task 2.bio-n050 at bio-n050 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:39|qmaster|head4|I|task 1.bio-n097 at bio-n097 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:39|qmaster|head4|I|task 2.stat-n27 at stat-n27 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:39|qmaster|head4|I|task 2.bio-n066 at bio-n066 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:39|qmaster|head4|I|task 1.bio-n030 at bio-n030 of job 
> 140920.1 died through signal HUP
> 05/03/2005 17:57:40|qmaster|head4|I|task 1.stat-n26 at stat-n26 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:40|qmaster|head4|E|execd at nsoe-n06 reports running job 
> (140925.1/master) in queue "lowprio.q at nsoe-n06" that was not supposed 
> to be there - killing
> 05/03/2005 17:57:40|qmaster|head4|I|task 2.chg-n05 at chg-n05 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:40|qmaster|head4|I|task 1.stat-n12 at stat-n12 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:40|qmaster|head4|I|task 2.cbcb-n20 at cbcb-n20 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:41|qmaster|head4|I|task 1.bio-n058 at bio-n058 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:41|qmaster|head4|I|task 2.cbcb-n17 at cbcb-n17 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:41|qmaster|head4|I|task 1.chg-n08 at chg-n08 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:41|qmaster|head4|E|orders user/project version (187) 
> is not uptodate (188) for user/project "sbroy"
> 05/03/2005 17:57:42|qmaster|head4|E|orders user/project version (477) 
> is not uptodate (478) for user/project "kpe"
> 05/03/2005 17:57:42|qmaster|head4|E|orders user/project version (619) 
> is not uptodate (620) for user/project "jsm23"
> 05/03/2005 17:57:42|qmaster|head4|E|orders user/project version (697) 
> is not uptodate (698) for user/project "jolantam"
> 05/03/2005 17:57:42|qmaster|head4|E|orders user/project version (697) 
> is not uptodate (698) for user/project "jtranqui"
> 05/03/2005 17:57:42|qmaster|head4|E|orders user/project version (697) 
> is not uptodate (698) for user/project "fkauff"
> 05/03/2005 17:57:42|qmaster|head4|E|orders user/project version (697) 
> is not uptodate (698) for user/project "jdh14"
> 05/03/2005 17:57:42|qmaster|head4|E|orders user/project version (697) 
> is not uptodate (698) for user/project "jes12"
> 05/03/2005 17:57:42|qmaster|head4|E|orders user/project version (697) 
> is not uptodate (698) for user/project "ilya"
> 05/03/2005 17:57:42|qmaster|head4|E|orders user/project version (697) 
> is not uptodate (698) for user/project "jk4"
> 05/03/2005 17:57:42|qmaster|head4|E|orders user/project version (697) 
> is not uptodate (698) for user/project "cmh27"
> 05/03/2005 17:57:42|qmaster|head4|E|orders user/project version (697) 
> is not uptodate (698) for user/project "slj2"
> 05/03/2005 17:57:42|qmaster|head4|E|orders user/project version (697) 
> is not uptodate (698) for user/project "ql10"
> 05/03/2005 17:57:42|qmaster|head4|I|task 2.cbcb-n28 at cbcb-n28 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:42|qmaster|head4|I|task 2.core-n40 at core-n40 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:43|qmaster|head4|I|task 2.core-n60 at core-n60 of job 
> 140920.1 died through signal KILL
> 05/03/2005 17:57:43|qmaster|head4|I|execd on stat-n13 registered
> 05/03/2005 17:57:43|qmaster|head4|I|execd on core-n23 registered
> 05/03/2005 17:57:43|qmaster|head4|I|execd on core-n32 registered
> 05/03/2005 17:57:43|qmaster|head4|I|execd on bio-n038 registered
> 05/03/2005 17:57:43|qmaster|head4|I|execd on bio-n004 registered
> 05/03/2005 17:57:43|qmaster|head4|I|execd on nsoe-n01 registered
> 05/03/2005 17:57:43|qmaster|head4|I|execd on cod-n020 registered
> 05/03/2005 17:57:43|qmaster|head4|I|execd on core-n59 registered
> 05/03/2005 17:57:43|qmaster|head4|I|execd on bio-n083 registered
> 05/03/2005 17:57:43|qmaster|head4|I|execd on bio-n062 registered
> 05/03/2005 17:57:43|qmaster|head4|I|execd on core-n25 registered
> 05/03/2005 17:57:43|qmaster|head4|I|execd on bio-n042 registered
> 05/03/2005 17:57:43|qmaster|head4|I|execd on opt-n04 registered
> 05/03/2005 17:57:43|qmaster|head4|I|execd on opt-n03 registered
> 05/03/2005 17:57:43|qmaster|head4|I|execd on cbcb-n30 registered
> 05/03/2005 17:57:43|qmaster|head4|I|execd on cbcb-n21 registered
> 05/03/2005 17:57:43|qmaster|head4|I|execd on bio-n077 registered
> 05/03/2005 17:57:43|qmaster|head4|I|execd on cod-n017 registered
> 05/03/2005 17:57:43|qmaster|head4|I|execd on bio-n049 registered
> 05/03/2005 17:57:43|qmaster|head4|I|execd on cod-n034 registered
> 05/03/2005 17:57:43|qmaster|head4|I|execd on bio-n031 registered
> 05/03/2005 17:57:43|qmaster|head4|I|execd on bio-n024 registered
> 05/03/2005 17:57:43|qmaster|head4|I|execd on cod-n039 registered
> 05/03/2005 17:57:43|qmaster|head4|I|execd on core-n37 registered
> 05/03/2005 17:57:43|qmaster|head4|I|execd on core-n48 registered
> 05/03/2005 17:57:43|qmaster|head4|I|execd on core-n36 registered
> 05/03/2005 17:57:43|qmaster|head4|I|execd on cod-n019 registered
> 05/03/2005 17:57:44|qmaster|head4|I|execd on core-n29 registered
> 05/03/2005 17:57:44|qmaster|head4|I|execd on cod-n033 registered
> 05/03/2005 17:57:44|qmaster|head4|I|execd on bio-n099 registered
> 05/03/2005 17:57:44|qmaster|head4|I|execd on cod-n036 registered
> 05/03/2005 17:57:44|qmaster|head4|I|execd on core-n46 registered
> 05/03/2005 17:57:44|qmaster|head4|I|execd on core-n31 registered
> 05/03/2005 17:57:44|qmaster|head4|I|execd on stat-n31 registered
> 05/03/2005 17:57:44|qmaster|head4|I|execd on bio-n070 registered
> 05/03/2005 17:57:44|qmaster|head4|I|execd on bio-n055 registered
> 05/03/2005 17:57:44|qmaster|head4|I|execd on bio-n087 registered
> 05/03/2005 17:57:44|qmaster|head4|I|execd on bio-n008 registered
> 05/03/2005 17:57:44|qmaster|head4|I|execd on stat-n37 registered
> 05/03/2005 17:57:44|qmaster|head4|I|execd on cod-n024 registered
> 05/03/2005 17:57:44|qmaster|head4|I|execd on cod-n021 registered
> 05/03/2005 17:57:44|qmaster|head4|I|execd on cbcb-n10 registered
> 05/03/2005 17:57:44|qmaster|head4|I|execd on bio-n020 registered
> 05/03/2005 17:57:44|qmaster|head4|I|execd on cod-n037 registered
> 05/03/2005 17:57:44|qmaster|head4|I|execd on bio-n086 registered
> 05/03/2005 17:57:44|qmaster|head4|I|execd on bio-n091 registered
> 05/03/2005 17:57:44|qmaster|head4|I|execd on cod-n029 registered
> 05/03/2005 17:57:44|qmaster|head4|I|execd on core-n38 registered
> 05/03/2005 17:57:44|qmaster|head4|I|execd on cod-n013 registered
> 05/03/2005 17:57:44|qmaster|head4|I|execd on stat-n22 registered
> 05/03/2005 17:57:44|qmaster|head4|I|execd on bio-n033 registered
> 05/03/2005 17:57:44|qmaster|head4|I|execd on cbcb-n12 registered
> 05/03/2005 17:57:44|qmaster|head4|I|execd on opt-n06 registered
> 05/03/2005 17:57:44|qmaster|head4|I|execd on bio-n054 registered
> 05/03/2005 17:57:44|qmaster|head4|I|execd on bio-n001 registered
> 05/03/2005 17:57:44|qmaster|head4|I|execd on stat-n02 registered
> 05/03/2005 17:57:46|qmaster|head4|I|execd on stat-n21 registered
> 05/03/2005 17:57:46|qmaster|head4|I|execd on bio-n081 registered
> 05/03/2005 17:57:46|qmaster|head4|I|execd on stat-n05 registered
> 05/03/2005 17:57:46|qmaster|head4|I|execd on bio-n092 registered
> 05/03/2005 17:57:46|qmaster|head4|I|execd on cbcb-n26 registered
> 05/03/2005 17:57:47|qmaster|head4|I|execd on bio-n014 registered
> 05/03/2005 17:57:47|qmaster|head4|I|execd on bio-n082 registered
> 05/03/2005 17:57:47|qmaster|head4|I|execd on chg-n07 registered
> 05/03/2005 17:57:47|qmaster|head4|I|execd on bio-n079 registered
> 05/03/2005 17:57:47|qmaster|head4|I|execd on core-n34 registered
> 05/03/2005 17:57:47|qmaster|head4|I|execd on bio-n067 registered
> 05/03/2005 17:57:47|qmaster|head4|I|execd on stat-n19 registered
> 05/03/2005 17:57:47|qmaster|head4|I|execd on bio-n057 registered
> 05/03/2005 17:57:47|qmaster|head4|I|execd on cbcb-n32 registered
> 05/03/2005 17:57:47|qmaster|head4|I|execd on bio-n039 registered
> 05/03/2005 17:57:47|qmaster|head4|I|execd on nsoe-n02 registered
> 05/03/2005 17:57:47|qmaster|head4|I|execd on stat-n33 registered
> 05/03/2005 17:57:47|qmaster|head4|I|execd on stat-n15 registered
> 05/03/2005 17:57:47|qmaster|head4|I|execd on bio-n097 registered
> 05/03/2005 17:57:47|qmaster|head4|I|execd on stat-n27 registered
> 05/03/2005 17:57:47|qmaster|head4|I|execd on bio-n066 registered
> 05/03/2005 17:57:47|qmaster|head4|I|execd on bio-n030 registered
> 05/03/2005 17:57:47|qmaster|head4|I|execd on chg-n05 registered
> 05/03/2005 17:57:47|qmaster|head4|I|execd on stat-n12 registered
> 05/03/2005 17:57:47|qmaster|head4|I|execd on cbcb-n20 registered
> 05/03/2005 17:57:47|qmaster|head4|I|execd on bio-n058 registered
> 05/03/2005 17:57:47|qmaster|head4|I|execd on chg-n08 registered
> 05/03/2005 17:57:48|qmaster|head4|I|execd on cbcb-n28 registered
> 05/03/2005 17:57:48|qmaster|head4|I|execd on bio-n071 registered
> 05/03/2005 17:57:48|qmaster|head4|I|execd on core-n60 registered
> 05/03/2005 17:57:48|qmaster|head4|I|execd on cbcb-n19 registered
> 05/03/2005 17:57:48|qmaster|head4|I|execd on cbcb-n15 registered
> 05/03/2005 17:57:51|qmaster|head4|W|job 140920.1 failed on host 
> bio-n071 assumedly after job because: job 140920.1 died through signal 
> KILL (9)
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
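
To line the error bursts up with the load monitoring (the spike around 6pm on the 3rd), the log can be bucketed by minute and severity. This is a minimal sketch, again assuming one pipe-separated entry per line; the date field is parsed as month/day/year, which is consistent with 05/03/2005 being the 3rd. Lines that do not split into five fields, such as wrapped continuations, are skipped. severity_by_minute is a hypothetical helper, not part of Grid Engine.

```python
import sys
from collections import Counter, defaultdict
from datetime import datetime


def severity_by_minute(lines):
    """Return {minute: Counter(severity -> count)} for qmaster log lines."""
    buckets = defaultdict(Counter)
    for line in lines:
        # Assumed layout: MM/DD/YYYY HH:MM:SS|daemon|host|severity|message
        fields = line.rstrip("\n").split("|", 4)
        if len(fields) != 5:
            continue  # e.g. a continuation of a wrapped entry
        try:
            stamp = datetime.strptime(fields[0], "%m/%d/%Y %H:%M:%S")
        except ValueError:
            continue  # first field is not a timestamp
        buckets[stamp.replace(second=0)][fields[3]] += 1
    return dict(buckets)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python severity_by_minute.py /path/to/qmaster/messages
    with open(sys.argv[1]) as log:
        for minute, counts in sorted(severity_by_minute(log).items()):
            print(minute.strftime("%m/%d/%Y %H:%M"), dict(counts))
```

Per-minute buckets are coarse enough to compare against typical load-monitoring samples; switching to per-second buckets (dropping the replace(second=0)) would show the individual bursts more sharply.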

