[GE users] "The Scheduler dies" COMPLETE information

Stephan Grell - Sun Germany - SSG - Software Engineer stephan.grell at sun.com
Mon May 23 08:45:27 BST 2005


Hi Viktor,

you are encountering issue 1416; this is fixed in u4.
However, the important question is why your scheduler is reregistering
so often.
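
For a quick check of how often that happens, you could grep the qmaster
messages file for the relevant entries. A minimal sketch, assuming the
cell is "default" and the classic spooling layout mentioned later in
this thread:

grep "reregistered" $SGE_ROOT/default/spool/qmaster/messages | tail
grep "acknowledge timeout" $SGE_ROOT/default/spool/qmaster/messages | tail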

Stephan

Viktor Oudovenko wrote:

>Hi, Stephan and anybody who can help!
>
>Could you have a look at the attachment to see what is going on with my
>scheduler?
>What I did was run the scheduler daemon in dl 1 mode, as you advised, and
>waited until it crashed. And it did. It dies even without any events; I
>mean, you will find two lines from the messages file where the scheduler
>died without any reason. But the last crash happened because one of the
>myrinet jobs finished.
>Could you give any hint as to what it could be and what could be done?
>I am running Linux SuSE 8.2 on the server and 9.0 and 9.2 on the slaves.
>I also have a few Opterons (8 machines). I am happy to provide any further
>information if necessary.
>Please help. 
>
>With kind regards,
>Viktor
>P.S. In the attachment I put not only the last iteration but also a couple
>of successful ones.
>Actually, in debug mode the scheduler updates its information every 5-10
>seconds or so.
>
>  
>
>>-----Original Message-----
>>From: Stephan Grell - Sun Germany - SSG - Software Engineer 
>>[mailto:stephan.grell at sun.com] 
>>Sent: Friday, May 20, 2005 3:05
>>To: users at gridengine.sunsource.net
>>Subject: Re: [GE users] Scheduler dies like a hell
>>
>>
>>Hi,
>>
>>I am not sure that a corrupted file is the problem. The 
>>qmaster does some validation during startup. Could you 
>>run the scheduler in debug mode and post the output just 
>>before it dies?
>>
>>You can set the debug mode with:
>>
>>source $SGE_ROOT/<CELL>/common/settings.csh
>>source $SGE_ROOT/util/dl.csh
>>dl 1
>>
>>bin/<arch>/sge_schedd
>>
>>Or, do you have a stack trace of the scheduler?
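>>
>>If it left a core file, a rough sketch of how to get one, assuming gdb
>>is installed and core dumps are enabled in the shell that starts the
>>daemon:
>>
>>limit coredumpsize unlimited   # csh/tcsh; use "ulimit -c unlimited" in sh/bash
>>bin/<arch>/sge_schedd
>>
>>and after the crash:
>>
>>gdb bin/<arch>/sge_schedd core
>>(gdb) bt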
>>
>>Which version are you running on which arch?
>>
>>Thanks,
>>Stephan
>>
>>Viktor Oudovenko wrote:
>>
>>    
>>
>>>Ron,
>>>
>>>Can I try to cat part of the accounting file? I mean, to EDIT it
>>>MANUALLY, despite the warning not to do it? Best regards,
>>>v
>>>
>>>>-----Original Message-----
>>>>From: Ron Chen [mailto:ron_chen_123 at yahoo.com]
>>>>Sent: Thursday, May 19, 2005 22:02
>>>>To: users at gridengine.sunsource.net
>>>>Subject: RE: [GE users] Scheduler dies like a hell
>>>>
>>>>
>>>>It is not easy to find out which file gets corrupted
>>>>:(
>>>>
>>>>One thing you can try is to move spooled job files (in
>>>>default/spool/qmaster/jobs) to a backup directory.
>>>>Also, you can use qconf to dump the configuration for
>>>>the queues/users/hosts, and see if the values "make
>>>>sense".
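>>>>
>>>>For example (just a sketch; the exact switches are from memory of the
>>>>6.0 qconf and are worth checking against the qconf man page):
>>>>
>>>>qconf -sql              # list the cluster queues
>>>>qconf -sq <queue>       # dump one queue configuration
>>>>qconf -sel              # list the execution hosts
>>>>qconf -se <hostname>    # dump one execution host
>>>>qconf -suserl           # list the user objects
>>>>qconf -suser <user>     # dump one user object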
>>>>
>>>>Of course the best way to fix this is to restore from
>>>>backup!
>>>>
>>>>-Ron
>>>>
>>>>
>>>>--- Viktor Oudovenko <udo at physics.rutgers.edu> wrote:
>>>>
>>>>>Hi, Ron,
>>>>>
>>>>>I am using classic spooling.
>>>>>Which file should I check for corruption? Can I edit
>>>>>it manually?
>>>>>Thank you very much in advance.
>>>>>v
>>>>>
>>>>>>-----Original Message-----
>>>>>>From: Ron Chen [mailto:ron_chen_123 at yahoo.com]
>>>>>>Sent: Thursday, May 19, 2005 20:38
>>>>>>To: users at gridengine.sunsource.net
>>>>>>Subject: RE: [GE users] Scheduler dies like a hell
>>>>>>
>>>>>>
>>>>>>Are you using classic spooling or Berkeley DB
>>>>>>spooling?
>>>>>>
>>>>>>With classic spooling, when the machine crashes, the
>>>>>>files may get corrupted. And when qmaster reads in the
>>>>>>corrupted files, it may also corrupt the qmaster's
>>>>>>data structures.
>>>>>>
>>>>>>IIRC, Berkeley DB handles recovery itself, but I have
>>>>>>never played with it myself :)
>>>>>>
>>>>>>-Ron
>>>>>>
>>>>>>
>>>>>>--- Viktor Oudovenko <udo at physics.rutgers.edu> wrote:
>>>>>>>Hi, Mac,
>>>>>>>Thank you very much for your advice!
>>>>>>>I'll try. I think one of the running or finished jobs
>>>>>>>left a bad record somewhere (like the jobs directory).
>>>>>>>Best regards,
>>>>>>>v
>>>>>>>
>>>>>>>>-----Original Message-----
>>>>>>>>From: McCalla, Mac [mailto:macmccalla at hess.com]
>>>>>>>>Sent: Thursday, May 19, 2005 15:12
>>>>>>>>To: users at gridengine.sunsource.net
>>>>>>>>Subject: RE: [GE users] Scheduler dies like a hell
>>>>>>>>
>>>>>>>>Hi,
>>>>>>>>
>>>>>>>>Some things to look at: any messages in
>>>>>>>>$SGE_ROOT/......../qmaster/schedd/messages? To get more
>>>>>>>>info about what the scheduler is doing while it is running, see the
>>>>>>>>info about the scheduler params profile and monitor; you can set
>>>>>>>>them equal to 1 to turn on some scheduler diagnostics, see man
>>>>>>>>sched_conf.
>>>>>>>>To extend the timeout value for the scheduler you can set
>>>>>>>>qmaster_params SCHEDULER_TIMEOUT to some value greater than
>>>>>>>>600 (seconds).
>>>>>>>>You can also use the system command strace to get a trace of
>>>>>>>>scheduler activity while it is running, to perhaps get a
>>>>>>>>better idea of what it is spending its time doing.
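>>>>>>>>
>>>>>>>>Roughly, that would look like the following (a sketch only; check
>>>>>>>>the exact attribute syntax against the sched_conf and sge_conf man
>>>>>>>>pages; 1200 and /tmp/schedd.strace are just example values):
>>>>>>>>
>>>>>>>>qconf -msconf    # scheduler config: set  params  PROFILE=1,MONITOR=1
>>>>>>>>qconf -mconf     # global config: add  qmaster_params  SCHEDULER_TIMEOUT=1200
>>>>>>>>strace -f -tt -p `pidof sge_schedd` -o /tmp/schedd.strace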
>>>>>>>>Hope this helps,
>>>>>>>>
>>>>>>>>mac mccalla
>>>>>>>>
>>>>>>>>-----Original Message-----
>>>>>>>>From: Viktor Oudovenko [mailto:udo at physics.rutgers.edu]
>>>>>>>>Sent: Thursday, May 19, 2005 12:00 PM
>>>>>>>>To: users at gridengine.sunsource.net
>>>>>>>>Subject: [GE users] Scheduler dies like a hell
>>>>>>>>
>>>>>>>>Hi, everybody,
>>>>>>>>
>>>>>>>>I am asking for your help and ideas on what could be done
>>>>>>>>to restore normal operation of the scheduler. First, what
>>>>>>>>happened: a few times during the last week our main server
>>>>>>>>died and I needed to reboot it and even replace it. But jobs
>>>>>>>>which used automount continued to run. However, since
>>>>>>>>yesterday or the day before, the scheduler daemon dies. I
>>>>>>>>tried to restart sge_master but it did not help. Now when the
>>>>>>>>daemon dies I start it manually by simply typing:
>>>>>>>>
>>>>>>>>/opt/SGE/bin/lx24-x86/sge_schedd
>>>>>>>>
>>>>>>>>but after some time it dies again. Please advise, what could
>>>>>>>>it be?
>>>>>>>>
>>>>>>>>Below please find some info from the messages file:
>>>>>>>>
>>>>>>>>
>>>>>>>>05/19/2005 01:02:37|qmaster|rupc-cs04b|E|no execd known on host sub04n87 to send conf notification
>>>>>>>>05/19/2005 01:02:37|qmaster|rupc-cs04b|E|no execd known on host sub04n88 to send conf notification
>>>>>>>>05/19/2005 01:02:37|qmaster|rupc-cs04b|E|no execd known on host sub04n89 to send conf notification
>>>>>>>>05/19/2005 01:02:37|qmaster|rupc-cs04b|E|no execd known on host sub04n90 to send conf notification
>>>>>>>>05/19/2005 01:02:37|qmaster|rupc-cs04b|E|no execd known on host sub04n91 to send conf notification
>>>>>>>>05/19/2005 01:02:37|qmaster|rupc-cs04b|E|no execd known on host rupc04.rutgers.edu to send conf notification
>>>>>>>>05/19/2005 01:02:37|qmaster|rupc-cs04b|I|starting up 6.0u3
>>>>>>>>05/19/2005 01:08:11|qmaster|rupc-cs04b|E|commlib error: got read error (closing connection)
>>>>>>>>05/19/2005 01:11:06|qmaster|rupc-cs04b|E|event client "scheduler" (rupc-cs04b/schedd/1) reregistered - it will need a total update
>>>>>>>>05/19/2005 01:24:31|qmaster|rupc-cs04b|W|job 21171.1 failed on host sub04n203 assumedly after job because: job 21171.1 died through signal TERM (15)
>>>>>>>>05/19/2005 05:17:19|qmaster|rupc-cs04b|E|acknowledge timeout after 600 seconds for event client (schedd:1) on host "rupc-cs04b"
>>>>>>>>05/19/2005 09:29:03|qmaster|rupc-cs04b|W|job 21060.1 failed on host sub04n74 assumedly after job because: job 21060.1 died through signal TERM (15)
>>>>>>>>05/19/2005 09:30:37|qmaster|rupc-cs04b|E|event client "scheduler" (rupc-cs04b/schedd/1) reregistered - it will need a total update
>>>>>>>>05/19/2005 11:04:21|qmaster|rupc-cs04b|W|job 20222.1 failed on host sub04n29 assumedly after job because: job 20222.1 died through signal KILL (9)
>>>>>>>>05/19/2005 11:05:50|qmaster|rupc-cs04b|W|job 21212.1 failed on host sub04n25 assumedly after job because: job 21212.1 died through signal KILL (9)
>>>>>>>>05/19/2005 12:04:51|qmaster|rupc-cs04b|E|acknowledge timeout after 600 seconds for event client (schedd:1) on host "rupc-cs04b"
>>>>=== message truncated ===
>>>>
>>>>
>>>>
>
>  
>
>------------------------------------------------------------------------
>
>128133  25368 16384     SENDING 22 ORDERS TO QMASTER
>128134  25368 16384     RESETTING BUSY STATE OF EVENT CLIENT
>128135  25368 16384     reresolve port timeout in 340
>128136  25368 16384     returning cached port value: 536
>--------------STOP-SCHEDULER-RUN-------------
>128137  25368 16384     ec_get retrieving events - will do max 20 fetches
>128138  25368 16384     doing sync fetch for messages, 20 still to do
>128139  25368 16384     try to get request from qmaster, id 1
>128140  25368 16384     Checking 55 events (44303-44357) while waiting for #44303
>128141  25368 16384     check complete, 55 events in list
>128142  25368 16384     got 55 events till 44357
>128143  25368 16384     doing async fetch for messages, 19 still to do
>128144  25368 16384     try to get request from qmaster, id 1
>128145  25368 16384     reresolve port timeout in 320
>128146  25368 16384     returning cached port value: 536
>128147  25368 16384     Sent ack for all events lower or equal 44357
>128148  25368 16384     ec_get - received 55 events
>128149  25368 16384     44303. EVENT MOD EXECHOST sub04n147
>128150  25368 16384     44304. EVENT MOD USER udo
>128151  25368 16384     44305. EVENT MOD USER iber
>128152  25368 16384     44306. EVENT MOD USER dieguez
>128153  25368 16384     44307. EVENT MOD USER karenjoh
>128154  25368 16384     44308. EVENT MOD USER lorenzo
>128155  25368 16384     44309. EVENT MOD USER parcolle
>128156  25368 16384     44310. EVENT MOD USER cfennie
>128157  25368 16384     44311. EVENT MOD USER civelli
>128158  25368 16384     44312. EVENT MOD EXECHOST sub04n135
>128159  25368 16384     44313. EVENT MOD EXECHOST sub04n141
>128160  25368 16384     44314. EVENT MOD EXECHOST sub04n127
>128161  25368 16384     44315. EVENT MOD EXECHOST sub04n145
>128162  25368 16384     44316. EVENT MOD EXECHOST sub04n133
>128163  25368 16384     44317. EVENT MOD EXECHOST sub04n148
>128164  25368 16384     44318. EVENT MOD EXECHOST sub04n74
>128165  25368 16384     44319. EVENT JOB 21542.1 task 2.sub04n74 USAGE
>128166  25368 16384     44320. EVENT JOB 21542.1 task 1.sub04n74 USAGE
>128167  25368 16384     44321. EVENT MOD EXECHOST rupc03.rutgers.edu
>128168  25368 16384     44322. EVENT MOD EXECHOST sub04n139
>128169  25368 16384     44323. EVENT MOD EXECHOST rupc02.rutgers.edu
>128170  25368 16384     44324. EVENT MOD EXECHOST sub04n80
>128171  25368 16384     44325. EVENT MOD EXECHOST sub04n207
>128172  25368 16384     44326. EVENT MOD EXECHOST sub04n180
>128173  25368 16384     44327. EVENT MOD EXECHOST sub04n23
>128174  25368 16384     44328. EVENT MOD EXECHOST sub04n30
>128175  25368 16384     44329. EVENT MOD EXECHOST sub04n203
>128176  25368 16384     44330. EVENT MOD EXECHOST sub04n109
>128177  25368 16384     44331. EVENT MOD EXECHOST rupc04.rutgers.edu
>128178  25368 16384     44332. EVENT MOD EXECHOST sub04n114
>128179  25368 16384     44333. EVENT MOD EXECHOST sub04n106
>128180  25368 16384     44334. EVENT MOD EXECHOST sub04n88
>128181  25368 16384     44335. EVENT JOB 21507.1 task 6.sub04n88 USAGE
>128182  25368 16384     44336. EVENT JOB 21507.1 task 5.sub04n88 USAGE
>128183  25368 16384     44337. EVENT MOD EXECHOST sub04n157
>128184  25368 16384     44338. EVENT MOD EXECHOST sub04n20
>128185  25368 16384     44339. EVENT MOD EXECHOST sub04n156
>128186  25368 16384     44340. EVENT MOD EXECHOST sub04n26
>128187  25368 16384     44341. EVENT JOB 21213.1 USAGE
>128188  25368 16384     44342. EVENT MOD EXECHOST sub04n05
>128189  25368 16384     44343. EVENT MOD EXECHOST sub04n103
>128190  25368 16384     44344. EVENT MOD EXECHOST sub04n164
>128191  25368 16384     44345. EVENT MOD EXECHOST sub04n09
>128192  25368 16384     44346. EVENT MOD EXECHOST sub04n105
>128193  25368 16384     44347. EVENT MOD EXECHOST sub04n113
>128194  25368 16384     44348. EVENT MOD EXECHOST sub04n28
>128195  25368 16384     44349. EVENT MOD EXECHOST sub04n76
>128196  25368 16384     44350. EVENT MOD EXECHOST sub04n162
>128197  25368 16384     44351. EVENT MOD EXECHOST sub04n108
>128198  25368 16384     44352. EVENT MOD EXECHOST sub04n38
>128199  25368 16384     44353. EVENT MOD EXECHOST sub04n04
>128200  25368 16384     44354. EVENT MOD EXECHOST sub04n116
>128201  25368 16384     44355. EVENT MOD EXECHOST sub04n179
>128202  25368 16384     44356. EVENT MOD EXECHOST sub04n160
>128203  25368 16384     44357. EVENT MOD EXECHOST sub04n107
>Q:169, AQ:343 J:19(19), H:169(170), C:49, A:4, D:3, P:7, CKPT:0 US:15 PR:4 S:nd:12/lf:7 
>128204  25368 16384     ================[SCHEDULING-EPOCH]==================
>128205  25368 16384     JOB 20937.1 start_time = 1116447112 running_time 338079 decay_time = 450
>128206  25368 16384     JOB 20938.1 start_time = 1116374344 running_time 410847 decay_time = 450
>128207  25368 16384     JOB 21040.1 start_time = 1116443073 running_time 342118 decay_time = 450
>128208  25368 16384     JOB 21076.1 start_time = 1116451351 running_time 333840 decay_time = 450
>128209  25368 16384     JOB 21210.1 start_time = 1116514970 running_time 270221 decay_time = 450
>128210  25368 16384     JOB 21213.1 start_time = 1116515250 running_time 269941 decay_time = 450
>128211  25368 16384     JOB 21338.1 start_time = 1116543252 running_time 241939 decay_time = 450
>128212  25368 16384     JOB 21423.1 start_time = 1116629274 running_time 155917 decay_time = 450
>128213  25368 16384     JOB 21424.1 start_time = 1116631365 running_time 153826 decay_time = 450
>128214  25368 16384     JOB 21440.1 start_time = 1116632934 running_time 152257 decay_time = 450
>128215  25368 16384     JOB 21441.1 start_time = 1116632994 running_time 152197 decay_time = 450
>128216  25368 16384     JOB 21443.1 start_time = 1116633602 running_time 151589 decay_time = 450
>128217  25368 16384     JOB 21474.1 start_time = 1116655118 running_time 130073 decay_time = 450
>128218  25368 16384     JOB 21503.1 start_time = 1116707395 running_time 77796 decay_time = 450
>128219  25368 16384     JOB 21507.1 start_time = 1116714061 running_time 71130 decay_time = 450
>128220  25368 16384     JOB 21528.1 start_time = 1116707641 running_time 77550 decay_time = 450
>128221  25368 16384     JOB 21530.1 start_time = 1116714453 running_time 70738 decay_time = 450
>128222  25368 16384     JOB 21537.1 start_time = 1116724845 running_time 60346 decay_time = 450
>128223  25368 16384     JOB 21542.1 start_time = 1116782511 running_time 2680 decay_time = 450
>128224  25368 16384     verified threshold of 169 queues
>128225  25368 16384     queue myrinet at sub04n61 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128226  25368 16384     queue myrinet at sub04n62 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128227  25368 16384     queue myrinet at sub04n65 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128228  25368 16384     queue myrinet at sub04n66 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128229  25368 16384     queue myrinet at sub04n67 tagged to be overloaded: load_avg=2.020000 (no load adjustment) >= 1.4
>
>128230  25368 16384     queue myrinet at sub04n68 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128231  25368 16384     queue myrinet at sub04n69 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128232  25368 16384     queue myrinet at sub04n70 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128233  25368 16384     queue myrinet at sub04n71 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128234  25368 16384     queue myrinet at sub04n72 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128235  25368 16384     queue myrinet at sub04n75 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>
>128236  25368 16384     queue myrinet at sub04n77 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128237  25368 16384     queue myrinet at sub04n78 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128238  25368 16384     queue myrinet at sub04n79 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128239  25368 16384     queue myrinet at sub04n81 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128240  25368 16384     queue myrinet at sub04n84 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128241  25368 16384     queue myrinet at sub04n85 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128242  25368 16384     queue myrinet at sub04n86 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128243  25368 16384     queue myrinet at sub04n87 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>
>128244  25368 16384     queue myrinet at sub04n88 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128245  25368 16384     queue myrinet at sub04n89 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128246  25368 16384     queue myrinet at sub04n90 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128247  25368 16384     queue myrinet at sub04n91 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>
>128248  25368 16384     queue myrinet at sub04n63 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>
>128249  25368 16384     queue myrinet at sub04n64 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128250  25368 16384     queue myrinet at sub04n73 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128251  25368 16384     queue myrinet at sub04n74 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>
>128252  25368 16384     queue opteronp at sub04n202 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
>
>128253  25368 16384     queue opteronp at sub04n205 tagged to be overloaded: load_medium=1.010000 (no load adjustment) >= 1.0
>
>128254  25368 16384     queue opteronp at sub04n206 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
>
>128255  25368 16384     queue opteronp at sub04n208 tagged to be overloaded: load_medium=1.010000 (no load adjustment) >= 1.0
>
>128256  25368 16384     queue parallel at sub04n121 tagged to be overloaded: load_avg=2.020000 (no load adjustment) >= 1.4
>
>128257  25368 16384     queue parallel at sub04n139 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128258  25368 16384     queue parallel at sub04n140 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128259  25368 16384     queue parallel at sub04n141 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128260  25368 16384     queue parallel at sub04n142 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128261  25368 16384     queue parallel at sub04n143 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128262  25368 16384     queue parallel at sub04n144 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128263  25368 16384     queue parallel at sub04n146 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128264  25368 16384     queue parallel at sub04n02 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128265  25368 16384     queue parallel at sub04n03 tagged to be overloaded: load_avg=2.020000 (no load adjustment) >= 1.4
>
>128266  25368 16384     queue parallel at sub04n04 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128267  25368 16384     queue parallel at sub04n05 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128268  25368 16384     queue parallel at sub04n06 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128269  25368 16384     queue parallel at sub04n07 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128270  25368 16384     queue parallel at sub04n08 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128271  25368 16384     queue parallel at sub04n09 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128272  25368 16384     queue parallel at sub04n10 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128273  25368 16384     queue parallel at sub04n11 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128274  25368 16384     verified threshold of 169 queues
>128275  25368 16384     STARTING PASS 1 WITH 0 PENDING JOBS
>128276  25368 16384     Not enrolled ja_tasks: 0
>128277  25368 16384     Enrolled ja_tasks: 1
>128278  25368 16384     Not enrolled ja_tasks: 0
>128279  25368 16384     Enrolled ja_tasks: 1
>128280  25368 16384     Not enrolled ja_tasks: 0
>128281  25368 16384     Enrolled ja_tasks: 1
>128282  25368 16384     Not enrolled ja_tasks: 0
>128283  25368 16384     Enrolled ja_tasks: 1
>128284  25368 16384     Not enrolled ja_tasks: 0
>128285  25368 16384     Enrolled ja_tasks: 1
>128286  25368 16384     Not enrolled ja_tasks: 0
>128287  25368 16384     Enrolled ja_tasks: 1
>128288  25368 16384     Not enrolled ja_tasks: 0
>128289  25368 16384     Enrolled ja_tasks: 1
>128290  25368 16384     Not enrolled ja_tasks: 0
>128291  25368 16384     Enrolled ja_tasks: 1
>128292  25368 16384     Not enrolled ja_tasks: 0
>128293  25368 16384     Enrolled ja_tasks: 1
>128294  25368 16384     Not enrolled ja_tasks: 0
>128295  25368 16384     Enrolled ja_tasks: 1
>128296  25368 16384     Not enrolled ja_tasks: 0
>128297  25368 16384     Enrolled ja_tasks: 1
>128298  25368 16384     Not enrolled ja_tasks: 0
>128299  25368 16384     Enrolled ja_tasks: 1
>128300  25368 16384     Not enrolled ja_tasks: 0
>128301  25368 16384     Enrolled ja_tasks: 1
>128302  25368 16384     Not enrolled ja_tasks: 0
>128303  25368 16384     Enrolled ja_tasks: 1
>128304  25368 16384     Not enrolled ja_tasks: 0
>128305  25368 16384     Enrolled ja_tasks: 1
>128306  25368 16384     Not enrolled ja_tasks: 0
>128307  25368 16384     Enrolled ja_tasks: 1
>128308  25368 16384     Not enrolled ja_tasks: 0
>128309  25368 16384     Enrolled ja_tasks: 1
>128310  25368 16384     Not enrolled ja_tasks: 0
>128311  25368 16384     Enrolled ja_tasks: 1
>128312  25368 16384     Not enrolled ja_tasks: 0
>128313  25368 16384     Enrolled ja_tasks: 1
>128314  25368 16384     STARTING PASS 2 WITH 0 PENDING JOBS
>128315  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>128316  25368 16384        slots: 1.000000 * 1000.000000 * 6    ---> 6000.000000
>128317  25368 16384     slot request assumed for static urgency is 20 for ,20-64 PE range due to PE's "mpi" setting "min"
>128318  25368 16384        slots: 1.000000 * 1000.000000 * 20    ---> 20000.000000
>128319  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>128320  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>128321  25368 16384        slots: 1.000000 * 1000.000000 * 6    ---> 6000.000000
>128322  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>128323  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>128324  25368 16384     slot request assumed for static urgency is 2 for ,2-8 PE range due to PE's "mpich_myri" setting "min"
>128325  25368 16384        slots: 1.000000 * 1000.000000 * 2    ---> 2000.000000
>128326  25368 16384        slots: 1.000000 * 1000.000000 * 8    ---> 8000.000000
>128327  25368 16384     ASU min = 1000.00000000000, ASU max = 20000.00000000000
>128328  25368 16384     
>128329  25368 16384     no DDJU: do_usage: 1 finished_jobs 0
>128330  25368 16384     
>128331  25368 16384     =====================[Pass 0]======================
>128332  25368 16384     =====================[Pass 1]======================
>128333  25368 16384     =====================[Pass 2]======================
>128334  25368 16384     
>128335  25368 16384     no DDJU: do_usage: 0 finished_jobs 0
>128336  25368 16384     
>128337  25368 16384     =====================[Pass 0]======================
>128338  25368 16384     =====================[Pass 1]======================
>128339  25368 16384     =====================[Pass 2]======================
>128340  25368 16384     Normalizing tickets using 0.000000/18.333333 as min_tix/max_tix
>128341  25368 16384        got 19 running jobs
>128342  25368 16384        added 19 ticket orders for running jobs
>128343  25368 16384        added 1 orders for updating usage of user
>128344  25368 16384        added 0 orders for updating usage of project
>128345  25368 16384        added 0 orders for updating share tree
>128346  25368 16384        added 1 orders for scheduler configuration
>128347  25368 16384     SENDING 22 ORDERS TO QMASTER
>128348  25368 16384     RESETTING BUSY STATE OF EVENT CLIENT
>128349  25368 16384     reresolve port timeout in 320
>128350  25368 16384     returning cached port value: 536
>--------------STOP-SCHEDULER-RUN-------------
>128351  25368 16384     ec_get retrieving events - will do max 20 fetches
>128352  25368 16384     doing sync fetch for messages, 20 still to do
>128353  25368 16384     try to get request from qmaster, id 1
>128354  25368 16384     Checking 120 events (44358-44477) while waiting for #44358
>128355  25368 16384     check complete, 120 events in list
>128356  25368 16384     got 120 events till 44477
>128357  25368 16384     doing async fetch for messages, 19 still to do
>128358  25368 16384     try to get request from qmaster, id 1
>128359  25368 16384     reresolve port timeout in 300
>128360  25368 16384     returning cached port value: 536
>128361  25368 16384     Sent ack for all events lower or equal 44477
>128362  25368 16384     ec_get - received 120 events
>128363  25368 16384     44358. EVENT MOD EXECHOST sub04n166
>128364  25368 16384     44359. EVENT MOD EXECHOST sub04n90
>128365  25368 16384     44360. EVENT JOB 21503.1 task 2.sub04n90 USAGE
>128366  25368 16384     44361. EVENT JOB 21503.1 task 1.sub04n90 USAGE
>128367  25368 16384     44362. EVENT MOD EXECHOST sub04n168
>128368  25368 16384     44363. EVENT MOD EXECHOST sub04n112
>128369  25368 16384     44364. EVENT MOD EXECHOST sub04n08
>128370  25368 16384     44365. EVENT MOD EXECHOST sub04n75
>128371  25368 16384     44366. EVENT JOB 21040.1 task 6.sub04n75 USAGE
>128372  25368 16384     44367. EVENT JOB 21040.1 task 5.sub04n75 USAGE
>128373  25368 16384     44368. EVENT MOD USER udo
>128374  25368 16384     44369. EVENT MOD USER iber
>128375  25368 16384     44370. EVENT MOD USER dieguez
>128376  25368 16384     44371. EVENT MOD USER karenjoh
>128377  25368 16384     44372. EVENT MOD USER lorenzo
>128378  25368 16384     44373. EVENT MOD USER parcolle
>128379  25368 16384     44374. EVENT MOD USER cfennie
>128380  25368 16384     44375. EVENT MOD USER civelli
>128381  25368 16384     44376. EVENT MOD EXECHOST sub04n14
>128382  25368 16384     44377. EVENT MOD EXECHOST sub04n150
>128383  25368 16384     44378. EVENT MOD EXECHOST sub04n169
>128384  25368 16384     44379. EVENT MOD EXECHOST sub04n165
>128385  25368 16384     44380. EVENT MOD EXECHOST sub04n136
>128386  25368 16384     44381. EVENT MOD EXECHOST sub04n81
>128387  25368 16384     44382. EVENT JOB 21507.1 task 6.sub04n81 USAGE
>128388  25368 16384     44383. EVENT JOB 21507.1 task 5.sub04n81 USAGE
>128389  25368 16384     44384. EVENT MOD EXECHOST sub04n176
>128390  25368 16384     44385. EVENT MOD EXECHOST sub04n161
>128391  25368 16384     44386. EVENT MOD EXECHOST sub04n124
>128392  25368 16384     44387. EVENT MOD EXECHOST sub04n01
>128393  25368 16384     44388. EVENT MOD EXECHOST sub04n158
>128394  25368 16384     44389. EVENT MOD EXECHOST sub04n159
>128395  25368 16384     44390. EVENT MOD EXECHOST sub04n134
>128396  25368 16384     44391. EVENT MOD EXECHOST sub04n143
>128397  25368 16384     44392. EVENT MOD EXECHOST sub04n121
>128398  25368 16384     44393. EVENT MOD EXECHOST sub04n15
>128399  25368 16384     44394. EVENT MOD EXECHOST sub04n13
>128400  25368 16384     44395. EVENT MOD EXECHOST sub04n118
>128401  25368 16384     44396. EVENT MOD EXECHOST sub04n64
>128402  25368 16384     44397. EVENT JOB 21542.1 task 2.sub04n64 USAGE
>128403  25368 16384     44398. EVENT JOB 21542.1 task 1.sub04n64 USAGE
>128404  25368 16384     44399. EVENT MOD EXECHOST sub04n151
>128405  25368 16384     44400. EVENT MOD EXECHOST sub04n154
>128406  25368 16384     44401. EVENT MOD EXECHOST sub04n149
>128407  25368 16384     44402. EVENT MOD EXECHOST sub04n16
>128408  25368 16384     44403. EVENT MOD EXECHOST sub04n155
>128409  25368 16384     44404. EVENT MOD EXECHOST sub04n152
>128410  25368 16384     44405. EVENT MOD EXECHOST sub04n163
>128411  25368 16384     44406. EVENT MOD EXECHOST sub04n86
>128412  25368 16384     44407. EVENT JOB 21423.1 task 2.sub04n86 USAGE
>128413  25368 16384     44408. EVENT JOB 21423.1 task 1.sub04n86 USAGE
>128414  25368 16384     44409. EVENT MOD EXECHOST sub04n43
>128415  25368 16384     44410. EVENT MOD EXECHOST sub04n204
>128416  25368 16384     44411. EVENT MOD EXECHOST rupc01.rutgers.edu
>128417  25368 16384     44412. EVENT MOD EXECHOST sub04n125
>128418  25368 16384     44413. EVENT MOD EXECHOST sub04n03
>128419  25368 16384     44414. EVENT JOB 21076.1 USAGE
>128420  25368 16384     44415. EVENT MOD EXECHOST sub04n44
>128421  25368 16384     44416. EVENT MOD EXECHOST sub04n32
>128422  25368 16384     44417. EVENT MOD EXECHOST sub04n21
>128423  25368 16384     44418. EVENT MOD EXECHOST sub04n22
>128424  25368 16384     44419. EVENT MOD EXECHOST sub04n35
>128425  25368 16384     44420. EVENT MOD EXECHOST sub04n201
>128426  25368 16384     44421. EVENT MOD EXECHOST sub04n146
>128427  25368 16384     44422. EVENT MOD EXECHOST sub04n111
>128428  25368 16384     44423. EVENT MOD EXECHOST sub04n177
>128429  25368 16384     44424. EVENT MOD EXECHOST sub04n89
>128430  25368 16384     44425. EVENT JOB 21530.1 task 2.sub04n89 USAGE
>128431  25368 16384     44426. EVENT JOB 21530.1 task 1.sub04n89 USAGE
>128432  25368 16384     44427. EVENT JOB 21530.1 USAGE
>128433  25368 16384     44428. EVENT MOD EXECHOST sub04n205
>128434  25368 16384     44429. EVENT JOB 21440.1 USAGE
>128435  25368 16384     44430. EVENT MOD EXECHOST sub04n208
>128436  25368 16384     44431. EVENT JOB 21528.1 USAGE
>128437  25368 16384     44432. EVENT MOD EXECHOST sub04n104
>128438  25368 16384     44433. EVENT MOD EXECHOST sub04n24
>128439  25368 16384     44434. EVENT JOB 21210.1 USAGE
>128440  25368 16384     44435. EVENT MOD EXECHOST sub04n18
>128441  25368 16384     44436. EVENT MOD EXECHOST sub04n31
>128442  25368 16384     44437. EVENT JOB 20937.1 USAGE
>128443  25368 16384     44438. EVENT MOD EXECHOST sub04n202
>128444  25368 16384     44439. EVENT JOB 21443.1 USAGE
>128445  25368 16384     44440. EVENT MOD EXECHOST sub04n171
>128446  25368 16384     44441. EVENT MOD EXECHOST sub04n37
>128447  25368 16384     44442. EVENT MOD EXECHOST sub04n36
>128448  25368 16384     44443. EVENT MOD EXECHOST sub04n40
>128449  25368 16384     44444. EVENT MOD EXECHOST sub04n12
>128450  25368 16384     44445. EVENT MOD EXECHOST sub04n172
>128451  25368 16384     44446. EVENT MOD EXECHOST sub04n79
>128452  25368 16384     44447. EVENT JOB 21040.1 task 6.sub04n79 USAGE
>128453  25368 16384     44448. EVENT JOB 21040.1 task 5.sub04n79 USAGE
>128454  25368 16384     44449. EVENT JOB 21040.1 USAGE
>128455  25368 16384     44450. EVENT MOD EXECHOST sub04n61
>128456  25368 16384     44451. EVENT JOB 21040.1 task 6.sub04n61 USAGE
>128457  25368 16384     44452. EVENT JOB 21040.1 task 5.sub04n61 USAGE
>128458  25368 16384     44453. EVENT MOD EXECHOST sub04n170
>128459  25368 16384     44454. EVENT MOD EXECHOST sub04n41
>128460  25368 16384     44455. EVENT JOB 20938.1 USAGE
>128461  25368 16384     44456. EVENT MOD EXECHOST sub04n153
>128462  25368 16384     44457. EVENT MOD EXECHOST sub04n39
>128463  25368 16384     44458. EVENT MOD EXECHOST sub04n83
>128464  25368 16384     44459. EVENT MOD EXECHOST sub04n82
>128465  25368 16384     44460. EVENT MOD EXECHOST sub04n174
>128466  25368 16384     44461. EVENT MOD EXECHOST sub04n173
>128467  25368 16384     44462. EVENT MOD EXECHOST sub04n85
>128468  25368 16384     44463. EVENT JOB 21423.1 task 2.sub04n85 USAGE
>128469  25368 16384     44464. EVENT JOB 21423.1 task 1.sub04n85 USAGE
>128470  25368 16384     44465. EVENT MOD EXECHOST sub04n68
>128471  25368 16384     44466. EVENT JOB 21474.1 task 14.sub04n68 USAGE
>128472  25368 16384     44467. EVENT JOB 21474.1 task 13.sub04n68 USAGE
>128473  25368 16384     44468. EVENT MOD EXECHOST beowulf.rutgers.edu
>128474  25368 16384     44469. EVENT MOD EXECHOST sub04n91
>128475  25368 16384     44470. EVENT JOB 21423.1 task 2.sub04n91 USAGE
>128476  25368 16384     44471. EVENT JOB 21423.1 task 1.sub04n91 USAGE
>128477  25368 16384     44472. EVENT JOB 21423.1 USAGE
>128478  25368 16384     44473. EVENT MOD EXECHOST sub04n29
>128479  25368 16384     44474. EVENT MOD EXECHOST sub04n69
>128480  25368 16384     44475. EVENT JOB 21474.1 task 14.sub04n69 USAGE
>128481  25368 16384     44476. EVENT JOB 21474.1 task 13.sub04n69 USAGE
>128482  25368 16384     44477. EVENT MOD EXECHOST sub04n175
>Q:169, AQ:343 J:19(19), H:169(170), C:49, A:4, D:3, P:7, CKPT:0 US:15 PR:4 S:nd:12/lf:7 
>128483  25368 16384     ================[SCHEDULING-EPOCH]==================
>128484  25368 16384     JOB 20937.1 start_time = 1116447112 running_time 338099 decay_time = 450
>128485  25368 16384     JOB 20938.1 start_time = 1116374344 running_time 410867 decay_time = 450
>128486  25368 16384     JOB 21040.1 start_time = 1116443073 running_time 342138 decay_time = 450
>128487  25368 16384     JOB 21076.1 start_time = 1116451351 running_time 333860 decay_time = 450
>128488  25368 16384     JOB 21210.1 start_time = 1116514970 running_time 270241 decay_time = 450
>128489  25368 16384     JOB 21213.1 start_time = 1116515250 running_time 269961 decay_time = 450
>128490  25368 16384     JOB 21338.1 start_time = 1116543252 running_time 241959 decay_time = 450
>128491  25368 16384     JOB 21423.1 start_time = 1116629274 running_time 155937 decay_time = 450
>128492  25368 16384     JOB 21424.1 start_time = 1116631365 running_time 153846 decay_time = 450
>128493  25368 16384     JOB 21440.1 start_time = 1116632934 running_time 152277 decay_time = 450
>128494  25368 16384     JOB 21441.1 start_time = 1116632994 running_time 152217 decay_time = 450
>128495  25368 16384     JOB 21443.1 start_time = 1116633602 running_time 151609 decay_time = 450
>128496  25368 16384     JOB 21474.1 start_time = 1116655118 running_time 130093 decay_time = 450
>128497  25368 16384     JOB 21503.1 start_time = 1116707395 running_time 77816 decay_time = 450
>128498  25368 16384     JOB 21507.1 start_time = 1116714061 running_time 71150 decay_time = 450
>128499  25368 16384     JOB 21528.1 start_time = 1116707641 running_time 77570 decay_time = 450
>128500  25368 16384     JOB 21530.1 start_time = 1116714453 running_time 70758 decay_time = 450
>128501  25368 16384     JOB 21537.1 start_time = 1116724845 running_time 60366 decay_time = 450
>128502  25368 16384     JOB 21542.1 start_time = 1116782511 running_time 2700 decay_time = 450
>128503  25368 16384     verified threshold of 169 queues
>128504  25368 16384     queue myrinet at sub04n61 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128505  25368 16384     queue myrinet at sub04n62 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128506  25368 16384     queue myrinet at sub04n65 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128507  25368 16384     queue myrinet at sub04n66 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128508  25368 16384     queue myrinet at sub04n67 tagged to be overloaded: load_avg=2.020000 (no load adjustment) >= 1.4
>
>128509  25368 16384     queue myrinet at sub04n68 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128510  25368 16384     queue myrinet at sub04n69 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128511  25368 16384     queue myrinet at sub04n70 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128512  25368 16384     queue myrinet at sub04n71 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128513  25368 16384     queue myrinet at sub04n72 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128514  25368 16384     queue myrinet at sub04n75 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>
>128515  25368 16384     queue myrinet at sub04n77 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128516  25368 16384     queue myrinet at sub04n78 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128517  25368 16384     queue myrinet at sub04n79 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128518  25368 16384     queue myrinet at sub04n81 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128519  25368 16384     queue myrinet at sub04n84 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128520  25368 16384     queue myrinet at sub04n85 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128521  25368 16384     queue myrinet at sub04n86 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128522  25368 16384     queue myrinet at sub04n87 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>
>128523  25368 16384     queue myrinet at sub04n88 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128524  25368 16384     queue myrinet at sub04n89 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128525  25368 16384     queue myrinet at sub04n90 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128526  25368 16384     queue myrinet at sub04n91 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128527  25368 16384     queue myrinet at sub04n63 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>
>128528  25368 16384     queue myrinet at sub04n64 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128529  25368 16384     queue myrinet at sub04n73 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128530  25368 16384     queue myrinet at sub04n74 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>
>128531  25368 16384     queue opteronp at sub04n202 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
>
>128532  25368 16384     queue opteronp at sub04n205 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
>
>128533  25368 16384     queue opteronp at sub04n206 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
>
>128534  25368 16384     queue opteronp at sub04n208 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
>
>128535  25368 16384     queue parallel at sub04n121 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>
>128536  25368 16384     queue parallel at sub04n139 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128537  25368 16384     queue parallel at sub04n140 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128538  25368 16384     queue parallel at sub04n141 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128539  25368 16384     queue parallel at sub04n142 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128540  25368 16384     queue parallel at sub04n143 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128541  25368 16384     queue parallel at sub04n144 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128542  25368 16384     queue parallel at sub04n146 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128543  25368 16384     queue parallel at sub04n02 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128544  25368 16384     queue parallel at sub04n03 tagged to be overloaded: load_avg=2.020000 (no load adjustment) >= 1.4
>
>128545  25368 16384     queue parallel at sub04n04 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128546  25368 16384     queue parallel at sub04n05 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128547  25368 16384     queue parallel at sub04n06 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128548  25368 16384     queue parallel at sub04n07 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128549  25368 16384     queue parallel at sub04n08 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>
>128550  25368 16384     queue parallel at sub04n09 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128551  25368 16384     queue parallel at sub04n10 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128552  25368 16384     queue parallel at sub04n11 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128553  25368 16384     verified threshold of 169 queues
>128554  25368 16384     STARTING PASS 1 WITH 0 PENDING JOBS
>128555  25368 16384     Not enrolled ja_tasks: 0
>128556  25368 16384     Enrolled ja_tasks: 1
>128557  25368 16384     Not enrolled ja_tasks: 0
>128558  25368 16384     Enrolled ja_tasks: 1
>128559  25368 16384     Not enrolled ja_tasks: 0
>128560  25368 16384     Enrolled ja_tasks: 1
>128561  25368 16384     Not enrolled ja_tasks: 0
>128562  25368 16384     Enrolled ja_tasks: 1
>128563  25368 16384     Not enrolled ja_tasks: 0
>128564  25368 16384     Enrolled ja_tasks: 1
>128565  25368 16384     Not enrolled ja_tasks: 0
>128566  25368 16384     Enrolled ja_tasks: 1
>128567  25368 16384     Not enrolled ja_tasks: 0
>128568  25368 16384     Enrolled ja_tasks: 1
>128569  25368 16384     Not enrolled ja_tasks: 0
>128570  25368 16384     Enrolled ja_tasks: 1
>128571  25368 16384     Not enrolled ja_tasks: 0
>128572  25368 16384     Enrolled ja_tasks: 1
>128573  25368 16384     Not enrolled ja_tasks: 0
>128574  25368 16384     Enrolled ja_tasks: 1
>128575  25368 16384     Not enrolled ja_tasks: 0
>128576  25368 16384     Enrolled ja_tasks: 1
>128577  25368 16384     Not enrolled ja_tasks: 0
>128578  25368 16384     Enrolled ja_tasks: 1
>128579  25368 16384     Not enrolled ja_tasks: 0
>128580  25368 16384     Enrolled ja_tasks: 1
>128581  25368 16384     Not enrolled ja_tasks: 0
>128582  25368 16384     Enrolled ja_tasks: 1
>128583  25368 16384     Not enrolled ja_tasks: 0
>128584  25368 16384     Enrolled ja_tasks: 1
>128585  25368 16384     Not enrolled ja_tasks: 0
>128586  25368 16384     Enrolled ja_tasks: 1
>128587  25368 16384     Not enrolled ja_tasks: 0
>128588  25368 16384     Enrolled ja_tasks: 1
>128589  25368 16384     Not enrolled ja_tasks: 0
>128590  25368 16384     Enrolled ja_tasks: 1
>128591  25368 16384     Not enrolled ja_tasks: 0
>128592  25368 16384     Enrolled ja_tasks: 1
>128593  25368 16384     STARTING PASS 2 WITH 0 PENDING JOBS
>128594  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>128595  25368 16384        slots: 1.000000 * 1000.000000 * 6    ---> 6000.000000
>128596  25368 16384     slot request assumed for static urgency is 20 for ,20-64 PE range due to PE's "mpi" setting "min"
>128597  25368 16384        slots: 1.000000 * 1000.000000 * 20    ---> 20000.000000
>128598  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>128599  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>128600  25368 16384        slots: 1.000000 * 1000.000000 * 6    ---> 6000.000000
>128601  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>128602  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>128603  25368 16384     slot request assumed for static urgency is 2 for ,2-8 PE range due to PE's "mpich_myri" setting "min"
>128604  25368 16384        slots: 1.000000 * 1000.000000 * 2    ---> 2000.000000
>128605  25368 16384        slots: 1.000000 * 1000.000000 * 8    ---> 8000.000000
>128606  25368 16384     ASU min = 1000.00000000000, ASU max = 20000.00000000000
>128607  25368 16384     
>128608  25368 16384     no DDJU: do_usage: 1 finished_jobs 0
>128609  25368 16384     
>128610  25368 16384     =====================[Pass 0]======================
>128611  25368 16384     =====================[Pass 1]======================
>128612  25368 16384     =====================[Pass 2]======================
>128613  25368 16384     
>128614  25368 16384     no DDJU: do_usage: 0 finished_jobs 0
>128615  25368 16384     
>128616  25368 16384     =====================[Pass 0]======================
>128617  25368 16384     =====================[Pass 1]======================
>128618  25368 16384     =====================[Pass 2]======================
>128619  25368 16384     Normalizing tickets using 0.000000/18.333333 as min_tix/max_tix
>128620  25368 16384        got 19 running jobs
>128621  25368 16384        added 19 ticket orders for running jobs
>128622  25368 16384        added 1 orders for updating usage of user
>128623  25368 16384        added 0 orders for updating usage of project
>128624  25368 16384        added 0 orders for updating share tree
>128625  25368 16384        added 1 orders for scheduler configuration
>128626  25368 16384     SENDING 22 ORDERS TO QMASTER
>128627  25368 16384     RESETTING BUSY STATE OF EVENT CLIENT
>128628  25368 16384     reresolve port timeout in 300
>128629  25368 16384     returning cached port value: 536
>--------------STOP-SCHEDULER-RUN-------------
>128630  25368 16384     ec_get retrieving events - will do max 20 fetches
>128631  25368 16384     doing sync fetch for messages, 20 still to do
>128632  25368 16384     try to get request from qmaster, id 1
>128633  25368 16384     Checking 84 events (44478-44561) while waiting for #44478
>128634  25368 16384     check complete, 84 events in list
>128635  25368 16384     got 84 events till 44561
>128636  25368 16384     doing async fetch for messages, 19 still to do
>128637  25368 16384     try to get request from qmaster, id 1
>128638  25368 16384     reresolve port timeout in 280
>128639  25368 16384     returning cached port value: 536
>128640  25368 16384     Getting host by name - Linux
>128641  25368 16384     1 names in h_addr_list
>128642  25368 16384     0 names in h_aliases
>128643  25368 16384     Sent ack for all events lower or equal 44561
>128644  25368 16384     ec_get - received 84 events
>128645  25368 16384     44478. EVENT MOD EXECHOST sub04n167
>128646  25368 16384     44479. EVENT MOD EXECHOST sub04n63
>128647  25368 16384     44480. EVENT JOB 21542.1 task 2.sub04n63 USAGE
>128648  25368 16384     44481. EVENT JOB 21542.1 task 1.sub04n63 USAGE
>128649  25368 16384     44482. EVENT JOB 21542.1 USAGE
>128650  25368 16384     44483. EVENT MOD EXECHOST sub04n71
>128651  25368 16384     44484. EVENT JOB 21537.1 task 2.sub04n71 USAGE
>128652  25368 16384     44485. EVENT JOB 21537.1 task 1.sub04n71 USAGE
>128653  25368 16384     44486. EVENT MOD EXECHOST sub04n65
>128654  25368 16384     44487. EVENT JOB 21424.1 task 2.sub04n65 USAGE
>128655  25368 16384     44488. EVENT JOB 21424.1 task 1.sub04n65 USAGE
>128656  25368 16384     44489. EVENT MOD USER udo
>128657  25368 16384     44490. EVENT MOD USER iber
>128658  25368 16384     44491. EVENT MOD USER dieguez
>128659  25368 16384     44492. EVENT MOD USER karenjoh
>128660  25368 16384     44493. EVENT MOD USER lorenzo
>128661  25368 16384     44494. EVENT MOD USER parcolle
>128662  25368 16384     44495. EVENT MOD USER cfennie
>128663  25368 16384     44496. EVENT MOD USER civelli
>128664  25368 16384     44497. EVENT MOD EXECHOST sub04n25
>128665  25368 16384     44498. EVENT MOD EXECHOST sub04n144
>128666  25368 16384     44499. EVENT MOD EXECHOST sub04n206
>128667  25368 16384     44500. EVENT JOB 21441.1 USAGE
>128668  25368 16384     44501. EVENT MOD EXECHOST sub04n87
>128669  25368 16384     44502. EVENT JOB 21503.1 task 2.sub04n87 USAGE
>128670  25368 16384     44503. EVENT JOB 21503.1 task 1.sub04n87 USAGE
>128671  25368 16384     44504. EVENT MOD EXECHOST sub04n70
>128672  25368 16384     44505. EVENT JOB 21503.1 task 2.sub04n70 USAGE
>128673  25368 16384     44506. EVENT JOB 21503.1 task 1.sub04n70 USAGE
>128674  25368 16384     44507. EVENT JOB 21503.1 USAGE
>128675  25368 16384     44508. EVENT MOD EXECHOST sub04n19
>128676  25368 16384     44509. EVENT JOB 21338.1 USAGE
>128677  25368 16384     44510. EVENT MOD EXECHOST sub04n84
>128678  25368 16384     44511. EVENT JOB 21424.1 task 2.sub04n84 USAGE
>128679  25368 16384     44512. EVENT JOB 21424.1 task 1.sub04n84 USAGE
>128680  25368 16384     44513. EVENT MOD EXECHOST sub04n178
>128681  25368 16384     44514. EVENT MOD EXECHOST sub04n67
>128682  25368 16384     44515. EVENT JOB 21474.1 task 14.sub04n67 USAGE
>128683  25368 16384     44516. EVENT JOB 21474.1 task 13.sub04n67 USAGE
>128684  25368 16384     44517. EVENT JOB 21474.1 USAGE
>128685  25368 16384     44518. EVENT MOD EXECHOST sub04n27
>128686  25368 16384     44519. EVENT MOD EXECHOST sub04n34
>128687  25368 16384     44520. EVENT MOD EXECHOST sub04n72
>128688  25368 16384     44521. EVENT JOB 21537.1 task 2.sub04n72 USAGE
>128689  25368 16384     44522. EVENT JOB 21537.1 task 1.sub04n72 USAGE
>128690  25368 16384     44523. EVENT MOD EXECHOST sub04n78
>128691  25368 16384     44524. EVENT JOB 21507.1 task 6.sub04n78 USAGE
>128692  25368 16384     44525. EVENT JOB 21507.1 task 5.sub04n78 USAGE
>128693  25368 16384     44526. EVENT JOB 21507.1 USAGE
>128694  25368 16384     44527. EVENT MOD EXECHOST sub04n17
>128695  25368 16384     44528. EVENT MOD EXECHOST sub04n07
>128696  25368 16384     44529. EVENT MOD EXECHOST sub04n128
>128697  25368 16384     44530. EVENT MOD EXECHOST sub04n42
>128698  25368 16384     44531. EVENT MOD EXECHOST sub04n62
>128699  25368 16384     44532. EVENT JOB 21424.1 task 2.sub04n62 USAGE
>128700  25368 16384     44533. EVENT JOB 21424.1 task 1.sub04n62 USAGE
>128701  25368 16384     44534. EVENT JOB 21424.1 USAGE
>128702  25368 16384     44535. EVENT MOD EXECHOST sub04n10
>128703  25368 16384     44536. EVENT MOD EXECHOST sub04n77
>128704  25368 16384     44537. EVENT JOB 21537.1 task 2.sub04n77 USAGE
>128705  25368 16384     44538. EVENT JOB 21537.1 task 1.sub04n77 USAGE
>128706  25368 16384     44539. EVENT MOD EXECHOST sub04n11
>128707  25368 16384     44540. EVENT MOD EXECHOST sub04n02
>128708  25368 16384     44541. EVENT MOD EXECHOST sub04n120
>128709  25368 16384     44542. EVENT MOD EXECHOST sub04n115
>128710  25368 16384     44543. EVENT MOD EXECHOST sub04n101
>128711  25368 16384     44544. EVENT MOD EXECHOST sub04n66
>128712  25368 16384     44545. EVENT JOB 21537.1 task 2.sub04n66 USAGE
>128713  25368 16384     44546. EVENT JOB 21537.1 task 1.sub04n66 USAGE
>128714  25368 16384     44547. EVENT JOB 21537.1 USAGE
>128715  25368 16384     44548. EVENT MOD EXECHOST sub04n142
>128716  25368 16384     44549. EVENT MOD EXECHOST sub04n123
>128717  25368 16384     44550. EVENT MOD EXECHOST sub04n33
>128718  25368 16384     44551. EVENT MOD EXECHOST sub04n126
>128719  25368 16384     44552. EVENT MOD EXECHOST sub04n140
>128720  25368 16384     44553. EVENT MOD EXECHOST sub04n119
>128721  25368 16384     44554. EVENT MOD EXECHOST sub04n102
>128722  25368 16384     44555. EVENT MOD EXECHOST sub04n110
>128723  25368 16384     44556. EVENT MOD EXECHOST sub04n117
>128724  25368 16384     44557. EVENT MOD EXECHOST sub04n06
>128725  25368 16384     44558. EVENT MOD EXECHOST sub04n73
>128726  25368 16384     44559. EVENT JOB 21542.1 task 2.sub04n73 USAGE
>128727  25368 16384     44560. EVENT JOB 21542.1 task 1.sub04n73 USAGE
>128728  25368 16384     44561. EVENT MOD EXECHOST sub04n122
>Q:169, AQ:343 J:19(19), H:169(170), C:49, A:4, D:3, P:7, CKPT:0 US:15 PR:4 S:nd:12/lf:7 
>128729  25368 16384     ================[SCHEDULING-EPOCH]==================
>128730  25368 16384     JOB 20937.1 start_time = 1116447112 running_time 338119 decay_time = 450
>128731  25368 16384     JOB 20938.1 start_time = 1116374344 running_time 410887 decay_time = 450
>128732  25368 16384     JOB 21040.1 start_time = 1116443073 running_time 342158 decay_time = 450
>128733  25368 16384     JOB 21076.1 start_time = 1116451351 running_time 333880 decay_time = 450
>128734  25368 16384     JOB 21210.1 start_time = 1116514970 running_time 270261 decay_time = 450
>128735  25368 16384     JOB 21213.1 start_time = 1116515250 running_time 269981 decay_time = 450
>128736  25368 16384     JOB 21338.1 start_time = 1116543252 running_time 241979 decay_time = 450
>128737  25368 16384     JOB 21423.1 start_time = 1116629274 running_time 155957 decay_time = 450
>128738  25368 16384     JOB 21424.1 start_time = 1116631365 running_time 153866 decay_time = 450
>128739  25368 16384     JOB 21440.1 start_time = 1116632934 running_time 152297 decay_time = 450
>128740  25368 16384     JOB 21441.1 start_time = 1116632994 running_time 152237 decay_time = 450
>128741  25368 16384     JOB 21443.1 start_time = 1116633602 running_time 151629 decay_time = 450
>128742  25368 16384     JOB 21474.1 start_time = 1116655118 running_time 130113 decay_time = 450
>128743  25368 16384     JOB 21503.1 start_time = 1116707395 running_time 77836 decay_time = 450
>128744  25368 16384     JOB 21507.1 start_time = 1116714061 running_time 71170 decay_time = 450
>128745  25368 16384     JOB 21528.1 start_time = 1116707641 running_time 77590 decay_time = 450
>128746  25368 16384     JOB 21530.1 start_time = 1116714453 running_time 70778 decay_time = 450
>128747  25368 16384     JOB 21537.1 start_time = 1116724845 running_time 60386 decay_time = 450
>128748  25368 16384     JOB 21542.1 start_time = 1116782511 running_time 2720 decay_time = 450
>128749  25368 16384     verified threshold of 169 queues
>128750  25368 16384     queue myrinet at sub04n61 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128751  25368 16384     queue myrinet at sub04n62 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>
>128752  25368 16384     queue myrinet at sub04n65 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128753  25368 16384     queue myrinet at sub04n66 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128754  25368 16384     queue myrinet at sub04n67 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>
>128755  25368 16384     queue myrinet at sub04n68 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128756  25368 16384     queue myrinet at sub04n69 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128757  25368 16384     queue myrinet at sub04n70 tagged to be overloaded: load_avg=2.020000 (no load adjustment) >= 1.4
>
>128758  25368 16384     queue myrinet at sub04n71 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128759  25368 16384     queue myrinet at sub04n72 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>
>128760  25368 16384     queue myrinet at sub04n75 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>
>128761  25368 16384     queue myrinet at sub04n77 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128762  25368 16384     queue myrinet at sub04n78 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>
>128763  25368 16384     queue myrinet at sub04n79 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128764  25368 16384     queue myrinet at sub04n81 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128765  25368 16384     queue myrinet at sub04n84 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128766  25368 16384     queue myrinet at sub04n85 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128767  25368 16384     queue myrinet at sub04n86 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128768  25368 16384     queue myrinet at sub04n87 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>
>128769  25368 16384     queue myrinet at sub04n88 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128770  25368 16384     queue myrinet at sub04n89 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128771  25368 16384     queue myrinet at sub04n90 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128772  25368 16384     queue myrinet at sub04n91 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128773  25368 16384     queue myrinet at sub04n63 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128774  25368 16384     queue myrinet at sub04n64 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128775  25368 16384     queue myrinet at sub04n73 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128776  25368 16384     queue myrinet at sub04n74 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>
>128777  25368 16384     queue opteronp at sub04n202 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
>
>128778  25368 16384     queue opteronp at sub04n205 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
>
>128779  25368 16384     queue opteronp at sub04n206 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
>
>128780  25368 16384     queue opteronp at sub04n208 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
>
>128781  25368 16384     queue parallel at sub04n121 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>
>128782  25368 16384     queue parallel at sub04n139 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128783  25368 16384     queue parallel at sub04n140 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128784  25368 16384     queue parallel at sub04n141 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128785  25368 16384     queue parallel at sub04n142 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128786  25368 16384     queue parallel at sub04n143 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128787  25368 16384     queue parallel at sub04n144 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128788  25368 16384     queue parallel at sub04n146 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128789  25368 16384     queue parallel at sub04n02 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128790  25368 16384     queue parallel at sub04n03 tagged to be overloaded: load_avg=2.020000 (no load adjustment) >= 1.4
>
>128791  25368 16384     queue parallel at sub04n04 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128792  25368 16384     queue parallel at sub04n05 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128793  25368 16384     queue parallel at sub04n06 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128794  25368 16384     queue parallel at sub04n07 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>
>128795  25368 16384     queue parallel at sub04n08 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>
>128796  25368 16384     queue parallel at sub04n09 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128797  25368 16384     queue parallel at sub04n10 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128798  25368 16384     queue parallel at sub04n11 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128799  25368 16384     verified threshold of 169 queues
>128800  25368 16384     STARTING PASS 1 WITH 0 PENDING JOBS
>128801  25368 16384     Not enrolled ja_tasks: 0
>128802  25368 16384     Enrolled ja_tasks: 1
>128803  25368 16384     Not enrolled ja_tasks: 0
>128804  25368 16384     Enrolled ja_tasks: 1
>128805  25368 16384     Not enrolled ja_tasks: 0
>128806  25368 16384     Enrolled ja_tasks: 1
>128807  25368 16384     Not enrolled ja_tasks: 0
>128808  25368 16384     Enrolled ja_tasks: 1
>128809  25368 16384     Not enrolled ja_tasks: 0
>128810  25368 16384     Enrolled ja_tasks: 1
>128811  25368 16384     Not enrolled ja_tasks: 0
>128812  25368 16384     Enrolled ja_tasks: 1
>128813  25368 16384     Not enrolled ja_tasks: 0
>128814  25368 16384     Enrolled ja_tasks: 1
>128815  25368 16384     Not enrolled ja_tasks: 0
>128816  25368 16384     Enrolled ja_tasks: 1
>128817  25368 16384     Not enrolled ja_tasks: 0
>128818  25368 16384     Enrolled ja_tasks: 1
>128819  25368 16384     Not enrolled ja_tasks: 0
>128820  25368 16384     Enrolled ja_tasks: 1
>128821  25368 16384     Not enrolled ja_tasks: 0
>128822  25368 16384     Enrolled ja_tasks: 1
>128823  25368 16384     Not enrolled ja_tasks: 0
>128824  25368 16384     Enrolled ja_tasks: 1
>128825  25368 16384     Not enrolled ja_tasks: 0
>128826  25368 16384     Enrolled ja_tasks: 1
>128827  25368 16384     Not enrolled ja_tasks: 0
>128828  25368 16384     Enrolled ja_tasks: 1
>128829  25368 16384     Not enrolled ja_tasks: 0
>128830  25368 16384     Enrolled ja_tasks: 1
>128831  25368 16384     Not enrolled ja_tasks: 0
>128832  25368 16384     Enrolled ja_tasks: 1
>128833  25368 16384     Not enrolled ja_tasks: 0
>128834  25368 16384     Enrolled ja_tasks: 1
>128835  25368 16384     Not enrolled ja_tasks: 0
>128836  25368 16384     Enrolled ja_tasks: 1
>128837  25368 16384     Not enrolled ja_tasks: 0
>128838  25368 16384     Enrolled ja_tasks: 1
>128839  25368 16384     STARTING PASS 2 WITH 0 PENDING JOBS
>128840  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>128841  25368 16384        slots: 1.000000 * 1000.000000 * 6    ---> 6000.000000
>128842  25368 16384     slot request assumed for static urgency is 20 for ,20-64 PE range due to PE's "mpi" setting "min"
>128843  25368 16384        slots: 1.000000 * 1000.000000 * 20    ---> 20000.000000
>128844  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>128845  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>128846  25368 16384        slots: 1.000000 * 1000.000000 * 6    ---> 6000.000000
>128847  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>128848  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>128849  25368 16384     slot request assumed for static urgency is 2 for ,2-8 PE range due to PE's "mpich_myri" setting "min"
>128850  25368 16384        slots: 1.000000 * 1000.000000 * 2    ---> 2000.000000
>128851  25368 16384        slots: 1.000000 * 1000.000000 * 8    ---> 8000.000000
>128852  25368 16384     ASU min = 1000.00000000000, ASU max = 20000.00000000000
>128853  25368 16384     
>128854  25368 16384     no DDJU: do_usage: 1 finished_jobs 0
>128855  25368 16384     
>128856  25368 16384     =====================[Pass 0]======================
>128857  25368 16384     =====================[Pass 1]======================
>128858  25368 16384     =====================[Pass 2]======================
>128859  25368 16384     
>128860  25368 16384     no DDJU: do_usage: 0 finished_jobs 0
>128861  25368 16384     
>128862  25368 16384     =====================[Pass 0]======================
>128863  25368 16384     =====================[Pass 1]======================
>128864  25368 16384     =====================[Pass 2]======================
>128865  25368 16384     Normalizing tickets using 0.000000/18.333333 as min_tix/max_tix
>128866  25368 16384        got 19 running jobs
>128867  25368 16384        added 19 ticket orders for running jobs
>128868  25368 16384        added 1 orders for updating usage of user
>128869  25368 16384        added 0 orders for updating usage of project
>128870  25368 16384        added 0 orders for updating share tree
>128871  25368 16384        added 1 orders for scheduler configuration
>128872  25368 16384     SENDING 22 ORDERS TO QMASTER
>128873  25368 16384     RESETTING BUSY STATE OF EVENT CLIENT
>128874  25368 16384     reresolve port timeout in 280
>128875  25368 16384     returning cached port value: 536
>--------------STOP-SCHEDULER-RUN-------------
>128876  25368 16384     ec_get retrieving events - will do max 20 fetches
>128877  25368 16384     doing sync fetch for messages, 20 still to do
>128878  25368 16384     try to get request from qmaster, id 1
>128879  25368 16384     Checking 55 events (44562-44616) while waiting for #44562
>128880  25368 16384     check complete, 55 events in list
>128881  25368 16384     got 55 events till 44616
>128882  25368 16384     doing async fetch for messages, 19 still to do
>128883  25368 16384     try to get request from qmaster, id 1
>128884  25368 16384     reresolve port timeout in 260
>128885  25368 16384     returning cached port value: 536
>128886  25368 16384     Sent ack for all events lower or equal 44616
>128887  25368 16384     ec_get - received 55 events
>128888  25368 16384     44562. EVENT MOD EXECHOST sub04n147
>128889  25368 16384     44563. EVENT MOD USER udo
>128890  25368 16384     44564. EVENT MOD USER iber
>128891  25368 16384     44565. EVENT MOD USER dieguez
>128892  25368 16384     44566. EVENT MOD USER karenjoh
>128893  25368 16384     44567. EVENT MOD USER lorenzo
>128894  25368 16384     44568. EVENT MOD USER parcolle
>128895  25368 16384     44569. EVENT MOD USER cfennie
>128896  25368 16384     44570. EVENT MOD USER civelli
>128897  25368 16384     44571. EVENT MOD EXECHOST sub04n135
>128898  25368 16384     44572. EVENT MOD EXECHOST sub04n141
>128899  25368 16384     44573. EVENT MOD EXECHOST sub04n127
>128900  25368 16384     44574. EVENT MOD EXECHOST sub04n145
>128901  25368 16384     44575. EVENT MOD EXECHOST sub04n133
>128902  25368 16384     44576. EVENT MOD EXECHOST sub04n148
>128903  25368 16384     44577. EVENT MOD EXECHOST sub04n74
>128904  25368 16384     44578. EVENT JOB 21542.1 task 2.sub04n74 USAGE
>128905  25368 16384     44579. EVENT JOB 21542.1 task 1.sub04n74 USAGE
>128906  25368 16384     44580. EVENT MOD EXECHOST rupc03.rutgers.edu
>128907  25368 16384     44581. EVENT MOD EXECHOST sub04n139
>128908  25368 16384     44582. EVENT MOD EXECHOST rupc02.rutgers.edu
>128909  25368 16384     44583. EVENT MOD EXECHOST sub04n80
>128910  25368 16384     44584. EVENT MOD EXECHOST sub04n207
>128911  25368 16384     44585. EVENT MOD EXECHOST sub04n180
>128912  25368 16384     44586. EVENT MOD EXECHOST sub04n23
>128913  25368 16384     44587. EVENT MOD EXECHOST sub04n30
>128914  25368 16384     44588. EVENT MOD EXECHOST sub04n203
>128915  25368 16384     44589. EVENT MOD EXECHOST sub04n109
>128916  25368 16384     44590. EVENT MOD EXECHOST rupc04.rutgers.edu
>128917  25368 16384     44591. EVENT MOD EXECHOST sub04n114
>128918  25368 16384     44592. EVENT MOD EXECHOST sub04n106
>128919  25368 16384     44593. EVENT MOD EXECHOST sub04n88
>128920  25368 16384     44594. EVENT JOB 21507.1 task 6.sub04n88 USAGE
>128921  25368 16384     44595. EVENT JOB 21507.1 task 5.sub04n88 USAGE
>128922  25368 16384     44596. EVENT MOD EXECHOST sub04n157
>128923  25368 16384     44597. EVENT MOD EXECHOST sub04n20
>128924  25368 16384     44598. EVENT MOD EXECHOST sub04n156
>128925  25368 16384     44599. EVENT MOD EXECHOST sub04n26
>128926  25368 16384     44600. EVENT JOB 21213.1 USAGE
>128927  25368 16384     44601. EVENT MOD EXECHOST sub04n09
>128928  25368 16384     44602. EVENT MOD EXECHOST sub04n05
>128929  25368 16384     44603. EVENT MOD EXECHOST sub04n103
>128930  25368 16384     44604. EVENT MOD EXECHOST sub04n164
>128931  25368 16384     44605. EVENT MOD EXECHOST sub04n105
>128932  25368 16384     44606. EVENT MOD EXECHOST sub04n113
>128933  25368 16384     44607. EVENT MOD EXECHOST sub04n28
>128934  25368 16384     44608. EVENT MOD EXECHOST sub04n76
>128935  25368 16384     44609. EVENT MOD EXECHOST sub04n162
>128936  25368 16384     44610. EVENT MOD EXECHOST sub04n108
>128937  25368 16384     44611. EVENT MOD EXECHOST sub04n38
>128938  25368 16384     44612. EVENT MOD EXECHOST sub04n116
>128939  25368 16384     44613. EVENT MOD EXECHOST sub04n179
>128940  25368 16384     44614. EVENT MOD EXECHOST sub04n04
>128941  25368 16384     44615. EVENT MOD EXECHOST sub04n160
>128942  25368 16384     44616. EVENT MOD EXECHOST sub04n107
>Q:169, AQ:343 J:19(19), H:169(170), C:49, A:4, D:3, P:7, CKPT:0 US:15 PR:4 S:nd:12/lf:7 
>128943  25368 16384     ================[SCHEDULING-EPOCH]==================
>128944  25368 16384     JOB 20937.1 start_time = 1116447112 running_time 338139 decay_time = 450
>128945  25368 16384     JOB 20938.1 start_time = 1116374344 running_time 410907 decay_time = 450
>128946  25368 16384     JOB 21040.1 start_time = 1116443073 running_time 342178 decay_time = 450
>128947  25368 16384     JOB 21076.1 start_time = 1116451351 running_time 333900 decay_time = 450
>128948  25368 16384     JOB 21210.1 start_time = 1116514970 running_time 270281 decay_time = 450
>128949  25368 16384     JOB 21213.1 start_time = 1116515250 running_time 270001 decay_time = 450
>128950  25368 16384     JOB 21338.1 start_time = 1116543252 running_time 241999 decay_time = 450
>128951  25368 16384     JOB 21423.1 start_time = 1116629274 running_time 155977 decay_time = 450
>128952  25368 16384     JOB 21424.1 start_time = 1116631365 running_time 153886 decay_time = 450
>128953  25368 16384     JOB 21440.1 start_time = 1116632934 running_time 152317 decay_time = 450
>128954  25368 16384     JOB 21441.1 start_time = 1116632994 running_time 152257 decay_time = 450
>128955  25368 16384     JOB 21443.1 start_time = 1116633602 running_time 151649 decay_time = 450
>128956  25368 16384     JOB 21474.1 start_time = 1116655118 running_time 130133 decay_time = 450
>128957  25368 16384     JOB 21503.1 start_time = 1116707395 running_time 77856 decay_time = 450
>128958  25368 16384     JOB 21507.1 start_time = 1116714061 running_time 71190 decay_time = 450
>128959  25368 16384     JOB 21528.1 start_time = 1116707641 running_time 77610 decay_time = 450
>128960  25368 16384     JOB 21530.1 start_time = 1116714453 running_time 70798 decay_time = 450
>128961  25368 16384     JOB 21537.1 start_time = 1116724845 running_time 60406 decay_time = 450
>128962  25368 16384     JOB 21542.1 start_time = 1116782511 running_time 2740 decay_time = 450
>128963  25368 16384     verified threshold of 169 queues
>128964  25368 16384     queue myrinet at sub04n61 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128965  25368 16384     queue myrinet at sub04n62 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>
>128966  25368 16384     queue myrinet at sub04n65 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128967  25368 16384     queue myrinet at sub04n66 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128968  25368 16384     queue myrinet at sub04n67 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>
>128969  25368 16384     queue myrinet at sub04n68 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128970  25368 16384     queue myrinet at sub04n69 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128971  25368 16384     queue myrinet at sub04n70 tagged to be overloaded: load_avg=2.020000 (no load adjustment) >= 1.4
>
>128972  25368 16384     queue myrinet at sub04n71 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128973  25368 16384     queue myrinet at sub04n72 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>
>128974  25368 16384     queue myrinet at sub04n75 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>
>128975  25368 16384     queue myrinet at sub04n77 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128976  25368 16384     queue myrinet at sub04n78 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>
>128977  25368 16384     queue myrinet at sub04n79 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128978  25368 16384     queue myrinet at sub04n81 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128979  25368 16384     queue myrinet at sub04n84 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128980  25368 16384     queue myrinet at sub04n85 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128981  25368 16384     queue myrinet at sub04n86 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128982  25368 16384     queue myrinet at sub04n87 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>
>128983  25368 16384     queue myrinet at sub04n88 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128984  25368 16384     queue myrinet at sub04n89 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128985  25368 16384     queue myrinet at sub04n90 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128986  25368 16384     queue myrinet at sub04n91 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128987  25368 16384     queue myrinet at sub04n63 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128988  25368 16384     queue myrinet at sub04n64 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128989  25368 16384     queue myrinet at sub04n73 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128990  25368 16384     queue myrinet at sub04n74 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128991  25368 16384     queue opteronp at sub04n202 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
>
>128992  25368 16384     queue opteronp at sub04n205 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
>
>128993  25368 16384     queue opteronp at sub04n206 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
>
>128994  25368 16384     queue opteronp at sub04n208 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
>
>128995  25368 16384     queue parallel at sub04n121 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>
>128996  25368 16384     queue parallel at sub04n139 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128997  25368 16384     queue parallel at sub04n140 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128998  25368 16384     queue parallel at sub04n141 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>128999  25368 16384     queue parallel at sub04n142 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>129000  25368 16384     queue parallel at sub04n143 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>129001  25368 16384     queue parallel at sub04n144 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>129002  25368 16384     queue parallel at sub04n146 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>129003  25368 16384     queue parallel at sub04n02 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>129004  25368 16384     queue parallel at sub04n03 tagged to be overloaded: load_avg=2.020000 (no load adjustment) >= 1.4
>
>129005  25368 16384     queue parallel at sub04n04 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>129006  25368 16384     queue parallel at sub04n05 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>129007  25368 16384     queue parallel at sub04n06 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>129008  25368 16384     queue parallel at sub04n07 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>
>129009  25368 16384     queue parallel at sub04n08 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>
>129010  25368 16384     queue parallel at sub04n09 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>129011  25368 16384     queue parallel at sub04n10 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>129012  25368 16384     queue parallel at sub04n11 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>
>129013  25368 16384     verified threshold of 169 queues
>129014  25368 16384     STARTING PASS 1 WITH 0 PENDING JOBS
>129015  25368 16384     Not enrolled ja_tasks: 0
>129016  25368 16384     Enrolled ja_tasks: 1
>129017  25368 16384     Not enrolled ja_tasks: 0
>129018  25368 16384     Enrolled ja_tasks: 1
>129019  25368 16384     Not enrolled ja_tasks: 0
>129020  25368 16384     Enrolled ja_tasks: 1
>129021  25368 16384     Not enrolled ja_tasks: 0
>129022  25368 16384     Enrolled ja_tasks: 1
>129023  25368 16384     Not enrolled ja_tasks: 0
>129024  25368 16384     Enrolled ja_tasks: 1
>129025  25368 16384     Not enrolled ja_tasks: 0
>129026  25368 16384     Enrolled ja_tasks: 1
>129027  25368 16384     Not enrolled ja_tasks: 0
>129028  25368 16384     Enrolled ja_tasks: 1
>129029  25368 16384     Not enrolled ja_tasks: 0
>129030  25368 16384     Enrolled ja_tasks: 1
>129031  25368 16384     Not enrolled ja_tasks: 0
>129032  25368 16384     Enrolled ja_tasks: 1
>129033  25368 16384     Not enrolled ja_tasks: 0
>129034  25368 16384     Enrolled ja_tasks: 1
>129035  25368 16384     Not enrolled ja_tasks: 0
>129036  25368 16384     Enrolled ja_tasks: 1
>129037  25368 16384     Not enrolled ja_tasks: 0
>129038  25368 16384     Enrolled ja_tasks: 1
>129039  25368 16384     Not enrolled ja_tasks: 0
>129040  25368 16384     Enrolled ja_tasks: 1
>129041  25368 16384     Not enrolled ja_tasks: 0
>129042  25368 16384     Enrolled ja_tasks: 1
>129043  25368 16384     Not enrolled ja_tasks: 0
>129044  25368 16384     Enrolled ja_tasks: 1
>129045  25368 16384     Not enrolled ja_tasks: 0
>129046  25368 16384     Enrolled ja_tasks: 1
>129047  25368 16384     Not enrolled ja_tasks: 0
>129048  25368 16384     Enrolled ja_tasks: 1
>129049  25368 16384     Not enrolled ja_tasks: 0
>129050  25368 16384     Enrolled ja_tasks: 1
>129051  25368 16384     Not enrolled ja_tasks: 0
>129052  25368 16384     Enrolled ja_tasks: 1
>129053  25368 16384     STARTING PASS 2 WITH 0 PENDING JOBS
>129054  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>129055  25368 16384        slots: 1.000000 * 1000.000000 * 6    ---> 6000.000000
>129056  25368 16384     slot request assumed for static urgency is 20 for ,20-64 PE range due to PE's "mpi" setting "min"
>129057  25368 16384        slots: 1.000000 * 1000.000000 * 20    ---> 20000.000000
>129058  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>129059  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>129060  25368 16384        slots: 1.000000 * 1000.000000 * 6    ---> 6000.000000
>129061  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>129062  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>129063  25368 16384     slot request assumed for static urgency is 2 for ,2-8 PE range due to PE's "mpich_myri" setting "min"
>129064  25368 16384        slots: 1.000000 * 1000.000000 * 2    ---> 2000.000000
>129065  25368 16384        slots: 1.000000 * 1000.000000 * 8    ---> 8000.000000
>129066  25368 16384     ASU min = 1000.00000000000, ASU max = 20000.00000000000
>129067  25368 16384     
>129068  25368 16384     no DDJU: do_usage: 1 finished_jobs 0
>129069  25368 16384     
>129070  25368 16384     =====================[Pass 0]======================
>129071  25368 16384     =====================[Pass 1]======================
>129072  25368 16384     =====================[Pass 2]======================
>129073  25368 16384     
>129074  25368 16384     no DDJU: do_usage: 0 finished_jobs 0
>129075  25368 16384     
>129076  25368 16384     =====================[Pass 0]======================
>129077  25368 16384     =====================[Pass 1]======================
>129078  25368 16384     =====================[Pass 2]======================
>129079  25368 16384     Normalizing tickets using 0.000000/18.333333 as min_tix/max_tix
>129080  25368 16384        got 19 running jobs
>129081  25368 16384        added 19 ticket orders for running jobs
>129082  25368 16384        added 1 orders for updating usage of user
>129083  25368 16384        added 0 orders for updating usage of project
>129084  25368 16384        added 0 orders for updating share tree
>129085  25368 16384        added 1 orders for scheduler configuration
>129086  25368 16384     SENDING 22 ORDERS TO QMASTER
>129087  25368 16384     RESETTING BUSY STATE OF EVENT CLIENT
>129088  25368 16384     reresolve port timeout in 260
>129089  25368 16384     returning cached port value: 536
>--------------STOP-SCHEDULER-RUN-------------
>129090  25368 16384     ec_get retrieving events - will do max 20 fetches
>129091  25368 16384     doing sync fetch for messages, 20 still to do
>129092  25368 16384     try to get request from qmaster, id 1
>129093  25368 16384     Checking 154 events (44617-44770) while waiting for #44617
>129094  25368 16384     check complete, 154 events in list
>129095  25368 16384     got 154 events till 44770
>129096  25368 16384     doing async fetch for messages, 19 still to do
>129097  25368 16384     try to get request from qmaster, id 1
>129098  25368 16384     reresolve port timeout in 240
>129099  25368 16384     returning cached port value: 536
>129100  25368 16384     Sent ack for all events lower or equal 44770
>129101  25368 16384     ec_get - received 154 events
>129102  25368 16384     44617. EVENT MOD EXECHOST sub04n08
>129103  25368 16384     44618. EVENT MOD EXECHOST sub04n166
>129104  25368 16384     44619. EVENT MOD EXECHOST sub04n168
>129105  25368 16384     44620. EVENT MOD EXECHOST sub04n112
>129106  25368 16384     44621. EVENT MOD EXECHOST sub04n90
>129107  25368 16384     44622. EVENT JOB 21503.1 task 2.sub04n90 USAGE
>129108  25368 16384     44623. EVENT JOB 21503.1 task 1.sub04n90 USAGE
>129109  25368 16384     44624. EVENT MOD USER udo
>129110  25368 16384     44625. EVENT MOD USER iber
>129111  25368 16384     44626. EVENT MOD USER dieguez
>129112  25368 16384     44627. EVENT MOD USER karenjoh
>129113  25368 16384     44628. EVENT MOD USER lorenzo
>129114  25368 16384     44629. EVENT MOD USER parcolle
>129115  25368 16384     44630. EVENT MOD USER cfennie
>129116  25368 16384     44631. EVENT MOD USER civelli
>129117  25368 16384     44632. EVENT MOD EXECHOST sub04n14
>129118  25368 16384     44633. EVENT MOD EXECHOST sub04n75
>129119  25368 16384     44634. EVENT JOB 21040.1 task 6.sub04n75 USAGE
>129120  25368 16384     44635. EVENT JOB 21040.1 task 5.sub04n75 USAGE
>129121  25368 16384     44636. EVENT MOD EXECHOST sub04n150
>129122  25368 16384     44637. EVENT MOD EXECHOST sub04n169
>129123  25368 16384     44638. EVENT MOD EXECHOST sub04n165
>129124  25368 16384     44639. EVENT MOD EXECHOST sub04n136
>129125  25368 16384     44640. EVENT MOD EXECHOST sub04n176
>129126  25368 16384     44641. EVENT MOD EXECHOST sub04n81
>129127  25368 16384     44642. EVENT JOB 21507.1 task 6.sub04n81 USAGE
>129128  25368 16384     44643. EVENT JOB 21507.1 task 5.sub04n81 USAGE
>129129  25368 16384     44644. EVENT JOB 21507.1 task past_usage USAGE
>129130  25368 16384     44645. EVENT DEL PETASK 21507.1 task 6.sub04n88
>129131  25368 16384     44646. EVENT JOB 21507.1 task past_usage USAGE
>129132  25368 16384     44647. EVENT DEL PETASK 21507.1 task 6.sub04n78
>129133  25368 16384     44648. EVENT JOB 21507.1 task past_usage USAGE
>129134  25368 16384     44649. EVENT DEL PETASK 21507.1 task 6.sub04n81
>129135  25368 16384     44650. EVENT JOB 21507.1 task past_usage USAGE
>129136  25368 16384     44651. EVENT DEL PETASK 21507.1 task 5.sub04n81
>129137  25368 16384     44652. EVENT JOB 21507.1 task past_usage USAGE
>129138  25368 16384     44653. EVENT DEL PETASK 21507.1 task 5.sub04n88
>129139  25368 16384     44654. EVENT JOB 21507.1 task past_usage USAGE
>129140  25368 16384     44655. EVENT DEL PETASK 21507.1 task 5.sub04n78
>129141  25368 16384     44656. EVENT MOD EXECHOST sub04n161
>129142  25368 16384     44657. EVENT MOD EXECHOST sub04n124
>129143  25368 16384     44658. EVENT ADD PETASK 21507.1 task 7.sub04n88
>129144  25368 16384     44659. EVENT ADD PETASK 21507.1 task 7.sub04n78
>129145  25368 16384     44660. EVENT MOD EXECHOST sub04n158
>129146  25368 16384     44661. EVENT MOD EXECHOST sub04n01
>129147  25368 16384     44662. EVENT MOD EXECHOST sub04n159
>129148  25368 16384     44663. EVENT ADD PETASK 21507.1 task 7.sub04n81
>129149  25368 16384     44664. EVENT MOD EXECHOST sub04n134
>129150  25368 16384     44665. EVENT ADD PETASK 21507.1 task 8.sub04n88
>129151  25368 16384     44666. EVENT ADD PETASK 21507.1 task 8.sub04n78
>129152  25368 16384     44667. EVENT ADD PETASK 21507.1 task 8.sub04n81
>129153  25368 16384     44668. EVENT MOD EXECHOST sub04n121
>129154  25368 16384     44669. EVENT MOD EXECHOST sub04n143
>129155  25368 16384     44670. EVENT MOD EXECHOST sub04n15
>129156  25368 16384     44671. EVENT MOD EXECHOST sub04n13
>129157  25368 16384     44672. EVENT MOD EXECHOST sub04n64
>129158  25368 16384     44673. EVENT JOB 21542.1 task 2.sub04n64 USAGE
>129159  25368 16384     44674. EVENT JOB 21542.1 task 1.sub04n64 USAGE
>129160  25368 16384     44675. EVENT MOD EXECHOST sub04n118
>129161  25368 16384     44676. EVENT MOD EXECHOST sub04n151
>129162  25368 16384     44677. EVENT MOD EXECHOST sub04n154
>129163  25368 16384     44678. EVENT MOD EXECHOST sub04n149
>129164  25368 16384     44679. EVENT MOD EXECHOST sub04n16
>129165  25368 16384     44680. EVENT MOD EXECHOST sub04n155
>129166  25368 16384     44681. EVENT MOD EXECHOST sub04n152
>129167  25368 16384     44682. EVENT MOD EXECHOST sub04n163
>129168  25368 16384     44683. EVENT MOD EXECHOST sub04n43
>129169  25368 16384     44684. EVENT MOD EXECHOST sub04n86
>129170  25368 16384     44685. EVENT JOB 21423.1 task 2.sub04n86 USAGE
>129171  25368 16384     44686. EVENT JOB 21423.1 task 1.sub04n86 USAGE
>129172  25368 16384     44687. EVENT MOD EXECHOST sub04n03
>129173  25368 16384     44688. EVENT JOB 21076.1 USAGE
>129174  25368 16384     44689. EVENT MOD EXECHOST sub04n204
>129175  25368 16384     44690. EVENT MOD EXECHOST rupc01.rutgers.edu
>129176  25368 16384     44691. EVENT MOD EXECHOST sub04n125
>129177  25368 16384     44692. EVENT MOD EXECHOST sub04n44
>129178  25368 16384     44693. EVENT MOD EXECHOST sub04n32
>129179  25368 16384     44694. EVENT MOD EXECHOST sub04n21
>129180  25368 16384     44695. EVENT MOD EXECHOST sub04n22
>129181  25368 16384     44696. EVENT MOD EXECHOST sub04n35
>129182  25368 16384     44697. EVENT MOD EXECHOST sub04n201
>129183  25368 16384     44698. EVENT MOD EXECHOST sub04n205
>129184  25368 16384     44699. EVENT JOB 21440.1 USAGE
>129185  25368 16384     44700. EVENT MOD EXECHOST sub04n111
>129186  25368 16384     44701. EVENT MOD EXECHOST sub04n89
>129187  25368 16384     44702. EVENT JOB 21530.1 task 2.sub04n89 USAGE
>129188  25368 16384     44703. EVENT JOB 21530.1 task 1.sub04n89 USAGE
>129189  25368 16384     44704. EVENT JOB 21530.1 USAGE
>129190  25368 16384     44705. EVENT MOD EXECHOST sub04n177
>129191  25368 16384     44706. EVENT MOD EXECHOST sub04n146
>129192  25368 16384     44707. EVENT ADD PETASK 21507.1 task 9.sub04n88
>129193  25368 16384     44708. EVENT JOB 21507.1 task past_usage USAGE
>129194  25368 16384     44709. EVENT DEL PETASK 21507.1 task 7.sub04n88
>Segmentation fault
>You have new mail in /var/spool/mail/root
>rupc-cs04b:/opt/SGE/util # 
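A plain "Segmentation fault" with no backtrace is hard to act on. One way to get a stack trace for the next crash would be to enable core dumps in the shell that launches the scheduler and then open the resulting core file in gdb. This is only a sketch: the architecture directory name (lx24-x86 below) is a placeholder, and where the core file lands depends on the kernel settings and the directory the daemon was started from.

  # allow the launching shell to write a core file (bash/sh syntax;
  # under csh/tcsh the equivalent is: limit coredumpsize unlimited)
  ulimit -c unlimited

  # start the scheduler in the foreground as before;
  # "lx24-x86" is just an assumed arch directory name
  /opt/SGE/bin/lx24-x86/sge_schedd

  # after the crash, dump the full backtrace from the core file
  # (the core may be named core.<pid> depending on kernel settings)
  echo "bt full" > /tmp/bt.gdb
  gdb -batch -x /tmp/bt.gdb /opt/SGE/bin/lx24-x86/sge_schedd core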
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>/opt/SGE/default/spool/qmaster
>
>Sun May 22 14:25:16 EDT 2005
>05/22/2005 00:20:01|qmaster|rupc-cs04b|E|event client "scheduler" (rupc-cs04b/schedd/1) reregistered - it will need a total update
>05/22/2005 00:32:40|qmaster|rupc-cs04b|W|job 21538.1 failed on host sub04n63 in recognising job because: execd doesn't know this job
>05/22/2005 00:32:49|qmaster|rupc-cs04b|E|execd sub04n63 reports running state for job (21538.1/master) in queue "myrinet at sub04n63" while job is in state 65536 
>05/22/2005 00:33:49|qmaster|rupc-cs04b|E|execd at sub04n63 reports running job (21538.1/master) in queue "myrinet at sub04n63" that was not supposed to be there - killing
>05/22/2005 02:10:01|qmaster|rupc-cs04b|E|event client "scheduler" (rupc-cs04b/schedd/1) reregistered - it will need a total update
>05/22/2005 02:30:26|qmaster|rupc-cs04b|E|orders user/project version (1035) is not uptodate (1036) for user/project "udo"
>05/22/2005 02:30:26|qmaster|rupc-cs04b|E|orders user/project version (1035) is not uptodate (1036) for user/project "iber"
>05/22/2005 02:30:26|qmaster|rupc-cs04b|E|orders user/project version (1035) is not uptodate (1036) for user/project "dieguez"
>05/22/2005 02:30:26|qmaster|rupc-cs04b|E|orders user/project version (1035) is not uptodate (1036) for user/project "zayak"
>05/22/2005 02:30:26|qmaster|rupc-cs04b|E|orders user/project version (1035) is not uptodate (1036) for user/project "karenjoh"
>05/22/2005 02:30:26|qmaster|rupc-cs04b|E|orders user/project version (1035) is not uptodate (1036) for user/project "lorenzo"
>05/22/2005 02:30:26|qmaster|rupc-cs04b|E|orders user/project version (1035) is not uptodate (1036) for user/project "parcolle"
>05/22/2005 02:30:26|qmaster|rupc-cs04b|E|orders user/project version (1035) is not uptodate (1036) for user/project "cfennie"
>05/22/2005 02:30:26|qmaster|rupc-cs04b|E|orders user/project version (1035) is not uptodate (1036) for user/project "civelli"
>05/22/2005 02:34:06|qmaster|rupc-cs04b|E|orders user/project version (1044) is not uptodate (1045) for user/project "udo"
>05/22/2005 02:34:06|qmaster|rupc-cs04b|E|orders user/project version (1044) is not uptodate (1045) for user/project "iber"
>05/22/2005 02:34:06|qmaster|rupc-cs04b|E|orders user/project version (1044) is not uptodate (1045) for user/project "dieguez"
>05/22/2005 02:34:06|qmaster|rupc-cs04b|E|orders user/project version (1044) is not uptodate (1045) for user/project "zayak"
>05/22/2005 02:34:06|qmaster|rupc-cs04b|E|orders user/project version (1044) is not uptodate (1045) for user/project "karenjoh"
>05/22/2005 02:34:06|qmaster|rupc-cs04b|E|orders user/project version (1044) is not uptodate (1045) for user/project "lorenzo"
>05/22/2005 02:34:06|qmaster|rupc-cs04b|E|orders user/project version (1044) is not uptodate (1045) for user/project "parcolle"
>05/22/2005 02:34:06|qmaster|rupc-cs04b|E|orders user/project version (1044) is not uptodate (1045) for user/project "cfennie"
>05/22/2005 02:34:06|qmaster|rupc-cs04b|E|orders user/project version (1044) is not uptodate (1045) for user/project "civelli"
>05/22/2005 03:02:47|qmaster|rupc-cs04b|E|tightly integrated parallel task 21539.1 task 3.sub04n83 failed - killing job
>05/22/2005 03:10:01|qmaster|rupc-cs04b|E|event client "scheduler" (rupc-cs04b/schedd/1) reregistered - it will need a total update    <-- YOU SEE THESE 2 LINES: THE SCHEDULER DIED EVEN WITHOUT ANY EVENTS, JUST BY ITSELF!
>05/22/2005 07:30:01|qmaster|rupc-cs04b|E|event client "scheduler" (rupc-cs04b/schedd/1) reregistered - it will need a total update
>05/22/2005 11:11:39|qmaster|rupc-cs04b|E|event client "scheduler" (rupc-cs04b/schedd/1) reregistered - it will need a total update    <-- BEFORE THE LAST CRASH
>05/22/2005 14:07:53|qmaster|rupc-cs04b|E|tightly integrated parallel task 21507.1 task 10.sub04n88 failed - killing job                       <-- THIS IS WHAT TRIGGERED the CRASH
>05/22/2005 14:09:14|qmaster|rupc-cs04b|W|job 21507.1 failed on host sub04n78 assumedly after job because: job 21507.1 died through signal TERM (15)
>05/22/2005 14:10:00|qmaster|rupc-cs04b|E|event client "scheduler" (rupc-cs04b/schedd/1) reregistered - it will need a total update    <-- SCHEDULER RESTART AFTER THE CRASH
>
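Each of the 'event client "scheduler" ... reregistered' lines above marks a point where the qmaster lost its existing scheduler connection, i.e. one scheduler death. A quick way to see how often that is happening is to pull those lines out of the qmaster messages file. A small sketch, assuming the messages file lives directly under the spool directory shown above:

  MSG=/opt/SGE/default/spool/qmaster/messages   # assumed path to the qmaster messages file

  # timestamps of every scheduler reregistration
  grep 'event client "scheduler"' "$MSG" | grep reregistered | awk -F'|' '{print $1}'

  # total count of reregistrations
  grep 'event client "scheduler"' "$MSG" | grep -c reregistered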
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>SCHEDULER messages BELOW
>
>05/22/2005 00:20:01|schedd|rupc-cs04b|I|starting up 6.0u3
>05/22/2005 02:10:01|schedd|rupc-cs04b|I|starting up 6.0u3
>05/22/2005 02:30:26|schedd|rupc-cs04b|I|controlled shutdown 6.0u3
>05/22/2005 02:31:10|schedd|rupc-cs04b|I|starting up 6.0u3
>05/22/2005 02:34:06|schedd|rupc-cs04b|I|controlled shutdown 6.0u3
>05/22/2005 02:40:00|schedd|rupc-cs04b|I|starting up 6.0u3
>05/22/2005 03:10:01|schedd|rupc-cs04b|I|starting up 6.0u3
>05/22/2005 07:30:01|schedd|rupc-cs04b|I|starting up 6.0u3
>05/22/2005 11:11:39|schedd|rupc-cs04b|I|starting up 6.0u3        <--- before the last crash (I started debug mode)
>05/22/2005 14:10:00|schedd|rupc-cs04b|I|starting up 6.0u3        <--- AFTER the last crash
>
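The scheduler messages above make the pattern visible: a clean restart shows a "controlled shutdown" line followed by "starting up", while a crash shows a new "starting up" with no shutdown before it. Listing both event types side by side makes the crashes easy to spot. Again only a sketch; the schedd messages path is an assumption based on the spool directory shown earlier:

  SCHEDD_MSG=/opt/SGE/default/spool/qmaster/schedd/messages   # assumed path to the schedd messages file

  # every scheduler startup and controlled shutdown, in order
  grep -E 'starting up|controlled shutdown' "$SCHEDD_MSG"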
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



