[GE users] "The Scheduler dies" COMPLETE information

Stephan Grell - Sun Germany - SSG - Software Engineer stephan.grell at sun.com
Mon May 23 09:53:26 BST 2005


The workaround is to remove all PE jobs, start the scheduler, and then
resubmit the PE jobs...
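In practice the workaround above might look roughly like the sketch below. This is not from the original mail: the job ids, the script name, and the `myrinet` PE name (borrowed from later in this thread) are placeholders for your own.

```shell
# Hypothetical sketch of the workaround; ids and names are placeholders.
qstat -r                               # lists requested PEs per job; note the PE job ids
qdel 21507 21542                       # remove the PE jobs (placeholder ids)
$SGE_ROOT/bin/lx24-x86/sge_schedd      # restart the scheduler daemon
qsub -pe myrinet 4 job_script.sh       # resubmit each PE job as before
```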

u4 will be available soon. I do not know the date. Sorry. However, you can
compile it yourself by checking out the u4 tag.
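If you do build from source, the checkout might look something like the sketch below. The CVS root and the tag name here are assumptions, not taken from the original mail -- verify both against the project site before using them.

```shell
# Assumed CVS coordinates and tag name -- check the project site for the real ones.
cvs -d :pserver:guest@cvs.gridengine.sunsource.net:/cvs checkout -r V60u4_TAG gridengine
cd gridengine/source
./aimk -only-core    # aimk is the SGE build wrapper; builds the core daemons
```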

Stephan
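As an aside for readers digging through the excerpts quoted below: the qmaster messages lines have a fixed pipe-separated layout (timestamp|daemon|host|level|text), so a small awk filter can isolate just the error-level entries. A minimal sketch, using sample lines from Viktor's excerpt:

```shell
# Build a small sample in the qmaster messages format (lines taken from the
# excerpt quoted later in this thread), then keep only E-level entries.
cat > /tmp/messages.sample <<'EOF'
05/19/2005 01:02:37|qmaster|rupc-cs04b|I|starting up 6.0u3
05/19/2005 01:08:11|qmaster|rupc-cs04b|E|commlib error: got read error (closing connection)
05/19/2005 01:24:31|qmaster|rupc-cs04b|W|job 21171.1 failed on host sub04n203 assumedly after job because: job 21171.1 died through signal TERM (15)
05/19/2005 05:17:19|qmaster|rupc-cs04b|E|acknowledge timeout after 600 seconds for event client (schedd:1) on host "rupc-cs04b"
EOF
# field 4 is the severity level; print timestamp and message for errors
awk -F'|' '$4 == "E" { print $1 ": " $5 }' /tmp/messages.sample
```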

Viktor Oudovenko wrote:

>Hi, Stephan,
>
>Thank you for the answer.
>When will u4 be issued, and where can I read about issue 1416?
>
>Meanwhile I tried many things, but nothing has helped so far.
>Why does my scheduler reregister so often? Because after it dies I restart it
>manually, simply issuing the command:
>$SGE_ROOT/bin/lx..../sge_schedd
>Then the information about reregistering appears.
>
>Thank you very much for your help.
>v
>
>  
>
>>-----Original Message-----
>>From: Stephan Grell - Sun Germany - SSG - Software Engineer 
>>[mailto:stephan.grell at sun.com] 
>>Sent: Monday, May 23, 2005 3:45
>>To: users at gridengine.sunsource.net
>>Subject: Re: [GE users] "The Scheduler dies" COMPLETE information
>>
>>
>>Hi Viktor,
>>
>>you are encountering issue 1416, which is fixed in u4.
>>However, the important question is why your scheduler is
>>reregistering so often.
>>
>>Stephan
>>
>>Viktor Oudovenko wrote:
>>
>>    
>>
>>>Hi, Stephan and anybody who can help!
>>>
>>>Could you have a look at the attachment to see what is going on with my
>>>scheduler. What I did: I just ran, as you advised, the scheduler demon in dl 1
>>>mode and waited until it crashed.
>>>And it did. It dies even without any events. I mean, you will find two lines
>>>in the messages file where the scheduler died without any reason. But the
>>>last crash happened because one of the myrinet jobs finished.
>>>Could you give any hint what it could be and what could be done?
>>>I am running Linux SuSE 8.2 on the server and 9.0 and 9.2 on the slaves.
>>>I also have a few opterons (8 machines). I am happy to provide any further
>>>information if necessary.
>>>Please help.
>>>
>>>With kind regards,
>>>Viktor
>>>P.S. In the attachment I put not only the last iteration but a couple
>>>of successful ones. Actually, in debug mode the scheduler updates
>>>information like every 5-10 seconds or so.
>>>
>>>>-----Original Message-----
>>>>From: Stephan Grell - Sun Germany - SSG - Software Engineer
>>>>[mailto:stephan.grell at sun.com] 
>>>>Sent: Friday, May 20, 2005 3:05
>>>>To: users at gridengine.sunsource.net
>>>>Subject: Re: [GE users] Scheduler dies like a hell
>>>>
>>>>
>>>>Hi,
>>>>
>>>>I am not sure, that a currupted file is the problem. The
>>>>qmaster does some validation during the startup. Could you 
>>>>run the scheduler in debug mode and post the output just 
>>>>before it dies?
>>>>
>>>>You can set the debug mode with:
>>>>
>>>>source $SGE_ROOT/<CELL>/common/settings.csh
>>>>source $SGE_ROOT/util/dl.csh
>>>>dl 1
>>>>
>>>>bin/<arch>/sge_schedd
>>>>
>>>>Or, do you have a stack trace of the scheduler?
>>>>
>>>>Which version are you running on which arch?
>>>>
>>>>Thanks,
>>>>Stephan
>>>>
>>>>Viktor Oudovenko wrote:
>>>>
>>>>>Ron,
>>>>>
>>>>>Can I try to cat part of the accounting file? I mean to EDIT it MANUALLY,
>>>>>despite the warning not to do it? Best regards,
>>>>>v
>>>>>
>>>>>>-----Original Message-----
>>>>>>From: Ron Chen [mailto:ron_chen_123 at yahoo.com]
>>>>>>Sent: Thursday, May 19, 2005 22:02
>>>>>>To: users at gridengine.sunsource.net
>>>>>>Subject: RE: [GE users] Scheduler dies like a hell
>>>>>>
>>>>>>
>>>>>>It is not easy to find out which file gets corrupted
>>>>>>:(
>>>>>>
>>>>>>One thing you can try is to move spooled job files (in
>>>>>>default/spool/qmaster/jobs) to a backup directory.
>>>>>>Also, you can use qconf to dump the configuration for
>>>>>>the queues/users/hosts, and see if the values "make
>>>>>>sense".
>>>>>>
>>>>>>Of course the best way to fix this is to restore from backup!
>>>>>>
>>>>>>-Ron
>>>>>>
>>>>>>
>>>>>>--- Viktor Oudovenko <udo at physics.rutgers.edu> wrote:
>>>>>>
>>>>>>>Hi, Ron,
>>>>>>>
>>>>>>>I am using classic spooling.
>>>>>>>Which file should I look for corruption? Can I edit
>>>>>>>it manually?
>>>>>>>Thank you very much in advance.
>>>>>>>v
>>>>>>>
>>>>>>>>-----Original Message-----
>>>>>>>>From: Ron Chen [mailto:ron_chen_123 at yahoo.com]
>>>>>>>>Sent: Thursday, May 19, 2005 20:38
>>>>>>>>To: users at gridengine.sunsource.net
>>>>>>>>Subject: RE: [GE users] Scheduler dies like a hell
>>>>>>>>
>>>>>>>>
>>>>>>>>Are you using classic spooling or Berkeley DB
>>>>>>>>spooling?
>>>>>>>>
>>>>>>>>With classic spooling, when the machine crashes, the
>>>>>>>>files may get corrupted. And when qmaster reads in the
>>>>>>>>corrupted files, it may also corrupt the qmaster's
>>>>>>>>data structures.
>>>>>>>>
>>>>>>>>IIRC, Berkeley DB handles recovery itself, but I have
>>>>>>>>never played with it myself :)
>>>>>>>>
>>>>>>>>-Ron
>>>>>>>>
>>>>>>>>
>>>>>>>>--- Viktor Oudovenko <udo at physics.rutgers.edu> wrote:
>>>>>>>>
>>>>>>>>>Hi, Mac,
>>>>>>>>>Thank you very much for your advice!
>>>>>>>>>I'll try. I think one of the running or finished jobs
>>>>>>>>>wrote a bad record somewhere
>>>>>>>>>(like the jobs directory).
>>>>>>>>>Best regards,
>>>>>>>>>v
>>>>>>>>>
>>>>>>>>>>-----Original Message-----
>>>>>>>>>>From: McCalla, Mac [mailto:macmccalla at hess.com]
>>>>>>>>>>Sent: Thursday, May 19, 2005 15:12
>>>>>>>>>>To: users at gridengine.sunsource.net
>>>>>>>>>>Subject: RE: [GE users] Scheduler dies like a hell
>>>>>>>>>>
>>>>>>>>>>Hi,
>>>>>>>>>>
>>>>>>>>>>Some things to look at: any messages in
>>>>>>>>>>$SGE_ROOT/......../qmaster/schedd/messages ? To get more
>>>>>>>>>>info about what the scheduler is doing while it is running, see
>>>>>>>>>>info about the scheduler params profile and monitor;
>>>>>>>>>>you can set them equal to 1 to turn on
>>>>>>>>>>some scheduler diagnostics, see man sched_conf.
>>>>>>>>>>To extend the timeout value for the scheduler you can set
>>>>>>>>>>qmaster_params SCHEDULER_TIMEOUT to some value greater than
>>>>>>>>>>600 (seconds).
>>>>>>>>>>You can also use the system command strace to get a trace of
>>>>>>>>>>scheduler activity while it is running, to perhaps get a
>>>>>>>>>>better idea of what it is spending its time doing.
>>>>>>>>>>
>>>>>>>>>>Hope this helps,
>>>>>>>>>>
>>>>>>>>>>mac mccalla
>>>>>>>>>>
>>>>>>>>>>-----Original Message-----
>>>>>>>>>>From: Viktor Oudovenko [mailto:udo at physics.rutgers.edu]
>>>>>>>>>>Sent: Thursday, May 19, 2005 12:00 PM
>>>>>>>>>>To: users at gridengine.sunsource.net
>>>>>>>>>>Subject: [GE users] Scheduler dies like a hell
>>>>>>>>>>
>>>>>>>>>>Hi, everybody,
>>>>>>>>>>
>>>>>>>>>>I am asking for your help and ideas about what could be done
>>>>>>>>>>to restore normal operation of the scheduler. First, what
>>>>>>>>>>happened: a few times during the last week our main server died and I
>>>>>>>>>>needed to reboot it and even replace it. But jobs which used
>>>>>>>>>>automount proceeded to run. But since yesterday or the day before,
>>>>>>>>>>the scheduler demon dies. I tried to restart sge_master but it
>>>>>>>>>>did not help. Now when the demon dies I start it
>>>>>>>>>>manually, simply typing:
>>>>>>>>>>
>>>>>>>>>>/opt/SGE/bin/lx24-x86/sge_schedd
>>>>>>>>>>
>>>>>>>>>>but after some time it dies again. Please advise:
>>>>>>>>>>what could it be?
>>>>>>>>>>
>>>>>>>>>>Below please find some info from the messages file:
>>>>>>>>>>
>>>>>>>>>>05/19/2005 01:02:37|qmaster|rupc-cs04b|E|no execd known on host sub04n87 to send conf notification
>>>>>>>>>>05/19/2005 01:02:37|qmaster|rupc-cs04b|E|no execd known on host sub04n88 to send conf notification
>>>>>>>>>>05/19/2005 01:02:37|qmaster|rupc-cs04b|E|no execd known on host sub04n89 to send conf notification
>>>>>>>>>>05/19/2005 01:02:37|qmaster|rupc-cs04b|E|no execd known on host sub04n90 to send conf notification
>>>>>>>>>>05/19/2005 01:02:37|qmaster|rupc-cs04b|E|no execd known on host sub04n91 to send conf notification
>>>>>>>>>>05/19/2005 01:02:37|qmaster|rupc-cs04b|E|no execd known on host rupc04.rutgers.edu to send conf notification
>>>>>>>>>>05/19/2005 01:02:37|qmaster|rupc-cs04b|I|starting up 6.0u3
>>>>>>>>>>05/19/2005 01:08:11|qmaster|rupc-cs04b|E|commlib error: got read error (closing connection)
>>>>>>>>>>05/19/2005 01:11:06|qmaster|rupc-cs04b|E|event client "scheduler" (rupc-cs04b/schedd/1) reregistered - it will need a total update
>>>>>>>>>>05/19/2005 01:24:31|qmaster|rupc-cs04b|W|job 21171.1 failed on host sub04n203 assumedly after job because: job 21171.1 died through signal TERM (15)
>>>>>>>>>>05/19/2005 05:17:19|qmaster|rupc-cs04b|E|acknowledge timeout after 600 seconds for event client (schedd:1) on host "rupc-cs04b"
>>>>>>>>>>05/19/2005 09:29:03|qmaster|rupc-cs04b|W|job 21060.1 failed on host sub04n74 assumedly after job because: job 21060.1 died through signal TERM (15)
>>>>>>>>>>05/19/2005 09:30:37|qmaster|rupc-cs04b|E|event client "scheduler" (rupc-cs04b/schedd/1) reregistered - it will need a total update
>>>>>>>>>>05/19/2005 11:04:21|qmaster|rupc-cs04b|W|job 20222.1 failed on host sub04n29 assumedly after job because: job 20222.1 died through signal KILL (9)
>>>>>>>>>>05/19/2005 11:05:50|qmaster|rupc-cs04b|W|job 21212.1 failed on host sub04n25 assumedly after job because: job 21212.1 died through signal KILL (9)
>>>>>>>>>>05/19/2005 12:04:51|qmaster|rupc-cs04b|E|acknowledge timeout after 600 seconds for event client (schedd:1) on host "rupc-cs04b"
>>>>>>=== message truncated ===
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>---------------------------------------------------------------------
>>>>>>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>>>For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>>>>
>>>
>>>---------------------------------------------------------------------
>>>
>>>128133  25368 16384     SENDING 22 ORDERS TO QMASTER
>>>128134  25368 16384     RESETTING BUSY STATE OF EVENT CLIENT
>>>128135  25368 16384     reresolve port timeout in 340
>>>128136  25368 16384     returning cached port value: 536
>>>--------------STOP-SCHEDULER-RUN-------------
>>>128137  25368 16384     ec_get retrieving events - will do max 20 fetches
>>>128138  25368 16384     doing sync fetch for messages, 20 still to do
>>>128139  25368 16384     try to get request from qmaster, id 1
>>>128140  25368 16384     Checking 55 events (44303-44357) while waiting for #44303
>>>128141  25368 16384     check complete, 55 events in list
>>>128142  25368 16384     got 55 events till 44357
>>>128143  25368 16384     doing async fetch for messages, 19 still to do
>>>128144  25368 16384     try to get request from qmaster, id 1
>>>128145  25368 16384     reresolve port timeout in 320
>>>128146  25368 16384     returning cached port value: 536
>>>128147  25368 16384     Sent ack for all events lower or equal 44357
>>>128148  25368 16384     ec_get - received 55 events
>>>128149  25368 16384     44303. EVENT MOD EXECHOST sub04n147
>>>128150  25368 16384     44304. EVENT MOD USER udo
>>>128151  25368 16384     44305. EVENT MOD USER iber
>>>128152  25368 16384     44306. EVENT MOD USER dieguez
>>>128153  25368 16384     44307. EVENT MOD USER karenjoh
>>>128154  25368 16384     44308. EVENT MOD USER lorenzo
>>>128155  25368 16384     44309. EVENT MOD USER parcolle
>>>128156  25368 16384     44310. EVENT MOD USER cfennie
>>>128157  25368 16384     44311. EVENT MOD USER civelli
>>>128158  25368 16384     44312. EVENT MOD EXECHOST sub04n135
>>>128159  25368 16384     44313. EVENT MOD EXECHOST sub04n141
>>>128160  25368 16384     44314. EVENT MOD EXECHOST sub04n127
>>>128161  25368 16384     44315. EVENT MOD EXECHOST sub04n145
>>>128162  25368 16384     44316. EVENT MOD EXECHOST sub04n133
>>>128163  25368 16384     44317. EVENT MOD EXECHOST sub04n148
>>>128164  25368 16384     44318. EVENT MOD EXECHOST sub04n74
>>>128165  25368 16384     44319. EVENT JOB 21542.1 task 2.sub04n74 USAGE
>>>128166  25368 16384     44320. EVENT JOB 21542.1 task 1.sub04n74 USAGE
>>>128167  25368 16384     44321. EVENT MOD EXECHOST rupc03.rutgers.edu
>>>128168  25368 16384     44322. EVENT MOD EXECHOST sub04n139
>>>128169  25368 16384     44323. EVENT MOD EXECHOST rupc02.rutgers.edu
>>>128170  25368 16384     44324. EVENT MOD EXECHOST sub04n80
>>>128171  25368 16384     44325. EVENT MOD EXECHOST sub04n207
>>>128172  25368 16384     44326. EVENT MOD EXECHOST sub04n180
>>>128173  25368 16384     44327. EVENT MOD EXECHOST sub04n23
>>>128174  25368 16384     44328. EVENT MOD EXECHOST sub04n30
>>>128175  25368 16384     44329. EVENT MOD EXECHOST sub04n203
>>>128176  25368 16384     44330. EVENT MOD EXECHOST sub04n109
>>>128177  25368 16384     44331. EVENT MOD EXECHOST rupc04.rutgers.edu
>>>128178  25368 16384     44332. EVENT MOD EXECHOST sub04n114
>>>128179  25368 16384     44333. EVENT MOD EXECHOST sub04n106
>>>128180  25368 16384     44334. EVENT MOD EXECHOST sub04n88
>>>128181  25368 16384     44335. EVENT JOB 21507.1 task 6.sub04n88 USAGE
>>>128182  25368 16384     44336. EVENT JOB 21507.1 task 5.sub04n88 USAGE
>>>128183  25368 16384     44337. EVENT MOD EXECHOST sub04n157
>>>128184  25368 16384     44338. EVENT MOD EXECHOST sub04n20
>>>128185  25368 16384     44339. EVENT MOD EXECHOST sub04n156
>>>128186  25368 16384     44340. EVENT MOD EXECHOST sub04n26
>>>128187  25368 16384     44341. EVENT JOB 21213.1 USAGE
>>>128188  25368 16384     44342. EVENT MOD EXECHOST sub04n05
>>>128189  25368 16384     44343. EVENT MOD EXECHOST sub04n103
>>>128190  25368 16384     44344. EVENT MOD EXECHOST sub04n164
>>>128191  25368 16384     44345. EVENT MOD EXECHOST sub04n09
>>>128192  25368 16384     44346. EVENT MOD EXECHOST sub04n105
>>>128193  25368 16384     44347. EVENT MOD EXECHOST sub04n113
>>>128194  25368 16384     44348. EVENT MOD EXECHOST sub04n28
>>>128195  25368 16384     44349. EVENT MOD EXECHOST sub04n76
>>>128196  25368 16384     44350. EVENT MOD EXECHOST sub04n162
>>>128197  25368 16384     44351. EVENT MOD EXECHOST sub04n108
>>>128198  25368 16384     44352. EVENT MOD EXECHOST sub04n38
>>>128199  25368 16384     44353. EVENT MOD EXECHOST sub04n04
>>>128200  25368 16384     44354. EVENT MOD EXECHOST sub04n116
>>>128201  25368 16384     44355. EVENT MOD EXECHOST sub04n179
>>>128202  25368 16384     44356. EVENT MOD EXECHOST sub04n160
>>>128203  25368 16384     44357. EVENT MOD EXECHOST sub04n107
>>>Q:169, AQ:343 J:19(19), H:169(170), C:49, A:4, D:3, P:7, CKPT:0 US:15 PR:4 S:nd:12/lf:7
>>>128204  25368 16384     ================[SCHEDULING-EPOCH]==================
>>>128205  25368 16384     JOB 20937.1 start_time = 1116447112 
>>>      
>>>
>>running_time 338079 decay_time = 450
>>    
>>
>>>128206  25368 16384     JOB 20938.1 start_time = 1116374344 
>>>      
>>>
>>running_time 410847 decay_time = 450
>>    
>>
>>>128207  25368 16384     JOB 21040.1 start_time = 1116443073 
>>>      
>>>
>>running_time 342118 decay_time = 450
>>>128208  25368 16384     JOB 21076.1 start_time = 1116451351 running_time 333840 decay_time = 450
>>>128209  25368 16384     JOB 21210.1 start_time = 1116514970 running_time 270221 decay_time = 450
>>>128210  25368 16384     JOB 21213.1 start_time = 1116515250 running_time 269941 decay_time = 450
>>>128211  25368 16384     JOB 21338.1 start_time = 1116543252 running_time 241939 decay_time = 450
>>>128212  25368 16384     JOB 21423.1 start_time = 1116629274 running_time 155917 decay_time = 450
>>>128213  25368 16384     JOB 21424.1 start_time = 1116631365 running_time 153826 decay_time = 450
>>>128214  25368 16384     JOB 21440.1 start_time = 1116632934 running_time 152257 decay_time = 450
>>>128215  25368 16384     JOB 21441.1 start_time = 1116632994 running_time 152197 decay_time = 450
>>>128216  25368 16384     JOB 21443.1 start_time = 1116633602 running_time 151589 decay_time = 450
>>>128217  25368 16384     JOB 21474.1 start_time = 1116655118 running_time 130073 decay_time = 450
>>>128218  25368 16384     JOB 21503.1 start_time = 1116707395 running_time 77796 decay_time = 450
>>>128219  25368 16384     JOB 21507.1 start_time = 1116714061 running_time 71130 decay_time = 450
>>>128220  25368 16384     JOB 21528.1 start_time = 1116707641 running_time 77550 decay_time = 450
>>>128221  25368 16384     JOB 21530.1 start_time = 1116714453 running_time 70738 decay_time = 450
>>>128222  25368 16384     JOB 21537.1 start_time = 1116724845 running_time 60346 decay_time = 450
>>>128223  25368 16384     JOB 21542.1 start_time = 1116782511 running_time 2680 decay_time = 450
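An aside on reading these lines: start_time is a Unix timestamp and running_time is elapsed seconds, so start_time + running_time should give the same snapshot time for every job within one scheduling epoch. A quick consistency check over three of the values above (a minimal sketch, not part of the original log):

```python
# Consistency check: every (start_time, running_time) pair taken from one
# scheduling epoch of the log should sum to the same snapshot timestamp.
entries = [
    (1116451351, 333840),  # JOB 21076.1
    (1116514970, 270221),  # JOB 21210.1
    (1116782511, 2680),    # JOB 21542.1
]
snapshots = {start + running for start, running in entries}
print(snapshots)  # {1116785191} -- a single snapshot time, as expected
```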
>>>128224  25368 16384     verified threshold of 169 queues
>>>128225  25368 16384     queue myrinet@sub04n61 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128226  25368 16384     queue myrinet@sub04n62 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128227  25368 16384     queue myrinet@sub04n65 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128228  25368 16384     queue myrinet@sub04n66 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128229  25368 16384     queue myrinet@sub04n67 tagged to be overloaded: load_avg=2.020000 (no load adjustment) >= 1.4
>>>128230  25368 16384     queue myrinet@sub04n68 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128231  25368 16384     queue myrinet@sub04n69 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128232  25368 16384     queue myrinet@sub04n70 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128233  25368 16384     queue myrinet@sub04n71 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128234  25368 16384     queue myrinet@sub04n72 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128235  25368 16384     queue myrinet@sub04n75 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>>>128236  25368 16384     queue myrinet@sub04n77 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128237  25368 16384     queue myrinet@sub04n78 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128238  25368 16384     queue myrinet@sub04n79 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128239  25368 16384     queue myrinet@sub04n81 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128240  25368 16384     queue myrinet@sub04n84 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128241  25368 16384     queue myrinet@sub04n85 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128242  25368 16384     queue myrinet@sub04n86 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128243  25368 16384     queue myrinet@sub04n87 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>>>128244  25368 16384     queue myrinet@sub04n88 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128245  25368 16384     queue myrinet@sub04n89 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128246  25368 16384     queue myrinet@sub04n90 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128247  25368 16384     queue myrinet@sub04n91 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>>>128248  25368 16384     queue myrinet@sub04n63 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>>>128249  25368 16384     queue myrinet@sub04n64 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128250  25368 16384     queue myrinet@sub04n73 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128251  25368 16384     queue myrinet@sub04n74 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>>>128252  25368 16384     queue opteronp@sub04n202 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
>>>128253  25368 16384     queue opteronp@sub04n205 tagged to be overloaded: load_medium=1.010000 (no load adjustment) >= 1.0
>>>128254  25368 16384     queue opteronp@sub04n206 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
>>>128255  25368 16384     queue opteronp@sub04n208 tagged to be overloaded: load_medium=1.010000 (no load adjustment) >= 1.0
>>>128256  25368 16384     queue parallel@sub04n121 tagged to be overloaded: load_avg=2.020000 (no load adjustment) >= 1.4
>>>128257  25368 16384     queue parallel@sub04n139 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128258  25368 16384     queue parallel@sub04n140 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128259  25368 16384     queue parallel@sub04n141 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128260  25368 16384     queue parallel@sub04n142 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128261  25368 16384     queue parallel@sub04n143 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128262  25368 16384     queue parallel@sub04n144 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128263  25368 16384     queue parallel@sub04n146 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128264  25368 16384     queue parallel@sub04n02 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128265  25368 16384     queue parallel@sub04n03 tagged to be overloaded: load_avg=2.020000 (no load adjustment) >= 1.4
>>>128266  25368 16384     queue parallel@sub04n04 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128267  25368 16384     queue parallel@sub04n05 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128268  25368 16384     queue parallel@sub04n06 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128269  25368 16384     queue parallel@sub04n07 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128270  25368 16384     queue parallel@sub04n08 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128271  25368 16384     queue parallel@sub04n09 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128272  25368 16384     queue parallel@sub04n10 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128273  25368 16384     queue parallel@sub04n11 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128274  25368 16384     verified threshold of 169 queues
>>>128275  25368 16384     STARTING PASS 1 WITH 0 PENDING JOBS
>>>128276  25368 16384     Not enrolled ja_tasks: 0
>>>128277  25368 16384     Enrolled ja_tasks: 1
>>>128278  25368 16384     Not enrolled ja_tasks: 0
>>>128279  25368 16384     Enrolled ja_tasks: 1
>>>128280  25368 16384     Not enrolled ja_tasks: 0
>>>128281  25368 16384     Enrolled ja_tasks: 1
>>>128282  25368 16384     Not enrolled ja_tasks: 0
>>>128283  25368 16384     Enrolled ja_tasks: 1
>>>128284  25368 16384     Not enrolled ja_tasks: 0
>>>128285  25368 16384     Enrolled ja_tasks: 1
>>>128286  25368 16384     Not enrolled ja_tasks: 0
>>>128287  25368 16384     Enrolled ja_tasks: 1
>>>128288  25368 16384     Not enrolled ja_tasks: 0
>>>128289  25368 16384     Enrolled ja_tasks: 1
>>>128290  25368 16384     Not enrolled ja_tasks: 0
>>>128291  25368 16384     Enrolled ja_tasks: 1
>>>128292  25368 16384     Not enrolled ja_tasks: 0
>>>128293  25368 16384     Enrolled ja_tasks: 1
>>>128294  25368 16384     Not enrolled ja_tasks: 0
>>>128295  25368 16384     Enrolled ja_tasks: 1
>>>128296  25368 16384     Not enrolled ja_tasks: 0
>>>128297  25368 16384     Enrolled ja_tasks: 1
>>>128298  25368 16384     Not enrolled ja_tasks: 0
>>>128299  25368 16384     Enrolled ja_tasks: 1
>>>128300  25368 16384     Not enrolled ja_tasks: 0
>>>128301  25368 16384     Enrolled ja_tasks: 1
>>>128302  25368 16384     Not enrolled ja_tasks: 0
>>>128303  25368 16384     Enrolled ja_tasks: 1
>>>128304  25368 16384     Not enrolled ja_tasks: 0
>>>128305  25368 16384     Enrolled ja_tasks: 1
>>>128306  25368 16384     Not enrolled ja_tasks: 0
>>>128307  25368 16384     Enrolled ja_tasks: 1
>>>128308  25368 16384     Not enrolled ja_tasks: 0
>>>128309  25368 16384     Enrolled ja_tasks: 1
>>>128310  25368 16384     Not enrolled ja_tasks: 0
>>>128311  25368 16384     Enrolled ja_tasks: 1
>>>128312  25368 16384     Not enrolled ja_tasks: 0
>>>128313  25368 16384     Enrolled ja_tasks: 1
>>>128314  25368 16384     STARTING PASS 2 WITH 0 PENDING JOBS
>>>128315  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>>>128316  25368 16384        slots: 1.000000 * 1000.000000 * 6    ---> 6000.000000
>>>128317  25368 16384     slot request assumed for static urgency is 20 for ,20-64 PE range due to PE's "mpi" setting "min"
>>>128318  25368 16384        slots: 1.000000 * 1000.000000 * 20    ---> 20000.000000
>>>128319  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>>>128320  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>>>128321  25368 16384        slots: 1.000000 * 1000.000000 * 6    ---> 6000.000000
>>>128322  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>>>128323  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>>>128324  25368 16384     slot request assumed for static urgency is 2 for ,2-8 PE range due to PE's "mpich_myri" setting "min"
>>>128325  25368 16384        slots: 1.000000 * 1000.000000 * 2    ---> 2000.000000
>>>128326  25368 16384        slots: 1.000000 * 1000.000000 * 8    ---> 8000.000000
>>>128327  25368 16384     ASU min = 1000.00000000000, ASU max = 20000.00000000000
>>>128328  25368 16384     
>>>128329  25368 16384     no DDJU: do_usage: 1 finished_jobs 0
>>>128330  25368 16384     
>>>128331  25368 16384     =====================[Pass 0]======================
>>>128332  25368 16384     =====================[Pass 1]======================
>>>128333  25368 16384     =====================[Pass 2]======================
>>>128334  25368 16384     
>>>128335  25368 16384     no DDJU: do_usage: 0 finished_jobs 0
>>>128336  25368 16384     
>>>128337  25368 16384     =====================[Pass 0]======================
>>>128338  25368 16384     =====================[Pass 1]======================
>>>128339  25368 16384     =====================[Pass 2]======================
>>>128340  25368 16384     Normalizing tickets using 0.000000/18.333333 as min_tix/max_tix
>>>128341  25368 16384        got 19 running jobs
>>>128342  25368 16384        added 19 ticket orders for running jobs
>>>128343  25368 16384        added 1 orders for updating usage of user
>>>128344  25368 16384        added 0 orders for updating usage of project
>>>128345  25368 16384        added 0 orders for updating share tree
>>>128346  25368 16384        added 1 orders for scheduler configuration
>>>128347  25368 16384     SENDING 22 ORDERS TO QMASTER
>>>128348  25368 16384     RESETTING BUSY STATE OF EVENT CLIENT
>>>128349  25368 16384     reresolve port timeout in 320
>>>128350  25368 16384     returning cached port value: 536
>>>--------------STOP-SCHEDULER-RUN-------------
>>>128351  25368 16384     ec_get retrieving events - will do max 20 fetches
>>>128352  25368 16384     doing sync fetch for messages, 20 still to do
>>>128353  25368 16384     try to get request from qmaster, id 1
>>>128354  25368 16384     Checking 120 events (44358-44477) while waiting for #44358
>>>128355  25368 16384     check complete, 120 events in list
>>>128356  25368 16384     got 120 events till 44477
>>>128357  25368 16384     doing async fetch for messages, 19 still to do
>>>128358  25368 16384     try to get request from qmaster, id 1
>>>128359  25368 16384     reresolve port timeout in 300
>>>128360  25368 16384     returning cached port value: 536
>>>128361  25368 16384     Sent ack for all events lower or equal 44477
>>>128362  25368 16384     ec_get - received 120 events
>>>128363  25368 16384     44358. EVENT MOD EXECHOST sub04n166
>>>128364  25368 16384     44359. EVENT MOD EXECHOST sub04n90
>>>128365  25368 16384     44360. EVENT JOB 21503.1 task 2.sub04n90 USAGE
>>>128366  25368 16384     44361. EVENT JOB 21503.1 task 1.sub04n90 USAGE
>>>128367  25368 16384     44362. EVENT MOD EXECHOST sub04n168
>>>128368  25368 16384     44363. EVENT MOD EXECHOST sub04n112
>>>128369  25368 16384     44364. EVENT MOD EXECHOST sub04n08
>>>128370  25368 16384     44365. EVENT MOD EXECHOST sub04n75
>>>128371  25368 16384     44366. EVENT JOB 21040.1 task 6.sub04n75 USAGE
>>>128372  25368 16384     44367. EVENT JOB 21040.1 task 5.sub04n75 USAGE
>>>128373  25368 16384     44368. EVENT MOD USER udo
>>>128374  25368 16384     44369. EVENT MOD USER iber
>>>128375  25368 16384     44370. EVENT MOD USER dieguez
>>>128376  25368 16384     44371. EVENT MOD USER karenjoh
>>>128377  25368 16384     44372. EVENT MOD USER lorenzo
>>>128378  25368 16384     44373. EVENT MOD USER parcolle
>>>128379  25368 16384     44374. EVENT MOD USER cfennie
>>>128380  25368 16384     44375. EVENT MOD USER civelli
>>>128381  25368 16384     44376. EVENT MOD EXECHOST sub04n14
>>>128382  25368 16384     44377. EVENT MOD EXECHOST sub04n150
>>>128383  25368 16384     44378. EVENT MOD EXECHOST sub04n169
>>>128384  25368 16384     44379. EVENT MOD EXECHOST sub04n165
>>>128385  25368 16384     44380. EVENT MOD EXECHOST sub04n136
>>>128386  25368 16384     44381. EVENT MOD EXECHOST sub04n81
>>>128387  25368 16384     44382. EVENT JOB 21507.1 task 6.sub04n81 USAGE
>>>128388  25368 16384     44383. EVENT JOB 21507.1 task 5.sub04n81 USAGE
>>>128389  25368 16384     44384. EVENT MOD EXECHOST sub04n176
>>>128390  25368 16384     44385. EVENT MOD EXECHOST sub04n161
>>>128391  25368 16384     44386. EVENT MOD EXECHOST sub04n124
>>>128392  25368 16384     44387. EVENT MOD EXECHOST sub04n01
>>>128393  25368 16384     44388. EVENT MOD EXECHOST sub04n158
>>>128394  25368 16384     44389. EVENT MOD EXECHOST sub04n159
>>>128395  25368 16384     44390. EVENT MOD EXECHOST sub04n134
>>>128396  25368 16384     44391. EVENT MOD EXECHOST sub04n143
>>>128397  25368 16384     44392. EVENT MOD EXECHOST sub04n121
>>>128398  25368 16384     44393. EVENT MOD EXECHOST sub04n15
>>>128399  25368 16384     44394. EVENT MOD EXECHOST sub04n13
>>>128400  25368 16384     44395. EVENT MOD EXECHOST sub04n118
>>>128401  25368 16384     44396. EVENT MOD EXECHOST sub04n64
>>>128402  25368 16384     44397. EVENT JOB 21542.1 task 2.sub04n64 USAGE
>>>128403  25368 16384     44398. EVENT JOB 21542.1 task 1.sub04n64 USAGE
>>>128404  25368 16384     44399. EVENT MOD EXECHOST sub04n151
>>>128405  25368 16384     44400. EVENT MOD EXECHOST sub04n154
>>>128406  25368 16384     44401. EVENT MOD EXECHOST sub04n149
>>>128407  25368 16384     44402. EVENT MOD EXECHOST sub04n16
>>>128408  25368 16384     44403. EVENT MOD EXECHOST sub04n155
>>>128409  25368 16384     44404. EVENT MOD EXECHOST sub04n152
>>>128410  25368 16384     44405. EVENT MOD EXECHOST sub04n163
>>>128411  25368 16384     44406. EVENT MOD EXECHOST sub04n86
>>>128412  25368 16384     44407. EVENT JOB 21423.1 task 2.sub04n86 USAGE
>>>128413  25368 16384     44408. EVENT JOB 21423.1 task 1.sub04n86 USAGE
>>>128414  25368 16384     44409. EVENT MOD EXECHOST sub04n43
>>>128415  25368 16384     44410. EVENT MOD EXECHOST sub04n204
>>>128416  25368 16384     44411. EVENT MOD EXECHOST rupc01.rutgers.edu
>>>128417  25368 16384     44412. EVENT MOD EXECHOST sub04n125
>>>128418  25368 16384     44413. EVENT MOD EXECHOST sub04n03
>>>128419  25368 16384     44414. EVENT JOB 21076.1 USAGE
>>>128420  25368 16384     44415. EVENT MOD EXECHOST sub04n44
>>>128421  25368 16384     44416. EVENT MOD EXECHOST sub04n32
>>>128422  25368 16384     44417. EVENT MOD EXECHOST sub04n21
>>>128423  25368 16384     44418. EVENT MOD EXECHOST sub04n22
>>>128424  25368 16384     44419. EVENT MOD EXECHOST sub04n35
>>>128425  25368 16384     44420. EVENT MOD EXECHOST sub04n201
>>>128426  25368 16384     44421. EVENT MOD EXECHOST sub04n146
>>>128427  25368 16384     44422. EVENT MOD EXECHOST sub04n111
>>>128428  25368 16384     44423. EVENT MOD EXECHOST sub04n177
>>>128429  25368 16384     44424. EVENT MOD EXECHOST sub04n89
>>>128430  25368 16384     44425. EVENT JOB 21530.1 task 2.sub04n89 USAGE
>>>128431  25368 16384     44426. EVENT JOB 21530.1 task 1.sub04n89 USAGE
>>>128432  25368 16384     44427. EVENT JOB 21530.1 USAGE
>>>128433  25368 16384     44428. EVENT MOD EXECHOST sub04n205
>>>128434  25368 16384     44429. EVENT JOB 21440.1 USAGE
>>>128435  25368 16384     44430. EVENT MOD EXECHOST sub04n208
>>>128436  25368 16384     44431. EVENT JOB 21528.1 USAGE
>>>128437  25368 16384     44432. EVENT MOD EXECHOST sub04n104
>>>128438  25368 16384     44433. EVENT MOD EXECHOST sub04n24
>>>128439  25368 16384     44434. EVENT JOB 21210.1 USAGE
>>>128440  25368 16384     44435. EVENT MOD EXECHOST sub04n18
>>>128441  25368 16384     44436. EVENT MOD EXECHOST sub04n31
>>>128442  25368 16384     44437. EVENT JOB 20937.1 USAGE
>>>128443  25368 16384     44438. EVENT MOD EXECHOST sub04n202
>>>128444  25368 16384     44439. EVENT JOB 21443.1 USAGE
>>>128445  25368 16384     44440. EVENT MOD EXECHOST sub04n171
>>>128446  25368 16384     44441. EVENT MOD EXECHOST sub04n37
>>>128447  25368 16384     44442. EVENT MOD EXECHOST sub04n36
>>>128448  25368 16384     44443. EVENT MOD EXECHOST sub04n40
>>>128449  25368 16384     44444. EVENT MOD EXECHOST sub04n12
>>>128450  25368 16384     44445. EVENT MOD EXECHOST sub04n172
>>>128451  25368 16384     44446. EVENT MOD EXECHOST sub04n79
>>>128452  25368 16384     44447. EVENT JOB 21040.1 task 6.sub04n79 USAGE
>>>128453  25368 16384     44448. EVENT JOB 21040.1 task 5.sub04n79 USAGE
>>>128454  25368 16384     44449. EVENT JOB 21040.1 USAGE
>>>128455  25368 16384     44450. EVENT MOD EXECHOST sub04n61
>>>128456  25368 16384     44451. EVENT JOB 21040.1 task 6.sub04n61 USAGE
>>>128457  25368 16384     44452. EVENT JOB 21040.1 task 5.sub04n61 USAGE
>>>128458  25368 16384     44453. EVENT MOD EXECHOST sub04n170
>>>128459  25368 16384     44454. EVENT MOD EXECHOST sub04n41
>>>128460  25368 16384     44455. EVENT JOB 20938.1 USAGE
>>>128461  25368 16384     44456. EVENT MOD EXECHOST sub04n153
>>>128462  25368 16384     44457. EVENT MOD EXECHOST sub04n39
>>>128463  25368 16384     44458. EVENT MOD EXECHOST sub04n83
>>>128464  25368 16384     44459. EVENT MOD EXECHOST sub04n82
>>>128465  25368 16384     44460. EVENT MOD EXECHOST sub04n174
>>>128466  25368 16384     44461. EVENT MOD EXECHOST sub04n173
>>>128467  25368 16384     44462. EVENT MOD EXECHOST sub04n85
>>>128468  25368 16384     44463. EVENT JOB 21423.1 task 2.sub04n85 USAGE
>>>128469  25368 16384     44464. EVENT JOB 21423.1 task 1.sub04n85 USAGE
>>>128470  25368 16384     44465. EVENT MOD EXECHOST sub04n68
>>>128471  25368 16384     44466. EVENT JOB 21474.1 task 14.sub04n68 USAGE
>>>128472  25368 16384     44467. EVENT JOB 21474.1 task 13.sub04n68 USAGE
>>>128473  25368 16384     44468. EVENT MOD EXECHOST beowulf.rutgers.edu
>>>128474  25368 16384     44469. EVENT MOD EXECHOST sub04n91
>>>128475  25368 16384     44470. EVENT JOB 21423.1 task 2.sub04n91 USAGE
>>>128476  25368 16384     44471. EVENT JOB 21423.1 task 1.sub04n91 USAGE
>>>128477  25368 16384     44472. EVENT JOB 21423.1 USAGE
>>>128478  25368 16384     44473. EVENT MOD EXECHOST sub04n29
>>>128479  25368 16384     44474. EVENT MOD EXECHOST sub04n69
>>>128480  25368 16384     44475. EVENT JOB 21474.1 task 14.sub04n69 USAGE
>>>128481  25368 16384     44476. EVENT JOB 21474.1 task 13.sub04n69 USAGE
>>>128482  25368 16384     44477. EVENT MOD EXECHOST sub04n175
>>>Q:169, AQ:343 J:19(19), H:169(170), C:49, A:4, D:3, P:7, CKPT:0 US:15 PR:4 S:nd:12/lf:7
>>>128483  25368 16384     ================[SCHEDULING-EPOCH]==================
>>>128484  25368 16384     JOB 20937.1 start_time = 1116447112 running_time 338099 decay_time = 450
>>>128485  25368 16384     JOB 20938.1 start_time = 1116374344 running_time 410867 decay_time = 450
>>>128486  25368 16384     JOB 21040.1 start_time = 1116443073 running_time 342138 decay_time = 450
>>>128487  25368 16384     JOB 21076.1 start_time = 1116451351 running_time 333860 decay_time = 450
>>>128488  25368 16384     JOB 21210.1 start_time = 1116514970 running_time 270241 decay_time = 450
>>>128489  25368 16384     JOB 21213.1 start_time = 1116515250 running_time 269961 decay_time = 450
>>>128490  25368 16384     JOB 21338.1 start_time = 1116543252 running_time 241959 decay_time = 450
>>>128491  25368 16384     JOB 21423.1 start_time = 1116629274 running_time 155937 decay_time = 450
>>>128492  25368 16384     JOB 21424.1 start_time = 1116631365 running_time 153846 decay_time = 450
>>>128493  25368 16384     JOB 21440.1 start_time = 1116632934 running_time 152277 decay_time = 450
>>>128494  25368 16384     JOB 21441.1 start_time = 1116632994 running_time 152217 decay_time = 450
>>>128495  25368 16384     JOB 21443.1 start_time = 1116633602 running_time 151609 decay_time = 450
>>>128496  25368 16384     JOB 21474.1 start_time = 1116655118 running_time 130093 decay_time = 450
>>>128497  25368 16384     JOB 21503.1 start_time = 1116707395 running_time 77816 decay_time = 450
>>>128498  25368 16384     JOB 21507.1 start_time = 1116714061 running_time 71150 decay_time = 450
>>>128499  25368 16384     JOB 21528.1 start_time = 1116707641 running_time 77570 decay_time = 450
>>>128500  25368 16384     JOB 21530.1 start_time = 1116714453 running_time 70758 decay_time = 450
>>>128501  25368 16384     JOB 21537.1 start_time = 1116724845 running_time 60366 decay_time = 450
>>>128502  25368 16384     JOB 21542.1 start_time = 1116782511 running_time 2700 decay_time = 450
>>>128503  25368 16384     verified threshold of 169 queues
>>>128504  25368 16384     queue myrinet@sub04n61 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128505  25368 16384     queue myrinet@sub04n62 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128506  25368 16384     queue myrinet@sub04n65 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128507  25368 16384     queue myrinet@sub04n66 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128508  25368 16384     queue myrinet@sub04n67 tagged to be overloaded: load_avg=2.020000 (no load adjustment) >= 1.4
>>>128509  25368 16384     queue myrinet@sub04n68 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128510  25368 16384     queue myrinet@sub04n69 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128511  25368 16384     queue myrinet@sub04n70 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128512  25368 16384     queue myrinet@sub04n71 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128513  25368 16384     queue myrinet@sub04n72 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128514  25368 16384     queue myrinet@sub04n75 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>>>128515  25368 16384     queue myrinet@sub04n77 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128516  25368 16384     queue myrinet@sub04n78 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128517  25368 16384     queue myrinet@sub04n79 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128518  25368 16384     queue myrinet@sub04n81 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128519  25368 16384     queue myrinet@sub04n84 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128520  25368 16384     queue myrinet@sub04n85 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128521  25368 16384     queue myrinet@sub04n86 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128522  25368 16384     queue myrinet@sub04n87 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>>>128523  25368 16384     queue myrinet@sub04n88 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128524  25368 16384     queue myrinet@sub04n89 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128525  25368 16384     queue myrinet@sub04n90 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128526  25368 16384     queue myrinet@sub04n91 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128527  25368 16384     queue myrinet@sub04n63 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>>>128528  25368 16384     queue myrinet@sub04n64 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128529  25368 16384     queue myrinet@sub04n73 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128530  25368 16384     queue myrinet@sub04n74 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>>>128531  25368 16384     queue opteronp@sub04n202 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
>>>128532  25368 16384     queue opteronp@sub04n205 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
>>>128533  25368 16384     queue opteronp@sub04n206 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
>>>128534  25368 16384     queue opteronp@sub04n208 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
>>>128535  25368 16384     queue parallel@sub04n121 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>>>128536  25368 16384     queue parallel@sub04n139 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128537  25368 16384     queue parallel@sub04n140 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128538  25368 16384     queue parallel@sub04n141 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128539  25368 16384     queue parallel@sub04n142 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128540  25368 16384     queue parallel@sub04n143 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128541  25368 16384     queue parallel@sub04n144 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128542  25368 16384     queue parallel@sub04n146 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128543  25368 16384     queue parallel@sub04n02 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128544  25368 16384     queue parallel@sub04n03 tagged to be overloaded: load_avg=2.020000 (no load adjustment) >= 1.4
>>>128545  25368 16384     queue parallel@sub04n04 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128546  25368 16384     queue parallel@sub04n05 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128547  25368 16384     queue parallel@sub04n06 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128548  25368 16384     queue parallel@sub04n07 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128549  25368 16384     queue parallel@sub04n08 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>>>128550  25368 16384     queue parallel@sub04n09 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128551  25368 16384     queue parallel@sub04n10 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128552  25368 16384     queue parallel@sub04n11 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128553  25368 16384     verified threshold of 169 queues
>>>128554  25368 16384     STARTING PASS 1 WITH 0 PENDING JOBS
>>>128555  25368 16384     Not enrolled ja_tasks: 0
>>>128556  25368 16384     Enrolled ja_tasks: 1
>>>128557  25368 16384     Not enrolled ja_tasks: 0
>>>128558  25368 16384     Enrolled ja_tasks: 1
>>>128559  25368 16384     Not enrolled ja_tasks: 0
>>>128560  25368 16384     Enrolled ja_tasks: 1
>>>128561  25368 16384     Not enrolled ja_tasks: 0
>>>128562  25368 16384     Enrolled ja_tasks: 1
>>>128563  25368 16384     Not enrolled ja_tasks: 0
>>>128564  25368 16384     Enrolled ja_tasks: 1
>>>128565  25368 16384     Not enrolled ja_tasks: 0
>>>128566  25368 16384     Enrolled ja_tasks: 1
>>>128567  25368 16384     Not enrolled ja_tasks: 0
>>>128568  25368 16384     Enrolled ja_tasks: 1
>>>128569  25368 16384     Not enrolled ja_tasks: 0
>>>128570  25368 16384     Enrolled ja_tasks: 1
>>>128571  25368 16384     Not enrolled ja_tasks: 0
>>>128572  25368 16384     Enrolled ja_tasks: 1
>>>128573  25368 16384     Not enrolled ja_tasks: 0
>>>128574  25368 16384     Enrolled ja_tasks: 1
>>>128575  25368 16384     Not enrolled ja_tasks: 0
>>>128576  25368 16384     Enrolled ja_tasks: 1
>>>128577  25368 16384     Not enrolled ja_tasks: 0
>>>128578  25368 16384     Enrolled ja_tasks: 1
>>>128579  25368 16384     Not enrolled ja_tasks: 0
>>>128580  25368 16384     Enrolled ja_tasks: 1
>>>128581  25368 16384     Not enrolled ja_tasks: 0
>>>128582  25368 16384     Enrolled ja_tasks: 1
>>>128583  25368 16384     Not enrolled ja_tasks: 0
>>>128584  25368 16384     Enrolled ja_tasks: 1
>>>128585  25368 16384     Not enrolled ja_tasks: 0
>>>128586  25368 16384     Enrolled ja_tasks: 1
>>>128587  25368 16384     Not enrolled ja_tasks: 0
>>>128588  25368 16384     Enrolled ja_tasks: 1
>>>128589  25368 16384     Not enrolled ja_tasks: 0
>>>128590  25368 16384     Enrolled ja_tasks: 1
>>>128591  25368 16384     Not enrolled ja_tasks: 0
>>>128592  25368 16384     Enrolled ja_tasks: 1
>>>128593  25368 16384     STARTING PASS 2 WITH 0 PENDING JOBS
>>>128594  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>>>128595  25368 16384        slots: 1.000000 * 1000.000000 * 6    ---> 6000.000000
>>>128596  25368 16384     slot request assumed for static urgency is 20 for ,20-64 PE range due to PE's "mpi" setting "min"
>>>128597  25368 16384        slots: 1.000000 * 1000.000000 * 20    ---> 20000.000000
>>>128598  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>>>128599  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>>>128600  25368 16384        slots: 1.000000 * 1000.000000 * 6    ---> 6000.000000
>>>128601  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>>>128602  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>>>128603  25368 16384     slot request assumed for static urgency is 2 for ,2-8 PE range due to PE's "mpich_myri" setting "min"
>>>128604  25368 16384        slots: 1.000000 * 1000.000000 * 2    ---> 2000.000000
>>>128605  25368 16384        slots: 1.000000 * 1000.000000 * 8    ---> 8000.000000
>>>128606  25368 16384     ASU min = 1000.00000000000, ASU max = 20000.00000000000
>>>128607  25368 16384     
>>>128608  25368 16384     no DDJU: do_usage: 1 finished_jobs 0
>>>128609  25368 16384     
>>>128610  25368 16384     =====================[Pass 0]======================
>>>128611  25368 16384     =====================[Pass 1]======================
>>>128612  25368 16384     =====================[Pass 2]======================
>>>128613  25368 16384     
>>>128614  25368 16384     no DDJU: do_usage: 0 finished_jobs 0
>>>128615  25368 16384     
>>>128616  25368 16384     =====================[Pass 0]======================
>>>128617  25368 16384     =====================[Pass 1]======================
>>>128618  25368 16384     =====================[Pass 2]======================
>>>128619  25368 16384     Normalizing tickets using 0.000000/18.333333 as min_tix/max_tix
>>>128620  25368 16384        got 19 running jobs
>>>128621  25368 16384        added 19 ticket orders for running jobs
>>>128622  25368 16384        added 1 orders for updating usage of user
>>>128623  25368 16384        added 0 orders for updating usage of project
>>>128624  25368 16384        added 0 orders for updating share tree
>>>128625  25368 16384        added 1 orders for scheduler configuration
>>>128626  25368 16384     SENDING 22 ORDERS TO QMASTER
>>>128627  25368 16384     RESETTING BUSY STATE OF EVENT CLIENT
>>>128628  25368 16384     reresolve port timeout in 300
>>>128629  25368 16384     returning cached port value: 536
>>>--------------STOP-SCHEDULER-RUN-------------
>>>128630  25368 16384     ec_get retrieving events - will do max 20 fetches
>>>128631  25368 16384     doing sync fetch for messages, 20 still to do
>>>128632  25368 16384     try to get request from qmaster, id 1
>>>128633  25368 16384     Checking 84 events (44478-44561) while waiting for #44478
>>>128634  25368 16384     check complete, 84 events in list
>>>128635  25368 16384     got 84 events till 44561
>>>128636  25368 16384     doing async fetch for messages, 19 still to do
>>>128637  25368 16384     try to get request from qmaster, id 1
>>>128638  25368 16384     reresolve port timeout in 280
>>>128639  25368 16384     returning cached port value: 536
>>>128640  25368 16384     Getting host by name - Linux
>>>128641  25368 16384     1 names in h_addr_list
>>>128642  25368 16384     0 names in h_aliases
>>>128643  25368 16384     Sent ack for all events lower or equal 44561
>>>128644  25368 16384     ec_get - received 84 events
>>>128645  25368 16384     44478. EVENT MOD EXECHOST sub04n167
>>>128646  25368 16384     44479. EVENT MOD EXECHOST sub04n63
>>>128647  25368 16384     44480. EVENT JOB 21542.1 task 2.sub04n63 USAGE
>>>128648  25368 16384     44481. EVENT JOB 21542.1 task 1.sub04n63 USAGE
>>>128649  25368 16384     44482. EVENT JOB 21542.1 USAGE
>>>128650  25368 16384     44483. EVENT MOD EXECHOST sub04n71
>>>128651  25368 16384     44484. EVENT JOB 21537.1 task 2.sub04n71 USAGE
>>>128652  25368 16384     44485. EVENT JOB 21537.1 task 1.sub04n71 USAGE
>>>128653  25368 16384     44486. EVENT MOD EXECHOST sub04n65
>>>128654  25368 16384     44487. EVENT JOB 21424.1 task 2.sub04n65 USAGE
>>>128655  25368 16384     44488. EVENT JOB 21424.1 task 1.sub04n65 USAGE
>>>128656  25368 16384     44489. EVENT MOD USER udo
>>>128657  25368 16384     44490. EVENT MOD USER iber
>>>128658  25368 16384     44491. EVENT MOD USER dieguez
>>>128659  25368 16384     44492. EVENT MOD USER karenjoh
>>>128660  25368 16384     44493. EVENT MOD USER lorenzo
>>>128661  25368 16384     44494. EVENT MOD USER parcolle
>>>128662  25368 16384     44495. EVENT MOD USER cfennie
>>>128663  25368 16384     44496. EVENT MOD USER civelli
>>>128664  25368 16384     44497. EVENT MOD EXECHOST sub04n25
>>>128665  25368 16384     44498. EVENT MOD EXECHOST sub04n144
>>>128666  25368 16384     44499. EVENT MOD EXECHOST sub04n206
>>>128667  25368 16384     44500. EVENT JOB 21441.1 USAGE
>>>128668  25368 16384     44501. EVENT MOD EXECHOST sub04n87
>>>128669  25368 16384     44502. EVENT JOB 21503.1 task 2.sub04n87 USAGE
>>>128670  25368 16384     44503. EVENT JOB 21503.1 task 1.sub04n87 USAGE
>>>128671  25368 16384     44504. EVENT MOD EXECHOST sub04n70
>>>128672  25368 16384     44505. EVENT JOB 21503.1 task 2.sub04n70 USAGE
>>>128673  25368 16384     44506. EVENT JOB 21503.1 task 1.sub04n70 USAGE
>>>128674  25368 16384     44507. EVENT JOB 21503.1 USAGE
>>>128675  25368 16384     44508. EVENT MOD EXECHOST sub04n19
>>>128676  25368 16384     44509. EVENT JOB 21338.1 USAGE
>>>128677  25368 16384     44510. EVENT MOD EXECHOST sub04n84
>>>128678  25368 16384     44511. EVENT JOB 21424.1 task 2.sub04n84 USAGE
>>>128679  25368 16384     44512. EVENT JOB 21424.1 task 1.sub04n84 USAGE
>>>128680  25368 16384     44513. EVENT MOD EXECHOST sub04n178
>>>128681  25368 16384     44514. EVENT MOD EXECHOST sub04n67
>>>128682  25368 16384     44515. EVENT JOB 21474.1 task 14.sub04n67 USAGE
>>>128683  25368 16384     44516. EVENT JOB 21474.1 task 13.sub04n67 USAGE
>>>128684  25368 16384     44517. EVENT JOB 21474.1 USAGE
>>>128685  25368 16384     44518. EVENT MOD EXECHOST sub04n27
>>>128686  25368 16384     44519. EVENT MOD EXECHOST sub04n34
>>>128687  25368 16384     44520. EVENT MOD EXECHOST sub04n72
>>>128688  25368 16384     44521. EVENT JOB 21537.1 task 2.sub04n72 USAGE
>>>128689  25368 16384     44522. EVENT JOB 21537.1 task 1.sub04n72 USAGE
>>>128690  25368 16384     44523. EVENT MOD EXECHOST sub04n78
>>>128691  25368 16384     44524. EVENT JOB 21507.1 task 6.sub04n78 USAGE
>>>128692  25368 16384     44525. EVENT JOB 21507.1 task 5.sub04n78 USAGE
>>>128693  25368 16384     44526. EVENT JOB 21507.1 USAGE
>>>128694  25368 16384     44527. EVENT MOD EXECHOST sub04n17
>>>128695  25368 16384     44528. EVENT MOD EXECHOST sub04n07
>>>128696  25368 16384     44529. EVENT MOD EXECHOST sub04n128
>>>128697  25368 16384     44530. EVENT MOD EXECHOST sub04n42
>>>128698  25368 16384     44531. EVENT MOD EXECHOST sub04n62
>>>128699  25368 16384     44532. EVENT JOB 21424.1 task 2.sub04n62 USAGE
>>>128700  25368 16384     44533. EVENT JOB 21424.1 task 1.sub04n62 USAGE
>>>128701  25368 16384     44534. EVENT JOB 21424.1 USAGE
>>>128702  25368 16384     44535. EVENT MOD EXECHOST sub04n10
>>>128703  25368 16384     44536. EVENT MOD EXECHOST sub04n77
>>>128704  25368 16384     44537. EVENT JOB 21537.1 task 2.sub04n77 USAGE
>>>128705  25368 16384     44538. EVENT JOB 21537.1 task 1.sub04n77 USAGE
>>>128706  25368 16384     44539. EVENT MOD EXECHOST sub04n11
>>>128707  25368 16384     44540. EVENT MOD EXECHOST sub04n02
>>>128708  25368 16384     44541. EVENT MOD EXECHOST sub04n120
>>>128709  25368 16384     44542. EVENT MOD EXECHOST sub04n115
>>>128710  25368 16384     44543. EVENT MOD EXECHOST sub04n101
>>>128711  25368 16384     44544. EVENT MOD EXECHOST sub04n66
>>>128712  25368 16384     44545. EVENT JOB 21537.1 task 2.sub04n66 USAGE
>>>128713  25368 16384     44546. EVENT JOB 21537.1 task 1.sub04n66 USAGE
>>>128714  25368 16384     44547. EVENT JOB 21537.1 USAGE
>>>128715  25368 16384     44548. EVENT MOD EXECHOST sub04n142
>>>128716  25368 16384     44549. EVENT MOD EXECHOST sub04n123
>>>128717  25368 16384     44550. EVENT MOD EXECHOST sub04n33
>>>128718  25368 16384     44551. EVENT MOD EXECHOST sub04n126
>>>128719  25368 16384     44552. EVENT MOD EXECHOST sub04n140
>>>128720  25368 16384     44553. EVENT MOD EXECHOST sub04n119
>>>128721  25368 16384     44554. EVENT MOD EXECHOST sub04n102
>>>128722  25368 16384     44555. EVENT MOD EXECHOST sub04n110
>>>128723  25368 16384     44556. EVENT MOD EXECHOST sub04n117
>>>128724  25368 16384     44557. EVENT MOD EXECHOST sub04n06
>>>128725  25368 16384     44558. EVENT MOD EXECHOST sub04n73
>>>128726  25368 16384     44559. EVENT JOB 21542.1 task 2.sub04n73 USAGE
>>>128727  25368 16384     44560. EVENT JOB 21542.1 task 1.sub04n73 USAGE
>>>128728  25368 16384     44561. EVENT MOD EXECHOST sub04n122
>>>Q:169, AQ:343 J:19(19), H:169(170), C:49, A:4, D:3, P:7, CKPT:0 US:15 PR:4 S:nd:12/lf:7
>>>128729  25368 16384     ================[SCHEDULING-EPOCH]==================
>>>128730  25368 16384     JOB 20937.1 start_time = 1116447112 running_time 338119 decay_time = 450
>>>128731  25368 16384     JOB 20938.1 start_time = 1116374344 running_time 410887 decay_time = 450
>>>128732  25368 16384     JOB 21040.1 start_time = 1116443073 running_time 342158 decay_time = 450
>>>128733  25368 16384     JOB 21076.1 start_time = 1116451351 running_time 333880 decay_time = 450
>>>128734  25368 16384     JOB 21210.1 start_time = 1116514970 running_time 270261 decay_time = 450
>>>128735  25368 16384     JOB 21213.1 start_time = 1116515250 running_time 269981 decay_time = 450
>>>128736  25368 16384     JOB 21338.1 start_time = 1116543252 running_time 241979 decay_time = 450
>>>128737  25368 16384     JOB 21423.1 start_time = 1116629274 running_time 155957 decay_time = 450
>>>128738  25368 16384     JOB 21424.1 start_time = 1116631365 running_time 153866 decay_time = 450
>>>128739  25368 16384     JOB 21440.1 start_time = 1116632934 running_time 152297 decay_time = 450
>>>128740  25368 16384     JOB 21441.1 start_time = 1116632994 running_time 152237 decay_time = 450
>>>128741  25368 16384     JOB 21443.1 start_time = 1116633602 running_time 151629 decay_time = 450
>>>128742  25368 16384     JOB 21474.1 start_time = 1116655118 running_time 130113 decay_time = 450
>>>128743  25368 16384     JOB 21503.1 start_time = 1116707395 running_time 77836 decay_time = 450
>>>128744  25368 16384     JOB 21507.1 start_time = 1116714061 running_time 71170 decay_time = 450
>>>128745  25368 16384     JOB 21528.1 start_time = 1116707641 running_time 77590 decay_time = 450
>>>128746  25368 16384     JOB 21530.1 start_time = 1116714453 running_time 70778 decay_time = 450
>>>128747  25368 16384     JOB 21537.1 start_time = 1116724845 running_time 60386 decay_time = 450
>>>128748  25368 16384     JOB 21542.1 start_time = 1116782511 running_time 2720 decay_time = 450
>>>128749  25368 16384     verified threshold of 169 queues
>>>128750  25368 16384     queue myrinet at sub04n61 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128751  25368 16384     queue myrinet at sub04n62 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>>>128752  25368 16384     queue myrinet at sub04n65 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128753  25368 16384     queue myrinet at sub04n66 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128754  25368 16384     queue myrinet at sub04n67 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>>>128755  25368 16384     queue myrinet at sub04n68 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128756  25368 16384     queue myrinet at sub04n69 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128757  25368 16384     queue myrinet at sub04n70 tagged to be overloaded: load_avg=2.020000 (no load adjustment) >= 1.4
>>>128758  25368 16384     queue myrinet at sub04n71 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128759  25368 16384     queue myrinet at sub04n72 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>>>128760  25368 16384     queue myrinet at sub04n75 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>>>128761  25368 16384     queue myrinet at sub04n77 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128762  25368 16384     queue myrinet at sub04n78 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>>>128763  25368 16384     queue myrinet at sub04n79 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128764  25368 16384     queue myrinet at sub04n81 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128765  25368 16384     queue myrinet at sub04n84 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128766  25368 16384     queue myrinet at sub04n85 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128767  25368 16384     queue myrinet at sub04n86 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128768  25368 16384     queue myrinet at sub04n87 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>>>128769  25368 16384     queue myrinet at sub04n88 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128770  25368 16384     queue myrinet at sub04n89 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128771  25368 16384     queue myrinet at sub04n90 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128772  25368 16384     queue myrinet at sub04n91 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128773  25368 16384     queue myrinet at sub04n63 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128774  25368 16384     queue myrinet at sub04n64 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128775  25368 16384     queue myrinet at sub04n73 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128776  25368 16384     queue myrinet at sub04n74 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>>>128777  25368 16384     queue opteronp at sub04n202 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
>>>128778  25368 16384     queue opteronp at sub04n205 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
>>>128779  25368 16384     queue opteronp at sub04n206 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
>>>128780  25368 16384     queue opteronp at sub04n208 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
>>>128781  25368 16384     queue parallel at sub04n121 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>>>128782  25368 16384     queue parallel at sub04n139 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128783  25368 16384     queue parallel at sub04n140 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128784  25368 16384     queue parallel at sub04n141 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128785  25368 16384     queue parallel at sub04n142 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128786  25368 16384     queue parallel at sub04n143 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128787  25368 16384     queue parallel at sub04n144 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128788  25368 16384     queue parallel at sub04n146 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128789  25368 16384     queue parallel at sub04n02 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128790  25368 16384     queue parallel at sub04n03 tagged to be overloaded: load_avg=2.020000 (no load adjustment) >= 1.4
>>>128791  25368 16384     queue parallel at sub04n04 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128792  25368 16384     queue parallel at sub04n05 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128793  25368 16384     queue parallel at sub04n06 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128794  25368 16384     queue parallel at sub04n07 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>>>128795  25368 16384     queue parallel at sub04n08 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>>>128796  25368 16384     queue parallel at sub04n09 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128797  25368 16384     queue parallel at sub04n10 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128798  25368 16384     queue parallel at sub04n11 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128799  25368 16384     verified threshold of 169 queues
>>>128800  25368 16384     STARTING PASS 1 WITH 0 PENDING JOBS
>>>128801  25368 16384     Not enrolled ja_tasks: 0
>>>128802  25368 16384     Enrolled ja_tasks: 1
>>>128803  25368 16384     Not enrolled ja_tasks: 0
>>>128804  25368 16384     Enrolled ja_tasks: 1
>>>128805  25368 16384     Not enrolled ja_tasks: 0
>>>128806  25368 16384     Enrolled ja_tasks: 1
>>>128807  25368 16384     Not enrolled ja_tasks: 0
>>>128808  25368 16384     Enrolled ja_tasks: 1
>>>128809  25368 16384     Not enrolled ja_tasks: 0
>>>128810  25368 16384     Enrolled ja_tasks: 1
>>>128811  25368 16384     Not enrolled ja_tasks: 0
>>>128812  25368 16384     Enrolled ja_tasks: 1
>>>128813  25368 16384     Not enrolled ja_tasks: 0
>>>128814  25368 16384     Enrolled ja_tasks: 1
>>>128815  25368 16384     Not enrolled ja_tasks: 0
>>>128816  25368 16384     Enrolled ja_tasks: 1
>>>128817  25368 16384     Not enrolled ja_tasks: 0
>>>128818  25368 16384     Enrolled ja_tasks: 1
>>>128819  25368 16384     Not enrolled ja_tasks: 0
>>>128820  25368 16384     Enrolled ja_tasks: 1
>>>128821  25368 16384     Not enrolled ja_tasks: 0
>>>128822  25368 16384     Enrolled ja_tasks: 1
>>>128823  25368 16384     Not enrolled ja_tasks: 0
>>>128824  25368 16384     Enrolled ja_tasks: 1
>>>128825  25368 16384     Not enrolled ja_tasks: 0
>>>128826  25368 16384     Enrolled ja_tasks: 1
>>>128827  25368 16384     Not enrolled ja_tasks: 0
>>>128828  25368 16384     Enrolled ja_tasks: 1
>>>128829  25368 16384     Not enrolled ja_tasks: 0
>>>128830  25368 16384     Enrolled ja_tasks: 1
>>>128831  25368 16384     Not enrolled ja_tasks: 0
>>>128832  25368 16384     Enrolled ja_tasks: 1
>>>128833  25368 16384     Not enrolled ja_tasks: 0
>>>128834  25368 16384     Enrolled ja_tasks: 1
>>>128835  25368 16384     Not enrolled ja_tasks: 0
>>>128836  25368 16384     Enrolled ja_tasks: 1
>>>128837  25368 16384     Not enrolled ja_tasks: 0
>>>128838  25368 16384     Enrolled ja_tasks: 1
>>>128839  25368 16384     STARTING PASS 2 WITH 0 PENDING JOBS
>>>128840  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>>>128841  25368 16384        slots: 1.000000 * 1000.000000 * 6    ---> 6000.000000
>>>128842  25368 16384     slot request assumed for static urgency is 20 for ,20-64 PE range due to PE's "mpi" setting "min"
>>>128843  25368 16384        slots: 1.000000 * 1000.000000 * 20    ---> 20000.000000
>>>128844  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>>>128845  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>>>128846  25368 16384        slots: 1.000000 * 1000.000000 * 6    ---> 6000.000000
>>>128847  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>>>128848  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>>>128849  25368 16384     slot request assumed for static urgency is 2 for ,2-8 PE range due to PE's "mpich_myri" setting "min"
>>>128850  25368 16384        slots: 1.000000 * 1000.000000 * 2    ---> 2000.000000
>>>128851  25368 16384        slots: 1.000000 * 1000.000000 * 8    ---> 8000.000000
>>>128852  25368 16384     ASU min = 1000.00000000000, ASU max = 20000.00000000000
>>>128853  25368 16384     
>>>128854  25368 16384     no DDJU: do_usage: 1 finished_jobs 0
>>>128855  25368 16384     
>>>128856  25368 16384     =====================[Pass 0]======================
>>>128857  25368 16384     =====================[Pass 1]======================
>>>128858  25368 16384     =====================[Pass 2]======================
>>>128859  25368 16384     
>>>128860  25368 16384     no DDJU: do_usage: 0 finished_jobs 0
>>>128861  25368 16384     
>>>128862  25368 16384     =====================[Pass 0]======================
>>>128863  25368 16384     =====================[Pass 1]======================
>>>128864  25368 16384     =====================[Pass 2]======================
>>>128865  25368 16384     Normalizing tickets using 0.000000/18.333333 as min_tix/max_tix
>>>128866  25368 16384        got 19 running jobs
>>>128867  25368 16384        added 19 ticket orders for running jobs
>>>128868  25368 16384        added 1 orders for updating usage of user
>>>128869  25368 16384        added 0 orders for updating usage of project
>>>128870  25368 16384        added 0 orders for updating share tree
>>>128871  25368 16384        added 1 orders for scheduler configuration
>>>128872  25368 16384     SENDING 22 ORDERS TO QMASTER
>>>128873  25368 16384     RESETTING BUSY STATE OF EVENT CLIENT
>>>128874  25368 16384     reresolve port timeout in 280
>>>128875  25368 16384     returning cached port value: 536
>>>--------------STOP-SCHEDULER-RUN-------------
>>>128876  25368 16384     ec_get retrieving events - will do max 20 fetches
>>>128877  25368 16384     doing sync fetch for messages, 20 still to do
>>>128878  25368 16384     try to get request from qmaster, id 1
>>>128879  25368 16384     Checking 55 events (44562-44616) while waiting for #44562
>>>128880  25368 16384     check complete, 55 events in list
>>>128881  25368 16384     got 55 events till 44616
>>>128882  25368 16384     doing async fetch for messages, 19 still to do
>>>128883  25368 16384     try to get request from qmaster, id 1
>>>128884  25368 16384     reresolve port timeout in 260
>>>128885  25368 16384     returning cached port value: 536
>>>128886  25368 16384     Sent ack for all events lower or equal 44616
>>>128887  25368 16384     ec_get - received 55 events
>>>128888  25368 16384     44562. EVENT MOD EXECHOST sub04n147
>>>128889  25368 16384     44563. EVENT MOD USER udo
>>>128890  25368 16384     44564. EVENT MOD USER iber
>>>128891  25368 16384     44565. EVENT MOD USER dieguez
>>>128892  25368 16384     44566. EVENT MOD USER karenjoh
>>>128893  25368 16384     44567. EVENT MOD USER lorenzo
>>>128894  25368 16384     44568. EVENT MOD USER parcolle
>>>128895  25368 16384     44569. EVENT MOD USER cfennie
>>>128896  25368 16384     44570. EVENT MOD USER civelli
>>>128897  25368 16384     44571. EVENT MOD EXECHOST sub04n135
>>>128898  25368 16384     44572. EVENT MOD EXECHOST sub04n141
>>>128899  25368 16384     44573. EVENT MOD EXECHOST sub04n127
>>>128900  25368 16384     44574. EVENT MOD EXECHOST sub04n145
>>>128901  25368 16384     44575. EVENT MOD EXECHOST sub04n133
>>>128902  25368 16384     44576. EVENT MOD EXECHOST sub04n148
>>>128903  25368 16384     44577. EVENT MOD EXECHOST sub04n74
>>>128904  25368 16384     44578. EVENT JOB 21542.1 task 2.sub04n74 USAGE
>>>128905  25368 16384     44579. EVENT JOB 21542.1 task 1.sub04n74 USAGE
>>>128906  25368 16384     44580. EVENT MOD EXECHOST rupc03.rutgers.edu
>>>128907  25368 16384     44581. EVENT MOD EXECHOST sub04n139
>>>128908  25368 16384     44582. EVENT MOD EXECHOST rupc02.rutgers.edu
>>>128909  25368 16384     44583. EVENT MOD EXECHOST sub04n80
>>>128910  25368 16384     44584. EVENT MOD EXECHOST sub04n207
>>>128911  25368 16384     44585. EVENT MOD EXECHOST sub04n180
>>>128912  25368 16384     44586. EVENT MOD EXECHOST sub04n23
>>>128913  25368 16384     44587. EVENT MOD EXECHOST sub04n30
>>>128914  25368 16384     44588. EVENT MOD EXECHOST sub04n203
>>>128915  25368 16384     44589. EVENT MOD EXECHOST sub04n109
>>>128916  25368 16384     44590. EVENT MOD EXECHOST rupc04.rutgers.edu
>>>128917  25368 16384     44591. EVENT MOD EXECHOST sub04n114
>>>128918  25368 16384     44592. EVENT MOD EXECHOST sub04n106
>>>128919  25368 16384     44593. EVENT MOD EXECHOST sub04n88
>>>128920  25368 16384     44594. EVENT JOB 21507.1 task 6.sub04n88 USAGE
>>>128921  25368 16384     44595. EVENT JOB 21507.1 task 5.sub04n88 USAGE
>>>128922  25368 16384     44596. EVENT MOD EXECHOST sub04n157
>>>128923  25368 16384     44597. EVENT MOD EXECHOST sub04n20
>>>128924  25368 16384     44598. EVENT MOD EXECHOST sub04n156
>>>128925  25368 16384     44599. EVENT MOD EXECHOST sub04n26
>>>128926  25368 16384     44600. EVENT JOB 21213.1 USAGE
>>>128927  25368 16384     44601. EVENT MOD EXECHOST sub04n09
>>>128928  25368 16384     44602. EVENT MOD EXECHOST sub04n05
>>>128929  25368 16384     44603. EVENT MOD EXECHOST sub04n103
>>>128930  25368 16384     44604. EVENT MOD EXECHOST sub04n164
>>>128931  25368 16384     44605. EVENT MOD EXECHOST sub04n105
>>>128932  25368 16384     44606. EVENT MOD EXECHOST sub04n113
>>>128933  25368 16384     44607. EVENT MOD EXECHOST sub04n28
>>>128934  25368 16384     44608. EVENT MOD EXECHOST sub04n76
>>>128935  25368 16384     44609. EVENT MOD EXECHOST sub04n162
>>>128936  25368 16384     44610. EVENT MOD EXECHOST sub04n108
>>>128937  25368 16384     44611. EVENT MOD EXECHOST sub04n38
>>>128938  25368 16384     44612. EVENT MOD EXECHOST sub04n116
>>>128939  25368 16384     44613. EVENT MOD EXECHOST sub04n179
>>>128940  25368 16384     44614. EVENT MOD EXECHOST sub04n04
>>>128941  25368 16384     44615. EVENT MOD EXECHOST sub04n160
>>>128942  25368 16384     44616. EVENT MOD EXECHOST sub04n107
>>>Q:169, AQ:343 J:19(19), H:169(170), C:49, A:4, D:3, P:7, CKPT:0 US:15 PR:4 S:nd:12/lf:7
>>>128943  25368 16384     ================[SCHEDULING-EPOCH]==================
>>>128944  25368 16384     JOB 20937.1 start_time = 1116447112 running_time 338139 decay_time = 450
>>>128945  25368 16384     JOB 20938.1 start_time = 1116374344 running_time 410907 decay_time = 450
>>>128946  25368 16384     JOB 21040.1 start_time = 1116443073 running_time 342178 decay_time = 450
>>>128947  25368 16384     JOB 21076.1 start_time = 1116451351 running_time 333900 decay_time = 450
>>>128948  25368 16384     JOB 21210.1 start_time = 1116514970 running_time 270281 decay_time = 450
>>>128949  25368 16384     JOB 21213.1 start_time = 1116515250 running_time 270001 decay_time = 450
>>>128950  25368 16384     JOB 21338.1 start_time = 1116543252 running_time 241999 decay_time = 450
>>>128951  25368 16384     JOB 21423.1 start_time = 1116629274 running_time 155977 decay_time = 450
>>>128952  25368 16384     JOB 21424.1 start_time = 1116631365 running_time 153886 decay_time = 450
>>>128953  25368 16384     JOB 21440.1 start_time = 1116632934 running_time 152317 decay_time = 450
>>>128954  25368 16384     JOB 21441.1 start_time = 1116632994 running_time 152257 decay_time = 450
>>>128955  25368 16384     JOB 21443.1 start_time = 1116633602 running_time 151649 decay_time = 450
>>>128956  25368 16384     JOB 21474.1 start_time = 1116655118 running_time 130133 decay_time = 450
>>>128957  25368 16384     JOB 21503.1 start_time = 1116707395 running_time 77856 decay_time = 450
>>>128958  25368 16384     JOB 21507.1 start_time = 1116714061 running_time 71190 decay_time = 450
>>>128959  25368 16384     JOB 21528.1 start_time = 1116707641 running_time 77610 decay_time = 450
>>>128960  25368 16384     JOB 21530.1 start_time = 1116714453 running_time 70798 decay_time = 450
>>>128961  25368 16384     JOB 21537.1 start_time = 1116724845 running_time 60406 decay_time = 450
>>>128962  25368 16384     JOB 21542.1 start_time = 1116782511 running_time 2740 decay_time = 450
>>>128963  25368 16384     verified threshold of 169 queues
>>>128964  25368 16384     queue myrinet at sub04n61 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128965  25368 16384     queue myrinet at sub04n62 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>>>128966  25368 16384     queue myrinet at sub04n65 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128967  25368 16384     queue myrinet at sub04n66 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128968  25368 16384     queue myrinet at sub04n67 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>>>128969  25368 16384     queue myrinet at sub04n68 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128970  25368 16384     queue myrinet at sub04n69 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128971  25368 16384     queue myrinet at sub04n70 tagged to be overloaded: load_avg=2.020000 (no load adjustment) >= 1.4
>>>128972  25368 16384     queue myrinet at sub04n71 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128973  25368 16384     queue myrinet at sub04n72 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>>>128974  25368 16384     queue myrinet at sub04n75 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>>>128975  25368 16384     queue myrinet at sub04n77 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128976  25368 16384     queue myrinet at sub04n78 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>>>128977  25368 16384     queue myrinet at sub04n79 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128978  25368 16384     queue myrinet at sub04n81 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128979  25368 16384     queue myrinet at sub04n84 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128980  25368 16384     queue myrinet at sub04n85 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128981  25368 16384     queue myrinet at sub04n86 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128982  25368 16384     queue myrinet at sub04n87 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>>>128983  25368 16384     queue myrinet at sub04n88 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128984  25368 16384     queue myrinet at sub04n89 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128985  25368 16384     queue myrinet at sub04n90 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128986  25368 16384     queue myrinet at sub04n91 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128987  25368 16384     queue myrinet at sub04n63 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128988  25368 16384     queue myrinet at sub04n64 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128989  25368 16384     queue myrinet at sub04n73 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128990  25368 16384     queue myrinet at sub04n74 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128991  25368 16384     queue opteronp at sub04n202 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
>>>128992  25368 16384     queue opteronp at sub04n205 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
>>>128993  25368 16384     queue opteronp at sub04n206 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
>>>128994  25368 16384     queue opteronp at sub04n208 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
>>>128995  25368 16384     queue parallel at sub04n121 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>>>128996  25368 16384     queue parallel at sub04n139 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128997  25368 16384     queue parallel at sub04n140 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128998  25368 16384     queue parallel at sub04n141 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>128999  25368 16384     queue parallel at sub04n142 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>129000  25368 16384     queue parallel at sub04n143 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>129001  25368 16384     queue parallel at sub04n144 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>129002  25368 16384     queue parallel at sub04n146 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>129003  25368 16384     queue parallel at sub04n02 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>129004  25368 16384     queue parallel at sub04n03 tagged to be overloaded: load_avg=2.020000 (no load adjustment) >= 1.4
>>>129005  25368 16384     queue parallel at sub04n04 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>129006  25368 16384     queue parallel at sub04n05 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>129007  25368 16384     queue parallel at sub04n06 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>129008  25368 16384     queue parallel at sub04n07 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>>>129009  25368 16384     queue parallel at sub04n08 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
>>>129010  25368 16384     queue parallel at sub04n09 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>129011  25368 16384     queue parallel at sub04n10 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>129012  25368 16384     queue parallel at sub04n11 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
>>>129013  25368 16384     verified threshold of 169 queues
>>>129014  25368 16384     STARTING PASS 1 WITH 0 PENDING JOBS
>>>129015  25368 16384     Not enrolled ja_tasks: 0
>>>129016  25368 16384     Enrolled ja_tasks: 1
>>>129017  25368 16384     Not enrolled ja_tasks: 0
>>>129018  25368 16384     Enrolled ja_tasks: 1
>>>129019  25368 16384     Not enrolled ja_tasks: 0
>>>129020  25368 16384     Enrolled ja_tasks: 1
>>>129021  25368 16384     Not enrolled ja_tasks: 0
>>>129022  25368 16384     Enrolled ja_tasks: 1
>>>129023  25368 16384     Not enrolled ja_tasks: 0
>>>129024  25368 16384     Enrolled ja_tasks: 1
>>>129025  25368 16384     Not enrolled ja_tasks: 0
>>>129026  25368 16384     Enrolled ja_tasks: 1
>>>129027  25368 16384     Not enrolled ja_tasks: 0
>>>129028  25368 16384     Enrolled ja_tasks: 1
>>>129029  25368 16384     Not enrolled ja_tasks: 0
>>>129030  25368 16384     Enrolled ja_tasks: 1
>>>129031  25368 16384     Not enrolled ja_tasks: 0
>>>129032  25368 16384     Enrolled ja_tasks: 1
>>>129033  25368 16384     Not enrolled ja_tasks: 0
>>>129034  25368 16384     Enrolled ja_tasks: 1
>>>129035  25368 16384     Not enrolled ja_tasks: 0
>>>129036  25368 16384     Enrolled ja_tasks: 1
>>>129037  25368 16384     Not enrolled ja_tasks: 0
>>>129038  25368 16384     Enrolled ja_tasks: 1
>>>129039  25368 16384     Not enrolled ja_tasks: 0
>>>129040  25368 16384     Enrolled ja_tasks: 1
>>>129041  25368 16384     Not enrolled ja_tasks: 0
>>>129042  25368 16384     Enrolled ja_tasks: 1
>>>129043  25368 16384     Not enrolled ja_tasks: 0
>>>129044  25368 16384     Enrolled ja_tasks: 1
>>>129045  25368 16384     Not enrolled ja_tasks: 0
>>>129046  25368 16384     Enrolled ja_tasks: 1
>>>129047  25368 16384     Not enrolled ja_tasks: 0
>>>129048  25368 16384     Enrolled ja_tasks: 1
>>>129049  25368 16384     Not enrolled ja_tasks: 0
>>>129050  25368 16384     Enrolled ja_tasks: 1
>>>129051  25368 16384     Not enrolled ja_tasks: 0
>>>129052  25368 16384     Enrolled ja_tasks: 1
>>>129053  25368 16384     STARTING PASS 2 WITH 0 PENDING JOBS
>>>129054  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>>>129055  25368 16384        slots: 1.000000 * 1000.000000 * 6    ---> 6000.000000
>>>129056  25368 16384     slot request assumed for static urgency is 20 for ,20-64 PE range due to PE's "mpi" setting "min"
>>>129057  25368 16384        slots: 1.000000 * 1000.000000 * 20    ---> 20000.000000
>>>129058  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>>>129059  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>>>129060  25368 16384        slots: 1.000000 * 1000.000000 * 6    ---> 6000.000000
>>>129061  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>>>129062  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
>>>129063  25368 16384     slot request assumed for static urgency is 2 for ,2-8 PE range due to PE's "mpich_myri" setting "min"
>>>129064  25368 16384        slots: 1.000000 * 1000.000000 * 2    ---> 2000.000000
>>>129065  25368 16384        slots: 1.000000 * 1000.000000 * 8    ---> 8000.000000
>>>129066  25368 16384     ASU min = 1000.00000000000, ASU max = 20000.00000000000
>>>129067  25368 16384     
>>>129068  25368 16384     no DDJU: do_usage: 1 finished_jobs 0
>>>129069  25368 16384     
>>>129070  25368 16384     =====================[Pass 0]======================
>>>129071  25368 16384     =====================[Pass 1]======================
>>>129072  25368 16384     =====================[Pass 2]======================
>>>129073  25368 16384     
>>>129074  25368 16384     no DDJU: do_usage: 0 finished_jobs 0
>>>129075  25368 16384     
>>>129076  25368 16384     =====================[Pass 0]======================
>>>129077  25368 16384     =====================[Pass 1]======================
>>>129078  25368 16384     =====================[Pass 2]======================
>>>129079  25368 16384     Normalizing tickets using 0.000000/18.333333 as min_tix/max_tix
>>>129080  25368 16384        got 19 running jobs
>>>129081  25368 16384        added 19 ticket orders for running jobs
>>>129082  25368 16384        added 1 orders for updating usage of user
>>>129083  25368 16384        added 0 orders for updating usage of project
>>>129084  25368 16384        added 0 orders for updating share tree
>>>129085  25368 16384        added 1 orders for scheduler configuration
>>>129086  25368 16384     SENDING 22 ORDERS TO QMASTER
>>>129087  25368 16384     RESETTING BUSY STATE OF EVENT CLIENT
>>>129088  25368 16384     reresolve port timeout in 260
>>>129089  25368 16384     returning cached port value: 536
>>>--------------STOP-SCHEDULER-RUN-------------
>>>129090  25368 16384     ec_get retrieving events - will do max 20 fetches
>>>129091  25368 16384     doing sync fetch for messages, 20 still to do
>>>129092  25368 16384     try to get request from qmaster, id 1
>>>129093  25368 16384     Checking 154 events (44617-44770) while waiting for #44617
>>>129094  25368 16384     check complete, 154 events in list
>>>129095  25368 16384     got 154 events till 44770
>>>129096  25368 16384     doing async fetch for messages, 19 still to do
>>>129097  25368 16384     try to get request from qmaster, id 1
>>>129098  25368 16384     reresolve port timeout in 240
>>>129099  25368 16384     returning cached port value: 536
>>>129100  25368 16384     Sent ack for all events lower or equal 44770
>>>129101  25368 16384     ec_get - received 154 events
>>>129102  25368 16384     44617. EVENT MOD EXECHOST sub04n08
>>>129103  25368 16384     44618. EVENT MOD EXECHOST sub04n166
>>>129104  25368 16384     44619. EVENT MOD EXECHOST sub04n168
>>>129105  25368 16384     44620. EVENT MOD EXECHOST sub04n112
>>>129106  25368 16384     44621. EVENT MOD EXECHOST sub04n90
>>>129107  25368 16384     44622. EVENT JOB 21503.1 task 2.sub04n90 USAGE
>>>129108  25368 16384     44623. EVENT JOB 21503.1 task 1.sub04n90 USAGE
>>>129109  25368 16384     44624. EVENT MOD USER udo
>>>129110  25368 16384     44625. EVENT MOD USER iber
>>>129111  25368 16384     44626. EVENT MOD USER dieguez
>>>129112  25368 16384     44627. EVENT MOD USER karenjoh
>>>129113  25368 16384     44628. EVENT MOD USER lorenzo
>>>129114  25368 16384     44629. EVENT MOD USER parcolle
>>>129115  25368 16384     44630. EVENT MOD USER cfennie
>>>129116  25368 16384     44631. EVENT MOD USER civelli
>>>129117  25368 16384     44632. EVENT MOD EXECHOST sub04n14
>>>129118  25368 16384     44633. EVENT MOD EXECHOST sub04n75
>>>129119  25368 16384     44634. EVENT JOB 21040.1 task 6.sub04n75 USAGE
>>>129120  25368 16384     44635. EVENT JOB 21040.1 task 5.sub04n75 USAGE
>>>129121  25368 16384     44636. EVENT MOD EXECHOST sub04n150
>>>129122  25368 16384     44637. EVENT MOD EXECHOST sub04n169
>>>129123  25368 16384     44638. EVENT MOD EXECHOST sub04n165
>>>129124  25368 16384     44639. EVENT MOD EXECHOST sub04n136
>>>129125  25368 16384     44640. EVENT MOD EXECHOST sub04n176
>>>129126  25368 16384     44641. EVENT MOD EXECHOST sub04n81
>>>129127  25368 16384     44642. EVENT JOB 21507.1 task 6.sub04n81 USAGE
>>>129128  25368 16384     44643. EVENT JOB 21507.1 task 5.sub04n81 USAGE
>>>129129  25368 16384     44644. EVENT JOB 21507.1 task past_usage USAGE
>>>129130  25368 16384     44645. EVENT DEL PETASK 21507.1 task 6.sub04n88
>>>129131  25368 16384     44646. EVENT JOB 21507.1 task past_usage USAGE
>>>129132  25368 16384     44647. EVENT DEL PETASK 21507.1 task 6.sub04n78
>>>129133  25368 16384     44648. EVENT JOB 21507.1 task past_usage USAGE
>>>129134  25368 16384     44649. EVENT DEL PETASK 21507.1 task 6.sub04n81
>>>129135  25368 16384     44650. EVENT JOB 21507.1 task past_usage USAGE
>>>129136  25368 16384     44651. EVENT DEL PETASK 21507.1 task 5.sub04n81
>>>129137  25368 16384     44652. EVENT JOB 21507.1 task past_usage USAGE
>>>129138  25368 16384     44653. EVENT DEL PETASK 21507.1 task 5.sub04n88
>>>129139  25368 16384     44654. EVENT JOB 21507.1 task past_usage USAGE
>>>129140  25368 16384     44655. EVENT DEL PETASK 21507.1 task 5.sub04n78
>>>129141  25368 16384     44656. EVENT MOD EXECHOST sub04n161
>>>129142  25368 16384     44657. EVENT MOD EXECHOST sub04n124
>>>129143  25368 16384     44658. EVENT ADD PETASK 21507.1 task 7.sub04n88
>>>129144  25368 16384     44659. EVENT ADD PETASK 21507.1 task 7.sub04n78
>>>129145  25368 16384     44660. EVENT MOD EXECHOST sub04n158
>>>129146  25368 16384     44661. EVENT MOD EXECHOST sub04n01
>>>129147  25368 16384     44662. EVENT MOD EXECHOST sub04n159
>>>129148  25368 16384     44663. EVENT ADD PETASK 21507.1 task 7.sub04n81
>>>129149  25368 16384     44664. EVENT MOD EXECHOST sub04n134
>>>129150  25368 16384     44665. EVENT ADD PETASK 21507.1 task 8.sub04n88
>>>129151  25368 16384     44666. EVENT ADD PETASK 21507.1 task 8.sub04n78
>>>129152  25368 16384     44667. EVENT ADD PETASK 21507.1 task 8.sub04n81
>>>129153  25368 16384     44668. EVENT MOD EXECHOST sub04n121
>>>129154  25368 16384     44669. EVENT MOD EXECHOST sub04n143
>>>129155  25368 16384     44670. EVENT MOD EXECHOST sub04n15
>>>129156  25368 16384     44671. EVENT MOD EXECHOST sub04n13
>>>129157  25368 16384     44672. EVENT MOD EXECHOST sub04n64
>>>129158  25368 16384     44673. EVENT JOB 21542.1 task 2.sub04n64 USAGE
>>>129159  25368 16384     44674. EVENT JOB 21542.1 task 1.sub04n64 USAGE
>>>129160  25368 16384     44675. EVENT MOD EXECHOST sub04n118
>>>129161  25368 16384     44676. EVENT MOD EXECHOST sub04n151
>>>129162  25368 16384     44677. EVENT MOD EXECHOST sub04n154
>>>129163  25368 16384     44678. EVENT MOD EXECHOST sub04n149
>>>129164  25368 16384     44679. EVENT MOD EXECHOST sub04n16
>>>129165  25368 16384     44680. EVENT MOD EXECHOST sub04n155
>>>129166  25368 16384     44681. EVENT MOD EXECHOST sub04n152
>>>129167  25368 16384     44682. EVENT MOD EXECHOST sub04n163
>>>129168  25368 16384     44683. EVENT MOD EXECHOST sub04n43
>>>129169  25368 16384     44684. EVENT MOD EXECHOST sub04n86
>>>129170  25368 16384     44685. EVENT JOB 21423.1 task 2.sub04n86 USAGE
>>>129171  25368 16384     44686. EVENT JOB 21423.1 task 1.sub04n86 USAGE
>>>129172  25368 16384     44687. EVENT MOD EXECHOST sub04n03
>>>129173  25368 16384     44688. EVENT JOB 21076.1 USAGE
>>>129174  25368 16384     44689. EVENT MOD EXECHOST sub04n204
>>>129175  25368 16384     44690. EVENT MOD EXECHOST rupc01.rutgers.edu
>>>129176  25368 16384     44691. EVENT MOD EXECHOST sub04n125
>>>129177  25368 16384     44692. EVENT MOD EXECHOST sub04n44
>>>129178  25368 16384     44693. EVENT MOD EXECHOST sub04n32
>>>129179  25368 16384     44694. EVENT MOD EXECHOST sub04n21
>>>129180  25368 16384     44695. EVENT MOD EXECHOST sub04n22
>>>129181  25368 16384     44696. EVENT MOD EXECHOST sub04n35
>>>129182  25368 16384     44697. EVENT MOD EXECHOST sub04n201
>>>129183  25368 16384     44698. EVENT MOD EXECHOST sub04n205
>>>129184  25368 16384     44699. EVENT JOB 21440.1 USAGE
>>>129185  25368 16384     44700. EVENT MOD EXECHOST sub04n111
>>>129186  25368 16384     44701. EVENT MOD EXECHOST sub04n89
>>>129187  25368 16384     44702. EVENT JOB 21530.1 task 2.sub04n89 USAGE
>>>129188  25368 16384     44703. EVENT JOB 21530.1 task 1.sub04n89 USAGE
>>>129189  25368 16384     44704. EVENT JOB 21530.1 USAGE
>>>129190  25368 16384     44705. EVENT MOD EXECHOST sub04n177
>>>129191  25368 16384     44706. EVENT MOD EXECHOST sub04n146
>>>129192  25368 16384     44707. EVENT ADD PETASK 21507.1 task 9.sub04n88
>>>129193  25368 16384     44708. EVENT JOB 21507.1 task past_usage USAGE
>>>129194  25368 16384     44709. EVENT DEL PETASK 21507.1 task 7.sub04n88
>>>Segmentation fault
>>>You have new mail in /var/spool/mail/root rupc-cs04b:/opt/SGE/util #
>>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>/opt/SGE/default/spool/qmaster
>>>
>>>Sun May 22 14:25:16 EDT 2005
>>>05/22/2005 00:20:01|qmaster|rupc-cs04b|E|event client "scheduler" (rupc-cs04b/schedd/1) reregistered - it will need a total update
>>>05/22/2005 00:32:40|qmaster|rupc-cs04b|W|job 21538.1 failed on host sub04n63 in recognising job because: execd doesn't know this job
>>>05/22/2005 00:32:49|qmaster|rupc-cs04b|E|execd sub04n63 reports running state for job (21538.1/master) in queue "myrinet at sub04n63" while job is in state 65536
>>>05/22/2005 00:33:49|qmaster|rupc-cs04b|E|execd at sub04n63 reports running job (21538.1/master) in queue "myrinet at sub04n63" that was not supposed to be there - killing
>>>05/22/2005 02:10:01|qmaster|rupc-cs04b|E|event client "scheduler" (rupc-cs04b/schedd/1) reregistered - it will need a total update
>>>05/22/2005 02:30:26|qmaster|rupc-cs04b|E|orders user/project version (1035) is not uptodate (1036) for user/project "udo"
>>>05/22/2005 02:30:26|qmaster|rupc-cs04b|E|orders user/project version (1035) is not uptodate (1036) for user/project "iber"
>>>05/22/2005 02:30:26|qmaster|rupc-cs04b|E|orders user/project version (1035) is not uptodate (1036) for user/project "dieguez"
>>>05/22/2005 02:30:26|qmaster|rupc-cs04b|E|orders user/project version (1035) is not uptodate (1036) for user/project "zayak"
>>>05/22/2005 02:30:26|qmaster|rupc-cs04b|E|orders user/project version (1035) is not uptodate (1036) for user/project "karenjoh"
>>>05/22/2005 02:30:26|qmaster|rupc-cs04b|E|orders user/project version (1035) is not uptodate (1036) for user/project "lorenzo"
>>>05/22/2005 02:30:26|qmaster|rupc-cs04b|E|orders user/project version (1035) is not uptodate (1036) for user/project "parcolle"
>>>05/22/2005 02:30:26|qmaster|rupc-cs04b|E|orders user/project version (1035) is not uptodate (1036) for user/project "cfennie"
>>>05/22/2005 02:30:26|qmaster|rupc-cs04b|E|orders user/project version (1035) is not uptodate (1036) for user/project "civelli"
>>>05/22/2005 02:34:06|qmaster|rupc-cs04b|E|orders user/project version (1044) is not uptodate (1045) for user/project "udo"
>>>05/22/2005 02:34:06|qmaster|rupc-cs04b|E|orders user/project version (1044) is not uptodate (1045) for user/project "iber"
>>>05/22/2005 02:34:06|qmaster|rupc-cs04b|E|orders user/project version (1044) is not uptodate (1045) for user/project "dieguez"
>>>05/22/2005 02:34:06|qmaster|rupc-cs04b|E|orders user/project version (1044) is not uptodate (1045) for user/project "zayak"
>>>05/22/2005 02:34:06|qmaster|rupc-cs04b|E|orders user/project version (1044) is not uptodate (1045) for user/project "karenjoh"
>>>05/22/2005 02:34:06|qmaster|rupc-cs04b|E|orders user/project version (1044) is not uptodate (1045) for user/project "lorenzo"
>>>05/22/2005 02:34:06|qmaster|rupc-cs04b|E|orders user/project version (1044) is not uptodate (1045) for user/project "parcolle"
>>>05/22/2005 02:34:06|qmaster|rupc-cs04b|E|orders user/project version (1044) is not uptodate (1045) for user/project "cfennie"
>>>05/22/2005 02:34:06|qmaster|rupc-cs04b|E|orders user/project version (1044) is not uptodate (1045) for user/project "civelli"
>>>05/22/2005 03:02:47|qmaster|rupc-cs04b|E|tightly integrated parallel task 21539.1 task 3.sub04n83 failed - killing job
>>>05/22/2005 03:10:01|qmaster|rupc-cs04b|E|event client "scheduler" (rupc-cs04b/schedd/1) reregistered - it will need a total update    <-- YOU SEE THIS LINE: THE SCHEDULER DIED EVEN WITHOUT ANY EVENTS, JUST by itself!
>>>05/22/2005 07:30:01|qmaster|rupc-cs04b|E|event client "scheduler" (rupc-cs04b/schedd/1) reregistered - it will need a total update
>>>05/22/2005 11:11:39|qmaster|rupc-cs04b|E|event client "scheduler" (rupc-cs04b/schedd/1) reregistered - it will need a total update    <-- BEFORE THE LAST CRASH
>>>05/22/2005 14:07:53|qmaster|rupc-cs04b|E|tightly integrated parallel task 21507.1 task 10.sub04n88 failed - killing job    <-- THIS IS WHAT TRIGGERED THE CRASH
>>>05/22/2005 14:09:14|qmaster|rupc-cs04b|W|job 21507.1 failed on host sub04n78 assumedly after job because: job 21507.1 died through signal TERM (15)
>>>05/22/2005 14:10:00|qmaster|rupc-cs04b|E|event client "scheduler" (rupc-cs04b/schedd/1) reregistered - it will need a total update    <-- SCHEDULER START AFTER THE CRASH
>>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>SCHEDULER  messages  BELOW
>>>
>>>05/22/2005 00:20:01|schedd|rupc-cs04b|I|starting up 6.0u3
>>>05/22/2005 02:10:01|schedd|rupc-cs04b|I|starting up 6.0u3
>>>05/22/2005 02:30:26|schedd|rupc-cs04b|I|controlled shutdown 6.0u3
>>>05/22/2005 02:31:10|schedd|rupc-cs04b|I|starting up 6.0u3
>>>05/22/2005 02:34:06|schedd|rupc-cs04b|I|controlled shutdown 6.0u3
>>>05/22/2005 02:40:00|schedd|rupc-cs04b|I|starting up 6.0u3
>>>05/22/2005 03:10:01|schedd|rupc-cs04b|I|starting up 6.0u3
>>>05/22/2005 07:30:01|schedd|rupc-cs04b|I|starting up 6.0u3
>>>05/22/2005 11:11:39|schedd|rupc-cs04b|I|starting up 6.0u3    <--- before the last crash (I started debug mode)
>>>05/22/2005 14:10:00|schedd|rupc-cs04b|I|starting up 6.0u3    <--- AFTER the last crash
>>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net




More information about the gridengine-users mailing list