[GE users] "The Scheduler dies" COMPLETE information

Viktor Oudovenko udo at physics.rutgers.edu
Mon May 23 15:41:46 BST 2005


Thank you very much, Stephan; most probably issue 1416 is my problem. At
least all the symptoms are the same.
What a relief! Thanks a lot!
Just for information, when you have a chance to answer: if I move the
contents of the jobs directory somewhere else and then restart the
sgemaster (softstop), will all the running jobs be killed or not? From my
experiment with one of the jobs it was exactly as I said, i.e. the job got
killed.
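(Roughly, the sequence I tried looked like the following; the paths assume
the default cell and are from memory, not copied from my terminal:

   mv $SGE_ROOT/default/spool/qmaster/jobs /some/backup/dir   # move spooled jobs aside
   $SGE_ROOT/default/common/sgemaster stop                    # softstop qmaster and scheduler
   $SGE_ROOT/default/common/sgemaster                         # start them again
)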
v

> -----Original Message-----
> From: Stephan Grell - Sun Germany - SSG - Software Engineer 
> [mailto:stephan.grell at sun.com] 
> Sent: Monday, May 23, 2005 4:53
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] "The Scheduler dies" COMPLETE information
> 
> 
> The workaround is to remove all pe jobs, start the scheduler
> and then resubmit the pe jobs...
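> Something like this, for instance (commands from memory; <pe_name>,
> <arch> and <job_id> are placeholders):
> 
>    qstat -pe <pe_name>               # identify the pe jobs
>    qdel <job_id> ...                 # remove them (note their submit options first)
>    $SGE_ROOT/bin/<arch>/sge_schedd   # start the scheduler
>    qsub ...                          # resubmit the pe jobs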
> 
> u4 will be available soon. I do not know the date. Sorry. 
> However, you can compile it yourself by checking out the u4 tag.
> 
> Stephan
> 
> Viktor Oudovenko wrote:
> 
> >Hi, Stephan,
> >
> >Thank you for the answer.
> >When will u4 be issued, and where can I read about issue 1416?
> >
> >Meanwhile I have tried many things, but nothing has helped so far. As
> >to why my scheduler reregisters so often: after it dies I restart it
> >manually, simply by issuing the command $SGE_ROOT/bin/lx..../sge_schedd.
> >Then the information about reregistering appears.
> >
> >Thank you very much for your help.
> >v
> >
> >  
> >
> >>-----Original Message-----
> >>From: Stephan Grell - Sun Germany - SSG - Software Engineer
> >>[mailto:stephan.grell at sun.com] 
> >>Sent: Monday, May 23, 2005 3:45
> >>To: users at gridengine.sunsource.net
> >>Subject: Re: [GE users] "The Scheduler dies" COMPLETE information
> >>
> >>
> >>Hi Viktor,
> >>
> >>you are encountering issue 1416. It is fixed in u4.
> >>However, the important question is why your scheduler is
> >>reregistering so often.
> >>
> >>Stephan
> >>
> >>Viktor Oudovenko wrote:
> >>
> >>
> >>>Hi, Stephan and anybody who can help!
> >>>
> >>>Could you have a look at the attachment to see what is going on with
> >>>my scheduler? As you advised, I ran the scheduler daemon in dl 1 mode
> >>>and waited until it crashed. And it did. It even dies without any
> >>>events: you will find two lines in the messages file where the
> >>>scheduler died without any apparent reason. The last crash, however,
> >>>happened because one of the myrinet jobs finished. Could you give any
> >>>hint as to what it could be and what could be done? I am running
> >>>Linux SuSE 8.2 on the server and 9.0 and 9.2 on the slaves. I also
> >>>have a few Opterons (8 machines). I am happy to provide any further
> >>>information if necessary.
> >>>Please help.
> >>>
> >>>With kind regards,
> >>>Viktor
> >>>P.S. In the attachment I put not only the last iteration but also a
> >>>couple of successful ones. In debug mode the scheduler actually
> >>>updates its information every 5-10 seconds or so.
> >>>
> >>>>-----Original Message-----
> >>>>From: Stephan Grell - Sun Germany - SSG - Software Engineer 
> >>>>[mailto:stephan.grell at sun.com]
> >>>>Sent: Friday, May 20, 2005 3:05
> >>>>To: users at gridengine.sunsource.net
> >>>>Subject: Re: [GE users] Scheduler dies like a hell
> >>>>
> >>>>
> >>>>Hi,
> >>>>
> >>>>I am not sure that a corrupted file is the problem. The qmaster
> >>>>does some validation during startup. Could you run the scheduler
> >>>>in debug mode and post the output just before it dies?
> >>>>
> >>>>You can set the debug mode with:
> >>>>
> >>>>source $SGE_ROOT/<CELL>/common/settings.csh
> >>>>source $SGE_ROOT/util/dl.csh
> >>>>dl 1
> >>>>
> >>>>bin/<arch>/sge_schedd
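> >>>>
> >>>>(If I remember right, dl simply sets SGE_DEBUG_LEVEL in the current
> >>>>shell -- "echo $SGE_DEBUG_LEVEL" shows it -- so start sge_schedd from
> >>>>that same shell; it then stays in the foreground and writes its
> >>>>trace to the terminal.)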
> >>>>
> >>>>Or, do you have a stack trace of the scheduler?
> >>>>
> >>>>Which version are you running on which arch?
> >>>>
> >>>>Thanks,
> >>>>Stephan
> >>>>
> >>>>Viktor Oudovenko wrote:
> >>>>
> >>>>>Ron,
> >>>>>
> >>>>>Can I try to cat part of the accounting file? I mean, can I EDIT it
> >>>>>MANUALLY, even though it is written that one should not do that?
> >>>>>Best regards,
> >>>>>v
> >>>>>
> >>>>>>-----Original Message-----
> >>>>>>From: Ron Chen [mailto:ron_chen_123 at yahoo.com]
> >>>>>>Sent: Thursday, May 19, 2005 22:02
> >>>>>>To: users at gridengine.sunsource.net
> >>>>>>Subject: RE: [GE users] Scheduler dies like a hell
> >>>>>>
> >>>>>>
> >>>>>>It is not easy to find out which file gets corrupted :(
> >>>>>>
> >>>>>>One thing you can try is to move spooled job files (in
> >>>>>>default/spool/qmaster/jobs) to a backup directory.
> >>>>>>Also, you can use qconf to dump the configuration for
> >>>>>>the queues/users/hosts, and see if the values "make sense".
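> >>>>>>
> >>>>>>For example (commands from memory, see the qconf(1) man page; the
> >>>>>>backup path is just a placeholder):
> >>>>>>
> >>>>>>   mv default/spool/qmaster/jobs /some/backup/dir
> >>>>>>   qconf -sql      # list queues, then qconf -sq <queue> for each
> >>>>>>   qconf -sel      # list exec hosts, then qconf -se <host>
> >>>>>>   qconf -suserl   # list users, then qconf -suser <user>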
> >>>>>>
> >>>>>>Of course the best way to fix this is to restore from backup!
> >>>>>>
> >>>>>>-Ron
> >>>>>>
> >>>>>>
> >>>>>>--- Viktor Oudovenko <udo at physics.rutgers.edu> wrote:
> >>>>>>>Hi, Ron,
> >>>>>>>
> >>>>>>>I am using classic spooling.
> >>>>>>>Which file should I check for corruption? Can I edit it manually?
> >>>>>>>Thank you very much in advance.
> >>>>>>>v
> >>>>>>>
> >>>>>>>>-----Original Message-----
> >>>>>>>>From: Ron Chen [mailto:ron_chen_123 at yahoo.com]
> >>>>>>>>Sent: Thursday, May 19, 2005 20:38
> >>>>>>>>To: users at gridengine.sunsource.net
> >>>>>>>>Subject: RE: [GE users] Scheduler dies like a hell
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>Are you using classic spooling or Berkeley DB
> >>>>>>>>spooling?
> >>>>>>>>
> >>>>>>>>With classic spooling, when the machine crashes, the files may
> >>>>>>>>get corrupted. And when qmaster reads in the corrupted files, it
> >>>>>>>>may also corrupt the qmaster's data structures.
> >>>>>>>>
> >>>>>>>>IIRC, Berkeley DB handles recovery itself, but I have never
> >>>>>>>>played with it myself :)
> >>>>>>>>
> >>>>>>>>-Ron
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>--- Viktor Oudovenko <udo at physics.rutgers.edu> wrote:
> >>>>>>>>>Hi, Mac,
> >>>>>>>>>Thank you very much for your advice!
> >>>>>>>>>I'll try. I think one of the running or finished jobs wrote a
> >>>>>>>>>bad record somewhere (like the jobs directory).
> >>>>>>>>>Best regards,
> >>>>>>>>>v
> >>>>>>>>>
> >>>>>>>>>>-----Original Message-----
> >>>>>>>>>>From: McCalla, Mac [mailto:macmccalla at hess.com]
> >>>>>>>>>>Sent: Thursday, May 19, 2005 15:12
> >>>>>>>>>>To: users at gridengine.sunsource.net
> >>>>>>>>>>Subject: RE: [GE users] Scheduler dies like a hell
> >>>>>>>>>>
> >>>>>>>>>>Hi,
> >>>>>>>>>>
> >>>>>>>>>>Some things to look at: any messages in
> >>>>>>>>>>$SGE_ROOT/......../qmaster/schedd/messages ? To get more info
> >>>>>>>>>>about what the scheduler is doing while it is running, see the
> >>>>>>>>>>scheduler params profile and monitor; you can set them equal to
> >>>>>>>>>>1 to turn on some scheduler diagnostics, see man sched_conf.
> >>>>>>>>>>
> >>>>>>>>>>To extend the timeout value for the scheduler you can set
> >>>>>>>>>>qmaster_params SCHEDULER_TIMEOUT to some value greater than
> >>>>>>>>>>600 (seconds).
> >>>>>>>>>>
> >>>>>>>>>>You can also use the system command strace to get a trace of
> >>>>>>>>>>scheduler activity while it is running, to perhaps get a better
> >>>>>>>>>>idea of what it is spending its time on.
> >>>>>>>>>>
> >>>>>>>>>>Hope this helps,
> >>>>>>>>>>
> >>>>>>>>>>mac mccalla
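> >>>>>>>>>>
> >>>>>>>>>>PS: from memory, the knobs above are reached roughly like this;
> >>>>>>>>>>double-check against the sched_conf and sge_conf man pages:
> >>>>>>>>>>
> >>>>>>>>>>   qconf -msconf    # scheduler config: add PROFILE=1,MONITOR=1 to "params"
> >>>>>>>>>>   qconf -mconf     # global config: qmaster_params SCHEDULER_TIMEOUT=1200
> >>>>>>>>>>   strace -f -p <schedd_pid> -o /tmp/schedd.strace   # trace the running scheduler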
> >>>>>>>>>>
> >>>>>>>>>>-----Original Message-----
> >>>>>>>>>>From: Viktor Oudovenko [mailto:udo at physics.rutgers.edu]
> >>>>>>>>>>Sent: Thursday, May 19, 2005 12:00 PM
> >>>>>>>>>>To: users at gridengine.sunsource.net
> >>>>>>>>>>Subject: [GE users] Scheduler dies like a hell
> >>>>>>>>>>
> >>>>>>>>>>Hi, everybody,
> >>>>>>>>>>
> >>>>>>>>>>I am asking for your help and ideas on what could be done to
> >>>>>>>>>>restore normal operation of the scheduler. First, what happened:
> >>>>>>>>>>a few times during the last week our main server died, and I
> >>>>>>>>>>needed to reboot it and even replace it. The jobs, which used
> >>>>>>>>>>automount, kept running. But since yesterday or the day before,
> >>>>>>>>>>the scheduler daemon keeps dying. I tried to restart sge_master
> >>>>>>>>>>but it did not help. Now whenever the daemon dies I start it
> >>>>>>>>>>manually, simply typing:
> >>>>>>>>>>
> >>>>>>>>>>/opt/SGE/bin/lx24-x86/sge_schedd
> >>>>>>>>>>
> >>>>>>>>>>but after some time it dies again. Please advise: what could it
> >>>>>>>>>>be?
> >>>>>>>>>>
> >>>>>>>>>>Below please find some info from the messages file:
> >>>>>>>>>>
> >>>>>>>>>>05/19/2005 01:02:37|qmaster|rupc-cs04b|E|no execd known on host sub04n87 to send conf notification
> >>>>>>>>>>05/19/2005 01:02:37|qmaster|rupc-cs04b|E|no execd known on host sub04n88 to send conf notification
> >>>>>>>>>>05/19/2005 01:02:37|qmaster|rupc-cs04b|E|no execd known on host sub04n89 to send conf notification
> >>>>>>>>>>05/19/2005 01:02:37|qmaster|rupc-cs04b|E|no execd known on host sub04n90 to send conf notification
> >>>>>>>>>>05/19/2005 01:02:37|qmaster|rupc-cs04b|E|no execd known on host sub04n91 to send conf notification
> >>>>>>>>>>05/19/2005 01:02:37|qmaster|rupc-cs04b|E|no execd known on host rupc04.rutgers.edu to send conf notification
> >>>>>>>>>>05/19/2005 01:02:37|qmaster|rupc-cs04b|I|starting up 6.0u3
> >>>>>>>>>>05/19/2005 01:08:11|qmaster|rupc-cs04b|E|commlib error: got read error (closing connection)
> >>>>>>>>>>05/19/2005 01:11:06|qmaster|rupc-cs04b|E|event client "scheduler" (rupc-cs04b/schedd/1) reregistered - it will need a total update
> >>>>>>>>>>05/19/2005 01:24:31|qmaster|rupc-cs04b|W|job 21171.1 failed on host sub04n203 assumedly after job because: job 21171.1 died through signal TERM (15)
> >>>>>>>>>>05/19/2005 05:17:19|qmaster|rupc-cs04b|E|acknowledge timeout after 600 seconds for event client (schedd:1) on host "rupc-cs04b"
> >>>>>>>>>>05/19/2005 09:29:03|qmaster|rupc-cs04b|W|job 21060.1 failed on host sub04n74 assumedly after job because: job 21060.1 died through signal TERM (15)
> >>>>>>>>>>05/19/2005 09:30:37|qmaster|rupc-cs04b|E|event client "scheduler" (rupc-cs04b/schedd/1) reregistered - it will need a total update
> >>>>>>>>>>05/19/2005 11:04:21|qmaster|rupc-cs04b|W|job 20222.1 failed on host sub04n29 assumedly after job because: job 20222.1 died through signal KILL (9)
> >>>>>>>>>>05/19/2005 11:05:50|qmaster|rupc-cs04b|W|job 21212.1 failed on host sub04n25 assumedly after job because: job 21212.1 died through signal KILL (9)
> >>>>>>>>>>05/19/2005 12:04:51|qmaster|rupc-cs04b|E|acknowledge timeout after 600 seconds for event client (schedd:1) on host "rupc-cs04b"
> >>>>>>=== message truncated ===
> >>>>>>
> >>>
> >>>128133  25368 16384     SENDING 22 ORDERS TO QMASTER
> >>>128134  25368 16384     RESETTING BUSY STATE OF EVENT CLIENT
> >>>128135  25368 16384     reresolve port timeout in 340
> >>>128136  25368 16384     returning cached port value: 536
> >>>--------------STOP-SCHEDULER-RUN-------------
> >>>128137  25368 16384     ec_get retrieving events - will do max 20 fetches
> >>>128138  25368 16384     doing sync fetch for messages, 20 still to do
> >>>128139  25368 16384     try to get request from qmaster, id 1
> >>>128140  25368 16384     Checking 55 events (44303-44357) while waiting for #44303
> >>>128141  25368 16384     check complete, 55 events in list
> >>>128142  25368 16384     got 55 events till 44357
> >>>128143  25368 16384     doing async fetch for messages, 19 still to do
> >>>128144  25368 16384     try to get request from qmaster, id 1
> >>>128145  25368 16384     reresolve port timeout in 320
> >>>128146  25368 16384     returning cached port value: 536
> >>>128147  25368 16384     Sent ack for all events lower or equal 44357
> >>>128148  25368 16384     ec_get - received 55 events
> >>>128149  25368 16384     44303. EVENT MOD EXECHOST sub04n147
> >>>128150  25368 16384     44304. EVENT MOD USER udo
> >>>128151  25368 16384     44305. EVENT MOD USER iber
> >>>128152  25368 16384     44306. EVENT MOD USER dieguez
> >>>128153  25368 16384     44307. EVENT MOD USER karenjoh
> >>>128154  25368 16384     44308. EVENT MOD USER lorenzo
> >>>128155  25368 16384     44309. EVENT MOD USER parcolle
> >>>128156  25368 16384     44310. EVENT MOD USER cfennie
> >>>128157  25368 16384     44311. EVENT MOD USER civelli
> >>>128158  25368 16384     44312. EVENT MOD EXECHOST sub04n135
> >>>128159  25368 16384     44313. EVENT MOD EXECHOST sub04n141
> >>>128160  25368 16384     44314. EVENT MOD EXECHOST sub04n127
> >>>128161  25368 16384     44315. EVENT MOD EXECHOST sub04n145
> >>>128162  25368 16384     44316. EVENT MOD EXECHOST sub04n133
> >>>128163  25368 16384     44317. EVENT MOD EXECHOST sub04n148
> >>>128164  25368 16384     44318. EVENT MOD EXECHOST sub04n74
> >>>128165  25368 16384     44319. EVENT JOB 21542.1 task 2.sub04n74 USAGE
> >>>128166  25368 16384     44320. EVENT JOB 21542.1 task 1.sub04n74 USAGE
> >>>128167  25368 16384     44321. EVENT MOD EXECHOST rupc03.rutgers.edu
> >>>128168  25368 16384     44322. EVENT MOD EXECHOST sub04n139
> >>>128169  25368 16384     44323. EVENT MOD EXECHOST rupc02.rutgers.edu
> >>>128170  25368 16384     44324. EVENT MOD EXECHOST sub04n80
> >>>128171  25368 16384     44325. EVENT MOD EXECHOST sub04n207
> >>>128172  25368 16384     44326. EVENT MOD EXECHOST sub04n180
> >>>128173  25368 16384     44327. EVENT MOD EXECHOST sub04n23
> >>>128174  25368 16384     44328. EVENT MOD EXECHOST sub04n30
> >>>128175  25368 16384     44329. EVENT MOD EXECHOST sub04n203
> >>>128176  25368 16384     44330. EVENT MOD EXECHOST sub04n109
> >>>128177  25368 16384     44331. EVENT MOD EXECHOST rupc04.rutgers.edu
> >>>128178  25368 16384     44332. EVENT MOD EXECHOST sub04n114
> >>>128179  25368 16384     44333. EVENT MOD EXECHOST sub04n106
> >>>128180  25368 16384     44334. EVENT MOD EXECHOST sub04n88
> >>>128181  25368 16384     44335. EVENT JOB 21507.1 task 6.sub04n88 USAGE
> >>>128182  25368 16384     44336. EVENT JOB 21507.1 task 5.sub04n88 USAGE
> >>>128183  25368 16384     44337. EVENT MOD EXECHOST sub04n157
> >>>128184  25368 16384     44338. EVENT MOD EXECHOST sub04n20
> >>>128185  25368 16384     44339. EVENT MOD EXECHOST sub04n156
> >>>128186  25368 16384     44340. EVENT MOD EXECHOST sub04n26
> >>>128187  25368 16384     44341. EVENT JOB 21213.1 USAGE
> >>>128188  25368 16384     44342. EVENT MOD EXECHOST sub04n05
> >>>128189  25368 16384     44343. EVENT MOD EXECHOST sub04n103
> >>>128190  25368 16384     44344. EVENT MOD EXECHOST sub04n164
> >>>128191  25368 16384     44345. EVENT MOD EXECHOST sub04n09
> >>>128192  25368 16384     44346. EVENT MOD EXECHOST sub04n105
> >>>128193  25368 16384     44347. EVENT MOD EXECHOST sub04n113
> >>>128194  25368 16384     44348. EVENT MOD EXECHOST sub04n28
> >>>128195  25368 16384     44349. EVENT MOD EXECHOST sub04n76
> >>>128196  25368 16384     44350. EVENT MOD EXECHOST sub04n162
> >>>128197  25368 16384     44351. EVENT MOD EXECHOST sub04n108
> >>>128198  25368 16384     44352. EVENT MOD EXECHOST sub04n38
> >>>128199  25368 16384     44353. EVENT MOD EXECHOST sub04n04
> >>>128200  25368 16384     44354. EVENT MOD EXECHOST sub04n116
> >>>128201  25368 16384     44355. EVENT MOD EXECHOST sub04n179
> >>>128202  25368 16384     44356. EVENT MOD EXECHOST sub04n160
> >>>128203  25368 16384     44357. EVENT MOD EXECHOST sub04n107
> >>>Q:169, AQ:343 J:19(19), H:169(170), C:49, A:4, D:3, P:7, CKPT:0 US:15 PR:4 S:nd:12/lf:7
> >>>128204  25368 16384     ================[SCHEDULING-EPOCH]==================
> >>>128205  25368 16384     JOB 20937.1 start_time = 1116447112 running_time 338079 decay_time = 450
> >>>128206  25368 16384     JOB 20938.1 start_time = 1116374344 running_time 410847 decay_time = 450
> >>>128207  25368 16384     JOB 21040.1 start_time = 1116443073 running_time 342118 decay_time = 450
> >>>128208  25368 16384     JOB 21076.1 start_time = 1116451351 running_time 333840 decay_time = 450
> >>>128209  25368 16384     JOB 21210.1 start_time = 1116514970 running_time 270221 decay_time = 450
> >>>128210  25368 16384     JOB 21213.1 start_time = 1116515250 running_time 269941 decay_time = 450
> >>>128211  25368 16384     JOB 21338.1 start_time = 1116543252 running_time 241939 decay_time = 450
> >>>128212  25368 16384     JOB 21423.1 start_time = 1116629274 running_time 155917 decay_time = 450
> >>>128213  25368 16384     JOB 21424.1 start_time = 1116631365 running_time 153826 decay_time = 450
> >>>128214  25368 16384     JOB 21440.1 start_time = 1116632934 running_time 152257 decay_time = 450
> >>>128215  25368 16384     JOB 21441.1 start_time = 1116632994 running_time 152197 decay_time = 450
> >>>128216  25368 16384     JOB 21443.1 start_time = 1116633602 running_time 151589 decay_time = 450
> >>>128217  25368 16384     JOB 21474.1 start_time = 1116655118 running_time 130073 decay_time = 450
> >>>128218  25368 16384     JOB 21503.1 start_time = 1116707395 running_time 77796 decay_time = 450
> >>>128219  25368 16384     JOB 21507.1 start_time = 1116714061 running_time 71130 decay_time = 450
> >>>128220  25368 16384     JOB 21528.1 start_time = 1116707641 running_time 77550 decay_time = 450
> >>>128221  25368 16384     JOB 21530.1 start_time = 1116714453 running_time 70738 decay_time = 450
> >>>128222  25368 16384     JOB 21537.1 start_time = 1116724845 running_time 60346 decay_time = 450
> >>>128223  25368 16384     JOB 21542.1 start_time = 1116782511 running_time 2680 decay_time = 450
> >>>128224  25368 16384     verified threshold of 169 queues
> >>>128225  25368 16384     queue myrinet at sub04n61 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128226  25368 16384     queue myrinet at sub04n62 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128227  25368 16384     queue myrinet at sub04n65 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128228  25368 16384     queue myrinet at sub04n66 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128229  25368 16384     queue myrinet at sub04n67 tagged to be overloaded: load_avg=2.020000 (no load adjustment) >= 1.4
> >>>128230  25368 16384     queue myrinet at sub04n68 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128231  25368 16384     queue myrinet at sub04n69 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128232  25368 16384     queue myrinet at sub04n70 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128233  25368 16384     queue myrinet at sub04n71 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128234  25368 16384     queue myrinet at sub04n72 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128235  25368 16384     queue myrinet at sub04n75 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
> >>>128236  25368 16384     queue myrinet at sub04n77 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128237  25368 16384     queue myrinet at sub04n78 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128238  25368 16384     queue myrinet at sub04n79 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128239  25368 16384     queue myrinet at sub04n81 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128240  25368 16384     queue myrinet at sub04n84 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128241  25368 16384     queue myrinet at sub04n85 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128242  25368 16384     queue myrinet at sub04n86 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128243  25368 16384     queue myrinet at sub04n87 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
> >>>128244  25368 16384     queue myrinet at sub04n88 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128245  25368 16384     queue myrinet at sub04n89 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128246  25368 16384     queue myrinet at sub04n90 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128247  25368 16384     queue myrinet at sub04n91 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
> >>>128248  25368 16384     queue myrinet at sub04n63 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
> >>>128249  25368 16384     queue myrinet at sub04n64 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128250  25368 16384     queue myrinet at sub04n73 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128251  25368 16384     queue myrinet at sub04n74 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
> >>>128252  25368 16384     queue opteronp at sub04n202 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
> >>>128253  25368 16384     queue opteronp at sub04n205 tagged to be overloaded: load_medium=1.010000 (no load adjustment) >= 1.0
> >>>128254  25368 16384     queue opteronp at sub04n206 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
> >>>128255  25368 16384     queue opteronp at sub04n208 tagged to be overloaded: load_medium=1.010000 (no load adjustment) >= 1.0
> >>>128256  25368 16384     queue parallel at sub04n121 tagged to be overloaded: load_avg=2.020000 (no load adjustment) >= 1.4
> >>>128257  25368 16384     queue parallel at sub04n139 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128258  25368 16384     queue parallel at sub04n140 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128259  25368 16384     queue parallel at sub04n141 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128260  25368 16384     queue parallel at sub04n142 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128261  25368 16384     queue parallel at sub04n143 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128262  25368 16384     queue parallel at sub04n144 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128263  25368 16384     queue parallel at sub04n146 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128264  25368 16384     queue parallel at sub04n02 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128265  25368 16384     queue parallel at sub04n03 tagged to be overloaded: load_avg=2.020000 (no load adjustment) >= 1.4
> >>>128266  25368 16384     queue parallel at sub04n04 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128267  25368 16384     queue parallel at sub04n05 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128268  25368 16384     queue parallel at sub04n06 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128269  25368 16384     queue parallel at sub04n07 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128270  25368 16384     queue parallel at sub04n08 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128271  25368 16384     queue parallel at sub04n09 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128272  25368 16384     queue parallel at sub04n10 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128273  25368 16384     queue parallel at sub04n11 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128274  25368 16384     verified threshold of 169 queues
> >>>128275  25368 16384     STARTING PASS 1 WITH 0 PENDING JOBS
> >>>128276  25368 16384     Not enrolled ja_tasks: 0
> >>>128277  25368 16384     Enrolled ja_tasks: 1
> >>>128278  25368 16384     Not enrolled ja_tasks: 0
> >>>128279  25368 16384     Enrolled ja_tasks: 1
> >>>128280  25368 16384     Not enrolled ja_tasks: 0
> >>>128281  25368 16384     Enrolled ja_tasks: 1
> >>>128282  25368 16384     Not enrolled ja_tasks: 0
> >>>128283  25368 16384     Enrolled ja_tasks: 1
> >>>128284  25368 16384     Not enrolled ja_tasks: 0
> >>>128285  25368 16384     Enrolled ja_tasks: 1
> >>>128286  25368 16384     Not enrolled ja_tasks: 0
> >>>128287  25368 16384     Enrolled ja_tasks: 1
> >>>128288  25368 16384     Not enrolled ja_tasks: 0
> >>>128289  25368 16384     Enrolled ja_tasks: 1
> >>>128290  25368 16384     Not enrolled ja_tasks: 0
> >>>128291  25368 16384     Enrolled ja_tasks: 1
> >>>128292  25368 16384     Not enrolled ja_tasks: 0
> >>>128293  25368 16384     Enrolled ja_tasks: 1
> >>>128294  25368 16384     Not enrolled ja_tasks: 0
> >>>128295  25368 16384     Enrolled ja_tasks: 1
> >>>128296  25368 16384     Not enrolled ja_tasks: 0
> >>>128297  25368 16384     Enrolled ja_tasks: 1
> >>>128298  25368 16384     Not enrolled ja_tasks: 0
> >>>128299  25368 16384     Enrolled ja_tasks: 1
> >>>128300  25368 16384     Not enrolled ja_tasks: 0
> >>>128301  25368 16384     Enrolled ja_tasks: 1
> >>>128302  25368 16384     Not enrolled ja_tasks: 0
> >>>128303  25368 16384     Enrolled ja_tasks: 1
> >>>128304  25368 16384     Not enrolled ja_tasks: 0
> >>>128305  25368 16384     Enrolled ja_tasks: 1
> >>>128306  25368 16384     Not enrolled ja_tasks: 0
> >>>128307  25368 16384     Enrolled ja_tasks: 1
> >>>128308  25368 16384     Not enrolled ja_tasks: 0
> >>>128309  25368 16384     Enrolled ja_tasks: 1
> >>>128310  25368 16384     Not enrolled ja_tasks: 0
> >>>128311  25368 16384     Enrolled ja_tasks: 1
> >>>128312  25368 16384     Not enrolled ja_tasks: 0
> >>>128313  25368 16384     Enrolled ja_tasks: 1
> >>>128314  25368 16384     STARTING PASS 2 WITH 0 PENDING JOBS
> >>>128315  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
> >>>128316  25368 16384        slots: 1.000000 * 1000.000000 * 6    ---> 6000.000000
> >>>128317  25368 16384     slot request assumed for static urgency is 20 for ,20-64 PE range due to PE's "mpi" setting "min"
> >>>128318  25368 16384        slots: 1.000000 * 1000.000000 * 20    ---> 20000.000000
> >>>128319  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
> >>>128320  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
> >>>128321  25368 16384        slots: 1.000000 * 1000.000000 * 6    ---> 6000.000000
> >>>128322  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
> >>>128323  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
> >>>128324  25368 16384     slot request assumed for static urgency is 2 for ,2-8 PE range due to PE's "mpich_myri" setting "min"
> >>>128325  25368 16384        slots: 1.000000 * 1000.000000 * 2    ---> 2000.000000
> >>>128326  25368 16384        slots: 1.000000 * 1000.000000 * 8    ---> 8000.000000
> >>>128327  25368 16384     ASU min = 1000.00000000000, ASU max = 20000.00000000000
> >>>128328  25368 16384     
> >>>128329  25368 16384     no DDJU: do_usage: 1 finished_jobs 0
> >>>128330  25368 16384     
> >>>128331  25368 16384     =====================[Pass 0]======================
> >>>128332  25368 16384     =====================[Pass 1]======================
> >>>128333  25368 16384     =====================[Pass 2]======================
> >>>128334  25368 16384     
> >>>128335  25368 16384     no DDJU: do_usage: 0 finished_jobs 0
> >>>128336  25368 16384     
> >>>128337  25368 16384     =====================[Pass 0]======================
> >>>128338  25368 16384     =====================[Pass 1]======================
> >>>128339  25368 16384     =====================[Pass 2]======================
> >>>128340  25368 16384     Normalizing tickets using 0.000000/18.333333 as min_tix/max_tix
> >>>128341  25368 16384        got 19 running jobs
> >>>128342  25368 16384        added 19 ticket orders for running jobs
> >>>128343  25368 16384        added 1 orders for updating usage of user
> >>>128344  25368 16384        added 0 orders for updating usage of project
> >>>128345  25368 16384        added 0 orders for updating share tree
> >>>128346  25368 16384        added 1 orders for scheduler configuration
> >>>128347  25368 16384     SENDING 22 ORDERS TO QMASTER
> >>>128348  25368 16384     RESETTING BUSY STATE OF EVENT CLIENT
> >>>128349  25368 16384     reresolve port timeout in 320
> >>>128350  25368 16384     returning cached port value: 536
> >>>--------------STOP-SCHEDULER-RUN-------------
> >>>128351  25368 16384     ec_get retrieving events - will do max 20 fetches
> >>>128352  25368 16384     doing sync fetch for messages, 20 still to do
> >>>128353  25368 16384     try to get request from qmaster, id 1
> >>>128354  25368 16384     Checking 120 events (44358-44477) while waiting for #44358
> >>>128355  25368 16384     check complete, 120 events in list
> >>>128356  25368 16384     got 120 events till 44477
> >>>128357  25368 16384     doing async fetch for messages, 19 still to do
> >>>128358  25368 16384     try to get request from qmaster, id 1
> >>>128359  25368 16384     reresolve port timeout in 300
> >>>128360  25368 16384     returning cached port value: 536
> >>>128361  25368 16384     Sent ack for all events lower or equal 44477
> >>>128362  25368 16384     ec_get - received 120 events
> >>>128363  25368 16384     44358. EVENT MOD EXECHOST sub04n166
> >>>128364  25368 16384     44359. EVENT MOD EXECHOST sub04n90
> >>>128365  25368 16384     44360. EVENT JOB 21503.1 task 2.sub04n90 USAGE
> >>>128366  25368 16384     44361. EVENT JOB 21503.1 task 1.sub04n90 USAGE
> >>>128367  25368 16384     44362. EVENT MOD EXECHOST sub04n168
> >>>128368  25368 16384     44363. EVENT MOD EXECHOST sub04n112
> >>>128369  25368 16384     44364. EVENT MOD EXECHOST sub04n08
> >>>128370  25368 16384     44365. EVENT MOD EXECHOST sub04n75
> >>>128371  25368 16384     44366. EVENT JOB 21040.1 task 6.sub04n75 USAGE
> >>>128372  25368 16384     44367. EVENT JOB 21040.1 task 5.sub04n75 USAGE
> >>>128373  25368 16384     44368. EVENT MOD USER udo
> >>>128374  25368 16384     44369. EVENT MOD USER iber
> >>>128375  25368 16384     44370. EVENT MOD USER dieguez
> >>>128376  25368 16384     44371. EVENT MOD USER karenjoh
> >>>128377  25368 16384     44372. EVENT MOD USER lorenzo
> >>>128378  25368 16384     44373. EVENT MOD USER parcolle
> >>>128379  25368 16384     44374. EVENT MOD USER cfennie
> >>>128380  25368 16384     44375. EVENT MOD USER civelli
> >>>128381  25368 16384     44376. EVENT MOD EXECHOST sub04n14
> >>>128382  25368 16384     44377. EVENT MOD EXECHOST sub04n150
> >>>128383  25368 16384     44378. EVENT MOD EXECHOST sub04n169
> >>>128384  25368 16384     44379. EVENT MOD EXECHOST sub04n165
> >>>128385  25368 16384     44380. EVENT MOD EXECHOST sub04n136
> >>>128386  25368 16384     44381. EVENT MOD EXECHOST sub04n81
> >>>128387  25368 16384     44382. EVENT JOB 21507.1 task 6.sub04n81 USAGE
> >>>128388  25368 16384     44383. EVENT JOB 21507.1 task 5.sub04n81 USAGE
> >>>128389  25368 16384     44384. EVENT MOD EXECHOST sub04n176
> >>>128390  25368 16384     44385. EVENT MOD EXECHOST sub04n161
> >>>128391  25368 16384     44386. EVENT MOD EXECHOST sub04n124
> >>>128392  25368 16384     44387. EVENT MOD EXECHOST sub04n01
> >>>128393  25368 16384     44388. EVENT MOD EXECHOST sub04n158
> >>>128394  25368 16384     44389. EVENT MOD EXECHOST sub04n159
> >>>128395  25368 16384     44390. EVENT MOD EXECHOST sub04n134
> >>>128396  25368 16384     44391. EVENT MOD EXECHOST sub04n143
> >>>128397  25368 16384     44392. EVENT MOD EXECHOST sub04n121
> >>>128398  25368 16384     44393. EVENT MOD EXECHOST sub04n15
> >>>128399  25368 16384     44394. EVENT MOD EXECHOST sub04n13
> >>>128400  25368 16384     44395. EVENT MOD EXECHOST sub04n118
> >>>128401  25368 16384     44396. EVENT MOD EXECHOST sub04n64
> >>>128402  25368 16384     44397. EVENT JOB 21542.1 task 2.sub04n64 USAGE
> >>>128403  25368 16384     44398. EVENT JOB 21542.1 task 1.sub04n64 USAGE
> >>>128404  25368 16384     44399. EVENT MOD EXECHOST sub04n151
> >>>128405  25368 16384     44400. EVENT MOD EXECHOST sub04n154
> >>>128406  25368 16384     44401. EVENT MOD EXECHOST sub04n149
> >>>128407  25368 16384     44402. EVENT MOD EXECHOST sub04n16
> >>>128408  25368 16384     44403. EVENT MOD EXECHOST sub04n155
> >>>128409  25368 16384     44404. EVENT MOD EXECHOST sub04n152
> >>>128410  25368 16384     44405. EVENT MOD EXECHOST sub04n163
> >>>128411  25368 16384     44406. EVENT MOD EXECHOST sub04n86
> >>>128412  25368 16384     44407. EVENT JOB 21423.1 task 2.sub04n86 USAGE
> >>>128413  25368 16384     44408. EVENT JOB 21423.1 task 1.sub04n86 USAGE
> >>>128414  25368 16384     44409. EVENT MOD EXECHOST sub04n43
> >>>128415  25368 16384     44410. EVENT MOD EXECHOST sub04n204
> >>>128416  25368 16384     44411. EVENT MOD EXECHOST rupc01.rutgers.edu
> >>>128417  25368 16384     44412. EVENT MOD EXECHOST sub04n125
> >>>128418  25368 16384     44413. EVENT MOD EXECHOST sub04n03
> >>>128419  25368 16384     44414. EVENT JOB 21076.1 USAGE
> >>>128420  25368 16384     44415. EVENT MOD EXECHOST sub04n44
> >>>128421  25368 16384     44416. EVENT MOD EXECHOST sub04n32
> >>>128422  25368 16384     44417. EVENT MOD EXECHOST sub04n21
> >>>128423  25368 16384     44418. EVENT MOD EXECHOST sub04n22
> >>>128424  25368 16384     44419. EVENT MOD EXECHOST sub04n35
> >>>128425  25368 16384     44420. EVENT MOD EXECHOST sub04n201
> >>>128426  25368 16384     44421. EVENT MOD EXECHOST sub04n146
> >>>128427  25368 16384     44422. EVENT MOD EXECHOST sub04n111
> >>>128428  25368 16384     44423. EVENT MOD EXECHOST sub04n177
> >>>128429  25368 16384     44424. EVENT MOD EXECHOST sub04n89
> >>>128430  25368 16384     44425. EVENT JOB 21530.1 task 2.sub04n89 USAGE
> >>>128431  25368 16384     44426. EVENT JOB 21530.1 task 1.sub04n89 USAGE
> >>>128432  25368 16384     44427. EVENT JOB 21530.1 USAGE
> >>>128433  25368 16384     44428. EVENT MOD EXECHOST sub04n205
> >>>128434  25368 16384     44429. EVENT JOB 21440.1 USAGE
> >>>128435  25368 16384     44430. EVENT MOD EXECHOST sub04n208
> >>>128436  25368 16384     44431. EVENT JOB 21528.1 USAGE
> >>>128437  25368 16384     44432. EVENT MOD EXECHOST sub04n104
> >>>128438  25368 16384     44433. EVENT MOD EXECHOST sub04n24
> >>>128439  25368 16384     44434. EVENT JOB 21210.1 USAGE
> >>>128440  25368 16384     44435. EVENT MOD EXECHOST sub04n18
> >>>128441  25368 16384     44436. EVENT MOD EXECHOST sub04n31
> >>>128442  25368 16384     44437. EVENT JOB 20937.1 USAGE
> >>>128443  25368 16384     44438. EVENT MOD EXECHOST sub04n202
> >>>128444  25368 16384     44439. EVENT JOB 21443.1 USAGE
> >>>128445  25368 16384     44440. EVENT MOD EXECHOST sub04n171
> >>>128446  25368 16384     44441. EVENT MOD EXECHOST sub04n37
> >>>128447  25368 16384     44442. EVENT MOD EXECHOST sub04n36
> >>>128448  25368 16384     44443. EVENT MOD EXECHOST sub04n40
> >>>128449  25368 16384     44444. EVENT MOD EXECHOST sub04n12
> >>>128450  25368 16384     44445. EVENT MOD EXECHOST sub04n172
> >>>128451  25368 16384     44446. EVENT MOD EXECHOST sub04n79
> >>>128452  25368 16384     44447. EVENT JOB 21040.1 task 6.sub04n79 USAGE
> >>>128453  25368 16384     44448. EVENT JOB 21040.1 task 5.sub04n79 USAGE
> >>>128454  25368 16384     44449. EVENT JOB 21040.1 USAGE
> >>>128455  25368 16384     44450. EVENT MOD EXECHOST sub04n61
> >>>128456  25368 16384     44451. EVENT JOB 21040.1 task 6.sub04n61 USAGE
> >>>128457  25368 16384     44452. EVENT JOB 21040.1 task 5.sub04n61 USAGE
> >>>128458  25368 16384     44453. EVENT MOD EXECHOST sub04n170
> >>>128459  25368 16384     44454. EVENT MOD EXECHOST sub04n41
> >>>128460  25368 16384     44455. EVENT JOB 20938.1 USAGE
> >>>128461  25368 16384     44456. EVENT MOD EXECHOST sub04n153
> >>>128462  25368 16384     44457. EVENT MOD EXECHOST sub04n39
> >>>128463  25368 16384     44458. EVENT MOD EXECHOST sub04n83
> >>>128464  25368 16384     44459. EVENT MOD EXECHOST sub04n82
> >>>128465  25368 16384     44460. EVENT MOD EXECHOST sub04n174
> >>>128466  25368 16384     44461. EVENT MOD EXECHOST sub04n173
> >>>128467  25368 16384     44462. EVENT MOD EXECHOST sub04n85
> >>>128468  25368 16384     44463. EVENT JOB 21423.1 task 2.sub04n85 USAGE
> >>>128469  25368 16384     44464. EVENT JOB 21423.1 task 1.sub04n85 USAGE
> >>>128470  25368 16384     44465. EVENT MOD EXECHOST sub04n68
> >>>128471  25368 16384     44466. EVENT JOB 21474.1 task 14.sub04n68 USAGE
> >>>128472  25368 16384     44467. EVENT JOB 21474.1 task 13.sub04n68 USAGE
> >>>128473  25368 16384     44468. EVENT MOD EXECHOST beowulf.rutgers.edu
> >>>128474  25368 16384     44469. EVENT MOD EXECHOST sub04n91
> >>>128475  25368 16384     44470. EVENT JOB 21423.1 task 2.sub04n91 USAGE
> >>>128476  25368 16384     44471. EVENT JOB 21423.1 task 1.sub04n91 USAGE
> >>>128477  25368 16384     44472. EVENT JOB 21423.1 USAGE
> >>>128478  25368 16384     44473. EVENT MOD EXECHOST sub04n29
> >>>128479  25368 16384     44474. EVENT MOD EXECHOST sub04n69
> >>>128480  25368 16384     44475. EVENT JOB 21474.1 task 14.sub04n69 USAGE
> >>>128481  25368 16384     44476. EVENT JOB 21474.1 task 13.sub04n69 USAGE
> >>>128482  25368 16384     44477. EVENT MOD EXECHOST sub04n175
> >>>Q:169, AQ:343 J:19(19), H:169(170), C:49, A:4, D:3, P:7, CKPT:0 US:15 PR:4 S:nd:12/lf:7
> >>>128483  25368 16384     ================[SCHEDULING-EPOCH]==================
> >>>128484  25368 16384     JOB 20937.1 start_time = 1116447112 running_time 338099 decay_time = 450
> >>>128485  25368 16384     JOB 20938.1 start_time = 1116374344 running_time 410867 decay_time = 450
> >>>128486  25368 16384     JOB 21040.1 start_time = 1116443073 running_time 342138 decay_time = 450
> >>>128487  25368 16384     JOB 21076.1 start_time = 1116451351 running_time 333860 decay_time = 450
> >>>128488  25368 16384     JOB 21210.1 start_time = 1116514970 running_time 270241 decay_time = 450
> >>>128489  25368 16384     JOB 21213.1 start_time = 1116515250 running_time 269961 decay_time = 450
> >>>128490  25368 16384     JOB 21338.1 start_time = 1116543252 running_time 241959 decay_time = 450
> >>>128491  25368 16384     JOB 21423.1 start_time = 1116629274 running_time 155937 decay_time = 450
> >>>128492  25368 16384     JOB 21424.1 start_time = 1116631365 running_time 153846 decay_time = 450
> >>>128493  25368 16384     JOB 21440.1 start_time = 1116632934 running_time 152277 decay_time = 450
> >>>128494  25368 16384     JOB 21441.1 start_time = 1116632994 running_time 152217 decay_time = 450
> >>>128495  25368 16384     JOB 21443.1 start_time = 1116633602 running_time 151609 decay_time = 450
> >>>128496  25368 16384     JOB 21474.1 start_time = 1116655118 running_time 130093 decay_time = 450
> >>>128497  25368 16384     JOB 21503.1 start_time = 1116707395 running_time 77816 decay_time = 450
> >>>128498  25368 16384     JOB 21507.1 start_time = 1116714061 running_time 71150 decay_time = 450
> >>>128499  25368 16384     JOB 21528.1 start_time = 1116707641 running_time 77570 decay_time = 450
> >>>128500  25368 16384     JOB 21530.1 start_time = 1116714453 running_time 70758 decay_time = 450
> >>>128501  25368 16384     JOB 21537.1 start_time = 1116724845 running_time 60366 decay_time = 450
> >>>128502  25368 16384     JOB 21542.1 start_time = 1116782511 running_time 2700 decay_time = 450
> >>>128503  25368 16384     verified threshold of 169 queues
> >>>128504  25368 16384     queue myrinet at sub04n61 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128505  25368 16384     queue myrinet at sub04n62 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128506  25368 16384     queue myrinet at sub04n65 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128507  25368 16384     queue myrinet at sub04n66 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128508  25368 16384     queue myrinet at sub04n67 tagged to be overloaded: load_avg=2.020000 (no load adjustment) >= 1.4
> >>>128509  25368 16384     queue myrinet at sub04n68 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128510  25368 16384     queue myrinet at sub04n69 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128511  25368 16384     queue myrinet at sub04n70 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128512  25368 16384     queue myrinet at sub04n71 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128513  25368 16384     queue myrinet at sub04n72 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128514  25368 16384     queue myrinet at sub04n75 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
> >>>128515  25368 16384     queue myrinet at sub04n77 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128516  25368 16384     queue myrinet at sub04n78 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128517  25368 16384     queue myrinet at sub04n79 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128518  25368 16384     queue myrinet at sub04n81 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128519  25368 16384     queue myrinet at sub04n84 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128520  25368 16384     queue myrinet at sub04n85 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128521  25368 16384     queue myrinet at sub04n86 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128522  25368 16384     queue myrinet at sub04n87 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
> >>>128523  25368 16384     queue myrinet at sub04n88 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128524  25368 16384     queue myrinet at sub04n89 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128525  25368 16384     queue myrinet at sub04n90 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128526  25368 16384     queue myrinet at sub04n91 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128527  25368 16384     queue myrinet at sub04n63 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
> >>>128528  25368 16384     queue myrinet at sub04n64 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128529  25368 16384     queue myrinet at sub04n73 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128530  25368 16384     queue myrinet at sub04n74 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
> >>>128531  25368 16384     queue opteronp at sub04n202 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
> >>>128532  25368 16384     queue opteronp at sub04n205 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
> >>>128533  25368 16384     queue opteronp at sub04n206 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
> >>>128534  25368 16384     queue opteronp at sub04n208 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
> >>>128535  25368 16384     queue parallel at sub04n121 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
> >>>128536  25368 16384     queue parallel at sub04n139 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128537  25368 16384     queue parallel at sub04n140 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128538  25368 16384     queue parallel at sub04n141 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128539  25368 16384     queue parallel at sub04n142 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128540  25368 16384     queue parallel at sub04n143 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128541  25368 16384     queue parallel at sub04n144 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128542  25368 16384     queue parallel at sub04n146 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128543  25368 16384     queue parallel at sub04n02 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128544  25368 16384     queue parallel at sub04n03 tagged to be overloaded: load_avg=2.020000 (no load adjustment) >= 1.4
> >>>128545  25368 16384     queue parallel at sub04n04 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128546  25368 16384     queue parallel at sub04n05 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128547  25368 16384     queue parallel at sub04n06 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128548  25368 16384     queue parallel at sub04n07 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128549  25368 16384     queue parallel at sub04n08 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
> >>>128550  25368 16384     queue parallel at sub04n09 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128551  25368 16384     queue parallel at sub04n10 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128552  25368 16384     queue parallel at sub04n11 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128553  25368 16384     verified threshold of 169 queues
> >>>128554  25368 16384     STARTING PASS 1 WITH 0 PENDING JOBS
> >>>128555  25368 16384     Not enrolled ja_tasks: 0
> >>>128556  25368 16384     Enrolled ja_tasks: 1
> >>>128557  25368 16384     Not enrolled ja_tasks: 0
> >>>128558  25368 16384     Enrolled ja_tasks: 1
> >>>128559  25368 16384     Not enrolled ja_tasks: 0
> >>>128560  25368 16384     Enrolled ja_tasks: 1
> >>>128561  25368 16384     Not enrolled ja_tasks: 0
> >>>128562  25368 16384     Enrolled ja_tasks: 1
> >>>128563  25368 16384     Not enrolled ja_tasks: 0
> >>>128564  25368 16384     Enrolled ja_tasks: 1
> >>>128565  25368 16384     Not enrolled ja_tasks: 0
> >>>128566  25368 16384     Enrolled ja_tasks: 1
> >>>128567  25368 16384     Not enrolled ja_tasks: 0
> >>>128568  25368 16384     Enrolled ja_tasks: 1
> >>>128569  25368 16384     Not enrolled ja_tasks: 0
> >>>128570  25368 16384     Enrolled ja_tasks: 1
> >>>128571  25368 16384     Not enrolled ja_tasks: 0
> >>>128572  25368 16384     Enrolled ja_tasks: 1
> >>>128573  25368 16384     Not enrolled ja_tasks: 0
> >>>128574  25368 16384     Enrolled ja_tasks: 1
> >>>128575  25368 16384     Not enrolled ja_tasks: 0
> >>>128576  25368 16384     Enrolled ja_tasks: 1
> >>>128577  25368 16384     Not enrolled ja_tasks: 0
> >>>128578  25368 16384     Enrolled ja_tasks: 1
> >>>128579  25368 16384     Not enrolled ja_tasks: 0
> >>>128580  25368 16384     Enrolled ja_tasks: 1
> >>>128581  25368 16384     Not enrolled ja_tasks: 0
> >>>128582  25368 16384     Enrolled ja_tasks: 1
> >>>128583  25368 16384     Not enrolled ja_tasks: 0
> >>>128584  25368 16384     Enrolled ja_tasks: 1
> >>>128585  25368 16384     Not enrolled ja_tasks: 0
> >>>128586  25368 16384     Enrolled ja_tasks: 1
> >>>128587  25368 16384     Not enrolled ja_tasks: 0
> >>>128588  25368 16384     Enrolled ja_tasks: 1
> >>>128589  25368 16384     Not enrolled ja_tasks: 0
> >>>128590  25368 16384     Enrolled ja_tasks: 1
> >>>128591  25368 16384     Not enrolled ja_tasks: 0
> >>>128592  25368 16384     Enrolled ja_tasks: 1
> >>>128593  25368 16384     STARTING PASS 2 WITH 0 PENDING JOBS
> >>>128594  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
> >>>128595  25368 16384        slots: 1.000000 * 1000.000000 * 6    ---> 6000.000000
> >>>128596  25368 16384     slot request assumed for static urgency is 20 for ,20-64 PE range due to PE's "mpi" setting "min"
> >>>128597  25368 16384        slots: 1.000000 * 1000.000000 * 20    ---> 20000.000000
> >>>128598  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
> >>>128599  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
> >>>128600  25368 16384        slots: 1.000000 * 1000.000000 * 6    ---> 6000.000000
> >>>128601  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
> >>>128602  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
> >>>128603  25368 16384     slot request assumed for static urgency is 2 for ,2-8 PE range due to PE's "mpich_myri" setting "min"
> >>>128604  25368 16384        slots: 1.000000 * 1000.000000 * 2    ---> 2000.000000
> >>>128605  25368 16384        slots: 1.000000 * 1000.000000 * 8    ---> 8000.000000
> >>>128606  25368 16384     ASU min = 1000.00000000000, ASU max = 20000.00000000000
> >>>128607  25368 16384
> >>>128608  25368 16384     no DDJU: do_usage: 1 finished_jobs 0
> >>>128609  25368 16384
> >>>128610  25368 16384     =====================[Pass 0]======================
> >>>128611  25368 16384     =====================[Pass 1]======================
> >>>128612  25368 16384     =====================[Pass 2]======================
> >>>128613  25368 16384
> >>>128614  25368 16384     no DDJU: do_usage: 0 finished_jobs 0
> >>>128615  25368 16384
> >>>128616  25368 16384     =====================[Pass 0]======================
> >>>128617  25368 16384     =====================[Pass 1]======================
> >>>128618  25368 16384     =====================[Pass 2]======================
> >>>128619  25368 16384     Normalizing tickets using 0.000000/18.333333 as min_tix/max_tix
> >>>128620  25368 16384        got 19 running jobs
> >>>128621  25368 16384        added 19 ticket orders for running jobs
> >>>128622  25368 16384        added 1 orders for updating usage of user
> >>>128623  25368 16384        added 0 orders for updating usage of project
> >>>128624  25368 16384        added 0 orders for updating share tree
> >>>128625  25368 16384        added 1 orders for scheduler configuration
> >>>128626  25368 16384     SENDING 22 ORDERS TO QMASTER
> >>>128627  25368 16384     RESETTING BUSY STATE OF EVENT CLIENT
> >>>128628  25368 16384     reresolve port timeout in 300
> >>>128629  25368 16384     returning cached port value: 536
> >>>--------------STOP-SCHEDULER-RUN-------------
> >>>128630  25368 16384     ec_get retrieving events - will do max 20 fetches
> >>>128631  25368 16384     doing sync fetch for messages, 20 still to do
> >>>128632  25368 16384     try to get request from qmaster, id 1
> >>>128633  25368 16384     Checking 84 events (44478-44561) while waiting for #44478
> >>>128634  25368 16384     check complete, 84 events in list
> >>>128635  25368 16384     got 84 events till 44561
> >>>128636  25368 16384     doing async fetch for messages, 19 still to do
> >>>128637  25368 16384     try to get request from qmaster, id 1
> >>>128638  25368 16384     reresolve port timeout in 280
> >>>128639  25368 16384     returning cached port value: 536
> >>>128640  25368 16384     Getting host by name - Linux
> >>>128641  25368 16384     1 names in h_addr_list
> >>>128642  25368 16384     0 names in h_aliases
> >>>128643  25368 16384     Sent ack for all events lower or equal 44561
> >>>128644  25368 16384     ec_get - received 84 events
> >>>128645  25368 16384     44478. EVENT MOD EXECHOST sub04n167
> >>>128646  25368 16384     44479. EVENT MOD EXECHOST sub04n63
> >>>128647  25368 16384     44480. EVENT JOB 21542.1 task 2.sub04n63 USAGE
> >>>128648  25368 16384     44481. EVENT JOB 21542.1 task 1.sub04n63 USAGE
> >>>128649  25368 16384     44482. EVENT JOB 21542.1 USAGE
> >>>128650  25368 16384     44483. EVENT MOD EXECHOST sub04n71
> >>>128651  25368 16384     44484. EVENT JOB 21537.1 task 2.sub04n71 USAGE
> >>>128652  25368 16384     44485. EVENT JOB 21537.1 task 1.sub04n71 USAGE
> >>>128653  25368 16384     44486. EVENT MOD EXECHOST sub04n65
> >>>128654  25368 16384     44487. EVENT JOB 21424.1 task 2.sub04n65 USAGE
> >>>128655  25368 16384     44488. EVENT JOB 21424.1 task 1.sub04n65 USAGE
> >>>128656  25368 16384     44489. EVENT MOD USER udo
> >>>128657  25368 16384     44490. EVENT MOD USER iber
> >>>128658  25368 16384     44491. EVENT MOD USER dieguez
> >>>128659  25368 16384     44492. EVENT MOD USER karenjoh
> >>>128660  25368 16384     44493. EVENT MOD USER lorenzo
> >>>128661  25368 16384     44494. EVENT MOD USER parcolle
> >>>128662  25368 16384     44495. EVENT MOD USER cfennie
> >>>128663  25368 16384     44496. EVENT MOD USER civelli
> >>>128664  25368 16384     44497. EVENT MOD EXECHOST sub04n25
> >>>128665  25368 16384     44498. EVENT MOD EXECHOST sub04n144
> >>>128666  25368 16384     44499. EVENT MOD EXECHOST sub04n206
> >>>128667  25368 16384     44500. EVENT JOB 21441.1 USAGE
> >>>128668  25368 16384     44501. EVENT MOD EXECHOST sub04n87
> >>>128669  25368 16384     44502. EVENT JOB 21503.1 task 2.sub04n87 USAGE
> >>>128670  25368 16384     44503. EVENT JOB 21503.1 task 1.sub04n87 USAGE
> >>>128671  25368 16384     44504. EVENT MOD EXECHOST sub04n70
> >>>128672  25368 16384     44505. EVENT JOB 21503.1 task 2.sub04n70 USAGE
> >>>128673  25368 16384     44506. EVENT JOB 21503.1 task 1.sub04n70 USAGE
> >>>128674  25368 16384     44507. EVENT JOB 21503.1 USAGE
> >>>128675  25368 16384     44508. EVENT MOD EXECHOST sub04n19
> >>>128676  25368 16384     44509. EVENT JOB 21338.1 USAGE
> >>>128677  25368 16384     44510. EVENT MOD EXECHOST sub04n84
> >>>128678  25368 16384     44511. EVENT JOB 21424.1 task 2.sub04n84 USAGE
> >>>128679  25368 16384     44512. EVENT JOB 21424.1 task 1.sub04n84 USAGE
> >>>128680  25368 16384     44513. EVENT MOD EXECHOST sub04n178
> >>>128681  25368 16384     44514. EVENT MOD EXECHOST sub04n67
> >>>128682  25368 16384     44515. EVENT JOB 21474.1 task 14.sub04n67 USAGE
> >>>128683  25368 16384     44516. EVENT JOB 21474.1 task 13.sub04n67 USAGE
> >>>128684  25368 16384     44517. EVENT JOB 21474.1 USAGE
> >>>128685  25368 16384     44518. EVENT MOD EXECHOST sub04n27
> >>>128686  25368 16384     44519. EVENT MOD EXECHOST sub04n34
> >>>128687  25368 16384     44520. EVENT MOD EXECHOST sub04n72
> >>>128688  25368 16384     44521. EVENT JOB 21537.1 task 2.sub04n72 USAGE
> >>>128689  25368 16384     44522. EVENT JOB 21537.1 task 1.sub04n72 USAGE
> >>>128690  25368 16384     44523. EVENT MOD EXECHOST sub04n78
> >>>128691  25368 16384     44524. EVENT JOB 21507.1 task 6.sub04n78 USAGE
> >>>128692  25368 16384     44525. EVENT JOB 21507.1 task 5.sub04n78 USAGE
> >>>128693  25368 16384     44526. EVENT JOB 21507.1 USAGE
> >>>128694  25368 16384     44527. EVENT MOD EXECHOST sub04n17
> >>>128695  25368 16384     44528. EVENT MOD EXECHOST sub04n07
> >>>128696  25368 16384     44529. EVENT MOD EXECHOST sub04n128
> >>>128697  25368 16384     44530. EVENT MOD EXECHOST sub04n42
> >>>128698  25368 16384     44531. EVENT MOD EXECHOST sub04n62
> >>>128699  25368 16384     44532. EVENT JOB 21424.1 task 2.sub04n62 USAGE
> >>>128700  25368 16384     44533. EVENT JOB 21424.1 task 1.sub04n62 USAGE
> >>>128701  25368 16384     44534. EVENT JOB 21424.1 USAGE
> >>>128702  25368 16384     44535. EVENT MOD EXECHOST sub04n10
> >>>128703  25368 16384     44536. EVENT MOD EXECHOST sub04n77
> >>>128704  25368 16384     44537. EVENT JOB 21537.1 task 2.sub04n77 USAGE
> >>>128705  25368 16384     44538. EVENT JOB 21537.1 task 1.sub04n77 USAGE
> >>>128706  25368 16384     44539. EVENT MOD EXECHOST sub04n11
> >>>128707  25368 16384     44540. EVENT MOD EXECHOST sub04n02
> >>>128708  25368 16384     44541. EVENT MOD EXECHOST sub04n120
> >>>128709  25368 16384     44542. EVENT MOD EXECHOST sub04n115
> >>>128710  25368 16384     44543. EVENT MOD EXECHOST sub04n101
> >>>128711  25368 16384     44544. EVENT MOD EXECHOST sub04n66
> >>>128712  25368 16384     44545. EVENT JOB 21537.1 task 2.sub04n66 USAGE
> >>>128713  25368 16384     44546. EVENT JOB 21537.1 task 1.sub04n66 USAGE
> >>>128714  25368 16384     44547. EVENT JOB 21537.1 USAGE
> >>>128715  25368 16384     44548. EVENT MOD EXECHOST sub04n142
> >>>128716  25368 16384     44549. EVENT MOD EXECHOST sub04n123
> >>>128717  25368 16384     44550. EVENT MOD EXECHOST sub04n33
> >>>128718  25368 16384     44551. EVENT MOD EXECHOST sub04n126
> >>>128719  25368 16384     44552. EVENT MOD EXECHOST sub04n140
> >>>128720  25368 16384     44553. EVENT MOD EXECHOST sub04n119
> >>>128721  25368 16384     44554. EVENT MOD EXECHOST sub04n102
> >>>128722  25368 16384     44555. EVENT MOD EXECHOST sub04n110
> >>>128723  25368 16384     44556. EVENT MOD EXECHOST sub04n117
> >>>128724  25368 16384     44557. EVENT MOD EXECHOST sub04n06
> >>>128725  25368 16384     44558. EVENT MOD EXECHOST sub04n73
> >>>128726  25368 16384     44559. EVENT JOB 21542.1 task 2.sub04n73 USAGE
> >>>128727  25368 16384     44560. EVENT JOB 21542.1 task 1.sub04n73 USAGE
> >>>128728  25368 16384     44561. EVENT MOD EXECHOST sub04n122
> >>>Q:169, AQ:343 J:19(19), H:169(170), C:49, A:4, D:3, P:7, CKPT:0 US:15 PR:4 S:nd:12/lf:7
> >>>128729  25368 16384     ================[SCHEDULING-EPOCH]==================
> >>>128730  25368 16384     JOB 20937.1 start_time = 1116447112 running_time 338119 decay_time = 450
> >>>128731  25368 16384     JOB 20938.1 start_time = 1116374344 running_time 410887 decay_time = 450
> >>>128732  25368 16384     JOB 21040.1 start_time = 1116443073 running_time 342158 decay_time = 450
> >>>128733  25368 16384     JOB 21076.1 start_time = 1116451351 running_time 333880 decay_time = 450
> >>>128734  25368 16384     JOB 21210.1 start_time = 1116514970 running_time 270261 decay_time = 450
> >>>128735  25368 16384     JOB 21213.1 start_time = 1116515250 running_time 269981 decay_time = 450
> >>>128736  25368 16384     JOB 21338.1 start_time = 1116543252 running_time 241979 decay_time = 450
> >>>128737  25368 16384     JOB 21423.1 start_time = 1116629274 running_time 155957 decay_time = 450
> >>>128738  25368 16384     JOB 21424.1 start_time = 1116631365 running_time 153866 decay_time = 450
> >>>128739  25368 16384     JOB 21440.1 start_time = 1116632934 running_time 152297 decay_time = 450
> >>>128740  25368 16384     JOB 21441.1 start_time = 1116632994 running_time 152237 decay_time = 450
> >>>128741  25368 16384     JOB 21443.1 start_time = 1116633602 running_time 151629 decay_time = 450
> >>>128742  25368 16384     JOB 21474.1 start_time = 1116655118 running_time 130113 decay_time = 450
> >>>128743  25368 16384     JOB 21503.1 start_time = 1116707395 running_time 77836 decay_time = 450
> >>>128744  25368 16384     JOB 21507.1 start_time = 1116714061 running_time 71170 decay_time = 450
> >>>128745  25368 16384     JOB 21528.1 start_time = 1116707641 running_time 77590 decay_time = 450
> >>>128746  25368 16384     JOB 21530.1 start_time = 1116714453 running_time 70778 decay_time = 450
> >>>128747  25368 16384     JOB 21537.1 start_time = 1116724845 running_time 60386 decay_time = 450
> >>>128748  25368 16384     JOB 21542.1 start_time = 1116782511 running_time 2720 decay_time = 450
> >>>128749  25368 16384     verified threshold of 169 queues
> >>>128750  25368 16384     queue myrinet at sub04n61 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128751  25368 16384     queue myrinet at sub04n62 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
> >>>128752  25368 16384     queue myrinet at sub04n65 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128753  25368 16384     queue myrinet at sub04n66 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128754  25368 16384     queue myrinet at sub04n67 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
> >>>128755  25368 16384     queue myrinet at sub04n68 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128756  25368 16384     queue myrinet at sub04n69 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128757  25368 16384     queue myrinet at sub04n70 tagged to be overloaded: load_avg=2.020000 (no load adjustment) >= 1.4
> >>>128758  25368 16384     queue myrinet at sub04n71 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128759  25368 16384     queue myrinet at sub04n72 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
> >>>128760  25368 16384     queue myrinet at sub04n75 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
> >>>128761  25368 16384     queue myrinet at sub04n77 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128762  25368 16384     queue myrinet at sub04n78 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
> >>>128763  25368 16384     queue myrinet at sub04n79 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128764  25368 16384     queue myrinet at sub04n81 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128765  25368 16384     queue myrinet at sub04n84 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128766  25368 16384     queue myrinet at sub04n85 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128767  25368 16384     queue myrinet at sub04n86 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128768  25368 16384     queue myrinet at sub04n87 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
> >>>128769  25368 16384     queue myrinet at sub04n88 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128770  25368 16384     queue myrinet at sub04n89 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128771  25368 16384     queue myrinet at sub04n90 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128772  25368 16384     queue myrinet at sub04n91 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128773  25368 16384     queue myrinet at sub04n63 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128774  25368 16384     queue myrinet at sub04n64 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128775  25368 16384     queue myrinet at sub04n73 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128776  25368 16384     queue myrinet at sub04n74 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
> >>>128777  25368 16384     queue opteronp at sub04n202 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
> >>>128778  25368 16384     queue opteronp at sub04n205 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
> >>>128779  25368 16384     queue opteronp at sub04n206 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
> >>>128780  25368 16384     queue opteronp at sub04n208 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
> >>>128781  25368 16384     queue parallel at sub04n121 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
> >>>128782  25368 16384     queue parallel at sub04n139 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128783  25368 16384     queue parallel at sub04n140 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128784  25368 16384     queue parallel at sub04n141 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128785  25368 16384     queue parallel at sub04n142 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128786  25368 16384     queue parallel at sub04n143 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128787  25368 16384     queue parallel at sub04n144 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128788  25368 16384     queue parallel at sub04n146 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128789  25368 16384     queue parallel at sub04n02 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128790  25368 16384     queue parallel at sub04n03 tagged to be overloaded: load_avg=2.020000 (no load adjustment) >= 1.4
> >>>128791  25368 16384     queue parallel at sub04n04 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128792  25368 16384     queue parallel at sub04n05 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128793  25368 16384     queue parallel at sub04n06 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128794  25368 16384     queue parallel at sub04n07 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
> >>>128795  25368 16384     queue parallel at sub04n08 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
> >>>128796  25368 16384     queue parallel at sub04n09 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128797  25368 16384     queue parallel at sub04n10 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128798  25368 16384     queue parallel at sub04n11 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128799  25368 16384     verified threshold of 169 queues
> >>>128800  25368 16384     STARTING PASS 1 WITH 0 PENDING JOBS
> >>>128801  25368 16384     Not enrolled ja_tasks: 0
> >>>128802  25368 16384     Enrolled ja_tasks: 1
> >>>128803  25368 16384     Not enrolled ja_tasks: 0
> >>>128804  25368 16384     Enrolled ja_tasks: 1
> >>>128805  25368 16384     Not enrolled ja_tasks: 0
> >>>128806  25368 16384     Enrolled ja_tasks: 1
> >>>128807  25368 16384     Not enrolled ja_tasks: 0
> >>>128808  25368 16384     Enrolled ja_tasks: 1
> >>>128809  25368 16384     Not enrolled ja_tasks: 0
> >>>128810  25368 16384     Enrolled ja_tasks: 1
> >>>128811  25368 16384     Not enrolled ja_tasks: 0
> >>>128812  25368 16384     Enrolled ja_tasks: 1
> >>>128813  25368 16384     Not enrolled ja_tasks: 0
> >>>128814  25368 16384     Enrolled ja_tasks: 1
> >>>128815  25368 16384     Not enrolled ja_tasks: 0
> >>>128816  25368 16384     Enrolled ja_tasks: 1
> >>>128817  25368 16384     Not enrolled ja_tasks: 0
> >>>128818  25368 16384     Enrolled ja_tasks: 1
> >>>128819  25368 16384     Not enrolled ja_tasks: 0
> >>>128820  25368 16384     Enrolled ja_tasks: 1
> >>>128821  25368 16384     Not enrolled ja_tasks: 0
> >>>128822  25368 16384     Enrolled ja_tasks: 1
> >>>128823  25368 16384     Not enrolled ja_tasks: 0
> >>>128824  25368 16384     Enrolled ja_tasks: 1
> >>>128825  25368 16384     Not enrolled ja_tasks: 0
> >>>128826  25368 16384     Enrolled ja_tasks: 1
> >>>128827  25368 16384     Not enrolled ja_tasks: 0
> >>>128828  25368 16384     Enrolled ja_tasks: 1
> >>>128829  25368 16384     Not enrolled ja_tasks: 0
> >>>128830  25368 16384     Enrolled ja_tasks: 1
> >>>128831  25368 16384     Not enrolled ja_tasks: 0
> >>>128832  25368 16384     Enrolled ja_tasks: 1
> >>>128833  25368 16384     Not enrolled ja_tasks: 0
> >>>128834  25368 16384     Enrolled ja_tasks: 1
> >>>128835  25368 16384     Not enrolled ja_tasks: 0
> >>>128836  25368 16384     Enrolled ja_tasks: 1
> >>>128837  25368 16384     Not enrolled ja_tasks: 0
> >>>128838  25368 16384     Enrolled ja_tasks: 1
> >>>128839  25368 16384     STARTING PASS 2 WITH 0 PENDING JOBS
> >>>128840  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
> >>>128841  25368 16384        slots: 1.000000 * 1000.000000 * 6    ---> 6000.000000
> >>>128842  25368 16384     slot request assumed for static urgency is 20 for ,20-64 PE range due to PE's "mpi" setting "min"
> >>>128843  25368 16384        slots: 1.000000 * 1000.000000 * 20    ---> 20000.000000
> >>>128844  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
> >>>128845  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
> >>>128846  25368 16384        slots: 1.000000 * 1000.000000 * 6    ---> 6000.000000
> >>>128847  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
> >>>128848  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
> >>>128849  25368 16384     slot request assumed for static urgency is 2 for ,2-8 PE range due to PE's "mpich_myri" setting "min"
> >>>128850  25368 16384        slots: 1.000000 * 1000.000000 * 2    ---> 2000.000000
> >>>128851  25368 16384        slots: 1.000000 * 1000.000000 * 8    ---> 8000.000000
> >>>128852  25368 16384     ASU min = 1000.00000000000, ASU max = 20000.00000000000
> >>>128853  25368 16384
> >>>128854  25368 16384     no DDJU: do_usage: 1 finished_jobs 0
> >>>128855  25368 16384
> >>>128856  25368 16384     =====================[Pass 0]======================
> >>>128857  25368 16384     =====================[Pass 1]======================
> >>>128858  25368 16384     =====================[Pass 2]======================
> >>>128859  25368 16384
> >>>128860  25368 16384     no DDJU: do_usage: 0 finished_jobs 0
> >>>128861  25368 16384
> >>>128862  25368 16384     =====================[Pass 0]======================
> >>>128863  25368 16384     =====================[Pass 1]======================
> >>>128864  25368 16384     =====================[Pass 2]======================
> >>>128865  25368 16384     Normalizing tickets using 0.000000/18.333333 as min_tix/max_tix
> >>>128866  25368 16384        got 19 running jobs
> >>>128867  25368 16384        added 19 ticket orders for running jobs
> >>>128868  25368 16384        added 1 orders for updating usage of user
> >>>128869  25368 16384        added 0 orders for updating usage of project
> >>>128870  25368 16384        added 0 orders for updating share tree
> >>>128871  25368 16384        added 1 orders for scheduler configuration
> >>>128872  25368 16384     SENDING 22 ORDERS TO QMASTER
> >>>128873  25368 16384     RESETTING BUSY STATE OF EVENT CLIENT
> >>>128874  25368 16384     reresolve port timeout in 280
> >>>128875  25368 16384     returning cached port value: 536
> >>>--------------STOP-SCHEDULER-RUN-------------
> >>>128876  25368 16384     ec_get retrieving events - will do max 20 fetches
> >>>128877  25368 16384     doing sync fetch for messages, 20 still to do
> >>>128878  25368 16384     try to get request from qmaster, id 1
> >>>128879  25368 16384     Checking 55 events (44562-44616) while waiting for #44562
> >>>128880  25368 16384     check complete, 55 events in list
> >>>128881  25368 16384     got 55 events till 44616
> >>>128882  25368 16384     doing async fetch for messages, 19 still to do
> >>>128883  25368 16384     try to get request from qmaster, id 1
> >>>128884  25368 16384     reresolve port timeout in 260
> >>>128885  25368 16384     returning cached port value: 536
> >>>128886  25368 16384     Sent ack for all events lower or equal 44616
> >>>128887  25368 16384     ec_get - received 55 events
> >>>128888  25368 16384     44562. EVENT MOD EXECHOST sub04n147
> >>>128889  25368 16384     44563. EVENT MOD USER udo
> >>>128890  25368 16384     44564. EVENT MOD USER iber
> >>>128891  25368 16384     44565. EVENT MOD USER dieguez
> >>>128892  25368 16384     44566. EVENT MOD USER karenjoh
> >>>128893  25368 16384     44567. EVENT MOD USER lorenzo
> >>>128894  25368 16384     44568. EVENT MOD USER parcolle
> >>>128895  25368 16384     44569. EVENT MOD USER cfennie
> >>>128896  25368 16384     44570. EVENT MOD USER civelli
> >>>128897  25368 16384     44571. EVENT MOD EXECHOST sub04n135
> >>>128898  25368 16384     44572. EVENT MOD EXECHOST sub04n141
> >>>128899  25368 16384     44573. EVENT MOD EXECHOST sub04n127
> >>>128900  25368 16384     44574. EVENT MOD EXECHOST sub04n145
> >>>128901  25368 16384     44575. EVENT MOD EXECHOST sub04n133
> >>>128902  25368 16384     44576. EVENT MOD EXECHOST sub04n148
> >>>128903  25368 16384     44577. EVENT MOD EXECHOST sub04n74
> >>>128904  25368 16384     44578. EVENT JOB 21542.1 task 2.sub04n74 USAGE
> >>>128905  25368 16384     44579. EVENT JOB 21542.1 task 1.sub04n74 USAGE
> >>>128906  25368 16384     44580. EVENT MOD EXECHOST rupc03.rutgers.edu
> >>>128907  25368 16384     44581. EVENT MOD EXECHOST sub04n139
> >>>128908  25368 16384     44582. EVENT MOD EXECHOST rupc02.rutgers.edu
> >>>128909  25368 16384     44583. EVENT MOD EXECHOST sub04n80
> >>>128910  25368 16384     44584. EVENT MOD EXECHOST sub04n207
> >>>128911  25368 16384     44585. EVENT MOD EXECHOST sub04n180
> >>>128912  25368 16384     44586. EVENT MOD EXECHOST sub04n23
> >>>128913  25368 16384     44587. EVENT MOD EXECHOST sub04n30
> >>>128914  25368 16384     44588. EVENT MOD EXECHOST sub04n203
> >>>128915  25368 16384     44589. EVENT MOD EXECHOST sub04n109
> >>>128916  25368 16384     44590. EVENT MOD EXECHOST rupc04.rutgers.edu
> >>>128917  25368 16384     44591. EVENT MOD EXECHOST sub04n114
> >>>128918  25368 16384     44592. EVENT MOD EXECHOST sub04n106
> >>>128919  25368 16384     44593. EVENT MOD EXECHOST sub04n88
> >>>128920  25368 16384     44594. EVENT JOB 21507.1 task 6.sub04n88 USAGE
> >>>128921  25368 16384     44595. EVENT JOB 21507.1 task 5.sub04n88 USAGE
> >>>128922  25368 16384     44596. EVENT MOD EXECHOST sub04n157
> >>>128923  25368 16384     44597. EVENT MOD EXECHOST sub04n20
> >>>128924  25368 16384     44598. EVENT MOD EXECHOST sub04n156
> >>>128925  25368 16384     44599. EVENT MOD EXECHOST sub04n26
> >>>128926  25368 16384     44600. EVENT JOB 21213.1 USAGE
> >>>128927  25368 16384     44601. EVENT MOD EXECHOST sub04n09
> >>>128928  25368 16384     44602. EVENT MOD EXECHOST sub04n05
> >>>128929  25368 16384     44603. EVENT MOD EXECHOST sub04n103
> >>>128930  25368 16384     44604. EVENT MOD EXECHOST sub04n164
> >>>128931  25368 16384     44605. EVENT MOD EXECHOST sub04n105
> >>>128932  25368 16384     44606. EVENT MOD EXECHOST sub04n113
> >>>128933  25368 16384     44607. EVENT MOD EXECHOST sub04n28
> >>>128934  25368 16384     44608. EVENT MOD EXECHOST sub04n76
> >>>128935  25368 16384     44609. EVENT MOD EXECHOST sub04n162
> >>>128936  25368 16384     44610. EVENT MOD EXECHOST sub04n108
> >>>128937  25368 16384     44611. EVENT MOD EXECHOST sub04n38
> >>>128938  25368 16384     44612. EVENT MOD EXECHOST sub04n116
> >>>128939  25368 16384     44613. EVENT MOD EXECHOST sub04n179
> >>>128940  25368 16384     44614. EVENT MOD EXECHOST sub04n04
> >>>128941  25368 16384     44615. EVENT MOD EXECHOST sub04n160
> >>>128942  25368 16384     44616. EVENT MOD EXECHOST sub04n107
> >>>Q:169, AQ:343 J:19(19), H:169(170), C:49, A:4, D:3, P:7, CKPT:0 US:15 PR:4 S:nd:12/lf:7
> >>>128943  25368 16384     ================[SCHEDULING-EPOCH]==================
> >>>128944  25368 16384     JOB 20937.1 start_time = 1116447112 running_time 338139 decay_time = 450
> >>>128945  25368 16384     JOB 20938.1 start_time = 1116374344 running_time 410907 decay_time = 450
> >>>128946  25368 16384     JOB 21040.1 start_time = 1116443073 running_time 342178 decay_time = 450
> >>>128947  25368 16384     JOB 21076.1 start_time = 1116451351 running_time 333900 decay_time = 450
> >>>128948  25368 16384     JOB 21210.1 start_time = 1116514970 running_time 270281 decay_time = 450
> >>>128949  25368 16384     JOB 21213.1 start_time = 1116515250 running_time 270001 decay_time = 450
> >>>128950  25368 16384     JOB 21338.1 start_time = 1116543252 running_time 241999 decay_time = 450
> >>>128951  25368 16384     JOB 21423.1 start_time = 1116629274 running_time 155977 decay_time = 450
> >>>128952  25368 16384     JOB 21424.1 start_time = 1116631365 running_time 153886 decay_time = 450
> >>>128953  25368 16384     JOB 21440.1 start_time = 1116632934 running_time 152317 decay_time = 450
> >>>128954  25368 16384     JOB 21441.1 start_time = 1116632994 running_time 152257 decay_time = 450
> >>>128955  25368 16384     JOB 21443.1 start_time = 1116633602 running_time 151649 decay_time = 450
> >>>128956  25368 16384     JOB 21474.1 start_time = 1116655118 running_time 130133 decay_time = 450
> >>>128957  25368 16384     JOB 21503.1 start_time = 1116707395 running_time 77856 decay_time = 450
> >>>128958  25368 16384     JOB 21507.1 start_time = 1116714061 running_time 71190 decay_time = 450
> >>>128959  25368 16384     JOB 21528.1 start_time = 1116707641 running_time 77610 decay_time = 450
> >>>128960  25368 16384     JOB 21530.1 start_time = 1116714453 running_time 70798 decay_time = 450
> >>>128961  25368 16384     JOB 21537.1 start_time = 1116724845 running_time 60406 decay_time = 450
> >>>128962  25368 16384     JOB 21542.1 start_time = 1116782511 running_time 2740 decay_time = 450
> >>>128963  25368 16384     verified threshold of 169 queues
> >>>128964  25368 16384     queue myrinet at sub04n61 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128965  25368 16384     queue myrinet at sub04n62 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
> >>>128966  25368 16384     queue myrinet at sub04n65 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128967  25368 16384     queue myrinet at sub04n66 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128968  25368 16384     queue myrinet at sub04n67 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
> >>>128969  25368 16384     queue myrinet at sub04n68 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128970  25368 16384     queue myrinet at sub04n69 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128971  25368 16384     queue myrinet at sub04n70 tagged to be overloaded: load_avg=2.020000 (no load adjustment) >= 1.4
> >>>128972  25368 16384     queue myrinet at sub04n71 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128973  25368 16384     queue myrinet at sub04n72 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
> >>>128974  25368 16384     queue myrinet at sub04n75 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
> >>>128975  25368 16384     queue myrinet at sub04n77 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128976  25368 16384     queue myrinet at sub04n78 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
> >>>128977  25368 16384     queue myrinet at sub04n79 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128978  25368 16384     queue myrinet at sub04n81 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128979  25368 16384     queue myrinet at sub04n84 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128980  25368 16384     queue myrinet at sub04n85 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128981  25368 16384     queue myrinet at sub04n86 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128982  25368 16384     queue myrinet at sub04n87 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
> >>>128983  25368 16384     queue myrinet at sub04n88 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128984  25368 16384     queue myrinet at sub04n89 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128985  25368 16384     queue myrinet at sub04n90 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128986  25368 16384     queue myrinet at sub04n91 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128987  25368 16384     queue myrinet at sub04n63 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128988  25368 16384     queue myrinet at sub04n64 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128989  25368 16384     queue myrinet at sub04n73 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128990  25368 16384     queue myrinet at sub04n74 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128991  25368 16384     queue opteronp at sub04n202 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
> >>>128992  25368 16384     queue opteronp at sub04n205 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
> >>>128993  25368 16384     queue opteronp at sub04n206 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
> >>>128994  25368 16384     queue opteronp at sub04n208 tagged to be overloaded: load_medium=1.000000 (no load adjustment) >= 1.0
> >>>128995  25368 16384     queue parallel at sub04n121 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
> >>>128996  25368 16384     queue parallel at sub04n139 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128997  25368 16384     queue parallel at sub04n140 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128998  25368 16384     queue parallel at sub04n141 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>128999  25368 16384     queue parallel at sub04n142 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>129000  25368 16384     queue parallel at sub04n143 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>129001  25368 16384     queue parallel at sub04n144 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>129002  25368 16384     queue parallel at sub04n146 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>129003  25368 16384     queue parallel at sub04n02 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>129004  25368 16384     queue parallel at sub04n03 tagged to be overloaded: load_avg=2.020000 (no load adjustment) >= 1.4
> >>>129005  25368 16384     queue parallel at sub04n04 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>129006  25368 16384     queue parallel at sub04n05 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>129007  25368 16384     queue parallel at sub04n06 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>129008  25368 16384     queue parallel at sub04n07 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
> >>>129009  25368 16384     queue parallel at sub04n08 tagged to be overloaded: load_avg=2.010000 (no load adjustment) >= 1.4
> >>>129010  25368 16384     queue parallel at sub04n09 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>129011  25368 16384     queue parallel at sub04n10 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>129012  25368 16384     queue parallel at sub04n11 tagged to be overloaded: load_avg=2.000000 (no load adjustment) >= 1.4
> >>>129013  25368 16384     verified threshold of 169 queues
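Each "tagged to be overloaded" line above is the scheduler comparing a queue's last reported load value against that queue's load_thresholds entry (load_avg >= 1.4 for the myrinet/parallel queues, load_medium >= 1.0 for opteronp), across all 169 verified queues. A minimal sketch of that comparison in Python -- the function and variable names are made up for illustration, not taken from the Grid Engine sources:

    # Hypothetical sketch of the per-queue overload check; "overloaded"
    # and its arguments are illustrative names, not actual SGE code.
    def overloaded(queue, load_values, thresholds):
        # thresholds mirrors the queue's load_thresholds setting,
        # e.g. {"load_avg": 1.4}; load_values is the last load report.
        for name, limit in thresholds.items():
            value = load_values.get(name)
            if value is not None and value >= limit:
                return ("queue %s tagged to be overloaded: %s=%f "
                        "(no load adjustment) >= %s" % (queue, name, value, limit))
        return None

    print(overloaded("myrinet at sub04n81", {"load_avg": 2.0}, {"load_avg": 1.4}))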
> >>>129014  25368 16384     STARTING PASS 1 WITH 0 PENDING JOBS
> >>>129015  25368 16384     Not enrolled ja_tasks: 0
> >>>129016  25368 16384     Enrolled ja_tasks: 1
> >>>129017  25368 16384     Not enrolled ja_tasks: 0
> >>>129018  25368 16384     Enrolled ja_tasks: 1
> >>>129019  25368 16384     Not enrolled ja_tasks: 0
> >>>129020  25368 16384     Enrolled ja_tasks: 1
> >>>129021  25368 16384     Not enrolled ja_tasks: 0
> >>>129022  25368 16384     Enrolled ja_tasks: 1
> >>>129023  25368 16384     Not enrolled ja_tasks: 0
> >>>129024  25368 16384     Enrolled ja_tasks: 1
> >>>129025  25368 16384     Not enrolled ja_tasks: 0
> >>>129026  25368 16384     Enrolled ja_tasks: 1
> >>>129027  25368 16384     Not enrolled ja_tasks: 0
> >>>129028  25368 16384     Enrolled ja_tasks: 1
> >>>129029  25368 16384     Not enrolled ja_tasks: 0
> >>>129030  25368 16384     Enrolled ja_tasks: 1
> >>>129031  25368 16384     Not enrolled ja_tasks: 0
> >>>129032  25368 16384     Enrolled ja_tasks: 1
> >>>129033  25368 16384     Not enrolled ja_tasks: 0
> >>>129034  25368 16384     Enrolled ja_tasks: 1
> >>>129035  25368 16384     Not enrolled ja_tasks: 0
> >>>129036  25368 16384     Enrolled ja_tasks: 1
> >>>129037  25368 16384     Not enrolled ja_tasks: 0
> >>>129038  25368 16384     Enrolled ja_tasks: 1
> >>>129039  25368 16384     Not enrolled ja_tasks: 0
> >>>129040  25368 16384     Enrolled ja_tasks: 1
> >>>129041  25368 16384     Not enrolled ja_tasks: 0
> >>>129042  25368 16384     Enrolled ja_tasks: 1
> >>>129043  25368 16384     Not enrolled ja_tasks: 0
> >>>129044  25368 16384     Enrolled ja_tasks: 1
> >>>129045  25368 16384     Not enrolled ja_tasks: 0
> >>>129046  25368 16384     Enrolled ja_tasks: 1
> >>>129047  25368 16384     Not enrolled ja_tasks: 0
> >>>129048  25368 16384     Enrolled ja_tasks: 1
> >>>129049  25368 16384     Not enrolled ja_tasks: 0
> >>>129050  25368 16384     Enrolled ja_tasks: 1
> >>>129051  25368 16384     Not enrolled ja_tasks: 0
> >>>129052  25368 16384     Enrolled ja_tasks: 1
> >>>129053  25368 16384     STARTING PASS 2 WITH 0 PENDING JOBS
> >>>129054  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
> >>>129055  25368 16384        slots: 1.000000 * 1000.000000 * 6    ---> 6000.000000
> >>>129056  25368 16384     slot request assumed for static urgency is 20 for ,20-64 PE range due to PE's "mpi" setting "min"
> >>>129057  25368 16384        slots: 1.000000 * 1000.000000 * 20    ---> 20000.000000
> >>>129058  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
> >>>129059  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
> >>>129060  25368 16384        slots: 1.000000 * 1000.000000 * 6    ---> 6000.000000
> >>>129061  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
> >>>129062  25368 16384        slots: 1.000000 * 1000.000000 * 1    ---> 1000.000000
> >>>129063  25368 16384     slot request assumed for static urgency is 2 for ,2-8 PE range due to PE's "mpich_myri" setting "min"
> >>>129064  25368 16384        slots: 1.000000 * 1000.000000 * 2    ---> 2000.000000
> >>>129065  25368 16384        slots: 1.000000 * 1000.000000 * 8    ---> 8000.000000
> >>>129066  25368 16384     ASU min = 1000.00000000000, ASU max = 20000.00000000000
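The "slot request assumed for static urgency" lines show where the slot count in the products above comes from: for a PE job requesting a slot range (20-64 or 2-8 here), the PE's urgency_slots setting ("min" for both "mpi" and "mpich_myri") picks which value of the range enters the weight * urgency * slots product. A small sketch, with illustrative names only:

    # assumed_slots() mimics the urgency_slots PE setting ("min", "max",
    # "avg", or a fixed number); the function itself is hypothetical.
    def assumed_slots(slot_range, urgency_slots):
        lo, hi = slot_range                  # e.g. (20, 64) for a 20-64 request
        if urgency_slots == "min":
            return lo
        if urgency_slots == "max":
            return hi
        if urgency_slots == "avg":
            return (lo + hi) // 2
        return int(urgency_slots)            # fixed numeric setting

    weight, urgency = 1.0, 1000.0            # the two constant factors logged above
    slots = assumed_slots((20, 64), "min")   # PE "mpi" with "min" -> 20
    print(weight * urgency * slots)          # ---> 20000.0, matching log line 129057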
> >>>129067  25368 16384     
> >>>129068  25368 16384     no DDJU: do_usage: 1 finished_jobs 0
> >>>129069  25368 16384     
> >>>129070  25368 16384     =====================[Pass 0]======================
> >>>129071  25368 16384     =====================[Pass 1]======================
> >>>129072  25368 16384     =====================[Pass 2]======================
> >>>129073  25368 16384     
> >>>129074  25368 16384     no DDJU: do_usage: 0 finished_jobs 0
> >>>129075  25368 16384     
> >>>129076  25368 16384     =====================[Pass 0]======================
> >>>129077  25368 16384     =====================[Pass 1]======================
> >>>129078  25368 16384     =====================[Pass 2]======================
> >>>129079  25368 16384     Normalizing tickets using 0.000000/18.333333 as min_tix/max_tix
> >>>129080  25368 16384        got 19 running jobs
> >>>129081  25368 16384        added 19 ticket orders for running jobs
> >>>129082  25368 16384        added 1 orders for updating usage of user
> >>>129083  25368 16384        added 0 orders for updating usage of project
> >>>129084  25368 16384        added 0 orders for updating share tree
> >>>129085  25368 16384        added 1 orders for scheduler configuration
> >>>129086  25368 16384     SENDING 22 ORDERS TO QMASTER
> >>>129087  25368 16384     RESETTING BUSY STATE OF EVENT CLIENT
> >>>129088  25368 16384     reresolve port timeout in 260
> >>>129089  25368 16384     returning cached port value: 536
> >>>--------------STOP-SCHEDULER-RUN-------------
> >>>129090  25368 16384     ec_get retrieving events - will do max 20 fetches
> >>>129091  25368 16384     doing sync fetch for messages, 20 still to do
> >>>129092  25368 16384     try to get request from qmaster, id 1
> >>>129093  25368 16384     Checking 154 events (44617-44770) while waiting for #44617
> >>>129094  25368 16384     check complete, 154 events in list
> >>>129095  25368 16384     got 154 events till 44770
> >>>129096  25368 16384     doing async fetch for messages, 19 still to do
> >>>129097  25368 16384     try to get request from qmaster, id 1
> >>>129098  25368 16384     reresolve port timeout in 240
> >>>129099  25368 16384     returning cached port value: 536
> >>>129100  25368 16384     Sent ack for all events lower or equal 44770
> >>>129101  25368 16384     ec_get - received 154 events
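The ec_get lines show the event client's fetch cycle: one blocking ("sync") fetch, up to 19 more non-blocking ("async") fetches to drain whatever queued up, then a single ack for the highest event number received (44770). A rough, self-contained sketch of that pattern -- the queue, batch size, and names are invented for illustration, not the actual event-client API:

    from collections import deque

    pending = deque(range(44617, 44771))         # event serials 44617..44770

    def fetch(batch=16):
        # Stand-in for one request to the qmaster: pop up to `batch` events.
        got = []
        while pending and len(got) < batch:
            got.append(pending.popleft())
        return got

    def ec_get(max_fetches=20):
        events = fetch()                         # sync fetch; 19 fetches still to do
        for _ in range(max_fetches - 1):
            if not pending:                      # async fetches until drained
                break
            events.extend(fetch())
        if events:
            print("Sent ack for all events lower or equal %d" % events[-1])
        return events

    print("ec_get - received %d events" % len(ec_get()))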
> >>>129102  25368 16384     44617. EVENT MOD EXECHOST sub04n08
> >>>129103  25368 16384     44618. EVENT MOD EXECHOST sub04n166
> >>>129104  25368 16384     44619. EVENT MOD EXECHOST sub04n168
> >>>129105  25368 16384     44620. EVENT MOD EXECHOST sub04n112
> >>>129106  25368 16384     44621. EVENT MOD EXECHOST sub04n90
> >>>129107  25368 16384     44622. EVENT JOB 21503.1 task 2.sub04n90 USAGE
> >>>129108  25368 16384     44623. EVENT JOB 21503.1 task 1.sub04n90 USAGE
> >>>129109  25368 16384     44624. EVENT MOD USER udo
> >>>129110  25368 16384     44625. EVENT MOD USER iber
> >>>129111  25368 16384     44626. EVENT MOD USER dieguez
> >>>129112  25368 16384     44627. EVENT MOD USER karenjoh
> >>>129113  25368 16384     44628. EVENT MOD USER lorenzo
> >>>129114  25368 16384     44629. EVENT MOD USER parcolle
> >>>129115  25368 16384     44630. EVENT MOD USER cfennie
> >>>129116  25368 16384     44631. EVENT MOD USER civelli
> >>>129117  25368 16384     44632. EVENT MOD EXECHOST sub04n14
> >>>129118  25368 16384     44633. EVENT MOD EXECHOST sub04n75
> >>>129119  25368 16384     44634. EVENT JOB 21040.1 task 6.sub04n75 USAGE
> >>>129120  25368 16384     44635. EVENT JOB 21040.1 task 5.sub04n75 USAGE
> >>>129121  25368 16384     44636. EVENT MOD EXECHOST sub04n150
> >>>129122  25368 16384     44637. EVENT MOD EXECHOST sub04n169
> >>>129123  25368 16384     44638. EVENT MOD EXECHOST sub04n165
> >>>129124  25368 16384     44639. EVENT MOD EXECHOST sub04n136
> >>>129125  25368 16384     44640. EVENT MOD EXECHOST sub04n176
> >>>129126  25368 16384     44641. EVENT MOD EXECHOST sub04n81
> >>>129127  25368 16384     44642. EVENT JOB 21507.1 task 6.sub04n81 USAGE
> >>>129128  25368 16384     44643. EVENT JOB 21507.1 task 5.sub04n81 USAGE
> >>>129129  25368 16384     44644. EVENT JOB 21507.1 task past_usage USAGE
> >>>129130  25368 16384     44645. EVENT DEL PETASK 21507.1 task 6.sub04n88
> >>>129131  25368 16384     44646. EVENT JOB 21507.1 task past_usage USAGE
> >>>129132  25368 16384     44647. EVENT DEL PETASK 21507.1 task 6.sub04n78
> >>>129133  25368 16384     44648. EVENT JOB 21507.1 task past_usage USAGE
> >>>129134  25368 16384     44649. EVENT DEL PETASK 21507.1 task 6.sub04n81
> >>>129135  25368 16384     44650. EVENT JOB 21507.1 task past_usage USAGE
> >>>129136  25368 16384     44651. EVENT DEL PETASK 21507.1 task 5.sub04n81
> >>>129137  25368 16384     44652. EVENT JOB 21507.1 task past_usage USAGE
> >>>129138  25368 16384     44653. EVENT DEL PETASK 21507.1 task 5.sub04n88
> >>>129139  25368 16384     44654. EVENT JOB 21507.1 task past_usage USAGE
> >>>129140  25368 16384     44655. EVENT DEL PETASK 21507.1 task 5.sub04n78
> >>>129141  25368 16384     44656. EVENT MOD EXECHOST sub04n161
> >>>129142  25368 16384     44657. EVENT MOD EXECHOST sub04n124
> >>>129143  25368 16384     44658. EVENT ADD PETASK 21507.1 task 7.sub04n88
> >>>129144  25368 16384     44659. EVENT ADD PETASK 21507.1 task 7.sub04n78
> >>>129145  25368 16384     44660. EVENT MOD EXECHOST sub04n158
> >>>129146  25368 16384     44661. EVENT MOD EXECHOST sub04n01
> >>>129147  25368 16384     44662. EVENT MOD EXECHOST sub04n159
> >>>129148  25368 16384     44663. EVENT ADD PETASK 21507.1 task 7.sub04n81
> >>>129149  25368 16384     44664. EVENT MOD EXECHOST sub04n134
> >>>129150  25368 16384     44665. EVENT ADD PETASK 21507.1 task 8.sub04n88
> >>>129151  25368 16384     44666. EVENT ADD PETASK 21507.1 task 8.sub04n78
> >>>129152  25368 16384     44667. EVENT ADD PETASK 21507.1 task 8.sub04n81
> >>>129153  25368 16384     44668. EVENT MOD EXECHOST sub04n121
> >>>129154  25368 16384     44669. EVENT MOD EXECHOST sub04n143
> >>>129155  25368 16384     44670. EVENT MOD EXECHOST sub04n15
> >>>129156  25368 16384     44671. EVENT MOD EXECHOST sub04n13
> >>>129157  25368 16384     44672. EVENT MOD EXECHOST sub04n64
> >>>129158  25368 16384     44673. EVENT JOB 21542.1 task 2.sub04n64 USAGE
> >>>129159  25368 16384     44674. EVENT JOB 21542.1 task 1.sub04n64 USAGE
> >>>129160  25368 16384     44675. EVENT MOD EXECHOST sub04n118
> >>>129161  25368 16384     44676. EVENT MOD EXECHOST sub04n151
> >>>129162  25368 16384     44677. EVENT MOD EXECHOST sub04n154
> >>>129163  25368 16384     44678. EVENT MOD EXECHOST sub04n149
> >>>129164  25368 16384     44679. EVENT MOD EXECHOST sub04n16
> >>>129165  25368 16384     44680. EVENT MOD EXECHOST sub04n155
> >>>129166  25368 16384     44681. EVENT MOD EXECHOST sub04n152
> >>>129167  25368 16384     44682. EVENT MOD EXECHOST sub04n163
> >>>129168  25368 16384     44683. EVENT MOD EXECHOST sub04n43
> >>>129169  25368 16384     44684. EVENT MOD EXECHOST sub04n86
> >>>129170  25368 16384     44685. EVENT JOB 21423.1 task 2.sub04n86 USAGE
> >>>129171  25368 16384     44686. EVENT JOB 21423.1 task 1.sub04n86 USAGE
> >>>129172  25368 16384     44687. EVENT MOD EXECHOST sub04n03
> >>>129173  25368 16384     44688. EVENT JOB 21076.1 USAGE
> >>>129174  25368 16384     44689. EVENT MOD EXECHOST sub04n204
> >>>129175  25368 16384     44690. EVENT MOD EXECHOST rupc01.rutgers.edu
> >>>129176  25368 16384     44691. EVENT MOD EXECHOST sub04n125
> >>>129177  25368 16384     44692. EVENT MOD EXECHOST sub04n44
> >>>129178  25368 16384     44693. EVENT MOD EXECHOST sub04n32
> >>>129179  25368 16384     44694. EVENT MOD EXECHOST sub04n21
> >>>129180  25368 16384     44695. EVENT MOD EXECHOST sub04n22
> >>>129181  25368 16384     44696. EVENT MOD EXECHOST sub04n35
> >>>129182  25368 16384     44697. EVENT MOD EXECHOST sub04n201
> >>>129183  25368 16384     44698. EVENT MOD EXECHOST sub04n205
> >>>129184  25368 16384     44699. EVENT JOB 21440.1 USAGE
> >>>129185  25368 16384     44700. EVENT MOD EXECHOST sub04n111
> >>>129186  25368 16384     44701. EVENT MOD EXECHOST sub04n89
> >>>129187  25368 16384     44702. EVENT JOB 21530.1 task 2.sub04n89 USAGE
> >>>129188  25368 16384     44703. EVENT JOB 21530.1 task 1.sub04n89 USAGE
> >>>129189  25368 16384     44704. EVENT JOB 21530.1 USAGE
> >>>129190  25368 16384     44705. EVENT MOD EXECHOST sub04n177
> >>>129191  25368 16384     44706. EVENT MOD EXECHOST sub04n146
> >>>129192  25368 16384     44707. EVENT ADD PETASK 21507.1 task 9.sub04n88
> >>>129193  25368 16384     44708. EVENT JOB 21507.1 task past_usage USAGE
> >>>129194  25368 16384     44709. EVENT DEL PETASK 21507.1 task 7.sub04n88
> >>>Segmentation fault
> >>>You have new mail in /var/spool/mail/root
> >>>rupc-cs04b:/opt/SGE/util #
> >>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>/opt/SGE/default/spool/qmaster
> >>>
> >>>Sun May 22 14:25:16 EDT 2005
> >>>05/22/2005 00:20:01|qmaster|rupc-cs04b|E|event client "scheduler" (rupc-cs04b/schedd/1) reregistered - it will need a total update
> >>>05/22/2005 00:32:40|qmaster|rupc-cs04b|W|job 21538.1 failed on host sub04n63 in recognising job because: execd doesn't know this job
> >>>05/22/2005 00:32:49|qmaster|rupc-cs04b|E|execd sub04n63 reports running state for job (21538.1/master) in queue "myrinet at sub04n63" while job is in state 65536
> >>>05/22/2005 00:33:49|qmaster|rupc-cs04b|E|execd at sub04n63 reports running job (21538.1/master) in queue "myrinet at sub04n63" that was not supposed to be there - killing
> >>>05/22/2005 02:10:01|qmaster|rupc-cs04b|E|event client "scheduler" (rupc-cs04b/schedd/1) reregistered - it will need a total update
> >>>05/22/2005 02:30:26|qmaster|rupc-cs04b|E|orders user/project version (1035) is not uptodate (1036) for user/project "udo"
> >>>05/22/2005 02:30:26|qmaster|rupc-cs04b|E|orders user/project version (1035) is not uptodate (1036) for user/project "iber"
> >>>05/22/2005 02:30:26|qmaster|rupc-cs04b|E|orders user/project version (1035) is not uptodate (1036) for user/project "dieguez"
> >>>05/22/2005 02:30:26|qmaster|rupc-cs04b|E|orders user/project version (1035) is not uptodate (1036) for user/project "zayak"
> >>>05/22/2005 02:30:26|qmaster|rupc-cs04b|E|orders user/project version (1035) is not uptodate (1036) for user/project "karenjoh"
> >>>05/22/2005 02:30:26|qmaster|rupc-cs04b|E|orders user/project version (1035) is not uptodate (1036) for user/project "lorenzo"
> >>>05/22/2005 02:30:26|qmaster|rupc-cs04b|E|orders user/project version (1035) is not uptodate (1036) for user/project "parcolle"
> >>>05/22/2005 02:30:26|qmaster|rupc-cs04b|E|orders user/project version (1035) is not uptodate (1036) for user/project "cfennie"
> >>>05/22/2005 02:30:26|qmaster|rupc-cs04b|E|orders user/project version (1035) is not uptodate (1036) for user/project "civelli"
> >>>05/22/2005 02:34:06|qmaster|rupc-cs04b|E|orders user/project version (1044) is not uptodate (1045) for user/project "udo"
> >>>05/22/2005 02:34:06|qmaster|rupc-cs04b|E|orders user/project version (1044) is not uptodate (1045) for user/project "iber"
> >>>05/22/2005 02:34:06|qmaster|rupc-cs04b|E|orders user/project version (1044) is not uptodate (1045) for user/project "dieguez"
> >>>05/22/2005 02:34:06|qmaster|rupc-cs04b|E|orders user/project version (1044) is not uptodate (1045) for user/project "zayak"
> >>>05/22/2005 02:34:06|qmaster|rupc-cs04b|E|orders user/project version (1044) is not uptodate (1045) for user/project "karenjoh"
> >>>05/22/2005 02:34:06|qmaster|rupc-cs04b|E|orders user/project version (1044) is not uptodate (1045) for user/project "lorenzo"
> >>>05/22/2005 02:34:06|qmaster|rupc-cs04b|E|orders user/project version (1044) is not uptodate (1045) for user/project "parcolle"
> >>>05/22/2005 02:34:06|qmaster|rupc-cs04b|E|orders user/project version (1044) is not uptodate (1045) for user/project "cfennie"
> >>>05/22/2005 02:34:06|qmaster|rupc-cs04b|E|orders user/project version (1044) is not uptodate (1045) for user/project "civelli"
> >>>05/22/2005 03:02:47|qmaster|rupc-cs04b|E|tightly integrated parallel task 21539.1 task 3.sub04n83 failed - killing job
> >>>05/22/2005 03:10:01|qmaster|rupc-cs04b|E|event client "scheduler" (rupc-cs04b/schedd/1) reregistered - it will need a total update    <-- YOU SEE THESE 2 LINES: THE SCHEDULER DIED EVEN WITHOUT ANY EVENTS, JUST BY ITSELF!
> >>>05/22/2005 07:30:01|qmaster|rupc-cs04b|E|event client "scheduler" (rupc-cs04b/schedd/1) reregistered - it will need a total update
> >>>05/22/2005 11:11:39|qmaster|rupc-cs04b|E|event client "scheduler" (rupc-cs04b/schedd/1) reregistered - it will need a total update    <-- BEFORE THE LAST CRASH
> >>>05/22/2005 14:07:53|qmaster|rupc-cs04b|E|tightly integrated parallel task 21507.1 task 10.sub04n88 failed - killing job    <-- THIS IS WHAT TRIGGERED THE CRASH
> >>>05/22/2005 14:09:14|qmaster|rupc-cs04b|W|job 21507.1 failed on host sub04n78 assumedly after job because: job 21507.1 died through signal TERM (15)
> >>>05/22/2005 14:10:00|qmaster|rupc-cs04b|E|event client "scheduler" (rupc-cs04b/schedd/1) reregistered - it will need a total update    <-- SCHEDULER START AFTER THE CRASH
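A note on the "orders ... is not uptodate" entries above: each order the scheduler sends apparently carries the version of the user/project object it was computed against, and the qmaster rejects the order when the live object has already moved on (1035 vs. 1036, 1044 vs. 1045); here they follow the scheduler's reregistration, so the orders were presumably computed against stale copies. A hypothetical sketch of such a version guard -- the order/object layout is invented, not the actual qmaster data structures:

    # Illustrative version check behind the "is not uptodate" messages.
    def apply_order(order, live_objects):
        live = live_objects[order["name"]]
        if order["version"] != live["version"]:
            print('orders user/project version (%d) is not uptodate (%d) '
                  'for user/project "%s"'
                  % (order["version"], live["version"], order["name"]))
            return False
        live["usage"] = order["usage"]           # accept the usage update
        return True

    users = {"udo": {"version": 1036, "usage": 0.0}}
    apply_order({"name": "udo", "version": 1035, "usage": 12.5}, users)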
> >>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>SCHEDULER messages BELOW
> >>>
> >>>05/22/2005 00:20:01|schedd|rupc-cs04b|I|starting up 6.0u3
> >>>05/22/2005 02:10:01|schedd|rupc-cs04b|I|starting up 6.0u3
> >>>05/22/2005 02:30:26|schedd|rupc-cs04b|I|controlled shutdown 6.0u3
> >>>05/22/2005 02:31:10|schedd|rupc-cs04b|I|starting up 6.0u3
> >>>05/22/2005 02:34:06|schedd|rupc-cs04b|I|controlled shutdown 6.0u3
> >>>05/22/2005 02:40:00|schedd|rupc-cs04b|I|starting up 6.0u3
> >>>05/22/2005 03:10:01|schedd|rupc-cs04b|I|starting up 6.0u3
> >>>05/22/2005 07:30:01|schedd|rupc-cs04b|I|starting up 6.0u3
> >>>05/22/2005 11:11:39|schedd|rupc-cs04b|I|starting up 6.0u3        <--- before the last crash (I started debug mode)
> >>>05/22/2005 14:10:00|schedd|rupc-cs04b|I|starting up 6.0u3        <--- AFTER the last crash
> >>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net




More information about the gridengine-users mailing list