[GE users] when a queue is full

Peiran Song peirans at cs.uoregon.edu
Thu Dec 8 19:15:04 GMT 2005


Ron,

> - can you dump all the environment variables (add a line "env >
><the filename>" in the job script) and compare them with the
>environment started by dsh? And, does your job do a lot of disk
>I/O? Is the I/O path the same? BTW, what kind of job is it -
>parallel or serial?
>  
>
vm_stat shows there is still plenty of memory free.
Which parameters should I be looking for in the environment list?
The environment variables that I know are used are present in both
cases. Yes, there is quite a lot of disk I/O involved.
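
The environment comparison suggested above could be done with a short
script rather than by eye. This is only a minimal sketch in Python,
assuming the output of "env" was saved once from the SGE job script and
once from the dsh run; the function names and the two file names are
hypothetical, chosen for illustration.

```python
import sys


def parse_env_dump(text):
    """Parse the output of `env` into a dict, one NAME=value per line.

    Lines without "=" (for example the continuation of a multi-line
    value) are skipped, so this is an approximation.
    """
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name:
            env[name] = value
    return env


def diff_env(sge_env, dsh_env):
    """Return variables present on one side only, and those that differ."""
    only_sge = {k: sge_env[k] for k in sge_env.keys() - dsh_env.keys()}
    only_dsh = {k: dsh_env[k] for k in dsh_env.keys() - sge_env.keys()}
    changed = {
        k: (sge_env[k], dsh_env[k])
        for k in sge_env.keys() & dsh_env.keys()
        if sge_env[k] != dsh_env[k]
    }
    return only_sge, only_dsh, changed


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python envdiff.py env_sge.txt env_dsh.txt
    with open(sys.argv[1]) as f:
        sge = parse_env_dump(f.read())
    with open(sys.argv[2]) as f:
        dsh = parse_env_dump(f.read())
    only_sge, only_dsh, changed = diff_env(sge, dsh)
    for name in sorted(only_sge):
        print("only under SGE:", name, "=", only_sge[name])
    for name in sorted(only_dsh):
        print("only under dsh:", name, "=", only_dsh[name])
    for name in sorted(changed):
        print("differs:", name, changed[name][0], "vs", changed[name][1])
```

Variables that appear on only one side, or whose values differ, are the
ones worth a closer look, especially any that point at directories the
job reads from or writes to.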

I was running a BLAST job. A script segments it into 10 sub-jobs,
and one master script submits them to SGE as a job array. When I
monitored it with qstat -f, two sub-jobs on one node were taking a
long time to finish. Then I used dsh to send these two particular
jobs to that same node simultaneously, and the jobs finished in one
eighth of the time. This is what puzzles me...
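
One way to narrow this down might be to check whether the extra time
under SGE is CPU time or waiting time. The following is a minimal
sketch in Python, assuming Python is available on the compute nodes;
time_command is a hypothetical helper, not part of SGE or BLAST, and
the command to run is passed on the command line.

```python
import resource
import subprocess
import sys
import time


def time_command(argv):
    """Run argv, wait for it, and report wall-clock and CPU seconds."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    returncode = subprocess.call(argv)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "returncode": returncode,
        "wall": wall,
        # CPU time consumed by the child process that was just waited for
        "user": after.ru_utime - before.ru_utime,
        "sys": after.ru_stime - before.ru_stime,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python timeit_job.py <sub-job command> [args...]
    result = time_command(sys.argv[1:])
    print("wall=%.2fs user=%.2fs sys=%.2fs rc=%d" % (
        result["wall"], result["user"], result["sys"],
        result["returncode"]))
```

If the same sub-job is wrapped this way both under SGE and under dsh,
a wall time much larger than user plus sys would suggest the process
is mostly waiting (for example on disk or network I/O) rather than
computing.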

Thanks,
Peiran


>2) yes, SGE doesn't know processes started by other things or by
>hand - but it does look at the load of the machines and tries
>not to schedule jobs to busy machines.
>
> -Ron
>
>
>--- Peiran Song <peirans at cs.uoregon.edu> wrote: 
>  
>
>>>1) for the job execution time difference, can you use top,
>>>vmstat, and/or iostat to find out what is going on in the
>>>system? Are any of the running SGE daemons consuming the
>>>processor?
>>>
>>vmstat is not available on our system. iostat doesn't show any
>>difference in status. With top, the only difference I saw between
>>two simultaneous jobs submitted by dsh and two jobs scheduled
>>through the scheduler is the VSIZE. The two jobs submitted through
>>dsh, which execute much faster, each consume 50M less virtual
>>memory. That doesn't seem big enough to matter, does it?
>>
>>>2) if a queue instance (host) is full, it means all the job
>>>slots are used up. If you have 2 CPUs, and 2 jobs are running on
>>>the host, then SGE tells you that it is full.
>>>
>>So the reason I didn't see the queue-full message when submitting
>>jobs by dsh was that it bypassed SGE?
>>I am still naive with SGE and sys admin...
>>
>>Thanks,
>>Peiran
>>
>>>-Ron
>>>
>>>
>>>--- Peiran Song <peirans at cs.uoregon.edu> wrote:
>>>
>>>>Hi All,
>>>>
>>>>We have an Apple cluster running Grid Engine. We observed a much
>>>>longer execution time for two sub-jobs scheduled to the same
>>>>dual-CPU node, compared to sending the two sub-jobs directly to
>>>>the same node by dsh at about the same time. The time difference
>>>>is two minutes versus 15 seconds.
>>>>When I tried qstat -j during the executions, for the first case I
>>>>got the "queue is full" info shown below, but not for the second
>>>>case.
>>>>
>>>>usage    2:                 cpu=00:00:00, mem=0.00000 GBs, io=0.00000, vmem=N/A, maxvmem=N/A
>>>>usage    3:                 cpu=00:00:00, mem=0.00000 GBs, io=0.00000, vmem=N/A, maxvmem=N/A
>>>>scheduling info:            queue instance "all.q@node005.cluster.private" dropped because it is full
>>>>
>>>>I am wondering under what circumstances a queue would be deemed
>>>>full (no spare CPU, no spare memory?). Is that truly full or is
>>>>that an estimate? It seems that when it is deemed full, the job
>>>>takes much longer to finish. Could the configuration parameters
>>>>be tweaked somehow to limit or avoid this? Here is our current
>>>>configuration:
>>>>
>>>>algorithm                         default
>>>>schedule_interval                 0:0:1
>>>>maxujobs                          0
>>>>queue_sort_method                 load
>>>>job_load_adjustments              NONE     --- should we adjust?
>>>>load_adjustment_decay_time        0:0:0
>>>>load_formula                      np_load_avg
>>>>schedd_job_info                   true
>>>>flush_submit_sec                  0
>>>>flush_finish_sec                  0
>>>>params                            none
>>>>reprioritize_interval             0:0:0
>>>>halftime                          168
>>>>usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
>>>>compensation_factor               5.000000
>>>>weight_user                       0.250000
>>>>weight_project                    0.250000
>>>>weight_department                 0.250000
>>>>weight_job                        0.250000
>>>>weight_tickets_functional         0
>>>>weight_tickets_share              0
>>>>share_override_tickets            TRUE
>>>>share_functional_shares           TRUE
>>>>max_functional_jobs_to_schedule   200
>>>>report_pjob_tickets               TRUE
>>>>max_pending_tasks_per_job         50
>>>>halflife_decay_list               none
>>>>policy_hierarchy                  OFS
>>>>weight_ticket                     0.010000
>>>>weight_waiting_time               0.000000
>>>>weight_deadline                   3600000.000000
>>>>weight_urgency                    0.100000
>>>>weight_priority                   1.000000
>>>>max_reservation                   0
>>>>default_duration                  0:10:0
>>>>
>>>>Any comments and ideas would be very much appreciated!
>>>>
>>>>Regards,
>>>>Peiran Song
>>>>
>>>>---------------------------------------------------------------------
>>>>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>>



---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net




More information about the gridengine-users mailing list