[GE users] when a queue is full

Ron Chen ron_chen_123 at yahoo.com
Thu Dec 8 02:43:05 GMT 2005



1) - "vmstat" is actually called vm_stat on MacOSX :p

 - can you dump all the environment variables (add a line "env >
<the filename>" in the job script) and compare them with the
environment started by dsh? And, does your job do a lot of disk
I/O? Is the I/O path the same? BTW, what kind of job is it -
parallel or serial?
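A minimal sketch of that environment comparison, assuming a POSIX shell on the nodes. The file names and the compare_env helper are hypothetical, only there for illustration:

```shell
#!/bin/sh
# Sketch: capture the environment once from inside the SGE job script and
# once from the dsh session, then compare the two dumps.
# compare_env and the file names are hypothetical placeholders.

# compare_env FILE_A FILE_B
# Prints the lines that differ between two sorted environment dumps.
compare_env() {
    # diff exits non-zero when the files differ; that is expected here.
    diff "$1" "$2" || true
}

# In the job script submitted through qsub:
#   env | sort > "$HOME/env.sge.txt"
# In the command run through dsh:
#   env | sort > "$HOME/env.dsh.txt"
# Afterwards, on any host that can read both files:
#   compare_env "$HOME/env.sge.txt" "$HOME/env.dsh.txt"
```

Sorting both dumps first keeps the diff limited to variables that really differ (PATH, TMPDIR, library paths and so on) rather than ordering noise.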

2) yes, SGE doesn't know about processes started by other tools or
by hand - but it does look at the load of the machines and tries
not to schedule jobs to busy machines.
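To make the slot rule concrete, here is a rough sketch. is_full is a hypothetical helper, not an SGE command; qstat -f and qconf -sq are the usual SGE commands for looking at the real numbers, shown only as comments since they need a running cluster:

```shell
#!/bin/sh
# Sketch: "full" is about configured job slots, not measured CPU or memory.
# is_full is a hypothetical helper, not part of SGE.

# is_full USED TOTAL
# Succeeds (exit 0) when every slot of a queue instance is taken.
is_full() {
    [ "$1" -ge "$2" ]
}

# Example: a dual-CPU node configured with 2 slots and 2 running SGE jobs.
if is_full 2 2; then
    echo "queue instance is full"
fi

# On the cluster itself (not run here):
#   qstat -f          # per-queue-instance slot usage and load average
#   qconf -sq all.q   # queue configuration, including slots and load_thresholds
```

Processes started through dsh never occupy a slot, so they never show up in this count.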

 -Ron


--- Peiran Song <peirans at cs.uoregon.edu> wrote: 
> >1) for the job execution time difference, can you use top,
> >vmstat, and/or iostat to find out what is going on in the
> >system? Are any of the running SGE daemons consuming the
> >processor?
> >  
> >
> vmstat is not available on our system. iostat doesn't show any
> difference in status. With top, the only difference I saw when
> comparing the output from two simultaneous jobs submitted by dsh
> versus two jobs scheduled through the scheduler is the VSIZE. The
> two jobs submitted through dsh, which execute much faster, each
> consume 50M less virtual memory.
> That doesn't seem big enough to matter, does it?
> 
> >2) if a queue instance (host) is full, it means all the job
> >slots are used up. If you have 2 CPUs, and 2 jobs are running
> >on the host, then SGE tells you that it is full.
> >  
> >
> So the reason I didn't see the queue-full message when submitting
> jobs by dsh is that dsh bypassed SGE?
> I am still new to SGE and sys admin...
> 
> Thanks,
> Peiran
> 
> 
> > -Ron
> >
> >
> >--- Peiran Song <peirans at cs.uoregon.edu> wrote:
> >  
> >
> >>Hi All,
> >>
> >>We have an Apple cluster running Grid Engine. We observed a much
> >>longer execution time when two sub-jobs are scheduled to the same
> >>dual-CPU node, compared to sending the two sub-jobs directly to
> >>the same node at about the same time with dsh. The time difference
> >>is two minutes versus 15 seconds. When I tried qstat -j during the
> >>executions, I got the "queue is full" info below in the first
> >>case, but not in the second.
> >>
> >>usage    2:                 cpu=00:00:00, mem=0.00000 GBs, io=0.00000, vmem=N/A, maxvmem=N/A
> >>usage    3:                 cpu=00:00:00, mem=0.00000 GBs, io=0.00000, vmem=N/A, maxvmem=N/A
> >>scheduling info:            queue instance "all.q@node005.cluster.private" dropped because it is full
> >>
> >>I am wondering under what circumstances a queue is deemed full
> >>(no spare CPU, no spare memory?). Is it truly full, or is that an
> >>estimate? It seems that when the queue is deemed full, the job
> >>takes much longer to finish. Could the configuration parameters
> >>be tweaked somehow to limit or avoid this? Here is our current
> >>configuration:
> >>
> >>algorithm                         default
> >>schedule_interval                 0:0:1
> >>maxujobs                          0
> >>queue_sort_method                 load
> >>job_load_adjustments              NONE     --- should we adjust?
> >>load_adjustment_decay_time        0:0:0
> >>load_formula                      np_load_avg
> >>schedd_job_info                   true
> >>flush_submit_sec                  0
> >>flush_finish_sec                  0
> >>params                            none
> >>reprioritize_interval             0:0:0
> >>halftime                          168
> >>usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
> >>compensation_factor               5.000000
> >>weight_user                       0.250000
> >>weight_project                    0.250000
> >>weight_department                 0.250000
> >>weight_job                        0.250000
> >>weight_tickets_functional         0
> >>weight_tickets_share              0
> >>share_override_tickets            TRUE
> >>share_functional_shares           TRUE
> >>max_functional_jobs_to_schedule   200
> >>report_pjob_tickets               TRUE
> >>max_pending_tasks_per_job         50
> >>halflife_decay_list               none
> >>policy_hierarchy                  OFS
> >>weight_ticket                     0.010000
> >>weight_waiting_time               0.000000
> >>weight_deadline                   3600000.000000
> >>weight_urgency                    0.100000
> >>weight_priority                   1.000000
> >>max_reservation                   0
> >>default_duration                  0:10:0
> >>
> >>Any comments and ideas would be very much appreciated!
> >>
> >>Regards,
> >>Peiran Song
> >>
> >>---------------------------------------------------------------------
> >>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> >>For additional commands, e-mail: users-help at gridengine.sunsource.net
> >
> >__________________________________________________
> >Do You Yahoo!?
> >Tired of spam?  Yahoo! Mail has the best spam protection around
> >http://mail.yahoo.com