[GE users] "messages" log of qmaster

Alexandre Barras Alexandre.Barras at cerfacs.fr
Tue Aug 17 11:14:10 BST 2004



hi,

ok! I confirm it is the same issue
thanks!

Alexandre

PS: It is strange that this Issuezilla page cannot be found through Google!

Stephan Grell - Sun Germany - SSG - Software Engineer wrote:

> Hi,
>
> We had a bug that caused the scheduler and master to get out of sync.
> It looks like you are having the same issue. For more details, please
> have a look at:
>
> Issuezilla:   1154
> Bugtraq:      5074788
>
> It should be solved with the next update.
>
> Cheers,
> Stephan
>
> Alexandre Barras wrote:
>
>> Hi,
>>
>> I use SGE 6.0 and it happens after restarting the scheduler.
>> I set the log level to "log_info" but it doesn't give more information.
>>
>> The master host runs RH 8.0, but the same issue occurred with a Sun
>> machine as master.
>>
>> Here are my scheduler and cluster configurations. They are very close
>> to the defaults, but maybe they will be helpful.
>>
>> [root@catleya qmaster]# qconf -ssconf
>> algorithm                         default
>> schedule_interval                 0:0:30
>> maxujobs                          8
>> queue_sort_method                 load
>> job_load_adjustments              np_load_avg=0.50
>> load_adjustment_decay_time        0:7:30
>> load_formula                      np_load_avg
>> schedd_job_info                   true
>> flush_submit_sec                  0
>> flush_finish_sec                  0
>> params                            none
>> reprioritize_interval             0:2:0
>> halftime                          168
>> usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
>> compensation_factor               5.000000
>> weight_user                       0.250000
>> weight_project                    0.250000
>> weight_department                 0.250000
>> weight_job                        0.250000
>> weight_tickets_functional         0
>> weight_tickets_share              0
>> share_override_tickets            TRUE
>> share_functional_shares           TRUE
>> max_functional_jobs_to_schedule   200
>> report_pjob_tickets               TRUE
>> max_pending_tasks_per_job         50
>> halflife_decay_list               none
>> policy_hierarchy                  OFS
>> weight_ticket                     0.010000
>> weight_waiting_time               0.000000
>> weight_deadline                   3600000.000000
>> weight_urgency                    0.100000
>> weight_priority                   1.000000
>> max_reservation                   0
>> default_duration                  0:10:0
>>
>> [root@catleya qmaster]# qconf -sconf global
>> global:
>> execd_spool_dir              /opt2/sge/default/spool
>> mailer                       /bin/mailx
>> xterm                        /usr/openwin/bin/xterm
>> load_sensor                  none
>> prolog                       none
>> epilog                       none
>> shell_start_mode             posix_compliant
>> login_shells                 sh,ksh,csh,tcsh
>> min_uid                      0
>> min_gid                      0
>> user_lists                   none
>> xuser_lists                  none
>> projects                     none
>> xprojects                    none
>> enforce_project              false
>> enforce_user                 auto
>> load_report_time             00:00:30
>> stat_log_time                48:00:00
>> max_unheard                  00:00:32
>> reschedule_unknown           00:00:15
>> loglevel                     log_info
>> administrator_mail           barras at cerfacs.fr
>> set_token_cmd                none
>> pag_cmd                      none
>> token_extend_time            none
>> shepherd_cmd                 none
>> qmaster_params               none
>> execd_params                 none
>> reporting_params             accounting=true reporting=false \
>>                             flush_time=00:00:15 joblog=false \
>>                             sharelog=00:00:00
>> finished_jobs                100
>> gid_range                    20000-20100
>> qlogin_command               telnet
>> qlogin_daemon                /usr/sbin/in.telnetd
>> rlogin_daemon                /usr/sbin/in.rlogind
>> max_aj_instances             2000
>> max_aj_tasks                 75000
>> max_u_jobs                   0
>> max_jobs                     0
>> auto_user_oticket            0
>> auto_user_fshare             0
>> auto_user_default_project    none
>> auto_user_delete_time        100
>> delegated_file_staging       false
>> reprioritize                 1
>>
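The two qconf listings above share a simple layout: one setting per line, a name followed by whitespace and a value, with a trailing backslash continuing a long value (as in reporting_params). The following is a minimal sketch, not an SGE tool, of how such output could be read into a dictionary for comparison against the defaults. The function names parse_qconf and to_seconds are hypothetical, and the assumption that every time value has the h:m:s form is taken only from the listings above.

```python
def parse_qconf(text: str) -> dict:
    """Parse 'name   value' lines; a trailing backslash continues the value."""
    settings = {}
    pending = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith("\\"):
            # Continuation: keep the text so far and join it with the next line.
            pending += line[:-1].rstrip() + " "
            continue
        line = pending + line
        pending = ""
        parts = line.split(None, 1)
        if len(parts) == 2:  # skips header lines such as "global:"
            settings[parts[0]] = parts[1]
    return settings


def to_seconds(hms: str) -> int:
    """Convert an h:m:s value such as 0:0:30 or 00:00:32 to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# A few lines copied from the "qconf -sconf global" listing above.
sample = """\
global:
load_report_time             00:00:30
max_unheard                  00:00:32
loglevel                     log_info
reporting_params             accounting=true reporting=false \\
                             flush_time=00:00:15 joblog=false \\
                             sharelog=00:00:00
"""

conf = parse_qconf(sample)
for name, value in conf.items():
    print(f"{name} = {value}")
print("max_unheard in seconds:", to_seconds(conf["max_unheard"]))
```

In practice the text would come from saving the command output to a file first; the sketch deliberately does not call qconf itself.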
>>
>>
>> Andy Schwierskott wrote:
>>
>>> Alexandre,
>>>
>>> That's certainly a bug. To fix it, it would be helpful to get some
>>> ideas on how to reproduce it.
>>>
>>> Which SGE version are you using?
>>>
>>> Does it happen after restarting the scheduler?
>>>
>>> Can you set the loglevel to "log_info"? Maybe this gives more insight
>>> into what's going on.
>>>
>>> Andy
>>>
>>>> hello,
>>>>
>>>> Every second, the master writes these two lines to the "messages" log:
>>>> ----------------------------------------------------
>>>> 08/16/2004 11:24:23|qmaster|catleya|E|can't get task id
>>>> 08/16/2004 11:24:23|qmaster|catleya|E|reinitialization of "scheduler"
>>>> ----------------------------------------------------
>>>>
>>>> I have two questions:
>>>> 1. What problem is being reported in the log? (I should point out
>>>> that everything else is working fine with SGE on my cluster.)
>>>> 2. How can I prevent SGE from writing to the log file so often?
>>>>
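The two log lines quoted above are pipe-separated: timestamp, daemon, host, a one-letter level, and the message text. As a minimal sketch, assuming every line of the "messages" file follows that same five-field layout (inferred only from the two lines shown here), one could count how often each message repeats to see which ones are flooding the log. The names LogEntry, parse_line and count_messages are hypothetical.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class LogEntry(NamedTuple):
    timestamp: str  # e.g. "08/16/2004 11:24:23"
    daemon: str     # e.g. "qmaster"
    host: str       # e.g. "catleya"
    level: str      # one letter, "E" in the lines above
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one pipe-separated line; return None if it does not fit."""
    # maxsplit=4 keeps any "|" inside the message text intact.
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None
    return LogEntry(*parts)


def count_messages(lines: Iterable[str]) -> Counter:
    """Count how often each (level, message) pair occurs."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            counts[(entry.level, entry.message)] += 1
    return counts


# The two lines from the post; a real run would iterate over the open file.
sample = [
    "08/16/2004 11:24:23|qmaster|catleya|E|can't get task id",
    '08/16/2004 11:24:23|qmaster|catleya|E|reinitialization of "scheduler"',
]

for (level, message), n in count_messages(sample).most_common():
    print(f"{n:6d}  {level}  {message}")
```

This only summarizes what is already in the file; it does not stop qmaster from writing the entries.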
>>>> Reuti wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>>> our dual Pentium 4 Xeon 40 node cluster. The cluster is working on
>>>>>> bioinformatics applications that are developed using perl scripts
>>>>>> that call each other and wait for each other to finish. Hence at any
>>>>>> given point of time there are many executing scripts that are
>>>>>> actually waiting and this increases the load average artificially.
>>>>>> If I increase my load_threshold on
>>>>>>
>>>>>
>>>>> Can you provide some more details about your scripts? Do they start
>>>>> up as serial jobs, then start something in the background and poll
>>>>> for the results?
>>>>>
>>>>> Reuti
>>>>
>>>>
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>
>>>
>>>
>>
>
>
>

-- 
             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	     Alexandre Barras - Computing Support Group
        CERFACS, 42 Av. Coriolis, F-31057 TOULOUSE Cedex 1, FRANCE
        Tel.: (+33) [0]5 61 19 30 75   Fax: (+33) [0]5 61 19 30 00

             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



