[GE users] SDM GE adapter sve got in trouble

cbyun cbyun at ll.mit.edu
Mon Aug 24 14:50:51 BST 2009


Michal,

> -----Original Message-----
> From: Michal.Bachorik at sun.com [mailto:Michal.Bachorik at sun.com]
> Sent: Monday, August 24, 2009 9:23 AM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] SDM GE adapter sve got in trouble
>
> Chansup,
>
> is something called "STU_name" actually part of your changes? The error
> says something about "STU_name" not being part of the descriptor, so I'd
> like to know whether it is something you introduced or touched.
>

No, there is nothing called "STU_name" in my configuration.
Here is my configuration:

# qconf -sconf
#global:
execd_spool_dir              /var/spool/sge
mailer                       /bin/mail
xterm                        /usr/bin/X11/xterm
load_sensor                  none
prolog                       none
epilog                       none
shell_start_mode             posix_compliant
login_shells                 sh,ksh,csh,tcsh
min_uid                      0
min_gid                      0
user_lists                   none
xuser_lists                  none
projects                     none
xprojects                    none
enforce_project              false
enforce_user                 auto
load_report_time             00:00:40
max_unheard                  00:05:00
reschedule_unknown           00:00:00
loglevel                     log_warning
administrator_mail           none
set_token_cmd                none
pag_cmd                      none
token_extend_time            none
shepherd_cmd                 none
qmaster_params               none
execd_params                 none
reporting_params             accounting=true reporting=true \
                             flush_time=00:00:15 joblog=true sharelog=00:00:00
finished_jobs                100
gid_range                    20000-20100
qlogin_command               builtin
qlogin_daemon                builtin
rlogin_command               builtin
rlogin_daemon                builtin
rsh_command                  builtin
rsh_daemon                   builtin
max_aj_instances             2000
max_aj_tasks                 75000
max_u_jobs                   0
max_jobs                     0
max_advance_reservations     0
auto_user_oticket            0
auto_user_fshare             0
auto_user_default_project    none
auto_user_delete_time        86400
delegated_file_staging       false
reprioritize                 0
jsv_url                      none
libjvm_path                  /usr/java/latest/jre/lib/amd64/server/libjvm.so
additional_jvm_args          -Xmx256m
jsv_allowed_mod              ac,h,i,e,o,j,M,N,p,w


# qconf -ssconf
algorithm                         default
schedule_interval                 0:2:0
maxujobs                          0
queue_sort_method                 load
job_load_adjustments              NONE
load_adjustment_decay_time        0:0:0
load_formula                      np_load_avg
schedd_job_info                   true
flush_submit_sec                  2
flush_finish_sec                  2
params                            none
reprioritize_interval             0:0:0
halftime                          168
usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
compensation_factor               5.000000
weight_user                       0.250000
weight_project                    0.250000
weight_department                 0.250000
weight_job                        0.250000
weight_tickets_functional         0
weight_tickets_share              0
share_override_tickets            TRUE
share_functional_shares           TRUE
max_functional_jobs_to_schedule   200
report_pjob_tickets               FALSE
max_pending_tasks_per_job         50
halflife_decay_list               none
policy_hierarchy                  OFS
weight_ticket                     0.010000
weight_waiting_time               0.000000
weight_deadline                   3600000.000000
weight_urgency                    0.100000
weight_priority                   1.000000
max_reservation                   0
default_duration                  INFINITY


# qconf -srqs
{
   name         host_slot_limit
   description  Limit total number of slots per hosts (assume uniform machines)
   enabled      TRUE
   limit        hosts {@allhosts} to slots=2
}
{
   name         max_u_jobs
   description  max jobs per user
   enabled      TRUE
   limit        users {*} to slots=256
}


# for i in `qconf -sql`; do echo " ";echo Qname: $i; qconf -sq $i; done

Qname: all.q
qname                 all.q
hostlist              @nohosts
seq_no                0
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               make
rerun                 FALSE
slots                 2
tmpdir                /tmp
shell                 /bin/csh
prolog                NONE
epilog                NONE
shell_start_mode      posix_compliant
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            NONE
xuser_lists           NONE
subordinate_list      NONE
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY

Qname: normal
qname                 normal
hostlist              @allhosts
seq_no                0
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               make
rerun                 FALSE
slots                 2
tmpdir                /tmp
shell                 /bin/csh
prolog                NONE
epilog                NONE
shell_start_mode      posix_compliant
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            NONE
xuser_lists           NONE
subordinate_list      NONE
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY

Qname: pmatlab
qname                 pmatlab
hostlist              @allhosts
seq_no                0
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               make
rerun                 FALSE
slots                 2
tmpdir                /tmp
shell                 /bin/csh
prolog                NONE
epilog                NONE
shell_start_mode      posix_compliant
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            NONE
xuser_lists           NONE
subordinate_list      NONE
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY
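
Since the three queue dumps above differ only in qname and hostlist, a short filter makes the difference easier to see. This is only a sketch: the function name queue_hostlists is made up for illustration, and it assumes each hostlist fits on one line, as it does in the output above.

```shell
#!/bin/sh
# queue_hostlists (hypothetical helper): reads concatenated `qconf -sq` output
# on stdin and prints one "<qname> <hostlist>" line per queue.
# Assumption: the hostlist is not continued onto further lines with "\".
queue_hostlists() {
    awk '$1 == "qname"    { q = $2 }
         $1 == "hostlist" { $1 = ""; sub(/^ +/, ""); print q, $0 }'
}

# Usage against a live cluster (assumes qconf is on PATH):
#   for q in $(qconf -sql); do qconf -sq "$q"; done | queue_hostlists
```

The "Qname:" lines printed by the loop above are ignored because awk compares the first field case-sensitively.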


Currently no hosts are assigned to either host group, since no hosts are being used by the SGE adapter service at the moment:

# qconf -shgrp @allhosts
group_name @allhosts
hostlist NONE

# qconf -shgrp @nohosts
group_name @nohosts
hostlist NONE
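
The same check can be run over every host group at once. The sketch below assumes `qconf -shgrpl` lists the host group names and that `qconf -shgrp` prints a hostlist line in the form shown above; the function name hgrp_is_empty is made up for illustration.

```shell
#!/bin/sh
# hgrp_is_empty (hypothetical helper): reads `qconf -shgrp <group>` output on
# stdin and succeeds (exit 0) only if the hostlist line is exactly "NONE".
hgrp_is_empty() {
    awk '$1 == "hostlist" { found = 1; if ($2 != "NONE" || NF > 2) nonempty = 1 }
         END { if (found && !nonempty) exit 0; exit 1 }'
}

# Usage against a live cluster (assumes qconf is on PATH):
#   for g in $(qconf -shgrpl); do
#       qconf -shgrp "$g" | hgrp_is_empty && echo "$g: no hosts assigned"
#   done
```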

I also turned on the exclusive mode:

# qconf -sc
#name               shortcut   type        relop   requestable consumable default  urgency
#------------------------------------------------------------------------------------------
...
exclusive           excl       BOOL        EXCL    YES         YES        0        1000
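
To verify the complex definition without reading the whole `qconf -sc` table, a filter along these lines could be used. It is a sketch under the assumption that the columns appear in the order shown in the header above; the function name complex_is_exclusive is made up for illustration.

```shell
#!/bin/sh
# complex_is_exclusive (hypothetical helper): reads `qconf -sc` output on stdin
# and succeeds only if the named complex has relop EXCL and consumable YES.
# Assumed columns: name shortcut type relop requestable consumable default urgency
complex_is_exclusive() {
    awk -v name="$1" '
        /^#/ { next }    # skip the header and separator lines
        $1 == name && $4 == "EXCL" && $6 == "YES" { ok = 1 }
        END { exit !ok }'
}

# Usage against a live cluster (assumes qconf is on PATH):
#   qconf -sc | complex_is_exclusive exclusive && echo "exclusive complex is defined"
```

As far as I understand, a job would then ask for a whole host with `-l exclusive=true`, provided the execution hosts also carry `exclusive=true` in their complex_values.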


Thanks,
- Chansup




> M.
>
>
> cbyun wrote:
> > Michal,
> >
> > Yes, I made a few changes in the SGE configuration.
> >
> > I added a couple of RQS rules, a couple of new cluster queues, and a new
> > hostgroup, @nohosts.
> >
> > Then, in order to keep all.q from being used, I assigned the @nohosts
> > group to all.q.
> >
> > I believe the issue appeared after these customizations.
> >
> > Thanks,
> > - Chansup
> >
> >
> >
> >> -----Original Message-----
> >> From: Michal.Bachorik at sun.com [mailto:Michal.Bachorik at sun.com]
> >> Sent: Monday, August 24, 2009 3:28 AM
> >> To: users at gridengine.sunsource.net
> >> Subject: Re: [GE users] SDM GE adapter sve got in trouble
> >>
> >> Chansup,
> >>
> >> has your SGE changed in any way? There is an error coming from jgdi
> >> (from the SGE side). Did you do some kind of "upgrade/downgrade" of
> >> jgdi.jar? I have not seen such an error before, so I will need to dig
> >> into it. I will let you know once I find something.
> >>
> >> Regards,
> >>
> >> Michal
> >>
> >> cbyun wrote:
> >>
> >>> Hi,
> >>>
> >>> Somehow my ge adapter service got in trouble and I couldn't start it
> >>> any more:
> >>>
> >>> # sdmadm suc -c gesvc2
> >>> comp   host            message
> >>> ----------------------------------------
> >>> gesvc2 llgriddev.local startup triggered
> >>>
> >>>
> >>> 08/21/2009 16:25:11|20|e.impl.ge.GEServiceAdapterImpl.doStartService|I|Service gesvc2: Starting Grid Engine service
> >>> 08/21/2009 16:25:11|20|rm.service.impl.AbstractServiceAdapter$1.call|E|Service startup failed: jgdi error: java.lang.IllegalStateException: content field STU_name not found in descriptor
> >>> |set_object_attribute: set_list of property reportVariables failed
> >>> |
> >>> 08/21/2009 16:25:11|21|rm.impl.AbstractComponent$3.performTransition|W|Componentgesvc2: Error in startup procedure: Service gesvc2: Unexpected error in state transition UnknownStateHandler[UNKNOWN] -> StartingStateHandler[STARTING]: Service startup failed: jgdi error: java.lang.IllegalStateException: content field STU_name not found in descriptor
> >>> |set_object_attribute: set_list of property reportVariables failed
> >>> |
> >>>
> >>> Is there any way to clear up this error?
> >>>
> >>> Thanks,
> >>> - Chansup
> >>>
> >>> ------------------------------------------------------
> >>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=213536
> >>>
> >>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
> >>
> >> ------------------------------------------------------
> >> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=213890
> >>
> >> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
> >>
> >
> > ------------------------------------------------------
> > http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=213947
> >
> > To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
> >
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=213956
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=213969

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


