[GE users] one node keeps going into error state

David Mathog mathog at mendel.bio.caltech.edu
Tue Nov 23 18:01:57 GMT 2004


> Ah ... actually I meant the administrator abort mail.
> That one is more detailed than user mail.

Near as I can tell, that WAS the administrator abort mail.

For whatever it's worth, here is the config file:

%cat /usr/SGE/default/common/configuration
conf_version           0
qmaster_spool_dir      /usr/SGE/default/spool/qmaster
execd_spool_dir        /usr/SGE/default/spool
binary_path            /usr/SGE/bin
mailer                 /bin/mail
xterm                  /usr/bin/X11/xterm
load_sensor            none
prolog                 none
epilog                 none
shell_start_mode       posix_compliant
login_shells           sh,ksh,csh,tcsh
min_uid                0
min_gid                0
user_lists             none
xuser_lists            none
load_report_time       00:00:40
stat_log_time          48:00:00
max_unheard            00:05:00
loglevel               log_warning
administrator_mail     root at saf.bio.caltech.edu
set_token_cmd          none
pag_cmd                none
token_extend_time      none
shepherd_cmd           none
qmaster_params         none
schedd_params          none
execd_params           none
finished_jobs          100
gid_range              20000-20100
admin_user             sgeadm
qlogin_command         telnet
qlogin_daemon          /usr/sbin/in.telnetd
rlogin_daemon          /usr/sbin/in.rlogind
default_domain         none
ignore_fqdn            true
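The configuration dump is a plain `name value` listing, so it can also be read by a script, for instance to work out where the execution daemon on a given host writes its messages file. This is a minimal Python sketch: `parse_sge_config` and `execd_messages_path` are hypothetical helpers, and the `<execd_spool_dir>/<host>/messages` layout is an assumption based on the /usr/SGE/default/spool/mendel/messages path quoted later in this thread.

```python
import os


def parse_sge_config(text):
    """Parse a Grid Engine global configuration dump into a dict.

    Assumes one setting per line: a name, whitespace, then the value,
    which may itself contain spaces (e.g. 'root at saf.bio.caltech.edu').
    Blank lines, comments and a pasted shell prompt line ('%cat ...')
    are skipped.
    """
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("%", "#")):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace only
        conf[parts[0]] = parts[1] if len(parts) > 1 else ""
    return conf


def execd_messages_path(conf, host):
    """Hypothetical helper: assumes <execd_spool_dir>/<host>/messages."""
    return os.path.join(conf["execd_spool_dir"], host, "messages")
```

For example, feeding it the file above and asking for host `mendel` yields /usr/SGE/default/spool/mendel/messages, the execd log suggested further down.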

I tried upping the loglevel to log_info but it didn't reveal anything
extra and the administrator email was the same.
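Since raising the log level surfaced nothing new, one cheap cross-check is to pull every line that mentions the failing job id out of the spool messages files suggested further down in this thread. This is a minimal Python sketch: `lines_for_job` is a hypothetical helper, it assumes the job id appears literally in the relevant log lines, and it reads the files as Latin-1 so that stray non-UTF-8 bytes never abort the scan.

```python
import os
import re

# Log files suggested in this thread; the per-host path assumes
# execd_spool_dir = /usr/SGE/default/spool and host "mendel".
DEFAULT_LOGS = (
    "/usr/SGE/default/spool/qmaster/messages",
    "/usr/SGE/default/spool/qmaster/schedd/messages",
    "/usr/SGE/default/spool/mendel/messages",
)


def lines_for_job(job_id, paths=DEFAULT_LOGS):
    """Return (path, line) pairs whose text mentions job_id as a whole number.

    Word boundaries keep 4352 from matching 43521, while still matching
    forms such as '4352.1'. Files missing on this machine are skipped.
    """
    pattern = re.compile(r"\b" + re.escape(str(job_id)) + r"\b")
    hits = []
    for path in paths:
        if not os.path.exists(path):
            continue  # e.g. the execd spool dir only exists on the exec host
        with open(path, encoding="latin-1") as fh:
            for line in fh:
                if pattern.search(line):
                    hits.append((path, line.rstrip("\n")))
    return hits
```

For example, `lines_for_job(4352)` returns matches from whichever of the three files exist on the machine where it runs, so it may need to be run on both the qmaster host and mendel.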


Job 4352 caused action: All Queues on host "mendel" set to ERROR
 User        = safrun
 Queue       = testm
 Host        = mendel
 Start Time  = <unknown>
 End Time    = <unknown>
failed before prolog:shepherd exited with exit status 7
Shepherd pe_hostfile:
mendel 1 testm UNDEFINED
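The abort mail is regular enough to pick apart by script, which makes it easy to compare, say, job 4347 with job 4352. This is a minimal Python sketch: `parse_abort_mail` is a hypothetical helper that assumes the layout of the two mails in this thread (a "Job N caused action" line, "Key = value" lines, a "shepherd exited with exit status N" line, then entries after "Shepherd pe_hostfile:"). It leaves the pe_hostfile entries as raw whitespace-separated tokens rather than guessing what each column means.

```python
import re


def parse_abort_mail(text):
    """Extract job id, action, key/value fields, shepherd exit status and
    raw pe_hostfile entries from a Grid Engine abort mail body."""
    info = {"fields": {}, "pe_hostfile": []}
    in_hostfile = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            if in_hostfile and info["pe_hostfile"]:
                break  # a blank line ends the hostfile listing
            continue
        if in_hostfile:
            info["pe_hostfile"].append(line.split())
            continue
        m = re.match(r"Job (\d+) caused action: (.*)", line)
        if m:
            info["job"] = int(m.group(1))
            info["action"] = m.group(2)
            continue
        m = re.search(r"shepherd exited with exit status (\d+)", line)
        if m:
            info["shepherd_exit"] = int(m.group(1))
            info["failure"] = line  # e.g. 'failed before prolog:...'
            continue
        if line.startswith("Shepherd pe_hostfile:"):
            in_hostfile = True
            continue
        key, sep, value = line.partition("=")
        if sep:
            info["fields"][key.strip()] = value.strip()
    return info
```

Run over both mails, it would show the same queue, host and shepherd exit status 7 for jobs 4347 and 4352, which points at something host-specific on mendel rather than at the individual jobs.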

What does the line "Shepherd pe_hostfile" indicate?

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
> 
> Andreas
> 
> On Tue, 23 Nov 2004, David Mathog wrote:
> 
> >
> >
> > > On Mon, 22 Nov 2004, Chris Dagdigian wrote:
> > >
> > > > Hi David,
> > > >
> > > > Anything informative in the spool log files?
> > > >
> > > > /usr/SGE/default/spool/qmaster/messages
> > > > /usr/SGE/default/spool/qmaster/schedd/messages
> > > >
> > > > And especially:
> > > >
> > > > /usr/SGE/default/spool/mendel/messages
> > >
> > > Or try user abort mail as described in "Trouble Shooting" HOWTO
> > >
> > >
> > > http://gridengine.sunsource.net/project/gridengine/howto/troubleshooting.html
> > >
> > >
> > Pretty much the same thing as in the log files:
> >
> > Job 4347 caused action: All Queues on host "mendel" set to ERROR
> > User        = safrun
> > Queue       = testm
> > Host        = mendel
> > Start Time  = <unknown>
> > End Time    = <unknown>
> > failed before prolog:shepherd exited with exit status 7
> > Shepherd pe_hostfile:
> > mendel 1 testm UNDEFINED
> >
> > Thanks,
> >
> > David Mathog
> > mathog at caltech.edu
> > Manager, Sequence Analysis Facility, Biology Division, Caltech
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > For additional commands, e-mail: users-help at gridengine.sunsource.net
> >
> >
> 