[GE users] jobs in queue always going to "transfer" status

Rayson Ho rayrayson at gmail.com
Thu Oct 2 19:59:48 BST 2008


On 10/2/08, Sean Davis <sdavis2 at mail.nih.gov> wrote:
> The machine is using openldap-2.4.9.  It looks like this bug was fixed
> some time ago (unless it has re-emerged), or am I reading the bug
> report incorrectly?

Actually, I simply googled the stack trace and found that bug... Other
openSUSE 11 users have also reported the same problem:

http://lists.opensuse.org/opensuse-bugs/2008-07/msg03377.html

I think it was finally fixed in 2.4.9-1:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=484802

If you upgrade to OpenLDAP 2.4.9-1 and the problem is still there, you
may want to contact the OpenLDAP mailing list directly, as it seems to
be an OpenLDAP issue rather than an SGE issue.

Rayson



>
> Sean
>
>
> > On 10/2/08, Sean Davis <sdavis2 at mail.nih.gov> wrote:
> >> Program received signal SIGABRT, Aborted.
> >> [Switching to Thread 0x7fbcfc4fa6f0 (LWP 18677)]
> >> 0x00007fbcfba5b5c5 in raise () from /lib64/libc.so.6
> >> (gdb) bt
> >> #0  0x00007fbcfba5b5c5 in raise () from /lib64/libc.so.6
> >> #1  0x00007fbcfba5cbb3 in abort () from /lib64/libc.so.6
> >> #2  0x00007fbcfba541e9 in __assert_fail () from /lib64/libc.so.6
> >> #3  0x00007fbcfad91613 in ber_flush2 () from /usr/lib64/liblber-2.4.so.2
> >> #4  0x00007fbcfafbb34c in ldap_int_flush_request ()
> >>   from /usr/lib64/libldap-2.4.so.2
> >> #5  0x00007fbcfafbb75f in ldap_send_server_request ()
> >>   from /usr/lib64/libldap-2.4.so.2
> >> #6  0x00007fbcfafbba10 in ldap_send_initial_request ()
> >>   from /usr/lib64/libldap-2.4.so.2
> >> #7  0x00007fbcfafab360 in ldap_search () from /usr/lib64/libldap-2.4.so.2
> >> #8  0x00007fbcfafab47a in ldap_search_st () from /usr/lib64/libldap-2.4.so.2
> >> #9  0x00007fbcfb1e4703 in ?? () from /lib64/libnss_ldap.so.2
> >> #10 0x00007fbcfb1e3a13 in ?? () from /lib64/libnss_ldap.so.2
> >> #11 0x00007fbcfb1e44ce in ?? () from /lib64/libnss_ldap.so.2
> >> #12 0x00007fbcfb1e4b5f in ?? () from /lib64/libnss_ldap.so.2
> >> #13 0x00007fbcfb1e5197 in _nss_ldap_getpwnam_r () from /lib64/libnss_ldap.so.2
> >> #14 0x00007fbcfb61814b in ?? () from /lib64/libnss_compat.so.2
> >> #15 0x00007fbcfb618417 in _nss_compat_getpwnam_r ()
> >>   from /lib64/libnss_compat.so.2
> >> #16 0x00007fbcfbaca01d in getpwnam_r () from /lib64/libc.so.6
> >> #17 0x000000000050a3cc in sge_getpwnam_r ()
> >> #18 0x00000000004280de in sge_exec_job ()
> >> #19 0x000000000042e60c in exec_job_or_task ()
> >> #20 0x000000000042e160 in sge_start_jobs ()
> >> #21 0x000000000042def0 in do_ck_to_do ()
> >> #22 0x0000000000427835 in sge_execd_process_messages ()
> >> #23 0x0000000000424b6d in main ()
> >>
> >> I didn't mention that we are running openSUSE 11 on this machine.
> >>
> >> uname -a
> >> Linux mahfouz 2.6.25.16-0.1-default #1 SMP 2008-08-21 00:34:25 +0200
> >> x86_64 x86_64 x86_64 GNU/Linux
> >>
> >> And the libc major version is 2.8, if I recall.
> >>
> >> Any other ideas before I try to compile a debugging version with some
> >> print statements?
> >>
> >> Thanks,
> >> Sean
> >>
> >>
> >> > On Wed, Oct 1, 2008 at 8:34 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
> >> >> And a couple more lines of interest, all from qmaster:
> >> >>
> >> >> 10/01/2008 20:24:17| timer|shakespeare|W|failed to deliver job 3265.1
> >> >> to queue "all.q@grass.nci.nih.gov"
> >> >> 10/01/2008 20:24:17| timer|shakespeare|E|got max. unheard timeout for
> >> >> target "execd" on host "grass.nci.nih.gov", can't deliver job "3265"
> >> >>
> >> >> The eight jobs before this one went into "run" status, one completed,
> >> >> and the next one was job 3265; it remains in "transfer" status.
> >> >>
> >> >> Sean
> >> >>
> >> >>> Thanks, Rayson.  This looks suspicious.  I'm not sure what to do with
> >> >>> this.  How does one end up with an unknown queue?  The timing was such
> >> >>> that I had submitted several jobs for testing to one of the machines
> >> >>> in question (i.e., qsub -q all.q@machine sleeper.sh).
> >> >>>
> >> >>> Sean
> >> >>>
> >> >>>>>
> >> >>>>> Thanks,
> >> >>>>> Sean
> >> >>>>>
> >> >>>>> ---------------------------------------------------------------------
> >> >>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> >> >>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
> >> >>>>>
> >> >>>>>
> >> >>>>
> >> >>>
> >> >>
> >> >
> >>
> >
>
