[GE users] jobs in queue always going to "transfer" status

Sean Davis sdavis2 at mail.nih.gov
Thu Oct 2 20:06:32 BST 2008


On Thu, Oct 2, 2008 at 2:59 PM, Rayson Ho <rayrayson at gmail.com> wrote:
> On 10/2/08, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>> The machine is using openldap-2.4.9.  It looks like this bug was fixed
>> some time ago (unless it has re-emerged), or am I reading the bug
>> report incorrectly?
>
> Actually, I simply googled the stack trace and found that bug... Other
> OpenSUSE 11 users also reported the same problem:
>
> http://lists.opensuse.org/opensuse-bugs/2008-07/msg03377.html
>
> I think it was finally fixed in 2.4.9-1:
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=484802
>
> If you upgrade to OpenLDAP 2.4.9-1 and the problem is still there, you
> may want to contact the OpenLDAP mailing list directly as it seems to
> be more of an OpenLDAP issue than an SGE issue.

Thanks for doing all my homework for me.  I'll try to fix the openldap
issue and hope that does it.

Sean
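Before or after upgrading, one way to check Rayson's diagnosis is to perform the same kind of user lookup that sge_execd makes (frames #16 and #17 of the backtrace, sge_getpwnam_r() calling getpwnam_r()) from a standalone process on the execution host. The following is a minimal sketch under stated assumptions, not part of SGE: it uses Python's standard pwd.getpwnam(), which on glibc systems resolves names through the same NSS stack configured in /etc/nsswitch.conf (libnss_compat, then libnss_ldap in this setup). The helper names lookup_home() and repeat_lookup() are hypothetical. If this process also dies with SIGABRT, the fault lies in the NSS/LDAP libraries rather than in SGE.

```python
import pwd
import sys


def lookup_home(name):
    """Resolve `name` through the system's NSS configuration (hypothetical helper).

    Returns the user's home directory, or None if no such user is known.
    A crash inside an NSS module (e.g. an assertion in libldap) would
    abort the whole process instead of returning.
    """
    try:
        return pwd.getpwnam(name).pw_dir
    except KeyError:
        # pwd.getpwnam() raises KeyError when the name is not found.
        return None


def repeat_lookup(name, rounds=100):
    """Look `name` up `rounds` times and return the set of distinct results.

    A healthy NSS setup should give one consistent answer every time.
    """
    return {lookup_home(name) for _ in range(rounds)}


if __name__ == "__main__":
    # Assumed usage: pass the job owner's login name as the first argument.
    user = sys.argv[1] if len(sys.argv) > 1 else "root"
    results = repeat_lookup(user)
    print(f"{user}: {sorted(results, key=str)}")
```

Run it on an affected execution host with the job owner's login name. A clean run does not fully exonerate the libraries, since sge_execd performs the lookup from a multithreaded daemon, but a crash here would point squarely at OpenLDAP/nss_ldap.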


>> > On 10/2/08, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>> >> Program received signal SIGABRT, Aborted.
>> >> [Switching to Thread 0x7fbcfc4fa6f0 (LWP 18677)]
>> >> 0x00007fbcfba5b5c5 in raise () from /lib64/libc.so.6
>> >> (gdb) bt
>> >> #0  0x00007fbcfba5b5c5 in raise () from /lib64/libc.so.6
>> >> #1  0x00007fbcfba5cbb3 in abort () from /lib64/libc.so.6
>> >> #2  0x00007fbcfba541e9 in __assert_fail () from /lib64/libc.so.6
>> >> #3  0x00007fbcfad91613 in ber_flush2 () from /usr/lib64/liblber-2.4.so.2
>> >> #4  0x00007fbcfafbb34c in ldap_int_flush_request ()
>> >>   from /usr/lib64/libldap-2.4.so.2
>> >> #5  0x00007fbcfafbb75f in ldap_send_server_request ()
>> >>   from /usr/lib64/libldap-2.4.so.2
>> >> #6  0x00007fbcfafbba10 in ldap_send_initial_request ()
>> >>   from /usr/lib64/libldap-2.4.so.2
>> >> #7  0x00007fbcfafab360 in ldap_search () from /usr/lib64/libldap-2.4.so.2
>> >> #8  0x00007fbcfafab47a in ldap_search_st () from /usr/lib64/libldap-2.4.so.2
>> >> #9  0x00007fbcfb1e4703 in ?? () from /lib64/libnss_ldap.so.2
>> >> #10 0x00007fbcfb1e3a13 in ?? () from /lib64/libnss_ldap.so.2
>> >> #11 0x00007fbcfb1e44ce in ?? () from /lib64/libnss_ldap.so.2
>> >> #12 0x00007fbcfb1e4b5f in ?? () from /lib64/libnss_ldap.so.2
>> >> #13 0x00007fbcfb1e5197 in _nss_ldap_getpwnam_r () from /lib64/libnss_ldap.so.2
>> >> #14 0x00007fbcfb61814b in ?? () from /lib64/libnss_compat.so.2
>> >> #15 0x00007fbcfb618417 in _nss_compat_getpwnam_r ()
>> >>   from /lib64/libnss_compat.so.2
>> >> #16 0x00007fbcfbaca01d in getpwnam_r () from /lib64/libc.so.6
>> >> #17 0x000000000050a3cc in sge_getpwnam_r ()
>> >> #18 0x00000000004280de in sge_exec_job ()
>> >> #19 0x000000000042e60c in exec_job_or_task ()
>> >> #20 0x000000000042e160 in sge_start_jobs ()
>> >> #21 0x000000000042def0 in do_ck_to_do ()
>> >> #22 0x0000000000427835 in sge_execd_process_messages ()
>> >> #23 0x0000000000424b6d in main ()
>> >>
>> >> I didn't mention that we are running openSUSE 11 on this machine.
>> >>
>> >> uname -a
>> >> Linux mahfouz 2.6.25.16-0.1-default #1 SMP 2008-08-21 00:34:25 +0200
>> >> x86_64 x86_64 x86_64 GNU/Linux
>> >>
>> >> And the libc major version is 2.8, if I recall.
>> >>
>> >> Any other ideas before I try to compile a debugging version with some
>> >> print statements?
>> >>
>> >> Thanks,
>> >> Sean
>> >>
>> >>
>> >> > On Wed, Oct 1, 2008 at 8:34 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>> >> >> And a couple more lines of interest, all from qmaster:
>> >> >>
>> >> >> 10/01/2008 20:24:17| timer|shakespeare|W|failed to deliver job 3265.1 to queue "all.q@grass.nci.nih.gov"
>> >> >> 10/01/2008 20:24:17| timer|shakespeare|E|got max. unheard timeout for target "execd" on host "grass.nci.nih.gov", can't deliver job "3265"
>> >> >>
>> >> >> The eight jobs before this one went into "run" status, one completed,
>> >> >> and the next one was job 3265; it remains in "transfer" status.
>> >> >>
>> >> >> Sean
>> >> >>
>> >> >>> Thanks, Rayson.  This looks suspicious.  I'm not sure what to do with
>> >> >>> this.  How does one end up with an unknown queue?  The timing was such
>> >> >>> that I had submitted several jobs for testing to one of the machines
>> >> >>> in question (i.e., qsub -q all.q@machine sleeper.sh).
>> >> >>>
>> >> >>> Sean
>> >> >>>
>> >> >>>>>
>> >> >>>>> Thanks,
>> >> >>>>> Sean
>> >> >>>>>
>> >> >>>>> ---------------------------------------------------------------------
>> >> >>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> >> >>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net

More information about the gridengine-users mailing list