[GE users] Open MPI tight integration in HOWTO page

Andreas.Haas at Sun.COM Andreas.Haas at Sun.COM
Mon Feb 5 13:06:11 GMT 2007


On Fri, 2 Feb 2007, Heywood, Todd wrote:

> Hi,
>
> If you recall, I had 2 classes of errors: (1) the GMSH error while jobs
> give output, and (2) complete failure for a large enough number of MPI
> tasks, sometimes giving a grab bag of error messages (see first post on
> this thread), and sometimes giving no output, but with qstat saying
> "critical error: unrecoverable error - contact systems manager.
> Aborted". This second case might be related to LDAP, as I found the
> following messages in /var/log/messages of the job nodes:
>
> Feb  2 14:06:16 blade183 sge_execd: nss_ldap: reconnecting to LDAP
> server...
> Feb  2 14:06:16 blade183 sge_execd: nss_ldap: reconnected to LDAP server
> after 1 attempt(s)
> Feb  2 14:06:16 blade183 sge_shepherd-9194: nss_ldap: reconnecting to
> LDAP server...
> Feb  2 14:06:16 blade183 sge_shepherd-9194: nss_ldap: reconnected to
> LDAP server after 1 attempt(s)
> Feb  2 14:06:17 blade183 sge_shepherd-9194: nss_ldap: reconnecting to
> LDAP server...
> Feb  2 14:06:17 blade183 sge_shepherd-9194: nss_ldap: reconnected to
> LDAP server after 1 attempt(s)
> Feb  2 14:07:19 blade183 sge_shepherd-9194: nss_ldap: reconnecting to
> LDAP server...
> Feb  2 14:07:19 blade183 sge_shepherd-9194: nss_ldap: reconnected to
> LDAP server after 1 attempt(s)
>
> Googling for LDAP plus various cluster/MPI/scalability topics turns up
> nothing.

Well, something seems to be wrong with your set-up in general, but I
can't do more than guess. Try searching for "reconnecting LDAP
nss_ldap"; that returns a number of hits.
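
To check whether the reconnects pile up on particular nodes or
processes, the messages could be tallied with something like the
sketch below. It assumes the usual syslog layout seen in the excerpt
above (timestamp, host, process, message), and the name
count_reconnects is just an illustrative helper, not part of SGE or
nss_ldap.

```python
import re
from collections import Counter

# Assumed syslog layout, taken from the excerpt quoted above, e.g.:
#   Feb  2 14:06:16 blade183 sge_execd: nss_ldap: reconnecting to LDAP server...
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}\s+"        # timestamp
    r"(?P<host>\S+)\s+"                           # host name
    r"(?P<proc>[^:\s]+):\s+"                      # process tag
    r"nss_ldap: reconnecting"                     # reconnect attempt only
)


def count_reconnects(lines):
    """Count nss_ldap reconnect attempts per (host, process) pair.

    Hypothetical helper: 'lines' is any iterable of syslog lines,
    for example an open /var/log/messages file.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[(match.group("host"), match.group("proc"))] += 1
    return counts
```

Passing it an open file handle for /var/log/messages on each job node
and comparing the counts against the job start times should show
whether the reconnects coincide with the failing runs.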

Note, in an earlier mail Reuti already asked this:

    "Are you using any special communication lib? Myrinet, Infiniband,... ?"

If so, there is a chance the failures are somehow related to it.

Regards,
Andreas
