[GE users] semaphore leftovers

Ron Chen ron_chen_123 at yahoo.com
Thu Feb 10 00:49:38 GMT 2005


I think you need to prevent users from bypassing SGE;
otherwise there is no easy solution.

 -Ron

--- David Farrell <d-farrell2 at northwestern.edu> wrote:
> Yes, this is MPICH; I will give this a try. The issue here is that
> the users tend to use Ctrl-C to kill a job when running in
> interactive mode, rather than a more elegant technique. In a way,
> the users are bypassing SGE by doing this, and this may be part of
> the problem. I think that having an epilog script to clean up after
> they have ended the job may work well, but I wonder if there are
> other ways (just so I know). It appears they are using the cluster
> to debug their code, and in so doing they end up using it like a
> series of workstations through interactive shells. Perhaps there is
> a better way for them to use the nodes in this manner?
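
A minimal sketch of what such an epilog cleanup might look like, assuming
Linux nodes where `ipcs -s` prints rows as key, semid, owner, perms, nsems
and `ipcrm -s <semid>` removes one semaphore set. The function names are
hypothetical and nothing here is part of SGE. It removes every semaphore
set the given owner holds on the node, so it would also hit another job
of the same user running there.

```python
import subprocess


def semaphores_owned_by(ipcs_output, owner):
    """Return the semaphore IDs owned by `owner`, parsed from `ipcs -s` text."""
    ids = []
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Assumed data row layout: key semid owner perms nsems
        # Header and separator lines do not start with a hex key.
        if (len(fields) >= 3 and fields[0].startswith("0x")
                and fields[1].isdigit() and fields[2] == owner):
            ids.append(int(fields[1]))
    return ids


def cleanup(owner):
    """Remove each semaphore set listed for `owner`; return the IDs removed."""
    listing = subprocess.run(
        ["ipcs", "-s"], capture_output=True, text=True, check=True
    ).stdout
    removed = []
    for semid in semaphores_owned_by(listing, owner):
        # ipcrm -s <semid> removes a single semaphore set by ID
        if subprocess.run(["ipcrm", "-s", str(semid)]).returncode == 0:
            removed.append(semid)
    return removed
```

An epilog script could call `cleanup(owner)` with the job owner's login
name. How the owner is obtained (for example from the USER environment
variable) depends on how the epilog is configured.
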
> 
> >
> > Another solution, for dynamically linked applications:
> >
> > A wrapper library that traps semget(), shmget(), msgget() and so
> > on, loaded before the job via LD_PRELOAD. The wrapper calls the
> > real semget() and remembers the assigned IDs on return to the
> > application. When you shut down the application with qdel, you
> > would know all the IDs of the semaphores you have to remove. It's
> > just an idea, but it would be a cool addition to SGE.
> That does sound interesting, but I am not sure my skills are up to
> the task.
> 
> Thanks again,
> 
> Dave
> 
> >
> > Cheers - Reuti
> >
> >
> > Quoting David Farrell <d-farrell2 at northwestern.edu>:
> >
> >> I am running into an issue where semaphores build up until users
> >> can no longer start jobs and, on occasion, jobs die prematurely.
> >> The errors point to the semaphore issue, and cleaning it up
> >> manually has become a bother, as users wish to use the machine
> >> for testing (so abnormal exits are common). Is there any good
> >> solution to this problem? I have heard that running a cleanup
> >> script as a cron job sometimes results in jobs being killed, so I
> >> would be interested in other possibilities. In addition, I
> >> sometimes see a situation in which running the cleanipcs script
> >> as root does not clean out some of the semaphores. Is there any
> >> solution?
> >>
> >> Thanks in advance,
> >>
> >> Dave
> >>
> >>
> >>
> >> David E. Farrell
> >> Graduate Student
> >> Mechanical Engineering
> >> Northwestern University
> >> email: d-farrell2 at northwestern.edu
> >
> David E. Farrell
> Graduate Student
> Mechanical Engineering
> Northwestern University
> email: d-farrell2 at northwestern.edu
> 



		

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



