[GE users] semaphore leftovers

Reuti reuti at staff.uni-marburg.de
Wed Feb 9 22:33:05 GMT 2005


Hi,

Are you using MPICH, given that you mention cleanipcs? One solution would be to 
compile MPICH without shared memory support. cleanipcs simply removes all of 
one user's IPC resources on a node. If that user has two jobs on the node, the 
second may be killed by accident when the "wrong" semaphores are removed. 
Instead of running cleanipcs from a cron job, you could call it from 
stop_proc_args or the queue epilog.

Another solution, for dynamically linked applications:

A wrapper library that intercepts semget(), shmget(), msgget() and so on, 
loaded ahead of the job by means of LD_PRELOAD. The wrapper calls the real 
semget() and records the assigned id before returning it to the application. 
When you shut the application down with qdel, you would then know the ids of 
all the semaphores you have to remove. It's just an idea, but it would be a 
cool addition to SGE.
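
To make the wrapper idea concrete, here is a minimal, untested-in-production 
sketch in C covering semget() only. The file name ipc_track.c, the helper 
ipc_track_record() and the environment variable IPC_TRACK_LOG are all made up 
for illustration and are not part of SGE or MPICH. It assumes Linux with glibc 
and a dynamically linked application.

```c
/* ipc_track.c: sketch of an LD_PRELOAD wrapper that logs SysV semaphore ids.
 * Assumed build line: gcc -shared -fPIC -o libipctrack.so ipc_track.c -ldl
 * IPC_TRACK_LOG is a hypothetical variable naming the per-job log file. */
#define _GNU_SOURCE            /* needed for RTLD_NEXT on glibc */
#include <dlfcn.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Append "kind id" to the file named by IPC_TRACK_LOG; 0 on success, -1 otherwise. */
int ipc_track_record(const char *kind, int id)
{
    const char *path = getenv("IPC_TRACK_LOG");
    if (path == NULL)
        return -1;                      /* tracking not enabled for this process */
    FILE *f = fopen(path, "a");
    if (f == NULL)
        return -1;
    fprintf(f, "%s %d\n", kind, id);
    fclose(f);
    return 0;
}

/* Interposed semget(): call the real one, then remember the id it returned. */
int semget(key_t key, int nsems, int semflg)
{
    static int (*real_semget)(key_t, int, int);

    if (real_semget == NULL)
        real_semget = (int (*)(key_t, int, int))dlsym(RTLD_NEXT, "semget");
    if (real_semget == NULL) {
        errno = ENOSYS;                 /* could not locate the libc version */
        return -1;
    }

    int id = real_semget(key, nsems, semflg);
    if (id >= 0)
        ipc_track_record("sem", id);    /* failures to log are ignored on purpose */
    return id;
}
```

The job script would set LD_PRELOAD to the built library and IPC_TRACK_LOG to 
a per-job file (for example one under $TMPDIR) before starting the 
application. The epilog could then read that file and remove each id, e.g. 
with "ipcrm -s <id>" or semctl(id, 0, IPC_RMID). The same id can appear more 
than once in the log, and some ids may already have been removed by the 
application itself, so the cleanup has to tolerate both cases. shmget() and 
msgget() would need wrappers of the same shape.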

Cheers - Reuti


Quoting David Farrell <d-farrell2 at northwestern.edu>:

> I am running into the issue where semaphores build up and create a 
> situation in which users can no longer start jobs, and on occasion, jobs 
> die prematurely. The errors point to the semaphore issue and cleaning 
> it manually has become a bother, as users wish to use the machine for 
> testing (so abnormal exits are common). Is there any good solution to 
> this problem? I have heard that making a cleanup script into a cron job 
> sometimes results in jobs being killed, so I would be interested in 
> other possibilities. In addition, I sometimes see a situation in which 
> running the cleanipcs script as root does not clean out some of the 
> semaphores. Is there any solution?
> 
> Thanks in advance,
> 
> Dave
> 
> 
> 
> David E. Farrell
> Graduate Student
> Mechanical Engineering
> Northwestern University
> email: d-farrell2 at northwestern.edu


