[GE users] nodes overloaded: processes placed on already full nodes

reuti reuti at staff.uni-marburg.de
Tue Dec 21 18:31:02 GMT 2010


Am 21.12.2010 um 19:21 schrieb steve_s:

> On Dec 21 18:22 +0100, reuti wrote:
>>>> What does:
>>>> 
>>>> ps -e f
>>>> 
>>>> (f w/o -) show on such a node? Are all the processes bound to an
>>>> sge_shepherd, or did some jump out of the processes tree and weren't
>>>> killed?
>>> 
>>> There are no sge_shepherd processes on the nodes. I did not set up SGE
>>> on the machine, but what I understand from the documentation is that
>>> sge_shepherd is only used in the case of "tight integration" of PEs.
>>> In our case, the PE starts the MPI processes.
>> 
>> Well, even with a loose integration, you have to honor the list of
>> granted machines for your job. What do you mean in detail by "the PE
>> starts the MPI processes"? You will need at least an sge_execd on the
>> nodes, so that SGE is aware of their existence and can make a suitable
>> slot allocation for your job. (The sge_execd will then start the
>> shepherd in case of a tight integration.)
> 
> Yes, sge_execd is present on each node, as well as sge_shepherd-$JOB_ID
> on the master node, where the job-script is executed:
> 
> 4693 ?        Sl    33:32 /cm/shared/apps/sge/current/bin/lx26-amd64/sge_execd
> 12165 ?        S      0:00  \_ sge_shepherd-60013 -bg
> 12389 ?        S      0:00                  \_ python /cm/shared/apps/intel/impi/3.2.2.006/bin64/mpiexec ....
> 
> 
> Apparently, we have tight integration then. I looked for sge_shepherd
> on the wrong node (not the master node). This is the first time I have
> taken a closer look at these daemons, hence some confusion here (we got
> the machine pre-configured and all, and getting familiar with the system
> always takes a factor of pi longer than expected). Sorry for the noise.

The sge_shepherd will be started on each slave node in the case of a tight integration, too. With a loose integration and no sge_shepherd on the slaves, there may be processes which survive the crash of a job and hence produce the effect you observed, simply because SGE doesn't know anything about processes started by a plain rsh/ssh outside of SGE's context.
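
The core of a tight integration is to start the remote processes with "qrsh -inherit" instead of rsh/ssh, so that each of them runs under an sge_shepherd on its slave node. A minimal sketch of such a wrapper (the file name is made up; the real recipe is in the Howto linked below):

#!/bin/sh
# rsh-wrapper.sh (hypothetical name): hand the remote command over to
# "qrsh -inherit", so it is started by the sge_execd/sge_shepherd on the
# slave node, is accounted for, and dies together with the job.
host="$1"; shift
exec qrsh -inherit -nostdin "$host" "$@"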

There is a Howto for the tight integration into SGE of MPICH2 prior to 1.3 and of the Intel MPI you are using, and one about removing processes which escaped SGE's control:

http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-integration.html

http://gridengine.sunsource.net/howto/remove_orphaned_processes.html
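
Until that is in place, you can spot orphans on a node by looking for MPI processes which were re-parented to init after their job vanished. A rough sketch (the user name is just an example):

# list processes of user "steve" whose parent is init (pid 1)
ps -eo pid,ppid,user,args | awk '$2 == 1 && $3 == "steve"'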

Intel MPI will at some point in the future also use the Hydra process manager, which MPICH2 does from version 1.3 on.
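
Once that happens, the integration gets simpler, roughly along these lines (a sketch only; the option and variable names are assumptions and depend on the MPI version actually installed):

# MPICH2 >= 1.3: Hydra asks SGE for the granted hosts and launches via it
mpiexec.hydra -rmk sge -np $NSLOTS ./my_mpi_program
# Intel MPI (assumed spelling of the bootstrap variable):
# I_MPI_HYDRA_BOOTSTRAP=sge mpiexec.hydra -np $NSLOTS ./my_mpi_program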

-- Reuti


> Now that we know what to look for, we can search for jobs which do not
> behave.
> 
> best,
> Steve
