[GE users] GridEngine on Sierra systems

Reuti reuti at staff.uni-marburg.de
Wed May 28 23:15:11 BST 2008


Hi,

On 28.05.2008 at 13:33, Fedele STABILE wrote:

> In my installation GE does not replace the native resource manager.
> I need to explain the mechanism:
> HP's resource manager (we call it RMS) is part of the operating system.
> So if I need to run jobs or allocate CPUs on the cluster, I need to use
> RMS commands like prun or allocate.
> HP-RMS uses a database to manage information on the state of the
> system, but it doesn't manage any job queue.
> When I submit a job via GridEngine, I must interact with HP-RMS to
> execute my job on the cluster. If I don't have any particular
> requirement (for example, the location of the reserved resources), I can
> use prun instead of mpirun to launch the job.
> Killing the job is easy because the signal also kills all processes, but
> suspending and resuming are problems because these signals are not
> propagated.
> For this reason I developed scripts that I'm still testing, but so far
> they seem to be OK.

Can you provide a link for HP's RMS? I found something, and at
first glance it looks to me like a plain resource manager, just
without a scheduler or queuing facility (similar to what you mentioned).

> Now there's another question to solve: is it possible that GE reserves
> one resource and HP-RMS another?

AFAICS: yes, this might happen. So I would try to stay with only one
resource manager and not use HP's. We also use HP-MPI on Linux,
and it will work with a PE in SGE if you adjust the generated
machinefile slightly (in case HP-MPI is your main interest) and put
"export MPI_REMSH=rsh" in your jobscript before "mpirun", so that
SGE's rsh-wrapper will work.
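
A minimal sketch of the jobscript part, to make the two steps above concrete. The helper name pe_hostfile_to_machinefile is hypothetical, and the one-hostname-per-slot layout is only an assumption: the exact adjustment HP-MPI needs is not spelled out here, so check it against your HP-MPI version. The input is assumed to be SGE's $PE_HOSTFILE, which lists "host slots queue processor-range" per line.

```shell
#!/bin/sh
# Sketch of a jobscript fragment for HP-MPI under an SGE PE.

# Expand an SGE PE hostfile ("host slots queue processor-range" per line)
# into a machinefile with one hostname per granted slot.
# ASSUMPTION: this layout is illustrative; adapt it to what HP-MPI expects.
pe_hostfile_to_machinefile() {
    awk '{ for (i = 0; i < $2; i++) print $1 }' "$1"
}

# Make HP-MPI start remote processes via rsh, so that SGE's
# rsh-wrapper can intercept the calls. Must be set before mpirun.
MPI_REMSH=rsh
export MPI_REMSH

# Typical use further down in the jobscript (hypothetical file name):
#   pe_hostfile_to_machinefile "$PE_HOSTFILE" > "$TMPDIR/machines.hpmpi"
# and then start mpirun with the hostfile option your HP-MPI documents.
```

The conversion is kept in a function so it can be tried on a copy of a pe_hostfile outside of a running job.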

-- Reuti


> Fedele
>
>
> On Mon, 26/05/2008 at 16.48 +0200, Reuti wrote:
>> Hi,
>>
>> On 26.05.2008 at 16:40, Fedele STABILE wrote:
>>
>>> I've installed GridEngine on my Sierra System and it works!
>>>
>>> The Sierra System is an HP supercomputer project that uses a QSW
>>> network to connect a cluster of HP servers. It uses a Resource Manager
>>> that loads the parallel executable, as mpirun does, and sends it to
>>> the nodes in the cluster.
>>> So to submit a job, GridEngine needs to communicate with this Resource
>>> Manager.
>>>
>>> I have created scripts for the "execution methods" that can suspend and
>>> resume parallel jobs; no modification was needed for starting and
>>> termination.
>>>
>>> Is anyone interested in discussing this topic?
>>
>> If you have it running already, it would be nice if you could prepare
>> a Howto for it (how it works on Sierra Systems and what the
>> scripts do). Is it still a Tight Integration, although it sends the
>> jobs in the end to HP's resource manager?
>>
>> -- Reuti
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>





