[GE users] GridEngine on Sierra systems

Reuti reuti at staff.uni-marburg.de
Thu May 29 13:07:25 BST 2008


Hi,

On 29.05.2008 at 13:09, Fedele STABILE wrote:

> Here are links to the HP AlphaServerSC (Sierra system) features:
> http://h18002.www1.hp.com/alphaserver/archive/sc/sys_sc45_features.html
>
> Here you can read the System SPD (system specifications); the first
> 5 pages are about RMS:
> http://webdocs.caspur.it/hp_doc/sc_doc/web/index.htm

thx - I'll have a look at it.

> You can see that someone uses LSF to manage scheduling and queueing.

With one difference: there is a special adapter module available for
HP-RMS to honor the hostlist and other settings from LSF (which is not
the case for SGE). But there is also a long list (section 4.12) of
Known Problems or Limitations.

Why not use SGE alone and issue an mpirun instead of the prun?

> Second question:
> In a Sierra cluster each node generally has a configuration and
> features similar to the others, so I think it's possible to forget
> about the communication between HP-RMS and GE-RMS.
> I explain:
> 1) on my installation I defined a consumable complex named n_cpu (with
> an initial value equal to the number of CPUs available);

The built-in complex "slots" in SGE already covers the number of
installed cores if it's set in the queue definition to the number
of cores per machine, as these are always cluster queues with a queue
instance per machine.

> 2) each time a user submits a job, he indicates how many CPUs it
> uses/reserves (-l n_cpu=#);

Parallel jobs can be submitted to SGE by using a so-called parallel
environment and requesting something like:

qsub -pe hp_mpi 8 myjob.sh

SGE will select 8 cores (according to the allocation rule defined in  
the PE) and give you a list of machines/cores to use which you supply  
as argument to mpirun.
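
A minimal sketch of how a jobscript could turn that list into a plain
machinefile, assuming the usual $PE_HOSTFILE layout of "hostname slots
queue processor-range" per line. The function name pe_to_machinefile and
the demo file are made up for illustration, and the exact machinefile
format your mpirun expects may differ:

```shell
#!/bin/sh
# Expand an SGE PE hostfile into one hostname per granted slot.
# Assumed input layout per line: hostname slots queue processor-range
pe_to_machinefile() {
    # $1: path to the PE hostfile (SGE sets $PE_HOSTFILE in a parallel job)
    awk '{ for (i = 0; i < $2; i++) print $1 }' "$1"
}

# Demo with a made-up hostfile, so the fragment also runs outside of SGE.
demo=$(mktemp)
printf 'node01 2 all.q@node01 UNDEFINED\nnode02 1 all.q@node02 UNDEFINED\n' > "$demo"
pe_to_machinefile "$demo"
rm -f "$demo"
```

Inside a real job you would call pe_to_machinefile "$PE_HOSTFILE" and
hand the resulting file to mpirun in whatever way your MPI expects.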

> 3) GridEngine validates the request, assigning CPUs from one or more
> nodes in the cluster, and then runs the submitted command

I assume n_cpu is a global consumable, and you simply ignore the
machine SGE selects for you? For SGE these are all serial jobs, and the
hostlist would always contain only one machine/core. To SGE it's more
like "number of licenses I need" or so.

> 4) HP-RMS receives the run (prun) or allocate request, wherever it is
> executed, because it's a cluster-wide resource, and it defines the
> location and amount of resources itself.

After SGE has made the scheduling decision (which job should run), only
HP-RMS will allocate the correct number of machines/cores.

> So the GridEngine resource n_cpu is a consumable value which GE needs
> for scheduling purposes (if the cluster has 10 CPUs available, GE
> executes the job, sending the command to HP-RMS).
> HP-RMS uses the n_cpu value to allocate the resources on the cluster,
> following the user's choice if he wants.

Did you try to run any simple serial job like a "sleep 60" in the
cluster by using only SGE? Are all nodes execution nodes for SGE?

===============================================================

- if you have only parallel jobs which will always use prun and
HP-RMS, then it should work, although it might lead to:

1. job: SGE runs the job with n_cpu=4 on "node01" (the jobscript) - like
a serial job. HP-RMS will look for 4 cores, maybe 2*node02, node03,
node04
2. job: SGE runs the job with n_cpu=4 on "node03" (the jobscript) - like
a serial job. HP-RMS will look for 4 cores, maybe 2*node01, node03,
node04

I think it's hard to find out where your job is finally running.

- if you have a mixture of serial (no prun in the job) and parallel  
runs, you will most likely oversubscribe some nodes, as SGE will  
allocate machines/cores independently from HP-RMS
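
To make the first scenario concrete, here is a small tally of the
example placements from the two jobs above. It only counts hostnames;
the tally helper and the variable names are hypothetical, and it
doesn't query SGE or HP-RMS at all:

```shell
#!/bin/sh
# Count how many entries land on each node, given one hostname per line.
tally() {
    sort | uniq -c | awk '{ print $2, $1 }'
}

# Where SGE believes the two jobs run (only the node of each jobscript).
sge_view='node01
node03'

# Where HP-RMS might really place the 2 x 4 processes (example placements
# of job 1 followed by job 2).
rms_view='node02
node02
node03
node04
node01
node01
node03
node04'

echo "SGE's view:"
printf '%s\n' "$sge_view" | tally
echo "HP-RMS's view:"
printf '%s\n' "$rms_view" | tally
```

SGE accounts for one jobscript each on node01 and node03, while the
processes are in fact spread over all four nodes, two per node.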

-- Reuti 



> Fedele
>
> On Thu, 29/05/2008 at 00.15 +0200, Reuti wrote:
>> Hi,
>>
>> On 28.05.2008 at 13:33, Fedele STABILE wrote:
>>
>>> In my installation GE doesn't replace the native resource manager.
>>> It's necessary to explain the mechanism:
>>> The HP resource manager (we call it RMS) is part of the operating
>>> system. So if I need to run jobs or allocate CPUs on the cluster, I
>>> need to use RMS commands like prun or allocate.
>>> HP-RMS uses a database to manage information on the state of the
>>> system, but it doesn't manage any job queue.
>>> When I submit a job via GridEngine, I must interact with HP-RMS to
>>> execute my job on the cluster. If I don't have any particular
>>> requirement (example: location of the reserved resources), I can use
>>> prun instead of mpirun to launch the job.
>>> Killing the job is easy because the signal also kills all
>>> processes, but suspending and resuming are problems because these
>>> signals are not propagated.
>>> For this reason I developed scripts that I'm testing, and as far as
>>> I can see they are ok.
>>
>> can you provide any link for HP's RMS? I found something, and at
>> first glance it looks to me like a plain resource manager, just
>> without a scheduling or queuing facility (similar to what you
>> mentioned).
>>
>>> Now there's another question to solve: is it possible that GE
>>> reserves one resource and HP-RMS another?
>>
>> AFAICS: yes, this might happen. So I would try to stay with only one
>> resource manager and not use HP's. We also use HP-MPI on Linux,
>> and it will work with a PE in SGE if you adjust the generated
>> machinefile slightly (in case HP-MPI is your main interest) and
>> "export MPI_REMSH=rsh" in your jobscript before "mpirun", so that
>> SGE's rsh-wrapper will work.
>>
>> -- Reuti
>>
>>
>>> Fedele
>>>
>>>
>>> On Mon, 26/05/2008 at 16.48 +0200, Reuti wrote:
>>>> Hi,
>>>>
>>>> On 26.05.2008 at 16:40, Fedele STABILE wrote:
>>>>
>>>>> I've installed GridEngine on my Sierra System and it works!!
>>>>>
>>>>> Sierra System is an HP supercomputer project that uses a QSW
>>>>> network to connect a cluster of HP servers. It uses a Resource
>>>>> Manager that loads the parallel executable, like mpirun does, and
>>>>> sends it to the nodes in the cluster.
>>>>> So to submit a job GridEngine needs to communicate with this
>>>>> Resource Manager.
>>>>>
>>>>> I have created scripts for the "execution methods" that can suspend
>>>>> and resume parallel jobs; no modification is needed for starting
>>>>> and termination.
>>>>>
>>>>> Is there anyone interested in discussing this topic?
>>>>
>>>> if you have it running already, it would be nice if you could
>>>> prepare a Howto for it (how it works on Sierra Systems and what the
>>>> scripts do). Is it still a Tight Integration, although it sends the
>>>> jobs in the end to HP's resource manager?
>>>>
>>>> -- Reuti
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>>

