[GE users] Error "EH_xacl not found in element"

pablorey prey at cesga.es
Wed Dec 2 18:14:21 GMT 2009


    Hi,

    You are right, we allow oversubscription by this user for a short time.

    Thanks for your advice. We will explore the different possibilities that you have suggested.

    Meanwhile, any suggestions about the error?

    Regards,
    Pablo




On 02/12/2009 16:36, reuti wrote:

Hi,

On 02.12.2009 at 14:50, pablorey wrote:



    Hi reuti,

    We have a very special user: the regional weather forecast
service. Its jobs cannot wait for free slots like normal jobs
because the weather forecast has to be published on time.
So we have to move its pending jobs into execution as soon as
possible. These are small jobs needed to prepare all the files
used by the big job that computes the forecast on another
machine.

    To do this we follow these steps:
    * Check whether any jobs of this user are in error state and
clear that state if necessary.
    * Hold all the pending jobs except this user's jobs.
    * Change the priority for this user.
    * Restrict access to the nodes while we increase
complex_values such as num_proc or memory, so that jobs of other
users are not executed on the selected nodes.



I always regard num_proc as a fixed property of the host, so it
shouldn't be touched. The same effect can be achieved with "slots".




    * After a short period of time the complex_values of the
selected nodes are restored.
    * The last step is to remove the hold state of the pending jobs.

    How could we replace this with an RQS? It sounds very good, but we
don't know how an RQS could help us solve this problem.



It looks to me like you allow oversubscription by this user for a
short time. What about a special queue to which only this user has
access with one slot? You can also define a nice value of "0" in this
queue (entry priority), while in the default queue it's "19" (or the
jobs in the default queue even get suspended by subordination).

==

You can also have different limits for different users in an RQS:

limit name total hosts {*} to slots=9

and a second RQS with:

limit name default users !forecast hosts {*} to slots=8

and the limit in the queue definition is arbitrary (can be 9 or 42).
So user forecast always has one slot more. You could use a third RQS
if you want to prevent user forecast from filling a host with 9 slots
alone.
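
As an illustration of how the two limits above interact, here is a minimal Python sketch. It is not Grid Engine code, only a model of the arithmetic. It assumes that the second rule is a combined quota over all users other than forecast on each host, and that every job takes one slot. The function name can_start and the user names alice and bob are made up for the example.

```python
# Model of the two per-host limits from the RQS examples above.
TOTAL_PER_HOST = 9    # "limit name total hosts {*} to slots=9"
OTHERS_PER_HOST = 8   # "limit name default users !forecast hosts {*} to slots=8"
PRIVILEGED_USER = "forecast"


def can_start(user, slots_in_use):
    """Return True if `user` may take one more slot on a host.

    `slots_in_use` maps a user name to the slots that user already
    holds on the host. Assumption: one slot per job.
    """
    total = sum(slots_in_use.values())
    # First rule: all users together may not exceed the host total.
    if total + 1 > TOTAL_PER_HOST:
        return False
    # Second rule: everybody except the privileged user shares a smaller quota.
    if user != PRIVILEGED_USER:
        others = total - slots_in_use.get(PRIVILEGED_USER, 0)
        if others + 1 > OTHERS_PER_HOST:
            return False
    return True


host = {"alice": 5, "bob": 3}          # ordinary users already hold 8 slots
print(can_start("alice", host))        # False: ordinary users are capped at 8
print(can_start("forecast", host))     # True: the ninth slot is left for forecast
```

Under these assumptions the ninth slot on each host can only ever be taken by user forecast, which is the "one slot more" described above.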

==

Another option could be to submit a bunch of Advance Reservations.
When they are granted you can submit a job into them and it will run
for sure. Having this as a repeating feature (cronlike) is already an
RFE:

http://gridengine.sunsource.net/issues/show_bug.cgi?id=2935

-- Reuti





--
Pablo Rey Mayo
Systems Technician
Centro de Supercomputacion de Galicia (CESGA)
Avda. de Vigo s/n (Campus Sur)
15705 Santiago de Compostela (Spain)
Tel: +34 981 56 98 10 ext. 233; Fax: +34 981 59 46 16
email: prey at cesga.es; http://www.cesga.es/
------------------------------------------------
NOTE: This message has been intentionally written without
accents or special characters, so that it can be displayed
correctly in any mail client and system.
------------------------------------------------



