[GE users] Error "EH_xacl not found in element"

pablorey prey at cesga.es
Wed Dec 2 13:50:12 GMT 2009


    Hi reuti,

    We have a very special user: the regional weather forecast service. Their jobs cannot wait for free slots like normal jobs because the weather forecast has to be published on time, so we have to move their pending jobs into execution as soon as possible. These are small jobs that prepare all the files used by the big job that computes the forecast on another machine.

    To do this we follow these steps:
    * Check whether this user has jobs in error state and clear that state if necessary.
    * Hold all pending jobs except this user's jobs.
    * Change the priority for this user.
    * Restrict access to the selected nodes while we increase complex_values such as num_proc or memory, so that other users' jobs cannot start on those nodes.
    * After a short period of time, restore the complex_values of the selected nodes.
    * Finally, release the hold on the pending jobs.
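
The job-related steps above could be sketched roughly as follows. This is only a dry-run illustration, not the script we run: the function name prioritize, the run wrapper, the PRIO value and the way job IDs are passed in are all assumptions made for the example. It prints the qmod, qhold, qalter and qrls commands instead of executing them.

```shell
#!/bin/sh
# Dry-run sketch: run() only prints each command.
# Replace its body with "$@" to actually execute the commands.
run() { echo "$@"; }

PRIO=100   # hypothetical priority value for the forecast jobs

# $1: space-separated job IDs of the forecast user (assumed input)
# $2: space-separated pending job IDs of all other users (assumed input)
prioritize() {
  forecast_jobs=$1
  other_jobs=$2
  # Clear the error state of the forecast jobs
  for j in $forecast_jobs; do run qmod -cj "$j"; done
  # Hold every other pending job
  for j in $other_jobs; do run qhold "$j"; done
  # Raise the priority of the forecast jobs
  for j in $forecast_jobs; do run qalter -p "$PRIO" "$j"; done
  # The node restriction and complex_values window would go here
  # Release the holds again
  for j in $other_jobs; do run qrls "$j"; done
}
```

For example, prioritize "101" "202 203" prints one command per job in the order of the steps, which makes it easy to review the sequence before running it for real.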

    How could we replace this with an RQS? It sounds very good, but we don't see how an RQS could help us solve this problem.

    Thanks,
    Pablo



On 02/12/2009 14:26, reuti wrote:

Hi,

can you explain the purpose of the script a little bit? Why are you
first lowering and then increasing s_vmem after 2 minutes?

It looks like it could be replaced by an RQS. You could also use an
urgency policy to prioritize some jobs.

-- Reuti


On 02.12.2009 at 14:02, pablorey wrote:



    Hi,

    Yes, we are using classic spooling, but I don't think we removed any file from the spooling directory.

    We have investigated the problem and traced it to a cron job that we use to prioritize some jobs. The script runs at minutes 10, 30 and 50 of every hour. Basically, it follows this scheme:

    for group in GROUP_1 GROUP_2 ... GROUP_N; do
        for node in group; do
            restrict_node_access $node >> $LOGFILE 2>&1
            qconf -mattr exechost complex_values s_vmem=9.7G $node >> $LOGFILE 2>&1
        done

        sleep 120

        for node in group; do
            qconf -mattr exechost complex_values s_vmem=8.6G $node >> $LOGFILE 2>&1
            restore_node_access $node >> $LOGFILE 2>&1
        done
    done

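restrict_node_access(){
  node=$1
  # Save the node's current user_lists value so it can be restored later
  qconf -se "$node" | grep ^user_lists | awk '{print $2}' > "$STATUSDIR/$node"
  # Restrict the node to "prey"
  qconf -rattr exechost user_lists prey "$node"
}

restore_node_access(){
  node=$1
  if [ -r "$STATUSDIR/$node" ]; then
    # Put back the value saved by restrict_node_access
    qconf -rattr exechost user_lists $(cat "$STATUSDIR/$node") "$node"
  else
    qconf -rattr exechost user_lists NONE "$node"
  fi
}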
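
One detail of the functions above: if the grep ever matched nothing, the status file would be created empty, and restore_node_access would then call qconf -rattr without a value. A minimal sketch of a more defensive extraction is shown below. The function name extract_user_lists is hypothetical, and this is not a diagnosis of the EH_xacl error, only a possible hardening. It does not handle values continued over several lines.

```shell
# Print the user_lists value found in "qconf -se"-style text on stdin.
# Prints NONE when the attribute is missing or has no value.
# (extract_user_lists is a hypothetical helper name.)
extract_user_lists() {
  awk '$1 == "user_lists" { v = $2 }
       END { print (v == "" ? "NONE" : v) }'
}
```

It could be used as qconf -se "$node" | extract_user_lists > "$STATUSDIR/$node", so the saved file always contains either a list name or NONE.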

    This script had been running for several months without problems until last week. We have checked the log file and found that the error appears only for GROUP_2 (GROUP_1 works properly) and only when there is exactly 1 pending job (with more than 1 pending job the script works properly). The problematic command is "qconf -rattr exechost user_lists prey $node" and these are the errors detected:
    * In the log file: error: commlib error: got read error (closing "svgd.local/qmaster/1")
    * In the qmaster messages file: 11/30/2009 06:14:03|worker|svgd|C|!!!!!!!!!! EH_xacl not found in element !!!!!!!!!!
    We have tried to reproduce the problem with another user but were not able to, so we are puzzled.

    Regards,
    Pablo



On 01/12/2009 18:02, aja wrote:


Hi,

this seems to be a broken configuration of some userset. Do you use
classic spooling? If so, did you accidentally remove any file from
the spooling directory?

Regards,
aja

pablorey wrote:


Dear colleagues,

In the last 24 hours we have seen very odd behaviour from the GE
master. It stopped several times and we found the following errors
in the qmaster messages file:

11/26/2009 06:14:06|worker|svgd|C|!!!!!!!!!! EH_xacl not found in element !!!!!!!!!!
11/26/2009 18:14:03|worker|svgd|C|!!!!!!!!!! EH_xacl not found in element !!!!!!!!!!
11/27/2009 06:14:04|worker|svgd|C|!!!!!!!!!! EH_xacl not found in element !!!!!!!!!!
11/27/2009 06:34:03|worker|svgd|C|!!!!!!!!!! EH_xacl not found in element !!!!!!!!!!
11/27/2009 06:54:02|worker|svgd|C|!!!!!!!!!! EH_xacl not found in element !!!!!!!!!!

I searched for information about this error but did not find
anything. Any idea? Could it be related to some kind of job?

Regards.

--
Pablo Rey Mayo
Tecnico de Sistemas
Centro de Supercomputacion de Galicia (CESGA)
Avda. de Vigo s/n (Campus Sur)
15705 Santiago de Compostela (Spain)
Tel: +34 981 56 98 10 ext. 233; Fax: +34 981 59 46 16
email: prey at cesga.es; http://www.cesga.es/
------------------------------------------------
NOTE: This message was intentionally written without accents or
special characters so that it can be displayed correctly in any
mail client and system.
------------------------------------------------








