[GE users] SGE - Failed To Execute openmp parallel job - job stuck in 't' state

marcosb marbarfa at gmail.com
Mon Sep 28 18:14:12 BST 2009

Hi, I'm having problems executing a simple parallel OpenMP job.
The job is scheduled and gets stuck in the 't' state; afterwards sge_execd goes down. I already tried changing the allocation_rule, but nothing changed.
I have no idea what the cause could be. I googled it but couldn't find anything, and I even reinstalled everything, but I still have the same problem.

Here are the execd messages:

09/28/2009 13:43:57|  main|node10|I|starting up SGE 6.2u3 (lx24-amd64)

Here are the qmaster messages:

09/28/2009 13:46:28| timer|node10|W|failed to deliver job 24.1 to queue "all.q@node10"
09/28/2009 13:46:28|listen|node10|E|commlib error: got read error (closing "node10/execd/1")

Any help is much appreciated!

Thanks in advance.

Here is my PE:

# Version: 6.2u3
pe_name            smp
slots              4
user_lists         NONE
xuser_lists        NONE
start_proc_args    NONE
stop_proc_args     NONE
allocation_rule    $pe_slots
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary TRUE

And here is my script:

#$ -N dotPRODUCT
#$ -S /bin/bash
#$ -o ~
#$ -e ~
#$ -q all.q
#$ -pe smp 2 
cd /tmp
gcc -fopenmp dot_product.c -lm
mv a.out dot_product
./dot_product > dot_product_output_nt2.txt
echo "Program output written to dot_product_output_nt2.txt"
rm dot_product
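
One thing the script above leaves implicit is how many threads the OpenMP program starts. A minimal sketch, assuming Grid Engine exports NSLOTS (the number of granted slots) into the environment of a parallel job, is to set OMP_NUM_THREADS from it before the ./dot_product line. The fallback value of 1 is an assumption for runs outside the scheduler. This is not expected to explain the 't' state; it only keeps the thread count in line with the slots requested via -pe smp 2.

```shell
#!/bin/bash
# Assumption: Grid Engine sets NSLOTS inside a parallel job.
# Fall back to 1 when it is not set, for example when the
# script is run by hand outside the scheduler.
OMP_NUM_THREADS="${NSLOTS:-1}"
export OMP_NUM_THREADS
echo "Running with OMP_NUM_THREADS=${OMP_NUM_THREADS}"
```

With the job submitted as shown (-pe smp 2), the program should then start two threads rather than whatever default the OpenMP runtime picks.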

