[GE users] SGE - Failed To Execute openmp parallel job - job stuck in 't' state

marcosb marbarfa at gmail.com
Mon Sep 28 18:14:12 BST 2009


Hi, I'm having problems executing a simple parallel OpenMP job.
The job is scheduled and gets stuck in the 't' state, and afterwards sge_execd goes down. I already tried changing the allocation_rule, but nothing changed.
I have no idea what it could be. I googled it but couldn't find anything, and I even reinstalled everything, but I still have the same problem.

Here are the execd messages:

09/28/2009 13:43:57|  main|node10|I|starting up SGE 6.2u3 (lx24-amd64)



Here are the qmaster messages:

09/28/2009 13:46:28| timer|node10|W|failed to deliver job 24.1 to queue "all.q@node10"
09/28/2009 13:46:28|listen|node10|E|commlib error: got read error (closing "node10/execd/1")




Any help is much appreciated!

Thanks in advance.


Here is my PE:

# Version: 6.2u3
# 
# DO NOT MODIFY THIS FILE MANUALLY!
# 
pe_name            smp
slots              4
user_lists         NONE
xuser_lists        NONE
start_proc_args    NONE
stop_proc_args     NONE
allocation_rule    $pe_slots
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary TRUE
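
For reference, a minimal sketch of how this PE and its attachment to the queue could be inspected. It assumes the SGE client tools are on the PATH (otherwise it only prints a note) and uses the PE and queue names from this post; whether a missing pe_list entry is related to the 't' state is not confirmed.

```shell
#!/bin/bash
# Names taken from this post.
pe_name=smp
queue=all.q

if command -v qconf >/dev/null 2>&1; then
    # Show the PE definition as the qmaster currently sees it.
    qconf -sp "$pe_name"
    # The PE should be listed in the queue's pe_list to be usable there.
    qconf -sq "$queue" | grep pe_list
else
    echo "qconf not found; source the SGE settings file first"
fi
```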
 
 




And here is my script:



#$ -N dotPRODUCT
 
#$ -S /bin/bash
#$ -o ~
#$ -e ~
#$ -q all.q
 
#$ -pe smp 2 
#$ -v OMP_NUM_THREADS=$NSLOTS
cd /tmp
gcc -fopenmp dot_product.c -lm
mv a.out dot_product
 
./dot_product > dot_product_output_nt2.txt
echo "Program output written to dot_product_output_nt2.txt"
 
rm dot_product
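
A possible variation on the script above, not confirmed as the cause of the 't' state: as far as I know, the `#$` directive lines are parsed by qsub and not by a shell, so `$NSLOTS` in the `-v` line may be passed literally instead of being expanded. A minimal sketch that sets the thread count in the script body, where the shell expands it at run time (the fallback value of 1 outside SGE is an assumption for illustration):

```shell
#!/bin/bash
# NSLOTS is set by SGE in the job environment for parallel jobs.
# Fall back to 1 when it is unset, e.g. when run outside SGE (assumption).
export OMP_NUM_THREADS="${NSLOTS:-1}"
echo "Running with OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

These two lines would go after the `#$` directives and before the program is started.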
