[GE users] abaqus issues on ge 6.2u5

leonardz leonardz at sickkids.ca
Thu May 27 17:52:41 BST 2010


We are trying to find out why an Abaqus job is failing.

We submit the job requesting 36 GB of memory with -l h_vmem=36g. It will run on our 128 GB node, so we should be okay.

The job runs through an initialisation phase and then fails without writing anything to any error log, including the job's .e and .o files. The only thing we have to go on is the qacct record, which is below.

When I look up exit status 1 on the GE web page (http://wikis.sun.com/display/gridengine62u5/Error+Messages), I see:

1       Presumably before job   f       Job could not be started

As you can see below, the job ran for 16 minutes of user CPU time (ru_utime; the wallclock time was about 29 minutes) and used only 4.27 GB (maxvmem), about 12% of the requested amount.
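
The figures quoted above can be checked against the qacct record with a few lines of arithmetic. This is a minimal sketch in Python, with the values copied by hand from the record in this message. Treating the G suffix of maxvmem and h_vmem as the same unit is an assumption, and the date parsing assumes an English locale.

```python
from datetime import datetime

# Values copied from the qacct record in this message.
FMT = "%a %b %d %H:%M:%S %Y"   # assumes English day and month names
start = datetime.strptime("Thu May 27 10:46:55 2010", FMT)   # start_time
end = datetime.strptime("Thu May 27 11:15:36 2010", FMT)     # end_time

wallclock_s = (end - start).total_seconds()   # compare with ru_wallclock
user_cpu_s = 980.909                          # ru_utime
requested_gb = 36.0                           # from -l h_vmem=36g
peak_gb = 4.273                               # maxvmem

print(f"wallclock: {wallclock_s:.0f} s ({wallclock_s / 60:.1f} min)")
print(f"user CPU:  {user_cpu_s:.0f} s ({user_cpu_s / 60:.1f} min)")
print(f"peak vmem: {peak_gb / requested_gb:.1%} of the request")
```

The elapsed time between start_time and end_time agrees with ru_wallclock (1721 seconds), while the 16-minute figure corresponds to ru_utime.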

Abaqus gives us no indication of what went wrong, and the messages file on the compute node shows no messages.

The qacct output below doesn't seem to indicate what the problem is (as far as I can see). Exit status 1 seems to indicate that the job did not start, yet it ran for 16 minutes of CPU time.

Does anybody run SGE and Abaqus? Any ideas on where to look?
Alternatively, can anyone spot a clue in the qacct output?
Thanks.

qacct -j 157741
qname        abaqus.q
hostname     cn-r5-34
.....
jobnumber    157741
taskid       undefined
account      sge
priority     0
qsub_time    Thu May 27 10:46:47 2010
start_time   Thu May 27 10:46:55 2010
end_time     Thu May 27 11:15:36 2010
granted_pe   NONE
slots        1
failed       0
exit_status  1
ru_wallclock 1721
ru_utime     980.909
ru_stime     150.449
ru_maxrss    0
ru_ixrss     0
ru_ismrss    0
ru_idrss     0
ru_isrss     0
ru_minflt    6855375
ru_majflt    0
ru_nswap     0
ru_inblock   0
ru_oublock   0
ru_msgsnd    0
ru_msgrcv    0
ru_nsignals  0
ru_nvcsw     232019
ru_nivcsw    146249
cpu          1131.359
mem          1690.376
io           0.000
iow          0.000
maxvmem      4.273G
arid         undefined
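
One thing that may be worth checking in the record above: qacct reports failed and exit_status as separate fields. The sketch below assumes that failed is Grid Engine's own failure code and that exit_status is whatever the job script returned. If so, the "Job could not be started" row on the web page would apply to failed (0 here) rather than to exit_status. That reading is an assumption and should be confirmed against the accounting documentation. The helper names parse_qacct and summarize are hypothetical, and SAMPLE is a shortened copy of the record above.

```python
def parse_qacct(text):
    """Turn 'key   value' lines from `qacct -j` output into a dict of strings."""
    record = {}
    for line in text.splitlines():
        parts = line.split(None, 1)   # split on the first run of whitespace
        if len(parts) == 2:
            record[parts[0]] = parts[1].strip()
    return record


def summarize(record):
    """Report which of the two status fields, if any, is non-zero."""
    # The failed field may carry explanatory text after the number.
    failed = record["failed"].split()[0]
    exit_status = int(record["exit_status"])
    if failed != "0":
        return f"Grid Engine reported failure code {failed}"
    if exit_status != 0:
        return f"job was started; its own exit status was {exit_status}"
    return "job finished with exit status 0"


# Shortened copy of the record in this message.
SAMPLE = """\
qname        abaqus.q
hostname     cn-r5-34
jobnumber    157741
failed       0
exit_status  1
ru_wallclock 1721
maxvmem      4.273G
"""

rec = parse_qacct(SAMPLE)
print(summarize(rec))
```

Under that assumption, the place to look would be whatever the job script or the Abaqus wrapper itself returned, not a scheduler-side start failure.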



Len Zaifman
Systems Manager, High Performance Systems
The Centre for Computational Biology
The Hospital for Sick Children
555 University Ave.
Toronto, Ont M5G 1X8

tel: 416-813-5513
email: leonardz at sickkids.ca

