[GE users] Problem with Gaussian 03 SMP jobs

jfprieur jfprieur at gmail.com
Wed Jul 15 17:57:23 BST 2009


I am encountering an issue similar to the one described in this message:


The difference is that we are not using Linda with Gaussian; these are just jobs that have %nproc=4 in the Gaussian input file. When I submit them via SGE, the job gets assigned to a node and then stays stuck in l1.exe at 100% CPU. Here is the log file from this job:

Entering Gaussian System, Link 0=g03
Initial command:
/share/apps/g03/l1.exe /state/partition1/Gau-16806.inp
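One thing worth ruling out (not confirmed as the cause here) is a mismatch between %nproc and the number of slots the job asks SGE for. The Python sketch below reads %nproc from an input deck and builds a matching qsub command line. The parallel environment name `smp` and the wrapper script `run_g03.sh` are hypothetical placeholders; such a PE would need `allocation_rule $pe_slots` so that all slots land on one node.

```python
import re

# Matches a Link 0 line such as "%nproc=4" (case-insensitive). "%nproclinda=2" is
# not matched, because "=" must follow "nproc" directly (spaces allowed).
NPROC_RE = re.compile(r"^\s*%nproc\s*=\s*(\d+)", re.IGNORECASE | re.MULTILINE)


def nproc_from_input(text: str) -> int:
    """Return the %nproc value from a Gaussian input deck, or 1 if it is absent."""
    m = NPROC_RE.search(text)
    return int(m.group(1)) if m else 1


def qsub_args(input_path: str, text: str, pe_name: str = "smp") -> list:
    """Build a qsub command line requesting as many slots as %nproc.

    pe_name ("smp") and the wrapper "run_g03.sh" are hypothetical: the PE must
    exist on the cluster and the wrapper must call g03 on input_path.
    """
    slots = nproc_from_input(text)
    return ["qsub", "-pe", pe_name, str(slots), "-cwd", "run_g03.sh", input_path]


if __name__ == "__main__":
    deck = "%nproc=4\n%mem=1GB\n#p hf/3-21g\n"
    print(" ".join(qsub_args("job.com", deck)))
```

For the deck above this prints `qsub -pe smp 4 -cwd run_g03.sh job.com`.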

If I submit the same job 'manually', under the user's account, on the front end and on the nodes (we are using Rocks 5.2 on a fairly vanilla 20-node/160-CPU cluster), it works flawlessly, using 400% CPU and going through the different Gaussian link executables.

Even weirder, one of the Gaussian test input files (test397.com), modified with %nproc=4, runs fine when submitted to SGE (though I think it is a very small job compared to my users' jobs).

I have tried setting the stack size to 128,000 (the CentOS 5.3 default is lower than this), with the same result. I have also tried reducing the shmmax kernel setting to 2 GB (it was at 64 GB), but the behaviour is the same.
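Because the job behaves differently under SGE than from an interactive shell, it may help to print the limits the job actually inherits: limits in the SGE queue configuration (for example h_stack or h_vmem) apply to batch jobs and can differ from the login shell's. A minimal Python 3 sketch, assuming Linux; note that `getrlimit` reports bytes, whereas `ulimit -s` reports kilobytes.

```python
import resource
from typing import Dict, Optional


def describe_limit(value: int) -> str:
    """Render an rlimit value, treating RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def read_shmmax(path: str = "/proc/sys/kernel/shmmax") -> Optional[int]:
    """Read the kernel shmmax setting (bytes) on Linux; None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def limits_report() -> Dict[str, str]:
    """Collect the limits the current process actually runs with."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)  # bytes, not KB
    shmmax = read_shmmax()
    return {
        "stack_soft_bytes": describe_limit(soft),
        "stack_hard_bytes": describe_limit(hard),
        "shmmax_bytes": "unknown" if shmmax is None else str(shmmax),
    }


if __name__ == "__main__":
    for key, value in limits_report().items():
        print(f"{key}={value}")
```

Running it once from the job script and once interactively on the same node, then diffing the two outputs, would show whether SGE changes anything.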

I am somewhat stuck. SGE works flawlessly for 'normal' single-CPU jobs, but this error has me scratching my head... My gut says it is some kind of permissions or path problem (if it is not a kernel setting that I missed), but I may have been staring at it for too long...
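To test the permissions/path theory, the job could check its own environment before calling g03. The hypothetical helper below (Python 3) assumes the site relies on the usual g03root and GAUSS_SCRDIR variables. Under SGE the job shell may not see the same environment as a login shell unless the job is submitted with `qsub -V` or the script sources the g03 profile itself.

```python
import os
from typing import List, Mapping, Optional


def check_gaussian_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of problems found in the Gaussian-related environment.

    Checks that g03root and GAUSS_SCRDIR are set and that the scratch
    directory exists and is writable by the user the job runs as.
    """
    env = os.environ if env is None else env
    problems = []
    for var in ("g03root", "GAUSS_SCRDIR"):
        if not env.get(var):
            problems.append(f"{var} is not set")
    scratch = env.get("GAUSS_SCRDIR")
    # W_OK | X_OK: g03 must create files inside the directory, not just list it.
    if scratch and not (os.path.isdir(scratch) and os.access(scratch, os.W_OK | os.X_OK)):
        problems.append(f"scratch directory {scratch} is missing or not writable")
    return problems


if __name__ == "__main__":
    issues = check_gaussian_env()
    print("\n".join(issues) if issues else "environment looks OK")
```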

JF Prieur
Department of Chemistry and Biochemistry
Concordia University, Montreal, QC, CANADA

