[GE users] Yet another MPICH tight-integration problem

David S. dgs at gs.washington.edu
Wed Sep 8 05:00:13 BST 2004


> Your allocation rule should be fine. But try switching
> job_is_first_task from TRUE to FALSE. If this does not
> resolve the problem, you need to look into execd's messages
> file. There you should find diagnostic information that should
> help you.

Changing the value of 'job_is_first_task' makes no difference.
In either case, the grid engine appears to start the master
process and one slave process on a node, walks through
'$TMPDIR/machines' starting slaves on the nodes listed there,
then tries and fails to start a second slave on the node
running the master.  At that point the job aborts.  All that's
in the 'messages' file in the spool directory of the master's
node is a message like

	09/07/2004 20:48:25|execd|eee006|E|no free queue for job 75 of user dgs at eee008.grid.gs.washington.edu (localhost = eee006.grid.gs.washington.edu)
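
For anyone sifting through a longer 'messages' file, here is a minimal
sketch for pulling out error entries. It assumes every entry has the
pipe-separated layout of the line above (timestamp, daemon, host, level,
text), which I have only inferred from that one line. The names
ExecdMessage, parse_message_line and error_lines are made up for
illustration and are not part of Grid Engine.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class ExecdMessage(NamedTuple):
    """One entry of a 'messages' file (field layout is an assumption)."""
    timestamp: str
    daemon: str
    host: str
    level: str
    text: str


def parse_message_line(line: str) -> Optional[ExecdMessage]:
    """Split one line into five fields, or return None if it does not fit."""
    # Split on the first four '|' only, so any '|' in the text is preserved.
    parts = line.strip().split("|", 4)
    if len(parts) != 5:
        return None
    return ExecdMessage(*parts)


def error_lines(lines: Iterable[str]) -> Iterator[ExecdMessage]:
    """Yield only the entries whose level field is 'E'."""
    for line in lines:
        msg = parse_message_line(line)
        if msg is not None and msg.level == "E":
            yield msg


# The entry quoted above, copied verbatim as sample input.
sample = (
    "\t09/07/2004 20:48:25|execd|eee006|E|no free queue for job 75 of user "
    "dgs at eee008.grid.gs.washington.edu "
    "(localhost = eee006.grid.gs.washington.edu)"
)

if __name__ == "__main__":
    for msg in error_lines([sample]):
        print(msg.timestamp, msg.host, msg.text)
```

To use it on a real file, pass an open file object to error_lines()
instead of the one-element list.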

David S.
