[GE users] array job using mpich2 tight integration

Reuti reuti at staff.uni-marburg.de
Wed Aug 15 14:14:05 BST 2007


On 15.08.2007 at 14:37, Yuan Wan wrote:

> I have a problem running an mpich2 array job.
> I followed the HowTo instructions to tightly integrate mpich2 with
> SGE, and it works fine on our cluster for a single job.
>
> But if I submit an array job to SGE (running several identical MPI
> tasks), some tasks fail to find a live smpd on the target nodes.
>
> Does anyone with successful experience know the possible reason?  Thanks

I simply hadn't thought about array jobs while writing the Howto ;-)
As I use the job id for the port, and all tasks of an array job share
the same job id, this might have odd side effects of course.

Maybe you can try something like (depending on your range of task ids  
[here up to 10] and your job turnaround)

index=${SGE_TASK_ID/undefined/1}
port=$((JOB_ID % 500 * 10 + 19999 + index))

in the start/stop/job scripts to create more unique port numbers.
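
To make the arithmetic concrete, here is a minimal sketch of the same formula wrapped in a function. The name smpd_port and its two arguments are hypothetical (they are not part of the Howto scripts), and the case statement is only a portable stand-in for the ${SGE_TASK_ID/undefined/1} substitution above, on the assumption that SGE_TASK_ID is the literal string "undefined" for a non-array job.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: not part of the Howto scripts.
# $1 = job id (JOB_ID), $2 = task id (SGE_TASK_ID, "undefined" for non-array jobs)
smpd_port() (
    job_id=$1
    task_id=$2
    # Portable equivalent of index=${SGE_TASK_ID/undefined/1}
    case "$task_id" in
        undefined) index=1 ;;
        *)         index=$task_id ;;
    esac
    # Each job id gets a block of 10 ports; the task id selects one within it.
    echo $((job_id % 500 * 10 + 19999 + index))
)

smpd_port 1234 3          # prints 22342
smpd_port 1234 undefined  # prints 22340
```

With task ids 1 to 10 the result stays within 20000 to 24999. The ports repeat every 500 job ids, and a task id above 10 would run into the block of the next job id, which is why the factor 10 and the modulus 500 would have to be adjusted to the actual task range and job turnaround.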

-- Reuti

