[GE users] mpich2 tight integration not working

reuti reuti at staff.uni-marburg.de
Mon Mar 9 19:34:56 GMT 2009


On 09.03.2009 at 19:46, kennethsdsc wrote:

> A couple other issues:
>
> - I had to specify task count in my qsub line:
> qsub -t 1-4:4 -l h_rt=18:00:00 -q all.q -pe mpich2_mpd 4 testjob.sh
>
> - I had to use SGE_TASK_ID, instead of TASK_ID in mpich2_mpd.sh:
> #export MPICH2_ROOT=/usr/local/apps/sge/mpich2/install
> #export PATH=$MPICH2_ROOT/bin:$PATH
> #export MPD_CON_EXT="sge_$JOB_ID.$TASK_ID"
> setenv MPICH2_ROOT /usr/local/apps/sge/mpich2/install
> setenv PATH $MPICH2_ROOT/bin:$PATH
> setenv MPD_CON_EXT "sge_$JOB_ID.$SGE_TASK_ID"
>
> It looks like SGE is using csh to execute the file, rather
> than using #!/bin/ksh.  Not sure if that's a configuration issue on
> my part?


For the prolog/epilog, SGE should just exec the specified binaries.
Which platform are you on? Is /bin/bash available?
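
If the prolog is indeed exec'd directly, its own #! line picks the
interpreter, so a start script can stay in sh syntax. Below is a minimal
sketch of building MPD_CON_EXT so that it works whether the environment
carries SGE_TASK_ID or TASK_ID. The function name mpd_con_ext is made up
for illustration, and the final fallback to the literal string
"undefined" is an assumption (as far as I know, that is what SGE puts in
SGE_TASK_ID for non-array jobs):

```shell
#!/bin/sh
# Hypothetical helper, not part of the stock start scripts.
# Prints the MPD console extension for the current job/task.
# Prefers SGE_TASK_ID, falls back to TASK_ID, then to "undefined"
# (the last fallback is an assumption for illustration).
mpd_con_ext() {
    task="${SGE_TASK_ID:-${TASK_ID:-undefined}}"
    # Abort with a message if JOB_ID is missing, instead of
    # silently producing a console name like "sge_.1".
    printf 'sge_%s.%s\n' "${JOB_ID:?JOB_ID is not set}" "$task"
}
```

In the start and stop scripts this could be used as
MPD_CON_EXT=$(mpd_con_ext); export MPD_CON_EXT
so both sides agree on the same console name.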

The queue settings for the interpreter should only affect the  
execution of the jobscript, not the prolog/epilog. Can you please  
post your queue definition?
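
To narrow the queue definition down to the relevant entries, something
like the sketch below may help. It filters `qconf -sq` output for the
fields that decide the interpreter and the prolog/epilog. The function
name queue_shell_fields is hypothetical, and all.q is just the queue
name from your qsub line. As far as I remember, the defaults are
"shell /bin/csh" with "shell_start_mode posix_compliant", in which case
the #! line of the jobscript is ignored, but please check your own
output rather than take that for granted:

```shell
#!/bin/sh
# Hypothetical helper: reads a queue definition (the output of
# `qconf -sq <queue>`) on stdin and keeps only the fields that
# control the job's interpreter and the prolog/epilog.
queue_shell_fields() {
    awk '$1 == "shell" ||
         $1 == "shell_start_mode" ||
         $1 == "starter_method" ||
         $1 == "prolog" ||
         $1 == "epilog"'
}
```

Usage would be: qconf -sq all.q | queue_shell_fields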

-- Reuti



> Kenneth
>
> On Mon, 9 Mar 2009, kennethsdsc wrote:
>
>> Date: Mon, 9 Mar 2009 11:39:16 -0700 (PDT)
>> From: kennethsdsc <kenneth at sdsc.edu>
>> Reply-To: users <users at gridengine.sunsource.net>
>> To: users at gridengine.sunsource.net
>> Subject: RE: [GE users] mpich2 tight integration not working
>>
>> I also am playing with tight mpich2_mpd integration with sge 62u2.
>> I'm not sure if my problem is related to yours.  I found some
>> mismatches in the start scripts and what SGE is setting.  I was able to
>> get the mpihello.c to work, by modifying start and stop scripts.
>>
>> It looks like SGE is not setting TASK_ID in the environment,
>> but is setting SGE_TASK_ID, so I modified startmpich2.sh:
>>
>> #export MPD_CON_EXT="sge_$JOB_ID.$TASK_ID"
>> export MPD_CON_EXT="sge_$JOB_ID.$SGE_TASK_ID"
>>
>> I also had to give stopmpich2.sh the full path to mpdallexit:
>> #mpdallexit
>> /usr/local/apps/sge/mpich2/install/bin/mpdallexit
>>
>> Kenneth
>>
>> On Thu, 4 Dec 2008, Patterson, Ron (NIH/NLM/NCBI) [C] wrote:
>>
>>> Date: Thu, 4 Dec 2008 14:25:47 -0500
>>> From: "Patterson, Ron (NIH/NLM/NCBI) [C]" <patterso at ncbi.nlm.nih.gov>
>>> Reply-To: users <users at gridengine.sunsource.net>
>>> To: users at gridengine.sunsource.net
>>> Subject: RE: [GE users] mpich2 tight integration not working
>>>
>>> Reuti,
>>>
>>>> you set "job_is_first_task  FALSE" in the PE?
>>>
>>> No - I had it set to TRUE. I made the change and my first test was
>>> successful. Thank you very much for your amazingly speedy reply.
>>>
>>> Ron
>>>
>>> ------------------------------------------------------
>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=91204
>>>
>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>>
>>
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=125724
>>
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=125727

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=125765


