[GE users] failed receiving gdi request
ithildin at teomech.ugent.be
Fri May 25 14:32:19 BST 2007
> On 25.05.2007 at 11:12, geno wrote:
>> We freshly set up a GE, version N1GE 6.0u9
>> qmaster on a Xeon, with 2.6.9-42.ELsmp i686
>> sgeexecd on Opteron nodes, with 2.6.9-42.ELsmp x86_64
>> Our first jobs seemed to run fine.
>> Parallel jobs did not run because MPI wasn't (and maybe isn't) set up
>> So we got errors like "cannot run in PE "mpi" because it only offers
>> 0 slots"
> you set the number of slots in the PE definition to a sensible value,
> and attached the PE also to a cluster queue of your choice?
The slots value in the PE matches the total number of slots.
Qmon shows mpi as the referenced PE for both of my queues.
# qconf -sp mpi
user_lists astro1 maphy1
start_proc_args /nfsshare/sge-root/mpi/startmpi.sh $pe_hostfile
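For anyone who wants to repeat these two checks from the command line instead of Qmon, here is a minimal sketch. It assumes the usual "keyword value" layout of the qconf -sp and qconf -sq output; the helper names pe_slots and queue_has_pe are made up for illustration, and the queue name all.q in the comments is only a placeholder. It does not handle per-host overrides in pe_list.

```shell
#!/bin/sh
# Hypothetical helpers; they only parse text read from stdin.

# Print the "slots" value from `qconf -sp <pe>` output.
pe_slots() {
    awk '$1 == "slots" { print $2 }'
}

# Exit 0 if the PE named in $1 appears on the pe_list line of
# `qconf -sq <queue>` output, exit 1 otherwise.
queue_has_pe() {
    awk -v pe="$1" '
        $1 == "pe_list" { for (i = 2; i <= NF; i++) if ($i == pe) found = 1 }
        END { exit found ? 0 : 1 }'
}

# On a live cluster (all.q is a placeholder queue name):
#   qconf -sp mpi   | pe_slots
#   qconf -sq all.q | queue_has_pe mpi && echo "mpi is attached"
```

If pe_slots prints 0, or queue_has_pe fails for every queue, the "only offers 0 slots" message would be expected.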
>> By adding lamboot and lamhalt in the script, and adding some changes
>> to the PE environment, these PE related errors disappeared.
>> Now we got a new error :
>> error: can't unpack gdi request
>> error: error unpacking gdi request: bad argument
>> failed receiving gdi request
> For a proper LAM/MPI integration, this might help:
Thanks. I'll have a closer look at this.
>> In your mailing list archive, this error was related to:
>> - having different GE versions. we don't.
>> - having too many messages in the read buffer. we don't (0).
>> The gdi error prevents us now from starting new jobs, parallel or not.
>> I have no idea what gdi is. Does anyone know what is happening?
> Can you please check, whether any queues are in status E (error) and
> clear it by using qmod?
One node had status E; I cleared it.
The gdi error persists.
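For anyone following along, a rough sketch of how queue instances in error state can be listed before clearing them. It assumes the six-column qstat -f layout (queuename, qtype, used/tot., load_avg, arch, states); the helper name error_queues is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `qstat -f` output on stdin and print the
# queue instances (queue@host) whose states column contains E.
# Assumes the states column is the sixth field and is absent when empty.
error_queues() {
    awk 'NF == 6 && $1 ~ /@/ && $6 ~ /E/ { print $1 }'
}

# On a live cluster:
#   qstat -f | error_queues
#   qmod -c <queue instance printed above>
```

This only clears the symptom; the reason a queue went into E still has to be found in the messages files.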