[GE users] failed receiving gdi request

geno ithildin at teomech.ugent.be
Fri May 25 14:32:19 BST 2007



hi,

Reuti wrote:
> Hi,
>
> On 25.05.2007 at 11:12, geno wrote:
>
>> We freshly set up a GE, version N1GE 6.0u9
>> qmaster on a Xeon, with 2.6.9-42.ELsmp i686
>> sgeexecd  on Opteron nodes, with 2.6.9-42.ELsmp x86_64
>>
>> Our first jobs seemed to run fine.
>> Parallel jobs did not run because MPI wasn't (and maybe isn't) set up 
>> properly.
>> So we got errors like "cannot run in PE "mpi" because it only offers 
>> 0 slots"
>
> you set the number of slots in the PE definition to a sensible value, 
> and attached the PE also to a cluster queue of your choice?
The PE slot count matches the total number of slots.
Qmon shows mpi as a referenced PE for both of my queues.

# qconf -sp mpi
pe_name           mpi
slots             140
user_lists        astro1 maphy1
xuser_lists       NONE
start_proc_args   /nfsshare/sge-root/mpi/startmpi.sh $pe_hostfile
stop_proc_args    /nfsshare/sge-root/mpi/stopmpi.sh
allocation_rule   $fill_up
control_slaves    FALSE
job_is_first_task FALSE
urgency_slots     min
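
The same two checks can also be made from the command line instead of Qmon. This is only a minimal sketch, assuming that `qconf -sql` lists the cluster queues and that `qconf -sq <queue>` prints a `pe_list` line, as in SGE 6.x. The helper names `pe_slots` and `queue_has_pe` are hypothetical, and host-group overrides in square brackets are not handled.

```shell
#!/bin/sh
# Hypothetical helpers; both read qconf output on stdin, so they can be
# tried on saved output as well as on a live cluster.

# Print the value of the "slots" line from `qconf -sp <pe>` output.
pe_slots() {
    awk '$1 == "slots" { print $2 }'
}

# Succeed if the pe_list line of `qconf -sq <queue>` output names PE $1.
# Entries may be separated by spaces or commas.
queue_has_pe() {
    awk -v pe="$1" '
        $1 == "pe_list" {
            gsub(/,/, " ")
            for (i = 2; i <= NF; i++) if ($i == pe) found = 1
        }
        END { exit !found }'
}

# Usage on a live cluster (not run here):
#   qconf -sp mpi | pe_slots
#   for q in $(qconf -sql); do
#       qconf -sq "$q" | queue_has_pe mpi && echo "$q references mpi"
#   done
```
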
>
>> By adding lamboot and lamhalt in the script, and adding some changes 
>> to the PE environment, these PE related errors disappeared.
>> Now we got a new error :
>>    error: can't unpack gdi request
>>    error: error unpacking gdi request: bad argument
>>    failed receiving gdi request
>
> For a proper LAM/MPI integration, this might help:
>
> http://gridengine.sunsource.net/howto/lam-integration/lam-integration.html 
>
Thanks. I'll have a closer look at this.

>
>> In your mailing list archive, this error was related to:
>> - having different GE versions. We don't.
>> - having too many messages in the read buffer. We don't (0).
>>
>> The gdi error prevents us now from starting new jobs, parallel or not.
>> I have no idea what gdi is. Does anyone know what is happening?
>> geno
>
> Can you please check, whether any queues are in status E (error) and 
> clear it by using qmod?
One node had status E; I cleared it.
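
For reference, finding and clearing such queue instances can be sketched as follows. This assumes the SGE 6.x `qstat -f` layout, in which the state letters are the sixth column of each `queue@host` line, and that `qmod -c` clears the error state. The helper name `error_queues` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `qstat -f` output on stdin and prints the
# queue instances whose state column contains E.
# Assumes lines of the form: queue@host qtype used/tot. load_avg arch states
error_queues() {
    awk '$1 ~ /@/ && NF >= 6 && $6 ~ /E/ { print $1 }'
}

# Usage on a live cluster (not run here):
#   qstat -f -explain E          # should also report why an instance is in error
#   for qi in $(qstat -f | error_queues); do
#       qmod -c "$qi"            # clear the error state of that instance
#   done
```
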

The gdi error persists.

geno.




