[GE users] Migration Issue + Conditional "failed receiving gdi request" error

Reuti reuti at staff.uni-marburg.de
Sat Aug 11 12:41:32 BST 2007


Hi,

On 11.08.2007 at 04:08, Jonathan Pierce wrote:

> We have two machines set up for qmaster responsibilities:
> cerebro-rmn1 and cerebro-rmn2.  Right now, cerebro-rmn1 is master.
> If I attempt to qstat from cerebro-rmn2:
>
> <hdezmora at cerebro-rmn2.data> qstat
> error: commlib error: got read error (closing "cerebro-rmn1.data/qmaster/1")
> error: failed sending gdi request

Did you set up the shadow master as described in
http://gridengine.sunsource.net/howto/shadow.html and a BDB server?

-- Reuti

> and the following appears in /usr/sge/loni/spool/qmaster/messages  
> (loni is the cell):
>
> 08/10/2007 18:44:21|qmaster|cerebro-rmn1|E|commlib error: got read error (closing "cerebro-rsn2.data/qstat/6")
>
>
> Now, this may be related to another issue we're experiencing.   
> Originally, when we tried to migrate services, it shut down the  
> qmaster, but failed to start it on the second, leaving the grid  
> engine in an unusable state until qmaster was manually started on  
> one or the other.  The main issue was that the lock file was not  
> being deleted.  We hacked the script as follows:
>
> lock_file_read_retries=15
> lock_file_read_count=0
> lock_file_found=0
> while [ $lock_file_read_count -lt $lock_file_read_retries ]; do
>    if [ -f $qmaster_spool_dir/lock ]; then
>       rm $qmaster_spool_dir/lock
>       lock_file_found=1
>       break
>    fi
>    sleep 5
>    lock_file_read_count=`expr $lock_file_read_count + 1`
> done
>
> where the defaults are lock_file_read_retries=10 and sleep 3; the
> "rm $qmaster_spool_dir/lock" line was added.  I would assume that
> migrate should already work on its own, but we added this as a
> (hopefully) temporary fix.  I have included this information in case
> it's helpful to anybody in figuring out what's wrong.  Any assistance
> would be greatly appreciated.
>
> Thank you very much,
> Jonathan
>
> Jonathan Pierce
> System Administrator
> Laboratory of Neuro Imaging, UCLA
> 635 Charles E. Young Drive South, Suite 225
> Los Angeles, CA 90095-7332
> Tel: 310.267.5076
> Cell: 310.487.8365
> Fax: 310.206.5518
> jonathan.pierce at loni.ucla.edu
>
>
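
For reference, the modified loop quoted above could be written as a small
standalone function, which makes it easier to try outside the startup
script. This is only a sketch under assumptions: the function name
wait_and_remove_lock and its three arguments are made up for illustration
and are not part of the SGE scripts, and nothing in this thread establishes
that removing the lock file by hand is safe in a shadow master setup.

```shell
#!/bin/sh
# Hypothetical helper, not part of SGE: poll for <dir>/lock and remove it.
# Usage: wait_and_remove_lock DIR [RETRIES] [DELAY_SECONDS]
# Returns 0 if the lock file was found and removed, 1 if it never appeared.
wait_and_remove_lock() {
    lock_dir=$1
    retries=${2:-15}   # 15 retries, as in the modified loop above
    delay=${3:-5}      # 5 seconds between checks, as in the modified loop
    count=0
    while [ "$count" -lt "$retries" ]; do
        if [ -f "$lock_dir/lock" ]; then
            rm -f "$lock_dir/lock"
            return 0
        fi
        sleep "$delay"
        count=$((count + 1))
    done
    return 1
}
```

It could be called as wait_and_remove_lock "$qmaster_spool_dir" and the
return code checked, instead of tracking lock_file_found separately.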
