[GE users] qstat Eqw seems to be related to NFS slowness...

jpierce jonathan.pierce at loni.ucla.edu
Mon Aug 24 23:35:33 BST 2009


We tried it on CentOS 4.3 and then again after upgrading to 5.2. Just one 
clarification on the earlier message: we were trying to automount NFS, not ZFS 
shares.
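
For anyone who ends up clearing these by hand as described in the quoted message below, here is a rough sketch of how the Eqw jobs could be picked out of qstat output so the qmod -cj commands can be generated in one go. It is only an illustration: the helper names (eqw_job_ids, clear_commands) are made up, and it assumes the default qstat column layout (job-ID, prior, name, user, state, ...), which may differ on your installation.

```python
def eqw_job_ids(qstat_output):
    """Return IDs of jobs whose state column reads 'Eqw'.

    Assumes the default qstat layout, where the whitespace-separated
    columns start with: job-ID, prior, name, user, state.
    """
    job_ids = []
    for line in qstat_output.splitlines():
        fields = line.split()
        # Skip the header, the separator line, and anything too short.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        # An array job may appear on several lines; keep each ID once.
        if fields[4] == "Eqw" and fields[0] not in job_ids:
            job_ids.append(fields[0])
    return job_ids


def clear_commands(job_ids):
    """Build one 'qmod -cj <job-number>' argument list per job ID."""
    return [["qmod", "-cj", job_id] for job_id in job_ids]
```

Feeding it the text printed by qstat and printing the resulting commands first, rather than running them directly, gives a chance to check the list before anything is cleared. It only automates the workaround; it does not address why the automount fails in the first place.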

On 8/24/09 3:16 PM, mbay2002 wrote:
> This is CentOS 5.2, though according to the other messages in the thread
> this seems to be a known issue.
>
>
> tmacmd wrote:
>> What operating system and version is being used?
>>
>> --tmac
>>
>> RedHat Certified Engineer #804006984323821 (RHEL4)
>> RedHat Certified Engineer #805007643429572 (RHEL5)
>>
>> Principal Consultant
>>
>>
>>
>>
>> On Fri, Aug 21, 2009 at 4:28 PM, mbay2002 <jeff at haferman.com> wrote:
>>> We've got a cluster where /home is a ZFS filesystem, and the rest of our
>>> filesystems are Lustre.
>>>
>>> What I've been noticing is that when users submit array jobs through
>>> qsub (it seems they have to have 200 or more instances for this to
>>> occur), some of the jobs error out (qstat shows "Eqw").
>>>
>>> When I inspect the error, it shows that /home does not exist on some of
>>> the nodes.  /home is automounted, so it doesn't appear until a user
>>> connects to a node.  What I've found is that if I clear the error
>>> (qmod -cj <job-number>), the jobs usually take off and complete.
>>>
>>> So, what I'm thinking is that when roughly 200 (or more) array jobs are
>>> submitted, NFS / automount can't simultaneously mount /home/<user>
>>> across all the nodes.  MOST of them work, but a few error out.  Perhaps
>>> it is more appropriate to post this to a ZFS forum, but has anyone else
>>> seen this behavior, and if so, is there a fix?
>>>
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=214049
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

-- 
Jonathan Pierce
Systems Administrator
Laboratory of Neuro Imaging, UCLA
635 Charles E. Young Drive South,
Suite 225 Los Angeles, CA 90095-7334
Tel: 310.267.5076
Cell: 310.487.8365
Fax: 310.206.5518
jonathan.pierce at loni.ucla.edu

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=214051