[GE users] qmake job and errors: pid: No such file or directory

Chris Dagdigian dag at sonsorol.org
Mon Jan 21 19:42:44 GMT 2008


Sean,

Did you ever solve your Solexa pipeline and SGE issues? I did some  
work in December on a lab environment that needed to spin up its  
cluster and storage resources in order to handle the arrival of a 2nd  
Solexa instrument. I did the storage, cluster and SGE work though  
without looking too closely at the Solexa toolset.

The issue we had with Solexa was that the pipeline was built on qmake  
and seemed biased towards synchronous use by a single person on a  
dedicated system -- no easy way to batch submit a job via qsub and let  
it pend asynchronously until resources are available (something that  
will be needed in a multi-user, multi-instrument environment). I think  
people have worked around this by now using qsub wrappers over the  
qmake commands but I'm not sure.
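
For what it's worth, here is a minimal sketch of what such a qsub wrapper could look like. It is not taken from the Solexa pipeline. The function name make_qmake_job is hypothetical, and the PE name, slot range and -j value are simply borrowed from the qmake command quoted further down in this thread. It relies on qmake's -inherit option, which lets qmake run inside a parallel job that has already been granted its slots, so the job can pend in the queue like any other batch job.

```shell
#!/bin/sh
# Hypothetical helper: print a batch job script that runs qmake inside
# an already-granted parallel environment allocation.
make_qmake_job() {
    pe_name=$1   # parallel environment name, e.g. Test
    slots=$2     # slot range to request, e.g. 12-16
    jobs=$3      # value handed to make's -j option
    cat <<EOF
#!/bin/sh
#\$ -S /bin/sh
#\$ -cwd
#\$ -v PATH
#\$ -pe $pe_name $slots
# -inherit makes qmake reuse the slots of the surrounding job
qmake -inherit -- -j $jobs
EOF
}

# Usage (qsub reads the job script from stdin when no file is given):
#   make_qmake_job Test 12-16 16 | qsub
```

Generating the script as text keeps the wrapper easy to inspect before anything is submitted; whether the pipeline's makefiles behave well under -inherit would still need to be checked on a real cluster.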

It is very cool (and possibly not known much within the SGE community)  
how so much of the "next generation DNA sequencing" business is being  
built on top of Grid Engine. Solexa uses 'qmake' under the hood for  
their runs and indications are that the new Helicos instruments are  
also going to have analytical workflows that run off of Grid Engine.

This is going to be big in 2008 - I've been thinking about setting up  
a wiki or mailing list specifically for "lab instruments that require  
Grid Engine" so I'm on the hunt for people interested in the topic.

On a slightly related side note -- if you are going to be in the
Boston area in late April, we are organizing a 1-day workshop on
"next-gen sequencing" with a particular focus on the data handling and
migration problems that smaller labs are beginning to get bitten by.
I won't spam the details here, but we've got the info posted at
http://blog.bioteam.net now. We are trying to get as many sequencing
and IT types as we can into the same room for some practical "what the
heck do we do with terabyte-capable lab instruments" talks.

Creating a "DNA Sequencers shipping with SGE" article and summary for
gridengine.info is also on my personal to-do list.

Regards,
Chris




On Dec 30, 2007, at 5:19 PM, Sean Davis wrote:

>
>
> On Dec 30, 2007 10:29 AM, Reuti <reuti at staff.uni-marburg.de> wrote:
> Am 29.12.2007 um 15:05 schrieb Sean Davis:
>
>> On Dec 29, 2007 7:58 AM, Reuti < reuti at staff.uni-marburg.de> wrote:
>> Am 29.12.2007 um 00:11 schrieb Sean Davis:
>>
>>> On Dec 28, 2007 5:16 PM, Reuti < reuti at staff.uni-marburg.de> wrote:
>>> Hi,
>>>
>>> Am 28.12.2007 um 22:15 schrieb Sean Davis:
>>>
>>> > I am trying to run qmake on the Solexa analysis pipeline (probably
>>> > not important, but....).  When I run this command, I get the
>>> > following error:
>>> >
>>> > can't open file /tmp/285.1.all.q/pid: No such file or directory
>>> >
>>> > I have the shepherd trace available, also.  Does this problem ring
>>> > any bells for anyone?
>>> >
>>> > As for details of our setup, we have several Linux boxes running
>>> > the lx24-amd64 binaries (though they are Intel machines).  All are
>>> > using ssh for communication.  None has a firewall enabled.  They
>>> > are using shared home directories, but /tmp, etc., are local to
>>> > the machines.  Qlogin, qrsh, and qsh seem to be working.  We have
>>> > openmpi installed, also.
>>> >
>>> > I know the question is pretty vague.  I am pretty new to SGE, so
>>> > debugging these issues is pretty new also.  Any guidance is
>>> > appreciated.
>>>
>>> usually the 'pid' file should go to a spool directory like:
>>>
>>> $SGE_ROOT/spool/sge/<node>/active_jobs/<job_id>.<task_id>
>>>
>>> or better:
>>>
>>> /var/spool/sge/<node>/active_jobs/<job_id>.<task_id>
>>>
>>> and not the local tmp directory for the job. Where is the spool
>>> directory for the node: local or on the NFS server? Best would be to
>>> have it local on all nodes:
>>>
>>> http://gridengine.sunsource.net/howto/nfsreduce.html
>>>
>>> Thanks, Reuti and John.
>>>
>>> The /tmp directory is world read and write, just to make certain.
>>>
>>> How can I set the location of the pid file?  Is there a convenient  
>>> way to check where the local spooling for each node is located?
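
One way to answer the "where is the spool directory" question is sketched below. It assumes the standard qconf tool is on PATH and that the spool location is stored in the execd_spool_dir configuration parameter, with a host-local configuration overriding the global one. The helper names spool_dir_from_conf and active_job_dir are made up for this example, and the per-job path layout is simply the one visible in the shepherd trace in this thread.

```shell
#!/bin/sh
# Pull the execd_spool_dir value out of `qconf -sconf` style output on stdin.
spool_dir_from_conf() {
    awk '$1 == "execd_spool_dir" { print $2; exit }'
}

# Build the per-job directory where the shepherd keeps its files.
# Arguments: spool dir, short host name, job id, task id.
active_job_dir() {
    printf '%s/%s/active_jobs/%s.%s\n' "$1" "$2" "$3" "$4"
}

# Only query the cluster when the SGE tools are actually available.
if command -v qconf >/dev/null 2>&1; then
    global_dir=$(qconf -sconf global 2>/dev/null | spool_dir_from_conf)
    for host in $(qconf -sel); do
        # A host-local configuration wins if it sets the parameter.
        dir=$(qconf -sconf "$host" 2>/dev/null | spool_dir_from_conf)
        echo "$host: ${dir:-$global_dir}"
    done
fi
```

Running `df` on each reported directory on the node itself would then show whether it sits on local disk or on NFS.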
>>>
>>> Here is a bit of the shepherd trace, as it seems like the local  
>>> spool directory is used, but also the tmp directory on the qmaster:
>>>
>>> 12/28/2007 16:05:54 [10020:25961]: setting environment
>>> 12/28/2007 16:05:54 [10020:25961]: Initializing error file
>>> 12/28/2007 16:05:54 [10020:25958]: forked "job" with pid 25961
>>> 12/28/2007 16:05:54 [10020:25958]: child: job - pid: 25961
>>> 12/28/2007 16:05:54 [10020:25961]: switching to intermediate/target user
>>> 12/28/2007 16:05:54 [10005:25961]: closing all filedescriptors
>>> 12/28/2007 16:05:54 [10005:25961]: further messages are in "error" and "trace"
>>> 12/28/2007 16:05:54 [0:25961]: now running with uid=0, euid=0
>>> 12/28/2007 16:05:54 [0:25961]: start qlogin
>>> 12/28/2007 16:05:54 [0:25961]: calling qlogin_starter(/var/spool/sge/local/pressa/active_jobs/285.1, /usr/sbin/sshd -i);
>>> 12/28/2007 16:05:54 [0:25961]: uid = 0, euid = 0, gid = 0, egid = 0
>>> 12/28/2007 16:05:54 [0:25961]: using sfd 1
>>> 12/28/2007 16:05:54 [0:25961]: bound to port 54613
>>> 12/28/2007 16:05:54 [0:25961]: write_to_qrsh - data = 0:54613:/usr/local/sge/utilbin/lx24-amd64:/var/spool/sge/local/pressa/active_jobs/285.1:pressa
>>> 12/28/2007 16:05:54 [0:25961]: write_to_qrsh - address = shakespeare:44242
>>> 12/28/2007 16:05:54 [0:25961]: write_to_qrsh - host = shakespeare, port = 44242
>>> 12/28/2007 16:05:54 [0:25961]: waiting for connection.
>>> 12/28/2007 16:06:54 [0:25961]: nobody connected to the socket
>>
>> What are the permissions on /var/spool/sge/local/pressa/active_jobs,
>> and who is the owner? Who is the admin user of your SGE
>> installation (the one who owns /usr/local/sge)?
>>
>> Reuti,
>>
>> root owns /usr/local/sge--this could be a problem, since the  
>> partition is mounted with root-squash.  However, sgeadmin is the  
>> admin user (the user under which sge_execd, sge_qmaster, etc.  
>> run).  As for /var/spool/sge/local/pressa/active_jobs, the owner is  
>> sgeadmin; everyone can r-x, owner can rwx.
>>
>> I have changed the ownership of /usr/local/sge to sgeadmin, but the  
>> problem seems to remain.
>
>
> Are these serial or parallel jobs? With root_squash I see a problem:  
> what are the settings in $SGE_ROOT/utilbin/lx24-amd64 for:
>
> -rwsr-xr-x  1 root root  32K Oct 20  2006 rlogin
> -rwsr-xr-x  1 root root  22K Oct 20  2006 rsh
> -rwsr-xr-x  1 root root  23K Oct 20  2006 testsuidroot
>
> They must be setuid root, since they have to run as root.
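
The check described above can be scripted roughly as follows. This is only a sketch: the function name check_suid_root is invented for the example, the file list and the lx24-amd64 architecture directory are the ones named in this thread, and it assumes SGE_ROOT is set in the environment.

```shell
#!/bin/sh
# Report whether a file exists, is owned by root (uid 0) and has the
# setuid bit set. Returns non-zero when something needs attention.
check_suid_root() {
    f=$1
    if [ ! -e "$f" ]; then
        echo "missing: $f"
        return 1
    fi
    # Third field of `ls -lnd` is the numeric owner uid.
    owner_uid=$(ls -lnd "$f" | awk '{ print $3 }')
    if [ -u "$f" ] && [ "$owner_uid" = "0" ]; then
        echo "ok: $f"
    else
        echo "needs fixing: $f"
        return 1
    fi
}

# Only look at the real binaries when SGE_ROOT is defined.
if [ -n "${SGE_ROOT:-}" ]; then
    for name in rlogin rsh testsuidroot; do
        check_suid_root "$SGE_ROOT/utilbin/lx24-amd64/$name" || true
    done
fi
```

It should be run on an execution host rather than on the file server, since that is where the ownership and mode bits actually take effect.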
>
> I have fixed these and the root_squash issue.  The jobs are qmake  
> jobs that look like:
>
> qmake -pe Test 12-16 -cwd -v PATH -- -j 16
>
> I continue to get error messages related to not finding
> /tmp/<jobid>/pid.  I don't know whether this is a symptom or a cause.
>
> Sean
>
>
>
>

