[GE users] Libraries present on execution hosts.

Reuti reuti at staff.uni-marburg.de
Thu Apr 14 15:21:13 BST 2005


I meant the entry:

qtype                 BATCH INTERACTIVE

in your queue definition. But I think that without INTERACTIVE there, 
the job wouldn't even try to start.
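
For reference, a minimal sketch of how one might check this from the 
shell. It assumes the queue definition is printed with "qconf -sq <queue>"; 
the helper name has_interactive and the queue name all.q in the usage 
line are only illustrations, not something from your setup.

```shell
# Hypothetical helper: reads a queue definition (the output of
# "qconf -sq <queue>") on stdin and succeeds only if the qtype
# line lists INTERACTIVE.
has_interactive() {
    awk '$1 == "qtype" {
             for (i = 2; i <= NF; i++)
                 if ($i == "INTERACTIVE") found = 1
         }
         END { exit (found ? 0 : 1) }'
}
```

Usage would be something like: qconf -sq all.q | has_interactive && echo "interactive jobs allowed"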

JONATHAN SELANDER wrote:
> qsh said it started OK twice, but I never got an xterm window. The logs didn't tell me much either.
> 
> I don't remember ever setting a "queue type"; what do you mean by this?
> 
> Also, does SGE split jobs on different nodes if one is too heavily loaded? Or does it simply choose the least loaded node and run the entire job on that one until it's finished?

The latter: the job runs on that node until it finishes. If you want to 
relocate jobs, you have to set up some kind of checkpointing; then it 
is possible to move a job to another node.

Depending on your setup, just limit the total number of slots on a node 
to the number of CPUs in it. This way you will never get too high a 
load on any node.
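
A rough sketch of how to find that number, assuming the nodes run Linux 
and expose /proc/cpuinfo (on other systems the CPU count has to come 
from elsewhere). The helper name count_cpus is made up for this example.

```shell
# Hypothetical helper: counts logical CPUs from /proc/cpuinfo-style
# text on stdin (Linux only) and prints the count.
count_cpus() {
    grep -c '^processor'
}
```

Run "count_cpus < /proc/cpuinfo" on the node, then enter the result as 
the "slots" value when editing the queue with "qconf -mq <queue>".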

> 
> I'm trying to write a program that uses a lot of resources until I notice that all nodes are getting loaded, but I haven't succeeded very well.

If you are testing serial programs and see this behavior, maybe some old 
processes are still running there after escaping from SGE's control. 
Can you check with "ps -e f" and "top" whether anything is still 
running that shouldn't be there?
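
If the process list is long, a small filter can narrow it down. This is 
only a heuristic sketch: it assumes that processes which outlived their 
job have been re-parented to init (PPID 1), which is also true of 
programs that daemonize on purpose. The helper name orphans_of and the 
user name in the usage line are placeholders.

```shell
# Hypothetical helper: reads "ps -eo pid,ppid,user,args" output on stdin
# and prints the lines owned by user $1 whose parent is init (PPID 1).
# The first line (the ps header) is skipped.
orphans_of() {
    awk -v user="$1" 'NR > 1 && $2 == 1 && $3 == user'
}
```

Usage would be something like: ps -eo pid,ppid,user,args | orphans_of jonathan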

Cheers - Reuti

> 
> Jonathan
> 
> -----Original Message-----
> From: Reuti <reuti at staff.uni-marburg.de>
> To: users at gridengine.sunsource.net
> Date: Thu, 14 Apr 2005 12:45:24 +0200
> Subject: Re: [GE users] Libraries present on execution hosts.
> 
> Yes, you could look in the messages file in 
> $SGE_ROOT/default/spool/qmaster/messages
> 
> You may see more if you set the loglevel to log_info in "qconf -mconf".
> 
> Since qrsh is working: did you set the queue type to interactive?
> 
> Cheers - Reuti
> 
> JONATHAN SELANDER wrote:
> 
>>I reinstalled the nodes together with X and libraries, and logging in to a node and running xterm like you said works. qrsh also works, but qsh still gives me the same error:
>>
>>bash-3.00$ qsh                    
>>waiting for interactive job to be scheduled ...
>>Could not start interactive job.
>>
>>Are there any logs anywhere other than in the spool dir that I can view? The information above isn't especially helpful.
>>
>>Jonathan
>>
>>-----Original Message-----
>>From: Reuti <reuti at staff.uni-marburg.de>
>>To: users at gridengine.sunsource.net
>>Date: Thu, 14 Apr 2005 10:17:47 +0200
>>Subject: Re: [GE users] Libraries present on execution hosts.
>>
>>Hi,
>>
>>can you log in to a node by hand and try to start an X session with 
>>output to your login node? Then we know that it works in principle.
>>
>>export DISPLAY=bras:0
>>xterm
>>
>>or so. We need at least some X-libs on the nodes (and a running X-server 
>>on bras to connect to).
>>
>>Cheers - Reuti
>>
>>
>>JONATHAN SELANDER wrote:
>>
>>
>>>I did as you said, but the interactive job wouldn't start:
>>>
>>>bash-3.00$ export DISPLAY=bras:0
>>>bash-3.00$ qsh
>>>waiting for interactive job to be scheduled ...
>>>Could not start interactive job.
>>>
>>>"bras" is the master host from which I submit jobs, and it resolves to 10.0.0.1, which I also tried. Is the issue that I don't have an X terminal on the execution hosts?
>>>
>>>-----Original Message-----
>>>From: Ron Chen <ron_chen_123 at yahoo.com>
>>>To: users at gridengine.sunsource.net
>>>Date: Wed, 13 Apr 2005 10:12:31 -0700 (PDT)
>>>Subject: Re: [GE users] Libraries present on execution hosts.
>>>
>>>It should only be a warning message, because if
>>>$DISPLAY has the form :<id> (local display), then it
>>>is useless on the remote machine.
>>>
>>>As a test, can you set your DISPLAY with the name of
>>>the machine you submit jobs on (like <host>:0), and
>>>see if it works?
>>>
>>>-Ron
>>>
>>>
>>>--- JONATHAN SELANDER <S026655 at utb.hb.se> wrote:
>>>
>>>
>>>
>>>>However, I cannot run qsh, and I imagine this is
>>>>because there is no X available on the nodes. Is
>>>>this correct? If it is, I assume I always need
>>>>certain libraries and interpreters present on the
>>>>execution hosts for the jobs I wish to execute.
>>>>
>>>>The error qsh gives me is:
>>>>
>>>>error: local DISPLAY variable ":0.0" delivered with
>>>>interactive job
>>>>
>>>>-
>>>>
>>>>$ echo $DISPLAY
>>>>:0.0
>>>>
>>>>Please tell me I'm wrong in my assumption.
>>>>
>>>>Thanks,
>>>>Jonathan
>>>>
>>>>
>>>>
>>>
>>>---------------------------------------------------------------------
>>>
>>>
>>>
>>>>To unsubscribe, e-mail:
>>>>users-unsubscribe at gridengine.sunsource.net
>>>>For additional commands, e-mail:
>>>>users-help at gridengine.sunsource.net
>>>>
>>>>

