[GE users] Tight integration with PVM

Reuti reuti at staff.uni-marburg.de
Fri Apr 15 14:07:34 BST 2005


At least the .po file should exist in your home directory (or in the 
directory from which you submitted the job), as the granted nodes are 
listed there by the start script. Is $SGE_ROOT shared, and are the new 
versions of the start/stop scripts available on the nodes?
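
For example, you could check like this (a quick sketch; the file names 
assume SGE's default output naming <jobscript>.po<jobid> and the job 
id 102 from the qstat output below):

# in the directory the job was submitted from
ls -l tester_tight.sh.po102 tester_tight.sh.pe102
cat tester_tight.sh.po102

# and on one of the nodes, to verify that the shared $SGE_ROOT
# carries the new start/stop scripts
ls -l /opt/sge/pvm/startpvm.sh /opt/sge/pvm/stoppvm.sh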

JONATHAN SELANDER wrote:
> I don't have any files like that in SGE_ROOT or the TMPDIR
> 
> ---
> 
> # cat tester_tight.sh
> #!/bin/sh
> 
> export PVM_TMP=/opt/sge/tmp

Please use:

export PVM_TMP=$TMPDIR

$TMPDIR is set by SGE to the temporary job directory it creates on the 
execution host for the duration of the job.
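
With that change, your test script from above becomes (just a sketch; 
./hello kept as in your script):

#!/bin/sh

# $TMPDIR is the per-job scratch directory created by sge_execd; it is
# removed automatically when the job finishes, so PVM's temp and socket
# files cannot collide between jobs or linger on the node.
export PVM_TMP=$TMPDIR

./hello

exit 0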

CU - Reuti

> 
> ./hello
> 
> exit 0
> 
> ---
> 
> # ls -ld /opt/sge/tmp
> drwxrwxrwt   2 root     root         512 Apr 15 13:21 /opt/sge/tmp
> 
> ---
> 
> -----Original Message-----
> From: Reuti <reuti at staff.uni-marburg.de>
> To: users at gridengine.sunsource.net
> Date: Fri, 15 Apr 2005 14:44:01 +0200
> Subject: Re: [GE users] Tight integration with PVM
> 
> Is there anything in the .po or .pe files, or don't they exist at all?
> 
> JONATHAN SELANDER wrote:
> 
>>Adding the PE to a queue fixed that error message. However, one node seems to fail each time I run the job (it has state E when I do qstat -f). It's not the same node that fails each time, either.
>>
>>---
>>
>># tail -2 /opt/sge/default/spool/brasnod-2/messages
>>04/15/2005 22:08:06|execd|brasnod-2|E|shepherd of job 102.1 exited with exit status = 10
>>04/15/2005 22:08:06|execd|brasnod-2|W|reaping job "102" ptf complains: Job does not exist
>>
>>---
>>
>># qstat -explain E
>>queuename                      qtype used/tot. load_avg arch          states
>>----------------------------------------------------------------------------
>>all.q at brasnod-2                BIP   0/1       0.02     sol-sparc64   E
>>        queue all.q marked QERROR as result of job 102's failure at host brasnod-2
>>----------------------------------------------------------------------------
>>all.q at brasnod-3                BIP   0/1       0.02     sol-sparc64
>>----------------------------------------------------------------------------
>>all.q at brasnod-4                BIP   0/1       0.01     sol-sparc64
>>
>>############################################################################
>> - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
>>############################################################################
>>    102 0.55500 tester_tig root         qw    04/15/2005 14:08:35     3
>>
>>
>>
>>
>>-----Original Message-----
>>From: Reuti <reuti at staff.uni-marburg.de>
>>To: users at gridengine.sunsource.net
>>Date: Fri, 15 Apr 2005 14:02:23 +0200
>>Subject: Re: [GE users] Tight integration with PVM
>>
>>Hi,
>>
>>did you add the PE to the queue definition (qconf -mq <queue>) like:
>>
>>pe_list    pvm
>>
>>CU - Reuti
>>
>>
>>JONATHAN SELANDER wrote:
>>
>>
>>>I followed the howto at http://gridengine.sunsource.net/howto/pvm-integration/pvm-integration.html for setting up PVM integration with SGE, after I had compiled PVM 3 and installed/compiled the utilities in the SGE_ROOT/pvm dir (aimk and install.sh).
>>>
>>>However, when I try the example tester_tight.sh from the howto, I get these scheduling errors in the logs:
>>>
>>>---
>>>
>>>cannot run in queue instance "all.q at brasnod-2" because PE "pvm" is not in pe list
>>>cannot run in queue instance "all.q at brasnod-4" because PE "pvm" is not in pe list
>>>cannot run because resources requested are not available for parallel job
>>>cannot run because available slots combined under PE "pvm" are not in range of job
>>>
>>>---
>>>
>>># qconf -sp pvm
>>>pe_name           pvm
>>>slots             100
>>>user_lists        NONE
>>>xuser_lists       NONE
>>>start_proc_args   /opt/sge/pvm/startpvm.sh -catch_rsh $pe_hostfile $host \
>>>                 /opt/sge/pvm
>>>stop_proc_args    /opt/sge/pvm/stoppvm.sh -catch_rsh $pe_hostfile $host
>>>allocation_rule   1
>>>control_slaves    TRUE
>>>job_is_first_task FALSE
>>>urgency_slots     min
>>>
>>>---
>>>
>>>
>>>What does this mean? brasnod-2,3,4 are execution hosts that work correctly when I run ordinary jobs.
>>>
>>>J


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



