[GE users] Tight integration with PVM

JONATHAN SELANDER S026655 at utb.hb.se
Fri Apr 15 13:22:31 BST 2005



Adding the PE to a queue fixed that error message. However, one node seems to fail each time I run the job (its queue shows state E in qstat -f), and it is not the same node each time.

---

# tail -2 /opt/sge/default/spool/brasnod-2/messages
04/15/2005 22:08:06|execd|brasnod-2|E|shepherd of job 102.1 exited with exit status = 10
04/15/2005 22:08:06|execd|brasnod-2|W|reaping job "102" ptf complains: Job does not exist
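The messages file above looks pipe-delimited (timestamp, daemon, host, level, message), so the error-level entries can be picked out with a small filter. This is only a sketch based on the two lines shown; the function name execd_errors is made up for illustration, and a message that itself contains "|" would be cut short.

```shell
# Hypothetical helper: print only error-level ("E") entries from an
# execd messages file read on stdin.
# Assumed field layout, taken from the two lines shown above:
#   timestamp|daemon|host|level|message
execd_errors() {
    awk -F'|' '$4 == "E" { print $1 " " $3 ": " $5 }'
}
```

It could be run on each execution host's spool file, for example: execd_errors < /opt/sge/default/spool/brasnod-2/messages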

---

# qstat -explain E
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
all.q@brasnod-2                BIP   0/1       0.02     sol-sparc64   E
        queue all.q marked QERROR as result of job 102's failure at host brasnod-2
----------------------------------------------------------------------------
all.q@brasnod-3                BIP   0/1       0.02     sol-sparc64
----------------------------------------------------------------------------
all.q@brasnod-4                BIP   0/1       0.01     sol-sparc64

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
    102 0.55500 tester_tig root         qw    04/15/2005 14:08:35     3
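Since the failing node changes from run to run, it may help to list the queue instances currently in error state after each attempt. The sketch below assumes the live output of qstat -f names queue instances as queue@host and puts the state in the sixth column, as in the listing above; error_queues is a hypothetical name.

```shell
# Hypothetical helper: read "qstat -f" output on stdin and print the
# queue instances whose states column contains "E".
# Assumes queue instance lines start with queue@host and that the state,
# when present, is the sixth whitespace-separated field.
error_queues() {
    awk '$1 ~ /@/ && NF >= 6 && $6 ~ /E/ { print $1 }'
}
```

Typical use would be: qstat -f | error_queues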




-----Original Message-----
From: Reuti <reuti at staff.uni-marburg.de>
To: users at gridengine.sunsource.net
Date: Fri, 15 Apr 2005 14:02:23 +0200
Subject: Re: [GE users] Tight integration with PVM

Hi,

Did you add the PE to the queue definition (qconf -mq <queue>), like:

pe_list    pvm
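To check this without opening the editor, the queue definition printed by qconf -sq <queue> can be scanned for the PE name. A minimal sketch, assuming the pe_list entry sits on a single line with names separated by spaces or commas (continuation lines are not handled); pe_in_queue is a hypothetical name.

```shell
# Hypothetical helper: succeed if the PE named in $1 appears in the
# pe_list line of a queue definition read on stdin.
# Assumes a single-line "pe_list name1 name2 ..." entry.
pe_in_queue() {
    awk -v pe="$1" '
        $1 == "pe_list" {
            n = split($0, names, /[ \t,]+/)
            for (i = 2; i <= n; i++) if (names[i] == pe) found = 1
        }
        END { exit !found }
    '
}
```

For example: qconf -sq all.q | pe_in_queue pvm && echo "pvm is in pe_list"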

CU - Reuti


JONATHAN SELANDER wrote:
> I followed the howto at http://gridengine.sunsource.net/howto/pvm-integration/pvm-integration.html to set up PVM integration with SGE, after compiling PVM 3 and building and installing the utilities in the SGE_ROOT/pvm directory (aimk and install.sh).
> 
> However, when I try the example tester_tight.sh from the howto, I get these scheduling errors in the logs:
> 
> ---
> 
> cannot run in queue instance "all.q@brasnod-2" because PE "pvm" is not in pe list
> cannot run in queue instance "all.q@brasnod-4" because PE "pvm" is not in pe list
> cannot run because resources requested are not available for parallel job
> cannot run because available slots combined under PE "pvm" are not in range of job
> 
> ---
> 
> # qconf -sp pvm
> pe_name           pvm
> slots             100
> user_lists        NONE
> xuser_lists       NONE
> start_proc_args   /opt/sge/pvm/startpvm.sh -catch_rsh $pe_hostfile $host \
>                   /opt/sge/pvm
> stop_proc_args    /opt/sge/pvm/stoppvm.sh -catch_rsh $pe_hostfile $host
> allocation_rule   1
> control_slaves    TRUE
> job_is_first_task FALSE
> urgency_slots     min
> 
> ---
> 
> 
> What does this mean? brasnod-2, brasnod-3 and brasnod-4 are execution hosts that work correctly when I run ordinary jobs.
> 
> J
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>










