[GE users] dropped because it is full

Hugo Darío Barrera hbarrera at iciq.es
Tue Jun 27 09:50:11 BST 2006




Hi,

thanks for the answer.

Yes, I have PEs called "short" and "big1".

I'm running this simple script:

#!/bin/bash

#$ -N a
#
# pe request
#$ -pe big2 8


cd /home/mante070/Co1
/usr/bin/mpirun -np 8 -machinefile /scratch/machines /usr/local/bin/vasp


/scratch/machines contains the names of all the nodes.

Right now I have:

tekla001 to tekla018

I submitted a job to run on nodes tekla011 to tekla018, and qstat -t shows it on those nodes:

     22 0.55500 a          mante        r     06/27/2006 10:36:56 big2@tekla011          SLAVE
     22 0.55500 a          mante        r     06/27/2006 10:36:56 big2@tekla012          SLAVE
     22 0.55500 a          mante        r     06/27/2006 10:36:56 big2@tekla013          SLAVE
     22 0.55500 a          mante        r     06/27/2006 10:36:56 big2@tekla014          SLAVE
     22 0.55500 a          mante        r     06/27/2006 10:36:56 big2@tekla015          SLAVE
     22 0.55500 a          mante        r     06/27/2006 10:36:56 big2@tekla016          SLAVE
     22 0.55500 a          mante        r     06/27/2006 10:36:56 big2@tekla017          MASTER
                                                                  big2@tekla017          SLAVE
     22 0.55500 a          mante        r     06/27/2006 10:36:56 big2@tekla018          SLAVE


If I go to the "MASTER" node (tekla017 in this case) and run ps aux | grep vasp, I get:

mante     3697  0.0  0.0   3108  1588 ?        SN   10:35   0:00 /bin/sh /usr/bin/mpirun -np 8 -machinefile /scratch/machines /usr/local/bin/vasp
mante     3903 90.5  8.4 214640 175344 ?       RNs  10:35   6:46 /usr/local/bin/vasp -p4pg /home/mante070/Co1/PI3697 -p4wd /home/mante070/Co1
mante     3904  0.0  0.1  23800  3772 ?        SN   10:35   0:00 /usr/local/bin/vasp -p4pg /home/mante070/Co1/PI3697 -p4wd /home/mante070/Co1
mante     3905  0.0  0.1   5740  2332 ?        SN   10:35   0:00 /usr/bin/ssh tekla001. -l mante -n /usr/local/bin/vasp tekla017 41547 \-p4amslave \-p4yourname tekla001. \-p4rmrank 1
mante     3906  0.0  0.1   5740  2336 ?        SN   10:35   0:00 /usr/bin/ssh tekla002. -l mante -n /usr/local/bin/vasp tekla017 41547 \-p4amslave \-p4yourname tekla002. \-p4rmrank 2
mante     3907  0.0  0.1   5744  2336 ?        SN   10:35   0:00 /usr/bin/ssh tekla003. -l mante -n /usr/local/bin/vasp tekla017 41547 \-p4amslave \-p4yourname tekla003. \-p4rmrank 3
mante     3908  0.0  0.1   5740  2332 ?        SN   10:35   0:00 /usr/bin/ssh tekla004. -l mante -n /usr/local/bin/vasp tekla017 41547 \-p4amslave \-p4yourname tekla004. \-p4rmrank 4
mante     3909  0.0  0.1   5744  2336 ?        SN   10:35   0:00 /usr/bin/ssh tekla005. -l mante -n /usr/local/bin/vasp tekla017 41547 \-p4amslave \-p4yourname tekla005. \-p4rmrank 5
mante     3910  0.0  0.1   5740  2332 ?        SN   10:35   0:00 /usr/bin/ssh tekla006. -l mante -n /usr/local/bin/vasp tekla017 41547 \-p4amslave \-p4yourname tekla006. \-p4rmrank 6
mante     3911  0.0  0.1   5740  2332 ?        SN   10:35   0:00 /usr/bin/ssh tekla007. -l mante -n /usr/local/bin/vasp tekla017 41547 \-p4amslave \-p4yourname tekla007. \-p4rmrank 7


So it's really using the first nodes (tekla001 to tekla007) instead of the nodes that are
configured in the hostgroup configuration.
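
For comparison, here is a minimal sketch of how a machinefile could be built from the hosts SGE actually granted to the job, rather than from the static /scratch/machines. It assumes the job runs under a parallel environment, so that SGE sets $PE_HOSTFILE and $NSLOTS, and that each line of that file starts with "<host> <slots>". The function name pe_hostfile_to_machinefile and the output file name are hypothetical, and I have not tried this on the cluster above.

```shell
#!/bin/bash
# Hypothetical helper: expand an SGE PE hostfile into a machinefile with
# one hostname per granted slot.
# Assumed input format per line: "<host> <slots> <queue> <processors>".
pe_hostfile_to_machinefile() {
    local host slots rest i
    # The "|| [ -n "$host" ]" part also handles a last line without a newline.
    while read -r host slots rest || [ -n "$host" ]; do
        [ -z "$host" ] && continue
        for ((i = 0; i < slots; i++)); do
            echo "$host"
        done
    done < "$1"
}

# Possible use inside the job script (assumption: PE_HOSTFILE, NSLOTS and
# TMPDIR are set by SGE for the running job):
#   pe_hostfile_to_machinefile "$PE_HOSTFILE" > "$TMPDIR/machines.job"
#   /usr/bin/mpirun -np "$NSLOTS" -machinefile "$TMPDIR/machines.job" /usr/local/bin/vasp
```

With something like this, mpirun would only see the nodes listed by qstat -t, instead of the first entries of a file that lists every node.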


Btw, I have a fresh install of SGE.

Thanks




