[GE users] allocation or info leaking

lukacm at pdx.edu lukacm at pdx.edu
Wed May 18 17:07:22 BST 2005



Hello,

This concerns SGE 5.3.
While using NAMD (which uses charmrun compiled against a selected MPI), the
following situation occurs. Assume I request 8 parallel slots using qsub:
qsub -pe mpich 8 namd_submit.sh data.namd
After the program has started, the output of qstat looks something like this:
----------------------------------------------------------------------------
compute-0-0.2q       BIP   0/2       0.93     lx24-amd64 d
----------------------------------------------------------------------------
compute-0-0.q        BIP   0/1       0.93     lx24-amd64
----------------------------------------------------------------------------
compute-0-0.qq       BIP   0/1       0.93     lx24-amd64
----------------------------------------------------------------------------
compute-0-1.2q       BIP   0/2       0.89     lx24-amd64 d
----------------------------------------------------------------------------
compute-0-1.q        BIP   0/1       0.89     lx24-amd64
----------------------------------------------------------------------------
compute-0-1.qq       BIP   0/1       0.89     lx24-amd64
----------------------------------------------------------------------------
compute-0-2.2q       BIP   0/2       0.84     lx24-amd64 d
----------------------------------------------------------------------------
compute-0-2.q        BIP   1/1       0.84     lx24-amd64
    131     0 NAMD_test  lukacm       r     05/18/2005 09:01:18 SLAVE
----------------------------------------------------------------------------
compute-0-2.qq       BIP   1/1       0.84     lx24-amd64
    131     0 NAMD_test  lukacm       r     05/18/2005 09:01:18 SLAVE
----------------------------------------------------------------------------
compute-0-3.2q       BIP   0/2       0.00     lx24-amd64 d
----------------------------------------------------------------------------
compute-0-3.q        BIP   1/1       0.00     lx24-amd64
    131     0 NAMD_test  lukacm       r     05/18/2005 09:01:18 SLAVE
----------------------------------------------------------------------------
compute-0-3.qq       BIP   1/1       0.00     lx24-amd64
    131     0 NAMD_test  lukacm       r     05/18/2005 09:01:18 MASTER
            0 NAMD_test  lukacm       r     05/18/2005 09:01:18 SLAVE
----------------------------------------------------------------------------
compute-0-4.2q       BIP   0/2       0.00     lx24-amd64 d
----------------------------------------------------------------------------
compute-0-4.q        BIP   1/1       0.00     lx24-amd64
    131     0 NAMD_test  lukacm       r     05/18/2005 09:01:18 SLAVE
----------------------------------------------------------------------------
compute-0-4.qq       BIP   1/1       0.00     lx24-amd64
    131     0 NAMD_test  lukacm       r     05/18/2005 09:01:18 SLAVE
----------------------------------------------------------------------------
compute-0-5.2q       BIP   0/2       0.80     lx24-amd64 d
----------------------------------------------------------------------------
compute-0-5.q        BIP   1/1       0.80     lx24-amd64
    131     0 NAMD_test  lukacm       r     05/18/2005 09:01:18 SLAVE
----------------------------------------------------------------------------
compute-0-5.qq       BIP   1/1       0.80     lx24-amd64
    131     0 NAMD_test  lukacm       r     05/18/2005 09:01:18 SLAVE


However, if I run
cluster-fork ps -u lukacm
I obtain the following:
compute-0-0:
  PID TTY          TIME CMD
18176 ?        00:00:00 sh
18177 ?        00:03:36 namd2
18186 ?        00:00:00 sh
18187 ?        00:03:34 namd2
18228 ?        00:00:00 sshd
18230 ?        00:00:00 ps
compute-0-1:
  PID TTY          TIME CMD
19034 ?        00:00:00 sh
19035 ?        00:03:35 namd2
19036 ?        00:00:00 sh
19038 ?        00:03:39 namd2
19086 ?        00:00:00 sshd
19088 ?        00:00:00 ps
compute-0-2:
  PID TTY          TIME CMD
23690 ?        00:00:00 sh
23692 ?        00:03:40 namd2
23698 ?        00:00:00 sh
23699 ?        00:03:38 namd2
23730 ?        00:00:00 sshd
23732 ?        00:00:00 ps
compute-0-3:
  PID TTY          TIME CMD
 3147 ?        00:00:00 csh
 3173 ?        00:00:00 charmrun
 3224 ?        00:00:00 sshd
 3226 ?        00:00:00 ps
compute-0-4:
  PID TTY          TIME CMD
  971 ?        00:00:00 sshd
  973 ?        00:00:00 ps
compute-0-5:
  PID TTY          TIME CMD
15898 ?        00:00:00 sh
15899 ?        00:00:00 sh
15900 ?        00:03:40 namd2
15901 ?        00:03:39 namd2
15946 ?        00:00:00 sshd
15948 ?        00:00:00 ps

This means that SGE is starting some processes, but they are not located where
SGE shows them. For example, SGE shows slots in use on node compute-0-4, but in
reality no namd2 processes are running there; they are on other nodes, such as
compute-0-0 and compute-0-1, where SGE shows no slots in use. Is this a problem
with tight integration, or is it specific to NAMD? Is anyone having the same
issue?
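
To make the mismatch easier to see, here is a minimal sketch of how the two
listings could be compared automatically. It is only an illustration: the
function names (granted_slots, running_procs, compare) are hypothetical, the
sample text is a short excerpt of the output above, and the parsing assumes
exactly the layout shown there (queue names of the form host.suffix, and
cluster-fork printing each host name followed by a colon).

```python
import re
from collections import Counter


def granted_slots(qstat_text):
    """Count SLAVE entries per host in qstat output laid out as above."""
    slots = Counter()
    host = None
    for line in qstat_text.splitlines():
        # Assumption: a queue line starts with "<host>.<suffix>" and the
        # host name itself contains no dots.
        m = re.match(r"([^\s.]+)\.\S+\s", line)
        if m:
            host = m.group(1)
        elif host and line.split()[-1:] == ["SLAVE"]:
            slots[host] += 1
    return slots


def running_procs(ps_text, command="namd2"):
    """Count processes named `command` per host in cluster-fork ps output."""
    procs = Counter()
    host = None
    for line in ps_text.splitlines():
        line = line.strip()
        if line.endswith(":"):
            # Assumption: a host header is the host name followed by ":".
            host = line[:-1]
        elif host and line.split()[-1:] == [command]:
            procs[host] += 1
    return procs


def compare(slots, procs):
    """Return {host: (granted, running)} for hosts where the counts differ."""
    hosts = sorted(set(slots) | set(procs))
    return {h: (slots[h], procs[h]) for h in hosts if slots[h] != procs[h]}


# Excerpt of the listings above (compute-0-0 and compute-0-4 only).
QSTAT = """\
compute-0-0.q        BIP   0/1       0.93     lx24-amd64
compute-0-4.q        BIP   1/1       0.00     lx24-amd64
    131     0 NAMD_test  lukacm       r     05/18/2005 09:01:18 SLAVE
compute-0-4.qq       BIP   1/1       0.00     lx24-amd64
    131     0 NAMD_test  lukacm       r     05/18/2005 09:01:18 SLAVE
"""

PS = """\
compute-0-0:
  PID TTY          TIME CMD
18177 ?        00:03:36 namd2
18187 ?        00:03:34 namd2
compute-0-4:
  PID TTY          TIME CMD
  971 ?        00:00:00 sshd
"""

print(compare(granted_slots(QSTAT), running_procs(PS)))
# prints {'compute-0-0': (0, 2), 'compute-0-4': (2, 0)}
```

In this excerpt compute-0-4 has two granted slots and no namd2 process, while
compute-0-0 has two namd2 processes and no granted slot, which is the
discrepancy described above.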

thanks

martin
