Opened 4 years ago

Closed 4 years ago

#1545 closed defect (invalid)

SGE Buffering Python Output

Reported by: agrothberg Owned by:
Priority: normal Milestone:
Component: sge Version: 8.1.8
Severity: minor Keywords:
Cc:

Description

For some reason SGE seems to buffer the output from Python.
see: http://stackoverflow.com/questions/9876967/sge-script-print-to-file-during-execution-not-just-at-the-end
or: http://stackoverflow.com/questions/26667008/sge-how-to-see-the-output-in-real-time

When I run this shell script:

echo "hello"
PYTHONUNBUFFERED=1 python -u hello_world_delay.py

with this python script:

import time
print "Hello world"

time.sleep(10)

print "Hello world"

I see the first hello printed by the shell script but I do not see the Python output at all until Python is done running.

Attachments (3)

untitled-part.pgp (819 bytes) - added by wish 4 years ago.
Added by email2trac
untitled-part-1.pgp (819 bytes) - added by wish 4 years ago.
Added by email2trac
untitled-part-2.pgp (819 bytes) - added by wish 4 years ago.
Added by email2trac

Download all attachments as: .zip

Change History (23)

comment:1 Changed 4 years ago by wish

This is another one that looks more like a support query for users@… than a bug. Given that the shell manages to output without buffering I suspect the issue is more like "The environment grid engine provides causes python to buffer output despite being told not to".

Have you tried submitting the job with qsub -pty y ? That should provide something that looks more like an interactive environment and encourage python to really not buffer.

comment:2 Changed 4 years ago by agrothberg

Adding the -pty y flag actually made things worse; now both echo and python have their outputs buffered.

comment:3 Changed 4 years ago by wish

How about running python under stdbuf (assuming a GNUish environment)?

comment:4 Changed 4 years ago by agrothberg

I have tried both unbuffer and stdbuf and neither seemed to work.

comment:5 Changed 4 years ago by wish

What about explicitly disabling the buffering from within the script per this stackoverflow script:
https://stackoverflow.com/questions/9876967/sge-script-print-to-file-during-execution-not-just-at-the-end

comment:6 Changed 4 years ago by agrothberg

The solution presented in http://stackoverflow.com/a/13126950/2638485 does
work but seems to be symptomatic of something wrong.

On Sat, Jul 18, 2015 at 4:36 PM, SGE <sge-bugs@…> wrote:

#1545: SGE Buffering Python Output


Reporter: agrothberg | Owner:

Type: defect | Status: new

Priority: normal | Milestone:

Component: sge | Version: 8.1.8

Severity: minor | Resolution:
Keywords: |


Comment (by wish):

What about explicitly disabling the buffering from within the script per
this stackoverflow script:
https://stackoverflow.com/questions/9876967/sge-script-print-to-file-
during-execution-not-just-at-the-end

--
Ticket URL: <https://arc.liv.ac.uk/trac/SGE/ticket/1545#comment:5>
SGE <https://arc.liv.ac.uk/trac/SGE>
Son of Grid Engine: Community continuation of work on Grid Engine

comment:7 Changed 4 years ago by wish

It looks like a lttle overkill to me. I suspect wrapping the files in something that does flush and/or fsync after every write would do as well as opening and closing them.

Are you trying to observe the file from the same host on which the job runs or a different one?

comment:8 Changed 4 years ago by agrothberg

I am observing the file on a different host from where the script is run
(on the submit / master node). The output shows up "immediately" when
emitted by echo within bash just not python.

On Sun, Jul 19, 2015, 01:51 SGE <sge-bugs@…> wrote:

#1545: SGE Buffering Python Output


Reporter: agrothberg | Owner:

Type: defect | Status: new

Priority: normal | Milestone:

Component: sge | Version: 8.1.8

Severity: minor | Resolution:
Keywords: |


Comment (by wish):

It looks like a lttle overkill to me. I suspect wrapping the files in
something that does flush and/or fsync after every write would do as well
as opening and closing them.

Are you trying to observe the file from the same host on which the job
runs or a different one?

--
Ticket URL: <https://arc.liv.ac.uk/trac/SGE/ticket/1545#comment:7>
SGE <https://arc.liv.ac.uk/trac/SGE>
Son of Grid Engine: Community continuation of work on Grid Engine

comment:9 Changed 4 years ago by dlove

This is another one that looks more like a support query for
users@… than a bug.

Yes, though it's better to report bugs that might be invalid than not to
report things.

Anyhow, SGE does nothing specific to Python. There's advice in the
troubleshooting howto on comparing environments inside and outside jobs.

If you want interactive output, use qrsh.

comment:10 Changed 4 years ago by wish

On Sun, 19 Jul 2015 05:54:44 +0000
Alex Rothberg <agrothberg@…> wrote:

I am observing the file on a different host from where the script is
run (on the submit / master node). The output shows up "immediately"
when emitted by echo within bash just not python.

I suspect this may be in part due to writing via a shared filesystem.
The host on which the job is run may be caching data locally and not
sending it to the server until it decides it has enough.

Try running the same script by ssh into the node and see what appears
to happen from the master.

Alternatively fsyncing the file after every write should force things.

On Sun, Jul 19, 2015, 01:51 SGE
<sge-bugs@…<mailto:sge-bugs@…>> wrote:
#1545: SGE Buffering Python Output


| Owner: Type: defect | Status: new

Priority: normal | Milestone:

Component: sge | Version: 8.1.8

Severity: minor | Resolution:
Keywords: |


Comment (by wish):

It looks like a lttle overkill to me. I suspect wrapping the files

in something that does flush and/or fsync after every write would do
as well as opening and closing them.

Are you trying to observe the file from the same host on which the

job runs or a different one?

--
Ticket URL: <https://arc.liv.ac.uk/trac/SGE/ticket/1545#comment:7>
SGE <https://arc.liv.ac.uk/trac/SGE>
Son of Grid Engine: Community continuation of work on Grid Engine

untitled-part.pgp

Changed 4 years ago by wish

Added by email2trac

comment:12 Changed 4 years ago by agrothberg

I will try ssh out, but why would shared fs affect echo vs python output
differently?

On Sun, Jul 19, 2015, 07:27 SGE <sge-bugs@…> wrote:

#1545: SGE Buffering Python Output


Reporter: agrothberg | Owner:

Type: defect | Status: new

Priority: normal | Milestone:

Component: sge | Version: 8.1.8

Severity: minor | Resolution:
Keywords: |


Comment (by wish):

On Sun, 19 Jul 2015 05:54:44 +0000
Alex Rothberg <agrothberg@…> wrote:

I am observing the file on a different host from where the script is
run (on the submit / master node). The output shows up "immediately"
when emitted by echo within bash just not python.

I suspect this may be in part due to writing via a shared filesystem.
The host on which the job is run may be caching data locally and not
sending it to the server until it decides it has enough.

Try running the same script by ssh into the node and see what appears
to happen from the master.

Alternatively fsyncing the file after every write should force things.

On Sun, Jul 19, 2015, 01:51 SGE
<sge-bugs@…<mailto:sge-bugs@…>> wrote:
#1545: SGE Buffering Python Output


| Owner: Type: defect | Status: new

Priority: normal | Milestone:

Component: sge | Version: 8.1.8

Severity: minor | Resolution:
Keywords: |


Comment (by wish):

It looks like a lttle overkill to me. I suspect wrapping the files

in something that does flush and/or fsync after every write would do
as well as opening and closing them.

Are you trying to observe the file from the same host on which the

job runs or a different one?

--
Ticket URL: <https://arc.liv.ac.uk/trac/SGE/ticket/1545#comment:7>
SGE <https://arc.liv.ac.uk/trac/SGE>
Son of Grid Engine: Community continuation of work on Grid Engine

untitled-part.pgp

--
Ticket URL: <https://arc.liv.ac.uk/trac/SGE/ticket/1545#comment:10>
SGE <https://arc.liv.ac.uk/trac/SGE>
Son of Grid Engine: Community continuation of work on Grid Engine

comment:13 Changed 4 years ago by wish

On Sun, 19 Jul 2015 14:20:04 +0000
Alex Rothberg <agrothberg@…> wrote:

I will try ssh out, but why would shared fs affect echo vs python
output differently?

Why would anything affect echo vs python output differently? Because
they behave differently. Possibly the echo is calling fsync or closing
its clone of the file handle.

On Sun, Jul 19, 2015, 07:27 SGE
<sge-bugs@…<mailto:sge-bugs@…>> wrote:
#1545: SGE Buffering Python Output


| Owner: Type: defect | Status: new

Priority: normal | Milestone:

Component: sge | Version: 8.1.8

Severity: minor | Resolution:
Keywords: |


Comment (by wish):

On Sun, 19 Jul 2015 05:54:44 +0000
Alex Rothberg <agrothberg@…<mailto:agrothberg@…>>

wrote:

I am observing the file on a different host from where the script
is run (on the submit / master node). The output shows up
"immediately" when emitted by echo within bash just not python.

I suspect this may be in part due to writing via a shared filesystem.
The host on which the job is run may be caching data locally and not
sending it to the server until it decides it has enough.

Try running the same script by ssh into the node and see what appears
to happen from the master.

Alternatively fsyncing the file after every write should force

things.

On Sun, Jul 19, 2015, 01:51 SGE
<sge-bugs@…<mailto:sge-bugs@…><mailto:sge-bugs@…>>>
wrote: #1545: SGE Buffering Python Output


| Owner: Type: defect | Status: new

Priority: normal | Milestone:

Component: sge | Version: 8.1.8

Severity: minor | Resolution:
Keywords: |


Comment (by wish):

It looks like a lttle overkill to me. I suspect wrapping the

files in something that does flush and/or fsync after every write
would do as well as opening and closing them.

Are you trying to observe the file from the same host on which the

job runs or a different one?

--
Ticket URL: <https://arc.liv.ac.uk/trac/SGE/ticket/1545#comment:7>
SGE <https://arc.liv.ac.uk/trac/SGE>
Son of Grid Engine: Community continuation of work on Grid Engine

untitled-part.pgp

--
Ticket URL: <https://arc.liv.ac.uk/trac/SGE/ticket/1545#comment:10>
SGE <https://arc.liv.ac.uk/trac/SGE>
Son of Grid Engine: Community continuation of work on Grid Engine

untitled-part-1.pgp

Changed 4 years ago by wish

Added by email2trac

comment:15 Changed 4 years ago by wish

On Sun, 19 Jul 2015 16:11:45 +0000
William Hay <wish@…> wrote:

On Sun, 19 Jul 2015 14:20:04 +0000
Alex Rothberg <agrothberg@…> wrote:

I will try ssh out, but why would shared fs affect echo vs python
output differently?

Why would anything affect echo vs python output differently? Because
they behave differently. Possibly the echo is calling fsync or
closing its clone of the file handle.

Or to turn it around. Because it doesn't. The output of echo is seen
when echo has finished running. The output of python is seen when
python has finished running. Perhaps both close file handles on
termination which the OS uses as a hint to flush the written data to
backing store.

When you are on the same machine it doesn't make a difference whether
the data if flushed to backing store or not because other programs can
see the kernel's copy. When using NFS or most shared filesystems the
data can only be seen by other machines when it is transfered to the
file servers.

On Sun, Jul 19, 2015, 07:27 SGE
<sge-bugs@…<mailto:sge-bugs@…>> wrote:
#1545: SGE Buffering Python Output


| Owner: Type: defect | Status: new

Priority: normal | Milestone:

Component: sge | Version: 8.1.8

Severity: minor | Resolution:
Keywords: |


Comment (by wish):

On Sun, 19 Jul 2015 05:54:44 +0000
Alex Rothberg <agrothberg@…<mailto:agrothberg@…>>

wrote:

I am observing the file on a different host from where the script
is run (on the submit / master node). The output shows up
"immediately" when emitted by echo within bash just not python.

I suspect this may be in part due to writing via a shared

filesystem. The host on which the job is run may be caching data
locally and not sending it to the server until it decides it has
enough.

Try running the same script by ssh into the node and see what

appears to happen from the master.

Alternatively fsyncing the file after every write should force

things.

On Sun, Jul 19, 2015, 01:51 SGE
<sge-bugs@…<mailto:sge-bugs@…><mailto:sge-bugs@…>>>
wrote: #1545: SGE Buffering Python Output


agrothberg | Owner: Type: defect | Status: new

Priority: normal | Milestone:

Component: sge | Version: 8.1.8

Severity: minor | Resolution:
Keywords: |


Comment (by wish):

It looks like a lttle overkill to me. I suspect wrapping the

files in something that does flush and/or fsync after every write
would do as well as opening and closing them.

Are you trying to observe the file from the same host on which

the job runs or a different one?

--
Ticket URL:
<https://arc.liv.ac.uk/trac/SGE/ticket/1545#comment:7> SGE
<https://arc.liv.ac.uk/trac/SGE> Son of Grid Engine: Community
continuation of work on Grid Engine

untitled-part.pgp

--
Ticket URL: <https://arc.liv.ac.uk/trac/SGE/ticket/1545#comment:10>
SGE <https://arc.liv.ac.uk/trac/SGE>
Son of Grid Engine: Community continuation of work on Grid Engine

untitled-part-2.pgp

Changed 4 years ago by wish

Added by email2trac

comment:17 Changed 4 years ago by dlove

  • Resolution set to invalid

comment:18 Changed 4 years ago by dlove

  • Resolution changed from invalid to fixed
  • Status changed from new to closed

comment:19 Changed 4 years ago by dlove

  • Resolution changed from fixed to invalid

This is _not_ a bug in SGE. I don't think there's anything in the
documentation that even says batch output is available before the job
finishes, let alone how it's buffered or interacts with filesystem
caching. Output files aren't even generally visible in NFS immediately
when a job finishes.

comment:20 Changed 4 years ago by agrothberg

  • Resolution invalid deleted
  • Status changed from closed to reopened

One of the SO posts says:

In general you can decrease or disable the buffering in SGE by changing it

and recompiling. But its not a great thing to do, all that data is going to
be slowly written to disk affecting your overall performance.

On Sun, Jul 19, 2015, 13:43 SGE <sge-bugs@…> wrote:

#1545: SGE Buffering Python Output


Reporter: agrothberg | Owner:

Type: defect | Status: closed

Priority: normal | Milestone:

Component: sge | Version: 8.1.8

Severity: minor | Resolution: invalid
Keywords: |


Changes (by dlove):

  • resolution: fixed => invalid

Comment:

This is _not_ a bug in SGE. I don't think there's anything in the
documentation that even says batch output is available before the job
finishes, let alone how it's buffered or interacts with filesystem
caching. Output files aren't even generally visible in NFS immediately
when a job finishes.

--
Ticket URL: <https://arc.liv.ac.uk/trac/SGE/ticket/1545#comment:19>
SGE <https://arc.liv.ac.uk/trac/SGE>
Son of Grid Engine: Community continuation of work on Grid Engine

comment:21 Changed 4 years ago by agrothberg

It looks like if I run "python -u delay.py > bar.txt" on the master, I see the text stream to the NFS mounts on the compute nodes.
On the other hand, if I run "python -u delay.py > bar.txt" on the compute node I do NOT see the contents stream to the master. Adding stdbuf or unbuffer to the command does not seem to help. I agree this does seem to be some issue with how Python flushes + NFS.

comment:22 Changed 4 years ago by agrothberg

I was able to "fix" this issue and get the output from Python to stream back to the master. What I did was mount nfs on the client using "sync".

See: http://serverfault.com/a/443846/233297
and: http://unix.stackexchange.com/a/23425/103540

comment:23 Changed 4 years ago by wish

  • Resolution set to invalid
  • Status changed from reopened to closed

While this will work for your scenario some codes will get better performance with NFS mounted async. Mounting async and adding calls to fsync after writes you want to show up immediately would be optimal but only works if you can modify the code in question.

Note: See TracTickets for help on using tickets.