[issue14632] Race condition in WatchedFileHandler leads to unhandled exception

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

[issue14632] Race condition in WatchedFileHandler leads to unhandled exception

STINNER Victor

New submission from JohnM <[hidden email]>:

The (very handy) logging.handlers.WatchedFileHandler appears to be  susceptible to a race condition that causes occasional OSErrors during the normal course of operations:


Traceback (most recent call last):
  File "test.py", line 58, in test_race
    log.warning('Foo bar %d', ii)
  File "/usr/lib64/python2.7/logging/__init__.py", line 1144, in warning
    self._log(WARNING, msg, args, **kwargs)
  File "/usr/lib64/python2.7/logging/__init__.py", line 1250, in _log
    self.handle(record)
  File "/usr/lib64/python2.7/logging/__init__.py", line 1260, in handle
    self.callHandlers(record)
  File "/usr/lib64/python2.7/logging/__init__.py", line 1300, in callHandlers
    hdlr.handle(record)
  File "/usr/lib64/python2.7/logging/__init__.py", line 744, in handle
    self.emit(record)
  File "/usr/lib64/python2.7/logging/handlers.py", line 412, in emit
    stat = os.stat(self.baseFilename)
OSError: [Errno 2] No such file or directory: '/tmp/tmpG6_luRTestRaceWatchedFileHandler'

Looking at the implementation of the handler's emit function, I've commented on where I think the race can occur. Logrotate (or some similar application) is deleting/unlinking the logfile between the first os.path.exists and os.stat calls:

    def emit(self, record):
        """
        Emit a record.

        First check if the underlying file has changed, and if it
        has, close the old stream and reopen the file to get the
        current stream.
        """
        if not os.path.exists(self.baseFilename):
        #      ^^^ race occurs between here
            stat = None
            changed = 1
        else:
            stat = os.stat(self.baseFilename)
            #      ^^^ and this stat call
            changed = (stat[ST_DEV] != self.dev) or (stat[ST_INO] != self.ino)
        if changed and self.stream is not None:
            self.stream.flush()
            self.stream.close()
            self.stream = self._open()
            if stat is None:
                stat = os.stat(self.baseFilename)
            self.dev, self.ino = stat[ST_DEV], stat[ST_INO]
        logging.FileHandler.emit(self, record)

I've attached a test script that attempts to recreate the issue. On my Linux system running the script is able to trigger the issue about every 7 of 10 runs.

----------
components: Library (Lib)
files: test.py
messages: 158829
nosy: phlogistonjohn
priority: normal
severity: normal
status: open
title: Race condition in WatchedFileHandler leads to unhandled exception
type: behavior
versions: Python 2.7
Added file: http://bugs.python.org/file25289/test.py

_______________________________________
Python tracker <[hidden email]>
<http://bugs.python.org/issue14632>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/lists%2B1322467933539-512619%40n6.nabble.com

Reply | Threaded
Open this post in threaded view
|

[issue14632] Race condition in WatchedFileHandler leads to unhandled exception

STINNER Victor

JohnM <[hidden email]> added the comment:

I'm attaching a patch of an emit function implementation that reduces the number of stats and uses fstat were we can. Using fstat should also prevent any races between opening the new file and stating the path in the file system.

Using this patch or something along these lines should avoid the time of check to time of use issues with the current version with repeated stat calls.

----------
keywords: +patch
Added file: http://bugs.python.org/file25290/watchedfilehandler.patch

_______________________________________
Python tracker <[hidden email]>
<http://bugs.python.org/issue14632>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/lists%2B1322467933539-512619%40n6.nabble.com

Reply | Threaded
Open this post in threaded view
|

[issue14632] Race condition in WatchedFileHandler leads to unhandled exception

STINNER Victor
In reply to this post by STINNER Victor

Charles-François Natali <[hidden email]> added the comment:

Thanks for the report and patch.

I'm making Vinay nosy, as he's the logging expert.

A couple remarks:
- it would be nice to add your testcase to the logging tests (Lib/test/test_logging.py)
- I think you could probably save the whole fstat() result (instead of st_dev and st_inode): this would allow you to use os.pathsamestat() to check whether we're still refering to the same file, and it could maybe be useful someday (maybe use st_mtime)
- typo: "existance" -> "existence"
- I see you're catching OSError when calling fstat(): however, I don't see what error could be raised apart from EBADF (if the FD is closed, then it's likely a bug that shouldn't be masked, and it will blow up on the next write()), or I/O errors, that shouldn't be silenced

----------
nosy: +neologix, vinay.sajip
versions: +Python 3.2, Python 3.3

_______________________________________
Python tracker <[hidden email]>
<http://bugs.python.org/issue14632>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/lists%2B1322467933539-512619%40n6.nabble.com

Reply | Threaded
Open this post in threaded view
|

[issue14632] Race condition in WatchedFileHandler leads to unhandled exception

STINNER Victor
In reply to this post by STINNER Victor

Changes by Vinay Sajip <[hidden email]>:


----------
assignee:  -> vinay.sajip

_______________________________________
Python tracker <[hidden email]>
<http://bugs.python.org/issue14632>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/lists%2B1322467933539-512619%40n6.nabble.com

Reply | Threaded
Open this post in threaded view
|

[issue14632] Race condition in WatchedFileHandler leads to unhandled exception

STINNER Victor
In reply to this post by STINNER Victor

Vinay Sajip <[hidden email]> added the comment:

There's certainly a problem to be addressed here, but integrating the test into the regular test suite represents a problem in that the test takes around 10 minutes to run, and sometimes completes successfully. I'm not sure a 10-minute extension of the time taken to run regrtest will be acceptable, but if the test is skipped unless explicitly enabled, then I'm not sure it'll add much value.

I will look at the patch and Charles-François' review comments soon.

----------

_______________________________________
Python tracker <[hidden email]>
<http://bugs.python.org/issue14632>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/lists%2B1322467933539-512619%40n6.nabble.com

Reply | Threaded
Open this post in threaded view
|

[issue14632] Race condition in WatchedFileHandler leads to unhandled exception

STINNER Victor
In reply to this post by STINNER Victor

JohnM <[hidden email]> added the comment:

Thank you both for looking at this. I've added an updated version of the patch that incorporates the last two suggestions that Charles-François made.

I agree that this test may not be appropriate for the python test suite, due to the length and non-determinism of the test. I don't know what the suite's policy on monkey patching stdlib in the tests is, but monkey patching either os.path.exists or os.stat to remove the file at the appropriate time could be one way to make the test fast and deterministic. Seems a bit dirty to me though. :-)
 
I am up for adding some kind of test to the suite for the WatchedFileHandler, though.

I'm pretty ambivalent about keeping the whole stat structure around, since the samestat method is just a wrapper around what the emit function is already doing, and we'd be keeping more memory (although a small amount) around. I doubt we'd want to look at the timestamps because they could legitimately change in ways this check doesn't care about.

----------
Added file: http://bugs.python.org/file25305/watchedfilehandler.patch

_______________________________________
Python tracker <[hidden email]>
<http://bugs.python.org/issue14632>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/lists%2B1322467933539-512619%40n6.nabble.com

Reply | Threaded
Open this post in threaded view
|

[issue14632] Race condition in WatchedFileHandler leads to unhandled exception

STINNER Victor
In reply to this post by STINNER Victor

Charles-François Natali <[hidden email]> added the comment:

> There's certainly a problem to be addressed here, but integrating the test into the regular test suite represents a problem in that the test takes around 10 minutes to run, and sometimes completes successfully. I'm not sure a 10-minute extension of the time taken to run regrtest will be acceptable, but if the test is skipped unless explicitly enabled, then I'm not sure it'll add much value.

There's no reason to wait for 10 minutes: one can reduce the number of
iterations greatly, so that it can run in a matter of seconds. Since
the test is run by all the builders multiple times per day, should the
race condition resurface, it will trigger a buildbot failure some day.
No need for it to trigger a failure every time.

----------

_______________________________________
Python tracker <[hidden email]>
<http://bugs.python.org/issue14632>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/lists%2B1322467933539-512619%40n6.nabble.com

Reply | Threaded
Open this post in threaded view
|

[issue14632] Race condition in WatchedFileHandler leads to unhandled exception

STINNER Victor
In reply to this post by STINNER Victor

Vinay Sajip <[hidden email]> added the comment:

John: Thanks for the updated patch.

Charles-François: Certainly I can reduce the iterations to make the test faster. As it is, I did reproduce the failure on my dev system, but only once, after running John's test script about a dozen times: every other time, it completed successfully :-(

Isn't it simpler if I just replace the os.path.exists() calls with os.stat(), and check for ENOENT if an exception of type OSError or WindowsError occurs?

----------

_______________________________________
Python tracker <[hidden email]>
<http://bugs.python.org/issue14632>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/lists%2B1322467933539-512619%40n6.nabble.com

Reply | Threaded
Open this post in threaded view
|

[issue14632] Race condition in WatchedFileHandler leads to unhandled exception

STINNER Victor

Charles-François Natali <[hidden email]> added the comment:

> Charles-François: Certainly I can reduce the iterations to make the test
> faster. As it is, I did reproduce the failure on my dev system, but only
> once, after running John's test script about a dozen times: every other
> time, it completed successfully :-(

Juste reduce the sleep times, e.g.:
"""
@@ -17,7 +17,7 @@
             os.unlink(fname)
         except OSError:
             pass
-        stime = 0.04 * random.randint(0, 4)
+        stime = 0.004 * random.randint(0, 4)
         time.sleep(stime)

@@ -50,7 +50,7 @@
         log.setLevel(logging.INFO)

         for ii in range(LOGCOUNT):
-            stime = 0.05 # * random.randint(0, 4)
+            stime = 0.005 # * random.randint(0, 4)
             time.sleep(stime)
             log.warning('Foo bar %d', ii)
"""

With this change, I can trigger a failure reliably in around 1s, and
my computer is rather slow.

> Isn't it simpler if I just replace the os.path.exists() calls with
> os.stat(), and check for ENOENT if an exception of type OSError or
> WindowsError occurs?

The problem is that it would leave a race window if the file is
changed between the time it's opened (I guess in
logging.FileHandler.__init__()) and the first call to stat().
John's patch is safe in this regard, thanks to fstat().

----------

_______________________________________
Python tracker <[hidden email]>
<http://bugs.python.org/issue14632>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/lists%2B1322467933539-512619%40n6.nabble.com

Reply | Threaded
Open this post in threaded view
|

[issue14632] Race condition in WatchedFileHandler leads to unhandled exception

STINNER Victor

Vinay Sajip <[hidden email]> added the comment:

[snip]
>With this change, I can trigger a failure reliably in around 1s, and
>my computer is rather slow.

I'm working in a VM, and although I can get John's script to fail more regularly (with the reduced timeouts and counts of 1000), a version of the test which I added to test_logging always succeeds. That code is:

    @unittest.skipUnless(threading, 'Threading required for this test.')
    def test_race(self):
        # Issue #14632 refers.
        def remove_loop(fname, tries):
            for _ in range(tries):
                try:
                    os.unlink(fname)
                except OSError:
                    pass
                time.sleep(0.004 * random.randint(0, 4))

        def cleanup(remover, fn, handler):
            handler.close()
            remover.join()
            if os.path.exists(fn):
                os.unlink(fn)

        fd, fn = tempfile.mkstemp('.log', 'test_logging-3-')
        os.close(fd)
        del_count = 1000
        log_count = 1000
        remover = threading.Thread(target=remove_loop, args=(fn, del_count))
        remover.daemon = True
        remover.start()
        h = logging.handlers.WatchedFileHandler(fn)
        self.addCleanup(cleanup, remover, fn, h)
        f = logging.Formatter('%(asctime)s: %(levelname)s: %(message)s')
        h.setFormatter(f)
        for _ in range(log_count):
            time.sleep(0.005)
            r = logging.makeLogRecord({'msg': 'testing' })
            h.handle(r)

I can't see why this always works, while John's script sometimes fails.
>The problem is that it would leave a race window if the file is
>changed between the time it's opened (I guess in
>logging.FileHandler.__init__()) and the first call to stat().
>John's patch is safe in this regard, thanks to fstat().

Oh, right - missed that.

----------

_______________________________________
Python tracker <[hidden email]>
<http://bugs.python.org/issue14632>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/lists%2B1322467933539-512619%40n6.nabble.com

Reply | Threaded
Open this post in threaded view
|

[issue14632] Race condition in WatchedFileHandler leads to unhandled exception

STINNER Victor

Charles-François Natali <[hidden email]> added the comment:

> I can't see why this always works, while John's script sometimes fails.

It does fail consistently on my machine.
I'm attaching the diff just in case.

----------
Added file: http://bugs.python.org/file25345/test_logging_race.diff

_______________________________________
Python tracker <[hidden email]>
<http://bugs.python.org/issue14632>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/lists%2B1322467933539-512619%40n6.nabble.com


test_logging_race.diff (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

[issue14632] Race condition in WatchedFileHandler leads to unhandled exception

STINNER Victor
In reply to this post by STINNER Victor

Roundup Robot <[hidden email]> added the comment:

New changeset 15a33d7d2b50 by Vinay Sajip in branch '2.7':
Issue #14632: Updated WatchedFileHandler to deal with race condition. Thanks to John Mulligan for the problem report and patch.
http://hg.python.org/cpython/rev/15a33d7d2b50

New changeset 5de7c3d64f2a by Vinay Sajip in branch '3.2':
Issue #14632: Updated WatchedFileHandler to deal with race condition. Thanks to John Mulligan for the problem report and patch.
http://hg.python.org/cpython/rev/5de7c3d64f2a

New changeset 380821b47872 by Vinay Sajip in branch 'default':
Issue #14632: Updated WatchedFileHandler to deal with race condition. Thanks to John Mulligan for the problem report and patch.
http://hg.python.org/cpython/rev/380821b47872

----------
nosy: +python-dev

_______________________________________
Python tracker <[hidden email]>
<http://bugs.python.org/issue14632>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/lists%2B1322467933539-512619%40n6.nabble.com

Reply | Threaded
Open this post in threaded view
|

[issue14632] Race condition in WatchedFileHandler leads to unhandled exception

STINNER Victor
In reply to this post by STINNER Victor

Vinay Sajip <[hidden email]> added the comment:

I will leave this open for a few days and see if the buildbots throw up anything.

----------
resolution:  -> fixed

_______________________________________
Python tracker <[hidden email]>
<http://bugs.python.org/issue14632>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/lists%2B1322467933539-512619%40n6.nabble.com

Reply | Threaded
Open this post in threaded view
|

[issue14632] Race condition in WatchedFileHandler leads to unhandled exception

STINNER Victor
In reply to this post by STINNER Victor

Vinay Sajip <[hidden email]> added the comment:

Buildbots seem not to be complaining, so closing.

----------
status: open -> closed

_______________________________________
Python tracker <[hidden email]>
<http://bugs.python.org/issue14632>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/lists%2B1322467933539-512619%40n6.nabble.com

Reply | Threaded
Open this post in threaded view
|

[issue14632] Race condition in WatchedFileHandler leads to unhandled exception

STINNER Victor
In reply to this post by STINNER Victor

Brian Curtin <[hidden email]> added the comment:

The test for this issue seems to fail about half of the time on Windows.


======================================================================
ERROR: test_race (test.test_logging.HandlerTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\python-dev\cpython-main\lib\test\test_logging.py", line 605, in test_race
    h = logging.handlers.WatchedFileHandler(fn, delay=delay)
  File "c:\python-dev\cpython-main\lib\logging\handlers.py", line 421, in __init__
    logging.FileHandler.__init__(self, filename, mode, encoding, delay)
  File "c:\python-dev\cpython-main\lib\logging\__init__.py", line 965, in __init__
    StreamHandler.__init__(self, self._open())
  File "c:\python-dev\cpython-main\lib\logging\__init__.py", line 987, in _open
    return open(self.baseFilename, self.mode, encoding=self.encoding)
PermissionError: [Errno 13] Permission denied: 'c:\\users\\brian\\appdata\\local\\temp\\test_logging-3-lxjo5t.log'


I also get this failure when running in the VS2010 branch:

======================================================================
ERROR: test_race (test.test_logging.HandlerTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\python-dev\porting\cpython\lib\test\test_logging.py", line 600, in cleanup
    os.unlink(fn)
PermissionError: [Error 32] The process cannot access the file because it is being used by
 another process: 'c:\\users\\brian\\appdata\\local\\temp\\test_logging3-6ujeil.log'

----------
nosy: +brian.curtin

_______________________________________
Python tracker <[hidden email]>
<http://bugs.python.org/issue14632>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/lists%2B1322467933539-512619%40n6.nabble.com