sunos: fs-event callback can be called after uv_close() #3542


Merged: 2 commits merged into libuv:v1.x on Apr 11, 2022

Conversation

@citrus-it (Contributor) commented Mar 10, 2022

On illumos and Solaris, fs events are implemented with
PORT_SOURCE_FILE event ports. These are one-shot, so they
need re-arming each time they fire. Once they are armed
and an event occurs, the kernel removes them from the port's
cache list and puts them on an event queue to be read by
the application.

There's a window while closing one of these ports during which
the event could have fired and be pending delivery. In that
case, the attempt to disarm (dissociate) the event fails with
ENOENT, but libuv still goes ahead and closes down the handle.
In particular, the close callback (the uv_close() argument) is
called, but the event is subsequently delivered anyway if the
loop is still active; this should not happen.

This was originally reported to me as a problem with using
nodejs with the nodemon package. When monitoring a tree of
files and making a lot of quick changes (for example switching
a project's branch), node crashes:

stack pointer for thread 1: fffffaffffdea6f0
[ fffffaffffdea6f0 uv__fs_event_read+0x9f() ]
  fffffaffffdf07a0 uv__io_poll+0x33d()
  fffffaffffdf0800 uv_run+0x144()
  fffffaffffdf08a0 node::SpinEventLoop+0x156()

status: process terminated by SIGSEGV (Segmentation Fault), addr=1d59faf

> 1d59faf/i
uv__fs_event_read+0x9f:         call   *0x68(%r12)

> <r12=K
                6d62de8

> 6d62de8::whatis
6d62de8 is 6d62d80+68, freed from umem_alloc_320:
            ADDR          BUFADDR        TIMESTAMP           THREAD
                            CACHE          LASTLOG         CONTENTS
         6d638c0          6d62d80   584339eb1f8c18                1
                          646a028          60a1780          63a9270
                 libumem.so.1`umem_malloc_free+0x1a
                 node::HandleWrap::OnClose+0x134
                 uv_run+0x229
                 node::SpinEventLoop+0x156

Using dtrace, I was able to see what was happening here:

First, the fs event is started and associated.

	uv_fs_event_start -1159869698 (6d62de8)
	  uv__fs_event_rearm 0 (6d62de8)
	    port_associate(23, 7, 7641ed8, 6)
	      - returned 0 (errno 0)

Then, when the file is removed from the filesystem, the kernel moves
the event to the pending list

	port_pcache_delete(fffffdff9f935580, 7641ed8)
              portfs`port_pcache_remove_fop+0x50
              portfs`port_fop_excep+0x40
              portfs`port_fop_sendevent+0xea
                ...
              genunix`unlink+0x18
              unix`sys_syscall+0x1a8

Here is uv_close() attempting to dissociate the event, which returns
ENOENT:

	uv_close 153 (6d62de8)
	  uv__fs_event_close 153 (6d62de8)
	    uv_fs_event_stop 153 (6d62de8)
	      port_dissociate(23, 7, 7641ed8)
	        - returned -1 (errno 2 - ENOENT)

and, finally, the handle is freed and then the event is returned,
now pointing to the freed memory (ev 10 == file deleted). Note that
the handle is in the closing list on the call to uv__fs_event_read():

	uv__fs_event_read fd 17 (with closing = 6d62de8)
	umem_malloc_free (6d62de8)

	port_getn returns 6d62de8 (ev 10)
	SEGV

All tests pass successfully with this change in place, on both illumos and Solaris.

@citrus-it (Contributor, Author)

@jclulow - would appreciate your review here.

@jclulow (Contributor) left a review:

LGTM!

@citrus-it (Contributor, Author)

I've added a test for this issue, which fails without the patch:

not ok 50 - fs_event_close_with_pending_delete_event
# exit code 134
# Output from process `fs_event_close_with_pending_delete_event`:
# Assertion failed in test/test-fs-event.c on line 74: 0 && "should never be called"

and succeeds with the patch in place:

ok 49 - fs_event_close_in_callback
ok 50 - fs_event_close_with_pending_delete_event
ok 51 - fs_event_close_with_pending_event
ok 52 - fs_event_error_reporting

@citrus-it citrus-it changed the title sunos: use-after-free race in fsevents sunos: fs-event callback can be called after uv_close() Mar 12, 2022
@citrus-it citrus-it force-pushed the portfsrace branch 2 times, most recently from 5bc1ffc to c7d5b3e Compare March 12, 2022 11:45
@vtjnash vtjnash merged commit 612c28b into libuv:v1.x Apr 11, 2022
JeffroMF pushed a commit to JeffroMF/libuv that referenced this pull request May 16, 2022
The review comments below refer to this code from the change:

	uv__handle_stop(handle);
	uv__make_close_pending((uv_handle_t*) handle);
	break;
	}
Member:

In reviewing #3620, I noticed a possible defect introduced here and wanted to check my understanding. Doesn't this now need to check whether the user attempted to stop the handle here?

Suggested change:

	}
	} else if (handle->fd == PORT_DELETED) {
	  uv__handle_stop(handle);
	  break;
	}

Member:

Secondly, there is a uv__is_active check in uv_fs_event_start, and I think it is now incorrect (or at least insufficient): it also needs to permit the case where uv_fs_event_stop hasn't finished yet, while still maintaining this fix (so that a sequence of stop/event/start/close still remembers to defer the close).

I am mildly tempted to suggest we should instead soft-deprecate uv_fs_event_stop (in favor of uv_close)

@citrus-it (Contributor, Author) commented Jun 2, 2022

It's been a while since I looked at this but, yes, I think you're right about the uv_fs_event_stop() case where the event has already been dispatched and is awaiting delivery. It will need a uv__make_close_pending() too though, won't it? It could then just be added to the condition above:

    if (uv__is_closing(handle) || handle->fd == PORT_DELETED) {
...

Member:

uv__make_close_pending shouldn't be triggered, since the handle is only paused, not closed
