Fix kqueue false connect success from stale EVFILT_WRITE#202
Fix kqueue false connect success from stale EVFILT_WRITE#202mvandeberg wants to merge 1 commit intocppalliance:developfrom
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds a defensive verification for spurious EVFILT_WRITE on non-blocking connect: Changes
Sequence Diagram(s)sequenceDiagram
participant Socket as Socket (fd)
participant Kernel as Kernel/kqueue
participant Scheduler as kqueue_scheduler
participant ConnectOp as kqueue_connect_op
Socket->>Kernel: non-blocking connect()
Kernel-->>Scheduler: EVFILT_WRITE (write-ready)
Scheduler->>ConnectOp: invoke perform_io()
ConnectOp->>Socket: getsockopt(SO_ERROR)
alt SO_ERROR != 0
ConnectOp->>Scheduler: report error (enqueue)
else SO_ERROR == 0
ConnectOp->>Socket: getpeername()
alt getpeername succeeds
ConnectOp->>Scheduler: report success (enqueue completion)
else getpeername fails
ConnectOp->>Scheduler: set EAGAIN / ENOTCONN, defer completion
end
end
Note over Scheduler,ConnectOp: Scheduler may re-register fd including pending cn and retry as needed
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
An automated preview of the documentation is available at https://202.corosio.prtest3.cppalliance.org/index.html If more commits are pushed to the pull request, the docs will rebuild at the same URL. 2026-03-12 18:57:08 UTC |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
include/boost/corosio/native/detail/kqueue/kqueue_scheduler.hpp (1)
641-684:⚠️ Potential issue | 🟠 MajorPrefer
connect_opoverwrite_opwhen consuming cachedwrite_ready.At Lines 659-683,
write_readyrepresents a single cachedEVFILT_WRITEedge, but the new path gives that edge towrbeforecn. If both ops are live, a concurrentdescriptor_state::operator()()can cache the one edge that actually indicates connect completion while neither op is registered; consuming it forwrfirst can leavecnparked until another write edge happens, which may never occur withEV_CLEAR.Suggested fix
- if (wr) - { - if (write_ready) - { - write_ready = false; - retry = true; - } - else - { - write_op = wr; - wr = nullptr; - } - } if (cn) { if (write_ready) { write_ready = false; retry = true; } else { connect_op = cn; cn = nullptr; } } + if (wr) + { + if (write_ready) + { + write_ready = false; + retry = true; + } + else + { + write_op = wr; + wr = nullptr; + } + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@include/boost/corosio/native/detail/kqueue/kqueue_scheduler.hpp` around lines 641 - 684, The loop that consumes cached edges should prefer completing a pending connect over a write when a single write_ready edge is available; update the handling in the block that examines wr and cn so that if write_ready is true and both cn and wr are non-null, you assign the cached edge to connect_op (set connect_op = cn; cn = nullptr), clear write_ready and set retry=true, instead of giving it to write_op. Locate the logic using variables read_op, write_op, connect_op, read_ready, write_ready, rd, wr, cn inside the while loop in kqueue_scheduler.hpp (the same area that interacts with descriptor_state::operator()()) and adjust the conditional order to prefer connect_op consumption when both ops are live.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@include/boost/corosio/native/detail/kqueue/kqueue_op.hpp`:
- Around line 292-298: The code in kqueue_op.hpp replaces any getpeername()
error with EAGAIN, which masks permanent faults; change the logic in the block
that calls ::getpeername(fd, ...) inside the connect handling so that you only
map getpeername() failures to EAGAIN when errno == ENOTCONN (i.e., "not
connected yet"); for all other errno results (EBADF, ENOTSOCK, etc.)
preserve/propagate the original error value in err so the connect coroutine can
fail immediately. Use the existing variables fd and err and the getpeername()
call site to implement this conditional mapping.
---
Outside diff comments:
In `@include/boost/corosio/native/detail/kqueue/kqueue_scheduler.hpp`:
- Around line 641-684: The loop that consumes cached edges should prefer
completing a pending connect over a write when a single write_ready edge is
available; update the handling in the block that examines wr and cn so that if
write_ready is true and both cn and wr are non-null, you assign the cached edge
to connect_op (set connect_op = cn; cn = nullptr), clear write_ready and set
retry=true, instead of giving it to write_op. Locate the logic using variables
read_op, write_op, connect_op, read_ready, write_ready, rd, wr, cn inside the
while loop in kqueue_scheduler.hpp (the same area that interacts with
descriptor_state::operator()()) and adjust the conditional order to prefer
connect_op consumption when both ops are live.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 081b6594-cbe4-4f24-82cf-6443f1a2a41a
📒 Files selected for processing (2)
include/boost/corosio/native/detail/kqueue/kqueue_op.hppinclude/boost/corosio/native/detail/kqueue/kqueue_scheduler.hpp
|
GCOVR code coverage report https://202.corosio.prtest3.cppalliance.org/gcovr/index.html Build time: 2026-03-12 19:04:10 UTC |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #202 +/- ##
===========================================
- Coverage 76.25% 76.24% -0.01%
===========================================
Files 99 99
Lines 10570 10567 -3
Branches 2407 2406 -1
===========================================
- Hits 8060 8057 -3
Misses 1797 1797
Partials 713 713
Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
a675fdd to
8519ce2
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@include/boost/corosio/native/detail/kqueue/kqueue_scheduler.hpp`:
- Around line 672-684: The bug is that write_ready is shared between connect_op
and write_op (cn/wr), so a cached write edge can be consumed by wr and leave
connect_op stuck; fix by introducing a separate flag (e.g., connect_ready or
connect_write_ready) or by making connect_op take priority: update the branches
that handle cn and wr (the blocks manipulating connect_op, write_op, cn, wr,
write_ready, retry) to check/clear the new connect_ready when servicing cn (or
to test cn first and consume the shared bit for connect_op before write_op), and
apply the same change to the analogous code around the other block handling
descriptor_state (the 712-722 region) so connect readiness is tracked/cleared
independently of write readiness.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 440a1465-eb20-4c31-9e98-bccea02c4927
📒 Files selected for processing (2)
include/boost/corosio/native/detail/kqueue/kqueue_op.hppinclude/boost/corosio/native/detail/kqueue/kqueue_scheduler.hpp
The kqueue backend registers sockets for EVFILT_WRITE at open() time. A freshly created socket is writable, so kqueue fires a stale event before connect() completes. If the reactor processes this before the kernel delivers the connect result (e.g. RST for ECONNREFUSED), getsockopt(SO_ERROR) returns 0 and the connect falsely reports success. Fix by adding a getpeername() check in connect perform_io() to verify the connection is actually established when SO_ERROR is 0, returning EAGAIN to re-park the op if not. Add EAGAIN handling for connect ops in descriptor_state::operator()() to match the existing read/write pattern.
8519ce2 to
0393204
Compare
The kqueue backend registers sockets for EVFILT_WRITE at open() time. A freshly created socket is writable, so kqueue fires a stale event before connect() completes. If the reactor processes this before the kernel delivers the connect result (e.g. RST for ECONNREFUSED), getsockopt(SO_ERROR) returns 0 and the connect falsely reports success.
Fix by adding a getpeername() check in connect perform_io() to verify the connection is actually established when SO_ERROR is 0, returning EAGAIN to re-park the op if not. Add EAGAIN handling for connect ops in descriptor_state::operator()() to match the existing read/write pattern.
Summary by CodeRabbit