Fix kqueue false connect success from stale EVFILT_WRITE #202

Open
mvandeberg wants to merge 1 commit into cppalliance:develop from mvandeberg:bug/kqueue-false-connect

@mvandeberg
Contributor

@mvandeberg mvandeberg commented Mar 11, 2026

The kqueue backend registers sockets for EVFILT_WRITE at open() time. A freshly created socket is writable, so kqueue fires a stale event before connect() completes. If the reactor processes this before the kernel delivers the connect result (e.g. RST for ECONNREFUSED), getsockopt(SO_ERROR) returns 0 and the connect falsely reports success.

Fix by adding a getpeername() check in connect perform_io() to verify the connection is actually established when SO_ERROR is 0, returning EAGAIN to re-park the op if not. Add EAGAIN handling for connect ops in descriptor_state::operator()() to match the existing read/write pattern.
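
The check described above can be sketched as a standalone helper. This is only an illustration under stated assumptions, not the code in kqueue_op.hpp: the name `check_connect_result` is hypothetical, and it assumes a plain POSIX socket descriptor. It maps only ENOTCONN to EAGAIN, in line with the review feedback further down the thread.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not the library's perform_io()).
// Returns 0 when the connect has completed, EAGAIN when the write event
// arrived before the connection was established, or the failure code.
inline int check_connect_result(int fd)
{
    assert(fd >= 0);

    int err = 0;
    socklen_t len = sizeof(err);
    if (::getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return errno;
    if (err != 0)
        return err; // the kernel delivered a real result, e.g. ECONNREFUSED

    // SO_ERROR == 0 is ambiguous: either the socket is connected, or the
    // connect result has not been delivered yet. Ask for the peer address.
    sockaddr_storage peer{};
    socklen_t peer_len = sizeof(peer);
    if (::getpeername(fd, reinterpret_cast<sockaddr*>(&peer), &peer_len) < 0)
        return errno == ENOTCONN ? EAGAIN : errno;

    return 0; // a peer exists, so the connection is established
}
```

A caller that receives EAGAIN would leave the connect op parked and wait for the next write event rather than completing it.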

Summary by CodeRabbit

  • Bug Fixes
    • Added defensive verification to prevent false-positive socket-write events during initial connection setup.
    • Improved handling and retry behavior for non-blocking connect attempts by normalizing transient EAGAIN/EWOULDBLOCK conditions and validating peer state before reporting connection completion.

@coderabbitai

coderabbitai bot commented Mar 11, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d9871cf2-236c-4722-980d-7c04d2a0d94e

📥 Commits

Reviewing files that changed from the base of the PR and between 8519ce2 and 0393204.

📒 Files selected for processing (2)
  • include/boost/corosio/native/detail/kqueue/kqueue_op.hpp
  • include/boost/corosio/native/detail/kqueue/kqueue_scheduler.hpp
🚧 Files skipped from review as they are similar to previous changes (1)
  • include/boost/corosio/native/detail/kqueue/kqueue_scheduler.hpp

📝 Walkthrough

Adds a defensive verification for spurious EVFILT_WRITE on non-blocking connect: kqueue_connect_op::perform_io uses getpeername when SO_ERROR is zero to detect non-connected sockets. The kqueue scheduler normalizes EAGAIN/EWOULDBLOCK for connect ops, includes pending connects in re-registration, and adjusts enqueue/retry flows.

Changes

Cohort / File(s) | Summary
kqueue connect operation: include/boost/corosio/native/detail/kqueue/kqueue_op.hpp | Add validation in kqueue_connect_op::perform_io: when SO_ERROR == 0, call getpeername(); if it fails, map to ENOTCONN/EAGAIN (or errno) to avoid treating premature EVFILT_WRITE as a completed connect.
kqueue scheduler connect handling: include/boost/corosio/native/detail/kqueue/kqueue_scheduler.hpp | Normalize EAGAIN/EWOULDBLOCK for connect (cn) results (treat as not-ready), include cn in the re-registration loop (`while (rd

Sequence Diagram(s)

sequenceDiagram
    participant Socket as Socket (fd)
    participant Kernel as Kernel/kqueue
    participant Scheduler as kqueue_scheduler
    participant ConnectOp as kqueue_connect_op

    Socket->>Kernel: non-blocking connect()
    Kernel-->>Scheduler: EVFILT_WRITE (write-ready)
    Scheduler->>ConnectOp: invoke perform_io()
    ConnectOp->>Socket: getsockopt(SO_ERROR)
    alt SO_ERROR != 0
        ConnectOp->>Scheduler: report error (enqueue)
    else SO_ERROR == 0
        ConnectOp->>Socket: getpeername()
        alt getpeername succeeds
            ConnectOp->>Scheduler: report success (enqueue completion)
        else getpeername fails
            ConnectOp->>Scheduler: set EAGAIN / ENOTCONN, defer completion
        end
    end
    Note over Scheduler,ConnectOp: Scheduler may re-register fd including pending cn and retry as needed

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Suggested reviewers

  • sgerbino

Poem

🐰 I poked the socket, peered inside,

if SO_ERROR's clear but the peer won't bide,
I hop back to EAGAIN and wait my cue,
the scheduler nudges, tries again anew,
at last the handshake lands — a soft, sure stride.

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title clearly and concisely describes the main change: fixing a kqueue bug where stale EVFILT_WRITE events cause false connect success reports.


@cppalliance-bot

cppalliance-bot commented Mar 11, 2026

An automated preview of the documentation is available at https://202.corosio.prtest3.cppalliance.org/index.html

If more commits are pushed to the pull request, the docs will rebuild at the same URL.

2026-03-12 18:57:08 UTC

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
include/boost/corosio/native/detail/kqueue/kqueue_scheduler.hpp (1)

641-684: ⚠️ Potential issue | 🟠 Major

Prefer connect_op over write_op when consuming cached write_ready.

At Lines 659-683, write_ready represents a single cached EVFILT_WRITE edge, but the new path gives that edge to wr before cn. If both ops are live, a concurrent descriptor_state::operator()() can cache the one edge that actually indicates connect completion while neither op is registered; consuming it for wr first can leave cn parked until another write edge happens, which may never occur with EV_CLEAR.

Suggested fix
-            if (wr)
-            {
-                if (write_ready)
-                {
-                    write_ready = false;
-                    retry       = true;
-                }
-                else
-                {
-                    write_op = wr;
-                    wr       = nullptr;
-                }
-            }
             if (cn)
             {
                 if (write_ready)
                 {
                     write_ready = false;
                     retry       = true;
                 }
                 else
                 {
                     connect_op = cn;
                     cn         = nullptr;
                 }
             }
+            if (wr)
+            {
+                if (write_ready)
+                {
+                    write_ready = false;
+                    retry       = true;
+                }
+                else
+                {
+                    write_op = wr;
+                    wr       = nullptr;
+                }
+            }
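
The reordering suggested above can be sketched in isolation. This is a simplified model, not the scheduler's actual loop: `op`, `descriptor_slots`, and `park_or_retry` are hypothetical names, and locking and read handling are omitted. It only shows that testing cn first gives the connect op the first claim on the single cached write edge.

```cpp
#include <cassert>

struct op {};

// Hypothetical stand-in for the per-descriptor state the loop touches.
struct descriptor_slots
{
    op*  connect_op  = nullptr; // parked connect, if any
    op*  write_op    = nullptr; // parked write, if any
    bool write_ready = false;   // one cached EVFILT_WRITE edge
};

// Returns true when the cached edge was consumed and the caller should
// retry the op that is still held in cn or wr.
inline bool park_or_retry(descriptor_slots& d, op*& cn, op*& wr)
{
    bool retry = false;

    if (cn) // connect is checked first, so it wins the cached edge
    {
        if (d.write_ready)
        {
            d.write_ready = false;
            retry         = true;
        }
        else
        {
            d.connect_op = cn;
            cn           = nullptr;
        }
    }
    if (wr)
    {
        if (d.write_ready)
        {
            d.write_ready = false;
            retry         = true;
        }
        else
        {
            d.write_op = wr;
            wr         = nullptr;
        }
    }
    return retry;
}
```

With both ops live and one cached edge, the connect op stays in `cn` for an immediate retry while the write op is parked; with the original order the roles would be swapped and the connect could wait for an edge that never arrives.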
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@include/boost/corosio/native/detail/kqueue/kqueue_scheduler.hpp` around lines
641 - 684, The loop that consumes cached edges should prefer completing a
pending connect over a write when a single write_ready edge is available; update
the handling in the block that examines wr and cn so that if write_ready is true
and both cn and wr are non-null, you assign the cached edge to connect_op (set
connect_op = cn; cn = nullptr), clear write_ready and set retry=true, instead of
giving it to write_op. Locate the logic using variables read_op, write_op,
connect_op, read_ready, write_ready, rd, wr, cn inside the while loop in
kqueue_scheduler.hpp (the same area that interacts with
descriptor_state::operator()()) and adjust the conditional order to prefer
connect_op consumption when both ops are live.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@include/boost/corosio/native/detail/kqueue/kqueue_op.hpp`:
- Around line 292-298: The code in kqueue_op.hpp replaces any getpeername()
error with EAGAIN, which masks permanent faults; change the logic in the block
that calls ::getpeername(fd, ...) inside the connect handling so that you only
map getpeername() failures to EAGAIN when errno == ENOTCONN (i.e., "not
connected yet"); for all other errno results (EBADF, ENOTSOCK, etc.)
preserve/propagate the original error value in err so the connect coroutine can
fail immediately. Use the existing variables fd and err and the getpeername()
call site to implement this conditional mapping.
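
The conditional mapping requested here could look roughly like the following. It is a sketch under assumptions: `peer_check_error` is a hypothetical free function standing in for the relevant lines of the connect op, and it returns the value that would be stored in `err`.

```cpp
#include <cerrno>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: 0 if a peer exists, EAGAIN if the socket is simply
// not connected yet, otherwise the original errno so the caller fails fast.
inline int peer_check_error(int fd)
{
    sockaddr_storage peer{};
    socklen_t len = sizeof(peer);
    if (::getpeername(fd, reinterpret_cast<sockaddr*>(&peer), &len) == 0)
        return 0;

    int const e = errno;  // capture before any other call can overwrite it
    if (e == ENOTCONN)
        return EAGAIN;    // transient: re-park the connect op
    return e;             // permanent (EBADF, ENOTSOCK, ...): propagate
}
```

An invalid descriptor therefore surfaces as EBADF instead of being retried indefinitely as EAGAIN.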

---

Outside diff comments:
In `@include/boost/corosio/native/detail/kqueue/kqueue_scheduler.hpp`:
- Around line 641-684: The loop that consumes cached edges should prefer
completing a pending connect over a write when a single write_ready edge is
available; update the handling in the block that examines wr and cn so that if
write_ready is true and both cn and wr are non-null, you assign the cached edge
to connect_op (set connect_op = cn; cn = nullptr), clear write_ready and set
retry=true, instead of giving it to write_op. Locate the logic using variables
read_op, write_op, connect_op, read_ready, write_ready, rd, wr, cn inside the
while loop in kqueue_scheduler.hpp (the same area that interacts with
descriptor_state::operator()()) and adjust the conditional order to prefer
connect_op consumption when both ops are live.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 081b6594-cbe4-4f24-82cf-6443f1a2a41a

📥 Commits

Reviewing files that changed from the base of the PR and between 6d10efa and a675fdd.

📒 Files selected for processing (2)
  • include/boost/corosio/native/detail/kqueue/kqueue_op.hpp
  • include/boost/corosio/native/detail/kqueue/kqueue_scheduler.hpp

@cppalliance-bot

cppalliance-bot commented Mar 11, 2026

GCOVR code coverage report https://202.corosio.prtest3.cppalliance.org/gcovr/index.html
LCOV code coverage report https://202.corosio.prtest3.cppalliance.org/genhtml/index.html
Coverage Diff Report https://202.corosio.prtest3.cppalliance.org/diff-report/index.html

Build time: 2026-03-12 19:04:10 UTC

@codecov

codecov bot commented Mar 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.24%. Comparing base (9f057bd) to head (0393204).
⚠️ Report is 2 commits behind head on develop.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop     #202      +/-   ##
===========================================
- Coverage    76.25%   76.24%   -0.01%     
===========================================
  Files           99       99              
  Lines        10570    10567       -3     
  Branches      2407     2406       -1     
===========================================
- Hits          8060     8057       -3     
  Misses        1797     1797              
  Partials       713      713              
Files with missing lines Coverage Δ
...e/boost/corosio/native/detail/kqueue/kqueue_op.hpp 76.51% <ø> (ø)
.../corosio/native/detail/kqueue/kqueue_scheduler.hpp 61.43% <ø> (-0.21%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9f057bd...0393204.

@mvandeberg mvandeberg force-pushed the bug/kqueue-false-connect branch from a675fdd to 8519ce2 Compare March 11, 2026 22:40

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@include/boost/corosio/native/detail/kqueue/kqueue_scheduler.hpp`:
- Around line 672-684: The bug is that write_ready is shared between connect_op
and write_op (cn/wr), so a cached write edge can be consumed by wr and leave
connect_op stuck; fix by introducing a separate flag (e.g., connect_ready or
connect_write_ready) or by making connect_op take priority: update the branches
that handle cn and wr (the blocks manipulating connect_op, write_op, cn, wr,
write_ready, retry) to check/clear the new connect_ready when servicing cn (or
to test cn first and consume the shared bit for connect_op before write_op), and
apply the same change to the analogous code around the other block handling
descriptor_state (the 712-722 region) so connect readiness is tracked/cleared
independently of write readiness.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 440a1465-eb20-4c31-9e98-bccea02c4927

📥 Commits

Reviewing files that changed from the base of the PR and between a675fdd and 8519ce2.

📒 Files selected for processing (2)
  • include/boost/corosio/native/detail/kqueue/kqueue_op.hpp
  • include/boost/corosio/native/detail/kqueue/kqueue_scheduler.hpp

The kqueue backend registers sockets for EVFILT_WRITE at open() time.
A freshly created socket is writable, so kqueue fires a stale event
before connect() completes. If the reactor processes this before the
kernel delivers the connect result (e.g. RST for ECONNREFUSED),
getsockopt(SO_ERROR) returns 0 and the connect falsely reports success.

Fix by adding a getpeername() check in connect perform_io() to verify
the connection is actually established when SO_ERROR is 0, returning
EAGAIN to re-park the op if not. Add EAGAIN handling for connect ops
in descriptor_state::operator()() to match the existing read/write
pattern.
@mvandeberg mvandeberg force-pushed the bug/kqueue-false-connect branch from 8519ce2 to 0393204 Compare March 12, 2026 18:50
2 participants