Fix RTC reconnection and WebSocket connection notification issue #885

Open
CMzhizhe wants to merge 1 commit into livekit:main from CMzhizhe:fixbug_reconnect

Conversation

@CMzhizhe
Contributor

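I found two bugs, both related to low-level reconnection.

1. When my phone loses its network connection, LiveKit produces the following log. This bug does not reproduce every time; it occurs intermittently.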

2026-03-14 18:14:57.361 LkLog                com...rexm.improject  D  WebSocket: websocket onFailure() t.error = Read error: ssl=0x749f6dfbc8: I/O error during system call, Software caused connection abort
2026-03-14 18:14:57.367 LkLog                com...rexm.improject  D  WebSocket:failed to validate connection
2026-03-14 18:14:57.379 LkLog                com...rexm.improject  D  WebSocket:websocket failure: null
2026-03-14 18:14:57.385 LkLog                com...rexm.improject  D  WebSocket:websocket closed
2026-03-14 18:14:57.391 LkLog                com...rexm.improject  D  RTC connect:connect retries = 0
2026-03-14 18:14:57.391 LkLog                com...rexm.improject  D  RTC connect:Reconnecting to signal, attempt 1
2026-03-14 18:14:57.492 LkLog                com...rexm.improject  D  RTC connect:connectionState.flow oldVal.connectionState = CONNECTED newVal.connectionState = RESUMING
2026-03-14 18:14:57.493 LkLog                com...rexm.improject  D  RTC connect:Attempting soft reconnect.
2026-03-14 18:14:57.493 LkLog                com...rexm.improject  D  WebSocket:Closing SignalClient: code = 1000, reason = Starting new connection
2026-03-14 18:14:57.496 LkLog                com...rexm.improject  D  WebSocket: connecting to wss://livekit.test.im.cn/rtc?protocol=13&reconnect=1&sid=PA_j2C578b7WryN&auto_subscribe=1&adaptive_stream=0&sdk=android&version=2.23.5.testlog.7&device_model=HUAWEI ELS-AN00&os=android&os_version=10&network=
2026-03-14 18:14:57.500 LkLog                com...rexm.improject  D  WebSocket: websocket onFailure() t.error = Unable to resolve host "livekit.test.xmsharetalk.cn": No address associated with hostname
2026-03-14 18:15:03.049 LkLog                com...rexm.improject  D  API_CALL:onIceConnection new state: DISCONNECTED
2026-03-14 18:15:13.064 LkLog                com...rexm.improject  D  API_CALL:onIceConnection new state: FAILED
2026-03-14 18:15:13.065 LkLog                com...rexm.improject  D  RTC connect:connectionState.flow oldVal.connectionState = RESUMING newVal.connectionState = DISCONNECTED
2026-03-14 18:15:13.065 LkLog                com...rexm.improject  D  RTC connect:primary ICE disconnected
2026-03-14 18:15:14.058 LkLog                com...rexm.improject  D  API_CALL:publisherObserver.connectionChangeListener reconnect()
2026-03-14 18:15:14.060 LkLog                com...rexm.improject  D  RTC connect:Reconnection is already in progress

When the WebSocket calls onFailure(), the LiveKit SDK tries to recover by reconnecting. The log shows this at 2026-03-14 18:14:57.492, where connectionState changes from CONNECTED to RESUMING. While the SDK is in that state, onIceConnection subsequently drives connectionState to ConnectionState.DISCONNECTED.
At 2026-03-14 18:15:13.065, the connectionState delegate is invoked with oldVal.connectionState = RESUMING and newVal.connectionState = DISCONNECTED. LiveKit does not handle the case where oldVal.connectionState = RESUMING: the DISCONNECTED branch calls reconnect() only when oldVal is CONNECTED.

val connectionStateListener: PeerConnectionStateListener = { newState ->
    LkLogRecord.writeLog(LogLevel.ApiCall, "onIceConnection new state: $newState")
    if (newState.isConnected()) {
        connectionState = ConnectionState.CONNECTED
    } else if (newState.isDisconnected()) {
        connectionState = ConnectionState.DISCONNECTED
    }
}

var connectionState: ConnectionState by flowDelegate(ConnectionState.DISCONNECTED) { newVal, oldVal ->
    if (newVal == oldVal) {
        return@flowDelegate
    }
    LkLogRecord.writeLog(LogLevel.RTCConnect, "connectionState.flow oldVal.connectionState = $oldVal newVal.connectionState = $newVal")
    when (newVal) {
        ConnectionState.CONNECTED -> {
            if (oldVal == ConnectionState.DISCONNECTED || oldVal == ConnectionState.CONNECTING) {
                LkLogRecord.writeLog(LogLevel.RTCConnect, "primary ICE connected")
                LKLog.d { "primary ICE connected" }
                listener?.onEngineConnected()
            } else if (oldVal == ConnectionState.RECONNECTING) {
                LkLogRecord.writeLog(LogLevel.RTCConnect, "primary ICE reconnected")
                LKLog.d { "primary ICE reconnected" }
                listener?.onEngineReconnected()
            } else if (oldVal == ConnectionState.RESUMING) {
                LkLogRecord.writeLog(LogLevel.RTCConnect, "onEngineResumed()")
                listener?.onEngineResumed()
            }
        }

        ConnectionState.DISCONNECTED -> {
            LkLogRecord.writeLog(LogLevel.RTCConnect, "primary ICE disconnected")
            LKLog.d { "primary ICE disconnected" }
            if (oldVal == ConnectionState.CONNECTED) {
                LkLogRecord.writeLog(LogLevel.RTCConnect, "ready call reconnect()")
                reconnect()
            }
        }

        else -> {
        }
    }
}
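
To make the missing transition concrete, here is a minimal, self-contained sketch of the state handling, not the SDK's actual code and not necessarily what this PR changes. EngineSketch, shouldReconnect and reconnectCalls are hypothetical names introduced for illustration; the sketch assumes that treating RESUMING like CONNECTED in the DISCONNECTED branch is an acceptable way to cover the case, and it ignores any guard the real engine applies while a reconnect is already in progress.

```kotlin
enum class ConnectionState { CONNECTING, CONNECTED, DISCONNECTED, RECONNECTING, RESUMING }

// Hypothetical stand-in for the engine: only the transition logic is modelled.
class EngineSketch(initial: ConnectionState = ConnectionState.DISCONNECTED) {
    // Counts how often reconnect() would have been triggered.
    var reconnectCalls = 0
        private set

    var connectionState: ConnectionState = initial
        set(newVal) {
            val oldVal = field
            field = newVal
            if (newVal == oldVal) return
            // Assumption: a drop from RESUMING should be handled like a drop from CONNECTED.
            if (newVal == ConnectionState.DISCONNECTED && shouldReconnect(oldVal)) {
                reconnect()
            }
        }

    private fun shouldReconnect(oldVal: ConnectionState): Boolean =
        oldVal == ConnectionState.CONNECTED || oldVal == ConnectionState.RESUMING

    // Placeholder for the real reconnect logic.
    private fun reconnect() {
        reconnectCalls++
    }
}

fun main() {
    val engine = EngineSketch(ConnectionState.CONNECTED)
    engine.connectionState = ConnectionState.RESUMING     // soft reconnect starts
    engine.connectionState = ConnectionState.DISCONNECTED // ICE fails while resuming
    println("reconnect calls: ${engine.reconnectCalls}")
}
```

With the original condition (oldVal == ConnectionState.CONNECTED only), the same sequence would leave reconnectCalls at 0, which matches the transition seen in the log above.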

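2. When my phone loses its network connection and the WebSocket cannot connect to the server, onFailure() is called, but the upper-layer caller is not notified.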


2026-03-16 15:01:28.668 LkLog                com...rexm.improject  D  WebSocket: websocket onFailure() t.error = Read error: ssl=0x742d1b0388: I/O error during system call, Software caused connection abort
2026-03-16 15:01:28.683 LkLog                com...rexm.improject  D  WebSocket:failed to validate connection
2026-03-16 15:01:28.684 LkLog                com...rexm.improject  D  WebSocket:websocket failure: null
2026-03-16 15:01:28.685 LkLog                com...rexm.improject  D  WebSocket:websocket closed
2026-03-16 15:01:28.685 LkLog                com...rexm.improject  D  RTC connect:received close event: Read error: ssl=0x742d1b0388: I/O error during system call, Software caused connection abort, code: 3500 next ready call reconnect()
2026-03-16 15:01:28.687 LkLog                com...rexm.improject  D  RTC connect:connect retries = 0
2026-03-16 15:01:28.687 LkLog                com...rexm.improject  D  RTC connect:Reconnecting to signal, attempt 1
2026-03-16 15:01:28.788 LkLog                com...rexm.improject  D  RTC connect:connectionState.flow oldVal.connectionState = CONNECTED newVal.connectionState = RESUMING
2026-03-16 15:01:28.788 LkLog                com...rexm.improject  D  RTC connect:Attempting soft reconnect.
2026-03-16 15:01:28.789 LkLog                com...rexm.improject  D  WebSocket:Closing SignalClient: code = 1000, reason = Starting new connection
2026-03-16 15:01:28.790 LkLog                com...rexm.improject  D  WebSocket: connecting to wss://livekit.test.im.cn/rtc?protocol=13&reconnect=1&sid=PA_JWY8AjLikt2A&auto_subscribe=1&adaptive_stream=0&sdk=android&version=2.23.5&device_model=HUAWEI ELS-AN00&os=android&os_version=10&network=
2026-03-16 15:01:28.793 LkLog                com...rexm.improject  D  WebSocket: websocket onFailure() t.error = Unable to resolve host "livekit.test.xmsharetalk.cn": No address associated with hostname
2026-03-16 15:01:28.795 LkLog                com...rexm.improject  D  WebSocket:failed to validate connection
2026-03-16 15:01:28.796 LkLog                com...rexm.improject  D  WebSocket:websocket failure: null
2026-03-16 15:01:28.796 LkLog                com...rexm.improject  D  WebSocket: connect cancelled, abort websocket
2026-03-16 15:01:28.797 LkLog                com...rexm.improject  D  RTC connect:Error during reconnection.
2026-03-16 15:01:28.798 LkLog                com...rexm.improject  D  RTC connect:connect retries = 1
2026-03-16 15:01:28.799 LkLog                com...rexm.improject  D  RTC connect:Reconnecting to signal, attempt 2
2026-03-16 15:01:29.400 LkLog                com...rexm.improject  D  RTC connect:Attempting full reconnect.
2026-03-16 15:01:29.401 LkLog                com...rexm.improject  D  RTC connect:connectionState.flow oldVal.connectionState = RESUMING newVal.connectionState = RECONNECTING
2026-03-16 15:01:29.401 LkLog                com...rexm.improject  D  RTC connect:Full Reconnecting
2026-03-16 15:01:29.401 LkLog                com...rexm.improject  D  RoomEvent:getRoom()?.events?.collect = io.livekit.android.events.RoomEvent$Reconnecting
2026-03-16 15:01:29.535 LkLog                com...rexm.improject  D  WebSocket:Closing SignalClient: code = 1000, reason = Full Reconnecting
2026-03-16 15:01:29.536 LkLog                com...rexm.improject  D  RoomEvent:getRoom()?.events?.collect = io.livekit.android.events.RoomEvent$TrackUnpublished
2026-03-16 15:01:29.536 LkLog                com...rexm.improject  D  PIPELINE:RoomEvent.TrackUnpublished localUser sid = PA_JWY8AjLikt2A publication.source = screenShare 
2026-03-16 15:01:29.536 LkLog                com...rexm.improject  D  RoomEvent:getRoom()?.events?.collect = io.livekit.android.events.RoomEvent$TrackUnpublished
2026-03-16 15:01:29.536 LkLog                com...rexm.improject  D  API_CALL:joinImpl() url = wss://livekit.test.im.cn options = ConnectOptions(autoSubscribe=true, iceServers=null, rtcConfig=null, audio=false, video=false, protocolVersion=v13) roomOptions = RoomOptions(adaptiveStream=false, dynacast=true, e2eeOptions=null, audioTrackCaptureDefaults=LocalAudioTrackOptions(noiseSuppression=true, echoCancellation=true, autoGainControl=true, highPassFilter=true, typingNoiseDetection=true), videoTrackCaptureDefaults=LocalVideoTrackOptions(isScreencast=false, deviceId=null, position=FRONT, captureParams=VideoCaptureParameter(width=1280, height=720, maxFps=30, adaptOutputToDimensions=true)), audioTrackPublishDefaults=AudioTrackPublishDefaults(audioBitrate=48000, dtx=true, red=true, preconnect=false), videoTrackPublishDefaults=VideoTrackPublishDefaults(videoEncoding=null, simulcast=true, videoCodec=h264, scalabilityMode=L3T3, backupCodec=BackupVideoCodec(codec=vp8, encoding=null, simulcast=true), degradationPreference=null, simulcastLayers=[H720, H360]), screenShareTrackCaptureDefaults=LocalVideoTrackOptions(isScreencast=true, deviceId=null, position=FRONT, captureParams=VideoCaptureParameter(width=1920, height=1080, maxFps=30, adaptOutputToDimensions=true)), screenShareTrackPublishDefaults=VideoTrackPublishDefaults(videoEncoding=VideoEncoding(maxBitrate=2000000, maxFps=24), simulcast=false, videoCodec=vp8, scalabilityMode=null, backupCodec=null, degradationPreference=null, simulcastLayers=null)) token = 
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpZHNMVWFleFdVUCIsIm1ldGFkYXRhIjoie1wiYXZhdGFyXCI6XCJodHRwczovL2ltLXRlc3QudG9zLWNuLWd1YW5nemhvdS52b2xjZXMuY29tL1BpY3R1cmUvMjAyNS0wMi0xMC9waWN0dXJlXzE3MzkxNzExMTE4NDVfNTFfNTEuanBnXCIsXCJyb2xlXCI6W1wiY3JlYXRlXCJdLFwidXBkYXRlX2ZpZWxkXCI6XCJcIixcImNoYW5nZV9mrmCIsIm5iZiI6MTc3MzY0NiLCJtaWNyb3Bob25lIiwiY2FtZXJhIiwic2NyZWVuX3NoYXJlIiwic2NyZWVuX3NoYXJlX2F1ZGlvIl0sImNhblN1YnNjcmliZSI6dHJ1ZSwicm9vbSI6ImdvbWVldF91c2VyXzIwMzMzODU4OTI4ODIwNDI4ODEiLCJyb29tSm9pbiI6dHJ1ZX19.z5MxCNa_UQDXiHziQkGrfBW7UGrX2RsvHmY2aEf_-pA
2026-03-16 15:01:29.536 LkLog                com...rexm.improject  D  PIPELINE:RoomEvent.TrackUnpublished localUser sid = PA_JWY8AjLikt2A publication.source = microphone 
2026-03-16 15:01:29.536 LkLog                com...rexm.improject  D  WebSocket:Closing SignalClient: code = 1000, reason = Starting new connection
2026-03-16 15:01:29.537 LkLog                com...rexm.improject  D  WebSocket: connecting to wss://livekit.test.im.cn/rtc?protocol=13&auto_subscribe=1&adaptive_stream=0&sdk=android&version=2.23.5&device_model=HUAWEI ELS-AN00&os=android&os_version=10&network=
2026-03-16 15:01:29.541 LkLog                com...rexm.improject  D  WebSocket: websocket onFailure() t.error = Unable to resolve host "livekit.test.im.cn": No address associated with hostname

In onFailure(), handleWebSocketClose is called only if wasConnected is true. In a reconnection scenario, however, the WebSocket fails while attempting to reconnect, at which point isConnected is already false (set either by an earlier handleWebSocketClose call or by close()). Because wasConnected is false, handleWebSocketClose is never called, so listener?.onClose(...) is never called and the upper layer is never notified.
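
The gating described above can be illustrated with a small, self-contained sketch. It is not the SDK's SignalClient: CloseListener, SignalClientSketch, onFailureCurrent and onFailureNotifying are hypothetical names, the 3500 close code is simply the value that appears in the log, and "always notify on failure" is only one possible remedy (a real fix may need to avoid duplicate notifications).

```kotlin
import java.io.IOException

// Hypothetical listener; only onClose is modelled here.
interface CloseListener {
    fun onClose(reason: String, code: Int)
}

class SignalClientSketch(private val listener: CloseListener?) {
    var isConnected = false
        private set

    fun onOpen() {
        isConnected = true
    }

    // Behaviour as described in this PR: notify only if the socket had been connected.
    fun onFailureCurrent(t: Throwable) {
        val wasConnected = isConnected
        isConnected = false
        if (wasConnected) {
            handleWebSocketClose(t.message ?: "", 3500)
        }
    }

    // Assumed alternative: notify the listener even when the socket never connected.
    fun onFailureNotifying(t: Throwable) {
        isConnected = false
        handleWebSocketClose(t.message ?: "", 3500)
    }

    private fun handleWebSocketClose(reason: String, code: Int) {
        listener?.onClose(reason, code)
    }
}

fun main() {
    val reasons = mutableListOf<String>()
    val client = SignalClientSketch(object : CloseListener {
        override fun onClose(reason: String, code: Int) {
            reasons.add(reason)
        }
    })

    // Reconnect attempt fails before the socket ever opens: isConnected is still false.
    client.onFailureCurrent(IOException("Unable to resolve host"))
    println("current behaviour notified: ${reasons.size}")

    client.onFailureNotifying(IOException("Unable to resolve host"))
    println("sketched behaviour notified: ${reasons.size}")
}
```

The first call models a failed reconnect attempt and leaves the listener silent; the second shows the notification reaching the upper layer.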

…nnection, no notification was sent to the upper-layer application.
@changeset-bot

changeset-bot bot commented Mar 16, 2026

⚠️ No Changeset found

Latest commit: 08ecf59

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types
