Skip to content

Conversation

@danxuliu
Copy link
Member

@danxuliu danxuliu commented Nov 19, 2019

Replaces #1881

When the MCU is used and the connection with another participant failed until now there was no reconnection. Now a new offer is periodically requested until the connection is established again.

Note that even if the previous offer arrives after a new one is requested everything still works as expected. The MCU closes the previous connection whenever it sends a new offer, so if a previous offer arrives and a connection is made it will be closed as soon as the new offer is sent and reconnected again when the new offer is handled. Although receiving the previous offer would have stopped the periodic request of offers even if the new one failed the peer will eventually change to the ICE failed state, which will start a new reconnection routine.

In a related change, when the MCU is not used if no offer is received from the other participant the participant with the smaller ID now also sends an offer periodically instead of just once.

However, in this case, there are still glare issues (crossed offers between peers, it can be seen by replacing the break with a sleep(12) in scenario 1 below), they are not fixed by this change (without MCU this change only addresses missing offers).

Finally, a missing spinner on user avatars during the initial connection was also fixed.

How to test

Scenario 1

  • Randomly drop most offers by adding
if ($decodedMessage['type'] === 'offer' && rand(0, 100) < 90) {
    break;
}

before adding messages in the SignalingController

  • Start a call with a user
  • Join the call with another user or guest

Result with this pull request

The participants are eventually connected despite several offers having failed.

Result without this pull request

The participants are kept trying to connect with each other (unless either the first or the second offer succeeded; in that case leave the call and join again until the wrong behaviour is shown). Also, the avatar of the user does not show a spinner during the initial connection.

Scenario 2

  • Modify the code of the external signaling server to simulate a lot of failures (sorry for the formatting, I did not find a way to use lists and blocks of code)
    • In mcu_janus.go
      • After log.Printf("Already connected as subscriber for %s, leaving and re-joining", p.streamType) add time.Sleep(5 * time.Second) to cause an answer to be ignored when a new offer is closing the previous one.
    • In hub.go
      • Add "math/rand" to the imports at the top
      • Replace mc, err = session.GetOrCreateSubscriber(ctx, h.mcu, message.Recipient.SessionId, data.RoomType) in processMcuMessage with
        if rand.Intn(100) < 70 { err = nil } else { mc, err = session.GetOrCreateSubscriber(ctx, h.mcu, message.Recipient.SessionId, data.RoomType) }
        to drop a lot of offers instead of sending them.
      • Before those lines, add
        if rand.Intn(100) < 90 { time.Sleep(10 * time.Second) }
        to delay most of the offers longer than the timeout used in the WebUI to request new ones (unlike the other issues I do not recall seeing this in the real world, but worth checking it just in case).
      • Replace ctx, cancel := context.WithTimeout(context.Background(), h.mcuTimeout) with ctx, cancel := context.WithTimeout(context.Background(), time.Duration(18) * time.Second) to ensure that the delayed offer will be eventually sent (otherwise it would be cancelled due to the timeout); another option would have been to change the default timeout in the configuration.
  • Setup the MCU
  • Start a call with a user
  • Join the call with another user or guest

Result with this pull request

The participants are eventually connected despite several offers having failed or arrived out of order due to a delay, and even if a connection failed during the negotiation.

Result without this pull request

The participants are kept trying to connect with each other (unless the first and the second offer succeeded without delay; in that case leave the call and join again until the wrong behaviour is shown). Also, the avatar of the user does not show a spinner during the initial connection.

@danxuliu danxuliu added this to the 💙 Next Minor (17) milestone Nov 19, 2019
@danxuliu danxuliu force-pushed the request-offer-again-when-receiving-peer-fails branch from 420ca92 to 5609eb7 Compare November 22, 2019 13:58
@danxuliu danxuliu added 3. to review feature: call 📹 Voice and video calls feature: frontend 🖌️ "Web UI" client feature: WebRTC 🚡 WebRTC connection between browsers and/or mobile clients and removed 2. developing labels Nov 22, 2019
@danxuliu danxuliu marked this pull request as ready for review November 22, 2019 14:09
@danxuliu
Copy link
Member Author

I have fixed the issue with the loading icon sometimes appearing on the user avatars even if they are connected.

The problem was that the icon was restored based on the connection status at the time that the avatar was initially set, but the icon should be restored based on the connection status at the time that the image is finally loaded. Otherwise it could happen that the avatar was set when the connection was still new, the connection changed to connected, and then the image was loaded, which restored the loading icon based on the original new status.

Copy link
Member

@nickvergessen nickvergessen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐘 Calling works on our company instance
Remaining loading circle was fixed

In some cases, for example during the initial connection, the loading
icon was not shown on the user avatar. The reason was that the
"avatar()" function adds a loading icon to the element while the image
is being fetched and removes it once done. Due to this, if the element
had a loading icon before it will be removed after the image is fetched,
and thus needs to be restored.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
The property will be used too when the MCU is used, but in this case to
request an offer from a peer instead of the send the offer to that peer
(but in both cases the goal is to connect with the peer).

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
When the MCU is not used the peer with the larger ID sends an offer to
the peer with the smaller ID. However, sometimes the peer with the
larger ID may fail to notice a disconnection and not send an offer
again, so if the peer with the smaller ID does not get an offer in a
reasonable time it also sends an offer.

Before the offer was sent just once, so if the offer sent by the peer
with the smaller ID was not handled for some reason by the other peer
no reconnection happened. Now the offer is periodically resent until the
connection is established again.

Note that, besides improving reconnections, this also helps with the
initial connection, as if the initial offers are not handled now the
peer with the smaller ID will try to connect again and again (although
there are still "glare" issues if offers are crossed between peers).

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
When the MCU is used and the connection fails in the peer that sent the
offer, that is, the own peer object that sends the local media to the
MCU, a forced reconnection is triggered. However, if the peer that
failed was one used to receive the media from the other participants
there was no reconnection, and therefore the other participant could be
no longer heard or seen.

Now, if the connection with one of those peers fail an offer is
requested again to the MCU periodically until the connection is
established again. The same approach is used too in the initial
connection to those peers, as in some cases the initial offer request
may fail, which left the peers disconnected.

The periodic retry is stopped as soon as an offer is received, but even
if that offer fails to connect eventually an ICE fail will be triggered,
which will start a new reconnection routine.

Finally, note that a forced reconnection when the connection to other
participants fail is not only unneeded, but also it would be
problematic, as a network issue in one participant could trigger a
forced reconnection for all the other participants.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
@nickvergessen nickvergessen force-pushed the request-offer-again-when-receiving-peer-fails branch from 62593b9 to 6276099 Compare November 29, 2019 14:19
@nickvergessen
Copy link
Member

Rebased for the fixup

@nickvergessen nickvergessen merged commit 0eaf8ab into stable17 Dec 4, 2019
@nickvergessen nickvergessen deleted the request-offer-again-when-receiving-peer-fails branch December 4, 2019 07:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

4. to release bug deployed on testing feature: call 📹 Voice and video calls feature: frontend 🖌️ "Web UI" client feature: WebRTC 🚡 WebRTC connection between browsers and/or mobile clients

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants