[stable17] Request offer again when receiving peer fails #2475

danxuliu · 2019-11-19T09:34:35Z

Replaces #1881

When the MCU is used and the connection with another participant failed until now there was no reconnection. Now a new offer is periodically requested until the connection is established again.

Note that even if the previous offer arrives after a new one is requested everything still works as expected. The MCU closes the previous connection whenever it sends a new offer, so if a previous offer arrives and a connection is made it will be closed as soon as the new offer is sent and reconnected again when the new offer is handled. Although receiving the previous offer would have stopped the periodic request of offers even if the new one failed the peer will eventually change to the ICE failed state, which will start a new reconnection routine.

In a related change, when the MCU is not used if no offer is received from the other participant the participant with the smaller ID now also sends an offer periodically instead of just once.

However, in this case, there are still glare issues (crossed offers between peers, it can be seen by replacing the break with a sleep(12) in scenario 1 below), they are not fixed by this change (without MCU this change only addresses missing offers).

Finally, a missing spinner on user avatars during the initial connection was also fixed.

How to test

Scenario 1

Randomly drop most offers by adding

if ($decodedMessage['type'] === 'offer' && rand(0, 100) < 90) {
    break;
}

before adding messages in the SignalingController

Start a call with a user
Join the call with another user or guest

Result with this pull request

The participants are eventually connected despite several offers having failed.

Result without this pull request

The participants are kept trying to connect with each other (unless either the first or the second offer succeeded; in that case leave the call and join again until the wrong behaviour is shown). Also, the avatar of the user does not show a spinner during the initial connection.

Scenario 2

Modify the code of the external signaling server to simulate a lot of failures (sorry for the formatting, I did not find a way to use lists and blocks of code)
- In mcu_janus.go
  - After log.Printf("Already connected as subscriber for %s, leaving and re-joining", p.streamType) add time.Sleep(5 * time.Second) to cause an answer to be ignored when a new offer is closing the previous one.
- In hub.go
  - Add "math/rand" to the imports at the top
  - Replace mc, err = session.GetOrCreateSubscriber(ctx, h.mcu, message.Recipient.SessionId, data.RoomType) in processMcuMessage with
    if rand.Intn(100) < 70 { err = nil } else { mc, err = session.GetOrCreateSubscriber(ctx, h.mcu, message.Recipient.SessionId, data.RoomType) }
    to drop a lot of offers instead of sending them.
  - Before those lines, add
    if rand.Intn(100) < 90 { time.Sleep(10 * time.Second) }
    to delay most of the offers longer than the timeout used in the WebUI to request new ones (unlike the other issues I do not recall seeing this in the real world, but worth checking it just in case).
  - Replace ctx, cancel := context.WithTimeout(context.Background(), h.mcuTimeout) with ctx, cancel := context.WithTimeout(context.Background(), time.Duration(18) * time.Second) to ensure that the delayed offer will be eventually sent (otherwise it would be cancelled due to the timeout); another option would have been to change the default timeout in the configuration.
Setup the MCU
Start a call with a user
Join the call with another user or guest

Result with this pull request

The participants are eventually connected despite several offers having failed or arrived out of order due to a delay, and even if a connection failed during the negotiation.

Result without this pull request

The participants are kept trying to connect with each other (unless the first and the second offer succeeded without delay; in that case leave the call and join again until the wrong behaviour is shown). Also, the avatar of the user does not show a spinner during the initial connection.

danxuliu · 2019-11-28T04:13:52Z

I have fixed the issue with the loading icon sometimes appearing on the user avatars even if they are connected.

The problem was that the icon was restored based on the connection status at the time that the avatar was initially set, but the icon should be restored based on the connection status at the time that the image is finally loaded. Otherwise it could happen that the avatar was set when the connection was still new, the connection changed to connected, and then the image was loaded, which restored the loading icon based on the original new status.

nickvergessen

🐘 Calling works on our company instance
Remaining loading circle was fixed

In some cases, for example during the initial connection, the loading icon was not shown on the user avatar. The reason was that the "avatar()" function adds a loading icon to the element while the image is being fetched and removes it once done. Due to this, if the element had a loading icon before it will be removed after the image is fetched, and thus needs to be restored. Signed-off-by: Daniel Calviño Sánchez <[email protected]>

The property will be used too when the MCU is used, but in this case to request an offer from a peer instead of the send the offer to that peer (but in both cases the goal is to connect with the peer). Signed-off-by: Daniel Calviño Sánchez <[email protected]>

When the MCU is not used the peer with the larger ID sends an offer to the peer with the smaller ID. However, sometimes the peer with the larger ID may fail to notice a disconnection and not send an offer again, so if the peer with the smaller ID does not get an offer in a reasonable time it also sends an offer. Before the offer was sent just once, so if the offer sent by the peer with the smaller ID was not handled for some reason by the other peer no reconnection happened. Now the offer is periodically resent until the connection is established again. Note that, besides improving reconnections, this also helps with the initial connection, as if the initial offers are not handled now the peer with the smaller ID will try to connect again and again (although there are still "glare" issues if offers are crossed between peers). Signed-off-by: Daniel Calviño Sánchez <[email protected]>

When the MCU is used and the connection fails in the peer that sent the offer, that is, the own peer object that sends the local media to the MCU, a forced reconnection is triggered. However, if the peer that failed was one used to receive the media from the other participants there was no reconnection, and therefore the other participant could be no longer heard or seen. Now, if the connection with one of those peers fail an offer is requested again to the MCU periodically until the connection is established again. The same approach is used too in the initial connection to those peers, as in some cases the initial offer request may fail, which left the peers disconnected. The periodic retry is stopped as soon as an offer is received, but even if that offer fails to connect eventually an ICE fail will be triggered, which will start a new reconnection routine. Finally, note that a forced reconnection when the connection to other participants fail is not only unneeded, but also it would be problematic, as a network issue in one participant could trigger a forced reconnection for all the other participants. Signed-off-by: Daniel Calviño Sánchez <[email protected]>

nickvergessen · 2019-11-29T14:19:29Z

Rebased for the fixup

danxuliu added the 2. developing label Nov 19, 2019

danxuliu added this to the 💙 Next Minor (17) milestone Nov 19, 2019

nickvergessen added the deployed on testing label Nov 19, 2019

danxuliu force-pushed the request-offer-again-when-receiving-peer-fails branch from 420ca92 to 5609eb7 Compare November 22, 2019 13:58

danxuliu added 3. to review feature: call 📹 Voice and video calls feature: frontend 🖌️ "Web UI" client feature: WebRTC 🚡 WebRTC connection between browsers and/or mobile clients and removed 2. developing labels Nov 22, 2019

danxuliu requested review from Ivansss, fancycode and nickvergessen November 22, 2019 14:09

danxuliu marked this pull request as ready for review November 22, 2019 14:09

danxuliu mentioned this pull request Nov 22, 2019

[stable17] Send offer again when own peer is not initially connected #2488

Closed

marcoambrosini assigned danxuliu Nov 26, 2019

nickvergessen added the bug label Nov 29, 2019

nickvergessen approved these changes Nov 29, 2019

View reviewed changes

danxuliu added 4 commits November 29, 2019 15:19

nickvergessen force-pushed the request-offer-again-when-receiving-peer-fails branch from 62593b9 to 6276099 Compare November 29, 2019 14:19

nickvergessen added 4. to release and removed 3. to review labels Nov 29, 2019

nickvergessen merged commit 0eaf8ab into stable17 Dec 4, 2019

nickvergessen deleted the request-offer-again-when-receiving-peer-fails branch December 4, 2019 07:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[stable17] Request offer again when receiving peer fails #2475

[stable17] Request offer again when receiving peer fails #2475

Uh oh!

danxuliu commented Nov 19, 2019 •

edited

Loading

Uh oh!

danxuliu commented Nov 28, 2019

Uh oh!

nickvergessen left a comment

Uh oh!

nickvergessen commented Nov 29, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[stable17] Request offer again when receiving peer fails #2475

[stable17] Request offer again when receiving peer fails #2475

Uh oh!

Conversation

danxuliu commented Nov 19, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

How to test

Scenario 1

Result with this pull request

Result without this pull request

Scenario 2

Result with this pull request

Result without this pull request

Uh oh!

danxuliu commented Nov 28, 2019

Uh oh!

nickvergessen left a comment

Choose a reason for hiding this comment

Uh oh!

nickvergessen commented Nov 29, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

danxuliu commented Nov 19, 2019 •

edited

Loading