-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/cc 2166 speedup eventing websocket reconnect #2736
Feature/cc 2166 speedup eventing websocket reconnect #2736
Conversation
…reconnect) if target is not local host
…edup-eventing-websocket-reconnect # Conflicts: # events/janus_wsevh.c
Thanks for your contribution, @JanFellner! Please make sure you sign our CLA, as it's a required step before we can merge this. |
As I said in the issue, the schedule is a no go, because it's only available on newer versions of the library. Even doing two different implementations to accommodate both, I think will unneededly complicate things, especially considering your changes are a big refactoring already. I'd rather try and fix the one we have now for the issue you identified. |
The code is aligned with what libwebsocket suggest for >= 3.2 (no own grown code, just the plain sample) I understand your concerns, tried my best documenting the code, using readable function names. I did not find a solution to get it working the lws_service with 3.2 just sleeps and i see no way in waking it up. (Testing with 3.1 in the mean time, loosen to what happens with the PR) |
Updated PR to work with lws 3.1 (now building with 3.1 and 4.2) |
CLA signed... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @JanFellner , I think this is a good enhancement.
You missed the handling of re-connection in case of LWS_CALLBACK_CLOSED
.
I'm leaving some other notes that should be addressed.
events/janus_wsevh.c
Outdated
* Schedules a connect attempt using the lws scheduler as | ||
*/ | ||
void janus_wsevh_schedule_connect_attempt(void) { | ||
#if (LWS_LIBRARY_VERSION_MAJOR >= 3 && LWS_LIBRARY_VERSION_MINOR > 2) || (LWS_LIBRARY_VERSION_MAJOR > 3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LWS_LIBRARY_VERSION_MAJOR >= 4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
@atoppi are you using any kind for code formatting with rules? (using the same env as you vscode) |
@JanFellner No I'm not applying any kind of template in the IDE, just following the style in the rest of code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR is still missing the reconnection schedule in case of LWS_CALLBACK_CLOSED
.
I've left other notes to address.
events/janus_wsevh.c
Outdated
/* Forward declarations */ | ||
void janus_wsevh_schedule_connect_attempt(void); | ||
void janus_wsevh_calculate_reconnect_delay_on_fail(void); | ||
// lws_sorted_usec_list_t is defined starting with lws 3.2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use /*
syntax for comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
While working on the code changes i lost the closed in mind, will address it |
Hope i did now cover everything :) |
As much of the comments were styling related, i wonder if you ever discussed using code formatters. |
@JanFellner nope, never discussed about this. |
I am a friend about automation. and see it loosen from the rules you apply. What you want to specify, specify it. (or in other words, what you have to comment in a PR more than 5 times get it automated ;] ) |
It seems to work as intended on lws v4.2 :-) However when specifying an invalid address, the timeout on my machine is 15 seconds. |
…double disconnected log entry
Added a connected log entry and removed a duplicate disconnected log entry |
I added that value with d914fa4 for lws>=3.2 I used it now the whole day while developing debugging the signalling server. |
…dded with lws 4.1
@JanFellner lgtm 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Added some editorial notes and a question inline.
events/janus_wsevh.c
Outdated
/* | ||
* Websocket thread loop for websocket library newer than 3.2 | ||
* The reconnect is handled in a dedicated lws scheduler janus_wsevh_schedule_connect_attempt | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code style: comments should be in the style below
/* Bla bla
* bla bla */
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
events/janus_wsevh.c
Outdated
JANUS_LOG(LOG_VERB, "Joining WebSocketsEventHandler (lws>=3.2) client thread\n"); | ||
int n = 0; | ||
while(n >= 0 && g_atomic_int_get(&initialized) && !g_atomic_int_get(&stopping)) | ||
n = lws_service(context, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What can cause lws_service
to return a negative integer? I'm thinking of intermittent errors that could cause this loop to break before Janus is shutdown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It´s copy pasted from the sample actually.
I will check inside libwebsocket if it returns a different value and what might be conditions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you follow the functions through the lib there are some places where the function returns -1,
I did not validate them all but to me it looks like those are scenarios where the context is not properly settled and therefore further execution does not make sense...
- lws_service() returns 1 or lws_plat_service()
- lws_plat_service() returns _lws_plat_service_tsi()
- _lws_plat_service_tsi() returns on seldom cases -1 but the main negative returns are here lws_service_fd_tsi()
https://libwebsockets.org/git/libwebsockets/tree/lib/core-net/service.c?h=v4.2-stable#n624
The question is whats better in the end.
A non sleeping 100% core consuming thread or a thread termination :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that this could result in a fatal scenario due to a temporary issue: as a first integration of the return value, I'd just print the negative value in case of errors, but I'd move on with the loop (so no n>=0
in the while
). This would give us some more information from people's logs, in case it ever happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check 78bf993
events/janus_wsevh.c
Outdated
/* | ||
* Websocket thread loop for websocket library prior to (less than) 3.2 | ||
* The reconnect is handled in the loop for lws < 3.2 | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above on comments style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
events/janus_wsevh.c
Outdated
JANUS_LOG(LOG_WARN, "Error attempting reconnection...\n"); | ||
continue; | ||
janus_wsevh_connect_attempt(NULL); | ||
if (!wsi) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code style: no space between 'if` and the bracket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
events/janus_wsevh.c
Outdated
disconnected = janus_get_monotonic_time(); | ||
reconnect = TRUE; | ||
JANUS_LOG(LOG_WARN, "WebSocketsEventHandler: Error attempting connection... (next retry in %ds)\n", reconnect_delay); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have a log error above: please add the retry info there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
events/janus_wsevh.c
Outdated
reconnect_delay = 1; | ||
else if (reconnect_delay < JANUS_WSEVH_MAX_RETRY_SECS) { | ||
reconnect_delay += reconnect_delay; | ||
if (reconnect_delay > JANUS_WSEVH_MAX_RETRY_SECS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two if
above have an extra space that per code style shouldn't be there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
events/janus_wsevh.c
Outdated
|
||
/** | ||
* Schedules a connect attempt using the lws scheduler as | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above on comments style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
events/janus_wsevh.c
Outdated
#if (LWS_LIBRARY_VERSION_MAJOR >= 3 && LWS_LIBRARY_VERSION_MINOR >= 2) || (LWS_LIBRARY_VERSION_MAJOR >= 4) | ||
lws_sul_schedule(context, 0, &sul_stagger, janus_wsevh_connect_attempt, reconnect_delay * LWS_US_PER_SEC); | ||
#endif | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a missing empty line at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
events/janus_wsevh.c
Outdated
#if !(((LWS_LIBRARY_VERSION_MAJOR == 3 && LWS_LIBRARY_VERSION_MINOR >= 2) || LWS_LIBRARY_VERSION_MAJOR >= 4)) | ||
#define lws_sorted_usec_list_t void | ||
#endif | ||
static void janus_wsevh_connect_attempt(lws_sorted_usec_list_t* sul); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per code style, the *
should be next to the pointer, not the type:
lws_sorted_usec_list_t *sul
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
events/janus_wsevh.c
Outdated
* Implements the connecting attempt to the upstream websocket server | ||
* sets the connection result (lws_client_connect_info) to static wsi | ||
*/ | ||
static void janus_wsevh_connect_attempt(lws_sorted_usec_list_t* sul) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here (*
next to the pointer, not type).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
Thanks @lminiero for the comments, will check those now... |
Pushed That one needs to be nailed down. I can also remove the negative check, i just took it from the samples... Concerning the styling stuff please do also check #2736 (comment) and the following two posts as suggestion... |
Logging lws_service result if it gets negative and just in case if it reverts back from negativ (or changes) as well. |
Looks good to me too, now! Thanks for the quick feedback and fixes, merging ✌️ |
PS: just noticed the automated builds weren't started, I'll do those before merging just to be sure everything's fine. |
As followup from #2734
Changed the implementation for the reconnect from the handling inside the lws_service thread to the scheduler approach as shown in the libsocket example. (https://libwebsockets.org/git/libwebsockets/tree/minimal-examples/ws-client/minimal-ws-client-tx/minimal-ws-client.c)
The current approach is not working properly if the target is not localhost. The connection failed callback is triggered 5 seconds after the connect however the lws_service continues to sleep for 25 seconds.
Using the lws_sul_schedule the time when the next reconnect has to happen is exactly specified, so no need to do that in the lws_service loop and the loop just does the lws_service loop.
(!) Not yet tested with wsl 3.1 want to do that now but a short feedback in the mean time would be nice.