MQTT crashes EMS-ESP when disconnecting from the server and reconnecting afterwards #1067
Comments
@tp1de Could you compile/test the tech branch? Or should I add a bin, or merge it into my 3.5.1-dev? With the RC310 you have more entities; I don't get restarts on my system, even if I enable HA discovery.
Hi guys, I'm lost at the moment on how to get the tech upgrade. When I clone with the command git clone https://github.com/emsesp/EMS-ESP32.git ... So I used Michael's repository, selected the tech-upgrade branch and compiled it to EMS-ESP-3_5_0-tec_4-ESP32.bin. Please advise how to go forward. BTW: what have you changed?
Here's a bin file: EMS-ESP-3_5_0-tec_4-ESP32_4M.zip. The tec branch has various under-the-hood optimizations which I'm working on. There are no breaking changes or new features.
Thanks @proddy for the bin. I get the same upload error 507. What should I do? I finally managed to compile from the original repo with the tech upgrade, but the upload does not work for this file either (error 507).
The WWWData.h file is built when you use PlatformIO to build the firmware. Do you have a custom pio_local.ini where you override settings? Did you use yarn to build the web UI? I don't get the 507 error. This is what the file size should look like:
I started with an empty directory from scratch, no custom ini files. The main branch I can compile and upload without errors, but your bin does not upload either.
See: https://yarnpkg.com/getting-started/install
@MichaelDvP thanks for the reply. I was able to compile and upload from your tech-upgrade branch. As far as I can tell after initial testing, the crash can be avoided by limiting the number of entries in the queue. Nevertheless, the logic stays the same: the entities are removed first and then created anew.
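The observation that capping the queue avoids the crash can be illustrated with a bounded publish queue. This is a minimal sketch under stated assumptions, not EMS-ESP code: QueuedMessage, BoundedPublishQueue and the capacity value are hypothetical names, and the firmware's real queue is organised differently.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical queued message; not the EMS-ESP type.
struct QueuedMessage {
    std::string topic;
    std::string payload;
    bool        retain;
};

// Publish queue with a hard upper bound: when it is full, new messages are
// rejected (and counted) instead of growing the queue and eating more heap.
class BoundedPublishQueue {
  public:
    explicit BoundedPublishQueue(std::size_t max_entries)
        : max_entries_(max_entries) {
        assert(max_entries_ > 0);
    }

    // Returns false if the queue is already full.
    bool push(const QueuedMessage & msg) {
        if (queue_.size() >= max_entries_) {
            ++dropped_;
            return false;
        }
        queue_.push_back(msg);
        return true;
    }

    // Moves the oldest message into `out`; returns false if the queue is empty.
    bool pop(QueuedMessage & out) {
        if (queue_.empty()) {
            return false;
        }
        out = queue_.front();
        queue_.pop_front();
        return true;
    }

    std::size_t size() const { return queue_.size(); }
    std::size_t dropped() const { return dropped_; }

  private:
    std::deque<QueuedMessage> queue_;
    std::size_t               max_entries_;
    std::size_t               dropped_ = 0;
};
```

Dropping is only harmless for messages that get regenerated later anyway (such as discovery configs on a later schedule); a real design would still have to decide what happens to rejected messages.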
No. Do you have the latest commit? Only inactive (missing) entities are removed; active ones are rebuilt. (That is the idea; maybe I've made a mistake in the code.)
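The intended rebuild logic (remove only missing entities, re-send the rest) might look roughly like this. It is a hypothetical sketch: Entity, Publish and rebuild_discovery() are illustrative names, not the EMS-ESP API. It relies on the Home Assistant MQTT discovery convention that an empty payload on a config topic removes that entity.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical entity record; the real EMS-ESP data model differs.
struct Entity {
    std::string config_topic; // discovery config topic for this entity
    std::string config_json;  // discovery payload
    bool        active;       // false if the device no longer reports it
};

struct Publish {
    std::string topic;
    std::string payload;
};

// Builds the messages for a rebuild without clearing everything first:
// inactive entities get an empty payload (HA removes them), active
// entities just get their config re-sent.
std::vector<Publish> rebuild_discovery(const std::vector<Entity> & entities) {
    std::vector<Publish> out;
    out.reserve(entities.size());
    for (const auto & e : entities) {
        assert(!e.config_topic.empty());
        if (e.active) {
            out.push_back({e.config_topic, e.config_json});
        } else {
            out.push_back({e.config_topic, ""});
        }
    }
    return out;
}
```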
Yes. I should use the tech-upgrade branch and not dev, right? ... And after some more testing I still sometimes get reboots/crashes, maybe in 30% of all cases.
The tech branch is updated frequently; do you have commit 050ecd9? You can also use my dev build v3.5.1-dev0: https://github.com/MichaelDvP/EMS-ESP32/releases/tag/latest
BTW: I believe I may have found the root cause of my WiFi stability problem, which then causes some of the reboots: #1072
I used the bin from there. Is that OK? (I don't know how to check whether I have all the commits.) As far as I can see after the first tests, the thermostat and mixer entities are no longer removed. But the boiler entities are still deleted and recreated, which takes a while. Could this be, @MichaelDvP?
Wow, never knew that. Thanks, Michael. I've updated the docs too: https://emsesp.github.io/docs/Building/
I've added another PR; with this change I get the following for cold start and reconnect (the extra logging is not included in the official software):
@MichaelDvP How do I use it? It's not in your latest bin, and not when I compile it either. What's wrong on my side?
As mentioned, I've added the counting for my personal analysis; it's not in the official builds.
I'm getting crashes with MQTT HA discovery enabled. I haven't had time to locate what is causing it. To reproduce, compile with
Let me recommend the following for reconnecting MQTT (HA):
It was a bug I introduced, sorry. Fixing.
Installed the automation and it works: I get 2 notifications one minute apart. Is EMS-ESP restarting twice then?
I think it's because you set the retained flag? Best check with MQTT Explorer. In any case, don't use any of the tec builds for now because they aren't working!
Retain means the server keeps the messages while a client is down, but if the server itself goes down it keeps them only if it has persistent storage, which is not always the case, so we have to resend.
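To illustrate why setting the retain flag is not enough on its own, here is a toy model of a server without persistent storage and a client that re-sends every config on each (re)connect. ToyBroker, ConfigList and on_mqtt_connect() are hypothetical; real brokers usually make persistence a configuration choice, and this model ignores other retain details.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy server that only models retained messages. Deliberately simplified:
// it ignores subscriptions, QoS and other details of MQTT retain handling.
class ToyBroker {
  public:
    void publish(const std::string & topic, const std::string & payload, bool retain) {
        assert(!topic.empty());
        if (retain) {
            retained_[topic] = payload;
        }
    }

    // Without persistent storage a restart loses every retained message.
    void restart(bool persistent_storage) {
        if (!persistent_storage) {
            retained_.clear();
        }
    }

    bool        has_retained(const std::string & topic) const { return retained_.count(topic) > 0; }
    std::size_t retained_count() const { return retained_.size(); }

  private:
    std::map<std::string, std::string> retained_;
};

using ConfigList = std::vector<std::pair<std::string, std::string>>; // (topic, payload)

// Hypothetical connect handler: re-send every discovery config with the
// retain flag on each (re)connect instead of trusting the server to still
// have it.
void on_mqtt_connect(ToyBroker & broker, const ConfigList & configs) {
    for (const auto & c : configs) {
        broker.publish(c.first, c.second, true);
    }
}
```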
OK guys, thanks for clarifying. I will wait until the reboot/crash topic is solved. Since I installed the MQTT server independently of the HA/ioBroker installation, the offline backups no longer affect EMS-ESP, and after changing from WPA2/WPA3 to WPA2 (fixed), WiFi seems to be stable as well (I will keep observing).
@MichaelDvP I'm really struggling to find why it's crashing and think it's memory related. Are we adding more to MQTT now? I solved it temporarily in edit: running a performance test, with the new
@proddy take it easy. I wish you all the best and a smooth recovery.
This test build also crashes with MQTT enabled and HA disabled. Heap is 103/65 and should not be the problem. Maybe a serialization buffer inside the MQTT lib?
I have no idea. I'm still playing around to see what I find. I wish I had documented the logic!
I merged the HA config changes from @pswid, so we don't need to remove the HA topics on a restart or reconnect anymore. But something is still eating the heap and I can't find out why. In any case, one good change would be to put process_queue() in its own thread (using xTaskCreatePinnedToCore) so it's always looking at the queue. This will keep the buffer queue smaller.
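The idea of running process_queue() in its own thread can be sketched on a PC with std::thread. On the ESP32 the worker would instead be a FreeRTOS task created with xTaskCreatePinnedToCore; the QueueWorker class, the publish() stub and the condition-variable wake-up are assumptions for illustration only.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Host-side stand-in for a dedicated queue task. std::thread is used only so
// the sketch runs on a PC; on the ESP32 this would be a FreeRTOS task.
class QueueWorker {
  public:
    QueueWorker() : worker_([this] { run(); }) {}
    ~QueueWorker() { stop(); }

    void enqueue(std::string payload) {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            queue_.push(std::move(payload));
        }
        cv_.notify_one();
    }

    // Signals the worker to finish; it drains what is left, then exits.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            stop_ = true;
        }
        cv_.notify_one();
        if (worker_.joinable()) {
            worker_.join();
        }
    }

    std::size_t published() const {
        std::lock_guard<std::mutex> lock(mtx_);
        return published_;
    }

  private:
    // Sleeps until there is work, then drains the whole queue at once, so it
    // never waits for the main loop to get around to the queue.
    void run() {
        std::unique_lock<std::mutex> lock(mtx_);
        for (;;) {
            cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
            while (!queue_.empty()) {
                std::string msg = std::move(queue_.front());
                queue_.pop();
                lock.unlock();
                publish(msg); // slow network I/O runs without holding the lock
                lock.lock();
                ++published_;
            }
            if (stop_) {
                return;
            }
        }
    }

    // Hypothetical stand-in for the real MQTT publish call.
    void publish(const std::string & msg) { assert(!msg.empty()); }

    mutable std::mutex      mtx_;
    std::condition_variable cv_;
    std::queue<std::string> queue_;
    std::size_t             published_ = 0;
    bool                    stop_      = false;
    std::thread             worker_; // declared last so it starts after the other members exist
};
```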
This change from @pswid is great, so we only have to rebuild. But only skipping the config messages is not good:
I tried to stop generating configs on low heap, insert in
The creation stops on low heap and continues on the next MQTT schedule. I'll merge the pswid PR into my 3.5.1-dev, so that testers who need a binary can try it.
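Pausing config generation on low heap and resuming on the next MQTT schedule can be modelled with a simple cursor. This is a sketch, not the actual change: MIN_FREE_HEAP, ConfigPublisher and tick() are hypothetical, and on the device the free-heap callback would read the real heap (for example via ESP.getFreeHeap() in Arduino).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical threshold; the value used in the actual change is not known here.
constexpr std::size_t MIN_FREE_HEAP = 30 * 1024;

// Publishes HA discovery configs incrementally. Each tick() (one per MQTT
// schedule) resumes where the previous one stopped and returns early as soon
// as free heap drops below MIN_FREE_HEAP.
class ConfigPublisher {
  public:
    ConfigPublisher(std::vector<std::string>                 topics,
                    std::function<std::size_t()>             free_heap,
                    std::function<void(const std::string &)> publish)
        : topics_(std::move(topics))
        , free_heap_(std::move(free_heap))
        , publish_(std::move(publish)) {
        assert(free_heap_ && publish_);
    }

    // Returns true once every config has been published.
    bool tick() {
        while (next_ < topics_.size()) {
            if (free_heap_() < MIN_FREE_HEAP) {
                return false; // low heap: continue on the next schedule
            }
            publish_(topics_[next_]);
            ++next_;
        }
        return true;
    }

    std::size_t next() const { return next_; }

  private:
    std::vector<std::string>                 topics_;
    std::function<std::size_t()>             free_heap_;
    std::function<void(const std::string &)> publish_;
    std::size_t                              next_ = 0;
};
```

Keeping the cursor outside the publish loop is what lets a later schedule pick up exactly where the low-heap check stopped, instead of starting the whole discovery over.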
I was thinking maybe we're making this too complicated. An MQTT broker is essentially a queue and we're trying to code our own smart queue on top of it. The original design was based around
I believe the rule should be to publish the MQTT messages, i.e. clear our own queue, as fast as we can, and constantly. So a new design could be
Just tell me when this is done... I am ready to test.
It's already in the tech-upgrade branch.
If I compile and install from your tech-upgrade branch, EMS-ESP reboots constantly. Both the previous and the current version are unstable (rebooting after approx. 15-20 seconds).
I compiled Michael's dev branch including the last 7 new commits. This version seems to work. So far I have tested stopping and restarting the MQTT server 3 times. No crashes/reboots anymore, and the removal of the configs no longer seems to happen. I will continue testing. What is the difference between @MichaelDvP's repository and yours, @proddy?
I tested a bit more. Everything is fine for me when reconnecting to the MQTT server. 👍 I was just surprised that on restarting EMS-ESP there are no config removals anymore.
About 200 commits. There's still a bug which I'm hunting down. It doesn't crash on my system, but something doesn't look right.
I tested a bit more. When I delete configs with MQTT Explorer, a restart recreates them (a restart is needed). BTW: I noticed that 70% of the EMS-ESP system HA entities were renamed. Was this intended? I therefore lost some statistics (e.g. WiFi strength). Not really a problem, but not expected.
Think I fixed it; I changed so many things I still don't know what caused it. A nullptr somewhere, I expect.
Can you give an example of the HA entity name before and after?
Not really, the old names are gone. E.g. uptime stays as before, but WiFi strength/RSSI does not.
Have you checked/changed the MQTT setting Entity ID format?
No changes in format; all other entities are the same, just some of those related to EMS-ESP (system) changed. No problem for me, I was just surprised. I noticed it while watching my WiFi stability. With the changed entity I lost the long-term history...
I haven't noticed it, but I'm using a different branch and build. You said 70% of the entities had changed so I was worried for a minute. |
Old name: sensor.system_wifi_rssi, new name: sensor.system_rssi
Well spotted! I'll do a fix. EDIT: couldn't reproduce it. I tested all MQTT options on the official 3.5.0 release against the tech-upgrade branch, and the sensor names in HA are identical. Are you using another build from somewhere?
Not as far as I can remember... it's not important to me anyhow.
@MichaelDvP @proddy as discussed on Discord:
When MQTT (HA) gets disconnected and reconnects after a while, all entities are removed and recreated.
Michael mentioned on Discord:
"this generates a lot of messages (EMSdevice::ha_config_clear()), because it's for all devices at once, and fills the queue".
When the queue becomes too long, EMS-ESP crashes and reboots. Then the same procedure starts again, but because discovering devices/entities while reading telegrams takes some time, the start-up succeeds.
I tested that a few seconds of MQTT server unavailability are enough to crash my EMS-ESP system.
As a result, any short WiFi disconnect as well as longer offline backup sessions will result in crashes.
I noticed that EMS-ESP disconnects from and reconnects to WiFi during the day when switching access points on my mesh network.
This takes a few seconds. I believe this may be the reason for my 2-3 crashes a day.
The automatic removal of the entities should be avoided wherever possible. It takes 1-2 minutes to recreate them via MQTT discovery in HA. During this time dashboards and automations do not work or even crash. An MQTT reconnect does not need this removal function.
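The cost of the current reconnect behaviour can be made concrete by counting messages: clearing and then re-creating every config queues two messages per entity in one burst, while simply re-sending the configs queues one per entity and removes nothing from HA. The functions below are hypothetical illustrations, not EMS-ESP code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Msg {
    std::string topic;
    std::string payload;
};

// Behaviour described in this issue: every config is first cleared (empty
// payload, so HA removes the entity) and then re-created, all queued at once.
std::vector<Msg> reconnect_clear_then_rebuild(const std::vector<Msg> & configs) {
    std::vector<Msg> out;
    for (const auto & c : configs) out.push_back({c.topic, ""});
    for (const auto & c : configs) out.push_back(c);
    return out;
}

// Suggested alternative: a reconnect only re-sends the configs, so the
// entities stay in HA and dashboards/automations keep working.
std::vector<Msg> reconnect_republish_only(const std::vector<Msg> & configs) {
    return configs;
}

// Counts messages that would remove an entity in HA.
std::size_t count_removals(const std::vector<Msg> & msgs) {
    std::size_t n = 0;
    for (const auto & m : msgs) {
        if (m.payload.empty()) ++n;
    }
    return n;
}
```

For N entities the first function queues 2N messages at once, N of them removals; the second queues N and none are removals.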