-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The Rate implementation on the Level Control Move command is not spec complaint. #24193
Comments
This problem with flash wear out is also related to this issue. currentLevel is specified to always be persistent. However currentLevel gets rapidly changed during Move commands and over fade in/out commands. This can result is committing many page writes to flash very rapidly leading to flash wear out. A typical goal of persistent currentLevel is is to restore the light when a smart bulb is connected to a dumb light switch. Another strategy could achieve this more efficiently. Change currentLevel to be non-persistent and then when the fade in/out or Move command is finished (or begins), update defaultOnValue to match current level. This behavior should be optionally enabled, again to avoid needless flash wear. In general, the Move logic should be reworked to have the minimal possible side effects in order to achieve a maximal ramp rate of 254 (4ms). There should be no need for continuous updates to globally visible attributes during the ramping process. |
if you make a clever implementation you are not going to write each increment of LevelControl immediately into flash |
@jonsmirl Why do you conclude "So all rate settings over 10 are out of spec." |
I am aware of how to break the flash commit model and cache the persistent results until a timer tick. However, I don't think it is a wise thing to do. An an alternative design would recognize that a rapidly changing currentLevel is not something that should be persisted and would consider the strategy of updating the defaultOnLevel attribute at the beginning or end of the command. |
A design consideration would be to consider that the maximal rate the Move command can move at is 254 units per second and then ask, what can consume updates at the rate of 254 units per second? If there is nothing that consume updates at this rate, them why are the updates being done? |
The spec allows you to make bad products. Clever engineers will make products that comply with the spec and don't wear out the flash. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
Reproduction steps
Issue a Level Control Move command with a rate of 254 from 0 to 254. Note that on most non-PC platforms the ramp can not complete in one second like the spec says it has to. When testing on ESP32 best I could get is 10.2 seconds for the ramp to complete. So all rate settings over 10 are out of spec.
Bug prevalence
always
GitHub hash of the SDK that was being used
903e149
Platform
core
Platform Version(s)
No response
Anything else?
Note: this issue has an easy way for humans to observe the tick performance problem - pressing the button on a dimmer and expecting the lights to ramp in a reasonable amount of time to a human. With the current code the fastest tick rate I can achieve on an ESP32 is 40ms. 40ms tick time implies you have to hold the button to ramp from 0 to 254 for 10.2 seconds. Most humans will object to this slow ramp speed.
The spec says: rate 100 is 100 units per second, so 2.5 second total ramp time. rate 254 is 254 units per second so 1 second ramp time. Since the fastest ramp time I can achieve is 10.2s, all ramp rates over 10 are out of spec.
This tick routine is simply doing way, way too much junk on each timer tick. For example a null level change routine just doing a return from my code runs in 20ms minimum on ESP32. The spec says that it has to be 4ms at max rate so even with zero external code the internal CHIP routine can't meet the spec.
The quick fix: The routine is recording the system clock so the routine can check and see if it is falling behind. If it is falling behind don't ramp by a single unit, instead do some division and jump enough units to not fall behind. The spec just says that at a rate of 100 units in one second, it doesn't say you have to ramp them one unit at a time. At a rate of 255 with my round trip time of 40ms it would increment by 1 the first tick and then see that it is behind 11 increments, so it would jump to 12 on the next tick. not a perfect fix, but it keeps everything within the spec.
Longer term fix -- stop doing so much junk while ramping. Don't do any network activity during the ramp. Wait until the end of the ramp only then update the network and save a new default on value.
Also -- this needs to get tested. Rate parameters like this are in multiple places. Testing needs to ensure that they are spec compliant.
The text was updated successfully, but these errors were encountered: