Excessive lag #332
Comments
Actually, the reason why nanosleep was more accurate is that for small sleeps (less than 2ms), when the thread is scheduled with a real-time scheduler, older Linux kernels busy-wait inside nanosleep itself. My tests showed pretty similar performance for pthread_cond_timedwait and nanosleep, on the order of 60-90 usec lag. A simple optimization would be to wake up 200 usec earlier and do a busy wait for the remainder. We can do this for both timedwait and nanosleep. I think this optimization should be optional, as it consumes more CPU to get more accurate timing. So this optimization based on physical actions might not be worth it. (@lhstrh: FYI)
Thanks for the tag. Where is the busy-wait happening? In our implementation or in nanosleep?
I am going to do a more thorough comparison on my desktop before we draw any conclusions. The busy-waiting was done inside nanosleep itself, but in newer versions of the Linux kernel this optimization has been removed. It is worth considering bringing such an optimization into reactor-c; Edward already proposed it: wake up a few hundred microseconds early and busy-wait for the remainder.
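For concreteness, here is a minimal sketch of that idea. This is not reactor-c code; the function names, the 200 usec margin, and the use of a plain relative `nanosleep` are illustrative assumptions. The point is just: do a coarse sleep for everything except the last margin, then spin on the clock.

```c
#include <stdint.h>
#include <time.h>

#define SPIN_MARGIN_NS 200000LL  // wake up ~200 usec early (tunable per platform)

static int64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

// Sleep until the absolute CLOCK_REALTIME instant `deadline_ns`.
static void hybrid_sleep_until(int64_t deadline_ns) {
    int64_t remaining = deadline_ns - SPIN_MARGIN_NS - now_ns();
    if (remaining > 0) {
        // Coarse sleep for everything except the last SPIN_MARGIN_NS.
        struct timespec req = {
            .tv_sec  = remaining / 1000000000LL,
            .tv_nsec = remaining % 1000000000LL
        };
        nanosleep(&req, NULL);
    }
    // Busy-wait for the remainder; this is what costs extra CPU,
    // which is why the optimization should be opt-in.
    while (now_ns() < deadline_ns) {
        // spin
    }
}
```

The same wrapper shape would work around a timedwait: ask the condvar for the earlier deadline, then spin.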
When running under gdb, I would expect |
What are the units of the values? Are they absolute values, with the sign given by the color of the bar? If the runtime tries to sleep for less than 10 usec, it just returns immediately, so a spurious wakeup of pthread_cond_timedwait at the right time can lead to a negative lag. You can try changing MIN_SLEEP_DURATION to 0.
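A hedged sketch of the kind of guard being described (the constant name MIN_SLEEP_DURATION comes from the comment above; the surrounding function is illustrative, not the actual reactor-c code):

```c
#include <stdbool.h>
#include <stdint.h>

#define MIN_SLEEP_DURATION 10000LL  // 10 usec, expressed in ns

// Returns true if it actually waited, false if it returned immediately.
static bool wait_until(int64_t wakeup_ns, int64_t physical_now_ns) {
    if (wakeup_ns - physical_now_ns < MIN_SLEEP_DURATION) {
        // Requested sleep is shorter than the assumed wakeup overhead, so
        // return right away. If this happens slightly *before* the wakeup
        // time, the observed lag comes out (slightly) negative.
        return false;
    }
    // ... otherwise block on the timed wait ...
    return true;
}
```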
The y-axis is the frequency (i.e., the number of times we see such a lag) on a log scale, and the x-axis is in microseconds. The negative lags are really small in absolute value, maybe <10 nanoseconds; therefore they were getting put in the same bucket as small positive values in the histogram, and I did the color change to signify the difference in sign. Technically they should not be shown as absolute values, but the histogram puts them at almost 0; I believe that is because the values are really small compared to the positive lags. MIN_SLEEP_DURATION was also my theory, but I haven't tested it yet. But to reiterate one of my points from above: on the Raspberry Pi 4b, the function that waits is At the same time, if the wait time is less than the
What is missing from my picture is this: what is the critical difference in behaviour between the macOS benchmark and the RPi benchmark that you want to attribute to
10 usec was an unfounded guess at the overhead involved in doing a
I like Edward's proposal in the original post. We could learn, at runtime, the sleep overhead for the platform and correct for it. This is a standard control problem, and a PI controller would probably manage it nicely.
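A hedged sketch of that idea: measure how late each timed wait actually returns relative to the originally requested deadline, and feed that error into a small PI controller whose output is pre-subtracted from the next deadline. All names and gains here are illustrative assumptions, not reactor-c code.

```c
#include <stdint.h>

typedef struct {
    double kp;            // proportional gain
    double ki;            // integral gain
    double integral_ns;   // accumulated error
    double correction_ns; // amount to pre-subtract from the next deadline
} sleep_corrector_t;

// Call once per wakeup with the observed lateness:
// (actual wakeup time) - (original, uncorrected deadline), in ns.
static void corrector_update(sleep_corrector_t* c, int64_t lateness_ns) {
    double error = (double)lateness_ns;  // we want this driven toward 0
    c->integral_ns += error;
    c->correction_ns = c->kp * error + c->ki * c->integral_ns;
    if (c->correction_ns < 0.0) c->correction_ns = 0.0;  // never aim to wake late on purpose
}

// The deadline actually handed to the timed wait.
static int64_t corrected_deadline(const sleep_corrector_t* c, int64_t deadline_ns) {
    return deadline_ns - (int64_t)c->correction_ns;
}
```

Combined with the busy-wait sketch above, an early wakeup caused by over-correction would be spun away instead of showing up as negative lag.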
I'm pretty sure MIN_SLEEP_DURATION has nothing to do with the initial observation because the timer period is quite large.
MIN_SLEEP_DURATION can explain why we could have a small negative lag sometimes. The initial observation could be due to several things:
In fact, point 2 here is why I am advocating just sticking with CLOCK_REALTIME and saying that our real-time performance is no better than the performance of the system clock. So it is up to the user to configure their system correctly (i.e. disable clock-stepping for any NTP or PTP clients that are running).
Yes, all good points. I also agree on sticking to CLOCK_REALTIME. I'm just wondering whether this should be an overridable default.
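One hedged sketch of what an overridable default could look like (the `LF_CLOCK` macro, the condvar name, and the init function are illustrative assumptions; note that `pthread_condattr_setclock` is available on Linux but, as far as I know, not on macOS):

```c
#include <pthread.h>
#include <time.h>

// Overridable default: build with -DLF_CLOCK=CLOCK_MONOTONIC to change it.
#ifndef LF_CLOCK
#define LF_CLOCK CLOCK_REALTIME
#endif

static pthread_cond_t event_q_changed;

// Initialize the condition variable so that pthread_cond_timedwait()
// interprets its deadline on the same clock the runtime reads time from.
static int init_cond(void) {
    pthread_condattr_t attr;
    pthread_condattr_init(&attr);
#if defined(__linux__)
    // Not available on macOS; there the condvar stays on the default
    // CLOCK_REALTIME-based clock.
    pthread_condattr_setclock(&attr, LF_CLOCK);
#endif
    return pthread_cond_init(&event_q_changed, &attr);
}
```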
The only macOS measurement I have run was on Edward's laptop. I'll go ahead and run it multiple times on a Mac as well and share the histogram.
I think how bad negative lag is depends on our expected behavior. I believe our current definition for physical actions is min_delay based, for example, and at that point I think the timing guarantee shouldn't allow any negative lag. I'm not sure what the best definition of timing accuracy would be here, since we also want relatively small lag.
I'm not sure about macOS, but nanosleep uses CLOCK_MONOTONIC on Linux-based systems. So another potential implementation change would be to change the function being used instead of changing the base clock.
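If the wait were moved off the condition variable entirely, an absolute-time sleep on an explicit clock is one option. A minimal sketch, assuming Linux (macOS does not provide `clock_nanosleep`); the function name is illustrative:

```c
#include <errno.h>
#include <stdint.h>
#include <time.h>

// Sleep until the absolute instant `wakeup_ns` measured on `clk`
// (e.g. CLOCK_REALTIME or CLOCK_MONOTONIC). TIMER_ABSTIME makes the base
// clock explicit and avoids re-computing relative intervals after signals.
static int sleep_until(clockid_t clk, int64_t wakeup_ns) {
    struct timespec ts = {
        .tv_sec  = wakeup_ns / 1000000000LL,
        .tv_nsec = wakeup_ns % 1000000000LL
    };
    int r;
    do {
        r = clock_nanosleep(clk, TIMER_ABSTIME, &ts, NULL);
    } while (r == EINTR);  // retry with the same absolute deadline if interrupted
    return r;
}
```

One design caveat: unlike pthread_cond_timedwait, a plain sleep cannot be woken early when a new event arrives on the event queue, so this would only apply where that is acceptable.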
To summarize: I have three proposals for reducing lag on macOS (and Linux), which could be implemented as three distinct PRs, all aiming to deliver better real-time performance for reactor-c.
Related discussion: #289.
At least on macOS, we get a lag (physical time minus logical time) that is always larger than what I would expect. @lsk567 and I looked at this yesterday with an extremely simple LF program: just a timer with a 1ms period triggering a single reaction.

In this program, the runtime system will call `lf_cond_timedwait()` to wait for physical time to match each multiple of 1ms before invoking the reaction. `lf_cond_timedwait()` calls the POSIX function `pthread_cond_timedwait()` on macOS (and presumably on Linux as well). We inserted code at the point in the runtime right after it returns from `lf_cond_timedwait()` that inserts a "user event" into the trace file. We then converted the trace file to CSV using `trace_to_csv` and looked at the times at which things are occurring. In the trace, the last column is physical time as reported by `lf_time_physical_elapsed()`; the section we looked at is where the advance from 2ms to 3ms in logical time occurs.

What we found was pretty interesting:
1. `pthread_cond_timedwait()` consistently returns later than you ask it to. In the above trace, it is 291us late.
2. The time between when `pthread_cond_timedwait()` returns and when the reaction gets invoked is pretty small. In the above trace, it is 5us, although it is often as low as 1us. 1us is the smallest time increment we can measure with `lf_time_physical()` because the macOS clock has a resolution of 1us.
3. `lf_cond_timedwait()` sometimes gets invoked many times in rapid succession. It gets called several dozen times rapidly. Why?

The lag we are seeing is mostly due to `pthread_cond_timedwait()`, not to our runtime system. There are some obvious questions:

1. Is `pthread_cond_timedwait()` using the same clock as `lf_time_physical()`? If not, this could account for what we are seeing.
2. Could we measure the discrepancy between `pthread_cond_timedwait()` and `lf_time_physical()` and correct for it by pre-subtracting the 200us from the argument given to `pthread_cond_timedwait()`? (A sketch of such a measurement is below.)
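A hedged, standalone sketch of how one might probe this (not reactor-c code; the 1ms deadline and iteration count are arbitrary): ask `pthread_cond_timedwait()` to wait until a CLOCK_REALTIME deadline with nothing signaling the condvar, then compare the time it actually returns against the same clock. Substituting `lf_time_physical()` for the `clock_gettime` reads would distinguish a clock mismatch (a systematic offset, question 1) from plain wakeup overhead that could be pre-subtracted (a consistent positive lateness, question 2).

```c
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

static int64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

int main(void) {
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t c = PTHREAD_COND_INITIALIZER;

    pthread_mutex_lock(&m);
    for (int i = 0; i < 100; i++) {
        int64_t deadline = now_ns() + 1000000LL;  // 1 ms from now
        struct timespec ts = {
            .tv_sec  = deadline / 1000000000LL,
            .tv_nsec = deadline % 1000000000LL
        };
        // Nobody signals the condvar, so this should return ETIMEDOUT
        // as close to `deadline` as the implementation allows.
        pthread_cond_timedwait(&c, &m, &ts);
        printf("lateness: %lld ns\n", (long long)(now_ns() - deadline));
    }
    pthread_mutex_unlock(&m);
    return 0;
}
```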