|
| 1 | +Lightweight PI-futexes |
| 2 | +---------------------- |
| 3 | + |
| 4 | +We are calling them lightweight for 3 reasons: |
| 5 | + |
| 6 | + - in the user-space fastpath a PI-enabled futex involves no kernel work |
| 7 | + (or any other PI complexity) at all. No registration, no extra kernel |
| 8 | + calls - just pure fast atomic ops in userspace. |
| 9 | + |
| 10 | + - even in the slowpath, the system call and scheduling pattern is very |
| 11 | + similar to normal futexes. |
| 12 | + |
| 13 | + - the in-kernel PI implementation is streamlined around the mutex |
| 14 | + abstraction, with strict rules that keep the implementation |
| 15 | + relatively simple: only a single owner may own a lock (i.e. no |
| 16 | + read-write lock support), only the owner may unlock a lock, no |
| 17 | + recursive locking, etc. |
| 18 | + |
| 19 | +Priority Inheritance - why? |
| 20 | +--------------------------- |
| 21 | + |
| 22 | +The short reply: user-space PI helps achieving/improving determinism for |
| 23 | +user-space applications. In the best-case, it can help achieve |
| 24 | +determinism and well-bound latencies. Even in the worst-case, PI will |
| 25 | +improve the statistical distribution of locking related application |
| 26 | +delays. |
| 27 | + |
| 28 | +The longer reply: |
| 29 | +----------------- |
| 30 | + |
| 31 | +Firstly, sharing locks between multiple tasks is a common programming |
| 32 | +technique that often cannot be replaced with lockless algorithms. As we |
| 33 | +can see it in the kernel [which is a quite complex program in itself], |
| 34 | +lockless structures are rather the exception than the norm - the current |
| 35 | +ratio of lockless vs. locky code for shared data structures is somewhere |
| 36 | +between 1:10 and 1:100. Lockless is hard, and the complexity of lockless |
| 37 | +algorithms often endangers to ability to do robust reviews of said code. |
| 38 | +I.e. critical RT apps often choose lock structures to protect critical |
| 39 | +data structures, instead of lockless algorithms. Furthermore, there are |
| 40 | +cases (like shared hardware, or other resource limits) where lockless |
| 41 | +access is mathematically impossible. |
| 42 | + |
| 43 | +Media players (such as Jack) are an example of reasonable application |
| 44 | +design with multiple tasks (with multiple priority levels) sharing |
| 45 | +short-held locks: for example, a highprio audio playback thread is |
| 46 | +combined with medium-prio construct-audio-data threads and low-prio |
| 47 | +display-colory-stuff threads. Add video and decoding to the mix and |
| 48 | +we've got even more priority levels. |
| 49 | + |
| 50 | +So once we accept that synchronization objects (locks) are an |
| 51 | +unavoidable fact of life, and once we accept that multi-task userspace |
| 52 | +apps have a very fair expectation of being able to use locks, we've got |
| 53 | +to think about how to offer the option of a deterministic locking |
| 54 | +implementation to user-space. |
| 55 | + |
| 56 | +Most of the technical counter-arguments against doing priority |
| 57 | +inheritance only apply to kernel-space locks. But user-space locks are |
| 58 | +different, there we cannot disable interrupts or make the task |
| 59 | +non-preemptible in a critical section, so the 'use spinlocks' argument |
| 60 | +does not apply (user-space spinlocks have the same priority inversion |
| 61 | +problems as other user-space locking constructs). Fact is, pretty much |
| 62 | +the only technique that currently enables good determinism for userspace |
| 63 | +locks (such as futex-based pthread mutexes) is priority inheritance: |
| 64 | + |
| 65 | +Currently (without PI), if a high-prio and a low-prio task shares a lock |
| 66 | +[this is a quite common scenario for most non-trivial RT applications], |
| 67 | +even if all critical sections are coded carefully to be deterministic |
| 68 | +(i.e. all critical sections are short in duration and only execute a |
| 69 | +limited number of instructions), the kernel cannot guarantee any |
| 70 | +deterministic execution of the high-prio task: any medium-priority task |
| 71 | +could preempt the low-prio task while it holds the shared lock and |
| 72 | +executes the critical section, and could delay it indefinitely. |
| 73 | + |
| 74 | +Implementation: |
| 75 | +--------------- |
| 76 | + |
| 77 | +As mentioned before, the userspace fastpath of PI-enabled pthread |
| 78 | +mutexes involves no kernel work at all - they behave quite similarly to |
| 79 | +normal futex-based locks: a 0 value means unlocked, and a value==TID |
| 80 | +means locked. (This is the same method as used by list-based robust |
| 81 | +futexes.) Userspace uses atomic ops to lock/unlock these mutexes without |
| 82 | +entering the kernel. |
| 83 | + |
| 84 | +To handle the slowpath, we have added two new futex ops: |
| 85 | + |
| 86 | + FUTEX_LOCK_PI |
| 87 | + FUTEX_UNLOCK_PI |
| 88 | + |
| 89 | +If the lock-acquire fastpath fails, [i.e. an atomic transition from 0 to |
| 90 | +TID fails], then FUTEX_LOCK_PI is called. The kernel does all the |
| 91 | +remaining work: if there is no futex-queue attached to the futex address |
| 92 | +yet then the code looks up the task that owns the futex [it has put its |
| 93 | +own TID into the futex value], and attaches a 'PI state' structure to |
| 94 | +the futex-queue. The pi_state includes an rt-mutex, which is a PI-aware, |
| 95 | +kernel-based synchronization object. The 'other' task is made the owner |
| 96 | +of the rt-mutex, and the FUTEX_WAITERS bit is atomically set in the |
| 97 | +futex value. Then this task tries to lock the rt-mutex, on which it |
| 98 | +blocks. Once it returns, it has the mutex acquired, and it sets the |
| 99 | +futex value to its own TID and returns. Userspace has no other work to |
| 100 | +perform - it now owns the lock, and futex value contains |
| 101 | +FUTEX_WAITERS|TID. |
| 102 | + |
| 103 | +If the unlock side fastpath succeeds, [i.e. userspace manages to do a |
| 104 | +TID -> 0 atomic transition of the futex value], then no kernel work is |
| 105 | +triggered. |
| 106 | + |
| 107 | +If the unlock fastpath fails (because the FUTEX_WAITERS bit is set), |
| 108 | +then FUTEX_UNLOCK_PI is called, and the kernel unlocks the futex on the |
| 109 | +behalf of userspace - and it also unlocks the attached |
| 110 | +pi_state->rt_mutex and thus wakes up any potential waiters. |
| 111 | + |
| 112 | +Note that under this approach, contrary to previous PI-futex approaches, |
| 113 | +there is no prior 'registration' of a PI-futex. [which is not quite |
| 114 | +possible anyway, due to existing ABI properties of pthread mutexes.] |
| 115 | + |
| 116 | +Also, under this scheme, 'robustness' and 'PI' are two orthogonal |
| 117 | +properties of futexes, and all four combinations are possible: futex, |
| 118 | +robust-futex, PI-futex, robust+PI-futex. |
| 119 | + |
| 120 | +More details about priority inheritance can be found in |
| 121 | +Documentation/rtmutex.txt. |
0 commit comments