Commit a6537be

rostedt authored and Linus Torvalds committed
[PATCH] pi-futex: rt mutex docs
Add rt-mutex documentation.

[rostedt@goodmis.org: Update rt-mutex-design.txt as per Randy Dunlap suggestions]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
1 parent 23f78d4 commit a6537be

3 files changed, +981 -0 lines changed

Documentation/pi-futex.txt

+121
@@ -0,0 +1,121 @@
+Lightweight PI-futexes
+----------------------
+
+We are calling them lightweight for 3 reasons:
+
+ - in the user-space fastpath a PI-enabled futex involves no kernel work
+   (or any other PI complexity) at all. No registration, no extra kernel
+   calls - just pure fast atomic ops in userspace.
+
+ - even in the slowpath, the system call and scheduling pattern is very
+   similar to normal futexes.
+
+ - the in-kernel PI implementation is streamlined around the mutex
+   abstraction, with strict rules that keep the implementation
+   relatively simple: only a single owner may own a lock (i.e. no
+   read-write lock support), only the owner may unlock a lock, no
+   recursive locking, etc.
+
+Priority Inheritance - why?
+---------------------------
+
+The short reply: user-space PI helps achieve or improve determinism for
+user-space applications. In the best case, it can help achieve
+determinism and well-bounded latencies. Even in the worst case, PI will
+improve the statistical distribution of locking-related application
+delays.
+
+The longer reply:
+-----------------
+
+Firstly, sharing locks between multiple tasks is a common programming
+technique that often cannot be replaced with lockless algorithms. As we
+can see in the kernel [which is a quite complex program in itself],
+lockless structures are the exception rather than the norm - the current
+ratio of lockless vs. locky code for shared data structures is somewhere
+between 1:10 and 1:100. Lockless is hard, and the complexity of lockless
+algorithms often endangers the ability to do robust reviews of said code.
+I.e. critical RT apps often choose lock structures to protect critical
+data structures, instead of lockless algorithms. Furthermore, there are
+cases (like shared hardware, or other resource limits) where lockless
+access is mathematically impossible.
+
+Media players (such as Jack) are an example of reasonable application
+design with multiple tasks (with multiple priority levels) sharing
+short-held locks: for example, a highprio audio playback thread is
+combined with medium-prio construct-audio-data threads and low-prio
+display-colory-stuff threads. Add video and decoding to the mix and
+we've got even more priority levels.
+
+So once we accept that synchronization objects (locks) are an
+unavoidable fact of life, and once we accept that multi-task userspace
+apps have a very fair expectation of being able to use locks, we've got
+to think about how to offer the option of a deterministic locking
+implementation to user-space.
+
+Most of the technical counter-arguments against doing priority
+inheritance only apply to kernel-space locks. But user-space locks are
+different: there we cannot disable interrupts or make the task
+non-preemptible in a critical section, so the 'use spinlocks' argument
+does not apply (user-space spinlocks have the same priority inversion
+problems as other user-space locking constructs). Fact is, pretty much
+the only technique that currently enables good determinism for userspace
+locks (such as futex-based pthread mutexes) is priority inheritance:
+
+Currently (without PI), if a high-prio and a low-prio task share a lock
+[this is a quite common scenario for most non-trivial RT applications],
+even if all critical sections are coded carefully to be deterministic
+(i.e. all critical sections are short in duration and only execute a
+limited number of instructions), the kernel cannot guarantee any
+deterministic execution of the high-prio task: any medium-priority task
+could preempt the low-prio task while it holds the shared lock and
+executes the critical section, and could thereby delay the high-prio
+task indefinitely.
+
+Implementation:
+---------------
+
+As mentioned before, the userspace fastpath of PI-enabled pthread
+mutexes involves no kernel work at all - they behave quite similarly to
+normal futex-based locks: a 0 value means unlocked, and a value==TID
+means locked. (This is the same method as used by list-based robust
+futexes.) Userspace uses atomic ops to lock/unlock these mutexes without
+entering the kernel.
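
The fastpath described above can be sketched in userspace C with C11 atomics. This is only an illustration, not code from the patch: the helper names pi_trylock_fast and pi_unlock_fast are hypothetical, and the TID is passed in by the caller so that the sketch makes no system calls at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: try to take the lock without entering the kernel.
 * Succeeds only if the futex word goes atomically from 0 (unlocked) to
 * the caller's TID (locked).
 */
static bool pi_trylock_fast(_Atomic uint32_t *futex_word, uint32_t tid)
{
    uint32_t expected = 0;

    /* A TID of 0 would be indistinguishable from "unlocked". */
    assert(tid != 0);
    return atomic_compare_exchange_strong(futex_word, &expected, tid);
}

/*
 * Hypothetical helper: try to release the lock without entering the
 * kernel. Succeeds only if the word still holds exactly the caller's
 * TID, i.e. the caller owns the lock and no extra bits have been set.
 */
static bool pi_unlock_fast(_Atomic uint32_t *futex_word, uint32_t tid)
{
    uint32_t expected = tid;

    assert(tid != 0);
    return atomic_compare_exchange_strong(futex_word, &expected, 0);
}
```

When either helper returns false, a real implementation would fall back to the kernel slowpath; that part is deliberately left out here.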
+
+To handle the slowpath, we have added two new futex ops:
+
+  FUTEX_LOCK_PI
+  FUTEX_UNLOCK_PI
+
+If the lock-acquire fastpath fails [i.e. an atomic transition from 0 to
+TID fails], then FUTEX_LOCK_PI is called. The kernel does all the
+remaining work: if there is no futex-queue attached to the futex address
+yet then the code looks up the task that owns the futex [it has put its
+own TID into the futex value], and attaches a 'PI state' structure to
+the futex-queue. The pi_state includes an rt-mutex, which is a PI-aware,
+kernel-based synchronization object. The 'other' task is made the owner
+of the rt-mutex, and the FUTEX_WAITERS bit is atomically set in the
+futex value. Then this task tries to lock the rt-mutex, on which it
+blocks. Once it returns, it has the mutex acquired, and it sets the
+futex value to its own TID and returns. Userspace has no other work to
+perform - it now owns the lock, and the futex value contains
+FUTEX_WAITERS|TID.
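
A minimal sketch of the lock side as seen from userspace, assuming a Linux system whose <linux/futex.h> provides FUTEX_LOCK_PI, FUTEX_WAITERS and FUTEX_TID_MASK. The helper names (current_tid, pi_lock, pi_owner, pi_has_waiters) are hypothetical, and a production lock would also have to handle errors such as EDEADLK and the robust-futex owner-died case, which are ignored here.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: kernel thread ID of the calling thread. */
static uint32_t current_tid(void)
{
    return (uint32_t)syscall(SYS_gettid);
}

/*
 * Hypothetical helper: lock a PI futex word.
 * Returns 0 on success, -1 with errno set if the slowpath fails.
 */
static int pi_lock(_Atomic uint32_t *futex_word)
{
    uint32_t expected = 0;

    assert(futex_word != NULL);

    /* Fastpath: atomic 0 -> TID transition, no kernel involvement. */
    if (atomic_compare_exchange_strong(futex_word, &expected, current_tid()))
        return 0;

    /*
     * Slowpath: the kernel sets FUTEX_WAITERS, blocks us on the
     * rt-mutex, and stores our TID in the word before returning.
     * A NULL timeout means "wait forever".
     */
    return (int)syscall(SYS_futex, futex_word, FUTEX_LOCK_PI, 0, NULL, NULL, 0);
}

/* Owner TID encoded in a futex value (0 means unlocked). */
static uint32_t pi_owner(uint32_t value)
{
    return value & FUTEX_TID_MASK;
}

/* Non-zero if the kernel has marked the futex as having waiters. */
static int pi_has_waiters(uint32_t value)
{
    return (value & FUTEX_WAITERS) != 0;
}
```

After an uncontended pi_lock() the word holds just the TID; after a contended one it may hold FUTEX_WAITERS|TID, which is why the two decoding helpers mask rather than compare the raw value.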
+
+If the unlock side fastpath succeeds [i.e. userspace manages to do a
+TID -> 0 atomic transition of the futex value], then no kernel work is
+triggered.
+
+If the unlock fastpath fails (because the FUTEX_WAITERS bit is set),
+then FUTEX_UNLOCK_PI is called, and the kernel unlocks the futex on
+behalf of userspace - and it also unlocks the attached
+pi_state->rt_mutex and thus wakes up any potential waiters.
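
The unlock side can be sketched the same way, again assuming a Linux system with FUTEX_UNLOCK_PI in <linux/futex.h>. The helper name pi_unlock is hypothetical, and error handling is reduced to passing the syscall result through.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: unlock a PI futex word owned by the caller.
 * Returns 0 on success, -1 with errno set if the slowpath fails.
 */
static int pi_unlock(_Atomic uint32_t *futex_word)
{
    uint32_t expected;

    assert(futex_word != NULL);
    expected = (uint32_t)syscall(SYS_gettid);

    /*
     * Fastpath: atomic TID -> 0 transition. This fails if the word is
     * anything other than our bare TID, for example FUTEX_WAITERS|TID.
     */
    if (atomic_compare_exchange_strong(futex_word, &expected, 0))
        return 0;

    /*
     * Slowpath: let the kernel release the futex and the attached
     * rt-mutex, and wake the waiter.
     */
    return (int)syscall(SYS_futex, futex_word, FUTEX_UNLOCK_PI, 0, NULL, NULL, 0);
}
```

Because the compare-and-exchange expects exactly the caller's TID, a thread that does not own the lock never clears the word in the fastpath; it ends up in the slowpath, where the kernel is expected to reject the request.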
+
+Note that under this approach, contrary to previous PI-futex approaches,
+there is no prior 'registration' of a PI-futex [which is not quite
+possible anyway, due to existing ABI properties of pthread mutexes].
+
+Also, under this scheme, 'robustness' and 'PI' are two orthogonal
+properties of futexes, and all four combinations are possible: futex,
+robust-futex, PI-futex, robust+PI-futex.
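
At the pthread level, the four combinations could be requested roughly as follows. This is a sketch that assumes a POSIX threads implementation (such as glibc) which maps the protocol and robustness mutex attributes onto the futex variants named above; the text itself does not specify that mapping, and the helper name mutex_init_with is hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: initialise a mutex with any of the four
 * combinations of the 'PI' and 'robust' properties.
 * Returns 0 on success or an errno-style code (for example ENOTSUP
 * when the platform lacks one of the features).
 */
static int mutex_init_with(pthread_mutex_t *m, bool pi, bool robust)
{
    pthread_mutexattr_t attr;
    int rc;

    assert(m != NULL);

    rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;

    /* PI property: priority inheritance or no protocol at all. */
    rc = pthread_mutexattr_setprotocol(&attr,
            pi ? PTHREAD_PRIO_INHERIT : PTHREAD_PRIO_NONE);

    /* Robustness property, chosen independently of the protocol. */
    if (rc == 0)
        rc = pthread_mutexattr_setrobust(&attr,
                robust ? PTHREAD_MUTEX_ROBUST : PTHREAD_MUTEX_STALLED);

    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);

    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

The two attributes are set by separate calls, which mirrors the orthogonality described above: neither call depends on the value passed to the other.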
+
+More details about priority inheritance can be found in
+Documentation/rtmutex.txt.
