Commit b8c34c9

Auto merge of #119 - alexcrichton:less-generics, r=Amanieu
Remove most `#[inline]` annotations

This commit goes through and deletes almost all `#[inline]` annotations in this crate. It looks like before this commit basically every single function was `#[inline]`, but this is generally not necessary for performance and can have a severe impact on compile times in both debug and release modes, most severely in release mode.

Some `#[inline]` annotations are definitely necessary, however. Most functions in this crate are already candidates for inlining because they're generic, but non-generic functions such as those on `Group` and `BitMask` aren't candidates for inlining without `#[inline]`. Additionally LLVM is by no means perfect, so some `#[inline]` may still be necessary to get some further speedups.

The procedure used to generate this commit looked like:

* Remove all `#[inline]` annotations.
* Run `cargo bench`, comparing against the `master` branch, and add `#[inline]` to hot spots as necessary.
* A [PR] was made against rust-lang/rust to [evaluate the impact][run1] on the compiler for more performance data.
* Using this data, `perf diff` was used locally to determine further hot spots and more `#[inline]` annotations were added.
* A [second round of benchmarking][run2] was done.

The numbers are at the point where I think this should land in the crate and get published to move into the standard library. There are up to 20% wins in compile time for hashmap-heavy crates (like Cargo) and milder wins (up to 10%) for a number of other large crates. The regressions are all in the 1-3% range and are largely on benchmarks taking a handful of milliseconds anyway, which I'd personally say is a worthwhile tradeoff.
For comparison, the benchmarks of this crate before and after this commit look like so:

```
name                         baseline ns/iter  new ns/iter  diff ns/iter  diff %   speedup
insert_ahash_highbits        7,137             9,044        1,907         26.72%   x 0.79
insert_ahash_random          7,575             9,789        2,214         29.23%   x 0.77
insert_ahash_serial          9,833             9,476        -357          -3.63%   x 1.04
insert_erase_ahash_highbits  15,824            19,164       3,340         21.11%   x 0.83
insert_erase_ahash_random    16,933            20,353       3,420         20.20%   x 0.83
insert_erase_ahash_serial    20,857            27,675       6,818         32.69%   x 0.75
insert_erase_std_highbits    35,117            38,385       3,268         9.31%    x 0.91
insert_erase_std_random      35,357            37,236       1,879         5.31%    x 0.95
insert_erase_std_serial      30,617            34,136       3,519         11.49%   x 0.90
insert_std_highbits          15,675            18,180       2,505         15.98%   x 0.86
insert_std_random            16,566            17,803       1,237         7.47%    x 0.93
insert_std_serial            14,612            16,025       1,413         9.67%    x 0.91
iter_ahash_highbits          1,715             1,640        -75           -4.37%   x 1.05
iter_ahash_random            1,721             1,634        -87           -5.06%   x 1.05
iter_ahash_serial            1,723             1,636        -87           -5.05%   x 1.05
iter_std_highbits            1,715             1,634        -81           -4.72%   x 1.05
iter_std_random              1,715             1,637        -78           -4.55%   x 1.05
iter_std_serial              1,722             1,637        -85           -4.94%   x 1.05
lookup_ahash_highbits        4,565             5,809        1,244         27.25%   x 0.79
lookup_ahash_random          4,632             4,047        -585          -12.63%  x 1.14
lookup_ahash_serial          4,612             4,906        294           6.37%    x 0.94
lookup_fail_ahash_highbits   4,206             3,976        -230          -5.47%   x 1.06
lookup_fail_ahash_random     4,327             4,211        -116          -2.68%   x 1.03
lookup_fail_ahash_serial     8,999             4,386        -4,613        -51.26%  x 2.05
lookup_fail_std_highbits     13,284            13,342       58            0.44%    x 1.00
lookup_fail_std_random       13,172            13,614       442           3.36%    x 0.97
lookup_fail_std_serial       11,240            11,539       299           2.66%    x 0.97
lookup_std_highbits          13,075            13,333       258           1.97%    x 0.98
lookup_std_random            13,257            13,193       -64           -0.48%   x 1.00
lookup_std_serial            10,782            10,917       135           1.25%    x 0.99
```

The summary of this from what I can tell is that the microbenchmarks are sort of all over the place, but they're neither consistently regressing nor improving, as expected.
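For readers unfamiliar with this table layout, the derived columns appear to follow directly from the two timing columns: `diff ns/iter` is new minus baseline, `diff %` is that difference relative to the baseline, and `speedup` is baseline divided by new (so values below 1 are slowdowns). The following is a minimal sketch of that arithmetic, checked against the `insert_ahash_highbits` row; the `Row` type and its method names are hypothetical and not part of any benchmarking tool.

```rust
/// Hypothetical helper: one row of a before/after comparison, in ns/iter.
struct Row {
    baseline: u64,
    new: u64,
}

impl Row {
    /// Absolute change in ns/iter (positive means the new code is slower).
    fn diff(&self) -> i64 {
        self.new as i64 - self.baseline as i64
    }

    /// Change relative to the baseline, in percent.
    fn diff_percent(&self) -> f64 {
        self.diff() as f64 / self.baseline as f64 * 100.0
    }

    /// Baseline time divided by new time (below 1.0 means a slowdown).
    fn speedup(&self) -> f64 {
        self.baseline as f64 / self.new as f64
    }
}

fn main() {
    // Values taken from the `insert_ahash_highbits` row of the table.
    let row = Row { baseline: 7_137, new: 9_044 };
    println!(
        "{} {:.2}% x {:.2}",
        row.diff(),
        row.diff_percent(),
        row.speedup()
    );
}
```

Under these assumptions the row reproduces as a difference of 1,907 ns/iter, 26.72%, and a speedup factor of 0.79.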
In general I would be surprised if there's much of a significant performance regression attributed to this commit, and `#[inline]` can always be selectively added back in easily without adding it to every function in the crate.

[PR]: rust-lang/rust#64846
[run1]: rust-lang/rust#64846 (comment)
[run2]: rust-lang/rust#64846 (comment)
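As a rough illustration of the distinction the message draws between generic and non-generic functions, consider the sketch below. The `Mask` type and `count_matching` function are hypothetical stand-ins, not the crate's `Group` or `BitMask`. A single file cannot show cross-crate behavior, so the comments only describe what would generally happen if these items lived in a library crate built without LTO.

```rust
/// Hypothetical stand-in for a small non-generic helper type.
#[derive(Clone, Copy)]
pub struct Mask(pub u16);

impl Mask {
    // Non-generic: compiled once in the defining crate. Without `#[inline]`,
    // downstream crates generally see only a symbol to call, so this is the
    // kind of function where the annotation is worth keeping.
    #[inline]
    pub fn lowest_set_bit(self) -> Option<u32> {
        if self.0 == 0 {
            None
        } else {
            Some(self.0.trailing_zeros())
        }
    }
}

// Generic: instantiated in whichever crate uses it with a concrete `T`, so
// the optimizer already sees the body there and no annotation is required.
pub fn count_matching<T: PartialEq>(items: &[T], needle: &T) -> usize {
    items.iter().filter(|item| *item == needle).count()
}

fn main() {
    println!("{:?}", Mask(0b1000).lowest_set_bit());
    println!("{}", count_matching(&[1, 2, 1], &1));
}
```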
2 parents a9bdbe3 + 4e9e27d commit b8c34c9

12 files changed: +307 -301 lines changed

Cargo.toml (+5)
@@ -43,5 +43,10 @@ rustc-internal-api = []
 rustc-dep-of-std = ["nightly", "core", "compiler_builtins", "alloc", "rustc-internal-api"]
 raw = []

+# Enables usage of `#[inline]` on far more functions than by default in this
+# crate. This may lead to a performance increase but often comes at a compile
+# time cost.
+inline-more = []
+
 [package.metadata.docs.rs]
 features = ["nightly", "rayon", "serde", "raw"]
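The new `inline-more` feature is consumed throughout the rest of this diff via `#[cfg_attr(feature = "inline-more", inline)]`. As a minimal sketch of how that attribute behaves: when the feature is enabled it expands to `#[inline]`, and when it is disabled the attribute disappears and the function is compiled with no inlining hint. The `cautious` function below mirrors the one touched in `src/external_trait_impls/serde.rs`; the `main` function is added here purely for illustration.

```rust
use std::cmp;

// Expands to `#[inline]` only when the crate is built with the
// `inline-more` feature; otherwise no attribute is applied at all.
#[cfg_attr(feature = "inline-more", inline)]
fn cautious(hint: Option<usize>) -> usize {
    cmp::min(hint.unwrap_or(0), 4096)
}

fn main() {
    // `cfg!` evaluates the same condition at compile time, which makes it
    // easy to check which variant of the attribute is in effect.
    println!("inline-more enabled: {}", cfg!(feature = "inline-more"));
    println!("{}", cautious(Some(1_000_000)));
}
```

The function's behavior is identical either way; only the inlining hint, and therefore compile time and possibly code generation, changes. A downstream crate would opt in by listing `inline-more` among the features of its dependency on this crate.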

build.rs (+1)
@@ -1,4 +1,5 @@
 fn main() {
+    println!("cargo:rerun-if-changed=build.rs");
     let nightly = std::env::var_os("CARGO_FEATURE_NIGHTLY").is_some();
     let has_stable_alloc = || autocfg::new().probe_rustc_version(1, 36);

src/external_trait_impls/rayon/map.rs (+17 -17)
@@ -22,7 +22,7 @@ pub struct ParIter<'a, K, V, S> {
 impl<'a, K: Sync, V: Sync, S: Sync> ParallelIterator for ParIter<'a, K, V, S> {
     type Item = (&'a K, &'a V);

-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     fn drive_unindexed<C>(self, consumer: C) -> C::Result
     where
         C: UnindexedConsumer<Self::Item>,
@@ -39,7 +39,7 @@ impl<'a, K: Sync, V: Sync, S: Sync> ParallelIterator for ParIter<'a, K, V, S> {
 }

 impl<K, V, S> Clone for ParIter<'_, K, V, S> {
-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     fn clone(&self) -> Self {
         ParIter { map: self.map }
     }
@@ -65,7 +65,7 @@ pub struct ParKeys<'a, K, V, S> {
 impl<'a, K: Sync, V: Sync, S: Sync> ParallelIterator for ParKeys<'a, K, V, S> {
     type Item = &'a K;

-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     fn drive_unindexed<C>(self, consumer: C) -> C::Result
     where
         C: UnindexedConsumer<Self::Item>,
@@ -79,7 +79,7 @@ impl<'a, K: Sync, V: Sync, S: Sync> ParallelIterator for ParKeys<'a, K, V, S> {
 }

 impl<K, V, S> Clone for ParKeys<'_, K, V, S> {
-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     fn clone(&self) -> Self {
         ParKeys { map: self.map }
     }
@@ -105,7 +105,7 @@ pub struct ParValues<'a, K, V, S> {
 impl<'a, K: Sync, V: Sync, S: Sync> ParallelIterator for ParValues<'a, K, V, S> {
     type Item = &'a V;

-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     fn drive_unindexed<C>(self, consumer: C) -> C::Result
     where
         C: UnindexedConsumer<Self::Item>,
@@ -119,7 +119,7 @@ impl<'a, K: Sync, V: Sync, S: Sync> ParallelIterator for ParValues<'a, K, V, S>
 }

 impl<K, V, S> Clone for ParValues<'_, K, V, S> {
-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     fn clone(&self) -> Self {
         ParValues { map: self.map }
     }
@@ -147,7 +147,7 @@ pub struct ParIterMut<'a, K, V, S> {
 impl<'a, K: Send + Sync, V: Send, S: Send> ParallelIterator for ParIterMut<'a, K, V, S> {
     type Item = (&'a K, &'a mut V);

-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     fn drive_unindexed<C>(self, consumer: C) -> C::Result
     where
         C: UnindexedConsumer<Self::Item>,
@@ -185,7 +185,7 @@ pub struct ParValuesMut<'a, K, V, S> {
 impl<'a, K: Send, V: Send, S: Send> ParallelIterator for ParValuesMut<'a, K, V, S> {
     type Item = &'a mut V;

-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     fn drive_unindexed<C>(self, consumer: C) -> C::Result
     where
         C: UnindexedConsumer<Self::Item>,
@@ -220,7 +220,7 @@ pub struct IntoParIter<K, V, S> {
 impl<K: Send, V: Send, S: Send> ParallelIterator for IntoParIter<K, V, S> {
     type Item = (K, V);

-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     fn drive_unindexed<C>(self, consumer: C) -> C::Result
     where
         C: UnindexedConsumer<Self::Item>,
@@ -249,7 +249,7 @@ pub struct ParDrain<'a, K, V, S> {
 impl<K: Send, V: Send, S: Send> ParallelIterator for ParDrain<'_, K, V, S> {
     type Item = (K, V);

-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     fn drive_unindexed<C>(self, consumer: C) -> C::Result
     where
         C: UnindexedConsumer<Self::Item>,
@@ -268,28 +268,28 @@ impl<K: fmt::Debug + Eq + Hash, V: fmt::Debug, S: BuildHasher> fmt::Debug

 impl<K: Sync, V: Sync, S: Sync> HashMap<K, V, S> {
     /// Visits (potentially in parallel) immutably borrowed keys in an arbitrary order.
-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     pub fn par_keys(&self) -> ParKeys<'_, K, V, S> {
         ParKeys { map: self }
     }

     /// Visits (potentially in parallel) immutably borrowed values in an arbitrary order.
-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     pub fn par_values(&self) -> ParValues<'_, K, V, S> {
         ParValues { map: self }
     }
 }

 impl<K: Send, V: Send, S: Send> HashMap<K, V, S> {
     /// Visits (potentially in parallel) mutably borrowed values in an arbitrary order.
-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     pub fn par_values_mut(&mut self) -> ParValuesMut<'_, K, V, S> {
         ParValuesMut { map: self }
     }

     /// Consumes (potentially in parallel) all values in an arbitrary order,
     /// while preserving the map's allocated memory for reuse.
-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     pub fn par_drain(&mut self) -> ParDrain<'_, K, V, S> {
         ParDrain { map: self }
     }
@@ -317,7 +317,7 @@ impl<K: Send, V: Send, S: Send> IntoParallelIterator for HashMap<K, V, S> {
     type Item = (K, V);
     type Iter = IntoParIter<K, V, S>;

-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     fn into_par_iter(self) -> Self::Iter {
         IntoParIter { map: self }
     }
@@ -327,7 +327,7 @@ impl<'a, K: Sync, V: Sync, S: Sync> IntoParallelIterator for &'a HashMap<K, V, S
     type Item = (&'a K, &'a V);
     type Iter = ParIter<'a, K, V, S>;

-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     fn into_par_iter(self) -> Self::Iter {
         ParIter { map: self }
     }
@@ -337,7 +337,7 @@ impl<'a, K: Send + Sync, V: Send, S: Send> IntoParallelIterator for &'a mut Hash
     type Item = (&'a K, &'a mut V);
     type Iter = ParIterMut<'a, K, V, S>;

-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     fn into_par_iter(self) -> Self::Iter {
         ParIterMut { map: self }
     }

src/external_trait_impls/rayon/raw.rs (+11 -11)
@@ -18,7 +18,7 @@ pub struct RawParIter<T> {
 impl<T> ParallelIterator for RawParIter<T> {
     type Item = Bucket<T>;

-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     fn drive_unindexed<C>(self, consumer: C) -> C::Result
     where
         C: UnindexedConsumer<Self::Item>,
@@ -36,15 +36,15 @@ struct ParIterProducer<T> {
 impl<T> UnindexedProducer for ParIterProducer<T> {
     type Item = Bucket<T>;

-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     fn split(self) -> (Self, Option<Self>) {
         let (left, right) = self.iter.split();
         let left = ParIterProducer { iter: left };
         let right = right.map(|right| ParIterProducer { iter: right });
         (left, right)
     }

-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     fn fold_with<F>(self, folder: F) -> F
     where
         F: Folder<Self::Item>,
@@ -61,7 +61,7 @@ pub struct RawIntoParIter<T> {
 impl<T: Send> ParallelIterator for RawIntoParIter<T> {
     type Item = T;

-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     fn drive_unindexed<C>(self, consumer: C) -> C::Result
     where
         C: UnindexedConsumer<Self::Item>,
@@ -92,7 +92,7 @@ unsafe impl<T> Send for RawParDrain<'_, T> {}
 impl<T: Send> ParallelIterator for RawParDrain<'_, T> {
     type Item = T;

-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     fn drive_unindexed<C>(self, consumer: C) -> C::Result
     where
         C: UnindexedConsumer<Self::Item>,
@@ -123,7 +123,7 @@ struct ParDrainProducer<T> {
 impl<T: Send> UnindexedProducer for ParDrainProducer<T> {
     type Item = T;

-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     fn split(self) -> (Self, Option<Self>) {
         let (left, right) = self.iter.clone().split();
         mem::forget(self);
@@ -132,7 +132,7 @@ impl<T: Send> UnindexedProducer for ParDrainProducer<T> {
         (left, right)
     }

-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     fn fold_with<F>(mut self, mut folder: F) -> F
     where
         F: Folder<Self::Item>,
@@ -153,7 +153,7 @@ impl<T: Send> UnindexedProducer for ParDrainProducer<T> {
 }

 impl<T> Drop for ParDrainProducer<T> {
-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     fn drop(&mut self) {
         // Drop all remaining elements
         if mem::needs_drop::<T>() {
@@ -168,22 +168,22 @@ impl<T> Drop for ParDrainProducer<T> {

 impl<T> RawTable<T> {
     /// Returns a parallel iterator over the elements in a `RawTable`.
-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     pub fn par_iter(&self) -> RawParIter<T> {
         RawParIter {
             iter: unsafe { self.iter().iter },
         }
     }

     /// Returns a parallel iterator over the elements in a `RawTable`.
-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     pub fn into_par_iter(self) -> RawIntoParIter<T> {
         RawIntoParIter { table: self }
     }

     /// Returns a parallel iterator which consumes all elements of a `RawTable`
     /// without freeing its memory allocation.
-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     pub fn par_drain(&mut self) -> RawParDrain<'_, T> {
         RawParDrain {
             table: NonNull::from(self),

src/external_trait_impls/rayon/set.rs (+7 -7)
@@ -214,14 +214,14 @@ where
 {
     /// Visits (potentially in parallel) the values representing the difference,
     /// i.e. the values that are in `self` but not in `other`.
-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     pub fn par_difference<'a>(&'a self, other: &'a Self) -> ParDifference<'a, T, S> {
         ParDifference { a: self, b: other }
     }

     /// Visits (potentially in parallel) the values representing the symmetric
     /// difference, i.e. the values that are in `self` or in `other` but not in both.
-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     pub fn par_symmetric_difference<'a>(
         &'a self,
         other: &'a Self,
@@ -231,14 +231,14 @@ where

     /// Visits (potentially in parallel) the values representing the
     /// intersection, i.e. the values that are both in `self` and `other`.
-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     pub fn par_intersection<'a>(&'a self, other: &'a Self) -> ParIntersection<'a, T, S> {
         ParIntersection { a: self, b: other }
     }

     /// Visits (potentially in parallel) the values representing the union,
     /// i.e. all the values in `self` or `other`, without duplicates.
-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     pub fn par_union<'a>(&'a self, other: &'a Self) -> ParUnion<'a, T, S> {
         ParUnion { a: self, b: other }
     }
@@ -287,7 +287,7 @@ where
 {
     /// Consumes (potentially in parallel) all values in an arbitrary order,
     /// while preserving the set's allocated memory for reuse.
-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     pub fn par_drain(&mut self) -> ParDrain<'_, T, S> {
         ParDrain { set: self }
     }
@@ -297,7 +297,7 @@ impl<T: Send, S: Send> IntoParallelIterator for HashSet<T, S> {
     type Item = T;
     type Iter = IntoParIter<T, S>;

-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     fn into_par_iter(self) -> Self::Iter {
         IntoParIter { set: self }
     }
@@ -307,7 +307,7 @@ impl<'a, T: Sync, S: Sync> IntoParallelIterator for &'a HashSet<T, S> {
     type Item = &'a T;
     type Iter = ParIter<'a, T, S>;

-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     fn into_par_iter(self) -> Self::Iter {
         ParIter { set: self }
     }

src/external_trait_impls/serde.rs (+6 -6)
@@ -4,7 +4,7 @@ mod size_hint {
     /// This presumably exists to prevent denial of service attacks.
     ///
     /// Original discussion: https://github.com/serde-rs/serde/issues/1114.
-    #[inline]
+    #[cfg_attr(feature = "inline-more", inline)]
     pub(super) fn cautious(hint: Option<usize>) -> usize {
         cmp::min(hint.unwrap_or(0), 4096)
     }
@@ -27,7 +27,7 @@ mod map {
         V: Serialize,
         H: BuildHasher,
     {
-        #[inline]
+        #[cfg_attr(feature = "inline-more", inline)]
         fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
         where
             S: Serializer,
@@ -62,7 +62,7 @@ mod map {
                     formatter.write_str("a map")
                 }

-                #[inline]
+                #[cfg_attr(feature = "inline-more", inline)]
                 fn visit_map<A>(self, mut map: A) -> Result<Self::Value, A::Error>
                 where
                     A: MapAccess<'de>,
@@ -104,7 +104,7 @@ mod set {
         T: Serialize + Eq + Hash,
         H: BuildHasher,
     {
-        #[inline]
+        #[cfg_attr(feature = "inline-more", inline)]
         fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
         where
             S: Serializer,
@@ -137,7 +137,7 @@ mod set {
                     formatter.write_str("a sequence")
                 }

-                #[inline]
+                #[cfg_attr(feature = "inline-more", inline)]
                 fn visit_seq<A>(self, mut seq: A) -> Result<Self::Value, A::Error>
                 where
                     A: SeqAccess<'de>,
@@ -178,7 +178,7 @@ mod set {
                     formatter.write_str("a sequence")
                 }

-                #[inline]
+                #[cfg_attr(feature = "inline-more", inline)]
                 fn visit_seq<A>(self, mut seq: A) -> Result<Self::Value, A::Error>
                 where
                     A: SeqAccess<'de>,
