Flag the Repr::repr function with #[inline] #9032

alexcrichton · 2013-09-07T05:36:03Z

This allows cross-crate inlining which is very good because this is called a
lot throughout libstd (even when libstd is inlined across crates).

In one of my projects, I have a test case with the following performance characteristics:

| commit | optimization level | runtime (seconds) |
|--------|--------------------|-------------------|
| before | O2 | 22 |
| before | O3 | 107 |
| after | O2 | 13 |
| after | O3 | 12 |

I'm a bit disturbed by the 107s runtime from O3 before this commit. The performance characteristics of this test involve doing an absurd amount of small operations. A huge portion of this is creating hashmaps which involves allocating vectors.

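The worst portions of the profile are (screenshot: https://f.cloud.github.com/assets/64996/1100723/e5e8744c-177e-11e3-83fc-ddc5f18c60f9.png):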

Which as you can see looks like some serious problems with inlining. I would expect the hash map methods to be high up in the profile, but the top 9 callers of cast::transmute_copy were Repr::repr's various monomorphized instances.

I wish there were a better way to detect things like this in the future, and it's unfortunate that this is required for performance in the first place. I suppose I'm not entirely sure why this is needed because all of the methods should have been generated in-crate (monomorphized versions of library functions), so they should have gotten inlined? It also could just be that by modifying LLVM's idea of the inline cost of this function it was able to inline it in many more locations.
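For readers unfamiliar with the pattern, here is a rough sketch of what flagging a small trait method with #[inline] looks like. It is written in present-day Rust syntax, not the 2013 dialect this PR targets, and it is not the actual libstd definition: the trait body, the `u32` impl, and the `example_bytes` helper are assumptions made for illustration, with `std::mem::transmute_copy` standing in for the `cast::transmute_copy` mentioned above.

```rust
use std::mem;

/// Hypothetical stand-in for the libstd `Repr` trait; not the actual 2013 definition.
pub trait Repr<T>: Sized {
    /// `#[inline]` is a hint to the optimizer, and it also makes this body
    /// available to downstream crates so calls there can be inlined.
    #[inline]
    fn repr(&self) -> T {
        // Assumption: implementors pair `Self` only with a `T` of the same
        // size for which every bit pattern of `Self` is valid.
        unsafe { mem::transmute_copy(self) }
    }
}

// Illustrative impl: view a `u32` as its four native-endian bytes.
impl Repr<[u8; 4]> for u32 {}

/// Small usage helper (hypothetical) that calls the inlinable method.
pub fn example_bytes() -> [u8; 4] {
    0x0102_0304u32.repr()
}
```

The attribute is only a hint, so whether a given call site actually gets inlined is still up to the optimizer.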


thestinger · 2013-09-07T16:28:58Z

It's not an issue with LLVM. If you mark a function as #[inline] we emit the AST into the metadata and compile it again, so anything it calls in libstd is a cross-crate call.


alexcrichton · 2013-09-07T16:57:11Z

It seems weird though because wherever Repr::repr is compiled to, it should inline cast::transmute_copy into itself, right?

thestinger · 2013-09-07T16:58:30Z

There are no guarantees about inlining those, because there are undefined bitcasts from type_use.

thestinger · 2013-09-07T16:59:58Z

Can you replicate it with -Z no-monomorphic-collapse?

alexcrichton · 2013-09-07T18:24:09Z

I cannot replicate it with no-monomorphic-collapse, interesting. The runtime goes from 23s => 11s with that flag.

thestinger · 2013-09-07T18:24:46Z

Yep, so it's just from undefined behaviour in our IR.

alexcrichton · 2013-09-07T18:25:35Z

There was talk of removing that part of the compiler, right?

thestinger · 2013-09-07T18:29:20Z

Yes, but it hasn't happened. The mergefunc pass hits an assert on the code produced by rustc when trying a build with it enabled.


Flag the Repr::repr function with #[inline]

739df23


bors closed this Sep 7, 2013
