Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cranelift: Port bitselect over to ISLE on x64 #3653

Merged
merged 4 commits into from
Jan 6, 2022

Conversation

fitzgen
Copy link
Member

@fitzgen fitzgen commented Jan 6, 2022

This is causing a test failure, as we are generating some unnecessary movdqas now. Haven't had time to look into why this is happening, but I figured I would put this WIP PR up for posterity, along with info on the failing test and unnecessary moves.

CLIF:

function %bitselect_i16x8() -> i16x8 {
block0:
    v0 = vconst.i16x8 [0 0 0 0 0 0 0 0]
    v1 = vconst.i16x8 [0 0 0 0 0 0 0 0]
    v2 = vconst.i16x8 [0 0 0 0 0 0 0 0]
    v3 = bitselect v0, v1, v2
    return v3
}

Before:

VCode_ShowWithRRU {{
  Entry block: 0
Block 0:
  (original IR block: block0)
  (instruction range: 0 .. 11)
  Inst 0:   pushq   %rbp
  Inst 1:   movq    %rsp, %rbp
  Inst 2:   load_const VCodeConstant(0), %xmm0
  Inst 3:   load_const VCodeConstant(0), %xmm1
  Inst 4:   load_const VCodeConstant(0), %xmm2
  Inst 5:   pand    %xmm0, %xmm1
  Inst 6:   pandn   %xmm2, %xmm0
  Inst 7:   por     %xmm1, %xmm0
  Inst 8:   movq    %rbp, %rsp
  Inst 9:   popq    %rbp
  Inst 10:   ret
}}

After:

VCode_ShowWithRRU {{
  Entry block: 0
Block 0:
  (original IR block: block0)
  (instruction range: 0 .. 13)
  Inst 0:   pushq   %rbp
  Inst 1:   movq    %rsp, %rbp
  Inst 2:   load_const VCodeConstant(0), %xmm0
  Inst 3:   load_const VCodeConstant(0), %xmm3
  Inst 4:   load_const VCodeConstant(0), %xmm1
  Inst 5:   movdqa  %xmm0, %xmm2
  Inst 6:   pand    %xmm3, %xmm2
  Inst 7:   pandn   %xmm1, %xmm0
  Inst 8:   por     %xmm0, %xmm2
  Inst 9:   movdqa  %xmm2, %xmm0
  Inst 10:   movq    %rbp, %rsp
  Inst 11:   popq    %rbp
  Inst 12:   ret
}}

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:x64 Issues related to x64 codegen isle Related to the ISLE domain-specific language labels Jan 6, 2022
@github-actions
Copy link

github-actions bot commented Jan 6, 2022

Subscribe to Label Action

cc @cfallin, @fitzgen

This issue or pull request has been labeled: "cranelift", "cranelift:area:x64", "isle"

Thus the following users have been cc'd because of the following labels:

  • cfallin: isle
  • fitzgen: isle

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

@fitzgen
Copy link
Member Author

fitzgen commented Jan 6, 2022

we are generating some unnecessary movdqas now

It looks like the initial movdqa is being inserted by move mitosis, but that the regalloc isn't coalescing and eliding the move:

 TRACE cranelift_codegen::isa::x64::inst       > mov_mitosis(pand    %v1V, %v5V)
 TRACE cranelift_codegen::isa::x64::inst       >   -> movdqa  %v0V, %v5V
 TRACE cranelift_codegen::isa::x64::inst       >   -> pand    %v1V, %v5V

@cfallin any idea why regalloc might not be able to coalesce/elide this move? (See OP for full code)

@fitzgen
Copy link
Member Author

fitzgen commented Jan 6, 2022

It looks like regalloc correctly determines that these virtual registers (v0V and v5V) are in the same e-class / are connected by moves (and therefore could be coalesced?) but isn't actually coalescing them:

 TRACE cranelift_codegen::machinst::compile    > vcode from lowering: 
VCode_ShowWithRRU {{
  Entry block: 0
Block 0:
  (original IR block: block0)
  (instruction range: 0 .. 12)
  Inst 0:   load_const VCodeConstant(0), %v0V
  Inst 1:   load_const VCodeConstant(0), %v1V
  Inst 2:   load_const VCodeConstant(0), %v2V
  Inst 3:   movdqa  %v0V, %v5V
  Inst 4:   pand    %v1V, %v5V
  Inst 5:   movdqa  %v0V, %v6V
  Inst 6:   pandn   %v2V, %v6V
  Inst 7:   movdqa  %v5V, %v3V
  Inst 8:   por     %v6V, %v3V
  Inst 9:   movdqa  %v3V, %v4V
  Inst 10:   movdqa  %v4V, %xmm0
  Inst 11:   ret
}}

...

 TRACE regalloc::bt_coalescing_analysis        > 
 TRACE regalloc::bt_coalescing_analysis        > do_coalescing_analysis: begin
 TRACE regalloc::bt_coalescing_analysis        > connected by moves: i3 v5V <- v0V (est_freq 1)
 TRACE regalloc::bt_coalescing_analysis        > connected by moves: i5 v6V <- v0V (est_freq 1)
 TRACE regalloc::bt_coalescing_analysis        > reduce cost of vr0 and vr6
 TRACE regalloc::bt_coalescing_analysis        > connected by moves: i7 v3V <- v5V (est_freq 1)
 TRACE regalloc::bt_coalescing_analysis        > reduce cost of vr5 and vr3
 TRACE regalloc::bt_coalescing_analysis        > connected by moves: i9 v4V <- v3V (est_freq 1)
 TRACE regalloc::bt_coalescing_analysis        > reduce cost of vr3 and vr4
 TRACE regalloc::bt_coalescing_analysis        > connected by moves: i10 r0V <- v4V (est_freq 1)
 TRACE regalloc::bt_coalescing_analysis        > Revised VLRs:
 TRACE regalloc::bt_coalescing_analysis        > vr0   (VR: v0V, sz=6, tc=2, sc=0.333, [(RF: i0.d-i5.u)])
 TRACE regalloc::bt_coalescing_analysis        > vr1   (VR: v1V, sz=4, tc=2, sc=0.500, [(RF: i1.d-i4.u)])
 TRACE regalloc::bt_coalescing_analysis        > vr2   (VR: v2V, sz=5, tc=2, sc=0.400, [(RF: i2.d-i6.u)])
 TRACE regalloc::bt_coalescing_analysis        > vr3   (VR: v3V, sz=3, tc=2, sc=0.667, [(RF: i7.d-i9.u)])
 TRACE regalloc::bt_coalescing_analysis        > vr4   (VR: v4V, sz=2, tc=1, sc=0.500, [(RF: i9.d-i10.u)])
 TRACE regalloc::bt_coalescing_analysis        > vr5   (VR: v5V, sz=5, tc=3, sc=0.600, [(RF: i3.d-i7.u)])
 TRACE regalloc::bt_coalescing_analysis        > vr6   (VR: v6V, sz=4, tc=3, sc=0.750, [(RF: i5.d-i8.u)])
 TRACE regalloc::bt_coalescing_analysis        > Coalescing hints:
 TRACE regalloc::bt_coalescing_analysis        >   hintsfor vr0 = (SameAs vr6, weight=1) 
 TRACE regalloc::bt_coalescing_analysis        >   hintsfor vr1 = 
 TRACE regalloc::bt_coalescing_analysis        >   hintsfor vr2 = 
 TRACE regalloc::bt_coalescing_analysis        >   hintsfor vr3 = (SameAs vr5, weight=1) (SameAs vr4, weight=1) 
 TRACE regalloc::bt_coalescing_analysis        >   hintsfor vr4 = (SameAs vr3, weight=1) (Exactly %xmm0, weight=1) 
 TRACE regalloc::bt_coalescing_analysis        >   hintsfor vr5 = (SameAs vr3, weight=1) 
 TRACE regalloc::bt_coalescing_analysis        >   hintsfor vr6 = (SameAs vr0, weight=1) 
 TRACE regalloc::bt_coalescing_analysis        >   eclassof vr0 = [vr6, vr0]
 TRACE regalloc::bt_coalescing_analysis        >   eclassof vr1 = [vr1]
 TRACE regalloc::bt_coalescing_analysis        >   eclassof vr2 = [vr2]
 TRACE regalloc::bt_coalescing_analysis        >   eclassof vr3 = [vr4, vr5, vr3]
 TRACE regalloc::bt_coalescing_analysis        >   eclassof vr4 = [vr4, vr5, vr3]
 TRACE regalloc::bt_coalescing_analysis        >   eclassof vr5 = [vr4, vr5, vr3]
 TRACE regalloc::bt_coalescing_analysis        >   eclassof vr6 = [vr6, vr0]
 TRACE regalloc::bt_coalescing_analysis        >   vv_boundary_move at i5
 TRACE regalloc::bt_coalescing_analysis        >   vv_boundary_move at i7
 TRACE regalloc::bt_coalescing_analysis        >   vv_boundary_move at i9
 TRACE regalloc::bt_coalescing_analysis        > do_coalescing_analysis: end
 TRACE regalloc::bt_coalescing_analysis        > 

@cfallin
Copy link
Member

cfallin commented Jan 6, 2022

Hmm, interesting. My understanding is that regalloc.rs computes equivalence classes to use in a strictly heuristic way during allocation (of virtual-reg ranges to registers, and then separately to spillslots). The former (mainly in this file) traverses equivalence classes looking for register hints (move to/from fixed reg), and to avoid some evictions. But two vregs existing in the same eclass does not guarantee that the move between them will be elided; in fact, one can construct inputs where this is not possible, as liveranges in an eclass can overlap. Elements in an eclass are joined by an "equivalence" that's kind of misnamed, I guess: it just means that at one point, one of the vregs was assigned to the other. Consider e.g. a loop where two vregs are swapped every iteration (loop(x, y): br loop(y, x)). So in this case, for whatever reason, the spill weights and processing order mean that the two sides of the move get allocated to different registers, and then that's it; the heuristics weren't quite good enough.

Zooming out a bit, I would expect that once in a while, we get perturbations like this as ISLE's SSA lowering works just a bit differently. As long as we don't see regressions overall, I think this is fine to accept.

@fitzgen
Copy link
Member Author

fitzgen commented Jan 6, 2022

Swapped the operands to the two commutative instructions (which I noticed was the only difference between the pre-regalloc vcode on main vs this branch) and the moves went away.

Yay 😬

@fitzgen
Copy link
Member Author

fitzgen commented Jan 6, 2022

Zooming out a bit, I would expect that once in a while, we get perturbations like this as ISLE's SSA lowering works just a bit differently. As long as we don't see regressions overall, I think this is fine to accept.

But yeah, agreed on this point 👍

@fitzgen fitzgen marked this pull request as ready for review January 6, 2022 18:48
@fitzgen fitzgen requested a review from cfallin January 6, 2022 18:48
@github-actions github-actions bot added the wasmtime:api Related to the API of the `wasmtime` crate itself label Jan 6, 2022
Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, thanks!

I very much like the compile-test style (check pxor, not xorps, not xorpd) -- this is the right way to use compile-tests going forward, I think, verifying actual properties of the compilation 👍

@github-actions
Copy link

github-actions bot commented Jan 6, 2022

Subscribe to Label Action

cc @peterhuene

This issue or pull request has been labeled: "wasmtime:api"

Thus the following users have been cc'd because of the following labels:

  • peterhuene: wasmtime:api

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

@fitzgen fitzgen merged commit 7fd78da into bytecodealliance:main Jan 6, 2022
@fitzgen fitzgen deleted the bitselect-isle branch January 6, 2022 21:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:area:x64 Issues related to x64 codegen cranelift Issues related to the Cranelift code generator isle Related to the ISLE domain-specific language wasmtime:api Related to the API of the `wasmtime` crate itself
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants