[WIP] Fuse Seq.* calls #1525
Conversation
Hmmm... I must admit I'm not totally sure I want to do these kinds of optimizations in F#. There are so many high-level optimizations like this that can be applied, and each one can cause surprise when it stops working (i.e. when it no longer commutes with some other code change). For example, what of …? An alternative approach may be to improve and broaden the existing sequence state machine compilation to work on a broader range of expressions. Currently that only applies to the desugared form of …
I agree that rewriting `map` and `filter` into `choose` would feel weird. I personally would only try to reduce calls to two functions into one of the already existing ones. I think this would also ensure that we reach a fixed point. Rewriting into seq expressions is also interesting. We could do that in a separate step. Does this sound reasonable?
After rereading: do you suggest we should rewrite every `Seq.map` into a seq expression and just let the seq expression rewriter do its thing?
@forki Another thing you could try is to do something similar to LINQ, in which, when you use map, you try to downcast the enumerator of the input sequence to a specialized … That seems like it would be safer.
You mean to rewrite `Seq.map f xs` into `List.map f xs` when `xs` is a list? That would change from lazy to eager evaluation. We can't do that.
No, I mean like:
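(The snippet here is paraphrased as a minimal sketch; `IFusableMap`, `MapEnumerable`, and `fusedMap` are hypothetical names illustrating the LINQ-style downcast idea, not the actual code posted.)

```fsharp
open System.Collections
open System.Collections.Generic

/// Hypothetical marker interface letting a downstream map fuse with us.
type IFusableMap<'U> =
    abstract ComposeWith<'V> : ('U -> 'V) -> seq<'V>

/// A lazy map whose concrete type downstream combinators can detect.
type MapEnumerable<'T, 'U>(source: seq<'T>, f: 'T -> 'U) =
    interface IFusableMap<'U> with
        // Fuse: reuse the original source with a composed function,
        // so the pipeline keeps a single enumerator chain.
        member _.ComposeWith<'V>(g: 'U -> 'V) : seq<'V> =
            MapEnumerable<'T, 'V>(source, f >> g) :> seq<'V>
    interface IEnumerable<'U> with
        member _.GetEnumerator() = (Seq.map f source).GetEnumerator()
    interface IEnumerable with
        member this.GetEnumerator() =
            (this :> IEnumerable<'U>).GetEnumerator() :> IEnumerator

/// A map that stays lazy but type-tests its input to fuse adjacent maps.
let fusedMap (f: 'T -> 'U) (xs: seq<'T>) : seq<'U> =
    match xs with
    | :? IFusableMap<'T> as m -> m.ComposeWith f
    | _ -> MapEnumerable<'T, 'U>(xs, f) :> seq<'U>

// xs |> fusedMap f |> fusedMap g enumerates xs once with (f >> g);
// nothing runs until the result is actually enumerated.
```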
I don't think this would change the lazy evaluation of …
And in a final rewriting step we would rewrite everything back to seq again?
I think I don't really understand that suggestion yet. Tomorrow I'll try to rewrite all `Seq.map` calls into seq expressions and see how far this gets optimized in the seq expression optimizer. For lists I will try a similar optimization to the one I proposed here. The only difference I see is that we need to make sure that the inner lambda is side-effect free (see the illustration below). I think I saw helpers which test this. They are marked as being slow, but hey ;-)
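(Why purity matters for the eager list case: fusing `List.map g >> List.map f` into `List.map (g >> f)` reorders side effects, so the rewrite is only sound for pure lambdas. A small illustration with hypothetical effectful lambdas:)

```fsharp
// Unfused code runs every g before any f; fused code interleaves them.
let g x = printfn "g %d" x; x + 1
let f x = printfn "f %d" x; x * 2

let unfused = [1; 2] |> List.map g |> List.map f   // prints: g 1, g 2, f 2, f 3
let fused   = [1; 2] |> List.map (g >> f)          // prints: g 1, f 2, g 2, f 3

// Both produce [4; 6], but the observable effect order differs,
// which is why the rewrite needs a side-effect-freeness check.
```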
A more general rewrite system, like the one described in Playing by the Rules: Rewriting as a practical optimisation technique in GHC, might be very useful. This could allow optimizations such as these (see the commit messages) to be performed automatically.
So simply rewriting `Seq.map f xs` as `seq { for x in xs do yield f x }` doesn't work. I just compiled `seq { for x in seq { for (y:string) in data do yield y.Length } do yield x * 42 }` and the result is a big state machine with two calls to `GetEnumerator`, but the `(fun y -> y.Length)` and `(fun x -> x * 42)` end up far apart and don't get optimized any further.
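(Spelled out as a compilable snippet, with the `data` binding an assumption for illustration:)

```fsharp
// Each seq { } layer becomes its own state machine with its own
// GetEnumerator call, keeping the two lambdas in separate MoveNext
// bodies where the optimizer cannot fuse them.
let data = ["one"; "two"; "three"]

let result =
    seq {
        for x in seq { for (y: string) in data do yield y.Length } do
            yield x * 42
    }
```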
The state machine creation logic I have found to be quite terrible performance-wise, so unless a complete rewrite of that code is on the cards, I think that path is not going to yield results (IMHO).
Maybe some inspiration could be gained from Nessos's LinqOptimizer (it does what you want, I think, but at runtime).
@manofstick Please link a specific issue with a repro - it's important not to leave this as hearsay, and to make sure we know what we're comparing with. While surely imperfect, the perf gains from the state machine rewriting were often around 20x in many common cases. So it's all going to depend on what you're comparing with, and the specific examples. I'd be especially interested in any cases where the introduction of state machines is a net negative over the combinator encoding of …

@forki Perhaps we first need to be clearer about the current approach to performance and optimizations in F#. Some descriptions are in the F# compiler guide, but we need more. My feeling is that rewriting combinations of library functions (of the kind in this PR) doesn't fit the kinds of optimizations we do in the compiler, nor do we want to go there at this stage, except perhaps in an experimental branch, unless we have a really comprehensive approach to the problem.

There are things the F# compiler does, and things it doesn't do. For example, if we did start doing rewrites of this kind, there is a very large number we'd like applied - there are probably 300 or more useful rewrites over FSharp.Core functions. Ideally we would do these using a general rewriting approach (no doubt inspired by Haskell - see @polytypic's link above - Coq, HOL, Isabelle and many other rewriting systems). But that's also a big topic.

There are risks to every optimization, and we've previously shipped critical bugs in new optimizations. Correctness, completeness, predictability, robustness-under-change, "obviousness", and "learnability" ("reducing what a reasonable user needs to know") are all reasons why I'm loath to start adding new rewrites of combinations-of-library-functions into the compiler, especially in cases where the user can reasonably apply the optimization manually once a hotspot is identified.

So overall my feeling is that this is in the "things we don't do" category, unless we're iterating towards a much more general solution to the topic, or to a particular sub-domain of optimization. One exception is the introduction of state machines. However, this is an algorithmically non-trivial operation that is extremely difficult to code by hand. Ideally it would be generalized to other computational structures (notably Async), and potentially to further combinations of sequence functions.
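(To make "the user can reasonably apply the optimization manually" concrete, here is a hedged sketch of hand fusion at an identified hotspot; illustrative code, echoing the map/filter-to-choose rewrite mentioned earlier in the thread:)

```fsharp
// Before: two combinator layers, hence two enumerator objects
// per traversal of the pipeline.
let slow (xs: seq<int>) =
    xs |> Seq.map (fun x -> x * 2) |> Seq.filter (fun x -> x > 10)

// After: map and filter fused by hand into a single Seq.choose.
// Sound here because both lambdas are pure.
let fast (xs: seq<int>) =
    xs |> Seq.choose (fun x ->
        let y = x * 2
        if y > 10 then Some y else None)
```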
I agree that a state machine implementation is the way to go (and much faster than a pure function fest), but the current implementation is overly cautious (in regard to Current) and sub-optimal with regard to looping constructs (it uses helper functions). I have whipped up a trivial example for you. In 64-bit the example runs ~5x faster and in 32-bit it runs ~6x faster. In my code base at work, profiling often leads to seq { }s, which I end up ripping out... Anyway, I must say that it is an area I have been thinking of attacking for a long while (i.e. state machine generation). But I still want to get my equality/comparison stuff back up off the ground, as well as get the UseVirtualTag stuff finished... Hmmmm... Anyone got some cloning technology? (Update: modifying the seq to use "while" (with a mutable index) rather than "for" brings the speed basically up to the manual implementation; see the sketch below. I do recall having some other issue with "while" under some circumstances, but I can't remember it at the moment. I'll try to remember.)
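(A hedged sketch of the two encodings from the update above; the actual gist isn't reproduced here, and `viaFor`/`viaWhile` are illustrative names:)

```fsharp
// 'for' over the source goes through the general enumerator helpers
// inside the generated state machine.
let viaFor (data: int[]) =
    seq { for x in data do yield x * 2 }

// 'while' with a mutable index reads the array directly and, per the
// comment above, gets close to a hand-written enumerator.
let viaWhile (data: int[]) =
    seq {
        let mutable i = 0
        while i < data.Length do
            yield data.[i] * 2
            i <- i + 1
    }
```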
No, I don't think it should be closed ;-)
Well, maybe it should change its name to some general fusing? Anyway, if you run this gist, which compares "fused" Seq calls via a manual seq {} block vs the modified piping technique described in #1570 (and this is just using the Seq module, not the proposed Seq.Composer module, which would include inlining and thus be even faster), you get the following results (times in milliseconds):

…
What if you apply both?
Pretty sweet, eh? (Because, as I mentioned before, the state machine implementation is less than ideal! But I got snapped at for mentioning it!) And yes, you could fuse a seq implementation based off the composer, but there's a lot less fat there than there was, and it adds considerable complexity. And I wasn't against the PR per se; I was just keen for it to change name if it is research-based. But if it's just going to sit here and rot, then I think it is better closed and worked on in a private repository. That is just my personal preference for how to manage PRs/tasks: once you get past a certain number of open ones, the whole management process collapses. Well, that's my personal experience anyway.
OK, closing for now. Let's discuss again post-RTM.
This is highly experimental.
This PR enables the compiler to rewrite:

…

into:

…

Future rules:

…
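(As a sketch of the kinds of rewrites under discussion in this thread; the names and the exact rule set are assumptions, not the PR's actual rules:)

```fsharp
// Two lazy maps collapse into one with a composed function:
//   xs |> Seq.map f |> Seq.map g        ~>  xs |> Seq.map (f >> g)
let fuseMaps f g xs = xs |> Seq.map (f >> g)

// Adjacent filters collapse into one conjoined predicate:
//   xs |> Seq.filter p |> Seq.filter q  ~>  xs |> Seq.filter (fun x -> p x && q x)
let fuseFilters p q xs = xs |> Seq.filter (fun x -> p x && q x)

// A map followed by a filter collapses into a single Seq.choose:
let fuseMapFilter f p xs =
    xs |> Seq.choose (fun x -> let y = f x in if p y then Some y else None)
```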