Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use new String.Join overload added in .Net 4.0 (~3x faster) #2570

Merged
merged 6 commits into from
Mar 10, 2017

Conversation

saul
Copy link
Contributor

@saul saul commented Mar 9, 2017

Fixes #2569

Before this PR: 880 msecs
With this PR: 300 msecs

Benchmark script:

let sw = System.Diagnostics.Stopwatch()
sw.Start()

Seq.init 1_000_000 string
|> String.concat ""
|> ignore

sw.Stop()
printfn "%d milliseconds" sw.ElapsedMilliseconds

Copy link
Member

@KevinRansom KevinRansom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, eliminating that copy is a good thing.

@KevinRansom KevinRansom merged commit cbe88a8 into dotnet:master Mar 10, 2017
@manofstick
Copy link
Contributor

manofstick commented Mar 16, 2017

Really probably inconsequential, but not sure if this was a good change or not. Under the hood, because it's a seq, it can't calculate the total size requirements, so it has to use a StringBuilder which potentially will do more allocations that the up front toArray. Anyway, if you change the test case to:

for i = 1 to 5 do
    let sw = System.Diagnostics.Stopwatch()
    sw.Start()

    let strings =
        Seq.init 1000000 (fun _ -> "This is a slightly longer string")
#if TURN_INTO_ARRAY
        |> Seq.toArray
#endif

    let b = System.String.Join ("", strings)

    sw.Stop()
    printfn "%d milliseconds" sw.ElapsedMilliseconds

You find that the upfront toArray version is faster (and probably fragments memory less as only the single array and single string buffer are allocated)

@saul
Copy link
Contributor Author

saul commented Mar 16, 2017

@manofstick never thought of that - it's probably worth graphing size vs speed so then we can figure out a tradeoff and optimise for what we believe is the most common case.

@manofstick
Copy link
Contributor

@saul

I mean toArray is going to suffer the same things as StringBuilder (i.e. need to keep growing an expanding buffer) but only of string references - so I'm guessing if the string is >> 8 bytes (for 64 bit) then that will probably be the point at which you start to see action...

Oh, and I would avoid Seq.init, it does strange things from a timing perspective!

@manofstick
Copy link
Contributor

@saul

Oh, and I would probably avoid using the string function as well, as it tries to do casting to IFormattable (from memory) and then does stuff. Just gets in the way of what you're trying to time... Something like:

for i = 1 to 5 do
    let sw = System.Diagnostics.Stopwatch()
    sw.Start()

    let strings =
        seq {
            let mutable x = 0
            while x < 1000000 do
                yield x.ToString ()
                x <- x + 1
        }

#if TURN_INTO_ARRAY
        |> Seq.toArray
#endif

    let b = System.String.Join ("", strings)

    sw.Stop()
    printfn "%d milliseconds" sw.ElapsedMilliseconds

Is probably best

@saul
Copy link
Contributor Author

saul commented Mar 16, 2017

@manofstick with your latest code, here are the benchmarks I get:

# Without Seq.toArray
PS > .\fsiAnyCpu.exe C:\code\scratch\perf.fsx --optimize+
270 milliseconds
222 milliseconds
247 milliseconds
250 milliseconds
244 milliseconds

# With Seq.toArray
PS > .\fsiAnyCpu.exe C:\code\scratch\perf.fsx --optimize+
328 milliseconds
352 milliseconds
418 milliseconds
294 milliseconds

@manofstick
Copy link
Contributor

@saul

OK, now play around with some different strings :-)

Have fun!

@manofstick
Copy link
Contributor

More effort than I really think this is probably worth (I think the change as stands is good and I don't think you should change it, my discussion is more just on the joys of optimizing things...) anyway, I played a little more and present this:

[<Literal>]
let innerCount = 10

[<Literal>]
let outerCount = 100000

[<Literal>]
let seperator = ", "

for atomSize in 0 .. 5 .. 100 do
    let sw = System.Diagnostics.Stopwatch.StartNew ()

    let atom = String.init atomSize (fun _ -> "X")

    for i = 1 to outerCount do 
        let strings =
            seq {
                let mutable x = 0
                while x < innerCount do
                    yield atom
                    x <- x + 1 }
    #if TURN_INTO_ARRAY
            |> Seq.toArray
    #endif

        let composite = System.String.Join (seperator, strings)
        if composite.Length <> ((atom.Length * innerCount) + (seperator.Length * (innerCount-1))) then
            failwith "unexpected output"

        ()

    printfn "%d: %d milliseconds" atomSize sw.ElapsedMilliseconds

Which seems to indicate that the benefit of going to an array kicks in with strings than are greater than 20 to 30 characters.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants