Speed up lambda capture handling #97087

aaronp64 · 2024-09-16T18:23:53Z

Updated GDScriptLambdaCallable::call and GDScriptLambdaSelfCallable::call to use alloca instead of Vector when using captures, to avoid extra allocation/copy_on_write calls on each lambda function call.
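As a rough illustration of the idea (not the actual engine code), the sketch below contrasts building the merged "captures + arguments" pointer array in a heap-backed container with building it in stack memory via alloca. `Value`, `invoke`, `call_with_vector` and `call_with_alloca` are hypothetical stand-ins, `std::vector` stands in for the engine's Vector, and placing captures before the call arguments is an assumption made for the example.

```cpp
#include <cstddef>
#include <vector>
#if defined(_MSC_VER)
#include <malloc.h>
#define SKETCH_ALLOCA(n) _alloca(n)
#else
#include <alloca.h>
#define SKETCH_ALLOCA(n) alloca(n)
#endif

// Hypothetical stand-in for the engine's Variant type.
struct Value {
	int v;
};

// Hypothetical stand-in for the compiled lambda body: sums every argument.
static int invoke(const Value **p_args, int p_count) {
	int sum = 0;
	for (int i = 0; i < p_count; i++) {
		sum += p_args[i]->v;
	}
	return sum;
}

// Heap-based variant: a container is allocated and filled on every call.
static int call_with_vector(const std::vector<Value> &captures, const Value **p_args, int p_argcount) {
	std::vector<const Value *> merged;
	merged.reserve(captures.size() + p_argcount);
	for (const Value &c : captures) {
		merged.push_back(&c);
	}
	for (int i = 0; i < p_argcount; i++) {
		merged.push_back(p_args[i]);
	}
	return invoke(merged.data(), (int)merged.size());
}

// Stack-based variant: the pointer array lives in this call's stack frame
// and is released automatically when the function returns.
static int call_with_alloca(const std::vector<Value> &captures, const Value **p_args, int p_argcount) {
	const int captures_amount = (int)captures.size();
	const int total = captures_amount + p_argcount;
	const Value **merged = (const Value **)SKETCH_ALLOCA(sizeof(const Value *) * total);
	for (int i = 0; i < captures_amount; i++) {
		merged[i] = &captures[i]; // Captures first (assumed ordering).
	}
	for (int i = 0; i < p_argcount; i++) {
		merged[captures_amount + i] = p_args[i];
	}
	return invoke(merged, total);
}
```

Both variants pass the same merged array to `invoke`; the only difference is where that array is stored, which is where the per-call savings would come from.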

This makes functions like Array.map and Array.filter around 60% faster when using simple lambda functions with captures, as measured with the GDScript benchmark below:

```gdscript
func _ready() -> void:
	time("test_no_capture")
	time("test_map_multiply")
	time("test_filter")
	time("test_self_lambda")

func test_no_capture():
	var a := range(1000)
	for i in 1000:
		a.map(func(n): return n * 2)

func test_map_multiply():
	var a := range(1000)
	var multiplier := 5
	for i in 1000:
		a.map(func(n): return n * multiplier)

func test_filter():
	var a := range(1000)
	var min := 500
	for i in 1000:
		a.filter(func(n): return n >= min)

var max := 750
func test_self_lambda():
	var a := range(1000)
	var min := 500
	for i in 1000:
		a.filter(func(n): return n >= min && n <= max)

func time(test_name : String):
	var start := Time.get_ticks_msec()
	call(test_name)
	var end := Time.get_ticks_msec()
	print("%s: %dms" % [test_name, end - start])
```

Old:

```
test_no_capture: 111ms
test_map_multiply: 212ms
test_filter: 215ms
test_self_lambda: 265ms
```

New:

```
test_no_capture: 111ms
test_map_multiply: 129ms
test_filter: 131ms
test_self_lambda: 180ms
```

Using alloca does require additional stack space, which has a small impact on how many recursive calls can be made when p_argcount + captures_amount is large. The first example below (5 captures + 22 arguments) crashes for me at 567 recursive calls with the new code, and at 574 with the old code. The second example (1 capture + 4 arguments) crashes at 606 recursive calls in both versions. I think the difference is small enough to be OK, but I can look into avoiding alloca for larger sizes if needed.

```gdscript
func test_recursive_call_large():
	var c1 := 1
	var c2 := 1
	var c3 := 1
	var c4 := 1
	var c5 := 1
	var recursive_func := func(x, f, p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11, p12, p13, p14, p15, p16, p17, p18, p19, p20):
		if x > 0:
			print(c1 + c2 + c3 + c4 + c5)
			f.call(x-1, f, p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11, p12, p13, p14, p15, p16, p17, p18, p19, p20)

	# 566 max new, 573 old
	recursive_func.call(573, recursive_func, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0)

func test_recursive_call_small():
	var c1 := 1
	var recursive_func := func(x, f, p1, p2):
		if x > 0:
			print(c1)
			f.call(x-1, f, p1, p2)

	recursive_func.call(605, recursive_func, 1, 2) # 605 max both
```
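For reference, "avoiding alloca for larger sizes" could look something like the sketch below. This is not part of the PR and only shows one possible shape: `Arg`, `sum_args`, `call_hybrid`, the `r_used_heap` out-parameter and the `MAX_STACK_ARGS` threshold of 32 are all hypothetical, and `std::vector` stands in for whatever heap container the engine would use.

```cpp
#include <cstddef>
#include <vector>
#if defined(_MSC_VER)
#include <malloc.h>
#define HYBRID_ALLOCA(n) _alloca(n)
#else
#include <alloca.h>
#define HYBRID_ALLOCA(n) alloca(n)
#endif

// Hypothetical stand-in for the engine's Variant type.
struct Arg {
	int v;
};

// Hypothetical cutoff: at most this many pointers are placed on the stack.
static const int MAX_STACK_ARGS = 32;

// Hypothetical stand-in for the compiled lambda body: sums every argument.
static int sum_args(const Arg **p_args, int p_count) {
	int sum = 0;
	for (int i = 0; i < p_count; i++) {
		sum += p_args[i]->v;
	}
	return sum;
}

// Uses alloca for small argument lists and falls back to the heap for large
// ones, so the stack cost per call stays bounded.
static int call_hybrid(const std::vector<Arg> &captures, const Arg **p_args, int p_argcount, bool *r_used_heap = nullptr) {
	const int captures_amount = (int)captures.size();
	const int total = captures_amount + p_argcount;
	const bool use_heap = total > MAX_STACK_ARGS;

	std::vector<const Arg *> heap_storage; // Left empty on the stack path.
	const Arg **merged;
	if (use_heap) {
		heap_storage.resize(total);
		merged = heap_storage.data();
	} else {
		// alloca memory is released when the function returns, not at the
		// end of this block, so using it after the if/else is safe.
		merged = (const Arg **)HYBRID_ALLOCA(sizeof(const Arg *) * total);
	}

	for (int i = 0; i < captures_amount; i++) {
		merged[i] = &captures[i];
	}
	for (int i = 0; i < p_argcount; i++) {
		merged[captures_amount + i] = p_args[i];
	}
	if (r_used_heap) {
		*r_used_heap = use_heap;
	}
	return sum_args(merged, total);
}
```

With a cutoff like this, the common case of a few captures and arguments would keep the faster stack path, while unusually large calls would pay for a heap allocation instead of consuming extra stack per frame.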

modules/gdscript/gdscript_lambda_callable.cpp

dalexeev

Looks good to me.

akien-mga · 2024-09-17T07:23:06Z

Thanks!

aaronp64 requested a review from a team as a code owner September 16, 2024 18:23

dalexeev added enhancement topic:gdscript performance labels Sep 16, 2024

dalexeev added this to the 4.4 milestone Sep 16, 2024

dalexeev reviewed Sep 16, 2024

aaronp64 force-pushed the lambda_capture_alloc branch 2 times, most recently from 3713a70 to cbc9108 Compare September 16, 2024 19:04

dalexeev approved these changes Sep 16, 2024

Speed up lambda capture handling

e2b6d92

aaronp64 force-pushed the lambda_capture_alloc branch from cbc9108 to e2b6d92 Compare September 16, 2024 19:49

akien-mga merged commit cf53991 into godotengine:master Sep 17, 2024
20 checks passed

aaronp64 deleted the lambda_capture_alloc branch September 23, 2024 13:06