Speed up lambda capture handling #97087

Merged: 1 commit, Sep 17, 2024

aaronp64
Contributor

@aaronp64 aaronp64 commented Sep 16, 2024

Updated GDScriptLambdaCallable::call and GDScriptLambdaSelfCallable::call to use alloca instead of Vector when the lambda has captures. This avoids an extra allocation and copy_on_write calls on every lambda function call.
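
To illustrate the idea, here is a minimal standalone sketch, not the engine code. It assumes the call has to hand the function body one pointer array containing the captures followed by the call arguments. `Value`, `Fn`, `call_with_vector`, and `call_with_alloca` are hypothetical names, `std::vector` stands in for the engine's Vector, and the capture-first ordering is an assumption.

```cpp
#include <cassert>
#include <vector>
#if defined(_WIN32)
#include <malloc.h> // alloca on Windows toolchains
#else
#include <alloca.h> // alloca on glibc and macOS; other platforms may differ
#endif

// Hypothetical stand-ins: Value plays the role of the engine's Variant,
// Fn the role of the compiled lambda body.
using Value = long;
using Fn = Value (*)(const Value **p_args, int p_argcount);

// Sketch of the old approach: the combined list lives in a heap-backed
// container, so every call pays for an allocation and a free.
Value call_with_vector(Fn fn, const std::vector<Value> &captures,
        const Value **p_arguments, int p_argcount) {
    assert(p_argcount >= 0);
    const int captures_amount = static_cast<int>(captures.size());
    std::vector<const Value *> args(captures_amount + p_argcount);
    for (int i = 0; i < captures_amount; i++) {
        args[i] = &captures[i]; // captures first (assumed ordering)
    }
    for (int i = 0; i < p_argcount; i++) {
        args[captures_amount + i] = p_arguments[i];
    }
    return fn(args.data(), captures_amount + p_argcount);
}

// Sketch of the new approach: the same list is carved out of the current
// stack frame. alloca memory is released automatically when this function
// returns, so it must not outlive the call.
Value call_with_alloca(Fn fn, const std::vector<Value> &captures,
        const Value **p_arguments, int p_argcount) {
    assert(p_argcount >= 0);
    const int captures_amount = static_cast<int>(captures.size());
    const int total = captures_amount + p_argcount;
    const Value **args =
            static_cast<const Value **>(alloca(sizeof(const Value *) * total));
    for (int i = 0; i < captures_amount; i++) {
        args[i] = &captures[i];
    }
    for (int i = 0; i < p_argcount; i++) {
        args[captures_amount + i] = p_arguments[i];
    }
    return fn(args, total);
}
```

Both functions pass the same pointers in the same order; the only difference is where the temporary array lives.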

This makes functions like Array.map and Array.filter around 60% faster when using simple lambda functions with captures (about 47% faster for the lambda that also captures self), as measured with the GDScript below:

func _ready() -> void:
	time("test_no_capture")
	time("test_map_multiply")
	time("test_filter")
	time("test_self_lambda")

func test_no_capture():
	var a := range(1000)
	for i in 1000:
		a.map(func(n): return n * 2)

func test_map_multiply():
	var a := range(1000)
	var multiplier := 5
	for i in 1000:
		a.map(func(n): return n * multiplier)

func test_filter():
	var a := range(1000)
	var min := 500
	for i in 1000:
		a.filter(func(n): return n >= min)

var max := 750
func test_self_lambda():
	var a := range(1000)
	var min := 500
	for i in 1000:
		a.filter(func(n): return n >= min and n <= max)

func time(test_name : String):
	var start := Time.get_ticks_msec()
	call(test_name)
	var end := Time.get_ticks_msec()
	print("%s: %dms" % [test_name, end - start])

Old:

test_no_capture: 111ms
test_map_multiply: 212ms
test_filter: 215ms
test_self_lambda: 265ms

New:

test_no_capture: 111ms
test_map_multiply: 129ms
test_filter: 131ms
test_self_lambda: 180ms

Using alloca does require additional stack space, which slightly reduces how many recursive calls can be made when p_argcount + captures_amount is large. The first example below (5 captures + 22 arguments) crashes for me at 567 recursive calls with the new code, and at 574 with the old code. The second example (1 capture + 4 arguments) crashes at 606 recursive calls in both versions. I think the difference is small enough to be OK, but I can look into avoiding alloca for larger sizes if needed.

func test_recursive_call_large():
	var c1 := 1
	var c2 := 1
	var c3 := 1
	var c4 := 1
	var c5 := 1
	var recursive_func := func(x, f, p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11, p12, p13, p14, p15, p16, p17, p18, p19, p20):
		if x > 0:
			print(c1 + c2 + c3 + c4 + c5)
			f.call(x-1, f, p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11, p12, p13, p14, p15, p16, p17, p18, p19, p20)
	
	# 566 max new, 573 old
	recursive_func.call(573, recursive_func, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0)
	
func test_recursive_call_small():
	var c1 := 1
	var recursive_func := func(x, f, p1, p2):
		if x > 0:
			print(c1)
			f.call(x-1, f, p1, p2)
	
	recursive_func.call(605, recursive_func, 1, 2) # 605 max both
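
For a sense of scale on the stack-space concern: if the stack array holds one pointer per capture and argument, the large example needs 5 + 22 = 27 pointers, which is 216 bytes per call level on a 64-bit build. One way the "avoid alloca for larger sizes" idea could look is sketched below. This is an assumption-laden illustration, not what the PR does: `Value`, `Fn`, `call_hybrid`, and the `MAX_STACK_ARGS` threshold are all hypothetical.

```cpp
#include <cassert>
#include <vector>
#if defined(_WIN32)
#include <malloc.h> // alloca on Windows toolchains
#else
#include <alloca.h> // alloca on glibc and macOS; other platforms may differ
#endif

// Hypothetical stand-ins for the engine's Variant and the lambda body.
using Value = long;
using Fn = Value (*)(const Value **p_args, int p_argcount);

// Hypothetical cutoff: at or below this many pointers, use the stack.
constexpr int MAX_STACK_ARGS = 8;

Value call_hybrid(Fn fn, const std::vector<Value> &captures,
        const Value **p_arguments, int p_argcount) {
    assert(p_argcount >= 0);
    const int captures_amount = static_cast<int>(captures.size());
    const int total = captures_amount + p_argcount;

    std::vector<const Value *> heap_args; // stays empty on the fast path
    const Value **args = nullptr;
    if (total <= MAX_STACK_ARGS) {
        // alloca must be called in this frame, not in a helper, because
        // the memory is released when the calling function returns.
        args = static_cast<const Value **>(alloca(sizeof(const Value *) * total));
    } else {
        heap_args.resize(total); // large lists fall back to the heap
        args = heap_args.data();
    }

    for (int i = 0; i < captures_amount; i++) {
        args[i] = &captures[i];
    }
    for (int i = 0; i < p_argcount; i++) {
        args[captures_amount + i] = p_arguments[i];
    }
    return fn(args, total);
}
```

With a cutoff like this, small lambdas keep the fast path while calls with many arguments use about the same stack per level as before, at the cost of an extra branch.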

@aaronp64 aaronp64 force-pushed the lambda_capture_alloc branch 2 times, most recently from 3713a70 to cbc9108 Compare September 16, 2024 19:04
Member

@dalexeev dalexeev left a comment

Looks good to me.

@aaronp64 aaronp64 force-pushed the lambda_capture_alloc branch from cbc9108 to e2b6d92 Compare September 16, 2024 19:49
@akien-mga akien-mga merged commit cf53991 into godotengine:master Sep 17, 2024
20 checks passed
@akien-mga
Member

Thanks!

@aaronp64 aaronp64 deleted the lambda_capture_alloc branch September 23, 2024 13:06
3 participants