Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dereferencing a funcref from the function it calls leads to a heap-use-after-free #75658

Closed
tcoxon opened this issue Apr 4, 2023 · 9 comments · Fixed by #96856
Closed

Dereferencing a funcref from the function it calls leads to a heap-use-after-free #75658

tcoxon opened this issue Apr 4, 2023 · 9 comments · Fixed by #96856
Assignees
Milestone

Comments

@tcoxon
Copy link
Contributor

tcoxon commented Apr 4, 2023

Godot version

3.5.1.stable

System information

Arch Linux (rolling release)

Issue description

Dereferencing a funcref from the function it calls can cause an error in AddressSanitizer in debug/release-debug builds. It might be possible for this to cause more subtle heap corruption issues in release builds--I suspect it is behind a rare/unreproducible crash in my project.

I think what happens in the below example is that when foobar dereferences the funcref, the funcref is freed. Then when control flow returns from foobar to FuncRef.call_func the this pointer is invalid. It seems weird that function calls do not implicitly hold a reference to the object they're on.

Note that this isn't unique to FuncRefs - you can implement a FuncRef in pure GDScript and it will have the same bug.

==16930==ERROR: AddressSanitizer: heap-use-after-free on address 0x612000254d70 at pc 0x0000066b3b05 bp 0x7fff849419c0 sp 0x7fff849419b0
WRITE of size 4 at 0x612000254d70 thread T0
    #0 0x66b3b04 in std::__atomic_base<unsigned int>::fetch_sub(unsigned int, std::memory_order) /usr/include/c++/12.2.1/bits/atomic_base.h:628
    #1 0x66b3b04 in SafeNumeric<unsigned int>::decrement() core/safe_refcount.h:84
    #2 0x66b3b04 in SafeRefCount::unref() core/safe_refcount.h:177
    #3 0x66b3b04 in _ObjectDebugLock::~_ObjectDebugLock() core/object.cpp:53
    #4 0x66b3b04 in Object::call(StringName const&, Variant const**, int, Variant::CallError&) core/object.cpp:924
    #5 0x68b8c7a in Variant::call_ptr(StringName const&, Variant const**, int, Variant*, Variant::CallError&) core/variant_call.cpp:1229
    #6 0x12f7960 in GDScriptFunction::call(GDScriptInstance*, Variant const**, int, Variant::CallError&, GDScriptFunction::CallState*) modules/gdscript/gdscript_function.cpp:1048
    #7 0x122592e in GDScriptInstance::call_multilevel(StringName const&, Variant const**, int) modules/gdscript/gdscript.cpp:1214
    #8 0x29b9039 in Node::_notification(int) scene/main/node.cpp:58
    #9 0x2cb073f in Node::_notificationv(int, bool) scene/main/node.h:47
    #10 0x2cb073f in CanvasItem::_notificationv(int, bool) scene/2d/canvas_item.h:163
    #11 0x2cb073f in Control::_notificationv(int, bool) scene/gui/control.h:47
    #12 0x66ab532 in Object::notification(int, bool) core/object.cpp:927
    #13 0x2a4c801 in SceneTree::_notify_group_pause(StringName const&, int) scene/main/scene_tree.cpp:1105
    #14 0x2a733d1 in SceneTree::idle(float) scene/main/scene_tree.cpp:608
    #15 0xb08189 in Main::iteration() main/main.cpp:2298
    #16 0xaaf46f in OS_X11::run() platform/x11/os_x11.cpp:3987
    #17 0xa4e5dc in main platform/x11/godot_x11.cpp:59
    #18 0x7f1c6d03c78f  (/usr/lib/libc.so.6+0x2378f)
    #19 0x7f1c6d03c849 in __libc_start_main (/usr/lib/libc.so.6+0x23849)
    #20 0xa64634 in _start (/media/scratch/coding/godot/godot-builder/godot/bin/godot.x11.opt.debug.64s+0xa64634)

0x612000254d70 is located 48 bytes inside of 272-byte region [0x612000254d40,0x612000254e50)
freed by thread T0 here:
    #0 0x7f1c6dcbe672 in __interceptor_free /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:52
    #1 0x6753795 in void memdelete<Reference>(Reference*) core/os/memory.h:118
    #2 0x6753795 in void memdelete<Reference>(Reference*) core/os/memory.h:110
    #3 0x6753795 in Ref<Reference>::unref() core/reference.h:258
    #4 0x6753795 in RefPtr::unref() core/ref_ptr.cpp:85

previously allocated by thread T0 here:
    #0 0x7f1c6dcbfa89 in __interceptor_malloc /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:69
    #1 0x6b16b2f in Memory::alloc_static(unsigned long, bool) core/os/memory.cpp:75
    #2 0x12f66ed in GDScriptFunction::call(GDScriptInstance*, Variant const**, int, Variant::CallError&, GDScriptFunction::CallState*) modules/gdscript/gdscript_function.cpp:1116
    #3 0x123621b in GDScriptInstance::_ml_call_reversed(GDScript*, StringName const&, Variant const**, int) modules/gdscript/gdscript.cpp:1229

Steps to reproduce

Put this script on a node and then play the scene, using a release_debug build with asan on:

var fr

func _ready():
	fr = funcref(self, "foobar")

func foobar():
	fr = null

func _process(_delta):
	if fr:
		fr.call_func()

Minimal reproduction project

N/A

@lawnjelly lawnjelly added this to the 3.x milestone Apr 5, 2023
@tcoxon
Copy link
Contributor Author

tcoxon commented Aug 30, 2024

I've run into this issue at least a dozen separate times now in unrelated parts of my codebase, in multiple projects, in Godot 3 and in Godot 4 as well. I seriously can't be the only one encountering this 😆

Updated example that works with Godot 4.3:
godot-callable-crash.zip

Build Godot with asan, run the callable_example scene and observe the asan error:

==90026==ERROR: AddressSanitizer: heap-use-after-free on address 0x5140001a8ca0 at pc 0x636b24b87682 bp 0x7ffd10c69a30 sp 0x7ffd10c69a28
WRITE of size 4 at 0x5140001a8ca0 thread T0
    #0 0x636b24b87681 in std::__atomic_base<unsigned int>::fetch_sub(unsigned int, std::memory_order) /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/14.1.1/../../../../include/c++/14.1.1/bits/atomic_base.h:641:16
    #1 0x636b24b87681 in SafeNumeric<unsigned int>::decrement() /media/scratch/coding/godot/godot-builder/godot/./core/templates/safe_refcount.h:87:16
    #2 0x636b24b87681 in SafeRefCount::unref() /media/scratch/coding/godot/godot-builder/godot/./core/templates/safe_refcount.h:207:16
    #3 0x636b24b87681 in _ObjectDebugLock::~_ObjectDebugLock() /media/scratch/coding/godot/godot-builder/godot/core/object/object.cpp:55:20
    #4 0x636b24b87681 in Object::callp(StringName const&, Variant const**, int, Callable::CallError&) /media/scratch/coding/godot/godot-builder/godot/core/object/object.cpp:814:1
    #5 0x636b24383b0a in Variant::callp(StringName const&, Variant const**, int, Variant&, Callable::CallError&) /media/scratch/coding/godot/godot-builder/godot/core/variant/variant_call.cpp:1211:27
    #6 0x636b18891d65 in GDScriptFunction::call(GDScriptInstance*, Variant const**, int, Callable::CallError&, GDScriptFunction::CallState*) /media/scratch/coding/godot/godot-builder/godot/modules/gdscript/gdscript_vm.cpp:1780:12
    #7 0x636b184598e9 in GDScriptInstance::callp(StringName const&, Variant const**, int, Callable::CallError&) /media/scratch/coding/godot/godot-builder/godot/modules/gdscript/gdscript.cpp:2032:22
    #8 0x636b1df52251 in bool Node::_gdvirtual__ready_call<false>() /media/scratch/coding/godot/godot-builder/godot/scene/main/node.h:355:2
    #9 0x636b1df52251 in Node::_notification(int) /media/scratch/coding/godot/godot-builder/godot/scene/main/node.cpp:217:4
    #10 0x636b1f798bd1 in Node::_notificationv(int, bool) /media/scratch/coding/godot/godot-builder/godot/./scene/main/node.h:50:2
    #11 0x636b1f798bd1 in CanvasItem::_notificationv(int, bool) /media/scratch/coding/godot/godot-builder/godot/./scene/main/canvas_item.h:45:2
    #12 0x636b1f798bd1 in Node2D::_notificationv(int, bool) /media/scratch/coding/godot/godot-builder/godot/scene/2d/node_2d.h:37:2
    #13 0x636b24b7edc1 in Object::notification(int, bool) /media/scratch/coding/godot/godot-builder/godot/core/object/object.cpp
    #14 0x636b1df5ad36 in Node::_propagate_ready() /media/scratch/coding/godot/godot-builder/godot/scene/main/node.cpp:272:3
    #15 0x636b1df5aca2 in Node::_propagate_ready() /media/scratch/coding/godot/godot-builder/godot/scene/main/node.cpp:263:12
    #16 0x636b1df6c109 in Node::_set_tree(SceneTree*) /media/scratch/coding/godot/godot-builder/godot/scene/main/node.cpp:3171:4
    #17 0x636b17b3417b in OS_LinuxBSD::run() /media/scratch/coding/godot/godot-builder/godot/platform/linuxbsd/os_linuxbsd.cpp:950:13
    #18 0x636b17b17295 in main /media/scratch/coding/godot/godot-builder/godot/platform/linuxbsd/godot_linuxbsd.cpp:85:6
    #19 0x7b167e034e07  (/usr/lib/libc.so.6+0x25e07) (BuildId: 476816ac63052a8d67cb2a7e2b93ee9bd72e417b)
    #20 0x7b167e034ecb in __libc_start_main (/usr/lib/libc.so.6+0x25ecb) (BuildId: 476816ac63052a8d67cb2a7e2b93ee9bd72e417b)
    #21 0x636b179dd2e4 in _start (/media/scratch/coding/godot/godot-builder/godot/bin/godot.linuxbsd.editor.x86_64.llvm.san+0x4dc62e4) (BuildId: 94136b4b522f8ae4)

0x5140001a8ca0 is located 96 bytes inside of 392-byte region [0x5140001a8c40,0x5140001a8dc8)
freed by thread T0 here:
    #0 0x636b17acb492 in free.part.0 (/media/scratch/coding/godot/godot-builder/godot/bin/godot.linuxbsd.editor.x86_64.llvm.san+0x4eb4492) (BuildId: 94136b4b522f8ae4)
    #1 0x636b2432d184 in Variant::_clear_internal() /media/scratch/coding/godot/godot-builder/godot/core/variant/variant.cpp:1389:6
    #2 0x636b184598e9 in GDScriptInstance::callp(StringName const&, Variant const**, int, Callable::CallError&) /media/scratch/coding/godot/godot-builder/godot/modules/gdscript/gdscript.cpp:2032:22

previously allocated by thread T0 here:
    #0 0x636b17acc4c9 in malloc (/media/scratch/coding/godot/godot-builder/godot/bin/godot.linuxbsd.editor.x86_64.llvm.san+0x4eb54c9) (BuildId: 94136b4b522f8ae4)
    #1 0x636b23b772df in Memory::alloc_static(unsigned long, bool) /media/scratch/coding/godot/godot-builder/godot/core/os/memory.cpp:75:14
    #2 0x636b23b772df in operator new(unsigned long, char const*) /media/scratch/coding/godot/godot-builder/godot/core/os/memory.cpp:40:9
    #3 0x636b1842d697 in GDScriptNativeClass::instantiate() /media/scratch/coding/godot/godot-builder/godot/modules/gdscript/gdscript.cpp:99:9
    #4 0x636b1842d697 in GDScript::_new(Variant const**, int, Callable::CallError&) /media/scratch/coding/godot/godot-builder/godot/modules/gdscript/gdscript.cpp:222:29

This sometimes leads to a crash in non-asan builds (in editor, debug, and in release template builds), but it's very difficult to reproduce that way.

It happens because a RefCounted object's refcount can go to zero, causing the object to be freed, while one of its methods is still somewhere on the stack. When the stack unwinds to that method, there's a heap-use-after-free.

The pattern of use that tends to hit this bug in real code is one where there's an object that emits a signal when it's finished - and where the consumer of that signal clears the sole reference to that object (whether intentionally, incidentally, or indirectly). The other example in that project is more along these lines, and demonstrates just how insidious this bug can be:

func _ready() -> void:
	var promise = $JobQueue.post(some_slow_operation)
	$TextureRect.texture = await promise.fulfilled

When I look at this, I don't tend to think that when _ready returns it's going to (1) free the promise, and then (2) jump back into some method on the promise and crash, lol. That's some spooky action at a distance.

Edit to add: I should say it can be worked around by carefully re-writing code. But users can only do that if they know the bug is there, and what causes the bug.

@akien-mga akien-mga modified the milestones: 3.x, 4.4 Aug 30, 2024
@akien-mga
Copy link
Member

CC @godotengine/gdscript @godotengine/core

@RandomShaper RandomShaper self-assigned this Sep 11, 2024
@lawnjelly
Copy link
Member

lawnjelly commented Sep 11, 2024

This bug seems caused by OBJ_DEBUG_LOCK in Object::call().

Likely this is holding onto a reference to the object and then trying to change a ref count after deletion.

Removing the line OBJ_DEBUG_LOCK seems to remove the bug.

Still investigating.

It likely needs something like a temporary bumped ref count to defer deletion until both the call and the OBJ_DEBUG_LOCK have finished.

UPDATE:
Have a fix, it's a bit bodgey but works, I'll see if there's a better one (I'm not so familiar with Object refcounting).

The bodge fix BTW is just storing the ObjectID on construction of the _ObjectDebugLock and checking it against the database before decrementing. But it would probably be better to do with ref counts.

There's already some protection against calling free() directly it seems (Object::call()):

	if (p_method == CoreStringNames::get_singleton()->_free) {
	...

so this problem has come up before in some form but doesn't protect against this type of deletion.

@lawnjelly
Copy link
Member

struct _ObjectDebugLock {
	Object *obj; // I AM THE PROBLEM HERE, RAW POINTER TO OBJECT
	ObjectID id;

	_ObjectDebugLock(Object *p_obj) {
		obj = p_obj;
		obj->_lock_index.ref();
		
		if (obj)
		{
			id = obj->get_instance_id();
		}
	}
	~_ObjectDebugLock() {
		
		if (id)
		{
			if (ObjectDB::get_instance(id))
			{
				obj->_lock_index.unref();
			}
		}
		
	}
};

@lawnjelly
Copy link
Member

lawnjelly commented Sep 11, 2024

Actually the bodge fix might be not so bad after all.
The FuncRef is a reference, so has a ref count, but the problem is that ObjectDebugLock works with all Objects (which don't seem to have a refcount on cursory inspection). So there's no easy way to defer the deletion.

We could test whether the object is a Reference, and if so, increase the ref count around the call. But although that fixes this case, it doesn't fix the general case for non-reference objects being deleted. Maybe this can't happen but it would be nice to close that too.

Looking up instance ids is probably super slow but it seems debug only. But using it for every call in gdscript might slow things down, not sure yet how often this is called. There may be more cunning ways to do it more efficiently.

There's also the problem that if we did increase the refcount to prevent this (debug) bug, we would probably need to do it in release too because the lifetime would change subtly, and could lead to further hard to debug bugs in the future. So maybe the bodge is better.

Maybe there's a more elegant way though.

@RandomShaper
Copy link
Member

It seems we crossed our ways. I've made a fix based on Variant as the responsible for validity check.

@lawnjelly
Copy link
Member

lawnjelly commented Sep 11, 2024

It seems we crossed our ways. I've made a fix based on Variant as the responsible for validity check.

That looks good I'll check in a second. 👍
You are more familiar with Object etc than me.
Akien poked me this morning to have a look at these bugs.

Ah yes, your PR looks to be doing the same as my suggestion with the ObjectDB check under the hood. I don't think we have this function in 3.x, so might have to be more as above.

On the other hand I'm wondering now if we can do this more efficiently:

  • If call is single threaded that we can just set a static current call object on Object, and if it tries to delete this it changes a static bool.
  • Just check the bool after the call site.

This depends on the pattern being single threaded (no idea if it is, and this may differ in 3.x and 4.x). If not then maybe the ObjectDB may be better.

UPDATE:
Ah no, this above doesn't work as there are chains of calls, so a single static won't work, even single threaded.

@tcoxon
Copy link
Contributor Author

tcoxon commented Sep 11, 2024

There's also the problem that if we did increase the refcount to prevent this (debug) bug, we would probably need to do it in release too because the lifetime would change subtly, and could lead to further hard to debug bugs in the future. So maybe the bodge is better.

Another issue would be the predelete handler. When the object's refcount reaches 0, it frees the object, triggering the predelete_handler and NOTIFICATION_PREDELETE. That could then do a method call and re-increment the refcount. I think this would cause an infinite loop unless how the predelete handler works was also changed somehow?

This bug seems caused by OBJ_DEBUG_LOCK in Object::call().

I think the object debug lock might not be the only problem. It's just the only way the issue manifests in the example project.

So the relevant functions in the less_direct_example are:

func _ready() -> void:
	var promise = $JobQueue.post(some_slow_operation)
	$TextureRect.texture = await promise.fulfilled

#...

	func fulfill(value) -> void:
		result = value
		fulfilled.emit(result)

The fulfill function does nothing after emitting the "fulfilled" signal, so fixing the object debug lock would fix this example. But there could be anything there after emit returns. If it assigned a value to a property or called another method wouldn't this access unallocated memory (in release builds, too)?

	func fulfill(value) -> void:
		result = value
		fulfilled.emit(result)
		# now self has been freed
		result = null # oops, write-after-free

EDIT: I looked into whether this would cause an issue, and it appears that deletion doesn't occur until the end of the fulfill method. Since somebody has already thought of this, I'm retracting this comment. :)

@lawnjelly
Copy link
Member

I think the object debug lock might not be the only problem. It's just the only way the issue manifests in the example project.

I know next to zero about the await command, and I may be wrong, but this does sound like a separate issue, and should probably be reported in separate issue with MRP.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants