Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore using VMA to create buffers and images #102830

Merged
merged 1 commit into from
Feb 17, 2025

Conversation

clayjohn
Copy link
Member

Fixes: #101850

VMA handles memory allocation on certain devices better than our custom VK code, so we might as well use it

This PR uses VMA when we don't need the extra memory tracking. When not doing the extra memory tracking, there is no reason to use the raw Vulkan code.

I am tempted to find what VMA does differently and just copy it into our code to avoid the branch. However, it is likely best to always fall back on VMA since it is well-maintained and we can benefit from further updates to it in the future.

@clayjohn clayjohn added this to the 4.4 milestone Feb 13, 2025
@clayjohn clayjohn requested a review from a team as a code owner February 13, 2025 23:02
@darksylinc
Copy link
Contributor

Something smells:

The VMA branch always sets alloc_create_info.usage = VMA_MEMORY_USAGE_AUTO_PREFER_DEVICE;; which is not the original behavior. I believe this would restore the original 4.3 behavior:

VmaMemoryUsage vma_usage = VMA_MEMORY_USAGE_UNKNOWN;
uint32_t vma_flags_to_remove = 0;

VmaAllocationCreateInfo alloc_create_info = {};
switch (p_allocation_type) {
	case MEMORY_ALLOCATION_TYPE_CPU: {
		bool is_src = p_usage.has_flag(BUFFER_USAGE_TRANSFER_FROM_BIT);
		bool is_dst = p_usage.has_flag(BUFFER_USAGE_TRANSFER_TO_BIT);
		if (is_src && !is_dst) {
			// Looks like a staging buffer: CPU maps, writes sequentially, then GPU copies to VRAM.
			alloc_create_info.flags = VMA_ALLOCATION_CREATE_HOST_ACCESS_SEQUENTIAL_WRITE_BIT;
			alloc_create_info.preferredFlags |= VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT;
			vma_flags_to_remove |= VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT;
		}
		if (is_dst && !is_src) {
			// Looks like a readback buffer: GPU copies from VRAM, then CPU maps and reads.
			alloc_create_info.flags = VMA_ALLOCATION_CREATE_HOST_ACCESS_RANDOM_BIT;
			alloc_create_info.preferredFlags |= VK_MEMORY_PROPERTY_HOST_CACHED_BIT;
			vma_flags_to_remove |= VK_MEMORY_PROPERTY_HOST_CACHED_BIT;
		}
		vma_usage = VMA_MEMORY_USAGE_AUTO_PREFER_HOST;
		alloc_create_info.requiredFlags = (VK_MEMORY_PROPERTY_HOST_COHERENT_BIT | VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT);
	} break;
	case MEMORY_ALLOCATION_TYPE_GPU: {
		vma_usage = VMA_MEMORY_USAGE_AUTO_PREFER_DEVICE;
		alloc_create_info.preferredFlags = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT;
		if (p_size <= SMALL_ALLOCATION_MAX_SIZE) {
			uint32_t mem_type_index = 0;
			vmaFindMemoryTypeIndexForBufferInfo(allocator, &create_info, &alloc_create_info, &mem_type_index);
			alloc_create_info.pool = _find_or_create_small_allocs_pool(mem_type_index);
		}
	} break;
}

VkBuffer vk_buffer = VK_NULL_HANDLE;
VmaAllocation allocation = nullptr;
VmaAllocationInfo alloc_info = {};

if (!Engine::get_singleton()->is_extra_gpu_memory_tracking_enabled()) {
	alloc_create_info.preferredFlags &= ~vma_flags_to_remove;
	alloc_create_info.usage = vma_usage;
	VkResult err = vmaCreateBuffer(allocator, &create_info, &alloc_create_info, &vk_buffer, &allocation, &alloc_info);
	ERR_FAIL_COND_V_MSG(err, BufferID(), "Can't create buffer of size: " + itos(p_size) + ", error " + itos(err) + ".");
} else {
	VkResult err = vkCreateBuffer(vk_device, &create_info, VKC::get_allocation_callbacks(VK_OBJECT_TYPE_BUFFER), &vk_buffer);
	ERR_FAIL_COND_V_MSG(err, BufferID(), "Can't create buffer of size: " + itos(p_size) + ", error " + itos(err) + ".");
	err = vmaAllocateMemoryForBuffer(allocator, vk_buffer, &alloc_create_info, &allocation, &alloc_info);
	ERR_FAIL_COND_V_MSG(err, BufferID(), "Can't allocate memory for buffer of size: " + itos(p_size) + ", error " + itos(err) + ".");
	err = vmaBindBufferMemory2(allocator, allocation, 0, vk_buffer, nullptr);
	ERR_FAIL_COND_V_MSG(err, BufferID(), "Can't bind memory to buffer of size: " + itos(p_size) + ", error " + itos(err) + ".");
}

Although something similar happens with vmaCreateImage, that shouldn't be a problem because we want all textures to be device local.

…extra gpu memory tracking.

VMA handles memory allocation on certain devices better than our custom VK code, so we might as well use it

Co-authored-by: Matias N. Goldberg <dark_sylinc@yahoo.com.ar>
@clayjohn
Copy link
Member Author

Something smells:

The VMA branch always sets alloc_create_info.usage = VMA_MEMORY_USAGE_AUTO_PREFER_DEVICE;; which is not the original behavior. I believe this would restore the original 4.3 behavior:

Although something similar happens with vmaCreateImage, that shouldn't be a problem because we want all textures to be device local.

Good catch. Previously we used VMA_MEMORY_USAGE_AUTO_PREFER_DEVICE unconditionally with vmaCreateImage(), so I kept the logic there. Then I forgot to check what we used to do when I copied the changes over for vmaCreateBuffer().

I have applied your change and confirmed that it continues to work fine. Taking a look at #90993 I can confirm that your change restores the old behaviour too.

Copy link
Member

@akien-mga akien-mga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll merge pro-actively for the 4.4.beta4 snapshot, but a second review from @darksylinc or @DarioSamo would still be good.

@akien-mga akien-mga merged commit cd72d26 into godotengine:master Feb 17, 2025
20 checks passed
@akien-mga
Copy link
Member

Thanks!

@clayjohn clayjohn deleted the vma-memory-fix branch February 17, 2025 13:33
darksylinc added a commit to darksylinc/godot that referenced this pull request Mar 6, 2025
If the allocation is small enough that it enters the

if (p_size <= SMALL_ALLOCATION_MAX_SIZE) {} block, Godot would call
vmaFindMemoryTypeIndexForBufferInfo with the wrong parameters.

This can cause vmaFindMemoryTypeIndexForBufferInfo to potentially
misbehave on some cards or drivers.

Fixes regression introduced in godotengine#102830
Might potentially reopen godotengine#101850 (I doubt it, but it's possible)

Must be backported to 4.4
akien-mga pushed a commit to akien-mga/godot that referenced this pull request Mar 12, 2025
If the allocation is small enough that it enters the

if (p_size <= SMALL_ALLOCATION_MAX_SIZE) {} block, Godot would call
vmaFindMemoryTypeIndexForBufferInfo with the wrong parameters.

This can cause vmaFindMemoryTypeIndexForBufferInfo to potentially
misbehave on some cards or drivers.

Fixes regression introduced in godotengine#102830
Might potentially reopen godotengine#101850 (I doubt it, but it's possible)

Must be backported to 4.4

(cherry picked from commit c543c56)
akien-mga pushed a commit to akien-mga/godot that referenced this pull request Mar 12, 2025
If the allocation is small enough that it enters the

if (p_size <= SMALL_ALLOCATION_MAX_SIZE) {} block, Godot would call
vmaFindMemoryTypeIndexForBufferInfo with the wrong parameters.

This can cause vmaFindMemoryTypeIndexForBufferInfo to potentially
misbehave on some cards or drivers.

Fixes regression introduced in godotengine#102830
Might potentially reopen godotengine#101850 (I doubt it, but it's possible)

Must be backported to 4.4

(cherry picked from commit c543c56)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Godot 4.4 doesn't allocate shared GPU memory correctly
3 participants