Implement threading to speed up video saving in Movie Maker mode #4751

Calinou · 2022-06-26T00:36:24Z

Related to #4752.

Describe the project you are working on

The Godot editor 🙂

Describe the problem or limitation you are having in your project

Godot now has offline video recording functionality. This works well, but it's fairly slow when recording at high resolutions since encoding the MJPEG can take a relatively long time. This is even worse when using PNG output as PNG is slow to encode compared to MJPEG.

Describe the feature / enhancement and how it helps to overcome the problem or limitation

Move MJPEG/PNG encoding to a thread pool, so it can be performed in parallel and avoid blocking the main thread (so it can keep on rendering frames).

This has a potential to double video writing performance (or more, on CPUs with higher core counts).

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

PNG/WAV movie writer is here: https://github.com/godotengine/godot/blob/master/servers/movie_writer/movie_writer.cpp#L173
AVI (MJPEG) movie writer is here: https://github.com/godotengine/godot/blob/master/servers/movie_writer/movie_writer_mjpeg.cpp
One challenge is to ensure frames are always written in the right order, but it should be feasible.
Support for non-threaded recording should likely be preserved for use cases where the engine must be compiled without support for multiple threads. While the editor requires support for multiple threads, export templates don't. Export templates do include Movie Maker functionality, so that exported projects can also use this feature to create high-quality offline videos.

If this enhancement will not be used often, can it be worked around with a few lines of script?

No.

Is there a reason why this should be core and not an add-on in the asset library?

This is about improving Movie Maker recording performance to make it more usable for video production.

myaaaaaaaaa · 2023-03-03T17:16:16Z

Alternatively for lossless video, there's also the option to add support for outputting .bmp sequences, which should be substantially faster than compressing to .png.

Calinou · 2023-03-03T20:41:03Z

Alternatively for lossless video, there's also the option to add support for outputting .bmp sequences, which should be substantially faster than compressing to .png.

This is something I indirectly proposed in #6390 🙂

Godot doesn't have save_bmp() or save_tga() functions in Image yet though. It should be relatively easy to implement those, at least for RGB8 and RGBA8 formats.

Adding a way to save uncompressed images will help improve performance to an extent, but the benefit is often much smaller compared to threading. See this comment and its replies for an example of this in SHADERed.

myaaaaaaaaa · 2023-03-04T00:41:21Z

After further investigation, I've found that uncompressed images do in fact provide optimal performance, in the sense that the fps is identical to just disabling frame writing altogether:

diff --git a/main/main.cpp b/main/main.cpp
index 87e83f65c1..dc051e74ce 100644
--- a/main/main.cpp
+++ b/main/main.cpp
@@ -3277,7 +3277,7 @@ bool Main::iteration() {
 		RID main_vp_rid = RenderingServer::get_singleton()->viewport_find_from_screen_attachment(DisplayServer::MAIN_WINDOW_ID);
 		RID main_vp_texture = RenderingServer::get_singleton()->viewport_get_texture(main_vp_rid);
 		Ref<Image> vp_tex = RenderingServer::get_singleton()->texture_2d_get(main_vp_texture);
-		movie_writer->add_frame(vp_tex);
 	}
 
 	if ((quit_after > 0) && (Engine::get_singleton()->_process_frames >= quit_after)) {

However, there is still overhead compared to not having Movie Maker on, which is only resolved after also disabling texture downloading:

diff --git a/main/main.cpp b/main/main.cpp
index 87e83f65c1..02f83e30d1 100644
--- a/main/main.cpp
+++ b/main/main.cpp
@@ -3275,9 +3275,7 @@ bool Main::iteration() {
 
 	if (movie_writer) {
 		RID main_vp_rid = RenderingServer::get_singleton()->viewport_find_from_screen_attachment(DisplayServer::MAIN_WINDOW_ID);
		RID main_vp_texture = RenderingServer::get_singleton()->viewport_get_texture(main_vp_rid);
-		Ref<Image> vp_tex = RenderingServer::get_singleton()->texture_2d_get(main_vp_texture);
-		movie_writer->add_frame(vp_tex);
 	}
 
 	if ((quit_after > 0) && (Engine::get_singleton()->_process_frames >= quit_after)) {

In other words, the overhead comes from a render pipeline stall rather than compressing the image data:

Desired behavior:

(Source)

I don't think there's much more that can be done from the encoding side.

Calinou · 2023-03-04T08:11:33Z

After further investigation, I've found that uncompressed images do in fact provide optimal performance

Do you have a branch for saving uncompressed images? This sounds very interesting, I'd like to test it 🙂

However, there is still overhead compared to not having Movie Maker on, which is only resolved after also disabling texture downloading:

This is likely resolved by using hardware readbacks, which introduces a 2-3 frame delay. It should be possible to compensate this delay when writing the video, so that the audio is perfectly synchronized with the video like before.

Hardware readbacks are something that would be worth exposing to GDScipt too, as they make a lot of things possible that shaders alone can't do (e.g. for gameplay purposes).

myaaaaaaaaa · 2023-03-04T15:14:44Z

Do you have a branch for saving uncompressed images? This sounds very interesting, I'd like to test it 🙂

See below for a proof-of-concept that you can download and git apply, obviously not written in a way suitable for inclusion into Godot:

 servers/movie_writer/movie_writer_pngwav.cpp | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/servers/movie_writer/movie_writer_pngwav.cpp b/servers/movie_writer/movie_writer_pngwav.cpp
index 12f338dfb6..74720c5879 100644
--- a/servers/movie_writer/movie_writer_pngwav.cpp
+++ b/servers/movie_writer/movie_writer_pngwav.cpp
@@ -139,11 +139,23 @@ Error MovieWriterPNGWAV::write_begin(const Size2i &p_movie_size, uint32_t p_fps,
 
 Error MovieWriterPNGWAV::write_frame(const Ref<Image> &p_image, const int32_t *p_audio_data) {
 	ERR_FAIL_COND_V(!f_wav.is_valid(), ERR_UNCONFIGURED);
+	ERR_FAIL_COND_V(p_image->get_format() != Image::FORMAT_RGB8, ERR_UNCONFIGURED);
+
+	if (getenv("PPM")) {
+		WARN_PRINT_ONCE("Saving as PPM");
+
+		Vector<uint8_t> buffer = p_image->get_data();
+		Ref<FileAccess> fi = FileAccess::open(base_path + zeros_str(frame_count) + ".ppm", FileAccess::WRITE);
+		fi->store_string(String() + "P6\n" + itos(p_image->get_width()) + " " + itos(p_image->get_height()) + "\n255\n");
+		fi->store_buffer(buffer.ptr(), buffer.size());
+	} else {
+		WARN_PRINT_ONCE("Saving as PNG");
 
 	Vector<uint8_t> png_buffer = p_image->save_png_to_buffer();
 
 	Ref<FileAccess> fi = FileAccess::open(base_path + zeros_str(frame_count) + ".png", FileAccess::WRITE);
 	fi->store_buffer(png_buffer.ptr(), png_buffer.size());
+	}
 	f_wav->store_buffer((const uint8_t *)p_audio_data, audio_block_size);
 
 	frame_count++;

It saves images in .ppm, which is reasonably well-supported among standard tools like ImageMagick or ffmpeg, although .bmp should be preferred in a real implementation due to its native integration with Windows.

PeterMarques · 2023-04-27T02:10:07Z

Well i think by the file extension it is a vocation call. I need to do this.

Anyway, this is the issue that gather all the data about improving video recording of engine performance. I need to understand more about this hardware readbacks.

myaaaaaaaaa · 2023-05-01T03:26:24Z

In other words, the overhead comes from a render pipeline stall rather than compressing the image data:

See a proof-of-concept solution for this below that renders to a ringbuffer, which allows Movie Maker to download frames that were rendered ring_buffer_size = 5 frames ago. Combined with the PPM patch above, this should essentially eliminate all overhead with using Movie Maker.

Note that the use of RD::TEXTURE_USAGE_CPU_READ_BIT was a hack to disable synchronization (fencing?), which requires ring_buffer_size to be large enough to avoid a race condition where frames are downloaded while in the middle of being rendered. This is essentially the Vulkan equivalent of synchronizing threads via sleep(), and a proper fence should be used in a real implementation.

However, assuming there were no bugs in my ringbuffer implementation, removing RD::TEXTURE_USAGE_CPU_READ_BIT appears to result in texture_get_data() waiting for the entire pipeline to finish rather than just the texture copy, negating the benefits of the ringbuffer.

I don't know if the best approach is to update texture_get_data() to wait for just the texture copy, or set RD::TEXTURE_USAGE_CPU_READ_BIT and fence manually from the Movie Maker side (assuming either is possible).

diff --git a/servers/movie_writer/movie_writer.cpp b/servers/movie_writer/movie_writer.cpp
index 9df05ba94a..6348238534 100644
--- a/servers/movie_writer/movie_writer.cpp
+++ b/servers/movie_writer/movie_writer.cpp
@@ -184,7 +184,32 @@ void MovieWriter::add_frame() {
 
 	RID main_vp_rid = RenderingServer::get_singleton()->viewport_find_from_screen_attachment(DisplayServer::MAIN_WINDOW_ID);
 	RID main_vp_texture = RenderingServer::get_singleton()->viewport_get_texture(main_vp_rid);
-	Ref<Image> vp_tex = RenderingServer::get_singleton()->texture_2d_get(main_vp_texture);
+	RID main_vp_rd_texture = RenderingServer::get_singleton()->texture_get_rd_texture(main_vp_texture);
+
+	Vector2 vp_size = RD::get_singleton()->texture_size(main_vp_rd_texture);
+
+	constexpr int ring_buffer_size = 5;
+	static RID ring_buffer[ring_buffer_size];
+	static int ring_index = -1;
+	if (ring_index < 0) {
+		ring_index = 0;
+		RD::TextureFormat tf;
+		tf.width = vp_size.x;
+		tf.height = vp_size.y;
+		tf.format = RD::DATA_FORMAT_R8G8B8_SRGB;
+		tf.usage_bits = RD::TEXTURE_USAGE_CPU_READ_BIT | RD::TEXTURE_USAGE_CAN_COPY_FROM_BIT | RD::TEXTURE_USAGE_CAN_COPY_TO_BIT;
+		for (int i = 0; i < ring_buffer_size; i++) {
+			ring_buffer[i] = RD::get_singleton()->texture_create(tf, RD::TextureView());
+		}
+	}
+
+	RD::get_singleton()->texture_copy(main_vp_rd_texture, ring_buffer[ring_index], Vector3(), Vector3(), Vector3(vp_size.x, vp_size.y, 1), 0, 0, 0, 0);
+	ring_index = (ring_index+1)%ring_buffer_size;
+	if (Engine::get_singleton()->get_frames_drawn() < ring_buffer_size)
+		return;
+
+	Vector<uint8_t> data = RD::get_singleton()->texture_get_data(ring_buffer[ring_index], 0);
+	Ref<Image> vp_tex = Image::create_from_data(vp_size.x, vp_size.y, false, Image::Format::FORMAT_RGB8, data);
 
 	RenderingServer::get_singleton()->viewport_set_measure_render_time(main_vp_rid, true);
 	cpu_time += RenderingServer::get_singleton()->viewport_get_measured_render_time_cpu(main_vp_rid);

Calinou · 2023-05-21T21:42:06Z

Great work 🙂

See a proof-of-concept solution for this below that renders to a ringbuffer, which allows Movie Maker to download frames that were rendered ring_buffer_size = 5 frames ago. Combined with the PPM patch above, this should essentially eliminate all overhead with using Movie Maker.

What about audio synchronization? I assume we should intentionally delay audio by a duration equivalent to 5 frames to ensure the saved image and sound stay in sync.

myaaaaaaaaa · 2023-05-21T23:21:19Z

What about audio synchronization? I assume we should intentionally delay audio by a duration equivalent to 5 frames to ensure the saved image and sound stay in sync.

Yes, audio samples should be stored inside the ringbuffer, so something like this pseudocode:

-	static RID ring_buffer[ring_buffer_size];
+	static struct {
+		RID frame_texture;
+		LocalVector<int32_t> audio_mix_buffer;
+	} ring_buffer[ring_buffer_size];

...

 	RD::get_singleton()->texture_copy(main_vp_rd_texture, ring_buffer[ring_index].frame_texture, Vector3(), Vector3(), Vector3(vp_size.x, vp_size.y, 1), 0, 0, 0, 0);
+	ring_buffer[ring_index].audio_mix_buffer = audio_mix_buffer;

...

 	Vector<uint8_t> data = RD::get_singleton()->texture_get_data(ring_buffer[ring_index].frame_texture, 0);
 	Ref<Image> vp_tex = Image::create_from_data(vp_size.x, vp_size.y, false, Image::Format::FORMAT_RGB8, data);
+	audio_mix_buffer = ring_buffer[ring_index].audio_mix_buffer;

Calinou added the topic:core label Jun 26, 2022

Calinou mentioned this issue Jun 26, 2022

Implement support for writing AVI files larger than 4 GB in Movie Maker mode #4752

Open

Calinou added the performance label Sep 9, 2022

myaaaaaaaaa mentioned this issue Mar 2, 2023

Implement hardware accelerated Movie Maker encoding using Vulkan Video #6409

Closed

myaaaaaaaaa mentioned this issue Mar 3, 2023

Enable fast saving mode for PNGs godotengine/godot#74324

Closed

myaaaaaaaaa mentioned this issue Apr 27, 2023

SubViewPort.get_texture.get_image makes Godot stutter/jitter godotengine/godot#75877

Open

Calinou mentioned this issue Sep 27, 2023

Add support for asynchronous RenderingDevice.buffer_get_data() (hardware readbacks) #7886

Closed

Calinou mentioned this issue Oct 23, 2023

MovieWriter drops FPS from 60 to 10 godotengine/godot#83864

Closed

Calinou mentioned this issue May 30, 2024

Ensure MovieWriter output is in gamma space when using HDR 2D godotengine/godot#92496

Merged

Calinou mentioned this issue Jan 16, 2025

[TRACKER] Movie Maker mode issues godotengine/godot#101644

Open

22 tasks

Calinou mentioned this issue Feb 20, 2025

Make use of hardware accelerated video encoding (ex: NVENC) in Movie Maker mode #11813

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement threading to speed up video saving in Movie Maker mode #4751

Implement threading to speed up video saving in Movie Maker mode #4751

Calinou commented Jun 26, 2022 •

edited

Loading

myaaaaaaaaa commented Mar 3, 2023

Calinou commented Mar 3, 2023 •

edited

Loading

myaaaaaaaaa commented Mar 4, 2023

Calinou commented Mar 4, 2023 •

edited

Loading

myaaaaaaaaa commented Mar 4, 2023 •

edited

Loading

PeterMarques commented Apr 27, 2023 •

edited

Loading

myaaaaaaaaa commented May 1, 2023 •

edited

Loading

Calinou commented May 21, 2023 •

edited

Loading

myaaaaaaaaa commented May 21, 2023

Implement threading to speed up video saving in Movie Maker mode #4751

Implement threading to speed up video saving in Movie Maker mode #4751

Comments

Calinou commented Jun 26, 2022 • edited Loading

Describe the project you are working on

Describe the problem or limitation you are having in your project

Describe the feature / enhancement and how it helps to overcome the problem or limitation

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

If this enhancement will not be used often, can it be worked around with a few lines of script?

Is there a reason why this should be core and not an add-on in the asset library?

myaaaaaaaaa commented Mar 3, 2023

Calinou commented Mar 3, 2023 • edited Loading

myaaaaaaaaa commented Mar 4, 2023

Calinou commented Mar 4, 2023 • edited Loading

myaaaaaaaaa commented Mar 4, 2023 • edited Loading

PeterMarques commented Apr 27, 2023 • edited Loading

myaaaaaaaaa commented May 1, 2023 • edited Loading

Calinou commented May 21, 2023 • edited Loading

myaaaaaaaaa commented May 21, 2023

Calinou commented Jun 26, 2022 •

edited

Loading

Calinou commented Mar 3, 2023 •

edited

Loading

Calinou commented Mar 4, 2023 •

edited

Loading

myaaaaaaaaa commented Mar 4, 2023 •

edited

Loading

PeterMarques commented Apr 27, 2023 •

edited

Loading

myaaaaaaaaa commented May 1, 2023 •

edited

Loading

Calinou commented May 21, 2023 •

edited

Loading