-
-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement threading to speed up video saving in Movie Maker mode #4751
Comments
Alternatively for lossless video, there's also the option to add support for outputting .bmp sequences, which should be substantially faster than compressing to .png. |
This is something I indirectly proposed in #6390 🙂 Godot doesn't have Adding a way to save uncompressed images will help improve performance to an extent, but the benefit is often much smaller compared to threading. See this comment and its replies for an example of this in SHADERed. |
After further investigation, I've found that uncompressed images do in fact provide optimal performance, in the sense that the fps is identical to just disabling frame writing altogether: diff --git a/main/main.cpp b/main/main.cpp
index 87e83f65c1..dc051e74ce 100644
--- a/main/main.cpp
+++ b/main/main.cpp
@@ -3277,7 +3277,7 @@ bool Main::iteration() {
RID main_vp_rid = RenderingServer::get_singleton()->viewport_find_from_screen_attachment(DisplayServer::MAIN_WINDOW_ID);
RID main_vp_texture = RenderingServer::get_singleton()->viewport_get_texture(main_vp_rid);
Ref<Image> vp_tex = RenderingServer::get_singleton()->texture_2d_get(main_vp_texture);
- movie_writer->add_frame(vp_tex);
}
if ((quit_after > 0) && (Engine::get_singleton()->_process_frames >= quit_after)) { However, there is still overhead compared to not having Movie Maker on, which is only resolved after also disabling texture downloading: diff --git a/main/main.cpp b/main/main.cpp
index 87e83f65c1..02f83e30d1 100644
--- a/main/main.cpp
+++ b/main/main.cpp
@@ -3275,9 +3275,7 @@ bool Main::iteration() {
if (movie_writer) {
RID main_vp_rid = RenderingServer::get_singleton()->viewport_find_from_screen_attachment(DisplayServer::MAIN_WINDOW_ID);
RID main_vp_texture = RenderingServer::get_singleton()->viewport_get_texture(main_vp_rid);
- Ref<Image> vp_tex = RenderingServer::get_singleton()->texture_2d_get(main_vp_texture);
- movie_writer->add_frame(vp_tex);
}
if ((quit_after > 0) && (Engine::get_singleton()->_process_frames >= quit_after)) { In other words, the overhead comes from a render pipeline stall rather than compressing the image data: Desired behavior: I don't think there's much more that can be done from the encoding side. |
Do you have a branch for saving uncompressed images? This sounds very interesting, I'd like to test it 🙂
This is likely resolved by using hardware readbacks, which introduces a 2-3 frame delay. It should be possible to compensate this delay when writing the video, so that the audio is perfectly synchronized with the video like before. Hardware readbacks are something that would be worth exposing to GDScipt too, as they make a lot of things possible that shaders alone can't do (e.g. for gameplay purposes). |
See below for a proof-of-concept that you can download and servers/movie_writer/movie_writer_pngwav.cpp | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/servers/movie_writer/movie_writer_pngwav.cpp b/servers/movie_writer/movie_writer_pngwav.cpp
index 12f338dfb6..74720c5879 100644
--- a/servers/movie_writer/movie_writer_pngwav.cpp
+++ b/servers/movie_writer/movie_writer_pngwav.cpp
@@ -139,11 +139,23 @@ Error MovieWriterPNGWAV::write_begin(const Size2i &p_movie_size, uint32_t p_fps,
Error MovieWriterPNGWAV::write_frame(const Ref<Image> &p_image, const int32_t *p_audio_data) {
ERR_FAIL_COND_V(!f_wav.is_valid(), ERR_UNCONFIGURED);
+ ERR_FAIL_COND_V(p_image->get_format() != Image::FORMAT_RGB8, ERR_UNCONFIGURED);
+
+ if (getenv("PPM")) {
+ WARN_PRINT_ONCE("Saving as PPM");
+
+ Vector<uint8_t> buffer = p_image->get_data();
+ Ref<FileAccess> fi = FileAccess::open(base_path + zeros_str(frame_count) + ".ppm", FileAccess::WRITE);
+ fi->store_string(String() + "P6\n" + itos(p_image->get_width()) + " " + itos(p_image->get_height()) + "\n255\n");
+ fi->store_buffer(buffer.ptr(), buffer.size());
+ } else {
+ WARN_PRINT_ONCE("Saving as PNG");
Vector<uint8_t> png_buffer = p_image->save_png_to_buffer();
Ref<FileAccess> fi = FileAccess::open(base_path + zeros_str(frame_count) + ".png", FileAccess::WRITE);
fi->store_buffer(png_buffer.ptr(), png_buffer.size());
+ }
f_wav->store_buffer((const uint8_t *)p_audio_data, audio_block_size);
frame_count++; It saves images in .ppm, which is reasonably well-supported among standard tools like ImageMagick or ffmpeg, although .bmp should be preferred in a real implementation due to its native integration with Windows. |
Well i think by the file extension it is a vocation call. I need to do this. Anyway, this is the issue that gather all the data about improving video recording of engine performance. I need to understand more about this hardware readbacks. |
See a proof-of-concept solution for this below that renders to a ringbuffer, which allows Movie Maker to download frames that were rendered Note that the use of However, assuming there were no bugs in my ringbuffer implementation, removing I don't know if the best approach is to update diff --git a/servers/movie_writer/movie_writer.cpp b/servers/movie_writer/movie_writer.cpp
index 9df05ba94a..6348238534 100644
--- a/servers/movie_writer/movie_writer.cpp
+++ b/servers/movie_writer/movie_writer.cpp
@@ -184,7 +184,32 @@ void MovieWriter::add_frame() {
RID main_vp_rid = RenderingServer::get_singleton()->viewport_find_from_screen_attachment(DisplayServer::MAIN_WINDOW_ID);
RID main_vp_texture = RenderingServer::get_singleton()->viewport_get_texture(main_vp_rid);
- Ref<Image> vp_tex = RenderingServer::get_singleton()->texture_2d_get(main_vp_texture);
+ RID main_vp_rd_texture = RenderingServer::get_singleton()->texture_get_rd_texture(main_vp_texture);
+
+ Vector2 vp_size = RD::get_singleton()->texture_size(main_vp_rd_texture);
+
+ constexpr int ring_buffer_size = 5;
+ static RID ring_buffer[ring_buffer_size];
+ static int ring_index = -1;
+ if (ring_index < 0) {
+ ring_index = 0;
+ RD::TextureFormat tf;
+ tf.width = vp_size.x;
+ tf.height = vp_size.y;
+ tf.format = RD::DATA_FORMAT_R8G8B8_SRGB;
+ tf.usage_bits = RD::TEXTURE_USAGE_CPU_READ_BIT | RD::TEXTURE_USAGE_CAN_COPY_FROM_BIT | RD::TEXTURE_USAGE_CAN_COPY_TO_BIT;
+ for (int i = 0; i < ring_buffer_size; i++) {
+ ring_buffer[i] = RD::get_singleton()->texture_create(tf, RD::TextureView());
+ }
+ }
+
+ RD::get_singleton()->texture_copy(main_vp_rd_texture, ring_buffer[ring_index], Vector3(), Vector3(), Vector3(vp_size.x, vp_size.y, 1), 0, 0, 0, 0);
+ ring_index = (ring_index+1)%ring_buffer_size;
+ if (Engine::get_singleton()->get_frames_drawn() < ring_buffer_size)
+ return;
+
+ Vector<uint8_t> data = RD::get_singleton()->texture_get_data(ring_buffer[ring_index], 0);
+ Ref<Image> vp_tex = Image::create_from_data(vp_size.x, vp_size.y, false, Image::Format::FORMAT_RGB8, data);
RenderingServer::get_singleton()->viewport_set_measure_render_time(main_vp_rid, true);
cpu_time += RenderingServer::get_singleton()->viewport_get_measured_render_time_cpu(main_vp_rid);
|
Great work 🙂
What about audio synchronization? I assume we should intentionally delay audio by a duration equivalent to 5 frames to ensure the saved image and sound stay in sync. |
Yes, audio samples should be stored inside the ringbuffer, so something like this pseudocode: - static RID ring_buffer[ring_buffer_size];
+ static struct {
+ RID frame_texture;
+ LocalVector<int32_t> audio_mix_buffer;
+ } ring_buffer[ring_buffer_size];
...
RD::get_singleton()->texture_copy(main_vp_rd_texture, ring_buffer[ring_index].frame_texture, Vector3(), Vector3(), Vector3(vp_size.x, vp_size.y, 1), 0, 0, 0, 0);
+ ring_buffer[ring_index].audio_mix_buffer = audio_mix_buffer;
...
Vector<uint8_t> data = RD::get_singleton()->texture_get_data(ring_buffer[ring_index].frame_texture, 0);
Ref<Image> vp_tex = Image::create_from_data(vp_size.x, vp_size.y, false, Image::Format::FORMAT_RGB8, data);
+ audio_mix_buffer = ring_buffer[ring_index].audio_mix_buffer; |
Related to #4752.
Describe the project you are working on
The Godot editor 🙂
Describe the problem or limitation you are having in your project
Godot now has offline video recording functionality. This works well, but it's fairly slow when recording at high resolutions since encoding the MJPEG can take a relatively long time. This is even worse when using PNG output as PNG is slow to encode compared to MJPEG.
Describe the feature / enhancement and how it helps to overcome the problem or limitation
Move MJPEG/PNG encoding to a thread pool, so it can be performed in parallel and avoid blocking the main thread (so it can keep on rendering frames).
This has a potential to double video writing performance (or more, on CPUs with higher core counts).
Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams
If this enhancement will not be used often, can it be worked around with a few lines of script?
No.
Is there a reason why this should be core and not an add-on in the asset library?
This is about improving Movie Maker recording performance to make it more usable for video production.
The text was updated successfully, but these errors were encountered: