
About 2D&3D joint training #13

Open
lizhiqi49 opened this issue Sep 24, 2023 · 4 comments

Comments

@lizhiqi49

Very nice work!

I have a question about 2D&3D joint training:
I think it is intuitive that training only on the synthetic 3D dataset would degrade the quality of the generated images and easily overfit to the synthetic 3D data, so introducing high-quality 2D data into training should help. But since you don't show a comparison with and without 2D data in training, I'd like to know how much it improved generation quality in your experiments.
Thanks.

@seasonSH
Collaborator

Here are some examples. Empirically, we found that joint training leads to better quality and text-image consistency.

The examples are "a bulldog wearing a black pirate hat" and "an astronaut riding a horse".

[Image comparison for each prompt ("an astronaut riding a horse", "a bulldog wearing a black pirate hat"): left column "No 2D data", right column "2D+3D Training"; the image filenames indicate DDIM sampling with 50 steps.]

@lizhiqi49
Author

Thank you! The improvement is really significant. I have another question:

You mentioned in your paper that you sample a data batch from the LAION image dataset with 30% probability. When training on a multi-view batch, the batch size is 4096 (1024×4); what is the batch size for a 2D batch (1024 or 4096)?

@seasonSH
Collaborator

seasonSH commented Sep 26, 2023

We train the model on 32 A100 GPUs distributed across 4 nodes, with a batch size of 256 per node. So, for each node:

  • With 70% probability, the batch is a multi-view batch: 256×4 images with 256 text descriptions.
  • With 30% probability, the batch is an image batch: 1024 image-text pairs.

The batch mode can differ between nodes at the same training step.
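To make the arithmetic concrete, here is a minimal Python sketch of the per-node batch-mode sampling described above. It is not the authors' training code: the names (`NodeBatch`, `sample_node_batch`, `sample_step`) are hypothetical, and it assumes each node draws its mode independently at every step, which is consistent with (but not confirmed by) the comment that modes can differ across nodes.

```python
import random
from dataclasses import dataclass

# Figures from the comment: 4 nodes, 256 objects per node, 4 views per object,
# and a 30% probability of drawing a 2D (image) batch instead of a multi-view batch.
NUM_NODES = 4
OBJECTS_PER_NODE = 256
VIEWS_PER_OBJECT = 4
P_IMAGE_BATCH = 0.3


@dataclass
class NodeBatch:
    mode: str        # "multiview" or "image"
    num_images: int  # images processed on this node at this step
    num_texts: int   # text prompts on this node at this step


def sample_node_batch(rng: random.Random) -> NodeBatch:
    """Hypothetical helper: choose the batch mode for one node at one step."""
    images = OBJECTS_PER_NODE * VIEWS_PER_OBJECT  # 256 x 4 = 1024
    if rng.random() < P_IMAGE_BATCH:
        # Image batch: 1024 image-text pairs, one caption per image.
        return NodeBatch("image", images, images)
    # Multi-view batch: the 4 views of each object share one caption.
    return NodeBatch("multiview", images, OBJECTS_PER_NODE)


def sample_step(rng: random.Random) -> list[NodeBatch]:
    # Assumption: each node draws its own mode, so modes may differ across nodes.
    return [sample_node_batch(rng) for _ in range(NUM_NODES)]


rng = random.Random(0)
step = sample_step(rng)
print([b.mode for b in step])
print(sum(b.num_images for b in step))  # 4 nodes x 1024 images
```

Whatever mix of modes a step draws, it processes 4 × 1024 = 4096 images in total; only the number of captions changes (256 on a multi-view node versus 1024 on an image node).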

@lizhiqi49
Copy link
Author

OK, thanks a lot.
