Animate Anyone trn patch #510

Merged on Jun 4, 2024 (18 commits)
16 changes: 9 additions & 7 deletions ppdiffusers/examples/AnimateAnyone/README.md
@@ -122,10 +122,11 @@ python -u -m paddle.distributed.launch --gpus "0" scripts/trainer_stage2.py \

 ### 4.4 Comparison Before and After Stage-2 Fine-tuning
 In stage-2 training, the motion_modules in the model network are fine-tuned starting from the [AnimateDiff initialization weights](https://huggingface.co/guoyww/animatediff). Results generated before and after fine-tuning are compared below:
+
 | Static Image | Pose Video | Before Fine-tuning | After Fine-tuning |
-|--------------|------------|---------------------|-------------------|
-| <img src="https://github.com/PaddlePaddle/PaddleMIX/assets/46399096/f6c5d27b-0183-4ae5-ad6b-3e36125cb515" width="512" height="668"> | <img src="https://github.com/PaddlePaddle/PaddleMIX/assets/46399096/abe4931d-81ca-453b-b061-510a48b62b02" width="512" height="668"> | <img src="https://github.com/PaddlePaddle/PaddleMIX/assets/46399096/33ddb6ac-d07c-40a2-9d97-7cba9ebea88d" width="512" height="668"> | <img src="https://github.com/PaddlePaddle/PaddleMIX/assets/46399096/8b4ba74c-5a3f-45c3-be0f-645e0ece6bcd" width="552" height="668"> |
-| <img src="https://github.com/PaddlePaddle/PaddleMIX/assets/46399096/fa0df880-d891-4a99-8272-86405f38a03f" width="512" height="668"> | <img src="https://github.com/PaddlePaddle/PaddleMIX/assets/46399096/015640bf-9309-4a88-b1ff-7e63ab04f0b8" width="512" height="668"> | <img src="https://github.com/PaddlePaddle/PaddleMIX/assets/46399096/86f34d9f-73a8-4a4c-9945-04d1f322d5d3" width="512" height="668"> | <img src="https://github.com/PaddlePaddle/PaddleMIX/assets/46399096/104eb7d1-b9eb-453a-bb0b-27f2fe02f6c6" width="552" height="668"> |
+|--------------|------------|--------------------|-------------------|
+| ![demo](https://github.com/PaddlePaddle/PaddleMIX/assets/46399096/07a5f6cd-db53-4c69-a469-fda9edbff3f3) | ![demo](https://github.com/PaddlePaddle/PaddleMIX/assets/46399096/5442ff20-9aab-4f28-adca-711c7cd46ff9) | ![demo](https://github.com/PaddlePaddle/PaddleMIX/assets/46399096/d1f6942f-2075-4e24-b7e1-645c7a9f2c86) | ![demo](https://github.com/PaddlePaddle/PaddleMIX/assets/46399096/a2470660-3757-474b-b414-117416f1314c) |
+| ![demo](https://github.com/PaddlePaddle/PaddleMIX/assets/46399096/5958967d-57ce-4501-8a15-860879e08541) | ![demo](https://github.com/PaddlePaddle/PaddleMIX/assets/46399096/6e4ca44d-5d62-49a6-ae2f-bf87e0ca29b2) | ![demo](https://github.com/PaddlePaddle/PaddleMIX/assets/46399096/b3644e24-ec5e-43e4-b44d-7d5b4e6ca2c3) | ![demo](https://github.com/PaddlePaddle/PaddleMIX/assets/46399096/dd4aa5d5-6217-49ba-984f-1ceb05ca4495) |

 ## 5. Model Inference

@@ -136,10 +137,11 @@ python -m scripts.pose2vid --config ./configs/inference/animation.yaml -W 600 -H
 ```

 The generated results are shown below:
-| Static Image | Pose Video | Animation Video |
-|--------------|------------|---------------------|
-| <img src="https://github.com/PaddlePaddle/PaddleMIX/assets/46399096/c55a0449-b0f2-4137-9ed0-354bd3c57936" width="512" height="668"> | <img src="https://github.com/PaddlePaddle/PaddleMIX/assets/46399096/f856e8c4-824c-4403-8fb2-6cdf12eacea2" width="512" height="668"> | <img src="https://github.com/PaddlePaddle/PaddleMIX/assets/46399096/23e2e55e-f505-425f-920f-cde7e04bebbe" width="552" height="668"> |
-| <img src="https://github.com/PaddlePaddle/PaddleMIX/assets/46399096/bf3ceacc-ad32-41ea-9f2c-1fb91abb2afe" width="512" height="668"> | <img src="https://github.com/PaddlePaddle/PaddleMIX/assets/46399096/5eec36a8-7ce8-4299-b524-0c45f115bc0c" width="512" height="668"> | <img src="https://github.com/PaddlePaddle/PaddleMIX/assets/46399096/0c7e7088-58f5-476f-8d37-bf5bb768f56c" width="552" height="668"> |
+
+| Static Image | Pose Video | Before Fine-tuning |
+|--------------|------------|--------------------|
+| ![demo](https://github.com/PaddlePaddle/PaddleMIX/assets/46399096/a81e2c42-09c6-4a0b-8f0b-b7df1d77779a) | ![demo](https://github.com/PaddlePaddle/PaddleMIX/assets/46399096/973a6629-f24a-4420-b4af-7653e8ff8e92) | ![demo](https://github.com/PaddlePaddle/PaddleMIX/assets/46399096/ce2e2cd2-8ba2-46dd-bb6b-99726cd80e97) |
+| ![demo](https://github.com/PaddlePaddle/PaddleMIX/assets/46399096/abb8da73-951b-41a1-b922-8095ca84b988) | ![demo](https://github.com/PaddlePaddle/PaddleMIX/assets/46399096/b1d5efa8-76e0-4d4b-a878-4c3625b65b3d) | ![demo](https://github.com/PaddlePaddle/PaddleMIX/assets/46399096/68c1a0ef-6958-4a66-92b6-6d52717354f0)|

 ## 5. References

6 changes: 3 additions & 3 deletions ppdiffusers/examples/AnimateAnyone/scripts/pose2vid.py
@@ -22,13 +22,13 @@
 from paddle.vision import transforms
 from paddlenlp.transformers import CLIPVisionModelWithProjection
 from PIL import Image
-from src.models.pose_guider import PoseGuider
-from src.models.unet_2d_condition import UNet2DConditionModel
-from src.models.unet_3d import UNet3DConditionModel
 from src.pipelines.pipeline_pose2vid_long import Pose2VideoPipeline
 from src.utils.util import get_fps, read_frames, save_video_as_mp4

 from ppdiffusers import AutoencoderKL, DDIMScheduler
+from ppdiffusers.models.animate_anyone.pose_guider import PoseGuider
+from ppdiffusers.models.animate_anyone.unet_2d_condition import UNet2DConditionModel
+from ppdiffusers.models.animate_anyone.unet_3d import UNet3DConditionModel


 def parse_args():
@@ -22,13 +22,15 @@
 import paddle
 from einops import rearrange
 from paddlenlp.transformers import CLIPImageProcessor
-from src.models.mutual_self_attention import ReferenceAttentionControl
 from src.pipelines.context import get_context_scheduler
 from src.pipelines.utils import get_tensor_interpolation_method
 from tqdm import tqdm

 from ppdiffusers import DiffusionPipeline
 from ppdiffusers.image_processor import VaeImageProcessor
+from ppdiffusers.models.animate_anyone.mutual_self_attention import (
+    ReferenceAttentionControl,
+)
 from ppdiffusers.models.modeling_utils import faster_set_state_dict
 from ppdiffusers.schedulers import (
     DDIMScheduler,
@@ -461,20 +463,18 @@ def __call__(

 for context in global_context:

-    latent_model_input = paddle.concat(x=[latents[:, :, c] for c in context]).transpose(
-        [2, 0, 1, 3, 4]
-    )
+    latent_model_input = paddle.concat(x=[latents[:, :, c] for c in context])
+
     latent_model_input = latent_model_input.tile(
-        ((2 if do_classifier_free_guidance else 1), 1, 1, 1, 1)
+        ((2 if do_classifier_free_guidance else 1, 1, 1, 1, 1))
     )

     latent_model_input = self.scheduler.scale_model_input(latent_model_input, t)

     b, c, f, h, w = latent_model_input.shape

-    latent_pose_input = paddle.concat(x=[pose_fea[:, :, c] for c in context]).transpose(
-        [2, 0, 1, 3, 4]
-    )
+    latent_pose_input = paddle.concat(x=[pose_fea[:, :, c] for c in context])
+
     latent_pose_input = latent_pose_input.tile((2 if do_classifier_free_guidance else 1, 1, 1, 1, 1))

     pred = self.denoising_unet(
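
The added lines build each per-window batch by indexing the frame axis with a list of indices and concatenating along the batch axis, without the earlier `transpose([2, 0, 1, 3, 4])`. The sketch below reproduces only the shape bookkeeping, using NumPy as a stand-in for Paddle. It assumes that list indexing on axis 2 leaves that axis in place, as NumPy does; the diff does not state why the transpose was dropped. The tensor sizes and the `context` windows are hypothetical.

```python
import numpy as np

# Hypothetical sizes: batch=1, channels=4, frames=6, height=2, width=3.
latents = np.arange(1 * 4 * 6 * 2 * 3, dtype=np.float32).reshape(1, 4, 6, 2, 3)

# Two hypothetical overlapping windows of frame indices.
context = [[0, 1, 2, 3], [2, 3, 4, 5]]
do_classifier_free_guidance = True

# Indexing axis 2 with a list keeps the layout (batch, channels, frames, h, w),
# so the windows can be concatenated along the batch axis directly.
latent_model_input = np.concatenate([latents[:, :, c] for c in context])

# Duplicate the batch for the unconditional and conditional passes.
reps = (2 if do_classifier_free_guidance else 1, 1, 1, 1, 1)
latent_model_input = np.tile(latent_model_input, reps)

b, c, f, h, w = latent_model_input.shape
print(b, c, f, h, w)  # 4 4 4 2 3
```

With two windows of four frames each and guidance enabled, the batch axis grows to 4 while the frame axis shrinks to the window length.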
@@ -486,13 +486,8 @@ def __call__(
     )[0]

     for j, c in enumerate(context):
-
-        add_pred_noise = noise_pred[:, :, c].transpose([2, 0, 1, 3, 4]) + pred
-        add_1_conuter = counter[:, :, c].transpose([2, 0, 1, 3, 4]) + 1
-        for index, value_c in enumerate(c):
-
-            noise_pred[:, :, value_c] = add_pred_noise[:, :, index]
-            counter[:, :, value_c] = add_1_conuter[:, :, index]
+        noise_pred[:, :, c] = noise_pred[:, :, c] + pred
+        counter[:, :, c] = counter[:, :, c] + 1

 # perform guidance
 if do_classifier_free_guidance:
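
The simplified loop adds each window's prediction into `noise_pred` and increments `counter` for the frames that window covers. A minimal NumPy sketch of that pattern follows. NumPy stands in for Paddle, the helper name `average_overlapping_windows` and the toy values are hypothetical, and the final division by `counter` is an assumption about how the accumulated sum is used later, since that step is outside this hunk.

```python
import numpy as np


def average_overlapping_windows(num_frames, windows, window_preds):
    """Accumulate per-window predictions and average frames covered more than once."""
    shape = (1, 1, num_frames, 1, 1)  # (batch, channels, frames, height, width)
    noise_pred = np.zeros(shape, dtype=np.float32)
    counter = np.zeros(shape, dtype=np.float32)
    for c, pred in zip(windows, window_preds):
        # Same update pattern as the added lines in the diff.
        noise_pred[:, :, c] = noise_pred[:, :, c] + pred
        counter[:, :, c] = counter[:, :, c] + 1
    # Assumption: the sum is divided by the visit count afterwards.
    # Every frame must be covered by at least one window to avoid dividing by zero.
    return noise_pred / counter


windows = [[0, 1, 2, 3], [2, 3, 4, 5]]
window_preds = [
    np.full((1, 1, 4, 1, 1), 1.0, dtype=np.float32),
    np.full((1, 1, 4, 1, 1), 3.0, dtype=np.float32),
]
averaged = average_overlapping_windows(6, windows, window_preds)
print(averaged.reshape(-1).tolist())  # [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
```

Frames 2 and 3 fall in both windows, so their value is the mean of the two predictions. The list-indexed assignment relies on each window containing no repeated frame index.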
10 changes: 6 additions & 4 deletions ppdiffusers/examples/AnimateAnyone/src/trainer/model.py
@@ -20,12 +20,14 @@
 from omegaconf import OmegaConf
 from paddlenlp.transformers import CLIPVisionModelWithProjection
 from paddlenlp.utils.log import logger
-from src.models.mutual_self_attention import ReferenceAttentionControl
-from src.models.pose_guider import PoseGuider
-from src.models.unet_2d_condition import UNet2DConditionModel
-from src.models.unet_3d import UNet3DConditionModel

 from ppdiffusers import AutoencoderKL, DDIMScheduler
+from ppdiffusers.models.animate_anyone.mutual_self_attention import (
+    ReferenceAttentionControl,
+)
+from ppdiffusers.models.animate_anyone.pose_guider import PoseGuider
+from ppdiffusers.models.animate_anyone.unet_2d_condition import UNet2DConditionModel
+from ppdiffusers.models.animate_anyone.unet_3d import UNet3DConditionModel
 from ppdiffusers.training_utils import freeze_params, unfreeze_params


@@ -17,9 +17,11 @@

 import paddle
 from einops import rearrange
-from src.models.attention import TemporalBasicTransformerBlock

-from .attention import BasicTransformerBlock
+from ppdiffusers.models.animate_anyone.attention import (
+    BasicTransformerBlock,
+    TemporalBasicTransformerBlock,
+)


 def paddle_dfs(model: paddle.nn.Layer):
@@ -15,9 +15,9 @@
 from typing import Tuple

 import paddle
-from src.models.motion_module import zero_module
-from src.models.resnet import InflatedConv3d

+from ppdiffusers.models.animate_anyone.motion_module import zero_module
+from ppdiffusers.models.animate_anyone.resnet import InflatedConv3d
 from ppdiffusers.models.modeling_utils import ContextManagers, ModelMixin
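
Across these files the recurring change is that imports from `src.models.*` now point at `ppdiffusers.models.animate_anyone.*`, while `src.pipelines.*` and `src.utils.*` imports stay as they are. For downstream scripts that still use the old paths, a small rewrite helper along the following lines may be useful. It is a hypothetical sketch, not part of this PR: the module list covers only the names that appear in this diff, and relative imports such as `from .attention import ...` are not handled.

```python
import re

# Modules that this diff imports from the new package location.
MOVED_MODULES = (
    "attention",
    "motion_module",
    "mutual_self_attention",
    "pose_guider",
    "resnet",
    "unet_2d_condition",
    "unet_3d",
)

# Match "src.models.<module>" only for the modules listed above.
_PATTERN = re.compile(r"\bsrc\.models\.(" + "|".join(MOVED_MODULES) + r")\b")


def rewrite_imports(source: str) -> str:
    """Rewrite old example-local module paths to the ppdiffusers package path."""
    return _PATTERN.sub(r"ppdiffusers.models.animate_anyone.\1", source)


old_source = (
    "from src.models.pose_guider import PoseGuider\n"
    "from src.pipelines.context import get_context_scheduler\n"
)
print(rewrite_imports(old_source))
```

Only the first line of `old_source` is rewritten; the `src.pipelines` import is left untouched, matching what the diff does.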

