BANMo

[Project page] [Paper] [Colab for NVS]

This repo provides scripts to reproduce experiments in the paper. For the latest updates on the software, please check out lab4d.

Changelog

  • 11/21: Remove eikonal loss to align with paper results, #36
  • 08/09: Fix eikonal loss that regularizes surface (resulting in smoother mesh).
  • 06/18: Add a colab demo for novel view synthesis.
  • 04/11: Replace matching loss with feature rendering loss; fix bugs in LBS; stabilize optimization.
  • 03/20: Add mesh color option (canonical mapping vs radiance) during surface extraction. See --ce_color flag.
  • 02/23: Improve NVS with fourier light code, improve uncertainty MLP, add long schedule, minor speed up.
  • 02/17: Add adaptation to a new video, optimization with known root poses, and pose code visualization.
  • 02/15: Add motion-retargeting, quantitative evaluation and synthetic data generation/eval.

Install

Build with conda

We provide two versions.

[A. torch1.10+cu113 (1.4x faster on V100)]
# clone repo
git clone git@github.com:facebookresearch/banmo.git --recursive
cd banmo
# install conda env
conda env create -f misc/banmo-cu113.yml
conda activate banmo-cu113
# install pytorch3d (takes minutes), kmeans-pytorch
pip install -e third_party/pytorch3d
pip install -e third_party/kmeans_pytorch
# install detectron2
python -m pip install detectron2 -f \
  https://dl.fbaipublicfiles.com/detectron2/wheels/cu113/torch1.10/index.html
[B. torch1.7+cu110]
# clone repo
git clone git@github.com:facebookresearch/banmo.git --recursive
cd banmo
# install conda env
conda env create -f misc/banmo.yml
conda activate banmo
# install kmeans-pytorch
pip install -e third_party/kmeans_pytorch
# install detectron2
python -m pip install detectron2 -f \
  https://dl.fbaipublicfiles.com/detectron2/wheels/cu110/torch1.7/index.html

Data

We provide two ways to obtain data. The easiest way is to download and unzip the pre-processed data as follows.

[Download pre-processed data]

We provide pre-processed data for the cat and human sequences. Download the pre-processed rgb/mask/flow/densepose images as follows:

# (~8G for each)
bash misc/processed/download.sh cat-pikachiu
bash misc/processed/download.sh human-cap
[Download raw videos]

Download raw videos to the ./raw/ folder:

bash misc/vid/download.sh cat-pikachiu
bash misc/vid/download.sh human-cap
bash misc/vid/download.sh dog-tetres
bash misc/vid/download.sh cat-coco

To use your own videos, or pre-process raw videos into banmo format, please follow the instructions here.

PoseNet weights


Download pre-trained PoseNet weights for human and quadrupeds

mkdir -p mesh_material/posenet && cd "$_"
wget $(cat ../../misc/posenet.txt); cd ../../

Demo

This example shows how to reconstruct a cat from 11 videos and a human from 10 videos. For more examples, see here.

Hardware/time for running the demo

The short schedule takes 4 hours on 2 V100 GPUs (+SSD storage). To reach higher quality, the full schedule takes 12 hours. We provide a script that uses gradient accumulation to support experiments on fewer GPUs or GPUs with less memory.

Setting good hyper-parameters for videos of various lengths

When optimizing videos of different lengths, we found it useful to scale the batch size with the number of frames. A rule of thumb is to set "num gpus" x "batch size" x "accu steps" ~= num frames. This means more video frames need more GPU memory, but the optimization time stays the same.
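As a minimal sketch of this rule of thumb (the 1000-frame count and 2-GPU setup below are assumed example numbers; 256 appears to match the b256 in the logdir names used later):

# pick accu steps so that num_gpus x batch_size x accu_steps roughly covers all frames
num_frames=1000; num_gpus=2; batch_size=256
accu_steps=$(( (num_frames + num_gpus*batch_size - 1) / (num_gpus*batch_size) ))
echo "accu_steps=$accu_steps"  # -> 2, since 2 x 256 x 2 = 1024 ~= 1000 frames

How the accumulation steps are actually passed to the training script depends on scripts/template.sh; check the script for the corresponding flag.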

Try pre-optimized models

We provide pre-optimized models and scripts to run novel view synthesis and mesh extraction (results saved at tmp/*all.mp4). Also see this Colab for NVS.

# download pre-optimized models
mkdir -p tmp && cd "$_"
wget https://www.dropbox.com/s/qzwuqxp0mzdot6c/cat-pikachiu.npy
wget https://www.dropbox.com/s/dnob0r8zzjbn28a/cat-pikachiu.pth
wget https://www.dropbox.com/s/p74aaeusprbve1z/opts.log # flags used at opt time
cd ../

seqname=cat-pikachiu
# render novel views
bash scripts/render_nvs.sh 0 $seqname tmp/cat-pikachiu.pth 5 0
# argv[1]: gpu id
# argv[2]: sequence name
# argv[3]: path to the weights
# argv[4]: video id used for pose traj
# argv[5]: video id used for root traj

# Extract articulated meshes and render
bash scripts/render_mgpu.sh 0 $seqname tmp/cat-pikachiu.pth \
        "0 5" 64
# argv[1]: gpu id
# argv[2]: sequence name
# argv[3]: weights path
# argv[4]: video id separated by space
# argv[5]: resolution of running marching cubes (use 256 to get higher-res mesh)

1. Optimization

[cat-pikachiu]
seqname=cat-pikachiu
# To speed up data loading, we store images as lines of pixels.
# This only needs to be run once per sequence; the processed data are stored on disk.
python preprocess/img2lines.py --seqname $seqname

# Optimization
bash scripts/template.sh 0,1 $seqname 10001 "no" "no"
# argv[1]: gpu ids separated by comma
# argv[2]: sequence name
# argv[3]: port for distributed training
# argv[4]: use_human, pass "" for human cse, "no" for quadruped cse
# argv[5]: use_symm, pass "" to force x-symmetric shape

# Extract articulated meshes and render
bash scripts/render_mgpu.sh 0 $seqname logdir/$seqname-e120-b256-ft2/params_latest.pth \
        "0 1 2 3 4 5 6 7 8 9 10" 256
# argv[1]: gpu id
# argv[2]: sequence name
# argv[3]: weights path
# argv[4]: video id separated by space
# argv[5]: resolution of running marching cubes (256 by default)
cat-pikachiu-.0.-all.mp4
[human-cap]
seqname=adult7
python preprocess/img2lines.py --seqname $seqname
bash scripts/template.sh 0,1 $seqname 10001 "" ""
bash scripts/render_mgpu.sh 0 $seqname logdir/$seqname-e120-b256-ft2/params_latest.pth \
        "0 1 2 3 4 5 6 7 8 9" 256
adult7-.8.-all.mp4

2. Visualization tools

[Tensorboard]
# You may need to set up ssh tunneling to view the tensorboard monitor locally.
screen -dmS "tensorboard" bash -c "tensorboard --logdir=logdir --bind_all"
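For the ssh tunneling mentioned above, a minimal sketch (assuming TensorBoard's default port 6006; user@remote-host is a placeholder for your training machine):

# run on your local machine
ssh -N -L 6006:localhost:6006 user@remote-host
# then open http://localhost:6006 in a local browser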
[Root pose, rest mesh, bones]

To draw root pose trajectories (+rest shape) over epochs

# logdir
logdir=logdir/$seqname-e120-b256-init/
# first_idx and last_idx specify which frames to draw
python scripts/visualize/render_root.py --testdir $logdir --first_idx 0 --last_idx 120

Find the output at $logdir/mesh-cam.gif. During optimization, the rest mesh and bones at each epoch are saved at $logdir/*rest.obj.

pose-20.mp4
[Correspondence/pose code]

To visualize 2d-2d and 2d-3d matchings of the latest epoch weights

# 2d matches between frame 0 and 100 via 2d->feature matching->3d->geometric warp->2d
bash scripts/render_match.sh $logdir/params_latest.pth "0 100" "--render_size 128"

2d-2d matches will be saved to tmp/match_%03d.jpg. 2d-3d feature matches of frame 0 will be saved to tmp/match_line_pred.obj. 2d-3d geometric warps of frame 0 will be saved to tmp/match_line_exp.obj. The near plane of frame 0 will be saved to tmp/match_plane.obj. The pose code visualization will be saved to tmp/code.mp4.

pose-code.mp4
[Render novel views]

Render novel views in the canonical camera coordinate frame

bash scripts/render_nvs.sh 0 $seqname logdir/$seqname-e120-b256-ft2/params_latest.pth 5 0
# argv[1]: gpu id
# argv[2]: sequence name
# argv[3]: path to the weights
# argv[4]: video id used for pose traj
# argv[5]: video id used for root traj

Results will be saved at logdir/$seqname-e120-b256-ft2/nvs*.mp4.

nvs-pikachiu.mp4
[Render canonical view over iterations]

Render depth and color of the canonical view over optimization iterations

bash scripts/visualize/nvs_iter.sh 0 logdir/$seqname-e120-b256-init/
# argv[1]: gpu id
# argv[2]: path to the logdir

Results will be saved at logdir/$seqname-e120-b256-init/vis-iter*.mp4.

cat-pikachiu-vis-iter-iter-dph.mp4
cat-pikachiu-vis-iter-iter-rgb.mp4

Common install issues

  • Q: pyrender reports ImportError: Library "GLU" not found.
    • Install it with sudo apt install freeglut3-dev
  • Q: ffmpeg reports libopenh264.so.5 not found
    • Reinstall ffmpeg in conda: conda install -c conda-forge ffmpeg

Note on arguments

  • use --use_human for human reconstruction, otherwise it assumes quadruped animals
  • use --full_mesh to disable visibility check at mesh extraction time
  • use --noce_color at mesh extraction time to assign radiance instead of canonical mapping as vertex colors.
  • use --queryfw at mesh extraction time to extract forward articulated meshes, which only needs to run marching cubes once.
  • use --use_cc to keep the largest connected component of the rest mesh in order to set the object bounds and near-far plane (turned on by default). Turn it off with --nouse_cc for disconnected objects such as hands.
  • use --debug to print out the rough time each component takes.

Acknowledgement


Volume rendering code is borrowed from Nerf_pl. Flow estimation code is adapted from VCN-robust. Other external repos:

License

