Action-conditioned disentangled representations for video prediction
(Code is being refactored from previous version)
In this work we tackle one of the main hurdles in video prediction: predicting object movement. The key insight behind our solution is that, in a robotic scenario, the agent's own movement should be easy to predict, and that suppressing that prediction should allow the model to focus on the harder-to-predict movement of the objects.
With this in mind, we first propose ADR-AO (agent only), which predicts the agent's future pose from a few observed context frames and knowledge of the agent's future actions, while explicitly ignoring the objects.
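The sketch below illustrates this agent-only setup under stated assumptions: it is not the repository's actual API, and the class, layer sizes and parameter names are illustrative. It encodes the context frames into an agent-pose latent, rolls that latent forward with the future actions, and decodes agent-only frames, never looking at the objects.

```python
# Hypothetical sketch of an agent-only predictor (ADR-AO style), assuming PyTorch.
import torch
import torch.nn as nn

class AgentOnlyPredictor(nn.Module):
    def __init__(self, frame_channels=3, pose_dim=64, action_dim=4, hidden_dim=128):
        super().__init__()
        # Encode each context frame into a compact agent-pose latent.
        self.encoder = nn.Sequential(
            nn.Conv2d(frame_channels, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, pose_dim),
        )
        # Recurrent core: advance the pose latent using the future action.
        self.rnn = nn.LSTMCell(pose_dim + action_dim, hidden_dim)
        self.to_pose = nn.Linear(hidden_dim, pose_dim)
        # Decode a pose latent back into a frame containing only the agent.
        self.decoder = nn.Sequential(
            nn.Linear(pose_dim, 64 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, frame_channels, 4, stride=2, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, context_frames, future_actions):
        # context_frames: (B, T_ctx, C, H, W); future_actions: (B, T_fut, action_dim)
        B, T_ctx = context_frames.shape[:2]
        # Average per-frame pose latents over the context window.
        pose = self.encoder(context_frames.flatten(0, 1)).view(B, T_ctx, -1).mean(1)
        h = torch.zeros(B, self.rnn.hidden_size, device=pose.device)
        c = torch.zeros_like(h)
        frames = []
        for t in range(future_actions.shape[1]):
            h, c = self.rnn(torch.cat([pose, future_actions[:, t]], dim=-1), (h, c))
            pose = self.to_pose(h)
            frames.append(self.decoder(pose))
        return torch.stack(frames, dim=1)  # predicted agent-only frames
```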
The error between the ground-truth frames and the frames generated by ADR-AO yields images dominated by the objects that move during the video. This provides a cue about the objects obtained in a self-supervised way, with no data pre-processing or human annotation.
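A minimal sketch of this cue, assuming frames are tensors in [0, 1] (function and variable names are illustrative, not the repository's): the per-pixel residual between the ground truth and the agent-only reconstruction cancels out the agent and background, leaving mostly object pixels.

```python
# Hypothetical sketch: self-supervised object cue from ADR-AO reconstruction error.
import torch

def object_error_images(gt_frames, agent_only_frames):
    """gt_frames, agent_only_frames: (B, T, C, H, W) in [0, 1]."""
    # Absolute residual; regions explained by the agent-only model cancel out,
    # so the result is dominated by the moving objects.
    return (gt_frames - agent_only_frames).abs()
```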
These error images can then be used to learn a representation of the objects. When predicting into the future with ADR-VP (video prediction), an LSTM receives the content, action and object representations at time-step t and outputs the object representation for t+1. Predictions can then be fed back into the network, allowing the model to hallucinate the future.
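The rollout loop could look like the sketch below, again under stated assumptions (PyTorch, illustrative dimensions and names rather than the repository's): an LSTM cell maps the concatenated content, action and object representations at step t to the object representation at t+1, which is fed back in at the next step.

```python
# Hypothetical sketch of the ADR-VP-style autoregressive rollout.
import torch
import torch.nn as nn

class ObjectDynamics(nn.Module):
    def __init__(self, content_dim=128, action_dim=4, object_dim=64, hidden_dim=256):
        super().__init__()
        self.cell = nn.LSTMCell(content_dim + action_dim + object_dim, hidden_dim)
        self.to_object = nn.Linear(hidden_dim, object_dim)

    def rollout(self, content, actions, object_0):
        """content: (B, content_dim), actions: (B, T, action_dim),
        object_0: (B, object_dim). Returns object predictions (B, T, object_dim)."""
        B = content.shape[0]
        h = torch.zeros(B, self.cell.hidden_size, device=content.device)
        c = torch.zeros_like(h)
        obj, preds = object_0, []
        for t in range(actions.shape[1]):
            x = torch.cat([content, actions[:, t], obj], dim=-1)
            h, c = self.cell(x, (h, c))
            obj = self.to_object(h)   # predicted object representation at t+1
            preds.append(obj)         # fed back in at the next step
        return torch.stack(preds, dim=1)
```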