ADR

Action-conditioned disentangled representations for video prediction

(Code is being refactored from a previous version)


[Paper] | [Thesis]

Overview

In this work we address one of the main hurdles in video prediction: predicting object movement. The key insight of our solution is that, in a robotic scenario, the agent's own movement should be easy to predict from its actions, and that suppressing this easily predicted component should allow the model to focus on the harder-to-predict movement of the objects.

With this in mind, we first propose ADR-AO (agent only), which predicts the agent's future pose from a few observed context frames and the agent's future actions, while explicitly ignoring the objects.
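
As a rough illustration, a model of this kind can be read as an encoder over context frames, a recurrent core conditioned on future actions, and a frame decoder. The sketch below is an assumption-laden PyTorch outline, not the repository's implementation; all module names, layer sizes, and tensor shapes are illustrative.

```python
# Illustrative sketch only: module names and shapes are assumptions, not this repo's API.
import torch
import torch.nn as nn

class AgentOnlyPredictorSketch(nn.Module):
    """Predicts agent-only future frames from a few context frames and future actions."""
    def __init__(self, frame_channels=3, action_dim=4, hidden_dim=128):
        super().__init__()
        # Encode the context into an agent/content code (objects are meant to be ignored).
        self.encoder = nn.Sequential(
            nn.Conv2d(frame_channels, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, hidden_dim),
        )
        # Roll the agent state forward in time, conditioned on the future actions.
        self.rnn = nn.LSTMCell(hidden_dim + action_dim, hidden_dim)
        # Decode the predicted agent state back into a (64x64) frame.
        self.decoder = nn.Sequential(
            nn.Linear(hidden_dim, 64 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, frame_channels, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, context_frames, future_actions):
        # context_frames: (B, T_ctx, C, H, W); future_actions: (B, T_fut, action_dim)
        h = self.encoder(context_frames[:, -1])   # agent code from the last context frame
        c = torch.zeros_like(h)
        preds = []
        for t in range(future_actions.size(1)):
            h, c = self.rnn(torch.cat([h, future_actions[:, t]], dim=1), (h, c))
            preds.append(self.decoder(h))          # agent-only frame at step t+1
        return torch.stack(preds, dim=1)
```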

The error between the ground truth frames and the frames generated by ADR-AO produces an image dominated by the objects that move during the video. This provides a cue about the objects obtained in a self-supervised way, without data pre-processing or human annotation.
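
In its simplest form, this cue is just the per-pixel difference between the real frame and the agent-only prediction. The snippet below is a minimal sketch of that computation; the function name is illustrative.

```python
# Minimal sketch of the self-supervised object cue described above.
import torch

def object_error_image(ground_truth, agent_only_pred):
    """Per-pixel absolute error between a real frame and the ADR-AO (agent-only) prediction.

    Regions where objects move are poorly explained by the agent-only model,
    so they dominate this error image, which can then supervise an object encoder.
    """
    return (ground_truth - agent_only_pred).abs()
```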

The error images can be used to learn a representation of the objects. When predicting into the future with ADR-VP, an LSTM receives the content, action, and object representations at time-step t and outputs the object representation for t+1. Predictions can then be fed back into the network, allowing the model to hallucinate the future.
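
The autoregressive rollout described above can be sketched as follows. The interface (an `nn.LSTMCell` over concatenated content, action, and object codes, with the hidden state read out directly as the next object code) is an assumption for illustration, not the repository's actual API.

```python
# Hedged sketch of the ADR-VP rollout loop; names and shapes are illustrative.
import torch
import torch.nn as nn

def rollout_objects(lstm_cell, content_code, actions, obj_code_0):
    """Autoregressively hallucinate object codes for future time-steps.

    Simplifying assumption: lstm_cell.hidden_size equals the object-code dimension,
    so the hidden state is read out directly as the next object representation.
    """
    b = actions.size(0)
    h = torch.zeros(b, lstm_cell.hidden_size)
    c = torch.zeros(b, lstm_cell.hidden_size)
    obj_code = obj_code_0
    preds = []
    for t in range(actions.size(1)):
        # Input at step t: [content, action_t, current object code]
        inp = torch.cat([content_code, actions[:, t], obj_code], dim=1)
        h, c = lstm_cell(inp, (h, c))
        obj_code = h                      # predicted object representation for t+1
        preds.append(obj_code)            # fed back in at the next step
    return torch.stack(preds, dim=1)      # (B, T_fut, D_obj)
```

For example, with a 128-dim content code, 4-dim actions, and 64-dim object codes, the cell would be `nn.LSTMCell(128 + 4 + 64, 64)` in this sketch.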
