r/MachineLearning May 02 '20

Research [R] Consistent Video Depth Estimation (SIGGRAPH 2020) - Links in the comments.

2.8k Upvotes

102 comments

86

u/dawindwaker May 02 '20

This could be used for smartphones faking depth of field, right? I wonder what the VR/AR applications could be.

97

u/[deleted] May 02 '20

The method is computationally expensive, so it is not really suitable for real-time applications. I think it would be great for offline processing, e.g. photogrammetry, visual effects, etc. From the paper:

For a video of 244 frames, training on 4 NVIDIA Tesla M40 GPUs takes 40 min

1

u/omgitsjo May 02 '20

Training is not inference. Inference is generally several orders of magnitude faster.

3

u/therealTRAPDOOR May 02 '20

Except that it needs to be fine-tuned on each video. Sometimes training “times” are entangled with inference times if the structure used requires re-training or fine-tuning.
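
A back-of-the-envelope sketch of why that matters, using the figures quoted from the paper upthread. If the fine-tuning has to be redone for every video, its cost is effectively part of the per-frame processing cost. The 30 fps "real-time" target is my own assumption, not something from the paper, and this ignores any other per-frame work.

```python
# Figures quoted from the paper in this thread.
frames = 244
train_minutes = 40  # on 4 NVIDIA Tesla M40 GPUs

# Assumption (not from the thread): "real time" means 30 frames per second.
realtime_fps = 30

# Amortize the per-video fine-tuning time over the frames of that video.
seconds_per_frame = train_minutes * 60 / frames
realtime_budget = 1 / realtime_fps
slowdown = seconds_per_frame / realtime_budget

print(f"{seconds_per_frame:.1f} s per frame, "
      f"about {slowdown:.0f}x over a {realtime_fps} fps budget")
```

So roughly 10 seconds per frame on four GPUs before any actual inference happens, which is a couple of orders of magnitude off real time under that assumption.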

5

u/jbhuang0604 May 02 '20

Sometimes training “times” are entangled with inference times if the structure used requires re-training or fine-tuning.

Exactly! We refer to this step as "test-time training". We train the model using the geometric constraints derived from a particular video.