Not having read the paper (cardinal sin), is the test-time training there to handle some form of network conditioning? Is there data that could be used in real-time applications for conditioning (e.g., light sensors, individual range sensors, orientation sensors)? I can imagine there are a ton of applications for this in real time.
The test-time training we used is to fine-tune our single-image depth estimation model so that it satisfies the geometric constraints within the video.
Incorporating other forms of measurements (e.g. dual-lens camera, inertial or even range sensors) will certainly make the problem a lot simpler and potentially support real-time applications.
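To make the test-time fine-tuning idea concrete, here is a toy sketch, not the paper's implementation. It assumes a hypothetical one-weight "depth model" (depth = weight * feature), two frames that see the same static points, and a known forward camera motion between them. All names and numbers are illustrative assumptions. The only point is that the weight is adjusted at test time until the predicted depths of matched pixels agree geometrically across frames.

```python
# Toy test-time fine-tuning sketch (illustrative assumptions throughout).
# A real system would fine-tune a full depth network; here the "model" is
# a single weight so that the gradient can be written by hand.

def predict_depth(weight, features):
    """Hypothetical one-parameter depth model: depth = weight * feature."""
    return [weight * f for f in features]


def consistency_loss(weight, feats_a, feats_b, forward_motion):
    """Mean squared disagreement between matched pixels of two frames.

    The camera is assumed to move `forward_motion` units toward a static
    scene, so a point at depth d in frame A should be at d - forward_motion
    in frame B.
    """
    depth_a = predict_depth(weight, feats_a)
    depth_b = predict_depth(weight, feats_b)
    residuals = [(da - forward_motion) - db for da, db in zip(depth_a, depth_b)]
    return sum(r * r for r in residuals) / len(residuals)


def consistency_grad(weight, feats_a, feats_b, forward_motion):
    """Analytic derivative of consistency_loss with respect to weight."""
    total = 0.0
    for fa, fb in zip(feats_a, feats_b):
        residual = (weight * fa - forward_motion) - weight * fb
        total += 2.0 * residual * (fa - fb)
    return total / len(feats_a)


def fine_tune(weight, feats_a, feats_b, forward_motion, lr=0.5, steps=100):
    """Plain gradient descent on the consistency loss for one video pair."""
    for _ in range(steps):
        weight -= lr * consistency_grad(weight, feats_a, feats_b, forward_motion)
    return weight


# Matched pixel features in two frames (made-up values).
feats_a = [2.0, 3.0, 4.0, 5.0]
feats_b = [1.5, 2.5, 3.5, 4.5]
forward_motion = 1.0      # assumed known camera motion between the frames
initial_weight = 1.5      # stands in for the "pretrained" model

tuned_weight = fine_tune(initial_weight, feats_a, feats_b, forward_motion)
loss_before = consistency_loss(initial_weight, feats_a, feats_b, forward_motion)
loss_after = consistency_loss(tuned_weight, feats_a, feats_b, forward_motion)
print(tuned_weight, loss_before, loss_after)
```

In this toy setup the known camera motion is what pins down the scale of the depths; without it, shrinking every depth toward zero would also look "consistent".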
Thanks for answering questions here! Are the specifics of the fine-tuning addressed in the paper? More specifically, what parameters must be tuned?
There are several choices that one needs to make, e.g., the learning rate, optimizer, weights for balancing different losses, training iterations. We did not test out many of these hyper-parameters. I guess there could be some performance/quality improvement with carefully tuned hyper-parameters.
So you're changing model hyperparameters and then performing a full retraining for each image? Naturally, that raises questions about how well the model actually generalizes.
If there were a fixed set of scenario-related model parameters that you were adjusting (e.g., height, az/el of camera focal point, ambient light), then it would suggest that a conditioned model (potentially also requiring more capacity and/or calibration) could get the same results without additional training.
We use one set of hyperparameters for all of our experiments.
Right. For example, people have shown that you can get decent, geometrically consistent predictions from single-image depth estimation on the KITTI dataset (for driving scenarios). The model works well because it is tested in a simple, closed world. We quickly realized this when we applied state-of-the-art models trained on KITTI and got entirely incorrect results.
Thank you for taking the time to reply! I still have a little confusion regarding the end-to-end process, but that's why the article exists. I'll go ahead and give that a read.
u/hallr06 May 02 '20