Zehan Wang1* · Siyu Chen1* · Lihe Yang2
Jialei Wang1 · Ziang Zhang1 · Hengshuang Zhao2 · Zhou Zhao1
1ZJU 2HKU
This work presents Prior Depth Anything, a framework that combines incomplete but precise metric information from depth measurement with relative but complete geometric structures from depth prediction, generating accurate, dense, and detailed metric depth maps for any scene.
- 2025-08-30: We release our newly trained model `prior_depth_anything_1_1.pth`, which replaces the error map with a sparse mask as the condition. `priorda-v1.1` shows better performance on dense patterns without any modification of the network structure. Our additional evaluation results are shown below. We also fixed several bugs mentioned in the issues.
- 2025-05-28: We provide the code to measure inference latency; simply run `python latency.py`.
- 2025-05-21: We provide one more way to input your own geometric prior (to minimize code changes, we use `geometric` to represent the geometric prior).
- 2025-05-15: We released the Paper, Project Page, Code, and Models.
We provide two model checkpoints of varying scales for robust metric depth completion. The checkpoints will be downloaded when you first run our demo.
First, clone this repository and create an environment with Python 3.9:

```bash
git clone https://github.com/SpatialVision/Prior-Depth-Anything
cd Prior-Depth-Anything
conda create -n priorda python=3.9
conda activate priorda
```
Then, install the dependencies with the following command. If the installed `torch_cluster` is not built for CUDA, install it with `pip install torch_cluster==1.6.3 -f https://pytorch-geometric.com/whl/torch-2.2.2+cu121.html` instead to get the GPU build (refer to this issue).

```bash
pip install -r requirements.txt
```
Alternatively, you can install `Prior-Depth-Anything` as a package:

```bash
pip install -e .
```
To run with the CLI, you can begin with the following command (installing `Prior-Depth-Anything` as a package is required). On the initial execution, the model weights will be automatically downloaded from the Hugging Face Model Hub.

```bash
# We sample from the ground-truth depth map as the prior.
priorda test --image_path assets/sample-1/rgb.jpg --prior_path assets/sample-1/gt_depth.png --pattern 1000 --visualize 1
```
Alternatively, you can use our model with:

```python
import torch
from prior_depth_anything import PriorDepthAnything

device = "cuda:0" if torch.cuda.is_available() else "cpu"
priorda = PriorDepthAnything(device=device)

image_path = 'assets/sample-2/rgb.jpg'
prior_path = 'assets/sample-2/prior_depth.png'
output = priorda.infer_one_sample(image=image_path, prior=prior_path, visualize=True)
```

The results will be saved to `./output`.
Another example, with `.npy` inputs:

```python
import torch
from prior_depth_anything import PriorDepthAnything

device = "cuda:0" if torch.cuda.is_available() else "cpu"
priorda = PriorDepthAnything(device=device)

image_path = 'assets/sample-6/rgb.npy'
prior_path = 'assets/sample-6/prior_depth.npy'
output = priorda.infer_one_sample(image=image_path, prior=prior_path, visualize=True)
```
To facilitate further research, we offer an interface to the first-stage 'coarse alignment' process. You can generate the coarse-aligned depth map simply by setting the `coarse_only` option. For example:

```python
priorda = PriorDepthAnything(device=device, coarse_only=True)
```
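A minimal end-to-end sketch of the coarse-only interface, reusing the sample paths from above (the exact structure of `output` is not shown here):

```python
import torch
from prior_depth_anything import PriorDepthAnything

device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Build the model with only the first-stage coarse alignment enabled.
priorda = PriorDepthAnything(device=device, coarse_only=True)

# Run on a sample; the result is the coarse-aligned depth map.
output = priorda.infer_one_sample(
    image='assets/sample-2/rgb.jpg',
    prior='assets/sample-2/prior_depth.png',
    visualize=True,
)
```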
We provide two ways to utilize geometric information from other depth estimation models:

- Replace the depth estimation model in the coarse stage here.
- Pass the geometric depth directly to `infer_one_sample()` via the `geometric` argument (refer to Inference Configurations), as shown in the sketch below. For the CLI example, add `--geometric_path assets/sample-1/geo_depth.npy`.
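A minimal sketch of the second option in Python, assuming `geometric` takes a file path just as `prior` does (see Inference Configurations below):

```python
import torch
from prior_depth_anything import PriorDepthAnything

device = "cuda:0" if torch.cuda.is_available() else "cpu"
priorda = PriorDepthAnything(device=device)

# Supply a pre-computed geometric depth from another depth estimation model.
output = priorda.infer_one_sample(
    image='assets/sample-1/rgb.jpg',
    prior='assets/sample-1/gt_depth.png',
    geometric='assets/sample-1/geo_depth.npy',
    visualize=True,
)
```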
Model configurations:

- `mde_dir`: Directory of the monocular depth model backbone.
- `ckpt_dir`: Directory of the fine-stage model fine-tuned by us.
- `frozen_model_size`: Size of the coarse-stage model (choices: `vits`, `vitb`, `vitl`).
- `conditioned_model_size`: Size of the fine-stage model (choices: `vits`, `vitb`, `vitl` (coming soon...)).
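A hedged sketch of these model configurations, assuming they are passed as keyword arguments to the `PriorDepthAnything` constructor (the directory paths below are placeholders, not shipped defaults):

```python
import torch
from prior_depth_anything import PriorDepthAnything

device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Placeholder directories; point these at your local checkpoints.
priorda = PriorDepthAnything(
    device=device,
    mde_dir='/path/to/depth-backbone',       # placeholder path to the monocular depth backbone
    ckpt_dir='/path/to/fine-stage-ckpt',     # placeholder path to the fine-stage checkpoint
    frozen_model_size='vitb',                # coarse-stage model size
    conditioned_model_size='vits',           # fine-stage model size
)
```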
Inference configurations:

- `image`: Path to the input image.
- `prior`: Path to the prior depth.
- `geometric`: Path to the geometric depth estimated by other depth estimation models.
- `pattern`: Pattern used to sample from the prior.
- `double_global` (bool): Whether to use double globally-aligned conditions.
- `visualize` (bool): Whether to visualize the results.
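For example, `pattern` and `double_global` can be combined with the arguments above in a single call (a sketch; `pattern=1000` mirrors the `--pattern 1000` CLI flag, and the exact accepted types may differ):

```python
import torch
from prior_depth_anything import PriorDepthAnything

device = "cuda:0" if torch.cuda.is_available() else "cpu"
priorda = PriorDepthAnything(device=device)

output = priorda.infer_one_sample(
    image='assets/sample-1/rgb.jpg',
    prior='assets/sample-1/gt_depth.png',
    pattern=1000,          # sample 1000 points from the prior, as in the CLI example
    double_global=False,   # whether to use double globally-aligned conditions
    visualize=True,
)
```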
`Prior-Depth-Anything` shows remarkable zero-shot robustness to varied and potentially noisy prior inputs. It is designed as a plug-and-play module for other depth estimation frameworks: integrating it requires only a few lines of code and improves depth estimation accuracy.
Set up the environment as in Preparation. Here, we use VGGT as an example.

First, use the VGGT model to predict the depth map. The code below is from the official VGGT website.
```python
import torch
from vggt.models.vggt import VGGT
from vggt.utils.load_fn import load_and_preprocess_images

device = "cuda" if torch.cuda.is_available() else "cpu"
# bfloat16 is supported on Ampere GPUs (Compute Capability 8.0+)
dtype = torch.bfloat16 if torch.cuda.get_device_capability()[0] >= 8 else torch.float16

# Initialize the model and load the pretrained weights.
# This will automatically download the model weights the first time it's run, which may take a while.
model = VGGT.from_pretrained("facebook/VGGT-1B").to(device)

# Load and preprocess example images (replace with your own image paths)
image_names = ['assets/sample-2/rgb.jpg']
images = load_and_preprocess_images(image_names).to(device)

with torch.no_grad():
    with torch.cuda.amp.autocast(dtype=dtype):
        # Predict attributes including cameras, depth maps, and point maps.
        predictions = model(images)
```
Then, use `Prior-Depth-Anything` to refine the depth map.
```python
# Initialize the prior-depth-anything refiner module.
from PIL import Image
import numpy as np
import torch.nn.functional as F
from prior_depth_anything.plugin import PriorDARefiner, PriorDARefinerMetrics

Refiner = PriorDARefiner(device=device)

# Reload the RGB image for the refiner.
priorda_image = torch.from_numpy(np.asarray(Image.open(image_names[0])).astype(np.uint8))

### Refine depth
depth_map, depth_conf = predictions['depth'], predictions['depth_conf']
refined_depth, meview_depth_map = Refiner.predict(
    image=priorda_image, depth_map=depth_map.squeeze(), confidence=depth_conf.squeeze())
# The size of `refined_depth` matches `priorda_image`; adjust it to your needs.
```
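If you need the refined depth at the resolution of VGGT's prediction rather than at the original image size, a minimal sketch continuing from the snippet above (assuming `refined_depth` is a 2-D `(H, W)` tensor; adjust the indexing otherwise):

```python
# Resize the refined depth to match VGGT's predicted depth map.
target_hw = depth_map.squeeze().shape[-2:]
refined_resized = F.interpolate(
    refined_depth[None, None].float(),  # add batch and channel dimensions
    size=target_hw,
    mode='bilinear',
    align_corners=False,
)[0, 0]
```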
We provide a complete example here, including a performance comparison between the original depth map and the refined depth map. For quantitative evaluation results, please refer to our paper.
In addition to the results in our paper, we evaluate our v1.1 model and list the results below.
We would like to express our sincere gratitude to the following excellent works:
If you find this project useful, please consider citing:
```bibtex
@misc{wang2025depthprior,
      title={Depth Anything with Any Prior},
      author={Zehan Wang and Siyu Chen and Lihe Yang and Jialei Wang and Ziang Zhang and Hengshuang Zhao and Zhou Zhao},
      year={2025},
      eprint={2505.10565},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2505.10565},
}
```