🎯 FoundationPose — 6D Object Pose Estimation

FoundationPose by NVIDIA is the #1 method on the BOP Challenge 2024 benchmark for model-based 6D localization of unseen objects, achieving an AR score of 73.4 across 7 core datasets (LM-O, T-LESS, TUD-L, IC-BIN, ITODD, HB, YCB-V).

How it works

  1. Initialize: Upload a 3D mesh (.obj/.stl/.ply) of your object and, optionally, reference RGB images
  2. Estimate: Upload a query RGB image (+ optional depth) and the model estimates the full 6D pose
  3. Visualize: See the projected 3D axes and bounding box overlaid on the image

The pose output is a 4×4 transformation matrix (rotation + translation) from object frame to camera frame.
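As a concrete illustration of that output format, the sketch below (all values made up) packs a rotation R and translation t into a 4×4 homogeneous transform and applies it to an object-frame point:

```python
import numpy as np

# Illustrative pose: identity rotation, object 0.5 m in front of the camera.
R = np.eye(3)
t = np.array([0.0, 0.0, 0.5])

# Pack into the 4x4 homogeneous transform (object frame -> camera frame).
T_cam_obj = np.eye(4)
T_cam_obj[:3, :3] = R
T_cam_obj[:3, 3] = t

# Map an object-frame point into the camera frame (homogeneous coordinates).
p_obj = np.array([0.1, 0.0, 0.0, 1.0])
p_cam = T_cam_obj @ p_obj
# x-offset is preserved by the identity rotation; the point sits 0.5 m
# along the camera's z axis.
```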

Metric            Value
BOP AR Score      73.4
BOP Rank (2024)   #1 (model-based unseen)
Paper             CVPR 2024
Input             RGB-D + CAD mesh

📋 Initialization Guide

Required:

  • Object ID: A unique name for your object (e.g., "mug", "wrench")
  • 3D Mesh: Upload an .obj, .stl, or .ply file of the object

Optional but recommended:

  • Reference Images: 1+ RGB images of the object from known viewpoints
  • Camera Intrinsics: Focal lengths (fx, fy) and principal point (cx, cy)

💡 Tip: The default intrinsics work for the bundled test data. For your own images, use the calibration values from your camera.

Camera Intrinsics

fx and fy are the focal lengths in pixels; (cx, cy) is the principal point, also in pixels. Together they define the pinhole projection used to draw the pose overlay on your image.
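Assuming the standard pinhole model, the four values assemble into a 3×3 matrix K that projects camera-frame 3D points to pixel coordinates. A minimal sketch with illustrative calibration values:

```python
import numpy as np

# Illustrative calibration values; substitute your own camera's calibration.
fx, fy, cx, cy = 600.0, 600.0, 320.0, 240.0
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

# Pinhole projection of a camera-frame 3D point (metres, z pointing forward).
p_cam = np.array([0.1, 0.0, 0.5])
uv = K @ p_cam
u, v = uv[0] / uv[2], uv[1] / uv[2]
# 0.1 m of lateral offset at 0.5 m depth lands 120 px right of the
# principal point: (u, v) == (440.0, 240.0)
```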


Built with ❤️ using FoundationPose by NVIDIA and Gradio — Powered by the FoundationPose backend Space