🎯 FoundationPose — 6D Object Pose Estimation

FoundationPose by NVIDIA is the #1 method on the BOP Challenge 2024 benchmark for model-based 6D localization of unseen objects, achieving an AR score of 73.4 across 7 core datasets (LM-O, T-LESS, TUD-L, IC-BIN, ITODD, HB, YCB-V).

How it works

  1. Initialize: Upload a 3D mesh (.obj/.stl/.ply) of your object and, optionally, reference RGB images
  2. Estimate: Upload a query RGB image (+ optional depth) and the model estimates the full 6D pose
  3. Visualize: See the projected 3D axes and bounding box overlaid on the image

The pose output is a 4×4 transformation matrix (rotation + translation) from object frame to camera frame.
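As a concrete illustration of that output format, the sketch below (all values made up) packs a rotation R and translation t into a 4×4 homogeneous transform and applies it to an object-frame point:

```python
import numpy as np

# Illustrative pose: identity rotation, object 0.5 m in front of the camera.
R = np.eye(3)
t = np.array([0.0, 0.0, 0.5])

# Pack into the 4x4 homogeneous transform (object frame -> camera frame).
T_cam_obj = np.eye(4)
T_cam_obj[:3, :3] = R
T_cam_obj[:3, 3] = t

# Map an object-frame point into the camera frame (homogeneous coordinates).
p_obj = np.array([0.1, 0.0, 0.0, 1.0])
p_cam = T_cam_obj @ p_obj
# x-offset is preserved by the identity rotation; the point sits 0.5 m
# along the camera's z axis.
```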

Metric            Value
BOP AR Score      73.4
BOP Rank (2024)   #1 (model-based unseen)
Paper             CVPR 2024
Input             RGB-D + CAD mesh

📋 Initialization Guide

Required:

  • Object ID: A unique name for your object (e.g., "mug", "wrench")
  • 3D Mesh: Upload an .obj, .stl, or .ply file of the object

Optional but recommended:

  • Reference Images: 1+ RGB images of the object from known viewpoints
  • Camera Intrinsics: Focal lengths (fx, fy) and principal point (cx, cy)

💡 Tip: The default intrinsics work for the bundled test data. For your own images, use the calibration values from your camera.

Camera Intrinsics

fx and fy are the focal lengths in pixels; (cx, cy) is the principal point, also in pixels. Together they define the pinhole projection used to draw the pose overlay on your image.
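Assuming the standard pinhole model, the four values assemble into a 3×3 matrix K that projects camera-frame 3D points to pixel coordinates. A minimal sketch with illustrative calibration values:

```python
import numpy as np

# Illustrative calibration values; substitute your own camera's calibration.
fx, fy, cx, cy = 600.0, 600.0, 320.0, 240.0
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

# Pinhole projection of a camera-frame 3D point (metres, z pointing forward).
p_cam = np.array([0.1, 0.0, 0.5])
uv = K @ p_cam
u, v = uv[0] / uv[2], uv[1] / uv[2]
# 0.1 m of lateral offset at 0.5 m depth lands 120 px right of the
# principal point: (u, v) == (440.0, 240.0)
```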


Built with ❤️ using FoundationPose by NVIDIA and Gradio — Powered by the FoundationPose backend Space