🎯 FoundationPose — 6D Object Pose Estimation
FoundationPose by NVIDIA is the #1 method on the BOP Challenge 2024 benchmark for model-based 6D localization of unseen objects, achieving an AR score of 73.4 across 7 core datasets (LM-O, T-LESS, TUD-L, IC-BIN, ITODD, HB, YCB-V).
How it works
- Initialize: Upload a 3D mesh (.obj/.stl/.ply) of your object and optionally reference RGB images
- Estimate: Upload a query RGB image (+ optional depth) and the model estimates the full 6D pose
- Visualize: See the projected 3D axes and bounding box overlaid on the image
The pose output is a 4×4 transformation matrix (rotation + translation) from object frame to camera frame.
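The 4×4 transform can be illustrated with a small NumPy sketch (the rotation and translation values below are made up for illustration, not model output):

```python
import numpy as np

# Illustrative 6D pose as a homogeneous 4x4 transform:
# rotation of 90 degrees about the camera Z axis, translation 10 cm right, 50 cm ahead.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
t = np.array([0.10, 0.00, 0.50])

T = np.eye(4)
T[:3, :3] = R   # rotation block
T[:3, 3] = t    # translation column

# Map a point from the object frame into the camera frame.
p_obj = np.array([0.02, 0.0, 0.0, 1.0])  # homogeneous coordinates
p_cam = T @ p_obj
print(p_cam[:3])  # object point expressed in the camera frame
```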
| Metric | Value |
|---|---|
| BOP AR Score | 73.4 |
| BOP Rank (2024) | #1 (model-based unseen) |
| Paper | CVPR 2024 |
| Input | RGB-D + CAD mesh |
📋 Initialization Guide
Required:
- Object ID: A unique name for your object (e.g., "mug", "wrench")
- 3D Mesh: Upload an `.obj`, `.stl`, or `.ply` file of the object
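For reference, the ASCII `.obj` format is plain text: `v` lines list vertex coordinates and `f` lines list faces by 1-based vertex index. The tiny parser below is purely illustrative (not part of the app's backend):

```python
# Minimal illustrative parser for the ASCII .obj layout:
# 'v' lines are vertices, 'f' lines are faces with 1-based vertex indices.
obj_text = """\
v 0.0 0.0 0.0
v 1.0 0.0 0.0
v 0.0 1.0 0.0
f 1 2 3
"""

def parse_obj(text):
    vertices, faces = [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "v":
            vertices.append(tuple(float(x) for x in parts[1:4]))
        elif parts[0] == "f":
            # Keep only the vertex index (before any '/'), converted to 0-based.
            faces.append(tuple(int(p.split("/")[0]) - 1 for p in parts[1:]))
    return vertices, faces

verts, faces = parse_obj(obj_text)
print(len(verts), faces[0])  # 3 vertices, one triangle
```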
Optional but recommended:
- Reference Images: 1+ RGB images of the object from known viewpoints
- Camera Intrinsics: Focal lengths (fx, fy) and principal point (cx, cy)
💡 Tip: The default intrinsics work for the bundled test data. For your own images, use the calibration values from your camera.
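The intrinsics (fx, fy, cx, cy) form the standard pinhole camera matrix K, which maps camera-frame points to pixel coordinates. A quick sketch with example values (these are not the bundled defaults):

```python
import numpy as np

# Pinhole intrinsics -- example values only, not the app's defaults.
fx, fy, cx, cy = 600.0, 600.0, 320.0, 240.0
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

# Project a camera-frame 3D point (meters, Z forward) onto the image plane.
p_cam = np.array([0.10, 0.02, 0.50])
uvw = K @ p_cam
u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]  # divide by depth
print(u, v)  # pixel coordinates of the projected point
```

This is the same projection the visualizer uses conceptually when overlaying the 3D axes and bounding box on your image, which is why wrong intrinsics make the overlay drift.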
Built with ❤️ using FoundationPose by NVIDIA and Gradio — Powered by the FoundationPose backend Space