refine project structure

Ethan Clark 2025-03-21 17:00:58 -07:00
parent ae1abb1db1
commit aae4e91f54
11 changed files with 167 additions and 2894 deletions

.gitignore (vendored, 34 changes)

@@ -1,8 +1,32 @@
**__pycache__**
**.vscode**
**recordings**
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
# Project specific
recordings/
*.txt
repomix-output.xml
# IDE
.vscode/
.idea/
*.swp
*~
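
As a quick sanity check of the new patterns, `git check-ignore -v` reports which rule (if any) matches a given path; the example paths below are hypothetical:

```bash
# Prints the matching .gitignore rule for each path that is ignored
git check-ignore -v recordings/demo.mp4 notes.txt build/lib/module.py
```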

.pre-commit-config.yaml (new file, 26 changes)

@@ -0,0 +1,26 @@
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-added-large-files
  - repo: https://github.com/psf/black
    rev: 23.3.0
    hooks:
      - id: black
        language_version: python3
  - repo: https://github.com/pycqa/isort
    rev: 5.12.0
    hooks:
      - id: isort
        args: ["--profile", "black"]
  - repo: https://github.com/pycqa/flake8
    rev: 6.0.0
    hooks:
      - id: flake8
        additional_dependencies: [flake8-docstrings]
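
For contributors, these hooks are typically activated with pre-commit's standard workflow; a minimal sketch, assuming `pre-commit` is installed into the project's environment:

```bash
# Register the hooks defined in .pre-commit-config.yaml with the local git repo
pip install pre-commit
pre-commit install

# Optionally run every hook once against the entire repository
pre-commit run --all-files
```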

(license file; name not shown in this view)

@@ -198,4 +198,4 @@
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
limitations under the License.

README.md (105 changes)

@@ -0,0 +1,105 @@
# Open Phantom: Training Robots Using Only Human Videos
## Overview
Open Phantom is a fully open-source implementation of the approach described in the paper "[Phantom: Training Robots Without Robots Using Only Human Videos](https://phantom-human-videos.github.io/)." This project focuses on the data collection component of the Phantom pipeline, enabling anyone with a standard RGB camera to generate training data for robot learning without requiring actual robot hardware.
## Key Features
- **Camera-Only Data Collection**: Capture hand movements using any standard RGB camera
- **3D Hand Tracking**: Convert 2D video to 3D hand poses using MediaPipe landmarks
- **Advanced Depth Estimation**: Generate depth maps from monocular RGB input using ml-depth-pro
- **Hand Segmentation**: Precisely isolate hand regions with Meta's SAM2 for better depth estimation
- **ICP Registration**: Align hand mesh with depth point cloud for improved 3D accuracy (a brief sketch follows this list)
- **Anatomical Constraints**: Apply natural hand constraints to ensure realistic movements
- **Robot Action Extraction**: Transform hand poses into robot control parameters (position, orientation, gripper width)
- **Visualization Pipeline**: Debug-friendly visualization of each processing stage
- **Commercial-Friendly**: Built entirely with open-source, commercially usable components
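
The ICP registration step referenced above can be illustrated with Open3D's point-to-point ICP. This is a minimal sketch, not the project's actual implementation: Open3D is used purely for illustration, and `hand_mesh_points` / `depth_points` are hypothetical stand-ins for the reconstructed hand mesh and the back-projected depth map.

```python
import numpy as np
import open3d as o3d

# Hypothetical inputs: N x 3 point sets, e.g. sampled from the hand mesh and
# back-projected from the estimated depth map (units assumed to be metres).
hand_mesh_points = np.random.rand(500, 3)
depth_points = np.random.rand(500, 3)

source = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(hand_mesh_points))
target = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(depth_points))

# Point-to-point ICP: estimate the rigid transform that aligns source to target
result = o3d.pipelines.registration.registration_icp(
    source,
    target,
    max_correspondence_distance=0.02,  # tune to the scale of the depth estimate
    init=np.eye(4),
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint(),
)
source.transform(result.transformation)  # hand mesh now aligned with the depth cloud
print("ICP fitness:", result.fitness)
```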
## Project Status
⚠️ **Work in Progress**: Open Phantom is currently under active development. Core functionality is implemented, but the codebase is still being refined and tested. We welcome early adopters and contributors to help improve the project.
Known limitations:
* ICP registration still being optimized for better alignment
* Depth estimation quality varies with lighting conditions
* Performance optimizations needed for real-time processing
## Background
The original Phantom paper demonstrated that robots could learn tasks from human demonstrations without any robot-specific data collection. By capturing hand movements in diverse environments and converting them to robot action parameters, it's possible to train robot policies that perform effectively during zero-shot deployment.
Unlike the original implementation, which relies on [MANO](https://mano.is.tue.mpg.de/index.html) (a hand model not available for commercial use), Open Phantom is built entirely with open-source components that can be used in commercial applications.
## How It Works
1. **Video Capture**: Record video of your hand performing a task using a standard RGB camera
2. **Hand Tracking**: Track hand landmarks in the video (sketched after this list)
3. **Depth Estimation**: Estimate depth information from the monocular RGB input
4. **Segmentation**: Segment the hand using SAM2 (Segment Anything Model 2)
5. **3D Reconstruction**: Create a 3D hand model from the landmarks and depth information
6. **Robot Parameters**: Extract position, orientation, and gripper parameters for robot control
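
A rough illustration of the hand-tracking step (step 2): MediaPipe's Hands solution yields 21 normalized landmarks per frame, from which simple quantities such as a gripper-width proxy can be derived. Treat this as a standalone sketch rather than the project's actual module layout.

```python
import cv2
import mediapipe as mp

cap = cv2.VideoCapture(0)  # any standard RGB camera
with mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.5) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV captures BGR frames
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            hand = results.multi_hand_landmarks[0]
            # 21 landmarks: x/y normalized to the image, z relative to the wrist
            wrist = hand.landmark[0]
            thumb_tip, index_tip = hand.landmark[4], hand.landmark[8]
            # Crude gripper-width proxy: thumb-tip to index-tip distance (normalized units)
            width = ((thumb_tip.x - index_tip.x) ** 2 + (thumb_tip.y - index_tip.y) ** 2) ** 0.5
            print(f"wrist=({wrist.x:.2f}, {wrist.y:.2f}, {wrist.z:.2f})  gripper~{width:.2f}")
cap.release()
```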
## Installation
```bash
# Clone the repository (submodules are initialized below)
git clone https://github.com/yourusername/open-phantom.git
cd open-phantom
# Create and activate conda environment
conda env create -f environment.yml
conda activate open-phantom
# Initialize and update submodules
git submodule update --init --recursive
# Install dependencies for SAM2
cd external/sam2
pip install -e .
cd ../..
# Install dependencies for ML-Depth-Pro
cd external/ml-depth-pro
pip install -e .
cd ../..
```
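
A quick way to confirm that both editable installs succeeded; the import names `sam2` and `depth_pro` are assumed from the upstream repositories:

```bash
# Should print the message without an ImportError if both packages installed correctly
python -c "import sam2, depth_pro; print('SAM2 and Depth Pro import OK')"
```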
## Usage
```bash
# Run the main script to record and process a video
python open_phantom/main.py
```
## Contributing
We welcome contributions from the community! This project is intended as a resource for researchers and developers interested in robot learning from human demonstrations. Whether you're improving hand tracking, refining depth estimation, or adding new features, your contributions help advance the goal of more accessible robot learning.
## Citation
If you use Open Phantom in your research, please cite the original Phantom paper:
<pre>
@article{lepert2025phantom,
title={Phantom: Training Robots Without Robots Using Only Human Videos},
author={Lepert, Marion and Fang, Jiaying and Bohg, Jeannette},
journal={arXiv preprint arXiv:2503.00779},
year={2025}
}
</pre>
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
* This project is based on the research presented in "Phantom: Training Robots Without Robots Using Only Human Videos"
* We use Meta's SAM2 (Segment Anything Model 2) for hand segmentation
* ML-Depth-Pro from Apple provides advanced depth estimation
## Disclaimer
Open Phantom is a community implementation focused on the data collection aspects of the Phantom approach. The original paper authors are not affiliated with this specific implementation.

File diff suppressed because one or more lines are too long

(ROS launch file; name not shown in this view)

@@ -17,4 +17,4 @@
pkg="rviz"
type="rviz"
args="-d $(find SO_5DOF_ARM100_05d.SLDASM)/urdf.rviz" />
</launch>
</launch>

(ROS launch file; name not shown in this view)

@@ -17,4 +17,4 @@
pkg="rostopic"
type="rostopic"
args="pub /calibrated std_msgs/Bool true" />
</launch>
</launch>

(ROS package manifest; name not shown in this view)

@@ -18,4 +18,4 @@ for SO_5DOF_ARM100_05d.SLDASM robot</p>
<export>
<architecture_independent />
</export>
</package>
</package>

(URDF robot description; name not shown in this view)

@@ -1,5 +1,5 @@
<?xml version="1.0" encoding="utf-8"?>
<!-- This URDF was automatically created by SolidWorks to URDF Exporter! Originally created by Stephen Brawner (brawner@gmail.com)
<!-- This URDF was automatically created by SolidWorks to URDF Exporter! Originally created by Stephen Brawner (brawner@gmail.com)
Commit Version: 1.6.0-1-g15f4949 Build Version: 1.6.7594.29634
For more information, please see http://wiki.ros.org/sw_urdf_exporter -->
<robot
@@ -362,4 +362,4 @@
<axis
xyz="0 0 1" />
</joint>
</robot>
</robot>

(URDF robot description; name not shown in this view)

@@ -190,4 +190,4 @@
<child link="Moving_Jaw"/>
<axis xyz="0 0 1"/>
</joint>
</robot>
</robot>