Achieving human-level dexterity in robots is a key objective in the field of robotic manipulation. Recent advancements in 3D-based imitation learning have shown promising results, providing an effective pathway to achieve this goal. However, obtaining high-quality 3D representations presents two key challenges: (1) the quality of point clouds captured by a single-view camera is significantly affected by factors such as camera resolution, positioning, and occlusions caused by the dexterous hand; (2) global point clouds lack the crucial contact information and spatial correspondences required for fine-grained dexterous manipulation tasks. To address these limitations, we propose CordViP, a novel framework that constructs and learns correspondences by leveraging robust 6D pose estimation of objects and robot proprioception. Specifically, we first introduce interaction-aware point clouds, which establish correspondences between the object and the hand. These point clouds are then used in our pretraining strategy, where we also incorporate object-centric contact maps and hand-arm coordination information, effectively capturing both spatial and temporal dynamics. Our method demonstrates exceptional dexterous manipulation capabilities, with an average success rate of 90% across four real-world tasks, surpassing other baselines by a large margin. Experimental results also highlight the superior generalization and robustness of CordViP to different objects, viewpoints, and scenarios.
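To make the construction concrete, below is a minimal sketch (not the authors' released code) of how interaction-aware point clouds and an object-centric contact map could be assembled from the inputs the abstract names: an estimated 6D object pose, a canonical object point cloud, and hand surface points recovered from proprioception via forward kinematics. All function names, shapes, and the exponential distance squashing are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def interaction_aware_point_cloud(obj_model_pts, obj_pose, hand_pts):
    """Assemble an interaction-aware point cloud (illustrative sketch).

    obj_model_pts : (N, 3) canonical object point cloud (e.g., from a mesh)
    obj_pose      : (4, 4) homogeneous transform from a 6D pose estimator
    hand_pts      : (M, 3) hand surface points in the world frame,
                    obtained from joint encoders via forward kinematics
    Returns the combined cloud and per-point object/hand labels, so the
    policy can exploit correspondences between the object and the hand.
    """
    # Transform the canonical object model into the world frame.
    R, t = obj_pose[:3, :3], obj_pose[:3, 3]
    obj_pts = obj_model_pts @ R.T + t

    # Concatenate object and hand points, keeping a label per point.
    cloud = np.concatenate([obj_pts, hand_pts], axis=0)
    labels = np.concatenate([np.zeros(len(obj_pts)), np.ones(len(hand_pts))])
    return cloud, labels

def contact_map(obj_pts, hand_pts, tau=0.01):
    """Object-centric contact map (illustrative): for each object point,
    distance to the nearest hand point, squashed to (0, 1]; values near 1
    indicate contact. `tau` is a hypothetical distance scale in meters."""
    d = np.linalg.norm(obj_pts[:, None, :] - hand_pts[None, :, :], axis=-1)
    return np.exp(-d.min(axis=1) / tau)
```

In the actual system, the hand points would be driven by the robot's proprioceptive state and the object pose by a real-time 6D pose tracker; the sketch only illustrates why such inputs are viewpoint-robust, since neither depends on a single camera's raw depth.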
We propose CordViP, a correspondence-based visuomotor policy for dexterous manipulation in the real world. (a) Left: We present the interaction-aware point clouds, which demonstrate robustness to different viewpoints while establishing correspondences between the object and the hand. (b) Right: Our method achieves promising results across four real-world dexterous manipulation tasks, showcasing exceptional generalization capabilities.
@article{fu2025cordvip,
title={CordViP: Correspondence-based Visuomotor Policy for Dexterous Manipulation in Real-World},
author={Fu, Yankai and Feng, Qiuxuan and Chen, Ning and Zhou, Zichen and Liu, Mengzhen and Wu, Mingdong and Chen, Tianxing and Rong, Shanyu and Liu, Jiaming and Dong, Hao and others},
journal={arXiv preprint arXiv:2502.08449},
year={2025}
}
If you have any questions, please contact us at yankaifu.aur@gmail.com.