🚀 Excited to share that our paper “EndoVLA: Dual-Phase Vision-Language-Action Model for Autonomous Tracking in Endoscopy” has been accepted to the Conference on Robot Learning (CoRL) 2025!

By hxwu, August 5, 2025 (News)

In this project, we tackled the unique challenges of robotic endoscopy by integrating vision, language grounding, and motion planning into one end-to-end framework. EndoVLA enables:

– Precise polyp tracking through surgeon-issued prompts

– Delineation and following of abnormal mucosal regions

– Adherence to circumferential cutting markers during resections

We introduced a dual-phase training strategy:

1. Supervised fine-tuning on our new EndoVLA-Motion dataset

2. Reinforcement fine-tuning with task-aware rewards

This approach improves tracking accuracy and achieves zero-shot generalization across diverse GI scenes.
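
For readers curious what a "task-aware reward" could look like in the reinforcement fine-tuning phase, here is a minimal sketch. It is purely illustrative and not the reward used in the paper: the box format, the motion labels, the function names (`iou`, `task_aware_reward`), and the weights are all assumptions made for this example. It combines a localization term (overlap between predicted and target boxes) with a motion term (whether the predicted motion command matches the target).

```python
from typing import Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def task_aware_reward(
    pred_box: Box,
    target_box: Box,
    pred_motion: str,
    target_motion: str,
    w_box: float = 0.5,     # assumed weight, not from the paper
    w_motion: float = 0.5,  # assumed weight, not from the paper
) -> float:
    """Weighted sum of a localization term and a motion-match term."""
    box_term = iou(pred_box, target_box)
    motion_term = 1.0 if pred_motion == target_motion else 0.0
    return w_box * box_term + w_motion * motion_term


if __name__ == "__main__":
    # Perfect localization but wrong motion command gives a partial reward.
    print(task_aware_reward((0, 0, 2, 2), (0, 0, 2, 2), "up", "down"))
```

A reward of this shape gives partial credit when only one of the two sub-tasks is solved, which is one plausible way to make the reward sensitive to the task; please see the paper for the actual reward design.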

The paper is available at: https://lnkd.in/g35DF7Fq
