This work introduces VL-SurgPT, the first large-scale multimodal dataset that integrates visual trajectories with semantic point status descriptions in surgical environments.
Alongside the dataset, we propose TG-SurgPT, a text-guided point tracking method that consistently outperforms vision-only approaches, especially under challenging intraoperative conditions such as smoke, occlusion, and tissue deformation.
We are deeply grateful to all coauthors, and especially to our clinical collaborators at Shenzhen People's Hospital, for their invaluable contributions. Looking forward to engaging with the community at AAAI in Singapore and advancing the conversation on multimodal surgical AI!
Check out the paper at https://lnkd.in/grfE5iVi
Project Page: https://lnkd.in/gscM_ciV