This paper is an extended version of our #ICRA2023 paper, Surgical-VQLA. Our method can serve as an effective and reliable tool for surgical education and clinical decision-making by providing more insightful analyses of surgical scenes.
✨ Key contributions in the journal version:
– A dual calibration module is proposed to align and normalize multimodal representations.
– A contrastive training strategy with adversarial examples is employed to enhance robustness.
– Various optimization functions are extensively explored.
– The EndoVis-18-VQLA & EndoVis-17-VQLA datasets are further extended.
– Our proposed solution demonstrates superior performance and robustness against real-world image corruption.
Conference Version (ICRA 2023): https://lnkd.in/gHscT3eN
Journal Version (Information Fusion): https://lnkd.in/gQNWwHmt
Code & Dataset: https://lnkd.in/g7CTuyAH
Thanks to all of the collaborators for their efforts: Long Bai, Guankun Wang, An Wang, and Prof. Hongliang Ren from CUHK; Dr. Mobarakol Islam from WEISS, UCL; and Dr. Lalithkumar Seenivasan from JHU.