We extend our sincere gratitude to Prof. Tan Lee, Prof. Qi Dou, and Prof. S. Kevin Zhou for serving as examiners during Long Bai’s defense. Special thanks to his supervisors, Prof. Hongliang Ren and Prof. Jiewen Lai, for their invaluable guidance throughout his Ph.D. journey.
During his time at CUHK RenLab, Dr. Bai has made impressive contributions to surgical and medical artificial intelligence, particularly in multimodal AI.
For more details about his research, visit his personal website: longbai-cuhk.github.io.
Wishing Dr. Long Bai all the best in his future endeavors!
We propose a multimodal Graph Representation network with Adversarial feature Disentanglement (GRAD) for robust surgical workflow recognition in challenging scenarios with domain shifts or corrupted data. Specifically, we introduce a Multimodal Disentanglement Graph Network (MDGNet) that captures fine-grained visual information while explicitly modeling the complex relationships between vision and kinematic embeddings through graph-based message modeling. To align feature spaces across modalities, we propose a Vision-Kinematic Adversarial (VKA) framework that leverages adversarial training to reduce modality gaps and improve feature consistency. Furthermore, we design a Contextual Calibrated Decoder, incorporating temporal and contextual priors to enhance robustness against domain shifts and corrupted data.
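As a rough illustration of the adversarial alignment idea behind the VKA framework, the sketch below trains a small modality discriminator through a gradient-reversal layer so that vision and kinematic embeddings become harder to tell apart. It assumes PyTorch and pre-extracted per-frame embeddings; all module names and dimensions are illustrative and not taken from the paper's implementation.

```python
# Minimal sketch of vision-kinematic adversarial feature alignment (illustrative only).
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; reverses (and scales) gradients on the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class ModalityDiscriminator(nn.Module):
    """Predicts whether an embedding came from the vision or the kinematic branch."""
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, 2))

    def forward(self, feat, lam=1.0):
        return self.net(GradReverse.apply(feat, lam))

# Training step (sketch): the discriminator learns to tell modalities apart, while the
# reversed gradient pushes both encoders toward a shared, modality-invariant feature space.
disc = ModalityDiscriminator()
vision_feat = torch.randn(8, 256, requires_grad=True)   # stand-in for vision encoder output
kin_feat = torch.randn(8, 256, requires_grad=True)      # stand-in for kinematic encoder output
logits = disc(torch.cat([vision_feat, kin_feat], dim=0))
labels = torch.cat([torch.zeros(8), torch.ones(8)]).long()
adv_loss = nn.functional.cross_entropy(logits, labels)
adv_loss.backward()  # encoder gradients are reversed, reducing the modality gap
```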
Extensive comparative and ablation experiments demonstrate the effectiveness of our model and proposed modules. Specifically, we achieved accuracies of 86.87% and 92.38% on two public datasets, respectively. Moreover, our robustness experiments show that our method effectively handles data corruption during storage and transmission, exhibiting excellent stability and robustness. Our approach aims to advance automated surgical workflow recognition, addressing the complexities and dynamism inherent in surgical procedures.
The 1st Workshop on Internet of Wearable Things (IoWT 2025) will be held at the IEEE 11th World Forum on IoT in Chengdu, China, in October 2025!
We invite submissions on AI-driven wearable systems, energy-efficient IoT, human-centric automation, and scalable intelligence.
Key Dates:
– Submission Deadline: June 15, 2025
– Notification: July 31, 2025
– Camera-ready: August 15, 2025
More details: IoWT 2025 Workshop (https://lnkd.in/gj_arCXX)
Tinghua Zhang, Sishen Yuan, et al. for “PneumaOCT: Pneumatic optical coherence tomography endoscopy for targeted distortion-free imaging in tortuous and narrow internal lumens”, a collaboration between CUHK ABI Lab (https://lnkd.in/gUuzQqDt) and RENLab (labren.org),
published in Science Advances (DOI: 10.1126/sciadv.adp3145).
“ETSM: Automating Dissection Trajectory Suggestion and Confidence Map-Based Safety Margin Prediction for Robot-assisted Endoscopic Submucosal Dissection”, accepted at #ICRA2025 (arXiv preprint: arXiv:2411.18884).
Congratulations to our brilliant team members on these well-deserved recognitions!
Additionally, Prof. Hongliang Ren delivered an insightful talk, “Endoscopic Multisensory Navigation with Soft Flexible Robotics”, highlighting the latest advancements in endoscopic navigation and soft medical robotics.
This conference serves as a platform for researchers and practitioners to discuss advancements, challenges, and opportunities in information, automation, artificial intelligence, robotics, image processing, computer vision, DSP, and BME.
In this work, we tackled a long-standing challenge in soft tactile sensing: accurately localizing a contact point on a stretchable sensor even in the presence of strain and variable contact forces. Our approach uses ultrasonic scatter signals extracted from a soft waveguide to decouple these intertwined effects. A data-driven method was developed (an illustrative code sketch follows the list below), combining:
– Global feature extraction: Using the Hilbert transform to capture the overall energy distribution before and after force contact.
– Local feature extraction: Leveraging continuous wavelet transforms (CWT) to retrieve high-resolution time-frequency characteristics.
– Deep learning integration: Fusing these features through a deep convolutional neural network and multilayer perceptron regression, which allowed us to achieve a mean absolute error of just 0.627 mm and a mean relative error of 3.19%.
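For readers who want a concrete picture of the pipeline above, here is a minimal sketch of the global/local feature extraction and fusion, assuming a 1-D ultrasonic scatter signal as a NumPy array, SciPy for the Hilbert transform, PyWavelets for the CWT, and PyTorch for the regression network. The architecture, dimensions, and names are illustrative placeholders, not the published model.

```python
# Illustrative sketch of the global/local feature pipeline (not the authors' code).
import numpy as np
import pywt
import torch
import torch.nn as nn
from scipy.signal import hilbert

def global_features(signal: np.ndarray) -> np.ndarray:
    """Hilbert envelope as a coarse descriptor of the signal's energy distribution."""
    return np.abs(hilbert(signal))

def local_features(signal: np.ndarray, scales=np.arange(1, 65)) -> np.ndarray:
    """Continuous wavelet transform (Morlet) as a high-resolution time-frequency map."""
    coeffs, _ = pywt.cwt(signal, scales, "morl")
    return np.abs(coeffs)  # shape: (n_scales, n_samples)

class FusionRegressor(nn.Module):
    """CNN over the CWT map fused with an MLP over the Hilbert envelope,
    regressing a single contact-location value (illustrative architecture)."""
    def __init__(self, n_samples=1024):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),
        )
        self.mlp = nn.Sequential(nn.Linear(n_samples, 64), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(16 * 16 + 64, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, cwt_map, envelope):
        z = torch.cat([self.cnn(cwt_map), self.mlp(envelope)], dim=1)
        return self.head(z)  # predicted contact location (mm)

# Usage (illustrative):
#   sig = np.random.randn(1024)
#   env = torch.tensor(global_features(sig), dtype=torch.float32).unsqueeze(0)
#   cwt = torch.tensor(local_features(sig), dtype=torch.float32)[None, None]
#   pred_mm = FusionRegressor()(cwt, env)
```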
This fusion of global and local signal analysis not only overcomes limitations of traditional time-of-flight estimation methods but also paves the way for more robust multimodal sensing in robotics and human-machine interfaces. The implications for advanced robotics, intelligent prosthetics, and other emerging applications are truly exciting.
This paper addresses several key challenges in percutaneous dilatational tracheostomy (PDT), including cross-contamination, constrained operative space, complex anatomy, and the lack of tactile feedback during manual procedures. To overcome these barriers, we propose an “inside-out” robotic system equipped with a retractable drill and wireless magnetic actuation. By integrating feedback from tactile and magnetic sensors along with precise control mechanisms, the system enhances the safety of tracheal punctures, preventing damage to adjacent tissues, such as the esophagus, and reducing reliance on manual expertise.
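As a purely illustrative example of how tactile feedback might gate the drilling motion, the sketch below stops the drill when the measured contact force drops sharply, a common cue for wall breakthrough. The threshold, sampling period, and the read_tactile_force()/stop_drill() helpers are hypothetical and not part of the published system.

```python
# Hypothetical breakthrough check on tactile readings; shows feedback-gated control only.
import time

BREAKTHROUGH_DROP_N = 0.8   # hypothetical force drop (N) taken to indicate wall breakthrough

def monitor_puncture(read_tactile_force, stop_drill, period_s=0.01):
    """Stop the drill when the contact force drops sharply (suggesting breakthrough)."""
    prev = read_tactile_force()
    while True:
        time.sleep(period_s)
        force = read_tactile_force()
        if prev - force > BREAKTHROUGH_DROP_N:
            stop_drill()   # halt/retract to avoid injuring tissue beyond the tracheal wall
            return
        prev = force
```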
Ex vivo experiments on porcine tracheas validated the feasibility of this approach, demonstrating effective puncture with a maximum localization deviation of 6.308 mm, preliminarily confirming the system's potential to achieve safer and more consistent outcomes in clinical settings.
This research presents an automatic calibration and dynamic registration method specifically designed for deformable tissues, integrating Augmented Reality (AR) technology to enhance surgical precision in Endoscopic Submucosal Dissection (ESD).
Our approach leverages a 6D pose estimator to align virtual and real-world target tissues seamlessly, utilizing the SuperGlue feature-matching network and the Metric3D depth estimation network for robust fusion. Additionally, our dynamic registration method enables real-time tracking of tissue deformation, ensuring more reliable surgical guidance.
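To make the alignment step more concrete, here is a generic sketch of rigid registration from matched keypoints and metric depth: pixels are back-projected to 3D using the camera intrinsics, and a least-squares rotation/translation is recovered with the Kabsch (SVD) method. It assumes the 2D matches (e.g., from a feature matcher) and the depth map are already available, uses only NumPy, and is not the paper's implementation; in particular, it omits the deformation handling of the dynamic registration.

```python
# Generic rigid registration sketch (illustrative; assumes matches and depth are given).
import numpy as np

def backproject(pts_2d, depth, K):
    """Lift pixel coordinates (N, 2) to camera-frame 3D points using depth and intrinsics K."""
    z = depth[pts_2d[:, 1].astype(int), pts_2d[:, 0].astype(int)]
    x = (pts_2d[:, 0] - K[0, 2]) * z / K[0, 0]
    y = (pts_2d[:, 1] - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)

def rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src (N, 3) onto dst (N, 3)."""
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst.mean(0) - R @ src.mean(0)
    return R, t  # apply as: aligned = (R @ virtual_pts.T).T + t
```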
Experimental validation demonstrated the effectiveness of our system, with automatic calibration experiments using cloth achieving a mean absolute error (MAE) of 3.79 ± 0.64 mm. Dynamic registration accuracy was assessed under varying tissue deformation, yielding an MAE of 6.03 ± 0.96 mm. Ex vivo experiments with porcine small intestine tissue further validated our system's performance, with an AR calibration MAE of 3.11 ± 0.56 mm and a dynamic registration MAE of 3.20 ± 1.96 mm.
The full paper is in production and will be available at https://lnkd.in/gMFCDWv4
In this paper, we explore the role of large vision models in advancing robot-assisted surgery, analyzing key developments and discussing future directions in AI-driven surgical innovation. By examining emerging trends and challenges, we contribute to the broader conversation on how intelligent visual systems can enhance precision, adaptability, and decision-making in surgical robotics.