PhD/Postdoc/Visiting Scholar/RA Opportunities on AI, Robotics & Perception at CUHK Hong Kong


 

[RESEARCH AREA]

 

There are multiple openings for Postdoc/RA (and Visiting Scholar/Prof/Ph.D.) positions to perform research on Medical Robotics, Perception & AI at The Chinese University of Hong Kong (CUHK, Hong Kong), starting immediately. The main areas of interest include AI-assisted endoscopic diagnosis, biorobotics and intelligent systems, multisensory perception, AI learning and control in image-guided procedures, medical mechatronics, continuum and soft flexible robots and sensors, deployable motion generation, compliance modulation/sensing, and cooperative, context-aware flexible/soft sensors and actuators in human environments. For more details, please refer to the recent publications on Google Scholar or the lab website http://labren.org/.

 

The scholars will have opportunities to work with an interdisciplinary team consisting of clinicians and researchers from robotics, AI & perception, imaging, and medicine.
Salaries/remuneration will be highly competitive and commensurate with qualifications and experience (e.g., postdoc salaries are typically above USD 4,300 per month, plus medical insurance and other benefits).

[QUALIFICATIONS]

* Background in AI, Computer Science/Engineering, Electronic or Mechanical Engineering, Robotics, Medical Physics, Automation, or Mechatronics
* Hands-on experience with AI/robots/sensors, instrumentation, and intelligent systems is preferred

* Strong problem-solving, writing, programming, interpersonal, and analytical skills
* Outstanding academic record, publications, or recognition from top-ranking institutions worldwide
* Self-motivated

[HOW TO APPLY]

Qualified candidates are invited to express their interest by email, with detailed supporting documents (including CV, transcripts, HK visa status, research interests, education background, experience, GPA, representative publications, and demo projects), to Prof. Hongliang Ren at <hlren@ee.cuhk.edu.hk> as soon as possible. Owing to the large volume of emails, only shortlisted candidates will be contacted for an interview.

🚀 ICRA 2026: GeoLanG: Geometry-Aware Language-Guided Grasping with Unified RGB-D Multimodal Learning

Thrilled to share our latest work, GeoLanG, a unified geometry-aware framework for language-guided robotic grasping.

Language-guided grasping is a key capability for intuitive human–robot interaction. A robot should not only detect objects but also understand natural instructions such as "pick up the blue cup behind the bowl." While recent multimodal models have shown promising results, most existing approaches rely on multi-stage pipelines that loosely couple perception and grasp prediction. These methods often overlook the tight integration of geometry, language, and visual reasoning, making them fragile in cluttered, occluded, or low-texture environments. This motivated us to bridge the gap between semantic language understanding and precise geometric grasp execution.

🧠✨ What we developed:

A novel unified framework for geometry-aware language-guided grasping that includes:

🔹 Unified RGB-D Multimodal Representation:

 We embed RGB, depth, and language features into a shared representation space, enabling consistent cross-modal semantic alignment for accurate target reasoning.

🔹 Depth-Guided Geometric Module (DGGM):

 Instead of treating depth as auxiliary input, we explicitly inject geometric priors derived from depth into the attention mechanism, strengthening object discrimination under occlusion and ambiguous visual conditions.

🔹 Adaptive Dense Channel Integration (ADCI):

 A dynamic multi-layer fusion strategy that balances global semantic cues and fine-grained geometric details for robust grasp prediction.
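For readers who want a concrete picture of the depth-guided attention idea above, here is a minimal PyTorch sketch. It is an illustrative assumption on our part, not the GeoLanG implementation: depth-derived features produce an additive bias on the language-to-vision cross-attention logits, so language queries attend more strongly to geometrically salient visual tokens.

```python
import torch
import torch.nn as nn

class DepthGuidedCrossAttention(nn.Module):
    """Toy sketch of depth-guided attention (illustrative, not GeoLanG's code):
    a depth-derived prior is added to the cross-attention logits so language
    queries favor geometrically salient visual tokens (e.g. under occlusion)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Assumed prior head: maps depth features to one scalar bias per visual token.
        self.depth_to_bias = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1)
        )

    def forward(self, lang_tokens, vis_tokens, depth_tokens):
        # lang_tokens: (B, L, D) queries; vis_tokens / depth_tokens: (B, N, D)
        B, L, _ = lang_tokens.shape
        N = vis_tokens.shape[1]
        bias = self.depth_to_bias(depth_tokens).squeeze(-1)   # (B, N)
        # Additive attention mask broadcast over queries, replicated per head.
        mask = bias.unsqueeze(1).expand(B, L, N)              # (B, L, N)
        mask = mask.repeat_interleave(self.num_heads, dim=0)  # (B*H, L, N)
        fused, _ = self.attn(lang_tokens, vis_tokens, vis_tokens, attn_mask=mask)
        return fused

# Example usage with random language/visual/depth token embeddings.
m = DepthGuidedCrossAttention(dim=256)
out = m(torch.randn(2, 12, 256), torch.randn(2, 196, 256), torch.randn(2, 196, 256))
print(out.shape)  # torch.Size([2, 12, 256])
```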

🎯 Key Results:

✅ GeoLanG significantly outperforms prior multi-stage baselines on OCID-VLG for language-guided grasping.

✅ Demonstrates strong robustness in cluttered and heavily occluded scenes.

✅ Successfully validated on real robotic hardware, showing reliable sim-to-real transfer.

💡 Why it matters:

This work shows that tightly coupling geometric reasoning with multimodal language understanding can significantly enhance robotic grasp reliability. By embedding depth-aware geometric priors directly into attention mechanisms, we reduce ambiguity and improve consistency in grasp decision-making.

GeoLanG provides a pathway toward more intelligent robotic systems that understand not just what object to grasp, but also how to grasp it robustly in complex real-world environments.

🌱 What's next?

We are exploring extending this geometry-aware multimodal reasoning toward:

🔹 Real-time interactive grasping

🔹 Multi-step manipulation tasks

🔹 Integration with motion planning and autonomous robotic control

#ICRA2026 #CUHK


🚀 ICRA 2026: EndoDDC: Learning Sparse to Dense Reconstruction for Endoscopic Robotic Navigation via Diffusion Depth Completion 🤖

Thrilled to share our latest work on enabling robust sparse-to-dense reconstruction for endoscopic surgical robots, bridging the gap between sparse sensor data and high-quality 3D mapping using a novel diffusion-based framework.

Fine-tuning foundation models often fails due to a lack of dense ground truth, and self-supervised methods struggle with scale ambiguity; in contrast, sparse depth sensors offer a reliable geometric prior.

This motivated us to develop EndoDDC, a method that robustly generates dense depth maps by fusing RGB images with sparse depth inputs.

🧠✨ What we developed:

A diffusion-driven depth completion architecture that:

🔹 Integrates sparse depth and RGB inputs to overcome the limitations of pure visual estimation.

🔹 Utilizes a Multi-scale Feature Extraction and Depth Gradient Fusion module to capture fine-grained surface orientation and local structure.

🔹 Optimizes depth maps iteratively using a conditional diffusion model, refining geometry even in regions with weak textures or reflections.
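To make the idea concrete, here is a minimal sketch of a conditional "sparse-to-dense" refinement loop. The denoiser here is a tiny stand-in network and the sampler is simplified; EndoDDC's actual architecture, noise schedule, and fusion modules differ.

```python
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Stand-in network for illustration only (EndoDDC uses a far richer model)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(1 + 5, 1, kernel_size=3, padding=1)  # depth + RGB/sparse/mask channels
    def forward(self, depth, cond, t):
        return self.net(torch.cat([depth, cond], dim=1))

@torch.no_grad()
def sparse_to_dense(denoiser, rgb, sparse_depth, steps=50):
    """Generic conditional refinement loop (illustrative; not EndoDDC's sampler).
    rgb: (B, 3, H, W); sparse_depth: (B, 1, H, W), zero where unmeasured."""
    mask = (sparse_depth > 0).float()                       # valid sparse measurements
    depth = torch.randn_like(sparse_depth)                  # start from Gaussian noise
    cond = torch.cat([rgb, sparse_depth, mask], dim=1)      # condition on RGB + sparse prior
    for t in reversed(range(steps)):
        t_embed = torch.full((rgb.shape[0],), t, device=rgb.device)
        depth = denoiser(depth, cond, t_embed)              # predict a cleaner dense depth
        depth = mask * sparse_depth + (1 - mask) * depth    # re-impose known depths (anchors scale)
    return depth

sparse = torch.rand(1, 1, 64, 64) * (torch.rand(1, 1, 64, 64) > 0.95).float()
dense = sparse_to_dense(TinyDenoiser(), torch.randn(1, 3, 64, 64), sparse)
print(dense.shape)  # torch.Size([1, 1, 64, 64])
```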

🎯 Key Results:

✅ 25.55% and 9.03% improvement in accuracy on the StereoMIS and C3VD datasets compared to SOTA surgical estimators like EndoDAC.

✅ 7.35% and 5.28% reduction in RMSE on StereoMIS and C3VD compared to the best depth completion baseline (OGNI-DC).

✅ Outperformed foundation models (DepthAnything-v2) and standard depth completion methods (Marigold-DC) in both accuracy and robustness.

💡 Why it matters:

This work demonstrates that diffusion models can effectively solve the “sparse-to-dense” challenge in medical imaging. By providing accurate depth completion despite complex lighting and texture conditions, EndoDDC has the potential to significantly enhance autonomous navigation, procedural safety, and spatial awareness in minimally invasive surgery.

🔖 #DepthCompletion #DiffusionModel #EndoscopicSurgery #SurgicalNavigation #ICRA #CUHKEngineering #CUHK


🚀 ICRA 2026: NeuroVLA: Surgical Scenario-Aware Learning of Debulking Skills in Endoscopic Robotic Neurosurgery via Vision-Language-Action Model 🤖🧲

We present NeuroVLA, a scenario-aware model designed for the motion control of a parallel continuum neurosurgical robot.

Robotic surgery systems have garnered significant attention for their precision and efficiency, yet achieving autonomous tasks in complex neurosurgical environments remains challenging. Although Vision-Language-Action (VLA) models hold great potential, their development is constrained by the scarcity of data from surgical environments and robotic kinematics. To address this issue, this paper proposes NeuroVLA: a VLA model specifically designed for neurosurgical robotic tumor debulking tasks. Through phantom experiments conducted on a flexible parallel continuum robot, we constructed a dataset and decomposed the debulking task into four skill-based instructions. NeuroVLA utilizes a Vision-Language Model (VLM) as its backbone for scene reasoning, enabling the robot to comprehend the surgical scene and its own state. Experimental results demonstrate that after training on 90 debulking segments, NeuroVLA can infer actions based on images, language instructions, and the robot's state. It achieved average pixel distance errors of 29.10 pixels and 21.55 pixels for the “alignment” and “transfer” skills, respectively, and success rates of 88.89% and 100% for the “grasping” and “release” skills.

🧠 Technical Framework:

● End-to-End scenario-aware VLA model

● Skill-based scenario inference mechanism

● Debulking task dataset in neurosurgery

🎯 Experimental Results:

● NeuroVLA demonstrates significantly lower pixel distance (PD) errors in the “alignment” and “transfer” skills (29.10 px / 21.55 px), far surpassing the performance of baseline models (such as Octo’s 79.72 px / 65.46 px).

In the “grasping” and “release” skills, NeuroVLA exhibits greater robustness, achieving a grasping success rate of 88.89% and a release success rate of 100%. In contrast, baseline models often misinterpret incomplete forceps closure as task completion, leading to grasping failures.

#ICRA2026


🚀 ICRA 2026: Kiri-Capsule – Kirigami Biopsy Capsule Robot 🧬🤖

We present Kiri-Capsule, a swallowable kirigami-inspired capsule robot that enables minimally invasive GI biopsy, pushing capsule endoscopy from imaging to tissue sampling.

Wireless capsule endoscopy is comfortable and accessible but cannot collect biopsy tissue, even though histology remains the gold standard. Our work targets safe, depth-controlled, retrievable sampling in a capsule form factor.

🧠 Technical Framework:

● Kirigami PI skin: flat during locomotion, deploys sharp protrusions when stretched

● Dual-cam actuation: compact, repeatable deployment and recovery

● Rotary scraping + internal storage: detach tissue and store it in internal cavities for retrieval

🎯 Experimental Results:

● Penetration depth: median ~0.61 mm (0.46–0.66 mm) on ex vivo porcine tissue

● Biopsy yield: ~10.9 mg (stomach) / 18.9 mg (small intestine), histology-ready (mucosa + submucosa)

● Forces within reported GI biopsy safety ranges

💡 Significance:

A practical route toward capsule-based biopsy with controlled shallow interaction and retrievable specimens, without bulky endoscopic tools.

🌱 Next:

Moving toward untethered actuation (e.g., magnetic drive) and multi-site sampling to reduce mixing risk.

🔖 #ICRA2026 #SoftRobotics #MedicalRobotics #CapsuleEndoscopy #Kirigami #BioInspiredRobotics #MinimallyInvasive #Biopsy #GI #RoboticsResearch


🚀 ICRA 2026: SurgVidLM: Towards Multi-grained Video Understanding with Large Language Model in Robot-assisted Surgery!

Thrilled to share our latest work, SurgVidLM, the first video-language model specifically designed to address both full and fine-grained surgical video comprehension.

Surgical scene understanding is critical for training and robotic decision-making. While current Multimodal Large Language Models (MLLMs) excel at image analysis, they often overlook the fine-grained temporal reasoning required to capture detailed task execution and specific procedural processes within a surgery. This motivated us to bridge the gap between global video understanding and micro-action analysis.

🧠✨ What we developed:

A novel framework and resource for surgical video reasoning that includes:

🔹 Two-stage StageFocus mechanism: The first stage extracts global procedural context, while the second stage performs high-frequency local analysis for fine-grained task execution.

🔹 Multi-frequency Fusion Attention (MFA): Effectively integrates low-frequency global features with high-frequency local details to ensure comprehensive scene perception.

🔹 SVU-31K Dataset: We constructed a large-scale dataset with over 31,000 video-instruction pairs, featuring hierarchical knowledge representation for enhanced visual reasoning.

🎯 Key Results:

✅ SurgVidLM significantly outperforms existing models (like Qwen2-VL) in multi-grained surgical video understanding tasks.

✅ Capable of inferring anatomical landmarks (e.g., Denonvilliers’ fascia) and providing clinical motivation, moving beyond simple visual description.

✅ Demonstrated strong performance on unseen surgical tasks, proving the robustness of our hierarchical training approach.

💡 Why it matters:

This work shows that by combining global context with localized high-frequency focus, we can significantly reduce “hallucinations” in surgical AI. It provides a pathway toward more intelligent, context-aware surgical assistants that can understand not just what is happening, but how and why specific steps are performed.

🌱 What's next?

We are exploring how to extend this multi-grained understanding to real-time intraoperative guidance and integrating it with physical robotic control for autonomous sub-tasks.

🚀 ICRA 2026: TMR-VLA – Vision-Language-Action Model for Magnetic Soft Robots 🤖🧲

We present TMR-VLA, an end-to-end framework designed for the motion control of tri-leg silicone-based soft robots.

Miniature magnetic robots face a hardware bottleneck where the robot body is too small to integrate onboard sensors or power. This creates a gap between actuation and perception, often requiring human experts to manually adjust magnetic fields based on visual feedback. Our work aims to bridge this gap by enabling autonomous control through a multi-modal system.

🧠 Technical Framework:

● End-to-End Mapping: The policy translates sequential endoscope images and natural language instructions directly into low-level coil voltage commands.

● Action Adaptor: We utilized an EndoVLA-initialized backbone with an Action LoRA Adaptor that allows the model to autoregressively emit voltage increments.

● TrilegMR-Motion Dataset: The model was trained on a new dataset containing 15,793 image-action pairs across 60 episodes.

● Diverse Locomotion: The system controls five motion primitives: squatting, leg-lifting, rotation, forward movement, and recovery.

🎯 Experimental Results:

● Success Rate: TMR-VLA achieved an average success rate of 74% across tested motion types.

● Performance: The model outperformed general-purpose multimodal models (such as Qwen2.5-VL and LLaVA-1.6) in both instruction interpretation and action execution.

● Inference Speed: Real-time control was demonstrated at approximately 2 Hz using an NVIDIA RTX 5090 GPU.

💡 Significance: This study addresses the challenge of autonomous control in untethered soft robots without increasing their structural complexity. It provides a foundational baseline for intelligent navigation in complex in-vivo environments.

Big news! 🚀 Our lab is proud to announce that 6 of our latest papers have been accepted! 🎊

We are incredibly proud of the team's hard work and innovation. To give each project the spotlight it deserves, we will be sharing details about each breakthrough one by one over the coming days.

Stay tuned for the updates! 🤖✨


🚀 Moving Beyond Vision in Robotic Surgery 🤖👨‍⚕️

We are thrilled to share our latest Comment published in #NatureReviewsBioengineering: "Artificial kinaesthesia in autonomous robotic surgery".

Current autonomous surgical robots are heavily vision-centric. While they can "see" anatomy, they lack the intrinsic ability to "feel" tissue interactions, a crucial skill that human surgeons rely on for safety and dexterity.

In this article, we propose a hierarchical framework for Artificial Kinaesthesia to bridge this gap:

1. 📈 The Physical Level: Integrating proprioception and exteroception for high-resolution physical sensing.

2. 💬 The Algorithmic Level: Moving from raw signal processing to semantic understanding of contact.

3. 🧠 The Architectural Level: Implementing Vision-Kinaesthesia-Language-Action models to achieve true sensorimotor synergy.

We believe the future of autonomous surgery lies in systems that can synergistically fuse vision and kinaesthesia to not just see, but truly feel, think, and act.

📃 Read the full paper here: https://lnkd.in/gqTEpYjs

👏 Kudos to our amazing team, Dr. Tangyou Liu, Dr. Sishen YUAN, and Prof. Hongliang Ren.


🚀 IJRR 2026: Bioinspired Gravity-Aware Soft Robots! 🤖🪢

Thrilled to share our latest work in The International Journal of Robotics Research (IJRR) on enabling gravity-aware control for portable cable-driven soft slender robots, using only a single IMU and a powerful robophysical simulation-driven framework.

Soft robots are lightweight and flexible, but their high aspect ratios make them extremely sensitive to gravity, causing passive deformation that traditional kinematics just can't handle. This motivated us to rethink how soft robots can sense and compensate for gravity, without bulky sensors or complex hardware.

🧠✨ What we developed:

A bio-inspired real2sim2real control architecture that:

🔹 Streams real-world IMU orientation into a real-time and robust SOFA simulation

🔹 Dynamically reorients virtual gravity to mirror reality

🔹 Uses QP optimization to compute joint-level compensation

🔹 Executes compensation in both simulation and the physical robot

All of this using just one IMU. No strain sensors. No cameras. No expensive reconstruction systems.
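To make the QP compensation step concrete, here is a minimal sketch using cvxpy. The cost, bounds, and the way the Jacobian and tip offset are obtained are our assumptions for illustration and differ from the paper's exact formulation and its coupling to the SOFA simulation.

```python
import numpy as np
import cvxpy as cp

def compensation_step(J: np.ndarray, tip_offset: np.ndarray,
                      dq_max: float = 1.0, reg: float = 1e-3) -> np.ndarray:
    """QP sketch of the joint-level compensation idea (illustrative assumptions).
    J          : (3, n) task Jacobian estimated from the gravity-reoriented simulation
    tip_offset : (3,) tip displacement induced by the new gravity direction"""
    n = J.shape[1]
    dq = cp.Variable(n)                                   # cable/joint compensation increments
    # Cancel the gravity-induced tip displacement with a small, bounded actuation change.
    cost = cp.sum_squares(J @ dq + tip_offset) + reg * cp.sum_squares(dq)
    cp.Problem(cp.Minimize(cost), [cp.abs(dq) <= dq_max]).solve()
    return np.asarray(dq.value)

# Toy example with a made-up 3x4 Jacobian and a 2 mm sag along -z.
print(compensation_step(np.random.randn(3, 4), np.array([0.0, 0.0, -0.002])))
```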

🎯 Key Results:

✅ >99% compensation recovery in static tests

✅ ~94% recovery in low-motion dynamic tests

✅ Demonstrated on different two-segment cable-driven soft robots

💡 Why it matters:

This work shows that soft robots can maintain stable, consistent configurations under changing gravity by integrating virtual sensing + simulation-driven inverse computation. It reduces reliance on physical sensors and opens a pathway toward scalable, generalizable gravity-aware soft robots.

🌱 What's next?

We're exploring how to extend this architecture to virtualizable external force sensing and richer environmental interactions.

Special shoutout to the team: Jiewen Lai, Tian-Ao Ren (co-first authors), Pengfei YE, Yanjun Liu, Jingyao Sun, and Hongliang Ren, for making this project possible.

🔗 Paper link: https://lnkd.in/gMgwCfvf


📢 Excited to share that our latest research has been accepted by IEEE Robotics & Automation Magazine! 🎉

Title: A Transendoscopic Telerobotic System Using Heterogeneous Flexible Manipulators for Bimanual Endoscopic Submucosal Dissection

🔍 Background:

Endoscopic submucosal dissection (ESD) is a key technique for early GI cancer treatment, requiring high dexterity and precision.

🛠 What we did:

We developed the first heterogeneous flexible manipulators (HFMs) for bimanual ESD, integrating:

🤖 Serial Articulated Manipulator (SAM) – for stable, multidirectional tissue traction

🔬 Parallel Continuum Wrist (PCW) – for accurate tissue dissection

📍 Key contributions:

✔ Kinematic modeling using Denavit–Hartenberg & Cosserat rod methods

✔ Workspace & dexterity analysis via simulation

✔ Validation through 16 ex vivo ESD tests
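For readers unfamiliar with Denavit–Hartenberg modeling, here is a minimal forward-kinematics sketch with made-up parameters; it only illustrates the standard convention and is not the SAM's actual kinematic table or the paper's Cosserat rod model.

```python
import numpy as np

def dh_transform(theta: float, d: float, a: float, alpha: float) -> np.ndarray:
    """Standard Denavit–Hartenberg homogeneous transform for a single joint."""
    ct, st, ca, sa = np.cos(theta), np.sin(theta), np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

def forward_kinematics(dh_rows) -> np.ndarray:
    """Chain the per-joint transforms; dh_rows is a list of (theta, d, a, alpha)."""
    T = np.eye(4)
    for row in dh_rows:
        T = T @ dh_transform(*row)
    return T

# Toy two-joint chain with made-up parameters (not the SAM's actual D-H table).
print(forward_kinematics([(np.pi / 4, 0.00, 0.05, 0.0),
                          (np.pi / 6, 0.02, 0.03, np.pi / 2)]))
```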

💡 This work demonstrates a novel strategy for surgical robotics: leveraging heterogeneous structures to enhance flexibility, stiffness, and accuracy in minimally invasive procedures.

๐Ÿ‘ Kudos to our amazing team and collaborators from CUHK (Prof. Huxin Gao, Tao Zhang, Prof. Hongliang Ren), Qilu Hospital (Xiaoxiao Yang, Prof. ๅทฆ็ง€ไธฝ, Prof. Yanqing Li), Southern University of Science and Technology (Xiao Xiao, Prof. Qinghu Meng), and Beijing Institute of Technology (Prof. Changsheng Li)!

📖 Stay tuned for the full article in IEEE RAM!


🚀 Thrilled to share that our recent work has been honored with the Robotics Best Paper Award at IEEE #ROBIO2025 in Chengdu.

🏆 Paper: Contact-Aided Navigation of Flexible Robotic Endoscope Using Deep Reinforcement Learning in Dynamic Stomach

👩‍🔬 Authors: Chi Kit Ng, Huxin Gao, Tianao Ren, Prof. Jiewen Lai, and Prof. Hongliang Ren

🔍 Why it matters:

Navigating flexible robotic endoscopes in the dynamic, deformable stomach environment is a grand challenge. Our proposed Contact-Aided Navigation (CAN) strategy, powered by deep reinforcement learning and force-feedback, achieved:

• 100% success rate in both static and dynamic simulated stomach environments

• Average navigation error of just 1.6 mm

• Robust generalization even under strong external disturbances
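As a rough illustration of how force feedback can enter a DRL navigation policy, here is a minimal sketch of an observation vector and a shaped reward. The state composition, weights, and force threshold are our assumptions for illustration, not the paper's formulation.

```python
import numpy as np

def build_observation(tip_pose, target_pose, contact_force, bend_state):
    """Illustrative observation for a contact-aided DRL navigation policy
    (assumed composition): exposing force feedback lets the policy exploit
    gentle wall contact as a guide rather than treating any contact as failure."""
    return np.concatenate([tip_pose, target_pose - tip_pose, contact_force, bend_state])

def shaped_reward(dist_to_target: float, contact_force_norm: float,
                  force_limit: float = 2.0) -> float:
    """Encourage progress toward the target, tolerate light contact, and
    penalize forces beyond an assumed safety threshold."""
    reward = -dist_to_target
    if contact_force_norm > force_limit:
        reward -= 10.0 * (contact_force_norm - force_limit)
    return reward
```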

This work highlights how embodied AI and biomechanics-inspired strategies can transform surgical robotics, enabling safer and more precise navigation in complex clinical environments.

Check the paper at https://lnkd.in/g6KgZTdD

๐Ÿ™ Huge thanks to the team, collaborators, and the broader robotics community for the support and inspiration.
