{"id":3449,"date":"2026-02-17T01:12:18","date_gmt":"2026-02-17T01:12:18","guid":{"rendered":"http:\/\/www.labren.org\/mm\/?p=3449"},"modified":"2026-02-17T01:12:19","modified_gmt":"2026-02-17T01:12:19","slug":"%f0%9f%9a%80-icra-2026-%f0%9d%91%ae%f0%9d%92%86%f0%9d%92%90%f0%9d%91%b3%f0%9d%92%82%f0%9d%92%8f%f0%9d%91%ae-%f0%9d%91%ae%f0%9d%92%86%f0%9d%92%90%f0%9d%92%8e%f0%9d%92%86%f0%9d%92%95%f0%9d%92%93","status":"publish","type":"post","link":"http:\/\/www.labren.org\/mm\/news\/%f0%9f%9a%80-icra-2026-%f0%9d%91%ae%f0%9d%92%86%f0%9d%92%90%f0%9d%91%b3%f0%9d%92%82%f0%9d%92%8f%f0%9d%91%ae-%f0%9d%91%ae%f0%9d%92%86%f0%9d%92%90%f0%9d%92%8e%f0%9d%92%86%f0%9d%92%95%f0%9d%92%93\/","title":{"rendered":"\ud83d\ude80 ICRA 2026: \ud835\udc6e\ud835\udc86\ud835\udc90\ud835\udc73\ud835\udc82\ud835\udc8f\ud835\udc6e: \ud835\udc6e\ud835\udc86\ud835\udc90\ud835\udc8e\ud835\udc86\ud835\udc95\ud835\udc93\ud835\udc9a-\ud835\udc68\ud835\udc98\ud835\udc82\ud835\udc93\ud835\udc86 \ud835\udc73\ud835\udc82\ud835\udc8f\ud835\udc88\ud835\udc96\ud835\udc82\ud835\udc88\ud835\udc86-\ud835\udc6e\ud835\udc96\ud835\udc8a\ud835\udc85\ud835\udc86\ud835\udc85 \ud835\udc6e\ud835\udc93\ud835\udc82\ud835\udc94\ud835\udc91\ud835\udc8a\ud835\udc8f\ud835\udc88 \ud835\udc98\ud835\udc8a\ud835\udc95\ud835\udc89 \ud835\udc7c\ud835\udc8f\ud835\udc8a\ud835\udc87\ud835\udc8a\ud835\udc86\ud835\udc85 \ud835\udc79\ud835\udc6e\ud835\udc69-\ud835\udc6b \ud835\udc74\ud835\udc96\ud835\udc8d\ud835\udc95\ud835\udc8a\ud835\udc8e\ud835\udc90\ud835\udc85\ud835\udc82\ud835\udc8d \ud835\udc73\ud835\udc86\ud835\udc82\ud835\udc93\ud835\udc8f\ud835\udc8a\ud835\udc8f\ud835\udc88"},"content":{"rendered":"\n<p>Thrilled to share our latest work, \ud835\udc06\ud835\udc1e\ud835\udc28\ud835\udc0b\ud835\udc1a\ud835\udc27\ud835\udc06, a unified geometry-aware framework for language-guided robotic grasping.<\/p>\n\n\n\n<p>Language-guided grasping is a key capability for intuitive human\u2013robot interaction. 
A robot should not only detect objects but also understand natural instructions such as \u201cpick up the blue cup behind the bowl.\u201d While recent multimodal models have shown promising results, most existing approaches rely on multi-stage pipelines that loosely couple perception and grasp prediction. These methods often overlook the tight integration of geometry, language, and visual reasoning, making them fragile in cluttered, occluded, or low-texture environments. This motivated us to bridge the gap between semantic language understanding and precise geometric grasp execution.<\/p>\n\n\n\n<p>\ud83e\udde0\u2728 \ud835\udc16\ud835\udc21\ud835\udc1a\ud835\udc2d \ud835\udc30\ud835\udc1e \ud835\udc1d\ud835\udc1e\ud835\udc2f\ud835\udc1e\ud835\udc25\ud835\udc28\ud835\udc29\ud835\udc1e\ud835\udc1d:<\/p>\n\n\n\n<p>A novel unified framework for geometry-aware language-guided grasping that includes:<\/p>\n\n\n\n<p>\ud83d\udd39 Unified RGB-D Multimodal Representation:<\/p>\n\n\n\n<p>&nbsp;We embed RGB, depth, and language features into a shared representation space, enabling consistent cross-modal semantic alignment for accurate target reasoning.<\/p>\n\n\n\n<p>\ud83d\udd39 Depth-Guided Geometric Module (DGGM):<\/p>\n\n\n\n<p>&nbsp;Instead of treating depth as auxiliary input, we explicitly inject geometric priors derived from depth into the attention mechanism, strengthening object discrimination under occlusion and ambiguous visual conditions.<\/p>\n\n\n\n<p>\ud83d\udd39 Adaptive Dense Channel Integration (ADCI):<\/p>\n\n\n\n<p>&nbsp;A dynamic multi-layer fusion strategy that balances global semantic cues and fine-grained geometric details for robust grasp prediction.<\/p>\n\n\n\n<p>\ud83c\udfaf &nbsp;\ud835\udc0a\ud835\udc1e\ud835\udc32 \ud835\udc11\ud835\udc1e\ud835\udc2c\ud835\udc2e\ud835\udc25\ud835\udc2d\ud835\udc2c:<\/p>\n\n\n\n<p>\u2705 GeoLanG significantly outperforms prior multi-stage baselines on OCID-VLG for language-guided grasping.<\/p>\n\n\n\n<p>\u2705 
Demonstrates strong robustness in cluttered and heavily occluded scenes.<\/p>\n\n\n\n<p>\u2705 Successfully validated on real robotic hardware, showing reliable sim-to-real transfer.<\/p>\n\n\n\n<p>\ud83d\udca1 \ud835\udc16\ud835\udc21\ud835\udc32 \ud835\udc22\ud835\udc2d \ud835\udc26\ud835\udc1a\ud835\udc2d\ud835\udc2d\ud835\udc1e\ud835\udc2b\ud835\udc2c:<\/p>\n\n\n\n<p>This work shows that tightly coupling geometric reasoning with multimodal language understanding can significantly enhance robotic grasp reliability. By embedding depth-aware geometric priors directly into attention mechanisms, we reduce ambiguity and improve consistency in grasp decision-making.<\/p>\n\n\n\n<p>GeoLanG provides a pathway toward more intelligent robotic systems that understand not just what object to grasp, but also how to grasp it robustly in complex real-world environments.<\/p>\n\n\n\n<p>\ud83c\udf31 \ud835\udc16\ud835\udc21\ud835\udc1a\ud835\udc2d\u2019\ud835\udc2c \ud835\udc27\ud835\udc1e\ud835\udc31\ud835\udc2d?<\/p>\n\n\n\n<p>We are exploring how to extend this geometry-aware multimodal reasoning toward:<\/p>\n\n\n\n<p>&nbsp;\ud83d\udd39 Real-time interactive grasping<\/p>\n\n\n\n<p>&nbsp;\ud83d\udd39 Multi-step manipulation tasks<\/p>\n\n\n\n<p>&nbsp;\ud83d\udd39 Integration with motion planning and autonomous robotic control<\/p>\n\n\n\n<p><strong>#ICRA2026<\/strong> <strong>#CUHK<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/media.licdn.com\/dms\/image\/v2\/D5622AQEBcXzApGU-7g\/feedshare-shrink_2048_1536\/B56ZxpJZk5KkAk-\/0\/1771290597851?e=1772668800&amp;v=beta&amp;t=RJ-PIgNPKywJgR_35vONvTXPe3qctX-QBPDfpTs6TcE\" alt=\"No alternative text description for this image\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\"
src=\"https:\/\/media.licdn.com\/dms\/image\/v2\/D5622AQHVNQEydDc1Hw\/feedshare-shrink_800\/B56ZxpJZp5H8Ag-\/0\/1771290598174?e=1772668800&amp;v=beta&amp;t=RRFS-1BYRCHtwQh0YJnC-7IOUQw88u7LYSQxBSIO7Uc\" alt=\"No alternative text description for this image\" \/><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>Thrilled to share our latest work, \ud835\udc06\ud835\udc1e\ud835\udc28\ud835\udc0b\ud835\udc1a\ud835\udc27\ud835\udc06, a unified geometry-aware framework for language-guided robotic grasping. Language-guided grasping is a key capability for intuitive human\u2013robot interaction. A robot should not only detect objects but also understand natural instructions such as \u201cpick up the blue cup behind the bowl.\u201d While recent multimodal\u2026 <a class=\"continue-reading-link\" href=\"http:\/\/www.labren.org\/mm\/news\/%f0%9f%9a%80-icra-2026-%f0%9d%91%ae%f0%9d%92%86%f0%9d%92%90%f0%9d%91%b3%f0%9d%92%82%f0%9d%92%8f%f0%9d%91%ae-%f0%9d%91%ae%f0%9d%92%86%f0%9d%92%90%f0%9d%92%8e%f0%9d%92%86%f0%9d%92%95%f0%9d%92%93\/\">Continue 
reading<\/a><\/p>\n","protected":false},"author":17,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"ngg_post_thumbnail":0,"footnotes":""},"categories":[4],"tags":[],"class_list":["post-3449","post","type-post","status-publish","format-standard","hentry","category-news"],"_links":{"self":[{"href":"http:\/\/www.labren.org\/mm\/wp-json\/wp\/v2\/posts\/3449","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.labren.org\/mm\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.labren.org\/mm\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.labren.org\/mm\/wp-json\/wp\/v2\/users\/17"}],"replies":[{"embeddable":true,"href":"http:\/\/www.labren.org\/mm\/wp-json\/wp\/v2\/comments?post=3449"}],"version-history":[{"count":1,"href":"http:\/\/www.labren.org\/mm\/wp-json\/wp\/v2\/posts\/3449\/revisions"}],"predecessor-version":[{"id":3450,"href":"http:\/\/www.labren.org\/mm\/wp-json\/wp\/v2\/posts\/3449\/revisions\/3450"}],"wp:attachment":[{"href":"http:\/\/www.labren.org\/mm\/wp-json\/wp\/v2\/media?parent=3449"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.labren.org\/mm\/wp-json\/wp\/v2\/categories?post=3449"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.labren.org\/mm\/wp-json\/wp\/v2\/tags?post=3449"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}