Overview
Robots are increasingly expected to operate in unstructured and dynamic environments. In such environments, robots must be capable of performing complex and diverse tasks, which places new demands on the robustness and interpretability of robot perception.
In this workshop, we aim to examine the state of the art in robot perception and discuss what is still missing to achieve these properties for rigorous robot perception. We will discuss how perception in modular pipelines and in end-to-end learning approaches, e.g., those using foundation models such as VLMs or VLAs, can support robustness and interpretability. The workshop scope is not limited to visual perception; we also want to explore how other modalities (e.g., tactile perception) can contribute to answering these questions. Lastly, we will ask how robustness and interpretability can be assessed for robot systems.
Through invited talks, roundtable discussions, and spotlight/poster presentations of contributed extended abstracts, the workshop provides an opportunity to identify open challenges, assess the promises and limitations of current approaches, and chart new directions for achieving robust and interpretable perception in robotics.
Speakers

Margarita Chli
ETH Zurich and University of Cyprus
She is a Professor of Robotic Vision and Director of the Vision for Robotics Lab at the University of Cyprus, and a Visiting Professor at ETH Zurich. Her research has pioneered vision-based autonomous flight and collaborative monocular SLAM for drone swarms. She is the recipient of an ERC Consolidator Grant and has delivered invited keynote talks at venues including the World Economic Forum in Davos, TEDx, and ICRA.

Xiaolong Wang
UC San Diego
He is an Assistant Professor in the ECE department at the University of California, San Diego, and a Visiting Professor at NVIDIA Research. His research focuses on the intersection between computer vision and robotics. His specific interest lies in representation learning with videos and physical robotic interaction data. These comprehensive representations are utilized to facilitate the learning of human-like robot skills, with the goal of generalizing the robot to interact effectively with a wide range of objects and environments in the real physical world.

Kostas Alexis
Norwegian University of Science and Technology
He is a Full Professor at the Department of Engineering Cybernetics of the Norwegian University of Science and Technology (NTNU) in Trondheim, Norway. His research goal is to contribute towards establishing true navigational and operational autonomy for robotics.

Yu Xiang
University of Texas at Dallas
He is an Assistant Professor in the Department of Computer Science at the University of Texas at Dallas. His research lies at the intersection of robotics and computer vision, with a focus on enabling intelligent systems to perceive, understand, and act in complex 3D environments.

Georgia Chalvatzaki
TU Darmstadt
She is a Full Professor for Interactive Robot Perception & Learning at the Computer Science Department of the Technical University of Darmstadt and Hessian.AI. Her research focuses on robot learning for mobile manipulation in assistive robotics, advancing embodied AI through methods at the intersection of machine learning and classical robotics.

Juxi Leitner
Amazon Robotics & Monash University
He is an Applied Science Manager at Amazon Robotics, co-founder of Lyro Robotics, and Adjunct Senior Lecturer at Monash University. For more than 20 years, he has worked at the intersection of robotics, AI, and computer vision, leading cross-disciplinary teams from research ideas to prototypes and deployed robotic systems. His work spans robotic manipulation, humanoid robotics, space robotics, and intelligent automation, with experience across academia, industry, and research institutions in Europe and Australia.
Schedule
This is a preliminary version of the schedule and may be subject to change.
| Time | Session / Speaker | Talk Title |
|---|---|---|
| 8:50 | Opening Remarks by the Workshop Organizers | |
| 9:00 | Topic 1: Perception in Navigation | |
| 9:00 | Margarita Chli, ETH Zurich and University of Cyprus | Rigorous perception for single- and multi-robot systems: are we there yet? |
| 9:30 | Xiaolong Wang, UC San Diego | |
| 10:00 | Spotlight Talks: Perception Debt: Monitoring Safety-Margin Consumption in Embodied Autonomy; One-Step Planner: Unified Observation and Decision-Making with Vision-Language Models; COIN-BIEVR: 3D Intensity Mapping for Robust LiDAR-Inertial Odometry | |
| 10:15 | Coffee Break and Poster Session: Perception Debt: Monitoring Safety-Margin Consumption in Embodied Autonomy; One-Step Planner: Unified Observation and Decision-Making with Vision-Language Models; COIN-BIEVR: 3D Intensity Mapping for Robust LiDAR-Inertial Odometry; In-context adaptation of place recognition through self-supervised learning from video; Language-Based Swarm Perception: Decentralized Person Re-Identification via Natural Language Descriptions; Extended Abstract: Adaptive LiDAR Inertial Odometry with an Ellipsoid Representation (EllipseLIO); Cross-Modal Benchmarking for Robotic Perception in Natural Environments; SUPER -- A Framework for Sensitivity-based Uncertainty-aware Performance and Risk Assessment in Visual Inertial Odometry; Visual Layer Selection Matters for Egocentric VLM Perception; Lensless Aerial Navigation in Dark; Spatially Stratified Distillation for Heterogeneous Radar Place Recognition | |
| 11:00 | Kostas Alexis, Norwegian University of Science and Technology | The Role of FMCW Radar in Resilient Robot Perception |
| 11:30 | Roundtable Discussion | |
| 12:00 | Lunch Break | |
| 14:00 | Topic 2: Perception in Manipulation | |
| 14:00 | Yu Xiang, University of Texas at Dallas | From Modular Robotics Pipelines to Vision-Language-Action Systems: Lessons from Real-World Manipulation |
| 14:30 | Georgia Chalvatzaki, TU Darmstadt | Structured Robot Learning for Rigorous Manipulation: From Perception to Action |
| 15:00 | Spotlight Talks: GroundedPlanBench: Spatially Grounded Long-Horizon Task Planning; GAP: Geometric Anchor Pre-training for Data-Efficient Visuomotor Learning of Manipulation Tasks; Input-Aware Routing of Image-to-3D Models for Robotic Manipulation | |
| 15:15 | Poster Session: GroundedPlanBench: Spatially Grounded Long-Horizon Task Planning; GAP: Geometric Anchor Pre-training for Data-Efficient Visuomotor Learning of Manipulation Tasks; Input-Aware Routing of Image-to-3D Models for Robotic Manipulation; Robust Pose Estimation through Failure Explanation and Mitigation; Core-Agnostic Compliance Perception for Rigid–Deformable Coupled Objects using Vision-Based Tactile Sensing; U-VINDO: Underwater Visual-Inertial Odometry Enhanced with Robot Dynamics Predictions Powered by Port-Hamiltonian Neural ODE Networks; Training-Free 6D Robot Pose Estimation with Neural Memory Objects; Task-Relevant Depth Quality Metrics for Suction Grasping; Compositional Neural Field Movement Primitives; OSMa-Bench++: Toward Open-Ended Benchmarking of Semantic Mapping for Manipulation with Prompt-Generated Synthetic Scenes; IFG: Internet-Scale Guidance for Functional Grasping Generation; EVII: Measuring Early Visual Integration in VLM Reasoning | |
| 16:00 | Juxi Leitner, Amazon Robotics & Monash University | |
| 16:30 | Roundtable Discussion | |
| 17:00 | Closing Remarks | |
Accepted Papers
The accepted papers are listed below. Once the final versions have been submitted, the PDFs will appear here as well.
Perception Debt: Monitoring Safety-Margin Consumption in Embodied Autonomy
Stavan Dholakia, Abhishek Singh, Aditya Gazta, Shivani Shukla
One-Step Planner: Unified Observation and Decision-Making with Vision-Language Models
Youngjae Yoo, Jae-Woo Choi, Dohyung Kim, Byoung-Tak Zhang
COIN-BIEVR: 3D Intensity Mapping for Robust LiDAR-Inertial Odometry
Patrick Pfreundschuh, Cedric Le Gentil, Roland Siegwart, Cesar Cadena
In-context adaptation of place recognition through self-supervised learning from video
Kiavash Jamshidi, Hermann Blum, Gülhan Şikaroğlu
Language-Based Swarm Perception: Decentralized Person Re-Identification via Natural Language Descriptions
Miquel Kegeleirs, Lorenzo Garattoni, Gianpiero Francesca, Mauro Birattari
Extended Abstract: Adaptive LiDAR Inertial Odometry with an Ellipsoid Representation (EllipseLIO)
Rowan Border, Margarita Chli
Cross-Modal Benchmarking for Robotic Perception in Natural Environments
David Hall, Joshua Knights, Mark Cox, Peyman Moghadam
SUPER -- A Framework for Sensitivity-based Uncertainty-aware Performance and Risk Assessment in Visual Inertial Odometry
Johannes A. Gaus, Daniel Haeufle, Woo-Jeong Baek
Visual Layer Selection Matters for Egocentric VLM Perception
Ruchen Liu, Yi Yang, Yiming Xu, Monika Sester, Bodo Rosenhahn
Lensless Aerial Navigation in Dark
Deepak Singh, Hudson Kortus, Jahnavi Prudhivi, Vivek Reddy Kasireddy, Nitin J. Sanket
Spatially Stratified Distillation for Heterogeneous Radar Place Recognition
Sagun Man Singh Shrestha, Abdelwahed Khamis, Saimunur Rahman, Peyman Moghadam
GroundedPlanBench: Spatially Grounded Long-Horizon Task Planning
Sehun Jung, Hyunjee Song, Dong-Hee Kim, Reuben Tan, Jianfeng Gao, Yong Jae Lee, Donghyun Kim
GAP: Geometric Anchor Pre-training for Data-Efficient Visuomotor Learning of Manipulation Tasks
Davide Buoso, Andrea Protopapa, Stefano Di Carlo, Francesca Pistilli, Giuseppe Averta
Input-Aware Routing of Image-to-3D Models for Robotic Manipulation
Akash Anand, Aditya Agarwal, Leslie Pack Kaelbling
Robust Pose Estimation through Failure Explanation and Mitigation
Loris Schneider, Yitian Shi, Rosa Wolf, Carolin Brenner, Rudolph Triebel, Rania Rayyes
Core-Agnostic Compliance Perception for Rigid–Deformable Coupled Objects using Vision-Based Tactile Sensing
Can Zhao, Yanghui Ding, Haonan Zhao, Yebao Hu, Daolin Ma
U-VINDO: Underwater Visual-Inertial Odometry Enhanced with Robot Dynamics Predictions Powered by Port-Hamiltonian Neural ODE Networks
Yazan Maalla, Sergey Kolyubin, Zein Alabedeen Barhoum
Training-Free 6D Robot Pose Estimation with Neural Memory Objects
Sebastian Jung, Leonard Klüpfel, Tjark Darius, Rudolph Triebel, Maximilian Durner
Task-Relevant Depth Quality Metrics for Suction Grasping
Shivansh Inamdar
Compositional Neural Field Movement Primitives
Ahmet Ercan Tekden, Yasemin Bekiroglu
OSMa-Bench++: Toward Open-Ended Benchmarking of Semantic Mapping for Manipulation with Prompt-Generated Synthetic Scenes
Regina Kurkova, Maxim Popov, Sergey Kolyubin
IFG: Internet-Scale Guidance for Functional Grasping Generation
Muxin Liu, Mingxuan Li, Kenneth Shaw, Deepak Pathak
EVII: Measuring Early Visual Integration in VLM Reasoning
Hakan Muluk, Ozgur S. Oguz
Call for Extended Abstracts
We invite the submission of extended abstracts (incl. field reports) on the following topics of interest:
- Robust perception for navigation in unstructured and dynamic environments
- Robust perception for manipulation in unstructured and everyday environments
- Perception in end-to-end learning architectures for robotic navigation and manipulation
- Interpretability and robustness of perception in end-to-end learning robot systems
- Uncertainty quantification for robot perception methods
- Introspection and interpretability of perception methods in robot systems
- Tactile or visuo-tactile perception for robust contact-rich manipulation in unstructured environments
- Lessons learned from robot perception in integrated robot systems, incl. informative failure cases
- Datasets and benchmarks for robustness and interpretability of perception in real-world robot systems
All submitted extended abstracts will be reviewed on the basis of technical quality, relevance, significance, and clarity. The review process will be single-blind. The page limit for extended abstracts is 4 pages including references. Submissions that exceed the page limit will be desk rejected without further review. We also accept submissions that extend previously presented work, as well as work that is being published as part of the ICRA 2026 main conference. Upon acceptance, you will be able to present your submission as part of the poster session. Some extended abstracts will be selected for oral spotlight presentations. All accepted submissions will be made available on this website for the workshop (non-archival).
Please submit your extended abstracts following the ICRA 2026 format guidelines. For details see the following links:
Please submit your contribution via OpenReview.
Final Submission for Accepted Papers
Final versions of accepted papers are due May 27, 2026, 23:59 AoE (previously May 22) via OpenReview. The page limit of the final version is 4 pages including references. Please use the same ICRA formatting guidelines and template as for the extended abstract submission.
Presentation Instructions for Accepted Papers
All accepted papers will be presented as posters at the workshop. Authors of accepted papers should prepare a poster in DIN A0 Portrait format.
Papers selected as spotlights will additionally receive a 5-minute oral presentation slot. Spotlight presenters are asked to present from their own laptop.
Important Dates
Extended Abstract Submission Deadline: April 17, 2026, 23:59 AoE (previously April 7)
Decision Notification: May 11, 2026 (previously May 8)
Final Version: May 27, 2026, 23:59 AoE (previously May 22)
Workshop Date: June 1, 2026
Organizers

Jens Behley
University of Bonn

Maximilian Durner
German Aerospace Center (DLR)

Raphael Hagmanns
Fraunhofer IOSB

Ayoung Kim
Seoul National University

Dimity Miller
QUT Centre for Robotics

Joerg Stueckler
University of Augsburg

Rudolph Triebel
German Aerospace Center (DLR), Karlsruhe Institute of Technology (KIT)

Wenzhen Yuan
University of Illinois
