Overview

Robots are increasingly expected to operate in unstructured and dynamic environments. In such environments, robots must be capable of performing complex and diverse tasks, which places new demands on robustness and interpretability of robot perception.

In this workshop, we aim to examine the state of the art in robot perception and discuss what is still missing to achieve these properties and, with them, rigorous robot perception. We will discuss how perception in modular pipelines and in end-to-end learning approaches, e.g., using foundation models such as vision-language models (VLMs) or vision-language-action models (VLAs), can support robustness and interpretability. The workshop scope is not limited to visual perception; we also want to explore how other modalities (e.g., tactile perception) can contribute to answering these questions. Lastly, we will ask how robustness and interpretability can be assessed for robot systems.

Through invited talks, roundtable discussions, and spotlight/poster presentations of contributed extended abstracts, the workshop provides an opportunity to identify open challenges, assess the promises and limitations of current approaches, and chart new directions for achieving robust and interpretable perception in robotics.



Speakers

  • Margarita Chli

    ETH Zurich and University of Cyprus

    She is a Professor of Robotic Vision and Director of the Vision for Robotics Lab at the University of Cyprus, and a Visiting Professor at ETH Zurich. Her research has pioneered vision-based autonomous flight and collaborative monocular SLAM for drone swarms. She is the recipient of an ERC Consolidator Grant and has delivered invited keynote talks at venues including the World Economic Forum in Davos, TEDx, and ICRA.

  • Xiaolong Wang

    UC San Diego

    He is an Assistant Professor in the ECE department at the University of California, San Diego, and a Visiting Professor at NVIDIA Research. His research focuses on the intersection between computer vision and robotics. His specific interest lies in representation learning with videos and physical robotic interaction data. These comprehensive representations are utilized to facilitate the learning of human-like robot skills, with the goal of generalizing the robot to interact effectively with a wide range of objects and environments in the real physical world.

  • Kostas Alexis

    Norwegian University of Science and Technology

    He is a Full Professor at the Department of Engineering Cybernetics of the Norwegian University of Science and Technology (NTNU) in Trondheim, Norway. His research goal is to contribute towards establishing true navigational and operational autonomy for robotics.

  • Yu Xiang

    University of Texas at Dallas

    He is an Assistant Professor in the Department of Computer Science at the University of Texas at Dallas. His research lies at the intersection of robotics and computer vision, with a focus on enabling intelligent systems to perceive, understand, and act in complex 3D environments.

  • Georgia Chalvatzaki

    TU Darmstadt

    She is a Full Professor for Interactive Robot Perception & Learning at the Computer Science Department of the Technical University of Darmstadt and Hessian.AI. Her research focuses on robot learning for mobile manipulation in assistive robotics, advancing embodied AI through methods at the intersection of machine learning and classical robotics.

  • Juxi Leitner

    Amazon Robotics & Monash University

    He is an Applied Science Manager at Amazon Robotics, co-founder of Lyro Robotics, and Adjunct Senior Lecturer at Monash University. For more than 20 years, he has worked at the intersection of robotics, AI, and computer vision, leading cross-disciplinary teams from research ideas to prototypes and deployed robotic systems. His work spans robotic manipulation, humanoid robotics, space robotics, and intelligent automation, with experience across academia, industry, and research institutions in Europe and Australia.

Schedule

This is a preliminary version of the schedule and may be subject to change.

Time  Description
8:50  Opening Remarks by the Workshop Organizers
9:00  Topic 1: Perception in Navigation
9:00  Margarita Chli (ETH Zurich and University of Cyprus): Rigorous perception for single- and multi-robot systems: are we there yet?
9:30  Xiaolong Wang (UC San Diego)
10:00  Spotlight Talks

Perception Debt: Monitoring Safety-Margin Consumption in Embodied Autonomy
Stavan Dholakia, Abhishek Singh, Aditya Gazta, Shivani Shukla

One-Step Planner: Unified Observation and Decision-Making with Vision-Language Models
Youngjae Yoo, Jae-Woo Choi, Dohyung Kim, Byoung-Tak Zhang

COIN-BIEVR: 3D Intensity Mapping for Robust LiDAR-Inertial Odometry
Patrick Pfreundschuh, Cedric Le Gentil, Roland Siegwart, Cesar Cadena

10:15  Coffee Break and Poster Session

Perception Debt: Monitoring Safety-Margin Consumption in Embodied Autonomy
Stavan Dholakia, Abhishek Singh, Aditya Gazta, Shivani Shukla

One-Step Planner: Unified Observation and Decision-Making with Vision-Language Models
Youngjae Yoo, Jae-Woo Choi, Dohyung Kim, Byoung-Tak Zhang

COIN-BIEVR: 3D Intensity Mapping for Robust LiDAR-Inertial Odometry
Patrick Pfreundschuh, Cedric Le Gentil, Roland Siegwart, Cesar Cadena

In-context adaptation of place recognition through self-supervised learning from video
Kiavash Jamshidi, Hermann Blum, Gülhan Şikaroğlu

Language-Based Swarm Perception: Decentralized Person Re-Identification via Natural Language Descriptions
Miquel Kegeleirs, Lorenzo Garattoni, Gianpiero Francesca, Mauro Birattari

Extended Abstract: Adaptive LiDAR Inertial Odometry with an Ellipsoid Representation (EllipseLIO)
Rowan Border, Margarita Chli

Cross-Modal Benchmarking for Robotic Perception in Natural Environments
David Hall, Joshua Knights, Mark Cox, Peyman Moghadam

SUPER -- A Framework for Sensitivity-based Uncertainty-aware Performance and Risk Assessment in Visual Inertial Odometry
Johannes A. Gaus, Daniel Haeufle, Woo-Jeong Baek

Visual Layer Selection Matters for Egocentric VLM Perception
Ruchen Liu, Yi Yang, Yiming Xu, Monika Sester, Bodo Rosenhahn

Lensless Aerial Navigation in Dark
Deepak Singh, Hudson Kortus, Jahnavi Prudhivi, Vivek Reddy Kasireddy, Nitin J. Sanket

Spatially Stratified Distillation for Heterogeneous Radar Place Recognition
Sagun Man Singh Shrestha, Abdelwahed Khamis, Saimunur Rahman, Peyman Moghadam

11:00  Kostas Alexis (Norwegian University of Science and Technology): The Role of FMCW Radar in Resilient Robot Perception
11:30  Roundtable Discussion
12:00  Lunch Break
14:00  Topic 2: Perception in Manipulation
14:00  Yu Xiang (University of Texas at Dallas): From Modular Robotics Pipelines to Vision-Language-Action Systems: Lessons from Real-World Manipulation
14:30  Georgia Chalvatzaki (TU Darmstadt): Structured Robot Learning for Rigorous Manipulation: From Perception to Action
15:00  Spotlight Talks

GroundedPlanBench: Spatially Grounded Long-Horizon Task Planning
Sehun Jung, Hyunjee Song, Dong-Hee Kim, Reuben Tan, Jianfeng Gao, Yong Jae Lee, Donghyun Kim

GAP: Geometric Anchor Pre-training for Data-Efficient Visuomotor Learning of Manipulation Tasks
Davide Buoso, Andrea Protopapa, Stefano Di Carlo, Francesca Pistilli, Giuseppe Averta

Input-Aware Routing of Image-to-3D Models for Robotic Manipulation
Akash Anand, Aditya Agarwal, Leslie Pack Kaelbling

15:15  Poster Session

GroundedPlanBench: Spatially Grounded Long-Horizon Task Planning
Sehun Jung, Hyunjee Song, Dong-Hee Kim, Reuben Tan, Jianfeng Gao, Yong Jae Lee, Donghyun Kim

GAP: Geometric Anchor Pre-training for Data-Efficient Visuomotor Learning of Manipulation Tasks
Davide Buoso, Andrea Protopapa, Stefano Di Carlo, Francesca Pistilli, Giuseppe Averta

Input-Aware Routing of Image-to-3D Models for Robotic Manipulation
Akash Anand, Aditya Agarwal, Leslie Pack Kaelbling

Robust Pose Estimation through Failure Explanation and Mitigation
Loris Schneider, Yitian Shi, Rosa Wolf, Carolin Brenner, Rudolph Triebel, Rania Rayyes

Core-Agnostic Compliance Perception for Rigid–Deformable Coupled Objects using Vision-Based Tactile Sensing
Can Zhao, Yanghui Ding, Haonan Zhao, Yebao Hu, Daolin Ma

U-VINDO: Underwater Visual-Inertial Odometry Enhanced with Robot Dynamics Predictions Powered by Port-Hamiltonian Neural ODE Networks
Yazan Maalla, Sergey Kolyubin, Zein Alabedeen Barhoum

Training-Free 6D Robot Pose Estimation with Neural Memory Objects
Sebastian Jung, Leonard Klüpfel, Tjark Darius, Rudolph Triebel, Maximilian Durner

Task-Relevant Depth Quality Metrics for Suction Grasping
Shivansh Inamdar

Compositional Neural Field Movement Primitives
Ahmet Ercan Tekden, Yasemin Bekiroglu

OSMa-Bench++: Toward Open-Ended Benchmarking of Semantic Mapping for Manipulation with Prompt-Generated Synthetic Scenes
Regina Kurkova, Maxim Popov, Sergey Kolyubin

IFG: Internet-Scale Guidance for Functional Grasping Generation
Muxin Liu, Mingxuan Li, Kenneth Shaw, Deepak Pathak

EVII: Measuring Early Visual Integration in VLM Reasoning
Hakan Muluk, Ozgur S. Oguz

16:00  Juxi Leitner (Amazon Robotics & Monash University)
16:30  Roundtable Discussion
17:00  Closing Remarks

Accepted Papers

The accepted papers are listed below. Once the final version of a paper has been submitted, its PDF will appear here as well.

Perception Debt: Monitoring Safety-Margin Consumption in Embodied Autonomy
Stavan Dholakia, Abhishek Singh, Aditya Gazta, Shivani Shukla

One-Step Planner: Unified Observation and Decision-Making with Vision-Language Models
Youngjae Yoo, Jae-Woo Choi, Dohyung Kim, Byoung-Tak Zhang

COIN-BIEVR: 3D Intensity Mapping for Robust LiDAR-Inertial Odometry
Patrick Pfreundschuh, Cedric Le Gentil, Roland Siegwart, Cesar Cadena

In-context adaptation of place recognition through self-supervised learning from video
Kiavash Jamshidi, Hermann Blum, Gülhan Şikaroğlu

Language-Based Swarm Perception: Decentralized Person Re-Identification via Natural Language Descriptions
Miquel Kegeleirs, Lorenzo Garattoni, Gianpiero Francesca, Mauro Birattari

Extended Abstract: Adaptive LiDAR Inertial Odometry with an Ellipsoid Representation (EllipseLIO)
Rowan Border, Margarita Chli

Cross-Modal Benchmarking for Robotic Perception in Natural Environments
David Hall, Joshua Knights, Mark Cox, Peyman Moghadam

SUPER -- A Framework for Sensitivity-based Uncertainty-aware Performance and Risk Assessment in Visual Inertial Odometry
Johannes A. Gaus, Daniel Haeufle, Woo-Jeong Baek

Visual Layer Selection Matters for Egocentric VLM Perception
Ruchen Liu, Yi Yang, Yiming Xu, Monika Sester, Bodo Rosenhahn

Lensless Aerial Navigation in Dark
Deepak Singh, Hudson Kortus, Jahnavi Prudhivi, Vivek Reddy Kasireddy, Nitin J. Sanket

Spatially Stratified Distillation for Heterogeneous Radar Place Recognition
Sagun Man Singh Shrestha, Abdelwahed Khamis, Saimunur Rahman, Peyman Moghadam

GroundedPlanBench: Spatially Grounded Long-Horizon Task Planning
Sehun Jung, Hyunjee Song, Dong-Hee Kim, Reuben Tan, Jianfeng Gao, Yong Jae Lee, Donghyun Kim

GAP: Geometric Anchor Pre-training for Data-Efficient Visuomotor Learning of Manipulation Tasks
Davide Buoso, Andrea Protopapa, Stefano Di Carlo, Francesca Pistilli, Giuseppe Averta

Input-Aware Routing of Image-to-3D Models for Robotic Manipulation
Akash Anand, Aditya Agarwal, Leslie Pack Kaelbling

Robust Pose Estimation through Failure Explanation and Mitigation
Loris Schneider, Yitian Shi, Rosa Wolf, Carolin Brenner, Rudolph Triebel, Rania Rayyes

Core-Agnostic Compliance Perception for Rigid–Deformable Coupled Objects using Vision-Based Tactile Sensing
Can Zhao, Yanghui Ding, Haonan Zhao, Yebao Hu, Daolin Ma

U-VINDO: Underwater Visual-Inertial Odometry Enhanced with Robot Dynamics Predictions Powered by Port-Hamiltonian Neural ODE Networks
Yazan Maalla, Sergey Kolyubin, Zein Alabedeen Barhoum

Training-Free 6D Robot Pose Estimation with Neural Memory Objects
Sebastian Jung, Leonard Klüpfel, Tjark Darius, Rudolph Triebel, Maximilian Durner

Task-Relevant Depth Quality Metrics for Suction Grasping
Shivansh Inamdar

Compositional Neural Field Movement Primitives
Ahmet Ercan Tekden, Yasemin Bekiroglu

OSMa-Bench++: Toward Open-Ended Benchmarking of Semantic Mapping for Manipulation with Prompt-Generated Synthetic Scenes
Regina Kurkova, Maxim Popov, Sergey Kolyubin

IFG: Internet-Scale Guidance for Functional Grasping Generation
Muxin Liu, Mingxuan Li, Kenneth Shaw, Deepak Pathak

EVII: Measuring Early Visual Integration in VLM Reasoning
Hakan Muluk, Ozgur S. Oguz

Call for Extended Abstracts

We invite the submission of extended abstracts (incl. field reports) on the following topics of interest:

  • Robust perception for navigation in unstructured and dynamic environments
  • Robust perception for manipulation in unstructured and everyday environments
  • Perception in end-to-end learning architectures for robotic navigation and manipulation
  • Interpretability and robustness of perception in end-to-end learning robot systems
  • Uncertainty quantification for robot perception methods
  • Introspection and interpretability of perception methods in robot systems
  • Tactile or visuo-tactile perception for robust contact-rich manipulation in unstructured environments
  • Lessons learned from robot perception in integrated robot systems, incl. informative failure cases
  • Datasets and benchmarks for robustness and interpretability of perception in real-world robot systems

All submitted extended abstracts will be reviewed on the basis of technical quality, relevance, significance, and clarity. The review process will be single-blind. The page limit for extended abstracts is 4 pages including references. Submissions exceeding the page limit will be desk-rejected without further review. We also accept submissions of previously presented work that you have extended, as well as work that is being published as part of the ICRA 2026 main conference. Upon acceptance, you will be able to present your submission as part of the poster session. Some extended abstracts will be selected for oral spotlight presentations. All accepted submissions will be made available on this website for the workshop (non-archival).

Please submit your extended abstracts following the ICRA 2026 format guidelines. For details see the following links:

Please submit your contribution via OpenReview.

Final Submission for Accepted Papers

Final versions of accepted papers are due May 27, 2026 (originally May 22), 23:59 AoE via OpenReview. The page limit of the final version is 4 pages including references. Please use the same ICRA formatting guidelines and template as for the extended abstract submission.

Presentation Instructions for Accepted Papers

All accepted papers will be presented as posters at the workshop. Authors of accepted papers should prepare a poster in DIN A0 portrait format.
Papers selected as spotlights will additionally receive a 5-minute oral presentation slot. Spotlight presenters are asked to present from their own laptop.

Important Dates

Extended Abstract Submission Deadline: April 17, 2026 (originally April 7), 23:59 AoE

Decision Notification: May 11, 2026 (originally May 8)

Final Version: May 27, 2026 (originally May 22), 23:59 AoE

Workshop Date: June 1, 2026

Organizers