Modern autonomous systems often rely on machine learning to operate intelligently in uncertain or a priori unknown environments, making it even more challenging to obtain robust safety assurances outside of the training regime. In this research thrust, we focus on understanding how a robot can safely learn in such settings, adapt its safety assurances during operation in light of system or environment evolution, and continuously adapt its learning components to new risks and safety hazards.

Safety Guided Imitation Learning
Behavior cloning (BC) is a widely used approach in imitation learning, where a robot learns a control policy by observing an expert supervisor. However, the learned policy can make errors that lead to safety violations, which limits its utility in safety-critical robotics applications. While prior works have tried to improve a BC policy via additional real or synthetic action labels, adversarial training, or runtime filtering, none of them explicitly focuses on reducing the BC policy's safety violations at training time. We propose SAFE-GIL, a design-time method to learn safety-aware behavior cloning policies. SAFE-GIL deliberately injects adversarial disturbance into the system during data collection to guide the expert toward safety-critical states. This disturbance injection simulates potential policy errors that the system might encounter at test time. By ensuring that the training data better captures expert behavior in safety-critical states, our approach yields safer policies despite policy errors at test time. We further develop a reachability-based method to compute this adversarial disturbance. Our method demonstrates a significant reduction in safety failures, particularly in low-data regimes where the likelihood of learning errors, and therefore safety violations, is higher.
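The following is a minimal sketch of the data-collection idea for a 2D single-integrator robot. The expert policy, obstacle geometry, and disturbance bound are illustrative assumptions, and the gradient of a signed-distance safety value stands in for the reachability-based adversarial disturbance developed in the paper.

```python
# A minimal sketch of SAFE-GIL-style data collection for a 2D single-integrator robot.
# The adversarial disturbance here follows the gradient of a signed-distance "safety
# value", a stand-in for the reachability-based disturbance described above.
import numpy as np

GOAL = np.array([5.0, 0.0])
OBSTACLE, RADIUS = np.array([2.5, 0.2]), 1.0
DT, D_MAX = 0.1, 0.3          # time step and disturbance bound (assumed values)

def safety_value(x):
    # Signed distance to the obstacle boundary (positive = safe).
    return np.linalg.norm(x - OBSTACLE) - RADIUS

def expert_action(x):
    # Hypothetical expert: head to the goal, veer away when close to the obstacle.
    u = GOAL - x
    if safety_value(x) < 0.5:
        u += 2.0 * (x - OBSTACLE) / np.linalg.norm(x - OBSTACLE)
    return np.clip(u, -1.0, 1.0)

def adversarial_disturbance(x):
    # Bounded disturbance that maximally decreases the safety value,
    # i.e., pushes the system toward the unsafe set.
    grad = (x - OBSTACLE) / np.linalg.norm(x - OBSTACLE)  # gradient of safety_value
    return -D_MAX * grad

def collect_demonstration(x0, horizon=80):
    states, actions = [], []
    x = np.array(x0, dtype=float)
    for _ in range(horizon):
        u = expert_action(x)                      # expert corrects for the disturbance
        states.append(x.copy())
        actions.append(u)
        d = adversarial_disturbance(x)            # injected only during data collection
        x = x + DT * (u + d)                      # single-integrator dynamics
    return np.array(states), np.array(actions)

states, actions = collect_demonstration([0.0, 0.0])
print(states.shape, actions.shape)  # dataset for standard behavior cloning
```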
[Paper] [Project Website]

Controlling Covariate Shift with Stable Behavior Cloning
Behavior cloning is a common imitation learning paradigm. Under behavior cloning, the robot collects expert demonstrations and then trains a policy to match the actions taken by the expert. This works well when the robot learner visits states where the expert has already demonstrated the correct action, but inevitably the robot will also encounter new states outside of its training dataset. If the robot learner takes the wrong action at these new states, it could move farther from the training data, which in turn leads to increasingly incorrect actions and compounding errors. Existing works try to address this fundamental challenge by augmenting or enhancing the training data. By contrast, in this project, we develop the control-theoretic properties of behavior cloned policies. Specifically, we consider the error dynamics between the system's current state and the states in the expert dataset. From the error dynamics, we derive model-based and model-free conditions for stability: under these conditions, the robot shapes its policy so that its current behavior converges towards example behaviors in the expert dataset. In practice, this results in Stable-BC, an easy-to-implement extension of standard behavior cloning that is provably robust to covariate shift.
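As an illustration, the sketch below adds a stability regularizer to a standard behavior cloning loss, assuming single-integrator dynamics (x_dot = u) so the closed-loop error dynamics are governed by the policy Jacobian. The network, toy data, and the symmetric-part eigenvalue penalty are illustrative assumptions, not the released implementation.

```python
# A minimal sketch of a stability-regularized behavior cloning loss.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def stable_bc_loss(states, expert_actions, lam=0.1):
    bc_loss = nn.functional.mse_loss(policy(states), expert_actions)

    # Stability regularizer: for x_dot = pi(x), the error dynamics around a
    # demonstrated state are governed by A = d pi / d x. Penalizing positive
    # eigenvalues of the symmetric part (A + A^T)/2 is a sufficient condition
    # for rollouts to converge back toward the demonstrations.
    reg = 0.0
    for x in states:
        A = torch.autograd.functional.jacobian(policy, x, create_graph=True)
        sym = 0.5 * (A + A.T)
        reg = reg + torch.relu(torch.linalg.eigvalsh(sym).max())
    return bc_loss + lam * reg / len(states)

# Toy expert data: move toward the origin.
states = torch.randn(32, 2)
expert_actions = -states
for _ in range(100):
    optimizer.zero_grad()
    loss = stable_bc_loss(states, expert_actions)
    loss.backward()
    optimizer.step()
```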
[Paper] [Code] [Project Website] [Video]

Learning Robot Safety Representations from Natural Language Feedback
Current safe control approaches typically assume that the safety constraints are known a priori, and thus that the robot can pre-compute a corresponding safety controller. While this may make sense for some safety constraints (e.g., avoiding collision with walls by analyzing a floor plan), other constraints are more complex (e.g., spills), inherently personal, and context-dependent, and can only be identified at deployment time when the robot is interacting in a specific environment and with a specific person (e.g., fragile objects, expensive rugs). Language can provide a flexible mechanism to communicate these evolving safety constraints to the robot. In this work, we use vision language models (VLMs) to interpret language feedback and the robot's image observations to continuously update the robot's representation of safety constraints. With these inferred constraints, we update a Hamilton-Jacobi reachability safety controller online via efficient warm-starting techniques. Through simulation and hardware experiments, we demonstrate the robot's ability to infer and respect language-based safety constraints with the proposed approach.
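Below is a minimal sketch of the online constraint-update loop, assuming a hypothetical query_vlm helper that maps an image and language feedback to an avoid region, and a stubbed, warm-started recomputation of a grid-based safety value function.

```python
# A minimal sketch of updating safety constraints from language feedback.
# The VLM call and the reachability solve are placeholders.
import numpy as np

xs = np.linspace(-3, 3, 61)
X, Y = np.meshgrid(xs, xs, indexing="ij")
failure_sdf = np.full_like(X, np.inf)     # signed distance to all known unsafe sets
value = failure_sdf.copy()                # safety value function (warm-start seed)

def query_vlm(image, feedback):
    # Hypothetical VLM wrapper: returns (center, radius) of a newly identified
    # unsafe region, e.g., "be careful around the glass vase".
    return np.array([1.0, -0.5]), 0.4

def recompute_value(failure, warm_start):
    # Placeholder for the Hamilton-Jacobi reachability solve; warm-starting from
    # the previous value function lets the solver converge in a few sweeps.
    return np.minimum(warm_start, failure)

def on_language_feedback(image, feedback):
    global failure_sdf, value
    center, radius = query_vlm(image, feedback)
    new_sdf = np.sqrt((X - center[0])**2 + (Y - center[1])**2) - radius
    failure_sdf = np.minimum(failure_sdf, new_sdf)        # add the new constraint
    value = recompute_value(failure_sdf, warm_start=value)

on_language_feedback(image=None, feedback="please stay away from the vase")
print((value < 0).sum(), "grid cells are now marked unsafe")
```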
[Paper] [Code] [Project Website]

Enhancing Safety and Robustness of Vision-Based Controllers via Reachability Analysis
Autonomous systems leveraging visual inputs and machine learning for decision-making and control have made significant strides in recent years. Despite their impressive performance, these vision-based controllers can make erroneous predictions when faced with novel or out-of-distribution inputs. Such errors can cascade into catastrophic system failures and compromise system safety. In this project, we compute Neural Reachable Tubes, a parameterized approximation of Backward Reachable Tubes, to stress-test the vision-based controllers and mine their failure modes. The identified failures are then used to enhance the system safety through both offline and online methods. The online approach involves training a classifier as a run-time failure monitor to detect closed-loop, system-level failures, subsequently triggering a fallback controller that robustly handles these detected failures to preserve system safety. For the offline approach, we improve the original controller via incremental training on a carefully augmented failure dataset, resulting in a more robust controller that is resistant to the known failure modes. In both cases, the system is safeguarded against failures that go beyond the vision-based controller itself and affect overall system safety. We validate the proposed approaches on an autonomous aircraft taxiing task that involves using a vision-based controller to guide the aircraft towards the centerline of the runway.
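The sketch below illustrates the online piece under simplifying assumptions: a small classifier trained on states labeled by a mined failure set acts as a run-time monitor and hands control to a fallback controller when it flags a likely failure. The controllers, feature dimensions, and labels are placeholders.

```python
# A minimal sketch of a run-time failure monitor with a fallback controller.
import numpy as np
import torch
import torch.nn as nn

monitor = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(monitor.parameters(), lr=1e-3)

# Stand-in dataset: states labeled 1 if they were found inside the mined
# failure (backward reachable) set, 0 otherwise.
states = torch.randn(256, 4)
labels = (states[:, 0] > 1.0).float().unsqueeze(1)   # assumed failure region

for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.binary_cross_entropy_with_logits(monitor(states), labels)
    loss.backward()
    opt.step()

def vision_controller(state):   # placeholder for the learned vision-based policy
    return np.array([0.5, 0.0])

def fallback_controller(state): # placeholder for the robust safety controller
    return np.array([-0.5, 0.0])

def select_action(state, threshold=0.5):
    p_fail = torch.sigmoid(monitor(torch.tensor(state, dtype=torch.float32))).item()
    return fallback_controller(state) if p_fail > threshold else vision_controller(state)

print(select_action(np.array([2.0, 0.0, 0.0, 0.0])))
```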
[Paper1] [Paper2] [Code] [Project Website]

System-Level Safety Monitoring and Recovery for Perception Failures in Autonomous Vehicles
The safety-critical nature of autonomous vehicle (AV) operation necessitates development of task-relevant algorithms that can reason about safety at the system level and not just at the component level. To reason about the impact of a perception failure on overall system performance, such task-relevant algorithms must contend with various challenges: the complexity of AV stacks, high uncertainty in the operating environments, and the need for real-time performance. To overcome these challenges, in this project, we introduce a Q-network called SPARQ (short for Safety evaluation for Perception And Recovery Q-network) that evaluates the safety of a plan generated by a planning algorithm, accounting for perception failures that the planning process may have overlooked. This Q-network can be queried at runtime to assess whether a proposed plan is safe for execution or poses potential safety risks. If a violation is detected, the network can then recommend a corrective plan while accounting for the perceptual failure. We validate our algorithm using the NuPlan-Vegas dataset, demonstrating its ability to handle cases where a perception failure compromises a proposed plan while the corrective plan remains safe. We observe an overall accuracy and recall of 90% while sustaining a frequency of 42 Hz on the unseen testing dataset. We compare our performance to a popular reachability-based baseline and analyze how our approach improves the safety of an AV pipeline.
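The following sketch shows how such a plan-safety Q-network might be queried at run time, with stand-in feature sizes and an untrained network: score the proposed plan and, if it is flagged as unsafe, return the best-scoring candidate from a set of corrective plans.

```python
# A minimal sketch of run-time plan monitoring and recovery with a Q-network.
# The network architecture and plan features are illustrative stand-ins.
import torch
import torch.nn as nn

STATE_DIM, PLAN_DIM = 8, 20          # assumed feature sizes
q_net = nn.Sequential(nn.Linear(STATE_DIM + PLAN_DIM, 64), nn.ReLU(), nn.Linear(64, 1))

def safety_score(state, plan):
    # Higher score = safer plan under the (possibly faulty) perception output.
    return q_net(torch.cat([state, plan])).item()

def monitor_and_recover(state, proposed_plan, candidate_plans, threshold=0.0):
    if safety_score(state, proposed_plan) >= threshold:
        return proposed_plan                               # proposed plan is deemed safe
    scores = [safety_score(state, p) for p in candidate_plans]
    return candidate_plans[int(torch.tensor(scores).argmax())]  # corrective plan

state = torch.randn(STATE_DIM)
proposed = torch.randn(PLAN_DIM)
candidates = [torch.randn(PLAN_DIM) for _ in range(16)]
print(monitor_and_recover(state, proposed, candidates).shape)
```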
[Paper] [Project Website]

Discovering Closed-Loop Failures of Vision-Based Controllers via Reachability Analysis
One of the key reasons for the success of deep learning in robot control is the ability of neural networks to elegantly process rich visual information and distill it into signals useful for control. Unfortunately, vision-based controllers can also be very brittle when faced with out-of-distribution inputs, potentially leading to catastrophic system failures. In this work, we propose an approach for stress testing these controllers using photorealistic simulators. We formulate closed-loop failure discovery as an optimal control problem, which allows us to perform a targeted and efficient search for visual inputs that might trigger system failures. Our findings reveal intriguing and unexpected situations that could compromise state-of-the-art visual controllers, such as pedestrian markings confusing an autonomous aircraft or light-colored walls misleading indoor navigation robots.
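A minimal sketch of the search idea is shown below, with a stubbed rollout in place of a photorealistic simulator: a cross-entropy-style search over environment parameters looks for the settings that minimize the closed-loop safety margin of the vision-based controller.

```python
# A minimal sketch of closed-loop failure discovery as a search over simulation
# parameters. The rollout function is a stand-in for rendering the environment
# and running the vision-based controller in closed loop.
import numpy as np

rng = np.random.default_rng(0)

def rollout_safety_margin(env_params):
    # Placeholder: render the environment with `env_params` (lighting, textures,
    # markings, ...), run the controller in closed loop, and return the minimum
    # distance to failure over the trajectory.
    return 1.0 - 0.8 * np.exp(-np.linalg.norm(env_params - np.array([0.3, -0.7])))

def discover_failures(n_iters=20, pop=64, elite=8):
    mean, std = np.zeros(2), np.ones(2)
    for _ in range(n_iters):                            # cross-entropy-style search
        samples = rng.normal(mean, std, size=(pop, 2))
        margins = np.array([rollout_safety_margin(s) for s in samples])
        elites = samples[np.argsort(margins)[:elite]]   # smallest margins first
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return mean, rollout_safety_margin(mean)

params, margin = discover_failures()
print("most failure-prone parameters:", params, "margin:", margin)
```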
[Paper] [Project Website] [Video]

Online Update of Safety Assurances for a Reliable Human-Robot Interaction
Human-robot interactions are an unavoidable aspect of many modern robotic applications, and ensuring safety for these interactions is critical. In such scenarios, the robot often leverages a human motion predictor to predict the human's future states and plan safe and efficient trajectories; these predictors are often high-capacity neural networks that can leverage scene semantics to predict complex human behaviors. However, no model is ever perfect -- when the observed human behavior deviates from the model predictions, the robot might plan unsafe maneuvers. In this work, we propose a Hamilton-Jacobi (HJ) reachability-based approach that maintains a confidence in the human model and automatically adjusts the safety assurances online as this confidence changes based on the observed human behavior. Overall, our approach allows us to ensure a safe human-robot interaction even when the human model is incorrect. We demonstrate our approach in several safety-critical autonomous driving scenarios involving a state-of-the-art deep learning-based prediction model.
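The sketch below illustrates the confidence-update idea under simple assumptions: a two-hypothesis Bayesian update on whether the human model is accurate, with lower confidence mapped to a larger safety margin. The likelihoods and the confidence-to-margin mapping are illustrative; the full approach instead adjusts the HJ reachability computation.

```python
# A minimal sketch of maintaining confidence in a human motion predictor and
# growing the safety margin when observed behavior disagrees with the model.
import numpy as np

belief = np.array([0.5, 0.5])   # P(model is accurate), P(model is inaccurate)

def update_confidence(belief, predicted_action, observed_action, sigma=0.3):
    # Likelihood of the observed human action under the model vs. under a
    # diffuse "model is wrong" hypothesis.
    err = np.linalg.norm(observed_action - predicted_action)
    lik_accurate = np.exp(-0.5 * (err / sigma) ** 2)
    lik_inaccurate = 0.1                               # assumed diffuse likelihood
    post = belief * np.array([lik_accurate, lik_inaccurate])
    return post / post.sum()

def safety_margin(belief, nominal=0.5, max_extra=1.5):
    # Less confidence in the model -> larger margin (more conservative safe set).
    return nominal + max_extra * belief[1]

for predicted, observed in [([1.0, 0.0], [0.9, 0.1]), ([1.0, 0.0], [-0.8, 0.7])]:
    belief = update_confidence(belief, np.array(predicted), np.array(observed))
    print("P(model accurate) = %.2f, margin = %.2f m" % (belief[0], safety_margin(belief)))
```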
[Paper]

An Efficient Reachability-Based Framework for Provably Safe Autonomous Navigation in Unknown Environments
Real-world autonomous vehicles often operate in a priori unknown environments. Since most of these systems are safety-critical, it is important to ensure they operate safely in the face of environment uncertainty, such as unseen obstacles. Current safety analysis tools do not scale well to scenarios where the environment is being sensed in real time, such as during navigation tasks. In this work, we propose a novel, real-time safety analysis method based on Hamilton-Jacobi reachability that provides strong safety guarantees despite environment uncertainty. Our safety method is planner-agnostic and provides guarantees for a variety of mapping sensors. We demonstrate our approach in simulation and in hardware to provide safety guarantees around a state-of-the-art vision-based, learning-based planner.
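As a simplified illustration, the planner-agnostic, least-restrictive filtering pattern looks like the following, with a signed-distance function standing in for the HJ reachability value computed from the current sensed map: any planner's command passes through unchanged whenever the safety value is comfortably positive.

```python
# A minimal sketch of a least-restrictive safety filter around an arbitrary planner.
import numpy as np

OBSTACLE, RADIUS, MARGIN = np.array([2.0, 0.0]), 0.8, 0.2   # assumed sensed obstacle

def safety_value(x):
    # Placeholder for V(x) from the online HJ reachability computation.
    return np.linalg.norm(x - OBSTACLE) - RADIUS

def safe_control(x):
    # Optimal avoidance control: move along the gradient of the value function.
    return (x - OBSTACLE) / np.linalg.norm(x - OBSTACLE)

def filtered_control(x, planner_control):
    if safety_value(x) > MARGIN:
        return planner_control        # planner-agnostic: any planner works here
    return safe_control(x)            # override only near the unsafe set boundary

x = np.array([1.2, 0.1])
print(filtered_control(x, planner_control=np.array([1.0, 0.0])))
```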
[Paper] [Project Website] [Video]

Combining Optimal Control and Learning for Visual Navigation in Novel Environments
Real-world autonomous vehicles often need to navigate in a priori unknown environments. In this work, we propose a modular approach that enables autonomous navigation in unknown environments using only the monocular RGB images from an onboard camera. Our approach uses machine learning for high-level planning based on perceptual information; this high-level plan is then used for low-level planning and control by leveraging classical, model-based control-theoretic approaches. This modular policy leads to significantly better performance in new, unseen buildings compared to purely learning-based approaches. In addition, we demonstrate that a navigation policy encapsulated in this fashion can be transferred from simulation to reality directly with no retraining or finetuning, demonstrating the robust generalization capabilities of the learned policy. Finally, a side advantage of our integrated policy is its significant data efficiency compared to purely learning-based approaches (a 10x improvement in sample complexity!).
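Below is a minimal sketch of the modular structure, with a stand-in network for the learned waypoint predictor and a simple feedback law in place of the spline-based planner and feedback controller used in the project.

```python
# A minimal sketch of the modular pipeline: learned waypoint prediction from the
# onboard image, followed by model-based low-level control.
import numpy as np
import torch
import torch.nn as nn

waypoint_net = nn.Sequential(nn.Linear(64 * 64 + 2, 128), nn.ReLU(), nn.Linear(128, 3))

def predict_waypoint(image, goal):
    # Learned high-level planning: (image, relative goal) -> waypoint (x, y, theta).
    feats = torch.cat([torch.tensor(image, dtype=torch.float32).flatten(),
                       torch.tensor(goal, dtype=torch.float32)])
    return waypoint_net(feats).detach().numpy()

def track_waypoint(state, waypoint, k_v=0.5, k_w=1.0):
    # Model-based low-level control: a simple feedback law toward the waypoint
    # (stand-in for the spline planner + feedback controller).
    dx, dy = waypoint[0] - state[0], waypoint[1] - state[1]
    v = k_v * np.hypot(dx, dy)
    w = k_w * (np.arctan2(dy, dx) - state[2])
    return np.array([v, w])

image = np.zeros((64, 64))                       # placeholder RGB-derived input
state, goal = np.array([0.0, 0.0, 0.0]), np.array([5.0, 2.0])
wp = predict_waypoint(image, goal)
print("waypoint:", wp, "command:", track_waypoint(state, wp))
```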
[Paper] [Project Website] [Video]

Visual Navigation Among Humans With Optimal Control as a Supervisor
Autonomous robots often navigate in unfamiliar, dynamic environments, where they need to share the space with humans. Navigating around humans is especially difficult because it requires predicting their future motion, which can be quite challenging. We propose a novel framework for navigation around humans which combines learning-based perception with model-based optimal control. The proposed framework learns to anticipate and react to people's motion based only on a monocular RGB image, without explicitly predicting the future human motion. Our method generalizes well to unseen buildings and humans in both simulation and real-world environments. Furthermore, our experiments demonstrate that combining model-based control and learning leads to better and more data-efficient navigational behaviors as compared to a purely learning-based approach.
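The sketch below illustrates the data-generation idea behind "optimal control as a supervisor" under simplifying assumptions: in simulation, a privileged expert that knows the human's state picks waypoints that keep clearance from the human, producing (image, waypoint) pairs that supervise the perception module, which at test time sees only the monocular RGB image. The renderer and expert here are placeholders.

```python
# A minimal sketch of generating supervision with a privileged model-based expert.
import numpy as np

def render_image(robot_state, human_state):
    # Placeholder for the photorealistic renderer used to generate training images.
    return np.zeros((64, 64, 3))

def expert_waypoint(robot_state, goal, human_state, clearance=1.0):
    # Privileged expert: steer toward the goal, but push the waypoint away from
    # the human if the straight-line waypoint would pass too close.
    wp = robot_state[:2] + 0.5 * (goal - robot_state[:2])
    offset = wp - human_state[:2]
    dist = np.linalg.norm(offset)
    if dist < clearance:
        wp = human_state[:2] + offset / max(dist, 1e-6) * clearance
    return wp

dataset = []
rng = np.random.default_rng(0)
for _ in range(100):
    robot = np.array([0.0, 0.0, 0.0])
    goal = rng.uniform(3.0, 6.0, size=2)
    human = np.concatenate([rng.uniform(1.0, 4.0, size=2), rng.uniform(-1, 1, size=2)])
    image = render_image(robot, human)
    dataset.append((image, expert_waypoint(robot, goal, human)))

print(len(dataset), "supervised (image, waypoint) examples")
```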
[Paper] [Project Website] [Code]