Robotics
New submissions
New submissions for Thu, 28 Mar 24
- [1] arXiv:2403.18021 [pdf, other]
Title: A Study on the Use of Simulation in Synthesizing Path-Following Control Policies for Autonomous Ground Robots
Authors: Harry Zhang, Stefan Caldararu, Aaron Young, Alexis Ruiz, Huzaifa Unjhawala, Ishaan Mahajan, Sriram Ashokkumar, Nevindu Batagoda, Zhenhao Zhou, Luning Bakke, Dan Negrut
Comments: 8 pages, 7 figures
Subjects: Robotics (cs.RO)
We report results obtained and insights gained while answering the following question: how effective is it to use a simulator to establish path following control policies for an autonomous ground robot? While the quality of the simulator conditions the answer to this question, we found that for the simulation platform used herein, producing four control policies for path planning was straightforward once a digital twin of the controlled robot was available. The control policies established in simulation and subsequently demonstrated in the real world are PID control, MPC, and two neural network (NN) based controllers. Training the two NN controllers via imitation learning was accomplished expeditiously using seven simple maneuvers: follow three circles clockwise, follow the same circles counter-clockwise, and drive straight. A test randomization process that employs random micro-simulations is used to rank the "goodness" of the four control policies. The policy ranking noted in simulation correlates well with the ranking observed when the control policies were tested in the real world. The simulation platform used is publicly available and BSD3-released as open source; a public Docker image is available for reproducibility studies. It contains a dynamics engine, a sensor simulator, a ROS2 bridge, and a ROS2 autonomy stack, the latter employed both in the simulator and the real-world experiments.
- [2] arXiv:2403.18062 [pdf, other]
Title: ShapeGrasp: Zero-Shot Task-Oriented Grasping with Large Language Models through Geometric Decomposition
Authors: Samuel Li, Sarthak Bhagat, Joseph Campbell, Yaqi Xie, Woojun Kim, Katia Sycara, Simon Stepputtis
Comments: 8 pages
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Task-oriented grasping of unfamiliar objects is a necessary skill for robots in dynamic in-home environments. Inspired by the human capability to grasp such objects through intuition about their shape and structure, we present a novel zero-shot task-oriented grasping method leveraging a geometric decomposition of the target object into simple, convex shapes that we represent in a graph structure, including geometric attributes and spatial relationships. Our approach employs minimal essential information - the object's name and the intended task - to facilitate zero-shot task-oriented grasping. We utilize the commonsense reasoning capabilities of large language models to dynamically assign semantic meaning to each decomposed part and subsequently reason over the utility of each part for the intended task. Through extensive experiments on a real-world robotics platform, we demonstrate that our grasping approach's decomposition and reasoning pipeline is capable of selecting the correct part in 92% of the cases and successfully grasping the object in 82% of the tasks we evaluate. Additional videos, experiments, code, and data are available on our project website: https://shapegrasp.github.io/.
- [3] arXiv:2403.18096 [pdf, other]
Title: Efficient Multi-Band Temporal Video Filter for Reducing Human-Robot Interaction
Authors: Lawrence O'Gorman
Comments: 15 pages, 5 figures, 4 tables
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Although mobile robots have on-board sensors to perform navigation, their efficiency in completing paths can be enhanced by planning to avoid human interaction. Infrastructure cameras can capture human activity continuously for the purpose of compiling activity analytics to choose efficient times and routes. We describe a cascade temporal filtering method to efficiently extract short- and long-term activity in two time dimensions, isochronal and chronological, for use in global path planning and local navigation respectively. The temporal filter has application either independently, or, if object recognition is also required, it can be used as a pre-filter to perform activity-gating of the more computationally expensive neural network processing. For a testbed 32-camera network, we show how this hybrid approach can achieve over 8 times improvement in frames per second throughput and 6.5 times reduction of system power use. We also show how the cost map of static objects in the ROS robot software development framework is augmented with dynamic regions determined from the temporal filter.
- [4] arXiv:2403.18149 [pdf, other]
Title: Code Generation for Conic Model-Predictive Control on Microcontrollers with TinyMPC
Comments: Submitted to CDC, 2024. First two authors contributed equally
Subjects: Robotics (cs.RO); Systems and Control (eess.SY); Optimization and Control (math.OC)
Conic constraints appear in many important control applications like legged locomotion, robotic manipulation, and autonomous rocket landing. However, current solvers for conic optimization problems have relatively heavy computational demands in terms of both floating-point operations and memory footprint, making them impractical for use on small embedded devices. We extend TinyMPC, an open-source, high-speed solver targeting low-power embedded control applications, to handle second-order cone constraints. We also present code-generation software to enable deployment of TinyMPC on a variety of microcontrollers. We benchmark our generated code against state-of-the-art embedded QP and SOCP solvers, demonstrating a two-order-of-magnitude speed increase over ECOS while consuming less memory. Finally, we demonstrate TinyMPC's efficacy on the Crazyflie, a lightweight, resource-constrained quadrotor with fast dynamics. TinyMPC and its code-generation tools are publicly available at https://tinympc.org.
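Note: the abstract does not say how the second-order cone constraints are enforced inside the solver. As a hedged illustration only, the sketch below shows Euclidean projection onto a second-order cone, a standard building block in first-order conic solvers. The function name `project_soc` and the example numbers are assumptions made for illustration and are not taken from TinyMPC.

```python
import math


def project_soc(v, t):
    """Project the point (v, t) onto the cone {(x, s) : ||x||_2 <= s}.

    Hypothetical helper for illustration; not part of any TinyMPC API.
    """
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_v <= t:
        # The point already satisfies the cone constraint.
        return list(v), t
    if norm_v <= -t:
        # The point lies in the polar cone, so the nearest cone point is the apex.
        return [0.0] * len(v), 0.0
    # Otherwise the nearest point lies on the cone boundary.
    scale = (norm_v + t) / (2.0 * norm_v)
    return [scale * x for x in v], (norm_v + t) / 2.0


# Example: a point that violates ||x|| <= s is pulled back to the boundary.
x, s = project_soc([3.0, 4.0], 1.0)
print(x, s)
```

After projection the returned pair satisfies the cone constraint with equality whenever the input violated it, which is what makes the operation cheap enough for small embedded targets.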
- [5] arXiv:2403.18172 [pdf, other]
Title: Vision-Based Force Estimation for Minimally Invasive Telesurgery Through Contact Detection and Local Stiffness Models
Comments: Preprint of an article accepted in Journal of Medical Robotics Research © 2024 World Scientific Publishing Company
Subjects: Robotics (cs.RO)
In minimally invasive telesurgery, obtaining accurate force information is difficult due to the complexities of in-vivo end effector force sensing. This constrains development and implementation of haptic feedback and force-based automated performance metrics, respectively. Vision-based force sensing approaches using deep learning are a promising alternative to intrinsic end effector force sensing. However, they have limited ability to generalize to novel scenarios, and require learning on high-quality force sensor training data that can be difficult to obtain. To address these challenges, this paper presents a novel vision-based contact-conditional approach for force estimation in telesurgical environments. Our method leverages supervised learning with human labels and end effector position data to train deep neural networks. Predictions from these trained models are optionally combined with robot joint torque information to estimate forces indirectly from visual data. We benchmark our method against ground truth force sensor data and demonstrate generality by fine-tuning to novel surgical scenarios in a data-efficient manner. Our methods demonstrated greater than 90% accuracy on contact detection and less than 10% force prediction error. These results suggest potential usefulness of contact-conditional force estimation for sensory substitution haptic feedback and tissue handling skill evaluation in clinical settings.
- [6] arXiv:2403.18178 [pdf, other]
Title: Online Embedding Multi-Scale CLIP Features into 3D Maps
Comments: 8 pages, 7 figures
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
This study introduces a novel approach to online embedding of multi-scale CLIP (Contrastive Language-Image Pre-Training) features into 3D maps. By harnessing CLIP, this methodology surpasses the constraints of conventional vocabulary-limited methods and enables the incorporation of semantic information into the resultant maps. While recent approaches have explored the embedding of multi-modal features in maps, they often impose significant computational costs, lacking practicality for exploring unfamiliar environments in real time. Our approach tackles these challenges by efficiently computing and embedding multi-scale CLIP features, thereby facilitating the exploration of unfamiliar environments through real-time map generation. Moreover, embedding CLIP features into the resultant maps makes offline retrieval via linguistic queries feasible. In essence, our approach simultaneously achieves real-time object search and mapping of unfamiliar environments. Additionally, we propose a zero-shot object-goal navigation system based on our mapping approach, and we validate its efficacy through object-goal navigation, offline object retrieval, and multi-object-goal navigation in both simulated environments and real robot experiments. The findings demonstrate that our method not only exhibits swifter performance than state-of-the-art mapping methods but also surpasses them in terms of the success rate of object-goal navigation tasks.
- [7] arXiv:2403.18195 [pdf, other]
Title: SCANet: Correcting LEGO Assembly Errors with Self-Correct Assembly Network
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Autonomous assembly in robotics and 3D vision presents significant challenges, particularly in ensuring assembly correctness. Presently, predominant methods such as MEPNet focus on assembling components based on manually provided images. However, these approaches often fall short in achieving satisfactory results for tasks requiring long-term planning. Concurrently, we observe that integrating a self-correction module can partially alleviate such issues. Motivated by this concern, we introduce the single-step assembly error correction task, which involves identifying and rectifying misassembled components. To support research in this area, we present the LEGO Error Correction Assembly Dataset (LEGO-ECA), comprising manual images for assembly steps and instances of assembly failures. Additionally, we propose the Self-Correct Assembly Network (SCANet), a novel method to address this task. SCANet treats assembled components as queries, determining their correctness in manual images and providing corrections when necessary. Finally, we utilize SCANet to correct the assembly results of MEPNet. Experimental results demonstrate that SCANet can identify and correct MEPNet's misassembled results, significantly improving the correctness of assembly. Our code and dataset are available at https://github.com/Yaser-wyx/SCANet.
- [8] arXiv:2403.18197 [pdf, other]
Title: LocoMan: Advancing Versatile Quadrupedal Dexterity with Lightweight Loco-Manipulators
Authors: Changyi Lin, Xingyu Liu, Yuxiang Yang, Yaru Niu, Wenhao Yu, Tingnan Zhang, Jie Tan, Byron Boots, Ding Zhao
Comments: Project page: this https URL
Subjects: Robotics (cs.RO)
Quadrupedal robots have emerged as versatile agents capable of locomoting and manipulating in complex environments. Traditional designs typically rely on the robot's inherent body parts or incorporate top-mounted arms for manipulation tasks. However, these configurations may limit the robot's operational dexterity, efficiency and adaptability, particularly in cluttered or constrained spaces. In this work, we present LocoMan, a dexterous quadrupedal robot with a novel morphology to perform versatile manipulation in diverse constrained environments. By equipping a Unitree Go1 robot with two low-cost and lightweight modular 3-DoF loco-manipulators on its front calves, LocoMan leverages the combined mobility and functionality of the legs and grippers for complex manipulation tasks that require precise 6D positioning of the end effector in a wide workspace. To harness the loco-manipulation capabilities of LocoMan, we introduce a unified control framework that extends the whole-body controller (WBC) to integrate the dynamics of loco-manipulators. Through experiments, we validate that the proposed whole-body controller can accurately and stably follow desired 6D trajectories of the end effector and torso, which, when combined with the large workspace from our design, facilitates a diverse set of challenging dexterous loco-manipulation tasks in confined spaces, such as opening doors, plugging into sockets, picking objects in narrow and low-lying spaces, and bimanual manipulation.
- [9] arXiv:2403.18206 [pdf, other]
Title: Sailing Through Point Clouds: Safe Navigation Using Point Cloud Based Control Barrier Functions
Subjects: Robotics (cs.RO)
The capability to navigate safely in an unstructured environment is crucial when deploying robotic systems in real-world scenarios. Recently, control barrier function (CBF) based approaches have been highly effective in synthesizing safety-critical controllers. In this work, we propose a novel CBF-based local planner comprised of two components: Vessel and Mariner. The Vessel is a novel scaling factor based CBF formulation that synthesizes CBFs using only point cloud data. The Mariner is a CBF-based preview control framework that is used to mitigate getting stuck in spurious equilibria during navigation. To demonstrate the efficacy of our proposed approach, we first compare the proposed point cloud based CBF formulation with other point cloud based CBF formulations. Then, we demonstrate the performance of our proposed approach and its integration with global planners using experimental studies on the Unitree B1 and Unitree Go2 quadruped robots in various environments.
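Note: for readers unfamiliar with CBF-based safety filters, the sketch below shows a textbook distance-based CBF for a single-integrator robot and a single obstacle point. It is not the scaling-factor formulation ("Vessel") described in the abstract. The barrier function `h`, the gain `alpha`, and the function name `cbf_filter` are assumptions chosen for illustration.

```python
def cbf_filter(x, u_nom, obstacle, radius, alpha=1.0):
    """Minimally modify u_nom so that h(x) = ||x - obstacle||^2 - radius^2 stays non-negative.

    Assumes single-integrator dynamics x_dot = u (an illustrative assumption).
    Solves min ||u - u_nom||^2 subject to grad_h . u + alpha * h >= 0 in closed form.
    """
    diff = [xi - pi for xi, pi in zip(x, obstacle)]
    h = sum(d * d for d in diff) - radius ** 2
    grad_h = [2.0 * d for d in diff]
    # Value of the CBF condition under the nominal command.
    slack = sum(g * u for g, u in zip(grad_h, u_nom)) + alpha * h
    if slack >= 0.0:
        # The nominal command already satisfies the safety condition.
        return list(u_nom)
    grad_sq = sum(g * g for g in grad_h)
    if grad_sq == 0.0:
        raise ValueError("robot position coincides with the obstacle point")
    # Project u_nom onto the boundary of the half-space defined by the condition.
    return [u - slack / grad_sq * g for u, g in zip(u_nom, grad_h)]


# Example: driving straight at an obstacle gets slowed down.
print(cbf_filter([0.0, 0.0], [1.0, 0.0], [2.0, 0.0], 1.0))
```

With a point cloud, one such half-space constraint would arise per point (or per selected point), and the closed form above would be replaced by a small quadratic program.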
- [10] arXiv:2403.18212 [pdf, other]
Title: Preference-Based Planning in Stochastic Environments: From Partially-Ordered Temporal Goals to Most Preferred Policies
Comments: arXiv admin note: substantial text overlap with arXiv:2209.12267
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Formal Languages and Automata Theory (cs.FL); Logic in Computer Science (cs.LO)
Human preferences are not always represented via complete linear orders: It is natural to employ partially-ordered preferences for expressing incomparable outcomes. In this work, we consider decision-making and probabilistic planning in stochastic systems modeled as Markov decision processes (MDPs), given a partially ordered preference over a set of temporally extended goals. Specifically, each temporally extended goal is expressed using a formula in Linear Temporal Logic on Finite Traces (LTL$_f$). To plan with the partially ordered preference, we introduce order theory to map a preference over temporal goals to a preference over policies for the MDP. Accordingly, a most preferred policy under a stochastic ordering induces a stochastic nondominated probability distribution over the finite paths in the MDP. To synthesize a most preferred policy, our technical approach includes two key steps. In the first step, we develop a procedure to transform a partially ordered preference over temporal goals into a computational model, called preference automaton, which is a semi-automaton with a partial order over acceptance conditions. In the second step, we prove that finding a most preferred policy is equivalent to computing a Pareto-optimal policy in a multi-objective MDP that is constructed from the original MDP, the preference automaton, and the chosen stochastic ordering relation. Throughout the paper, we employ running examples to illustrate the proposed preference specification and solution approaches. We demonstrate the efficacy of our algorithm using these examples, providing detailed analysis, and then discuss several potential future directions.
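Note: the abstract reduces the search for a most preferred policy to computing a Pareto-optimal policy in a multi-objective MDP. The sketch below only illustrates the underlying notion of Pareto dominance on a hand-made set of objective vectors; it does not construct the preference automaton or solve an MDP. The numbers and function names are hypothetical.

```python
def dominates(a, b):
    """Return True if vector a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the vectors that no other vector dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical objective vectors, one per candidate policy
# (for example, probabilities of satisfying two different temporal goals).
candidates = [(0.9, 0.1), (0.5, 0.5), (0.4, 0.4), (0.1, 0.9)]
print(pareto_front(candidates))
```

Because the preference is only a partial order, several incomparable vectors can survive the filter, which mirrors why the abstract speaks of "a" most preferred policy rather than a unique one.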
- [11] arXiv:2403.18222 [pdf, other]
Title: Uncertainty-Aware Deployment of Pre-trained Language-Conditioned Imitation Learning Policies
Comments: 8 pages, 7 figures
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
Large-scale robotic policies trained on data from diverse tasks and robotic platforms hold great promise for enabling general-purpose robots; however, reliable generalization to new environment conditions remains a major challenge. Toward addressing this challenge, we propose a novel approach for uncertainty-aware deployment of pre-trained language-conditioned imitation learning agents. Specifically, we use temperature scaling to calibrate these models and exploit the calibrated model to make uncertainty-aware decisions by aggregating the local information of candidate actions. We implement our approach in simulation using three such pre-trained models, and showcase its potential to significantly enhance task completion rates. The accompanying code is accessible at the link: https://github.com/BobWu1998/uncertainty_quant_all.git
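Note: temperature scaling, which the abstract names as its calibration step, divides a model's logits by a scalar T before the softmax and picks T to minimize negative log-likelihood on held-out data. The sketch below is a minimal version under stated assumptions: the grid search over candidate temperatures and the toy logits are illustrative choices, not the authors' procedure.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mean_nll(logit_rows, labels, temperature):
    """Average negative log-likelihood of the true labels at a given temperature."""
    losses = [-math.log(softmax(row, temperature)[y]) for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)


def fit_temperature(logit_rows, labels, candidates):
    """Pick the candidate temperature with the lowest held-out NLL (simple grid search)."""
    return min(candidates, key=lambda t: mean_nll(logit_rows, labels, t))


# Toy held-out set: the model is very confident but right only 3 times out of 4.
rows = [[5.0, 0.0]] * 4
labels = [0, 0, 0, 1]
print(fit_temperature(rows, labels, [0.5, 1.0, 2.0, 4.0, 8.0]))
```

A fitted temperature above 1 softens overconfident predictions without changing which action has the highest score, so calibration leaves the argmax policy unchanged.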
- [12] arXiv:2403.18236 [pdf, other]
Title: Multi-AGV Path Planning Method via Reinforcement Learning and Particle Filters
Authors: Shao Shuo
Subjects: Robotics (cs.RO)
The Reinforcement Learning (RL) algorithm, renowned for its robust learning capability and search stability, has garnered significant attention and found extensive application in Automated Guided Vehicle (AGV) path planning. However, RL planning algorithms encounter challenges stemming from the substantial variance of neural networks caused by environmental instability and significant fluctuations in system structure. These challenges manifest in slow convergence speed and low learning efficiency. To tackle this issue, this paper presents the Particle Filter-Double Deep Q-Network (PF-DDQN) approach, which incorporates the Particle Filter (PF) into multi-AGV reinforcement learning path planning. The PF-DDQN method leverages the imprecise weight values of the network as state values to formulate the state space equation. Through the iterative fusion process of neural networks and particle filters, the DDQN model is optimized to acquire the optimal true weight values, thus enhancing the algorithm's efficiency. The proposed method's effectiveness and superiority are validated through numerical simulations. Overall, the simulation results demonstrate that the proposed algorithm surpasses the traditional DDQN algorithm in terms of path planning superiority and training time indicators by 92.62% and 76.88%, respectively. In conclusion, the PF-DDQN method addresses the challenges encountered by RL planning algorithms in AGV path planning. By integrating the Particle Filter and optimizing the DDQN model, the proposed method achieves enhanced efficiency and outperforms the traditional DDQN algorithm in terms of path planning superiority and training time indicators.
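Note: the baseline named in the abstract, the Double Deep Q-Network (DDQN), differs from plain DQN in how the bootstrap target is formed: the online network selects the next action and the target network evaluates it. The sketch below shows only that target computation on plain lists. The particle-filter fusion step of PF-DDQN is not specified in the abstract and is not reproduced here; all names and numbers are illustrative.

```python
def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Compute the Double DQN bootstrap target for one transition.

    q_online_next and q_target_next are the action values of the next state
    under the online and target networks (hypothetical inputs for illustration).
    """
    if done:
        # No bootstrapping past a terminal state.
        return reward
    # Action selection uses the online network...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ...while action evaluation uses the target network.
    return reward + gamma * q_target_next[best_action]


print(double_dqn_target(1.0, False, [0.2, 0.9, 0.1], [0.5, 0.3, 0.8], gamma=0.9))
```

Decoupling selection from evaluation reduces the overestimation bias that arises when a single network both picks and scores the maximizing action.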
- [13] arXiv:2403.18256 [pdf, other]
Title: Manipulating Neural Path Planners via Slight Perturbations
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Data-driven neural path planners are attracting increasing interest in the robotics community. However, their neural network components typically come as black boxes, obscuring their underlying decision-making processes. Their black-box nature exposes them to the risk of being compromised via the insertion of hidden malicious behaviors. For example, an attacker may hide behaviors that, when triggered, hijack a delivery robot by guiding it to a specific (albeit wrong) destination, trapping it in a predefined region, or inducing unnecessary energy expenditure by causing the robot to repeatedly circle a region. In this paper, we propose a novel approach to specify and inject a range of hidden malicious behaviors, known as backdoors, into neural path planners. Our approach provides a concise but flexible way to define these behaviors, and we show that hidden behaviors can be triggered by slight perturbations (e.g., inserting a tiny unnoticeable object), that can nonetheless significantly compromise their integrity. We also discuss potential techniques to identify these backdoors aimed at alleviating such risks. We demonstrate our approach on both sampling-based and search-based neural path planners.
- [14] arXiv:2403.18259 [pdf, other]
Title: RoboKeyGen: Robot Pose and Joint Angles Estimation via Diffusion-based 3D Keypoint Generation
Comments: Accepted by ICRA 2024
Subjects: Robotics (cs.RO)
Estimating robot pose and joint angles is significant in advanced robotics, enabling applications like robot collaboration and online hand-eye calibration. However, the introduction of unknown joint angles makes prediction more complex than simple robot pose estimation, due to its higher dimensionality. Previous methods either regress 3D keypoints directly or utilise a render&compare strategy. These approaches often falter in terms of performance or efficiency and grapple with the cross-camera gap problem. This paper presents a novel framework that bifurcates the high-dimensional prediction task into two manageable subtasks: 2D keypoints detection and lifting 2D keypoints to 3D. This separation promises enhanced performance without sacrificing the efficiency innate to keypoint-based techniques. A vital component of our method is the lifting of 2D keypoints to 3D keypoints. Common deterministic regression methods may falter when faced with uncertainties from 2D detection errors or self-occlusions. Leveraging the robust modeling potential of diffusion models, we reframe this issue as a conditional 3D keypoints generation task. To bolster cross-camera adaptability, we introduce the Normalised Camera Coordinate Space (NCCS), ensuring alignment of estimated 2D keypoints across varying camera intrinsics. Experimental results demonstrate that the proposed method outperforms the state-of-the-art render&compare method and achieves higher inference speed. Furthermore, the tests accentuate our method's robust cross-camera generalisation capabilities. We intend to release both the dataset and code in https://nimolty.github.io/Robokeygen/
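Note: the abstract does not define the Normalised Camera Coordinate Space. The sketch below shows the standard pinhole normalization that removes the focal length and principal point from pixel coordinates, which is one plausible reading of "alignment across varying camera intrinsics". It is an assumption and may differ from the paper's actual definition; the intrinsics values are made up.

```python
def normalise_keypoints(pixels, fx, fy, cx, cy):
    """Map pixel keypoints (u, v) to intrinsics-free coordinates ((u - cx) / fx, (v - cy) / fy).

    Standard pinhole-camera normalization; assumed here, not taken from the paper.
    """
    return [((u - cx) / fx, (v - cy) / fy) for u, v in pixels]


# The same viewing ray seen by two cameras with different (hypothetical) intrinsics.
cam_a = normalise_keypoints([(380.0, 120.0)], fx=600.0, fy=600.0, cx=320.0, cy=240.0)
cam_b = normalise_keypoints([(730.0, 180.0)], fx=900.0, fy=900.0, cx=640.0, cy=360.0)
print(cam_a, cam_b)
```

Both cameras yield the same normalized keypoint, so a downstream lifting model trained on one camera sees inputs in the same coordinate convention when deployed on another.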
- [15] arXiv:2403.18358 [pdf, ps, other]
Title: Imaging radar and LiDAR image translation for 3-DOF extrinsic calibration
Subjects: Robotics (cs.RO)
The integration of sensor data is crucial in the field of robotics to take full advantage of the various sensors employed. One critical aspect of this integration is determining the extrinsic calibration parameters, such as the relative transformation, between each sensor. The use of data fusion between complementary sensors, such as radar and LiDAR, can provide significant benefits, particularly in harsh environments where accurate depth data is required. However, noise included in radar sensor data can make the estimation of extrinsic calibration challenging. To address this issue, we present a novel framework for the extrinsic calibration of radar and LiDAR sensors, utilizing CycleGAN as a method of image-to-image translation. Our proposed method translates radar bird's-eye-view images into LiDAR-style images to estimate the 3-DOF extrinsic parameters. Image registration techniques, as well as deskewing based on sensor odometry and B-spline interpolation, are employed to address the rolling shutter effect commonly present in spinning sensors. Our method demonstrates a notable improvement in extrinsic calibration compared to filter-based methods using the MulRan dataset.
- [16] arXiv:2403.18376 [pdf, other]
Title: Extensible Hook System for Rendesvouz and Docking of a Cubesat Swarm
Subjects: Robotics (cs.RO)
The use of cubesat swarms is being proposed for different missions where cooperation between satellites is required. Commonly, the cubesat swarm requires formation flight and even rendezvous and docking, which are very challenging tasks since they require more energy and the use of advanced guidance, navigation and control techniques. In this paper, we propose the use of an extensible hook system to mitigate these drawbacks, i.e., it saves fuel and reduces system complexity by including techniques that have been previously demonstrated on Earth. This system is based on a scissor boom structure, which could reach up to five meters for a 4U dimension, including three degrees of freedom to place the end effector at any pose within the system workspace. We simulated the dynamic behaviour of a cubesat with the proposed system, demonstrating that the required power for a 16U cubesat equipped with one extensible hook system is acceptable given current state-of-the-art actuators.
- [17] arXiv:2403.18413 [pdf, ps, other]
Title: HyRRT-Connect: A Bidirectional Rapidly-Exploring Random Trees Motion Planning Algorithm for Hybrid Systems
Comments: Accepted by the 8th IFAC International Conference on Analysis and Design of Hybrid Systems (ADHS 2024)
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
This paper proposes a bidirectional rapidly-exploring random trees (RRT) algorithm to solve the motion planning problem for hybrid systems. The proposed algorithm, called HyRRT-Connect, propagates in both forward and backward directions in hybrid time until an overlap between the forward and backward propagation results is detected. Then, HyRRT-Connect constructs a motion plan through the reversal and concatenation of functions defined on hybrid time domains, ensuring the motion plan thoroughly satisfies the given hybrid dynamics. To address the potential discontinuity along the flow caused by tolerating some distance between the forward and backward partial motion plans, we reconstruct the backward partial motion plan by a forward-in-hybrid-time simulation from the final state of the forward partial motion plan. By applying the reversed input of the backward partial motion plan, the reconstruction process effectively eliminates the discontinuity and ensures that as the tolerance distance decreases to zero, the distance between the endpoint of the reconstructed motion plan and the final state set approaches zero. The proposed algorithm is applied to an actuated bouncing ball example and a walking robot example so as to highlight its generality and computational improvement.
- [18] arXiv:2403.18456 [pdf, other]
Title: Inverse kinematics learning of a continuum manipulator using limited real time data
Subjects: Robotics (cs.RO)
Data-driven control of a continuum manipulator requires a lot of training data, but generating a sufficient amount of real-time data is not cost-efficient. Random actuation of the manipulator can also be unsafe. Meta-learning has been used successfully to adapt to new environments; hence, this paper uses meta-learning to address the above-mentioned problem. We consider two cases. First, this paper proposes a method that uses simulation data to train the model with MAML (Model-Agnostic Meta-Learning) and then adapts to the real world using gradient steps. Second, if the simulation model is not available or is difficult to formulate, we propose a CGAN (Conditional Generative Adversarial Network)-MAML based method. The model is trained using a small amount of real-time data and augmented data for different loading conditions, and adaptation is then done in the real environment. Experiments show that the relative positioning error in both cases is below 3%. The proposed models are experimentally verified on a real continuum manipulator.
- [19] arXiv:2403.18459 [pdf, other]
Title: CoBOS: Constraint-Based Online Scheduler for Human-Robot Collaboration
Comments: 7 pages, 8 figures
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Assembly processes involving humans and robots are challenging scenarios because the individual activities and access to shared workspace have to be coordinated. Fixed robot programs leave no room to diverge from a fixed protocol. Working on such a process can be stressful for the user and lead to ineffective behavior or failure. We propose a novel approach of online constraint-based scheduling in a reactive execution control framework facilitating behavior trees called CoBOS. This allows the robot to adapt to uncertain events such as delayed activity completions and activity selection (by the human). The user will experience less stress as the robotic coworkers adapt their behavior to best complement the human-selected activities to complete the common task. In addition to the improved working conditions, our algorithm leads to increased efficiency, even in highly uncertain scenarios. We evaluate our algorithm using a probabilistic simulation study with 56000 experiments. We outperform all baselines by a margin of 4-10%. Initial real robot experiments using a Franka Emika Panda robot and human tracking based on HTC Vive VR gloves look promising.
- [20] arXiv:2403.18524 [pdf, other]
Title: Bridging the Gap: Regularized Reinforcement Learning for Improved Classical Motion Planning with Safety Modules
Comments: 8 pages
Subjects: Robotics (cs.RO)
Classical navigation planners can provide safe navigation, albeit often suboptimally and with hindered human norm compliance. ML-based, contemporary autonomous navigation algorithms can imitate more natural and human-compliant navigation, but usually require large and realistic datasets and do not always provide safety guarantees. We present an approach that leverages a classical algorithm to guide reinforcement learning. This greatly improves the results and convergence rate of the underlying RL algorithm and requires no human-expert demonstrations to jump-start the process. Additionally, we incorporate a practical fallback system that can switch back to a classical planner to ensure safety. The outcome is a sample efficient ML approach for mobile navigation that builds on classical algorithms, improves them to ensure human compliance, and guarantees safety.
- [21] arXiv:2403.18546 [pdf, other]
Title: Efficient Heatmap-Guided 6-Dof Grasp Detection in Cluttered Scenes
Comments: Extensive results on GraspNet-1B dataset
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Fast and robust object grasping in clutter is a crucial component of robotics. Most current works resort to the whole observed point cloud for 6-Dof grasp generation, ignoring the guidance information excavated from global semantics, thus limiting high-quality grasp generation and real-time performance. In this work, we show that the widely used heatmaps are underestimated in the efficiency of 6-Dof grasp generation. Therefore, we propose an effective local grasp generator combined with grasp heatmaps as guidance, which infers in a global-to-local semantic-to-point way. Specifically, Gaussian encoding and the grid-based strategy are applied to predict grasp heatmaps as guidance to aggregate local points into graspable regions and provide global semantic information. Further, a novel non-uniform anchor sampling mechanism is designed to improve grasp accuracy and diversity. Benefiting from the high-efficiency encoding in the image space and focusing on points in local graspable regions, our framework can perform high-quality grasp detection in real-time and achieve state-of-the-art results. In addition, real robot experiments demonstrate the effectiveness of our method with a success rate of 94% and a clutter completion rate of 100%. Our code is available at https://github.com/THU-VCLab/HGGD.
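Note: the abstract mentions Gaussian encoding of grasp heatmaps without giving the details. The sketch below shows the common keypoint-style encoding, in which each annotated center contributes a 2D Gaussian bump and overlapping bumps are merged with a maximum. The grid size, `sigma`, and the function name are assumptions for illustration and may not match the released HGGD code.

```python
import math


def gaussian_heatmap(height, width, centers, sigma=1.0):
    """Render a heatmap with one Gaussian bump per (row, col) center, merged by max."""
    heat = [[0.0] * width for _ in range(height)]
    for cy, cx in centers:
        for y in range(height):
            for x in range(width):
                dist_sq = (y - cy) ** 2 + (x - cx) ** 2
                value = math.exp(-dist_sq / (2.0 * sigma ** 2))
                # Keep the strongest response where bumps overlap.
                heat[y][x] = max(heat[y][x], value)
    return heat


# Example: one hypothetical grasp center in the middle of a 5x5 grid.
for row in gaussian_heatmap(5, 5, [(2, 2)], sigma=1.0):
    print(" ".join(f"{v:.2f}" for v in row))
```

Soft targets of this kind give a network a smooth signal near each grasp center, and the high-valued regions of the predicted map can then be used to select which local points to process.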
- [22] arXiv:2403.18643 [pdf, other]
Title: Sampling-Based Motion Planning with Online Racing Line Generation for Autonomous Driving on Three-Dimensional Race Tracks
Comments: 8 pages, submitted to be published at the 35th IEEE Intelligent Vehicles Symposium, June 2 - 5, 2024, Jeju Shinhwa World, Jeju Island, Korea
Subjects: Robotics (cs.RO)
Existing approaches to trajectory planning for autonomous racing employ sampling-based methods, generating numerous jerk-optimal trajectories and selecting the most favorable feasible trajectory based on a cost function penalizing deviations from an offline-calculated racing line. While successful on oval tracks, these methods face limitations on complex circuits due to the simplistic geometry of jerk-optimal edges failing to capture the complexity of the racing line. Additionally, they only consider two-dimensional tracks, potentially neglecting or surpassing the actual dynamic potential. In this paper, we present a sampling-based local trajectory planning approach for autonomous racing that can maintain the lap time of the racing line even on complex race tracks and consider the race track's three-dimensional effects. In simulative experiments, we demonstrate that our approach achieves lower lap times and improved utilization of dynamic limits compared to existing approaches. We also investigate the impact of online racing line generation, in which the time-optimal solution is planned from the current vehicle state for a limited spatial horizon, in contrast to a closed racing line calculated offline. We show that combining the sampling-based planner with the online racing line generation can significantly reduce lap times in multi-vehicle scenarios.
- [23] arXiv:2403.18692 [pdf, ps, other]
-
Title: Teaching Introductory HRI: UChicago Course "Human-Robot Interaction: Research and Practice"Authors: Sarah SeboComments: 4 pages, 2 tables, Presented at the Designing an Intro to HRI Course Workshop at HRI 2024 (arXiv:2403.05588)Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC)
In 2020, I designed the course CMSC 20630/30630 Human-Robot Interaction: Research and Practice as a hands-on introduction to human-robot interaction (HRI) research for both undergraduate and graduate students at the University of Chicago. Since 2020, I have taught and refined this course each academic year. Human-Robot Interaction: Research and Practice focuses on the core concepts and cutting-edge research in the field of human-robot interaction (HRI), covering topics that include: nonverbal robot behavior, verbal robot behavior, social dynamics, norms & ethics, collaboration & learning, group interactions, applications, and future challenges of HRI. Course meetings involve students in the class leading discussions about cutting-edge, peer-reviewed HRI research publications. Students also participate in a quarter-long collaborative research project, where they pursue an HRI research question that often involves conducting their own human-subjects research study, in which they recruit human subjects to interact with a robot. In this paper, I detail the structure of the course and its learning goals as well as my reflections and student feedback on the course.
- [24] arXiv:2403.18721 [pdf, other]
-
Title: PhysicsAssistant: An LLM-Powered Interactive Learning Robot for Physics Lab InvestigationsComments: Submitted to IEEE RO-MANSubjects: Robotics (cs.RO)
Robot systems in education can leverage the natural language understanding capabilities of large language models (LLMs) to provide assistance and facilitate learning. This paper proposes a multimodal interactive robot (PhysicsAssistant) built on YOLOv8 object detection, cameras, speech recognition, and an LLM-based chatbot to assist students in physics labs. We conduct a user study on ten 8th-grade students to empirically evaluate the performance of PhysicsAssistant, with a human expert as the rater. The expert rates the assistant's responses to student queries on a 0-4 scale based on Bloom's taxonomy to provide educational support. We have compared the performance of PhysicsAssistant (YOLOv8+GPT-3.5-turbo) with GPT-4 and found that the human expert rating of both systems for factual understanding is the same. However, the rating of GPT-4 for conceptual and procedural knowledge (3 and 3.2 vs 2.2 and 2.6, respectively) is significantly higher than that of PhysicsAssistant (p < 0.05). Conversely, the response time of GPT-4 is significantly higher than that of PhysicsAssistant (3.54 vs 1.64 sec, p < 0.05). Hence, despite its lower response quality relative to GPT-4, PhysicsAssistant has shown potential for being used as a real-time lab assistant that provides timely responses and can offload teachers' labor by assisting with repetitive tasks. To the best of our knowledge, this is the first attempt to build such an interactive multimodal robotic assistant for K-12 science (physics) education.
- [25] arXiv:2403.18760 [pdf, other]
-
Title: MLDT: Multi-Level Decomposition for Complex Long-Horizon Robotic Task Planning with Open-Source Large Language ModelSubjects: Robotics (cs.RO)
In the realm of data-driven AI technology, the application of open-source large language models (LLMs) in robotic task planning represents a significant milestone. Recent robotic task planning methods based on open-source LLMs typically leverage vast task planning datasets to enhance models' planning abilities. While these methods show promise, they struggle with complex long-horizon tasks, which require comprehending more context and generating longer action sequences. This paper addresses this limitation by proposing MLDT, the Multi-Level Decomposition Task planning method. This method innovatively decomposes tasks at the goal-level, task-level, and action-level to mitigate the challenge of complex long-horizon tasks. In order to enhance open-source LLMs' planning abilities, we introduce a goal-sensitive corpus generation method to create high-quality training data and conduct instruction tuning on the generated corpus. Since the complexity of the existing datasets is not high enough, we construct a more challenging dataset, LongTasks, to specifically evaluate planning ability on complex long-horizon tasks. We evaluate our method using various LLMs on four datasets in VirtualHome. Our results demonstrate a significant performance enhancement in robotic task planning, showcasing MLDT's effectiveness in overcoming the limitations of existing methods based on open-source LLMs as well as its practicality in complex, real-world scenarios.
- [26] arXiv:2403.18764 [pdf, other]
-
Title: Temporal Logic Formalisation of ISO 34502 Critical Scenarios: Modular Construction with the RSS Safety DistanceAuthors: Jesse Reimann, Nico Mansion, James Haydon, Benjamin Bray, Agnishom Chattopadhyay, Sota Sato, Masaki Waga, Étienne André, Ichiro Hasuo, Naoki Ueda, Yosuke YokoyamaComments: 12 pages, 4 figures, 5 tables. Accepted to SAC 2024Subjects: Robotics (cs.RO); Logic in Computer Science (cs.LO)
As the development of autonomous vehicles progresses, efficient safety assurance methods become increasingly necessary. Safety assurance methods such as monitoring and scenario-based testing call for formalisation of driving scenarios. In this paper, we develop a temporal-logic formalisation of an important class of critical scenarios in the ISO standard 34502. We use signal temporal logic (STL) as a logical formalism. Our formalisation has two main features: 1) modular composition of logical formulas for systematic and comprehensive formalisation (following the compositional methodology of ISO 34502); 2) use of the RSS distance for defining danger. We find our formalisation comes with few parameters to tune thanks to the RSS distance. We experimentally evaluated our formalisation; using its results, we discuss the validity of our formalisation and its stability with respect to the choice of some parameter values.
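The abstract does not state the RSS distance itself. As a hedged illustration, the sketch below computes the longitudinal safe distance as it is commonly given in the Responsibility-Sensitive Safety (RSS) literature, for a rear vehicle following a front vehicle in the same direction. The function and parameter names are assumptions, and this is not the paper's STL formalisation.

```python
def rss_longitudinal_distance(v_rear, v_front, rho,
                              a_max_accel, b_min_brake, b_max_brake):
    """Minimum safe gap in metres between a rear and a front vehicle.

    v_rear, v_front : current speeds (m/s)
    rho             : response time of the rear vehicle (s)
    a_max_accel     : worst-case acceleration of the rear vehicle during rho
    b_min_brake     : braking the rear vehicle is guaranteed to apply afterwards
    b_max_brake     : hardest braking the front vehicle may apply
    """
    v_rear_after = v_rear + rho * a_max_accel  # rear speed after the response time
    distance = (v_rear * rho
                + 0.5 * a_max_accel * rho ** 2
                + v_rear_after ** 2 / (2.0 * b_min_brake)
                - v_front ** 2 / (2.0 * b_max_brake))
    return max(0.0, distance)  # the safe distance is clamped at zero

def is_dangerous(gap, **kwargs):
    """A gap below the RSS distance counts as dangerous (illustrative predicate)."""
    return gap < rss_longitudinal_distance(**kwargs)

params = dict(v_rear=10.0, v_front=10.0, rho=1.0,
              a_max_accel=2.0, b_min_brake=4.0, b_max_brake=8.0)
d_safe = rss_longitudinal_distance(**params)
```

A predicate such as `is_dangerous` is the kind of atomic proposition that a temporal-logic formula over driving signals could refer to. Apart from the vehicle speeds, the only inputs are the response time and the acceleration bounds, which is consistent with the abstract's remark that few parameters need tuning.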
- [27] arXiv:2403.18765 [pdf, other]
-
Title: CaT: Constraints as Terminations for Legged Locomotion Reinforcement LearningAuthors: Elliot Chane-Sane, Pierre-Alexandre Leziart, Thomas Flayols, Olivier Stasse, Philippe Souères, Nicolas MansardComments: Project webpage: this https URLSubjects: Robotics (cs.RO); Machine Learning (cs.LG)
Deep Reinforcement Learning (RL) has demonstrated impressive results in solving complex robotic tasks such as quadruped locomotion. Yet, current solvers fail to produce efficient policies respecting hard constraints. In this work, we advocate for integrating constraints into robot learning and present Constraints as Terminations (CaT), a novel constrained RL algorithm. Departing from classical constrained RL formulations, we reformulate constraints through stochastic terminations during policy learning: any violation of a constraint triggers a probability of terminating potential future rewards the RL agent could attain. We propose an algorithmic approach to this formulation, by minimally modifying widely used off-the-shelf RL algorithms in robot learning (such as Proximal Policy Optimization). Our approach leads to excellent constraint adherence without introducing undue complexity and computational overhead, thus mitigating barriers to broader adoption. Through empirical evaluation on the real quadruped robot Solo crossing challenging obstacles, we demonstrate that CaT provides a compelling solution for incorporating constraints into RL frameworks. Videos and code are available at https://constraints-as-terminations.github.io.
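One way to read the stochastic-termination idea above is that each constraint violation reduces the probability that later rewards are ever collected. The sketch below computes the resulting expected return for a fixed reward sequence. It is an illustration under stated assumptions, not the CaT algorithm: the linear mapping from violation size to termination probability, and the names `termination_probability`, `p_max`, and `max_violation`, are hypothetical.

```python
def termination_probability(violation, max_violation, p_max=1.0):
    """Map a constraint violation (<= 0 means satisfied) to a probability
    in [0, p_max]. The linear scaling is an assumption for illustration."""
    if violation <= 0.0:
        return 0.0
    return p_max * min(1.0, violation / max_violation)

def expected_return(rewards, term_probs, gamma=0.99):
    """Expected discounted return when step t terminates the episode with
    probability term_probs[t], cutting off all rewards after step t."""
    total = 0.0
    survive = 1.0   # probability that the episode is still running at step t
    discount = 1.0
    for reward, delta in zip(rewards, term_probs):
        total += discount * survive * reward
        survive *= 1.0 - delta
        discount *= gamma
    return total

no_violation = expected_return([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], gamma=1.0)
one_violation = expected_return([1.0, 1.0, 1.0], [0.0, 0.5, 0.0], gamma=1.0)
```

With a violation at the second step that terminates with probability 0.5, only the third reward is discounted by the survival probability, so the agent is penalized through lost future reward rather than through an explicit penalty term.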
- [28] arXiv:2403.18778 [pdf, other]
-
Title: 3P-LLM: Probabilistic Path Planning using Large Language Model for Autonomous Robot NavigationAuthors: Ehsan LatifComments: Exploratory StudySubjects: Robotics (cs.RO)
Much worldly semantic knowledge can be encoded in large language models (LLMs). Such information could be of great use to robots that want to carry out high-level, temporally extended commands stated in natural language. However, the lack of real-world experience that language models have is a key limitation that makes it challenging to use them for decision-making inside a particular embodiment. This research assesses the feasibility of using LLM (GPT-3.5-turbo chatbot by OpenAI) for robotic path planning. The shortcomings of conventional approaches to managing complex environments and developing trustworthy plans for shifting environmental conditions serve as the driving force behind the research. Due to the sophisticated natural language processing abilities of LLM, the capacity to provide effective and adaptive path-planning algorithms in real-time, great accuracy, and few-shot learning capabilities, GPT-3.5-turbo is well suited for path planning in robotics. In numerous simulated scenarios, the research compares the performance of GPT-3.5-turbo with that of state-of-the-art path planners like Rapidly Exploring Random Tree (RRT) and A*. We observed that GPT-3.5-turbo is able to provide real-time path planning feedback to the robot and outperforms its counterparts. This paper establishes the foundation for LLM-powered path planning for robotic systems.
Cross-lists for Thu, 28 Mar 24
- [29] arXiv:2403.18015 (cross-list from cs.SY) [pdf, other]
-
Title: A Constructive Method for Designing Safe Multirate Controllers for Differentially-Flat SystemsAuthors: Devansh R. Agrawal, Hardik Parwana, Ryan K. Cosner, Ugo Rosolia, Aaron D. Ames, Dimitra PanagouComments: 6 pages, 3 figures, accepted at IEEE Control Systems Letters 2021Journal-ref: IEEE Control Systems Letters, Vol 6, Page 2138--2143, 2021Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
We present a multi-rate control architecture that leverages fundamental properties of differential flatness to synthesize controllers for safety-critical nonlinear dynamical systems. We propose a two-layer architecture, where the high-level generates reference trajectories using a linear Model Predictive Controller, and the low-level tracks this reference using a feedback controller. The novelty lies in how we couple these layers, to achieve formal guarantees on recursive feasibility of the MPC problem, and safety of the nonlinear system. Furthermore, using differential flatness, we provide a constructive means to synthesize the multi-rate controller, thereby removing the need to search for suitable Lyapunov or barrier functions, or to approximately linearize/discretize nonlinear dynamics. We show the synthesized controller is a convex optimization problem, making it amenable to real-time implementations. The method is demonstrated experimentally on a ground rover and a quadruped robotic system.
- [30] arXiv:2403.18033 (cross-list from cs.CV) [pdf, other]
-
Title: SpectralWaste Dataset: Multimodal Data for Waste Sorting AutomationAuthors: Sara Casao, Fernando Peña, Alberto Sabater, Rosa Castillón, Darío Suárez, Eduardo Montijano, Ana C. MurilloSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
The increase in non-biodegradable waste is a worldwide concern. Recycling facilities play a crucial role, but their automation is hindered by the complex characteristics of waste recycling lines like clutter or object deformation. In addition, the lack of publicly available labeled data for these environments makes developing robust perception systems challenging. Our work explores the benefits of multimodal perception for object segmentation in real waste management scenarios. First, we present SpectralWaste, the first dataset collected from an operational plastic waste sorting facility that provides synchronized hyperspectral and conventional RGB images. This dataset contains labels for several categories of objects that commonly appear in sorting plants and need to be detected and separated from the main trash flow for several reasons, such as security in the management line or reuse. Additionally, we propose a pipeline employing different object segmentation architectures and evaluate the alternatives on our dataset, conducting an extensive analysis for both multimodal and unimodal alternatives. Our evaluation pays special attention to efficiency and suitability for real-time processing and demonstrates how HSI can bring a boost to RGB-only perception in these realistic industrial settings without much computational overhead.
- [31] arXiv:2403.18041 (cross-list from eess.SY) [pdf, other]
-
Title: Learning Piecewise Residuals of Control Barrier Functions for Safety of Switching Systems using Multi-Output Gaussian ProcessesComments: arXiv admin note: text overlap with arXiv:2403.09573Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
Control barrier functions (CBFs) have recently been introduced as a systematic tool to ensure safety by establishing set invariance. When combined with a control Lyapunov function (CLF), they form a safety-critical control mechanism. However, the effectiveness of CBFs and CLFs is closely tied to the system model. In practice, model uncertainty can jeopardize safety and stability guarantees and may lead to undesirable performance. In this paper, we develop a safe learning-based control strategy for switching systems in the face of uncertainty. We focus on the case that a nominal model is available for a true underlying switching system. This uncertainty results in piecewise residuals for each switching surface, impacting the CLF and CBF constraints. We introduce a batch multi-output Gaussian process (MOGP) framework to approximate these piecewise residuals, thereby mitigating the adverse effects of uncertainty. A particular structure of the covariance function enables us to convert the MOGP-based chance constraints CLF and CBF into second-order cone constraints, which leads to a convex optimization. We analyze the feasibility of the resulting optimization and provide the necessary and sufficient conditions for feasibility. The effectiveness of the proposed strategy is validated through a simulation of a switching adaptive cruise control system.
- [32] arXiv:2403.18066 (cross-list from eess.SY) [pdf, ps, other]
-
Title: Path Integral Control with Rollout Clustering and Dynamic ObstaclesComments: 8 pages, 5 figures, extended version of ACC 2024 submissionSubjects: Systems and Control (eess.SY); Robotics (cs.RO)
Model Predictive Path Integral (MPPI) control has proven to be a powerful tool for the control of uncertain systems (such as systems subject to disturbances and systems with unmodeled dynamics). One important limitation of the baseline MPPI algorithm is that it does not utilize simulated trajectories to their fullest extent. For one, it assumes that the average of all trajectories weighted by their performance index will be a safe trajectory. In this paper, multiple examples are shown where the previous assumption does not hold, and a trajectory clustering technique is presented that reduces the chances of the weighted average crossing in an unsafe region. Secondly, MPPI does not account for dynamic obstacles, so the authors put forward a novel cost function that accounts for dynamic obstacles without adding significant computation time to the overall algorithm. The novel contributions proposed in this paper were evaluated with extensive simulations to demonstrate improvements upon the state-of-the-art MPPI techniques.
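The failure of the weighted-average assumption can be shown with a toy example. The sketch below uses the standard exponential MPPI weighting on four hypothetical rollouts that pass an obstacle on either side. The sign-based clustering is a deliberately crude stand-in and is not the clustering technique proposed in the paper. All names and numbers are assumptions.

```python
import math

def mppi_weights(costs, lam=1.0):
    """Normalized exponential weights exp(-cost / lam), shifted by the
    minimum cost for numerical stability."""
    c_min = min(costs)
    unnorm = [math.exp(-(c - c_min) / lam) for c in costs]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def weighted_average(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def clustered_average(values, weights):
    """Split rollouts by the side on which they pass the obstacle and
    average only the cluster with the larger total weight."""
    left = [(v, w) for v, w in zip(values, weights) if v < 0.0]
    right = [(v, w) for v, w in zip(values, weights) if v >= 0.0]
    best = max((left, right), key=lambda c: sum(w for _, w in c))
    return weighted_average([v for v, _ in best], [w for _, w in best])

# Lateral offsets (m) of four rollouts at the obstacle; the obstacle
# occupies |offset| < 0.5, so every individual rollout is safe.
offsets = [-1.0, -1.2, 1.0, 1.1]
costs = [1.0, 1.0, 1.0, 0.9]

w = mppi_weights(costs)
naive = weighted_average(offsets, w)    # close to 0: inside the obstacle
safe = clustered_average(offsets, w)    # stays on one side of the obstacle
```

Every sampled rollout avoids the obstacle, yet the plain weighted average lands almost exactly on it. Averaging within a single cluster keeps the result on one side.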
- [33] arXiv:2403.18145 (cross-list from cs.AI) [pdf, other]
-
Title: A Real-Time Rescheduling Algorithm for Multi-robot Plan ExecutionComments: ICAPS 2024Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Robotics (cs.RO)
One area of research in multi-agent path finding is to determine how replanning can be efficiently achieved in the case of agents being delayed during execution. One option is to reschedule the passing order of agents, i.e., the sequence in which agents visit the same location. In response, we propose Switchable-Edge Search (SES), an A*-style algorithm designed to find optimal passing orders. We prove the optimality of SES and evaluate its efficiency via simulations. The best variant of SES takes less than 1 second for small- and medium-sized problems and runs up to 4 times faster than baselines for large-sized problems.
- [34] arXiv:2403.18207 (cross-list from cs.CV) [pdf, other]
-
Title: Road Obstacle Detection based on Unknown Objectness ScoresComments: ICRA 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
The detection of unknown traffic obstacles is vital to ensure safe autonomous driving. The standard object-detection methods cannot identify unknown objects that are not included under predefined categories. This is because object-detection methods are trained to assign a background label to pixels corresponding to the presence of unknown objects. To address this problem, the pixel-wise anomaly-detection approach has attracted increased research attention. Anomaly-detection techniques, such as uncertainty estimation and perceptual difference from reconstructed images, make it possible to identify pixels of unknown objects as out-of-distribution (OoD) samples. However, when applied to images with many unknowns and complex components, such as driving scenes, these methods often exhibit unstable performance. The purpose of this study is to achieve stable performance for detecting unknown objects by incorporating the object-detection fashions into the pixel-wise anomaly detection methods. To achieve this goal, we adopt a semantic-segmentation network with a sigmoid head that simultaneously provides pixel-wise anomaly scores and objectness scores. Our experimental results show that the objectness scores play an important role in improving the detection performance. Based on these results, we propose a novel anomaly score by integrating these two scores, which we term as unknown objectness score. Quantitative evaluations show that the proposed method outperforms state-of-the-art methods when applied to the publicly available datasets.
- [35] arXiv:2403.18209 (cross-list from cs.LG) [pdf, other]
-
Title: Long and Short-Term Constraints Driven Safe Reinforcement Learning for Autonomous DrivingSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Reinforcement learning (RL) has been widely used in decision-making tasks, but it cannot guarantee the agent's safety in the training process due to the requirements of interaction with the environment, which seriously limits its industrial applications such as autonomous driving. Safe RL methods are developed to handle this issue by constraining the expected safety violation costs as a training objective, but they still permit unsafe state occurrence, which is unacceptable in autonomous driving tasks. Moreover, these methods struggle to balance the cost and return expectations, which leads to learning performance degradation for the algorithms. In this paper, we propose a novel algorithm based on the long and short-term constraints (LSTC) for safe RL. The short-term constraint aims to guarantee the safety of the short-term states that the vehicle explores, while the long-term constraint ensures the overall safety of the vehicle throughout the decision-making process. In addition, we develop a safe RL method with dual-constraint optimization based on the Lagrange multiplier to optimize the training process for end-to-end autonomous driving. Comprehensive experiments were conducted on the MetaDrive simulator. Experimental results demonstrate that the proposed method achieves higher safety in continuous state and action tasks, and exhibits higher exploration performance in long-distance decision-making tasks compared with state-of-the-art methods.
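Dual-constraint optimization with Lagrange multipliers is often implemented as a projected dual-ascent update alongside a penalized policy objective. The sketch below shows that generic pattern with two constraints, standing in for a long-term and a short-term cost. It is not the LSTC algorithm; the function names, learning rate, and cost limits are assumptions.

```python
def update_multiplier(multiplier, mean_cost, cost_limit, lr=0.1):
    """Dual ascent step: raise the multiplier when the constraint is
    violated, lower it otherwise, and project back onto [0, inf)."""
    return max(0.0, multiplier + lr * (mean_cost - cost_limit))

def penalized_objective(mean_return, mean_costs, multipliers, cost_limits):
    """Lagrangian that the policy update would maximize (illustrative)."""
    penalty = sum(m * (c - d)
                  for m, c, d in zip(multipliers, mean_costs, cost_limits))
    return mean_return - penalty

# Index 0: long-term constraint, index 1: short-term constraint (assumed).
multipliers = [1.0, 0.0]
mean_costs = [2.0, 0.5]
cost_limits = [1.0, 1.0]

objective = penalized_objective(10.0, mean_costs, multipliers, cost_limits)
multipliers = [update_multiplier(m, c, d, lr=0.5)
               for m, c, d in zip(multipliers, mean_costs, cost_limits)]
```

The violated constraint (index 0) has its multiplier increased, while the satisfied one stays clamped at zero, so the penalty weight adapts to how each constraint is doing.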
- [36] arXiv:2403.18447 (cross-list from cs.CL) [pdf, other]
-
Title: Can Language Beat Numerical Regression? Language-Based Multimodal Trajectory PredictionComments: Accepted at CVPR 2024Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
Language models have demonstrated impressive ability in context understanding and generative performance. Inspired by the recent success of language foundation models, in this paper, we propose LMTraj (Language-based Multimodal Trajectory predictor), which recasts the trajectory prediction task into a sort of question-answering problem. Departing from traditional numerical regression models, which treat the trajectory coordinate sequence as continuous signals, we consider them as discrete signals like text prompts. Specifically, we first transform an input space for the trajectory coordinate into the natural language space. Here, the entire time-series trajectories of pedestrians are converted into a text prompt, and scene images are described as text information through image captioning. The transformed numerical and image data are then wrapped into the question-answering template for use in a language model. Next, to guide the language model in understanding and reasoning about high-level knowledge, such as scene context and social relationships between pedestrians, we introduce an auxiliary multi-task question-answering objective. We then train a numerical tokenizer with the prompt data. We encourage the tokenizer to separate the integer and decimal parts well, and leverage it to capture correlations between the consecutive numbers in the language model. Lastly, we train the language model using the numerical tokenizer and all of the question-answer prompts. Here, we propose a beam-search-based most-likely prediction and a temperature-based multimodal prediction to implement both deterministic and stochastic inferences. Applying our LMTraj, we show that the language-based model can be a powerful pedestrian trajectory predictor, and outperforms existing numerical-based predictor methods. Code is publicly available at https://github.com/inhwanbae/LMTrajectory.
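The step of turning coordinate sequences into a question-answering prompt can be sketched as follows. The template wording, the function names, and the fixed two-decimal formatting are assumptions made for illustration; the actual prompts are defined in the authors' repository.

```python
def trajectory_to_text(points, decimals=2):
    """Serialize (x, y) points as text with a fixed number of decimals,
    so the integer and decimal parts always have a predictable layout."""
    return ", ".join(f"({x:.{decimals}f}, {y:.{decimals}f})" for x, y in points)

def build_prompt(points, n_future, scene_caption=""):
    """Wrap an observed trajectory and an optional scene caption in a
    hypothetical question-answering template."""
    context = f"Scene: {scene_caption} " if scene_caption else ""
    return (f"{context}"
            f"Pedestrian 0 moved along {trajectory_to_text(points)}. "
            f"Question: what are the next {n_future} positions? Answer:")

observed = [(1, 2.5), (1.25, 3)]
history = trajectory_to_text(observed)
prompt = build_prompt(observed, n_future=12,
                      scene_caption="a sidewalk next to a street")
```

Keeping the number format fixed is one simple way to make the text regular enough for a numerical tokenizer of the kind the abstract mentions.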
- [37] arXiv:2403.18452 (cross-list from cs.CV) [pdf, other]
-
Title: SingularTrajectory: Universal Trajectory Predictor Using Diffusion ModelComments: Accepted at CVPR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
There are five types of trajectory prediction tasks: deterministic, stochastic, domain adaptation, momentary observation, and few-shot. These associated tasks are defined by various factors, such as the length of input paths, data split and pre-processing methods. Interestingly, even though they commonly take sequential coordinates of observations as input and infer future paths in the same coordinates as output, designing specialized architectures for each task is still necessary. On the other tasks, generality issues can lead to sub-optimal performance. In this paper, we propose SingularTrajectory, a diffusion-based universal trajectory prediction framework to reduce the performance gap across the five tasks. The core of SingularTrajectory is to unify a variety of human dynamics representations on the associated tasks. To do this, we first build a Singular space to project all types of motion patterns from each task into one embedding space. We next propose an adaptive anchor working in the Singular space. Unlike traditional fixed anchor methods that sometimes yield unacceptable paths, our adaptive anchor corrects anchors that are placed in a wrong location, based on a traversability map. Finally, we adopt a diffusion-based predictor to further enhance the prototype paths using a cascaded denoising process. Our unified framework ensures generality across various benchmark settings such as input modality and trajectory length. Extensive experiments on five public benchmarks demonstrate that SingularTrajectory substantially outperforms existing models, highlighting its effectiveness in estimating general dynamics of human movements. Code is publicly available at https://github.com/inhwanbae/SingularTrajectory.
- [38] arXiv:2403.18600 (cross-list from cs.CV) [pdf, other]
-
Title: RAP: Retrieval-Augmented Planner for Adaptive Procedure Planning in Instructional VideosComments: 23 pages, 6 figures, 12 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Procedure Planning in instructional videos entails generating a sequence of action steps based on visual observations of the initial and target states. Despite the rapid progress in this task, there remain several critical challenges to be solved: (1) Adaptive procedures: Prior works hold an unrealistic assumption that the number of action steps is known and fixed, leading to non-generalizable models in real-world scenarios where the sequence length varies. (2) Temporal relation: Understanding the step temporal relation knowledge is essential in producing reasonable and executable plans. (3) Annotation cost: Annotating instructional videos with step-level labels (i.e., timestamp) or sequence-level labels (i.e., action category) is demanding and labor-intensive, limiting its generalizability to large-scale datasets. In this work, we propose a new and practical setting, called adaptive procedure planning in instructional videos, where the procedure length is not fixed or pre-determined. To address these challenges, we introduce the Retrieval-Augmented Planner (RAP) model. Specifically, for adaptive procedures, RAP adaptively determines the conclusion of actions using an auto-regressive model architecture. For temporal relation, RAP establishes an external memory module to explicitly retrieve the most relevant state-action pairs from the training videos and revises the generated procedures. To tackle high annotation cost, RAP utilizes weakly-supervised learning to expand the training dataset to other task-relevant, unannotated videos by generating pseudo labels for action steps. Experiments on CrossTask and COIN benchmarks show the superiority of RAP over traditional fixed-length models, establishing it as a strong baseline solution for adaptive procedure planning.
- [39] arXiv:2403.18616 (cross-list from cs.HC) [pdf, other]
-
Title: Will You Participate? Exploring the Potential of Robotics Competitions on Human-centric TopicsJournal-ref: International Conference on Human-Computer Interaction (HCII) 2024Subjects: Human-Computer Interaction (cs.HC); Robotics (cs.RO)
This paper presents findings from an exploratory needfinding study on four human-centric topics: safety, privacy, explainability, and federated learning. The study investigates the current status of research on these topics in the robotics community and the community's potential participation in related competitions. We conducted a survey with 34 participants across three distinguished European robotics consortia, nearly 60% of whom possessed over five years of research experience in robotics. Our qualitative and quantitative analysis revealed that current mainstream robotic researchers prioritize safety and explainability, expressing a greater willingness to invest in further research in these areas. Conversely, our results indicate that privacy and federated learning garner less attention and are perceived to have lower potential. Additionally, the study suggests a lack of enthusiasm within the robotics community for participating in competitions related to these topics. Based on these findings, we recommend targeting other communities, such as the machine learning community, for future competitions related to these four human-centric topics.
- [40] arXiv:2403.18695 (cross-list from eess.SY) [pdf, other]
-
Title: An Efficient Risk-aware Branch MPC for Automated Driving that is Robust to Uncertain Vehicle BehaviorsSubjects: Systems and Control (eess.SY); Robotics (cs.RO)
One of the critical challenges in automated driving is ensuring safety of automated vehicles despite the unknown behavior of the other vehicles. Although motion prediction modules are able to generate a probability distribution associated with various behavior modes, their probabilistic estimates are often inaccurate, thus leading to a possibly unsafe trajectory. To overcome this challenge, we propose a risk-aware motion planning framework that appropriately accounts for the ambiguity in the estimated probability distribution. We formulate the risk-aware motion planning problem as a min-max optimization problem and develop an efficient iterative method by incorporating a regularization term in the probability update step. Via extensive numerical studies, we validate the convergence of our method and demonstrate its advantages compared to the state-of-the-art approaches.
- [41] arXiv:2403.18762 (cross-list from cs.CV) [pdf, other]
-
Title: ModaLink: Unifying Modalities for Efficient Image-to-PointCloud Place RecognitionAuthors: Weidong Xie, Lun Luo, Nanfei Ye, Yi Ren, Shaoyi Du, Minhang Wang, Jintao Xu, Rui Ai, Weihao Gu, Xieyuanli ChenComments: 8 pages, 11 figures, conferenceSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Place recognition is an important task for robots and autonomous cars to localize themselves and close loops in pre-built maps. While single-modal sensor-based methods have shown satisfactory performance, cross-modal place recognition that retrieves images from a point-cloud database remains a challenging problem. Current cross-modal methods transform images into 3D points using depth estimation for modality conversion, which is usually computationally intensive and needs expensive labeled data for depth supervision. In this work, we introduce a fast and lightweight framework to encode images and point clouds into place-distinctive descriptors. We propose an effective Field of View (FoV) transformation module to convert point clouds into a modality analogous to images. This module eliminates the necessity for depth estimation and helps subsequent modules achieve real-time performance. We further design a non-negative factorization-based encoder to extract mutually consistent semantic features between point clouds and images. This encoder yields more distinctive global descriptors for retrieval. Experimental results on the KITTI dataset show that our proposed methods achieve state-of-the-art performance while running in real time. Additional evaluation on the HAOMO dataset covering a 17 km trajectory further shows the practical generalization capabilities. We have released the implementation of our methods as open source at: https://github.com/haomo-ai/ModaLink.git.
Replacements for Thu, 28 Mar 24
- [42] arXiv:2302.08463 (replaced) [pdf, other]
-
Title: Dynamic Grasping with a Learned Meta-ControllerComments: 9 pagesSubjects: Robotics (cs.RO)
- [43] arXiv:2308.00911 (replaced) [pdf, other]
-
Title: Optimal Sensor Deception to Deviate from an Allowed ItinerarySubjects: Robotics (cs.RO)
- [44] arXiv:2309.06494 (replaced) [pdf, other]
-
Title: Non-smooth Control Barrier Functions for Stochastic Dynamical SystemsSubjects: Robotics (cs.RO)
- [45] arXiv:2309.10718 (replaced) [pdf, other]
-
Title: DRIVE: Data-driven Robot Input Vector ExplorationAuthors: Dominic Baril, Simon-Pierre Deschênes, Luc Coupal, Cyril Goffin, Julien Lépine, Philippe Giguère, François PomerleauComments: 8 pages, 7 figures, 1 table, accepted for publication at the 2024 IEEE International Conference on Robotics and Automation (ICRA2024), Yokohama, JapanSubjects: Robotics (cs.RO)
- [46] arXiv:2309.12857 (replaced) [pdf, other]
-
Title: Risk-aware Control for Robots with Non-Gaussian Belief SpacesSubjects: Robotics (cs.RO)
- [47] arXiv:2311.03189 (replaced) [pdf, other]
-
Title: Safe Control for Soft-Rigid Robots with Self-Contact using Control Barrier FunctionsComments: 6 pages, 6 figures, submitted to IEEE Robosoft 2024 ConferenceSubjects: Robotics (cs.RO)
- [48] arXiv:2311.05362 (replaced) [pdf, other]
-
Title: Modeling and Control of Intrinsically Elasticity Coupled Soft-Rigid Robots
Comments: 7 pages, 8 figures
Subjects: Robotics (cs.RO)
- [49] arXiv:2311.08787 (replaced) [pdf, other]
-
Title: Polygonal Cone Control Barrier Functions (PolyC2BF) for safe navigation in cluttered environments
Comments: 6 pages, 6 figures. Accepted at European Control Conference (ECC) 2024. arXiv admin note: text overlap with arXiv:2303.15871
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
- [50] arXiv:2401.11542 (replaced) [pdf, other]
-
Title: Nigel -- Mechatronic Design and Robust Sim2Real Control of an Over-Actuated Autonomous Vehicle
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
- [51] arXiv:2403.07091 (replaced) [pdf, other]
-
Title: Sim-to-Real gap in RL: Use Case with TIAGo and Isaac Sim/Gym
Comments: Accepted in ERF24 workshop "Towards Efficient and Portable Robot Learning for Real-World Settings". To be published in Springer Proceedings in Advanced Robotics
Subjects: Robotics (cs.RO)
- [52] arXiv:2403.11617 (replaced) [pdf, other]
-
Title: Frontier-Based Exploration for Multi-Robot Rendezvous in Communication-Restricted Unknown Environments
Subjects: Robotics (cs.RO)
- [53] arXiv:2403.14864 (replaced) [pdf, other]
-
Title: Learning Quadruped Locomotion Using Differentiable Simulation
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
- [54] arXiv:2403.16967 (replaced) [pdf, other]
-
Title: Visual Whole-Body Control for Legged Loco-Manipulation
Comments: The first two authors contribute equally. Project page: this https URL
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [55] arXiv:2403.17320 (replaced) [pdf, other]
-
Title: Leveraging Symmetry in RL-based Legged Locomotion Control
Authors: Zhi Su, Xiaoyu Huang, Daniel Ordoñez-Apraez, Yunfei Li, Zhongyu Li, Qiayuan Liao, Giulio Turrisi, Massimiliano Pontil, Claudio Semini, Yi Wu, Koushil Sreenath
Subjects: Robotics (cs.RO)
- [56] arXiv:2403.17367 (replaced) [pdf, other]
-
Title: RoboDuet: A Framework Affording Mobile-Manipulation and Cross-Embodiment
Authors: Guoping Pan, Qingwei Ben, Zhecheng Yuan, Guangqi Jiang, Yandong Ji, Jiangmiao Pang, Houde Liu, Huazhe Xu
Subjects: Robotics (cs.RO)
- [57] arXiv:2403.17392 (replaced) [pdf, other]
-
Title: Natural-artificial hybrid swarm: Cyborg-insect group navigation in unknown obstructed soft terrain
Authors: Yang Bai, Phuoc Thanh Tran Ngoc, Huu Duoc Nguyen, Duc Long Le, Quang Huy Ha, Kazuki Kai, Yu Xiang See To, Yaosheng Deng, Jie Song, Naoki Wakamiya, Hirotaka Sato, Masaki Ogura
Subjects: Robotics (cs.RO); Systems and Control (eess.SY); Adaptation and Self-Organizing Systems (nlin.AO)
- [58] arXiv:2310.04181 (replaced) [pdf, other]
-
Title: DiffPrompter: Differentiable Implicit Visual Prompts for Semantic-Segmentation in Adverse Conditions
Authors: Sanket Kalwar, Mihir Ungarala, Shruti Jain, Aaron Monis, Krishna Reddy Konda, Sourav Garg, K Madhava Krishna
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [59] arXiv:2310.17072 (replaced) [pdf, other]
-
Title: MMP++: Motion Manifold Primitives with Parametric Curve Models
Authors: Yonghyeon Lee
Comments: 12 pages. This work has been submitted to the IEEE for possible publication
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
- [60] arXiv:2311.02749 (replaced) [pdf, other]
-
Title: Fast Point Cloud to Mesh Reconstruction for Deformable Object Tracking
Comments: 8 pages with appendix, 16 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [61] arXiv:2311.08100 (replaced) [pdf, other]
-
Title: PPAD: Iterative Interactions of Prediction and Planning for End-to-end Autonomous Driving
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [62] arXiv:2311.15803 (replaced) [pdf, other]
-
Title: SOAC: Spatio-Temporal Overlap-Aware Multi-Sensor Calibration using Neural Radiance Fields
Authors: Quentin Herau, Nathan Piasco, Moussab Bennehar, Luis Roldão, Dzmitry Tsishkou, Cyrille Migniot, Pascal Vasseur, Cédric Demonceaux
Comments: Accepted at CVPR 2024. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [63] arXiv:2312.01616 (replaced) [pdf, other]
-
Title: SchurVINS: Schur Complement-Based Lightweight Visual Inertial Navigation System
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [64] arXiv:2312.02126 (replaced) [pdf, other]
-
Title: SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM
Authors: Nikhil Keetha, Jay Karhade, Krishna Murthy Jatavallabhula, Gengshan Yang, Sebastian Scherer, Deva Ramanan, Jonathon Luiten
Comments: CVPR 2024. Website: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
- [65] arXiv:2312.08344 (replaced) [pdf, other]
-
Title: FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)