

New submissions

[ total of 27 entries: 1-27 ]

New submissions for Thu, 30 Mar 23

[1]  arXiv:2303.16382 [pdf, other]
Title: ARMBench: An Object-centric Benchmark Dataset for Robotic Manipulation
Comments: To appear at the IEEE Conference on Robotics and Automation (ICRA), 2023
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

This paper introduces Amazon Robotic Manipulation Benchmark (ARMBench), a large-scale, object-centric benchmark dataset for robotic manipulation in the context of a warehouse. Automation of operations in modern warehouses requires a robotic manipulator to deal with a wide variety of objects, unstructured storage, and dynamically changing inventory. Such settings pose challenges in perceiving the identity, physical characteristics, and state of objects during manipulation. Existing datasets for robotic manipulation consider a limited set of objects or utilize 3D models to generate synthetic scenes with limitations in capturing the variety of object properties, clutter, and interactions. We present a large-scale dataset collected in an Amazon warehouse using a robotic manipulator performing object singulation from containers with heterogeneous contents. ARMBench contains images, videos, and metadata that correspond to 235K+ pick-and-place activities on 190K+ unique objects. The data is captured at different stages of manipulation, i.e., pre-pick, during transfer, and after placement. Benchmark tasks are proposed by virtue of high-quality annotations, and baseline performance evaluations are presented on three visual perception challenges, namely 1) object segmentation in clutter, 2) object identification, and 3) defect detection. ARMBench can be accessed at this http URL

[2]  arXiv:2303.16386 [pdf, other]
Title: Quantifying VIO Uncertainty
Subjects: Robotics (cs.RO)

We compute the uncertainty of XIVO, a monocular visual-inertial odometry system based on the Extended Kalman Filter, in the presence of Gaussian noise, drift, and attribution errors in the feature tracks in addition to Gaussian noise and drift in the IMU. Uncertainty is computed using Monte-Carlo simulations of a sufficiently exciting trajectory in the midst of a point cloud that bypass the typical image processing and feature tracking steps. We find that attribution errors have the largest detrimental effect on performance. Even with just small amounts of Gaussian noise and/or drift, however, the probability that XIVO's performance resembles the mean performance when noise and/or drift is artificially high is greater than 1 in 100.
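
The Monte-Carlo idea in this abstract can be illustrated with a much simpler stand-in. The sketch below is not XIVO and not an EKF on visual-inertial data: it assumes a scalar random-walk state with Gaussian process and measurement noise, runs a linear Kalman filter over many simulated trajectories, and compares the empirical variance of the final estimation error with the variance the filter itself reports. All function names and parameter values (`kalman_run`, `monte_carlo_variance`, `q`, `r`, `p0`) are hypothetical choices for illustration.

```python
import random


def kalman_run(rng, steps=50, q=0.01, r=0.25, p0=1.0):
    """Simulate one scalar random-walk trajectory and filter it.

    Returns (final estimation error, filter-reported error variance).
    """
    x = rng.gauss(0.0, p0 ** 0.5)  # true initial state drawn from the prior
    x_hat, p = 0.0, p0             # filter starts at the prior mean
    for _ in range(steps):
        x += rng.gauss(0.0, q ** 0.5)     # truth: random walk, variance q
        z = x + rng.gauss(0.0, r ** 0.5)  # noisy measurement, variance r
        p += q                            # predict (state model is identity)
        k = p / (p + r)                   # Kalman gain
        x_hat += k * (z - x_hat)          # measurement update
        p *= 1.0 - k
    return x_hat - x, p


def monte_carlo_variance(trials=2000, seed=0):
    """Estimate the error variance empirically over many simulated runs."""
    rng = random.Random(seed)
    errors = []
    reported = 0.0
    for _ in range(trials):
        err, reported = kalman_run(rng)  # reported variance is the same each run
        errors.append(err)
    mean = sum(errors) / trials
    empirical = sum((e - mean) ** 2 for e in errors) / (trials - 1)
    return empirical, reported


if __name__ == "__main__":
    empirical, reported = monte_carlo_variance()
    print(f"empirical error variance: {empirical:.4f}")
    print(f"filter-reported variance: {reported:.4f}")
```

In this toy setting the noise model matches the filter's assumptions, so the two variances should roughly agree. Injecting unmodeled effects such as drift or outliers into the simulated measurements is one way to see the two numbers diverge.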

[3]  arXiv:2303.16427 [pdf, other]
Title: Learning Excavation of Rigid Objects with Offline Reinforcement Learning
Comments: Submitted to IROS 2023
Subjects: Robotics (cs.RO)

Autonomous excavation is a challenging task. The unknown contact dynamics between the excavator bucket and the terrain could easily result in large contact forces and jamming problems during excavation. Traditional model-based methods struggle to handle such problems due to complex dynamic modeling. In this paper, we formulate the excavation skills with three novel manipulation primitives. We propose to learn the manipulation primitives with offline reinforcement learning (RL) to avoid large amounts of online robot interactions. The proposed method can learn efficient penetration skills from sub-optimal demonstrations, which contain sub-trajectories that can be "stitched" together to formulate an optimal trajectory without causing jamming. We evaluate the proposed method with extensive experiments on excavating a variety of rigid objects and demonstrate that the learned policy outperforms the demonstrations. We also show that the learned policy can quickly adapt to unseen and challenging fragmented rocks with online fine-tuning.

[4]  arXiv:2303.16469 [pdf, other]
Title: Learning Complicated Manipulation Skills via Deterministic Policy with Limited Demonstrations
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Combined with demonstrations, deep reinforcement learning can efficiently develop policies for manipulators. However, it takes time to collect sufficient high-quality demonstrations in practice. And human demonstrations may be unsuitable for robots. The non-Markovian process and over-reliance on demonstrations are further challenges. For example, we found that RL agents are sensitive to demonstration quality in manipulation tasks and struggle to adapt to demonstrations directly from humans. Thus it is challenging to leverage low-quality and insufficient demonstrations to assist reinforcement learning in training better policies, and sometimes, limited demonstrations even lead to worse performance.
We propose a new algorithm named TD3fG (TD3 learning from a generator) to solve these problems. It forms a smooth transition from learning from experts to learning from experience. This innovation can help agents extract prior knowledge while reducing the detrimental effects of the demonstrations. Our algorithm performs well in Adroit manipulator and MuJoCo tasks with limited demonstrations.

[5]  arXiv:2303.16500 [pdf, other]
Title: AirLine: Efficient Learnable Line Detection with Local Edge Voting
Authors: Xiao Lin, Chen Wang
Subjects: Robotics (cs.RO)

Line detection is widely used in many robotic tasks such as scene recognition, 3D reconstruction, and simultaneous localization and mapping (SLAM). Compared to points, lines can provide both low-level and high-level geometrical information for downstream tasks. In this paper, we propose a novel edge-based line detection algorithm, AirLine, which can be applied to various tasks. In contrast to existing learnable endpoint-based methods, which are sensitive to the geometrical condition of environments, AirLine can extract line segments directly from edges, resulting in a better generalization ability for unseen environments. Also, to balance efficiency and accuracy, we introduce a region-grow algorithm and local edge voting scheme for line parameterization. To the best of our knowledge, AirLine is one of the first learnable edge-based line detection methods. Our extensive experiments show that it retains state-of-the-art-level precision yet with a 3-80 times runtime acceleration compared to other learning-based methods, which is critical for low-power robots.

[6]  arXiv:2303.16654 [pdf, other]
Title: Learning Augmented, Multi-Robot Long-Horizon Navigation in Partially Mapped Environments
Comments: 7 pages, 7 figures, ICRA2023
Subjects: Robotics (cs.RO)

We present a novel approach for efficient and reliable goal-directed long-horizon navigation for a multi-robot team in a structured, unknown environment by predicting statistics of unknown space. Building on recent work in learning-augmented model-based planning under uncertainty, we introduce a high-level state and action abstraction that lets us approximate the challenging Dec-POMDP as a tractable stochastic MDP. Our Multi-Robot Learning over Subgoals Planner (MR-LSP) guides agents towards coordinated exploration of regions more likely to reach the unseen goal. We demonstrate improvement in cost against other multi-robot strategies; in simulated office-like environments, we show that our approach saves 13.29% (2 robot) and 4.6% (3 robot) average cost versus standard non-learned optimistic planning and a learning-informed baseline.

[7]  arXiv:2303.16739 [pdf, other]
Title: Active Implicit Object Reconstruction using Uncertainty-guided Next-Best-View Optimization
Comments: 8 pages, 10 figures, Submitted to IEEE Robotics and Automation Letters (RA-L)
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

Actively planning sensor views during object reconstruction is essential to autonomous mobile robots. This task is usually performed by evaluating information gain from an explicit uncertainty map. Existing algorithms compare options among a set of preset candidate views and select the next-best-view from them. In contrast to these, we take the emerging implicit representation as the object model and seamlessly combine it with the active reconstruction task. To fully integrate observation information into the model, we propose a supervision method specifically for object-level reconstruction that considers both valid and free space. Additionally, to directly evaluate view information from the implicit object model, we introduce a sample-based uncertainty evaluation method. It samples points on rays directly from the object model and uses variations of implicit function inferences as the uncertainty metrics, with no need for voxel traversal or an additional information map. Leveraging the differentiability of our metrics, it is possible to optimize the next-best-view by maximizing the uncertainty continuously. This does away with the traditionally used candidate-view setting, which may provide sub-optimal results. Experiments in simulations and real-world scenes show that our method effectively improves the reconstruction accuracy and the view-planning efficiency of active reconstruction tasks. The proposed system will be open-sourced at https://github.com/HITSZ-NRSL/ActiveImplicitRecon.git.

[8]  arXiv:2303.16821 [pdf, other]
Title: Decision Making for Autonomous Driving in Interactive Merge Scenarios via Learning-based Prediction
Comments: 12 pages, 12 figures
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Autonomous agents that drive on roads shared with human drivers must reason about the nuanced interactions among traffic participants. This poses a highly challenging decision making problem since human behavior is influenced by a multitude of factors (e.g., human intentions and emotions) that are hard to model. This paper presents a decision making approach for autonomous driving, focusing on the complex task of merging into moving traffic where uncertainty emanates from the behavior of other drivers and imperfect sensor measurements. We frame the problem as a partially observable Markov decision process (POMDP) and solve it online with Monte Carlo tree search. The solution to the POMDP is a policy that performs high-level driving maneuvers, such as giving way to an approaching car, keeping a safe distance from the vehicle in front or merging into traffic. Our method leverages a model learned from data to predict the future states of traffic while explicitly accounting for interactions among the surrounding agents. From these predictions, the autonomous vehicle can anticipate the future consequences of its actions on the environment and optimize its trajectory accordingly. We thoroughly test our approach in simulation, showing that the autonomous vehicle can adapt its behavior to different situations. We also compare against other methods, demonstrating an improvement with respect to the considered performance metrics.

[9]  arXiv:2303.16865 [pdf, other]
Title: Legged Robots for Object Manipulation: A Review
Comments: Preprint of the paper submitted to Frontiers in Mechanical Engineering
Subjects: Robotics (cs.RO)

Legged robots can have a unique role in manipulating objects in dynamic, human-centric, or otherwise inaccessible environments. Although most legged robotics research to date typically focuses on traversing these challenging environments, many legged platform demonstrations have also included "moving an object" as a way of doing tangible work. Legged robots can be designed to manipulate a particular type of object (e.g., a cardboard box, a soccer ball, or a larger piece of furniture), by themselves or collaboratively. The objective of this review is to collect and learn from these examples, to both organize the work done so far in the community and highlight interesting open avenues for future work. This review categorizes existing works into four main manipulation methods: object interactions without grasping, manipulation with walking legs, dedicated non-locomotive arms, and legged teams. Each method has different design and autonomy features, which are illustrated by available examples in the literature. Based on a few simplifying assumptions, we further provide quantitative comparisons for the range of possible relative sizes of the manipulated object with respect to the robot. Taken together, these examples suggest new directions for research in legged robot manipulation, such as multifunctional limbs, terrain modeling, or learning-based control, to support a number of new deployments in challenging indoor/outdoor scenarios in warehouses/construction sites, preserved natural areas, and especially for home robotics.

[10]  arXiv:2303.16898 [pdf, other]
Title: Bagging by Learning to Singulate Layers Using Interactive Perception
Subjects: Robotics (cs.RO)

Many fabric handling and 2D deformable material tasks in homes and industry require singulating layers of material such as opening a bag or arranging garments for sewing. In contrast to methods requiring specialized sensing or end effectors, we use only visual observations with ordinary parallel jaw grippers. We propose SLIP: Singulating Layers using Interactive Perception, and apply SLIP to the task of autonomous bagging. We develop SLIP-Bagging, a bagging algorithm that manipulates a plastic or fabric bag from an unstructured state, and uses SLIP to grasp the top layer of the bag to open it for object insertion. In physical experiments, a YuMi robot achieves a success rate of 67% to 81% across bags of a variety of materials, shapes, and sizes, significantly improving in success rate and generality over prior work. Experiments also suggest that SLIP can be applied to tasks such as singulating layers of folded cloth and garments. Supplementary material is available at https://sites.google.com/view/slip-bagging/.

Cross-lists for Thu, 30 Mar 23

[11]  arXiv:2303.16408 (cross-list from cs.CV) [pdf, other]
Title: The Need for Inherently Privacy-Preserving Vision in Trustworthy Autonomous Systems
Comments: 7 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Vision is a popular and effective sensor for robotics from which we can derive rich information about the environment: the geometry and semantics of the scene, as well as the age, gender, identity, activity and even emotional state of humans within that scene. This raises important questions about the reach, lifespan, and potential misuse of this information. This paper is a call to action to consider privacy in the context of robotic vision. We propose a specific form of privacy preservation in which no images are captured or could be reconstructed by an attacker even with full remote access. We present a set of principles by which such systems can be designed, and through a case study in localisation demonstrate in simulation a specific implementation that delivers an important robotic capability in an inherently privacy-preserving manner. This is a first step, and we hope to inspire future works that expand the range of applications open to sighted robotic systems.

[12]  arXiv:2303.16641 (cross-list from cs.MA) [pdf, other]
Title: A Hierarchical Game-Theoretic Decision-Making for Cooperative Multi-Agent Systems Under the Presence of Adversarial Agents
Comments: This paper is accepted by the ACM Symposium on Applied Computing (SAC) 2023 Technical Track on Intelligent Robotics and Multi-Agent Systems (IRMAS)
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO); Systems and Control (eess.SY)

Underlying relationships among Multi-Agent Systems (MAS) in hazardous scenarios can be represented as game-theoretic models. This paper proposes a new hierarchical network-based model called Game-theoretic Utility Tree (GUT), which decomposes high-level strategies into executable low-level actions for cooperative MAS decisions. It is combined with a new payoff measure based on agent needs for real-time strategy games. We present an Explore game domain, where we measure the performance of MAS achieving tasks from the perspective of balancing the success probability and system costs. We evaluate the GUT approach against state-of-the-art methods that greedily rely on rewards of the composite actions. Conclusive results on extensive numerical simulations indicate that GUT can organize more complex relationships among MAS cooperation, helping the group achieve challenging tasks with lower costs and higher winning rates. Furthermore, we demonstrate the applicability of the GUT using the simulator-hardware testbed Robotarium. The results verify the effectiveness of the GUT in the real robot application and validate that the GUT can effectively organize MAS cooperation strategies, helping the group with fewer advantages achieve higher performance.

[13]  arXiv:2303.16710 (cross-list from cs.CV) [pdf, other]
Title: An intelligent modular real-time vision-based system for environment perception
Comments: Accepted in NeurIPS 2022 Workshop on Machine Learning for Autonomous Driving
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

A significant portion of driving hazards is caused by human error and disregard for local driving regulations; consequently, an intelligent assistance system can be beneficial. This paper proposes a novel vision-based modular package to ensure drivers' safety by perceiving the environment. Each module is designed based on accuracy and inference time to deliver real-time performance. As a result, the proposed system can be implemented on a wide range of vehicles with minimum hardware requirements. Our modular package comprises four main sections: lane detection, object detection, segmentation, and monocular depth estimation. Each section is accompanied by novel techniques to improve the accuracy of others along with the entire system. Furthermore, a GUI is developed to display perceived information to the driver. In addition to using public datasets, like BDD100K, we have also collected and annotated a local dataset that we utilize to fine-tune and evaluate our system. We show that the accuracy of our system is above 80% in all the sections. Our code and data are available at https://github.com/Pandas-Team/Autonomous-Vehicle-Environment-Perception

[14]  arXiv:2303.16746 (cross-list from math.OC) [pdf, other]
Title: FATROP: A Fast Constrained Optimal Control Problem Solver for Robot Trajectory Optimization and Control
Subjects: Optimization and Control (math.OC); Robotics (cs.RO)

Trajectory optimization is a powerful tool for robot motion planning and control. State-of-the-art general-purpose nonlinear programming solvers are versatile, handle constraints in an effective way and provide a high numerical robustness, but they are slow because they do not fully exploit the optimal control problem structure at hand. Existing structure-exploiting solvers are fast but they often lack techniques to deal with nonlinearity or rely on penalty methods to enforce (equality or inequality) path constraints. This work presents FATROP: a trajectory optimization solver that is fast and benefits from the salient features of general-purpose nonlinear optimization solvers. The speed-up is mainly achieved through the use of a specialized linear solver, based on a Riccati recursion that is generalized to also support stagewise equality constraints. To demonstrate the algorithm's potential, it is benchmarked on a set of robot problems that are challenging from a numerical perspective, including problems with a minimum-time objective and no-collision constraints. The solver is shown to solve problems for trajectory generation of a quadrotor, a robot manipulator and a truck-trailer problem in a few tens of milliseconds. The algorithm's C++-code implementation accompanies this work as open source software, released under the GNU Lesser General Public License (LGPL). This software framework may encourage and enable the robotics community to use trajectory optimization in more challenging applications.
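
For readers unfamiliar with the Riccati recursion mentioned in this abstract, the following minimal sketch shows the textbook version for a scalar, unconstrained, finite-horizon LQR problem. It is not FATROP's generalized recursion and does not handle stagewise equality constraints; the function names (`riccati_recursion`, `rollout`) and the numeric values are hypothetical and chosen only for illustration.

```python
def riccati_recursion(a, b, q, r, q_final, horizon):
    """Backward Riccati recursion for a scalar, unconstrained LQR problem.

    Dynamics: x[k+1] = a * x[k] + b * u[k]
    Cost:     sum_k (q * x[k]**2 + r * u[k]**2) + q_final * x[N]**2
    Returns the feedback gains K[0..N-1] (with u[k] = -K[k] * x[k])
    and the cost-to-go coefficient P[0].
    """
    p = q_final
    gains = []
    for _ in range(horizon):
        k = (b * p * a) / (r + b * p * b)  # optimal gain for this stage
        p = q + a * p * a - a * p * b * k  # cost-to-go one stage earlier
        gains.append(k)
    gains.reverse()  # gains were computed from the last stage to the first
    return gains, p


def rollout(a, b, gains, x0):
    """Apply the time-varying feedback law and return the state trajectory."""
    xs = [x0]
    for k in gains:
        u = -k * xs[-1]
        xs.append(a * xs[-1] + b * u)
    return xs


if __name__ == "__main__":
    gains, p0 = riccati_recursion(a=1.0, b=1.0, q=1.0, r=1.0,
                                  q_final=1.0, horizon=50)
    xs = rollout(1.0, 1.0, gains, x0=1.0)
    print(f"P[0] = {p0:.6f}, K[0] = {gains[0]:.6f}, x[N] = {xs[-1]:.3e}")
```

The backward pass costs a fixed amount of work per stage, so the total effort grows linearly with the horizon. This is the structure that specialized optimal-control linear solvers exploit, in contrast to treating the problem as a generic sparse linear system.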

[15]  arXiv:2303.16878 (cross-list from cs.CV) [pdf, other]
Title: Photometric LiDAR and RGB-D Bundle Adjustment
Comments: 11 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

The joint optimization of the sensor trajectory and 3D map is a crucial characteristic of Simultaneous Localization and Mapping (SLAM) systems. To achieve this, the gold standard is Bundle Adjustment (BA). Modern 3D LiDARs now retain higher resolutions that enable the creation of point cloud images resembling those taken by conventional cameras. Nevertheless, the typical effective global refinement techniques employed for RGB-D sensors are not widely applied to LiDARs. This paper presents a novel BA photometric strategy that accounts for both RGB-D and LiDAR in the same way. Our work can be used on top of any SLAM/GNSS estimate to improve and refine the initial trajectory. We conducted different experiments using these two depth sensors on public benchmarks. Our results show that our system performs on par or better compared to other state-of-the-art ad-hoc SLAM/BA strategies, free from data association and without making assumptions about the environment. In addition, we present the benefit of jointly using RGB-D and LiDAR within our unified method. We finally release an open-source CUDA/C++ implementation.

Replacements for Thu, 30 Mar 23

[16]  arXiv:2110.10324 (replaced) [pdf, other]
Title: HARPS: An Online POMDP Framework for Human-Assisted Robotic Planning and Sensing
Comments: Accepted to IEEE Transactions on Robotics. 20 pages, 18 figures
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
[17]  arXiv:2111.00088 (replaced) [pdf, other]
Title: Stitching Dynamic Movement Primitives and Image-based Visual Servo Control
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
[18]  arXiv:2211.11903 (replaced) [pdf, other]
Title: FLEX: Full-Body Grasping Without Full-Body Grasps
Comments: CVPR 2023 Camera-ready
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
[19]  arXiv:2303.06335 (replaced) [pdf, other]
Title: Just Flip: Flipped Observation Generation and Optimization for Neural Radiance Fields to Cover Unobserved View
Subjects: Robotics (cs.RO)
[20]  arXiv:2303.09824 (replaced) [pdf, other]
Title: Motion Planning for Autonomous Driving: The State of the Art and Future Perspectives
Comments: 20 pages, 14 figures and 5 tables
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
[21]  arXiv:2303.14188 (replaced) [pdf, other]
Title: Learning from Few Demonstrations with Frame-Weighted Motion Generation
Comments: Submitted to RA-L
Subjects: Robotics (cs.RO)
[22]  arXiv:2209.09484 (replaced) [pdf, other]
Title: Hierarchical Temporal Transformer for 3D Hand Pose Estimation and Action Recognition from Egocentric RGB Videos
Comments: Accepted by CVPR 2023; Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[23]  arXiv:2209.11908 (replaced) [pdf, other]
Title: Fast Lifelong Adaptive Inverse Reinforcement Learning from Demonstrations
Journal-ref: Proceedings of Conference on Robot Learning (CoRL) 2022
Subjects: Machine Learning (cs.LG); Robotics (cs.RO)
[24]  arXiv:2303.07430 (replaced) [pdf, other]
Title: A Modular Platform For Collaborative, Distributed Sensor Fusion
Subjects: Systems and Control (eess.SY); Robotics (cs.RO); Image and Video Processing (eess.IV)
[25]  arXiv:2303.13654 (replaced) [pdf, other]
Title: NEWTON: Neural View-Centric Mapping for On-the-Fly Large-Scale SLAM
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[26]  arXiv:2303.15535 (replaced) [pdf, other]
Title: A Compositional Approach to Certifying the Almost Global Asymptotic Stability of Cascade Systems
Comments: This version corrects a minor technical error in Definition 6
Subjects: Optimization and Control (math.OC); Robotics (cs.RO); Dynamical Systems (math.DS)
[27]  arXiv:2303.16203 (replaced) [pdf, other]
Title: Your Diffusion Model is Secretly a Zero-Shot Classifier
Comments: Website at this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)