Abstract

Flexible pick-and-place is a fundamental yet challenging task in robotics, in particular due to the need for an object model just to define a target pose. In this work, the robot instead learns to pick-and-place objects via planar manipulation according to a single, demonstrated goal state. Our primary contribution lies in combining robot learning of manipulation primitives, commonly estimated by fully-convolutional neural networks, with one-shot imitation learning. To this end, we define the place reward as a contrastive loss between real-world measurements and a task-specific noise distribution. Furthermore, we design our system to learn in a self-supervised manner, enabling real-world experiments with up to 25000 pick-and-place actions. The robot is then able to place trained objects with an average placement error of 2.7±0.2 mm and 2.6±0.8°. As our approach does not require an object model, the robot generalizes to unknown objects, maintaining a precision of 5.9±1.1 mm and 4.1±1.2°. We further show a range of emerging behaviors: the robot naturally learns to select the correct object in the presence of multiple object types, precisely inserts objects within a peg game, picks screws out of dense clutter, and infers multiple pick-and-place actions from a single goal state.
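The contrastive place reward mentioned above can be illustrated with a minimal InfoNCE-style sketch: the real-world measurement acts as the positive sample against embeddings drawn from a task-specific noise distribution. This is only an illustrative reconstruction, not the authors' implementation; the function and parameter names (`place_reward`, `temperature`) are hypothetical.

```python
import numpy as np

def place_reward(goal_emb, measured_emb, noise_embs, temperature=0.1):
    """Contrastive place reward (illustrative sketch).

    goal_emb:     embedding of the demonstrated goal state, shape (d,)
    measured_emb: embedding of the real-world measurement after placing, shape (d,)
    noise_embs:   embeddings sampled from a task-specific noise distribution, shape (n, d)
    """
    def cos(a, b):
        # Cosine similarity between one vector and a batch of vectors.
        return (b @ a) / (np.linalg.norm(a) * np.linalg.norm(b, axis=-1))

    pos = np.exp(cos(goal_emb, measured_emb[None]) / temperature)  # positive pair
    neg = np.exp(cos(goal_emb, noise_embs) / temperature)          # negatives
    # Softmax-style ratio: probability that the measurement, rather than
    # noise, matches the goal; usable as a reward in (0, 1].
    return float(pos[0] / (pos[0] + neg.sum()))
```

A measurement that matches the goal embedding yields a reward close to 1, while a random measurement scores lower; the negative log of this ratio would be the corresponding contrastive loss.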

Conference Video


Supplementary Material

Below, we show supplementary videos of our pick-and-place system. As our approach places objects according to a demonstrated goal state, it does not require an object model. We trained two models: first, a model using RGBD images, trained on around 3500 pick-and-place actions with screws; second, a general model using depth images, trained on around 25000 pick-and-place actions with wooden objects. The latter is used for all further experiments not involving screws.

Unknown Objects

As no object model is needed, our system is able to pick-and-place even unknown objects with high precision.

Video 1 Pick-and-place of various unknown objects.

Insertion Task

To further demonstrate the precision of our system, we evaluate insertion tasks with small tolerances. The robot achieves success rates of up to 90%, depending on the object type, despite grasping out of clutter.

Video 2 Playing the peg game.
Video 3 Inserting a screw into a custom-designed holder.

Multiple Actions

Our robot is able to infer multiple pick-and-place actions from a single goal state.

Video 4 Placing the logo of our Alma Mater, the Karlsruhe Institute of Technology (KIT).
Video 5 Precisely isolating multiple screws out of dense clutter.
Video 6 Demonstrating the flexibility of our one-shot imitation learning approach. For some examples, we demonstrate multiple goal states as an instruction list.

Visual State Space

Here we show samples of successful pick-and-place actions. Our approach uses four images, each cropped to a window around the robot's tool center point (TCP), as its visual state space.

Example image windows: Grasp Before · Place Before · Place After · Place Goal
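The four windows shown above can be assembled into a single stacked state tensor for a fully-convolutional network. The following is a minimal sketch under the assumption of single-channel depth images and a known TCP pixel position; the helper names (`crop_window`, `visual_state`) and the window size are illustrative, not taken from the paper.

```python
import numpy as np

def crop_window(image, tcp_xy, size=32):
    """Crop a square window of the given size, centered on the TCP pixel position."""
    x, y = tcp_xy
    h = size // 2
    return image[y - h:y + h, x - h:x + h]

def visual_state(grasp_before, place_before, place_after, place_goal, tcp_xy, size=32):
    """Stack the four TCP-centered depth windows into one (4, size, size) tensor."""
    return np.stack([crop_window(img, tcp_xy, size)
                     for img in (grasp_before, place_before, place_after, place_goal)])
```

Stacking the windows along a channel axis lets a single convolutional encoder process all four views jointly.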