Robot learns to cook by watching YouTube

Eric Hopton for redOrbit.com – Your Universe Online
Let’s jump forward in time a little. At some imagined point in the not-too-distant future, advances in robotics and artificial intelligence have seen massive growth. In almost every home, humans rely on robots to carry out everyday tasks. You have your favorite robot. Of course you do. It does everything it is programmed to do – no complaints. It doesn’t even ask for pocket money. You have tweaked it as far as you can to adapt to your peculiar ways. But then you need it to do a new task that’s not part of its database – you need it to bake a muffin.
Can it do that? No, it needs to learn. You could send it back for reprogramming, but you can’t imagine life without it, even for a few days. You could buy an expensive downloadable fix from Amazoogle, but you are too mean and you still hate their tax arrangements. So you take the easy option. You plug it into YouTube and say, “Hey, buddy, watch a few vids on making muffins and knock a few out as soon as you are ready.”
An hour later, you have your muffins. If that sounds far-fetched, the findings of a neat piece of recently published research might reassure you that one day you will get your AI muffin fix.
The study was carried out by scientists from the University of Maryland and NICTA in Australia. Their paper will be presented later this month at the 29th annual conference of the Association for the Advancement of Artificial Intelligence. The research involved teaching a “self-learning robot” how to improve its knowledge about fine-grained manipulation actions – like cooking skills – just by allowing it to watch demonstration videos.
The new robot-training system is based on recent advances in our understanding of “deep neural networks” in computer vision. A key element of the experiments was to teach the robot where and how to hold an object just from its appearance. A big obstacle was the fact that videos only present 2D information. While a human would naturally translate that into a 3D conceptualization by recognizing that, for example, an elongated rounded shape is an egg, robots don’t have that ability to “infer”.
The robot also had to learn to select which element of the video image it was supposed to model. It then had to learn how to associate specific actions, like hand movements, with particular objects. Breaking an egg is a good example.
The researchers developed a deep-learning system that combines object recognition with grasping-type recognition. The system is built from recognition modules based on a “convolutional neural network” (CNN). This helps the robot predict the best action to take by tapping into a large internal database.
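To make the idea concrete, here is a minimal sketch of how two recognition modules might feed an internal database that maps their combined output to an action command. The stubbed classifiers, the class names, and the toy database are illustrative assumptions, not the authors’ actual implementation (their modules are CNNs operating on video frames).

```python
# Hedged sketch: combining an object-recognition module and a
# grasp-type-recognition module, then consulting a knowledge base
# of (object, grasp) -> action to pick a manipulation command.
# All names and entries below are hypothetical.

from typing import Dict, Tuple

# Toy internal database: (object, grasp type) -> action command.
ACTION_DB: Dict[Tuple[str, str], str] = {
    ("egg", "power-small"): "crack(egg, bowl)",
    ("knife", "precision"): "cut(knife, vegetable)",
    ("bowl", "power-large"): "move(bowl, counter)",
}

def recognize_object(frame: dict) -> str:
    """Stand-in for the object-recognition CNN module."""
    return frame["object"]  # a real module would classify pixels

def recognize_grasp(frame: dict) -> str:
    """Stand-in for the grasp-type-recognition CNN module."""
    return frame["grasp"]

def predict_action(frame: dict) -> str:
    """Combine both module outputs and look up the best action."""
    key = (recognize_object(frame), recognize_grasp(frame))
    return ACTION_DB.get(key, "observe()")  # fall back if unknown

frame = {"object": "egg", "grasp": "power-small"}
print(predict_action(frame))  # crack(egg, bowl)
```

The key design point is that neither module alone determines the action: the same object grasped differently can imply a different manipulation, which is why the lookup keys on the pair.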
To train their robot model, the researchers used data from 88 YouTube videos of people cooking. The team then generated commands for the robot to execute.
“We believe this preliminary integrated system raises hope towards a fully intelligent robot for manipulation tasks that can automatically enrich its own knowledge resource by ‘watching’ recordings from the World Wide Web,” the researchers concluded.
The full paper, “Robot Learning Manipulation Action Plans by ‘Watching’ Unconstrained Videos” is available here (PDF).
That muffin-baking robot may not be so far away then. But, back to the future for a moment: what if those robots become foodie obsessives and you just can’t drag your favorite robot buddy away from watching endless repeats of Barefoot Contessa? Life is never easy, is it?