From WALL-E to Reality - Meet RT-2, The Latest Google Robot

Remember WALL-E? Well, guess what? Google just announced a step in that direction: an AI model meant to help general-purpose robots navigate human environments like those cool fictional robots WALL-E or C-3PO. But here's the kicker: it's trained a lot like an AI chatbot. Confusing, right? Let me break it down for you.

Let's meet RT-2 (Robotics Transformer 2), a vision-language-action (VLA) model that's here to change the game in robotics. At its core is a powerful vision-language model that's been trained on tons of text and images from the Internet, then taught to output robot actions.

That means it can recognize patterns and carry that knowledge over to tasks it was never explicitly trained on. It's like having a super-smart assistant who is always ready to help.

One of the coolest things about RT-2 is how it can identify and toss out trash without ever being explicitly trained to do so. It has an idea of what trash is and how to get rid of it, even tricky stuff like food packaging or banana peels that only becomes trash after you've eaten what was inside. That's some serious brainpower!

And you know why RT-2 is so good at what it does? It builds on the hard work of its older sibling, RT-1. RT-2 learned from the robot demonstration data collected for RT-1, gathered by 13 robots over 17 months in an "office kitchen environment." That real-world experience is what makes it so skilled at handling everyday situations.

To control the robot, RT-2 uses these nifty things called tokens. In a chatbot, tokens are tiny word fragments. RT-2 writes out the robot's actions as tokens too, so a movement command comes out as a short string of numbers, the same way a chatbot produces a sentence. It's like a language tokenizer for robots: the model "says" what to do, and the robot does it.
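To make the token idea a bit more concrete, here's a minimal sketch of how a continuous robot action could be turned into a string of number tokens and back again. This is an illustration, not Google's code: the function names, the value ranges, and the three-dimensional example action are all made up for this post, and the 256 bins per dimension is an assumption based on how the RT-2 researchers describe discretizing actions.

```python
# Illustrative sketch only: names, ranges, and the example action are hypothetical.
NUM_BINS = 256  # assumed number of bins per action dimension


def encode_action(values, low, high, num_bins=NUM_BINS):
    """Map each continuous value to an integer bin and join the bins as a token string."""
    tokens = []
    for value, lo, hi in zip(values, low, high):
        clipped = min(max(value, lo), hi)          # keep the value inside its range
        fraction = (clipped - lo) / (hi - lo)      # rescale to 0.0 .. 1.0
        bin_index = min(int(fraction * num_bins), num_bins - 1)
        tokens.append(str(bin_index))
    return " ".join(tokens)


def decode_action(token_string, low, high, num_bins=NUM_BINS):
    """Turn a token string back into approximate values (the center of each bin)."""
    bins = [int(token) for token in token_string.split()]
    return [
        lo + (b + 0.5) / num_bins * (hi - lo)
        for b, lo, hi in zip(bins, low, high)
    ]


if __name__ == "__main__":
    # Hypothetical 3-dimensional action, each dimension ranging from -1 to 1.
    low, high = [-1.0] * 3, [1.0] * 3
    text = encode_action([0.0, -1.0, 1.0], low, high)
    print(text)                          # the "sentence" the model would produce
    print(decode_action(text, low, high))  # what the robot controller would recover
```

The round trip loses a little precision, since every value snaps to the center of its bin, but that's the trade-off that lets a language model treat robot commands as just another kind of text.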

But wait, there's more! RT-2's got some top-notch cognitive skills. It can reason through multiple stages of decision-making, like choosing which object would work best as an improvised hammer (a rock) or picking the right drink for someone who's tired (an energy drink). That's one smart bot, I tell you!

Examples of generalized robotic skills RT-2 can perform that were not in the robotics data. Instead, it learned about them from scrapes of the web.

So, after more than 6,000 robot trials, RT-2 has passed with flying colors. When it comes to tasks it was trained for ("seen tasks"), it's just as good as its older sibling, RT-1. But the real magic happens in "unseen scenarios," where RT-2 succeeds 62% of the time, nearly double RT-1's 32%!

Even though RT-2 is pretty amazing, Google admits it's not perfect. Its web knowledge helps it apply the skills it already has in new ways, but it can't suddenly develop new physical abilities out of thin air: it can only perform motions it has practiced in the robot data. But hey, don't worry! The future holds exciting possibilities for even smarter and more capable robots. The journey has just begun!

Sources: deepmind.com / robotics-transformer2