Google has unveiled its latest robotics model, which facilitates more human-like learning by “transferring learned concepts to new situations”. Named RT-2, it is the first vision-language-action (VLA) model for robot control, and it enables robots to execute real-world activities such as trash disposal.
According to Google, RT-2 aims to give robots the capability of “transferring information to actions,” which lets them adapt quickly to new situations and environments. In theory, this could unlock a number of use cases that were previously out of reach. Simply put, Google says that “RT-2 can speak robot.”
Google also notes that making robots more helpful has been a “herculean task,” and this is “because a robot capable of doing general tasks in the world needs to be able to handle complex, abstract tasks in highly variable environments — especially ones it’s never seen before.” Google’s recent work, in the form of the new RT-2 model, aims to allow just that. “Recent work has improved robots’ ability to reason, even enabling them to use chain-of-thought prompting, a way to dissect multi-step problems,” Google said. Google explained how the new model works using the example of throwing out trash. Previously, training a robot to throw out trash required a multi-step process: teaching it to identify trash, to pick it up, and to know where to throw it.
However, RT-2 eliminates the need for this by “transferring knowledge from a large corpus of web data.” The robot can identify trash without explicit training, and it can even figure out how to throw it away despite never having been trained on that action, because RT-2 is able to understand the nature of trash from its vision-language data. “And think about the abstract nature of trash — what was a bag of chips or a banana peel becomes trash after you eat them. RT-2 is able to make sense of that from its vision-language training data and do the job,” Google added.
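Google’s claim that “RT-2 can speak robot” can be made more concrete with a small sketch. VLA models of this kind represent robot actions as strings of text tokens that are decoded back into motor commands. The snippet below is a hypothetical illustration only: the token format, value ranges, and function names are assumptions for clarity, not Google’s actual RT-2 interface.

```python
def decode_action_tokens(token_string):
    """Decode a string of discretized action tokens into a command dict.

    Assumed (illustrative) format: eight integer bins in [0, 255]:
    "terminate dx dy dz droll dpitch dyaw gripper". Each bin is mapped
    back to a continuous value: motion deltas in [-1.0, 1.0], gripper
    openness in [0.0, 1.0].
    """
    fields = [int(tok) for tok in token_string.split()]
    if len(fields) != 8:
        raise ValueError("expected 8 action tokens")

    def unbin(v, lo, hi):
        # Map an integer bin back onto the continuous range [lo, hi].
        return lo + (hi - lo) * v / 255.0

    terminate, dx, dy, dz, droll, dpitch, dyaw, grip = fields
    return {
        "terminate": bool(terminate),
        "translation": tuple(unbin(v, -1.0, 1.0) for v in (dx, dy, dz)),
        "rotation": tuple(unbin(v, -1.0, 1.0) for v in (droll, dpitch, dyaw)),
        "gripper": unbin(grip, 0.0, 1.0),
    }

# A model asked to "pick up the banana peel and drop it in the bin"
# might emit a token string like this at one control step:
cmd = decode_action_tokens("0 128 64 200 128 128 128 255")
```

The point of this scheme is that actions become just another kind of text, so the same vision-language model that learned what “trash” means from web data can also emit the low-level commands to act on it.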
Moreover, unlike chatbots such as ChatGPT, which are powered by large language models like Google’s PaLM 2 or OpenAI’s GPT-4, robots need a deeper understanding of the context in which they are operating: how to pick up objects, how to differentiate between similar-looking objects, and how objects fit into a given situation.