Google researchers have developed a new model called UniPi that can learn to perform different tasks in different environments. As multimodal systems gain traction in the wake of OpenAI’s GPT-4, UniPi addresses two long-standing challenges for general-purpose agents: environmental diversity and reward specification.
In a blog post, Google describes the model’s ability to perform tasks from text instructions and video demonstrations. Given text describing a task, UniPi first uses a text-conditioned video generation model to synthesize a video showing the steps an agent should take to complete it. A second model then derives, from consecutive frames of that video, the actions needed to reproduce those steps. Finally, the agent executes those actions in the real world or in simulation.
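The two-stage pipeline described above can be sketched as follows. This is a minimal illustration under assumed interfaces, not Google’s implementation: the class and function names (`UniPiSketch`, `generate_video`, `infer_action`) are hypothetical stand-ins for the video generator and the action-inference model.

```python
class UniPiSketch:
    """Illustrative sketch of a two-stage text-to-action policy:
    plan as a generated video, then read actions off consecutive frames.
    Component names are assumptions, not from the actual UniPi codebase."""

    def __init__(self, generate_video, infer_action):
        self.generate_video = generate_video  # (task_text, first_frame) -> list of frames
        self.infer_action = infer_action      # (frame_t, frame_t_plus_1) -> action

    def act(self, task_text, first_frame):
        # Stage 1: synthesize a video "plan" conditioned on the task text.
        frames = self.generate_video(task_text, first_frame)
        # Stage 2: derive one action per pair of consecutive planned frames.
        return [self.infer_action(a, b) for a, b in zip(frames, frames[1:])]


# Toy stand-ins so the sketch runs: frames are ints, an "action" is the delta.
policy = UniPiSketch(
    generate_video=lambda text, f0: [f0, f0 + 1, f0 + 2, f0 + 4],
    infer_action=lambda a, b: b - a,
)
actions = policy.act("pick up the red block", first_frame=0)
```

With these toy stand-ins, `actions` comes out as `[1, 1, 2]`, one action per frame transition in the generated plan.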
According to the researchers, UniPi generalizes to novel combinations of familiar language prompts, which could help agents learn to perform a variety of tasks in real-world environments.
The researchers pre-trained UniPi on non-robot data and evaluated the quality of its generated videos using the Fréchet Inception Distance (FID) and Fréchet Video Distance (FVD) metrics, along with CLIPScore for text–video alignment. The pre-trained UniPi achieved significantly better FID and FVD scores and a higher CLIPScore than a version trained without pre-training. These results suggest that pre-training on non-robot data improves robot plan generation.
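For readers unfamiliar with the FID/FVD family of metrics: both compute the Fréchet distance between two Gaussians fitted to feature activations of real and generated samples (image features for FID, video features for FVD). A lower score means the generated distribution is closer to the real one. The sketch below computes that distance from feature statistics using NumPy and SciPy; the random features are a toy substitute for real network activations.

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between two Gaussians N(mu1, sigma1), N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * sqrt(sigma1 @ sigma2))."""
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):  # discard tiny imaginary parts from sqrtm
        covmean = covmean.real
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)

# Toy example: two sets of 8-dimensional "activations" with shifted means.
rng = np.random.default_rng(0)
real_feats = rng.normal(0.0, 1.0, size=(1000, 8))
fake_feats = rng.normal(0.5, 1.0, size=(1000, 8))

fid = frechet_distance(real_feats.mean(axis=0), np.cov(real_feats, rowvar=False),
                       fake_feats.mean(axis=0), np.cov(fake_feats, rowvar=False))
```

Identical feature sets yield a distance of (numerically) zero, and the score grows as the two distributions drift apart, which is why lower FID/FVD indicates better video generation.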
UniPi is a notable step toward models that can learn to perform many tasks across many environments. It could significantly influence the development of robotics and artificial intelligence, helping to create agents that learn and act in diverse settings.