DeepMind's newest AI learns new skills by watching humans


By Matthew Griffin Intelligence and the Senses 13th December 2023

WHY THIS MATTERS IN BRIEF

We think that we need data to train AI, but data comes in many forms, and new AIs are being trained in many new ways that advance their capabilities and skills.


Teaching algorithms to mimic humans typically requires hundreds or thousands of examples. But a new Artificial Intelligence (AI) from Google DeepMind can pick up new skills from human demonstrators on the fly just by watching them, similar to what we saw a while ago with the Baxter robot from MIT, which in that case used telepathy to learn new things from humans.

One of humanity’s greatest tricks is our ability to acquire knowledge rapidly and efficiently from each other. This kind of social learning, often referred to as cultural transmission, is what allows us to show a colleague how to use a new tool or teach our children nursery rhymes.


It’s no surprise that researchers have tried to replicate the process in machines. Imitation learning, in which AI watches a human complete a task and then tries to mimic their behaviour, has long been a popular approach for training robots. But even today’s most advanced deep learning algorithms typically need to see many examples before they can successfully copy their trainers.

When humans learn through imitation, they can often pick up new tasks after just a handful of demonstrations. Now, Google DeepMind researchers have taken a step toward rapid social learning in AI with agents that learn to navigate a virtual world from humans in real time.

“Our agents succeed at real-time imitation of a human in novel contexts without using any pre-collected human data,” the researchers write in a paper in Nature Communications. “We identify a surprisingly simple set of ingredients sufficient for generating cultural transmission.”

The researchers trained their agents in a specially designed simulator called GoalCycle3D. The simulator uses an algorithm to generate an almost endless number of different environments based on rules about how the simulation should operate and what aspects of it should vary.

In each environment, small blob-like AI agents must navigate uneven terrain and various obstacles to pass through a series of coloured spheres in a specific order. The bumpiness of the terrain, the density of obstacles, and the configuration of the spheres vary between environments.
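To make the idea of generating environments from rules more concrete, here is a minimal Python sketch. It is not GoalCycle3D's actual generator: the `EnvConfig` fields, the value ranges, and the colour list are all assumptions made for illustration, chosen only to mirror the three things the article says vary between environments.

```python
import random
from dataclasses import dataclass

# Hypothetical colour palette; the real simulator's colours are not specified here.
COLOURS = ("red", "green", "blue", "yellow")


@dataclass(frozen=True)
class EnvConfig:
    """Illustrative description of one generated environment."""
    bumpiness: float         # assumed scale: 0.0 = flat, 1.0 = very rough
    obstacle_density: float  # assumed fraction of the map covered by obstacles
    sphere_order: tuple      # colours in the order they must be visited


def sample_env(rng: random.Random, n_spheres: int = 3) -> EnvConfig:
    """Draw one environment by sampling each varying property at random."""
    return EnvConfig(
        bumpiness=rng.uniform(0.0, 1.0),
        obstacle_density=rng.uniform(0.0, 0.3),
        # sample() picks distinct colours, so no sphere colour repeats
        sphere_order=tuple(rng.sample(COLOURS, n_spheres)),
    )


if __name__ == "__main__":
    rng = random.Random(42)  # a fixed seed makes the generated worlds reproducible
    for _ in range(3):
        print(sample_env(rng))
```

Because every property is drawn from a range rather than fixed, a small set of rules like this can produce a practically unlimited supply of distinct courses.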

The agents are trained to navigate using reinforcement learning. They earn a reward for passing through the spheres in the correct order and use this signal to improve their performance over many trials. In addition, the environments feature an expert agent – which is either hard-coded or controlled by a human – that already knows the correct route through the course.
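The reward signal described above can be sketched in a few lines of Python. This is an illustration only: the `OrderTracker` class is hypothetical, and the choices of +1 for a correct sphere, 0 for a wrong one, and a route that repeats as a cycle are assumptions rather than details taken from the paper.

```python
class OrderTracker:
    """Hands out a reward each time the agent enters the next expected sphere."""

    def __init__(self, target_order):
        self.target_order = tuple(target_order)
        self.next_index = 0  # position in the route the agent must hit next

    def on_sphere_entered(self, colour: str) -> float:
        if colour == self.target_order[self.next_index]:
            # Correct sphere: advance, wrapping round so the route repeats.
            self.next_index = (self.next_index + 1) % len(self.target_order)
            return 1.0
        # Wrong sphere: no reward and no progress (an assumed penalty scheme).
        return 0.0


if __name__ == "__main__":
    tracker = OrderTracker(("red", "green", "blue"))
    visits = ["red", "blue", "green", "blue", "red"]
    print([tracker.on_sphere_entered(c) for c in visits])
```

An agent that does not know the order has to discover it by trial and error, which is exactly why following an expert who already knows the route pays off so quickly.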

Over many training runs, the AI agents learn not only the fundamentals of how the environments operate, but also that the quickest way to solve each problem is to imitate the expert. To ensure the agents were learning to imitate rather than just memorizing the courses, the team trained them on one set of environments and then tested them on another. Crucially, after training, the team showed that their agents could imitate an expert and continue to follow the route even without the expert.

This required a few tweaks to standard reinforcement learning approaches.

The researchers made the algorithm focus on the expert by having it predict the location of the other agent. They also gave it a memory module. During training, the expert would drop in and out of environments, forcing the agent to memorize its actions for when it was no longer present. The AI also trained on a broad set of environments, which ensured it saw a wide range of possible tasks.
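Two of these tweaks, the expert dropping in and out and the extra objective of predicting where the expert is, can be sketched as follows. This is a simplified illustration and not the researchers' implementation: the function names, the squared-error form of the prediction loss, the `aux_weight` value, and the decision to skip the extra loss while the expert is absent are all assumptions.

```python
import random


def expert_presence_schedule(n_steps: int, p_toggle: float, rng: random.Random):
    """Return one boolean per step saying whether the expert is in the world.

    At each step the expert switches between present and absent with
    probability p_toggle, so the agent cannot rely on it always being there.
    """
    present = True
    schedule = []
    for _ in range(n_steps):
        if rng.random() < p_toggle:
            present = not present
        schedule.append(present)
    return schedule


def position_prediction_loss(predicted_xy, expert_xy) -> float:
    """Squared distance between the predicted and true expert positions."""
    dx = predicted_xy[0] - expert_xy[0]
    dy = predicted_xy[1] - expert_xy[1]
    return dx * dx + dy * dy


def total_loss(rl_loss, predicted_xy, expert_xy, expert_present, aux_weight=0.1):
    """Combine the usual reinforcement learning loss with the auxiliary loss."""
    if not expert_present:
        # Assumption: with no expert to observe, only the RL loss applies.
        return rl_loss
    return rl_loss + aux_weight * position_prediction_loss(predicted_xy, expert_xy)


if __name__ == "__main__":
    rng = random.Random(0)
    print(expert_presence_schedule(10, p_toggle=0.2, rng=rng))
    print(total_loss(1.0, (1, 2), (4, 6), expert_present=True))
```

The prediction term gives the agent a reason to pay attention to the expert, while the dropout schedule means that attention is only useful if the agent also remembers what it saw.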

It might be difficult to translate the approach to more practical domains, though. A key limitation is that when the researchers tested whether the AI could learn from human demonstrations, the expert agent was controlled by one person during all training runs. That makes it hard to know whether the agents could learn from a variety of people.

More pressingly, the ability to randomly alter the training environment would be difficult to recreate in the real world. And the underlying task was simple, requiring no fine motor control and occurring in highly controlled virtual environments.

Still, progress on social learning in AI is welcome. If we’re to live in a world with intelligent machines, finding efficient and intuitive ways to share our experience and expertise with them will be crucial.
