Based on materials from theverge.com
When you play a video game, what motivates you to keep going?
Perhaps this is too broad a question to answer definitively, but if you try to summarize why you take on a new quest, open a new level or cave, or play another round, the simplest explanation is this: curiosity. You just want to see what comes next. And, as it turns out, curiosity is also a very effective motivator when teaching AI to play video games.
Recently, the artificial intelligence lab OpenAI published a study explaining how an AI-controlled agent with a sense of curiosity outperformed its predecessors in the 1984 Atari classic Montezuma's Revenge. Success in Montezuma's Revenge is not as headline-grabbing as beating the world's best at Go or Dota 2, but it is still significant progress. When DeepMind, the Google-owned AI company, published its landmark 2015 paper showing how AI could master Atari games using deep learning, Montezuma's Revenge was the only game where its agent failed to score a single point.
The game's difficulty stems from a mismatch between how it is played and how AI learns, one that reveals a blind spot in how artificial intelligence sees the world.
Training AI agents to play video games usually relies on a technique called reinforcement learning. In this paradigm, agents are dropped into a virtual world and rewarded for certain outcomes (such as increasing their score) and penalized for others (such as losing a life). The agent starts by making decisions at random, then learns through trial and error. Reinforcement learning is often seen as a path to building smarter robots.
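As a rough illustration (not OpenAI's actual setup), the trial-and-error loop described above can be sketched with tabular Q-learning on a hypothetical toy "corridor" world, where the agent is rewarded only for reaching the goal state:

```python
import random

# Toy corridor: states 0..4; reaching state 4 gives reward +1 and ends the episode.
# Actions: 0 = move left, 1 = move right.
N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-table: q[state][action]

def step(state, action):
    """Apply an action; return (next_state, reward, done)."""
    nxt = max(0, min(GOAL, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

random.seed(0)
for _ in range(200):  # episodes of trial and error
    state = 0
    while True:
        # Epsilon-greedy: mostly exploit the current estimate, sometimes explore.
        if random.random() < EPSILON:
            action = random.randrange(2)
        else:
            action = 0 if q[state][0] > q[state][1] else 1
        nxt, reward, done = step(state, action)
        # Q-learning update: nudge the estimate toward reward + discounted future value.
        q[state][action] += ALPHA * (reward + GAMMA * max(q[nxt]) - q[state][action])
        state = nxt
        if done:
            break

# After training, the learned policy prefers "right" in every non-goal state.
policy = [0 if q[s][0] > q[s][1] else 1 for s in range(GOAL)]
print(policy)  # [1, 1, 1, 1]
```

The agent is told nothing about the corridor's layout; the reward signal alone shapes its behavior, which is exactly why a game that withholds rewards is so hard for this paradigm.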
The problem with Montezuma's Revenge is that it offers few of the rewards AI agents usually rely on. It is a platformer in which players explore an underground pyramid, dodging traps and enemies while collecting keys that open doors and unlock special items. If you train an AI agent to win, you would reward it for staying alive and collecting keys, but how do you teach it to save certain keys for specific items, and then use those items to overcome traps and complete the level? The answer: curiosity.
In the OpenAI study, the agent was rewarded not only for clearing spike pits but also for discovering new parts of the pyramid. That combination produced better-than-human performance: the bot averaged more than 10,000 points over nine runs, while human players average about 4,000. In one run, the agent even managed to complete the first level of the game.
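One simple way to implement a reward for "discovering new parts of the pyramid" is a count-based novelty bonus — a hypothetical sketch for illustration, not the prediction-based method OpenAI actually used:

```python
from collections import defaultdict

# Hypothetical count-based curiosity: the less often a state (e.g. a room of
# the pyramid) has been visited, the larger the intrinsic reward for being in it.
visit_counts = defaultdict(int)

def curiosity_bonus(state, scale=1.0):
    visit_counts[state] += 1
    return scale / visit_counts[state] ** 0.5  # decays as the state grows familiar

def total_reward(extrinsic, state):
    # The agent optimizes game score plus the exploration bonus.
    return extrinsic + curiosity_bonus(state)

# A new room yields a large bonus; revisiting it yields progressively less.
print(total_reward(0.0, "room_1"))  # 1.0 on the first visit
print(total_reward(0.0, "room_1"))  # smaller on the second visit
```

The extrinsic score and the intrinsic bonus are simply added, so the agent chases new rooms when the game itself is silent but still cares about points when they appear.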
"There is still a lot of work to be done," OpenAI's Harrison Edwards told The Verge. "But we now have a system that can explore many rooms, collect many rewards, and occasionally, if not reliably, beat the first level." He adds that the game's subsequent levels are all similar to the first, so completing the whole game is likely just a matter of time.
Overcoming the 'Noisy TV Problem'
The OpenAI team is far from the first to attempt this; AI researchers have been developing the concept of "curiosity" as a motivator for decades, and have targeted Montezuma's Revenge before. But until now, no one had achieved this kind of success without teaching the AI from human demonstrations.
So although the theory is generally well developed, specific solutions still run into problems in practice. For example, prediction-based curiosity is only useful for learning certain kinds of games. It works in games like Mario, where there are large levels to explore and bosses and enemies the agent has never encountered before. But in simpler games like Pong, the AI ends up preferring long rallies to actually beating its opponent, probably because winning the game is more predictable than the trajectory of the ball.
Another issue is known as the "noisy TV problem". It arises when an agent programmed to seek out new experiences becomes fixated on random patterns, like a TV tuned to static. This happens because such agents' sense of what is "interesting" and "new" is based on predicting the future: before taking an action, they predict how the game will change afterward. If the prediction is correct, the agent treats that part of the game as already familiar. The gap between prediction and reality is called the prediction error.
And since static noise is never predictable, any AI that encounters a "noisy TV" (or any other unpredictable stimulus) is effectively hypnotized by it. OpenAI's researchers compare the problem to gamblers hooked on slot machines, unable to tear themselves away precisely because they cannot predict the next outcome.
This GIF shows an AI-controlled character exploring a maze, distracted by randomly flashing images. GIF: OpenAI
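The prediction-error mechanism, and the way random noise breaks it, can be sketched with a toy forward model: the model predicts the next observation, is fitted online, and the squared error serves as the intrinsic reward. This is purely illustrative; real agents use neural networks over game frames.

```python
import random

# Toy forward model: predicted_next = weight * current.
weight = 0.0
LR = 0.1

def intrinsic_reward(current, actual_next):
    global weight
    predicted = weight * current
    error = actual_next - predicted
    weight += LR * error * current       # fit the model online
    return error ** 2                    # "surprise" = squared prediction error

# A predictable world (next = 2 * current) quickly stops being interesting:
rewards = [intrinsic_reward(1.0, 2.0) for _ in range(50)]
print(rewards[0], rewards[-1])  # surprise starts at 4.0 and decays toward zero

# ...whereas an unpredictable "noisy TV" stays surprising forever, trapping the agent.
random.seed(0)
weight = 0.0
noise = [intrinsic_reward(1.0, random.gauss(0, 1)) for _ in range(50)]
print(sum(noise[-10:]) > sum(rewards[-10:]))  # True: noise keeps the error high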
The new OpenAI study sidesteps this problem by changing how the AI predicts the future. The full method, dubbed Random Network Distillation, is fairly involved, but Edwards and his colleague Yuri Burda liken it to hiding a secret in every screen of the game. The secret is random and meaningless (something like "what color is the upper-left corner of the screen?"), but it gives the agent a reason to keep exploring the level without falling into the noisy TV trap.
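The "hidden secret" intuition can be sketched in miniature — a hypothetical toy version with tiny linear "networks" over one-hot observations, whereas the real method uses deep networks over game frames. A fixed, randomly initialized target network encodes the secret about each observation, and a trainable predictor earns the agent an intrinsic reward equal to how badly it still guesses that secret:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, FEAT = 4, 8

# Fixed, randomly initialized target network: the "secret" about each observation.
W_target = rng.normal(size=(DIM, FEAT))
# Trainable predictor network, learning to guess the target's output.
W_pred = np.zeros((DIM, FEAT))
LR = 0.05

def rnd_bonus(obs):
    """Intrinsic reward: how badly the predictor matches the frozen random target."""
    global W_pred
    target = obs @ W_target            # frozen: never trained
    pred = obs @ W_pred
    err = pred - target
    W_pred -= LR * np.outer(obs, err)  # gradient step on the squared error
    return float((err ** 2).sum())

room_a = np.array([1.0, 0.0, 0.0, 0.0])
room_b = np.array([0.0, 1.0, 0.0, 0.0])

# Repeated visits to room A drive its bonus down; an unseen room B stays surprising.
for _ in range(100):
    rnd_bonus(room_a)
print(rnd_bonus(room_a) < rnd_bonus(room_b))  # True
```

The key design point: the target is a deterministic function of the observation, so the predictor's error measures only how novel the observation is — an unpredictable *transition*, like TV static, no longer produces an irreducible, hypnotic reward.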
Just as important, this added motivation is cheap to compute, which matters a great deal. Reinforcement learning requires a massive amount of experience to train an agent (OpenAI's bot played the equivalent of roughly three years of Montezuma's Revenge in real time), so every step needs to be as fast as possible.
Arthur Juliani, a Unity programmer and machine learning expert, says this is what makes OpenAI's work so impressive. "The method they used is really simple, and yet surprisingly effective," Juliani told The Verge. "It is much simpler than other exploration methods that have been tried on these games in the past without producing such impressive results." Juliani notes that given how similar Montezuma's Revenge's levels are, OpenAI's result is essentially equivalent to beating the game, but adds that "the fact that they weren't able to consistently beat the first level suggests there is still work to be done." He is also unsure whether the method will transfer to 3D games, where the visuals are subtler and the world is designed for a first-person perspective. "In scenarios where exploration is required but the differences between parts of the environment are more subtle, the method may not work as well," says Juliani.
Real-world robots like Boston Dynamics' SpotMini can also benefit from artificial curiosity. Photo by Matt Winkelmeyer (Getty Images WIRED25)
What curiosity is for
Why build curious artificial intelligence at all? What useful work can it do, beyond offering amusing parallels to the human tendency to be distracted by random events?
The main reason is that curiosity allows computers to learn on their own.
Most machine learning in use today falls into two camps. In the first, machines learn by churning through vast amounts of data, extracting patterns they can then apply to similar problems. In the second, they are dropped into an environment and rewarded for achieving particular outcomes via reinforcement learning.
Both approaches work well for specific tasks, but both demand a great deal of human labor, either to supply the training data or to design the reward functions of the virtual environment. Giving artificial intelligence an internal drive to explore for its own sake would make some of that labor unnecessary, and people would spend less time steering AI along the right path.
OpenAI's Edwards and Burda say this is why curiosity-based learning systems are much better suited to programs that must interact with the real world. In reality, as in Montezuma's Revenge, instant gratification is rare: we have to work, study, and explore for a long time before we get anything in return. Curiosity keeps us going. Perhaps the same will be true of computers.