Artificial Intelligence playing Hide and Seek

An artificial intelligence project recently demonstrated machine learning agents mastering the game of hide and seek, with both the hiders and the seekers learning from prior games. The hiders were rewarded for evading the seekers and the seekers were rewarded for finding the hiders. This learning occurred in a simulated space with movable and immovable obstructions. There were two hiders and two seekers. Within each pair, the two somehow learned to cooperate and collaborate, so that their individual reward came in part from the help of the other. The simulated games were run hundreds of millions of times. Over the course of the simulations, there were eras where either the hiders or the seekers found a strategy that outcompeted the other, but over time this changed when the previously disadvantaged team discovered a new strategy. Quanta Magazine is one of many sources that describe the experiment and the observations.

Many elements of this simulation seem extravagant to the point where I am skeptical it is really doing what is claimed. The simulated environments are complex 3D models. The agents have vision that can scan both left and right and up and down, where the latter is useful only much later on, when they learn how to climb ramps. Each agent must have very deep learning capability. All of this was run hundreds of millions of times, so the experiment used a lot of computer resources. The part that most raised my skepticism is that all of this was animated. For the sake of this discussion, I will suspend my skepticism and grant that the experiment ran exactly as described. The agents learned elaborate strategies, whether to hide or to seek.

The broader interpretation of this experiment is that it may explain how natural intelligence works. Theoretical models of natural learning involve a process of adjustment in response to rewards and punishments. The artificial learning follows this model. The simulation demonstrates something that is recognizably intelligent. As a result, the experiment supports the theory that this is how natural intelligence works.

The part that stands out to me is the fact that this learning required hundreds of millions of simulations. Natural intelligence would become extinct if it required that many simulations to learn.

During this springtime, I have been watching a family of foxes grow up. There is a wildlife corridor following the stream that cuts through my yard. The foxes would travel along that path and occasionally show up in my yard. They seemed comfortable in my back yard. While the young ones were still growing up, I started a major landscaping project that disrupted the area closer to the house but left the rest of the property alone. While this was happening, I was watching how the foxes responded.

This is similar to a hide and seek game. The foxes were hiding. I was seeking. But there were no penalties or rewards involved. I don’t think the foxes ever knew I was spotting them. Other than the satisfaction of seeing them, there was nothing tangible for me to gain. There certainly weren’t hundreds of millions of simulations.

The first phase of the project involved clearing the ground with a grader. This removed the vegetation and leveled out the dirt. That evening, the foxes returned and the adult checked out the changed territory. Intriguingly, the fox was very intent on the exposed dirt, at times pawing at it and even seemingly eating it. In my imagination, it was expressing rage at what had happened to what was once a nice spot for it. It was a day later that I realized what was happening. The fox had been burying previous kills in this area and now wanted to retrieve them to feed the young ones. I realized this when it managed to find something under the dirt, but in a location far from where it had originally been placed.

I am sure the sense of smell played a large role in this find, but my point is that I learned something about the fox. Before now, I had no idea that foxes buried their kills for later consumption. The buried kill may be retrieved a long time later. I learned something, and this is something that might be useful in the future if I want to change the incentives for foxes to visit my property. This learning did not come from trial and error. It did not come from successive rewards and penalties. I simply observed the fox eventually finding a mouthful of some long-dead creature within the dirt.

The project eventually added two new terraces to the backyard landscape. This happened in the middle of the young foxes’ training by the adult. Prior to the project, I watched the older fox gradually move up one terrace higher than where the little ones were playing. At first I assumed it would give her a better vantage point to watch them and to watch out for any dangers. Eventually, I figured out that she was encouraging the little ones to venture further from the area where they were more comfortable. Prior to the landscaping project, there were at least 5 different fox-sized terraces between the stream and my house. The older fox was gradually encouraging the little ones to venture higher up. I’m not sure, but I think she had also hidden some food in those levels and was hoping to get the little ones to find it, a fox’s version of an Easter egg hunt.

The finished landscape added two more levels, and these would be completely sterile of any buried food or even living prey. On each successive day, the foxes continued to work their way up the hill to eventually end up playing next to the house. Like in the hide-and-seek game, they immediately recognized the evasion opportunities afforded by the new retaining walls. The older fox was showing the little ones how to hide in the new structure.

I realize that this parental training is characteristic of foxes. She would have found some opportunity somewhere for evasion training. In some sense, I made it easy by bringing her training field closer to her home. My point here was how quickly they took advantage of the new terrain. I fully expected the entire construction commotion and major land disruption would discourage them from returning at all. Instead they seemed eager to stick around. Even during the day when there were multiple workers in the back yard, I spotted her at the edge of the property, watching, and probably thinking no one was spotting her. By then, the little ones were getting bigger and more independent and they too seemed to be comfortable with what was happening. They seemed to immediately understand what was happening, at least in terms of what it meant to them. They were not concerned.

These are city foxes. Their territory crosses a large number of properties, and many of the owners are doing backyard projects of great diversity, most of which are not meant to accommodate foxes. Even without the human element, the little foxes were born in winter and now it is the height of the spring growing season. The landscape is changing naturally with new growth. From the eyes of the young foxes, everything must be learned quickly.

Unlike the hide-and-seek game, the foxes are dealing with completely new elements thrown into their world, and with old elements taken out. Nature does not have the luxury of learning through successive trial and error with a consistent set of elements for each exercise. Dirt is replaced by concrete blocks. Short growth is replaced with tall growth. Optimal strategies for hunting and evasion completely change with each change, and these changes are occurring over the span of just a couple weeks.

It would be like the hide and seek game where each iteration had a different environment. One iteration would have a rolling ball, and the next would replace it with an immovable ramp. The strategy that was rewarded before would be penalized if reused later. Natural intelligence figures this out immediately. Natural intelligence learns about its surroundings but it is not learning from trial and error, or even by observation and deduction.

Another model for learning is the type we experience in school. This is learning from textbooks and lesson plans. I learned about Newtonian mechanics first by reading about it, then by applying it in a controlled laboratory that avoided the imperfections of natural settings. I imagine something similar may be available to natural intelligence. Natural intelligence learns from a textbook. The fox family has the added benefit of having a teacher to provide lesson plans and carefully controlled laboratories, such as the burying of dead prey with the intent to have the young ones discover it.

Even the parent fox appears to be following some kind of textbook. Although she knows how to do things herself, the project of training the little ones is new to her. For that training, she does not have the luxury of trial and error. She only has so many offspring to lose.

Given the urban environment, I imagine that the foxes are domesticated in the same sense as city rats and squirrels are domesticated. They are not pets, but they are comfortable living very close to humans even with the evidence that humans often do not welcome them. Even so, they appear to be very accurate in assessing whether a situation presents a danger or not.

I recall something I saw a long time ago. There was a squirrel on the ground being harassed by an immature bird of prey. If that bird had been older, it could easily have made a meal of that squirrel. Even though the bird was clearly trying to land on top of the squirrel, the squirrel would move just enough to get out of the way and then go about its business as if nothing was happening. Eventually, it appeared that both recognized they were playing a game rather than acting out a true predator-prey scenario. I really wonder if there came a point where each realized that this was just for fun, or for practice. Some learning mechanism led to that realization, and that mechanism did not require hundreds of millions of simulations with rewards and penalties.

I recall seeing documentaries about wildlife rescue centers that would sometimes raise the young of predators and prey together. The young animals would end up playing with each other, and by all impressions it really was play. The predator youth would pin down the prey youth where killing it would be easy, but then let it escape. Even then, there came a point where it was necessary to separate them, because the game would turn deadly for the prey species.

There are other documentaries showing a lifelong bond that persisted through adulthood, where the predator would not hunt that particular prey, and that prey would not fear that particular predator. This was behavior learned from their youthful acquaintance. Yet in very similar developmental scenarios, the adult predator would not hesitate to kill its childhood acquaintance, and the prey would immediately flee if it saw its childhood playmate in adult form.

There is more going on in natural intelligence than just conditioning through repetitive testing with rewards and penalties. Often there are no repetitions. For example, often the penalty is death, and there is no way to learn when the game is over for that individual. The first response is often the right one. Somehow the organism had access to the right answer before being taught through some exercise. It read it somewhere.

We describe this innate understanding as instinct. I think about the newborn foal instinctually knowing how to stand up, and to walk toward its mother, and in particular to seek milk. This is knowledge, but it was not learned from trial and error. We ascribe this learning to programming within the DNA, but I am skeptical that DNA contains the actual instructions for how to stand up, how to recognize the mother, and to move the legs in a way that moves closer to the mother. DNA does not operate that quickly. DNA could prewire the brain with this information so that the brain can act on these instructions more quickly.

A more straightforward explanation is that the foal figured the situation out immediately. It starts off collapsed uncomfortably on the ground. It figures out that it can move its appendages and quickly finds the arrangement that is most comfortable, the arrangement that happens to be a standing posture. This is like solving a puzzle. The next lessons are recognizing the mother and figuring out how to move toward her. Unlike the initial state of being uncomfortable on the ground, there is no immediate need to recognize the mother or to move toward her. The animal could just stand still for the rest of its life. It would feel hunger or thirst, but to it, these are just a fact of life. From a newborn perspective, a brief life is a full life. Something must be telling it that a full life is not a brief one.

In the beginning of the artificial intelligence hide and seek simulation, the seeker agents did not know they were supposed to seek, and the hiders didn’t know they were supposed to hide. In the earliest iterations, both would wander randomly for the specified period of time. At the end of each early episode, the hiders would receive some reward for evading the seekers, and the seekers would receive a reward for finding the hiders. Eventually they would lock into their roles: hiders striving to hide better, and seekers striving to seek better.
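
To make that reward scheme concrete, here is a minimal Python sketch of my own. It is not the experiment's code. It assumes a reward of +1 or -1 per episode with the two sides always opposite, a handful of discrete hiding spots, and a hider with a fixed habit rather than one that is also learning. The names episode_rewards and train_seeker and all of the numbers are hypothetical. The seeker starts out knowing nothing and nudges its estimate for each spot toward whatever reward it receives there.

```python
import random


def episode_rewards(hider_seen):
    """Return (hider_reward, seeker_reward) for one episode.

    Assumption: rewards are +1/-1 and exactly opposite for the two sides.
    """
    hider = -1 if hider_seen else 1
    return hider, -hider


def train_seeker(episodes=2000, spots=4, lr=0.1, explore=0.1, seed=0):
    """Toy seeker that learns which hiding spot pays off.

    Assumption: the hider is not learning; it simply favors spot 0.
    Returns the seeker's learned value estimate for each spot.
    """
    rng = random.Random(seed)
    values = [0.0] * spots  # blank slate: no spot looks better than another
    for _ in range(episodes):
        # Hypothetical hider habit: spot 0 most of the time, otherwise random.
        hider_spot = 0 if rng.random() < 0.8 else rng.randrange(spots)
        if rng.random() < explore:
            guess = rng.randrange(spots)  # occasional random wandering
        else:
            guess = max(range(spots), key=values.__getitem__)  # best so far
        _, seeker_reward = episode_rewards(hider_seen=(guess == hider_spot))
        # Move the estimate for the guessed spot a little toward the reward.
        values[guess] += lr * (seeker_reward - values[guess])
    return values


if __name__ == "__main__":
    print(train_seeker())
```

Calling train_seeker() returns one estimate per spot, and the spot the hider favors ends up with the highest one. The real agents faced opponents that were learning at the same time, which this sketch leaves out.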

In the beginning of an organism’s life, whether a liveborn or a hatchling, the individual does not know it is supposed to live long. A newborn seeing the world for the first time could be satisfied just passively observing the world for the brief time it lives. Even with penalties such as thirst and hunger, the creature has no way of knowing that these are anything more than annoyances that must be tolerated.

In species that care for their young, the parent can provide some food or milk and this would feel good enough to give the young one a reason to seek out more. For species like turtles that abandon their nest of eggs so that the little ones emerge alone in the world, nothing is around to give the little ones any evidence that something can satisfy their hunger, or that the water would give them relief from the sun.

In a prior post, I used the example of an acorn that follows a script for building an acorn factory. Even though it has that script, what propels it to follow the script? I can imagine the seed accessing the nutrients provided by its parent and using those to build a tap root and a stem that can reach the sunlight. Plants get immediate rewards: the roots absorb water and minerals, and the leaf converts sunlight into excess energy. Both are rewards, and the plant decides to do more of both. It makes more roots, it makes more leaves. The added bulk adds the new challenge of needing a stronger structure. That learning process can continue until there is a mature tree.

Like in the hide-and-seek simulation, the tree learned that the ultimate optimal strategy is to become a mature tree. Unlike the simulation, the tree goes one additional step and takes on the project of reproducing, providing the seed not only with the necessary starting nutrients, but also with at least the instructions for how to get started. The instructions for the need to create new seeds also need to be learned. The tree did not learn them from trial and error, or through a system of rewards or penalties.

The simulated hiders and seekers did not attempt to reproduce to propagate their discoveries. The simulations already had to run hundreds of millions of times. Maybe some future experiment would be able to run billions of repetitions to add the element of propagation, allowing successful agents from one environment to populate new environments that are different from the one they grew up in, and where both their partners and their antagonists came from distinctly different environments.

The hide and seek experiment started with a blank slate for the agents. They had no prior conditioning and could learn entirely from the situation they were in. Something very different would happen if the agents were born from successful parents but found themselves in an environment different from the one their parents excelled in. In that mega-simulation, the second-generation agents may not be so successful because what worked for the parents would no longer work. Maybe the mega-simulation would discover an analogy to what happens in life, where the offspring does thrive with the information provided by the parent and yet in an environment unlike what the parent faced.
