Teaching Algorithms to Learn

The Carrot-and-Stick Approach to Learning

Apart from genetic heritage, any newborn is a clean slate and needs guidance to learn what she needs to know about the world. Guidance is ultimately a well-known learning algorithm that every adult builds up over the years and then, often unconsciously, passes on to children. The algorithm builds on core lessons that humans began to learn in the Neolithic age, with the early development of agriculture and especially farming.

It’s informally called the method of the carrot and the stick.

Building a Behavior

As humans, we learn through the input we receive from our senses, and the senses instinctively tell us whether something is good or bad. Sensory inputs are logged in the brain and elaborated into pieces of information (e.g., we like it, we don’t like it) that, once stored, together make up the brain’s database of memories. The content of this database is later used to process the next sensory input in a less instinctive, more thoughtful way.

Imagine doing this over and over again, synapse after synapse, second after second. A humongous archive of information builds up. The brain, in a way, indexes the database so that it can quickly find the set of neural commands to forward to neurons and muscles in order to react. Any sign of life is a matter of reacting to some stimulus.

Hence, the more often you receive a given input, the more you know about the behavior you consciously want to adopt in response to it. The more often you receive a given input, the more often you react in a given way and, therefore, the more you morph instinctive behavior into more thoughtful behavior.

Changing instinctive behavior into something smarter in terms of final results is the ultimate purpose of training. This holds for animals and humans as well as for algorithms.

Reward and Punishment

The purpose of training is to change the frequency of certain behaviors so that an undesirable behavior is observed less often and a desirable behavior is observed more often. To build any form of training, you must use consequences and fine-tune your actions to trigger just the expected reactions as often as possible.

One of the core principles of training is offering the trainee a positive experience (a reward) in response to a desirable action. Another core principle is its mirror image: offering the trainee an aversive experience (a punishment) in return for an unwanted action. All trainers, therefore, use consequences, as all trainees instinctively tend to orient their behavior toward rewards (the carrot) and away from punishment (the stick).

The carrot-and-stick approach is the pattern that animals and humans use to learn. What about algorithms? If the carrot exists to please the trainee and the stick serves to punish the trainee, how do you please, or punish, an algorithm? Of course, you don’t. Or, at least, you don’t do it in the usual sense.

For an algorithm under training, the reward or the punishment results from where the computed result falls: over or under an acceptable threshold.
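As a minimal sketch of that idea, the Python snippet below turns a computed score into a carrot or a stick by comparing it with a threshold, and then uses that signal to nudge a per-behavior preference up or down. The names `feedback` and `train`, the +1/-1 signal, the `step` size, and the assumption that a higher score is the better one are all illustrative choices, not something the text prescribes.

```python
def feedback(score: float, threshold: float) -> float:
    """Carrot (+1.0) if the score reaches the threshold, stick (-1.0) otherwise.

    Assumption: higher scores are better. For an error measure the
    comparison would be reversed.
    """
    return 1.0 if score >= threshold else -1.0


def train(outcomes, threshold: float = 0.5, step: float = 0.1) -> dict:
    """Accumulate a preference for each behavior from (behavior, score) pairs.

    Rewarded behaviors gain preference, punished ones lose it, which is
    one simple way to make a desirable behavior more frequent.
    """
    preference = {}
    for behavior, score in outcomes:
        signal = feedback(score, threshold)
        preference[behavior] = preference.get(behavior, 0.0) + step * signal
    return preference


if __name__ == "__main__":
    # Hypothetical observations: "sit" scores well, "bark" scores poorly.
    observed = [("sit", 0.9), ("bark", 0.2), ("sit", 0.8)]
    print(train(observed))
```

After these three observations the preference for "sit" has grown and the one for "bark" has dropped below zero, so a selection rule that favors high preferences would choose "sit" more often.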

And Once the Behavior is Learned?

Once a behavior is learned, should you remove all rewards from the equation and expect the human, or the animal, to behave in the desirable way purely for the joy of performing? Professional dog trainers, for example, agree that you can reduce the food reward but never phase it out entirely. The reason is that dogs (and the same holds true for other animals and, even more so, humans) are living beings and use their behavior as a tool to trigger events and produce consequences.

The carrot (or stick) stimulus should remain even after the behavior is learned, but its strength should be modulated over time to keep the likelihood of the desirable behavior high enough. What about algorithms? It’s just the same.
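One way to picture a stimulus that is reduced but never phased out is a decaying schedule with a floor. The sketch below is only an illustration: the function name `reward_strength`, the exponential decay, and the default values are assumptions made for the example, not a schedule taken from the text.

```python
def reward_strength(step: int, initial: float = 1.0,
                    floor: float = 0.2, decay: float = 0.9) -> float:
    """Strength of the reward at a given training step.

    The strength shrinks geometrically from `initial` but is clamped at
    `floor`, so the reward is reduced over time and never disappears.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    return max(floor, initial * decay ** step)


if __name__ == "__main__":
    # Print the schedule for a few hypothetical steps.
    for step in (0, 1, 5, 20, 100):
        print(step, round(reward_strength(step), 3))
```

The floor plays the role of the small food reward that dog trainers keep in place: it stays above zero however long the training has been going on.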

A trained algorithm, more than animals and humans, can effectively be thought of as a fountain from which desirable behaviors flow. The issue is that the landscape and the context around the fountain may change continuously. Hence the algorithm, like animals and humans, must adapt. Adapting the algorithm to a changed context requires a new training session and probably also fine-tuning the threshold that sets what’s good and what’s bad.
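A rough sketch of how a changed context might be noticed: keep an eye on recent results and flag a new training session when too few of them clear the threshold. The helper `needs_retraining`, the `min_success_rate` parameter, and the assumption that higher scores are better are hypothetical choices for illustration only.

```python
def needs_retraining(recent_scores, threshold: float,
                     min_success_rate: float = 0.8) -> bool:
    """Return True when too few recent results clear the threshold.

    Assumption: a score at or above `threshold` counts as a desirable
    result. With no recent scores there is no evidence of a problem.
    """
    if not recent_scores:
        return False
    successes = sum(1 for score in recent_scores if score >= threshold)
    return successes / len(recent_scores) < min_success_rate


if __name__ == "__main__":
    # Hypothetical scores before and after the context changes.
    print(needs_retraining([0.90, 0.80, 0.95, 0.85], threshold=0.7))
    print(needs_retraining([0.90, 0.40, 0.50, 0.60], threshold=0.7))
```

Both `threshold` and `min_success_rate` are knobs here, which mirrors the point that adapting to a new context may mean retraining and also revisiting what counts as good or bad.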

NOTE Overall, the carrot-and-stick approach is a technique aimed first at teaching how to perform a desired behavior and then at internalizing it so that it becomes natural regardless of the conditions in which it was initially taught. Thinking of carrots and sticks with an eye on machine learning, the words spoken by Sir Winston Churchill shortly before the beginning of the Second World War resound evocatively: “Thus, by every device from the stick to the carrot, the emaciated Austrian donkey is made to pull the Nazi barrow up an ever-steepening hill.”
