equine clicker training

using precision and positive reinforcement to teach horses and people

ASAT Conference 2016: Dr. Iver Iversen on “Selection and Creation Processes Involved in Shaping a Novel Behavior: Method and Theory”


Dr. Iver Iversen was the keynote speaker at the conference. He is a Professor in the Department of Psychology at the University of North Florida.  In his CV, he writes:  “At UNF’s animal learning laboratory, I use rats to study a) circadian rhythms of learned behavior, b) stimulus control; specifically how chains of learned behavior are formed and broken down, c) how learned behavior is maintained over long periods of time, and d) how significant environmental events come to determine when specific behaviors will occur. From 1993 to 2000 a good part of my research was done during the summers at Primate Research Institute, Kyoto University, Japan, where I established automated methods to train chimpanzees to perform complex human-like behaviors such as drawing the letters of the alphabet.”

His talk was about shaping. He explored questions like:

What is the science behind shaping? How is it studied in the laboratory? How can we measure it? What are the basic mechanisms? What is the role of variability?

What is the Science Behind Shaping?

He started with a little history, first looking at Thorndike and then at Skinner.

Edward Thorndike’s Law of Effect (1905) states that “responses that produce a satisfying effect in a particular situation become more likely to occur again in that situation, and responses that produce a discomforting effect become less likely to occur again in that situation.”

A lot of Thorndike’s work was done by putting cats in “puzzle boxes” that could be opened when the cat manipulated the closing mechanism in the correct way. The cats learned how to get out through trial and error and over time became faster and faster at escaping. This kind of “trial and error” learning happened when the cat started with normal behaviors (pawing, scratching, pushing, etc.) and continued working until it found something that worked. This is a slow process and the learning curve is quite gradual. Thorndike said that over time unsuccessful behavior would be “stamped out” and successful behavior would be “stamped in.”

B.F. Skinner, who is known for his work defining and studying operant conditioning (among other things), was studying rats in laboratories and found that a rat could very quickly be trained to press a lever to get food.   Receiving just one food pellet was enough to change the rat’s behavior.  Dr. Iversen said that this discovery changed “the science of behavior.”

How Is It Studied in the Laboratory? What is the Role of Variability?

It’s important to remember that a single reinforcer can change behavior. He had some video of a rat in a Skinner box. The rat was reinforced at random intervals and it was fascinating to watch how the rat responded each time it was reinforced. It usually went back and repeated the last thing it had done, which might be sniffing the ceiling, going to a corner, or visiting some other location. If that didn’t “work,” then the rat would scroll through past behaviors that it had been doing when it was reinforced. If it was reinforced for one behavior more than once (by chance), then it would repeat that behavior more.

The effect of one reinforcer can be shown by looking at rat behavior when presented with a hole board which contains food in one or more locations. A hole board is a grid with many openings into which food can be placed.  The rat is observed to see where it looks for food.  He had some diagrams showing that if food is placed in a central location, the rat will spend more time investigating the center hole (where food has been in the past), AND it will also investigate the surrounding holes.  This is an example of “response spreading or generalization” or the “spread of effect” and can be used to capture variations on the original behavior.   He did say that the effect was temporary but if you know to look for it, it’s one way to capture a variation on the original behavior.

Why is the effect temporary? It’s temporary because as a behavior receives more reinforcement, it becomes less variable. There’s a very narrow window where behavior is more variable before something is selected. He had a series of pictures of a study they did teaching a rat to touch a pole for reinforcement. The rat is in a Skinner box and there’s a pole extending down from the ceiling. When the rat touches the pole, it gets reinforced. It doesn’t matter how the rat touches the pole. In the early pictures, there are lots of variations on how each rat touched the pole. Some used one hand, some used two, some touched the top, others the bottom, etc. But over time, they all started to look more alike. He didn’t shape the posture, but the environment (location of pole and food hopper) shaped the rats so that they learned the most efficient way to do the behavior.
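To put a number on “less variable,” one option (my own illustration, not something from the talk) is to tally how the touches are spread across the different ways of touching the pole and compute the Shannon entropy. The category names and the two lists of observations below are made up for the example. Higher entropy means the touches are spread across more variations.

```python
from collections import Counter
from math import log2

def topography_entropy(observations):
    """Shannon entropy (in bits) of how often each variation was observed."""
    counts = Counter(observations)
    total = len(observations)
    # Sum p * log2(1/p) over every variation that actually occurred.
    return sum((n / total) * log2(total / n) for n in counts.values())

# Hypothetical observations, not data from the study.
early = ["one_hand", "two_hands", "top", "bottom"]        # four different styles
late = ["two_hands", "two_hands", "two_hands", "two_hands"]  # one settled style

print(topography_entropy(early))  # 2.0
print(topography_entropy(late))   # 0.0
```

In this sketch a drop in entropy from session to session would be one way to see the window of variability closing.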

Once the rats were consistently touching the pole, then they put the rats into extinction so that touching the pole didn’t earn reinforcement. Now the rats started to offer more variable behavior.   This would continue until there was some reinforcement.  The reinforcement was not contingent upon any particular variation of touching the pole, so the rat could be touching the pole in any way when the food was delivered.

And here’s the interesting thing…as soon as one food pellet was delivered, the rats would immediately go back to offering the original highly reinforced behavior, NOT the one that they were doing prior to being reinforced. Wow. I’ve had this happen with my horses and could never quite figure out what was going on because in theory, they should repeat what was last clicked, but they didn’t.  They went back to a known behavior that had a strong reinforcement history under those conditions.

This led to a little discussion on extinction and patterns of behavior in extinction.   While there are some consistent patterns that describe how behavior changes in extinction, more research needs to be done.   Dr. Iversen did say that longer responses come later.  That makes sense because you try the easy options first.   This was not in Dr. Iversen’s talk but if you are curious about what we do know about patterns of behavior in extinction, you can look for previous articles I’ve written on resurgence.  Dr. Jesús Rosales-Ruiz has talked about resurgence at Clicker Expo and it’s in my notes for Clicker Expo 2014 (on my website).

Dr. Iversen’s main point was that understanding extinction is one of the keys to shaping.   Behavior can be shaped by using a mixture of reinforcement, extinction and response spreading/generalization. If you understand how these work and know when to use which one, then you will be able to successfully shape behavior.

How Can We Measure It? What are the Basic Mechanisms?

He illustrated this with a series of graphs that showed a theoretical view of shaping. In the first graph, the animal’s current behavior is shown and it looks like a single peak. There are some behaviors that are more likely to happen (the middle of the peak) and then there are others, of decreasing likelihood, that form the sides of the peak (think of the classic bell curve, but more jagged). The farther away from the middle you go, the less likely the behavior is to happen. In shaping, the goal is to make one behavior (or variation) stronger. If this is successfully done, then when you look at the graph, the peak will be shifted in the direction of the new behavior. You might end up with two separate peaks or with one peak that has been shifted in the direction of the new behavior.
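The shifting peak can be played with in a toy simulation. This is only a sketch under my own assumptions, not Dr. Iversen’s model: each variation of the behavior has a “strength,” responses are drawn in proportion to strength, any response at or past a criterion is reinforced, and the reinforced variation plus its immediate neighbors get stronger (a crude stand-in for response spreading). The starting strengths, `boost`, and `spread` values are invented, and extinction is not modeled at all.

```python
import random

def shape(weights, criterion, trials, rng, boost=1.0, spread=0.5):
    """Toy shaping loop: reinforce any response at or past `criterion`."""
    weights = list(weights)
    variants = list(range(len(weights)))
    for _ in range(trials):
        # Pick a variation in proportion to its current strength.
        response = rng.choices(variants, weights=weights)[0]
        if response >= criterion:
            weights[response] += boost
            # Assumed "response spreading": neighbors get a smaller boost.
            for neighbor in (response - 1, response + 1):
                if 0 <= neighbor < len(weights):
                    weights[neighbor] += spread
    return weights

def mean_variant(weights):
    """Center of the distribution: the weighted average variation index."""
    return sum(i * w for i, w in enumerate(weights)) / sum(weights)

rng = random.Random(0)  # fixed seed so the run is repeatable
start = [1, 2, 4, 8, 4, 2, 1, 0.5, 0.25]  # hypothetical peak at variation 3
after = shape(start, criterion=5, trials=500, rng=rng)
print(round(mean_variant(start), 2), round(mean_variant(after), 2))
```

The second number should come out larger than the first, because in this model only variations at or next to the criterion ever gain strength, which pulls the center of the curve toward the new behavior.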

He had some examples of shaping, including shaping a dog to jump higher (done by B. F. Skinner), teaching a pigeon to put its head in a particular location, teaching a developmentally disabled girl to walk, and training some rats to jump out of the side of their cage.

He had a little comment here that some people argue that using reinforcement limits the learner’s choices because it can be such a powerful motivator. But he thinks that when we use reinforcement to teach behaviors or skills that are beneficial to the learner, we are giving them freedom because now they can do more things.

There was one other fun example of shaping that he shared at the end of the talk. This time it was unintentional, but he shared it to show that if reinforcement is available, mother nature will select something, whether you intend it or not. He had a video of a rat in a Skinner box which was set up so that it would deliver reinforcement on a random basis. He left the rat and the data recording machine running and came back later to discover that the rat had been trained to do a full roll for the food pellet. On the video you can see how this behavior was shaped, purely through random reinforcement.

Can We Look at More Complex Shaping in the Laboratory?

One argument he hears is that studying shaping in the laboratory is not the same as studying shaping in real life. This is true, but there are advantages to studying shaping in the laboratory. One is that you can get good data.   He showed a printout that mapped the progress of a rat learning to push a lever.  The printout shows what behavior the rat was doing and looks like a very squiggly line when the rat is doing many behaviors (lots of up and down zig-zags) but flattens out when the rat starts to do one behavior more consistently.

By looking at the printout you can see (and measure) how many behaviors the rat does between reinforcements, how long it takes to learn to do the correct behavior consistently, and so on. The printout also showed the relationship between extinction and reinforcement. The periods of high variability when the rat was searching for the right answer looked like periods of extinction. As the rat became more consistent about pressing the lever, the periods of extinction got shorter and shorter until there was a nice pattern showing lever press -> go to get food -> lever press.
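As a rough idea of how that kind of measurement could be pulled out of a session record, here is a small sketch. The event names and the sample session are hypothetical (I don’t know what format the lab’s recordings use); the function just counts how many behaviors happen before each food delivery.

```python
def responses_between_reinforcers(events, reinforcer="food"):
    """Count the behaviors that occur before each delivery of the reinforcer."""
    counts = []
    current = 0
    for event in events:
        if event == reinforcer:
            counts.append(current)
            current = 0  # start counting again after each delivery
        else:
            current += 1
    return counts

# Hypothetical session: lots of searching early, clean lever presses later.
session = ["sniff", "rear", "press", "food",
           "groom", "press", "food",
           "press", "food",
           "press", "food"]

print(responses_between_reinforcers(session))  # [3, 2, 1, 1]
```

A shrinking count like this would match the pattern he described, where the searching periods get shorter until the rat is simply going press, food, press.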

A more complex example would be teaching a rat to press two levers. In his example, the rat was trained to press the left lever and then the right lever. The food hopper was between them. It takes the rat longer to learn this pattern and it does a lot of unnecessary pressing of the right lever because that is the one that immediately precedes the food delivery, but it does eventually learn to press the left lever and then the right lever without any extra lever presses. I thought it was interesting that the rat continued to check the food hopper between lever presses so the loop was actually left lever -> peek in food hopper -> right lever -> get reinforcement from food hopper. I guess that’s the rat equivalent of mugging you or checking you for treats. In the question and answer session afterward someone asked how to get rid of the “checking for food” behavior as they see it sometimes in dogs that should be focused on doing behavior, but are checking in with their handler. Dr. Iversen said that in his case, he could move the levers away from the food hopper. If that wasn’t possible, then you would have to make the reinforcement contingent on a clean pass from the left to the right lever (no looking for food).
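The difference between the two contingencies can be written out as a simple rule. This is my own sketch of the idea, with made-up event names (`"left"`, `"right"`, `"hopper_check"`), not a description of how the lab equipment was programmed.

```python
def should_reinforce(recent_events, require_clean_pass=True):
    """Decide whether the most recent right-lever press earns food."""
    if len(recent_events) < 2 or recent_events[-1] != "right":
        return False
    # Left lever immediately followed by right lever always counts.
    if recent_events[-2] == "left":
        return True
    # Looser rule: allow one peek in the hopper between the two presses.
    if not require_clean_pass and len(recent_events) >= 3:
        return recent_events[-3] == "left" and recent_events[-2] == "hopper_check"
    return False

print(should_reinforce(["left", "right"]))                    # True
print(should_reinforce(["left", "hopper_check", "right"]))    # False
print(should_reinforce(["left", "hopper_check", "right"],
                       require_clean_pass=False))             # True
```

Under the stricter rule the peek in the hopper simply stops paying off, which is the “clean pass” requirement he described.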

So now that the rat can press the left and then the right lever, what happens if we reverse it? Can the rat learn to press the right and then left lever? Dr. Iversen pointed out that this pattern is already in the rat’s repertoire because it does go from right to left as part of the pattern. The pattern is left lever -> right lever -> food -> left lever. All they are doing is changing where the reinforcement is in the sequence. So it should be easy, right? Well, no. The rat just presses the left lever. It turns out that the existing pattern has to be broken up in order for it to learn the new pattern. I think he said it took 7 sessions before the rat was consistently going from right to left.

I think this has implications for chains or sequences when we want to change the order of behaviors. An animal that has learned to do a series of behaviors in one order is not necessarily going to immediately be able to do them in a different order.  The process involves separating existing units as well as making new combinations and that makes it more complex.

Chimpanzee Work:

The last part of his talk was about some work he did in Japan teaching chimpanzees to trace letters and shapes on a touch screen.  It was interesting to hear him talk about it because the work was very different than what he does with rats in the lab.  For one thing, the chimpanzees were loose in an enclosure and could leave the session at any time. He said that he had to be careful with his reinforcement because if it was too low, they would leave.

Another difference was that this project was about teaching a new behavior that required precision, so he had to take that into account in his shaping sessions.   When you are shaping, it’s important to consider whether you are looking for an increase in variability as you might do if you are trying to shape completely novel behaviors, or if you are trying to refine an existing behavior.  One approach might require the careful use of extinction whereas the other would not.

The chimpanzees he used for this project had already been trained to do other behaviors, including touching the touch screen. But they had been taught to touch it as one would do if pressing a button. So if there were two dots to touch, they would touch one, lift their finger up and touch the other. Dr. Iversen wanted to teach them to slide their finger along a line.

Here is a simple list of the steps that were used to shape “tracing.”

  1. Teach the chimpanzee to touch a row of dots on the screen. In the first approximation, the chimp had to touch all the dots, but it could be in any order. The dots were spaced a short distance apart.
  2. Teach the chimpanzee to touch the dots in order (left to right or right to left). At this point the chimp is touching one dot, lifting its finger and touching the next. The dots are still a short distance apart.
  3. Continue with the row of dots, but place them closer together, then touching and then overlapping. As the dots got closer together, the chimp would lift its fingers less and eventually would slide from one dot to the next.
  4. Put an outline around the overlapping dots so it was like a wide line with dots inside. Then fade out the individual dots so that the chimp was just tracing its finger along a wide line.
  5. Once the chimpanzee could do straight lines, it had to learn to go around corners without lifting its finger. I think he said they started to do this naturally once the lines got longer and he added turns. He did say that if asked to do a circle, the chimp would start at the top and do the left side, then go back to the top (finger off) and trace down the right side. So it did two half circles instead of a continuous movement. It took a while to train them to go all the way around in one continuous movement.
  6. Once they could trace figures, they were taught to trace letters, do finger mazes and other tasks like sorting where they had to put their finger on the object and drag it to the bottom of the screen.

One interesting thing he mentioned was that when the chimpanzee was done with the task on the screen it had to indicate this by pressing a button or touching a symbol. This “I’m done” signal was important. If he didn’t have one, then the chimp would keep drawing until it heard the “beep” (the beep indicated a correct answer) and would not stop on its own when it reached the end of the line.

Teaching chimpanzees to draw might seem like it doesn’t have any practical application, but their drawing skills can be used in other projects.   This kind of research also has the potential to provide valuable information about how to communicate with people who are paralyzed or disabled by diseases like ALS.

Thank you to Dr. Iver Iversen for allowing me to share my notes and to everyone at ORCA for putting on a great conference.  You can learn more about the conference by going to their FB page: https://www.facebook.com/The-Art-and-Science-of-Animal-Training-1460845514215463/?fref=ts.
