Notes from the Art and Science of Animal Training Conference (ORCA): Choice

The idea of choice was one of the underlying themes of the conference and is always an important consideration for positive reinforcement based animal trainers. At some level, animal training is about teaching an animal to do the behaviors we want, and to do them when we want them, but there are many different ways to go about getting there.

This conference has always been about exploring how we can achieve our goals, while ensuring that the learning process is enjoyable, the learner is allowed to actively participate in the process, and that he becomes empowered by his new skills and his relationship with his trainer.

There were a lot of presentations that touched on some aspect of choice.

  • Dr. Killeen spoke on how understanding the use of the Premack Principle opens up more choices for reinforcement and can lead to a better understanding of how the value of different reinforcers can change depending upon environmental conditions.
  • Emily Larlham talked about how we can teach our dogs to make different choices instead of becoming stuck in behavior patterns that create stress for both parties. She also talked about how important attitude was in training and how being allowed to actively participate and make choices contributes to a dog’s enthusiasm for any chosen activity.
  • Alexandra Kurland talked about how trainers make choices based on the kind of relationship they want to have with their learner (authoritarian vs. nurturing), and how these decisions influence how much choice they give their animals.
  • Barbara Heidenreich provided lots of examples of how to provide choice through more types of reinforcers and a discussion of why it’s important for both the trainer and the learner to have options.
  • Dr. Andronis showed what happens when animals have limited choices about when and how to earn reinforcement, and how to recognize behaviors that indicate that the learner is no longer enjoying and engaged in the learning process.

These are just a few of the references to choice that came up in the other presentations, but they show that animal trainers have to think about choice all the time.  Sometimes we are looking for ways to increase it.  Sometimes we are looking for ways to limit it so the animal cannot practice behavior we don’t want.  And sometimes we are educating the animal so he learns to make “good” choices.

Since choice is such an important topic, I saved it for last. The notes in this article are based on two presentations that dealt more specifically with choice.  The first presentation was given by Jesús Rosales-Ruiz and was titled “Premack and Freedom.” The second presentation was given by Ken Ramirez and was titled “Teaching an animal to say ‘No.’” They go together nicely because Jesús talked about choice from a more academic point of view and Ken talked about how to use choice as part of training.

Jesús Rosales-Ruiz: “Premack and Freedom.”

He started with a quote from David Premack:

“Reward and Punishment vs. Freedom”

“Only those who lack goods can be rewarded or punished. Only they can be induced to increase their low probability responses to gain goods they lack, or be forced to make low probability responses for goods that do not belong to them. Trapped by contingencies, vulnerable to the control of others, the poor are anything but free.”

This quote puts a slightly different perspective on how reinforcement and punishment work and on the idea of choice.  If you desperately need something and you work to get it, is that really choice?  And how does that relate to reinforcement? We tend to think that by offering reinforcement, we are giving choices, but are we really doing this? Let’s look more closely at reinforcement and then see how it relates to choice.

How we think about reinforcement

The Language of Reinforcement

  • reinforcement as a stimulus (thing)
  • reinforcement as a behavior (activity)

Originally, it was more common to think of reinforcement as being about getting objects such as food, a toy or other item.  But, in Dr. Killeen’s talk, he spoke about the importance of recognizing that reinforcers are behaviors, an idea that came from Dr. Premack. There are some advantages to thinking of behaviors as reinforcers, because this change in thinking opens up the possibility for more types of reinforcers, and also makes it more obvious that the value of a reinforcer is variable.

Not all behaviors are going to be reinforcing at all times, and in some cases, there is reversibility between reinforcers and punishers. Dr. Premack did experiments showing the reversibility of reinforcers. He could set up contingencies that would make lever pressing, running and drinking all reinforcers (at different times), and he did this by adjusting the environment (adding constraints) so that access to some activities was contingent on other activities.

Want a drink? You have to run first. Running is reinforced by drinking.  Want to run? You have to have a drink first.  Drinking is reinforced by access to running.  He showed that behaviors can be either punishers or reinforcers,  depending upon the circumstances. So, perhaps the availability of reinforcers is not what defines choice.

What about freedom? Is choice about having freedom?

There are different ways we can think about freedom:

  • Freedom from aversive stimulation
  • Freedom to do what we want (positive reinforcement)
  • More generally (freedom from control)

How do these ideas about freedom apply to animal training?  And can they help us understand more about choice in animal training?

Clicker trainers meet all three of these “definitions,” to some degree.  Jesús went over this pretty quickly, but it’s quite easy to see how clicker training can contribute to an animal’s sense of freedom or being able to make choices.  Clicker trainers avoid using aversives by shaping behavior using positive reinforcement.  They also avoid using aversives when the animal gives the “wrong” answer or does an unwanted behavior.

Clicker trainers don’t necessarily give the animal freedom to do whatever it wants, but they do use positive reinforcement, and over time positively reinforced behaviors do often become what the animal wants to do.  Also, during shaping, the trainer may want the animal to offer a variety of behaviors, so there is some element of choice there.

They may also choose to train behaviors or set up training so the animal has more control of its own training.  A lot of clicker trainers focus on training behaviors that the animal can use to communicate with the trainer so this can give the animal some feeling of control. We are still focusing on what we want them to do, but we are doing it in such a way that they feel they have more control.

At this point, Jesús mentioned that he doesn’t believe that total freedom exists.  If it did, then science would not exist because there would be no “laws.” Our behavior is always determined by something and while we may think we can control it, what we are really after is the feeling that we can control it, which may or may not be true.

I have to confess that at this point I found myself remembering endless college discussions about “free will” and whether or not it exists. I don’t think we need to go there, but I do think that it’s important to realize that it may sometimes be more accurate to say that our goal is for the animal to have the perception of control. Interestingly enough, one of the points that Steve White made was that perception drives behavior, not reality.

Let’s look at how we can control behavior in animal training:

With the idea of freedom and choice in mind, let’s look at four different ways to control behavior in animal training.

1. Control only through aversives:
(note: I added this one for completeness. Jesús referred to it but did not list it on his slides)

  • target behavior occurs -> animal is left alone or aversive ceases
  • target behavior absent -> aversive consequences
  • the animal has no control, choices or freedom

2.  Control through aversives, but with some added positive reinforcement

  • target behavior occurs -> positive consequences
  • target behavior absent -> aversive consequences
  • the animal can gain reinforcement, but it still does not have a choice and cues can become “poisoned” because the cue can be followed by either a positive or negative consequence, depending upon how the animal responds.

3.  Control through positive reinforcement

  • target behavior occurs -> positive consequences
  • target behavior absent -> no consequences
  • the animal can now “choose” whether or not it wants to gain reinforcement, without having to worry about aversive consequences for some choices.  But is it really choice if one option earns reinforcement and the other does not?

4.  Control through positive reinforcement with choices

  • target behavior occurs -> positive consequences
  • target behavior absent -> other positive consequences
  • animal always has another option for a way to earn reinforcement, so there is true choice between two options that both lead to reinforcement.
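
For readers who like to see things laid out as a table, here is a minimal sketch of the four schemes above. It is my own illustration, not something from Jesús's slides. The labels "relief", "aversive", "none" and "positive", and the two helper functions, are names I made up for the sketch. It treats "true choice" as the case where both options lead to positive consequences, and a possible "poisoned" cue as the case where the same cue can lead to either a positive or an aversive consequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    name: str
    if_target: str  # consequence when the target behavior occurs
    if_absent: str  # consequence when the target behavior is absent


# The four schemes from the list above (labels are illustrative only).
SCHEMES = [
    Scheme("aversives only", "relief", "aversive"),
    Scheme("aversives plus positive reinforcement", "positive", "aversive"),
    Scheme("positive reinforcement", "positive", "none"),
    Scheme("positive reinforcement with choices", "positive", "positive"),
]


def offers_true_choice(scheme):
    # Both options earn reinforcement.
    return scheme.if_target == "positive" and scheme.if_absent == "positive"


def risks_poisoned_cue(scheme):
    # The cue can be followed by a positive or an aversive consequence.
    return scheme.if_target == "positive" and scheme.if_absent == "aversive"


true_choice = [s.name for s in SCHEMES if offers_true_choice(s)]
poisoned = [s.name for s in SCHEMES if risks_poisoned_cue(s)]
print(true_choice)
print(poisoned)
```

Only the fourth scheme passes the "true choice" test, which matches the way Jesús described it.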

Jesús shared a couple of videos that showed animals making choices.  He did not spend a lot of time on these, so I can only provide a brief description and the point he was trying to make.

The first video was of a training session with a dog using clicker training and food.  The dog is loose and can participate or not.  As soon as the trainer starts clicking and treating, the dog leaves.  He followed that video with another one where the dog is trained with food alone (no clicker).  In this case, the dog stays and continues doing behaviors for reinforcement.  He said this was an example of a situation where the clicker itself was associated with “hard work” so when the clicker came out, the dog would leave.  He didn’t go into more details on the dog’s previous training history but those clips do suggest that the clicker itself can become an aversive stimulus.

The second video showed a horse being taught using clicker training and food. The horse is loose and the trainer is sitting in a chair.  Throughout the training session, the horse is eagerly offering a behavior, which the trainer clicks and reinforces with food. But, because the trainer is feeding small pellets of grain, some of the food is falling on the ground around the horse.  Despite the abundance of food on the ground, the horse prefers to keep offering behavior, and getting reinforced for it, rather than just eating the food off the ground. Jesús said this was an example of “real choice” because the same food reinforcement was available whether the horse did the behavior or not.

So far we have looked at how reinforcement and freedom contribute to the idea of providing choice for animals.  But there’s another consideration, and that’s hidden within the idea of repertoire size.

Restraint, constraint and the effect of repertoire size

Positive reinforcement trainers are usually very aware of the effect of restraint on animals and try to train under conditions in which the animal is not physically restrained.  They are also aware of the effect of constraint, but it seems to get less attention.   Constraint is when we control the “goodies” or limit the animal’s ways to earn them.

Jesús said it was important to avoid constraint in training.  Constraint can be physical, which means that the animal is in an environment where it might be “free” to move, but there are very few options for things to do.  A Skinner box is an example of an environment where the animal is constrained because the number of behaviors it can do is quite limited.

But, constraint is not always physical. It can also be more about skills or the repertoire of behaviors that are available to an individual. You could say this is a kind of “mental” constraint where the individual feels it has few options because it is only comfortable doing a few things.

He used some human examples to illustrate this. For example,  a person who has several skills has more freedom from constraint than someone who is only good at one thing.   If you are good at debating, dancing, and social interactions, then you can go to a debate, dance or eat lunch with your friends. If you are only good at debating (even if you’re really good at it), but you lack other skills, especially social ones that are important for many activities, then you are constrained by your own repertoire.

He called this being coerced by available behavior, because your options are limited if you have a limited repertoire. At the end of the presentation, Joe Layng made the comment that “feeling” free is actually being free. If you only have one way of getting the extraneous consequence, then you are still limited. Joe’s comment reminded me of Steve’s point that perception, not reality, drives behavior.  It’s always interesting to see how these things come together.

So, one way to limit constraint and increase choices in animal training is to increase the animal’s behavioral repertoire.  This gives them more choices on several different levels because they have more options for reinforceable behavior overall, and it also may make it possible for you to give them more options at any given time.

To summarize:

Choice is not just about providing reinforcement or about removing aversives.  It’s about providing the animal with opportunities to earn reinforcement in many ways and increasing the animal’s repertoire so it has the skills and opportunities to practice many different behaviors.

Remember that:

  • A small repertoire = more constraint
  • limited skills = limited opportunities

If our goal is to increase freedom, then we need to be aware that individuals can be constrained by the available environment and available behavior.

Jesús ended with the question, “If my dog will only walk with me when I have treats, why is that?”  

That’s kind of a loaded question, and if I didn’t know Jesús was in favor of using treats, I might think he was suggesting that using treats was a problem.
But I don’t think he was saying that at all. I think he was just encouraging the audience to think about what it means if your dog will only walk with you if you have food.  What does that say about the choices he is making?  Does it tell you anything about how much you might be using food to limit his choices, not to give him more choices?

At the end of Jesús’s presentation, I found myself pondering the practical application of the material he presented.  Yes, I love the idea of giving animals choices and I know from personal experience that adding reinforcement is not the same as giving choices. But, I was thinking hard about what training would look like if several behaviors were all capable of receiving the same amount of reinforcement. The whole idea of clicker training is that we can select out and shape behavior by using differential reinforcement.

So, what would happen if you had several behaviors that could earn equal reinforcement? Well, lucky for me, Ken Ramirez’s presentation later in the day was on this exact topic.

Ken Ramirez:  Teaching an animal to say “No”

In his presentation, Ken shared some training that he did with a beluga whale who had become reluctant to participate in training sessions.  He started by saying that while his talk is titled, “Teaching an animal to say ‘No,'” he realizes that that phrase is just a convenient way to describe what they did, and is not necessarily how the animal perceived the training.

He spent a little time talking about terms like “no” and “choice.” They are labels we give to ideas so we can talk about them, but that’s not useful unless we make sure we are all using them in the same way, or have a common reference point.  He shared what he means by teaching “no,” choice, and how the two are related.

What is “no”?

  • Teaching “no” means teaching another reinforceable behavior, one that the animal can choose to do instead of the behavior that has been cued. In the example he’s going to share, they taught the whale that she could always touch a target for reinforcement.
  • Teaching “no” is different than teaching intelligent disobedience, which is more about teaching the animal that some cues override other cues. It’s also different than a Go, No-Go paradigm (hearing test where you don’t respond if you don’t hear the tone), or an “all clear” in scent detection which is just a new contextual cue.
  • We can only guess why the animal chooses to say “no.”  When the whale did touch the target, they had no way of knowing if she was doing it because it was easier, had a stronger reinforcement history, she didn’t want to do the other cued behavior, or …
  • But, regardless of why she chose it,  the value was that it gave her another option besides responding to the cue or ignoring the cue.  Tracking her “no” responses had the added benefit of allowing the trainer to gather information about her preferences and whether or not there was any pattern to when she chose to say “no.”

What is choice?

(these points should be familiar, as they are very similar to what Jesús discussed)

  • It is hard to define
  • Arguably, no situation ever provides true choice (there are always consequences)
  • In “true choice,” the animal has the option to receive reinforcement in more than one manner (this goes along with Jesús’s point about true choice being where there are multiple ways to earn the same reinforcement).
  • It is about controlling outcomes

Choice Matters:

  • Real choice is rare
  • Choice is often forced (meaning it is limited, or only one option has a positive consequence)
  • Choice is a primary reinforcer (animals can be reinforced by the opportunity to control their own environment)

Choice in animal training:  a little history

The introduction of positive reinforcement training into zoos and other animal care facilities made it possible for trainers to choose training strategies that allowed their animals more choices.  In the beginning, it may just have been giving the animal a choice between earning reinforcement or not, but over time the training has gotten more sophisticated so that animals have more choices and can actively choose their level of involvement in training sessions.

Ken had some video clips that showed the ways that trainers in zoos can provide choice during husbandry behaviors.  One common practice is to teach stationing, which can be used to teach animals to stay in a specific location for husbandry behaviors.  The animal can choose whether or not to participate by either going to the station, or not.

Another option is to teach the animal an “I’m ready” behavior, which the animal offers when it is ready to start or continue. The trainer does not start until the animal offers the behavior, and she may pause and wait for the animal to offer it again, at appropriate intervals during the session, to make sure the animal is ready to continue.  Some common “I’m ready” behaviors are chin rests, targeting, the bucket game (Chirag Patel), and stationing.  These methods give the animal some choice because the animal is taught a specific way to tell the trainer whether he wants to participate or not.

Teaching stationing and “I’m ready” behaviors are examples of ways that trainers can give their animals more choices.  Teaching these kinds of behaviors usually leads to training that is more comfortable and relaxed for both the trainer and the learner.  A side benefit is that the trainers become much more skilled at observing their animals and paying attention to body language. And, they learn to wait until the animal is ready, which is always a good thing!

Husbandry behaviors can be unpleasant, but allowing the animal to control the timing and pace of the training can make a big difference in how the animal feels about what needs to be done.   However, this may still be quite different than providing “true choice.”

So, what does “true choice” look like?  This was the main part of Ken’s presentation.

The “No” Project:

The “no” project is the story of re-training Kayavak, a young beluga whale.  Kayavak was born at the Shedd Aquarium and has been trained for her entire life (5+ years) using positive reinforcement.  During that time, she developed strong relationships with several trainers and she would work both for food and for secondary reinforcers, especially tongue tickles.  She was easy to handle and responded well (to criteria and with fluency) to her cues.

In fact, she was so agreeable that they often let the younger and more inexperienced trainers work with her.  But, as she started to have more training sessions with less experienced trainers, and fewer with more advanced trainers, her behavior started to change. She became less reliable about responding to cues and was likely to just swim away, especially if they were working on medical behaviors.

This continued for several years until the “problem” was finally brought to Ken’s attention. By this time the staff was very frustrated and they needed to find a solution. Ken said that part of the reason it took so long for the problem to get to him was that she was handled by different trainers and it took a while for a clear pattern to appear.

Of course, the first thing one wonders is “What happened?”  Looking back, Ken’s best guess is that the change in her behavior was the result of many small mistakes that accumulated over time. None of them were significant events, but added up, they undermined her confidence and made her reluctant to participate in training sessions.

Here are some of the contributing factors:

  • She was trained more and more often by young trainers without strong relationships
  • They misread early, mild signs of frustration, and didn’t adjust
  • They used an LRS (least reinforcing scenario) inappropriately, and it became long enough that it was more of a TO (time out).
  • They felt pressure to get behavior and asked for it again after refusal, instead of asking for another behavior, taking a break, or one of many other options.
  • The problem worsened over time and she began to discriminate against less experienced or unknown trainers.

The Solution

Ken proposed a unique solution.  He felt that Kayavak needed to have a way to say “no.”  He thought she might be feeling as if she didn’t have any choices, and that was why she would just leave.  But, if she had a way to say “no,” and got reinforced for it, perhaps she would choose to remain and would learn to become more engaged in training again.

He suggested that they teach her to touch a target (a buoy tethered by the pool edge).  Touching the target would ALWAYS be a reinforceable behavior, and would be reinforced in the same way as other behaviors.  This was an important point.  They didn’t reinforce targeting with a lesser value reinforcer. They made sure the reinforcement for targeting was equivalent to the reinforcement for other behaviors.

While teaching “no” might seem like a radical idea, Ken mentioned that he had a few reasons he thought this might work. One was some training he had seen at another aquarium where a sea lion was taught to touch a target at the end of his session so the trainer could leave.  The sea lion learned to do this, but also started touching the target at other times, and seemed to be using it to indicate when he wanted to be done.

The challenge was convincing the staff to try it.  They doubted it would work, because why would she do any of the “harder” behaviors if she could just touch the target all the time? This was a good question, and gets to the heart of what some clicker trainers believe, which is that animals like doing behaviors that have been trained with positive reinforcement.  But, if this is really true, then shouldn’t she be just as happy to do a variety of positive reinforcement trained behaviors instead of just repeating the same one over and over?

Since Ken is the boss, he convinced them to give it a try…

The training

  • Place a buoy close to where she is working and teach her to target it.
  • Practice targeting the buoy until it’s a very strong behavior.
  • Start mixing in some easy behaviors, but ask for targeting the buoy in between.  So they might cue behavior 1 -> click -> reinforce -> target buoy -> click -> reinforce -> behavior 1 (or 2) -> click -> reinforce -> target buoy ->…
  • Increase the number of other behaviors and/or difficulty, still mixing in targeting the buoy on a regular basis.
  • Throughout this process, she can touch the buoy as often as she likes. So if the trainer wants to alternate buoy touches with cued behaviors, and Kayavak offers several buoy touches in a row, she still gets clicked for all of them. If she makes an error, doesn’t get clicked and then touches the buoy, she gets clicked and reinforced.  A buoy touch is always a reinforceable option.
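
The rule in that last step is simple enough to write down as a few lines of code. This is only my own sketch of the rule as I understood it, not anything Ken showed. The function names, the category labels and the little example session are all made up for illustration; the categories loosely follow the things Ken tracked (right answers, buoy touches, refusals).

```python
from collections import Counter

BUOY = "target buoy"


def classify(cue, response):
    # A buoy touch counts as a buoy touch no matter what was cued.
    if response == BUOY:
        return "buoy touch"
    # No response at all (for example, swimming away) is a refusal.
    if response is None:
        return "refusal"
    return "correct" if response == cue else "error"


def is_reinforced(cue, response):
    # The buoy is ALWAYS reinforceable; cued behaviors only when correct.
    return classify(cue, response) in {"buoy touch", "correct"}


# Hypothetical session: (cue given, what the animal did).
session = [
    ("behavior 1", "behavior 1"),
    ("behavior 2", BUOY),
    ("behavior 2", "behavior 1"),
    (BUOY, BUOY),
    ("behavior 1", None),
]

outcomes = [is_reinforced(cue, response) for cue, response in session]
tally = Counter(classify(cue, response) for cue, response in session)
print(outcomes)
print(dict(tally))
```

Keeping a tally like this over many sessions is one way a trainer could look for patterns in when the animal chooses the buoy.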

Evolution of a behavior

Ken showed us how the training progressed. He had some video of her at the various stages and some great charts to show how the number of buoy touches changed over time.  I thought this part was really fascinating because it showed how important it was to allow her to find her own way to use “no” and how challenging it was for the trainers to stick with her through the process!

After 3 weeks:

  • She touched it all the time, after every cue.
  • The staff thought her behavior meant that it wasn’t going to work.

At 4 weeks:

  • She started to work well and only chose the buoy under specific conditions, such as when ill, when wrong or no marker heard, when asked to do a medical behavior, with a new trainer, or if working with a trainer with whom she didn’t have a good relationship.
  • He had a clip of a training session with a trainer she didn’t like. She would touch the buoy repeatedly, usually doing it before the trainer had time to cue another behavior.
  • I asked Ken if they kept the session length the same, even if all she wanted to do was touch the buoy and he said “yes.”  That was partly because her training session is how she gets fed, but also because touching the buoy repeatedly was not “wrong.” If that’s what she felt like doing that day, that was fine.
  • With a trainer she trusted, she might not touch the buoy if she didn’t want to do a behavior, but would wait for the next cue instead.

At 4 months:

  • There were almost no refusals with experienced staff.
  • She still tests new trainers for a period of time.
  • They did use the buoy during free shaping, but she rarely touched it.  If she did, it was a sign that the trainer was not slicing the behavior finely enough.
  • They can use the buoy to test if she likes a behavior – which one does she do?
  • He had some nice charts showing how her behavior changed (#right answers, buoy touches, refusals).  They showed how she would test new trainers and then over time, the “no” behavior would get offered less and less.

Is this a useful approach?

Since doing this training with Kayavak, Ken has done the same thing with a sea lion and two dogs. They were all cases where the animal had lost confidence in the training.  Having a default behavior that was always reinforceable meant they always had a way to earn reinforcement and it gave them choices.

He did find that, as with Kayavak, once the “no” behavior had been learned, the animals were fairly discriminating in when they used it. They might offer “no” instead of doing a cued behavior if the cued behavior was difficult, uncomfortable, or unknown.  They also might offer it after an error.

Despite his success, he’s not sure you should use it with all animals. Usually if an animal is trained with positive reinforcement, it already has lots of ways to say “no,” so it’s not necessary to teach another one.  It may be more useful to work on your own training or observation skills so you notice the first signs of frustration and can adjust before the animal reaches the point where it needs to say “no.”

There may also be difficulties if you teach it too early because the animal might get “stuck” on that behavior. This point made me think of Jesús and his comments about the danger of having a limited repertoire. Ken thinks it’s better to teach the animal a larger repertoire and then add a “no” behavior if needed, either because the relationship has broken down, or the animal has lost confidence. If you do teach a “no” behavior, it’s important to choose an appropriate one, either one that is useful or is a starting point for other behaviors.

I enjoy Ken’s presentations because he always has the coolest projects and approaches them with a great blend of practical and scientific knowledge.  At some point in his presentation, he mentioned that the “no” project brought together a lot of scientific principles, including the matching law, contrafreeloading, Premack, and others.  But he also said that he used what he had learned from observing other trainers, or observing the animals themselves.  I think this project was a great example of how we can give animals more choices as long as we have a well thought out plan and are willing to take the time to see it through.

This is the last of the articles I am planning on writing on the ASAT conference.  I have lots of ideas for what to do with what I learned from the conference, and may blog about some of my own training later this summer. In the meantime, I hope something in these articles has caught your attention and inspired you to go out and try something new.  I want to end by thanking all the speakers for their permission to share my notes. I also want to thank all the ORCA students who work hard to plan and run the conference. They are already busy planning the conference for next year and it will take place on March 24-25, 2018 in Irving, Texas.

 

Notes from the Art and Science of Animal Training Conference (ORCA): Dr. Jesús Rosales-Ruiz on “Conditioned Reinforcers are Worth Maintaining.”


In this short presentation, Jesús Rosales-Ruiz revisited the question:

“Do I have to treat every time I click?”

He said that this question constantly comes up and that different trainers have different answers.

Before I share the details of his presentation, I want to mention that he said he chose to use the words “click” and “treat” because he was trying to avoid using too much scientific jargon.    But, as he pointed out at the end of his talk, it would be more accurate to say “click and reinforce,” and probably even more accurate to say “mark and reinforce.”

Since he used “click and treat,” I’m using the same words in these notes, but you should remember that he is really looking at the larger question of how we use conditioned reinforcers and whether or not they always need to be followed by a primary reinforcer in order to maintain their effectiveness.

Back to the question…

Do you have to treat after every click?

Some say YES:

  • Otherwise the effectiveness of click may be weakened
  • Bob Bailey says: “NEVER sound the bridging stimulus idly (just to be ‘fiddling’) or teasing…it’s important that the “meaning” of the bridging stimulus is kept unambiguous and clear. It should ALWAYS signify the same event- The primary reinforcer.”  How To Train a Chicken (1997) Marian Breland Bailey, PhD and Robert E Bailey.
  • This view is supported by research that shows that the conditioned reinforcer should be a reliable predictor of the unconditioned reinforcer.

Some say NO:

  • Once a click is charged, you only have to treat occasionally
  • Once a behavior is learned, you only have to treat occasionally
  • Supported by research on extinction (in general, this means that if an animal learns that not every correct answer is reinforced, then it will keep offering the correct answer for some period of time, even if there’s no reinforcement).

So maybe there is some research for both.

He said that he started thinking about this question again after reading a blog by Patricia McConnell, who was sharing some thoughts on whether or not to treat after every click. She was wondering why clicker trainers recommend it, but other positive reinforcement trainers do not.

Patricia McConnell wrote:

  • “For many years I have wondered why standard clicker training always follows a click with a treat.”
  • “Karen Pryor strongly advocates for us to reinforce every click (secondary reinforcer) with a treat (primary reinforcer). Ken Ramirez, one of the best animal trainers in the world, in my opinion, always follows a click with a treat.”
  • “But Gadbois went farther, given the link between motivation and anticipation, suggesting that it was important to balance the “seeking” and “liking” systems, with more emphasis on the former than the latter during training. He strongly advocates for not following every click (which creates anticipation) with a treat, far from it, for the reasons described above.”

You can read the blog at: http://www.patriciamcconnell.com/theotherendoftheleash/click-and-always-treat-or-not.

If you have not heard of Simon Gadbois, you can read about him here: https://www.dal.ca/academics/programs/undergraduate/psychology/a_day_in_the_life/professors/simon-gadbois.html.

What happens if you don’t treat after every click?

Jesús was intrigued by Gadbois’s statement that you don’t want, or need, to treat after every click because you want to balance “liking” with “seeking,” and that if you don’t treat after every click, you get more seeking.

One reason for his interest was that he already knew of an experiment that had looked at what happens if you don’t follow every click with a treat.  About 10 years ago, one of his students wanted to compare the behavior of a dog trained under conditions where one click = one treat with the behavior of a dog trained with multiple clicks before the treat.  The two conditions looked like this:

  • one click = one treat:  The trainer clicked and treated as normal after every correct response:  cue -> behavior -> click -> treat -> cue -> behavior ->click -> treat.
  • two clicks = one treat:  The trainer clicked for a correct response, cued another behavior and clicked and treated after the correct response: cue -> behavior -> click -> cue -> behavior -> click -> treat.

These dogs were tested by asking for previously trained behaviors. Each dog was trained under both conditions so some training sessions were under one click = one treat and some were done under two clicks = one treat.  There were multiple reversals so the dogs went back and forth between the two conditions several times over the course of the experiment.

Under the one click = one treat condition, the dogs continued to perform as they had in training sessions prior to the start of the experiment. Under the two clicks = one treat condition, both dogs showed frustration behaviors and a deterioration in behavior, and at times a dog would leave the session.

There were many factors that could have contributed to the result: the dogs were originally trained under one click = one treat, the reversals themselves could have caused confusion, and the dogs might have done better if they had been transitioned more gradually.  But it was pretty clear that omitting the treat did not activate the seeking system. Instead, it created frustration. Why?

They considered two possibilities:

  • Perhaps because they were getting less food? Under the one click = one treat condition, each dog was getting twice as much food reinforcement for the same number of correct responses as it got under the two clicks = one treat condition.
  • Properties of the click had changed.  What does the click mean to the dog?

Can we test if it’s about the decrease in food reinforcers?

If you want to test what happens when you click without treating, you have to change the ratio of clicks to treats. You can do that by omitting some treats, or by adding some clicks. But both options are probably not going to be perceived in the same way by the animal.

In the experiment described above, the trainer changed the ratio of clicks to treats by omitting food reinforcers after half the clicks. This is a significant decrease in the number of primary reinforcers that the dog was receiving. Could the results be more about the reduction in food reinforcers, than about whether or not each click was followed by a treat?

One way to test this would be to keep the number of food reinforcers the same, but add another click.  To do this, the trainer taught the dog to do two behaviors for one click.  The dog would touch two objects. When he touched the second object, he would get clicked and treated.

Once this behavior had been learned, the trainer added another click by clicking for the first object, clicking for the second object and then treating. So the pattern would be behavior (touch) -> click -> behavior (touch) -> click -> treat. This works out to the same pattern as two clicks = one treat, with a treat after every second click, but the trainer got there by adding a click, not by removing a treat.
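To make the arithmetic behind the two experiments concrete, here is a small Python sketch (mine, not something shown in the talk) that counts the clicks and treats in one cycle of each pattern. The protocol labels and the `summarize` helper are made up for illustration.

```python
from collections import Counter

# One repeating cycle of events for each pattern described above.
# The labels are illustrative names, not terms used in the presentation.
PROTOCOLS = {
    "one click = one treat": ["behavior", "click", "treat"],
    "two clicks = one treat (treat removed)": [
        "behavior", "click", "behavior", "click", "treat"],
    "two behaviors = one click": ["behavior", "behavior", "click", "treat"],
    "two behaviors, click added": [
        "behavior", "click", "behavior", "click", "treat"],
}


def summarize(cycle):
    """Return the click:treat ratio and the treats earned per behavior."""
    counts = Counter(cycle)
    return {
        "clicks_per_treat": counts["click"] / counts["treat"],
        "treats_per_behavior": counts["treat"] / counts["behavior"],
    }


for name, cycle in PROTOCOLS.items():
    print(name, summarize(cycle))
```

In both experiments the clicks per treat go from 1 to 2. Only in the first experiment do the treats per behavior drop (from 1 to 0.5). In the second, they stay at 0.5 before and after the extra click is added, which is why it separates the effect of the extra click from the effect of less food.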

What she found was that the dog just got confused.  The dog would orient to the trainer on the first click, get no response, go back to the objects and touch again (either one).  Or he might just wait and look at the trainer, or he might leave. The additional click didn’t seem to promote seeking. Instead it interrupted the behavior and created confusion.

Why?  Well, perhaps it has to do with the two functions of conditioned reinforcers. This goes along with the second possibility above, which is that the difference was due to how the click was being used.

The 2 Functions of Conditioned Reinforcers:

Let’s take a moment and look more closely at conditioned reinforcers.  Conditioned reinforcers are stimuli that become reinforcers through association with other reinforcers.  They usually have no inherent value. Instead, their value comes from being closely associated with another strong reinforcer for a period of time (while it is being “conditioned”), and this association must be maintained through regular pairings in order for the conditioned reinforcer to retain its value.

In training, this is usually done by deliberately pairing the new stimulus with a primary reinforcer.  There are different kinds of conditioned reinforcers and their meaning and value will depend upon how they were conditioned and how they are used.  Marker signals (the click), cues, and keep going signals (KGS) are all examples of conditioned reinforcers.

Regardless of the type, all conditioned reinforcers have two functions. They are:

  • Reinforcing
  • Discriminating (they can function either as cues or event markers, or both)

Conditioned reinforcers are not just used in training and laboratory experiments.  They are everywhere.

Jesús used the example of a sign, which is a conditioned reinforcer for someone driving to a specific destination.  Let’s say you are driving to Boston and you see a sign that says “Boston, 132 miles.” The sign provides reinforcement because it tells you that you are going the right way. It also has a discriminatory function because it provides information about what to do next, telling you to stay on this road to get to Boston.

When talking about conditioned reinforcers, it’s easy to focus on only one of these functions.  Is this why there is confusion?  Perhaps the debate over whether or not to treat after every click is because some trainers are focused on the discriminating function of the click and others are focused on the reinforcing function of the click?

What does training look like if the focus is on the discriminating function?

When every click is followed by a treat, the click has a very specific discriminating function. It tells the animal it has met criteria and reinforcement is coming.  The trainer can choose what the animal does upon hearing the click (stop, go to a food station, orient to the trainer), so she has to decide which behavior she wants. But, regardless of which one she chooses, the click functions to cue another behavior, which is the start of the reinforcement process.

A lot of one click = one treat trainers emphasize the importance of the click as a communication tool.  There are two aspects to this. One is that it marks the behavior they want to reinforce and the other is that it tells the animal to end the behavior and get reinforcement. If the click is always followed by a treat, the meaning of the click remains clear and it provides clear and consistent information to the animal.

You can think of the click -> treat as part of a behavior chain, where the click has both a reinforcing function, from the association (click = treat), and also an operant function (click = do this).  Clicker trainers who promote the one click = one treat protocol still recognize that the click itself has value as a reinforcer, but they choose to focus on the click as an event marker and as a cue, more than as a reinforcer.

What does training look like if the focus is on the reinforcing function?

A lot of trainers who treat intermittently (not after every click) emphasize that the click is a reinforcer in itself, so it’s not necessary to also provide a treat after every click. They are looking at the reinforcing function of a conditioned reinforcer and would argue that the whole point of having a conditioned reinforcer is so that you don’t have to follow it with another reinforcer every time.

They are still using the discriminating function of the click because it can be used to mark behavior.  But, the click does not become an accurate predictor of the start of the reinforcement phase, so it is not going to have the same cue function as it does under the one click = one treat condition.

Jesús did mention that if the click is not a reliable cue for the start of the reinforcement process, then the animal will look for a more reliable way to tell when it will be reinforced. In most cases, the animal finds a new “cue” that tells it when to expect reinforcement and the click functions as a Keep Going Signal. If the animal can’t find a reliable cue for the start of reinforcement, or if it’s not clear when the conditioned reinforcer will be followed by reinforcement, and when it won’t, then he will get frustrated.

Back to the Literature…

With this information in mind, what can we learn by going back and looking at the research on conditioned reinforcers?  Well, it turns out that the literature is incomplete for several different reasons:

  • It doesn’t look at the cue function of the conditioned reinforcer.
  • Animals in the lab are often restrained or constrained (limited in their options) so the cue function of the conditioned reinforcer may be more difficult to observe.
  • It doesn’t take into account that the most consistent predictor of food is the sound of the food magazine as it delivers the reinforcement.   Even when testing other conditioned reinforcers, the sound of the food magazine is what predicts the delivery of the food reinforcement, and it’s on a one “sound” = one “treat” schedule.
  • To test a conditioned reinforcer that is sometimes followed by food and sometimes not, you would have to use two feeders, one with food and one without, and even then you would have to worry about vibrations. Most labs are not set up with two feeders so this work has not really been done.

 
He also mentioned that a lot of what we know about conditioned reinforcers in the lab is from research where the conditioned reinforcer was used as a Keep Going Signal (KGS), and not as a marker or terminal bridge.

I asked Jesús if he had an example of an experiment using a conditioned reinforcer as a KGS and he sent me an article about a study that looked at the effect of conditioned reinforcers on button pushing in a chimpanzee.

The chimpanzee could work under two different conditions. In one condition, he had to push the button 4,000 times (yikes!) and after the 4,000th push, a light over the hopper would flash and his food reinforcement would be delivered. In the other condition, he also had to push the button 4,000 times, but a light would flash over the hopper after every 400 pushes, and then again at the end when the food was delivered after the 4,000th push.

The chimpanzee was tested under both conditions for 31 days. The results showed that when he was reinforced by the flashing light every 400 pushes, he worked faster and with fewer pauses on his way to the 4,000th push.

Once the chimpanzee had been tested under both conditions for 31 days, they started the second part of the experiment.  In this part, the chimpanzee could choose the condition (by pressing another button) and he usually chose the one where the light flashed after every 400 pushes.

So, having a Keep Going Signal improved the speed at which the chimpanzee completed the 4,000 pushes and was also the condition preferred by the chimpanzee.  This suggests that Keep Going Signals can be useful and an animal may prefer to get some kind of feedback.
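If it helps to see the two conditions laid out, here is a rough Python sketch of when the light would flash in each one. The function name `flash_points` is made up for illustration, and I am assuming that the flash at the 4,000th push is the same flash that accompanies the food.

```python
def flash_points(total_pushes, flash_every=None):
    """Return the push counts at which the light over the hopper flashes.

    flash_every=None models the condition with no intermediate flashes.
    Assumption: the final flash (with food) falls on the last push.
    """
    points = []
    if flash_every:
        # Intermediate flashes: no food, just the light.
        points = list(range(flash_every, total_pushes, flash_every))
    points.append(total_pushes)  # final flash, delivered with the food
    return points


no_kgs = flash_points(4000)
with_kgs = flash_points(4000, flash_every=400)

print("Flashes without intermediate signal:", len(no_kgs))
print("Flashes with a signal every 400 pushes:", len(with_kgs))
print("Flashes not followed by food:", len(with_kgs) - 1)
```

Under these assumptions the second condition has ten flashes per run, and nine of them are not followed by food, which is what makes the light work as a Keep Going Signal rather than as a predictor of food delivery.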

In this experiment, the conditioned reinforcer they were testing (the flashing light) was functioning as a KGS and the sound of the food magazine was what told the chimpanzee that he had met criteria.  So, this is an interesting experiment about conditioned reinforcers as Keep Going Signals, but it also shows the difficulty of separating out the conditioned reinforcer from the stimulus that predicts food delivery.

An example of training a KGS with a dog

Jesús talked a little bit more about Keep Going Signals, using an example from one of his own students. She wanted to teach her dog a new conditioned reinforcer that she could use as a KGS. She started by teaching the dog to touch an object for a click and treat. Once the dog had learned the behavior, she said “bien” (her new KGS) instead of clicking, and waited for the dog to touch the object again. If the dog repeated the touch, then she would click and treat.

She was able to use the KGS to ask the dog to continue touching an object and I think she tested it on other objects. You do have to train a KGS with multiple behaviors in order for it to become a KGS, as opposed to a cue for a specific behavior. I don’t know if she tested it with other behaviors, but that would be the next step. I’m also not sure if they compared the dog’s performance, with and without the KGS, to see if adding a KGS increased the dog’s seeking behavior, as Gadbois had suggested it would.

Conclusion

The difficulty with the question “Do I have to treat after every click?” is that the answer depends upon how you are using the click and whether or not it cues the animal to “end the behavior” and expect reinforcement. Conditioned reinforcers have two functions. They function as reinforcers and as discriminators, and you need to consider these functions when choosing how to use the click.

If you are using the click as a Keep Going Signal, the animal learns to continue after the click and the click does not interrupt  the behavior.  This means you can click multiple times before delivering the terminal reinforcer. However, it’s likely that you will end up having a different cue that tells the animal when it has completed the behavior and can expect reinforcement. If you don’t, the animal may become confused about what it should do when it hears the click.

If you are using the click to indicate when the behavior is complete, the animal learns that the click is a cue to start the reinforcement process.  You can teach the animal a specific response to the click so that the animal knows what to do to get his reinforcement. If the click is being used in this way, then it will interrupt the behavior and you will want to wait until the behavior is complete before clicking.

We call both these types, the click as a KGS and the click as an “end of behavior” cue, conditioned reinforcers, but they are not the same thing. There are many kinds of conditioned reinforcers, and when you are not specific, it’s easy to think you are talking about the same kind, but you are not.  So both “camps” may be right, but for the wrong reasons.

Jesús finished by saying we need to study this more carefully in the laboratory and also in real life training situations.  One point he made was that an animal that initially learned one click = one treat could probably be re-trained to understand that the click was a KGS if the transition was done more slowly than it was for the dogs in his student’s experiment. But he still thinks it would change the meaning of the click from an “end of behavior” cue to a “keep going signal.”

I thought this was a very interesting talk, partly because it shows how important it is to clearly decide how you are going to use conditioned reinforcers and to make sure that you teach your animal what it means. I don’t think it was intended to be the final word on a complicated subject, but the presentation certainly made me more aware of the importance of thinking about the many functions of conditioned reinforcers and how I am using them.

But… I’m not sure it left us with an answer to the question of what happens when the same conditioned reinforcer is used as both a KGS and to end the behavior, which is how many trainers describe their practice of clicking multiple times before delivering the terminal reinforcer. There needs to be research done on what happens if it is used as both.

A few personal thoughts

This presentation was informative, and made me feel more confident about the system I use, but it also left me with some unanswered questions.

I have always followed a click with a treat. It is how I originally learned to clicker train and it has worked well for me. If I want to use a different reinforcer, I have a different marker. If I want to provide information or reinforcement to the horse without interrupting the behavior, I have several other conditioned reinforcers I can use.

It’s never made sense to me to have the same conditioned reinforcer sometimes be a cue to “end the behavior” and sometimes be a cue to “keep going.” I question if that’s even really possible, unless the animal learns it has different meanings under different conditions, and that seems a bit awkward. It just seems simpler to have clearly defined conditioned reinforcers and use them in a consistent manner.

I was intrigued by the research into Keep Going Signals. I do use Keep Going Signals and have found them to be useful. But I have also found that I have to pay attention to maintaining them in such a way that they retain their value (through pairing with other reinforcers), but don’t become reliable predictors of reinforcement and morph into “end of behavior” cues. I’d love to see more research on how to effectively maintain Keep Going Signals, as well as some research on how effective they are at marking behavior.

Notes from the Art and Science of Animal Training Conference (ORCA): Dr. Paul Andronis – Adjunctive Behavior – So What Else Is Going On?


Dr. Paul Andronis is a professor at Northern Michigan University where he is an expert in the experimental and applied analysis of behavior.  In his presentation, he shared some information on “adjunctive” behavior both from an academic and a practical viewpoint. He discussed several varieties of adjunctive behavior, how it differs from other types of behavior, and the necessary conditions under which it occurs. He also talked about how it should be classified and why the concept of adjunctive behavior is relevant for animal trainers.

 
The history of adjunctive behavior

Adjunctive behavior was first identified in the laboratory as “schedule-induced” behavior (Falk 1961). It was observed in experiments where an animal was subjected to the same schedule or limited contingency over and over.

In this scenario, you would expect to see some regularity of behavior, and that the animal would become very efficient at doing what was necessary to earn reinforcement.  But, in some cases,  what they saw was that the animals were doing a lot of extra and unnecessary behavior.  The types of behaviors they observed, and how much these “extra” behaviors occurred, varied depending upon many factors. But they were often tied to reinforcement schedules or something about the environment, usually the experimental set-up.

Even though adjunctive behavior was not described until 1961, it’s likely that it had been present in other experiments and was ignored or not recognized as being of interest. Or perhaps it had occurred infrequently because the set-up for many experiments had included some level of constraint or restraint that limited the animal’s behavior. Pavlov’s dogs were physically restrained and Skinner’s rats were placed in operant conditioning chambers that offered few options for alternate behaviors.  In any case, it was not until 1961 that adjunctive behavior started to receive attention.

Dr. Andronis did mention that it was around this time that Keller and Marion Breland published their article “The Misbehavior of Organisms.” The article described how some behaviors were difficult to train in certain animals if the behavior they wanted to train conflicted with a strong natural behavior. For example, the Brelands wanted to train a raccoon to put money in a slot, but the raccoon wanted to wash, or would go through washing actions with,  objects that it was given. Even though washing was not part of the behavior the Brelands wanted, or were intending to reinforce, they found it was very difficult, if not impossible, to eliminate it.  The fact that they could not control this “misbehavior” through a reinforcement contingency was used by some scientists to try and discredit operant work.

As part of the  discussion about how experimental set-up can influence results, Dr. Andronis talked about some of his own work training key pecking with pigeons. He said that, in one of his experiments, they had to use some older keys that were stiffer than the usual keys.  They found that the pigeons pecking the stiff keys produced a different pattern of behavior than the pigeons pecking lighter keys.  I think he was comparing results from two different experiments here, not saying they deliberately compared two types of keys.

When the pigeons were put on a fixed rate schedule, they saw a nice scalloping pattern (surges of pecking followed by pauses) with the birds pecking the stiff keys.  But the birds pecking lighter keys pecked at a steadier, more consistent rate. The reason he shared this example was that the birds pecking the stiffer keys had time to do adjunctive behaviors because there were gaps in the pecking behavior. The birds with the lighter keys were busy pecking and less likely to do other behaviors. So, something as simple as key stiffness could make a difference in whether adjunctive behavior was likely to occur or not.

Adjunctive behavior in the literature

The history of adjunctive behavior can be difficult to trace because it goes by many different names in the literature, and while these names may refer to the same phenomena, they may also be used for similar but different phenomena as well.

Some of the most common names in the literature are:

  • Schedule-induced behavior
  • Collateral responses
  • Adjunctive behavior
  • Ancillary behavior
  • Interim activities (or behavior)
  • Behavioral side-effects
  • Psychogenic behavior

Some of these terms are used more when referring to behaviors that are repeated because they end up being part of the behavior cycle (ABC cycle), even though they are not part of the criteria for reinforcement.  Other terms are more likely to be used to refer to behaviors that the animal does between opportunities for reinforcement, but that seem to be indicative of stress or frustration, such as might happen when the animal is placed on a lean reinforcement schedule.

For example, the term “collateral response” is usually used to describe behavior that is repeated because it gets reinforced along with the behavior the trainer wants the animal to do.  This might happen if the rat presses a lever, gets food, and goes through a whole cycle of behavior before it is reinforced for lever pressing again.  From the animal’s perspective, it had to do the entire cycle to get reinforcement, because all the behavior in the cycle was reinforced along with the target behavior. The animal doesn’t know that it didn’t actually have to do the extra behavior and that it was just filling time.

On the other hand, behavioral side-effects or psychogenic behavior are terms that are more likely to be used to describe behaviors that occur at a higher frequency when an animal is stressed. These behaviors are usually not done intentionally by the animal.  Examples of these types of behavior are excessive urination and defecation.

I want to mention that professional animal trainers are unlikely to use the terms I listed above, which are used more in academic circles, but they certainly do recognize that adjunctive behaviors exist.  They just have their own names for them and may also have slightly different definitions.  Some common names for “extra” behavior that occurs when trying to train something else are displacement behaviors, stress or frustration behaviors or superstitious behaviors.

Okay, so with all this background information…

How do we define adjunctive behaviors?

Critical attributes of the concept.

An adjunctive behavior is a behavior that:

  1. Reliably accompanies another operant behavior targeted by experimenter-programmed contingencies;
  2. Is not explicitly required for meeting the requirements of those (E-programmed) contingencies;
  3. Is not reinforced (directly or adventitiously) by those contingencies maintaining the operant behavior they accompany; and
  4. Is occurring at rates considered excessive under the particular procedural arrangements.
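One way to read the four attributes above is as a checklist where all four have to hold at once. Here is a minimal Python sketch of that reading. The class name, field names, and the two example observations are hypothetical and are only meant to show how the criteria combine. They are not data from the talk.

```python
from dataclasses import dataclass


@dataclass
class ObservedBehavior:
    """One observed behavior, described by the four attributes above."""
    accompanies_target_operant: bool  # 1. reliably accompanies the operant
    required_by_contingency: bool     # 2. explicitly required to earn reinforcement
    reinforced_by_contingency: bool   # 3. reinforced, directly or adventitiously
    rate_is_excessive: bool           # 4. rate is excessive for the arrangement


def is_adjunctive(b):
    """True only when all four critical attributes are satisfied."""
    return (
        b.accompanies_target_operant
        and not b.required_by_contingency
        and not b.reinforced_by_contingency
        and b.rate_is_excessive
    )


# Hypothetical example: heavy drinking between food deliveries.
drinking = ObservedBehavior(True, False, False, True)

# Hypothetical example: an extra movement that gets reinforced along
# with the target behavior, so it fails attribute 3.
extra_movement = ObservedBehavior(True, False, True, False)

print(is_adjunctive(drinking))
print(is_adjunctive(extra_movement))
```

Under this strict reading, a behavior that is being accidentally reinforced would not count as adjunctive, even if a trainer would still call it "extra" behavior.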

Various types of adjunctive behaviors have been observed in laboratory settings. Here’s a list of some of them:

  • Polydipsia (excessive drinking)
  • Air-licking
  • Wheel running – common, but varies among species
  • Aggression – also fairly common
  • Bolt-pecking
  • Pica
  • Locomotor activity
  • Paw grooming
  • Defecation
  • Wing-flapping
  • “displacement preening”
  • Escape from schedule-requirements
  • Cigarette smoking (I hope this is among human subjects…)
  • Alcohol consumption (most animals avoid alcohol, but will drink it under certain conditions)
  • Chronic hypertension
  • Nitrogen “drinking”
  • Pellet “pouching”
  • Self-injection with nicotine

When and where does adjunctive behavior happen?

Most adjunctive behavior has been observed in the laboratory (is this suspicious?) and it has been studied in rats, pigeons, mice, hamsters, gerbils, humans, and rhesus macaques.

Dr. Andronis pointed out that adjunctive behavior is more likely to occur when animals are put on certain types of reinforcement schedules. This is where the name “schedule-induced behavior” comes from.  It is more likely to occur on schedules that have longer intervals between reinforcement, or when reinforcement is not delivered as expected.  It also tends to occur at predictable places in the schedule.

The type of schedule-induced behavior can tell you something about the underlying state of the animal.  An animal that is showing aggressive behavior is in a different state than one that thinks it has to do some extra body movement to earn reinforcement.

But, Dr. Andronis did point out, later in the presentation, that some schedule-induced behavior is simply behavior that has been reinforced as part of the reinforcement contingency.  So, even within the category of schedule-induced behavior, you can have adjunctive behavior that is operant, and is being done deliberately, or respondent, and is more a reflection of how the animal is feeling.

I think his point was that the type of behavior can be useful information, or not. You may have to collect additional information before you can interpret it. And of course, a behavior might initially occur because the animal is frustrated and then be maintained or increase because it is being reinforced.  One of the challenges of schedule-induced behavior is that it is probably the result of several variables, not just the schedule, and it’s not always clear how all the variables interact.

I think it’s interesting that most of the behaviors on the list above are ones that you would not want to have happen during training, and some are ones you wouldn’t want to have happen at all, because they indicate that the animal is stressed.

Some examples of adjunctive behavior:

Polydipsia (excessive drinking):

  • Polydipsia was the schedule-induced behavior identified by J. Falk in 1961. He was studying rats and wanted to measure water intake under natural conditions. When he changed his reinforcement schedule to an F1 (1 minute between deliveries of reinforcement), he saw more drinking. The rats would drink half their body weight in under one hour.
  • Dr. Andronis had an example of this with people living in a hospital.  The residents would walk the halls because they were bored.  At the end of each hall, there was a water fountain and they would stop to get a drink, probably just for something to do. This led to problems because the excessive urination affected their medicine and the hospital had to provide more activities to limit the hall walking and polydipsia behavior.

Aggression

  • In experiments where an animal becomes frustrated due to lack of expected reinforcement, there may be increased aggression toward other animals.
  • You may also see aggression due to resource guarding in experiments where food is delivered for lever pressing.  One animal may choose to guard the lever so another animal cannot have access to it.

Increase in Social Behaviors:

  • You can see schedule-induced changes in social behavior if you have multiple animals and multiple levers.  The animals may switch places to “help” another animal earn reinforcement.

Defecation:

  • Schedule-induced defecation has been observed in rats. In one experiment, the researcher noticed that the rats were producing more feces than expected, and also that they were defecating at unusual times.  He discovered that the amount of defecation could be manipulated by changing reinforcement schedules and was also affected by whether or not a water bottle was present.  (Rayfield, 1982)

An experiment that showed how to generate different kinds of adjunctive behavior

Along with the examples of types of adjunctive behavior, Dr. Andronis also described an experiment he did with pigeons.  In the experiment, he had pigeons pecking keys under three different conditions. They were:

  • Hard: the bird had to peck a lot to earn reinforcement
  • In-between: moderate amount of pecking
  • Easy: few pecks required to earn reinforcement

Interestingly, the “in-between” schedule was the one that produced either aggression or social behavior. On the easy schedule, the birds were not stressed. On the hard schedule, the birds were too busy to do anything else. But the in-between one created some frustration and also provided opportunities for other behaviors.

The experiment had several steps:

  1. He taught the birds the schedules. Each bird was in a cage with a key.
  2. He added two additional keys (in another location in the same cage) that allowed the bird to change the schedule. One key made the schedule harder, the other made it easier.  The pigeons all learned to set their own requirements to “easy.”
  3. He switched the function of the keys so the key that used to make it easier now made it harder.  This led to some frustration because the bird would choose the key for “easy” and get frustrated that it no longer worked as it had before. But, the pigeons did learn that when that happened, they should choose the other key. And they also learned that the functions of the keys would change and would anticipate the change and start to peck the other key even before he switched it.
  4. He set up a social experiment by placing two birds in side by side cages. I can’t remember the exact arrangement, but it was designed so that one bird could control the “difficulty” level for the other bird, meaning it got to decide if the other bird’s schedule was easy, in-between, or hard.
  5. He observed the behavior of the two birds.  He found that the bird who could control the other bird’s level of difficulty would consistently choose to peck the key that made it harder for the other bird to get food.
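The key arrangement in steps 2 and 3 can be sketched in a few lines of Python. This is only an illustration: the peck counts, the key names "A" and "B", and the `Chamber` class are all made up here, since the talk did not give those details.

```python
from dataclasses import dataclass

# Hypothetical peck requirements; the talk gave no actual numbers.
LEVELS = ["easy", "in-between", "hard"]
PECKS_REQUIRED = {"easy": 5, "in-between": 25, "hard": 100}


@dataclass
class Chamber:
    level: int = 1         # index into LEVELS; starts at "in-between"
    swapped: bool = False  # True once the key functions are switched (step 3)

    def press(self, key: str) -> str:
        """Press key "A" or "B" and return the name of the new schedule."""
        # Before the switch, "A" makes the schedule easier and "B" harder.
        makes_easier = (key == "A") != self.swapped
        step = -1 if makes_easier else 1
        # Clamp so the schedule never goes below "easy" or above "hard".
        self.level = min(max(self.level + step, 0), len(LEVELS) - 1)
        return LEVELS[self.level]


chamber = Chamber()
print(chamber.press("A"))   # schedule gets easier
chamber.swapped = True      # the experimenter switches the key functions
print(chamber.press("A"))   # the same key now makes it harder
print(chamber.press("B"))   # so the bird has to choose the other key
```

In the social version (step 4), one bird's key presses would be wired to the other bird's chamber instead of its own.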

Main Laboratory findings

He provided a brief summary of what they have learned about adjunctive behavior from work in the laboratory.

  • It’s often a “post-reinforcement phenomenon” (it happens right after reinforcement when there is a delay before the next reinforcement is available)
  • Rates of occurrence vary as an “inverted-U” function of the inducing-schedule parameters. The rate rises over some range of schedule requirements and then drops off, similar to a dose-response curve.
  • They tried to show that it was not reinforced by the inducing contingency. Scientists would insert a COD (change-over delay) to try to separate the adjunctive behavior from the reinforcement, so that it was not accidentally reinforced. But you don’t know how the animal experiences the COD, and some scientists argue that a separation doesn’t mean there’s no effect.
  • Probably related to the potentiating effect of the inducing schedule on SDs or reinforcers specific to the induced behaviors.
  • Idiosyncratic patterns of substitutability among induced behaviors.

Where it fits, in theory

Scientists like to know why behavior occurs, so once they identified and started studying adjunctive behaviors, they came up with some theories about them.

Are they? 

  • Adventitiously-reinforced, superstitious behaviors? (Skinner). This is a reference to Skinner’s experiment on the development of superstitious behavior (1948).
  • A third class of behavior? (Wetherington, 1982), i.e., respondent, operant and contingency-induced. She concluded that they were not a third class of behavior and that trying to put behaviors into respondent/operant categories was problematic.
  • “Induced states” (Staddon) – purely a function of reinforcement schedules that induce certain behaviors, particularly drinking.
  • “Just plain operant behavior,” related to joint-environmental effects of programmed contingencies

Of these, Dr. Andronis favors the operant contingency relation. If you consider that animals are balancing costs and benefits all the time, it seems possible that there is an operant aspect to adjunctive behavior.

This doesn’t really tell us where adjunctive behaviors come from, but the theory that adjunctive behavior is operant fits in with some other things we know about behavior. It appears consistent with the innate response hierarchies posited by Lorenz and other ethologists, and with the probabilistic model in Epstein’s generativity theory. According to Epstein’s theory, creativity or the generation of “novel behavior” is predictable and you can calculate probabilities for the different options.

Though the languages differ, both acknowledge:

  • Prior histories of occurrence for the behaviors involved;
  • Differential probabilities of the response classes in repertoire depend upon the presence of specific antecedent stimuli (“innate releasers” or SDs) and/or other potentiating variables (“motivational” variables);
  • When the currently highest probability behavior in the repertoire is momentarily interrupted (e.g., suppressed by punishment, rendered less likely by breaking the contingency, disrupted by changes in background stimuli, physically restrained or otherwise prevented from occurring, etc.), the next most probable behavior in the repertoire occurs, made more probable particularly by its specific antecedent stimuli being present.

Ok, there’s a lot of jargon in there. As best I understood, what he was saying was that adjunctive behaviors are behaviors that already exist in the animal’s repertoire and they are “released” by stimuli in the environment under certain conditions.  One of the conditions under which this happens is if the animal’s ability to earn reinforcement is interrupted.

An example of an experiment that supports this theory was described by Dr. Joe Layng in the question and answer session after the presentation. In the first part of the experiment, he and Dr. Andronis taught pigeons to do a new behavior, one that was not a natural behavior for pigeons. The behavior they chose was head banging. Yes, the pigeons would bang their heads against the wall for reinforcement. But don’t worry, they made them wear helmets. (Yes, this is true – he showed us a picture.)

After they trained this behavior, they put the pigeons on a reinforcement schedule for pecking that was likely to produce schedule-induced (adjunctive) behavior. And what did the pigeons do for the adjunctive behavior? Head banging. Head banging was a behavior with a previous history of high reinforcement, so when the pecking was interrupted, it’s the behavior that re-appeared.
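For readers who like to see ideas written out, here is a rough Python sketch of the "next most probable behavior" idea as I understood it. The behaviors and numbers are invented for illustration and are not data from the experiment. The function `next_behavior` is my own simplification, not Epstein's actual model.

```python
def next_behavior(repertoire, interrupted=()):
    """Return the most probable behavior that is not currently interrupted.

    repertoire: dict mapping a behavior name to its relative probability.
    interrupted: behaviors that currently cannot occur or no longer pay off.
    """
    available = {b: p for b, p in repertoire.items() if b not in interrupted}
    if not available:
        return None
    return max(available, key=available.get)


# Hypothetical relative probabilities for a pigeon with a trained
# history of head banging (invented values, for illustration only).
pigeon = {"key pecking": 0.6, "head banging": 0.3, "wing flapping": 0.1}

print(next_behavior(pigeon))                               # key pecking
print(next_behavior(pigeon, interrupted={"key pecking"}))  # head banging
```

The point of the sketch is only that the "adjunctive" behavior does not come from nowhere: it is whatever is next in line in the animal's existing repertoire when the top behavior is interrupted.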

Increasing the likelihood of contingency adduction

In the experiment described above, Dr. Layng and Dr. Andronis intentionally introduced an undesirable behavior, but the same process can be used to produce new behaviors through “contingency adduction.”  This goes back to the idea that adjunctive behaviors are operant and can become part of the behavior.

It is related to Dr. Epstein’s work on creativity in which he showed that what we think of as being creative is often just a new combination of some known behaviors.   What you want to do is teach several behaviors separately and then set up conditions where the learner combines them.

He stated that:

  • Evocative environmental arrangements can introduce controlled variability into behavioral stream (canalization).
  • They can make possible (or highly likely) some novel combinations of existing and emerging repertoires that in turn meet new and more complex contingency requirements posed by the trainer.

He showed a picture of Freud’s office, which was filled with lots of weird stuff.  This makes it easy for the “patient” to do the ice breaking and is a way to evoke behaviors without doing a lot of prompting. It’s also useful because history affects how a stimulus is perceived, so an individual’s response to different stimuli can tell you a lot about what has happened to them in the past.

For animal trainers, the equivalent of this would be setting up the environment to get some natural behaviors going.  Behavior is never isolated. You want to be alert to instances when behavior happens systematically and take advantage of it.  Depending upon the type of behavior you want, you may want to have more controlled variables – for motor behavior, or you can leave it more open ended if you are looking for more creativity or teaching cognitive tasks.

Here are a few points from the question and answer session:

If you sometimes reinforce the adjunctive behavior, you will get it at even higher rates than would be expected.  (this is why a mis-timed click that lands on an adjunctive behavior can be such a set-back). It doesn’t take much reinforcement to maintain it, once you have an adjunctive behavior going.

Dr. Killeen said they have done work that showed that a delay is not a guarantee of separation. You can train a rat to press a lever and wait 30 seconds before reinforcement, and it will still learn to press the lever.

 

 

Notes from the Art and Science of Animal Training Conference (ORCA): Barbara Heidenreich on “Maintaining Behavior the Natural Way.”

Barbara Heidenreich is a professional animal trainer who does extensive consulting with zoos and also works with individuals training many different species. In her work as a consultant, she often finds herself in situations where she cannot rely solely on food for reinforcement, so she has learned to identify and use non-food reinforcers of many different kinds.

In her presentation, she shared some tips on finding and using non-food reinforcers.  This was a great follow-up to the discussions on using The Premack Principle because she had a lot of good examples of behaviors that could be used for reinforcement.  The examples included a lot of videos, which I cannot include here, but if you go to her website (www.goodbirdinc.com), or look her up on YouTube, she has lots of videos available.

Why food might not be the best option

She started with a discussion about why food might not be the reinforcer she chooses to use.  In some cases, it has more to do with the rules of the facility at which she is consulting, but there are lots of other reasons why food might not be an option, or might not be the best option.

Here’s her list:

  • The animal has no motivation for food
  • The diet is limited for health reasons
  • The diet was fed out already or needed for other purposes
  • She has no authorization to use food
  • The animal is fasting (snakes, alligators, etc.)
  • The same reinforcer can become predictable and less effective (she didn’t expand much on this but some trainers do believe that reinforcement variety is important)

While some of these limitations are less common with horses, I have found myself in situations where I could not use food for dietary reasons, because it was not allowed at the facility, or because the horse would not be able to eat during the training (this is usually only the case with dental or some medical procedures). I’ve also found that, in some situations, food may not be the best reinforcer.  So, having other options is always a good idea.

What do you do if you can’t use food?

If you feel it’s very important to be able to use food, you may need to address the issue directly by getting permission, coming back at a better time, or locating an appropriate food. But, in many cases, the better option is to try using non-food reinforcers. These can be just as effective as food reinforcers, or even more effective, when used correctly.

She had some examples of natural behaviors that could be used as reinforcers.  One example was a beaver who could be reinforced with permission to take browse back to her cage.  Another example was a bird that could be reinforced with a colored object that he could take back to his nest.

Identifying possible non-food reinforcers

The challenge for most people is that if they are not used to thinking about non-food reinforcers, then it’s hard to know where to start.  But it all starts by observing your animal.  What does it engage in? What does it seek to acquire? What are species specific trends? Is there social behavior that is reinforcing?

She stated that “ANYTHING an animal seeks to acquire/engage in/have access to/do and can be delivered contingently has the potential to reinforce behavior.”

Here are some types of non-food reinforcers that she has used:

  • Scent stimulation (smelling bedding from another animal can function as a reinforcer)
  • Tactile stimulation – decide where, when, how  (pigs love to be scratched)
  • Visual stimulation (penguins can be reinforced by chance to chase lasers.  She worked with a gorilla that was reinforced by the chance to look at the ultrasound screen.)
  • Auditory stimulation (people do this inadvertently all the time by responding to behavior with laughter or by verbal responses, also mating calls, conspecifics)
  • Social interactions (in parrots, reinforcers can be facial expressions, mimicry, head bobbing, chuffing, etc.)
  • Enrichment items (toys, boxes, blankets, etc.)
  • Mental stimulation (a chance to solve a problem can be reinforcing)
  • Physical activities (running, flying, destroying, dust bathing, etc.)
  • Access to preferred people, preferred animals, preferred locations (darkness, sunlight, water for bathing or washing food)

There are many subsets within each category and some reinforcers will fall into multiple categories.

Evaluating reinforcers (food and non-food)

Even though many behaviors can be reinforcing, you can’t assume that all of them can be used as reinforcement for other behaviors. Therefore, it’s a good idea to spend some time observing the animal so you become familiar with his body language and responses to different kinds of stimuli. An easy way to start this is by looking more closely at how he responds to food reinforcers.  Learning how he responds to food reinforcers will help you learn to read his body language when you offer opportunities to engage in behaviors as reinforcement.

Once you know how the animal responds to food it likes, you can use some of this information to help evaluate non-food reinforcers. It can help to compare a few different types of reinforcers to get a larger picture of the possible ways an animal can respond.

Evaluating food reinforcers:

When the animal is relaxed and comfortable and asked to do nothing:

  • Show the animal a food item and observe body language. Does the animal look at, lean toward, or orient toward the food? Are there anticipatory behaviors?
  • Offer a small item: how does the animal take it?
  • Observe how it eats the food: speed, using species specific behaviors?
  • What does it do when it’s done? Does it look for another piece? Engage in other activities?
  • Throughout: look for signs (species specific) that indicate the level of satiation

Evaluating non-food reinforcers:

The animal needs to be calm, relaxed, and in a distraction free environment before being offered the opportunity to engage in the activity.

  • Identify criteria for measuring motivation – are there behavioral signs that show the animal finds the activity reinforcing?
  • Are there ways to measure responses to the criteria?  How do you know when the animal wants more or is approaching satiation?

I think it’s a great idea to evaluate reinforcers before starting to use them. It’s so easy to jump into training with a reinforcer that you assume the animal will like, and then find that things don’t quite work out as planned.

The challenges of using non-food reinforcers

There are some differences between using food and non-food reinforcers, so if you are going to use a non-food reinforcer, you will want to think carefully about when and how it will be the most effective.

You also should consider if you need to spend time teaching the animal how it gets access to the reinforcer, how long it should expect to have the reinforcer, and how the reinforcement ends. Just as we teach our animals what to expect with food reinforcers, we need to teach them how we are going to use non-food reinforcers so they have clear expectations and we can use them effectively.

While the examples below touch on the need to have a clear understanding (on both sides) of how and when you are going to use the reinforcer, I wanted to emphasize it because I think it’s important to realize that you can’t just start using non-food reinforcers in place of food reinforcers without some preparation. This is especially true if your animal is used to food reinforcers and has some expectation about how behaviors will be reinforced.

Here are some of the challenges with non-food reinforcers:

  • You might have to wait longer between repetitions (doing a behavior can take more time than eating).
  • The animal might need to “give up” an item, so you need to plan for that (do you teach it to release it or set up the environment so the behavior naturally ends?). You might need to teach a drop, trade or retrieve behavior so the animal gives up what it has in order to do another repetition of the behavior.
  • If novelty is part of the reinforcement value, then you have to keep coming up with novel stimuli, which can be difficult.
  • Some non-food reinforcers are best used either before or after food reinforcers, and not mixed in with them. For example: if you try to mix food and non-food reinforcers with pigs, it won’t work because food is their #1 choice.  So use food before or in separate sessions. With other animals, it might be better to use food reinforcers after non-food reinforcers.
  • Some non-food reinforcers can build high levels of arousal which can flip to aggressive behavior.
  • Some non-food reinforcers can trigger unwanted behaviors (e.g., sexual behavior).
  • Some non-food reinforcers require a lot of trainer participation.
  • If using a tactile reinforcer, you may need to train a signal that tells the animal you are going to touch. This prepares the animal and also makes it easier to tell if the animal wants it. Wiggle fingers -> head scratch, repeat a few times, then delay and see if the animal will come forward to get it. If desired, you can add distance so the animal has to move toward the handler to get the reinforcer.

Video examples:

  • A bird was reinforced for putting a ring in a cup by getting to shred paper as reinforcement.
  • A tiger was shown shifting to a new cage for an enrichment item. The item was placed in the new location ahead of time (pre-baiting).
  • An orangutan was reinforced by the opportunity to play with a box if she sat in a particular spot on a log.
  • Other apes were reinforced with a chance to look in a mirror and to play with a blanket.
  • There were more, but I’ve forgotten all of them…

Benefits of Thinking Outside of the Treat Pouch 

  • Great when there is no motivation for food
  • Some reinforcers facilitate calm body language (tactile in pigs and tapirs) which can help with cooperation for medical care
  • Animals may not satiate on some types of reinforcers very quickly
  • Some may be even higher value than food under the right conditions
  • More opportunities for animals to communicate what they want
  • Adding variety of reinforcers can facilitate keeping motivation strong when in maintenance mode for a behavior or a repertoire of behaviors (you can extend your training session because you have more reinforcer options)
  • Creates an attentive, insightful, well-rounded trainer

She ended with a few other video clips.  One was of her rabbit doing an agility course. At the end, the rabbit returned to her crate and could choose her reinforcer. If she faced forward, she got a head scratch.  If she faced to the back, she was reinforced with food.

She also had a clip of a program she is developing where you can go for a walk with a hornbill. The bird accompanies the hikers and this behavior is reinforced in a variety of ways, both by the hikers and with natural reinforcers it finds along the way.

Final thoughts

All the information Barbara presented clearly showed the value of using non-food reinforcers. Since I work mostly with horses, I find it’s easy to end up using food for most reinforcement, but there have been times when non-food reinforcers were good options.  Still, I suspect I haven’t used them as much as I could.

I sometimes think it must be easier to see how to use them when you work with a variety of species and are exposed to lots of different types of behavior.  The differences help to broaden your thinking about how animals engage with their environment and what might be reinforcing. At the same time, when you see similar behaviors functioning as reinforcers for several species, it might make you wonder about whether or not those behaviors might be reinforcing for even more species.

I found Barbara’s presentation had a lot of great ideas for evaluating both food and non-food reinforcers and it has prompted me to look more carefully at how my horses respond to reinforcement.  Do I see signs of enjoyment?  Do I recognize when the horse wants more? When he has had enough?  When he would prefer a different reinforcer or to end the session? Her presentation was a good reminder that good trainers are constantly monitoring and adjusting reinforcers and that both food and non-food reinforcers can be used effectively.

Notes from the Art and Science of Animal Training Conference (ORCA): More on The Premack Principle

In the last article, I wrote about the Premack Principle which states that:

  • Behaviors are reinforcers, not stimuli.
  • More probable behaviors reinforce less probable behaviors.
  • Less probable behaviors punish more probable behaviors.

What can I do with this information? First, the idea of behaviors as reinforcers opens up many new possibilities for ways to reinforce behavior. And second, it means I need to pay more attention to the effect that each behavior is having on other behaviors that occur either before or after it.

In many cases, especially if I train with food, this information may not change much about what I do. But there are some types of training where knowing about the Premack Principle can make it easier to come up with a training plan that will be successful and will help you make decisions along the way. At the Art and Science of Animal Training Conference, there were two presentations that explained how to deliberately use the Premack Principle in your training.

They were:

  • Emily Larlham – Reversibility of Reward: Harnessing the Power of your Animal’s worst distractions
  • Alexandra Kurland – Do it again: The use of patterns in training

I am going to share some notes from each one separately and then end with a summary about what we have learned at the conference about the Premack Principle.

Emily Larlham – Reversibility of Reward: Harnessing the Power of your Animal’s worst distractions

Emily started us off with a video showing one of her dogs who loves to chase frogs. For this dog, frog chasing is clearly a behavior that has a high probability of occurring in certain environments. Then she shared a video of her dog walking calmly among some chickens, showing that in that scenario, chasing was actually a lower probability behavior.

Her question was “Would you like to get from this to this?”

Of course, most of us would say “yes”, but it can sometimes seem impossible to imagine how we can replace one behavior with another, especially if the dog has a previous history of doing the less desirable behavior, or if it’s hard to control the environment so we can train a new behavior in small steps.

But it is very possible, and she provided some simple guidelines for how to do it. She did make a point of saying that this topic could not adequately be covered in a 20-minute presentation, so she was just providing a basic outline and some examples in order to illustrate the process. If you want to learn more about how to do this, she has additional resources on her website (www.dogmantics.com).

Before getting into details, she brought up the question of whether or not we are brainwashing dogs. I thought this was a really interesting point because when we try to change behavior, we do always need to think about the function of that behavior and what effect it has on the well-being of the animals. We also need to think about if the new behavior can serve the same function as the original behavior, or if we need to change something else so the needs of the dog are still being met.

Every case is going to be different, but she has found that there can be positive changes in health and well-being when a dog learns a new, calmer way to respond to stimuli that would previously have caused a great deal of excitement and stress.

The steps:

  1. Train desirable behavior first, and maybe an interrupter
  2. Train controlled version of undesirable behavior (I’m not sure exactly what she means)
  3. Set up an environment in which the dog chooses to do the desirable behavior after being released to do the undesirable behavior. If using food, the dog must be calm enough to take treats.
  4. Add criteria/generalize for the final scenario

She had some great examples and video showing the steps. Here are some of them.

Example 1: Teaching a dog to come away from an item of interest.

  • She had a video of her dog learning to recall away from a snuffle mat. She also showed teaching a dog to recall away from a Frisbee.
  • In both cases, she started by making sure the reinforcer she provided for the recall was of higher value than the item of interest.
  • She also built the behavior in small steps, starting with a recall from a short distance or a recall before the dog got too close to the Frisbee.

Example 2: Teaching a dog to orient to the handler, even in the presence of a rabbit.

This was with an Irish Wolfhound and they started by practicing the desirable behavior, which was having the dog orient to the handler. The rabbit was introduced in stages. First, they had a person stand or walk around at a distance. Then they had the person carry the rabbit, and so on. The complete progression was:

  • Practice orienting to handler
  • Practice with a person (no rabbit) at a distance
  • Practice with  a person holding the rabbit, still at a distance
  • Practice with a person and rabbit placed on the ground, still at a distance
  • You can vary the distance as needed, maybe adding distance with each new step, then repeating it with less distance before moving on to the next step.
  • You may want to use an interrupter so the dog doesn’t practice getting too excited
  • You can decrease distance as part of each step or as a gradual process over the whole sequence, or some combination.

Example 3: She also showed some video of a dog that wanted to chase deer and would get very excited, jumping up and down and straining at the leash. She was able to teach the dog to remain calm, even when deer were present, and the dog can now be walked without trying to chase deer.

A few other points:

  • The distraction can become a cue for the new behavior. In the second example, seeing the rabbit triggers the desire to chase it. The trainer’s goal is to change the response of the dog so that the sight of the rabbit becomes a cue to do the behavior she wants. It’s important to build this slowly by starting with a “picture” that doesn’t elicit the undesired response and then slowly change it so the rabbit cues the new response.

  • The distraction can be used as a reinforcer, if appropriate. There are some behaviors that can be used as reinforcers for the new behavior you have trained. She often uses “permission to sniff” as a reinforcer for waiting.

She ended with a review of these important concepts:

  • Maintain correct arousal level
  • Ability to control training set-ups
  • Building strength of desired behaviors first
  • Break down training into small steps
  • Reinforce dog’s choice
  • Manage/prevent unwanted behavior in between
  • Generalization/brush up

Emily’s presentation was on using the Premack Principle to decrease unwanted behavior by strengthening new behaviors and making them more probable. The next section is on how to use the Premack Principle to teach new behaviors by using patterns.

Alexandra Kurland – Do it again: The use of patterns in training

In Alexandra’s presentation, she showed how patterns can be used in training. Patterns are another way to use behaviors as reinforcers, especially if you build your patterns carefully so they include higher probability behaviors that can be used to reinforce lower probability behaviors.

Alex started with a video showing a simple pattern in which a horse went from mat to mat, passing a cone along the way. The set-up of the environment provided clear information to the horse about what to do next. This kind of exercise is a great way to teach some basic skills or behaviors in a systematic and thoughtful way.

Alexandra’s basic premise is that teachers and learners thrive on patterns.

Patterns create:

  • Predictability
  • Opportunities for do-overs (each repeat of the pattern provides another opportunity to practice or improve each behavior)
  • Time to plan ahead (you and your horse know what’s coming next so you can be prepared)
  • Consistent high rates of reinforcement (you can reinforce as much as needed. A pattern can have reinforcement at multiple places, not just at the end)
  • Emotional balance (if well designed)
  • An adjustable degree of difficulty

Patterns add a level of complexity that draws attention to areas where you need to do more work. A behavior might be easy for the horse when you practice it on its own, but be more difficult when inserted in a pattern. Figuring out why can identify areas that need more work. Combining several behaviors together in sequence may also make it easier to see a general trend such as a weakness in one particular skill or a missing component.

What do you need to know to teach patterns?

Patterns are connected to Loopy Training. Loopy training is Alexandra Kurland’s term for training where a behavior is built in loops. A loop can contain one or more ABC cycles (antecedent -> behavior -> consequence).

The basic guidelines for Loopy Training are:

  • Start with a tight loop with clean behavior (one ABC loop)
  • When the loop is clean, you get to move on. Not only do you get to move on, but you should move on. (Alex calls this phrase “The Loopy Training Mantra”)
  • Moving on means expanding the loop to include more behaviors, with additional cues and reinforcers if needed.
  • A loop is clean when it is fluid and there are no unwanted behaviors
  • Both sides of the click need to be clean

Some additional details about loops:

  • For every behavior you teach, there is an opposite behavior you need to teach and you need to balance behaviors within a loop. For example, a well-balanced loop would contain movement and stationary behaviors.
  • This is not recipe driven training. Every loop can be customized for every horse.
  • People get stuck in patterns because they stay too long. The Loopy Training mantra tells you to move on when the loop is clean.

Example of a Pattern created with Loopy Training: Walking with the trainer to a mat

Basic Principle: There’s always more than one way to shape a behavior. What is the future of the behavior? What skills do you want to learn?

Alex often uses mats in her training. They can be used to ask the horse to go forward (go to the mat) and to stop (stand on the mat) so they help to create well-balanced loops. But before you can use mats in your patterns, you need to have basic mat manners which include:

  • The horse walks on a slack lead beside me, or at liberty
  • He stops on the mat on his own
  • He stays on the mat
  • He leaves the mat when asked

To meet the criteria listed above, the horse and trainer need to learn how to go forward together, how to stop together and how to step back (if the horse overshoots the mat). She teaches these first.

Step 1: Teach the individual behaviors first:

Forward:

  • Can the horse come forward when asked?
  • Alex showed how you can use targeting to ask a horse to come forward
  • She can continue to use the target or add a cue to come forward
  • Stopping can also be taught out of targeting as the horse learns to come forward and stop at the target

Step back:

  • Can the horse step back when asked?
  • Alex showed how to use food delivery to teach backing as part of a basic targeting exercise.
  • Once the horse has learned to back through food delivery, she can add a cue

Once the horse can come forward, stop and step back, she can start to train matwork.

Step 2: Teach Going to the mat

Alex teaches matwork using what she calls a “runway.” The runway is made of two sets of cones arranged to form an open V (wide). A mat is placed at the base of the V. You can click and reinforce at any point in this pattern. When Alex first teaches it, she will reinforce for most correct responses. Once the horse knows the pattern, she can thin out the reinforcement if desired.

Runway lesson:

  • Start at the open end of the V
  • Ask the horse to come forward one step and then step back one step (needlepoint) at the open part of the V – this allows the trainer to practice asking for one step at a time. The trainer can move on when she chooses, usually when she sees some improvement.
  • Walk with the horse to the mat
  • Stop with the horse on the mat
  • If the horse doesn’t step on the mat, you can ask the horse to come slightly forward (using your needlepoint skills). When the horse does put a foot on the mat, you can use a high rate of reinforcement to reinforce that behavior.
  • Walk off the mat and turn so you are coming around the side (outside the cones) to return to the top of the V. You can switch directions so you work from both sides.
  • The pattern builds the skills that are needed for performance work (turns, forward and back).

The matwork pattern works well because it shows a nice balance of behaviors. It also allows for a lot of adjustability. The trainer can spend more time on the behaviors that need improvement, but is less likely to get stuck on one for too long, as the pattern itself encourages the trainer to move on after any small improvement.

You can make lots of different patterns to teach different behaviors and skills. I use a lot of patterns in my own training and have found that they provide a nice framework for teaching more advanced behavior combinations and can be adjusted to keep the training interesting.

Here are some of the main points from the three presentations on the Premack Principle:

• Think of reinforcement as being about behaviors. What does your animal want to do? Can you use it as a reinforcer?

• Don’t forget that Premack is also about punishers. If you follow a behavior you want with one that the horse is less likely to choose on his own, there will be a decrease in the behavior you want.

• The value of reinforcers can change. Premack showed that the same behaviors can function as reinforcers and punishers, so a behavior can be reinforcing in one situation and punishing in the next.

• You can shift a behavior from less probable to more probable by building reinforcement history for that behavior in an environment where the animal can be successful, and then slowly raising the criteria until he can do the new behavior under the original conditions.

• When choosing a behavior to use as reinforcer, consider behaviors that the animal enjoys or that have a strong reinforcement history.

• As with any type of reinforcer, you do want to choose one that provides the appropriate level of motivation and doesn’t tip the animal over into a state where he is too excited to learn. Barbara Heidenreich gave a presentation on non-food reinforcers that included some tips on choosing and using non-food reinforcers.

• Chains and sequences of behaviors will be stronger if the behaviors are arranged so that more probable behaviors follow less probable behaviors.

Notes from The Art and Science of Animal Training Conference (ORCA): Dr. Peter Killeen on “Skinner’s Rats, Pavlov’s Dogs, Premack’s Principles.”

Dr. Killeen is a professor of psychology at Arizona State University and has been a visiting scholar at the University of Texas, Cambridge University, and the Centre for Advanced Study, Oslo.  He gave the keynote address on Saturday morning.  Here’s the description from the conference website:

“Reinforcement is a central concept in the enterprise of training, and yet it remains a controversial one. Much of the opinion about its nature is derived from laboratory protocols involving food or water deprived animals. This does not always translate into the more complex and pragmatic world of animal training. In this talk I take a step back, to re-embed the concept of reinforcement in an ecological context. Reinforcement is always caused by the opportunity for an animal to make a transition from one action pattern to the next. The Premack principle is a simple deployment of this insight. I will discuss the Premack principle, alternate versions of it, and the relevance of the emotional state of the animal.”

I want to preface this article with a few thoughts.  This is a long article and at times it may seem overly academic for the needs of most animal trainers.  By the time I was done writing it, I found myself wondering if anyone would want to read it. 

But I hope that you will do so, because I think Dr. Killeen has shared an important perspective on animal training and behavior that combines the work of psychologists, ethologists and other professionals in related fields.
This is somewhat unusual.  In Karen Pryor’s closing remarks, she commented it was common for professionals in related fields to be isolated from one another, even though they each have important information that they would benefit from sharing.  A presentation that shows the connections between different fields (psychology, ethology, biology), takes the information we have learned from all of them, and puts it in a larger framework, is a great resource.

But this presentation was not just about the big picture. He included a lot of useful information about what we have learned in the past 100 years, and I found lots of practical tidbits scattered throughout it. I also found it very helpful to see the context in which each “discovery” was made and how the new information built on, and either complemented or required some re-thinking of, previous discoveries. I hear references to the work of Pavlov, Skinner and Premack all the time, but without understanding more about the historical significance, how the work was actually done, and future applications, my knowledge of how to use that information has been and will be somewhat limited. Putting their work in context has made it easier for me to see what we can learn from the science, as well as what we still need to learn.

And finally, I think learning this stuff can be fun. Yes, I said it. Ok, I am a bit of a behavior geek and I like reading about scientific discoveries, but I think that it can be very eye-opening to read about the actual research and what it tells us about behavior. The first year I attended ClickerExpo, I went to Kathy Sdao’s “A Moment of Science” lectures and found my brain fizzing with excitement.  Previously I had only had a limited understanding of the science behind clicker training, so learning more about it was exciting, but there was something more. Something about seeing all the little connections (and how we learned about them),  seeing more clearly that behavior is not generated randomly, but follows predictable (well mostly…) patterns, and that by observing, analyzing and changing the conditions under which behavior happens, we can influence it.

So what did Dr. Killeen have to say? The following article is based on my notes from his talk and is shared with his permission. He also generously shared the slides with me, so I could study them in more detail and include some diagrams.

Dr. Killeen started by saying that training requires art and science. He has spent most of his life as a laboratory scientist, but recognizes that knowing the science is only part of animal training. Still, he feels it’s very important to get the scientific information out to the public so that the knowledge can be shared and also viewed in the proper context. With this in mind, he took us on a “brief tour of modern learning theory” and looked at the contributions of Pavlov, Skinner, Premack and a few other scientists along the way.

He started with a little review of what we’ve learned from studying behavior:

  • Classical (Pavlovian conditioning)
    -sign learning – pairing of stimuli to create associations
  • Effect (Skinnerian conditioning)
    -self learning – responses get connected to consequences
  • Attraction (Thorndikian conditioning)
    -approach to incentives – surprisingly general and powerful law
  •  Premack Principle
    -transition to higher probability response
  • Timberlake’s Ethograms
    -organizes the Premackian insight

Then he went into more detail:

Ivan Pavlov:

Ivan Pavlov was a Russian scientist who was studying digestion in dogs in the early 1900s. In his experiment he wanted to measure salivation (the amount of saliva produced) when dogs were fed meat. But he started to have trouble because the dogs were salivating before he could feed them, and eventually even before he showed them the meat.

He referred to these as “psychic secretions” and ended up studying them instead. He did this by pairing a sound (typically a metronome) with the presentation of the meat and studying how the response to the metronome changed over time.  After a few pairings the dog would salivate in response to the metronome, instead of to the meat itself.

This work led to an understanding of the process through which conditioned stimuli can become associated with unconditioned stimuli to form new associations and responses, the process we now call “classical” or “Pavlovian” conditioning. It also led to the basic laws of association, which describe the relationships between the US (unconditional stimulus), CS (conditional stimulus), UR (unconditional response) and CR (conditional response).

Note:  Dr. Killeen used the terms “conditional,” not “conditioned” as is often seen.  According to Paul Chance, the term conditional is closer to Pavlov’s original meaning, but the two terms (conditioned and conditional) are often used interchangeably.

Pavlov’s motto was “Control your conditions and you will see order.”

A few items of note from his experiments:

  • Pavlov’s dogs were restrained and he was positioned such that he could not see all their responses to the stimuli.
  • The investigators only paid attention to the smooth muscle (visceral) response, not to other behaviors that the animals did. This is important because it led to a limited view of classical conditioning, with scientists assuming it only occurred with certain types of responses.
  • The original description of classical conditioning was one of substitution, where you could replace one stimulus with another through conditioning.

Further research into Pavlovian conditioning has shown that it should be viewed somewhat differently. In the 1970’s scientists (H.M. Jenkins and others) were studying Pavlovian conditioning in unrestrained animals and found that there were numerous responses to the unconditional stimulus. They said it was more accurate to call the conditioned response a “conditional release” because it was releasing a number of natural responses.

Their conclusion was that:

  • The CS-US episode mimics a naturally occurring sequence for which preorganized action patterns exist. The CS “substitutes for a natural signal, not for the object being signaled as in the Pavlovian concept of substitution …”
  • CR should mean Conditional Release
  • The topographies of CRs “are imported from the species’ evolutionary history and the individual’s pre-experimental history”

H.M. Jenkins’ work showed that the textbook description of the CS as a “faint image” of the US is not accurate. It is more accurate to say that it is a signal to engage some new action patterns.  This is called induction.

Edward Lee Thorndike

“Psychology is the science of the intellects, characters, and behaviors of animals including man.”

Dr. Killeen remarked that most psychologists study the behavior of man and perhaps it’s time to turn that around a bit…

Thorndike is best known for stating the Law of Effect, which he formulated after observing the behavior of cats placed in puzzle boxes. The cats learned to escape through trial and error, but learned from each experience, so they were quicker at escaping once they had done it successfully.

“The Law of Effect is that: Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal, will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur; those which are accompanied or closely followed by discomfort to the animal will, other things being equal, have their connections with that situation weakened, so that, when it recurs, they will be less likely to occur. The greater the satisfaction or discomfort, the greater the strengthening or weakening of the bond.” (taken from his slide, which credits Kenneth M. Steele)

The Law of Effect is the foundation for Skinner’s work on operant conditioning because it clearly states the connection between response and reinforcing context. Dr. Killeen described it as “A law of selection by consequences. It is a probabilistic law.”

So far we have:

  • Pavlovian Conditioning – connection of context to stimuli, CS, US
    -> Similarity, proximity, regularity
  • Thorndike – connection of response to context, S-> R
    -> when they lead to satisfiers
  • Skinner – connection of response to reinforcers, R -> S (superscript R)
    -> dropped the need for satisfaction
    -> wanted to vest all variables in the environment

These are different ways of looking at behavior, but you have to realize that they are all going on at the same time.

To understand the importance of these discoveries (laws), Dr. Killeen had a slide showing  where they would be placed in a list of either the top 10 laws in Psychology or the most important laws in Psychology at the end of the 20th century.

Top 10 Laws in Psychology  (this list was taken from textbooks):

  • 2. Law of Effect
  • 3. Laws of association
  • 6. Laws of contiguity
  • 8. Law of exercise

(note:  the laws of association, contiguity and exercise are roughly equivalent, or contain essential components of numbers 6, 8 and 10 below.)

Important laws in Psychology at the end of the 20th century (this list was taken from a journal article.):

  • 2. The Law of Effect
  • 6. Premack’s Principle
  • 8. Classical conditioning
  • 10. Reinforcement/operant conditioning

Dr. Killeen said that he, personally, thinks that Premack’s Principle is the most powerful of all.

David Premack

Dr. David Premack was a psychologist who studied reinforcement and cognition in chimpanzees.  Two of his most notable contributions are his work on Theory of Mind in chimpanzees and the Premack Principle.

The Premack Principle states that:

  • Behaviors are reinforcers, not stimuli
  • More probable behaviors reinforce less probable behaviors.
  • Less probable behaviors punish more probable behaviors.

This re-defining of reinforcers as behaviors was a very important shift in thinking and changed the way that scientists (and others) looked at reinforcers.  Previously, reinforcers had been defined as stimuli (food, objects, etc.) but Dr. Premack showed that it was the activity associated with that stimulus that was reinforcing.  It’s EATING the food that is reinforcing.  It’s PLAYING with the ball that is reinforcing.

If you aren’t sure about this, think about some of the activities you enjoy doing and ask yourself if the end goal is reinforcing, or if it is the activity itself. Why do you eat? Is it just to feel full? Why do you read a book? Is the book better if someone tells you the ending ahead of time?

When you are trying to change behavior, you want to look at possible activities and see which ones are more probable and which ones are less probable. This gives you a preference hierarchy of possible activities, which can then be used to shift behavior in the direction you want.

Dr. Premack did a number of interesting experiments looking at changing more probable and less probable behavior by limiting access to resources. He found that there was a “reversibility” of reinforcers, so an activity that was reinforcing in one situation might function as a punisher in another.

This work was done in the laboratory, but the Premack Principle explains the relationship between behavior and reinforcement under many conditions. Dr. Killeen showed an example (from Jesús Rosales-Ruiz) of using the Premack Principle with a barking dog. The amount of barking could be decreased by moving the dog either toward or away from the other dog, depending upon which behavior was more reinforcing for the individual dog at that moment.

Dr. Killeen stated that he thinks all reinforcement principles come down to Premack’s Principle.  But, there are objections and some difficulties in figuring out how to measure probabilities.

One problem is how to measure probability. It’s difficult to measure as it’s not solely based on duration or intensity, but may depend upon many factors. Eventually Dr. Premack decided to use how much time the animal will spend on a task, if not satiating.

There was also the question of whether or not the animals had to be in a state of deprivation for an activity, in order for it to become more probable. In his experiment with rats where he was able to reverse the probabilities of wheel running and drinking, he did use deprivation to make one of the behaviors more likely.
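
For readers who like to see an idea written out, here is a minimal sketch of the time-based measure and the reversal described above. It assumes that the share of a free-access session spent on each activity stands in for its probability. The function names and the session times are hypothetical values I made up for illustration; they are not data from Premack's experiments.

```python
def relative_probabilities(seconds_by_activity):
    """Turn observed time per activity into a share of the total session."""
    total = sum(seconds_by_activity.values())
    if total <= 0:
        raise ValueError("need some observed time to estimate probabilities")
    return {activity: seconds / total
            for activity, seconds in seconds_by_activity.items()}


def predicted_effect(consequence, target, probabilities):
    """Premack-style prediction: what happens to `target` when the
    opportunity to do `consequence` is made contingent on it?"""
    if probabilities[consequence] > probabilities[target]:
        return "reinforce"   # more probable behavior follows less probable one
    if probabilities[consequence] < probabilities[target]:
        return "punish"      # less probable behavior follows more probable one
    return "no predicted change"


# Hypothetical sessions (seconds spent on each activity), not real data.
water_restricted = relative_probabilities({"drinking": 240, "wheel running": 60})
wheel_restricted = relative_probabilities({"drinking": 30, "wheel running": 270})

# The same consequence (drinking) is predicted to have opposite effects
# on wheel running, depending on the conditions.
print(predicted_effect("drinking", "wheel running", water_restricted))
print(predicted_effect("drinking", "wheel running", wheel_restricted))
```

The point of the sketch is only that the prediction depends on the relative standing of the two behaviors under the current conditions, not on any fixed property of drinking or running.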

While deprivation can certainly change probabilities, there was an interest in looking for a better way to calculate (or predict) more and less probable behaviors. There were also scientists who were interested in finding a larger framework within which to view Premack’s Principle. William Timberlake, a psychologist at Indiana University, had developed a way of mapping behavior that proved to be useful. His Behavioral System offered a way to describe and map behavior that showed how an animal will naturally progress through a sequence of behaviors, with each one reinforcing the previous one.

Timberlake’s Behavioral System was based on looking at the natural sequence of behaviors that are part of specific activities. Dr. Killeen had a series of slides that showed predatory behavior, and showed how each step leads to several choices, which leads to several more choices, and so on.  When an animal makes one choice, it makes some new choices more likely and some other choices less likely.  If you want to read more about Timberlake’s work, this article goes into more detail: http://indiana.edu/~bsl/behavior.pdf.

Here’s one of Timberlake’s Behavioral Systems charts that Dr. Killeen showed:

[Image: Timberlake’s Behavioral System chart]

For the purposes of this article, what you need to know is that Timberlake looked at patterns of behavior, identifying systems and subsystems, modes, modules, and actions. You could trace and predict an animal’s behavior by making a diagram of the Behavioral System that showed possible pathways.

This provided a framework for viewing how one behavior might reinforce another. For example, in predation,  the mode “focal search” might lead to the behaviors “investigate”, “chase”, “lie in wait”,  “capture”, and “test”.  If an animal continued down the chase pathway, it might “track the animal” or try to “cut it off”.

Once you can identify different behavior states (modes, modules or actions), you can collect data by observing animals to see which pathways are more likely. This gives you general tendencies, not absolute values, as there are many variables and an animal can start down one pathway and be forced to shift to a different one. So it’s not useful in the absolute sense, but it can provide information about what behaviors tend to reinforce other behaviors, and it can help to identify the most common sequences. This information helps to see the connection between what Premack learned from his laboratory work and the behavior of animals in their natural habitat.
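
As a rough illustration of what “collecting data to see which pathways are more likely” could look like, here is a small sketch that counts how often one observed behavior is followed by another and turns the counts into proportions. The behavior names are borrowed from the predation example above, but the observed sequences and the function name are hypothetical, invented only for this example. The result is a set of general tendencies, not absolute values.

```python
from collections import Counter, defaultdict


def transition_proportions(sequences):
    """For each behavior, estimate how often each other behavior follows it."""
    counts = defaultdict(Counter)
    for sequence in sequences:
        # Pair each behavior with the one observed immediately after it.
        for current, following in zip(sequence, sequence[1:]):
            counts[current][following] += 1
    return {
        current: {following: n / sum(followers.values())
                  for following, n in followers.items()}
        for current, followers in counts.items()
    }


# Hypothetical observation sessions, not real field data.
observed = [
    ["focal search", "investigate", "chase", "capture"],
    ["focal search", "chase", "track", "capture"],
    ["focal search", "lie in wait", "capture"],
    ["focal search", "chase", "cut off"],
]

tendencies = transition_proportions(observed)
for following, share in sorted(tendencies["focal search"].items()):
    print(f"focal search -> {following}: {share:.2f}")
```

With more sessions, the proportions would give a better picture of the most common sequences, though an animal can always start down one pathway and be forced onto another.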

I want to make a comment here.  When I see people describing the application of the Premack Principle in training, they often put an emphasis on using an available activity, one that is what the animal would choose to do on its own.  So a dog might be taught to orient to its owner in the presence of squirrels, and they would try to reinforce that behavior by providing the opportunity to run in the direction of a squirrel.

But there’s nothing in the Premack Principle that says you need to use a “naturally” reinforcing activity. I asked Dr. Killeen about that and he said that you can use any behavior, as long as you take the time to build a strong reinforcement history so that it can function as a reinforcer.  In Emily Larlham’s presentation, she talked about how to use Premack to decrease deer chasing and she did it by building a high probability for an alternative behavior that had nothing to do with chasing deer.

I think it can be helpful to look at the Premack Principle in the context of naturally occurring behavior sequences, and you may be able to use them in some cases, but don’t let that limit how you think about using it.

Unified Theory of Connection (Peter Killeen)

Dr. Killeen pulled all the Laws of Connection (Pavlov, Skinner, Thorndike, Premack) and Timberlake’s Behavioral Systems together to make his Unified Theory of Connection. This is where you start to see how the different laws fit together to create a complex repertoire of behavior. The Behavioral Systems provide a framework  and movement through the Behavioral Systems can be explained using the Laws of Connection.

Some Key Points of the Unified Theory of Connection:

  • Different subsystems (predatory, defensive, sexual) make different modes attractive.
  • Reinforcers are responses, not stimuli.
  • Movement down the modules constitutes reinforcement.
  • Movement from state to state (subsystem -> mode -> module ->  action) is possible because of satisfying events.
  • Animals approach stimuli that make progress possible (these stimuli are unconditioned or may be classically conditioned.)
  • Within modules, the actions and how they are done are subject to the law of effect, operant strengthening, etc.

Using this chart as an example, he provided some specifics on what movement within each column indicates, and what prompts transitions:

[Image: Unified Theory of Connection chart]

The “action” column:

  • More probable (and thus reinforcing) responses are ones lower in their action space (lower responses reinforce higher responses).
  • Transition points enable progression through the actions. An animal moves through transition points for one or more reasons:
    1. They are satisfying (Thorndike)
    2. They are approached as incentive motivators (Thorndike)
    3. They elicit other species-typical actions (Pavlov)
    4. They reinforce the particular responses that lead to them (Skinner)

The “module” column:

  • Moving from one module to the next provides a “conditional release” (Jenkins) for what classes of responses are most likely.
  • The topography of the conditional release comes from the animal’s natural behavior (pre-organized action patterns).
  • Signs of such transitions are Pavlovian CSs – the CS substitutes for a natural signal. (In this context, I think a “sign” is what we might call a cue or a signal to proceed to the next behavior.)

The “mode” column:

  • Moving from one mode to the next “sets the occasion” (Holland) for what classes of stimuli are most effective and what responses are most likely.
  • Such transitions are “motivating operations.”
  • “Occasion setters” follow different rules than CSs.
  • Training and interactions in general are as much about configuring motivational operations as about applying reinforcers. You want the animal to be in the right mode.

Readiness and Regulatory Fit:

Understanding how animals move either horizontally or vertically down the chart is essential when trying to change behavior. He provided a little additional information on this subject by looking more at readiness and regulatory fit.

Thorndike’s Law of Readiness (1914) already provided some information about how moving into a module provides readiness to move down the chain.

“When a child sees an attractive object at a distance, his neurons may be said to prophetically prepare for the whole series of fixating it with the eyes, running toward it, seeing it within reach, grasping, feeling it in his hand, and curiously manipulating it.”

Skinner and Premack had also talked about how behavior tends to move down action chains, as the body is already anticipating the next action.  Some of the behavior in the action chain may be innate and some may be learned.

Dr. Killeen provided a simple example of how our behavior can be influenced by the mode we are in.  This example comes from Tony Higgins who studies human behavior to see the effects of different modes (promotion vs. prevention, approach vs. avoidance) on behavior.

If you are trying to sell someone something, you have to put them in the right mode.

  •  If you want to sell them a yacht, you put them in “adventure” mode by telling them stories that make them want to go out and do something new.
  • If you want to sell them life insurance, you put them in “life is dangerous” mode by sharing stories about people who have died, accidents, etc.

Key Points from the Unified Theory of Connection:

  • Animals approach satisfiers. He shared a number of slides citing research supporting the basic idea that behavior is motivated by approach or withdrawal.  The research included field data and experimental data.
  • Satisfiers are contexts with higher rates of reinforcement/action relevant to their current state, or contexts associated with more attractive actions.
  • Satisfying contexts are those that lead to actions at deeper levels in their Behavioral System. Always look at behavior from an ethological viewpoint.  Sometimes these actions become satisfying in their own right and animals get stuck in them.  We have purposely bred dogs to get “stuck” at some actions (retrievers, pointers, etc.).
  • Behavior is a trajectory through a field of attractors — modules.
  • Conditioned stimuli are signposts on the journey. If they are extrinsic, it’s Pavlovian sign learning, if they are intrinsic/proprioception, it’s Skinnerian act learning.
  • If moving to a better state, CSs function as conditioned reinforcers. If moving to a worse state, they function as conditioned punishers.
  • Many actions are shared by different systems and some are shared by different modes, so an action can have one meaning in one context and a different meaning in another one. A bite can be predatory or sexual. Actions shared between multiple systems or modes can lead to short-circuiting.
  • What gets learned are more efficient routes/actions: ways to get to satisfying actions within modules, or ways to get to modules that are deeper/more satisfying in their action hierarchy.

Sign Tracking:

As stated above, one of the key points of the Unified Theory of Connection is that animals approach satisfiers. An example of this can be found by looking at sign tracking, which has been found in dozens of species, and shows a tendency to approach and contact signs of reinforcement.

He described an experiment by Hearst and Jenkins (1974) in which they put a pigeon in a long cage with a light on one end and a food hopper on the other.  When the light came on, the pigeon would approach it, which meant the pigeon was actually moving away from the food hopper when the light came on.

But the food hopper was set up so that the food was only available for a short time after the light came on. By the time the pigeon got back to the food hopper (after going to the light), the food would no longer be available. You would think the birds would learn to wait at the food hopper and watch for the light, but they never did. That’s how powerful sign tracking, and therefore the desire to approach, can be.

Role of Affect:

He finished up by looking a little bit at the role of affect (emotions).

  • No matter how we think about stimuli and their settings –
  • We must also know how to feel about them –
  • Affect tells us which action modes to engage, what kind of “readiness” in Thorndike’s terms.
  • Different action modes are associated with different emotions and they can tell us whether to approach, avoid or kick back: wait it out.

You can think of emotions as the signatures of different behavioral modes. They:

  • differentially prime perception
  • prime motor systems
  • inhibit competing systems
  • tell us what to do, and simultaneously empower that action
  • hold us in relevant modes

And this leaves us with the New Laws of Connection:

  • Approach -> To stimuli that mark transitions/routes down our hierarchy (Pavlovian sign-learning). They are pleasurable/satisfying or scary; emotion empowers responses relevant to modes
  • Effect -> In similar contexts we approach the actions that gained that improvement (Skinnerian self-learning)
  • Act for Action -> It is access to better actions that constitutes reinforcement (Premack Principle), and imposition of adverse actions that constitutes punishment.

There were two other presentations that looked specifically at using Premack Principle in training and I was originally going to include them as part of this article. But I think it would be better to write about them separately so they will be in a future article.

Notes from The Art and Science of Animal Training Conference (ORCA): Duration

This year (2017) the Art and Science of Animal Training Conference had a theme for each day. On Saturday, the lectures focused on the Premack Principle and how reinforcement works. On Sunday, the focus was on how to effectively maintain behaviors.

As you might expect, there were some interesting connections between different presentations, both across topics and between topics. Because of this, rather than write about each presentation separately, I am going to write a series of articles on some of the “themes,” using information that was presented at the conference, as well as information from other sources if needed for clarification.

This article is going to look at the topic of duration. Duration was discussed in several presentations on Sunday, as part of the larger theme of how to effectively maintain behaviors. Most of the information in this article comes from the following talks:

Steve White: “Training in 3-D: Embracing duration, distance and distraction” – Practical tips on building duration, distance and teaching dogs to work in the presence of distractions. (60 min)

Ken Ramirez: “Teaching Duration Behaviors: Creating lengthy behavior chains” – A look at how to create a lengthy show with minimal food reinforcement. (20 min)

Emily Larlham: “The Show Must Go On: An Investigation into maintenance of behaviors for competitive sports and dogs with jobs” – How understanding the components of training for duration can help you prepare your dog so you can have success in the show ring, or in a job where a dog might need to perform for an extended period of time. (20 min)

Alexandra Kurland: “Putting Behavior to Work” – Alexandra Kurland shared her thoughts on building duration, both from a practical and philosophical point of view. (60 min)

What is Duration?

In training, duration commonly refers to the amount of time that an animal does one or more behaviors before being released for reinforcement. If I am building duration for a single behavior, it means I am extending the amount of time the animal can maintain the behavior. It can be a static behavior (a down) or a moving behavior (walking).

Duration can also refer to extending the total time spent doing behaviors, with the animal doing several behaviors before being released for reinforcement. Duration over multiple behaviors is usually achieved through some form of chaining, or by teaching the animal to expect to do a sequence of behaviors. In a sequence of behaviors, the animal learns that each behavior will be followed by the cue for the next behavior and learns to go from one behavior to the next with no interruption. The final reinforcement is delivered at the end of the chain or sequence. There may be other reinforcers built into the chain or sequence, but the animal is working toward the final reinforcer.

Duration is an interesting topic because it touches on so many other aspects of training. As Steve points out, duration is closely related to distance and distraction and in order to build duration, you have to pay attention to all three components of your final behavior. Ken pointed out that successful chaining or sequencing of behaviors is based on clear cueing and adequate reinforcement. Emily pointed out that attitude and mental and physical fitness matter. And Alex showed that how we approach and think about training is often a result of the unconscious “frame” under which we are operating.

Steve White’s talk provided a nice framework for a discussion of duration because he provided some history on how duration is traditionally trained and a lot of practical information on how to do it.

This information is taken directly from my notes from his talk:

Steve White – Training in 3-D: Embracing Duration, Distance, and Distraction

Animal trainers need duration for many behaviors and in the older style training, it was taught with compulsion. For example, a dog might be held (physically restrained) in a long down so it would learn to stay there for longer. While this may have worked to teach the dog to stay down for longer, it had some unfortunate side effects in that the dogs were reluctant to do the behavior since it was associated with force.

Alexandra Kurland pointed out that duration is also traditionally taught this way with horses, where the horse is forced to continue the behavior (with a whip if moving), or punished for movement (if the desired behavior is stationary.) While it can be taught with other methods that build the behavior slowly and systematically, what she called the “tortoise” approach, many people are in a hurry and try to build it too quickly, the “hare” approach.

The first step away from that method was using food to encourage the animal to stay in the long down. Ian Dunbar would feed the dog in position and the dogs learned to maintain the behavior. Since then, trainers have learned better ways to shape and maintain the behavior, but the basic progression is still the same, teaching duration, then distance and distraction.

Traditional training of the 3Ds:

  • Duration trained first – build stability and baseline
  • Often can get additional duration as a by-product of distance
  • Original duration work for down was to hold the dog down – led to poor latency
  • Ian Dunbar – feed dog in down to get duration; he would add the cue too (“good down”)
  • Then add distance and distraction: distance for tightly constrained kinetic behaviors like a send-away, distraction for loosely constrained kinetic behaviors like detection work

We’ve learned a lot about training duration from research in the laboratory, but understanding behavior in the laboratory does not always translate to being able to train behavior in real life. Even in real life, it’s hard to prepare for every possible scenario, so trainers have to work carefully and plan ahead, while understanding that part of their job (and the dog’s job) is being able to handle unexpected events. Learning how to regroup after them is also an important aspect of training and ideally the dogs become more robust from each new experience, even if it’s stressful.

Why 3-D? (why do we teach duration, distance and distraction?)

    • Life is not a lab: Most of us are training animals that have to work under a lot of different conditions, so we need animals that can handle lots of variability and remain fluent in their jobs.
    • Compromised timeline: He has dogs for 16 weeks so he has to get a lot done. Has to ask questions to find out why he has the dog. Why don’t they want this dog? Does it just need to learn new skills or are there issues? Are there poisoned cues? In Europe, they teach the dogs to bite, bite, bite and then to “not bite,” which creates stress. He prefers to re-shape the behavior, which is quick if the dog already knows the topography, then add a new cue.
    • Limited control: You can’t control everything that happens in a working situation.
    • Multivariate: You always have to be considering multiple variables. In training, you can set up scenarios to work on one aspect at a time, but eventually you need the dog to be able to handle changes in duration, distance, and distraction all at once.
    • Unpredictable (beautifully so): For some things you can develop a training plan and prepare, but you can’t be 100% prepared for everything. Also, have to deal with the fallout (get the dog back to being comfortable). He likes the variability and said it keeps him on his toes.

From a practitioner’s point of view, it’s important to know the science so you can make educated choices in your training strategies, but he feels that training should be about exploring new options and learning to observe, understand and inform.
Trainers who are open to new ideas, and curious about how things work, are the ones who are out there breaking new ground. Steve said he feels it’s more useful to look at the science to explain what we do, not wait for the science to tell us what to do.

In any training, the first question you need to ask is “What does the dog need to learn?”

Once you can answer that, then you can start teaching the behavior. He had a nice photo of some dogs being lowered in harnesses from a helicopter and you could see all the things the dogs had to learn to be comfortable and successful doing that behavior. They had to learn how to orient themselves (hind feet down so they touch first), eyes looking down (to see when they are going to land) and know what to do when they landed (get out of the way). Once they had learned the basic behavior, then they had to learn to deal with increasing distance and distractions.

Training the 3D’s: Evaluate your animal’s response to different stimuli

Before starting to train for duration, distance and distraction, it’s helpful to know how your animal is going to respond to different kinds of stimuli. There may be some general tendencies for different species, but you also need to evaluate each individual so that you can make a training plan that has a realistic starting point and then systematically introduces new stimuli at appropriate times. Some things you may want to consider when evaluating stimuli are:

  • Volume (loudness, size, different meanings)
  • Salience (stands out from environment)
  • Proximity (at what distance does the animal react?)
  • Vector (where it’s coming from) – same object can be perceived differently by the dog depending upon where it’s coming from.
  • Speed (slow vs. fast, how does it affect the dog?)
  • Permanence – how long does the stimulus stay present?

You will also want to evaluate each animal to see how it responds to stimuli that are perceived by different senses. While there is some variation from dog to dog, he has found that the following is pretty typical for most dogs:

  • Visual distractions are the most common – the dog reacts to what it sees
  • Sound is next – the dog orients toward or is startled by sounds
  • Smell is the most engrossing – the biggest problem he has is with dogs that are distracted by smells because they can be unable to move on. This can be a deal breaker for some dogs.
  • Touch is the one with the most propitious response – the dog looks quickly when you tap
  • Taste is captivating
  • Nature vs. nurture – in his view, nurture is going to trump nature so you can train a dog to ignore certain types of stimuli, but some dogs will take longer with different types of stimuli than others.

(note: I think it would be interesting to make this list for horses. I would certainly put visual first and sound second, although that might be reversed if riding in an indoor arena. My horses are very reactive to sounds when they can’t see the source. Are horses distracted by smells? I’m not sure. I’d have to think about the rest as they don’t seem to come into play in training very much.)

With this information (or as much as you can gather) in mind, you can start to work on the 3Ds. As he mentioned earlier, it is usually easiest to start with duration. Once you have some duration, you can introduce distance or distractions, which will build duration as well.

Duration:

He had a graph to illustrate how to build duration. It showed the change in duration over time, starting from the current level of duration and ending at the desired level of duration. The line is a general guideline for how to proceed, and its angle will vary depending upon how much time you take and how much duration you need.

But you are not going to proceed in a linear manner where each repetition is harder than the last. Instead you are going to “ping-pong” along the vector so that some repetitions are easier than others, but the general trend is toward longer durations. This variability keeps the dog guessing and makes it less likely that the dog will bail out because the work was getting consistently harder or the degree of difficulty was increased too quickly.
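For readers who like to see an idea written out, here is a minimal sketch of what a “ping-pong” plan might look like. It is my own illustration, not something Steve presented: the function name, the use of seconds, and the `wobble` parameter (how far a repetition may stray from the trend line) are all assumptions made for the example.

```python
import random

def ping_pong_durations(start, goal, reps, wobble=0.3, seed=None):
    """Return a list of target durations (hypothetical units: seconds).

    The targets follow a straight trend line from `start` to `goal`,
    but each repetition is varied by up to +/- `wobble` (a fraction),
    so some repetitions are easier and some are harder than the trend.
    """
    if reps < 2:
        raise ValueError("need at least two repetitions to draw a trend line")
    rng = random.Random(seed)
    durations = []
    for i in range(reps):
        # Point on the straight trend line for this repetition
        trend = start + (goal - start) * i / (reps - 1)
        # Bounce around the trend line instead of sitting on it
        factor = 1 + rng.uniform(-wobble, wobble)
        durations.append(round(trend * factor, 1))
    return durations

# Example: 12 repetitions working from a 5 second down toward 60 seconds
print(ping_pong_durations(5, 60, 12, seed=1))
```

The individual numbers matter much less than the shape: the trend climbs, but any single repetition may be easier than the one before it. In real training the dog’s response, not a list of numbers, decides what the next repetition should be.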

Distraction:

I think you need to build a certain amount of duration before you can add distractions, but once you are at that point, you can work on distractions and you will be building additional duration at the same time. Steve had a nice video that showed a dog learning to hold on a nose target and then hold on the nose target while the handler manipulated or touched the dog.

In the video, the progression was:

  • Teach dog to touch a nose target
  • Teach dog to hold briefly on a nose target
  • Teach dog to hold on a nose target and maintain head position while she checks his mouth
  • Teach dog to do the same nose touch, but on the wall
  • Teach dog to nose touch on the wall while patted all over by the handler
  • Teach dog to nose touch on wall while she picks up and wiggles the hind end
  • Can increase complexity by sending the dog to the wall (adding distance)

It’s important to remember that what we call a “distraction” is just an inappropriate response to a stimulus. He did mention that he sometimes uses a Keep Going Signal to help the dog learn to maintain a behavior when adding distractions. Whether or not to use a KGS is a subject of much debate, but he does find they can be useful at certain stages in training, especially if you have a limited amount of time to get the training done.

Distance:

Here are some things to keep in mind when working on distance:

    • Fear/anxiety vs. calm/assured are inversely proportional
    • Reinforce appropriately – in place for stability on the target, at or beyond for more active behaviors or when changing behaviors.
    • You are often combining distance and distraction. For example, he showed how they teach a police dog to wait until released to “catch” a person. They start by placing the person (distraction) at a distance and then have him move slowly toward the dog on a slightly irregular line. When the dog relaxes while observing the person, that’s the time to release the dog. His reinforcement is getting the person.
    • Once the dog learns to remain calm under those circumstances, they can have the person head toward the dog in a more erratic way, taking a longer route, varying the speed and other movements so that the dog is learning to maintain duration with more and more distractions.
    • Steve said it was important to “Wobble, shift and shuffle” or “bounce around the 3Ds” by varying different aspects of the 3Ds so the dog learns to handle changes in them in random order. He recommends starting this early.

What can go wrong? Potential Snags:

    • Anticipation: Check your ABCs (antecedent -> behavior -> consequence). Don’t make assumptions.
    • Repetitive failure: Manipulate single variables (go back to focusing on one at a time if the dog is struggling.)
    • Plateaus: Who is training who? Trainers hit training plateaus when they stay at each step for too long. It’s important to move forward in balance. He used the image of jacking up a house where it’s important to keep jacking up each corner a little bit at a time to keep the house level. With training the 3Ds, you want to keep increasing the difficulty of each one a little bit at a time so the dog learns to handle lots of variations and combinations along the way.
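The “jacking up a house” image can be sketched in a few lines of code. This is only my illustration of the balance idea, not a method Steve described: the difficulty levels are in made-up units, and the rule of always raising whichever of the 3Ds is furthest behind its goal is an assumption chosen to keep the example simple.

```python
def next_corner(levels, goals):
    """Pick which of the 3Ds to raise next.

    Chooses the dimension that is furthest behind its goal, measured
    as a fraction of the goal. Returns None when every goal is met.
    Goals must be positive numbers.
    """
    remaining = [name for name in goals if levels[name] < goals[name]]
    if not remaining:
        return None
    return min(remaining, key=lambda name: levels[name] / goals[name])

def balanced_plan(goals):
    """Return the order in which to raise each dimension, one step at a time."""
    levels = {name: 0 for name in goals}
    plan = []
    corner = next_corner(levels, goals)
    while corner is not None:
        levels[corner] += 1  # one small lift on this corner of the house
        plan.append((corner, levels[corner]))
        corner = next_corner(levels, goals)
    return plan

# Hypothetical goals, in arbitrary difficulty steps
goals = {"duration": 3, "distance": 2, "distraction": 2}
for name, level in balanced_plan(goals):
    print(name, level)
```

No corner gets far ahead of the others, which is the point of the image. A real plan would add the “wobble” back in by making some steps easier along the way.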

Final thoughts:

  • Life is not a lab
  • Assess distractibility in all 5 senses
  • Wobble
  • Reconcile Perception vs. Reality: Perception drives behavior, not reality – there’s something to think about!

In Steve’s talk, he was speaking about building duration as a process and he had examples of building duration for single behaviors and for chains, as most of the information he provided would apply to both. But putting together longer chains or sequences, and maintaining them, requires some additional skills on the part of the trainer. This was the subject of Ken’s Sunday presentation.

Ken Ramirez – Teaching Duration Behaviors: Creating lengthy behavior chains

Ken Ramirez does a lot of consulting work with animal trainers from facilities around the world, and his presentation was based on some work he did for a group in Cuba. The trainers at the facility wanted to create a dinner show that featured dolphins and swimmers doing underwater synchronized swimming.

Their goal was to have a 20 minute show with no food reinforcement. They felt that feeding the dolphins during the show would interrupt the flow and they had spent time putting together a show by teaching long chains and using tactile reinforcers. But they were having trouble maintaining the behaviors under those conditions and they asked Ken for help.

Ken said it was a particularly challenging project for several reasons. Some of them were due to logistics (no visitation, slow internet) and some of them had more to do with the training itself. It was an ambitious project and the trainers were missing some important information or had misconceptions about how to build strong chains and the amount of reinforcement needed to maintain them.

Some of the training issues that Ken identified were:

  • Not enough positive reinforcement
  • Tactile used as a “reinforcer,” but not effective
  • The strength of the dolphin/swimmer relationships was questionable
  • There were too many different behaviors and cues (added confusion)
  • They were not back chaining effectively
  • The chains were too long. I think the original plan was to do the entire show (140-200 behaviors) as one long chain.
  • Communication among swimmers was not good during the show

Rather than jumping in the middle and trying to “fix” something that was not working well, Ken had them go back to the basics by returning to more frequent reinforcement to build behavior strength again.

This is something Emily Larlham emphasized as well. You need very strong behaviors if you are going to use them in longer duration exercises, and often we don’t put enough time into this. She spoke about making sure you build fitness and overtrain so that the difficulty level in longer duration behaviors is within the dog’s ability.

They also learned how to build shorter and stronger chains and simplified the number of behaviors and cues. In their original training, they had 93 different cued behaviors. He was able to simplify this down to 11 cued behaviors as many of the behaviors were variations on hand and foot targeting with the dolphins either following or pushing the target. Having fewer cued behaviors and one word names for any chains improved the trainer’s ability to communicate with the dolphins and each other during the shows.

On a personal note, with my own horses, I have found that cueing problems often show up when building chains because the horse no longer has the click to confirm the correct response to the cue. This leads to guessing and deterioration in existing behaviors, so cleaning up your cues is very important.

In addition, he found that their understanding of how to use intermittent reinforcement was based on the industry standard norm (common in marine mammal training), but was not actually working for them. He had to teach them about using reinforcement variety and how to evaluate how much reinforcement they really needed and how to provide it. This was where the secondary reinforcers came in, and also why they switched to using multiple dolphins instead of having the same dolphins do the whole show.

His summary of the work they needed to do included:

  • Return to more frequent reinforcement to build behavior strength again
  • Develop better conditioned reinforcers
  • Establish shorter chains and develop clean backchain protocol
  • Focus on sequences with reinforcement built in, as opposed to fixed chains (animals tend to take short cuts in fixed chains)
  • Use multiple dolphins so they can rotate in and out of the show
  • Clean up behaviors: 93 different cued behaviors (too many!) simplified down to 11

He had a few specific points about how they improved the use of chains and sequences:

  • 3-6 behavior chains (fixed)
  • 12 sequences that could be changed
  • Short name for each chain (so swimmer could call it out)
  • Staggered for reinforcement (I think this meant it was set up so that dolphins got reinforced at different times so the reinforcement was less obvious. It also made it easier to rotate dolphins in and out of the show.)

The show went on to be very successful and they won an award at the 2008 IMATA conference. Ken said it was a unique consulting experience and showed, once again, that more effective reinforcement makes a difference.

I found this presentation particularly interesting because most of my ridden work is now done as long chains and sequences and it took me quite a long time to figure out how to do this effectively. I encountered some of the same problems as the dolphin trainers, so it was interesting to see how the same general strategies that worked for me also worked for them.

The topic of duration was continued in the next talk, which was by Emily Larlham. This one also looked at training for performance, but with dogs. I already mentioned that Emily and Ken shared some of the same advice on common points and you’ll see some other similarities in the following notes which are based on the material she presented in her talk.

Emily Larlham: The Show Must Go On

What makes a great performance? What are the qualities that contribute to a great performance? How do you get there?

Emily approached this topic by looking at the trainers she admires, identifying the qualities that make their performances exceptional, and then looking at their training to see how they got there. She identified 5 key concepts:

Key Concepts:

  • The right attitude
  • Strong behaviors
  • Overtraining
  • Working for duration as a concept
  • Preparing for performance

(note: despite the title, this presentation was not just about preparing dogs for competitions or shows. The information she provided is useful to anyone who needs animals to perform for longer periods of time)

The Right Attitude:

When watching performance routines, one of the qualities that makes the difference between an average and a truly exceptional performance is the attitude of the dog and relationship it has with the trainer.

Emily showed some video of freestyle routines that showed the kind of happy, expectant, and joyful attitude that she likes to see in a dog. This comes from a training method that does not rely on physical or psychological intimidation and that minimizes frustration so that the dog looks forward to training and is an eager and attentive participant.

She had some video that showed teaching some simple behaviors and how important it is to set up your training to avoid frustration. Even something like teaching a dog to follow a lure can be frustrating if not done correctly and trainers need to learn when a dog is showing signs of frustration (before it becomes full-blown) and how to adjust their training so the dog becomes successful again.

Building strong behaviors:

Trainers often underestimate how much time it takes to take a new behavior and build enough reinforcement history and flexibility to make a truly strong behavior. She showed some videos by Maria Brandel and Siv Svendson that showed how they continually build on successful behaviors so the dog becomes confident and learns behaviors to fluency.

One way they do this is by starting new behaviors in environments where the dog is likely to choose to do the behavior anyway, and the behavior’s topography and emotional state are consistent with how they want to use it. For example, they teach the “down” by capturing it when the dog is lying down while they are watching TV or resting. This is a time when the dog is likely to lie down and stay there for longer and allows them to build duration more easily.

In this situation, the dog is also more likely to associate the down with calmly waiting or resting because that is what is normal for a down under those conditions. This idea of using environmental cues or context cues to facilitate learning can make it much easier to train behaviors with the emotional tone that you want.

Overtraining:

Overtraining means planning ahead and setting up your training sessions and exercises so that the effort required in your “performance” is less than what the dog is used to doing during training. It’s a way of ensuring that the dog is physically and mentally prepared for the amount of effort or focus and has a bit of reserve to handle any unexpected changes.

Overtraining could mean paying attention to the dog’s fitness to ensure that the effort required during performance is well within its ability. That doesn’t necessarily mean the dog has to practice more; it may just mean taking the dog for longer walks or spending more time on play. One of her strategies is to do play sessions that are longer than the ring routine duration. If the dog is used to actively moving for 10 minutes in play, then a ring routine which is less than 10 minutes is going to be easier.

Teach working for duration as a concept:

A dog that has learned a lot of individual behaviors may struggle with chains and sequences, and the whole idea of duration over several behaviors because it doesn’t understand duration as a concept. She likes to teach this through the use of backchaining with a release cue.

She had a video of Emmy Simonsen showing how she could cue a behavior and then release the dog to the food dish. Then she could ask for two behaviors before releasing the dog to the food dish. Over time, the dog learned that it would be asked for a variable number of behaviors (that’s Steve White’s ping-ponging) before being released.

Along with this, Emily said it is helpful to teach the dog a lot of little chains, varying the order of behaviors and mixing in reinforcing behaviors so that reinforcing behaviors might occur at the beginning, middle, and end of the chain to keep it reinforcing. Using different markers can be helpful here as you can have some markers that are associated with excited reinforcement delivery and other markers for calm reinforcement delivery. Markers can be behaviors too so a behavior can be used to tell the dog it has responded correctly and tell it what to do next.

Preparing for Performance:

If you’ve done your work well, your dog has strong behaviors, loves training, and is physically and mentally prepared to give a great performance. What are some other things you can do?

  • Check to make sure your cues are going to be easily perceived by the dog in the performance environment.
  • It can sometimes work well to train visual cues as they are often more salient in noisy environments and can also function as reinforcers.
  • Practice without the dog first

Challenges with Building Duration:

Several of the speakers touched on some of the difficulties you may encounter in building duration and I thought it might be useful to end with a little summary, because we all know that understanding how something “should” be trained is not a guarantee that things will go as planned. There’s always some little hiccup in the process and it helps to be prepared with some ideas for what to do when things don’t go as planned.

I have a list of items pulled from the various talks, but before I share it, I want to make a few comments on Alexandra Kurland’s presentation, about which I have said very little. This is for two reasons. One is that Alex’s presentation was less about the details of training duration, and more about the attitude of the trainer when training duration (or any other behavior). The other is that her presentation was based on a blog post she wrote, and I think that any summary I can give would not do it justice. You really need to just go read it. You can find it at https://theclickercenterblog.com/2017/01/08/i-dont-understand-you-you-dont-understand-me-thank-you-donald-trump-for-helping-me-to-understand-why/

The main point of her presentation was that there are many ways to train behaviors and our choices are often based on the “frame” through which we are viewing behavior. The word “frame” comes from some work by George Lakoff who writes extensively about how we view and interpret the world based on information that we have picked up through life experiences. In her blog, she compares the contrasting frames of the “strict father” and the “nurturant parent” and shows how they relate to horse training.

When you choose how to train any behavior, you are viewing behavior (both yours and your learner’s) through one of these frames, and it will affect how you choose to teach and maintain the behavior. While the thoughts in her blog post apply to training in general, it is interesting to think about them in the context of training duration, because I think this is one place where cross-over trainers (those that started with traditional methods and switched to positive reinforcement) are likely to find it challenging to stay in the positive reinforcement mindset.

This is because we often start to build duration at the point at which we feel the animal “knows” the behavior, and it seems like it should be so simple to get more of it… but it’s not. So, it’s easy to revert back to previously learned techniques to get the animal to maintain it. One could say that when training for duration with your animal, you are also working on training yourself to stay in the positive reinforcement mindset for longer and longer periods of time as well.

Challenges and Solutions:

    • Anticipation – The animal does not meet criteria for duration because it is anticipating the end of the duration behavior or the next cue. To address this, you should check your ABC’s and also make sure you are increasing duration at a realistic rate and with variations (ping-ponging) – (Steve White)
    • Repetitive failure – Go back to a simpler variation and focus on only one variable (Steve White). You may also want to evaluate your reinforcement. Is there enough? (Ken) Is the reinforcement supporting the behavior? (active vs. static) (Steve, Emily)
    • Plateaus – Do you have a training plan to get from A (starting point) to B (ending point) and a way to measure progress so you keep moving? Training can stall out when trainers are not aware of when to move on (Steve, Alex). Note: Alex talks about this in the context of Loopy Training where you should move on when a loop is clean. If you are not familiar with Loopy Training, you can read about it on my website (look under “Alexandra Kurland” in the articles section).
    • Deterioration in behavior – Have you adequately identified the level of distracting stimuli (Steve), checked for adequate reinforcement and clarity of cues (Ken)? Go back and strengthen individual behaviors (Ken, Emily)
    • Insufficient understanding of chains, sequences – Do you and your learner understand chaining? Is there any reinforcement built into the chains? (Ken and Emily) – Go back to shorter chains and include secondary reinforcers if possible.
    • Cue confusion – Are your cues clear? Have you made things too complicated by asking for too many different behaviors or too many similar behaviors? (Ken)
    • Not enough reinforcement (Steve, Ken, Emily, Alex)
    • Secondary reinforcers are not strong enough (Ken) – Using secondary reinforcers requires a good understanding of when and how to use them and how to evaluate them. Barbara Heidenreich’s talk on this subject had many suggestions for how to use non-food reinforcers. I’ll be writing that up in another article.
    • Mental or physical fatigue (Ken, Emily) – Learning to work for longer periods of time takes mental and physical fitness. If your dog loses focus or seems fatigued, you may need more careful preparation with fitness in mind.
    • Distractions – It can be difficult to prepare for all possible distractions, but you should evaluate your animal and train for different types of distractions as much as possible (Steve). Having strong behaviors and a learner that is eager and engaged will work in your favor (Steve, Ken, Emily). You can also use the Premack Principle and change how your animal responds to distractions (Emily’s Saturday talk – more on this in another article).
    • Additional Unwanted Behavior Creeps in: When working toward duration, there is a greater likelihood that other behavior will creep in and become part of the behavior you want. These are called adjunctive behaviors and Paul Andronis gave us a nice introduction to them. I will share my notes on his talk in another article.

I thought I would end this article with a fun video that Steve White shared. You can think of it as a little additional reinforcement for you if you have read through this entire article. He shared it to make the point that learning new skills is always difficult, and that we often underestimate how hard it is to learn something new once we are at the point where we don’t have to think about how we do it.

Thank you to all the speakers for permission to share my notes from their presentations.

Enjoy,