The idea of choice was one of the underlying themes of the conference and is always an important consideration for positive reinforcement based animal trainers. At some level, animal training is about teaching an animal to do the behaviors we want, and to do them when we want them, but there are many different ways to go about getting there.
This conference has always been about exploring how we can achieve our goals while ensuring that the learning process is enjoyable, that the learner is allowed to actively participate in the process, and that he becomes empowered by his new skills and his relationship with his trainer.
There were a lot of presentations that touched on some aspect of choice.
- Dr. Killeen spoke on how understanding the use of the Premack Principle opens up more choices for reinforcement and can lead to a better understanding of how the value of different reinforcers can change depending upon environmental conditions.
- Emily Larlham talked about how we can teach our dogs to make different choices instead of becoming stuck in behavior patterns that create stress for both parties. She also talked about how important attitude was in training and how being allowed to actively participate and make choices contributes to a dog’s enthusiasm for any chosen activity.
- Alexandra Kurland talked about how trainers make choices based on the kind of relationship they want to have with their learner (authoritarian vs. nurturing), and how these decisions influence how much choice they give their animals.
- Barbara Heidenreich provided lots of examples of how to provide choice through more types of reinforcers and a discussion of why it’s important for both the trainer and the learner to have options.
- Dr. Andronis showed what happens when animals have limited choices about when and how to earn reinforcement, and how to recognize behaviors that indicate that the learner is no longer enjoying and engaged in the learning process.
These are just a few of the references to choice that came up in the other presentations, but they show that animal trainers have to think about choice all the time. Sometimes we are looking for ways to increase it. Sometimes we are looking for ways to limit it so the animal cannot practice behavior we don’t want. And sometimes we are educating the animal so he learns to make “good” choices.
Since choice is such an important topic, I saved it for last. The notes in this article are based on two presentations that dealt more specifically with choice. They are:
Jesús Rosales-Ruiz – Premack and Freedom
Ken Ramirez – Teaching an animal to say “No.”
They go together nicely because Jesús talked about choice from a more academic point of view and Ken talked about how to use choice as part of training.
Jesús Rosales-Ruiz: “Premack and Freedom.”
He started with a quote from David Premack:
“Reward and Punishment vs. Freedom”
“Only those who lack goods can be rewarded or punished. Only they can be induced to increase their low probability responses to gain goods they lack, or be forced to make low probability responses for goods that do not belong to them. Trapped by contingencies, vulnerable to the control of others, the poor are anything but free.”
This quote puts a slightly different perspective on how reinforcement and punishment work and on the idea of choice. If you desperately need something and you work to get it, is that really choice? And how does that relate to reinforcement? We tend to think that by offering reinforcement, we are giving choices, but are we really doing this? Let’s look more closely at reinforcement and then see how it relates to choice.
How we think about reinforcement
The Language of Reinforcement
- reinforcement as a stimulus (thing)
- reinforcement as a behavior (activity)
Originally, it was more common to think of reinforcement as being about getting objects such as food, a toy or other item. But, in Dr. Killeen’s talk, he spoke about the importance of recognizing that reinforcers are behaviors, an idea that came from Dr. Premack. There are some advantages to thinking of behaviors as reinforcers, because this change in thinking opens up the possibility for more types of reinforcers, and also makes it more obvious that the value of a reinforcer is variable.
Not all behaviors are going to be reinforcing at all times, and in some cases, there is reversibility between reinforcers and punishers. Dr. Premack did experiments showing the reversibility of reinforcers. He could set up contingencies that would make lever pressing, running and drinking all reinforcers (at different times), and he did this by adjusting the environment (adding constraints) so that access to some activities was contingent on other activities.
Want a drink? You have to run first. Running is reinforced by drinking. Want to run? You have to have a drink first. Drinking is reinforced by access to running. He showed that behaviors can be either punishers or reinforcers, depending upon the circumstances. So, perhaps the availability of reinforcers is not what defines choice.
What about freedom? Is choice about having freedom?
There are different ways we can think about freedom:
- Freedom from aversive stimulation
- Freedom to do what we want (positive reinforcement)
- More generally (freedom from control)
How do these ideas about freedom apply to animal training? And can they help us understand more about choice in animal training?
Clicker trainers meet all three of these “definitions,” to some degree. Jesús went over this pretty quickly, but it’s quite easy to see how clicker training can contribute to an animal’s sense of freedom or being able to make choices. Clicker trainers avoid using aversives by shaping behavior using positive reinforcement. They also avoid using aversives when the animal gives the “wrong” answer or does an unwanted behavior.
Clicker trainers don’t necessarily give the animal freedom to do whatever it wants, but they do use positive reinforcement, and over time positively reinforced behaviors do often become what the animal wants to do. Also, during shaping, the trainer may want the animal to offer a variety of behaviors, so there is some element of choice there.
They may also choose to train behaviors or set up training so the animal has more control of its own training. A lot of clicker trainers focus on training behaviors that the animal can use to communicate with the trainer so this can give the animal some feeling of control. We are still focusing on what we want them to do, but we are doing it in such a way that they feel they have more control.
At this point, Jesús mentioned that he doesn’t believe that total freedom exists. If it did, then science would not exist because there would be no “laws.” Our behavior is always determined by something and while we may think we can control it, what we are really after is the feeling that we can control it, which may or may not be true.
I have to confess that at this point I found myself remembering endless college discussions about “free will” and whether or not it exists. I don’t think we need to go there, but I do think that it’s important to realize that it may sometimes be more accurate to say that our goal is for the animal to have the perception of control. Interestingly enough, one of the points that Steve White made was that perception drives behavior, not reality.
Let’s look at how we can control behavior in animal training:
With the idea of freedom and choice in mind, let’s look at four different ways to control behavior in animal training.
1. Control only through aversives:
(note: I added this one for completeness. Jesús referred to it but did not list it on his slides)
- target behavior occurs -> animal is left alone or aversive ceases
- target behavior absent -> aversive consequences
- the animal has no control, choices or freedom
2. Control through aversives, but with some added positive reinforcement
- target behavior occurs -> positive consequences
- target behavior absent -> aversive consequences
- the animal can gain reinforcement, but it still does not have a choice and cues can become “poisoned” because the cue can be followed by either a positive or negative consequence, depending upon how the animal responds.
3. Control through positive reinforcement
- target behavior occurs -> positive consequences
- target behavior absent -> no consequences
- the animal can now “choose” whether or not it wants to gain reinforcement, without having to worry about aversive consequences for some choices. But is it really choice if one option earns reinforcement and the other does not?
4. Control through positive reinforcement with choices
- target behavior occurs -> positive consequences
- target behavior absent -> other positive consequences
- animal always has another option for a way to earn reinforcement, so there is true choice between two options that both lead to reinforcement.
Jesús shared a couple of videos that showed animals making choices. He did not spend a lot of time on these, so I can only provide a brief description and the point he was trying to make.
The first video was of a training session with a dog using clicker training and food. The dog is loose and can participate or not. As soon as the trainer starts clicking and treating, the dog leaves. He followed that video with another one where the dog is trained with food alone (no clicker). In this case, the dog stays and continues doing behaviors for reinforcement. He said this was an example of a situation where the clicker itself was associated with “hard work” so when the clicker came out, the dog would leave. He didn’t go into more details on the dog’s previous training history but those clips do suggest that the clicker itself can become an aversive stimulus.
The second video showed a horse being taught using clicker training and food. The horse is loose and the trainer is sitting in a chair. Throughout the training session, the horse is eagerly offering a behavior, which the trainer clicks and reinforces with food. But, because the trainer is feeding small pellets of grain, some of the food is falling on the ground around the horse. Despite the abundance of food on the ground, the horse prefers to keep offering behavior, and getting reinforced for it, rather than just eating the food off the ground. Jesús said this was an example of “real choice” because the same food reinforcement was available whether the horse did the behavior or not.
So far we have looked at how reinforcement and freedom contribute to the idea of providing choice for animals. But there’s another consideration, and that’s hidden within the idea of repertoire size.
Restraint, constraint and the effect of repertoire size
Positive reinforcement trainers are usually very aware of the effect of restraint on animals and try to train under conditions in which the animal is not physically restrained. They are also aware of the effect of constraint, but it seems to get less attention. Constraint is when we control the “goodies” or limit the animal’s ways to earn them.
Jesús said it was important to avoid constraint in training. Constraint can be physical, which means that the animal is in an environment where it might be “free” to move, but there are very few options for things to do. A Skinner box is an example of an environment where the animal is constrained because the number of behaviors it can do is quite limited.
But, constraint is not always physical. It can also be more about skills or the repertoire of behaviors that are available to an individual. You could say this is a kind of “mental” constraint where the individual feels it has few options because it is only comfortable doing a few things.
He used some human examples to illustrate this. For example, a person who has several skills has more freedom from constraint than someone who is only good at one thing. If you are good at debating, dancing, and social interactions, then you can go to a debate, dance or eat lunch with your friends. If you are only good at debating (even if you’re really good at it), but you lack other skills, especially social ones that are important for many activities, then you are constrained by your own repertoire.
He called this being coerced by available behavior, because your options are limited if you have a limited repertoire. At the end of the presentation, Joe Layng made the comment that “feeling” free is actually being free. If you only have one way of getting the extraneous consequence, then you are still limited. Joe’s comment reminded me of Steve’s point that perception, not reality, drives behavior. It’s always interesting to see how these things come together.
So, one way to limit constraint and increase choices in animal training is to increase the animal’s behavioral repertoire. This gives them more choices on several different levels because they have more options for reinforceable behavior overall, and it also may make it possible for you to give them more options at any given time.
Choice is not just about providing reinforcement or about removing aversives. It’s about providing the animal with opportunities to earn reinforcement in many ways and increasing the animal’s repertoire so it has the skills and opportunities to practice many different behaviors.
- A small repertoire = more constraint
- limited skills = limited opportunities
If our goal is to increase freedom, then we need to be aware that individuals can be constrained by the available environment and available behavior.
Jesús ended with the question, “If my dog will only walk with me when I have treats, why is that?”
That’s kind of a loaded question, and if I didn’t know Jesús was in favor of using treats, I might think he was suggesting that using treats was a problem.
But I don’t think he was saying that at all. I think he was just encouraging the audience to think about what it means if your dog will only walk with you if you have food. What does that say about the choices he is making? Does it tell you anything about how much you might be using food to limit his choices, not to give him more choices?
At the end of Jesús’s presentation, I found myself pondering the practical application of the material he presented. Yes, I love the idea of giving animals choices and I know from personal experience that adding reinforcement is not the same as giving choices. But I was thinking hard about what training would look like if several behaviors were all capable of receiving the same amount of reinforcement. The whole idea of clicker training is that we can select out and shape behavior by using differential reinforcement.
So, what would happen if you had several behaviors that could earn equal reinforcement? Well, lucky for me, Ken Ramirez’s presentation later in the day was on this exact topic.
Ken Ramirez: Teaching an animal to say “No”
In his presentation, Ken shared some training that he did with a beluga whale who had become reluctant to participate in training sessions. He started by saying that while his talk is titled “Teaching an animal to say ‘No,’” he realizes that the phrase is just a convenient way to describe what they did, and is not necessarily how the animal perceived the training.
He spent a little time talking about terms like “no” and “choice.” They are labels we give to ideas so we can talk about them, but that’s not useful unless we make sure we are all using them in the same way, or have a common reference point. He shared what he means by teaching “no,” choice, and how the two are related.
What is “no”?
- Teaching “no” means teaching another reinforceable behavior, one that the animal can choose to do instead of the behavior that has been cued. In the example he’s going to share, they taught the whale that she could always touch a target for reinforcement.
- Teaching “no” is different from teaching intelligent disobedience, which is more about teaching the animal that some cues override other cues. It’s also different from a Go, No-Go paradigm (as in a hearing test, where you don’t respond if you don’t hear the tone), or an “all clear” in scent detection, which is just a new contextual cue.
- We can only guess why the animal chooses to say “no.” When the whale did touch the target, they had no way of knowing if she was doing it because it was easier, had a stronger reinforcement history, she didn’t want to do the other cued behavior, or …
- But, regardless of why she chose it, the value was that it gave her another option besides responding to the cue or ignoring the cue. Tracking her “no” responses had the added benefit of allowing the trainer to gather information about her preferences and whether or not there was any pattern to when she chose to say “no.”
What is choice?
(these points should be familiar, as they are very similar to what Jesús discussed)
- It is hard to define
- Arguably, no situation ever provides true choice (there are always consequences)
- In “true choice,” the animal has the option to receive reinforcement in more than one manner (this goes along with Jesús’s point about true choice being where there are multiple ways to earn the same reinforcement).
- It is about controlling outcomes
- Real choice is rare
- Choice is often forced (meaning it is limited, or only one option has a positive consequence)
- Choice is a primary reinforcer (animals can be reinforced by the opportunity to control their own environment)
Choice in animal training: a little history
The introduction of positive reinforcement training into zoos and other animal care facilities made it possible for trainers to choose training strategies that allowed their animals more choices. In the beginning, it may just have been giving the animal a choice between earning reinforcement or not, but over time the training has gotten more sophisticated so that animals have more choices and can actively choose their level of involvement in training sessions.
Ken had some video clips that showed the ways that trainers in zoos can provide choice during husbandry behaviors. One common practice is to teach stationing, which can be used to teach animals to stay in a specific location for husbandry behaviors. The animal can choose whether or not to participate by either going to the station, or not.
Another option is to teach the animal an “I’m ready” behavior, which the animal offers when it is ready to start or continue. The trainer does not start until the animal offers the behavior, and she may pause and wait for the animal to offer it again, at appropriate intervals during the session, to make sure the animal is ready to continue. Some common “I’m ready” behaviors are chin rests, targeting, the bucket game (Chirag Patel), and stationing. These methods give the animal some choice because the animal is taught a specific way to tell the trainer whether he wants to participate or not.
Teaching stationing and “I’m ready” behaviors are examples of ways that trainers can give their animals more choices. Teaching these kinds of behaviors usually leads to training that is more comfortable and relaxed for both the trainer and the learner. A side benefit is that the trainers become much more skilled at observing their animals and paying attention to body language. And, they learn to wait until the animal is ready, which is always a good thing!
Husbandry behaviors can be unpleasant, but allowing the animal to control the timing and pace of the training can make a big difference in how the animal feels about what needs to be done. However, this may still be quite different than providing “true choice.”
So, what does “true choice” look like? This was the main part of Ken’s presentation.
The “No” Project:
The “no” project is the story of re-training Kayavak, a young beluga whale. Kayavak was born at the Shedd Aquarium and has been trained for her entire life (5+ years) using positive reinforcement. During that time, she developed strong relationships with several trainers and she would work both for food and for secondary reinforcers, especially tongue tickles. She was easy to handle and responded well (to criteria and with fluency) to her cues.
In fact, she was so agreeable that they often let the younger and more inexperienced trainers work with her. But, as she started to have more training sessions with less experienced trainers, and fewer with more advanced trainers, her behavior started to change. She became less reliable about responding to cues and was likely to just swim away, especially if they were working on medical behaviors.
This continued for several years until the “problem” was finally brought to Ken’s attention. By this time the staff was very frustrated and they needed to find a solution. Ken said that part of the reason it took so long for the problem to get to him was that she was handled by different trainers and it took a while for a clear pattern to appear.
Of course, the first thing one wonders is “What happened?” Looking back, Ken’s best guess is that the change in her behavior was the result of many small mistakes that accumulated over time. None of them were significant events, but added up, they undermined her confidence and made her reluctant to participate in training sessions.
Here are some of the contributing factors:
- She was trained more and more often by young trainers without strong relationships
- They misread early, mild signs of frustration, and didn’t adjust
- They used an LRS (least reinforcing scenario) inappropriately and it became long enough that it was more of a TO (time out).
- They felt pressure to get behavior and asked for it again after refusal, instead of asking for another behavior, taking a break, or one of many other options.
- The problem worsened over time and she discriminated against less experienced or unknown trainers.
Ken proposed a unique solution. He felt that Kayavak needed to have a way to say “no.” He thought she might be feeling as if she didn’t have any choices, and that was why she would just leave. But, if she had a way to say “no,” and got reinforced for it, perhaps she would choose to remain and would learn to become more engaged in training again.
He suggested that they teach her to touch a target (a buoy tethered by the pool edge). Touching the target would ALWAYS be a reinforceable behavior, and would be reinforced in the same way as other behaviors. This was an important point. They didn’t reinforce targeting with a lesser value reinforcer. They made sure the reinforcement for targeting was equivalent to the reinforcement for other behaviors.
While teaching “no” might seem like a radical idea, Ken mentioned that he had a few reasons he thought this might work. One was some training he had seen at another aquarium where a sea lion was taught to touch a target at the end of his session so the trainer could leave. The sea lion learned to do this, but also started touching the target at other times, and seemed to be using it to indicate when he wanted to be done.
The challenge was convincing the staff to try it. They doubted it would work, because why would she do any of the “harder” behaviors if she could just touch the target all the time? This was a good question, and gets to the heart of what some clicker trainers believe, which is that animals like doing behaviors that have been trained with positive reinforcement. But, if this is really true, then shouldn’t she be just as happy to do a variety of positive reinforcement trained behaviors instead of just repeating the same one over and over?
Since Ken is the boss, he convinced them to give it a try…
- Place a buoy close to where she is working and teach her to target it.
- Practice targeting the buoy until it’s a very strong behavior.
- Start mixing in some easy behaviors, but ask for targeting the buoy in between. So they might cue behavior 1 -> click -> reinforce -> target buoy -> click -> reinforce -> behavior 1 (or 2) -> click -> reinforce -> target buoy ->…
- Increase the number of other behaviors and/or difficulty, still mixing in targeting the buoy on a regular basis.
- Throughout this process, she can touch the buoy as often as she likes. So if the trainer wants to alternate buoy touches with cued behaviors, and Kayavak offers several buoy touches in a row, she still gets clicked for all of them. If she makes an error, doesn’t get clicked and then touches the buoy, she gets clicked and reinforced. A buoy touch is always a reinforceable option.
Evolution of a behavior
Ken showed us how the training progressed. He had some video of her at the various stages and some great charts to show how the number of buoy touches changed over time. I thought this part was really fascinating because it showed how important it was to allow her to find her own way to use “no” and how challenging it was for the trainers to stick with her through the process!
After 3 weeks:
- She touched it all the time, after every cue.
- The staff thought her behavior meant that it wasn’t going to work.
At 4 weeks:
- She started to work well and only chose the buoy under specific conditions, such as when she was ill, when she had made an error or heard no marker, when asked to do a medical behavior, with a new trainer, or if working with a trainer with whom she didn’t have a good relationship.
- He had a clip of a training session with a trainer she didn’t like. She would touch the buoy repeatedly, usually doing it before the trainer had time to cue another behavior.
- I asked Ken if they kept the session length the same, even if all she wanted to do was touch the buoy and he said “yes.” That was partly because her training session is how she gets fed, but also because touching the buoy repeatedly was not “wrong.” If that’s what she felt like doing that day, that was fine.
- With a trainer she trusted, she might not touch the buoy if she didn’t want to do a behavior, but would wait for the next cue instead.
At 4 months:
- There were almost no refusals with experienced staff.
- She still tests new trainers for a period of time.
- They did use the buoy during free shaping, but she rarely touched it. If she did, it was a sign that the trainer was not slicing the behavior finely enough.
- They can use the buoy to test if she likes a behavior – which one does she do?
- He had some nice charts showing how her behavior changed (number of right answers, buoy touches, refusals). They showed how she would test new trainers and then over time, the “no” behavior would get offered less and less.
Is this a useful approach?
Since doing this training with Kayavak, Ken has done the same thing with a sea lion and two dogs. They were all cases where the animal had lost confidence in the training. Having a default behavior that was always reinforceable meant they always had a way to earn reinforcement and it gave them choices.
He did find that, as with Kayavak, once the “no” behavior had been learned, the animals were fairly discriminating in when they used it. They might offer “no” instead of doing a cued behavior if the cued behavior was difficult, uncomfortable, or unknown. They also might offer it after an error.
Despite his success, he’s not sure you should use it with all animals. Usually if an animal is trained with positive reinforcement, it already has lots of ways to say “no,” so it’s not necessary to teach another one. It may be more useful to work on your own training or observation skills so you notice the first signs of frustration and can adjust before the animal reaches the point where it needs to say “no.”
There may also be difficulties if you teach it too early because the animal might get “stuck” on that behavior. This point made me think of Jesús and his comments about the danger of having a limited repertoire. Ken thinks it’s better to teach the animal a larger repertoire and then add a “no” behavior if needed, either because the relationship has broken down, or the animal has lost confidence. If you do teach a “no” behavior, it’s important to choose an appropriate one, either one that is useful or is a starting point for other behaviors.
I enjoy Ken’s presentations because he always has the coolest projects and approaches them with a great blend of practical and scientific knowledge. At some point in his presentation, he mentioned that the “no” project brought together a lot of scientific principles, including matching law, contra-freeloading, Premack, and others. But he also said that he used what he had learned from observing other trainers, or observing the animals themselves. I think this project was a great example of how we can give animals more choices as long as we have a well thought out plan and are willing to take the time to see it through.
This is the last of the articles I am planning on writing on the ASAT conference. I have lots of ideas for what to do with what I learned from the conference, and may blog about some of my own training later this summer. In the meantime, I hope something in these articles has caught your attention and inspired you to go out and try something new. I want to end by thanking all the speakers for their permission to share my notes. I also want to thank all the ORCA students who work hard to plan and run the conference. They are already busy planning the conference for next year and it will take place on March 24-25, 2018 in Irving, Texas.