equine clicker training

using precision and positive reinforcement to teach horses and people

Operant Conditioning: An introduction for clicker trainers

This article is taken directly from my book, Teaching Horses with Positive Reinforcement.

Where does behavior come from? What can we do to change behavior? Why does my horse do that? Anyone who has spent time with horses has certainly puzzled over their behavior. Some of it might have made you laugh, and some might have had you pulling out your hair. I don’t think it’s possible to spend time around animals without becoming curious about why they do things, and how we can get them to do more of the things we want, as well as fewer of the things we don’t want. In this chapter, we are going to take a closer look at behavior and the factors that influence it.
I know that some people are more interested in science than others and I can understand if you read the chapter title with a little bit of apprehension. I even considered leaving this chapter out, because you can certainly learn to clicker train without learning about the quadrants or the difference between operant and classical conditioning.

But, I truly believe that you will be a better trainer if you have a basic understanding of these topics. So, I decided to include the information and present it as simply as I could. If you start reading and your eyes glaze over, then you can skip this chapter, continue with the book, and come back to it later. My guess is that, if you continue training horses, there will be a point at which this information becomes relevant and you will want to learn more about it.

I am going to focus on the two main ways in which individuals learn: operant conditioning and classical conditioning. One thing to keep in mind is that the principles described here are universal and apply whether you are clicker training or using some other method. Understanding them will not only make you a better clicker trainer, but it will also help you recognize what other trainers are doing. Instead of depending on the description they choose to provide about what they are doing, you will be able to see what is really happening.

Before we jump into the technical stuff, let’s start with a useful summary of some of the most important concepts that trainers need to understand. This list comes from Dr. Susan Friedman, who has a great way of presenting information in understandable terms and paring it down to the bare essentials.

In her article, “10 Things Your Parrot Wants You to Know About Behavior,” she shares some critical fundamentals about behavior and training. I think her list is a good starting point for novice trainers who are just learning how to look at behavior from a more analytical point of view. It’s also a useful list for more experienced trainers who may need a few reminders. These apply to any species, and Susan has given me permission to replace the word “parrot” with “horse.”

  1. Behavior is what a horse does, under certain conditions; behavior is not what a horse is.
  2. Every behavior serves a purpose for your horse; the purpose is the consequence the behavior produces.
  3. Horses naturally choose the behavior that yields the most positive consequences.
  4. Every horse is an individual and has a personal point of view about what consequences motivate him or her to behave.
  5. To learn what motivates your horse, carefully observe favorite items, activities, and people.
  6. Increase your horse’s good behavior by delivering positive reinforcers immediately and consistently.
  7. The bad news is you can unintentionally reinforce problem behaviors too.
  8. To avoid problem behaviors, arrange the environment to make the right behavior easier and more effective than the wrong behavior.
  9. Reinforce small improvements toward the final behavior goal.
  10. You get what you reinforce, so catch your horse being good.

With that big picture in mind, let’s move on to looking at how animals learn and how trainers can influence their behavior.


Operant Conditioning


When I first wrote my website, I included a definition of clicker training and a small section on how to use it with horses. I wanted people to be able to get a quick idea of what clicker training for horses is all about, but I didn’t get into details about the science behind it, except to mention the use of positive reinforcement. Since then, I have learned more about behavior analysis (the science of learning and behavior) and how trainers can influence an individual’s behavior. This understanding has improved my training in many ways, both in teaching new behaviors and helping me to analyze and troubleshoot training issues. Therefore, I wanted to include a section on the science behind clicker training as part of this book.

This is a complicated subject and may be more than some people are ready to absorb right now. So, I’m going to keep things very simple. We’re going to start with operant conditioning because that is the type of learning that most people associate with clicker training, but we’ll also look at some related procedures. If you want to read more on this subject, I have included some additional resources in the Notes section at the end of this book. It took me a few years to understand and remember the basics of operant conditioning and apply that knowledge to what I was doing on anything more than a superficial level. But that was ok. Every time I read about it, it made more sense, and eventually I got to the point where not only could I understand the science, but it seemed useful and relevant.

Learning from experience


In the article, An Introduction to Clicker Training for Horses, in Chapter One, I wrote that clicker training is based on operant conditioning as described by B.F. Skinner. The science of operant conditioning explains how animals learn from experience and tells us how we can use consequences to change behavior. This is described by two basic principles. The first one is that behaviors that are followed by desirable consequences are more likely to occur in the future, and the second one is that behaviors that are followed by undesirable consequences are less likely to occur in the future.

That information is useful as a general description of how operant conditioning works, but in the interest of scientific accuracy, I went searching for a relatively jargon-free definition and found it in a textbook, Paul Chance’s Learning and Behavior. He defines operant conditioning as follows:

These procedures, whereby behavior is strengthened or weakened by its consequences, became known as operant learning because the behavior can be said to operate on the environment.


Does this add to what we know? I think it does, because it states that behavior “operates” on the environment, and operating is an active process. This is particularly relevant to clicker trainers because it means that the animal learns that he can change his own behavior to produce different consequences. This will become important later, when we look at training strategies and see how we can choose among them depending upon how actively we want the animal to offer behavior that will lead to desired consequences.

The field of behavior analysis defines four types of operant learning, often referred to as the “quadrants” because they are commonly depicted as four squares in a grid. Unfortunately, the terminology used by behavior analysts is a little confusing, in part because B.F. Skinner chose terms that were already in common use, but that had slightly different connotations. Note that in behavior analysis, behaviors are reinforced or punished, not individuals.

Here is some clarification on some of the terms used in behavior analysis:

  • Positive/negative: These merely refer to whether something has been added (positive) or removed (negative) after the occurrence of the behavior. Positive does not imply “good” and negative does not imply “bad.”
  • Reinforcement/punishment: These merely refer to whether a behavior increases (reinforcement) or decreases (punishment) in frequency in the future, as a result of the consequence.
  • Stimulus: A stimulus is “any event that affects or is capable of affecting behavior.”
  • Reinforcers/punishers: These are stimuli that are effective at changing behavior. We’ll discuss reinforcers in more detail later in this chapter.


The quadrants of operant learning


Positive reinforcement: The ADDITION of a stimulus immediately after the behavior INCREASES the frequency of the behavior in the future – it is more likely to happen again. This definition is consistent with how most people view positive reinforcement. If I want my friend to call me again, I reinforce that behavior with something she likes. If my boss gives me a bonus for doing a good job on a project, and I am more likely to work hard on a project again, then working hard has been positively reinforced.


Negative reinforcement: the REMOVAL of a stimulus immediately after the behavior INCREASES the future frequency of the behavior – it is more likely to happen again. This one is a little confusing because the word “negative” is often associated with unpleasant things, but it just means something is removed. If I have a stone in my shoe and taking it out makes me feel better, then I am more likely to stop and remove the stone the next time I feel one in my shoe. The behavior of removing the stone has been negatively reinforced.


Positive punishment: the ADDITION of a stimulus immediately after the behavior DECREASES the future frequency of the behavior – it is less likely to happen again. This is how most of us would define punishment, but the word positive makes it confusing. Just remember positive means something is added. If I am speeding and get a ticket, which leads to a change in my behavior so that I drive more slowly in the future, then the behavior of speeding has been positively punished.


Negative punishment: the REMOVAL of a stimulus immediately after the behavior DECREASES the future frequency of the behavior – it is less likely to happen again. This definition can also be confusing because negative punishment seems redundant. Just remember that negative means taking something away. If I am late to work, my boss cuts my pay, and I start coming to work on time, then the behavior of being late has been negatively punished.

A simple way to show the quadrants is by using a table or grid. Here’s a version with some simple examples. I’ve also included the shorthand abbreviation (R or P) for each category.
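
                          Stimulus ADDED                        Stimulus REMOVED
  Behavior INCREASES      Positive reinforcement (R+)           Negative reinforcement (R-)
                          e.g., a bonus for working hard        e.g., removing a stone from your shoe
  Behavior DECREASES      Positive punishment (P+)              Negative punishment (P-)
                          e.g., a ticket for speeding           e.g., a pay cut for coming in late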

A few other points about the quadrants

If I interrupt a behavior, is that considered operant learning? Whether or not operant learning has occurred is defined by a change in the frequency of the behavior in the future. If I interrupt a behavior by adding an unpleasant stimulus (an aversive), the behavior may stop, but I don’t know if operant learning has occurred until I see if the behavior is less likely to happen again in the future. For example, if I yell at a horse and he stops pawing, have I used positive punishment? The answer is that I don’t know until I see if he paws less often in the future.

Operant learning does not always occur. If I add or remove a stimulus and there’s no change in behavior, then operant learning has not occurred. That does not mean that the stimulus was not relevant in the past, but it does mean that there is no new operant learning.

The value of reinforcers and punishers can and will change. Their value can be influenced by the animal’s current physical or emotional state, his past reinforcement history, his level of satiation, what other reinforcers are available, as well as many other factors. A hungry horse may be more motivated by food; a horse who has been standing for a long time may be more motivated by movement; a horse who is cold may be more motivated by warmth. A wise trainer is constantly evaluating what is reinforcing to an animal and what is not, and while she might start with some assumptions, being observant and flexible is important.

Analyzing which quadrant you’re using can be difficult. Dividing operant conditioning up into four types is convenient for us and makes it easier to explain for educational purposes, but it’s not that simple in real life. A change in behavior may be the result of multiple stimuli, and we may have to very precisely define the behavior and the consequences in order to isolate (perhaps artificially) the event we want to describe.

It matters how you define the behavior. For example, if my horse is standing still, takes a step forward, and I click and treat, did I reinforce movement by clicking and treating? Or punish standing by withholding the click? It may also matter how you define the consequence. For example, if I put on my coat because I’m cold, is the behavior (putting on the coat) negatively reinforced because I removed cold, or positively reinforced because I added warmth?

What quadrants do clicker trainers use?


It would be possible to spend a lot of time discussing the quadrants, and trainers do get into endless discussions about which quadrant is being used, which quadrants should be used, whether every quadrant can be used by clicker trainers, and so on. However, for practical purposes, I’m just going to make a few comments here and then move on. If you are interested in theory, I want to encourage you to do some additional reading to learn more. To help you get started, I’ve included a list of some resources (links, books, DVDs) on operant conditioning and behavior analysis in the Notes section.

Even though there are some differences of opinion about the use of the quadrants, clicker trainers rely primarily on the use of positive reinforcement to teach and maintain behavior. Some trainers do use negative reinforcement – applied very carefully. Most clicker trainers avoid using positive punishment but may use negative punishment under certain conditions. If you started as a traditional horse trainer, this shift away from negative reinforcement (pressure and release) and punishment-based methods requires a lot of mental effort and the acquisition of new skills. Even new trainers with no previous training experience may find it challenging because of the cultural norms that surround us.

Therefore, in this book, I am going to focus on teaching you how to use positive reinforcement to teach and maintain new behaviors, and how to minimize your use of positive punishment. These are the two things that are most difficult for new trainers, and a solid understanding of how to do them will provide a good foundation for future training.

So, where does that leave negative reinforcement? Do clicker trainers use negative reinforcement? The answer is that some do, and some don’t. If it is used, it is usually used in combination with positive reinforcement, not on its own, and it is used very carefully. Because I think combining positive and negative reinforcement is an advanced topic, I’m not going to spend a lot of time on it in this book. I have written a few comments below and the subject will come up here and there as part of other discussions, so I hope that by the end of the book, you’ll have a better understanding of how and when it might be appropriate to use negative reinforcement.

Why is negative reinforcement problematic? After all, it is reinforcement, not punishment, and it seems like it would combine well with positive reinforcement because you could reinforce a behavior with both negative and positive reinforcers. Those are good points, but it turns out that it’s not that simple.

Possible problems with using negative reinforcement

The definition implies the use of an aversive: The first “problem” is hidden in the definition of negative reinforcement, which says that a behavior is negatively reinforced if it increases in frequency after the removal of a stimulus. What does this really mean? Let’s break the definition up into parts.

The first part is that I am reinforcing behavior – making it more likely to happen. The second part is that I am removing something. What kind of stimulus would I need to remove to make a behavior more likely to happen again? Well, usually it has to be something unpleasant (an aversive), otherwise the horse would not change his behavior in order to have it removed.

Ok, so far this sounds good. I am taking away something unpleasant so that a behavior will increase. However, I can’t remove something that is not present in the first place. How does it get there? It could be a stimulus that is naturally occurring in the environment, but, in a training situation, it is more likely that it is something the trainer adds.

This is where it becomes tricky – because as soon as I add an aversive, that brings up the question of whether I am using positive punishment first, before using negative reinforcement. Remember that in positive punishment, I change behavior by adding a stimulus that decreases the behavior in the future. So, we get into this gray area, where it’s not clear if you can use negative reinforcement without using positive punishment, or if the two are inherently linked together.

Because there is the possibility that the horse is experiencing punishment, some trainers will avoid using any type of negative reinforcement at all. It seems better to avoid it than to risk the emotional and behavioral effects of using punishment. Punishment will be discussed in more detail in the next article.

Some horses respond poorly to negative reinforcement: The effectiveness of any quadrant is going to be influenced by the animal’s past learning history. A horse that has been physically manipulated or punished with aversives in the past, or is naturally very sensitive, is going to find some types of stimuli to be aversive. These horses do not respond well to negative reinforcement and it’s better to use positive reinforcement alone.

Poisoned cues: This is a term that was first used in 2008 to describe a cue for a behavior that was trained with a combination of positive and negative reinforcement. The first time I heard the term was in a presentation at ClickerExpo by Dr. Jesús Rosales-Ruiz, a professor at the University of North Texas. His work includes using behavior analysis with animals as well as with humans. He is a member of the ClickerExpo faculty and is also actively involved in the Art and Science of Animal Training Conference and the student group The Organization for Reinforcement Contingencies with Animals (ORCA).

At ClickerExpo, Dr. Rosales-Ruiz described an experiment where a dog was trained to do the same behavior two different ways. In the first version, the dog was shaped using positive reinforcement, with food as a reinforcer, to come to the trainer. In the second version, the same dog was taught to come to the trainer with a combination of negative and positive reinforcement. The trainer would pull on the leash to ask the dog to come closer and then feed a treat after the dog arrived.

When the dog had learned to come to the handler under both conditions, the two behaviors were compared. There were significant differences in both the quality of the behavior and the dog’s emotional state. The dog who had been trained with positive reinforcement alone came eagerly toward the trainer with “happy” body language including an alert and energetic attitude, bright eyes, and a wagging tail. The dog who had been trained with both positive and negative reinforcement moved more slowly and had a depressed posture with a drooping tail, lower head, and generally subdued demeanor.

This led to the term “poisoned cue” which is the name for a cue associated with a behavior that has been trained with both negative and positive reinforcement. The word “poisoned” is used to indicate the effect an aversive (the leash pull) has on a behavior, compared to the behavior trained with positive reinforcement alone. The reason it’s referred to as a poisoned cue, not a poisoned behavior, is that it is the cue that becomes associated with the aversive, not the behavior. You can, of course, create unwanted associations between behaviors and aversives, but that’s a different topic. If you want to read more about poisoned cues, you can find some resources in the Notes section.

Does this mean we shouldn’t use negative reinforcement?

Does this mean that it’s impossible to combine negative and positive reinforcement without there being some fallout? No, but I think that it does require additional skills on the part of the trainer and may not be appropriate for all situations. When I started clicker training, some of the information I found included instructions for how to train behaviors using a combination of negative and positive reinforcement. Sometimes that worked well, sometimes it didn’t. The success of that approach depended a lot on the behavior being trained, the individual animal, and the trainer’s skill with both positive and negative reinforcement.

Because it was common to have varied results when combining positive and negative reinforcement, some trainers started to focus more on an all-positive reinforcement approach, while others looked at how to make negative reinforcement more clicker-compatible. Trainers who did choose to continue to use negative reinforcement learned from the research on poisoned cues and started exploring the fine line between negative reinforcement as information and negative reinforcement that relies on aversives.

Developing a better understanding of how to train with a combination of positive and negative reinforcement was very important for these trainers, because many of them were teaching animals in applications where pressure is an inherent part of the animal’s job. An example of this would be guide dog work where physical contact is one of the ways that the handler and the dog communicate with each other. It is also important for riding where the rider is going to be using contact as information, whether she intends to do so or not. If contact is going to be part of the animal’s connection to the trainer, it makes sense to educate both the trainer and the animal about what it means.

While there are still varied opinions on whether clicker trainers can, or should, incorporate negative reinforcement into their training, I’ve seen a shift in the last few years toward a better understanding of the nuances of negative reinforcement. This has led to more acceptance of a combined approach, as long as the animal does not appear to experience the stimulus as an aversive. My opinion is that training strategies that combine them can be considered a form of clicker training. I know that as I have become more skilled at shaping and learned what kinds of information I can use to help guide the horse, my ability to use negative reinforcement in a clicker-compatible way has improved significantly. There are times now when it’s not clear to me if I am using a combination of positive and negative reinforcement or just using a light touch as information and positively reinforcing correct responses.

If you want to learn more about this subject, I suggest you look at the resources available on Alexandra Kurland’s website. The DVDs on groundwork and rope handling have many examples of people learning to use negative reinforcement in a clicker-compatible way. Alexandra has spent years learning how to combine positive and negative reinforcement and she explains very clearly the benefit of a combined approach. If you want to read more about my views on combining them, you can find them in the next article in this series, which is on using negative reinforcement.

What can we learn from the quadrants?

Here are what I consider some of the most useful “take-aways” from the quadrants:

Terminology. Knowing the correct terminology makes it easier for trainers to talk to each other without confusion over definitions. But, keep in mind that you can’t assume someone is using the same definitions, so you might have to ask, “Are you using negative reinforcement as defined by operant conditioning?”

How consequences affect behavior. A basic understanding of how consequences affect behavior will improve your training. If a behavior is increasing, you know to look to see what is reinforcing it. If a behavior is decreasing, you know to look and see what is punishing it.

Our behavior is a result of learning from all four quadrants. On any given day, I am likely to be doing behaviors that have been learned through all four quadrants. That doesn’t mean I am actively reinforced or punished every day; it just means that those processes were part of my learning experience in the past.

Each quadrant comes with an emotional cost or gain. Individuals do not experience all quadrants equally. Dr. Ogden Lindsley described the quadrants using terms that reflect the individual’s experience: R+ = reward, R- = relief, P+ = punishment, P- = penalty. Some people find these labels useful, as they describe the quadrants in simpler terms, but I think they may lead one to believe that the procedure is the only factor that affects the individual’s emotional response, and that has not been my experience.

The difference between negative reinforcement and positive reinforcement. There is sometimes confusion over what type of reinforcement a trainer is using, and some non-clicker trainers will describe a release of pressure as positive reinforcement. This is incorrect because the consequence that changes the horse’s behavior is the removal of the stimulus (the release of pressure) and not the addition of another stimulus such as a treat.

I think it’s important to realize that the quadrants were originally used to describe procedures that researchers could use in the laboratory under very controlled conditions. They are useful reminders that behavior changes in specific ways when followed by different types of consequences. But, when working with another being, the most important thing to ask yourself is not which quadrant you are using, but how the animal is responding to what you are doing. Does he appear to be having a positive learning experience? You can answer that by learning to read the body language of your student, and by monitoring his progress. Bright, engaged, and enthusiastic students learn quickly. If you want to learn more about the quadrants, and current thinking on the subject, you can read my notes on Dr. Jesús Rosales-Ruiz’s presentation at the ClickerExpo Conference in 2016.

This general introduction to operant conditioning has been provided to help you understand how behavior changes based on the immediate consequences, and how we can use that information to understand and influence an animal’s behavior. But, operant conditioning is not the only way that animals learn. Animals also learn through a process called classical conditioning. In the real world, most learning happens through a COMBINATION of operant and classical conditioning. There’s an article on classical conditioning later in this chapter. Before I get to it, I want to take a quick look at extinction and punishment, two processes that are about decreasing behavior.

Extinction: Another way to reduce behavior

Positive and negative punishment are not the only processes through which behavior can decrease. There’s a third way: extinction. In extinction, reinforcement is simply no longer available for a previously reinforced behavior. I’m going to describe extinction because it’s important to understand it, but, as you will see, it’s not an effective way to reduce behavior and trainers rarely use it intentionally. Here are two examples showing how extinction can affect behavior:

Example 1: There’s a bakery near me that I used to visit regularly. But, the bakery stopped making my favorite cupcake, and I started going less often. Eventually, after enough visits when they didn’t have my favorite cupcake, I stopped going entirely. The lack of reinforcement (the cupcake) eventually extinguished the behavior of going to the bakery.

Example 2: I am teaching my horse to touch an object and I have been reinforcing him for touching it with his nose. When I am shaping the behavior, I click for some nose touches that also include biting at the object. Then, I adjust my criteria so that I am only clicking for nose touches if the horse’s mouth is closed. The horse initially goes through a period where he bites more, but then the biting decreases.

Extinction is generally a process, not a one-time event. An individual might continue to do a behavior, even if there’s no reinforcement, because she expects it will be reinforced again in the future. In the first example, I kept going to the bakery. I just didn’t go as often. One instance of nonreinforcement will not likely extinguish a behavior that has a history of reinforcement. However, if there is an alternative behavior that can provide equivalent reinforcement, extinction can happen more quickly. If another bakery opened up, and they did have my favorite cupcake (or something I liked equally well), I would stop going to the original bakery more quickly.

Extinction starts when there is a long enough period of nonreinforcement that the individual changes his behavior to try and get reinforced again. The first change that occurs is often a phenomenon called an “extinction burst,” which is an increase in the intensity of the previously reinforced behavior. It may be accompanied by some other emotional behaviors. In the second example above, the horse bit at the target more when the trainer stopped reinforcing him for target touches with an open mouth.

A more classic example of an extinction burst is banging on the soda machine when you have put your money in and nothing comes out. In the past, jiggling the machine might have helped. Remembering this, you jiggle the machine, but nothing happens. Then, you jiggle harder, whack it, hit it, and kick it until it either produces a soda or you give up. That escalation in your soda-obtaining behavior is an extinction burst.

Extinction doesn’t fit neatly into the four quadrants of operant conditioning because it’s not driven by adding or subtracting a stimulus, but by a change from positive reinforcement to no reinforcement. This makes it relevant to clicker trainers who depend upon positive reinforcement to shape and maintain behavior. We need to have a basic understanding of how extinction works, so we can recognize when the amount of reinforcement is falling too low and a behavior is starting to go into extinction.

Key points about extinction

Extinction starts when reinforcement drops below a certain level. This level is determined by the animal’s past reinforcement history. The safest way to avoid extinction is to either reinforce consistently or to prepare the animal for periods of lower reinforcement.

It’s frustrating for the animal. If you think of the emotional state of the person kicking the soda machine, it’s not one that a trainer wants her animals to experience. The emotional response is usually most intense during the extinction burst, but there can be unwanted emotional responses during the entire process.

Don’t reinforce during an extinction burst. If a behavior is undergoing extinction, and you reinforce that behavior during an extinction burst, you’ve just made the behavior stronger. You have essentially reinforced the animal for doing more of the behavior you didn’t want. Not only will the behavior be stronger (more intense), it will also be less prone to extinction in the future. Trainers rarely reinforce intentionally during extinction bursts, but they can accidentally do so if the behavior escalates enough that the trainer has to intervene in a way that reinforces it.

Extinction bursts create variability. When a behavior “stops working,” the animal will usually modify his own behavior to try and earn reinforcement. This creates variability in his behavior. Some trainers describe shaping as riding along the edge of extinction bursts – you withhold reinforcement enough that the animal tries harder or tries something else – but not so much that it tips into frustration. Shaping can be done this way, but there are better, less stressful ways to do it.

Extinction is not permanent. Yes, the word implies that the behavior is gone forever, but behavior never goes away entirely, and a previously extinguished behavior can come back under the right circumstances.

Extinction is not an effective way to decrease behavior. It can be difficult to control all the reinforcers, the behavior is likely to return, and the process is frustrating for both the animal and the trainer.

Extinction is normal. I don’t want to leave you with the impression that extinction only happens when reinforcement is deliberately removed from the environment. Extinction is a normal part of life and happens any time the value or availability of reinforcers changes so that you cease doing a behavior. If you think about your own experience, there are probably behaviors (or activities) that you stopped doing, not because they were punished, but just because you didn’t feel like doing them or need to do them anymore. In some cases, trainers can successfully extinguish behaviors by deliberately removing reinforcers in a way that more closely approximates normal environmental changes, but they have to be careful about how they do it.

If you want to read more about extinction, I have two articles on it: Karen Pryor on Extinction and Dr. Jesús Rosales-Ruiz on Resurgence. You can find them in the ClickerExpo 2014 notes on the Articles page. Resurgence is what happens when a previously extinguished behavior reappears. Understanding about resurgence will help you understand extinction.

Punishment

In the previous article on operant conditioning, I said that one of my goals was to help new clicker trainers learn to avoid using punishment. Most people would probably agree that they would like to use less punishment, but wanting to use less punishment and knowing how to do it are two different things. And some people may even wonder if it’s possible to use less punishment. Don’t you need punishment to avoid unwanted behavior? Doesn’t the horse need to know if he’s wrong, or who’s in charge?

Even if we don’t describe what we are using as punishment, there’s a big emphasis in horse training on responding directly to unwanted behavior. This is because “letting him get away with bad behavior” is considered to be a sure path to having an uncontrollable horse. If that is what you have been led to believe, then it can be hard to accept that perhaps there is another way. I’ve found that one way to encourage people to consider alternatives to punishment is to have them look a little more closely at how punishment affects their horses. Is it effective? Are there side effects?

Problems with punishment

Bob Bailey, one of the best-known practitioners of applied operant conditioning, ran a successful animal training business for many years. He and his wife, Marion Breland Bailey, were both students of B.F. Skinner, and were among the early pioneers of positive reinforcement training. Their company, Animal Behavior Enterprises (ABE), trained tens of thousands of animals for government projects, shows, and other commercial purposes.

Occasionally, one of ABE’s trainers would come to Bob asking if he could use punishment. Bob would tell the trainer that he could use punishment, but that he would have to provide a detailed training plan for how he was going to use it, explaining why he couldn’t change the animal’s behavior in some other way. Bob says that by the time the trainer was done preparing this, he almost never still wanted to use punishment. Punishment was only used a handful of times in all the training done by ABE.

Punishment has known risks. Dr. Susan Friedman sums them up in her article “What’s Wrong with This Picture? Effectiveness Is Not Enough,” which has been published in multiple animal-training journals.

Dr. Friedman’s list of problems with punishment

  • Punishment can lead to aggression, generalized fear, apathy, and escape/avoidance behaviors.
  • Punishment doesn’t teach learners what to do instead of the problem behavior.
  • Punishment doesn’t teach caregivers how to teach alternate behaviors.
  • Punishment is really two aversive events – the onset of a punishing stimulus and the forfeiture of the reinforcer that has maintained the problem behavior in the past.
  • Punishment requires an increase in aversive stimulation to maintain initial levels of behavior reduction.
  • Effective punishment reinforces the punisher, who is, therefore, more likely to punish again in the future, even when antecedent arrangements and positive reinforcement would be equally, or more, effective.

In short, punishment does not teach either individual what to do, adds an aversive, removes reinforcement, usually increases in intensity over time, and reinforces the punisher who is more likely to continue doing it, even when other methods are more effective. Additional reasons to avoid punishment are that it only suppresses behavior temporarily (unless you keep applying it), can damage your relationship with your animal, and has unwanted emotional and physical side effects. Not only that, but the effects of punishment are not always immediately apparent. They can develop slowly over time, so trainers often do not realize that new “problems” are due to previous use of punishment. Animals that are in situations where punishment is common are under constant stress, which decreases their quality of life.

Everyone has moments that clarify some important point, and I can still remember the day that I realized how much punishment affects all behavior. I had been clicker training for a while and was in the habit of letting Rosie out in the aisle at night to play some clicker games. She would usually come right out and offer whatever behavior we had been working on. One evening, before her turn, she started banging on her door with her knees. This was how she got my attention when she wanted something. It was not a behavior I liked, and I had been training an alternative behavior. But, for some reason, that night I got annoyed and yelled at her. She stopped banging. But, when I let her out to play a few minutes later, she wouldn’t do anything. No offered behavior, no interacting with me, nothing.

Given all of the above, I hope you are starting to see why using punishment is not a good option for any trainer, and especially not for clicker trainers. For another perspective, here’s what Steve White has to say about punishment. Steve has been involved in training dogs for military and law enforcement agencies since 1975. He specializes in teaching behavior modification, urban tracking, and scent work with positive reinforcement. Steve has given me permission to share this list from his Trainer’s Pocket Reference, which is a handy reference card that he has developed. The list is of criteria that a trainer must be able to meet if she is going to use punishment. I am quoting him directly, so the references are to dogs, but the same criteria could be used by trainers working with any species.

Steve White: About Punishment

  1. It must be something the dog will work to avoid. This is a subjective experience unique to the individual.
  2. It must be unexpected. Knowing it’s coming can change the behavioral economics of the event.
  3. It must suppress behavior. Aversives applied without effect basically constitute abuse.
  4. It must be of perfect intensity. Too much and the subject bails; too little develops a punishment callus.
  5. It must happen immediately. Otherwise you risk punishing subsequent behaviors.
  6. It must be associated with the behavior, not you or the delivery mechanism. Far too many of us are the discriminative stimuli that punishment is at hand.
  7. It must happen every time the behavior occurs. Otherwise we strengthen problem behavior with a random reinforcement schedule.
  8. There must be an alternative for the dog. Make clear the contrast between the paths to reinforcement and punishment.
  9. Never forget punishment reinforces the individual punishing. This insidious effect happens just because you get momentary relief from annoyance.

One final thought. If you are tempted to use punishment and cannot follow all these guidelines, then you need to rethink your training plan.

Between Dr. Friedman and Steve White, I think that these are very compelling arguments against using punishment. But, it’s not enough to decide to stop using punishment, you also need to know what you can do instead. The good news is that most unwanted behavior can be addressed through a combined approach that includes removing contributing factors, management (prevention), and teaching alternative behaviors. All behavior has a function and unwanted behaviors can be the horse’s way of telling you there is something he needs. Putting punishment aside and concentrating on approaching the problem from a different angle can lead to a long-term resolution that is better for both of you. Practical alternatives to punishment will be discussed in the article Unwanted Behavior and Errors, in Chapter 5: Training Sessions.

When I first learned that there were alternatives to using punishment, it was a huge relief. I never wanted to use it, never liked using it, and never thought it worked very well. But, there was still a little inner voice that recognized that horse trainers are judged by how well they handle difficult horses and how quickly they can make a horse compliant. I had to take some time to process the idea that I didn’t have to think of horse training that way anymore. But it’s not easy to throw out an idea without replacing it with a new one. So, I started paying more attention to quotes on the subject by people I admire.

Here are two of them.

The first comes from Karen Pryor’s book, Reaching the Animal Mind. I read this book when it came out in 2009, and enjoyed it so much that every now and then I pick it back up and just read a page or two. No matter where I start, I always find something interesting. Karen writes:

More profound than any specific application is the change the technology [of clicker training] brings about in the people who are using it. When you stop relying on aversive controls such as threats, intimidation, and punishment, and when you know how to use reinforcement to get not just the same but better results, your perception of the world undergoes a shift.

You don’t have to become a wimp. You don’t have to give up being in charge. You lose nothing of yourself. You just see things you didn’t see before. One man said to me, “I stopped jerking my dogs around and then I noticed what I was still doing with my kids.” It’s not a moral question. He was trying to be a good parent before. He is still trying to be a good parent. It’s just that now he sees an alternative way.

The second quote comes from Brené Brown. This quote, from her book Daring Greatly, refers to people, but it’s the first definition of a leader I’ve seen that I would consider applying to the relationship I want to have with my horses.

I’ve come to believe that a leader is anyone who holds her- or himself accountable for finding potential in people and processes. The term leader has nothing to do with position, status, or number of direct reports.

I particularly like her comment about “finding potential,” and that my job as a leader is to bring out the best in my horse. It reminds me of something Kay Laurence told me, which is that the word educate is related to the Latin educere, which means to draw out. Education, aka training, should be about bringing out and developing the qualities and skills we want to see in our students. In this context, it’s pretty clear that punishment, which has an overall dampening effect on behavior, is not going to draw someone out.

What kind of qualities do you want to develop in your horse, and how would you go about doing that? Asking this question puts training in a different perspective because it makes training about the horse and not about what I might want to do with him. I don’t think the two are incompatible, but it’s easy for them to get out of balance, and Brené’s quote is a good reminder that training should be done for the horse, not just to the horse.


If you want to learn more about how to use negative reinforcement as a clicker trainer, the next article in this series goes into more depth on this subject.