For all you theory geeks out there (that would include me…), and maybe a few others, here are my notes from Dr. Jesús Rosales-Ruiz and his talk titled “The Quadrant Quandary: Clarity and Perspective on an Icon.”
The quadrants…. what a topic!
I’m so glad that Dr. Rosales-Ruiz put together a presentation on the quadrants. When I first started looking more into the science behind clicker training, I found a nice neat little diagram describing the ways that behavior could be increased or decreased. Cool! Then I started reading more, and thinking more, and observing more, and the question of “which quadrant” became one of the more confusing, mind-boggling, illuminating and tangled-up things I have ever tried to sort out.
Training in real life was not neat and tidy like the diagram suggested. Often two things seemed to be happening at once and even those were sometimes subject to interpretation. Trying to understand the quadrants generated a lot of thinking and forced me to look more carefully at what I was doing. Most traditional horse training is based on negative reinforcement. So now I had to ask myself “is this ok?” Where is the line between negative reinforcement and punishment?
At the same time I realized I couldn’t assume that I was using positive reinforcement just because I was feeding treats. I had to learn to ask myself what the horse really wanted. Was I using food in such a way that I was trying to “cover up” other issues, or was I using it to address behavior at the level that was most important to the horse?
I spent a lot of years asking myself these kinds of questions. Using the quadrants as a reference point was helpful in that it gave me a framework for my thinking, but in the end I realized that while it was important to understand them and recognize how different kinds of reinforcement and punishment affect behavior, the most important thing to consider was how the horse felt about what I was doing and whether or not we were working together toward behaviors that would benefit both of us.
So… should you learn about the quadrants? Definitely. Should you use them as an absolute guide? No. But learning about them will make you a better trainer and I think that most people need to go through a period of thinking about them in order to be able to fully understand all the ways we can change behavior. I joke to my students that you have to spend some time learning about the quadrants so that you can move on and forget about them. That basic understanding will be there and be useful when you are watching other trainers, explaining what you do to other people, and sometimes it can be helpful when you are faced with training challenges where things are not quite going as expected.
Here’s my summary of his talk. Do ask if you have questions or there seem to be errors. I always find it challenging to take the information he presents, process it and share it with others, but trying to explain it to someone else is a good way to think it through.
A Little History on the Quadrants
The quadrants were first described by B.F. Skinner and used in academia long before there was any public awareness of them. In the first descriptions, the difference between positive and negative reinforcement was described as “stimulus presentation” and “stimulus withdrawal.” The word “stimulus” referred to a change in the environment, and the terms negative and positive reinforcement were used to describe PROCESSES.
B.F. Skinner actually started with only two boxes, positive and negative reinforcement. He said that punishment didn’t teach anything, but that doesn’t mean he didn’t acknowledge that aversives could be used in training. Negative reinforcement was defined as the conditions under which “stimulus withdrawal” would increase behavior. In that case, aversives were used as motivation to change behavior. Even with that information, he didn’t consider “stimulus presentation” an effective way to decrease behavior.
In the 1950s, punishment showed up in the literature. Murray Sidman was doing work that showed how aversives can be a problem in training and lead to avoidance behaviors. If you haven’t read his book “Coercion and Its Fallout,” you might want to add that to your reading list. It’s not beach reading. I read it a few pages at a time. He did a lot of very interesting work looking at how aversives and punishment affect behavior. In 1956, there were experiments showing punishment does have an effect on behavior and by 1960 the diagram had been updated to include quadrants for both positive and negative punishment.
Dr. Rosales-Ruiz said that by the 1960s everyone was happy because the diagram of the quadrants was now tidy and symmetrical. At this point, people were talking about the quadrants as PROCEDURES. They were no longer processes, but procedures that could be deliberately carried out in the lab to study behavior. Now instead of labeling the diagram (along the top) with “stimulus presentation” and “stimulus removal,” they were talking about “response produces a stimulus” and “response removes or prevents a stimulus.” Using positive reinforcement as an example, the view shifted from the environment determining the response (stimulus presentation -> responding increases) to the response determining whether or not the stimulus was added (response -> stimulus presentation).
More recently there has been another shift, driven partly by the increased use of the terminology by professionals outside of the field of Behavior Analysis. This shift has been to identify the quadrants based on some quality of the STIMULI. Stimuli are now labeled as appetitive and aversive. In these diagrams, instead of identifying processes or procedures, the diagram identifies types of reinforcers/punishers and what will happen when they are added or removed.
The problem with this is that it assumes that the value of a reinforcer is fixed and the objects themselves are now given properties of being either negative or positive reinforcers. It might be accurate to say that an object/stimulus functions as a positive reinforcer under specific conditions, but that doesn’t mean the quality of the object/stimulus is fixed and it ALWAYS functions as a positive reinforcer. It’s important to be empirical and not decide in advance whether you think something will be a reinforcer or punisher.
Premack and the Value of Reinforcers
This brings us to the work of David Premack and the relativity of reinforcers. David Premack studied the behavior of rats (among other things) and found that he could change the relative value of different reinforcers by changing the environmental conditions under which the animal lived. He did a series of experiments where he looked at the amount of time rats spent drinking water vs. running on an exercise wheel and manipulated the environment (limited access to water or the wheel) in different ways.
What he found was that he could use one behavior to make the other more likely to occur. This is usually referred to as the “Premack Principle” and is often used by animal trainers to increase the likelihood of less preferred behaviors by reinforcing them with the opportunity to do more preferred behaviors. He also found that the reinforcement relationship is reversible.
- If water deprivation makes drinking more probable than wheel running, then the opportunity to drink will reinforce running.
- If limited access to the wheel makes running more probable than drinking, then the opportunity to run will reinforce drinking.
This ability to reverse reinforcers has been shown with other types of animals and even with humans. Dr. Rosales-Ruiz described a study that was done with two groups of kids. The first group liked chocolate and the second group liked playing ping-pong. The “chocolate” kids were reinforced with chocolate for playing ping-pong and the “ping-pong” kids were reinforced for eating chocolate by the chance to play ping-pong. They were able to completely change the kids’ preferences so the chocolate group now preferred ping-pong and vice versa.
Here are two related points about the value of reinforcers.
- Just because something is reinforcing, that doesn’t mean the animal will work for it.
- Expectation can be a factor. If you expect one type of reinforcement and you get another, it might not function as a reinforcer even though it might be reinforcing in other situations.
Confusion Inside the Quadrants….
If we focus on processes and procedures instead of the properties of stimuli, and recognize that the value of reinforcers and punishers is not fixed, does that make the quadrants more useful? Yes, but we also have to recognize that there are some other sources of confusion that make it difficult to decide which quadrant is driving behavior. He shared some more information and examples to help clarify the differences (and possible ambiguities) between positive and negative reinforcement, positive and negative punishment, and how extinction fits into the quadrants as well.
Positive vs. Negative reinforcement:
Using the quadrants, positive reinforcement happens when something is added to the environment to increase behavior. Negative reinforcement happens when something is removed to increase behavior. Sounds simple, but the distinction is ambiguous because the adding and removing can be symmetrical. Here’s an example from Weiss and Laties (1961).
A rat is placed in a cold chamber
- Lever pressing produces heat
- Lever pressing increases
- What maintains lever pressing?
- Positive reinforcement or negative reinforcement?
It might be …
- Positive reinforcement – lever presses produce heat
- Negative reinforcement – lever presses remove cold
- The problem is that adding and removal are symmetrical: the addition of one event is the removal of the other and vice versa.
- Adding heat removes cold, adding a smile removes a frown, adding food removes deprivation.
Because of this ambiguity, some professionals have argued that there is no clear distinction between positive and negative reinforcement in many situations and it would be better to get rid of the terms positive and negative and just talk about reinforcement.
However, there are some significant differences in an animal’s behavior depending on whether learning has happened with positive or negative reinforcement (phew… say all positive reinforcement trainers).
First, there is an asymmetry in how the animal responds to the presence/absence of the stimulus:
- In positive reinforcement, the response occurs in the absence of the stimulus or situation upon which reinforcement is based. This means you do the behavior hoping it will lead to reinforcement, even if you can’t see the reinforcement. My horse offers behavior even if he can’t see the carrots, and he may continue to offer the behavior even if he is not reinforced.
- In negative reinforcement, the response occurs in the presence of the stimulus or situation upon which reinforcement is based. This means the stimulus must be present in order for you to respond to it. My horse will stand in the location I want if I tie him with a lead rope. But if the lead rope is not there, he won’t stand in the location I want.
Second, under negative reinforcement conditions, the presence of the stimulus to be removed generates responses that can compete with the response to be reinforced. This can make negative reinforcement less effective. Here’s an example:
- When cold, a rat huddles in the corner and shivers – this competes with lever pressing.
- When cold in bed, the human shivers and curls up – this competes with getting up and adjusting the thermostat.
This is an important consideration because if the stimulus is intense enough, it will produce emotional behavior which can interfere with learning. He said that when they used to do shock experiments with rats, up to 60% of them did not respond as expected and could not be shaped to avoid or escape the shock.
Third, you have to take into consideration that there are two types of negative reinforcement and they can have different emotional impacts on the animal.
- Escape – the behavior results in removal of something already present, there is no warning stimulus. This type of negative reinforcement doesn’t seem to produce anxiety. The animals just learn from it.
- Avoidance – the behavior postpones or prevents something from occurring. There may be a warning stimulus, but it is not necessary. The warning actually becomes a source of anxiety and he said this is why poisoned cues can be such a problem. The dog is constantly anxious about what might happen.
Moving on to other possible sources of confusion….
Positive vs. negative punishment.
We have the same problem of symmetry involved in the presentation and removal of the stimuli. I’m actually scratching my head a little over this one as I don’t see quite how you do this, but the general point is that there can be situations where the positive and negative punishment are equal and opposite.
- Positive punishment – lever pressing produces cold
- Negative punishment – lever pressing removes heat
Negative punishment has been studied less in the laboratory because it is hard to set up conditions where reinforcement is removed as a way to change behavior. You cannot take away food that has already been eaten, and if food is present, the rat will be eating instead of pressing the lever, so there is no opportunity for punishment. What is usually removed are stimuli that signal reinforcement, so negative punishment could be called “time out from positive reinforcement.” Here are some examples of negative punishment:
- In the lab, the lights turn off and positive reinforcement is not available for a short period.
- In the real world, you can signal to the dog that the clicker game is off temporarily. This can be done by taking your treats and leaving.
- A training example of negative punishment is the method called “penalty yards,” which is used to teach dogs loose leash walking. The dog is walked from point A to point B and if the dog has an “error,” the trainer turns around and the dog has to start again from the beginning or from a previous point on the path from A to B. Reinforcement happens at point B so the dog wants to get there. Any return toward point A takes the dog away from the destination where reinforcement happens. The exercise is designed to use negative punishment to decrease the incorrect behavior or “error” in the loose leash walking.
What about Extinction?
Extinction is not included in the quadrants, but needs to be considered when we are looking at ways to change behavior. Extinction is the process of a behavior decreasing over time when it is no longer reinforced. The reinforcement could have been coming from the trainer or be coming from the environment. It is often confused with negative punishment because both result in a decrease in behavior. But the difference is that:
- In extinction, the previously reinforced behavior is ineffective and there is no stimulus change following the behavior. The animal does the behavior and nothing happens, whereas previously when the animal did the behavior, something did happen (you reinforced it in some way).
- In negative punishment, there is a stimulus change where positive reinforcers are not available or are removed. The animal does the behavior and something changes. You can take your treats and leave or you can remove access to existing reinforcers. Negative punishment is more accurately viewed as “time out from positive reinforcement,” or “reinforcer loss.”
Using the example of loose leash walking again, the method of “being a tree” is an example of using extinction to decrease leash pulling. One reason dogs learn to pull on leashes is because they learn they can get their handler to take them where they want to go, or to go there faster. If the handler stops and stands still, instead of going faster, the behavior will decrease because the previous reinforcement (forward movement) is no longer available. This is actually a great example of how the quadrants can get confusing because stopping is a change in stimulus conditions. To make sense of it, you have to view the stopping as removing forward motion, not as adding a stop. If you compare this to “penalty yards,” you’ll get a feeling for how negative punishment is different from extinction.
Extinction and negative reinforcement
We usually think of extinction in terms of positive reinforcement, but you can have extinction with negative reinforcement too.
- Extinction and negative reinforcement: discontinuation of negative reinforcement – the response does not prevent or remove the aversive stimulus. A great example of this is riding a horse that no longer responds to leg pressure. The response to leg pressure is taught by removing the pressure when the horse goes forward. If the rider keeps her legs on, the horse learns to ignore the pressure because nothing changes when the horse moves forward. Over time the horse becomes less and less responsive to leg pressure.
Can we talk about Extinction and punishment?
Well, we could say that:
- Extinction and positive punishment – discontinuation of positive punishment
- Extinction and negative punishment – discontinuation of negative punishment
But this invites confusion because extinction is usually discussed in the context of reinforced responding. Since discontinuation of punishment produces an increase in behavior, it doesn’t make much sense to talk about extinction and punishment.
Operant quadrants in popular literature
He had a few examples of how the quadrants are now being presented in popular literature. Dog trainers and owners are becoming more educated and some do want to learn more about the science behind clicker training. They are looking for practical information on what the quadrants mean and trying to find new ways to present them to encourage people to change to more positive reinforcement based training. This is a great step forward, but it’s important that we present them accurately and in keeping with their original use.
In summary, to avoid confusion
- Use quadrants to identify processes and procedures (not types of stimuli)
- Don’t assume something is reinforcing or punishing; let the animal’s behavior tell you
- Remember that reinforcers and punishers are relative and can change over time
- Don’t get hung up on the terms; look at the emotionality