equine clicker training

using precision and positive reinforcement to teach horses and people

ASAT 2020: Dr. Paul Neuman and Dr. Philip Hineline – What the textbooks don’t tell you about negative reinforcement

This is the second in a series of posts based on my notes from the 2020 Art and Science of Animal Training Conference that was held in Hurst, Texas on February 22-23, 2020.  To learn more about the conference, you can visit the conference website.

While I try to take accurate notes, it is possible that there are errors or that some detail is lacking.  If you post a comment or email me, I can try to clarify or provide some additional information. Many thanks to the speakers and organizers who allow me to share.


Dr. Philip Hineline was the scheduled keynote speaker, but he was unable to attend due to health reasons. His colleague, Dr. Paul Neuman gave his talk for him.


What the textbooks don’t tell you about negative reinforcement: Dr. Paul Neuman and Dr. Philip Hineline

Contrasting theories of negative reinforcement

Negative reinforcement, as it is taught and presented in textbooks, is described as a process where the animal learns to avoid an aversive event by responding to a warning stimulus that occurs before the aversive event. In order for learning to occur, there must be temporal contiguity between the warning stimulus and the aversive event (the warning stimulus must directly precede the aversive event). According to this theory, the warning stimulus will become a conditioned aversive after it has been paired with the aversive event enough times.

But, the evidence doesn’t match the story.

Why are we teaching an account that doesn’t match what researchers found in experiments 50 years ago? Let’s identify some of the problems with the textbook explanation and then see if we can improve our understanding by looking at the results of Murray Sidman’s avoidance experiments.

Problems with the textbook explanation:

  • “Avoidance” – An ordinary language term adopted as technical
  • The “fundamental” question that followed:
    • How can a non-occurrence be a consequence?
    • i.e. what reinforces avoidance?
  • Assumed necessity of contiguous occurrence (1957)
    • thus – warning stimuli were always supplied in experimental procedures
    • the theorists needed it even if the lab rats didn’t
    • thus – Sidman’s procedure was revolutionary

We will replace “avoidance” with other terms and show that the events the animal responds to can be internal, previously established events – animals don’t need trainer-added warning stimuli to learn to avoid aversive stimuli. We will also argue that the warning stimulus is discriminative and not a conditioned aversive.

Sidman’s Experiments

Sidman wanted to look at how animals learn to avoid aversive events. He did a series of experiments looking at the effect of time, warning stimuli, and shock frequency on successful avoidance behavior. You can find his experiments by searching for papers on “conditioned aversive temporal stimuli.” Dr. Neuman described several of the experiments that challenged the textbook description of negative reinforcement.

This talk was quite technical and had a lot of diagrams. I’ve tried to highlight the main points of each experiment as you read along. Please note that I added these – they are not direct quotes from Dr. Neuman. I’ve also put a little summary of what I learned in my notes at the end. You don’t have to understand all the details of each experiment to get the general gist of things.

In the diagrams, the solid blue line represents the time interval before the shock. The red “star” represents the shock. The thinner line marked with an “R” indicates the zone in which the rat would make the response. The dotted blue lines represent what happens after the shock or the rat’s response. In this experiment, the response – shock interval is restarted if the rat gets shocked or responds to avoid the shock.

Experiment #1:

The rat receives a shock after a fixed time interval (e.g. 10 seconds). The rat can avoid the shock by pressing a lever. The response – shock interval is the interval between the response and the shock. The shock – shock interval is the interval between shocks.

In this experiment, the RS (response – shock interval) and the SS (shock – shock interval) are equal.

What did he learn? If the shock is predictable (comes at regular intervals), rats can learn to avoid shock without any warning stimulus.
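
To make the contingency concrete, here is a minimal sketch in Python of a Sidman-style free-operant avoidance schedule. This is my own illustration, not from the talk – the interval lengths and the simple clockwork “rat” are invented placeholders.

```python
# A minimal sketch of Sidman's free-operant avoidance schedule (my own
# illustration; interval values are invented). Time advances in 1 s steps.

RS = 10  # response-shock interval: a response postpones the shock this long
SS = 10  # shock-shock interval: after a shock, the next follows this long

def run(respond_every, total_time=120):
    """Simulate a rat that presses the lever every `respond_every` seconds."""
    shocks = 0
    timer = SS           # seconds until the next scheduled shock
    since_press = 0
    for _ in range(total_time):
        since_press += 1
        if since_press >= respond_every:  # the rat presses the lever
            since_press = 0
            timer = RS                    # a response resets the RS timer
        timer -= 1
        if timer <= 0:                    # no response in time: shock
            shocks += 1
            timer = SS                    # a shock resets the SS timer
    return shocks

# A rat that responds faster than the RS interval avoids every shock --
# no warning stimulus needed, only the passage of time.
print(run(respond_every=8))   # 0
print(run(respond_every=15))  # shocked repeatedly
```

(Experiment #2 below amounts to setting the two constants to different values, so the interval that follows a shock is shorter than the one that follows a response.)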


Experiment #2:

The rat receives a shock after a fixed time interval (e.g. 10 seconds). The rat can avoid the shock by pressing a lever at any point before the shock occurs (anywhere along the solid blue line).

If the rat avoids the shock, the response – shock interval is reset. If he gets shocked, the following response – shock interval is shorter (see how the upper dotted line doesn’t go all the way back to the beginning?).

What did he learn? Rats can learn to avoid shock even when the RS and SS intervals are not equal (maybe it’s not just a function of time).


Experiment #3:

The rat receives a shock after a fixed time interval (e.g. 10 seconds), but now he gets a warning – a light turns on before the shock occurs. The rat can avoid the shock by pressing a lever, either before or after the warning stimulus starts. The interval during which the warning stimulus is present is shown by the red rectangle.

When the warning stimulus is added, the interval between shocks is divided up into two parts. The first part (the blue line with arrows and nothing above it) is the response – light interval. The second part (marked by the red rectangle over the blue line) is the response – shock interval.

This experiment was designed to identify the function/properties of the warning stimulus. Is it a conditioned aversive or a discriminative stimulus?

If the warning stimulus functioned as a conditioned aversive, then the rat would be expected to press the bar BEFORE the warning occurred. This would be consistent with the idea that the warning stimulus takes on the aversive qualities of the shock. You would see the response in the zone marked by the R, before the light turns on.

However, if the warning stimulus functions as a discriminative cue, the rat would pay attention to it, so he could avoid the shock, but he would not avoid the warning. You would see the response in the zone marked by the R, which is after the light turns on.

What did he learn? Sidman found that the rats did not respond before the warning signal. Instead, they delayed and pressed the lever after the warning signal had started. The warning stimulus functioned as a discriminative stimulus.
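
Here is the same logic written out as a tiny decision rule. This is my framing, with invented interval lengths: where a response falls in the dark/light cycle tells you which function the light is serving.

```python
# Where the rat responds in the RL + RS cycle reveals what the light is
# doing (my own framing; interval lengths are invented).

RL = 5   # seconds of darkness before the warning light turns on
RS = 5   # seconds of light before the shock

def interpret(response_time):
    """Classify a response by when it occurs within one cycle."""
    if response_time < RL:
        # Pressing before the light would avoid the light itself --
        # what you'd expect if the light were a conditioned aversive.
        return "before the light: light treated as aversive"
    elif response_time < RL + RS:
        # Waiting for the light and then pressing avoids only the shock --
        # the light is being used as information (discriminative stimulus).
        return "after the light: light treated as information"
    return "too late: shock"

# Sidman's rats mostly responded here, after the light came on:
print(interpret(7))
```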


Experiment #4:

Next Sidman looked at the effect of varying the length of the two intervals. He would hold one constant while varying the other, to see when the rat would respond.

  • During the response – light interval (the solid blue line with arrows) the rat was in the dark and could press the lever to avoid the light and the shock.
  • During the response – shock interval (the solid blue line with the red rectangle over it), the light was on and the rat could press the lever to restart the beginning of the light interval, but this just postponed the shock – unless the rat responded again to further postpone it.
  • Once a rat was in the RS interval (light is on), the only way to get out of it was to accept the shock, which would restart the RL interval (dark).

In this diagram the RL and RS intervals are equal.

Here’s a graph that shows some of the results. In this one the RS interval was held constant and the RL interval was varied. The line labelled “L” shows responses made during the RS interval – when the light was on. The line labelled “D” shows responses made during the RL interval – when it was dark.

The crossing lines show that the rats changed their behavior as the RL interval got longer.

Remember: the order is RL interval (dark) -> RS interval (light) -> shock

  • If the RL interval is short: rats respond at higher rates when the light is on (RS interval). This delays or postpones the shock. Not enough time to respond?
  • As the RL interval increases – the number of responses made when the light is on (RS interval) decreases and the number of responses made in the dark (RL interval) increases. They are responding before the light comes on, so they can avoid the shock.
  • I confess I added some material here that I didn’t learn from the talk – he went through this too quickly for me to follow. But I found a description of the experiment here.

What did he learn? Whether or not the rat responds before the warning stimulus will change based on the amount of time allowed for the response.
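
One way I find it helpful to think about this result: whether a rat with a fixed reaction time ends up responding in the dark or in the light depends on how those intervals compare. A rough sketch (mine, with made-up numbers):

```python
# Hold RS constant, vary RL, and see where a response lands (my own
# stand-in "rat" that needs a fixed amount of time to organize a press).

RS = 5  # light-on interval before the shock (held constant)

def response_zone(RL, reaction_time=6):
    if reaction_time < RL:
        return "dark (RL interval) -- avoids light and shock"
    elif reaction_time < RL + RS:
        return "light (RS interval) -- postpones the shock"
    return "shocked"

for RL in (2, 5, 10, 15):
    print(f"RL={RL:>2}s: {response_zone(RL)}")
# Short RL: responses crowd into the light-on zone (postponement).
# Long RL: responses shift into the dark, before the light comes on.
```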

He also experimented with varying the RS interval and got less interesting results. What this experiment showed him was how different conditions (contingencies) would affect when rats chose to delay, postpone, or avoid shock, as well as when they accepted the shock in order to escape from the RS interval.

This relates to Herrnstein’s work on scales of process, which states that events occur on different time scales. Sometimes we have to look at a longer time frame in order to understand all the factors that contribute to a certain behavior. For example, you might have to identify competing responses. Some animals tend to freeze when they are expecting an aversive event, which makes it harder for them to learn to do a specific behavior to avoid it. You might also have to recognize the impact of previous events (establishing operations). If you’ve just had a nine-course meal, then your behavior around food is likely to be different than if you’re hungry.


Krasnegor, Brady and Findley’s experiment

They took this one step further and set up an experiment with rhesus monkeys where the response – shock interval was divided up into three sections – blue, green, and red. They wanted to look at the relationships between the difficulty of the task, the proximity to the warning, and the length of the different intervals.

The monkeys were trained to press a lever 30 times within a time interval.

If the monkey successfully performed the 30 lever presses in the blue interval, he could avoid the shock. If he did not, then the green light was turned on. He had another chance to do 30 lever presses. If he completed his 30 lever presses, then he avoided the shock. If he didn’t, then the red light came on, followed by a shock in 3 seconds. They experimented with a variety of conditions including changing the required number of responses in the blue and green intervals (the fixed ratio schedules – FR) and the length of the intervals (fixed intervals – FI).

Here are their results:

Remember the order is: blue -> green -> red/shock

When the number of responses required in both intervals was equal (FR 30 in each), the monkeys were more likely to delay responding until the green light came on. They had already been trained to perform 30 lever presses in that time interval, so they would wait until the green light came on. There was no penalty for waiting as they could still avoid the shock. However, when the difficulty increased in the green conditions (FR 60, 90, 120), but not in the blue conditions, then the number of successful avoidance trials under the blue conditions increased.

What did they learn? Monkeys will choose to respond earlier, during the blue light, if waiting until the green light means they have to do more work to avoid the shock.
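
If it helps, here is one way to frame the monkeys’ choice as a comparison of work rates. This is my simplification with illustrative numbers, not the authors’ analysis: waiting is free only as long as the green requirement is still easy to meet.

```python
# The procrastination trade-off in Krasnegor, Brady & Findley, reduced to
# a comparison of required work rates (my simplification; numbers invented).

def start_early(fr_blue, fr_green, blue_secs=30, green_secs=30):
    """Return True if pressing during blue beats waiting for green."""
    rate_if_waiting = fr_green / green_secs   # presses/sec needed in green
    rate_if_early = fr_blue / blue_secs       # presses/sec needed in blue
    # Equal requirements: no penalty for waiting, so procrastinate.
    return rate_if_waiting > rate_if_early

print(start_early(fr_blue=30, fr_green=30))    # False: wait for green
print(start_early(fr_blue=30, fr_green=120))   # True: start pressing early
```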

There were more details, but I think this is the information that is most relevant for many of us. If you want to read about the experiment in more detail, you can find it here.


How does this relate to real life? Dr. Neuman gave the example of studying for a test. In high school, you may wait until the last minute before starting to study because the amount of work required can be done quite close to the time of the test. But in college, you learn to study earlier because there’s too much material to cover. How much you procrastinate is related to how much work you have to do and the time in which you have to do it. If there’s no penalty for delaying, most of us will procrastinate.


Five Summarizing Principles

(1) Negative Reinforcement is to be understood in terms of transitions between situations as well as by postponement or prevention of events within a situation.

(2) Relative aversiveness of a situation (the degree to which transitions away from it will reinforce behavior) depends only partly upon primary aversive stimuli that occur within a situation. Even when those stimuli do contribute to aversiveness, a relevant feature is the relation between their short-term versus longer term distributions over time.

(3) Relative aversiveness of a situation depends substantially upon both contingencies:

  • “work” requirements in that situation
  • “work” requirements in alternative situations

(4) Most importantly, the role of the alternative situation(s) depends upon contingencies regarding the change of situation – i.e. upon “what is involved in getting from one situation to the other.”

(5) All things being equal, performance tends to allow persistence of the situation closer to primary aversive events.

The distinction between discriminative and motivating functions

Asymmetry between positive and negative reinforcement – in the former, behavior occurs in the absence of the reinforcer, but in the latter, it occurs in its presence, with competing responses (e.g. freezing).

In a positively reinforced behavior chain, responses in the presence of an S delta are distinct from the reinforcing situation, but with negative reinforcement, the discriminative and motivational stimuli/aspects are present when the behavior occurs and are conflated.

From Morris, Higgins & Bickel (1982)

Just as the power of a microscope must be adjusted as a function of the phenomenon under study, so does the level of behavior analysis need to be adjusted to the functional unit of behavior-environment interaction.

To be specific, when order is not apparent at a molar level, a more molecular analysis may be necessary … conversely, if one fails to find an immediate stimulus that controls a response, perhaps the response is only an element of a larger functional unit which is controlled by currently operating variables not immediately attendant to that element (pp. 119-120).

The prevailing opinion (“what the textbook says”) was that “events close in time maintain behavior,” but Dr. Hineline did not agree. Dr. Neuman showed us a video of a horse trainer demonstrating how pressure and release (negative reinforcement) works on horses. He asked us what was missing from the explanation. The answer was that seeing the final result (or a brief segment of training) does not tell us much about the learning process. In order to evaluate it, we need to see the big picture.

Field and Boren: A more promising analogue

The Experiment:

They set up conditions where a shock is given every 5 seconds unless the rat performs a specific behavior. Each time the rat responds, the shock is further delayed by 10 seconds, so depending upon when the rat responds, it can either avoid or postpone the shock.

The red rectangle indicates the presence of warning signals. In this case, the warning signal was given multiple times.

What does the rat do? Well, it depends upon what kind of warning signals are presented. They tested under four different conditions. They were:

  • no information that shock is coming (except passage of time)
  • lights used as warning signals before the shock occurs
  • clicks used as warning signals before the shock occurs
  • both lights and clicks used as warning signals before the shock occurs

From the journal article:

“The present study investigates the effects of multiple warning stimuli programmed to indicate, at each step, the time remaining before the onset of shock. It was anticipated that the stimuli would permit an analysis of the avoidance behavior in terms of the temporal proximity of the shock.”

The following diagram shows their results. It is a bit hard to read (Dr. Neuman actually said this…).

These graphs make more sense.

Each condition is shown on a separate graph. The interesting part is that the more warning signals the rat receives, the longer he will delay responding to avoid the shock. This can be seen by comparing the “height” of the lines. A line that stays closer to the x-axis means the rat waited longer (and got closer to the time of the next shock) before responding.

Dr. Neuman compared this to backing up your car and listening to the audible warning sounds (ding, ding, ding) that tell you when an object is behind it and provide information about how close you can get without hitting anything. This is another experiment that raises questions about whether warning signals are conditioned aversives. If the lights and clicks were conditioned aversives, the rats should have responded to them more quickly so they could avoid more occurrences of them.

What did they learn? If an aversive event is preceded by one or more warning stimuli, the rats will wait until closer to the aversive event before responding.
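
A back-of-the-envelope way to see why more warning signals let the rat wait longer (my own sketch, with invented numbers): each added signal marks off the remaining time more precisely, so less safety margin is needed.

```python
# More fine-grained warnings -> later safe responding (my illustration).

SHOCK_AT = 5.0  # seconds until the scheduled shock

def latest_safe_response(warning_times, reaction_time=1.0):
    """Latest moment the rat can respond and still avoid the shock.

    `warning_times` are the onsets (in seconds) of the warning signals.
    With no warnings the rat must time the interval itself and leave a
    large margin for error; each warning lets it cut things closer.
    """
    if not warning_times:
        return SHOCK_AT - 3 * reaction_time   # big self-timed safety margin
    usable = [t for t in warning_times if SHOCK_AT - t >= reaction_time]
    return max(usable)   # respond on the last warning that leaves time

print(latest_safe_response([]))               # 2.0 -- must respond early
print(latest_safe_response([3.0]))            # 3.0 -- one warning helps
print(latest_safe_response([2.0, 3.0, 4.0]))  # 4.0 -- waits until close
```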

Summary

If these experiments were more widely studied, negative reinforcement would be taught differently. They clearly show that warning stimuli do not become conditioned aversives. Instead they are discriminative stimuli that provide information so the animal can avoid aversive events.

He went on to say that this goes against the prevailing (and often taught) view that negative reinforcement is bad. Yes – he did actually say “Negative reinforcement doesn’t have to be bad.” There are types of negative reinforcement that can be benign.

How do we identify them? We can only do it from our own perspective, but if we can observe the training or teaching process without being uncomfortable, then there’s a strong probability that the use of negative reinforcement is of the benign type.


Katie’s notes:

It’s going to take me a while to digest this one. Right now I have more questions than answers. For me, the main points were:

  • When an animal is regularly exposed to an aversive event, it can find warning stimuli in the environment, even if trainers do not supply them.
  • Warning stimuli function as discriminative stimuli, not as conditioned aversives – they are information.
  • If provided with warning stimuli, animals will delay responding until closer to the aversive event – they procrastinate.
  • How long the animal will delay responding is affected by the work requirements to avoid the aversive event and how much time he has to respond.
  • An animal’s response is based on his experience over time (past learning history) and the contingencies that are present. You have to look at the big picture.
  • Be careful about assuming that what happens in the lab will happen in real life. Dr. Neuman and Dr. Layng both said this at some point in the following Q & A session. There’s danger in extrapolating too much.
  • Negative reinforcement can be benign. I’m still thinking about how we evaluate if the animal is experiencing “benign” negative reinforcement because if we rely only on the judgement of the observer, then we are assuming that everyone is educated in reading body language and would pick up on any signs that the animal is uncomfortable with the situation. I’m not sure that’s a safe assumption…not only because people vary in their observational skills but also because animals vary in how much they reveal.

There were a lot of discussions about this talk over the weekend. One of the things that came up several times was how this relates to the associations that many animals make with equipment or objects in their environment. Animals do learn to avoid objects such as collars, leashes, and halters that have been associated with aversive events. Someone in the audience asked if Dr. Neuman was saying that these are not conditioned aversives as many of us have been taught.

This led to some general discussion among the speakers, but they all agreed that if the animal has time to respond after the warning stimulus is presented, thus avoiding the aversive event, then the warning stimulus would be discriminative, not aversive. However, if the two events occurred quickly (as in pairing), then Pavlovian conditioning would occur and the warning stimulus could become a conditioned aversive. This is assuming that the warning signal is a neutral stimulus.

In most training situations, we do give the animal time to respond and we are looking for an operant response, so it’s less likely that you would be doing a direct pairing procedure. I suspect that it’s not quite as simple as that, but the impression I got was that if you give the animal time to respond and he can avoid the aversive event, then the warning stimulus should not become a conditioned aversive.

Because I’m a horse person, I found myself considering how this talk relates to traditional pressure and release training, which is based on negative reinforcement. Among the people I meet, there are always endless discussions about whether negative reinforcement requires the use of an aversive and if such aversives suppress or punish behavior, or just provide information.

I’m not sure I want to get into that can of worms here, without more time to think, but I do want to point out that this entire presentation was on using negative reinforcement to create an avoidance response, not an escape response.

What do I mean by avoidance vs. escape? These are the two types of responses that animals learn through negative reinforcement. Typically an animal learns to escape first – he experiences the aversive stimulus and figures out how to get away from it. Then, if it happens repeatedly, he learns how to avoid it by doing a behavior that prevents the aversive event from occurring.

For example, horses learn to AVOID electric fences through negative reinforcement. They touch the fence, get shocked, and then learn how to avoid getting shocked. The behavior of moving away (when they get close to the fence) increases and they are able to avoid the aversive event in the future. They don’t usually keep touching the fence before moving away. Well, there’s always one, but…

However, if we are thinking about aids and cues that are taught through pressure and release, it’s different. In many cases, what we want is for the horse to learn to ESCAPE from the aversive event. If I am riding with negative reinforcement, I may teach my horse to move off from my leg cue, but I probably don’t want him to avoid the leg cue. Instead I want him to immediately move off to ESCAPE from the leg pressure. Over time, I want him to respond more promptly and to less and less intense stimuli.
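
To put the distinction in one place, here is a tiny sketch (my own framing) of how the same response gets labeled: escape if the aversive is already present, avoidance if it only prevents a scheduled one.

```python
# Escape vs. avoidance: the label depends on whether the aversive is
# already on when the response occurs (my own framing).

def classify(aversive_on: bool) -> str:
    if aversive_on:
        return "escape: the response terminates an aversive already present"
    return "avoidance: the response prevents a scheduled aversive"

# Leg pressure while riding: the pressure is on, and moving off ends it.
print(classify(aversive_on=True))
# Electric fence: the horse moves away before touching it again.
print(classify(aversive_on=False))
```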

Of course, there are some situations where avoidance is acceptable to me, depending upon what types of cues I want to use. I teach my horses to back up off a hand cue on the shoulder. If the horse decides to back up when I point at the shoulder (before I touch him), that’s fine with me. My hand gesture now becomes the new cue.

It’s possible that bringing up escape vs. avoidance muddies the waters a bit, but if you do use negative reinforcement in your training, I think it’s worth taking some time to think about whether you are looking for escape or avoidance and what kind of information you are providing as warning stimuli.

This talk has certainly made me more aware of the nuances of negative reinforcement and the role of warning stimuli, and I will be paying more attention to instances when I observe individuals (probably mostly me!) delaying or procrastinating.


I’ve been blogging about the ASAT conference for several years now. You can find a list of them in the articles page on this blog, but some older ones are on my website, www.equineclickertraining.com.

If you are interested in learning more about how to clicker train your horse, check out my book, Teaching Horses with Positive Reinforcement, available on Amazon.



9 replies

  1. Thank you for making this available for those unable to attend. The depth of the material is very satisfying to read.


  2. I do find it a bit depressing that they deliberately designed studies that use electric shocks on rats and monkeys. The resulting research is interesting, though, and thanks for posting it.


    • Hi Janet, I share your sentiments about the use of shock. These experiments were done a long time ago and ideas about the use of animals in research were different back then. Luckily things have changed, although I’m not sure exactly what the current rules are. My thought is that we might as well learn as much as we can from this work, since it has already been done.


  3. I really enjoyed this and will bookmark it to read again (I am interested in dog training, but the same principles apply). I think it’s fascinating that he found a way to demonstrate that conditioned stimuli can just be accepted as useful information by the animal (rather than becoming emotionally aversive by association).

    However, I don’t understand how the given examples are negative reinforcement – a horse learning to avoid the application of pressure, or a rat learning to avoid the application of shock, looks fundamentally more like (conditioned) positive punishment to my dog trainer eyes?


    • Hi Rachel, I’m glad you found the article useful, or at least thought provoking. Both those examples (pressure and release with a horse and shock) do also involve positive punishment. This is not uncommon when using negative reinforcement because there are cases (usually in training) where in order to use the removal of something as a reinforcer, you have to add it first. This is not always true as there are naturally occurring aversives, but it’s not uncommon for P+ and R- to occur together. It’s a good example of how we are not always training in only one quadrant.

      For me, the best way to think about which quadrant I am using is to clearly define the behavior, the consequence, and the effect it has. With the rats, the researchers were testing to see if rats would press a lever to avoid shock – or in other words, would the lever pressing behavior increase if doing so meant that an aversive (the next shock) was removed. If I wrote out what happened, it would look something like this:

      rat in box -> shock -> rat in box -> shock -> rat presses lever -> rat presses lever -> rat presses lever

      The rats were pressing the lever so the shock would “go away” or not occur instead of occurring when it would have normally happened (every 10 seconds in this case).

      In this example, it’s a little hard to argue that the shock was punishing a specific behavior because I don’t know if the rats were doing anything specific when the shock occurred. But in some cases, one can argue that one behavior is punished while another behavior increases. Which quadrant you focus on depends upon which behavior you are measuring. With horses we use pressure and release to get movement. If I touch my horse with my leg, she goes forward. I remove my leg and that reinforces the behavior of going forward. One could argue that putting my leg on punishes standing still because that behavior decreases. However, since my goal is to increase forward movement, I would say that I am using negative reinforcement to teach the behavior.

      So you have a valid point that when using negative reinforcement, you have to consider that some behaviors may also be punished. It also brings up the question of how a behavior can be reinforced by the “non-occurrence” of an event. Dr. Neuman mentioned this as one of the problems with the current understanding of negative reinforcement. If the aversive is no longer occurring, can its removal be considered a consequence? I think it can, at least for a time – until you are reminded that the consequence is still out there and you go back to avoiding it. This is why behaviors can be maintained through negative reinforcement. You keep doing the behavior, even in the absence of punishment, because you think you are avoiding it.


      • Thank you for your reply!

        “In this example, it’s a little hard to argue that the shock was punishing a specific behavior because I don’t know if the rats were doing anything specific when the shock occurred.”

        What you said here is key for my understanding, thanks. I know we can only define punishment (and reinforcement) by its effect on the animal’s behaviour, and because the shock presumably did not reduce any particular behavior the rats were doing (although we’d need to check for superstitious learning before we were sure about that), I think you are correct to say it might not be P+ even though an aversive was deliberately applied. Interesting.

        If rats pressing the lever had terminated a shock (rather than avoided it entirely), then I can clearly see how we could call lever pressing “negatively reinforced”. If they taught the rats this way initially, then let them discover that they could also prevent the shock (not just terminate it) then I would still feel pretty comfortable calling the current situation conditioned negative reinforcement.

        But if the current situation was what the rats learned from the start (lever pressing has always just prevented a “bad thing” from happening), it feels a little bit harder to categorise.

        Lots to think about, thanks again.


      • I’d have to find the original paper, but I’m pretty sure the rats were shocked multiple times before they figured out they could press the lever to avoid it. As you said, if they never experienced the shock, it would not be negative reinforcement – the aversive had to have been there at some point.


