Notes from the Art and Science of Animal Training Conference (ORCA): Dr. Jesús Rosales-Ruiz on “Conditioned Reinforcers are Worth Maintaining.”

In this short presentation, Jesús Rosales-Ruiz revisited the question:

“Do I have to treat every time I click?”

He said that this question constantly comes up and that different trainers have different answers.

Before I share the details of his presentation, I want to mention that he said he chose to use the words “click” and “treat” because he was trying to avoid using too much scientific jargon.    But, as he pointed out at the end of his talk, it would be more accurate to say “click and reinforce,” and probably even more accurate to say “mark and reinforce.”

Since he used “click and treat,” I’m using the same words in these notes, but you should remember that he is really looking at the larger question of how we use conditioned reinforcers and whether or not they always need to be followed by a primary reinforcer in order to maintain their effectiveness.

Back to the question…

Do you have to treat after every click?

Some say YES:

  • Otherwise the effectiveness of click may be weakened
  • Bob Bailey says: “NEVER sound the bridging stimulus idly (just to be ‘fiddling’) or teasing…it’s important that the “meaning” of the bridging stimulus is kept unambiguous and clear. It should ALWAYS signify the same event- The primary reinforcer.”  How To Train a Chicken (1997) Marian Breland Bailey, PhD and Robert E Bailey.
  • This view is supported by research that shows that the conditioned reinforcer should be a reliable predictor of the unconditioned reinforcer.

Some say NO:

  • Once a click is charged, you only have to treat occasionally
  • Once a behavior is learned, you only have to treat occasionally
  • Supported by research on extinction (in general, this means that if an animal learns that not every correct answer is reinforced, then it will keep offering the correct answer for some period of time, even if there’s no reinforcement).

So maybe there is some research for both.

He said that he started thinking about this question again after reading a blog by Patricia McConnell, who was sharing some thoughts on whether or not to treat after every click. She was wondering why clicker trainers recommend it, but other positive reinforcement trainers do not.

Patricia McConnell wrote:

  • “For many years I have wondered why standard clicker training always follows a click with a treat.”
  • “Karen Pryor strongly advocates for us to reinforce every click (secondary reinforcer) with a treat (primary reinforcer). Ken Ramirez, one of the best animal trainers in the world, in my opinion, always follows a click with a treat.”
  • “But Gadbois went farther, given the link between motivation and anticipation, suggesting that it was important to balance the “seeking” and “liking” systems, with more emphasis on the former than the latter during training. He strongly advocates for not following every click (which creates anticipation) with a treat, far from it, for the reasons described above.”

You can read the blog at: http://www.patriciamcconnell.com/theotherendoftheleash/click-and-always-treat-or-not.

If you have not heard of Simon Gadbois, you can read about him here: https://www.dal.ca/academics/programs/undergraduate/psychology/a_day_in_the_life/professors/simon-gadbois.html.

What happens if you don’t treat after every click?

Jesús was intrigued by Gadbois’s statement that you don’t want or need to treat after every click because you want to balance “liking” with “seeking,” and that if you don’t treat after every click, you get more seeking.

One reason for his interest was that he already knew of an experiment that had been done to look at what happens if you don’t follow every click with a treat.  About 10 years ago, one of his students wanted to compare how the behavior of a dog trained under conditions where one click = one treat was different than a dog that was trained with multiple clicks before the treat.   The two conditions looked like this:

  • one click = one treat:  The trainer clicked and treated as normal after every correct response:  cue -> behavior -> click -> treat -> cue -> behavior ->click -> treat.
  • two clicks = one treat:  The trainer clicked for a correct response, cued another behavior and clicked and treated after the correct response: cue -> behavior -> click -> cue -> behavior -> click -> treat.
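
To make the two sequences concrete, here is a minimal Python sketch that builds each one and counts the treats. The function name `build_session` and its parameters are hypothetical labels chosen for illustration; they are not terms from the talk or the experiment.

```python
def build_session(n_behaviors, clicks_per_treat):
    """Return the event sequence for a session (illustrative model only).

    clicks_per_treat=1 models one click = one treat;
    clicks_per_treat=2 models two clicks = one treat.
    """
    events = []
    clicks_since_treat = 0
    for _ in range(n_behaviors):
        events.extend(["cue", "behavior", "click"])
        clicks_since_treat += 1
        # Deliver the treat only once enough clicks have accumulated.
        if clicks_since_treat == clicks_per_treat:
            events.append("treat")
            clicks_since_treat = 0
    return events


one_to_one = build_session(4, clicks_per_treat=1)
two_to_one = build_session(4, clicks_per_treat=2)

print(" -> ".join(one_to_one[:8]))
print(" -> ".join(two_to_one[:7]))
print(one_to_one.count("treat"), two_to_one.count("treat"))
```

For the same four behaviors, the first condition delivers four treats and the second delivers two, which is the difference in food that comes up again below.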

These dogs were tested by asking for previously trained behaviors. Each dog was trained under both conditions so some training sessions were under one click = one treat and some were done under two clicks = one treat.  There were multiple reversals so the dogs went back and forth between the two conditions several times over the course of the experiment.

Under the one click = one treat condition, the dogs continued to perform as they had in training sessions prior to the start of the experiment. Under the two clicks = one treat condition, both dogs showed frustration behaviors and deterioration in behavior, and at times a dog would leave the session.

There were many factors that could have contributed to the result: the dogs were originally trained under one click = one treat, the reversals themselves could have caused confusion, and the dogs might have done better if they had been transitioned more gradually.  But it was pretty clear that omitting the treat did not activate the seeking system. Instead, it created frustration. Why?

They considered two possibilities:

  • Perhaps because they were getting less food? Under the one click = one treat condition, each dog was getting twice as much food reinforcement as the dog training under the two clicks = one treat condition.
  • Properties of the click had changed.  What does the click mean to the dog?

Can we test if it’s about the decrease in food reinforcers?

If you want to test what happens when you click without treating, you have to change the ratio of clicks to treats. You can do that by omitting some treats, or by adding some clicks. But both options are probably not going to be perceived in the same way by the animal.

In the experiment described above, the trainer changed the ratio of clicks to treats by omitting food reinforcers after half the clicks. This is a significant decrease in the number of primary reinforcers that the dog was receiving. Could the results be more about the reduction in food reinforcers, than about whether or not each click was followed by a treat?

One way to test this would be to keep the number of food reinforcers the same, but add another click.  To do this, the trainer taught the dog to do two behaviors for one click.  The dog would touch two objects. When he touched the second object, he would get clicked and treated.

Once this behavior had been learned, the trainer decided to add another click by clicking for the first object, clicking for the second object and then treating. So the pattern would be behavior (touch) -> click -> behavior (touch) -> click -> treat. This works out to treating after every second click, the same ratio as in the first experiment, but the trainer got there by adding a click, not by removing a treat.
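
The arithmetic behind the two manipulations can be laid out in a short Python sketch. The per-cycle counts below are an illustrative reading of these notes (one cycle = two behaviors), not data from the study, and the `summarize` helper is a hypothetical name.

```python
from fractions import Fraction

BEHAVIORS_PER_CYCLE = 2


def summarize(clicks, treats, behaviors=BEHAVIORS_PER_CYCLE):
    """Return (clicks per treat, treats per behavior) for one cycle."""
    return Fraction(clicks, treats), Fraction(treats, behaviors)


# (clicks, treats) per two behaviors; assumed from the descriptions above.
PROTOCOLS = {
    "first experiment, baseline (click and treat every behavior)": (2, 2),
    "first experiment, treat omitted after the first click": (2, 1),
    "second experiment, baseline (one click after two touches)": (1, 1),
    "second experiment, click added after the first touch": (2, 1),
}

for name, (clicks, treats) in PROTOCOLS.items():
    clicks_per_treat, treats_per_behavior = summarize(clicks, treats)
    print(f"{name}: {clicks_per_treat} clicks per treat, "
          f"{treats_per_behavior} treats per behavior")
```

Both manipulations end at two clicks per treat, but only the first one cut the food relative to its own baseline (from one treat per behavior to one half). In the second, the food stayed at one treat per two behaviors and only the number of clicks changed.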

What she found was that the dog just got confused.  The dog would orient to the trainer on the first click, get no response, go back to the objects and touch again (either one).  Or he might just wait and look at the trainer, or he might leave. The additional click didn’t seem to promote seeking. Instead it interrupted the behavior and created confusion.

Why?  Well, perhaps it has to do with the two functions of conditioned reinforcers. This goes along with the second possibility above, which is that the difference was due to how the click was being used.

The 2 Functions of Conditioned Reinforcers:

Let’s take a moment and look more closely at conditioned reinforcers.  Conditioned reinforcers are stimuli that become reinforcers through association with other reinforcers.  They usually have no inherent value. Instead, their value comes from being closely associated with another strong reinforcer for a period of time (while it is being “conditioned”), and this association must be maintained through regular pairings in order for the conditioned reinforcer to retain its value.

In training, this is usually done by deliberately pairing the new stimulus with a primary reinforcer.  There are different kinds of conditioned reinforcers and their meaning and value will depend upon how they were conditioned and how they are used.  Marker signals (the click), cues, and keep going signals (KGS) are all examples of conditioned reinforcers.

Regardless of the type, all conditioned reinforcers have two functions. They are:

  • Reinforcing
  • Discriminating (they can function either as cues or event markers, or both)

Conditioned reinforcers are not just used in training and laboratory experiments.  They are everywhere.

Jesús used the example of a sign, which is a conditioned reinforcer for someone driving to a specific destination.  Let’s say you are driving to Boston and you see a sign that says “Boston, 132 miles.” The sign provides reinforcement because it tells you that you are going the right way. It also has a discriminatory function because it provides information about what to do next, telling you to stay on this road to get to Boston.

When talking about conditioned reinforcers, it’s easy to focus on only one of these functions.  Is this why there is confusion?  Perhaps the debate over whether or not to treat after every click is because some trainers are focused on the discriminating function of the click and others are focused on the reinforcing function of the click?

What does training look like if the focus is on the discriminating function?

When every click is followed by a treat, the click has a very specific discriminating function. It tells the animal it has met criteria and reinforcement is coming.  The trainer can choose what the animal does upon hearing the click (stop, go to a food station, orient to the trainer), so the trainer has to decide what behavior she wants the animal to do upon hearing the click. But, regardless of which she chooses, the click functions to cue another behavior which is the start of the reinforcement process.

A lot of one click = one treat trainers emphasize the importance of the click as a communication tool.  There are two aspects to this. One is that it marks the behavior they want to reinforce and the other is that it tells the animal to end the behavior and get reinforcement. If the click is always followed by a treat, the meaning of the click remains clear and it provides clear and consistent information to the animal.

You can think of the click -> treat as part of a behavior chain, where the click has both a reinforcing function, from the association (click = treat), and also an operant function (click = do this).  Clicker trainers who promote the one click = one treat protocol still recognize that the click itself has value as a reinforcer, but they choose to focus on the click as an event marker and as a cue, more than as a reinforcer.

What does training look like if the focus is on the reinforcing function?

A lot of trainers who treat intermittently (not after every click) emphasize that the click is a reinforcer in itself, so it’s not necessary to also provide a treat after every click. They are looking at the reinforcing function of a conditioned reinforcer and would argue that the whole point of having a conditioned reinforcer is so that you don’t have to follow it with another reinforcer every time.

They are still using the discriminating function of the click because it can be used to mark behavior.  But, the click does not become an accurate predictor of the start of the reinforcement phase, so it is not going to have the same cue function as it does under the one click = one treat condition.

Jesús did mention that if the click is not a reliable cue for the start of the reinforcement process, then the animal will look for a more reliable way to tell when it will be reinforced. In most cases, the animal finds a new “cue” that tells it when to expect reinforcement and the click functions as a Keep Going Signal. If the animal can’t find a reliable cue for the start of reinforcement, or if it’s not clear when the conditioned reinforcer will be followed by reinforcement and when it won’t, then it will get frustrated.

Back to the Literature…

With this information in mind, what can we learn by going back and looking at the research on conditioned reinforcers?  Well, it turns out that the literature is incomplete for several different reasons:

  • It doesn’t look at the cue function of the conditioned reinforcer.
  • Animals in the lab are often restrained or constrained (limited in their options) so the cue function of the conditioned reinforcer may be more difficult to observe.
  • It doesn’t take into account that the most consistent predictor of food is the sound of the food magazine as it delivers the reinforcement.   Even when testing other conditioned reinforcers, the sound of the food magazine is what predicts the delivery of the food reinforcement, and it’s on a one “sound” = one “treat” schedule.
  • To test a conditioned reinforcer that is sometimes followed by food and sometimes not, you would have to use two feeders, one with food and one without, and even then you would have to worry about vibrations. Most labs are not set up with two feeders, so this work has not really been done.

 
He also mentioned that a lot of what we know about conditioned reinforcers in the lab is from research where the conditioned reinforcer was used as a Keep Going Signal (KGS), and not as a marker or terminal bridge.

I asked Jesús if he had an example of an experiment using a conditioned reinforcer as a KGS and he sent me an article about a study that looked at the effect of conditioned reinforcers on button pushing in a chimpanzee.

The chimpanzee could work under two different conditions. In one condition, he had to push the button 4,000 times (yikes!) and after the 4,000th push, a light over the hopper would flash and his food reinforcement would be delivered. In the other condition, he also had to push the button 4,000 times, but a light would flash over the hopper after every 400 pushes, and then again at the end when the food was delivered after the 4,000th push.
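
As a rough sketch of the two schedules, the Python below lists the push counts at which the light would flash in each condition. The function name `flash_points` and its parameters are hypothetical, and the sketch assumes the final flash in the second condition is the one at the 4,000th push.

```python
def flash_points(total_pushes=4000, flash_every=None):
    """Push counts at which the light over the hopper flashes.

    flash_every=None models the condition with a single flash at the end;
    flash_every=400 models a flash after every 400 pushes.
    Food is delivered only at total_pushes in both conditions.
    """
    if flash_every is None:
        return [total_pushes]
    return list(range(flash_every, total_pushes + 1, flash_every))


end_only = flash_points()
every_400 = flash_points(flash_every=400)

print(len(end_only), len(every_400))
print(every_400[0], every_400[-1])
```

With these numbers, the light flashes ten times in the second condition, and only the tenth flash coincides with food.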

The chimpanzee was tested under both conditions for 31 days, and the results showed that when he was reinforced by the flashing light every 400 pushes, he worked faster and with fewer pauses on the way to the 4,000th push.

Once the chimpanzee had been tested under both conditions for 31 days, they started the second part of the experiment.  In this part, the chimpanzee could choose the condition (by pressing another button) and he usually chose the one where the light flashed after every 400 pushes.

So, having a Keep Going Signal improved the speed at which the chimpanzee completed the 4,000 pushes and was also the condition preferred by the chimpanzee.  This suggests that Keep Going Signals can be useful and an animal may prefer to get some kind of feedback.

In this experiment, the conditioned reinforcer they were testing (the flashing light) was functioning as a KGS and the sound of the food magazine was what told the chimpanzee that he had met criteria.  So, this is an interesting experiment about conditioned reinforcers as Keep Going Signals, but it also shows the difficulty of separating out the conditioned reinforcer from the stimulus that predicts food delivery.

An example of training a KGS with a dog

Jesús talked a little bit more about Keep Going Signals, using an example from one of his own students. She wanted to teach her dog a new conditioned reinforcer that she could use as a KGS. She started by teaching the dog to touch an object for a click and treat. Once the dog had learned the behavior, she said “bien” (her new KGS) instead of clicking, and waited for the dog to touch the object again. If the dog repeated the touch, then she would click and treat.

She was able to use the KGS to ask the dog to continue touching an object and I think she tested it on other objects. You do have to train a KGS with multiple behaviors in order for it to become a KGS, as opposed to a cue for a specific behavior. I don’t know if she tested it with other behaviors, but that would be the next step. I’m also not sure if they compared the dog’s performance, with and without the KGS, to see if adding a KGS increased the dog’s seeking behavior, as Gadbois had suggested it would.

Conclusion

The difficulty with the question “Do I have to treat after every click?” is that the answer depends upon how you are using the click and whether or not it cues the animal to “end the behavior” and expect reinforcement. Conditioned reinforcers have two functions. They function as reinforcers and as discriminators, and you need to consider these functions when choosing how to use the click.

If you are using the click as a Keep Going Signal, the animal learns to continue after the click and the click does not interrupt  the behavior.  This means you can click multiple times before delivering the terminal reinforcer. However, it’s likely that you will end up having a different cue that tells the animal when it has completed the behavior and can expect reinforcement. If you don’t, the animal may become confused about what it should do when it hears the click.

If you are using the click to indicate when the behavior is complete, the animal learns that the click is a cue to start the reinforcement process.  You can teach the animal a specific response to the click so that the animal knows what to do to get his reinforcement. If the click is being used in this way, then it will interrupt the behavior and you will want to wait until the behavior is complete before clicking.

We call both these types, the click as a KGS and the click as an “end of behavior” cue, conditioned reinforcers, but they are not the same thing. There are many kinds of conditioned reinforcers, and when you are not specific, it’s easy to think you are talking about the same kind, but you are not.  So both “camps” may be right, but for the wrong reasons.

Jesús finished by saying we need to study this more carefully in the laboratory and also in real life training situations.  One point he made was that an animal that initially learned one click = one treat could probably be retrained to understand that the click was a KGS if the transition was done more slowly than it was for the dogs in his student’s experiment. But he still thinks it would change the meaning of the click from an “end of behavior” cue to a “keep going signal.”

I thought this was a very interesting talk, partly because it shows how important it is to clearly decide how you are going to use conditioned reinforcers and to make sure that you teach your animal what it means. I don’t think it was intended to be the final word on a complicated subject, but the presentation certainly made me more aware of the importance of thinking about the many functions of conditioned reinforcers and how I am using them.

But… I’m not sure it left us with an answer to the question of what happens when the same conditioned reinforcer is used as both a KGS and to end the behavior, which is how many trainers describe their practice of clicking multiple times before delivering the terminal reinforcer. There needs to be research done on what happens if it is used as both.

A few personal thoughts

This presentation was informative, and made me feel more confident about the system I use, but it also left me with some unanswered questions.

I have always followed a click with a treat. It is how I originally learned to clicker train and it has worked well for me. If I want to use a different reinforcer, I have a different marker. If I want to provide information or reinforcement to the horse without interrupting the behavior, I have several other conditioned reinforcers I can use.

It’s never made sense to me to have the same conditioned reinforcer sometimes be a cue to “end the behavior” and sometimes be a cue to “keep going.” I question if that’s even really possible, unless the animal learns it has different meanings under different conditions, and that seems a bit awkward. It just seems simpler to have clearly defined conditioned reinforcers and use them in a consistent manner.

I was intrigued by the research into Keep Going Signals. I do use Keep Going Signals and have found them to be useful. But I have also found that I have to pay attention to maintaining them in such a way that they retain their value (through pairing with other reinforcers), but don’t become reliable predictors of reinforcement and morph into “end of behavior” cues. I’d love to see more research on how to effectively maintain Keep Going Signals, as well as some research on how effective they are at marking behavior.

10 thoughts on “Notes from the Art and Science of Animal Training Conference (ORCA): Dr. Jesús Rosales-Ruiz on “Conditioned Reinforcers are Worth Maintaining.”

  1. Really interesting, and it underlines the importance of being clear with cues, clicks, criteria and not to be sloppy! I have always been 1 click= 1 treat, but I think I might try training a distinct keep going cue.

    • Thanks for your comment Abigail. As I noted in the article, I do use a keep going signal. I find it very useful for riding where I am often doing behaviors in sequence, but they are not fixed chains so there are some variations. I’m not convinced it’s as effective at marking a specific moment as a click would be, but I do think it provides her with some confirmation that she’s correct if she already has a history of being clicked for a behavior.

  2. In the 3rd to the last line, do you mean retain rather than retrain? And I wonder about the effect of using an n of 1 because it may depend on which function of the click was trained first.

    • Hi Skye,

      Yes, thanks for catching that. It should be “retain.” I’ve corrected it.

      I agree that it probably does matter how the click was initially taught. They did the experiment with two dogs, but that’s still not a lot. My understanding is that it’s very common in psychology to do experiments with very small sample sizes as they are interested in the behavior of individuals. But there are so many variables…

  3. Nice post! Reading this, I’m thinking also that SEEKING will not be aroused in predictable situations. If it was a fixed ratio of 2-behaviours-per-click, that wasn’t unpredictable, so we wouldn’t expect that increase in the dopamine response (which I’m guessing the proponents of the “blazing” clicker would argue). I touched on the importance of unpredictability in this blog post: http://illis.se/en/seeking/

    • Hi Karolina,

      Thanks. It’s such an interesting topic. I’d love to see them do more research on it, using some protocols that trainers claim to use effectively. I agree that SEEKING and predictability are probably related and using a fixed ratio schedule was not necessarily the best choice to test for SEEKING. But it did provide some information about how the dogs reacted to a change in the meaning of the click. Thanks for the link to your blog, which I had already read and enjoyed, but will now go and read again.

  4. [Quoted from the post: the sections “What does training look like if the focus is on the discriminating function?” and “What does training look like if the focus is on the reinforcing function?”]

    As a “Click not always cookie” trainer with some time invested in thinking about Affect and the Seeking system, I think you have these two concepts backwards.

    The Proper clicker trainer (Click=cookie) is focused on reinforcement. Completely on reinforcement. The click must maintain its tight bridge to the cookie or the marker will fail to predict reinforcement.

    The marker trainer who sometimes marks with no cookie is focused on discrimination of behavior. Completely on discrimination of behavior. The click need not maintain the tight bridge to the physical cookie, but it certainly needs to maintain its relationship to the target behavior.

    I believe this does activate the seeking system and alters the effect and prevalence of reward prediction error. I find these two things to be extremely important to my dog training. I think it makes voracious, flexible learners who start to work for Next instead of discrete cookies.

    I use markers to tell the dog the moment they’re correct. I’m probably 85% purposeful cookie. 15% no cookie or a sloppy next. They like being correct because it leads to a cookie. Once they know being correct leads to a cookie, I start to offer “Next” as that cookie. Next is what the seeking system is looking for to fire.

    If you take a voracious learner, someone who is bad ass at their job, and they have their physical needs met, they can and will work for little external reinforcement. I find that discrete cookies often get in the way of chains, flow, and skill mechanics.

    Loved your article. Thanks for sharing.

    Peace~

    • Hi Ron,

      You might have to take up the issue of “having it backward” with Jesus. This is his theory and in this article, I was just reporting on the information he presented. He didn’t say what led him to his theory of how each “camp” views the click, but I assume it is based on observation and discussions with trainers of both types.

      Having said that, I wouldn’t have shared the content of his talk if I didn’t think he was making some valid points. I don’t meet that many people who train by using multiple clicks before treating, but the ones I have met did say they could do it because the click functioned as a reinforcer. So, either the people I encountered had an unusual view of what they were doing, or there are different ideas about how the click works, even among the people who use multiple clicks. I’m betting on the latter…

      The reality is that we are probably trying to take something kind of messy and put it into neat little boxes. Now it’s a reinforcer, now it’s a cue, now it’s …and life is usually not like that. My guess would be that it could be functioning in slightly different ways at different times, depending upon the behavior, the animal, what the animal needs on a given day, etc…

      I am sure there are times when I click and the click is of value to the horse because it predicts reinforcement is coming, and I’m sure there are other times when the click has more value as a discriminator because it clarifies exactly what behavior I want. One of my horses gets anxious when I am not very clear, and she is clearly working for the click because she wants to know what to do; the food is secondary.

      I think the point Jesus was making was that we need to think about the functionality of the click and what it means to the animal. That makes it easier to be consistent so that we can use the click effectively and it maintains its value. I think a lot of people are unaware that conditioned reinforcers have multiple functions, so this is a good topic for discussion, even if the conclusions he drew about how different trainers use the click might not apply to every trainer out there.

      I’m sure he would love to hear from you if you want to share your thoughts. He’s very interested in this subject. If you email me (kabart315@gmail.com), I can forward your message to him.

  5. Great post.
    For me, it throws up the question of what is the difference between a ‘Keep Going Signal’ and re-cuing the animal. When I’m training, if I have to re-cue, it means the dog has not understood the final task, so I break it down and successfully rebuild.

    • Hi Tony,

      I am hoping Jesus will do some research looking at Keep Going Signals. I have seen them used in different ways and they certainly can become a crutch and a sign that the behavior needs to be rebuilt. But, I do think they are sometimes useful, especially for behaviors that might end up with some natural variations, just because of how they are used. That usually means behaviors that are not clearly defined in all dimensions. So, for example, I can teach my horse to stand for 30 seconds or I can teach my horse to stand until I cue another behavior. The first behavior is going to be easier to maintain without a KGS because it’s the same every time. The second might be more difficult because if I find myself in a situation where I have the horse stand for 20 seconds a few times in a row, the horse will start to have an expectation. I could certainly go back and do a review in a training session to make sure I include some longer duration stands, but what if I need a longer duration one right after I do a few shorter ones? In a situation like that, I find it’s helpful (I’m not saying it’s necessary) to have a KGS.

      As for the difference between a KGS and re-cueing, I think it depends upon if you want more of the same behavior or if you are using a KGS after one element in a chain. The definition for a KGS seems to vary, depending upon who you ask, but I use the term to mean any cue or conditioned reinforcer I use that tells the animal it is doing well and to continue. So I use it both when working on getting more of a behavior or when putting several behaviors together. I honestly have no idea if that is the “norm” or not, it’s just what works for me.

      Thanks for your comment, it’s always interesting to see what other people do.
