equine clicker training

using precision and positive reinforcement to teach horses and people

ASAT Conference 2019: Dr. Alliston Reid – From Behavior Chains to Behavioral Skills: Animals learn more than previously expected.

Dr. Alliston Reid is a professor of psychology at Wofford College. His specialty is comparative cognition – understanding the basic mechanisms of learning and memory across species by studying the response-selection mechanism. He has been involved in multiple projects, including the work with Chaser, the border collie who learned over 1,000 words, and the popular rat basketball tournaments.

One of his major interests has been looking at how animals learn behavior chains and asking if chaining theory, which was a very early way of explaining how animals learn behavior sequences, accurately predicts how chains are learned and maintained. While chaining theory has been updated over the years (it was first described in the 1860’s), Dr. Reid felt there was a better way of understanding how animals learn specific sequences of behavior. His question was “What do we need to know, to replace chaining theory?”

In this talk, he’s going to describe some of the limitations of chaining theory and then describe a new way of thinking about behavior chains, one that more accurately predicts what happens in real learning situations, and that provides a “rigorous methodology” for experiments. Because this talk was long and quite detailed, I’ve broken it up into four sections.

  • Chains and chaining theory defined
  • History and limitations of chaining theory
  • From behavior chains to behavioral skills
  • Factors that influence the speed of skill acquisition

Note: One of the themes of this conference was chains and sequences, which is a great topic, but generated some confusion as there seem to be several different definitions/distinctions used by different fields when distinguishing between the two. I’ve tried to be clear about how each speaker is using the terms, but do be aware that there may be differences between speakers.

Part 1: Chains and Chaining Theory Defined

Clarifying terms: “Behavior chain” vs. “Chaining Procedure”

A behavior chain: Each response in the chain produces a stimulus that serves as a discriminative stimulus that (1) reinforces the previous response and (2) sets the occasion for the next response.

“Behavior chain” describes a well-defined structure of simple behavior patterns. Most serial patterns are not chains, even though they may be described as “chaining procedures.”

This classification (behavior chain) assumes no knowledge of serial order of the stimuli, even after 100’s of repetitions. (note: this in itself is pretty interesting and should already be raising some questions…).

Chaining Procedures:

  • Widely used to train animals and in ABA (applied behavior analysis) to help children with autism and other disabilities.
  • Break down a task into a smaller series of interconnected steps.
  • Can be done as forward chaining, backward chaining, and total task presentation.
  • May also be described as task analysis which involves breaking a complex skill or behavior down into smaller units that are easier to teach, and which results in a series of sequentially ordered steps or tasks.

Chaining Theory: A very early way of explaining learned (not innate) behavior sequences.

Chaining theory showed how various processes might combine to create the structure (chain). These processes are:

  • The 3 term contingency (links of the chain)
  • Discriminative stimuli
  • Stimulus-response associations
  • Conditioned reinforcement
  • Delay of reinforcement

If we take the basic diagram of chaining theory and rewrite it to describe a specific example (lever pressing by a rat), each link pairs a stimulus with the response it sets the occasion for. (The diagram from the talk is not reproduced here.)

Chaining theory has been tested through carefully planned experiments to see if it makes accurate predictions. Dr. Reid described some experiments that have been done with lever pressing in rats.

What does chaining theory predict?

  • The model claims that as delay of reinforcement increases, its effectiveness increases. So, behaviors that were further away from reinforcement would be stronger. But…this has not been shown to be true in practice, instead the strongest behaviors are those that occur closer to reinforcement.
  • According to chaining theory, if one event does not occur, then the chain should fall apart. But, in real life this does not happen. For example, in the scenario above, if the click does not occur, the rats will continue the chain (perhaps with a slight pause). Rats are adept and they learn about serial patterns.
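To make the second prediction concrete, here is a minimal sketch (not from the talk) of what a strict reading of chaining theory implies. The stimulus and response names are hypothetical stand-ins for the lever-press example, and the function `run_strict_chain` is an illustration only: it assumes each response can occur only when its discriminative stimulus is actually presented.

```python
from typing import Dict, List, Optional

# Hypothetical links for a lever-press chain (assumed names, not from the talk).
# Each stimulus sets the occasion for exactly one response...
STIMULUS_TO_RESPONSE: Dict[str, str] = {
    "lever light": "press lever",
    "click": "approach food cup",
    "pellet in cup": "eat pellet",
}
# ...and each response produces the next stimulus in the chain.
RESPONSE_TO_STIMULUS: Dict[str, str] = {
    "press lever": "click",
    "approach food cup": "pellet in cup",
}


def run_strict_chain(first_stimulus: str, omitted: Optional[str] = None) -> List[str]:
    """Return the responses emitted under a strict reading of chaining theory.

    A response occurs only if its discriminative stimulus is presented,
    so omitting one stimulus stops everything that would have followed it.
    """
    responses: List[str] = []
    stimulus: Optional[str] = first_stimulus
    while stimulus is not None and stimulus != omitted:
        response = STIMULUS_TO_RESPONSE.get(stimulus)
        if response is None:
            break
        responses.append(response)
        stimulus = RESPONSE_TO_STIMULUS.get(response)
    return responses


full_chain = run_strict_chain("lever light")
broken_chain = run_strict_chain("lever light", omitted="click")
```

In this toy model, leaving out the click ends the chain after the lever press. Real rats do not behave this way: they continue the chain, perhaps after a slight pause, which is exactly where the theory's prediction fails.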

So, let’s go back and look at the history of chaining theory to see where it came from, how it has changed, and why it might be useful. Weaknesses in chaining theory do not imply that chaining procedures will be ineffective. Behavior chains can exist even if chaining theory is wrong. What we need is to update chaining theory or find a better explanation for how chains function.

The history of chaining theory:

Chaining theory was originally described about 150 years ago and, while it has been modified and updated, it has not undergone any kind of complete revision. Behavioral science has changed a lot in the last 81 years and most early models have been replaced or refined, yet chaining theory has persisted in various forms. Even though most researchers took it with a “grain of salt,” no one has looked at it more closely, and it’s definitely time for it to be updated.

Here is a snapshot of some of the more significant events since chaining theory was first proposed.

  • Described by Bain (1868) and popularized by William James (1890)
  • Titchener (1909) proposed that the meaning of a word consists of the chain of associations which it arouses – a simple associative chain.
  • Washburn (1916) proposed a chain theory of language
  • Watson (1920) interpreted “internal speech” as chains of individual movements
  • Skinner (1938) behavior of the intact organism

Part 2: The History and Limitations of Chaining Theory: The Good, Bad, and the Ugly

Let’s identify the “Good” first:

  • Chaining theory was a very early way of explaining learned (not innate) behavior sequences.
  • Showed how several independently verified processes could combine to create structure in behavior patterns.
  • It avoided explanations based on the organism’s goals or intentions, included no “mental” processes.

Early criticisms of chaining theory (The Bad):

  • Karl Lashley (1951) is the most well known:
    • Language is organized by meaning, not by fixed sequences of elements.
    • Chaining theory could not account for knowledge of relationships between non-adjacent items in serially organized behavior. Arpeggios in music are too fast to rely on sensory control of movement.
    • He argued that serial order is mediated by central integrative processes, not by linkages between successive elements.
    • His work was taken more seriously by professionals working with humans than by animal trainers.
  • Maze studies – spatial learning produces learned behavior patterns – taking short cuts and detours can’t be behavior chains, but given the chance, rats will take them!
  • Herb Terrace produced strong evidence:
    • Animals can learn simultaneous chains (all stimuli are presented simultaneously) as well as successive chains (the stimuli are presented one by one in succession). Chaining theory says this is not possible.
    • In simultaneous chains (A -> B -> C -> D ->E), animals can produce the correct order even with subsets of stimuli (e.g. B->D). Even pigeons can skip links. Chaining theory predicts that animals cannot complete the chain if they skip links.
    • He demonstrated that animals often learn ordinal position within sequences. Pigeons trained on three different 3-item lists can produce the correct order even when the elements are combined from different lists.
  • Other researchers provided compelling results:
    • Damian Scarf and Michael Colombo have verified and extended Terrace’s demonstrations that animals form representations of serial order.
    • Do discriminative stimuli only “set the occasion” for the next response? Allen Neuringer demonstrated that discriminative stimuli also control extended response patterns, such as vary and repeat patterns – meaning that animals can learn when they will be reinforced for repeating a pattern, and when they will be reinforced for variations on the pattern.
    • I added my 2 cents a few times by measuring the relation between response sequences and the delay of reinforcement gradient, which shows that the strongest responses are those that occur closest to reinforcement delivery.

Here’s an example that shows another of the limitations of chaining theory:

Let’s say you have a three behavior chain of three lever presses. You change it by modifying one link, either the first or the last. Now you test to see how long it takes for the rat to learn the new chain. Which “new chain” is easier for the rat to learn? – the one where the first behavior was changed, or the one where the third behavior was changed?

He asked this question to a room full of knowledgeable researchers at a conference and the audience was pretty equally divided, with half predicting the rats would learn one chain faster and the other half predicting the opposite. Even though the researchers based their choice on chaining theory, the theory did not provide enough quantitative details about the effect of reinforcement and timeout to make a simple prediction. Note: He asked the same question at this conference and also got about a 50/50 split.

However, when this was actually tested, it turned out that the last response was the most sensitive to change; in other words, it persisted the least. The rats learned a new chain more quickly when it was the last behavior in the chain that was changed. If the first behavior was changed, there were more errors and it took more time for them to learn the new chain. This can be explained using the Gradient Model (Reid 2009).

Chaining Theory – The Ugly

Chaining theory is wrong. It’s not just wrong, it denies the behavioral flexibility of animals from primates to pigeons. A new approach that considers what the animal is learning (skills) may lead to a more accurate understanding of how behavior chains function and facilitate learning.

  • Rats quickly take novel detours and shortcuts – abilities never present in behavior chains.
  • Animals don’t get lost at “choice” points in mazes. They make a selection, learn from it, and choose better the next time.
  • Animals learn behavioral skills, not just chains. Practice makes perfect. Enough practice leads to autonomy: they can complete the skill alone without relying on the trainer or the stimuli in the operant chamber.
  • Overreliance on chaining theory prevented researchers from asking the necessary questions to discover how animals learn sequences. He is studying how autonomy develops.

Part 3: From behavior chains to behavioral skills.

Ok, now we get to the part where we can see some practical applications. In this part of his talk, Dr. Reid described some experiments that allowed him to study what kind of learning occurred when animals practiced the same behavior chain for a long time.

The question was: How do guiding cues combine with practice to produce “behavioral autonomy?”

  • Do behavioral skills become autonomous?
  • And if so, what factors affect the process?

Testing the acquisition of behavioral skills:

To test acquisition, they set up two different experimental conditions. Within each condition, the experiment was divided into two phases. The first phase was the training or acquisition phase where the rat was learning the behavior chain. In this phase, the rat followed “guiding cues” that indicated which behaviors would be reinforced. In the second phase, “autonomy,” the guiding cues were no longer present and the rat was reinforced for continuing to offer the correct sequence without them.

They measured how long it took the rats to reach acquisition, how the rats responded when the guiding cues were removed, and their accuracy throughout the experiment. Of particular importance was whether or not the rats developed autonomy – could the rat learn to do the behavior on its own?

For this experiment:

  • Autonomy involves changes in stimulus control by
    • A: Environmental cues (e.g. lights)
    • B: Cues resulting from the subject’s own behavior (behavioral history, serial order, proprioception)
  • We should focus on the changes in stimulus control during guided skill learning, more specifically how animals switch from using “guiding cues” to “practice cues.”

The set-up: Designed to test the “skill” and “guiding” cues for rats

  • Skill: Left/right lever press sequences required in all conditions and experiments
  • Guiding Cues: They manipulated whether the lights were on or off over the respective levers.

Experiment #1: The “Lights” condition

The procedure:

  • If the light is on over the lever, the rat will be reinforced (with one food pellet) for pressing it.
  • If the rat presses the wrong lever, there is a 3 second time out before the rat has the chance to start again.
  • The rats were trained to press the levers in a left – right sequence. He called this a behavioral skill.
  • Each rat was trained (using the lights) until it was between 80 and 100 percent accurate at doing the L-R lever presses. This was the training or “acquisition” phase.
  • Then they tested whether the rat had learned the behavioral skill (L-R presses in correct order) by removing the guiding cues. Now the rat had to do the correct sequence without any guidance. In this phase, they were testing the rat’s autonomy.

The results:

  • When the lights were turned off, there was an immediate drop (46.4%) in accuracy, to about 40%, but then the rats improved in successive sessions.
  • The improvement after the initial drop was due to the rats using “practice cues” – cues that they learned during the training process.
  • My understanding is that practice cues are cues that are learned during the training process, but that are not intentionally provided as part of the experiment. They are the natural result of repeating the same behavior many times in a row. The rats got faster and faster at doing the left-right sequence because they got feedback from their own bodies and the environment on each repetition. For example, if a rat didn’t turn far enough to reach the lever on one repetition, he would adjust on the next one.

Experiment #2: The “Reversed Lights” condition (done with different rats)

The procedure:

  • The requirements for reinforcement were the same as in experiment #1. The rats had to press the levers in a left – right sequence. A correct answer was reinforced with a food pellet. An incorrect answer resulted in a 3 second time-out.
  • This time the rat had to press the lever where the light was OFF. So if the light was on over the left lever, the rat should press the lever on the right.
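The two cue rules are easy to mix up, so here is a minimal sketch of how they differ. It is only an illustration under stated assumptions: the condition names `"lights"` and `"reversed"`, the function names, and the choice to score the whole left-right sequence at once (pellet for a correct sequence, time-out otherwise) are mine, not details confirmed by the talk.

```python
from typing import List, Tuple

TARGET_SEQUENCE: Tuple[str, str] = ("L", "R")  # the left-right skill required in both experiments
TIMEOUT_SECONDS = 3  # time-out after a wrong press, as described above


def cued_lever(light_on: str, condition: str) -> str:
    """Return the lever the guiding cue points to, given which lever's light is on."""
    if condition == "lights":
        return light_on  # press the lever under the lit light
    if condition == "reversed":
        return "R" if light_on == "L" else "L"  # press the lever whose light is off
    raise ValueError(f"unknown condition: {condition}")


def lights_for_trial(condition: str) -> Tuple[str, ...]:
    """Which light must be on at each step so that the cue guides L, then R.

    The cue rule is its own inverse (swapping twice gets you back), so the
    same function maps a target lever to the light that signals it.
    """
    return tuple(cued_lever(target, condition) for target in TARGET_SEQUENCE)


def score_trial(presses: List[str]) -> str:
    """Assumed scoring: pellet for a complete L-R sequence, otherwise a time-out."""
    if tuple(presses) == TARGET_SEQUENCE:
        return "pellet"
    return f"timeout {TIMEOUT_SECONDS}s"
```

The point of the sketch is that `score_trial` never looks at the lights. The skill being reinforced is identical in both experiments; only the relationship between the cue and the correct lever changes.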

The results:

  • The initial training time until the rats had achieved proficiency was much longer in this experiment (28 compared to 14).
  • However, when the lights were both turned off, there was less of a drop in proficiency (19.6% compared to 46.4% for the lights on condition).
  • They also did an ABA reversal, returning to the reversed lights condition after testing with both lights.
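The notes do not say whether the reported drops (46.4% and 19.6%) are percentage points or relative changes. As a rough check, assuming the post-removal accuracy in the Lights condition really was about 40%, the short calculation below works backward to the accuracy before the lights were turned off under each reading. The function name and the two reading labels are hypothetical.

```python
def pre_removal_accuracy(post: float, drop: float, reading: str) -> float:
    """Work backward from post-removal accuracy to pre-removal accuracy (both in percent).

    reading="points":   the drop is in percentage points (pre - post = drop)
    reading="relative": the drop is a fraction of the pre-removal value
    """
    if reading == "points":
        return post + drop
    if reading == "relative":
        return post / (1 - drop / 100)
    raise ValueError(f"unknown reading: {reading}")


lights_points = round(pre_removal_accuracy(40.0, 46.4, "points"), 1)
lights_relative = round(pre_removal_accuracy(40.0, 46.4, "relative"), 1)
```

Read as percentage points, the rats would have been at roughly 86% before the cues were removed, which sits inside the 80 to 100 percent training criterion. Read as a relative change, they would have been at roughly 75%, below that criterion. So the percentage-point reading looks more plausible, although this is an inference from approximate numbers rather than something stated in the talk.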

Comparing the results: Time to acquisition and autonomy

Time to acquisition:

When comparing the speed of learning the Left-Right Sequence (time to acquisition under “training conditions”), they looked at the time to achieve 80% accuracy and found:

  • Fastest learning occurred with the lights on condition
  • Second fastest learning occurred with the reversed lights condition
  • Slowest learning occurred with the no guiding cues condition. For “no cues,” they tested with both lights on and both lights off.

Autonomy:

  • The rats that learned under the reversed lights condition reached autonomy faster than those that learned under the lights on condition. So, even though they took more time to learn the behavioral skill (L->R lever presses), they were more successful once the guiding cues (lights) were removed.
  • Different amounts of training led to different degrees of autonomy.
  • The more time it took to reach acquisition (80% success), the more successful the rats were when the guiding cues were removed.

Why is autonomy important?

When an animal reaches autonomy, it has become proficient in the skills needed to complete the entire behavior chain. This can be important for some kinds of tasks where the trainer wants the animal to work independently, without added trainer or environmental cues.

If autonomy is your goal, then you need to set up the training so that the animal achieves autonomy within a reasonable amount of time and with the appropriate amount of accuracy. Dr. Reid compared it to teaching someone to walk a route to a particular destination. If someone guides you every step of the way to the store, and continues to do so, you will find it more difficult to go to the store on your own if that person is suddenly unavailable.

Conclusion:

Learning is faster (time to acquisition) under the guiding cues condition, but autonomy (ability to do it without any guiding cues) occurs faster when the guiding cues are “weaker.” Note: “Weaker” doesn’t mean inconsistent or unclear. In the case of the lights on/reversed lights study, the reversed lights functioned as weaker cues because the animal’s natural tendency is to move toward the light, not away from it.

These data and those in the previous study made us confident that there is a real difference in the effectiveness of stimulus control between the Lights condition and the Reversed-Lights condition.

Part 4: Factors that influence the speed of skill acquisition

Some new questions…

  • Do differences in the effectiveness of guiding cues (easy vs. hard/weaker) influence the development of control by practice cues?
  • Does control by guiding cues and practice cues develop at the same rate?

Our Goal: To measure developing stimulus control by guiding cues and by practice cues independently in the same session. This was done by inserting probe trials without guiding cues.

Adding Probe Trials:

What is a probe trial? Probe trials are used to test or evaluate the level of learning at different points in the training process. They provide the researcher with information about the animal’s level of proficiency, before it has reached acquisition.

In these experiments, the probe trials were done by turning either Both Lights On or Both Lights Off. Both conditions functioned as “no cue” conditions since the rats were using the lights for guidance. They were inserted into the training phase, before the behavior was considered to be learned and stable.
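As a rough illustration of how probes let you measure two kinds of stimulus control in one session, the sketch below mixes probe trials into a block of guided trials and then scores each trial type separately. The proportion of probe trials, the random placement, and all function and label names are assumptions for the example; the notes do not give those details.

```python
import random
from typing import Dict, List


def build_session(n_trials: int, probe_fraction: float, probe_type: str,
                  seed: int = 0) -> List[str]:
    """Return a shuffled list of trial labels: 'guided' or the probe label."""
    if probe_type not in ("both_on", "both_off"):
        raise ValueError("probe_type must be 'both_on' or 'both_off'")
    n_probes = round(n_trials * probe_fraction)
    trials = [probe_type] * n_probes + ["guided"] * (n_trials - n_probes)
    random.Random(seed).shuffle(trials)  # seeded so the session is reproducible
    return trials


def accuracy_by_type(trials: List[str], correct: List[bool]) -> Dict[str, float]:
    """Proportion of correct sequences, computed separately for each trial type."""
    totals: Dict[str, int] = {}
    hits: Dict[str, int] = {}
    for label, ok in zip(trials, correct):
        totals[label] = totals.get(label, 0) + 1
        hits[label] = hits.get(label, 0) + int(ok)
    return {label: hits[label] / totals[label] for label in totals}


session = build_session(n_trials=20, probe_fraction=0.2, probe_type="both_off")
```

Accuracy on the `guided` trials reflects control by the guiding cues plus whatever practice cues have developed, while accuracy on the probe trials reflects practice cues alone. Comparing the two over sessions is what shows how quickly each kind of control develops.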

Results of probe trials:

In the graph from the talk (not reproduced here), the black dots are the Lights/Reversed Lights conditions and the vertical lines are the probes.

  • The most accurate and consistent learning occurred under the Lights condition/Both Lights Probes.
  • In both the Lights/Lights on probe and Lights/No Lights Probe conditions, the accuracy under the probe conditions was usually slightly less than under the lights on conditions.
  • In both Reversed Lights conditions (probe = both lights, probe = no lights), the accuracy under the probe conditions more closely matched the accuracy under the Reversed Lights condition. This is shown on the graph by the proximity of the bars to the dots (no gap).
  • In all cases, control by guiding cues and practice cues develops at about the same rate.
  • In the Reversed Lights condition, accuracy during probe trials sometimes exceeded accuracy during guiding cue trials.
  • When all guiding cues were removed (all probe sessions), the Reversed Lights condition was consistently more accurate.

Conclusions:

  • Practice cues can develop even in the presence of effective guiding cues.
  • Less effective guiding cues and harder tasks produce greater autonomy. (this has been observed in 5 studies)
  • However, skill acquisition is much slower if no guiding cues are available.

What’s next?

How about comparing learning in humans and rats?

Focusing on “Behavioral skills” encourages questions that were not asked when the focus was limited to chaining theory. Now the question is:

Do rats and humans learn behavioral skills the same way? Specifically…Do rats and humans show the same features of skill learning, especially any counter-intuitive features?

Let’s look at a human study:

  • New Conceptualizations of Practice: Common Principles in Three Paradigms Suggest New Concepts for Training (Dr. Richard A Schmidt, Robert A. Bjork 1992)
  • They discovered a counter-intuitive feature of human skill learning: Factors that degrade performance during acquisition enhance performance in a subsequent autonomy/retention condition, and vice versa.
  • Introducing difficulties for the learner can enhance training.
  • They argued that difficulties trigger encoding and retrieval processes that support learning, comprehension, and memory.
  • Yes – these are the same results found by Dr. Reid in his experiments.

Has the same work been done with rats?

  • Skill learning has been widely studied in humans, but far less in rats.
  • Rat studies have focused on anticipatory cues, transfer of control from discriminative stimuli to new cues that develop during practice. We have observed the same counter-intuitive feature of human skill learning: less effective cues and more difficult behavioral skills degrade accuracy during acquisition, yet enhance accuracy in the autonomy phase.
  • New studies could focus on the roles of (a) antecedent stimuli or on (b) feedback from performance.

Informative vs. Operant Feedback

  • Human studies have focused mostly on “knowledge of results” (KR): the role of post-trial informative feedback about performance errors during skill acquisition. This is widely regarded as the most important variable in human skill acquisition.
  • Human studies ignore the other type of feedback: the operant consequences of responding (food/timeout).
  • No rat studies had manipulated informative KR, so that is what they decided to do. The question was: Does KR affect rats the same way it affects humans?

Human study:

A study by Winstein and Schmidt (1990) compared the results with 100% KR, 67% KR and 0% KR in the acquisition phase. The prediction was that the highest success during autonomy would be for the 100% KR condition. Instead, the 67% KR condition led to the greatest success in the retention phase. These results have been replicated dozens of times in humans.

The rat study:

  • Provided immediate operant feedback (pellet or timeout)
  • Provided KR after the operant feedback was over (no overlap)
    • KR: if the rat made an error, there was a full 1 second tone
    • KR: if the rat was correct, there were 4 tone beeps
  • Food was not provided for every correct response. They used food for about half the responses, delivered randomly, because they didn’t want the tones to correlate directly with whether or not the rat was reinforced with food.
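To show how the two kinds of feedback can be kept separate, here is a minimal sketch of a single trial's events. It assumes KR trials are picked at random to hit the target percentage, that food follows a correct response with probability 0.5, and that a correct response without food produces no operant event. Those details, and all the names, are assumptions rather than the procedure as run.

```python
import random
from typing import List


def kr_schedule(n_trials: int, kr_fraction: float, seed: int = 0) -> List[bool]:
    """Randomly choose which trials are followed by KR tones (e.g. 0.67 for 67% KR)."""
    n_kr = round(n_trials * kr_fraction)
    schedule = [True] * n_kr + [False] * (n_trials - n_kr)
    random.Random(seed).shuffle(schedule)
    return schedule


def trial_events(correct: bool, give_kr: bool, rng: random.Random,
                 food_probability: float = 0.5) -> List[str]:
    """List the feedback events for one trial, in order.

    Operant feedback comes first (pellet or time-out); KR tones, if any,
    come afterward so the two never overlap.
    """
    events: List[str] = []
    if correct:
        # Food on only some correct trials, so tones do not simply predict food.
        if rng.random() < food_probability:
            events.append("pellet")
    else:
        events.append("timeout")
    if give_kr:
        events.append("4 tone beeps" if correct else "1 s tone")
    return events


schedule = kr_schedule(n_trials=30, kr_fraction=0.67)
```

Because food is delivered on only some correct trials, the four beeps carry information about accuracy that the pellet alone does not, which is what lets the tones act as informative feedback rather than as a second reinforcer.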

Results:

The greatest accuracy in the retention phase was for those rats that received 67% KR in the acquisition phase. This replicates the work that has been done with humans. But, there was one interesting difference and that was that rats learned faster in acquisition without any KR feedback. So, adding KR improved their performance in the retention phase (which measures autonomy) but slowed down their learning in the acquisition phase.

Conclusions:

Did rats and humans appear to learn behavior in the same way?

  • Yes, they both showed that same surprising, counter-intuitive feature. Factors that degrade performance during acquisition often enhance performance in a subsequent retention condition and vice versa.
  • In addition, both anticipatory “guiding cues” and informative “knowledge of results” appeared to affect learning in rats and humans the same way.

Focusing on Behavioral Skills promotes novel important questions that we can answer.

  • Does practice make perfect for animals? How?
  • Does this practice obey the power law of practice?
  • Does transfer of stimulus control produce autonomy?
  • Is this transfer the same as in fading procedures?
  • Can animals achieve autonomy without involving the transfer of stimulus control? Other methods?
  • Can models of human skill learning explain skill learning in other species?
  • Does autonomy also involve transfer from declarative to procedural memory? Is it related to automaticity?

A few of my thoughts:

It’s going to take me a while to sort through the information he presented, but a few things jumped right out at me.

  • There’s always going to be some discrepancy between theory and practice. Sometimes the theory is ahead and we can go to the science to figure out how to do something. Other times the theoreticians have to look at what happens in real life and use that information to improve their theories.
  • When teaching behavior chains, you should consider the end goal. Is the goal for the animal to continue to rely on the guiding cues, or to achieve autonomy?
  • Mistakes really are information and it may be ok to “test the waters” to see what the animal has learned.
  • I found it really interesting that adding a KR slowed down learning during acquisition. More information is not always better.
  • I am also pondering how this relates to cueing issues. I think there’s probably a lot of food for thought here on how animals learn and use cues.


While I try to be accurate in my note taking, there may be some errors either in my understanding or my presentation of the information from this talk. If you have questions, feel free to contact me or leave a comment.

Thanks to Dr. Reid for allowing me to share his presentation and for sending me additional resources. Thanks to the ORCA students and to the organizers of the Art and Science of Animal Training Conference for all their hard work on putting on this great event.

4 replies

  1. Thank you for writing this long post. As you know I attended dr. Alliston Reid’s presentation. You are so good at reflecting on presentations! For a moment it feels like if I am back at The Art and Science of Animal Training conference 2019. Thank you, thank you, thank you.

    • Hi Sandra,

      You’re welcome. There was a lot of information there and it was good to go through it again. I should have added to the “takeaways” that his presentation gave me more insight into why it can be so hard to maintain behavior chains and how easy it is to unintentionally create them. I also found it fascinating how they viewed animal learning a long time ago, particularly that they thought an animal could repeat a pattern 100s of times and not learn it!

      Katie

  2. Many thanks for this. Not only am I in awe of your note taking and your care in providing such an accurate report, but I also appreciate your conclusions at the end!

    • Thanks Terry. You would have enjoyed the conference and the extra discussions. I may have to post an update at some point, so I can share what I did with the information I learned. It’s really got me thinking about what we want our animals to learn when we ask them to do many behaviors (with and without cues) before the terminal reinforcer.
