Barking up the Learning Tree: Dogs, Cats, and Rats
Thinking and Feeling and Acting
In This Chapter
Ringing the bell for Pavlov
Teaching an old dog new tricks
Making connections and staying in condition
Reinforcing behavior and punishing the culprit
Athletes are some of the most superstitious people around — only gamblers can outdo them in this category. When I played college baseball, I had one teammate, a pitcher, who wore the same undershirt without washing it for as long as he kept winning. Some of us kind of hoped we’d lose so he’d wash his shirt. Other athletes carry lucky charms, perform rituals, or engage in elaborate routines to keep a winning streak alive.
I had a couple of superstitions of my own during my college baseball years. For starters, I couldn’t knock the dirt off just one of my cleats (shoes) with my bat. I had to do both, even if the other one was clean. And when running in from the field, I never stepped on the chalk line. The other players never questioned me about my superstitions; they had their own weird habits.
When I started studying psychology, I began to wonder where this stuff comes from. What convinced me that I’d have a bad game if I stepped on the chalk line? At some point in time, I must have stepped on the line and then had a bad game. I saw a connection between what I did (stepped on the line) and what happened to me (had a bad game). I drew a connection between my behavior and a consequence, in this case, a negative consequence. Psychologists call this superstitious learning.
When an actual connection exists between what you do and a particular event that follows, be it positive or negative, a specific type of learning takes place. You learn that when you do something, the action is followed by a consequence. Behaviorists use the acronym A-B-C: Antecedent (what happens before)→Behavior (the action performed)→Consequence (what happens after the action). All learning is a process of conditioning, a type of learning in which an association between events is made.
In this chapter, I describe the learning process and point out how learned behaviors apply to classical conditioning, a type of learning in which two events become associated with each other, as well as operant conditioning, learning in which an important consequence follows a specific response, leading to that response being more or less likely to happen again.
Both classical and operant conditioning lead to learning. What is “learned” in classical conditioning is that two previously unassociated stimuli are now “related” or associated. A good example is something called taste aversion learning. I once ate a shrimp cocktail and got ill and vomited. From that point on, just the thought of shrimp cocktail has made me nauseated. I learned that shrimp cocktail and illness are related, at least for me. I learned that the taste of shrimp was aversive because it was associated with nausea.
In operant conditioning, the learned association is between a particular behavior and what happens after it, the consequence. If you’ve ever caught a fish in a particular spot on a lake or river, you know from that point on you will continue to try that spot first every time you go fishing. What you learn is that your behavior of fishing (behavior) in spot X (context) resulted in a positive consequence, or a reward. The receipt of that reward increases the likelihood that you will repeat the behavior that triggered the reward when you are next in the same situation.
Classical conditioning is about two stimuli becoming related to each other. Operant conditioning is about the relation between a behavior and its consequence making that behavior more (or less) likely to occur again.
Learning to Behave
You’ve been there or at least you’ve seen it, and I’m not judging — too much. It’s the public tantrum with all the key ingredients: parent shame, onlooker disdain, child out of control. And only the coveted toy, piece of candy, or permission has the power to end it. Desperate, you give in and appease the hostile creature.
Most people seem to agree that throwing a public tantrum to fulfill an emotional or physical goal is a learned behavior, a response that is taught or acquired through experience. So when a tantrum erupts, parents tend to bear blame for teaching the child that tantrums work. Because work they do! A screaming and flailing child often does get what he wants; children see it work for others (observational learning), and they experience results when they do it (operant conditioning). So why not create a spectacle?
More than a hundred years ago, a group of British philosophers asked this very same question and tried to figure out the nature of learning. They observed that when two experiences occur together in time (temporal contiguity) and space (spatial contiguity), they become associated with each other. In other words, people learn that when event or object A occurs, so does event or object B. It sounds gossipy — “A and B are always together!” The freeway and traffic stay together; hamburgers and French fries don’t make individual plans; and tantrums go hand-in-hand with new toys. They go together. They’re associated!
Public tantrums capitalize on associations. The child realizes, “Every time I’m in the store, my terrible behavior leads to a new toy in my hands.” And unfortunately for the tired, stressed out, and impatient parent, frequency is at play as well. The parent learns that buying a toy stops the tantrum — quick relief for the weary! As this scene continues to play out over and over again, an ever-stronger association forms.
The good news is that learned behavior can be unlearned through the same learning processes, which are also known as conditioning.
Drooling like Pavlov’s Dogs
Kind of a gross visual, huh?
Personally, I would rather go to the dentist than conduct research on the salivation patterns of dogs. That’s just me. But one brave man, Russian physiologist Ivan Pavlov, was up for the job. Pavlov was actually studying digestion with dogs when he became interested in how the presentation of food automatically activated the salivation response in the dogs that he was studying. He found that the formation of saliva was automatic.
Try it. Think about something really tasty and see if your mouth waters automatically. Did it work? It should have because salivation is a reflexive response to food. It’s the body’s way of preparing to receive food. Saliva helps break down food into digestible bits.
In this section, I describe how Pavlov figured out why certain associations trigger certain natural responses and thereby discovered classical conditioning. I also point out how associations can change to alter certain learned responses.
Conditioning responses and stimuli
Pavlov constructed a device to collect the saliva directly from the dogs’ salivary glands as the glands went to work. He could then measure how much saliva the dogs reflexively produced. Picture a dog strapped into a cage with a tube attached to its salivary glands and this wacky scientist counting each drop. Not even Hollywood could have imagined a more eccentric scene.
At this point, Pavlov was probably happy with his canine digestion research; but one day, he noticed something strange — the dogs salivated sometimes even when the food wasn’t presented. What was going on? Was something else causing the salivation?
Pavlov came up with an associationist explanation. That is, the dogs had learned to associate other stimuli with the food. But what was triggering this response?
Pavlov conducted a whole series of experiments to figure out how the dogs had learned to automatically associate non-food stimuli with food in a way that produced salivation. A typical experiment went something like this:
1. Pavlov placed his dogs in their harness with the saliva tubes attached to the dogs’ salivary glands.
2. He rang a bell and observed whether the dogs salivated or not. He found that they didn’t.
3. Then he rang the bell, waited a few seconds, and then presented food to the dogs. The dogs salivated.
4. He repeated the bell plus food presentation several times. These pairings, by the way, are called trials.
5. After Pavlov was satisfied with the number of trials, he presented the bell alone, without the food.
6. He found that the bell by itself produced salivation!
Conditioning refers to learning through the associative process, learning through experience. Pavlov’s discovery became known as classical conditioning.
After conducting his experiments, he identified four necessary components of classical conditioning:
Unconditioned stimuli (US): The food that Pavlov presented to his dogs, the unconditioned stimulus, is the thing that triggers the unconditioned response. Food prompts salivation.
Unconditioned responses (UR): Pavlov’s dogs automatically, or reflexively, salivated when presented with food. They didn’t need to learn or be conditioned to salivate in the presence of food. Pavlov called this response the unconditioned, or not-learned, response. It happened without learning. It was a reflex.
Conditioned stimuli (CS): The bell that Pavlov rang in a typical experiment, called the conditioned stimulus, is the item that the dogs learned to associate with the food through the process of pairing trials. After enough trials, a conditioned stimulus produces a response on its own.
Conditioned responses (CR): After the CS begins producing the UR without the US, the response is called the conditioned response. In symbolic form, this system looks something like Table 8-1.
Table 8-1 Classical Conditioning
US→UR (food automatically produces salivation)
CS + US→UR (bell + food produces salivation)
CS + US→UR (bell + food produces salivation)
CS + US→UR (bell + food produces salivation)
CS + US→UR (bell + food a few more times produces salivation)
CS→CR (bell alone produces salivation)
The power of classical conditioning is pretty impressive. Just think — if you appropriately pair two stimuli, the CS alone will eventually get the job done. But when the pairing stops, and the CS is producing the response by itself, the power of the CS eventually fades. If a CS is presented enough times without the US, the CS eventually will cease to elicit the CR.
This phenomenon is called extinction, and it is a way to reverse the process of classical conditioning. For example, Pavlov’s dogs learned to salivate at the sound of a bell. But if the bell continued to be presented without the delivery of food, the dogs would eventually stop slobbering to the bell.
But wait, there’s more!
Something even more interesting happens if the CS is presented again after a rest period following extinction: spontaneous recovery. At this point, the CS’s ability to elicit the response comes bouncing back, and once again the CS triggers the CR. This means that you can use classical conditioning techniques to teach an old dog new tricks, and you can reverse the process through extinction. With this skill, you’ll never be the boring guy at the party sitting in the corner. You can dazzle your newfound friends with classical conditioning tricks and come to the rescue of parents of a toy-hungry tantrum king by teaching them to just stop giving in and let extinction take over.
Here’s a fun party trick if you’re thinking about testing your own classical conditioning prowess.
1. Gather a few people together: family, friends, coworkers, whomever. Get some packets of powdered lemonade drink mix. This stuff is really sour without sugar. Give one packet to each participant.
2. Ask each person in the crowd to dip a finger in the lemonade and take a lick. (This is the US.) Ask them to observe if their mouths watered. They should have. If not, get yourself some better droolers.
3. Now choose a CS (a bell, a light, a whistle, whatever). Go through the process of pairing the CS with tasting the lemonade (CS→US→UR over and over again).
4. After 10 to 20 trials, go through a couple of trials where you present just the CS and ask the participants to observe if their mouths watered. They should have! That’s classical conditioning.
5. If your crew is really into spending this kind of time with you, you can now start playing around with extinction and spontaneous recovery!
One more way to reverse the effects of classical conditioning is worth mentioning. You’ve conducted the lemonade test, and you’ve successfully taught your Pavlovian subjects to drool on command. If you want to change the effect, choose another US that produces some other response (UR) and classically condition your subjects with the new US. This process is called counterconditioning.
Counterconditioning works especially well if the new US produces a response that is incompatible with the old CR. If the old CR was a watering mouth, maybe you pick a new US that produces a dry mouth. I don’t know what that may be — maybe eating sand.
I guarantee that if you classically condition the bell with the eating of sand, the bell will have a very hard time triggering a watering mouth ever again . . . unless, of course, you reverse the process all over again. Just be sure to give your subjects a break from time to time, and don’t actually try the sand-eating thing as a parlor game; it’s just an example!
Classic generalizing and discriminating
You may be thinking, “Big deal. Dogs learn to salivate to a bell; so what?”
Well, if you’re going to be so tough to impress, you should know that classical conditioning is actually a very important phenomenon in terms of human survival. It helps people learn things simply by association, without effort; and this can be very beneficial. In other words, after a CS becomes associated with a US to the point where the CS produces the CR by itself, that learning can expand automatically through a process known as generalization.
Generalization happens when something similar to the CS — I’ll call it CS-2 — elicits the CR, even if CS-2 has never before been associated with the original US. For instance, if you learn to associate certain facial gestures, like a snarl or a sneer, with eventual violence, then the snarl or sneer (CS) produces fear (CR), whereas only a flying fist or a verbal threat (US) elicited fear (UR) in the past. You may then generalize the snarl and experience fear in connection with direct and non-averted eye contact (CS-2). This generalization can save your tail. Generalization helps people adapt, because learned responses are applied to new situations.
Generalization can backfire, though. If, for example, I am attacked by a gray-colored pit bull, I may get scared every time I see a gray dog of any type, even a Chihuahua. This “over-learning” can limit my behavior and cause unnecessary suffering because I become afraid of dogs that pose no actual danger to me, so instead of just avoiding gray pit bulls I avoid all gray dogs.
Another example of generalization backfiring comes from the traumatic experiences of war veterans who suffer from post-traumatic stress. If they’ve experienced loud explosions and heavy gunfire and developed a strong fear reaction to these events, these veterans may respond to hearing a car backfire or some other loud noise in the same way they responded to gunfire in a war zone. This can make life pretty difficult, especially for people living in an urban area with a lot of loud noises.
When people begin to overgeneralize learned behaviors, a process known as discrimination is absent. You know how to discriminate, or tell the difference, between stimuli such as the sound of a potentially fatal gunshot and the merely annoying sound of a car backfiring. Discrimination is learned when a CS-2 (or 3 or 8 or 25) is presented enough times without eliciting a response. It becomes apparent that only the CS, and not the CS-2, is necessarily going to produce the CR.
If all it takes to trigger a natural response to an unnatural stimulus is pairing a natural stimulus with an unnatural stimulus and presenting them together for a while, it can’t get much easier.
But not so fast! The process sounds as straightforward as it gets, but some specific rules must be followed in order to achieve conditioning.
In order for associations to form, they must conform to the following two very important rules:
Contiguity: Associations are only formed when events occur together. For example, I feel depressed when I wake up every Monday morning and think about going back to work. Therefore, for me, work and waking up are associated.
Frequency: The more often that two (or more) events occur together, the stronger the association becomes.
Contiguity, when one event follows another in time, is absolutely required for classical conditioning to occur. Think about it: What if Pavlov had presented the bell (CS) after he presented the food (US)? Or what if he had presented the bell 15 minutes before the food? The CS must come immediately before the US in order for the association to form.
Each of these sequence and timing scenarios represents conditioning techniques that aren’t very effective. If Pavlov presented the US before the CS, which is a process known as backward conditioning, the dogs would have either made no association at all or an extremely weak one. If he presented the bell well in advance of the food, a process known as trace conditioning, the dogs may have formed a weak association, if any.
The best way to form a strong association, or to form one more quickly, during the conditioning process is to follow these guidelines:
Present the CS just before the US and keep the CS on or around until the US appears. This way, the CS is perceived to be contiguous with the US.
Conduct a lot of trials with the CS and US paired frequently. The strength of the association is a direct product of the frequency of the pairing.
Use a strong or intense CS to condition faster. A bright light conditions faster than a dim one. A loud bell conditions faster than a faint one.
But I don’t want to mislead you into thinking that all you have to do is frequently present an intense CS before a US to achieve classical conditioning. Even though the rule of contiguity states that if two stimuli are contiguous, an association will form, it’s actually not that simple.
Blame it on a pesky graduate student named Robert Rescorla who questioned whether contiguity was enough. Maybe he thought it all seemed too simple.
Rescorla proposed that another rule — the rule of contingency — be added to the list of conditioning requirements. His idea was that a CS not only has to be contiguous with a US; it also has to be an accurate predictor of the US. In other words, if the CS is presented at random times (at 1 minute, 7 minutes, 2 minutes, or 12 minutes, for example) with the US, then the CS isn’t a credible predictor of the US. The learner (animal or human) gains no predictive power from experiencing the CS, so the CS fails to trigger the CR. Therefore, the CS must be presented with the US in a way that the learner can anticipate, with a fair degree of certainty, that the US is soon to come.
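To make the idea of an "accurate predictor" concrete, here is a minimal sketch in Python. It assumes one common way of putting a number on contingency: the chance of food when the bell rings minus the chance of food when it doesn't. The chapter itself gives no formula, so the function name, the measure, and the trial counts below are illustrative assumptions, not Rescorla's own procedure.

```python
def contingency(us_with_cs, cs_trials, us_without_cs, no_cs_trials):
    """Return P(US | CS) - P(US | no CS), an assumed contingency score.

    A score near 1 means the CS is a good predictor of the US.
    A score near 0 means the CS tells the learner nothing.
    """
    p_us_given_cs = us_with_cs / cs_trials
    p_us_given_no_cs = us_without_cs / no_cs_trials
    return p_us_given_cs - p_us_given_no_cs


# Hypothetical counts: food follows 18 of 20 bells, but shows up in only
# 2 of 20 comparable periods with no bell.
predictive = contingency(18, 20, 2, 20)

# Hypothetical counts: food is just as likely with or without the bell.
random_pairing = contingency(10, 20, 10, 20)

print("predictive bell:", round(predictive, 2))
print("random bell:", round(random_pairing, 2))
```

In the second case the bell and the food still occur together on half the trials, so contiguity is present, but the score is zero because the bell adds no predictive power.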
Adding another rule to the requirements of Pavlov’s classical conditioning is quite the accomplishment for a graduate student. But Rescorla wasn’t finished. Later, he and another psychologist, Allan Wagner, made another huge contribution to learning theory. Ready?
The Rescorla-Wagner model (1972) simply states that in order for a CS to be maximally effective, the US must be unexpected. The learning process is dependent on the element of surprise. If a learner expects the US every time she sees the CS, then she learns to associate it properly; but eventually, the strength of the association reaches a maximum. The strength increases dramatically at first and then levels off as the novelty of the CS wears off and it becomes more “expectable.” Therefore, the power of an association to elicit a CR is a function of surprise. The more novel the CS, the stronger the association.
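The leveling-off described above can be sketched with a simplified, single-CS version of the Rescorla-Wagner update, in which each trial changes the association by a fixed fraction of the remaining surprise. The learning rate of 0.3, the ceiling of 1.0, and the function name are assumptions chosen for illustration, and the full model has more parameters than this sketch shows.

```python
def rescorla_wagner(n_trials, v_start=0.0, learning_rate=0.3, v_max=1.0):
    """Return the association strength after each trial.

    Simplified update: change = learning_rate * (v_max - v), where
    (v_max - v) is the surprise left on that trial. Use v_max=1.0 when
    the US follows the CS, and v_max=0.0 when the CS is presented alone.
    """
    strengths = []
    v = v_start
    for _ in range(n_trials):
        surprise = v_max - v
        v += learning_rate * surprise
        strengths.append(v)
    return strengths


# Acquisition: bell paired with food for 10 trials.
acquisition = rescorla_wagner(10)

# Trial-by-trial gains shrink as the food becomes less surprising.
gains = [later - earlier
         for earlier, later in zip([0.0] + acquisition[:-1], acquisition)]

# Extinction: bell alone for 10 trials, starting where acquisition ended.
extinction = rescorla_wagner(10, v_start=acquisition[-1], v_max=0.0)

print("strength after acquisition:", round(acquisition[-1], 3))
print("first and last gain:", round(gains[0], 3), round(gains[-1], 3))
print("strength after extinction:", round(extinction[-1], 3))
```

The first pairing produces the biggest jump, and later pairings add less and less. Running the same rule with the bell alone drives the strength back toward zero, which matches the extinction effect described earlier in the chapter.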
Battling theories: Why does conditioning work?
Knowing how to perform classical conditioning is useful (check out the section “Conditioning responses and stimuli” earlier in this chapter for the how-to on this trick), and the conditioning response enables people to learn about their environment in ways that improve adaptability and a little something called survival. But why does conditioning work? Why do previously unrelated stimuli become associated with each other?
Pavlov believed that the simultaneous activation of two distinct areas in the brain forms associations between a CS and a US. This activation results in the formation of a new pathway between the two areas. It’s like sitting next to a stranger on the bus and, through some polite chitchat, realizing you both know the same person. These two previously unrelated people become associated through this common association and a new connection is born.
Clark Hull presented an alternative account. He believed that the association formed is actually between the CS and the UR, which then becomes the CR. Scientists are at their most creative when they figure out how to make two different theories compete with each other in predicting the outcome of an experiment. This creativity makes it possible for them to dream up a critical experimental test. Holland and Straub set out to test Hull’s theory. They conditioned rats by using noise and food pellets.
According to Pavlov, the rats learned to associate the noise with the food. But Holland and Straub pitted Pavlov’s idea against Hull’s by trying to make the food an unattractive US. First, they taught the rats to associate noise (CS) with food (US). Then they put the rats on a turntable and spun them around to make them nauseated. Here, they taught the rats to associate food with nausea, which “devalued” the food. Then, after spinning them for a while, they presented the noise again, and the rats didn’t respond to it.
Pavlov thought that the original connection was between the noise and the food. But Hull predicted that devaluing the US would not make a difference in the rats’ response; he suggested that the critical association forms between the noise (CS) and eating (UR). Devaluing the US did make a difference, though. Spinning the rats on the turntable and making the food less attractive to them as a result should not have made a difference, according to Hull, but he was wrong. A connection must exist between the CS and the US; the US can’t be left out of the loop for conditioning to occur.
So, Pavlov rules the day!
This isn’t just rigid tradition. It actually has predictive value. Learning doesn’t stop here, however. Check out Chapter 9 for new adventures in learning about learning.
Studying Thorndike’s Cats
Operant conditioning takes place in all facets of everyday living — in homes, the workplace, and public spaces. Parents use rewards, or operant conditioning, to get their children to do their homework or follow through on chores. Here’s how operant conditioning works.
Every month I get paid at my job. Am I paid just to sit around and take up space? No, I’m paid for performing the duties of my job, for working. I do something, and something happens. I work, and I get paid. Would I work if I didn’t get paid? Probably not, for two reasons.
First, I have better things to do with my time than to work for free. (My credit card creditors wouldn’t be too happy with me either.) Second, according to operant conditioning theory, I work because I get paid. The “something” that follows my working behavior is a reward, a positive consequence.
When I do something like work at my job, something happens; I get paid. Then what happens? I keep going to work every month, so that paycheck I get must be having an effect on me. Way back in the early 1900s, Edward Thorndike created a theory, known as the law of effect, that addressed this idea of a consequence having an effect on behavior.
Thorndike decided to look into this phenomenon by doing research with cats. He constructed a puzzle box made from a wooden crate with spaced slats and a door that could be opened by a special mechanism. Thorndike placed a hungry cat inside the box and closed the door. He then placed some food on a dish outside of the box that the cat could see through the slats in the crate. Sounds kind of cruel, doesn’t it? The cat would reach for the food through the slats, but the food was out of reach. The only way for the cat to get the food was for Thorndike or the cat to open the door.
Obviously, Thorndike wasn’t going to open the door; he was conducting an experiment. The cat had to figure out how to open the door himself. You don’t see a lot of cats going around opening doors. So what did he do? It’s suspenseful, isn’t it? What will the little hungry cat do there in the puzzle box? Will he open the door and feed voraciously on the prized food that was just beyond his reach only moments before? Or will he meet his demise and starve at the hands of a fiendish psychologist?
The cat had to figure out how to open the door, and Thorndike was a patient man. He waited and watched, waited and watched. The cat wandered around the box, stuck his little paw out, meowed, bounced off the walls, and acted in any number of random ways inside of the box. But then, something remarkable happened. The cat accidentally hit the latch that was holding the door shut, and the door miraculously opened! Hurray! The cat got to eat, and everyone lived happily ever after.
What did Thorndike learn from his little experiment?
Nothing. He wasn’t done yet.
So he put that poor cat back inside the box to go at it again. No problem, right? The cat knew what to do; just hit the latch, little kitty! But when he got back into the box, the cat acted like he didn’t know that he had to hit the latch to open the door. He started acting in the same random ways all over again.
Never fear, eventually the cat triggered the latch by accident again and was again rewarded by gaining access to the food. Thorndike kept performing this experiment over and over again, and he made a remarkable observation. The amount of time that it took for the cat to figure out that the latch was the key to freedom (well, food!) got shorter and shorter with each subsequent trial. Why was the cat getting faster? Thorndike proposed that the food helped the cat learn the association between triggering the latch and the escape.
Thorndike’s law of effect states that a response that results in stronger satisfaction to an organism (for example, animal or human) will be more likely to be associated with the situation that preceded it. The greater the satisfaction, the greater the bond between the situation and response.
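The cat's shrinking escape times can be sketched with a toy model. It assumes the cat picks among ten equally likely actions at first, and that each rewarded escape adds a fixed amount of "bond strength" to the latch response. The weights, the number of competing actions, and the function name are all illustrative assumptions, not Thorndike's data or method.

```python
def expected_attempts(n_trials, n_other_actions=9, satisfaction=1.0):
    """Return the expected number of tries to hit the latch on each trial.

    The latch response starts with the same weight as every other action.
    After each escape, its weight grows by `satisfaction`, so the latch
    becomes a more likely choice on the next trial.
    """
    latch_weight = 1.0
    other_weight = float(n_other_actions)  # pawing, meowing, pacing, ...
    history = []
    for _ in range(n_trials):
        # Chance of choosing the latch is latch_weight / total weight, so
        # the expected number of tries is total weight / latch_weight.
        history.append((latch_weight + other_weight) / latch_weight)
        latch_weight += satisfaction  # the reward strengthens the bond
    return history


curve = expected_attempts(5)
print(curve)  # [10.0, 5.5, 4.0, 3.25, 2.8]

# A more satisfying reward strengthens the bond faster.
faster_curve = expected_attempts(5, satisfaction=2.0)
print(faster_curve[-1])
```

The expected number of tries drops quickly over the first few trials and then more slowly, which is the general shape of the learning curve described above. Raising `satisfaction` makes the drop steeper, in line with the idea that greater satisfaction produces a stronger bond.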
Basically, the consequence of getting the food served as a reward for learning how to open the box. The cat’s opening-the-box behavior is like my job, and his food is like my paycheck.
So getting back to my original question of whether my paycheck has an effect on me or not — the fact is, I keep working, just like Thorndike’s cat kept opening the box to get the food. Therefore, the consequence of my action does appear to lead me to perform that action again.
Reinforcing the Rat Case
When a consequence of an action or event increases the probability that the event or action will happen again, that consequence is called a reinforcer. It’s like a reward, and rewards often motivate a repeat of actions that earned the reward. Operant conditioning is all about the effects of reinforcers on behavior.
B. F. Skinner, one of the most famous psychologists of all time, followed in Thorndike’s footsteps in using animals to investigate operant conditioning. He constructed a box with a lever inside and called it a Skinner box. When an animal pressed the lever, a food pellet fell out of a feeder and into the box. Skinner wanted to see if rats placed in the box could learn to press the lever in order to receive the food.
This task was a lot harder than one may think. Rats aren’t used to pressing levers to get food. Skinner had to facilitate the process a little bit with a procedure known as shaping, a technique of rewarding successive approximations to the goal. Skinner rewarded the rats with food for performing a behavior that was close to, but not exactly, the required response. Shaping was done gradually so that the rats eventually got to the point where they pressed the bar and received their reinforcers of food.
After the rats got the hang of it, they learned to press the bar for food the same way Thorndike’s cats learned to open the door. The rats learned because the reward of the food taught them how to press the bar.
Finding the right reinforcer
In the cases of both Thorndike’s cats and Skinner’s rats, the subjects learned because they were rewarded with food. Food is a powerful reward for animals, but it’s just one type of reinforcer. Anything that increases the likelihood that a behavior will occur again can be used as a reward or reinforcer. It can be food, money, recess, or vacations. It can also be something intangible like approval, praise, or attention from another person.
There are two basic types of reinforcement:
Positive reinforcement occurs when presenting a rewarding stimulus after a behavior increases the likelihood that the behavior will occur again.
Negative reinforcement occurs when the removal of negative stimuli leads to an increased likelihood that a behavior will occur again. A good example of this is when a student gets disruptive in class during an assignment he is trying to avoid or escape. The teacher sends him out of the room and negatively reinforces the disruptive behavior. The teacher thinks she is punishing the student but he is actually getting out of an aversive demand.
The basic idea of operant conditioning is that behaviors that are reinforced (either positively or negatively) are more likely to occur again. But is this true for all reinforcers? Are all reinforcers created equal? If Skinner had given the rats five dollars each time they pressed the lever, would they still have learned the response?
Probably not. Differences between reinforcers exist and affect the impact that the reinforcers have on responses. Not all consequences are rewarding or reinforcing, because they vary from person to person (or animal to animal).
Two types of positive reinforcers are effective:
Primary reinforcers are rewards that don’t require shaping or prior training to be effective. Examples may be food or pleasurable physical sensations.
David Premack in 1971 came up with the interesting idea that primary reinforcers can be identified by looking at what people spend most of their time doing. If they spend a lot of time watching television, riding bikes, or sleeping, then these activities may be considered primary reinforcers. His Premack principle states that high-probability responses can be used to reinforce lower-probability responses. This is like using ice cream to get your children to eat their vegetables. If they want the ice cream (high-probability response), they’ll eat their vegetables (low-probability response).
Secondary reinforcers are things that become reinforcing through experience and learning. This result happens by associating the secondary reinforcer with a primary reinforcer by using classical conditioning techniques (see the section Conditioning responses and stimuli earlier in this chapter).
The best example of a secondary reinforcer is money. We aren’t born knowing the value of money (and some of us never get it). But, eventually we learn the value of money as we experience its association with the things we like such as food, clothing, shelter, and expensive cars. So, money “acquires” its value to us as it becomes associated with primary reinforcers. In some institutions, like schools and hospitals, caretakers reward appropriate behaviors with tokens, which may be cashed in for specific rewards later. This type of system is called a token economy, like local money.
After identifying what a subject considers to be reinforcing, it becomes possible to influence behavior by providing rewards for performing the appropriate responses.
For example, consider an office manager who is having a difficult time getting her employees to come back from lunch on time. What can she do? First, she needs to figure out what is reinforcing for the group or for each individual. Not all rewards are reinforcing to all people. Then, she has to start rewarding anyone who performs the desired behavior: coming back from lunch on time. She could give them little gifts, money, or smiley-face stickers.
Or, the office manager could use negative reinforcement. For instance, she could send a really whiny employee (one who complains profusely and gets everyone’s anxiety up at the thought of being late) out to lunch with the latecomers. The latecomers hate hearing the whiny employee complain so much that they start returning on time just to avoid hearing him go on and on.
This concept of negative reinforcement confuses a lot of people. How can taking something away or removing a noxious stimulus increase the probability of a behavior? You may have some experience with this tactic if you’ve ever had a new puppy in your home that wouldn’t stop whining while you tried to sleep. If you kept the puppy in another room or in the garage, you probably responded to the whining by getting up and checking on the cute little creature. What happened when you went to the puppy? It probably stopped crying. If you then went back to bed, I bet the crying woke you again less than ten minutes later.
The problem in this situation is that your behavior was under the control of negative reinforcement. The puppy’s whining was a noxious (and annoying) stimulus. When you went to the garage, the whining stopped, increasing the likelihood that you would keep going to the puppy every time he cried. You were negatively reinforced for going to the puppy, and that puppy got positively reinforced for whining! Oops.
Both positive and negative reinforcement are consequences that are likely to increase certain behaviors. But what about that other consequence, punishment? Punishment is any consequence that decreases the likelihood of a response, not necessarily something typically thought of as a punishment. For example, if every time you call a certain friend he seems distracted, like he’s not listening to what you’re saying, you may experience negative feelings of not being valued; this “punishment” is likely to lead to you calling that person less often.
One type of punishment, positive punishment, is straightforward: the introduction of something noxious or aversive.
Another type of punishment, negative punishment, involves removing a reinforcer, such as taking away a child’s bicycle. As with reinforcement, punishment can be a very individual matter; what one person experiences as aversive or punishing may not be punishing for the next person.
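If you like to see ideas laid out as a lookup table, the four kinds of consequence discussed so far can be sketched in a few lines of Python. This is only an illustration: the function name `classify_consequence` and its two string arguments are made up for the example, and the label "positive punishment" is the standard behaviorist term for punishment by adding something aversive.

```python
def classify_consequence(stimulus_change: str, behavior_change: str) -> str:
    """Name a consequence from two facts about it.

    stimulus_change: "added" or "removed" (was something given or taken away?)
    behavior_change: "increases" or "decreases" (what happens to the response?)
    Both argument names and values are hypothetical, chosen for this sketch.
    """
    table = {
        ("added", "increases"): "positive reinforcement",    # a sticker for being on time
        ("removed", "increases"): "negative reinforcement",  # the whining stops
        ("added", "decreases"): "positive punishment",       # something aversive is introduced
        ("removed", "decreases"): "negative punishment",     # the bicycle is taken away
    }
    key = (stimulus_change, behavior_change)
    if key not in table:
        raise ValueError(f"unknown combination: {key}")
    return table[key]


# The puppy example: the whining is removed, and going to the garage increases.
print(classify_consequence("removed", "increases"))
```

Notice that "positive" and "negative" only say whether something was added or removed; whether the behavior goes up or down is what separates reinforcement from punishment.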
Punishment is used to influence people’s behavior all the time. Parents punish children. Courts punish convicted criminals. Credit card companies punish people for late payments. But does punishment work?
Punishment can be a very potent and effective means for decreasing the frequency of a behavior, but keep a few things in mind:
Punishment should be the least intense form necessary to produce the desired response. Recipients may acclimate to each subsequent increase in punishment, however, and overly intense punishment is problematic as well. In order for punishment to be effective over a long period of time, you have to adjust its intensity in a meaningful way.
To be effective, punishment must occur as close in time as possible to the response being punished. If a parent waits three weeks to punish a child for breaking a lamp, the kid’s likely to be completely clueless about why she’s being punished; therefore, the punishment does nothing to deter future behavior.
Punishment should be firm, consistent, and accompanied by a clear explanation of why the punishment is being administered.
There are ethical issues associated with punishment, which means it has to be considered very carefully in all circumstances.
Of course, a lot of people are uncomfortable with the idea of inflicting pain or suffering on another person in order to alter behavior. The use of punishment can have some negative consequences:
Fear: When people are effectively punished, they may learn to anticipate future punishment and develop severe anxiety while waiting for the next shoe to drop. This can have a disruptive effect on the life of a punished person, which can lead to avoidance and apathy.
Aggression: I’ve worked in both jails and prisons, and I’ve seen men become angrier and more aggressive as a result of the harsh conditions that they face while incarcerated. When the time comes for these people to be released and face the world in a reformed manner, they are dysfunctional and institutionalized, often unable to make the transition to the outside world as a result of their punishment.
The person delivering the punishment may become an aversive conditioned stimulus (CS) through conditioning. For example, a child may avoid a parent who punishes the child frequently. Contiguity does its thing: that person is there every time I get scolded (“Just wait until your father gets home.” Thanks, Ma).
Scheduling and timing reinforcement
Have you ever wondered why people keep going back to places like Las Vegas and Atlantic City time and time again to donate their money to the casino expansion fund? The bottom line with gambling is that the big winner is always the house, the casino. Everyone knows this, but some people can’t stay away.
People keep going back because of something called a schedule of reinforcement, a schedule or determination for what responses to reinforce and when to reinforce them. There are four basic schedules of reinforcement, each with different effects on the response in question:
Fixed ratio (continuous and partial types)
Perhaps the most common form of reinforcement is called continuous reinforcement, in which the ratio is one-to-one. One behavior, one reward. It involves reinforcing a behavior every time it occurs. Every time I pull the slot machine handle, I win! Yeah right, I wish.
Continuous reinforcement is good for the shaping phase of learning (see “Reinforcing the Rat Case” earlier for a discussion of shaping) or for what is called the acquisition phase. Learning a new behavior takes time. Continuous reinforcement speeds up the learning process.
The problem with continuous reinforcement, however, is that the behavior extinguishes quickly once the reinforcement stops. If I’m reinforced every time I return to work on time from lunch, then I’m likely to stop returning on time as soon as my boss stops reinforcing me for this behavior.
Patting heads sporadically
Often, reinforcement in our world is intermittent and sporadic. Of course we don’t win every time we pull the lever on the slot machine. B. F. Skinner didn’t design slots. B. A. Loser, the casino behavioral psychologist, did.
Reinforcement on a less frequent basis (for example, requiring more than one response) is called partial reinforcement. There are two types of partial reinforcement schedules, and each is further divided by how predictably or randomly the reinforcers come.
The first type of partial reinforcement is called a ratio schedule that involves more than one response being required to gain a reward. With a ratio schedule, reinforcement is only given after a specific number of responses have been given. If a parent is using this schedule with his children, he may only give a reward for some number of A’s his child gets on her report card or after a certain number of times the child cleans her room. Ratio schedules can then vary based on whether a fixed number of responses or a variable number of responses are required to receive the reinforcement.
• A fixed ratio reinforcement schedule involves always reinforcing for the same number of given responses. If I’m going to reward my child for every two A’s she earns, that never changes; reinforcement follows every two A’s.
• A variable ratio reinforcement schedule involves giving reinforcement after a varying number of responses. I may reinforce my child for two A’s now, but then I may reinforce her for one A, three A’s, or ten A’s down the line. The key to this approach is to keep the recipient guessing. Doing so has a powerful effect on the persistence of a response: people keep doing the requisite behavior because they don’t know when the reinforcement will come. A variable ratio schedule is much more resistant to extinction than continuous reinforcement.
The other type of partial reinforcement schedule, an interval schedule, is based on the amount of time that has passed between reinforcements. You still have to respond to get a reward, but you have to wait a certain time before your response “works.”
• I get paid once a month. Time determines when I get paid. My pay schedule is an example of a fixed interval reinforcement schedule. The time frame never varies.
• The other type of interval schedule is variable interval. Here, responses are reinforced after a varying amount of time has passed since the last reinforcement. This approach would be like getting paid at the end of one month, and then getting paid two days later, and again three weeks later, and so on. Variable interval schedules are also very resistant to extinction for the same reason as variable ratio schedules; the responder never knows when he is going to get reinforced, so he has to keep responding to find out.
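For readers who think in code, here is a rough Python sketch of the schedules just described. It is a toy model under stated assumptions, not a description of how any real experiment or slot machine works: the function names are made up, time is counted in arbitrary units with one response per unit, and the "variable" schedules simply draw their next requirement from a uniform random range. Continuous reinforcement is treated as a fixed ratio of one.

```python
import random


def fixed_ratio(n):
    """Reinforce every n-th response (n=1 is continuous reinforcement)."""
    count = 0

    def respond(_time):
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False

    return respond


def variable_ratio(mean_n, rng):
    """Reinforce after a random number of responses averaging mean_n (assumed uniform)."""
    remaining = rng.randint(1, 2 * mean_n - 1)

    def respond(_time):
        nonlocal remaining
        remaining -= 1
        if remaining == 0:
            remaining = rng.randint(1, 2 * mean_n - 1)
            return True
        return False

    return respond


def fixed_interval(wait):
    """Reinforce the first response made at least `wait` time units after the last reward."""
    last = 0.0

    def respond(time):
        nonlocal last
        if time - last >= wait:
            last = time
            return True
        return False

    return respond


def variable_interval(mean_wait, rng):
    """Like fixed_interval, but the required wait is redrawn after every reward."""
    last = 0.0
    wait = rng.uniform(0, 2 * mean_wait)

    def respond(time):
        nonlocal last, wait
        if time - last >= wait:
            last = time
            wait = rng.uniform(0, 2 * mean_wait)
            return True
        return False

    return respond


def run(schedule, n_responses):
    """Make one response per time unit and record which ones were reinforced."""
    return [schedule(t) for t in range(1, n_responses + 1)]


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so the run is repeatable
    print("fixed ratio 2:    ", run(fixed_ratio(2), 10))
    print("variable ratio 3: ", run(variable_ratio(3, rng), 10))
    print("fixed interval 3: ", run(fixed_interval(3), 10))
    print("variable interval:", run(variable_interval(3, rng), 10))
```

In the ratio schedules only the count of responses matters, while in the interval schedules a response pays off only after enough time has gone by. In the two variable versions the responder cannot tell from the last reward when the next one is coming.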
Gambling is driven by a variable ratio schedule: the payoff comes after an unpredictable number of plays, so people keep pumping the money in, waiting for the big payoff.
I’m sure you’ve heard “You can’t win if you don’t play.” The next time you think you’re “due” or bound to win because you’ve been sitting at the same machine for three days without a shower, sleep, or anything to eat, remember that it’s variable. You never know when the machine is going to hit. So try to manage your rage if you finally give up and the next person who sits down wins it all!
That’s why they call it gambling.
The timing of the reinforcement is also critical. Research has shown that reinforcement must occur immediately, or as quickly as possible, following the desired response. If you wait too long, the connection between the response and the reinforcing consequence is lost. Skinner’s rats would never have figured out how to press that lever if they had been given a food voucher redeemable only after five visits to the Rat Food Deluxe shop instead of instant gratification for their achievement.
Becoming Aware of Stimulus Control and Operant Generalization
Have you ever noticed how people slow down on the highway when they see a traffic cop? That’s probably because they’ve all been ticketed by one at some time or another. What happens when a good old city (non-traffic) cop is on the road? Nobody slows down. They just ignore him. Is this an example of a blatant disrespect for the law? No. It’s an example of stimulus control, the idea that a response can vary as a function of the stimulus present at the time of reinforcement or punishment. Although both law enforcement authorities can give tickets for speeding, most of us know that city cops don’t typically give tickets on the highway. The stimuli have different effects on our behavior because they have led to different consequences. Punishment only comes from the traffic cop.
Sometimes, when we learn a response due to reinforcement, we may automatically generalize that response to other similar stimuli. If I generalized my traffic cop ticket experience to city cops, I would slow down for city cops, too. Or, if I’m reinforced for coming back from lunch on time, I may also generalize that behavior to coming to work in the morning on time. Generalization helps speed up the learning process because we don’t have time to receive reinforcement for every single response we emit.
Discovering Operant Discrimination
Sometimes people can over-learn a response or behavior. They then engage in the response when they shouldn’t because they’ve generalized a little too much.
I think this happens to psychotherapists sometimes. We may be in a social situation, not working, when someone starts talking about how hard his or her day was. “Tell me how that makes you feel,” may slip out. Everyone looks at the psychotherapist in question like a quack. Maybe it’s time for a vacation.
I’ve also seen this phenomenon in movies. An ex-cop overreacts to seeing his grandson point a water pistol at him, and he takes the kid down to “remove the threat.” These are problems of discrimination, responding to only one of two or more particular stimuli. The problem is remedied by presenting someone with both stimuli and only reinforcing the response to the correct one. Put grandpa in the middle of a hold-up and throw his grandson with a water pistol into the mix. Reinforce Detective Grandpa only when he successfully neutralizes the threat of the robber (stimulus 1) and not for taking his grandson out (stimulus 2). With enough practice, he learns to discriminate between a real threat and a benign one.