Operant Conditioning - 8 Learning - STEP 4 Review the Knowledge You Need to Score High

5 Steps to a 5: AP Psychology - McGraw Hill 2021

Operant Conditioning

In operant conditioning, an active subject voluntarily emits behaviors and can learn new behaviors. The connection is made between the behavior and its consequence, whether pleasant or not. Many more behaviors can be learned through operant conditioning than through classical conditioning because they do not rely on a limited number of reflexes. You can learn to sing, dance, or play an instrument as well as to study or clean your room through operant conditioning.

Thorndike’s Instrumental Conditioning

About the same time that Pavlov was classically conditioning dogs, E. L. Thorndike was conducting experiments with hungry cats. He put the cats in “puzzle boxes” and placed fish outside. To get to the fish, the cats had to step on a pedal, which released the door bolt on the box. Through trial and error, the cats moved about the box and clawed at the door. Accidentally at first, they stepped on the pedal and were able to get the reward of the fish. A learning curve shows that the time it took the cats to escape gradually fell. The random movements disappeared until the cat learned that just stepping on the pedal caused the door to open. Thorndike called this instrumental learning, a form of associative learning in which a behavior becomes more or less probable depending on its consequences. He studied how the cats’ actions were instrumental or important in producing the consequences. His Law of Effect states that behaviors followed by satisfying or positive consequences are strengthened (more likely to occur), while behaviors followed by annoying or negative consequences are weakened (less likely to occur).
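The Law of Effect can be pictured as a simple update rule. The sketch below is only an illustration, not Thorndike's own formulation: the function name `apply_law_of_effect`, the 0-to-1 "response strength" scale, and the step size of 0.1 are all assumptions made for the example.

```python
def apply_law_of_effect(strength, consequence, step=0.1):
    """Return the updated response strength, kept between 0 and 1.

    "satisfying" consequences strengthen the behavior and "annoying"
    consequences weaken it. The scale and step size are hypothetical.
    """
    if consequence == "satisfying":
        strength += step
    elif consequence == "annoying":
        strength -= step
    else:
        raise ValueError("consequence must be 'satisfying' or 'annoying'")
    return min(1.0, max(0.0, strength))


# A cat's tendency to step on the pedal versus claw at the door.
pedal, clawing = 0.2, 0.8
for _ in range(5):
    pedal = apply_law_of_effect(pedal, "satisfying")    # door opens, fish
    clawing = apply_law_of_effect(clawing, "annoying")  # still in the box

print(round(pedal, 2), round(clawing, 2))
```

After a few trials the pedal response becomes the stronger of the two, which mirrors the way the cats' random movements dropped out.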

B. F. Skinner’s Training Procedures

B. F. Skinner called Thorndike’s instrumental conditioning operant conditioning because subjects voluntarily operate on their environment in order to produce desired consequences. Skinner was interested in the ABCs of behavior: antecedents (the stimuli present before a behavior occurs), behaviors (what the organism voluntarily emits), and consequences (what follows the behavior).

Skinner developed four different training procedures: positive reinforcement, negative reinforcement, punishment, and omission training. In positive reinforcement, or reward training, emission of a behavior or response is followed by a reinforcer that increases the probability that the response will occur again. When a rat presses a lever and is rewarded with food, it tends to press the lever again. Being praised after you contribute to a class discussion is likely to cause you to participate again. According to the Premack principle, a more probable behavior can be used as a reinforcer for a less probable one.

“I use the Premack principle whenever I study. After an hour of studying for a test, I watch TV or call a friend. Then I go back to studying. Knowing I’ll get a reward keeps me working.”

—Chris, AP student

Negative reinforcement takes away an aversive or unpleasant stimulus after a behavior has been exhibited. This increases the chance that the behavior will be repeated in the future. When a rat presses a lever that temporarily turns off electrical shocks, it tends to press the lever again. If you have a bad headache and then take an aspirin that makes it disappear, you are likely to take aspirin the next time you have a headache. Both positive and negative reinforcement bring about desired responses, and so both increase or strengthen those behaviors.

In punishment training, a learner’s response is followed by an aversive consequence. Because this consequence is unwanted, the learner stops exhibiting that behavior. A child who gets spanked for running into the street stays on the grass or sidewalk. Punishment should be immediate so that the consequence is associated with the misbehavior, strong enough to stop the undesirable behavior, and consistent. Psychologists caution against the overuse of punishment because it does not teach the learner what he or she should do, suppresses rather than extinguishes behavior, and may evoke hostility or passivity. The learner may become aggressive or give up. An alternative to punishment is omission training. In this training procedure, a response by the learner is followed by taking away something of value from the learner. Both punishment and omission training decrease the likelihood of the undesirable behavior, but in omission training the learner can change this behavior and get back the positive reinforcer. One form of omission training used in schools is called time-out, in which a disruptive child is removed from the classroom until the child changes his or her behavior. The key to successful omission training is knowing exactly what is rewarding and what isn’t for each individual.

Operant Aversive Conditioning

Negative reinforcement is often confused with punishment. Both are forms of aversive conditioning, but negative reinforcement takes away aversive stimuli: you get rid of something you don’t want. Putting on your seat belt ends an obnoxious buzz or beep, so you quickly learn to put your seat belt on when you hear that sound. There are two types of negative reinforcement: avoidance and escape. Avoidance behavior takes away the aversive stimulus before it begins. A dog jumps over a hurdle to avoid an electric shock, for example. Escape behavior takes away the aversive stimulus after it has already started. The dog gets shocked first, and then it escapes the shock by jumping over the hurdle. Learned helplessness is the feeling of futility and passive resignation that results from the inability to avoid repeated aversive events. Later, if it becomes possible to avoid or escape the aversive stimuli, it is unlikely that the learner will respond. In contrast to negative reinforcement, punishment comes as the result of your exhibiting a behavior that is followed by aversive consequences. You get something you don’t want. By partying instead of studying before a test, you get a bad grade. That grade could result in your failing a course. You learn to stop doing behaviors that bring about punishment but learn to continue behaviors that are negatively reinforced.
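One way to keep the four training procedures straight is to ask two questions: is the stimulus pleasant or aversive, and is it added or removed after the behavior? The lookup below is a study aid sketched under that assumption; the function name `classify_procedure` and the labels "pleasant", "aversive", "added", and "removed" are made up for the example.

```python
def classify_procedure(stimulus, change):
    """Name the training procedure and its effect on the behavior.

    stimulus: "pleasant" or "aversive"
    change:   "added" or "removed" after the behavior occurs
    """
    table = {
        ("pleasant", "added"): ("positive reinforcement", "increases"),
        ("aversive", "removed"): ("negative reinforcement", "increases"),
        ("aversive", "added"): ("punishment", "decreases"),
        ("pleasant", "removed"): ("omission training", "decreases"),
    }
    return table[(stimulus, change)]


# The seat belt buzzer stops when you buckle up.
print(classify_procedure("aversive", "removed"))
# A bad grade follows partying instead of studying.
print(classify_procedure("aversive", "added"))
```

Both procedures in the usage lines involve an aversive stimulus, but only the first one strengthens the behavior, which is the distinction this section draws.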


A primary reinforcer is something that is biologically important and, thus, rewarding. Food and drink are examples of primary reinforcers. A secondary reinforcer is something neutral that, when associated with a primary reinforcer, becomes rewarding. Gold stars, points, money, and tokens are all examples of secondary reinforcers. A generalized reinforcer is a secondary reinforcer that can be associated with a number of different primary reinforcers. Money is probably the best example because you can get tired of one primary reinforcer like candy, but money can be exchanged for any type of food, another necessity, entertainment, or a luxury item you would like to buy. The operant training system, called a token economy, has been used extensively in institutions such as mental hospitals and jails. Tokens or secondary reinforcers are used to increase a list of acceptable behaviors. After so many tokens have been collected, they can be exchanged for special privileges like snacks, movies, or weekend passes.

Applied behavior analysis, also called behavior modification, is a field that applies the behavioral approach scientifically to solve individual, institutional, and societal problems. Data are gathered both before and after the program is established. For example, training programs have been designed to change employee behavior by reinforcing desired worker behavior, which increases worker motivation.

Teaching a New Behavior

What is the best way to teach and maintain desirable behaviors through operant conditioning? Shaping, positively reinforcing closer and closer approximations of the desired behavior, is an effective way of teaching a new behavior. Each reward comes when the learner gets a bit closer to the final goal behavior. When a little boy is being toilet trained, the child may get rewarded after just saying that he needs to go. The next time he can get rewarded after sitting on the toilet. Eventually, he gets rewarded only after urinating or defecating in the toilet. For a while, reinforcing this behavior every time firmly establishes the behavior. Chaining is used to establish a specific sequence of behaviors by initially positively reinforcing each behavior in a desired sequence and then later rewarding only the completed sequence. Animal trainers at SeaWorld often have dolphins do an amazing series of different behaviors, like swimming the length of a pool, jumping through a hoop, and then honking a horn before they are rewarded with fish. Generally, reinforcement or punishment that occurs immediately after a behavior has a stronger effect than when it is delayed.
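Shaping can be thought of as a criterion that moves closer to the goal each time it is met. The sketch below uses the toilet training example; the list `STEPS`, the function `shape`, and the rule that any behavior at or beyond the current criterion earns a reward are simplifying assumptions for illustration only.

```python
# Successive approximations, ordered from first step to goal behavior.
STEPS = ["says he needs to go", "sits on the toilet", "uses the toilet"]


def shape(observed, steps=STEPS):
    """Return the behaviors that earn a reward.

    A behavior is rewarded only if it meets the current criterion;
    each reward raises the criterion toward the final goal behavior.
    """
    criterion = 0
    rewarded = []
    for behavior in observed:
        level = steps.index(behavior)
        if level >= criterion:
            rewarded.append(behavior)
            criterion = min(level + 1, len(steps) - 1)
    return rewarded


observed = [
    "says he needs to go",
    "says he needs to go",   # no longer enough for a reward
    "sits on the toilet",
    "says he needs to go",   # still not enough
    "uses the toilet",
    "uses the toilet",       # goal behavior keeps being reinforced
]
print(shape(observed))
```

Earlier approximations stop paying off once the criterion has moved past them, so only closer and closer approximations are reinforced.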

Schedules of Reinforcement

A schedule refers to the training program that states how and when reinforcers will be given to the learner. Continuous reinforcement is the schedule that provides reinforcement every time the behavior is exhibited by the organism. Although continuous reinforcement encourages acquisition of a new behavior, not reinforcing the behavior even once or twice could result in extinction of the behavior. For example, if a disposable flashlight always works, when you click it on once or twice and it doesn’t work, you assume that it has quit working and throw it away.

Reinforcing behavior only some of the time, which is using partial reinforcement or an intermittent schedule, maintains behavior better than continuous reinforcement. Partial reinforcement schedules based on the number of desired responses are ratio schedules. Schedules based on time are interval schedules. Fixed ratio schedules reinforce the desired behavior after a specific number of responses have been made. For example, every third time a rat presses a lever in a Skinner box, it gets a food pellet. Fixed interval schedules reinforce the first desired response made after a specific length of time. Fixed interval schedules result in lots of behavior as the time for reinforcement approaches but little behavior until the next time for reinforcement approaches. For example, the night before an elementary school student gets a weekly spelling test, she will study her spelling words but not the night after (see Figure 8.2).
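The two fixed schedules can be written as simple rules. The following is a minimal sketch, not a model of real animal behavior: the function names `fixed_ratio` and `fixed_interval` are hypothetical, and it assumes the interval clock restarts from the moment a response is reinforced.

```python
def fixed_ratio(num_responses, ratio):
    """Return the 1-based response numbers that earn a reinforcer."""
    return [n for n in range(1, num_responses + 1) if n % ratio == 0]


def fixed_interval(response_times, interval):
    """Return the response times that earn a reinforcer.

    Only the first response made once the interval has elapsed is
    reinforced; the clock then restarts from that response (assumed).
    """
    reinforced = []
    last = 0
    for t in sorted(response_times):
        if t - last >= interval:
            reinforced.append(t)
            last = t
    return reinforced


# A rat on a fixed ratio of 3 makes 10 lever presses.
print(fixed_ratio(10, 3))
# Responses at these times (in days) on a fixed interval of 7 days.
print(fixed_interval([1, 2, 7, 8, 9, 15, 20, 21], 7))
```

In the ratio case only the count of responses matters; in the interval case responses made before the time is up earn nothing, no matter how many there are.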


Figure 8.2 Partial reinforcement schedules.

In a variable ratio schedule, the number of responses needed before reinforcement occurs changes at random around an average. For example, if another of your flashlights works only after clicking it a number of times and doesn’t light on the first click, you try clicking it again and again. Because your expectation is different for this flashlight, you are more likely to keep exhibiting the behavior of clicking it. At slot machines in casinos, gamblers will pull the lever hundreds of times as the anticipation of the next reward gets stronger. On a variable interval schedule, the amount of time that must elapse before the behavior is reinforced changes unpredictably. For example, if your French teacher gives pop quizzes, you never know when to expect them, so you study every night. Another example of variable interval behavior is when you are “liked” on social media. You never know when it will occur; an hour could elapse, a day, or only a minute.
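The variable schedules can be sketched the same way, with the requirement redrawn at random after each reinforcer. This is an illustrative simulation only: the function names, the uniform distributions, and the seed are assumptions, chosen so that the requirement averages `mean_ratio` responses or `mean_interval` time units.

```python
import random


def variable_ratio(num_responses, mean_ratio, rng):
    """Return the response numbers that are reinforced.

    The number of responses required is redrawn after every reinforcer,
    uniformly from 1 to 2 * mean_ratio - 1, so it averages mean_ratio.
    """
    reinforced = []
    needed = rng.randint(1, 2 * mean_ratio - 1)
    count = 0
    for n in range(1, num_responses + 1):
        count += 1
        if count == needed:
            reinforced.append(n)
            count = 0
            needed = rng.randint(1, 2 * mean_ratio - 1)
    return reinforced


def variable_interval(response_times, mean_interval, rng):
    """Return the response times that are reinforced.

    The required wait is redrawn after every reinforcer, uniformly
    between 0 and 2 * mean_interval, so it averages mean_interval.
    """
    reinforced = []
    last = 0
    wait = rng.uniform(0, 2 * mean_interval)
    for t in sorted(response_times):
        if t - last >= wait:
            reinforced.append(t)
            last = t
            wait = rng.uniform(0, 2 * mean_interval)
    return reinforced


rng = random.Random(0)  # fixed seed so a run can be repeated
print(variable_ratio(30, 5, rng))
print(variable_interval(list(range(1, 31)), 5, rng))
```

Because the learner cannot tell which response will pay off, there is no safe moment to stop responding, which fits the persistent lever pulling and nightly studying described above.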


fixed ratio schedule: the learner knows how much behavior is needed for reinforcement

fixed interval schedule: the learner knows when behavior will be reinforced

variable ratio schedule: how much behavior is needed for reinforcement changes

variable interval schedule: when behavior will be reinforced changes