Learning and conditioning
Lynlee Howard-Payne & Jarrod Payne
After studying this chapter you should be able to:
•use Pavlov’s experiment to outline classical conditioning (including concepts such as acquisition, extinction and spontaneous recovery)
•identify the unconditioned stimulus, unconditioned response, neutral stimulus, conditioned stimulus and conditioned responses in real world applications
•use Skinner’s theory to outline operant conditioning, explaining how reinforcement and punishment can shape behaviour
•contrast elements of classical and operant conditioning
•discuss the nature and importance of social learning theory in learning behaviour through observing a model
•critique traditional learning theories by considering the role of cognition in learning.
Melinda had two pet guinea pigs, which she fed every morning before she left for university, giving them fresh pre-packed lettuce leaves, clean water and dry pellets. Her guinea pigs were chatty and sociable pets, purring when she tickled them and making high-pitched shrieks when she put the lettuce (their favourite food) into their cage. After some time, Melinda realised that the guinea pigs started to shriek with excitement at the mere sound of her opening the plastic packet in which the lettuce was packed. One day, she decided to test what else her guinea pigs responded to. She started by crumpling up plastic rubbish bags and, to her surprise, the guinea pigs shrieked with delight, even though there was no lettuce in those packets. Melinda was intrigued by this and started to wonder whether she could train her guinea pigs to behave in other interesting ways. She comes to you, as a first-year psychology student, and asks for your help. What would you suggest that she do?
Figure 9.1 Pets can be classically conditioned by using food as the unconditioned stimulus
How do we learn? Do we learn in the same way that animals learn? What are our limitations in learning? The case study above highlights one way that animals (including humans) can learn, i.e. through conditioned responses to stimuli. This is learning through association. But humans have higher-order mental processes like thinking and anticipating. Can these cognitive processes also be used in human learning?
These questions about learning have interested psychologists and the general public alike. To answer these questions, this chapter will describe the classic experiments that generated a psychological interest in how animals and humans learn their behaviours. Almost all human behaviour is learned. Think for a moment what it would be like if you suddenly lost the behaviours you had learned through your life.
Learning theory is rooted in the work of Ivan Pavlov and B. F. Skinner, both renowned scientists, who discovered and documented the principles governing how animals (humans included) learn. Pavlov’s classical conditioning theory is based on physical reflexes which become associated with repeated neutral events (as happened with Melinda’s guinea pigs). In Skinner’s operant conditioning theory, learning happens as a result of the responses that people obtain to their actions (e.g. rewards or punishment). Each of these approaches to understanding learned behaviour will be addressed in this chapter. In addition, the social learning approach of Albert Bandura will be described. This approach considers the role of social modelling in learning and may be seen as a cognitive approach.
What is learning?
Learning can be considered to be a more or less permanent change in behaviour or knowledge that is due to experience (or conditioning). Four factors are involved in the definition of learning (Field, 2000):
•Learning is inferred from a change in behaviour.
•It involves the initiation of an inferred change in memory.
•It is the result of experience.
•It is relatively permanent.
It is the combination of these factors that constitutes our definition of learning; this essentially means that, primarily from a behavioural perspective, we learn how to respond to the world around us through either direct or indirect experiences that we accumulate over a lifetime.
In associative learning, the person learns to associate experiences with each other. For example, if you try some new and unusual food and are violently ill later that night, you are likely to associate the food with your nausea. Likewise, you may practise your forehand on the tennis court and associate this with winning more points. In this situation, reinforcement helps make the association. In order to understand the dynamics of associative learning, you need to note what comes before the behaviour (the antecedents) and what follows it (the consequences).
Classical conditioning deals with what happens before the behaviour (i.e. the response) and usually includes physiological reflexes like sweating, blinking or salivating. For example, you walk past a fast-food outlet and the smell immediately makes you salivate. At the same time, you see the sign on the shop. You learn to associate the sign with the delicious smell and, over time, you may start to salivate just when you see the sign, even if you cannot smell the food at that time. This means learning has occurred. In operant conditioning, learning is based on the consequences of the behaviour. If Kgomotso is praised for tidying up her toys, she is likely to do it again. Her tidy behaviour has been reinforced.
In classical conditioning, a stimulus that does not produce a response is paired with a stimulus that does elicit a response. After many such pairings, the stimulus that previously had no effect begins to produce a response. In the example shown, a hooter precedes a puff of air to the eye. Eventually, the hooter alone will produce an eye blink. In operant conditioning, a response that is followed by a reinforcing consequence becomes more likely to occur on future occasions. In the example shown, a dog learns to sit up when it hears a whistle.
Figure 9.2 Classical and operant conditioning (Coon & Mitterer, 2013: p. 194)
Ivan Pavlov (1849–1936) is best known for developing classical conditioning as a way to understand and predict human and animal behaviour. He did extensive research in physiology (including the role of reflexes in guiding behaviour) and digestion, and won the 1904 Nobel Prize in Physiology or Medicine. While studying the digestive processes of dogs, Pavlov noted that, over time, they would begin to salivate even before their food was presented to them. Pavlov knew that salivation is a reflexive response to food that aids chewing and digestion, so the dogs were not expected to salivate before they were given food. Pavlov realised that some kind of learning had occurred in the dogs and he went on to do a series of classic experiments to demonstrate what he called ‘conditioning’ in the dogs.
Classical conditioning can therefore be defined as a type of learning where a stimulus (S) acquires the capacity to evoke a reflexive response (R). In Pavlov’s experiment, the food is a stimulus, which evokes the natural reflexive response of the dog salivating. In this situation, because salivating when food is present is a natural reaction (or reflex), we say that both the stimulus and response are unconditioned. Some typical unconditioned S → R bonds are: hearing a sad story and crying, being told a funny joke and laughing, and eating contaminated food and vomiting.
A person can condition a behaviour when he or she repeatedly presents another (unrelated) stimulus at the same time as the unconditioned stimulus (UCS). This stimulus (that is not initially related to the unconditioned stimulus) is referred to as the neutral stimulus. In Pavlov’s experiment, he paired the presentation of the food with the ringing of a bell. The dogs’ reaction was to continue to salivate (the unconditioned response — UCR) because the food was presented; only now a bell was rung as the food was presented. The bell was the neutral stimulus, because hearing a bell ring would not normally make a dog salivate (Pavlov, 1927).
With repeated associations of the food and the ringing bell, the dogs eventually salivated when the bell was rung even though food was not presented. Thus, at this stage, the neutral stimulus (the ringing bell) took on the characteristics of the unconditioned stimulus, in that it elicited a conditioned response (CR). The CR was the salivating at just the sound of the bell (with no food). The bell now became the conditioned stimulus (CS). This process is called acquisition, as the dog has learned or acquired a new S → R pairing (Pavlov, 1927). To help you be able to identify the unconditioned or conditioned stimulus or response in any given scenario, be sure to study the definitions included in the glossary at the end of this chapter.
Factors affecting conditioning
Pavlov believed that contiguity, the closeness in time between the conditioned stimulus and the unconditioned stimulus, is very important for learning. The closer in time two events occur, the more likely they are to become associated. Conversely, as more time passes between the events, the less likely it is that an association will be formed (Pavlov, 1927). For example, with Pavlov’s dogs, if the bell had been rung half an hour before or after the food was presented, it would have been unlikely that the bell would have been associated with the food and become the conditioned stimulus.
Extinction and spontaneous recovery
Pavlov discovered that there is a gradual weakening and eventual disappearance of the conditioned response tendency in the absence of the unconditioned stimulus. In essence, if the person or animal continues to have the conditioned stimulus presented but without the unconditioned stimulus, the conditioned stimulus loses its power to evoke the conditioned response. This is referred to as extinction. For example, after a certain period of time, Pavlov’s dogs stopped salivating when only the bell (the conditioned stimulus) was rung without any food (unconditioned stimulus) being brought to them. At times, however, the conditioned response can reappear some time after the period of extinction. This spontaneous recovery, as it is called, can occur after a period of non-exposure to the conditioned stimulus. However, it is generally not as strong or carried out to the same degree as with the initial conditioning process (Pavlov, 1927).
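The gradual strengthening of a conditioned response during acquisition, and its gradual weakening during extinction, can be sketched as a simple simulation. This is only an illustration: the update rule (moving a fixed fraction of the way towards a target on each trial), the `rate` value and the function name `run_trials` are assumptions made for this sketch and do not come from Pavlov's work. Spontaneous recovery is not modelled.

```python
# Illustrative sketch only: the update rule and the rate are assumptions,
# not a model proposed in this chapter.
def run_trials(strength, n_trials, ucs_present, rate=0.3):
    """Return the strength of the CS -> CR link after each trial.

    strength:     starting strength of the link (0.0 = none, 1.0 = maximal)
    ucs_present:  True for bell + food pairings, False for bell alone
    """
    target = 1.0 if ucs_present else 0.0
    history = []
    for _ in range(n_trials):
        # Move a fixed fraction of the remaining distance towards the target.
        strength += rate * (target - strength)
        history.append(round(strength, 3))
    return history


# Acquisition: the bell (CS) is repeatedly paired with food (UCS).
acquisition = run_trials(0.0, 10, ucs_present=True)

# Extinction: the bell is now rung repeatedly without any food.
extinction = run_trials(acquisition[-1], 10, ucs_present=False)

print("acquisition:", acquisition)
print("extinction: ", extinction)
```

In this sketch the strength rises quickly at first and then levels off during acquisition, and falls back towards zero during extinction, which mirrors the gradual changes described above.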
Figure 9.3 A dog was classically conditioned to salivate at the sound of a ringing bell even when no food was presented
Behaviourist John Watson used repeated exposure to condition an 11-month-old infant known as Little Albert to become fearful of a white rat. He did this by showing Little Albert the rat and then striking a steel bar with a hammer behind the infant’s head. The natural reaction to a sudden, loud noise is fright, and this made Albert cry with fear. While Albert had happily played with the rat prior to Watson’s conditioning, he would now scream with fear at the mere sight of it. This was not a surprising result, as the experiment confirmed the findings of Pavlov’s classical conditioning experiments. What was surprising to Watson was that Albert would also become terrified and cry in the presence of a dog or a rabbit, and even when Watson approached the child wearing a Santa Claus mask with a white beard. It was at this stage that Watson realised that the fear (conditioned response) that was evoked by the white, furry rat (conditioned stimulus) had now been generalised to other stimuli that were white and furry. Discuss the ethics of this experiment with your fellow students.
In his experiments, Pavlov found that, sometimes, there is a tendency for a new stimulus (similar to the original conditioned stimulus) to elicit a response that is similar to the conditioned response (Pavlov, 1927). This is known as generalisation. Box 9.1 illustrates this phenomenon through a famous psychological experiment performed by John Watson. This aspect of classical conditioning helps make human behaviour highly adaptable.
In contrast to generalisation, stimulus discrimination occurs when one is conditioned to respond to one stimulus and not another. This refers to being conditioned to have a specific response to specific stimuli only. For example, children may initially respond by obeying the commands or requests of all adults (or people socially considered to be authority figures), but over time, they can learn to respond only to their parents.
Classical conditioning seems a relatively straightforward process; however, in the real world, it is never as simple as having one unconditioned stimulus paired with one neutral stimulus to create one conditioned stimulus that yields one conditioned response. Higher-order conditioning occurs when a well-learned conditioned stimulus is used to generate a response to another neutral stimulus. In the Pavlovian experiment, the first order of conditioning occurred when food (the unconditioned stimulus) produced the unconditioned response of salivation. Then the food (unconditioned stimulus) was paired with a ringing bell (neutral stimulus) to yield (with repeated pairings) salivation to the bell (in the absence of food) as the conditioned response (Pavlov, 1927). Second-order conditioning would occur if the scientist were to pair a flashing light (a neutral stimulus) with the ringing bell (the conditioned stimulus), to generate the conditioned response of salivation. Eventually, in theory, the dog would respond to the flashing light as it did to the ringing bell (even in its absence) by salivating as though the dog was anticipating the delivery of food.
Classical conditioning in everyday life
One of the fascinating things about classical conditioning is that it can be observed all around us in common, everyday situations. Here are some examples of classical conditioning that you may see.
When one has an irrational fear of an object, event or situation that warrants no logical fear, one may be said to have a phobia of this stimulus. This is known as a conditioned emotional response. A basic classical conditioning model that could generate a phobia is similar to the example of Watson and Little Albert. Take some time to consider a particular phobia that you, a friend or a family member may have and try to isolate and label the bond between the unconditioned stimulus and the unconditioned response. Note that the unconditioned response with regard to phobias is always a fear response to the unconditioned stimulus, which is something that reflexively generates fear. Neutral stimuli often associated with phobias include heights, insects, animals, etc., which do not usually generate fear reflexively. The association of the neutral stimulus with the unconditioned stimulus, whether once (if particularly traumatic, e.g. seeing someone fall off a cliff) or repeatedly, generates a conditioned phobic response of fear to the conditioned stimulus (the previously neutral stimulus that becomes the object of the phobia, in this case, heights).
•Almost all human behaviour is learned. Humans learn through association and/or cognitive processes.
•Learning is the more or less permanent change in behaviour or knowledge that is due to experience (or conditioning).
•In associative learning, it is useful to note the antecedents and consequences of the behaviour. Classical conditioning is based on the antecedents while operant conditioning is based on the consequences of the behaviour.
•Pavlov developed the theory of classical conditioning to understand and predict human and animal behaviour. Classical conditioning is a type of learning where a stimulus acquires the capacity to evoke a reflexive response. A neutral stimulus is paired with an unconditioned stimulus (one that naturally evokes a reflex) and with repeated associations becomes a conditioned stimulus which evokes a conditioned response.
•More effective learning occurs when the conditioned stimulus and unconditioned stimulus occur close together in time.
•Extinction of the conditioned response eventually occurs in the absence of the unconditioned stimulus. However, the conditioned response can reappear after a period of extinction.
•Generalisation occurs when a new stimulus (similar to the original conditioned stimulus) elicits a response that is similar to the conditioned response.
•Stimulus discrimination is when one is conditioned to respond to one stimulus and not another.
•Higher-order conditioning occurs when a conditioned stimulus is used to generate a response to another neutral stimulus.
Research suggests that some phobias can be more easily conditioned than others, based on the notion of preparedness. This idea suggests that, as an evolutionary benefit for survival, we are biologically programmed to learn to fear objects or events that have inherent danger for us (Mineka & Öhman, 2002). This may explain why fears of closed spaces, snakes or water are fairly common phobias, as these can indeed be life threatening under certain circumstances (like drowning in a fast-flowing river). The fear is pathological, however, when one is faced with these stimuli in non-threatening situations (like taking a shower or a bath in shallow water) and one’s phobic reaction may lead to life-impairing strategies in order to avoid having to deal with the stimulus. A psychotherapist may address this issue by encouraging the patient to undergo exposure therapy. This is a form of cognitive-behavioural therapy in which a patient is prepared, through relaxation and cognitive techniques, for exposure to the stimuli that elicit phobic responses, in order to weaken their strength.
Many of us have found ourselves in situations when, after having a lovely meal, we discover that something we ate has given us food poisoning. This may result in severe vomiting, nausea, diarrhoea and a fever as our bodies attempt to process the tainted food. We might hear a friend saying, ‘Oh no, I will NEVER have sushi again. The last time I had it, I was sick for days. Now, I feel ill even at the thought of raw fish.’ Foods that make us sick may lead to avoidance of that food, perhaps for a lifetime. This is because we can develop a conditioned response of feeling physically ill in response to the conditioned stimulus of the food that made us sick (see Box 9.2).
We are bombarded with messages from the media about what to wear, drive and eat. You will recall the section on Watson’s use of conditioning to elicit an emotional response to a stimulus, the Little Albert experiment. Advertising depends very much on this principle as it relies on linking a naturally attractive unconditioned stimulus (like an attractive man or woman) to a neutral stimulus, like the product being sold, to make the product become the conditioned stimulus, which will elicit the conditioned response of a pleasant emotional state. This results in the consumer feeling more positively toward the product (just like they do with the unconditioned stimulus of the attractive person).
9.2 THE GARCIA EFFECT AND FOOD AVERSIONS
Garcia’s experiments on induced food aversion in rats showed that an aversion develops to the food itself and not to other stimuli that happen to be associated with it (Garcia, Hankins & Rusiniak, 1974). Thus, Garcia found that, if eating a hot dog served to you on a green plate makes you sick, you will not develop a conditioned response to the sight of a green plate, but rather a conditioned response limited to the taste of hot dogs. This contradicts the Pavlovian view that any stimulus can be conditioned to produce aversive behaviour, although the research does support the Pavlovian concept of stimulus discrimination.
Vicarious conditioning is learning that occurs indirectly. For example, if a mother who learned to fear the dentist as a child now takes her own children to the dentist, the children will be aware of their mother’s anxiety and, through this association, develop their own conditioned emotional response to visiting the dentist. Vicarious conditioning can also lead to trauma counsellors experiencing traumatic reactions themselves.
9.3 THE SUPERSTITIOUS PIGEON
While there are numerous classic experiments conducted by Skinner to display the power of these techniques, one in particular is valuable for explaining superstition (Skinner, 1962). In this experiment, pigeons were placed in a cage with a food chute. Food was released into the chute (at random intervals) without any action needing to be taken by the pigeons. The pigeons thus received a positive reinforcement, but not one which was tied to any particular action. What Skinner found was that the birds began to repeat various actions that they had happened to be performing when the food arrived. It seemed they began to see a causal link between their behaviour and the presentation of food. The actions performed by the birds included turning in circles, lifting invisible weights and moving their heads in a pendulum pattern. Skinner used this experiment as evidence to show that superstition in humans is the result of random positive reinforcement of certain behaviours which come to be seen as causal in nature but which in reality are nothing of the sort. Think about whether it was appropriate for Skinner to generalise the behaviours of the pigeons to humans.
•There are many examples of classical conditioning in everyday situations.
•Phobias involve a conditioned emotional response to something that would not ordinarily cause fear in a person. In phobias, the unconditioned response is always a fear response to the unconditioned stimulus. Humans may be biologically programmed to learn to fear objects or events that have inherent danger for them.
•Food aversions may result if we develop a conditioned response of feeling physically ill in response to the conditioned stimulus of some food that made us sick.
•Advertising also uses classical conditioning by linking a naturally attractive unconditioned stimulus (like an attractive person) to a neutral stimulus (the product being sold); this makes the product become the conditioned stimulus, which will elicit the conditioned response of a pleasant emotional state.
•Vicarious conditioning is learning that occurs through a person experiencing the associative learning of others.
B.F. Skinner (1904–1990) is most renowned for his contribution to understanding and shaping behaviour through operant conditioning and schedules of reinforcement (and punishment). He is one of the founders of behaviourism and contributed tremendously to experimental psychology. He also invented the Skinner box, used in his experiments where a rat or pigeon learned to press a lever in order to obtain a reward (like food). In 1972, he won the prestigious Humanist of the Year award, and in the year that he died, he obtained a citation for Outstanding Lifetime Contribution to Psychology. Skinner’s theory of operant conditioning grew out of the work of Edward Thorndike (see Box 9.4) who developed the law of effect. This essentially said that actions that are reinforced are repeated.
Operant conditioning is a form of associative learning which explains much of our day-to-day behaviour. It is very widely used in a variety of contexts (e.g. parenting, schools, mental hospitals and prisons). Operant conditioning can be considered to be a type of learning in which voluntary (controllable, non-reflexive) behaviour is strengthened through reinforcement but weakened when that behaviour is punished. A reinforcer is anything which increases the likelihood of the behaviour being repeated. Operant conditioning differs from classical conditioning in that the behaviours studied in the latter are reflexive (e.g. salivating), whereas the behaviours studied and governed by the principles of operant conditioning are non-reflexive (e.g. disciplining children, gambling or dog training).
Operant conditioning, therefore, endeavours to understand non-reflexive, more complex behaviours, and to establish the conditions under which these behaviours are likely to be repeated. It is called operant conditioning because the person actively ‘operates’ on the environment. Learning is achieved when the person continues to behave in ways that are reinforced (e.g. by a reward like money or food) or avoids behaviours that bring about punishment (e.g. being scolded or paying a fine). Unlike in classical conditioning, people have greater control, in that their preferences affect what is experienced as reinforcement or punishment (Skinner, 1935).
This is illustrated by a child who will not do his/her homework, much to the frustration of his/her parent. It is important to remember that people are unique and each person will find different things reinforcing. To get the child to change the ‘non-homework’ behaviour to the desired behaviour (doing homework every day), the parent may have to try a number of rewards (reinforcers) or punishment techniques to discover which of them the child responds to. The parent may think that promising a sweet (a positive reinforcement) after the child finishes the homework will get the child to do the homework, but the child may only respond to words of praise. Once the parent discovers what type of reinforcement the child responds to, the parent will have to use it consistently for shaping the child’s behaviour (at least until the child’s preference changes as he/she matures into adolescence). Some parents might think about punishing the child in some way; Skinner was, however, clear that reinforcement was more effective than punishment for achieving required behaviours (see later in this chapter).
An additional difference between classical and operant conditioning is the fact that, in classical conditioning, the stimulus leading to conditioning comes before the behaviour, but in operant conditioning, the stimulus comes after the behaviour. Essentially, this means that behaviour can only be repeated or avoided after reinforcement or punishment has been delivered to the participant. In effect, then, the reinforcement or punishment is the stimulus to further behaviour. Reread the section on Pavlov’s experiment to compare how the stimulus (the dog food) was delivered before the behaviour (salivation).
9.4 THORNDIKE’S LAW OF EFFECT (1911)
American psychologist Edward L. Thorndike’s (1874–1949) research on animal intelligence had a major effect on the way that animal and human learning was studied, laying the foundation for the experimental analysis of behaviour (Chance, 1999). Thorndike placed emphasis on the power of consequences in order for learning to occur, and stated that when behaviours are followed by favourable consequences they become more likely to occur, whereas behaviours that are followed by unfavourable consequences become less likely to occur. He called this the law of effect and it was to become an influential idea in the work of noted behaviourist B.F. Skinner.
Learning an operant response
As mentioned above, Skinner was the leading researcher in the field of operant conditioning. He designed the Skinner box: an empty cage with a light, and chutes for food and water. It also contains a lever which when pressed releases food pellets down the chute. A hungry rat is placed in the box and eventually presses the lever. A pellet is released. The rat scurries over to the chute and eats the pellet. After grooming for a bit, the rat resumes its exploration of the cage and again presses the lever. Another pellet is released and the rat goes to eat. After this happens a few times, the rat learns that pressing the lever releases the food. Note that it has not learned a new skill (it could already press the lever), but it has learned to apply the skill in a specific way or with a different frequency.
Over repeated experiments, Skinner realised that it is important for the reinforcement to happen very soon after the behaviour is performed, otherwise the association is not made. For example, if you are in the process of house training a new puppy and discover a puddle of urine on the floor, reprimanding him would serve no purpose as he would not associate the punishment with the reason for it. However, if you find the puppy while he is urinating in the house and scold him then and there, it is more likely that he will learn that urinating in the house leads to punishment, and so he will avoid the behaviour in future.
What happens when a brand new behaviour needs to be learned? For example, a show dog may need to be taught to perform a complicated behaviour for a competition. Operant conditioning relies on a method called shaping to generate an entirely new behaviour using a reward system. The process starts by establishing the participant’s preferred reward (as was done in the homework example above). The participant is then rewarded for each small advance that is made in the direction of the desired behaviour. Once an advance has been made, the participant will not be rewarded for simply repeating that behaviour, but only for advancing to the next milestone. Finally, the participant is only rewarded for performing the utterly new behaviour in its entirety.
In the empty Skinner box, it could take a while for the rat to ’find’ the lever and then press it. So Skinner and his associates found that they could teach a rat to approach the lever by giving the rat a reward for each behaviour it performed that brought it closer to pressing the lever (Skinner, 1948). When the rat was placed in the box, it was rewarded by the researcher first for facing the lever, then for each step that it took towards the lever. When the rat stopped or stepped backwards, it was not rewarded. By the time the rat reached the lever, it was only rewarded for more complex behaviour, like standing in front of or pressing the lever. Eventually, the rat would reliably walk towards the lever and press it to get its reward.
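The logic of shaping described above (reward each advance towards the lever, but not a repeat of an earlier step or a step backwards) can be sketched in a few lines. This is a simplified illustration, not Skinner's procedure: the function name `shape`, the use of distance as the only measure of progress and the example distances are all assumptions made for this sketch.

```python
# Illustrative sketch of shaping by successive approximations.
# Distances are in hypothetical units; 0 means the lever has been pressed.
def shape(distances_to_lever):
    """Return True/False for each move: was the rat rewarded?

    A move is rewarded only if it brings the rat closer to the lever
    than it has ever been before.
    """
    rewards = []
    best_so_far = float("inf")
    for distance in distances_to_lever:
        is_advance = distance < best_so_far
        rewards.append(is_advance)
        if is_advance:
            # Raise the bar: simply repeating this step earns nothing.
            best_so_far = distance
    return rewards


moves = [5, 4, 4, 6, 3, 0]
print(shape(moves))  # [True, True, False, False, True, True]
```

Here the third move (staying at the same distance) and the fourth move (stepping backwards) go unrewarded, while every new advance, including the final lever press, is rewarded.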
There are a number of studies that challenge the notion of shaping, as their results contradict the idea that completely foreign behaviours can be taught to humans and animals alike. Behaviourist researchers Breland and Breland (1961), in trying to teach a chicken to play baseball, showed that some behaviours could not be taught to animals and that animals would persist with certain instinctual behaviours despite punishment or lack of reinforcement of those behaviours. This phenomenon is referred to as instinctive drift. While Skinner assumed that any animal or person is a blank slate, a tabula rasa, and can be taught to do anything he/she is physically capable of doing, instinctive drift provides evidence to the contrary.
•Skinner’s theory of operant conditioning explains much of our day-to-day behaviour. Operant conditioning can be considered to be a type of learning in which voluntary (controllable, non-reflexive) behaviour is strengthened through reinforcement but weakened when that behaviour is punished.
•Operant conditioning studies non-reflexive behaviours (unlike classical conditioning); also, the stimulus comes after the behaviour (whereas in classical conditioning, it comes before).
•Reinforcers are anything that increases the likelihood of the behaviour being repeated. In operant learning, it is necessary to identify what reinforcers are effective for the participant. Reinforcers also need to be presented very soon after the behaviour occurs.
•In shaping, a new behaviour is learned through rewarding each small advance that is made in the direction of the desired behaviour. However, not all behaviours can be taught by shaping as an animal’s instinctive drift will lead it to persist with certain instinctual behaviours despite punishment or lack of reinforcement of those behaviours.
Principles of reinforcement
Skinner proposed that there are two types of reinforcement, both of which encourage the repetition of the desired behaviour, as they result in the participant experiencing a beneficial outcome. Positive reinforcement is when something pleasant (like a reward) is delivered to the participant, while negative reinforcement is when something unpleasant is removed from the participant (e.g. when taking a painkiller helps your headache, you are likely to take one again next time you have a headache). In either case, the point of reinforcement is to increase the frequency or probability of a desired response occurring again (Skinner, 1935). A basic example: if the desired behaviour is for a rat to press a lever, positive reinforcement would reward the rat (with food) when it did so, while negative reinforcement would remove an unpleasant stimulus (e.g. an electric shock) when it did so.
Students should pay particular attention to the correct usage of the term ’negative reinforcement’. It does not refer to punishment, nor does it imply that negative or bad behaviour is encouraged. Reinforcement always seeks to increase the frequency or probability of the behaviour, and this can be achieved through negative reinforcement when the organism experiences the benefit of having something disagreeable removed from its environment.
Skinner also identified two types of reinforcers: primary reinforcers and secondary reinforcers. The former is a stimulus that naturally strengthens any response that precedes it without the need for any learning on the part of the organism. These reinforcers tend to be biologically based, such as the satisfaction that comes from receiving food, water and sex. The latter, secondary reinforcers, are previously neutral stimuli that acquire the ability to strengthen responses because the stimuli have been paired with a primary reinforcer (Skinner, 1958). Note the similarity to the processes of classical conditioning, with the key differences being that the organism still has control over its voluntary behaviour and that the behaviour must occur before the delivery of the reinforcement. Typical secondary reinforcers are money, approval and exam marks. These can often be exchanged for something of practical value (e.g. money can buy a necessary or desired item, and good exam marks might lead to a monetary reward).
Figure 9.4 Praise as an example of positive reinforcement
In his experiments with rats, Skinner found that learning occurred at a quicker rate when the behaviour was reinforced at each point of correct response. In this regard, Skinner was beginning to experiment with different schedules of reinforcement. This term refers to when and how often a response is reinforced and these variations can have quite an important impact on behaviour. There are two types of reinforcement schedules: continuous and intermittent. In a continuous schedule, each instance of behaviour is reinforced, but in intermittent schedules, behaviour is only reinforced after a certain amount of time or a certain number of responses.
Skinner’s initial experiment used a continuous reinforcement schedule. However, once the behaviour has been learned, the most effective way of maintaining this behaviour is through an intermittent reinforcement schedule, where (as indicated above) reinforcement does not follow every response. There are two basic types of intermittent reinforcement schedule, namely ratio and interval; each of these can be either fixed (predictable) or variable (unpredictable) (Ferster, 2002; Skinner, 1958) (see Figure 9.5):
•Fixed ratio reinforcement scheduling relies on the reinforcement being given after every Nth response; in other words, a certain number of responses have to occur before getting reinforcement. An example of this sort of reinforcement schedule would be casual employees who are paid R20 for every item that they produce, or a sales representative who earns a R1 000 bonus for every 10th item of product that she sells.
•Variable ratio reinforcement scheduling is similar to fixed ratio scheduling except that the number of responses required for reinforcement varies. Reinforcement is still given after every Nth response, but N is an average. Slot machines in casinos operate on this system, despite some superstitious gamblers believing that they have developed a method or system that guarantees them a win. The machines are programmed to pay out after an average number of responses, for example on every 45th pull on average. This means that a gambler could win on the first pull, then on the 80th pull and then on the 135th pull, and so on, provided that the wins average out to one every 45 pulls.
•Fixed interval reinforcement schedules dictate that a selected period of time must pass, and then a certain response or behaviour must be performed in order for the participant to receive reinforcement. The reliable schedule of a commuter train, arriving at a particular stop every 30 minutes, is a good example. If you miss the train that has just left, you will have to wait the allocated 30 minutes before the next train will arrive to collect you. Another example is receiving a monthly pay cheque.
•Variable interval reinforcement schedules have the same principles as fixed interval, but now the time interval varies. For example, you receive a call from your friend every day when you get home after university. It tends to be just after you have had dinner, but it can vary from between the time you arrive home until just before you go to bed.
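The four intermittent schedules can also be expressed as simple rules for deciding whether a given response is reinforced. The Python sketch below is only an illustration and is not taken from Skinner's procedures: the function names are hypothetical, time is measured in arbitrary units, and the variable schedules assume that the requirement is drawn evenly from a range centred on the stated average.

```python
import random


def fixed_ratio(n):
    """Reinforce every n-th response (e.g. a bonus for every 10th sale)."""
    count = 0

    def respond(time):  # time is ignored: only the number of responses matters
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False

    return respond


def variable_ratio(mean_n, rng):
    """Reinforce after a varying number of responses that averages mean_n."""
    def draw():
        # Assumption: requirement drawn evenly from 1 to 2*mean_n - 1.
        return rng.randint(1, 2 * mean_n - 1)

    count, target = 0, draw()

    def respond(time):  # time is ignored here as well
        nonlocal count, target
        count += 1
        if count >= target:
            count, target = 0, draw()
            return True
        return False

    return respond


def fixed_interval(period):
    """Reinforce the first response made after `period` time units have passed."""
    last = 0.0

    def respond(time):
        nonlocal last
        if time - last >= period:
            last = time
            return True
        return False

    return respond


def variable_interval(mean_period, rng):
    """Reinforce the first response after a varying wait that averages mean_period."""
    def draw():
        # Assumption: wait drawn evenly from 0 to 2*mean_period.
        return rng.uniform(0, 2 * mean_period)

    last, wait = 0.0, draw()

    def respond(time):
        nonlocal last, wait
        if time - last >= wait:
            last, wait = time, draw()
            return True
        return False

    return respond


# Fixed ratio: the sales representative earns a bonus on every 10th sale.
bonus = fixed_ratio(10)
print([sale for sale in range(1, 31) if bonus(sale)])  # [10, 20, 30]

# Variable ratio: the slot machine pays out on pulls 1, 80 and 135.
wins = [1, 80, 135]
gaps = [b - a for a, b in zip([0] + wins, wins)]  # pulls between wins
mean_gap = sum(gaps) / len(gaps)
print(gaps, mean_gap)  # [1, 79, 55] 45.0
```

Notice that the ratio schedules count responses and ignore the clock, whereas the interval schedules watch the clock and reinforce only the first response made after the wait has elapsed.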
Skinner proposed an alternative (but not opposite) method of changing behaviour. Whereas reinforcement increases the probability of a response occurring again, punishment operates on the notion that the delivery of an unpleasant stimulus will decrease the frequency or probability of a response being repeated. However, through his research, Skinner found that punishment was not as powerful as reinforcement in bringing about behaviour change. Skinner offers us two types of punishment, both carried out with the aim of reducing or stopping a particular behaviour:
•Positive punishment is carried out when an aversive (unpleasant) stimulus is administered so as to reduce the likelihood of certain behaviours being repeated. An example of this is when a driver has to pay a hefty fine for speeding on the highway.
•Negative punishment is the removal of a pleasant stimulus, again reducing the probability that the behaviour will be repeated, as when a child has their favourite toy taken away when they have a tantrum, or when a teenager is grounded for bad behaviour.
Figure 9.5 Schedules of reinforcement
•According to Skinner, there are two types of reinforcement: positive reinforcement and negative reinforcement. Both increase the likelihood of a behaviour being repeated.
•In positive reinforcement, something pleasant is delivered to the participant; in negative reinforcement, something unpleasant is removed from the participant.
•There are two types of reinforcers: primary and secondary. Primary reinforcers are based on biological needs and are not learned; secondary reinforcers are learned and are associated with a primary reinforcer.
•Reinforcement may be continuous or intermittent. There are four intermittent schedules of reinforcement: fixed ratio, variable ratio, fixed interval and variable interval.
•Punishment is an alternative method for changing behaviour. Punishment is intended to decrease the frequency or probability of a response being repeated. However, punishment is not as effective as reinforcement in bringing about behaviour change.
•There are two types of punishment: positive (the presentation of something aversive) and negative (the removal of something pleasant).
9.5 APPLICATIONS OF OPERANT CONDITIONING
A number of examples have been offered throughout this section to illustrate how operant conditioning can be applied in everyday life. If you would like to see a dramatic fictional example of conditioning being used in the rehabilitation of criminals, watch Stanley Kubrick’s 1971 film, A Clockwork Orange. Current reality shows like The Dog Whisperer and Super Nanny apply the same sort of techniques mentioned in this section to train and discipline unruly dogs and children, respectively.
Table 9.1 A summary of operant conditioning factors
Operant conditioning factor: Description
Reinforcement: Always increases the probability that a response will be repeated, and is associated with the participant having a pleasant outcome
Punishment: Always decreases the probability that a response will be repeated as the participant seeks to avoid something unpleasant
Positive: A stimulus is delivered to the participant
Negative: A stimulus is withdrawn or removed from the participant
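The two distinctions summarised in Table 9.1 combine to give the four procedures described in this section. As a study aid, the short Python sketch below names the procedure from two questions: was a stimulus delivered or removed, and did the behaviour become more or less likely? The function name and its labels are hypothetical, and the scenarios are the ones used as examples earlier in the chapter.

```python
def classify_consequence(stimulus, effect_on_behaviour):
    """Name the operant procedure.

    stimulus: "delivered" or "removed"
    effect_on_behaviour: "increases" or "decreases"
    """
    sign = {"delivered": "positive", "removed": "negative"}[stimulus]
    kind = {"increases": "reinforcement", "decreases": "punishment"}[effect_on_behaviour]
    return f"{sign} {kind}"


examples = [
    ("rat receives food after pressing the lever", "delivered", "increases"),
    ("painkiller takes the headache away", "removed", "increases"),
    ("driver is fined for speeding", "delivered", "decreases"),
    ("favourite toy is taken away after a tantrum", "removed", "decreases"),
]

for description, stimulus, effect in examples:
    print(f"{description}: {classify_consequence(stimulus, effect)}")
```

The sketch makes the point stressed above: ‘positive’ and ‘negative’ describe only whether a stimulus is added or taken away, not whether the outcome is pleasant or the behaviour is good.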
Social (observational) learning theory
Social learning theory is an example of the cognitive learning that was mentioned at the beginning of this chapter. Cognitive learning takes into account the way humans have the capacity to know, remember, understand and anticipate outcomes. For example, you probably have a cognitive map of your campus that you use to find your way to different lecture theatres or the library. A cognitive map is a mental representation or image of a geographical area.
Social or observational learning theory was developed by Albert Bandura (1925–2021), who is also renowned for his theory of self-efficacy (see Chapter 5). In addition, Bandura served as president of the American Psychological Association in 1974 and was an emeritus professor at Stanford University.
Bandura’s social learning theory states that learning occurs when one’s behaviour changes after viewing the behaviour of a model (see Box 9.6). While he respects many of the fundamental concepts of Pavlov’s and Skinner’s approaches to traditional learning theory, Bandura proposes that direct reinforcement or punishment cannot account for all types of learning. His theory includes a social element, arguing that people can learn new information and behaviours by watching other people. This can happen directly or indirectly. For example, a young soccer player may watch his hero in a Premier Soccer League match and try to play like him. Alternatively, the learning may happen indirectly. For example, a child may learn not to tell a lie when he/she observes a sibling being punished for doing so. This is known as vicarious learning, which may occur through observing the outcomes of the actions of other people or even through the media (Bandura, 1965). Bandura (2001) notes the way in which the availability of electronic media has expanded the role of vicarious learning and indeed transformed how social systems operate.
9.6 IMITATION OF FILM-MEDIATED AGGRESSIVE MODELS
Arguably, Bandura’s most influential and renowned experiments were the series known as the Bobo doll studies (Bandura, Ross & Ross, 1961). Bandura was attempting to show that children can learn behaviour by observing others performing that behaviour. In his 1963 study, Bandura was interested in whether watching adults perform violent actions on television would lead to children imitating that violent behaviour. In order to test the idea of observational learning, Bandura designed the experiment to include a Bobo doll (a five-foot-high inflatable toy that, when hit, bounces back into its original upright position). The sample of children was divided into four groups. The first group watched the Bobo doll being hit in real life (by the experimenter). The second group watched this same situation being shown on a television screen. The third group watched the situation being carried out by a cartoon character. The final group (as a control group) did not witness any violence towards the Bobo doll. All three experimental groups showed the same sorts of aggressive behaviour when they were asked to play with the Bobo doll, while the control group was less aggressive in their play. Interestingly, the group that watched the aggressive behaviour on television showed greater overall aggression than either the real-life or cartoon groups. Furthermore, boys were more likely than girls to display aggression, in both imitative and non-imitative forms (Bandura et al., 1963).
Bandura has used the above experiment as evidence to show that children learn through observing the behaviour of others, and this has been particularly influential in the field of parenting research. The experiment is, however, not without critique. First, there are concerns about the validity of the sample (the children were all from the Stanford University nursery school and were all white and middle to upper class). Second, the validity of the laboratory setting can be questioned, as it was not the real world and, by showing violent media in this experimental setting, the experimenter was seemingly stating that violence was acceptable in that setting. Lastly, Bobo dolls are designed to be hit and thus violence towards them may not generalise to other violent behaviour (Felson, 1996; Freedman, 1994; Hart & Kritsonis, 2006).
9.7 SOCIAL LEARNING AND THE SOUTH AFRICAN CONTEXT
If we think about South African society, we can see ways in which Bandura’s theory might be applicable. For example, if you see someone driving through a red traffic light or a stop street without any punishment, it is likely that you may do the same next time you are faced with an amber light. What other instances can you think of in the South African context?
Bandura (1965) suggests that for observational learning to occur, the following four processes need to take place:
1.Attention. For learning to occur, you must pay attention to the behaviour being modelled. This may depend on your own characteristics (e.g. capacity to pay attention) and on the characteristics of the model. If the model is of high status or seen as an authority or particularly skilled, or similar to you, it is more likely that you will pay attention to him/her.
2.Retention. The ability to store and recall the information is the next important part of the learning process.
3.Reproduction. Once you have retained the information obtained from watching a model perform a particular behaviour, you need to perform the behaviour for yourself. Repetition and practice will result in mastery of the skill needed to perform that behaviour. If the model’s behaviour is successful or brings a reward, the observer is more likely to try to imitate the behaviour.
4.Motivation. Without motivation, there would be no desire to imitate a given behaviour. This is where Bandura agrees with concepts from Skinner, as reinforcement and punishment play an important role in one’s motivation to imitate a modelled behaviour. This reinforcement or punishment can be experienced directly, or by watching the model receive the punishment or reward for behaving in a certain manner.
Figure 9.6 Children learn how to feed their younger siblings by watching their parents do it
Alternatives to traditional learning theory
Although this chapter has presented three primary learning theories, there are a number of varieties of behaviourism. Strict behaviourism, which ignores the value of cognitive (mental) processes in shaping behaviour, would hold that the same principles or conditioning techniques that would be used to train a dog would be successful in training (or educating) humans. Many people would completely reject this old-fashioned position towards human learning processes. However, strict behavioural approaches have been useful in offering a guiding philosophy to educating autistic and intellectually disabled individuals.
Recently, cognitive-behaviourists have engaged with a more popular and effective approach to understanding and predicting human behaviour. Cognitive-behaviourism holds that mental events (including thoughts, feelings and internal dialogue) guide learned behaviour. A therapist who supports this approach would assist people in developing new ways of thinking about themselves in relation to the reality of the world. Cognitive-behavioural therapy would thus help clients overcome problems (like depression and anxiety) caused by dysfunctional and destructive thinking. Studies have shown that this type of therapy is usually brief (short-term) and can successfully resolve psychological distress of this nature in as little as four sessions (depending on the therapist, the client and the severity of the psychological issue) (Lipsitz & Marshall, 2001).
•Social learning theory is an example of cognitive learning; this takes into account the way humans have the capacity to know, remember, understand and anticipate. People may use cognitive maps to help them choose alternative routes.
•Bandura’s social learning theory states that learning occurs when one’s behaviour changes after viewing the behaviour of a model. People can also learn vicariously by watching the responses to the behaviour of others.
•Observational learning depends on four processes: attention, retention, reproduction and motivation.
•Strict behaviourism ignores the usefulness of cognitive processes as they cannot be observed. However, cognitive-behaviourists acknowledge the way that mental events (including thoughts, feelings and internal dialogue) guide learned behaviour. Cognitive-behavioural therapy helps clients with problems caused by dysfunctional and destructive thinking.
The primary insight from this chapter on learning theory is to appreciate that for animals and human beings alike, most behaviour is learned behaviour. If behaviour can be learned, it can also be unlearned, with the correct reinforcements or punishments being applied. Research on learning has led to the development of a relatively precise learning theory, which can be used to understand and predict how (and under what conditions) most individuals will learn to respond to certain events.
acquisition: the phase in a learning experiment in which behaviour is first learnt
cognitive map: a mental representation or image of a geographical area
conditioned emotional response: an emotional response to a previously neutral stimulus, as a result of classical conditioning
conditioned response (CR): the response that is elicited by the conditioned stimulus after classical conditioning has taken place
conditioned stimulus (CS): an initially neutral stimulus (like a bell, light or tone) that begins to elicit a conditioned response after it has been paired with an unconditioned stimulus
continuous reinforcement: reinforcement schedule where reinforcement is given for every instance of the desired behaviour
extinction: the end result of the process whereby there is a reduction in the strength or probability of a learned behaviour that occurs when the conditioned stimulus is presented without the unconditioned stimulus (in classical conditioning) or when the behaviour is no longer reinforced (in operant conditioning)
fixed interval reinforcement: reinforcement schedule where a specified period of time must pass, and then a certain response or behaviour must be performed in order for the participant to receive reinforcement
fixed ratio reinforcement: reinforcement schedule where a certain number of responses have to occur before getting reinforcement
generalisation: the transfer of a learned response from one stimulus to a similar stimulus
instinctive drift: the tendency of a non-human subject to revert to unconditioned behaviours despite a lack of reinforcement for those behaviours in the experimental setting
negative reinforcement: a situation in which an operant behaviour is strengthened (reinforced) because it removes or prevents a negative (aversive) stimulus
neutral stimulus: a stimulus which produces no response in the subject at the start of the experiment, but, after being paired with the unconditioned stimulus, becomes the conditioned stimulus
positive reinforcement: an operant conditioning procedure in which a behaviour is followed by a positive stimulus or reinforcer which typically increases the strength of the behaviour
preparedness: the extent to which an organism’s evolutionary history makes it easy for the organism to learn a particular association or response
primary reinforcers: reinforcers that are not learned and which often satisfy biological needs (e.g. food)
punishment: an operant conditioning procedure in which the behaviour is followed by a negative or aversive stimulus typically causing the behaviour to decrease in strength
reinforcer: anything which increases the likelihood of a behaviour being repeated
secondary reinforcers: a learned reinforcer (like money) that acquires the ability to strengthen responses because it has been paired with a primary reinforcer
shaping: a procedure for training a new operant behaviour by reinforcing behaviours that are closer and closer to the final behaviour that is desired
spontaneous recovery: the reappearance, after the passage of time, of a response that had previously undergone extinction
stimulus discrimination: the ability of a subject to distinguish (discriminate) between similar stimuli
unconditioned response (UCR): in classical conditioning, an innate response that is elicited by a stimulus in the absence of conditioning
unconditioned stimulus (UCS): in classical conditioning, the stimulus that elicits the response before conditioning occurs
variable interval reinforcement: reinforcement schedule where reinforcement is given after a variable time interval
variable ratio reinforcement: reinforcement schedule where reinforcement is given after every Nth response where N is an average
vicarious conditioning: learning that occurs indirectly
vicarious learning: learning of a behaviour by observing the behaviour being performed by others and then modelling (acting out) that behaviour
Multiple choice questions
1.Learning involves changes in behaviour as a result of:
2.Yolisa loves to go dancing at a local night club. One night she sees a bouncer beat up one of the patrons. After that, she feels uneasy and anxious whenever she goes to the club, especially when she sees the bouncers. A behaviourist might say that in that situation, the club serves as a(n):
3.In his classic studies, when Pavlov presented the bell (conditioned stimulus) continuously without the presentation of the food (unconditioned stimulus), the dog’s salivation decreased due to a process called:
4.In operant conditioning, the reinforcer occurs _____________ the response; in classical conditioning, it occurs _____________.
5.According to Skinner, when a child gets all her spelling right and her teacher gives her a star, this is _____________, while in _______________, the teacher says she no longer has to stay in at break.
a)positive reinforcement; negative reinforcement
b)negative reinforcement; positive reinforcement
c)positive punishment; negative punishment
d)negative punishment; positive punishment.
6.Factory workers are paid for the number of garments they produce in a given time. This type of reinforcement schedule, in which reinforcement is given after an average number of operant responses have occurred, is called a _______________ schedule.
7.A phobia can be seen as a/an:
a)attempt to act frightened to gain attention
b)fear based on realistic dangers
c)behaviour that only children display
d)conditioned emotional response.
8.Siphesihle watches his older brother polish his shoes before school. Later, he attempts to polish his own shoes. Siphesihle’s older brother has acted as:
b)a negative reinforcer
c)a positive reinforcer
9.Social Learning theory was developed by which of the following theorists?
a)B. F. Skinner
d)none of the above.
10.In Bandura’s studies of learned aggression using the Bobo doll, which of the following is NOT true?
a)The control group were less aggressive in their play than the experimental groups.
b)Girls displayed more aggression than boys.
c)The group that watched the aggressive behaviour on television showed the most aggression.
d)All of the above are true.
1.Nosipho’s parents always leave her with the same babysitter, Emily, when they go out. Within minutes of Emily’s arrival, Nosipho’s parents leave. Emily arrives at Nosipho’s birthday party; as soon as Nosipho sees her she begins to cry. Using the principles of classical conditioning, explain why Nosipho cries when she sees Emily.
2.Discuss the nature and importance of observational learning and discuss Bandura’s view on whether reinforcement affects learning or performance.
3.For each of the following scenarios, identify the unconditioned response, unconditioned stimulus, conditioned stimulus and conditioned response (where appropriate):
a)Your dog comes running every time she hears the tin opener.
b)You do your assignments sitting on your bed; at the end of semester you find you feel tense when sitting on your bed.
c)Your friend wears a rubber band on his wrist and snaps it every time he swears (he is trying to stop swearing).
d)On your way home from campus, there is one garden with several dogs that always bark at you; you change your route home to one with no dogs.
e)When you watch a sad movie, you always have a large bowl of chips; later you find that eating chips makes you weepy.