Foundations: Methods and Approaches
Part V: Content Review for the AP Psychology Exam
EXPERIMENTAL, CORRELATIONAL, AND CLINICAL RESEARCH
Three major types of research in psychology are experimental, correlational, and clinical. An experiment is an investigation seeking to understand relations of cause and effect. The experimenter changes one variable (the cause) and measures how that change, in turn, affects another variable (the effect). At the same time, the investigator tries to hold all other variables constant so she can attribute any changes to the manipulation. The manipulated variable is called the independent variable; the dependent variable is what is measured. For example, suppose an experiment is designed to determine whether watching violence on television causes aggression in its viewers. Two groups of children are randomly assigned to watch either violent or nonviolent television programs for one hour. The program type is the independent variable because it is manipulated by the experimenter. Afterward, a large doll may be placed in front of each child for one hour while the experimenter records the number of times that child hits, kicks, or punches the doll. This behavior is the dependent variable because it is the variable that is measured. The presence of the doll is the control variable because it is held constant in both groups.
In order to draw conclusions about the result of the controlled experiment, it is important that certain other conditions are met. The researcher identifies a specific population, or group of interest, to be studied. Because the population may be too large to study effectively, a representative sample of the population may be drawn. Representativeness is the degree to which a sample reflects the diverse characteristics of the population that is being studied. Random sampling is a way of ensuring maximum representativeness. Once sampling has been addressed, subjects are randomly assigned to either the experimental group or the control group. Random assignment is done to minimize preexisting differences between the groups.
Other terms to remember include the following: the group receiving or reacting to the independent variable is the experimental group; the control group does not receive the independent variable but should be kept identical in all other respects. Using two groups allows for a comparison to be made and causation to be determined.
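The AP exam will not ask for code, but random assignment is easy to picture as a short Python sketch. The pool of 20 numbered subjects below is invented for illustration; the key idea is that chance alone decides who lands in each group:

```python
import random

# A hypothetical pool of 20 subjects, identified by number.
subjects = list(range(20))

# Random assignment: shuffle so the order reflects chance alone.
random.shuffle(subjects)

# Split the shuffled pool evenly into the two groups.
experimental = subjects[:10]  # will receive the independent variable
control = subjects[10:]       # treated identically except for the IV

print(len(experimental), len(control))  # 10 10
```

Because every subject had an equal chance of landing in either group, preexisting differences (age, temperament, and so on) tend to even out between the groups.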
A classic example of unintentional sampling bias occurred during the 1948 U.S. Presidential Election: a survey was conducted by randomly calling households and asking whom they intended to vote for, Harry Truman or Thomas Dewey. Based on this phone survey, Dewey was projected to win. The results of the election proved otherwise: Truman won. What could have gone wrong? In 1948, owning a telephone was not yet common, and households that had one were generally wealthier. As a result, the “random” selection of telephone numbers did not produce a representative sample, because many people (a large proportion of whom voted for Truman) did not have telephones. For the purposes of the test, you should be able to identify the following types of sampling biases:
· The bias of selection from a specific real area occurs when people are selected in a physical space. For example, if you wanted to survey college students on whether or not they like their football team, you could stand on the quad and survey the first 100 people that walk by. However, this is not completely random because people who don’t have class at that time are unlikely to be represented.
· Self-selection bias occurs when the people being studied have some control over whether or not to participate. A participant’s decision to participate may affect the results. For example, an Internet survey might elicit responses only from people who are highly opinionated and motivated to complete the survey.
· Pre-screening or advertising bias occurs often in medical research; how volunteers are screened or where advertising is placed might skew the sample. For example, if a researcher wanted to prove that a certain treatment helps people to stop smoking, the mere act of advertising for people who “want to quit smoking” might provide only a sample of people who are already highly motivated to quit and might have done so without the treatment.
· Healthy user bias occurs when the study population tends to be in better shape than the general population. For example, recruiting subjects at a gym oversamples people who already exercise regularly; even if the gym draws a diverse crowd, its members may not accurately represent the larger population, much like the bias of selection from a specific real area.
To avoid inadvertently influencing the results, as in the previous examples, researchers use a single- or double-blind design. Single-blind design means that the subjects do not know whether they are in the control or experimental group. In a double-blind design, neither the subjects nor the researcher knows who is in the two groups. Double-blind studies are designed so that the experimenter does not inadvertently change the responses of the subject, such as by using a different tone of voice with members of the control group than with the experimental group. A third party keeps the group assignments so that the data can be analyzed later. In some double-blind experiments, the control group is given a placebo, a seemingly therapeutic object or procedure that contains none of the active treatment, which leads members of the control group to believe they are in the experimental group.
Types of Research
Two specific types of research that can be set up as correlational or experimental designs are longitudinal studies and cross-sectional studies. Longitudinal studies happen over long periods of time with the same subjects (e.g., studying the long-term effects of diet and exercise on heart disease), and cross-sectional studies are designed to test a wide array of subjects from different backgrounds to increase generalizability.
Correlational research involves assessing the degree of association between two or more variables or characteristics of interest that occur naturally. In this type of design, researchers do not directly manipulate variables but rather observe naturally occurring differences. If the characteristics under consideration are related, they are correlated. It is important to note that correlation does not prove causation; correlation simply shows the strength of the relationship among variables. For example, poor school performance may be correlated with lack of sleep. However, we do not know if lack of sleep caused the poor performance, or if the poor school performance caused the lack of sleep, or if some other unidentified factor influenced them both. Such an unknown factor is called a confounding variable, a third variable, or an extraneous variable. One way to gather information for correlational studies is through surveys. Using either questionnaires or interviews, one can accumulate a tremendous amount of data and study relationships among variables. Such techniques are often used to assess voter characteristics, teen alcohol and drug use, and criminal behavior. For example, survey studies might examine the relationship between socioeconomic status and educational levels. Correlational studies are sometimes preferred to experiments because they are less expensive, less time consuming, and easier to conduct. In addition, some relationships cannot be ethically studied in experiments. For example, you may want to study how child abuse affects self-efficacy in adulthood, but no one will allow you to randomly assign half of your baby participants to the child abuse condition.
Clinical research often takes the form of case studies. Case studies are intensive psychological studies of a single individual. These studies are conducted under the assumption that an in-depth understanding of single cases will allow for general conclusions about other similar cases. Case studies have also been used to investigate the circumstances of the lives of notable figures in history. Frequently, multiple case studies on similar cases are combined to draw inferences about issues. Researchers must be careful, though, because case studies, like correlational ones, cannot lead to conclusions regarding causality. Sigmund Freud and Carl Rogers used numerous case studies to draw their conclusions about psychology. The danger of generalizing from the outcomes of case studies is that the individuals studied may be atypical of the larger population. This is why researchers try to ensure that their studies are generalizable—that is, applicable to similar circumstances because of the predictable outcomes of repeated tests.
Two important features of studies are the conceptual definition and the operational definition. Whereas the conceptual definition is the theory or issue being studied, the operational definition refers to the way in which that theory or issue will be directly observed or measured in the study. For example, in a study on the effects of adolescent substance abuse, the way in which taking drugs affects adolescent behavior is the conceptual definition, while the number of recorded days the student is absent from school due to excessive use of substances is the operational definition.
Operational definitions have to be internally and externally valid. Internal validity is the certainty with which the results of an experiment can be attributed to the manipulation of the independent variable rather than to some other, confounding variable. External validity is the extent to which the findings of a study can be generalized to other contexts in the “real world.” It is also important that the study have reliability: the degree to which the same results appear if the experiment is repeated under similar conditions. A related concept is inter-rater reliability, the degree to which different raters agree on their observations of the same data.
OTHER TYPES OF RESEARCH
In addition to running experiments inside a lab, researchers can observe behavior outside of it; such naturalistic observation has enriched our knowledge of psychology.
Psychologists and other scientists collect data. This data is then subjected to statistical analysis. Statistical methods can be divided into descriptive and inferential statistics. Descriptive statistics summarize data, whereas inferential statistics allow researchers to test hypotheses about data and determine how confident they can be in their inferences about the data.
Descriptive statistics do just what their name implies—they describe data. They do not allow for conclusions to be made about anything other than the particular set of numbers they describe. Commonly used descriptive statistics are the mean, the mode, and the median. These descriptive statistics are measures of central tendency—that is, they characterize the typical value in a set of data.
The Mean, Mode, and Median Measure the Middle!
The mean is the arithmetic average of a set of numbers. The mode is the most frequently occurring value in the data set. (If two numbers both appear with the greatest frequency, the distribution is called bimodal.) The median is the number that falls exactly in the middle of a distribution of numbers. These statistics can be represented by a normal curve. In a perfectly normal distribution, the mean, median, and mode are identical. The range is simply the largest number minus the smallest number.
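Python's standard `statistics` module can verify these definitions on a small data set. The five quiz scores below are invented for illustration:

```python
import statistics

scores = [70, 85, 85, 90, 100]  # hypothetical quiz scores

mean = statistics.mean(scores)       # arithmetic average: 430 / 5 = 86
mode = statistics.mode(scores)       # most frequent value: 85
median = statistics.median(scores)   # middle value of the sorted list: 85
value_range = max(scores) - min(scores)  # largest minus smallest: 30

print(mean, mode, median, value_range)  # 86 85 85 30
```

Notice that the mean, median, and mode need not agree in a small sample like this one; only in a perfectly normal distribution are all three identical.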
The graph of the normal distribution depends on two factors: the mean and the standard deviation. The mean determines the location of the center of the graph, and the standard deviation determines the height and width of the graph. When the standard deviation is large, the curve is short and wide; when the standard deviation is small, the curve is tall and narrow. All normal distributions form a symmetric, bell-shaped curve.
Q: What is standard deviation?
Answer on this page.
In a typical distribution of numbers, about 68 percent of all scores are within one standard deviation above or below the mean, and about 95 percent of all scores are within two standard deviations above or below the mean. So, for example, IQ is typically said to have a mean of 100 and a standard deviation of 15, so a person with a score of 115 is one standard deviation above the mean.
Be aware that math questions about normal distributions can appear on the test. Because skewed distributions do not all share the same mathematical properties, questions about percentages and these distributions are often trick questions.
The curve on the left is shorter and wider than the curve on the right, because the curve on the left has a bigger standard deviation.
In skewed distributions, the median is a better indicator of central tendency than the mean. A positive skew means that most values are on the lower end, but there are some exceptionally large values. This creates a “tail” or skew toward the positive end. A negative skew means the opposite: most values are on the higher end, but there are some exceptionally small values. This creates a “tail” or skew toward the negative end.
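A tiny invented example shows why the median resists a skew better than the mean does:

```python
import statistics

# Four modest values plus one exceptionally large value: a positive skew.
incomes = [2, 3, 3, 4, 38]

mean_income = statistics.mean(incomes)      # dragged toward the long tail
median_income = statistics.median(incomes)  # stays with the typical values

print(mean_income, median_income)  # 10 3
```

The single extreme value pulls the mean far above what most of the data look like, while the median is unaffected; that is exactly why the median is the better indicator of central tendency in skewed distributions.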
Although the mean, the mode, and the median give approximations of the central tendency of a group of numbers, they do not tell us much about the variability in that set of numbers. Variability refers to how much the numbers in the set differ from one another. The standard deviation measures a function of the average dispersion of numbers around the mean and is a commonly used measure of variability. For example, say you have a set of numbers that has a mean of 100. If most of those numbers are close to 100, say, ranging from 95 to 105, then the standard deviation will be small. However, if the mean of 100 comes from a set of numbers ranging from 50 to 150, then the standard deviation will be large.
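The two hypothetical sets just described, both with a mean of 100, make the contrast concrete; `statistics.pstdev` computes the population standard deviation:

```python
import statistics

tight = [95, 98, 100, 102, 105]   # values clustered near the mean of 100
spread = [50, 75, 100, 125, 150]  # same mean, far more dispersion

sd_tight = statistics.pstdev(tight)
sd_spread = statistics.pstdev(spread)

print(statistics.mean(tight), statistics.mean(spread))  # 100 100
print(round(sd_tight, 2), round(sd_spread, 2))          # 3.41 35.36
```

Both sets have identical central tendency, yet their standard deviations differ by a factor of ten, which is exactly the information the mean alone cannot give you.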
It’s unlikely that there will be any questions that actually exercise your math skills, but you should be able to read and interpret a graph and understand what the standard deviation represents in a study. For example, suppose that 1,000 subjects participate in a study on reaction time and that their reaction times are normally distributed with a mean of 1.3 seconds and a standard deviation of 0.2 seconds. Participants with a reaction time between 1.1 and 1.5 seconds fall within one standard deviation of the mean, which, as we’ve established, covers roughly 68 percent of the distribution; therefore, roughly 680 of the 1,000 people would have a reaction time in this range. Meanwhile, a reaction time of over 1.9 seconds would be extremely rare: it is more than three standard deviations above the mean. Only about 0.3 percent of the data falls more than three standard deviations above or below the mean, so only about 0.15 percent, or roughly 1.5 of the 1,000 subjects, would be that far above the mean.
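This reaction-time arithmetic can be checked with `statistics.NormalDist` (Python 3.8+). Note that the exact three-sigma upper tail is about 0.135 percent, which the rule-of-thumb figures round to 0.15 percent:

```python
from statistics import NormalDist

rt = NormalDist(mu=1.3, sigma=0.2)  # mean 1.3 s, SD 0.2 s

# Fraction within one SD of the mean (1.1 s to 1.5 s): about 68.3%
within_one_sd = rt.cdf(1.5) - rt.cdf(1.1)
print(round(1000 * within_one_sd))  # 683 of 1,000 subjects

# Fraction slower than 1.9 s (three SDs above the mean): about 0.135%
beyond_three_sd = 1 - rt.cdf(1.9)
print(round(1000 * beyond_three_sd, 1))  # 1.3 of 1,000 subjects
```

The exam only expects the rounded 68/95/99.7 rule, but seeing the exact values makes it clear where those rounded percentages come from.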
Another common descriptive statistic is the percentile. This statistic is used frequently when reporting scores on standardized tests. Percentiles express the standing of one score relative to all other scores in a set of data. For example, if your SAT score is in the 85th percentile, then you scored higher than 85 percent of the other test-takers.
A: Standard deviation measures the typical dispersion of numbers around the mean.
When looking at correlational data, as described above, we need statistical techniques to describe how the attributes we are studying relate to one another. The correlation coefficient is a statistic that gives us such information: a numerical value that indicates the degree and direction of the relationship between two variables. Correlation coefficients range from +1.00 to -1.00. The sign (+ or -) indicates the direction of the correlation, and the number (0 to 1.00) indicates the strength of the relationship. The Pearson correlation coefficient is a descriptive statistic that describes the linear relationship between two attributes; it can be positive, zero, or negative. A correlation of +1 indicates a perfect positive correlation. This means that as attribute X increases, attribute Y always increases proportionally. A correlation of -1 is a perfect negative correlation: as the value of attribute X increases, the value of attribute Y always decreases proportionally. A correlation of 0 indicates that the attributes are not linearly related.
Positive Correlation (As years of education increase, income increases.)
Negative Correlation (As absences for math lessons increase, math score decreases.)
To use another example, take the following study, which assessed 200 male children from ages 1 to 12. A standardized questionnaire was given to the parents of the children and used to rate each child’s “agreeableness” on a scale from 0 to 5. Additionally, psychologists took standard measures of the behavioral problems exhibited by the children. After the behavioral incidents were totaled for each subject, the psychologists found a correlation of -0.6 between child agreeableness and later behavioral problems.
Correlation and Causation
A famous example of the tricky relationship between correlation and causation can be taken from an observation once made about New York City, in that the murder rate is directly correlated to the sale of ice cream (as ice cream sales increase, so do the number of murders). Does this mean that buying ice cream is definitively the cause of the increase in murders? Of course not! When two variables are correlated (especially two variables that are as complex as the measures of human behavior studied in the example above), there are always a number of other factors that could be influencing either correlated variable. In the ice cream/murder example, one such potential confounding variable might be temperature; as the temperature rises, more crimes are committed, but people also tend to eat more cold foods, like ice cream.
You don’t need any math here either, but you do have to understand that this is an inverse correlation between the scores: as the child’s agreeableness increases, behavioral problems decrease. Don’t forget, however, what we stressed earlier in this chapter: correlation does not imply causation.
Inferential statistics are used to determine how confident we can be that a given set of results would be unlikely to occur by chance alone. When experiments are conducted, they typically use a small group of people. However, psychologists usually want to generalize the results of the experiment to a larger group of people, perhaps even to all people. The small group of people in the experiment constitutes the sample, and the large group to whom the psychologist is trying to generalize is called the population. It is important that the sample reflect the characteristics of the population as a whole. If it does, then the sample is referred to as representative.
Sample size refers to the number of observations or individuals measured. The sample size is typically denoted by N (the total number of subjects in the sample being studied) or n (the total number of subjects in a subgroup of the sample being studied). In practice, the sample size used in a study is determined by balancing convenience and expense against the need for sufficient statistical power (the likelihood that the study will detect an effect that actually exists). All else being equal, larger sample sizes are better: the larger the sample, the more likely it is that inferences about the broader population are correct.
Inferential statistics are tools for hypothesis testing. The null hypothesis states that a treatment had no effect in an experiment; the alternative hypothesis states that the treatment did have an effect. Inferential statistics allow us the possibility of rejecting the null hypothesis with a known level of confidence; that is, of saying that our data would be extremely unlikely to have occurred were the null hypothesis true. Such tests assess statistical significance: they enable us to examine whether effects are likely to be a result of the treatment or are simply the normal variations that occur among samples from the same population. If a result is found to be statistically significant, then it may be generalized with some level of confidence to the population.
Alpha is the accepted probability that the result of an experiment can be attributed to chance rather than the manipulation of the independent variable. Given that there is always the possibility that an experiment’s outcome can happen by chance, no matter how improbable, psychologists usually set alpha at 0.05, which means that an experiment’s results will be considered statistically significant if the probability of the results happening by chance is less than 5 percent.
Two primary types of errors can occur when testing a hypothesis. A Type I error is concluding that a difference exists when, in fact, it does not. A Type II error is concluding that no difference exists when, in fact, one does. A good analogy: a Type I error is a “false positive,” and a Type II error is a “false negative.” Psychologists pay particularly close attention to Type I errors because they want to be conservative in their inferences: they do not want to conclude that a difference exists if, in fact, it does not. The p-value is the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true. When the p-value falls below alpha, the results are considered statistically significant (unlikely to be due to chance alone). If p = 0.05, a difference as extreme as the one obtained would be found only 5 percent of the time if the null hypothesis were correct.
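A p-value can be computed by hand for a simple case. Suppose the null hypothesis is that a coin is fair and we observe 9 heads in 10 flips; the one-tailed p-value is the probability of getting 9 or more heads under the null. (This coin example is ours, invented to keep the arithmetic exact.)

```python
from math import comb

n, observed_heads = 10, 9
alpha = 0.05

# P(X >= 9) for X ~ Binomial(10, 0.5): (C(10,9) + C(10,10)) / 2^10
p_value = sum(comb(n, k) for k in range(observed_heads, n + 1)) / 2 ** n

print(round(p_value, 4))  # 0.0107 -- 11 chances in 1,024
print(p_value < alpha)    # True: reject the null hypothesis
```

Because 0.0107 is below the conventional alpha of 0.05, we would reject the null hypothesis of a fair coin, accepting a small risk that this conclusion is a Type I error.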
ETHICS IN RESEARCH
Occasionally, psychological experiments involve deception, which may be used if informing participants of the nature of the experiment might bias results. This deception is typically small, but in rare instances it can be extreme. For example, in the early 1960s, Stanley Milgram conducted obedience experiments in which he convinced participants that they were administering painful electric shocks to other participants, when, in fact, no shocks were given. The shocked “participants” were in fact confederates; that is, they were aware of the true nature of the experiment but pretended to be participants. Those giving the shocks were the real participants. Many people felt that this study was unethical because the participants were not aware of the nature of the study and could have believed that they had done serious harm to other people. Since then, ethical standards have been set forth by the American Psychological Association (APA) to ensure the proper treatment of animal and human subjects. Institutional Review Boards (IRBs) assess research plans before the research is approved to ensure that it meets all ethical standards. Additionally, participants must give informed consent; in other words, they agree to participate in the study only after they have been told what their participation entails. Participants are also allowed to leave the experimental situation if they become uncomfortable about their participation. After the experiment is concluded, participants must receive a debriefing, in which they are told the exact purpose of their participation in the research and of any deception that may have been used in the process of experimentation.
Confidentiality is another area of concern for psychology. Many experiments involve collecting sensitive information about participants that the participants might not want to be revealed. For this reason, most psychological data is collected anonymously, with the participants’ names not attached to the collected data. If such anonymity is not possible, it is the researcher’s ethical obligation to ensure that names and sensitive information about participants are not revealed.
Pain, both physiological and psychological, is also an issue in experiments. In the past, shock was an acceptable technique with human participants. However, physical pain is infrequently used in experiments today. Psychological stress is also minimized.
The use of animals in psychological experiments is a topic of controversy. According to animal-rights activists, animals often endure both physiological and psychological stress in experiments. Often, the animals are euthanized at the end of the research. Psychologists counter that many lifesaving drugs could not be tested were it not for tests with animals. Moreover, animal models afford a level of experimental control that is not attainable with human participants. Of course, no ethical researcher wants to cause unnecessary pain or discomfort to any subject—animal or human.
Experimental, Correlational, and Clinical Research
bias of selection
healthy user bias
single- or double-blind
Pearson correlation coefficient
Type I error
Type II error
Ethics in Research
Institutional Review Boards (IRBs)
Chapter 6 Drill
See Chapter 19 for answers and explanations.
1. In a double-blind experimental design, which of the following would be true?
(A) The experimental subjects know whether they are in an experimental group or in a control group, but the researchers do not.
(B) The researchers know whether particular subjects have been assigned to an experimental group or a control group, but the experimental subjects do not.
(C) Both the researchers and the experimental subjects know whether the latter have been assigned to an experimental group or a control group.
(D) Neither the researchers nor the experimental subjects know whether the latter have been assigned to an experimental group or a control group.
(E) The observers are unable to see the responses or behaviors of the experimental group during the course of the experimental manipulation.
2. In a normal distribution of scores, approximately what percentage of all scores will occur within one standard deviation from the mean?
3. A Type II error involves
(A) concluding a difference between groups exists after the experimental manipulation when, in fact, a difference does not exist
(B) concluding a difference between groups does not exist after the experimental manipulation when, in fact, a difference does exist
(C) concluding a score is two standard deviations above the mean when, in fact, it is two standard deviations below the mean
(D) concluding a score is two standard deviations below the mean when, in fact, it is two standard deviations above the mean
(E) rejecting the null hypothesis when, in fact, it should have been accepted
4. Which of the following would NOT be considered essential for a proposed research design to meet the requirements for ethicality?
(A) Research subjects must consent to participate in the project, and a full description of what their participation consists of must be spelled out before they are asked to give consent.
(B) Participants must be allowed to withdraw from the project at any time.
(C) Both the subjects and the researchers must know which of the subjects will be part of the experimental group.
(D) If deception is involved, a full debriefing of the subjects must occur soon after the completion of the project.
(E) In keeping with protecting the privacy and confidentiality of the subjects, data should be obtained as anonymously as possible.
5. The correlation between two observed variables is -0.84. From this, it can be concluded that
(A) as one variable increases, the other is likely to increase, showing a direct relationship
(B) as one variable increases, the other is likely to decrease, showing an inverse relationship
(C) the two variables are unrelated
(D) one variable causes the other variable to occur
(E) one variable causes the other variable not to occur
6. A study seeks to find the effects of video games on violent behavior. The researcher creates an experimental design in which 100 random participants play violent video games and another 100 play nonviolent video games for one hour. The researcher then records and observes the behavior of the subjects. The behavior of the subjects is known as the
7. Which of the following is concerned with the real-life applicability of a study?
8. A study that analyzes the effects of heart disease in different regions of the country and socioeconomic statuses is called a
9. A researcher seeks to study the effects of a weight-loss supplement and decides to place an advertisement on buses and subways in New York City to attract subjects. All could happen with this type of subject selection EXCEPT
(D) healthy user bias
10. When graphing the distribution of a study, the researcher notices that a disproportionate number of subjects scored low on their test, shifting the peak of the bell curve she was expecting. This is called a
Respond to the following questions:
· Which topics in this chapter do you hope to see on the multiple-choice section or essay?
· Which topics in this chapter do you hope not to see on the multiple-choice section or essay?
· Regarding any psychologists mentioned, can you pair the psychologists with their contributions to the field? Did they contribute significant experiments, theories, or both?
· Regarding any theories mentioned, can you distinguish between differing theories well enough to recognize them on the multiple-choice section? Can you distinguish them well enough to write a fluent essay on them?
· Regarding any figures given, if you were given a labeled figure from within this chapter, would you be able to give the significance of each part of the figure?
· Can you define the key terms at the end of the chapter?
· Which parts of the chapter will you review?
· Will you seek further help, outside of this book (such as a teacher, Princeton Review tutor, or AP Students), on any of the content in this chapter—and, if so, on what content?