Cognitive Psychology: Intelligence and Testing
Part V: Content Review for the AP Psychology Exam
STANDARDIZATION AND NORMS
When we use tests designed to measure psychological characteristics, we need to know what the scores mean. For example, if a tester measures your IQ, and you score a 125 on this IQ test, how do you know where your IQ stands relative to the rest of the population? To determine such relative standing, tests are standardized. Standardization is accomplished by administering the test to a standardization sample, a group of people who represent the entire population. The data collected from the standardization sample are used to establish norms, which are standards of performance against which anyone who takes a given test can be compared. Tests need to be restandardized when a new, different population takes the test. The Flynn effect supports the need to restandardize because the data indicate that average performance on intelligence tests has risen over the past 50 years. Thus, an IQ of 100 may mean different things in different years, depending on the standardization sample.
RELIABILITY AND VALIDITY
Tests used to measure any psychological trait or ability must be both reliable and valid. Reliability is a measure of how consistent a test is in the measurements it provides. In other words, reliability refers to the likelihood that the same individual would get a similar score if tested with the same test on separate occasions (disallowing for practice effects or effects due to familiarity with the test items from the first testing). In fact, reliability is often assessed by giving participants a test and later—preferably after they have forgotten the specific items—administering the same test again. The two sets of scores are compared and a correlation coefficient is computed between them. This is called the test-retest method. Tests that are perfectly reliable have a reliability coefficient of one. Reliabilities apply only to groups, however, so that even though a given test is highly reliable, a given individual may show substantial fluctuations in scores.
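The test-retest method described above can be sketched in a few lines of Python. This is only an illustration: the scores are invented, and the helper name pearson_r is an assumption made for the example, not part of any testing standard.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations from each mean
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same five participants on two occasions
first_testing = [98, 105, 112, 120, 131]
second_testing = [101, 103, 115, 118, 133]

# A coefficient close to 1 suggests the test ranks people consistently
test_retest_reliability = pearson_r(first_testing, second_testing)
print(round(test_retest_reliability, 3))
```

Note that the coefficient summarizes the group as a whole; an individual participant's two scores can still differ by several points even when the coefficient is high.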
More Reliability Methods
Other methods of testing reliability include the split-half method, in which the same subjects’ scores on two halves of the same test are correlated, and the equivalent-form method, in which different but similar tests covering the same concepts are given to the same group of subjects and the results are correlated.
Validity refers to the extent to which a test measures what it intends to measure. Validity is assessed by examining how well the results from a test correlate with other measures that assess what the test is supposed to predict. So, for example, if you just developed a new IQ test, and you wanted to know if it was valid, you might compare your results to those that the same participants had achieved on other IQ measures. Even better, you might correlate the IQ test scores with school grades, on the notion that IQ test scores should predict school grades. It is possible to have a test that is reliable but not valid. Such a test consistently measures something, but not what it is intended to measure. However, it is impossible to have a test that is valid but not reliable. If individuals’ scores fluctuate wildly from one testing to the next, then they cannot consistently correlate with scores on any other measure, whatever that measure may be. Internal validity is the degree to which the subject’s results are due to the questions being asked and not another variable. External validity is true validity; that is, the degree to which results from the test can be generalized to the “real world.” In this case, a test would be externally valid if it does, in fact, measure intelligence.
TYPES OF TESTS
Tests used in psychology can be projective tests, in which ambiguous stimuli, open to interpretation, are presented, or inventory-type tests, in which participants answer a standard series of questions.
Two popular projective tests are the Rorschach Inkblot Test and the Thematic Apperception Test (TAT). The Rorschach is a sequence of 10 inkblots, each of which the participant is asked to observe and then characterize. For example, a participant might see one inkblot as a bat or another as two people staring at each other. Sometimes, people see multiple images in a single inkblot. Different aspects of the participant’s descriptions, such as form and movement of objects, are scored to yield an evaluation of the individual’s personality.
The TAT is a series of pictures of people in ambiguous relationships with other people. The participant’s task is to generate a story to accompany the picture. The story includes both what led up to the scene in the picture and what will occur next. Again, the participant’s responses are used to make judgments about his/her personality. Both of these tests are used by followers of the psychoanalytic view of personality. The major criticism of projective personality tests is that the assessment of the responses can be too subjective.
Other Test Types
There are many other types of tests. Power tests gauge abilities in certain areas. These are usually extremely difficult tests in which it is unlikely that a person could answer all the questions correctly. At the other end of the spectrum are speed tests. These have very easy items, but the test is timed, making completion difficult. Achievement tests assess knowledge gained; the Advanced Placement exams are of this type. In contrast to these are aptitude tests, which evaluate a person’s abilities. A road test before getting a driver’s license is an example of an aptitude test.
Inventory-type tests contain fixed answers to questions. They typically do not allow free responses. A classic example is the MMPI-2-RF mentioned in Chapter 15. This test presents the participant with a variety of statements. The participant’s task is to answer “true,” “false,” or “can’t say.” This test, too, yields a characterization of personality. It is often used to diagnose abnormalities.
Intelligence can be defined as goal-directed adaptive thinking. Such thinking is difficult to measure on a standardized test. In fact, the nature of intelligence itself is an issue of contention among psychologists. Few psychologists would claim that the popular “intelligence” tests measure all aspects of intelligence. Alfred Binet was a French psychologist who first began to measure children’s intelligence for the French government. Binet’s test measured the “mental age” of school-age children so that children needing extra help could be placed in special classrooms. An American psychologist and Stanford University professor named Lewis Terman modified Binet’s test to create a test commonly referred to as the Stanford-Binet Test. Intelligence testing first became widespread during World War I, when the United States Army used group intelligence tests (the Army Alpha and Beta) to rank recruits. Most modern intelligence tests yield a score called the IQ, or intelligence quotient.
This quotient originally was conceived of as a ratio of mental age over chronological (physical) age, multiplied by 100. Mental age is a measure of performance based on comparing the participant’s performance to that of an “average” person of a given age. Therefore, if you take a test and your score is comparable to that of an average 10-year-old, then your mental age is 10. IQ scores are normally distributed, with a mean, median, and mode of about 100, and a standard deviation of 15 or 16 points.
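The original ratio formula can be worked through with a short example. The function name ratio_iq and the ages used are illustrative assumptions only.

```python
def ratio_iq(mental_age, chronological_age):
    """Original ratio IQ: mental age over chronological age, times 100."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return 100 * mental_age / chronological_age

# An 8-year-old who performs like an average 10-year-old
print(ratio_iq(10, 8))   # 125.0

# A 10-year-old who performs like an average 10-year-old
print(ratio_iq(10, 10))  # 100.0

# A 10-year-old who performs like an average 8-year-old
print(ratio_iq(8, 10))   # 80.0
```

The formula makes sense for children, whose mental age rises with chronological age, but it breaks down for adults: a 40-year-old who performs like an average 20-year-old would receive a ratio IQ of 50.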
The most common intelligence tests given to children today are the Stanford-Binet Intelligence Scale and the Wechsler Intelligence Scale for Children (WISC-IV). There is also a version of the Wechsler specifically geared toward adults, the Wechsler Adult Intelligence Scale (WAIS). The WISC-IV and WAIS generally have six types of questions: information (how many wings does a bird have?), comprehension (what is the advantage of keeping money in a bank?), arithmetic (if 3 pencils cost $1, what will be the cost of 15 pencils?), similarities (in what ways are seals and sea lions alike?), vocabulary (what does “retain” mean?), and digit span questions in which subjects are asked to hold information in short-term memory.
Measuring IQs Today
Today, IQs are rarely computed as quotients, but rather are computed on the basis of the extent to which a person’s score is above or below the average.
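One way to picture this kind of scoring is to convert a raw score into a distance from the norm-group average, measured in standard deviations, and then rescale it to the familiar IQ scale. The sketch below assumes a mean of 100 and a standard deviation of 15; the raw-score mean and standard deviation of the norm group are hypothetical numbers chosen for the example.

```python
def deviation_iq(raw_score, norm_mean, norm_sd, iq_mean=100, iq_sd=15):
    """Rescale a raw score to the IQ scale using the norm group's mean and SD."""
    # z-score: how many standard deviations above or below the average
    z = (raw_score - norm_mean) / norm_sd
    return iq_mean + iq_sd * z

# Hypothetical norm group: raw-score mean of 50, standard deviation of 10
print(deviation_iq(65, 50, 10))  # 122.5 (1.5 SDs above average)
print(deviation_iq(50, 50, 10))  # 100.0 (exactly average)
print(deviation_iq(45, 50, 10))  # 92.5  (0.5 SDs below average)
```

Because the result depends on the norm group's mean and standard deviation, the same raw score can translate into a different IQ when the test is restandardized.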
There has been an ongoing debate as to whether intelligence is one specific set of abilities or many different sets of abilities. In the early part of the 20th century, Charles Spearman proposed that there was a general intelligence (or g factor) that was the basis of all other intelligence. The g factor is the intelligence applied across mental activities, which is close to the standard definition for “intelligence.” The s factor is the breakdown of this intelligence into a specific component, such as one’s ability to process math equations or linguistic puns. Spearman used factor analysis, a statistical technique for analyzing test data. Robert Sternberg proposed that intelligence could be more broadly defined as having three major components: analytical, practical, and creative intelligence. Louis Thurstone, a researcher in the field of intelligence, posited that we need to think of intelligence more broadly because intelligence can come in many different forms. The most famous proponent of the idea of multiple intelligences is Howard Gardner of Harvard University. Gardner has identified the following types of intelligence: verbal and mathematical (these are the two traditionally measured by IQ tests) as well as musical, spatial, kinesthetic, environmental, interpersonal (perceptive about other people), and intrapersonal (insightful, self-aware). Daniel Goleman, a psychologist at Rutgers, has done recent work on the importance of emotional intelligence (being able to recognize people’s intents and motivations) and has created programs for enhancing one’s emotional intelligence.
Heredity/Environment and Intelligence
Nature and nurture interact in the formation of human intelligence. One way to measure the influence of inheritance on IQ is through a heritability coefficient. This coefficient, which ranges from 0 to 1, is a rough measure of the proportion of variation among individuals that can be attributed to genetic effects. Heritability is sometimes computed by comparing the IQs of identical twins who were raised separately. The assumption is that because the identical twins have identical genes, all variation in identical twins reared apart must be due to environment. Of course, the assumption is rarely completely met because identical twins are usually not separated at birth and even if they are, they still have shared the intrauterine environment of the mother. This type of analysis typically yields heritability coefficients of about 0.6 to 0.8 (on a scale of 0 to 1.0). The percentage not due to heritability can be attributed to the environmentality of a particular trait. When psychologists compare the IQs of identical twins raised together to those of fraternal twins raised together, the resulting heritability coefficient is about 0.75. This analysis assumes that families and people outside families treat identical and fraternal twins in the same way, an assumption that seems questionable. Many psychologists believe that the true heritability coefficient for IQ is about 0.5. On that estimate, half of the variation among people is due to heredity, half to environment. It is important to realize that the heritability of a trait has nothing to do with its modifiability. For example, height is highly heritable, but heights have been increasing over the past several generations, especially in certain Asian countries such as Japan, as a result of changing diet. Here’s a helpful analogy to illustrate modifiability of intelligence and the interplay of nature and nurture: think of nature as the soil in which intelligence can grow and nurture as the degree of care for the crop.
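The passage does not say how the comparison of identical and fraternal twins is turned into a number. One common textbook approximation, Falconer's formula, doubles the difference between the two twin correlations. The sketch below assumes that formula, and the correlations in it are hypothetical values chosen to land on the 0.75 figure quoted above, not data from any study.

```python
def falconer_heritability(r_identical, r_fraternal):
    """Falconer's approximation: twice the gap between the twin correlations."""
    return 2 * (r_identical - r_fraternal)

def environmentality(heritability):
    """Share of variation not attributed to heredity."""
    return 1 - heritability

# Hypothetical IQ correlations for twins raised together
r_identical = 0.85   # identical twins (assumed value)
r_fraternal = 0.475  # fraternal twins (assumed value)

h2 = falconer_heritability(r_identical, r_fraternal)
print(round(h2, 3))                    # 0.75
print(round(environmentality(h2), 3))  # 0.25
```

The doubling reflects the fact that fraternal twins share, on average, about half as much genetic variation as identical twins. If identical twins are treated more alike than fraternal twins, the gap between the two correlations grows for environmental reasons and the estimate is inflated, which is the concern the passage raises.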
As previously stated, IQs are roughly normally distributed. As a result, a large majority of people will have an IQ near 100. However, in a normal distribution, there will also be a small number of people at the high and low ends of the IQ range.
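To make "a large majority" and "a small number" more concrete, the sketch below models IQ as an exactly normal distribution with a mean of 100 and a standard deviation of 15. That is an idealization; real score distributions are only roughly normal.

```python
from statistics import NormalDist

# Idealized IQ distribution (assumed mean 100, standard deviation 15)
iq = NormalDist(mu=100, sigma=15)

# Share of people within one standard deviation of the mean (85 to 115)
share_within_one_sd = iq.cdf(115) - iq.cdf(85)
print(round(share_within_one_sd, 3))  # 0.683

# Share of people more than two standard deviations above the mean
share_above_130 = 1 - iq.cdf(130)
print(round(share_above_130, 3))      # 0.023

# Score that separates the top 1 percent from everyone else
cutoff_99th = iq.inv_cdf(0.99)
print(round(cutoff_99th, 1))          # 134.9
```

Under these assumptions, roughly two-thirds of people score between 85 and 115, and only about 1 person in 100 scores above about 135.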
Very high IQs are one basis for considering people to be intellectually gifted. Sometimes, an IQ in the 99th percentile (higher than about 135) is considered “gifted,” although there is no set standard. Moreover, other measures besides IQ should be used in assigning a label of “gifted.” Lewis Terman conducted a study of gifted children, following them into adulthood. Many of the participants went on to be very successful; however, part of their success may have been due to the socioeconomic status of their parents. Other factors unrelated to IQ may also have influenced the ability of the participants to succeed.
Between the DSM-IV and DSM-5, the term “mental retardation” became “intellectual disability.” It’s unlikely that you’ll see the outdated former term on the AP Psychology Exam, but better safe than sorry!
Intellectual disability refers to low levels of intelligence and adaptive behavior. Low IQ alone does not signify this. To be classified as intellectually disabled, a person must also demonstrate a low level of adaptive competence, or the ability to get along in the world. Intellectual disability can be categorized by severity ranging from mild, with an IQ range of 50 to 70, to profound, characterized by an IQ lower than 25.
ETHICS IN TESTING
Those who are involved in psychometrics, or psychological testing, must be sure that they follow certain guidelines. Confidentiality must be protected. The purposes of the test must be clear to those administering and those taking the test. At each research institution, a group of individuals sits on the Institutional Review Board, which combs through the proposed methodology of a study to determine whether a scientist’s research may involve any unethical behavior or adverse consequences before he or she is granted permission to perform any experiments. Questions should be asked and answered concerning who will see the results of the test and how the scores will be used. Furthermore, the impact of the scores should be ascertained before the test is given.
KEY TERMS
Standardization and Norms
Reliability and Validity
Types of Tests
Rorschach Inkblot Test
Thematic Apperception Test
intelligence quotient (IQ)
Stanford-Binet Intelligence Scale
Wechsler Intelligence Scale for Children
Wechsler Adult Intelligence Scale
Ethics in Testing
Chapter 12 Drill
See Chapter 19 for answers and explanations.
1. In the context of psychometric testing, content validity is defined as
(A) the extent to which the test actually measures what it is purported to measure
(B) the degree to which there is a correlation between results on the test and future performance on another measure
(C) the degree to which the test will yield similar results across administrations
(D) the extent to which scores on two versions of the test are highly correlated
(E) the degree to which scores on two sections of the same test are consistent with each other
2. Which of the following is an example of a projective test?
(A) The Stanford-Binet Intelligence Scale
(B) The Thematic Apperception Test (TAT)
(C) The Minnesota Multiphasic Personality Inventory (MMPI)
(D) The Strong Vocational Interest Blank
3. On a normal score distribution, an IQ score of 85 would be located
(A) approximately one standard deviation above the mean
(B) approximately one standard deviation below the mean
(C) approximately two standard deviations above the mean
(D) approximately two standard deviations below the mean
(E) in a variable position—it would depend on the age of the respondent
4. Test standardization is accomplished by
(A) administering the test to a sample chosen to reflect the characteristics of the population in question
(B) administering different parts of the test to different samples meant to reflect different populations
(C) correlating the results on the test with results on other tests that claim to measure the same dimension
(D) correlating the consistency of scores given by different sets of graders
(E) equilibrating the number of times each answer choice appears
5. Which of the following is NOT a dimension of intelligence in Howard Gardner’s theory of multiple intelligences?
(A) how consistently the test holds up over time
(B) how well the test measures what it means to test
(C) a way in which you can be sure that an experiment tests only one variable at a time
(D) how consistently an individual will score on the same test on subsequent occasions
(E) to what extent the findings of a study can be generalized to the whole population
7. A multiple-choice or a true/false question test is an example of a(n)
8. Which of the following is necessary for a test to be ethical?
(B) Full disclosure of deception
Respond to the following questions:
· Which topics in this chapter do you hope to see on the multiple-choice section or essay?
· Which topics in this chapter do you hope not to see on the multiple-choice section or essay?
· Regarding any psychologists mentioned, can you pair the psychologists with their contributions to the field? Did they contribute significant experiments, theories, or both?
· Regarding any theories mentioned, can you distinguish between differing theories well enough to recognize them on the multiple-choice section? Can you distinguish them well enough to write a fluent essay on them?
· Regarding any figures given, if you were given a labeled figure from within this chapter, would you be able to give the significance of each part of the figure?
· Can you define the key terms at the end of the chapter?
· Which parts of the chapter will you review?
· Will you seek further help, outside of this book (such as a teacher, Princeton Review tutor, or AP Students), on any of the content in this chapter—and, if so, on what content?