## 5 Steps to a 5: AP Psychology - McGraw Hill 2021

# Elementary Statistics

5 Scientific Foundations of Psychology


**Statistics** is a field that involves the analysis of numerical data about representative samples of populations. A large amount of data can be collected in research studies. Psychologists need to make sense of the data. Qualitative data are frequently changed to numerical data for ease of handling. Quantitative data already are numerical.

**Descriptive Statistics**

Numbers that summarize a set of research data obtained from a sample are called **descriptive statistics.** In general, descriptive statistics describe sets of interval or ratio data. After collecting data, psychologists organize the data to create a **frequency distribution,** an orderly arrangement of scores indicating the frequency of each score or group of scores. The data can be pictured as a **histogram**—a bar graph from the frequency distribution—or as a **frequency polygon**—a line graph that replaces the bars with single points and connects the points with a line. With a very large number of data points, the frequency polygon approaches a smooth curve. Frequency polygons are shown in Figure 5.1.
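A frequency distribution can be tallied in a few lines of Python. The sketch below is illustrative only: it borrows the ten quiz scores used later in this section, and `frequency_distribution` is a made-up helper name, not something from the text.

```python
from collections import Counter

def frequency_distribution(scores):
    """Return (score, frequency) pairs ordered by score."""
    counts = Counter(scores)
    return sorted(counts.items())

quiz_scores = [5, 6, 7, 7, 7, 8, 8, 9, 9, 10]
distribution = frequency_distribution(quiz_scores)

# Text histogram: one asterisk per occurrence of each score.
for score, frequency in distribution:
    print(f"{score:>2} | {'*' * frequency}")
```

Each row of asterisks plays the role of one bar in a histogram; connecting the tops of the bars would give the frequency polygon.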

**Figure 5.1 (a) The normal distribution or bell curve. (b) Negatively skewed distribution—skewed to the left. (c) Positively skewed distribution—skewed to the right.**

**Measures of Central Tendency**

Measures of **central tendency** describe the average or most typical scores for a set of research data or distribution. Measures of central tendency include the mode, median, and mean. The **mode** is the most frequently occurring score in a set of research data. If two scores appear most frequently, the distribution is **bimodal;** if three or more scores appear most frequently, the distribution is **multimodal.** The **median** is the middle score when the set of data is ordered by size. For an odd number of scores, the median is the middle one. For an even number of scores, the median lies halfway between the two middle scores. The **mean** is the arithmetic average of the set of scores. The mean is determined by adding up all of the scores and then dividing by the number of scores. For the set of quiz scores 5, 6, 7, 7, 7, 8, 8, 9, 9, 10, the mode is 7; the median is 7.5; the mean is 7.6.

The mode is the least-used measure of central tendency but can be useful to provide a “quick and dirty” measure of central tendency, especially when the set of data has not been ordered. The mean is generally the preferred measure of central tendency because it takes into account the information in all the data points; however, it is very sensitive to extremes (outliers). The mean is pulled in the direction of extreme data points. The advantage of the median is that it is less sensitive to extremes, but it doesn’t take into account all of the information in the data points.

The mean, mode, and median turn out to be the same score in symmetrical, single-peaked distributions. In such a distribution, the two sides of the frequency polygon are mirror images, as shown in Figure 5.1a. The **normal distribution** or normal curve is a symmetric, bell-shaped curve that describes how many human characteristics are distributed in the population. Distributions where most of the scores are squeezed into one end are **skewed.** A few of the scores stretch out away from the group like a tail. The skew is named for the direction of the tail (look for the “tail of the whale”). Figure 5.1b pictures a negatively skewed distribution, and Figure 5.1c shows a positively skewed distribution. The mean is pulled in the direction of the tail, so the mean is lower than the median in a negatively skewed distribution, and higher than the median in a positively skewed distribution. In very skewed distributions, the median is a better measure of central tendency than the mean.
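The three measures for the quiz scores above can be checked with Python's standard `statistics` module. The two extra data sets are hypothetical: a single extreme score (0 or 30) is added only to show how an outlier pulls the mean toward the tail while the median barely moves.

```python
import statistics

quiz_scores = [5, 6, 7, 7, 7, 8, 8, 9, 9, 10]

mode = statistics.mode(quiz_scores)      # most frequently occurring score
median = statistics.median(quiz_scores)  # halfway between the two middle scores
mean = statistics.mean(quiz_scores)      # sum of the scores / number of scores
print(mode, median, mean)  # 7 7.5 7.6

# Hypothetical outliers (not from the text) to illustrate skew.
low_tail = quiz_scores + [0]    # tail to the left: negative skew
high_tail = quiz_scores + [30]  # tail to the right: positive skew

for label, data in (("negative skew", low_tail), ("positive skew", high_tail)):
    print(label, "mean:", round(statistics.mean(data), 2),
          "median:", statistics.median(data))
```

In the first hypothetical set the mean drops below the median; in the second it rises above it, matching the rule that the mean follows the tail.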

**Measures of Variability**

**Variability** describes the spread or dispersion of scores for a set of research data or distribution. Measures of variability include the range, variance, and standard deviation. The **range** is the largest score minus the smallest score. It is a rough measure of dispersion. For the same set of quiz scores (5, 6, 7, 7, 7, 8, 8, 9, 9, 10), the range is 5. **Variance** and **standard deviation (SD)** indicate the degree to which scores differ from each other and vary around the mean value for the set. Both show how tightly scores cluster together and how widely they are dispersed. To determine the variance, compute the difference between each value and the mean, square each difference (to eliminate negative signs), sum the squared differences, and then divide that sum by the number of scores to get the average squared difference. The standard deviation of the distribution is the square root of the variance. For a different set of quiz scores (6, 7, 8, 8, 8, 8, 8, 8, 9, 10), the variance is 1 and the SD is 1.

The standard deviation must fall between 0 and half the value of the range. If the standard deviation approaches 0, scores are very similar to each other and very close to the mean. If the standard deviation approaches half the value of the range, scores vary greatly from the mean. When frequency polygons with the same mean and the same range, but different standard deviations, are plotted on the same axes, their shapes show the difference in variability. The taller and narrower frequency polygon shows less variability and has a lower standard deviation than the shorter and wider one.
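As a rough check of these claims, the sketch below uses the population functions `pvariance` and `pstdev` from Python's `statistics` module, which divide by the number of scores as described above. The set named `spread_out` is hypothetical: it was chosen to have the same mean (8) and range (4) as the second quiz set but with scores pushed toward the extremes.

```python
import statistics

def score_range(scores):
    """Largest score minus smallest score."""
    return max(scores) - min(scores)

first_set = [5, 6, 7, 7, 7, 8, 8, 9, 9, 10]
clustered = [6, 7, 8, 8, 8, 8, 8, 8, 9, 10]   # second quiz set from the text
spread_out = [6, 6, 6, 7, 8, 8, 9, 10, 10, 10]  # hypothetical comparison set

print(score_range(first_set))  # 5

for name, data in (("clustered", clustered), ("spread_out", spread_out)):
    print(name,
          "mean:", statistics.mean(data),
          "range:", score_range(data),
          "variance:", statistics.pvariance(data),
          "SD:", round(statistics.pstdev(data), 3))
```

Both sets share a mean and a range, yet the set with more scores near the extremes has the larger standard deviation, and neither SD exceeds half the range.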

**How to Calculate Standard Deviation by Hand**

1. Calculate the mean.

2. Calculate each deviation. Subtract your mean score from every actual (observed) score.

3. Square each deviation.

4. Add up the squared deviations.

5. Divide that sum by the number of cases in your data. The result, the “average” squared deviation, is the variance. (Dividing by *n* - 1 instead gives the slightly larger sample estimate of the variance.)

6. Finally, calculate the square root of the number calculated in Step 5.
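The hand calculation can be sketched in Python. This version divides the sum of squared deviations by the number of cases by default, which reproduces the variance of 1 and SD of 1 quoted earlier for the scores 6, 7, 8, 8, 8, 8, 8, 8, 9, 10. The function name and the `sample` flag are hypothetical, added here only to show the *n* - 1 alternative.

```python
import math

def standard_deviation_by_hand(scores, sample=False):
    """Compute the SD the long way; set sample=True to divide by n - 1."""
    n = len(scores)
    mean = sum(scores) / n
    # Deviation of each observed score from the mean.
    deviations = [score - mean for score in scores]
    # Squaring removes the negative signs.
    squared_deviations = [d ** 2 for d in deviations]
    sum_of_squares = sum(squared_deviations)
    # Average squared deviation = variance.
    divisor = n - 1 if sample else n
    variance = sum_of_squares / divisor
    return math.sqrt(variance)

quiz_scores = [6, 7, 8, 8, 8, 8, 8, 8, 9, 10]
print(standard_deviation_by_hand(quiz_scores))               # 1.0
print(standard_deviation_by_hand(quiz_scores, sample=True))  # slightly larger
```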

If you are not required to bring a calculator to the exam, you won’t be required to figure out variance or standard deviation.

**Correlation**

Scores can be reported in different ways. One example is the **standard score** or ***z* score.** Standard scores enable psychologists to compare scores that are initially on different scales. For example, a *z* score of 1 for an IQ test might equal 115, while a *z* score of 1 for the SAT in mathematics might equal 600. The mean score of a distribution has a standard score of zero. A score that is one standard deviation above the mean has a *z* score of 1. A standard score is computed by subtracting the mean raw score of the distribution from the raw score of interest and then dividing the difference by the standard deviation of the distribution of raw scores. Another type of score, the **percentile score**, indicates the percentage of scores at or below a particular score. Thus, if you score at the 90th percentile, 90 percent of the scores are the same as or below yours. Percentile scores vary from 1 to 99.
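Both kinds of score are simple to compute. In the sketch below the scale parameters are assumptions that the text does not state: an IQ test with mean 100 and SD 15, and an SAT mathematics section with mean 500 and SD 100. The helper names are likewise hypothetical.

```python
def z_score(raw_score, mean, standard_deviation):
    """Standard score: distance from the mean in standard-deviation units."""
    return (raw_score - mean) / standard_deviation

def percentile_score(score, all_scores):
    """Percentage of scores that are at or below the given score."""
    at_or_below = sum(1 for s in all_scores if s <= score)
    return 100 * at_or_below / len(all_scores)

# Assumed scale parameters (not given in the text).
iq_z = z_score(115, mean=100, standard_deviation=15)
sat_z = z_score(600, mean=500, standard_deviation=100)
print(iq_z, sat_z)  # 1.0 1.0

quiz_scores = [5, 6, 7, 7, 7, 8, 8, 9, 9, 10]
print(percentile_score(9, quiz_scores))  # 90.0
```

Under these assumptions, an IQ of 115 and an SAT math score of 600 sit at the same relative position, one standard deviation above their respective means.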

A statistical measure of the degree of relatedness or association between two sets of data, *X* and *Y*, is called the **correlation coefficient.** The correlation coefficient (*r*) varies from −1 to +1. A value of +1 or −1 indicates a perfect relationship between the two sets of data. If the correlation coefficient is −1, that perfect relationship is indirect or inverse; as one variable increases, the other variable decreases. If the correlation coefficient (*r*) is +1, that perfect relationship is direct; as one variable increases, the other variable increases, and as one variable decreases, the other variable decreases. A correlation coefficient (*r*) of 0 indicates no relationship at all between the two variables. As the correlation coefficient approaches +1 or −1, the relationship between variables gets stronger. Correlation coefficients are useful because they enable psychologists to make predictions about *Y* when they know the value of *X* and the correlation coefficient. For example, if *r* = .9 for scores of students in an AP Biology class and for the same students in an AP Psychology class, a student who earns an A in biology probably earns an A in psychology, whereas a student who earns a D in biology probably earns a D in psychology. If *r* = .1 for scores of students in an English class and scores of the same students in an AP Calculus class, knowing the English grade helps very little in predicting the AP Calculus grade.
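The text does not give a formula for *r*. A minimal sketch, assuming the standard Pearson formula (the sum of the products of paired deviations divided by the square root of the product of the two sums of squared deviations), is shown below with small made-up data sets.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for paired scores x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of paired deviations from the two means.
    co_deviation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_sq_x = sum((a - mean_x) ** 2 for a in x)
    sum_sq_y = sum((b - mean_y) ** 2 for b in y)
    return co_deviation / math.sqrt(sum_sq_x * sum_sq_y)

# Hypothetical paired scores for five subjects.
x = [1, 2, 3, 4, 5]
y_direct = [2, 4, 6, 8, 10]   # rises with x
y_inverse = [10, 8, 6, 4, 2]  # falls as x rises
y_unrelated = [2, 1, 3, 1, 2] # no linear trend

print(pearson_r(x, y_direct))     # perfect direct relationship
print(pearson_r(x, y_inverse))    # perfect inverse relationship
print(pearson_r(x, y_unrelated))  # essentially zero
```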

*Correlation does not imply causation.* Correlation indicates only that there is *a relationship between variables, not how the relationship came about.*

The strength and direction of correlations can be illustrated graphically in **scattergrams** or **scatterplots** in which paired *X* and *Y* scores for each subject are plotted as single points on a graph. The direction of the line that best fits the pattern of points shows the direction of the relationship between the two variables, and how closely the points cluster around that line shows its strength. For a perfect positive correlation (*r* = +1), every point falls on a line that slopes upward, as shown in Figure 5.2a. For a perfect negative correlation (*r* = −1), every point falls on a line that slopes downward, as shown in Figure 5.2b. Where dots are scattered all over the plot and no appropriate line can be drawn, *r* = 0 as shown in Figure 5.2c, which indicates no relationship between the two sets of data.

**Figure 5.2 (a) Scattergram for perfect positive correlation (r = +1.0). (b) Scattergram for perfect negative correlation (r = −1.0). (c) Scattergram for no relationship between two sets of data (r = 0).**

**Inferential Statistics**

Inferential statistics are used to interpret data and draw conclusions. They tell psychologists whether or not they can generalize from the chosen sample to the whole population, that is, whether the sample actually represents the population. Inferential statistics use rules to evaluate the probability that a correlation or a difference between groups reflects a real relationship and not just the operation of chance factors on the particular sample that was chosen for study. **Statistical significance (*p*)** is a measure of the likelihood that the difference between groups results from a real difference between the two groups rather than from chance alone. Results are likely to be statistically significant when there is a large difference between the means of the two frequency distributions, when their standard deviations are small, and when the samples are large. Some psychologists consider that results are significantly different only if the results have less than a 1 in 20 probability of being caused by chance (*p* < .05). Others consider that results are significantly different only if the results have less than a 1 in 100 probability of being caused by chance (*p* < .01). The lower the *p* value, the less likely the results were due to chance. Results of research that are statistically significant may be practically important or trivial. Statistical significance does not imply that findings are really important.
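The text does not name a specific significance test. As one illustration, the sketch below uses a permutation test, a technique chosen here for its simplicity: it asks how often a difference between group means at least as large as the observed one would arise if the scores were reassigned to the two groups in every possible way. The two groups of scores and the function name are made up for the example.

```python
from itertools import combinations

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference between means."""
    pooled = group_a + group_b
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    as_extreme = 0
    regroupings = 0
    # Try every way of choosing which pooled scores count as "group A".
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        difference = abs(sum_a / n_a - (total - sum_a) / n_b)
        if difference >= observed - 1e-12:  # small tolerance for rounding
            as_extreme += 1
        regroupings += 1
    return as_extreme / regroupings

# Hypothetical quiz scores for two groups of eight students.
group_a = [7, 8, 8, 9, 9, 10, 10, 10]
group_b = [4, 5, 5, 6, 6, 7, 7, 8]

p_value = permutation_p_value(group_a, group_b)
print(p_value, "significant at .05" if p_value < 0.05 else "not significant at .05")
```

With a large gap between the group means and little spread within each group, very few regroupings match the observed difference, so the *p* value is small. Whether a difference of that size matters in practice is a separate question.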

**Meta-analysis** provides a way of statistically combining the results of individual research studies to reach an overall conclusion. Scientific conclusions are always tentative and open to change should better data come along. Good psychological research gives us an opportunity to learn the truth.