Gender, Nature, and Nurture - Richard A. Lippa 2014
Combining the Results of Many Studies: Meta-Analysis
What's the Difference Anyway?
It is a truism in science that no single study can definitively answer any question, and this is certainly true in the study of sex differences. Are men more physically aggressive than women? No single study can answer this question. Still, many individual studies have addressed this question, either directly or indirectly. To complicate matters, however, various studies have measured different kinds of aggression, and even when they have measured the same kinds of aggression, various studies may have measured aggression differently and with different degrees of precision.
For example, some studies have asked people to report their levels of aggression on questionnaire scales. Psychology experiments have sometimes placed college men and women in settings where they deliver what seem to be painful electric shocks to obnoxious partners during experimental games. Studies of children have asked their parents and teachers to rate them on aggressiveness. Still other studies have analyzed statistics about sex differences in real-life aggressive behaviors, such as criminal assaults and murders. Therefore, when trying to summarize observed sex differences in aggression, social scientists face the problem of combining apples and oranges: different results based on different measures of aggression, obtained in different studies, from different populations, under different circumstances.
This "apples and oranges" problem is not insurmountable. In trying to summarize the results of various studies, researchers can focus their attention on a uniform group of studies (e.g., on experimental studies of aggression conducted on adult participants only). Whichever studies are to be summarized, it is important that researchers scale sex differences the same way across studies. This is why the d statistic is so important. In various studies, if groups of men and women (or boys and girls) have had their aggression measured, it is generally possible to compute a d statistic. Then researchers can average the d statistics from the various studies to see what the average findings are. This technique of quantitatively combining (i.e., numerically averaging) the results of many different studies is called meta-analysis. Over the past 20 years, meta-analysis has become a very important method for reviewing and synthesizing research findings in the social and biological sciences (Hunt, 1999).
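To make the averaging step concrete, here is a minimal Python sketch. It assumes each study reports means, standard deviations, and sample sizes for men and women, computes d as the male-minus-female difference divided by the pooled standard deviation, and then averages d across studies. All function names are hypothetical. The text describes a simple numerical average; many published meta-analyses instead weight each study by the inverse of its sampling variance so that larger studies count for more, so the sketch shows both (using the standard large-sample approximation for the variance of d).

```python
import math

def cohens_d(mean_m, mean_f, sd_m, sd_f, n_m, n_f):
    """Standardized mean difference (male minus female) using the pooled SD."""
    pooled_var = ((n_m - 1) * sd_m**2 + (n_f - 1) * sd_f**2) / (n_m + n_f - 2)
    return (mean_m - mean_f) / math.sqrt(pooled_var)

def d_variance(d, n_m, n_f):
    """Approximate large-sample sampling variance of d."""
    return (n_m + n_f) / (n_m * n_f) + d**2 / (2 * (n_m + n_f))

def mean_d(studies):
    """Simple (unweighted) average of d; studies are (d, n_men, n_women) tuples."""
    return sum(d for d, _, _ in studies) / len(studies)

def weighted_mean_d(studies):
    """Inverse-variance weighted average of d, so larger studies count for more."""
    weights = [1 / d_variance(d, n_m, n_f) for d, n_m, n_f in studies]
    return sum(w * d for w, (d, _, _) in zip(weights, studies)) / sum(weights)

# Two hypothetical studies: (d, number of men, number of women).
studies = [(0.5, 50, 50), (0.1, 200, 200)]
print(round(mean_d(studies), 2))           # simple average
print(round(weighted_mean_d(studies), 2))  # pulled toward the larger study
```

With these invented studies, the simple average is 0.30, while the weighted average (about 0.18) is pulled toward the larger study's d of 0.10.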
In one meta-analysis of 64 experimental studies that reported sex differences in aggression, psychologists Alice Eagly and Valerie Steffen (1986) computed the average value of d across studies to be 0.29, with men tending to be more aggressive than women. This value of d implies that about 39% of women are more aggressive than the average man or, conversely, that about 61% of men are more aggressive than the average woman. Sex differences in aggression were sufficiently consistent across the 64 studies for Eagly and Steffen to conclude that these differences were very unlikely to be due to chance. In the language of statistics, the overall sex difference in aggression was statistically significant. We can therefore conclude with some confidence, based on this synthesis of 64 studies, that men are, on average, somewhat more aggressive than women in experiments on aggression.
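The percentages quoted for d = 0.29 follow from treating men's and women's scores as normal distributions with equal spread whose means are d standard deviations apart. The proportion of men scoring above the women's mean is then the standard normal cumulative probability evaluated at d (sometimes called Cohen's U3). A minimal Python sketch under that normality assumption, with hypothetical function names:

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def percent_exceeding(d):
    """Percent of the higher-scoring group above the other group's mean,
    assuming normal distributions with equal variances (Cohen's U3)."""
    return 100 * normal_cdf(d)

for d in (0.29, 0.53):
    higher = percent_exceeding(d)
    print(f"d = {d:.2f}: {higher:.0f}% of men above the average woman; "
          f"{100 - higher:.0f}% of women above the average man")
```

For d = 0.29 this gives about 61% and 39%, and for d = 0.53 about 70% and 30%. Real score distributions need not be normal or equally variable, so such percentages are approximations.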
Meta-analysis is useful because it not only provides us with the average results of many studies but also helps us understand why results vary across studies. As noted earlier, studies investigating sex differences in aggression differ from one another in their subjects, methods, settings, and measures of aggression. Such differences can be coded (i.e., assessed and quantified based on the published research reports) and then included as factors to be analyzed in a meta-analysis. For example, meta-analyses of sex differences in aggression have coded studies based on whether they studied physical or verbal aggression. Their results showed that sex differences (i.e., d values) are larger in studies that measure physical aggressiveness (d = 0.40) and smaller in studies that measure verbal aggressiveness (d = 0.18). Thus, meta-analyses conclude that men are more aggressive than women and that this difference is more pronounced for physical than for verbal aggression.
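A moderator analysis of this kind amounts to grouping studies by a coded characteristic and summarizing d within each group. The sketch below uses invented d values, chosen only so that the subgroup means match the 0.40 and 0.18 reported above; they are not data from any actual meta-analysis. It uses simple unweighted means for clarity, whereas published analyses typically weight studies and test whether the groups differ more than chance alone would predict.

```python
from collections import defaultdict

# Hypothetical coded studies: (aggression type, d). Illustrative values only.
coded_studies = [
    ("physical", 0.45), ("physical", 0.35), ("physical", 0.40),
    ("verbal", 0.20), ("verbal", 0.16),
]

def mean_d_by_moderator(studies):
    """Group studies by a coded moderator level and average d within each group."""
    groups = defaultdict(list)
    for level, d in studies:
        groups[level].append(d)
    return {level: sum(ds) / len(ds) for level, ds in groups.items()}

for level, m in sorted(mean_d_by_moderator(coded_studies).items()):
    print(f"{level}: mean d = {m:.2f}")
```

This prints a mean d of 0.40 for the physical-aggression studies and 0.18 for the verbal-aggression studies.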
At about the same time that Eagly and Steffen (1986) published their meta-analysis, University of Wisconsin psychologist Janet Shibley Hyde (1986) published another meta-analysis on sex differences in aggression. Hyde reported a somewhat larger mean sex difference in aggressiveness (d = 0.50). For this value of d, only about 30% of females are more aggressive than the average male, and about 70% of males are more aggressive than the average female.
Why the difference between Hyde's findings and those of Eagly and Steffen? One answer is that Hyde's meta-analysis included studies of children, whereas Eagly and Steffen's meta-analysis looked only at studies of adolescents and adults. Indeed, in an earlier meta-analysis, Hyde (1984) broke down studies by subjects' age and found that sex differences in aggression were large in children aged 4 through 5 years (d = 0.86), moderate in children aged 9 through 12 years (d = 0.54), and smallest for college-age subjects (d = 0.27). Findings such as these begin to offer hints about factors that influence sex differences in aggression.
In a still more recent meta-analysis, British psychologist John Archer (Archer & Mehdikhani, 2004) summarized the results of nonexperimental studies of real-life kinds of aggression, as assessed by direct observations of aggression, self-reports, peer reports, and teacher reports. Compared with previous meta-analyses, Archer found somewhat larger sex differences in physical aggression: d = 0.53 for direct observations of aggression, d = 0.39 for self-reports of aggression, d = 0.84 for peer reports of aggression, and d = 0.40 for teacher reports of aggression. The d value for observed aggression implies that 70% of males are more aggressive than the average female.
This tale of three meta-analyses makes an important point: the results of meta-analyses depend in part on the studies that are fed into them. One of the first steps in conducting any meta-analysis is to identify the studies to be reviewed, which ideally include all the studies ever conducted on a given research topic. Identifying studies has been made easier by computerized citation searches. Using a computerized search, a researcher could, for example, search for any study published over the past 10 years that includes in its abstract words or phrases such as gender differences, sex difference, aggression, hostility, and so on. Computer searches, however, are unlikely to locate all of the studies carried out on a given topic. Some studies are never published. Some studies on aggression have looked at sex differences only incidentally; thus, the sex differences they find may not be reported in the study's abstract. Inevitably, computerized searches miss some relevant studies.
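How a keyword search can miss relevant studies can be illustrated with a simplified filter. This is a hypothetical sketch: the records are invented, and the matching rule (at least one sex-difference term and at least one aggression term in the abstract, within a range of publication years) is an assumption that does not reproduce the query syntax of any real citation database.

```python
# Hypothetical records; ids, years, and abstracts are invented for illustration.
records = [
    {"id": "A", "year": 2008, "abstract": "Sex differences in physical aggression among adolescents."},
    {"id": "B", "year": 2012, "abstract": "Gender differences in hostility after provocation."},
    # C reports a sex difference only in its results, not in its abstract.
    {"id": "C", "year": 2010, "abstract": "Effects of alcohol on aggression in a laboratory task."},
    {"id": "D", "year": 1995, "abstract": "Sex differences in verbal aggression."},
]

SEX_TERMS = ("sex difference", "gender difference")
AGGRESSION_TERMS = ("aggression", "hostility")

def matches(abstract):
    """True if the abstract contains a sex-difference term AND an aggression term."""
    text = abstract.lower()
    return (any(term in text for term in SEX_TERMS)
            and any(term in text for term in AGGRESSION_TERMS))

def search(records, first_year, last_year):
    """Return ids of records in the year range whose abstracts match."""
    return [r["id"] for r in records
            if first_year <= r["year"] <= last_year and matches(r["abstract"])]

print(search(records, 2005, 2014))  # ['A', 'B']
```

Study C is missed because its abstract contains no sex-difference term, and study D falls outside the date range. An unpublished study would be missed for a more basic reason: it would not be in the database at all.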
There is a final important point worth making about meta-analytic summaries of sex differences: They may sometimes underestimate effect sizes because of the unreliability of the measures obtained in various studies. This underestimation may be particularly likely for meta-analyses of experimental studies, which tend to use one-shot, one-time, single-act measures. Research shows that aggregated measures (i.e., summed or averaged measures of many behaviors from a particular individual) tend to be more reliable than single measures of behavior (Rushton, Brainerd, & Pressley, 1983). A good example comes from a study of antisocial behaviors in almost 1,000 New Zealand boys and girls (Moffitt, Caspi, Rutter, & Silva, 2001). When sex differences were examined for single measures of antisocial behaviors (e.g., parents' reports, teachers' reports, peers' reports, and self-reports at a given age), their mean magnitude was about d = 0.25, which is considered small. However, when a composite, trait-like measure of antisocial behavior was formed by summing measures from various sources over several ages, the observed sex difference almost doubled to d = 0.49, which is considered a medium-sized effect. Unfortunately, single-source, single-act, and single-time measures are more common in many studies than highly reliable, aggregated measures are; therefore, meta-analyses may often underestimate the sizes of sex differences in gender-related traits when they do not take the unreliability of measurements into account.
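The link between reliability and effect size can be sketched with two standard psychometric results. Random measurement error leaves the difference between group means unchanged but inflates the within-group standard deviation, so an observed d shrinks to roughly the true d times the square root of the measure's reliability; dividing by that square root is the usual correction for attenuation. The Spearman-Brown formula gives the reliability of an aggregate of k parallel measures. The Python below assumes parallel measures with purely random error and equal reliability in both sexes; the true d of 0.60 and single-act reliability of 0.30 are hypothetical values, not estimates from the Moffitt et al. study.

```python
import math

def spearman_brown(r_single, k):
    """Reliability of the sum (or mean) of k parallel measures of reliability r_single."""
    return k * r_single / (1 + (k - 1) * r_single)

def attenuated_d(d_true, reliability):
    """Observed d when random error inflates the within-group SD."""
    return d_true * math.sqrt(reliability)

def disattenuated_d(d_observed, reliability):
    """Correct an observed d for unreliability of the outcome measure."""
    return d_observed / math.sqrt(reliability)

TRUE_D = 0.60       # hypothetical true sex difference
SINGLE_REL = 0.30   # hypothetical reliability of a single-act measure

for k in (1, 8):
    rel = spearman_brown(SINGLE_REL, k)
    print(f"k = {k}: reliability = {rel:.2f}, observed d = {attenuated_d(TRUE_D, rel):.2f}")
```

Under these assumptions, aggregating eight single-act measures raises reliability from 0.30 to about 0.77 and the observed d from about 0.33 to about 0.53, illustrating why aggregated measures tend to reveal larger sex differences. Aggregating across informants and ages may also change what is being measured, so the sketch is not a reconstruction of the New Zealand results.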