Deconstructing and Reconstructing Masculinity—Femininity - Masculinity and Femininity: Gender Within Gender

Gender, Nature, and Nurture - Richard A. Lippa 2014

Before describing my approach to masculinity and femininity in more detail, let's first pause and take stock of where we have been and consider the state of the field in recent years.


By the 1980s and 1990s, scholarly respect for the concepts of masculinity and femininity was clearly in decline. These traits were regarded as stereotypes more in people's heads than as real characteristics of people. Feminist psychologists ridiculed Terman and Miles' bipolar approach, and many seriously questioned androgyny research as well (Bem, 1993; Lewin, 1984a, 1984b; Morawski, 1987). A consensus was emerging that gender does not comprise core traits of the individual but rather is a social construction manufactured and sustained by stereotypic beliefs and social settings (Deaux & LaFrance, 1998). This position proposes that differences in the behavior of men and women result largely from people's beliefs about gender (e.g., men are better at math), which then become self-fulfilling prophecies (see Chapters 3 and 5). Gender differences are further enforced by patriarchal (i.e., male dominated and male favoring) social structures, which give men more power than women. Social roles also serve to create and reinforce gender differences when they encourage instrumental behaviors in men (e.g., in the role of worker) and expressive behaviors in women (e.g., in the roles of mother and homemaker). Stated simply, gender is something that is done to us by society, not something we are born with.

What was the evidence for the social constructionist position? Many studies have suggested that gender-related traits and behaviors (nonverbal mannerisms, dress, interests, abilities, and personality traits such as assertiveness and nurturance) are only weakly interrelated and quite variable across situations. Richard Ashmore (1990) offered a loose glue metaphor; that is, the different components of gender (interests, attitudes, abilities, sexuality) do not really hang together very well. Janet Spence (1993) echoed this in her multifactorial theory of gender: "... knowing that a person... enjoys cooking tells us little about how much the person likes or dislikes studying math" (Spence & Buckner, 1995, p. 120).

If the various components of gender do not hang together very well, then the scientific case for masculinity and femininity seems to be in trouble because the defining feature of masculinity and femininity—indeed, of any personality trait—is that people show cohesive patterns of behavior that are consistent over time and across settings. Many recent gender researchers have argued that people do not behave in consistently masculine or feminine ways. Theorists like Sandra Bem (1987, 1993) have asserted the strong constructionist position that masculinity and femininity are all in our heads. Janet Spence and Camille Buckner (1995) went so far as to suggest that the terms masculinity and femininity should be abolished from the scientific vocabulary.

Resurrecting Masculinity—Femininity: Gender Diagnosticity

By the late 1980s I too was dissatisfied with existing approaches to masculinity and femininity. On the one hand, I was sympathetic to arguments that masculinity and femininity are social and cultural constructs. These traits do seem to possess a kind of fluidity that's hard to pin down. What's masculine in one historical era (e.g., long hair on men) may be feminine in another. And what's feminine in one culture (e.g., being a doctor) may be masculine in another.

On the other hand, masculinity and femininity still made sense to me as a lay person. As I observed people around me, I had the clear impression that some men were indeed more masculine than others and that some women were more feminine than others. For me, the paradox then became: How can these traits be real and consequential, but at the same time culturally and historically variable? In nitty-gritty research terms the question became: How can research psychologists measure these traits, which seem so apparent to the untrained eye, yet so hard to pin down scientifically?

To answer these questions, I devised a new approach to measuring masculinity and femininity, an approach I termed gender diagnosticity (GD). This approach was a kind of compromise between essentialist and social constructionist views of M-F. The GD approach holds that masculinity-femininity exists and can be measured, but at the same time it varies somewhat over time and across groups and cultures.

What exactly is GD? It refers to the estimated probability that a person is male or female, based on some piece of gender-related information about the person. Examples of gender-related pieces of information include "this person wants to be a kindergarten teacher" or "this person has short hair." The gender diagnostic probability serves as a measure of masculinity or femininity within the sexes. The GD approach harks back to the bipolar approach to M-F in that it assumes that information that distinguishes the sexes can serve to measure masculinity and femininity within the sexes. However, it differs from the older bipolar approach in that it allows the information that defines masculinity and femininity to change over historical time and across different groups. The reason this is possible is that the GD approach always calibrates masculinity and femininity against a particular group of men and women (or boys and girls) at a particular time in history. In other words, it establishes local standards of masculinity and femininity.

An example makes this clearer. Suppose I place a person wrapped in a burlap bag before you. This ensures that you do not know whether the person is male or female. Then I give you just one piece of information: This individual is aggressive. I then pose the question, "What is the probability that this person is male?" Your estimate is to be based on actual data. For example, you conduct a survey in which you ask a group of 100 men and 100 women in your neighborhood to rate whether they are aggressive or not. You are then in a position to compute the likelihood that the aggressive person in burlap is male or female. Suppose your study shows that 60 men and 40 women in your neighborhood labeled themselves as aggressive. Of the 100 people who called themselves aggressive, 60 are men. If the aggressive person wrapped in burlap is from your neighborhood, then the probability is 60% that the person is a man and 40% that the person is a woman.
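The burlap-bag arithmetic can be sketched in a few lines of Python. This is only a minimal illustration of the single-item case, not the procedure used in the published research. The function name `gd_probability` and the `prior_male` parameter are assumptions introduced here; the default prior of 0.5 reflects the equal numbers of men and women in the neighborhood survey.

```python
def gd_probability(men_yes, men_total, women_yes, women_total, prior_male=0.5):
    """Estimate P(male | trait) from survey counts using Bayes' rule.

    prior_male is the assumed share of men in the group the hidden
    person is drawn from (0.5 for the 100-men, 100-women survey).
    """
    rate_men = men_yes / men_total        # P(trait | male)
    rate_women = women_yes / women_total  # P(trait | female)
    numerator = rate_men * prior_male
    denominator = numerator + rate_women * (1 - prior_male)
    if denominator == 0:
        raise ValueError("nobody in the sample reported the trait")
    return numerator / denominator


# Survey from the text: 60 of 100 men and 40 of 100 women say "aggressive".
p_male = gd_probability(60, 100, 40, 100)
p_female = 1 - p_male
print(f"P(male | aggressive)   = {p_male:.2f}")
print(f"P(female | aggressive) = {p_female:.2f}")
```

With equal group sizes the calculation reduces to 60 / (60 + 40). Writing it in the general form makes it clear that the answer would shift if the group contained unequal numbers of men and women.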

This is the essence of GD. Clearly, gender diagnostic probabilities vary depending on the piece of information used to diagnose gender and also depending on the group of people you are studying. For example, if I told you that the person in the burlap bag is a Michigan State University student (the group being studied) who wants to be an electrical engineer (the piece of information), what would you estimate the probability to be that this person is male? To answer this question empirically, you would have to know the relative proportions of Michigan State men and women who actually want to be electrical engineers. (Even without knowing this information, what would you guess is the probability that this person is male?)

Once again, GD is the computed probability that a person is predicted (diagnosed) to be male or female based on some kind of gender-related information. In my research, I typically compute GD probabilities (GD scores) based on people's occupational and hobby preferences, using a statistical procedure called discriminant analysis. (For the technical details, see Lippa and Connelly [1990].) I compute these probabilities based on multiple pieces of information, for example, individuals' rated preferences for 70 different occupations. This allows me to compute reliable GD scores. Recall from our earlier discussion that good tests include many items so that they will yield reliable scores. Still, the basic concept remains the same: gender diagnosticity is the computed probability that a person is male or female based on a set (rather than a single piece) of gender-related information.
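To give a feel for how several ratings can be combined into one probability, here is a rough sketch of a two-group linear discriminant function with equal prior probabilities. It is a simplified stand-in and should not be read as the exact procedure in Lippa and Connelly (1990). It uses only two items instead of 70, and the ratings, the item labels (mechanic, florist), and the function names are all hypothetical.

```python
import math


def mean_vector(rows):
    """Column means of a list of (item1, item2) ratings."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def pooled_covariance(men, women, mean_m, mean_w):
    """Pooled within-group 2x2 covariance matrix."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, mu in ((men, mean_m), (women, mean_w)):
        for r in rows:
            dev = [r[0] - mu[0], r[1] - mu[1]]
            for a in range(2):
                for b in range(2):
                    s[a][b] += dev[a] * dev[b]
    dof = len(men) + len(women) - 2
    return [[s[a][b] / dof for b in range(2)] for a in range(2)]


def fit_gd(men, women):
    """Return discriminant weights and bias, assuming equal priors."""
    mean_m, mean_w = mean_vector(men), mean_vector(women)
    (a, b), (c, d) = pooled_covariance(men, women, mean_m, mean_w)
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    diff = [mean_m[0] - mean_w[0], mean_m[1] - mean_w[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    mid = [(mean_m[0] + mean_w[0]) / 2, (mean_m[1] + mean_w[1]) / 2]
    bias = -(w[0] * mid[0] + w[1] * mid[1])
    return w, bias


def gd_score(ratings, w, bias):
    """Posterior probability of being male given the ratings."""
    z = w[0] * ratings[0] + w[1] * ratings[1] + bias
    return 1 / (1 + math.exp(-z))


# Hypothetical 1-5 preference ratings for (mechanic, florist); not real data.
men = [(5, 1), (4, 2), (4, 1), (3, 3), (5, 2), (2, 4)]
women = [(1, 5), (2, 4), (2, 5), (3, 3), (1, 4), (4, 2)]

w, bias = fit_gd(men, women)
for ratings in [(5, 1), (3, 3), (1, 5)]:
    print(ratings, round(gd_score(ratings, w, bias), 2))
```

In this toy reference group, a person who likes the first occupation and dislikes the second receives a score well above 0.5, a person with the reverse pattern receives a score well below 0.5, and a person who rates both the same lands near 0.5.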

Unlike M-F scores, GD probabilities are always computed anew for a particular group of men and women. For example, the GD score of a college student at Michigan State University would be computed in comparison to a group of Michigan State men and women. Because GD measures are computed for particular groups of people, the way M-F is defined may vary from group to group. This is true because pieces of information that distinguish men and women in one group may not do so in another group.

Again, a concrete example helps illustrate this point. When Terman and Miles conducted their classic research over 60 years ago, college men and women showed a large difference in their desire to be lawyers, with men expressing greater interest in law than women. However, today this same piece of information is often not gender diagnostic. Contemporary college men and women do not differ much in their expressed interest in law as a profession. The moral of the story? We cannot necessarily use items that were gender diagnostic in the 1930s to measure masculinity and femininity at the start of the 21st century.
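The point that an item's diagnosticity depends on the reference group can be made concrete with a small sketch. The endorsement percentages below are invented purely for illustration and are not taken from Terman and Miles or from any contemporary survey. The function name `item_diagnosticity` is likewise an assumption, and the sketch assumes each cohort contains equal numbers of men and women.

```python
def item_diagnosticity(pct_men, pct_women):
    """P(male | endorses item), assuming equal numbers of men and women."""
    return pct_men / (pct_men + pct_women)


# HYPOTHETICAL percentages of men and women interested in law; not real data.
cohorts = {
    "earlier cohort": (30, 5),
    "later cohort": (12, 11),
}

for name, (pct_men, pct_women) in cohorts.items():
    p = item_diagnosticity(pct_men, pct_women)
    print(f"{name}: P(male | interested in law) = {p:.2f}")
```

In the first made-up cohort the item separates the sexes sharply, so it would contribute to a GD score. In the second the probability sits close to 0.5, so the same item carries almost no information about masculinity or femininity.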

Consider another example. Wearing pants was undoubtedly more gender diagnostic 100 years ago than it is today. Many women wear pants today; however, few did in the late 1800s and early 1900s. Because wearing pants was more gender-diagnostic then than now, it was probably a better indicator of masculinity (at least for women) then than it is today. A woman who wore pants in the 1800s was probably viewed as extremely masculine. Today, a woman who wears pants may be seen as quite feminine, in the United States, at least. This qualification ("in the United States") suggests another interesting point. The behavior of wearing pants is probably more gender diagnostic in some countries and cultures (e.g., in Egypt) than in others (e.g., in the United States). Thus, wearing pants may signal a woman's masculinity more in some cultures than in others.

Once again, the GD approach computes the probability that a person is male or female, based on pieces of information that distinguish men and women in a particular group, in a particular culture, during a particular historical period. Another way of saying this is that the GD approach computes how male-like or female-like an individual is, compared to some local reference group of men and women, using some pieces of information that distinguish these men and women. The advantage of the GD approach is that it acknowledges that masculinity and femininity are, to some extent, historically and culturally relative.

Despite the fact that masculinity and femininity sometimes display themselves differently in different groups and cultures, the GD approach nonetheless asserts that individual differences in masculinity and femininity can be measured. In virtually all cultures and in all historical eras, there are some behaviors that are more typical of men and others that are more typical of women. If we measure individuals on those behaviors, we can compute the likelihood that a person is male or female based on these behaviors. That is, we can measure how male-typical or female-typical that person's behavior is for people in that culture.

Although it is true that some indicators of masculinity and femininity vary substantially over time and across cultures, it is also likely that some indicators do not. For example, the question "How interested are you in being an electrical engineer?" was highly gender diagnostic in the 1930s in the United States, and it remains so today. Of course, this does not mean that men's and women's relative interest in being electrical engineers will never change in the future. However, it does suggest that some pieces of information may diagnose gender more consistently over time and place than others. Although the content of masculinity and femininity may fluctuate (as proposed by social constructionists), it may also have some consistency (as proposed by essentialists).

As a matter of convention, gender diagnostic probabilities are computed to be the individual's probability of being male, which is simply one minus the probability of being female. Thus, by convention, high probabilities mean that the individual is more male-like and low probabilities mean the individual is more female-like. Let's bring back our person in the burlap bag one last time. On a questionnaire, this individual has expressed a strong interest in being a Secret Service agent, a police officer, an auto mechanic, a truck driver, and an Army officer, but a strong dislike for being a florist, a nurse, an elementary school teacher, a professional dancer, or a librarian. What's your best estimate of the probability that this individual is male?

By comparing this person's occupational preference ratings with the ratings of a particular group of men and women, I can actually compute this probability. I have no doubt that, if computed for most groups of men and women in our society today, this person's GD score would be high (say 0.90). That is, this individual is very likely to be male. If a person's GD score is around 0.50, then his or her occupational preferences are neither strongly male-typical nor strongly female-typical; in other words, we are not sure about the individual's gender based on the occupational preference information. Finally, if a person's GD score is low (say 0.20), then the person is likely to be female. A person receiving a low GD score is very female-typical. In short, he or she is feminine.

Note that a man can receive a low GD score. A low score simply means that the man's occupational preferences are more female-like than male-like when compared to some larger group of men and women. Similarly, a woman can receive a high GD score; that is, she can be relatively male-like in her occupational preferences when compared to some larger group of men and women. Indeed, the whole purpose of computing GD scores is not to actually diagnose who is male and female. Real people, after all, are not wrapped in burlap bags. We usually know immediately whether they are male or female. The purpose of computing GD scores is to assess how male-like or female-like a particular man or woman is, that is, to measure individual differences in M-F.

What Is Gender Diagnosticity Related to?

Does the GD approach buy us anything more than previous approaches to masculinity and femininity have? I believe the answer to this question is, "yes." Many studies have shown that GD measures are reliable (Lippa, 1991, 1995b, 1998b; Lippa & Connelly, 1990); furthermore, GD does not correlate much with instrumentality, expressiveness, or the Big Five personality traits. Thus GD does not suffer from the "old wine in new bottles" problem. GD correlates moderately with bipolar M-F scales (Lippa & Hershberger, 1999; Lippa, Martin, & Friedman, 2000). However, GD often shows superior validity to these scales (Lippa, 1991, 1998b, 2001b).

To demonstrate the validity of a measure, researchers must show that it is related to traits, behaviors, and ratings that make theoretical sense. The most obvious way to show that a new measure of masculinity and femininity is valid is to demonstrate that it is related to lay people's judgments of their own and others' masculinity and femininity. Recall that Terman and Miles failed to demonstrate this with their early M-F test. In contrast, several studies have shown that GD is related to lay judgments of masculinity and femininity. In one of these studies, I asked 119 college men and 145 college women to rate how masculine and feminine they considered themselves to be. These ratings were then correlated with their M, F, and GD scores. The results showed that the men's and women's GD scores predicted their self-rated masculinity-femininity better than M or F did (Lippa, 1991; see also Lippa, Linke, & Killingback, 2004).

Another study investigated the relationship between men's and women's GD scores and their nonverbal masculinity-femininity as judged by others (Lippa, 1998c). Thirty-four college men and 33 college women were briefly videotaped as they gave talks. Research assistants then viewed these videotapes and rated how masculine and feminine the college students appeared to be, based on their appearance, movements, and vocal style. The results showed that the videotaped students' GD scores significantly predicted how masculine and feminine they were judged to be, again better than their M or F scores did.

A third study asked 37 college men and 57 college women to create autobiographical photo essays (Lippa, 1997). Each student took 12 photographs that showed who they are and assembled them into a booklet with captions. Research assistants then read the photo essays and rated how masculine and feminine the students seemed to be, based on the information in their photo essays. The results showed that college men's GD scores strongly predicted how masculine and feminine they were judged to be, again, much more strongly than their M or F scores did. However, women's GD (and M and F) scores only modestly predicted their rated masculinity and femininity. These different results for women seemed to reflect the fact that women's judged masculinity-femininity was influenced by their physical attractiveness. Women were judged to be feminine based on how pretty they were, not on the degree to which they displayed feminine behaviors and interests in their photo essays.

Additional validity studies have addressed whether GD is related to psychological adjustment, physical health, sexual orientation, scholastic ability, and intelligence. Let's start with GD and adjustment. In two separate studies, I measured large groups of college students on gender diagnosticity, masculinity, and femininity and examined whether these traits were related to various measures of psychological adjustment (Lippa, 1995b). Recall that previous research often focused on self-esteem, anxiety, and depression as indices of psychological adjustment. Many studies have shown that all these seemingly different measures in fact tap one broad, underlying personality factor, which is called neuroticism or negative affectivity. (Negative affectivity means negative emotionality; see Watson & Clark, 1984, 1997). Like earlier studies, my study included various measures of negative affectivity. However, I also included measures of aggressiveness, meanness, overbearingness, vindictiveness, and unassertiveness. In one study, I included a measure of authoritarianism, a trait linked to rigid, conventional thought patterns and prejudice against minority groups (see Chapter 1).

What were the results? Measures of masculinity and femininity (instrumentality, expressiveness, and GD) were in fact related to various kinds of adjustment. However, each measure related to different kinds of adjustment. People who were high on masculinity tended to be aggressive and overbearing (showing negative adjustment), but they also tended to be appropriately assertive and low on neuroticism (positive adjustment). People who were high on femininity tended to be overly involved with others and too easily taken advantage of (negative adjustment), but they also tended to be agreeable (positive adjustment). Thus, instrumentality and expressiveness (i.e., masculinity and femininity) prove to be two-edged swords in the sense that they are linked to both positive adjustment and negative traits. In contrast, GD is related to only one kind of maladjustment, and this finding was true for men only. High-GD (i.e., masculine interest) men tended to be authoritarian. This result was bolstered by data showing that high GD in men was associated with increased prejudice against gay people and negative attitudes toward women's rights (see also Lippa & Arad, 1999).

A recent study by Robert Young and Helen Sweeting (2004) similarly pointed to the conclusion that instrumentality, expressiveness, and GD are linked to different kinds of adjustment, this time in 15-year-old Scottish boys and girls. This study assessed instrumentality, expressiveness, and GD (based on self-reported leisure and sports activities) in 1,116 male and 1,080 female secondary school students. Participants reported whether they bullied other students or were victimized by bullies, and they also completed scales that measured their overall levels of self-esteem and depression.

The results? Students who were high on instrumentality were more likely to bully others, whereas students who were high on expressiveness were less likely to bully others. Consistent with previous research, instrumentality, in both boys and girls, was associated with higher self-esteem. An interesting new finding was that, among both boys and girls, male-typical GD scores (i.e., masculine interests) were associated with less depression. Finally, gender-atypical boys (i.e., boys whose GD scores placed them among the most feminine 10% of boys) were significantly more likely to be bullied by other students and to report being lonely. These findings suggest that gender-nonconforming adolescent boys (those with very feminine GD scores) may be more likely to suffer victimization and social isolation than less feminine boys. In sum, instrumentality, expressiveness, and GD were all linked to various kinds of adjustment and maladjustment in high school boys and girls; however, each measure tended to be linked to different kinds of adjustment and maladjustment.

Following Terman and Miles' lead, I conducted a series of studies that investigated whether masculinity and femininity, this time as assessed by GD, are related to sexual orientation. I found that GD measures are in fact strongly related to sexual orientation in both men and women (Lippa, 2000, 2002; Lippa & Tan, 2001). One study assessed GD, M, F, and sexual orientation in an unselected sample of more than 700 college students. Two additional studies solicited participation from large groups of gay and lesbian volunteers and compared their GD, masculinity, and femininity scores with those of heterosexual men and women. All studies showed that gay men have considerably more feminine GD scores than heterosexual men do, and lesbian women have considerably more masculine scores than heterosexual women do (d statistics for the homosexual-heterosexual differences were often greater than 1.0). Furthermore, GD proved to be much more strongly linked to sexual orientation than instrumentality or expressiveness scores were.
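For readers unfamiliar with the d statistic mentioned above, the following sketch shows one common way to compute it: the difference between two group means divided by the pooled standard deviation (Cohen's d). The text does not say which variant of d the cited studies used, so the pooled-variance form is an assumption. The GD scores in the example are made up for illustration and are not data from any of these studies.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# HYPOTHETICAL GD scores for two groups of men; not real data.
group_1 = [0.85, 0.78, 0.92, 0.70, 0.88, 0.81]
group_2 = [0.62, 0.55, 0.74, 0.48, 0.69, 0.58]

d = cohens_d(group_1, group_2)
print(f"d = {d:.2f}")
```

A d of 1.0 means the two group means lie one pooled standard deviation apart, which is conventionally regarded as a large difference in psychological research.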

In one study, I compared instrumentality, expressiveness, and GD scores in transsexual and non-transsexual individuals (Lippa, 2001a). It seemed a reasonable prediction that male-to-female transsexuals (individuals who are genetically male, but who wish to live as females and, sometimes, to be surgically reassigned to be females) would be more feminine than the average man is. Similarly, it seemed a reasonable prediction that female-to-male transsexuals would be more masculine than the average woman is. The results of my study showed that gender-related interests (GD measures) distinguished between transsexual and non-transsexual individuals much more strongly than instrumentality and expressiveness did. Thus, male-to-female transsexuals have much more female-typical interests than the average man does, whereas female-to-male transsexuals have much more male-typical interests than the average woman does.

In another line of M-F research, I investigated possible links between masculinity and mortality. Could it be that masculinity is linked to physical illness as well as to psychological maladjustment? Is it possible that masculine men and women die younger than their more feminine counterparts, just as men on average die younger than women? Some recent evidence suggests that certain masculine traits (e.g., negative instrumental traits such as arrogance and egotism) are related to health risks such as smoking, hostility, and poor social relations (Helgeson, 1994b). But is masculinity actually related to a person's likelihood of dying at any given age?

A study conducted by Leslie Martin, Howard Friedman, and myself suggested that the answer to this question is in fact, "yes" (Lippa, Martin, & Friedman, 2000). To reach our conclusion, we analyzed data from Lewis Terman's classic gifted children study. That is, we returned to the data that had triggered masculinity-femininity research 80 years ago to uncover new facts about masculinity today. Although most of Terman's gifted children have died by now, the data collected from them live on, safely stored in archives at Stanford University. In recent years, these data have been used to study psychological factors that influence health and longevity (Friedman et al., 1995). My colleagues and I used these data to investigate the possible link between masculinity and mortality.

How did we measure masculinity? In 1940, Terman and his associates administered the Strong Vocational Interest Blank to many of his gifted children, who were by then about 30 years old. Using these archival data, we were able to compute GD scores for these subjects based on their occupational preference ratings. Because the Terman archives include records of participants' deaths, we were in a good position to investigate whether masculinity was related to mortality. Our results were quite clear: high GD was linked to higher mortality rates in both men and women.

What else is GD related to? Following the lead of earlier research, I investigated whether GD is related to scholastic aptitude and intelligence. Consistent with previous findings, I found that high school boys who are feminine and girls who are masculine tend to score higher than their more sex-typed peers on the National Merit Qualifying Test (Lippa, 1998a).

Is There a "Deep Structure" to Masculinity—Femininity?

One strength of the GD approach is that it allows masculinity and femininity to vary over time and across groups of people. However, this flexibility carries with it a price; masculinity and femininity may seem to be shifting targets that have no stable core to them. Is there in fact a core to masculinity and femininity, as measured by GD? Recall that GD is typically computed from men's and women's occupational and hobby preferences (which I'll call interests). In recent years, there has been increased interest in how people's interests relate to other broad personality dimensions (Ackerman, 1997).

One model has dominated research on occupational preferences and interests over the past 30 years: John Holland's (1992) hexagon or RIASEC model (see Chapter 1). Holland argued that there are six basic kinds of occupations: realistic, investigative, artistic, social, enterprising, and conventional. (The RIASEC acronym is constructed from the first letters of each of these six kinds.) Figure 2.1 provides a brief description of each type of occupation.

Holland's model proposes that people's patterns of occupational likes and dislikes can be schematically summarized by a hexagon. On the one hand, if two RIASEC occupational types are next to each other on the hexagon (e.g., artistic and social occupations), then people's preferences for these kinds of occupations are likely to be similar. On the other hand, if two RIASEC categories are opposite each other on the hexagon (e.g., realistic and social occupations), then people's preferences for these kinds of occupations are likely to be unrelated or even opposite. Many studies have confirmed that people's occupational preferences do in fact follow the pattern suggested by Holland's model.

In the early 1980s, Dale Prediger (1982) proposed two fundamental dimensions underlying Holland's hexagon, which he labeled the people-things dimension and the ideas-data dimension (see Fig. 2.1). The people-things dimension taps how much people like occupations that deal with people (e.g., teacher, counselor, manager) versus occupations that deal with inanimate things (e.g., machinist, scientist, computer programmer, mechanic, and farmer). The ideas-data dimension taps how much people like occupations that deal with creative thinking (e.g., scientist, researcher, artist) versus occupations that deal with record-keeping and data management (e.g., clerk, bookkeeper, secretary, accountant). In a sense, Prediger proposed a two-dimensional deep structure to people's occupational preferences.
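One way to picture how six RIASEC scores collapse into two dimensions is to treat the hexagon literally and project the scores onto two perpendicular axes. The sketch below does this under explicit assumptions that are not stated in the text: the six types are spaced 60 degrees apart, realistic sits at 0 degrees (so social is directly opposite), and the second axis separates the investigative and artistic side from the enterprising and conventional side. It is a geometric illustration in the spirit of Prediger's proposal and does not reproduce his published scoring formulas. The function name and the sample profile are hypothetical.

```python
import math

# Order of the six types around the hexagon (assumed equal 60-degree spacing).
TYPES = ["R", "I", "A", "S", "E", "C"]


def hexagon_dimensions(scores):
    """Project RIASEC scores onto two axes of a regular hexagon.

    Returns (things_vs_people, ideas_vs_data):
      things_vs_people > 0 leans toward things, < 0 toward people
      ideas_vs_data    > 0 leans toward ideas,  < 0 toward data
    """
    things_vs_people = 0.0
    ideas_vs_data = 0.0
    for position, riasec_type in enumerate(TYPES):
        angle = math.radians(60 * position)
        things_vs_people += scores[riasec_type] * math.cos(angle)
        ideas_vs_data += scores[riasec_type] * math.sin(angle)
    return things_vs_people, ideas_vs_data


# HYPOTHETICAL interest profile on a 1-5 scale; not real data.
profile = {"R": 5, "I": 4, "A": 2, "S": 1, "E": 2, "C": 3}
things_people, ideas_data = hexagon_dimensions(profile)
print(f"things vs. people: {things_people:+.2f}")
print(f"ideas vs. data:    {ideas_data:+.2f}")
```

Under these assumptions, realistic and social interests pull in opposite directions on the first axis and contribute nothing to the second, which matches the description of them as opposite corners of the hexagon.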


FIG. 2.1 Holland's six kinds of occupations.

Given that GD is often computed from occupational preferences, it seemed reasonable to ask, What is the relationship between GD measures and Prediger's two dimensions? In three separate studies, I sought answers to this question (Lippa, 1998b). I found that GD correlates strongly with the people-things dimension but not at all with the ideas-data dimension. Thus my current working hypothesis is that GD is fundamentally related to the people-things dimension of occupational preferences and interests (Lippa, in press).

It is important also to emphasize what the people-things dimension is not. It is not extraversion or sociability or instrumentality (which are measured by M scales). Nor is it agreeableness or expressiveness (which are measured by F scales). Rather, it is some basic mental and attitudinal stance toward activities that involve people versus activities that involve mechanical things. By implication, I think it taps a person's desire to deal with and think about the fuzzy, messy, and ambiguous world of human motives, thoughts, and feelings versus the more clear-cut, precise, and deterministic world of mechanical and physical phenomena. The first is feminine, the latter masculine. And on this dimension, I have no doubt where Marcel Proust would fall.