
The Adaptable Mind: What Neuroplasticity and Neural Reuse Tell Us about Language and Cognition - John Zerilli 2021

Is There a Language Module?
The Language Module Reconsidered

As we saw in Chapter 4, certain areas of the brain have long been regarded as quintessentially language areas. For many researchers, this assumption has sat cheek by jowl with the conviction that there is far more nature than nurture involved in language acquisition. In the last two decades, however, the standard view of how language is organized and processed in the brain, as well as how it is acquired, has changed dramatically. This is so for at least two reasons (Kuhl & Damasio 2013). First, neuroimaging evidence in the form of electroencephalography, magnetoencephalography, positron emission tomography, and (increasingly) functional magnetic resonance imaging has furnished a wealth of information about how and where language is processed in real time in the brains of subjects carrying out linguistic tasks. The picture that emerges here is very unlike the one bequeathed by Paul Broca and Carl Wernicke. Second, psycholinguistic evidence is much richer and subtler than what was available in previous decades. It reveals that infants begin learning language from the moment they come into contact with the sound inventories of their native tongue; indeed, even in utero. It appears that the early sensitivity of a fetus to features of intonation may later help the infant learn its mother tongue (Mampe et al. 2009). For instance, the French “papa” has delayed stress and rising intonation, while the German has early stress and falling intonation. When an infant begins to form its first sounds, it can build on melodic patterns that are already familiar, and so does not have to start from scratch when learning phonological and morphological regularities (the investigators suspect the evolutionary roots of this behavior to be older than the emergence of spoken language). I shall say a little more on the acquisition issue in the next section.
Here I shall focus on organization, reviewing evidence of the extensive reuse of what were traditionally regarded as typical language circuits.

Plausibly, the more widely distributed a system is in the brain, the less likely it is to be a specialized system (Anderson 2010; see my Chapter 3). It is now known that language is one of the most distributed systems in the brain, and that “the operation of language to its full extent requires a much more extended network than what [classical models have] assumed” (Hagoort & Indefrey 2014, p. 359; Anderson 2010, p. 247). As Hagoort and Indefrey summarized the emerging consensus:

The basic principle of brain organization for higher cognitive functions proposes that these functions are based on the interaction between numerous neuronal circuits and brain regions that support various contributing functional components. These circuits are not necessarily specialized for language but nevertheless need to be recruited for the sake of successful language processing. (2014, p. 359)

The evidence motivating this principle in turn corroborates the prediction that more recently evolved functions should be more distributed than older ones, since it should overall prove easier to exploit existing circuits than to have to evolve custom-made ones, with there being “little reason to suppose that the useful elements will happen to reside in neighboring brain regions. . . . [A] more localist account of the evolution of the brain would . . . expect the continual development of new, largely dedicated neural circuits” for new functions (Anderson 2010, p. 246). Anderson’s review of some 1500 subtraction-based fMRI experiments suggests that language could well be the paradigm of distributed processing, supported by more distributed activations than visual perception and attention are (Anderson 2007a) and indeed than any other domain that was tested, including reasoning, memory, emotion, mental imagery, and action (Anderson 2008).

Broca’s area holds a special place in the tradition of modular theorizing about language. While it cannot be doubted that the area plays a crucial role in language processing, it is also implicated in various action- and imagery-related tasks such as those involving the preparation of movement (Thoenissen et al. 2002), the sequencing of actions (Nishitani et al. 2005), the recognition of actions (Decety et al. 1997; Nishitani et al. 2005), imagery of motion (Binkofski et al. 2000), and the imitation of actions (Nishitani et al. 2005). It is also known to be involved in certain memory tasks (Kaan & Stowe 2002) as well as in music perception (Maess et al. 2001). Kaan and Swaab (2002) set out to identify whether syntactic processing is localized in the brain, and found that while Broca’s area is recruited during syntactic processing tasks, it joins a larger brain network that includes the anterior, middle, and superior areas of the temporal lobes, none of which in turn appears to be syntax-specific.

In the auditory domain, phoneme discrimination has long impressed perceptual psychologists. It involves “categorical perception”; that is, “the segmenting of a signal that varies continuously along a number of physical dimensions . . . into discrete categories, so that signals within the category are counted as the same, even though acoustically, they may differ from one another more than do two signals in different categories” (Cowie 2008, at § 3.3.4). Fiona Cowie, no fan of linguistic nativism, accepts that there is a “quite substantial . . . inborn contribution to phonological learning.” But, as she goes on to discuss:

. . . is this inborn contribution to phonological learning language specific[?]. . . . [T]o this question, the answer appears to be “No.” First, the “chunking” of continuously varying stimuli into discrete categories is a feature not just of speech perception, but of human perception generally. For instance, it has been demonstrated in the perception of non-linguistic sounds, like musical pitch, key and melody, and meaningless chirps and bleats. . . . It has also been demonstrated in the processing of visual stimuli like faces. . . . Secondly, it is known that other animals too perceive categorically. For instance, crickets segment conspecific songs in terms of frequency . . . swamp sparrows “chunk” notes of differing durations. . . . [O]ther species respond categorically to human speech! Chinchillas . . . and cotton-top tamarins . . . make similar phonological distinctions to those made by human infants. (Cowie 2008, at § 3.3.4)
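The notion of categorical perception defined above can be made concrete with a few lines of code. The sketch below maps a continuous acoustic dimension (voice onset time) onto discrete phoneme categories; the boundary value and the stimulus values are illustrative assumptions, not measurements from any of the studies cited.

```python
# A minimal sketch of categorical perception: a continuously varying
# signal (voice onset time, in ms) is "chunked" into discrete
# categories, so that acoustically distant stimuli within a category
# count as the same, while acoustically close stimuli straddling the
# boundary count as different. The 25 ms boundary is a stylized
# assumption for illustration.

BOUNDARY_MS = 25.0  # hypothetical /b/-/p/ category boundary

def categorize(vot_ms: float) -> str:
    """Map a continuous voice-onset-time value to a discrete category."""
    return "/b/" if vot_ms < BOUNDARY_MS else "/p/"

# Two stimuli 20 ms apart within a category are perceived as "the same" ...
assert categorize(0.0) == categorize(20.0) == "/b/"
# ... while two stimuli only 10 ms apart, but crossing the boundary, differ.
assert categorize(20.0) != categorize(30.0)
```

The point of the sketch is structural: nothing in the categorizing operation is specific to speech, which is precisely Cowie’s observation about chirps, bleats, and chinchillas.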

A recent experiment found that early exposure to multiple languages heightens acoustic sensitivity generally (Liu & Kager 2016). In particular, bilingual children appear more sensitive to subtle variations in musical pitch than their monolingual counterparts.

There is something especially piquant in discovering that classic sensory and motor areas play a key role in higher thought. In Chapter 2, I reviewed evidence of the role of vision in semantics. Damasio and Martin demonstrated over two decades ago that visual areas are active during noun-processing tasks (e.g., naming colors, animals, etc.) (Damasio & Tranel 1993; Damasio et al. 1996; Martin et al. 1995, 1996, 2000). We saw that word-generation in sighted subjects depends at least in part on the bilateral occipital cortices, regions that have always been thought to be the most specialized in the brain (Pascual-Leone et al. 2005, p. 394). Beyond the association with phylogenetically older sensory and perceptual functions, language also seems to have been originally bound up with the motor system, for motor circuits still appear to be crucial to language perception and comprehension on many levels of processing (as indeed the functional profile of Broca’s area would tend to suggest). Pulvermüller and Fadiga (2010) report that at the level of speech perception and processing, changes in the motor and premotor cortex lead to deficits in phoneme discrimination (2010, pp. 353–355). There is also evidence that the acoustic properties of phonemes have been shaped to some extent by postural aspects of the motor system (Graziano et al. 2002; MacNeilage 1998). At the level of semantic comprehension, magnetic stimulation of the motor system influences the recognition of semantic word categories (Pulvermüller & Fadiga 2010, pp. 355–357). Pulvermüller (2005) earlier reported evidence that hearing the words “lick,” “pick” and “kick,” in that order, activates successively more of the primary motor cortex, suggesting both that the motor regions involved are inherent to the comprehension task and that comprehension may involve some kind of simulation. Glenberg et al. (2008) report similar findings, in particular how the use-driven plasticity of motor circuits affects abstract and concrete language processing. Incidentally, it has been demonstrated that reading comprehension improves when children are allowed to manipulate physical objects (Glenberg et al. 2007). Finally, syntactic processing seems to depend in important ways upon the perisylvian cortex, which is involved in the processing of hierarchically structured action sequences (e.g., lifting a cup, turning it this way, etc., as guided by the overall aim of quenching thirst) (Pulvermüller & Fadiga 2010, pp. 357–358). And it is known that both word- and object-combining have overlapping neural implementations (Greenfield 1991). (I review more evidence of the motor–syntax connection in my discussion of sequence learning later in this chapter.) Taken together, these results suggest that the motor system enters nontrivially into the perception and comprehension of language at various levels of processing, including phonological, semantic, and syntactic levels (Knott 2012).

This brings us back to Broca’s area. I have already reviewed evidence attesting to its functional complexity and its importance in action sequencing. A natural response of those committed to the specificity of language circuits would be to concede all of this reuse, and say simply that what we are witnessing is the reuse of linguistic circuits for other, nonlinguistic functions. A subtle variant of this idea lies behind the contention that Merge may be the source of productivity and generativity in nonlinguistic domains. As Brattico and Liikkanen pose the issue:

To how many cognitive domains can this combinatorial operation be applied? In principle, there seems to be no limit, provided that the appropriate interface mechanisms are in place. This architecture of language makes it easy to imagine a recursive symbol processor which can create productive behavior in several cognitive domains depending on which type of symbols it applies to and which type of interfaces it is required to handle. (2009, p. 262)

Following Chomsky, they opine that it might have been the application of Merge to concepts that yielded the “explosive growth of the capacities of thought . . . leading to the liberty of the imagination to transpose and change its ideas,” which, as suggested by Hume, could generate such imaginary objects as “winged horses, fiery dragons, and monstrous giants.” When Merge is emptied of all content, the result is the system of natural numbers.8 And so on. The research just mentioned, highlighting the indispensable contribution of primitive sensory-motor areas for syntactic and semantic processing, suggests that the argument is skewed, for it tends to imply that Merge, recursion, metarepresentation, or whatever generative engine happens to be invoked to account for linguistic productivity—with Broca’s area providing its most likely neurological basis (see, e.g., Brattico & Liikkanen 2009, p. 273)—is some sort of ELU; i.e., an integrated, dedicated, self-contained computational mechanism, perhaps dissociable from core motor operations (see, e.g., Berwick & Chomsky 2016, pp. 75–77). If far more evolutionarily primitive mechanisms are behind crucial aspects of linguistic processing at the highest levels, this seems very suppositious. It is more plausible (i.e., parsimonious) to assume that linguistic productivity was assembled from prior sensory-motor materials, with Broca’s area providing a rich source of sequence-processing power (see further in this chapter). That is to say, the role of Broca’s area in language is evidence that it already performed just the kind of sensory-motor functions that made it ideal for integration within a larger language network (Müller & Basho 2004).
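The claim that content-free Merge yields the natural numbers can be given a concrete form. The sketch below treats Merge as bare binary set formation and applies it to the empty set and its own outputs, producing a successor function; the particular set-theoretic encoding is an illustrative assumption (one of several standard constructions), not Chomsky’s own formal proposal.

```python
# A minimal sketch of "Merge emptied of all content": Merge is just
# unordered binary set formation. Applied to nothing but the empty
# set and its own outputs, it generates a successor function, and
# hence a stand-in for the natural numbers. The encoding (n + 1 as
# {n}) is an illustrative choice, not a claim about the literature.

def merge(x, y):
    """Combine two objects into an unordered set."""
    return frozenset([x, y])

def successor(n):
    """Merge a 'number' with itself: frozenset([n, n]) == {n}, i.e., n + 1."""
    return merge(n, n)

zero = frozenset()        # 0 encoded as the empty set
one = successor(zero)     # {0}
two = successor(one)      # {{0}}

def depth(n):
    """Recover the ordinary integer by counting nesting depth."""
    return 0 if not n else 1 + depth(next(iter(n)))

assert depth(zero) == 0 and depth(one) == 1 and depth(two) == 2
```

Nothing in `merge` is linguistic; the generativity comes entirely from iterating a domain-neutral combinatorial operation, which is just the architecture Brattico and Liikkanen describe.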

This conjecture is rendered more plausible when one reflects further on the deep connections between syntactic structure and motor sequences.9 Even in lower organisms, motion is never haphazard and shambolic; it is always coordinated, structured, and systematic relative to the organism’s aims and the needs of survival. Coordination is intrinsic to motor function, a basic prerequisite of meaningful action. Basic body acts form “action chains” of “meaningful goal-directed action sequence[s],” as exemplified in the drinking-from-a-cup action sequence mentioned earlier (Pulvermüller and Fadiga 2010, p. 357). A center-embedded sentence (The man {whom the dog chased away} ran away) parallels the nested structure of a typical jazz piece (theme {solo} modified theme) and the action chain formed when entering a dark room at night (open the door {switch on the light} close the door); in each of these cases, “a superordinate sequence surrounds a nested action or sequence” (Pulvermüller and Fadiga 2010, p. 357). Indeed, the patterns of coordination and subordination within many complex/cumulative sentences are often deliberately designed to evoke the actions they describe, a device familiar to writers and on display in the best literature (Landon 2013). It should not come as a surprise, then, that syntax recruits the same areas of the brain that are essential for the planning and coordination of movement. Christiansen and Chater (2016) go a little further, placing sequence learning at center-stage of their account of linguistic productivity. They think complex sequence learning amply explains our ability to process recursive structures, and (consistent with my theme) that recursivity “relies on evolutionarily older abilities for dealing with temporally presented sequences of input” (2016, p. 204). 
There is a wealth of comparative and genetic evidence—quite apart from the neural evidence I have dwelt on up to this point—that can also be marshalled in support of the idea that language makes heavy demands on our complex sequence-learning abilities. What is currently known of the FOXP2 gene is consistent with a human adaptation for sequential processing (Fisher & Scharff 2009). It is well known that mutations of the gene produce severe speech and orofacial impairments (Lai et al. 2001; MacDermot et al. 2005). Moreover, when the homologous gene was inserted into mice, the mice displayed superior learning abilities for action sequences (Schreiweis et al. 2014). Specific language impairment (SLI), for its part, seems to be the result of a clear sequence-processing deficit (Hsu et al. 2014). Further neural evidence of a shared basis for language and general sequence learning is also available (see the review in Christiansen & Chater 2016, pp. 206–207). For example, syntactic and sequencing abilities do not appear to dissociate: when one gets knocked out, chances are the other does, too.
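The center-embedding parallel drawn above (the relative clause, the jazz piece, the dark-room action chain) can be sketched in code. The encoding below is an illustrative assumption: each example is written as a sequence with one nested subsequence, and a single recursive function reduces all three to the same bracketing skeleton.

```python
# A minimal sketch of the "superordinate sequence surrounds a nested
# sequence" pattern shared by center-embedded sentences, musical form,
# and goal-directed action chains. The list encoding is illustrative.

def shape(seq):
    """Reduce a nested sequence to its bracketing skeleton,
    replacing every atomic element with '*'."""
    return [shape(x) if isinstance(x, list) else "*" for x in seq]

sentence = ["The man", ["whom the dog chased away"], "ran away"]
jazz     = ["theme", ["solo"], "modified theme"]
room     = ["open the door", ["switch on the light"], "close the door"]

# All three instantiate the same center-embedded skeleton: * [*] *
assert shape(sentence) == shape(jazz) == shape(room) == ["*", ["*"], "*"]
```

On this view, what syntax and action planning share is not content but this recursive suspend-and-resume structure, which is what a general sequence-learning mechanism would have to track in either domain.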

Notice, by the way, that (without prejudging the issue) this account is compatible with the idea that aspects of arithmetic, conceptual thought, musical syntax, and so on, could be exaptations of prior sequence-learning capabilities, whether via language or some other (perhaps more direct) phylogenetic route. Certainly recursive and metarepresentational capacities seem to crop up elsewhere in cognition, well outside the domains of language and thought, such as mental time travel, theory of mind/sociality, culture, and morality (Suddendorf 2013).

Thus far I have been largely concerned with the neuroimaging and biobehavioral evidence against linguistic modularity. For the remainder of this section, I shall very briefly mention a few arguments based on other considerations; namely, those arising from evolutionary theory and computational modeling. In the upcoming section, I shall address the matter of innateness. The final section introduces my Redundancy Model to account for the rare but still important evidence of cognitive dissociations, as well as other phenomena not easily explicable without some such account.

It is widely accepted that, of all human phenotypes, language is one of relatively recent origin, certainly far more recently evolved than basic sensory-motor, memory, and conceptual systems. Even if one adopts the view that language and the physiological mechanisms required to support complex vocalizations evolved together (i.e., that language and speech co-evolved), by any account, language is a phylogenetically recent phenomenon—de Boer (2016) thinks it is as old as the adaptations for complex vocalizations and places its emergence at around 400,000 years ago. This fact at once suggests that specific cognitive adaptations for language are unlikely, essentially for the reasons already given: it is generally easier for evolution to reuse and exapt existing resources than to have to evolve them from scratch (Anderson 2010). But other reasons support this conclusion as well. For adaptations to arise, evolution requires a stable environment (Sterelny 2006; Christiansen & Chater 2016). An adaptation for language would require a linguistically stable environment, but language and cultural environments generally are anything but stable, with both words and structural features of languages subject to swift changes, and cultures subject to significant shifts of convention, often even intragenerationally (Dunn et al. 2011; Greenhill et al. 2010; Sterelny 2012). In fact, when it comes to cultural environments, plasticity is typically favored over robustness—changes that allow the organism to cope with unpredictable variations in the local environment are favored over specific adaptations narrowly tailored to that environment, unless of course the culture does provide a stable target over which selection can operate (see § 7.4).

Changing tack somewhat, advances in computational neuroscience have uncovered a core set of standard, “canonical” neural computations. These computations are “combined and repeated across brain regions and modalities to apply similar operations to different problems” (Carandini 2015, p. 179). One example of a canonical computation, particularly in sensory systems, is “filtering.” This is a basic connectionist operation in which neurons perform a weighted sum on sensory inputs. The pattern of weights constitutes a neuron’s “receptive field,” and the operation is performed across the visual, auditory, somatosensory, and possibly motor systems—systems most of which we have seen are important and even crucial to language processing. Another canonical computation is “divisive normalization.” This involves dividing neuronal responses by a common factor; namely, the summed activity of a specific collection of neurons. The process is considered important to operations as varied as “the representation of odours, the deployment of visual attention, the encoding of value, and the integration of multisensory information” (Carandini 2015, p. 180). Other examples would include predictive coding, which has certainly received its fair share of attention in recent years (Clark 2013), as well as “exponentiation, recurrent amplification, associative learning rules, cognitive maps, coincidence detection, top-down gain changes, population vectors, and constrained trajectories in dynamical systems” (Carandini 2015, p. 180). What all this shows is that, at levels of explanation not too far down—we are still at the “algorithmic” level here, not quite yet at the circuit or cell level—there are fundamental computations intrinsic to the functioning of the brain that cut across various modalities, very probably including language. Cognitive operations thus look set to share many of their underlying computations with other domains, even with domains whose physical resources they do not actually share.
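The two canonical computations just described can be sketched in a few lines. All numerical values below are illustrative, and the constant added in the denominator (a standard device to keep the division well behaved) is an assumption of the sketch rather than a detail from Carandini’s account.

```python
# A minimal sketch of two canonical neural computations described in
# the text: linear filtering (a weighted sum of inputs through a
# receptive field) and divisive normalization (dividing each response
# by the pooled activity of a population). All numbers are illustrative.

def filter_response(inputs, receptive_field):
    """Weighted sum of sensory inputs -- the basic filtering operation."""
    return sum(i * w for i, w in zip(inputs, receptive_field))

def normalize(responses, sigma=1.0):
    """Divisive normalization: each response divided by a common factor,
    here the summed activity of the pool plus a constant sigma that
    keeps the denominator positive."""
    pool = sum(responses) + sigma
    return [r / pool for r in responses]

stimulus = [0.2, 0.8, 0.4]                 # one input pattern
fields = [[1.0, 0.0, 0.0],                 # three hypothetical
          [0.5, 0.5, 0.0],                 # receptive fields
          [0.0, 0.5, 0.5]]

raw = [filter_response(stimulus, rf) for rf in fields]
norm = normalize(raw)

# Normalization preserves the relative pattern of activity while
# scaling the population response into a bounded range.
assert max(norm) <= 1.0
assert abs(norm[0] / norm[1] - raw[0] / raw[1]) < 1e-9
```

The same two functions could be pointed at visual, auditory, or lexical inputs without modification, which is exactly the sense in which such computations are “canonical”: the operation is fixed while the domain varies.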

While we are on the topic, it is worth mentioning Spaun again (the brain simulation we met in Chapter 3). Spaun makes a different point, one I have been at pains to make in this section, this chapter, and indeed throughout the whole book: it shows that a computer can employ fully domain-general learning principles, reusing the same circuits to accomplish very different functions (cf. Pinker 1994). As I explained in Chapter 3, most machines are good at doing just one thing (playing chess, solving equations, etc.). Spaun is unique both in the variety of the tasks it can perform and in its ability to learn new tasks using the same set of circuits. It is the first major step in answering an important challenge leveled by evolutionary psychologists and other proponents of traditional forms of modularity, who for many years insisted (virtually on a priori grounds!) that such a machine could not be designed. Well, Spaun is a machine that functions entirely by domain-general principles. (Quod erat demonstrandum.)