Accounting for Linguistic Modularization
The Language Module Reconsidered

Throughout this chapter, I have been investigating a very particular question, and an important one: Does language rely on specialized cognitive and neural machinery, or does it rely on the same machinery that allows us to get by in other domains of human endeavor? The question is bound up with many other questions of no less importance: questions concerning the uniqueness of the human mind, the course of biological evolution, and the power of human culture. What is perhaps a little unusual about this question, however—unusual for a question whose answer concerns those working in both the sciences and the humanities—is that it can be phrased as a polar interrogative; i.e., as a question that admits of a yes or no response. And indeed the question has divided psychologists, linguists, and the cognitive science community generally for many decades now, more or less into two camps. In this concluding section, I would like to sketch the beginnings of an answer to this question in a way that does not pretend it can receive a simple yes or no answer.

Let me stress again that neural reuse is undeniable, that the evidence for it is simply overwhelming, and that it has left no domain of psychology untouched. There seems to be nothing so specialized in the cortex that it cannot be repurposed to meet new challenges. In that regard, to be sure, what I am proposing in this section is unapologetically on the side of those who maintain that language is not special—that there is no specialized “language organ” or ELU. And yet I would like to carefully distinguish this claim from the claim that there are no areas of the brain that subserve exclusively linguistic functions. The neuropsychological literature offers striking examples of what appear to be fairly clean dissociations between linguistic and nonlinguistic capacities; i.e., cases in which language-processing capacities appear to be disrupted without other cognitive abilities being impaired, and cases in which the reverse situation holds (Fedorenko et al. 2011; Hickok & Poeppel 2000; Poeppel 2001; Varley et al. 2005; Luria et al. 1965; Peretz & Coltheart 2003; Apperly et al. 2006). An example would be where the ability to hear words is disrupted, but the ability to recognize non-word sounds is spared (Hickok & Poeppel 2000; Poeppel 2001). Discussing such cases, Pinker and Jackendoff (2005, p. 207) add that “[c]ases of amusia and auditory agnosia, in which patients can understand speech yet fail to appreciate music or recognize environmental sounds . . . show that speech and non-speech perception in fact doubly dissociate.”

Although, as we saw in Chapter 4, dissociations are compatible with reuse—indeed, there is work suggesting that focal lesions can produce specific cognitive impairments within a range of non-classical architectures (Plaut 1995)—and it is equally true that often the dissociations reported are noisy (Cowie 2008, at § 3.6.3), still, their very ubiquity needs to be taken seriously and accounted for in a more systematic fashion than many defenders of reuse have been willing to do (see, e.g., Anderson 2010, p. 248, 2014, pp. 46–48). After all, a good deal of support for theories of reuse comes from the neuroimaging literature, which, as I have pointed out several times already, is somewhat ambiguous taken by itself. As Fedorenko et al. (2011, p. 16428) explain:

standard functional MRI group analysis methods can be deceptive: two different mental functions that activate neighbouring but non-overlapping cortical regions in every subject individually can produce overlapping activations in a group analysis, because the precise locations of these regions vary across subjects, smearing the group activations. Definitively addressing the question of neural overlap between linguistic and nonlinguistic functions requires examining overlap within individual subjects, a data analysis strategy that has almost never been applied in neuroimaging investigations of high-level linguistic processing.

When Fedorenko and her colleagues applied this strategy themselves, they found that “most of the key cortical regions engaged in high-level linguistic processing are not engaged by mental arithmetic, general working memory, cognitive control or musical processing,” and they think that this indicates “a high degree of functional specificity in the brain regions that support language” (2011, p. 16431). While I do not believe that claims of this strength have the least warrant—as I shall explain, functional specificity cannot be established merely by demonstrating that a region is selectively engaged by a task—these results do at least substantiate the dissociation literature in an interesting way and make it more difficult for those who would prefer to dismiss the dissociations with a ready-made list of alternative explanations. Similar results were found by Fedorenko et al. (2012).
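The smearing effect that Fedorenko and colleagues describe is easy to reproduce in miniature. The toy simulation below (all numbers and names are mine, chosen purely for illustration) gives each simulated subject two adjacent but strictly non-overlapping patches of “active” voxels whose location jitters from subject to subject; averaging across subjects then manufactures an apparent overlap that exists in no individual brain.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D cortex of 100 voxels. Each subject has a "language" patch and
# an immediately adjacent "arithmetic" patch; the pair's location is
# jittered across subjects to mimic anatomical variability.
n_subjects, n_voxels, width = 20, 100, 5

def patch(start):
    m = np.zeros(n_voxels)
    m[start:start + width] = 1.0      # voxels active for this task
    return m

lang, arith = [], []
for _ in range(n_subjects):
    c = 40 + rng.integers(-4, 5)      # subject-specific anatomical jitter
    lang.append(patch(c))
    arith.append(patch(c + width))    # adjacent patch: zero voxels shared
lang, arith = np.array(lang), np.array(arith)

# Within every individual subject, the two tasks share no voxel at all.
print((lang * arith).sum(axis=1).max())                   # -> 0.0

# Yet the group-average maps "overlap": jittered patches smear together.
group_lang, group_arith = lang.mean(axis=0), arith.mean(axis=0)
print(((group_lang > 0.2) & (group_arith > 0.2)).sum())   # typically > 0
```

This is why within-subject analysis matters: only at the individual level does the true (non-)overlap become visible.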

I think neural redundancy is the best explanation for what we see in cases like these, and that redundancy is in fact a central feature of cortical design. As I briefly mentioned in Chapter 6, the brain incorporates a large measure of redundancy of function (Edelman & Gally 2001; Mason 2010; Whitacre 2010; Deacon 2010; Iriki & Taoka 2012; Maleszka et al. 2013). Modules (M-networks) and similar structures in the brain fall into an iterative, repetitive, and almost lattice-like arrangement in the cortex. Neighboring modules have similar response properties: laminar and columnar changes are for the most part smooth—not abrupt—as one moves across the cortex, and adjacent modules do not differ markedly from one another in their basic structure and computations (if they even differ at all when taken in such proximity). Regional solitarity is therefore not likely to be a characteristic of the brain (Anderson 2014, p. 141).10 We do not, in all likelihood, have just one module for X and one module for Y, but in effect several copies of the module for X and several copies of the module for Y, all densely stuffed into the same cortical zones. As Buxhoeveden and Casanova (2002, p. 943) explain of neurons generally:

In the cortex, more cells do the job that fewer do in other regions. . . . As brain evolution paralleled the increase in cell number, a reduction occurred in the sovereignty of individual neurones; fewer of them occupy critical positions. As a consequence, plasticity and redundancy have increased. In nervous systems containing only a few hundred thousand neurones, each cell plays a more essential role in the function of the organism than systems containing billions of neurones.

The same principle very likely holds true for functionally distinct groupings of neurons (i.e., modules), as Jungé and Dennett conjecture:

It is possible that specialized brain areas contain a large amount of structural/computational redundancy (i.e., many neurons or collections of neurons that can potentially perform the same class of functions). Rather than a single neuron or small neural tract playing roles in many high-level processes, it is possible that distinct subsets of neurons within a specialized area have similar competencies, and hence are redundant, but as a result are available to be assigned individually to specific uses. . . . In a coarse enough grain, this neural model would look exactly like multi-use (or reuse). (2010, p. 278)

This is plausibly why capacities that are functionally very closely related, but for whatever reason are forced to recruit different neural circuits, will often be localized in broadly the same regions of the brain. For instance, first and second languages acquired early in ontogeny settle down in nearly the same region of Broca’s area; and even when the second language is acquired in adulthood, the second language is still represented within Broca’s area (while artificial languages are not) (Kandel & Hudspeth 2013). The neural coactivation graphs of such C-networks must look very similar. Indeed these results suggest—and a Redundancy Model would predict—that two very similar tasks that for whatever reason are forced to recruit different neural circuits should exhibit similar patterns of activation.

The significance of this simple but surprisingly neglected feature of cortical design cannot be overstated. Indeed, I think it should rank alongside reuse as an organizing principle of the brain, for what it actually means for reuse is quite interesting. Although there is abundant evidence of the reuse of the same neural tokens to accomplish different tasks (see Chapters 3 and 5), redundancy means we must accept that at least some of the time what we will really be witnessing is reuse of the same types to accomplish these tasks.11 To my mind, this does not in any way diminish the standing of reuse. To the extent that a particular composite reuses types, and is dissociable pro tanto—residing in segregated brain tissue that is not active outside the function concerned—it is true that to that extent its constituents will appear to be domain-specific. But in this case looks will be deceiving. The classical understanding of domain specificity in effect assumes solitarity—that a module for X does something that no other module can do as well, or that even if another module can do X as well, taken together, these X-ing modules do not perform outside the X-domain. Here is an example of the latter idea (Bergeron 2007, p. 176):

a pocket calculator could have four different division modules, one for dividing numbers smaller than or equal to 99 by numbers smaller than or equal to 99, a second one for dividing numbers smaller than or equal to 99 by numbers greater than 99, a third one for dividing numbers greater than 99 by numbers greater than 99, and a fourth one for dividing numbers greater than 99 by numbers smaller than or equal to 99. In such a calculator, these four capacities could all depend on (four versions of) the same algorithm. Yet, random damage to one or more of these modules in a number of such calculators could lead to observable (double) dissociations between any two of these functions.

Here, each module performs fundamentally the same algorithm, but in distinct hardware, such that dissociations are observable between any two functions. Notice, however, that none of these modules performs outside the “division” domain. This is what allows such duplicate modules to be considered domain-specific—they perform functions that, for all that they might run in parallel on duplicate hardware, are unique to a specific domain of operation; in this case, division. If such modules could do work outside the division domain, they would lose the status of domain specificity, and acquire the status of domain neutrality (i.e., they would be domain-general). This is why a module that appears dedicated to a particular function may not be domain-specific in the classical sense. Dedication is not the same as domain specificity, and redundancy, whether of calculator algorithms or neural circuits, explains why. A composite will be dedicated without being domain-specific if its functional resources are accessible to other domains through the deployment (reuse) of neural surrogates (i.e., redundant or “proxy” tokens). In this case, its constituents will be multi-potential but single-use (Jungé & Dennett 2010, p. 278), and the domain specificity on display will be somewhat cosmetic. In other words, the elements here will look like type B modules, but in essence they are type C modules (or even type D regions).

To take an example with more immediate relevance to the brain, a set of modules that are structurally and computationally similar may be equally suited for face-recognition tasks, abstract-object-recognition tasks, the recognition of moving objects, and so on. One of these modules could be reserved for faces, another for abstract objects, another for moving objects, and so on. What is noteworthy is that, while the functional activation may be indistinguishable in each case, and the same type of resource will be employed on each occasion, a different token module will be at work at any one time. To quote Jungé and Dennett again: “In an adult brain, a given neuron [or set of neurons] would be aligned with only a single high-level function, whereas each area of neurons would be aligned with very many different functions” (2010, p. 278). Such modules (and composites) are for all intents and purposes qualitatively identical, though clearly not numerically identical, meaning that, while they share their properties, they are not one and the same (Parfit 1984).

The evidence of reuse is virtually all one way when it comes to the pervasiveness of functional inheritance across cognitive domains. It may be that this inheritance owes to reuse of the same tokens (literal reuse) or to reuse of the same types (reuse by proxy), but the inheritance itself has been amply attested. This broader notion of reuse still offers a crucial insight into the operations of cognition, and I daresay represents a large part of the appeal of the original massive redeployment hypothesis (Anderson 2007c).
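Bergeron’s calculator translates almost directly into code. In the sketch below (the names are mine and purely illustrative), four token modules run one and the same division algorithm on separate “hardware”; lesioning a single token produces a clean, selective deficit even though the type survives intact in the remaining modules.

```python
def divide(a, b):                # the shared algorithm: the reused *type*
    return a / b

class Module:
    """One token piece of 'hardware' dedicated to a subdomain of division."""
    def __init__(self, covers):
        self.covers = covers     # predicate picking out this module's inputs
        self.lesioned = False
    def run(self, a, b):
        if self.lesioned:
            raise RuntimeError("module damaged")
        return divide(a, b)      # type-identical computation, distinct token

modules = [
    Module(lambda a, b: a <= 99 and b <= 99),
    Module(lambda a, b: a <= 99 and b > 99),
    Module(lambda a, b: a > 99 and b > 99),
    Module(lambda a, b: a > 99 and b <= 99),
]

def calculator(a, b):
    for m in modules:
        if m.covers(a, b):
            return m.run(a, b)

modules[0].lesioned = True       # a focal "lesion"
print(calculator(200, 4))        # 50.0: spared (different token, same type)
# calculator(8, 2) now fails: a selective deficit within a single algorithm
```

The dissociation is real, but it reveals damage to token hardware, not any computational difference between the spared and the impaired capacities.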

It is interesting to note in this respect that, although detractors have frequently pointed out the ambiguity of neuroimaging evidence on account of its allegedly coarse spatial resolution (see § 3.3.3), suggesting that the same area will be active across separate tasks even if distinct but adjacent circuits are involved each time, this complaint can have no bearing whatsoever on reuse by proxy. Fedorenko et al. (2011, p. 16431) take their neuroimaging evidence to support “a high degree of functional specificity in the brain regions that support language,” but their results do not license this extreme claim. The regions they found to have been selectively engaged by linguistic tasks were all adjacent to the regions engaged in nonlinguistic tasks. Elementary considerations suggest that they have discovered a case of reuse by proxy involving language: the domains tested (mental arithmetic, general working memory, cognitive control, and musical processing) make use of many of the same computations as high-level linguistic processing, even though they run them on duplicate hardware. Redundancy makes it easy to see how fairly sharp dissociations could arise—knocking out one token module need disrupt only one high-level operation: other high-level operations that draw on the same type of resource may well be spared.

The consequences of this distinction between literal reuse and reuse by proxy for much speculation about the localization and specialization of function are potentially profound. In cognitive neuropsychology, the discovery that a focal lesion selectively impairs a particular cognitive function is routinely taken as evidence of its functional specificity (Coltheart 2011; Sternberg 2011). Even cognitive scientists who take a developmental approach to modularity—i.e., who concede that parts of the mind may be modular but stress that modularization is a developmental process—concede too much when they imply, as they frequently do, that modularization results in domain-specific modules (Karmiloff-Smith 1992; Prinz 2006; Barrett 2006; Cowie 2008; Guida et al. 2016). This is true in some sense, but not in anything like the standard sense, for the Redundancy Model envisages that developmental modules form a special class of C-networks; namely, those that are qualitatively identical but numerically distinct. The appearance of modularization in development is thus fully compatible with deep domain interpenetration. In any event, the Redundancy Model does not predict that all acquired skills will be modular. The evidence suggests that, while some complex skills reside in at least partly dissociable circuitry, most complex skills are implemented in more typical C-networks (i.e., those consisting of literally shared parts).12

In one sense, asking why the cortex incorporates a large measure of redundancy of function is a bit like asking why we have two eyes, two kidneys, ten toes, and so on. The intuitive response is that by having “spare” organs, we can distribute the workload more efficiently among all of them. It is generally a good design feature of any system to have spare capacity. For instance, in engineered systems, “redundant parts can substitute for others that malfunction or fail, or augment output when demand for a particular output increases” (Whitacre 2010, p. 14). The positive connection between robustness and redundancy in biological systems is clear (Edelman & Gally 2001; Mason 2010; Whitacre 2010; Iriki & Taoka 2012). So there are good reasons for evolution to have seen to it that our brains have spare capacity. But in the case of the brain, and the cortex most especially, there are other reasons why redundancy would be an important design feature. It offers a solution to what Jungé and Dennett (2010, p. 278) called the “time-sharing” problem. It may also offer a solution to what I call the “encapsulation” problem.

The time-sharing problem arises when multiple simultaneous demands are made on the same cognitive resource. This is probably a regular occurrence, and language in particular would present a whole host of opportunities for time-sharing. Here are just a few examples.

•Driving a car and holding a conversation at the same time: if it is true that some of the selfsame motor operations underlying aspects of speech production and comprehension are also required for the execution of sequenced or complex motor functions (as is perhaps exemplified by driving a manual vehicle, or operating complex machinery), how do we manage to pull this off?

•Producing a sentence that encodes a recursive thought: if the coding function reflects the recursive structure of thought, it may need to redeploy the very same recursive operation simultaneously during sentence production. This might be the case during the formation of an embedded relative clause—the thought and its encoding may require parallel use of the same sequencing principle. Again, how do we manage this feat?

•If metarepresentational operations are involved in the internalization of conventional sound–meaning pairs, and also in the pragmatics and mindreading that carry on simultaneously during conversation, as argued by Suddendorf (2013), it could be another instance of time-sharing. The example is contentious, but it still raises the question: How does our brain manage to do things like this?

Christiansen and Chater’s (2016) “Chunk-and-Pass” model of language processing envisages multilevel and simultaneous chunking procedures. As they put it, “the challenge of language acquisition is to learn a dazzling sequence of rapid processing operations” (2016, p. 116). What must the brain be like to allow for this dazzling display?
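For a concrete sense of what multilevel, simultaneous chunking might involve, here is a deliberately minimal sketch (my toy rendering, not Christiansen and Chater’s own implementation): each level eagerly compresses its input into larger chunks and passes the result upward before the lower-level material is lost.

```python
# Toy Chunk-and-Pass-style pipeline: letters -> words -> phrases. The
# chunk inventories below are hypothetical stand-ins for whatever a
# given processing level has learned.
LETTER_CHUNKS = {("t", "h", "e"): "the", ("c", "a", "t"): "cat"}
WORD_CHUNKS = {("the", "cat"): "NP[the cat]"}

def chunk(stream, inventory):
    buffer, passed_up = [], []
    for unit in stream:
        buffer.append(unit)
        if tuple(buffer) in inventory:               # chunk recognized:
            passed_up.append(inventory[tuple(buffer)])  # compress, pass up
            buffer = []                              # raw input forgotten
    return passed_up

words = chunk(list("thecat"), LETTER_CHUNKS)         # ['the', 'cat']
print(chunk(words, WORD_CHUNKS))                     # ['NP[the cat]']
```

Each level runs the same kind of operation at a different grain, at the same time, which is precisely the sort of simultaneous, self-similar demand that raises the time-sharing question.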

Explaining these phenomena is difficult. Indeed, when dealing with clear (literal) instances of reuse, results from the interference paradigm show that processing bottlenecks are inevitable—true multitasking is impossible (see § 4.2.3). Redundancy offers a natural explanation of how the brain overcomes the time-sharing problem. It explains, in short, how we are able to “walk and chew gum” at the same time.
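The point can be put schematically. In the toy scheduler below (the durations and names are assumptions of mine), two tasks demand the same type of circuit at the same moment: with a single token circuit they must serialize, producing the familiar bottleneck, whereas a redundant duplicate lets them finish in parallel.

```python
def finish_time(n_tokens, demands_ms):
    """Greedy scheduler: each task grabs the earliest-free token circuit."""
    free_at = [0] * n_tokens
    for d in demands_ms:
        i = free_at.index(min(free_at))  # pick the least-busy token
        free_at[i] += d
    return max(free_at)

speech, driving = 100, 100               # concurrent demands, in ms
print(finish_time(1, [speech, driving])) # 200: serialized (literal reuse)
print(finish_time(2, [speech, driving])) # 100: parallel (reuse by proxy)
```

With one token the tasks queue; with two they do not. On the Redundancy Model, that is the whole of the “walk and chew gum” trick.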

Redundancy might also offer a solution to the encapsulation problem. As I explained in § 4.2.3, functional composites are not likely to be characterized by informational encapsulation, because in sharing their parts with other systems they will prima facie have access to the information stored and manipulated by those other systems (Anderson 2010, p. 300). If overlapping brain networks must share information on some level (Pessoa 2016, p. 23), it would be reasonable to suppose that central and peripheral systems do not overlap. This is because peripheral systems, which are paradigmatically fast and automatic, would not be able to process inputs as efficiently if there were a serious risk of central system override—i.e., of beliefs and other central information getting in the way of automatic processing. But we know from the neuroimaging literature that quite often the brain networks implementing central and peripheral functions do overlap. This is puzzling in light of the degree of cognitive impenetrability that certain sensory systems still seem to exhibit—limited though it may be. If it is plausible to suppose that the phenomenon calls for segregated circuitry, redundancy could feature in a solution to the puzzle, since it naturally explains how the brain can make parallel use of the same resources. Neuroimaging maps might well display what appear to be overlapping brain regions between two tasks (one involving central information, the other involving classically peripheral operations), but the overlap would not exist—there would be distinct, albeit adjacent and nearly identical, circuits recruited in each case. Of course, there may be other ways around the encapsulation problem that do not require segregated circuitry: the nature and extent of the overlap is presumably important. But clearly, redundancy opens up some fascinating explanatory possibilities.

To the extent that acquired skills must overcome both the time-sharing problem as well as the encapsulation problem—for acquired competencies are often able to run autonomously of central processes—we might expect that their neural implementations incorporate redundant tissue. In concluding, let me illustrate this point by offering a gloss on a particular account of how skills and expertise are acquired during development elaborated by Guida et al. (2016) and Anderson (2014). The process involved is called “search.” Search is an exploratory synaptogenetic process, “the active testing of multiple neuronal combinations until finding the most appropriate one for a specific skill, i.e., the neural niche of that skill” (Guida et al. 2016, p. 13). The theory holds that, in the early stages of skill acquisition, the brain must search for an appropriate mix of brain areas and does so by recruiting relatively widely across the cortex. When expertise has finally developed, a much narrower and more specific network of brain areas has been settled upon, such that “[a]s a consequence of their extended practice, experts develop domain-specific knowledge structures” (Guida et al. 2016, p. 13). The gloss (and my hunch) is this: first, that repeated practice of a task that requires segregation (to get around time-sharing and encapsulation issues) will in effect force search into redundant neural terrain (Karmiloff-Smith 1992; Barrett 2006; Barrett & Kurzban 2006); second, that search will recruit idle or relatively underutilized circuits in preference to busy ones as a general default strategy. Guida et al. (2016) cite evidence that experts’ brains reuse areas of which novices’ brains make only limited use: “novices use episodic long-term memory areas (e.g., the mediotemporal lobe) for performing long-term memory tasks,” but “experts are able to (re)use these areas also for performing working-memory tasks” (Guida et al. 2016, p. 14). Guida et al., in agreement with Anderson (2014), seem to have literal reuse in mind. But the same evidence they cite is consistent with reuse by proxy. As Barrett and Kurzban (2006, p. 639) suggest, echoing a similar suggestion by Karmiloff-Smith (1992), a developmental system

could contain a procedure or mechanism that partitioned off certain tasks—shunting them into a dedicated developmental pathway—under certain conditions, for example, when the cue structure of repeated instances of the task clustered tightly together, and when it was encountered repeatedly, as when highly practiced. . . . Under this scenario, reading could still be recruiting an evolved system for object recognition, and yet phenotypically there could be distinct modules for reading and for other types of object recognition.
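My hunch about search defaulting to idle circuits can be rendered schematically too. In the sketch below (the identifiers and utilization figures are hypothetical, not drawn from Guida et al.), a new skill that needs segregated circuitry settles on the least-utilized token of the required computational type, which is just the pattern the reading example describes.

```python
# Toy model of "search": a skill needing a segregated niche recruits the
# idlest token circuit of the right computational type.
def search_for_niche(circuits, required_type):
    candidates = [c for c in circuits if c["type"] == required_type]
    return min(candidates, key=lambda c: c["utilization"])

cortex = [
    {"id": "object_recognition_1", "type": "shape-analysis", "utilization": 0.9},
    {"id": "object_recognition_2", "type": "shape-analysis", "utilization": 0.1},
]
# Learning to read recruits the idle duplicate: phenotypically a distinct
# "reading module", yet a reuse (by proxy) of the evolved type.
print(search_for_niche(cortex, "shape-analysis")["id"])  # object_recognition_2
```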