In response, Nittrouer wrote “A reply to ‘Innate phonetic boundaries revisited.’” In this article she addresses the criticisms made by Aslin et al. and argues that they are unjustified. First, she shows that their definitions of the universal theory and of categorical perception are nearly identical to her own, undercutting their claim that she misrepresents and idealizes both theories. Aslin et al. also contend that innate abilities to discriminate should not be held to stringent criteria, because some speech contrasts are easier to discriminate simply due to differences in acoustic salience. However, listeners differ in their abilities to discriminate contrasts depending on experience with their native language, regardless of acoustic salience. If phonetic boundaries were innate, discriminative ability would not vary with experience, whatever the contrast’s acoustic salience. In fact, Aslin et al. report contradictory results and concede the same conclusion as Nittrouer, writing, “Thus, at best the infant literature supports a weak view of categorical perception.” At this point, Nittrouer shows that the areas of disagreement between the two positions are hazy. In the end, Nittrouer’s argument stands strongest: phonetic categories are not as concrete as conventional notions of categorical speech perception once suggested.
3.) Ohala’s Support for the Importance of Acoustic-auditory Properties:
The constituents of speech perception have been debated throughout the history of psycholinguistics. Many argue that articulation provides the basis for comprehending speech. For example, the motor theory proposes that we perceive speech by first identifying the intended phonetic gestures, or articulations, that produced the signal. In other words, to decode a speech signal one must compare the presumed articulation to a preexisting articulatory model. If the models match, the signal is recognized; models can still correspond despite differences in prosodic factors such as stress, intonation, and rate, thus addressing the problem of invariance. Similarly, the direct realist theory of speech perception asserts that articulations are perceived directly and that acoustic signals function only as a means of carrying such gestures. Ohala, however, opposes both the motor and direct realist theories and claims that speech perception does not require recovering the articulations of the speaker; instead, he hypothesizes that the primary units are acoustic. In support of his view, Ohala presents phonological data, the abilities of infants and nonhuman species to mimic human speech without knowledge of the vocal tract movements that produce such sounds, and the capacity of humans to differentiate between nonspeech sounds.
Signaling systems, such as speech, tend to maximize the physical differences between distinct messages. Accordingly, if vocal gestures and articulations are what must be conveyed for speech perception, then one would expect them to maintain a certain degree of differentiation. If, however, that differentiation is found instead in the acoustic signal, then sound must carry more of the perceptual load than articulation itself. Ohala supports the latter with evidence from phonological data. He shows that, among consonants, obstruents outnumber the rest, producing a pattern of popping and hissing sounds. Since obstruents are articulated similarly to other consonants, this pattern suggests that meaning is being conveyed through acoustic-auditory properties and not through pronunciation. The disproportionate use of sibilant fricatives also supports the view that the primary units of speech perception are acoustic rather than articulatory. Sibilant fricatives are unique because they produce high-frequency noise, which sets them apart from other fricatives. Since we must rely on some measure of differentiability to comprehend distinctions in speech, we must rely on the sounds that are most distinct. The prevalence of sibilant fricatives in speech shows that meaning is conveyed in the acoustic signal, rather than through articulations.
It has also been well established that certain nonhuman species can mimic human speech. Common sense tells us that such species have no idea what is going on inside the human vocal tract, yet they can still extract enough information to replicate human sounds. This suggests that the animals are using the messages conveyed through the acoustic signals of speech, not the details of articulation. Likewise, humans can perceive and classify sounds, such as those of machines, automobiles, and appliances, without being able to recover the mechanism producing them. As with speech, we rely on the sound itself and not on the mechanism that produces the signal. Along with the prominence of certain phonological structures, the ability of nonhumans to mimic human sounds and of humans to differentiate many nonspeech sounds indicates that the auditory system is more complex than first thought. In fact, acoustic-auditory properties may matter more in speech perception than articulation itself.
Ohala also addresses the issue of learning a new language; he admits that, in learning, recovering the articulation of the teacher is very important to acquisition. However, he argues that outside formal instruction, language learners proceed by trying different articulatory gestures, analyzing the feedback of native speakers, adjusting their own pronunciations, and trying again. This process of trial and error suggests that speech perception of one’s native language, which occurs much more rapidly, must be based not on articulation but on acoustic properties. Moreover, even in second languages, the ability to differentiate sounds auditorily precedes the ability to articulate those differences. Ohala cautions psycholinguists against jumping to the conclusion that perception occurs in reference to production; he suggests instead the possibility of a bidirectional model encompassing both articulation and sound, and he raises awareness of the importance of acoustic-auditory signals.
- The Dual Route of Visual Word Recognition:
The ability to use language is multi-faceted; it involves speaking, listening, reading, and writing. These tasks themselves contain many sub-categories, which raise their own questions about speech perception; for example, what procedures allow us to pronounce, out loud, what we read from print? Coltheart’s dual-route model of reading aloud attributes this capability to two procedures. In order to turn printed words into spoken ones, we rely on a lexical and a nonlexical procedure. The lexical system is responsible for our access to the visual representation (held in our orthographic input lexicon) of the word we are viewing. It also permits the recovery of that word’s spoken form from our phonological output lexicon. The nonlexical system, on the other hand, functions in accordance with a set of letter-to-sound rules known as the grapheme-phoneme correspondence (GPC) rules. These rules generate a phonological representation without involving a lexical search; they also operate from left to right, processing phonemes serially up to the phoneme stage of the system. The two systems function independently of one another; however, they are believed to begin and end with the same stages. Coltheart’s model is a connectionist one in that processing occurs through inhibitory and excitatory activation. Activation flows from the bottom up as well as from the top down, implying a bidirectional connection between levels.
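The division of labor between the two routes can be sketched in code. This is a minimal illustration only, not Coltheart’s actual implementation: the tiny lexicon, the GPC rule table, and all function names here are invented for the example.

```python
# Toy sketch of the dual-route idea: a whole-word lexical lookup and a
# nonlexical letter-to-sound (GPC-style) route that can disagree on
# irregular words. Lexicon and rules are invented examples.

# Hypothetical stored pronunciations (lexical route).
LEXICON = {
    "mint": "/mɪnt/",
    "pint": "/paɪnt/",  # irregular: violates the usual "-int" pattern
}

# Hypothetical letter-to-sound rules, applied serially left to right.
GPC_RULES = {"m": "m", "p": "p", "i": "ɪ", "n": "n", "t": "t"}

def lexical_route(word):
    """Look the whole word up in the orthographic/phonological lexicons."""
    return LEXICON.get(word)

def nonlexical_route(word):
    """Assemble a pronunciation one grapheme at a time, left to right."""
    return "/" + "".join(GPC_RULES.get(ch, "?") for ch in word) + "/"

def read_aloud(word):
    lex, nonlex = lexical_route(word), nonlexical_route(word)
    # For regular words the routes agree; for irregular words they
    # conflict, which the model links to slower naming of
    # low-frequency exception words.
    conflict = lex is not None and lex != nonlex
    return {"lexical": lex, "nonlexical": nonlex, "conflict": conflict}

print(read_aloud("mint"))  # routes agree: no conflict
print(read_aloud("pint"))  # routes disagree: conflict
```

In this sketch the "conflict" flag stands in for the cross-route competition that, in the model, costs extra processing cycles when naming irregular words.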
The regularity, pseudohomophony, masked priming, and neighborhood size effects all provide evidence that two routes exist for visual word recognition. It is well established that regularity affects both error rates and the latency of accurate responses when reading aloud. Regular words, in comparison to irregular ones, are easier to recognize and pronounce, requiring less processing time. The dual-route model suggests that, for a regular word, activation of the correct phoneme units corresponds in both the lexical and nonlexical systems. For an irregular word, however, activation differs between the two systems, causing conflict and longer reaction times. This conflict can be avoided when the word is of high frequency: the more common a word, the more rapidly the lexical system can produce an outcome, so irregular, high-frequency words can be generated without activation of the nonlexical procedure. A low-frequency exception word, by contrast, requires a greater number of processing cycles before its activation level can reach the critical value and be recognized as a word, which increases both error rates and reaction time.
Pseudohomophones are nonwords that are pronounced the same way as real words. In lexical decision and reading-aloud tasks, it is more difficult to reject a pseudohomophone than a matched nonpseudohomophone. Since the nonword and the real word sound the same phonologically, the nonword receives the same lexical activation as the actual word. Pseudohomophones that are also orthographically similar to the real word are especially difficult to reject: these nonwords generate so much activation in the nonlexical system that more time (i.e., a greater number of processing cycles) must be devoted to them before a lexical decision can be reached.
For masked priming tasks, Coltheart hypothesizes that subjects will be faster at recognizing a word if it has first been primed. He attributes this phenomenon to greater activation of the units in the visual word recognition system (the orthographic input lexicon) due to memory of the priming stimulus. If the prime is a homophone (a real word that sounds the same as another word but conveys a different meaning) or a pseudohomophone, Coltheart suggests that the priming stimulus will set the starting phonological activation of the primed word above zero, resulting in a faster reaction time. Coltheart also considers the neighborhood size effect. The neighborhood size of a word or nonword is the number of different real words that differ from it by only one letter. The ability to make a lexical decision depends on the number of neighbors the letter string has. Rejecting a nonword that has many neighbors is more difficult than rejecting one with few neighbors, because the many neighbors of a nonword such as “sare” cause high activation of the “are” part of the letter string; as a result, the ability to label “sare” a nonword is impaired. For real words, however, a large number of neighbors facilitates acceptance, reducing reaction time: a word such as “meat” is highly activated because of its many neighbors, making the lexical decision that it is a word an easy one.
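The neighborhood measure itself is straightforward to compute: a neighbor is any real word of the same length that differs by exactly one letter. The word list below is a tiny invented sample standing in for a real lexicon.

```python
# Counting orthographic neighbors: real words that differ from a letter
# string by exactly one letter (same length). WORDS is a tiny invented
# sample, not an actual lexicon.

WORDS = {"meat", "heat", "beat", "moat", "mean", "melt",
         "care", "bare", "hare", "dare"}

def neighbors(letter_string):
    """Return the real words differing from letter_string by one letter."""
    return {
        w for w in WORDS
        if len(w) == len(letter_string)
        and sum(a != b for a, b in zip(w, letter_string)) == 1
    }

# The nonword "sare" has many neighbors, so it is harder to reject;
# the word "meat" is also densely connected, which speeds acceptance.
print(sorted(neighbors("sare")))  # ['bare', 'care', 'dare', 'hare']
print(sorted(neighbors("meat")))  # ['beat', 'heat', 'mean', 'melt', 'moat']
```

On this toy lexicon, both “sare” and “meat” sit in dense neighborhoods, which in the model translates into strong activation of overlapping units: helpful when the string is a word, a hindrance when it must be rejected as a nonword.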
In conjunction with the above effects, Coltheart also studied the effect of the position of irregularity on naming latency and used his findings as further support for serial processing in his dual-route model of reading aloud. He found that the regularity effect declines with the position of the irregularity: the later in the word the irregularity occurred, the faster his subjects were at processing it. The activation we receive prior to the irregularity is greatest when the irregularity is placed at the end of the word; often, the activation generated is enough for us to produce a pronunciation. If the irregularity occurs at the beginning of the word, we meet inhibition immediately, making the word more difficult to recognize and pronounce.
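The serial logic behind this position effect can be sketched as follows. The phoneme codes and the linear cycle-cost formula are invented for illustration only; they are not Coltheart’s actual parameters.

```python
# Toy illustration of the position-of-irregularity effect under serial
# left-to-right assembly: the earlier the nonlexical (rule-assembled)
# pronunciation conflicts with the stored one, the more of the word is
# inhibited, so the higher the invented cycle cost.

def first_conflict(assembled, stored):
    """Position where the assembled and stored pronunciations first
    disagree, or None if they match throughout."""
    for i, (a, b) in enumerate(zip(assembled, stored)):
        if a != b:
            return i
    return None

def extra_cycles(length, pos):
    """Invented linear cost: earlier conflict -> more extra cycles."""
    return 0 if pos is None else length - pos

# Same-length phoneme codes; 'X' marks the irregular phoneme.
early = first_conflict("Xbcd", "abcd")  # irregularity at position 0
late = first_conflict("abcX", "abcd")   # irregularity at position 3

print(extra_cycles(4, early))  # 4: conflict met immediately
print(extra_cycles(4, late))   # 1: most of the word already assembled
```

The decreasing cost with later conflict position mirrors the declining regularity effect Coltheart reports, and it falls out naturally from left-to-right assembly, which is why the finding supports serial processing.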