Modern technologies in teaching FLT

Segmental Feedback. Technically, designing a voice-interactive pronunciation tutor goes beyond the state of the art required by commercial dictation systems. While the grammar and vocabulary of a pronunciation tutor are comparatively simple, the underlying speech processing technology tends to be complex, since it must be customized to recognize and evaluate the disfluent speech of language learners. A conventional speech recognizer is designed to generate the most charitable reading of a speaker's utterance: its acoustic models are generalized so as to accept and correctly recognize a wide range of accents and pronunciations. A pronunciation tutor, by contrast, must be trained both to recognize and to correct subtle deviations from standard native pronunciation.

A number of techniques have been suggested for automatic recognition and scoring of non-native speech (Bernstein, 1997; Franco, Neumeyer, Kim, & Ronen, 1997; Kim, Franco, & Neumeyer, 1997; Witt & Young, 1997). In general terms, the procedure consists of building native pronunciation models and then measuring the non-native responses against them. This requires models trained on both native and non-native speech data in the target language, supplemented by a set of algorithms for measuring acoustic variables that have proven useful in distinguishing native from non-native speech. These variables include response latency, segment duration, inter-word pauses (in phrases), spectral likelihood, and fundamental frequency (F0). Machine scores are then calculated from statistics obtained by comparing the non-native values of these variables with the native models.
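To make "measuring non-native responses against native models" concrete, here is a minimal sketch for one of the listed variables, segment duration. It assumes each phone's native log-duration follows a Gaussian whose mean and standard deviation were estimated from native speech, and it scores a learner by the average log-likelihood of their segment durations (for example, as obtained from a forced alignment). The phone labels and model numbers in `NATIVE_LOG_DURATION` are hypothetical placeholders, not values from the cited studies.

```python
import math

# Hypothetical native duration models: phone -> (mean, std) of log-duration (seconds).
# A real system would estimate these from a native speech corpus; these are made up.
NATIVE_LOG_DURATION = {
    "ae": (math.log(0.110), 0.30),
    "t": (math.log(0.060), 0.35),
    "s": (math.log(0.095), 0.30),
}


def gaussian_log_pdf(x: float, mean: float, std: float) -> float:
    """Log-density of the normal distribution N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


def duration_score(segments: list[tuple[str, float]]) -> float:
    """Average log-likelihood of a learner's segment durations under the native models.

    `segments` holds (phone, duration_in_seconds) pairs. Higher (less negative)
    scores mean the durations are closer to the native norms.
    """
    total = 0.0
    for phone, duration in segments:
        mean, std = NATIVE_LOG_DURATION[phone]
        total += gaussian_log_pdf(math.log(duration), mean, std)
    return total / len(segments)
```

For example, `duration_score([("ae", 0.110), ("t", 0.060)])` is much higher than `duration_score([("ae", 0.300), ("t", 0.200)])`, because the second, drawn-out rendition lies several standard deviations from the native means. Working in log-duration is a design choice that treats lengthening and shortening by the same factor symmetrically.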

In a final step, machine-generated pronunciation scores are validated by correlating them with the judgments of expert human listeners. As one would expect, the accuracy of the scores increases with the amount of speech being evaluated. Stanford Research Institute (SRI) has demonstrated a 0.44 correlation between machine scores and human scores at the phone level. At the sentence level, the machine-human correlation was 0.58, and at the speaker level it was 0.72, based on a total of 50 utterances per speaker (Franco et al., 1997; Kim et al., 1997). These results compare with human inter-grader correlations of 0.55, 0.65, and 0.80 at the phone, sentence, and speaker levels, respectively. A study conducted at Entropic shows that, with about 20 to 30 utterances per speaker and a linear combination of the above techniques, machine-human correlations as high as 0.85 can be obtained (Bernstein, 1997).
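The validation step can be sketched as follows, assuming per-utterance machine scores and one human grade per speaker. Utterance scores are averaged into a speaker-level score, optionally combined linearly from several feature scores (in the spirit of the Entropic result), and then compared with human grades using the Pearson correlation coefficient. The function names and weights are hypothetical; in practice the weights would be fitted, for instance by linear regression against human grades.

```python
import math


def linear_combination(feature_scores: list[float], weights: list[float], bias: float = 0.0) -> float:
    """Combine several per-utterance feature scores into one machine score.

    The weights are assumed to have been fitted beforehand against human grades.
    """
    return bias + sum(w * s for w, s in zip(weights, feature_scores))


def speaker_means(utterance_scores: dict[str, list[float]]) -> dict[str, float]:
    """Average utterance-level scores into a single score per speaker."""
    return {spk: sum(s) / len(s) for spk, s in utterance_scores.items()}


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

A typical call would build `machine = speaker_means(scores_by_speaker)`, align it with a dictionary of human grades on the same speaker IDs, and compute `pearson([machine[s] for s in ids], [human[s] for s in ids])`. Averaging over many utterances before correlating is one reason speaker-level correlations come out higher than phone-level ones.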

Others have used expert knowledge about the systematic pronunciation errors made by adult L2 learners in order to diagnose and correct such errors. One such system is SPELL, a European Community project for the automated assessment and improvement of foreign language pronunciation (Hiller, Rooney, Vaughan, Eckert, Laver, & Jack, 1994). This system uses advanced speech processing and recognition technologies to assess pronunciation errors by L2 learners of English (French or Italian speakers) and to provide immediate corrective feedback. One technique for detecting consonant errors induced by inter-language transfer was to include students' L1 pronunciations in the grammar network. In addition to the English /th/ sound, for example, the grammar network also includes /t/ or /s/, errors typical of Italian speakers of English. Although this system uses ASR technology in a fairly simple way, it can be very effective in diagnosing and correcting known problems of L1 interference. However, it is less effective in detecting rare and more idiosyncratic pronunciation errors. Furthermore, it assumes that the phonetic system of the target language (e.g., English) can be accurately mapped onto that of the learners' native language (e.g., Italian). While this assumption may hold for an Italian learner of English, it does not hold for a Chinese learner, because Chinese has sounds that do not resemble any sounds in English.
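The expected-error technique can be illustrated with a small pronunciation lexicon in which each word lists its target pronunciation plus anticipated L1-transfer variants, each tied to a feedback message. This is a sketch under simplifying assumptions, not SPELL's implementation: the phone labels are simplified ASCII symbols, and instead of constraining a real recognizer's search, it simply picks the network path closest (by Levenshtein distance) to an already recognized phone sequence.

```python
# Hypothetical lexicon: each word maps its target pronunciation and its
# anticipated L1-transfer variants to a feedback message (None = target).
LEXICON = {
    "think": {
        ("th", "ih", "ng", "k"): None,
        ("t", "ih", "ng", "k"): "Put your tongue between your teeth: /th/, not /t/.",
        ("s", "ih", "ng", "k"): "Put your tongue between your teeth: /th/, not /s/.",
    },
}


def edit_distance(a, b) -> int:
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (pa != pb)))   # substitution or match
        prev = cur
    return prev[-1]


def diagnose(word: str, recognized: tuple):
    """Return (closest network path, feedback); feedback is None for the target path.

    On ties, the path listed first (the target) wins.
    """
    paths = LEXICON[word]
    best = min(paths, key=lambda p: edit_distance(p, recognized))
    return best, paths[best]
```

Calling `diagnose("think", ("t", "ih", "ng", "k"))` returns the /t/ feedback. An unanticipated substitution such as `("f", "ih", "ng", "k")` is equally distant from every path, so the sketch falls back to the target and gives no feedback, which mirrors the limitation noted above for rare and idiosyncratic errors.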

A system for teaching the pronunciation of Japanese long vowels, the mora nasal, and mora obstruents was recently built at the University of Tokyo. This system enables students to practice phonemic differences in Japanese that are known to present special challenges to L2 learners. It prompts students to pronounce minimal pairs (e.g., long and short vowels) and returns immediate feedback on segment duration. Based on limited data, the system seems quite effective at this particular task: learners quickly mastered the relevant duration cues, and the time spent on learning these pronunciation skills was well within the constraints of Japanese L2 curricula (Kawai & Hirose, 1997). However, the study provides no data on the long-term effects of using the system.
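Duration feedback on minimal pairs can be sketched as a simple decision rule. The sketch below is an assumption, not the method of Kawai and Hirose: it normalizes the vowel's duration by the speaker's average mora duration in the same utterance (a crude correction for speaking rate) and uses an illustrative threshold ratio of 1.5 to decide whether a long vowel was produced.

```python
# Illustrative threshold (an assumption, not a published value): a vowel lasting
# at least 1.5 times the speaker's average mora counts as "long".
LONG_VOWEL_RATIO = 1.5


def vowel_length_feedback(vowel_duration: float, mora_durations: list[float],
                          target_is_long: bool) -> str:
    """Compare a produced vowel with the target length and return feedback text."""
    mean_mora = sum(mora_durations) / len(mora_durations)
    produced_long = vowel_duration / mean_mora >= LONG_VOWEL_RATIO
    if produced_long == target_is_long:
        return "ok"
    if target_is_long:
        return "too short: hold the vowel longer"
    return "too long: shorten the vowel"
```

Normalizing by the speaker's own mora duration, rather than using a fixed millisecond cutoff, keeps a slow speaker's short vowels from being misjudged as long.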

Supra-segmental Feedback. Correct usage of supra-segmental features such as intonation and stress has been shown to improve the syntactic and semantic intelligibility of spoken language (Crystal, 1981). In spoken conversation, intonation and stress information helps listeners not only to locate phrase boundaries and word emphasis, but also to identify the pragmatic thrust of the utterance (e.g., interrogative vs. declarative). One of the main acoustic correlates of stress and intonation is fundamental frequency (F0); other acoustic characteristics include loudness, duration, and tempo. Most commercial signal processing packages have tools for tracking and visually displaying F0 contours (see Figure 2). Such displays can be, and have been, used to provide valuable pronunciation feedback to students. Experiments have shown that a visual F0 display of supra-segmental features combined with audio feedback is more effective than audio feedback alone (de Bot, 1983; James, 1976), especially if the student's F0 contour is displayed along with a native model. The feasibility of this type of visual feedback has been demonstrated by a number of simple prototypes (Abberton & Fourcin, 1975; Anderson-Hsieh, 1994; Hiller et al., 1994; Spaai & Hermes, 1993; Stibbard, 1996). We believe that this technology has good potential for being incorporated into commercial CALL systems.
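F0 tracking, which underlies the contour displays described above, can be sketched with a basic autocorrelation method: for each short frame, find the lag (within a plausible pitch range) at which the signal best matches a shifted copy of itself, and convert that lag to a frequency. This is a minimal illustration under stated assumptions (a 75 to 400 Hz search range, a simple voicing threshold, 50 ms frames at 8 kHz), not the algorithm of any particular commercial package; production pitch trackers add normalization, interpolation, and octave-error correction.

```python
import math


def estimate_f0(frame: list[float], sample_rate: int, f_min: float = 75.0,
                f_max: float = 400.0, voicing_threshold: float = 0.3):
    """Estimate F0 (Hz) of one frame from its autocorrelation peak.

    Returns None for silent frames or frames whose best peak is weak (treated as unvoiced).
    """
    n = len(frame)
    energy = sum(s * s for s in frame)  # autocorrelation at lag 0
    if energy == 0.0:
        return None
    min_lag = int(sample_rate / f_max)
    max_lag = min(int(sample_rate / f_min), n - 1)
    best_lag, best_r = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag is None or best_r / energy < voicing_threshold:
        return None
    return sample_rate / best_lag


def f0_contour(signal: list[float], sample_rate: int, frame_len: int = 400, hop: int = 160):
    """F0 estimate (or None) for each overlapping frame: the contour a display would plot."""
    return [estimate_f0(signal[start:start + frame_len], sample_rate)
            for start in range(0, len(signal) - frame_len + 1, hop)]
```

For a synthetic 200 Hz sine sampled at 8000 Hz, every frame yields 200.0 Hz. A tutor could plot the student's contour from `f0_contour` next to a native speaker's contour for the same sentence, as the experiments above suggest.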

Other types of visual pronunciation feedback include the graphical display of a native speaker's face, the vocal tract, spectrum information, and speech waveforms (see Figure 2). Experiments have shown that a visual display of the talker improves not only word identification accuracy (Bernstein & Christian, 1996), but also speech rhythm and timing (Markham & Nagano-Madsen, 1997). A large number of commercial pronunciation tutors on the market today offer this kind of feedback. Others have experimented with using a real-time spectrogram or waveform display of speech to provide pronunciation feedback. Molholt (1990) and Manuel (1990) report anecdotal success in using such displays, together with guidance on how to interpret them, to improve the pronunciation of supra-segmental features by L2 learners of English. However, the authors do not provide experimental evidence for the effectiveness of this type of visual feedback. Our own experience with real-time spectrum and waveform displays suggests that they can serve as pronunciation feedback, provided they are presented together with other types of feedback and with instructions on how to interpret them.

Teaching Linguistic Structures and Limited Conversation

Apart from supporting systems for teaching basic pronunciation and literacy skills, ASR technology is being deployed in automated language tutors that offer practice in a variety of higher-level linguistic skills, ranging from highly constrained grammar and vocabulary drills to limited conversational skills in simulated real-life situations. Prior to implementing any such system, a choice needs to be made between two fundamentally different system designs: closed response vs. open response. In both designs, students are prompted for speech input by a combination of written, spoken, or graphical stimuli. However, the designs differ significantly in the type of verbal computer-student interaction they support. In closed response systems, students must choose one response from a limited number of possible responses presented on the screen, so they know exactly what they are allowed to say in response to any given prompt. By contrast, in systems with an open response design, the recognition network (the set of responses the system expects) remains hidden, and the student is challenged to generate an appropriate response without any cues from the system.
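In a closed response design, the recognizer only has to decide which of the displayed responses the student said. The sketch below assumes the recognizer has already produced a word string and maps it onto the on-screen options with Python's standard `difflib.SequenceMatcher`, rejecting the input if no option is similar enough. The function name and the 0.6 rejection threshold are illustrative assumptions; a real system would usually restrict the recognizer's grammar to the options in the first place.

```python
import difflib


def match_closed_response(hypothesis: str, options: list[str], min_ratio: float = 0.6):
    """Map a recognizer hypothesis onto one of the responses shown on screen.

    Returns the best-matching option, or None if nothing is close enough,
    in which case the tutor would ask the student to try again.
    """
    hyp_words = hypothesis.lower().split()
    best, best_ratio = None, 0.0
    for option in options:
        ratio = difflib.SequenceMatcher(None, hyp_words, option.lower().split()).ratio()
        if ratio > best_ratio:
            best, best_ratio = option, ratio
    return best if best_ratio >= min_ratio else None
```

With the options "The bank is next to the post office", "The bank is on Main Street", and "I don't know", a slightly disfluent hypothesis such as "the bank is next to post office" still maps to the first option, while an off-task utterance such as "where is the train" is rejected. This tolerance to small deviations is part of why closed response systems are robust.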

Closed Response Designs. One of the first implementations of a closed response design was the Voice Interactive Language Instruction System (VILIS) developed at SRI (Bernstein & Rtischev, 1991). This system elicits spoken student responses by presenting queries about graphical displays of maps and charts. Students infer the right answers to a set of multiple-choice questions and produce spoken responses.

A more recent prototype currently under development at SRI is the Voice Interactive Language Training System (VILTS), a system designed to foster speaking and listening skills for beginning through advanced L2 learners of French (Egan, 1996; Neumeyer et al., 1996; Rypa, 1996). The system incorporates authentic, unscripted conversational materials collected from French speakers into an engaging, flexible, and user-centered lesson architecture. It deploys speech recognition to guide students through the lessons and automatic pronunciation scoring to provide feedback on the fluency of student responses. As far as we know, only the pronunciation scoring component of the system has been validated in experimental trials (Neumeyer et al., 1996).

In pedagogically more sophisticated systems, the query-response mode is highly contextualized and presented as part of a simulated conversation with a virtual interlocutor. To stimulate student interest, closed response queries are often presented in the form of games or goal-driven tasks. One commercial system that exploits the full potential of this design is TraciTalk (Courseware Publishing International, Inc., Cupertino, CA), a voice-driven multimedia CALL system aimed at more advanced ESL learners. In a series of loosely connected scenarios, the system engages students in solving a mystery. Prior to each scenario, students are given a task (e.g., eliciting a certain type of information), which they accomplish by verbally interacting with characters on the screen. Each voice interaction offers several possible responses, and each spoken response moves the conversation in a slightly different direction. There are many paths through each scenario, and not every path yields the desired information. This motivates students to return to the beginning of the scene and try out a different interrogation strategy. Moreover, TraciTalk features an agent that students can ask for assistance, and it accepts spoken commands for navigating the system. Apart from being more fun and interesting, games and task-oriented programs implicitly provide positive feedback by giving students the feeling of having solved a problem solely by communicating in the target language.
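The branching structure described above can be pictured as a directed graph in which each node is a point in the conversation and each edge is one of the responses offered on screen. The sketch below uses an invented three-step mystery scene (the node names, responses, and goal are hypothetical, not TraciTalk content) and enumerates, by depth-first search, the response sequences that reach the node where the clue is revealed.

```python
# Hypothetical scenario graph: node -> {on-screen response: next node}.
SCENARIO = {
    "start": {"Ask about the victim": "victim", "Ask about the weather": "small_talk"},
    "victim": {"Ask where she was last night": "alibi", "Change the subject": "small_talk"},
    "small_talk": {"Go back to the beginning": "start"},
    "alibi": {},  # terminal node: the student has obtained the clue
}
GOAL = "alibi"


def paths_to_goal(graph: dict, node: str, goal: str, visited: frozenset = frozenset()) -> list:
    """Depth-first enumeration of response sequences from `node` to `goal`
    that never revisit a node (so loops back to the start are not counted)."""
    if node == goal:
        return [[]]
    visited = visited | {node}
    paths = []
    for response, nxt in graph[node].items():
        if nxt not in visited:
            for rest in paths_to_goal(graph, nxt, goal, visited):
                paths.append([response] + rest)
    return paths
```

Here `paths_to_goal(SCENARIO, "start", GOAL)` finds a single successful strategy, `["Ask about the victim", "Ask where she was last night"]`; every other branch leads back toward the start, which is the kind of dead end that prompts students to retry with a different approach.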

The speech recognition technology underlying closed response query implementations is very simple, even in the more sophisticated systems. For any given interaction, the task perplexity is low and the vocabulary size is comparatively small. As a result, these systems tend to be very robust. Recognition accuracy rates from the low to the upper 90% range can be expected, depending on task definition, vocabulary size, and the degree of non-native disfluency.
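Perplexity can be read, roughly, as the effective number of choices the recognizer faces at each point. A minimal sketch, assuming we know the probability of each response a grammar allows at a single prompt, computes it as 2 raised to the entropy of that distribution. Real systems measure task perplexity over word sequences with a language model, so this per-prompt version is a simplification.

```python
import math


def perplexity(probabilities: list[float]) -> float:
    """Perplexity 2**H of a distribution over the responses allowed at one prompt."""
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return 2 ** entropy
```

A closed response prompt with four equally likely on-screen answers gives `perplexity([0.25] * 4) == 4.0`, whereas an open grammar accepting a thousand equally likely sentences gives a perplexity of about 1000. The gap illustrates why closed response systems achieve much higher recognition accuracy.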

FUTURE TRENDS IN VOICE-INTERACTIVE CALL

In the previous sections, we reviewed the current state of speech technology, discussed some of the factors affecting recognition performance, and introduced a number of research prototypes that illustrate the range of speech-enabled CALL applications that are currently technically and pedagogically feasible. With the exception of a few exploratory open response dialog systems, most of these systems are designed to teach and evaluate linguistic form (pronunciation, fluency, vocabulary, or grammatical structure). This is no coincidence: formal features can be clearly identified and integrated into a focused task design, so robust recognition performance can be expected. Furthermore, mastering linguistic form remains an important component of L2 instruction, despite the emphasis on communication (Holland, 1995). Prolonged, focused practice of a large number of items is still considered an effective means of expanding and reinforcing linguistic competence (Waters, 1994). However, such practice is time consuming. CALL can automate these aspects of language training, thereby freeing up valuable class time that would otherwise be spent on drills.

While such systems are an important step in the right direction, other more complex and ambitious applications are conceivable and no doubt desirable. Imagine a student being able to access the Internet, find the language of his or her choice, and tap into a comprehensive voice-interactive multimedia language program that would provide the equivalent of an entire first year of college instruction. The computer would evaluate the student's proficiency level and design a course of study tailored to his or her needs. Or think of using the same Internet resources and a set of high-level authoring tools to put together a series of virtual encounters surrounding the task of finding an apartment in Berlin. At a minimum, one would hope that natural speech input capacity becomes a routine feature of any CALL application.

To many educators, these may still seem like distant goals, and yet we believe that they are not beyond reach. In what follows, we identify four of the most persistent issues in building speech-enabled language learning applications and suggest how they might be resolved to enable a more widespread commercial implementation of speech technology in CALL.

1. More research is necessary on modeling and predicting multi-turn dialogs.

An intelligent open response language tutor must not only correctly recognize a given speech input, but also understand what has been said and evaluate the meaning of the utterance for pragmatic appropriateness. Automatic speech understanding requires Natural Language Processing (NLP) capabilities, a technology for extracting grammatical, semantic, and pragmatic information from written or spoken discourse. NLP has been successfully deployed in expert systems and information retrieval. One of the first voice-interactive dialog systems using NLP was the DARPA-sponsored Air Travel Information System (Pallett, 1995), which enables users to obtain flight information and make ticket reservations over the telephone. Similar commercial systems have been implemented for automatic retrieval of weather and restaurant information, virtual environments, and telephone auto-attendants. Many of the lessons learned in developing such systems can be valuable for designing CALL applications for practicing conversational skills.

2. More and better training data are needed to support basic research on modeling non-native conversational speech.

Among the resources most needed for developing open response conversational CALL applications are large corpora of transcribed non-native speech, covering both read and conversational speech. Since accents vary depending on the student's first language, either separate databases must be collected for each L1 subgroup, or a representative sample of speakers of different languages must be included in the database. Creating such databases is extremely labor- and cost-intensive: a phone-level transcription of spontaneous conversational data can cost up to one dollar per phone. A number of multilingual conversational databases of telephone speech are publicly available through the Linguistic Data Consortium (LDC), including Switchboard (US English) and CALLHOME (English, Japanese, Spanish, Chinese, Arabic, German). Our own effort, in collaboration with Johns Hopkins University (Byrne, Knodt, Khudanpur, & Bernstein, 1998; Knodt, Bernstein, & Todic, 1998), has been to collect and model spontaneous English conversations between native speakers of Spanish. All of these efforts will improve our understanding of the disfluent speech of language learners and help model this speech type for the purpose of human-machine communication.
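To see why phone-level transcription is so expensive, a back-of-the-envelope estimate helps. The sketch uses the upper bound of one dollar per phone stated above; the speaking rate of 12 phones per second of conversational speech is an assumed round figure for illustration only, and the estimate counts speech time only, not pauses.

```python
def transcription_cost(hours_of_speech: float, phones_per_second: float = 12.0,
                       dollars_per_phone: float = 1.0) -> float:
    """Upper-bound cost (USD) of phone-level transcription.

    phones_per_second is an assumed speaking rate, not a measured value.
    """
    phones = hours_of_speech * 3600 * phones_per_second
    return phones * dollars_per_phone
```

Under these assumptions, a single hour of conversational speech contains about 43,200 phones, so `transcription_cost(1)` returns 43200.0 dollars, and a corpus of tens of hours per L1 subgroup quickly runs into hundreds of thousands of dollars.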

DEFINING AND ACQUIRING LITERACY IN THE AGE OF INFORMATION

Moll defined literacy as "a particular way of using language for a variety of purposes, as a sociocultural practice with intellectual significance" (1994, p. 201). While traditional definitions of literacy have focused on reading and writing, the definition of literacy today is more complex. The process of becoming literate today involves more than learning how to use language effectively; rather, the process amplifies and changes both the cognitive and the linguistic functioning of the individual in society. One who is literate knows how to gather, analyze, and use information resources to solve problems and make decisions, as well as how to learn both independently and cooperatively. Ultimately, literate individuals possess a range of skills that enable them to participate fully in all aspects of modern society, from the workforce to the family to the academic community. Indeed, the development of literacy is "a dynamic and ongoing process of perpetual transformation" (Neilsen, 1989, p. 5), whose evolution is influenced by a person's interests, cultures, and experiences. Researchers have viewed literacy as a multifaceted concept for a number of years (Johns, 1997). However, succeeding in a digital, information-oriented society demands multiliteracies, that is, competence in an even more diverse set of functional, academic, critical, and electronic skills.

To be considered multiliterate, students today must acquire a battery of skills that will enable them to take advantage of the diverse modes of communication made possible by new technologies and to participate in global learning communities. Although becoming multiliterate is not an easy task for any student, it is especially difficult for ESL students operating in a second language. In their attempts to become multiliterate, ESL students must acquire linguistic competence in a new language and at the same time develop the cognitive and sociocultural skills necessary to gain access to the social, academic, and workforce environments of the 21st century. They must become functionally literate, able to speak, understand, read, and write English, as well as use English to acquire, articulate, and expand their knowledge. They must also become academically literate, able to read and understand interdisciplinary texts, analyze and respond to those texts through various modes of written and oral discourse, and expand their knowledge through sustained and focused research. Further, they must become critically literate, defined here as the ability to evaluate the validity and reliability of informational sources so that they may draw appropriate conclusions from their research efforts. Finally, in our digital age of information, students must become electronically literate, able "to select and use electronic tools for communication, construction, research, and autonomous learning" (Shetzer, 1998).

Helping students develop the range of literacies they need to enter and succeed at various levels of the academic hierarchy and subsequently in the workforce requires a pedagogy that facilitates and hastens linguistic proficiency development, familiarizes students with the requirements and conventions of academic discourse, and supports the use of critical thinking and higher order cognitive processes. A large body of research conducted over the past decade (see, e.g., Benesch, 1988; Brinton, Snow, & Wesche, 1989; Crandall, 1993; Kasper, 1997a, 2000a; Pally, 2000; Snow & Brinton, 1997) has shown that content-based instruction (CBI) is highly effective in helping ESL students develop the literacies they need to be successful in academic and workforce environments.

CONTENT-BASED INSTRUCTION AND LITERACY DEVELOPMENT

CBI develops linguistic competence and functional literacy by exposing ESL learners to interdisciplinary input that consists of both "everyday" communicative and academic language (Cummins, 1981; Mohan, 1990; Spanos, 1989) and that contains a wide range of vocabulary, forms, registers, and pragmatic functions (Snow, Met, & Genesee, 1989; Zuengler & Brinton, 1997). Because content-based pedagogy encourages students to use English to gather, synthesize, evaluate, and articulate interdisciplinary information and knowledge (Pally, 1997), it also allows them to hone academic and critical literacy skills as they practice appropriate patterns of academic discourse (Kasper, 2000b) and become familiar with sociolinguistic conventions relating to audience and purpose (Soter, 1990).

The theoretical foundations supporting a content-based model of ESL instruction derive from cognitive learning theory and second language acquisition (SLA) research. Cognitive learning theory posits that in the process of acquiring literacy skills, students progress through a series of three stages: the cognitive, the associative, and the autonomous (Anderson, 1983a). Progression through these stages is facilitated by scaffolding, which involves providing extensive instructional support during the initial stages of learning and gradually removing this support as students become more proficient at the task (Chamot & O'Malley, 1994). SLA research emphasizes that literacy development can be facilitated by providing multiple opportunities for learners to interact in communicative contexts with authentic, linguistically challenging materials that are relevant to their personal and educational goals (see, e.g., Brinton et al., 1989; Kasper, 2000a; Krashen, 1982; Snow & Brinton, 1997; Snow et al., 1989).

In a 1996 paper published in The Harvard Educational Review, The New London Group (NLG) advocated developing multiliteracies through a pedagogy that involves a complex interaction of four factors, which they called Situated Practice, Overt Instruction, Critical Framing, and Transformed Practice. According to the NLG, becoming multiliterate requires critical engagement in relevant tasks, interaction with diverse forms of communication made possible by electronic technologies, and participation in collaborative learning contexts. Warschauer (1999) concurred, arguing that a pedagogy of critical inquiry and problem solving that provides the context for "authentic and collaborative projects and analyses" (p. 16), which both support and are supported by the use of electronic technologies, is necessary for ESL students to acquire the linguistic, social, and technological competencies key to literacy in a digital world.

According to a 1995 report published by the United States Department of Education, "technology is an important enabler for classes organized around complex, authentic tasks" and when "used in support of challenging projects, [technology] can contribute to students' sense ... that they are using real tools for real purposes." Technology use increases students' motivation as it promotes their active engagement with language and content through authentic, challenging tasks that are interdisciplinary in nature (McGrath, 1998). Technology use also encourages students to spend more time on task. As they search for information in a hyperlinked environment, ESL students benefit from increased opportunities to process linguistic and content information. Used as a tool for learning, technology supports a level of task authenticity and complexity that fits well with the interdisciplinary work inherent in content-based instruction and that promotes the acquisition of multiliteracies.

THEORY INTO PRACTICE

These research findings suggest that in our efforts to prepare ESL students for the challenges of the academic and workforce environments of the 21st century, we should adopt a pedagogical model that incorporates information technology as an integral component and that specifically targets the development of the range of literacies deemed necessary for success in a digital, information-oriented society. This paper describes a content-based pedagogy, which I call focus discipline research (Kasper, 1998a), and presents the results of a classroom study conducted to measure the effects of focus discipline research on the development of ESL students' literacy skills.

As described here, focus discipline research puts theory into practice as it incorporates the principles of cognitive learning theory, SLA research, and the four components of the NLG's (1996) pedagogy of multiliteracies. Through pedagogical activities that provide the context for situated practice, overt instruction, critical framing, and transformed practice, focus discipline research promotes ESL students' choice of and responsibility for course content, engages them in extended practice with linguistic structures and interdisciplinary material, and encourages them to become "content experts" in a subject of their own choosing.

CONCLUSION

It can be seen, then, that it is difficult, and probably undesirable, to attempt to determine the difficulty of a listening and viewing task in absolute terms. By considering the three aspects that affect the level of difficulty, namely text, task, and context features, it is possible to identify those characteristics of tasks that can be manipulated. Having identified the variable characteristics of tasks in developing the model, it is necessary to look at the dynamic interaction among tasks, texts, and the computer-based environment.

Task design and text selection in this model also involve identifying and taking account of context. Teachers can allow for the influence of context on learners' perception of difficulty by providing texts and tasks that range across these levels, and by ensuring that learners with lower language proficiency can ease themselves gradually into the more contextually difficult tasks. This can be achieved by reducing the difficulty of other parameters, such as text or task difficulty, or by minimizing other aspects of contextual difficulty. For example, learners of lower proficiency who encounter a task based on a broadcast announcement for the first time would be given appropriate visual support in the form of graphics or video to reduce textual difficulty, and the task type would be kept at a low level of cognitive demand (Hoven, 1991, 1997a, 1997b).

In a CELL environment, identifying these parameters of difficulty enables task designers to develop and modify tasks on the basis of a clear language pedagogy that is both learner-centred and cognitively sound. Learners are given the information on text, task, and context they need to make informed choices, along with opportunities to act on their decisions. Teachers thereby create a CELL environment that facilitates and encourages exploration of, and experimentation with, the choices available. Within this model, learners are able to adjust their own learning paths through the texts and tasks, at their own pace and at their individual points of readiness. In sociocultural terms, the model provides learners with a guiding framework, or community of practice, within which to develop through their individual Zones of Proximal Development. It also provides them with the tools to mediate meaning, in the form of software incorporating information, feedback, and appropriate help systems.

By taking account of learners' needs and making provision for learner choice in this way, one of the major advantages of using computers in language learning, namely their capacity to allow learners to work at their own pace and in their own time, can be more fully exploited. It then becomes our task as researchers to evaluate, with learners' assistance, the effectiveness of environments such as these in improving learners' listening and viewing comprehension as well as their approaches to learning in these environments.

REFERENCES

1. Adair-Hauck, B., & Donato, R. (1994). Foreign language explanations within the zone of proximal development. The Canadian Modern Language Review 50(3), 532-557.

2. Anderson, A., & Lynch, T. (1988). Listening. Oxford: Oxford University Press.

3. Armstrong, D. F., Stokoe, W. C., & Wilcox, S. E. (1995). Gesture and the nature of language. Cambridge: Cambridge University Press.

4. Arndt, H., & Janney, R. W. (1987). InterGrammar: Toward an integrative model of verbal, prosodic and kinesic choices in speech. Berlin: Mouton de Gruyter.

5. Asher, J. J. (1981). Comprehension training: The evidence from laboratory and classroom studies. In H. Winitz (Ed.), The Comprehension Approach to Foreign Language Instruction (pp. 187-222). Rowley, MA: Newbury House.

6. Bacon, S. M. (1992a). Authentic listening in Spanish: How learners adjust their strategies to the difficulty of input. Hispania 75, 29-43.

7. Bacon, S. M. (1992b). The relationship between gender, comprehension, processing strategies, cognitive and affective response in foreign language listening. Modern Language Journal 76(2), 160-178.

8. Batley, E. M., & Freudenstein, R. (Eds.). (1991). CALL for the Nineties: Computer Technology in Language Learning. Marburg, Germany: FIPLV/EUROCENTRES.

9. Ellis, R. (1985). Understanding second language acquisition. Oxford: Oxford University Press.

10. Faerch, C., & Kasper, G. (1986). The role of comprehension in second language learning. Applied Linguistics 7(3), 257-274.

11. Felder, R. M., & Henriques, E. R. (1995). Learning and teaching styles in foreign language education. Foreign Language Annals 28, 21-31.

12. Felix, U. (1995). Theater Interaktiv: multimedia integration of language and literature. On-CALL 9, 12-16.

13. Fidelman, C. (1994). In the French Body/In the German Body: Project results. Demonstrated at the CALICO '94 Annual Symposium "Human Factors." Northern Arizona University, Flagstaff, AZ.

14. Fidelman, C. G. (1997). Extending the language curriculum with enabling technologies: Nonverbal communication and interactive video. In K. A. Murphy-Judy (Ed.), NEXUS: The convergence of language teaching and research using technology, pp. 28-41. Durham, NC: CALICO.

15. Fish, H. (1981). Graded activities and authentic materials for listening comprehension. In The teaching of listening comprehension. ELT Documents Special: Papers presented at the Goethe Institut Colloquium Paris 1979, pp. 107-115. London: British Council.

16. Garrigues, M. (1991). Teaching and learning languages with interactive videodisc. In M. D. Bush, A. Slaton, M. Verano, & M. E. Slayden (Eds.), Interactive videodisc: The "Why" and the "How." (CALICO Monograph Series, Vol. 2, Spring, pp. 37-43.) Provo, UT: Brigham Young Press.

17. Gassin, J. (1992). Interkinesics and Interprosodics in Second Language Acquisition. Australian Review of Applied Linguistics 15(1), 95-106.

18. Hoven, D. (1997a). Instructional design for multimedia: Towards a learner-centred CELL (Computer-Enhanced Language Learning) model. In K. A. Murphy-Judy (Ed.), NEXUS: The convergence of language teaching and research using technology, pp. 98-111. Durham, NC: CALICO.

19. Hoven, D. (1997b). Improving the management of flow of control in computer-assisted listening comprehension tasks for second and foreign language learners. Unpublished doctoral dissertation, University of Queensland, Brisbane, Australia. Retrieved July 25, 1999 from the World Wide Web: http://jcs120.jcs.uq.edu.au/~dlh/thesis/.

20. Richards, J. C. (1983). Listening comprehension: Approach, design, procedure. TESOL Quarterly 17(2), 219-240.

