1988 — 1992 |
Price, Patti; Shattuck-Hufnagel, Stefanie (co-PI); Ostendorf, Mari |
N/A (no activity code was retrieved) |
Prosody Analysis/Synthesis Using Probabilistic Models and Linguistic Theory @ Trustees of Boston University
The objectives of this project are (1) to develop a computational model of prosody for speech synthesis and analysis, and (2) to evaluate this model in speech synthesis and as a potential knowledge source for speech recognition and understanding. The approach is multidisciplinary, combining linguistic theory, speech knowledge, and statistical modeling techniques. The initial model is based on FM radio newscasting speech. The research effort includes: design and collection of an appropriate database; hand labeling of prosodic units and measurement of their acoustic correlates in part of the database; development of techniques to automatically segment and label prosodic units; and implementation and evaluation of the model for analysis and synthesis.
|
0.943 |
1989 — 1993 |
Rohlicek, Jan; Ostendorf, Mari |
N/A |
Segment-Based Acoustic Models With Multi-Level Search Algorithms For Continuous Speech Recognition @ Trustees of Boston University
This award in the NSF/DARPA Joint Initiative on Image Understanding and Speech Recognition is for research on multilevel acoustic and phonetic models for large-vocabulary speaker-independent continuous speech recognition. Drs. Ostendorf and Rohlicek will use the hidden-Markov speech recognition system at BBN Systems and Technologies Corporation as a baseline, and will add multilevel phonetic segment modeling to improve performance. The grant includes support for several graduate and undergraduate students, as well as workstation hardware and software needed for speech recognition research.
|
0.943 |
1989 — 1996 |
Ostendorf, Mari; Shattuck-Hufnagel, Stefanie (co-PI) |
N/A |
Evaluating the Use of Prosodic Information in Speech Recognition and Understanding @ Trustees of Boston University
This award in the NSF/DARPA Joint Initiative on Image Understanding and Speech Recognition is for the study of prosody as a tool in speech recognition and understanding. Prosody includes such features as lexical and phrasal stress, intonation contours, and metrical phrase structure. Separating these processes from the syntactic elements of speech analysis should reduce word error rates and provide clues to intended meaning. Drs. Ostendorf, Price, and Shattuck-Hufnagel will begin with a study of radio-announcer speech (already begun under NSF Grant IRI-8805680), and will incorporate their findings into the spoken language system at SRI International.
|
0.943 |
1994 — 2000 |
Nawab, S. Hamid; Ostendorf, Mari |
N/A |
Segment-Based Acoustic Models For Continuous Speech Recognition @ Trustees of Boston University
9408896 Ostendorf. This is a standard award made as an extension to the research conducted under IRI-8902124 and is funded under ARPA's program competition for the Augmentation Awards for Science and Engineering Research Training. The proposed research investigates model-based approaches for the representation of channel noise and distortion via cepstrum and parametric transformations, involving one graduate and one undergraduate student following research topics stimulated by the previous award. Graduate student research under this award considers a segmental model approach to model-based channel compensation using parallel HMMs and maximum likelihood channel identification of both additive and convolutional noise. The undergraduate research topic involves the evaluation of signal processing approaches for noise and channel compensation as they compare with more classical approaches that use standard mel-warped cepstra and derivative features. Both of these projects complement the objective of the original award and augment it.
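As a rough illustration of why cepstral representations are convenient for channel compensation: a fixed convolutional channel multiplies each frequency bin by a constant gain, which becomes an additive constant in the log-spectral domain (and therefore in the cepstrum, a linear transform of the log spectrum). The sketch below uses plain mean normalization, a simple classical baseline and not the parallel-HMM or maximum likelihood method described in the abstract. All data values and function names are hypothetical.

```python
import math

def log_spectra(frames):
    """Convert frames of positive spectral magnitudes to log magnitudes."""
    return [[math.log(x) for x in frame] for frame in frames]

def mean_normalize(log_frames):
    """Subtract the per-bin mean over time from every frame."""
    n = len(log_frames)
    dims = len(log_frames[0])
    means = [sum(f[d] for f in log_frames) / n for d in range(dims)]
    return [[f[d] - means[d] for d in range(dims)] for f in log_frames]

# Hypothetical clean spectra: 3 frames x 2 frequency bins.
clean = [[1.0, 4.0], [2.0, 2.0], [4.0, 1.0]]
# Hypothetical fixed channel: one gain per frequency bin.
channel = [0.5, 3.0]
distorted = [[x * g for x, g in zip(frame, channel)] for frame in clean]

norm_clean = mean_normalize(log_spectra(clean))
norm_distorted = mean_normalize(log_spectra(distorted))

# After normalization the channel gain cancels, so both versions match.
max_diff = max(
    abs(a - b)
    for fa, fb in zip(norm_clean, norm_distorted)
    for a, b in zip(fa, fb)
)
print(max_diff)
```

Mean normalization only removes a time-invariant convolutional channel. Additive noise does not cancel this way, which is one reason model-based compensation is of interest.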
|
0.943 |
1996 — 2001 |
Ostendorf, Mari |
N/A |
Speech Generation For Human-Computer Interaction @ Trustees of Boston University
The goal of this project is to increase effectiveness of human-computer communication by improving the quality of automatically generated speech. Using the known linguistic structure that is a by-product of text generation, the research investigates both utterance-level and dialog-level control of prosody (i.e., phrasing, emphasis and intonation) in a commercial synthesizer. The approach -- development of statistical models of the mapping between meaning and prosodic structure -- is novel in its emphasis on automatic learning algorithms associated with the aim of portability to different task domains/generators. The research involves: collection and automatic prosodic labeling of "acted" speech in target task domains; use of generated syntactic, semantic and discourse annotation to drive prosodic control modules; investigation of the role of prosody in computer initiative, e.g., in clarification or error correction subdialogs; and development of evaluation protocols for assessing speech generation quality and its impact on human-computer interaction. The research will benefit existing spoken language interfaces by providing higher quality speech output and more flexibility for response generation, but it will also open up applications for human-computer interaction where a visual display is not available, e.g., in small devices, in telephone-based computer access or for persons who are visually impaired.
|
1 |
1998 — 1999 |
Ostendorf, Mari |
N/A |
Workshop For Discussing Research Priorities and Evaluation Strategies in Speech Synthesis @ Trustees of Boston University
The purpose of the workshop is to discuss and develop a plan for a research program to advance the state of the art in speech synthesis technology. A group of participants will include government, industry, and academic representatives, who have expertise in a number of areas directly or indirectly related to speech synthesis. The workshop program includes both plenary and breakout sessions, with presentations from many of the participants. The focus will be on identifying key technology during the first day and on developing new ideas for evaluation and research infrastructure needs during the second day. A report of the workshop findings will be presented to the sponsors and the public.
|
0.943 |
1998 — 2001 |
Nawab, S. Hamid; Castanon, David (co-PI); Karl, William (co-PI); Ostendorf, Mari |
N/A |
MRI: Acquisition of Computer Facilities to Support An Interdisciplinary Multidata Signal and Image Processing Laboratory @ Trustees of Boston University
EIA-9871159. Nawab, S. Hamid; Castanon, David A. Boston University. MRI: Acquisition of Computer Facilities to Support an Interdisciplinary Multidata Signal and Image Processing Laboratory. The requested instrumentation will serve to establish a dedicated interdisciplinary signal and image processing facility, serving a core of eight faculty members and their students in the Electrical and Computer Engineering Department at Boston University. The equipment is intended to develop a unified signal and image processing computing facility in the department. This facility will support student and faculty research projects, enabling new multi-modal signal and image processing projects through access to a variety of I/O devices and enhancing current and pending research projects by making cross-disciplinary work much easier with the existence of a common software environment and fast data storage and computing platforms. The proposed equipment consists of: 1) a computational engine with significant core memory to provide the necessary tools for tackling challenging computational problems; 2) a dedicated data server; 3) a number of satellite computers of varying computational power and type; and 4) a wide variety of input and output devices chosen to provide state-of-the-art capture, storage, manipulation, and presentation capabilities in major functional areas associated with signal and image processing research and education.
|
0.943 |
1999 — 2001 |
Ostendorf, Mari |
N/A |
STIMULATE: Modeling Structure in Speech Above the Segment For Spontaneous Speech Recognition @ University of Washington |
1 |
2000 — 2006 |
Ostendorf, Mari; Charniak, Eugene (co-PI); Picone, Joseph; Jelinek, Frederick (co-PI); Johnson, Mark |
N/A |
ITR: Information Access to Spoken Documents @ Mississippi State University
This is the first-year funding of a four-year continuing award. This project addresses issues relating to the construction of a system for answering questions about information contained in a collection of spoken documents. It focuses on the key scientific questions that arise in the integration of prosodic information, speech recognition and parsing in the retrieval of spoken documents, but will not involve implementation of a complete system. There are four key themes in the research: utilizing parsing in information retrieval; integrating prosodic information in parsing spoken language; incorporating uncertainty in parsing to handle speech recognition errors; and improvements to speech recognition of spontaneous speech. All components will share a probabilistic formulation, thereby affording a systematic framework for integrating the information they provide. A primary project goal is to better understand how information provided by one of these components might be effectively utilized to improve the performance of other components in the information retrieval task. Absent a corpus tailored to the information retrieval topics the PI and his team plan to study, progress will be evaluated using existing annotated text collections such as Switchboard and LDC's Broadcast News collections. The work will lead to advances in information extraction from telephone messages, conversations, university lectures, or from any text (such as encyclopedias), and should potentially serve as the basis for sorely needed sophisticated web browser technology and data mining applications, which in turn would enable people who currently under-utilize computers to become full participants in the information revolution.
|
0.93 |
2001 — 2006 |
Ostendorf, Mari; Morgan, Nelson; Stolcke, Andreas; Ellis, Daniel; Kirchhoff, Katrin (co-PI) |
N/A |
ITR/PE+SY: Mapping Meetings: Language Technology to Make Sense of Human Interaction @ International Computer Science Institute
Meetings are essential and ongoing processes in almost every enterprise. To record meetings is to provide a history of human interactions. However, two central challenges remain: (1) how to make sense of the group dynamics in those meetings and (2) how to search through a history of those interactions to find the information one may want. This research aims to develop automatic information processing systems based on the metaphor of a "meeting map", a structured representation that supports the presentation of multiple views of a meeting at different scales. The project will focus on two broad map categories: content maps, portraying topics discussed and decisions made; and interaction maps, identifying the roles and relationships of the participants and the level of concurrence. Building content and interaction maps will involve automatic classification of information from topic changes and salience to disagreement/consensus. These maps will be used for generating simple indicative summaries, and off-the-shelf visualization tools will be used for map presentation. The project will build on analyses of 100 hours of meetings. Evaluations will use objective recognition accuracies and expert assessments of automatic summaries. Meeting maps respect the diversity of information present in meeting scenarios, and provide effective support for human-to-human interactions.
|
0.913 |
2003 — 2008 |
Ostendorf, Mari; Knight, Kevin (co-PI); Marcu, Daniel; Bilmes, Jeffrey (co-PI); Kirchhoff, Katrin (co-PI) |
N/A |
ITR: Applying Translation Technology to Language Modeling @ University of Washington
Virtually all systems that produce text, from speech recognition to natural language generation, use a language model as a core component in order to rank word strings by their well-formedness and appropriateness for a given context. These models are difficult to develop both because of algorithmic challenges specific to integration of multiple knowledge sources and the lack of robust language processing tools. The goal of this project is to develop models via new techniques for exploiting the information available in parallel multilingual corpora, i.e., translations of the same source in multiple languages. Such corpora implicitly encode a hidden, common core that can be uncovered using state-of-the-art estimation techniques. The project involves: i) automatic learning of structure within and across languages at multiple levels of abstraction: semantics, morphology, phonology, and paraphrasing, and ii) integration of the results into novel language model frameworks to address the problem of limited domain- and language-specific training data. The hypothesis is that, by sharing data and structure across languages and genres within a language, the resulting models will be richer and more robust. Such ideas were impossible to envision until recently; availability of multilingual corpora and increases in computing power make them now feasible.
This project marries machine translation and speech recognition language modeling techniques, anticipating that the combination will lead to more powerful and general models. The research will facilitate rapid development of tools for less well studied languages and will immediately impact applications in mainstream languages ranging from information management to international collaboration to bilingual education. The results will also have implications for statistical modeling problems beyond language processing.
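To make concrete what "ranking word strings" with a language model means, here is a minimal sketch of a bigram model with add-one smoothing that scores two candidate transcriptions. It is a textbook baseline chosen for illustration, not the multilingual modeling approach proposed in the project. The toy corpus, candidates, and function names are hypothetical.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count bigram contexts and bigrams, with sentence boundary markers."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for s in sentences:
        words = s.split()
        vocab.update(words)
        tokens = ["<s>"] + words + ["</s>"]
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return contexts, bigrams, len(vocab)

def log_prob(sentence, model):
    """Add-one smoothed bigram log-probability of a word string."""
    contexts, bigrams, v = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + v))
        for a, b in zip(tokens[:-1], tokens[1:])
    )

# Hypothetical training text and competing hypotheses.
corpus = ["recognize speech", "recognize speech well", "wreck a nice beach"]
model = train_bigram(corpus)
candidates = ["recognize speech", "wreck a nice speech"]

# Higher log-probability means the model considers the string better formed.
ranked = sorted(candidates, key=lambda s: log_prob(s, model), reverse=True)
print(ranked)
```

Note that an unnormalized sentence probability favors shorter strings, so practical systems usually combine the language model score with other scores or length penalties.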
|
1 |
2005 — 2008 |
Ostendorf, Mari; Atlas, Les; Roy, Sumit; Riskin, Eve (co-PI); Klavins, Eric; Gupta, Maya |
N/A |
A Computing Lab For Integrated Teaching of Systems Courses in Electrical Engineering @ University of Washington
Electrical Engineering (55)
This proposal aims to improve the electrical engineering (EE) student experience in the systems area: signal processing, communications and control. The goals are to enhance student learning of theory and connections to practice, to increase interest in EE and in the systems area and foster diversity in the student body, and to expand student participation in cross-disciplinary projects. The general approach involves introducing collaborative lab experiences and team projects based on realistic applications, and including cross-disciplinary and remote collaboration. Since today's undergraduates are increasingly familiar with technologies such as digital music, photography, and video from everyday life, early and pervasive connection to the technology can aid in understanding of fundamental theoretical concepts. Thus, the program will embrace cutting-edge technology, both in terms of exposure of students to applications and in the use of this technology in teaching. The specific plans impact the undergraduate curriculum at three levels: development of a new freshman introductory EE course, revision of an existing EE core course in signals and systems, and expansion of senior capstone design opportunities. Necessarily, the course developments will involve improvements to laboratory equipment used in teaching, as well as development of new course materials and teaching strategies. The proposed curriculum developments build on specific course material developed elsewhere, as well as results in the literature on collaborative learning. The evaluation of the proposed work with respect to the learning objectives and diversity goals will include both formative and summative efforts, including standard course evaluation forms, focus groups, attitude surveys, analysis of students' responses to specific exam questions, and quantitative analysis of inter-student classroom interactions, being conducted with assistance from the college's center for learning and teaching.
Dissemination is being accomplished through the Connections website, a site that hosts course modules, and through presentations and papers at both education and technical conferences.
|
1 |
2006 — 2007 |
Ostendorf, Mari |
N/A |
U.S.-Germany Dissertation Enhancement: Predicting Hidden Structure and Punctuation in Speech For Machine Translation @ University of Washington
This project will send Dustin Hillard, a graduate student at the University of Washington, to RWTH Aachen University for three months to conduct research for his doctoral dissertation in the field of machine translation. Mr. Hillard will collaborate with Professor Hermann Ney in Aachen and he will also collaborate with Professor Alex Waibel at the University of Karlsruhe. Professor Mari Ostendorf, the PI on this award, is Mr. Hillard's thesis advisor at the University of Washington.
This research addresses two primary questions dealing with punctuation in machine translation: What information beyond the words can be given to a speech translation system to improve the translation quality in terms of evaluation of the translated words? How is the punctuation for the target language output best predicted?
The experimental results will provide comprehensive comparisons of how punctuation and hidden prosodic structure interact with machine translation. In addition, unsupervised learning algorithms will be investigated for detecting hidden prosodic structure in the source language using syntactic side information and for communicating that structure to translation and target language output.
Intellectual Merit: The research supported by this award is challenging and could potentially advance fundamental knowledge in machine translation. It represents an intellectually well-founded partnership between researchers in the U.S. and researchers in Germany.
Broader Impacts: Support is provided for one graduate student to conduct research in Germany, thus providing a U.S. student with an international research experience. In addition, advances in machine translation will help communication in a multi-lingual world and could have applications to the commercial and national security sectors.
|
1 |
2008 — 2017 |
O'Donnell, Matthew (co-PI); Burgstahler, Sheryl; Ostendorf, Mari; Lange, Sheila (co-PI) |
N/A |
AccessSTEM: The Northwest Alliance for Students with Disabilities in Science, Technology, Engineering, and Mathematics - Phase II (AccessSTEM2) @ University of Washington
The "AccessSTEM: The Northwest Alliance for Students with Disabilities in Science, Technology, Engineering, and Mathematics - Phase II (AccessSTEM2)" project will increase the associate, baccalaureate, and graduate science, technology, engineering and mathematics (STEM) degree attainment of individuals with disabilities in the Seattle, WA region. The primary institution, the University of Washington (UW), is partnering with Bellevue Community College (BCC), Seattle Central Community College (SCCC), and all high schools within the Seattle Public Schools system to accomplish this goal.
The AccessSTEM-Phase 2 Alliance will increase the associate, baccalaureate and graduate STEM degree attainment of students with disabilities by attending to the following four objectives:
1. Implement changes within awardee and partner postsecondary institutions (UW, BCC, SCCC) to make STEM programs more welcoming and accessible to students with disabilities (e.g., more accessible websites and science labs, STEM publications that encourage the participation of students with disabilities);
2. Create and expand engagement of stakeholders (precollege STEM educators, disability services, veteran associations, projects that broaden participation in STEM, and industry and career services) in fostering STEM education and careers that are welcoming and accessible to people with disabilities;
3. Implement evidence-based practices (e.g., mentoring, peer support, internships) to increase numbers of individuals with disabilities moving through critical junctures to STEM associate, baccalaureate, and graduate degrees and careers; and
4. Support and expand an online resource center that shares research and promising practices worldwide.
|
1 |
2009 — 2015 |
Ostendorf, Mari |
N/A |
RI: Small: Simplifying Text For Individual Reading Needs @ University of Washington
A surprisingly large number of Americans read below their grade level, either because of limited education or because their native language is not English. Low reading levels impact a child's progress in school and an adult's job opportunities as well as limiting information access. This project aims to improve access by developing new language processing technology for selecting and transforming text to obtain material at lower reading levels, extending current paraphrasing work that focuses on summarization as compression to include explanatory expansions. In addition, the goal is to develop adaptive models that can be tuned to a specific domain and an individual's needs. The approach involves analyzing corpora of comparable text collected from the web, developing models of paraphrasing aimed at generating simplified English, developing a discourse-sensitive clause selection method for expanding or omitting details, and exploring representations of language that facilitate domain and user adaptation. The language processing contributions of this work include development of text resources to support language technology in education applications, new representations of reading difficulty, and advances in automatic methods of paraphrasing. The broader impact of this project includes making information more accessible to people with limited English reading proficiency. In addition, students working on the project will have the opportunity to interact with teachers from a local school so as to better understand the impact of their work and guide their approach, and their work will be showcased in University of Washington diversity-oriented outreach programs.
|
1 |
2013 — 2016 |
Levow, Gina-Anne; Ostendorf, Mari; Wright, Richard |
N/A |
EAGER: ATAROS: Automatic Tagging and Recognition of Stance @ University of Washington
From activities as simple as scheduling a meeting to those as complex as balancing a national budget, people take stances in negotiations and decision making. While the related areas of subjectivity and sentiment analysis have received significant attention, work has focused almost exclusively on text, whereas much stance-taking activity is carried out verbally. Early experiments suggest that people alter their speaking style when engaged in stance-taking, and listeners can much more readily detect negative attitudes by listening to the original speech than by reading transcripts. However, due to the diversity of factors that influence speech production, from individual differences to social context, isolating the signals of stance-taking in speech for automatic recognition presents substantial challenges.
This Early Grant for Exploratory Research project represents a focused exploration of spoken interactions to provide a characterization of linguistic factors associated with stance-taking and develop computational methods that exploit these features to automatically detect stance-taking behavior. Robust linguistic markers of stance-taking are identified through analysis of both controlled elicitations and archived recordings of Congressional hearings on the financial crisis. The former allow experimental comparisons to highlight sometimes subtle contrasts, while the latter enable validation and extension of those findings in real-world, high-stakes discussions. The analysis includes novel acoustic-phonetic measures of dynamic patterns in speech, such as vowel space scaling and pitch/energy velocity, with sophisticated visualization techniques developed to support feature exploration. Findings are validated via stance recognition experiments combining acoustic and lexical cues, which lay the foundation for automatic tracking of trends and shifts in attitudes.
|
1 |
2016 — 2019 |
Ostendorf, Mari; Wright, Richard |
N/A |
RI: Small: Modeling Idiosyncrasies of Speech For Automatic Spoken Language Processing @ University of Washington
Spoken language encodes significant information in pitch and energy dynamics (prosody) and in disfluencies (self-edits) that human listeners use to understand a talker's meaning and the social/emotional context. Due to a lack of adequate models of these phenomena, current speech processing systems make little use of this information. This project tackles modeling limitations by focusing on unexpected speech phenomena, assuming that these events often carry the most valuable information, and by working with speech from a variety of social contexts. The work has applications that range from literacy assessment to improved human-computer interaction. Further, understanding the communicative role of different disfluencies in non-clinical speech will lead to more accurate clinical diagnoses. Educational aspects aim at broad exposure of the research methods to a diverse group of students at all academic levels through short courses, student TED talks, and work with a UW program for attracting and retaining low income students in STEM fields.
The goal of this project is to develop computational models that extract information from prosodic cues and disfluencies for use in a variety of spoken language processing applications. The approach leverages multiscale context in predictors of expected acoustic dynamics of speech in order to automatically identify regions of atypical timing or exaggeration. Specifically, it uses deep neural networks with parallel text and acoustic inputs to represent local dynamics in combination with point process models to characterize global rates of atypical events. Linguistic analyses and crowd-sourced perception studies are used to determine types of anomalies that are information bearing (vs. noise that should be ignored in language processing), leading to improved speech understanding models. Experiments make use of a variety of data sources to assess adaptation strategies and ensure generalizability of findings. Evaluation of computational models is in the context of multiple downstream applications in order to broadly explore potential contributions.
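The two-stage idea described above (a predictor of expected acoustic dynamics, plus a model of how often atypical events occur) can be sketched in a heavily simplified form. In this sketch the predicted durations are hard-coded stand-ins for the output of a neural network, a fixed residual threshold flags atypical segments, and the event rate is the maximum likelihood estimate for a homogeneous Poisson process (event count divided by total time). The numbers, threshold, and function names are all assumptions for illustration, not the project's actual models.

```python
def flag_atypical(observed, predicted, threshold):
    """Return indices where observed timing deviates strongly from expected."""
    return [
        i
        for i, (o, p) in enumerate(zip(observed, predicted))
        if abs(o - p) > threshold
    ]

def poisson_rate(num_events, total_duration):
    """Maximum likelihood rate (events per second) of a homogeneous Poisson process."""
    return num_events / total_duration

# Hypothetical syllable durations in seconds.
observed = [0.21, 0.18, 0.65, 0.20, 0.19, 0.58]
# Hypothetical expected durations, standing in for a learned predictor.
predicted = [0.20, 0.20, 0.22, 0.21, 0.20, 0.21]

flags = flag_atypical(observed, predicted, threshold=0.25)
rate = poisson_rate(len(flags), sum(observed))
print(flags, rate)
```

A per-speaker rate estimated this way is one possible global feature. Deciding which flagged events are information bearing rather than noise is the harder problem the abstract describes.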
|
1 |
2022 — 2025 |
Ostendorf, Mari |
N/A |
Collaborative Research: Improving Speech Technology For Better Learning Outcomes: The Case of AAE Child Speakers @ University of Washington
The lack of reading proficiency seen in children of underserved school districts has lasting impacts on students’ performances in various subjects. Low literacy is an especially pressing issue for African American students. Interactive spoken language systems offer the possibility of a powerful tool for assisting in early childhood education, freeing up teachers’ time, and engaging students in repeated opportunities for learning. These systems involve both Automatic Speech Recognition and Text-to-Speech Systems. The goal of this research is to improve the performance of such systems for young speakers of African American English (AAE) such that automated oral literacy assessment can be developed. The research has important societal and technological impacts. It will enhance the usability of speech technology in early education for AAE speaking children, providing a model for better supporting students with diverse dialects. Many under-resourced children do not have access to adequate reading and language assessments, and the proposed work will address these issues by creating methods for adapting spoken language technology to AAE children, increasing fairness in speech technology on a broader scale. The work has strong outreach and dissemination programs and will train undergraduate and graduate students in interdisciplinary research in Electrical and Computer Engineering, Linguistics, Education, and Psychology.
Challenges facing children’s Automatic Speech Recognition (ASR) are due to (1) a lack of child speech data, so that current recognition models are trained using data collected from adult speakers, and (2) the wider range of intra- and inter-speaker variability that children display compared with adults. ASR performance is especially poor for children who are non-native English speakers or those who at times transition into dialects such as AAE that are different from what ASR systems are typically trained on. In addition, most dialog systems built on text-to-speech (TTS) technology are designed using General American English (GAE) voices, which minority children may not identify with. In the high-stakes area of education, these considerations impact the effectiveness of technology for different groups. The work will utilize a new and continuously developing database of AAE children's speech to research the impact of spoken language systems on children’s learning outcomes. On the learning side, the research will highlight the impact of dialect on literacy assessment. On the technology side, the work will yield novel machine learning algorithms for low-resource tasks. Specifically, this project will develop data augmentation techniques that can increase the amount of training data available for low-resource tasks, and data normalization techniques so that ASR performance is improved for AAE child speakers. The work on TTS will explore new methods of disentangling speaker and dialect impacts on spectral realization of phrases that model dialect density (rather than treating dialect as a categorical variable) and separately accounting for pronunciation and prosodic factors. Methods found to be effective for TTS will be leveraged in the data augmentation work for ASR and explored as a diagnostic in literacy assessment.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
1 |