2014 — 2015 |
Bergen, Leon; Gibson, Edward |
N/A |
Doctoral Dissertation: Investigating the Role of Grammatical Representation in Language Learnability @ Massachusetts Institute of Technology
Technologies that process natural language have become ubiquitous in the last decade. Web search engines, for example, process billions of pages of text in order to determine which of those pages best match a user's search query. Many interfaces for interacting with computers -- for example, Apple's Siri personal assistant -- take voice-issued commands from their users, and must process these commands in order to follow the users' instructions. Finally, machine translation technologies have become available for many of the world's most common languages, allowing users to automatically translate text that they find in foreign books or websites. These technologies mostly rely on simple models of language, known as n-gram models or context-free grammars, which were developed in the 1950s and 1960s and refined in later decades. These simple models of language have many advantages, most notably that they can be used to process large amounts of data very quickly. Because of their simplicity, however, these models are not able to capture many aspects of meaning in natural language. This has resulted in limitations for the technologies discussed above: virtual personal assistants are only able to process very simple types of instructions, and machine translation is still far from being as accurate as human translation. In the current project, Leon Bergen and Dr. Edward Gibson will be investigating more sophisticated kinds of language models, with the goal of increasing the ability of computers to understand language.
Under the direction of Dr. Gibson, Mr. Bergen will be studying language models known as mildly context-sensitive grammars. These grammars are able to express certain types of linguistic knowledge that humans have, but which cannot be expressed using simpler types of grammatical formalisms. For example, native speakers of English know that a declarative sentence like "Mary kicked the ball" is closely related in meaning to the question "What did Mary kick?" Although this fact seems obvious, it is difficult (or impossible) to express using simple types of grammars. However, mildly context-sensitive grammars can be used to express this knowledge in a very natural way. Mr. Bergen and Dr. Gibson will be studying whether mildly context-sensitive grammars can be automatically learned from examples of grammatical sentences. To do this, they will be using techniques from machine learning, a branch of computer science and statistics that develops algorithms that can automatically learn from data. The researchers will integrate these learning algorithms with their grammatical formalism, and will test whether their method learns an accurate grammar. The accuracy of the grammar will be evaluated using a corpus -- a collection of sentences -- in which every sentence has been manually annotated with its correct grammatical structure. If accurate mildly context-sensitive grammars can be learned in this manner, this would provide a potential method for improving the natural language processing technologies discussed above. In particular, because this method does not require an expert to write down the complete grammar for a language, it has the potential to be deployed without tremendous engineering effort, and may be extended easily to other languages.
|
0.913 |
2020 — 2024 |
Bergen, Leon; Polikarpova, Nadia; Bakovic, Eric |
N/A |
SyPhon: A Framework for Automated Phonological Reasoning @ University of California-San Diego
This project develops new tools for the study of phonology, the sound patterns in human languages. For example, phonology is concerned with explaining why the past tense suffix of different English verbs is pronounced differently: "begged" is pronounced [begd], while "zipped" is pronounced [zipt]. The explanation in this case is a phonological process that turns the past tense suffix /d/ into its voiceless counterpart [t] in [zipt] because it occurs after a voiceless consonant /p/. Phonological inference is the problem of discovering a formal description of a phonological process that explains given data (e.g. examples of English verb form pronunciations). Inference is an error-prone and time-consuming task for a phonologist, especially given that inference results depend on the formalism used to describe processes, and there is no single formalism universally accepted in the community. On the contrary, phonologists continuously propose new and refine existing formalisms in order to explain more and more observed language data.
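The past tense example above can be written out as a tiny rule. The following is only an illustrative sketch, not SyPhon code: the `VOICELESS` inventory and the function name `past_tense_surface` are assumptions made for this example, and plain letters stand in for phonetic segments.

```python
# Illustrative sketch only; names and the segment inventory are hypothetical.
VOICELESS = set("ptkfs")  # simplified set of voiceless consonants


def past_tense_surface(stem: str) -> str:
    """Attach the underlying suffix /d/, devoicing it to [t]
    when the stem ends in a voiceless consonant."""
    suffix = "t" if stem[-1] in VOICELESS else "d"
    return stem + suffix


print(past_tense_surface("beg"))
print(past_tense_surface("zip"))
```

A phonologist performing inference works in the opposite direction: given pairs such as "beg"/[begd] and "zip"/[zipt], the task is to discover a rule of this kind.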
The goal of this project is to build a software framework, SyPhon, that automates the process of phonological inference. SyPhon takes as input datasets that illustrate phonological processes, as well as a specification of the formal language for describing processes. The framework produces as output the optimal explanation (according to some cost function) of the given data in a given formal language. SyPhon enables phonologists to rapidly explore different theories, by varying the formal language and the cost function, and observing the inference results on a dataset. The core technical challenge of this project is the extreme computational cost of phonological inference, which requires searching a large space of possible formal descriptions. To make such inference feasible, the investigators leverage state-of-the-art techniques from an area of computer science called program synthesis; these techniques allow SyPhon to reduce the search problem to a constrained optimization problem that is efficiently solvable in practice.
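To make the input/output contract concrete, here is a toy sketch of inference as a search for the cheapest rule consistent with a dataset. It is an assumption-laden illustration, not the project's method: it uses brute-force enumeration rather than program synthesis or constrained optimization, and the rule format, the `CLASSES` table, the `DATA` pairs, and the cost function are all invented for this example.

```python
from itertools import combinations

# Hypothetical "formal language": rules of the form d -> t / C _ at word end,
# where the context C is a named segment class or an explicit segment list.
CLASSES = {
    "voiceless": frozenset("ptkfs"),
    "stop": frozenset("pbtdkg"),
    "labial": frozenset("pbfm"),
}

# Hypothetical dataset of (underlying form, observed surface form) pairs.
DATA = [("begd", "begd"), ("zipd", "zipt"), ("kisd", "kist"), ("rubd", "rubd")]


def apply_rule(context, underlying):
    """Rewrite final /d/ as [t] when the preceding segment is in `context`."""
    if len(underlying) > 1 and underlying.endswith("d") and underlying[-2] in context:
        return underlying[:-1] + "t"
    return underlying


def candidates():
    """Yield (cost, description, context) for every rule in the toy language."""
    for name, segments in CLASSES.items():
        yield 1, name, segments  # a named class costs 1
    seen = sorted({u[-2] for u, _ in DATA})
    for size in (1, 2):
        for combo in combinations(seen, size):
            # an explicit list costs more the longer it is
            yield 1 + size, "{" + ",".join(combo) + "}", frozenset(combo)


def infer():
    """Return the (cost, description) of the cheapest rule that explains DATA."""
    consistent = [
        (cost, name)
        for cost, name, context in candidates()
        if all(apply_rule(context, u) == s for u, s in DATA)
    ]
    return min(consistent) if consistent else None


best = infer()
print(best)
```

In this toy setting both the class "voiceless" and the explicit list {p,s} explain the data, and the cost function prefers the more general description. Changing `CLASSES` or the costs changes the result, which mirrors how varying the formal language and cost function lets a phonologist compare theories.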
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
1 |