2002 — 2003 |
Joshi, Aravind (co-PI); Marcus, Mitchell |
Human Language Technology 2002: Special Focus On Language Modeling of Biological Data @ University of Pennsylvania
This award will support a special focus workshop at the Human Language Technology Conference in the area of Language Processing of Biological Data. The purpose of this special focus within the HLT 2002 context is to bring the research opportunities and recent research breakthroughs in this newly emerging area to the attention of a wide audience of researchers across all aspects of human language technology. This support is also intended to further promote cross-disciplinary approaches to the new field of bioinformatics.
|
0.915 |
2005 — 2006 |
Marcus, Mitchell; Kroch, Anthony; Kulick, Seth (co-PI) |
SGER: Enriching Parser Output For Treebank Construction @ University of Pennsylvania
The construction of treebanks for linguistic and natural language processing (NLP) research has become increasingly widespread over the past decade, beginning with the Penn Treebank and the Penn-Helsinki Parsed Corpora of Historical English and now extending to corpora of other languages, both modern and historical. The methods used to construct these treebanks are partially automated but require extensive manual correction, which slows production and introduces some inconsistency into the output. The present project arises from the urgent need for treebanking efforts to produce more accurate output more rapidly. With National Science Foundation support, Dr. Anthony Kroch, Dr. Seth Kulick and Dr. Mitch Marcus will improve the automated tools for corpus construction, applying recently developed techniques to enrich parser output while preserving bracketing accuracy. The primary goal is rapid deployment for treebanking, but the project also aims to improve the descriptive adequacy of NLP technology at a more fundamental level. These more fundamental improvements should have important implications for the power and practical utility of the technology in a range of applications beyond treebanking itself. The intellectual merit of this proposal is that it will extend the power of current methods of linguistic research; its broader impacts lie in the envisaged improvements to natural language technology for practical applications such as information retrieval and machine translation.
|
0.915 |
2005 — 2006 |
Badler, Norman (co-PI); Marcus, Mitchell |
SGER: Generating Animations of American Sign Language Classifier Predicates @ University of Pennsylvania
American Sign Language (ASL) is a full natural language, with a linguistic structure distinct from English, used as the primary means of communication by approximately one-half million deaf people in the United States. Furthermore, because they are unable to hear spoken English during the critical language-acquisition years of childhood, the majority of deaf high school graduates in the U.S. read English at only a fourth-grade level. Because of this low English literacy rate, and because English and ASL have such different linguistic structures, many deaf people in the United States could benefit from technology that translates English text into animations of ASL performed by a virtual human character on a computer screen. However, previous English-to-ASL machine translation projects have made only limited progress. Instead of producing animations of full ASL, these projects have handled only restricted subsets of the language, which allowed them to side-step many important linguistic and animation issues, in particular the ubiquitous ASL constructions called "classifier predicates" that are required to translate many English input sentences. Classifier predicates are an ASL phenomenon in which the signer uses the space around his or her body to position invisible objects representing entities or concepts under discussion; the signer's hands show the movement and location of these objects in space. Classifier predicates are the ASL phenomenon most unlike elements of spoken or written languages, and they are therefore difficult for machine translation software to handle. In this research, the PIs and their graduate students will build on prior research in ASL linguistics, machine translation and artificial intelligence, and 3D graphics simulation and human animation to design and implement a prototype software system capable of producing animations of classifier predicates from English text.
In doing so, they will address some of the most challenging issues in English-to-ASL translation, with the goal of producing a software design that can serve as a robust framework for future implementation of a complete English-to-ASL machine translation system. The prototype implementation will have sufficient visual quality and linguistic breadth to enable a pilot evaluation of the design and the quality of the output animations by deaf native ASL signers.
Broader Impacts: This research will lead to significant advances in the state of the art relating to English-to-ASL machine translation software, which will eventually allow development of new applications to provide improved access to information, media and services for the hundreds of thousands of deaf Americans who have low English literacy. Instead of displaying English text, devices like computers, closed-captioned televisions, or wireless pagers could show deaf users an animation of a virtual human character performing ASL. Novel educational reading software that promotes English literacy skills in deaf children could also be developed. The project will also expose the graduate students involved to research issues relating to ASL and animation, and will support a summer ASL language training program at Gallaudet University for these students.
|
0.915 |
2005 — 2010 |
Marcus, Mitchell |
Unsupervised Learning of Morphology @ University of Pennsylvania
This project is developing a new system that simultaneously discovers the patterns of word morphology and parts of speech for a wide range of the world's languages from unannotated text. Given a quantity of training text, such a system will yield a transducer that segments the words in new texts into stems and affixes and determines the part of speech of each word as a whole. Through unsupervised learning, an iterative bootstrapping procedure will combine several different linguistic knowledge sources to gradually build up a representation of the language in the form of paradigms. From these paradigms, symbolic part-of-speech rules and morphophonological rewrite rules will be extracted and then compiled into a probabilistic finite-state transducer that can label new texts with morphology and part of speech.
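The abstract does not say how paradigms are discovered. As a rough illustration only, the Python sketch below induces suffix "signatures" (the set of endings each candidate stem occurs with) from a word list and uses them to split new words into stem and suffix. This is a swapped-in toy heuristic, not the project's method: it assumes purely suffixing morphology and omits the iterative bootstrapping, part-of-speech induction, and probabilistic finite-state transducer described above. The function names `learn_paradigms` and `segment` and all thresholds are hypothetical.

```python
from collections import defaultdict

def learn_paradigms(words, max_suffix_len=4, min_stem_len=3, min_stems=2):
    """Group candidate stems by the set of suffixes they occur with (a "signature").

    Hypothetical toy heuristic: suffixing only, no bootstrapping, no POS induction.
    """
    stem_suffixes = defaultdict(set)  # candidate stem -> suffixes seen after it
    for w in set(words):
        for k in range(max_suffix_len + 1):
            if len(w) - k < min_stem_len:
                break
            # k == 0 yields the whole word as stem with the empty suffix ""
            stem_suffixes[w[:len(w) - k]].add(w[len(w) - k:])

    signatures = defaultdict(set)  # frozenset of suffixes -> stems sharing it
    for stem, sufs in stem_suffixes.items():
        if len(sufs) >= 2:  # a paradigm needs at least two forms
            signatures[frozenset(sufs)].add(stem)

    # Keep only signatures attested with several stems, filtering chance splits.
    return {sig: stems for sig, stems in signatures.items() if len(stems) >= min_stems}

def segment(word, paradigms, min_stem_len=3):
    """Split word into (stem, suffix) using the longest suffix licensed by a paradigm."""
    suffixes = set().union(*paradigms)  # iterating the dict yields its signature keys
    for suf in sorted(suffixes, key=len, reverse=True):
        if suf and word.endswith(suf) and len(word) - len(suf) >= min_stem_len:
            return word[:len(word) - len(suf)], suf
    return word, ""  # no licensed suffix: treat the whole word as a stem

if __name__ == "__main__":
    corpus = ["walk", "walks", "walked", "walking",
              "jump", "jumps", "jumped", "jumping",
              "talk", "talks", "talked"]
    paradigms = learn_paradigms(corpus)
    for sig, stems in paradigms.items():
        print(sorted(sig, key=len), sorted(stems))
    print(segment("talking", paradigms))
```

On this tiny corpus the sketch keeps a single signature, {"", "s", "ed", "ing"}, shared by `walk` and `jump`, and uses it to segment the unseen form "talking" as `talk` + `ing`. Requiring several stems per signature is a crude stand-in for the evidence combination a real bootstrapping procedure would perform.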
Despite the widespread application of machine learning techniques to natural language processing, developing morphological analyzers still involves much human effort. Although the morphology of English is relatively simple, automatic analysis of text or speech in the majority of the world's languages depends on the availability of appropriate morphological analyzers. Morphological analysis is also important for automatic information extraction in the biomedical domain, where the complex structure of technical terms must be analyzed, even in English. Such analyzers are useful in most natural language processing applications, including parsing, information retrieval, machine translation, text summarization, correct pronunciation in speech synthesis, language models in speech recognition, language generation, and named entity recognition.
|
0.915 |
2006 — 2007 |
Marcus, Mitchell |
Doctoral Consortium At Human Language Technology Conference - North American Chapter of the Association For Computational Linguistics Annual Meeting (HLT-NAACL) 2006 @ University of Pennsylvania
For the first time, a Doctoral Consortium is being held at the Human Language Technology Conference - North American Chapter of the Association for Computational Linguistics annual meeting (HLT-NAACL) 2006 in New York City on June 4, 2006. The event is designed for Ph.D. students in the last few years of their doctoral programs (who have settled on a research direction and have typically submitted a thesis proposal). Participants will submit an application containing an overview of their research and will be selected through a review process involving both students and established researchers in the area. The participating students will reflect the variety of research areas of the HLT/NAACL conference (natural language processing, speech processing, and information retrieval) and will include students from groups underrepresented in science and engineering.
Broader Impacts:
The event will give a group of senior Ph.D. students an opportunity to discuss their research and career objectives with a panel of established researchers. It is also an opportunity for students to gain exposure for their work in the HLT/NAACL research community, especially among professionals outside their thesis committees. Students will also attend professional development sessions and present their work to a broader audience at a poster session held during the main HLT/NAACL conference. The event has also been organized and run primarily by students, giving all the students involved invaluable opportunities for professional growth and for interaction with senior researchers on the organizing committee of the main conference.
|
0.915 |