2012 — 2014
Van Durme, Benjamin; Callison-Burch, Chris (co-PI)
EAGER: Combining Natural Language Inference and Data-Driven Paraphrasing @ Johns Hopkins University
Natural language inference (NLI) and data-driven paraphrasing share related goals: detecting the semantic relationship between two natural language expressions, and rewording an input text so that the result preserves the original meaning in different words. On the one hand, work on recognizing textual entailment (RTE) within NLI has attempted to formalize the process of determining whether a natural language hypothesis is entailed by a natural language premise, an approach sometimes called "natural logic". Research in data-driven paraphrasing, on the other hand, attempts to extract paraphrases at a variety of levels of granularity, including lexical paraphrases (simple synonyms), phrasal paraphrases, phrasal templates (or "inference rules"), and sentential paraphrases, for downstream applications such as question answering, information extraction, text generation, and summarization.
This EAGER award explores bridging that gap through an analysis of sentential paraphrasing via synchronous context-free grammars (SCFGs), and of how such grammars may be coupled with formal constraints akin to recent phrase-based formulations of natural logic for RTE. Data-driven paraphrasing has largely neglected semantic formalisms, while NLI has relied heavily on hand-crafted resources like WordNet. If successful, this project will lead toward NLI systems that are more robust and paraphrasing systems that are better formalized; taken together, these improvements will allow better RTE systems to be developed. Moreover, the project has the potential to impact widely used human language technologies such as web search and natural language interfaces to mobile devices, and to further the connection between computational semantics and formal linguistics.
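To make the SCFG machinery concrete, the following is a minimal sketch of synchronous derivation: each grammar rule rewrites a nonterminal into an aligned source/target pair, so expanding both sides in lockstep yields a sentence and its paraphrase. The toy grammar, sentences, and rule format are illustrative assumptions, not rules from the project's actual system (data-driven grammars are extracted from parallel text at far larger scale, with many scored alternatives per nonterminal).

```python
# A minimal sketch of sentential paraphrasing with a synchronous
# context-free grammar (SCFG). Each rule pairs a source right-hand
# side with a target right-hand side; nonterminals (uppercase) are
# aligned by identity here for brevity. The grammar is a toy example.

# Rules: nonterminal -> list of (source_rhs, target_rhs) pairs.
SCFG = {
    "S":  [(["NP", "VP"], ["NP", "VP"])],
    "NP": [(["the", "committee"], ["the", "panel"])],
    "VP": [(["approved", "the", "plan"],
            ["gave", "the", "plan", "its", "approval"])],
}

def derive(symbol):
    """Synchronously expand one nonterminal into (source, target) tokens."""
    src_rhs, tgt_rhs = SCFG[symbol][0]  # first rule only; real systems score many
    # Expand each aligned nonterminal once, so the source and target
    # sides share the same sub-derivation.
    expansions = {s: derive(s) for s in src_rhs if s in SCFG}
    src = [tok for s in src_rhs for tok in (expansions[s][0] if s in SCFG else [s])]
    tgt = [tok for t in tgt_rhs for tok in (expansions[t][1] if t in SCFG else [t])]
    return src, tgt

src, tgt = derive("S")
print(" ".join(src))  # the committee approved the plan
print(" ".join(tgt))  # the panel gave the plan its approval
```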
2013 — 2018
Van Durme, Benjamin; Rawlins, Kyle (co-PI); Smolensky, Paul; Legendre, Geraldine (co-PI); Omaki, Akira (co-PI)
INSPIRE Track 1: Gradient Symbolic Computation @ Johns Hopkins University
This INSPIRE award is partially funded by the Linguistics Program and the Perception, Action & Cognition Program in the Division of Behavioral and Cognitive Sciences in the Directorate for Social, Behavioral & Economic Sciences; by the Robust Intelligence Program in the Division of Information & Intelligent Systems in the Directorate for Computer & Information Science & Engineering; and by the Algorithmic Foundations Program in the Division of Computing and Communication Foundations in the Directorate for Computer & Information Science & Engineering.
Discrete, combinatorial systems of structured symbols permeate human cognition in domains such as language, motor control, complex action planning, learning, and higher-level vision. Nonetheless, the computational apparatus that the brain exploits is based on continuous, activation-based propagation of information through complex networks of neurons. A fundamental problem of the cognitive sciences is how to integrate gradient, continuous neural computation with the discrete combinatorial dimension of cognition. The solution to this puzzle will provide a deeper understanding of the mind and may also serve as the basis of a new generation of computing systems capable of authentically brain-like behavior.
Under the direction of Dr. Smolensky, the research team will develop an approach to this puzzle by exploring and testing the predictions of their theory of Gradient Symbolic Computation (GSC) in the domain of language. Their efforts will include the development of the formal, mathematical foundations of GSC. In parallel, the PIs will develop a framework for modeling Gradient Symbolic Processing. To that end, the PIs will use computational modeling and experimental psycholinguistic studies of phenomena that typify the morpho-phonological, syntactic, and semantic characteristics of language and language processing.
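Since GSC builds on Smolensky's tensor product representations, a small numeric sketch can make the "gradient symbol" idea concrete: symbols (fillers) are bound to structural positions (roles) by outer products, and a gradient structure holds a weighted blend of fillers in a role rather than a single discrete symbol. The vector dimensions and blend weights below are illustrative assumptions, not values from the project.

```python
# A minimal sketch of a gradient symbolic structure using tensor
# product representations (TPRs). Dimensions, random vectors, and the
# 0.7/0.3 blend are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)

# Filler vectors for two symbols and role vectors for two positions.
fillers = {"A": rng.standard_normal(8), "B": rng.standard_normal(8)}
roles = {"pos1": rng.standard_normal(4), "pos2": rng.standard_normal(4)}

def bind(filler, role):
    """Bind a filler to a role via the outer (tensor) product."""
    return np.outer(filler, role)

# Discrete structure: symbol A in position 1, symbol B in position 2.
discrete = bind(fillers["A"], roles["pos1"]) + bind(fillers["B"], roles["pos2"])

# Gradient structure: position 1 holds a continuous 0.7/0.3 blend of
# A and B -- a partially active mixture rather than a discrete choice.
blend = 0.7 * fillers["A"] + 0.3 * fillers["B"]
gradient = bind(blend, roles["pos1"]) + bind(fillers["B"], roles["pos2"])

# Unbinding with dual role vectors recovers each position's (possibly
# blended) filler, showing the representation remains compositional.
R = np.stack([roles["pos1"], roles["pos2"]], axis=1)  # columns are roles
U = np.linalg.pinv(R)                                 # rows are dual vectors
recovered = gradient @ U[0]
print(np.allclose(recovered, blend))  # True
```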
The broader impacts of the work include the potential to transform general computing for future approaches to computer design, to provide innovations in computer language processing, and to empower major advances in our understanding of human language, its impairment in disease, and its learning and remediation. The project also strongly engages STEM education. Undergraduate, graduate, and post-doctoral researchers will all play key roles in highly interdisciplinary STEM research integrating experimental, theoretical, and computational methods. The new type of computation created will provide an integrative framework for developing courses bridging computation theory, psychology, and linguistics. Pedagogical materials developed in these courses will be made publicly available to facilitate undergraduate and graduate program development at other institutions.
2014 — 2017
Szalay, Alexander (co-PI); Burns, Randal; Budavari, Tamas; Braverman, Vladimir; Van Durme, Benjamin
BIGDATA: F: DKA: Collaborative Research: Clustering Algorithms for Data Streams @ Johns Hopkins University
This project will develop novel theoretical methods and algorithms for clustering massive datasets with applications to astronomy, neuroscience and natural language processing. Clustering is the process of creating groups of data based on similarities between individual data points. The developed theoretical methods will be used in applications where clustering algorithms are critical and the input data is extremely large. First, new clustering algorithms will be designed to scale and will allow for better cosmological simulations. The simulations involve billions of particles in each snapshot, and existing clustering algorithms based upon a simple friends-of-friends approach do not scale to these cardinalities. Second, this project will advance the computational capabilities in statistical neuroscience by employing clustering algorithms to discover both regular patterns and anomalies in normal and abnormal brain graphs. Finally, this research will explore the important topic of finding anomalies in massive text streams, such as Twitter. In this setting, one is concerned with detecting anomalous bursts in traffic content that share a similar pattern. These bursts might signal an important political event or a natural disaster. This project will support undergraduate and graduate research aimed at developing skills needed for algorithmic work on massive data sets.
There exist numerous heuristics and approximation algorithms for many variants of the clustering problem. However, these methods are often slow or infeasible for applications with massive datasets. This research will improve space and time upper bounds for clustering algorithms in the streaming model. This project will address the k-means and k-median problems in the dynamic streaming model, extend the results on separable data when the input comes from Euclidean space, improve the bounds in the sliding-window model, and combine the coresets technique with novel sampling approaches and the method of smooth histograms. The PIs' previous work has already been applied to natural language processing, and this project will expand this direction further and explore the important topic of "First Story Detection." Furthermore, this research will explore the similarities and differences between various sampling and sketching techniques, and how they could be used in large multidimensional astronomical databases, like the SDSS (Sloan Digital Sky Survey) SkyServer. These novel approaches will provide major speedups for the execution of large statistical aggregate queries. The new streaming algorithms will be used to find substructure in very large cosmological N-body simulations.
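To make the streaming constraint concrete, here is a minimal sketch of one-pass clustering via a simple online (sequential) k-means update: each point is seen once, assigned to its nearest center, and absorbed into that center's running mean. This only illustrates the model of computation; the project's actual algorithms (coresets, smooth histograms, sliding windows) are substantially more sophisticated, and all names and parameters below are assumptions.

```python
# A minimal sketch of clustering in the streaming model via online
# (sequential) k-means: one pass, O(k*dim) memory, no stored points.
import numpy as np

def streaming_kmeans(stream, k, dim, rng=None):
    """One-pass k-means over an iterable of length-dim points."""
    rng = rng or np.random.default_rng(0)
    centers = rng.standard_normal((k, dim))
    counts = np.zeros(k)  # points absorbed by each center
    for x in stream:
        j = np.argmin(np.linalg.norm(centers - x, axis=1))  # nearest center
        counts[j] += 1
        centers[j] += (x - centers[j]) / counts[j]  # running-mean update
    return centers

# Usage: cluster a stream of 10,000 points in R^2 without storing them.
rng = np.random.default_rng(1)
stream = (rng.standard_normal(2) + rng.choice([-5.0, 5.0]) for _ in range(10_000))
print(streaming_kmeans(stream, k=2, dim=2))
```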
For further information see the project web site at: http://www.cs.jhu.edu/~vova
2022 — 2025
Van Durme, Benjamin |
Computational Statutory Reasoning @ Johns Hopkins University
Tax law is a huge, complex body of text, paralleling how the huge, complex U.S. economy is taxed. All three branches of government continually add new text: Congress adds to the Tax Code, the IRS issues interpretations, and courts write decisions in tax cases. It is challenging, if not impossible, for any single human to be aware of all of tax law. This can lead to entirely sensible tax-law authorities interacting in ways unforeseen by their authors, enabling tax-avoidance strategies used by individuals and corporations with clever tax advisors. Such strategies cost the government billions of dollars and feed public perceptions of tax unfairness. Developing artificial intelligence (AI) that can automatically understand and reason with tax-law text would have two benefits. First, tax-avoidance strategies possible with existing tax law could be identified and shut down. Second, creators of new tax-law text (congressional staffers, IRS attorneys, and judges writing opinions in tax cases) could verify that they were not inadvertently enabling new tax-avoidance strategies.

The aim of this project is to develop tools to automatically understand and reason with tax-law documents, including tax statutes and case law. The main research questions are how to reason about which statutes apply to a given case, how new statutes potentially impact previously decided cases, and how to automatically determine whether one case constitutes precedent for another. First, this project will build benchmark datasets to measure progress on these research goals, relying on existing expertise in dataset curation and on open legal data. Second, recent progress on converting textual data into structures that support automated reasoning must be extended to the legal domain; this will require innovations in mapping language (statutes) into machine-interpretable rules, as opposed to merely extracting data from text. Third, this project will develop legal-domain ontologies, schemas, and information extraction models to analyze US case law. Progress on analyzing statutes and cases will involve extending capabilities in areas such as semantic parsing, entity typing, coreference, annotation science, schema induction and inference, AI system engineering, textual inference, and domain-specialized language model pre-training. The effort will lead to new ways of thinking about the creation and use of legal language, with advances in natural language processing and automated reasoning, especially in the area of few-shot learning.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
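To illustrate what "mapping statutes into machine-interpretable rules" might look like at the smallest scale, here is a sketch of a toy filing-requirement provision encoded as an executable predicate over structured case facts. The rule logic, thresholds, and field names are invented for illustration and do not reproduce any actual provision of the Tax Code or any system from the project.

```python
# A minimal sketch of a statute encoded as an executable rule. The
# thresholds, statuses, and fields are hypothetical, heavily
# simplified illustrations of the rule-mapping idea.
from dataclasses import dataclass

@dataclass
class TaxCase:
    filing_status: str   # e.g. "single", "married_joint"
    gross_income: float  # annual gross income in dollars

# Illustrative thresholds keyed by filing status (invented values).
FILING_THRESHOLDS = {"single": 12_950.0, "married_joint": 25_900.0}

def must_file_return(case: TaxCase) -> bool:
    """Toy rule: a return is required if gross income meets or exceeds
    the threshold for the taxpayer's filing status."""
    return case.gross_income >= FILING_THRESHOLDS[case.filing_status]

# Applying the rule to a concrete case, as a statutory-reasoning
# system would after parsing the statute and extracting case facts.
print(must_file_return(TaxCase("single", 30_000.0)))  # True
```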