2000 — 2003
Kerr, Douglas Caffrey, Martin [⬀] Parthasarathy, Srinivasan
A Web-Based Relational Database For Thermodynamic and Structure Data On Lipids @ Ohio State University Research Foundation
Project Abstract (DBI 9981990; PI: Caffrey, Ohio State University)
Selectively permeable membranes are an essential feature of living cells. The membrane is a bimolecular lipid leaflet in and on which are situated proteins and other molecules. Lipid mesomorphism, or liquid crystallinity, allows for the positioning of the membrane on the verge of bilayer stability, thereby modulating membrane function. Interest in the mesomorphic properties of biological and synthetic lipids within the biochemistry and biophysics communities has grown enormously in the past three decades. As a result, there exists a wealth of information on lipid phase behavior, much of which is scattered throughout the literature. The objective of this proposal is to provide ready web access to these data, to lipid molecular structures and to the appropriate literature in a continuously updated relational database, together with a means for data submission and analysis over the web. The data come in two forms. The first comprises phase transition types, temperatures and enthalpies for lipids in different hydration states. The second concerns lipid miscibility, represented graphically as isobaric and isothermal phase diagrams. The project is cross-disciplinary in that it combines the efforts of a biochemist working on lipids and membrane structural biology and a computer scientist with database expertise.
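The two data forms suggest a natural relational layout. A minimal sketch in Python/sqlite3 of what such a schema might look like (table names, columns and sample values are illustrative assumptions, not the actual LIPIDAT schema):

```python
import sqlite3

# An illustrative relational layout only; the actual LIPIDAT schema is not
# reproduced here, and the sample values are placeholders, not curated data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lipid (
    lipid_id  INTEGER PRIMARY KEY,
    name      TEXT NOT NULL            -- e.g. a phosphatidylcholine
);
CREATE TABLE phase_transition (
    lipid_id        INTEGER REFERENCES lipid(lipid_id),
    transition_type TEXT,              -- e.g. gel to liquid-crystalline
    hydration_pct   REAL,              -- water content (wt %)
    temperature_c   REAL,              -- transition temperature (deg C)
    enthalpy_kj_mol REAL,              -- transition enthalpy (kJ/mol)
    citation        TEXT               -- pointer into the literature
);
""")
conn.execute("INSERT INTO lipid VALUES (1, 'DPPC')")
conn.execute("INSERT INTO phase_transition VALUES "
             "(1, 'gel -> liquid crystalline', 50.0, 41.4, 36.0, 'placeholder ref')")
# Query: all recorded transitions for one lipid
for row in conn.execute(
    "SELECT transition_type, temperature_c, enthalpy_kj_mol "
    "FROM phase_transition WHERE lipid_id = 1"
):
    print(row)
```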
URL: http://www.lipidat.chemistry.ohio-state.edu
2003 — 2006
Machiraju, Raghu [⬀] Agrawal, Gagan (co-PI) [⬀] Parthasarathy, Srinivasan
Software: Framework For Mining Large and Complex Scientific Datasets @ Ohio State University Research Foundation
Numerical simulations are replacing traditional experiments in gaining insights into complex physical phenomena. Given recent advances in computer hardware and numerical methods, it is now possible to simulate physical phenomena at very fine temporal and spatial resolutions. As a result, the amount of data generated is overwhelming.
Scientists are interested in analyzing and visualizing the data produced by such simulations to better understand the process that is being simulated. Analyzing such large scale data is hard. Not only are the methods used computationally expensive, but current programming tools also make the analysis difficult to specify and modify. Thus, there is a dire need for a systematic approach, along with supporting algorithms and methodologies for flexible parallel implementations, to achieve scalable and interactive analysis of large scientific datasets.
In this project, we propose the construction of such a scalable toolkit, namely the Computational Analysis Toolkit (CAT). This toolkit will exploit ongoing work in feature analysis, scalable data mining and parallel programming environments. The crux of the approach is feature mining: a process whereby regions are delineated through various stages of detection, verification, de-noising, and tracking of points of interest. Additionally, we propose the use of some key data mining algorithms for achieving enhanced and robust implementations of feature-mining algorithms.
It is our objective that the CAT toolkit should not only allow for the detection of features but also provide a means to control the analysis in an interactive setting. For example, demographic and lifetime analysis of certain critical features, as determined by the user/scientist, may be an important way of understanding the underlying process being simulated. These critical features, once tagged via a suitable interface, can be profiled, and a concise representation of this profile can then be presented to the user as needed.
We believe that for long-term use of a tool for feature and data mining, it is important that a) the algorithms are parallelized on a variety of platforms, b) the parallel implementations are easy to maintain and modify, and c) APIs are available for users to rapidly create scalable implementations of new mining algorithms. We are proposing to achieve these goals by using and extending a parallelization framework developed locally. This framework, referred to as FRamework for Rapid Implementations of Datamining Engines (FREERIDE), offers high-level APIs and runtime techniques to enable parallelization of algorithms for data mining and related tasks. It allows parallelization on both distributed memory and shared memory configurations, and further supports efficient processing of disk-resident datasets.
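The generalized-reduction style exposed by FREERIDE-like middleware can be sketched in a few lines. A minimal Python sketch, assuming a simple local-reduction plus global-combine API (the function names are illustrative, not the actual FREERIDE interface; frequency counting stands in for a mining kernel):

```python
from multiprocessing import Pool

# Illustrative generalized-reduction sketch; not the actual FREERIDE API.

def local_reduction(chunk):
    """Each worker folds its data partition into a local reduction object."""
    counts = {}
    for item in chunk:
        counts[item] = counts.get(item, 0) + 1
    return counts

def global_combine(partials):
    """Merge per-worker reduction objects into the final result."""
    merged = {}
    for part in partials:
        for key, val in part.items():
            merged[key] = merged.get(key, 0) + val
    return merged

if __name__ == "__main__":
    data = [1, 2, 2, 3, 3, 3, 4] * 1000
    chunks = [data[i::4] for i in range(4)]      # simulate 4 data partitions
    with Pool(4) as pool:
        partials = pool.map(local_reduction, chunks)
    print(global_combine(partials))              # {1: 1000, 2: 2000, ...}
```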
Besides providing a useful toolkit, the proposal is likely to engender the use of methodologies for large data exploration. Our efforts are likely to contribute to the literature in scalable data and feature mining algorithms, and feature profile summarization.
2003 — 2008
Machiraju, Raghu [⬀] Wilkins, John Parthasarathy, Srinivasan
Itr/Ngs: a Framework For Discovery, Exploration and Analysis of Evolutionary Simulation Data (Deas) @ Ohio State University Research Foundation
In science the challenge is always finding a signal in the noise. Examples include hurricane forecasting and monitoring both intelligence and seismic activity. Our proposal addresses these issues through a broad framework we call generalized feature mining. The framework has two major components: feature mining, and shape-based data mining and analysis. At its core, feature mining detects features for a specific application domain. Each instance involves a specific extended shape description tailored to it. For evolutionary simulations, feature mining also tracks features across multiple temporal scales. Shape-based data mining and analysis learn from the process. The aim is to correlate information from the extended shape descriptors with transient detection to find or refine spatio-temporal rules for the evolution of features. Environmental influences, such as walls, must be built into the rules so they are predictive. To close the loop, the detected features can be displayed as they are found or refined. The evolutionary rules predicted by our framework can lead to new science: not only understanding the underlying phenomena but also leading to computationally simpler models that encapsulate the essentials.
2004 — 2009
Parthasarathy, Srinivasan
Career: a Scalable Framework For Mining Scientific and Biomedical Data @ Ohio State University Research Foundation
This project involves the development of a scalable framework for mining large, dynamic, biomedical and scientific datasets. This project has two research goals. The first research goal involves the development of novel methods to accurately model the relevant spatial or structural relationships embedded in such data, in particular the use of graph-based methods to model structure and geometric orthogonal polynomials to model shape. The second research goal involves the development of parallel and incremental algorithms, in conjunction with novel cluster file system support, to effectively and efficiently mine such data. A key feature of this work is the use of real-life large scale datasets as testbeds, specifically, data produced by molecular dynamics simulations to study the evolution of defects in materials, bio-molecular structure data to study structure-activity relationships, and clinical eye disease data to study the onset and progression of Keratoconus and Glaucoma disease patterns.
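As a rough illustration of orthogonal-polynomial shape descriptors, the sketch below computes low-order Legendre moments of a binary image; the order, normalization and toy shape are assumptions for illustration, not the project's exact formulation:

```python
import numpy as np
from numpy.polynomial import legendre

# Illustrative Legendre-moment shape descriptor over a binary image sampled
# on [-1, 1]^2; order and normalization are arbitrary choices here.

def legendre_moments(img, order=4):
    ny, nx = img.shape
    x = np.linspace(-1, 1, nx)
    y = np.linspace(-1, 1, ny)
    moments = np.zeros((order + 1, order + 1))
    for m in range(order + 1):
        Pm = legendre.legval(x, [0] * m + [1])      # P_m evaluated on x grid
        for n in range(order + 1):
            Pn = legendre.legval(y, [0] * n + [1])  # P_n evaluated on y grid
            moments[m, n] = Pn @ img @ Pm           # separable double sum
    return moments

# a filled square as a toy "defect" shape
img = np.zeros((64, 64))
img[16:48, 16:48] = 1.0
M = legendre_moments(img)
print(np.round(M[:3, :3], 1))   # low-order moments summarize the shape
```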
The educational component of this project seeks to foster and promote a new inter-disciplinary graduate and undergraduate curriculum in data mining at the Ohio State University. Co-learning, a novel method by which students across disciplines can learn from one another and leverage each other's strengths, will be employed through the design of suitable inter-disciplinary large-scale exploratory data mining class projects.
This project will have a significant impact on how large biomedical and scientific datasets are efficiently explored and analyzed, and will enable scientists and clinicians to gain an effective understanding of the underlying scientific process involved, thereby extending the state-of-the-art in these domains.
2004 — 2008
Saltz, Joel (co-PI) [⬀] Kurc, Tahsin Parthasarathy, Srinivasan
Ngs: a Services-Oriented Framework For Next Generation Data Analysis Centers @ Ohio State University Research Foundation
Abstract-CNS-0406386
We propose: i) to develop a software framework providing middleware services for next generation data analysis centers built on top of heterogeneous clusters; and ii) to deploy, within the context of such centers, anytime knowledge discovery and data mining algorithms that operate interactively on dynamic data. The key elements of the proposed framework are: a) Storage Services: efficient mechanisms for managing, preprocessing, and accessing dynamic data in a distributed environment; b) Caching Services: efficient mechanisms for supporting data and knowledge re-use; c) Scheduling Services: efficient mechanisms for admission control and resource-aware inter-task and intra-task scheduling; and d) Application Adaptability: parallel adaptive incremental data mining algorithms for key data mining tasks that interact effectively with the proposed services.
2007 — 2009
Parthasarathy, Srinivasan
Sger: An Event-Driven Approach For Analyzing Interaction Networks @ Ohio State University Research Foundation
Datasets originating from many different real-world domains can be represented in the form of interaction networks in a concise and meaningful fashion. Examples abound, ranging from gene expression networks to social networks, and from the World Wide Web to protein-protein interaction networks. The study of these complex interaction networks, which are often evolving, can provide insight into their structure, properties and behavior.
Identifying the portions of the network that are changing, characterizing the type of change and extracting relevant patterns that can help predict future events and behavior are all critical challenges that need to be met in this context. To this end the PI plans to explore and design an event-driven methodology to study the evolutionary behavior of such interaction networks from the perspective of node-level and community-level viewpoints. Incorporating semantic information and leveraging graph grammars in a structured manner will also be explored in this context.
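To ground the community-level viewpoint, the sketch below labels simple events (continue, merge, split, form, dissolve) between two community snapshots using a Jaccard-overlap test; the threshold and event definitions are simplified assumptions, not the project's full methodology:

```python
# Illustrative event labeling between community snapshots; the real
# event-driven framework is richer than this single-threshold test.

def jaccard(a, b):
    return len(a & b) / len(a | b)

def detect_events(old, new, theta=0.5):
    """Label evolution events between two lists of communities (node sets)."""
    events = []
    for i, c_old in enumerate(old):
        matches = [j for j, c_new in enumerate(new) if jaccard(c_old, c_new) >= theta]
        if len(matches) == 1:
            events.append(("continue", i, matches[0]))
        elif len(matches) > 1:
            events.append(("split", i, matches))
        else:
            events.append(("dissolve", i, None))
    for j, c_new in enumerate(new):
        sources = [i for i, c_old in enumerate(old) if jaccard(c_old, c_new) >= theta]
        if not sources:
            events.append(("form", None, j))
        elif len(sources) > 1:
            events.append(("merge", sources, j))
    return events

snap_t0 = [{1, 2, 3, 4}, {5, 6, 7}]
snap_t1 = [{1, 2}, {3, 4}, {5, 6, 7, 8}]
print(detect_events(snap_t0, snap_t1))   # community 0 splits, community 1 continues
```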
The main scientific outcomes of this research will include the ability to extract, analyze and understand key patterns and features of such dynamic interaction networks in the context of end applications drawn from clinical and social settings. The broader outcomes of this work will be to train capable graduate and undergraduate students in the fields of network analysis and data mining. Women and minorities will be especially encouraged to participate, and existing interactions with a local HBCU will be strengthened.
Project Page: http://www.cse.ohio-state.edu/~srini/SGER/information
2007 — 2011
Parthasarathy, Srinivasan
Scalable Data Analysis: An Architecture Conscious Approach @ Ohio State University Research Foundation
Advances in technology have enabled individuals, organizations and government agencies to collect and store massive amounts of data across all walks of human endeavor. A critical challenge is to extract actionable information from such tera- and peta-scale data stores in as efficient a manner as possible so that domain scientists can make critical advances in various fields including the sciences, engineering, medicine and homeland security.
Toward this objective, the PI seeks to employ an architecture-conscious approach to scalable data analysis on modern cluster systems interconnected through a high speed network. The central thesis of this work is that current day algorithms for data analysis often grossly under-utilize architectural resources (processors, memory, disk and network). This project seeks to address this limitation in the context of key application drivers drawn from scientific simulations, bioinformatics and security applications. Specifically, locality enhancing techniques, the ability to leverage new features of modern architectures, the ability to efficiently work with large out-of-core data structures, multi-level load balancing and distribution of work among cluster nodes, and mechanisms that support remote memory paging on modern clusters will be investigated and leveraged in this context.
The main scientific outcomes of this research will include the ability to process and analyze hitherto intractably large datasets, enabling new scientific discoveries in the corresponding domains, and the ability to engage and fully utilize the underlying parallel architecture to respond and react to domain expert queries efficiently. Another expected outcome of this work will be generic runtime abstractions, distilled from the specific solutions obtained, that can be deployed by a host of data-intensive applications. The broader outcomes of this work will be to train capable undergraduate and graduate students. Women and minorities will be especially encouraged to participate, and existing interactions with a local HBCU will be strengthened through various initiatives.
2009 — 2013
Sadayappan, Ponnuswamy (co-PI) [⬀] Parthasarathy, Srinivasan
Global Graphs: a Middleware For Data Intensive Computing
It is often the case that the time and effort required to develop effective and efficient software on high-end computing systems is the bottleneck in many areas of science and engineering. This project is building a novel middleware framework called Global Graphs that targets this bottleneck. Global Graphs takes a data-structure centric view of shared data where graph-based dynamic data structures drive the development of the rest of the system.
A key scientific outcome of this proposed framework is to allow the programmer to have multiple views of the shared data as well as multiple views of the control and tasking model. This flexibility can be leveraged along a discrete scale of data and process views depending on whether the goal is to develop a quick prototype for validating ideas on small scale problems, or the goal is efficient realization on large scale problems, or something in between these two extremes. An additional outcome will be the development of a performance feedback engine that will provide the programmer insights into parts of the program to focus on for performance tuning.
The proposed work has important implications for a range of domains requiring the processing of large scale datasets, including data mining, scientific computing and XML data management. The broader outcomes of this work will be to train capable undergraduate and graduate students. The PIs are actively encouraging under-represented minorities to participate in this effort.
2011 — 2012
Parthasarathy, Srinivasan
Eager: Towards New Scalable Stochastic Flow Algorithms
The process of clustering or partitioning of nodes within a graph is a fundamental task with applications in many areas ranging from social network analysis to chip design and from biological network analysis to the analysis of intelligence networks. This project seeks to explore and develop a new class of algorithms for graph clustering based on the principle of stochastic flows. Such algorithms have been used effectively on small scale biological networks and have been shown to be robust to noise effects. However, widespread utilization has been limited due to the lack of scalability of the algorithm and its inability, in its current form, to accommodate domain-specific constraints on clustering.
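The base stochastic flow process (in the spirit of Markov Clustering) alternates expansion and inflation on a column-stochastic matrix until flow concentrates inside clusters. A minimal sketch, with illustrative parameter choices and a toy graph of two triangles:

```python
import numpy as np

# Illustrative Markov-Clustering-style stochastic flow; inflation value,
# iteration count and convergence handling are simplified assumptions.

def stochastic_flow_cluster(adj, inflation=2.0, iters=50):
    m = adj + np.eye(len(adj))            # add self-loops
    m = m / m.sum(axis=0)                 # column-normalize: a flow matrix
    for _ in range(iters):
        m = m @ m                         # expansion: flow spreads along paths
        m = m ** inflation                # inflation: strengthens strong currents
        m = m / m.sum(axis=0)             # renormalize columns
    clusters = {}                         # columns sharing an attractor row
    for col in range(m.shape[1]):         # form one cluster
        attractor = int(np.argmax(m[:, col]))
        clusters.setdefault(attractor, []).append(col)
    return list(clusters.values())

# two triangles joined by a single edge
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
print(stochastic_flow_cluster(A))   # expect {0,1,2} and {3,4,5}
```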
This exploratory project seeks to address these two limitations. First, it seeks to develop a novel approach for supporting flexible clustering in the context of stochastic flow clustering, allowing users to control the skew of the resulting clustering arrangement (e.g., to ensure balanced clusters) and allowing the nodes of a graph to participate in multiple clusters (so as to allow clusters to overlap). Second, it seeks to develop solutions that can scale to very large graphs (e.g. social networks, web graphs) through the innovative application of graph sparsification and novel parallel algorithms on high performance systems. An open source implementation of the resulting proof-of-concept solution will be distributed to the broader scientific community.
The scientific impacts of this exploratory research agenda include the following. First, successfully scaling up stochastic flow algorithms to web-scale datasets while retaining their many advantages would open up a viable, robust and improved alternative to the current state-of-the-art. Second, stochastic flow clustering algorithms could also be employed, in a manner analogous to spectral methods, on more traditional (non-graph) data sources, enabling more widespread use of flow clustering algorithms.
The broader impacts of the project include increased research-based training opportunities for undergraduate and graduate students in data analytics. Additional information about the project can be found at: http://www.cse.ohio-state.edu/~srini/EAGER11/
2011 — 2015
Parthasarathy, Srinivasan
Socs: Collaborative Research: Social Media Enhanced Organizational Sensemaking in Emergency Response
This collaborative research leverages the expertise of researchers at Wright State University (IIS-1111182) and Ohio State University (IIS-1111118). Online social networks and always-connected mobile devices have created an immense opportunity that empowers citizens and organizations to communicate and coordinate effectively in the wake of critical events. Specifically, there have been many isolated examples of using Twitter to provide timely and situational information about emergencies to relief organizations, and to conduct ad-hoc coordination. However, there are few attempts to understand the full ramifications of using social networks in a more concerted manner for effective organizational sensemaking. This project aims to fill this gap through multidisciplinary research involving computer and social scientists.
This project seeks to leverage Twitter posts (tweets) as the primary source of citizen inputs and couple relevant content and network information with microworld simulations involving human role players to measure the effectiveness of various organized sensemaking strategies. To arrive at meaningful summaries of citizen input, tweet content is analyzed semantically, using natural language techniques suitably fused with existing knowledge bases (GeoNames, Wikipedia). Content analysis is further enhanced by innovatively combining it with dynamic analysis of the Twitter network to realize concise and trustworthy information nuggets of potential interest to organizations and citizens. The resulting summaries will be fed to a suitably designed microworld simulation involving human actors to derive realistic settings for modeling disaster situations and typical organizational structures.
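As a toy illustration of knowledge-base-backed content analysis, the sketch below spots entities in a tweet by matching n-grams against a small gazetteer; the entries and coordinates are made up, and the real pipeline fuses full NLP with GeoNames/Wikipedia rather than this bare lookup:

```python
# Illustrative gazetteer spotting only; entries below are invented and the
# actual pipeline combines NLP with GeoNames/Wikipedia knowledge bases.

GAZETTEER = {
    "red river": ("LOC", 34.55, -91.97),          # invented coordinates
    "main street bridge": ("FACILITY", None, None),
}

def spot_entities(tweet, max_ngram=3):
    tokens = tweet.lower().split()
    hits = []
    for n in range(max_ngram, 0, -1):             # prefer longer matches
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in GAZETTEER:
                hits.append((phrase, GAZETTEER[phrase]))
    return hits

print(spot_entities("flooding reported near main street bridge on red river"))
```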
This project is expected to have a significant impact in the specific context of disaster and emergency response. However, elements of this research are expected to have much wider utility, for example in the domains of e-commerce and social reform. From a computational perspective, this project introduces the novel paradigm of people-content-network analysis, whose application is not limited to organized sensemaking. For social scientists, it provides a platform that can be used to assess the relative efficacy of various organizational structures using microworld simulations, and is expected to provide new insights into the types of social network structures (mixes of symmetric and asymmetric ties) that might be better suited to propagating information in emergent situations. From an educational standpoint, the majority of funds will be used to train the next generation of interdisciplinary researchers drawn from the computational and social sciences. Research activities will also be integrated with graduate course work. Participation of underrepresented groups will be encouraged. Datasets and software developed as part of this project will be made available to the broader research community via the project page (http://knoesis.org/research/semspc/projects/socs).
2012 — 2013
Parthasarathy, Srinivasan
Ccf: Eager: Collaborative Research: Scalable Graph Mining and Clustering On Desktop Supercomputers
Real world data, such as the World Wide Web, social networks, corporate knowledge networks, biological networks, and semantic networks, can be abstracted in the form of a massive and complex graph, with millions to billions of nodes and edges. With the explosion of such data, there is a pressing need for data mining, analysis, and querying tools to rapidly make sense of it and extract knowledge. However, effectively leveraging the resources of modern architectures and mining such large graphs for interesting patterns remains challenging. At the same time, commodity desktop architectures that have processors with multiple cores and graphics processors (with hundreds of stream cores) are opening up significant opportunities for parallel graph analytics and management on the desktop.
This exploratory research seeks to scale up the performance of graph mining and clustering algorithms on modern desktop supercomputers to leverage the power of multi-core systems equipped with graphics processors, and to explore and develop new algorithms for reducing the search space and the amount of data processed.
2012 — 2016
Parthasarathy, Srinivasan
Shf: Small: Collaborative Research: Elastic Fidelity: Trading Off Computational Accuracy For Energy Efficiency
Energy and power consumption have become critical issues at every scale, from microarchitectures to large-scale data centers and supercomputers. Conservative estimates suggest that the information technology industry's worldwide energy consumption is in excess of 400 TWh and growing, generating roughly the same carbon footprint as the airline industry and accounting for 2% of global emissions. At the same time, the power constraints of chips hamper their performance, and shrinking transistor geometries and low supply voltages increase the severity of processor variations, resulting in higher timing error rates. High error rates lead to a significant drop in yield and increased manufacturing costs, calling for designs that are able to withstand them.
This project seeks to understand and explore the novel paradigm of elastic fidelity computing. Elastic fidelity computing capitalizes on the observation that many applications can naturally tolerate errors, and that not all of them need to run at 100% fidelity all the time. Specifically, the goal of this work is to understand the error models of various hardware components as they relate to data movement, storage, and computation, and simultaneously to understand the error resiliency of applications and re-architect them to leverage elastic fidelity.
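The kind of error-resilience analysis elastic fidelity relies on can be sketched by injecting low-order bit flips into a computation and comparing the degraded result with the exact one; the error model and rate below are assumptions for illustration, not a measured hardware fault model:

```python
import random
import struct

# Illustrative error injection: with probability p, flip a low-order
# mantissa bit of a double; the fault model here is an assumption.

def inject(x, p=0.05, bit=10):
    if random.random() < p:
        (bits,) = struct.unpack("<Q", struct.pack("<d", x))
        (x,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))
    return x

random.seed(0)
data = [random.uniform(0.0, 1.0) for _ in range(100_000)]
exact = sum(data) / len(data)
noisy_vals = [inject(x) for x in data]        # 5% of reads are faulty
noisy = sum(noisy_vals) / len(noisy_vals)
print(f"exact={exact:.6f}  noisy={noisy:.6f}  "
      f"rel.err={abs(exact - noisy) / exact:.2e}")   # tiny degradation
```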
Elastic fidelity offers potentially transformative effects for science and society, by challenging conventional wisdom and taking a fresh look at the interplay of errors, output quality and energy efficiency for an important class of pervasive streaming and data-intensive applications. More specifically, elastic fidelity promises significant energy savings that can put computing on an environmentally sustainable path, by lowering operational costs in major economic sectors and making the manufacturing of future chips cheaper by relaxing the accuracy requirements of hardware components. The results of this research will be disseminated through publications, workshops, advanced curriculum, and releases of the developed infrastructure in the public domain. To accelerate broad societal effects, the project participants will seek to foster technology transfer by promoting collaboration and industry involvement through presentations and site visits.
2014 — 2017
Sivakoff, David Parthasarathy, Srinivasan
Sampling and Inference in Network Analysis
The study of complex networks constitutes an interdisciplinary area of inquiry that transcends traditional knowledge domains by focusing on the fundamental interdependencies of components within various systems-of-interest. Examples abound from social networks to coupled human and natural systems, from financial networks to disease systems, and from telecommunication networks to energy and power systems. It is the interconnection among these components that often sit at the heart of our most vexing global grand challenge problems, including climate change, energy demands, security, health and wellness, and livelihood and poverty. The study of such complex systems and often large scale networks -- understanding their intrinsic properties, changes to their structure over time or due to external factors, multi-scale behavior of individuals to coarser grained modular communities -- can afford important insights to individuals, organizations and society at large when tackling such grand challenge problems.
This project seeks to develop robust and scalable sampling methods for the modeling and analysis of large, potentially dynamic, networks. Sampling is often touted as a means to efficiently combat the inherent complexity of estimating the relevant characteristics of a population. Sampling a network is complicated because networks are composed of two units (nodes and edges) that are not always nicely nested. A key objective will be to study and provide a sound mathematical basis, along with high performance tools, for both node-centric and edge-centric sampling methodologies for the analysis and modeling of networks. The objective of realizing high performance tools for real world applications, drawn from social networks and network biology, will be equally significant, and is necessary for sustained innovation of an inter-disciplinary nature. This research will shed light on the theoretical underpinnings of graph sampling and probabilistic inference in both the static and dynamic network contexts. From an educational standpoint, the investigators will train the next generation of graduate students in this interdisciplinary arena and will also actively encourage participation of undergraduates and under-represented minorities.
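The node/edge duality that complicates network sampling is easy to see in code. A minimal sketch contrasting node-centric (induced subgraph) and edge-centric sampling; the estimator corrections studied in the project are omitted here:

```python
import random

# Illustrative sampling contrast only; real estimators correct for the
# biases these raw samples introduce.

def node_sample(edges, nodes, frac=0.3, rng=random):
    """Keep a node fraction, then take the induced subgraph."""
    keep = set(rng.sample(sorted(nodes), int(frac * len(nodes))))
    return [(u, v) for u, v in edges if u in keep and v in keep]

def edge_sample(edges, frac=0.3, rng=random):
    """Keep a uniform fraction of edges directly."""
    return rng.sample(edges, int(frac * len(edges)))

random.seed(1)
edges = [(random.randrange(100), random.randrange(100)) for _ in range(500)]
nodes = {u for e in edges for u in e}
# keeping 30% of nodes retains far fewer than 30% of edges: the two units
# are not nicely nested
print("node-induced sample keeps", len(node_sample(edges, nodes)), "edges")
print("edge sample keeps        ", len(edge_sample(edges)), "edges")
```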
2015 — 2017
Parthasarathy, Srinivasan
Eager: Practical Graph Sparsification On Gpus
A large number of real-world problems can be concisely abstracted and modeled as a graph or a network. A fundamental challenge with processing and analyzing such graphs is the issue of scale: millions or billions of nodes and billions or trillions of edges. While advances in technology have led to the development of faster and better architectures, simply porting existing codes to such architectures will not suffice -- performance gains are typically not commensurate with advances in technology, in part due to the inherent data movement costs associated with such algorithms. This project seeks to investigate two complementary strategies (graph sparsification and architecture-aware algorithm designs) to address this challenge head on. The key outcomes of this research will be algorithmic and systemic innovations that can radically impact next generation graph analytic systems. This effort is expected to provide a model for the research, education and training of both undergraduate and graduate students, including those from under-represented groups.
With respect to innovation, practical graph sparsification strategies will be investigated as a generic means of scaling down the data movement requirements of modern graph and network analysis algorithms. Specifically, innovative hashing-based approaches to accommodate edge directionality, weighted graphs, and heterogeneous content will be developed. Additionally, radically new ways to implement and re-architect such analysis algorithms on current and next generation Graphics Processing Unit (GPU)-based systems, while explicitly accounting for data movement costs within the architecture, will be designed. Specifically, a novel sketching strategy will be employed for this purpose. In terms of impact, the sparsification-based approach can be significant in terms of the wide use and application of such strategies for scaling up tasks such as link prediction, community discovery, and collective classification and deploying them on modern GPUs. Exemplar outcomes are expected to include high performance GPU-based network analysis tools for data scientists, and the interdisciplinary training of students in data mining, network science and high performance computing leveraging research in pedagogy, in conjunction with Ohio State University's new undergraduate major in data analytics.
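A minimal sketch of similarity-based local sparsification in this spirit: rank each node's edges by neighborhood similarity and keep only the top few per node. Exact Jaccard similarity stands in here for the hashing/sketching approaches the project proposes, and the exponent is an illustrative choice:

```python
# Illustrative local sparsification; exact Jaccard substitutes for the
# minhash/sketching schemes the project actually targets.

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def local_sparsify(adj, e=0.5):
    kept = set()
    for u, nbrs in adj.items():
        # rank u's edges by how similar the two endpoint neighborhoods are
        ranked = sorted(nbrs, key=lambda v: jaccard(adj[u], adj[v]), reverse=True)
        for v in ranked[: max(1, int(len(nbrs) ** e))]:
            kept.add((min(u, v), max(u, v)))   # keep edge if either side keeps it
    return kept

adj = {
    0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2, 4},
    4: {3, 5}, 5: {4, 6, 7}, 6: {5, 7}, 7: {5, 6},
}
print(sorted(local_sparsify(adj)))   # intra-community edges survive
```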
For further information see the project web site at: http://www.cse.ohio-state.edu/~srini/GraphSpar/
2015 — 2019
Liu, Desheng (co-PI) [⬀] Kubatko, Ethan (co-PI) [⬀] Shalin, Valerie (co-PI) [⬀] Sheth, Amit (co-PI) [⬀] Parthasarathy, Srinivasan
Hazards Sees: Social and Physical Sensing Enabled Decision Support For Disaster Management and Response
Infrastructure systems are a cornerstone of civilization. Damage to infrastructure from natural disasters such as an earthquake (e.g., Haiti, Japan), a hurricane (e.g., Katrina, Sandy), or a flood (e.g., the Kashmir floods) can lead to significant economic loss and societal suffering. Human coordination and information exchange are at the center of damage control. This project aims to radically reform decision support systems for managing rapidly changing disaster situations through the integration of social, physical and hazard models. The research team will serve as a model for highly integrative and collaborative work among researchers in computer science, engineering, natural sciences, and the social sciences for research, education, and training of undergraduate and graduate students, including those from under-represented groups.
The team seeks to design novel, multi-dimensional, cross-modal aggregation and inference methods to compensate for the uneven coverage of sensing modalities across an affected region. They use data from social and physical sensors as input into an integrated model, from which they are designing a new methodology to predict and prioritize the consequences of damage; they are including both temporally and conceptually extended consequences of damage to people, civil infrastructure (transportation, power, waterways) and their components (e.g., bridges, traffic signals). They are developing innovative technology to support the identification of new background knowledge and structured data to improve object extraction, location identification, correlation, and integration of relevant data across multiple sources and modalities (social, physical and Web). They use a novel coupling of socio-linguistic and network analysis to identify important persons and objects, statistical and factual knowledge about traffic and transportation networks, and the resulting impact on hazard models (e.g. storm surge) and flood mapping. They are developing domain-grounded mechanisms to address pervasive trustworthiness and reliability concerns. Exemplar outcomes include specific tools for first-responders and recovery teams to aid in the prioritization of relief and repair efforts as well as improved flood response, urban mapping, and dynamic storm surge models. They also are providing interdisciplinary training of students, leveraging research in pedagogy in conjunction with Ohio State University's new undergraduate major in data analytics and Wright State University's Big and Smart Data graduate certificate program.
2016 — 2019
Sadayappan, Ponnuswamy [⬀] Parthasarathy, Srinivasan Pouchet, Louis-Noel
Xps: Full: Collaborative Research: Paragraph: Parallel, Scalable Graph Analytics
Many real world problems can be effectively modeled as complex relationship networks or graphs where nodes represent entities of interest and edges mimic the interactions or relationships among them. The number of such problems and the diversity of domains from which they arise is growing. However, developing high-performance applications to extract useful information from such datasets is very challenging. Graphical processing units are very attractive for such applications because they offer higher computational performance and energy efficiency than standard multi-core processors. However, the development of high-performance applications for them is currently much more challenging than parallel program development for standard multi-core processors. Effective application development for graphical processing units generally requires that developers possess considerable expertise in their architectural characteristics and use specialized programming models and performance optimization techniques. Thus, simultaneously achieving high performance and high user productivity for data analytics applications on such devices is a daunting challenge.
This project proposes a scalable high-level software framework to enable the productive development of high-performance applications for graphical processing units. It features two distinct abstractions to address the performance and productivity challenges in developing graph/data analytics applications: 1) a frontier-centric abstraction that is based on a common iterative characteristic of many of these applications, with a dynamically moving active frontier of vertices (or edges) where computation is centered, and 2) an abstraction based on sparse linear algebra primitives, exploiting the dual relationship between sparse matrices and graphs. A benchmark suite of graph analytics applications will be developed and evaluated using both abstractions, enabling insights into the effectiveness of these alternate high-level abstractions for a range of analytics applications. The benchmark suite and the software framework will be publicly released.
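The frontier-centric abstraction can be illustrated with a level-synchronous BFS, where computation is centered on a dynamically moving frontier of vertices. This sequential Python sketch only illustrates the abstraction; the project targets GPU implementations (and a dual formulation via sparse matrix-vector products):

```python
# Illustrative frontier-centric traversal; sequential stand-in for the
# GPU abstractions the project proposes.

def bfs_levels(adj, source):
    level = {source: 0}
    frontier = [source]                 # the dynamically moving active frontier
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:              # computation is centered on the frontier
            for v in adj.get(u, ()):
                if v not in level:      # first visit: v joins the next frontier
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level

adj = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5], 5: []}
print(bfs_levels(adj, 0))   # {0: 0, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3}
```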
2016 — 2018
Parthasarathy, Srinivasan Sadayappan, Ponnuswamy [⬀]
Eager: Towards Automated Characterization of the Data-Movement Complexity of Large Scale Analytics Applications
We have entered a new era where power/energy limitations have become fundamental drivers of technological trends. The cost in both time and energy for moving data from off-chip main memory to the processor is significantly higher than the cost of a double-precision floating-point computation. With future technologies, this ratio will only get worse. Therefore the characterization of the inherent data movement costs of algorithms is very important, and is particularly critical for large scale data-analytic applications. However, unlike the well-understood computational complexity of algorithms, the data movement complexity is known only for a small number of algorithms.
Prior techniques for characterizing the data movement complexity of algorithms have either been restricted to subclasses of computations or have required ad hoc manual reasoning. This project develops a scalable automated tool for analyzing the data movement complexity of arbitrary unstructured computations, expressed as computational directed acyclic graphs (CDAGs). The researchers explore several directions, including out-of-core strategies, decomposition/recomposition of graphs, directional component analysis, and empirical function fitting, to address scalability challenges.
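One concrete way to see data movement as a property of a computation is to replay a CDAG schedule against a small fast memory and count transfers. The sketch below evaluates a single topological order under LRU with S slots; the project's automated analysis targets bounds over all valid schedules, so this is only an illustration of the cost model:

```python
from collections import OrderedDict

# Illustrative data-movement accounting for one CDAG schedule; the fast
# memory size S and LRU policy are modeling assumptions.

def data_movement(order, preds, S):
    cache, transfers = OrderedDict(), 0
    def touch(v):
        nonlocal transfers
        if v in cache:
            cache.move_to_end(v)            # reuse: no transfer
        else:
            transfers += 1                  # load (or store of a fresh value)
            cache[v] = True
            if len(cache) > S:
                cache.popitem(last=False)   # evict least recently used
    for v in order:
        for p in preds.get(v, ()):          # operands must be in fast memory
            touch(p)
        touch(v)                            # the result occupies a slot too
    return transfers

# tiny reduction tree: c0 = a0 + a1, c1 = a2 + a3, r = c0 + c1
preds = {"c0": ["a0", "a1"], "c1": ["a2", "a3"], "r": ["c0", "c1"]}
print(data_movement(["c0", "c1", "r"], preds, S=3))  # shrinks as S grows
```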
2017 — 2018
Wang, Yang (co-PI) [⬀] Blanas, Spyros Parthasarathy, Srinivasan
Shf: Eager: Hi-Hdfs - Holistic I/O Optimizations For the Hadoop Distributed Filesystem
File systems and their outdated POSIX "byte stream" interface suffer from an impedance mismatch with the versatile I/O requirements of today's applications. Specifically, the I/O path from the application to the raw storage device is becoming longer and it involves the interplay of intricate software and hardware components. This produces complex aggregate I/O patterns that application developers (often subject matter experts with limited knowledge of how massive concurrency creates I/O bottlenecks) cannot optimize based on intuition alone. File systems that tout their high scalability, such as the Hadoop distributed file system, largely do so by limiting applications to sequential access patterns. The question of whether one can accelerate the I/O performance of the Hadoop distributed file system for analytical applications with complex data models that cannot readily serialize data contiguously for fast sequential access remains open. This project seeks to address this question and build HI-HDFS -- a framework that automatically collects and manages semantically richer I/O metadata to guide object placement in the Hadoop distributed file system. The HI-HDFS framework synthesizes the I/O activity across software components throughout the datacenter in a navigable graph structure to identify application-agnostic motifs in I/O activity. A novel I/O forecasting technique identifies and ameliorates bottlenecks at large scale by inspecting I/O activity from small-scale runs. Overall, the HI-HDFS framework challenges the I/O optimization mantra that manual data placement is the cornerstone of I/O performance and paves the way towards next-generation object-centric storage systems for high-performance computers. The efficacy of this automated approach will be examined on a complex data processing workload from the domain of emergency response which exhibits I/O patterns that are characteristic of modern analytical applications. The broader impacts of this work are expected to include open-source prototype implementations as well as pedagogical impact on a cloud computing course for both Computer Science and Data Analytics undergraduate majors at Ohio State.
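The idea of forecasting large-scale I/O behavior from small-scale runs can be sketched with simple function fitting; the measurements and the power-law model below are invented for illustration and stand in for HI-HDFS's richer forecasting technique:

```python
import numpy as np

# Illustrative forecast only: the run sizes, observed volumes and the
# power-law model (io ~ a * nodes^b) are assumptions, not HI-HDFS data.

nodes = np.array([2, 4, 8, 16])               # small-scale run sizes
io_gb = np.array([1.1, 2.3, 5.0, 10.8])       # observed aggregate I/O (GB)

# fit a log-log linear model, i.e. io = a * nodes^b
b, log_a = np.polyfit(np.log(nodes), np.log(io_gb), 1)
predict = lambda n: np.exp(log_a) * n ** b

for n in (64, 256, 1024):
    print(f"predicted I/O at {n:4d} nodes: {predict(n):8.1f} GB")
```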
2020 — 2023
Panda, Dhabaleswar [⬀] Machiraju, Raghu (co-PI) [⬀] Parthasarathy, Srinivasan Ramnath, Rajiv (co-PI) [⬀] Parwani, Anil
Mri: Radical: Reconfigurable Major Research Cyberinfrastructure For Advanced Computational Data Analytics and Machine Learning
The analysis of high-resolution images in both two and three-dimensions is becoming important for many scientific areas, such as in medicine, astronomy and engineering. Discoveries in these disciplines often require analyzing millions of images. The analysis of these images is complex and requires many steps on powerful computers. Some of these steps require looking through lots of images while some of these steps require deep analysis of each image. In many cases, these analyses have to be completed quickly, i.e. in "real-time", so that information and insights can be provided to humans as they do their work. These kinds of operations require powerful computers consisting of many different, heterogeneous but simple computing components. These components need to be configured and reconfigured so that they can efficiently work together to do these large-scale analyses. In addition, the software that controls these computers also has to be intelligently designed so that these analyses can be run on the right types of configurations. This project aims to acquire the necessary computing components and assemble such a powerful computer (named RADiCAL). Research done using RADiCAL will result in important scientific discoveries that will make us more prosperous, improve our health, and enable us to better understand the world and universe around us. Doing this research will also educate many students, including those from under-represented groups, who will become part of a highly-trained workforce capable of addressing our nation's needs long into the future.
The intellectual merit of RADiCAL is in the design of a novel, high-performance, next-generation, heterogeneous, reconfigurable hardware and software stack to provide real-time interaction, analytics, machine/deep learning (ML/DL) and computing support for disciplines that involve massive observational and/or simulation data. RADiCAL will be built from commodity hardware, and designed for reconfiguration and observability. RADiCAL will enable a comprehensive research agenda on software that will facilitate rapid and flexible construction of analytics workflows and their scalable execution. Specific software research directions include: 1) a library with support for storage and retrieval of multi-resolution, multi-dimensional datasets, 2) scalable learning and inference modules, 3) data analytics middleware systems, and 4) context-sensitive human-in-the-loop ML models and libraries that encode domain expertise, coupling tightly with both lower level layers and the hardware components to facilitate scalable analysis and explainability. With the proposed hardware acquisition and software research, the transformative goal will be to facilitate decision-making and discovery in Computational Fluid Dynamics (CFD) and medicine (pathology). With respect to broader impacts, RADiCAL will provide a unique research, testing, and training infrastructure that will catalyze research in multiple disciplines as well as facilitate convergent research across disciplines. The advanced imaging applications and techniques for expert-assisted image analysis will be broadly applicable to other human-in-the-loop systems and have the potential to advance medicine and health. Projects that use RADiCAL will also provide unique test-beds for valuable empirical research on human-computer interaction and software engineering best practices. Well-established initiatives at The Ohio State University will facilitate the recruitment of graduate and undergraduate students from underrepresented groups for involvement in using the cyberinfrastructure. The heterogeneous and reconfigurable research instrument will be utilized to create sophisticated educational modules on how to co-design computational science experiments from the science goals to the underlying cyberinfrastructure. Tutorials and workshops will be organized at PEARC, Supercomputing and other conferences to share the research results and experience with the community.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
2020 — 2021
Panda, Dhabaleswar (co-PI) [⬀] Parthasarathy, Srinivasan Teodorescu, Radu (co-PI) [⬀] Blanas, Spyros Subramoni, Hari (co-PI) [⬀]
Collaborative Research: Pposs: Planning: a Cross-Layer Observable Approach to Extreme Scale Machine Learning and Analytics
The ability to analyze and learn from large volumes of data is becoming important in many walks of human endeavor, including medicine, science, and engineering. Analysis workflows for high-resolution images (e.g. medical imaging, sky surveys) and scientific simulations, as well as those for graph analytics and machine learning, are typically time consuming because of the extreme scales of data involved. While the hardware elements of the modern data center are undergoing a rapid transformation to embrace the storage, processing, and analysis needs of such applications, understanding how the different layers of the systems stack interact with one another and contribute to end-to-end application performance remains challenging. This planning project envisions the ACROPOLIS framework to address these challenges. ACROPOLIS will enable a comprehensive research agenda on systems software that will facilitate rapid and flexible construction of analytics workflows and their scalable execution. By facilitating the rapid prototyping of application drivers, ACROPOLIS can also enable important scientific discoveries that potentially improve human health and deepen our understanding of the world around us. The research enabled by ACROPOLIS will also educate many students, including those from under-represented groups, who will become part of a highly-trained workforce capable of addressing our nation's needs long into the future. With respect to broader impacts, ACROPOLIS will provide a unique research and training infrastructure that will catalyze research in multiple disciplines as well as facilitate convergent research across disciplines. Well-established initiatives at The Ohio State University, such as the Louis Stokes Alliances for Minority Participation (LSAMP) as well as new programs in Data Analytics, will facilitate the recruitment of graduate and undergraduate students for involvement in this research agenda. This project is aligned with two of NSF's 10 Big Ideas: Harnessing the Data Revolution and Growing Convergence Research, as well as the American AI Initiative.
The project addresses five key research pillars: 1) Flexible abstractions for parallel computation and data representation, 2) Modeling data movement complexity at extreme scales, 3) Pattern-driven scalable communication and I/O systems, 4) Near-memory architectures for machine learning and analytics, and 5) Cross-layer observability and introspection. Specifically, the focus is on the design of an end-to-end framework incorporating a high-performance, next-generation, heterogeneous, reconfigurable hardware and software stack to facilitate real-time interaction, analytics, and machine learning for a range of scientific disciplines including Computational Pathology, Computational Fluid Dynamics and Emergency Response.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
2021 — 2022
Sheth, Amit (co-PI) [⬀] Shalin, Valerie (co-PI) [⬀] Parthasarathy, Srinivasan Garrett, R. Hyder, Ayaz
Nsf Convergence Accelerator Track F: Actionable Sensemaking Tools For Curating and Authenticating Information in the Presence of Misinformation During Crises
High volume, rapidly changing, diverse information, which often includes misinformation, can easily overwhelm decision makers during a crisis. Decisions made both during and long after a crisis affect the trust between responsible decision makers and the citizens (many from vulnerable populations) who are impacted by those decisions. This project seeks to help decision makers manage information, promoting reliance on authentic knowledge production processes while also reducing the impact of intentional disinformation and unintended misinformation. The project team will develop a suite of prototype tools that bring timely, high-quality integrated content to bear on decision making and governance, as a routine part of operations and especially during a crisis. Integrated and authenticated content comprising scientific facts and technical information, coupled with citizen and stakeholder viewpoints, assures the accuracy of safety decisions and the appropriate prioritization of relief efforts. The project team will synthesize convergent expertise across multiple disciplines; engage and build stakeholder communities through partnerships with government and industry to guide tool development; build a prototype tool for authenticating data and managing misinformation; and validate the tool using real world crisis scenarios.
The project team will create use-inspired personalized AI-driven sensemaking prototype tools for decision-makers to comprehend and authenticate dynamic, uncertain, and often contradictory information to facilitate effective decisions during crises. The tools will focus on curation while accounting for source and explainable content credibility. Guidance from community stakeholders obtained using ethnographic methods will ensure that the resulting tools are practical, timely, and relevant for informed decision making. These tools will capitalize on features of the information environment, human cognitive abilities and limitations, and algorithmic approaches to managing information. In particular, content and network analyses can reveal constellations of sources with a higher probability of producing credible information, while knowledge graphs can help surface and organize important materials being shared while facilitating explainability. The project team will also design and develop a microworld environment to examine and improve tool robustness while simultaneously helping to train decision makers in real-world settings such as school districts and public health settings. This project represents a convergence of disciplines spanning expertise in computer science, social sciences, linguistics, network science, public health, cognitive science, operations, and communication that are necessary to achieve its goals. Partnerships between communities, government, industry, and academia will ensure the deliverables are responsive to stakeholder needs.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
2021 — 2024
Parthasarathy, Srinivasan Ning, Xia
Iii: Small: Interpretable Deep Generative Models For Drug Development
Drug discovery is time-consuming and costly: it takes approximately 10-15 years and between $500 million and $2 billion to fully develop a new drug. Molecule optimization is a critical step in drug discovery to improve desired properties of drug candidates through chemical modification. For example, in lead (molecules showing both activity and selectivity towards a given target) optimization, the chemical structures of the lead molecules can be altered to improve their selectivity and specificity. Conventionally, this process is facilitated by the knowledge, intuition and experience of medicinal chemists, and is done via fragment-based screening or synthesis. Such an approach is not scalable. The objective of this project is to develop a new class of Artificial Intelligence (AI) methods and tools to conduct in silico molecule generation. Specifically, this project will focus on the following important aspects of AI-based in silico molecule optimization: 1) major scaffold retention, 2) molecule diversity, 3) molecule synthesizability, 4) multi-property optimization, and 5) interpretability. The central hypothesis underlying the proposed research is that the increasing amount of publicly available molecule data, including molecule properties, synthesis pathways and drug-likeness, contains a wealth of information that, if properly analyzed and utilized, can provide key insights in revealing, characterizing and automating the computational molecule generation and optimization process.
Developing such a class of AI methods will require novel models and methods for in silico molecule optimization. Examining designs based on new deep generative models, deep graph convolutional networks, conditional sampling approaches and reinforcement learning methods that learn from pairs of molecular graphs, and accordingly generate new molecular graphs with improved biochemical and biophysical properties, is necessary. The proposed research will also provide a holistic framework to explore prospective molecules that are sufficiently different from one another, and will investigate molecular graph search approaches and Bayesian optimization methods to guide search in the latent embedding (representation) space. For multi-property optimization, the proposed research will provide a pipeline structure and new reinforcement learning approaches. To understand and facilitate interpretable generative models, the proposed research will develop a set of novel methods including network dissection, perturbation-based attribution methods, self-explaining methods and disentanglement. This project will have substantial societal and educational impacts, and will enhance diversity in STEM through education and research dissemination. The broader scientific contributions of the project will be the development of innovative AI methodologies and tools that will aid drug development. These technical innovations will not only address the key computational challenges in generative models for molecules, but also potentially generalize to other problems (e.g. cheminformatics, materials design) in which generation of structural data is highly needed and interpretation of the generation process is critical. The proposed research can potentially reduce investment costs during drug discovery, increase its success rate significantly, and ultimately aid in the improvement of US health care quality.
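The search-in-latent-space idea can be sketched as hill-climbing around a lead molecule's embedding under a property oracle. The oracle, the 32-dimensional latent space and the absence of a real decoder below are all stand-ins for the deep generative models the project proposes:

```python
import numpy as np

# Illustrative latent-space optimization; the property oracle and latent
# dimensionality are invented stand-ins for a trained generative model.

rng = np.random.default_rng(0)
DIM = 32

def property_oracle(z):
    """Pretend property predictor over latent codes (higher is better)."""
    target = np.full(DIM, 0.5)
    return -np.sum((z - target) ** 2)

def optimize(z_lead, steps=200, sigma=0.1):
    z_best, f_best = z_lead, property_oracle(z_lead)
    for _ in range(steps):
        z_new = z_best + rng.normal(0, sigma, DIM)  # perturb near current best
        f_new = property_oracle(z_new)
        if f_new > f_best:                          # accept improvements only,
            z_best, f_best = z_new, f_new           # so we stay near the lead
    return z_best, f_best

z_lead = rng.normal(0, 1, DIM)                      # embedding of the lead
z_opt, score = optimize(z_lead)
print(f"property improved from {property_oracle(z_lead):.2f} to {score:.2f}")
```

Staying close to the lead's embedding is a crude proxy for major scaffold retention; real scaffold constraints would be enforced by the generative model itself.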
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.