2011 — 2017
Merchant, Nirav; Orcutt, John; Rajasekar, Arcot; Moore, Reagan; Regli, William; Goodall, Jonathan
DataNet Full Proposal: DataNet Federation Consortium @ University of North Carolina at Chapel Hill
Major science and engineering initiatives are dependent upon massive data collections that comprise observational data, experimental data, simulation data, and engineering data. To support science and engineering collaborations, a policy-driven national data management infrastructure will be implemented. The implementation prototype will address both the life cycle of science and engineering data and the sustainability of data collections and repositories over time, across changes in technology and changes in usage. The motivation for building the national infrastructure comes from the data management requirements of the NSF Ocean Observatories Initiative (real-time data streams, simulation output, video), the NSF Consortium of Universities for the Advancement of Hydrologic Science (point data), engineering projects in education and CAD/CAM/CAE archives, the iPlant Collaborative (genome databases), the Odum Institute (social science statistics), and the NSF Science of Learning Centers (EEG / MRI sensor data, video).
The approach is based on a bottom-up federation of existing data management systems through use of the integrated Rule-Oriented Data System (iRODS). Each of the referenced national initiatives has implemented a core data management system based upon the iRODS data grid technology. Through federation, the independent systems can be assembled into a national data infrastructure that integrates collections across project-specific technology (such as real-time sensor data acquisition systems), institutional repositories, regional data grids, federal repositories, and international data grids. The resulting infrastructure will enable collaborative research among researchers in academic institutions and federal agencies, and across national boundaries.
Evolution of the policies (computer-actionable rules) and procedures (computer-executable workflows) that govern each stage of the data life cycle will be supported. Specific policies and procedures will be implemented for each domain to support its community standards for managing data in its local data grid. The project will develop the interoperability mechanisms required to share data between the domains, sets of policies and procedures to govern the data life cycle stages, and policies and procedures that enable re-use of collections. The national data management infrastructure will demonstrate enforcement of data management policies that comply with NSF data management and preservation requirements.
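The policy model described above - a computer-actionable rule applied at a stage of the data life cycle - can be sketched in miniature. This is an illustrative Python sketch, not the iRODS rule language; every class, policy name, and checksum value here is hypothetical:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class DataObject:
    name: str
    checksum: str = ""
    replicas: int = 1
    audit_log: List[str] = field(default_factory=list)

@dataclass
class Policy:
    """A computer-actionable rule: a compliance check plus a remedial action."""
    name: str
    condition: Callable[[DataObject], bool]  # True when the object is compliant
    action: Callable[[DataObject], None]     # applied when it is not

def enforce(obj: DataObject, policies: List[Policy]) -> None:
    """Apply each policy and record every enforcement in the audit trail."""
    for p in policies:
        if not p.condition(obj):
            p.action(obj)
            obj.audit_log.append(f"enforced:{p.name}")

# Two example life-cycle policies: every object must carry a checksum
# and be replicated at least twice.
policies = [
    Policy("require-checksum",
           lambda o: bool(o.checksum),
           lambda o: setattr(o, "checksum", "sha256:computed")),
    Policy("require-two-replicas",
           lambda o: o.replicas >= 2,
           lambda o: setattr(o, "replicas", 2)),
]

obj = DataObject("survey.dat")
enforce(obj, policies)
```

In a real iRODS deployment, rules like these are written in the iRODS rule language and triggered by events such as ingest or replication; the sketch shows only the pattern of condition-plus-action enforcement with an audit trail.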
2013 — 2018
Antin, Parker; Merchant, Nirav; Goff, Stephen (co-PI); Lyons, Eric (co-PI); Ware, Doreen; Vaughn, Matthew; Stanzione, Daniel
The iPlant Collaborative: Cyberinfrastructure for the Life Sciences
iPlant is a new kind of virtual organization, a cyberinfrastructure (CI) collaborative created to catalyze progress in computationally-based discovery in plant biology. iPlant has created a comprehensive and widely used CI, driven by community needs, and adopted by a number of large-scale informatics projects and thousands of individual users. The project has laid a strong foundation to build an increasingly more capable CI and is poised to have an even greater impact on the plant sciences and a number of related fields, with a new focus on addressing computational bottlenecks for a broad number of life science researchers.
In the next five years, iPlant will continue to enhance the capabilities of a comprehensive CI and will also expand the scope to cover a number of new fields of inquiry. In iPlant's initial phase, Grand Challenge projects were defined to shape community requirements for the design of the CI. The two projects, Genotype-to-Phenotype and the Tree of Life, led to new analytical tools and methods for genomics and evolutionary biology. Future work will advance these capabilities and expand into capture and modeling of phenotypic, environmental, and ecological data. As before, this growth will be motivated by community needs and accomplished by community collaboration.
iPlant will continue to actively partner with other large CI development efforts and will coordinate CI development where feasible, appropriate, and mutually beneficial. iPlant will continue to be the underlying infrastructure provider for a number of projects that provide a variety of bioinformatics services. While continuing to support plant biology discovery research, iPlant will expand scope beyond the plant sciences, in coordination with nascent animal-centered efforts. The project will continue to adapt to the rapidly changing needs of the life sciences community and the rapidly changing technological landscape faced by researchers.
The intellectual merit of the project is in advancing the state of modern biology. Without question, research progress in the plant sciences, and in life sciences more generally, is increasingly limited by data and computational challenges. As knowledge of plant biology increases, the field will progress from informatics-based discovery to predictive modeling and eventually to synthetic biology. A comprehensive CI that eliminates the bottlenecks of data management, data standards, file formats, analysis, efficient collaboration, and knowledge dissemination will be a necessary underlying enabler to achieve this vision, and iPlant is positioned to be this enabling infrastructure.
The broader impacts of the project are numerous. The CI currently supports thousands of end users through its data storage, cloud, and online analytical capabilities. As a service provider, iPlant underlies a number of other online biological information resources. The project will continue its wide-ranging and successful education and outreach efforts, and will teach computational skills to learners at all levels, with particular focus on faculty to enable a sustained culture change that incorporates these advanced skills into the teaching of biology.
2014 — 2019
Stewart, Craig; Foster, Ian; Vaughn, Matthew; Merchant, Nirav; Taylor, James (co-PI)
High Performance Computing System Acquisition: Jetstream - a Self-Provisioned, Scalable Science and Engineering Cloud Environment
Jetstream will be a new type of computational research resource open to the national (nonclassified) research community - a data analysis and computational resource that US scientists and engineers will use interactively to conduct their research anytime, anywhere. Jetstream will complement current NSF-funded computational resources, bringing to the NSF ecosystem a cloud-based system that combines the best elements of commercial cloud computing with some of the best software available for solving important scientific problems. This system will enable many US researchers and engineers to make discoveries important to understanding the world around us and to improving the quality of life of American citizens.
In terms of technical details, Jetstream will be a configurable large-scale computing resource that leverages both on-demand and persistent virtual machine technology to support a much wider array of software environments and services than current NSF resources can accommodate. As a fully configurable "cloud" resource, Jetstream bridges a major gap in the current ecosystem, whose machines are targeted at large-scale high-performance computing, high-memory, large-data, high-throughput, and visualization workloads. As the open cloud for science, Jetstream will:
*Provide "self-serve" academic cloud services, enabling researchers or students to select a VM image from a published library, or alternatively to create or customize their own virtual environment for discipline- or task-specific personalized research computing.
*Host persistent VMs to provide services beyond the command line interface for science gateways and other science services. For example, Jetstream will become a primary host of the popular Galaxy scientific workbench and its main datasets, bringing many Galaxy users to the NSF ecosystem from day one.
*Enable new modes of sharing computations, data, and reproducibility.
*Expand access to the NSF XSEDE ecosystem by making virtual desktop services accessible from institutions with limited resources.
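The "self-serve" model in the first bullet can be illustrated with a short sketch: pick a published VM image, customize it, launch it. This is a hypothetical Python sketch, not the actual Jetstream (or OpenStack) API; the catalog entries and function names are invented:

```python
# A published library of VM images (hypothetical entries).
CATALOG = {
    "galaxy-workbench": {"cpus": 8, "ram_gb": 32, "software": ["galaxy"]},
    "base-ubuntu":      {"cpus": 2, "ram_gb": 4,  "software": []},
}

def customize(image_name: str, extra_software: list) -> dict:
    """Derive a personalized image spec from a published one."""
    spec = dict(CATALOG[image_name])                      # copy, don't mutate
    spec["software"] = spec["software"] + extra_software  # new list
    return spec

def launch(spec: dict) -> dict:
    """Stand-in for a cloud launch call; returns a running-VM descriptor."""
    return {"state": "running", **spec}

# A researcher customizes a base image for a task-specific environment.
vm = launch(customize("base-ubuntu", ["jupyter", "numpy"]))
```

The key design point the sketch captures is that the published library stays immutable: customization derives a new spec rather than editing the shared image.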
2015 — 2019
Peterson, Larry; Merchant, Nirav; Xu, Hao; Bavier, Andrew (co-PI); Baker, Scott (co-PI)
CC*DNI DIBBs: Give Your Data the Edge: A Scalable Data Delivery Platform
Scientific collaboration is increasingly data driven. Large volumes of data are generated, aggregated, archived, and shared - often with collaborators (and their compute resources) spread across the globe - with subsequent analysis generating still more data to archive and distribute. Scientists either become experts at data transfer, storage, and management, or their ability to build on each other's work suffers. This project addresses this challenge, with an emphasis on easing the burden on the user, building a solution that is sustainable over the long term, and delivering performance that scales in the number of collaborators.
The project's technical approach is to deploy a general-purpose storage platform, called Syndicate, that harnesses a collection of available storage components to provide a global, scalable, and secure storage service. These include public and private cloud storage (for data durability), network caches and content distribution networks (for scalable read bandwidth), and local disks (for local read/write performance). The project's goal is to make it possible for applications to access data independent of where it is stored, where Syndicate simultaneously: (1) minimizes the operational burden imposed on users, (2) maximizes the use of commodity infrastructure, and (3) maximizes aggregate I/O performance. At its core, Syndicate's value proposition is to fully decouple storage semantics from infrastructure. This lets users select infrastructure based on its cost/performance trade-off, while ensuring that their domain-specific storage requirements are met. To demonstrate this value, the project is building a pilot deployment that spans edge resources on nine campuses, and operating the capability on behalf of a diverse set of application domains, ranging from biology to network analytics to personalized medicine to retail and consumer analysis.
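Syndicate's core idea - decoupling storage semantics from infrastructure - can be sketched as a single read/write interface over interchangeable backends. The class names below are hypothetical stand-ins, not Syndicate's actual API:

```python
from abc import ABC, abstractmethod

class Backend(ABC):
    """What any storage component must provide, whatever its infrastructure."""
    @abstractmethod
    def get(self, key: str) -> bytes: ...
    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

class LocalDisk(Backend):
    """Stand-in for fast local storage (the cache tier)."""
    def __init__(self):
        self._store = {}
    def get(self, key):
        return self._store[key]
    def put(self, key, data):
        self._store[key] = data

class CloudStore(Backend):
    """Stand-in for a durable cloud archive tier."""
    def __init__(self):
        self._store = {}
    def get(self, key):
        return self._store[key]
    def put(self, key, data):
        self._store[key] = data

class Volume:
    """One namespace for the application: writes go through to the durable
    tier; reads come from the cache, falling back to durable storage."""
    def __init__(self, cache: Backend, durable: Backend):
        self.cache, self.durable = cache, durable
    def write(self, key, data):
        self.durable.put(key, data)
        self.cache.put(key, data)
    def read(self, key):
        try:
            return self.cache.get(key)
        except KeyError:
            data = self.durable.get(key)
            self.cache.put(key, data)  # repopulate cache for later reads
            return data

vol = Volume(LocalDisk(), CloudStore())
vol.write("genome.fa", b"ACGT")
```

A real deployment would back `Volume` with cloud object storage, a CDN cache tier, and local disk, as the abstract describes; the sketch keeps everything in memory to show only the decoupling of semantics from infrastructure.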
2017 — 2022
Pollock, Tresa (co-PI); Manjunath, Bangalore; Roy Chowdhury, Amit; Merchant, Nirav; Miller, Robert (co-PI)
SI2-SSI: LIMPID: Large-Scale Image Processing Infrastructure Development @ University of California-Santa Barbara
Scientific imaging is ubiquitous: from materials science, biology, neuroscience and brain connectomics, marine science and remote sensing, to medicine, much of big data science is image-centric. Currently, interpretation of images is usually performed within isolated research groups, either manually or as workflows over narrowly defined conditions with specific datasets. This LIMPID (Large-scale IMage Processing Infrastructure Development) project will have a transformative impact on such discipline-centric workflows through the creation of an extensive and unique resource for the curation, distribution, and sharing of scientific image analysis methods. The project will create an image processing marketplace for use by a diverse community of researchers, enabling them to discover, test, verify, and refine image analysis methods within a shared infrastructure. As a freely available, cloud-based resource, LIMPID will facilitate participation of underrepresented groups and minority-serving institutions, as well as international scientists, allowing them to address questions that would otherwise require expensive software. The potential impacts of the project are significant: from wide dissemination of novel processing methods, to development of automatic methods that can leverage data and human feedback from large datasets for software training and validation. For the broader scientific community, this immediately provides a resource for joint data and methods publication, with provenance control and security. This in turn will facilitate faster development and deployment of tools and foster new collaborations between computer scientists developing methods and scientific users. The project will prepare a diverse cadre of students and researchers, including women and members of under-represented groups, to tackle complex problems in an interdisciplinary environment.
Through workshops, participation at scientific meetings, and summer undergraduate research internships, a broad community of users will be engaged to actively contribute to all aspects of research, development, and training during the course of this project.
The primary goal is to create a large-scale distributed image processing infrastructure, LIMPID, through a broad, interdisciplinary collaboration of researchers in databases, image analysis, and the sciences. In order to create a resource of broad appeal, the focus will be on three types of image processing: simple detection and labelling of objects based on detection of significant features, leveraging recent advances in deep learning; semi-custom pipelines and workflows based on popular image processing tools; and fully customizable analysis routines. Popular image processing pipeline tools will be leveraged to allow users to create or customize existing pipeline workflows and easily test these on large-scale cloud infrastructure from their desktop or mobile devices. In addition, a core cloud-based platform will be created where custom image processing routines can be created, shared, modified, and executed on large-scale datasets, applying novel methods to minimize data movement. Usage test cases will be created for three specific user communities: materials science, marine science, and neuroscience. An industry-supported consortium will be established at the beginning of the project toward achieving long-term sustainability of the LIMPID infrastructure.
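The "semi-custom pipelines and workflows" idea above can be sketched as a sequence of shareable, parameterized stages. This is a toy Python sketch with stand-in stages, not real detectors or the project's actual tooling:

```python
def threshold(image, cutoff=128):
    """Binarize: 1 where a pixel value is at least the cutoff, else 0."""
    return [[1 if px >= cutoff else 0 for px in row] for row in image]

def count_objects(image, **_):
    """Toy 'labelling' stage: count foreground pixels."""
    return sum(sum(row) for row in image)

def run_pipeline(image, stages):
    """Each stage is (function, params); one stage's output feeds the next."""
    result = image
    for fn, params in stages:
        result = fn(result, **params)
    return result

# A shareable pipeline definition: the stages plus their parameters.
pipeline = [(threshold, {"cutoff": 100}), (count_objects, {})]
image = [[50, 150],
         [200, 90]]
n = run_pipeline(image, pipeline)  # two pixels reach the cutoff
```

Because a pipeline is just data (stages plus parameters), it can be published, modified, and re-run on other datasets, which is the sharing model the abstract describes.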
This project is supported by the Office of Advanced Cyberinfrastructure in the Directorate for Computer & Information Science and Engineering and the Division of Materials Research in the Directorate for Mathematical and Physical Sciences.
2018 — 2020
Darabi Sahneh, Faryad (co-PI); Kobourov, Stephen (co-PI); Merchant, Nirav; Papes, Monica (co-PI)
TRIPODS+X:VIS: Data Science Pathways for a Vibrant TRIPODS Commons at Scale
Scientists in diverse domains from astronomy and atmospheric sciences, to earth sciences and genomics are generating massive datasets at an unprecedented scale. Rapidly evolving computational and data management technologies for harnessing value from these datasets are providing the foundation for a vibrant ecosystem by establishing robust collaborations and building communities of domain scientists, data scientists, and engineers. These collaborations are central for transforming these datasets into information and knowledge. Barriers of both a technical and non-technical nature can hamper productivity for such transdisciplinary teams and collaborations, especially when highly productive teams with diverse expertise and computational backgrounds work on common problems. These barriers are often associated with frictions at the boundaries of computational technologies and human communications, especially when working at scale. Overcoming such challenges is critical for ensuring successful outcomes.
This project will bring together participants representing thought-leaders and practitioners in data-driven open science projects, TRIPODS+X project teams, and participants from the astronomy and earth sciences communities through two Innovation Labs. The first Lab will introduce participants to the national NSF-funded cyberinfrastructure and commercial cloud infrastructure, providing the opportunity to evaluate and learn from exemplary projects that have utilized these platforms for their collaborations, allowing participants to explore how their communities can extend these platforms for their data science projects in a reliable, scalable, and reproducible manner. The second Innovation Lab will establish an early prototype TRIPODS Commons, a cohesive platform for showcasing, experimenting with, and sharing research products (code, data, methods), eventually becoming an avenue that provides visibility to the vibrancy and productivity of projects occurring at all NSF TRIPODS Institutes. Through these Innovation Labs, the project will provide pragmatic approaches and pathways for establishing successful transdisciplinary collaborations that enable teams to work across domains and institutional boundaries, and at scales essential for addressing the research, education, and advanced cyberinfrastructure needs as outlined in NSF's 10 Big Ideas.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
2019 — 2021
Enquist, Brian; Merchant, Nirav
Collaborative Research: Near Term Forecasts of Global Plant Distribution, Community Structure, and Ecosystem Function
This project is the first to explore how plant species distributions across the entire globe may respond to global change. The project brings together ecologists, environmental engineers, data scientists, and conservation stakeholders to determine optimal ways to integrate these data sources to make near term forecasts for all plants globally by addressing changes in (1) species' abundance and geographic distribution, (2) community structure, and (3) ecosystem function. This three-pronged approach is designed to span a range of approaches to understand the spectrum of possible futures consistent with current knowledge while integrating knowledge across scales of biological organization. These forecasts will be used along with input from conservation stakeholders to assess how differing conservation decisions can minimize the impacts of global change responses. An ultimate goal of the project is to automate a pipeline to ingest new incoming data, update forecasts, and serve these to end-users to enable a near-real time forecasting workflow to provide best-available predictions at any given time to inform conservation decisions.
A key aspect of these forecasts is their reliance on novel environmental information that better characterizes the conditions that influence plant performance, including soil moisture and extreme weather events based on NASA satellite observations. These species-level predictions will be linked to community demography models that integrate a variety of relatively untapped data sources for understanding global change, including plant trait data, community plot data across the globe, highly detailed plot data from National Ecological Observatory Network (NEON) and Long Term Ecological Research (LTER) sites, and global biomass data from NASA's Global Ecosystem Dynamics Investigation (GEDI) mission. By integrating this wide variety of data sources, the mechanistic understanding needed to make robust near-term forecasts of ecosystem properties like net primary productivity, carbon stocks, and resilience can be achieved. Based on workshops with conservation stakeholders, researchers will determine how best to use this unique suite of forecasts to inform different conservation questions in different regions of the world. The project will also result in an open, cleaned, and curated database on global plant distributions. This will aid others in exploring data and predictions by delivering and visualizing complex future scenarios in an easy-to-use portal. All results of the project can be found at the website for the Biodiversity Informatics and Forecasting Institute (BIFI) at https://enquistlab.github.io/BIFI.
This project is part of the National Science Foundation's Harnessing the Data Revolution (HDR) Big Idea activity.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
2020 — 2025
Jacobs, Gwen (co-PI); Vaughn, Matthew; Pierce, Marlon; Merchant, Nirav; Hancock, David
Category I: Jetstream 2: Accelerating Science and Engineering On-Demand
The frontiers of science are rapidly evolving in regard to availability of data to be analyzed and the breadth and variety of analytical tools that researchers use. To effectively analyze and make sense of this ever-growing cache of information, and to make it possible to leverage new artificial intelligence tools for research, researchers need on-demand, interactive, and programmatic cyberinfrastructure (CI) delivered via the cloud. Jetstream 2 is a system that will be easy to expand and reconfigure, and capable of supporting diverse modes of on-demand access and use. The system will also revolutionize the national CI ecosystem by enabling "AI for Everyone" with virtual GPU capabilities and widespread outreach through the five partners, led by Indiana University. The project promises to enable the research community to use a greater variety of computational resources and to expand its reach into student populations, drawn from a broad range of disciplines, thus contributing to building the future STEM workforce.
Jetstream 2 will be an 8 PetaFLOPS (PFLOPS) cloud computing system using next-generation AMD "Milan" CPUs and the latest NVIDIA Tensor Core GPUs, with 18.5 petabytes (PB) of storage. Jetstream 2 will consist of five computational systems: the primary system at Indiana University and four modest regional systems deployed nationwide at Arizona State University (ASU), Cornell University, the University of Hawai'i (UH), and the Texas Advanced Computing Center (TACC). Additional partnerships with the University of Arizona, Johns Hopkins University, and the University Corporation for Atmospheric Research (UCAR) will contribute to Jetstream 2's unparalleled usability and support for a broad range of scientific efforts.
The Jetstream team has been at the forefront of training the research community to transition from batch computing methods to adopt cloud-style usage. Jetstream 2 will continue this path and will ease the transition between academic and commercial cloud computing. Some of the advanced features include push-button virtual clusters, advanced high-availability science gateways services (including commercial cloud integration), federated authentication for JupyterHubs, bare metal and virtualization within the same system through programmable CI, support for on-demand data intensive workloads in addition to on-demand computation, high-performance software-defined storage, and advanced multi-platform orchestration capabilities.
Jetstream 2 will have far-reaching societal benefits. As enhanced educational infrastructure, it will serve more students, from traditional undergraduates to domain-science experts desiring training in computational techniques, than any other NSF-funded CI resource. These students will be better equipped to fully participate in the evolving STEM workforce. In addition to enabling new research, discovery, and innovation across many disciplines, Jetstream 2 will advance the national CI ecosystem and extend the broader impacts of existing NSF investments. Jetstream 2's "Core Services" will demonstrate a practical model of distributed cloud computing that will give academic institutions an incentive to invest their own funds in new advanced CI facilities. Although modest in scale, these facilities will represent the state of the art in reconfigurable computing. The implementation of Jetstream 2 will also demonstrate that colleges and universities can invest sustainable amounts of their own funds in highly-effective, flexible CI resources that generate a significant return on investment. In sum, Jetstream 2 will transform the national CI landscape and greatly benefit the nation.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
2020 — 2021
Merchant, Nirav; O'Leary, Patrick; Maxwell, Reed; Condon, Laura; Melchior, Peter (co-PI)
NSF Convergence Accelerator - Track D: Hidden Water and Hydrologic Extremes: A Groundwater Data Platform for Machine Learning and Water Management
The NSF Convergence Accelerator supports use-inspired, team-based, multidisciplinary efforts that address challenges of national importance and will produce deliverables of value to society in the near future. The broader impact and potential societal benefit of this Convergence Accelerator Phase I project is to utilize artificial intelligence methods such as machine learning (ML) to achieve better water management outcomes that directly benefit society by developing the ability to better plan for and manage extreme events through improved hydrologic forecasting. HydroFrame-ML is motivated by, and structured around, applied solutions for water management planning and decision making. Extreme events like drought and floods have far-reaching societal impacts. They are common, costly and likely to get worse in the future. The project team is partnered with the Bureau of Reclamation, which is the largest wholesale water provider in the country, providing water to more than 31 million people and 10 million acres of farmland. The Bureau of Reclamation will drive use case design and the metrics used to evaluate success in Phase 1, as well as partner in the expansion of the project team for Phase 2. Additionally, the project team will develop hands-on activities and challenges designed to give undergraduates experience in machine learning and data science, in the context of pressing real-world challenges. Aided by the planned addition of a STEM mentorship program partner in Phase 2, the team will build content with the vision of helping to broaden participation of underrepresented students well beyond the timeframe of this project.
The proposed project brings together the most physically rigorous national-scale groundwater simulations, developed through HydroFrame, with national leaders in Earth systems modeling and water management. By providing end-to-end workflows combining state-of-the-art groundwater science with operational management tools, HydroFrame-ML will advance both large-scale water management and our understanding of how human operations and groundwater interact in extreme events. The project's products will provide innovative ways to improve forecasts and, in the process, will expand our knowledge about (1) the contributions of groundwater to extreme events in managed systems; (2) biases in our current risk-assessment approaches, which do not consider groundwater; and (3) the potential to improve long-term sustainability by more actively managing groundwater and accounting for groundwater-surface water interactions in projections.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
2021 — 2023
Merchant, Nirav; Condon, Laura; Melchior, Peter (co-PI); Maxwell, Reed
Track D: Hidden Water and Extreme Events: HydroGEN, a Physically Rigorous Machine Learning Platform For Hydrologic Scenario Generation
Water is the driving force behind extreme events like floods, droughts and wildfires. These events have cost the US $234.3B in damages just in the past three years, and this figure is projected to increase. Recent events like the record setting wildfires in California and the mega drought on the Colorado river are merely the latest illustrations. Historical data are no longer a reliable guide for the risks we will face in the future. This project addresses the uncertainty that poses a huge challenge for decision makers.
HydroGEN is a web-based machine learning (ML) platform that generates custom hydrologic scenarios on demand. It combines powerful physics-based simulations with ML and observations to provide customizable scenarios from the bedrock through the treetops. Without any prior modeling experience, water managers and planners can directly manipulate state-of-the-art tools to explore scenarios that matter to them.
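The combination of physics-based simulation with ML described above can be sketched in miniature: train a cheap emulator on simulation output, then query it for on-demand scenarios. This is a toy Python sketch with made-up numbers and a deliberately simple linear model, not the project's actual models or data:

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b: the 'emulator' training step."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# Pretend these pairs came from expensive physics-based simulation runs:
# (precipitation anomaly, simulated streamflow), in arbitrary units.
sim_x = [0.0, 1.0, 2.0, 3.0]
sim_y = [1.0, 3.0, 5.0, 7.0]

a, b = fit_linear(sim_x, sim_y)

def scenario(precip_anomaly):
    """On-demand scenario query: no new physics run required."""
    return a * precip_anomaly + b

flow = scenario(4.0)  # streamflow for a condition never simulated directly
```

The design point is the division of labor: the expensive physics model is run offline to generate training data, and the trained emulator then answers interactive "what if" queries fast enough for a web platform.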
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.