1999 — 2001 |
Plale, Beth |
Powre: Applying Database Techniques to Management of Large Data Flows in Scientific Applications @ Georgia Tech Research Corporation
EIA-9973834 Plale, Beth Georgia Institute of Technology
CISE/POWRE: Applying Database Techniques to Management of Large Data Flows in Scientific Applications
This project supports the PI's research activities as a postdoctoral researcher at Georgia Institute of Technology. To address the problem of large data flows and incompatibility between components, the PI proposes a general middleware scheme called active streams: an approach for controlling data flow by injecting computational components directly into a data flow. A computational component implements one or more event-action rules, in the style of active databases, where a set of events triggers a rule action. In summary, the goal of the project is an architecture, abstractions, and algorithms for middleware components that are efficient in internal query evaluation and responsive to changes in the environment.
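The event-action rule style described above can be sketched in a few lines. The following is a minimal, hypothetical illustration, assuming a simple `Rule` class and `process_stream` driver; these names are illustrative and not taken from the actual active-streams middleware:

```python
# Sketch of event-action rules injected into a data flow, in the style of
# active databases: a rule fires its action when its event predicate matches
# an item passing through the stream.

class Rule:
    def __init__(self, predicate, action):
        self.predicate = predicate  # event test: item -> bool
        self.action = action        # transformation applied when the rule fires

def process_stream(stream, rules):
    """Push each data item through the rule set; fired actions may
    transform the item before it continues downstream."""
    for item in stream:
        for rule in rules:
            if rule.predicate(item):
                item = rule.action(item)
        if item is not None:
            yield item

# Example: clamp outlier readings above a threshold.
rules = [Rule(lambda x: x is not None and x > 100, lambda x: 100)]
print(list(process_stream([5, 250, 42], rules)))  # -> [5, 100, 42]
```

The point of the sketch is the placement of computation: the rule evaluation happens inside the flow itself rather than at the endpoints.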
|
0.907 |
2001 — 2005 |
Plale, Beth |
Itr/Sy Collaborative Research: a Unified Relational Approach to Grid Information Services
An application running in a distributed computing environment such as the Computational Grid must adapt to the available hardware and software resources. This requires information about the properties of Grid resources such as hosts, network switches, links and paths, software libraries and systems, user and organization rights, software services, event channels and dictionaries, and more. The information needed for an application to run, the volatility of that information (how fast the information changes), and the required freshness of the information (how fast updates must be pushed to the application) can vary dramatically. These attributes place significant demands on the resource information service, demands that are arising with increasing prevalence in the general area of directory services as well. The Grid Forum, an international standards body for world-wide Grid computing, is developing standards for representing and querying this information. There is much that is excellent about these evolving standards, but there are many forms of highly desirable queries that will be difficult or expensive to perform in these systems. In particular, dynamic information will require very high update rates not supported by LDAP-based implementations.
This project will address these concerns through a proposed (and tentatively named) Grid Resource Information Service (GRIS), a unified relational approach to grid information services. The research will start with the full ACID (Atomicity/Consistency/Isolation/Durability) functionality of a relational database system and "build down" to a practical resource information system that still provides most of the benefits of the RDBMS. Such a system will provide a single highly flexible query model and language for all types of Grid resource information, no matter how dynamic. The research will culminate in an extensible implementation based on commodity database systems and the SQL language, including "canned queries" for non-SQL users. The project will evaluate the new system and techniques using logged updates and queries from an existing Grid information service, and will compare results with a hierarchical system such as Globus MDS2. To facilitate comparisons, the project will produce a set of benchmark queries from discussions with users, tool developers, and Grid Forum members, and will quantify the limits of these queries.
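The unified relational idea can be illustrated with a tiny example, using SQLite as a stand-in commodity RDBMS. The table layout, column names, and the "canned query" below are invented for illustration; they are not the project's actual schema:

```python
# Grid resource properties stored in ordinary relational tables and
# retrieved with plain SQL; a parameterized "canned query" spares
# non-SQL users from writing queries themselves.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE host (
    name TEXT, os TEXT, cpu_count INTEGER, load REAL)""")
conn.executemany(
    "INSERT INTO host VALUES (?, ?, ?, ?)",
    [("node01", "linux", 16, 0.3),
     ("node02", "linux", 8, 0.9),
     ("node03", "solaris", 4, 0.1)])

def idle_linux_hosts(conn, max_load):
    """Canned query: Linux hosts below a load threshold, least-loaded first."""
    cur = conn.execute(
        "SELECT name FROM host WHERE os = 'linux' AND load < ? ORDER BY load",
        (max_load,))
    return [row[0] for row in cur]

print(idle_linux_hosts(conn, 0.5))  # -> ['node01']
```

The flexibility argued for above falls out for free: any attribute combination is queryable without changing the information service, which is exactly what hierarchical LDAP-style schemas make difficult.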
|
1 |
2002 — 2008 |
Fox, Geoffrey (co-PI) [⬀] Bramley, Randall (co-PI) [⬀] Lumsdaine, Andrew (co-PI) [⬀] Wise, David [⬀] Plale, Beth |
Cise Research Infrastructure: a Research Infrastructure For Collaborative, High-Performance Grid Applications
0202048 Wise, David S. Indiana University - Bloomington
RI: A Research Infrastructure for Collaborative, High-Performance Grid Applications
This project, developing an experimental infrastructure for distributed high performance computing, supports ten research projects extending the location-transparency that the Grid provides for computation resources to the full spectrum of activities which end-users require. Services being explored include software development, parallel code middleware, distributed software components for scientific computing, security for parallel remote method invocation, managing large-scale data streams, and collaboration methodologies. The research builds on and extends the institution's collaborations with several national Grid research teams. In contrast to existing national and university infrastructure available through production machines, this research requires an environment tolerant of experimental network protocols, temporary middleware, and other system-level changes. The infrastructure will contribute to the following research projects:
a. Opie: basic work on parallel matrix algorithms that achieve high efficiency across many architectural platforms
b. LAM: middleware MPI implementations supporting hierarchical and fault-tolerant parallel computing
c. dQUOB: application of SQL queries to live data streams
d. RMI Security: basic research into security mechanisms for remote method invocation, allowing security to be traded off with efficiency
e. HPJ: High Performance Java, creating a language platform for portable high performance coding
f. Grid Broker: reliable, robust publish/subscribe service for introducing fault tolerance into the distributed Grid environment
g. Community Grids Collaboratory: advanced collaboration capabilities with applications to both distance education and distributed communities
h. Xports: design of methodologies for remote instrument access and data management of the resulting extremely large data sets
i. Software Components: distributed software component model designed for applications that use parallel computing "nodes" in wide-area Grid environments
j. Science Portals: set of tools that allow programmers to build Grid distributed applications accessed and controlled from desktop environments and web browsers
Major improvements to infrastructure supporting all these projects include a 16-node cycle server and a large-scale file server as well as network upgrades to and within the building.
|
1 |
2003 — 2009 |
Marru, Suresh Plale, Beth Gannon, Dennis (co-PI) [⬀] |
Information Technology Research (Itr): Linked Environments For Atmospheric Discovery (Lead)
Each year across the United States, floods, tornadoes, hail, strong winds, lightning, and winter storms cause hundreds of deaths and result in annual economic losses of more than $13B. Their mitigation is stifled by rigid information technology frameworks that cannot accommodate the unique real time, on-demand, and dynamically-adaptive needs of weather research.
Linked Environments for Atmospheric Discovery (LEAD), the foundation of which is a series of interconnected virtual "Grid environments," allows scientists and students to access, prepare, predict, manage, analyze, and visualize a broad array of meteorological information independent of format and physical location. A transforming element of LEAD is the ability for analysis tools, forecast models, and data repositories to function as dynamically adaptive, on-demand systems that can change configuration rapidly and automatically in response to the evolving weather; respond immediately to user decisions based upon the weather problem at hand; and steer remote observing systems to optimize data collection and forecast/warning quality.
LEAD will allow researchers, educators, and students to run atmospheric models and other tools in much more realistic, real time settings than is now possible, hasten the transition of research results to operations, and bring the pedagogical benefits of sophisticated atmospheric science tools into high school classrooms for the first time. Its capabilities will be integrated into dozens of universities and operational research centers that collectively reach 21,000 university students, 1800 faculty, and hundreds of operational practitioners.
|
1 |
2003 — 2007 |
Pierce, Marlon Gannon, Dennis (co-PI) [⬀] Fox, Geoffrey (co-PI) [⬀] Plale, Beth |
Nmi: Collaborative Proposal: Middleware For Grid Portal Development
This proposal is designed to facilitate grid portal development using the portlet/container approach to building portals. This approach separates portal control and basic services from content. A central control server provides basic portal services such as authentication, access control, and user customizability. Into this framework, portal content and custom services are plugged in using software components called portlets. The container manages the organization and interaction of the portlets, and the portlets deliver specific web content (either local or remote), including Grid service interfaces.
The portlet-based design concept supports distributed, loosely coupled development and deployment: user interfaces and science interface components can be developed independently, using the standard portlet API, and then reused between portals. Services and interfaces may be installed and added to various portals in a well-defined way. The portlet model is also an ideal fit to the emerging Open Grid Services Architecture (OGSA) and its implementation specification, the Open Grid Service Infrastructure (OGSI). Because OGSI is based on the new web-service standards, each Grid service can be directly accessed by a custom portlet.
The impact of making the Grid readily approachable by the international community of researchers is potentially extremely large, as the immense resources that have been collected and organized in recent years by the underlying Grid technologies become visible as usable components of the global research community's desktop. This project will greatly simplify the use of Grid technologies and allow new services to be made readily available to individual researchers and groups by enabling the proliferation of Grid portal technology through reusability and simplification of installation. Scientists will be able to easily form flexible groups with collaborators across the world and use the Grid to share data and resources. This project provides tools for collaboration between established and ad hoc groups of users, enabling those scientists to communicate effectively with each other about the science they are doing, and providing customized views of the Grid that are tailored to meet the needs of collaborating groups.
|
1 |
2005 — 2008 |
Stewart, Craig [⬀] Pilachowski, Catherine (co-PI) [⬀] Bramley, Randall (co-PI) [⬀] Plale, Beth Simms, Stephen Hacker, Thomas |
Mri: Acquisition of a High-Speed, High Capacity Storage System to Support Scientific Computing: the Data Capacitor
This project, creating a Data Capacitor and a Metadata/Web Services server, addresses two clear and widespread challenges: the need to store and manipulate large amounts of data for short periods of time (hours to several days), and the need for reliable and unambiguous publication, discovery, and utilization of data via the Web.
The Data Capacitor, a 250 Terabyte short-term data store with very fast I/O, and the Metadata/Web Services server, a robust server, enable the institution and collaborators to adopt and depend upon Web services for the exchange of research data. Research and development efforts at IU will create the tools required for the Data Capacitor to be used to its fullest. Progress and research possibilities in many disciplines have been fundamentally changed by the abundance of data now so rapidly produced by advanced digital instruments. Scientists face the present challenge of drawing out from these data the information and meaning contained within. IU has established a significant cyberinfrastructure composed of high performance computing systems, archival storage systems, and advanced visualization systems spanning two main campuses in Indianapolis and Bloomington, and connected to national and international networks. This institution enhances its infrastructure in ways that will result in qualitative changes in the research capabilities and discovery opportunities of a broad array of scientists who work with large data sets. The Data Capacitor is expected to become a development platform and testbed for new cyberinfrastructure, as well as a proof of concept for large capacity, short-term storage devices. The Metadata/Web Services server, in turn, enables the institution to establish a leadership position in standards-based data dissemination in many fields.
Broader Impact: The Data Capacitor enhances current practice in relevant scientific communities, enables technology transfer and commercialization, develops a 21st century workforce, and ensures public understanding of the value of science. Deliberate use of objective metrics in all areas of broader impact ensures that new discoveries, technology development, educational activities, and public information efforts translate into benefit for the scientific community and society as a whole. Women and underrepresented groups will be drawn into computing-intensive sciences and applications of computing.
|
1 |
2006 — 2007 |
Gannon, Dennis (co-PI) [⬀] Plale, Beth |
Collaborative Research: Science of Search: Data Search, Analytics, and Architectures Center (Dsaac)
A planning meeting will be held to determine the organization and viability of forming a new multi-university Industry/University Cooperative Research Center (I/UCRC) for Data Search, Analytics, and Architectures, with Indiana University as the lead research site and Florida International University as a research site. The Center will focus on an area of technical and economic importance. It will study the representation, management, storage, and analysis of large multi-modal data. Managing large complex data sets and analyzing them is a problem common to many industries. The proposed center should benefit significantly from the resources available at the two institutions, including unique and extensive facilities funded by NSF on Emerging Techniques for Advanced Information Processing at Florida International University.
|
1 |
2007 — 2011 |
Simmhan, Yogesh (co-PI) [⬀] Leake, David (co-PI) [⬀] Plale, Beth |
Sdci Data: New Toolkit For Provenance Collection, Publishing, and Experience Reuse
OCI - SDCI Data: New Toolkit for Provenance Collection, Publishing, and Experience Reuse
As research digital data collections created through computational science experiments proliferate, it becomes increasingly important to address the provenance issues of data validity and quality: to record and manage information about where each data object originated, the processes applied to the data products, and by whom. The first outcome of this work is a provenance collection and experience reuse tool that makes minimal assumptions about the software environment and imposes minimal burden on the application writer. It stores and produces results in a form suitable for publication to a digital library. The provenance collection system is a standalone system that is easy to integrate into an application framework and exhibits good performance.
A second outcome of the work is a recommender system for workflow completion that employs case-based reasoning to provenance collections in order to make suggestions to users about future workflow-driven investigations. The workflow completion tool builds on computer models of case-based reasoning to develop a support system that leverages the collective experience of the users of the provenance system to provide suggestions. As a key part of effectively evaluating aspects of the tool, this work builds a gigabyte benchmark database of real and synthetic provenance information. Real workflows are sought from the community, with synthetic extensions to the data set for completeness for purposes of testing. The software and database are available to the research community.
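The "minimal burden" style of provenance collection described above can be sketched as a decorator that records lineage as a side effect of normal execution. The record format and names below are purely illustrative, not the toolkit's actual schema:

```python
# Provenance captured transparently: wrapping a processing step records
# which inputs produced which output, without changing the step's code.
import functools
import time

PROVENANCE = []  # stand-in for a provenance store or digital-library feed

def provenance(step_name):
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args):
            out = fn(*args)
            PROVENANCE.append({
                "step": step_name,     # which process was applied
                "inputs": list(args),  # where the data object originated
                "output": out,
                "time": time.time(),
            })
            return out
        return inner
    return wrap

@provenance("normalize")
def normalize(x, scale):
    return x / scale

normalize(10, 2)
print([r["step"] for r in PROVENANCE])  # -> ['normalize']
```

Accumulated records like these are also the raw material the case-based recommender would reason over: each record is, in effect, one "case" of a workflow step and its context.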
|
1 |
2007 — 2010 |
Plale, Beth Gannon, Dennis (co-PI) [⬀] |
Csr---Csi. An Adaptive Programming Framework For Data and Event Driven Computation
Data-driven applications in computational science react in real time to their environment in a complex detect-analyze-respond cycle. These computations can often be viewed as complex data flow graphs having components that are both data- and computationally intensive, and requiring access to live data feeds and large-scale computational resources. A user may cycle through multiple graphs accessing data from sensors, instruments, databases, and large collections of files in the process of discovering new knowledge. This research investigates a programming model and framework for knowledge discovery in data-driven applications. Users program the system by declarative specification of detect-analyze-respond behavior. Underlying the programming model is a continuous rule-based events processor and workflow orchestration engine organized as Web services. The research formalizes an abstract model of interaction and will map the higher-level conceptualization to the events processing and workflow runtime components. It demonstrates that the model supports a unique adaptive framework where knowledge gained from the computational and data analysis can be fed back to the data event streams. The approach is validated experimentally through quantifiable metrics and by its application to two model problems: severe storm prediction, where a weather forecast is triggered based on results from mining radar or model data, and adaptive resource management, where hardware and software resources and environment data streams are monitored for on-the-fly resource requirements prediction.
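The detect-analyze-respond cycle with feedback can be sketched as a small event loop. Everything here, including the field names and the severe-weather flavor of the example, is a hypothetical stand-in for the actual rule engine and workflow service:

```python
# Continuous rule evaluation over an event stream: when a declarative
# condition holds (detect), a stand-in workflow runs (analyze/respond),
# and its result is fed back into the stream, closing the adaptive loop.
from collections import deque

def run_cycle(events, threshold):
    queue = deque(events)
    triggered = []
    while queue:
        e = queue.popleft()
        if e.get("reflectivity", 0) > threshold:   # detect: rule condition
            triggered.append({"type": "forecast",  # respond: stand-in for
                              "region": e["region"]})  # launching a workflow
            queue.append({"reflectivity": 0,       # feedback event derived
                          "region": e["region"]})  # from the analysis result
    return triggered

events = [{"reflectivity": 55, "region": "KS"},
          {"reflectivity": 10, "region": "IN"}]
print(run_cycle(events, 40))  # -> [{'type': 'forecast', 'region': 'KS'}]
```

The queue makes the feedback path explicit: responses re-enter the same stream the rules watch, which is the adaptivity claim in the paragraph above.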
|
1 |
2008 — 2010 |
Stewart, Craig (co-PI) [⬀] Brown, Geoffrey (co-PI) [⬀] Plale, Beth Gannon, Dennis (co-PI) [⬀] Wheeler, Bradley [⬀] |
Cyberinfrastructure Software Sustainability and Reusability Workshop
National Science Foundation Office of Cyberinfrastructure
Proposal #0829462; PI: Bradley Wheeler; Institution: Indiana University; Title: "Cyberinfrastructure Software Sustainability and Reusability Workshop"
Project Summary
This workshop proposal targets an examination of the sustainability and reusability of software developed, supported, and used by the NSF community. Specifically, workshop goals include: examination of current software evaluation and adoption models by labs and virtual organizations; examination of long-term sustainability models; and mechanisms for supporting sustainability via funding organizations, open source, and commercialization. White papers on these topics and others will be solicited from the community in advance of the workshop. Results from the workshop will be documented, as well as recommendations to NSF. Intellectual merit is identified as the exploration of this topic and a resulting deeper understanding of how we as a country of scientists and educators deal with sustaining community-sourced software over the long term. Broader impact is multi-dimensional: in addition to the potential transformative nature of resulting actions and strategies by both the community and funding agencies, the proposal will make explicit funds available for HBCU and MSI participation.
|
1 |
2010 — 2013 |
Brown, Geoffrey (co-PI) [⬀] Plale, Beth |
Iii: Small: Assisted Emulation For Digital Preservation
For the past 20 years, CD-ROMs have been the primary media for distributing key economic, scientific, environmental, and societal data as well as educational and scholarly work. More than 150,000 titles have been published, including thousands distributed by the United States and other governments. Yet no viable strategy has been developed to ensure that these materials will be accessible to future generations of scholars. In the short term, these materials are subject to physical degradation which will make them ultimately unreadable and, in the long term, technological obsolescence will make their contents unusable. This project will develop practical techniques using off-the-shelf emulators with virtualization software to ensure long-term viability of CD-ROM materials. Although emulation has been widely discussed as a preservation strategy, it suffers from a fundamental flaw, since future users are unlikely to be familiar with legacy software environments and will find such software increasingly difficult to use. Furthermore, the user communities of many such materials are sparse and distributed, thus any necessary technical knowledge is unlikely to be available to library patrons. The key objective of this project is to develop the technology and processes necessary to mitigate these flaws and to enable large-scale deployment of emulation by libraries and archives.
This project will develop automation technologies to capture the technical knowledge necessary to install and perform common actions with legacy CD-ROM materials in the form of scripts for performing on-the-fly customization of "generic" emulation environments. The long-term vision is to support a distributed CD-ROM collection, developed by a community of libraries, which enables client workstations to access preserved CD-ROM images through customized emulation environments. The project will explore the costs of developing the scripts necessary to automate the use of specific CD-ROMs and the technologies necessary to enable libraries to pool their resources to create a distributed network of preserved CD-ROM materials.
The project is structured as a two-year pilot study that will develop automation tools, apply these tools to a large (several thousand) representative set of CD-ROM materials, evaluate the performance of this approach in a distributed environment, disseminate the tools and scripts as software artifacts, and provide statistics for planning the large-scale preservation of CD-ROM materials. The research performed in this proposal will enable libraries and archives to solve a growing problem while reducing the resources required to maintain their collections of removable media. This project provides a foundation for libraries and archives to pool their intellectual resources by providing access to virtual media collections accessed through shared emulators using community-generated scripts. The materials whose preservation will be enabled by this project include key scientific and societal data published by the United States and other governments as well as cultural and educational materials from many sources. This project will have a significant impact on undergraduate science education by direct mentoring of undergraduate research assistants and providing the opportunity for their involvement in writing and presenting scholarly works.
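The on-the-fly customization idea can be sketched as a catalog of per-title script records expanded into concrete emulator invocations. The QEMU flags shown are real options, but the catalog format, title, and image names are invented for illustration:

```python
# "Assisted emulation" sketch: a per-title record captures the technical
# knowledge (emulator, guest OS image, memory) needed to open a CD-ROM
# image, so a patron never configures the legacy environment by hand.

def emulator_command(catalog, title, iso_path):
    entry = catalog[title]
    return ["qemu-system-i386",
            "-hda", entry["os_image"],  # preserved legacy OS environment
            "-cdrom", iso_path,         # the CD-ROM image being accessed
            "-m", str(entry["ram_mb"])]

# Hypothetical catalog entry for one preserved title.
catalog = {
    "1990 Census Summary": {"os_image": "win31.qcow2", "ram_mb": 32},
}
cmd = emulator_command(catalog, "1990 Census Summary", "census.iso")
print(" ".join(cmd))
```

In the distributed-collection vision, records like `catalog` would be community-authored and shared among libraries, while the ISO images live in the pooled collection.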
|
1 |
2010 — 2012 |
Evans, Tom Plale, Beth Ostrom, Elinor (co-PI) [⬀] |
Eager: in-Situ Archiving of Digital Scientific Data
The science community has increasingly employed multi-method approaches to scientific exploration with an increasing reliance on computational methods. This is particularly the case with the science of climate and global environmental change. With this evolution has come the fundamental importance of data storage and archiving. There is an open problem in archiving digital science data that affects many fundamental science initiatives.
We propose a data archival repository, compliant with the Open Archival Information System (OAIS, 2009) standard, that lives early in the scientific research pipeline, supporting the ingest and access mechanisms that users have become accustomed to and have staff to support, while simultaneously providing support for curation and preservation of data, and making relational databases, and eventually other databases, more usable in real time by researchers and policy makers. The testbed for this approach is the International Forestry Resources and Institutions (IFRI) database, the most complete data archive of how communities develop strategies for sustainable forest management. The IFRI scientific user community conducts field visits every five years to over 250 diverse sites in 11 countries.
The proposed repository conceptually wraps the original database into a unit that also contains a metadata catalog and provenance collection tool with interaction and replication guided by the OAIS standard. A fundamental research question in this effort is the data model that maps a database schema to an object model which abstracts scientific intent. The abstraction of scientific intent is grounded in a general conceptual model for reasoning about the life cycle of social-ecological systems and their interactions and outcomes. We thus expect to generalize the tools and data model that provide the map from a database to the science-oriented conceptual model expressed as an ontology.
The International Forestry Resources and Institutions network includes twelve Collaborating Research Centers in ten countries on four continents. The early research conducted in this project will form a foundation for outreach through IFRI that could have broad potential for science and policy impacts worldwide. The proposal funds a computer science graduate student and a postdoctoral fellow who will be engaged in interdisciplinary research in an area of emerging importance for generations to come.
A critical component of the long-term success of the ideas in this proposal will be getting the word out. We will therefore seek to present talks about these tools and to approach long-term digital data collection projects, particularly ones focused on environmental monitoring such as LTER and OOI.
|
1 |
2011 — 2012 |
Jensen, Scott (co-PI) [⬀] Plale, Beth |
Coming Together Around Data, a Pi Project Meeting For Datanet and Interop
This is a proposal for a Principal Investigators meeting for the DataNet and INTEROP programs. Often, PI communities that evolve within two related yet distinct programs can become partitioned in both their thinking and their interactions along strictly programmatic boundaries. The meeting proposes to use the unifying theme of the participants' common passion for data to transcend these boundaries and create new collaborative networks to better explore the challenges in data management, data preservation, and interoperability. Meetings with this unconventional venue provide open and creative conversations to expose and accentuate the group's collective knowledge, allow for a sharing of ideas and insights, and let participants gain a deeper understanding of the challenging issues involved.
|
1 |
2011 — 2017 |
Kumar, Praveen Plale, Beth Myers, James Alter, George Hedstrom, Margaret [⬀] |
Datanet Full Proposal: Sustainable Environment Through Actionable Data (Sead) @ University of Michigan Ann Arbor
Abstract: Award Number 0940824; Title: DataNet Full Proposal: Sustainable Environment through Actionable Data (SEAD)
The universities of Michigan, Indiana, and Illinois propose a DataNet partnership called Sustainable Environment through Actionable Data (SEAD). SEAD will enable new modalities of sustainability science - the study of dynamic interactions between nature and society. Advancing the science of sustainability requires integration of social science, natural science, and environmental data at multiple spatial and temporal scales that is rich in local and location-specific observations; referenced for regional, national, and global comparability and scale; and integrated to enable end users to detect interactions among multiple phenomena. SEAD will respond to the expressed needs of sustainability science researchers for long-term management of heterogeneous data by developing new capabilities for data integration, dissemination, and long-term preservation. SEAD will provide researchers with tools for active curation and use social networking to engage data producers and users in community curation, gradually shifting curatorial and collection development responsibilities from professional curators to the producer and user communities. Our focus is on the "long tail" of social and environmental data: derived data products, data collections from individual PIs and small group investigations, and data sets of local, regional, or topical significance that are critical to sustainability science but are of limited value until they can be referenced geo-spatially and temporally, combined with related data and observations, and modeled consistently. SEAD will make data accessible to diverse users, including domain scientists, local, national and international policy makers, manufacturers of sustainable technologies, citizen scientists, and informed consumers.
SEAD will take advantage of existing robust digital library and institutional repository (IR) infrastructures at the three universities for access, storage, and preservation to ensure wide accessibility of data, linkages between data and scientific publications, and persistence.
SEAD will serve researchers efficiently and in a financially sustainable way via active curation, make innovative use of social networking, integrate data with existing digital library infrastructures, and provide synthesis services that significantly increase the research and societal value of data. Our work will establish a new active curation paradigm that can be readily integrated into the scientific workflow and that leverages social networking technologies to engage the science community in data curation. Our research program will produce novel solutions to the synthesis of heterogeneous data across different levels of spatio-temporal granularity and scope; management of logical contexts and data models; appropriate sharing of data with privacy and proprietary restrictions; and preservation through emulation and migration-based-technologies and policies for distributed stewardship. Our cyberinfrastructure development work will support a network of repositories that functions on several levels: locally through integration of SEAD data into campus digital library/repository infrastructures, inter-institutionally through a model for distributed data curation and storage, and nationally and internationally by extending our approach to other IRs, other DataNet Partners, sensor and observational networks, and topical data archives. Our financial sustainability plan will identify appropriate incentive mechanisms and business models based on a tight coupling of preservation and access services with research library managed IR infrastructure and ongoing involvement of scientists and users.
SEAD will build national and global capabilities for science-informed sustainability policy and planning in land use, natural resource management, agriculture, energy, economic development, "green" manufacturing, and related areas where critical decisions will be made in the next decade. The project will engage the community that preserves and shares scientific data, thus enhancing the public investment in scientific research and making taxpayer-funded data widely available and easier to use, while providing high-value, cost-effective curation and preservation capabilities through partnerships with other "small science" domains.
|
0.937 |
2012 — 2014 |
Plale, Beth |
Collaborative Research: SI2-SSE: Pipeline Framework for Ensemble Runs on Clouds
Cloud computing is an attractive computational resource for e-Science because of the ease with which cores can be accessed on demand, and because the virtual machine implementation that underlies cloud computing reduces the cost of porting a numeric or analysis code to a new platform. However, it is difficult to use cloud computing resources for large-scale, high-throughput ensemble jobs. Additionally, the computationally oriented researcher is increasingly encouraged to make data sets available to the broader community. To achieve the latter, capture tools that harvest metadata and provenance during experimentation reduce the manual burden of marking up results. Better automatic capture of metadata and provenance is the only means by which the sharing of scientific data can scale to meet the burgeoning volume of data.
This project develops a pipeline framework for running ensemble simulations on the cloud; the framework has two key components: ensemble deployment and metadata harvest. Regarding the former, commercial cloud platforms typically allow far fewer jobs to be started at any one time than an ensemble requires. An ensemble run therefore needs to be pipelined to a cloud resource, that is, executed in well-controlled batches over a period of time. We will use platform features of Azure and employ machine learning techniques to continuously refine the pipeline submission strategy and workflow strategies for ensemble parameter specification, pipelined deployment, and metadata capture. Regarding the latter, we expect to reduce the burden of sharing scientific datasets resulting from the use of cloud resources through automatic metadata and provenance capture and representation that aligns the metadata with emerging best practices in data sharing and discovery. Ensemble simulations result in complex data sets whose reuse could be increased by expressive granule- and collection-level metadata, including the lineage of the resulting products, which contributes to trust.
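The pipelined, batch-wise deployment described above can be sketched as a simple concurrency-capped submission loop. This is an illustrative sketch only: the `submit`/`poll` callbacks, the `max_concurrent` quota, and the function name are hypothetical stand-ins, not the project's actual Azure interfaces.

```python
import time
from collections import deque

def run_ensemble_pipelined(params, submit, poll, max_concurrent=20, wait=1.0):
    """Run an ensemble as a pipeline of well-controlled batches.

    params:         one parameter set per ensemble member
    submit(p):      starts a cloud job for parameter set p, returns a job id
    poll(job_id):   returns True once that job has finished
    max_concurrent: cap on in-flight jobs (a stand-in for a cloud quota)
    """
    pending = deque(params)
    active = {}   # job id -> parameter set
    finished = []
    while pending or active:
        # Top up the in-flight batch whenever slots free up.
        while pending and len(active) < max_concurrent:
            p = pending.popleft()
            active[submit(p)] = p
        # Harvest any jobs that have completed.
        done = [jid for jid in active if poll(jid)]
        for jid in done:
            finished.append(active.pop(jid))
        if not done and active:
            time.sleep(wait)  # back off before polling again
    return finished
```

A learned submission strategy of the kind the project describes would adjust `max_concurrent` and the polling interval over time rather than fixing them as constants.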
In this project we focus on a compelling and timely application from climate research: One of the more immediate and dangerous impacts of climate change could be a change in the strength of storms that form over the oceans. In addition, as sea level rises due to global warming and melting of the polar ice caps, coastal communities will become increasingly vulnerable to storm surge. There have already been indications that even modest changes in ocean surface temperature can have a disproportionate effect on hurricane strength and the damage inflicted by these storms. In an effort to understand these impacts, modelers turn to predictions generated by hydrodynamic coastal ocean models such as the Sea, Lake and Overland Surges from Hurricanes (SLOSH) model. The proposed research advances the knowledge and understanding of probabilistic storm surge products by enhancements to the SLOSH model itself and through mechanisms that take advantage of commercial cloud resources. This knowledge is expected to have application in research, the classroom, and in operational settings.
The broader significance of the project is several-fold. Cloud computing is an important economic driver but it remains difficult for use in computationally driven scientific research. This project lowers the barriers to conducting e-Science research that utilizes cloud resources, specifically Azure. It will contribute tools to help researchers share, preserve, and publicize the scientific data sets that result from their research. Because we focus on and improve an application that predicts storm surge in response to sea level changes and severe storms, our work contributes to societal responses and adaptations to climate change, including planning and building the sustainable, hazard-resilient coastal communities of the future.
|
1 |
2012 — 2013 |
Plale, Beth |
A Data Consortium: Coming Together Around Data
If properly realized, the data deluge will be a catalyst for new scientific discovery that fuels advances in grand challenge questions such as climate and social-ecological interactions. Federal agencies such as the National Science Foundation have invested very successfully in repositories, infrastructure, and tools for data-intensive science. Investing in data solutions, however, is not the same as investing in high performance computing resources: unlike general purpose compute facilities, where the facility can be separated from its use, data is difficult to separate from its semantics, so general purpose solutions address only part of the problem, and a small part at that. Recognizing the opportunities that could be realized through stronger integrated efforts, NSF is encouraging a path towards coordinated efforts that result in satisfying the needs of a broader constituency that strives for interoperability, harmonization of concepts, protocols, and standards nationally and internationally. This proposal is a small but fundamental next step towards building an organization with lasting and significant impact on the broader community engaged in 4th paradigm research and education.
|
1 |
2014 — 2019 |
Evans, Tom Plale, Beth Attari, Shahzeen |
WSC-Category 2 Collaborative: Impacts of Agricultural Decision Making and Adaptive Management on Food Security
Despite significant attention from governments, donor agencies, and NGOs, food security remains an unresolved challenge in the context of global human welfare. Both technical and conceptual limits have prevented the collection and analysis of rich empirical datasets with high temporal frequency over large spatial extents necessary to investigate how changes to seasonal precipitation patterns are affecting food security. This research project will transform both methodological and conceptual frameworks for assessing the sustainability of dryland agricultural systems. The research will bring new understanding of how dryland farmers adapt to within-season variability in climate and how those adaptations affect their current and future resilience to climate variability and climate change. Project findings will improve forecast models used to monitor and predict the sustainability of water-dependent agricultural systems. By marrying the simple idea of cell phone adoption with state-of-the-art research in data science, crop prediction, and environmental/social monitoring, the project will advance and accelerate scientific understanding of an important global sustainability problem.
This project will focus on characterizing the nature and impact of intra-seasonal smallholder decision making on adaptation to climate variability in semi-arid agricultural systems. Specifically, the research addresses three critical research questions: (1) How do intra-seasonal dynamics of both the environment and social systems shape farmer adaptive capacity? (2) To what extent does intra-seasonal decision making enable farmers to adapt to climate uncertainty? and (3) How can intra-seasonal data improve the ability to model, predict, and improve adaptation to climate variability in ways that enhance food security? The research team will integrate physical models of hydrological and agricultural dynamics with real-time environmental data and weekly farmer decision making in individual fields. These real-time data are obtained from previously-developed novel cellular-based environmental sensing pods coupled to real-time reports of farmer decision making submitted via cell phones. The team will use a combination of environmental and social data to develop a suite of modeling tools for understanding how climate variability impacts the sustainability of agricultural systems in the study regions. The research team also will develop modeling tools for improved forecasts of food security capable of producing new understandings of the intra-seasonal dynamics of both social and environmental processes. Although the test bed for this research is the Southern Province of Zambia and portions of the Rift Valley and Central Provinces of Kenya centered around the Laikipia District, the results may well be broadly applicable to other semi-arid and arid regions of the world.
|
1 |
2014 |
Plale, Beth |
Collaborative Research: Software Sustainability: An SI^2 PI Workshop
This award will support a 1.5-day workshop in Arlington, VA to bring together the community of SI2 awardees with the aims of: 1) serving as a forum for focused PI technical exchange, through an early evening poster session; 2) serving as a forum for discussion of topics of relevance to the PIs, on topics emerging both from within NSF and from the broader community, by informing the attendees of emerging best practices and stimulating thinking on new ways of achieving sustainability and of ensuring that the foundation laid by SI2 is preserved into the future; and 3) gathering experiences and a shared sense of best practice that results in a published workshop report.
The workshop will bring together researchers who are a proto-community of NSF open source software developers. The meeting will examine the characteristics of the community, and consider whether the products from the program can be enhanced by giving the community a new identity and a new way of looking at itself. The meeting will also address citation, attribution, and reproducibility, which are three related topics often discussed in the context of data, but less so in the context of software. The attendees will consider practical steps that could be taken to advance software citation and science reproducibility. Finally, sustainability of software is a major topic for NSF and for the SI2 PIs. The meeting will highlight new ways of thinking about software sustainability, drawing on experts in the field and on recent SI2 EAGER funded projects that are studying the community to help the workshop attendees in their thinking about sustainability.
The community outputs of the workshop will be: posters developed by the SI2 PIs that will be shared amongst the attendees and shared more broadly on the workshop web site; an experiences report (licensed under a Creative Commons license) produced by the award PIs, distributed via the workshop web site, via email to participants who will be asked to disseminate among their project colleagues and peers, and via an archive repository through which it will be accessible through a persistent ID; and attendee journalism during the event in the form of a public Google doc and public Twitter stream.
|
1 |
2015 — 2018 |
Nusser, Sarah Seidel, Edward Plale, Beth Athey, Brian (co-PI) [⬀] Riedy, Joshua Mcgimpsey, William |
BD Hubs: Midwest: SEEDCorn: Sustainable Enabling Environment for Data Collaboration @ University of Illinois at Urbana-Champaign
Catalyzed by the NSF Big Data Hub program, the Universities of Illinois, Indiana, Michigan, North Dakota, and Iowa State University have created a flexible regional Midwest Big Data Hub (MBDH), with a network of diverse and committed regional supporting partners (including colleges, universities, and libraries; non-profit organizations; industry; and city, state, and federal government organizations) that bring data projects from multiple private, public, and government sources and funding agencies. The NSF-funded SEEDCorn project will be the foundational project to energize the activities of MBDH: leveraging partner activities and resources, coordinating existing projects, initiating 20-30 new public-private partnerships, sharing best practices and data policies, starting pilots, and helping to acquire funding. The result of SEEDCorn will be a sustainable hub of Big Data activities across the region and the nation that enables research communities to better tackle complex science, engineering, and societal challenges, supports the competitiveness of US industry, and enables decision makers to make more informed decisions on topics ranging from public policy to economic development.
The MBDH is focusing on specific strengths and themes of importance to the Midwest across three sectors: Society (including smart cities and communities, network science, business analytics), Natural & Built World (including food, energy, water, digital agriculture, transportation, advanced manufacturing), and Healthcare and Biomedical Research (which spans patient care to genomics). Integrative "rings" connect all spokes and will be organized around themes of specific MBDH strengths, including (a) Data Science, where computational and statistical approaches can be developed and integrated with domain knowledge and societal considerations that support the underlying needs of "data to knowledge," (b) services, infrastructure, and tools needed to collect, store, link, serve, and analyze complex data collections, to support pilot projects, and ultimately provide production-level data services across the hub, and (c) educational activities needed to advance the knowledge base and train a new generation of data science-enabled specialists and a more general workforce in the practice and use of data science and services.
Further information on the project can be found at http://midwestbigdatahub.org.
|
0.942 |
2017 — 2018 |
Plale, Beth |
IPA Agreement |
1 |
2017 — 2018 |
Quick, Robert Plale, Beth Almas, Bridget Lannom, Laurence |
Cc* Storage: Robust Persistent Identification of Data (Rpid)
This project focuses on robust, persistent identification of data, which could greatly improve scientific discovery and the reuse of datasets. Persistent identifiers for data (PIDs) enhance scientific discovery and are an important element of research data sharing. This project creates a testbed to evaluate new capabilities for persistent identifiers. The initial stage of the project will include four diverse repositories containing millions of PIDs, and the second stage will allow any NSF-eligible institution to use the testbed to evaluate their own work. The project will enhance research interoperability.
Currently, there are several persistent identifier options for data (such as Digital Object Identifiers, Handles, the Archive Resource Key, and Uniform Resource Names). The existing environment is limited by multiple competing solutions, weak interoperability, and inconsistent procedures for resolving a PID to its data object. This project provides several new capabilities:
- A testbed to research new capabilities and interoperability for persistent identifiers, which builds upon Indiana University cyberinfrastructure and instances on Amazon Web Services;
- The ability to prototype and evaluate PID types, allowing study of the difficulties and advantages of relating types to one another across a distributed system; and
- Approaches to mapping from PIDs to Canonical Text Services Uniform Resource Names (CTS URNs). Combining URNs with Handles should allow a precise CTS URN referencing capability with the flexible resolution of the existing, widely used Handle System.
The goal is to standardize the results of PID resolution, allowing various PID services to interoperate at a higher level.
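A uniform resolution layer of the kind described might look like the following sketch: identifiers of different schemes are dispatched to per-scheme handlers, and every resolution returns the same standardized record shape. The handler registry, the record fields, and all names here are hypothetical illustrations, not the testbed's actual API; real resolution would query the corresponding resolver services over the network.

```python
def parse_pid(pid):
    """Split a scheme-prefixed PID string such as 'hdl:2022/abc' into (scheme, value)."""
    scheme, sep, value = pid.partition(":")
    if not sep or not value:
        raise ValueError(f"not a scheme-prefixed PID: {pid!r}")
    return scheme.lower(), value

def resolve(pid, handlers):
    """Resolve a PID through its scheme handler into one common record,
    so callers see the same shape whether the identifier is a DOI,
    a Handle, an ARK, or a CTS URN."""
    scheme, value = parse_pid(pid)
    if scheme not in handlers:
        raise ValueError(f"no resolver registered for scheme {scheme!r}")
    return {"pid": pid, "scheme": scheme, "location": handlers[scheme](value)}

# Illustrative handler registry; these simply build resolver URLs.
HANDLERS = {
    "hdl": lambda v: "https://hdl.handle.net/" + v,
    "doi": lambda v: "https://doi.org/" + v,
}
```

The design point the sketch illustrates is that interoperability comes from standardizing the *result* of resolution, not from forcing every identifier into one scheme.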
|
1 |
2021 — 2026 |
Panda, Dhabaleswar [⬀] Chaudhary, Vipin (co-PI) [⬀] Machiraju, Raghu (co-PI) [⬀] Plale, Beth Fosler-Lussier, Eric (co-PI) [⬀] |
AI Institute for Intelligent Cyberinfrastructure With Computational Learning in the Environment (ICICLE)
Although the world is witness to the tremendous successes of Artificial Intelligence (AI) technologies in some domains, many domains have yet to reap the benefits of AI due to the lack of easily usable AI infrastructure. The NSF AI Institute for Intelligent Cyberinfrastructure with Computational Learning in the Environment (ICICLE) will develop intelligent cyberinfrastructure with transparent and high-performance execution on diverse and heterogeneous environments. It will advance plug-and-play AI that is easy to use by scientists across a wide range of domains, promoting the democratization of AI. ICICLE brings together a multidisciplinary team of scientists and engineers, led by The Ohio State University in partnership with Case Western Reserve University, IC-FOODS, Indiana University, Iowa State University, Ohio Supercomputer Center, Rensselaer Polytechnic Institute, San Diego Supercomputer Center, Texas Advanced Computing Center, University of Utah, University of California-Davis, University of California-San Diego, University of Delaware, and University of Wisconsin-Madison. Initially, complex societal challenges in three use-inspired scientific domains will drive ICICLE’s research and workforce development agenda: Smart Foodsheds, Precision Agriculture, and Animal Ecology.
ICICLE’s research and development includes: (i) Empowering plug-and-play AI by advancing five foundational areas: knowledge graphs, model commons, adaptive AI, federated learning, and conversational AI. (ii) Providing a robust cyberinfrastructure capable of propelling AI-driven science (CI4AI), solving the challenges arising from heterogeneity in applications, software, and hardware, and disseminating the CI4AI innovations to use-inspired science domains. (iii) Creating new AI techniques for the adaptation/optimization of various CI components (AI4CI), enabling a virtuous cycle to advance both AI and CI. (iv) Developing novel techniques to address cross-cutting issues including privacy, accountability, and data integrity for CI and AI; and (v) Providing a geographically distributed and heterogeneous system consisting of software, data, and applications, orchestrated by a common application programming interface and execution middleware. ICICLE’s advanced and integrated edge, cloud, and high-performance computing hardware and software CI components simplify the use of AI, making it easier to address new areas of inquiry. In this way, ICICLE focuses on research in AI and innovation through AI, and accelerates the application of AI. ICICLE is building a diverse STEM workforce through innovative approaches to education, training, and broadening participation in computing that ensure sustained measurable outcomes and impact on a national scale, along the pipeline from middle/high school students to practitioners. As a nexus of collaboration, ICICLE promotes technology transfer to industry and other stakeholders, as well as data sharing and coordination across other National Science Foundation AI Institutes and Federal agencies. As a national resource for research, development, technology transfer, workforce development, and education, ICICLE is creating a widely usable, smarter, more robust and diverse, resilient, and effective CI4AI and AI4CI ecosystem.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
0.948 |
2022 — 2027 |
Plale, Beth Pierce, Marlon Marru, Suresh |
Collaborative Research: Frameworks: Cybershuttle: An End-to-End Cyberinfrastructure Continuum to Accelerate Discovery in Science and Engineering
Science depends critically on accessing scientific and engineering software, data repositories, storage resources, analytical tools, and a wide range of advanced computing resources, all of which must be integrated into a cohesive scientific research environment. The Cybershuttle project is creating a seamless, secure, and highly usable scientific research environment that integrates all of a scientist’s research tools and data, which may be on the scientist’s laptop, a computing cloud, or a university supercomputer. These research environments can further support scientific research by enabling scientists to share their research with collaborators and the broader scientific community, supporting replicability and reuse. The Cybershuttle team integrates biophysicists, neuroscientists, engineers, and computer scientists into a single team pursuing the project goals with a grounding in cutting-edge research problems such as understanding how spike proteins in viruses work, how the brain functions during sleep, and how artificial intelligence techniques can be applied to modeling engineering materials. To meet its ambitious goals, the project is building on over a decade of experience in developing and operating the open-source Apache Airavata software framework for creating science-centric distributed systems. Cybershuttle is providing a system that can be used as a training ground to educate students in concepts of open-source software development and applied distributed systems, fostering a globally competitive workforce who can move easily between academic and non-academic careers.

Cybershuttle is creating a new type of user-facing cyberinfrastructure that will enable seamless access to a continuum of CI resources usable for all researchers, increasing their productivity. The core of the Cybershuttle framework is a hybrid distributed system, based on open-source Apache Airavata software.
This system integrates locally deployed agent programs with centrally hosted middleware to enable an end-to-end integration of computational science and engineering research on resources that span users’ local resources, centralized university computing and data resources, computational clouds, and NSF-funded, national-scale computing centers. Scientists and engineers access this system using scientific user environments designed from the beginning with the best user-centered design practices. Cybershuttle uses a spiral approach for developing, deploying, and increasing usage and usability, beginning with on-team scientists and expanding to larger scientific communities. The project engages the larger community of scientists, cyberinfrastructure experts, and other stakeholders in the creation and advancement of Cybershuttle through a stakeholder advisory board. Cybershuttle's team includes researchers from Indiana University, the University of Illinois at Urbana-Champaign, the University of California San Diego, the San Diego Supercomputer Center, and the Allen Institute.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
1 |
2022 — 2027 |
Plale, Beth Snapp-Childs, Winona |
RCN:CIP: Midwest Research Computing and Data Consortium
Cyberinfrastructure (CI) professionals underpin the nation’s academic and non-profit research and development through the innovation and maintenance of computational environments for science and engineering research and development. CI professionals are employed at academic institutions throughout the country, sometimes in groups of one or two people and, at some institutions, in considerably larger numbers. The CI professional must be responsive and innovative regardless of group size, and this pressure exists in a highly dynamic environment of computationally based research that encompasses new technology needs almost daily, e.g., artificial intelligence (AI), edge/fog computing, and the function-as-a-service offerings of public clouds. The Midwest Research Computing and Data Consortium (MWRCD) is the Midwestern hub of research computing and data infrastructure professionals, and its purpose is to build a community among cyberinfrastructure professionals who share a strong regional identity in order to share information, solve problems, and advocate for important community-driven needs. As a regional consortium, the MWRCD is positioned for equity; that is, it better serves smaller, less well-resourced institutions. It is also positioned for impact for the CI professional, through innovative networking strategies, focusing on broadening participation across diverse groups and institutions, creating resource documents on topics of interest, and fostering new collaborations. Initial topics include FAIR data, AI cyberinfrastructure, and return on investment of cyberinfrastructure.
The project is committed to data collection and evaluation that will contribute to the broader study of the role of the CI professional in team science.

The Midwest Research Computing and Data Consortium’s organization and plans are founded upon the Community Participation Model, in which the organization moves from principally serving as an information-sharing hub to achieving transformational objectives. To actualize these goals, MWRCD will use a set of programmatic mechanisms, referred to as programmatic accelerators, that prioritize inclusiveness and broadening representation. These accelerators consist of a 1) Student Fellows Shadow Program, 2) Residencies Program, 3) Professional Mentorship Program, and 4) Affinities Group Program. Additionally, the materials to be developed and disseminated will enable and empower research computing and data professionals to communicate more effectively with key stakeholders. The theme of MWRCD efforts is that regional stakeholders working together in a trusted setting can help each other and the field: by working collectively on topics of mutual interest, by onboarding organizations that find value in the mission of the organization, by sharing information transparently, and by providing opportunities for cyberinfrastructure professionals in collaborating organizations to contribute, to be mentored, to belong to a cohort, and to become leaders.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
1 |