2004 — 2006 |
Deelman, Ewa; Gil, Yolanda (co-PI) |
Sci/Nmi/Sger: Towards Cognitive Grids: Knowledge-Rich Grid Services For Autonomous Workflow Refinement and Robust Execution @ University of Southern California
This SGER proposal describes research on grid workflow refinement and execution for abstract workflows in two areas: preplanning and advance resource reservation; and context-aware dynamic planning and failure repair. Drawing from AI planning techniques, resource reasoning, and languages for expressive knowledge representation, techniques for workflow refinement will be developed. Mechanisms for monitoring and failure detection, based on expressive models of the execution environment, will also be developed. The resulting service implementations will build upon the existing Pegasus workflow mapping system and will be disseminated through Pegasus.
|
1 |
2005 — 2006 |
Nakano, Aiichiro (co-PI); Lerman, Kristina (co-PI); Deelman, Ewa; Hall, Mary |
Csr---Aes: Collaborative Research: Intelligent Design and Optimization of Parallel and Distributed Applications @ University of Southern California
This project systematically addresses the enormous complexity of mapping applications to current and future parallel platforms - both scalable parallel architectures consisting of tens of thousands of processors and distributed systems comprised of collections of these and other resources. By integrating the system layers - domain-specific environment, application program, compiler, run-time environment, performance models and simulation, and workflow manager -- and through a systematic strategy for application mapping, the project will exploit the vast machine resources available in such parallel platforms to dramatically increase the productivity of application programmers.
The key contribution of the project will be a systematic solution for performance optimization and adaptive application mapping -- a large step towards automating a process that is currently performed in an ad hoc way by programmers and compilers -- so that it is feasible to obtain scalable performance on parallel and distributed systems consisting of tens of thousands of processing nodes. The application components will be viewed as dynamically adaptive algorithms for which there exists a set of variants and parameters that can be chosen to develop an optimized implementation. Knowledge representation and machine learning techniques will utilize this domain knowledge and past experience to navigate the search space efficiently.
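As a purely illustrative sketch of the mapping-as-search idea above (in Python): the fragment below enumerates a tiny space of hypothetical implementation variants and tile sizes, times each candidate, and keeps the fastest. The variant names, parameters, and the run_and_time stand-in are assumptions for illustration only; in the project, knowledge-based and learned models would prune such a space rather than enumerate it exhaustively.

import itertools
import time


def run_and_time(variant, tile_size):
    # Stand-in for building and executing one candidate implementation and
    # measuring its runtime; a real system would compile and run the candidate.
    start = time.perf_counter()
    # ... execute the candidate mapping here ...
    return time.perf_counter() - start


def best_mapping(variants=("blocked", "fused", "task_parallel"),
                 tile_sizes=(16, 32, 64)):
    # Exhaustive scoring of a small search space; learned models would
    # navigate a much larger space without full enumeration.
    scored = {(v, t): run_and_time(v, t)
              for v, t in itertools.product(variants, tile_sizes)}
    return min(scored, key=scored.get)


print(best_mapping())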
|
1 |
2006 — 2009 |
Deelman, Ewa; Gil, Yolanda |
Nsf Workshop On Scientific Workflows Challenges (Wsw-06) @ University of Southern California
Workflows have recently emerged as a paradigm for conducting large-scale scientific analyses. The structure of a workflow specifies what analysis routines need to be executed, the data flow amongst them, and relevant execution details. These workflows often need to be executed in distributed environments, where data sources may be available in different physical locations and the steps may have execution requirements calling for high-end computing and memory resources at remote locations. Workflows help manage the coordinated execution of related tasks. They also provide a systematic way to capture scientific methodology and provide provenance information for their results. Yet, robust and flexible workflow creation, mapping, and execution are largely open research problems.
Scientific workflows present new challenges over business workflows and other kinds of process models. They typically use very large, distributed data sets, employ computationally intensive tasks, and require high-end and distributed computing technology. They are also often iteratively and interactively designed, since that is the nature of the scientific exploration and analysis process they reflect. On the other hand, scientific workflows also have simplified requirements in terms of their data flow structure, execution management, or security/privacy constraints. Currently, scientific workflows are mostly designed without formal principles and are rarely optimized, scalable or reusable.
The aim of this workshop is to bring together IT researchers and practitioners as well as domain scientists. Application scientists will be asked to describe requirements and desired new analyses and computations that are not possible with today's technologies. IT researchers will be asked to identify problems in their specific areas of expertise. Discussions will focus on four main topics: (1) applications and requirements; (2) dynamic workflows and user steering; (3) data and workflow descriptions; and (4) system-level management to support large-scale workflows.
The outcome of the workshop will be a report outlining research directions and activities that will bring the needed communities together to work on producing a new paradigm for scientific workflows. Easy-to-use tools for building efficient, scalable and reusable scientific workflows are likely to bring benefits to many fields, and can raise the pace and quality of research work in many areas.
The workshop Web site (http://vtcpc.isi.edu/wiki) provides further information about the workshop and will be used for disseminating the workshop report and other results.
|
1 |
2006 — 2010 |
Nakano, Aiichiro (co-PI); Lerman, Kristina (co-PI); Deelman, Ewa; Hall, Mary |
Csr---Aes: Collaborative Research: Intelligent Optimization of Parallel and Distributed Applications (Wp2) @ University of Southern California
CSR-AES: Intelligent Optimization of Parallel and Distributed Applications
Abstract
This project derives a systematic solution for performance optimization and adaptive application mapping to obtain scalable performance on parallel and distributed systems consisting of tens of thousands of processing nodes. With expert domain scientists in molecular dynamics (MD) simulation, we expect to achieve performance levels on MD codes even better than what has been derived manually after years of development and many ports to a variety of architectures. The application components are viewed as dynamically adaptive algorithms for which there exists a set of variants and parameters that can be searched to develop an optimized implementation. A workflow is an instance of the application where nodes represent application components and dependences between the nodes represent execution ordering constraints. By encoding an application in this way, we capture a large set of possible application mappings with a very compact representation. The system layers explore the large space of possible implementations to derive the most appropriate solution. Because the space of mappings is prohibitively large, the system captures and utilizes domain knowledge from the domain scientists and the designers of the compiler, run-time and performance models to prune most of the possible implementations. Knowledge representation and machine learning utilize this domain knowledge and past experience to navigate the search space efficiently. This multidisciplinary approach impacts the state of the art in the sub-fields of compilers, run-time systems, machine learning, and knowledge representation, and accelerates advances in MD simulation with far more productive software development and porting. More broadly, this research enables systematic performance optimization in other sciences.
|
1 |
2007 — 2012 |
Deelman, Ewa |
Sdci Nmi Improvement: Pegasus: From Concept to Execution- - -Mapping Scientific Workflows Onto the National Cyberinfrastructure @ University of Southern California
National Science Foundation
NSF Software Development for Cyberinfrastructure (SDCI) Program
Office of Cyberinfrastructure
Proposal Number: 0722019
Principal Investigator: Ewa Deelman
Institution: University of Southern California
Proposal Title: SDCI NMI Improvement: Pegasus: From Concept to Execution --- Mapping Scientific Workflows onto the National Cyberinfrastructure
Abstract
This project addresses improvements to Pegasus, a workflow system layered on top of the DAGMan workflow engine. These improvements will make Pegasus easy to deploy and use across a broad range of science users and environments. New debugging capabilities will be added, usability will be improved, new communities of users will be directly engaged, richer workflow representations and dynamic workflows will be supported, priority-based task submission capabilities will be added, monitoring will be enhanced, and integration with emerging workflow technologies will be pursued. The intellectual merit lies in the demonstrated value to physics, astronomy, and other users. The broader impact of this proposal includes extending the user base to new communities, including those using the TeraGrid and OSG.
|
1 |
2007 — 2011 |
Deelman, Ewa; Gil, Yolanda (co-PI) |
Designing Scientific Software One Workflow At a Time @ University of Southern California
PROPOSAL NUMBER: 0725332
TITLE: Designing Scientific Software One Workflow at a Time
PI: Ewa Deelman and Yolanda Gil
Much of science today relies on software to make new discoveries. This software embodies scientific analyses that are frequently composed of several application components and created collaboratively by different researchers. Computational workflows have recently emerged as a paradigm to manage these large-scale and large-scope scientific analyses. Workflows represent computations that are often executed in geographically distributed settings, their interdependencies, their requirements, and their data products. The design of these workflows is at the core of today's scientific discovery processes, and the workflows must be treated as scientific products in their own right. The focus of this research is to develop the foundations for a science of design of scientific processes embodied in the new artifact that is the computational workflow. The work will integrate best practices and lessons learned in existing workflow applications, and extend them in order to define and formalize design principles of computational workflows. This work will result in a fundamentally new approach to designing workflows that will greatly improve the scientific software design methodology by defining and formalizing design principles, and by familiarizing the scientific community with these effective workflow design processes.
|
1 |
2009 — 2013 |
Deelman, Ewa; Brooks, Christopher; Gunter, Daniel; Swany, Douglas |
Stci: Middleware For Monitoring and Troubleshooting of Large-Scale Applications On National Cyberinfrastructure @ University of Southern California
This proposal will be awarded using funds made available by the American Recovery and Reinvestment Act of 2009 (Public Law 111-5), and meets the requirements established in Section 2 of the White House Memorandum entitled, Ensuring Responsible Spending of Recovery Act Funds, dated March 20, 2009.
The STCI: Middleware for Monitoring and Troubleshooting of Large-Scale Applications on National Cyberinfrastructure project aims to provide robust and scalable workflow monitoring services that can be used to track the progress of workflow-based applications as they are executing on the distributed cyberinfrastructure. New anomaly detection and troubleshooting services will also be developed to alert users to problems with the application and cyberinfrastructure services and allow them to quickly navigate and mine the application's execution records. The foundation of this work is the development of a robust and scalable infrastructure for performance information gathering and distribution. Information flowing through this infrastructure will be stored in high-performance archives and distributed to interested entities through subscription interfaces. Three main services will be developed: 1) an online monitoring service, 2) an anomaly detection service based on dynamic mining of application and cyberinfrastructure logs, and 3) a troubleshooting service that will help trace the source of a failure.
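A minimal sketch of the kind of check such an anomaly-detection service might perform, assuming per-job runtime records mined from execution logs (the record format and the threshold are assumptions for illustration, not the proposed design):

from statistics import mean, pstdev


def is_anomalous(runtime_s, history, threshold=3.0):
    # history: runtimes (seconds) of previously observed runs of the same
    # job type, e.g. extracted from archived workflow and service logs.
    mu = mean(history)
    sigma = pstdev(history)
    return sigma > 0 and abs(runtime_s - mu) > threshold * sigma


history = [100, 104, 98, 101, 99]
print(is_anomalous(900, history))   # True: flag this run for troubleshooting
print(is_anomalous(102, history))   # False: within normal behavior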
Intellectual Merit This work will potentially increase scientists' productivity by allowing them to quickly identify problems in an application, thus reducing the time it takes to generate scientifically meaningful results. This work will also make the performance of complex scientific workflows more transparent, which will enable the generation of accurate estimates of overall time to completion, more efficient use of resources, and easier resolution of end-to-end performance problems in collaboration with network and resource providers.
Broader Impact Scientific communities in astronomy, biology, earthquake science, physics, and others will immediately benefit from the proposed system. Because the approach relies on simple, well-defined logging formats, this work is applicable to a range of workflow management systems as well as sub-components of those systems such as job managers and data transfer tools.
|
1 |
2009 — 2013 |
Deelman, Ewa; Wuerthwein, Frank; Holzman, Burt (co-PI) |
Stci: Integrated Resource Provisioning Across the National Cyberinfrastructure in Support of Scientific Workloads @ University of Southern California
This proposal will be awarded using funds made available by the American Recovery and Reinvestment Act of 2009 (Public Law 111-5), and meets the requirements established in Section 2 of the White House Memorandum entitled, Ensuring Responsible Spending of Recovery Act Funds, dated March 20, 2009.
The goal of the project "STCI: Integrated Resource Provisioning Across the National Cyberinfrastructure in Support of Scientific Workloads" is to develop a resource provisioning system that will provide a common job interface to the two national cyberinfrastructures in the US (OSG and the TeraGrid). The system will provide both a priori and dynamic provisioning capabilities, where resources can be reserved explicitly before application execution as well as implicitly as the application jobs enter the job management system. The effects of such provisioning strategies will be tracked via monitoring solutions integrated with the system. This work will also extend the provisioning capabilities to virtual environments such as those delivered by commercial and science clouds. Service administration capabilities will also be provided as part of this work.
Intellectual Merit The proposed system will provide a robust and scalable resource provisioning capability that will bridge heterogeneous, distributed cyberinfrastructures, making it easier for scientists with diverse computational requirements to efficiently leverage the available computing power and improve their overall productivity. The integrated system also uses computational resources efficiently, occupying resources as they become available and releasing them when they are no longer needed.
Broader Impact The integrated and extended system will support a broad spectrum of applications in the scientific domain ranging from workflows composed of interdependent tasks to applications composed of a large number of independent jobs to loosely coupled parallel applications. In addition to the current users, the proposed system can be leveraged by applications in astronomy, gravitational-wave physics, fusion, and many others.
|
1 |
2009 — 2013 |
Chervenak, Ann; Deelman, Ewa |
Dc: Medium: Intelligent Data Placement in Support of Scientific Workflows @ University of Southern California
Transformative research is conducted via computational analyses of large data sets in the terabyte and petabyte range. These analyses are often enabled by scientific workflows, which provide automation and efficient and reliable execution on campus and national cyberinfrastructure resources. Workflows face many issues related to data management such as locating input data, finding necessary storage co-located with computing capabilities, and efficiently staging data so that the computation progresses but storage resources do not fill up. Such data placement decisions need to be made within the context of individual workflows and across multiple concurrent workflows. Scientific collaborations also need to perform data placement operations to disseminate and replicate key data sets. Additional challenges arise when multiple scientific collaborations share cyberinfrastructure and compete for limited storage and compute resources. This project will explore the interplay between data management and computation management for these scenarios. The project will include the design of algorithms and methodologies that support large-scale data management for efficient workflow-based computations composed of individual analyses and workflow ensembles while preserving policies governing data storage and access. The algorithms will be evaluated regarding their impact on performance of synthetic and real-world workflows running in simulated and physical cyberinfrastructures. New approaches to data and computation management can potentially transform how scientific analyses are conducted at the petascale. Besides advancing computer science, this work will have direct impact on data and computation management for a range of scientific disciplines that manage large data sets and use them in complex analyses running on cyberinfrastructure.
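By way of illustration only, a greedy placement rule of the kind such data placement algorithms might start from; the site names, capacities, and the co-location criterion below are invented for this sketch and are not the project's algorithms.

def place_dataset(size_gb, sites):
    # Choose the site with the most free storage among those that can hold
    # the dataset and are co-located with compute capability.
    candidates = [s for s in sites
                  if s["free_gb"] >= size_gb and s["has_compute"]]
    if not candidates:
        return None                       # no feasible placement
    chosen = max(candidates, key=lambda s: s["free_gb"])
    chosen["free_gb"] -= size_gb          # reserve space for later decisions
    return chosen["name"]


sites = [{"name": "campus",    "free_gb": 500,  "has_compute": True},
         {"name": "osg_cache", "free_gb": 2000, "has_compute": True},
         {"name": "archive",   "free_gb": 9000, "has_compute": False}]
print(place_dataset(800, sites))          # -> "osg_cache"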
|
1 |
2010 — 2013 |
Deelman, Ewa |
Sdci/Stci as the Software Supply Chain of the National Cyberinfrastructure Workshop @ University of Southern California
The SDCI/STCI PI meeting will bring together leading software cyberinfrastructure projects to discuss issues relevant to the community as we move into the future. This meeting will discuss the effective use of NSF middleware development projects by scientists and engineers so that researchers in a number of domains can make advances in their respective fields without being burdened by the interactions with the cyberinfrastructure.
Intellectual Merit The proposed workshop will support the exchange of ideas among the current software cyberinfrastructure projects. It will potentially provide guidance on issues related to the development of robust software and to the problem of software sustainability.
Broader Impact The proposed workshop will solicit participation from major science and engineering projects that rely on the national cyberinfrastructure for their computations and data management needs. Their participation will help ensure that the cyberinfrastructure software developed as part of SDCI and STCI projects will be relevant and broadly applicable to a number of science and engineering domains. The results of this meeting will then potentially guide middleware development and testing in the future.
|
1 |
2012 — 2014 |
Deelman, Ewa; Livny, Miron (co-PI) |
The Role of Software and Software Institutes in Computational Science Over Time @ University of Southern California
The workshop will bring together Principal Investigators of the leading software cyberinfrastructure projects to discuss issues relevant to the community as we move into the future. In 2011 and 2012 the OCI Software Infrastructure for Sustained Innovation (SI2) program funded both small development efforts that provide software pieces that can be integrated into the larger cyberinfrastructure, and larger collaborations that deliver significant community software. In addition, new SI2 awards aimed at conceptualizing large-scale software institutes, which would provide a fabric for the software needed by domain scientists to achieve breakthroughs in their intra- and inter-disciplinary efforts, will be made. New NSF initiatives such as EarthCube are defining roadmaps for cyberinfrastructure development in Earth sciences. This workshop will bring together the Principal Investigators of the recent SI2 awards to discuss potential synergies and collaborations, define challenges ahead, discuss the relationship of the SI2 efforts to the planned Software Institutes, and explore the relationship of the OCI-funded software in the context of broad NSF initiatives such as EarthCube, DataWay, and other planned community-focused efforts.
To achieve its goals, the workshop will focus on these main themes: (i) discussing SI2 projects within the context of SI2 Institutes and NSF-wide initiatives such as EarthCube, DataWay, and others; (ii) sharing experiences in building quality software and services; (iii) fostering collaboration and providing incentives for collaboration and for cyberinfrastructure development as a career; and (iv) sustaining software capabilities in the long term and defining software value.
The workshop will solicit participation from major science and engineering projects that rely on the national cyberinfrastructure for their computations and data management needs. Their participation will help ensure that the cyberinfrastructure software developed as part of SI2 projects will be relevant and broadly applicable to a number of science and engineering domains. The results of this workshop will then potentially guide cyberinfrastructure development and testing in the future.
|
1 |
2012 — 2018 |
Deelman, Ewa; Livny, Miron (co-PI) |
Si2-Ssi: Distributed Workflow Management Research and Software in Support of Science @ University of Southern California
This award funds the enhancement of state-of-the-art workflow technologies and their promotion within a broad range of scientific domains. The overarching goal is to advance scientific discovery by providing scientists with tools that can manage computations on national cyberinfrastructure in a way that is reliable and scalable.
The key technology supported by this award is the Pegasus Workflow Management System (Pegasus). This program of work includes the development, support, and maintenance of Pegasus. Pegasus allows users to declaratively describe their workflow, then makes a plan that maps this description onto the available execution resources and executes the plan. This approach is scalable, reliable, and supports applications running on campus resources, clouds, and national cyberinfrastructure.
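The declarative idea can be illustrated with a toy sketch (this is NOT the Pegasus interface): the workflow is described only as tasks and their data dependencies, and an execution order is then derived from that description. The task names below are hypothetical.

from graphlib import TopologicalSorter   # Python 3.9+ standard library

# Hypothetical three-step analysis: two independent preprocessing jobs
# feeding one final combine step.
workflow = {
    "combine": {"preprocess_a", "preprocess_b"},   # combine needs both inputs
    "preprocess_a": set(),
    "preprocess_b": set(),
}

# Derive an order in which the tasks can run; a real planner would also map
# each task to a concrete resource and add data staging steps.
plan = list(TopologicalSorter(workflow).static_order())
print(plan)   # e.g. ['preprocess_a', 'preprocess_b', 'combine']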
The work conducted under this award will 1) enhance the sustainability of the Pegasus software through the expanded adoption of sound software engineering practices and improved usability, 2) enhance core capabilities, especially in the area of data management, to meet user requirements and make Pegasus easier to integrate into end-to-end scientific environments, 3) promote the adoption of workflow management technologies within domain and computer sciences.
Intellectual Merit: Pegasus WMS brings innovative and powerful frameworks to the desk of the scientist. Through close collaboration with a broad community of engaged users, experimentation in large-scale distributed computing is made possible. This experimentation supports the development of new scientific workflow management concepts, frameworks and technologies. The proposed work also supports scientific reproducibility by providing a workflow management system that integrates and automates data, metadata, and provenance management functions.
Broader Impact: Pegasus WMS has been adopted by scientists from different domains and has been integrated into end-user environments such as workflow composition tools and portals. The program of outreach and education facilitated by this award will expand the impact of Pegasus through tutorials, workshops, meetings with potential users, and online materials. The proposed interface enhancements will allow more end-user environments to leverage Pegasus' capabilities and will extend the impact of Pegasus to a broader spectrum of users.
|
1 |
2012 — 2014 |
Chen, Ting; Deelman, Ewa; Knowles, James A (co-PI) |
U01 Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Robust and Portable Workflow-Based Tools For mRNA and Genome Re-Sequencing @ University of Southern California
DESCRIPTION (provided by applicant): Sequencing of DNA and cDNA libraries on next-generation sequencing (NGS) platforms has become the method of choice for genomic and transcriptional analyses. One obstacle that inhibits wider adoption of NGS techniques is the lack of comprehensive, yet easy-to-use software packages with which to conduct data analysis. To meet this need, we have developed RseqFlow, a set of common analytic modules for the analysis of RNA-seq data which is formalized into an easy-to-use workflow. The workflow is managed by the Pegasus Workflow Management System (WMS), which maps the modules to available computational resources and automatically executes the steps in the appropriate order. A Virtual Machine (VM) was created for the software package which eliminates complex configuration and installation steps. In this proposal, we plan to extend RseqFlow to include more analytic functions and also to generalize it to work for multiple model organisms including the Mouse, Worm, Fruit fly, Plant and Yeast. We also propose the development of a similar workflow for the analysis of genome re-sequencing data. Both of the workflows will take advantage of several analytic tools we have developed, including PerM (short read alignment), ComB (SNP calling), Clippers (indel/junction detection), and WeaV (de novo assembly). One of the unique features of our workflow is an iterative alignment strategy where sequence variants are used to update the sequence and improve alignment accuracy, which in turn affords us the ability to accurately determine not only SNPs and indels but also structural and copy-number variations. A final effort will include combining the workflows for RNA-seq data and genome re-sequencing data to perform RNA editing analysis. All programs developed under this proposal will be rigorously tested on a number of different data sets and on multiple computational platforms, and developed using sound software engineering practices. All software released under this proposal will be open source and will greatly benefit many biological projects which incorporate DNA and RNA sequencing approaches.
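The iterative alignment idea can be sketched as follows; this toy Python fragment is not RseqFlow or any of the tools named above, and it only illustrates how variants found in one round update the reference used by the next round.

def call_variants(read, ref):
    # Positions where the read disagrees with the current reference.
    return [(i, base) for i, (base, r) in enumerate(zip(read, ref)) if base != r]


def iterative_alignment(read, reference, rounds=3):
    ref = reference
    for _ in range(rounds):
        variants = call_variants(read, ref)
        if not variants:                  # converged: no new variants found
            break
        ref_chars = list(ref)
        for pos, base in variants:        # fold variants back into the reference
            ref_chars[pos] = base
        ref = "".join(ref_chars)
    return ref


print(iterative_alignment("ACGA", "ACGT"))   # -> "ACGA" after one refinement round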
|
1 |
2012 — 2013 |
Deelman, Ewa; Gil, Yolanda |
Earthcube Community Workshop: Designing a Roadmap For Workflows in Geosciences @ University of Southern California
EarthCube is focused on community-driven development of an integrated and interoperable knowledge management system for data in the geo- and environmental sciences. By utilizing a cooperative, as opposed to competitive, process like that which created the Internet and Open Source software, EarthCube will attack the recalcitrant and persistent problems that so far have prevented adequate access to and the analysis, visualization, and interoperability of the vast storehouses of disparate geoscience data and data types residing in distributed and diverse data systems. This award funds a series of broad, inclusive community interactions to gather adequate information and requirements to create a roadmap for a critical capability (workflow) in the development of EarthCube, a major new NSF initiative. Workflow in the context of EarthCube, and cyberinfrastructure in general, encompasses a broad range of topics including distributed execution management, the coupling of multiple models into composite applications, the integration of a wide range of data sources with processing, and the creation of refined data products from raw data. A key benefit of the funded work in terms of evaluating and creating community consensus on the best way forward for this capability (i.e., workflow) is the ability to document the provenance of data used in modeling and to reproduce model- and data-enabled scientific results. The funded workshop and information collecting activity will be open to all interested parties and is being led by a diverse and expert team of cyberinfrastructure developers, computer scientists, and geoscientists. Broader impacts of the work include converging on approaches, protocols, and standards that may be applicable across the sciences. They also include the fostering of close interaction between communities that do not commonly interact with one another and focusing them on the common goal of creating a new paradigm in data and knowledge management in the geosciences.
|
1 |
2013 — 2014 |
Deelman, Ewa |
Collaborative Research: Cc-Nie Integration: Transforming Computational Science With Adamant (Adaptive Data-Aware Multi-Domain Application Network Topologies) @ University of Southern California
Workflows, especially data-driven workflows and workflow ensembles are becoming a centerpiece of modern computational science. However, scientists lack the tools that integrate the operation of workflow-driven science applications on top of dynamic infrastructures that link campus, institutional and national resources into connected arrangements targeted at solving a specific problem. These tools must (a) orchestrate the infrastructure in response to application demands, (b) manage application lifetime on top of the infrastructure by monitoring various workflow steps and modifying slices in response to application demands, and (c) integrate data movement with the workflows to optimize performance.
Project ADAMANT (Adaptive Data-Aware Multi-domain Application Network Topologies) brings together researchers from RENCI/UNC Chapel Hill, Duke University and USC/ISI and two successful software tools to solve these problems: Pegasus workflow management system and ORCA resource control framework, developed for NSF GENI. The integration of Pegasus and ORCA enables powerful application- and data-driven virtual topology embedding into multiple institutional and national substrates (providers of cyber-resources, like computation, storage and networks). ADAMANT leverages ExoGENI - an NSF-funded GENI testbed, as well as national providers of on-demand bandwidth services (NLR, I2, ESnet) and existing OSG computational resources to create elastic, isolated environments to execute complex distributed tasks. This approach improves the performance of these applications and, by explicitly including data movement planning into the application workflow, enables new unique capabilities for distributed data-driven "Big Science" applications.
|
1 |
2014 — 2017 |
Deelman, Ewa; Couvares, Peter; Brown, Duncan; Qin, Jian (co-PI) |
Cif21 Dibbs: Domain-Aware Management of Heterogeneous Workflows: Active Data Management For Gravitational-Wave Science Workflows
Analysis and management of large data sets are vital for progress in the data-intensive realm of scientific research and education. Scientists are producing, analyzing, storing and retrieving massive amounts of data. The anticipated growth in the analysis of scientific data raises complex issues of stewardship, curation and long-term access. Scientific data is tracked and described by metadata. This award will fund the design, development, and deployment of metadata-aware workflows to enable the management of large data sets produced by scientific analysis. Scientific workflows for data analysis are used by a broad community of scientists including astronomy, biology, ecology, and physics. Making workflows metadata-aware is an important step towards making scientific results easier to share, to reuse, and to support reproducibility. This project will pilot new workflow tools using data from the Laser Interferometer Gravitational-wave Observatory (LIGO), a data-intensive project at the frontiers of astrophysics. The goal of LIGO is to use gravitational waves---ripples in the fabric of spacetime---to explore the physics of black holes and understand the nature of gravity.
Efficient methods for accessing and mining the large data sets generated by LIGO's diverse gravitational-wave searches are critical to the overall success of gravitational-wave physics and astronomy. Providing these capabilities will maximize existing NSF investments in LIGO, support new modes of collaboration within the LIGO Scientific Collaboration, and better enable scientists to explain their results to a wider community, including the critical issue of data and analysis provenance for LIGO's first detections. The interdisciplinary collaboration involved in this project brings together computational and informatics theories and methods to solve data and workflow management problems in gravitational-wave physics. The research generated from this project will make a significant contribution to the theory and methods in identification of science requirements, metadata modeling, eScience workflow management, data provenance, reproducibility, data discovery and analysis. The LIGO scientists participating in this project will ensure that the needs of the community are met. The cyberinfrastructure and data-management scientists will ensure that the software products are well-designed and that the work funded by this award is useful to a broader community.
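As a loose illustration of what "metadata-aware" can mean in practice (the field names and values below are invented for this sketch, not LIGO's or the project's actual metadata model), each data product a workflow writes can be registered with descriptive metadata so it can later be discovered and its provenance traced:

products = []


def register_product(path, **metadata):
    # Record a workflow output together with descriptive metadata.
    products.append({"path": path, **metadata})


register_product("H1-SEARCH-RESULTS.hdf",
                 detector="H1", analysis="cbc_search", workflow_id="wf-042")
register_product("L1-SEARCH-RESULTS.hdf",
                 detector="L1", analysis="cbc_search", workflow_id="wf-042")

# Later: discover every product a given workflow produced for one detector.
hits = [p["path"] for p in products
        if p["workflow_id"] == "wf-042" and p["detector"] == "H1"]
print(hits)   # -> ['H1-SEARCH-RESULTS.hdf']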
|
0.954 |
2016 — 2019 |
Deelman, Ewa |
Collaborative Research: Cici: Secure and Resilient Architecture: Scientific Workflow Integrity With Pegasus @ University of Southern California
Scientists use computer systems to analyze and store their scientific data, sometimes in a complex process across multiple machines. This process can be tedious and error-prone, which has led to the development of software known as a "workflow management system". Workflow management systems allow scientists to describe their process in a human-friendly way and then the software handles the details of the processing for the scientists, dealing with tedious and repetitive steps and handling errors. One popular workflow management system is Pegasus, which, over the past three years, was used to run over 700,000 workflows by scientists in a number of domains including astronomy, bioinformatics, earthquake science, gravitational wave physics, ocean science, and neuroscience. The "Scientific Workflow Integrity with Pegasus" project enhances Pegasus with additional security features. The scientist's description of their desired work is protected from tampering and the data processed by Pegasus is checked to ensure it hasn't been accidentally or maliciously modified. Such tamper protection is attained by cryptographic techniques that ensure data integrity. These changes allow scientists, and our society, to be more confident of scientific findings based on collected data.
The Scientific Workflow Integrity with Pegasus project strengthens cybersecurity controls in the Pegasus Workflow Management System in order to provide assurances with respect to the integrity of computational scientific methods. These strengthened controls enhance both Pegasus' handling of science data and its orchestration of software-defined networks and infrastructure. The result is increased trust in computational science and increased assurance in our ability to reproduce the science by allowing scientists to validate that data has not been changed since a workflow completed and that the results from multiple workflows are consistent. The focus on Pegasus is due to its popularity in the scientific community as a method of computation and data management automation. For example, LIGO, the NSF-funded gravitational-wave physics project, recently used the Pegasus Workflow Management System to structure and execute the analyses that confirmed and quantified its historic detection of a gravitational wave, confirming the prediction made by Einstein 100 years ago. The proposed project has established collaborations with LIGO and additional key NSF infrastructure providers and science projects to ensure broadly applied results.
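A minimal sketch of the integrity idea, assuming SHA-256 checksums as the cryptographic primitive (the abstract says only "cryptographic techniques"; the functions below are illustrative, not the project's implementation): record a digest when a workflow output is produced and verify it before the data is reused.

import hashlib


def sha256_of(path):
    # Stream the file in chunks so arbitrarily large workflow outputs can be hashed.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, recorded_digest):
    # True only if the file is byte-identical to when the digest was recorded.
    return sha256_of(path) == recorded_digest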
|
1 |
2017 — 2022 |
Deelman, Ewa; Livny, Miron (co-PI) |
Si2-Ssi: Pegasus: Automating Compute and Data Intensive Science @ University of Southern California
This project addresses the ever-growing gap between the capabilities offered by on-campus and off-campus cyberinfrastructures (CI) and the ability of researchers to effectively harness these capabilities to advance scientific discovery. Faculty and students on campuses struggle to extract knowledge from data that does not fit on their laptops or cannot be processed by an Excel spreadsheet, and they find it difficult to efficiently manage their computations. The project sustains and enhances the Pegasus Workflow Management System, which enables scientists to orchestrate and run data- and compute-intensive computations on diverse distributed computational resources. Enhancements focus on the automation capabilities provided by Pegasus to support workflows handling large data sets, as well as on the usability of Pegasus, lowering the barrier to its adoption. This effort expands the reach of the advanced capabilities provided by Pegasus to researchers from a broader spectrum of disciplines that range from gravitational-wave physics to bioinformatics, and from earth science to material science.
For more than 15 years the Pegasus Workflow Management System has been designed, implemented and supported to provide abstractions that enable scientists to focus on structuring their computations without worrying about the details of the target CI. To support these workflow abstractions Pegasus provides automation capabilities that seamlessly map workflows onto target resources, sparing scientists the overhead of managing the data flow, job scheduling, fault recovery and adaptation of their applications. Automation enables the delivery of services that consider criteria such as time-to-solution, while also taking into account efficient use of resources, task throughput, and data transfer requests. The power of these abstractions was demonstrated in 2015 when Pegasus was used by an international collaboration to harness a diverse set of resources and to manage compute- and data-intensive workflows that confirmed the existence of gravitational waves, as predicted by Einstein's theory of relativity. Experience from working with diverse scientific domains - astronomy, bioinformatics, climate modeling, earthquake science, gravitational-wave physics and material science - uncovers opportunities for further automation of scientific workflows. This project addresses these opportunities through innovation in the following areas: automation methods that include resource provisioning ahead of and during workflow execution, data-aware job scheduling algorithms, and data sharing mechanisms in high-throughput environments. To support a broader group of "long-tail" scientists, effort is devoted to usability improvements as well as outreach, education, and training activities. The proposed work includes the implementation and evaluation of advanced frameworks, algorithms, and methods that enhance the power of automation in support of data-intensive science. These enhancements are delivered as dependable software tools integrated with Pegasus so that they can be evaluated in the context of real-life applications and computing environments. The data-aware focus targets new classes of applications executing in high-throughput and high-performance environments. Pegasus has been adopted by researchers from a broad spectrum of disciplines that range from gravitational-wave physics to bioinformatics, and from earth science to material science. It provides and enhances access to national CI such as OSG and XSEDE, and as part of this work it will be deployed within Chameleon and Jetstream to provide broader access to NSF's CI investments. Through usability improvements, engagement with CI and community platform providers such as HubZero and Cyverse, combined with educational, training, and tutorial activities, this project broadens the set of researchers that leverage automation for their work. Collaboration with the Gateways Institute assures that Pegasus interfaces are suitable for vertical integration within science gateways and seamlessly support new scientific communities.
|
1 |
2018 — 2020 |
Deelman, Ewa; Nabrzyski, Jaroslaw; Mandal, Anirban (co-PI); Ricci, Robert; Pascucci, Valerio (co-PI) |
Pilot Study For a Cyberinfrastructure Center of Excellence @ University of Southern California
NSF's major multi-user research facilities (large facilities) are sophisticated research instruments and platforms - such as large telescopes, interferometers and distributed sensor arrays - that serve diverse scientific disciplines from astronomy and physics to geoscience and biological science. Large facilities are increasingly dependent on advanced cyberinfrastructure (CI) - computing, data and software systems, networking, and associated human capital - to enable broad delivery and analysis of facility-generated data. As a result of these cyberinfrastructure tools, scientists and the public gain new insights into fundamental questions about the structure and history of the universe, the world we live in today, and how our plants and animals may change in the coming decades. The goal of this pilot project is to develop a model for a Cyberinfrastructure Center of Excellence (CI CoE) that facilitates community building and sharing and applies knowledge of best practices and innovative solutions for facility CI.
The pilot project will explore how such a center would facilitate CI improvements for existing facilities and for the design of new facilities that exploit advanced CI architecture designs and leverage established tools and solutions. The pilot project will also catalyze a key function of an eventual CI CoE - to provide a forum for the exchange of experience and knowledge among CI experts. The project will also gather best practices for large facilities, with the aim of enhancing individual facility CI efforts in the broader CI context. The discussion forum and planning effort for a future CI CoE will also address training and workforce development by expanding the pool of skilled facility CI experts and forging career paths for CI professionals. The result of this work will be a strategic plan for a CI CoE that will be evaluated and refined through community interactions: workshops and direct engagement with the facilities and the broader CI community. This project is being supported by the Office of Advanced Cyberinfrastructure in the Directorate for Computer and Information Science and Engineering and the Division of Emerging Frontiers in the Directorate for Biological Sciences.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
1 |
2018 — 2021 |
Deelman, Ewa; Welch, Von; Wang, Cong; Mandal, Anirban |
Cici: Ssc: Integrity Introspection For Scientific Workflows (Iris) @ University of North Carolina At Chapel Hill
Scientists use computer systems to analyze and store their scientific data, sometimes in a complex process across multiple machines in different geographical locations. It has been observed that sometimes during this complex process, scientific data is unintentionally modified or accidentally tampered with, with errors going undetected and corrupt data becoming part of the scientific record. The IRIS project tackles the problem of detecting and diagnosing these unintentional data errors that might occur during the scientific processing workflow. The approach is to collect data relevant to the correctness and integrity of the scientific data from various parts of the computing and network system involved in the processing, and to analyze the collected data using machine learning techniques to uncover errors in the scientific data processing. The solutions are integrated into Pegasus, a popular "workflow management system" - a software used to describe the complex process in a user-friendly way and that handles the details of processing for the scientists. The research methods will be validated on national computing resources with exemplar scientific applications from gravitational-wave physics, earthquake science, and bioinformatics. These solutions will allow scientists, and our society, to be more confident of scientific findings based on collected data.
Data-driven science workflows often suffer from unintentional data integrity errors when executing on distributed national cyberinfrastructure (CI). However, today, there is a lack of tools that can collect and analyze integrity-relevant data from workflows and thus, many of these errors go undetected jeopardizing the validity of scientific results. The goal of the IRIS project is to automatically detect, diagnose, and pinpoint the source of unintentional integrity anomalies in scientific workflows executing on distributed CI. The approach is to develop an appropriate threat model and incorporate it in an integrity analysis framework that collects workflow and infrastructure data and uses machine learning (ML) algorithms to perform the needed analysis. The framework is powered by novel ML-based methods developed through experimentation in a controlled testbed and validated in and made broadly available on NSF production CI. The solutions will be integrated into the Pegasus workflow management system, which is used by a wide variety of scientific domains. An important part of the project is the engagement with science application partners in gravitational-wave physics, earthquake science, and bioinformatics to deploy the analysis framework for their workflows, and to iteratively fine tune the threat models, ML model training, and ML model validation in a feedback loop.
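For illustration only, one plausible shape of such an ML analysis, assuming scikit-learn's IsolationForest over simple per-transfer features; the features, data, and model choice below are assumptions, not IRIS's actual design.

import numpy as np
from sklearn.ensemble import IsolationForest

# Rows: [transfer_duration_s, bytes_transferred]; the last row is a
# suspiciously short transfer that moved far less data than its peers.
X = np.array([[120, 1.0e9], [118, 1.0e9], [125, 1.1e9],
              [122, 0.9e9], [119, 1.0e9], [30, 2.0e7]])

model = IsolationForest(contamination=0.2, random_state=0).fit(X)
print(model.predict(X))   # -1 flags records worth investigating, 1 marks normal ones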
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
0.945 |
2018 — 2020 |
Deelman, Ewa; Zink, Michael (co-PI); Wang, Cong (co-PI); Mandal, Anirban; Rodero, Ivan |
Cc* Integration: Delivering a Dynamic Network-Centric Platform For Data-Driven Science (Dynamo) @ University of North Carolina At Chapel Hill
Computational science today depends on many complex, data-intensive applications operating on datasets that originate from a variety of scientific instruments and data stores. A major challenge for data-driven science applications is the integration of data into the scientist's workflow. Recent advances in dynamic, networked cloud infrastructures provide the building blocks to construct integrated, reconfigurable, end-to-end infrastructure that has the potential to increase scientific productivity. However, applications and workflows have seldom taken advantage of these advanced capabilities. Dynamo will allow atmospheric scientists and hydrologists to improve short- and long-term weather forecasts, and will aid the oceanographic community in improving its understanding of key scientific processes such as ocean-atmosphere exchange and turbulent mixing, both of which have a direct impact on our society. The Dynamo project will develop innovative network-centric algorithms, policies and mechanisms to enable programmable, on-demand access to high-bandwidth, configurable network paths from scientific data repositories to national CyberInfrastructure facilities, and help satisfy the data, computational and storage requirements of science workflows. This will enable researchers to test new algorithms and models in real time with live streaming data, which is currently not possible in many scientific domains. Through enhanced interactions between Pegasus, the network-centric platform, and new network-aware workflow scheduling algorithms, science workflows will benefit from workflow automation and data management over dynamically provisioned infrastructure. The system will transparently map application-level network Quality of Service expectations to actions on programmable software-defined infrastructure.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
0.945 |
2019 — 2020 |
Deelman, Ewa |
2019 Nsf Workshop On Connecting Large Facilities and Cyberinfrastructure @ University of Southern California
Cyberinfrastructure (CI) is a fabric that pervades and enables modern science and has been long-supported at NSF. CI comprises advanced computing, data, software, and networking infrastructure, as well as the necessary specialized human capital. NSF-supported Large Facilities are major scientific research platforms and represent some of the NSF's most substantial investments. Facilities rely heavily on existing CI and new CI capabilities and solutions to support their scientific communities. However, CI used by facilities is currently predominantly built independently by each facility. In 2015, NSF began to support a series of workshops focused on CI for Large Facilities to bring together the facility and CI communities to share common experiences and challenges, discuss potential collaborations and opportunities for leveraging CI within the community. The 2019 Workshop on NSF Large Facilities and CI aims to continue and advance the discussion, exchange, and community building, through a forum for sharing of ideas and experiences and, importantly, prepare for future CI research, development, and deployment.
Specific goals of the workshop include 1) identifying common cyberinfrastructure challenges among facilities, 2) understanding the facility data lifecycle, including the commonalities and differences between data lifecycle stages, 3) exploring opportunities for joint training and education among facilities and large CI projects, 4) sharing experiences in CI project management, and 5) discussing approaches to building a community of CI professionals. An important focus of the workshop will be the CI needed to support the entire facility science lifecycle, which spans data capture and processing, data storage and archiving, and data access, analysis, visualization, and dissemination. The exploration of potential collaborations on common CI challenges is another important aim. Other workshop topics may include computation, network management, and education and workforce development. A workshop report will be posted online to disseminate the discussions and findings to the broader CI community. Participation will be encouraged from a diverse set of CI researchers and professionals at various stages in their careers.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
1 |
2020 — 2021 |
Deelman, Ewa; Ferreira Da Silva, Rafael |
Collaborative Research: Pposs: Planning: Performance Scalability, Trust, and Reproducibility: a Community Roadmap to Robust Science in High-Throughput Applications @ University of Southern California
This project is focused on a critical issue in computational science. As scientists in all fields increasingly rely on high-throughput applications (which combine multiple components into increasingly complex multi-modal workflows on heterogeneous systems), the complexity of those applications hinders the scientists' ability to generate robust results. The project recruits a cross-disciplinary community working together to define, design, implement, and use a set of solutions for robust science. In so doing, the community defines a roadmap that enables high-throughput applications to withstand and overcome adverse conditions such as heterogeneous, unreliable architectures at all scales including extreme scale, rigorous testing under uncertainties, unexplainable algorithms (e.g., in machine learning), and black-box methods. The project's novelty lies in its comprehensive, cross-disciplinary study of high-throughput applications for robust scientific discovery, from hardware and systems all the way to policies and practices.
Through three virtual mini-workshops called virtual world cafes, this project engages a community of scientists at campuses (through the Computing Alliance of Hispanic-Serving Institutions [CAHSI], the Coalition for Academic Scientific Computing [CASC], and the Southern California Earthquake Center [SCEC]), at national laboratories, and in industry. The scientists participate in defining scalability, trust, and reproducibility in an initial set of high-throughput applications; identifying a set of experimental practices that support the in-concert successful progress of these applications' workflows; advancing towards a vision of general hardware and software solutions for robust science by evaluating the generality and transferability of experimental practices and by identifying any missing parts; and defining a research agenda for next-generation workflows.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
1 |
2020 — 2022 |
Deelman, Ewa |
Collaborative Research: Eager: Leveraging Advanced Cyberinfrastructure and Developing Organizational Resilience For Nsf Large Facilities in the Pandemic Era @ University of Southern California
The COVID-19 global pandemic in 2020 has created major disruptions to the research enterprise. NSF-supported large facilities are critical elements of US research infrastructure and are increasingly dependent on advanced cyberinfrastructure (CI) - comprising advanced computing, data and software assets, networking, and the related specialized workforce - to accomplish their science missions. This study investigates how large facilities are impacted by, and are responding to, the pandemic challenge with a focus on understanding factors related to the use of existing CI. This project will also explore the value of CI in the broader social context of how people and the facilities perceive and respond to major disruptive events. The goal is to determine how to design large facility organizations to be more resilient during crises and major disasters and what CI capabilities are needed to support these and other large science projects to accomplish their science missions during such disruptions.
This study comprises three main research questions related to NSF large facilities and CI during the pandemic: (a) What types of research activities remain "business as usual" and what types of activities must adapt or stop completely under pandemic conditions? (b) If facilities could turn back time, what would they have done to better prepare? And (c) What lessons are facilities learning from the current disruptions, and how can these be best disseminated to the facility, CI, and research communities? The approach is grounded in Weick's Theory of Organizing, and examines disruptions from the environment (ecological change) through the stages of enactment (immediate actions), selection (rules establishment), and retention (identification of approaches worth re-utilizing in future events), with feedback loops linking the stages and the environment. The project's goals will be accomplished primarily through interviews with domain scientists, CI users, developers, and administrators who are engaged in NSF large facility science and operations. The project will also analyze and document the organizational structures of the facilities to identify the key engagement points with national CI resources and services, towards enhancing the ability of the broader CI community to engage with the facilities. The ultimate objective of this project is to provide a framework for facilities and other large science projects to mitigate disruptions to their scientific and operational activities in current and future times of crisis. Study outcomes and findings will be widely disseminated to the stakeholder communities.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
1 |
2020 — 2021 |
Deelman, Ewa |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Collaborative Research: Eager: Advancing Reproducibility in Multi-Messenger Astrophysics @ University of Southern California
This project advances reproducibility in multi-messenger astrophysics by developing and sharing sustainable knowledge necessary to understand and use published scientific results. Specifically, the project targets breakthrough findings published by the First M87 Event Horizon Telescope (EHT) and the Neutron star Interior Composition ExploreR (NICER) projects. Understanding how reproducibility is incorporated in astrophysics workflows and sharing practices in reproducible scientific software help enable open science across disciplines. Codes, data, and workflows generated by this project enable researchers and students at various levels of education to regenerate the same findings, learn about the scientific methods, and engage in new science, technology, engineering, and mathematics (STEM) research.
The project provides the astrophysics community with a transformative building block to a roadmap for reproducible open science. Findings about the reproducibility process of the EHT and NICER results are captured and disseminated through documentation, data products, and methods used. Through new insights described in publications, the project delivers recommendations on how results in astrophysics can be effectively enhanced to provide reproducible, replicable, and shareable scientific discovery. The project findings enable scientists to reproduce published results beyond the targeted work and make the methods visible to a larger audience including STEM students; the new understanding helps accelerate the pace of scientific discovery.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
1 |
2020 — 2022 |
Deelman, Ewa Calyam, Prasad (co-PI) [⬀] Zink, Michael [⬀] Mandal, Anirban (co-PI) [⬀] |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Cc* Integration-Large: An 'On-the-Fly' Deeply Programmable End-to-End Network-Centric Platform For Edge-to-Core Workflows @ University of Massachusetts Amherst
Unmanned Aerial Vehicles (also known as drones) are becoming increasingly common in the sky. Their applications range from hobby drones for leisure activities, to life-critical drones for organ transport, to commercial uses such as air taxis. The safe, efficient, and economical operation of such drones poses a variety of challenges that must be addressed by the science community. For example, drones need very detailed, close-to-the-ground weather information for safe operations, and their data processing and energy consumption need to be handled intelligently. This project will provide tools that allow researchers and drone application developers to address operational drone challenges by using advanced computer and network technologies.
This project will provide an architecture and tools that will enable scientists to include edge computing devices in their computational workflows. This capability is critical for low latency and ultra-low latency applications like drone video analytics and route planning for drones. The proposed work will include four major tasks. First, cutting edge network and compute infrastructure will be integrated into the overall architecture to make them available as part of scientific workflows. Second, in-network processing at the network edge and core will be made available through new programming abstractions. Third, enhanced end-to-end monitoring capabilities will be offered. Finally, the architecture will leverage the Pegasus Workflow Management System to integrate in-network and edge processing capabilities.
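To make the last of these tasks concrete, the following is a minimal sketch of how an edge-to-core drone workflow might be expressed with the Pegasus 5.x Python API (Pegasus.api), which the project plans to leverage. The site names (edge-site, core-site), transformation names (detect_objects, plan_route), file names, and paths are hypothetical placeholders chosen for illustration; they are not part of the project's actual deliverables.

```python
#!/usr/bin/env python3
# Hypothetical sketch: an edge-to-core drone video analytics workflow
# expressed with the Pegasus 5.x Python API. Sites, executables, and
# file names are illustrative assumptions, not project artifacts.
from Pegasus.api import (
    Workflow, Job, File, Transformation, TransformationCatalog, ReplicaCatalog
)

# Input captured at the network edge (hypothetical logical file name).
frames = File("frames.tar")

rc = ReplicaCatalog()
rc.add_replica("edge-site", frames, "http://edge.example.org/data/frames.tar")

# Executables registered for the edge and core sites (hypothetical paths).
tc = TransformationCatalog()
detect = Transformation("detect_objects", site="edge-site",
                        pfn="/opt/flynet/bin/detect_objects", is_stageable=False)
plan_route = Transformation("plan_route", site="core-site",
                            pfn="/opt/flynet/bin/plan_route", is_stageable=False)
tc.add_transformations(detect, plan_route)

detections = File("detections.json")
route = File("route.json")

# Low-latency object detection runs close to the drone at the edge;
# compute-heavy route planning runs on core HPC/cloud resources.
detect_job = (Job(detect)
              .add_inputs(frames)
              .add_outputs(detections))
route_job = (Job(plan_route)
             .add_inputs(detections)
             .add_outputs(route))

wf = Workflow("drone-video-analytics")
wf.add_replica_catalog(rc)
wf.add_transformation_catalog(tc)
wf.add_jobs(detect_job, route_job)  # dependency inferred from the shared file

# Write the abstract workflow; planning and submission (pegasus-plan) would
# then map the jobs onto the configured edge and core execution sites.
wf.write("workflow.yml")
```

In this kind of setup, the abstract workflow only declares the data flow between the edge and core stages; the in-network processing abstractions and end-to-end monitoring described above would appear as additional jobs, site definitions, or runtime services layered on the same workflow, not as changes to the workflow description itself.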
Providing best practices and tools that enable the use of advanced cyberinfrastructure for scientific workflows will have a broad impact on society in the long term. The science drivers that will be supported by this project have the potential to increase the safety and efficiency of drone applications, an area that will grow in significance in the foreseeable future. The project team will enable access to a rich set of resources for researchers and educators from a diverse set of institutions (historically black colleges and universities (HBCU), community colleges, women's colleges) to further democratize research. In addition, collaboration with the NSF REU (Research Experience for Undergraduates) Site in Consumer Networking will promote participation of under-served/under-represented students in project activities.
The project website, http://www.flynet-ci.org, will provide information on overall project activities, outreach activities, publications, tools and software, and the project's team members. The website will be preserved for at least three years after the project ends.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
0.939 |
2021 — 2026 |
Deelman, Ewa Pascucci, Valerio (co-PI) [⬀] Mandal, Anirban (co-PI) [⬀] Nabrzyski, Jaroslaw Murillo, Angela |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Ci Coe: Ci Compass: An Nsf Cyberinfrastructure (Ci) Center of Excellence For Navigating the Major Facilities Data Lifecycle @ University of Southern California
Innovative and robust Cyberinfrastructure (CI) is critical to the science missions of the NSF Major Facilities (MFs), which are at the forefront of science and engineering innovations, enabling pathbreaking discoveries across a broad spectrum of scientific areas. The MFs serve scientists, researchers and the public at large by capturing, curating, and serving data from a variety of scientific instruments (from telescopes to sensors). The amount of data collected and disseminated by the MFs is continuously growing in complexity and size, and new software solutions are being developed at an increasing pace. MFs do not always have all the expertise, human resources, or budget to take advantage of the new capabilities or to solve every technological issue themselves. The proposed NSF Cyberinfrastructure Center of Excellence, CI Compass, brings together experts from multiple disciplines, with a common passion for scientific CI, into a problem-solving team that curates the best of what the community knows; shares expertise and experiences; connects communities in response to emerging challenges; and builds on and innovates within the emerging technology landscape. By supporting MFs to enhance and evolve the underlying CI, the proposed CI Compass will amplify the largest of NSF’s science investments, and have a transformative, broad societal impact on a multitude of MF science and engineering areas and the community of scientists, engineers, and educators MFs serve. CI Compass will also impact the broader NSF CI ecosystem through dissemination of CI Compass outcomes, which can be adapted and adopted by other large-scale CI projects and thus empower them to more efficiently serve their user communities.
The goal of the proposed CI Compass is to enhance the CI underlying the MF data lifecycle (DLC) that represents the transformation of raw data captured by state-of-the-art scientific MF instruments into interoperable and integration-ready data products that can be visualized, disseminated, and converted into insights and knowledge. CI Compass will engage with MFs and contribute knowledge and expertise to the MF DLC CI by offering a collection of services that includes evaluating CI plans, helping design new architectures and solutions, developing proofs of concept, and assessing applicability and performance of existing CI solutions. CI Compass will also enable knowledge-sharing across MFs and the CI community, by brokering connections between MF CI professionals, facilitating topical working groups, and organizing community meetings. CI Compass will also disseminate the best practices and lessons learned via online channels, publications, and community events.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
1 |