1997 — 2002
Chervenak, Ann
CAREER: The Personal Terabyte Disk: Managing and Exploiting Large Future Magnetic Disks @ Georgia Tech Research Corporation
Magnetic disk drives are expected to increase in capacity and drop in price at a rate of over 50% per year for at least a decade. By 2010, a terabyte of magnetic disk storage for personal computers will cost a few hundred dollars. The Personal Terabyte Project is developing a set of tools for managing and exploiting this large personal data repository. To guarantee reliability of personal terabyte data, backup techniques including compression, incremental backups, and selective backups of essential data will be implemented, as will disk array redundancy techniques. Future homes will have a variety of network connections, from high-bandwidth, inexpensive broadcast connections to higher-cost, lower-bandwidth point-to-point links. The personal terabyte will cache relevant data that arrives on broadcast networks and aggressively prefetch additional data over point-to-point, on-demand connections. As data are stored on the personal terabyte, the toolkit will perform automatic indexing of text, images and video to facilitate finding data on the massive file system. To address the gap between memory and disk access times, the toolkit will prefetch data from disk to main memory. Finally, to improve application performance, the toolkit will facilitate sharing of the disk cache among local and remote processes.
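The caching and prefetching behavior described in this abstract can be made concrete with a small sketch. The following Python toy is an editorial illustration, not project code; the class and method names are hypothetical. It keeps an LRU cache of items that arrive on the broadcast channel and aggressively prefetches related items over the on-demand link:

from collections import OrderedDict

class PersonalTerabyteCache:
    """Toy LRU cache for data arriving on a broadcast channel, with
    aggressive prefetching of related items over a point-to-point link."""

    def __init__(self, capacity, fetch_on_demand, related_items):
        self.capacity = capacity                # max number of cached items
        self.fetch_on_demand = fetch_on_demand  # callable: key -> data (costly point-to-point fetch)
        self.related_items = related_items      # callable: key -> iterable of keys worth prefetching
        self.store = OrderedDict()

    def _put(self, key, data):
        self.store[key] = data
        self.store.move_to_end(key)             # mark as most recently used
        while len(self.store) > self.capacity:
            self.store.popitem(last=False)      # evict least recently used

    def on_broadcast(self, key, data):
        # Cache relevant data as it arrives on the inexpensive broadcast network.
        self._put(key, data)

    def read(self, key):
        # Serve a request; on a miss, fetch over the on-demand link,
        # then aggressively prefetch related items.
        if key not in self.store:
            self._put(key, self.fetch_on_demand(key))
        self.store.move_to_end(key)
        data = self.store[key]
        for related in self.related_items(key):
            if related not in self.store:
                self._put(related, self.fetch_on_demand(related))  # aggressive prefetch
        return data

In this sketch the eviction policy is plain LRU; the project's actual criteria for deciding which broadcast data are "relevant" are not specified in the abstract.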
1999 — 2005
Schwan, Karsten (co-PI); Ahamad, Mustaque (co-PI); Atkeson, Christopher; Ramachandran, Umakishore; Chervenak, Ann
CISE Research Infrastructure: Advanced Media-Oriented Systems Research: Ubiquitous Capture, Interpretation, and Access @ Georgia Tech Research Corporation
Award EIA-9972872
Georgia Tech researchers will perform research on systems- and application-level issues arising from two applications: a virtual classroom and perceptual information spaces. The PIs will perform research addressing capture, interpretation, access and delivery of multiple media streams. At the application level, research will address deducing users' expressions (e.g., focus of interest) and emotions; scaling to the campus-wide level will be explored for the virtual classroom, and the equipment will support research on perceptual processing. Systems-level research will examine QoS methods, shared state, metadata, and storage architecture for multimedia lectures and other objects. Middleware and runtime systems will be developed for inter-cluster and client-server computing as well as to handle media streams in a heterogeneous environment.
2009 — 2013
Chervenak, Ann; Deelman, Ewa (co-PI)
DC: Medium: Intelligent Data Placement in Support of Scientific Workflows @ University of Southern California
Transformative research is conducted via computational analyses of large data sets in the terabyte and petabyte range. These analyses are often enabled by scientific workflows, which provide automation and efficient, reliable execution on campus and national cyberinfrastructure resources. Workflows face many data management issues, such as locating input data, finding storage co-located with computing capabilities, and efficiently staging data so that computation progresses but storage resources do not fill up. Such data placement decisions must be made within the context of individual workflows and across multiple concurrent workflows. Scientific collaborations also need to perform data placement operations to disseminate and replicate key data sets. Additional challenges arise when multiple scientific collaborations share cyberinfrastructure and compete for limited storage and compute resources. This project will explore the interplay between data management and computation management in these scenarios. The project includes the design of algorithms and methodologies that support large-scale data management for efficient workflow-based computations, composed of individual analyses and workflow ensembles, while preserving policies governing data storage and access. The algorithms will be evaluated by their impact on the performance of synthetic and real-world workflows running on simulated and physical cyberinfrastructures. New approaches to data and computation management can potentially transform how scientific analyses are conducted at the petascale. Besides advancing computer science, this work will have direct impact on data and computation management for a range of scientific disciplines that manage large data sets and use them in complex analyses running on cyberinfrastructure.
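One placement decision sketched in this abstract, staging a workflow's input to a site that has both spare compute and enough free storage, can be illustrated with a short hypothetical Python example. The site model and greedy rule below are editorial assumptions, not the project's algorithms:

from dataclasses import dataclass

@dataclass
class Site:
    name: str
    free_storage_gb: float
    idle_cores: int
    has_replica: bool      # site already holds the needed input data set

def place_job(sites, input_size_gb, cores_needed):
    # Greedy rule: prefer sites that already hold a replica (no staging
    # transfer needed); otherwise require room to stage the input without
    # filling the site's storage.
    candidates = [s for s in sites
                  if s.idle_cores >= cores_needed
                  and (s.has_replica or s.free_storage_gb >= input_size_gb)]
    if not candidates:
        return None        # defer the job rather than exhaust storage
    candidates.sort(key=lambda s: (not s.has_replica, -s.idle_cores))
    return candidates[0]

sites = [Site("campus", free_storage_gb=50.0, idle_cores=64, has_replica=False),
         Site("national", free_storage_gb=500.0, idle_cores=16, has_replica=True)]
print(place_job(sites, input_size_gb=120.0, cores_needed=8).name)   # -> national

A production placement service would additionally weigh transfer costs, storage and access policies, and the needs of concurrent workflows, as the abstract notes.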
2011 — 2015
Chervenak, Ann
Collaborative Research: SDCI Net: Policy-Driven Large-Scale Data Access Framework with Lightweight Performance Monitoring and Estimation @ University of Southern California
Large-scale science applications are expected to generate exabytes of data over the next 5 to 10 years. With scientific data collected at unprecedented volumes and rates, the success of large scientific collaborations will require that they provide distributed data access with reduced access latencies and increased reliability to a large user community. To meet these requirements, scientific collaborations are increasingly replicating large datasets over high-speed networks to multiple sites. The main objective of this work is to develop and deploy a general-purpose data access framework for scientific collaborations that provides lightweight performance monitoring and estimation, fine-grained and adaptive data transfer management, and enforcement of site and Virtual Organization policies for resource sharing. Lightweight mechanisms will collect monitoring information from data movement tools without placing extra load on the shared resources. Data transfer management mechanisms will select transfer properties based on each transfer's performance estimate and will adapt those properties when observed performance changes due to the dynamic load on storage, network and other resources. Finally, policy-driven resource management using Virtual Organization policies regarding replication and resource allocation will balance user requirements for data freshness against the load on resources.
Intellectual merit: The team will produce a software framework that improves the ability of distributed scientific collaborations to provide efficient access to replicated datasets for a large community of users; this framework will combine fine-grained transfer management, transfer advice from policy-driven resource management, and lightweight monitoring. Broader impact: The proposed development will facilitate scientific advances in many domains that increasingly depend on replication and sharing of ever-growing datasets.
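The adaptive transfer management described above can be sketched as a simple feedback rule. This is an editorial illustration; the function name, thresholds, and parallel-stream knob are assumptions, not the framework's API:

def adapt_streams(current_streams, observed_mbps, estimated_mbps,
                  min_streams=1, max_streams=16):
    # Compare observed throughput against the lightweight performance
    # estimate and nudge the number of parallel streams accordingly,
    # avoiding extra load on shared storage and network resources.
    ratio = observed_mbps / max(estimated_mbps, 1e-9)
    if ratio < 0.8 and current_streams < max_streams:
        return current_streams + 1    # under-performing: widen the transfer
    if ratio > 1.2 and current_streams > min_streams:
        return current_streams - 1    # doing better than needed: back off
    return current_streams

In a real deployment the estimate itself would come from the monitoring component, and the adjustment would also consult site and Virtual Organization policies.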
2015 — 2016
Yao, Ke-Thia (co-PI); Nakano, Aiichiro (co-PI); Chame, Jacqueline; Chervenak, Ann
CDS&E: In Situ Data Analysis and Scalable Machine Learning for Exascale Scientific Simulations @ University of Southern California
Large-scale scientific simulations are used in a range of application domains, including materials science, climate modeling, and combustion. These simulations are often limited by the hardware on which they run, including the capacity of computational platforms and storage systems. The goals of these simulations may include finding rare but interesting events in simulation output, discovering common sequences of events, and discovering causality among events. Often, these simulations consume all available computational resources on a high-performance computing platform during a run and are forced to rely on data sampling techniques that decrease the size of the simulation output so that it can be stored, transferred and post-processed. Such sampling reduces the quality of science results, since not all available data are used during analysis. This project aims to greatly improve the scale and quality of scientific simulation results using innovative "in situ" algorithms and machine learning techniques for rare event detection. The research will be validated using a large-scale materials science simulation of a self-healing nanomaterial system capable of sensing and repairing damage in harsh chemical environments and under high-temperature, high-pressure operating conditions. Self-healing is significant because it can improve the reliability and lifetime of materials while reducing the cost of manufacturing, monitoring and maintenance of high-temperature turbines and wind, solar energy and lighting systems. The research can be generalized to a range of scientific simulation domains that share the common goals of discovering rare and interesting events, sequences of events, and causality among events. Finally, the research concepts and results will be incorporated into graduate-level courses taught by the research team.
The goal of the project is to demonstrate the feasibility, performance and scalability of the research approaches in greatly improving the quality of exascale scientific simulations using in situ machine learning algorithms within a well-defined, reusable in situ software framework. The scope of the project includes: selecting a simplified, but representative, long-time material process suitable for super-state parallel replica dynamics (SPRD); developing in situ machine learning algorithms for rare-event detection of super-state transitions; and studying library-based approaches to support the high-performance coupling of exascale simulations with in situ machine learning algorithms. To accomplish the project goals, the following three objectives are defined: 1) Prove the feasibility, performance and scalability of in situ SPRD simulation for predicting long-time material processes; 2) Prove the feasibility, performance and scalability of in situ machine learning algorithms for rare-event detection of super-state transitions; and 3) Prove the feasibility, performance and scalability of in situ library-based approaches to coupling exascale simulations and machine learning algorithms.
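As a rough illustration of in situ rare-event detection, the following self-contained Python sketch flags simulation frames whose summary statistic deviates sharply from a running baseline, so that only interesting frames need to be written out. The statistic, threshold, and Welford-style update are editorial assumptions, not the project's algorithms:

import math

class RareEventDetector:
    """Flag values that deviate strongly from the running mean,
    using Welford's online algorithm for mean and variance."""

    def __init__(self, threshold=4.0):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0                  # running sum of squared deviations
        self.threshold = threshold     # standard deviations that count as "rare"

    def observe(self, value):
        # Returns True if `value` looks like a rare event (e.g., a candidate
        # super-state transition) relative to the history seen so far.
        rare = False
        if self.n >= 2:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(value - self.mean) / std > self.threshold:
                rare = True
        # Fold the value into the running baseline (Welford's update).
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)
        return rare

Running in situ, such a detector inspects every frame as it is produced, without ever staging the full output to disk, which is the gap that post-hoc data sampling leaves open.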