2014 — 2017
Tatineni, Mahidhar; Majumdar, Amitava
BIGDATA: F: DKM: Collaborative Research: Scalable Middleware for Managing and Processing Big Data on Next Generation HPC Systems @ University of California-San Diego
Managing and processing large volumes of data and gaining meaningful insights from them is a significant challenge facing the Big Data community. It is therefore critical that the data-intensive computing middleware used to process such data (such as Hadoop, HBase, and Spark) be designed for high performance and scalability in order to meet the growing demands of Big Data applications. While Hadoop, Spark, and HBase are gaining popularity for Big Data processing, these middleware and the associated applications are unable to take advantage of the advanced features of modern High Performance Computing (HPC) systems deployed around the world, including many of the multi-Petaflop systems in the XSEDE environment. Modern HPC systems and their associated middleware (such as MPI and parallel file systems) have exploited advances in HPC technologies (multi-/many-core architectures, RDMA-enabled networking, NVRAMs, and SSDs) over the last decade, but Big Data middleware such as Hadoop, HBase, and Spark has not embraced these technologies. These disparities are taking HPC and Big Data processing into "divergent trajectories."
The proposed research, undertaken by a team of computer and application scientists from OSU and SDSC, aims to bring HPC and Big Data processing into a "convergent trajectory." The investigators will specifically address the following challenges: 1) designing novel communication and I/O runtimes for Big Data processing that exploit the features of modern multi-/many-core, networking, and storage technologies; 2) redesigning Big Data middleware (such as Hadoop, HBase, and Spark) to deliver performance and scalability on modern and next-generation HPC systems; and 3) demonstrating the benefits of the proposed approach for a set of driving Big Data applications on HPC systems. The proposed work targets four major workloads and applications in the Big Data community (namely data analytics, query, interactive, and iterative) using the popular Big Data middleware (Hadoop, HBase, and Spark). The proposed framework will be validated on a variety of Big Data benchmarks and applications, and the proposed middleware and runtimes will be made publicly available to the community. The research also enables curricular advancements via research in pedagogy for key courses in the new data analytics program at Ohio State and SDSC, among the first of its kind nationwide.
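To make the targeted workloads concrete, the sketch below shows the kind of Spark analytics job (a simple PySpark word count) that middleware of this sort processes. It uses only the stock Spark API; the input path is a placeholder, and nothing in the snippet is specific to the RDMA-enabled runtimes proposed above, which are intended to be transparent to application code.

```python
# Minimal PySpark sketch of a data-analytics workload of the kind targeted by
# the proposed middleware. Stock Spark API only; "input.txt" is a placeholder.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount-sketch").getOrCreate()
sc = spark.sparkContext

lines = sc.textFile("input.txt")                  # any local or HDFS text file
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))  # shuffle stage: the
                                                  # communication-heavy step

for word, n in counts.take(10):                   # pull a small sample to the driver
    print(word, n)

spark.stop()
```

The reduceByKey shuffle is exactly the communication-intensive stage where an RDMA-capable runtime on HPC interconnects would be expected to help.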
2016 — 2019
Majumdar, Amitava; Tatineni, Mahidhar
SHF: Large: Collaborative Research: Next Generation Communication Mechanisms Exploiting Heterogeneity, Hierarchy and Concurrency for Emerging HPC Systems @ University of California-San Diego
This award was partially supported by the CIF21 Software Reuse Venture whose goals are to support pathways towards sustainable software elements through their reuse, and to emphasize the critical role of reusable software elements in a sustainable software cyberinfrastructure to support computational and data-enabled science and engineering.
Parallel programming based on MPI (Message Passing Interface) is used with increasing frequency in academia and government (for defense and non-defense purposes), as well as in emerging areas such as scalable machine learning and big data analytics. The emergence of Dense Many-Core (DMC) architectures like Intel's Knights Landing (KNL) and accelerator/co-processor architectures like NVIDIA GPGPUs is enabling the design of systems with high compute density. This, coupled with the availability of Remote Direct Memory Access (RDMA)-enabled commodity networking technologies like InfiniBand, RoCE, and 10/40 GigE with iWARP, is fueling the growth of multi-Petaflop and ExaFlop systems. These DMC architectures have unique characteristics: deeper levels of hierarchical memory; revolutionary network interconnects; and heterogeneous compute power and data movement costs, with heterogeneity at both the chip and node levels. For these emerging systems, a combination of MPI and other programming models, known as MPI+X (where X can be PGAS, Tasks, OpenMP, OpenACC, or CUDA), is being targeted. Current-generation communication protocols and mechanisms for MPI+X programming models cannot efficiently support these emerging DMC architectures. This leads to the following broad challenges: 1) How can high-performance and scalable communication mechanisms for next-generation DMC architectures be designed to support MPI+X (including task-based) programming models? and 2) How can current and next-generation applications be designed or co-designed with the proposed communication mechanisms?
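For readers unfamiliar with the programming model being extended, the sketch below shows a minimal MPI program (via mpi4py) performing a collective reduction; in an MPI+X setting, the "X" component (OpenMP, CUDA, tasks, and so on) would run within each rank. This is generic MPI usage, not specific to the designs proposed here, though it runs unchanged on an MPI library such as MVAPICH2.

```python
# Minimal MPI sketch (mpi4py) of the message-passing model discussed above.
# Launch with, e.g.: mpirun -np 4 python allreduce_sketch.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Each rank contributes one value; the collective sums across all ranks.
local_value = rank
total = comm.allreduce(local_value, op=MPI.SUM)

if rank == 0:
    print(f"{size} ranks participated; sum of ranks = {total}")
```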
A synergistic and comprehensive research plan, involving computer scientists from The Ohio State University (OSU) and the Ohio Supercomputer Center (OSC) and computational scientists from the Texas Advanced Computing Center (TACC), the San Diego Supercomputer Center (SDSC), and the University of California San Diego (UCSD), is proposed to address these broad challenges with innovative solutions. The research will be driven by a set of applications from established NSF computational science researchers running large-scale simulations on Stampede, Comet, and other systems at OSC and OSU. The proposed designs will be integrated into the widely used MVAPICH2 library and made available for public use. Multiple graduate and undergraduate students will be trained under this project as future scientists and engineers in HPC. The established national-scale training and outreach programs at TACC, SDSC, and OSC will be used to disseminate the results of this research to XSEDE users. Tutorials will be organized at XSEDE, SC, and other conferences to share the research results and experience with the community.
2019 — 2020
Norman, Michael; Majumdar, Amitava; Altintas, Ilkay; Strande, Shawn; Tatineni, Mahidhar
Category I. Computing Without Boundaries: Cyberinfrastructure For the Long Tail of Science @ University of California-San Diego
Science and engineering rely upon an increasingly complex and integrated ecosystem of advanced computing and data systems, scientific software, and expertise to conduct research that leads to new knowledge and discovery, and improves the Nation's competitiveness and the health and welfare of its citizens. From the building blocks of life on earth to the deepest mysteries of the universe, researchers use this cyberinfrastructure (CI) to carry out computation and analysis at ever larger scales and complexity. Computing without Boundaries: Cyberinfrastructure for the Long Tail of Science is a transformational project from the San Diego Supercomputer Center at the University of California, San Diego, designed to address these challenges. The centerpiece of the project is the acquisition, deployment, and operation of Expanse, a powerful supercomputer that will complement and extend NSF's Innovative High-Performance Computing (HPC) program. Expanse will: 1) increase the capacity and performance for thousands of users of batch-oriented and science gateway computing; and 2) provide new capabilities that will enable research increasingly dependent upon heterogeneous and distributed resources composed into integrated and highly usable CI. Expanse will feature innovations in system software, operations, and support that extend its capabilities far beyond the limits of the physical system. Through its integration with the public cloud and the Open Science Grid, collaboration with the Science Gateway Community Institute, and support for composable systems, Expanse will become part of a more inclusive national CI. The long tail of computing reflects diversity in science, researchers, their institutions, and those who support and operate CI. The project will reach out to underserved and under-resourced communities through new initiatives like HPC@MSI, which will allocate a portion of Expanse to Minority Serving Institutions via a rapid-access allocation to help them quickly use this powerful new resource.
Projected to have a peak speed of 5 Petaflop/s, Expanse will feature next-generation Intel Central Processing Units (CPUs), NVIDIA Graphics Processing Units (GPUs), and a Mellanox InfiniBand network. It will be composed of 13 SDSC Scalable Compute Units (SSCUs), each of which contains 56 CPU nodes and 4 GPU nodes along with over 60 TB of distributed non-volatile memory storage for user scratch. The SSCUs will be integrated with a 12-PB Lustre parallel file system and 7 PB of object storage. SDSC, along with its partners Dell and Aeon Computing, will deploy Expanse in SDSC's energy-efficient data center on the UCSD campus. Expanse will be connected to multiple high-performance research and education networks at 100 Gbps and will reach thousands of users who require high-performance but modest-scale resources. Allocation and usage policies will be tailored to achieve fast turnaround and responsiveness. Experts in computational science, data-intensive computing, scientific workflows, and large-scale systems operations will support Expanse at the highest levels of utilization, reliability, and usability. Through integration with national CI, Expanse will enable new models of computing and research that require the full complement of HPC systems and simulation, experimental data analysis, and computational expertise to support and facilitate breakthrough science.
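A quick tally of the totals implied by the figures above (a rough sketch; per-node core counts and memory sizes are not given in the abstract):

```python
# Back-of-the-envelope totals for the Expanse configuration described above,
# using only numbers stated in the abstract.
SSCU_COUNT = 13
CPU_NODES_PER_SSCU = 56
GPU_NODES_PER_SSCU = 4
NVME_PER_SSCU_TB = 60            # "over 60 TB" per SSCU, so this is a lower bound

cpu_nodes = SSCU_COUNT * CPU_NODES_PER_SSCU      # 728 CPU nodes
gpu_nodes = SSCU_COUNT * GPU_NODES_PER_SSCU      # 52 GPU nodes
nvme_scratch_tb = SSCU_COUNT * NVME_PER_SSCU_TB  # >= 780 TB of NVMe scratch

print(f"CPU nodes: {cpu_nodes}, GPU nodes: {gpu_nodes}, "
      f"NVMe scratch: >= {nvme_scratch_tb} TB")
```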
Knowledge gained through the project will lead to improvements in algorithms, software, and systems management tools, as well as a better understanding of how integrated CI can be configured to support emerging research.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
2019 — 2022
Majumdar, Amitava; Tatineni, Mahidhar
Collaborative Research: Frameworks: Designing Next-Generation MPI Libraries for Emerging Dense GPU Systems @ University of California-San Diego
The extremely high compute and communication capabilities offered by modern Graphics Processing Units (GPUs) and high-performance interconnects have led to the creation of High-Performance Computing (HPC) platforms with multiple GPUs and high-performance interconnects per node. Unfortunately, state-of-the-art production-quality implementations of the popular Message Passing Interface (MPI) programming model do not have the appropriate support to deliver the best performance and scalability for applications on such dense GPU systems. These developments in High-End Computing (HEC) technologies and the associated middleware issues lead to the following broad challenge: How can existing production-quality MPI middleware be enhanced to take advantage of emerging networking technologies to deliver the best possible scale-up and scale-out for HPC and Deep Learning (DL) applications on emerging dense GPU systems? A synergistic and comprehensive research plan, involving computer scientists from The Ohio State University (OSU) and the Ohio Supercomputer Center (OSC) and computational scientists from the Texas Advanced Computing Center (TACC), the San Diego Supercomputer Center (SDSC), and the University of California San Diego (UCSD), is proposed to address this broad challenge with innovative solutions. The proposed framework will be made available to collaborators and the broader scientific community to understand the impact of the proposed innovations on next-generation HPC and DL frameworks and applications in various science domains. Multiple graduate and undergraduate students will be trained under this project as future scientists and engineers in HPC. The proposed work will enable curriculum advancements via research in pedagogy for key courses in the new Data Science programs at OSU, SDSC, and TACC. The established national-scale training and outreach programs at TACC, SDSC, and OSC will be used to disseminate the results of this research to XSEDE users. Tutorials and workshops will be organized at PEARC, SC, and other conferences to share the research results and experience with the community. The project is aligned with the National Strategic Computing Initiative (NSCI) to advance US leadership in HPC and with the recent initiative of the US Government to maintain leadership in Artificial Intelligence (AI).
The proposed innovations include: 1) designing high-performance and scalable point-to-point and collective communication operations that fully utilize multiple network adapters and advanced in-network computing features for GPU and CPU buffers within and across nodes; 2) designing novel datatype processing and unified memory management to improve application performance; 3) designing a CUDA-aware I/O subsystem to accelerate MPI I/O and checkpoint-restart for HPC and DL applications; 4) designing support for containerized environments to enable easy deployment of the proposed solutions on modern cloud environments; and 5) carrying out integrated development and evaluation to ensure proper integration of the proposed designs with the driving applications. The proposed designs will be integrated into the widely used MVAPICH2 library and made available. The project team members will work closely with internal and external collaborators to facilitate wide deployment and adoption of the released software. The proposed solutions will be targeted to enable scale-up and scale-out of the driving science domains (molecular dynamics, lattice QCD, seismology, image classification, and fusion research) on emerging dense GPU platforms. The transformative impact of the proposed development effort is to achieve scalability, performance, and portability for HPC and DL frameworks and applications on emerging dense GPU platforms, leading to significant advancements in science and engineering.
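As an illustration of the GPU-resident communication these designs target, the hedged sketch below performs an Allreduce directly on GPU buffers via mpi4py and CuPy. It assumes a CUDA-aware MPI library underneath (for example, a GDR-enabled MVAPICH2 build); without such support the device buffers would have to be staged through host memory, which is exactly the overhead GPU-aware designs avoid.

```python
# Hedged sketch: a GPU-buffer Allreduce, assuming the underlying MPI library
# is CUDA-aware so device pointers can be passed to MPI directly.
import cupy as cp
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each rank fills a 1M-element vector on its GPU with its own rank id.
sendbuf = cp.full(1 << 20, rank, dtype=cp.float32)
recvbuf = cp.empty_like(sendbuf)

# MPI is not stream-aware: make sure the device buffers are ready first.
cp.cuda.get_current_stream().synchronize()
comm.Allreduce(sendbuf, recvbuf, op=MPI.SUM)

if rank == 0:
    # With P ranks, every element of the result is 0 + 1 + ... + (P - 1).
    print("reduced value per element:", float(recvbuf[0]))
```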
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
2021 — 2026
Wuerthwein, Frank; Rosing, Tajana (co-PI); Defanti, Thomas; Tatineni, Mahidhar; Weitzel, Derek
Category II: A Prototype National Research Platform @ University of California-San Diego
Advances in data-intensive science and engineering research, supported by ongoing developments in cyberinfrastructure, enable new insights and discoveries. Among these are progress in understanding fundamental processes and mechanisms, from human public health to the health of the planet; advances in predicting and responding to natural disasters; and the increasing interconnectedness of science and engineering across many fields, including astronomy, extreme-scale systems management, cell biology, high energy physics, social science, and satellite image analysis. Fundamentally new system architectures are required to accelerate such advances, including capabilities that integrate diverse computing and data resources, research and education networks, edge computing devices, and scientific instruments into highly usable and flexible distributed systems. Such systems provide technological platforms for conducting research and can catalyze distributed, multidisciplinary teams that develop new and transformative approaches to disciplinary and multidisciplinary research problems.
Recent reports, informed through community visioning, including the NSF-supported report “Transforming Science Through Cyberinfrastructure”, note that a cyberinfrastructure (CI) ecosystem designed to be open and scalable, and to grow with time, may advance through in-kind contributions of compute and data resources by the national science and education community. This CI ecosystem may be viewed “more holistically as a spectrum of computational, data, software, networking, and security resources, tools and services, and computational and data skills and expertise that can be seamlessly integrated and used, and collectively enable new, transformative discoveries across S&E [science and engineering]”.
Aligned with this vision of a national-scale CI ecosystem, the San Diego Supercomputer Center (SDSC) at the University of California, San Diego (UCSD), in association with partners at the University of Nebraska, Lincoln (UNL) and the Massachusetts Green High Performance Computing Center (MGHPCC), will deploy the “Prototype National Research Platform” (NRP). This novel, national-scale, distributed testbed architecture includes: a high-performance subsystem to be deployed at SDSC that integrates advanced processors connected to extremely low-latency national Research and Education (R&E) networks operating at multiple 100 Gbps speeds; additional highly optimized subsystems, each comprising 288 Graphics Processing Units (GPUs), to be deployed at UNL and MGHPCC and likewise interconnected to the R&E networks at 100 Gbps at each location; a minimum of an additional 1 PB of high-performance disk storage at each of the three sites to establish a Content Delivery Network (CDN) providing prototype-caliber access to data anywhere in the nation within a round-trip time (RTT) of ~10 ms, available through a set of eight optimally positioned 50 TB Non-Volatile Memory Express (NVMe)-based network caches; and an innovative system software environment enabling centralized management of the nationally distributed testbed system. Additionally, the system architecture will remain open to future growth through the integration of additional capabilities via a novel “bring your own resource” program.
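A rough capacity tally from the numbers in the preceding paragraph (a sketch only; the abstract states minimums, so these are lower bounds):

```python
# Lower-bound capacity totals for the NRP testbed described above,
# computed from figures stated in the abstract.
GPU_SUBSYSTEMS = 2               # one each at UNL and MGHPCC
GPUS_PER_SUBSYSTEM = 288
SITES_WITH_DISK = 3              # SDSC, UNL, MGHPCC
DISK_PER_SITE_PB = 1             # "a minimum of an additional 1 PB" per site
CACHE_NODES = 8
CACHE_SIZE_TB = 50

total_gpus = GPU_SUBSYSTEMS * GPUS_PER_SUBSYSTEM    # 576 GPUs at the two GPU sites
total_disk_pb = SITES_WITH_DISK * DISK_PER_SITE_PB  # >= 3 PB of CDN disk
total_cache_tb = CACHE_NODES * CACHE_SIZE_TB        # 400 TB of NVMe cache

print(f"GPUs: {total_gpus}, CDN disk: >= {total_disk_pb} PB, "
      f"NVMe cache: {total_cache_tb} TB")
```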
The project is structured as a three-year testbed phase followed by a two-year allocations phase. During the testbed phase, SDSC researchers, working closely with collaborators at UNL and MGHPCC as well as with a small number of research teams, will evaluate the NRP architecture and the performance of its constituent components. Semiannual workshops will bring teams together to share lessons learned, develop the knowledge and best practices needed to inform researchers, and explore how the innovative architecture can accelerate S&E discoveries from ideas to publications. During the allocations phase, the NRP will be available to researchers with projects deemed meritorious by an NSF-approved allocation process. Workshops will continue through the allocations phase.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.