1991 — 1994
Choudhary, Alok
RIA: Design, Analysis, Simulation and Evaluation of Multi-Level Caches For Scalable Multiprocessors
This research studies the role and the performance of multi-level caches in scalable shared memory multiprocessors. The objectives of this project are: (1) evaluate and characterize performance of multi-level caches for multiprocessors as a function of cache parameters, cache coherency protocol, and program characteristics using trace-driven simulations; (2) evaluate cost-performance tradeoffs in selecting various multi-level cache configurations for scalable shared memory architectures using different workload parameters; (3) investigate and develop trace reduction and sampling techniques to speed up simulations to study multi-level cache performance in multiprocessors; (4) develop analytical models to evaluate performance of multi-level caches in scalable architectures; and (5) investigate what minimal set of characteristic metrics must be measured to predict multi-level cache performance over a wide range of cache parameters.
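For illustration, trace-driven simulation of a single cache level reduces to a short replay loop; the sketch below is a hedged example (the direct-mapped geometry, the one-hex-address-per-line trace format, and all constants are arbitrary assumptions, not the project's simulator):

    /* Minimal trace-driven simulation of one direct-mapped cache level.
     * Cache geometry and the one-address-per-line hex trace format are
     * arbitrary assumptions for illustration. */
    #include <stdio.h>
    #include <stdint.h>

    #define LINE_BITS 6        /* 64-byte cache lines */
    #define NUM_SETS  1024     /* 64 KB direct-mapped cache */

    int main(void) {
        static uint64_t tag[NUM_SETS];
        static int valid[NUM_SETS];
        unsigned long long addr, hits = 0, refs = 0;

        while (scanf("%llx", &addr) == 1) {   /* replay the trace */
            uint64_t line = addr >> LINE_BITS;
            uint64_t set  = line % NUM_SETS;
            refs++;
            if (valid[set] && tag[set] == line / NUM_SETS)
                hits++;                        /* hit */
            else {
                valid[set] = 1;                /* miss: fill the line */
                tag[set]   = line / NUM_SETS;
            }
        }
        if (refs)
            printf("hit rate %.4f over %llu references\n",
                   (double)hits / refs, refs);
        return 0;
    }

Multi-level and multiprocessor studies of the kind proposed would chain such levels and add a coherence protocol, but the replay loop stays the same.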

1993 — 2000
Choudhary, Alok
NSF Young Investigator: Compiler and Runtime Optimization Techniques For Parallel Programming On Distributed Memory Machines @ Northwestern University

1996 — 2000
Choudhary, Alok
System Software Support For Input-Output On Parallel Computing @ Northwestern University |

1997 — 2003
Scheuermann, Peter (co-PI); Lee, D. (co-PI); Banerjee, Prithviraj; Sarrafzadeh, Majid (co-PI); Choudhary, Alok; Taylor, Valerie; Hauck, Scott (co-PI)
CISE Research Infrastructure: A Distributed High-Performance Computing Infrastructure @ Northwestern University
CDA-9703228 Prithviraj Banerjee Northwestern University A Distributed High-Performance Computing Infrastructure This award is for the acquisition of 20 high-end UNIX workstations, 50 low-end UNIX workstations, three UNIX fileservers, an 8-processor distributed shared memory multiprocessor, and a 64-ported ATM switching hub. The machines would be networked together using high-speed OC-3 ATM networks with bandwidths of 155 Mbps. As the use of high-speed networking moves from the laboratory to the workplace, new opportunities arise for the design and implementation of a high-speed distributed computing environment. The goals of this project are: (1) to explore the use of high-speed networking and computing to investigate file systems and data management issues for high-performance distributed computing, (2) to investigate the parallel programming support of networks of high-speed workstations and personal computers as an alternative to stand-alone parallel computers, (3) to study high-performance computer-aided design of electronic systems in a heterogeneous environment, and to develop a Web-based CAD computing center, that takes advantage of high-speed networking, (4) to explore new instructional techniques that take advantage of the high bandwidth and high speed.

1997 — 2002
Haines, Matthew; Choudhary, Alok
Interoperable Data Files For High-Performance Computing
This proposal addresses the problem of providing high-performance access to interoperable data files. Scientific databases have been touted as being the solution to the I/O problems faced by scientific applications. Based on relational, object-oriented, or hybrid data models, these systems help to improve the access times to the data. In some cases they even permit parallel access to the database. However, these systems do not address the problem of interoperability. We propose to synthesize our previously independent research in both of these areas to create data files that support both high-performance parallel access and interoperability. More importantly, we propose to develop access methods as a part of the system interface that incorporate the notion of collective I/O for high performance, functions that can learn from access patterns to further improve on performance for future access patterns, and a data description language that allows users to define abstract file types and specify layouts and other hints that will improve access while maintaining interoperability.
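Collective I/O of the kind invoked here is exposed in MPI-IO through its *_all routines; the sketch below is a hedged illustration (the file name and the contiguous per-rank layout are assumptions, not this project's interface), in which every process writes its block of a shared file in one collective call:

    /* Each rank writes its contiguous block of a shared file in a single
     * collective call, allowing the library to merge requests across
     * processes (two-phase I/O). Illustrative sketch only. */
    #include <mpi.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        enum { N = 1024 };
        double buf[N];
        for (int i = 0; i < N; i++)
            buf[i] = rank + i * 1e-6;          /* fill with sample data */

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "shared.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        MPI_Offset off = (MPI_Offset)rank * N * sizeof(double);
        MPI_File_write_at_all(fh, off, buf, N, MPI_DOUBLE, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        MPI_Finalize();
        return 0;
    }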

2001 — 2004
Choudhary, Alok; Kandemir, Mahmut
NGS: Scalable I/O Management and Access Optimizations For Scientific Applications For High-Performance Computing @ Northwestern University
EIA-0103023 Alok Choudhary Northwestern University
Scalable I/O Management & Access Optimizations for Scientific Applications for High-Performance Computing
The main objective of this proposal is to address the problems of large-scale storage; performance management of I/O; automatic performance optimization of I/O using historical information and access patterns; and data management, analysis, and access through simple interfaces that permit the flow of access information to lower-level software so that higher-level information can be exploited. Furthermore, since analysis at such a scale is simply not feasible if done manually (e.g., visualization alone or off-line analysis), integration of on-line analysis and feature extraction while simulations and experiments are executing is very important. Our observation is that neither parallel file systems nor runtime systems and database management systems (DBMS) fully address the large-scale data management problem, as they lack global information about the applications' access patterns and most of them are not effective in handling storage hierarchies.
We believe that the results from the proposed research will enable scientists to address one of the most important bottlenecks in computational simulation cycles; namely, the bottleneck of analyzing and managing massive data in high-performance distributed computing environments (such as the Grid).

2002 — 2005
Choudhary, Alok; Mambretti, Joel; Dinda, Peter; Taylor, Valerie
Collaborative Research: DOT -- Distributed Optical Testbed to Facilitate the Development of Techniques For Efficient Execution of Distributed Applications @ Northwestern University
EIA-0224427 Taylor, Valerie; Choudhary, Alok; Dinda, Peter A.; Mambretti, Joel J. Northwestern University
Title: CISE RR: (Collaborative) DOT--Distributed Optical Testbed to Facilitate the Development of Techniques for Efficient Execution of Distributed Applications
This collaborative proposal with Illinois Institute of Technology (Sun, 02-24377) and the University of Chicago (Foster, 02-24187), acquiring data nodes and compute nodes at five sites, contributes to building a Distributed Optical Testbed (DOT). The DOT system, a product of the paradigm shift from large-scale applications running on large parallel systems at single sites to those running on distributed systems, has come about through the availability of high-speed optical networks (e.g., Starlight, the TeraGrid 40 Gb/s network, the PacificRail 10 Gb/s network). This shift necessitates techniques that allow applications to efficiently utilize distributed systems. In contrast to parallel systems, these systems must exploit two characteristics: heterogeneity of resources (processors and networks) and dynamic changes in the performance of shared resources, especially wide area networks. The system, consisting of Linux clusters at six geographically different sites interconnected via two existing research DWDM networks, I-WIRE and OMNInet, involves the following sites: Argonne National Laboratory (ANL), Illinois Institute of Technology (IIT), National Center for Supercomputing Applications (NCSA), Northwestern University Chicago Campus (NU-C), Northwestern University Evanston Campus (NU-E), and the University of Chicago (UC). DOT will facilitate the following research activities in the area of distributed applications: dynamic load balancing (Taylor); performance monitoring and prediction (Dinda, Sun, Taylor); and data management (Choudhary, Foster). The first activity develops techniques utilizing network performance predictions that take into consideration the heterogeneity of the processors and networks of distributed systems to dynamically balance the load during execution. The second extends performance monitoring, modeling and prediction techniques that have been focused on parallel systems and broadband networks to distributed systems with optical networks and different topologies. The last develops techniques that manage the distributed data such that the actual data location is transparent and the data is accessed efficiently. These research activities are driven by three applications that have been parallelized using MPI, such that the applications can be easily ported to DOT: ENZO, an adaptive cosmological application; Cactus, an open framework used to solve Einstein's equations; and AudioVoice, a virtualized distributed audio application with physical simulations that have real-time deadlines and varying computational demands. Each application presents challenges, which include adaptivity, flexible frameworks, and simulations with real-time deadlines.

2003 — 2010
Choudhary, Alok; Narahari, Bhagirath (co-PI); Simha, Rahul; Memon, Nasir (co-PI)
ITR: A Hardware/Compiler Co-Design Approach to Software Protection @ George Washington University
ITR: A Compiler-Hardware Co-Design Approach to Software Protection
PIs: Rahul Simha, Bhagi Narahari, Alok Choudhary, Nasir Memon
Abstract:
The growing area of software protection aims to address the problems of code understanding and code tampering along with related problems such as authorization. This project will combine novel techniques in the areas of compilers, architecture, and software security to provide a new, efficient, and tunable approach to some problems in software protection. The goal is to address a broad array of research issues that will ultimately enable design tools such as compilers to assist system designers in managing the tradeoffs between security and performance.
The main idea behind the proposed approach is to hide code sequences (keys) within instructions in executables that are then interpreted by supporting FPGA (Field Programmable Gate Array) hardware to provide both a "language" (the code sequences) and a "virtual machine within a machine" (the FPGA) that will allow designers considerable flexibility in providing software protection. Thus, by using long sequences and PKI to exchange a secret key with the FPGA while also encrypting the executable with that secret key, a system can be positioned at the high-security (but low-performance) end of the spectrum. Similarly, as will be explained in the proposal, by using shorter sequences and selective encryption, one can achieve high-performance with higher security than is possible with systems that rely only on obscurity.

2004 — 2009
Choudhary, Alok; Memik, Seda; Memik, Gokhan (co-PI)
Collaborative Research: Ultra-Scalable System Software and Tools For Data-Intensive Computing @ Northwestern University
This project entails research and development to address the software and tools problems for ultra-scale parallel machines, especially targeted at scalable I/O, storage and memory hierarchy. The fundamental premise is that to achieve extreme scalability, incremental changes or adaptation of traditional (extensions of sequential) interfaces and techniques for scaling data accesses and I/O will not succeed, because they are based on pessimistic and conservative assumptions about parallelism, synchronization, and data sharing patterns. We will develop innovative techniques to optimize data access that utilize the understanding of high-level access patterns ("intent"), and use that information through runtime layers to enable optimizations and reduction/elimination of locking and synchronization at different levels. The proposed mechanisms will allow different software layers to interact/cooperate with each other. Specifically, the upper layers in the software stack extract high-level access pattern information and pass it to the lower layers in the stack, which in turn exploit it to achieve ultra-scalability. In particular, the main objectives of this project are: (1) techniques, tools and software for extracting data access patterns and data flow at runtime; (2) interfaces and strategies for passing access patterns across the different layers for optimizations; (3) implementation of these techniques in appropriate layers such as the parallel file system, communication software (e.g., MPI-2), and runtime libraries to reduce or eliminate synchronization and locking; (4) runtime techniques and tools that exploit access patterns for reducing power consumption and cooling requirements for the underlying storage system; and (5) development of interfaces and software to use active storage for data analysis and filtering.
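One established channel for passing such high-level access information ("intent") down the stack is MPI's info-hint mechanism; in this hedged sketch the hint keys are ROMIO's documented collective-buffering and striping hints, while the values and the helper name are arbitrary examples:

    /* Passing access-pattern hints down the I/O stack through MPI_Info.
     * The hint keys are ROMIO's; the values and the helper name
     * open_with_hints are illustrative choices. */
    #include <mpi.h>

    MPI_File open_with_hints(const char *path) {
        MPI_Info info;
        MPI_Info_create(&info);
        MPI_Info_set(info, "romio_cb_write", "enable");    /* collective buffering */
        MPI_Info_set(info, "cb_buffer_size", "16777216");  /* 16 MB aggregation */
        MPI_Info_set(info, "striping_factor", "8");        /* file system stripes */

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, path,
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);
        MPI_Info_free(&info);
        return fh;
    }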

2004 — 2008
Choudhary, Alok
Collaborative Research: NGS: Dynamic Runtime and Compilation Support For I/O-Intensive Applications @ Northwestern University
Abstract, CNS-0406341
Tera-scale high-performance computing has enabled scientists to tackle very large and computationally challenging problems, such as those found in the scientific computing domain. However, as computing scales to levels never seen before, it also becomes extremely data intensive and I/O intensive. Thus, I/O is becoming a major bottleneck, thereby slowing the expected pace of scientific discovery and analysis of data. Furthermore, in order to cope with larger problems and data sizes, models and applications are being designed to be dynamic in nature. That is, the applications are dynamic both in terms of their computation patterns as well as data access patterns. Due to the complexities of systems and applications, it is, therefore, very important to address research issues and develop dynamic techniques at the level of runtime systems and compilers to scale I/O in the right proportions.
Technical Merit:
The objectives of this project are to design and develop next-generation software techniques to address the data, I/O, and storage bottlenecks of large-scale scientific applications. In particular, this project aims to investigate dynamic runtime and compilation techniques for scalable I/O optimizations on large-scale systems. Another important aspect will be to drive these optimizations by learning and characterizing the performance of I/O and data accesses, and subsequently using those characterizations to develop rules that dynamic runtime and compilation systems will use to enable high-performance I/O. Current state-of-the-art compiler support for I/O-intensive applications is severely lacking. The runtime needs of many large-scale I/O-intensive applications can benefit greatly from a robust dynamic compilation and linking infrastructure. The specific objectives of this project are:
. Developing an understanding of dynamically varying data access needs of I/O-intensive applications,
. Capturing dynamic access patterns and application steering information at runtime within a metadata manager,
. Designing and implementing dynamic compilation techniques based on the runtime access patterns and performance statistics collected by the metadata manager (see the runtime-linking sketch after this list),
. Designing and implementing a layout manager that collects storage format (layout) suggestions from multiple concurrently executing applications and determines the globally acceptable storage layouts for disk-resident and tape-resident data,
. Designing and implementing a high-level, dynamic, easy-to-use I/O library that can be invoked by the dynamic compiler/linker,
. Investigating what types of user-specified hints can be passed to the runtime system/compiler, and how they can be incorporated to reduce the overheads associated with dynamic compilation,
. Evaluating the performance of the developed dynamic compilation/linking infrastructure under realistic I/O-intensive workloads and quantifying the runtime overheads associated with dynamic compilation, and
. Providing the developed infrastructure and experimental findings in the public domain, and incorporating the research findings into the undergraduate and graduate curriculum.
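The dynamic compilation and linking infrastructure named in these objectives can be pictured with a minimal POSIX sketch; the generated source, the compiler command, and the symbol optimized_read are all hypothetical stand-ins, not this project's design:

    /* Runtime code generation and linking: emit a specialized I/O
     * routine, compile it to a shared object, and bind it with
     * dlopen/dlsym. All names are hypothetical; link with -ldl. */
    #include <dlfcn.h>
    #include <stdio.h>
    #include <stdlib.h>

    typedef int (*io_fn)(const char *path);

    int main(void) {
        FILE *src = fopen("gen_io.c", "w");
        fprintf(src, "#include <stdio.h>\n"
                     "int optimized_read(const char *p)"
                     "{ printf(\"reading %%s\\n\", p); return 0; }\n");
        fclose(src);

        if (system("cc -shared -fPIC -o gen_io.so gen_io.c") != 0)
            return 1;                        /* compilation failed */

        void *h = dlopen("./gen_io.so", RTLD_NOW);
        if (!h) { fprintf(stderr, "%s\n", dlerror()); return 1; }
        io_fn f = (io_fn)dlsym(h, "optimized_read");
        int rc = f("dataset.dat");           /* call the generated code */
        dlclose(h);
        return rc;
    }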

2005 — 2009
Katsaggelos, Aggelos (co-PI); Choudhary, Alok; Wu, Ying (co-PI); Memik, Seda; Memik, Gokhan (co-PI)
Collaborative Research: High-Performance Techniques, Designs and Implementation of Software Infrastructure For Change Detection and Mining @ Northwestern University
ABSTRACT NSF 0536994, Choudhary NSF 0536947, Fox
Problems in managing, automatically discovering, and disseminating information are of critical importance to national defense, homeland security, and emergency preparedness and response. Much of this data originates from on-line sensors that act as streaming data sources, providing a continuous flow of information. As sensor sources proliferate, the flow of data becomes a deluge, and the extraction and delivery of important features in a timely and comprehensible manner becomes an increasingly difficult problem. More specifically, developing data mining and assimilation tools for data-deluged applications faces three fundamental challenges. First, the amount of distributed real-time streaming data is so large that even current extreme-scale computing cannot effectively process it. Second, today's broadly deployable network protocols and web services do not provide the low latency and high bandwidth required by high-volume real-time data streams and distributed computing resources connected over networks with high bandwidth-delay products. Finally, the vast majority of today's statistical and data mining algorithms assume that all the data is co-located and at rest in files. Here, the real-time data streams are distributed, and the applications that consume them must be optimized to process multiple high-volume real-time streams. The goal is to develop novel algorithms and hardware acceleration schemes to allow real-time statistical modeling and change detection on such large-scale streaming data sets. By using Service Oriented Architecture principles, a framework for integrating high-performance change detection software services, including accelerations of commonly used kernels in statistical modeling, into a Grid messaging substrate will be developed and tested. Geographical Information System (GIS) services will be supported using Open Geospatial Consortium standards to enable geo-referencing.
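A standard kernel for change detection on a stream, of the sort such a framework would accelerate, is the one-sided CUSUM statistic; the sketch below is illustrative only (baseline mean, drift, and threshold are arbitrary values) and stands in for, rather than reproduces, the project's algorithms:

    /* One-sided CUSUM over a stream of samples: flags an upward shift
     * in the mean. mu0, k, and h are arbitrary illustration values. */
    #include <stdio.h>

    int main(void) {
        const double mu0 = 0.0;   /* in-control mean */
        const double k   = 0.5;   /* allowed drift */
        const double h   = 5.0;   /* detection threshold */
        double s = 0.0, x;
        long t = 0;

        while (scanf("%lf", &x) == 1) {   /* one sample per line */
            t++;
            s += x - mu0 - k;
            if (s < 0.0) s = 0.0;         /* clamp at zero */
            if (s > h) {
                printf("change detected at sample %ld\n", t);
                s = 0.0;                  /* restart monitoring */
            }
        }
        return 0;
    }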
This project has the potential to have near-term and long-term impact in several important areas. In the near term, the implementation of kernels and modules of statistical modeling and change detection algorithms will allow end-user applications (e.g., homeland security, defense) to achieve one to two orders of magnitude improvement in performance for data-driven decision support. In the longer term, the availability of toolkits and kernels for the change detection and data mining algorithms will facilitate the development of applications in many areas including defense, security, science and others. Furthermore, this research will bring reconfigurable architectural acceleration to functions on streaming data, including change detection and data mining, thereby opening new avenues of research and enabling newer data-driven applications on complex datasets. Both graduate and undergraduate students (through undergraduate fellowships) are engaged in the research. In addition, team members actively engage with minority-serving institutions using audio/video and distance education tools.

2006 — 2012
Choudhary, Alok; Thakur, Rajeev
Collaborative Research: Scalable I/O Middleware and File System Optimizations For High-Performance Computing @ Northwestern University
For data-intensive applications, I/O and storage layers are extremely critical and often overlooked; they become a bottleneck not only in obtaining scalable performance but also in the utilization and productivity of systems and of application scientists. Along with computational capabilities, scalable software for I/O and storage with the required capacity and performance must be developed in order to address the data-intensive nature of applications and reap benefits in the performance and productivity of high-end systems. This proposal entails research and development to address several parallel I/O problems in the HECURA initiative. In particular, the main goals of this proposal are to design and implement novel I/O middleware techniques and optimizations, parallel file system techniques that scale to ultra-scale systems, techniques that efficiently enable newer APIs including suggested extensions to POSIX for parallelism, and flexible I/O benchmarks that mimic the real and dynamic I/O behavior of science and engineering applications. The PIs propose innovative techniques to optimize data accesses that utilize the understanding of high-level access patterns, and use that information through middleware and file systems to enable optimizations. Specifically, the objectives are to (1) design and develop middleware I/O optimizations and a cache system able to capture small, unaligned, irregular I/O accesses from a large number of processors and use access pattern information to optimize I/O; (2) incorporate these optimizations in MPICH2's MPI-IO implementation to make them available to a large number of users; (3) design scalable parallel file system techniques and optimizations including a versioning parallel file system, programmable and adaptable consistency semantics, layout optimizations, and self-tuning capabilities; (4) design and evaluate enhanced APIs for file system scalability, particularly the recently proposed enhancements to the POSIX interface (API) to enable high-performance parallel I/O; and (5) develop flexible, execution-oriented and scalable I/O benchmarks that mimic the I/O behavior of real science, engineering and bioinformatics applications.
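The small, unaligned, irregular accesses described above are typically presented to MPI-IO as a file view plus a collective call, which is what gives the middleware room to optimize; a hedged sketch for one 2-D block of a global array (sizes and the helper name are illustrative assumptions):

    /* Each rank owns a 2-D block of a global array and writes it through
     * a subarray file view followed by a collective write, the access
     * pattern that two-phase collective I/O merges. Illustrative only. */
    #include <mpi.h>

    void write_block(const char *path, const double *block,
                     int gsize[2], int lsize[2], int start[2]) {
        MPI_Datatype view;
        MPI_Type_create_subarray(2, gsize, lsize, start,
                                 MPI_ORDER_C, MPI_DOUBLE, &view);
        MPI_Type_commit(&view);

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, path,
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        MPI_File_set_view(fh, 0, MPI_DOUBLE, view, "native", MPI_INFO_NULL);
        MPI_File_write_all(fh, block, lsize[0] * lsize[1], MPI_DOUBLE,
                           MPI_STATUS_IGNORE);
        MPI_File_close(&fh);
        MPI_Type_free(&view);
    }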

2006 — 2009
Choudhary, Alok; Zhou, Hai; Dick, Robert (co-PI)
SoD-TEAM: Robust System Design Under Weak Component Assumptions @ Northwestern University
Directorate for Computer and Information Science and Engineering (CISE) Division Computer and Network Systems (CNS) Science of Design (SoD) Program
Proposal Number: 0613967 PI: Hai Zhou PI's Department: Electrical and Computer Engineering Institution: Northwestern University Award: $200,000 for 24 months
Title: "SoD TEAM: Robust System Design Under Weak Component Assumptions"
This project focuses on developing a new "science of design" for large, robust systems. Typically, large-scale robust systems are distributed and reactive with heterogeneous components that may be designed by different contributors; some may be legacy systems and some may be un-trusted third-party programs. Recent rapid development and deployment of the Internet and networked devices, such as cell phones and sensor networks, has resulted in application systems development that is vastly different from traditional software engineering: commercial off-the-shelf components, legacy components, and un-trusted components are generally unavoidable in such a system. Diminished designer control over components poses a challenge: the weaker the component assumptions, the more difficult it is to build a provably-correct system or even a system that meets requirements. This project's "science of design" provides precise specifications for imperfect components, gives functional limits of feasible systems under different assumptions, and provides a methodology to design a robust system under weak component assumptions. The PIs anticipate that their rigorous theory of design and specification (based on adapting and extending the Temporal Logic of Actions -- TLA -- a program logic that expresses both programs and their properties in a single language) will deepen understanding of the relationships and trade-offs between system assurance and component assumptions, particularly for concurrent systems. Such a design methodology (i.e., their language for specifying components and a set of tools for checking system properties) facilitates the design of larger and more secure systems.
Program Manager: Anita J. La Salle Date: June 21, 2006

2006 — 2010
Choudhary, Alok; Memik, Gokhan (co-PI)
Collaborative Research: CRI - Scalable Benchmarks, Software and Data For Data Mining, Analytics and Scientific Discoveries @ Northwestern University
This collaborative project, developing a broad suite of data mining benchmarks, defines benchmark data sets and efficient algorithms for important data mining kernels, establishing a comprehensive benchmark suite for data mining applications. Overall, applications using data mining algorithms now form a large enough percentage to warrant research into the development of a data mining benchmark that can be used to evaluate new processor architectures and serve as a comparison in testing new data mining algorithms. Taking an initial and significant step towards developing benchmarks, test suites and datasets for applications, which can be used to drive the design, implementation, and growth of systems from the processor to the application level, the project specifically pursues the following goals:
- Develop a benchmarking suite that will be used to understand the bottlenecks in high-performance data mining and guide the development of next-generation processors, and
- Devise data mining kernels that can be efficiently executed on existing and future processors.
Benchmarks play a major role in advancing architectures, software scalability, networks, and other IT disciplines. They not only play a role in measuring the relative performance of different systems, but also aid in research and development, from architectures to applications, in terms of quality, scalability, cost, execution time, and other measures. Establishing a benchmark and accompanying tools for data access and usage, performing a detailed analysis of applications in the suite, and developing a testbed to perform these analyses, the work contributes a community resource that can help in design evaluation, comparison, and improvement for processor architecture, algorithms, and scalable systems.
Broader Impact: While providing a standardized way of evaluating and comparing algorithms, applications, designs, and products, the results from this project have the potential to directly impact the advancement of various fields including data mining algorithms and applications, newer architectures, and system design for data-intensive computing. The project opens the way to the development of a new industry segment addressing data-intensive computing, similar to what resulted from media, networking, and signal processing applications. Moreover, the resource contributes to education by providing the community with software, tools, and data that can be used in the classroom.

2007 — 2012
Choudhary, Alok; Beckman, Peter; Liao, Wei-Keng; Ross, Robert; Kandemir, Mahmut
SDCI HPC: Improvement: Parallel I/O Software Infrastructure For Petascale Systems @ Northwestern University
Technical Merit: This project proposes to address the software problem for petascale parallel machines, especially targeting scalable I/O, storage, and systems with deep memory-hierarchy accesses. In particular, this project proposes to improve, enhance, develop, and deploy robust software infrastructure to provide end-to-end scalable I/O performance that utilizes the understanding of high-level access patterns ("intent"), and uses that information through runtime layers to enable optimizations at different levels. We propose mechanisms that allow different software layers to interact and cooperate with each other to achieve end-to-end performance objectives. Specifically, the objectives of this project are to develop, improve and deploy (1) scalable software for end-to-end I/O performance optimizations; (2) Parallel netCDF (PnetCDF) enhancements providing statistical functions and data mining functions; (3) PnetCDF software optimizations using non-blocking I/O mechanisms; (4) MPI-IO caching mechanisms to optimize I/O software stack performance; (5) I/O forwarding and dedicated caching mechanisms important to effectively utilize the structures of upcoming petascale systems; (6) effective benchmarking and testing suites for the I/O stack; (7) an optimization assist tool that, through program analysis, can identify and guide a user to optimize I/O; (8) testing leveraging the mechanisms and tools developed as part of the NMI; and (9) tutorials and tools for helping application scientists incorporate these I/O stack optimizations into their production applications. We also believe that the software and techniques developed in this project will be directly applicable to and useful in other high-level software libraries and formats such as the Hierarchical Data Format (HDF).
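Objective (3), non-blocking I/O in PnetCDF, can be pictured with the sketch below; the ncmpi_* routines are PnetCDF's public API, while the file, dimension, and variable names and sizes are invented for illustration:

    /* Post several non-blocking PnetCDF writes, then flush them in one
     * aggregated collective call. Names and sizes are invented. */
    #include <mpi.h>
    #include <pnetcdf.h>

    void write_step(MPI_Comm comm, const float *u, const float *v,
                    MPI_Offset start[2], MPI_Offset count[2]) {
        int ncid, dimid[2], varu, varv, req[2], st[2];

        ncmpi_create(comm, "out.nc", NC_CLOBBER, MPI_INFO_NULL, &ncid);
        ncmpi_def_dim(ncid, "x", 4096, &dimid[0]);
        ncmpi_def_dim(ncid, "y", 4096, &dimid[1]);
        ncmpi_def_var(ncid, "u", NC_FLOAT, 2, dimid, &varu);
        ncmpi_def_var(ncid, "v", NC_FLOAT, 2, dimid, &varv);
        ncmpi_enddef(ncid);

        /* Post the writes; no data moves yet. */
        ncmpi_iput_vara_float(ncid, varu, start, count, u, &req[0]);
        ncmpi_iput_vara_float(ncid, varv, start, count, v, &req[1]);
        /* Flush both requests as one aggregated operation. */
        ncmpi_wait_all(ncid, 2, req, st);
        ncmpi_close(ncid);
    }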
Broader Impact: We will build upon and leverage our team's collective experience (which includes distribution of widely used and robust software systems for HPC such as ROMIO, MPICH2, PVFS, PnetCDF and NU-Minebench) to distribute software developed in this project for cyberinfrastructure, and therefore, directly impact the scalability of applications in many domains. Through our team's active participation in multiple infrastructure centers (e.g., TeraGrid), we will deploy the software on production systems. We will also incorporate the results and lessons from this project into the various tutorials that are presented by our team members in the area of parallel computing, parallel I/O and systems software in most leading conferences in HPC throughout the world. Through this project and utilizing summer internships, we will provide an opportunity to students to work with application scientists, thereby fostering interdisciplinary collaboration. This project will also support graduate students' work towards advanced degrees. PI Choudhary has graduated more than 23 PhDs, many of whom have joined academia and national labs. Multiple PIs in this project have graduated several female and underrepresented PhDs, and we will continue to enhance this tradition. In addition to incorporating the lessons from this project into various tutorials, we will also incorporate them into classroom material both for undergraduate and graduate level courses as we have done in the past. Finally, we have a strong collaboration with industry in the HPC area and we will leverage that collaboration to provide the outcomes and results of this project to them.

2008 — 2013
Choudhary, Alok; Thakur, Rajeev
Collaborative Research: Advanced Compiler Optimizations and Programming Language Enhancements For Petascale I/O and Storage @ Northwestern University
Peta-scale systems have tens of thousands to millions of cores. To exploit their performance for applications such as climate prediction, environmental modeling, astrophysics, biology, life sciences, etc., the problems of programming these systems must be solved. Programming language enhancements, compiler techniques and runtime support must be developed to enable computing and knowledge discovery at this unprecedented scale. The challenges faced by a user in programming these machines include performance, power, productivity, and portability, which are inter-related in a complex way.
This project entails the design and development of programming language, programming model, and compilation optimizations for I/O and storage performance and power optimizations. The project is investigating "What minimal set of changes or enhancements to programming models, programming languages, and what optimizations to compilers and runtime systems are needed to enable better I/O, file and storage systems performance while optimizing power and improving productivity?" Some specific questions include: What language enhancements can be used to specifically improve the I/O and storage performance? Should interfaces be developed that can be used across languages for I/O? What compiler optimizations are needed? Can the compiler identify and transform code to inform the I/O runtime and storage systems of phases during which disks can be powered down to save power? The project tasks include: the design and development of programming-language enhancements; the design of a compilation framework and performance- and power-oriented I/O optimizations using novel compiler analyses; and the design of a novel hint-handling mechanism within the I/O stack.

2009 — 2016
Scheuermann, Peter (co-PI); Trajcevski, Goce; Choudhary, Alok
NETS: Large: Collaborative Research: Context-Driven Management of Heterogeneous Sensor Networks @ Northwestern University
Wireless sensor networks (WSNs) composed of smart sensors interconnected over wireless links are quickly becoming the technology of choice for monitoring and measuring geographically distributed physical, chemical, or biological phenomena in real time. Dynamic WSN environments encountered in environmental monitoring, surveillance, pollution control, and reconnaissance applications require responsive management of WSN resources and their adaptive allocation to sensing, networking support, localization, and planning tasks, based on user requests and changes in the environment. A specified quality of service should, however, be ensured for criteria such as resolution of the raw data, latency, network reconfiguration delay, and resource utilization in the steady state. This project develops an integrated cross-layered approach to networking, databases, control, mobility management, and information processing in WSNs. In particular, context-aware and energy-efficient solutions are pursued that are based on opportunistic sensing and processing techniques, dynamic indexing structures, novel query language constructs, reactive mobility control algorithms, and distributed compression based routing algorithms.
The technological advances from this research will significantly simplify the deployment of WSNs and lead to novel context-aware applications. The advances will directly benefit domains such as emergency-response management, environmental threat remediation, and biological habitat monitoring. Apart from developing the required algorithms, the project will implement simulation platforms and a monitoring environment using physical devices. The platforms will provide students with new educational opportunities to actively explore information acquisition and resource management in resource-constrained environments. All project resources will be shared with the public through a project webpage.

2009 — 2013
Choudhary, Alok
DC: Medium: Collaborative Research: ELLF: Extensible Language and Library Frameworks For Scalable and Efficient Data-Intensive Applications @ Northwestern University
The growth of scientific data sets to petabyte sizes offers significant opportunities for important discoveries in fields such as combustion chemistry, nanoscience, astrophysics, climate prediction and biology, as well as from data on the internet. However, the realization of new scientific insights from this data is limited by the difficulty of creating scalable applications, due to the lack of easy-to-use programming models and tools. To address challenges in creating data-intensive applications, the project will build an extensible language framework, backed by an expressive collection of high-performance libraries (I/O and analytic), to provide a development environment in which multiple domain-specific language extensions allow programmers and scientists to more easily and directly specify solutions to data-intensive problems as programs written in domain-adapted languages. The project will build on recent attribute grammar research to create an extensible specification of C to host domain-specific language extensions, which will also address the inadequate performance in storage, I/O and analysis capabilities in low-level languages such as C.
The proposed extensible language and library framework has the potential to be a transformative problem-solving environment for programmers and scientists, since it allows scalable and efficient solutions to data-intensive problems to be specified at a high level of abstraction. The resulting language framework and libraries will be freely available to researchers writing applications for climate and other applications involving spatio-temporal data. This includes many applications in the physical sciences and engineering, and thus it is expected that the framework will find use in other scientific domains as well.

2009 — 2011
Choudhary, Alok; Nakka, Nithin
Data- and Analytics-Driven Fault-Tolerance and Resiliency Strategies For Peta-Scale Systems @ Northwestern University
Project Summary: This proposal aims at improving system-level fault tolerance for high-end computing systems by exploring the use of models based on system monitoring data in "(1) a methodology for making informed decisions as to which parts of the system need to be fortified, at what time, and to what extent," and building "(2) a framework to reconfigure the system to meet user-specified resiliency requirements."

2009 — 2013
Choudhary, Alok
Collaborative Research: CT-M: Hardware Containers For Software Components - Detection and Recovery At the Hardware/Software Interface @ Northwestern University
This project focuses on hardware features to improve the security of software systems. By refining the coarse-grained protections available in today's architectures, the project will aim to protect the integrity of individual software objects or components. The hardware mechanisms force tight controls on the execution of software components, which programmers can define to be as large as entire applications or as small as individual objects. The goal is to rapidly detect and also recover from attacks that improperly access memory or take over the CPU. The approach also includes hardware-supervised recovery, to enable systems to return to normal operation after an attack and to protect the recovery process itself from attacks.
The benefits of this project include the ability to thwart a large class of attacks and the potential of developing more robust software systems in the future. Recovery, which has received somewhat less attention than attack prevention or detection, is especially important for embedded systems that do not have the luxury of intervention by human operators.
The project will be used to train graduate students and to feed material into graduate courses taught at the three participating universities. Modules will also be developed for use in K-12 education with the aim of drawing students into considering careers in computer science and engineering.

2010 — 2016
Choudhary, Alok
Collaborative Research: Understanding Climate Change: A Data Driven Approach @ Northwestern University
Understanding Climate Change: A Data Driven Approach
Climate change is the defining environmental challenge now facing our planet. Whether it is an increase in the frequency or intensity of hurricanes, rising sea levels, droughts, floods, or extreme temperatures and severe weather, the social, economic, and environmental consequences are great as the resource-stressed planet nears 7 billion inhabitants later this century. Yet there is considerable uncertainty as to the social and environmental impacts because the predictive potential of numerical models of the earth system is limited. These models are incapable of addressing important questions relating to food security, water resources, biodiversity, mortality, and other socio-economic issues over relevant time and spatial scales.
Climate model development has contributed small and incremental improvements; however, extensive modeling gains have not been forthcoming. Modeling limitations have hampered efforts at providing information on climate change impacts and adaptation and mitigation strategies. A new and transformative approach is required to improve prediction of the potential impacts on human welfare. Data-driven methods that have been highly successful in other facets of the computational sciences are now being used in the environmental sciences with success. This Expedition project will significantly advance key challenges in climate change science by developing exciting and innovative new data-driven approaches that take advantage of the wealth of climate and ecosystem data now available from satellite and ground-based sensors, the observational record for atmospheric, oceanic, and terrestrial processes, and physics-based climate model simulations.
To realize this ambitious goal, novel methodologies appropriate to climate change science will be developed in four broad areas of data-intensive computer science: relationship mining, complex networks, predictive modeling, and high performance computing. Analysis and discovery approaches will be cognizant of climate and ecosystem data characteristics, such as non-stationarity, nonlinear processes, multi-scale nature, low-frequency variability, long-range spatial dependence, and long-memory temporal processes such as teleconnections. These innovative new approaches will be used to better understand the complex nature of the earth system and the mechanisms contributing to such climate change phenomena as hurricane frequency and intensity in the tropical Atlantic, precipitation regime shifts in the ecologically sensitive African Sahel or the Southern Great Plains, and the propensity for extreme weather events that weaken our infrastructure and result in environmental disasters with economic losses in excess of $100 billion per year in the U.S. alone.
Assessments of climate change impacts, which are useful for stakeholders and policymakers, depend critically on regional and decadal scale projections of climate extremes. Thus, climate scientists often need to develop qualitative inferences about inadequately predicted climate extremes based on insights from observations (e.g., increase in hurricane intensity) or conceptual understanding (e.g., relation of wildfires to regional warming or drying and hurricanes to sea surface temperatures). These urgent societal priorities offer fertile grounds for knowledge discovery approaches. In particular, qualitative inferences on climate extremes and impacts may be transformed into quantitative predictive insights based on a combination of hypothesis-guided data analysis and relatively hypothesis-free, yet data-guided discovery processes.
A primary focus of this Expedition project will be on uncertainty reduction, which can bring the complementary or supplementary skills of physics-based models together with data-guided insights regarding complex climate processes. The systematic evaluation of climate models and their component processes, as well as uncertainty assessments at regional and decadal scales is a fundamental problem that will be addressed. The ability to translate gains in the predictive skills of climate variables to improvements in impact assessments and attributions is a critical requirement for informing policymakers. Novel methodologies will be developed to gain actionable insights from disparate impacts-related datasets as well as for causal attribution or root-cause analysis.
This research will be conducted in close collaboration with the climate science community and will complement insights obtained from physics-based climate models. Improved understanding of salient atmospheric processes will be provided to those contributing to the development and improvement of climate models with the goal of improving predictability. The approaches and formalisms developed in this research are expected to be applicable to a broad range of scientific and engineering problems, which use model simulations to analyze physical processes. This project will also contribute to efforts in education, diversity, community engagement, and dissemination of tools and computer and atmospheric science findings.

2010 — 2014
Choudhary, Alok; Liao, Wei-Keng
Collaborative Research: An Application Driven I/O Optimization Approach For Petascale Systems and Scientific Discoveries @ Northwestern University
This research focuses on developing scalable parallel file access methods for multi-scale problem domain decompositions, such as the one presented in Adaptive Mesh Refinement (AMR) based algorithms. Existing parallel I/O methods concentrate on optimizing process collaboration under a fairly evenly distributed request pattern. However, they are not suitable for data structures in AMR, because the underlying data distribution is highly irregular and dynamic. Process synchronization in the existing parallel I/O methods can penalize I/O parallelism if the process collaboration is not carefully coordinated. This research addresses this synchronization issue by developing scalable solutions in the Parallel netCDF library (PnetCDF), particularly for AMR structured data and its I/O patterns. PnetCDF is a popular I/O library used by many computer simulation communities. A scalable solution for storing and accessing AMR data in parallel is considered a challenging task. This research will design a process-group based parallel I/O approach to eliminate unrelated processes and thus avoid possible I/O serialization. In addition, a new metadata representation will also be developed in PnetCDF for preserving tree-structured AMR data relationships in a portable form.
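The process-group idea can be sketched in plain MPI: ranks that actually own blocks at a given refinement level split off their own communicator, so collective file access involves only related processes. The grouping rule below is an assumed example, not PnetCDF's internal scheme:

    /* Form a per-refinement-level I/O communicator so that collective
     * file operations exclude unrelated processes. The grouping rule
     * is an assumed example. */
    #include <mpi.h>

    MPI_Comm io_group_for_level(int my_level, int have_blocks) {
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Ranks without blocks at this level opt out of the group. */
        int color = have_blocks ? my_level : MPI_UNDEFINED;

        MPI_Comm group;   /* MPI_COMM_NULL for excluded ranks */
        MPI_Comm_split(MPI_COMM_WORLD, color, rank, &group);
        return group;
    }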

2010 — 2012
Choudhary, Alok
Travel Support For Workshop: Reaching Exascale in This Decade to Be Co-Located With International Conference On High-Performance Computing (HiPC 2010) @ Northwestern University
This proposal requests travel support for organizing a workshop on Exascale Computing co-located with the International Conference on High-Performance Computing, Dec. 18-22 in Goa, India. The workshop will involve presentations on the challenges of Exascale computing as well as on the results from previous workshops, to be disseminated to a wider audience and the international community. The presentations will be given by leaders in the field of high-performance computing as well as leaders and participants from the working groups developing the Exascale agenda. In addition, a panel consisting of leaders from government funding agencies, including those from the US, India, Europe and Asia, will discuss future directions and opportunities for funding and collaboration in the Exascale computing arena.

2011 — 2012
Rasio, Frederic (co-PI); Kalogera, Vassiliki; Luijten, Erik (co-PI); Paris, Joseph; Choudhary, Alok
MRI: Acquisition of a Hybrid High Performance Computer Cluster For Gravitational-Wave Source Simulation and Data Analysis @ Northwestern University
This award supports the acquisition of high-performance computing (HPC) equipment that will enable research in the area of gravitational-wave physics relevant to ground-based GW detectors, such as the NSF-funded Laser Interferometer Gravitational-wave Observatory (LIGO). The research enabled by this equipment focuses both on modeling the astrophysical sources of gravitational waves and on the development of computational tools for the analysis of gravitational-wave signals obtained by LIGO, bringing together a collaboration of physicists with computer scientists and applied mathematicians. The hybrid character of the equipment originates from the incorporation of cutting-edge Graphics Processing Units (GPUs appropriate for scientific computing) used as accelerators for massively parallel computations in addition to regular computing units.
The study of gravitational wave sources is important in many other areas of physics and the enabled research work has a strong interdisciplinary character. The equipment will further enable the multi-faceted training of students in HPC technology and computational research, including algorithmic development for GPUs most desirable for a technically sophisticated workforce, competitive in the 21st century. A small fraction of the computing time resources will also be coupled to another NSF-funded project at Northwestern, a GK-12 program; these resources will bring computational thinking and simulation tools to K-12 classroom through activities and modules tied to the science curriculum thus engaging teachers and students in inquiry-based learning, understanding of the research process, and advancing communication and outreach skills of the graduate students.

2011 — 2014
Choudhary, Alok; Liao, Wei-Keng; Agrawal, Ankit (co-PI)
EAGER: Discovering Knowledge From Scientific Research Networks @ Northwestern University
Advancement in the scientific research and discovery process can be significantly accelerated by mining and analysis of scientific data generated from various sources. In addition to the experimental data produced from simulations and observations, there is another category of scientific data, namely scientific research and process data, such as the discussions and outcomes of complete and in-progress research projects, in the form of technical reports, research papers, discussion forums, mailing lists, research blogs, etc., and the connections between research activities. This data can be analyzed to discover many important features valuable not only for scientific discovery, but also for making the discovery process more effective, efficient, and productive. Furthermore, discovering "virtual communities" with similar needs, interests, and requirements can suggest potential collaborations, software tools, etc.
This project develops an infrastructure called DiscKNet (Discovering Knowledge from Scientific Research Networks) to mine the enriched scientific research network for emerging trends, new research areas, potential collaborations, etc. It entails constructing a scientific research network based on scientific publications, discussion forums, mailing lists, reports from supercomputing centers, research blogs, conference pages, and common-interest groups in social media such as Facebook and Twitter. The design, development, and application of data mining techniques on this network lead the scientific discovery process through the identification of high-impact tools and techniques, trends and usage patterns in supercomputing center activity, common issues with software tools, and potentially fruitful scientific collaboration opportunities. The project provides a platform for scientists, experimentalists, and research centers to build new communities. For education, it assists professors, educators, and researchers in finding the right groups for current discussion and future collaboration.

2013 — 2016
Liao, Wei-Keng; Agrawal, Ankit (co-PI); Choudhary, Alok
EAGER: Scalable Big Data Analytics @ Northwestern University
Big Data analytics requires bridging the gap between data-intensive computing and data-driven computing to obtain actionable insights. The former has primarily focused on optimizing data movement, reuse, organization and storage, while the latter has focused on hypothesis-driven, bottom-up data-to-discovery and the two fields have evolved somewhat independently. This exploratory project aims to investigate a holistic Ecosystem that optimizes data generation from simulations, sensors, or business processes (Transaction Step); organizes this data (possibly combining with other data) to enable reduction, pre-processing for downstream data analysis (Organization Step); performs knowledge discovery, learning and mining models from this data (Prediction Step); and leads to actions (e.g., refining models, new experiments, recommendation) (Feedback Step).
Intellectual Merit: As opposed to the current practice of considering optimizations in each step in isolation, the project considers scalability and optimizations of the entire Ecosystem for big data analytics as part of the design strategy. The project aims to consider big data challenges in designing algorithms, software, analytics, and data management. This strategy contrasts with traditional approaches that first design algorithms for small data sizes and then scale them up. The project aims to treat data complexity, computational requirement, and data access patterns as a whole when designing and implementing algorithms, software and applications.
Broader Impacts: The project could advance the state of the art in big data analytics across a number of key applications such as Climate Informatics and Social Media Analytics. The software resulting from the project is being made available to the broader scientific community under an open source license. The project offers enhanced opportunities for education and training of graduate students and postdoctoral researchers at Northwestern University.

2013 — 2017
Choudhary, Alok; Rasio, Frederic; Kalogera, Vassiliki; Liao, Wei-Keng
CDS&E: Black Holes in Dense Star Clusters @ Northwestern University
Many stars form in large clusters containing anywhere from thousands to many millions of objects. Stars in these clusters are born with a broad range of masses. Furthermore, the most massive stars evolve quickly, ending their lives in just a few million years and leaving behind black holes as remnants. The star clusters themselves, however, can continue to live for many billions of years, and indeed the globular clusters seen in many galaxies are thought to contain some of the oldest stars in the Universe. Therefore, many of the star clusters we see today should contain large numbers of black holes formed a long time ago. This research will use state-of-the-art supercomputer simulations to study the formation and evolution of these black holes in a variety of star cluster environments. It will also leverage innovative hybrid computational techniques, including General-Purpose computing on Graphics Processing Units (GPGPU).
The study of black hole formation and evolution is important in many areas of physics and
astronomy, including galaxy formation and cosmology, the study of quasars and other active
galactic nuclei, general relativity and gravitational wave astronomy. The stellar dynamics supercomputer codes to be developed for this project are general tools, which will be useful
for studying many other problems involving dense star clusters with or without massive black
holes. The planned research activities will involve the training of undergraduate students at Northwestern University, and will likely include students from under-represented minorities. Graduate students
will also receive training and mentoring. Outreach activities are also planned that will take
advantage of Dearborn Observatory on the Northwestern campus in Evanston, as well as the resources of the nearby Adler Planetarium and Astronomy Museum in Chicago.

2014 — 2017
Choudhary, Alok; Liao, Wei-Keng; Agrawal, Ankit (co-PI)
SHF: Medium: Collaborative Research: Scalable Algorithms For Spatio-Temporal Data Analysis @ Northwestern University
Acceleration of the computing power of supercomputers, along with the development and deployment of large instruments such as telescopes, colliders, sensors and devices, raises one fundamental question: "Can the time to insight and knowledge discovery be reduced at the same exponential rate?" The answer currently is clearly "NO", because a critical step that combines analytics, mining and discovering knowledge from the massive datasets has lagged far behind advances in software, simulation and generation of data. Analysis of data requires "data-driven" computing and analytics. This entails scalable software for data reduction, approximations, analysis, statistics, and bottom-up discovery. Scalable and parallel analytics software for processing large amounts of data is required in order to make a significant leap forward in scientific discoveries.

This project develops innovative, scalable, and sustainable data analytics algorithms to enable analysis and mining of massive data on high-performance parallel computers, which include (1) bottom-up and unsupervised data clustering algorithms that are suitable for spatio-temporal data, massive graph analytics, community computations, and detection of patterns in time-varying graphs, different types of data, and different data characteristics; (2) change detection and anomaly detection in spatio-temporal data; and (3) tracking moving data and cluster dynamics within certain time and space constraints. These parallel algorithms use the massive amount of data generated from scientific applications, such as astrophysics, cosmology simulations, climate modeling, and social networking analysis, for result verification and performance evaluation on modern high-performance parallel computers.

This project directly addresses the critical needs for spatio-temporal data analysis, performance scalability, and programming productivity of large-scale scientific discovery via parallel analytics software for big data. This work will impact applications of enormous societal benefit and scientific importance such as climate understanding, environmental sustainability, astrophysics, biology and medicine by accelerating scientific discoveries. Furthermore, the developed software infrastructure can be used and adopted in commercial applications, such as commerce, social, security, drug discovery, and so on. The source code is open to the public for the whole community to adapt, build upon, customize and contribute to, thereby multiplying its value and usage.
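A representative bottom-up kernel in this family is distributed k-means clustering; the sketch below shows one iteration (k, the dimensionality, and the flat data layout are arbitrary choices, not the project's algorithms): each process assigns its local points, then partial statistics are merged with MPI_Allreduce:

    /* One k-means iteration over locally held 2-D points; partial sums
     * and counts are merged across processes with MPI_Allreduce.
     * K, D, and the flat point layout are illustration choices. */
    #include <mpi.h>
    #include <float.h>

    #define K 8
    #define D 2

    void kmeans_step(const double *pts, int n, double cent[K][D]) {
        double sum[K][D] = {{0.0}};
        double cnt[K] = {0.0};

        for (int i = 0; i < n; i++) {        /* nearest-centroid assignment */
            int best = 0;
            double bestd = DBL_MAX;
            for (int c = 0; c < K; c++) {
                double d = 0.0;
                for (int j = 0; j < D; j++) {
                    double t = pts[i * D + j] - cent[c][j];
                    d += t * t;
                }
                if (d < bestd) { bestd = d; best = c; }
            }
            for (int j = 0; j < D; j++)
                sum[best][j] += pts[i * D + j];
            cnt[best] += 1.0;
        }

        /* Merge local statistics into global ones on every process. */
        MPI_Allreduce(MPI_IN_PLACE, sum, K * D, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);
        MPI_Allreduce(MPI_IN_PLACE, cnt, K, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);

        for (int c = 0; c < K; c++)          /* recompute centroids */
            if (cnt[c] > 0.0)
                for (int j = 0; j < D; j++)
                    cent[c][j] = sum[c][j] / cnt[c];
    }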