2011 — 2014 |
Choudhary, Alok [⬀] Liao, Wei-Keng Agrawal, Ankit |
N/A |
EAGER: Discovering Knowledge From Scientific Research Networks @ Northwestern University
Scientific research and discovery can be significantly accelerated by mining and analyzing scientific data generated from various sources. In addition to the experimental data produced from simulations and observations, there is another category of scientific data: scientific research and process data, such as the discussions and outcomes of completed and in-progress research projects, in the form of technical reports, research papers, discussion forums, mailing lists, research blogs, etc., and the connections between research activities. This data can be analyzed to discover many important features valuable not only for scientific discovery, but also for making the discovery process more effective, efficient, and productive. Furthermore, discovering "virtual communities" with similar needs, interests, and requirements can suggest potential collaborations, software tools, etc.
This project develops an infrastructure called DiscKNet (Discovering Knowledge from Scientific Research Networks) to mine the enriched scientific research network for emerging trends, new research areas, potential collaborations, etc. It entails constructing a scientific research network based on scientific publications, discussion forums, mailing lists, reports from supercomputing centers, research blogs, conference pages, and common interest groups in social media such as Facebook and Twitter. The design, development, and application of data mining techniques on this network lead a scientific discovery process through the identification of high-impact tools and techniques, trends and usage patterns in supercomputing center activity, common issues with software tools, and potentially fruitful scientific collaboration opportunities. The project provides a platform for scientists, experimentalists, and research centers to build new communities. For education, it helps professors, educators, and researchers find the right groups for current discussion and future collaboration.
|
0.96 |
2013 — 2016 |
Liao, Wei-Keng Agrawal, Ankit Choudhary, Alok [⬀] |
N/A |
EAGER: Scalable Big Data Analytics @ Northwestern University
Big Data analytics requires bridging the gap between data-intensive computing and data-driven computing to obtain actionable insights. The former has primarily focused on optimizing data movement, reuse, organization, and storage, while the latter has focused on hypothesis-driven, bottom-up data-to-discovery; the two fields have evolved somewhat independently. This exploratory project aims to investigate a holistic Ecosystem that optimizes data generation from simulations, sensors, or business processes (Transaction Step); organizes this data (possibly combining it with other data) to enable reduction and pre-processing for downstream data analysis (Organization Step); performs knowledge discovery, learning, and mining of models from this data (Prediction Step); and leads to actions (e.g., refining models, new experiments, recommendations) (Feedback Step).
Intellectual Merit: As opposed to the current practice of considering optimizations in each step in isolation, the project considers scalability and optimizations of the entire Ecosystem for big data analytics as part of the design strategy. The project aims to consider big data challenges in designing algorithms, software, analytics, and data management. This strategy contrasts with traditional approaches that first design algorithms for small data sizes and then scale them up. The project aims to treat data complexity, computational requirements, and data access patterns as a whole when designing and implementing algorithms, software, and applications.
Broader Impacts: The project could advance the state of the art in big data analytics across a number of key applications such as Climate Informatics and Social Media Analytics. The software resulting from the project is being made available to the broader scientific community under an open source license. The project offers enhanced opportunities for education and training of graduate students and postdoctoral researchers at Northwestern University.
|
0.96 |
2014 — 2017 |
Choudhary, Alok [⬀] Liao, Wei-Keng Agrawal, Ankit |
N/A |
SHF: Medium: Collaborative Research: Scalable Algorithms For Spatio-Temporal Data Analysis @ Northwestern University
The accelerating computing power of supercomputers, along with the development and deployment of large instruments such as telescopes, colliders, sensors, and devices, raises one fundamental question: "Can the time to insight and knowledge discovery be reduced at the same exponential rate?" The answer currently is clearly "NO", because a critical step that combines analytics, mining, and discovering knowledge from massive datasets has lagged far behind advances in software, simulation, and generation of data. Analysis of data requires "data-driven" computing and analytics. This entails scalable software for data reduction, approximations, analysis, statistics, and bottom-up discovery. Scalable and parallel analytics software for processing large amounts of data is required in order to make a significant leap forward in scientific discoveries. This project develops innovative, scalable, and sustainable data analytics algorithms to enable analysis and mining of massive data on high-performance parallel computers, which include (1) bottom-up and unsupervised data clustering algorithms that are suitable for spatio-temporal data, massive graph analytics, community computations, and detection of patterns in time-varying graphs, different types of data, and different data characteristics; (2) change detection and anomaly detection in spatio-temporal data; and (3) tracking moving data and cluster dynamics within certain time and space constraints. These parallel algorithms use the massive amounts of data generated from scientific applications, such as astrophysics, cosmology simulations, climate modeling, and social network analysis, for result verification and performance evaluation on modern high-performance parallel computers. This project directly addresses the critical needs for spatio-temporal data analysis, performance scalability, and programming productivity of large-scale scientific discovery via parallel analytics software for big data.
This work will impact applications of enormous societal benefit and scientific importance, such as climate understanding, environmental sustainability, astrophysics, biology, and medicine, by accelerating scientific discoveries. Furthermore, the developed software infrastructure can be used and adopted in commercial applications such as commerce, social networking, security, and drug discovery. The source code is open to the public for the whole community to adapt, build upon, customize, and contribute to, thereby multiplying its value and usage.
|
0.96 |