1986 — 1991
Pu, Calton
Presidential Young Investigator Award: Very Large Scale Integration Design
1989 — 1990
Pu, Calton
Implementation of a Distributed Operating System and a Distributed Database
A workstation cluster plus an oscilloscope will be provided to researchers in the Department of Computer Science at Columbia University. This equipment is provided under the Instrumentation Grants for Research in Computer and Information Science and Engineering program. The research for which the equipment is to be used will be in the areas of distributed operating systems and distributed databases.
1991 — 1992
Pu, Calton; Duchamp, Daniel
Research in Wireless Distributed Systems
This award is for the purchase of radio modems and portable computers for research into radio network (wireless) computer systems. The research will concentrate on several aspects of wireless communication including: extending the IP protocol to handle hosts that are not topologically fixed, techniques for handing off from one radio cell to another without data or communications loss, intelligent caching to reduce peak loads over slow radio links, self-organizing load balancing, and distribution of kernel functionality. Wireless networks provide an attractive alternative to the expense of running cables between and within buildings so long as high communication speeds are not required. This research is intended to solve some of the problems with wireless networks and to build an experimental network at Columbia University.

It is very expensive to run wires or fiber optic cables between and within buildings. An alternative to cabling is to build a wireless computer network, much like the cellular phone network is an alternative to the telephone line network. The problems with building a wireless computer network are more complex than those involved in building a wireless voice network. These problems include protocols to "hand off" communications when cell boundaries are crossed (with voice, some noise is acceptable; with data, this must be minimized), load balancing methods so that slow radio links don't get clogged, and distribution of operating system functionality to reduce communications. The equipment funded by this grant will enable an experimental wireless computer network to be built. The network will be used to experimentally validate communication protocols and to validate algorithms for load balancing and other methods needed to reduce communications on the network.
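As a rough illustration of the loss-free handoff idea above, the sketch below simply buffers outgoing packets while a mobile host switches to a new base station and drains the buffer afterwards; the class and method names are hypothetical, and the real mechanisms operate inside the IP layer rather than in application code.

```python
# Illustrative sketch (not from the award): loss-free cell handoff by buffering
# packets while the mobile host re-registers with the new base station.
from collections import deque

class BaseStation:
    def __init__(self, name):
        self.name, self.delivered = name, []
    def forward(self, packet):
        self.delivered.append(packet)

class MobileHost:
    def __init__(self):
        self.cell, self.buffer, self.in_handoff = None, deque(), False
    def send(self, packet):
        if self.in_handoff or self.cell is None:
            self.buffer.append(packet)            # hold traffic instead of dropping it
        else:
            self.cell.forward(packet)
    def handoff(self, new_cell):
        self.in_handoff = True                    # start buffering
        self.cell = new_cell                      # re-register with the new cell
        self.in_handoff = False
        while self.buffer:                        # drain buffered packets, none lost
            new_cell.forward(self.buffer.popleft())

host, a, b = MobileHost(), BaseStation("cell-A"), BaseStation("cell-B")
host.handoff(a); host.send("p1")
host.in_handoff = True; host.send("p2")           # a packet arrives mid-handoff
host.handoff(b)
print(a.delivered, b.delivered)                   # ['p1'] ['p2']
```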
1992 — 1995
Pu, Calton; Bourne, Philip
An Object-Oriented Toolbox For Use With the Protein Data Bank (PDB)
The Protein Data Bank (PDB) contains the atomic structure of macromolecules. As of October 1991 there were 790 structural entries (196 Mbytes); if current growth rates persist, this number could grow to 10,000 by the end of the decade. The data provide opportunities for understanding biological function through, for example, comparative structural research. This work addresses several challenges in first making the PDB more accessible to molecular biologists and crystallographers in particular, and second assisting in the management of increasing amounts of data. Several software developments are being undertaken in parallel, but share the same class libraries. First, a new object-based PDB storage format provides suitable access to the levels of substructure found in macromolecules. Second, object-based software tools that interrogate and manipulate structural data, and assist in structure verification, are being derived from existing structured programs. Finally, a high-level query language provides intuitive and direct interaction with the PDB. Each aspect of software development proceeds by prototyping followed by iterative cycles of testing in the laboratory and code modification. This work integrates state-of-the-art database research results such as object-oriented databases and knowledge bases, software engineering results such as components and glue, and collaborative-work results such as extended transaction models to support cooperative scientific research. These tools could potentially precipitate the discovery of new structure-function relationships by permitting data query in a more intuitive fashion.
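A minimal sketch of what an object-based view of the PDB substructure levels could look like (an assumed design for illustration, not the project's actual class library; the identifier is made up):

```python
# Hypothetical object view of PDB substructure levels: structure > chain > residue > atom.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Atom:
    name: str
    x: float
    y: float
    z: float

@dataclass
class Residue:
    name: str
    seq: int
    atoms: List[Atom] = field(default_factory=list)

@dataclass
class Chain:
    chain_id: str
    residues: List[Residue] = field(default_factory=list)

@dataclass
class Structure:
    pdb_id: str
    chains: List[Chain] = field(default_factory=list)

    def count_atoms(self):
        """A simple interrogation query over the whole hierarchy."""
        return sum(len(r.atoms) for c in self.chains for r in c.residues)

s = Structure("1ABC", [Chain("A", [Residue("GLY", 1, [Atom("CA", 1.0, 2.0, 3.0)])])])
print(s.count_atoms())   # 1
```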
1993 — 1996
Bourne, Philip; Pu, Calton
CIFters: Object-Oriented Tools For Manipulating Crystallographic Information Files (CIF)
9310154 Bourne

The macromolecular crystallography community has recently adopted a standard set of data definitions as an extension to the subset already derived for small-molecule crystallography and referred to as the Crystallographic Information File (CIF). Macromolecular CIF definitions will facilitate the highly desirable features of simpler information exchange and a rich controlled vocabulary for use worldwide by scientists in an expanding discipline. Speedy adoption of CIF requires flexible, interoperable, and portable software tools. The investigators will build a class library and a set of tools, called CIFters, based on object-oriented software technology to browse, edit, display, query, verify, and format CIF files. In addition, CIFters will be callable from existing programs in common use in macromolecular crystallography, facilitating access to CIF files and further promoting the use of CIF. The long-term goal of this project is to develop extensible software tools which are widely available to the structural biology community. In addition, this award supports an end-user workshop to begin the effort to design the appropriate CIFters.
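For illustration only, the toy reader below handles a tiny subset of CIF (simple tag-value pairs, ignoring loops and multi-line values); it is not the CIFters library, just a sketch of the kind of browse/query access such tools provide.

```python
# Toy subset-of-CIF reader: collect "_tag value" pairs into a dictionary.
def parse_simple_cif(text):
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith(("data_", "loop_")):
            continue                      # skip comments, block headers, and loops
        if line.startswith("_"):
            parts = line.split(None, 1)   # tag, then the rest of the line as the value
            if len(parts) == 2:
                entries[parts[0]] = parts[1].strip("'\"")
    return entries

sample = """data_demo
_cell.length_a   61.20
_cell.length_b   88.04
_exptl.method    'X-RAY DIFFRACTION'
"""
cif = parse_simple_cif(sample)
print(cif["_cell.length_a"], cif["_exptl.method"])   # 61.20 X-RAY DIFFRACTION
```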
1995 — 1999
Pu, Calton
Strengthening the ACID Properties: Modular Transaction Models and TP Systems @ Oregon Graduate Institute of Science & Technology
Transaction Processing (TP) systems form a core software component in database management systems, making sharing of information and concurrent processing in multi-user, distributed environments possible. However, the currently used Atomicity, Consistency, Isolation, and Durability (ACID) properties of transactions have limited the use of TP systems in new applications such as engineering design and long activities, due to their inflexibility. This project strengthens the ACID properties to support a spectrum of choices. For example, Epsilon Serializability gives application designers fine-grained choices in the specification of bounded inconsistency tolerance. This work complements theoretical results such as the ACTA framework. This project maps strengthened atomicity, isolation, and durability properties onto existing proven software systems, in particular, TP monitors such as Transarc Encina. Combining theory with practice, the results from this research will make strengthened production TP systems available to the designers and implementors of advanced applications such as cooperative long transactions and autonomous TP on the Internet.
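A minimal sketch of the bounded-inconsistency idea behind Epsilon Serializability, under assumed semantics: a relaxed read is allowed as long as the pending, not-yet-applied updates stay within an application-specified epsilon. This is an illustration, not the project's TP-monitor implementation.

```python
# Assumed semantics for illustration: reads tolerate at most 'epsilon' of pending updates.
class EpsilonCounter:
    def __init__(self, epsilon):
        self.value = 0          # committed state
        self.pending = []       # updates admitted but not yet applied
        self.epsilon = epsilon  # maximum inconsistency a relaxed reader tolerates

    def update(self, delta):
        self.pending.append(delta)

    def apply_pending(self):
        self.value += sum(self.pending)
        self.pending.clear()

    def epsilon_read(self):
        divergence = sum(abs(d) for d in self.pending)
        if divergence > self.epsilon:
            raise RuntimeError("inconsistency bound exceeded; fall back to a strict read")
        return self.value       # possibly stale, but off by at most epsilon

c = EpsilonCounter(epsilon=10)
c.update(3); c.update(-4)
print(c.epsilon_read())   # 0: stale, yet within the bound of 10
c.apply_pending()
print(c.epsilon_read())   # -1: now exact
```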
1995 — 1998
Pu, Calton; Sampson, Paul; Levy, Gad
Mathematical Sciences: Integrating Heterogeneous Geophysical Data by Combining Error Structures: An Interdisciplinary Pilot Project @ Oregon State University
9418904 Levy

Operational weather and climate prediction models of the near future have three fundamental requirements: (1) data management facilities that allow the assimilation and manipulation of massive amounts of heterogeneous and imprecise data obtained from a variety of sources; (2) a modeling technology, both physical and statistical, that allows the integration of partial and heterogeneous data, accounting for the realistic non-linear physics of the problem and the often complex error structure encountered; and (3) efficient support of these functions within the data assimilation cycle. Responding effectively to these requirements calls for an interdisciplinary approach. A collaboration is proposed which will bring together complementary expertise in the computational, statistical, and geosciences for cross training and joint research. The goal is to initiate cross training which will enable the collaborators to address, as a team, crucial problems in the atmospheric data assimilation cycle. The feasibility of applying sophisticated theoretical solutions in practice by building software tools will be tested. By implementing an intensive interdisciplinary plan that will include a colloquium series, cross-teaching, workshops, and regular conferences, the collaborators will broaden their understanding of the complementary fields to be able to jointly propose and develop a general methodology and a prototype for a specific demonstration system. In the demonstration system, a physical Boundary Layer (BL) model will be used to integrate and manage massive satellite-sensed data. The processing of such data inevitably involves complex gridding, interpolation, data quality control, and averaging, as well as the infusion of arbitrary ancillary data sets and estimation of error covariance structure. Techniques to provide these functions and apply them to the specific case of scatterometer (ERS-1) stress data will be studied, and the feasibility of synthesizing them from current disciplinary methods will be explored. The resulting data will be processed through a Planetary Boundary Layer Model (PBLM) to link them to atmospheric numerical prediction and general circulation models. Consequently, the system that will be proposed at the end of the pilot project will make full use of BL physics, heterogeneous error structure statistics, and high-quality data management to obtain better models.
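As a hedged illustration of the kind of statistical analysis step used in data assimilation, the sketch below applies the standard update x_a = x_b + K (y - H x_b) with gain K = B H^T (H B H^T + R)^(-1); the award does not prescribe this exact scheme, and all matrices here are invented.

```python
# Illustration only: blend a model (background) state with an observation using
# their error covariances; not necessarily the method developed in this project.
import numpy as np

x_b = np.array([10.0, 12.0])             # background (model) state
B   = np.diag([4.0, 4.0])                # background error covariance
y   = np.array([11.5])                   # one observation
H   = np.array([[1.0, 0.0]])             # observation operator (sees the first component)
R   = np.array([[1.0]])                  # observation error covariance

K   = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)   # gain weighs model vs. observation errors
x_a = x_b + K @ (y - H @ x_b)                    # analysis state
print(x_a)   # [11.2 12. ]: first component pulled toward the observation, second unchanged
```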
2000 — 2003
Pu, Calton; Liu, Ling
System Support For Distributed Information Change Monitoring @ Georgia Tech Research Corporation
The project investigates the design and implementation issues in the monitoring of information changes in large distributed networks such as the Internet and World Wide Web. One of the objectives is to develop efficient and scalable strategies, techniques, and systems support for distributed control of large numbers of information change monitoring requests. Research questions to be addressed include the following. What software techniques and tools are able to extract, integrate, and query streams of semi-structured or unstructured data? Which distributed trigger and data processing techniques are most scalable and yet efficient in the presence of millions of information change monitoring requests? How do change monitors adapt to wide system parameter variations in runtime environments such as the Internet? How do they scale up as the number of information sources reaches millions? Critical system components include change detection algorithms (e.g., tree comparison for web pages), trigger and query grouping, indexing, and caching, as well as parallel processing. Appropriate components will be implemented and evaluated through simulation and measurements on the Internet. These experiments will emphasize the efficiency and scalability of Internet-scale information change monitoring systems. The research results will aid in the engineering, implementation and evaluation of systems and middleware software support for scalable and efficient processing of distributed triggers and queries, as well as effective detection and notification of information changes.
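A simplified sketch of the change-monitoring idea: hash fetched pages and group monitoring requests per URL so a single fetch serves many triggers. The URLs and fetch function below are placeholders, and the project itself targets much richer change detection (e.g., tree comparison) and Internet-scale trigger processing.

```python
# Placeholder fetcher and URLs; hash-based change detection with grouped triggers.
import hashlib
from collections import defaultdict

class ChangeMonitor:
    def __init__(self, fetch):
        self.fetch = fetch                         # callable: url -> page text
        self.last_digest = {}                      # url -> last seen content hash
        self.subscribers = defaultdict(list)       # url -> callbacks (grouped requests)

    def subscribe(self, url, callback):
        self.subscribers[url].append(callback)

    def poll(self):
        for url, callbacks in self.subscribers.items():
            digest = hashlib.sha256(self.fetch(url).encode()).hexdigest()
            if self.last_digest.get(url) not in (None, digest):
                for cb in callbacks:               # one fetch notifies every grouped request
                    cb(url)
            self.last_digest[url] = digest

pages = {"http://example.org/a": "v1"}
m = ChangeMonitor(lambda url: pages[url])
m.subscribe("http://example.org/a", lambda u: print("changed:", u))
m.poll()                                           # records the baseline, no notification
pages["http://example.org/a"] = "v2"
m.poll()                                           # prints: changed: http://example.org/a
```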
2001 — 2008
Ahamad, Mustaque; Omiecinski, Edward (co-PI); Pu, Calton; Mark, Leo (co-PI); Liu, Ling
ITR/SI: Guarding the Next Internet Frontier: Countering Denial of Information @ Georgia Tech Research Corporation
As applications enabled by the Internet become information rich, ensuring access to quality information in the presence of potentially malicious entities will be a major challenge. The goal of this research project is to develop defensive techniques to counter denial-of-information (DoI) attacks. Such attacks attempt to confuse an information system by deliberately introducing noise that appears to be useful information. The mere availability of information is insufficient if the user must find a needle in a haystack of noise that is created by an adversary to hide critical information. The research focuses on the characterization of information quality metrics that are relevant in the presence of DoI attacks. In particular, two complementary metrics are explored. Information regularity captures predictability in the patterns of information creation and access. The second metric, information quality trust, captures the known ability of an information source to meet the needs of its clients. The development of techniques to derive the values of these metrics for information sources is a key goal of the research. Other planned research activities include the building of a distributed information infrastructure and experimental evaluation of defensive techniques against DoI attacks.
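The abstract describes the metrics only at a high level; as a purely hypothetical stand-in, the sketch below scores the "regularity" of an information source by the entropy of its posting-hour histogram, so a perfectly predictable source scores near 1 and a noise-like source scores near 0. This is an assumed illustration, not the project's metric.

```python
# Hypothetical regularity score, NOT the project's metric: 1 - normalized entropy
# of the hours at which a source posts.
import math
from collections import Counter

def regularity(post_hours, bins=24):
    counts = Counter(post_hours)
    total = len(post_hours)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return 1.0 - entropy / math.log2(bins)   # 1.0 = perfectly regular, 0.0 = uniform noise

print(regularity([9, 9, 9, 9, 9, 9]))   # 1.0: always posts at hour 9
print(regularity(list(range(24))))      # ~0.0: spread evenly over all 24 hours
```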
2002 — 2004
Schwan, Karsten; Pu, Calton; Pande, Santosh; Eisenhauer, Greg
InfoFabric: Adaptive Services in Distributed Embedded Systems @ Georgia Tech Research Corporation
CCR-0208953

This research is developing InfoFabric services to manage multiple shared data streams and enable high performance sensing and communication in dynamically reconfigurable sensor nets. For example, in emergency response applications, the computing infrastructures employed are rapidly assembled conglomerates of portable and handheld end user devices. Multiple communication modes are used to interact across collaborating peers and also with local and remote command centers and/or information repositories. A key problem is that such devices typically cannot access, display, and manipulate information with the quality needed by end users. An example is an observer in the field trying to match visible cloud formations with the outputs produced by remotely running weather simulations, the latter using real-time radar data. Unless the handheld device can visualize data with high quality and in real-time, field observations cannot be used to refine or steer the remote weather prediction programs. Similarly, search and rescue operations can be aided by rich (multi-media), real-time communications between team members and by high fidelity graphical displays of terrain data available from remote servers.

The basic technical problems to be solved for the resulting complex, distributed and embedded applications include (1) the provision of high levels of flexibility in how, where, and when necessary processing and communication actions are performed on the underlying distributed platforms, and (2) the ability to continuously meet end user needs despite runtime variations in service locations, platform capabilities (e.g., remaining power on end devices), and user requirements.

The InfoFabric approach supports data-intensive, embedded applications with lightweight publish/subscribe middleware. An end user dynamically subscribes to information channels when needed, and the InfoFabric applies the processing specified by the user. Processing and communication actions are dynamically mapped to the underlying distributed devices and machines. To attain high performance and meet embedded systems requirements such as power, new compiler and runtime binary code generation methods dynamically generate and install code on the InfoFabric's platform. Code is specialized to match current user needs to available platform resources. To meet dynamic needs and deal with runtime changes in resource availability, resource management mechanisms associated with middleware carry the performance, usage, and needs information required for runtime adaptation of processing and communication actions. Because the InfoFabric middleware has detailed knowledge of the ways in which information should be transported and manipulated before delivering it to end users, it can employ techniques like automatic redundancy and replication, and service (re)location and (re)partitioning to match changing user needs and platform availabilities.
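A minimal sketch of the publish/subscribe pattern that the InfoFabric approach builds on (illustration only; the actual middleware handles typed binary events, dynamic code generation, and resource-aware placement):

```python
# Bare-bones topic-based publish/subscribe; topic names and event contents are made up.
from collections import defaultdict

class Channel:
    def __init__(self):
        self.handlers = defaultdict(list)     # topic -> subscriber callbacks

    def subscribe(self, topic, handler):
        self.handlers[topic].append(handler)  # e.g., a handheld asking for radar data

    def publish(self, topic, event):
        for handler in self.handlers[topic]:  # push the event to every subscriber
            handler(event)

fabric = Channel()
fabric.subscribe("radar/atlanta", lambda e: print("display update:", e))
fabric.publish("radar/atlanta", {"cell": (33.7, -84.4), "reflectivity_dbz": 47})
```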
2002 — 2006
Schwan, Karsten; Pu, Calton; Eisenhauer, Greg; Dovrolis, Constantinos; Wolf, Matthew
STI: NetReact Services: Middleware Technologies to Enable Real-Time Collaboration Across the Internet @ Georgia Tech Research Corporation
Modern science is a distributed wide-area enterprise, requiring real-time coordination of scientific instruments and remote sensors, computational resources, large data repositories and teams of researchers in different locations, even on different continents. The high-bandwidth networking demands of any such real-time and data-intensive collaborations tax the largest network pipes, and when conducted across the Internet, their Gigabit/sec data streams must utilize heterogeneous platforms with network link speeds that vary from 10Gbps to 10Mbps and with end user machines that range from desktop PCs to large supercomputers. However, effective scientific collaboration demands that team members be able to interact with each other and with critical remote resources in real-time, despite platform heterogeneity and despite dynamic variations in the availability of platform resources.
The key idea of the NetReact middleware services is to utilize the substantial server and processing resources associated with distributed collaborations to improve end user performance and compensate for potential deficiencies in network capabilities. NetReact provides rich functionality for dynamically reconfiguring both middleware and applications in response to network and platform monitoring, and to coordinate (1) middleware and application-level reactions to changes in network state with (2) the possibly simultaneous actions taken at the transport level. NetReact's monitoring (NRM) services dynamically determine available network bandwidth and communication latencies. NetReact uses such information to adjust middleware and application actions, to tune the underlying network transport, and even to dynamically select suitable network paths for ongoing middleware-enabled scientific collaborations. By embedding NetReact services into the grid computing middleware commonly used for scientific collaboration, the functionality of such NetReact-enriched middleware is improved substantially, enabling end users to collaborate in real-time even when they do not have access to high end machines or high capacity network links, thereby supporting scientific applications that currently remain out of reach for existing networking and grid computing technologies.
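The monitor-then-adapt loop can be pictured with a small sketch: measure the achieved transfer rate, then pick a data fidelity that fits it. The thresholds and fidelity tiers below are invented for illustration and are not NetReact's actual policies.

```python
# Invented thresholds and tiers; the probe transfer is simulated with a sleep.
import time

def measure_rate(send, payload):
    start = time.monotonic()
    send(payload)                                  # e.g., a probe transfer
    elapsed = max(time.monotonic() - start, 1e-9)
    return len(payload) / elapsed                  # bytes per second

def choose_fidelity(rate_bytes_per_sec):
    if rate_bytes_per_sec > 100e6:
        return "full-resolution"
    if rate_bytes_per_sec > 10e6:
        return "downsampled"
    return "summary-only"

rate = measure_rate(lambda p: time.sleep(0.01), b"x" * 500_000)
print(choose_fidelity(rate))                       # tier chosen from the measured rate
```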
2002 — 2005
Pu, Calton
ITR: Guarding Quasi-Invariants: Generalizing Specialization For System Software Security & Reliability @ Georgia Tech Research Corporation
This project will develop concepts and techniques to improve the security and reliability of system software by detecting and managing invisible links in the code. Invisible links are dependencies among program components that are difficult to find by looking at the code alone. A common source of invisible links is the optimization process that removes "unnecessary" code due to some system invariants. Software reuse and evolution may invalidate these invariants, break invisible links, and cause failures such as the loss of the Ariane 501 rocket. Further, malicious attacks such as TOCTTOU (time-of-check to time-of-use) often exploit invisible links.
Our approach combines three techniques that have not been brought together previously. First is a software abstraction with support for flexible correctness criteria definitions, called the Transactional Activity Model, which will demarcate code boundaries that contain invisible links. Second is the use of wrappers to implement the enforcement of correctness criteria on top of production software, for example, concurrency control around the Unix file system for TOCTTOU. Third, program specialization techniques, in particular the guarding of quasi-invariants, can make invisible links visible and generate the code to maintain the integrity of these links (e.g., making sure the file has not been replaced by the attacker). This combination offers the promise to reveal invisible links and therefore manage those dependencies explicitly.
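For concreteness, the sketch below shows the TOCTTOU pattern on a POSIX system (file names and policy are assumed): the unsafe version checks a path and opens it later, leaving a window in which an attacker can swap in a symbolic link, while the safer version validates the handle it actually opened.

```python
# Unsafe check-then-use versus open-then-validate (POSIX-only flags).
import os, stat

def unsafe_read(path):
    if os.access(path, os.R_OK):          # time of check
        # ... window in which 'path' can be replaced by a symlink ...
        with open(path) as f:             # time of use
            return f.read()

def safer_read(path):
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)   # open first, refuse symlinks
    try:
        info = os.fstat(fd)               # validate the object that was actually opened
        if not stat.S_ISREG(info.st_mode):
            raise PermissionError("not a regular file")
        return os.read(fd, info.st_size).decode()
    finally:
        os.close(fd)
```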
2002
Pu, Calton; Schwan, Karsten; Yalamanchili, Sudhakar (co-PI); Blough, Douglas
Center For Experimental Research in Computer Systems @ Georgia Tech Research Corporation
This planning grant award is the first step toward setting up the Center for Experimental Research in Computer Systems (CERCS), which seeks to address complex communication/computation systems by bringing together researchers with knowledge of the key technologies underlying these systems, and thereby create research teams that can address future systems and applications in a fashion that is integrated across multiple technologies and heterogeneous system components. The mission of CERCS is to develop new hardware and software technologies, to create technological advances, and to take advantage of these advances to remove technological barriers faced by complex, integrated systems.
The CERCS approach is experimental and fosters research in which new technologies are evaluated experimentally, with large-scale applications and on systems of substantial size or complexity. The aim is to understand the challenging application requirements that drive novel system-level research, where insights at the system level motivate changes in how certain applications are implemented, and where new system technologies enable new classes of applications. The Center will work with external partners to understand their needs and requirements, and to experiment with alternative solutions and approaches.
2003 — 2007
Schwan, Karsten; Pu, Calton; Eisenhauer, Greg; Wolf, Matthew
Software: Adaptive-XML: Tools For Collaborative Network Computing @ Georgia Tech Research Corporation
Modern science is an increasingly distributed enterprise, particularly when addressing challenging scientific problems with multidisciplinary research teams, where team members are routinely assembled from multiple universities, the national labs, and industry participants. A problem pervasive to such distributed endeavors is the need to efficiently share scientific data across multiple teams, sites, applications, and machines.
This project's focus is on the ability to represent such data so that it is easily shared across research teams that each use their own, well-defined and domain-specific data representations. S(cientific)-XML is a suite of tools that translate user-friendly XML-based meta-information about shared data to/from the application-specific, efficient, binary-based data structure descriptions used by high performance scientific codes. With S-XML, end users can conveniently express and view their structured data, but all data manipulation and exchanges are performed using efficient binary data representations. Complementing these tools is the XML-ECho adaptive XML-conscious peer-to-peer communication infrastructure, which implements the wide-area exchange of the large-scale binary data used in scientific collaborations.
This middleware uses runtime adaptation to dynamically adjust its data transport and manipulation actions to meet application-level quality of service needs. Specifically, via XML-based descriptions of data structure, end users can dynamically express and alter interest expressions that state what data is most important to them and should therefore be transported preferentially.
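A toy sketch of the translation idea described above, driving binary packing from an XML description of a record layout (this is not the project's S-XML toolset, and the layout below is invented):

```python
# XML meta-information describes the record; the wire format stays compact binary.
import struct
import xml.etree.ElementTree as ET

layout_xml = """<record>
  <field name="timestamp" type="d"/>
  <field name="temperature" type="f"/>
  <field name="station_id" type="i"/>
</record>"""

fields = [(f.get("name"), f.get("type")) for f in ET.fromstring(layout_xml)]
fmt = "<" + "".join(t for _, t in fields)          # struct format built from the XML

def to_binary(record):
    return struct.pack(fmt, *(record[name] for name, _ in fields))

def from_binary(blob):
    return dict(zip((name for name, _ in fields), struct.unpack(fmt, blob)))

blob = to_binary({"timestamp": 1.5e9, "temperature": 21.5, "station_id": 7})
print(len(blob), from_binary(blob))                # 16 bytes on the wire, readable dict back
```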
The resulting Adaptive-XML tools and data exchange middleware will enable effective collaboration in scientific endeavors that remain infeasible with today's technologies, a concrete example being the Terascale Supernova Initiative now being undertaken by a large research team distributed across U.S. universities and national labs. Project outcomes will also benefit U.S. corporations, as evident from our discussions with companies like Schlumberger and also from the deployment of some of our technologies in industry testbeds (e.g., at Delta Air Lines).
2003 — 2012
Pu, Calton; Schwan, Karsten; Yalamanchili, Sudhakar (co-PI); Blough, Douglas
Industry/University Cooperative Research Center For Experimental Research in Computer Systems (I/UCRC ERCS) @ Georgia Tech Research Corporation
An Industry/University Cooperative Research Center (I/UCRC) will be established at the Georgia Institute of Technology, called the I/UCRC for Experimental Research in Computer Systems (ERCS).
The I/UCRC is committed to fostering interdisciplinary research, establishing a culture of experimental research, and reaching out to local and national industry to encourage participation and contribute to the regional and national economies through the availability of intellectual talent and emerging technologies. Operationally, ERCS will create, develop, and evaluate hardware/software systems, in the context of realistic end-user applications, for platforms ranging from embedded/real-time devices to parallel/cluster systems to the Internet, and will facilitate the construction and management of such systems by creating new principles, algorithms, techniques, software tools, and mechanisms.
2003 — 2008
Schwan, Karsten; Pu, Calton; Pande, Santosh; Eisenhauer, Greg; Balch, Tucker (co-PI)
ITR: Collaborative Research: Morphable Software Services: Self-Modifying Programs For Distributed Embedded Systems @ Georgia Tech Research Corporation
Future embedded system applications and infrastructures will be increasingly dynamic. Moreover, the devices used in such infrastructures will vary widely, from sensors and embedded devices, to handhelds, to high end server systems, all of which interact continuously in order to collect, collate, and deliver information from where it is produced to where it is needed. This project addresses the dynamic nature of distributed embedded systems, by developing new information technologies that integrate across multiple areas of Computer Science, including computer architecture, operating and real-time systems, compilers, and middleware. The key intent is to create morphable embedded services, that is, services that continuously self-modify and adapt in order to meet dynamic application needs and environmental/resource constraints, including power budgets, end-to-end quality of service (QoS) guarantees (e.g., timing constraints), and security constraints.
There are many useful examples of morphable services. In limited forms, they are already present in today's cellphone platforms, for instance, where end users dynamically download new ringtones or acquire new games (possibly displacing existing ones), etc. Service morphing, however, goes much beyond such configuration capabilities. Imagine a cellphone, for example, which dynamically morphs into a portable wand, using its sensing (e.g., its built-in camera) and communication abilities (e.g., by interacting with other nearby phones) to guide its owner out of a disaster site. Then, in contrast to such functionality-centric morphing, consider this cellphone drawing on the power of nearby server systems (or other phones) to provide suitable levels of service to its user, despite the fact that its power is running low. This can be done, for instance, by dynamically offloading services onto other platforms, by (re)partitioning services across the device and cooperating server systems, and/or by deploying more power-efficient and perhaps less graphics-capable service code to the phone itself. Another interesting aspect of our work is its ability to go beyond performance and power as the only critical elements of future systems. With our approach, for instance, compiler methods and middleware can be used to enhance information security rather than system performance. This can be done by scattering critical application state to reduce its exposure to external intrusions. As a result, information security can become an integral element of the QoS needs of applications.
A concrete example of security-focused service morphing is to 'scatter' critical and vulnerable values across multiple cooperating distributed platforms and to 'assemble' them, under compiler control, only to the extent needed by the application. Moreover, when the last use of the 'assembled' value is complete, the 'assembled' value is destroyed. Each use of an 'assembled' value is verified by compiler-generated code that authenticates it. For example, consider the use of the last four digits of a social security number to authenticate a transaction. The entire social security number will never be stored in a memory location as a value (that could be hacked into). The value will be scattered in a form known only to the compiler, which will then 'assemble' the value just in time, only for the extent of the use.
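As a hedged illustration of the scatter/assemble idea (not the project's compiler-generated code), the sketch below splits a secret into random XOR shares that could be held on different platforms, recombines them only at the point of use, and then discards the assembled value.

```python
# XOR secret splitting: no single share reveals anything about the value.
import secrets

def scatter(value: bytes, n_shares: int):
    shares = [secrets.token_bytes(len(value)) for _ in range(n_shares - 1)]
    last = bytearray(value)
    for share in shares:
        last = bytearray(a ^ b for a, b in zip(last, share))
    shares.append(bytes(last))
    return shares                      # distribute across cooperating platforms

def assemble(shares):
    out = bytearray(len(shares[0]))
    for share in shares:
        out = bytearray(a ^ b for a, b in zip(out, share))
    return bytes(out)                  # reconstruct just in time, at the point of use

shares = scatter(b"6789", 3)           # e.g., the last four digits of an identifier
digits = assemble(shares)
assert digits == b"6789"
del digits                             # drop the assembled value after its last use
```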
The different service morphing techniques to be developed in this research include dynamic component (re)deployment, (re)specialization, and (re)partitioning. Such actions are supported by system-level mechanisms that efficiently carry the performance, usage, and requirements information needed for runtime component morphing, principally addressing components' processing and communication actions. The intent is for self-modifying components to be able to acquire runtime information about current resource availabilities and Quality of Service demands. While developing these software technologies, we will concurrently explore new application-specific techniques and methods that take advantage of morphable software services, targeting remote sensing and autonomous robotics applications. Finally, while most of our work will utilize current embedded systems platforms, using XScale boards, we will also consider how to further improve hardware platforms to better enable morphable services. Such work essentially broadens the optimization space in which morphable services are able to operate.
Our technical approach integrates across multiple CS disciplines, by exploiting, for instance, detailed knowledge about computer architecture (e.g., power usage related to memory footprint) to develop compiler techniques that dynamically generate code with functionality and the performance/power profiles more suitable to current application needs. Compiler-level and architectural knowledge is maintained as meta-information at the middleware level, and lightweight middleware dynamically deploys newly morphed code to target platforms. Kernel-level mechanisms collect and distribute the resource information needed for such actions. They also help integrate the application-level with the system-level actions being taken, the latter being particularly important when satisfying certain end-to-end constraints (e.g., timing or power constraints) desired by distributed embedded applications.
A key goal of this research is to demonstrate the importance and utility of morphable services for critical applications. This implies the need to jointly develop application techniques and ideas with morphable service technologies. By grounding our research in a challenging application domain, autonomous robots used in emergency management situations, our technological solutions must "close the loop", integrating system-level information about resource constraints, with middleware-level options to morph services, with application-level opportunities for making tradeoffs and choices about how to best meet current requirements. The result is systems in which changed application needs result in new code modules deployed and specialized to meet these needs, jointly with changes in underlying system configurations and properties. In other words, applications and systems are continuously 'morphed' to best match end user requirements.
A concrete example of extending application-level research to exploit service morphing is to extend mission-centric notions of 'value' in autonomous robots. In robotics, 'value' captures an individual robot's contribution to a mission undertaken by a robot team, and 'value' helps a robot determine its next actions. Our new research will extend these solutions: instead of considering only movement alternatives, the robots will also consider the 'values' of other activities like communication, computation and observation. This approach depends significantly on other components of this proposal, namely QoS management and cooperative service morphing, so that the communication links available to a robot team and the CPU power needed for interpreting distributed sensor inputs can be deployed appropriately.
2003 — 2008
Pu, Calton
Survivable Continual Data Streams @ Georgia Tech Research Corporation
Continual data streams are a generalization of continuous data streams due to two major factors: (1) irregular data bursts and (2) the need to integrate all kinds of data and metadata, instead of pure time series or multimedia. Continual data streams naturally have a trade-off between performance scalability properties and system survivability properties. On the one hand, survivability requires increased redundancy since node and network instability inevitably renders parts of the system unavailable. On the other hand, scalability requires a decrease in data redundancy, to reduce update and propagation costs. This inherent trade-off between survivability and scalability is a major research challenge due to the irregular arrival and integration of continual data streams. The project will investigate the approaches that span the spectrum between absolute consistency guarantees for replicas in traditional replication on one extreme and by-chance consistency/zero guarantees for cached copies in traditional proxy caches on the other extreme. Formally, this approach is based on the notion of bounded inconsistency such as Epsilon Serializability. The main technical idea is to keep the distance between a replica and the original to within the specified threshold (to handle bursts) while optimizing the performance scalability and integrating heterogeneous data streams.
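A rough sketch of that trade-off with invented parameters: the primary absorbs bursty updates and propagates to a replica only when the accumulated divergence would exceed the specified threshold, so readers of the replica see a value that is stale by at most that bound.

```python
# Divergence-bounded propagation; threshold and update stream are made up.
class BoundedReplica:
    def __init__(self, threshold):
        self.threshold = threshold
        self.primary = 0.0
        self.replica = 0.0              # the value readers on other nodes see
        self.pushes = 0

    def ingest(self, delta):
        self.primary += delta           # bursty stream updates land on the primary
        if abs(self.primary - self.replica) > self.threshold:
            self.replica = self.primary # propagate only when the bound would be violated
            self.pushes += 1

r = BoundedReplica(threshold=5.0)
for delta in [1, 2, 1, 3, 1, 1, 4]:     # seven bursty updates...
    r.ingest(delta)
print(r.replica, r.pushes)              # ...but only two pushes; replica lags by <= threshold
```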
2006 — 2008
Pu, Calton
SGER: Cyber-Physical Systems: Architecture Study and Research Challenges @ Georgia Tech Research Corporation
This SGER award supports a study focused on the architectural alternatives and research challenges of next generation Cyber-Physical Systems (CPS). Its goal is to stimulate a cross-disciplinary discussion and debate on CPS research topics and projects. Cyber-Physical Systems are carefully engineered mission-critical systems that interact with the physical world and whose operations are integrated, monitored, and controlled by an intelligent computational core. Many existing systems can be considered "first generation" (1G) CPS systems. Fly-by-wire aircraft are an example of integrated CPS that operate mostly autonomously. Robot-serviced manufacturing production lines are an example of distributed CPS that requires careful coordination among its many components. Although these 1G CPS systems have contributed significantly to our economy and society, they have been built through expensive ad hoc methods and have difficulties adapting and evolving. Recent examples include the difficulty of introducing new aviation technology, and the financial losses and transformation of the automotive industry due to aging manufacturing plants. This CPS study is expected to contribute towards a long-term vision and agenda for Cyber-Physical Systems.
2007 — 2010
Pu, Calton
CT-T: Collaborative Research: Adaptive Attacks and Defenses in Denial of Information @ Georgia Tech Research Corporation
Spam has become a prominent problem in every important communications medium. Most email users face spam every day. An entire industry has sprung up to "improve" search engine rankings of web sites. Further, automatically generated spam has invaded blogs, social networks, online advertising, and VoIP connections. Spam is a rapidly growing practical problem due to the easy adaptation of attack tools that bypass defense mechanisms. For example, useful defense techniques such as statistical learning filters and collaborative filtering are capable of distinguishing spam from legitimate email. However, attackers have been using automated tools to bypass these defense mechanisms, resulting in a seemingly endless "arms race" between attacks and defenses. For example, randomizing spam tokens and inserting legitimate text as camouflage can significantly reduce the effectiveness of statistical learning filters. Although there is no known general solution for the arms race, known as Adversarial Learning, a defense that exploits the semantic necessity for spam email to contain strong spam tokens such as VIAGRA (or its misspellings) has been found and demonstrated to end the camouflage arms race. This project seeks additional evidence to support the hypothesis that such structural (inherent) characteristics can be found and used in the identification of many kinds of spam attacks. The first research thrust focuses on the development of defense methods resilient to adaptive spam attacks. The second research thrust investigates the combination of spam attacks in distinct areas (e.g., email and web spam) and combined defenses. Success in these thrusts will significantly and permanently reduce the effectiveness of spam attacks.
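A simplified illustration of the "strong token" observation (the token list, character map, and messages below are made up): camouflage text can dilute a statistical score, but a spam message still has to carry its payload token, even when misspelled.

```python
# Normalize common character substitutions, then look for payload tokens.
import re

STRONG_TOKENS = {"viagra"}                        # hypothetical payload vocabulary
LEET = str.maketrans("1!0@$3", "iioase")          # assumed substitution map

def normalize(token):
    token = token.lower().translate(LEET)
    return re.sub(r"[^a-z]", "", token)

def has_strong_token(message):
    return any(normalize(tok) in STRONG_TOKENS for tok in message.split())

spam = "Cheap V1agra !!! " + "meeting agenda budget review " * 5   # camouflage padding
ham = "meeting agenda budget review for Friday"
print(has_strong_token(spam), has_strong_token(ham))               # True False
```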
2009 — 2012
Ahamad, Mustaque (co-PI); Pu, Calton
II-New: Collaborative Research: Spam Processing, Archiving, and Monitoring Community Facility (SPAM Commons) @ Georgia Tech Research Corporation
In this project, the PIs propose to construct and develop a shared infrastructure to support the collection and maintenance of realistic, large-scale spam data sets, referred to as SPAM Commons.
Spam is a problem in many important communications media such as email and web. A sub-problem of spam, phishing (a form of online pretexting), caused an estimated $3.2B in damages in 2007. The broad impact of effective spam filtering methods can be estimated in billions of dollars in several communications media such as email and web.
Spam has also invaded other media, with concrete attack examples in social networks, blogosphere, Internet telephony (VoIP), instant messaging, and click fraud.
Unfortunately, spam research has been hampered by the lack of published real world data sets due to concerns with privacy and company intellectual property. This project team develops a shared infrastructure to support the collection and maintenance of realistic, large scale spam data sets, called Spam Processing, Archiving, and Monitoring Community Facility (SPAM Commons).
The main goals of SPAM Commons are: (1) to facilitate remedial research that will stem the wastes and losses caused by spam, and (2) to enable revolutionary research that aims at stopping certain kinds of spam attacks altogether.
SPAM Commons is divided into a Public Partition and a Protected Partition.
The Public Partition is a direct analog of standard corpora for speech and image recognition research, consisting of a systematic and regular collection of both spam and legitimate data in the various communications media, starting from email and web spam, and expanding into other communications media as spam becomes a serious threat in each area and data become available.
The Protected Partition consists of a combined data and processing facility that makes private data or near real-time spam data available for experimental evaluation of spam defense mechanisms in a protected testbed. Access to such protected data will enable new spam research on real-time evolving spam and real world data sets that is infeasible today.
The intellectual challenges of the SPAM Commons project extend beyond the new research, in the spam areas mentioned above, that is enabled by the availability of the data sets. The construction of both partitions of SPAM Commons presents significant intellectual challenges of its own. First, the isolation of the Protected Partition partially addresses the concerns of privacy, which remains a general research problem. Second, useful spam and legitimate data sets require automated distinction of spam from legitimate documents with certainty, which remains an open research question in email, web, and other media. Third, the adversarial and mutual evolution of spam producers and defenders requires continuous collection of fresh data for further study. Finally, the collection and streaming of near-real-time spam data represent research resources currently unavailable to spam researchers. Advances in these areas will spur the growth and evolution of SPAM Commons and enable new research on the evolving and growing spam problem.
The impact of SPAM Commons data sets on experimental spam research may be similar to the impact of large corpora in disciplines such as speech/image recognition and natural language processing, which achieved a level of scientific result reproducibility and comparability after the use of such corpora became a standard requirement. The proposed data repository will be supported and used by 9 university partners (Clayton State, Emory, Georgia Tech, NC A&T, Northwestern, Texas A&M, UC Davis, U. Georgia, UNC Charlotte), and several industry partners (IBM, PureWire, Secure Computing).
2009 — 2014
Ahamad, Mustaque (co-PI); Pu, Calton; Liu, Ling; Immergluck, Lilly
NetSE: Medium: Privacy-Preserving Information Network and Services For Healthcare Applications @ Georgia Tech Research Corporation
This research explores challenges in developing privacy-preserving information networks and services (PPNs). Next generation healthcare information systems and applications, such as personalized and predictive medicine, need PPNs for privacy-preserving information sharing and dissemination among independent healthcare providers, enabling information access over distributed access controlled content, while safeguarding personal health information and medical privacy of individuals from unauthorized disclosures.
The intellectual merits of this research include the development of: (1) privacy-preserving search capabilities over distributed access controlled content, a critical functionality for PPNs; (2) a suite of utility-aware data anonymization services, preserving the privacy of personal medical information against unauthorized disclosure, at the same time maximizing the data utility for medical service providers; and (3) the PPN architecture and middleware optimized for high availability, scalability and failure recovery.
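As a toy example of the kind of utility-aware anonymization service referred to above (fields and generalization rules are hypothetical), quasi-identifiers can be generalized, e.g., an exact age to an age band and a ZIP code truncated, while the clinical attribute of interest is kept intact:

```python
# Made-up records and rules; generalize quasi-identifiers, keep the useful field.
def generalize(record, zip_digits=3, age_band=10):
    out = dict(record)
    out["zip"] = record["zip"][:zip_digits] + "*" * (len(record["zip"]) - zip_digits)
    low = (record["age"] // age_band) * age_band
    out["age"] = f"{low}-{low + age_band - 1}"
    return out

patients = [
    {"zip": "30332", "age": 34, "diagnosis": "MRSA"},
    {"zip": "30318", "age": 37, "diagnosis": "MSSA"},
]
for p in patients:
    print(generalize(p))   # e.g. {'zip': '303**', 'age': '30-39', ...}: harder to re-identify
```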
The broad impact is two-fold. First, this research will create better and broader understanding of the challenges and functional requirements for building the next generation of privacy preserving networked information systems over distributed access controlled content. A domain-specific proof-of-concept prototype on top of the PPN core will be developed for discovering and analyzing risk factors for resistant bacterial infections. These real-world studies will be conducted in collaboration with Morehouse School of Medicine and Children's Healthcare of Atlanta, and be used as both a driver and a testbed for this research. Second, this research will demonstrate that the PPN is an enabling infrastructure for real-time, continuous and on demand data analysis over massively-distributed and privately-shared data repositories.
2009 — 2015
Pu, Calton; Schwan, Karsten; Yalamanchili, Sudhakar (co-PI); Blough, Douglas
Industry/University Cooperative Research Center For Experimental Research in Computer Systems (I/UCRC ERCS) @ Georgia Tech Research Corporation
IIP 0934313
This is a proposal to renew support for the Industry/University Cooperative Research Center for Experimental Research in Computer Systems (CERCS). The multi-university center is headquartered at the Georgia Institute of Technology, with an affiliate group at the Ohio State University. CERCS was established in 2001. The focus of the CERCS faculty at the Georgia Institute of Technology is on the core systems technologies underlying large-scale computing systems, that is, on technology creation.
CERCS will continue to entertain a large number and variety of projects, driven by faculty interests, industry connection and center capabilities. Three key domains of interest to CERCS are Enterprise Systems, Scientific Computing, and Embedded Systems; and, underlying and uniting these domains are four significant research thrusts. CERCS is committed to fostering interdisciplinary research, establishing a culture of experimental research reaching out to local and national industry, and to encourage participation and contribute to the regional and national economies through the availability of talent and emerging technologies.
The broader impact of CERCS lies in the application of its research results by IT producer and consumer companies. CERCS plans to stimulate and ensure research with broad practical impact by collaborating with industry partners. Work in energy management, the multicore software stack, and applications to scientific computing, enterprise computing, and mobile computing will all benefit society by reducing the costs of computing and allowing society to efficiently address larger computing problems than currently possible. CERCS also plans to create a broad student community for research and education activities and to educate qualified students to join the software/computing industry.
2011 — 2013
Schwan, Karsten; Pu, Calton
FRP: An Experimental Comparative Study of N-Tier Application Performance in Computational Clouds @ Georgia Tech Research Corporation
Proposal #1127904
This proposal seeks funding for the Center for Experimental Research in Computer Systems at Georgia Institute of Technology. Funding Requests for Fundamental Research are authorized by an NSF approved solicitation, NSF 10-601. The solicitation invites I/UCRCs to submit proposals for support of industry-defined fundamental research.
While cloud computing is rising in importance for numerous applications, there remains a fundamental lack of understanding of performance achievable for different configurations, especially for N-tier applications common in such areas as e-commerce and social networking. The proposed research will seek to systematically design large scale experiments from which performance data will be derived and performance metrics established for N-tier applications. The resulting large data sets can enable researchers to explore means to achieve optimal allocation of hardware and software resources for specific applications. The proposed comparative experimental study will enable development of comparative models through which N-tier application performance can be predicted and as such holds the opportunity for significant breakthroughs in understanding of cloud performance for this class of problems.
The proposed research has the potential to drive the development of tools with which industry providers of cloud resources can better manage their resources and offer services in a cost-effective way. Additionally, this optimization can be applied to the achievement of Green IT goals. The work is well supported by the center's individual industry members and has the potential to extend the portfolio of the center by virtue of the many studies and modeling efforts achievable using the dataset generated by this study. Beyond the center, the dataset, if properly designed, has the potential for broad impact in the research community as a resource for studies in this area. The proposal furthermore provides a solid plan for student and UREP involvement.
2011 — 2012
Pu, Calton
RAPID: Automating Emergency Data and Metadata Management to Support Effective Short Term and Long Term Disaster Recovery Efforts @ Georgia Tech Research Corporation
Proposal #: CNS 11-38666

Project Proposed: This RAPID project aims to contribute to an effective recovery by collecting, processing, and disseminating appropriate sensor data. The work addresses the challenge of the flood of sensor data during an emergency, through integration, evaluation, and enhancement of current data management tools, particularly with respect to metadata. Automation of data and metadata collection, processing, and dissemination is expected to alleviate the time pressure on human operators. The fundamental tools support quality-of-information dimensions such as provenance, timeliness, security, privacy, and confidentiality, enabling an appropriate interpretation of the sensor data in the long term. For the short term, the tools are expected to help the relief workers as data producers and consumers; for the long term, they will provide high quality information for disaster recovery decision support systems. Additionally, the cloud-based system architecture and implementation on the CERCS cluster of Open Cirrus provide high availability and ease of access for recovery efforts in Japan as well as for researchers worldwide.

The integration of techniques from several information dimensions (e.g., data provenance, surety, and privacy) and the application of code generation techniques to automate the data and metadata management tools constitute the intellectual merit of the proposed research. New challenges will be encountered in the potential interferences among the quality-of-information dimensions. It is also a new challenge to apply code generation techniques in the adaptation of software tools to accommodate changes imposed by environmental damages and contextual as well as cultural differences among countries. The investigator collaborates with Prof. Masaru Kitsuregawa from the University of Tokyo, Japan, a leading researcher in data management and the first database researcher from Asia to win the ACM SIGMOD Innovation Award (2009). In addition to a letter of support and biographical sketches of the Japanese collaborator, a support letter has been submitted by Intel to OISE, CISE and Engineering. Intel has offered access to the Intel Open Cirrus cluster to conduct the research.

Broader Impacts: The proposed tools should contribute to improving both the quantity and quality of data being collected by a variety of sensors, thus improving the effectiveness of short and long term decision making. For example, measured radiation levels in agricultural products can serve as an indication of spreading radioactive contamination that complements the direct readings of radiation in soil samples. The project enables informed decisions based on precise interpretation of real sensor data that may improve the quality of life at both human and social levels, while reducing costs. The project will also contribute to graduate student education.
2011 — 2014
Pu, Calton
CSR: Small: Multi-Bottlenecks: What They Are and How to Find Them @ Georgia Tech Research Corporation
This project addresses computing clouds, large-scale shared infrastructures that offer practically unlimited hardware to most users and applications. In order to achieve scalable performance, all components of the system, from hardware to operating system, middleware, various servers, and the application itself, need to cooperate. Bottlenecks in components can slow down the entire system. In traditional computer systems (e.g., as modeled by queuing theory), a typical assumption is that their workloads consist of independent jobs. This assumption, which is valid for old-style batch-oriented processing and interactive users, guarantees the appearance of single bottlenecks for an entire system. Single bottlenecks can be relatively easily detected, since they appear as resources reaching saturation (e.g., 100% utilization).
The "independent jobs" model does not hold for the important class of web-facing applications (e.g., e-commerce) that rely on the popular n-tier architecture. N-tier systems divide the system into a pipeline of processing components, e.g., consisting of web servers, application servers, and database servers. While the n-tier architecture supports good performance scalability at the web server and application server tiers, it also introduces several (sometimes unexpected) strong dependencies among other tiers and components. These dependencies produce an interesting phenomenon called multi-bottleneck. Multi-bottlenecks are characterized by system throughput limited by a ceiling regardless of additional hardware, and no single resource shows average utilization anywhere near saturation. (Anecdotally, this is an increasingly common situation in practice.) Multi-bottlenecks are difficult to find, diagnose, and remove when using traditional performance evaluation methods. They are also important in clouds since they will be the only bottlenecks left after the removal of easily spotted single bottlenecks.
This project develops, evaluates, and refines a systematic search method, called Telescoping, to find multi-bottlenecks by running large scale experiments on production clouds. A simulator generates well-defined multi-bottlenecks to help refine the Telescoping search method and tune its parameters. Then, n-tier benchmarks such as RUBiS and RUBBoS (e-commerce applications) on production clouds such as Open Cirrus, Amazon EC2, and Emulab, gather experimental evidence on multi-bottlenecks. These experiments shed light on a little-known phenomenon in a rich, but unexplored area (performance limits of jobs with dependencies). Success can lead to significant new developments in the theoretical understanding of jobs with dependencies and improve practical uses of clouds by n-tier systems.
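The multi-bottleneck signature described above can be checked mechanically, as in the sketch below (all numbers are invented): throughput stops growing as the offered load increases, yet no single resource's average utilization comes anywhere near saturation.

```python
# Invented measurements; flags the ceiling-without-saturation signature.
def looks_like_multi_bottleneck(loads, throughputs, utilizations, sat=0.95, flat=0.02):
    gain = (throughputs[-1] - throughputs[-2]) / throughputs[-2]
    ceiling = loads[-1] > loads[-2] and gain < flat     # more load bought ~no throughput
    unsaturated = all(u < sat for u in utilizations.values())
    return ceiling and unsaturated

loads        = [1000, 2000, 3000, 4000]                 # offered request rates
throughputs  = [990, 1950, 2300, 2310]                  # completed requests per second
utilizations = {"web_cpu": 0.62, "app_cpu": 0.71, "db_cpu": 0.68, "db_disk": 0.55}
print(looks_like_multi_bottleneck(loads, throughputs, utilizations))   # True
```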
2012 — 2014
Pu, Calton
Savi: Eager: For Global Research On Applying Information Technology to Support Effective Disaster Management (Grait-Dm) @ Georgia Tech Research Corporation
Every year, disasters cause damage estimated at many billions of dollars and cost many lives. Disaster management is a challenging task due to the seemingly unpredictable alterations of the environment and their impact on people. With its primary focus on the application of information technology and Big Data, this award establishes a "virtual institute" for Global Research on Applying Information Technology to Support Effective Disaster Management (GRAIT-DM). It will foster research collaborations and community activities with the goal of improving our preparedness for, response to, and recovery from disasters. Led by partners at the Georgia Institute of Technology and the University of Tokyo, the virtual institute is a U.S.-Japanese cooperative effort that is expected to grow into a global collaboration in the near future.
Information technology has transformed modern disaster management, as demonstrated by Twitter, which served as a valuable information source during the Tohoku Earthquake. A Big Data-based approach to disaster management research can be both transformative and challenging, at both the human and social scales. Reflecting this, the GRAIT-DM project supports the collection of large data sets from environmental sensors and information networks, shared by the many researchers working on various aspects of disaster management. The virtual institute promotes global research on the application of information technology by engaging big data producers (e.g., sensor network researchers), big data consumers (e.g., disaster management researchers), and big data managers (e.g., data analytics researchers) who connect the producers to the consumers. Concrete activities include community-building workshops in the U.S. and Japan, outreach and publication of research reports, educational activities such as summer schools for graduate students and junior researcher exchanges, and a web portal providing access to data and software tools for community use. Broader impacts include beneficial leveraging of international research and infrastructure investments, enhancement of on-going projects through cross-fertilization, an accelerated rate of innovation relevant to disaster management, and the development of a workforce with specialized talent, capable of excelling in a new, highly interconnected world that must cope with disasters.
This award has been designated as a Science Across Virtual Institutes (SAVI) award and is being co-funded by NSF's Office of International Science and Engineering.
|
0.93 |
2014 — 2017 |
Pu, Calton |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Eager: An Exploratory Study of Multi-Hazard Management Through Multi-Source Integration of Physical and Social Sensors @ Georgia Tech Research Corporation
Natural and man-made disasters can cause significant material damages and human suffering. For example, Superstorm Sandy of 2012 is estimated to have caused more than $68 billion in damages and killed at least 286 people in seven countries. Improving the preparation for, response to, and recovery from disasters can reduce damages, relieve human suffering, and speed up recovery. Among disasters, a multi-hazard is a sequence of disasters in which the first disaster causes the subsequent disasters, making it far more difficult for emergency response teams to handle all of them. For example, the March 11, 2011, Tohoku, Japan, earthquake triggered an unprecedented tsunami, which led to flooding at, and partial meltdown of, the Fukushima Daiichi Nuclear Power Plant. A more frequent example of multi-hazards is landslides, which can be triggered by many causes including earthquakes, rainfall, and man-made environmental changes.
While the detection of a single disaster usually requires only one kind of dedicated sensor (for example, seismographs can detect earthquakes reliably), multi-hazards often require a combination of various kinds of sensors to detect the multiple events in the sequence. Indeed, the detection of multi-events in general, and multi-hazards in particular, is a non-trivial problem due to the variety of events involved and the large number of combinations, which makes offline combinatorial analysis impractical. In the case of landslides, detection is complicated further by the several possible and unrelated causes of landslides (e.g., earthquake and rainfall), each requiring a different kind of sensor.
In this project, the team is building a landslide detection system, called LITMUS, that integrates data from two physical sensor networks -- the USGS Global Seismographic Network (GSN) and the NASA Tropical Rainfall Measuring Mission (TRMM) -- with data from pervasive social media platforms. This integration of multiple heterogeneous sensors in LITMUS is an illustrative example of successfully applying big data software tools and analytics techniques to solve real-world problems. Specifically, the team is extending geo-tagging to relevant data items, which are filtered in several stages to reduce noise and false positives, and applying machine learning, information retrieval, and semantic web techniques to each data stream. Finally, the filtered social media data are cross-referenced with physical events from the same geo-location to generate supporting evidence for landslide detection. A LITMUS prototype has been detecting more landslides around the world than traditional landslide reporting systems: tests with live streaming data show that the combined result is a list of landslide events that includes the USGS authoritative list plus many other confirmed landslides.
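The cross-referencing step can be illustrated with a simplified Python sketch (not the LITMUS implementation): geo-tagged social media reports count as supporting evidence only when a physical trigger is detected near the same location; the 50 km radius and data shapes are assumptions.

from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def landslide_candidates(social_reports, physical_events, max_km=50):
    """social_reports: geo-tagged, already-filtered posts mentioning landslides.
    physical_events: earthquake or heavy-rainfall detections with coordinates.
    A report becomes a candidate only when a physical trigger occurred nearby,
    mirroring the cross-referencing step described above."""
    candidates = []
    for report in social_reports:
        nearby = [e for e in physical_events
                  if haversine_km(report["lat"], report["lon"], e["lat"], e["lon"]) <= max_km]
        if nearby:
            candidates.append({"report": report, "supporting_events": nearby})
    return candidates

print(landslide_candidates(
    social_reports=[{"text": "hillside collapsed", "lat": 35.6, "lon": 139.7}],
    physical_events=[{"kind": "rainfall", "lat": 35.7, "lon": 139.6}],
))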
|
0.93 |
2014 — 2017 |
Pu, Calton |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Csr: Small: Lightning in Clouds: Detection and Characterization of Very Short Bottlenecks @ Georgia Tech Research Corporation
A plausible explanation for the persistent low utilization of data centers (around 18%, according to Gartner reports) is the managerial need to maintain quality of service against the well-known latency long tail problem, where some apparently random requests that normally return within milliseconds suddenly take multiple seconds. The latency long tail problem arises at moderate utilization levels (e.g., 50%), with all resources far from saturation. Despite efforts to remedy the latency long tail problem in various ways, its causes have remained elusive: in most cases, the very requests that took several seconds actually return within milliseconds when executed by themselves. Studying and solving the latency long tail problem will contribute to better utilization while maintaining quality of service, leading to lower costs for cloud users, higher return on investment for cloud providers, and lower power consumption for the environment. The main goal of this project is the investigation of the class of very short bottlenecks, in which the CPU becomes saturated for only a small fraction of a second, as a significant cause of latency long tail problems. Despite their short lifespan, very short bottlenecks can lead to significant response time increases (several seconds) by propagating queuing effects up and down the request chain in an n-tier application system, because of strong dependencies among the tiers during request processing.
This project runs large-scale experiments in clouds and simulators to generate extensive fine-grained monitoring data for the investigation of very short bottlenecks, which are virtually invisible under typical performance monitoring tools with sampling periods of seconds or minutes. To match the time scale of very short bottlenecks, special instrumentation software tools are being refined to sample intra-server resource utilization at millisecond resolution and timestamp inter-server messages at microsecond resolution. Preliminary studies of n-tier application benchmarks with naturally bursty workloads have found very short bottlenecks that cause the latency long tail in several system layers: systems software (JVM garbage collection), processor architecture (dynamic voltage and frequency scaling), and consolidation of applications in virtualized cloud environments. They show the potential for many other sources of very short bottlenecks, e.g., kernel daemon processes that use 100% of the CPU for several milliseconds. Through careful distributed event analysis of the experimental data, new kinds of very short bottlenecks can be discovered, verified, reproduced, and studied in detail. Concrete solutions for specific very short bottlenecks have been developed, e.g., an improved Java garbage collector. However, other very short bottlenecks have no specific bug-fixes, e.g., those created by overlapping bursts of a statistical nature in consolidated workloads. As an alternative to bug-fixes, more general solutions that disrupt queuing propagation are being explored. As a concrete example, instead of the classic request/response approach, in which waiting threads participate in the queuing propagation, an approach based on asynchronous requests with notification of responses, which reduces overall queuing, is being investigated as a potential way to eliminate or reduce the impact of several kinds of very short bottlenecks.
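A toy Python sketch (illustrative only; the window lengths and thresholds are assumptions) shows why millisecond-resolution sampling matters: a 50 ms CPU saturation episode that can dominate tail latency is invisible in a one-second utilization average.

def very_short_bottlenecks(samples_ms, threshold=0.99, min_len_ms=20, max_len_ms=500):
    """samples_ms: CPU utilization in [0, 1], one sample per millisecond.
    Returns (start, end) index pairs for saturation episodes far shorter than
    a typical monitoring interval, i.e., candidate very short bottlenecks."""
    episodes, start = [], None
    for i, u in enumerate(samples_ms):
        if u >= threshold and start is None:
            start = i
        elif u < threshold and start is not None:
            if min_len_ms <= i - start <= max_len_ms:
                episodes.append((start, i))
            start = None
    return episodes

# A 50 ms burst of 100% CPU hidden inside an otherwise quiet second of samples.
trace = [0.2] * 400 + [1.0] * 50 + [0.2] * 550
print(very_short_bottlenecks(trace))   # -> [(400, 450)]
print(sum(trace) / len(trace))         # the one-second average looks harmless (~0.24)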
|
0.93 |
2015 — 2018 |
Pu, Calton Tien, Iris Goodman, Seymour |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Crisp Type 1: Multi-Scale Modeling Framework For the Assessment and Control of Resilient Interdependent Critical Infrastructure Systems @ Georgia Tech Research Corporation
This project will create a novel modeling framework to assess and control interdependent critical infrastructure systems (ICIs). Infrastructure systems are critical to the functioning of our society, and the services they deliver form the backbone of the health, safety, and security of our nation. These systems are complex, comprised of many interdependent components. Further, these systems are interdependent, with the performance of one system dependent on the performance of one or more of the others. This leaves ICIs vulnerable to a variety of hazards, both natural and manmade. This project will study how to improve the resilience of these systems, with the recognition that achieving resilience will be a shared responsibility among stakeholders. At the same time, more and more data is becoming available to assess the states of ICIs both under normal conditions and over time.
This project will take a multidisciplinary approach, integrating across engineering, computation, and policy to create a powerful stakeholder-driven framework that models ICIs across scales and utilizes data across sources to evaluate the current status of infrastructures and make predictions on their performance and reliability. The researchers will study three ICIs in particular: transportation, power, and communications infrastructure, applying the framework to the study of these ICIs in two specific communities, one urban and one rural. The framework will be created in conjunction with the development of new processes to achieve stakeholder buy-in and policy adoption to support integration of the new technology with policy. With fast algorithms to solve the models of the framework and real-time (or near-real-time) data collection capabilities, a powerful resilient infrastructure management system that can react, adapt, and even proactively take precautionary actions in anticipation of impending disasters is envisioned.
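One simple way to picture the interdependency modeling is a directed dependency graph over which a failure in one infrastructure propagates to the systems that depend on it, as in the illustrative Python sketch below (the components and dependency edges are invented for illustration, not project data).

from collections import deque

def cascade(dependents, failed_component):
    """dependents: {component: [components that depend on it]}.
    Returns every component that loses service when failed_component fails,
    a minimal stand-in for cross-infrastructure dependency propagation."""
    affected, queue = {failed_component}, deque([failed_component])
    while queue:
        current = queue.popleft()
        for downstream in dependents.get(current, []):
            if downstream not in affected:
                affected.add(downstream)
                queue.append(downstream)
    return affected

# Power feeds cell towers and traffic signals; cell towers carry telemetry links.
dependents = {
    "power_substation": ["cell_tower", "traffic_signals"],
    "cell_tower": ["telemetry_link"],
}
print(cascade(dependents, "power_substation"))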
The results of this project will also be integrated into extensive classroom and educational research activities, training the next generation of scientists, engineers, and policymakers on the importance of critical infrastructure resilience and in the development of new multi-disciplinary methods and tools to achieve resilience.
|
0.93 |
2015 — 2018 |
Pu, Calton |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Rcn: Savi: Adaptive Management and Use of Resilient Infrastructures in Smart Cities: Support For Global Collaborative Research On Real-Time Analytics of Heterogeneous Big Data @ Georgia Tech Research Corporation
Cities provide ready and efficient access to facilities and amenities through shared civil infrastructures such as transportation and healthcare. Making such critical infrastructures resilient to sudden changes, e.g., those caused by large-scale disasters, requires careful management of limited and varying resources. The rapidly growing big data from both physical sensors and social media, available in real time, suggest an unprecedented opportunity for information technology to increase the efficiency and effectiveness of adaptive resource management techniques in response to sharp changes in supply and/or demand on critical infrastructures. Within the general areas of resilient infrastructures and big data, this project will focus on the integration of heterogeneous Big Data and real-time analytics that will improve the adaptive management of resources when critical infrastructures are under stress. The integration of heterogeneous data sources is essential because many kinds of physical sensors and social media provide useful information on various critical infrastructures, particularly when they are under stress.
This Research Coordination Network (RCN) will promote meetings and activities that stimulate and enable new research on the integration of heterogeneous physical sensor data and social media for real-time big data analytics in support of resilient critical infrastructures such as transportation and healthcare in smart cities. As a first example, the RCN will support participation by young faculty attending the Early Career Investigators' Workshop on Cyber-Physical Systems in Smart Cities (ECI-CPS) at CPSweek (April of each year) and young faculty attending the Workshop on Big Data Analytics for Cyber-physical Systems (BDACPS). As a second example, the RCN will support contributions to a Special Track on Big Data Analytics for Resilient Infrastructures at the IEEE Big Data Congress. As a third example, the RCN will support participation in international meetings organized by other countries, e.g., Japan's Big Data program run by the Japan Science and Technology Agency (JST). The project will also maintain a repository of research resources. Concretely, the RCN will actively collect and make readily available public data sets (e.g., physical and social sensor data) and software tools (e.g., to support real-time big data analytics). The technologies and tools that arise from RCN-enabled research will be applied to socially and economically impactful areas such as congestion reduction and personalized healthcare in smart cities.
|
0.93 |
2016 — 2020 |
Pu, Calton Liu, Ling |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Twc: Medium: Privacy Preserving Computation in Big Data Clouds @ Georgia Tech Research Corporation
Privacy is critical to freedom of creativity and innovation. Assured privacy protection offers unprecedented opportunities for industry innovation and science and engineering discovery, as well as new life-enhancing experiences and opportunities. The ability to perform efficient yet privacy-preserving big data computations in the cloud holds great potential for safe and effective data analytics, such as enabling health-care applications to provide personalized medical treatments using an individual's DNA sequence, or enabling advertisers to create targeted advertisements by mining a user's clickstream and social activities, without violating data privacy. The PrivacyGuard project is developing algorithms, systems, and tools that provide end-to-end privacy guarantees over the life cycle of a data analytic job. The end-to-end privacy guarantee can be measured by how difficult it is to learn about the original sensitive data from the sanitized data releases, the intermediate results of execution, and the output of an analytic job. The ultimate goal of PrivacyGuard is to develop a methodical framework and a suite of techniques for ensuring that distributed computations meet the desired privacy requirements of the input data, as well as protecting against the disclosure of sensitive patterns during execution and in the final output of the computation.
The PrivacyGuard project advances the knowledge and understanding of privacy-preserving distributed computation from three perspectives: (1) It designs formal mechanisms to formulate a data owner's end-to-end privacy requirement for each data release, for example, by associating each data release with a well-defined usage scope to confine the set of data analytics models and algorithms that can operate on the released data. (2) It develops a suite of execution privacy guards with dual objectives: to audit and enforce privacy compliance during distributed computation against data-flow-based privacy violations, and to guard compliance with input privacy. (3) It devises a proactive approach to output privacy against information leakage associated with mining output, for example, by leveraging the differential privacy model to maximize the upper bound on the data privacy guarantee and minimize the lower bound on data utility losses. The PrivacyGuard project is the first effort toward a practical and systematic implementation framework for ensuring end-to-end privacy in distributed big data computations. Furthermore, by integrating the PrivacyGuard research with curriculum development for big data systems and analytics courses at the Georgia Institute of Technology, it contributes to the education and training of a new generation of data scientists as privacy compliance advocates.
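Since the output-privacy perspective leverages the differential privacy model, the following standard (not project-specific) Python sketch shows the Laplace mechanism applied to a counting query; the epsilon value and example records are arbitrary.

import random

def laplace_noise(scale):
    """Laplace(0, scale) noise as the difference of two exponential samples."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon=0.5):
    """Differentially private count: a counting query has sensitivity 1,
    so Laplace noise with scale 1/epsilon gives an epsilon-DP release."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

patients = [{"age": 34, "condition": "flu"}, {"age": 61, "condition": "flu"},
            {"age": 47, "condition": "asthma"}]
print(private_count(patients, lambda r: r["condition"] == "flu"))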
|
0.93 |
2017 — 2018 |
Pu, Calton |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
1st Us-Japan Workshop Enabling Global Collaborations in Big Data Research; June, 2017, Atlanta, Ga @ Georgia Tech Research Corporation
The 1st US-Japan Workshop Enabling Global Collaborations in Big Data Research brings together researchers from the United States (U.S.) and Japan to discuss experiences, challenges, and opportunities in international research collaborations. The workshop provides opportunities for participants from both countries to identify mutual research interests that leverage resources and expertise to accelerate advancements in smart and connected communities, cyber-physical systems, artificial intelligence, and machine learning. The workshop includes two tracks to support a broad range of participation: track 1, for participants with prior collaboration experience and significant potential to advance the field; and track 2, for those without prior collaboration experience who may benefit significantly from the diverse training environments afforded by international collaborations. The outcomes of the workshop will be disseminated through a report describing the main strategic areas, key research opportunities, and collaboration scenarios discussed during the workshop, thereby benefiting a broader research community. The workshop is co-located with the 2017 IEEE International Conference on Distributed Computing Systems (ICDCS). This engagement builds upon prior National Science Foundation (NSF)-Japan Science and Technology Agency (JST) collaborations.
The areas of smart and connected communities, cyber-physical systems, artificial intelligence, and machine learning present intellectual challenges of their own, as well as challenges and opportunities at their intersections. Specifically, the challenges and opportunities created by growing data, sensors, cloud and edge computing, and networking at a global scale call for international research collaborations. For example, advances in machine learning and artificial intelligence, paired with the development and implementation of large sensor networks (e.g., the Array of Things in Chicago and the Fujisawa Sustainable Smart Town in Japan), can enable city-scale data to improve efficiency, economic prosperity, and security in our cities and communities. The workshop will focus on research challenges that are beyond the individual reach of each participant, but that become feasible goals with effective collaboration between the two countries. This can be achieved when both sides share similar interests but bring complementary expertise and skills (e.g., from different but related areas such as the examples mentioned above).
|
0.93 |
2020 — 2021 |
Pu, Calton Liu, Ling |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Rapid: Tracking and Evaluation of the Coronavirus (Covid-19) Epidemic Propagation by Finding and Maintaining Live Knowledge in Social Media @ Georgia Tech Research Corporation
Accurate situational awareness becomes an increasingly difficult challenge in rapidly changing environments. With the current exponential growth of confirmed COVID-19 cases, timely and reliable information becomes extremely important for informed decision making. Official reports based on confirmed test results are reliable, but they are widely considered to be a subset of the real situation. In contrast, social media provide broad coverage, but they have low reliability due to significant misinformation, disinformation, and inaccurate news. With the gradual reopening of businesses in the US while the prospect of an effective vaccine remains uncertain, the need for reliable and accurate situational awareness becomes paramount, since decisions about further business openings and the practice of social distancing will depend on the available information, the perception of the risks of contagion, and the need for economic recovery.
This project addresses the technical challenges of finding new, verifiable facts from noisy online media and social networks in a timely manner. Social media contain the necessary timely information, but they also carry significant challenges represented by misinformation, disinformation, and concept drift. Traditional machine learning (ML) models trained from closed data sets have been unable to meet these challenges when faced with true novelty in evolving new data, beyond the fixed training data. To handle these challenges, the Evidence-Based Knowledge Acquisition (EBKA) approach automates the integration of noisy social media data, such as Twitter and Weibo, with recognized, respected authoritative sources to detect verifiable facts in a timely and reliable manner. The project builds on the LITMUS software tools to provide timely and reliable information that complements physical test result data, enabling better informed decision making by government officials, first responders, and the general public.
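A highly simplified Python sketch of the evidence-based idea (not the EBKA implementation; the matching rule and data shapes are assumptions) treats social media claims as hypotheses and promotes them to "verified" only when an authoritative feed reports a matching fact.

def verify_claims(social_claims, authoritative_facts):
    """social_claims: e.g. {'county': 'Fulton', 'metric': 'confirmed', 'value': 118}.
    authoritative_facts: the same shape, taken from official feeds.
    A claim is 'verified' when an authoritative fact agrees within 10%,
    and 'unsupported' otherwise -- the social stream alone is never trusted."""
    index = {(f["county"], f["metric"]): f["value"] for f in authoritative_facts}
    results = []
    for claim in social_claims:
        official = index.get((claim["county"], claim["metric"]))
        if official is not None and abs(claim["value"] - official) <= 0.1 * max(official, 1):
            results.append((claim, "verified"))
        else:
            results.append((claim, "unsupported"))
    return results

print(verify_claims(
    social_claims=[{"county": "Fulton", "metric": "confirmed", "value": 118}],
    authoritative_facts=[{"county": "Fulton", "metric": "confirmed", "value": 120}],
))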
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
0.93 |
2020 — 2022 |
Pu, Calton |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Eager: Live Reality: Sustainable and Up-to-Date Information Quality in Live Social Media Through Continuous Evidence-Based Knowledge Acquisition @ Georgia Tech Research Corporation
Social media have complemented the traditional press with immediate reports and worldwide coverage. However, they also receive and propagate significant amounts of misinformation and disinformation, such as fake news. A skillful mixture of verifiable facts and outrageous fiction, fake news aims to attract reader attention, make an immediate initial impact, and then be quickly forgotten. Even as disposable novelty, fake news has had a significant impact on real-world events such as elections. For human readers and machine learning (ML) classifiers, distinguishing fake news from real news has been challenging due to its sophisticated construction, which camouflages fiction with facts and evolves continuously by incorporating the newest and hottest topics as it mutates. The Live Reality project will track the evolution of fake news through the continuous import of reliable, verified facts from authoritative sources, and separate the facts from the fiction, to catch fake news in the act. This automated real-time tracking capability is a significant innovation compared to traditional ML classifiers generated from manually labeled training data, which are constrained to finding historical fake news long after the fact.
Given the short lifespan of disposable novelty (days or hours), catching fake news in the act requires significant innovation in two dimensions. First, the ML classifier must be continuously updated to recognize true novelty that has never been seen before. Second, the update must be sufficiently timely to catch disposable novelty before it expires, e.g., within hours of its initial dissemination. Continuous collection of live social media and authoritative sources will generate novel fake news and the associated ground truth, which are integrated through the Evidence-Based Knowledge Acquisition (EBKA) approach; EBKA adds reliable information from authoritative sources into a continuously adaptive teamed classifier to distinguish the verifiable facts from the fiction in fake news. As news topics evolve, fake news is expected to follow, and EBKA will generate and integrate new sub-models into the live teamed classifier to recognize the new topics. The EBKA approach will be demonstrated on live data containing fake news on a variety of topics, particularly disaster management topics such as the COVID-19 pandemic. Due to the disposable-novelty nature of fake news, EBKA will be evaluated in two dimensions: classifier performance in terms of accuracy and precision, and the timeliness of the classifier in identifying truly new fake news soon after its appearance in the real world.
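As a sketch of what a continuously growing teamed classifier could look like (the class and voting rule below are invented for illustration, not the EBKA design), new sub-models trained on newly verified examples are added over time and the team averages their votes.

class TeamedClassifier:
    """Ensemble that grows as new topics emerge: each sub-model is a callable
    returning a probability that a post is fake; the team averages the scores."""
    def __init__(self):
        self.sub_models = []

    def add_sub_model(self, model, topic):
        # In the project's terms, a new sub-model would be trained from freshly
        # verified facts on an emerging topic; here it is simply registered.
        self.sub_models.append((topic, model))

    def predict_fake_probability(self, post_text):
        if not self.sub_models:
            return 0.5  # no evidence either way
        scores = [model(post_text) for _, model in self.sub_models]
        return sum(scores) / len(scores)

team = TeamedClassifier()
team.add_sub_model(lambda t: 0.9 if "miracle cure" in t else 0.2, topic="covid-19")
team.add_sub_model(lambda t: 0.8 if "5g" in t else 0.3, topic="5g-rumors")
print(team.predict_fake_probability("miracle cure announced, avoid 5g towers"))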
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
0.93 |
2020 — 2023 |
Pu, Calton |
N/AActivity Code Description: No activity code was retrieved: click on the grant title for more information |
Hnds-I: Collaborative Research: Developing a Data Platform For Analysis of Nonprofit Organizations @ Georgia Tech Research Corporation
Nonprofit organizations are important contributors to the US economy and social well-being. Across a wide range of domains, such as healthcare, childcare, education, job training, and many others, nonprofit organizations serve the public, reduce the costs of government, and improve daily lives. Millions of individuals interact with nonprofit organizations every day. Yet despite these important roles, the high costs of collecting and sharing data have prevented a greater understanding of nonprofit organizations and their collective contributions to society. This project, the Nonprofit Organization Research Panel Project (NORPP) Manager, will create a publicly accessible, internet-based, and collaborative research platform that will lower the costs of collecting and sharing large amounts of high-quality, multiyear data on nonprofits and their impacts. The platform will strengthen research and evaluation, broaden access to data-intensive research, and lead to more scientifically informed decision-making by organizations, policymakers, and funders, and to improved outcomes for the communities they serve.
The NORPP platform will offer three primary functions. First, tools and automated processes will allow researchers to recruit and grow representative samples of nonprofit organizations nationally and across communities, states, and regions over time using a common methodology. The platform will automate systematic sampling and weighting procedures to ensure representativeness for studies using the platform, and it will reduce other costs to researchers by automating information flows with organizations in its samples. It will also allow public access among the research and practice communities to download nationally representative data, build additional project-specific samples within the platform, add original survey instruments, and collect original data. Second, the NORPP Manager will support the merger of data developed within the platform with IRS Form 990 Data and other available data on organizations from the population of 501(c)(3) nonprofit organizations that file with the IRS, including geocoded data on organizations' communities and external environment that are publicly available from Census and other sources. Third, the NORPP Manager will serve as a collaborative repository to manage a bank of core and supplemental questionnaires to facilitate replication studies, improve content validity of measurements, and allow for consistent measurement scales across nonprofit organizational research. This work will increase the efficiency of research on nonprofit organizations by significantly reducing the costs to sample, contact, and survey nonprofit organizations, and to merge those data with existing data with the goal of facilitating the growth of rigorous, data-intensive research across the many social science disciplines that intersect with nonprofit organizational research.
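A toy Python sketch of the sampling-and-merge workflow the platform would automate (the field names, weighting rule, and EIN join are illustrative assumptions): draw a stratified sample with design weights, then join it to hypothetical Form 990 records.

import random
from collections import defaultdict

def stratified_sample(orgs, strata_key, per_stratum, seed=0):
    """Draw the same number of organizations from each stratum (e.g., subsector)
    and attach a design weight = stratum size / sample size for that stratum."""
    random.seed(seed)
    by_stratum = defaultdict(list)
    for org in orgs:
        by_stratum[org[strata_key]].append(org)
    sample = []
    for members in by_stratum.values():
        chosen = random.sample(members, min(per_stratum, len(members)))
        weight = len(members) / len(chosen)
        for org in chosen:
            sample.append({**org, "weight": weight})
    return sample

def merge_with_990(sample, form_990_records):
    """Left-join sampled organizations to Form 990 financials by EIN."""
    by_ein = {rec["ein"]: rec for rec in form_990_records}
    return [{**org, **by_ein.get(org["ein"], {})} for org in sample]

orgs = [{"ein": f"{i:09d}", "subsector": "health" if i % 2 else "education"} for i in range(10)]
form_990 = [{"ein": f"{i:09d}", "revenue": 1000 * i} for i in range(10)]
print(merge_with_990(stratified_sample(orgs, "subsector", per_stratum=2), form_990))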
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
0.93 |