2006 — 2011 |
Ferhatosmanoglu, Hakan |
N/A |
Career: Exploration of Dynamic Sequences in Scientific Databases
Scientific data repositories increasingly involve large amounts of images and streams of empirical measurements generated by a diverse set of data sources. The goal of this project is to develop online structures and algorithms to dynamically maintain and analyze data sequences for scientific discovery and monitoring purposes. The implementation focuses on specific applications from the physical and biological sciences that generate vast amounts of multi-dimensional data sequences. For scientific discovery, an iterative querying framework is developed for modeling sequences of observations. The framework optimally utilizes access structures to execute queries ranging from a simple max aggregate to complex scientific queries. Interactive tools are implemented so that researchers can incorporate domain-specific knowledge into the search process. For real-time monitoring, one-pass summaries that can be updated in constant time are developed. The structures are designed to be self-adaptive with respect to workload changes and to handle heterogeneous and incomplete information. The project involves collaborations with domain experts in the focus areas and is expected to advance the state of the art in the application domains. For example, the gene expression analysis tools implemented in this project have already enhanced the ability of collaborating researchers to study Haemophilus influenzae (first described in 1892 by Dr. Richard Pfeiffer during an influenza pandemic) in order to understand its role in a wide range of clinical diseases, so that effective vaccines can be developed. This research project is integrated with education through significant educational and outreach activities. The developed toolkits, findings, and methods of the project will be communicated in a broader context and to an expanded audience through the project website (http://www.cse.ohio-state.edu/~hakan/Career.html).
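As an illustration of what a one-pass summary with constant-time updates can look like, the following minimal sketch maintains a running count, mean, variance, minimum, and maximum over a stream of measurements. It is not taken from the project; the class name `RunningSummary` and its fields are hypothetical, and the variance update uses Welford's well-known online method as a stand-in for whatever summaries the project actually developed.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningSummary:
    """Hypothetical one-pass summary: each update costs O(1) time and memory."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0               # running sum of squared deviations from the mean
    minimum: float = math.inf
    maximum: float = -math.inf

    def update(self, x: float) -> None:
        """Fold one new observation into the summary without storing it."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)   # Welford's online update
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    @property
    def variance(self) -> float:
        """Population variance of everything seen so far (0.0 if empty)."""
        return self.m2 / self.count if self.count else 0.0


if __name__ == "__main__":
    summary = RunningSummary()
    for measurement in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
        summary.update(measurement)
    print(summary.count, summary.mean, summary.variance, summary.maximum)
```

Because the summary never revisits earlier observations, a max aggregate or a mean over the whole stream can be answered at any moment from the stored fields alone.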
|
0.948 |
2006 — 2010 |
Bedford, Keith; Agrawal, Gagan; Li, Rongxing (Ron) (co-PI); Ferhatosmanoglu, Hakan |
N/A |
Ceo:P--a Data-Intensive Cyberinfrastructure Component For Coastal Forecasting and Change Analysis @ Ohio State University Research Foundation -Do Not Use
Abstract OCI 0619041
Over the years, much work has been done on observing and modeling the environment. Many complex systems have been, or are being, built. Despite advances in the amount of data being collected (including a larger number of sources as well as increased spatio-temporal granularity) and enhancements in the techniques used for analyzing these datasets, a number of challenges remain in this area.
First, current systems are very tightly coupled. There is hardly any reuse of algorithm implementations across different systems. It is also extremely hard to test or incorporate new analysis algorithms. The implementations are closely tied to the available resources. Finally, existing systems cannot adapt the granularity of analysis to resource availability and time constraints. The emerging trend towards the closely related concepts of service-oriented architectures and grid computing can alleviate these problems. These concepts enable the development of services that are not tied to specific datasets or end applications, and the implementation of applications using these services. However, this also requires advances in grid middleware components that can support streaming applications and data virtualization/integration.
This project proposes to develop and evaluate a cyberinfrastructure component for environmental applications. This will include developments in middleware, model integration, analysis, and mining techniques, and the use of a service model for supporting two closely related applications. These applications will be real-time coastal nowcasting and forecasting, and long-term coastal erosion analysis and prediction. The specific problems addressed are as follows. In the first application, the focus will be on real-time nowcasting and forecasting of coastal conditions. Middleware and a service-oriented implementation will be used to allow new algorithms to be inserted (for example, for beach closings and coliform forecasts), allow more complex models to be used based on resource and time constraints, allow new data streams to be inserted flexibly, and allow new algorithms for analysis and interpretation to operate on data produced by the forecasting/nowcasting models. In the second application, advanced models will be developed for long-term coastal changes and erosion patterns, allowing larger-scale, distributed, and flexible data analysis. Implementation and evaluation will be in the context of the Great Lakes Observing System (GLOS) and will be done jointly with the National Oceanic and Atmospheric Administration (NOAA). This is an excellent opportunity to carry out realistic design, deployment, and evaluation of the cyberinfrastructure component, and also to impact the long-term design and operation of a real environmental observation system. This project will be a joint effort between The Ohio State University (OSU) and NOAA. The OSU team includes two computer science researchers, Gagan Agrawal (grid middleware systems) and Hakan Ferhatosmanoglu (databases and data analysis), and two environmental researchers, Keith Bedford (environmental modeling) and Ron Li (geospatial data analysis and remote sensing).
The NOAA collaborators include Dr. Frank Aikman, NOAA-National Ocean Service (NOS), and Dr. David Schwab, NOAA-Great Lakes Environmental Research Lab (GLERL).
|
0.927 |
2008 — 2012 |
Ferhatosmanoglu, Hakan; Wang, Yusu (co-PI); Li, Chenglong (co-PI) |
N/A |
Similarity-Based Indexing and Integration of Protein Sequence and Structure Databases @ Ohio State University Research Foundation -Do Not Use
The Ohio State University is awarded a grant to develop database indexing and similarity search technologies to manage, analyze, and integrate protein sequence and structure databases. Searching for similar sequences and structures in genomic and proteomic databases is a fundamental task in bioinformatics. As the size of the available data increases rapidly, it is essential to build indexing schemes so that integrated maintenance and querying of both sequence and structure data can be achieved effectively. To address this challenge, this project uses a unified theme for both types of data: extracting key features and mapping them into compact feature vector spaces to facilitate construction of integrated index structures with sensitive, accurate, and efficient querying capabilities. For the sequence data, the project will develop novel feature extraction methods that involve physicochemical properties of the amino acids and detect low levels of similarity. For the structural data, the project will develop methods to capture local structural motifs using contact maps and spatial motifs. In both cases, compact representations of the features will be constructed, as well as efficient structures to index them. The approach incorporates biochemical properties of molecules into feature extraction to discover functional sites of proteins and to return biologically relevant query results. Finally, based on the unified feature representation and indexing framework, the project will develop methods to integrate sequence and structure data effectively at various levels. A holistic approach combining sequence and structure data would help to overcome the limitations of each and provide more accurate query results. The results of the project will benefit a wide range of application areas in the natural and health sciences, including comparative and functional genomics, protein modeling and design, drug development, and preventative and personalized medicine.
Software developed in this project will facilitate large-scale genome-wide research projects which require iterative and interactive querying of available sequence and structure databases. The novel representations and sensitive motif extraction methods developed are also applicable to biological data visualization, classification, and multiple alignment problems. The software and the results of this project will be available at the website: http://bio.cse.ohio-state.edu.
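The idea of mapping sequences into a compact feature vector space and searching by similarity can be sketched as follows. This is only an illustration under stated assumptions, not the project's method: the four-way grouping of amino acids into coarse physicochemical classes is a simplified textbook-style grouping chosen here for brevity, the feature is a normalized count of adjacent class pairs, and the search is a plain linear scan rather than an index structure. All names (`GROUPS`, `feature_vector`, `nearest`) are hypothetical.

```python
import math
from itertools import product

# Assumed coarse physicochemical classes (illustrative, not from the project).
GROUPS = {
    "h": "AVLIMFWPG",  # hydrophobic / nonpolar
    "p": "STNQCY",     # polar, uncharged
    "+": "KRH",        # positively charged
    "-": "DE",         # negatively charged
}
RESIDUE_TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}

# One dimension per ordered pair of classes: 4 * 4 = 16 dimensions.
PAIRS = ["".join(p) for p in product(GROUPS, repeat=2)]
PAIR_INDEX = {pair: i for i, pair in enumerate(PAIRS)}


def feature_vector(sequence: str) -> list[float]:
    """Map a protein sequence to normalized counts of adjacent class pairs."""
    reduced = [RESIDUE_TO_GROUP[aa] for aa in sequence.upper()
               if aa in RESIDUE_TO_GROUP]      # skip unknown symbols
    vec = [0.0] * len(PAIRS)
    for a, b in zip(reduced, reduced[1:]):
        vec[PAIR_INDEX[a + b]] += 1.0
    total = sum(vec)
    return [v / total for v in vec] if total else vec


def nearest(query: str, database: dict[str, str]) -> str:
    """Return the name of the database sequence closest to the query
    in Euclidean distance between feature vectors (linear scan)."""
    q = feature_vector(query)
    return min(database,
               key=lambda name: math.dist(q, feature_vector(database[name])))


if __name__ == "__main__":
    db = {"mostly_hydrophobic": "AVLIAVLI", "mostly_charged": "KDEKRDEK"}
    print(nearest("LLIVAAVL", db))
```

In a real system the fixed-length vectors would be stored in a multi-dimensional index so that queries avoid scanning every entry; the linear scan here only keeps the sketch short.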
|
0.927 |