2008 — 2012 |
Su, Andrew I |
R01 Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
BioGPS: Extensible Web 2.0 Gene Portal For Structured and Unstructured Annotation @ Novartis Inst For Functional Genomics
DESCRIPTION (provided by applicant): Gene portals have become essential resources for modern biological research. Technologies for high-throughput experimentation are now commonly used in biomedical research, and scientists are often faced with evaluating many candidate genes that are unfamiliar to them. These researchers often turn to gene portals to summarize and curate known annotation for their genes of interest from public databases, enabling them to evaluate genes quickly and formulate hypotheses for follow-up testing. Despite their current utility, gene portals still offer many opportunities for improvement to better serve the scientific community. Here, we propose the construction of a new gene portal called BioGPS (Biology Gene Portal Services), which is based on recent concepts in web design and online collaboration. These principles, commonly referred to by the moniker "Web 2.0", were largely defined based on an analysis of successful web sites, including Wikipedia, Google, and Amazon. In the context of BioGPS, this proposal focuses on two key elements of Web 2.0. First, BioGPS will place heavy emphasis on maximizing usability. Development of BioGPS will draw on concepts and techniques from the computer science discipline of Human-Computer Interaction (HCI). These efforts will include continual usability testing and the use of techniques that maximize user interactivity. Second, BioGPS will emphasize extensibility. Rather than developing a web site used by a small community of web users, we will design a web platform that serves the needs of both web users and bioinformatics scientists. Within BioGPS, extensibility applies to several distinct domains. BioGPS will be extensible in terms of data, allowing scientists to customize and contribute their own data sets to the gene portal. 
BioGPS will also be extensible in terms of knowledge, enabling scientists to collaboratively share and edit free-text gene annotation through a gene wiki system. Finally, BioGPS will be extensible in terms of application development, allowing other bioinformatics programmers to extend its functionality and link other bioinformatics analysis tools. In summary, the BioGPS gene portal will provide a platform for networks of users to synergistically leverage community knowledge and effort, allowing researchers across biological disciplines to efficiently translate high-throughput data to testable hypotheses. The development of the BioGPS gene portal will allow communities of biological researchers to effectively share knowledge and effort devoted to gene annotation. Generating a detailed and rich view of the function of every gene in the human genome will benefit our understanding of basic biological mechanisms, as well as the role of individual genes in human health.
|
1 |
2010 — 2013 |
De Alfaro, Luca; Su, Andrew I |
R01 |
The Gene Wiki: Community Intelligence Applied to Gene Annotation @ Novartis Inst For Functional Genomics
DESCRIPTION (provided by applicant): Annotating the function of all genes in the human genome is a formidable task, and the biological community's collective progress to date represents only the earliest beginnings of this process. Of all the entries in the Entrez Gene database, almost 80% have five or fewer linked references in PubMed, and almost 50% have no linked references. Addressing this challenge requires not only continued effort, but also new models of functional annotation. Currently, the process of systematically annotating gene function primarily involves large-scale efforts by the model organism community and genome annotation centers. These annotation pipelines typically utilize a staff of curators to manually or semi-manually review the biomedical literature. Although well-trained and productive, the curation community is small relative to the scale of knowledge being produced, resulting in a gap between curated data and published knowledge. This proposal describes an effort called the Gene Wiki, an initiative designed to apply the concept of "community intelligence" to gene annotation. The Gene Wiki invites and empowers the entire community to participate directly in the gene annotation process. The resulting community-reviewed gene-specific review articles serve as a complementary resource to the traditional curator-reviewed databases. The pilot project creating the Gene Wiki was quite successful, attracting a critical mass of readers, editors, and content. This proposal extends the Gene Wiki along three specific aims. First, new content will be added to make the Gene Wiki pages more information-rich, and two mechanisms for updating content will be created to ensure that the Gene Wiki stays timely. These steps will ensure that the critical mass of users will be maintained and enlarged in the future. 
Second, the Gene Wiki will be integrated with WikiTrust, a system that enables readers to quickly and visually evaluate the trustworthiness of Gene Wiki content. These reliability metrics will be based on systematic analysis of the editing history of each Gene Wiki article. Third, the unstructured text in the Gene Wiki will be translated to structured knowledge for downstream data mining. This aim will be achieved by collaborating with the traditional curator community and with the biomedical ontology community. Successful completion of these three specific aims will greatly enhance the utility of the Gene Wiki to the scientific community, and also serve as an illustration of the power of community intelligence applied to biomedical research. PUBLIC HEALTH RELEVANCE: The Gene Wiki is an initiative to adapt the principle of community intelligence to the goal of understanding the function of human genes. Successful completion of this work will result in a more complete and up-to-date understanding of how specific genes affect biological systems and human health.
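To make the idea of reliability metrics derived from editing history more concrete, here is a minimal Python sketch. It is not WikiTrust's actual algorithm (which derives text trust from author reputation); it assumes a deliberately simplified rule in which each word gains one unit of trust every time it survives into a later revision, while newly inserted words start at zero. The function name `word_trust` and the sample revisions are hypothetical.

```python
from difflib import SequenceMatcher


def word_trust(revisions):
    """Return (tokens, trust) for the latest revision of an article.

    Simplified assumption (not WikiTrust's real model): a word that
    survives from one revision to the next gains one unit of trust,
    while newly inserted words start at zero.
    """
    tokens = revisions[0].split()
    trust = [0] * len(tokens)
    for text in revisions[1:]:
        new_tokens = text.split()
        new_trust = [0] * len(new_tokens)  # default: the word is new
        matcher = SequenceMatcher(a=tokens, b=new_tokens, autojunk=False)
        # Each matching block is a run of words carried over unchanged.
        for block in matcher.get_matching_blocks():
            for k in range(block.size):
                new_trust[block.b + k] = trust[block.a + k] + 1
        tokens, trust = new_tokens, new_trust
    return tokens, trust


if __name__ == "__main__":
    history = [
        "CDK2 regulates the cell cycle",
        "CDK2 regulates the G1/S transition of the cell cycle",
        "CDK2 kinase regulates the G1/S transition of the cell cycle",
    ]
    for word, score in zip(*word_trust(history)):
        print(f"{score}  {word}")
```

In this toy history, words present since the first revision end with the highest score while the most recent insertion ("kinase") scores zero; a reader-facing tool could map such scores to a visual cue like background shading.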
|
1 |
2013 — 2016 |
Su, Andrew I |
R01 |
BioGPS: A Crowdsourced Portal For Gene-Centric Online Resources @ Scripps Research Institute
DESCRIPTION (provided by applicant): This proposal describes the continued maintenance and development of BioGPS (http://biogps.org), a gene annotation portal based on the principle of crowdsourcing. Genome-scale science is becoming increasingly common for performing unbiased surveys of gene function. Technologies exist for high-throughput interrogation of genetic variation, gene expression, protein expression, protein modifications, epigenetic variation, and other molecular features. Using these approaches, scientists can rapidly identify a set of candidate genes that are relevant to their biological system of interest. However, understanding current knowledge of those genes remains a significant challenge. There are hundreds if not thousands of online sites with gene annotation information, each holding a subset of information that partially overlaps with the other sites. BioGPS was created to simplify navigation of this landscape of gene annotation resources. BioGPS does this by promoting two key principles: community extensibility and user customizability. Community extensibility means that we empower any user in the BioGPS user community to add new content to BioGPS. User customizability means that we allow each individual user to tailor BioGPS to their individual needs. Building on these principles, BioGPS has evolved into a highly functional gene portal that is widely used in the genetics and genomics community. This proposal builds on the previous project period's success. Because BioGPS is a web application built on crowdsourcing, this proposal focuses on two mechanisms for continuing its growth and its positive impact on biomedical research. First, we will pursue several strategies for attracting new users and retaining existing users. These strategies include implementing new features to improve usability, developing focused outreach efforts, and introducing social networking dynamics. 
The second mechanism pursued in this proposal is to add new ways for users to contribute to the BioGPS community. Specifically, we will create a new set of features for analyzing and sharing gene lists to complement our existing emphasis on single genes. BioGPS is a useful tool for biomedical research. Successful completion of this proposal will result in continued growth of both the user community and the feature set available through BioGPS.
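Community extensibility and user customizability can be illustrated with a small Python sketch. It assumes, hypothetically, that each community-contributed resource is registered as a URL template containing placeholders such as `{{Symbol}}` or `{{EntrezGene}}`, and that each user's layout is simply a list of chosen resource names. The placeholder syntax, field names, and functions are illustrative assumptions, not a documented BioGPS plugin format.

```python
import re
from urllib.parse import quote

# Hypothetical placeholder syntax: {{FieldName}}
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render_url(template, gene):
    """Fill each {{Field}} placeholder with the URL-escaped gene field."""

    def substitute(match):
        field = match.group(1)
        if field not in gene:
            raise KeyError(f"gene record has no field {field!r}")
        return quote(str(gene[field]), safe="")

    return PLACEHOLDER.sub(substitute, template)


def render_layout(layout, registry, gene):
    """Community extensibility: anyone may add entries to `registry`.
    User customizability: each user's `layout` picks which ones to show."""
    return {name: render_url(registry[name], gene) for name in layout}


if __name__ == "__main__":
    registry = {
        "NCBI Gene": "https://www.ncbi.nlm.nih.gov/gene/{{EntrezGene}}",
        "GeneCards": "https://www.genecards.org/cgi-bin/carddisp.pl?gene={{Symbol}}",
    }
    my_layout = ["NCBI Gene", "GeneCards"]
    cdk2 = {"Symbol": "CDK2", "EntrezGene": 1017}
    for name, url in render_layout(my_layout, registry, cdk2).items():
        print(name, url)
```

Escaping each substituted value keeps an unusual gene field (for example, one containing a space) from breaking the structure of the generated URL.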
|
1 |
2014 — 2017 |
Lindsey, Merry L; Ping, Peipei; Su, Andrew I; Watson, Karol E |
U54 Activity Code Description: To support any part of the full range of research and development from very basic to clinical; may involve ancillary supportive activities such as protracted patient care necessary to the primary research or R&D effort. The spectrum of activities comprises a multidisciplinary attack on a specific disease entity or biomedical problem area. These differ from program project in that they are usually developed in response to an announcement of the programmatic needs of an Institute or Division and subsequently receive continuous attention from its staff. Centers may also serve as regional or national resources for special research purposes, with funding component staff helping to identify appropriate priority needs. |
A Community Effort to Translate Protein Data to Knowledge: An Integrated Platform @ University of California Los Angeles
DESCRIPTION (provided by applicant): The inception of the BD2K Initiative is a testament to the foresight of NIH and our community. Clearly, the future of biomedicine rests on our collective ability to transform Big Data into intelligible scientific facts. In line with the BD2K objectives, our goal is to revolutionize how we address the universal challenge of discerning meaning from unruly data. Capitalizing on our investigators' complementary strengths in computational biology and cardiovascular medicine, we will present a fusion of cutting-edge innovations grounded in a cardiovascular research focus, encompassing: (i) on-the-cloud data processing, (ii) crowdsourcing and text-mining data annotation, (iii) protein spatiotemporal dynamics, (iv) multi-omic integration, and (v) multiscale clinical data modeling. Drawing on our decade of experience in creating and refining bioinformatics tools, we propose to amalgamate established Big Data resources into a generalizable model for data annotation and collaborative research, through a new query system and cloud infrastructure for accessing multiple omics repositories, and through computationally supported crowdsourcing initiatives for mining the biomedical literature. We propose to interweave diverse data types to reveal biological networks that coalesce from molecular entities at multiple scales, through machine learning methods for structuring molecular data and defining relationships with drugs and diseases, and through novel algorithms for on-the-cloud integration and pathway visualization of multi-dimensional molecular data. Moreover, we propose to develop advanced modeling tools to resolve protein dynamics and spatiotemporal molecular mechanisms, through mechanistic modeling of protein properties and 3D protein expression maps, and through Bayesian algorithms that correlate patient phenotypes, health histories, and multi-scale molecular profiles. 
The utility and customizability of our tools for the broader research population will be demonstrated using three archetypical workflows that enable annotation of large lists of genes, transcripts, proteins, or metabolites; powerful analysis of complex protein datasets acquired over time; and seamless aggregation of diverse molecular, textual, and literature data. These workflows will be rigorously validated using data from two significant clinical cohorts, the Jackson Heart Study and the Healthy Elderly Active Longevity (HEAL; Wellderly) cohort. In parallel, a multifaceted strategy will be implemented to educate and train biomedical investigators and to engage the public in promoting the overall BD2K initiative. We are convinced that a community-driven BD2K initiative will best realize its scientific potential and transform the research culture in a sustainable manner, with lasting success beyond the current funding period.
|
0.901 |
2014 — 2017 |
Lindsey, Merry L; Ping, Peipei; Su, Andrew I; Watson, Karol E |
U54 |
Administrative Core @ University of California Los Angeles
Highly synergistic collaborative efforts among determined, skilled, and professional investigators often produce novel insights greater than those that could have been generated by a single effort. It is in this regard that the Administration Component will work fervently to foster collaboration and promote multi-institutional productivity by providing integral organization, management, and support for the proposed multilayered BD2K Initiative Center. Specifically, the Administration Component plays a fundamental role in the success of the three main Components of the Center, namely Data Science Research (DSR), Training, and Consortium Activities, as well as in collaboration among all BD2K Centers, the NIH, and the greater biomedical scientific community. The Administration Component aims to support the mission of the NIH BD2K Initiative by providing administrative assistance through the following Specific Aims. Within our own Center, we plan: 1a. To support the Center and the mission of the NIH BD2K Initiative in providing innovative research and effective engineering of software and computational tools, with a central theme of organizing, managing, and processing Big Data for biomedical research. 1b. To coordinate meetings, workshops, and seminars to promote collaboration among the various components of the Center and to nurture productivity across Components. 1c. To manage the daily activities of the Center, including budgeting, travel reservations, website maintenance, and other critical administrative tasks. Beyond our own Center, we aim: 2a. To promote the dissemination of software, tools, and knowledge by generating deeper interest in Big Data analysis, modeling, and literacy, thereby achieving a substantial impact on the scientific community. 2b. To foster a working relationship between the community and scientific researchers through the effective dissemination of the bioinformatics knowledge, tools, data, and products generated by the Center, thus propelling innovation and advancement in Big Data research. Under the leadership of NIH, we aim to support the scientific and educational goals of the NIH BD2K Initiative in overcoming challenges and revitalizing the integration of Big Data science and biomedical research.
|
0.901 |
2014 — 2017 |
Lindsey, Merry L; Ping, Peipei; Su, Andrew I; Watson, Karol E |
U54 |
BD2K Consortium Activities @ University of California Los Angeles
The ultimate goal of the BD2K Initiative, and therefore of each BD2K Center, is to enable the biomedical research community to use the various types of Big Data for research. Inherent in the success of each BD2K Center is not only the engineering of novel tools and platforms for handling and analyzing biomedical Big Data, but also the use of each Center's diverse expertise and specialties to connect with the scientific community and the general public, disseminating and building enthusiasm for Big Data research. This concept underscores the supreme necessity for a BD2K Center Consortium that is united under one mission with global influence. The joint efforts of all the NIH BD2K Centers as a Consortium will synergistically empower the entire community. We envision that these efforts may be collaboratively organized to produce both a broad spectrum of contemporary data science software tools for addressing targeted challenges in biomedical research and a collection of training resources for fulfilling educational needs at multiple levels. Our Center will fully serve and support the NIH BD2K Initiative. Accordingly, we will structure our BD2K Center Consortium Activities to achieve four specific aims: 1) Our Center will fully abide by the governance of the BD2K Center Consortium through the leadership of the Steering Committee (SC), the NIH BD2K Project Team (BPT), and recommendations from the Independent Experts Committee (IEC). 
We will actively collaborate on, organize, and participate in all BD2K Center Consortium meetings and SC meetings, and will visit other Centers as instructed by NIH; 2) Our Center will serve the BD2K Center Consortium by assisting the NIH BPT in establishing policies and guidelines to transform the current research culture, and in encouraging Big Data standardization to facilitate data sharing and interoperability in biomedical research; 3) Our Center will proactively collaborate with other NIH BD2K Centers by combining workforces and resources, supporting the development of the DSR and Training components across the broader BD2K Consortium; and 4) Our Center will unreservedly commit to fostering continued public recognition of, and investment in, data science, ensuring its vitality in biomedical research and sustaining the BD2K Initiative beyond the proposed NIH funding period.
|
0.901 |
2014 — 2017 |
Lindsey, Merry L; Ping, Peipei; Su, Andrew I; Watson, Karol E |
U54 |
Data Science Research @ University of California Los Angeles
A critical challenge in Big Data science is the overall lack of data analysis platforms for transforming Big Data into biological knowledge. To address this challenge, we propose a set of interconnected computational tools capable of organizing and analyzing heterogeneous data to support combined inquiries and to deconvolute complex relationships embedded within large-scale data. We will demonstrate their utility with a cardiovascular-centric platform that is easily generalizable to similar efforts in other disciplines. Our Center has designed a federated data architecture that builds on existing resources, each substantiated by a solid and growing user base, together with innovations that extend their functionality. Novel crowdsourcing and text-mining methods will extract the wealth of untapped knowledge embedded in the biomedical literature, and novel in-depth proteomics analytical tools will elucidate dynamic protein features in unprecedented detail. A key strength of our platform will be its rigorous validation using clinical data from the Jackson Heart Study and the Healthy Elderly Active Longevity (HEAL; Wellderly) cohorts. Our proposal includes nine scientific aims that address three main focus areas: (i) we will build a new model platform that amalgamates community-supported Big Data resources, enabling data annotation and collaborative analyses; (ii) we will integrate molecular data with drug and disease information, both structured and unstructured, for knowledge aggregation; and (iii) we will create on-the-cloud analytical and modeling tools to power in-depth protein discoveries. 
Specifically, we will create a novel distributed query system and cloud-based infrastructure that is capable of providing unified access to multi-omics datasets; we will develop computational and crowdsourcing methods to systematically define relationships between genes, proteins, diseases, and drugs from the literature, emphasizing cardiovascular medicine; we will rally community participation and promote awareness of collaborative research through outreach and educational games; we will create a platform to analyze and visualize multi-scale pathway models of genes, proteins, and metabolites; we will develop tools and algorithms to mechanistically model spatiotemporal protein networks in organelles and to predict higher-level physiological phenotypes; and we will correlate individual phenotypes, health histories, and multi-scale molecular profiles to examine cardiovascular disease mechanisms. These tools will be implemented, delivered, and executed on the cloud infrastructure to minimize the computational power required of users.
|
0.901 |
2014 — 2017 |
Su, Andrew I |
R01 |
Gene Wiki: Expanding the Ecosystem of Community-Intelligence Resources @ Scripps Research Institute
DESCRIPTION (provided by applicant): This proposal describes the continued maintenance and development of the Gene Wiki, the goal of which is to create a continuously updated, community-reviewed, and collaboratively written review article for every human gene. The Gene Wiki was created directly within Wikipedia as an informal collection of 10,646 gene-specific articles. In the first funding period, the infrastructure to keep Gene Wiki infoboxes in sync with the source databases in the genomics community was developed. Next, methods to assess and quantify the trustworthiness of each word of a Gene Wiki article were developed and implemented in a system called WikiTrust. Finally, simple text mining applied to the Gene Wiki identified thousands of novel gene annotations. During the next project period, the Gene Wiki is poised to make further strides. First, the scope of the Gene Wiki will be expanded to also include review articles on diseases and drugs. Thousands of articles will be either created or maintained through this initiative, with a particular emphasis on rare diseases. Second, a dedicated outreach component will ensure that the community of editors continues to grow. This outreach effort will engage both faculty members who are experts on specific genes of interest and classroom instructors at all levels who want to design curricula based on the Gene Wiki for a class project. Third, a Centralized Model Organism Database will be constructed in the Wikidata environment, which will serve as a clearinghouse of microbial gene and genome annotation data. Fourth, an entirely new crowdsourcing application will be created that taps into crowds of patient-aligned individuals and their desire to advance research. These individuals will form a novel crowd that will be applied to the challenge of systematically annotating the biomedical literature. In summary, the Gene Wiki is a useful tool for biomedical research. 
Successful completion of this proposal will result in more efficient knowledge management and dissemination through crowdsourcing.
|
1 |
2014 — 2017 |
Lindsey, Merry L; Ping, Peipei; Su, Andrew I; Watson, Karol E |
U54 |
Training @ University of California Los Angeles
Ever-growing amounts of diverse biomedical data are being generated. Extracting meaningful information from these datasets has relied on the efforts of informaticians, who are extensively trained in computer science but have little to no training in biology. Conversely, biologists are generally not proficient at analyzing, annotating, and translating their large datasets into valuable biomedical insights. In addition, an overall lack of public understanding of the importance of Big Data science has dampened enthusiasm for advancing data science in the biomedical field. To bridge the gaps among data generation, interpretation, and awareness, our training program will provide critical data science education to current biomedical researchers, expand the data science workforce in the biomedical field, and elicit broad public recognition of data science. Accordingly, we have engineered an integrated training program with four specific aims: 1) To empower current biomedical researchers to manage and interpret Big Data by gaining proficiency with data science software tools; 2) To use the training component as an interactive testing ground for software packages developed by the Data Science Research (DSR) component, where user critiques and feedback will refine software tools to a professional grade, helping the community capture the full value of Big Data; 3) To cultivate a new generation of developers with transdisciplinary expertise in both computational biology and biomedical informatics; and 4) To heighten public awareness of and enthusiasm for the substantial opportunities embedded within computational biology, which has the potential to transform biomedical research and medicine. To achieve these aims, we have constructed three trainee-oriented modules: a Biomedical Researcher/User-Oriented Module, a Big Data Science Researcher-Oriented Module, and a General Public-Oriented Module. 
A trans-institutional collaboration has been organized (i.e., UCLA, TSRI, UMMC, and EMBL-EBI), and all components have demonstrated outstanding track records in education. This collaboration will ensure successful execution of the training component substantiated by distinguished experts and meritorious educators from a wide breadth of disciplines, spanning -omics, bioinformatics, and computational science.
|
0.901 |
2018 — 2020 |
Su, Andrew I; Wu, Chunlei |
R01 |
BioGPS, BioThings and BioReel: Illuminating Dark Data For Biomedical Research @ Scripps Research Institute
The overall goal of this project is to promote the accessibility and dissemination of biomedical information so that the research community can better leverage existing knowledge. Science is most efficient when hypotheses are based on the entirety of knowledge available to date. Unfortunately, up-to-date and comprehensive access to relevant knowledge is rarely achieved. This proposal puts a particular emphasis on illuminating biomedical "dark data." By analogy to the dark matter that is unaccounted for in the universe, dark data is defined by being unseen or underutilized by the scientific community. In this project, we will continue to strengthen our widely used applications BioGPS and MyGene.info, and also develop two new applications: BioThings and BioReel. Collectively, these applications aim to make dark data resources Findable, Accessible, Interoperable, and Reusable (FAIR). BioGPS and BioReel are designed for non-computational scientists. BioGPS (http://biogps.org) is a gene portal for aggregating information on human genes and proteins. It illuminates dark data by creating a simple platform to discover and access gene-centric websites. BioGPS users can benefit each other by sharing the specific resources they have discovered, and how they use or like them. BioReel will be developed as a tool to periodically monitor the relevant resources for researchers and notify them when the knowledge about their genes of interest has been updated (e.g., new datasets become available, or a gene is annotated in a new pathway). MyGene.info and BioThings are designed for bioinformatics developers, who often face source data that is fragmented in both content and format. Almost every bioinformatician must repeat a significant amount of the same data-wrangling work. We developed MyGene.info to integrate gene and protein annotation data into a simple, high-performance web Application Programming Interface (API). 
It illuminates dark data on gene and protein annotations by pre-integrating over 200 annotation types in a standardized format. In this proposal, we will continue to expand MyGene.info to include additional highly requested annotations, from both a major data repository and smaller domain-specific data sources. In addition, we will generalize the infrastructure and software pattern underlying the MyGene.info project into a generic API framework called the "BioThings SDK". Two new APIs will be built using this framework, focusing on drugs/chemicals and diseases respectively, where data fragmentation across resources is equally a problem.
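The core idea of pre-integrating annotations from fragmented sources into one standardized document per gene can be sketched in a few lines of Python. This is a hypothetical illustration, not MyGene.info or BioThings SDK code: it assumes each source has already been parsed into records that share an `_id` (here an NCBI Gene ID), and it nests each source's fields under the source name so that field names from different sources cannot collide.

```python
import json
from collections import defaultdict


def merge_annotations(sources):
    """Merge per-source records into a single document per gene ID.

    `sources` maps a source name to a list of parsed records; each record
    carries an "_id" shared across sources (assumed to be an NCBI Gene ID).
    """
    merged = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            gene_id = record["_id"]
            doc = merged[gene_id]
            doc["_id"] = gene_id
            # Namespacing by source keeps field names from colliding.
            doc[source_name] = {k: v for k, v in record.items() if k != "_id"}
    return dict(merged)


if __name__ == "__main__":
    sources = {
        "entrez": [{"_id": "1017", "symbol": "CDK2", "taxid": 9606}],
        "uniprot": [{"_id": "1017", "Swiss-Prot": "P24941"}],
    }
    print(json.dumps(merge_annotations(sources)["1017"], indent=2))
```

Once the merge has been done ahead of time, serving everything known about one gene becomes a single key lookup; a production service would additionally need indexes for querying by other fields.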
|
1 |
2018 — 2021 |
Su, Andrew I |
U19 Activity Code Description: To support a research program of multiple projects directed toward a specific major objective, basic theme or program goal, requiring a broadly based, multidisciplinary and often long-term approach. A cooperative agreement research program generally involves the organized efforts of large groups, members of which are conducting research projects designed to elucidate the various aspects of a specific objective. Substantial Federal programmatic staff involvement is intended to assist investigators during performance of the research activities, as defined in the terms and conditions of award. The investigators have primary authorities and responsibilities to define research objectives and approaches, and to plan, conduct, analyze, and publish results, interpretations and conclusions of their studies. Each research project is usually under the leadership of an established investigator in an area representing his/her special interest and competencies. Each project supported through this mechanism should contribute to or be directly related to the common theme of the total research effort. The award can provide support for certain basic shared resources, including clinical components, which facilitate the total research effort. These scientifically meritorious projects should demonstrate an essential element of unity and interdependence. |
Consortium For Viral Systems Biology Data Management and Bioinformatics Core @ Scripps Research Institute
Project Summary / Abstract Our Center seeks to use high-throughput profiling technologies to develop predictive models of Lassa fever and Ebola virus disease at a systems biology level. The success of this mission is dependent on the unique cohorts and innovative profiling methods, the combination of which will result in a unique and powerful data set. In this context, the overall mission of the Data Management and Bioinformatics Core (DMBC) is to ensure that the Center utilizes best practices for data provenance, analysis, management, and dissemination throughout the data lifecycle. To accomplish this mission, we will promote the robust collection and analysis of the primary data, develop a framework for reproducible workflows that can be used across the Center, and maximize dissemination and reuse of Center-generated data and tools. Our mission will be accomplished through the completion of three Specific Aims. Aim 1 focuses on accurately capturing clinical and laboratory data, and then reproducibly processing them using robust workflow tools. This aim describes a key, foundational function that is critical for the success of the Center. Aim 2 focuses on dissemination of Center-generated resources according to the FAIR principles: Findable, Accessible, Interoperable, and Reusable. This work will ensure that the impact of our Center extends beyond our team's stated analysis plan, and it leverages our extensive work building tools for effective data and software dissemination. Aim 3 focuses on directly engaging the community in collaborations with a variety of groups with complementary skills and assets. This work leverages the principle that collaborative platforms enable discoveries that would not be possible through individual investigations. Overall, we believe that this plan for the DMBC offers a broad and solid foundation on which our Center's data generation, data modeling, and biological discovery activities can build.
|
1 |
2018 — 2020 |
Su, Andrew I |
R01Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Gene Wiki: a Community-Maintained Knowledge Base of Biomedical Information @ Scripps Research Institute
ABSTRACT The biomedical research enterprise is incredibly productive, generating new knowledge at an unprecedented pace. However, as a community, we do a relatively poor job of organizing and managing that knowledge so that it is maximally useful for the design and interpretation of other experiments. Scientific research is most efficient when new hypotheses are informed by the totality of past findings, and when scientific knowledge is Findable, Accessible, Interoperable, and Reusable (FAIR). Unfortunately, the vast majority of research is published only in free-text, unstructured journal articles, rendering the findings very difficult to integrate and compute upon. This proposal describes the use of crowdsourcing to address this challenge in biomedical knowledge management. Specifically, it proposes to leverage Wikidata, which aims to create a comprehensive knowledge base that both humans and computers can read and edit. Wikidata is run by the same organization that runs Wikipedia, and like its sister project, it employs the principle of crowdsourcing to tackle a grand challenge in information management. Both Wikipedia and Wikidata invite and empower the community at large to collaboratively add, edit, and refine content. In this proposal, we continue our work to create the world's largest open and FAIR knowledge base of biomedical information within Wikidata. This proposal includes three Specific Aims. First, we will improve both the quantity and quality of biomedical information in Wikidata. Quantity will be increased by loading several key biomedical vocabularies and ontologies, and data quality will be made more rigorous by the introduction of formal and computable data models. Second, we will facilitate and incentivize contributions of data by third-party data contributors.
This Aim will be achieved by extending our Python programming library for reading from and writing to Wikidata, and by creating automated reports that notify resource providers when new relevant content is added or edited. Third, we will seek to encourage contributions from domain experts using targeted incentives. Specifically, this aim will develop interfaces to Wikidata that provide integrated data reports that are otherwise unavailable, as well as extend the Gene Wiki Reviews series of invited reviews, which rewards contributions with traditional metrics of academic achievement. Finally, underlying these three Specific Aims will be a Driving Biological Project focusing on infectious disease research, which will ensure that the tools and resources developed have practical benefit to discovery-oriented research projects.
|
1 |
2020 |
Su, Andrew I |
OT2Activity Code Description: A single-component research award that is not a grant, cooperative agreement or contract using Other Transaction Authorities |
BioThings Explorer: a Platform for Distributed Knowledge Integration Across Biomedical APIs @ Scripps Research Institute |
1 |
2020 — 2021 |
Su, Andrew I |
R01Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Compound Repositioning For Alzheimer's Disease Using Knowledge Graphs, Insurance Claims Data, and Gene Expression Complementarity @ Scripps Research Institute
PROJECT SUMMARY This proposal focuses on the challenge of identifying drug repositioning candidates for Alzheimer's Disease. The foundation of this work is the ReFRAME library, a set of ~13,000 compounds that includes nearly all small molecules that have been FDA-approved, reached clinical development, or undergone significant preclinical profiling. The ReFRAME library is being actively screened against a diverse cross-section of in vitro assays. This proposal pursues three distinct strategies for identifying repositioning candidates among the ReFRAME collection. First, we will create and mine a large and heterogeneous biomedical knowledge graph. We will use machine learning methods to identify repositioning candidates based on properties of the knowledge graph surrounding and joining each drug and disease. Second, we will mine a massive data set of insurance claims for associations between drug use and the incidence or severity of Alzheimer's Disease. Containing almost 7 billion medical claims and over 2 billion pharmacy claims, this data set represents the largest source of claims data available. Third, we will use the concept of gene expression complementarity to identify repositioning candidates. We will generate a gene expression signature for every ReFRAME compound in three cell lines relevant to Alzheimer's Disease, and we will screen for compounds that produce signatures that appear to reverse the gene expression changes seen in Alzheimer's Disease. After assembling the repositioning candidates identified through all three of these methods, we will prioritize up to 100 compounds (or compound combinations) for further characterization and validation. These follow-up experiments will initially investigate the activity of these compounds in five cell-based assays to establish a hypothesis about their mechanism of action in Alzheimer's Disease.
Secondary follow-up experiments may include validation in some combination of in vitro (including hiPSC-derived cerebrocortical neurons and/or organoids) and in vivo systems. We believe that the multifaceted approach described in this proposal offers the best possible chance of successfully identifying Alzheimer's Disease (AD) repositioning candidates. Moreover, this work will create methods and resources that will be useful to the broader scientific community, both for Alzheimer's Disease and for other disease areas.
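The gene expression complementarity strategy in this project can be illustrated with a minimal sketch. It assumes each signature is a mapping from gene to log fold change (disease vs. control, or treated vs. untreated), and it uses negative Pearson correlation over shared genes as the reversal score. This scoring choice is an illustrative assumption; the proposal does not specify a scoring method. The function names, gene names, and compound names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat signature carries no directional information
    return cov / (sx * sy)


def rank_by_reversal(disease_sig, compound_sigs):
    """Rank compounds by how strongly they reverse a disease signature.

    disease_sig: dict of gene -> log fold change (disease vs. control).
    compound_sigs: dict of compound -> (dict of gene -> log fold change).
    Score = -correlation over shared genes (hypothetical scoring rule),
    so a higher score means a stronger reversal.
    """
    scores = []
    for name, sig in compound_sigs.items():
        genes = sorted(set(disease_sig) & set(sig))
        if len(genes) < 2:
            continue  # too few shared genes to compute a correlation
        d = [disease_sig[g] for g in genes]
        c = [sig[g] for g in genes]
        scores.append((name, -pearson(d, c)))
    # Most strongly reversing compounds first
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: cmpA mirrors the disease signature, cmpB mimics it.
disease = {"G1": 2.0, "G2": -1.0, "G3": 0.5}
compounds = {
    "cmpA": {"G1": -2.0, "G2": 1.0, "G3": -0.5},
    "cmpB": {"G1": 2.0, "G2": -1.0, "G3": 0.5},
}
result = rank_by_reversal(disease, compounds)
```

In this toy example, `cmpA` ranks first with a score near 1.0 because its signature is the exact mirror image of the disease signature, while `cmpB` scores near -1.0 because it reproduces the disease changes. A real screen would likely use a rank-based or enrichment-style statistic over many genes and would account for cell line, dose, and noise.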
|
1 |