2011 — 2015 |
Brown, Steve Flicek, Paul Mallon, Ann-Marie Parkinson, Helen Elizabeth |
U54 Activity Code Description: To support any part of the full range of research and development from very basic to clinical; may involve ancillary supportive activities such as protracted patient care necessary to the primary research or R&D effort. The spectrum of activities comprises a multidisciplinary attack on a specific disease entity or biomedical problem area. These differ from program project in that they are usually developed in response to an announcement of the programmatic needs of an Institute or Division and subsequently receive continuous attention from its staff. Centers may also serve as regional or national resources for special research purposes, with funding component staff helping to identify appropriate priority needs. |
Mouse Phenotyping Informatics Infrastructure - Mpi2 @ European Molecular Biology Laboratory
DESCRIPTION (provided by applicant): The KOMP2 data are a critical resource for biomedical research: they will inform human disease studies, be integrated with existing resources, and be accessed, analyzed and mined by mouse biologists, translational researchers, clinicians and the wider biomedical community. The MPI2 project will support the KOMP2 project by providing the Data Coordination Centre (DCC), which will process the complex phenotypic data and provide access via a web portal. The DCC aims are to collect and store valid data as they appear throughout the project, to provide unified access to these data for specialist and non-specialist users via the web and programmatically, and to support complex queries and statistical analyses. The project has several distinct components and tasks: The Pheno-DCC, which will validate, quality-control and manage data acquired dynamically from centres. This will ensure data are robust and allow progress tracking of data at all stages of processing. Specialist data wranglers will manage this process and will interact with the users of the data to ensure user interfaces support the needs of the varied community who will access these data. The statistical and annotation pipelines and environment, which will analyze raw data and summarize data for each mutant and assay for presentation to users. A supporting core database, the Core Data Archive, which will store all project data, provide programmatic access, push data to external resources such as NCBI and the Jackson Laboratory and, critically, provide data for the user interfaces. A single point of entry web portal hosted at www.knockoutmouse.org, which will present data to users, integrate data from parallel mouse phenotyping projects and provide access to a statistical analysis and query environment. Integration of KOMP2 data via biomedical databases at the EBI, to ensure that the data are widely distributed to mouse biologists and other scientists.
|
0.927 |
2016 — 2021 |
Brown, Steve Flicek, Paul Mallon, Ann-Marie Parkinson, Helen Elizabeth Smedley, Damian |
UM1 Activity Code Description: To support cooperative agreements involving large-scale research activities with complicated structures that cannot be appropriately categorized into an available single component activity code, e.g. clinical networks, research programs or consortium. The components represent a variety of supporting functions and are not independent of each component. Substantial federal programmatic staff involvement is intended to assist investigators during performance of the research activities, as defined in the terms and conditions of the award. The performance period may extend up to seven years but only through the established deviation request process. ICs desiring to use this activity code for programs greater than 5 years must receive OPERA prior approval through the deviation request process. |
Mouse Phenotyping Informatics Infrastructure - Mpi2 @ European Molecular Biology Laboratory
Project summary The Knockout Mouse Phenotyping Project (KOMP) is a critical resource for biomedical research that provides unbiased gene-to-phenotype associations for genes with little or no known function, supplies strains for follow-up mechanistic studies and integrates across resources to provide new systematic insights into the underlying causes of rare and common disease. The MPI2 Consortium will continue to support KOMP2 and IMPC partners by providing data acquisition, analysis, visualisation, quality control and integration of this valuable dataset. Specifically, the MPI2 partners will enhance the Mouse Phenotype portal, improving the site layout and navigation for users and improving the gene, disease, publication and expression data pages so that data are more quickly accessible. We will containerise the infrastructure underlying the web portal. These changes will ease deployment and therefore community re-use of the portal.
|
0.927 |
2017 — 2021 |
Cunningham, Fiona (co-PI) [⬀] Flicek, Paul Parkinson, Helen Elizabeth |
U41 Activity Code Description: To support biotechnology resources available to all qualified investigators without regard to the scientific disciplines or disease orientations of their research activities or specifically directed to a categorical program area. |
Establishing the Gwas Catalog as a Resource For Large-Scale Association Studies @ European Molecular Biology Laboratory
The GWAS Catalog's objective is to summarise GWAS data acquired from scientific publications, and to give the results structure, in order to communicate research findings to a broad scientific community. The Catalog is used by a growing user community of biologists and bioinformaticians worldwide. Over the next five years, the Catalog will continue to provide the most thoroughly curated resource for human variation data, by engaging journals in data recruitment, and by allowing co-submission/data transfer from other resources like dbGaP and the EGA. In order to underpin the Catalog's relevance, a multi-stranded approach combining data generation, infrastructure development and liaison with the Catalog's user community will be adopted. The first Aim for the next five years is to continue to deliver the Catalog as a community resource with high quality content. The curation system will evolve from manual curation towards identification of data for automated extraction and review of submitted metadata, supporting author deposition, and the development of supporting QC processes. In Aim 2, the scope of the Catalog will be broadened to include new GWAS study designs, additional associated data, and emerging technologies. The Catalog's eligibility criteria will ensure alignment with current research and the needs of the user community, but will be monitored and re-evaluated as needed. Building on previous pilots, the focus of Aim 2 will be on the inclusion of targeted array data and other genotyping methods, such as sequencing or imputation from family members. In Aim 3, the Catalog will be delivered as a scalable and sustainable resource for the future, which will allow for an extended scope of data. The development and promotion of standard formats for GWAS study design and results will be critical to ensure an efficient process for incorporating data into the Catalog. 
Authors will be encouraged to submit all SNP-trait associations, irrespective of p-value: this will vastly expand the depth of data available, and the utility of the Catalog. The manual curation system will be re-developed, with process automation to increase curator efficiency. Curation resources will be allocated to prioritise studies with the highest utility, thereby expediting the publication of these data in the Catalog. Finally, the Catalog's resources, interfaces, and data access will be improved for all researchers by enhancing data representation, search functionality, data visualization and integration with data from other relevant resources. User needs will be identified through surveys, and combined with feedback from other communication routes; existing data curation processes will then be modified to improve data representation, visualization, access and versatility. The continuation of the Catalog, as the main resource for data published on diseases with complex genetic traits, is of crucial importance for the biomedical research community, as a more efficient and effective way to better understand and to prevent, or cure, diseases like cardiovascular conditions, cancer and diabetes.
|
0.927 |
2017 — 2020 |
Flicek, Paul |
U41 Activity Code Description: To support biotechnology resources available to all qualified investigators without regard to the scientific disciplines or disease orientations of their research activities or specifically directed to a categorical program area. |
Gencode Management, Dissemination and Training @ European Molecular Biology Laboratory
MANAGEMENT, DISSEMINATION AND TRAINING - PROJECT SUMMARY The management, dissemination and training planned for the next four years of GENCODE will largely continue the practices found to be successful in the current project. The major difference for this proposal is the transition of the project to EMBL-EBI. The date of final transition for the WTSI HAVANA team is set for 1 April 2017, coincident with the start of the proposed funding from this application, and will follow an orderly process informed by previous resource transitions from WTSI to EMBL-EBI. As the technical transition proceeds, we will move computational and software infrastructure from WTSI to EMBL-EBI through 2016 and early 2017, and provide HAVANA staff with guest logins for testing before transition. The staff and physical transition will occur in 2017. All HAVANA staff moving to EMBL-EBI will be offered EMBL contracts with a start date of 1 April 2017. In terms of project management, the Scientific Advisory Board (SAB) will continue to provide advice on all aspects of the project, including progress, priorities, new technologies and operational processes of the consortium, and will also serve as representatives of the user community. The SAB will meet annually. A Research Support Officer will be deployed at 0.5 FTE on the project to organize the annual SAB meetings, and also to ensure regular meetings and teleconferences between the partners, as well as regular updates and reports. In terms of access, the major aims for GENCODE are regular releases of annotation; maintenance and extension of the existing GENCODE portal website (http://www.gencodegenes.org) providing information to the research and clinical community working on human and mouse genetics and genomics; and the design and implementation of new web interfaces. Annotation will be released through the Ensembl and UCSC browsers, as well as via FTP, HTTP and Track Hubs. 
The GENCODE portal, used for dedicated data download and specific project news, will be expanded to provide in-depth documentation and details of the processes used within the GENCODE consortium. To further aid data access and discovery via the Ensembl website, new views that visually highlight the differences and similarities in annotation for different strains of mice will be developed. The training plans for the next phase of GENCODE include extensive workshops on the use of GENCODE annotation, how to use GENCODE tools, and how to submit annotation to the public genomic archives. Support for users will also be provided remotely by an email Helpdesk, where queries will be tracked by Request Tracker.
|
0.927 |
2017 — 2020 |
Flicek, Paul |
U41 Activity Code Description: To support biotechnology resources available to all qualified investigators without regard to the scientific disciplines or disease orientations of their research activities or specifically directed to a categorical program area. |
Gencode Resource Informatics @ European Molecular Biology Laboratory
RESOURCE INFORMATICS - PROJECT SUMMARY The creation, advancement and maintenance of the GENCODE resource requires both adherence to and optimization of defined processes that ensure the genome annotation created now and in the future will always be of the same or better standard compared to what has already been created. The GENCODE resource must also be attuned to the new technologies and opportunities that arise as the field of genomics evolves. A primary objective of the GENCODE resource is to ensure quality control (QC) and data validation of annotations. Ensembl will compare the GENCODE gene set to other gene sets (e.g. UniProt) to check for missing genes or transcripts; CNIO will validate the coding genes; the CNIO/CNIC proteomics pipeline will validate the gene models; CNIO/CNIC will perform manual verification for QC of proteomics data. Project stability will be ensured through a well-maintained computational infrastructure, adequate QC processes that will ensure the highest possible quality, as well as regular releases of freely available annotation in high value formats. The annotation curation for human and mouse will be completed: in particular, the existing human partial transcript models will be extended to full length, the human lncRNA annotation will be expanded, and the initial full pass of the mouse annotation will be completed. GENCODE will incorporate individual genome representation and population data represented by available human variation data at both the sequence level (e.g. 1000 Genomes) and at the transcriptomic level (e.g. GTEx), and by the 16 mouse strain genomes produced by the Mouse Genomes Project led by the WTSI. Data from individuals and populations will be annotated. A personal genome resource will be developed, which will produce an accurate representation of an individual's gene set. Two pilot projects will help to define the most effective way to support future GENCODE annotations. 
The first pilot project will use GENCODE's experience in developing population reference genome graphs to pilot a scalable and potentially universal approach to population-based genome annotation. The second pilot project will focus on connecting regulatory regions to regulated genes. GENCODE will enhance the current annotation of genes with their regulatory elements so that the annotation is dependent on tissue and cell type. The demand for manual annotation of transcripts across strains and species may outstrip GENCODE's ability to provide such services via existing mechanisms; therefore a system to enable the submission of annotated data will be developed. The described measures will ensure that GENCODE in 2020 will be significantly more valuable for research and clinical applications in genomics than today.
|
0.927 |
2017 — 2020 |
Flicek, Paul |
U41 Activity Code Description: To support biotechnology resources available to all qualified investigators without regard to the scientific disciplines or disease orientations of their research activities or specifically directed to a categorical program area. |
Gencode Resource Project @ European Molecular Biology Laboratory
RESOURCE PROJECT - PROJECT SUMMARY A comprehensive knowledge of the location, structure, and expression of genes in the human genome is central to the understanding of human biology and the mechanisms of disease. Similarly for mouse, a comprehensive high quality gene set will aid in the design of experiments, and the interpretation of the effects of gene knockouts and resulting phenotypes, and as a model for human disease will help inform human gene function. The GENCODE consortium has assembled a team of world experts in a variety of fields related to gene annotation to create and distribute this gold standard. GENCODE's wide expertise covers gene and transcript isoform identification, pseudogene evolution, sequence conservation, gene expression, proteomics and post-translational modifications, gene regulatory elements, development and maintenance of the infrastructure required to create genome annotation at scale, and demonstrated community engagement and leadership. The complete and accurate annotation of the human and mouse genomes is necessary as many of the protein-coding genes are still incomplete or misannotated. GENCODE also aims to include all non-protein-coding genes, which remain poorly understood, with many loci still missing. Beyond the coding and non-coding genes, GENCODE creates reference pseudogene annotation as recent studies indicate that pseudogenes can play key regulatory roles. The completion of the full first-pass manual annotation of the reference mouse genome assembly will therefore be one of the main objectives of GENCODE. Efforts are underway in the Genome Reference Consortium (GRC) to expand the definition of the reference human genome to include genomic sequence for all haplotypes and gene alleles. GRC have already committed to supporting the genomes of a collection of 16 representative mouse strains, thus effectively replacing the linear genome with a "graph-like" structure of 16 separate haplotypes. 
GENCODE already annotate the full reference genome for human and mouse, including all available alternate sequences. GENCODE will continue to provide annotation appropriate to these new genomic sequences. In addition to genomic mutations that impair gene product function, many phenotypes are caused or moderated by the regulation of gene products. Therefore, GENCODE's complete annotation of all transcript isoforms logically includes key regulatory regions that are fundamentally a part of each gene and GENCODE will pilot the annotation of these tissue-specific regions.
|
0.927 |
2017 — 2020 |
Flicek, Paul |
U41 Activity Code Description: To support biotechnology resources available to all qualified investigators without regard to the scientific disciplines or disease orientations of their research activities or specifically directed to a categorical program area. |
Gencode: Comprehensive Genome Annotation For Human and Mouse @ European Molecular Biology Laboratory
OVERALL - PROJECT SUMMARY The objective of the GENCODE consortium is to create a foundational reference genome annotation, in which all gene features in the human and mouse genomes are identified and classified with high accuracy based on biological evidence, and then to release these annotations for the benefit of biomedical research and genome interpretation. GENCODE aims for a better understanding of a 'normal' human genome; using genome sequences of the most commonly used mouse strains will facilitate the most effective use of these key models for large-scale knockout analysis and disease-specific research. To produce regular annotation releases of high accuracy, GENCODE will continue to follow its well-established and conservative research design, supplemented by targeted investigations into the value of new technologies, new data and new sources of evidence. GENCODE focuses on protein-coding and non-coding loci, including their alternatively spliced isoforms and pseudogenes. Over the course of this proposal GENCODE will follow major directions in genomics, including graph-based genome representations, long-read transcriptome sequencing, connecting genes and the associated regulatory regions that affect their transcription, and identifying genes that are not present on the current reference assembly. The GENCODE consortium has four fundamental components: (1) a comprehensive gene annotation pipeline leveraging manual annotation; (2) an integrated approach to pseudogene identification and classification; (3) a set of computational methods to evaluate and enhance gene annotation; and (4) complementary experimental pipelines for validation and functional annotation. 
More specifically, in the next four years GENCODE aims to (1) extend the human and mouse GENCODE gene sets to as near completion as possible given current experimental technology; (2) deploy population-based genome annotation to ensure that any transcript isoform expressed in an individual human will be present in the reference annotation set; (3) extend the gene annotation to include core regulatory regions and tissue-specific enhancers from selected datasets; and (4) distribute GENCODE annotations and engage with community annotation efforts. Current popular distribution channels for GENCODE data, including the GENCODE web site and the Ensembl and UCSC Genome Browsers, will be maintained. Finally, new mechanisms for prioritizing genes for manual annotation with community input will be established, with the long-term aim of establishing GENCODE as the standard annotation set for research and clinical genomics efforts.
|
0.927 |
2019 |
Brown, Steve Flicek, Paul Mallon, Ann-Marie Meehan, Terrence Parkinson, Helen Elizabeth Smedley, Damian |
UM1 Activity Code Description: To support cooperative agreements involving large-scale research activities with complicated structures that cannot be appropriately categorized into an available single component activity code, e.g. clinical networks, research programs or consortium. The components represent a variety of supporting functions and are not independent of each component. Substantial federal programmatic staff involvement is intended to assist investigators during performance of the research activities, as defined in the terms and conditions of the award. The performance period may extend up to seven years but only through the established deviation request process. ICs desiring to use this activity code for programs greater than 5 years must receive OPERA prior approval through the deviation request process. |
Komp2 Um1 Administrative Supplement Request to Support Early Stage Knockout Mouse Embryonic Phenotyping @ European Molecular Biology Laboratory
Project Summary - KOMP2 UM1 Administrative Supplement Request to support HaploEssential Early Stage Knockout Mouse Embryonic Phenotyping Essential genes, genes whose function is necessary for the basic function of life, are enriched for variants causing human developmental disorders. Mice with recessive null mutations of essential genes support this hypothesis, with human variant candidate genes identified by rare disease genetic consortia being enriched for orthologous null mouse alleles that are lethal or subviable. In companion supplemental proposals, our KOMP-funded partners propose to extend the mouse phenotyping pipeline to the study of null alleles with dominant lethality (haploinsufficiency) by characterising pre-implantation embryos of up to 600 potential essential genes where one allele has been made non-functional using CRISPR-based methods. These embryos will be analysed pre-implantation using time-lapse video technology, with surviving embryos implanted and analysed via the current embryo phenotyping pipeline. This project will be of high value to clinical geneticists studying developmental disorders as it will help identify potential disease-associated genes in days rather than months, which will aid speedy diagnosis. The project will also identify genetic features that characterise haplo-essential genes, leading to better predictive tools and deepening our understanding of how early embryonic developmental processes manifest in disease. To support our KOMP partners and ensure the results from pre-implantation and early stage embryo phenotyping of haplo-essential genes are widely disseminated, we will perform the following activities:
· Develop and implement bioinformatic approaches to identify candidate haploinsufficient genes using integrated analysis of human intolerance to loss of function scores, human de novo mutations associated with rare disease, cellular gene essentiality studies, and model organism data.
· Coordinate activity across the KOMP partners by extending our production tracking database to capture milestones in pre-implantation studies and disseminate updates in real time.
· Standardize the methods used in pre-implantation embryo phenotyping by collaborating with the KOMP partners to develop new SOPs and define the necessary QC methods to ensure the data are of high quality.
· Extend existing data flow pipelines to capture image and video data produced by the KOMP partners as well as post-implantation studies on different genetic backgrounds.
· Freely disseminate early stage embryo phenotyping data and essential gene lists via a dedicated component of the mousephenotype.org portal that will allow specialised search and host dedicated visualisation tools.
· Host a Data Analysis Workshop to review results, organise writing for a consortium publication and develop an outreach strategy to maximize the impact of the project.
|
0.927 |
2020 — 2021 |
Flicek, Paul Haussler, David H (co-PI) [⬀] Howe, Kevin (co-PI) [⬀] Paten, Benedict [⬀] |
R01 Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Enabling Comparative Pangenomics @ University of California Santa Cruz
Project Summary: Enabling Comparative Pangenomics To many in the field, it is clear that we are moving rapidly toward a golden age of vertebrate comparative genomics in which thousands of high quality genomes of different species are publicly available and used in understanding the human genome. Despite the opportunity presented by the growth in available genomes, there has been relative stagnation in the software used to compare complete genomes, most of the software developed being old and limited in capabilities. To remedy this situation, we will create a hardened toolkit for genome comparison and annotation that can be robustly applied to thousands of vertebrate genomes. To demonstrate this toolkit and deliver its results to the broader genomics community, we will apply it to create a resource within the existing UCSC and Ensembl Genome Browsers that will incorporate thousands of vertebrate genomes. Large, well organized consortia have coalesced to take on the challenge of sequencing and assembling vertebrate genomes. Our alignments will form a backbone of these projects' analysis, and our synthesis of their data will create a resource that is much greater than the sum of what might otherwise be a series of smaller, fragmented and not directly comparable efforts. We will gather together more than 600 vertebrate genomes into our proposed resource in the first year of the proposal, rapidly delivering results. Paralleling the growth in available reference genomes, the last decade has been marked by an explosion in population sequencing projects. Although much of the cataloged human variation has a very recent evolutionary origin, there is a tremendous opportunity to combine and so better understand intra- and inter-species change using models from population genetics. We will create pangenome software to (i) avoid reference bias in species comparisons (i.e. 
avoiding assumptions about which alleles are fixed when comparing between species, which is important in quasi-species such as cichlids), (ii) allow ancestral alleles to be comprehensively estimated, including those that are part of structural variation, and (iii) more easily enable the study of balancing selection. To demonstrate the utility of comprehensive variation integration we will create a prototype of a pan-genome for the apes. We will use this graph to identify ancestral alleles and to dynamically convert annotations between species and assembly versions, and, via population mapping experiments, we will demonstrate its power for typing segregating but ancient variation. Using knowledge of ape evolution, we will ultimately extend this graph to adequately model the most complex regions of the human genome.
|
0.916 |
2021 |
Brown, Steve Flicek, Paul Mallon, Ann-Marie Parkinson, Helen Elizabeth Smedley, Damian |
UM1 Activity Code Description: To support cooperative agreements involving large-scale research activities with complicated structures that cannot be appropriately categorized into an available single component activity code, e.g. clinical networks, research programs or consortium. The components represent a variety of supporting functions and are not independent of each component. Substantial federal programmatic staff involvement is intended to assist investigators during performance of the research activities, as defined in the terms and conditions of the award. The performance period may extend up to seven years but only through the established deviation request process. ICs desiring to use this activity code for programs greater than 5 years must receive OPERA prior approval through the deviation request process. |
Komp2 Um1 Administrative Supplement Request to Support Mouse Phenotyping Informatics Infrastructure - Mpi2 @ European Molecular Biology Laboratory
PROJECT SUMMARY The Knockout Mouse Phenotyping Project (KOMP) is a critical resource for biomedical research that provides unbiased gene-to-phenotype associations for genes with little or no known function, supplies strains for follow-up mechanistic studies and integrates across resources to provide new systematic insights into the underlying causes of rare and common disease. The MPI2 Consortium will continue to support KOMP2 and IMPC partners by providing data acquisition, analysis, visualisation, quality control and integration of this valuable dataset. Specifically:
· The DCC will develop standardized protocols for new KOMP2 phenotyping tests and continue to support and enhance data upload mechanisms for the KOMP2 production and phenotyping centers. Specialist data wranglers will continue to perform quality control and interact with data submitters through the QC interface platform to address issues and will work with MPI2 developers to extend automated QC tools. Preliminary statistical analysis will be performed after data validation to quickly inform users of potentially interesting strains.
· The statistical analysis and annotation pipelines will be maintained and extended to include new tests such as aging studies. The phenotype comparisons to identify candidate disease models will be enhanced by including new disease populations and more extensive semantic mappings between phenotype ontologies.
· The Core Data Archive will continue to store all raw data and its analysis, provide programmatic access, push data to new resources such as NCBI, and integrate KOMP2 data with other EBI resources such as Reactome Pathways and the Expression Atlas.
· The MPI2 partners will continue to enhance the single point of access www.mousephenotype.org portal, programmatic access and bulk downloads based on feedback from users, including the deployment of online analysis tools.
|
0.927 |