2004 — 2010 |
Makova, Kateryna |
R01 Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Male Mutation Bias and Paternal Age Effect in Mammals @ Pennsylvania State University-Univ Park
DESCRIPTION (provided by applicant): Mutations are the cause of many human genetic diseases and the main source of genetic variation in natural populations. Thus, elucidating the mechanisms of mutagenesis is of great significance. In mammals, the number of germline cell divisions (and hence of DNA replications) is higher in males than in females. This difference provides an opportunity to test whether mutations result from errors in DNA replication. If this hypothesis is true, we expect a higher mutation rate in males than in females (male mutation bias) and a higher mutation rate in older than in younger males (paternal age effect). Here we will employ the tools of comparative genomics and bioinformatics to analyze available mammalian genomic sequences, and will generate additional data experimentally, to test the following specific hypotheses: 1. Errors in DNA replication are the primary source of insertions and deletions (indels). 2. Nucleotide substitutions, particularly at CpG dinucleotides, depend on the number of germline cell divisions. To test the first two hypotheses, we will estimate mutation rates from mammalian whole-genome alignments and compare these rates between sex chromosomes and autosomes. 3. Microsatellite repeat expansions and contractions are caused by errors in DNA replication. This will be tested by observing de novo mutations in single sperm from human males of different ages. 4. The magnitude of male mutation bias is positively correlated with generation time in mammals. To investigate this, we will sequence introns of genes with homologs on both the X and the Y chromosome in mammals with long generation times (Cetacea and Perissodactyla) and analyze additional data from the literature. The proposed research has direct relevance to public health and clinical genetics: the overwhelming majority of mutations causing human genetic diseases are indels, nucleotide substitutions, and microsatellite repeat expansions/contractions.
Moreover, single nucleotide polymorphisms (SNPs), an outcome of nucleotide substitutions, and microsatellites are widely used markers for mapping diseases and traits in association studies. Thus, it is critical to know whether mutations at these loci are driven by replication-dependent or by replication-independent factors (e.g., environmental agents such as free radicals). Additionally, the conclusions of this project will be important for genetic counseling. Namely, our results will indicate whether the age of a father at the time of conception represents a risk factor for pathology in the offspring.
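Comparing sex chromosomes with autosomes can be made concrete with the classic relations of Miyata and colleagues (1987). These relations assume that a chromosome's neutral substitution rate is the average of the male and female rates, weighted by the fraction of generations it spends in each sex. The Y is always in males, the X spends one-third of its time in males, and autosomes spend one-half. The minimal sketch below inverts those relations to recover alpha, the male-to-female mutation rate ratio, from an observed rate ratio. It is an illustration under simplifying assumptions (equal generation times in both sexes, no selection, no ancestral polymorphism), not the estimator used in this project, and the function names are hypothetical.

```python
"""Hypothetical helpers: male-to-female mutation rate ratio (alpha) from
sex-chromosome vs. autosome substitution rate ratios.

Assumed model (Miyata et al. 1987): rates are time-weighted averages of the
male and female rates, giving
    Y/A = 2a/(1+a),   X/A = 2(2+a)/(3(1+a)),   Y/X = 3a/(2+a).
"""


def alpha_from_y_over_x(r: float) -> float:
    """Invert Y/X = 3a/(2+a); defined for 0 <= r < 3."""
    if not 0 <= r < 3:
        raise ValueError("Y/X ratio must lie in [0, 3)")
    return 2 * r / (3 - r)


def alpha_from_x_over_a(r: float) -> float:
    """Invert X/A = 2(2+a)/(3(1+a)); defined for 2/3 < r <= 4/3."""
    if not 2 / 3 < r <= 4 / 3:
        raise ValueError("X/A ratio must lie in (2/3, 4/3]")
    return (4 - 3 * r) / (3 * r - 2)


def alpha_from_y_over_a(r: float) -> float:
    """Invert Y/A = 2a/(1+a); defined for 0 <= r < 2."""
    if not 0 <= r < 2:
        raise ValueError("Y/A ratio must lie in [0, 2)")
    return r / (2 - r)


if __name__ == "__main__":
    # Hypothetical ratios, chosen to be mutually consistent with alpha = 3.
    for estimate in (alpha_from_y_over_x(1.8),
                     alpha_from_x_over_a(5 / 6),
                     alpha_from_y_over_a(1.5)):
        print(round(estimate, 6))
```

With no male bias (alpha = 1), all three ratios equal 1. If mutations track germline cell divisions, alpha > 1 is expected, which pushes Y/X and Y/A above 1 and X/A below 1.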
|
2009 — 2012 |
Eckert, Kristin A (co-PI); Makova, Kateryna |
R01 |
Computational and Biochemical Analysis of Microsatellite Life Cycle @ Pennsylvania State University-Univ Park
DESCRIPTION (provided by applicant): Microsatellite sequences are abundant in the human genome and have mutation rates orders of magnitude higher than those of other genomic sequences. As a result, microsatellites are frequently used as markers in forensics and population genetics. Importantly, microsatellites influence genome function by being part of protein-coding regions or by regulating gene expression. In addition, allele-length polymorphisms at microsatellites are implicated as genetic risk factors in several diseases. Because the full impact of microsatellite changes on genome function has yet to be elucidated, it is of utmost importance to understand how microsatellites arise, mutate, and eventually cease to exist at individual loci in the human genome. The evolution of each microsatellite has been described theoretically as a life cycle, with the stages of birth, active dynamic mutation, and death. However, the concept of the microsatellite life cycle has not previously been investigated in detail. The goal of this interdisciplinary proposal is to elucidate the mechanisms defining the microsatellite life cycle in the human genome. This will be accomplished by a combination of computational and biochemical approaches, and follows the NIH Roadmap themes of interdisciplinary research and of bioinformatics and computational biology. Specific Aim 1 is to determine the mechanisms of microsatellite birth. We will use biochemical experiments to determine the microsatellite threshold, that is, the minimal number of repeats (or length) required for dynamic mutations to occur. These thresholds will be determined for various motifs and will be used in computational analyses to examine the mechanisms and densities of new microsatellite births. The results of this aim will allow us, for the first time, to derive a regression model explaining variation in microsatellite birth densities across the genome. Specific Aim 2 will examine microsatellite interruption and death.
Our preliminary studies demonstrate that microsatellite interruptions are frequent in the human genome, and that DNA polymerases can directly produce such interruptions in vitro. This aim will use computational and biochemical techniques to measure the mutational consequences of interruptions and the extent to which they contribute to microsatellite death. Specific Aim 3 is to computationally determine the mechanisms contributing to variation in mature microsatellite mutation rates among and within individual human genomes, and to biochemically determine the specific mechanisms attributable to intrinsic features. Overall, the results of this project will be of considerable significance for our understanding of the dynamics of genome evolution. Additionally, our research proposal has direct relevance to public health and clinical genetics. The new information gained by our research can be used to predict the probability that each microsatellite will mutate or cease to exist, and the probability that any genomic region will give rise to a new microsatellite. This will be of major importance for assessing an individual's disease risks, especially now that individual human genomes are being rapidly sequenced. PUBLIC HEALTH RELEVANCE: Repetitive DNA sequences called microsatellites are characteristic of primate genomes and are known to regulate gene expression. Mutations within microsatellite sequences are causally linked to the development of several human diseases. Our interdisciplinary project will elucidate the mechanisms whereby microsatellites arise, mutate, and disappear at distinct loci in individual human genomes. This research could have major consequences for predicting the risk of diseases caused by microsatellites.
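The repeat-number threshold in Aim 1 and the interruptions in Aim 2 both presuppose an annotation of uninterrupted tandem repeats. The sketch below is a minimal, hypothetical scanner; `Repeat`, `minimal_period`, and `find_perfect_repeats` are illustrative names, not the project's tools. It reports runs of a given unit length that reach a caller-supplied `min_copies`. That value is a placeholder standing in for the experimentally determined, motif-specific thresholds, not a measured number.

```python
from typing import Iterator, NamedTuple


class Repeat(NamedTuple):
    start: int   # 0-based start of the run
    unit: str    # repeat unit, in the phase in which the run starts
    copies: int  # number of complete, uninterrupted copies


def minimal_period(unit: str) -> int:
    """Length of the shortest string that tiles `unit` (e.g. 'ATAT' -> 2)."""
    k = len(unit)
    for p in range(1, k + 1):
        if k % p == 0 and unit == unit[:p] * (k // p):
            return p
    return k


def find_perfect_repeats(seq: str, period: int, min_copies: int) -> Iterator[Repeat]:
    """Yield uninterrupted tandem repeats of `period`-bp units with at least
    `min_copies` copies. Any substitution or indel breaks the run, so an
    interrupted microsatellite is reported as shorter pieces (or not at all
    if the pieces fall below the threshold)."""
    seq = seq.upper()
    i = 0
    while i + period <= len(seq):
        unit = seq[i:i + period]
        # Skip ambiguous bases and units that are really shorter repeats
        # (e.g. 'AA' is a mononucleotide run, not a dinucleotide repeat).
        if set(unit) <= set("ACGT") and minimal_period(unit) == period:
            copies, j = 1, i + period
            while seq[j:j + period] == unit:
                copies += 1
                j += period
            if copies >= min_copies:
                yield Repeat(i, unit, copies)
                i = j  # resume scanning after the reported run
                continue
        i += 1


if __name__ == "__main__":
    # Hypothetical (CA)n run interrupted by a single T, hypothetical threshold of 3.
    print(list(find_perfect_repeats("CACACATACACA", period=2, min_copies=3)))
```

In this toy case, the single T splits the run, and only the three-copy segment upstream of the interruption passes the threshold. This is the kind of interruption whose role in microsatellite "death" Aim 2 proposes to measure.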
|
2010 — 2015 |
Makova, Kateryna; Chiaromonte, Francesca (co-PI) |
N/A Activity Code Description: No activity code was retrieved. |
Computational Tools and Statistical Analysis of Co-Varying Rates of Different Mutation Types @ Pennsylvania State Univ University Park
The Pennsylvania State University at University Park is awarded a grant to study regional (primarily intra-chromosomal) variation and co-variation in the rates of different mutation types, using comparisons of completely sequenced mammalian genomes and human re-sequencing data. Mutations are the source of genetic variation in natural populations and provide the material for molecular evolution, yet the mechanisms of mutagenesis are still not completely understood. First, the project will characterize the regional variation and co-variation of mutation types such as nucleotide substitutions, small insertions and deletions, and changes in microsatellite repeat number, examining their genomic co-occurrence and linear association at multiple scales. This will yield insights into which mutation types co-occur and at which genomic scales their co-variation prevails. Second, the team will investigate potential causes of regional rate variation and co-variation for these mutation types, simultaneously relating them to genome landscape features with linear approaches at multiple scales. This will yield insights into the role of genomic features in explaining regional rate variation and co-variation for mutations of different types. Third, they will assess the need for, and implement, non-linear analysis techniques and regression methods. Fourth, the results on mutation rates and co-occurrence will be used to improve computational predictions of functional regions by means of background corrections that exploit several mutation types simultaneously. Local corrections will be based on the rates of multiple mutation types and employed to improve the performance of functional element prediction algorithms. Finally, the computational and statistical tools developed in this project will be implemented as readily accessible software suites in Galaxy, a free-standing genome analysis platform (http://galaxyproject.org).
Tools for detecting mutations, estimating and apportioning mutation rates and genomic features in windows, and applying multivariate, multi-scale, and non-linear techniques will be integrated into a web-based platform that will make them readily available for the analysis of any sequenced genome. The resulting framework will be highly interactive, based on proven methodology, and easily accessible to other researchers and educators without programming experience. Graduate students working on this project will receive interdisciplinary training in biology, computer science, and statistics. Undergraduate students from underrepresented groups, including women and minorities, will be recruited for the project through existing programs at Penn State.
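The first step above, measuring the linear association of two mutation types at multiple genomic scales, can be sketched as follows. This is a minimal illustration, not the project's Galaxy tools. It assumes per-position event counts for two mutation types along the same sequence, sums them in non-overlapping windows of several widths, and computes a Pearson correlation at each width. All function names and the toy data are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence


def window_sums(counts: Sequence[int], width: int) -> List[int]:
    """Sum per-position event counts in consecutive, non-overlapping
    windows of `width` positions; a trailing partial window is dropped."""
    n_full = len(counts) // width
    return [sum(counts[k * width:(k + 1) * width]) for k in range(n_full)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN if undefined (< 2 points or zero variance)."""
    n = len(x)
    if n < 2:
        return float("nan")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def multiscale_correlation(
    type_a: Sequence[int], type_b: Sequence[int], widths: Sequence[int]
) -> Dict[int, float]:
    """Correlate window-level counts of two mutation types at each window width."""
    if len(type_a) != len(type_b):
        raise ValueError("both tracks must cover the same positions")
    return {w: pearson(window_sums(type_a, w), window_sums(type_b, w)) for w in widths}


if __name__ == "__main__":
    # Toy 0/1 tracks (1 = event at that position), purely illustrative.
    subs = [1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1]
    indels = [1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
    print(multiscale_correlation(subs, indels, widths=[2, 4, 8]))
```

Real analyses would use much larger windows (kilobases to megabases), handle alignment gaps and unequal coverage, and, as the third step notes, may need non-linear methods where a single correlation coefficient is inadequate.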
|
2015 — 2018 |
Makova, Kateryna; Nielsen, Rasmus (co-PI) |
R01 |
The Dynamics of Mitochondrial Mutations @ Pennsylvania State University-Univ Park
DESCRIPTION (provided by applicant): Mutations in mtDNA cause >200 genetic diseases. One in eight unrelated females is a carrier of an mtDNA disease. Most mtDNA diseases are heteroplasmic: the disease-associated alleles are present alongside the wild-type alleles. The severity of an mtDNA disease depends on the frequency of the disease-associated allele in a tissue. Therefore, it is critical to study how mtDNA allele frequencies change between generations and within a human body. Such changes are governed by mutation, genetic drift, and selection. Yet the basic parameters and relative contributions of these evolutionary processes to shaping mtDNA genetic makeup have been neither modeled nor characterized in detail. The goal of this proposal is to investigate the evolutionary processes governing mtDNA allele frequency changes between generations and among the tissues of an individual. To achieve this goal, we have created a unique resource: buccal and blood samples collected from trios consisting of a mother and two children (mother-2child trios) from a human population in Central Pennsylvania. We will sequence their mtDNA and analyze it with newly developed computational and statistical tools, packaged in a reproducible software pipeline that will be shared with the scientific community. In Aim 1 we will develop a novel population genetics framework for mtDNA mutation and drift, and will apply it to mtDNA sequences from 200 mother-2child trios. We will estimate germ-line, embryonic, and somatic mutation rates and bottleneck sizes. The germ-line bottleneck size estimate is needed to predict heteroplasmy levels in children, and the embryonic one to parse the distribution of heteroplasmies among tissues. The germ-line mutation rate estimate will be useful for human evolutionary studies, and the somatic mutation rate estimate will inform us about mutation accumulation in mtDNA diseases.
Aim 2 will test a potential effect of age on the accumulation of mtDNA mutations in the female germ line. In addition to analyzing mother-2child trios, here we will examine the female germ line directly, by sequencing mtDNA from unfertilized oocytes from 100 women of different ages. mtDNA germ-line mutation rates might increase because of oocyte aging processes that create a mutagenic environment. An age-related increase in the mtDNA mutation rate, if found, will be important for formulating family planning recommendations in modern Western societies, in which reproduction is frequently delayed. In Aim 3 we will develop novel likelihood-based methods for detecting selection on mtDNA. Applying these methods to the mtDNA sequencing data from mother-2child trios and unfertilized oocytes, we will contrast the strength of germ-line vs. somatic mtDNA selection. We will also evaluate whether mitochondrial selection operates predominantly at the level of individual mitochondria in a cell (or cells within a tissue) or among individuals in a population. This will contribute significantly to an ongoing debate about where in an organism mtDNA selection operates. Thus, using innovative methodology, we will address pivotal questions about mtDNA evolution and disease.
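The connection between bottleneck size and heteroplasmy transmission in Aim 1 can be illustrated with the simplest textbook model. This is not the novel framework proposed above. Assume the child's heteroplasmy frequency is the mother's frequency after a single round of binomial sampling of N mtDNA genomes. Then the expected squared frequency shift is p(1 - p)/N, and summing over sites gives a method-of-moments estimate of N. The function name and the example frequencies are hypothetical.

```python
from typing import Sequence


def estimate_bottleneck(mother: Sequence[float], child: Sequence[float]) -> float:
    """Method-of-moments estimate of an effective germ-line bottleneck size N.

    Toy model (an assumption, not the project's framework): each child's
    heteroplasmy frequency is the mother's frequency after one round of
    binomial sampling of N mtDNA genomes, so
        E[(p_child - p_mother)^2] = p_mother * (1 - p_mother) / N.
    Summing over sites and solving for N gives the estimator below.
    Mutation, selection, sequencing error and tissue differences are ignored.
    """
    if len(mother) != len(child) or not mother:
        raise ValueError("need paired, non-empty frequency lists")
    if any(not 0.0 <= p <= 1.0 for p in (*mother, *child)):
        raise ValueError("frequencies must lie in [0, 1]")
    expected = sum(p * (1 - p) for p in mother)                  # sum of p(1-p)
    observed = sum((c - p) ** 2 for p, c in zip(mother, child))  # sum of squared shifts
    if observed == 0:
        return float("inf")  # no frequency change: no detectable drift
    return expected / observed


if __name__ == "__main__":
    # Hypothetical heteroplasmy frequencies at three sites in one mother-child pair.
    print(estimate_bottleneck([0.5, 0.2, 0.1], [0.6, 0.1, 0.2]))
```

Smaller estimates of N mean larger frequency jumps between generations. Under this model, a narrow bottleneck makes a child's heteroplasmy level, and hence disease severity, hard to predict from the mother's level.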
|
2016 — 2021 |
Chiaromonte, Francesca (co-PI); Hardison, Ross C; Makova, Kateryna; Shashikant, Cooduvalli S. (co-PI) |
T32 Activity Code Description: To enable institutions to make National Research Service Awards to individuals selected by them for predoctoral and postdoctoral research training in specified shortage areas. |
Computation, Bioinformatics, and Statistics (CBIOS) Training Program @ Pennsylvania State University-Univ Park
DESCRIPTION (provided by applicant): Genomic data are transforming how scientists in medicine and basic science conduct research. The advancement of genome science requires a new generation of scientists with strong computational and statistical skills and the ability to interact effectively with experimentalists. The proposed Penn State Computation, Bioinformatics, and Statistics (CBIOS) Training Program will prepare a cadre of investigators to think innovatively and keep pace with the quickly evolving landscape of high-throughput genomic technologies. The program faculty are interdisciplinary and highly collaborative, with expertise in computation, bioinformatics, and statistics, as well as in functional, medical, and evolutionary genomics. Learning these discipline-crossing skills will make trainees competitive for future careers in the emerging and rapidly advancing fields of comparative, systems, statistical, and medical genomics. The educational objectives of the CBIOS program are to engender in the trainees the following: 1. A thorough understanding of hypothesis testing in the scientific process. 2. The ability to work from theory to data and back. 3. Fluency in the use of computational and statistical tools for high-throughput data. 4. The ability to integrate and innovate computational and statistical analyses of high-throughput data. 5. Excellence in cross-disciplinary scientific communication, including the ethical implications of computational and bioinformatics research. 6. The ability to lead cross-disciplinary research teams. The CBIOS training program will accomplish these objectives through a set of existing core and elective courses along with a new practicum course, all of which are integrated with a journal club and seminar series. The program will enhance professional development through invited seminar speakers and retreats, and will specifically develop trainees' communication skills to enable dissemination of genomics research to a broad audience.
Predoctoral trainees will be selected early in their graduate program for two years of intensive training. A total of 15 trainees (10 NIH-supported and 5 PSU-supported) will be trained during the five-year grant period. The faculty supporting this training program have a combined annual research funding base of $65 million in direct costs, and thus offer a robust mentoring foundation for student research experiences and opportunities.
|
2019 — 2021 |
Makova, Kateryna |
R01 |
Y Chromosome Evolution @ Pennsylvania State University-Univ Park
PROJECT SUMMARY The male-specific Y chromosome is critical for sex determination and fertility. Yet, because of its highly repetitive structure and haploidy, its sequence has been deciphered for only a handful of mammalian species, including just three apes: human, chimpanzee, and gorilla. The lack of Y chromosome sequences has made it difficult to obtain a complete picture of mammalian genome evolution, and has hampered studies of sex-specific dynamics in natural populations. In this project, we have chosen to study the evolution of ape Y chromosomes for two reasons: they differ enormously among species at the cytogenetic level, and mating and dispersal patterns, which influence selection and genetic drift acting on the Y, vary dramatically among apes. Our goal is to decipher the evolutionary processes shaping ape Y chromosome evolution by examining Y interspecific divergence and intraspecific diversity. In Aim 1 we will study the evolution of ape Y chromosome architecture. Applying our novel method based on the latest experimental and computational techniques, we assembled the Y chromosomes of gorilla, bonobo, and Bornean orangutan. Using these and publicly available ape Y assemblies in a phylogenetic framework, we will study several features of Y chromosome architecture: sequence divergence, gene content, and transposable element accumulation. Instances of lineage-specific accelerated or decelerated Y chromosome evolution will be identified, and their causes will be explored in subsequent aims. In Aim 2 we will investigate the evolutionary forces affecting global Y chromosome architecture by studying Y chromosome diversity. We will test whether the observed diversity patterns, as inferred from publicly available and in-house generated re-sequencing data, are consistent with random genetic drift or with positive or negative selection. In Aims 3 and 4, the selection targets will be identified.
In Aim 3, we will decipher individual gene sequences from short- and long-read transcriptome assemblies, construct gene phylogenies, and test for lineage-specific selection acting on individual genes and on individual gene copies for multi-copy gene families. Aim 4 will evaluate potential selection acting on the expression levels and copy numbers of multi-copy ampliconic gene families on the Y chromosome. These genes are expressed during spermatogenesis, and their deletions have been implicated in human male infertility. Overall, our project will have important implications for uncovering the intricacies of ape genome evolution. The ape Y chromosome assemblies, alignments, and transcript catalogues will serve as an invaluable resource for addressing a myriad of long-standing biological questions and for designing genetic markers to trace the dispersal of male apes in the wild. This is critical, as all of the nonhuman ape species studied are endangered. The techniques developed for this project will be shared with other researchers, enabling them to study the Y chromosomes of other species. Our thorough investigation of the evolution of ampliconic gene sequences, expression levels, and copy numbers will significantly contribute to our understanding of the causes of male infertility.
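Testing whether Y diversity is "consistent with random genetic drift" requires a neutral baseline. One standard baseline compares Y and autosomal diversity (theta, approximated by pi at mutation-drift equilibrium) in a Wright-Fisher population with Nm breeding males and Nf breeding females. In that model, autosomes have Ne = 4NmNf/(Nm + Nf) and theta_A = 4 Ne mu_A. The haploid Y is carried by the Nm males, so theta_Y = 2 Nm mu_Y. Male mutation bias alpha sets mu_Y/mu_A = 2alpha/(1 + alpha). The sketch below computes the expected ratio under these textbook assumptions only (no selection, no population structure). It is not the project's test, and the function name and parameter values are hypothetical.

```python
def expected_y_to_autosome_diversity(alpha: float, n_males: float, n_females: float) -> float:
    """Neutral expectation for theta_Y / theta_A at mutation-drift equilibrium.

    Illustrative assumptions: Wright-Fisher population with n_males breeding
    males and n_females breeding females, no selection, no structure.
      autosomes: theta_A = 4 * Ne_A * mu_A, with Ne_A = 4*Nm*Nf / (Nm + Nf)
      Y (haploid, one copy per male): theta_Y = 2 * Nm * mu_Y
      mutation: mu_Y / mu_A = 2*alpha / (1 + alpha), alpha = male/female rate ratio
    """
    if alpha <= 0 or n_males <= 0 or n_females <= 0:
        raise ValueError("alpha and population sizes must be positive")
    ne_autosomal = 4 * n_males * n_females / (n_males + n_females)
    mu_ratio = 2 * alpha / (1 + alpha)  # mu_Y / mu_A
    return (2 * n_males * mu_ratio) / (4 * ne_autosomal)


if __name__ == "__main__":
    # Hypothetical scenarios: even sex ratio vs. one breeding male per four females.
    print(expected_y_to_autosome_diversity(alpha=1, n_males=100, n_females=100))
    print(expected_y_to_autosome_diversity(alpha=1, n_males=25, n_females=100))
```

Under these assumptions, Y diversity is expected to be one quarter of autosomal diversity with an even breeding sex ratio and no male bias. It is lower still when fewer males breed, which is one route by which the mating systems mentioned above can shape Y diversity. Observed ratios far from the baseline would point to selection or to demographic factors that the model omits.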
|
2021 |
Makova, Kateryna |
R01 |
The Impact of G-Quadruplexes On Genome Evolution @ Pennsylvania State University-Univ Park
PROJECT SUMMARY In addition to the canonical B-DNA double helix described by Watson and Crick, DNA can adopt non-canonical, or non-B, conformations, and approximately 13% of the human genome comprises sequence motifs that can form such structures. This project focuses on G-quadruplexes, the type of non-B DNA for which we have the strongest evidence of genome-wide formation and functionality in human cells. There are more than 700,000 putative G-quadruplex loci in the human genome. They constitute ~1% of the genome, compared with ~1.5% occupied by protein-coding exons. Recent in vivo experiments showed that G-quadruplexes regulate key cellular processes (e.g., chromatin organization and transcription). Thus, we hypothesize that some groups of G-quadruplex loci evolve under purifying selection. Yet G-quadruplexes may also represent a hurdle for DNA replication. Our published preliminary results, based on the analysis of long-read sequencing data, demonstrated decreased polymerization speed and increased polymerization errors at G-quadruplex loci genome-wide. We hypothesize that the same phenomena occur in human cells and lead to increased mutagenesis at G-quadruplex loci. Building upon our published and unpublished preliminary results, this project will examine the contribution of G-quadruplex motifs to genome evolution, which has been critically underexplored. Aim 1 will elucidate the mechanistic basis of the increased mutation rate at G-quadruplex loci, using state-of-the-art high-fidelity duplex sequencing. With in vivo experiments, we will test the hypothesis that mutation rates are increased specifically at G-quadruplex structures forming in human cells and are associated with replication slowdown. With in vitro experiments, we will test the hypothesis that the two major eukaryotic replicative polymerases (polymerases epsilon and delta, responsible for leading- and lagging-strand synthesis, respectively) stall and have increased error frequencies at G-quadruplexes.
Aim 2 will assess the contribution of G-quadruplex loci to regional variation in mutation rates across the genome. It will also test the hypothesis that G-quadruplex loci facilitate structural variation in human populations and chromosomal rearrangements during evolution. Advanced statistical techniques, including ones from the Functional Data Analysis domain, will be used in this Aim. Finally, Aim 3 will examine selection acting on G-quadruplex loci using classical and novel statistical tests. We will test the hypothesis that G-quadruplexes located in different functional compartments of the genome experience varying selective pressures; e.g., promoter motifs are expected to evolve under strong purifying selection. Moreover, we will investigate a potential association between the biophysical stability of G-quadruplex structures and the strength of selection acting on them. This Aim will also identify groups of physiologically relevant G-quadruplex loci that will drive future functional studies. Overall, the project will substantially advance our understanding of the contribution of G-quadruplexes to genome evolution and disease.
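Putative G-quadruplex loci such as those counted above are usually identified by sequence pattern matching. As an illustration only, the sketch below uses the widely cited Quadparser-style consensus: four runs of at least three guanines separated by loops of 1 to 7 nucleotides. It scans both strands by also matching the complementary C-rich pattern on the given strand. The project's own motif annotation may use different parameters or tools, and `find_g4_motifs` is a hypothetical name.

```python
import re
from typing import List, Tuple

# Four runs of >= 3 G separated by loops of 1-7 nt (Quadparser-style consensus).
G4_PLUS = re.compile(r"G{3,}[ACGTN]{1,7}G{3,}[ACGTN]{1,7}G{3,}[ACGTN]{1,7}G{3,}")
# A motif on the reverse strand appears as C-runs on the given strand.
G4_MINUS = re.compile(r"C{3,}[ACGTN]{1,7}C{3,}[ACGTN]{1,7}C{3,}[ACGTN]{1,7}C{3,}")


def find_g4_motifs(seq: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, strand) for non-overlapping putative G4 motifs,
    0-based and end-exclusive, sorted by position."""
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", G4_PLUS), ("-", G4_MINUS)):
        hits.extend((m.start(), m.end(), strand) for m in pattern.finditer(seq))
    return sorted(hits)


if __name__ == "__main__":
    print(find_g4_motifs("TTGGGAGGGAGGGAGGGTT"))  # G-rich: plus-strand motif
    print(find_g4_motifs("CCCTCCCTCCCTCCC"))      # C-rich: minus-strand motif
```

Genome-wide totals depend strongly on the run-length and loop-length parameters and on how overlapping candidates are merged. This sketch should therefore not be expected to reproduce the counts quoted above exactly.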
|