Although the human germline mutation rate is higher than that in any other well-studied species, the rate is not exceptional once the effective genome size and effective population size are taken into consideration. Human somatic mutation rates are substantially elevated above those in the germline, but this is also seen in other species. What is exceptional about humans is the recent detachment from the challenges of the natural environment and the ability to modify phenotypic traits in ways that mitigate the fitness effects of mutations, e.g., precision and personalized medicine. This results in a relaxation of selection against mildly deleterious mutations, including those magnifying the mutation rate itself. The long-term consequence of such effects is an expected genetic deterioration in the baseline human condition, potentially measurable on the timescale of a few generations in westernized societies, and because the brain is a particularly large mutational target, this is of particular concern. Ultimately, the price will have to be covered by further investment in various forms of medical intervention. Resolving the uncertainties of the magnitude and timescale of these effects will require the establishment of stable, standardized, multigenerational measurement procedures for various human traits.
The advent of genome editing techniques based on the clustered regularly interspaced short palindromic repeats (CRISPR)–Cas9 system has revolutionized research in the biological sciences. CRISPR is quickly becoming an indispensable experimental tool for researchers using genetic model organisms, including the nematode Caenorhabditis elegans. Here, we provide an overview of CRISPR-based strategies for genome editing in C. elegans. We focus on practical considerations for successful genome editing, including a discussion of which strategies are best suited to producing different kinds of targeted genome modifications.
Caenorhabditis elegans; CRISPR/Cas9; genome editing; WormBook
complex trait; genomics; population genetics; quantitative genetics
Estimation of epidemiological and population parameters from molecular sequence data has become central to the understanding of infectious disease dynamics. Various models have been proposed to infer details of the dynamics that describe epidemic progression. These include inference approaches derived from Kingman’s coalescent theory. Here, we use recently described coalescent theory for epidemic dynamics to develop stochastic and deterministic coalescent susceptible–infected–removed (SIR) tree priors. We implement these in a Bayesian phylogenetic inference framework to permit joint estimation of SIR epidemic parameters and the sample genealogy. We assess the performance of the two coalescent models and also juxtapose results obtained with a recently published birth–death-sampling model for epidemic inference. Comparisons are made by analyzing sets of genealogies simulated under precisely known epidemiological parameters. Additionally, we analyze influenza A (H1N1) sequence data sampled in the Canterbury region of New Zealand and HIV-1 sequence data obtained from known United Kingdom infection clusters. We show that both coalescent SIR models are effective at estimating epidemiological parameters from data with large fundamental reproductive number R0 and large population size S0. Furthermore, we find that the stochastic variant generally outperforms its deterministic counterpart in terms of error, bias, and highest posterior density coverage, particularly for smaller R0 and S0. However, each of these inference models is shown to have undesirable properties in certain circumstances, especially for epidemic outbreaks with R0 close to one or with small effective susceptible populations.
Bayesian inference; phylodynamics; coalescent; epidemic; stochastic
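As a rough illustration of the deterministic SIR dynamics referred to in the abstract above, the following sketch integrates the standard SIR equations with a simple Euler scheme. It is not the Bayesian phylogenetic implementation described there. The function names, the step size, and the density-dependent form R0 = beta * S0 / gamma are assumptions made for this example.

```python
def basic_reproduction_number(beta, gamma, s0):
    """R0 under the assumed density-dependent transmission: beta * S0 / gamma."""
    return beta * s0 / gamma


def simulate_sir(beta, gamma, s0, i0, dt=0.01, t_max=50.0):
    """Euler integration of dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I,
    dR/dt = gamma*I. Returns a list of (time, S, I, R) tuples."""
    s, i, r = float(s0), float(i0), 0.0
    trajectory = [(0.0, s, i, r)]
    steps = int(round(t_max / dt))
    for k in range(1, steps + 1):
        new_infections = beta * s * i * dt  # S -> I during this step
        new_removals = gamma * i * dt       # I -> R during this step
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        trajectory.append((k * dt, s, i, r))
    return trajectory
```

When R0 is close to one, the infected compartment barely grows, which is consistent with the abstract's remark that inference becomes difficult in that regime.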
Since the discovery of microRNAs (miRNAs) only two decades ago, they have emerged as an essential component of the gene regulatory machinery. miRNAs have seemingly paradoxical features: a single miRNA is able to simultaneously target hundreds of genes, while its presence is mostly dispensable for animal viability under normal conditions. It is known that miRNAs act as stress response factors; however, it remains challenging to determine their relevant targets and the conditions under which they function. To address this challenge, we propose a new workflow for miRNA function analysis, by which we found that the evolutionarily young miRNA family, the mir-310s (mir-310/mir-311/mir-312/mir-313), are important regulators of Drosophila metabolic status. mir-310s-deficient animals have an abnormal diet-dependent expression profile for numerous diet-sensitive components, accumulate fats, and show various physiological defects. We found that the mir-310s simultaneously repress the production of several regulatory factors (Rab23, DHR96, and Ttk) of the evolutionarily conserved Hedgehog (Hh) pathway to sharpen dietary response. As the mir-310s expression is highly dynamic and nutrition sensitive, this signal relay model helps to explain the molecular mechanism governing quick and robust Hh signaling responses to nutritional changes. Additionally, we discovered a new component of the Hh signaling pathway in Drosophila, Rab23, which cell autonomously regulates Hh ligand trafficking in the germline stem cell niche. How organisms adjust to dietary fluctuations to sustain healthy homeostasis is an intriguing research topic. These data are the first to report that miRNAs can act as executives that transduce nutritional signals to an essential signaling pathway. This suggests miRNAs as plausible therapeutic agents that can be used in combination with low calorie and cholesterol diets to manage quick and precise tissue-specific responses to nutritional changes.
Drosophila; oogenesis; follicle stem cell; Hedgehog signaling; miRNA; the mir-310s; Rab23; dietary restriction; metabolic stress; Hh ligand
A central aim of evolutionary genomics is to identify the relative roles that various evolutionary forces have played in generating and shaping genetic variation within and among species. Here we use whole-genome resequencing data to characterize and compare genome-wide patterns of nucleotide polymorphism, site frequency spectrum, and population-scaled recombination rates in three species of Populus: Populus tremula, P. tremuloides, and P. trichocarpa. We find that P. tremuloides has the highest level of genome-wide variation, skewed allele frequencies, and population-scaled recombination rates, whereas P. trichocarpa harbors the lowest. Our findings highlight multiple lines of evidence suggesting that natural selection, due to both purifying and positive selection, has widely shaped patterns of nucleotide polymorphism at linked neutral sites in all three species. Differences in effective population sizes and rates of recombination largely explain the disparate magnitudes and signatures of linked selection that we observe among species. The present work provides the first phylogenetic comparative study on a genome-wide scale in forest trees. This information will also improve our ability to understand how various evolutionary forces have interacted to influence genome evolution among related species.
Populus; whole-genome resequencing; nucleotide polymorphism; recombination; natural selection
The inference of demographic history from genome data is hindered by a lack of efficient computational approaches. In particular, it has proved difficult to exploit the information contained in the distribution of genealogies across the genome. We have previously shown that the generating function (GF) of genealogies can be used to analytically compute likelihoods of demographic models from configurations of mutations in short sequence blocks (Lohse et al. 2011). Although the GF has a simple, recursive form, the size of such likelihood calculations explodes quickly with the number of individuals and applications of this framework have so far been mainly limited to small samples (pairs and triplets) for which the GF can be written by hand. Here we investigate several strategies for exploiting the inherent symmetries of the coalescent. In particular, we show that the GF of genealogies can be decomposed into a set of equivalence classes that allows likelihood calculations from nontrivial samples. Using this strategy, we automated blockwise likelihood calculations for a general set of demographic scenarios in Mathematica. These histories may involve population size changes, continuous migration, discrete divergence, and admixture between multiple populations. To give a concrete example, we calculate the likelihood for a model of isolation with migration (IM), assuming two diploid samples without phase and outgroup information. We demonstrate the new inference scheme with an analysis of two individual butterfly genomes from the sister species Heliconius melpomene rosina and H. cydno.
maximum likelihood; population divergence; gene flow; structured coalescent; generating function
The degree of concordance between populations in the genetic architecture of a given trait is an important issue in medical and evolutionary genetics. Here, we address this problem, using a replicated pooled genome-wide association study approach (Pool-GWAS) to compare the genetic basis of variation in abdominal pigmentation in female European and South African Drosophila melanogaster. We find that, in both the European and the South African flies, variants near the tan and bric-à-brac 1 (bab1) genes are most strongly associated with pigmentation. However, the relative contribution of these loci differs: in the European populations, tan outranks bab1, while the converse is true for the South African flies. Using simulations, we show that this result can be explained parsimoniously, without invoking different causal variants between the populations, by a combination of frequency differences between the two populations and dominance for the causal alleles at the bab1 locus. Our results demonstrate the power of cost-effective, replicated Pool-GWAS to shed light on differences in the genetic architecture of a given trait between populations.
GWAS; genome-wide association study; pigmentation; dominance; Drosophila melanogaster; pooled sequencing
Animals from flies to humans adjust their development in response to environmental conditions through a series of developmental checkpoints, which alter the sensitivity of organs to environmental perturbation. Despite their importance, we know little about the molecular mechanisms through which this change in sensitivity occurs. Here we identify two phases of sensitivity to larval nutrition that contribute to plasticity in ovariole number, an important determinant of fecundity, in Drosophila melanogaster. These two phases of sensitivity are separated by the developmental checkpoint called “critical weight”; poor nutrition has greater effects on ovariole number in larvae before critical weight than after. We find that this switch in sensitivity results from distinct developmental processes. In precritical weight larvae, poor nutrition delays the onset of terminal filament cell differentiation, the starting point for ovariole development, and strongly suppresses the rate of terminal filament addition and the rate of increase in ovary volume. Conversely, in postcritical weight larvae, poor nutrition affects only the rate of increase in ovary volume. Our results further indicate that two hormonal pathways, the insulin/insulin-like growth factor and the ecdysone-signaling pathways, modulate the timing and rates of all three developmental processes. The change in sensitivity in the ovary results from changes in the relative contribution of each pathway to the rates of terminal filament addition and increase in ovary volume before and after critical weight. Our work deepens our understanding of how hormones act to modify the sensitivity of organs to environmental conditions, thereby affecting their plasticity.
critical period for environmental sensitivity; ecdysone signaling; insulin/insulin-like growth factor signaling; larval nutrition; ovary size
Recent and rapid advances in genetic and molecular tools have brought spectacular tractability to Caenorhabditis elegans, a model that was initially prized because of its simple design and ease of imaging. C. elegans has long been a powerful model in biomedical research, and tools such as RNAi and the CRISPR/Cas9 system allow facile knockdown of genes and genome editing, respectively. These developments have created an additional opportunity to tackle one of the most debilitating burdens on global health and food security: parasitic nematodes. I review how development of nonparasitic nematodes as genetic models informs efforts to import tools into parasitic nematodes. Current tools in three commonly studied parasites (Strongyloides spp., Brugia malayi, and Ascaris suum) are described, as are tools from C. elegans that are ripe for adaptation and the benefits and barriers to doing so. These tools will enable dissection of a huge array of questions that have been all but completely impenetrable to date, allowing investigation into host–parasite and parasite–vector interactions, and the genetic basis of parasitism.
C. elegans; CRISPR; parasitic nematode; tool development
Many computations with SNP data, including genomic evaluation, parameter estimation, and genome-wide association studies, use an inverse of the genomic relationship matrix. The cost of a regular inversion is cubic and is prohibitively expensive for large matrices. Recent studies in cattle demonstrated that the inverse can be computed in almost linear time by recursion on any subset of ∼10,000 individuals. The purpose of this study is to present a theory of why such a recursion works and its implications for other populations. Assume that, because of a small effective population size, the additive information in a genotyped population has a small dimensionality, even with a very large number of SNP markers. That dimensionality is visible as a limited number of effective SNP effects, independent chromosome segments, or the rank of the genomic relationship matrix. Decompose a population arbitrarily into core and noncore individuals, with the number of core individuals equal to that dimensionality. Then, breeding values of noncore individuals can be derived by recursions on breeding values of core individuals, with coefficients of the recursion computed from the genomic relationship matrix. A resulting algorithm for the inversion called “algorithm for proven and young” (APY) has a linear computing and memory cost for noncore animals. Noninfinitesimal genetic architecture can be accommodated through a trait-specific genomic relationship matrix, possibly derived from Bayesian regressions. For populations with small effective population size, the inverse of the genomic relationship matrix can be computed inexpensively for a very large number of genotyped individuals.
genomic relationship matrix; genomic selection; inversion; single-step GBLUP; recursion
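The core/noncore recursion in the abstract above can be sketched in plain Python. The abstract gives no formula, so this sketch assumes the block form commonly written for the APY inverse: the core block is inverted directly, and each noncore individual adds a rank-one update with its own scalar residual variance. The helper names mat_inverse and apy_inverse are hypothetical, and matrices are lists of lists for illustration only.

```python
def mat_inverse(a):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def apy_inverse(G, core, noncore):
    """Sparse approximate inverse of G (assumed APY block form).

    core and noncore are lists of row/column indices into G.
    """
    n, c = len(G), len(core)
    gcc_inv = mat_inverse([[G[i][j] for j in core] for i in core])
    inv = [[0.0] * n for _ in range(n)]
    for a, i in enumerate(core):
        for b, j in enumerate(core):
            inv[i][j] = gcc_inv[a][b]
    for k in noncore:
        g_ck = [G[i][k] for i in core]
        # Recursion coefficients of individual k on the core individuals.
        p = [sum(gcc_inv[a][b] * g_ck[b] for b in range(c)) for a in range(c)]
        # Residual variance of individual k after conditioning on the core.
        m = G[k][k] - sum(g_ck[a] * p[a] for a in range(c))
        inv[k][k] = 1.0 / m
        for a, i in enumerate(core):
            inv[i][k] = inv[k][i] = -p[a] / m
            for b, j in enumerate(core):
                inv[i][j] += p[a] * p[b] / m
    return inv
```

Each noncore individual costs a fixed amount of work that depends only on the number of core individuals, which is where the linear cost for noncore animals comes from. Noncore-by-noncore off-diagonal entries stay zero, so the result is sparse.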
High-throughput screens allow us to understand how transcription factors trigger developmental processes, including cell specification. A major challenge is identification of their binding sites because feedback loops and homeostatic interactions may mask the direct impact of those factors in transcriptome analyses. Moreover, this approach dissects the downstream signaling cascades and facilitates identification of conserved transcriptional programs. Here we show the results and the validation of a DNA adenine methyltransferase identification (DamID) genome-wide screen that identifies the direct targets of Glide/Gcm, a potent transcription factor that controls glia, hemocyte, and tendon cell differentiation in Drosophila. The screen identifies many genes that had not been previously associated with Glide/Gcm and highlights three major signaling pathways interacting with Glide/Gcm: Notch, Hedgehog, and JAK/STAT, which all involve feedback loops. Furthermore, the screen identifies effector molecules that are necessary for cell-cell interactions during late developmental processes and/or in ontogeny. Typically, immunoglobulin (Ig) domain–containing proteins control cell adhesion and axonal navigation. This shows that early and transiently expressed fate determinants not only control other transcription factors that, in turn, implement a specific developmental program but also directly affect late developmental events and cell function. Finally, while the mammalian genome contains two orthologous Gcm genes, their function has been demonstrated in vertebrate-specific tissues, placenta, and parathyroid glands, begging questions on the evolutionary conservation of the Gcm cascade in higher organisms. Here we provide the first evidence for the conservation of Gcm direct targets in humans. In sum, this work uncovers novel aspects of cell specification and sets the basis for further understanding of the role of conserved Gcm gene regulatory cascades.
glide/gcm; Drosophila; DamID; mGcm; screen
Germ cell specification as sperm or oocyte is an ancient cell fate decision, but its molecular regulation is poorly understood. In Caenorhabditis elegans, the FOG-1 and FOG-3 proteins behave genetically as terminal regulators of sperm fate specification. Both are homologous to well-established RNA regulators, suggesting that FOG-1 and FOG-3 specify the sperm fate post-transcriptionally. We predicted that FOG-1 and FOG-3, as terminal regulators of the sperm fate, might regulate a battery of gamete-specific differentiation genes. Here we test that prediction by exploring on a genomic scale the messenger RNAs (mRNAs) associated with FOG-1 and FOG-3. Immunoprecipitation of the proteins and their associated mRNAs from spermatogenic germlines identifies 81 FOG-1 and 722 FOG-3 putative targets. Importantly, almost all FOG-1 targets are also FOG-3 targets, and these common targets are strongly biased for oogenic mRNAs. The discovery of common target mRNAs suggested that FOG-1 and FOG-3 work together. Consistent with that idea, we find that FOG-1 and FOG-3 proteins co-immunoprecipitate from both intact nematodes and mammalian tissue culture cells and that they colocalize in germ cells. Taking our results together, we propose a model in which FOG-1 and FOG-3 work in a complex to repress oogenic transcripts and thereby promote the sperm fate.
sperm/oocyte decision; FOG-1; FOG-3; CPEB; Tob
Longitudinal allele frequency data are becoming increasingly prevalent. Such samples permit statistical inference of the population genetics parameters that influence the fate of mutant variants. To infer these parameters by maximum likelihood, the mutant frequency is often assumed to evolve according to the Wright–Fisher model. For computational reasons, this discrete model is commonly approximated by a diffusion process that requires the assumption that the forces of natural selection and mutation are weak. This assumption is not always appropriate. For example, mutations that impart drug resistance in pathogens may evolve under strong selective pressure. Here, we present an alternative approximation to the mutant-frequency distribution that does not make any assumptions about the magnitude of selection or mutation and is much more computationally efficient than the standard diffusion approximation. Simulation studies are used to compare the performance of our method to that of the Wright–Fisher and Gaussian diffusion approximations. For large populations, our method is found to provide a much better approximation to the mutant-frequency distribution when selection is strong, while all three methods perform comparably when selection is weak. Importantly, maximum-likelihood estimates of the selection coefficient are severely attenuated when selection is strong under the two diffusion models, but not when our method is used. This is further demonstrated with an application to mutant-frequency data from an experimental study of bacteriophage evolution. We therefore recommend our method for estimating the selection coefficient when the effective population size is too large to utilize the discrete Wright–Fisher model.
population genetics; allele frequencies; selection coefficient; Wright–Fisher model; diffusion approximation
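For reference, one generation of the discrete Wright–Fisher model mentioned in the abstract above can be written out exactly. This sketch is not the approximation proposed there. It assumes a haploid population of fixed size n, mutant relative fitness 1 + s, and symmetric mutation at rate u, and the function name wf_transition is hypothetical.

```python
from math import comb


def wf_transition(n, i, s=0.0, u=0.0):
    """Distribution of the mutant count in the next generation.

    n: population size, i: current mutant count,
    s: selection coefficient, u: symmetric mutation rate.
    Returns a list of n + 1 probabilities for counts 0..n.
    """
    p = i / n
    # Selection reweights the mutant frequency by its relative fitness.
    p_sel = (1.0 + s) * p / ((1.0 + s) * p + (1.0 - p))
    # Symmetric mutation moves alleles in both directions at rate u.
    p_mut = p_sel * (1.0 - u) + (1.0 - p_sel) * u
    # Binomial resampling of n offspring.
    return [comb(n, k) * p_mut ** k * (1.0 - p_mut) ** (n - k)
            for k in range(n + 1)]
```

A full transition matrix needs one such row per starting count, so its size grows with the square of n. That is the practical reason the exact model becomes unusable for large populations.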
Comprehensive whole-genome structural variation detection is challenging with current approaches. With diploid cells as DNA source and the presence of numerous repetitive elements, short-read DNA sequencing cannot be used to detect structural variation efficiently. In this report, we show that genome mapping with long, fluorescently labeled DNA molecules imaged on nanochannel arrays can be used for whole-genome structural variation detection without sequencing. While whole-genome haplotyping is not achieved, local phasing (across >150-kb regions) is routine, as molecules from the parental chromosomes are examined separately. In one experiment, we generated genome maps from a trio from the 1000 Genomes Project, compared the maps against that derived from the reference human genome, and identified structural variations that are >5 kb in size. We find that these individuals have many more structural variants than those published, including some with the potential of disrupting gene function or regulation.
biotechnology; genome mapping; structural variation detection
In closed mitotic systems such as Saccharomyces cerevisiae, the nuclear envelope (NE) does not break down during mitosis, so microtubule-organizing centers such as the spindle-pole body (SPB) must be inserted into the NE to facilitate bipolar spindle formation and chromosome segregation. The mechanism of SPB insertion has been linked to NE insertion of nuclear pore complexes (NPCs) through a series of genetic and physical interactions between NPCs and SPB components. To identify new genes involved in SPB duplication and NE insertion, we carried out genome-wide screens for suppressors of deletion alleles of SPB components, including Mps3 and Mps2. In addition to the nucleoporins POM152 and POM34, we found that elimination of SEC66/SEC71/KAR7 suppressed lethality of cells lacking MPS2 or MPS3. Sec66 is a nonessential subunit of the Sec63 complex that functions together with the Sec61 complex in import of proteins into the endoplasmic reticulum (ER). Cells lacking Sec66 have reduced levels of Pom152 protein but not Pom34 or Ndc1, a shared component of the NPC and SPB. The fact that Sec66 but not other subunits of the ER translocon bypass deletion mutants in SPB genes suggests a specific role for Sec66 in the control of Pom152 levels. Based on the observation that sec66∆ does not affect the distribution of Ndc1 on the NE or Ndc1 binding to the SPB, we propose that Sec66-mediated regulation of Pom152 plays an NPC-independent role in the control of SPB duplication.
Mps3; Sec66/Sec71/Kar7; spindle-pole body; Pom152; Nbp1
MinION is a memory stick–sized nanopore-based sequencer designed primarily for single-molecule sequencing of long DNA fragments (>6 kb). We developed a library preparation and data-analysis method to enable rapid real-time sequencing of short DNA fragments (<1 kb) that resulted in the sequencing of 500 reads in 3 min and 40,000–80,000 reads in 2–4 hr at a rate of 30 nt/sec. We then demonstrated the clinical applicability of this approach by performing successful aneuploidy detection in prenatal and miscarriage samples with sequencing in <4 hr. This method broadens the application of nanopore-based single-molecule sequencing and makes it a promising and versatile tool for rapid clinical and research applications.
bioinformatics; cytogenetics; nanopore; single-molecule sequencing; rapid aneuploidy detection
We describe an adaptation of φC31 integrase–mediated targeted cassette exchange for use in Drosophila cell lines. Single copies of an attP-bounded docking platform carrying a GFP-expression marker, with or without insulator elements flanking the attP sites, were inserted by P-element transformation into the Kc167 and Sg4 cell lines; each of the resulting docking-site lines carries a single mapped copy of one of the docking platforms. Vectors for targeted substitution contain a cloning cassette flanked by attB sites. Targeted substitution occurs by integrase-mediated substitution between the attP sites (integrated) and the attB sites (vector). We describe procedures for isolating cells carrying the substitutions and for eliminating the products of secondary off-target events. We demonstrate the technology by integrating a cassette containing a Cu²⁺-inducible mCherry marker, and we report the expression properties of those lines. When compared with clonal lines made by traditional transformation methods, which lead to the illegitimate insertion of tandem arrays, targeted insertion lines give more uniform expression, lower basal expression, and higher induction ratios. Targeted substitution, though intricate, affords results that should greatly improve comparative expression assays, a major emphasis of cell-based studies.
Drosophila; cell lines; phiC31 integrase; targeted insertion
The site frequency spectrum (SFS) and other genetic summary statistics are at the heart of many population genetic studies. Previous studies have shown that human populations have undergone a recent epoch of fast growth in effective population size. These studies assumed that growth is exponential, and the ensuing models leave an excess of extremely rare variants unexplained. This suggests that human populations might have experienced recent growth at a speed faster than exponential. Recent studies have introduced a generalized growth model in which the growth speed can be faster or slower than exponential. However, only simulation approaches were available for obtaining summary statistics under such generalized models. In this study, we provide expressions to accurately and efficiently evaluate the SFS and other summary statistics under generalized models, which we further implement in publicly available software. Investigating the power to infer deviation of growth from exponential, we observed that adequate sample sizes facilitate accurate inference; e.g., a sample of 3000 individuals with the amount of data expected from exome sequencing allows observing and accurately estimating growth with speed deviating by ≥10% from that of exponential. Applying our inference framework to data from the NHLBI Exome Sequencing Project, we found that a model with a generalized growth epoch fits the observed SFS significantly better than the equivalent model with exponential growth (P-value = 3.85 × 10^−6). The estimated growth speed significantly deviates from exponential (P-value ≪ 10^−12), with the best-fit estimate being a growth speed 12% faster than exponential.
coalescent; generalized models; population growth; human demographic history; software
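To make the summary statistic in the abstract above concrete, this sketch tabulates an unfolded SFS from per-site derived allele counts. It also gives the standard neutral constant-size expectation, theta / i, as a baseline. It does not implement the generalized growth expressions from the abstract, and the function names are hypothetical.

```python
from collections import Counter


def site_frequency_spectrum(derived_counts, n):
    """Unfolded SFS for a sample of n chromosomes.

    derived_counts: derived allele count at each site.
    Entry i - 1 of the result is the number of sites where the derived
    allele appears i times, for i = 1..n-1. Sites with count 0 or n are
    not segregating and are ignored.
    """
    tally = Counter(derived_counts)
    return [tally.get(i, 0) for i in range(1, n)]


def expected_neutral_sfs(theta, n):
    """Expected SFS under neutrality and constant size: theta / i."""
    return [theta / i for i in range(1, n)]
```

For example, site_frequency_spectrum([1, 1, 2, 3, 1, 5, 0, 6], 6) returns [3, 1, 1, 0, 1]. Population growth is typically seen as more singletons and other rare variants than the theta / i baseline predicts.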
Speciation is fundamental to the process of generating the huge diversity of life on Earth. However, we have yet to gain a clear understanding of its molecular-genetic basis. Here, we examine a computational model of reproductive isolation that explicitly incorporates a map from genotype to phenotype based on the biophysics of protein–DNA binding. In particular, we model the binding of a protein transcription factor to a DNA binding site and how their coevolution, proceeding independently in two allopatric lineages on a stabilizing fitness landscape, leads to incompatibilities. Complementing our previous coarse-grained theoretical results, our simulations give a new prediction for the monomorphic regime of evolution: smaller populations should develop incompatibilities more quickly. This arises because (1) smaller populations have a greater initial drift load, as there are more sequences that bind poorly than well, so fewer substitutions are needed to reach incompatible regions of phenotype space, and (2) divergence is slower when the population size is larger than the inverse of discrete differences in fitness. Further, we find that longer sequences develop incompatibilities more quickly at small population sizes, but more slowly at large population sizes. The biophysical model thus represents a robust mechanism of rapid reproductive isolation for small populations and large sequences that does not require peak shifts or positive selection. Finally, we show that the growth of Dobzhansky–Muller incompatibilities (DMIs) with time is quadratic for small populations, agreeing with Orr’s model, but non-power law for large populations, with a form consistent with our previous theoretical results.
speciation; Dobzhansky–Muller incompatibilities; sequence entropy; population size; coevolution; genotype–phenotype map
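The quadratic growth attributed to Orr's model in the abstract above follows from counting pairs of substitutions. The sketch assumes that each pair among k substitutions is incompatible with a small probability p, so the expected number of pairwise DMIs is p * k * (k - 1) / 2. The function name and parameter values are illustrative assumptions, and the biophysical simulations from the abstract are not reproduced.

```python
def expected_pairwise_dmis(k, p):
    """Expected number of pairwise incompatibilities among k substitutions,
    assuming each pair is incompatible independently with probability p."""
    return p * k * (k - 1) / 2.0
```

If substitutions accumulate at a roughly constant rate, k is proportional to time. Doubling the divergence time then roughly quadruples the expected number of incompatibilities.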
The fission yeast Schizosaccharomyces pombe is an important model organism for the study of eukaryotic molecular and cellular biology. Studies of S. pombe, together with studies of its distant cousin, Saccharomyces cerevisiae, have led to the discovery of genes involved in fundamental mechanisms of transcription, translation, DNA replication, cell cycle control, and signal transduction, to name but a few processes. However, since the divergence of the two species approximately 350 million years ago, S. pombe appears to have evolved less rapidly than S. cerevisiae so that it retains more characteristics of the common ancient yeast ancestor, causing it to share more features with metazoan cells. This Primer introduces S. pombe by describing the yeast itself, providing a brief description of the origins of fission yeast research, and illustrating some genetic and bioinformatics tools used to study protein function in fission yeast. In addition, a section on some key differences between S. pombe and S. cerevisiae is included for readers with some familiarity with budding yeast research but who may have an interest in developing research projects using S. pombe.
education; fission yeast; forward genetics; genetic screen; Model Organism Database; model system; Primer; Schizosaccharomyces pombe; Saccharomyces cerevisiae
Immunological memory, which protects organisms from re-infection, is a hallmark of the mammalian adaptive immune system and the underlying principle of vaccination. In early life, however, mice and other mammals are deficient at generating memory CD8+ T cells, which protect organisms from intracellular pathogens. The molecular basis that differentiates adult and neonatal CD8+ T cells is unknown. MicroRNAs (miRNAs) are both developmentally regulated and required for normal adult CD8+ T cell functions. We used next-generation sequencing to identify mouse miRNAs that are differentially regulated in adult and neonatal CD8+ T cells, which may contribute to the impaired development of neonatal memory cells. The miRNA profiles of adult and neonatal cells were surprisingly similar during infection; however, we observed large differences prior to infection. In particular, miR-29 and miR-130 have significant differential expression between adult and neonatal cells before infection. Importantly, using RNA-Seq, we detected reciprocal changes in expression of messenger RNA targets for both miR-29 and miR-130. Moreover, targets that we validated include Eomes and Tbx21, key genes that regulate the formation of memory CD8+ T cells. Notably, age-dependent changes in miR-29 and miR-130 are conserved in human CD8+ T cells, further suggesting that these developmental differences are biologically relevant. Together, these results demonstrate that miR-29 and miR-130 are likely important regulators of memory CD8+ T cell formation and suggest that neonatal cells are committed to a short-lived effector cell fate prior to infection.
microRNA regulation; adaptive immunity; development