Imprinted genes have been extensively documented in eutherian mammals and found to exhibit significant interspecific variation in the suites of genes that are imprinted and in their regulation between tissues and developmental stages. Much less is known about imprinted loci in metatherian (marsupial) mammals, wherein studies have been limited to a small number of genes previously known to be imprinted in eutherians. We describe the first ab initio search for imprinted marsupial genes, in fibroblasts from the opossum, Monodelphis domestica, based on a genome-wide ChIP-seq strategy to identify promoters that are simultaneously marked by mutually exclusive, transcriptionally opposing histone modifications.
We identified a novel imprinted gene (Meis1) and two additional monoallelically expressed genes, one of which (Cstb) showed allele-specific, but non-imprinted expression. Imprinted vs. allele-specific expression could not be resolved for the third monoallelically expressed gene (Rpl17). Transcriptionally opposing histone modifications H3K4me3, H3K9Ac, and H3K9me3 were found at the promoters of all three genes, but differential DNA methylation was not detected at CpG islands at any of these promoters.
In generating the first genome-wide histone modification profiles for a marsupial, we identified the first gene that is imprinted in a marsupial but not in eutherian mammals. This outcome demonstrates the practicality of an ab initio discovery strategy and implicates histone modification, but not differential DNA methylation, as a conserved mechanism for marking imprinted genes in all therian mammals. Our findings suggest that marsupials use multiple epigenetic mechanisms for imprinting and support the concept that lineage-specific selective forces can produce sets of imprinted genes that differ between metatherian and eutherian lines.
Genomic imprinting; Monoallelic expression; Histone modification; ChIP-seq; Monodelphis domestica; Marsupial
The initial site of smoking-induced lung disease is the small airway epithelium, which is difficult and time-consuming to sample by fiberoptic bronchoscopy. We developed a rapid, office-based procedure to obtain trachea epithelium without conscious sedation from healthy nonsmokers (n=26) and healthy smokers (n=19, 27 ± 15 pack-yr). Gene expression differences (fold-change >1.5, p<0.01, Benjamini-Hochberg correction) were assessed with Affymetrix microarrays. 1,057 probe sets were differentially expressed in healthy smokers vs nonsmokers, representing >500 genes. Trachea gene expression was compared to an independent group of small airway epithelial samples (n=23 healthy nonsmokers, n=19 healthy smokers, 25 ± 12 pack-yr). The trachea epithelium is more sensitive to smoking, responding with 3-fold more differentially-expressed genes than small airway epithelium. The trachea transcriptome paralleled the small airway epithelium, with 156 of 167 (93%) genes that are significantly up- and down-regulated by smoking in the small airway epithelium showing similar direction and magnitude of response to smoking in the trachea. Trachea epithelium can be obtained without conscious sedation, representing a less invasive surrogate “canary” for smoking-induced changes in the small airway epithelium. This should prove useful in epidemiologic studies correlating gene expression with clinical outcome in assessing smoking-induced lung disease.
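The selection rule quoted above (fold-change >1.5 with a Benjamini-Hochberg corrected p<0.01) can be illustrated with a minimal sketch. The function names and the toy inputs are hypothetical and are not taken from the study; fold changes are assumed to be positive ratios, so that 0.5 counts as a 2-fold decrease.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_probe_sets(fold_changes, pvalues, min_fold=1.5, alpha=0.01):
    """Indices passing both the fold-change and the adjusted-p filters.

    Assumption: fold_changes are positive ratios (smoker / nonsmoker).
    """
    adjusted = benjamini_hochberg(pvalues)
    selected = []
    for i, (fc, q) in enumerate(zip(fold_changes, adjusted)):
        # Treat up- and down-regulation symmetrically.
        magnitude = fc if fc >= 1.0 else 1.0 / fc
        if magnitude > min_fold and q < alpha:
            selected.append(i)
    return selected
```

With raw p-values of 0.001 and 0.008 among five tests, only the first survives the 0.01 threshold after adjustment, which shows how the correction tightens the raw cutoff.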
The innate immune system in insects consists of a conserved core signaling network and rapidly diversifying effector and recognition components, often containing a high proportion of taxonomically-restricted genes. In the absence of functional annotation, genes encoding immune system proteins can thus be difficult to identify, as homology-based approaches generally cannot detect lineage-specific genes. Here, we use RNA-seq to compare the uninfected and infection-induced transcriptome in the parasitoid wasp Nasonia vitripennis to identify genes regulated by infection. We identify 183 genes significantly up-regulated by infection and 61 genes significantly down-regulated by infection. We also produce a new homology-based immune catalog in N. vitripennis, and show that most infection-induced genes cannot be assigned an immune function from homology alone, suggesting the potential for substantial novel immune components in less well-studied systems. Finally, we show that a high proportion of these novel induced genes are taxonomically restricted, highlighting the rapid evolution of immune gene content. The combination of functional annotation using RNA-seq and homology-based annotation provides a robust method to characterize the innate immune response across a wide variety of insects, and reveals significant novel features of the Nasonia immune response.
Screening of small molecule libraries offers the potential to identify compounds that inhibit specific biological processes and, ultimately, to identify macromolecules that are important players in such processes. To date, however, most screens of small molecule libraries have focused on identification of compounds that inhibit known proteins or particular steps in a given process, and have emphasized automated primary screens. Here we have used “low tech” in vivo primary screens to identify small molecules that inhibit both cytokinesis and single cell wound repair, two complex cellular processes that possess many common features. The “diversity set”, an ordered array of 1990 compounds available from the National Cancer Institute, was screened in parallel to identify compounds that inhibit cytokinesis in D. excentricus (sand dollar) embryos and single cell wound repair in X. laevis (frog) oocytes. Two small molecules were thus identified: Sph1 and Sph2. Sph1 reduces Rho activation in wound repair and suppresses formation of the spindle midzone during cytokinesis. Sph2 also reduces Rho activation in wound repair and may inhibit cytokinesis by blocking membrane fusion. The results identify two small molecules of interest for analysis of wound repair and cytokinesis, reveal that these processes are more similar than often realized and reveal the potential power of low tech screens of small molecule libraries for analysis of complex cellular processes.
This study addresses the question of how purifying selection operates during recent rapid population growth such as has been experienced by human populations. This is not a straightforward problem because the human population is not at equilibrium: population genetics predicts that, on the one hand, the efficacy of natural selection increases as population size increases, eliminating ever more weakly deleterious variants; on the other hand, a larger number of deleterious mutations will be introduced into the population and will be more likely to increase in their number of copies as the population grows. To understand how patterns of human genetic variation have been shaped by the interaction of natural selection and population growth, we examined the trajectories of mutations with varying selection coefficients, using computer simulations. We observed that while population growth dramatically increases the number of deleterious segregating sites in the population, it only mildly increases the number carried by each individual. Our simulations also show an increased efficacy of natural selection, reflected in a higher fraction of deleterious mutations eliminated at each generation and a more efficient elimination of the most deleterious ones. As a consequence, while each individual carries a larger number of deleterious alleles than expected in the absence of growth, the average selection coefficient of each segregating allele is less deleterious. Combined, our results suggest that the genetic risk of complex diseases in growing populations might be distributed across a larger number of more weakly deleterious rare variants.
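As a rough illustration of what following a mutation's trajectory under selection and growth can look like, the sketch below simulates one new mutation in a haploid Wright-Fisher population whose size grows exponentially. It is not the simulation framework used in the study: the haploid model, the parameter names and the default values are all assumptions chosen to keep the example small.

```python
import random


def simulate_trajectory(s, n0=100, growth_rate=0.05, generations=60, seed=0):
    """Track the copy number of one new mutation in a growing population.

    s is the selection coefficient (negative means deleterious) and the
    population size follows n0 * (1 + growth_rate) ** t. Assumes n0 > 1.
    Returns the list of copy counts, starting from a single copy.
    """
    rng = random.Random(seed)
    copies, n = 1, n0
    counts = [copies]
    for t in range(1, generations + 1):
        freq = copies / n
        # Selection step: reweight the mutant class by fitness 1 + s.
        expected = freq * (1 + s) / (freq * (1 + s) + (1 - freq))
        # Growth step: next generation's population size.
        n = int(round(n0 * (1 + growth_rate) ** t))
        # Drift step: binomial sampling of n offspring.
        copies = sum(1 for _ in range(n) if rng.random() < expected)
        counts.append(copies)
        if copies == 0 or copies == n:
            break  # mutation lost or fixed
    return counts
```

Running many replicates over a grid of `s` values, with and without growth, is one way to compare how often weakly and strongly deleterious mutations are lost per generation.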
purifying selection; exponential growth; deleterious mutations; demographic history; human
The parasitoid wasp Nasonia vitripennis is an emerging genetic model for functional analysis of DNA methylation. Here, we characterize genome-wide methylation at a base-pair resolution, and compare these results to gene expression across five developmental stages and to methylation patterns reported in other insects. An accurate assessment of DNA methylation across the genome is accomplished using bisulfite sequencing of adult females from a highly inbred line. One-third of genes show extensive methylation over the gene body, yet methylated DNA is not found in non-coding regions and rarely in transposons. Methylated genes occur in small clusters across the genome. Methylation demarcates exon-intron boundaries, with elevated levels over exons, primarily in the 5′ regions of genes. It is also elevated near the sites of translational initiation and termination, with reduced levels in 5′ and 3′ UTRs. Methylated genes have higher median expression levels and lower expression variation across development stages than non-methylated genes. There is no difference in frequency of differential splicing between methylated and non-methylated genes, and as yet no established role for methylation in regulating alternative splicing in Nasonia. Phylogenetic comparisons indicate that many genes maintain methylation status across long evolutionary time scales. Nasonia methylated genes are more likely to be conserved in insects, but even those that are not conserved show broader expression across development than comparable non-methylated genes. Finally, examination of duplicated genes shows that those paralogs that have lost methylation in the Nasonia lineage following gene duplication evolve more rapidly, show decreased median expression levels, and increased specialization in expression across development. Methylation of Nasonia genes signals constitutive transcription across developmental stages, whereas non-methylated genes show more dynamic developmental expression patterns. 
We speculate that loss of methylation may result in increased developmental specialization in evolution and acquisition of methylation may lead to broader constitutive expression.
Insects use methylation to modulate genome function in a different manner from vertebrates. Here, we quantified the global methylation profile in a parasitic wasp species, Nasonia vitripennis, a model with some advantages over ant and honeybee for functional and genetic analyses of methylation, such as short generation time, inbred lines, and inter-fertile species. Using a highly inbred line permitted us to precisely characterize DNA methylation, which is compared to gene expression variation across developmental stages, and contrasted to other insect species. DNA methylation is almost exclusively on the 5′-most 1 kbp coding exons, and ∼1/3 of protein coding genes are methylated. Methylated genes tend to occur in small clusters in the genome. Unlike many organisms, Nasonia leaves nearly all transposable element genes non-methylated. Methylated genes exhibit more uniform expression across developmental stages for both moderately and highly expressed genes, suggesting that DNA methylation is marking the genes for constitutive expression. Among pairs of differentially methylated duplicated genes, the paralogs that lose DNA methylation after duplication in the Nasonia lineage show lower expression and greater specialization of expression. Finally, by comparative analysis, we show that methylated genes are more conserved at three different time scales during evolution.
Characterizing and understanding the complex spectrum of lipids in higher organisms lags far behind our analysis of genome and transcriptome sequences. Here we generate and evaluate comprehensive lipid profiles (>200 lipids) of 92 inbred lines from five different Drosophila melanogaster populations. We find that the majority of lipid species are highly heritable, and even lipids with odd-chain fatty acids, which cannot be generated by the fly itself, have high heritabilities. Abundance of the endosymbiont Wolbachia, a potential provider of odd-chained lipids, was positively correlated with this group of lipids. Additionally, we show that despite years of laboratory rearing on the same medium, the lipid profiles of the five geographic populations are sufficiently distinct for population discrimination. Our data predict a strikingly different membrane fluidity for flies from the Netherlands, which is supported by their increased ethanol tolerance. We find that 18% of lipids show strong concentration differences between males and females. Through an analysis of the correlation structure of the lipid classes, we find modules of co-regulated lipids and begin to associate these with metabolic constraints. Our data provide a foundation for developing associations of variation in lipid composition with variation in other metabolic attributes, with genome-wide variation, and with metrics of health and overall reproductive fitness.
Molecular evolutionary theory predicts that the ratio of autosomal to X-linked adaptive substitution (KA/Kx) is primarily determined by the average dominance coefficient of beneficial mutations. Although this theory has profoundly influenced analysis and interpretation of comparative genomic data, its predictions are based upon two unverified assumptions about the genetic basis of adaptation. The theory assumes that 1) the rate of adaptively driven molecular evolution is limited by the availability of beneficial mutations, and 2) the scaling of evolutionary parameters between the X and the autosomes (e.g., the beneficial mutation rate, and the fitness effect distribution of beneficial alleles, per X-linked versus autosomal locus) is constant across molecular evolutionary timescales. Here, we show that the genetic architecture underlying bouts of adaptive substitution can influence both assumptions, and consequently, the theoretical relationship between KA/Kx and mean dominance. Quantitative predictions of prior theory apply when 1) many genomically dispersed genes potentially contribute beneficial substitutions during individual steps of adaptive walks, and 2) the population beneficial mutation rate, summed across the set of potentially contributing genes, is sufficiently small to ensure that adaptive substitutions are drawn from new mutations rather than standing genetic variation. Current research into the genetic basis of adaptation suggests that both assumptions are plausibly violated. We find that the qualitative positive relationship between mean dominance and KA/Kx is relatively robust to the specific conditions underlying adaptive substitution, yet the quantitative relationship between dominance and KA/Kx is quite flexible and context dependent. This flexibility may partially account for the puzzlingly variable X versus autosome substitution patterns reported in the empirical evolutionary genomics literature. 
The new theory unites the previously separate analysis of adaptation using new mutations versus standing genetic variation and makes several useful predictions about the interaction between genetic architecture, evolutionary genetic constraints, and effective population size in determining the ratio of adaptive substitution between autosomal and X-linked genes.
dominance; epistasis; genetics of adaptation; soft sweeps; molecular evolution
Variation in reproductive success has long been thought to be mediated in part by genes encoding seminal proteins. Here we explore the effect on male reproductive phenotypes of polymorphisms on the X chromosome, which is depauperate in genes encoding seminal proteins. Using 57 X chromosome substitution lines, sperm competition was tested both when the males from the wild-extracted line were the first to mate (“defense” crosses), followed by a tester male, and when extracted-line males were the second to mate, after a tester male (“offense” crosses). We scored the proportion of progeny sired by each male, the fecundity, the remating rate and refractoriness to remating, and tested the significance of variation among lines. Eleven candidate genes were chosen based on previous studies, and portions of these genes were sequenced in all 57 lines. A total of 131 polymorphisms were tested for associations with the reproductive phenotypes using linear models. Nine polymorphisms in 4 genes were found to show significant associations (at a 5% FDR). Overall, it appears that the X chromosomes harbor abundant variation in sperm competition, especially considering the paucity of seminal protein genes. This suggests that much of the male reproductive variation lies outside of genes that encode seminal proteins.
Not all cigarette smokers develop chronic obstructive pulmonary disease (COPD), and discovering susceptibility factors is an important research priority. The oxidative burden of smoking may overwhelm antioxidant defenses, and vulnerabilities may exist as a result of sequence variants in genes encoding antioxidant enzymes. This study explored the association between genetic variation in a network of antioxidant enzymes and lung phenotypes. Linear models evaluated single locus marker associations in 2,387 European and African American participants in the Health, Aging, and Body Composition (Health ABC) Study. After correcting for multiple comparisons, 15 statistically significant associations were identified, all of which were for SNP by smoking interactions. The most statistically significant findings were in genes encoding members of the isocitrate dehydrogenase gene family (IDH3A, IDH3B, IDH2). For rs6107100 (IDH3B) the variant genotype was associated with a difference of 6% in the FEV1/FVC ratio in African American current smokers, but the SNP had little or no association with FEV1/FVC in former and never smokers (nominal p_interaction = 5 × 10^-6). A variant in a peroxiredoxin gene (rs9787810, PRDX5) was associated with lower %predicted FEV1 and a lower ratio in European American current smokers, with little or no association in other smoking groups (nominal p_interaction = 0.0001 and 0.0003, respectively). The studied genes have not been reported in previous candidate gene association studies, and thus the findings suggest novel mechanisms and targets for future research, and provide evidence for a contribution of sequence variation in genes encoding antioxidant enzymes to susceptibility in smokers.
Antioxidant enzymes; Lung function
X inactivation, the transcriptional silencing of one X chromosome copy per female somatic cell, is universal among therian mammals, yet the choice of which X to silence exhibits considerable variation among species. X inactivation strategies can range from strict paternally inherited X inactivation (PXI), which renders females haploid for all maternally inherited alleles, to unbiased random X inactivation (RXI), which equalizes expression of maternally and paternally inherited alleles in each female tissue. However, the underlying evolutionary processes that might account for this observed diversity of X inactivation strategies remain unclear. We present a theoretical population genetic analysis of X inactivation evolution and specifically consider how conditions of dominance, linkage, recombination, and sex-differential selection each influence evolutionary trajectories of X inactivation. The results indicate that a single, critical interaction between allelic dominance and sex-differential selection can select for a broad and continuous range of X inactivation strategies, including unequal rates of inactivation between maternally and paternally inherited X chromosomes. RXI is favored over complete PXI as long as alleles deleterious to female fitness are sufficiently recessive, and the criteria for RXI evolution are considerably more restrictive when fitness variation is sexually antagonistic (i.e., alleles deleterious to females are beneficial to males) relative to variation that is deleterious to both sexes. Evolutionary transitions from PXI to RXI also generally increase mean relative female fitness at the expense of decreased male fitness. These results provide a theoretical framework for predicting and interpreting the evolution of chromosome-wide expression of X-linked genes and lead to several useful predictions that could motivate future studies of allele-specific gene expression variation.
With the exception of its most primitive members, mammal species practice X inactivation, where one copy of each X chromosome pair is silenced in each cell of the female body. The particular copy of the X that is silenced nevertheless shows considerable variability among species, and the evolutionary causes for this variability remain unclear. Here, we show that X inactivation strategies are likely to evolve in response to the sex-differential fitness properties of X-linked genetic variation. Genetic variation with similar effects on male and female fitness will generally favor the evolution of random X inactivation, potentially including preferential inactivation of the maternally inherited X chromosome. Variation with opposing fitness effects in each sex (“sexually antagonistic” variation, which includes mutations that both decrease female fitness and enhance male fitness) selects for preferential or complete inactivation of the paternally inherited X. Paternally biased X inactivation patterns appear to be common in nature, which suggests that sexually antagonistic genetic variation might be an important factor underlying the evolution of X inactivation. The theory provides a conceptual framework for understanding the evolution of X inactivation strategies and generates several novel predictions that may soon be tested with modern genome sequencing technologies.
Human populations have experienced recent explosive growth, expanding by at least three orders of magnitude over the past 400 generations. This departure from equilibrium skews patterns of genetic variation and distorts basic principles of population genetics. We characterized the empirical signatures of explosive growth on the site frequency spectrum and found that the discrepancy in rare variant abundance across demographic modeling studies is mostly due to differences in sample size. Rapid recent growth increases the load of rare variants and is likely to play a role in the individual genetic burden of complex disease risk. Hence, the extreme recent human population growth needs to be taken into consideration in studying the genetics of complex diseases and traits.
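The site frequency spectrum mentioned above is simply a tally of how many variable sites carry each number of derived-allele copies in the sample. The sketch below is a minimal illustration on a toy 0/1 genotype matrix; the function names and data layout are hypothetical and do not reflect the study's pipeline.

```python
def site_frequency_spectrum(sites):
    """Count sites by derived-allele copy number.

    sites: list of sites, each a list of 0/1 derived-allele indicators,
    one entry per sampled chromosome (all sites must have equal length).
    Returns a list sfs where sfs[k] is the number of sites with k copies.
    """
    n = len(sites[0])
    sfs = [0] * (n + 1)
    for site in sites:
        sfs[sum(site)] += 1
    return sfs


def singleton_fraction(sfs):
    """Fraction of segregating sites (0 < k < n) that are singletons."""
    segregating = sum(sfs[1:-1])
    return sfs[1] / segregating if segregating else 0.0
```

Because rare classes such as singletons are only observable when enough chromosomes are sampled, the same population can yield visibly different spectra at different sample sizes.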
Various methods have been developed for identifying gene–gene interactions in genome-wide association studies (GWAS). However, most methods focus on individual markers as the testing unit, and the large number of such tests drastically erodes statistical power. In this study, we propose novel interaction tests of quantitative traits that are gene-based and that confer advantage in both statistical power and biological interpretation. The framework of gene-based gene–gene interaction (GGG) tests combines marker-based interaction tests between all pairs of markers in two genes to produce a gene-level test for interaction between the two. The tests are based on an analytical formula we derive for the correlation between marker-based interaction tests due to linkage disequilibrium. We propose four GGG tests that extend the following P value combining methods: minimum P value, extended Simes procedure, truncated tail strength, and truncated P value product. Extensive simulations point to correct type I error rates of all tests and show that the two truncated tests are more powerful than the other tests in cases of markers involved in the underlying interaction not being directly genotyped and in cases of multiple underlying interactions. We applied our tests to pairs of genes that exhibit a protein–protein interaction to test for gene-level interactions underlying lipid levels using genotype data from the Atherosclerosis Risk in Communities study. We identified five novel interactions that are not evident from marker-based interaction testing and successfully replicated one of these interactions, between SMAD3 and NEDD9, in an independent sample from the Multi-Ethnic Study of Atherosclerosis. We conclude that our GGG tests show improved power to identify gene-level interactions in existing, as well as emerging, association studies.
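To give a sense of how marker-pair P values can be collapsed into one gene-level quantity, the sketch below implements two textbook building blocks: the basic Simes combined P value and the log of the truncated P value product statistic. These are the plain versions, not the GGG extensions proposed here; in particular they omit the correction for correlation among marker-based tests due to linkage disequilibrium. The function names and the truncation threshold `tau` are assumptions for illustration.

```python
import math


def simes_pvalue(pvalues):
    """Basic Simes combined P value: min over ranks of m * p_(i) / i."""
    m = len(pvalues)
    ordered = sorted(pvalues)
    return min(m * p / rank for rank, p in enumerate(ordered, start=1))


def truncated_log_product(pvalues, tau=0.05):
    """Log of the product of P values at or below tau.

    Returns 0.0 when no P value qualifies (empty product equals 1).
    Assumes every P value is strictly positive. Turning this statistic
    into a P value needs its null distribution, which is not shown here.
    """
    return sum(math.log(p) for p in pvalues if p <= tau)
```

For marker-pair P values of 0.01, 0.04, 0.03 and 0.5, the Simes value is 0.04, and the truncated product uses only the three P values at or below 0.05.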
Epistasis is likely to play a significant role in complex diseases or traits and is one of the many possible explanations for “missing heritability.” However, epistatic interactions have been difficult to detect in genome-wide association studies (GWAS) due to the limited power caused by the multiple-testing correction from the large number of tests conducted. Gene-based gene–gene interaction (GGG) tests might hold the key to relaxing the multiple-testing correction burden and increasing the power for identifying epistatic interactions in GWAS. Here, we developed GGG tests of quantitative traits by extending four P value combining methods and evaluated their type I error rates and power using extensive simulations. All four GGG tests are more powerful than a principal component-based test. We also applied our GGG tests to data from the Atherosclerosis Risk in Communities study and found five gene-level interactions associated with the levels of total cholesterol and high-density lipoprotein cholesterol (HDL-C). One interaction between SMAD3 and NEDD9 on HDL-C was further replicated in an independent sample from the Multi-Ethnic Study of Atherosclerosis.
Adverse early care is associated with attention regulatory problems, but not all so exposed develop attention problems. In a sample of 612 youth (girls=432, M=11.82 yrs, SD=1.5) adopted from institutions (e.g., orphanages) in 25 countries, we examined whether the Val66Met polymorphism of the BDNF gene moderates attention problems associated with the duration of institutional care. Parent-reported attention problem symptoms were collected using the MacArthur Health and Behavior Questionnaire. DNA was genotyped for the BDNF Val66Met (rs6265) SNP. Among youth from SE Asia, the predominant genotype was Val/Met, while among youth from Russia/Europe and Caribbean/South America the predominant genotype was Val/Val. For analysis, youth were grouped as carrying Val/Val or AnyMet alleles. Being female, being from SE Asia, and being younger when adopted were associated with fewer attention regulatory problem symptoms. Youth carrying at least one copy of the Met allele were more sensitive to the duration of deprivation, yielding an interaction that followed a differential susceptibility pattern. Thus, youth with Val/Met or Met/Met genotypes exhibited fewer symptoms than Val/Val genotypes when adoption was very early and more symptoms when adoption occurred later in development. Similar patterns were observed when SE Asian youth and youth from other parts of the world were analyzed separately.
Single cells and multicellular tissues rapidly heal wounds. These processes are considered distinct, but one mode of healing—Rho GTPase-dependent formation and closure of a purse string of actin filaments (F-actin) and myosin-2 around wounds—occurs in single cells (1,2) and in epithelia (3-10). Here we show that wounding of one cell in Xenopus embryos elicits Rho GTPase activation around the wound and at the nearest cell-cell junctions in the neighbor cells. F-actin and myosin-2 accumulate at the junctions as well as around the wound itself, and as the resultant actomyosin array closes over the wound site, junctional F-actin and myosin-2 become mechanically integrated with the actin and myosin-2 around the wound, forming a hybrid purse string. When cells are ablated rather than wounded, Rho GTPase activation and F-actin accumulation occur at cell-cell junctions surrounding the ablated cell, and the purse string closes the hole in the epithelium. Elevation of intracellular free calcium, an essential upstream signal for the single cell wound response (2,11), also occurs at the cell-cell contacts and in neighbor cells. Thus, the single and multicellular purse string wound responses represent points on a signaling and mechanical continuum that are integrated by cell-cell junctions.
Genetic variation among females is likely to influence the outcome of both pre- and post-copulatory sexual selection in Drosophila melanogaster. Here we use association testing to survey natural variation in 10 candidate female genes for their effects on female reproduction. Females from 91 chromosome 2 substitution lines were scored for phenotypes affecting pre- and post-copulatory sexual selection such as mating and remating rate, propensity to use sperm from the second male to mate, and measures of fertility. There were significant genetic contributions to phenotypic variation for all the traits measured. Resequencing of the 10 candidate genes in the 91 lines yielded 68 nonsynonymous polymorphisms, which were tested for associations with the measured phenotypes. Twelve significant associations (markerwise P < 0.01) were identified. Polymorphisms in the putative serine protease homolog CG9897 and the putative odorant binding protein CG11797 associated with female propensity to remate and met an experimentwise significance of P < 0.05. Several other associations, including those impacting both fertility and female remating rate, suggest that sperm storage might be an important factor mitigating female influence on sexual selection.
Sperm competition; association testing; female mating; sexual selection; genotype-phenotype
Total cholesterol, low-density lipoprotein cholesterol, triglyceride, and high-density lipoprotein cholesterol (HDL-C) levels are among the most important risk factors for coronary artery disease. We tested for gene–gene interactions affecting the level of these four lipids based on prior knowledge of established genome-wide association study (GWAS) hits, protein–protein interactions, and pathway information. Using genotype data from 9,713 European Americans from the Atherosclerosis Risk in Communities (ARIC) study, we identified an interaction between HMGCR and a locus near LIPC in their effect on HDL-C levels (Bonferroni corrected Pc = 0.002). Using an adaptive locus-based validation procedure, we successfully validated this gene–gene interaction in the European American cohorts from the Framingham Heart Study (Pc = 0.002) and the Multi-Ethnic Study of Atherosclerosis (MESA; Pc = 0.006). The interaction between these two loci is also significant in the African American sample from ARIC (Pc = 0.004) and in the Hispanic American sample from MESA (Pc = 0.04). Both HMGCR and LIPC are involved in the metabolism of lipids, and genome-wide association studies have previously identified LIPC as associated with levels of HDL-C. However, the effect on HDL-C of the novel gene–gene interaction reported here is twice as pronounced as that predicted by the sum of the marginal effects of the two loci. In conclusion, based on a knowledge-driven analysis of epistasis, together with a new locus-based validation method, we successfully identified and validated an interaction affecting a complex trait in multi-ethnic populations.
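The comparison between a joint effect and the sum of marginal effects can be made concrete with an interaction contrast on a 2 × 2 table of group means. The sketch below is a simplified illustration only: it assumes each locus is coded as binary carrier status (0 or 1), whereas the study's actual genotype coding and regression model are not described in this abstract, and the function name is hypothetical.

```python
from collections import defaultdict


def interaction_contrast(records):
    """Departure of the double-carrier mean from the additive prediction.

    records: iterable of (carrier_a, carrier_b, trait) with carriers coded
    0 or 1 (an assumed simplification). All four genotype cells must be
    present. A result of 0 means the two loci combine additively.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for a, b, y in records:
        sums[(a, b)] += y
        counts[(a, b)] += 1
    mean = {cell: sums[cell] / counts[cell] for cell in counts}
    # Additive expectation: baseline plus the two single-locus shifts.
    additive = mean[(1, 0)] + mean[(0, 1)] - mean[(0, 0)]
    return mean[(1, 1)] - additive
```

This contrast equals the coefficient on the product term in a linear model with both main effects and their interaction fitted to the same binary coding.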
Genome-wide association studies (GWAS) have identified many loci associated with complex human traits or diseases. However, the fraction of heritable variation explained by these loci is often relatively low. Gene–gene interactions might play a significant role in complex traits or diseases and are one of the many possible factors contributing to the missing heritability. However, to date only a few interactions have been found and validated in GWAS due to the limited power caused by the need for multiple-testing correction for the very large number of tests conducted. Here, we used three types of prior knowledge, known GWAS hits, protein–protein interactions, and pathway information, to guide our search for gene–gene interactions affecting four lipid levels. We identified an interaction between HMGCR and a locus near LIPC in their effect on high-density lipoprotein cholesterol (HDL-C) and another pair of loci that interact in their effect on low-density lipoprotein cholesterol (LDL-C). We validated the interaction on HDL-C in a number of independent multiple-ethnic populations, while the interaction underlying LDL-C did not validate. The prior knowledge-driven searching approach and a locus-based validation procedure show the potential for dissecting and validating gene–gene interactions in current and future GWAS.
The female-specific W chromosomes and male-specific Y chromosomes have proven difficult to assemble with whole-genome shotgun methods, creating a demand for new approaches to identify sequence contigs specific to these sex chromosomes. Here, we develop and apply a novel method for identifying sequences that are W-specific.
Using the Illumina Genome Analyzer, we generated sequence reads from a male domestic chicken (ZZ) and mapped them to the existing female (ZW) genome sequence. This method allowed us to identify segments of the female genome that are underrepresented in the male genome and are therefore likely to be female specific. We developed a Bayesian classifier to automate the calling of W-linked contigs and successfully identified more than 60 novel W-specific sequences.
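The logic of calling W-linked contigs from male read depth can be sketched as a two-class Bayesian comparison. This is only an illustration under stated assumptions, not the classifier developed in the study: the Poisson read-count model, the prior `prior_w`, the `background` rate of spurious male reads, and the function names are all hypothetical.

```python
import math


def log_poisson(k, lam):
    """Log probability of observing k reads under a Poisson mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def posterior_w_linked(male_reads, contig_len, reads_per_bp,
                       prior_w=0.01, background=0.02):
    """Posterior probability that a contig is W-linked.

    Assumptions (illustrative only): male reads on a contig shared with
    males follow a Poisson with mean reads_per_bp * contig_len, whereas a
    W-linked contig attracts only a small background fraction of that
    depth through mismapping.
    """
    lam_shared = reads_per_bp * contig_len
    lam_w = background * lam_shared
    log_w = math.log(prior_w) + log_poisson(male_reads, lam_w)
    log_shared = math.log(1.0 - prior_w) + log_poisson(male_reads, lam_shared)
    # Normalise in log space to avoid underflow.
    top = max(log_w, log_shared)
    w = math.exp(log_w - top)
    s = math.exp(log_shared - top)
    return w / (w + s)


# A 1 kb contig with no male reads versus one with typical male depth.
print(posterior_w_linked(0, 1000, 0.1))
print(posterior_w_linked(100, 1000, 0.1))
```

Contigs with a posterior near 1 would be candidates for W linkage; in practice such calls would still need experimental confirmation.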
Our classifier can be applied to improve heterogametic whole-genome shotgun assemblies of the W or Y chromosome of any organism. This study greatly improves our knowledge of the W chromosome and will enhance future studies of avian sex determination and sex chromosome evolution.
Sex chromosomes; Next-generation sequencing
The ratio of genetic diversity on chromosome X to that on the autosomes is sensitive to both natural selection and demography. Based on whole-genome sequences of 69 females, we report that while this ratio increases with genetic distance from genes across populations, it is lower in Europeans than in West Africans independent of proximity to genes. This relative reduction is most parsimoniously explained by differences in demographic history without the need to invoke natural selection.
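For orientation, the X-to-autosome diversity ratio can be computed from aligned sequences as a ratio of nucleotide diversities. The sketch below is a simplified illustration, not the study's pipeline: it uses plain average pairwise differences per site, ignores normalisation by divergence, and the function names and toy sequences are assumptions. Under neutrality with equal numbers of breeding males and females the expected ratio is 3/4, because there are three X chromosomes for every four autosomes.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site (pi)."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)


def x_to_autosome_ratio(x_seqs, autosome_seqs):
    """Ratio of X-linked to autosomal nucleotide diversity."""
    return nucleotide_diversity(x_seqs) / nucleotide_diversity(autosome_seqs)


# Toy alignments (hypothetical data).
print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))
print(x_to_autosome_ratio(["AAAA", "AAAT"], ["AAAA", "AATT"]))
```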
Measurement of metabolic and physiological parameters in replicated crosses of Drosophila melanogaster inbred lines reveals that environmental and genetic perturbations uncover substantially different networks of metabolic regulation.
We collected extensive data on enzyme activities and physiological parameters from replicated crosses of D. melanogaster inbred lines. We implemented a multivariate hierarchical Bayesian model to separately assess genetic and environmental covariation among system components and infer metabolic regulatory networks. Networks revealed by both environmental and genetic perturbations are similar among populations and between sexes. Environmental and genetic networks differ substantially, suggesting that environmental changes and mutations would have different systemic effects even when their primary targets are the same.
Progress in systems biology depends on accurate descriptions of biological networks. Connections in a regulatory network are identified as correlations of gene expression across a set of environmental or genetic perturbations. To use this information to predict system behavior, we must test how the nature of perturbations affects topologies of networks they reveal. To probe this question, we focused on metabolism of Drosophila melanogaster. Our source of perturbations is a set of crosses among 92 wild-derived lines from five populations, replicated in a manner permitting separate assessment of the effects of genetic variation and environmental fluctuation. We directly assayed activities of enzymes and levels of metabolites. Using a multivariate Bayesian model, we estimated covariance among metabolic parameters and built fine-grained probabilistic models of network topology. The environmental and genetic co-regulation networks are substantially the same among five populations. However, genetic and environmental perturbations reveal qualitative differences in metabolic regulation, suggesting that environmental shifts, such as diet modifications, produce different systemic effects than genetic changes, even if the primary targets are the same.
Bayesian model; G matrix; hierarchical model; metabolic network
Sex-biased genes (genes that are differentially expressed between males and females) are nonrandomly distributed across animal genomes, with sex chromosomes and autosomes often carrying markedly different concentrations of male- and female-biased genes. These linkage patterns are often gene- and lineage-dependent, differing between functional genetic categories and between species. While sex-specific selection is often hypothesized to shape the evolution of sex-linked and autosomal gene content, population genetics theory has yet to account for many of the gene- and lineage-specific idiosyncrasies emerging from the empirical literature. With the goal of improving the connection between evolutionary theory and a rapidly growing body of genome-wide empirical studies, we extend previous population genetics theory of sex-specific selection by developing and analyzing a biologically informed model that incorporates sex linkage, pleiotropy, recombination, and epistasis, factors that are likely to vary between genes and between species. Our results demonstrate that sex-specific selection and sex-specific recombination rates can generate, and are compatible with, the gene- and species-specific linkage patterns reported in the genomics literature. The theory suggests that sexual selection may strongly influence the architectures of animal genomes, as well as the chromosomal distribution of fixed substitutions underlying sexually dimorphic traits.
sex chromosomes; sexual antagonism; antagonistic pleiotropy; epistasis; sex-biased genes
Sequence variants in genes functioning in folate-mediated one-carbon metabolism are hypothesized to lead to changes in levels of homocysteine and DNA methylation, which, in turn, are associated with risk of cardiovascular disease.
A total of 330 SNPs in 52 genes were studied in relation to plasma homocysteine and global genomic DNA methylation. SNPs were selected based on functional effects and gene coverage, and assays were completed on the Illumina GoldenGate platform. Age-, smoking-, and nutrient-adjusted genotype–phenotype associations were estimated in regression models.
Using a nominal P ≤ 0.005 threshold for statistical significance, 20 SNPs were associated with plasma homocysteine, 8 with Alu methylation, and 1 with LINE-1 methylation. Using a more stringent false discovery rate threshold, SNPs in FTCD, SLC19A1, and SLC19A3 genes remained associated with plasma homocysteine. Gene by vitamin B-6 interactions were identified for both Alu and LINE-1 methylation, and epistatic interactions with the MTHFR rs1801133 SNP were identified for the plasma homocysteine phenotype. Pleiotropy involving the MTHFD1L and SARDH genes for both plasma homocysteine and Alu methylation phenotypes was identified.
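The difference between a nominal P threshold and a false discovery rate threshold can be illustrated with the Benjamini-Hochberg step-up adjustment. The abstract does not state which FDR procedure was used, so the choice of Benjamini-Hochberg here is an assumption, as are the function name and the toy P values.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P values for four SNP-phenotype tests.
toy_p = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(toy_p))
```

A SNP is then retained when its adjusted value falls below the chosen FDR level, which is typically stricter than applying a fixed nominal cut-off to every test.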
No single gene was associated with all three phenotypes, and the set of the most statistically significant SNPs predictive of homocysteine or Alu or LINE-1 methylation was unique to each phenotype. Genetic variation in folate-mediated one-carbon metabolism, other than the well-known effects of the MTHFR c.665C>T (known as c.677 C>T, rs1801133, p.Ala222Val), is predictive of cardiovascular disease biomarkers.
Duplications play a significant role in both extremes of the phenotypic spectrum of newly arising mutations: they can have severe deleterious effects (e.g. duplications underlie a variety of diseases) but can also be highly advantageous. The phenotypic potential of newly arisen duplications has stimulated wide interest in both the mutational and selective processes shaping these variants in the genome. Here we take advantage of the Drosophila simulans–Drosophila melanogaster genetic system to further our understanding of both processes. Regarding mutational processes, the study of two closely related species allows investigation of the potential existence of shared duplication hotspots, and the similarities and differences between the two genomes can be used to dissect its underlying causes. Regarding selection, the difference in the effective population size between the two species can be leveraged to ask questions about the strength of selection acting on different classes of duplications. In this study, we conducted a survey of duplication polymorphisms in 14 different lines of D. simulans using tiling microarrays and combined it with an analogous survey for the D. melanogaster genome. By integrating the two datasets, we identified duplication hotspots conserved between the two species. However, unlike the duplication hotspots identified in mammalian genomes, Drosophila duplication hotspots are not associated with sequences of high sequence identity capable of mediating non-allelic homologous recombination. Instead, Drosophila duplication hotspots are associated with late-replicating regions of the genome, suggesting a link between DNA replication and duplication rates. We also found evidence supporting a higher effectiveness of selection on duplications in D. simulans than in D. melanogaster. This is also true for duplications segregating at high frequency, where we find evidence in D. simulans that a sizeable fraction of these mutations is being driven to fixation by positive selection.
DNA duplications are important contributors to the phenotypic differences observed between individuals. These mutations can disrupt the normal functioning of genes and so are often associated with disease. But because they can add genetic information they can also lead to evolutionary change. Understanding how selection and non-random mutation processes shape the distribution of duplications throughout the genome is important to elucidate both the medical and evolutionary impacts of these mutations. Here, we examined the roles of selection and mutation in shaping patterns of duplication polymorphisms across the genomes of the fruit fly Drosophila melanogaster and its sister species, D. simulans. We found that selection is pervasive in both genomes but is more efficient in D. simulans than in D. melanogaster. We also found that these two species have shared duplication hotspots, i.e. orthologous regions experiencing high rates of duplication in the two genomes. After excluding the hypothesis that Drosophila duplication hotspots are associated with regions of the genome rich in segmental duplications (as observed for mammalian genomes), we show that they are associated with late-replicating regions of the genome. Our work therefore proposes a link between DNA replication and rates of duplication across the genome.
In order to investigate divergence of immune regulation among Drosophila species, we have engaged in a study of innate immune function in F1 hybrids of Drosophila melanogaster and D. simulans. If pathways have diverged between the species such that incompatibilities have arisen between interacting components of the immune network, we expect the hybrids to display dysregulation of immune genes. We have quantified gene induction in hybrid and parental flies in response to bacterial infection. These results show that although the hybrids do not suffer widespread immune breakdown, they show significantly different regulation of many immune genes relative to the parents. We examine this divergence in terms of additivity and expression differences among genes, observing distinct patterns of dysregulation among functional groups within the pathways of the innate immune system. The functional groups most sensitive to misexpression in the hybrids are the downstream components of the network, indicative of some propagation of dysregulation throughout the immune pathways. Interestingly, this dysregulation does not appear to associate with phenotypic differences in bacterial load after infection in hybrids, possibly highlighting some robustness of function of the innate immune response to perturbations like hybridization.
Drosophila; innate immunity; hybrid dysregulation; misexpression; functional divergence
There has been considerable excitement over the ability to construct linkage maps based only on genome-wide genotype data for single nucleotide polymorphic sites (SNPs) in a population sample. These maps, which are derived from estimates of linkage disequilibrium (LD), rely on population genetics theory to relate the decay of LD to the local rate of recombination, but other population processes also come into play. Here we contrast these LD maps to the classically derived, pedigree-based human recombination maps. The LD maps have a level of resolution greatly exceeding that of the pedigree maps, and at this fine scale, sperm typing allows a means of validation. While at a gross level both the pedigree maps and the sperm typing methods generally agree with LD maps, there are significant local differences between them, and the fact that these maps measure different genetic features should be remembered when using them for other genetic inferences.
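The pairwise LD statistic on which such maps are built can be computed directly from phased haplotypes. The sketch below shows only the basic r² measure for two biallelic sites, not the coalescent-based estimation of recombination rates used to build LD maps; the 0/1 allele coding, the function name and the toy haplotypes are assumptions.

```python
def ld_r_squared(haplotypes):
    """Squared allelic correlation (r^2) between two biallelic sites.

    haplotypes: list of (allele_at_site_a, allele_at_site_b), coded 0/1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


# Complete LD versus linkage equilibrium (hypothetical samples).
print(ld_r_squared([(0, 0), (0, 0), (1, 1), (1, 1)]))
print(ld_r_squared([(0, 0), (0, 1), (1, 0), (1, 1)]))
```

Because r² also responds to drift, selection and demographic history, a map inferred from its decay reflects a population-scaled recombination rate rather than the per-meiosis rate measured in pedigrees or sperm typing.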
linkage disequilibrium; genetic linkage; hotspot; population recombination rate; recombination intensity