Related Articles

1.  Transcriptome Sequencing from Diverse Human Populations Reveals Differentiated Regulatory Architecture 
PLoS Genetics  2014;10(8):e1004549.
Large-scale sequencing efforts have documented extensive genetic variation within the human genome. However, our understanding of the origins, global distribution, and functional consequences of this variation is far from complete. While regulatory variation influencing gene expression has been studied within a handful of populations, the breadth of transcriptome differences across diverse human populations has not been systematically analyzed. To better understand the spectrum of gene expression variation, alternative splicing, and the population genetics of regulatory variation in humans, we have sequenced the genomes, exomes, and transcriptomes of EBV transformed lymphoblastoid cell lines derived from 45 individuals in the Human Genome Diversity Panel (HGDP). The populations sampled span the geographic breadth of human migration history and include Namibian San, Mbuti Pygmies of the Democratic Republic of Congo, Algerian Mozabites, Pathan of Pakistan, Cambodians of East Asia, Yakut of Siberia, and Mayans of Mexico. We discover that approximately 25.0% of the variation in gene expression found amongst individuals can be attributed to population differences. However, we find few genes that are systematically differentially expressed among populations. Of this population-specific variation, 75.5% is due to expression rather than splicing variability, and we find few genes with strong evidence for differential splicing across populations. Allelic expression analyses indicate that previously mapped common regulatory variants identified in eight populations from the International Haplotype Map Phase 3 project have similar effects in our seven sampled HGDP populations, suggesting that the cellular effects of common variants are shared across diverse populations. 
Together, these results provide a resource for studies analyzing functional differences across populations by estimating the degree of shared gene expression, alternative splicing, and regulatory genetics across populations from the broadest points of human migration history yet sampled.
Author Summary
Previous gene expression studies have identified factors influencing population-level variation in gene regulation. However, these efforts have been limited to a small set of well-studied populations. By leveraging the high resolution of RNA sequencing and broad population sampling, we survey the landscape of transcriptome variation across a globally distributed set of seven populations that span a breadth of human genetic variation and major dispersal events. We assess differences in gene expression, transcript structure, and regulatory variation. We find only 44 transcripts that show significant differences in expression, likely as a result of the small sample size, but we find that 25% of the variance in gene expression is due to population differences. This is a larger fraction than previously observed, and it is likely due to the greater breadth of human diversity assayed in this study. We also find that population-specific variance is mostly due to transcription variability rather than the configuration of expressed gene products. Additionally, known common regulatory variants have similar effects across populations including those we study here. These data and results serve as a resource cataloging the wide array of gene expression regulation affecting population variation among diverse groups, improving our understanding of transcriptional diversity.
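As an illustration of what "variance in gene expression due to population differences" can mean for a single gene, the sketch below partitions the total sum of squares into a between-population part and reports its share. This is a plain one-way variance partition chosen for illustration; the abstract does not say which estimator the authors used, and the function name and the population labels are hypothetical.

```python
def population_variance_fraction(expression_by_population):
    """Share of total expression variance explained by population labels.

    expression_by_population maps a (hypothetical) population label to a
    list of expression values for one gene, one value per individual.
    Returns SS_between / SS_total, a number between 0 and 1.
    """
    groups = [g for g in expression_by_population.values() if g]
    values = [x for g in groups for x in g]
    if len(values) < 2:
        raise ValueError("need at least two expression values")

    grand_mean = sum(values) / len(values)
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    if ss_total == 0:
        return 0.0  # no variation at all, so nothing to attribute

    # Between-population sum of squares: group size times the squared
    # distance of each group mean from the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    return ss_between / ss_total


example = {"pop_a": [1.0, 2.0, 3.0], "pop_b": [4.0, 5.0, 6.0]}
fraction = population_variance_fraction(example)
```

Averaging such a fraction over many genes would give a genome-wide summary, although a study with few individuals per population would need a method that accounts for sampling noise.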
doi:10.1371/journal.pgen.1004549
PMCID: PMC4133153  PMID: 25121757
2.  An Integrated Linkage, Chromosome, and Genome Map for the Yellow Fever Mosquito Aedes aegypti 
Background
Aedes aegypti, the yellow fever mosquito, is an efficient vector of arboviruses and a convenient model system for laboratory research. Extensive linkage mapping of morphological and molecular markers localized a number of quantitative trait loci (QTLs) related to the mosquito's ability to transmit various pathogens. However, linking the QTLs to Ae. aegypti chromosomes and genomic sequences has been challenging because of the poor quality of polytene chromosomes and the highly fragmented genome assembly for this species.
Methodology/Principal Findings
Based on the approach developed in our previous study, we constructed idiograms for mitotic chromosomes of Ae. aegypti based on their banding patterns at early metaphase. These idiograms represent the first cytogenetic map developed for mitotic chromosomes of Ae. aegypti. One hundred bacterial artificial chromosome clones carrying major genetic markers were hybridized to the chromosomes using fluorescent in situ hybridization. As a result, QTLs related to the transmission of the filarioid nematode Brugia malayi, the avian malaria parasite Plasmodium gallinaceum, and the dengue virus, as well as sex determination locus and 183 Mbp of genomic sequences were anchored to the exact positions on Ae. aegypti chromosomes. A linear regression analysis demonstrated a good correlation between positions of the markers on the physical and linkage maps. As a result of the recombination rate variation along the chromosomes, 12 QTLs on the linkage map were combined into five major clusters of QTLs on the chromosome map.
Conclusion
This study developed an integrated linkage, chromosome, and genome map—iMap—for the yellow fever mosquito. Our discovery of the localization of multiple QTLs in a few major chromosome clusters suggests a possibility that the transmission of various pathogens is controlled by the same genomic loci. Thus, the iMap will facilitate the identification of genomic determinants of traits responsible for susceptibility or refractoriness of the mosquito to diverse pathogens.
Author Summary
About half of the human population is at risk of dengue infection. Because of the absence of a vaccine or drug treatment, the prevention of this disease largely relies on controlling its major vector mosquito Aedes aegypti. Availability of the complete genome sequence for this mosquito offers the potential to help in the identification of novel disease control strategies. An efficient vector of arboviruses, Ae. aegypti is also a convenient model for laboratory studies. A number of genetic loci related to the remarkable ability of this mosquito to transmit various pathogens were genetically mapped to the three linkage groups corresponding to the three individual chromosomes of the mosquito. However, the exact physical positions of the genetic loci and genomic sequences on the chromosomes were unknown. In this study, we developed maps for mitotic chromosomes of Ae. aegypti and localized 100 clones carrying major genetic markers, which were previously used for mapping genetic loci associated with the pathogens' transmission. Finally, linkage, chromosome, and genome maps of Ae. aegypti were integrated. Anchoring of the genomic sequences associated with genetic markers to the chromosomes of Ae. aegypti will help to identify candidate genes that might be utilized for developing advanced genome-based strategies for vector control.
doi:10.1371/journal.pntd.0002052
PMCID: PMC3573077  PMID: 23459230
3.  A Quantitative Comparison of the Similarity between Genes and Geography in Worldwide Human Populations 
PLoS Genetics  2012;8(8):e1002886.
Multivariate statistical techniques such as principal components analysis (PCA) and multidimensional scaling (MDS) have been widely used to summarize the structure of human genetic variation, often in easily visualized two-dimensional maps. Many recent studies have reported similarity between geographic maps of population locations and MDS or PCA maps of genetic variation inferred from single-nucleotide polymorphisms (SNPs). However, this similarity has been evident primarily in a qualitative sense; and, because different multivariate techniques and marker sets have been used in different studies, it has not been possible to formally compare genetic variation datasets in terms of their levels of similarity with geography. In this study, using genome-wide SNP data from 128 populations worldwide, we perform a systematic analysis to quantitatively evaluate the similarity of genes and geography in different geographic regions. For each of a series of regions, we apply a Procrustes analysis approach to find an optimal transformation that maximizes the similarity between PCA maps of genetic variation and geographic maps of population locations. We consider examples in Europe, Sub-Saharan Africa, Asia, East Asia, and Central/South Asia, as well as in a worldwide sample, finding that significant similarity between genes and geography exists in general at different geographic levels. The similarity is highest in our examples for Asia and, once highly distinctive populations have been removed, Sub-Saharan Africa. Our results provide a quantitative assessment of the geographic structure of human genetic variation worldwide, supporting the view that geography plays a strong role in giving rise to human population structure.
Author Summary
The spatial pattern of human genetic variation provides a basis for investigating the history of human migrations. Statistical techniques such as principal components analysis (PCA) and multidimensional scaling (MDS) have been used to summarize spatial patterns of genetic variation, typically by placing individuals on a two-dimensional map in such a way that pairwise Euclidean distances between individuals on the map approximately reflect corresponding genetic relationships. Although similarity between these statistical maps of genetic variation and the geographic maps of sampling locations is often observed, it has not been assessed systematically across different parts of the world. In this study, we combine genome-wide SNP data from more than 100 populations worldwide to perform a formal comparison between genes and geography in different regions. By examining a worldwide sample and samples from Europe, Sub-Saharan Africa, Asia, East Asia, and Central/South Asia, we find that significant similarity between genes and geography exists in general in different geographic regions and at different geographic levels. Surprisingly, the highest similarity is found in Asia, even though the geographic barrier of the Himalaya Mountains has created a discontinuity on the PCA map of genetic variation.
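The Procrustes comparison described above can be sketched for two-dimensional maps in a few lines. The version below treats each point as a complex number, which gives the best rotation, reflection, and scaling in closed form, and returns a similarity equal to sqrt(1 - D), where D is the minimized sum of squared distances between the two centered, unit-scaled configurations. Whether this matches the exact statistic reported in the paper is an assumption, as is the use of raw coordinates (a real analysis would first project longitude and latitude appropriately). The function name is hypothetical.

```python
import math


def procrustes_similarity(map_a, map_b):
    """Similarity in [0, 1] between two matched sets of 2-D points.

    map_a and map_b are equal-length lists of (x, y) pairs, for example
    geographic coordinates and the first two principal components.
    The score is 1 when one map is a shifted, scaled, rotated, or
    mirrored copy of the other.
    """
    if len(map_a) != len(map_b) or len(map_a) < 2:
        raise ValueError("need two matched point sets of length >= 2")

    za = [complex(x, y) for x, y in map_a]
    zb = [complex(x, y) for x, y in map_b]

    # Remove translation by centering both configurations.
    mean_a = sum(za) / len(za)
    mean_b = sum(zb) / len(zb)
    za = [z - mean_a for z in za]
    zb = [z - mean_b for z in zb]

    norm_a = math.sqrt(sum(abs(z) ** 2 for z in za))
    norm_b = math.sqrt(sum(abs(z) ** 2 for z in zb))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a configuration has no spatial spread")

    # Best fit over rotations, and over rotations of the mirrored map.
    rotation_fit = abs(sum(b.conjugate() * a for a, b in zip(za, zb)))
    reflection_fit = abs(sum(b * a for a, b in zip(za, zb)))
    return max(rotation_fit, reflection_fit) / (norm_a * norm_b)


geo = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0), (3.0, 1.0)]
angle = math.radians(30)
pca = [
    (2.5 * (x * math.cos(angle) - y * math.sin(angle)) + 4.0,
     2.5 * (x * math.sin(angle) + y * math.cos(angle)) - 1.0)
    for x, y in geo
]
score = procrustes_similarity(geo, pca)  # close to 1 for this rotated copy
```

Significance could then be assessed by recomputing the score after randomly permuting which population is paired with which location, a common choice that is not confirmed by the abstract.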
doi:10.1371/journal.pgen.1002886
PMCID: PMC3426559  PMID: 22927824
4.  Power to Detect Risk Alleles Using Genome-Wide Tag SNP Panels 
PLoS Genetics  2007;3(10):e170.
Advances in high-throughput genotyping and the International HapMap Project have enabled association studies at the whole-genome level. We have constructed whole-genome genotyping panels of over 550,000 (HumanHap550) and 650,000 (HumanHap650Y) SNP loci by choosing tag SNPs from all populations genotyped by the International HapMap Project. These panels also contain additional SNP content in regions that have historically been overrepresented in diseases, such as nonsynonymous sites, the MHC region, copy number variant regions and mitochondrial DNA. We estimate that the tag SNP loci in these panels cover the majority of all common variation in the genome as measured by coverage of both all common HapMap SNPs and an independent set of SNPs derived from complete resequencing of genes obtained from SeattleSNPs. We also estimate that, given a sample size of 1,000 cases and 1,000 controls, these panels have the power to detect single disease loci of moderate risk (λ ∼ 1.8–2.0). Relative risks as low as λ ∼ 1.1–1.3 can be detected using 10,000 cases and 10,000 controls depending on the sample population and disease model. If multiple loci are involved, the power increases significantly to detect at least one locus such that relative risks 20%–35% lower can be detected with 80% power if between two and four independent loci are involved. Although our SNP selection was based on HapMap data, which is a subset of all common SNPs, these panels effectively capture the majority of all common variation and provide high power to detect risk alleles that are not represented in the HapMap data.
Author Summary
Advances in high-throughput genotyping technology and the International HapMap Project have enabled genetic association studies at the whole-genome level. Our paper describes two genome-wide SNP panels that contain tag SNPs derived from the International HapMap Project. Tag SNPs are proxies for groups of highly correlated SNPs. Information can be captured for the entire group of correlated SNPs by genotyping only one representative SNP, the tag SNP. These whole-genome SNP panels also contain additional content thought to be overrepresented in disease, such as amino acid–changing nonsynonymous SNPs and mitochondrial SNPs. We show that these panels cover the genome with very high efficiency as measured by coverage of all HapMap SNPs and a set of SNPs derived from completely resequenced genes from the Seattle SNPs database. We also show that these panels have high power to detect disease risk alleles for both HapMap and non-HapMap SNPs. In complex disease where multiple risk alleles are believed to be involved, we show that the ability to detect at least one risk allele with the tag SNP panels is also high.
doi:10.1371/journal.pgen.0030170
PMCID: PMC2000969  PMID: 17922574
5.  Aging and Environmental Exposures Alter Tissue-Specific DNA Methylation Dependent upon CpG Island Context 
PLoS Genetics  2009;5(8):e1000602.
Epigenetic control of gene transcription is critical for normal human development and cellular differentiation. While alterations of epigenetic marks such as DNA methylation have been linked to cancers and many other human diseases, interindividual epigenetic variations in normal tissues due to aging, environmental factors, or innate susceptibility are poorly characterized. The plasticity, tissue-specific nature, and variability of gene expression are related to epigenomic states that vary across individuals. Thus, population-based investigations are needed to further our understanding of the fundamental dynamics of normal individual epigenomes. We analyzed 217 non-pathologic human tissues from 10 anatomic sites at 1,413 autosomal CpG loci associated with 773 genes to investigate tissue-specific differences in DNA methylation and to discern how aging and exposures contribute to normal variation in methylation. Methylation profile classes derived from unsupervised modeling were significantly associated with age (P<0.0001) and were significant predictors of tissue origin (P<0.0001). In solid tissues (n = 119) we found striking, highly significant CpG island–dependent correlations between age and methylation; loci in CpG islands gained methylation with age, loci not in CpG islands lost methylation with age (P<0.001), and this pattern was consistent across tissues and in an analysis of blood-derived DNA. Our data clearly demonstrate age- and exposure-related differences in tissue-specific methylation and significant age-associated methylation patterns which are CpG island context-dependent. This work provides novel insight into the role of aging and the environment in susceptibility to diseases such as cancer and critically informs the field of epigenomics by providing evidence of epigenetic dysregulation by age-related methylation alterations. 
Collectively we reveal key issues to consider both in the construction of reference and disease-related epigenomes and in the interpretation of potentially pathologically important alterations.
Author Summary
The causes and extent of tissue-specific interindividual variation in human epigenomes are underappreciated and, hence, poorly characterized. We surveyed over 200 carefully annotated human tissue samples from ten anatomic sites at 1,413 CpGs for methylation alterations to appraise the nature of phenotypically, and hence potentially clinically, important epigenomic alterations. Within tissue types, across individuals, we found variation in methylation that was significantly related to aging and environmental exposures such as tobacco smoking. Individual variation in age- and exposure-related methylation may significantly contribute to increased susceptibility to several diseases. As the NIH–funded HapMap project is critically contributing to annotating the human reference genome defining normal genetic variability, our work raises key issues to consider in the construction of reference epigenomes. It is well recognized that understanding genetic variation is essential to understanding disease. Our work, and the known interplay of epigenetics and genetics, makes it equally clear that a more complete characterization of epigenetic variation and its sources must be accomplished to reach the goal of a complete understanding of disease. Additional research is absolutely necessary to define the mechanisms controlling epigenomic variation. We have begun to lay the foundations for essential normal tissue controls for comparison to diseased tissue, which will allow the identification of the most crucial disease-related alterations and provide more robust targets for novel treatments.
doi:10.1371/journal.pgen.1000602
PMCID: PMC2718614  PMID: 19680444
6.  An Evaluation of the Performance of Tag SNPs Derived from HapMap in a Caucasian Population 
PLoS Genetics  2006;2(3):e27.
The Haplotype Map (HapMap) project recently generated genotype data for more than 1 million single-nucleotide polymorphisms (SNPs) in four population samples. The main application of the data is in the selection of tag single-nucleotide polymorphisms (tSNPs) to use in association studies. The usefulness of this selection process needs to be verified in populations outside those used for the HapMap project. In addition, it is not known how well the data represent the general population, as only 90–120 chromosomes were used for each population and since the genotyped SNPs were selected so as to have high frequencies. In this study, we analyzed more than 1,000 individuals from Estonia. The population of this northern European country has been influenced by many different waves of migrations from Europe and Russia. We genotyped 1,536 randomly selected SNPs from two 500-kbp ENCODE regions on Chromosome 2. We observed that the tSNPs selected from the CEPH (Centre d'Etude du Polymorphisme Humain) from Utah (CEU) HapMap samples (derived from US residents with northern and western European ancestry) captured most of the variation in the Estonia sample. (Between 90% and 95% of the SNPs with a minor allele frequency of more than 5% have an r² of at least 0.8 with one of the CEU tSNPs.) Using the reverse approach, tags selected from the Estonia sample could almost equally well describe the CEU sample. Finally, we observed that the sample size, the allelic frequency, and the SNP density in the dataset used to select the tags each have important effects on the tagging performance. Overall, our study supports the use of HapMap data in other Caucasian populations, but the SNP density and the bias towards high-frequency SNPs have to be taken into account when designing association studies.
Synopsis
The recent completion of the Haplotype Map (HapMap) project of the human genome provides considerable information on the patterns of variation in the genome of four populations. One of the applications is a description of a set of tags that act as proxies for many other surrounding variants. This will greatly help researchers in their quest to find complex disease genes by reducing the number of genetic variants to test in association studies. To evaluate its usefulness, several aspects of the map, including its transferability to other populations, still needed to be verified experimentally. Using genomic regions where variants had been thoroughly documented in Caucasian samples from Estonia, the researchers found that the transferability of tags is extremely good. The researchers also found that variants with low frequency in the general population (i.e., less than 5%) could not be accurately captured with tags, and that the regional density of variants in the HapMap project had a major impact on the performance of the tags. This research indicates that the HapMap project will be useful, but that careful consideration of hypotheses and study design will be essential for the success of association studies.
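The tagging criterion quoted in the abstract (a SNP counts as captured when its r² with at least one tag is 0.8 or higher) can be sketched as follows. The sketch assumes phased haplotypes coded as 0/1 alleles, which sidesteps the haplotype-frequency estimation that unphased genotype data would require; the function names and the toy SNP labels are hypothetical.

```python
def r_squared(hap_a, hap_b):
    """Squared allelic correlation between two biallelic SNPs.

    hap_a and hap_b are equal-length lists of 0/1 alleles, one entry per
    phased chromosome. Returns 0.0 if either SNP is monomorphic.
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype lists must be non-empty and matched")
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


def tag_coverage(snps, tag_names, threshold=0.8):
    """Fraction of SNPs with r^2 >= threshold to at least one tag SNP.

    snps maps a SNP label to its 0/1 haplotype list; tag_names lists the
    labels chosen as tags. Tags count as covering themselves.
    """
    if not snps or not tag_names:
        raise ValueError("need at least one SNP and one tag")
    covered = sum(
        1
        for hap in snps.values()
        if max(r_squared(hap, snps[tag]) for tag in tag_names) >= threshold
    )
    return covered / len(snps)


toy = {
    "snp1": [0, 0, 1, 1, 0, 1, 0, 1],
    "snp2": [0, 0, 1, 1, 0, 1, 0, 1],  # identical to snp1
    "snp3": [1, 1, 0, 0, 1, 0, 1, 0],  # alleles flipped relative to snp1
    "snp4": [0, 1, 0, 1, 0, 1, 0, 1],  # only weakly correlated with snp1
}
coverage = tag_coverage(toy, ["snp1"])
```

Transferability of the kind the study tests would correspond to choosing `tag_names` in one sample and calling `tag_coverage` on haplotypes from another.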
doi:10.1371/journal.pgen.0020027
PMCID: PMC1391920  PMID: 16532062
7.  Haplotype Block Structure Is Conserved across Mammals 
PLoS Genetics  2006;2(7):e121.
Genetic variation in genomes is organized in haplotype blocks, and species-specific block structure is defined by differential contribution of population history effects in combination with mutation and recombination events. Haplotype maps characterize the common patterns of linkage disequilibrium in populations and have important applications in the design and interpretation of genetic experiments. Although evolutionary processes are known to drive the selection of individual polymorphisms, their effect on haplotype block structure dynamics has not been shown. Here, we present a high-resolution haplotype map for a 5-megabase genomic region in the rat and compare it with the orthologous human and mouse segments. Although the size and fine structure of haplotype blocks are species dependent, there is a significant interspecies overlap in structure and a tendency for blocks to encompass complete genes. Extending these findings to the complete human genome using haplotype map phase I data reveals that linkage disequilibrium values are significantly higher for equally spaced positions in genic regions, including promoters, as compared to intergenic regions, indicating that a selective mechanism exists to maintain combinations of alleles within potentially interacting coding and regulatory regions. Although this characteristic may complicate the identification of causal polymorphisms underlying phenotypic traits, conservation of haplotype structure may be employed for the identification and characterization of functionally important genomic regions.
Synopsis
Differences at the DNA level are the major contributor to the phenotypic diversity between individuals in a population. The most common type of this genetic variation is the single nucleotide polymorphism (SNP). Although the majority of SNPs do not have a functional effect, others may affect chromosome organization, gene expression, or protein function. SNPs and their individual states (alleles) are not randomly distributed throughout the genome and within a population. Recombination and mutation events, in combination with selection processes and population history, have resulted in common block-like structures in genomes. These structures are characterized by a common combination of SNP alleles, a so-called haplotype. Selection for specific haplotypes within a population is primarily driven by the advantageous effect of an individual polymorphism in the haplotype block.
By comparing the orthologous rat, mouse, and human haplotype structure of a 5-megabase region from rat Chromosome 1, the authors now show that haplotype block structure is conserved across mammals, most prominently in genic regions, suggesting the existence of an evolutionary selection process that drives the conservation of long-range allele combinations. Indeed, genome-wide gene-centric analysis of human HapMap data revealed that equally spaced polymorphic positions in genic regions and their upstream regulatory regions are genetically more tightly linked than in non-genic regions.
These findings may complicate the identification of causal polymorphisms underlying phenotypic traits, because in regions where haplotype structure is conserved, not a single polymorphism, but rather combinations of tightly linked polymorphisms could contribute to the phenotypic difference. On the other hand, conservation of haplotype structure may be employed for the identification and characterization of functionally important genomic regions.
doi:10.1371/journal.pgen.0020121
PMCID: PMC1523234  PMID: 16895449
8.  Genome-Wide Associations of Gene Expression Variation in Humans 
PLoS Genetics  2005;1(6):e78.
The exploration of quantitative variation in human populations has become one of the major priorities for medical genetics. The successful identification of variants that contribute to complex traits is highly dependent on reliable assays and genetic maps. We have performed a genome-wide quantitative trait analysis of 630 genes in 60 unrelated Utah residents with ancestry from Northern and Western Europe using the publicly available phase I data of the International HapMap project. The genes are located in regions of the human genome with elevated functional annotation and disease interest including the ENCODE regions spanning 1% of the genome, Chromosome 21 and Chromosome 20q12–13.2. We apply three different methods of multiple test correction, including Bonferroni, false discovery rate, and permutations. For the 374 expressed genes, we find many regions with statistically significant association of single nucleotide polymorphisms (SNPs) with expression variation in lymphoblastoid cell lines after correcting for multiple tests. Based on our analyses, the signal proximal (cis-) to the genes of interest is more abundant and more stable than distal and trans across statistical methodologies. Our results suggest that regulatory polymorphism is widespread in the human genome and show that the 5-kb (phase I) HapMap has sufficient density to enable linkage disequilibrium mapping in humans. Such studies will significantly enhance our ability to annotate the non-coding part of the genome and interpret functional variation. In addition, we demonstrate that the HapMap cell lines themselves may serve as a useful resource for quantitative measurements at the cellular level.
Synopsis
With the finished reference sequence of the human genome now available, focus has shifted towards trying to identify all of the functional elements within the sequence. Although quite a lot of progress has been made towards identifying some classes of genomic elements, in particular protein-coding sequences, the characterization of regulatory elements remains a challenge. The authors describe the genetic mapping of regions of the genome that have functional effects on quantitative levels of gene expression. Gene expression of 630 genes was measured in cell lines derived from 60 unrelated human individuals, the same Utah residents of Northern and Western European ancestry that have been genetically well-characterized by The International HapMap Project. This paper reports significant variation among individuals with respect to levels of gene expression, and demonstrates that this quantitative trait has a genetic basis. For some genes, the genetic signal was localized to specific locations in the human genome sequence; in most cases the genomic region associated with expression variation was physically close to the gene whose expression it regulated. The authors demonstrate the feasibility of performing whole-genome association scans to map quantitative traits, and highlight statistical issues that are increasingly important for whole-genome disease mapping studies.
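Two of the multiple-test corrections named in the abstract, Bonferroni and false discovery rate control, can be sketched in a few lines. The false discovery rate procedure shown is the Benjamini-Hochberg step-up method, which is an assumption, since the abstract does not say which variant was applied; the permutation approach is omitted because it needs the underlying genotype and expression data. Function names are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag tests whose p-value passes the Bonferroni threshold alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Flag tests rejected by the Benjamini-Hochberg step-up procedure.

    Sort the p-values, find the largest rank k with p_(k) <= alpha * k / m,
    and reject the k smallest p-values. Results are returned in the
    original input order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for index in order[:cutoff_rank]:
        rejected[index] = True
    return rejected


p_values = [0.041, 0.001, 0.216, 0.008, 0.039, 0.06, 0.074, 0.205, 0.212, 0.042]
strict = bonferroni(p_values)           # controls the family-wise error rate
lenient = benjamini_hochberg(p_values)  # controls the false discovery rate
```

On the same p-values, Bonferroni never rejects more tests than Benjamini-Hochberg, which is one reason the choice of correction affects how many expression associations are reported.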
doi:10.1371/journal.pgen.0010078
PMCID: PMC1315281  PMID: 16362079
9.  Genetic Crossovers Are Predicted Accurately by the Computed Human Recombination Map 
PLoS Genetics  2010;6(1):e1000831.
Hotspots of meiotic recombination can change rapidly over time. This instability and the reported high level of inter-individual variation in meiotic recombination put in question the accuracy of the calculated hotspot map, which is based on the summation of past genetic crossovers. To estimate the accuracy of the computed recombination rate map, we have mapped genetic crossovers to a median resolution of 70 Kb in 10 CEPH pedigrees. We then compared the positions of crossovers with the hotspots computed from HapMap data and performed extensive computer simulations to compare the observed distributions of crossovers with the distributions expected from the calculated recombination rate maps. Here we show that a population-averaged hotspot map computed from linkage disequilibrium data predicts present-day genetic crossovers well. We find that computed hotspot maps accurately estimate both the strength and the position of meiotic hotspots. An in-depth examination of not-predicted crossovers shows that they are preferentially located in regions where hotspots are found in other populations. In summary, we find that by combining several computed population-specific maps we can capture the variation in individual hotspots to generate a hotspot map that can predict almost all present-day genetic crossovers.
Author Summary
In eukaryotes genetic crossovers are responsible for generating genetic diversity and ensuring the proper segregation of chromosomes. Genetic crossovers are tightly clustered in hotspots. Although the existence of hotspots in humans is clearly proven, mechanisms of their formation and the regulation of meiotic recombination in general remain poorly understood. An additional complication in studies of meiotic recombination is the fact that the direct experimental mapping of human hotspots on a genome-wide scale is not feasible with current methods. The best available indirect methods compute the position of hotspots from patterns of historic associations between genetic markers in population samples. In this study we determined the positions of genetic crossovers in ten pedigrees of European origin and then compared the positions of crossovers with the hotspots computed from HapMap data. Importantly, we find that the population-averaged computed map is in close agreement with the observed distribution of genetic crossovers. We also find that cryptic hotspots that are not easily detected in the computed European map can be more effectively identified if other populations are included in the analysis. Our analysis shows that high-resolution recombination profiles are highly similar between distantly related populations and that by including computed hotspots from several populations we can predict nearly all crossovers.
doi:10.1371/journal.pgen.1000831
PMCID: PMC2813264  PMID: 20126534
10.  Human metabolic profiles are stably controlled by genetic and environmental variation 
A comprehensive variation map of the human metabolome identifies genetic and stable-environmental sources as major drivers of metabolite concentrations. The data suggest that sample sizes of a few thousand are sufficient to detect metabolite biomarkers predictive of disease.
We designed a longitudinal twin study to characterize the genetic, stable-environmental, and longitudinally fluctuating influences on metabolite concentrations in two human biofluids, urine and plasma, focusing specifically on the representative subset of metabolites detectable by ¹H nuclear magnetic resonance (¹H NMR) spectroscopy.
We identified widespread genetic and stable-environmental influences on the (urine and plasma) metabolomes, with (30 and 42%) attributable on average to familial sources, and (47 and 60%) attributable to longitudinally stable sources.
Ten of the metabolites annotated in the study are estimated to have >60% familial contribution to their variation in concentration.
Our findings have implications for the design and interpretation of ¹H NMR-based molecular epidemiology studies. On the basis of the stable component of variation quantified in the current paper, we specified a model of disease association under which we inferred that sample sizes of a few thousand should be sufficient to detect disease-predictive metabolite biomarkers.
Metabolites are small molecules involved in biochemical processes in living systems. Their concentration in biofluids, such as urine and plasma, can offer insights into the functional status of biological pathways within an organism, and reflect input from multiple levels of biological organization—genetic, epigenetic, transcriptomic, and proteomic—as well as from environmental and lifestyle factors. Metabolite levels have the potential to indicate a broad variety of deviations from the 'normal' physiological state, such as those that accompany a disease, or an increased susceptibility to disease. A number of recent studies have demonstrated that metabolite concentrations can be used to diagnose disease states accurately. A more ambitious goal is to identify metabolite biomarkers that are predictive of future disease onset, providing the possibility of intervention in susceptible individuals.
If an extreme concentration of a metabolite is to serve as an indicator of disease status, it is usually important to know the distribution of metabolite levels among healthy individuals. It is also useful to characterize the sources of that observed variation in the healthy population. A proportion of that variation—the heritable component—is attributable to genetic differences between individuals, potentially at many genetic loci. An effective, molecular indicator of a heritable, complex disease is likely to have a substantive heritable component. Non-heritable biological variation in metabolite concentrations can arise from a variety of environmental influences, such as dietary intake, lifestyle choices, general physical condition, composition of gut microflora, and use of medication. Variation across a population in stable-environmental influences leads to long-term differences between individuals in their baseline metabolite levels. Dynamic environmental pressures lead to short-term fluctuations within an individual about their baseline level. A metabolite whose concentration changes substantially in response to short-term pressures is relatively unlikely to offer long-term prediction of disease. In summary, the potential suitability of a metabolite to predict disease is reflected by the relative contributions of heritable and stable/unstable-environmental factors to its variation in concentration across the healthy population.
Studies involving twins are an established technique for quantifying the heritable component of phenotypes in human populations. Monozygotic (MZ) twins share the same DNA genome-wide, while dizygotic (DZ) twins share approximately half their inherited DNA, as do ordinary siblings. By comparing the average extent of phenotypic concordance within MZ pairs to that within DZ pairs, it is possible to quantify the heritability of a trait, and also to quantify the familiality, which refers to the combination of heritable and common-environmental effects (i.e., environmental influences shared by twins in a pair). In addition to incorporating twins into the study design, it is useful to quantify the phenotype in some individuals at multiple time points. The longitudinal aspect of such a study allows environmental effects to be decomposed into those that affect the phenotype over the short term and those that exert stable influence.
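As a rough illustration of the MZ/DZ comparison described above, the classical Falconer estimator takes heritability as twice the difference between the MZ and DZ intraclass correlations. This is a textbook approximation under the standard ACE model, not the bespoke statistical method used in the study; the function name and the example correlations are hypothetical.

```python
def falconer_estimates(r_mz: float, r_dz: float) -> dict:
    """Classical twin-study estimates from intraclass correlations.

    r_mz, r_dz: phenotypic correlations within MZ and DZ pairs.
    Assumes the standard ACE model (additive genetics, common
    environment, unique environment). Illustrative only.
    """
    h2 = 2.0 * (r_mz - r_dz)   # heritable share (A)
    c2 = 2.0 * r_dz - r_mz     # common-environment share (C)
    e2 = 1.0 - r_mz            # unique environment plus noise (E)
    familiality = h2 + c2      # heritable + common-environment; equals r_mz here
    return {"h2": h2, "c2": c2, "e2": e2, "familiality": familiality}


# Hypothetical correlations, not values from the study.
est = falconer_estimates(r_mz=0.6, r_dz=0.4)
print(est)
```

Under this simplified model, familiality reduces to the MZ correlation itself, which is why a twin design can estimate it even when the heritable and common-environmental parts are hard to separate.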
For the current study, urine and blood samples were collected from a cohort of MZ and DZ twins, with some twins donating samples on two occasions several months apart. Samples were analysed by 1H nuclear magnetic resonance (1H NMR) spectroscopy—an untargeted, discovery-driven technique for quantifying metabolite concentrations in biological samples. The application of 1H NMR to a biological sample creates a spectrum, made up of multiple peaks, with each peak's size quantitatively representing the concentration of its corresponding hydrogen-containing metabolite.
In each biological sample in our study, we extracted a full set of peaks, and thereby quantified the concentrations of all common plasma and urine metabolites detectable by 1H NMR. We developed bespoke statistical methods to decompose the observed concentration variation at each metabolite peak into that originating from familial, individual-environmental, and unstable-environmental sources.
We quantified the variability landscape across all common metabolite peaks in the urine and plasma 1H NMR metabolomes. We annotated a subset of peaks with a total of 65 metabolites; the variance decompositions for these are shown in Figure 1. Ten metabolites' concentrations were estimated to have familial contributions in excess of 60%. The average proportion of stable variation across all extracted metabolite peaks was estimated to be 47% in the urine samples and 60% in the plasma samples; the average estimated familiality was 30% for urine and 42% for plasma. These results comprise the first quantitative variation map of the 1H NMR metabolome. The identification and quantification of substantive widespread stability provides support for the use of these biofluids in molecular epidemiology studies. On the basis of our findings, we performed power calculations for a hypothetical study searching for predictive disease biomarkers among 1H NMR-detectable urine and plasma metabolites. Our calculations suggest that sample sizes of 2000–5000 should allow reliable identification of disease-predictive metabolite concentrations explaining 5–10% of disease risk, while greater sample sizes of 5000–20 000 would be required to identify metabolite concentrations explaining 1–2% of disease risk.
1H Nuclear Magnetic Resonance spectroscopy (1H NMR) is increasingly used to measure metabolite concentrations in sets of biological samples for top-down systems biology and molecular epidemiology. For such purposes, knowledge of the sources of human variation in metabolite concentrations is valuable, but currently sparse. We conducted and analysed a study to create such a resource. In our unique design, identical and non-identical twin pairs donated plasma and urine samples longitudinally. We acquired 1H NMR spectra on the samples, and statistically decomposed variation in metabolite concentration into familial (genetic and common-environmental), individual-environmental, and longitudinally unstable components. We estimate that stable variation, comprising familial and individual-environmental factors, accounts on average for 60% (plasma) and 47% (urine) of biological variation in 1H NMR-detectable metabolite concentrations. Clinically predictive metabolic variation is likely nested within this stable component, so our results have implications for the effective design of biomarker-discovery studies. We provide a power-calculation method which reveals that sample sizes of a few thousand should offer sufficient statistical precision to detect 1H NMR-based biomarkers quantifying predisposition to disease.
doi:10.1038/msb.2011.57
PMCID: PMC3202796  PMID: 21878913
biomarker; 1H nuclear magnetic resonance spectroscopy; metabolome-wide association study; top-down systems biology; variance decomposition
11.  Rapid whole genome optical mapping of Plasmodium falciparum 
Malaria Journal  2011;10:252.
Background
Immune evasion and drug resistance in malaria have been linked to chromosomal recombination and gene copy number variation (CNV). These events are ideally studied using comparative genomic analyses; however in malaria these analyses are not as common or thorough as in other infectious diseases, partly due to the difficulty in sequencing and assembling complete genome drafts. Recently, whole genome optical mapping has gained wide use in support of genomic sequence assembly and comparison. Here, a rapid technique for producing whole genome optical maps of Plasmodium falciparum is described and the results of mapping four genomes are presented.
Methods
Four laboratory strains of P. falciparum were analysed using the Argus™ optical mapping system to produce ordered restriction fragment maps of all 14 chromosomes in each genome. Plasmodium falciparum DNA was isolated directly from blood culture, visualized using the Argus™ system and assembled in a manner analogous to next generation sequence assembly into maps (AssemblyViewer™, OpGen Inc.®). Full coverage maps were generated for P. falciparum strains 3D7, FVO, D6 and C235. A reference P. falciparum in silico map was created by the digestion of the genomic sequence of P. falciparum with the restriction enzyme AflII, for comparisons to genomic optical maps. Maps were then compared using the MapSolver™ software.
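The in silico reference map described above can be sketched as a simple string search: find every AflII recognition site in the sequence and record the lengths of the fragments between cuts. This is a minimal sketch, not the MapSolver™ procedure; it assumes a linear sequence and uses the AflII site CTTAAG with the cut after the first base. The function name and the toy sequence are hypothetical.

```python
from typing import List


def in_silico_digest(sequence: str, site: str = "CTTAAG", cut_offset: int = 1) -> List[int]:
    """Return ordered fragment lengths after cutting a linear sequence.

    site: recognition sequence (AflII: CTTAAG).
    cut_offset: cut position within the site (AflII cuts C^TTAAG, so 1).
    """
    seq = sequence.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    # Fragment boundaries: sequence start, each cut, sequence end.
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Toy sequence with two AflII sites (illustrative, not P. falciparum data).
toy = "AAAA" + "CTTAAG" + "GGGGG" + "CTTAAG" + "TT"
fragments = in_silico_digest(toy)
print(fragments)
```

The resulting ordered list of fragment sizes is the kind of object an optical map is compared against: mismatches in fragment order or size point to candidate insertions, deletions, or misassemblies.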
Results
Genomic variation was observed among the mapped strains, as well as between the map of the reference strain and the map derived from the putative sequence of that same strain. Duplications, deletions, insertions, inversions and misassemblies of sizes ranging from 3,500 base pairs up to 78,000 base pairs were observed. Many genomic events occurred in areas of known repetitive sequence or high copy number genes, including var gene clusters and rifin complexes.
Conclusions
This technique for optical mapping of multiple malaria genomes allows for whole genome comparison of multiple strains and can assist in identifying genetic variation and sequence contig assembly. New protocols and technology allowed us to produce high quality contigs spanning four P. falciparum genomes in six weeks for less than $1,000.00 per genome. This relatively low cost and quick turnaround makes the technique valuable compared to other genomic sequencing technologies for studying genetic variation in malaria.
doi:10.1186/1475-2875-10-252
PMCID: PMC3173401  PMID: 21871093
12.  Population Stratification of a Common APOBEC Gene Deletion Polymorphism 
PLoS Genetics  2007;3(4):e63.
The APOBEC3 gene family plays a role in innate cellular immunity inhibiting retroviral infection, hepatitis B virus propagation, and the retrotransposition of endogenous elements. We present a detailed sequence and population genetic analysis of a 29.5-kb common human deletion polymorphism that removes the APOBEC3B gene. We developed a PCR-based genotyping assay, characterized 1,277 human diversity samples, and found that the frequency of the deletion allele varies significantly among major continental groups (global FST = 0.2843). The deletion is rare in Africans and Europeans (frequency of 0.9% and 6%), more common in East Asians and Amerindians (36.9% and 57.7%), and almost fixed in Oceanic populations (92.9%). Despite a worldwide frequency of 22.5%, analysis of data from the International HapMap Project reveals that no single existing tag single nucleotide polymorphism may serve as a surrogate for the deletion variant, emphasizing that without careful analysis its phenotypic impact may be overlooked in association studies. Application of haplotype-based tests for selection revealed potential pitfalls in the direct application of existing methods to the analysis of genomic structural variation. These data emphasize the importance of directly genotyping structural variation in association studies and of accurately resolving variant breakpoints before proceeding with more detailed population-genetic analysis.
Author Summary
Several recent studies have demonstrated that deletions, duplications, and inversions contribute a substantial fraction of the total amount of variation present in the human genome. In this study, we provide a comprehensive population-genetic analysis of a single deletion previously identified by comparing the genome of a single individual against the human genome reference sequence. Complete genomic sequence spanning the deleted region was obtained, allowing us to define the deletion breakpoints and develop a direct genotyping assay. Analysis showed that the deletion removes a member of a gene family involved in the innate immune response against viral pathogens. We genotyped samples from a human diversity panel and found drastic differences in the frequency of the deletion around the world. Using data from the HapMap project and the application of existing analysis techniques, we illustrate the importance of directly genotyping this type of variation and of clearly defining its boundaries. Without this level of detail the potential functional importance of such variation may be missed.
doi:10.1371/journal.pgen.0030063
PMCID: PMC1853121  PMID: 17447845
13.  Patterns of Cis Regulatory Variation in Diverse Human Populations 
PLoS Genetics  2012;8(4):e1002639.
The genetic basis of gene expression variation has long been studied with the aim to understand the landscape of regulatory variants, but also more recently to assist in the interpretation and elucidation of disease signals. To date, many studies have looked in specific tissues and population-based samples, but there has been limited assessment of the degree of inter-population variability in regulatory variation. We analyzed genome-wide gene expression in lymphoblastoid cell lines from a total of 726 individuals from 8 global populations from the HapMap3 project and correlated gene expression levels with HapMap3 SNPs located in cis to the genes. We describe the influence of ancestry on gene expression levels within and between these diverse human populations and uncover a non-negligible impact on global patterns of gene expression. We further dissect the specific functional pathways differentiated between populations. We also identify 5,691 expression quantitative trait loci (eQTLs) after controlling for both non-genetic factors and population admixture and observe that half of the cis-eQTLs are replicated in one or more of the populations. We highlight patterns of eQTL-sharing between populations, which are partially determined by population genetic relatedness, and discover significant sharing of eQTL effects between Asians, European-admixed, and African subpopulations. Specifically, we observe that both the effect size and the direction of effect for eQTLs are highly conserved across populations. We observe an increasing proximity of eQTLs toward the transcription start site as sharing of eQTLs among populations increases, highlighting that variants close to TSS have stronger effects and therefore are more likely to be detected across a wider panel of populations. 
Together these results offer a unique picture and resource of the degree of differentiation among human populations in functional regulatory variation and provide an estimate for the transferability of complex trait variants across populations.
Author Summary
Variation among individuals in the degree to which genes are expressed (i.e. turned on or off) is a characteristic exhibited by all species, and studies have identified regions of the genome harboring genetic variation affecting gene expression levels. To assess the degree of human inter-population variability in regulatory variation, we describe mapping of regions of the genome that have functional effects on gene expression levels. We analyzed genome-wide gene expression in human cell lines derived from 726 unrelated individuals representing 8 global populations that have been genetically well-characterized by the International HapMap Project. We describe the influence of ancestry on gene expression levels within and between these diverse human populations and uncover a non-negligible impact on global patterns of gene expression. We identify ∼5,700 genes whose expression levels are associated with genetic variation located physically close to the gene, and we observe significant sharing of associations that is partially dependent on population genetic relatedness, among Asians, European-admixed, and African subpopulations. We identify biological functions affected by regulatory variation and describe common and unique characteristics of population-specific and population-shared associations. These results offer a unique picture and resource of the degree of differentiation among human populations in functional regulatory variation.
doi:10.1371/journal.pgen.1002639
PMCID: PMC3330104  PMID: 22532805
14.  Genetic Variation in the Nuclear and Organellar Genomes Modulates Stochastic Variation in the Metabolome, Growth, and Defense 
PLoS Genetics  2015;11(1):e1004779.
Recent studies are starting to show that genetic control over stochastic variation is a key evolutionary solution of single celled organisms in the face of unpredictable environments. This has been expanded to show that genetic variation can alter stochastic variation in transcriptional processes within multi-cellular eukaryotes. However, little is known about how genetic diversity can control stochastic variation within more non-cell autonomous phenotypes. Using an Arabidopsis reciprocal RIL population, we showed that there is significant genetic diversity influencing stochastic variation in the plant metabolome, defense chemistry, and growth. This genetic diversity included loci specific for the stochastic variation of each phenotypic class that did not affect the other phenotypic classes or the average phenotype. This suggests that the organism's networks are established so that noise can exist in one phenotypic level like metabolism and not permeate up or down to different phenotypic levels. Further, the genomic variation within the plastid and mitochondria also had significant effects on the stochastic variation of all phenotypic classes. The genetic influence over stochastic variation within the metabolome was highly metabolite specific, with neighboring metabolites in the same metabolic pathway frequently showing different levels of noise. As expected from bet-hedging theory, there was more genetic diversity and a wider range of stochastic variation for defense chemistry than found for primary metabolism. Thus, it is possible to begin dissecting the stochastic variation of whole organismal phenotypes in multi-cellular organisms. Further, there are loci that modulate stochastic variation at different phenotypic levels. Finding the identity of these genes will be key to developing complete models linking genotype to phenotype.
Author Summary
Systems biology is largely based on the principle that the link between genotype and phenotype is deterministic, and, if we know enough, can be predicted with high accuracy. In contrast, recent work studying transcription within single celled organisms has shown that the genotype to phenotype link is stochastic, i.e. a single genotype actually makes a range of phenotypes even in a single environment. Further, natural variation within genes can lead to each allele displaying a different phenotypic distribution. To test if multi-cellular organisms also display natural genetic variation in the stochastic link between genotype and phenotype, we measured the metabolome, growth and defense metabolism within an Arabidopsis RIL population and mapped quantitative trait loci. We show that genetic variation in the nuclear and organellar genomes influences the stochastic variation in all measured traits. Further, each trait class has distinct genetics underlying the stochastic variance, showing that there are different mechanisms controlling the stochastic genotype to phenotype link for each trait. Further work is necessary to identify the mechanisms underpinning the stochastic nature of the genotype to phenotype link.
doi:10.1371/journal.pgen.1004779
PMCID: PMC4287608  PMID: 25569687
15.  Genomic linkage map of the human blood fluke Schistosoma mansoni 
Genome Biology  2009;10(6):R71.
The first genetic linkage map of Schistosoma mansoni reveals insights into higher female recombination, confirms ZW inheritance patterns and recombination hotspots.
Background
Schistosoma mansoni is a blood fluke that infects approximately 90 million people. The complete life cycle of this parasite can be maintained in the laboratory, making this one of the few experimentally tractable human helminth infections, and a rich literature reveals heritable variation in important biomedical traits such as virulence, host-specificity, transmission and drug resistance. However, there is a current lack of tools needed to study S. mansoni's molecular, quantitative, and population genetics. Our goal was to construct a genetic linkage map for S. mansoni, and thus provide a new resource that will help stimulate research on this neglected pathogen.
Results
We genotyped grandparents, parents and 88 progeny to construct a 5.6 cM linkage map containing 243 microsatellites positioned on 203 of the largest scaffolds in the genome sequence. The map allows 70% of the estimated 300 Mb genome to be ordered on chromosomes, and highlights where scaffolds have been incorrectly assembled. The markers fall into eight main linkage groups, consistent with seven pairs of autosomes and one pair of sex chromosomes, and we were able to anchor linkage groups to chromosomes using fluorescent in situ hybridization. The genome measures 1,228.6 cM. Marker segregation reveals higher female recombination, confirms ZW inheritance patterns, and identifies recombination hotspots and regions of segregation distortion.
Conclusions
The genetic linkage map presented here is the first for S. mansoni and the first for a species in the phylum Platyhelminthes. The map provides the critical tool necessary for quantitative genetic analysis, aids genome assembly, and furnishes a framework for comparative flatworm genomics and field-based molecular epidemiological studies.
doi:10.1186/gb-2009-10-6-r71
PMCID: PMC2718505  PMID: 19566921
16.  Generation of a BAC-based physical map of the melon genome 
BMC Genomics  2010;11:339.
Background
Cucumis melo (melon) belongs to the Cucurbitaceae family, whose economic importance among horticulture crops is second only to Solanaceae. Melon has high intra-specific genetic variation, morphologic diversity and a small genome size (450 Mb), which make this species suitable for a great variety of molecular and genetic studies that can lead to the development of tools for breeding varieties of the species. A number of genetic and genomic resources have already been developed, such as several genetic maps and BAC genomic libraries. These tools are essential for the construction of a physical map, a valuable resource for map-based cloning, comparative genomics and assembly of whole genome sequencing data. However, no physical map of any Cucurbitaceae has yet been developed. A project has recently been started to sequence the complete melon genome following a whole-genome shotgun strategy, which makes use of massive sequencing data. A BAC-based melon physical map will be a useful tool to help assemble and refine the draft genome data that is being produced.
Results
A melon physical map was constructed using a 5.7 × BAC library and a genetic map previously developed in our laboratories. High-information-content fingerprinting (HICF) was carried out on 23,040 BAC clones, digesting with five restriction enzymes and SNaPshot labeling, followed by contig assembly with FPC software. The physical map has 1,355 contigs and 441 singletons, with an estimated physical length of 407 Mb (0.9 × coverage of the genome) and the longest contig being 3.2 Mb. The anchoring of 845 BAC clones to 178 genetic markers (100 RFLPs, 76 SNPs and 2 SSRs) also allowed the genetic positioning of 183 physical map contigs/singletons, representing 55 Mb (12%) of the melon genome, to individual chromosomal loci. The melon FPC database is available for download at http://melonomics.upv.es/static/files/public/physical_map/.
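The coverage and anchoring figures quoted above follow from simple ratios against the 450 Mb genome size given in the Background. The short check below reproduces them; the variable names are illustrative, and all numbers are taken from the abstract.

```python
# Figures from the abstract (sizes in Mb).
GENOME_MB = 450.0     # estimated melon genome size
MAP_MB = 407.0        # estimated physical length of the map
ANCHORED_MB = 55.0    # contigs/singletons anchored to genetic positions

coverage = MAP_MB / GENOME_MB                 # about 0.90x of the genome
anchored_fraction = ANCHORED_MB / GENOME_MB   # about 12% of the genome

# Marker breakdown should add up to the 178 genetic markers reported.
markers = {"RFLP": 100, "SNP": 76, "SSR": 2}
total_markers = sum(markers.values())

print(f"coverage = {coverage:.2f}x, anchored = {anchored_fraction:.1%}, markers = {total_markers}")
```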
Conclusions
Here we report the construction of the first physical map of a Cucurbitaceae species described so far. The physical map was integrated with the genetic map so that a number of physical contigs, representing 12% of the melon genome, could be anchored to known genetic positions. The data presented is already helping to improve the quality of the melon genomic sequence available as a result of a project currently being carried out in Spain, adopting a whole genome shotgun approach based on 454 sequencing data.
doi:10.1186/1471-2164-11-339
PMCID: PMC2894041  PMID: 20509895
17.  A physical map of the bovine genome 
Genome Biology  2007;8(8):R165.
A new physical map of the bovine genome has been constructed by integrating data from genetic and radiation hybrid maps, and a new bovine BAC map, with the bovine genome draft assembly.
Background
Cattle are important agriculturally and relevant as a model organism. Previously described genetic and radiation hybrid (RH) maps of the bovine genome have been used to identify genomic regions and genes affecting specific traits. Application of these maps to identify influential genetic polymorphisms will be enhanced by integration with each other and with bacterial artificial chromosome (BAC) libraries. The BAC libraries and clone maps are essential for the hybrid clone-by-clone/whole-genome shotgun sequencing approach taken by the bovine genome sequencing project.
Results
A bovine BAC map was constructed with HindIII restriction digest fragments of 290,797 BAC clones from animals of three different breeds. Comparative mapping of 422,522 BAC end sequences assisted with BAC map ordering and assembly. Genotypes and pedigree from two genetic maps and marker scores from three whole-genome RH panels were consolidated on a 17,254-marker composite map. Sequence similarity allowed integrating the BAC and composite maps with the bovine draft assembly (Btau3.1), establishing a comprehensive resource describing the bovine genome. Agreement between the marker and BAC maps and the draft assembly is high, although discrepancies exist. The composite and BAC maps are more similar than either is to the draft assembly.
Conclusion
Further refinement of the maps and greater integration into the genome assembly process may contribute to a high quality assembly. The maps provide resources to associate phenotypic variation with underlying genomic variation, and are crucial resources for understanding the biology underpinning this important ruminant species so closely associated with humans.
doi:10.1186/gb-2007-8-8-r165
PMCID: PMC2374996  PMID: 17697342
18.  A set of EST-SNPs for map saturation and cultivar identification in melon 
BMC Plant Biology  2009;9:90.
Background
There are few genomic tools available in melon (Cucumis melo L.), a member of the Cucurbitaceae, despite its importance as a crop. Among these tools, genetic maps have been constructed mainly using marker types such as simple sequence repeats (SSR), restriction fragment length polymorphisms (RFLP) and amplified fragment length polymorphisms (AFLP) in different mapping populations. There is a growing need for saturating the genetic map with single nucleotide polymorphisms (SNP), more amenable for high throughput analysis, especially if these markers are located in gene coding regions, to provide functional markers. Expressed sequence tags (ESTs) from melon are available in public databases, and resequencing ESTs or validating SNPs detected in silico are excellent ways to discover SNPs.
Results
EST-based SNPs were discovered after resequencing ESTs between the parental lines of the PI 161375 (SC) × 'Piel de sapo' (PS) genetic map or using in silico SNP information from EST databases. In total 200 EST-based SNPs were mapped in the melon genetic map using a bin-mapping strategy, increasing the map density to 2.35 cM/marker. A subset of 45 SNPs was used to study variation in a panel of 48 melon accessions covering a wide range of the genetic diversity of the species. SNP analysis correctly reflected the genetic relationships compared with other marker systems, being able to distinguish all the accessions and cultivars.
Conclusion
This is the first example of a genetic map in a cucurbit species that includes a major set of SNP markers discovered using ESTs. The PI 161375 × 'Piel de sapo' melon genetic map has around 700 markers, of which more than 500 are gene-based markers (SNP, RFLP and SSR). This genetic map will be a central tool for the construction of the melon physical map, the step prior to sequencing the complete genome. Using the set of SNP markers, it was possible to define the genetic relationships within a collection of forty-eight melon accessions as efficiently as with SSR markers, and these markers may also be useful for cultivar identification in Occidental melon varieties.
doi:10.1186/1471-2229-9-90
PMCID: PMC2722630  PMID: 19604363
19.  Identification of candidate risk gene variations by whole-genome sequence analysis of four rat strains commonly used in inflammation research 
BMC Genomics  2014;15(1):391.
Background
The DA rat strain is particularly susceptible to the induction of a number of chronic inflammatory diseases, such as models for rheumatoid arthritis and multiple sclerosis. Here we sequenced the genomes of two DA sub-strains and two disease resistant strains, E3 and PVG, previously used together with DA strains in genetically segregating crosses.
Results
The data uncover genomic variations, such as single nucleotide variations (SNVs) and copy number variations that underlie phenotypic differences between the strains. Comparisons of regional differences between the two DA sub-strains identified 8 genomic regions that discriminate between the strains that together cover 38 Mbp and harbor 302 genes. We analyzed 10 fine-mapped quantitative trait loci and our data implicate strong candidates for genetic variations that mediate their effects. For example, we could identify a single SNV candidate in a regulatory region of the gene Il21r, which has been associated with differential expression in both rats and human MS patients. In the APLEC complex we identified two SNVs in a highly conserved region, which could affect the regulation of all APLEC encoded genes and explain the polygenic differential expression seen in the complex. Furthermore, the non-synonymous SNV modifying aa153 of the Ncf1 protein was confirmed as the sole causative factor.
Conclusion
This complete map of genetic differences between the most commonly used rat strains in inflammation research constitutes an important reference in understanding how genetic variations contribute to the traits of importance for inflammatory diseases.
doi:10.1186/1471-2164-15-391
PMCID: PMC4041999  PMID: 24885425
Rat genome; Whole-genome sequencing; QTL; Rheumatoid arthritis; Multiple sclerosis; Disease-associated SNVs
20.  Integrating linkage and radiation hybrid mapping data for bovine chromosome 15 
BMC Genomics  2004;5:77.
Background
Bovine chromosome (BTA) 15 contains a quantitative trait locus (QTL) for meat tenderness, as well as several breaks in synteny with human chromosome (HSA) 11. Both linkage and radiation hybrid (RH) maps of BTA 15 are available, but the linkage map lacks gene-specific markers needed to identify genes underlying the QTL, and the gene-rich RH map lacks associations with marker genotypes needed to define the QTL. Integrating the maps will provide information to further explore the QTL as well as refine the comparative map between BTA 15 and HSA 11. A recently developed approach to integrating linkage and RH maps uses both linkage and RH data to resolve a consensus marker order, rather than aligning independently constructed maps. Automated map construction procedures employing this maximum-likelihood approach were developed to integrate BTA RH and linkage data, and establish comparative positions of BTA 15 markers with HSA 11 homologs.
Results
The integrated BTA 15 map represents 145 markers: 42 shared by both data sets, 36 unique to the linkage data and 67 unique to RH data. Sequence alignment yielded comparative positions for 77 bovine markers with homologs on HSA 11. The map covers approximately 32% of HSA 11 sequence in five segments of conserved synteny; another 15% of HSA 11 is shared with BTA 29. Bovine and human order are consistent in portions of the syntenic segments, but some rearrangement is apparent. Comparative positions of gene markers near the meat tenderness QTL indicate the region includes separate segments of HSA 11. The two microsatellite markers flanking the QTL peak are between defined syntenic segments.
Conclusions
Combining data to construct an integrated map not only consolidates information from different sources onto a single map, but information contributed from each data set increases the accuracy of the map. Comparison of bovine maps with well annotated human sequence can provide useful information about genes near mapped bovine markers, but bovine gene order may differ from human gene order. Procedures to connect genetic and physical mapping data, build integrated maps for livestock species, and connect those maps to more fully annotated sequence can be automated, facilitating the maintenance of up-to-date maps, and providing a valuable tool to further explore genetic variation in livestock.
doi:10.1186/1471-2164-5-77
PMCID: PMC526187  PMID: 15473903
21.  The Contribution of RNA Decay Quantitative Trait Loci to Inter-Individual Variation in Steady-State Gene Expression Levels 
PLoS Genetics  2012;8(10):e1003000.
Recent gene expression QTL (eQTL) mapping studies have provided considerable insight into the genetic basis for inter-individual regulatory variation. However, a limitation of all eQTL studies to date, which have used measurements of steady-state gene expression levels, is the inability to directly distinguish between variation in transcription and decay rates. To address this gap, we performed a genome-wide study of variation in gene-specific mRNA decay rates across individuals. Using a time-course study design, we estimated mRNA decay rates for over 16,000 genes in 70 Yoruban HapMap lymphoblastoid cell lines (LCLs), for which extensive genotyping data are available. Considering mRNA decay rates across genes, we found that: (i) as expected, highly expressed genes are generally associated with lower mRNA decay rates, (ii) genes with rapid mRNA decay rates are enriched with putative binding sites for miRNA and RNA binding proteins, and (iii) genes with similar functional roles tend to exhibit correlated rates of mRNA decay. Focusing on variation in mRNA decay across individuals, we estimate that steady-state expression levels are significantly correlated with variation in decay rates in 10% of genes. Somewhat counter-intuitively, for about half of these genes, higher expression is associated with faster decay rates, possibly due to a coupling of mRNA decay with transcriptional processes in genes involved in rapid cellular responses. Finally, we used these data to map genetic variation that is specifically associated with variation in mRNA decay rates across individuals. We found 195 such loci, which we named RNA decay quantitative trait loci (“rdQTLs”). All the observed rdQTLs are located near the regulated genes and therefore are assumed to act in cis. By analyzing our data within the context of known steady-state eQTLs, we estimate that a substantial fraction of eQTLs are associated with inter-individual variation in mRNA decay rates.
Author Summary
Recent studies of functional genetic variation in humans have identified numerous loci that are associated with variation in gene expression levels, called expression quantitative trait loci (eQTLs). The mechanisms by which these loci affect gene expression, however, are still largely unknown. Specifically, since most studies rely on measures of steady-state gene expression levels, they are unable to distinguish between the relative influences of either transcriptional- or decay-related processes. To address this gap, we examined the specific impact of mRNA decay processes on steady-state gene expression levels for over 16,000 genes in human lymphoblastoid cell lines. By characterizing decay rates in 70 individuals, we show that steady-state expression levels are significantly influenced by variation in decay rates for 10% of genes. Yet, for roughly half of these genes, we find that individuals with higher expression levels also have faster decay rates. This pattern points to a non-simple mechanistic interplay between transcriptional and decay processes, especially for genes involved in rapid cellular responses. Finally, we identify 195 genetic variants that are significantly associated with both gene expression variation and variation in mRNA decay rates. Using these data, we estimate that a substantial fraction of eQTLs are associated with inter-individual variation in mRNA decay rates.
doi:10.1371/journal.pgen.1003000
PMCID: PMC3469421  PMID: 23071454
22.  The Role of Abcb5 Alleles in Susceptibility to Haloperidol-Induced Toxicity in Mice and Humans 
PLoS Medicine  2015;12(2):e1001782.
Background
We know very little about the genetic factors affecting susceptibility to drug-induced central nervous system (CNS) toxicities, and this has limited our ability to optimally utilize existing drugs or to develop new drugs for CNS disorders. For example, haloperidol is a potent dopamine antagonist that is used to treat psychotic disorders, but 50% of treated patients develop characteristic extrapyramidal symptoms caused by haloperidol-induced toxicity (HIT), which limits its clinical utility. We do not have any information about the genetic factors affecting this drug-induced toxicity. HIT in humans is directly mirrored in a murine genetic model, where inbred mouse strains are differentially susceptible to HIT. Therefore, we genetically analyzed this murine model and performed a translational human genetic association study.
Methods and Findings
A whole genome SNP database and computational genetic mapping were used to analyze the murine genetic model of HIT. Guided by the mouse genetic analysis, we demonstrate that genetic variation within an ABC-drug efflux transporter (Abcb5) affected susceptibility to HIT. In situ hybridization results reveal that Abcb5 is expressed in brain capillaries, and by cerebellar Purkinje cells. We also analyzed chromosome substitution strains, imaged haloperidol abundance in brain tissue sections and directly measured haloperidol (and its metabolite) levels in brain, and characterized Abcb5 knockout mice. Our results demonstrate that Abcb5 is part of the blood-brain barrier; it affects susceptibility to HIT by altering the brain concentration of haloperidol. Moreover, a genetic association study in a haloperidol-treated human cohort indicates that human ABCB5 alleles had a time-dependent effect on susceptibility to individual and combined measures of HIT. Abcb5 alleles are pharmacogenetic factors that affect susceptibility to HIT, but it is likely that additional pharmacogenetic susceptibility factors will be discovered.
Conclusions
ABCB5 alleles alter susceptibility to HIT in mouse and humans. This discovery leads to a new model that (at least in part) explains inter-individual differences in susceptibility to a drug-induced CNS toxicity.
Gary Peltz and colleagues examine the role of ABCB5 alleles in haloperidol-induced toxicity in a murine genetic model and humans treated with haloperidol.
Editors' Summary
Background
The brain is the control center of the human body. This complex organ controls thoughts, memory, speech, and movement, it is the seat of intelligence, and it regulates the function of many organs. The brain comprises many different parts, all of which work together but all of which have their own special functions. For example, the forebrain is involved in intellectual activities such as thinking whereas the hindbrain controls the body’s vital functions and movements. Messages are passed between the various regions of the brain and to other parts of the body by specialized cells called neurons, which release and receive signal molecules known as neurotransmitters. Like all the organs in the body, blood vessels supply the brain with the oxygen, water, and nutrients it needs to function. Importantly, however, the brain is protected from infectious agents and other potentially dangerous substances circulating in the blood by the “blood-brain barrier,” a highly selective permeability barrier that is formed by the cells lining the fine blood vessels (capillaries) within the brain.
Why Was This Study Done?
Although drugs have been developed to treat various brain disorders, more active and less toxic drugs are needed to improve the treatment of many if not most of these conditions. Unfortunately, relatively little is known about how the blood-brain barrier regulates the entry of drugs into the brain or about the genetic factors that affect the brain’s susceptibility to drug-induced toxicities. It is not known, for example, why about half of patients given haloperidol—a drug used to treat psychotic disorders (conditions that affect how people think, feel, or behave)—develop tremors and other symptoms caused by alterations in the brain region that controls voluntary movements. Here, to improve our understanding of how drugs enter the brain and impact its function, the researchers investigate the genetic factors that affect haloperidol-induced toxicity by genetically analyzing several inbred mouse strains (every individual in an inbred mouse strain is genetically identical) with different susceptibilities to haloperidol-induced toxicity and by undertaking a human genetic association study (a study that looks for non-chance associations between specific traits and genetic variants).
What Did the Researchers Do and Find?
The researchers used a database of genetic variants called single nucleotide polymorphisms (SNPs) and a computational genetic mapping approach to show first that variations within the gene encoding Abcb5 affected susceptibility to haloperidol-induced toxicity (indicated by changes in the length of time taken by mice to move their paws when placed on an inclined wire-mesh screen) among inbred mouse strains. Abcb5 is an ATP-binding cassette transporter, a type of protein that moves molecules across cell membranes. The researchers next showed that Abcb5 is expressed in brain capillaries, which is the location of the blood-brain barrier. Abcb5 was also expressed in cerebellar Purkinje cells, which help to control motor (intentional) movements. They also measured the effect of haloperidol and the haloperidol concentration in brain tissue sections in mice that were genetically engineered to make no Abcb5 (Abcb5 knockout mice). Finally, the researchers investigated whether specific alleles (alternative versions) of ABCB5 are associated with haloperidol-induced toxicity in people. Among a group of 85 patients treated with haloperidol for a psychotic illness, one specific ABCB5 allele was associated with haloperidol-induced toxicity during the first few days of treatment.
What Do These Findings Mean?
These findings indicate that Abcb5 is a component of the blood-brain barrier in mice and suggest that genetic variants in the gene encoding this protein underlie, at least in part, the differences in susceptibility to haloperidol-induced toxicity seen among inbred mouse strains. Moreover, the human genetic association study indicates that a specific ABCB5 allele also affects the susceptibility of people to haloperidol-induced toxicity. The researchers note that other ABCB5 alleles or other genetic factors that affect haloperidol-induced toxicity in people might emerge if larger groups of patients were studied. However, based on their findings, the researchers propose a new model for the genetic mechanisms that underlie inter-individual and cell type-specific differences in susceptibility to haloperidol-induced brain toxicity. If confirmed in future studies, this model might facilitate the development of more effective and less toxic drugs to treat a range of brain disorders.
Additional Information
Please access these websites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001782.
The US National Institute of Neurological Disorders and Stroke provides information about a wide range of brain diseases (in English and Spanish); its fact sheet “Brain Basics: Know Your Brain” is a simple introduction to the human brain; its “Blueprint Neurotherapeutics Network” was established to develop new drugs for disorders affecting the brain and other parts of the nervous system
MedlinePlus provides links to additional resources about brain diseases and their treatment (in English and Spanish)
Wikipedia provides information about haloperidol, about ATP-binding cassette transporters and about genetic association (note that Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
doi:10.1371/journal.pmed.1001782
PMCID: PMC4315575  PMID: 25647612
23.  Trait Variation in Yeast Is Defined by Population History 
PLoS Genetics  2011;7(6):e1002111.
A fundamental goal in biology is to achieve a mechanistic understanding of how and to what extent ecological variation imposes selection for distinct traits and favors the fixation of specific genetic variants. Key to such an understanding is the detailed mapping of the natural genomic and phenomic space and a bridging of the gap that separates these worlds. Here we chart a high-resolution map of natural trait variation in one of the most important genetic model organisms, the budding yeast Saccharomyces cerevisiae, and its closest wild relatives and trace the genetic basis and timing of major phenotype changing events in its recent history. We show that natural trait variation in S. cerevisiae exceeds that of its relatives, despite limited genetic variation, and follows the population history rather than the source environment. In particular, the West African population is phenotypically unique, with an extreme abundance of low-performance alleles, notably a premature translational termination signal in GAL3 that causes an inability to utilize galactose. Our observations suggest that many S. cerevisiae traits may be the consequence of genetic drift rather than selection, in line with the assumption that natural yeast lineages are remnants of recent population bottlenecks. Disconcertingly, the universal type strain S288C was found to be highly atypical, highlighting the danger of extrapolating gene-trait connections obtained in mosaic, lab-domesticated lineages to the species as a whole. Overall, this study represents a step towards an in-depth understanding of the causal relationship between co-variation in ecology, selection pressure, natural traits, molecular mechanism, and alleles in a key model organism.
Author Summary
An overall aim in modern biology is to achieve an in-depth understanding of an organism's physiology in the context of its ecology and historic selective pressures that have been acting on its genome. The baker's yeast, Saccharomyces cerevisiae, has a peculiar life history completely dominated by clonal reproduction and self-fertilization, prompting the suggestion that natural yeasts are remnants of repeated population bottlenecks in essentially clonal lineages. Such a life history dominated by mitotic proliferation purports a strong evolutionary influence of genetic drift and predicts trait variation to be high and largely defined by the genetic history of each population. Here we chart a highly resolved map of natural trait variation in S. cerevisiae and its closest non-domesticated relative, Saccharomyces paradoxus, and confirm this prediction. We found that trait variation in budding yeast is indeed high and largely defined by population rather than source environment. In particular, the West African population was found to be phenotypically unique with an extreme abundance of low-performance alleles. Our findings support the idea of population bottlenecks in the recent yeast evolutionary history and a large influence of genetic drift.
doi:10.1371/journal.pgen.1002111
PMCID: PMC3116910  PMID: 21698134
24.  ENGINES: exploring single nucleotide variation in entire human genomes 
BMC Bioinformatics  2011;12:105.
Background
Next generation ultra-sequencing technologies are starting to produce extensive quantities of data from entire human genome or exome sequences, and therefore new software is needed to present and analyse this vast amount of information. The 1000 Genomes project has recently released raw data for 629 complete genomes representing several human populations through their Phase I interim analysis and, although there are certain public tools available that allow exploration of these genomes, to date there is no tool that permits comprehensive population analysis of the variation catalogued by such data.
Description
We have developed a genetic variant site explorer able to retrieve data for Single Nucleotide Variation (SNVs), population by population, from entire genomes without compromising future scalability and agility. ENGINES (ENtire Genome INterface for Exploring SNVs) uses data from the 1000 Genomes Phase I to demonstrate its capacity to handle large amounts of genetic variation (>7.3 billion genotypes and 28 million SNVs), as well as deriving summary statistics of interest for medical and population genetics applications. The whole dataset is pre-processed and summarized into a data mart accessible through a web interface. The query system allows the combination and comparison of each available population sample, while searching by rs-number list, chromosome region, or genes of interest. Frequency and FST filters are available to further refine queries, while results can be visually compared with other large-scale Single Nucleotide Polymorphism (SNP) repositories such as HapMap or Perlegen.
Conclusions
ENGINES is capable of accessing large-scale variation data repositories in a fast and comprehensive manner. It allows quick browsing of whole genome variation, while providing statistical information for each variant site such as allele frequency, heterozygosity or FST values for genetic differentiation. Access to the data mart generating scripts and to the web interface is granted from http://spsmart.cesga.es/engines.php
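As a rough illustration of the per-site statistics named above (allele frequency, heterozygosity, FST), the following is a minimal Python sketch for a single biallelic site. It is not ENGINES code: the function names, the 0/1/2 genotype coding, and the unweighted Wright-style estimator FST = (HT - HS) / HT are all assumptions chosen for simplicity, and the abstract does not say which estimator the tool uses.

```python
def allele_frequency(genotypes):
    """Alternate-allele frequency from diploid genotypes coded as 0, 1 or 2
    copies of the alternate allele (coding is an assumption of this sketch)."""
    return sum(genotypes) / (2 * len(genotypes))


def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) at a biallelic site."""
    return 2 * p * (1 - p)


def fst(pop_freqs):
    """Simple Wright-style FST = (HT - HS) / HT for one biallelic site.

    pop_freqs holds one alternate-allele frequency per population.
    Populations are weighted equally, which ignores sample-size differences.
    """
    p_bar = sum(pop_freqs) / len(pop_freqs)
    h_t = expected_heterozygosity(p_bar)  # heterozygosity of the pooled population
    if h_t == 0:
        return 0.0  # site is monomorphic overall, no differentiation to measure
    h_s = sum(expected_heterozygosity(p) for p in pop_freqs) / len(pop_freqs)
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    pop_a = [0, 0, 1, 0, 1]  # hypothetical genotypes, population A
    pop_b = [2, 2, 1, 2, 1]  # hypothetical genotypes, population B
    freqs = [allele_frequency(pop_a), allele_frequency(pop_b)]
    print(freqs, fst(freqs))
```

Equal weighting keeps the arithmetic easy to follow; estimators that correct for sample size would give somewhat different values on real data.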
doi:10.1186/1471-2105-12-105
PMCID: PMC3107182  PMID: 21504571
25.  Coverage and Characteristics of the Affymetrix GeneChip Human Mapping 100K SNP Set  
PLoS Genetics  2006;2(5):e67.
Improvements in technology have made it possible to conduct genome-wide association mapping at costs within reach of academic investigators, and experiments are currently being conducted with a variety of high-throughput platforms. To provide an appropriate context for interpreting results of such studies, we summarize here results of an investigation of one of the first of these technologies to be publicly available, the Affymetrix GeneChip Human Mapping 100K set of single nucleotide polymorphisms (SNPs). In a systematic analysis of the pattern and distribution of SNPs in the Mapping 100K set, we find that SNPs in this set are undersampled from coding regions (both nonsynonymous and synonymous) and oversampled from regions outside genes, relative to SNPs in the overall HapMap database. In addition, we utilize a novel multilocus linkage disequilibrium (LD) coefficient based on information content (analogous to the information content scores commonly used for linkage mapping) that is equivalent to the familiar measure r² in the special case of two loci. Using this approach, we are able to summarize for any subset of markers, such as the Affymetrix Mapping 100K set, the information available for association mapping in that subset, relative to the information available in the full set of markers included in the HapMap, and highlight circumstances in which this multilocus measure of LD provides substantial additional insight about the haplotype structure in a region over pairwise measures of LD.
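For orientation, the familiar two-locus measure that the multilocus coefficient reduces to can be sketched as follows. This is only the standard pairwise r² (the squared correlation between alleles at two biallelic loci, computed from phased haplotypes), not the multilocus coefficient introduced in the article; the function name and the 0/1 allele coding are assumptions of this sketch.

```python
def r_squared(haplotypes):
    """Pairwise LD r^2 from phased haplotypes.

    haplotypes is a list of (a, b) tuples, where a and b are 0 or 1 and mark
    the allele carried at the first and second locus (hypothetical coding).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n                          # freq of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n                          # freq of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n    # freq of the 1-1 haplotype
    d = p_ab - p_a * p_b                                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    return d * d / denom


if __name__ == "__main__":
    # Two loci in complete LD: each haplotype carries matching alleles.
    print(r_squared([(1, 1), (1, 1), (0, 0), (0, 0)]))
    # Two loci in linkage equilibrium: all four haplotypes equally frequent.
    print(r_squared([(1, 1), (1, 0), (0, 1), (0, 0)]))
```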
Synopsis
The ability to survey hundreds of thousands of single nucleotide polymorphisms (SNPs) with cost-effective technologies is enabling investigators to conduct genome-wide association studies designed to find genetic variation affecting disease risk. To facilitate both interpretation of these studies and the design of follow-up studies, Nicolae and colleagues have made a comprehensive survey of the distribution and coverage of the first of these high-throughput platforms for genome-wide association mapping to be made publicly available, the Affymetrix GeneChip Human Mapping 100K set of SNPs (100K set). They found that SNPs within coding sequence are underrepresented in this mapping set relative to the set of SNPs included in the International HapMap Project, and this has consequences for the success of association studies. Measuring the information content confirms that the 100K set provides substantial coverage on variation in the HapMap database. The 100K set is quite redundant, as the SNPs were selected in the absence of information on the correlation (linkage disequilibrium) among them, and thus the relatively high value of the information content in the 100K set for the HapMap SNPs bodes well for general ability to survey genomic variation with a subset of variants.
doi:10.1371/journal.pgen.0020067
PMCID: PMC1456318  PMID: 16680197
