1.  Patterns of Cis Regulatory Variation in Diverse Human Populations 
PLoS Genetics  2012;8(4):e1002639.
The genetic basis of gene expression variation has long been studied to understand the landscape of regulatory variants and, more recently, to help interpret and elucidate disease signals. To date, many studies have focused on specific tissues and population-based samples, but there has been limited assessment of the degree of inter-population variability in regulatory variation. We analyzed genome-wide gene expression in lymphoblastoid cell lines from a total of 726 individuals from 8 global populations from the HapMap3 project and correlated gene expression levels with HapMap3 SNPs located in cis to the genes. We describe the influence of ancestry on gene expression levels within and between these diverse human populations and uncover a non-negligible impact on global patterns of gene expression. We further dissect the specific functional pathways differentiated between populations. We also identify 5,691 expression quantitative trait loci (eQTLs) after controlling for both non-genetic factors and population admixture and observe that half of the cis-eQTLs are replicated in one or more of the populations. We highlight patterns of eQTL sharing between populations, which are partially determined by population genetic relatedness, and discover significant sharing of eQTL effects between Asian, European-admixed, and African subpopulations. Specifically, we observe that both the effect size and the direction of effect for eQTLs are highly conserved across populations. We also observe that eQTLs lie increasingly close to the transcription start site (TSS) as sharing among populations increases, suggesting that variants close to the TSS have stronger effects and are therefore more likely to be detected across a wider panel of populations.
Together these results offer a unique picture and resource of the degree of differentiation among human populations in functional regulatory variation and provide an estimate for the transferability of complex trait variants across populations.
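The core step described above, correlating each gene's expression with SNPs located in cis, can be illustrated with a per-SNP linear regression. The following is a minimal sketch, not the authors' pipeline. The 1 Mb window around the TSS, the additive 0/1/2 genotype coding, and the function name `cis_eqtl_scan` are all assumptions. The study also controlled for non-genetic factors and admixture, which this sketch omits.

```python
import numpy as np


def cis_eqtl_scan(expression, genotypes, snp_pos, tss, window=1_000_000):
    """Regress one gene's expression on every SNP within `window` bp of its TSS.

    expression: (n_individuals,) expression values for one gene
    genotypes:  (n_individuals, n_snps) allele dosages coded 0/1/2 (assumed)
    snp_pos:    (n_snps,) SNP positions on the gene's chromosome
    tss:        transcription start site position of the gene
    Returns a list of (snp_index, slope, r_squared) for the cis SNPs tested.
    No covariates (ancestry, batch, etc.) are included in this sketch.
    """
    y = np.asarray(expression, dtype=float)
    g_all = np.asarray(genotypes, dtype=float)
    pos = np.asarray(snp_pos)
    yc = y - y.mean()
    ss_y = yc @ yc
    if ss_y == 0:  # constant expression: nothing to associate
        return []
    results = []
    # only SNPs inside the (assumed) cis window are tested
    for j in np.flatnonzero(np.abs(pos - tss) <= window):
        g = g_all[:, j] - g_all[:, j].mean()
        ss_g = g @ g
        if ss_g == 0:  # monomorphic SNP in this sample: skip
            continue
        cov = g @ yc
        slope = cov / ss_g                 # least-squares effect per allele
        r2 = cov ** 2 / (ss_g * ss_y)      # variance explained by the SNP
        results.append((int(j), float(slope), float(r2)))
    return results
```

In practice, each gene would be scanned in turn and the resulting p-values corrected for the number of tests before any SNP is called an eQTL.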
Author Summary
Variation among individuals in the degree to which genes are expressed (i.e. turned on or off) is a characteristic exhibited by all species, and studies have identified regions of the genome harboring genetic variation affecting gene expression levels. To assess the degree of human inter-population variability in regulatory variation, we mapped regions of the genome that have functional effects on gene expression levels. We analyzed genome-wide gene expression in human cell lines derived from 726 unrelated individuals representing 8 global populations that have been genetically well-characterized by the International HapMap Project. We describe the influence of ancestry on gene expression levels within and between these diverse human populations and uncover a non-negligible impact on global patterns of gene expression. We identify ∼5,700 genes whose expression levels are associated with genetic variation located physically close to the gene, and we observe significant sharing of these associations among Asian, European-admixed, and African subpopulations, a pattern that partially depends on population genetic relatedness. We identify biological functions affected by regulatory variation and describe common and unique characteristics of population-specific and population-shared associations. These results offer a unique picture and resource of the degree of differentiation among human populations in functional regulatory variation.
doi:10.1371/journal.pgen.1002639
PMCID: PMC3330104  PMID: 22532805
2.  Genome Wide Association Identifies Novel Loci Involved in Fungal Communication 
PLoS Genetics  2013;9(8):e1003669.
Understanding how genomes encode complex cellular and organismal behaviors has become the outstanding challenge of modern genetics. Unlike classical screening methods, analysis of genetic variation that occurs naturally in wild populations can enable rapid, genome-scale mapping of genotype to phenotype with a medium-throughput experimental design. Here we describe the results of the first genome-wide association study (GWAS) used to identify novel loci underlying trait variation in a microbial eukaryote, harnessing wild isolates of the filamentous fungus Neurospora crassa. We genotyped a population of wild Louisiana strains at 1 million genetic loci genome-wide and used these genotypes to map genetic determinants of microbial communication. In N. crassa, germinated asexual spores (germlings) sense the presence of other germlings, grow toward them in a coordinated fashion, and fuse. We evaluated germlings of each strain for their ability to chemically sense, chemotropically seek, and undergo cell fusion, and we subjected these trait measurements to GWAS. This analysis identified one gene, NCU04379 (cse-1, encoding a homolog of a neuronal calcium sensor), at which inheritance was strongly associated with the efficiency of germling communication. Deletion of cse-1 significantly impaired germling communication and fusion, and two genes encoding predicted interaction partners of CSE1 were also required for the communication trait. Additionally, by mining our association results for signaling and secretion genes with a potential role in germling communication, we validated six more previously unknown molecular players, including a secreted protease and two other genes whose deletion conferred a novel phenotype of increased communication and multi-germling fusion. Our results establish protein secretion as a linchpin of germling communication in N. crassa and shed light on the regulation of communication molecules in this fungus.
Our study demonstrates the power of population-genetic analyses for the rapid identification of genes contributing to complex traits in microbial species.
Author Summary
Many phenotypes of interest are controlled by multiple loci, and in biological systems identifying determinants of such complex traits is challenging. Here, we genotyped 112 wild isolates of Neurospora crassa and used this resource to identify genes that mediate a fundamental but poorly understood attribute of this filamentous fungus: the ability of germinating spores to sense each other at a distance, extend projections toward one another, and fuse. Inheritance at a secretion gene, cse-1, was strongly associated with germling communication across wild strains; this association was validated in experiments showing reduced communication in a cse-1 deletion strain. By testing interacting partners of CSE1, and by assessing additional secretion and signaling factors whose inheritance was more modestly associated with germling communication in wild strains, we identified eight other novel determinants of this phenotype. Our population of genotyped wild isolates provides a flexible and powerful community resource for rapidly identifying the genetic determinants of any complex phenotype that varies among N. crassa strains. The success of our approach, which used a phenotyping scheme far more tractable than would be required in a screen of the entire N. crassa gene deletion collection, serves as a proof of concept for association studies of wild populations for any organism.
doi:10.1371/journal.pgen.1003669
PMCID: PMC3731230  PMID: 23935534
3.  Human metabolic profiles are stably controlled by genetic and environmental variation 
A comprehensive variation map of the human metabolome identifies genetic and stable-environmental sources as major drivers of metabolite concentrations. The data suggest that sample sizes of a few thousand are sufficient to detect metabolite biomarkers predictive of disease.
We designed a longitudinal twin study to characterize the genetic, stable-environmental, and longitudinally fluctuating influences on metabolite concentrations in two human biofluids (urine and plasma), focusing specifically on the representative subset of metabolites detectable by 1H nuclear magnetic resonance (1H NMR) spectroscopy.
We identified widespread genetic and stable-environmental influences on the urine and plasma metabolomes: on average, 30% (urine) and 42% (plasma) of variation was attributable to familial sources, and 47% (urine) and 60% (plasma) to longitudinally stable sources.
Ten of the metabolites annotated in the study are estimated to have a familial contribution of more than 60% to their variation in concentration.
Our findings have implications for the design and interpretation of 1H NMR-based molecular epidemiology studies. On the basis of the stable component of variation quantified in the current paper, we specified a model of disease association under which we inferred that sample sizes of a few thousand should be sufficient to detect disease-predictive metabolite biomarkers.
Metabolites are small molecules involved in biochemical processes in living systems. Their concentration in biofluids, such as urine and plasma, can offer insights into the functional status of biological pathways within an organism, and reflect input from multiple levels of biological organization (genetic, epigenetic, transcriptomic, and proteomic) as well as from environmental and lifestyle factors. Metabolite levels have the potential to indicate a broad variety of deviations from the ‘normal’ physiological state, such as those that accompany a disease, or an increased susceptibility to disease. A number of recent studies have demonstrated that metabolite concentrations can be used to diagnose disease states accurately. A more ambitious goal is to identify metabolite biomarkers that are predictive of future disease onset, providing the possibility of intervention in susceptible individuals.
If an extreme concentration of a metabolite is to serve as an indicator of disease status, it is usually important to know the distribution of metabolite levels among healthy individuals. It is also useful to characterize the sources of that observed variation in the healthy population. A proportion of that variation, the heritable component, is attributable to genetic differences between individuals, potentially at many genetic loci. An effective molecular indicator of a heritable, complex disease is likely to have a substantive heritable component. Non-heritable biological variation in metabolite concentrations can arise from a variety of environmental influences, such as dietary intake, lifestyle choices, general physical condition, composition of gut microflora, and use of medication. Variation across a population in stable-environmental influences leads to long-term differences between individuals in their baseline metabolite levels. Dynamic environmental pressures lead to short-term fluctuations within an individual around their baseline level. A metabolite whose concentration changes substantially in response to short-term pressures is relatively unlikely to offer long-term prediction of disease. In summary, the potential suitability of a metabolite to predict disease is reflected by the relative contributions of heritable and stable/unstable-environmental factors to its variation in concentration across the healthy population.
Studies involving twins are an established technique for quantifying the heritable component of phenotypes in human populations. Monozygotic (MZ) twins share the same DNA genome-wide, while dizygotic (DZ) twins share approximately half their inherited DNA, as do ordinary siblings. By comparing the average extent of phenotypic concordance within MZ pairs to that within DZ pairs, it is possible to quantify the heritability of a trait, and also to quantify the familiality, which refers to the combination of heritable and common-environmental effects (i.e., environmental influences shared by twins in a pair). In addition to incorporating twins into the study design, it is useful to quantify the phenotype in some individuals at multiple time points. The longitudinal aspect of such a study allows environmental effects to be decomposed into those that affect the phenotype over the short term and those that exert stable influence.
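The MZ-versus-DZ comparison described above is classically summarized by Falconer's formulas. The following is a minimal sketch under the standard ACE twin-model assumptions: additive genetic effects, MZ pairs sharing all and DZ pairs half of the additive genetic variance, and equal shared environments for both twin types. The authors used their own bespoke methods, so this illustrates the principle, not their estimator. The function name and the example correlations are hypothetical.

```python
def falconer(r_mz, r_dz):
    """Falconer's variance-component estimates from within-pair correlations.

    r_mz, r_dz: correlations of a standardized trait within MZ and DZ pairs.
    Under the ACE model, r_mz = A + C and r_dz = A/2 + C, which gives the
    estimates below (all as proportions of total phenotypic variance).
    """
    h2 = 2 * (r_mz - r_dz)   # A: additive genetic (heritability)
    c2 = 2 * r_dz - r_mz     # C: common (shared) environment
    e2 = 1 - r_mz            # E: unique environment plus measurement error
    # familiality = heritable + common-environmental = r_mz under this model
    return {"h2": h2, "c2": c2, "e2": e2, "familiality": h2 + c2}


# Hypothetical correlations for one metabolite:
# falconer(0.6, 0.4) gives h2 ≈ 0.4, c2 ≈ 0.2, e2 ≈ 0.4, familiality ≈ 0.6
```

The familiality term combines heritable and common-environmental effects, matching the definition in the paragraph above. With twins alone, it is the quantity that separates most cleanly from unique-environmental variation.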
For the current study, urine and blood samples were collected from a cohort of MZ and DZ twins, with some twins donating samples on two occasions several months apart. Samples were analysed by 1H nuclear magnetic resonance (1H NMR) spectroscopy—an untargeted, discovery-driven technique for quantifying metabolite concentrations in biological samples. The application of 1H NMR to a biological sample creates a spectrum, made up of multiple peaks, with each peak's size quantitatively representing the concentration of its corresponding hydrogen-containing metabolite.
In each biological sample in our study, we extracted a full set of peaks, and thereby quantified the concentrations of all common plasma and urine metabolites detectable by 1H NMR. We developed bespoke statistical methods to decompose the observed concentration variation at each metabolite peak into that originating from familial, individual-environmental, and unstable-environmental sources.
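To illustrate how a longitudinal twin design separates familial, individual-environmental, and unstable sources, the following is a deliberately simplified method-of-moments sketch. It is not the authors' bespoke method. It assumes MZ pairs only, each twin sampled at two visits, a familial effect fully shared within a pair, an individual effect constant across visits, and an unstable effect independent between visits and between co-twins. The function name and array layout are hypothetical.

```python
import numpy as np


def decompose_variance(x):
    """Split one metabolite peak's variance into three proportions.

    x: array of shape (n_pairs, 2, 2) indexed as [pair, twin, visit],
       MZ pairs only (assumption).
    Co-twin covariance estimates the familial component; the covariance
    between an individual's two visits estimates the stable component
    (familial + individual-environmental); the rest is unstable.
    """
    x = np.asarray(x, dtype=float)
    total = x.reshape(-1).var(ddof=1)
    # covariance between co-twins at the same visit, averaged over visits
    cov_twin = np.mean([np.cov(x[:, 0, v], x[:, 1, v])[0, 1] for v in (0, 1)])
    # covariance between one person's two visits, averaged over both twins
    cov_visit = np.mean([np.cov(x[:, t, 0], x[:, t, 1])[0, 1] for t in (0, 1)])
    familial = cov_twin / total
    stable = cov_visit / total
    return {
        "familial": familial,
        "individual_env": stable - familial,
        "unstable": 1.0 - stable,
    }
```

The ratios are only as good as the independence assumptions. For example, if co-twins gave samples on the same day, shared short-term exposures would inflate the familial estimate.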
We quantified the variability landscape across all common metabolite peaks in the urine and plasma 1H NMR metabolomes. We annotated a subset of peaks with a total of 65 metabolites; the variance decompositions for these are shown in Figure 1. Ten metabolites' concentrations were estimated to have familial contributions in excess of 60%. The average proportion of stable variation across all extracted metabolite peaks was estimated to be 47% in the urine samples and 60% in the plasma samples; the average estimated familiality was 30% for urine and 42% for plasma. These results comprise the first quantitative variation map of the 1H NMR metabolome. The identification and quantification of substantive widespread stability provides support for the use of these biofluids in molecular epidemiology studies. On the basis of our findings, we performed power calculations for a hypothetical study searching for predictive disease biomarkers among 1H NMR-detectable urine and plasma metabolites. Our calculations suggest that sample sizes of 2000–5000 should allow reliable identification of disease-predictive metabolite concentrations explaining 5–10% of disease risk, while greater sample sizes of 5000–20 000 would be required to identify metabolite concentrations explaining 1–2% of disease risk.
1H Nuclear Magnetic Resonance spectroscopy (1H NMR) is increasingly used to measure metabolite concentrations in sets of biological samples for top-down systems biology and molecular epidemiology. For such purposes, knowledge of the sources of human variation in metabolite concentrations is valuable, but currently sparse. We conducted and analysed a study to create such a resource. In our unique design, identical and non-identical twin pairs donated plasma and urine samples longitudinally. We acquired 1H NMR spectra on the samples, and statistically decomposed variation in metabolite concentration into familial (genetic and common-environmental), individual-environmental, and longitudinally unstable components. We estimate that stable variation, comprising familial and individual-environmental factors, accounts on average for 60% (plasma) and 47% (urine) of biological variation in 1H NMR-detectable metabolite concentrations. Clinically predictive metabolic variation is likely nested within this stable component, so our results have implications for the effective design of biomarker-discovery studies. We provide a power-calculation method which reveals that sample sizes of a few thousand should offer sufficient statistical precision to detect 1H NMR-based biomarkers quantifying predisposition to disease.
doi:10.1038/msb.2011.57
PMCID: PMC3202796  PMID: 21878913
biomarker; 1H nuclear magnetic resonance spectroscopy; metabolome-wide association study; top-down systems biology; variance decomposition
4.  Re-sequencing Expands Our Understanding of the Phenotypic Impact of Variants at GWAS Loci 
PLoS Genetics  2014;10(1):e1004147.
Genome-wide association studies (GWAS) have identified >500 common variants associated with quantitative metabolic traits, but in aggregate such variants explain at most 20–30% of the heritable component of population variation in these traits. To further investigate the impact of genotypic variation on metabolic traits, we conducted re-sequencing studies in >6,000 members of a Finnish population cohort (The Northern Finland Birth Cohort of 1966 [NFBC]) and a type 2 diabetes case-control sample (The Finland-United States Investigation of NIDDM Genetics [FUSION] study). By sequencing the coding sequence and 5′ and 3′ untranslated regions of 78 genes at 17 GWAS loci associated with one or more of six metabolic traits (serum levels of fasting HDL-C, LDL-C, total cholesterol, triglycerides, plasma glucose, and insulin), and conducting both single-variant and gene-level association tests, we obtained a more complete understanding of phenotype-genotype associations at eight of these loci. At all eight of these loci, the identification of new associations provides significant evidence for multiple genetic signals to one or more phenotypes, and at two loci, in the genes ABCA1 and CETP, we found significant gene-level evidence of association to non-synonymous variants with MAF<1%. Additionally, two potentially deleterious variants that demonstrated significant associations (rs138726309, a missense variant in G6PC2, and rs28933094, a missense variant in LIPC) were considerably more common in these Finnish samples than in European reference populations, supporting our prior hypothesis that deleterious variants could attain high frequencies in this isolated population, likely due to the effects of population bottlenecks. Our results highlight the value of large, well-phenotyped samples for rare-variant association analysis, and the challenge of evaluating the phenotypic impact of such variants.
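The gene-level tests mentioned above pool rare variants within a gene, because each variant alone is too infrequent to test. The following is a minimal sketch of one simple member of that family, a burden (collapsing) test for a quantitative trait. The abstract does not say which gene-level test was used, so this should not be read as the authors' method. The 1% MAF threshold mirrors the abstract. The function name, the minor-allele-count coding, and the normal-approximation p-value are assumptions.

```python
import math

import numpy as np


def burden_test(genotypes, phenotype, maf_threshold=0.01):
    """Collapse rare variants in one gene into a per-person allele count
    and regress the quantitative trait on that count.

    genotypes: (n_individuals, n_variants) minor-allele counts 0/1/2
    phenotype: (n_individuals,) quantitative trait values
    Returns (beta, z, p); p is a two-sided large-sample normal approximation.
    """
    g = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    maf = g.mean(axis=0) / 2                   # sample minor-allele frequency
    rare = (maf > 0) & (maf < maf_threshold)   # keep only rare, observed variants
    burden = g[:, rare].sum(axis=1)            # rare-allele count per person
    b = burden - burden.mean()
    yc = y - y.mean()
    ss_b = b @ b
    if ss_b == 0:  # no rare-variant carriers: nothing to test
        return float("nan"), float("nan"), 1.0
    beta = (b @ yc) / ss_b
    resid = yc - beta * b
    se = math.sqrt((resid @ resid) / (len(y) - 2) / ss_b)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))       # two-sided normal p-value
    return float(beta), float(z), p
```

Burden tests are most powerful when most rare variants in the gene act in the same direction. Variance-component tests are the usual alternative when effects may have mixed signs.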
Author Summary
Abnormal serum levels of various metabolites, including measures relevant to cholesterol, other fats, and sugars, are known to be risk factors for cardiovascular disease and type 2 diabetes. Identification of the genes that play a role in generating such abnormalities could advance the development of new treatment and prevention strategies for these disorders. Investigations of common genetic variants carried out in large sets of research subjects have successfully pinpointed such genes within many regions of the human genome. However, these studies often have not led to the identification of the specific genetic variations affecting metabolic traits. To attempt to detect such causal variations, we sequenced genes in 17 genomic regions implicated in metabolic traits in >6,000 people from Finland. By conducting statistical analyses relating specific variations (individually and grouped by gene) to the measures for these metabolic traits observed in the study subjects, we added to our understanding of how genotypes affect these traits. Our findings support a long-held hypothesis that the unique history of the Finnish population provides important advantages for analyzing the relationship between genetic variations and biomedically important traits.
doi:10.1371/journal.pgen.1004147
PMCID: PMC3907339  PMID: 24497850
5.  Mapping the Genetic Architecture of Gene Expression in Human Liver 
PLoS Biology  2008;6(5):e107.
Genetic variants that are associated with common human diseases do not lead directly to disease, but instead act on intermediate, molecular phenotypes that in turn induce changes in higher-order disease traits. Therefore, identifying the molecular phenotypes that vary in response to changes in DNA and that also associate with changes in disease traits has the potential to provide the functional information required not only to identify and validate the susceptibility genes that are directly affected by changes in DNA, but also to understand the molecular networks in which such genes operate and how changes in these networks lead to changes in disease traits. Toward that end, we profiled more than 39,000 transcripts and genotyped 782,476 unique single nucleotide polymorphisms (SNPs) in more than 400 human liver samples to characterize the genetic architecture of gene expression in the human liver, a metabolically active tissue that is important in a number of common human diseases, including obesity, diabetes, and atherosclerosis. This genome-wide association study of gene expression detected more than 6,000 associations between SNP genotypes and liver gene expression traits, and many of the corresponding genes have already been implicated in human diseases. We demonstrate the utility of these data for elucidating the causes of common human diseases by integrating them with genotypic and expression data from other human and mouse populations. This provides much-needed functional support for candidate susceptibility genes at the growing number of loci that genome-wide association studies have identified as key drivers of disease.
Using an integrative genomics approach, we show that our data support RPS26, rather than ERBB3, as the most likely susceptibility gene for a novel type 1 diabetes locus recently identified in a large-scale, genome-wide association study. In the process, we also identify SORT1 and CELSR2 as candidate susceptibility genes for a locus recently associated with coronary artery disease and plasma low-density lipoprotein cholesterol levels.
Author Summary
Genome-wide association studies seek to identify regions of the genome in which changes in DNA in a given population are correlated with disease, drug response, or other phenotypes of interest. However, changes in DNA that associate with traits like common human diseases do not lead directly to disease, but instead act on intermediate, molecular phenotypes that in turn induce changes in the higher-order disease traits. Therefore, identifying molecular phenotypes that vary in response to changes in DNA and that also associate with changes in disease traits can provide the functional information necessary not only to identify and validate the susceptibility genes directly affected by changes in DNA, but also to understand the molecular networks in which such genes operate and how changes in these networks lead to changes in disease traits. To enable this type of approach, we profiled the expression levels of 39,280 transcripts and genotyped 782,476 SNPs in 427 human liver samples, identifying thousands of DNA variants that were strongly associated with liver gene expression. These relationships were then leveraged by integrating them with genotypic and expression data from other human and mouse populations, leading to the direct identification of candidate susceptibility genes corresponding to genetic loci identified as key drivers of disease. Our analysis provides much-needed functional support for these candidate susceptibility genes.
Identifying changes in DNA that associate with changes in gene expression in human tissues elucidates the genetic architecture of gene expression in human populations and enables the direct identification of functionally supported candidate susceptibility genes in genomic regions associated with disease.
doi:10.1371/journal.pbio.0060107
PMCID: PMC2365981  PMID: 18462017
6.  Different sets of QTLs influence fitness variation in yeast 
We have carried out a combination of in-lab evolution (ILE) and congenic crosses to identify the gene sets that contribute to the ability of yeast cells to survive under alkali stress.
Each selected line acquired a different set of mutations, all resulting in the same phenotype. We identified a total of 15 genes in ILE and 17 candidates in the congenic approach, and studied their individual contribution to the phenotype.
The total additive effect of the QTLs was much larger than the difference between the ancestor and the evolved strains, suggesting epistatic interactions between the QTLs.
None of the genes identified encode structural components of the pH machinery. Instead, most encode regulatory functions, such as ubiquitin ligases, chromatin remodelers, GPI anchoring and copper/iron sensing transcription factors.
The majority of phenotypes in nature are complex traits affected by multiple genes (usually called quantitative trait loci, or QTLs), as well as by environmental factors. Many traits of practical importance, such as crop yield in plants and susceptibility to various diseases in humans, fall under this category. Understanding the architecture of complex traits has become the new frontier of genetic research, and many studies have greatly contributed to this field. However, to date, the genetic basis of only a few of these traits has been identified, and many questions regarding the architecture of complex traits and the accumulation of QTLs during evolution remain unanswered. Among them are: How many QTLs affect complex phenotypes? What is the effect of each QTL? How do complex traits change during evolution? Is the adaptation process repeatable? To identify the QTLs that affect one of the important components of fitness variability in yeast, and to answer some of the questions above, we combined in-lab evolution (ILE) with the construction of congenic lines to isolate and map several gene sets that contribute to the ability of yeast cells to survive under alkali stress.
We carried out an ILE experiment, in which we grew yeast populations under increasing alkali stress to enrich for beneficial mutations. This process was followed by hybridizations to tiling arrays to identify the mutations acquired during the laboratory selective process. The ILE procedure revealed mutations in 15 genes, thus defining the QTLs and mechanisms that affect, in a quantitative fashion, the ability to cope with alkali stress. Our results indicate that during ILE several populations acquired different sets of QTLs that conferred the same phenotype. We identified each individual mutation in these strains, and validated and estimated their contribution to the phenotype. The total additive effect of the QTLs was much larger than the difference between the ancestor and the evolved strains, suggesting epistatic interactions between the QTLs.
In addition to the ILE, we have studied the mechanisms regulating fitness under alkali stress in natural habitats. We used a clinically isolated strain able to grow at high pH and a standard laboratory strain with a limited ability to sustain high pH as the parents of a series of backcrosses to construct congenic lines up to the 8th generation. Seventeen genomic intervals that are candidates to contain QTLs were thus identified. To detect the contributing QTL in each interval, a predictive algorithm was applied, which scored the candidate genes in each genomic interval based on their interactions and similarity to the ILE genes. The algorithm was validated by testing the effect of deleting the predicted candidate genes on the phenotype. Twelve out of 29 deletions were found to affect the trait (P-value 0.023).
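The candidate-scoring step above ranks genes in each congenic interval by how closely they relate to the ILE genes, and the summary of this study describes the algorithm as based on distances in protein-interaction networks. The following is a minimal sketch of that idea using only shortest-path distance. Scoring each candidate by its minimum hop count to any ILE gene is an assumption, and the similarity features the authors also used are omitted. The function names and the toy network are hypothetical.

```python
import math
from collections import deque


def bfs_distances(graph, source):
    """Hop counts from `source` in an unweighted, undirected interaction
    network given as {gene: set_of_neighbours} (edges listed both ways)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def rank_candidates(graph, interval_genes, ile_genes):
    """Order the genes of one congenic interval by their minimum network
    distance to any ILE gene (closest first; unreachable genes last)."""
    score = {}
    for gene in interval_genes:
        dist = bfs_distances(graph, gene)
        score[gene] = min((dist.get(t, math.inf) for t in ile_genes),
                          default=math.inf)
    return sorted(interval_genes, key=lambda gene: score[gene])


# Hypothetical toy network: A-B-C-X chain plus an isolated gene Y
toy = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "X"}, "X": {"C"}, "Y": set()}
# rank_candidates(toy, ["A", "C", "Y"], ["X"]) -> ["C", "A", "Y"]
```

The top-ranked genes in each interval would then be deletion-tested, as in the validation step described above.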
Interestingly, our results show that almost all beneficial mutations affected regulatory genes rather than structural components of the pH homeostasis machinery (such as proton pumps, which control the cell's pH). The genes identified encode global regulators, such as ubiquitin ligases, proteins involved in GPI anchoring, copper sensing and chromatin remodelers. Thus, we show that adaptive changes tend to occur in genes with wide influence, rather than in genes narrowly affecting the phenotype selected for.
One example of genes identified by both the ILE and the congenic approaches is the copper-sensing transcription factor MAC1 and its downstream targets CTR1 and CTR3, which encode copper transporters. Different mutations at the same residue (Cys 271) were found in four out of five independent ILE lines. These mutations inactivate a copper-sensing region of Mac1 and cause up-regulation of its target genes. The CTR1 and CTR3 genes were identified in the congenic lines. Moreover, we found that a Ty transposable element is responsible for the decreased expression of CTR3 in some strains, and its excision caused transcriptional activation, affecting the ability to thrive at high pH.
This work provides insights into both evolutionary and genetic issues (such as the appearance of adaptive mutations and the architecture of complex traits), while also providing information about the mechanisms that contribute to growth at high pH, a subject with ramifications for cell physiology, pathogenicity, and stress response.
Most of the phenotypes in nature are complex and are determined by many quantitative trait loci (QTLs). In this study we identify gene sets that contribute to one important complex trait: the ability of yeast cells to survive under alkali stress. We carried out an in-lab evolution (ILE) experiment, in which we grew yeast populations under increasing alkali stress to enrich for beneficial mutations. The populations acquired different sets of contributing alleles, showing that evolution can provide alternative solutions to the same challenge. We measured the contribution of each allele to the phenotype. The sum of the effects of the QTLs was larger than the difference between the ancestor phenotype and the evolved strains, suggesting epistatic interactions between the QTLs. In parallel, a clinically isolated strain was used to map natural QTLs affecting growth at high pH. In all, 17 candidate regions were found. Using a predictive algorithm based on the distances in protein-interaction networks, candidate genes were defined and validated by gene disruption. Many of the QTLs found by both methods are not directly implicated in pH homeostasis but have more general, and often regulatory, roles.
doi:10.1038/msb.2010.1
PMCID: PMC2835564  PMID: 20160707
congenic lines; growth on alkali; in-lab evolution; QTL mapping; Saccharomyces cerevisiae
7.  Geographic Distribution of Staphylococcus aureus Causing Invasive Infections in Europe: A Molecular-Epidemiological Analysis 
PLoS Medicine  2010;7(1):e1000215.
Hajo Grundmann and colleagues describe the development of a new interactive mapping tool for analyzing the spatial distribution of invasive Staphylococcus aureus clones.
Background
Staphylococcus aureus is one of the most important human pathogens and methicillin-resistant variants (MRSAs) are a major cause of hospital and community-acquired infection. We aimed to map the geographic distribution of the dominant clones that cause invasive infections in Europe.
Methods and Findings
In each country, staphylococcal reference laboratories secured the participation of a sufficient number of hospital laboratories to achieve national geo-demographic representation. Participating laboratories collected successive methicillin-susceptible (MSSA) and MRSA isolates from patients with invasive S. aureus infection using an agreed protocol. All isolates were sent to the respective national reference laboratories and characterised by quality-controlled sequence typing of the variable region of the staphylococcal spa gene (spa typing), and data were uploaded to a central database. Relevant genetic and phenotypic information was assembled for interactive interrogation by a purpose-built Web-based mapping application. Between September 2006 and February 2007, 357 laboratories serving 450 hospitals in 26 countries collected 2,890 MSSA and MRSA isolates from patients with invasive S. aureus infection. A wide geographical distribution of spa types was found with some prevalent in all European countries. MSSA were more diverse than MRSA. Genetic diversity of MRSA differed considerably between countries with dominant MRSA spa types forming distinctive geographical clusters. We provide evidence that a network approach consisting of decentralised typing and visualisation of aggregated data using an interactive mapping tool can provide important information on the dynamics of MRSA populations such as early signalling of emerging strains, cross border spread, and importation by travel.
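Statements such as "MSSA were more diverse than MRSA" are commonly quantified in molecular typing with Simpson's index of diversity. The Hunter-Gaston form gives the probability that two isolates drawn at random without replacement have different types. The abstract does not say which diversity measure the authors used, so the following is a minimal, hedged illustration rather than their analysis. The function name and the type labels in the usage comment are hypothetical.

```python
from collections import Counter


def simpsons_diversity(types):
    """Simpson's index of diversity (Hunter-Gaston form) for a collection of
    isolates labelled by type, e.g. spa types.

    D = 1 - sum(n_i * (n_i - 1)) / (N * (N - 1)), where n_i is the number of
    isolates of type i and N is the total number of isolates.
    0 means all isolates share one type; values near 1 mean high diversity.
    """
    counts = Counter(types).values()
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two isolates")
    return 1 - sum(c * (c - 1) for c in counts) / (n * (n - 1))


# Hypothetical labels: three types among four isolates
# simpsons_diversity(["type1", "type1", "type2", "type3"]) ≈ 0.833
```

Computing the index separately for the MSSA and MRSA isolates of each country would give one number per group that can be compared directly.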
Conclusions
In contrast to MSSA, MRSA spa types have a predominantly regional distribution in Europe. This finding is indicative of the selection and spread of a limited number of clones within health care networks, suggesting that control efforts aimed at interrupting the spread within and between health care institutions may not only be feasible but ultimately successful and should therefore be strongly encouraged.
Editors' Summary
Background
The bacterium Staphylococcus aureus lives on the skin and in the nose of about a third of healthy people. Although S. aureus usually coexists peacefully with its human carriers, it is also an important disease-causing organism or pathogen. If it enters the body through a cut or during a surgical procedure, S. aureus can cause minor infections such as pimples and boils or more serious, life-threatening infections such as blood poisoning and pneumonia. Minor S. aureus infections can be treated without antibiotics—by draining a boil, for example. Invasive infections are usually treated with antibiotics. Unfortunately, many of the S. aureus clones (groups of bacteria that are all genetically related and descended from a single, common ancestor) that are now circulating are resistant to methicillin and several other antibiotics. Invasive methicillin-resistant S. aureus (MRSA) infections are a particular problem in hospitals and other health care facilities (so-called hospital-acquired MRSA infections), but they can also occur in otherwise healthy people who have not been admitted to a hospital (community-acquired MRSA infections).
Why Was This Study Done?
The severity and outcome of an S. aureus infection in an individual depend in part on the ability of the bacterial clone with which the individual is infected to cause disease: the clone's “virulence.” Public-health officials and infectious disease experts would like to know the geographic distribution of the virulent S. aureus clones that cause invasive infections, because this information should help them understand how these pathogens spread and thus how to control them. Different clones of S. aureus can be distinguished by “molecular typing,” the determination of clone-specific sequences of nucleotides in variable regions of the bacterial genome (the bacterium's blueprint; genomes consist of DNA, long chains of nucleotides). In this study, the researchers use molecular typing to map the geographic distribution of MRSA and methicillin-sensitive S. aureus (MSSA) clones causing invasive infections in Europe. Because an MRSA clone emerges when an MSSA clone acquires antibiotic resistance from another type of bacteria, it is useful to understand the geographic distribution of both MRSA and MSSA.
What Did the Researchers Do and Find?
Between September 2006 and February 2007, 357 laboratories serving 450 hospitals in 26 European countries collected almost 3,000 MRSA and MSSA isolates from patients with invasive S. aureus infections. The isolates were sent to the relevant national staphylococcal reference laboratory (SRL) where they were characterized by quality-controlled sequence typing of the variable region of a staphylococcal gene called spa (spa typing). The spa typing data were entered into a central database and then analyzed by a public, purpose-built Web-based mapping tool (SRL-Maps), which provides interactive access and easy-to-understand illustrations of the geographical distribution of S. aureus clones. Using this mapping tool, the researchers found that there was a wide geographical distribution of spa types across Europe with some types being common in all European countries. MSSA isolates were more diverse than MRSA isolates and the genetic diversity (variability) of MRSA differed considerably between countries. Most importantly, major MRSA spa types occurred in distinct geographical clusters.
What Do These Findings Mean?
These findings provide the first representative snapshot of the genetic population structure of S. aureus across Europe. Because the researchers used spa typing, which analyzes only a small region of one gene, and characterized fewer than 3,000 isolates, analysis of other parts of the S. aureus genome in more isolates is now needed to build a complete portrait of the geographical abundance of the S. aureus clones that cause invasive infections in Europe. However, the finding that MRSA spa types occur mainly in geographical clusters has important implications for the control of MRSA, because it indicates that a limited number of clones are spreading within health care networks, which means that MRSA is mainly spread by patients who are repeatedly admitted to different hospitals. Control efforts aimed at interrupting this spread within and between health care institutions may be feasible and ultimately successful, suggest the researchers, and should be strongly encouraged. In addition, this study shows how, by sharing typing results on a Web-based platform, an international surveillance network can provide clinicians and infection control teams with crucial information about the dynamics of pathogens such as S. aureus, including early warnings about emerging virulent clones.
Additional Information
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1000215.
This study is further discussed in a PLoS Medicine Perspective by Franklin D. Lowy
The UK Health Protection Agency provides information about Staphylococcus aureus
The UK National Health Service Choices Web site has pages on staphylococcal infections and on MRSA
The US National Institute of Allergy and Infectious Diseases has information about MRSA
The US Centers for Disease Control and Prevention provides information about MRSA for the public and professionals
MedlinePlus provides links to further resources on staphylococcal infections and on MRSA (in English and Spanish)
SRL-Maps can be freely accessed
doi:10.1371/journal.pmed.1000215
PMCID: PMC2796391  PMID: 20084094
8.  Selection on Plant Male Function Genes Identifies Candidates for Reproductive Isolation of Yellow Monkeyflowers 
PLoS Genetics  2013;9(12):e1003965.
Understanding the genetic basis of reproductive isolation promises insight into speciation and the origins of biological diversity. While progress has been made in identifying genes underlying barriers to reproduction that function after fertilization (post-zygotic isolation), we know much less about earlier acting pre-zygotic barriers. Of particular interest are barriers involved in mating and fertilization that can evolve extremely rapidly under sexual selection, suggesting they may play a prominent role in the initial stages of reproductive isolation. A significant challenge to the field of speciation genetics is developing new approaches for identification of candidate genes underlying these barriers, particularly among non-traditional model systems. We employ powerful proteomic and genomic strategies to study the genetic basis of conspecific pollen precedence, an important component of pre-zygotic reproductive isolation among yellow monkeyflowers (Mimulus spp.) resulting from male pollen competition. We use isotopic labeling in combination with shotgun proteomics to identify more than 2,000 male function (pollen tube) proteins within maternal reproductive structures (styles) of M. guttatus flowers where pollen competition occurs. We then sequence array-captured pollen tube exomes from a large outcrossing population of M. guttatus, and identify those genes with evidence of selective sweeps or balancing selection consistent with their role in pollen competition. We also test for evidence of positive selection on these genes more broadly across yellow monkeyflowers, because a signal of adaptive divergence is a common feature of genes causing reproductive isolation. Together the molecular evolution studies identify 159 pollen tube proteins that are candidate genes for conspecific pollen precedence. 
Our work demonstrates how powerful proteomic and genomic tools can be readily adapted to non-traditional model systems, allowing for genome-wide screens towards the goal of identifying the molecular basis of genetically complex traits.
Author Summary
Barriers to reproduction are necessary for generating new species. Little is known about the genes underlying reproductive barriers, particularly those that function prior to fertilization, but their identity is of great interest as they offer insight into the genetic mechanisms and evolutionary forces generating biological diversity. In this work, we use an emerging plant model system for speciation studies (yellow monkeyflowers, species of Mimulus) to identify genes that might influence the relative competitive abilities of male pollen from the same versus different species within the maternal flower's style. This is a common reproductive barrier among plant taxa known as conspecific pollen precedence (CPP), and is analogous to sperm competition during animal fertilization. We first identify the pollen proteins that are found within the style where pollen competition occurs, and then screen these for evidence that may indicate which genes have been targets of pollen competition (a form of sexual selection among individuals of a population) or adaptive diversification among species of yellow monkeyflowers (a common feature of genes underlying reproductive barriers). Our evolutionary analyses identify 159 candidates that may function in reproductive isolation of yellow monkeyflowers, and provide some of the first broad perspectives on evolution of plant reproductive genes.
doi:10.1371/journal.pgen.1003965
PMCID: PMC3854799  PMID: 24339787
9.  Genome-Wide Association Studies in an Isolated Founder Population from the Pacific Island of Kosrae 
PLoS Genetics  2009;5(2):e1000365.
It has been argued that the limited genetic diversity and reduced allelic heterogeneity observed in isolated founder populations facilitate discovery of loci contributing to both Mendelian and complex disease. A strong founder effect, severe isolation, and substantial inbreeding have dramatically reduced genetic diversity in natives from the island of Kosrae, Federated States of Micronesia, who exhibit a high prevalence of obesity and other metabolic disorders. We hypothesized that genetic drift and possibly natural selection on Kosrae might have increased the frequency of previously rare genetic variants with relatively large effects, making these alleles readily detectable in genome-wide association analysis. However, mapping in large, inbred cohorts introduces analytic challenges, as extensive relatedness between subjects violates the assumptions of independence upon which traditional association test statistics are based. We performed genome-wide association analysis for 15 quantitative traits in 2,906 members of the Kosrae population, using novel approaches to manage the extreme relatedness in the sample. As positive controls, we observe association to known loci for plasma cholesterol, triglycerides, and C-reactive protein and to compelling candidate loci for thyroid-stimulating hormone and fasting plasma glucose. We show that our study is well powered to detect common alleles explaining ≥5% of phenotypic variance. However, no such large effects were observed with genome-wide significance, arguing that even in such a severely inbred population, common alleles typically have modest effects. Finally, we show that a majority of common variants discovered in Caucasians have indistinguishable effect sizes on Kosrae, despite the major differences in population genetics and environment.
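The abstract does not spell out the "novel approaches" used to handle relatedness. A standard building block for relatedness-aware association testing in general is a genomic relationship matrix (GRM) estimated from SNP genotypes, which mixed-model tests use as a covariance structure. The sketch below computes a simple standardized GRM; it is a generic illustration, not the authors' method, and the genotype data are hypothetical.

```python
def genomic_relationship_matrix(genotypes):
    """Simple standardized GRM.

    A[j][k] = (1/M) * sum_i (x_ij - 2 p_i) * (x_ik - 2 p_i) / (2 p_i (1 - p_i))

    genotypes: one list per individual of allele counts (0, 1, 2) at the same
    SNPs. p_i is the sample frequency of the counted allele. Monomorphic SNPs
    are skipped because they carry no information about relatedness.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    grm = [[0.0] * n for _ in range(n)]
    used = 0
    for i in range(m):
        column = [ind[i] for ind in genotypes]
        p = sum(column) / (2 * n)
        if p in (0.0, 1.0):
            continue
        scale = 2 * p * (1 - p)
        centered = [x - 2 * p for x in column]
        for j in range(n):
            for k in range(n):
                grm[j][k] += centered[j] * centered[k] / scale
        used += 1
    if used == 0:
        raise ValueError("no polymorphic SNPs")
    return [[value / used for value in row] for row in grm]


# Hypothetical genotypes: individuals 0 and 1 are genetically identical.
geno = [
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 1, 0, 2],
    [2, 1, 0, 0, 1, 1],
    [1, 0, 1, 2, 2, 0],
]
A = genomic_relationship_matrix(geno)
for row in A:
    print(" ".join(f"{value:6.2f}" for value in row))
```

Large off-diagonal entries flag pairs that share more alleles than expected under independence; those are the correlations that an ordinary regression-based association test would ignore.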
Author Summary
Isolated populations have contributed to the discovery of loci with simple Mendelian segregation and large effects on disease risk or trait variation. We hypothesized that the use of isolated populations might also facilitate the discovery of common alleles contributing to complex traits with relatively larger effects. However, the use of association analyses to map common loci influencing trait variation in large, inbred cohorts introduces analytic challenges, as extensive relatedness between subjects violates the assumptions of independence upon which traditional association test statistics are based. We developed an analytic strategy to perform genome-wide association studies in an inbred family containing over 2,800 individuals from the island of Kosrae, Federated States of Micronesia. No alleles with large effect were observed with strong statistical support in any of the 15 traits examined, suggesting that the contribution of individual common variants to complex trait variation in Kosraeans is typically not much greater than that observed in other populations. We show that the effects of many loci previously identified in Caucasian populations are indistinguishable in Caucasians and Kosraeans, despite very different population genetics and environmental influences.
doi:10.1371/journal.pgen.1000365
PMCID: PMC2628735  PMID: 19197348
10.  A Multiparent Advanced Generation Inter-Cross to Fine-Map Quantitative Traits in Arabidopsis thaliana 
PLoS Genetics  2009;5(7):e1000551.
Identifying natural allelic variation that underlies quantitative trait variation remains a fundamental problem in genetics. Most studies have employed either simple synthetic populations with restricted allelic variation or performed association mapping on a sample of naturally occurring haplotypes. Both of these approaches have some limitations; therefore, alternative resources for the genetic dissection of complex traits continue to be sought. Here we describe one such alternative, the Multiparent Advanced Generation Inter-Cross (MAGIC). This approach is expected to improve the precision with which QTL can be mapped, improving the outlook for QTL cloning. We present the first panel of MAGIC lines developed: a set of 527 recombinant inbred lines (RILs) descended from a heterogeneous stock of 19 intermated accessions of the plant Arabidopsis thaliana. These lines and the 19 founders were genotyped with 1,260 single nucleotide polymorphisms and phenotyped for development-related traits. Analytical methods were developed to fine-map quantitative trait loci (QTL) in the MAGIC lines by reconstructing the genome of each line as a mosaic of the founders. We show by simulation that QTL explaining 10% of the phenotypic variance will be detected in most situations with an average mapping error of about 300 kb, and that if the number of lines were doubled the mapping error would be under 200 kb. We also show how the power to detect a QTL and the mapping accuracy vary, depending on QTL location. We demonstrate the utility of this new mapping population by mapping several known QTL with high precision and by finding novel QTL for germination data and bolting time. Our results provide strong support for similar ongoing efforts to produce MAGIC lines in other organisms.
Author Summary
Most traits of economic and evolutionary interest vary quantitatively and have multiple genes affecting their expression. Dissecting the genetic basis of such traits is crucial for the improvement of crops and management of diseases. Here, we develop a new resource to identify genes underlying such quantitative traits in Arabidopsis thaliana, a genetic model organism in plants. We show that using a large population of inbred lines derived from intercrossing 19 parents, we can localize the genes underlying quantitative traits better than with existing methods. Using these lines, we were able to replicate the identification of previously known genes that affect developmental traits in A. thaliana and identify some new ones. This paper also presents all the biological and computational material necessary for the scientific community to use these lines in their own research. Our results suggest that lines derived from a multiparent advanced generation inter-cross (MAGIC lines) should be very useful in other organisms.
doi:10.1371/journal.pgen.1000551
PMCID: PMC2700969  PMID: 19593375
11.  Characterization of the Metabochip in Diverse Populations from the International HapMap Project in the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) Project 
Genome-wide association studies (GWAS) have identified hundreds of genomic regions associated with common human disease and quantitative traits. A major research avenue for mature genotype-phenotype associations is the identification of the true risk or functional variant for downstream molecular studies or personalized medicine applications. As part of the Population Architecture using Genomics and Epidemiology (PAGE) study, we, the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) study, are fine-mapping GWAS-identified genomic regions for common diseases and quantitative traits. We are currently genotyping the Metabochip, a custom-content BeadChip designed for fine-mapping metabolic diseases and traits, in ~15,000 DNA samples from patients of African, Hispanic, and Asian ancestry linked to deidentified electronic medical records from the Vanderbilt University biorepository (BioVU). As an initial quality-control study, we report here the genotyping data for 360 samples of European, African, Asian, and Mexican descent from the International HapMap Project. In addition to quality-control metrics, we report the overall allele frequency distribution, overall population differentiation (as measured by FST), and linkage disequilibrium patterns for a selected GWAS-identified region associated with low-density lipoprotein cholesterol levels. These results illustrate the utility of the Metabochip for fine-mapping studies in the diverse populations expected in EAGLE, the PAGE study, and other efforts underway to characterize the complex genetic architecture underlying common human disease and quantitative traits.
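Population differentiation "as measured by FST" can be made concrete with the textbook definition for a single biallelic SNP, FST = (HT − HS)/HT, where HT is the expected heterozygosity of the pooled population and HS is the mean expected heterozygosity within subpopulations. The sketch below assumes equally weighted subpopulations and hypothetical allele frequencies; published analyses often use sample-size-corrected estimators such as Weir and Cockerham's, and the abstract does not say which estimator was applied.

```python
def fst_biallelic(freqs):
    """Wright/Nei-style F_ST = (H_T - H_S) / H_T for one biallelic SNP.

    freqs: frequency of the same allele in each subpopulation. All
    subpopulations are weighted equally (a simplifying assumption).
    """
    k = len(freqs)
    p_bar = sum(freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)                  # heterozygosity of pooled population
    h_s = sum(2 * p * (1 - p) for p in freqs) / k  # mean within-subpopulation heterozygosity
    if h_t == 0:
        return 0.0                                 # SNP monomorphic overall
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies in three populations (NOT HapMap values).
print(f"{fst_biallelic([0.1, 0.5, 0.9]):.3f}")  # strongly differentiated SNP
print(f"{fst_biallelic([0.3, 0.3, 0.3]):.3f}")  # identical frequencies
```

FST is 0 when every population has the same allele frequency and approaches 1 as populations become fixed for different alleles.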
PMCID: PMC3584704  PMID: 23424124
12.  Liver and Adipose Expression Associated SNPs Are Enriched for Association to Type 2 Diabetes 
PLoS Genetics  2010;6(5):e1000932.
Genome-wide association studies (GWAS) have demonstrated the ability to identify the strongest causal common variants in complex human diseases. However, to date, the massive data generated from GWAS have not been maximally explored to identify true associations that fail to meet the stringent level of association required to achieve genome-wide significance. Genetics of gene expression (GGE) studies have shown promise towards identifying DNA variations associated with disease and providing a path to functionally characterize findings from GWAS. Here, we present the first empiric study to systematically characterize the set of single nucleotide polymorphisms associated with expression (eSNPs) in liver, subcutaneous fat, and omental fat tissues, demonstrating these eSNPs are significantly more enriched for SNPs that associate with type 2 diabetes (T2D) in three large-scale GWAS than a matched set of randomly selected SNPs. This enrichment for T2D association increases as we restrict to eSNPs that correspond to genes comprising gene networks constructed from adipose gene expression data isolated from a mouse population segregating a T2D phenotype. Finally, by restricting to eSNPs corresponding to genes comprising an adipose subnetwork strongly predicted as causal for T2D, we dramatically increased the enrichment for SNPs associated with T2D and were able to identify a functionally related set of diabetes susceptibility genes. We identified and validated malic enzyme 1 (Me1) as a key regulator of this T2D subnetwork in mouse and provided support for the association of this gene to T2D in humans. This integration of eSNPs and networks provides a novel approach to identify disease susceptibility networks rather than the single SNPs or genes traditionally identified through GWAS, thereby extracting additional value from the wealth of data currently being generated by GWAS.
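The enrichment claim compares how often eSNPs are T2D-associated with how often matched random SNPs are. The abstract describes a comparison against matched random SNP sets; a simpler analytic alternative, substituted here for illustration, is a one-sided hypergeometric test together with a fold-enrichment ratio. All counts below are hypothetical.

```python
from math import comb


def fold_enrichment(k, n_draw, big_k, big_n):
    """Observed overlap rate divided by the rate expected by chance."""
    return (k / n_draw) / (big_k / big_n)


def hypergeom_enrichment_p(k, n_draw, big_k, big_n):
    """One-sided P(X >= k), X ~ Hypergeometric(big_n, big_k, n_draw).

    big_n: SNPs tested in total; big_k: SNPs associated with the trait;
    n_draw: SNPs in the set of interest (e.g. eSNPs); k: observed overlap.
    """
    upper = min(n_draw, big_k)
    total = comb(big_n, n_draw)
    hits = sum(comb(big_k, i) * comb(big_n - big_k, n_draw - i)
               for i in range(k, upper + 1))
    return hits / total


# Hypothetical counts, NOT from the study: 10,000 SNPs, 500 "T2D-associated",
# 200 eSNPs, of which 25 are T2D-associated (10 expected by chance).
fold = fold_enrichment(25, 200, 500, 10_000)
p_value = hypergeom_enrichment_p(25, 200, 500, 10_000)
print(f"fold enrichment = {fold:.2f}, one-sided p = {p_value:.2e}")
```

A permutation scheme with frequency- and LD-matched random SNPs, as the abstract implies, guards against confounders that this closed-form test ignores, such as differences in allele frequency between eSNPs and other SNPs.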
Author Summary
Genome-wide association studies (GWAS) seek to identify loci in which changes in DNA are correlated with disease. However, GWAS do not necessarily lead directly to genes associated with disease, and they do not typically inform the broader context in which disease genes operate, thereby providing limited insights into the mechanisms driving disease. One critical step toward extracting further insight from GWAS is developing an understanding of the genetics of gene expression (GGE). We present the first empiric study demonstrating that SNPs in human cohorts that associate with gene expression in liver and adipose tissues are enriched for association with Type 2 Diabetes (T2D) in humans. By filtering “eSNPs” based on causal gene networks defined in an experimental cross population segregating T2D traits, we demonstrate a dramatically increased enrichment of T2D SNPs that enhances our ability to assess T2D risk. We demonstrate the utility of this approach by identifying malic enzyme 1 (ME1) as a novel T2D susceptibility gene in humans and then functionally validating the causal connection between ME1 and T2D in a mouse knockout model for Me1. This approach provides a path to identifying disease susceptibility networks rather than the single SNPs or genes traditionally identified through GWAS.
doi:10.1371/journal.pgen.1000932
PMCID: PMC2865508  PMID: 20463879
13.  Combining Genome-Wide Association Mapping and Transcriptional Networks to Identify Novel Genes Controlling Glucosinolates in Arabidopsis thaliana 
PLoS Biology  2011;9(8):e1001125.
Genome-wide association mapping is highly sensitive to environmental changes, but network analysis allows rapid causal gene identification.
Background
Genome-wide association (GWA) is gaining popularity as a means to study the architecture of complex quantitative traits, partly because of improvements in high-throughput, low-cost genotyping and phenotyping technologies. Glucosinolate (GSL) secondary metabolites within Arabidopsis spp. can serve as a model system to understand the genomic architecture of adaptive quantitative traits. GSL are key anti-herbivory defenses that impart adaptive advantages within field trials. While little is known about how variation in the external or internal environment of an organism may influence the efficiency of GWA, GSL variation is known to be highly dependent upon the external stresses and developmental processes of the plant, making it an excellent model for studying conditional GWA.
Methodology/Principal Findings
To understand how development and environment can influence GWA, we conducted a study using 96 Arabidopsis thaliana accessions, >40 GSL phenotypes across three conditions (one developmental comparison and one environmental comparison) and ∼230,000 SNPs. Developmental stage had dramatic effects on the outcome of GWA, with each stage identifying different loci associated with GSL traits. Further, while the molecular bases of numerous quantitative trait loci (QTL) controlling GSL traits have been identified, there is currently no estimate of how many additional genes may control natural variation in these traits. We developed a novel co-expression network approach to prioritize the thousands of GWA candidates and successfully validated a large number of these genes as influencing GSL accumulation within A. thaliana using single gene isogenic lines.
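The abstract says a co-expression network approach was used to prioritize thousands of GWA candidates but gives no details. A generic "guilt-by-association" filter conveys the idea: keep a candidate only if its expression profile correlates strongly with at least one gene already known to act in the pathway. This is a swapped-in illustration, not the authors' method; the gene names, expression values, and the 0.8 threshold are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return sxy / sqrt(sxx * syy)


def prioritize(candidates, known, expression, threshold=0.8):
    """Keep candidates with |r| >= threshold to at least one known gene.

    Returns {gene: best |r|}, sorted from strongest to weakest link.
    """
    kept = {}
    for gene in candidates:
        best = max(abs(pearson(expression[gene], expression[k])) for k in known)
        if best >= threshold:
            kept[gene] = best
    return dict(sorted(kept.items(), key=lambda item: -item[1]))


# Hypothetical expression profiles across five samples.
expression = {
    "KNOWN_A": [1, 2, 3, 4, 5],
    "CAND_1": [2, 4, 6, 8, 10],  # tracks KNOWN_A exactly
    "CAND_2": [2, 5, 1, 5, 2],   # uncorrelated with KNOWN_A
    "CAND_3": [9, 8, 6, 3, 1],   # strongly anti-correlated
}
ranked = prioritize(["CAND_1", "CAND_2", "CAND_3"], ["KNOWN_A"], expression)
print(ranked)
```

Using |r| keeps negatively co-regulated genes as well; whether that is appropriate depends on the biology and is a design choice of this sketch.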
Conclusions/Significance
Together, these results suggest that complex traits imparting environmentally contingent adaptive advantages are likely influenced by up to thousands of loci that are sensitive to fluctuations in the environment or developmental state of the organism. Additionally, while GWA is highly conditional upon genetics, the use of additional genomic information can rapidly identify causal loci en masse.
Author Summary
Understanding how genetic variation can control phenotypic variation is a fundamental goal of modern biology. A major push has been made, across all organisms, to use genome-wide association mapping in an attempt to rapidly identify the genes contributing to phenotypes such as disease and nutritional disorders. But a number of fundamental questions about the use of genome-wide association have not been answered: for example, how does the internal or external environment influence the genes found? Furthermore, even the simple question of how many genes may influence a trait remains unanswered. Finally, a number of studies have identified significant false-positive and false-negative issues within genome-wide association studies that are not solvable by direct statistical approaches. We have used genome-wide association mapping in the plant Arabidopsis thaliana to begin exploring these questions. We show that both external and internal environments significantly alter the identified genes, such that using different tissues can lead to the identification of nearly completely different gene sets. Given the large number of potential false positives, we developed an orthogonal approach to filtering the possible genes, by identifying co-functioning networks using the nominal candidate gene list derived from genome-wide association studies. This allowed us to rapidly identify and validate a large number of novel and unexpected genes that affect Arabidopsis thaliana defense metabolism within phenotypic ranges that have been shown to be selectable within the field. These genes and the associated networks suggest that Arabidopsis thaliana defense metabolism is more consistent with the infinite gene hypothesis, according to which a vast number of causative genes control natural variation in this phenotype. It remains to be seen how frequently this is true for other organisms and other phenotypes.
doi:10.1371/journal.pbio.1001125
PMCID: PMC3156686  PMID: 21857804
14.  Linkage and Association Mapping of Arabidopsis thaliana Flowering Time in Nature 
PLoS Genetics  2010;6(5):e1000940.
Flowering time is a key life-history trait in the plant life cycle. Most studies to unravel the genetics of flowering time in Arabidopsis thaliana have been performed under greenhouse conditions. Here, we describe a study about the genetics of flowering time that differs from previous studies in two important ways: first, we measure flowering time in a more complex and ecologically realistic environment; and, second, we combine the advantages of genome-wide association (GWA) and traditional linkage (QTL) mapping. Our experiments involved phenotyping nearly 20,000 plants over 2 winters under field conditions, including 184 worldwide natural accessions genotyped for 216,509 SNPs and 4,366 RILs derived from 13 independent crosses chosen to maximize genetic and phenotypic diversity. Based on a photothermal time model, the flowering time variation scored in our field experiment was poorly correlated with the flowering time variation previously obtained under greenhouse conditions, reinforcing previous demonstrations of the importance of genotype by environment interactions in A. thaliana and the need to study adaptive variation under natural conditions. The use of 4,366 RILs provides great power for dissecting the genetic architecture of flowering time in A. thaliana under our specific field conditions. We describe more than 60 additive QTLs, all with relatively small to medium effects and organized in 5 major clusters. We show that QTL mapping increases our power to distinguish true from false associations in GWA mapping. QTL mapping also permits the identification of false negatives, that is, causative SNPs that are lost when applying GWA methods that control for population structure. Major genes underpinning flowering time in the greenhouse were not associated with flowering time in this study. Instead, we found a prevalence of genes involved in the regulation of the plant circadian clock. Furthermore, we identified new genomic regions lacking obvious candidate genes.
Author Summary
Dissecting the genetic bases of adaptive traits is of primary importance in evolutionary biology. In this study, we combined a genome-wide association (GWA) study with traditional linkage mapping in order to detect the genetic bases underlying natural variation in flowering time in ecologically realistic conditions in the plant Arabidopsis thaliana. Our study involved phenotyping nearly 20,000 plants over 2 winters under field conditions in a temperate climate. We show that combined linkage and association mapping clearly outperforms each method alone when it comes to identifying true associations. This highlights the utility of combining different methods to localize genes involved in complex trait natural variation. Most candidate genes found in this study are involved in the regulation of the plant circadian clock and, surprisingly, were not associated with flowering time scored under greenhouse conditions. While rapid advances have been made in high-throughput genotyping and sequencing, high-throughput phenotyping of complex traits under natural conditions will be the next challenge for dissecting the genetic bases of adaptive variation in “laboratory” model organisms.
doi:10.1371/journal.pgen.1000940
PMCID: PMC2865524  PMID: 20463887
15.  Observing copepods through a genomic lens 
Frontiers in Zoology  2011;8:22.
Background
Copepods outnumber every other multicellular animal group. They are critical components of the world's freshwater and marine ecosystems, sensitive indicators of local and global climate change, key ecosystem service providers, parasites and predators of economically important aquatic animals and potential vectors of waterborne disease. Copepods sustain the world's fisheries that nourish and support human populations. Although genomic tools have transformed many areas of biological and biomedical research, their power to elucidate aspects of the biology, behavior and ecology of copepods has only recently begun to be exploited.
Discussion
The extraordinary biological and ecological diversity of the subclass Copepoda provides both unique advantages for addressing key problems in aquatic systems and formidable challenges for developing a focused genomics strategy. This article provides an overview of genomic studies of copepods and discusses strategies for using genomics tools to address key questions at levels extending from individuals to ecosystems. Genomics can, for instance, help to decipher patterns of genome evolution such as those that occur during transitions from free-living to symbiotic and parasitic lifestyles and can assist in the identification of genetic mechanisms and accompanying physiological changes associated with adaptation to new or physiologically challenging environments. The adaptive significance of the diversity in genome size and unique mechanisms of genome reorganization during development could similarly be explored. Genome-wide and EST studies of parasitic copepods of salmon and large EST studies of selected free-living copepods have demonstrated the potential utility of modern genomics approaches for the study of copepods and have generated resources such as EST libraries, shotgun genome sequences, BAC libraries, genome maps and inbred lines that will be invaluable in assisting further efforts to provide genomics tools for copepods.
Summary
Genomics research on copepods is needed to extend our exploration and characterization of their fundamental biological traits, so that we can better understand how copepods function and interact in diverse environments. Availability of large scale genomics resources will also open doors to a wide range of systems biology type studies that view the organism as the fundamental system in which to address key questions in ecology and evolution.
doi:10.1186/1742-9994-8-22
PMCID: PMC3184258  PMID: 21933388
genome organization; ecogenomics; parasitism and symbiosis; biological invasion; diapause; response to environmental change
16.  An Atypical Kinase under Balancing Selection Confers Broad-Spectrum Disease Resistance in Arabidopsis 
PLoS Genetics  2013;9(9):e1003766.
The failure of gene-for-gene resistance traits to provide durable and broad-spectrum resistance in an agricultural context has led to the search for genes underlying quantitative resistance in plants. Such genes have been identified in only a few cases, all for fungal or nematode resistance, and encode diverse molecular functions. However, the molecular mechanisms of quantitative resistance variation to other enemies, and the evolutionary forces shaping this variation, remain largely unknown. We report the identification, map-based cloning and functional validation of QRX3 (RKS1, Resistance related KinaSe 1), conferring broad-spectrum resistance to Xanthomonas campestris (Xc), a devastating worldwide bacterial vascular pathogen of crucifers. RKS1 encodes an atypical kinase that mediates a quantitative resistance mechanism in plants by restricting bacterial spread from the infection site. Nested Genome-Wide Association mapping revealed a major locus corresponding to an allelic series at RKS1 at the species level. An association between variation in resistance and RKS1 transcription was found using various transgenic lines as well as in natural accessions, suggesting that regulation of RKS1 expression is a major component of quantitative resistance to Xc. The co-existence of long-lived RKS1 haplotypes in A. thaliana is shared with a variety of genes involved in pathogen recognition, suggesting common selective pressures. The identification of RKS1 constitutes a starting point for deciphering the mechanisms underlying broad-spectrum quantitative disease resistance that is effective against a devastating vascular crop pathogen. Because putative RKS1 orthologs have been found in other Brassica species, RKS1 provides an exciting opportunity for plant breeders to improve resistance to black rot in crops.
Author Summary
During the evolution of plant-pathogen interactions, plants have evolved the capability to defend themselves from pathogen infection through multiple overlapping mechanisms. Disease resistance consists of an elaborate, multilayered system of defense. Among these responses, quantitative resistance is a prevalent form of resistance in crops and natural plant populations, for which the genetic and molecular bases remain largely unknown. Thus, identification of the genes underlying quantitative resistance constitutes a major challenge in plant breeding and evolutionary biology, and might have enormous practical implications for human health by increasing crop yield and quality. Our work contributes to understanding the molecular bases of quantitative resistance to the vascular pathogen Xanthomonas campestris (Xc), which is responsible for black rot, an important disease of crucifers worldwide. By multiple approaches, we demonstrate that RKS1 is a quantitative resistance gene in Arabidopsis thaliana conferring broad-spectrum resistance to Xc and that this resistance mechanism in plants is associated with regulation of RKS1 expression. We also provide evidence that RKS1 allelic variation is a major component of quantitative resistance to Xc at the species level. Finally, the long-lived polymorphism associated with RKS1 suggests that evolutionarily stable broad-spectrum resistance to Xc may be achieved in natural populations of A. thaliana.
doi:10.1371/journal.pgen.1003766
PMCID: PMC3772041  PMID: 24068949
17.  A High-Resolution Map of Segmental DNA Copy Number Variation in the Mouse Genome 
PLoS Genetics  2007;3(1):e3.
Submicroscopic (less than 2 Mb) segmental DNA copy number changes are a recently recognized source of genetic variability between individuals. The biological consequences of copy number variants (CNVs) are largely undefined. In some cases, CNVs that cause gene dosage effects have been implicated in phenotypic variation. CNVs have been detected in diverse species, including mice and humans. Published studies in mice have been limited by resolution and strain selection. We chose to study 21 well-characterized inbred mouse strains that are the focus of an international effort to measure, catalog, and disseminate phenotype data. We performed comparative genomic hybridization using long oligomer arrays to characterize CNVs in these strains. This technique increased the resolution of CNV detection by more than an order of magnitude over previous methodologies. The CNVs range in size from 21 to 2,002 kb. Clustering strains by CNV profile recapitulates aspects of the known ancestry of these strains. Most of the CNVs (77.5%) contain annotated genes, and many (47.5%) colocalize with previously mapped segmental duplications in the mouse genome. We demonstrate that this technique can identify copy number differences associated with known polymorphic traits. The phenotype of previously uncharacterized strains can be predicted based on their copy number at these loci. Annotation of CNVs in the mouse genome combined with sequence-based analysis provides an important resource that will help define the genetic basis of complex traits.
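Comparative genomic hybridization reports, for each array probe, the log2 ratio of test-strain to reference hybridization intensity, so a copy-number gain pushes the ratio up and a loss pushes it down. The sketch below is a deliberately naive caller that flags runs of consecutive probes beyond a fixed threshold, to make the idea concrete. It is not the segmentation procedure used in the study; the function name `call_cnv_segments`, the 0.3 threshold and the three-probe minimum are illustrative assumptions.

```python
from typing import List, Tuple


def call_cnv_segments(
    probes: List[Tuple[int, float]],
    threshold: float = 0.3,   # hypothetical |log2 ratio| cutoff
    min_probes: int = 3,      # hypothetical minimum run length
) -> List[Tuple[int, int, str]]:
    """Flag runs of consecutive probes whose log2 ratio stays beyond +/-threshold.

    probes: (position_bp, log2_ratio) pairs sorted by position on one chromosome.
    Returns (start_bp, end_bp, "gain" | "loss") for runs of >= min_probes probes.
    """
    segments: List[Tuple[int, int, str]] = []
    run: List[int] = []   # probe positions in the current run
    run_state = 0         # +1 = gain run, -1 = loss run, 0 = no run

    def close_run() -> None:
        # Emit the current run only if it is long enough.
        if run_state != 0 and len(run) >= min_probes:
            segments.append((run[0], run[-1], "gain" if run_state > 0 else "loss"))

    for pos, ratio in probes:
        state = 1 if ratio >= threshold else (-1 if ratio <= -threshold else 0)
        if state != 0 and state == run_state:
            run.append(pos)  # same direction: extend the run
        else:
            close_run()      # direction changed or signal ended
            run = [pos] if state != 0 else []
            run_state = state
    close_run()              # a run may extend to the last probe
    return segments


probes = [(0, 0.0), (1000, 0.5), (2000, 0.6), (3000, 0.4),
          (4000, 0.0), (5000, -0.5), (6000, -0.6), (7000, 0.1)]
print(call_cnv_segments(probes))  # [(1000, 3000, 'gain')]
```

Here the two-probe loss at 5,000-6,000 bp is not reported because it is shorter than `min_probes`; real callers typically use statistical segmentation rather than a fixed cutoff.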
Author Summary
A major goal of genetics and genomics is to understand how genetic differences between individuals (genotypes) translate into variation in disease susceptibility, behavior, and many other organism-level characteristics (phenotypes). While the sizes of genetic variants range from a single base to whole chromosomes, historically, only the extreme ends of this spectrum have been explored. DNA copy number variants (CNVs) lie between these two extremes, ranging in size from hundreds to millions of bases. The recent application of microarray technology to detect genetic variation in humans has led to the realization that CNVs are common. In fact, rough estimates indicate that CNVs and small-scale variants may constitute similar proportions of total genomic DNA. In this report, the authors characterize 80 CNVs across the genomes of 21 inbred strains of mice. The identification and characterization of mouse CNVs are important because inbred strains of mice are the most widely used model system to explore biomedical genetics. These CNVs are located near another class of genomic features, segmental duplications, more often than would be expected by chance, which supports the hypothesis that CNVs and segmental duplications are causally linked. Importantly, many of the CNVs contain known genes and thus may underlie both gene expression and phenotypic variation between strains.
doi:10.1371/journal.pgen.0030003
PMCID: PMC1761046  PMID: 17206864
18.  The Role of Abcb5 Alleles in Susceptibility to Haloperidol-Induced Toxicity in Mice and Humans 
PLoS Medicine  2015;12(2):e1001782.
Background
We know very little about the genetic factors affecting susceptibility to drug-induced central nervous system (CNS) toxicities, and this has limited our ability to optimally utilize existing drugs or to develop new drugs for CNS disorders. For example, haloperidol is a potent dopamine antagonist that is used to treat psychotic disorders, but 50% of treated patients develop characteristic extrapyramidal symptoms caused by haloperidol-induced toxicity (HIT), which limits its clinical utility. We do not have any information about the genetic factors affecting this drug-induced toxicity. HIT in humans is directly mirrored in a murine genetic model, where inbred mouse strains are differentially susceptible to HIT. Therefore, we genetically analyzed this murine model and performed a translational human genetic association study.
Methods and Findings
A whole genome SNP database and computational genetic mapping were used to analyze the murine genetic model of HIT. Guided by the mouse genetic analysis, we demonstrate that genetic variation within an ABC-drug efflux transporter (Abcb5) affected susceptibility to HIT. In situ hybridization results reveal that Abcb5 is expressed in brain capillaries and in cerebellar Purkinje cells. We also analyzed chromosome substitution strains, imaged haloperidol abundance in brain tissue sections, directly measured haloperidol (and its metabolite) levels in brain, and characterized Abcb5 knockout mice. Our results demonstrate that Abcb5 is part of the blood-brain barrier; it affects susceptibility to HIT by altering the brain concentration of haloperidol. Moreover, a genetic association study in a haloperidol-treated human cohort indicates that human ABCB5 alleles had a time-dependent effect on susceptibility to individual and combined measures of HIT. Abcb5 alleles are pharmacogenetic factors that affect susceptibility to HIT, but it is likely that additional pharmacogenetic susceptibility factors will be discovered.
Conclusions
ABCB5 alleles alter susceptibility to HIT in mouse and humans. This discovery leads to a new model that (at least in part) explains inter-individual differences in susceptibility to a drug-induced CNS toxicity.
Gary Peltz and colleagues examine the role of ABCB5 alleles in haloperidol-induced toxicity in a murine genetic model and humans treated with haloperidol.
Editors' Summary
Background
The brain is the control center of the human body. This complex organ controls thoughts, memory, speech, and movement, it is the seat of intelligence, and it regulates the function of many organs. The brain comprises many different parts, all of which work together but all of which have their own special functions. For example, the forebrain is involved in intellectual activities such as thinking whereas the hindbrain controls the body’s vital functions and movements. Messages are passed between the various regions of the brain and to other parts of the body by specialized cells called neurons, which release and receive signal molecules known as neurotransmitters. As with every other organ in the body, the brain is supplied by blood vessels with the oxygen, water, and nutrients it needs to function. Importantly, however, the brain is protected from infectious agents and other potentially dangerous substances circulating in the blood by the “blood-brain barrier,” a highly selective permeability barrier that is formed by the cells lining the fine blood vessels (capillaries) within the brain.
Why Was This Study Done?
Although drugs have been developed to treat various brain disorders, more active and less toxic drugs are needed to improve the treatment of many if not most of these conditions. Unfortunately, relatively little is known about how the blood-brain barrier regulates the entry of drugs into the brain or about the genetic factors that affect the brain’s susceptibility to drug-induced toxicities. It is not known, for example, why about half of patients given haloperidol—a drug used to treat psychotic disorders (conditions that affect how people think, feel, or behave)—develop tremors and other symptoms caused by alterations in the brain region that controls voluntary movements. Here, to improve our understanding of how drugs enter the brain and impact its function, the researchers investigate the genetic factors that affect haloperidol-induced toxicity by genetically analyzing several inbred mouse strains (every individual in an inbred mouse strain is genetically identical) with different susceptibilities to haloperidol-induced toxicity and by undertaking a human genetic association study (a study that looks for non-chance associations between specific traits and genetic variants).
What Did the Researchers Do and Find?
The researchers used a database of genetic variants called single nucleotide polymorphisms (SNPs) and a computational genetic mapping approach to show first that variations within the gene encoding Abcb5 affected susceptibility to haloperidol-induced toxicity (indicated by changes in the length of time taken by mice to move their paws when placed on an inclined wire-mesh screen) among inbred mouse strains. Abcb5 is an ATP-binding cassette transporter, a type of protein that moves molecules across cell membranes. The researchers next showed that Abcb5 is expressed in brain capillaries, which is the location of the blood-brain barrier. Abcb5 was also expressed in cerebellar Purkinje cells, which help to control motor (intentional) movements. They also measured the effect of haloperidol, and the haloperidol concentration in brain tissue sections, in mice that were genetically engineered to make no Abcb5 (Abcb5 knockout mice). Finally, the researchers investigated whether specific alleles (alternative versions) of ABCB5 are associated with haloperidol-induced toxicity in people. Among a group of 85 patients treated with haloperidol for a psychotic illness, one specific ABCB5 allele was associated with haloperidol-induced toxicity during the first few days of treatment.
What Do These Findings Mean?
These findings indicate that Abcb5 is a component of the blood-brain barrier in mice and suggest that genetic variants in the gene encoding this protein underlie, at least in part, the differences in susceptibility to haloperidol-induced toxicity seen among inbred mouse strains. Moreover, the human genetic association study indicates that a specific ABCB5 allele also affects the susceptibility of people to haloperidol-induced toxicity. The researchers note that other ABCB5 alleles or other genetic factors that affect haloperidol-induced toxicity in people might emerge if larger groups of patients were studied. However, based on their findings, the researchers propose a new model for the genetic mechanisms that underlie inter-individual and cell type-specific differences in susceptibility to haloperidol-induced brain toxicity. If confirmed in future studies, this model might facilitate the development of more effective and less toxic drugs to treat a range of brain disorders.
Additional Information
Please access these websites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001782.
The US National Institute of Neurological Disorders and Stroke provides information about a wide range of brain diseases (in English and Spanish); its fact sheet “Brain Basics: Know Your Brain” is a simple introduction to the human brain; its “Blueprint Neurotherapeutics Network” was established to develop new drugs for disorders affecting the brain and other parts of the nervous system
MedlinePlus provides links to additional resources about brain diseases and their treatment (in English and Spanish)
Wikipedia provides information about haloperidol, about ATP-binding cassette transporters and about genetic association (note that Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
doi:10.1371/journal.pmed.1001782
PMCID: PMC4315575  PMID: 25647612
19.  The Genomic Architecture of Population Divergence between Subspecies of the European Rabbit 
PLoS Genetics  2014;10(8):e1003519.
The analysis of introgression of genomic regions between divergent populations provides an excellent opportunity to determine the genetic basis of reproductive isolation during the early stages of speciation. However, hybridization and subsequent gene flow must be relatively common in order to localize individual loci that resist introgression. In this study, we used next-generation sequencing to study genome-wide patterns of genetic differentiation between two hybridizing subspecies of rabbits (Oryctolagus cuniculus algirus and O. c. cuniculus) that are known to undergo high rates of gene exchange. Our primary objective was to identify specific genes or genomic regions that have resisted introgression and are likely to confer reproductive barriers in natural conditions. On the basis of 326,000 polymorphisms, we found low to moderate overall levels of differentiation between subspecies, and fewer than 200 genomic regions dispersed throughout the genome showing high differentiation consistent with a signature of reduced gene flow. Most differentiated regions were smaller than 200 kb and contained very few genes. Remarkably, 30 regions were each found to contain a single gene, facilitating the identification of candidate genes underlying reproductive isolation. This gene-level resolution yielded several insights into the genetic basis and architecture of reproductive isolation in rabbits. Regions of high differentiation were enriched on the X-chromosome and near centromeres. Genes lying within differentiated regions were often associated with transcription and epigenetic activities, including chromatin organization, regulation of transcription, and DNA binding. Overall, our results from a naturally hybridizing system share important commonalities with hybrid incompatibility genes identified using laboratory crosses in mice and flies, highlighting general mechanisms underlying the maintenance of reproductive barriers.
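Scans for regions that resist introgression generally start from per-SNP differentiation summarized along the genome. A minimal sketch, assuming Hudson's FST estimator computed as a ratio of averages within fixed-size windows; the estimator choice, the 200,000 bp default window and the function names are illustrative assumptions, not the study's documented pipeline. `n1` and `n2` are the numbers of sampled chromosomes in each subspecies.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def hudson_terms(p1: float, p2: float, n1: int, n2: int) -> Tuple[float, float]:
    """Numerator and denominator of Hudson's FST for one biallelic SNP.

    p1, p2: allele frequencies in the two populations.
    n1, n2: sampled chromosomes per population (each must be > 1).
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def windowed_fst(
    snps: Iterable[Tuple[int, float, float]],
    n1: int,
    n2: int,
    window_bp: int = 200_000,  # hypothetical window size
) -> Dict[int, float]:
    """Ratio-of-averages FST per window, keyed by window start position."""
    sums = defaultdict(lambda: [0.0, 0.0])  # window index -> [sum num, sum den]
    for pos, p1, p2 in snps:
        num, den = hudson_terms(p1, p2, n1, n2)
        cell = sums[pos // window_bp]
        cell[0] += num
        cell[1] += den
    return {w * window_bp: num / den
            for w, (num, den) in sorted(sums.items()) if den > 0}


snps = [(100, 0.0, 1.0), (150, 1.0, 0.0), (250_000, 0.5, 0.5)]
print(windowed_fst(snps, n1=11, n2=11))
```

Because the sample-size correction is subtracted in the numerator, undifferentiated windows can come out slightly negative (here about -0.1 for the second window); candidate barrier regions would be sought in the upper tail of the windowed distribution.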
Author Summary
A number of laboratory studies of speciation have uncovered individual genes that confer lower fitness when in a divergent genetic background. It is, however, unclear whether these genes were ever relevant to restricting gene flow in nature, or whether they are indirect consequences of functional divergence between species that in most cases can no longer hybridize in natural conditions. Analysis of introgression across the genome of divergent populations provides an alternative approach to determine the genetic basis of reproductive isolation. Using two subspecies of rabbits as a model for the early stages of speciation, we provide a genome-wide map of genetic differentiation. Our study revealed important aspects of the genetic architecture of differentiation in the early stages of divergence and allowed the identification of genomic regions that resist introgression and are likely to confer reproductive barriers in natural conditions. The gene content of these regions, which in several cases reached gene-level resolution, suggests an important role for epigenetic and transcription regulation in the maintenance of reproductive barriers. Some of the genes identified here in a natural system are similar in function to the hybrid male sterility genes identified in laboratory studies of speciation.
doi:10.1371/journal.pgen.1003519
PMCID: PMC4148185  PMID: 25166595
20.  Assumption-Free Estimation of Heritability from Genome-Wide Identity-by-Descent Sharing between Full Siblings 
PLoS Genetics  2006;2(3):e41.
The study of continuously varying, quantitative traits is important in evolutionary biology, agriculture, and medicine. Variation in such traits is attributable to many, possibly interacting, genes whose expression may be sensitive to the environment, which makes their dissection into underlying causative factors difficult. An important population parameter for quantitative traits is heritability, the proportion of total variance that is due to genetic factors. Response to artificial and natural selection and the degree of resemblance between relatives are all a function of this parameter. Following the classic paper by R. A. Fisher in 1918, the estimation of additive and dominance genetic variance and heritability in populations is based upon the expected proportion of genes shared between different types of relatives, and explicit, often controversial and untestable models of genetic and non-genetic causes of family resemblance. With genome-wide coverage of genetic markers it is now possible to estimate such parameters solely within families using the actual degree of identity-by-descent sharing between relatives. Using genome scans on 4,401 quasi-independent sib pairs of which 3,375 pairs had phenotypes, we estimated the heritability of height from empirical genome-wide identity-by-descent sharing, which varied from 0.374 to 0.617 (mean 0.498, standard deviation 0.036). The variance in identity-by-descent sharing per chromosome and per genome was consistent with theory. The maximum likelihood estimate of the heritability for height was 0.80 with no evidence for non-genetic causes of sib resemblance, consistent with results from independent twin and family studies but using an entirely separate source of information. Our application shows that it is feasible to estimate genetic variance solely from within-family segregation and provides an independent validation of previously untestable assumptions. 
Given sufficient data, our new paradigm will allow the estimation of genetic variation for disease susceptibility and quantitative traits that is free from confounding with non-genetic factors and will allow partitioning of genetic variation into additive and non-additive components.
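The key idea is that full siblings differ in the fraction of their genome shared identical-by-descent (here 0.374 to 0.617), so trait resemblance can be regressed on realized sharing within families. As a minimal sketch, the code below uses a simple Haseman-Elston-style regression of the sib-pair cross-product on genome-wide IBD proportion. This is a swapped-in, simpler estimator, not the maximum-likelihood variance-component method the authors used; it assumes standardized traits and an additive model without dominance, so that E[y1·y2 | π] = c + h²·π, with c capturing non-genetic sib resemblance. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def he_cross_product_regression(
    ibd: Sequence[float],
    trait1: Sequence[float],
    trait2: Sequence[float],
) -> Tuple[float, float]:
    """Regress sib-pair cross-products y1*y2 on genome-wide IBD proportion.

    Assumes each trait is already standardized (mean 0, variance 1), so that
    under an additive model without dominance E[y1*y2 | pi] = c + h2 * pi.
    Returns (h2_estimate, c_estimate).
    """
    n = len(ibd)
    if n < 2 or n != len(trait1) or n != len(trait2):
        raise ValueError("need at least two sib pairs with matching lengths")
    products = [a * b for a, b in zip(trait1, trait2)]
    mean_pi = sum(ibd) / n
    mean_prod = sum(products) / n
    s_xx = sum((p - mean_pi) ** 2 for p in ibd)
    if s_xx == 0:
        raise ValueError("IBD proportions must vary between pairs")
    s_xy = sum((p - mean_pi) * (z - mean_prod) for p, z in zip(ibd, products))
    slope = s_xy / s_xx  # ordinary least-squares slope, read as h2
    return slope, mean_prod - slope * mean_pi
```

An intercept near zero would correspond to the paper's finding of no evidence for non-genetic causes of sib resemblance; with only a narrow spread of IBD sharing, many pairs are needed for a precise slope.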
Synopsis
Quantitative geneticists attempt to understand variation between individuals within a population for traits such as height in humans and the number of bristles in fruit flies. This has traditionally been done by partitioning the variation into underlying sources due to genetic and environmental factors, using the observed amount of variation between and within families. A problem with this approach is that one can never be sure that the estimates are correct, because nature and nurture can be confounded without one knowing it. The authors got around this problem by comparing the similarity between relatives as a function of the exact proportion of genes that they have in common, looking only within families. Using this approach, the authors estimated the amount of total variation for height in humans that is due to genetic factors from 3,375 sibling pairs. For each pair, the authors estimated the proportion of genes that they share from DNA markers. It was found that about 80% of the total variation can be explained by genetic factors, close to results that are obtained from classical studies. This study provides the first validation of an estimate of genetic variation by using a source of information that is free from nature–nurture assumptions.
doi:10.1371/journal.pgen.0020041
PMCID: PMC1413498  PMID: 16565746
21.  Learning Gene Networks under SNP Perturbations Using eQTL Datasets 
PLoS Computational Biology  2014;10(2):e1003420.
The standard approach for identifying gene networks is based on experimental perturbations of gene regulatory systems, such as gene knock-out experiments, followed by genome-wide profiling of differential gene expression. However, this approach is significantly limited: it is not possible to perturb more than one or two genes simultaneously to discover complex gene interactions, or to distinguish between direct and indirect downstream regulation of the differentially expressed genes. As an alternative, genetical genomics has been proposed to treat naturally occurring genetic variants as potential perturbants of the gene regulatory system and to recover gene networks via analysis of population gene-expression and genotype data. Despite the many advantages of genetical genomics data analysis, its widespread application has been held back by a computational challenge: the effects of multifactorial genetic perturbations must be decoded simultaneously from the data. In this article, we propose a statistical framework for learning gene networks that overcomes the limitations of experimental perturbation methods and addresses the challenges of genetical genomics analysis. We introduce a new statistical model, called a sparse conditional Gaussian graphical model, and describe an efficient learning algorithm that simultaneously decodes the perturbations of the gene regulatory system by a large number of SNPs to identify a gene network along with expression quantitative trait loci (eQTLs) that perturb this network. While our statistical model captures direct genetic perturbations of the gene network, by performing inference on the probabilistic graphical model we obtain detailed characterizations of how the direct SNP perturbation effects propagate through the gene network to perturb other genes indirectly. We demonstrate our statistical method using HapMap-simulated and yeast eQTL datasets.
In particular, the yeast gene network identified computationally by our method under SNP perturbations is well supported by the results from experimental perturbation studies related to DNA replication stress response.
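The distinction between direct and propagated SNP effects can be written compactly. The block below shows one common parameterization of a sparse conditional Gaussian graphical model, given as an assumption; the scaling constants and penalty details in the authors' formulation may differ. Here x is the vector of p SNP genotypes, y the vector of q expression levels, Λ (q×q, positive definite) encodes the gene network, Θ (p×q) the direct SNP perturbations, and S_yy, S_yx, S_xx are the sample second-moment matrices (1/N)Σyyᵀ, (1/N)Σyxᵀ, (1/N)Σxxᵀ over N individuals; λ_Λ and λ_Θ are hypothetical tuning parameters.

```latex
p(\mathbf{y}\mid\mathbf{x};\Lambda,\Theta)
  = \frac{1}{Z(\mathbf{x})}
    \exp\!\Big(-\tfrac{1}{2}\,\mathbf{y}^{\top}\Lambda\,\mathbf{y}
               - \mathbf{x}^{\top}\Theta\,\mathbf{y}\Big)
\quad\Longrightarrow\quad
\mathbf{y}\mid\mathbf{x} \sim
  \mathcal{N}\!\big(-\Lambda^{-1}\Theta^{\top}\mathbf{x},\;\Lambda^{-1}\big)

\min_{\Lambda \succ 0,\;\Theta}\;
  -\log\det\Lambda
  + \operatorname{tr}(S_{yy}\Lambda)
  + 2\operatorname{tr}(S_{yx}\Theta)
  + \operatorname{tr}(\Lambda^{-1}\Theta^{\top}S_{xx}\Theta)
  + \lambda_{\Lambda}\lVert\Lambda\rVert_{1}
  + \lambda_{\Theta}\lVert\Theta\rVert_{1}
```

In this form Θ holds the direct SNP-to-gene effects, while the conditional-mean coefficient matrix −Λ⁻¹Θᵀ combines those direct effects with their indirect propagation through the network Λ; the ℓ1 penalties encourage a sparse network and a sparse set of eQTLs.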
Author Summary
A complete understanding of how gene regulatory networks are wired in a biological system is important in many areas of biology and medicine. The most popular method for investigating a gene network has been based on experimental perturbation studies, where the expression of a gene is experimentally manipulated to observe how this perturbation affects the expression of other genes. Such experimental methods are costly, laborious, and do not scale to a perturbation of more than two genes at a time. As an alternative, the genetical genomics approach uses genetic variants as naturally occurring perturbations of the gene regulatory system and learns gene networks by decoding the perturbation effects of genetic variants, given population gene-expression and genotype data. However, since millions of genetic variants in the genome simultaneously perturb a gene network, it is not obvious how to decode the effects of such multifactorial perturbations from data. Our statistical approach overcomes this computational challenge and recovers gene networks under SNP perturbations using probabilistic graphical models. As population gene-expression and genotype datasets are routinely collected to study genetic architectures of complex diseases and phenotypes, our approach can directly leverage these existing datasets to provide a more effective way of identifying gene networks.
doi:10.1371/journal.pcbi.1003420
PMCID: PMC3937098  PMID: 24586125
22.  The value of avian genomics to the conservation of wildlife 
BMC Genomics  2009;10(Suppl 2):S10.
Background
Genomic studies in non-domestic avian models, such as the California condor and white-throated sparrow, can lead to more comprehensive conservation plans and provide clues for understanding mechanisms affecting genetic variation, adaptation and evolution.
Developing genomic tools and resources including genomic libraries and a genetic map of the California condor is a prerequisite for identification of candidate loci for a heritable embryonic lethal condition. The white-throated sparrow exhibits a stable genetic polymorphism (i.e. chromosomal rearrangements) associated with variation in morphology, physiology, and behavior (e.g., aggression, social behavior, sexual behavior, parental care).
In this paper we outline the utility of these species as well as report on recent advances in the study of their genomes.
Results
Genotyping of the condor resource population at 17 microsatellite loci provided a better assessment of the current population's genetic variation. Specific New World vulture repeats were found in the condor genome. Using a condor BAC library and clones, chicken-condor comparative maps were generated. A condor fibroblast cell line transcriptome was characterized using 454 sequencing technology.
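Microsatellite panels are commonly summarized by comparing observed heterozygosity with expected heterozygosity (gene diversity) at each locus and averaging across loci. A minimal sketch, assuming diploid genotypes given as pairs of allele sizes and Nei's small-sample correction n/(n-1); the function names are hypothetical, and the abstract does not state which statistics were reported for the condor population.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

Genotype = Tuple[Hashable, Hashable]


def locus_heterozygosity(genotypes: Sequence[Genotype]) -> Tuple[float, float]:
    """Observed and expected (Nei-corrected) heterozygosity at one locus."""
    if not genotypes:
        raise ValueError("no genotypes at this locus")
    # Observed: fraction of individuals carrying two different alleles.
    observed = sum(a != b for a, b in genotypes) / len(genotypes)
    alleles = [allele for g in genotypes for allele in g]
    n = len(alleles)  # number of sampled gene copies (2 per individual)
    freqs = [count / n for count in Counter(alleles).values()]
    # Expected: 1 - sum(p_i^2), scaled by n / (n - 1) to reduce small-sample bias.
    expected = (n / (n - 1)) * (1 - sum(p * p for p in freqs))
    return observed, expected


def mean_heterozygosity(loci: List[Sequence[Genotype]]) -> Tuple[float, float]:
    """Average observed and expected heterozygosity across loci."""
    values = [locus_heterozygosity(g) for g in loci]
    return (sum(o for o, _ in values) / len(values),
            sum(e for _, e in values) / len(values))


locus_a = [(150, 152), (150, 152)]  # hypothetical allele sizes in bp
locus_b = [(150, 150), (150, 150)]  # monomorphic locus
print(mean_heterozygosity([locus_a, locus_b]))
```

Observed heterozygosity well below the expected value is one conventional signal of inbreeding, which is relevant to managing a small captive population.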
Our karyotypic analyses of the sparrow, in combination with other studies, indicate that the rearrangements in both chromosomes 2m and 3a are complex and likely involve multiple inversions, interchromosomal linkage, and pleiotropy. They also indicate that at least a portion of the rearrangement in chromosome 2m existed in the common ancestor of the four North American species of Zonotrichia, but not in the one South American species, and that the 2m form, originally thought to be the derived condition, might actually be the ancestral one.
Conclusion
Mining and characterization of candidate loci in the California condor using molecular genetic and genomic techniques as well as linkage and comparative genomic mapping will eventually enable the identification of carriers of the chondrodystrophy allele, resulting in improved genetic management of this disease.
In the white-throated sparrow, genomic studies, combined with ecological data, will help elucidate the basis of genic selection in a natural population. Morphs of the sparrow provide us with a unique opportunity to study intraspecific genomic differences, which have resulted from two separate yet linked evolutionary trajectories. Such results can transform our understanding of evolutionary and conservation biology.
doi:10.1186/1471-2164-10-S2-S10
PMCID: PMC2966331  PMID: 19607652
23.  Genome-wide association studies for agronomical traits in a world wide spring barley collection 
BMC Plant Biology  2012;12:16.
Background
Genome-wide association studies (GWAS) based on linkage disequilibrium (LD) provide a promising tool for the detection and fine mapping of quantitative trait loci (QTL) underlying complex agronomic traits. In this study we explored the genetic basis of variation for the traits heading date, plant height, thousand grain weight, starch content and crude protein content in a diverse collection of 224 spring barleys of worldwide origin. The whole panel was genotyped with a customized oligonucleotide pool assay containing 1536 SNPs using Illumina's GoldenGate technology resulting in 957 successful SNPs covering all chromosomes. The morphological trait "row type" (two-rowed spike vs. six-rowed spike) was used to confirm the high level of selectivity and sensitivity of the approach. This study describes the detection of QTL for the above mentioned agronomic traits by GWAS.
Results
Population structure in the panel was investigated by various methods, revealing six subgroups defined mainly by spike morphology and region of origin. We explored the patterns of linkage disequilibrium (LD) across the whole panel for all seven barley chromosomes. Average LD was observed to decay below a critical level (r² = 0.2) within a map distance of 5-10 cM. Phenotypic variation within the panel was reasonably large for all the traits. The heritabilities calculated for each trait over multi-environment experiments ranged between 0.90 and 0.95. Different statistical models were tested to control spurious LD caused by population structure and to calculate the P-value of marker-trait associations. Using a mixed linear model with kinship to control spurious LD effects, we found a total of 171 significant marker-trait associations, which delineate 107 QTL regions. Across all traits these can be grouped into 57 novel QTL and 50 QTL that are congruent with previously mapped QTL positions.
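LD decay is usually described by the squared allelic correlation r² between pairs of SNPs, plotted against the map distance separating them. A minimal sketch, assuming the accessions are effectively homozygous (barley is largely self-pollinating) so that each line can be treated as one haplotype with SNPs coded 0/1 and no missing data; the 1 cM bin width and the function names are illustrative assumptions, and the study's own LD software may bin or smooth pairs differently.

```python
import math
from collections import defaultdict
from typing import Iterable, Optional, Sequence, Tuple


def r_squared(snp_a: Sequence[int], snp_b: Sequence[int]) -> float:
    """r^2 between two biallelic SNPs coded 0/1 on the same set of lines."""
    n = len(snp_a)
    if n == 0 or n != len(snp_b):
        raise ValueError("SNP vectors must be non-empty and of equal length")
    p_a = sum(snp_a) / n
    p_b = sum(snp_b) / n
    p_ab = sum(a * b for a, b in zip(snp_a, snp_b)) / n  # frequency of the 1-1 haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return math.nan  # undefined when either SNP is monomorphic
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


def ld_decay_distance(
    pairs: Iterable[Tuple[float, float]],
    bin_cm: float = 1.0,     # hypothetical bin width
    threshold: float = 0.2,  # critical r^2 level quoted in the abstract
) -> Optional[float]:
    """Start of the first distance bin whose mean r^2 falls below threshold."""
    bins = defaultdict(list)
    for dist_cm, r2 in pairs:
        if not math.isnan(r2):
            bins[int(dist_cm // bin_cm)].append(r2)
    for b in sorted(bins):
        if sum(bins[b]) / len(bins[b]) < threshold:
            return b * bin_cm
    return None


print(r_squared([0, 0, 1, 1, 1, 1], [0, 0, 1, 1, 1, 0]))
print(ld_decay_distance([(0.5, 0.9), (1.5, 0.5), (5.2, 0.15), (6.0, 0.1)]))
```

With these toy pairs the first bin averaging below 0.2 starts at 5 cM; real analyses typically fit a smoothed decay curve across many thousands of marker pairs.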
Conclusions
Our results demonstrate that the described diverse barley panel can be efficiently used for GWAS of various quantitative traits, provided that population structure is appropriately taken into account. The observed significant marker trait associations provide a refined insight into the genetic architecture of important agronomic traits in barley. However, individual QTL account only for a small portion of phenotypic variation, which may be due to insufficient marker coverage and/or the elimination of rare alleles prior to analysis. The fact that the combined SNP effects fall short of explaining the complete phenotypic variance may support the hypothesis that the expression of a quantitative trait is caused by a large number of very small effects that escape detection. Notwithstanding these limitations, the integration of GWAS with biparental linkage mapping and an ever increasing body of genomic sequence information will facilitate the systematic isolation of agronomically important genes and subsequent analysis of their allelic diversity.
doi:10.1186/1471-2229-12-16
PMCID: PMC3349577  PMID: 22284310
24.  Strategic Approaches to Unraveling Genetic Causes of Cardiovascular Diseases 
Circulation research  2011;108(10):1252-1269.
DNA sequence variants (DSVs) are major components of the “causal field” for virtually all medical phenotypes, whether single-gene familial disorders or complex traits without a clear familial aggregation. The causal variants in single gene disorders are necessary and sufficient to impart large effects. In contrast, complex traits are due to a much more complicated network of contributory components that in aggregate increase the probability of disease. The conventional approach to identification of the causal variants for single gene disorders is genetic linkage. However, it does not offer sufficient resolution to map the causal genes in small families or sporadic cases. The approach to genetic studies of complex traits entails candidate gene or Genome Wide Association Studies (GWAS). GWAS provides an unbiased survey of the effects of common genetic variants (common disease - common variant hypothesis). GWAS have led to the identification of a large number of alleles for various cardiovascular diseases. However, common alleles account for a relatively small fraction of the total heritability of the traits. Accordingly, the focus has shifted toward identification of rare variants that might impart larger effect sizes (rare variant-common disease hypothesis). This shift is made feasible by recent advances in massively parallel DNA sequencing platforms, which afford the opportunity to identify virtually all common as well as rare alleles in individuals. In this review, we discuss various strategies that are used to delineate the genetic contribution to medically important cardiovascular phenotypes, emphasizing the utility of the new deep sequencing approaches.
doi:10.1161/CIRCRESAHA.110.236067
PMCID: PMC3115927  PMID: 21566222
Genetics; Next-Generation Sequencing; Complex traits; Polymorphism
25.  A Population Genetic Approach to Mapping Neurological Disorder Genes Using Deep Resequencing 
PLoS Genetics  2011;7(2):e1001318.
Deep resequencing of functional regions in human genomes is key to identifying potentially causal rare variants for complex disorders. Here, we present the results from a large-sample resequencing (n = 285 patients) study of candidate genes coupled with population genetics and statistical methods to identify rare variants associated with Autism Spectrum Disorder and Schizophrenia. Three genes, MAP1A, GRIN2B, and CACNA1F, were consistently identified by different methods as having significant excess of rare missense mutations in either one or both disease cohorts. In a broader context, we also found that the overall site frequency spectrum of variation in these cases is best explained by population models of both selection and complex demography rather than neutral models or models accounting for complex demography alone. Mutations in the three disease-associated genes explained much of the difference in the overall site frequency spectrum among the cases versus controls. This study demonstrates that genes associated with complex disorders can be mapped using resequencing and analytical methods with sample sizes far smaller than those required by genome-wide association studies. Additionally, our findings support the hypothesis that rare mutations account for a proportion of the phenotypic variance of these complex disorders.
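In its simplest form, an excess of rare missense mutations in a gene can be assessed with a burden-style comparison of carrier counts between cases and controls. The sketch below uses a one-sided Fisher's exact test computed directly from the hypergeometric distribution. It is a generic stand-in, not one of the population genetic methods used in the study, and the function name and example counts are hypothetical.

```python
import math


def burden_test_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher's exact test for an excess of rare-variant carriers in cases.

    Returns P(X >= case_carriers), where X is hypergeometric: the number of
    carriers that would fall among the cases if carrier status were unrelated
    to disease, given the fixed table margins.
    """
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    denom = math.comb(total, n_cases)
    upper = min(carriers, n_cases)
    # math.comb returns 0 when k > n, so impossible tables contribute nothing.
    num = sum(math.comb(carriers, k) * math.comb(total - carriers, n_cases - k)
              for k in range(case_carriers, upper + 1))
    return num / denom


# Hypothetical counts: carriers of rare missense variants in one gene.
p = burden_test_one_sided(case_carriers=12, n_cases=285,
                          control_carriers=3, n_controls=285)
print(f"one-sided burden p-value: {p:.4f}")
```

Collapsing all qualifying rare variants in a gene into a single carrier indicator gains power when individual variants are too rare to test one by one, at the cost of assuming they all act in the same direction.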
Author Summary
It is widely accepted that genetic factors play important roles in the etiology of neurological diseases. However, the nature of the underlying genetic variation remains unclear. Critical questions in the field of human genetics concern the frequencies and effect sizes of disease-associated genetic variants. For instance, the common disease–common variant model holds that sets of common variants explain a significant fraction of the variance in common disease phenotypes. Alternatively, rare variants may have strong effects and therefore contribute substantially to disease phenotypes. Because of their high penetrance and the reduced fitness they confer, such variants are maintained in the population at low frequencies, which limits their detection in genome-wide association studies. Here, we used a resequencing approach on a cohort of 285 Autism Spectrum Disorder and Schizophrenia patients and performed several analyses, enhanced with population genetic approaches, to identify variants associated with both diseases. Our results demonstrate an excess of rare variants in these disease cohorts and identify genes with negative (deleterious) selection coefficients, suggesting an accumulation of variants with detrimental effects. Our results provide further evidence that rare variants explain a component of the genetic etiology of autism and schizophrenia.
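The comparison of observed variation with neutral expectations can be sketched using the site frequency spectrum (SFS). Under the standard neutral model with constant population size, the expected number of sites at which the derived allele appears i times among n sampled chromosomes is proportional to 1/i, so an excess of rare variants shows up as more singletons than that expectation predicts. The sketch assumes derived-allele counts with a known ancestral state and is far simpler than the models with selection and complex demography fitted in the study; the function names and the example input are hypothetical.

```python
from collections import Counter


def site_frequency_spectrum(derived_counts, n_chrom):
    """Unfolded SFS: number of segregating sites with i derived alleles, for i = 1..n-1."""
    tally = Counter(c for c in derived_counts if 0 < c < n_chrom)
    return [tally.get(i, 0) for i in range(1, n_chrom)]


def neutral_expected_sfs(n_sites, n_chrom):
    """Expected SFS under the standard neutral model: E[xi_i] proportional to 1/i."""
    a_n = sum(1 / i for i in range(1, n_chrom))  # Watterson's harmonic constant
    return [n_sites / (i * a_n) for i in range(1, n_chrom)]


def singleton_ratio(derived_counts, n_chrom):
    """Observed / neutral-expected singletons; values above 1 indicate a rare-variant excess."""
    sfs = site_frequency_spectrum(derived_counts, n_chrom)
    expected = neutral_expected_sfs(sum(sfs), n_chrom)  # requires >= 1 segregating site
    return sfs[0] / expected[0]


# Hypothetical derived-allele counts at 10 sites, sampled from 20 chromosomes.
counts = [1, 1, 1, 1, 2, 3, 1, 5, 1, 2]
print(site_frequency_spectrum(counts, 20))
print(f"singleton ratio: {singleton_ratio(counts, 20):.2f}")
```

A ratio well above 1 is consistent with purifying selection, population growth, or both, which is why the study compared models combining selection with complex demography rather than relying on the neutral expectation alone.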
doi:10.1371/journal.pgen.1001318
PMCID: PMC3044677  PMID: 21383861