Related Articles

1.  Gaia as a complex adaptive system. 
We define the Gaia system of life and its environment on Earth, review the status of the Gaia theory, introduce potentially relevant concepts from complexity theory, then try to apply them to Gaia. We consider whether Gaia is a complex adaptive system (CAS) in terms of its behaviour and suggest that the system is self-organizing but does not reside in a critical state. Gaia has supported abundant life for most of the last 3.8 Gyr. Large perturbations have occasionally suppressed life but the system has always recovered without losing the capacity for large-scale free energy capture and recycling of essential elements. To illustrate how complexity theory can help us understand the emergence of planetary-scale order, we present a simple cellular automaton (CA) model of the imaginary planet Daisyworld. This exhibits emergent self-regulation as a consequence of feedback coupling between life and its environment. Local spatial interaction, which was absent from the original model, can destabilize the system by generating bifurcation regimes. Variation and natural selection tend to remove this instability. With mutation in the model system, it exhibits self-organizing adaptive behaviour in its response to forcing. We close by suggesting how artificial life ('Alife') techniques may enable more comprehensive feasibility tests of Gaia.
doi:10.1098/rstb.2001.1014
PMCID: PMC1692971  PMID: 12079529
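To make the life-environment feedback concrete, here is a minimal sketch of the original, non-spatial Daisyworld, which the CA model above extends with local spatial interaction. It is not the authors' CA model. The constants (flux 917 W m^-2, albedos 0.25/0.5/0.75, optimum 295.5 K, death rate 0.3, heat-transfer parameter 2.06e9 K^4) are commonly quoted textbook values and are assumptions here, not figures from the abstract. Black daisies warm their surroundings and white daisies cool them, and their competition pulls the planetary temperature toward the growth optimum.

```python
# Minimal non-spatial Daisyworld sketch (Watson-Lovelock style).
# All constants are assumed textbook values, not taken from the paper.
SIGMA = 5.67e-8   # Stefan-Boltzmann constant (W m^-2 K^-4)
FLUX = 917.0      # assumed stellar flux reaching Daisyworld (W m^-2)
Q = 2.06e9        # assumed heat-redistribution parameter (K^4)
ALBEDO_GROUND, ALBEDO_WHITE, ALBEDO_BLACK = 0.5, 0.75, 0.25
T_OPT = 295.5     # optimal growth temperature (K)
DEATH = 0.3       # daisy death rate


def growth(temp):
    """Parabolic growth rate; zero outside roughly 278-313 K."""
    return max(0.0, 1.0 - 0.003265 * (T_OPT - temp) ** 2)


def planet_temperature(albedo, lum):
    """Radiative-equilibrium temperature for a given mean albedo."""
    return (FLUX * lum * (1.0 - albedo) / SIGMA) ** 0.25


def mean_albedo(white, black):
    bare = 1.0 - white - black
    return bare * ALBEDO_GROUND + white * ALBEDO_WHITE + black * ALBEDO_BLACK


def run_daisyworld(lum, steps=10_000, dt=0.05, floor=0.001):
    """Euler-integrate daisy cover toward steady state.

    Returns (white_cover, black_cover, planetary_temperature).
    """
    white, black = 0.01, 0.01
    for _ in range(steps):
        bare = 1.0 - white - black
        albedo = mean_albedo(white, black)
        t_e = planet_temperature(albedo, lum)
        # Feedback: dark patches run warmer than the planetary mean, pale ones cooler.
        t_white = (Q * (albedo - ALBEDO_WHITE) + t_e ** 4) ** 0.25
        t_black = (Q * (albedo - ALBEDO_BLACK) + t_e ** 4) ** 0.25
        d_white = white * (bare * growth(t_white) - DEATH)
        d_black = black * (bare * growth(t_black) - DEATH)
        # Keep a small seed population so daisies can recolonise later.
        white = max(floor, white + dt * d_white)
        black = max(floor, black + dt * d_black)
    return white, black, planet_temperature(mean_albedo(white, black), lum)


if __name__ == "__main__":
    w, b, t = run_daisyworld(1.0)
    bare_t = planet_temperature(ALBEDO_GROUND, 1.0)
    print(f"white={w:.3f} black={b:.3f} T={t:.1f} K (lifeless planet: {bare_t:.1f} K)")
```

Comparing `run_daisyworld(lum)` with `planet_temperature(ALBEDO_GROUND, lum)` across a range of luminosities shows the regulation. A CA version would replace the global `white`/`black` fractions with grid cells that compete only with their neighbours.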
2.  GAIA: a gram-based interaction analysis tool – an approach for identifying interacting domains in yeast 
BMC Bioinformatics  2009;10(Suppl 1):S60.
Background
Protein-Protein Interactions (PPIs) play important roles in many biological functions. Protein domains, which are defined as independently folding structural blocks of proteins, physically interact with each other to perform these biological functions. Therefore, the identification of Domain-Domain Interactions (DDIs) is of great biological interest because it is generally accepted that PPIs are mediated by DDIs. As a result, much effort has been put into the computational prediction of domain pair interactions. Many DDI prediction tools using PPI network and domain evolution information have been reported. However, tools that combine the primary sequences, domain annotations, and structural annotations of proteins have not been evaluated before.
Results
In this study, we report a novel approach called Gram-bAsed Interaction Analysis (GAIA). GAIA extracts peptide segments composed of a fixed number of contiguous amino acids, called n-grams (where n is the number of amino acids), from the annotated domain and DDI data set in Saccharomyces cerevisiae (budding yeast), and identifies n-grams that may contribute to DDIs and PPIs based on how frequently they appear. GAIA also reports the coordinates of gram pairs on each interacting domain pair. We demonstrate that our approach improves on other DDI prediction approaches when tested against a gold-standard data set and achieves a true positive rate of 82% and a false positive rate of 21%. We also identify a list of 4-gram pairs that are significantly over-represented in the DDI data set and may mediate PPIs.
Conclusion
GAIA represents a novel and reliable way to predict DDIs that mediate PPIs. Our results, which show the localizations of interacting grams/hotspots, provide testable hypotheses for experimental validation. Complemented with other prediction methods, this study will allow us to elucidate the interactome of cells.
doi:10.1186/1471-2105-10-S1-S60
PMCID: PMC2648738  PMID: 19208164
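The n-gram extraction and pair counting described above can be sketched as follows. This is not GAIA's implementation. The scoring the abstract refers to ("based on the frequencies of their appearance") is not specified, so the sketch stops at raw co-occurrence counts. A real analysis would compare these against a background distribution. The function names are hypothetical.

```python
from collections import Counter
from itertools import product


def ngrams(seq, n):
    """Return (start_position, gram) for every contiguous n-gram in seq."""
    return [(i, seq[i:i + n]) for i in range(len(seq) - n + 1)]


def count_gram_pairs(domain_pairs, n=4):
    """Count unordered gram pairs co-occurring across interacting domain pairs.

    domain_pairs: iterable of (sequence_a, sequence_b) for domains known to interact.
    Each gram pair is counted at most once per domain pair.
    """
    counts = Counter()
    for dom_a, dom_b in domain_pairs:
        grams_a = {g for _, g in ngrams(dom_a, n)}
        grams_b = {g for _, g in ngrams(dom_b, n)}
        for ga, gb in product(grams_a, grams_b):
            counts[tuple(sorted((ga, gb)))] += 1
    return counts


if __name__ == "__main__":
    # Toy sequences only; not real yeast domains.
    pairs = [("ACDEF", "WXYZ"), ("ACDEK", "WXYZ")]
    print(count_gram_pairs(pairs, n=4).most_common(3))
```

Keeping the start position in `ngrams` mirrors the abstract's point that GAIA reports where each gram pair sits on the interacting domains.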
3.  Characterization of a Novel Intracellularly Activated Gene from Salmonella enterica Serovar Typhi  
Infection and Immunity  2002;70(10):5404-5411.
A Salmonella enterica serovar Typhi gene that is selectively up-regulated upon bacterial invasion of eukaryotic cells was characterized. The open reading frame encodes a 298-amino-acid hydrophobic polypeptide (30.8 kDa), which is predicted to be an integral membrane protein with nine membrane-spanning domains. The protein is closely related (87 to 94% reliability) to different transport and permease systems. Gene expression under laboratory conditions was relatively weak; however, sevenfold induction was observed in a high-osmolarity medium (300 mM NaCl). The growth pattern in a laboratory medium of a serovar Typhi strain Ty2 derivative containing a 735-bp in-frame deletion in this gene, named gaiA (for gene activated intracellularly), was not affected. In contrast, the mutant was partially impaired in intracellular survival in murine peritoneal macrophages, as well as in human monocyte-derived macrophages. However, in the case of human macrophages, this survival defect was modest and evident only at late infection times (24 h). Despite the distinct intracellular survival kinetics displayed in macrophages of different species, the gaiA null mutant was significantly affected in its potential to trigger apoptosis in both murine and human macrophages. Provision of the gaiA gene in trans resulted in complementation of these phenotypes. Interestingly, the absence of a functional gaiA gene caused a marked attenuation in the mouse mucin model, as shown by the increase (3 orders of magnitude) in the 50% lethal dose of the mutant strain over that of the parental strain Ty2 (P ≤ 0.05). Altogether, these data indicate that the product encoded by the gaiA gene is required for triggering apoptosis and bacterial survival within murine macrophages, which is consistent with the in vivo results obtained in the mouse mucin model. 
However, gaiA was not required for initial intracellular survival in human cells, indicating that its role in the natural host might be more complex than is suggested by the studies performed in the murine system.
doi:10.1128/IAI.70.10.5404-5411.2002
PMCID: PMC128351  PMID: 12228264
4.  Development of a Sensor Node for Precision Horticulture 
Sensors (Basel, Switzerland)  2009;9(5):3240-3255.
This paper presents the design of a new wireless sensor node (GAIA Soil-Mote) for precision horticulture applications which permits the use of precision agricultural instruments based on the SDI-12 standard. Wireless communication is achieved with a transceiver compliant with the IEEE 802.15.4 standard. The GAIA Soil-Mote software implementation is based on TinyOS. A two-phase methodology was devised to validate the design of this sensor node. The first phase consisted of laboratory validation of the proposed hardware and software solution, including a study on power consumption and autonomy. The second phase consisted of implementing a monitoring application in a real broccoli (Brassica oleracea L. var Marathon) crop in Campo de Cartagena in south-east Spain. In this way the sensor node was validated in real operating conditions. This type of application was chosen because there is a large potential market for it in the farming sector, especially for the development of precision agriculture applications.
doi:10.3390/s90503240
PMCID: PMC3297156  PMID: 22412309
Wireless Sensor Networks; Mote; TinyOS; Precision Horticulture
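The power-consumption and autonomy study mentioned above usually reduces to a duty-cycle budget. Below is a rough sketch under assumed numbers. The abstract gives no current draws or battery capacity, so every figure in the example is hypothetical. The model also ignores battery self-discharge, sensor excitation time, and radio retransmissions.

```python
def average_current_ma(active_ma, sleep_ma, duty_cycle):
    """Time-weighted mean current for a node that is awake a fraction duty_cycle of the time."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be in [0, 1]")
    return duty_cycle * active_ma + (1.0 - duty_cycle) * sleep_ma


def autonomy_days(capacity_mah, active_ma, sleep_ma, duty_cycle, derating=1.0):
    """Estimated battery life in days; derating (<1) models usable-capacity losses."""
    avg = average_current_ma(active_ma, sleep_ma, duty_cycle)
    return capacity_mah * derating / avg / 24.0


if __name__ == "__main__":
    # Hypothetical mote: 20 mA awake, 20 uA asleep, awake 1% of the time, 2000 mAh pack.
    print(f"{autonomy_days(2000, 20, 0.02, 0.01):.0f} days")
```

With these assumed values, the 1% awake time still accounts for most of the average current. This is why duty cycling the radio and sensors dominates autonomy.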
5.  Proteins Encoded in Genomic Regions Associated with Immune-Mediated Disease Physically Interact and Suggest Underlying Biology 
PLoS Genetics  2011;7(1):e1001273.
Genome-wide association studies (GWAS) have defined over 150 genomic regions unequivocally containing variation predisposing to immune-mediated disease. Inferring disease biology from these observations, however, hinges on our ability to discover the molecular processes being perturbed by these risk variants. It has previously been observed that different genes harboring causal mutations for the same Mendelian disease often physically interact. We sought to evaluate the degree to which this is true of genes within strongly associated loci in complex disease. Using sets of loci defined in rheumatoid arthritis (RA) and Crohn's disease (CD) GWAS, we build protein–protein interaction (PPI) networks for genes within associated loci and find abundant physical interactions between protein products of associated genes. We apply multiple permutation approaches to show that these networks are more densely connected than chance expectation. To confirm biological relevance, we show that the components of the networks tend to be expressed in similar tissues relevant to the phenotypes in question, suggesting the network indicates common underlying processes perturbed by risk loci. Furthermore, we show that the RA and CD networks have predictive power by demonstrating that proteins in these networks, not encoded in the confirmed list of disease-associated loci, are significantly enriched for association to the phenotypes in question in extended GWAS analysis. Finally, we test our method on three non-immune traits to assess its applicability to complex traits in general. We find that genes in loci associated with height and lipid levels assemble into significantly connected networks, but we did not detect excess connectivity among Type 2 Diabetes (T2D) loci beyond chance. 
Taken together, our results constitute evidence that, for many of the complex diseases studied here, common genetic associations implicate regions encoding proteins that physically interact in a preferential manner, in line with observations in Mendelian disease.
Author Summary
Genome-wide association studies have uncovered hundreds of DNA changes associated with complex disease. The ultimate promise of these studies is the understanding of disease biology; this goal, however, is not easily achieved because each disease has yielded numerous associations, each one pointing to a region of the genome, rather than a specific causal mutation. Presumably, the causal variants affect components of common molecular processes, and a first step in understanding the disease biology perturbed in patients is to identify connections among regions associated with disease. Since it has been reported in numerous Mendelian diseases that protein products of causal genes tend to physically bind each other, we chose to approach this problem using known protein–protein interactions to test whether any of the products of genes in loci associated with five complex traits bind each other. We applied several permutation methods and found robustly significant connectivity within four of the traits. In Crohn's disease and rheumatoid arthritis, we are able to show that these genes are co-expressed and that other proteins emerging in the network are enriched for association to disease. These findings suggest that, for the complex traits studied here, associated loci contain variants that affect common molecular processes, rather than distinct mechanisms specific to each association.
doi:10.1371/journal.pgen.1001273
PMCID: PMC3020935  PMID: 21249183
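The core idea of testing whether a gene set is "more densely connected than chance expectation" can be shown with the simplest possible permutation test. This sketch compares the number of PPI edges inside the set against random gene sets of the same size. It is an assumption-laden stand-in, not the paper's procedure. Drawing genes uniformly does not control for node degree or locus structure, which more careful permutation schemes (such as those the abstract alludes to) need to handle. The function names are hypothetical.

```python
import random
from itertools import combinations


def edges_within(genes, edges):
    """Number of edges whose two endpoints both lie in `genes`."""
    gene_set = set(genes)
    return sum(1 for a, b in edges if a in gene_set and b in gene_set)


def connectivity_permutation_test(genes, universe, edges, n_perm=1000, seed=0):
    """Empirical p-value for within-set edge count versus random same-size sets."""
    rng = random.Random(seed)
    observed = edges_within(genes, edges)
    pool = sorted(universe)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(pool, len(genes))
        if edges_within(sample, edges) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    universe = [f"G{i}" for i in range(20)]
    clique = ["G0", "G1", "G2", "G3"]
    toy_edges = list(combinations(clique, 2))  # toy network: one 4-gene clique
    print(connectivity_permutation_test(clique, universe, toy_edges))
```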
6.  Genome-Wide Interaction-Based Association Analysis Identified Multiple New Susceptibility Loci for Common Diseases 
PLoS Genetics  2011;7(3):e1001338.
Genome-wide interaction-based association (GWIBA) analysis has the potential to identify novel susceptibility loci. These interaction effects could be missed with the prevailing approaches in genome-wide association studies (GWAS). However, no convincing loci have been discovered exclusively from GWIBA methods, and the intensive computation involved is a major barrier to its application. Here, we developed a fast, multi-thread/parallel program named “pair-wise interaction-based association mapping” (PIAM) for exhaustive two-locus searches. With this program, we performed a complete GWIBA analysis on seven diseases with stringent control for false positives, and we validated the results for three of these diseases. We identified one pair-wise interaction between a previously identified locus, C1orf106, and one new locus, TEC, that was specific for Crohn's disease, with a Bonferroni-corrected P<0.05 (P = 0.039). This interaction was replicated with a pair of proxy linked loci (P = 0.013) on an independent dataset. Five other interactions had corrected P<0.5. We identified the allelic effect of a locus close to SLC7A13 for coronary artery disease. This was replicated with a linked locus on an independent dataset (P = 1.09×10⁻⁷). Through a local validation analysis that evaluated association signals, rather than locus-based associations, we found that several other regions showed association/interaction signals with nominal P<0.05. In conclusion, this study demonstrated that the GWIBA approach was successful in identifying novel loci, and the results provide new insights into the genetic architecture of common diseases. In addition, our PIAM program is capable of handling very large GWAS datasets that are likely to be produced in the future.
Author Summary
Recent studies on the genetic basis of common diseases have identified many loci that confer disease susceptibility. However, much of the heritability of these diseases remains unexplained. Loci involved in gene–gene interactions are considered cryptic because they confer susceptibility but may not generate a detectable signal on their own. These interactions may account for the “missing heritability” of common diseases. In principle, these interactions can be identified with genome-wide interaction-based association analysis. In practice, however, very few gene–gene interactions have been identified with that method, and most were based on prior biological knowledge. Here, we applied a parallel computing technique that facilitated the identification of multiple new cryptic susceptibility loci involved in common diseases. We applied stringent control for false positives, and we validated our findings with independent datasets. This study demonstrated that interactions between gene loci could be successfully identified with the genome-wide interaction-based approach. With this approach, we also identified cryptic loci with moderate single-locus effects. The identified loci and interactions merit further investigation for fine mapping and functional analyses. Our results extend the current knowledge of common diseases for future studies in genetic mapping. This approach is applicable to current and future genome-wide association datasets.
doi:10.1371/journal.pgen.1001338
PMCID: PMC3060075  PMID: 21437271
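The computational burden of an exhaustive two-locus search, and why Bonferroni correction is so stringent in that setting, can be sketched as a loop over every marker pair. The sketch below is not PIAM. The abstract does not describe PIAM's test statistic, so `pair_test` is a hypothetical placeholder that the caller supplies. The loop is also embarrassingly parallel: `combinations` output can be split into chunks and handed to separate processes, which is the kind of speed-up a multi-threaded program exploits.

```python
from itertools import combinations
from math import comb


def exhaustive_pair_scan(markers, pair_test, alpha=0.05):
    """Test every unordered marker pair and Bonferroni-correct across all pairs.

    markers: dict name -> genotype vector.
    pair_test(x, y): user-supplied function returning a nominal p-value.
    Returns (number_of_tests, [(a, b, corrected_p), ...]) for corrected p < alpha.
    """
    names = sorted(markers)
    n_tests = comb(len(names), 2)
    hits = []
    for a, b in combinations(names, 2):
        p = pair_test(markers[a], markers[b])
        p_corr = min(1.0, p * n_tests)  # Bonferroni over all pairs
        if p_corr < alpha:
            hits.append((a, b, p_corr))
    return n_tests, hits


if __name__ == "__main__":
    n = 300_000
    print(f"{comb(n, 2):,} pairs; per-test threshold {0.05 / comb(n, 2):.2e}")
```

For 300,000 markers there are 44,999,850,000 pairs, so a nominal p-value must fall below roughly 1.1 × 10⁻¹² to survive correction at 0.05.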
7.  Two-Stage Two-Locus Models in Genome-Wide Association 
PLoS Genetics  2006;2(9):e157.
Studies in model organisms suggest that epistasis may play an important role in the etiology of complex diseases and traits in humans. With the era of large-scale genome-wide association studies fast approaching, it is important to quantify whether it will be possible to detect interacting loci using realistic sample sizes in humans and to what extent undetected epistasis will adversely affect power to detect association when single-locus approaches are employed. We therefore investigated the power to detect association for an extensive range of two-locus quantitative trait models that incorporated varying degrees of epistasis. We compared the power to detect association using a single-locus model that ignored interaction effects, a full two-locus model that allowed for interactions, and, most important, two two-stage strategies whereby a subset of loci initially identified using single-locus tests were analyzed using the full two-locus model. Despite the penalty introduced by multiple testing, fitting the full two-locus model performed better than single-locus tests for many of the situations considered, particularly when compared with attempts to detect both individual loci. Using a two-stage strategy reduced the computational burden associated with performing an exhaustive two-locus search across the genome but was not as powerful as the exhaustive search when loci interacted. Two-stage approaches also increased the risk of missing interacting loci that contributed little effect at the margins. Based on our extensive simulations, our results suggest that an exhaustive search involving all pairwise combinations of markers across the genome might provide a useful complement to single-locus scans in identifying interacting loci that contribute to moderate proportions of the phenotypic variance.
Synopsis
Although there is growing appreciation that attempting to map genetic interactions in humans may be a fruitful endeavor, there is no consensus as to the best strategy for their detection, particularly in the case of genome-wide association where the number of potential comparisons is enormous. In this article, the authors compare the performance of four different search strategies to detect loci that interact in genome-wide association: a single-locus search, an exhaustive two-locus search, and two two-stage procedures in which a subset of loci initially identified with single-locus tests are analyzed using a full two-locus model. Their results show that when loci interact, an exhaustive two-locus search across the genome is superior to a two-stage strategy, and in many situations can identify loci that would not have been identified solely using a single-locus search. Their findings suggest that an exhaustive search involving all pairwise combinations of markers across the genome may provide a useful complement to single-locus scans in identifying interacting loci that contribute to moderate proportions of the phenotypic variance.
doi:10.1371/journal.pgen.0020157
PMCID: PMC1570380  PMID: 17002500
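The trade-off between two-stage and exhaustive search can be seen by listing which pairs reach the second stage. The abstract says only that "a subset of loci initially identified using single-locus tests" is carried forward. The two selection rules below are assumptions: stage two either tests pairs in which both loci passed stage one, or pairs in which at least one locus passed. In either case, a pair whose members both have weak marginal effects is never tested, which is the risk the abstract describes.

```python
from itertools import combinations


def two_stage_pairs(marginal_p, threshold):
    """Candidate pairs for stage two of a two-stage two-locus search.

    marginal_p: dict marker -> single-locus p-value from stage one.
    Returns (pairs_both_selected, pairs_at_least_one_selected).
    The two selection rules are illustrative assumptions.
    """
    markers = sorted(marginal_p)
    selected = {m for m in markers if marginal_p[m] < threshold}
    all_pairs = list(combinations(markers, 2))
    both = [(a, b) for a, b in all_pairs if a in selected and b in selected]
    either = [(a, b) for a, b in all_pairs if a in selected or b in selected]
    return both, either


if __name__ == "__main__":
    p = {"a": 0.001, "b": 0.002, "c": 0.5, "d": 0.9}
    both, either = two_stage_pairs(p, threshold=0.01)
    print(both, either, sep="\n")  # the pair (c, d) is never examined
```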
8.  Identifying the genetic determinants of transcription factor activity 
Genome-wide messenger RNA expression levels are highly heritable. However, the molecular mechanisms underlying this heritability are poorly understood. The influence of trans-acting polymorphisms is often mediated by changes in the regulatory activity of one or more sequence-specific transcription factors (TFs). We use a method that exploits prior information about the DNA-binding specificity of each TF to estimate its genotype-specific regulatory activity. To this end, we perform linear regression of genotype-specific differential mRNA expression on TF-specific promoter-binding affinity. Treating inferred TF activity as a quantitative trait and mapping it across a panel of segregants from an experimental genetic cross allows us to identify trans-acting loci (‘aQTLs’) whose allelic variation modulates the TF. A few of these aQTL regions contain the gene encoding the TF itself; several others contain a gene whose protein product is known to interact with the TF. Our method is strictly causal, as it only uses sequence-based features as predictors. Application to budding yeast demonstrates a dramatic increase in statistical power, compared with existing methods, to detect locus-TF associations and trans-acting loci. Our aQTL mapping strategy also succeeds in mouse.
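The regression step above can be sketched for a single TF and a single segregant. Across genes, differential expression is regressed on promoter affinity, and the fitted slope is read as that segregant's TF activity. This is a simplified sketch, not the authors' code. The published method may include additional terms, normalisation, or several TFs at once, and the affinities themselves would come from position-specific affinity matrices, which are not reproduced here. The function name is hypothetical.

```python
def tf_activity(affinity, log_ratio):
    """Ordinary least-squares slope of differential expression on promoter affinity.

    affinity[g]:  predicted promoter affinity of gene g for one TF.
    log_ratio[g]: differential mRNA expression of gene g in one segregant.
    The slope is taken as that segregant's inferred TF activity.
    """
    n = len(affinity)
    if n != len(log_ratio) or n < 2:
        raise ValueError("need two equal-length vectors with at least two genes")
    mean_x = sum(affinity) / n
    mean_y = sum(log_ratio) / n
    sxx = sum((x - mean_x) ** 2 for x in affinity)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(affinity, log_ratio))
    return sxy / sxx


if __name__ == "__main__":
    affinity = [0.0, 1.0, 2.0, 3.0]                    # toy values for four genes
    expression = {"seg1": [0.5, 2.5, 4.5, 6.5], "seg2": [0.0, -0.5, -1.0, -1.5]}
    activities = {s: tf_activity(affinity, y) for s, y in expression.items()}
    print(activities)  # one activity value per segregant
```

Because each activity pools information from every gene's promoter, it is less noisy than any single gene's expression level. This is the power argument made below.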
Genetic sequence variation naturally perturbs mRNA expression levels in the cell. In recent years, analysis of parallel genotyping and expression profiling data for segregants from genetic crosses between parental strains has revealed that mRNA expression levels are highly heritable. Expression quantitative trait loci (eQTLs), whose allelic variation regulates the expression level of individual genes, have successfully been identified (Brem et al, 2002; Schadt et al, 2003). The molecular mechanisms underlying the heritability of mRNA expression are poorly understood. However, they are likely to involve mediation by transcription factors (TFs). We present a new transcription-factor-centric method that greatly increases our ability to understand what drives the genetic variation in mRNA expression (Figure 1). Our method identifies genomic loci (‘aQTLs’) whose allelic variation modulates the protein-level activity of specific TFs. To map aQTLs, we integrate genotyping and expression profiling data with quantitative prior information about DNA-binding specificity of transcription factors in the form of position-specific affinity matrices (Bussemaker et al, 2007). We applied our method in two different organisms: budding yeast and mouse.
In our approach, the inferred TF activity is explicitly treated as a quantitative trait and genetically mapped. Reducing the ‘phenotype space’ from all genes (in the eQTL approach) to all TFs (in our aQTL approach) increases the statistical power to detect trans-acting loci in two distinct ways. First, because each inferred TF activity is derived from a large number of genes, it is far less noisy than the mRNA levels of individual genes. Second, the number of trait/marker combinations that must be tested for statistical significance in parallel is roughly two orders of magnitude smaller than for eQTLs. We identified a total of 103 locus-TF associations, a more than six-fold improvement over the 17 locus-TF associations identified by several existing methods (Brem et al, 2002; Yvert et al, 2003; Lee et al, 2006; Smith and Kruglyak, 2008; Zhu et al, 2008). In total, 31 distinct genomic loci were identified as aQTLs, including 11 of the 13 previously identified eQTL hotspots (Smith and Kruglyak, 2008).
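Treating the inferred activity as a quantitative trait means scanning genetic markers and asking whether segregants carrying one parental allele have systematically different TF activity from those carrying the other. Below is a minimal sketch using a Welch t statistic per marker. This is an assumed stand-in: the abstract does not state which linkage statistic or significance threshold the authors used. The function names are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch two-sample t statistic (mean(a) - mean(b))."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


def scan_markers(activity, genotypes):
    """Score each marker for linkage to an inferred TF activity.

    activity:  dict segregant -> inferred TF activity.
    genotypes: dict marker -> dict segregant -> parental allele (0 or 1).
    Returns dict marker -> Welch t comparing allele-1 versus allele-0 segregants.
    """
    results = {}
    for marker, calls in genotypes.items():
        g0 = [activity[s] for s, allele in calls.items() if allele == 0]
        g1 = [activity[s] for s, allele in calls.items() if allele == 1]
        if len(g0) >= 2 and len(g1) >= 2:  # need variance in both groups
            results[marker] = welch_t(g1, g0)
    return results


if __name__ == "__main__":
    act = {"s1": 1.0, "s2": 1.2, "s3": 3.0, "s4": 3.2}  # toy activities
    geno = {"m1": {"s1": 0, "s2": 0, "s3": 1, "s4": 1},
            "m2": {"s1": 0, "s2": 1, "s3": 0, "s4": 1}}
    print(scan_markers(act, geno))  # m1 tracks activity; m2 does not
```

Running one such scan per TF, rather than one per gene, is what shrinks the number of trait/marker tests.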
To better understand the mechanisms underlying the identified genetic linkages, we examined the genes within each aQTL region. First, we found four ‘local’ aQTLs, which encompass the gene encoding the TF itself. These include the known polymorphism in the HAP1 gene (Brem et al, 2002), but also novel predictions of trans-acting polymorphisms in RFX1, STB5, and HAP4. Second, using high-throughput protein–protein interaction data, we identified putative causal genes for several aQTLs. For example, we predict that a polymorphism in the cyclin-dependent kinase CDC28 antagonistically modulates the functionally distinct cell cycle regulators Fkh1 and Fkh2. In this and other cases, our approach naturally accounts for post-translational modulation of TF activity at the protein level.
We validated our ability to predict locus-TF associations in yeast using gene expression profiles of allele replacement strains from a previous study (Smith and Kruglyak, 2008). Chromosome 15 contains an aQTL whose allelic status influences the activity of no fewer than 30 distinct TFs. This locus includes IRA2, which controls intracellular cAMP levels. We used the gene expression profile of IRA2 replacement strains to confirm that the polymorphism within IRA2 indeed modulates a subset of the TFs whose activity was predicted to link to this locus, and no other TFs.
Application of our approach to mouse data identified an aQTL modulating the activity of a specific TF in liver cells. We identified an aQTL on mouse chromosome 7 for Zscan4, a transcription factor containing four zinc finger domains and a SCAN domain. Even though we could not detect a candidate causal gene for Zscan4p because of lack of information about the mouse genome, our result demonstrates that our method also works in higher eukaryotes.
In summary, aQTL mapping has a greatly improved sensitivity to detect molecular mechanisms underlying the heritability of gene expression. The successful application of our approach to yeast and mouse data underscores the value of explicitly treating the inferred TF activity as a quantitative trait for increasing statistical power of detecting trans-acting loci. Furthermore, our method is computationally efficient, and easily applicable to any other organism whenever prior information about the DNA-binding specificity of TFs is available.
Analysis of parallel genotyping and expression profiling data has shown that mRNA expression levels are highly heritable. Currently, only a tiny fraction of this genetic variance can be mechanistically accounted for. The influence of trans-acting polymorphisms on gene expression traits is often mediated by transcription factors (TFs). We present a method that exploits prior knowledge about the in vitro DNA-binding specificity of a TF in order to map the loci (‘aQTLs’) whose inheritance modulates its protein-level regulatory activity. Genome-wide regression of differential mRNA expression on predicted promoter affinity is used to estimate segregant-specific TF activity, which is subsequently mapped as a quantitative phenotype. In budding yeast, our method identifies six times as many locus-TF associations and more than twice as many trans-acting loci as all existing methods combined. Application to mouse data from an F2 intercross identified an aQTL on chromosome VII modulating the activity of Zscan4 in liver cells. Our method has greatly improved statistical power over existing methods, is mechanism based, strictly causal, computationally efficient, and generally applicable.
doi:10.1038/msb.2010.64
PMCID: PMC2964119  PMID: 20865005
gene expression; gene regulatory networks; genetic variation; quantitative trait loci; transcription factors
9.  The Emergence of Environmental Homeostasis in Complex Ecosystems 
PLoS Computational Biology  2013;9(5):e1003050.
The Earth, with its core-driven magnetic field, convective mantle, mobile lid tectonics, oceans of liquid water, dynamic climate and abundant life is arguably the most complex system in the known universe. This system has exhibited stability in the sense of, bar a number of notable exceptions, surface temperature remaining within the bounds required for liquid water and so a significant biosphere. Explanations for this range from anthropic principles in which the Earth was essentially lucky, to homeostatic Gaia in which the abiotic and biotic components of the Earth system self-organise into homeostatic states that are robust to a wide range of external perturbations. Here we present results from a conceptual model that demonstrates the emergence of homeostasis as a consequence of the feedback loop operating between life and its environment. Formulating the model in terms of Gaussian processes allows novel computational methods to be developed to solve it. We find that the stability of this system will typically increase then remain constant with an increase in biological diversity, and that the number of attractors within the phase space increases exponentially with the number of environmental variables while the probability of the system being in an attractor that lies within prescribed boundaries decreases approximately linearly. We argue that the cybernetic concept of rein control provides insights into how this model system, and potentially any system composed of biological-to-environmental feedback loops, self-organises into homeostatic states.
Author Summary
Life on Earth is perhaps more than three and a half billion years old, and it would appear that once it started it never stopped. During this period a number of dramatic shocks and drivers have affected the Earth. These include the impacts of massive asteroids, runaway climate change and increases in the brightness of the Sun. Has life on Earth simply been lucky in withstanding such perturbations? Are there any self-regulating or homeostatic processes operating in the Earth system that would reduce the severity of such perturbations? If such planetary processes exist, to what extent are they the result of the actions of life? In this study, we show how the regulation of environmental conditions can emerge as a consequence of life's effects. If life is both affected by and affects its environment, then this coupled system can self-organise into a robust control system that was first described during the early cybernetics movement around the middle of the twentieth century. Our findings are in principle applicable to a wide range of real world systems, from microbial mats to aquatic ecosystems up to and including the entire biosphere.
doi:10.1371/journal.pcbi.1003050
PMCID: PMC3656095  PMID: 23696719
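Rein control, the cybernetic idea named above, regulates a variable through separate agents that can each push it in only one direction, like a pair of reins. The sketch below is a deliberately tiny illustration of that idea and not the authors' Gaussian-process model. Two hypothetical populations have Gaussian niches: a "warming" one that thrives slightly below the midpoint and only raises the environmental variable, and a "cooling" one that thrives slightly above it and only lowers it. All parameter values are invented for illustration.

```python
import math


def abundance(env, optimum, width=10.0):
    """Gaussian niche: population size peaks when env equals its optimum."""
    return math.exp(-((env - optimum) ** 2) / (2.0 * width ** 2))


def simulate(env0, forcing, strength=1.0, steps=5000, dt=0.1,
             warm_opt=45.0, cool_opt=55.0):
    """Euler-integrate dE/dt = forcing + strength * (warming - cooling)."""
    env = env0
    for _ in range(steps):
        warming = abundance(env, warm_opt)  # only ever pushes env upward
        cooling = abundance(env, cool_opt)  # only ever pushes env downward
        env += dt * (forcing + strength * (warming - cooling))
    return env


if __name__ == "__main__":
    for start in (30.0, 70.0):
        print(start, "->", round(simulate(start, forcing=0.0), 3))
    print("with forcing:", round(simulate(50.0, forcing=0.1), 3))
    print("no reins:", round(simulate(50.0, forcing=0.1, strength=0.0), 3))
```

Without forcing, the variable settles at the midpoint between the two optima from either side. A constant external push shifts the set point only slightly (to about 51.1 here), whereas without the reins the same push makes the variable drift without limit.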
10.  Knowledge-Driven Analysis Identifies a Gene–Gene Interaction Affecting High-Density Lipoprotein Cholesterol Levels in Multi-Ethnic Populations 
PLoS Genetics  2012;8(5):e1002714.
Total cholesterol, low-density lipoprotein cholesterol, triglyceride, and high-density lipoprotein cholesterol (HDL-C) levels are among the most important risk factors for coronary artery disease. We tested for gene–gene interactions affecting the level of these four lipids based on prior knowledge of established genome-wide association study (GWAS) hits, protein–protein interactions, and pathway information. Using genotype data from 9,713 European Americans from the Atherosclerosis Risk in Communities (ARIC) study, we identified an interaction between HMGCR and a locus near LIPC in their effect on HDL-C levels (Bonferroni corrected Pc = 0.002). Using an adaptive locus-based validation procedure, we successfully validated this gene–gene interaction in the European American cohorts from the Framingham Heart Study (Pc = 0.002) and the Multi-Ethnic Study of Atherosclerosis (MESA; Pc = 0.006). The interaction between these two loci is also significant in the African American sample from ARIC (Pc = 0.004) and in the Hispanic American sample from MESA (Pc = 0.04). Both HMGCR and LIPC are involved in the metabolism of lipids, and genome-wide association studies have previously identified LIPC as associated with levels of HDL-C. However, the effect on HDL-C of the novel gene–gene interaction reported here is twice as pronounced as that predicted by the sum of the marginal effects of the two loci. In conclusion, based on a knowledge-driven analysis of epistasis, together with a new locus-based validation method, we successfully identified and validated an interaction affecting a complex trait in multi-ethnic populations.
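The statement that the joint effect is "twice as pronounced as that predicted by the sum of the marginal effects" is a claim about departure from additivity. In a 2×2 table of carrier status, that departure is the interaction contrast. The sketch below computes it from cell means. The numbers in the example are invented, and the study itself fitted regression models (with covariates not described here), so this shows only the arithmetic behind the comparison.

```python
def interaction_contrast(cell_means):
    """Departure from additivity in a 2x2 genotype table.

    cell_means[(g1, g2)]: mean trait value for carrier status g1 at locus 1
    and g2 at locus 2 (0 = non-carrier, 1 = carrier).
    Returns (effect_1, effect_2, joint_effect, interaction).
    """
    base = cell_means[(0, 0)]
    effect_1 = cell_means[(1, 0)] - base
    effect_2 = cell_means[(0, 1)] - base
    joint = cell_means[(1, 1)] - base
    return effect_1, effect_2, joint, joint - (effect_1 + effect_2)


if __name__ == "__main__":
    # Hypothetical trait means, purely illustrative.
    means = {(0, 0): 50.0, (1, 0): 51.0, (0, 1): 52.0, (1, 1): 56.0}
    e1, e2, joint, inter = interaction_contrast(means)
    print(f"additive prediction {e1 + e2}, observed {joint}, ratio {joint / (e1 + e2)}")
```

In this toy table the joint effect (6) is twice the additive prediction (1 + 2 = 3), so the interaction term equals the sum of the marginal effects.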
Author Summary
Genome-wide association studies (GWAS) have identified many loci associated with complex human traits or diseases. However, the fraction of heritable variation explained by these loci is often relatively low. Gene–gene interactions might play a significant role in complex traits or diseases and are one of the many possible factors contributing to the missing heritability. However, to date only a few interactions have been found and validated in GWAS due to the limited power caused by the need for multiple-testing correction for the very large number of tests conducted. Here, we used three types of prior knowledge (known GWAS hits, protein–protein interactions, and pathway information) to guide our search for gene–gene interactions affecting four lipid levels. We identified an interaction between HMGCR and a locus near LIPC in their effect on high-density lipoprotein cholesterol (HDL-C) and another pair of loci that interact in their effect on low-density lipoprotein cholesterol (LDL-C). We validated the interaction on HDL-C in a number of independent multi-ethnic populations, while the interaction underlying LDL-C was not validated. The prior knowledge-driven searching approach and a locus-based validation procedure show the potential for dissecting and validating gene–gene interactions in current and future GWAS.
doi:10.1371/journal.pgen.1002714
PMCID: PMC3359971  PMID: 22654671
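Prior knowledge helps because it shrinks the set of pairs that must be tested, and with it the multiple-testing penalty. The filtering rule below is an assumption, since the abstract does not spell one out. A pair is kept if the two genes interact in the PPI data or share a pathway, and at least one of them is an established GWAS hit. Gene names in the example are placeholders, not real genes.

```python
from itertools import combinations


def knowledge_driven_pairs(gwas_hits, ppi_edges, pathways):
    """Gene pairs supported by prior knowledge (assumed selection rule).

    gwas_hits: iterable of genes near established GWAS hits.
    ppi_edges: iterable of (gene_a, gene_b) protein-protein interactions.
    pathways:  dict pathway name -> set of member genes.
    """
    hits = set(gwas_hits)
    linked = {frozenset(edge) for edge in ppi_edges}
    for members in pathways.values():
        linked.update(frozenset(pair) for pair in combinations(sorted(members), 2))
    # Keep genuine pairs (no self-loops) touching at least one GWAS hit.
    return sorted(tuple(sorted(pair)) for pair in linked
                  if len(pair) == 2 and pair & hits)


if __name__ == "__main__":
    pairs = knowledge_driven_pairs({"A", "B"}, [("A", "X"), ("Y", "Z")],
                                   {"pathway1": {"B", "C", "D"}})
    print(pairs, "Bonferroni threshold:", 0.05 / len(pairs))
```

Only the retained pairs enter the multiple-testing correction, so the per-test threshold is far less stringent than in an exhaustive genome-wide scan.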
11.  Genome-Wide Association Studies in an Isolated Founder Population from the Pacific Island of Kosrae 
PLoS Genetics  2009;5(2):e1000365.
It has been argued that the limited genetic diversity and reduced allelic heterogeneity observed in isolated founder populations facilitates discovery of loci contributing to both Mendelian and complex disease. A strong founder effect, severe isolation, and substantial inbreeding have dramatically reduced genetic diversity in natives from the island of Kosrae, Federated States of Micronesia, who exhibit a high prevalence of obesity and other metabolic disorders. We hypothesized that genetic drift and possibly natural selection on Kosrae might have increased the frequency of previously rare genetic variants with relatively large effects, making these alleles readily detectable in genome-wide association analysis. However, mapping in large, inbred cohorts introduces analytic challenges, as extensive relatedness between subjects violates the assumptions of independence upon which traditional association test statistics are based. We performed genome-wide association analysis for 15 quantitative traits in 2,906 members of the Kosrae population, using novel approaches to manage the extreme relatedness in the sample. As positive controls, we observe association to known loci for plasma cholesterol, triglycerides, and C-reactive protein and to compelling candidate loci for thyroid stimulating hormone and fasting plasma glucose. We show that our study is well powered to detect common alleles explaining ≥5% phenotypic variance. However, no such large effects were observed with genome-wide significance, arguing that even in such a severely inbred population, common alleles typically have modest effects. Finally, we show that a majority of common variants discovered in Caucasians have indistinguishable effect sizes on Kosrae, despite the major differences in population genetics and environment.
Author Summary
Isolated populations have contributed to the discovery of loci with simple Mendelian segregation and large effects on disease risk or trait variation. We hypothesized that the use of isolated populations might also facilitate the discovery of common alleles contributing to complex traits with relatively larger effects. However, the use of association analyses to map common loci influencing trait variation in large, inbred cohorts introduces analytic challenges, as extensive relatedness between subjects violates the assumptions of independence upon which traditional association test statistics are based. We developed an analytic strategy to perform genome-wide association studies in an inbred family containing over 2,800 individuals from the island of Kosrae, Federated States of Micronesia. No alleles with large effect were observed with strong statistical support in any of the 15 traits examined, suggesting that the contribution of individual common variants to complex trait variation in Kosraens is typically not much greater than that observed in other populations. We show that the effects of many loci previously identified in Caucasian populations are indistinguishable in Caucasians and Kosraens, despite very different population genetics and environmental influences.
doi:10.1371/journal.pgen.1000365
PMCID: PMC2628735  PMID: 19197348
12.  Genetic Networks Controlling Structural Outcome of Glucosinolate Activation across Development 
PLoS Genetics  2008;4(10):e1000234.
Most phenotypic variation present in natural populations is under polygenic control, largely determined by genetic variation at quantitative trait loci (QTLs). These genetic loci frequently interact with the environment, development, and each other, yet the importance of these interactions for the underlying genetic architecture of quantitative traits is not well characterized. To better understand how epistasis and development may influence quantitative traits, we studied genetic variation in Arabidopsis glucosinolate activation using the Bayreuth×Shahdara recombinant inbred line (RIL) population, which is moderately sized in terms of number of lines. We identified QTLs for glucosinolate activation at three different developmental stages. Numerous QTLs showed developmental dependency, and we detected a large epistatic network centered on the previously cloned large-effect glucosinolate activation QTL, ESP. Analysis of Heterogeneous Inbred Families validated seven loci and all of the QTL×DPG (days post-germination) interactions tested, but was complicated by the extensive epistasis. A comparison with transcript accumulation data for 211 of these RILs showed extensive overlap between the identified glucosinolate activation loci and gene expression QTLs for structural specifiers and their homologs. Finally, we were able to show that two of the QTLs are the result of whole-genome duplications of a glucosinolate activation gene cluster. These data reveal complex age-dependent regulation of structural outcomes and suggest that transcriptional regulation is associated with a significant portion of the underlying ontogenic variation and epistatic interactions in glucosinolate activation.
Author Summary
A principal interest in biology is to understand how natural genetic variation translates into phenotypic variation. A key component of this connection is how the genetic variation interacts with other sources of variation, such as environment (G×E), development (G×D), or other genetic loci (G×G or epistasis). To analyze the molecular underpinnings of these quantitative genetics interaction terms, we investigated the genetic architecture of an adaptive trait, glucosinolate activation, in Arabidopsis thaliana during the development of what is considered a static mature rosette. Variation in glucosinolate activation was principally controlled by epistatic and G×D interactions. The epistatic interactions included both Mendelian epistasis, in which regulatory loci controlled enzymatic loci, and quantitative interactions between regulatory loci. G×D appeared to involve master regulatory loci, as determined by trans-eQTL hotspot analysis. Finally, two common glucosinolate activation QTLs appear to have evolved via gene loss and sub-functionalization following quadruplication of an ancestral genomic fragment, potentially by two whole-genome duplications. Thus, genomic duplication events may facilitate the formation of quantitative genetic variation. This study provides insights into the molecular basis of the link between genetic and phenotypic variation in a potentially adaptive trait.
doi:10.1371/journal.pgen.1000234
PMCID: PMC2565841  PMID: 18949035
13.  Maximum entropy production in environmental and ecological systems 
The coupled biosphere–atmosphere system entails a vast range of processes at different scales, from ecosystem exchange fluxes of energy, water and carbon to the processes that drive global biogeochemical cycles, atmospheric composition and, ultimately, the planetary energy balance. These processes are generally complex, with numerous interactions and feedbacks, and they are irreversible in nature, thereby producing entropy. The proposed principle of maximum entropy production (MEP), based on statistical mechanics and information theory, states that thermodynamic processes far from thermodynamic equilibrium will adapt to steady states at which they dissipate energy and produce entropy at the maximum possible rate. This issue focuses on the latest developments in applying MEP to the biosphere–atmosphere system, including aspects of the atmospheric circulation, the role of clouds, hydrology, vegetation effects, ecosystem exchange of energy and mass, biogeochemical interactions and the Gaia hypothesis. The examples shown in this special issue demonstrate the potential of MEP to contribute to improved understanding and modelling of the biosphere and the wider Earth system, and also explore limitations and constraints on the application of the MEP principle.
doi:10.1098/rstb.2010.0018
PMCID: PMC2871911  PMID: 20368247
thermodynamics; interactions; Earth system science; ecosystems
14.  A Meta-Analysis of Genome-Wide Association Scans Identifies IL18RAP, PTPN2, TAGAP, and PUS10 As Shared Risk Loci for Crohn's Disease and Celiac Disease 
PLoS Genetics  2011;7(1):e1001283.
Crohn's disease (CD) and celiac disease (CelD) are chronic intestinal inflammatory diseases, involving genetic and environmental factors in their pathogenesis. The two diseases can co-occur within families, and studies suggest that CelD patients have a higher risk of developing CD than the general population. These observations suggest that CD and CelD may share common genetic risk loci. Two such shared loci, IL18RAP and PTPN2, have already been identified independently in these two diseases. The aim of our study was to explicitly identify shared risk loci for these diseases by combining results from genome-wide association study (GWAS) datasets of CD and CelD. Specifically, GWAS results from CelD (768 cases, 1,422 controls) and CD (3,230 cases, 4,829 controls) were combined in a meta-analysis. Nine independent regions had a nominal association p-value <1.0×10⁻⁵ in this meta-analysis and showed evidence of association to the individual diseases in the original scans (p-value <1×10⁻² in CelD and <1×10⁻³ in CD). These include the two previously reported shared loci, IL18RAP and PTPN2, with p-values of 3.37×10⁻⁸ and 6.39×10⁻⁹, respectively, in the meta-analysis. The other seven had not been reported as shared loci and were therefore tested in additional CelD (3,149 cases and 4,714 controls) and CD (1,835 cases and 1,669 controls) cohorts. Two of these loci, TAGAP and PUS10, showed significant evidence of replication (Bonferroni-corrected p-values <0.0071) in the combined CelD and CD replication cohorts and were firmly established as shared risk loci of genome-wide significance, with overall combined p-values of 1.55×10⁻¹⁰ and 1.38×10⁻¹¹, respectively. Through a meta-analysis of GWAS data from CD and CelD, we have identified four shared risk loci: PTPN2, IL18RAP, TAGAP, and PUS10. The combined analysis of the two datasets provided the power, lacking in the individual GWAS for single diseases, to detect shared loci with relatively small effects.
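The discovery filter and the replication cutoff in this abstract are simple arithmetic. The Python sketch below reproduces them; the function names are hypothetical illustrations, not the authors' code, and the replication threshold assumes a Bonferroni correction of α = 0.05 over the seven newly tested loci, which is consistent with the reported cutoff of 0.0071.

```python
ALPHA = 0.05
N_NEW_LOCI = 7  # regions carried forward to replication (not previously reported as shared)


def passes_discovery(p_meta: float, p_celd: float, p_cd: float) -> bool:
    """Discovery filter from the abstract: meta-analysis p < 1e-5, plus
    nominal support in each original scan (CelD p < 1e-2, CD p < 1e-3)."""
    return p_meta < 1e-5 and p_celd < 1e-2 and p_cd < 1e-3


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


threshold = bonferroni_threshold(ALPHA, N_NEW_LOCI)
print(f"replication threshold: {threshold:.4f}")  # replication threshold: 0.0071
```

For example, a hypothetical region with meta-analysis p = 2×10⁻⁶, CelD p = 5×10⁻³ and CD p = 4×10⁻⁴ passes `passes_discovery`, whereas the same region with CelD p = 0.05 does not.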
Author Summary
Celiac disease and Crohn's disease are both chronic inflammatory diseases of the digestive tract. Both are complex genetic traits with multiple genetic and non-genetic risk factors. Recent genome-wide association (GWA) studies have identified some of the genetic risk factors for these diseases. Interestingly, in addition to some similarities in phenotype, these studies have shown that CelD and CD share some genetic risk factors. Specifically, comparing the results of independent GWA studies of CD and CelD revealed two genetic risk loci in common: the PTPN2 locus and the IL18RAP locus. Therefore, to directly test for additional shared genetic risk factors, we combined the GWA results from two large studies of CelD and CD, essentially creating a combined phenotype in which anyone with CD or CelD was coded as affected. Association results were then replicated in additional cohorts of CelD and CD. Shared risk loci are expected to show association in this analysis, whereas the signal of risk loci specific to either disease should be diluted. With this method of meta-analysis, we identified, in addition to PTPN2 and IL18RAP, two loci harbouring TAGAP and PUS10 as shared risk loci for Crohn's disease and celiac disease at genome-wide significance.
doi:10.1371/journal.pgen.1001283
PMCID: PMC3029251  PMID: 21298027
15.  Six Novel Susceptibility Loci for Early-Onset Androgenetic Alopecia and Their Unexpected Association with Common Diseases 
PLoS Genetics  2012;8(5):e1002746.
Androgenetic alopecia (AGA) is a highly heritable condition and the most common form of hair loss in humans. Susceptibility loci have been described on the X chromosome and chromosome 20, but these loci explain a minority of its heritable variance. We conducted a large-scale meta-analysis of seven genome-wide association studies for early-onset AGA in 12,806 individuals of European ancestry. In addition to replicating the two AGA loci on the X chromosome and chromosome 20, we identified six novel susceptibility loci that reached genome-wide significance (p = 2.62×10⁻⁹ to 1.01×10⁻¹²). Unexpectedly, we identified a risk allele at 17q21.31 that was recently associated with Parkinson's disease (PD) at a genome-wide significant level. We then tested the association between early-onset AGA and the risk of PD in a cross-sectional analysis of 568 PD cases and 7,664 controls. Early-onset AGA cases had significantly increased odds of subsequent PD (odds ratio [OR] = 1.28, 95% confidence interval: 1.06–1.55, p = 8.9×10⁻³). Further, the AGA susceptibility alleles at the 17q21.31 locus are on the H1 haplotype, which is under negative selection in Europeans and has been linked to decreased fertility. Combining the risk alleles of the six novel and two established susceptibility loci, we created a genotype risk score and tested its association with AGA in an additional sample. Individuals in the highest quartile of the genotype risk score had approximately six-fold increased odds of early-onset AGA [OR = 5.78, p = 1.4×10⁻⁸⁸]. Our results highlight unexpected associations between early-onset AGA, Parkinson's disease, and decreased fertility, providing important insights into the pathophysiology of these conditions.
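The genotype risk score combines the risk alleles of the eight loci (six novel, two established) and then compares individuals by quartile. The abstract does not say whether the score was weighted by effect size; the Python sketch below assumes the simplest unweighted form (a count of 0, 1 or 2 risk alleles per locus) and uses the standard-library `statistics.quantiles` for the quartile cut points. All function names and genotypes here are hypothetical.

```python
import statistics

N_LOCI = 8  # six novel plus two established AGA loci


def genotype_risk_score(risk_allele_counts: list[int]) -> int:
    """Unweighted score (an assumption): total risk alleles carried across all loci."""
    if len(risk_allele_counts) != N_LOCI:
        raise ValueError(f"expected {N_LOCI} loci, got {len(risk_allele_counts)}")
    if any(count not in (0, 1, 2) for count in risk_allele_counts):
        raise ValueError("each locus contributes 0, 1 or 2 risk alleles")
    return sum(risk_allele_counts)


def score_quartile(score: float, sample_scores: list[float]) -> int:
    """Return 1-4: the quartile of `score` within the sample's score distribution."""
    q1, q2, q3 = statistics.quantiles(sample_scores, n=4)
    if score <= q1:
        return 1
    if score <= q2:
        return 2
    if score <= q3:
        return 3
    return 4


# Hypothetical individual carrying 9 of a possible 16 risk alleles.
person = [2, 1, 0, 1, 2, 2, 0, 1]
print(genotype_risk_score(person))  # 9
```

Estimating the reported odds ratio for the top quartile would additionally require a case/control regression on real data, which this sketch does not attempt.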
Author Summary
While most genome-wide association studies (GWAS) focus on identifying susceptibility loci for a specific disease, this hypothesis-free approach also enables the identification of unexpected associations between different diseases by taking advantage of previously published GWAS associations. Androgenetic alopecia (AGA, also known as male pattern baldness) is the most common type of hair loss in humans. Parkinson's disease is reported to occur more commonly in men than in women; however, no previous studies had investigated the link between AGA and Parkinson's disease. Here, we show that a specific genetic locus, chromosome 17q21.31, which is associated with Parkinson's disease, is also a susceptibility locus for early-onset AGA. We further investigate the association between early-onset AGA and Parkinson's disease, irrespective of genotype, directly in a large-scale web-based study. We find that men with early-onset AGA have 28% higher odds of developing Parkinson's disease. The early-onset AGA locus on chromosome 17q21.31 has also previously been linked to decreased fertility. Future studies of this locus may implicate novel biological pathways affecting these three conditions.
doi:10.1371/journal.pgen.1002746
PMCID: PMC3364959  PMID: 22693459
16.  Evaluation of Widely Consumed Botanicals as Immunological Adjuvants 
Vaccine  2008;26(37):4860-4865.
Background
Many widely used botanical medicines are claimed to be immune enhancers, but clear evidence of augmentation of immune responses in vivo is lacking in most cases. To select botanicals for further study on the basis of immune-enhancing activity, we tested them here mixed with antigen and injected subcutaneously (s.c.). Globo H and GD3 are carbohydrate antigens, expressed on glycolipids or glycoproteins, on the cell surface of many cancers. When they are conjugated to keyhole limpet hemocyanin (KLH), mixed with an immunological adjuvant and administered s.c., the magnitude of the antibody responses against globo H, GD3 and KLH depends largely on the potency of the adjuvant. We describe here the results obtained using this s.c. immunization model with 7 botanicals purported to have immune-stimulant effects.
Methods
Groups of 5–10 mice were immunized s.c. three times at 1-week intervals with globo H–KLH or GD3–KLH mixed with a botanical, saline, or a positive-control immunological adjuvant. Antibody responses were measured 1 and 2 weeks after the third immunization. The following seven botanicals and fractions were tested: (1) H-48 (Honso USA Co.), (2) Coriolus versicolor raw water extract, purified polysaccharide-K (PSK) or purified polysaccharide-peptide (PSP) (Institute of Chinese Medicine (ICM)), (3) Maitake extract (Yukiguni Maitake Co Ltd. and Tradeworks Group), (4) Echinacea lipophilic, neutral and acidic extracts (Gaia Herbs), (5) Astragalus water, 50% or 95% ethanol extracts (ICM), (6) Turmeric supercritical (SC) or hydro-ethanolic (HE) extracts (New Chapter) or 60% ethanol extract (ICM) and (7) yeast β-glucan (Biotec Pharmacon). Purified saponin extract QS-21 (Antigenics) and semi-synthetic saponin GPI-0100 (Advanced BioTherapies) were used as positive-control adjuvants. Sera were analyzed by ELISA against synthetic globo H ceramide or GD3, and against KLH.
Results
Consistent, significant adjuvant activity was observed after s.c. vaccination with the Coriolus extracts (especially PSK), a 95% ethanol extract of Astragalus, yeast β-glucan and, to a lesser extent, Maitake. These botanicals induced antibodies against KLH in all cases and against globo H in most cases. Little or no adjuvant activity was demonstrated with H-48, the Echinacea extracts or the Astragalus water extract. Experiments with GD3–KLH as immunogen confirmed the adjuvant activity of the Coriolus, yeast β-glucan and Astragalus extracts. While extraction with ethanol concentrated the active ingredients in Astragalus, it had no impact on Coriolus, for which the 90% ethanol precipitate and solute were equally active.
Conclusions
Some, but not all, botanicals purported to be immune stimulants had adjuvant activity in our model. PSK and astragalus were surprisingly active and are being further fractionated to identify the most active adjuvant components.
doi:10.1016/j.vaccine.2008.06.098
PMCID: PMC2565601  PMID: 18640165
Astragalus; Botanicals; Conjugate vaccine; Cancer vaccine; β-glucan; Immunological adjuvant; PSK; Saponin
17.  Effect of routine probiotic, Lactobacillus reuteri DSM 17938, use on rates of necrotizing enterocolitis in neonates with birthweight < 1000 grams: a sequential analysis 
BMC Pediatrics  2012;12:142.
Background
Necrotizing enterocolitis (NEC) is a disease of neonates that often results in death or serious medical or neurodevelopmental complications. The rate of NEC is highest in the smallest babies, and many approaches have been tried to reduce it. In neonates born below 1500 grams, the rate of NEC has been significantly reduced with the use of various probiotics. This study examines the impact of routine use of a probiotic, Lactobacillus reuteri DSM 17938 (BioGaia®), on the rate of NEC in the neonates at highest risk, those with birth weight ≤1000 grams.
Methods
This is a retrospective cohort study comparing the rates of NEC in neonates with birth weight ≤1000 grams. The neonates are divided into those born from January 2004 to June 30, 2009, before the introduction of L. reuteri, and those born from July 2009 through April 2011, who received routine L. reuteri prophylaxis. The chart review study was approved by our institutional review board and exempted from informed consent.
Neonates were excluded if they died or were transferred within the first week of life. The remainder were categorized as having no NEC, medical NEC, surgical NEC, or NEC-associated death. Since no major changes occurred in our NICU practice in recent years, and the introduction of L. reuteri as routine prophylaxis was abrupt, we attributed the post-probiotic changes to the introduction of this new therapy. Rates of NEC were compared using chi-square analysis and Fisher's exact test.
Results
Medical records of 311 neonates were reviewed: 232 from before and 79 from after the introduction of L. reuteri prophylaxis. The incidence of NEC was significantly lower in the neonates who received L. reuteri (2 of 79 neonates [2.5%] versus 35 of 232 untreated neonates [15.1%]). Rates of late-onset gram-negative or fungal infections (22.8% versus 31%) did not differ significantly between the treated and untreated groups. No adverse events related to the use of L. reuteri were noted.
Conclusions
Prophylactic initiation of L. reuteri as a probiotic for prevention of necrotizing enterocolitis resulted in a statistically significant benefit, with avoidance of 1 NEC case for every 8 patients given prophylaxis.
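The "1 NEC case avoided for every 8 patients" figure follows from the counts in the Results: it is the number needed to treat (NNT), the reciprocal of the absolute risk reduction (ARR) between the untreated and treated groups, conventionally rounded up. A minimal Python sketch of that arithmetic, with illustrative function names:

```python
import math


def absolute_risk_reduction(events_untreated: int, n_untreated: int,
                            events_treated: int, n_treated: int) -> float:
    """Difference in event rates: untreated minus treated."""
    return events_untreated / n_untreated - events_treated / n_treated


def number_needed_to_treat(arr: float) -> int:
    """Patients treated per event avoided; rounded up by convention."""
    if arr <= 0:
        raise ValueError("NNT is only defined here for a positive risk reduction")
    return math.ceil(1 / arr)


# NEC counts reported above: 35 of 232 untreated vs 2 of 79 given L. reuteri.
arr = absolute_risk_reduction(35, 232, 2, 79)
print(f"ARR = {arr:.3f}, NNT = {number_needed_to_treat(arr)}")  # ARR = 0.126, NNT = 8
```

Here 1/0.1255 ≈ 7.97, which rounds up to the 8 quoted in the Conclusions.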
doi:10.1186/1471-2431-12-142
PMCID: PMC3472183  PMID: 22947597
Necrotizing enterocolitis;  Lactobacillus reuteri DSM 17938; Probiotic; Extremely low birth weight
18.  Towards systems genetic analyses in barley: Integration of phenotypic, expression and genotype data into GeneNetwork 
BMC Genetics  2008;9:73.
Background
A typical genetical genomics experiment results in four separate data sets: genotype, gene expression, higher-order phenotypic data, and metadata that describe the protocols, processing and array platform. Used in concert, these data sets provide the opportunity to perform genetic analysis at a systems level. Their predictive power is largely determined by the gene expression dataset, in which tens of millions of data points can be generated using currently available mRNA profiling technologies. Such large, multidimensional data sets often have value beyond that extracted during their initial analysis and interpretation, particularly if they were generated from widely distributed reference genetic materials. Besides quality and scale, access to the data is of primary importance, as accessibility potentially allows the wider research community to extract considerable added value from the same primary dataset. Although the number of genetical genomics experiments in different plant species is rapidly increasing, none to date has been presented in a form that allows quick and efficient on-line testing for possible associations between genes, loci and traits of interest by an entire research community.
Description
Using a reference population of 150 recombinant doubled haploid barley lines, we generated novel phenotypic, mRNA abundance and SNP-based genotyping data sets, added them to a considerable volume of legacy trait data and entered them into GeneNetwork. GeneNetwork is a unified on-line analytical environment that enables the user to test genetic hypotheses about how component traits, such as mRNA abundance, may interact to condition more complex biological phenotypes (higher-order traits). Here we describe these barley data sets and demonstrate some of the functionalities GeneNetwork provides as an easily accessible and integrated analytical environment for exploring them.
Conclusion
By integrating barley genotypic, phenotypic and mRNA abundance data sets directly within GeneNetwork's analytical environment we provide simple web access to the data for the research community. In this environment, a combination of correlation analysis and linkage mapping provides the potential to identify and substantiate gene targets for saturation mapping and positional cloning. By integrating datasets from an unsequenced crop plant (barley) in a database that has been designed for an animal model species (mouse) with a well established genome sequence, we prove the importance of the concept and practice of modular development and interoperability of software engineering for biological data sets.
doi:10.1186/1471-2156-9-73
PMCID: PMC2630324  PMID: 19017390
19.  Genome Wide Association Studies Using a New Nonparametric Model Reveal the Genetic Architecture of 17 Agronomic Traits in an Enlarged Maize Association Panel 
PLoS Genetics  2014;10(9):e1004573.
Association mapping is a powerful approach for dissecting the genetic architecture of complex quantitative traits in maize using high-density SNP markers. Here, we expanded our association panel from 368 to 513 inbred lines with 0.5 million high-quality SNPs, using a two-step data-imputation method that combines identity-by-descent (IBD)-based projection with a k-nearest neighbor (KNN) algorithm. Genome-wide association studies (GWAS) were carried out for 17 agronomic traits in the panel of 513 inbred lines, applying both a mixed linear model (MLM) and a new method, the Anderson-Darling (A-D) test. Ten loci for five traits were identified with the MLM method at the Bonferroni-corrected threshold −log10(P) > 5.74 (α = 1). Using the A-D test at the Bonferroni-corrected threshold −log10(P) > 7.05 (α = 0.05) with 556,809 SNPs, from one to 34 loci per trait were identified for the 17 traits, except for plant height, for which 107 loci were identified. Many known loci and new candidate loci were detected only by the A-D test, a few of which were also detected in independent linkage analysis. This study indicates that combining IBD-based projection with a KNN algorithm is an efficient imputation method for inferring large missing genotype segments. In addition, we show that the A-D test is a useful complement to GWAS analysis of complex quantitative traits. Especially for traits with abnormal phenotype distributions, or those controlled by moderate-effect loci or rare variants, the A-D test balances false positives and statistical power. The candidate SNPs and associated genes also provide a rich resource for maize genetics and breeding.
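Both thresholds in this abstract are Bonferroni corrections expressed on the −log10(P) scale: the per-SNP cutoff is α divided by the number of SNPs tested. The Python sketch below assumes, as the abstract's wording implies, that all 556,809 SNPs are counted as tests; the function name is illustrative.

```python
import math

N_SNPS = 556_809  # SNPs tested, as reported in the abstract


def neg_log10_bonferroni(alpha: float, n_tests: int) -> float:
    """Bonferroni per-test cutoff alpha / n_tests, expressed as -log10(P)."""
    return -math.log10(alpha / n_tests)


a_d_cutoff = neg_log10_bonferroni(0.05, N_SNPS)  # A-D test, alpha = 0.05
mlm_cutoff = neg_log10_bonferroni(1.0, N_SNPS)   # MLM, alpha = 1
print(f"A-D: {a_d_cutoff:.2f}")  # A-D: 7.05
print(f"MLM: {mlm_cutoff:.3f}")  # MLM: 5.746
```

The MLM value of about 5.746 matches the 5.74 quoted in the abstract when truncated, rather than rounded, to two decimals.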
Author Summary
Genotype imputation has been used widely in the analysis of genome-wide association studies (GWAS) to boost power and fine-map associations. We developed a two-step data imputation method to meet the challenge of a large proportion of missing genotypes. Over the past few years, GWAS using high-density SNP markers have uncovered an extensive genetic architecture for complex quantitative traits in maize. Here, GWAS were carried out for 17 agronomic traits in a panel of 513 inbred lines, applying both a mixed linear model and a new method, the Anderson-Darling (A-D) test. We aim to show that the A-D test complements current GWAS methods, especially for complex quantitative traits that are controlled by moderate-effect loci or rare variants and have abnormal phenotype distributions. In addition, the trait-associated QTL identified here provide a rich resource for maize genetics and breeding.
doi:10.1371/journal.pgen.1004573
PMCID: PMC4161304  PMID: 25211220
20.  An Integration of Genome-Wide Association Study and Gene Expression Profiling to Prioritize the Discovery of Novel Susceptibility Loci for Osteoporosis-Related Traits 
PLoS Genetics  2010;6(6):e1000977.
Osteoporosis is a complex disorder that commonly leads to fractures in elderly persons. Genome-wide association studies (GWAS) have become an unbiased approach to identifying variations in the genome that potentially affect health. However, the genetic variants identified so far explain only a small proportion of the heritability of complex traits. Because of modest genetic effect sizes and inadequate power, true association signals may not be revealed at a stringent genome-wide significance threshold. Here, we take advantage of SNP and transcript arrays and integrate GWAS with expression signature profiling relevant to the skeletal system in cellular and animal models to prioritize the discovery of novel candidate genes for osteoporosis-related traits, including bone mineral density (BMD) at the lumbar spine (LS) and femoral neck (FN), as well as geometric indices of the hip (femoral neck-shaft angle, NSA; femoral neck length, NL; and narrow-neck width, NW). A two-stage meta-analysis of GWAS from 7,633 Caucasian women and 3,657 men revealed three novel loci associated with osteoporosis-related traits, at chromosomes 1p13.2 (RAP1A, p = 3.6×10⁻⁸), 2q11.2 (TBC1D8), and 18q11.2 (OSBPL1A), and confirmed a previously reported region near the TNFRSF11B/OPG gene. We also prioritized 16 suggestive genome-wide significant candidate genes based on their potential involvement in skeletal metabolism. Among them, 3 candidate genes were associated with BMD in women. Notably, 2 of these 3 genes (GPR177, p = 2.6×10⁻¹³; SOX6, p = 6.4×10⁻¹⁰) have been successfully replicated in a large-scale meta-analysis of BMD, whereas none of the non-prioritized candidates associated with BMD were. Our results support our prioritization strategy.
In the absence of direct biological support for identified genes, we highlighted the efficiency of subsequent functional characterization using publicly available expression profiling relevant to the skeletal system in cellular or whole animal models to prioritize candidate genes for further functional validation.
Author Summary
BMD and hip geometry are two major predictors of osteoporotic fractures, the most severe consequence of osteoporosis in elderly persons. We performed sex-specific genome-wide association studies (GWAS) for BMD at the lumbar spine and femoral neck, as well as for hip geometric indices (NSA, NL, and NW), in the Framingham Osteoporosis Study and then replicated the top findings in two independent studies. Three novel loci were significant: in women, chromosome 1p13.2 (RAP1A) for NW; in men, 2q11.2 (TBC1D8) for NSA and 18q11.2 (OSBPL1A) for NW. We confirmed a previously reported region on 8q24.12 (TNFRSF11B/OPG) for lumbar spine BMD in women. In addition, we integrated GWAS signals with eQTL in several tissues and with publicly available expression signature profiling in cellular and whole-animal models, and prioritized 16 candidate genes/loci based on their potential involvement in skeletal metabolism. Among three prioritized loci (GPR177, SOX6, and CASR genes) associated with BMD in women, GPR177 and SOX6 were later successfully replicated in a large-scale meta-analysis, whereas none of the non-prioritized candidates associated with BMD were. Our results support the concept of using expression profiling to support the candidacy of suggestive GWAS signals that may contain important genes of interest.
doi:10.1371/journal.pgen.1000977
PMCID: PMC2883588  PMID: 20548944
21.  Integrating Computational Biology and Forward Genetics in Drosophila 
PLoS Genetics  2009;5(1):e1000351.
Genetic screens are powerful methods for the discovery of gene–phenotype associations. However, a systems biology approach to genetics must leverage the massive amount of “omics” data to enhance the power and speed of functional gene discovery in vivo. Thus far, few computational methods for gene function prediction have been rigorously tested for their performance on a genome-wide scale in vivo. In this work, we demonstrate that integrating genome-wide computational gene prioritization with large-scale genetic screening is a powerful tool for functional gene discovery. To discover genes involved in neural development in Drosophila, we extend our strategy for the prioritization of human candidate disease genes to functional prioritization in Drosophila. We then integrate this prioritization strategy with a large-scale genetic screen for interactors of the proneural transcription factor Atonal using genomic deficiencies and mutant and RNAi collections. Using the prioritized genes validated in our genetic screen, we describe a novel genetic interaction network for Atonal. Lastly, we prioritize the whole Drosophila genome and identify candidate gene associations for ten receptor-signaling pathways. This novel database of prioritized pathway candidates, as well as a web application for functional prioritization in Drosophila, called Endeavour-HighFly, and the Atonal network, are publicly available resources. A systems genetics approach that combines the power of computational predictions with in vivo genetic screens strongly enhances the process of gene function and gene–gene association discovery.
Author Summary
Genome sequencing and annotation, combined with large-scale molecular experiments to query gene expression and molecular interactions, collectively known as Systems Biology, have produced an enormous wealth of data in biological databases. Yet it remains a daunting task to use these data to decipher the rules that govern biological systems. One of the most trusted approaches in biology is genetic analysis, because of its emphasis on gene function in living organisms. Genetics, however, proceeds slowly and unravels small-scale interactions. Turning genetics into an effective tool of Systems Biology requires harnessing the large-scale molecular data for the design and execution of genetic screens. In this work, we test the idea of exploiting a computational approach known as gene prioritization to pre-rank genes by their likelihood of involvement in a process of interest. By carrying out a genetic screen supported by gene prioritization, we greatly enhance the speed and output of in vivo genetic screens without compromising their sensitivity. These results mean that future genetic screens can be tailored to any process of interest and carried out with a speed and efficiency comparable to other large-scale molecular experiments. We refer to this combined approach as Systems Genetics.
doi:10.1371/journal.pgen.1000351
PMCID: PMC2628282  PMID: 19165344
22.  Physical Activity Attenuates the Genetic Predisposition to Obesity in 20,000 Men and Women from EPIC-Norfolk Prospective Population Study 
PLoS Medicine  2010;7(8):e1000332.
Shengxu Li and colleagues use data from a large prospective observational cohort to examine the extent to which a genetic predisposition toward obesity may be modified by living a physically active lifestyle.
Background
We have previously shown that multiple genetic loci identified by genome-wide association studies (GWAS) increase the susceptibility to obesity in a cumulative manner. It is, however, not known whether and to what extent this genetic susceptibility may be attenuated by a physically active lifestyle. We aimed to assess the influence of a physically active lifestyle on the genetic predisposition to obesity in a large population-based study.
Methods and Findings
We genotyped 12 SNPs in obesity-susceptibility loci in a population-based sample of 20,430 individuals (aged 39–79 y) from the European Prospective Investigation of Cancer (EPIC)-Norfolk cohort, with an average follow-up period of 3.6 y. A genetic predisposition score was calculated for each individual by adding up the body mass index (BMI)-increasing alleles across the 12 SNPs. Physical activity was assessed using a self-administered questionnaire. Linear and logistic regression models were used to examine the main effects of the genetic predisposition score, and its interaction with physical activity, on BMI/obesity risk and on BMI change over time, assuming an additive effect for each additional BMI-increasing allele carried. Each additional BMI-increasing allele was associated with a 0.154 (standard error [SE] 0.012) kg/m² increase in BMI (p = 6.73×10⁻³⁷), equivalent to 445 g of body weight for a person 1.70 m tall. This association was significantly (p_interaction = 0.005) more pronounced in inactive people (0.205 [SE 0.024] kg/m² [p = 3.62×10⁻¹⁸; 592 g in weight]) than in active people (0.131 [SE 0.014] kg/m² [p = 7.97×10⁻²¹; 379 g in weight]). Similarly, each additional BMI-increasing allele increased the odds of obesity 1.116-fold (95% confidence interval [CI] 1.093–1.139, p = 3.37×10⁻²⁶) in the whole population, but significantly (p_interaction = 0.015) more in inactive individuals (odds ratio [OR] = 1.158 [95% CI 1.118–1.199; p = 1.93×10⁻¹⁶]) than in active individuals (OR = 1.095 [95% CI 1.068–1.123; p = 1.15×10⁻¹²]). Consistent with the cross-sectional observations, physical activity modified the association between the genetic predisposition score and change in BMI during follow-up (p_interaction = 0.028).
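Two calculations in this paragraph can be made concrete. The genetic predisposition score is an unweighted count of BMI-increasing alleles across the 12 SNPs (0 to 24), and a per-allele effect in kg/m² converts to grams of body weight by multiplying by height squared, since BMI is weight divided by height squared. The Python sketch below reproduces the 445, 592 and 379 g figures for a person 1.70 m tall; the function names are illustrative only.

```python
N_SNPS = 12  # obesity-susceptibility SNPs genotyped


def predisposition_score(bmi_increasing_alleles: list[int]) -> int:
    """Sum of BMI-increasing alleles (0, 1 or 2 per SNP) across all SNPs."""
    if len(bmi_increasing_alleles) != N_SNPS:
        raise ValueError(f"expected {N_SNPS} SNPs")
    if any(a not in (0, 1, 2) for a in bmi_increasing_alleles):
        raise ValueError("each SNP contributes 0, 1 or 2 alleles")
    return sum(bmi_increasing_alleles)


def bmi_change_to_grams(delta_bmi: float, height_m: float) -> float:
    """BMI = kg / m**2, so a BMI change times height squared is a weight change in kg."""
    return delta_bmi * height_m ** 2 * 1000


# Per-allele BMI effects (kg/m^2) reported above.
for group, beta in [("all", 0.154), ("inactive", 0.205), ("active", 0.131)]:
    print(group, round(bmi_change_to_grams(beta, 1.70)))
# all 445
# inactive 592
# active 379
```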
Conclusions
Our study shows that living a physically active lifestyle is associated with a 40% reduction in the genetic predisposition to common obesity, as estimated by the number of risk alleles carried for any of the 12 recently GWAS-identified loci.
Editors' Summary
Background
In the past few decades, the global incidence of obesity (defined as a body mass index [BMI, a simple index of weight-for-height calculated as weight in kilograms divided by the square of height in meters] of 30 or over) has increased so much that this growing public health concern is now commonly referred to as the “obesity epidemic.” Once considered prevalent only in high-income countries, obesity is an increasing health problem in low- and middle-income countries, particularly in urban settings. In 2005, at least 400 million adults worldwide were obese, and this figure was projected to rise by about 300 million, to around 700 million, by 2015. Childhood obesity is also a growing concern. Factors contributing to the obesity epidemic include a shift in diet toward energy-dense foods that are high in fat and sugars and a trend toward decreased physical activity due to increasingly sedentary lifestyles.
However, genetics is also thought to play a critical role, as genetically predisposed individuals may be more prone to obesity if they live in an environment with abundant access to energy-dense food and labor-saving devices.
Why Was This Study Done?
Although recent genome-wide association studies have identified 12 alleles (DNA variants located at specific positions on specific chromosomes) associated with increased BMI, there has been no convincing evidence of an interaction between genetics and lifestyle. In this study the researchers examined the possibility of such an interaction by assessing whether individuals with a genetic predisposition to obesity could modify their risk by increasing their daily physical activity.
What Did the Researchers Do and Find?
The researchers used data from a population-based cohort study of 25,631 people living in Norwich, UK (the EPIC-Norfolk study), who were 39 to 79 years old when they attended a health check between 1993 and 1997 and who were later invited to a second health examination. In total, 20,430 individuals had baseline data available, of whom 11,936 had BMI data from the second health check. The researchers genotyped each individual and calculated a genetic predisposition score, and they assessed occupational and leisure-time physical activity using a validated self-administered questionnaire. They then used regression models to examine the main effects of the genetic predisposition score and its interaction with physical activity on BMI/obesity risk and on BMI change over time. Each additional BMI-increasing allele was associated with an increase in BMI equivalent to 445 g in body weight for a person 1.70 m tall, and this effect was larger in inactive than in active people. In individuals with a physically active lifestyle, the increase was only 379 g per allele, 36% lower than in physically inactive individuals, in whom it was 592 g per allele. Furthermore, in the total sample each additional obesity-susceptibility allele increased the odds of obesity 1.116-fold. However, the per-allele increase in the odds of obesity was 40% lower in physically active individuals (OR 1.095 per allele) than in physically inactive individuals (OR 1.158 per allele).
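The 36% and 40% figures can be reproduced from the reported estimates. This is a sketch under one assumption not stated in the summary: that the 40% compares the per-allele excess odds (OR − 1) between groups rather than the odds ratios themselves. The helper names are hypothetical.

```python
def percent_lower(active: float, inactive: float) -> float:
    """Relative reduction (%) of the active-group effect versus the inactive group."""
    return 100 * (inactive - active) / inactive


def excess_odds_percent_lower(or_active: float, or_inactive: float) -> float:
    """Relative reduction (%) in per-allele excess odds (OR - 1).

    Assumption: this is how the ~40% figure was derived.
    """
    return percent_lower(or_active - 1, or_inactive - 1)


print(f"weight per allele: {percent_lower(379, 592):.0f}% lower")          # 36% lower
print(f"BMI per allele: {percent_lower(0.131, 0.205):.0f}% lower")         # 36% lower
print(f"excess odds: {excess_odds_percent_lower(1.095, 1.158):.0f}% lower")  # 40% lower
```

Comparing excess odds rather than raw odds ratios makes a null effect (OR = 1) correspond to zero predisposition, which is why the 40% is not simply (1.158 − 1.095) / 1.158.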
What Do These Findings Mean?
The findings of this study indicate that the genetic predisposition to obesity can be reduced by approximately 40% through a physically active lifestyle. They suggest that, while the whole population benefits from increased physical activity, individuals who are genetically predisposed to obesity would benefit more than genetically protected individuals. Furthermore, these findings challenge the deterministic view of genetic predisposition to obesity that is often held by the public, as they show that even the most genetically predisposed individuals will benefit from adopting a healthy lifestyle. The results are limited by the use of self-reported physical activity, which is less accurate than objective measures of physical activity.
Additional Information
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1000332.
This study relies on the results of previous genome-wide association studies; the National Human Genome Research Institute provides an easy-to-follow guide to understanding such studies
The International Association for the Study of Obesity aims to improve global health by promoting the understanding of obesity and weight-related diseases through scientific research and dialogue
The International Obesity Taskforce is the research-led think tank and advocacy arm of the International Association for the Study of Obesity
The Global Alliance for the Prevention of Obesity and Related Chronic Disease is a global action program that addresses the issues surrounding the prevention of obesity
The National Institutes of Health has its own obesity task force, which includes 26 institutes
doi:10.1371/journal.pmed.1000332
PMCID: PMC2930873  PMID: 20824172
23.  Gene-Based Tests of Association 
PLoS Genetics  2011;7(7):e1002177.
Genome-wide association studies (GWAS) are now used routinely to identify SNPs associated with complex human phenotypes. In several cases, multiple variants within a gene contribute independently to disease risk. Here we introduce a novel Gene-Wide Significance (GWiS) test that uses greedy Bayesian model selection to identify the independent effects within a gene, which are combined to generate a stronger statistical signal. Permutation tests provide p-values that correct for the number of independent tests genome-wide and within each genetic locus. When applied to a dataset comprising 2.5 million SNPs in up to 8,000 individuals measured for various electrocardiography (ECG) parameters, this method identifies more validated associations than conventional GWAS approaches. The method also provides, for the first time, systematic assessments of the number of independent effects within a gene and of the fraction of disease-associated genes housing multiple independent effects, observed at 35%–50% of loci in our study. This method can be generalized to other study designs, retains power for low-frequency alleles, and provides gene-based p-values that are directly compatible with pathway-based meta-analysis.
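To illustrate the idea of greedily adding independent effects within a gene, here is a minimal sketch that uses BIC (a standard large-sample approximation to the Bayesian model evidence) as the selection criterion. This is not the authors' GWiS implementation: its priors, scoring and permutation-based p-values are not reproduced, and the simulated gene (10 SNPs, causal SNPs 2 and 7) is hypothetical.

```python
import numpy as np


def bic_linear(y, X):
    """BIC of an OLS fit of y on X plus an intercept.

    Gaussian log-likelihood up to a constant: n*log(RSS/n) + k*log(n).
    """
    n = y.shape[0]
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    k = design.shape[1]
    return float(n * np.log(rss / n) + k * np.log(n))


def greedy_gene_model(y, G):
    """Forward selection over SNP columns of G: add the SNP that lowers BIC
    the most, and stop as soon as no remaining SNP lowers it further."""
    chosen = []
    best = bic_linear(y, G[:, chosen])  # intercept-only model
    remaining = list(range(G.shape[1]))
    while remaining:
        score, j = min((bic_linear(y, G[:, chosen + [c]]), c) for c in remaining)
        if score >= best:
            break
        best = score
        chosen.append(j)
        remaining.remove(j)
    return chosen, best


# Hypothetical gene: 10 SNPs coded 0/1/2, two of which act independently.
rng = np.random.default_rng(0)
n, m = 2000, 10
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)
y = 0.3 * G[:, 2] + 0.3 * G[:, 7] + rng.normal(size=n)

chosen, _ = greedy_gene_model(y, G)
print(sorted(chosen))  # the strongest effects, SNPs 2 and 7, are selected first
```

The number of selected SNPs is the sketch's estimate of how many independent effects the gene carries; in a real analysis, significance would still have to be calibrated, for example by permuting phenotypes.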
Author Summary
Genome-wide association studies (GWAS) have successfully identified genetic variants associated with complex human phenotypes. Despite a proliferation of analysis methods, most studies rely on simple, robust SNP–by–SNP univariate tests with ever-larger population sizes. Here we introduce a new test motivated by the biological hypothesis that a single gene may contain multiple variants that contribute independently to a trait. Applied to simulated phenotypes with real genotypes, our new method, Gene-Wide Significance (GWiS), has better power to identify true associations than traditional univariate methods, previous Bayesian methods, popular L1 regularized (LASSO) multivariate regression, and other approaches. GWiS retains power for low-frequency alleles that are increasingly important for personal genetics, and it is the only method tested that accurately estimates the number of independent effects within a gene. When applied to human data for multiple ECG traits, GWiS identifies more genome-wide significant loci (verified by meta-analyses of much larger populations) than any other method. We estimate that 35%–50% of ECG trait loci are likely to have multiple independent effects, suggesting that our method will reveal previously unidentified associations when applied to existing data and will improve power for future association studies.
doi:10.1371/journal.pgen.1002177
PMCID: PMC3145613  PMID: 21829371
24.  Geographic Differences in Genetic Susceptibility to IgA Nephropathy: GWAS Replication Study and Geospatial Risk Analysis 
PLoS Genetics  2012;8(6):e1002765.
IgA nephropathy (IgAN), a major cause of kidney failure worldwide, is common in Asians, moderately prevalent in Europeans, and rare in Africans. It is not known if these differences represent variation in genes, environment, or ascertainment. In a recent GWAS, we localized five IgAN susceptibility loci on Chr.6p21 (HLA-DQB1/DRB1, PSMB9/TAP1, and DPA1/DPB2 loci), Chr.1q32 (CFHR3/R1 locus), and Chr.22q12 (HORMAD2 locus). These IgAN loci are associated with risk of other immune-mediated disorders such as type I diabetes, multiple sclerosis, or inflammatory bowel disease. We tested association of these loci in eight new independent cohorts of Asian, European, and African-American ancestry (N = 4,789), followed by meta-analysis with risk-score modeling in 12 cohorts (N = 10,755) and geospatial analysis in 85 world populations. Four susceptibility loci robustly replicated, and all five loci were genome-wide significant in the combined cohort (P = 5×10⁻³² to 3×10⁻¹⁰), with heterogeneity detected only at the PSMB9/TAP1 locus (I² = 0.60). Conditional analyses identified two new independent risk alleles within the HLA-DQB1/DRB1 locus, defining multiple risk and protective haplotypes within this interval. We also detected a significant genetic interaction, whereby the odds ratio for the HORMAD2 protective allele was reversed in homozygotes for a CFHR3/R1 deletion (P = 2.5×10⁻⁴). A seven–SNP genetic risk score, which explained 4.7% of overall IgAN risk, increased sharply with Eastward and Northward distance from Africa (r = 0.30, P = 3×10⁻¹²⁸). This model paralleled the known East–West gradient in disease risk. Moreover, the prediction of a South–North axis was confirmed by registry data showing that the prevalence of IgAN-attributable kidney failure is increased in Northern Europe, similar to multiple sclerosis and type I diabetes. Variation at IgAN susceptibility loci correlates with differences in disease prevalence among world populations.
These findings inform genetic, biological, and epidemiological investigations of IgAN and permit cross-comparison with other complex traits that share genetic risk loci and geographic patterns with IgAN.
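The heterogeneity statistic I² reported above is conventionally derived from Cochran's Q in an inverse-variance meta-analysis: I² = max(0, (Q − df) / Q), with df equal to the number of cohorts minus one. A minimal fixed-effect sketch follows; the per-cohort inputs in the test are made-up numbers, not the study's data.

```python
def cochran_q_i2(betas, ses):
    """Fixed-effect inverse-variance meta-analysis.

    betas: per-cohort effect estimates (e.g. log odds ratios)
    ses:   their standard errors
    Returns (pooled estimate, Cochran's Q, I^2 as a fraction in [0, 1]).
    """
    if len(betas) != len(ses) or len(betas) < 2:
        raise ValueError("need at least two cohorts with matching SEs")
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, i2


# Two hypothetical cohorts with clearly different effects.
pooled, q, i2 = cochran_q_i2([0.0, 2.0], [0.5, 0.5])
print(pooled, q, i2)  # 1.0 8.0 0.875
```

An I² of 0.60 would thus mean that roughly 60% of the variation among cohort estimates exceeds what sampling error alone would produce.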
Author Summary
IgA nephropathy (IgAN) is the most common cause of kidney failure in Asia, is less prevalent in Europe, and is very infrequent among populations of African ancestry. A long-standing question in the field is whether these differences reflect variation in genes, environment, or ascertainment. In a recent genome-wide association study of 5,966 individuals, we identified five susceptibility loci for this trait. In this paper, we study the largest IgAN case-control cohort reported to date, composed of 10,775 individuals of European, Asian, and African-American ancestry. We confirm that all five loci are significant contributors to disease risk across this multi-ethnic cohort. In addition, we identify two novel independent susceptibility alleles within the HLA-DQB1/DRB1 locus and a new genetic interaction between the loci on Chr.1q32 and Chr.22q12. We develop a seven–SNP genetic risk score that explains nearly 5% of the variation in disease risk. In a geospatial analysis of 85 world populations, the genetic risk score closely parallels worldwide patterns of disease prevalence. The genetic risk score also predicts an unsuspected Northward risk gradient in Europe. This prediction is verified by registry data demonstrating a previously unrecognized increase in IgAN-attributable kidney failure in Northern European countries, similar to that seen for other immune-mediated diseases such as multiple sclerosis and type I diabetes.
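A common way to build a multi-SNP genetic risk score is to weight each risk-allele count by the log odds ratio of its SNP, so that protective alleles (OR < 1) lower the score. Because the expected risk-allele count per person at a biallelic SNP is 2p, a population's mean score can be computed from allele frequencies alone, which is the kind of comparison a geospatial analysis across populations needs. This is a sketch: the authors' exact weighting scheme is not described here, and the values in the usage lines are hypothetical.

```python
import math
from typing import Sequence


def weighted_grs(allele_counts: Sequence[int], odds_ratios: Sequence[float]) -> float:
    """Individual score: sum over SNPs of (risk-allele count) * ln(OR)."""
    if len(allele_counts) != len(odds_ratios):
        raise ValueError("need one odds ratio per SNP")
    return sum(c * math.log(o) for c, o in zip(allele_counts, odds_ratios))


def population_mean_grs(risk_allele_freqs: Sequence[float],
                        odds_ratios: Sequence[float]) -> float:
    """Expected score in a population: each SNP contributes 2 * p * ln(OR),
    since the mean risk-allele count per person is 2p."""
    if len(risk_allele_freqs) != len(odds_ratios):
        raise ValueError("need one odds ratio per SNP")
    return sum(2 * p * math.log(o) for p, o in zip(risk_allele_freqs, odds_ratios))


# Hypothetical three-SNP example (not the study's seven SNPs or estimates).
ors = [1.4, 0.8, 1.2]
print(weighted_grs([2, 0, 1], ors))
print(population_mean_grs([0.35, 0.20, 0.50], ors))
```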
doi:10.1371/journal.pgen.1002765
PMCID: PMC3380840  PMID: 22737082
25.  A Novel Evolution-Based Method for Detecting Gene-Gene Interactions 
PLoS ONE  2011;6(10):e26435.
Background
The rapid advance of large-scale SNP-chip technologies offers great opportunities for elucidating the genetic basis of complex diseases. Methods for large-scale interaction analysis are being developed by several groups. Because of several difficult issues (e.g., sparseness of data in high dimensions and low replication or validation rates), developing fast, powerful and robust methods for detecting various forms of gene-gene interaction remains a challenging task.
Methodology/Principal Findings
In this article, we develop an evolution-based method to search for genome-wide epistasis in a case-control design. From an evolutionary perspective, we posit that human diseases originate from ancient mutations and that the underlying genetic variants play a role in differentiating human populations into the healthy and the diseased. Based on this concept, the fixation index (Fst), a traditional evolutionary measure of the genetic distance between populations, computed for two unlinked loci, should be able to reveal the genetic interplay responsible for disease traits. To validate our proposal, we first investigated the theoretical distribution of Fst using extensive simulations. We then explored its power for detecting gene-gene interactions via SNP markers and compared it with the conventional Pearson chi-square test, a mutual-information-based test, and a linkage-disequilibrium (LD)-based test under several disease models. The proposed evolution-based method outperformed these methods in dominant and additive models, regardless of the disease allele frequencies, but its performance was relatively poor in a recessive model. Finally, we applied the proposed method to a published dataset. The P value of the Fst-based statistic was smaller than those obtained with the LD-based statistic or with Poisson regression models.
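The core idea, treating cases and controls as two "populations" and measuring their differentiation at a pair of loci, can be sketched with Nei's heterozygosity-based form of Fst, (H_T − H_S) / H_T, where each of the nine joint two-locus genotypes is treated as one category. This is an assumption made for illustration: the abstract does not give the authors' exact two-locus estimator, and this sketch reacts to any difference in the joint genotype distribution, including main effects, so isolating pure epistasis would need further adjustment that is not shown.

```python
from collections import Counter
from typing import List, Tuple

# The nine joint genotypes for two biallelic SNPs coded 0/1/2.
CATEGORIES = [(a, b) for a in range(3) for b in range(3)]


def _freqs(sample: List[Tuple[int, int]]) -> List[float]:
    """Relative frequency of each joint genotype in a sample."""
    counts = Counter(sample)
    n = len(sample)
    return [counts[k] / n for k in CATEGORIES]


def two_locus_fst(cases: List[Tuple[int, int]], controls: List[Tuple[int, int]]) -> float:
    """Nei-style Fst = (H_T - H_S) / H_T between cases and controls.

    H_S: mean within-group diversity, 1 - sum(p^2), over the two groups.
    H_T: diversity of the equally weighted pooled frequencies.
    """
    p_case, p_ctrl = _freqs(cases), _freqs(controls)
    h_s = ((1 - sum(x * x for x in p_case)) + (1 - sum(x * x for x in p_ctrl))) / 2
    pooled = [(a + b) / 2 for a, b in zip(p_case, p_ctrl)]
    h_t = 1 - sum(x * x for x in pooled)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Hypothetical toy data: cases enriched for the (2, 2) joint genotype.
cases = [(2, 2)] * 30 + [(1, 1)] * 40 + [(0, 0)] * 30
controls = [(2, 2)] * 5 + [(1, 1)] * 50 + [(0, 0)] * 45
print(round(two_locus_fst(cases, controls), 4))
```

In practice the statistic's null distribution would be obtained by simulation or permutation of case-control labels, in the spirit of the simulations described above.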
Conclusions/Significance
With rapidly growing large-scale genetic association studies, the proposed evolution-based method can be a promising tool in the identification of epistatic effects.
doi:10.1371/journal.pone.0026435
PMCID: PMC3201950  PMID: 22046286
