1.  Niche adaptation by expansion and reprogramming of general transcription factors 
Experimental analysis of TFB family proteins in a halophilic archaeon reveals complex environment-dependent fitness contributions. Gene conversion events among these proteins can generate novel niche adaptation capabilities, a process that may have contributed to archaeal adaptation to extreme environments.
The evolution of archaeal lineages correlates with duplication events in the TFB family. Each TFB is required for adaptation to multiple environments. The relative fitness contributions of TFBs change with environmental context. Changes in the regulation of duplicated TFBs can generate new adaptation capabilities.
The evolutionary success of an organism depends on its ability to continually adapt to changes in the patterns of constant, periodic, and transient challenges within its environment. This process of ‘niche adaptation’ requires reprogramming of the organism's environmental response networks by reorganizing interactions among diverse parts including environmental sensors, signal transducers, and transcriptional and post-transcriptional regulators. Gene duplication has been identified as one of the principal strategies in this process, especially for reprogramming of gene regulatory networks (GRNs). Whereas eukaryotes require dozens of factors for recruitment of RNA polymerase, archaea require just two general transcription factors (GTFs) that are orthologous to eukaryotic TFIIB (TFB in archaea) and TATA-binding protein (TBP) (Bell et al, 1998). Both of these GTFs have expanded extensively in nearly 50% of all archaea whose genomes have been fully sequenced. The phylogenetic analysis presented in this study reveals lineage-specific expansions of TFBs, suggesting that they might encode functionally specialized gene regulatory programs for the unique environments to which these organisms have adapted. This hypothesis is particularly appealing when we consider that the greatest expansion is observed within the group of halophilic archaea whose habitats are associated with routine and dynamic changes in a number of environmental factors including light, temperature, oxygen, salinity, and ionic composition (Rodriguez-Valera, 1993; Litchfield, 1998).
We have previously demonstrated that variations in the expanded set of TFBs (a through e) in Halobacterium salinarum NRC-1 manifest at the level of physical interactions within and across the two families, their DNA-binding specificity, their differential regulation in varying environments, and, ultimately, in the large-scale segregation of transcription of all genes into overlapping yet distinct sets of functionally related groups (Facciotti et al, 2007). We have extended the findings of this earlier study with a systematic survey of the fitness consequences of perturbing the TFB network of H. salinarum NRC-1 across 17 environments. Notably, each TFB conferred fitness in two or more of the environmental conditions tested, and the relative fitness contributions (see Table I) of the five TFBs varied significantly by environment. From an evolutionary perspective, the relationships among these fitness landscapes reveal that two classes of TFBs (c/g- and f-type) appear to have played an important role in the evolution of halophilic archaea by overseeing regulation of core physiological capabilities in these organisms. TFBs of the other clades (b/d and a/e) seem to have emerged much more recently through gene duplications or horizontal gene transfers (HGTs) and are being utilized for adaptation to specialized environmental conditions.
We also investigated higher-order functional interactions and relationships among the duplicated TFBs by performing competition experiments and by mapping genetic interactions in different environments. This demonstrated that, depending on environmental context, the TFBs have strikingly different functional hierarchies and genetic interactions with one another. This is remarkable because it makes each TFB essential, albeit at different times, in a dynamically changing environment.
In order to understand how such gene family expansions shape the architecture and functioning of a GRN, we performed an integrated analysis of phylogeny, physical interactions, regulation, and fitness landscapes of the seven TFBs in H. salinarum NRC-1. This revealed that evolution of both their protein-coding sequences and their promoters has been instrumental in encoding environment-specific regulatory programs. Importantly, the convergent and divergent evolution of regulation and binding properties of TFBs suggested that, aside from HGT and random mutations, a third plausible (and perhaps most interesting) mechanism for acquiring a novel TFB variant is gene conversion. To test this hypothesis, we synthesized a novel TFBx by transferring TFBa/e clade-specific residues to a TFBd backbone, placed this variant under the control of either the TFBd or the TFBe promoter (PtfbD or PtfbE), transformed it into three different host genetic backgrounds (Δura3 (parent), ΔtfbD, and ΔtfbE), and analyzed fitness and gene expression patterns during growth at 25 and 37°C. This showed that gene conversion events spanning the coding sequence and the promoter, environmental context, and the genetic background of the host all strongly influence the functional integration of a TFB into the GRN. Importantly, this analysis suggested that altering the regulation of an existing set of expanded TFBs might be an efficient mechanism to reprogram the GRN and rapidly generate novel niche adaptation capabilities. We have confirmed this experimentally by increasing fitness merely by moving tfbE to PtfbD control, and by generating a completely novel phenotype (biofilm-like appearance) through overexpression of tfbE.
Altogether, this study clearly demonstrates that archaea can rapidly generate novel niche adaptation programs by simply altering regulation of duplicated TFBs. This is significant because expansions in the TFB family are widespread in archaea, a group of organisms that not only represent 20% of biomass on earth but are also known to have colonized some of the most extreme environments (DeLong and Pace, 2001). This strategy for niche adaptation is further expanded through interactions of the multiple TFBs with members of other expanded TF families such as TBPs (Facciotti et al, 2007) and sequence-specific regulators (e.g. Lrp family (Peeters and Charlier, 2010)). This is analogous to combinatorial solutions for other complex biological problems such as recognition of pathogens by Toll-like receptors (Roach et al, 2005), generation of antibody diversity by V(D)J recombination (Early et al, 1980), and recognition and processing of odors (Malnic et al, 1999).
Numerous lineage-specific expansions of the transcription factor B (TFB) family in archaea suggest an important role for expanded TFBs in encoding environment-specific gene regulatory programs. Given the characteristics of hypersaline lakes, the unusually large numbers of TFBs in halophilic archaea further suggest that they might be especially important in rapid adaptation to the challenges of a dynamically changing environment. Motivated by these observations, we have investigated the implications of TFB expansions by correlating sequence variations, regulation, and physical interactions of all seven TFBs in Halobacterium salinarum NRC-1 to their fitness landscapes, functional hierarchies, and genetic interactions across 2488 experiments covering combinatorial variations in salt, pH, temperature, and Cu stress. This systems analysis has revealed an elegant scheme in which completely novel fitness landscapes are generated by gene conversion events that introduce subtle changes to the regulation or physical interactions of duplicated TFBs. Based on these insights, we have introduced a synthetically redesigned TFB and altered the regulation of existing TFBs to illustrate how archaea can rapidly generate novel phenotypes by simply reprogramming their TFB regulatory network.
doi:10.1038/msb.2011.87
PMCID: PMC3261711  PMID: 22108796
evolution by gene family expansion; fitness; niche adaptation; reprogramming of gene regulatory network; transcription factor B
2.  Backup without redundancy: genetic interactions reveal the cost of duplicate gene loss 
We show that genetic interaction profiles offer a powerful approach to elicit phenotypes that are far richer than those attainable using single gene deletions. This has allowed us to address the long-standing question of the role played by duplicate genes (paralogs) in robustness against deletion. We provide for the first time direct evidence that the capacity of some duplicates to cover for the loss of their paralogs can account for the observed difference in fitness between duplicate and singleton deletion mutants, but that the overall contribution of this effect to dispensability is small. More broadly, we demonstrate that paralogs possessing apparent backup capacity in some environments in fact have distinct and non-overlapping functions, and are unable to provide backup across a range of compromising conditions. This resolves the previous paradox of how backup genes conferring dispensability can nevertheless be independently maintained in the population. From a practical point of view, our findings suggest efficient strategies to elicit rich deletion phenotypes that should be highly relevant for the design of future phenotypic screens.
Much of our understanding of biological processes has been derived from the characterization of the functional consequence to an organism of altering one or more of its genes. Efforts to systematically evaluate the phenotypic effects of gene loss, however, have been hampered by the fact that the disruption of most genes has surprisingly modest effects on cell growth and viability. The high proportion of genes with no apparent deletion effect has wide-ranging practical and theoretical implications and has been the subject of considerable interest (Wagner, 2000, 2005; Giaever et al, 2002; Gu et al, 2003; Papp et al, 2004; Kafri et al, 2005). One factor that has been implicated as contributing to the high degree of dispensability is the abundance of closely related paralogs present in most genomes (Winzeler et al, 1999; Wagner, 2000; Giaever et al, 2002). Indeed, recent work in S. cerevisiae has shown that the existence of a paralog elsewhere in the genome significantly increases the chance that deletion of a given gene has little effect on growth (Gu et al, 2003). However, current analyses have been mostly correlative, and direct mechanistic evidence supporting or refuting the role of backup compensation in mutational robustness is still largely missing. Furthermore, backup between duplicates is not easily justified in evolutionary terms, in that a genuine ability to comprehensively cover for the loss of another gene is evolutionarily unstable (Brookfield, 1992).
Here, we exploit the recent availability of high-density quantitative genetic interaction profiles (epistatic miniarray profiles, EMAPs) to address these issues directly. To test whether paralogs with a synthetic sick or lethal (SSL) interaction can account for the excess fitness of duplicates, we classified genes into fitness categories according to their deletion growth defect (Materials and methods). The subset of genes covered by our combined data set exhibits an over-representation of duplicate genes in the weak/no deletion phenotype (WNP) class similar to that reported previously (Gu et al, 2003) (Figure 1B). Strikingly, this difference corresponds to the number of WNP duplicates that have an SSL interaction with their corresponding paralog (Figure 1C). Our data thus provide direct evidence that it is indeed duplicate compensation that accounts for the observed difference in deletion growth defect between duplicates and singletons, at least for the genes covered by our data set.
Apart from the mechanism itself, the characteristic features of buffering duplicates have received considerable attention (Gu et al, 2003; Kafri et al, 2005; Wagner, 2005). Our data allowed us to unambiguously distinguish the subset of duplicates whose dispensability can be attributed to the existence of a backup paralog. The ability to identify backup duplicates directly put us in a position to study their features, and how they differ from other duplicates without buffering properties. In particular, we asked to what extent the observed buffering in rich media reflects functional similarity and a genuine ability to cover for the loss of a paralog in a broader range of conditions.
To assess the extent to which SSL duplicates can provide genuine backup under compromising conditions, we first used genetic interaction profiles as a more stringent test for redundancy that assesses the effect of gene loss in the background of additional gene deletions. In contrast to the expectation that truly buffered duplicates should have few if any synthetic interactions, we find that the number is in fact substantial and often exceeds that of random genes and non-SSL duplicates (Figure 2B). Similarly, using a recent data set of sensitivity profiles of deletion strains to a range of agents and environments (Brown et al, 2006), we find that the deletion of SSL duplicates across a range of environments has on average no weaker (and in fact a slightly stronger) effect on cellular growth rate than that of non-SSL duplicates or random genes. Taken together, these findings suggest that the backup capacity of SSL duplicates is limited and not indicative of a comprehensive ability to cover for the loss of the paralogous partner.
We next tested the degree of functional similarity of buffering duplicates using similarity in genetic interaction as well as environmental sensitivity profiles as indicators of shared functionality (Tong et al, 2004; Schuldiner et al, 2005; Brown et al, 2006; Pan et al, 2006). In spite of their rich media buffering properties, we find that the interaction and sensitivity patterns of most SSL duplicates are divergent and are usually more similar to those of other, non-paralogous genes (Figure 2C and D; Supplementary Figure 10).
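The idea of comparing interaction profiles can be made concrete with a small sketch. The code below scores profile similarity with a Pearson correlation and asks which gene's profile is most similar to a given duplicate. This is a minimal illustration, not the authors' pipeline. The gene names and scores are hypothetical, and Pearson correlation is assumed here as the similarity measure.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length interaction profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def most_similar(gene, profiles):
    """Return the other gene whose interaction profile correlates most strongly with `gene`."""
    return max((g for g in profiles if g != gene),
               key=lambda g: pearson(profiles[gene], profiles[g]))

# Hypothetical interaction scores of three genes against the same five query genes.
profiles = {
    "DUP_A":   [-3.1, 0.2, 2.5, -0.4, 0.1],   # a duplicate with SSL-type backup
    "DUP_B":   [0.3, -2.8, 0.1, 1.9, -1.5],   # its paralog
    "OTHER_X": [-2.9, 0.5, 2.2, -0.1, 0.4],   # an unrelated, non-paralogous gene
}
print(most_similar("DUP_A", profiles))  # OTHER_X
```

In this toy case the duplicate's closest profile belongs to a non-paralogous gene, which mirrors the pattern reported above for most SSL duplicates.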
Lastly, in addition to our analysis of duplicate phenotypes, we used genetic interaction spectra as deletion phenotypes for genes whose single deletion in standard conditions has little measurable effect. As expected, genetic interactions provide a deletion phenotype for many more genes (80–90%) than single gene deletions in standard growth environments (Steinmetz et al, 2002), which yield a detectable growth defect for only 30–40% (Figure 4B). To assess whether these interactions reflect the cost of gene loss (gene importance), we asked whether there is a relationship between the probability of a gene being retained between related species and its number of genetic interactions. Indeed, genetic interactivity exhibits a strong correlation with gene retention across related phyla (Figure 4C and Supplementary Figure 7), and predicts the likelihood of gene loss better than lethality/viability, quantitative growth deficiency or environmental specificity (Supplementary Figure 8). Thus, genetic interactions provide a measure of the cost of gene loss that effectively recapitulates evolutionary constraints. This is further supported by the observation that genetic interactions are significantly correlated with environmental sensitivity across a range of conditions. Overall, our findings suggest that for most genes there is a substantial cost of gene loss, even though this is often not reflected in single gene deletion tests carried out in standard conditions.
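One simple way to quantify the association between genetic interactivity and gene retention is a Spearman rank correlation. The sketch below assumes that statistic, since the text does not say which one was used. The interaction counts and retention fractions are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the (average) ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: number of genetic interactions per gene and the fraction
# of related species in which an ortholog of that gene is retained.
n_interactions = [2, 15, 40, 7, 60, 25]
retention = [0.35, 0.62, 0.80, 0.45, 0.90, 0.60]
print(round(spearman(n_interactions, retention), 3))  # 0.943
```

A rank-based statistic is used here because interaction counts tend to be highly skewed, so a monotone rather than linear relationship is the more natural thing to test.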
Many genes can be deleted with little phenotypic consequence. By what mechanism and to what extent the presence of duplicate genes in the genome contributes to this robustness against deletions has been the subject of considerable interest. Here, we exploit the availability of high-density genetic interaction maps to provide direct support for the role of backup compensation, where functionally overlapping duplicates cover for the loss of their paralog. However, we find that the overall contribution of duplicates to robustness against null mutations is low (∼25%). The ability to directly identify buffering paralogs allowed us to further study their properties, and how they differ from non-buffering duplicates. Using environmental sensitivity profiles as well as quantitative genetic interaction spectra as high-resolution phenotypes, we establish that even duplicate pairs with compensation capacity exhibit rich and typically non-overlapping deletion phenotypes, and are thus unable to comprehensively cover for the loss of their paralog. Our findings reconcile the fact that duplicates can compensate for each other's loss under a limited number of conditions with the evolutionary instability of genes whose loss is not associated with a phenotypic penalty.
doi:10.1038/msb4100127
PMCID: PMC1847942  PMID: 17389874
duplication; evolution; genetic interactions; redundancy
3.  A Flexible Bayesian Model for Studying Gene–Environment Interaction 
PLoS Genetics  2012;8(1):e1002482.
An important follow-up step after genetic markers are found to be associated with a disease outcome is a more detailed analysis investigating how the implicated gene or chromosomal region and an established environmental risk factor interact to influence the disease risk. The standard approach to this study of gene–environment interaction considers one genetic marker at a time and therefore could misrepresent and underestimate the genetic contribution to the joint effect when one or more functional loci, some of which might not be genotyped, exist in the region and interact with the environmental risk factor in a complex way. We develop a more global approach based on a Bayesian model that uses a latent genetic profile variable to capture all of the genetic variation in the entire targeted region and allows the environmental effect to vary across different genetic profile categories. We also propose a resampling-based test derived from the developed Bayesian model for the detection of gene–environment interaction. Using data collected in the Environment and Genetics in Lung Cancer Etiology (EAGLE) study, we apply the Bayesian model to evaluate the joint effect of smoking intensity and genetic variants in the 15q25.1 region, which contains a cluster of nicotinic acetylcholine receptor genes and has been shown to be associated with both lung cancer and smoking behavior. We find evidence for gene–environment interaction (P-value = 0.016), with the smoking effect appearing to be stronger in subjects with a genetic profile associated with a higher lung cancer risk; the conventional test of gene–environment interaction based on the single-marker approach is far from significant.
Author Summary
Many common diseases result from a complex interplay of genetic and environmental risk factors. It is important to study potential genetic and environmental risk factors jointly in order to better understand the mechanisms underlying disease development. The standard single-marker approach, which studies the environmental risk factor and one genetic marker at a time, could misrepresent the gene–environment interaction, because a single genetic marker might not be an appropriate surrogate for the underlying functional genetic polymorphisms. We propose a method to look at gene–environment interaction at the gene/region level by integrating information observed on multiple genetic markers within the selected gene/region with measures of environmental exposure. Using data collected in the Environment and Genetics in Lung Cancer Etiology (EAGLE) study, we apply the proposed model to evaluate the joint effect of smoking intensity and genetic variants in the 15q25.1 region and find evidence for gene–environment interaction (P-value = 0.016), with the smoking effect varying according to a subject's genetic profile.
doi:10.1371/journal.pgen.1002482
PMCID: PMC3266891  PMID: 22291610
4.  Two-Stage Two-Locus Models in Genome-Wide Association 
PLoS Genetics  2006;2(9):e157.
Studies in model organisms suggest that epistasis may play an important role in the etiology of complex diseases and traits in humans. With the era of large-scale genome-wide association studies fast approaching, it is important to quantify whether it will be possible to detect interacting loci using realistic sample sizes in humans and to what extent undetected epistasis will adversely affect power to detect association when single-locus approaches are employed. We therefore investigated the power to detect association for an extensive range of two-locus quantitative trait models that incorporated varying degrees of epistasis. We compared the power to detect association using a single-locus model that ignored interaction effects, a full two-locus model that allowed for interactions, and, most important, two two-stage strategies whereby a subset of loci initially identified using single-locus tests were analyzed using the full two-locus model. Despite the penalty introduced by multiple testing, fitting the full two-locus model performed better than single-locus tests for many of the situations considered, particularly when compared with attempts to detect both individual loci. Using a two-stage strategy reduced the computational burden associated with performing an exhaustive two-locus search across the genome but was not as powerful as the exhaustive search when loci interacted. Two-stage approaches also increased the risk of missing interacting loci that contributed little effect at the margins. Based on our extensive simulations, our results suggest that an exhaustive search involving all pairwise combinations of markers across the genome might provide a useful complement to single-locus scans in identifying interacting loci that contribute to moderate proportions of the phenotypic variance.
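The computational and multiple-testing burden that separates an exhaustive two-locus search from a two-stage strategy comes down to the number of marker pairs tested. The sketch below counts those pairs and computes the corresponding Bonferroni thresholds. It rests on several assumptions:

- 300,000 markers;
- a family-wise alpha of 0.05;
- a hypothetical stage 1 that carries k = 1,000 markers forward and tests only pairs among them.

The authors' two-stage schemes may select and pair markers differently.

```python
def n_pairs(m: int) -> int:
    """Number of distinct unordered marker pairs among m markers."""
    return m * (m - 1) // 2

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that controls the family-wise error rate at alpha."""
    return alpha / n_tests

M = 300_000      # assumed number of genotyped markers
K = 1_000        # hypothetical number of markers passing stage 1
ALPHA = 0.05

exhaustive = n_pairs(M)   # 44,999,850,000 pairs
two_stage = n_pairs(K)    # 499,500 pairs
print(f"exhaustive: {exhaustive:,} tests, threshold {bonferroni_threshold(ALPHA, exhaustive):.2e}")
print(f"two-stage:  {two_stage:,} tests, threshold {bonferroni_threshold(ALPHA, two_stage):.2e}")
```

The roughly 90,000-fold difference in the number of tests is why two-stage designs are cheaper. The abstract's point is that this saving can cost power when interacting loci have weak marginal effects, because such loci may never pass stage 1.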
Synopsis
Although there is growing appreciation that attempting to map genetic interactions in humans may be a fruitful endeavor, there is no consensus as to the best strategy for their detection, particularly in the case of genome-wide association where the number of potential comparisons is enormous. In this article, the authors compare the performance of four different search strategies to detect loci that interact in genome-wide association: a single-locus search, an exhaustive two-locus search, and two two-stage procedures in which a subset of loci initially identified with single-locus tests are analyzed using a full two-locus model. Their results show that when loci interact, an exhaustive two-locus search across the genome is superior to a two-stage strategy, and in many situations can identify loci that would not have been identified solely using a single-locus search. Their findings suggest that an exhaustive search involving all pairwise combinations of markers across the genome may provide a useful complement to single-locus scans in identifying interacting loci that contribute to moderate proportions of the phenotypic variance.
doi:10.1371/journal.pgen.0020157
PMCID: PMC1570380  PMID: 17002500
5.  Using Linkage Analysis to Detect Gene-Gene Interactions. 2. Improved Reliability and Extension to More-Complex Models 
PLoS ONE  2016;11(1):e0146240.
Detecting gene-gene interaction in complex diseases has become an important priority for common disease genetics, but most current approaches to detecting interaction start with disease-marker associations. These approaches are based on population allele frequency correlations, not genetic inheritance, and therefore cannot exploit the rich information about inheritance contained within families. They are also hampered by issues of rigorous phenotype definition, multiple test correction, and allelic and locus heterogeneity. We recently developed, tested, and published a powerful gene-gene interaction detection strategy based on conditioning family data on a known disease-causing allele or a disease-associated marker allele [4]. We successfully applied the method to disease data and used computer simulation to exhaustively test the method for some epistatic models. We knew that the statistic we developed to indicate interaction was less reliable when applied to more-complex interaction models. Here, we improve the statistic and expand the testing procedure. We computer-simulated multipoint linkage data for a disease caused by two interacting loci. We examined epistatic as well as additive models and compared them with heterogeneity models. In all our models, the at-risk genotypes are “major” in the sense that among affected individuals, a substantial proportion has a disease-related genotype. One of the loci (A) has a known disease-related allele (as would have been determined from a previous analysis). We removed (pruned) family members who did not carry this allele; the resultant dataset is referred to as “stratified.” This elimination step has the effect of raising the “penetrance” and detectability at the second locus (B). We used the lod scores for the stratified and unstratified data sets to calculate a statistic that either indicated the presence of interaction or indicated that no interaction was detectable.
We show that the new method is robust and reliable for a wide range of parameters. Our statistic performs well both with the epistatic models (false negative rates, i.e., failing to detect interaction, ranging from 0 to 2.5%) and with the heterogeneity models (false positive rates, i.e., falsely detecting interaction, ≤1%). It works well with the additive model except when allele frequencies at the two loci differ widely. We explore those features of the additive model that make detecting interaction more difficult. All testing of this method suggests that it provides a reliable approach to detecting gene-gene interaction.
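The pruning step that produces the “stratified” data set can be sketched as a simple filter that keeps only family members carrying the known disease-related allele at locus A. This is a minimal illustration with a hypothetical record layout. It does not compute lod scores or the interaction statistic. Real linkage software may also require untyped connecting relatives to be kept as unknowns rather than dropped.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Member:
    member_id: str             # hypothetical identifier
    family_id: str
    affected: bool
    locus_a: tuple[str, str]   # genotype at locus A as two alleles

def stratify(members: list[Member], risk_allele: str) -> list[Member]:
    """Prune members who do not carry the known disease-related allele at locus A."""
    return [m for m in members if risk_allele in m.locus_a]

family = [
    Member("1", "F1", True,  ("A1", "A2")),
    Member("2", "F1", False, ("A2", "A2")),
    Member("3", "F1", True,  ("A1", "A1")),
    Member("4", "F1", False, ("A2", "A1")),
]
stratified = stratify(family, "A1")
print([m.member_id for m in stratified])  # ['1', '3', '4']
```

The remaining members all share the locus-A risk allele. As the abstract describes, this is what raises the effective penetrance, and hence the detectability, at locus B.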
doi:10.1371/journal.pone.0146240
PMCID: PMC4709060  PMID: 26752287
6.  Application of three-level linear mixed-effects model incorporating gene-age interactions for association analysis of longitudinal family data 
BMC Proceedings  2009;3(Suppl 7):S89.
Longitudinal studies that collect repeated measurements on the same subjects over time have long been considered more powerful than cross-sectional data and to provide much better information on individual changes. We propose a three-level linear mixed-effects model for testing genetic main effects and gene-age interactions with longitudinal family data. The simulated Genetic Analysis Workshop 16 Problem 3 data sets were used to evaluate the method. Genome-wide association analyses were conducted based on cross-sectional data, i.e., each of the three single-visit data sets separately, and also on the longitudinal data, i.e., using data from all three visits simultaneously. Results from the analysis of the coronary artery calcification phenotype showed that the longitudinal association tests were much more powerful than those based on single-visit data only. Gene-age interactions were evaluated under the same framework for detecting genetic effects that are modulated by age.
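A three-level model of this kind can be written generically as below, with repeated visits t nested in individuals i, who are nested in families f. This is an assumed textbook form for illustration. The abstract does not give the authors' exact covariates or covariance structure (for example, random slopes or kinship-based correlation), and theirs may differ.

```latex
y_{fit} = \beta_0 + \beta_1\,\mathrm{age}_{fit} + \beta_2\,g_{fi} + \beta_3\,(g_{fi} \times \mathrm{age}_{fit})
        + \mathbf{x}_{fit}^{\top}\boldsymbol{\gamma} + b_f + c_{fi} + \varepsilon_{fit},
\qquad b_f \sim N(0,\sigma_b^2),\quad c_{fi} \sim N(0,\sigma_c^2),\quad \varepsilon_{fit} \sim N(0,\sigma_\varepsilon^2)
```

The terms are as follows:

- g_{fi} is the genotype (e.g., allele count) of individual i in family f;
- x_{fit} holds other covariates;
- b_f and c_{fi} are the family-level and individual-level random effects.

In this form the genetic main effect corresponds to H_0: β_2 = 0 and the gene-age interaction to H_0: β_3 = 0.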
PMCID: PMC2795992  PMID: 20018085
7.  Knowledge-Driven Analysis Identifies a Gene–Gene Interaction Affecting High-Density Lipoprotein Cholesterol Levels in Multi-Ethnic Populations 
PLoS Genetics  2012;8(5):e1002714.
Total cholesterol, low-density lipoprotein cholesterol, triglyceride, and high-density lipoprotein cholesterol (HDL-C) levels are among the most important risk factors for coronary artery disease. We tested for gene–gene interactions affecting the level of these four lipids based on prior knowledge of established genome-wide association study (GWAS) hits, protein–protein interactions, and pathway information. Using genotype data from 9,713 European Americans from the Atherosclerosis Risk in Communities (ARIC) study, we identified an interaction between HMGCR and a locus near LIPC in their effect on HDL-C levels (Bonferroni corrected Pc = 0.002). Using an adaptive locus-based validation procedure, we successfully validated this gene–gene interaction in the European American cohorts from the Framingham Heart Study (Pc = 0.002) and the Multi-Ethnic Study of Atherosclerosis (MESA; Pc = 0.006). The interaction between these two loci is also significant in the African American sample from ARIC (Pc = 0.004) and in the Hispanic American sample from MESA (Pc = 0.04). Both HMGCR and LIPC are involved in the metabolism of lipids, and genome-wide association studies have previously identified LIPC as associated with levels of HDL-C. However, the effect on HDL-C of the novel gene–gene interaction reported here is twice as pronounced as that predicted by the sum of the marginal effects of the two loci. In conclusion, based on a knowledge-driven analysis of epistasis, together with a new locus-based validation method, we successfully identified and validated an interaction affecting a complex trait in multi-ethnic populations.
Author Summary
Genome-wide association studies (GWAS) have identified many loci associated with complex human traits or diseases. However, the fraction of heritable variation explained by these loci is often relatively low. Gene–gene interactions might play a significant role in complex traits or diseases and are one of the many possible factors contributing to the missing heritability. However, to date only a few interactions have been found and validated in GWAS, owing to the limited power caused by the need for multiple-testing correction for the very large number of tests conducted. Here, we used three types of prior knowledge (known GWAS hits, protein–protein interactions, and pathway information) to guide our search for gene–gene interactions affecting four lipid levels. We identified an interaction between HMGCR and a locus near LIPC in their effect on high-density lipoprotein cholesterol (HDL-C) and another pair of loci that interact in their effect on low-density lipoprotein cholesterol (LDL-C). We validated the interaction on HDL-C in a number of independent multi-ethnic populations, while the interaction underlying LDL-C did not validate. The prior knowledge-driven search approach and a locus-based validation procedure show the potential for dissecting and validating gene–gene interactions in current and future GWAS.
doi:10.1371/journal.pgen.1002714
PMCID: PMC3359971  PMID: 22654671
8.  A strategy analysis for genetic association studies with known inbreeding 
BMC Genetics  2011;12:63.
Background
Association studies consist of identifying the genetic variants that are related to a specific disease through the use of statistical multiple hypothesis testing or segregation analysis in pedigrees. This type of study has been very successful for Mendelian monogenic disorders. It has been less successful in identifying genetic variants related to complex diseases, whose onset depends on interactions between different genes and the environment. Current technology allows more than a million markers to be genotyped, and this number has increased rapidly in recent years with imputation based on template sets and with whole-genome sequencing. This type of data introduces a great amount of noise into the statistical analysis and usually requires a large number of samples. Current methods seldom take into account gene-gene and gene-environment interactions, which are fundamental especially in complex diseases. In this paper we propose to use a non-parametric additive model, which accounts for interactions of unknown order, to detect the genetic variants related to diseases. Although this is not new to the current literature, we show that in an isolated population, where the most closely related subjects also share most of their genome, the use of additive models may be improved if the available genealogical tree is taken into account. Specifically, we form a sample of cases and controls with the highest inbreeding by means of the Hungarian method, and estimate the set of genes/environmental variables associated with the disease by means of Random Forest.
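The Hungarian method solves an assignment problem. One way it could be used here is to pair each case with a distinct control so that the total kinship (or inbreeding) coefficient across pairs is maximized; this objective is assumed for illustration, since the abstract does not give the exact one. The sketch below implements the standard Kuhn-Munkres algorithm with potentials in pure Python, which runs in O(n²m) time. It maximizes kinship by minimizing its negative. The kinship values are hypothetical, and the Random Forest step is not shown.

```python
def hungarian_min(cost):
    """Minimum-cost assignment of every row to a distinct column.

    `cost` is an n x m matrix with n <= m. Returns a list whose entry i is the
    column assigned to row i (Kuhn-Munkres algorithm with row/column potentials).
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)      # row potentials (1-based; index 0 unused)
    v = [0.0] * (m + 1)      # column potentials
    p = [0] * (m + 1)        # p[j] = row currently matched to column j (0 = free)
    way = [0] * (m + 1)      # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:          # grow an alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):   # shift potentials by the smallest slack
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:            # flip matches along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assign = [0] * n
    for j in range(1, m + 1):
        if p[j]:
            assign[p[j] - 1] = j - 1
    return assign

def pair_cases_with_controls(kinship):
    """Pair each case (row) with a distinct control (column), maximizing total kinship."""
    return hungarian_min([[-k for k in row] for row in kinship])

# Hypothetical kinship coefficients: rows are cases, columns are candidate controls.
kinship = [
    [0.02, 0.10, 0.05, 0.01],
    [0.12, 0.03, 0.01, 0.02],
    [0.04, 0.06, 0.09, 0.03],
]
print(pair_cases_with_controls(kinship))  # [1, 0, 2]
```

Allowing more controls than cases (n ≤ m) reflects the usual situation in which the control pool is larger. In a real analysis the kinship matrix would come from the genealogical tree.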
Results
We have evidence, from statistical theory, simulations and two applications, that we have built a suitable procedure to eliminate stratification between cases and controls and that it also has enough precision to identify genetic variants responsible for a disease. This procedure has been successfully applied to beta-thalassemia, a well-known Mendelian disease, and to common asthma, where we identified candidate genes that underlie susceptibility to asthma. Some of these candidate genes have also been associated with common asthma in the current literature.
Conclusions
The data analysis approach, based on selecting the most closely related cases and controls together with the Random Forest model, is a powerful tool for detecting genetic variants associated with a disease in isolated populations. Moreover, this method also provides a prediction model that estimates the unknown disease status accurately and that can generally be used to build test kits for a wide class of Mendelian diseases.
doi:10.1186/1471-2156-12-63
PMCID: PMC3155486  PMID: 21767363
9.  Gene-Based Testing of Interactions in Association Studies of Quantitative Traits 
PLoS Genetics  2013;9(2):e1003321.
Various methods have been developed for identifying gene–gene interactions in genome-wide association studies (GWAS). However, most methods focus on individual markers as the testing unit, and the large number of such tests drastically erodes statistical power. In this study, we propose novel interaction tests of quantitative traits that are gene-based and that confer advantage in both statistical power and biological interpretation. The framework of gene-based gene–gene interaction (GGG) tests combines marker-based interaction tests between all pairs of markers in two genes to produce a gene-level test for interaction between the two. The tests are based on an analytical formula we derive for the correlation between marker-based interaction tests due to linkage disequilibrium. We propose four GGG tests that extend the following P value combining methods: minimum P value, extended Simes procedure, truncated tail strength, and truncated P value product. Extensive simulations point to correct type I error rates of all tests and show that the two truncated tests are more powerful than the other tests in cases of markers involved in the underlying interaction not being directly genotyped and in cases of multiple underlying interactions. We applied our tests to pairs of genes that exhibit a protein–protein interaction to test for gene-level interactions underlying lipid levels using genotype data from the Atherosclerosis Risk in Communities study. We identified five novel interactions that are not evident from marker-based interaction testing and successfully replicated one of these interactions, between SMAD3 and NEDD9, in an independent sample from the Multi-Ethnic Study of Atherosclerosis. We conclude that our GGG tests show improved power to identify gene-level interactions in existing, as well as emerging, association studies.
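To make the combining step concrete, here is a minimal sketch of three of the named statistics applied to the marker-pair P values of one gene pair: minimum P, the standard (not the paper's extended) Simes procedure, and a truncated product statistic. It deliberately omits what makes the GGG tests valid under linkage disequilibrium: the paper's null distribution, built from an analytical correlation between marker-based tests. The truncation threshold `tau=0.05` is an assumed illustrative value.

```python
import math


def min_p(pvals):
    """Minimum-P statistic: the smallest marker-pair P value."""
    return min(pvals)


def simes(pvals):
    """Standard Simes combined P value: min over k of m * p_(k) / k, capped at 1.

    Valid for independent or positively dependent tests; the paper extends it.
    """
    m = len(pvals)
    ordered = sorted(pvals)
    return min(1.0, min(m * p / (k + 1) for k, p in enumerate(ordered)))


def truncated_product(pvals, tau=0.05):
    """Truncated product statistic: log of the product of P values <= tau.

    Returns 0.0 (log of an empty product) when no P value passes tau; more
    negative values indicate stronger evidence. Its null distribution must
    still be derived or simulated, accounting for correlation between tests.
    """
    return sum((math.log(p) for p in pvals if p <= tau), 0.0)


# Hypothetical P values for the marker pairs spanning two genes.
pvals = [0.001, 0.02, 0.3, 0.8]
print(min_p(pvals), simes(pvals), truncated_product(pvals))
```

Because the truncated product pools every small P value rather than only the smallest, it can gain power when several marker pairs carry weaker signals, which is consistent with the simulation results described above.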
Author Summary
Epistasis is likely to play a significant role in complex diseases or traits and is one of the many possible explanations for “missing heritability.” However, epistatic interactions have been difficult to detect in genome-wide association studies (GWAS) due to the limited power caused by the multiple-testing correction from the large number of tests conducted. Gene-based gene–gene interaction (GGG) tests might hold the key to relaxing the multiple-testing correction burden and increasing the power for identifying epistatic interactions in GWAS. Here, we developed GGG tests of quantitative traits by extending four P value combining methods and evaluated their type I error rates and power using extensive simulations. All four GGG tests are more powerful than a principal component-based test. We also applied our GGG tests to data from the Atherosclerosis Risk in Communities study and found five gene-level interactions associated with the levels of total cholesterol and high-density lipoprotein cholesterol (HDL-C). One interaction between SMAD3 and NEDD9 on HDL-C was further replicated in an independent sample from the Multi-Ethnic Study of Atherosclerosis.
doi:10.1371/journal.pgen.1003321
PMCID: PMC3585009  PMID: 23468652
10.  Improved Statistics for Genome-Wide Interaction Analysis 
PLoS Genetics  2012;8(4):e1002625.
Recently, Wu and colleagues [1] proposed two novel statistics for genome-wide interaction analysis using case/control or case-only data. In computer simulations, their proposed case/control statistic outperformed competing approaches, including the fast-epistasis option in PLINK and logistic regression analysis under the correct model; however, reasons for its superior performance were not fully explored. Here we investigate the theoretical properties and performance of Wu et al.'s proposed statistics and explain why, in some circumstances, they outperform competing approaches. Unfortunately, we find minor errors in the formulae for their statistics, resulting in tests that have higher than nominal type 1 error. We also find minor errors in PLINK's fast-epistasis and case-only statistics, although theory and simulations suggest that these errors have only negligible effect on type 1 error. We propose adjusted versions of all four statistics that, both theoretically and in computer simulations, maintain correct type 1 error rates under the null hypothesis. We also investigate statistics based on correlation coefficients that maintain similar control of type 1 error. Although designed to test specifically for interaction, we show that some of these previously-proposed statistics can, in fact, be sensitive to main effects at one or both loci, particularly in the presence of linkage disequilibrium. We propose two new “joint effects” statistics that, provided the disease is rare, are sensitive only to genuine interaction effects. In computer simulations we find, in most situations considered, that highest power is achieved by analysis under the correct genetic model. Such an analysis is unachievable in practice, as we do not know this model. However, generally high power over a wide range of scenarios is exhibited by our joint effects and adjusted Wu statistics. 
We recommend use of these alternative or adjusted statistics and urge caution when using Wu et al.'s originally-proposed statistics, on account of the inflated error rate that can result.
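Correlation-based case/control interaction statistics compare how two loci co-vary in cases with how they co-vary in controls. A minimal generic sketch, assuming genotypes coded 0/1/2 and a Fisher z comparison of Pearson correlations; it is neither Wu et al.'s statistic nor the adjusted versions proposed here. As the abstract warns, a statistic of this kind can also respond to main effects when the loci are in linkage disequilibrium.

```python
import math


def pearson(x, y):
    """Pearson correlation; assumes neither sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_difference_z(cases, controls):
    """z statistic comparing the two-locus genotype correlation in cases vs controls.

    cases / controls: lists of (g1, g2) genotype pairs coded 0/1/2, each with
    more than 3 subjects and |r| < 1. Uses Fisher's z-transform, so under no
    difference z is approximately N(0, 1).
    """
    def fisher(pairs):
        g1, g2 = zip(*pairs)
        return math.atanh(pearson(g1, g2)), len(pairs)

    z_cases, n_cases = fisher(cases)
    z_controls, n_controls = fisher(controls)
    se = math.sqrt(1.0 / (n_cases - 3) + 1.0 / (n_controls - 3))
    return (z_cases - z_controls) / se


def two_sided_p(z):
    """Two-sided normal P value, 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Toy genotype pairs (hypothetical).
cases = [(0, 0), (1, 1), (2, 2), (0, 1), (1, 0), (2, 1), (1, 2), (0, 2)]
controls = [(0, 2), (1, 1), (2, 0), (0, 1), (1, 0), (2, 1), (1, 2), (0, 0)]
z = correlation_difference_z(cases, controls)
print(z, two_sided_p(z))
```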
Author Summary
Gene–gene interactions are a topic of great interest to geneticists carrying out studies of how genetic factors influence the development of common, complex diseases. Genes that interact may not only make important biological contributions to underlying disease processes, but also be more difficult to detect when using standard statistical methods in which we examine the effects of genetic factors one at a time. Recently a method was proposed by Wu and colleagues [1] for detecting pairwise interactions when carrying out genome-wide association studies (in which a large number of genetic variants across the genome are examined). Wu and colleagues carried out theoretical work and computer simulations that suggested their method outperformed other previously proposed approaches for detecting interactions. Here we show that, in fact, the method proposed by Wu and colleagues can result in an excess of false positive findings. We propose an adjusted version of their method that reduces the false positive rate while maintaining high power. We also propose a new method for detecting pairs of genetic effects that shows similarly high power but has some conceptual advantages over both Wu's method and other previously proposed approaches.
doi:10.1371/journal.pgen.1002625
PMCID: PMC3320596  PMID: 22496670
11.  Analysis of multiple compound–protein interactions reveals novel bioactive molecules 
The authors use machine learning of compound-protein interactions to explore drug polypharmacology and to efficiently identify bioactive ligands, including novel scaffold-hopping compounds for two pharmaceutically important protein families: G-protein coupled receptors and protein kinases.
We have demonstrated that machine learning of multiple compound–protein interactions is useful for efficient ligand screening and for assessing drug polypharmacology.
This approach successfully identified novel scaffold-hopping compounds for two pharmaceutically important protein families: G-protein-coupled receptors and protein kinases.
These bioactive compounds were not detected by existing computational ligand-screening methods in comparative studies.
The results of this study indicate that data derived from chemical genomics can be highly useful for exploring chemical space, and this systems biology perspective could accelerate drug discovery processes.
The discovery of novel bioactive molecules advances our systems-level understanding of biological processes and is crucial for innovation in drug development. Perturbations of biological systems by chemical probes provide broader applications not only for analysis of complex systems but also for intentional manipulations of these systems. Nevertheless, the lack of well-characterized chemical modulators has limited their use. Recently, chemical genomics has emerged as a promising area of research applicable to the exploration of novel bioactive molecules, and researchers are currently striving toward the identification of all possible ligands for all target protein families (Wang et al, 2009). Chemical genomics studies have shown that patterns of compound–protein interactions (CPIs) are too diverse to be understood as simple one-to-one events. There is an urgent need to develop appropriate data mining methods for characterizing and visualizing the full complexity of interactions between chemical space and biological systems. However, no existing screening approach has so far succeeded in identifying novel bioactive compounds using multiple interactions among compounds and target proteins.
High-throughput screening (HTS) and computational screening have greatly aided in the identification of early lead compounds for drug discovery. However, the large number of assays required for HTS to identify drugs that target multiple proteins renders this process very costly and time-consuming. Therefore, interest in using in silico strategies for screening has increased. The most common computational approaches, ligand-based virtual screening (LBVS) and structure-based virtual screening (SBVS; Oprea and Matter, 2004; Muegge and Oloff, 2006; McInnes, 2007; Figure 1A), have been used for practical drug development. LBVS aims to identify molecules that are very similar to known active molecules and generally has difficulty identifying compounds with novel structural scaffolds that differ from reference molecules. The other popular strategy, SBVS, is constrained by the number of three-dimensional crystallographic structures available. To circumvent these limitations, we have shown that a new computational screening strategy, chemical genomics-based virtual screening (CGBVS), has the potential to identify novel, scaffold-hopping compounds and assess their polypharmacology by using a machine-learning method to recognize conserved molecular patterns in comprehensive CPI data sets.
The CGBVS strategy used in this study was made up of five steps: CPI data collection, descriptor calculation, representation of interaction vectors, predictive model construction using training data sets, and predictions from test data (Figure 1A). Importantly, step 1, the construction of a data set of chemical structures and protein sequences for known CPIs, did not require the three-dimensional protein structures needed for SBVS. In step 2, compound structures and protein sequences were converted into numerical descriptors. These descriptors were used to construct chemical or biological spaces in which decreasing distance between vectors corresponded to increasing similarity of compound structures or protein sequences. In step 3, we represented multiple CPI patterns by concatenating these chemical and protein descriptors. Using these interaction vectors, we could quantify the similarity of molecular interactions for compound–protein pairs, despite the fact that the ligand and protein similarity maps differed substantially. In step 4, concatenated vectors for CPI pairs (positive samples) and non-interacting pairs (negative samples) were input into an established machine-learning method. In the final step, the classifier constructed using training sets was applied to test data.
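Steps 2 to 5 can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the compound descriptor is assumed to be a precomputed binary fingerprint, the protein descriptor is plain amino-acid composition, and a perceptron stands in for the unnamed "established machine-learning method". All function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def protein_descriptor(sequence):
    """Step 2 (protein side): fraction of each of the 20 standard residues."""
    seq = sequence.upper()
    n = len(seq)
    return [seq.count(aa) / n for aa in AMINO_ACIDS]


def interaction_vector(fingerprint, sequence):
    """Step 3: concatenate compound and protein descriptors into one CPI vector."""
    return list(fingerprint) + protein_descriptor(sequence)


def train_perceptron(vectors, labels, epochs=50, lr=0.1):
    """Step 4: perceptron stand-in; labels are +1 (interacting) / -1 (non-interacting)."""
    w = [0.0] * len(vectors[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified or on the boundary: update
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict_score(model, x):
    """Step 5: signed score for a test pair; larger means more likely to interact."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Toy training set (hypothetical fingerprints and sequences).
fp_a, fp_b = [1, 0, 1, 0], [0, 1, 0, 1]
p1, p2 = "ACDEFG", "KLMNPQ"
vectors = [interaction_vector(f, p) for f, p in [(fp_a, p1), (fp_a, p2), (fp_b, p1), (fp_b, p2)]]
labels = [1, 1, -1, -1]
model = train_perceptron(vectors, labels)
print([round(predict_score(model, x), 3) for x in vectors])
```

One design consequence worth noting: a purely linear model on concatenated vectors scores a pair as a compound term plus a protein term, so it cannot represent preferences that depend jointly on a specific compound and a specific protein (compound A binds only protein 1 while compound B binds only protein 2 is not linearly separable). Capturing such patterns requires a nonlinear classifier, for example a kernel method.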
To evaluate the predictive value of CGBVS, we first compared its performance with that of LBVS by fivefold cross-validation. CGBVS performed with considerably higher accuracy (91.9%) than did LBVS (84.4%; Figure 1B). We next compared CGBVS and SBVS in a retrospective virtual screening based on the human β2-adrenergic receptor (ADRB2). Figure 1C shows that CGBVS provided higher hit rates than did SBVS. These results suggest that CGBVS is more successful than conventional approaches for prediction of CPIs.
We then evaluated the ability of the CGBVS method to predict the polypharmacology of ADRB2 by attempting to identify novel ADRB2 ligands from a group of G-protein-coupled receptor (GPCR) ligands. We ranked the prediction scores for the interactions of 826 reported GPCR ligands with ADRB2 and then analyzed the 50 highest-ranked compounds in greater detail. Of 21 commercially available compounds, 11 showed ADRB2-binding activity and were not previously reported to be ADRB2 ligands. These compounds included ligands not only for aminergic receptors but also for neuropeptide Y-type 1 receptors (NPY1R), which have low protein homology to ADRB2. Most ligands we identified were not detected by LBVS and SBVS, which suggests that only CGBVS could identify this unexpected cross-reaction for a ligand developed to target a peptidergic receptor.
The true value of CGBVS in drug discovery must be tested by assessing whether this method can identify scaffold-hopping lead compounds from a set of compounds that is structurally more diverse. To assess this ability, we analyzed 11 500 commercially available compounds to predict compounds likely to bind to two GPCRs and two protein kinases. Functional assays revealed that nine ADRB2 ligands, three NPY1R ligands, five epidermal growth factor receptor (EGFR) inhibitors, and two cyclin-dependent kinase 2 (CDK2) inhibitors were concentrated in the top-ranked compounds (hit rate=30, 15, 25, and 10%, respectively). We also evaluated the extent of scaffold hopping achieved in the identification of these novel ligands. One ADRB2 ligand, two NPY1R ligands, and one CDK2 inhibitor exhibited scaffold hopping (Figure 4), indicating that CGBVS can use this characteristic to rationally predict novel lead compounds, a crucial and very difficult step in drug discovery. This feature of CGBVS is critically different from existing predictive methods, such as LBVS, which depend on similarities between test and reference ligands, and focus on a single protein or highly homologous proteins. In particular, CGBVS is useful for targets with undefined ligands because this method can use CPIs with target proteins that exhibit lower levels of homology.
In summary, we have demonstrated that data mining of multiple CPIs is of great practical value for exploration of chemical space. As a predictive model, CGBVS could provide an important step in the discovery of such multi-target drugs by identifying the group of proteins targeted by a particular ligand, leading to innovation in pharmaceutical research.
The discovery of novel bioactive molecules advances our systems-level understanding of biological processes and is crucial for innovation in drug development. For this purpose, the emerging field of chemical genomics is currently focused on accumulating large assay data sets describing compound–protein interactions (CPIs). Although new target proteins for known drugs have recently been identified through mining of CPI databases, using these resources to identify novel ligands remains unexplored. Herein, we demonstrate that machine learning of multiple CPIs can not only assess drug polypharmacology but can also efficiently identify novel bioactive scaffold-hopping compounds. Through a machine-learning technique that uses multiple CPIs, we have successfully identified novel lead compounds for two pharmaceutically important protein families, G-protein-coupled receptors and protein kinases. These novel compounds were not identified by existing computational ligand-screening methods in comparative studies. The results of this study indicate that data derived from chemical genomics can be highly useful for exploring chemical space, and this systems biology perspective could accelerate drug discovery processes.
doi:10.1038/msb.2011.5
PMCID: PMC3094066  PMID: 21364574
chemical genomics; data mining; drug discovery; ligand screening; systems chemical biology
12.  ENABLING PERSONAL GENOMICS WITH AN EXPLICIT TEST OF EPISTASIS 
One goal of personal genomics is to use information about genomic variation to predict who is at risk for various common diseases. Technological advances in genotyping have spawned several personal genetic testing services that market genotyping services directly to the consumer. An important goal of consumer genetic testing is to provide health information along with the genotyping results. This has the potential to integrate detailed personal genetic and genomic information into healthcare decision making. Despite the potential importance of these advances, there are some important limitations. One concern is that much of the literature that is used to formulate personal genetics reports is based on genetic association studies that consider each genetic variant independently of the others. It is our working hypothesis that the true value of personal genomics will only be realized when the complexity of the genotype-to-phenotype mapping relationship is embraced, rather than ignored. We focus here on complexity in genetic architecture due to epistasis or nonlinear gene-gene interaction. We have previously developed a multifactor dimensionality reduction (MDR) algorithm and software package for detecting nonlinear interactions in genetic association studies. In most prior MDR analyses, the permutation testing strategy used to assess statistical significance was unable to differentiate MDR models that captured only interaction effects from those that also detected independent main effects. Statistical interpretation of MDR models required post-hoc analysis using entropy-based measures of interaction information. We introduce here a novel permutation test that allows the effects of nonlinear interactions between multiple genetic variants to be specifically tested in a manner that is not confounded by linear additive effects. 
We show using data simulated across 35 different epistasis models with varying effect sizes (heritabilities = 0.01, 0.025, 0.05, 0.1, 0.2, 0.3, 0.4) and sample sizes (n = 400, 800, 1600) that the power to detect interactions using the explicit test of epistasis is no different from that of a standard permutation test. We also show that the test has the appropriate size, or type I error rate, of approximately 0.05. We then apply MDR with the new explicit test of epistasis to a large genetic study of bladder cancer (n = 914) and show that a previously reported nonlinear interaction between two XPD gene polymorphisms is indeed significant (P = 0.005), even after considering the strong additive effect of smoking in the model. Finally, we evaluate the power of the explicit test of epistasis to detect the nonlinear interaction between two XPD gene polymorphisms by simulating data from the MDR model of bladder cancer susceptibility. We show that the power to detect the interaction alone was 1.00, while the power to detect the independent effect of smoking alone was 0.06, which is close to the expected type I error rate of 0.05. Importantly, the power to detect the interaction with smoking in the model was 0.94. The results of this study provide for the first time a simple method for explicitly testing epistasis or gene-gene interaction effects in genetic association studies. An important advantage of the method is that it can be combined with any modeling approach. The explicit test of epistasis brings us a step closer to the type of routine gene-gene interaction analysis that is needed if we are to enable personal genomics.
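A minimal sketch of the idea, under two stated assumptions. First, the MDR statistic is simplified to the training balanced accuracy of a two-locus high/low-risk rule, with no cross-validation. Second, the explicit-test null is assumed to be built by shuffling each SNP's genotypes separately among cases and among controls: this keeps each SNP's class-specific genotype distribution (its main effect) while breaking the joint two-locus pattern. Covariates such as smoking are not modeled, and the function names are hypothetical.

```python
import random


def mdr_balanced_accuracy(g1, g2, status):
    """Simplified two-locus MDR statistic: balanced accuracy of a high/low-risk rule.

    A genotype cell is high-risk when its case:control ratio exceeds the
    overall ratio. status is 1 for cases, 0 for controls (both classes present).
    """
    n_cases = sum(status)
    n_controls = len(status) - n_cases
    threshold = n_cases / n_controls
    counts = {}
    for a, b, s in zip(g1, g2, status):
        cell = counts.setdefault((a, b), [0, 0])  # [controls, cases]
        cell[s] += 1

    def high_risk(cell):
        controls, cases = counts[cell]
        return cases > threshold * controls

    tp = sum(1 for a, b, s in zip(g1, g2, status) if s == 1 and high_risk((a, b)))
    tn = sum(1 for a, b, s in zip(g1, g2, status) if s == 0 and not high_risk((a, b)))
    return 0.5 * (tp / n_cases + tn / n_controls)


def permute_within_class(genotypes, status, rng):
    """Shuffle one SNP's genotypes separately among cases and among controls."""
    out = list(genotypes)
    for cls in (0, 1):
        idx = [i for i, s in enumerate(status) if s == cls]
        values = [genotypes[i] for i in idx]
        rng.shuffle(values)
        for i, v in zip(idx, values):
            out[i] = v
    return out


def explicit_epistasis_test(g1, g2, status, n_perm=1000, seed=0):
    """Observed statistic and permutation P value against the within-class null."""
    rng = random.Random(seed)
    observed = mdr_balanced_accuracy(g1, g2, status)
    exceed = 0
    for _ in range(n_perm):
        p1 = permute_within_class(g1, status, rng)
        p2 = permute_within_class(g2, status, rng)
        if mdr_balanced_accuracy(p1, p2, status) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Toy pure-interaction data: neither SNP alone differs between cases and controls.
g1 = [0, 0, 1, 1] * 10
g2 = [0, 1, 0, 1] * 10
status = [1 if a == b else 0 for a, b in zip(g1, g2)]
print(explicit_epistasis_test(g1, g2, status, n_perm=200))
```

A standard permutation test would shuffle case/control labels, which destroys main effects and interactions alike; the within-class shuffle only destroys the interaction, which is what makes a significant result attributable to epistasis.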
PMCID: PMC2916690  PMID: 19908385
13.  Network modeling of the transcriptional effects of copy number aberrations in glioblastoma 
DNA copy number aberrations (CNAs) are a characteristic feature of cancer genomes. In this work, Rebecka Jörnsten, Sven Nelander and colleagues combine network modeling and experimental methods to analyze the systems-level effects of CNAs in glioblastoma.
We introduce a modeling approach termed EPoC (Endogenous Perturbation analysis of Cancer), enabling the construction of global, gene-level models that causally connect gene copy number with expression in glioblastoma.
On the basis of the resulting model, we predict genes that are likely to be disease-driving and validate selected predictions experimentally. We also demonstrate that further analysis of the network model by sparse singular value decomposition allows stratification of patients with glioblastoma into short-term and long-term survivors, introducing decomposed network models as a useful principle for biomarker discovery.
Finally, in systematic comparisons, we demonstrate that EPoC is computationally efficient and yields more consistent results than mRNA-only methods, standard eQTL methods, and two recent multivariate methods for genotype–mRNA coupling.
Gains and losses of chromosomal material (DNA copy number aberrations; CNAs) are a characteristic feature of cancer genomes. At the level of a single locus, it is well known that increased copy number (gene amplification) typically leads to increased gene expression, whereas decreased copy number (gene deletion) leads to decreased gene expression (Pollack et al, 2002; Lee et al, 2008; Nilsson et al, 2008). However, CNAs also affect the expression of genes located outside the amplified/deleted region itself via indirect mechanisms. To fully understand the action of CNAs, it is therefore necessary to analyze their action in a network context. Toward this goal, improved computational approaches will be important, if not essential.
To determine the global effects on transcription of CNAs in the brain tumor glioblastoma, we develop EPoC (Endogenous Perturbation analysis of Cancer), a computational technique capable of inferring sparse, causal network models by combining genome-wide, paired CNA- and mRNA-level data. EPoC aims to detect disease-driving copy number aberrations and their effect on target mRNA expression, and stratify patients into long-term and short-term survivors. Technically, EPoC relates CNA perturbations to mRNA responses by matrix equations, derived from a steady-state approximation of the transcriptional network. Patient prognostic scores are obtained from singular value decompositions of the network matrix. The models are constructed by solving a large-scale, regularized regression problem.
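The "large-scale, regularized regression problem" is what makes the inferred network sparse. EPoC's exact matrix formulation and solver are not given in this summary, so the sketch below only shows the generic building block, an L1-regularized (lasso) regression of one target gene's mRNA level on copy-number predictors, solved by cyclic coordinate descent. Data and names are hypothetical; the SVD-based prognostic scores are not shown.

```python
def soft_threshold(z, gamma):
    """Proximal operator of the L1 penalty: shrink z toward zero by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X: n samples x p predictors (e.g. centered copy-number values of p genes).
    y: n responses (e.g. centered mRNA level of one target gene).
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    residual = list(y)  # equals y - Xb while b is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of predictor j with the partial residual (its own term added back).
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                b[j] = new_bj
    return b


# Toy data: the target responds to gene 0's copy number only.
x0 = [-2, -1, 0, 1, 2, -2, -1, 0, 1, 2]
x1 = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]
X = [[a, c] for a, c in zip(x0, x1)]
y = [3 * a for a in x0]
print(lasso_coordinate_descent(X, y, lam=0.1))
```

The L1 penalty sets the coefficient of the uninformative predictor exactly to zero and slightly shrinks the true one (here from 3 toward 2.95); repeated over many target genes, this is how a sparse network of CNA-to-mRNA edges emerges.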
We apply EPoC to glioblastoma data from The Cancer Genome Atlas (TCGA) consortium (186 patients). The identified CNA-driven network comprises 10 672 genes and contains a number of copy number-altered genes that control multiple downstream genes. Highly connected hub genes include well-known oncogenes and tumor suppressor genes that are frequently deleted or amplified in glioblastoma, including EGFR, PDGFRA, CDKN2A and CDKN2B, confirming a clear association between these aberrations and transcriptional variability of these brain tumors. In addition, we identify a number of hub genes that have previously not been associated with glioblastoma, including interferon alpha 1 (IFNA1), myeloid/lymphoid or mixed-lineage leukemia translocated to 10 (MLLT10, a well-known leukemia gene), glutamate decarboxylase 2 (GAD2), a postulated glutamate receptor (GPR158) and Necdin (NDN). Furthermore, we demonstrate that the network model contains useful information on downstream target genes (including stem cell regulators) and possible drug targets.
We proceed to explore the validity of a small network region experimentally. Introducing experimental perturbations of NDN and other targets in four glioblastoma cell lines (T98G, U-87MG, U-343MG and U-373MG), we confirm several predicted mechanisms. We also demonstrate that the TCGA glioblastoma patients can be stratified into long-term and short-term survivors, using our proposed prognostic scores derived from a singular value decomposition of the network model. Finally, we compare EPoC to existing methods for mRNA network analysis and expression quantitative trait locus (eQTL) methods, and demonstrate that EPoC produces more consistent models between technically independent glioblastoma data sets, and that the EPoC models exhibit better overlap with known protein–protein interaction networks and pathway maps.
In summary, we conclude that large-scale integrative modeling reveals mechanistically and prognostically informative networks in human glioblastoma. Our approach operates at the gene level and our data support that individual hub genes can be identified in practice. Very large aberrations, however, cannot be fully resolved by the current modeling strategy.
DNA copy number aberrations (CNAs) are a hallmark of cancer genomes. However, little is known about how such changes affect global gene expression. We develop a modeling framework, EPoC (Endogenous Perturbation analysis of Cancer), to (1) detect disease-driving CNAs and their effect on target mRNA expression, and to (2) stratify cancer patients into long- and short-term survivors. Our method constructs causal network models of gene expression by combining genome-wide DNA- and RNA-level data. Prognostic scores are obtained from a singular value decomposition of the networks. By applying EPoC to glioblastoma data from The Cancer Genome Atlas consortium, we demonstrate that the resulting network models contain known disease-relevant hub genes, reveal interesting candidate hubs, and uncover predictors of patient survival. Targeted validations in four glioblastoma cell lines support selected predictions, and implicate the p53-interacting protein Necdin in suppressing glioblastoma cell growth. We conclude that large-scale network modeling of the effects of CNAs on gene expression may provide insights into the biology of human cancer. Free software in MATLAB and R is provided.
doi:10.1038/msb.2011.17
PMCID: PMC3101951  PMID: 21525872
cancer biology; cancer genomics; glioblastoma
14.  Interactions between Non-Physician Clinicians and Industry: A Systematic Review 
PLoS Medicine  2013;10(11):e1001561.
In a systematic review of studies of interactions between non-physician clinicians and industry, Quinn Grundy and colleagues found that many of the issues identified for physicians' industry interactions exist for non-physician clinicians.
Please see later in the article for the Editors' Summary
Background
With increasing restrictions placed on physician–industry interactions, industry marketing may target other health professionals. Recent health policy developments confer even greater importance on the decision making of non-physician clinicians. The purpose of this systematic review is to examine the types and implications of non-physician clinician–industry interactions in clinical practice.
Methods and Findings
We searched MEDLINE and Web of Science from January 1, 1946, through June 24, 2013, according to PRISMA guidelines. Non-physician clinicians eligible for inclusion were: Registered Nurses, nurse prescribers, Physician Assistants, pharmacists, dieticians, and physical or occupational therapists; trainee samples were excluded. Fifteen studies met inclusion criteria. Data were synthesized qualitatively into eight outcome domains: nature and frequency of industry interactions; attitudes toward industry; perceived ethical acceptability of interactions; perceived marketing influence; perceived reliability of industry information; preparation for industry interactions; reactions to industry relations policy; and management of industry interactions. Non-physician clinicians reported interacting with the pharmaceutical and infant formula industries. Clinicians across disciplines met with pharmaceutical representatives regularly and relied on them for practice information. Clinicians frequently received industry “information,” attended sponsored “education,” and acted as distributors for similar materials targeted at patients. Clinicians generally regarded this as an ethical use of industry resources, and felt they could detect “promotion” while benefiting from industry “information.” Free samples were among the most approved and common ways that clinicians interacted with industry. Included studies were observational and of varying methodological rigor; thus, these findings may not be generalizable. This review is, however, the first to our knowledge to provide a descriptive analysis of this literature.
Conclusions
Non-physician clinicians' generally positive attitudes toward industry interactions, despite their recognition of issues related to bias, suggest that industry interactions are normalized in clinical practice across non-physician disciplines. Industry relations policy should address all disciplines and be implemented consistently in order to mitigate conflicts of interest and address such interactions' potential to affect patient care.
Editors' Summary
Background
Making and selling health care goods (including drugs and devices) and services is big business. To maximize the profits they make for their shareholders, companies involved in health care build relationships with physicians by providing information on new drugs, organizing educational meetings, providing samples of their products, giving gifts, and holding sponsored events. These relationships help to keep physicians informed about new developments in health care but also create the potential for causing harm to patients and health care systems. These relationships may, for example, result in increased prescription rates of new, heavily marketed medications, which are often more expensive than their generic counterparts (similar unbranded drugs) and that are more likely to be recalled for safety reasons than long-established drugs. They may also affect the provision of health care services. Industry is providing an increasingly large proportion of routine health care services in many countries, so relationships built up with physicians have the potential to influence the commissioning of the services that are central to the treatment and well-being of patients.
Why Was This Study Done?
As a result of concerns about the tension between industry's need to make profits and the ethics underlying professional practice, restrictions are increasingly being placed on physician–industry interactions. In the US, for example, the Physician Payments Sunshine Act now requires US manufacturers of drugs, devices, and medical supplies that participate in federal health care programs to disclose all payments and gifts made to physicians and teaching hospitals. However, other health professionals, including those with authority to prescribe drugs, such as pharmacists, Physician Assistants, and nurse practitioners, are not covered by this legislation or by similar legislation in other settings, even though the restructuring of health care to prioritize primary care and multidisciplinary care models means that “non-physician clinicians” are becoming more numerous and more involved in decision-making and medication management. In this systematic review (a study that uses predefined criteria to identify all the research on a given topic), the researchers examine the nature and implications of the interactions between non-physician clinicians and industry.
What Did the Researchers Do and Find?
The researchers identified 15 published studies that examined interactions between non-physician clinicians (Registered Nurses, nurse prescribers, midwives, pharmacists, Physician Assistants, and dieticians) and industry (corporations that produce health care goods and services). They extracted the data from 16 publications (representing 15 different studies) and synthesized them qualitatively (combined the data and reached word-based, rather than numerical, conclusions) into eight outcome domains, including the nature and frequency of interactions, non-physician clinicians' attitudes toward industry, and the perceived ethical acceptability of interactions. In the research the authors identified, non-physician clinicians reported frequent interactions with the pharmaceutical and infant formula industries. Most non-physician clinicians met industry representatives regularly, received gifts and samples, and attended educational events or received educational materials (some of which they distributed to patients). In these studies, non-physician clinicians generally regarded these interactions positively and felt they were an ethical and appropriate use of industry resources. Only a minority of non-physician clinicians felt that marketing influenced their own practice, although a larger percentage felt that their colleagues would be influenced. A sizeable proportion of non-physician clinicians questioned the reliability of industry information, but most were confident that they could detect biased information and therefore rated this information as reliable, valuable, or useful.
What Do These Findings Mean?
These and other findings suggest that non-physician clinicians generally have positive attitudes toward industry interactions but recognize issues related to bias and conflict of interest. Because these findings are based on a small number of studies, most of which were undertaken in the US, they may not be generalizable to other countries. Moreover, they provide no quantitative assessment of the interaction between non-physician clinicians and industry and no information about whether industry interactions affect patient care outcomes. Nevertheless, these findings suggest that industry interactions are normalized (seen as standard) in clinical practice across non-physician disciplines. This normalization creates the potential for serious risks to patients and health care systems. The researchers suggest that it may be unrealistic to expect that non-physician clinicians can be taught individually how to interact with industry ethically or how to detect and avert bias, particularly given the ubiquitous nature of marketing and promotional materials. Instead, they suggest, the environment in which non-physician clinicians practice should be structured to mitigate the potentially harmful effects of interactions with industry.
Additional Information
Please access these websites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001561.
This study is further discussed in a PLOS Medicine Perspective by James S. Yeh and Aaron S. Kesselheim
The American Medical Association provides guidance for physicians on interactions with pharmaceutical industry representatives, information about the Physician Payments Sunshine Act, and a toolkit for preparing Physician Payments Sunshine Act reports
The International Council of Nurses provides some guidance on industry interactions in its position statement on nurse-industry relations
The UK General Medical Council provides guidance on financial and commercial arrangements and conflicts of interest as part of its good medical practice website, which describes what is required of all registered doctors in the UK
Understanding and Responding to Pharmaceutical Promotion: A Practical Guide is a manual prepared by Health Action International and the World Health Organization that schools of medicine and pharmacy can use to train students how to recognize and respond to pharmaceutical promotion.
The Institute of Medicine's Report on Conflict of Interest in Medical Research, Education, and Practice recommends steps to identify, limit, and manage conflicts of interest
The University of California, San Francisco, Office of Continuing Medical Education offers a course called Marketing of Medicines
doi:10.1371/journal.pmed.1001561
PMCID: PMC3841103  PMID: 24302892
15.  Systematic Detection of Epistatic Interactions Based on Allele Pair Frequencies 
PLoS Genetics  2012;8(2):e1002463.
Epistatic genetic interactions are key for understanding the genetic contribution to complex traits. Epistasis is always defined with respect to some trait such as growth rate or fitness. Whereas most existing epistasis screens explicitly test for a trait, it is also possible to implicitly test for fitness traits by searching for the over- or under-representation of allele pairs in a given population. Such analysis of imbalanced allele pair frequencies of distant loci has not been exploited yet on a genome-wide scale, mostly due to statistical difficulties such as the multiple testing problem. We propose a new approach called Imbalanced Allele Pair frequencies (ImAP) for inferring epistatic interactions that is exclusively based on DNA sequence information. Our approach is based on genome-wide SNP data sampled from a population with known family structure. We make use of genotype information of parent-child trios and inspect 3×3 contingency tables for detecting pairs of alleles from different genomic positions that are over- or under-represented in the population. We also developed a simulation setup which mimics the pedigree structure by simultaneously assuming independence of the markers. When applied to mouse SNP data, our method detected 168 imbalanced allele pairs, which is substantially more than in simulations assuming no interactions. We could validate a significant number of the interactions with external data, and we found that interacting loci are enriched for genes involved in developmental processes.
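The core idea of testing a 3×3 genotype contingency table for over- or under-represented allele pairs can be illustrated with a plain Pearson chi-square test of independence. This is a minimal sketch, not ImAP itself. ImAP works on parent-child trios and calibrates its null with pedigree-aware simulations, whereas this hypothetical `chi_square_3x3` helper assumes unrelated individuals and non-zero row and column totals (4 degrees of freedom).

```python
import math


def chi_square_3x3(table):
    """Pearson chi-square test of independence for a 3x3 two-locus genotype table.

    table[i][j] counts individuals with genotype i (0, 1 or 2 minor-allele
    copies) at locus A and genotype j at locus B. Returns (statistic, p_value).
    Assumes every row and column total is non-zero, giving 4 degrees of freedom.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(3)) for j in range(3)]
    stat = 0.0
    for i in range(3):
        for j in range(3):
            # Expected count if the two loci were independent
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # Closed-form chi-square survival function for 4 degrees of freedom:
    # P(X > x) = exp(-x / 2) * (1 + x / 2)
    p_value = math.exp(-stat / 2) * (1 + stat / 2)
    return stat, p_value
```

A table whose cells are exactly the product of its margins yields a statistic of 0 and a p-value of 1. In trio data, this naive test would ignore relatedness, which is why the authors build a simulation that mimics the pedigree structure instead.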
Author Summary
Elucidating non-additive (epistatic) interactions between genes is crucial for understanding the molecular mechanisms of complex diseases. Even though high-throughput, systematic testing of genetic interactions is possible in simple model organisms, such screens have so far not been successful in mammals. Here, we propose a computational screening method that only requires genotype information of family trios for predicting genetic interactions. We tested our framework on a set of more than 2,000 heterozygous mice and found 168 imbalanced allele pairs, which is substantially more than expected by chance. We confirmed many of these interactions using data from recombinant inbred lines. The number of significant allele pair imbalances that we detected is surprisingly large and was not expected based on the published evidence. Our framework sets the stage for similar work in human trios.
doi:10.1371/journal.pgen.1002463
PMCID: PMC3276547  PMID: 22346757
16.  A Novel Statistic for Genome-Wide Interaction Analysis 
PLoS Genetics  2010;6(9):e1001131.
Although great progress in genome-wide association studies (GWAS) has been made, the significant SNP associations identified by GWAS account for only a few percent of the genetic variance, leading many to question where and how we can find the missing heritability. There is increasing interest in genome-wide interaction analysis as a possible way to find heritability unexplained by current GWAS. However, the existing statistics for testing interaction have low power for genome-wide interaction analysis. To meet the challenges raised by genome-wide interaction analysis, we have developed a novel statistic for testing interaction between two loci (either linked or unlinked). The null distribution and the type I error rates of the new statistic for testing interaction are validated using simulations. Extensive power studies show that the developed statistic has much higher power to detect interaction than classical logistic regression. The results identified 44 and 211 pairs of SNPs showing significant evidence of interactions with FDR<0.001 and 0.001
Author Summary
It is expected that genome-wide interaction analysis can be a possible way to find heritability unexplained by current GWAS. However, the existing statistics for testing interaction have low power for genome-wide interaction analysis. To meet the challenges raised by genome-wide interaction analysis, we develop a novel statistic for testing interaction between two loci (either linked or unlinked) and validate the null distribution and the type I error rates of the new statistic through simulations. By extensive power studies, we show that the developed novel statistic has much higher power to detect interaction than classical logistic regression. To provide evidence of gene–gene interactions as a possible source of the missing heritability unexplained by current GWAS, we performed a genome-wide interaction analysis of psoriasis in two independent studies. The preliminary results identified 44 and 211 pairs of SNPs showing significant evidence of interactions with FDR<0.001 and 0.001
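The results above are reported at false discovery rate (FDR) thresholds. The abstract does not say which FDR procedure was used, so the following is only an illustrative sketch of the widely used Benjamini–Hochberg step-up procedure, not the authors' method. The function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values, q):
    """Return the sorted indices of hypotheses rejected at FDR level q.

    Benjamini-Hochberg step-up: sort p-values ascending, find the largest
    rank k with p_(k) <= k * q / m, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    return sorted(order[:k_max])
```

In a genome-wide interaction scan, `p_values` would hold one p-value per tested SNP pair, and lowering `q` (for example from 0.01 to 0.001) shrinks the set of reported pairs.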
doi:10.1371/journal.pgen.1001131
PMCID: PMC2944798  PMID: 20885795
We present a novel Bayesian learning method that reconstructs large, detailed gene networks from quantitative genetic interaction (GI) data.
The method uses global reasoning to handle missing and ambiguous measurements, and provides confidence estimates for each prediction.
Applied to a recent data set over genes relevant to protein folding, the learned networks reflect known biological pathways, including details such as pathway ordering and directionality of relationships.
The reconstructed networks also suggest novel relationships, including the placement of SGT2 in the tail-anchored biogenesis pathway, a finding that we experimentally validated.
Recent developments have enabled large-scale quantitative measurement of genetic interactions (GIs) that report on the extent to which the activity of one gene is dependent on a second. It has long been recognized (Avery and Wasserman, 1992; Hartman et al, 2001; Segre et al, 2004; Tong et al, 2004; Drees et al, 2005; Schuldiner et al, 2005; St Onge et al, 2007; Costanzo et al, 2010) that functional dependencies revealed by GI data can provide rich information regarding underlying biological pathways. Further, the precise phenotypic measurements provided by quantitative GI data can provide evidence for even more detailed aspects of pathway structure, such as differentiating between full and partial dependence between two genes (Drees et al, 2005; Schuldiner et al, 2005; St Onge et al, 2007; Jonikas et al, 2009) (Figure 1A). As GI data sets become available for a range of quantitative phenotypes and organisms, such patterns will allow researchers to elucidate pathways important to a diverse set of biological processes.
We present a new method that exploits the high-quality, quantitative nature of recent GI assays to automatically reconstruct detailed multi-gene pathway structures, including the organization of a large set of genes into coherent pathways, the connectivity and ordering within each pathway, and the directionality of each relationship. We introduce activity pathway networks (APNs), which represent functional dependencies among a set of genes in the form of a network. We present an automatic method to efficiently reconstruct APNs over large sets of genes based on quantitative GI measurements. This method handles uncertainty in the data arising from noise, missing measurements, and data points with ambiguous interpretations, by performing global reasoning that combines evidence from multiple data points. In addition, because some structure choices remain uncertain even when jointly considering all measurements, our method maintains multiple likely networks, and allows computation of confidence estimates over each structure choice.
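To make "functional dependencies among a set of genes" concrete, the following hypothetical sketch shows how a dependency network predicts double-mutant phenotypes. It assumes a toy model that is not the authors' APN likelihood: deleting a gene inactivates it and every gene that transitively depends on it, and the phenotype is the sum of fixed per-gene effects of the inactive genes. Under these assumptions, if B depends on A, the double mutant ab looks like a alone (full dependence), while independent genes combine additively.

```python
def inactive_set(knockouts, dependents):
    """Genes inactivated by deleting `knockouts`.

    dependents[g] lists genes whose activity requires gene g (hypothetical
    full-dependence edges); inactivation propagates along these edges.
    """
    inactive = set()
    stack = list(knockouts)
    while stack:
        gene = stack.pop()
        if gene not in inactive:
            inactive.add(gene)
            stack.extend(dependents.get(gene, []))
    return inactive


def predicted_phenotype(knockouts, dependents, effect):
    """Toy additive phenotype: sum of the effects of all inactivated genes."""
    return sum(effect[g] for g in inactive_set(knockouts, dependents))


# Hypothetical network: B depends on A; C is independent of both.
dependents = {"A": ["B"]}
effect = {"A": 1.0, "B": 2.0, "C": 0.5}
```

Here `predicted_phenotype(["A", "B"], dependents, effect)` equals the single mutant `["A"]` (3.0), which reveals the ordering A upstream of B, whereas `["A", "C"]` gives 3.5, the sum of the two single mutants. The paper's method additionally handles noise, partial dependence, and uncertainty over many candidate networks.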
We applied our APN reconstruction method to the recent high-quality GI data set of Jonikas et al (2009), which examined the functional interactions between genes that contribute to protein folding in the ER. Specifically, Jonikas et al used the cell's endogenous sensor, the unfolded protein response, first to identify several hundred yeast genes with functions in endoplasmic reticulum folding and then to systematically characterize their functional interdependencies by measuring unfolded protein response levels in double mutants. Our analysis produced an ensemble of 500 likelihood-weighted APNs over 178 genes (Figure 2).
We performed an aggregate evaluation of our results by comparing to known biological relationships between gene pairs, including participation in pathways according to the Kyoto Encyclopedia of Genes and Genomes (KEGG), correlation of chemical genomic profiles in a recent high-throughput assay (Hillenmeyer et al, 2008) and similarity of Gene Ontology (GO) annotations. In each evaluation performed, our reconstructed APNs were significantly more consistent with the known relationships than either the raw GI values or the Pearson correlation between profiles of GI values.
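One of the baselines above is the Pearson correlation between profiles of GI values. The following is a minimal sketch, assuming each gene's profile is a list of GI scores against the same ordered set of partner genes, with `None` marking missing measurements that are skipped pairwise. The function name is hypothetical.

```python
import math


def pearson_pairwise(x, y):
    """Pearson correlation of two GI profiles, skipping positions where either value is None."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two jointly observed values")
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    ss_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    ss_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("profile has zero variance")
    return cov / math.sqrt(ss_x * ss_y)
```

A correlation like this captures similarity between profiles but says nothing about direction or ordering, which is the extra structure the APN reconstruction aims to recover.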
Importantly, our approach provides not only an improved means for defining pairs or groups of related genes, but also enables the identification of detailed multi-gene network structures. In many cases, our method successfully reconstructed known cellular pathways, including the ER-associated degradation (ERAD) pathway, and the biosynthesis of N-linked glycans, ranking them among the highest confidence structures. In-depth examination of the learned network structures indicates agreement with many known details of these pathways. In addition, quantitative analysis indicates that our learned APNs are indicative of ordering within KEGG-annotated biological pathways.
Our results also suggest several novel relationships, including placement of uncharacterized genes into pathways, and novel relationships between characterized genes. These include the dependence of the J domain chaperone JEM1 on the PDI homolog MPD1, dependence of the ubiquitin-recycling enzyme DOA4 on N-linked glycosylation, and the dependence of the E3 ubiquitin ligase DOA10 on the signal peptidase complex subunit SPC2. Our APNs also place the poorly characterized TPR-containing protein SGT2 upstream of the tail-anchored protein biogenesis machinery components GET3, GET4, and MDY2 (also known as GET5), suggesting that SGT2 has a function in the insertion of tail-anchored proteins into membranes. Consistent with this prediction, our experimental analysis shows that in sgt2Δ cells the tail-anchored protein GFP-Sed5 shifts from punctate Golgi structures to a more diffuse pattern, as seen for other genes involved in this pathway.
Our results show that multi-gene, detailed pathway networks can be reconstructed from quantitative GI data, providing a concrete computational manifestation to intuitions that have traditionally accompanied the manual interpretation of such data. Ongoing technological developments in both genetics and imaging are enabling the measurement of GI data at a genome-wide scale, using high-accuracy quantitative phenotypes that relate to a range of particular biological functions. Methods based on RNAi will soon allow collection of similar data for human cell lines and other mammalian systems (Moffat et al, 2006). Thus, computational methods for analyzing GI data could have an important function in mapping pathways involved in complex biological systems including human cells.
High-throughput quantitative genetic interaction (GI) measurements provide detailed information regarding the structure of the underlying biological pathways by reporting on functional dependencies between genes. However, the analytical tools for fully exploiting such information lag behind the ability to collect these data. We present a novel Bayesian learning method that uses quantitative phenotypes of double knockout organisms to automatically reconstruct detailed pathway structures. We applied our method to a recent data set that measures GIs for endoplasmic reticulum (ER) genes, using the unfolded protein response as a quantitative phenotype. The results provided reconstructions of known functional pathways, including N-linked glycosylation and ER-associated protein degradation. They also revealed novel relationships, such as the placement of SGT2 in the tail-anchored biogenesis pathway, a finding that we experimentally validated. Our approach should be readily applicable to the next generation of quantitative GI data sets, as assays become available for additional phenotypes and eventually higher-level organisms.
doi:10.1038/msb.2010.27
PMCID: PMC2913392  PMID: 20531408
computational biology; genetic interaction; pathway reconstruction; probabilistic methods
BMC Bioinformatics  2009;10:424.
Background
Genomic alterations frequently occur in many cancer patients and play important mechanistic roles in the pathogenesis of cancer. Furthermore, they can modify the expression level of genes through altered copy number in the corresponding region of the chromosome. An accumulating body of evidence supports the possibility that a strong genome-wide correlation exists between DNA content and gene expression. Therefore, more comprehensive analysis is needed to quantify the relationship between genomic alteration and gene expression. A well-designed bioinformatics tool is essential to perform this kind of integrative analysis. A few programs have already been introduced for integrative analysis. However, the published software is limited in its ability to perform comprehensive integrative analysis because of constraints in its implemented algorithms and visualization modules.
Results
To address this issue, we have implemented the Java-based program CHESS to allow integrative analysis of two experimental data sets: genomic alteration and genome-wide expression profile. CHESS is composed of a genomic alteration analysis module and an integrative analysis module. The genomic alteration analysis module detects genomic alteration by applying a threshold-based method or the SW-ARRAY algorithm and investigates whether the detected alteration is phenotype specific or not. The integrative analysis module, on the other hand, measures the genomic alteration's influence on gene expression. It is divided into two separate parts. The first part calculates the overall correlation between comparative genomic hybridization ratio and gene expression level by applying the following three statistical methods: simple linear regression, Spearman rank correlation, and Pearson's correlation. In the second part, CHESS detects the genes that are differentially expressed according to the genomic alteration pattern with three alternative statistical approaches: Student's t-test, Fisher's exact test, and the chi-square test. By running the two modules in succession, users can clarify how gene expression levels are affected by phenotype-specific genomic alterations. Because CHESS was developed for both Java application and web environments, it can be run in a web browser or on a local machine. It also supports all experimental platforms if a properly formatted text file is provided that includes the chromosomal position of probes and their gene identifiers.
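The two-module workflow can be sketched for a single gene: call each sample's alteration with a threshold on the CGH log2 ratio, then compare expression between altered and unaltered samples with Student's t-test. This is a hypothetical illustration, not CHESS code. The ±0.3 cutoffs and the example values are assumptions, and only the t statistic is computed (no p-value).

```python
import math
from statistics import mean, variance


def call_alteration(log2_ratio, gain_cutoff=0.3, loss_cutoff=-0.3):
    """Threshold-based copy-number call for one probe (cutoffs are hypothetical)."""
    if log2_ratio >= gain_cutoff:
        return "gain"
    if log2_ratio <= loss_cutoff:
        return "loss"
    return "neutral"


def student_t(altered, unaltered):
    """Two-sample Student's t statistic with pooled variance."""
    n1, n2 = len(altered), len(unaltered)
    pooled = ((n1 - 1) * variance(altered) + (n2 - 1) * variance(unaltered)) / (n1 + n2 - 2)
    return (mean(altered) - mean(unaltered)) / math.sqrt(pooled * (1 / n1 + 1 / n2))


# Hypothetical CGH log2 ratios and expression values for one gene across six samples
cgh = [0.5, 0.4, 0.0, -0.1, 0.6, 0.1]
expr = [8.0, 7.5, 5.0, 5.5, 8.5, 5.2]
calls = [call_alteration(r) for r in cgh]
gained = [e for e, c in zip(expr, calls) if c == "gain"]
other = [e for e, c in zip(expr, calls) if c != "gain"]
t_stat = student_t(gained, other)
```

A large positive `t_stat` suggests that samples with a gain over this gene tend to express it more highly, which is the kind of copy-number-driven expression change the integrative module is meant to flag.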
Conclusions
CHESS is a user-friendly tool for investigating disease specific genomic alterations and quantitative relationships between those genomic alterations and genome-wide gene expression profiling.
doi:10.1186/1471-2105-10-424
PMCID: PMC2801522  PMID: 20003544
PLoS Genetics  2009;5(3):e1000432.
Evidence from human genetic studies of several disorders suggests that interactions between alleles at multiple genes play an important role in influencing phenotypic expression. Analytical methods for identifying Mendelian disease genes are not appropriate when applied to common multigenic diseases, because such methods investigate association with the phenotype only one genetic locus at a time. New strategies are needed that can capture the spectrum of genetic effects, from Mendelian to multifactorial epistasis. Random Forests (RF) and Relief-F are two powerful machine-learning methods that have been studied as filters for genetic case-control data due to their ability to account for the context of alleles at multiple genes when scoring the relevance of individual genetic variants to the phenotype. However, when variants interact strongly, the independence assumption of RF in the tree node-splitting criterion leads to diminished importance scores for relevant variants. Relief-F, on the other hand, was designed to detect strong interactions but is sensitive to large backgrounds of variants that are irrelevant to classification of the phenotype, which is an acute problem in genome-wide association studies. To overcome the weaknesses of these data mining approaches, we develop Evaporative Cooling (EC) feature selection, a flexible machine learning method that can integrate multiple importance scores while removing irrelevant genetic variants. To characterize detailed interactions, we construct a genetic-association interaction network (GAIN), whose edges quantify the synergy between variants with respect to the phenotype. We use simulation analysis to show that EC is able to identify a wide range of interaction effects in genetic association data. We apply the EC filter to a smallpox vaccine cohort study of single nucleotide polymorphisms (SNPs) and infer a GAIN for a collection of SNPs associated with adverse events. 
Our results suggest an important role for hubs in SNP disease susceptibility networks. The software is available at http://sites.google.com/site/McKinneyLab/software.
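Relief-family methods score a variant by how often it differs between a sample and its nearest neighbor of the opposite class (nearest miss) versus its nearest neighbor of the same class (nearest hit). The following sketch implements the basic Relief weighting for discrete genotypes. It is an assumption-laden simplification: Relief-F averages over k neighbors, and Evaporative Cooling additionally combines Relief-F with Random Forests scores, neither of which is shown here.

```python
def relief_weights(X, y):
    """Basic Relief feature weights for discrete genotype data.

    X: list of samples, each a list of genotype codes (0, 1, 2).
    y: list of class labels (e.g. 1 = case, 0 = control).
    Each sample is used once, with one nearest hit and one nearest miss under
    Hamming distance; ties go to the lowest sample index.
    Assumes each class has at least two samples.
    """
    n, n_features = len(X), len(X[0])

    def distance(a, b):
        return sum(ai != bi for ai, bi in zip(a, b))

    weights = [0.0] * n_features
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: distance(X[i], X[j]))
        miss = min(misses, key=lambda j: distance(X[i], X[j]))
        for f in range(n_features):
            # Reward features that differ across classes, penalize those that differ within a class
            weights[f] += (X[i][f] != X[miss][f]) - (X[i][f] != X[hit][f])
    return [w / n for w in weights]
```

With `X = [[0, 0], [0, 1], [1, 0], [1, 1]]` and `y = [0, 0, 1, 1]`, the first feature perfectly separates the classes and receives weight 1.0, while the second (pure noise) receives -1.0. Because neighbors are found using all features at once, Relief can also credit variants that matter only in combination, which is why it is used as an interaction-aware filter.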
Author Summary
Susceptibility to many diseases and disorders is caused by breakdown at multiple points in the genetic network. Each of these points of breakdown by itself may have a very modest effect on disease risk but the points may have a much stronger effect through statistical interactions with each other. Genome-wide association studies provide the opportunity to identify alleles at multiple loci that interact to influence phenotypic variation in common diseases and disorders. However, if each SNP is tested for association as though it were independent of the rest of the genome, then the full advantage of the variation from markers across the genome will be unfulfilled. In this study, we illustrate the utility of a new approach to high-dimensional genetic association analysis that treats the collection of SNPs as interacting on a system level. This approach uses a machine-learning filter followed by an information theoretic and graph theoretic approach to infer a phenotype-specific network of interacting SNPs.
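The information-theoretic side of the approach measures synergy between two variants with respect to the phenotype. A standard measure of this kind is the interaction information I(A;B;Y) = I(A,B;Y) − I(A;Y) − I(B;Y), which is positive when two variants are jointly more informative than the sum of their separate contributions. The sketch below computes it from discrete data. Treating this exact quantity as the GAIN edge weight is an assumption here, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(*columns):
    """Joint Shannon entropy (bits) of one or more aligned discrete columns."""
    n = len(columns[0])
    counts = Counter(zip(*columns))
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def mutual_info(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def synergy(a, b, y):
    """Interaction information I(A,B;Y) - I(A;Y) - I(B;Y), in bits."""
    joint = entropy(a, b) + entropy(y) - entropy(a, b, y)
    return joint - mutual_info(a, y) - mutual_info(b, y)
```

For an XOR-like phenotype (`a = [0, 0, 1, 1]`, `b = [0, 1, 0, 1]`, `y = [0, 1, 1, 0]`), neither variant alone carries information, yet the pair determines the phenotype, giving a synergy of +1 bit. Two identical, redundant variants give -1 bit.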
doi:10.1371/journal.pgen.1000432
PMCID: PMC2653647  PMID: 19300503
BMC Bioinformatics  2012;13(Suppl 9):S5.
Background
Because common complex diseases are affected by multiple genes and environmental factors, it is essential to investigate gene-gene and/or gene-environment interactions to understand the genetic architecture of complex diseases. After the great success of large-scale genome-wide association (GWA) studies using high-density single nucleotide polymorphism (SNP) chips, the study of gene-gene interaction has become the next challenge. Multifactor dimensionality reduction (MDR) analysis has been widely used for gene-gene interaction analysis. In practice, however, it is not easy to perform high-order gene-gene interaction analyses via MDR at the genome-wide level, because doing so requires exploring a huge search space and carries a heavy computational burden due to high dimensionality.
Results
We propose a dimension-reduction analysis, Gene-MDR, for fast and efficient high-order gene-gene interaction analysis. The proposed Gene-MDR method consists of two successive applications of MDR: within-gene and between-gene MDR analyses. First, within-gene MDR analysis summarizes each gene's effect by combining multiple SNPs from the same gene via MDR. Second, between-gene MDR analysis performs interaction analysis using the summarized gene effects from the within-gene MDR analysis. We apply the Gene-MDR method to bipolar disorder (BD) GWA data from the Wellcome Trust Case Control Consortium (WTCCC). The results demonstrate that Gene-MDR is capable of detecting high-order gene-gene interactions associated with BD.
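The core MDR step pools multilocus genotype cells into high-risk and low-risk groups by comparing each cell's case/control ratio with a threshold (by default, the overall case/control ratio). It then replaces the multilocus genotype with a single binary variable. The sketch below shows that step. Using the resulting binary variable as a gene's "summarized effect" for the between-gene stage is a simplifying assumption about Gene-MDR, and the sketch omits cross-validation and permutation testing. Function names are hypothetical.

```python
from collections import Counter


def mdr_high_risk_cells(genotypes, status, threshold=None):
    """Return multilocus genotype cells labeled high-risk by MDR pooling.

    genotypes: list of tuples, one multilocus genotype per individual.
    status: list of 1 (case) / 0 (control). Assumes at least one control.
    A cell is high-risk if its case/control ratio exceeds `threshold`.
    """
    cases = Counter(g for g, s in zip(genotypes, status) if s == 1)
    controls = Counter(g for g, s in zip(genotypes, status) if s == 0)
    if threshold is None:
        threshold = sum(cases.values()) / sum(controls.values())
    high_risk = set()
    for cell in set(cases) | set(controls):
        if controls[cell] == 0 or cases[cell] / controls[cell] > threshold:
            high_risk.add(cell)
    return high_risk


def mdr_reduce(genotypes, status, threshold=None):
    """Collapse multilocus genotypes into one binary high/low-risk variable."""
    high_risk = mdr_high_risk_cells(genotypes, status, threshold)
    return [1 if g in high_risk else 0 for g in genotypes]
```

In a within-gene step, `genotypes` would hold the tuple of SNP genotypes from one gene. In a between-gene step, it would hold tuples of the reduced variables from several genes, so the search space grows with the number of genes rather than the number of SNPs.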
Conclusion
By reducing the dimension of genome-wide data from the SNP level to the gene level, Gene-MDR efficiently identifies high-order gene-gene interactions. Gene-MDR may therefore help to elucidate the etiology of complex diseases.
doi:10.1186/1471-2105-13-S9-S5
PMCID: PMC3372457  PMID: 22901090
Genetic epidemiology  2014;38(5):430-438.
Genome-wide association studies (GWAS) that draw samples from multiple studies with a mixture of relationship structures are becoming more common. Analytical methods exist for using mixed-sample data, but few methods have been proposed for the analysis of genotype-by-environment (G×E) interactions. Using GWAS data from a study of sarcoidosis susceptibility genes in related and unrelated African Americans, we explored the current analytic options for genotype association testing in studies using both unrelated and family-based designs. We propose a novel method—generalized least squares (GLX)—to estimate both SNP and G×E interaction effects for categorical environmental covariates and compared this method to generalized estimating equations (GEE), logistic regression, the Cochran–Armitage trend test, and the WQLS and MQLS methods. We used simulation to demonstrate that the GLX method reduces type I error under a variety of pedigree structures. We also demonstrate its superior power to detect SNP effects while offering computational advantages and comparable power to detect G×E interactions versus GEE. Using this method, we found two novel SNPs that demonstrate a significant genome-wide interaction with insecticide exposure—rs10499003 and rs7745248, located in the intronic and 3′ UTR regions of the FUT9 gene on chromosome 6q16.1.
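The abstract names generalized least squares (GLX) but does not give its equations. The following is a hedged sketch of the textbook generalized least squares setup for a G×E model. It assumes a trait coded as y, a genotype g (0, 1, or 2 minor-allele copies), a categorical environmental indicator e, their product, and a residual covariance V that encodes relatedness (for example, built from kinship coefficients, with identity blocks for unrelated individuals). The exact covariance structure used in GLX is an assumption here.

```latex
y_i = \beta_0 + \beta_G\, g_i + \beta_E\, e_i + \beta_{GE}\, g_i e_i + \varepsilon_i,
\qquad \operatorname{Var}(\boldsymbol{\varepsilon}) = \sigma^2 \mathbf{V}

\hat{\boldsymbol{\beta}} = \left(\mathbf{X}^\top \mathbf{V}^{-1} \mathbf{X}\right)^{-1} \mathbf{X}^\top \mathbf{V}^{-1} \mathbf{y},
\qquad \widehat{\operatorname{Var}}(\hat{\boldsymbol{\beta}}) = \hat{\sigma}^2 \left(\mathbf{X}^\top \mathbf{V}^{-1} \mathbf{X}\right)^{-1}

W_{GE} = \frac{\hat{\beta}_{GE}^{\,2}}{\widehat{\operatorname{Var}}(\hat{\beta}_{GE})} \sim \chi^2_1
\quad \text{under } H_0\colon \beta_{GE} = 0
```

Each row of X is (1, g_i, e_i, g_i e_i). When V is the identity (all individuals unrelated), the estimator reduces to ordinary least squares. Weighting by V^{-1} is what lets related and unrelated samples be analyzed together without inflating type I error.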
doi:10.1002/gepi.21811
PMCID: PMC4112407  PMID: 24845555
GWAS; G×E; gene-by-environment; generalized least squares; mixed samples; sarcoidosis
BMC Genomics  2011;12:344.
Background
Interactions among genetic loci are believed to play an important role in disease risk. While many methods have been proposed for detecting such interactions, their relative performance remains largely unclear, mainly because different data sources, detection performance criteria, and experimental protocols were used in the papers introducing these methods and in subsequent studies. Moreover, there have been very few studies strictly focused on comparison of existing methods. Given the importance of detecting gene-gene and gene-environment interactions, a rigorous, comprehensive comparison of performance and limitations of available interaction detection methods is warranted.
Results
We report a comparison of eight representative methods, seven of which were specifically designed to detect interactions among single nucleotide polymorphisms (SNPs), the eighth being a popular main-effect testing method used as a baseline for performance evaluation. The selected methods, multifactor dimensionality reduction (MDR), full interaction model (FIM), information gain (IG), Bayesian epistasis association mapping (BEAM), SNP harvester (SH), maximum entropy conditional probability modeling (MECPM), logistic regression with an interaction term (LRIT), and logistic regression (LR), were compared on a large number of simulated data sets, each consistent with complex disease models and embedding multiple sets of interacting SNPs under different interaction models. The assessment criteria included several relevant detection power measures, family-wise type I error rate, and computational complexity. There are several important results from this study. First, while some SNPs in interactions with strong effects are successfully detected, most of the methods miss many interacting SNPs at an acceptable rate of false positives. In this study, the best-performing method was MECPM. Second, the statistical significance assessment criteria used by some of the methods to control the type I error rate are quite conservative, thereby limiting their power and making it difficult to compare them fairly. Third, as expected, power varies across models and as a function of penetrance, minor allele frequency, linkage disequilibrium, and marginal effects. Fourth, analytical relationships between power and these factors are derived, aiding in the interpretation of the study results. Fifth, for these methods, the magnitude of the main effect influences the power of the tests. Sixth, most methods can detect some ground-truth SNPs but have modest power to detect the whole set of interacting SNPs.
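The dependence of power on penetrance, minor allele frequency, and marginal effects can be illustrated with a two-locus penetrance table. Averaging over locus B's genotype frequencies gives the marginal penetrance at locus A, which single-locus tests rely on. This sketch assumes Hardy-Weinberg equilibrium at locus B and no linkage disequilibrium between the loci. The table values are hypothetical and are not taken from the study.

```python
def hwe_genotype_freqs(maf):
    """Genotype frequencies (0, 1, 2 minor-allele copies) under Hardy-Weinberg equilibrium."""
    p, q = 1 - maf, maf
    return [p * p, 2 * p * q, q * q]


def marginal_penetrance(penetrance, maf_b):
    """Marginal disease risk for each genotype at locus A.

    penetrance[i][j] = P(disease | genotype i at A, genotype j at B).
    Assumes HWE at locus B and linkage equilibrium between A and B.
    """
    freqs_b = hwe_genotype_freqs(maf_b)
    return [sum(penetrance[i][j] * freqs_b[j] for j in range(3)) for i in range(3)]


# Hypothetical XOR-like interaction model
penetrance = [[0.0, 0.1, 0.0],
              [0.1, 0.0, 0.1],
              [0.0, 0.1, 0.0]]
```

With `maf_b = 0.5`, every genotype at A has the same marginal risk (0.05), so main-effect tests such as LR have essentially no power even though the loci interact. With `maf_b = 0.2`, the heterozygote at A has a higher marginal risk (about 0.068 versus 0.032), so a marginal signal appears. This is one way the minor allele frequency and the size of marginal effects can change which methods succeed.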
Conclusion
This comparison study provides new insights into the strengths and limitations of current methods for detecting interacting loci. This study, along with freely available simulation tools we provide, should help support development of improved methods. The simulation tools are available at: http://code.google.com/p/simulation-tool-bmc-ms9169818735220977/downloads/list.
doi:10.1186/1471-2164-12-344
PMCID: PMC3161015  PMID: 21729295
A protein interactome focused towards cell proliferation was mapped comprising 857 interactions among 393 proteins, leading to many new insights in plant cell cycle regulation.
A comprehensive view on heterodimeric cyclin-dependent kinase (CDK)/cyclin complexes in plants is obtained, in relation with their regulators.
Over 100 new candidate cell cycle proteins were predicted.
The basic underlying mechanisms that govern the cell cycle are conserved among all eukaryotes. Peculiar to plants, however, is that their genome contains a collection of cell cycle regulatory genes that is intriguingly large (Vandepoele et al, 2002; Menges et al, 2005) compared with other eukaryotes. Arabidopsis thaliana (Arabidopsis) encodes 71 genes in five regulatory classes, versus only 15 in yeast and 23 in human.
Despite the discovery of numerous cell cycle genes, little is known about the protein complex machinery that steers plant cell division. Therefore, we applied the tandem affinity purification (TAP) approach coupled with mass spectrometry (MS) to Arabidopsis cell suspension cultures to isolate and analyze protein complexes involved in the cell cycle. This approach allowed us to successfully map a first draft of the basic cell cycle complex machinery of Arabidopsis, providing many new insights into plant cell division.
To map the interactome, we relied on a streamlined platform comprising generic Gateway-based vectors with high cloning flexibility, the fast generation of transgenic suspension cultures, TAP adapted for plant cells, and matrix-assisted laser desorption ionization (MALDI) tandem-MS for the identification of purified proteins (Van Leene et al, 2007, 2008). Complexes for 102 cell cycle proteins were analyzed using this approach, leading to a non-redundant data set of 857 interactions among 393 proteins (Figure 1A). Two subspaces were identified in this data set: domain I1, containing interactions confirmed in at least two independent experimental repeats or in the reciprocal purification experiment, and domain I2, consisting of uniquely observed interactions.
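The rule that splits the data set into domains I1 and I2 is simple enough to state as code. The following is a hypothetical sketch, assuming observations are recorded as a mapping from (bait, prey) to the number of independent purifications that detected the prey. A pair is placed in I1 if it was seen in at least two repeats or recovered in the reciprocal purification, and in I2 otherwise. Protein names in the example are placeholders.

```python
def assign_domains(observations):
    """Assign each non-redundant interaction to domain I1 or I2.

    observations: dict mapping (bait, prey) -> number of independent
    purification repeats in which the prey was found with the bait.
    I1: seen in >= 2 repeats or in the reciprocal (prey-as-bait) purification.
    I2: uniquely observed interactions.
    """
    domains = {}
    for (bait, prey), repeats in observations.items():
        pair = frozenset((bait, prey))  # non-redundant, direction-free key
        reciprocal = bait != prey and (prey, bait) in observations
        in_i1 = repeats >= 2 or reciprocal
        # Never downgrade a pair that is already supported as I1
        if in_i1 or pair not in domains:
            domains[pair] = "I1" if in_i1 else "I2"
    return domains


# Placeholder proteins, not data from the study
observations = {("A", "B"): 2, ("C", "D"): 1, ("D", "C"): 1, ("A", "E"): 1}
```

Here A–B (two repeats) and C–D (reciprocal confirmation) fall into I1, while A–E, seen once, falls into I2.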
Several observations underlined the quality of both domains. All tested reverse purifications recovered the original interaction, and 150 known or predicted interactions were confirmed, meaning that the large majority of interactions in the data set were new. An in-depth computational analysis revealed enrichment for many cell cycle-related features among the proteins of the network (Figure 1B), and many protein pairs were coregulated at the transcriptional level (Figure 1C). Through integration of known cell cycle-related features, more than 100 new candidate cell cycle proteins were predicted (Figure 1D). Beyond the qualities shared by both interactome domains, their real significance emerged from their differences, which expose two subspaces in the cell cycle interactome: a central regulatory network of stable complexes that are repeatedly isolated and represent core regulatory units, and a peripheral network of less frequently identified transient interactions that are involved in other aspects of the process, such as crosstalk between core complexes or connections with other pathways. To evaluate the biological relevance of the cell cycle interactome in plants, we validated interactions from both domains with a transient split-luciferase assay in Arabidopsis plants (Marion et al, 2008), further supporting the hypothesis-generating power of the data set for understanding plant growth.
To gain insight into cell cycle physiology, the interactome was subdivided according to the functional classes of the baits, and core protein complexes were extracted. These cover cyclin-dependent kinase (CDK)/cyclin core complexes together with their positive and negative regulatory networks, DNA replication complexes, the anaphase-promoting complex (APC), and spindle checkpoint complexes. The data imply that mitotic A- and B-type cyclins form heterodimeric complexes exclusively with the plant-specific B-type CDKs and not with CDKA;1, whereas D-type cyclins seem to associate with CDKA;1. Besides recovering complexes previously described in other organisms, our data also suggested many new functional links, for example one coupling cell division with the regulation of transcript splicing. The association of negative regulators of CDK/cyclin complexes with transcription factors suggests that their role in reallocation is not solely targeted to CDK/cyclin complexes. New members of the Siamese-related inhibitory proteins were identified, and, for the first time, potential inhibitors of the plant-specific mitotic B-type CDKs were found in plants. The data provide new evidence that the E2F–DP–RBR network is active not only at the G1-to-S transition but also at the G2-to-M transition, and many complexes involved in DNA replication or repair were isolated. For the first time, a plant APC was isolated biochemically, identifying three potential new plant-specific APC interactors. Finally, complexes involved in the spindle checkpoint were isolated, revealing many new specific interactions.
Finally, to obtain a general view of the complex machinery, modules of interacting cyclins and core cell cycle regulators were ranked along the cell cycle phases according to the transcript expression peak of the cyclins, revealing a diverse set of CDK–cyclin complexes with high regulatory differentiation (Figure 4). Even within the same subfamily (e.g. cyclins A3, B1, B2, D3, and D4), cyclins differ not only in their functional time frame but also in the type and number of CDKs, inhibitors, and scaffolding proteins they bind, further indicating their functional diversification. According to our interaction data, at least 92 different CDK–cyclin complex variants exist in Arabidopsis.
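Ranking modules by the expression peak of their cyclin amounts to sorting on the time point of maximal transcript level. The sketch below assumes each cyclin has an expression profile sampled at time points that have already been labelled with a cell cycle phase, for example from a synchronized culture. The module contents, profiles, and function names are hypothetical. Tie-breaking by cyclin name is an arbitrary choice made here so that the output is deterministic.

```python
from typing import Dict, List, Sequence, Set, Tuple


def peak_timepoint(profile: Sequence[float]) -> int:
    """Index of the time point at which a transcript reaches its maximum
    (the first one if several time points tie)."""
    return max(range(len(profile)), key=lambda i: profile[i])


def order_modules_by_peak(
    modules: Dict[str, Set[str]],
    profiles: Dict[str, Sequence[float]],
    phase_labels: Sequence[str],
) -> List[Tuple[str, str, Set[str]]]:
    """Order cyclin modules (cyclin -> interacting CDKs/regulators) along the
    cell cycle by the peak of the cyclin transcript.

    phase_labels[i] names the phase of time point i, so every profile must
    have exactly len(phase_labels) entries."""
    for cyclin in modules:
        if len(profiles[cyclin]) != len(phase_labels):
            raise ValueError(f"profile length mismatch for {cyclin}")
    peaks = {c: peak_timepoint(profiles[c]) for c in modules}
    ordered = sorted(modules, key=lambda c: (peaks[c], c))  # ties: by name
    return [(c, phase_labels[peaks[c]], modules[c]) for c in ordered]


if __name__ == "__main__":
    # Hypothetical modules and expression profiles, not data from the study.
    modules = {
        "cycX": {"cdk1", "inhibitor1"},
        "cycY": {"cdk2"},
        "cycZ": {"cdk1", "cdk2", "scaffold1"},
    }
    profiles = {
        "cycX": [0.2, 0.3, 0.9, 0.4],
        "cycY": [0.8, 0.5, 0.1, 0.1],
        "cycZ": [0.1, 0.7, 0.3, 0.2],
    }
    phases = ["G1", "S", "G2", "M"]
    for cyclin, phase, partners in order_modules_by_peak(modules, profiles, phases):
        print(phase, cyclin, sorted(partners))
```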
In conclusion, these results reflect how several rounds of gene duplication (Sterck et al, 2007) led to the evolution of a large set of cyclin paralogs and a myriad of regulators. This resulted in a marked increase in the complexity of the cell cycle machinery that could accommodate unique plant-specific features such as an indeterminate mode of postembryonic development. Through their extensive regulation and connection with many up- and downstream pathways, the core cell cycle complexes might offer the plant a flexible toolkit to fine-tune cell proliferation in response to an ever-changing environment.
Cell proliferation is the main driving force for plant growth. Although genome sequence analysis revealed a high number of cell cycle genes in plants, little is known about the molecular complexes steering cell division. In a targeted proteomics approach, we mapped the core complex machinery at the heart of the Arabidopsis thaliana cell cycle control. Besides a central regulatory network of core complexes, we distinguished a peripheral network that links the core machinery to up- and downstream pathways. Over 100 new candidate cell cycle proteins were predicted, and an in-depth biological interpretation demonstrated the hypothesis-generating power of the interaction data. The data set provided a comprehensive view of heterodimeric cyclin-dependent kinase (CDK)–cyclin complexes in plants. For the first time, inhibitory proteins of plant-specific B-type CDKs were discovered, and the anaphase-promoting complex was characterized and extended. Important conclusions were that mitotic A- and B-type cyclins form complexes with the plant-specific B-type CDKs and not with CDKA;1, and that D-type cyclins and S-phase-specific A-type cyclins seem to be associated exclusively with CDKA;1. Furthermore, we show that plants have evolved a combinatorial toolkit consisting of at least 92 different CDK–cyclin complex variants. This strongly underscores the functional diversification among the large family of cyclins and reflects the pivotal role of cell cycle regulation in the developmental plasticity of plants.
doi:10.1038/msb.2010.53
PMCID: PMC2950081  PMID: 20706207
Arabidopsis thaliana; cell cycle; interactome; protein complex; protein interactions
BMC Proceedings  2014;8(Suppl 1):S30.
Identifying genetic variants associated with complex diseases is an important task in genetic research. Association studies based on unrelated individuals (i.e., case-control genome-wide association studies) have successfully identified common single-nucleotide polymorphisms for many complex diseases. However, these studies are less likely to identify rare genetic variants. In contrast, family-based association studies are particularly useful for identifying rare-variant associations. Recently, there has been some interest in employing multilevel models in family-based genetic association studies. However, the performance of such models in these studies, especially for longitudinal family-based sequence data, has not been fully investigated. Therefore, in this study, we investigated the performance of the multilevel model in family-based genetic association analysis and compared it with the conventional family-based association test. We did this by examining the power and type I error rates of the 2 approaches in 3 data sets from the Genetic Analysis Workshop 18 simulated data: genome-wide association single-nucleotide polymorphism data, sequence data, and rare-variants-only data. Compared with the univariate family-based association test, the multilevel model had slightly higher power to identify most of the causal genetic variants in the genome-wide association single-nucleotide polymorphism data and the sequence data. However, both approaches had low power to identify most of the causal single-nucleotide polymorphisms, especially the relatively rare genetic variants. Therefore, we suggest a unified method that combines both approaches and incorporates a collapsing strategy, which may be more powerful than either approach alone for studying genetic associations using family-based data.
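In simulation studies like this one, power and type I error are estimated in the same way. Each one is the fraction of simulated replicates in which the test rejects at level α. The difference is only whether the variant carries a simulated effect. The abstract does not specify the collapsing strategy it proposes. The sketch below assumes one common form, a burden-style carrier indicator that flags anyone carrying at least one minor allele at any variant below a MAF threshold. The 1% threshold, the p-values, and the genotypes are hypothetical.

```python
from typing import List, Sequence


def rejection_rate(p_values: Sequence[float], alpha: float = 0.05) -> float:
    """Fraction of simulated replicates whose test rejects at level alpha.

    Applied to a truly causal variant this estimates power; applied to a
    variant with no simulated effect it estimates the type I error rate."""
    if len(p_values) == 0:
        raise ValueError("need at least one replicate")
    return sum(p < alpha for p in p_values) / len(p_values)


def collapse_rare_variants(
    genotypes: Sequence[Sequence[int]],
    mafs: Sequence[float],
    maf_threshold: float = 0.01,  # assumed cut-off for "rare"
) -> List[int]:
    """Collapse the rare variants of a region into one carrier indicator.

    genotypes[i][j] is the minor-allele count (0, 1 or 2) of individual i at
    variant j; mafs[j] is that variant's minor allele frequency. Returns 1 for
    individuals carrying at least one minor allele at any variant whose MAF is
    below maf_threshold, else 0."""
    rare = [j for j, maf in enumerate(mafs) if maf < maf_threshold]
    return [int(any(row[j] > 0 for j in rare)) for row in genotypes]


if __name__ == "__main__":
    # Hypothetical p-values from 8 simulated replicates of one causal SNP.
    p_causal = [0.003, 0.21, 0.04, 0.0007, 0.36, 0.012, 0.09, 0.049]
    print("estimated power:", rejection_rate(p_causal))
    # Hypothetical genotypes: 3 individuals x 3 variants in one gene.
    print(collapse_rare_variants([[0, 1, 0], [1, 0, 0], [2, 0, 1]],
                                 [0.30, 0.004, 0.008]))
```

The collapsed indicator would then replace the individual rare variants as a single predictor in whichever family-based model is used, whether a family-based association test or a multilevel model. Pooling in this way concentrates the signal that is too sparse to detect variant by variant.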
doi:10.1186/1753-6561-8-S1-S30
PMCID: PMC4143633  PMID: 25519380
PLoS Medicine  2011;8(10):e1001106.
Ron Do and colleagues find that a prudent diet high in raw vegetables may modify the increased genetic risk of cardiovascular disease conferred by the chromosome 9p21 SNP.
Background
One of the most robust genetic associations for cardiovascular disease (CVD) is the Chromosome 9p21 region. However, the interaction of this locus with environmental factors has not been extensively explored. We investigated the association of 9p21 with myocardial infarction (MI) in individuals of different ethnicities, and tested for an interaction with environmental factors.
Methods and Findings
We genotyped four 9p21 SNPs in 8,114 individuals from the global INTERHEART study. All four variants were associated with MI, with odds ratios (ORs) of 1.18 to 1.20 (1.85×10⁻⁸ ≤ p ≤ 5.21×10⁻⁷). A significant interaction (p = 4.0×10⁻⁴) was observed between rs2383206 and a factor-analysis-derived “prudent” diet pattern score, for which a major component was raw vegetables. An effect of 9p21 on MI was observed in the group with a low prudent diet score (OR = 1.32, p = 6.82×10⁻⁷), but the effect was diminished in a step-wise fashion in the medium (OR = 1.17, p = 4.9×10⁻³) and high prudent diet scoring groups (OR = 1.02, p = 0.68) (p = 0.014 for difference). We also analyzed data from 19,129 individuals (including 1,014 incident cases of CVD) from the prospective FINRISK study, which used a closely related dietary variable. In this analysis, the 9p21 risk allele demonstrated a larger effect on CVD risk in the groups with diets low or average for fresh vegetables, fruits, and berries (hazard ratio [HR] = 1.22, p = 3.0×10⁻⁴, and HR = 1.35, p = 4.1×10⁻³, respectively) compared to the group with high consumption of these foods (HR = 0.96, p = 0.73) (p = 0.0011 for difference). The combination of the least prudent diet and two copies of the risk allele was associated with a 2-fold increase in risk for MI (OR = 1.98, p = 2.11×10⁻⁹) in the INTERHEART study and a 1.66-fold increase in risk for CVD in the FINRISK study (HR = 1.66, p = 0.0026).
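The diet-stratified ORs above, and the "p for difference" across strata, rest on comparing effect sizes between independent groups. The sketch below shows the simplest version of that comparison. It computes an OR with Woolf's standard error from a 2x2 table and then applies a z test on the difference of two log ORs. This is a textbook substitute, not the study's method, which used factor-analysis diet scores and regression models. Treating "exposed" as carrying the risk allele is a simplification, and the counts in the example are hypothetical.

```python
import math
from typing import Tuple

# A 2x2 table as (exposed cases, exposed controls, unexposed cases,
# unexposed controls); "exposed" = risk-allele carrier (a simplification).
Table = Tuple[int, int, int, int]


def odds_ratio(table: Table) -> Tuple[float, float]:
    """Return the odds ratio and the standard error of its log (Woolf)."""
    a, b, c, d = table
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Woolf estimate")
    return (a * d) / (b * c), math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


def compare_strata(t1: Table, t2: Table) -> Tuple[float, float]:
    """z test for a difference between the log odds ratios of two independent
    strata (e.g. low vs. high diet score). Returns (z, two-sided p)."""
    or1, se1 = odds_ratio(t1)
    or2, se2 = odds_ratio(t2)
    z = (math.log(or1) - math.log(or2)) / math.sqrt(se1 ** 2 + se2 ** 2)
    return z, math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


if __name__ == "__main__":
    low_diet = (300, 200, 250, 220)   # hypothetical counts, not study data
    high_diet = (280, 260, 270, 255)
    for name, t in (("low", low_diet), ("high", high_diet)):
        or_, se = odds_ratio(t)
        lo, hi = (math.exp(math.log(or_) + s * 1.96 * se) for s in (-1, 1))
        print(f"{name}: OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
    print("z = %.2f, p = %.3g" % compare_strata(low_diet, high_diet))
```

With three or more strata, as in the low, medium, and high diet groups, the usual extensions are a heterogeneity test or an interaction term in a logistic or Cox model.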
Conclusions
The risk of MI and CVD conferred by Chromosome 9p21 SNPs appears to be modified by a prudent diet high in raw vegetables and fruits.
Please see later in the article for the Editors' Summary
Editors' Summary
Background
Cardiovascular diseases (CVDs)—diseases that affect the heart and/or the blood vessels—are a leading cause of illness and death worldwide. In the United States, for example, the leading cause of death is coronary heart disease, a CVD in which narrowing of the heart's blood vessels by fatty deposits slows the blood supply to the heart and may eventually cause a heart attack (myocardial infarction, or MI); the third leading cause of death in the US is stroke, a CVD in which the brain's blood supply is interrupted. Environmental factors such as diet, physical activity, and smoking alter a person's risk of developing CVD. In addition, certain genetic variants (alterations in the DNA that forms the body's blueprint; DNA is packed into structures called chromosomes) alter the risk of developing CVD and are passed from parent to child. Thus, in CVD, as in most common diseases, both genetics and the environment play a role.
Why Was This Study Done?
Recent studies have identified several genetic variants that are associated with an increased risk of developing CVD. One of the most robust of these genetic associations is a cluster of single nucleotide polymorphisms (SNPs, differences in a single DNA building block) in a chromosomal region (locus) called 9p21. So far, this association has been mainly studied in European populations. Moreover, the interaction of this locus with environmental factors has not been extensively studied. A better understanding of how 9p21 variants affect CVD risk in people of different ethnicities and of the interaction between this locus and environmental factors could allow the development of targeted strategies for the prevention of CVD. In this study, the researchers investigate the association of 9p21 risk variants with CVD in people of different ethnicities and test for an interaction between this locus and environmental factors.
What Did the Researchers Do and Find?
The researchers assessed four 9p21 SNPs in people enrolled in the INTERHEART study, a global retrospective case-control study that investigated potential MI risk factors by comparing people who had had an acute non-fatal MI with similar people without heart disease. All four SNP risk variants increased the risk of MI by about a fifth. However, the effect of the SNPs on MI was influenced by the “prudent” diet pattern score of the INTERHEART participants, a score that includes fresh fruit and vegetable intake as recorded in food frequency questionnaires. That is, the risk of MI in people carrying SNP risk variants was influenced by their diet. The strongest interaction was seen with an SNP called rs2383206. Carriers of this SNP who ate a diet poor in fruits and vegetables had a higher risk of MI than non-carriers with a similar diet, whereas carriers and non-carriers who ate a fruit- and vegetable-rich diet had a comparable MI risk. Overall, the combination of the least “prudent” diet and two copies of the risk variant (human cells contain two complete sets of chromosomes) was associated with a two-fold increase in risk for MI in the INTERHEART study. Additionally, data collected in the FINRISK study, which characterized healthy individuals living in Finland at baseline and then followed them to see whether they developed CVD, revealed a similar interaction between diet and 9p21 SNPs.
What Do These Findings Mean?
These findings suggest that the risk of CVD conferred by chromosome 9p21 SNPs may be influenced by diet in multiple ethnic groups. Importantly, they suggest that the deleterious effect of 9p21 SNPs on CVD might be mitigated by consuming a diet rich in fresh fruits and vegetables. The accuracy of these findings may be affected by recall bias in the INTERHEART study (that is, some people may not have remembered their diet accurately) and by the small number of CVD cases in the FINRISK study. Nevertheless, these findings suggest that gene–environment interactions are important drivers of CVD, and they raise the possibility that a sound diet can modify the effects of 9p21 SNPs.
Additional Information
Please access these websites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001106.
The American Heart Association provides information about many types of cardiovascular disease for patients, caregivers, and professionals and tips on keeping the heart healthy
The UK National Health Service Choices website provides information about cardiovascular disease and stroke
Information is available from the British Heart Foundation on heart disease and keeping the heart healthy
The US National Heart Lung and Blood Institute provides information on a wide range of cardiovascular diseases
MedlinePlus provides links to many other sources of information on heart diseases, vascular diseases, and stroke (in English and Spanish)
The US Centers for Disease Control and Prevention has a simple fact sheet on gene-environment interactions; the US National Institute of Environmental Health Sciences provides links to other information on gene-environment interactions
More information is available on the INTERHEART study and on the FINRISK study
doi:10.1371/journal.pmed.1001106
PMCID: PMC3191151  PMID: 22022235