1.  A Comparison of Multivariate Genome-Wide Association Methods 
PLoS ONE  2014;9(4):e95923.
Joint association analysis of multiple traits in a genome-wide association study (GWAS), i.e. a multivariate GWAS, offers several advantages over analyzing each trait in a separate GWAS. In this study we directly compared a number of multivariate GWAS methods using simulated data. We focused on six methods that are implemented in the software packages PLINK, SNPTEST, MultiPhen, BIMBAM, PCHAT and TATES, and also compared them to standard univariate GWAS, analysis of the first principal component of the traits, and meta-analysis of univariate results. We simulated data (N = 1000) for three quantitative traits and one bi-allelic quantitative trait locus (QTL), and varied the number of traits associated with the QTL (explained variance 0.1%), minor allele frequency of the QTL, residual correlation between the traits, and the sign of the correlation induced by the QTL relative to the residual correlation. We compared the power of the methods using empirically fixed significance thresholds (α = 0.05). Our results showed that the multivariate methods implemented in PLINK, SNPTEST, MultiPhen and BIMBAM performed best for the majority of the tested scenarios, with a notable increase in power for scenarios with an opposite sign of genetic and residual correlation. All multivariate analyses resulted in a higher power than univariate analyses, even when only one of the traits was associated with the QTL. Hence, use of multivariate GWAS methods can be recommended, even when genetic correlations between traits are weak.
PMCID: PMC3999149  PMID: 24763738
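The "empirically fixed significance thresholds" in the abstract above can be illustrated with a minimal sketch. It assumes that p-values from replicates simulated without a QTL effect are available; the function names and the quantile rule are illustrative assumptions, not the procedure used in the paper.

```python
import math

def empirical_threshold(null_pvalues, alpha=0.05):
    """Return the p-value cutoff that rejects at most a fraction alpha
    of the replicates simulated under the null hypothesis."""
    ordered = sorted(null_pvalues)
    # Number of null replicates we may reject; the small tolerance guards
    # against floating-point results such as 28.999999999999996.
    k = math.floor(alpha * len(ordered) + 1e-9)
    if k == 0:
        return 0.0  # too few replicates to reject anything at this alpha
    return ordered[k - 1]

def empirical_power(alt_pvalues, threshold):
    """Fraction of replicates simulated under the alternative hypothesis
    whose p-value is at or below the empirical threshold."""
    return sum(p <= threshold for p in alt_pvalues) / len(alt_pvalues)
```

Calibrating each method against its own null distribution in this way puts all methods at the same empirical type I error rate, so their power estimates are directly comparable.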
2.  Molecular Reclassification of Crohn’s Disease: A Cautionary Note on Population Stratification 
PLoS ONE  2013;8(10):e77720.
Complex human diseases commonly differ in their phenotypic characteristics; for example, Crohn’s disease (CD) patients are heterogeneous with regard to disease location and disease extent. The genetic susceptibility to Crohn’s disease is widely acknowledged and has been demonstrated by the identification of over 100 CD-associated genetic loci. However, relating CD subphenotypes to disease susceptibility loci has proven to be a difficult task. In this paper we discuss the use of cluster analysis on genetic markers to identify genetic-based subgroups while taking into account possible confounding by population stratification. We show that it is highly relevant to consider the confounding nature of population stratification; otherwise, detected clusters may be strongly related to population groups instead of disease-specific groups. Therefore, we explain the use of principal components (PCs) to correct for population stratification while clustering affected individuals into genetic-based subgroups. The principal components are obtained using 30 ancestry informative markers (AIMs), and the first two PCs are found to discriminate between continental origins of the affected individuals. Genotypes on 51 CD-associated single nucleotide polymorphisms (SNPs) are used to perform latent class analysis, hierarchical cluster analysis and Partitioning Around Medoids (PAM) cluster analysis within a sample of affected individuals, with and without the use of principal components to adjust for population stratification. Without correction for population stratification, the clusters appear to be influenced by population stratification, whereas with correction the clusters are unrelated to the continental origin of individuals.
PMCID: PMC3798408  PMID: 24147066
3.  Challenges and Opportunities in Genome-Wide Environmental Interaction (GWEI) studies 
Human genetics  2012;131(10):1591-1613.
Interest in gene-environment interaction studies has increased significantly with the growing availability of advanced molecular genetics techniques. In practice, it has become possible to investigate the role of environmental factors in disease risk and hence to investigate their role as genetic effect modifiers. The understanding that genetics is important in the uptake and metabolism of toxic substances is an example of how genetic profiles can modify important environmental risk factors for disease. Several rationales exist for setting up gene-environment interaction studies, and the technical challenges related to these studies (when the number of environmental or genetic risk factors is relatively small) have been described before.
In the post-genomic era, it is now possible to study thousands of genes and their interaction with the environment. This brings along a whole range of new challenges and opportunities. Despite a continuing effort in developing efficient methods and optimal bioinformatics infrastructures to deal with the available wealth of data, the challenge remains how to best present and analyze Genome-Wide Environmental Interaction (GWEI) studies involving multiple genetic and environmental factors. Since GWEIs are performed at the intersection of statistical genetics, bioinformatics and epidemiology, they usually face problems similar to those of Genome-Wide Association gene-gene Interaction (GWAI) studies. However, additional complexities need to be considered which are typical for large-scale epidemiological studies, but are also related to “joining” two heterogeneous types of data in explaining complex disease trait variation or for prediction purposes.
PMCID: PMC3677711  PMID: 22760307
Genome-wide association studies; gene-environment interaction; post-GWAS analysis; association tests; exploratory methods
4.  A family-based association test to detect gene–gene interactions in the presence of linkage 
For many complex diseases, quantitative traits contain more information than dichotomous traits. One of the approaches used to analyse these traits in family-based association studies is the quantitative transmission disequilibrium test (QTDT). The QTDT is a regression-based approach that simultaneously models linkage and association. It splits the association effect into a between-family and a within-family genetic component to adjust and test for population stratification, and includes a variance components method to model linkage. We extend this approach to detect gene–gene interactions between two unlinked QTLs by adjusting the definition of the between- and within-family components and the variance components included in the model. We simulate data to investigate the influence of the epistasis model, linkage disequilibrium patterns between the markers and the QTLs, and allele frequencies on the power and type I error rates of the approach. Results show that for some of the investigated settings, power gains are obtained in comparison with FAM-MDR. We conclude that our approach shows promising results for candidate-gene studies where too few markers are available to correct for population stratification using standard methods (for example EIGENSTRAT). The proposed method is applied to real-life data on hypertension from the FLEMENGHO study.
PMCID: PMC3421128  PMID: 22419171
QTDT; epistasis; association; linkage
5.  Unique Gene Expression and MR T2 Relaxometry Patterns Define Chronic Murine Dextran Sodium Sulphate Colitis as a Model for Connective Tissue Changes in Human Crohn’s Disease 
PLoS ONE  2013;8(7):e68876.
Chronically relapsing inflammation, tissue remodeling and fibrosis are hallmarks of inflammatory bowel diseases. The aim of this study was to investigate changes in connective tissue in a chronic murine model resulting from repeated cycles of dextran sodium sulphate (DSS) ingestion, to mimic the relapsing nature of the human disease.
Materials and Methods
C57BL/6 mice were exposed to DSS in drinking water for 1 week, followed by a recovery phase of 2 weeks. This cycle of exposure was repeated up to 3 times (9 weeks in total). Colonic inflammation, fibrosis, extracellular matrix proteins and colonic gene expression were studied. In vivo MRI T2 relaxometry was studied as a potential non-invasive imaging tool to evaluate bowel wall inflammation and fibrosis.
Repeated cycles of DSS resulted in a relapsing and remitting disease course, which induced a chronic segmental, transmural colitis after 2 and 3 cycles of DSS with clear induction of fibrosis and remodeling of the muscular layer. Tenascin expression mirrored its expression in Crohn’s colitis. Microarray data identified a gene expression profile different in chronic colitis from that in acute colitis. Additional recovery was associated with upregulation of unique genes, in particular keratins, pointing to activation of molecular pathways for healing and repair. In vivo MRI T2 relaxometry of the colon showed a clear shift towards higher T2 values in the acute stage and a gradual regression of T2 values with increasing cycles of DSS.
Repeated cycles of DSS exposure induce fibrosis and connective tissue changes with typical features, as occurring in Crohn’s disease. Colonic gene expression analysis revealed unique expression profiles in chronic colitis compared to acute colitis and after additional recovery, pointing to potential new targets to intervene with the induction of fibrosis. In vivo T2 relaxometry is a promising non-invasive assessment of inflammation and fibrosis.
PMCID: PMC3720888  PMID: 23894361
6.  A robustness study of parametric and non-parametric tests in model-based multifactor dimensionality reduction for epistasis detection 
BioData Mining  2013;6:9.
Applying a statistical method implies identifying underlying (model) assumptions and checking their validity in the particular context. One of these contexts is association modeling for epistasis detection. Here, depending on the technique used, violation of model assumptions may result in increased type I error, power loss, or biased parameter estimates. Remedial measures for violated underlying conditions or assumptions include data transformation or selecting a more relaxed modeling or testing strategy. Model-Based Multifactor Dimensionality Reduction (MB-MDR) for epistasis detection relies on association testing between a trait and a factor consisting of multilocus genotype information. For quantitative traits, the framework is essentially Analysis of Variance (ANOVA), which decomposes the variability in the trait amongst the different factors. In this study, we assess through simulations the cumulative effect of deviations from normality and homoscedasticity on the overall performance of quantitative MB-MDR to detect 2-locus epistasis signals in the absence of main effects.
Our simulation study focuses on pure epistasis models with varying degrees of genetic influence on a quantitative trait. Conditional on a multilocus genotype, we consider quantitative trait distributions that are normal, chi-square or Student’s t with constant or non-constant phenotypic variances. All data are analyzed with MB-MDR using the built-in Student’s t-test for association, as well as a novel MB-MDR implementation based on Welch’s t-test. Traits are either left untransformed or are transformed into new traits via logarithmic, standardization or rank-based transformations, prior to MB-MDR modeling.
Our simulation results show that MB-MDR controls type I error and false positive rates irrespective of the association test considered. Empirically-based MB-MDR power estimates for MB-MDR with Welch’s t-tests are generally lower than those for MB-MDR with Student’s t-tests. Trait transformations involving ranks tend to lead to increased power compared to the other considered data transformations.
When performing MB-MDR screening for gene-gene interactions with quantitative traits, we recommend first rank-transforming traits to normality and then applying MB-MDR modeling with Student’s t-tests as internal tests for association.
PMCID: PMC3668290  PMID: 23618370
Model-based multifactor dimensionality reduction; Epistasis; Model violations; Data transformation
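The recommendation in the abstract above (rank-transform the trait to normality, then use Student's t-test rather than Welch's t-test) can be sketched as follows. This is an illustration under stated assumptions, not the MB-MDR implementation: the function names are hypothetical, and the `(rank - 0.5) / n` quantile rule is one common choice of rank-based inverse normal transform that the abstract does not specify.

```python
from statistics import NormalDist, mean, variance

def rank_to_normal(values, offset=0.5):
    """Rank-based inverse normal transform; tied values share their average rank.
    The (rank - offset) / n rule is an assumed convention."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - offset) / n) for r in ranks]

def student_t(x, y):
    """Two-sample t statistic with pooled variance (assumes equal variances)."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / (pooled * (1 / nx + 1 / ny)) ** 0.5

def welch_t(x, y):
    """Two-sample t statistic without the equal-variance assumption."""
    nx, ny = len(x), len(y)
    return (mean(x) - mean(y)) / (variance(x) / nx + variance(y) / ny) ** 0.5
```

The two statistics coincide when both groups have the same size and differ otherwise, which is where the choice between them matters for unbalanced multilocus genotype groups.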
7.  Does Pet Ownership in Infancy Lead to Asthma or Allergy at School Age? Pooled Analysis of Individual Participant Data from 11 European Birth Cohorts 
PLoS ONE  2012;7(8):e43214.
To examine the associations between pet keeping in early childhood and asthma and allergies in children aged 6–10 years.
Pooled analysis of individual participant data of 11 prospective European birth cohorts that recruited a total of over 22,000 children in the 1990s.
Exposure definition
Ownership of only cats, dogs, birds, rodents, or cats/dogs combined during the first 2 years of life.
Outcome definition
Current asthma (primary outcome), allergic asthma, allergic rhinitis and allergic sensitization during 6–10 years of age.
Data synthesis
Three-step approach: (i) Common definition of outcome and exposure variables across cohorts; (ii) calculation of adjusted effect estimates for each cohort; (iii) pooling of effect estimates by using random effects meta-analysis models.
We found no association between furry and feathered pet keeping early in life and asthma in school age. For example, the odds ratio for asthma comparing cat ownership with “no pets” (10 studies, 11,489 participants) was 1.00 (95% confidence interval 0.78 to 1.28) (I² = 9%, p = 0.36). The odds ratio for asthma comparing dog ownership with “no pets” (9 studies, 11,433 participants) was 0.77 (0.58 to 1.03) (I² = 0%, p = 0.89). Owning both cat(s) and dog(s) compared to “no pets” resulted in an odds ratio of 1.04 (0.59 to 1.84) (I² = 33%, p = 0.18). Similarly, for allergic asthma and for allergic rhinitis we did not find associations regarding any type of pet ownership early in life. However, we found some evidence for an association between ownership of furry pets during the first 2 years of life and reduced likelihood of becoming sensitized to aero-allergens.
Pet ownership in early life did not appear to either increase or reduce the risk of asthma or allergic rhinitis symptoms in children aged 6–10. Advice from health care practitioners to avoid or to specifically acquire pets for primary prevention of asthma or allergic rhinitis in children should not be given.
PMCID: PMC3430634  PMID: 22952649
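Step (iii) of the data synthesis above, pooling cohort-specific effect estimates with a random effects model, can be sketched as below. The abstract does not name the estimator; the DerSimonian-Laird method is assumed here, and the function name and inputs (log odds ratios with their standard errors) are illustrative.

```python
import math

def random_effects_pool(log_ors, std_errs):
    """Pool per-cohort log odds ratios with the DerSimonian-Laird estimator
    (an assumed choice). Returns (pooled OR, 95% CI lower, 95% CI upper,
    I-squared in percent)."""
    if len(log_ors) < 2 or len(log_ors) != len(std_errs):
        raise ValueError("need matching estimates from at least two cohorts")
    w = [1 / se ** 2 for se in std_errs]  # inverse-variance weights
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
    # Cochran's Q measures the heterogeneity between cohorts.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))
    df = len(log_ors) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-cohort variance
    w_re = [1 / (se ** 2 + tau2) for se in std_errs]
    pooled = sum(wi * y for wi, y in zip(w_re, log_ors)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    z = 1.959963984540054  # 97.5th percentile of the standard normal
    return (math.exp(pooled),
            math.exp(pooled - z * se_pooled),
            math.exp(pooled + z * se_pooled),
            i2)
```

When the cohort estimates agree closely, the between-cohort variance is estimated as zero and the result reduces to the fixed effect (inverse-variance) pooled estimate.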
8.  Model-Based Multifactor Dimensionality Reduction to detect epistasis for quantitative traits in the presence of error-free and noisy data 
Detecting gene–gene interactions or epistasis in studies of human complex diseases is a major challenge in epidemiology. To address this problem, several methods have been developed, mainly in the context of data dimensionality reduction. One of these methods, Model-Based Multifactor Dimensionality Reduction, has so far mainly been applied to case–control studies. In this study, we evaluate the power of Model-Based Multifactor Dimensionality Reduction for quantitative traits to detect gene–gene interactions (epistasis) in the presence of error-free and noisy data. Considered sources of error are genotyping errors, missing genotypes, phenotypic mixtures and genetic heterogeneity. Our simulation study encompasses a variety of settings with varying minor allele frequencies and genetic variance for different epistasis models. On each simulated dataset, we performed Model-Based Multifactor Dimensionality Reduction in two ways: with and without adjustment for main effects of (known) functional SNPs. In line with binary trait counterparts, our simulations show that the power is lowest in the presence of phenotypic mixtures or genetic heterogeneity compared to scenarios with missing genotypes or genotyping errors. In addition, empirical power estimates decrease even further with main effects corrections, but at the same time, false-positive percentages are reduced as well. In conclusion, phenotypic mixtures and genetic heterogeneity remain challenging for epistasis detection, and careful thought must be given to the way important lower-order effects are accounted for in the analysis.
PMCID: PMC3110049  PMID: 21407267
Model-Based Multifactor Dimensionality Reduction; gene–gene interactions; quantitative traits; complex diseases; noisy data
9.  Comparison of genetic association strategies in the presence of rare alleles 
BMC Proceedings  2011;5(Suppl 9):S32.
In the quest for the missing heritability of most complex diseases, rare variants have received increased attention. Advances in large-scale sequencing have led to a shift from the common disease/common variant hypothesis to the common disease/rare variant hypothesis or have at least reopened the debate about the relevance and importance of rare variants for gene discoveries. The investigation of modeling and testing approaches to identify significant disease/rare variant associations is in full swing. New methods to better deal with parameter estimation instabilities, convergence problems, or multiple testing corrections in the presence of rare variants or effect modifiers of rare variants are in their infancy. Using a recently developed semiparametric strategy to detect causal variants, we investigate the performance of the model-based multifactor dimensionality reduction (MB-MDR) technique in terms of power and family-wise error rate (FWER) control in the presence of rare variants, using population-based and family-based data (FAM-MDR). We compare family-based results obtained from MB-MDR analyses to screening findings from a quantitative trait Pedigree-based association test (PBAT). Population-based data were further examined using penalized regression models. We restrict attention to all available single-nucleotide polymorphisms on chromosome 4 and consider Q1 as the outcome of interest. The considered family-based methods identified marker C4S4935 in the VEGFC gene with estimated power not exceeding 0.35 (FAM-MDR), when FWER was kept under control. The considered population-based methods gave rise to highly inflated FWERs (up to 90% for PBAT screening).
PMCID: PMC3287868  PMID: 22373505
10.  Mucosal Gene Expression of Antimicrobial Peptides in Inflammatory Bowel Disease Before and After First Infliximab Treatment 
PLoS ONE  2009;4(11):e7984.
Antimicrobial peptides (AMPs) protect the host intestinal mucosa against microorganisms. Abnormal expression of defensins was shown in inflammatory bowel disease (IBD), but it is not clear whether this is a primary defect. We investigated the impact of anti-inflammatory therapy with infliximab on the mucosal gene expression of AMPs in IBD.
Methodology/Principal Findings
Mucosal gene expression of 81 AMPs was assessed in 61 IBD patients before and 4–6 weeks after their first infliximab infusion and in 12 control patients, using Affymetrix arrays. Quantitative real-time reverse-transcription PCR and immunohistochemistry were used to confirm microarray data. The dysregulation of many AMPs in colonic IBD in comparison with control colons was widely restored by infliximab therapy, and only DEFB1 expression remained significantly decreased after therapy in the colonic mucosa of IBD responders to infliximab. In ileal Crohn's disease (CD), expression of two neuropeptides with antimicrobial activity, PYY and CHGB, was significantly decreased before therapy compared to control ileums, and ileal PYY expression remained significantly decreased after therapy in CD responders. Expression of the downregulated AMPs before and after treatment (DEFB1 and PYY) correlated with villin 1 expression, a gut epithelial cell marker, indicating that the decrease is a consequence of epithelial damage.
Our study shows that the dysregulation of AMPs in IBD mucosa is the consequence of inflammation, but may be responsible for perpetuation of inflammation due to ineffective clearance of microorganisms.
PMCID: PMC2776509  PMID: 19956723
11.  Importin-13 genetic variation is associated with improved airway responsiveness in childhood asthma 
Respiratory Research  2009;10(1):67.
Glucocorticoid function is dependent on efficient translocation of the glucocorticoid receptor (GR) from the cytoplasm to the nucleus of cells. Importin-13 (IPO13) is a nuclear transport receptor that mediates nuclear entry of GR. In airway epithelial cells, inhibition of IPO13 expression prevents nuclear entry of GR and abrogates anti-inflammatory effects of glucocorticoids. Impaired nuclear entry of GR has been documented in steroid-non-responsive asthmatics. We hypothesize that common IPO13 genetic variation influences the anti-inflammatory effects of inhaled corticosteroids for the treatment of asthma, as measured by change in methacholine airway hyperresponsiveness (AHR-PC20).
Ten polymorphisms were evaluated in 654 children with mild-to-moderate asthma participating in the Childhood Asthma Management Program (CAMP), a clinical trial of inhaled anti-inflammatory medications (budesonide and nedocromil). Population-based association tests with repeated measures of PC20 were performed using mixed models and confirmed using family-based tests of association.
Among participants randomized to placebo or nedocromil, IPO13 polymorphisms were associated with improved PC20 (i.e. less AHR), with subjects harboring minor alleles demonstrating an average 1.51–2.17 fold increase in mean PC20 at 8 months post-randomization that persisted over four years of observation (p = 0.01–0.005). This improvement was similar to that among children treated with long-term inhaled corticosteroids. There was no additional improvement in PC20 by IPO13 variants among children treated with inhaled corticosteroids.
IPO13 variation is associated with improved AHR in asthmatic children. The degree of this improvement is similar to that observed with long-term inhaled corticosteroid treatment, suggesting that IPO13 variation may improve nuclear bioavailability of endogenous glucocorticoids.
PMCID: PMC2724419  PMID: 19619331
12.  Paternal History of Asthma and Airway Responsiveness in Children with Asthma 
Rationale: Little is known regarding the relationship between parental history of asthma and subsequent airway hyperresponsiveness (AHR) in children with asthma. Objectives: We evaluated this relationship in 1,041 children with asthma participating in a randomized trial of antiinflammatory medications (the Childhood Asthma Management Program [CAMP]). Methods: Methacholine challenge testing was performed before treatment randomization and once per year over an average of 4.5 years postrandomization. Cross-sectional and longitudinal repeated measures analyses were performed to model the relationship between PC20 (the methacholine concentration causing a 20% fall in FEV1) with maternal, paternal, and joint parental histories of asthma. Models were adjusted for potential confounders. Measurements and Main Results: At baseline, AHR was strongly associated with a paternal history of asthma. Children with a paternal history of asthma demonstrated significantly greater AHR than those without such history (median ln PC20, 0.84 vs. 1.13; p = 0.006). Although maternal history of asthma was not associated with AHR, children with two parents with asthma had greater AHR than those with no parents with asthma (median ln PC20, 0.52 vs. 1.17; p = 0.0008). Longitudinal multivariate analysis of the relation between paternal history of asthma and AHR using repeated PC20 measurements over 44 months postrandomization confirmed a significant association between paternal history of asthma and AHR among children in CAMP. Conclusions: Our findings suggest that the genetic contribution of the father is associated with AHR, an important determinant of disease severity among children with asthma.
PMCID: PMC2718530  PMID: 15937295
airway responsiveness; asthma; genetics; longitudinal analysis; parent of origin
13.  Genomic screening in family-based association testing 
BMC Genetics  2005;6(Suppl 1):S115.
Due to the recent gains in the availability of single-nucleotide polymorphism data, genome-wide association testing has become feasible. It is hoped that this additional data may confirm the presence of disease susceptibility loci, and identify new genetic determinants of disease. However, the problem of multiple comparisons threatens to diminish any potential gains from this newly available data. To circumvent the multiple comparisons issue, we utilize a recently developed screening technique using family-based association testing. This screening methodology allows for the identification of the most promising single-nucleotide polymorphisms for testing without biasing the nominal significance level of our test statistic. We compare the results of our screening technique across univariate and multivariate family-based association tests. From our analyses, we observe that the screening technique, applied to different settings, is fairly consistent in identifying optimal markers for testing. One of the identified markers, TSC0047225, was significantly associated with both the ttth1 (p = 0.004) and ttth1-ttth4 (p = 0.004) phenotype(s). We find that both univariate- and multivariate-based screening techniques are powerful tools for detecting an association.
PMCID: PMC1866823  PMID: 16451572
14.  Comparison of linkage and association strategies for quantitative traits using the COGA dataset 
BMC Genetics  2005;6(Suppl 1):S96.
Genome scans using dense single-nucleotide polymorphism (SNP) data have recently become a reality. It is thought that the increase in information content for linkage analysis as a result of the denser scans will help refine previously identified linkage regions and possibly identify new regions not identifiable using the sparser, microsatellite scans. In the context of the dense SNP scans, it is also possible to consider association strategies to provide even more information about potential regions of interest. To circumvent the multiple-testing issues inherent in association analysis, we use a recently developed strategy, implemented in PBAT, which screens the data to identify the optimal SNPs for testing, without biasing the nominal significance level. We compare the results from the PBAT analysis to that of quantitative linkage analysis on chromosome 4 using the Collaborative Study on the Genetics of Alcoholism data, as released through Genetic Analysis Workshop 14.
PMCID: PMC1866683  PMID: 16451712
15.  PBAT: A comprehensive software package for genome-wide association analysis of complex family-based studies 
Human Genomics  2005;2(1):67-69.
The PBAT software package (v2.5) provides a unique set of tools for complex family-based association analysis at a genome-wide level. PBAT can handle nuclear families with missing parental genotypes, extended pedigrees with missing genotypic information, analysis of single nucleotide polymorphisms (SNPs), haplotype analysis, quantitative traits, multivariate/longitudinal data and time to onset phenotypes. The data analysis can be adjusted for covariates and gene/environment interactions. Haplotype-based features include sliding windows and the reconstruction of the haplotypes of the probands. PBAT's screening tools allow the user to handle the multiple comparisons problem successfully at a genome-wide level, even for 100,000 SNPs and more. Moreover, PBAT is computationally fast: a genome scan of 300,000 SNPs in 2,000 trios takes 4 central processing unit (CPU)-days. PBAT is available for Linux, Sun Solaris and Windows XP.
PMCID: PMC3525120  PMID: 15814068
association analysis; extended pedigrees; genome-wide screening; quantitative and qualitative traits; haplotypes

