The X chromosome is generally understudied in association studies, in part because the analyst has had limited methodological options. For nuclear-family-based association studies, most current methods extend the transmission disequilibrium test (TDT) to the X chromosome. We present a new method to study association in case-parent triads: the parent-informed likelihood ratio test for the X chromosome (PIX-LRT). Our method enables estimation of relative risks and takes advantage of parental genotype information and the sex of the affected offspring to increase statistical power to detect an effect. Under a parental exchangeability assumption for the X, if case-parent triads are complete, the parents of affected offspring provide an independent replication sample for estimates based on transmission distortion to their affected offspring. For each offspring sex we combine the parent-level and the offspring-level information to form a likelihood ratio test statistic; we then combine the two to form a combined test statistic. Our method can estimate relative risks under different modes of inheritance or a more general co-dominant model. In triads with missing parental genotypes, the method accounts for missingness with the Expectation-Maximization algorithm. We calculate non-centrality parameters to assess the power gain and robustness of our method compared to alternative methods. We apply PIX-LRT to publicly available data from an international consortium of genotyped families affected by the birth defect oral cleft and find a strong, internally replicated signal for a SNP marker related to cleft lip with or without cleft palate.
X chromosome; family-based design; case-parent triad; oral cleft; association study; SNPs; likelihood ratio test
Neural tube defects (NTDs) are common birth defects (~1 in 1000 pregnancies in the US and Europe) that have complex origins, including environmental and genetic factors. A low level of maternal folate is one well-established risk factor, with maternal periconceptional folic acid supplementation reducing the occurrence of NTD pregnancies by 50-70%. Gene variants in the folate metabolic pathway (e.g., MTHFR rs1801133 (677 C > T) and MTHFD1 rs2236225 (R653Q)) have been found to increase NTD risk. We hypothesized that variants in additional folate/B12 pathway genes contribute to NTD risk.
A tagSNP approach was used to screen common variation in 82 candidate genes selected from the folate/B12 pathway and NTD mouse models. We initially genotyped polymorphisms in 320 Irish triads (NTD cases and their parents), including 301 cases and 341 Irish controls to perform case–control and family based association tests. Significantly associated polymorphisms were genotyped in a secondary set of 250 families that included 229 cases and 658 controls. The combined results for 1441 SNPs were used in a joint analysis to test for case and maternal effects.
Nearly 70 SNPs in 30 genes were found to be associated with NTDs at the p < 0.01 level. The ten strongest association signals (p-value range: 0.0003–0.0023) were found in nine genes (MFTC, CDKN2A, ADA, PEMT, CUBN, GART, DNMT3A, MTHFD1 and T (Brachyury)) and included the known NTD risk factor MTHFD1 R653Q (rs2236225). The single strongest signal was observed in a new candidate, MFTC rs17803441 (OR = 1.61 [1.23-2.08], p = 0.0003 for the minor allele). Though nominally significant, these associations did not remain significant after correction for multiple hypothesis testing.
To our knowledge, with respect to sample size and scope of evaluation of candidate polymorphisms, this is the largest NTD genetic association study reported to date. The scale of the study and the stringency of correction are likely to have contributed to real associations failing to survive correction. We have produced a ranked list of variants with the strongest association signals. Variants in the highest rank of associations are likely to include true associations and should be high priority candidates for further study of NTD risk.
Neural tube defects; Spina bifida; Folic acid; One-carbon metabolism; Candidate gene
In studies of case-parent triads, information is often collected about history of the condition in the parents, but typically parental phenotypes are ignored. Including that information in analyses may increase power to detect genetic association for autosomal variants. Our proposed approach uses parental phenotypes to assess association independently of the usual case-parent-based association test, enabling cross-generational internal replication for findings based on offspring and their parents. Our model for parental phenotypes also resists bias due to population stratification. We combine the information from the two generations into a single coherent model that can exploit approximate equality of parental and offspring relative risks to improve power and can also test that equality. We call the resulting procedure the Parent-phenotype Informed Likelihood Ratio Test (PPI-LRT). When some parental genotypes are missing, one can use the expectation-maximization algorithm to fit the combined model. We also develop a second composite test (PPI-CT) based on a linear combination of the parent-phenotype-based test statistic and that from the traditional log-linear, transmission-based test. We evaluate the proposed methods through non-centrality parameter calculations and simulation studies and compare them to the previously proposed approaches, parenTDT and combTDT. We show that incorporation of parental phenotype data often improves statistical power. As illustration, we apply our method to a study of young-onset breast cancer and find that it improved precision for SNPs in FGFR2 and that estimated relative risks based on triads are closely replicated using the parental data.
case-parent triad; parental phenotype; association study; SNPs; likelihood ratio test
Triad families are routinely used to test association between genetic variants and complex diseases. Triad studies are important and popular since they are robust in terms of being less prone to false positives due to population structure. In practice, one may collect not only complete triads, but also incomplete families such as dyads (affected child with one parent) and singleton monads (affected child without parents). Since there is a lack of convenient algorithms and software to analyze the incomplete data, dyads and monads are usually discarded. This may lead to loss of power and insufficient utilization of genetic information in a study.
We develop likelihood-based statistical models and likelihood ratio tests to test for association between complex diseases and genetic markers by using combinations of full triads, parent-child dyads, and affected singleton monads for a unified analysis. A likelihood is calculated directly to facilitate the data analysis without imputation and to avoid computational complexity. This makes it easy to implement the models and to explain the results.
By simulation studies, we show that the proposed models and tests accurately control type I error rates and achieve good empirical power. The methods are applied to test for association between the transforming growth factor alpha (TGFA) gene and cleft palate in an Irish study.
Association mapping of complex diseases; Likelihood ratio tests; Transmission disequilibrium tests
Women who have low cobalamin (vitamin B12) levels are at increased risk for having children with neural tube defects (NTDs). The transcobalamin II receptor (TCblR) mediates uptake of cobalamin into cells. We evaluated inherited variants in the TCblR gene as NTD risk factors.
Case-control and family-based tests of association were used to screen common variation in TCblR as genetic risk factors for NTDs in a large Irish group. A confirmatory group of NTD triads was used to test positive findings.
We found two tightly linked variants associated with NTDs in a recessive model: TCblR rs2336573 (G220R) (pcorr=0.0080, corrected for multiple hypothesis testing) and TCblR rs9426 (pcorr=0.0279). These variants were also associated with NTDs in a family-based test prior to multiple test correction (log-linear analysis of a recessive model: rs2336573 (G220R) (RR=6.59, p=0.0037) and rs9426 (RR=6.71, p=0.0035)). We describe a copy number variant (CNV) distal to TCblR and two previously unreported exonic insertion-deletion polymorphisms.
TCblR rs2336573 (G220R) and TCblR rs9426 represent a significant risk factor in NTD cases in the Irish population. The homozygous risk genotype was not detected in nearly one thousand controls, indicating this NTD risk factor may be of low frequency and high penetrance. Nine other variants are in perfect LD with the associated SNPs. Additional work is required to identify the disease-causing variant. Our data suggest that variation in TCblR plays a role in NTD risk and that these variants may modulate cobalamin metabolism.
neural tube defects; spina bifida; transcobalamin II receptor (TCblR); cobalamin; vitamin B12; copy number variant (CNV)
The transmission/disequilibrium test was introduced to test for linkage and association between a marker and a putative disease locus using case-parent triads. Several extensions have been proposed to accommodate incomplete triads. Some strategies assumed that parental genotypes were missing completely at random and some methods allowed informative missingness for parental genotypes. However, the above tests assumed that offspring genotypes were missing completely at random and concluded that the transmission/disequilibrium test remained a valid test by excluding incomplete triads from the analysis. In this article, the conditional distribution of ascertained triads allowing informative missingness for offspring genotypes, as well as their parental genotypes, was derived and several tests under such scenarios were evaluated. In simulations, independent triads from the Genetic Analysis Workshop 15 simulated data (Problem 3) were ascertained. When offspring genotypes were missing informatively, simulation results revealed inflated type I error and/or reduced power for the transmission/disequilibrium test excluding incomplete triads.
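As a point of reference for the test discussed above, the following is a minimal sketch of the classical transmission/disequilibrium statistic for complete triads only. It assumes the standard McNemar-type form, (b - c)^2 / (b + c), where b and c count transmissions and non-transmissions of a given allele from heterozygous parents. The function name and inputs are hypothetical, and the sketch does not implement any of the missing-data extensions described in the abstract.

```python
import math


def tdt_statistic(b: int, c: int) -> tuple[float, float]:
    """Classical TDT for complete triads (illustrative sketch).

    b: heterozygous parents who transmitted the allele of interest
    c: heterozygous parents who transmitted the other allele
    Returns the chi-square statistic (1 df) and its p-value.
    """
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    stat, p = tdt_statistic(60, 40)
    print(f"TDT chi-square = {stat:.3f}, p = {p:.4f}")
```

Because only heterozygous parents contribute, discarding incomplete triads changes both counts; whether that is harmless depends on the missingness mechanism, which is the question the abstract addresses.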
The methylenetetrahydrofolate reductase (MTHFR) is thought to be
involved in the development of nonsyndromic cleft lip with or without cleft palate
(NSCL/P). However, conflicting results have been obtained when evaluating the association between maternal MTHFR C677T and A1298C polymorphisms and the risk of
NSCL/P. In light of this gap, a meta-analysis of all eligible case-control studies was
conducted in the present study.
Materials and Methods
A total of 15 case-control studies were ultimately identified
after a comprehensive literature search and Hardy-Weinberg equilibrium (HWE) examination. Cochran’s Q test and the index of heterogeneity (I2) indicated no obvious heterogeneity among studies.
Fixed or random-effects models were used to calculate the pooled odds ratios
(ORs). The results showed that the TT genotype in mothers increased the likelihood of having
NSCL/P offspring 1.25 times (95% CI: 1.047-1.494) more than the CC homozygotes. Meanwhile, maternal TT genotype increased the risk of producing NSCL/P offspring in recessive
model (OR=1.325, 95% CI: 1.124-1.562). However, the CT heterozygote and the CT+TT
dominant models had no association with NSCL/P offspring compared with the CC wild-type
homozygote model. Subgroup analyses based on ethnicity indicated that maternal TT genotype increased the likelihood of having NSCL/P offspring in Whites (OR=1.308, 95% CI:
1.059-1.617) and Asians (OR=1.726, 95% CI: 1.090-2.733) in recessive model. Also, subgroup analyses based on source of control showed that mothers with the 677TT genotype had
a significantly increased susceptibility of having NSCL/P children in hospital-based (HB) populations when compared with CC homozygotes (OR=1.248, 95% CI: 1.024-1.520) and under
the recessive model (OR=1.324, 95% CI: 1.104-1.588). Furthermore, maternal A1298C
polymorphism had no significant association with producing NSCL/P offspring (dominant
model OR=0.952, 95% CI: 0.816-1.111, recessive model OR=0.766, 95% CI: 0.567-1.036).
MTHFR C677T polymorphism is associated with the risk of generating NSCL/P
offspring, and being a 677TT homozygote is a risk factor. MTHFR A1298C polymorphism
was not associated with generating NSCL/P offspring. However, further work should be performed to confirm these findings.
Methylenetetrahydrofolate Reductase; Cleft Lip; Meta-Analysis
Participation in ultra-endurance events may place female athletes worldwide at risk for the female athlete triad (FAT). The study objectives were to establish triad knowledge, occurrence of disordered eating and triad risk amongst participants of the 2014 89-km Comrades Marathon event.
A survey utilising the Low Energy Availability in Females questionnaire (LEAF-Q) and Female Athlete Screening Tool (FAST) questionnaire was conducted on female participants in order to determine the risk. In addition, seven questions pertaining to the triad were asked in order to determine the athlete’s knowledge of the triad. Athletes were requested to complete the anonymous questionnaire after written informed consent was obtained while waiting in the event registration queues. Statistical analyses included Pearson product–moment correlations, chi-square tests and cross-tabulations to evaluate associations of interest.
Knowledge of the triad was poor with 92.5 % of participants having not heard of the triad before and most of those who had, gained their knowledge from school or university. Only three athletes were able to name all 3 components of the triad. Amenorrhoea was the most commonly recalled component while five participants were able to name the component of low bone mineral density. Of the 306 athletes included in the study, 44.1 % were found to be at risk for the female athlete triad. One-third of participants demonstrated disordered eating behaviours with nearly half reporting restrictive eating behaviours. There is a significant association between athletes at risk for the triad according to the LEAF-Q and those with disordered eating (χ2(1) = 8.411, p = 0.014) but no association (or interaction) between triad knowledge and category (at risk/not at risk) of LEAF-Q score (χ2(1) = 0.004, p = 0.949). More athletes in the groups with clinical and sub-clinical eating disorders are at risk for the triad than expected under the null hypothesis for no association.
Only 7.5 % of the female Comrades Marathon runners knew about the triad despite 44.1 % being at a high risk for the triad. Therefore, education and regular screening programmes targeting these athletes are overdue. Postmenopausal athletes are at particularly high risk for large losses in bone mass if they experience chronic energy deficiency and hence require special focus.
Female athlete triad; Ultra-marathon runners; Risk of female athlete triad; Knowledge of female athlete triad
Facial clefts are common birth defects with a strong genetic component. To identify fetal genetic risk factors for clefting, 1536 SNPs in 357 candidate genes were genotyped in two population-based samples from Scandinavia (Norway: 562 case-parent and 592 control-parent triads; Denmark: 235 case-parent triads).
We used two complementary statistical methods, TRIMM and HAPLIN, to look for associations across these two national samples. TRIMM tests for association in each gene by using multi-SNP genotypes from case-parent triads directly without the need to infer haplotypes. HAPLIN on the other hand estimates the full haplotype distribution over a set of SNPs and estimates relative risks associated with each haplotype. For isolated cleft lip with or without cleft palate (I-CL/P), TRIMM and HAPLIN both identified significant associations with IRF6 and ADH1C in both populations, but only HAPLIN found an association with FGF12. For isolated cleft palate (I-CP), TRIMM found associations with ALX3, MKX, and PDGFC in both populations, but only the association with PDGFC was identified by HAPLIN. In addition, HAPLIN identified an association with ETV5 that was not detected by TRIMM.
Strong associations with seven genes were replicated in the Scandinavian samples and our approach effectively replicated the strongest previously known association in clefting—with IRF6. Based on two national cleft cohorts of similar ancestry, two robust statistical methods and a large panel of SNPs in the most promising cleft candidate genes to date, this study identified a previously unknown association with clefting for ADH1C and provides additional candidates and analytic approaches to advance the field.
Hybrid designs arose from an effort to combine the benefits of family-based and population-based study designs. A recently proposed hybrid approach augments case-parent triads with population-based control-parent triads, genotyping everyone except the control offspring. Including parents of controls substantially improves statistical efficiency for testing and estimating both offspring and maternal genetic relative risk parameters relative to using case-parent triads alone. Moreover, it allows testing of required assumptions. Nevertheless, control fathers can be hard to recruit, whereas control offspring and their mothers may be readily available. Consequently, we propose an alternative hybrid design where offspring-mother pairs, instead of parents, serve as population-based controls. We compare the power of our proposed method with several competitors and show that it performs well in various scenarios, though it is slightly less powerful than the hybrid design that uses control parents. We describe approaches for checking whether population stratification will bias inferences that use controls and whether the mating symmetry assumption holds. Surprisingly, if mating symmetry is violated, even though mating-type parameters cannot be directly estimated using control-mother dyads alone, and maternal effects cannot be estimated using case-parent triads alone, combining both sources of data allows estimation of all the parameters. This hybrid design can also be used to study environmental influences on disease risk and gene-by-environment interactions.
genetic relative risk; maternal effect; Single Nucleotide Polymorphism (SNP); association studies; family-based design; population-based design; Poisson regression; early-onset disease
The 677C>T polymorphism of the methylenetetrahydrofolate reductase (MTHFR) gene is considered to have a significant effect on colorectal cancer susceptibility, but the results are inconsistent. To investigate the association between the MTHFR 677C>T polymorphism and the risk of colorectal cancer, a meta-analysis was conducted based on 71 published studies.
Eligible studies were identified by searching the MEDLINE, EMBASE, PubMed, Web of Science, Chinese Biomedical Literature (CBM) and CNKI databases. Odds ratios (ORs) and 95% confidence intervals (CIs) were used to assess the association. The statistical heterogeneity across studies was examined with the χ2-based Q-test. Begg's and Egger's tests were also carried out to evaluate publication bias. Sensitivity and subgroup analyses were also performed in this meta-analysis.
Overall, 71 publications including 31,572 cases and 44,066 controls were identified. The MTHFR 677 C>T variant genotypes were significantly associated with increased risk of colorectal cancer. In the stratified analysis by ethnicity, significantly increased risks were also found among Caucasians for CC vs TT (OR = 1.076; 95%CI = 1.008–1.150; I2 = 52.3%), CT vs TT (OR = 1.102; 95%CI = 1.032–1.177; I2 = 51.4%) and the dominant model (OR = 1.086; 95%CI = 1.021–1.156; I2 = 53.6%); among Asians for CC vs TT (OR = 1.226; 95%CI = 1.116–1.346; I2 = 55.3%), CT vs TT (OR = 1.180; 95%CI = 1.079–1.291; I2 = 36.2%), the recessive model (OR = 1.069; 95%CI = 1.003–1.140; I2 = 30.9%) and the dominant model (OR = 1.198; 95%CI = 1.101–1.303; I2 = 52.4%); and among mixed populations for CT vs TT (OR = 1.142; 95%CI = 1.005–1.296; I2 = 0.0%). However, no associations were found in Africans under any genetic model.
This meta-analysis suggests that the MTHFR 677C>T polymorphism increases the risk of developing colorectal cancer, although the subgroup analysis by ethnicity found no association among Africans.
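The pooling and heterogeneity quantities named above can be illustrated with a short sketch. It assumes inverse-variance fixed-effect pooling of log odds ratios from 2x2 tables, together with Cochran's Q and the I2 index; the abstract does not state which pooling estimator was used, so this is one common choice rather than the authors' procedure. The function name and the example counts are hypothetical.

```python
import math


def pool_fixed_effect(tables):
    """Inverse-variance fixed-effect pooling of odds ratios (illustrative sketch).

    tables: list of (a, b, c, d) = (variant cases, reference cases,
                                    variant controls, reference controls)
    All cells are assumed non-zero; no continuity correction is applied.
    """
    log_ors, weights = [], []
    for a, b, c, d in tables:
        log_ors.append(math.log((a * d) / (b * c)))
        # Woolf variance of the log odds ratio
        weights.append(1.0 / (1 / a + 1 / b + 1 / c + 1 / d))
    w_sum = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
    df = len(tables) - 1
    # I2: share of total variation attributed to between-study heterogeneity
    i2 = 100.0 * (q - df) / q if (df > 0 and q > df) else 0.0
    return {
        "or": math.exp(pooled),
        "ci_low": math.exp(pooled - 1.96 * se),
        "ci_high": math.exp(pooled + 1.96 * se),
        "q": q,
        "i2": i2,
    }


if __name__ == "__main__":
    # Hypothetical counts, not taken from any study in the meta-analysis
    print(pool_fixed_effect([(30, 70, 20, 80), (45, 155, 40, 160), (12, 38, 15, 35)]))
```

When I2 is large, a random-effects model is usually preferred over the fixed-effect estimate shown here.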
The case-parent triad design is commonly used in genetic association studies. Generally, samples are drawn from an affected offspring, manifesting a phenotype of interest, as well as from the parents. The trio genotypes may be analyzed using a variety of available methods, but we focus on log-linear models because they test for genetic association and additionally estimate the relative risks of transmission. The models need to be modified to adjust for missing genotypes. Furthermore, instability in the parameter estimates can arise when certain kinds of genotype combinations do not appear in the dataset.
In this paper, we kill two birds with one stone. We propose a new method to simultaneously account for missing genotype data and genotype combinations with zero counts. This method solves a zero-inflated Poisson (ZIP) regression likelihood. The maximum likelihood estimates yield relative risks and the information matrix gives appropriate variance estimates for inference. A likelihood ratio test determines the significance of genetic association.
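To make the likelihood concrete, the sketch below evaluates a generic zero-inflated Poisson log-likelihood for a vector of cell counts. It assumes the textbook ZIP form, in which a count is a structural zero with probability pi and otherwise Poisson with mean mu; the abstract does not give the authors' exact parameterization for triad genotype cells, so the function name, arguments and example values are all hypothetical.

```python
import math


def zip_log_likelihood(counts, means, pi):
    """Generic zero-inflated Poisson log-likelihood (illustrative sketch).

    counts: observed non-negative integer counts, one per cell
    means:  Poisson means (all > 0), one per cell
    pi:     probability of a structural zero, 0 <= pi < 1
    """
    if not 0.0 <= pi < 1.0:
        raise ValueError("pi must lie in [0, 1)")
    total = 0.0
    for k, mu in zip(counts, means):
        if k == 0:
            # A zero arises either structurally or from the Poisson component
            total += math.log(pi + (1.0 - pi) * math.exp(-mu))
        else:
            # Positive counts can only come from the Poisson component
            total += math.log(1.0 - pi) - mu + k * math.log(mu) - math.lgamma(k + 1)
    return total


if __name__ == "__main__":
    # Hypothetical cell counts and means, for illustration only
    print(zip_log_likelihood([0, 3, 7, 0], [0.5, 2.5, 6.0, 0.2], pi=0.1))
```

Setting pi to 0 recovers the ordinary Poisson log-likelihood, so a likelihood ratio comparison between two fitted models only needs this function evaluated at each set of estimates.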
We compared the ZIP regression to previously proposed methods in both simulation studies and in a dataset that investigates the risk of orofacial clefts. The ZIP likelihood estimates regression coefficients with less bias than other methods when the minor allele frequency is small.
Log-linear models; Case-parent triad design; Missing data
Orofacial clefts are common birth defects with strong evidence for both genetic and environmental causal factors. Candidate-gene studies combined with exposures known to influence the outcome provide a highly targeted approach to detecting GxE interactions. We developed a new statistical approach that combines the case-control and offspring-parent triad designs into a “hybrid design” to search for GxE interactions among 334 autosomal cleft candidate genes and maternal first-trimester exposure to smoking, alcohol, coffee, folic acid supplements, dietary folate, and vitamin A. The study population comprised 425 case-parent triads of isolated clefts and 562 control-parent triads derived from a nationwide study of orofacial clefts in Norway (1996-2001). A full maximum-likelihood model was used in combination with a Wald test statistic to screen for statistically significant GxE interaction between strata of exposed and unexposed mothers. In addition, we performed pathway-based analyses on 28 detoxification genes and 21 genes involved in folic acid metabolism. With the possible exception of the T-box 4 gene (TBX4) and dietary folate interaction in isolated CPO, there was little evidence overall of GxE interaction in our data. This study is the largest to date aimed at detecting interactions between orofacial clefts candidate genes and well-established risk exposures.
Birth defects; orofacial cleft; cleft lip; cleft palate; genetic epidemiology
We conducted a case-parent triad study evaluating the role of maternal and offspring genotypes in the folate metabolic pathway on childhood acute lymphoblastic leukemia (ALL) risk.
Childhood ALL case-parent triads (N = 120) were recruited from Texas Children’s Hospital. DNA samples were genotyped using the Sequenom iPLEX MassARRAY for 68 tagSNPs in six folate metabolic pathway genes (MTHFR, MTRR, MTR, DHFR, BHMT, and TYMS). Log-linear modeling was used to examine the associations between maternal and offspring genotypes and ALL.
After controlling for the false discovery rate (<0.1), there were 20 significant maternal effects in the following genes: BHMT (N = 3), MTR (N = 12), and TYMS (N = 5). For instance, maternal genotypes for BHMT rs558133 (relative risk [RR] = 0.51, 95% confidence interval [CI]: 0.30–0.87, P = 0.008, Q = 0.08) and MTR rs2282369 (RR = 0.46, 95% CI: 0.27–0.80, P = 0.004, Q = 0.08) were associated with ALL. There were no significant offspring effects after controlling for the false discovery rate.
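The Q values reported above are false discovery rate adjusted p-values. As a minimal sketch, assuming the Benjamini-Hochberg step-up procedure (the abstract does not name the FDR method actually used), adjusted values can be computed as follows; the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order
    (illustrative sketch; assumes BH, which the source does not confirm)."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical p-values, not taken from the study
    pvals = [0.004, 0.008, 0.03, 0.20, 0.65]
    for p, q in zip(pvals, benjamini_hochberg(pvals)):
        flag = "significant" if q < 0.1 else "not significant"
        print(f"p = {p:.3f}  q = {q:.3f}  {flag} at FDR 0.1")
```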
This is one of the few studies conducted to evaluate maternal genetic effects in the context of childhood ALL risk. Furthermore, we employed a family-based design that is less susceptible to population stratification bias in the estimation of maternal genetic effects. Our findings suggest that maternal genetic variation in the folate metabolic pathway is relevant in the etiology of childhood ALL. The observed maternal genetic effects support the need for continued research into how the uterine environment may influence risk of ALL.
Acute lymphoblastic leukemia; case-parent triad; folate; genetic epidemiology; pediatric cancer
Preterm delivery (PTD) is a complicated perinatal adverse event. We were interested in the association of the G308A polymorphism in the tumor necrosis factor-α (TNF-α) gene with PTD, so we conducted a genetic epidemiology study in Anqing City, Anhui Province, China. Case families and control families were all collected between July 1999 and June 2002. To limit potential population stratification as far as possible, all eligible subjects were ethnic Han Chinese. In total, 250 case families and 247 control families were included in the data analysis. A hybrid design, which combines case-parent triads and control parents, was employed to test maternal-fetal genotype (MFG) incompatibility. The method is based on a log-linear modeling approach. In summary, we found that when the mother's or child's genotype was G/A, there was a reduced risk of PTD; however, when the mother's or child's genotype was A/A, there was a relatively higher risk of PTD. The combined maternal-fetal genotype GA/GA showed the most reduced risk of PTD. Comparison of the LRTs showed that the model with maternal-fetal genotype effects fit significantly better than the model with only maternal and fetal genotype main effects (log-likelihood = −719.4, P = .023, significant at the 0.05 level). Thus, the combined maternal-fetal genotype incompatibility was significantly associated with PTD. The model with maternal-fetal genotype effects can be considered a gene-gene interaction model. We argue that maternal effects and fetal effects should be considered together when investigating genetic factors of certain perinatal diseases.
We describe a novel graph-based event detection approach which can accurately identify and track dynamic outbreaks (where the affected region changes over time). Our approach enforces soft constraints on temporal consistency, allowing detected regions to grow, shrink, or move while penalizing implausible region dynamics. Using simulated contaminant plumes diffusing through a water distribution system, we demonstrate that our method improves both detection time and spatial-temporal accuracy when tracking dynamic water-borne outbreaks.
Space-time scan statistics are often used to identify emerging spatial clusters of disease cases [1,2]. They operate by maximizing a score function (likelihood ratio statistic) over multiple spatio-temporal regions. The temporal component is typically incorporated by aggregating counts across a given time window, thus assuming that the affected region does not change over time. To relax this hard constraint on spatial-temporal “shape” and increase detection power and accuracy when tracking spreading outbreaks, we implement a new graph-based event detection approach which enables identification of dynamic clusters while enforcing temporal consistency constraints between temporally-adjacent spatial regions.
In the subset scanning framework, temporal consistency constraints may be interpreted as influencing the prior probability p_i^t of location i being included in the optimal spatial subset at time t. We model this prior probability for each location as a function of X_i^{t−1}, which is 1 if location i was included in the previous time step and 0 otherwise, and maximize the penalized log-likelihood ratio over dynamic spatio-temporal regions. Our efficient algorithm incorporates these constraints into the Graph-Scan method by iteratively optimizing the spatial subset for each time slice conditioned on the previous and next slices. Each individual optimization step is made possible by expressing the score function as an additive function (conditioned on the relative risk), which enables the priors to be included while maintaining computational efficiency.
Outbreak plumes were simulated in a water distribution system for 12 one-hour periods. We assumed noisy binary sensors (with 10% false positive and 90% true positive rates) observed hourly at each pipe junction. Our method (“Dynamic”) was compared to the “Static” method, which aggregates counts across time for each spatial region and is therefore constrained to only return temporal cylinders, and the “Independent” method, which separately optimizes the spatial subset for each time slice without taking temporal consistency into account. The methods were evaluated on spatial-temporal overlap (Figure 1), defined as the number of sensors contained in both the detected and affected space-time regions divided by the number of sensors in either the detected or affected space-time regions. A measure of 1 is a perfect match of spatial subsets across each time window and 0 would reflect disjoint space-time regions. Additionally, average time to detect an outbreak (at a fixed false positive rate of 1/month) was 4.24, 4.56, and 6.65 hours for the dynamic, static, and independent methods respectively.
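The spatial-temporal overlap measure defined above is the Jaccard index of two space-time regions. A minimal sketch follows, assuming each region is represented as a set of (sensor, time step) pairs; the function name and the toy regions are hypothetical.

```python
def space_time_overlap(detected, affected):
    """Overlap between detected and affected space-time regions
    (illustrative sketch).

    Each region is a set of (sensor_id, time_step) pairs. The result is
    |intersection| / |union|: 1.0 for a perfect match, 0.0 if disjoint.
    """
    union = detected | affected
    if not union:
        raise ValueError("overlap is undefined when both regions are empty")
    return len(detected & affected) / len(union)


if __name__ == "__main__":
    # Toy example: a plume that moves from sensor 2 toward sensor 3
    affected = {(2, 0), (2, 1), (3, 1)}
    detected = {(1, 0), (2, 0), (2, 1)}
    print(space_time_overlap(detected, affected))
```

Scoring on (sensor, time step) pairs rather than on sensors alone is what penalizes a detector that finds the right locations at the wrong times.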
Relaxing constraints on spatial-temporal region shape must be done carefully. Allowing independent selection of spatial regions loses important temporal information while hard constraints on the spatial-temporal region will fail to capture the dynamics of the outbreak. Our approach for detecting dynamic space-time clusters, while incorporating temporal consistency constraints, addresses these issues and results in higher spatial-temporal accuracy and detection power.
Figure 1. Spatial-temporal overlap for three competing detection methods.
outbreak detection; space-time scan statistics; dynamic event tracking; penalized likelihood ratio
The 677 C>T and 1298 A>C polymorphisms of methylenetetrahydrofolate reductase (MTHFR) gene have been widely reported and considered to have a significant effect on breast cancer risk, but the results are inconsistent. A meta-analysis based on 57 eligible studies was carried out to clarify the role of MTHFR gene polymorphisms in breast cancer.
Methods and Results
Eligible articles were identified by searching databases including PubMed, Web of Science, EMBASE, CNKI and CBM for the period up to August 2012. A total of 57 studies were included in this meta-analysis. Crude ORs with 95% CIs were used to assess the association between the MTHFR polymorphisms and breast cancer risk. Pooled ORs were calculated under the additive, dominant and recessive models. Subgroup analysis was also performed by ethnicity. The statistical heterogeneity across studies was examined with the χ2-based Q-test. The meta-analysis was performed using Stata 12.0 software. Overall, the 677 C allele was significantly associated with breast cancer risk (OR = 0.942, 95%CI = 0.898 to 0.988) when compared with the 677 T allele in the additive model, and the same results were also found under other genetic models. In contrast, the 1298 A allele was not associated with breast cancer susceptibility when compared with the 1298 C allele (OR = 0.993, 95%CI = 0.978 to 1.009). Furthermore, analyses under the dominant, recessive and allele contrast models yielded similar results.
The results of this meta-analysis suggest that 677 C>T polymorphism in the MTHFR gene may contribute to breast cancer development. However, the 1298 A>C polymorphism is not significantly associated with increased risks of breast cancer.
Neural Tube Defects (NTDs) are among the most prevalent and most severe congenital malformations worldwide. Polymorphisms in key genes involving the folate pathway have been reported to be associated with the risk of NTDs. However, the results from these published studies are conflicting. We surveyed the literature (1996–2011) and performed a comprehensive meta-analysis to provide empirical evidence on the association.
Methods and Findings
We investigated the effects of 5 genetic variants from 47 study populations, for a total of 85 case-control comparisons: MTHFR C677T (42 studies; 4374 cases, 7232 controls), MTHFR A1298C (22 studies; 2602 cases, 4070 controls), MTR A2756G (9 studies; 843 cases, 1006 controls), MTRR A66G (8 studies; 703 cases, 1572 controls), and RFC-1 A80G (4 studies; 1107 cases, 1585 controls). We found convincing evidence of dominant effects of MTHFR C677T (OR 1.23; 95%CI 1.07–1.42) and suggestive evidence for RFC-1 A80G (OR 1.55; 95%CI 1.24–1.92). However, we found no significant effects of MTHFR A1298C, MTR A2756G, or MTRR A66G on the risk of NTDs in dominant, recessive or allelic models.
Our meta-analysis strongly suggested a significant association of the variant MTHFR C677T and a suggestive association of RFC-1 A80G with increased risk of NTDs. However, other variants involved in the folate pathway do not show any evidence of a significant marginal association with susceptibility to NTDs.
Childhood acute lymphoblastic leukemia (ALL) is a condition that arises from complex etiologies. The absence of consistent environmental risk factors and the presence of modest familial associations suggest ALL is a complex trait with an underlying genetic component. The identification of genetic factors associated with disease is complicated by complex genetic covariance structures and multiple testing issues. Both issues can be resolved with appropriate Bayesian variable selection methods. The present study was undertaken to extend our hierarchical Bayesian model for case-parent triads to incorporate single nucleotide polymorphisms (SNPs) and incorporate the biological grouping of SNPs within genes. Based on previous evidence that genetic variation in the folate metabolic pathway influences ALL risk, we evaluated 128 tagging SNPs in 16 folate metabolic genes among 118 ALL case-parent triads recruited from the Texas Children’s Cancer Center (Houston, TX) between 2003 and 2010. We used stochastic search gene suggestion (SSGS) in hierarchical Bayesian models to evaluate the association between folate metabolic SNPs and ALL. Using Bayes factors among these variants in childhood ALL case-parent triads, two SNPs were identified with a Bayes factor greater than 1. There was evidence that the minor alleles of NOS3 rs3918186 (OR = 2.16; 95% CI: 1.51-3.15) and SLC19A1 rs1051266 (OR = 2.07; 95% CI: 1.25-3.46) were positively associated with childhood ALL. Our findings are suggestive of the role of inherited genetic variation in the folate metabolic pathway on childhood ALL risk, and they also suggest the utility of Bayesian variable selection methods in the context of case-parent triads for evaluating the role of SNPs on disease risk.
Gastric cancer is ranked as the most common cancer in Koreans. A recent molecular biological study of folate pathway genes revealed correlations with a couple of cancer types. Several genes are involved in the folate pathway, including methylenetetrahydrofolate reductase (MTHFR), methyltetrahydrofolate-homocysteine methyltransferase reductase (MTRR), and methyltetrahydrofolate-homocysteine methyltransferase (MTR). The MTHFR gene has been reported several times to be correlated with gastric cancer risk. However, an association of the MTRR or MTR gene has not been reported to date. In this study, we investigated the association between single nucleotide polymorphisms (SNPs) of the MTHFR, MTRR, and MTR genes and the risk of gastric cancer in Koreans. To identify genetic associations with gastric cancer, we selected 17 SNP sites in the folate pathway-associated genes MTHFR, MTR, and MTRR and tested them in 1,261 gastric cancer patients and 375 healthy controls. By genotype analysis, estimating odds ratios and 95% confidence intervals (CI), rs1801394 in the MTRR gene showed increased risk for gastric cancer, with statistical significance both in the codominant model (odds ratio [OR], 1.39; 95% CI, 1.04 to 1.85) and dominant model (OR, 1.34; 95% CI, 1.02 to 1.75). In particular, in the obese group (body mass index ≥ 25 kg/m2), the codominant (OR, 9.08; 95% CI, 1.01 to 94.59) and recessive models (OR, 3.72; 95% CI, 0.92 to 16.59) showed dramatically increased risk (p < 0.05). In conclusion, rs1801394 in the MTRR gene is associated with gastric cancer risk, and its functional significance needs to be validated.
5-methyltetrahydrofolate-homocysteine S-methyltransferase; folate pathway; genetic polymorphism; methionine synthase reductase; methylenetetrahydrofolate reductase (NADPH2); stomach neoplasms
Genotype-based likelihood ratio tests (LRT) of association that examine maternal and parent-of-origin effects have been previously developed in the framework of log-linear and conditional logistic regression models. In the situation where parental genotypes are missing, the expectation maximization (EM) algorithm has been incorporated in the log-linear approach to allow incomplete triads to contribute to the likelihood ratio test. We present an extension to this model which we call the Combined_LRT that incorporates additional information from the genotypes of unaffected siblings to improve assignment of incompletely typed families to mating type categories, thereby improving inference of missing parental data. Using simulations involving a realistic array of family structures, we demonstrate the validity of the Combined_LRT under the null hypothesis of no association and provide power comparisons under varying levels of missing data and using sibling genotype data. We demonstrate the improved power of the Combined_LRT compared with the family-based association test (FBAT), another widely used association test. Lastly, we apply the Combined_LRT to a candidate gene analysis in Autism families, some of which have missing parental genotypes. We conclude that the proposed log-linear model will be an important tool for future candidate gene studies, for many complex diseases where unaffected siblings can often be ascertained and where epigenetic factors such as imprinting may play a role in disease etiology.
family-based association; candidate gene tests; imprinting; parent-of-origin; maternal effects
Genomic imprinting and maternal effects have been increasingly explored for their contributions to complex diseases. Statistical methods have been proposed to detect both imprinting and maternal effects simultaneously based on nuclear families. However, these methods only make use of case-parents triads and possibly control-parents triads, thus wasting valuable information contained in the siblings. More seriously, most existing methods are full-likelihood based and have to make strong assumptions concerning mating-type probabilities (nuisance parameters) to avoid over-parametrization. In this paper, we develop a partial Likelihood approach for detecting Imprinting and Maternal Effects (LIME), using nuclear families with an arbitrary number of affected and unaffected children. By matching affected children with unaffected ones (within or across families) having the same triad/pair familial genotype combination, we derive a partial likelihood that is free of nuisance parameters. This alleviates the need to make strong, yet unrealistic assumptions about the population, leading to a procedure that is robust to departure from Hardy–Weinberg equilibrium. Power gain by including siblings and robustness of LIME under a variety of settings are demonstrated. Our simulation study also indicates that it is more profitable to recruit additional siblings than additional families when the total number of individuals is kept the same. We applied LIME to the Framingham Heart Study data to demonstrate its utility in analyzing real data. Many of our findings are consistent with results in the literature; potentially novel genes for hypertension have also emerged.
association study; imprinting effect; maternal effect; nuclear families with all children; partial likelihood
Meta-analyses of genome-wide association studies are often based on imputed single nucleotide polymorphism (SNP) data, because component studies were genotyped using different platforms. One would like to include case-parent triad studies along with case-control studies in such meta-analyses. However, there are no published methods for estimating relative risks from imputed data for case-parent triad studies. The authors propose a method for estimating the relative risk for a variant SNP allele based on a log-additive model. Their simulations first confirm that the proposed method performs well with genotyped SNP data. As an empirical test of the method's behavior with imputed SNPs, the authors then apply it to chromosome 22 data from the Mexico City Childhood Asthma Study (1998–2003). For chromosome 22, the authors had data on 7,293 SNPs that were both genotyped and imputed using the software MACH, which relies on linkage disequilibrium with nearby SNPs. Correlation between estimated relative risks based on the actual genotypes and those based on the imputed genotypes was remarkably high (r² = 0.95), validating this method of relative risk estimation for the case-parent study design. This method should be useful to investigators who wish to conduct meta-analyses using imputed SNP data from both case-parent triad and case-control studies.
epidemiologic methods; genome-wide association study; genotype; imputation; meta-analysis; risk
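To make the log-additive relative risk in the abstract above concrete, here is a minimal sketch for hard-called (not imputed) genotypes. It relies on a standard result: under a log-additive model, a heterozygous parent transmits the variant allele to an affected child with probability R/(1 + R), so the ratio of transmitted to untransmitted variant alleles estimates R. This is not the authors' estimator for imputed data, which the abstract does not specify. The function name `triad_relative_risk` and the example triads are hypothetical.

```python
import math


def triad_relative_risk(triads, z=1.96):
    """Estimate a per-allele relative risk from case-parent triads.

    triads: iterable of (mother, father, child) genotypes, each coded as
    the number of copies of the variant allele (0, 1 or 2).
    Returns (rr, (ci_low, ci_high)).
    """
    transmitted = untransmitted = 0
    for mother, father, child in triads:
        if any(g not in (0, 1, 2) for g in (mother, father, child)):
            raise ValueError(f"invalid genotype in {(mother, father, child)}")
        n_het = sum(1 for g in (mother, father) if g == 1)
        n_hom_variant = sum(1 for g in (mother, father) if g == 2)
        # Homozygous-variant parents always pass on one variant allele, so
        # any remaining variant alleles came from heterozygous parents.
        from_het = child - n_hom_variant
        if not 0 <= from_het <= n_het:
            raise ValueError(f"Mendelian inconsistency in {(mother, father, child)}")
        transmitted += from_het
        untransmitted += n_het - from_het

    if transmitted == 0 or untransmitted == 0:
        raise ValueError("need at least one transmission and one non-transmission")

    rr = transmitted / untransmitted
    # Delta-method standard error of log(T / U).
    se_log = math.sqrt(1 / transmitted + 1 / untransmitted)
    return rr, (rr * math.exp(-z * se_log), rr * math.exp(z * se_log))


# Made-up triads: 30 transmissions and 20 non-transmissions from
# heterozygous mothers (illustrative only).
example_triads = [(1, 0, 1)] * 30 + [(1, 0, 0)] * 20
rr, ci = triad_relative_risk(example_triads)
print(rr, ci)
```

Only heterozygous parents contribute to the estimate; homozygous parents are uninformative because their transmitted allele is fixed.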
Genome-wide association studies are revolutionizing the search for the genes underlying human complex diseases. The main decisions to be made at the design stage of these studies are the choice of the commercial genotyping chip to be used and the numbers of case and control samples to be genotyped. The most common method of comparing different chips is using a measure of coverage, but this fails to properly account for the effects of sample size, the genetic model of the disease, and linkage disequilibrium between SNPs. In this paper, we argue that the statistical power to detect a causative variant should be the major criterion in study design. Because of the complicated pattern of linkage disequilibrium (LD) in the human genome, power cannot be calculated analytically and must instead be assessed by simulation. We describe in detail a method of simulating case-control samples at a set of linked SNPs that replicates the patterns of LD in human populations, and we used it to assess power for a comprehensive set of available genotyping chips. Our results allow us to compare the performance of the chips to detect variants with different effect sizes and allele frequencies, look at how power changes with sample size in different populations or when using multi-marker tags and genotype imputation approaches, and how performance compares to a hypothetical chip that contains every SNP in HapMap. A main conclusion of this study is that marked differences in genome coverage may not translate into appreciable differences in power and that, when taking budgetary considerations into account, the most powerful design may not always correspond to the chip with the highest coverage. We also show that genotype imputation can be used to boost the power of many chips up to the level obtained from a hypothetical “complete” chip containing all the SNPs in HapMap. 
Our results have been encapsulated into an R software package that allows users to design future association studies and our methods provide a framework with which new chip sets can be evaluated.
Genome-wide association studies are a powerful and now widely-used method for finding genetic variants that increase the risk of developing particular diseases. These studies are complex and must be planned carefully in order to maximize the probability of finding novel associations. The main design choices to be made relate to sample sizes and choice of commercially available genotyping chip and are often constrained by cost, which can currently be as much as several million dollars. No comprehensive comparisons of chips based on their power for different sample sizes or for fixed study cost are currently available. We describe in detail a method for simulating large genome-wide association samples that accounts for the complex correlations between SNPs due to LD, and we used this method to assess the power of current genotyping chips. Our results highlight the differences between the chips under a range of plausible scenarios, and we demonstrate how our results can be used to design a study with a budget constraint. We also show how genotype imputation can be used to boost the power of each chip and that this method decreases the differences between the chips. Our simulation method and software for comparing power are being made available so that future association studies can be designed in a principled fashion.
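The idea of assessing power by simulation can be shown with a deliberately simplified sketch. It simulates a single causal SNP under a multiplicative model and applies a 1-df allelic chi-square test. It ignores linkage disequilibrium and chip content entirely, so it is not the authors' method or their R package. It also assumes a rare disease, so that controls carry the population allele frequency. All names and parameter values are hypothetical.

```python
import random
from statistics import NormalDist


def allelic_chi2(a, b, c, d):
    """Pearson chi-square (1 df) for a 2x2 table of allele counts."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def simulate_power(n_cases, n_controls, maf, rr, alpha=0.05, reps=200, seed=1):
    """Monte Carlo power of the allelic test for one directly typed SNP."""
    rng = random.Random(seed)
    # The 1-df chi-square critical value is the squared two-sided normal quantile.
    crit = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    # Risk-allele frequency among cases under a multiplicative model.
    case_freq = rr * maf / (rr * maf + (1 - maf))
    hits = 0
    for _ in range(reps):
        # Draw 2n alleles per group as independent Bernoulli trials.
        a = sum(rng.random() < case_freq for _ in range(2 * n_cases))
        c = sum(rng.random() < maf for _ in range(2 * n_controls))
        if allelic_chi2(a, 2 * n_cases - a, c, 2 * n_controls - c) > crit:
            hits += 1
    return hits / reps


print(simulate_power(500, 500, maf=0.3, rr=1.5))
```

For a genome-wide setting one would pass a much smaller `alpha` and more replicates; modelling LD between typed and causal SNPs requires haplotype-level simulation, which this sketch does not attempt.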
The asymptotic distribution of the multivariate variance component linkage analysis likelihood ratio test has provoked some contradictory accounts in the literature. In this paper we confirm that some previous results are not correct by deriving the asymptotic distribution in one special case. It is shown that this special case is a good approximation to the distribution in many situations. We also introduce a new approach to simulating from the asymptotic distribution of the likelihood ratio test statistic in constrained testing problems. It is shown that this method is very efficient for small p-values, and is applicable even when the constraints are not convex. The method is related to a multivariate integration problem. We illustrate how the approach can be applied to multivariate linkage analysis in a simulation study. Some more philosophical issues relating to one-sided tests in variance components linkage analysis are discussed.
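As a point of reference for the constrained testing problem described above, the simplest textbook case can be sketched: when a single parameter is tested on the boundary of its space, the likelihood ratio statistic is asymptotically a 50:50 mixture of a point mass at zero and a chi-square with 1 df. The multivariate distribution discussed in the abstract is more involved, and the naive Monte Carlo check below is not the authors' simulation approach. Function names are hypothetical.

```python
import math
import random


def chi2_1_sf(x):
    """Upper tail probability of a chi-square with 1 df, for x >= 0."""
    return math.erfc(math.sqrt(x / 2.0))


def boundary_lrt_pvalue(x):
    """P-value under the 0.5*chi2_0 + 0.5*chi2_1 mixture."""
    if x <= 0:
        return 1.0
    return 0.5 * chi2_1_sf(x)


def monte_carlo_pvalue(x, draws=20000, seed=0):
    """Naive check for x > 0: the limiting statistic is max(0, Z)**2."""
    rng = random.Random(seed)
    exceed = sum(max(0.0, rng.gauss(0.0, 1.0)) ** 2 > x for _ in range(draws))
    return exceed / draws


observed = 2.705543  # illustrative statistic, not from any study
print(boundary_lrt_pvalue(observed), monte_carlo_pvalue(observed))
```

Naive sampling of this kind becomes inefficient for very small p-values, which is the regime the abstract says the authors' method is designed for.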