Neural tube defects (NTDs) are common birth defects (~1 in 1000 pregnancies in the US and Europe) that have complex origins, including environmental and genetic factors. A low level of maternal folate is one well-established risk factor, with maternal periconceptional folic acid supplementation reducing the occurrence of NTD pregnancies by 50-70%. Gene variants in the folate metabolic pathway (e.g., MTHFR rs1801133 (677C>T) and MTHFD1 rs2236225 (R653Q)) have been found to increase NTD risk. We hypothesized that variants in additional folate/B12 pathway genes contribute to NTD risk.
A tagSNP approach was used to screen common variation in 82 candidate genes selected from the folate/B12 pathway and NTD mouse models. We initially genotyped polymorphisms in 320 Irish triads (NTD cases and their parents), including 301 cases and 341 Irish controls to perform case–control and family based association tests. Significantly associated polymorphisms were genotyped in a secondary set of 250 families that included 229 cases and 658 controls. The combined results for 1441 SNPs were used in a joint analysis to test for case and maternal effects.
Nearly 70 SNPs in 30 genes were found to be associated with NTDs at the p<0.01 level. The ten strongest association signals (p-value range: 0.0003–0.0023) were found in nine genes (MFTC, CDKN2A, ADA, PEMT, CUBN, GART, DNMT3A, MTHFD1 and T (Brachyury)) and included the known NTD risk factor MTHFD1 R653Q (rs2236225). The single strongest signal was observed in a new candidate, MFTC rs17803441 (OR=1.61 [1.23-2.08], p=0.0003 for the minor allele). Though nominally significant, these associations did not remain significant after correction for multiple hypothesis testing.
To our knowledge, with respect to sample size and scope of evaluation of candidate polymorphisms, this is the largest NTD genetic association study reported to date. The scale of the study and the stringency of correction are likely to have contributed to real associations failing to survive correction. We have produced a ranked list of variants with the strongest association signals. Variants in the highest rank of associations are likely to include true associations and should be high priority candidates for further study of NTD risk.
Neural tube defects (NTDs) are one of the most common birth defects, with a historical prevalence of ~1 in 1000 in the US [1,2]. The NTD rate is now closer to ~5 in 10,000 in areas with folic acid fortification, such as the US and many European countries. Between 21 and 28 days after conception, the neural plate folds and closes to form the neural tube; this structure later develops into the brain and spinal cord. Failure of the neural tube to close most commonly leads to spina bifida or anencephaly, although encephalocele, craniorachischisis and iniencephaly can also occur.
It is known that both environmental and genetic factors contribute to the development of NTDs. The most established environmental factor is dietary folate; significantly lower levels of folate are observed in mothers with an NTD pregnancy, and periconceptional folate supplementation can reduce the risk of an NTD pregnancy by up to 75%. There is also growing evidence of the importance of cobalamin (vitamin B12) in the etiology of NTDs. Like folate, lower vitamin B12 levels have been reported in mothers with an NTD pregnancy [6,10-17].
Genetic factors also contribute to NTDs. Compared to the general population, there is a 10–20 fold higher recurrence risk to siblings in families with an NTD child [18-20]. This, combined with the recognition of the importance of maternal folate, has led many groups to evaluate genetic polymorphisms related to the folate metabolic pathway as risk factors for NTDs. The best studied genetic risk factor is a single nucleotide polymorphism (SNP) in 5,10-methylenetetrahydrofolate reductase (MTHFR). The 677C>T polymorphism results in the substitution of a valine for an alanine at codon 222 (A222V), leading to a thermolabile isoform of the protein. A significantly higher frequency of the MTHFR 677 TT genotype has been observed in NTD cases in many populations. Genetic variants associated with NTDs have been reported in other genes encoding folate- and vitamin B12-related proteins, such as methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 1, methenyltetrahydrofolate cyclohydrolase, formyltetrahydrofolate synthetase (MTHFD1) [23-26], methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 1-like (MTHFD1L), dihydrofolate reductase (DHFR) [28,29], methionine synthase reductase (MTRR) [16,26,30-33], and the transcobalamin II receptor (TCblR).
As genotyping technology has advanced, the scale of studies attempting to identify genetic NTD risk factors has grown from single SNP analyses to simultaneous evaluation of dozens of variants. Several studies have evaluated specific candidate polymorphisms with evidence of functional changes and/or disease risk (87 variants in 45 genes; 48 SNPs in 11 genes; 64 SNPs in 34 genes). In contrast, other studies have examined all common variation in candidate genes via tagging SNPs (tagSNPs) in specific genes of interest (118 tagSNPs in 14 folate-related genes; 37 tagSNPs in 6 transcriptional activator genes). In the current study, we also took the tagSNP approach to evaluate 1441 SNPs in 82 candidate genes for NTD risk.
Common genetic variation in 82 candidate genes (Figure 1) was tested for association with NTDs. Results were generated in two stages. In the first stage, four broad tests of association were performed on all SNPs using a subset of samples. In the second stage, SNPs of interest identified in the initial analysis were then typed in the complete cohort to maximize the power to detect an effect, and a wider range of genetic models was applied to the combined dataset to evaluate the potential contribution of all SNPs to case or maternal risk of NTDs.
The primary sample set (320 NTD case families and 341 controls) was genotyped for 1517 tagging SNPs intended to capture common genetic variation in 82 candidate genes related to folate/vitamin B12 metabolism, transport of folate or vitamin B12, or transcriptional or developmental processes implicated in NTD mouse models (Table 1). Genotype data were successfully obtained for 1320 SNPs. Four tests of association were performed; two tests were to detect NTD case risk and two tests were to detect maternal risk for an NTD pregnancy. There were 203 SNPs in 54 genes that were significant (p<0.05) by at least one test of association. A gene-based approach was used to select SNPs from fifteen genes to be genotyped in the secondary sample set. Five genes (mitochondrial folate transporter/carrier (MFTC), megalin (LRP2, low density lipoprotein receptor-related protein 2), DNA (cytosine-5-)-methyltransferase 3 beta (DNMT3B), phosphatidylethanolamine N-methyltransferase (PEMT), and euchromatic histone-lysine N-methyltransferase (EHMT1)) contained at least one SNP that was positive for both tests of a case effect or both tests of a maternal effect. An additional four genes (5-methyltetrahydrofolate-homocysteine methyltransferase 1 (MTR), cubilin (CUBN, intrinsic factor-cobalamin receptor), T (Brachyury homolog (mouse)), and AT rich interactive domain 1A (ARID1A)) contained more than five SNPs significant for any of the four tests of association. The remaining six genes (adenosine deaminase (ADA), ferritin, heavy polypeptide 1 (FTH1), cystathionase (CTH), peptidyl arginine deiminase, type IV (PADI4), low density lipoprotein receptor-related protein 6 (LRP6), and serine hydroxymethyltransferase 1 (SHMT1)) were selected based on a combination of factors, including the number of positive SNPs, their level of significance, and biological plausibility.
Any SNP in these genes significant by any of the four tests of association (Table 2) was selected for genotyping in the secondary sample set (250 additional case families and 658 additional controls).
In the combined analyses, each SNP was evaluated for a contribution to NTD risk by twelve association tests: case and maternal effects were each evaluated using three case–control (or mother-control) tests and three family-based tests (see Methods). There were 68 SNPs in 30 genes that showed an association at the p<0.01 level by any of the twelve tests (Table 3). Of these, twelve genes contained a single associated SNP. The remaining 56 SNPs were found in 18 genes. Not all associations were independent. Areas of interest were covered by tagSNPs as well as additional SNPs for physical coverage; as a result there were seven SNP pairs in this set with a linkage disequilibrium (LD) relationship above the threshold (r2≥0.8) selected for tagging in this study (Figure 2; ARID1A_rs11247593 and ARID1A_rs11247594, CUBN_rs7070148 and CUBN_rs2273737, GART_rs2070388 and GART_rs4817580, MFTC_rs17803441 and MFTC_rs3134260, MTHFR_rs17367504 and MTHFR_rs17037425, MTR_rs10733117 and MTR_rs10925260, and PEMT_rs4646402 and PEMT_rs1108579). Notably, for many genes, the associated SNPs occur in the same haplotype block (solid spine of LD based on D’ relationships), implying a single association signal for that gene (Figure 2; GART, COMT, MFTC, MTHFR, ARID1A, MTR, FOLH1, MTHFD1, PEMT, RAI1, ENOSF1). A single signal probably also accounts for the significant SNP pairs seen in PDGFRA (D’=0.65) and CDKN2A (D’=0.79). Genes exhibiting more than one independent SNP association include FTCD (D’=0.10), MTHFD1L (two separate haplotype blocks and two SNPs, D’≤0.57), CUBN (one haplotype block and a SNP, D’≤0.20), ALDH1A2 (D’=0.13) and ADA (a weak haplotype block and a SNP, D’≤0.19).
We ranked SNPs by the lowest p-value for any test, accounting for relatedness to other highly significant SNPs. The 10 strongest association signals fell in nine genes; in rank order, the signals were in MFTC, CDKN2A, ADA, PEMT, CUBN, GART, ADA (a second, independent signal), DNMT3A, MTHFD1 and T (Brachyury) (Table 4). MFTC, ADA, PEMT and CUBN contained more than one significant SNP, and ADA showed evidence of two independent association signals.
SNPs in MFTC, PEMT and ADA account for seven of the top ten SNPs (Table 4). In MFTC, the two SNPs (rs17803441 and rs3134260) are essentially in perfect LD (D’=1.0, r2=0.99). As expected, they yielded very similar evidence of NTD risk in a continuous model of logistic regression (rs17803441: OR=1.61 [1.23-2.08], p=0.0003, Risk Allele Frequency (RAF)=0.07; rs3134260: OR=1.56 [1.22-2.04], p=0.0006, RAF=0.07), as well as a recessive model of logistic regression (rs17803441: OR=1.59 [1.19-2.08], p=0.0013; rs3134260: OR=1.54 [1.18-2.04], p=0.0021). There are a total of five highly significant (p<0.01) MFTC SNPs and they are all consistent with a case effect. MFTC rs10112450 is significant by the transmission disequilibrium test (TDT, GRR=1.42, p=0.0065, RAF=0.79), and the remaining MFTC SNPs (rs1865855 and rs750606) show evidence of NTD risk to the case by logistic regression with a continuous or dominant model (p<0.0091). Because all five of these SNPs fall in the same haplotype block (D’>0.96), it is likely there is a single variant in this region responsible for the association signals.
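As a rough consistency check on how the odds ratios, confidence intervals and p-values reported above relate to one another, a Wald z-statistic can be recovered from a published OR and its 95% CI. The sketch below is illustrative only: it assumes the interval is a symmetric Wald interval on the log-odds scale, and the function name `wald_from_ci` is hypothetical. Because the published values are rounded and the fitted model may differ from this approximation, only the order of magnitude should be compared.

```python
import math

def wald_from_ci(or_value, ci_low, ci_high, z_crit=1.96):
    """Approximate SE, z and two-sided p from an OR and its 95% CI.

    Assumes a symmetric Wald interval on the log-odds scale
    (an assumption of this sketch, not a statement about the fitted model).
    """
    log_or = math.log(or_value)
    # Width of the CI on the log scale is 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    # Two-sided normal tail probability.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

# Values reported for MFTC rs17803441 (continuous model).
se, z, p = wald_from_ci(1.61, 1.23, 2.08)
print(f"SE={se:.3f}, z={z:.2f}, p={p:.5f}")
```

For the MFTC rs17803441 values this yields a p-value of a few times 10^-4, the same order of magnitude as the reported p=0.0003.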
Three SNPs in PEMT were in the top ten SNPs, and two of these are in high linkage disequilibrium (rs1108579 and rs4646402; D’=1.0, r2=0.809). These SNPs also yield very similar evidence of NTD risk by TDT (rs1108579: GRR=1.47, p=0.0006, RAF=0.52; rs4646402: GRR=1.43, p=0.0009, RAF=0.57). PEMT rs11656215 is less strongly linked to these SNPs (r2<0.6), and shows association with NTD risk by TDT (GRR=1.35, p=0.0053, RAF=0.49) and by log-linear analysis of a recessive model (GRR=1.68 [1.24-2.28], p=0.0008). A fourth highly significant SNP was found in the maternal analysis (rs16961845). It shows evidence of maternal risk in a recessive model of log-linear analysis (GRR=1.92 [1.18-3.11], p=0.0082), as well as recessive (p<0.0048) and continuous (p<0.0072) models of logistic regression. By r2 measures of LD, this SNP is the least related to the other three SNPs (r2<0.08), although all four of these SNPs fall in the same haplotype block (Figure 2).
Lastly, three SNPs in ADA were among the ten SNPs with the lowest p values. They show evidence of LD by D’ (≥0.82) but less so by r2 (≤0.54). These three SNPs are all significantly associated with a maternal effect by logistic regression of a continuous model (rs2299686: OR=1.30 [1.11-1.52], p=0.0010, RAF=0.45; rs427483: OR=1.28 [1.10-1.52], p=0.0018, RAF=0.33; rs406383: OR=1.33 [1.12-1.58], p=0.0012, RAF=0.25). Three other SNPs in ADA (rs6031682, rs452159 and rs6094017) also showed a highly significant association with a maternal effect in a continuous model of logistic regression (p<0.0059). All six SNPs were highly significant (p<0.01) for association with maternal risk in either a recessive model (rs406383, p=0.0021) or dominant model (the other five SNPs, p≤0.0063) of logistic regression. Additionally, ADA rs6031682 shows association with case risk in a continuous model of logistic regression (1.38 [1.10-1.73], p=0.0052, RAF=0.84). These six SNPs do not appear to be strongly linked; no haplotype blocks (solid spine of LD) larger than two SNPs were identified. It appears that rs6031682 is clearly outside of a degraded haplotype block consisting of the other five SNPs.
CUBN was the only other gene with more than one SNP found in the ten most strongly associated signals. CUBN rs7070148 and CUBN rs2273737 were both significant for association with maternal risk in a continuous model of logistic regression (OR=1.64 [1.22-2.17], p=0.0010, RAF=0.90 and OR=1.54 [1.18-2.04], p=0.0021, RAF=0.89, respectively). These SNPs are highly linked (D’=0.935, r2=0.823). There are three other highly significant (p<0.01) SNPs in CUBN. Two other SNPs in the same haplotype block with CUBN rs7070148 and CUBN rs2273737 (rs1801222 and rs11254375, D’<0.76) also showed an association with maternal effects. A third SNP, CUBN rs11591606, is outside this block (D’<0.20) and is associated with maternal risk in a dominant model of logistic regression (OR=4.15 [1.51-11.39], p=0.0058, RAF=0.17).
Correction for multiple testing was performed for three of the twelve tests of association. No adjusted p-value was found to remain significant, and no further correction was performed.
This study represents a new scale of evaluation of genetic contribution to NTD risk. Common variants in 82 biologically plausible candidate genes were tested for association with NTDs in a large Irish population. Seventeen variants in nine genes account for the ten most significant associations observed. CDKN2A, GART, DNMT3A, MTHFD1 and T (Brachyury) contained a single SNP among the ten lowest p-values observed for all tests. In contrast, MFTC, ADA, PEMT, and CUBN each contained more than one such SNP. This seems to be due to strong LD relationships between the associated SNPs. The only exception is in ADA, which shows evidence of two strong, unrelated association signals.
ADA (adenosine deaminase) converts adenosine to inosine by removal of an amino group. Deficiency in this enzyme causes severe combined immunodeficiency disease (SCID), which is characterized by compromise of both T cells and B cells. Interestingly, ADA activity was significantly elevated in a study of 68 pregnant women carrying a fetus with a central nervous system malformation; of these women, 17 had a spina bifida pregnancy. Consistent with this, six unrelated (r2<0.70), noncoding ADA SNPs were found in the current study to be associated with maternal risk of carrying an NTD pregnancy (p≤0.006, uncorrected). Genetic variation in ADA may contribute to maternal risk of NTDs. In addition, this gene was the only one to exhibit two independent association signals among the top ten signals observed. This may indicate that there is more than one allele associated with risk or the same allele has recurred on more than one haplotype. ADA rs6031682 shows evidence of case effects (p=0.0016) as well as maternal effects (p=0.0019) in log-linear analyses of a dominant model, and it is clearly independent of the other significant ADA SNPs (D’<0.19). It would be of interest to test the associated ADA SNPs in an independent study, especially since the scale of correction would be much smaller in a focused study.
PEMT (phosphatidylethanolamine N-methyltransferase) plays a role in choline metabolism. It converts phosphatidylethanolamine to phosphatidylcholine in the liver; phosphatidylcholine is a major component of cell membranes. This role for choline can compete with its role as a methyl donor. Choline can be converted to betaine, which acts as a methyl donor in an alternate, folate-independent conversion of homocysteine to methionine. This link between folate and choline metabolism makes PEMT an interesting candidate gene, and interactions between PEMT SNPs have been reported to be associated with NTDs. In a case–control study, single SNP effects were not observed, although some compound genotypes for PEMT rs7946 and PEMT rs897453 were associated with decreased NTD risk. The latter variant was not directly tested in the current study, and no association for PEMT rs7946 was observed in this Irish sample. However, one related SNP pair (r2=0.80) and two other SNPs (r2<0.60) in PEMT falling in the same haplotype block (D’≥0.69) showed NTD association (p<0.0053, uncorrected) in the current study, suggesting a role for this gene in NTD risk. Unlike the other three SNPs that exhibited case effects, the least related SNP in this block (PEMT rs16961845, 0.69≤D’≤0.89) was positive for maternal effects by three tests of association. This intronic SNP is in strong r2 LD with 6 other intronic SNPs, making it difficult to speculate about its function. It is also difficult to discern whether the associated SNPs in this block represent independent signals for case risk and for maternal risk, or whether a single signal for a case effect is being detected. Therefore, further studies on the variation of this gene and NTDs are warranted.
MFTC (mitochondrial folate transporter/carrier, SLC25A32) transports folate from the cytoplasm into the mitochondria. Some folate metabolic reactions occur in both the cytoplasm and in mitochondria via compartment-specific enzymes. The mitochondrion produces the majority of the one carbon units used by the cell. As the genes coding for these mitochondrial enzymes have been identified, they have been shown to be intriguing and relevant candidates for NTD studies. For example, we previously reported that the gene encoding 5,10-methylenetetrahydrofolate dehydrogenase 1-like (MTHFD1L) contains a polymorphism associated with NTDs. Genetic variation affecting mitochondrial folate transport may also contribute to NTDs, as seen by our finding that 5 of 11 tested MFTC SNPs showed association (p<0.01) with NTD risk in cases. This gene falls in a region of very high D’ LD; the haplotype block containing these five SNPs extends ~92kb and contains two other genes: DCAF13 (DDB1 and CUL4 associated factor 13) and CTHRC1 (collagen triple helix repeat containing 1). Any SNP in this large haplotype block could be the causative variant driving the observed associations. As the only coding SNP in MFTC, the best candidate is rs17803441 (R117H). However, as arginine and histidine are both polar, basic amino acids, this is a fairly conservative change. We observed a minor allele frequency of 0.07 in this study. Conservation of the more common arginine residue is observed in chimp, wolf, cow, mouse, rat and zebrafish, but not in chicken or invertebrates. All of the SNPs in high LD (r2>0.7) with MFTC rs17803441 (R117H) in this block are intronic or intergenic. This SNP also had the lowest p-value for any test of association of all SNPs tested in this study. It would be of great interest to determine in an independent population whether it contributes to NTD risk.
CUBN encodes the intestinal receptor responsible for the uptake of the vitamin B12-intrinsic factor complex. It is also expressed in the kidney, where it is involved in reabsorption of many proteins and vitamins, including vitamin B12. This gene spans more than 300kb of DNA. The only reported SNP association in CUBN is for rs1907362, which was associated with case risk in a Dutch population. In contrast, we observed two highly significant SNPs in CUBN (rs7070148 and rs2273737) associated with maternal NTD risk. Due to their high LD these SNPs represent a single association signal. There were three other highly associated (p<0.01) SNPs in this gene. CUBN rs11591606 was associated with maternal risk, and is in a smaller haplotype block at the 3’ end of the gene. Two other CUBN SNPs (rs1801222 [S253F] and rs11254375) were also highly associated (p<0.01) with maternal risk and are in the same ~30kb haplotype block with rs7070148 and rs2273737 at the 5’ end of the gene. While there are many SNPs in this block that could be the causal risk SNP, rs1801222 is of interest since it is a coding SNP (S253F) that was significantly associated with lower serum vitamin B12 levels in a meta-analysis of three genome wide association studies of three Caucasian populations. This does not prove that CUBN rs1801222 is the causal SNP in either study, but it is consistent with the hypothesis that this SNP or another CUBN polymorphism linked to it within this haplotype block lowers vitamin B12 levels and thereby increases risk of an NTD pregnancy.
Multiple highly significant SNPs in ADA, PEMT, MFTC and CUBN account for half of the ten strongest association signals observed. The remaining five association signals are equally compelling. MTHFD1 rs2236225 (R653Q) was previously reported as a maternal NTD risk factor in the current study population [23,25] and others [22,24], while the other four signals represent new associations. First, CDKN2A rs3218009 was associated with maternal risk for NTDs. CDKN2A is a tumor suppressor gene that codes for several isoforms, including ARF (alternate open reading frame), a protein that stabilizes p53. A subset of mice carrying p53 null alleles exhibit overgrowth of neural tissue, supporting the importance of this pathway in normal neural tube development. Second, the same highly significant p-value was obtained for GART rs2070388 by two tests for case effect: TDT and log-linear analysis of a dominant model of case effect. GART is a trifunctional enzyme (phosphoribosylglycinamide formyltransferase, phosphoribosylglycinamide synthetase, phosphoribosylaminoimidazole synthetase) involved in de novo purine synthesis. For its phosphoribosylglycinamide formyltransferase activity, GART uses N10-formyl tetrahydrofolate as a one-carbon donor in the synthesis pathway of inosine monophosphate (IMP), a purine precursor. Interestingly, GART rs4817579 in intron 2 has been associated with cleft lip and/or palate plus dental anomalies. This variant was not tested in the current study, and the absence of GART rs2070388 from the HapMap data prevents us from evaluating the relatedness of these markers.
Third, DNMT3A rs7560488 was associated with NTD risk in cases. This gene encodes a DNA methyltransferase involved in de novo methylation during development. The folate pathway generates S-adenosyl methionine, which is used by DNMT3A as a methyl donor. Fourth, T (Brachyury) rs10806845 was associated with maternal NTD risk. The T (Brachyury) gene encodes a transcription factor involved in mesoderm formation and differentiation, and mice null for T (Brachyury) do not survive to term due to a number of developmental abnormalities, including fusion of the neural tube to the gut. Although previous studies differ in whether genetic variation in T (Brachyury) contributes to NTD risk in cases [44-48], our observation may be the first indication of its contribution to maternal risk of carrying an affected fetus.
Although no associations remained significant after conservative adjustment for multiple tests, it remains very possible that some of the evaluated candidates do in fact contribute to NTDs. The scale of our study design (using twelve tests of association to evaluate 1441 candidate SNPs) could contribute to Type II errors. This possibility is supported by the fact that of three SNPs previously reported to be associated with NTDs in this cohort (MTHFR 677C>T [49,50], MTHFD1 R653Q [23,25], TCblR G220R) only one was observed to be associated in the current study design (Table 5). Only MTHFD1 R653Q was found to be significantly associated in the primary phase of the analysis, which was performed on approximately half the samples. MTHFR 677C>T and TCblR G220R were only found to be associated (p<0.05, uncorrected) when the full cohort of samples was used. This suggests the possibility that the stringency of correction may be too high. Additionally, it is important to note that MTHFD1 R653Q was ninth among the top ten association signals in this study (Table 4). This suggests that a number of the ten strongest association signals observed in this study play a role in NTD risk, and they should be high priority candidates for further study.
In summary, this study involves the largest evaluation of common genetic variation for NTD risk yet reported: 1441 SNPs in 82 candidate genes. While no SNP associations remained significant after correction for multiple tests, there is a strong possibility that the study design and/or stringency of correction has resulted in obscuring true associations. At least one established risk factor, MTHFD1 R653Q, was corrected away, suggesting our approach was extremely conservative. Therefore, variation in the top genes identified in this study should be examined in independent populations for NTD risk, especially since many of these genes (MFTC, CDKN2A, ADA, CUBN, DNMT3A, and T (Brachyury)) represent new avenues of investigation.
The recruitment of the Irish NTD families (cases and parents) and controls has been described [23,34,51,52]. Briefly, the cohort includes 586 families with an NTD case; 442 of these families are full family triads (DNA from case, mother and father). For this study, 570 of the NTD families had sufficient DNA and were divided into two sets, one for primary analysis and one for secondary (combined) analysis. The primary and secondary sample sets were matched as closely as possible for the following parameters: the number of complete NTD triads, the proportion of NTD cases with spina bifida vs. other NTDs, and NTD case gender (Table 6).
The control population (n=999) is a random sample drawn from 56,049 blood samples donated by women at their first prenatal visit to the three major maternity hospitals in Dublin (1986–1990). A subset of controls (n=341) was randomly selected for the primary screen, and the remaining controls were used in the secondary set.
Written, informed consent was obtained from study subjects, their parents or their guardians. Archived control samples were anonymized prior to analysis. The study was approved by the Ethics Research Committee of the Health Research Board (Dublin, Ireland) and the Institutional Review Board at the National Human Genome Research Institute (Bethesda, MD, USA). Extraction of genomic DNA from blood samples and buccal swabs was performed using the QIAamp DNA Blood Mini Kit (Qiagen, Valencia, CA, USA).
Genes were chosen for study because they are involved in folate and vitamin B12 metabolism or other metabolic and signaling pathways implicated in the etiology of NTDs (Figure 1). Although previously published, MTHFR [52-54], MTHFD1 [23,25], MTHFD1L, TCblR, and TP53 were reanalyzed in this study to allow comparison with the current research strategy, which includes new genetic models of risk assessment. For each gene chosen, we evaluated the transcribed region of the gene and 10 kilobases (kb) upstream and downstream of the gene in an effort to include polymorphisms with potential proximal effects, such as promoter variants. In order to capture the common variation in each gene, SNPs genotyped by HapMap (Data Release 21, Phase II, Anon, 2003) were considered. A set of tagSNPs was identified using an algorithm that applied an r2 threshold of 0.8 while maximizing the minor allele frequency (MAF) of each selected tagSNP. In addition to this set of tagSNPs, validated variants from dbSNP were also selected for physical coverage so that spacing between SNPs would be less than 20kb within D’ haplotype blocks and less than 5kb between haplotype blocks. A total of 1517 SNPs were selected.
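The tagging step described above can be illustrated with a generic greedy procedure. This is a minimal sketch and not the algorithm used in the study, whose details are not given here: it assumes pairwise r2 values are already available, picks at each round the SNP that covers the most still-untagged SNPs at r2≥0.8, and breaks ties in favor of the higher MAF. The function name, SNP labels and numbers are all hypothetical.

```python
def greedy_tag_snps(snps, r2, maf, threshold=0.8):
    """Greedy tagSNP selection (illustrative sketch, not the study's algorithm).

    snps: list of SNP ids
    r2:   dict mapping (snp_a, snp_b) -> pairwise r2 (either key order)
    maf:  dict mapping snp -> minor allele frequency
    """
    def linked(a, b):
        if a == b:
            return True
        return r2.get((a, b), r2.get((b, a), 0.0)) >= threshold

    untagged = set(snps)
    tags = []
    while untagged:
        # Prefer the SNP covering the most untagged SNPs; break ties by MAF.
        best = max(
            sorted(untagged),
            key=lambda s: (sum(linked(s, t) for t in untagged), maf[s]),
        )
        tags.append(best)
        untagged -= {t for t in untagged if linked(best, t)}
    return tags

# Hypothetical toy region with four SNPs.
snps = ["s1", "s2", "s3", "s4"]
r2 = {("s1", "s2"): 0.95, ("s2", "s3"): 0.85, ("s3", "s4"): 0.30}
maf = {"s1": 0.10, "s2": 0.30, "s3": 0.20, "s4": 0.40}
print(greedy_tag_snps(snps, r2, maf))
```

In this toy region, s2 captures s1 and s3 at the threshold, and s4 must be typed directly because no other SNP tags it.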
The selected 1517 SNPs were genotyped on the primary sample sets (320 NTD case families and 341 controls). Genotypes were generated by the Johns Hopkins University SNP Center (Baltimore, MD) using the Illumina GoldenGate assay (Illumina, San Diego, CA). Of the 1517 SNPs attempted, 1320 SNPs (87.2%) remained after filtering out low quality data (re-attempted on another platform, see below). The overall call rate for these 1320 SNPs was 98%. All but 150 of these SNPs had a call rate of >95%; the rest had an average call rate of 87.2% (±7.9%) and were re-genotyped on another platform (see below). Both the overall duplicate concordance rates and the Mendelian consistency rates were 99.99% for the 1320 accepted SNPs.
Based on analyses of the primary sample sets, 93 SNPs were genotyped in the secondary sample set (250 NTD case families and 658 controls). Genotypes were generated by KASPar chemistry at Kbiosciences (Herts, UK). Three SNPs failed on this platform: rs127317149, rs7367859 and rs7096079. For the 90 successfully genotyped SNPs, the overall duplicate concordance rate was 99.81% and the overall Mendelian consistency rate was 99.94%.
SNPs that could not be assayed (n=197) or returned low call rates (<95%, n=150) via Illumina GoldenGate chemistry were re-genotyped in the entire sample set (570 NTD families and 999 controls) by detection of allele-specific primer extension using matrix-assisted laser desorption/ionization – time of flight (MALDI-TOF) mass spectrometry (Sequenom, San Diego, CA, USA). Genotyping data from SNPs with call rates above 95% were added to the final analyses (121/197 SNPs that failed and 106/150 SNPs that yielded low call rates). The overall duplicate concordance rate was 99.10% and the overall Mendelian consistency rate was 99.32% for these 227 SNPs.
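The call-rate filter applied across the genotyping platforms can be sketched as follows. This is only an illustration under stated assumptions: a failed call is represented as `None`, the 95% cutoff is taken from the text, and the function names and the two example SNPs are hypothetical.

```python
def call_rate(calls):
    """Fraction of samples with a genotype call; None marks a failed call."""
    return sum(c is not None for c in calls) / len(calls)

def passing_snps(calls_by_snp, min_rate=0.95):
    """Return ids of SNPs whose call rate is above min_rate."""
    return [snp for snp, calls in calls_by_snp.items()
            if call_rate(calls) > min_rate]

# Hypothetical genotype calls for 20 samples at two SNPs.
example = {
    "snp_A": ["AA", "AG", "GG", "AG"] * 5,   # every sample called
    "snp_B": ["CC", None, "CT", "TT"] * 5,   # one in four calls failed
}
print(passing_snps(example))
```

SNPs that fall below the cutoff would be candidates for re-genotyping on another platform, as was done here with the Sequenom assay.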
To summarize, our final data set consisted of 1441 SNPs; 1320 high quality SNPs from the Illumina platform (93 of these SNPs were also typed in the secondary samples using the KASPar platform, and 106 of these SNPs were re-typed in the entire sample set using the Sequenom platform) plus 121 SNPs from the Sequenom platform.
D’ and r2 measures of LD for SNPs of interest were estimated based on control genotypes using Haploview. Haplotype blocks were based on D’ values using the Solid Spine of LD option in Haploview.
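The two LD measures can behave quite differently for the same SNP pair, which is why several pairs in this study show high D’ but modest r2. The sketch below computes both from haplotype and allele frequencies using the standard definitions. It assumes the haplotype frequency is already known; Haploview estimates it from unphased genotypes, a step not reproduced here. The function name and example frequencies are hypothetical.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A (locus 1) and allele B (locus 2)
    """
    d = p_ab - p_a * p_b
    # D' scales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2

# Hypothetical pair: the rarer allele B always occurs on an A haplotype.
d, d_prime, r2 = ld_measures(p_ab=0.1, p_a=0.5, p_b=0.1)
print(f"D={d:.3f}, D'={d_prime:.2f}, r2={r2:.3f}")
```

In this example D’ equals 1 because only three of the four possible haplotypes occur, yet r2 is only about 0.11 because the allele frequencies differ, so neither SNP would serve as a tag for the other at the r2≥0.8 threshold.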
The design for this study involved two stages of genotyping. Rather than use the secondary samples for a replication study, joint analysis of the combined dataset was performed since it generally provides greater power to detect a genetic effect. In the initial analysis, all SNPs successfully typed with the Illumina GoldenGate assay (n=1320, including the 150 high quality SNPs with call rates of <95%) were analyzed in the primary sample set by four tests of association. Two tests were performed to evaluate case effects: 1) Logistic regression, a 1-degree of freedom (DOF) test of association between the affected status and number of risk alleles; and 2) the Spielman transmission disequilibrium test (TDT). Similarly, two tests were performed to evaluate maternal effects: 1) Logistic regression, a 1-DOF test of association between the maternal status and number of risk alleles; and 2) log-linear modeling with 2-DOF to test for effect of the maternal genotype.
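The TDT named above compares how often heterozygous parents transmit a given allele to an affected child versus how often they do not. A minimal sketch of the classic McNemar-type statistic follows; it assumes genotypes are coded as the number of copies (0, 1 or 2) of the allele of interest, skips trios with Mendelian inconsistencies, and is not the software used in this study. The function names and toy trios are hypothetical.

```python
import math

def tdt_counts(trios):
    """Count transmissions (b) and non-transmissions (c) of the allele of
    interest from heterozygous parents. Each trio is (father, mother, child),
    coded as copies of that allele (0, 1 or 2)."""
    b = c = 0
    for father, mother, child in trios:
        parents = (father, mother)
        n_het = sum(g == 1 for g in parents)
        n_hom = sum(g == 2 for g in parents)
        # Homozygous carriers always transmit one copy; the rest of the
        # child's copies must come from heterozygous parents.
        from_het = child - n_hom
        if not 0 <= from_het <= n_het:
            continue  # Mendelian inconsistency: skip this trio
        b += from_het
        c += n_het - from_het
    return b, c

def tdt_statistic(b, c):
    """Chi-square statistic (1 DOF) and its p-value."""
    chi2 = (b - c) ** 2 / (b + c)
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 DOF
    return chi2, p

# Hypothetical trios.
trios = [(1, 0, 1), (1, 1, 2), (1, 2, 2), (1, 0, 0)]
b, c = tdt_counts(trios)
chi2, p = tdt_statistic(b, c)
print(b, c, round(chi2, 2), round(p, 3))
```

Because only heterozygous parents contribute, the test is robust to population stratification, and the ratio b/c gives a simple estimate of the relative transmission of the allele.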
SNPs were selected to be genotyped on all samples (570 NTD case families and 999 controls) if they met the following criteria: 1) SNPs of interest reaching a significance level in the primary analysis (n=93, Table 2); or 2) failed SNPs (n=121) and SNPs with low call rates (<95%, n=106). Final analyses were performed on the entire dataset and consisted of twelve association tests. NTD case–control and NTD mother-control comparisons were performed using continuous, recessive and dominant coded models of logistic regression to generate odds ratios (ORs) and 95% confidence intervals (CI). Six family-based tests of association were also applied using log-linear models. The NTD triads (case, mother and father) were analyzed for case effects using recessive, dominant and linear (TDT) coding while direct maternal effects were analyzed using recessive, dominant and 2-DOF models.
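The three genotype codings used in the logistic regression models differ only in how the number of risk alleles is mapped to a predictor. The sketch below shows the codings and, for the two binary codings, the odds ratio with a Wald 95% CI from a 2x2 table, which equals what a logistic regression with a single binary predictor would estimate. The continuous model requires an actual regression fit, which is not shown. Function names and counts are hypothetical.

```python
import math

def code_genotype(n_risk_alleles, model):
    """Map a genotype (0, 1 or 2 risk alleles) to a model predictor."""
    if model == "continuous":   # per-allele (additive) effect
        return n_risk_alleles
    if model == "dominant":     # one or more copies confers risk
        return 1 if n_risk_alleles >= 1 else 0
    if model == "recessive":    # two copies needed to confer risk
        return 1 if n_risk_alleles == 2 else 0
    raise ValueError(f"unknown model: {model}")

def odds_ratio_ci(exp_cases, unexp_cases, exp_controls, unexp_controls, z=1.96):
    """Odds ratio and Wald 95% CI from a 2x2 table of counts."""
    or_value = (exp_cases * unexp_controls) / (unexp_cases * exp_controls)
    se = math.sqrt(1 / exp_cases + 1 / unexp_cases
                   + 1 / exp_controls + 1 / unexp_controls)
    low = math.exp(math.log(or_value) - z * se)
    high = math.exp(math.log(or_value) + z * se)
    return or_value, low, high

# Hypothetical counts under a dominant coding:
# 20 of 100 cases and 10 of 100 controls carry the risk allele.
print(odds_ratio_ci(20, 80, 10, 90))
```

With these hypothetical counts the odds ratio is 2.25; the same function applies to mother-control comparisons by substituting case mothers for cases.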
To correct for multiple comparisons, the final analysis used the complete information on the 1441 SNPs. Correction was performed by multivariate permutation (N=9,999 random permutations) for three of the tests used in the initial analysis: case–control logistic regression, mother-control logistic regression and the TDT. This method accounts for any linkage disequilibrium between SNPs. Multivariate permuting of triads for the TDT was performed by treating the test as a one-sample test and permuting the risk allele. Independent permutations were performed for the cases and controls or mothers and controls to adjust for multiple comparisons in the tests of logistic regression. The results were combined by Bonferroni adjustment to account for all comparisons (including all SNPs because of our multivariate permutation approach) while controlling the probability of any false positives at 5%. Since a SNP could only be found significant based on this combined analysis, our method provides type I error control regardless of the SNP selection process or number of SNPs selected for genotyping on all samples.
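The idea behind multivariate permutation can be sketched with a generic single-step max-statistic procedure: case and control labels are shuffled for whole individuals, so the correlation (LD) between SNPs is preserved in every permuted dataset, and each observed statistic is compared with the permutation distribution of the maximum across all SNPs. This is an illustration only and not the authors' implementation. The test statistic used here (absolute difference in mean risk-allele count) is a simple stand-in for the logistic regression and TDT statistics, and all names and data are hypothetical.

```python
import random

def allele_diff(genos, labels, j):
    """Absolute difference in mean allele count at SNP j, cases vs controls."""
    case = [g[j] for g, y in zip(genos, labels) if y == 1]
    ctrl = [g[j] for g, y in zip(genos, labels) if y == 0]
    return abs(sum(case) / len(case) - sum(ctrl) / len(ctrl))

def max_stat_adjusted_p(genos, labels, n_perm=999, seed=0):
    """Family-wise adjusted p-values from a max-statistic permutation test.

    genos:  list of per-individual genotype lists (0/1/2 per SNP)
    labels: 1 for case, 0 for control, one per individual
    """
    rng = random.Random(seed)
    n_snps = len(genos[0])
    observed = [allele_diff(genos, labels, j) for j in range(n_snps)]
    exceed = [0] * n_snps
    perm = list(labels)
    for _ in range(n_perm):
        # Shuffle labels across individuals; genotype rows stay intact,
        # which preserves LD between SNPs.
        rng.shuffle(perm)
        max_stat = max(allele_diff(genos, perm, j) for j in range(n_snps))
        for j in range(n_snps):
            if max_stat >= observed[j]:
                exceed[j] += 1
    return [(1 + e) / (n_perm + 1) for e in exceed]

# Hypothetical data: 20 individuals, 3 SNPs.
genos = [[(i * j) % 3 for j in range(1, 4)] for i in range(20)]
labels = [1] * 10 + [0] * 10
print(max_stat_adjusted_p(genos, labels, n_perm=199))
```

With N permutations the smallest attainable adjusted p-value is 1/(N+1), which for the 9,999 permutations used in this study is 0.0001.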
DOF, Degree(s) of freedom; LD, Linkage disequilibrium; NTDs, Neural tube defects; RAF, Risk allele frequency; SNP, Single nucleotide polymorphism; TDT, Transmission disequilibrium test.
The authors declare that they have no competing interests.
FP, AMM, JLM, PNK, BS, JMS and LCB formulated the study design and performed candidate gene selection. FP additionally performed SNP selection and drafted the manuscript. AMM additionally managed the DNA samples and their selection. AP-M, VBO’L and CS participated in candidate gene selection. JMS and KG-S prepared DNA. AM and JEV extracted DNA and performed genotyping. KMK, AS, JC-H and NS performed genotyping. PC developed the SNP selection algorithm. MC provided database management. JFT contributed to study design and performed the statistical analyses. All authors read, edited and approved the final manuscript.
These studies would not be possible without the participation of the affected families, and their recruitment by the Irish Association of Spina Bifida and Hydrocephalus and the Irish Public Health Nurses in Ireland. This study was supported by the Intramural Research Programs of the National Institutes of Health, Eunice Kennedy Shriver National Institute of Child Health and Human Development, and the National Human Genome Research Institute.