Polymorphisms in folate-related genes have emerged as important risk factors in a range of diseases including neural tube defects (NTDs), cancer and coronary artery disease (CAD). Having previously identified a polymorphism within the cytoplasmic folate enzyme, MTHFD1, as a maternal risk factor for NTDs, we considered the more recently identified mitochondrial paralogue, MTHFD1L, as a candidate gene for NTD association. We identified a common deletion/insertion polymorphism, rs3832406, c.781-6823ATT(7-9), that influences splicing efficiency and is strongly associated with NTD risk. Three alleles of rs3832406, differing in the number of ATT repeats, were detected in the Irish population: Allele 1 consists of ATT7, while Alleles 2 and 3 consist of ATT8 and ATT9 respectively. Allele 2 of this triallelic polymorphism showed a decreased case risk as demonstrated by case-control logistic regression (P= 0.002) and by transmission disequilibrium test (TDT) (P= 0.001), while Allele 1 showed an increased case risk. Allele 3 showed no influence on NTD risk and is the lowest frequency allele (0.15). SNP genotyping in the same genomic region provides supporting evidence of an association. We demonstrate that two of the three alleles of rs3832406 are functionally different and influence the splicing efficiency of the alternate MTHFD1L mRNA transcripts.
Low maternal folate levels and common genetic variation are recognised as important risk factors for a group of common birth malformations known as neural tube defects or NTDs (Kirke and Scott, 2005). The role of folate in NTD causation was definitively demonstrated by a number of intervention trials showing up to 70% reduction in NTD-affected pregnancies when women ingest a folic acid supplement in the periconceptional period (MRC Vitamin Study Research Group, 1991; Czeizel and Dudas, 1992). This pointed towards genes involved in the transport and metabolism of folate as prime candidates for contributing to the genetic risk of NTDs. However, only two such genes have so far been demonstrated to play a definitive role in NTD association in the Irish population. These genes encode two enzymes involved in the cytoplasmic metabolism of folate, Methylenetetrahydrofolate Reductase (MTHFR; MIM# 607093) and C1-Tetrahydrofolate Synthase (MTHFD1; MIM# 172460), and their associations have been confirmed in replicate studies (Botto and Yang, 2000; Brody et al., 2002; Parle-McDermott et al., 2006). We considered another folate metabolic gene, MTHFD1L (MIM# 611427), as a candidate for association with NTDs. While the MTHFD1 gene encodes the cytoplasmic C1-Tetrahydrofolate Synthase enzyme and possesses three enzymatic activities, the MTHFD1L gene encodes the mitochondrial localised C1-Tetrahydrofolate Synthase enzyme and appears to be monofunctional. The mere existence of a mitochondrial C1-Synthase was controversial for many years until it was clearly demonstrated by recent publications (Prasannan et al., 2003; Walkup et al., 2005; Christensen et al., 2005). The strong association of the cytoplasmic C1-Synthase (MTHFD1) with NTDs prompted us to consider this newly discovered mitochondrial localised enzyme, MTHFD1L, as an important candidate for NTD association.
Our initial approach was to identify candidate polymorphisms within MTHFD1L with a likely functional effect. We identified an intronic deletion/insertion polymorphism (DIP) within intron 7 (rs3832406, c.781-6823ATT(7-9)) that is in close proximity to an alternatively spliced exon (Figure 1a). MTHFD1L produces two transcripts of 3.6kb and 1.1kb; the shorter transcript includes an alternative exon 8a that generates an mRNA containing a premature stop codon (Prasannan et al., 2003) resulting in a protein that lacks synthetase activity (Figure 1b). We hypothesized that rs3832406 alters splicing efficiency as it occurs within a polypyrimidine tract (PPT) that forms part of the ‘splicing code’ (Wang and Cooper, 2007). Recent research has recognised the important role of common genetic variants on gene expression patterns (Morley et al., 2004) and alternative splicing efficiency (Hull et al., 2007). The significance of an impact on splicing efficiency in relation to MTHFD1L is an increased or decreased amount of functional folate enzyme depending on genotype. A polymorphism with such a functional impact may also contribute to NTD risk. We addressed this hypothesis by examining the relative ratio of the long to short transcript by Quantitative Reverse Transcription (RT)-PCR in cell lines of known rs3832406 genotype. NTD association was assessed by case-control and triad family based analyses by genotyping rs3832406 plus 118 SNP markers spanning the entire MTHFD1L gene. Our data confirms our hypothesis i.e., the rs3832406 impacts on alternative splicing efficiency of the MTHFD1L mRNA transcripts and increases case risk of NTDs.
Quantitative Reverse Transcription-Polymerase Chain Reaction (Q RT-PCR) was performed on RNA extracted from the following Coriell® lymphoblast cell lines (Coriell Institute for Medical Research, Camden, New Jersey, US): NA17124, NA17142, NA17146, NA17147, NA17158, NA17165, NA17201, NA17214, NA17218, NA17229, NA17246, NA17291. These lines represent 6 homozygotes of Allele 1 or Allele 2 for rs3832406 and each genotype group had an equal number of African American or Caucasian individuals. The cell lines were cultured in RPMI1640 with 10% Fetal Bovine Serum and 1% Penicillin/Streptomycin (10000U: 10mg/ml) at 37°C at 5% CO2. All lines were routinely tested for Mycoplasma utilising a PCR based assay and were found to be uncontaminated. RNA was extracted using Qiagen RNeasy kit (Cat. No. 74104, UK) and Qiagen Qiashredders (Cat. No. 79654, UK). DNase 1 treatment to remove potential contaminating genomic DNA was carried out by ‘on-column’ treatment as described in the Qiagen manual. RNA quality was verified by measurement of A260/A280nm ratios using a Nanodropper and by resolution on a 1% agarose gel. cDNA was synthesised from RNA using Superscript II (Invitrogen, UK) and a combination of oligodT and random hexamers according to the manufacturer’s instructions. Q RT-PCR was performed on the Roche Lightcycler® 480 instrument using assays designed to specifically detect the long or short transcript of MTHFD1L. Assays were designed utilising the ProbeFinder software and the Universal Probe library as follows: MTHFD1L long: Forward 5′ GAGCTCTGAAGARGCATGGAG 3′ Reverse 5′ TGCTTCTGGAGGTTACAGCA 3′ and Universal Probe #42; MTHFD1L short: Forward 5′ ACGCCAGCTTCAAAGCAA 3′ Reverse 5′ TCACAGGAGAATCACTTCAACC 3′ and Universal Probe #13. The PCR efficiency of each assay was assessed using the pooled sample of the experimental cDNAs and was 1.83 for the short assay and 2.0 for the long assay.
All assays were intron spanning and were performed using the Probes Master Mix (Roche, UK) as recommended by the manufacturer. The assays were carried out in duplicate incorporating ‘minus Superscript’ and PCR negative controls and replicated several times on separate RNA extractions. The ratio of the MTHFD1L long transcript relative to the short transcript was generated by the Roche Lightcycler® 480 Relative Quantification software employing the E (Efficiency)-Method. The E-Method compensates for differences in PCR efficiency of the target and reference genes in particular sample sets and is thought to provide a more accurate estimate of relative quantitative data than the ΔΔCT method.
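The idea behind an efficiency-corrected ratio can be sketched in a few lines of Python. This is only an illustration of the general principle (template amount is proportional to E raised to the power of minus Cp), not the Roche software's actual algorithm, which works from standard curves. The function name and the crossing-point (Cp) values are hypothetical; only the two efficiencies (2.0 for the long assay, 1.83 for the short assay) come from the text.

```python
def efficiency_corrected_ratio(e_target, cp_target, e_reference, cp_reference):
    """Return target/reference abundance, correcting for per-assay PCR efficiency.

    Starting template is proportional to E ** (-Cp), so the ratio is
    E_reference ** Cp_reference / E_target ** Cp_target.
    """
    return (e_reference ** cp_reference) / (e_target ** cp_target)


# Efficiencies reported for the two assays; the Cp values are made up
# purely to show the call and are NOT measurements from the study.
E_LONG, E_SHORT = 2.0, 1.83
HYPOTHETICAL_CP_LONG, HYPOTHETICAL_CP_SHORT = 24.0, 28.0

long_to_short = efficiency_corrected_ratio(
    E_LONG, HYPOTHETICAL_CP_LONG, E_SHORT, HYPOTHETICAL_CP_SHORT
)
print(f"long/short ratio (illustrative): {long_to_short:.2f}")
```

When both efficiencies equal 2, the expression reduces to 2 raised to the Cp difference, which is the assumption built into the ΔΔCT method; the correction matters here because the short assay's efficiency (1.83) is below 2.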
Families affected by an NTD pregnancy were recruited throughout the Republic of Ireland with the assistance of the Irish Association for Spina Bifida and Hydrocephalus (IASBAH) and the Irish Public Health Nurses from 1993 to 2005. These families, consisting of both complete and incomplete triads (case, mother, father), formed the NTD cohort. Control samples for NTD association were selected from a population of 56,049 pregnant women attending the three main maternity hospitals in the Dublin area between 1986 and 1990. These women had no history of an NTD-affected pregnancy. Details of these cohorts have been previously published (Brody et al., 2002). In addition, genotype and blood metabolite data were available from a cohort of 2,524 healthy, ethnically Irish individuals consisting of university students aged between 18 and 25 years old and recruited over a period of one academic year (TSS cohort). Informed consent and ethical approval were obtained for all human samples used in this study.
Genomic DNA was extracted from all samples using a Qiagen QIAamp DNA Blood Mini Kit or a Qiagen DNeasy Kit (Qiagen, UK). The rs3832406 polymorphism (c.781-6823ATT(7-9)) was genotyped by PCR amplification using fluorescently labelled primers that flanked the DIP (Forward primer: 5′ 6-FAM TTCTCTTTCTTAGCCCCACG 3′ ; Reverse primer: 5′ AGAGCTTGCAGTGAGCCTAGA 3′) and products were resolved and scored on an ABI 377 or 3100 using Genescan 3.1.2 software. For NTD association the rs3832406 DIP was genotyped in 860 controls, and the following samples from the NTD cohort: complete triads n= 439, mother & case only n= 55; mother & father only n= 34; case only n= 42; mother only n= 166; father only n= 2. Additional rs3832406 genotyping included all 2,524 samples from the TSS cohort and a panel of Coriell® lymphoblast cell lines isolated from African American and Caucasian individuals. Quality control for rs3832406 genotyping consisted of repeat genotyping of at least 10% of samples by the same assay (99% agreement), repeat genotyping of an additional 10% of samples by a different assay (94% agreement) and a genotyping success rate of >95%. Discrepant genotype calls were resolved by re-genotyping or were left out of the final analysis. SNPs within MTHFD1L were genotyped within a 1,536 custom Illumina® Goldengate assay (see Supp. Table S1) on a subset of the NTD cohort and controls as follows: 277 complete triads and 340 controls. The call rate for the Illumina® Goldengate assay was 98% with a blind duplicate concordance rate of 99.99%.
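As a quick consistency check, the NTD cohort breakdown above can be tallied per sample. The sketch assumes each family in a group contributes exactly the members named in that group; the variable names are illustrative only.

```python
# (group, number of families, family members genotyped in that group)
ntd_groups = [
    ("complete triads", 439, ("case", "mother", "father")),
    ("mother and case only", 55, ("mother", "case")),
    ("mother and father only", 34, ("mother", "father")),
    ("case only", 42, ("case",)),
    ("mother only", 166, ("mother",)),
    ("father only", 2, ("father",)),
]

# Total number of individual samples across all family groups.
total_samples = sum(n * len(members) for _, n, members in ntd_groups)

# Number of samples per family role.
per_role = {"case": 0, "mother": 0, "father": 0}
for _, n, members in ntd_groups:
    for role in members:
        per_role[role] += n

print(total_samples, per_role)
```

Under that assumption the tally gives 1,705 NTD cohort samples, made up of 536 cases, 694 mothers and 475 fathers.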
A linkage disequilibrium plot of MTHFD1L was generated using Haploview (http://www.broad.mit.edu/mpg/haploview/) (Barrett et al., 2005). Haplotype blocks for genomic region spanning introns 7-10 were defined using the Solid Spine of LD and included the following SNPs: rs803422, rs17080461, rs3832406, rs2295083, rs712210, rs6905272, rs71208, rs12195069, rs17080476, rs1771845, rs175862, rs9397365, rs2295084, rs175853, rs803456. Haplotype frequencies of the blocks were estimated using PHASE 2.1.1 (Stephens et al., 2001; Stephens and Donnelly, 2003). Additionally, PHASE 2.1.1 was used to compare haplotype frequency distributions in controls and NTD cases or NTD mothers using a permutation test.
Non-fasting blood samples were processed within 2 hours of collection. Full blood count data including mean corpuscular volume (MCV) were determined on fresh EDTA blood samples using a Sysmex F-800 Microcell counter. Serum and red cell hemolysates in 1% ascorbic acid were stored at -40°C until analyzed for total folate by microbiological assay (Molloy and Scott, 1997). Plasma total homocysteine (tHcy) was determined by immunofluorescence using an Abbott IMX instrument (Leino, 1999).
Genotype frequencies of rs3832406 were compared between each sample group (i.e., mothers, fathers or cases) and controls by a χ2 test. The effect of each genotype was assessed by comparing its frequency with the combined frequency of the other genotypes for each comparison (data not shown). Associations with an NTD were tested in cases/controls and separately in mothers/controls by logistic regression with a continuous term indicating the number of alleles of a given type. The transmission of alleles from parents to affected NTD cases was assessed by Transmission Disequilibrium Test using SAS PROC GENMOD. Case and maternal effects were also assessed by a two degree of freedom log-linear model using SAS PROC GENMOD (Weinberg et al., 1998; Wilcox et al., 1998).
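For readers unfamiliar with the TDT, its textbook form is a McNemar-type statistic computed from heterozygous parents only. The Python sketch below shows that classic form for one allele against all others; it is not the SAS PROC GENMOD implementation used in this study, and the function name is hypothetical.

```python
import math


def tdt(transmitted, untransmitted):
    """Classic TDT for one allele, counted over heterozygous parents only.

    transmitted:   times the allele was passed to the affected child
    untransmitted: times it was not passed
    Returns (chi-square with 1 df, two-sided p-value).
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative transmissions")
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of a 1-df chi-square: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts, not data from this study.
chi2, p = tdt(60, 40)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

Under the null hypothesis a heterozygous parent transmits either allele with probability 0.5, so an excess or deficit of transmissions to affected offspring indicates association that is robust to population stratification.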
Q RT-PCR relative ratios (as described above) were tested for statistical significance by a Mann Whitney U test using SPSS 15.0 for Windows by stratifying the fold change by genotype. P-values below 0.05 were considered significant for all analyses.
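The Mann Whitney U comparison of two small genotype groups can be sketched as follows. This is a plain-Python exact permutation version for illustration; SPSS may report an asymptotic or exact p-value depending on settings, so results need not match it digit for digit. The function names and the input ratios are hypothetical and are not data from this study.

```python
from itertools import combinations


def mann_whitney_u(x, y):
    """U for sample x: number of (x_i, y_j) pairs with x_i > y_j, ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def exact_two_sided_p(x, y):
    """Exact p-value: share of group relabelings at least as extreme as observed."""
    pooled = list(x) + list(y)
    n_x = len(x)
    mean_u = n_x * len(y) / 2.0  # expected U under the null hypothesis
    observed = abs(mann_whitney_u(x, y) - mean_u)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_x):
        chosen = set(idx)
        gx = [pooled[i] for i in chosen]
        gy = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mann_whitney_u(gx, gy) - mean_u) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical long/short ratios for two genotype groups of six lines each.
allele1_ratios = [1.9, 2.1, 1.7, 2.4, 2.0, 1.5]
allele2_ratios = [1.3, 1.6, 1.2, 1.4, 1.8, 1.1]
print(exact_two_sided_p(allele1_ratios, allele2_ratios))
```

With six lines per group there are only 924 possible relabelings, so exhaustive enumeration is cheap and avoids relying on a normal approximation at small sample sizes.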
Initial genotyping of rs3832406 in control samples from the Irish population identified a total of three alleles that differ in the length of a repeated ‘ATT’ sequence within the polypyrimidine tract of the alternatively spliced exon 8a of MTHFD1L (Figure 1). The ‘ATT’ sequence occurs as 7 (ATT7), 8 (ATT8) or 9 (ATT9) repeats which are referred to as Alleles 1, 2 or 3 respectively. The ratio of the long to short transcript of MTHFD1L was assessed by Quantitative Reverse Transcription (RT)-PCR in a panel of Coriell® cell lines that were homozygous for the most common alleles i.e., Alleles 1 or 2. The ratio of the MTHFD1L long transcript relative to the short transcript was generated by the Roche Lightcycler® 480 Relative Quantification software employing the E (Efficiency)-Method. Q RT-PCR relative ratios were tested for statistical significance by a Mann Whitney U test using SPSS 15.0 for Windows by stratifying the fold change by genotype. The result of this analysis (Figure 1c; Supp. Table S2) showed that the genotype of rs3832406 is associated with splicing efficiency. Allele 1 is associated with having an approximately 1.4 fold higher proportion of the long transcript relative to the short transcript compared to Allele 2 (P= 0.006). Our panel of Coriell® cell lines included a single line that was homozygous for Allele 3 and thus, could not be included in the final data analysis.
Our association study consisted of NTD triads (mother, father and affected case) and controls from the Irish population. Not all triad families were complete i.e., samples from all three family members were not always available. The rs3832406 DIP was genotyped in a total of 1,705 samples from both complete and incomplete NTD triads and 860 controls. Associations with an NTD were tested in cases/controls and separately in mothers/controls by logistic regression with a continuous term indicating the number of alleles of a given type. The transmission of alleles from parents to affected NTD cases was assessed by Transmission Disequilibrium Test using SAS PROC GENMOD. Case and maternal effects were also assessed by a two degree of freedom log-linear model using SAS PROC GENMOD (Weinberg et al., 1998; Wilcox et al., 1998). The rs3832406 polymorphism showed strong evidence for a case association. As described above, this polymorphism is a repeated ‘ATT’ sequence that has three common alleles 7 (ATT7), 8 (ATT8) or 9 (ATT9) referred to as Alleles 1, 2 and 3 respectively. Case-control logistic regression and Transmission Disequilibrium Test (TDT) analysis (Table 1) revealed that carriers of Allele 1 are associated with an increased risk of having an NTD (TDT, P= 0.016), while Allele 2 carriers appear to have a decreased risk (TDT, P= 0.001). A two degree of freedom log-linear analysis confirmed these case associations (Table 1). Allele 3 showed no evidence of an association and has the lowest frequency. We also tested whether rs3832406 correlated with circulating folate or homocysteine in 2,524 healthy students and found no evidence of an association (data not shown).
We genotyped 118 SNPs by Illumina® Goldengate assay to ensure appropriate coverage of the MTHFD1L gene (Supp. Table S1). A total of 277 complete NTD triad families (831 samples) and 340 controls were genotyped. Data analysis revealed several case and maternal association signals from three SNP clusters within MTHFD1L (Figure 2). These SNP clusters appear to be separated by recombination hotspots based on Phase II HapMap data as reported previously (Samani et al., 2007). This suggests that polymorphisms from three separate regions of the MTHFD1L gene have independent associations with NTDs. We focused our attention on the cluster of association signals spanning genomic region intron 7 to 10 which is marked as region ‘a’ in Figure 2 and is shown in more detail in Figure 3. The three allele rs3832406 data were collapsed into a two allele format to incorporate the DIP into the LD map (Figures 2 and 3). The linkage disequilibrium plot of MTHFD1L was generated using Haploview (Barrett et al., 2005) and the predicted haplotypes for this region (introns 7 to 10; Figure 3) are described in Table 2. Haplotype blocks were defined using the Solid Spine of LD. Haplotype frequencies of these blocks were estimated using PHASE 2.1.1 (Stephens et al., 2001; Stephens and Donnelly, 2003). There are 6 additional SNP markers in this region showing NTD disease association; all are intronic and show significant association for case-control or mother-control by logistic regression or log linear analysis. The case-control associated SNPs are as follows: rs17080461 (Logistic Regression (LR) Odds Ratio (OR) 1.16, P= 0.05, Minor Allele Frequency (MAF)= 0.126), rs2295083 (LR OR 1.45, P= 0.02, MAF= 0.15), rs712208 (Log linear (LL) Relative Risk (RR) 4.3, P= 0.05, MAF= 0.20), rs17080476 (LL, RR 0.73, P= 0.002, MAF= 0.17). The mother-control associated SNPs are as follows: rs712210 (LL RR 0.55, P= 0.05, MAF= 0.50), rs175853 (LL RR 0.44, P= 0.04, MAF=0.33).
SNP rs17080476:A>G showed the most significant association for an NTD case effect as revealed by case-control logistic regression (P= 0.009), by Transmission Disequilibrium Test (TDT) (P= 0.05) and by log linear analysis (P=0.002). The D’ value between rs3832406 and rs17080476 is 0.61 with an r2 of 0.13. SNP rs712208 had the highest case relative risk of 4.3. The D’ value between rs3832406 and rs712208 is 0.902 with an r2 of 0.39. The D’ values between rs3832406 and the other disease-associated SNPs from this region range from 0.56 to 1, and none of the polymorphisms share strong r2 values (r2≤0.43). This provides supporting evidence that the genomic region of MTHFD1L incorporating introns 7-10 harbours a disease associated polymorphism. Whether the disease causing variant is acting independently or in combination with a haplotype is difficult to determine; however, our haplotype analyses did not identify an association with a specific haplotype (Table 2). All polymorphisms showing a significant association with NTD risk from this genomic region are intronic. Our evidence to date supports rs3832406 as the most plausible variant within this region to contribute to disease causation, as we proposed and provided evidence for a functional effect of this variant. However, given the nature of LD and association studies, we cannot rule out a yet to be identified polymorphism from this genomic region as contributing to NTD risk.
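The two LD measures quoted above answer different questions, which is why a pair of markers can show a high D’ yet a low r2. A minimal Python sketch of the standard two-locus definitions follows; it is not Haploview's estimation procedure (which must first infer haplotypes from unphased genotypes), and the function name and input frequencies are hypothetical.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    d = p_ab - p_a * p_b
    # D' scales |D| by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical frequencies: a rare allele B found only on A-bearing haplotypes.
d, d_prime, r2 = ld_measures(p_ab=0.1, p_a=0.5, p_b=0.1)
print(f"D = {d:.3f}, D' = {d_prime:.2f}, r2 = {r2:.3f}")
```

In this illustration D’ is 1 because no recombinant haplotype is observed, while r2 is only about 0.11 because the allele frequencies differ greatly, so one marker predicts the other poorly.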
Our functional analysis and association study of the MTHFD1L gene has identified rs3832406 as a risk factor for NTDs by impacting on alternative splicing efficiency. This interesting polymorphism resides within the PPT of intron 7; an important element that is recognised by the splicing machinery. The consensus ‘UCUU’ within a pyrimidine rich sequence provides the optimal binding site for the polypyrimidine tract binding protein (PTB) (Perez et al., 1997). PTB, also known as hnRNP-I, acts as a splicing repressor through interference with a necessary component of the spliceosome, U2AF (Sharma et al., 2005; Izquierdo et al., 2005). Splicing repression via PTB involves binding at sites both upstream and downstream of the exon (Amir-Ahmady et al., 2005). The upstream site adjacent to the 3′ splice site appears to bind PTB with high affinity i.e., equivalent to the PTB site adjacent to rs3832406 in MTHFD1L. Cooperative binding with weaker PTB binding site(s) is necessary for splicing repression. MTHFD1L contains at least four consensus ‘UCUU’ sites located both upstream and downstream of alternate exon 8a. Our data indicates that a change in the length of the polypyrimidine tract at the high affinity PTB binding site interferes with the efficiency of PTB-mediated repression, possibly by weakening cooperative binding of multiple PTBs. A shortened polypyrimidine tract as in Allele 1 (ATT7) appears to result in less efficient splicing of exon 8a, while extending the tract by 3 bases as in Allele 2 (ATT8) results in more efficient splicing of exon 8a. The functional consequences of this are that Allele 1 carriers, particularly in the homozygous state, have a higher proportion of functional MTHFD1L mRNA compared to Allele 2 carriers (the shorter alternatively spliced mRNA lacks enzyme activity). 
Direct assessment of endogenous MTHFD1L protein level and activity is complicated by the difficulty of separating the mitochondrial form of this enzyme from the more abundant cytoplasmic form as illustrated by the controversy that surrounded the existence of MTHFD1L initially (Prasannan et al., 2003).
The association analysis of rs3832406 in our NTD study cohort provides strong evidence that this common variant influences the case risk of an NTD. The case-control association was strongly supported by the TDT analysis, which identified opposite effects on risk for two of the alleles. Allele 1 was associated with an increase of NTD risk while Allele 2 was associated with a decreased risk (Table 1). Allele 3 appeared to have no impact on NTD risk, possibly by not significantly influencing the ratio of the long to short mRNA form of MTHFD1L; but this requires further investigation. The association of rs3832406 prompted an assessment of other variants within the gene region. Our screen of an additional 118 SNP markers in a subset of our NTD study cohort confirmed the region surrounding and including intron 7 as harbouring variant(s) that influence the risk of NTDs in the Irish population (Figures 2 and 3). Haplotype analysis did not detect a significant association between any of the haplotypes and risk of NTDs, but haplotype ‘GG1AGAG’ showed a frequency of 13% in controls compared to 19% in cases, while haplotype ‘GA2GGAA’ showed a frequency of 12% in controls compared to 8% in cases. The association of the MTHFD1L gene region incorporating introns 7 to 10 with NTD risk may be due to rs3832406 itself, to the DIP in the context of other variants in the haplotype, or to disease causing polymorphism(s) that remain to be identified. However, the location of rs3832406 within the PPT adjacent to exon 8a, its impact on alternative splicing efficiency and its NTD disease association point toward this triallelic polymorphism as the strongest candidate for directly contributing to disease causation. Our analysis also identified two other regions of the MTHFD1L gene as harbouring additional variants that are associated with NTDs.
Evidence is accumulating that the MTHFD1L gene is not just important for risk of NTDs. A recent genome wide association study also identified MTHFD1L as a risk factor of coronary artery disease (CAD) in both UK and German populations (Samani et al., 2007). A lead positive SNP in their analysis, rs6922269:A>G, resides in intron 11 with a recombination hotspot occurring between it and rs3832406. Although not genotyped in the current study, its physical location excludes it from the three SNP clusters showing NTD association (Figure 2). Thus, it appears that rs6922269:A>G and rs3832406 represent separate risks in their respective disease associations.
Approximately 41% of populations of North European descent are homozygous for Allele 1. Our Q RT-PCR data predict that these individuals could have up to 70% more functional MTHFD1L and this somehow alters their risk of having an NTD. How does having more MTHFD1L increase one’s risk of having an NTD? Studies have shown that folate metabolism is compartmentalized between the cytoplasm, mitochondrion (Appling, 1991) and more recently the nucleus (Anderson et al., 2007). This compartmentalisation is thought to facilitate the different metabolic roles within the cell (Anderson et al., 2007). The mitochondrial folate pathway is believed to play an important role during embryogenesis by ensuring an adequate supply of formate and glycine (Christensen and MacKenzie, 2006). Formate is the preferred one-carbon donor for purine synthesis. Mitochondrial C1-Tetrahydrofolate Synthase encoded by MTHFD1L supplies this formate by catalyzing the reversible conversion of 10-formyltetrahydrofolate to formate and tetrahydrofolate (Figure 4). Increased production of formate in Allele 1 homozygotes may disrupt the one-carbon flux through the mitochondria and thus interfere with cellular proliferation. Alternatively, a higher level of formate itself may be toxic to cells, which would also result in disrupted cellular production during embryogenesis.
In conclusion, we have identified an MTHFD1L functional polymorphism that appears to influence NTD disease risk by affecting splicing efficiency. We acknowledge that replication of this association in another population is required to demonstrate whether this gene has relevance for NTDs outside of Ireland. The MTHFD1L gene is now implicated in two distinct common diseases i.e., NTDs and CAD. The role of MTHFD1L in disease risk highlights the importance of folate metabolism in maintaining health.
These studies would not be possible without the participation of the affected families, and their recruitment by the Irish Association of Spina Bifida and Hydrocephalus and the Irish Public Health Nurses. The authors also thank Peter Chines and Kristine Krebs for their computational assistance.
Grant sponsors: Intramural research programs of the National Human Genome Research Institute, the National Institute of Child Health and Human Development and the Health Research Board (HRB), Ireland including a HRB Research Project Grant (to A.P-M); Grant number RP/2005/58.