Most newborn screening (NBS) laboratories use second-tier molecular tests for cystic fibrosis (CF) using dried blood spots (DBS). The Centers for Disease Control and Prevention’s NBS Quality Assurance Program offers proficiency testing (PT) in DBS for CF transmembrane conductance regulator (CFTR) gene mutation detection. Extensive molecular characterization on 76 CF patients, family members or screen positive newborns was performed for quality assurance. The coding, regulatory regions and portions of all introns were sequenced and large insertions/deletions were characterized as well as two intronic di-nucleotide microsatellites. For CF patient samples, at least two mutations were identified/verified and four specimens contained three likely CF-associated mutations. Thirty-four sequence variations in 152 chromosomes were identified, five of which were not previously reported. Twenty-seven of these variants were used to predict haplotypes from the major haplotype block defined by HapMap data that spans the promoter through intron 19. Chromosomes containing the F508del (p.Phe508del), G542X (p.Gly542X) and N1303K (p.Asn1303Lys) mutations shared a common haplotype subgroup, consistent with a common ancient European founder. Understanding the haplotype background of CF-associated mutations in the U.S. population provides a framework for future phenotype/genotype studies and will assist in determining a likely cis/trans phase of the mutations without need for parent studies.
Cystic fibrosis (CF) is a childhood-onset inherited disorder that significantly shortens the life span of affected individuals. The CF transmembrane conductance regulator gene (CFTR, OMIM ID: 602421) is located on chromosome 7 and encodes a protein expressed in respiratory epithelial cells, the gastrointestinal system (pancreatic and liver ducts), the vas deferens in males, mucus-secreting cervical cells in females and sweat gland ducts. Defects in the CFTR gene that alter the structure, function or expression of this protein can cause malfunction or disease in the lungs and upper respiratory tract, gastrointestinal tract, pancreas, liver, sweat glands and genitourinary tract. CF is an autosomal recessive disorder that affects approximately 1 in 4000 people of Western European, North American and Australasian descent. More than 1850 mutations have been reported in the CFTR gene; many have been linked to CF, while others have unknown or no known consequences [3,4]. Newborn screening (NBS) for CF first became feasible in 1979, when dried blood spot (DBS) specimens from newborn infants diagnosed with CF were shown to have elevated levels of immunoreactive trypsinogen (IRT), leading to an IRT/IRT screening method. After the discovery of the CFTR gene in 1989, screening expanded to include an IRT/DNA method [7,8]. Data from early screening programs prompted the Centers for Disease Control and Prevention (CDC) to recommend the inclusion of CF in NBS panels.
CDC’s Newborn Screening and Molecular Biology Branch houses the NBS Quality Assurance Program (NSQAP), co-sponsored by the Association of Public Health Laboratories (APHL), which offers proficiency testing (PT) programs for both IRT and CFTR mutation detection [10,11]. At the end of 2010, ~84% of newborns were screened for CF using the IRT/DNA screening algorithm, the first widespread DNA-based screening assay. NSQAP’s CFTR mutation collection contains a wide variety of CF-causing mutations, including the 23 recommended by the American College of Medical Genetics (ACMG) [13,14], and is used to provide quarterly PT challenges to 60 participating laboratories. The aims of this analysis were to verify and identify all CFTR mutations in a diverse group of CF cases and some carriers using comprehensive molecular detection methods. The results of the molecular analysis allowed haplotypes to be predicted for the identified CFTR mutations, providing a detailed molecular framework for the CFTR gene. This framework may help explain why certain genotype combinations are expressed phenotypically more often than others.
NSQAP’s CF Mutation Detection PT program received samples from 72 adult or adolescent CF patients or family members. Four N1303K DBS specimens were provided by the California Department of Public Health NBS Program: two from patients diagnosed with CF and two from patients diagnosed with CF-related metabolic syndrome. CDC’s Human Research Protection Office did not consider this study to be human research because all specimens are de-identified and cannot be traced back to the donor. DNA was extracted either from three 3 mm DBS punches (Whatman Grade 903 filter paper, Kent, United Kingdom) using Qiagen’s QIAamp DNA Micro kit or from 250 μl of blood using the QIAamp DNA Mini kit (Qiagen, Valencia, California). DNA concentrations were determined by quantitative PCR of the RNase P gene (Applied Biosystems, Foster City, California) using a standard curve made from pooled human genomic DNA (Roche Applied Science, Indianapolis, Indiana).
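Quantifying DNA against a standard curve amounts to fitting Ct against log10(concentration) for a dilution series and inverting that line for each unknown. The sketch below shows only this arithmetic; the dilution series and Ct values are invented for illustration, and it does not reproduce the instrument software that was actually used.

```python
import math


def fit_standard_curve(concentrations, cts):
    """Least-squares fit of Ct = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the concentration of an unknown."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Efficiency implied by the slope; a slope near -3.32 means about 100%."""
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical dilution series of pooled genomic DNA (ng/ul) and measured Ct values
standards = [10.0, 1.0, 0.1, 0.01]
standard_cts = [24.00, 27.32, 30.64, 33.96]

slope, intercept = fit_standard_curve(standards, standard_cts)
unknown_conc = concentration_from_ct(28.98, slope, intercept)
print(f"slope={slope:.2f} efficiency={amplification_efficiency(slope):.1%} "
      f"unknown={unknown_conc:.3f} ng/ul")
```

A slope far from about -3.3 (equivalently, an efficiency well away from 100%) would usually prompt re-running the standards before the unknowns are trusted.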
The CFTR gene was amplified and sequenced using VariantSEQr CFTR (set RSS000010013, Applied Biosystems) with the following modifications: amplification with HotStarTaq Master Mix (Qiagen), removal of unused primers and nucleotides with ExoSAP-IT (USB Corporation, Cleveland, Ohio), sequencing with the BigDye Terminator (BDT) Ready Reaction kit v1.1, and purification of sequencing reactions with Applied Biosystems’ BigDye XTerminator. Because the VariantSEQr kit provided incomplete coverage of exons 10, 14 and 15, the intronic mutation c.3717+12191C>T (3849+10kbC→T) and the 3′ UTR, and because exon 1 failed to amplify, published primer sequences were used to detect exon 14 [16,17], exon 15 and the intron 22 mutation [16,17]. Primer sequences for exons 1, 10, 14 and the 3′ UTR were custom designed (Table 1). Amplification of the custom-designed regions used Qiagen’s HotStarTaq Master Mix with 1 μM of forward and reverse primer. Cycling parameters were an initial 10 min denaturation at 95 °C; 34–38 cycles of 95 °C for 30 s, the primer-specific annealing temperature for 30 s and 72 °C for 1 min; a final extension at 72 °C for 8 min; and an indefinite hold at 4 °C (Table 1). Unused primers and nucleotides were removed with ExoSAP-IT. Cycle sequencing reactions consisted of 1 μl of PCR product, 0.5 μl of BDT v1.1, 1.75 μl of 5× sequencing buffer and 3.2 pmol of primer. Sequencing reactions were purified and electrophoresed with the run module BDx_Rapid-Seq36_POP7, and the data were analyzed with SeqScape software (Applied Biosystems). DNA sequences were aligned to the GenBank CFTR genomic reference sequence NG_016465.
The SALSA MLPA kit P091 CFTR (MRC-Holland, Amsterdam, Netherlands) was used with modifications: 5% glycerol was added to the denaturation step, which was extended from 5 to 10 min. Peak sizes were determined with Applied Biosystems’ GeneMapper software using the internal lane standard LIZ500, and the data were imported into MRC-Holland’s Coffalyser software to detect deletions and/or insertions.
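Copy-number calls from MLPA generally rest on a dosage quotient: each probe's peak is normalized to control probes in the same run, and that normalized value is compared with the corresponding value from a normal reference sample. The sketch below is a simplified illustration of that idea, not Coffalyser's algorithm. The probe names, the peak heights and the interpretation cut-offs are all hypothetical assumptions.

```python
from statistics import mean


def dosage_quotients(sample, reference, control_probes):
    """Return a dosage quotient (DQ) for every non-control probe.

    sample and reference map probe name -> peak height. Each peak is first
    normalized to the mean of the control probes in the same run; the
    sample's normalized value is then divided by the reference's.
    """
    sample_norm = mean(sample[p] for p in control_probes)
    reference_norm = mean(reference[p] for p in control_probes)
    return {
        probe: (sample[probe] / sample_norm) / (reference[probe] / reference_norm)
        for probe in sample
        if probe not in control_probes
    }


def interpret_dq(dq):
    """Classify a DQ using illustrative cut-offs (assumed, not kit-specific)."""
    if dq < 0.10:
        return "homozygous deletion"
    if 0.40 <= dq <= 0.65:
        return "heterozygous deletion"
    if 0.80 <= dq <= 1.20:
        return "normal copy number"
    if 1.30 <= dq <= 1.65:
        return "duplication"
    return "ambiguous"


# Hypothetical peak heights; probe names are placeholders, not P091 probe IDs
sample = {"ctrl_A": 1000, "ctrl_B": 1200, "exon_X": 500, "exon_Y": 1000}
reference = {"ctrl_A": 2000, "ctrl_B": 2400, "exon_X": 2000, "exon_Y": 2000}
for probe, dq in dosage_quotients(sample, reference, {"ctrl_A", "ctrl_B"}).items():
    print(probe, round(dq, 2), interpret_dq(dq))
```

The gaps between the ranges are left as "ambiguous" on purpose: values that fall there would normally be repeated rather than forced into a call.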
Microsatellites IVS8CA and IVS17bCA were amplified in a multiplex PCR with Phusion High-Fidelity Master Mix (Finnzymes, Espoo, Finland), using previously described primer sequences modified for fluorescent detection by capillary electrophoresis on the ABI 3730 DNA Analyzer. An allelic ladder constructed from in-house samples was characterized by fragment analysis and sequencing. Peak sizes were determined using the internal lane standard LIZ500, and the number of repeats was determined with GeneMapper software (Applied Biosystems).
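Converting a fragment size into a repeat number with a sequence-verified allelic ladder can be pictured as fitting size against repeat count and rounding each new peak to the nearest whole repeat. The sketch below assumes a linear size-to-repeat relationship of about 2 bp per unit for a dinucleotide repeat. The ladder sizes and the 0.5 bp tolerance are hypothetical, and GeneMapper's own binning works differently in detail.

```python
def fit_ladder(ladder):
    """Fit observed size (bp) = offset + bp_per_repeat * repeats by least squares.

    ladder is a list of (observed_size_bp, repeat_count) pairs from
    sequence-verified ladder alleles.
    """
    n = len(ladder)
    mean_r = sum(r for _, r in ladder) / n
    mean_s = sum(s for s, _ in ladder) / n
    srr = sum((r - mean_r) ** 2 for _, r in ladder)
    srs = sum((r - mean_r) * (s - mean_s) for s, r in ladder)
    bp_per_repeat = srs / srr
    offset = mean_s - bp_per_repeat * mean_r
    return offset, bp_per_repeat


def call_repeats(size_bp, offset, bp_per_repeat, max_deviation_bp=0.5):
    """Return the nearest whole repeat number, or None if the peak is off-ladder."""
    repeats = round((size_bp - offset) / bp_per_repeat)
    expected = offset + bp_per_repeat * repeats
    if abs(size_bp - expected) > max_deviation_bp:
        return None
    return repeats


# Hypothetical ladder for a dinucleotide repeat: about 2 bp per repeat unit
ladder = [(210.0, 15), (214.0, 17), (226.0, 23)]
offset, step = fit_ladder(ladder)
print(call_repeats(214.3, offset, step))  # 17
print(call_repeats(215.0, offset, step))  # None: falls between alleles
```

Returning None rather than forcing a call mirrors the practical point that a peak lying between ladder alleles needs review, for example by sequencing.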
Analysis of microsatellite IVS17bTA was attempted using previously defined primers with several different polymerases (AmpliTaq Gold DNA Polymerase with Gold Buffer and Buffer II (Applied Biosystems); Phusion HF and Phusion GC Master Mix (Finnzymes); Herculease II Fusion DNA Polymerase (Stratagene); EmeraldAmp Max HS PCR Master Mix (Takara); and HotStarTaq Master Mix (Qiagen)) and PCR additives (3–5% DMSO, 8% glycerol and 1 M betaine). However, every condition produced a cluster of 2–5 fragments for each allele, so IVS17bTA alleles could not be sized reliably.
Linkage disequilibrium (LD) data from the International HapMap Project were examined using the HapMap Genome Browser (Phases 1, 2 and 3, merged genotypes and frequencies). An LD plot of the logarithm of the odds (LOD) score for the Human Genome Diversity Cell Line Panel (HGDP) CEPH population, generated with the HapMap preset defaults, was used to characterize the LD between each marker pair in this population (Fig. 1). The color of the diamond where two SNPs intersect indicates the level of LD: darker shades indicate higher LD and lighter shades lower LD. Gray regions represent missing data points. Haplotypes were predicted from the 34 sequence variants and two intronic microsatellites with an expectation-maximization (EM) algorithm in JMP Genomics (SAS, Cary, North Carolina) using the PROC HAPLOTYPE procedure. The CFTR polymorphisms segregated into two haplotype blocks, which were compared with the HapMap LD plot. A contingency χ2 test showed that the two haplotype blocks assort independently, and CFTR mutations were assigned to specific predicted haplotypes by association frequency.
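The core EM idea behind haplotype estimation from unphased genotypes is clearest in the smallest case, two biallelic loci. Only double heterozygotes have ambiguous phase (00/11 versus 01/10), and each EM iteration splits them in proportion to the current haplotype frequency estimates. This is a minimal illustrative sketch, not the PROC HAPLOTYPE implementation, which handles many loci and multi-allelic markers such as the microsatellites. The genotype data below are invented.

```python
HAPLOTYPES = ("00", "01", "10", "11")  # allele at locus A, then locus B


def _alleles(genotype):
    """Split a 0/1/2 genotype (copies of allele 1) into two allele calls."""
    return {0: ("0", "0"), 1: ("0", "1"), 2: ("1", "1")}[genotype]


def em_haplotype_frequencies(genotypes, max_iter=200, tol=1e-10):
    """Estimate two-locus haplotype frequencies from unphased genotypes.

    genotypes: list of (gA, gB) pairs, each 0, 1 or 2.
    """
    freqs = {h: 0.25 for h in HAPLOTYPES}
    n_chrom = 2 * len(genotypes)
    for _ in range(max_iter):
        counts = {h: 0.0 for h in HAPLOTYPES}
        for ga, gb in genotypes:
            if ga == 1 and gb == 1:
                # E-step: weight the cis (00/11) and trans (01/10) phasings
                cis = freqs["00"] * freqs["11"]
                trans = freqs["01"] * freqs["10"]
                w = cis / (cis + trans) if cis + trans > 0 else 0.5
                counts["00"] += w
                counts["11"] += w
                counts["01"] += 1 - w
                counts["10"] += 1 - w
            else:
                # At most one locus is heterozygous, so phase is known
                for a, b in zip(_alleles(ga), _alleles(gb)):
                    counts[a + b] += 1
        new = {h: c / n_chrom for h, c in counts.items()}  # M-step
        converged = max(abs(new[h] - freqs[h]) for h in HAPLOTYPES) < tol
        freqs = new
        if converged:
            break
    return freqs


# Hypothetical data: phase-known homozygotes suggest 00 and 11 predominate
data = [(0, 0)] * 10 + [(2, 2)] * 10 + [(1, 1)] * 5
print({h: round(f, 3) for h, f in em_haplotype_frequencies(data).items()})
```

Because the phase-known individuals carry only 00 and 11 haplotypes, the iterations push the ambiguous double heterozygotes toward the cis phasing, which is the same mechanism that lets a multi-locus EM assign haplotypes with high probability in a region of strong LD.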
DNA sequencing and large-deletion analysis of the CFTR gene were performed on 152 unique chromosomes from CF patient or carrier specimens to identify and verify CF-causing mutations and likely benign polymorphisms. In addition to the 23 ACMG-recommended mutations, 19 other mutations are represented. Supplemental Table S1 contains a complete list of mutations with HGVS standardized nomenclature. Four specimens contained three mutations (Table 2); in all cases, however, the third mutation was uncommon, was associated with limited clinical information in the CF mutation database, and was not one of the 23 ACMG-recommended mutations. The I1027T mutation, the third mutation in two specimens, has been found in cis with F508del at a frequency greater than 5% in the Brittany population (western France). Thirty-four sequence variants with no known consequences were also identified, most often in non-coding regions of the CFTR gene. Five of these sequence variants had not been described previously in either the CF Mutation Database or NCBI’s dbSNP [4,20]. These new variants are located in non-coding regions: c.1209+43T>G (intron 9, ss432791813), c.1767-231T>C (intron 13, ss432791824), c.1767-132A>G (intron 13, ss432791829), c.*94C>T (3′ UTR, ss432791834) and c.*1823C>T (3′ UTR, ss432791840).
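Because the text mixes legacy CFTR names (e.g. 3849+10kbC→T, 1717-1G→A) with HGVS c. nomenclature (e.g. c.3717+12191C>T), it can help to see how the two numbering systems relate. In legacy CFTR cDNA numbering, the A of the ATG start codon is nucleotide 133, whereas in HGVS it is c.1, so coding positions differ by 132. The sketch below handles only coding positions with an optional intronic offset. It deliberately excludes 5′ UTR positions (such as −102T→A), where the absence of a position 0 complicates the conversion, and it is not a full nomenclature parser.

```python
import re

# Legacy CFTR cDNA numbering: the A of ATG is nucleotide 133; in HGVS it is c.1.
LEGACY_OFFSET = 132

# A coding position with an optional intronic offset, e.g. "1585-1" or "3717+12191"
_POSITION = re.compile(r"^(\d+)([+-]\d+)?$")


def _parse(position):
    match = _POSITION.match(position)
    if match is None:
        raise ValueError(f"unsupported position: {position!r}")
    return int(match.group(1)), match.group(2) or ""


def hgvs_to_legacy(position):
    """Convert an HGVS c. coding position (c.1 or later) to legacy numbering."""
    base, intronic = _parse(position)
    if base < 1:
        raise ValueError("c.0 does not exist")
    return f"{base + LEGACY_OFFSET}{intronic}"


def legacy_to_hgvs(position):
    """Convert a legacy cDNA position in the coding region to HGVS c. numbering."""
    base, intronic = _parse(position)
    if base - LEGACY_OFFSET < 1:
        raise ValueError("5' UTR positions are not handled in this sketch")
    return f"{base - LEGACY_OFFSET}{intronic}"


print(hgvs_to_legacy("3717+12191"))  # 3849+12191, i.e. legacy 3849+10kbC>T
print(legacy_to_hgvs("1717-1"))      # 1585-1
print(legacy_to_hgvs("3120+1"))      # 2988+1
```

Intronic offsets carry over unchanged because both systems anchor them to the same exon boundary; only the exonic base position shifts.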
The analysis of the CFTR sequence variants and mutations identified in this study was based on haplotype blocks predicted from the HGDP-CEPH population of the International HapMap Project. The HapMap data predict one major haplotype block that spans the promoter through exon 22 (Fig. 1). Twenty-five sequence variants and two microsatellites were characterized in this major block and used to predict haplotypes with greater than 95% probability. In four samples (2184delA/394delTT, 3905insT/1248+1G→A, 3120+1G→A/−102T→A and 3120+1G→A/L467P), the predicted haplotype could not be unambiguously assigned to either of the two CF-causing mutations. The minor HapMap block, spanning intron 23 through the 3′ UTR, was randomly associated with the block 1 haplotypes (p=0.850) and was therefore not informative (data not shown).
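A contingency χ2 test can check whether haplotypes in two blocks are associated: haplotype classes from one block form the rows, classes from the other block form the columns, and observed counts are compared with those expected under independence. The sketch below is a plain Pearson test without continuity correction, computing the p-value from a series expansion of the incomplete gamma function. It is not necessarily the exact procedure used in JMP Genomics, and the 2×2 counts are invented.

```python
import math


def chi2_sf(x, dof):
    """Upper-tail probability of a chi-square distribution.

    Uses the series for the regularized lower incomplete gamma function
    P(dof/2, x/2); adequate for moderate statistics, not for tiny p-values.
    """
    if x <= 0:
        return 1.0
    a, z = dof / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while n < 10_000 and term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi2_contingency_test(table):
    """Pearson chi-square test of independence (no continuity correction).

    table: list of rows of observed counts; every row and column total
    must be non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof, chi2_sf(stat, dof)


# Hypothetical counts: rows = block 1 haplotype classes, columns = block 2 classes
stat, dof, p = chi2_contingency_test([[10, 20], [30, 40]])
print(f"chi2={stat:.3f}, dof={dof}, p={p:.3f}")
```

A large p-value, like the p=0.850 reported for the minor block, means the data give no evidence that block 2 haplotypes track block 1 haplotypes, so block 2 adds no information for assigning mutations to haplotypes.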
Among the 144 haplotypes assigned to a CF-causing mutation, there were 34 classes, four of which contained the F508del mutation (N=63 chromosomes) (Table 3). Three F508del-containing haplotypes were identical except for the number of repeats at the IVS8CA microsatellite, and the fourth was a probable product of a recombination event. The two most common F508del-containing haplotypes differed only in the number of IVS8CA repeats: 32 chromosomes carried 17 repeats and 29 chromosomes carried 23 repeats. The F508del haplotype with 23 IVS8CA repeats was identical to the predicted haplotypes associated with G542X (N=6 of 6), N1303K (N=6 of 6), del Ex17a, b and 18 (N=1 of 1) and 3849+10 kb C→T (N=1 of 2).
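Statements such as "identical except for the number of IVS8CA repeats" amount to comparing haplotypes marker by marker and listing where they disagree. The sketch below makes that comparison explicit. The four-marker panel and all allele values are hypothetical placeholders, not the haplotypes in Table 3.

```python
# Hypothetical marker panel; the real block used 25 sequence variants plus 2 microsatellites
MARKERS = ("SNP_1", "SNP_2", "IVS8CA", "SNP_3")


def differing_markers(hap1, hap2, markers=MARKERS):
    """List the markers at which two haplotypes carry different alleles."""
    if not len(hap1) == len(hap2) == len(markers):
        raise ValueError("haplotypes must have one allele per marker")
    return [m for m, a, b in zip(markers, hap1, hap2) if a != b]


def differ_only_at(hap1, hap2, marker, markers=MARKERS):
    """True if the haplotypes are identical except at the named marker."""
    return differing_markers(hap1, hap2, markers) == [marker]


# Microsatellites stored as repeat counts, SNPs as base calls (all invented)
f508del_17 = ("C", "T", 17, "G")
f508del_23 = ("C", "T", 23, "G")
other = ("A", "T", 23, "A")

print(differ_only_at(f508del_17, f508del_23, "IVS8CA"))  # True
print(differing_markers(f508del_17, other))  # ['SNP_1', 'IVS8CA', 'SNP_3']
```

Treating the microsatellite separately makes sense because multi-allelic repeat markers mutate faster than SNPs, so haplotypes differing only at IVS8CA are plausibly variants of one ancestral background.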
The R117H mutation, known for its variable disease expression, was found in three specimens. Two of these had nearly identical predicted haplotypes that included the 5T variant, differing only in the number of IVS8CA repeats (15 versus 16). The third R117H predicted haplotype contained a 7T variant and differed from the other two R117H haplotypes at multiple positions. The haplotype carrying R117H with the 7T variant was identical to the 1717-1G→A-containing haplotype; however, the significance of this is unknown, because each was found on only one chromosome (Table 3). A single chromosome from a patient with CF-related metabolic syndrome carried the 5T variant without R117H and had a unique haplotype quite different from all of the R117H-containing haplotypes.
The goal of this analysis was to characterize the 72 PT specimens in NSQAP’s CFTR Mutation Detection repository, which are used to help NBS laboratories ensure accuracy when they use a molecular second-tier test for CF. This comprehensive analysis identified 34 classes of predicted haplotypes that span the CFTR gene from the promoter through intron 18. These haplotypes are consistent with the LD defined by the International HapMap Project. Just over half (N=77) of the 152 chromosomes examined shared a common haplotype subgroup associated with three of the most prevalent CF-causing mutations: F508del, G542X and N1303K.
A previous study assessing the origin of 27,177 CF chromosomes from 29 European and three North African countries described the five most common CF-causing mutations: F508del (66.8%), G542X (2.6%), N1303K (1.6%), G551D (1.5%) and W1282X (1.0%). Similarly, Bobadilla et al. reported the five most common CF-causing mutations in the U.S. as F508del (68.6%), G542X (2.4%), G551D (2.1%), W1282X (1.4%) and N1303K (1.3%). Hence, F508del, G542X and N1303K are among the most common mutations in Caucasians from Europe and the United States. A study of Spanish CF patients suggested an ancient common origin for these three mutations by showing that they all carried a common haplotype subgroup defined by the IVS8CA, IVS17bCA and IVS17bTA microsatellites. The haplotypes identified in the present study are consistent with the Spanish findings, again showing that F508del, G542X and N1303K share a common haplotype subgroup. Of our 62 F508del-containing chromosomes (excluding the one probable recombinant), 29 have predicted haplotypes that are identical, across 27 polymorphisms from the promoter to intron 21, to the G542X-containing (N=6) and N1303K-containing (N=6) haplotypes. The remaining 33 F508del-containing haplotypes are identical to this common haplotype except that they carry 17 or 21 di-nucleotide repeats, rather than 23, at the intron 9 IVS8CA microsatellite. Taken together, these data from a heterogeneous group of CF patients living in the United States support the existence of a common ancient European ancestral haplotype on which these three CF mutations arose independently.
A recent study examined the association of common CFTR gene variants with body composition and survival in a non-CF population in rural Ghana. As expected, LD across the CFTR gene in this Ghanaian population was weaker than in populations of European descent; however, the authors reported associations of specific intron 11 haplotypes (composed of four SNPs) with younger versus older study participants, as well as with lower or higher weight in children under 5 years old. Although further studies are needed to determine whether these associations arose by chance or have any direct influence on health, they provide further motivation to understand the molecular framework of the CFTR gene.
Molecular analysis of CF, as with many autosomal recessive disorders, is complicated by the imperfect correlation between mutations in the CFTR gene and the CF phenotype [26,27]. To better understand the consequences of the more than 1850 CFTR mutations, Rowntree and Harris classified CFTR mutations into five groups, each describing the mechanism by which a group of mutations disrupts CFTR function. To further explain why phenotype and genotype are difficult to correlate in a supposedly simple “single-gene” disorder, Dipple and McCabe proposed two functional thresholds relating mutant protein function to phenotype. When protein function is below the first threshold, a severe phenotype is always observed, whereas when protein function is above the second threshold, the phenotype is consistently mild. Between these two thresholds, mutations do not necessarily correlate with phenotype and should therefore be viewed as “complex traits”. Complex traits are influenced by functional activity thresholds, modifier genes and systems dynamics, which blur genotype-phenotype correlations [29–31]. In CF, numerous studies have also described modulatory effects of other genes on the severity of the phenotype (e.g. mannose-binding lectin 2 and transforming growth factor beta 1) [27,32].
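The two-threshold idea attributed to Dipple and McCabe can be stated as a simple piecewise rule on residual protein function. The sketch below is only a conceptual illustration. The threshold values are hypothetical inputs, not measured CFTR function cut-offs, and the model itself is qualitative.

```python
def threshold_phenotype(residual_function, severe_below, mild_above):
    """Map residual protein function (0 to 1) to a phenotype class.

    Below the first threshold the phenotype is always severe, above the
    second it is consistently mild, and in between it behaves as a complex
    trait. The threshold values are hypothetical inputs.
    """
    if not 0.0 <= severe_below < mild_above <= 1.0:
        raise ValueError("expected 0 <= severe_below < mild_above <= 1")
    if residual_function < severe_below:
        return "severe"
    if residual_function > mild_above:
        return "mild"
    return "variable: modifier genes and other factors contribute"


for f in (0.02, 0.15, 0.60):
    print(f, threshold_phenotype(f, severe_below=0.05, mild_above=0.30))
```

The middle branch is where genotype alone stops being predictive, which is why modifier genes and genomic context (such as the haplotype background studied here) matter most for mutations whose residual function falls between the thresholds.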
To further define the phenotypic heterogeneity among patients who carry the same CF-causing mutations, several molecular studies have analyzed the genomic context of disease-causing mutations. These studies found that the phenotypic severity of a CF-causing mutation can be affected by the genomic context of the CFTR gene, as seen when R117H is in cis with the 5T variant and when S1251N is in cis with F508C [33,34]. Thus, the haplotypes defined in this study may help newborn screeners and clinicians predict the cis/trans phase of mutations and of poly-T variants with variable expression without requiring parent studies. For example, when a newborn specimen is positive for R117H and for F508del, G542X or N1303K, and also carries both a 5T and a 9T variant, a clinician could use haplotype information to infer with high probability that the 9T variant is in cis with F508del, G542X or N1303K and not with R117H (Table 3). This study also suggests that R117H may reside on the opposite chromosome from nearly all of the mutations listed in Table 3, because it occurs on a different polymorphism background. Additionally, more extensive haplotype studies may allow researchers to determine what phenotypic effects, if any, result when a particular variant such as 5T is in cis or in trans with an identified CF-causing mutation.
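The phase reasoning in the R117H example can be written as a small consistency check: try both ways of pairing the two poly-T alleles with the two mutations, and accept an assignment only if exactly one pairing matches the backgrounds previously observed for each mutation. The lookup table below is an illustrative assumption. It is drawn only from the example in the text (9T with F508del, G542X or N1303K) and from the three R117H chromosomes described in this study (5T or 7T). It is not the full Table 3 and is not a clinical tool.

```python
from itertools import permutations

# Hypothetical lookup of poly-T alleles observed in cis with each mutation
OBSERVED_POLY_T = {
    "F508del": {"9T"},
    "G542X": {"9T"},
    "N1303K": {"9T"},
    "R117H": {"5T", "7T"},
}


def infer_poly_t_phase(mut1, mut2, t_alleles):
    """Return {mutation: poly-T allele in cis} if exactly one pairing fits, else None."""
    if mut1 not in OBSERVED_POLY_T or mut2 not in OBSERVED_POLY_T:
        return None  # no haplotype information for at least one mutation
    consistent = set()
    for t_a, t_b in permutations(t_alleles):
        if t_a in OBSERVED_POLY_T[mut1] and t_b in OBSERVED_POLY_T[mut2]:
            consistent.add((t_a, t_b))
    if len(consistent) != 1:
        return None  # ambiguous or inconsistent with the lookup table
    t_a, t_b = consistent.pop()
    return {mut1: t_a, mut2: t_b}


print(infer_poly_t_phase("R117H", "F508del", ("5T", "9T")))
print(infer_poly_t_phase("R117H", "F508del", ("9T", "9T")))
```

Returning None whenever zero or several pairings fit keeps the sketch conservative: a parent study would still be the way to resolve phase in those cases.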
This study has defined 144 CFTR haplotypes associated with many CF-causing mutations, laying the groundwork for future research on the molecular structure of the CFTR gene and its potential influence on phenotypic heterogeneity in CF patients. In addition, these extensively characterized specimens will enhance the quality of the PT challenges offered by NSQAP to participating NBS laboratories.
The authors wish to thank Dr Cédric Le Marechal from Laboratoire de génétique moléculaire et d’histocompatibilité, Génétique moléculaire et génétique épidémiologique, CHU Hôpital Morvan in Brest, France for helpful comments and review. Sean Mochal was funded by the Research Participation Program at the Centers for Disease Control and Prevention (CDC), National Center for Environmental Health’s Division of Laboratory Sciences, administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the U.S. Department of Energy and CDC. Dr. Philip Farrell was supported by NIH grant DK 34108. All work performed was supported by CDC.
Author disclosure: Dr. Hannon, Emeritus Chief (retired) from the Newborn Screening and Molecular Biology Branch of CDC serves on the NBS Scientific Advisory Council for Advanced Liquid Logic, Inc., Research Triangle Park, North Carolina, and the Georgia Governor’s Public Health Advisory Council. Dr. Hannon also provides consulting services to National Newborn Screening and Genetic Resource Center in Austin, Texas and PerkinElmer, Inc. in Waltham, Massachusetts.
Supplementary materials related to this article can be found online at doi:10.1016/j.ymgme.2011.10.013.