In the past few years, a plethora of methods for testing the association of rare variants with phenotypes have been proposed. These methods aggregate information from multiple rare variants across one or more genomic regions, but there is little consensus as to which method is most effective. The weighting scheme adopted when aggregating information across variants is one of the primary determinants of effectiveness. Here we present a systematic evaluation of multiple weighting schemes through a series of simulations intended to mimic large sequencing studies of a quantitative trait. We evaluate existing phenotype-independent and phenotype-dependent methods, as well as weights estimated by penalized regression approaches, including Lasso, Elastic Net and SCAD. We find that the difference in power between phenotype-dependent schemes is negligible when high-quality functional annotations are available. When functional annotations are unavailable or incomplete, all methods suffer from power loss; however, the variable selection methods outperform the others at the cost of increased computational time. Therefore, in the absence of good annotation, we recommend applying variable selection methods (which can be viewed as “statistical annotation”) to the top regions implicated by a phenotype-independent weighting scheme. Further, once a region is implicated, variable selection can help to identify potential causal SNPs for biological validation. These findings are supported by an analysis of a high-coverage targeted sequencing study of 1898 individuals.
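To make "weighting scheme" concrete, here is a minimal sketch of one phenotype-independent scheme: Madsen-Browning-style weights 1/sqrt(n·q(1−q)) feeding a weighted burden score. The specific weight, the add-one smoothing of the allele frequency q (estimated here from all individuals, as one might for a quantitative trait) and the function names are assumptions for illustration; the phenotype-dependent and penalized-regression (Lasso, Elastic Net, SCAD) weights evaluated in the study are not shown.

```python
import math
from typing import List, Sequence


def madsen_browning_weights(genotypes: Sequence[Sequence[int]]) -> List[float]:
    """Weight for each variant: 1 / sqrt(n * q * (1 - q)).

    genotypes[i][j] is the minor-allele count (0, 1 or 2) of individual i at variant j;
    q is a smoothed minor allele frequency estimated from all n individuals.
    """
    n = len(genotypes)
    weights = []
    for j in range(len(genotypes[0])):
        minor_count = sum(row[j] for row in genotypes)
        q = (minor_count + 1) / (2 * n + 2)  # add-one smoothing keeps q > 0 for unobserved variants
        weights.append(1.0 / math.sqrt(n * q * (1.0 - q)))  # rarer variants get larger weights
    return weights


def burden_scores(genotypes: Sequence[Sequence[int]], weights: Sequence[float]) -> List[float]:
    """Collapse each individual's variants in the region into one weighted burden score."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


# Hypothetical toy region: 4 individuals, 2 variants.
G = [[1, 0], [0, 2], [0, 1], [0, 1]]
w = madsen_browning_weights(G)  # approximately [1.25, 1.0]
scores = burden_scores(G, w)    # approximately [1.25, 2.0, 1.0, 1.0]
```

The burden score would then be regressed against the trait; a phenotype-dependent scheme would instead let the data determine the weights.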
rare variants; association; weighting; variable selection; variant annotation
It has been repeatedly shown that in case-control association studies, analysis of a secondary trait that ignores the original sampling scheme can produce highly biased risk estimates. Although a number of approaches have been proposed to properly analyze secondary traits, most approaches fail to reproduce the marginal logistic model assumed for the original case-control trait and/or do not allow for interaction between the secondary trait and the genotype marker on primary disease risk. In addition, the flexible handling of covariates remains challenging. We present a general retrospective likelihood framework to perform association testing for both binary and continuous secondary traits that respects marginal models and incorporates the interaction term. We provide a computational algorithm, based on a reparameterized approximate profile likelihood, for obtaining the maximum likelihood (ML) estimate and its standard error for the genetic effect on the secondary trait in the presence of covariates. For completeness, we also present an alternative pseudo-likelihood method for handling covariates. We describe extensive simulations to evaluate the performance of the ML estimator in comparison with the pseudo-likelihood and other competing methods.
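The bias from ignoring the sampling scheme can be seen in a small simulation. This is a hypothetical sketch, not the retrospective likelihood method described above: a secondary trait X is independent of genotype G in the population, but both raise the risk of the primary disease, so 1:1 case-control sampling induces an association between X and G that a naive regression picks up. All parameter values (allele frequency 0.3, logistic coefficients, sample size) are assumptions, and the G-by-X interaction is omitted.

```python
import math
import random
from typing import List, Tuple


def ols_slope(x: List[float], y: List[float]) -> float:
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n_pop: int = 200_000, seed: int = 1) -> Tuple[float, float]:
    """Return (population slope, naive case-control slope) of secondary trait X on genotype G."""
    rng = random.Random(seed)
    g, x, d = [], [], []
    for _ in range(n_pop):
        gi = sum(rng.random() < 0.3 for _ in range(2))  # additive genotype, allele frequency 0.3
        xi = rng.gauss(0.0, 1.0)                          # secondary trait, unrelated to G
        logit = -4.0 + gi + xi                            # primary disease risk depends on both
        g.append(float(gi))
        x.append(xi)
        d.append(rng.random() < 1.0 / (1.0 + math.exp(-logit)))
    cases = [i for i in range(n_pop) if d[i]]
    controls = rng.sample([i for i in range(n_pop) if not d[i]], len(cases))
    sample = cases + controls                             # 1:1 case-control ascertainment
    pop_slope = ols_slope(g, x)
    naive_slope = ols_slope([g[i] for i in sample], [x[i] for i in sample])
    return pop_slope, naive_slope


pop_slope, naive_slope = simulate()
```

Under these assumptions the naive slope is clearly positive while the population slope is near zero, which is the kind of distortion a likelihood that conditions on case-control ascertainment is meant to remove.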
Editor’s Highlight: Byproducts of constitutive metabolism may themselves be toxic, complicating the risk assessment of the same chemicals encountered from external sources. The application of stable labeled compounds offers insight into the source of chemicals producing biological effects and provides a basis to quantify the contribution of exogenous exposure to biological events. This report describes the concentration-dependent contributions of exogenous [13C2]-acetaldehyde and endogenously produced acetaldehyde to adduct formation in human lymphoblastoid cells in vitro. — Jeffrey Fisher
The dose-response relationship for biomarkers of exposure (N2-ethylidene-dG adducts) and effect (cell survival and micronucleus formation) was determined across 4.5 orders of magnitude (50 nM–2 mM) using 12 h exposures of human lymphoblastoid TK6 cells to [13C2]-acetaldehyde. There was a clear increase in exogenous N2-ethylidene-dG formation at exposure concentrations ≥ 1 µM, whereas endogenous adducts remained nearly constant across all exposure concentrations, averaging 3.0 adducts/10⁷ dG. Exogenous adducts were lower than endogenous adducts at concentrations ≤ 10 µM and greater than endogenous adducts at concentrations ≥ 250 µM. When endogenous and exogenous adducts were summed, statistically significant increases in total adduct formation over the endogenous background occurred at 50 µM. Cell survival and micronucleus formation were monitored across the exposure range, and statistically significant decreases in cell survival and increases in micronucleus formation occurred at ≥ 1000 µM. This research supports the hypothesis that endogenously produced reactive species, including acetaldehyde, are always present and account for the majority of the observed biological effects following very low exposures to exogenous acetaldehyde. These data can replace default assumptions of linear extrapolation to very low doses of exogenous acetaldehyde for risk prediction.
acetaldehyde; DNA adduct; micronucleus; biomarker of exposure; biomarker of effect; liquid chromatography–mass spectrometry
Background: Benchmark dose (BMD) modeling computes the dose associated with a prespecified response level. While offering advantages over traditional points of departure (PODs), such as no-observed-adverse-effect levels (NOAELs), BMD methods have lacked consistency and transparency in application, interpretation, and reporting in human health assessments of chemicals.
Objectives: We aimed to apply a standardized process for conducting BMD modeling to reduce inconsistencies in model fitting and selection.
Methods: We evaluated 880 dose–response data sets for 352 environmental chemicals with existing human health assessments. We calculated benchmark doses and their lower limits [10% extra risk, or change in the mean equal to 1 SD (BMD/L10/1SD)] for each chemical in a standardized way with prespecified criteria for model fit acceptance. We identified study design features associated with acceptable model fits.
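The two benchmark response definitions can be made concrete with two simple models. This is a minimal sketch under stated assumptions: a hypothetical quantal-linear model P(d) = g + (1 − g)(1 − e^(−βd)), for which 10% extra risk has a closed form used as a check, and a linear continuous mean model μ(d) = a + b·d with constant SD. The model choices, parameter values and function names are illustrative only; the study's model set and fit-acceptance criteria are not reproduced, and the BMDL (a lower confidence limit) requires fitting uncertainty that is not shown.

```python
import math
from typing import Callable


def extra_risk(p: Callable[[float], float], dose: float) -> float:
    """Extra risk over background: (P(d) - P(0)) / (1 - P(0))."""
    p0 = p(0.0)
    return (p(dose) - p0) / (1.0 - p0)


def bmd_extra_risk(p: Callable[[float], float], bmr: float = 0.10, tol: float = 1e-10) -> float:
    """Dose giving extra risk = bmr, found by bisection; assumes P(d) increases with dose."""
    lo, hi = 0.0, 1.0
    while extra_risk(p, hi) < bmr:  # widen the bracket until it contains the BMD
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if extra_risk(p, mid) < bmr:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bmd_one_sd(slope: float, sd: float) -> float:
    """Linear continuous model mu(d) = a + slope * d: dose at which the mean shifts by one SD."""
    return sd / abs(slope)


# Hypothetical quantal-linear model parameters.
BACKGROUND, BETA = 0.05, 0.02


def quantal_linear(d: float) -> float:
    return BACKGROUND + (1.0 - BACKGROUND) * (1.0 - math.exp(-BETA * d))


bmd10 = bmd_extra_risk(quantal_linear)
closed_form = -math.log(0.9) / BETA  # for this model, extra risk = 1 - exp(-beta * d)
```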
Results: We derived values for 255 (72%) of the chemicals. Batch-calculated BMD/L10/1SD values were significantly and highly correlated (R² of 0.95 and 0.83, respectively, n = 42) with PODs previously used in human health assessments, with values similar to reported NOAELs. Specifically, the median ratio of BMDs10/1SD:NOAELs was 1.96, and the median ratio of BMDLs10/1SD:NOAELs was 0.89. We also observed a significant trend of increasing model viability with increasing number of dose groups.
Conclusions: BMD/L10/1SD values can be calculated in a standardized way for use in health assessments on a large number of chemicals and critical effects. This facilitates the exploration of health effects across multiple studies of a given chemical, or across chemicals when they need to be compared, and provides greater transparency and efficiency than current approaches.
Citation: Wignall JA, Shapiro AJ, Wright FA, Woodruff TJ, Chiu WA, Guyton KZ, Rusyn I. 2014. Standardizing benchmark dose calculations to improve science-based decisions in human health assessments. Environ Health Perspect 122:499–505; http://dx.doi.org/10.1289/ehp.1307539
Background: Quantitative estimation of toxicokinetic variability in the human population is a persistent challenge in risk assessment of environmental chemicals. Traditionally, interindividual differences in the population are accounted for by default assumptions or, in rare cases, are based on human toxicokinetic data.
Objectives: We evaluated the utility of genetically diverse mouse strains for estimating toxicokinetic population variability for risk assessment, using trichloroethylene (TCE) metabolism as a case study.
Methods: We used data on oxidative and glutathione conjugation metabolism of TCE in 16 inbred and 1 hybrid mouse strains to calibrate and extend existing physiologically based pharmacokinetic (PBPK) models. We added one-compartment models for glutathione metabolites and a two-compartment model for dichloroacetic acid (DCA). We used a Bayesian population analysis of interstrain variability to quantify variability in TCE metabolism.
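The simplest version of a one-compartment metabolite model can be sketched as follows. This is an illustrative assumption, not the calibrated PBPK model: the parent is cleared only by first-order conversion to a single metabolite (rate k_f), and the metabolite is eliminated first-order (rate k_e). The analytic (Bateman) solution is cross-checked against explicit Euler integration; all parameter values and names are hypothetical.

```python
import math


def metabolite_amount(t: float, dose: float, k_f: float, k_e: float) -> float:
    """Bateman solution of dA_m/dt = k_f * dose * exp(-k_f * t) - k_e * A_m, with A_m(0) = 0."""
    if math.isclose(k_f, k_e):
        return dose * k_f * t * math.exp(-k_f * t)  # limiting case when the two rates coincide
    return dose * k_f / (k_e - k_f) * (math.exp(-k_f * t) - math.exp(-k_e * t))


def metabolite_amount_euler(t_end: float, dose: float, k_f: float, k_e: float,
                            dt: float = 1e-4) -> float:
    """The same one-compartment model integrated with explicit Euler steps, as a cross-check."""
    a_parent, a_met = dose, 0.0
    for _ in range(round(t_end / dt)):
        formed = k_f * a_parent * dt        # parent converted to metabolite in this step
        a_parent -= formed
        a_met += formed - k_e * a_met * dt  # first-order elimination of the metabolite
    return a_met


# Hypothetical parameters: dose in arbitrary amount units, rates per hour, time in hours.
analytic = metabolite_amount(5.0, 100.0, 0.8, 0.3)
numeric = metabolite_amount_euler(5.0, 100.0, 0.8, 0.3)
```

In a population analysis, rate constants like k_f and k_e would vary by strain, which is what produces the spread in metabolite concentration–time profiles.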
Results: Concentration–time profiles for TCE metabolism to oxidative and glutathione conjugation metabolites varied across strains. Median predictions for the metabolic flux through oxidation were less variable (5-fold range) than that through glutathione conjugation (10-fold range). For oxidative metabolites, median predictions of trichloroacetic acid production were less variable (2-fold range) than DCA production (5-fold range), although the uncertainty bounds for DCA exceeded the predicted variability.
Conclusions: Population PBPK modeling of genetically diverse mouse strains can provide useful quantitative estimates of toxicokinetic population variability. When extrapolated to lower doses more relevant to environmental exposures, mouse population-derived variability estimates for TCE metabolism closely matched population variability estimates previously derived from human toxicokinetic studies with TCE, highlighting the utility of mouse interstrain metabolism studies for addressing toxicokinetic variability.
Citation: Chiu WA, Campbell JL Jr, Clewell HJ III, Zhou YH, Wright FA, Guyton KZ, Rusyn I. 2014. Physiologically based pharmacokinetic (PBPK) modeling of interstrain variability in trichloroethylene metabolism in the mouse. Environ Health Perspect 122:456–463; http://dx.doi.org/10.1289/ehp.1307623
Little is known for certain about the genetics of schizophrenia. The advent of genome-wide association has been widely anticipated as holding promise as a means to identify reproducible DNA sequence variation associated with this important and debilitating disorder.
A total of 738 cases with DSM-IV schizophrenia (all participants in the CATIE study) and 733 group-matched controls were genotyped for 492,900 single nucleotide polymorphisms (SNPs) using the Affymetrix 500K two-chip genotyping platform plus a custom 164K fill-in chip. Following multiple quality control steps for both subjects and SNPs, logistic regression analyses were used to assess the evidence for association of all SNPs with schizophrenia.
We identified a number of promising SNPs for follow-up studies, but no SNP or multi-marker combination of SNPs achieved genome-wide statistical significance. Although a few signals coincided with genomic regions previously implicated in schizophrenia, chance could not be excluded.
These data do not provide evidence for the involvement of any genomic region with schizophrenia detectable with moderate sample size. However, planned GWAS for response phenotypes and the inclusion of individual phenotype and genotype data from this study in meta-analyses hold promise for the eventual identification of susceptibility and protective variants.
schizophrenia; genome-wide association; CATIE
Motivation: Scientists and regulators are often faced with complex decisions, where use of scarce resources must be prioritized using collections of diverse information. The Toxicological Prioritization Index (ToxPi™) was developed to enable integration of multiple sources of evidence on exposure and/or safety, transformed into transparent visual rankings to facilitate decision making. The rankings and associated graphical profiles can be used to prioritize resources in various decision contexts, such as testing chemical toxicity or assessing similarity of predicted compound bioactivity profiles. The amount and types of information available to decision makers are increasing exponentially, while complex decisions must rely on specialized domain knowledge across multiple criteria of varying importance. Thus, ToxPi bridges a gap, combining rigorous aggregation of evidence with ease of communication to stakeholders.
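The aggregation idea behind a ToxPi-style ranking can be sketched in a few lines. This is a simplified reading, not the GUI's actual implementation: each component is min-max scaled to [0, 1] across chemicals, a slice score is the mean of its scaled components, and the overall score is the weight-normalized sum of slice scores. The scaling rule, slice definitions, weights and data below are all hypothetical.

```python
from typing import Dict, List


def minmax(values: List[float]) -> List[float]:
    """Rescale one component to [0, 1] across chemicals (higher = more concern)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # a constant component cannot discriminate chemicals
    return [(v - lo) / (hi - lo) for v in values]


def toxpi_scores(data: Dict[str, List[float]],
                 slices: Dict[str, List[str]],
                 weights: Dict[str, float]) -> List[float]:
    """Weighted average of slice scores; each slice score is the mean of its scaled components."""
    scaled = {name: minmax(vals) for name, vals in data.items()}
    n_chem = len(next(iter(data.values())))
    total_weight = sum(weights.values())
    scores = []
    for i in range(n_chem):
        score = sum(
            weights[s] * sum(scaled[c][i] for c in comps) / len(comps)
            for s, comps in slices.items()
        )
        scores.append(score / total_weight)
    return scores


# Hypothetical inputs for three chemicals.
data = {"assay_a": [0.0, 5.0, 10.0], "assay_b": [1.0, 1.0, 3.0], "exposure": [2.0, 0.0, 1.0]}
slices = {"bioactivity": ["assay_a", "assay_b"], "exposure": ["exposure"]}
weights = {"bioactivity": 2.0, "exposure": 1.0}
scores = toxpi_scores(data, slices, weights)  # approximately [0.333, 0.167, 0.833]
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])  # [2, 0, 1]
```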
Results: An interactive ToxPi graphical user interface (GUI) application has been implemented to allow straightforward decision support across a variety of decision-making contexts in environmental health. The GUI allows users to easily import and recombine data, then analyze, visualize, highlight, export and communicate ToxPi results. It also provides a statistical metric of stability for both individual ToxPi scores and relative prioritized ranks.
Availability: The ToxPi GUI application, complete user manual and example data files are freely available from http://comptox.unc.edu/toxpi.php.
Genomes of men and women differ in only a limited number of genes located on the sex chromosomes, whereas the transcriptome is far more sex-specific. Identification of sex-biased gene expression will contribute to understanding the molecular basis of sex differences in complex traits and common diseases.
Sex differences in the human peripheral blood transcriptome were characterized using microarrays in 5,241 subjects, accounting for menopause status and hormonal contraceptive use. Sex-specific expression was observed for 582 autosomal genes, of which 57.7% were upregulated in women (female-biased genes). Female-biased genes were enriched for several immune system GO categories, genes linked to rheumatoid arthritis (16%) and genes regulated by estrogen (18%). Male-biased genes were enriched for genes linked to renal cancer (9%). Sex differences in gene expression were smaller in postmenopausal women, larger in women using hormonal contraceptives and not caused by sex-specific eQTLs, confirming the role of estrogen in regulating sex-biased genes.
This study indicates that sex bias in gene expression is extensive and may underlie sex differences in the prevalence of common diseases.
We introduce the Interactive Decision Committee method for classification when high-dimensional feature variables are grouped into feature categories. The proposed method uses the interactive relationships among feature categories to build base classifiers, which are combined using decision committees. A two-stage or single-stage 5-fold cross-validation procedure is used to determine the total number of base classifiers to be combined. The proposed procedure is useful for classifying biochemicals on the basis of toxicity activity, where the feature space consists of chemical descriptors and the responses are binary indicators of toxicity activity. Each descriptor belongs to at least one descriptor category. Support vector machines, random forests, and tree-based AdaBoost are used as classifier inducers. Forward selection is used to select the best combination of base classifiers for a given number of base classifiers. Simulation studies demonstrate that the proposed method outperforms a single large, unaggregated classifier in the presence of interactive feature category information. We applied the proposed method to two toxicity data sets associated with chemical compounds. For these data sets, the proposed method improved classification performance for the majority of outcomes compared to a single large, unaggregated classifier.
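The committee-combination and forward-selection steps can be sketched as below, assuming each candidate base classifier's 0/1 predictions on a validation set are already available. This is a hypothetical illustration: the construction of base classifiers from interacting descriptor categories and the cross-validated choice of committee size k are not shown, and the tie-breaking rule (ties go to class 0) is an assumption.

```python
from typing import List, Sequence


def majority_vote(preds: Sequence[Sequence[int]], members: Sequence[int]) -> List[int]:
    """Majority vote of the selected classifiers; ties are assigned to class 0."""
    n = len(preds[0])
    return [1 if 2 * sum(preds[m][i] for m in members) > len(members) else 0 for i in range(n)]


def accuracy(pred: Sequence[int], truth: Sequence[int]) -> float:
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def forward_select(preds: Sequence[Sequence[int]], truth: Sequence[int], k: int) -> List[int]:
    """Greedily grow a committee of k base classifiers, maximizing validation accuracy."""
    committee: List[int] = []
    remaining = list(range(len(preds)))
    while len(committee) < k and remaining:
        best = max(remaining,
                   key=lambda m: accuracy(majority_vote(preds, committee + [m]), truth))
        committee.append(best)
        remaining.remove(best)
    return committee


# Hypothetical validation predictions from four base classifiers.
truth = [1, 0, 1, 0, 1, 0]
preds = [[1, 0, 1, 0, 0, 0],
         [1, 1, 1, 0, 1, 0],
         [0, 0, 1, 0, 1, 0],
         [0, 1, 0, 1, 0, 1]]
committee = forward_select(preds, truth, 3)  # [0, 1, 2]
```

Here each of the first three classifiers alone is right on 5 of 6 samples, while their majority vote is right on all 6, which is the effect a decision committee aims for.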
Chemical toxicity; Decision committee method; Ensemble; Ensemble feature selection; QSAR modeling; Statistical learning
A subset (~3–5%) of patients with cystic fibrosis (CF) develops severe liver disease (CFLD) with portal hypertension.
To assess whether any of 9 polymorphisms in 5 candidate genes (SERPINA1, ACE, GSTP1, MBL2, and TGFB1) are associated with severe liver disease in CF patients.
Design, Setting, and Participants
A 2-stage design was used in this case–control study. CFLD subjects were enrolled from 63 U.S., 32 Canadian, and 18 CF centers outside of North America, with the University of North Carolina at Chapel Hill (UNC) as the coordinating site. In the initial study, we genotyped 9 polymorphisms in 5 genes previously implicated as modifiers of liver disease in CF in 124 CFLD patients (enrolled 1/1999–12/2004) and 843 CF controls (patients without CFLD). In the second stage, the SERPINA1 Z allele and TGFB1 codon 10 genotype were tested in an additional 136 CFLD patients (enrolled 1/2005–2/2007) and 1088 CF controls.
Main Outcome Measures
We compared genotype distributions in CF patients with severe liver disease versus CF patients without CFLD.
The initial study showed CFLD to be associated with the SERPINA1 (also known as α1-antiprotease and α1-antitrypsin) Z allele (P=3.3×10−6; odds ratio (OR) 4.72, 95% confidence interval (CI) 2.31–9.61), and with the transforming growth factor β-1 (TGFB1) codon 10 CC genotype (P=2.8×10−3; OR 1.53, CI 1.16–2.03). In the replication study, CFLD was associated with the SERPINA1 Z allele (P=1.4×10−3; OR 3.42, CI 1.54–7.59), but not with TGFB1 codon 10. A combined analysis of the initial and replication studies by logistic regression showed CFLD to be associated with the SERPINA1 Z allele (P=1.5×10−8; OR 5.04, CI 2.88–8.83).
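For readers less familiar with how an odds ratio and its 95% CI are formed, here is a minimal sketch for a 2×2 table (for example, Z-allele carriers versus non-carriers among cases and controls) using Woolf's log-scale standard error. The counts are hypothetical, and the reported estimates came from the study's own analyses (the combined one from logistic regression), so this only illustrates the calculation.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.959964) -> Tuple[float, float, float]:
    """a, b: exposed / unexposed cases; c, d: exposed / unexposed controls.

    Returns (OR, lower, upper), with the interval computed on the log-odds scale.
    """
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf standard error of log(OR)
    lower = math.exp(math.log(or_hat) - z * se_log)
    upper = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lower, upper


# Hypothetical counts: 10 of 30 cases and 5 of 45 controls are carriers.
or_hat, lower, upper = odds_ratio_ci(10, 20, 5, 40)  # OR = 4.0, CI roughly 1.20 to 13.3
```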
The SERPINA1 Z allele is a risk factor for liver disease in CF. Patients who carry the Z allele have approximately fivefold greater odds (OR ~5) of developing severe liver disease with portal hypertension.
Exome sequencing has become a powerful and effective strategy for discovery of genes underlying Mendelian disorders¹. However, use of exome sequencing to identify variants associated with complex traits has been more challenging, partly because the sample sizes needed for adequate power may be very large². One strategy to increase efficiency is to sequence individuals who are at both ends of a phenotype distribution (i.e., extreme phenotypes). Because alleles that contribute to the trait are enriched in one or both extremes of the phenotype distribution, a modest sample size can potentially identify novel candidate genes/alleles³. As part of the National Heart, Lung, and Blood Institute Exome Sequencing Project (ESP), we used an extreme phenotype design to discover that variants in DCTN4, encoding a dynactin protein, are associated with time to first Pseudomonas aeruginosa (P. aeruginosa) airway infection, chronic P. aeruginosa infection and mucoid P. aeruginosa among individuals with cystic fibrosis (MIM219700).
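The sampling step of an extreme phenotype design reduces to picking individuals from the two tails of the phenotype distribution. This is a hypothetical sketch: it takes the n_each lowest and n_each highest values, and it ignores censoring of time-to-infection phenotypes and any covariate adjustment that a real study would apply before ranking.

```python
from typing import List, Sequence, Tuple


def extreme_phenotype_sample(values: Sequence[float], n_each: int) -> Tuple[List[int], List[int]]:
    """Return indices of the n_each lowest and n_each highest phenotype values."""
    if n_each < 1 or 2 * n_each > len(values):
        raise ValueError("need 1 <= n_each <= len(values) // 2")
    order = sorted(range(len(values)), key=lambda i: values[i])
    return order[:n_each], order[len(order) - n_each:]


# Hypothetical phenotype values, e.g. age in years at first P. aeruginosa infection.
low, high = extreme_phenotype_sample([5.0, 1.0, 9.0, 3.0, 7.0, 2.0], 2)  # ([1, 5], [4, 2])
```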
A shift in toxicity testing from in vivo to in vitro may efficiently prioritize compounds, reveal new mechanisms, and enable predictive modeling. Quantitative high-throughput screening (qHTS) is a major source of data for computational toxicology, and our goal in this study was to aid in the development of predictive in vitro models of chemical-induced toxicity, anchored on interindividual genetic variability. Eighty-one human lymphoblast cell lines from 27 Centre d’Etude du Polymorphisme Humain trios were exposed to 240 chemical substances (12 concentrations, 0.26 nM–46.0 μM) and evaluated for cytotoxicity and apoptosis. qHTS in the genetically defined population produced robust and reproducible results, which allowed for cross-compound, cross-assay, and cross-individual comparisons. Some compounds were cytotoxic to all cell types at similar concentrations, whereas others exhibited interindividual differences in cytotoxicity. qHTS in a population-based human in vitro model system has several unique aspects that are useful for toxicity testing, chemical prioritization, and high-throughput risk assessment. First, standardized and high-quality concentration-response profiling, with reproducibility confirmed by comparison with previous experiments, enables chemicals to be prioritized by the extent of interindividual variability in cytotoxicity. Second, genome-wide association analysis of cytotoxicity phenotypes allows exploration of the potential genetic determinants of interindividual variability in toxicity. Third, highly significant associations identified through the analysis of population-level correlations between basal gene expression variability and chemical-induced toxicity suggest plausible mode-of-action hypotheses for follow-up analyses.
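The stated range is consistent with a 3-fold dilution series: 46.0 μM / 3¹¹ ≈ 0.26 nM. The sketch below generates such a series and defines a four-parameter Hill curve, a common choice for qHTS concentration-response fitting; whether the study used exactly this dilution factor or this curve parameterization is an assumption, and all names are hypothetical.

```python
from typing import List


def dilution_series(top: float, n: int, fold: float) -> List[float]:
    """n concentrations obtained by serially diluting `top` by `fold`, returned in ascending order."""
    return sorted(top / fold ** k for k in range(n))


def hill(conc: float, bottom: float, top: float, ac50: float, slope: float) -> float:
    """Four-parameter Hill curve: response rises from `bottom` to `top`, half-way at `ac50`."""
    return bottom + (top - bottom) / (1.0 + (ac50 / conc) ** slope)


concs = dilution_series(46.0e-6, 12, 3.0)  # molar; lowest is 46.0e-6 / 3**11, about 0.26 nM
```

Fitting `hill` to each cell line's responses would yield per-individual potency estimates, whose spread across cell lines is one way to express interindividual variability in cytotoxicity.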
We conclude that, as the improved resolution of genetic profiling can now be matched with high-quality in vitro screening data, evaluation of toxicity pathways and of the effects of genetic diversity is feasible through the use of human lymphoblast cell lines.
chemical cytotoxicity; apoptosis; HapMap; lymphoblasts; qHTS
Head and neck squamous cell carcinoma (HNSCC) is a frequently fatal heterogeneous disease. Beyond the role of human papilloma virus (HPV), no validated molecular characterization of the disease has been established. Using an integrated genomic analysis and validation methodology, we confirm four molecular classes of HNSCC (basal, mesenchymal, atypical, and classical) consistent with signatures established for squamous carcinoma of the lung, including deregulation of the KEAP1/NFE2L2 oxidative stress pathway, differential utilization of the lineage markers SOX2 and TP63, and preference for the oncogenes PIK3CA and EGFR. For potential clinical use, the signatures are complementary to classification by HPV infection status as well as the putative high-risk marker CCND1 copy number gain. A molecular etiology for the subtypes is suggested by statistically significant chromosomal gains and losses and differential cell-of-origin expression patterns. Model systems representative of each of the four subtypes are also presented.
Resampling-based expression pathway analysis techniques have been shown to preserve type I error rates, in contrast to simple gene-list approaches that implicitly assume the independence of genes in ranked lists. However, resampling is intensive in computation time and memory requirements. We describe accurate analytic approximations to permutations of score statistics, including novel approaches for Pearson's correlation and for summed score statistics, that perform well even for relatively small sample sizes. Our approach preserves the essence of permutation pathway analysis, but with greatly reduced computation. Extensions for inclusion of covariates and censored data are described, and we test the performance of our procedures using simulations based on real datasets. These approaches have been implemented in the new R package safeExpress.
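The core idea of replacing permutations with analytic moments can be illustrated with the classical result for a linear score statistic T = Σᵢ xᵢ·y_π(i): over all permutations π, E[T] = n·x̄·ȳ and Var(T) = Sxx·Syy/(n − 1), where Sxx and Syy are sums of squared deviations. The sketch below checks these moments against brute-force enumeration for a tiny example and forms a normal-approximation z-score. It is an assumption-level illustration of the principle; the approximations implemented in safeExpress (for Pearson's correlation, summed statistics, covariates and censoring) go beyond it, and the names and data are hypothetical.

```python
import itertools
import math
from typing import Sequence, Tuple


def permutation_moments(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Exact mean and variance of T = sum_i x_i * y_pi(i) over all permutations pi of y."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((v - xbar) ** 2 for v in x)
    syy = sum((v - ybar) ** 2 for v in y)
    return n * xbar * ybar, sxx * syy / (n - 1)


def enumerate_moments(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Brute-force check: average over every permutation (only feasible for tiny n)."""
    stats = [sum(a * b for a, b in zip(x, p)) for p in itertools.permutations(y)]
    mean = sum(stats) / len(stats)
    var = sum((s - mean) ** 2 for s in stats) / len(stats)
    return mean, var


def z_score(x: Sequence[float], y: Sequence[float]) -> float:
    """Standardize the observed statistic with the analytic permutation moments."""
    mean, var = permutation_moments(x, y)
    t_obs = sum(a * b for a, b in zip(x, y))
    return (t_obs - mean) / math.sqrt(var)


# Hypothetical gene scores x and phenotype labels y for five samples.
x = [0.5, 1.2, -0.3, 2.0, 0.7]
y = [1.0, 0.0, 0.0, 1.0, 1.0]
analytic_moments = permutation_moments(x, y)
brute_force_moments = enumerate_moments(x, y)
```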
Gene sets; Multiple hypothesis testing; Permutation approximation
Expression quantitative trait locus (eQTL) analysis is rapidly moving from a cutting-edge concept in genomics to a mature area of investigation, with important connections to genome-wide association studies for human disease, pharmacogenomics and toxicogenomics. Despite the importance of the topic, many investigators must develop their own code or use tools not specifically suited for eQTL analysis. Convenient computational tools are becoming available, but they are not widely publicized, and investigators who are interested in eQTL discovery, or in using eQTLs to interpret genome-wide association study results, may have difficulty navigating the available resources. The purpose of this review is to help investigators find appropriate programs for eQTL analysis and interpretation.
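One reason "fast linear modeling" is possible in eQTL scans is that, for a single SNP without covariates, the slope t-statistic of expression on genotype dosage depends only on the correlation r and sample size n: t = r·sqrt((n − 2)/(1 − r²)). Correlations for many SNP-gene pairs can then be computed in bulk with matrix products. The sketch below, with hypothetical data and function names, checks that identity against an ordinary least-squares fit for one pair; covariate adjustment is not shown.

```python
import math
from typing import Sequence


def corr(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def eqtl_t_from_corr(dosage: Sequence[float], expression: Sequence[float]) -> float:
    """Slope t-statistic of expression ~ dosage, computed from the correlation alone."""
    r = corr(dosage, expression)
    n = len(dosage)
    return r * math.sqrt((n - 2) / (1 - r * r))


def eqtl_t_from_regression(dosage: Sequence[float], expression: Sequence[float]) -> float:
    """The same statistic from the OLS slope and its standard error, as a check."""
    n = len(dosage)
    mx, my = sum(dosage) / n, sum(expression) / n
    sxx = sum((d - mx) ** 2 for d in dosage)
    sxy = sum((d - mx) * (e - my) for d, e in zip(dosage, expression))
    beta = sxy / sxx
    alpha = my - beta * mx
    rss = sum((e - alpha - beta * d) ** 2 for d, e in zip(dosage, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta / se


# Hypothetical genotype dosages (0/1/2) and expression values for eight samples.
dosage = [0, 1, 2, 1, 0, 2, 1, 2]
expression = [1.1, 1.9, 3.2, 2.4, 0.8, 2.7, 2.0, 3.5]
t_fast = eqtl_t_from_corr(dosage, expression)
t_ols = eqtl_t_from_regression(dosage, expression)
```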
bioinformatics; fast linear modeling; gene expression
Motivation: A number of penalization and shrinkage approaches have been proposed for the analysis of microarray gene expression data. Similar techniques are now routinely applied to RNA sequence transcriptional count data, although the value of such shrinkage has not been conclusively established. If penalization is desired, the explicit modeling of mean–variance relationships provides a flexible testing regimen that ‘borrows’ information across genes, while easily incorporating design effects and additional covariates.
Results: We describe BBSeq, which incorporates two approaches: (i) a simple beta-binomial generalized linear model, which has not been extensively tested for RNA-Seq data, and (ii) an extension of an expression mean–variance modeling approach to RNA-Seq data, involving modeling of the overdispersion as a function of the mean. Our approaches are flexible, allowing for general handling of discrete experimental factors and continuous covariates. We report comparisons with other methods for handling RNA-Seq data. Although penalized methods have advantages for very small sample sizes, the beta-binomial generalized linear model, combined with simple outlier detection and testing approaches, appears to have favorable characteristics in power and flexibility.
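The beta-binomial model for a gene's read count k out of a library total n can be written with a mean proportion μ and an overdispersion parameter φ in (0, 1), so that Var(K) = nμ(1 − μ)(1 + (n − 1)φ) and φ → 0 recovers the binomial. The sketch below implements that log-probability with `math.lgamma`; this mean/overdispersion parameterization and the function name are assumptions and may differ from BBSeq's internals, and the GLM link to covariates and the mean-dependent overdispersion of approach (ii) are not shown.

```python
import math


def log_beta_fn(p: float, q: float) -> float:
    """log B(p, q) via log-gamma functions."""
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)


def betabinom_logpmf(k: int, n: int, mu: float, phi: float) -> float:
    """log P(K = k) for a beta-binomial with mean n*mu and intra-class correlation phi.

    Var(K) = n * mu * (1 - mu) * (1 + (n - 1) * phi).
    """
    s = 1.0 / phi - 1.0                  # alpha + beta
    a, b = mu * s, (1.0 - mu) * s
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta_fn(k + a, n - k + b) - log_beta_fn(a, b)


# Hypothetical example: 30 reads in total, expected proportion 0.2, moderate overdispersion.
n, mu, phi = 30, 0.2, 0.1
probs = [math.exp(betabinom_logpmf(k, n, mu, phi)) for k in range(n + 1)]
```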
Availability: An R package containing examples and sample datasets is available at http://www.bios.unc.edu/research/genomic_software/BBSeq
Supplementary information: Supplementary data are available at Bioinformatics online.
In genome-wide association studies, population stratification is recognized as producing inflated type I error due to the inflation of test statistics. Principal component-based methods applied to genotypes provide information about population structure, and have been widely used to control for stratification. Here we explore the precise relationship between genotype principal components and inflation of association test statistics, thereby drawing a connection between principal component-based stratification control and the alternative approach of genomic control. Our results provide an inherent justification for the use of principal components, but call into question the popular practice of selecting principal components based on significance of eigenvalues alone. We propose a new approach, called EigenCorr, which selects principal components based on both their eigenvalues and their correlation with the (disease) phenotype. Our approach tends to select fewer principal components for stratification control than does testing of eigenvalues alone, providing substantial computational savings and improvements in power. Analyses of simulated and real data demonstrate the usefulness of the proposed approach.
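For reference, genomic control, the approach the abstract connects to principal components, rescales 1-df association statistics by an inflation factor λ equal to the median observed statistic divided by the null median of a χ²₁ variable (about 0.455). The sketch below is the standard calculation, with the common convention of never deflating (λ floored at 1); EigenCorr's selection of components by eigenvalue and phenotype correlation is not shown, and the function names are hypothetical.

```python
import statistics
from statistics import NormalDist
from typing import List, Sequence

# Median of the chi-square distribution with 1 df: the squared upper-quartile normal deviate.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2  # about 0.4549


def genomic_control_lambda(chi2_stats: Sequence[float]) -> float:
    """Inflation factor: observed median 1-df statistic over its null median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def gc_adjust(chi2_stats: Sequence[float]) -> List[float]:
    """Divide every statistic by lambda, never deflating (lambda floored at 1)."""
    lam = max(1.0, genomic_control_lambda(chi2_stats))
    return [s / lam for s in chi2_stats]
```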
Genomic Control; GWAS; PCA; Population Stratification
Genetic studies of lung disease in cystic fibrosis are hampered by the lack of a severity measure that accounts for chronic disease progression and mortality attrition. Further, combining analyses across studies requires common phenotypes that are robust to study design and patient ascertainment.
Using data from the North American Cystic Fibrosis Modifier Consortium (Canadian Consortium for CF Genetic Studies, Johns Hopkins University CF Twin and Sibling Study, and University of North Carolina/Case Western Reserve University Gene Modifier Study), the authors calculated age-specific CF percentile values of FEV1, which were adjusted for CF age-specific mortality.
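One simple way to picture a mortality adjustment of this kind is sketched below, under a strong, explicitly hypothetical assumption: patients from the same birth cohort who have died would have ranked below every survivor. A survivor at age-specific percentile p among living patients, with cohort survival fraction S at that age, then sits at percentile (1 − S) + S·p in the full cohort, which can be converted to a normal score. This is an illustration, not necessarily the consortium's exact formula, and the function names are hypothetical.

```python
from statistics import NormalDist


def mortality_adjusted_percentile(p_survivors: float, survival: float) -> float:
    """Percentile in the full birth cohort, assuming everyone who died ranks below all survivors.

    p_survivors: age-specific FEV1 percentile among living CF patients (0-1).
    survival: fraction of the CF birth cohort still alive at that age (0-1].
    """
    return (1.0 - survival) + survival * p_survivors


def to_normal_score(p: float) -> float:
    """Map a percentile in (0, 1) to a standard normal score."""
    return NormalDist().inv_cdf(p)


# Hypothetical: a median survivor at an age where 80% of the birth cohort is still alive.
adjusted = mortality_adjusted_percentile(0.5, 0.8)  # 0.6
score = to_normal_score(adjusted)
```

The same survivor percentile maps to a higher adjusted value at older ages, where fewer patients survive, which is how attrition by mortality is folded into one severity score.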
The phenotype was computed for 2061 patients representing the Canadian CF population, 1137 extreme phenotype patients in the UNC/Case Western study, and 1323 patients from multiple CF sib families in the CF Twin and Sibling Study. Despite differences in ascertainment and median age, our phenotype score was distributed in all three samples in a manner consistent with ascertainment differences, reflecting the lung disease severity of each individual in the underlying population. The new phenotype score was highly correlated with the previously recommended complex phenotype, but the new phenotype is more robust for shorter follow-up and for extreme ages.
A disease progression and mortality adjusted phenotype reduces the need for stratification or additional covariates, increasing statistical power and avoiding possible distortions. This approach will facilitate large-scale genetic and environmental epidemiological studies that will identify targeted therapeutic pathways for the clinical benefit of patients with CF.
Forced Expiratory Volume; Age Effects; Severity of Illness Index
MicroRNAs are short, non-coding RNA sequences that regulate genes at the post-transcriptional level and have been shown to be important in development, tissue differentiation, and disease. Limited attention has been given to natural variation in miRNA expression across genetically diverse populations, even though it is well established that genetic polymorphisms can have a profound effect on mRNA levels. We assessed the expression of 577 miRNAs in the livers of 70 inbred mouse strains and found that miRNA expression is highly stable across strains. Globally, the expression of miRNA target transcripts does not correlate with miRNA expression, primarily due to the low variance of miRNA but high variance of mRNA expression across strains. Our results show that there is little genetic effect on baseline miRNA levels in murine liver. The stability of mouse liver miRNA expression in a genetically diverse population suggests that treatment-induced disruptions in liver miRNA expression, a phenomenon established for a large number of toxicants, may indicate an important mechanism for the disturbance of normal liver function, and may prove to be a useful genetic background-independent biomarker of toxicant effect.
microRNA; liver; mouse; gene expression
Rectal cancer is often clinically resistant to radiotherapy, and there is value in identifying molecular markers that define the biological basis for this phenomenon. NF-κB is a potentially anti-apoptotic transcription factor that has been associated with resistance to radiotherapy in model systems. This study was designed to evaluate NF-κB activation in rectal cancers being treated with chemoradiation and to determine whether NF-κB activity correlates with outcome in rectal cancer.
Methods and Materials
Twenty-two patients were biopsied at multiple time points in a prospective study, and another 50 were analyzed retrospectively. Pre-treatment tumor tissue was analyzed for multiple NF-κB subunits by immunohistochemistry (IHC). Serial tumor biopsies were analyzed for NF-κB-regulated gene expression by RT-PCR and for NF-κB subunit nuclear localization by IHC.
Several NF-κB target genes (Bcl-2, cIAP-2, IL-8 and TRAF1) were significantly upregulated by a single fraction of radiotherapy at 24 hours, demonstrating for the first time that NF-κB is activated by radiotherapy in human rectal tumors. Baseline NF-κB p50 nuclear expression did not correlate with pathologic response to radiotherapy, but increasing baseline p50 was prognostic for overall survival (HR 2.15, p = 0.040).
NF-κB nuclear expression at baseline in rectal cancer is prognostic for overall survival but not predictive of response to radiotherapy. Larger patient numbers would be needed to assess the effect of NF-κB target gene upregulation on response to radiotherapy. Our results suggest that NF-κB may play an important role in tumor metastasis as opposed to resistance to chemoradiotherapy.
Variants associated with meconium ileus in cystic fibrosis (CF) were identified in 3,763 patients by GWAS. Five SNPs at two loci, near SLC6A14 on chr Xq23-24 (min P=1.28×10−12 at rs3788766) and near SLC26A9 on chr 1q32.1 (min P=9.88×10−9 at rs4077468), accounted for ~5% of the phenotypic variability and were replicated in an independent patient collection (n=2,372; P=0.001 and 0.0001, respectively). By incorporating the knowledge that disease-causing mutations in CFTR alter electrolyte and fluid flux across epithelia into a hypothesis-driven genome-wide analysis (GWAS-HD), we identified the same SLC6A14- and SLC26A9-associated SNPs, while establishing evidence for the involvement of SNPs in a third solute carrier gene, SLC9A3. In addition, GWAS-HD provided evidence of association between meconium ileus and multiple constituents of the apical plasma membrane where CFTR resides (P=0.0002, testing 155 apical genes jointly; replicated, P=0.022). These findings suggest that modulating the activities of apical membrane constituents could complement current therapeutic paradigms for cystic fibrosis.
A combined genome-wide association and linkage study was used to identify loci causing variation in CF lung disease severity. A significant association (P=3.34×10−8) near EHF and APIP (chr11p13) was identified in F508del homozygotes (n=1,978). The association replicated in F508del homozygotes (P=0.006) from a separate family-based study (n=557), with P=1.49×10−9 for the three-study joint meta-analysis. Linkage analysis of 486 sibling pairs from the family-based study identified a significant QTL on chromosome 20q13.2 (LOD=5.03). Our findings provide insight into the causes of variation in lung disease severity in CF and suggest new therapeutic targets for this life-limiting disorder.
It is generally presumed that the cystic fibrosis (CF) population is relatively homogeneous and predominantly of European origin. The complex ethnic make-up observed in the CF patients collected by the North American CF Modifier Gene Consortium has brought this assumption into question and suggested the potential for population substructure in the three CF study samples collected from North America. It is well appreciated that population substructure can result in spurious genetic associations.
To understand the ethnic composition of the North American CF population, and to assess the need for population structure adjustment in genetic association studies with North American CF patients.
Genome-wide single-nucleotide polymorphisms on 3076 unrelated North American CF patients were used to perform population structure analyses. We compared self-reported ethnicity to genotype-inferred ancestry, and also examined whether geographic distribution and CFTR mutation type could explain the structure observed.
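The principal component step behind such population structure analyses can be sketched as follows: standardize each SNP by its allele frequency, form the genetic relationship matrix X·Xᵀ/m, and take its leading eigenvector by power iteration. The toy genotypes below are hypothetical and constructed so that two groups of three individuals differ systematically; real analyses use dedicated software, LD pruning and many more markers, none of which is shown.

```python
import math
from typing import List, Sequence


def standardize(genotypes: Sequence[Sequence[int]]) -> List[List[float]]:
    """Center each SNP at 2p and scale by sqrt(2p(1-p)); monomorphic SNPs are dropped."""
    n, m = len(genotypes), len(genotypes[0])
    cols = []
    for j in range(m):
        p = sum(row[j] for row in genotypes) / (2 * n)  # allele frequency of SNP j
        if 0 < p < 1:
            sd = math.sqrt(2 * p * (1 - p))
            cols.append([(row[j] - 2 * p) / sd for row in genotypes])
    return [[col[i] for col in cols] for i in range(n)]


def top_pc(genotypes: Sequence[Sequence[int]], n_iter: int = 200) -> List[float]:
    """Leading eigenvector of the genetic relationship matrix X X^T / m, by power iteration."""
    x = standardize(genotypes)
    n, m = len(x), len(x[0])
    grm = [[sum(a * b for a, b in zip(x[i], x[k])) / m for k in range(n)] for i in range(n)]
    v = [float(i + 1) for i in range(n)]  # non-constant start; fails only if orthogonal to the answer
    for _ in range(n_iter):
        w = [sum(grm[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


# Hypothetical genotypes (rows = individuals, columns = SNPs); rows 0-2 and 3-5 form two groups.
G = [[0, 0, 1, 0, 2, 2, 1, 2],
     [0, 1, 0, 0, 2, 1, 2, 2],
     [1, 0, 0, 0, 2, 2, 2, 1],
     [2, 2, 1, 2, 0, 0, 1, 0],
     [2, 1, 2, 2, 0, 1, 0, 0],
     [1, 2, 2, 2, 0, 0, 0, 1]]
pc1 = top_pc(G)  # the two groups receive scores of opposite sign
```

Comparing such genotype-inferred coordinates with self-reported ethnicity is what reveals admixed individuals, and the leading components can then be included as covariates in association tests.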
Although largely Caucasian, our analyses identified a considerable number of CF patients with admixed African-Caucasian, Mexican-Caucasian and Indian-Caucasian ancestries. Population substructure was present and comparable across the three studies of the consortium. Neither geographic distribution nor mutation type explained the population structure.
Given the ethnic diversity of the North American CF population, it is essential to carefully detect, estimate and adjust for population substructure to guard against potential spurious findings in CF genetic association studies. Other Mendelian diseases that are presumed to predominantly affect single ethnic groups may also benefit from careful analysis of population structure.
ethnicity; principal component analysis; population substructure; population stratification