A review of the literature on proposed risk models in many fields reveals problems with the way many risk models are developed. Furthermore, papers reporting new risk models do not always provide sufficient information to allow readers to assess the merits of the model. In this review, we discuss sources of bias that can arise in risk model development. We focus on two biases that can be introduced during data analysis. These two sources of bias are sometimes conflated in the literature, and we recommend the terms resubstitution bias and model-selection bias to delineate them. We also propose the RiGoR reporting standard to improve the transparency and clarity of published papers proposing new risk models.
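A toy simulation can make this kind of optimism concrete. The sketch below (NumPy; the function names `auc` and `best_of_noise` and all settings are illustrative assumptions, not the review's method) picks, from 200 markers that are pure noise, the one with the highest AUC in a small training set and then re-scores it on independent data. The apparent AUC is inflated because the best of many candidates is chosen and scored on the same data; in independent data the chosen marker falls back to about 0.5.

```python
import numpy as np


def auc(scores, y):
    """Empirical AUC: P(case score > control score), counting ties as 1/2."""
    cases, controls = scores[y == 1], scores[y == 0]
    diff = cases[:, None] - controls[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size


def best_of_noise(n_train=100, n_valid=2000, n_candidates=200, seed=1):
    """Choose the noise marker with the highest training AUC, then re-score it on new data."""
    rng = np.random.default_rng(seed)
    y_train = np.repeat([0, 1], n_train // 2)
    y_valid = np.repeat([0, 1], n_valid // 2)
    # No candidate is related to the outcome, so every true AUC is 0.5.
    x_train = rng.normal(size=(n_train, n_candidates))
    x_valid = rng.normal(size=(n_valid, n_candidates))
    train_aucs = np.array([auc(x_train[:, j], y_train) for j in range(n_candidates)])
    best = int(np.argmax(train_aucs))
    return train_aucs[best], auc(x_valid[:, best], y_valid)


apparent, independent = best_of_noise()
print(f"apparent AUC {apparent:.2f}; AUC in independent data {independent:.2f}")
```

An evaluation on held-out data, or a cross-validation that repeats the selection step inside every training fold, removes this particular source of optimism.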
Risk prediction; Reporting standards; Research design; Statistical bias
Net reclassification indices have recently become popular statistics for measuring the prediction increment of new biomarkers. We review the various types of net reclassification indices and their correct interpretations. We evaluate the advantages and disadvantages of quantifying the prediction increment with these indices. For pre-defined risk categories, we relate net reclassification indices to existing measures of the prediction increment. We also consider statistical methodology for constructing confidence intervals for net reclassification indices and evaluate the merits of hypothesis testing based on such indices. We recommend that investigators using net reclassification indices report them separately for events (cases) and nonevents (controls). When there are two risk categories, the components of net reclassification indices are the same as the changes in the true-positive and false-positive rates. We advocate use of true- and false-positive rates and suggest that it is more useful for investigators to retain the existing, descriptive terms. When there are three or more risk categories, we recommend against net reclassification indices because they do not adequately account for clinically important differences in shifts among risk categories. The category-free net reclassification index is a new descriptive device designed to avoid pre-defined risk categories. However, it suffers from many of the same problems as other measures such as the area under the receiver operating characteristic curve. In addition, the category-free index can mislead investigators by overstating the incremental value of a biomarker, even in independent validation data. When investigators want to test a null hypothesis of no prediction increment, the well-established tests for coefficients in the regression model are superior to the net reclassification index.
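To illustrate the two-category result, the sketch below computes the event and nonevent components separately and checks them against the changes in true- and false-positive rates. The names `nri_components` and `tpr_fpr`, the simulated risks, and the convention that a predicted risk at or above the cutoff t counts as "positive" are assumptions made for the example.

```python
import numpy as np


def nri_components(p_old, p_new, y, cutoffs):
    """Event and nonevent NRI for risk categories bounded by the sorted `cutoffs`."""
    cat_old = np.digitize(p_old, cutoffs)  # 0 = lowest category
    cat_new = np.digitize(p_new, cutoffs)
    up, down = cat_new > cat_old, cat_new < cat_old
    events, nonevents = y == 1, y == 0
    nri_events = up[events].mean() - down[events].mean()
    nri_nonevents = down[nonevents].mean() - up[nonevents].mean()
    return nri_events, nri_nonevents


def tpr_fpr(p, y, t):
    """True- and false-positive rates when risk >= t is called positive."""
    positive = p >= t
    return positive[y == 1].mean(), positive[y == 0].mean()


# Simulated (hypothetical) risks; two categories split at t = 0.4.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)
p_old = np.clip(0.3 + 0.2 * y + rng.normal(0, 0.15, 1000), 0, 1)
p_new = np.clip(p_old + 0.1 * (2 * y - 1) + rng.normal(0, 0.05, 1000), 0, 1)
t = 0.4
nri_e, nri_ne = nri_components(p_old, p_new, y, [t])
tpr_old, fpr_old = tpr_fpr(p_old, y, t)
tpr_new, fpr_new = tpr_fpr(p_new, y, t)
print(nri_e, tpr_new - tpr_old)   # event NRI equals the change in TPR
print(nri_ne, fpr_old - fpr_new)  # nonevent NRI equals the decrease in FPR
```

The equality is exact by counting: among events, (number moving up) minus (number moving down) is the change in the number classified positive, and the mirror-image argument holds for nonevents.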
If investigators want to use net reclassification indices, confidence intervals should be calculated using bootstrap methods rather than published variance formulas. The preferred single-number summary of the prediction increment is the improvement in net benefit.
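Net benefit can be computed directly from predicted risks. This sketch uses the standard decision-curve definition, NB(t) = TP/n − (FP/n)·t/(1 − t), treating everyone whose predicted risk is at least the threshold t; the threshold, the toy data and the function names are illustrative assumptions.

```python
import numpy as np


def net_benefit(p, y, t):
    """Net benefit of treating everyone whose predicted risk is >= t."""
    treat = p >= t
    n = len(y)
    tp = np.sum(treat & (y == 1))
    fp = np.sum(treat & (y == 0))
    # False positives are weighted by the odds at the threshold, t / (1 - t).
    return tp / n - (fp / n) * t / (1 - t)


def net_benefit_improvement(p_old, p_new, y, t):
    """Change in net benefit at threshold t when moving from the old to the new model."""
    return net_benefit(p_new, y, t) - net_benefit(p_old, y, t)


y = np.array([1, 1, 0, 0])
p_old = np.array([0.9, 0.2, 0.6, 0.1])
p_new = np.array([0.9, 0.7, 0.3, 0.1])
print(net_benefit_improvement(p_old, p_new, y, t=0.5))  # 0.5 - 0.0 = 0.5
```

At t = 0.5 the old model treats one case and one non-case (net benefit 0.25 − 0.25 = 0), while the new model treats both cases and no non-cases (net benefit 0.5).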
In this issue of the Journal, Pencina et al. (Am J Epidemiol. 2012;176(6):492–494) examine the operating characteristics of measures of incremental value. Their goal is to provide benchmarks for the measures that can help identify the most promising markers among multiple candidates. They consider a setting in which new predictors are conditionally independent of established predictors. In the present article, the authors consider more general settings. Their results indicate that some of the conclusions made by Pencina et al. are limited to the specific scenarios that Pencina et al. considered. For example, Pencina et al. observed that continuous net reclassification improvement was invariant to the strength of the baseline model, but the authors of the present study show that this invariance does not hold generally. Further, they disagree with the suggestion that such invariance would be desirable for a measure of incremental value. They also see no evidence to support the claim that the measures provide complementary information. In addition, they show that correlation with baseline predictors can lead to much bigger gains in performance than the conditional independence scenario studied by Pencina et al. Finally, the authors note that the motivation of providing benchmarks actually reinforces previous observations that the problem with these measures is that they do not have useful clinical interpretations. If they did, researchers could use the measures directly and benchmarks would not be needed.
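For reference, a minimal sketch of the category-free (continuous) net reclassification improvement, returned as its event and nonevent components. Any change in predicted risk, however small, counts as reclassification. The function name and toy data are hypothetical.

```python
import numpy as np


def category_free_nri(p_old, p_new, y):
    """Category-free NRI: any increase in risk counts as 'up', any decrease as 'down'."""
    up, down = p_new > p_old, p_new < p_old
    events, nonevents = y == 1, y == 0
    nri_events = up[events].mean() - down[events].mean()
    nri_nonevents = down[nonevents].mean() - up[nonevents].mean()
    # Each component lies in [-1, 1], so their sum lies in [-2, 2].
    return nri_events, nri_nonevents


y = np.array([1, 1, 0, 0])
e, ne = category_free_nri(np.full(4, 0.5), np.array([0.6, 0.4, 0.4, 0.4]), y)
print(e, ne, e + ne)
```

In this toy example one event moves up and one moves down (event component 0), while both nonevents move down (nonevent component 1), so the total of 1 is driven entirely by nonevents; reporting the components separately makes that visible.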
area under curve; biomarkers; bivariate binomial distribution; receiver operating characteristic; risk assessment; risk factors
Epidemiologic methods are well established for investigating the association between a predictor of interest and disease status in the presence of covariates also associated with disease. There is less consensus on how to handle covariates when the goal is to evaluate the increment in prediction performance gained by a new marker when a set of predictors already exists. We distinguish between adjusting for covariates and joint modeling of disease risk in this context. We show that adjustment and joint modeling are distinct concepts, and we describe the specific conditions under which they coincide. We also discuss the concept of interaction among variables and describe a notion of interaction that is relevant to prediction performance. We conclude with a discussion of the most appropriate methods for evaluating new biomarkers in the presence of existing predictors.
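One way to see the distinction numerically: a covariate-adjusted AUC averages the marker's AUC within levels of a covariate Z (weighted here by each level's share of cases), whereas the AUC of a score combining Z and the marker reflects joint modeling. The sketch assumes a binary Z, a marker that is pure noise, and a hand-built combined score standing in for a fitted joint risk model; these choices and all names are illustrative, not the article's estimators.

```python
import numpy as np


def auc(scores, y):
    """Empirical AUC: P(case score > control score), counting ties as 1/2."""
    cases, controls = scores[y == 1], scores[y == 0]
    diff = cases[:, None] - controls[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size


def covariate_adjusted_auc(marker, y, stratum):
    """Within-stratum AUCs averaged with weights proportional to each stratum's cases."""
    aucs, weights = [], []
    for s in np.unique(stratum):
        in_s = stratum == s
        n_cases = int((y[in_s] == 1).sum())
        if 0 < n_cases < in_s.sum():  # need both cases and controls in the stratum
            aucs.append(auc(marker[in_s], y[in_s]))
            weights.append(n_cases)
    return float(np.average(aucs, weights=weights))


# Hypothetical data: covariate z predicts disease; marker m is pure noise.
rng = np.random.default_rng(0)
n = 4000
z = rng.integers(0, 2, size=n)
y = rng.binomial(1, np.where(z == 1, 0.6, 0.1))
m = rng.normal(size=n)
adjusted = covariate_adjusted_auc(m, y, z)
joint = auc(z + 0.01 * m, y)
print(adjusted)  # near 0.5: m adds nothing within levels of z
print(joint)     # well above 0.5, driven almost entirely by z
```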
Optimal triage of patients at risk of critical illness requires accurate risk prediction, yet few data exist on the performance criteria a potential biomarker must meet to be clinically useful.
Materials and Methods
We studied an adult cohort of non-arrest, non-trauma emergency medical services encounters transported to a hospital from 2002 to 2006. We simulated hypothetical biomarkers with increasingly strong associations with critical illness during hospitalization and determined the biomarker strength and sample size necessary to improve risk classification beyond the best clinical model.
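The abstract does not say how the hypothetical biomarkers were generated. One standard construction, assumed here, draws the marker as N(0, 1) in non-cases and N(μ, 1) in cases; then logit P(D = 1 | M) = constant + μM, so the per-unit odds ratio is e^μ, and μ = ln(OR) gives a target such as OR = 3.0. The sketch checks this with a small Newton-Raphson logistic fit; the names and settings are hypothetical.

```python
import numpy as np


def simulate_biomarker(y, odds_ratio, rng):
    """Marker ~ N(0, 1) in non-cases and N(log OR, 1) in cases, so the per-unit OR is odds_ratio."""
    return rng.normal(loc=np.log(odds_ratio) * y, scale=1.0)


def logistic_slope(x, y, n_iter=25):
    """Slope of a one-predictor logistic regression, fitted by Newton-Raphson."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        beta += np.linalg.solve(X.T @ (X * (p * (1 - p))[:, None]), X.T @ (y - p))
    return beta[1]


rng = np.random.default_rng(0)
y = (rng.random(20000) < 0.054).astype(float)  # roughly the 5.4% prevalence reported here
marker = simulate_biomarker(y, odds_ratio=3.0, rng=rng)
estimated_or = np.exp(logistic_slope(marker, y))
print(estimated_or)  # should be close to 3.0
```

Correlation with existing clinical variables, which the study also examined, would require a joint (for example, multivariate normal) construction that is not shown here.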
Of 57,647 encounters, 3,121 (5.4%) were hospitalized with critical illness and 54,526 (94.6%) without critical illness. The addition of a moderate-strength biomarker (odds ratio = 3.0 for critical illness) to a clinical model improved discrimination (c-statistic 0.85 vs. 0.8, p<0.01) and reclassification (net reclassification improvement = 0.15, 95% CI: 0.13, 0.18), and increased the proportion of cases in the highest risk category by 8.6% (95% CI: 7.5, 10.8%). Introducing correlation between the biomarker and physiological variables in the clinical risk score did not modify the results. Statistically significant changes in net reclassification required a sample size of at least 1000 subjects.
Clinical models for triage of critical illness could be significantly improved by incorporating biomarkers, yet substantial sample sizes and biomarker strength may be required.
Biomarker; simulation; sample size; reclassification
The integrated discrimination improvement (IDI) index is a popular tool for evaluating the capacity of a marker to predict a binary outcome of interest. Recent reports have proposed that the IDI is more sensitive than other metrics for identifying useful predictive markers. In this article, the authors use simulated data sets and theoretical analysis to investigate the statistical properties of the IDI. The authors consider the common situation in which a risk model is fitted to a data set with and without the new candidate predictor(s). Results demonstrate that the published method of estimating the standard error of an IDI estimate tends to underestimate the error. The z test proposed in the literature for IDI-based testing of a new biomarker is not valid, because the null distribution of the test statistic is not standard normal, even in large samples. If a test for the incremental value of a marker is desired, the authors recommend the test based on the risk model itself, that is, a test of the new predictor's coefficient(s). For investigators who find the IDI to be a useful measure, bootstrap methods may offer a reasonable option for inference when evaluating new predictors, as long as the added predictive capacity is large.
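A sketch of the IDI with a percentile bootstrap in which both the baseline and the expanded logistic models are refit in every resample, assuming one baseline predictor and one new marker. The Newton-Raphson fitter, function names and simulated data are illustrative assumptions; consistent with the abstract, this kind of interval is only expected to behave reasonably when the added predictive capacity is large.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Logistic-regression coefficients (intercept first) by Newton-Raphson."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        hessian = X1.T @ (X1 * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hessian, X1.T @ (y - p))
    return beta


def predict(beta, X):
    X1 = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-X1 @ beta))


def idi(p_old, p_new, y):
    """Change in mean risk among events minus change in mean risk among nonevents."""
    ev, ne = y == 1, y == 0
    return (p_new[ev].mean() - p_old[ev].mean()) - (p_new[ne].mean() - p_old[ne].mean())


def idi_fitted(x, marker, y):
    """Fit the baseline and expanded models to (x, marker, y) and return their IDI."""
    X_old, X_new = x[:, None], np.column_stack([x, marker])
    p_old = predict(fit_logistic(X_old, y), X_old)
    p_new = predict(fit_logistic(X_new, y), X_new)
    return idi(p_old, p_new, y)


def bootstrap_idi_ci(x, marker, y, n_boot=200, level=0.95, seed=0):
    """Percentile bootstrap CI; both models are refit in every resample."""
    rng = np.random.default_rng(seed)
    n = len(y)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        stats.append(idi_fitted(x[idx], marker[idx], y[idx]))
    alpha = (1 - level) / 2
    lo, hi = np.quantile(stats, [alpha, 1 - alpha])
    return lo, hi


# Hypothetical data with a strong added marker.
rng = np.random.default_rng(42)
n = 500
x, marker = rng.normal(size=n), rng.normal(size=n)
y = (rng.random(n) < 1 / (1 + np.exp(-(-1 + x + 1.5 * marker)))).astype(float)
est = idi_fitted(x, marker, y)
lo, hi = bootstrap_idi_ci(x, marker, y)
print(f"IDI {est:.3f} (95% bootstrap CI {lo:.3f} to {hi:.3f})")
```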
biological markers; bootstrap confidence interval; prediction; risk assessment; sampling distribution; sampling error; selection bias; type I error
African-American (AA) women have earlier menarche on average than women of European ancestry (EA), and earlier menarche is a risk factor for obesity and type 2 diabetes, among other chronic diseases. Identification of common genetic variants associated with age at menarche has potential value for pointing to the genetic pathways underlying chronic disease risk, yet comprehensive genome-wide studies of age at menarche are lacking for AA women. In this study, we tested the genome-wide association of self-reported age at menarche with common single-nucleotide polymorphisms (SNPs) in a total of 18 089 AA women in 15 studies using an additive genetic linear regression model, adjusting for year of birth and population stratification, followed by inverse-variance weighted meta-analysis (Stage 1). Top meta-analysis results were then tested in an independent sample of 2850 women (Stage 2). First, while no SNP passed the pre-specified P < 5 × 10⁻⁸ threshold for significance in Stage 1, suggestive associations were found for variants near FLRT2 and PIK3R1, and conditional analysis identified two independent SNPs (rs339978 and rs980000) in or near RORA, strengthening the support for this suggestive locus identified in EA women. Second, an investigation of SNPs in 42 previously identified menarche loci in EA women demonstrated that 25 (60%) of them contained variants significantly associated with menarche in AA women. The findings provide the first evidence of cross-ethnic generalization of menarche loci identified to date and suggest a number of novel biological links to menarche timing in AA women.
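The fixed-effect, inverse-variance weighted meta-analysis used in Stage 1 combines per-study estimates βᵢ with weights 1/SEᵢ²; the pooled standard error is 1/√(Σ weights). A minimal sketch follows; the study estimates are made-up numbers and the function name is hypothetical.

```python
import math


def inverse_variance_meta(betas, ses):
    """Fixed-effect meta-analysis: weight each study's estimate by 1 / SE^2."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    p = math.erfc(abs(beta / se) / math.sqrt(2))  # two-sided normal p-value
    return beta, se, p


# Hypothetical per-study SNP effects (years of age at menarche per allele) and SEs.
print(inverse_variance_meta([-0.08, -0.05, -0.11], [0.03, 0.04, 0.05]))
```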
The goal of many microarray studies is to identify genes that are differentially expressed between two classes or populations. Many data analysts choose to estimate the false discovery rate (FDR) associated with the list of genes declared differentially expressed. Estimating an FDR largely reduces to estimating π1, the proportion of differentially expressed genes among all analyzed genes. Estimating π1 is usually done through P-values, but computing P-values can be viewed as a nuisance and a potentially problematic step. We evaluated methods for estimating π1 directly from test statistics, circumventing the need to compute P-values. We adapted existing methodology for estimating π1 from t- and z-statistics so that π1 could be estimated from other statistics. We compared the quality of these estimates to estimates generated by two established methods for estimating π1 from P-values. Overall, methods varied widely in bias and variability. The least biased and least variable estimates of π1, the proportion of differentially expressed genes, were produced by applying the “convest” mixture model method to P-values computed from a pooled permutation null distribution. Estimates computed directly from test statistics rather than P-values did not reliably perform well.
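A minimal sketch of the P-value route: per-gene t-statistics, P-values computed against a null distribution pooled across genes and label permutations, then an estimate of π1. Note the substitution: instead of the "convest" method, this sketch uses the simpler Storey-type estimator π0 = #{p > λ}/(m(1 − λ)), with π1 = 1 − π0. The simulated data, λ = 0.5, and all function names are assumptions.

```python
import numpy as np


def t_stats(expr, labels):
    """Welch-type two-sample t-statistic per gene (rows = genes, columns = samples)."""
    a, b = expr[:, labels == 1], expr[:, labels == 0]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1] + b.var(axis=1, ddof=1) / b.shape[1])
    return (a.mean(axis=1) - b.mean(axis=1)) / se


def pooled_permutation_pvalues(expr, labels, n_perm=20, seed=0):
    """P-values of |t| against permuted |t| values pooled over all genes and permutations."""
    rng = np.random.default_rng(seed)
    observed = np.abs(t_stats(expr, labels))
    null = np.concatenate([np.abs(t_stats(expr, rng.permutation(labels))) for _ in range(n_perm)])
    null.sort()
    n_ge = null.size - np.searchsorted(null, observed, side="left")  # null values >= observed
    return (n_ge + 1) / (null.size + 1)


def estimate_pi1(pvalues, lam=0.5):
    """Storey-type estimate (not convest): pi0 = #{p > lam} / (m (1 - lam)); pi1 = 1 - pi0."""
    pi0 = np.mean(pvalues > lam) / (1 - lam)
    return 1 - min(pi0, 1.0)


rng = np.random.default_rng(0)
n_genes, n_de = 2000, 200
labels = np.repeat([0, 1], 10)
expr = rng.normal(size=(n_genes, labels.size))
expr[:n_de, labels == 1] += 2.0  # first 200 genes truly differentially expressed
p = pooled_permutation_pvalues(expr, labels)
pi1_hat = estimate_pi1(p)
print(pi1_hat)  # true proportion is 0.10
```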
Elevated resting heart rate is associated with greater risk of cardiovascular disease and mortality. In a 2-stage meta-analysis of genome-wide association studies in up to 181,171 individuals, we identified 14 new loci associated with heart rate and confirmed associations with all 7 previously established loci. Experimental downregulation of gene expression in Drosophila melanogaster and Danio rerio identified 20 genes at 11 loci that are relevant for heart rate regulation and highlight a role for genes involved in signal transmission, embryonic cardiac development and the pathophysiology of dilated cardiomyopathy, congenital heart failure and/or sudden cardiac death. In addition, genetic susceptibility to increased heart rate is associated with altered cardiac conduction and reduced risk of sick sinus syndrome, and both heart rate–increasing and heart rate–decreasing variants associate with risk of atrial fibrillation. Our findings provide fresh insights into the mechanisms regulating heart rate and identify new therapeutic targets.
The aim of this study was to evaluate the risk of Parkinson disease using clinical and demographic data alone and when combined with information from genes associated with Parkinson disease.
A total of 1,967 participants in the dbGaP NeuroGenetics Research Consortium data set were included. Single-nucleotide polymorphisms associated with Parkinson disease at a genome-wide significance level in previous genome-wide association studies were included in risk prediction. Risk allele scores were calculated as the weighted count of the minor alleles. Five models were constructed. Discriminatory capability was evaluated using the area under the curve.
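A sketch of a weighted risk-allele score and its area under the curve. The abstract does not state the weights; per-allele log odds ratios from prior genome-wide association studies are assumed here, and the genotypes, labels and function names are hypothetical.

```python
import numpy as np


def weighted_risk_score(genotypes, weights):
    """Sum over SNPs of (minor-allele count 0/1/2) x (per-allele weight)."""
    return np.asarray(genotypes, dtype=float) @ np.asarray(weights, dtype=float)


def auc(scores, y):
    """Empirical AUC: P(case score > control score), counting ties as 1/2."""
    cases, controls = scores[y == 1], scores[y == 0]
    diff = cases[:, None] - controls[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size


# Hypothetical: four people, two SNPs, weights = assumed log odds ratios.
G = np.array([[0, 1], [2, 0], [1, 2], [0, 0]])
weights = np.log([1.2, 1.5])
score = weighted_risk_score(G, weights)
status = np.array([1, 0, 1, 0])  # hypothetical case (1) / control (0) labels
print(score, auc(score, status))
```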
Both family history and genetic risk scores increased risk for Parkinson disease. Although the fullest model, which included both family history and genetic risk information, resulted in the highest area under the curve, there were no significant differences between models using family history alone and those using genetic information alone.
Adding genome-wide association study–derived genotypes, family history information, or both to standard demographic risk factors for Parkinson disease resulted in an improvement in discriminatory capacity. In the full model, the contributions of genotype data and family history information to discriminatory capacity were similar, and both were statistically significant. This suggests that there is limited overlap between genetic risk factors identified through genome-wide association study and unmeasured susceptibility variants captured by family history. Our results are similar to those of studies of other complex diseases and indicate that genetic risk prediction for Parkinson disease requires identification of additional genetic risk factors and/or better methods for risk prediction in order to achieve a degree of risk prediction that is clinically useful.
genetics; Parkinson disease; risk prediction
New methodology has been proposed in recent years for evaluating the improvement in prediction performance gained by adding a new predictor, Y, to a risk model containing a set of baseline predictors, X, for a binary outcome D. We prove theoretically that null hypotheses concerning no improvement in performance are equivalent to the simple null hypothesis that Y is not a risk factor when controlling for X, H0: P(D = 1 | X, Y) = P(D = 1 | X). Therefore, testing for improvement in prediction performance is redundant if Y has already been shown to be a risk factor. We also investigate properties of tests through simulation studies, focusing on the change in the area under the ROC curve (AUC). An unexpected finding is that standard testing procedures that do not adjust for variability in estimated regression coefficients are extremely conservative. This may explain why the AUC is widely considered insensitive to improvements in prediction performance and suggests that the problem of insensitivity has to do with use of invalid procedures for inference rather than with the measure itself. To avoid redundant testing and use of potentially problematic methods for inference, we recommend that hypothesis testing for no improvement be limited to evaluation of Y as a risk factor, for which methods are well developed and widely available. Analyses of measures of prediction performance should focus on estimation rather than on testing for no improvement in performance.
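Following this recommendation, Y can be evaluated as a risk factor given X with a likelihood-ratio test. The sketch assumes a logistic model that is linear in X and Y, in which case H0 reduces to Y's coefficient being zero; it fits both models by Newton-Raphson and uses the chi-square (1 df) upper tail, P(χ²₁ > s) = erfc(√(s/2)). The names and simulated data are hypothetical.

```python
import math
import numpy as np


def logistic_loglik(X, d, n_iter=25):
    """Maximized log-likelihood of a logistic model for outcome d (intercept added)."""
    X1 = np.column_stack([np.ones(len(d)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        beta += np.linalg.solve(X1.T @ (X1 * (p * (1 - p))[:, None]), X1.T @ (d - p))
    p = 1.0 / (1.0 + np.exp(-X1 @ beta))
    return float(np.sum(d * np.log(p) + (1 - d) * np.log(1 - p)))


def lr_test_new_predictor(x, y_new, d):
    """LR test of H0: P(D=1|X,Y) = P(D=1|X) within a linear-logistic model (1 df)."""
    stat = 2.0 * (logistic_loglik(np.column_stack([x, y_new]), d) - logistic_loglik(x, d))
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))  # upper tail of chi-square(1)
    return stat, p_value


rng = np.random.default_rng(1)
n = 1000
x, y_new = rng.normal(size=n), rng.normal(size=n)
d = (rng.random(n) < 1 / (1 + np.exp(-(-1 + x + 0.5 * y_new)))).astype(float)
stat, p_value = lr_test_new_predictor(x, y_new, d)
print(stat, p_value)
```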
Biomarker; Logistic regression; Receiver operating characteristic curve; Risk factors; Risk reclassification
The presence and severity of coronary artery calcified plaque (CAC) differ markedly between individuals of African and European descent, suggesting that admixture mapping (AM) may be informative for identifying genetic variants associated with subclinical cardiovascular disease (CVD).
Methods and Results
AM of CAC was performed in 1,040 unrelated African Americans with type 2 diabetes mellitus from the African American-Diabetes Heart Study (AA-DHS), Multi-Ethnic Study of Atherosclerosis (MESA), and Family Heart Study (FamHS) using the Illumina custom ancestry informative marker (AIM) panel. All cohorts obtained computed tomography scanning of the coronary arteries using identical protocols. For each AIM, the probability of inheriting 0, 1, and 2 copies of a European-derived allele was determined. Linkage analysis was performed by testing for association between CAC and each AIM, using these probabilities and accounting for global ancestry, age, gender and study. Markers on 1p32.3 in the GLIS1 gene (rs6663966, LOD=3.7), 1q32.1 near CHIT1 (rs7530895, LOD=3.1), 4q21.2 near PRKG2 (rs1212373, LOD=3.0) and 11q25 in the OPCML gene (rs6590705, LOD=3.4) had statistically significant LOD scores, while markers on 8q22.2 (rs6994682, LOD=2.7), 9p21.2 (rs439314, LOD=2.7), and 13p32.1 (rs7492028, LOD=2.8) manifested suggestive evidence of linkage. These regions were uniformly characterized by higher levels of European ancestry associating with higher levels or odds of CAC. Findings were replicated in 1,350 AAs without diabetes and 2,497 diabetic European Americans from MESA and the Diabetes Heart Study.
Fine mapping these regions will likely identify novel genetic variants that contribute to CAC and clarify racial differences in susceptibility to subclinical CVD.
ancestry; cardiovascular disease risk factors; type 2 diabetes; admixture mapping
Ethnic differences in cardiac arrhythmia incidence have been reported, with a particularly high incidence of sudden cardiac death (SCD) and low incidence of atrial fibrillation in individuals of African ancestry. We tested the hypotheses that African ancestry and common genetic variants are associated with prolonged duration of cardiac repolarization, a central pathophysiological determinant of arrhythmia, as measured by the electrocardiographic QT interval.
Methods and Results
First, individual estimates of African and European ancestry were inferred from genome-wide single nucleotide polymorphism (SNP) data in seven population-based cohorts of African Americans (n=12 097) and regressed on measured QT interval from electrocardiograms. Second, imputation was performed for 2.8 million SNPs, and a genome-wide association (GWA) study of QT interval was performed in ten cohorts (n=13 105). There was no evidence of association between genetic ancestry and QT interval (p=0.94). Genome-wide significant associations (p<2.5×10⁻⁸) were identified with SNPs at two loci, upstream of the genes NOS1AP (rs12143842, p=2×10⁻¹⁵) and ATP1B1 (rs1320976, p=2×10⁻¹⁰). The most significant SNP in NOS1AP was the same as the strongest SNP previously associated with QT interval in individuals of European ancestry. Low p-values (p<10⁻⁵) were observed for SNPs at several other loci previously identified in GWA studies in individuals of European ancestry, including KCNQ1, KCNH2, LITAF and PLN.
We observed no difference in duration of cardiac repolarization with global genetic indices of African ancestry. In addition, our GWA study extends the association of polymorphisms at several loci associated with repolarization in individuals of European ancestry to include African Americans.
electrocardiography; electrophysiology; genome-wide association studies; ion channels; repolarization
The PR interval (PR) as measured by the resting, standard 12-lead electrocardiogram (ECG) reflects the duration of atrial/atrioventricular nodal depolarization. Substantial evidence exists for a genetic contribution to PR, including genome-wide association studies that have identified common genetic variants at nine loci influencing PR in populations of European and Asian descent. However, few studies have examined loci associated with PR in African Americans.
Methods and Results
We present results from the largest genome-wide association study to date of PR in 13,415 adults of African descent from ten cohorts. We tested for association between PR (ms) and approximately 2.8 million genotyped and imputed single nucleotide polymorphisms. Imputation was performed using HapMap 2 YRI and CEU panels. Study-specific results, adjusted for global ancestry and clinical correlates of PR, were meta-analyzed using the inverse variance method. Variation in genome-wide test statistic distributions was noted within studies (lambda range: 0.9–1.1), although not after genomic control correction was applied to the overall meta-analysis (lambda: 1.008). In addition to generalizing previously reported associations with MEIS1, SCN5A, ARHGAP24, CAV1, and TBX5 to African American populations at the genome-wide significance level (P<5.0×10⁻⁸), we also identified a novel locus: ITGA9, located in a region previously implicated in SCN5A expression. The 3p21 region harboring SCN5A also contained two additional independent secondary signals influencing PR (P<5.0×10⁻⁸).
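Genomic control, mentioned above, summarizes inflation as λ = median(observed χ²)/0.4549, where 0.4549 is the median of a chi-square distribution with 1 degree of freedom, and rescales the statistics by λ. A minimal sketch follows; dividing only when λ > 1 is a common convention assumed here, and the simulated z-scores and names are illustrative.

```python
import math
import numpy as np

CHI2_1DF_MEDIAN = 0.454936  # median of the chi-square distribution with 1 df


def genomic_control(z_scores):
    """Return lambda_GC and p-values after dividing chi-square statistics by lambda."""
    chi2 = np.asarray(z_scores, dtype=float) ** 2
    lam = float(np.median(chi2) / CHI2_1DF_MEDIAN)
    corrected = chi2 / max(lam, 1.0)  # assumed convention: only deflate when lambda > 1
    p_values = np.array([math.erfc(math.sqrt(c / 2.0)) for c in corrected])
    return lam, p_values


rng = np.random.default_rng(0)
lam_null, p_null = genomic_control(rng.normal(size=50000))
lam_inflated, _ = genomic_control(rng.normal(scale=math.sqrt(1.2), size=50000))
print(round(lam_null, 3), round(lam_inflated, 3))  # near 1.0 and near 1.2
```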
This study demonstrates the ability to map novel loci in African Americans as well as the generalizability of loci associated with PR across populations of African, European and Asian descent.
electrocardiography; epidemiology; GWAS; single nucleotide polymorphism genetics; PR interval
Limited information is available regarding genetic contributions to valvular calcification, which is an important precursor of clinical valve disease.
We determined genomewide associations with the presence of aortic-valve calcification (among 6942 participants) and mitral annular calcification (among 3795 participants), as detected by computed tomographic (CT) scanning; the study population for this analysis included persons of white European ancestry from three cohorts participating in the Cohorts for Heart and Aging Research in Genomic Epidemiology consortium (discovery population). Findings were replicated in independent cohorts of persons with either CT-detected valvular calcification or clinical aortic stenosis.
One SNP in the lipoprotein(a) (LPA) locus (rs10455872) reached genomewide significance for the presence of aortic-valve calcification (odds ratio per allele, 2.05; P = 9.0×10⁻¹⁰), a finding that was replicated in additional white European, African-American, and Hispanic-American cohorts (P<0.05 for all comparisons). Genetically determined Lp(a) levels, as predicted by LPA genotype, were also associated with aortic-valve calcification, supporting a causal role for Lp(a). In prospective analyses, LPA genotype was associated with incident aortic stenosis (hazard ratio per allele, 1.68; 95% confidence interval [CI], 1.32 to 2.15) and aortic-valve replacement (hazard ratio, 1.54; 95% CI, 1.05 to 2.27) in a large Swedish cohort; the association with incident aortic stenosis was also replicated in an independent Danish cohort. Two SNPs (rs17659543 and rs13415097) near the proinflammatory gene IL1F9 achieved genomewide significance for mitral annular calcification (P = 1.5×10⁻⁸ and P = 1.8×10⁻⁸, respectively), but the findings were not replicated consistently.
Genetic variation in the LPA locus, mediated by Lp(a) levels, is associated with aortic-valve calcification across multiple ethnic groups and with incident clinical aortic stenosis. (Funded by the National Heart, Lung, and Blood Institute and others.)
Coronary heart disease (CHD) is the major cause of death in the United States. Coronary artery calcification (CAC) scores are independent predictors of CHD. African Americans (AA) have higher rates of CHD but are less well-studied in genomic studies. We assembled the largest AA data resource currently available with measured CAC to identify associated genetic variants.
We analyzed log-transformed CAC quantity, ln(CAC + 1), for association with ~2.5 million single nucleotide polymorphisms (SNPs) and performed an inverse-variance weighted meta-analysis on results for 5,823 AA from 8 studies. Heritability was calculated using family studies. The most significant SNPs among AAs were evaluated in European Ancestry (EA) CAC data; conversely, the significance of published SNPs for CAC/CHD in EA was queried within our AA meta-analysis.
Heritability of CAC was lower in AA (~30%) than previously reported for EA (~50%). No SNP reached genome-wide significance (p < 5E-08). Of 67 SNPs with p < 1E-05 in AA, none showed evidence of association in EA CAC data. Four SNPs in regions previously implicated in CAC/CHD in EA (at 9p21 and PHACTR1) reached nominal significance for CAC in AA, with concordant direction of effect. Among AA, rs16905644 (p = 4.08E-05) had the strongest association in the 9p21 region.
While we observed substantial heritability for CAC in AA, we failed to identify loci for CAC at genome-wide significance despite having adequate power to detect alleles with moderate to large effects. Although suggestive signals in AA were apparent at 9p21 and at additional CAC and CAD loci identified in EA, overall the data suggest that even larger samples and an ethnic-specific focus will be required for GWAS discoveries for CAC in AA populations.
Atherosclerosis; Coronary artery calcium; Genetics; Meta-analysis; African-American
The genetic background of atrial fibrillation (AF) in whites and African Americans is largely unknown. Genes in cardiovascular pathways have not been systematically investigated.
Methods and Results
We examined a panel of approximately 50,000 common single nucleotide polymorphisms (SNPs) in 2,095 cardiovascular candidate genes and AF in three cohorts with participants of European (n=18,524; 2,260 cases) or African American descent (n=3,662; 263 cases) in the National Heart, Lung, and Blood Institute's Candidate Gene Association Resource. Results in whites were followed up in the German Competence Network for AF (n=906; 468 cases). The top result was assessed in relation to incident ischemic stroke in the Cohorts for Heart and Aging Research in Genomic Epidemiology Stroke Consortium (n=19,602 whites; 1,544 incident strokes). SNP rs4845625 in the IL6R gene was associated with AF (relative risk [RR] for the C allele, 0.90; 95% confidence interval [CI], 0.85–0.95; P=0.0005) in whites but did not reach statistical significance in African Americans (RR, 0.86; 95% CI, 0.72–1.03; P=0.09). The results were comparable in the German AF Network replication (RR, 0.71; 95% CI, 0.57–0.89; P=0.003). No association between rs4845625 and stroke was observed in whites. The known chromosome 4 locus near PITX2 in whites was also associated with AF in African Americans (rs4611994; hazard ratio, 1.40; 95% CI, 1.16–1.69; P=0.0005).
In a community-based cohort meta-analysis, we identified genetic association in IL6R with AF in whites. Additionally, we demonstrated that the chromosome 4 locus known from recent genome-wide association studies in whites is associated with AF in African Americans.
atrial fibrillation; single nucleotide polymorphism; epidemiology; cohort study; race/ethnicity
Burn demographics, prevention and care have changed considerably since the 1970s. The objectives were to 1) identify new and confirm previously described changes, 2) make comparisons to the American Burn Association National Burn Repository, 3) determine when the administration of fluids in excess of the Baxter formula began and to identify potential causes, and 4) model mortality over time, during a 36-year period (1974–2009) at the Harborview Burn Center in Seattle, WA, USA.
Methods and Findings
A total of 14,266 consecutive admissions were analyzed in five-year periods, and many parameters were compared with those in the National Burn Repository. Fluid resuscitation was compared across five-year periods from 1974 to 2009. Mortality was modeled with the rBaux model. Many changes are highlighted at the end of the manuscript, including 1) the large increase in numbers of total and short-stay admissions, 2) the decline in numbers of large burn injuries, 3) the decline in unadjusted case fatality to the mid-1980s, with little change during the past two decades, 4) the existence of race/ethnicity and payer-status disparities, and 5) a change in the trajectory to death, with fewer deaths occurring more than seven days after injury. Administration of fluids in excess of the Baxter formula during resuscitation of uncomplicated injuries was evident at least by the early 1990s and has continued to the present; the cause is likely multifactorial, but pre-hospital fluids, prophylactic tracheal intubation and opioids may be involved.
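For readers unfamiliar with the two formulas named here, the sketch below computes the Baxter (Parkland) 24-hour fluid estimate, 4 mL × body weight (kg) × %TBSA, the ratio of administered volume to that estimate, and the revised Baux (rBaux) score, age + %TBSA + 17 × inhalation injury. The patient values and function names are hypothetical. The study's rBaux mortality model maps the score to a probability of death, but its fitted coefficients are not given here, so the sketch stops at the score.

```python
def baxter_volume_ml(weight_kg, tbsa_pct):
    """Baxter (Parkland) estimate of crystalloid for the first 24 h: 4 mL x kg x %TBSA."""
    return 4.0 * weight_kg * tbsa_pct


def fluid_ratio(given_ml, weight_kg, tbsa_pct):
    """Observed 24-h volume divided by the Baxter estimate; values > 1 exceed the formula."""
    return given_ml / baxter_volume_ml(weight_kg, tbsa_pct)


def rbaux_score(age_years, tbsa_pct, inhalation_injury):
    """Revised Baux score: age + %TBSA + 17 when inhalation injury is present."""
    return age_years + tbsa_pct + (17 if inhalation_injury else 0)


# Hypothetical patient: 70 kg, 30% TBSA, 12,600 mL given in the first 24 h.
print(baxter_volume_ml(70, 30))    # 8400.0 (mL)
print(fluid_ratio(12600, 70, 30))  # 1.5
print(rbaux_score(45, 30, True))   # 92
```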
1) The dramatic changes include the rise in short-stay admissions; as a result, the model of burn care practiced since the 1970s is still required but is no longer sufficient. 2) Fluid administration in excess of the Baxter formula with uncomplicated injuries began at least two decades ago. 3) Unadjusted case fatality declined to ∼6% in the mid-1980s and has changed little since then. The rBaux mortality model is quite accurate.
Using ∼60,000 SNPs selected for minimal linkage disequilibrium, we perform population structure analysis of 1,374 unrelated Hispanic individuals from the Multi-Ethnic Study of Atherosclerosis (MESA), with self-identification corresponding to Central America (n = 93), Cuba (n = 50), the Dominican Republic (n = 203), Mexico (n = 708), Puerto Rico (n = 192), and South America (n = 111). By projection of principal components (PCs) of ancestry onto samples from the HapMap phase III and the Human Genome Diversity Panel (HGDP), we show that the first two PCs quantify the Caucasian, African, and Native American origins, while the third and fourth PCs bring out an axis that aligns with known South-to-North geographic location of HGDP Native American samples and further separates MESA Mexican versus Central/South American samples along the same axis. Using k-means clustering computed from the first four PCs, we define four subgroups of the MESA Hispanic cohort that show close agreement with self-identification, labeling the clusters as primarily Dominican/Cuban, Mexican, Central/South American, and Puerto Rican. To demonstrate our recommendations for genetic analysis in the MESA Hispanic cohort, we present pooled and stratified association analysis of triglycerides for selected SNPs in the LPL and TRIB1 gene regions, previously reported in GWAS of triglycerides in Caucasians but as yet unconfirmed in Hispanic populations. We report statistically significant evidence for genetic association in both genes, and we further demonstrate the importance of considering population substructure and genetic heterogeneity in genetic association studies performed in the United States Hispanic population.
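The pipeline described above (principal components from genotypes, then k-means on the leading PCs) can be sketched in NumPy. This is a simplified illustration under stated assumptions: SNPs are standardized by their binomial standard deviation, there is no projection onto reference panels such as HapMap or HGDP, and two simulated source populations stand in for real data. Function names are hypothetical.

```python
import numpy as np


def genotype_pcs(G, k=4):
    """Top-k principal-component scores of an (individuals x SNPs) 0/1/2 genotype matrix."""
    freq = G.mean(axis=0) / 2.0
    sd = np.sqrt(2.0 * freq * (1.0 - freq))
    keep = sd > 0  # drop monomorphic SNPs
    Z = (G[:, keep] - 2.0 * freq[keep]) / sd[keep]
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k] * S[:k]


def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's k-means with random restarts; keeps the lowest within-cluster sum of squares."""
    rng = np.random.default_rng(seed)
    best_labels, best_wss = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        wss = ((X - centers[labels]) ** 2).sum()
        if wss < best_wss:
            best_labels, best_wss = labels, wss
    return best_labels


# Hypothetical data: two source populations with different allele frequencies.
rng = np.random.default_rng(0)
p_a = rng.uniform(0.1, 0.9, size=500)
p_b = np.clip(p_a + rng.normal(0, 0.2, size=500), 0.01, 0.99)
G = np.vstack([rng.binomial(2, p_a, size=(100, 500)), rng.binomial(2, p_b, size=(100, 500))])
population = np.repeat([0, 1], 100)
labels = kmeans(genotype_pcs(G, k=2), k=2)
agreement = max(np.mean(labels == population), np.mean(labels != population))
print(agreement)  # close to 1.0
```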
Using genotype data from about 60,000 distinct genetic markers, we examined population structure in 1,374 unrelated Hispanic individuals from the Multi-Ethnic Study of Atherosclerosis (MESA), with self-identification corresponding to Central America (n = 93), Cuba (n = 50), the Dominican Republic (n = 203), Mexico (n = 708), Puerto Rico (n = 192), and South America (n = 111). By comparing genetic ancestry of MESA Hispanic participants to reference samples representing worldwide diversity, we show major differences in ancestry of MESA Hispanics reflecting their Caucasian, African, and Native American origins, with finer differences corresponding to North-South geographic origins that separate MESA Mexican versus Central/South American samples. Based on our analysis, we define four subgroups of the MESA Hispanic cohort that show close agreement with the following self-identified regions of origin: Dominican/Cuban, Mexican, Central/South American, and Puerto Rican. We examine association of triglycerides with selected genetic markers, and we further demonstrate the importance of considering differences in genetic ancestry (or factors associated with genetic ancestry) when performing genetic studies of the United States Hispanic population.
Despite a higher burden of standard atrial fibrillation (AF) risk factors, African Americans have a lower risk of AF than whites. It is unknown whether the higher risk in whites is due to genetic or environmental factors. Because African Americans have varying degrees of European ancestry, we sought to test the hypothesis that European ancestry is an independent risk factor for AF.
Methods and Results
We studied whites (n=4,543) and African Americans (n=822) in the Cardiovascular Health Study (CHS) and whites (n=10,902) and African Americans (n=3,517) in the Atherosclerosis Risk in Communities (ARIC) Study. Percent European ancestry in African Americans was estimated using 1,747 ancestry informative markers (AIMs) from the Illumina custom ITMAT-Broad-CARe (IBC) array. Among African Americans without baseline AF, 120 of 804 CHS participants and 181 of 3,517 ARIC participants developed incident AF. A meta-analysis of the two studies showed that every 10% increase in European ancestry was associated with a 13% higher risk of AF (HR 1.13, 95% CI 1.03–1.23, p=0.007). After adjusting for potential confounders, European ancestry remained a predictor of incident AF in each cohort alone, with a combined estimated hazard ratio for each 10% increase in European ancestry of 1.17 (95% CI 1.07–1.29, p=0.001). A second analysis using 3,192 AIMs from a genome-wide Affymetrix 6.0 array in ARIC African Americans yielded similar results.
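The reported hazard ratios, confidence intervals, and p-values can be cross-checked, and the per-10% scaling made explicit, with a few lines of arithmetic. This sketch assumes Wald-type 95% CIs that are symmetric on the log scale and a Cox model that is linear in percent ancestry. The abstract does not state how the two cohorts were pooled, so `fixed_effect` shows inverse-variance fixed-effect pooling only as one common choice. All function names are hypothetical.

```python
import math

Z_975 = 1.959964  # standard normal quantile for a two-sided 95% CI

def hr_summary(hr, ci_low, ci_high, z_crit=Z_975):
    """Recover SE(log HR), the z statistic and a two-sided p from an HR and its CI."""
    log_hr = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_hr / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return se, z, p

def rescale_hr(hr_per_step, steps):
    """HR for `steps` multiples of the unit, assuming log-linear hazards."""
    return hr_per_step ** steps

def fixed_effect(log_hrs, ses):
    """Inverse-variance fixed-effect pooling of per-cohort log HRs."""
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, log_hrs)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))
```

`hr_summary(1.13, 1.03, 1.23)` gives p ≈ 0.007 and `hr_summary(1.17, 1.07, 1.29)` gives p ≈ 0.001, matching the reported values. Under the log-linear assumption, a 20-point difference in European ancestry corresponds to an HR of 1.13² ≈ 1.28.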
European ancestry predicted risk of incident AF. Our study suggests that investigating genetic variants contributing to differential AF risk in individuals of African versus European ancestry will be informative.
Atrial Fibrillation Genetics; Ancestry; African Americans
Hypertrophic scar was first described over 100 years ago, and PubMed has more than 1,000 references on the topic. Nevertheless, prevention and treatment remain poor because 1) there has been no validated animal model; 2) human scar tissue, which is impossible to obtain in a controlled manner, has been the only source for study; 3) tissues have typically been homogenized, mixing cell populations; and 4) gene-by-gene studies are incomplete.
We have assembled a system that overcomes these barriers and permits the study of genome-wide gene expression in microanatomical locations, in shallow and deep partial-thickness wounds, and in pigmented and non-pigmented skin, using the Duroc (pigmented, fibroproliferative)/Yorkshire (non-pigmented, non-fibroproliferative) porcine model. We used this system to obtain the differential transcriptome at 1, 2, 3, 12, and 20 weeks post wounding. It is not clear when fibroproliferation begins, but it is fully developed in humans and in the Duroc breed at 20 weeks. Therefore, we obtained the derivative functional genomics unique to 20 weeks post wounding. We also obtained long-term, forty-six-week follow-up of the model.
1) The scars are still thick at forty-six weeks post wounding, further validating the model. 2) The differential transcriptome provides new insights into the fibroproliferative process: several genes thought fundamental to fibroproliferation are absent, and others that are differentially expressed are newly implicated. 3) The findings in the derivative functional genomics support old concepts, which further validates the model, and suggest new avenues for reductionist exploration. In the future, these findings will be searched for directed networks likely to be involved in cutaneous fibroproliferation. These clues may lead to a better understanding of the systems biology of cutaneous fibroproliferation and, ultimately, to prevention and treatment of hypertrophic scarring.
The PR interval on the electrocardiogram reflects atrial and atrioventricular nodal conduction time. The PR interval is heritable, provides important information about arrhythmia risk, and has been suggested to differ among human races. Genome-wide association (GWA) studies have identified common genetic determinants of the PR interval in individuals of European and Asian ancestry, but there is a general paucity of GWA studies in individuals of African ancestry. We performed GWA studies in African American individuals from four cohorts (n = 6,247) to identify genetic variants associated with PR interval duration. Genotyping was performed using the Affymetrix 6.0 microarray. Imputation was performed for 2.8 million single nucleotide polymorphisms (SNPs) using combined YRI and CEU HapMap phase II panels. We observed a strong signal (rs3922844) within the gene encoding the cardiac sodium channel (SCN5A) with genome-wide significant association (p<2.5×10⁻⁸) in two of the four cohorts and in the meta-analysis. The signal explained 2% of PR interval variability in African Americans (beta = 5.1 msec per minor allele, 95% CI = 4.1–6.1, p = 3×10⁻²³). This SNP was also associated with PR interval (beta = 2.4 msec per minor allele, 95% CI = 1.8–3.0, p = 3×10⁻¹⁶) in individuals of European ancestry (n = 14,042), but with a smaller effect size (p for heterogeneity <0.001) and variability explained (0.5%). Further meta-analysis of the four cohorts identified genome-wide significant associations with SNPs in SCN10A (rs6798015), MEIS1 (rs10865355), and TBX5 (rs7312625) that were highly correlated with SNPs identified in European and Asian GWA studies. African ancestry was associated with increased PR duration (13.3 msec, p = 0.009) in one but not the other three cohorts.
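The comparison of SCN5A effect sizes between ancestries can be checked from the reported confidence intervals. The abstract does not say which heterogeneity test was used. A common choice for two independent estimates is the z-test on their difference, which is equivalent to Cochran's Q with one degree of freedom. This sketch assumes symmetric Wald 95% CIs, and the helper names are hypothetical.

```python
import math

Z_975 = 1.959964  # standard normal quantile for a two-sided 95% CI

def se_from_ci(ci_low, ci_high, z_crit=Z_975):
    """Standard error implied by a symmetric 95% confidence interval."""
    return (ci_high - ci_low) / (2 * z_crit)

def heterogeneity_p(beta1, se1, beta2, se2):
    """Two-sided p for H0: beta1 == beta2 (Cochran's Q with 1 df)."""
    z = (beta1 - beta2) / math.sqrt(se1 ** 2 + se2 ** 2)
    return math.erfc(abs(z) / math.sqrt(2))
```

The inputs are 5.1 msec (4.1–6.1) in African Americans and 2.4 msec (1.8–3.0) in Europeans. From these, z ≈ 4.5 and p ≈ 6×10⁻⁶, consistent with the reported p for heterogeneity <0.001.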
Our findings demonstrate the relevance of common variants to African Americans at four loci previously associated with PR interval in European and Asian samples and identify an association signal at one of these loci that is more strongly associated with PR interval in African Americans than in Europeans.
We performed genome-wide association studies in African American participants from four population-based cohorts to identify genetic variation that correlates with variation in PR interval duration, an electrocardiographic measure of conduction through the atria and atrioventricular node. We observed a strong signal within the gene encoding the cardiac sodium channel, SCN5A, with genome-wide significant association (p<2.5×10⁻⁸) in two cohorts and in a meta-analysis of the four African American cohorts. We replicated this association in two additional cohorts of African Americans and in Europeans (p = 3×10⁻¹⁶). The signal explains 2% of PR duration variability in African Americans and 0.5% in Europeans. In further meta-analysis, we observed genome-wide significant associations for single nucleotide polymorphisms in SCN10A, MEIS1, and TBX5, corresponding to signals observed in people of European and Asian descent. We found an association between genetic ancestry and PR interval in one but not the other three cohorts. Our findings provide the first demonstration of the relevance of these loci to individuals of African ancestry and identify an association signal from SCN5A that is more strongly associated with PR interval in African Americans than in Europeans.
Motivation: Permutation testing is widely used to analyze microarray data and identify differentially expressed (DE) genes, and estimating false discovery rates (FDRs) is a common way to address the inherent multiple testing problem. However, combining these approaches may be problematic when sample sizes are unequal.
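The following sketch makes the concern concrete with a minimal exact two-sample permutation test for a single gene. It uses the difference in group means as the statistic, which is an assumption, since the text does not fix the statistic. `exact_permutation_p` is a hypothetical helper that enumerates every relabeling. That approach is feasible only for small groups; larger studies sample random permutations instead. The null hypothesis of such a test is that group labels are exchangeable (equal distributions). With unequal group sizes and unequal variances, it therefore does not cleanly test equality of means.

```python
from itertools import combinations
from statistics import mean

def exact_permutation_p(a, b):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = list(a) + list(b)
    n, n_a = len(pooled), len(a)
    observed = abs(mean(a) - mean(b))
    total = sum(pooled)
    extreme = count = 0
    # every way of choosing which n_a values are labelled as group "a"
    for idx in combinations(range(n), n_a):
        s = sum(pooled[i] for i in idx)
        diff = abs(s / n_a - (total - s) / (n - n_a))
        count += 1
        if diff >= observed - 1e-12:  # small tolerance for float ties
            extreme += 1
    return extreme / count
```

For `a = [1, 2, 3]` and `b = [10, 11, 12, 13, 14]`, only the observed split out of C(8, 3) = 56 is as extreme, so p = 1/56.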
Results: With unbalanced data, permutation tests may not be suitable because they do not test the hypothesis of interest. In addition, permutation tests can be biased, and using biased P-values to estimate the FDR can produce unacceptable bias in those estimates. Results also show that pooling permutation null distributions across genes can produce invalid P-values, since even non-DE genes can have different permutation null distributions. We encourage researchers to use statistics that have been shown to reliably discriminate DE genes, but caution that the associated P-values may be invalid or may be a less effective metric for discriminating DE genes.
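The FDR step can be illustrated with the Benjamini-Hochberg adjustment. This is one common procedure; the text does not say which FDR estimator is at issue. BH presumes that null P-values are uniform (or conservative), so biased or invalid permutation P-values feed directly into biased FDR estimates. `bh_adjust` is a hypothetical helper.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p down, enforcing monotonicity of adjusted values
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

`bh_adjust([0.01, 0.04, 0.03, 0.20])` returns approximately `[0.04, 0.0533, 0.0533, 0.20]`.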
Supplementary information: Supplementary data are available at Bioinformatics online.
As part of its broad and ambitious mission, the MicroArray Quality Control (MAQC) project reported the results of experiments using External RNA Controls (ERCs) on five microarray platforms. For most platforms, several different methods of data processing were considered. However, there was no similar consideration of different methods for processing the data from the Agilent two-color platform. While this omission is understandable given the scale of the project, it can create the false impression that there is consensus about the best way to process Agilent two-color data. It is also important to consider whether ERCs are representative of all the probes on a microarray.
A comparison of different methods of processing Agilent two-color data shows substantial differences among methods for low-intensity genes. The sensitivity and specificity for detecting differentially expressed genes vary substantially across methods. Analysis also reveals that the ERCs in the MAQC data span only the upper half of the intensity range and therefore cannot be representative of all genes on the microarray.
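The coverage check described above amounts to locating the ERC intensities within the distribution of all probe intensities. This minimal sketch assumes per-probe A-values (average log2 intensity across the two channels) are already computed. `erc_intensity_coverage` is a hypothetical helper, and the numbers in the usage example are made up for illustration.

```python
def erc_intensity_coverage(all_a, erc_a):
    """Fraction of all probes below the ERC range, and fraction inside it."""
    n = len(all_a)
    lo, hi = min(erc_a), max(erc_a)
    below = sum(1 for a in all_a if a < lo) / n
    covered = sum(1 for a in all_a if lo <= a <= hi) / n
    return below, covered
```

With all probes at A = 1 to 100 and ERCs at 55, 70, 90, and 100, it returns (0.54, 0.46), meaning the lowest 54% of probes fall below every ERC.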
Although ERCs demonstrate good agreement between observed and expected log-ratios on the Agilent two-color platform, such an analysis is incomplete. Simple loess normalization outperformed data processing with Agilent's Feature Extraction software for accurate identification of differentially expressed genes. Results from studies using ERCs should not be over-generalized when ERCs are not representative of all probes on a microarray.
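Loess normalization of two-color data works as follows. For each probe, compute M = log2(R/G) and A = ½·log2(R·G), fit a smooth curve of M against A, and subtract it to remove intensity-dependent dye bias. The sketch below is a minimal global loess (local linear fits with tricube weights, no robustness iterations) in numpy. The `span` value and the helper names are assumptions, not the settings used in the comparison.

```python
import numpy as np

def ma_values(red, green):
    """Log-ratio M and average log-intensity A for two-colour probes."""
    r = np.log2(np.asarray(red, dtype=float))
    g = np.log2(np.asarray(green, dtype=float))
    return r - g, 0.5 * (r + g)

def loess_fit(x, y, span=0.3):
    """Local linear regression with tricube weights (no robustness steps)."""
    n = len(x)
    k = min(n, max(3, int(np.ceil(span * n))))  # neighbours per local fit
    fitted = np.empty(n)
    for i in range(n):
        d = np.abs(x - x[i])
        idx = np.argsort(d)[:k]                 # k nearest points in x
        h = d[idx].max()
        if h == 0:
            fitted[i] = y[idx].mean()
            continue
        w = (1 - (d[idx] / h) ** 3) ** 3        # tricube weights
        sw = np.sqrt(w)
        design = np.column_stack([np.ones(len(idx)), x[idx]])
        beta, *_ = np.linalg.lstsq(design * sw[:, None], y[idx] * sw, rcond=None)
        fitted[i] = beta[0] + beta[1] * x[i]
    return fitted

def loess_normalize(red, green, span=0.3):
    """Return loess-normalised M values and the corresponding A values."""
    m, a = ma_values(red, green)
    return m - loess_fit(a, m, span), a
```

On synthetic data with a purely linear dye bias, M = 0.5 + 0.1·A, the normalized log-ratios come out as zero to numerical precision, because a local linear fit reproduces a linear trend exactly.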