It is an obvious fact that the power of a test statistic is dependent upon the significance (alpha) level at which the test is performed. It is perhaps a less obvious fact that the relative performance of two statistics in terms of power is also a function of the alpha level. Through numerous personal discussions, we have noted that even some competent statisticians have the mistaken intuition that relative power comparisons at traditional levels such as α = 0.05 will be roughly similar to relative power comparisons at very low levels, such as the level α = 5 × 10⁻⁸, which is commonly used in genome-wide association studies. In this brief note, we demonstrate that this notion is in fact quite wrong, especially with respect to comparing tests with differing degrees of freedom. In fact, at very low alpha levels the cost of additional degrees of freedom is often comparatively low. Thus we recommend that statisticians exercise caution when interpreting the results of power comparison studies which use alpha levels that will not be used in practice.
Power; Small Significance Levels
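To make the point about degrees of freedom concrete, the following is a minimal sketch (not the authors' code) that compares a 1-df and a 2-df chi-square test at α = 0.05 and at α = 5 × 10⁻⁸. It assumes that both tests share the same noncentrality parameter, i.e. that the extra degree of freedom captures no additional signal, and it chooses that parameter so that the 1-df test has roughly 80% power at each level. The function names are illustrative.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_1df(ncp: float, alpha: float) -> float:
    """Power of a 1-df chi-square test with noncentrality parameter ncp."""
    z_crit = -_STD_NORMAL.inv_cdf(alpha / 2.0)  # two-sided normal critical value
    shift = math.sqrt(ncp)
    # A noncentral chi-square(1) variable is the square of a N(sqrt(ncp), 1) variable.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def power_2df(ncp: float, alpha: float, terms: int = 400) -> float:
    """Power of a 2-df chi-square test with noncentrality parameter ncp."""
    half_crit = -math.log(alpha)  # critical value / 2, because SF(x) = exp(-x/2) for 2 df
    half_ncp = ncp / 2.0
    # Noncentral chi-square(2) is a Poisson(ncp/2) mixture of central
    # chi-squares with 2 + 2i df, whose survival functions have a closed form.
    poisson = math.exp(-half_ncp)   # Poisson weight for i = 0
    term = math.exp(-half_crit)     # exp(-x/2) * (x/2)^j / j! for j = 0
    central_sf = term               # survival function of central chi-square(2)
    total = 0.0
    for i in range(terms):
        total += poisson * central_sf
        poisson *= half_ncp / (i + 1)
        term *= half_crit / (i + 1)
        central_sf += term
    return total


for alpha in (0.05, 5e-8):
    z_crit = -_STD_NORMAL.inv_cdf(alpha / 2.0)
    # Noncentrality giving approximately 80% power for the 1-df test.
    ncp = (z_crit + _STD_NORMAL.inv_cdf(0.80)) ** 2
    print(f"alpha={alpha:g}  ncp={ncp:.2f}  "
          f"power 1-df={power_1df(ncp, alpha):.3f}  "
          f"power 2-df={power_2df(ncp, alpha):.3f}")
```

Comparing the two printed rows shows how the power given up by the 2-df test changes between the two alpha levels under this particular assumption; other alternatives (for example, ones where the second degree of freedom does carry signal) would need their own noncentrality parameters.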
We propose a two-step model-based approach, with correction for ascertainment, to linkage analysis of a binary trait with variable age of onset and apply it to a set of multiplex pedigrees segregating for adult glioma.
First, we fit segregation models by formulating the likelihood for a person to have a bivariate phenotype, affection status and age of onset, along with other covariates, and from these we estimate population trait allele frequencies and penetrance parameters as a function of age (N=281 multiplex glioma pedigrees). Second, the best fitting models are used as trait models in multipoint linkage analysis (N=74 informative multiplex glioma pedigrees). To correct for ascertainment, a prevalence constraint is used in the likelihood of the segregation models for all 281 pedigrees. Then the trait allele frequencies are re-estimated for the pedigree founders of the subset of 74 pedigrees chosen for linkage analysis.
Using the best fitting segregation models in model-based multipoint linkage analysis, we identified two separate peaks on chromosome 17: the first agreed with a region identified by Shete et al., who used model-free affected-only linkage analysis, but with a narrowed peak; the second agreed with a second region they found but had a larger maximum log of the odds (LOD).
Our approach has the advantage of not requiring markers to be in linkage equilibrium unless the minor allele frequency is small (markers which tend to be uninformative for linkage), and of using more of the available information for LOD-based linkage analysis.
Glioma; model-based linkage; segregation; age of onset; prevalence constraint
Olson's conditional-logistic model retains the nice property of the LOD score formulation and has advantages over other methods that make it an appropriate choice for complex trait linkage mapping. However, the asymptotic distribution of the conditional-logistic likelihood-ratio (CL-LR) statistic with genetic constraints on the model parameters is unknown for some analysis models, even in the case of samples comprising only independent sib pairs. We derive approximations to the asymptotic null distributions of the CL-LR statistics and compare them with the empirical null distributions by simulation using independent affected sib pairs. Generally, the empirical null distributions of the CL-LR statistics match well the known or approximated asymptotic distributions for all analysis models considered except for the covariate model with a minimum-adjusted binary covariate. This work will provide useful guidelines for linkage analysis of real data sets for the genetic analysis of complex traits, thereby contributing to the identification of genes for disease traits.
linkage analysis; affected sib pairs; identity-by-descent; conditional-logistic model; genetic constraints; null distribution; likelihood-ratio statistics
Translation studies have been initiated to assess the combined effect of genetic loci from recently accomplished genome-wide association studies and the existing risk factors for early disease prediction. We propose a bagging optimal receiver operating characteristic (ROC) curve method to facilitate this research. Through simulation and real data application, we compared the new method with the commonly used allele counting method and logistic regression, and found that the new method yields better performance. The new method was applied to the Wellcome Trust data set to form a predictive genetic test for rheumatoid arthritis. The resulting test reached an area under the curve (AUC) value of 0.7.
Area under the ROC curve; Bootstrap aggregation; Gene–gene interaction; Genomewide association studies
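As a point of reference for the comparison described above, the sketch below computes the AUC of the simple allele counting score using the Mann-Whitney formulation. It illustrates only the baseline method, not the bagging optimal ROC curve method itself; the genotypes are made-up toy values and the function names are hypothetical.

```python
def allele_count_score(genotype):
    """Sum the risk-allele counts (0, 1 or 2 per locus) across loci."""
    return sum(genotype)


def auc(case_scores, control_scores):
    """Mann-Whitney estimate of the area under the ROC curve.

    Equals the probability that a random case scores higher than a random
    control, with ties counted as one half.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy genotypes at three loci (illustrative only, not real data).
cases = [(2, 1, 1), (1, 1, 0), (2, 2, 1)]
controls = [(0, 1, 0), (1, 1, 0), (0, 0, 1)]
case_scores = [allele_count_score(g) for g in cases]
control_scores = [allele_count_score(g) for g in controls]
print(auc(case_scores, control_scores))
```

The double loop is quadratic in sample size; for large data sets a rank-based computation gives the same value more efficiently.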
A novel web-based tool PedWiz that pipelines the informatics process for pedigree data is introduced. PedWiz is designed to assist researchers in the analysis of pedigree data. It provides a convenient tool for pedigree informatics: descriptive statistics, relative pairs, genetic similarity coefficients, the variance-covariance matrix for three estimated coefficients of allele identical-by-descent sharing as well as mean allele sharing, a plot of the pedigree structures, and a visualization of the identity coefficients. With a renewed interest in linkage and other family based methods, PedWiz will be a valuable tool for the analysis of family data.
pedigree; informatics; genetic similarity; identity-by-descent; relative pairs; family data
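One of the genetic similarity coefficients commonly derived from pedigree structure is the kinship coefficient. The following is a minimal sketch of the standard recursive computation. It is not PedWiz code, and the pedigree, the id convention (parents always have smaller ids than their children) and the function name are assumptions made for illustration.

```python
from functools import lru_cache

# Hypothetical pedigree: id -> (father, mother); founders have (None, None).
# Ids are assigned so that every parent has a smaller id than its children.
PEDIGREE = {
    1: (None, None),
    2: (None, None),
    3: (1, 2),
    4: (1, 2),
    5: (None, None),
    6: (4, 5),
}


@lru_cache(maxsize=None)
def kinship(i, j):
    """Kinship coefficient between individuals i and j."""
    if i is None or j is None:
        return 0.0  # unknown parents are treated as unrelated to everyone
    if i == j:
        father, mother = PEDIGREE[i]
        return 0.5 * (1.0 + kinship(father, mother))
    if i < j:
        i, j = j, i  # always recurse through the parents of the younger individual
    father, mother = PEDIGREE[i]
    return 0.5 * (kinship(father, j) + kinship(mother, j))


print(kinship(3, 4))  # full siblings
print(kinship(3, 6))  # uncle/aunt and niece/nephew
```

Recursing through the individual with the larger id guarantees that the recursion never passes through a descendant of the other individual, which is what makes the simple averaging rule valid.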
This paper is concerned with evaluating whether an interaction between two sets of risk factors for a binary trait is removable and fitting a parsimonious additive model using a suitable link function to estimate the disease odds (on the natural logarithm scale) when an interaction is removable. Statisticians define the term “interaction” as a departure from additivity in a linear model on a specific scale on which the data are measured. Certain interactions may be eliminated via a transformation of the outcome such that the relationship between the risk factors and the outcome is additive on the transformed scale. Such interactions are known as removable interactions. We develop a novel test statistic for detecting the presence of a removable interaction in case-control studies. We consider the Guerrero and Johnson family of transformations and show that this family constitutes an appropriate link function for fitting an additive model when an interaction is removable. We use simulation studies to examine the type I error and power of the proposed test and to show that an additive model based on the Guerrero and Johnson link function leads to more precise estimates of the disease odds parameters and a better fit when an interaction is removable. The proposed test and use of the transformation are illustrated using case-control data from three published studies. Finally, we indicate how one can check that, after transformation, no further interaction is significant.
Analysis of variance; curvature; independence; interaction effect; link function; main effect; residuals; score statistic; Tukey’s test; transformation; unbalanced data
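For readers unfamiliar with the transformation family mentioned above, here is a minimal sketch of a link function of that type. It assumes the Guerrero and Johnson family is the Box-Cox power transformation applied to the odds p/(1 − p), with the logit as the limiting case λ = 0; the abstract does not spell out the formula, so treat this form, the parameter name `lam` and the function names as assumptions.

```python
import math


def guerrero_johnson_link(p: float, lam: float) -> float:
    """Box-Cox transform of the odds p / (1 - p); lam = 0 gives the logit."""
    odds = p / (1.0 - p)
    if abs(lam) < 1e-12:
        return math.log(odds)
    return (odds ** lam - 1.0) / lam


def guerrero_johnson_inverse(eta: float, lam: float) -> float:
    """Map a linear predictor eta back to a probability."""
    if abs(lam) < 1e-12:
        odds = math.exp(eta)
    else:
        base = 1.0 + lam * eta
        if base <= 0.0:
            raise ValueError("linear predictor outside the range of the link")
        odds = base ** (1.0 / lam)
    return odds / (1.0 + odds)


# The link approaches the logit as lam shrinks toward zero.
for lam in (1.0, 0.5, 0.01, 0.0):
    print(lam, guerrero_johnson_link(0.8, lam))
```

Under this form, an additive model on the transformed scale with a suitable λ can absorb an interaction that would appear on the plain logit scale, which is the sense of "removable" used in the abstract.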
Current genome-wide association studies still heavily rely on a single-marker strategy, in which each single nucleotide polymorphism (SNP) is tested individually for association with a phenotype. Although methods and software packages that consider multimarker models have become available, they have been slow to become widely adopted and their efficacy in real data analysis is often questioned. Based on extensive simulations, we endeavor to provide more insight into the performance of simple multimarker association tests as compared to single-marker tests. The results reveal the power advantages as well as disadvantages of the two-marker vs. the single-marker test. Power differentials depend on the correlation structure among tag SNPs, as well as that between tag SNPs and causal variants. A two-marker test has relatively better performance than single-marker tests when the correlation of the two adjacent markers is high. However, using HapMap data, two-marker tests tended to have a greater chance of being less powerful than single-marker tests, due to constraints on the number of actual possible haplotypes in the HapMap data. Yet, the average power difference was small whenever the one-marker test was more powerful, while there were many situations where the two-marker test was much more powerful. These findings can be useful to guide analyses of future studies.
Asymptotic power; single-marker test; two-marker test; genome-wide association
We investigated the heritability and familial aggregation of various indexes of arterial stiffness and wave reflection and we partitioned the phenotypic correlation between these traits into shared genetic and environmental components.
Using a family-based population sample, we recruited 204 parents (mean age, 51.7 years) and 290 offspring (29.4 years) from the population in Cracow, Poland (62 families), Hechtel-Eksel, Belgium (36), and Pilsen, the Czech Republic (50). We measured peripheral pulse pressure (PPp) sphygmomanometrically at the brachial artery; central pulse pressure (PPc), the peripheral augmentation indexes (PAIxs) and central augmentation indexes (CAIxs) by applanation tonometry at the radial artery; and aortic pulse wave velocity (PWV) by tonometry or ultrasound. In multivariate-adjusted analyses, we used the ASSOC and PROC GENMOD procedures as implemented in SAGE and SAS, respectively.
We found significant heritability for PAIx, CAIx, PPc and mean arterial pressure (range, 0.37 to 0.41; P ≤ 0.0001). The method of intrafamilial concordance confirmed these results; intrafamilial correlation coefficients were significant for all arterial indexes (r ≥ 0.12; P ≤ 0.02) with the exception of PPc (r = −0.007; P = 0.90) in parent–offspring pairs. The sib–sib correlations were also significant for CAIx (r = 0.22; P = 0.001). The genetic correlations between PWV and the other arterial indexes were significant (ρG ≥ 0.29; P < 0.0001). The corresponding environmental correlations were only significantly positive for PPp (ρE = 0.10, P = 0.03).
The observation of significant intrafamilial concordance and heritability of various indexes of arterial stiffness as well as the genetic correlations among arterial phenotypes strongly supports the search for shared genetic determinants underlying these traits.
arterial stiffness; familial aggregation; heritability; pulse pressure; systolic augmentation
Segmental handling of sodium along the proximal and distal nephron might be heritable and different between black and white participants.
We randomly recruited 95 nuclear families of black South African ancestry and 103 nuclear families of white Belgian ancestry. We measured the fractional excretion of sodium (FENa) and estimated the fractional renal sodium reabsorption in the proximal (RNaprox) and distal (RNadist) tubules from the clearances of endogenous lithium and creatinine. In multivariable analyses, we studied the relation of RNaprox and RNadist with FENa and estimated the heritability (h2) of RNaprox and RNadist.
Independent of urinary sodium excretion, South Africans (n = 240) had higher RNaprox (unadjusted median, 93.9% vs. 81.0%; P < 0.001) than Belgians (n = 737), but lower RNadist (91.2% vs. 95.1%; P < 0.001). The slope of RNaprox on FENa was steeper in Belgians than in South Africans (−5.40 ± 0.58 vs. −0.78 ± 0.58 units; P < 0.001), whereas the opposite was true for the slope of RNadist on FENa (−3.84 ± 0.19 vs. −13.71 ± 1.30 units; P < 0.001). h2 of RNaprox and RNadist was high and significant (P < 0.001) in both countries. h2 was higher in South Africans than in Belgians for RNaprox (0.82 vs. 0.56; P < 0.001), but was similar for RNadist (0.68 vs. 0.50; P = 0.17). Of the filtered sodium load, black participants reabsorb more than white participants in the proximal nephron and less postproximally.
Segmental sodium reabsorption along the nephron is highly heritable, but the capacity for regulation in the proximal and postproximal tubules differs between whites and blacks.
clinical genetics; epidemiology; kidney; lithium clearance; salt sensitivity; segmental tubular sodium transport
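The segmental indexes in this abstract are derived from clearances. The sketch below shows one common way to compute them, assuming the usual lithium-clearance definitions: FENa = C_Na/C_Cr, RNaprox = 1 − C_Li/C_Cr and RNadist = (C_Li − C_Na)/C_Li, all expressed as percentages. The abstract does not state the formulas, so these definitions, the function names and the example values are assumptions.

```python
def clearance(urine_conc: float, plasma_conc: float, urine_flow: float) -> float:
    """Renal clearance C = U * V / P, in the units of the urine flow rate."""
    return urine_conc * urine_flow / plasma_conc


def segmental_sodium_handling(c_na: float, c_li: float, c_cr: float) -> dict:
    """Fractional excretion and segmental reabsorption of sodium, in percent."""
    fe_na = 100.0 * c_na / c_cr              # fraction of filtered sodium excreted
    fe_li = 100.0 * c_li / c_cr              # fraction of filtered load leaving the proximal tubule
    rna_prox = 100.0 - fe_li                 # proximal reabsorption
    rna_dist = 100.0 * (c_li - c_na) / c_li  # reabsorption of the load delivered distally
    return {"FENa": fe_na, "FELi": fe_li, "RNaprox": rna_prox, "RNadist": rna_dist}


# Hypothetical clearances in mL/min, chosen only to illustrate the arithmetic.
example = segmental_sodium_handling(c_na=1.0, c_li=20.0, c_cr=100.0)
print(example)
```

Because RNadist is expressed relative to the sodium load delivered out of the proximal tubule rather than the filtered load, RNaprox and RNadist do not add up to 100%.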
15-Hydroxyprostaglandin dehydrogenase (15-PGDH) is a metabolic antagonist of COX-2, catalyzing the degradation of inflammation mediator prostaglandin E2 (PGE2) and other prostanoids. Recent studies have established the 15-PGDH gene as a colon cancer suppressor.
We evaluated 15-PGDH as a colon cancer susceptibility locus in a three-stage design. In stage 1, we genotyped 102 single-nucleotide polymorphisms (SNPs) in the 15-PGDH gene, spanning ∼50 kb upstream and downstream of the coding region, in 464 colon cancer cases and 393 population controls. In stage 2, we genotyped the same SNPs and also assayed the expression levels of 15-PGDH in colon tissues from 69 independent patients for whom colon tissue and paired germline DNA samples were available. In stage 3, we genotyped the 9 most promising SNPs from stages 1 and 2 in an independent sample of 525 cases and 816 controls.
In the first two stages, three SNPs (rs1365611, rs6844282 and rs2332897) were statistically significant (p<0.05) in combined analysis of association with risk of colon cancer and of association with 15-PGDH expression, after adjustment for multiple testing. For one additional SNP, rs2555639, the T allele showed increased cancer risk and decreased 15-PGDH expression, but just missed statistical significance (p-adjusted = 0.063). In stage 3, rs2555639 alone showed evidence of association with an odds ratio (TT compared to CC) of 1.50 (95% CI = 1.05–2.15, p = 0.026).
Our data suggest that the rs2555639 T allele is associated with increased risk of colon cancer, and that carriers of this risk allele exhibit decreased expression of 15-PGDH in the colon.
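As a quick consistency check on the stage 3 result, the sketch below recovers an approximate standard error, z statistic and two-sided p-value from the reported odds ratio and confidence interval (1.50, 95% CI 1.05–2.15). It assumes the interval is a Wald interval on the log odds ratio scale, which the abstract does not state; the function name is illustrative.

```python
import math
from statistics import NormalDist


def wald_summary_from_ci(odds_ratio: float, lower: float, upper: float,
                         level: float = 0.95):
    """Back out SE(log OR), z and the two-sided p-value from a Wald CI."""
    normal = NormalDist()
    z_level = normal.inv_cdf(0.5 + level / 2.0)           # about 1.96 for 95%
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_level)
    z = math.log(odds_ratio) / se
    p_value = 2.0 * normal.cdf(-abs(z))
    return se, z, p_value


se, z, p_value = wald_summary_from_ci(1.50, 1.05, 2.15)
print(f"SE(log OR)={se:.3f}  z={z:.2f}  p={p_value:.3f}")
```

The recovered p-value lands close to the reported p = 0.026; small differences are expected because the odds ratio and interval bounds are rounded to two decimals.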
The potential importance of the joint action of genes, whether modeled with or without a statistical interaction term, has long been recognized. However, identifying such action has been a great challenge, especially when millions of genetic markers are involved. We propose a likelihood ratio-based Mann-Whitney test to search for joint gene action either among candidate genes or genome-wide. It extends the traditional univariate Mann-Whitney test to assess the joint association of genotypes at multiple loci with disease, allowing for high-order statistical interactions. Because only one overall significance test is conducted for the entire analysis, it avoids the issue of multiple testing. Moreover, the approach adopts a computationally efficient algorithm, making a genome-wide search feasible in a reasonable amount of time on a high performance personal computer. We evaluated the approach using both theoretical and real data. By applying the approach to 40 type 2 diabetes (T2D) susceptibility single-nucleotide polymorphisms (SNPs), we identified a four-locus model strongly associated with T2D in the Wellcome Trust (WT) study (permutation P-value < 0.001), and replicated the same finding in the Nurses’ Health Study/Health Professionals Follow-Up Study (NHS/HPFS) (P-value = 3.03 × 10⁻¹¹). We also conducted a genome-wide search on 385,598 SNPs in the WT study. The analysis took approximately 55 hr on a personal computer, identifying the same first two loci, but overall a different set of four SNPs, jointly associated with T2D (P-value = 1.29 × 10⁻⁵). The nominal significance of this same association reached 4.01 × 10⁻⁶ in the NHS/HPFS.
gene-gene interaction; genome-wide search; forward selection
Incorporating family data in genetic association studies has become increasingly appreciated, especially for its potential value in testing rare variants. We introduce here a variance-component based association test that can test multiple common or rare variants jointly using both family and unrelated samples.
The proposed approach implemented in our R package aggregates or collapses the information across a region based on genetic similarity instead of genotype scores, which avoids the power loss when the effects are in different directions or have different association strengths. The method is also able to effectively leverage the LD information in a region and it can produce a test statistic with an adaptively estimated number of degrees of freedom. Our method can readily allow for the adjustment of non-genetic contributions to the familial similarity, as well as multiple covariates.
We demonstrate through simulations that the proposed method achieves good performance in terms of Type I error control and statistical power. The method is implemented in the R package “fassoc”, which provides a useful tool for data analysis and exploration.
Association studies; Family data; Score test; Multi-marker test
It has been postulated that multiple-marker methods may have added ability, over single-marker methods, to detect genetic variants associated with disease. The Wellcome Trust Case Control Consortium (WTCCC) provided the first successful large genome-wide association studies (GWAS), which included single-marker association analyses for seven common complex diseases. Of those signals detected, only one was associated with coronary artery disease (CAD), and none were identified for hypertension (HTN). Our objective was to find additional genetic associations and pathways for cardiovascular disease by examining the WTCCC data for variants associated with CAD and HTN using two-marker testing methods. We applied two-marker association testing to the WTCCC dataset, which includes ~2,000 affected individuals with each disorder, and a shared pool of ~3,000 controls, all genotyped using Affymetrix GeneChip 500K arrays. For CAD, we detected single nucleotide polymorphism (SNP) pairs in three genes showing genome-wide significance: HFE2, STK32B, and DIPC2. The most notable SNP pairs in a non-protein-coding region were at 9p21, a known major CAD-associated region. For HTN, we detected SNP pairs in five genes: GPR39, XRCC4, MYO6, ZFAT, and MACROD2. Four further associated SNP pair regions were at least 70 kb from any known gene. We have shown that novel multiple-marker statistical methods can be of use in finding variants in GWAS. We describe many new associated variants for both CAD and HTN and describe their known genetic mechanisms.
The genetic etiology of complex human diseases has been commonly viewed as a process that involves multiple genetic variants, environmental factors, as well as their interactions. Statistical approaches, such as the multifactor dimensionality reduction (MDR) and generalized MDR (GMDR), have recently been proposed to test the joint association of multiple genetic variants with either dichotomous or continuous traits. In this paper, we propose a novel Forward U-Test to evaluate the combined effect of multiple loci on quantitative traits with consideration of gene-gene/gene-environment interactions. In this new approach, a U-Statistic-based forward algorithm is first used to select potential disease-susceptibility loci and then a weighted U statistic is used to test the joint association of the selected loci with the disease. Through a simulation study, we found the Forward U-Test outperformed GMDR in terms of greater power. Aside from that, our approach is less computationally intensive, making it feasible for high-dimensional gene-gene/gene-environment research. We illustrate our method with a real data application to Nicotine Dependence (ND), using three independent datasets from the Study of Addiction: Genetics and Environment. Our gene-gene interaction analysis of 155 SNPs in 67 candidate genes identified two SNPs, rs16969968 within gene CHRNA5 and rs1122530 within gene NTRK2, jointly associated with the level of ND (p-value = 5.31e-7). The association, which involves essential interaction, is replicated in two independent datasets with p-values of 1.08e-5 and 0.02, respectively. Our finding suggests that joint action may exist between the two gene products.
gene-gene interaction; Forward U-Test; Nicotine Dependence
Familial aggregation of specific response to allergens and asthma adjusted for age and sensitization to multiple allergens was assessed in two large population cohorts. Allergen skin prick tests (SPTs) were administered to 1151 families in the Tucson Children’s Respiratory Study (CRS) and 435 families in the Tucson Epidemiological Study of Airway Obstructive Disease (TESAOD). Sensitization was defined by wheal size ≥ 3 mm; physician-diagnosed asthma at age ≥ 8 years was based on questionnaires. Using S.A.G.E. 6.1 software ASSOC and FCOR, familial correlations of crude and adjusted phenotypes were evaluated. Crude estimates of parent-offspring (P-O) and sibling correlations were statistically significant for most allergens, ranging from 0.03 to 0.29. After adjusting for age of assessment and “other atopy” (SPT-positive response to additional allergens), correlations were reduced by 14–71%. Sibling correlations for specific response to allergens were consistently higher than P-O correlations, but this difference was significant only for dust mite and weed mix in the TESAOD population. Familial correlation for atopic status (any positive SPTs versus none) tended to be higher than for specific allergens. Asthma, with and without adjustment, showed greater familial correlation than either specific or general SPT response and significantly higher sibling correlation in TESAOD than in CRS, probably due to the older age of the siblings and the longer period of ascertainment. In conclusion, significant familial aggregation of specific response to allergen after adjustment for other atopy appears to reflect a genetic propensity toward atopy, dependent on shared familial exposures. Results also suggest that inheritance of asthma is independent of atopic sensitization.
familial aggregation; specific response to allergens; atopy; asthma
Genetic influences may be discerned in families that have multiple affected members and may manifest as an earlier age of cancer diagnosis. In this study we determine whether cancers develop at an earlier age in multiplex Familial Barrett’s Esophagus (FBE) kindreds, defined by 3 or more members affected by Barrett’s esophagus (BE) or esophageal adenocarcinoma (EAC).
Information on BE/EAC risk factors and family history was collected from probands at eight tertiary care academic hospitals. Age of cancer diagnosis and other risk factors were compared between non-familial (no affected relatives), duplex (two affected relatives), and multiplex (three or more affected relatives) FBE kindreds.
The study included 830 non-familial, 274 duplex and 41 multiplex FBE kindreds with 274, 133 and 43 EAC and 566, 288 and 103 BE cases, respectively. Multivariable mixed models adjusting for familial correlations showed that multiplex kindreds were associated with a younger age of cancer diagnosis (p = 0.0186). Median age of cancer diagnosis was significantly younger in multiplex compared to duplex and non-familial kindreds (57 vs. 62 vs. 63 yrs, respectively, p = 0.0448). Mean body mass index (BMI) was significantly lower in multiplex kindreds (p = 0.0033) as was smoking (p < 0.0001), and reported regurgitation (p = 0.0014).
Members of multiplex FBE kindreds develop EAC at an earlier age compared to non-familial EAC cases. Multiplex kindreds do not have a higher proportion of common risk factors for EAC, suggesting that this aggregation might be related to a genetic factor.
These findings indicate that efforts to identify susceptibility genes for BE and EAC will need to focus on multiplex kindreds.
Esophageal adenocarcinoma; Barrett’s esophagus; genetics; family history
Interactions among genomic loci (also known as epistasis) have been suggested as one of the potential sources of missing heritability in single locus analysis of genome-wide association studies (GWAS). The computational burden of searching for interactions is compounded by the extremely low threshold for identifying significant p-values due to multiple hypothesis testing corrections. Utilizing prior biological knowledge to restrict the set of candidate SNP pairs to be tested can alleviate this problem, but systematic studies that investigate the relative merits of integrating different biological frameworks and GWAS data have not been conducted.
We developed four biologically based frameworks to identify pairwise interactions among candidate SNP pairs as follows: (1) for each human protein-coding gene, a set of SNPs associated with that gene was constructed providing a gene-based interaction model, (2) for each known biological pathway, a set of SNPs associated with the genes in the pathway was constructed providing a pathway-based interaction model, (3) a set of SNPs associated with genes in a disease-related subnetwork provides a network-based interaction model, and (4) a framework is based on the function of SNPs. The last approach uses expression SNPs (eSNPs or eQTLs), which are SNPs or loci that have defined effects on the abundance of transcripts of other genes. We constructed pairs of eSNPs and SNPs located in the target genes whose expression is regulated by eSNPs. For all four frameworks the SNP sets were exhaustively tested for pairwise interactions within the sets using a traditional logistic regression model after excluding genes that were previously identified to associate with the trait. Using previously published GWAS data for type 2 diabetes (T2D) and the biologically based pair-wise interaction modeling, we identify twelve genes not seen in the previous single locus analysis.
We present four approaches to detect interactions associated with complex diseases. The results show our approaches outperform the traditional single locus approaches in detecting genes that previously did not reach significance; the results also provide novel drug targets and biomarkers relevant to the underlying mechanisms of disease.
Dopamine β-hydroxylase (DβH) catalyzes the conversion of dopamine to norepinephrine. DβH enters the plasma after vesicular release from sympathetic neurons and the adrenal medulla. Plasma DβH activity (pDβH) varies widely among individuals, and genetic inheritance regulates that variation. Linkage studies suggested strong linkage of pDβH to ABO on 9q34, and positive evidence for linkage to the complement fixation locus on 19p13.2-13.3. Subsequent association studies strongly supported DBH, which maps adjacent to ABO, as the locus regulating a large proportion of the heritable variation in pDβH. Prior studies have suggested that variation in pDβH, or genetic variants at DBH, associate with differences in expression of psychotic symptoms in patients with schizophrenia and other idiopathic or drug-induced brain disorders, suggesting that DBH might be a genetic modifier of psychotic symptoms. As a first step toward investigating that hypothesis, we performed linkage analysis on pDβH in patients with schizophrenia and their relatives. The results strongly confirm linkage of markers at DBH to pDβH under several models (maximum multipoint LOD score, 6.33), but find no evidence to support linkage anywhere on chromosome 19. Accounting for the contributions to the linkage signal of three SNPs at DBH (rs1611115, rs1611122, and rs6271) reduced but did not eliminate the linkage peak, whereas accounting for all SNPs near DBH eliminated the signal entirely. Analysis of markers genome-wide uncovered positive evidence for linkage between markers at chromosome 20p12 and pDβH (multipoint LOD = 3.1 at 27.2 cM). The present results provide the first direct evidence for linkage between DBH and pDβH, suggest that rs1611115, rs1611122, rs6271 and additional unidentified variants at or near DBH contribute to the genetic regulation of pDβH, and suggest that a locus near 20p12 also influences pDβH.
Numerous studies have provided support for genetic susceptibility to tuberculosis (TB); however, heterogeneity in disease expression has hampered previous genetic studies. The purpose of this work was to investigate possible intermediate phenotypes for TB. A set of cytokine profiles, including antigen-stimulated whole-blood assays for interferon (IFN)–γ, tumor necrosis factor (TNF)–α, transforming growth factor (TGF)–β, and the ratio of IFN to TNF, were analyzed in 177 pedigrees from a community in Uganda with a high prevalence of TB. The heritability of these variables was estimated after adjustment for covariates, and TNF-α, in particular, had an estimated heritability of 68%. A principal component analysis of IFN-γ, TNF-α, and TGF-β reflected the immunologic model of TB. In this analysis, the first component explained >38% of the variation in the data. This analysis illustrates the value of such intermediate phenotypes in mapping susceptibility loci for TB and demonstrates that this area deserves further research.
It is generally known that risk variants segregate together with a disease within families, but this information has not been used in the existing statistical methods for detecting rare variants. Here we introduce two weighted sum statistics that can apply to either genome-wide association data or resequencing data for identifying rare disease variants, with weights calculated based on sibpairs and odds ratios, respectively. We evaluated the two methods via extensive simulations under different disease models. We compared the proposed methods with the weighted sum statistic (WSS) proposed by Madsen and Browning, keeping the same genotyping or resequencing cost. Our methods clearly demonstrate more statistical power than the WSS. In addition, we found that using sibpair information can increase power over using only unrelated samples by more than 40%. We applied our methods to the Framingham Heart Study (FHS) and Wellcome Trust Case Control Consortium (WTCCC) hypertension datasets. Although we did not identify any genes as reaching a genome-wide significance level, we found variants in the candidate gene angiotensinogen (AGT) significantly associated with hypertension at P = 6.9 × 10⁻⁴, whereas the most significant single SNP association evidence is P = 0.063. We further applied the odds ratio weighted method to the IFIH1 gene for type 1 diabetes in the WTCCC data. Our method yielded a P value of 4.82 × 10⁻⁴, much more significant than that obtained by haplotype-based methods. We demonstrated that family data are extremely informative in searching for rare variants underlying complex traits, and the odds ratio weighted sum statistic is more efficient than currently existing methods.
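For orientation, the comparator method in this abstract, the weighted sum statistic of Madsen and Browning, can be sketched as follows. This is a simplified reading of that published method as commonly described (variant weights from allele frequencies in unaffected individuals, a per-individual weighted score, and the rank sum of affected individuals), not the sibpair or odds ratio weighting proposed here. The toy genotypes and function names are made up, and the permutation step used to assess significance is omitted.

```python
import math


def wss_weights(genotypes, affected):
    """Per-variant weights sqrt(n * q * (1 - q)), with q estimated in unaffecteds."""
    n_total = len(genotypes)
    unaffected = [g for g, a in zip(genotypes, affected) if not a]
    n_unaff = len(unaffected)
    weights = []
    for v in range(len(genotypes[0])):
        m_unaff = sum(g[v] for g in unaffected)   # variant alleles seen in unaffecteds
        q = (m_unaff + 1) / (2 * n_unaff + 2)     # smoothed frequency estimate
        weights.append(math.sqrt(n_total * q * (1.0 - q)))
    return weights


def genetic_scores(genotypes, weights):
    """Weighted count of variant alleles per individual (rarer variants weigh more)."""
    return [sum(count / w for count, w in zip(g, weights)) for g in genotypes]


def rank_sum_of_affected(scores, affected):
    """Sum of the midranks of affected individuals among all individuals."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        midrank = (pos + end) / 2.0 + 1.0         # average rank across a block of ties
        for k in range(pos, end + 1):
            ranks[order[k]] = midrank
        pos = end + 1
    return sum(r for r, a in zip(ranks, affected) if a)


# Toy data: 4 individuals, 2 rare variants (illustrative only).
toy_genotypes = [[1, 0], [0, 1], [0, 0], [0, 0]]
toy_affected = [True, True, False, False]
toy_weights = wss_weights(toy_genotypes, toy_affected)
toy_scores = genetic_scores(toy_genotypes, toy_weights)
print(rank_sum_of_affected(toy_scores, toy_affected))
```

In practice the rank sum is compared with its distribution under permutation of affection status; swapping in a different weighting scheme only requires replacing `wss_weights`.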
Diabetic nephropathy (DN) is a leading cause of mortality and morbidity in patients with type 1 and type 2 diabetes. The multicenter FIND consortium aims to identify genes for DN and its associated quantitative traits, e.g. the urine albumin:creatinine ratio (ACR). Herein, the results of whole-genome linkage analysis and a sparse association scan for ACR and a dichotomous DN phenotype are reported in diabetic individuals.
A genomewide scan comprising more than 5,500 autosomal single nucleotide polymorphism markers (average spacing of 0.6 cM) was performed on 1,235 nuclear and extended pedigrees (3,972 diabetic participants) ascertained for DN from African-American (AA), American-Indian (AI), European-American (EA) and Mexican-American (MA) populations.
Strong evidence for linkage to DN was detected on chromosome 6p (p = 8.0 × 10⁻⁵, LOD = 3.09) in EA families as well as suggestive evidence for linkage to chromosome 7p in AI families. Regions on chromosomes 3p in AA, 7q in EA, 16q in AA and 22q in MA displayed suggestive evidence of linkage for urine ACR. The linkage peak on chromosome 22q overlaps the MYH9/APOL1 gene region, previously implicated in AA diabetic and nondiabetic nephropathies.
These results strengthen the evidence for previously identified genomic regions and implicate several novel loci potentially involved in the pathogenesis of DN.
Albuminuria; Diabetes mellitus; Renal failure; End-stage renal disease; Linkage; Allelic association
Numerous studies have examined genetic influences on developmental problems such as speech sound disorders, language impairment, and reading disability. Disorders such as speech sound disorder (SSD) are often analyzed using their component endophenotypes. Most studies, however, have involved comparisons of twin pairs or siblings of similar age, or have adjusted for age while ignoring effects that are peculiar to age-related trajectories of phenotypic change. Developmental changes in these skills have limited the usefulness of data from parents or siblings who differ substantially in age from the probands. Employing parent-offspring correlation in heritability estimation permits a more precise estimate of the additive component of genetic variance, but the same trait has to be measured in different generations. We report on a smoothing procedure that fits a series of lines approximating a curve that matches the developmental trajectory. This procedure adjusts for changes in measures with age, so that the adjusted values are on a similar scale for children, adolescents, and adults. We apply this method to four measures of phonological memory and articulation in order to estimate their heritability. Repetition of multisyllabic real words showed the best heritability estimate, 45%, in this sample. We conclude that differences in measurement scales across the age span can be reconciled through non-linear modeling of the developmental process.
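The idea of approximating a developmental curve by a series of lines can be sketched as follows. This is a simplified illustration and not the authors' procedure: it fits an independent least-squares line within each age interval (so the segments are not forced to join, unlike a true linear spline) and returns the residuals as age-adjusted values. The breakpoints and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one age segment."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, mean_y  # a single age value: fall back to the segment mean
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def age_adjust(ages, scores, breakpoints):
    """Residuals from segment-wise lines; breakpoints are assumed, not estimated."""
    edges = [float("-inf")] + sorted(breakpoints) + [float("inf")]
    adjusted = [None] * len(ages)
    for low, high in zip(edges[:-1], edges[1:]):
        members = [i for i, age in enumerate(ages) if low <= age < high]
        if not members:
            continue
        slope, intercept = fit_line(
            [ages[i] for i in members], [scores[i] for i in members]
        )
        for i in members:
            adjusted[i] = scores[i] - (slope * ages[i] + intercept)
    return adjusted
```

After adjustment, a child and an adult who both sit exactly on the fitted trajectory for their age receive the same value (zero), which is what puts relatives of very different ages on a comparable scale.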
Speech; Language; Longitudinal; Developmental genetics; Spline fitting
Although recent studies have attempted to dispel the confusion that exists in regard to the definition, analysis and interpretation of interaction in genetics, there still remain aspects that are poorly understood by non-statisticians. After a brief discussion of the definition of gene-gene interaction, the main part of this study addresses the fundamental meaning of statistical interaction and its relationship to measurement scale, disproportionate sample sizes in the cells of a two-way table and gametic phase disequilibrium.
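The dependence of statistical interaction on measurement scale can be seen in a small worked example. The cell means below are made up for illustration: the two factors act multiplicatively, so the interaction contrast is non-zero on the raw scale but vanishes after a log transformation.

```python
import math


def interaction_contrast(table, transform=lambda value: value):
    """Interaction contrast of a 2 x 2 table of cell means on a chosen scale."""
    (m00, m01), (m10, m11) = table
    return transform(m11) - transform(m10) - transform(m01) + transform(m00)


# Hypothetical cell means: the column factor doubles and the row factor triples the trait.
cell_means = [[1.0, 2.0], [3.0, 6.0]]

raw_scale = interaction_contrast(cell_means)            # 6 - 3 - 2 + 1 = 2
log_scale = interaction_contrast(cell_means, math.log)  # zero up to rounding error
print(raw_scale, log_scale)
```

The same data therefore show interaction or no interaction depending only on the scale chosen for the analysis.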
Epistasis; Gametic phase disequilibrium; Interaction; Transformation
Structural Equation Modeling (SEM) is an analysis approach that accounts for both the causal relationships between variables and the errors associated with the measurement of these variables. In this paper, a framework for implementing structural equation models (SEMs) in family data is proposed.
This framework includes both a latent measurement model and a structural model with covariates. It allows for a wide variety of models, including latent growth curve models. Environmental, polygenic and other genetic variance components can be included in the SEM. Kronecker notation makes it easy to separate the SEM process from a familial correlation model. A limited information method of model fitting is discussed. We show how missing data and ascertainment may be handled. We give several examples of how the framework may be used.
A simulation study shows that our method is computationally feasible, and has good statistical properties.
Our framework may be used to build and compare causal models using family data without any genetic marker data. It also allows for a nearly endless array of genetic association and/or linkage tests. A preliminary Matlab program is available, and we are currently implementing a more complete and user-friendly R package.
Latent variable analysis; Path analysis; Extended pedigrees; Complex traits; Genetic linkage analysis; Genetic association
The Genetic Analysis Workshop 17 data we used comprise 697 unrelated individuals genotyped at 24,487 single-nucleotide polymorphisms (SNPs) from a mini-exome scan, using real sequence data for 3,205 genes annotated by the 1000 Genomes Project and simulated phenotypes. We studied 200 sets of simulated phenotypes of trait Q2. An important feature of this data set is that most SNPs are rare, with 87% of the SNPs having a minor allele frequency less than 0.05. For rare SNP detection, in this study we performed a least absolute shrinkage and selection operator (LASSO) regression and F tests at the gene level and calculated the generalized degrees of freedom to avoid any selection bias. For comparison, we also carried out linear regression and the collapsing method, which sums the rare SNPs, modified for a quantitative trait and with two different allele frequency thresholds. The aim of this paper is to evaluate these four approaches in this mini-exome data and compare their performance in terms of power and false positive rates. In most situations the LASSO approach is more powerful than linear regression and collapsing methods. We also note the difficulty in determining the optimal threshold for the collapsing method and the significant role that linkage disequilibrium plays in detecting rare causal SNPs. If a rare causal SNP is in strong linkage disequilibrium with a common marker in the same gene, power will be much improved.
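As an illustration of the collapsing method for a quantitative trait, the sketch below sums, for each individual, the minor alleles of all SNPs in a gene whose minor allele frequency falls below a threshold, and then regresses the trait on that burden score. It is a minimal sketch under stated assumptions (genotypes coded as minor allele counts, a simple t statistic for the regression slope, hypothetical function names) and not the exact procedure used in the study.

```python
import math


def collapse_rare_variants(genotypes, maf_threshold=0.05):
    """Per-individual count of minor alleles at SNPs with 0 < MAF < threshold."""
    n = len(genotypes)
    n_snps = len(genotypes[0])
    mafs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(n_snps)]
    rare = [j for j, maf in enumerate(mafs) if 0 < maf < maf_threshold]
    return [sum(row[j] for j in rare) for row in genotypes]


def slope_t_statistic(x, y):
    """t statistic for the slope in a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    residual_ss = sum(
        (yi - mean_y - slope * (xi - mean_x)) ** 2 for xi, yi in zip(x, y)
    )
    standard_error = math.sqrt(residual_ss / (n - 2) / sxx)
    return slope / standard_error
```

Changing `maf_threshold` changes which SNPs enter the burden score, which is one way to see why the choice of threshold matters for this method.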