Type 2 diabetes mellitus (T2DM) is a common complex phenotype that by the year 2010 is predicted to affect 221 million people globally. In the present study we performed a genome-wide linkage scan using the allele-sharing statistic Sall implemented in Allegro and a novel two-dimensional genome-wide strategy implemented in Merloc that searches for pairwise interaction between genetic markers located on different chromosomes linked to T2DM. In addition, we used a robust score statistic from the newly developed QTL-ALL software to search for linkage to variation in adult height. The strategies were applied to a study sample consisting of 238 sib-pairs affected with T2DM from American Samoa. We did not detect any genome-wide significant susceptibility loci for T2DM. However, our two-dimensional linkage investigation detected several loci pairs of interest, including 11q22 and 21q21, 9q21 and 11q22, 1p22–p21 and 4p15, and 4p15 and 15q11–q14, with a two-loci maximum LOD score (MLS) greater than 2.00. Most detected individual loci have previously been identified as susceptibility loci for diabetes-related traits. Our two-dimensional linkage results may facilitate the selection of potential candidate genes and molecular pathways for further diabetes studies because these results, besides providing candidate loci, also demonstrate that polygenic effects may play an important role in T2DM. Linkage was detected (p value of 0.005) for variation in adult height on chromosome 9q31, which was reported previously in other populations. Our finding suggests that the 9q31 region may be a strong quantitative trait locus for adult height, which is likely to be of importance across populations.
DIABETES; STATURE; LINKAGE ANALYSIS; TWO-DIMENSIONAL LINKAGE; QUANTITATIVE TRAIT LOCUS; QTL-ALL SOFTWARE
Association of insulin-induced gene 2 (INSIG2) variants with obesity has been confirmed in several but not all follow-up studies. Differences in environmental factors across populations may mask some genetic associations and therefore gene-environment interactions should be explored. We hypothesized that the association between dietary patterns and components of the metabolic syndrome could be modified by INSIG2 variants.
We conducted a longitudinal study of adiposity and cardiovascular disease risk among 427 and 290 adults from Samoa and American Samoa (1990–95). Principal component analysis on food items from a validated FFQ was used to identify neo-traditional and modern dietary patterns. We explored gene-dietary pattern interactions with the INSIG2 variants rs9308762 and rs7566605.
Results for American Samoans were mostly non-significant. In Samoa, the neo-traditional dietary pattern was associated with lower triglycerides, BMI, waist circumference, systolic and diastolic blood pressure, and fasting glucose (all p-for-trend<0.05). The modern pattern was significantly associated with higher triglycerides, BMI, waist circumference, and lower HDL cholesterol (all p-for-trend<0.05). A significant interaction for triglycerides was found between the modern pattern and the rs9308762 polymorphism (p=0.04). Those from Samoa consuming the modern pattern have higher triglycerides if they are homozygous for the rs9308762 C allele.
The common INSIG2 rs9308762 variant was associated with poorer metabolic control and a greater sensitivity of triglycerides to a modern dietary pattern. Environmental factors need to be taken into account when assessing genetic associations across and within populations.
INSIG2; dietary patterns; gene-diet interactions; metabolic risk; Samoa
A previous genome-wide linkage study of alcohol dependence (AD) in multiplex families found a suggestive linkage result for a region on Chromosome 1 near microsatellite markers D1S196 and D1S2878. The ASTN1 gene is in this region, a gene previously reported to be associated with substance abuse, bipolar disorder and schizophrenia. Using the same family data consisting of 330 individuals with phenotypic data and DNA, finer mapping of a 26 cM region centered on D1S196 was undertaken using SNPs with minor allele frequency (MAF) ≥ 0.15 and pair-wise linkage disequilibrium (LD) of r2 <0.8 using the HapMap CEU population. Significant FBAT P-values for SNPs within the ASTN1 gene were observed for four SNPs (rs465066, rs228008, rs6668092, and rs172917); the most significant, rs228008, within intron 8, had a P-value of 0.001. Using MQLS, which allows for inclusion of all families, we find three of these SNPs with MQLS P-values <0.003. In addition, two neighboring SNPs (rs10798496 and rs6667588) showed significance at P = 0.002 and 0.03, respectively. Haplotype analysis was performed using the haplotype-based test function of FBAT for a block that included rs228008, rs6668092, and rs172917. This analysis found one haplotype (GCG) over-transmitted and another (ATA) under-transmitted to affected offspring. Linkage analysis identified a region consistent with the association results. Family-based association analysis shows the ASTN1 gene significantly associated with alcohol dependence. The potential importance of the ASTN1 gene for AD risk may be related to its role in glial-guided neuronal migration.
ASTN1; alcohol dependence; multiplex families
To examine associations in a preterm population between rs9883204 in ADCY5 and rs900400 near LEKR1 and CCNL1 with birth weight. Both markers were associated with birth weight in a term population in a recent genome-wide association (GWA) study by Freathy et al.
A meta-analysis of mother and infant samples was performed for associations of rs900400 and rs9883204 with birth weight in 393 families from the U.S., 265 families from Argentina and 735 mother-infant pairs from Denmark. Z scores adjusted for infant sex and gestational age were generated for each population separately and regressed on allele counts. Association evidence was combined across sites by inverse-variance weighted meta-analysis.
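The inverse-variance weighting step described above can be sketched as follows. This is a minimal fixed-effect illustration, not the authors' analysis code; the function name `inverse_variance_meta` and the per-site effect estimates and standard errors are hypothetical and are not taken from the study.

```python
import math

def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance weighted meta-analysis.

    betas: per-site effect estimates (e.g. SD change in birth weight per allele)
    ses:   per-site standard errors of those estimates
    Returns (pooled estimate, pooled SE, z score, two-sided normal p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]   # weight each site by 1 / variance
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))    # two-sided p under a normal approximation
    return beta, se, z, p

# Hypothetical per-site estimates (illustrative values only, not study results)
site_betas = [-0.08, -0.05, -0.07]
site_ses = [0.09, 0.10, 0.06]
pooled_beta, pooled_se, z, p = inverse_variance_meta(site_betas, site_ses)
print(f"pooled beta = {pooled_beta:.3f}, SE = {pooled_se:.3f}, p = {p:.3f}")
```

Sites with smaller standard errors receive larger weights, so the pooled estimate is pulled toward the most precisely estimated site.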
Each additional C allele of rs900400 (LEKR1/CCNL1) in infants was marginally associated with a 0.069 standard deviation (SD) lower birth weight (95% CI = −0.159 – 0.022, P = 0.068). This result was slightly more pronounced after adjusting for smoking (P = 0.036). There were no significant associations identified with rs9883204 or in maternal samples.
These results indicate the potential importance of this marker on birth weight irrespective of gestational age.
Genetic; association; single nucleotide polymorphism
The association between obesity and the fat mass and obesity associated (FTO) gene has been widely replicated among Caucasian populations. The limited number of studies assessing its significance in Asian populations has yielded somewhat conflicting results. We performed a genetic association study of 51 tagging, GWAS, and imputed single nucleotide polymorphisms with twelve measures of adiposity and skeletal robustness in two Samoan populations of Polynesia. We included 465 and 624 unrelated American Samoan and Samoan individuals, respectively; these populations derive from a single genetic background traced to Southeast Asia and represent one socio-cultural unit, although they are economically disparate with distinct environmental exposures. American Samoans were significantly larger than Samoans in all measures of obesity and most measures of skeletal robustness. In separate analyses of American Samoa and Samoa, we found a total of 36 nominal associations between FTO variants and skeletal and obesity measures. The preponderance of these nominal associations (32 of 36) was observed in the Samoan population, and predominantly with skeletal rather than fat mass measures (28 of 36). All significance disappeared, however, following corrections for multiple testing. Based on these findings, it could be surmised that FTO is not likely a major obesity locus in Polynesian populations.
obesity; FTO; association analysis; Samoa
A genetic map function M(d) = RF provides a mapping from the additive genetic distance d to the non-additive recombination fraction RF between a given pair of loci, where the recombination fraction is the proportion of gametes that are recombinant between the two loci. Genetic map functions are needed because in most experiments all we can directly observe are the recombination events. However, since a recombination event is only observed if there are an odd number of crossovers between the two loci, recombination fractions are not additive. One of the most widely used map functions is Haldane's map function, which is derived under the assumptions of no chiasma and no chromatid interference, and has been in widespread use since 1919. However, Casares recently proposed a 'corrected' Haldane's map function – we show here that this 'corrected' map function is not correct due to faulty assumptions and mistakes in its derivation.
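As a minimal sketch of the standard (uncorrected) Haldane map function, RF = (1 − e^(−2d))/2 with d in Morgans, the following shows why recombination fractions are not additive while map distances are. The function names and the 10 cM example intervals are illustrative choices, not values from the text.

```python
import math

def haldane_rf(d_morgans):
    """Haldane map function: map distance (Morgans) -> recombination fraction."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))

def haldane_distance(rf):
    """Inverse Haldane map function: recombination fraction -> map distance (Morgans)."""
    if not 0.0 <= rf < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * rf)

# Three loci A - B - C with two adjacent 10 cM intervals (illustrative values)
d_ab, d_bc = 0.10, 0.10
r_ab = haldane_rf(d_ab)
r_bc = haldane_rf(d_bc)
r_ac = haldane_rf(d_ab + d_bc)          # map distances add: d_AC = d_AB + d_BC

# Without interference, A and C recombine only if exactly one interval does
r_ac_from_parts = r_ab + r_bc - 2.0 * r_ab * r_bc

print(f"r_AB + r_BC = {r_ab + r_bc:.4f}")
print(f"r_AC        = {r_ac:.4f}")
print(f"from parts  = {r_ac_from_parts:.4f}")
```

The sum r_AB + r_BC overstates r_AC because double crossovers between A and C restore the parental combination and go unobserved.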
To estimate whether African ancestry, specific gene polymorphisms, and gene-environment interactions could account for some of the unexplained preterm birth variance within blacks.
We genotyped 1,509 African ancestry informative markers, cytochrome P-450 1A1 (CYP1A1) and glutathione S-transferases Theta 1 (GSTT1) variants in 1,030 self-reported black mothers. We estimated the African ancestral proportion using the ancestry informative markers for all 1,030 self-reported black mothers. We examined the effect of African ancestry and CYP1A1 and GSTT1 smoking interactions on preterm birth cases as a whole and within its subgroups: very preterm birth (gestational age less than 34 weeks); and late preterm birth (gestational age greater than 34 and less than 37 weeks). We applied logistic regression and receiver operating characteristic (ROC) curve analysis, separately, to evaluate if African ancestry and CYP1A1- and GSTT1-smoking interactions could make additional contributions to preterm birth beyond epidemiological factors.
We found significant associations of African ancestry with preterm birth (22% vs. 31%, OR=1.11; 95%CI: 1.02–1.20) and very preterm birth (23% vs. 33%, OR=1.17; 95%CI: 1.03–1.33), but not with late preterm birth (22% vs. 29%, OR=1.06; 95%CI: 0.97–1.16). In addition, the ROC curve analysis suggested that African ancestry and CYP1A1- and GSTT1-smoking interactions made substantial contributions to very preterm birth beyond epidemiologic factors.
Our data underscore the importance of simultaneously considering epidemiological factors, African ancestry, specific gene polymorphisms and gene-environment interactions to better understand preterm birth racial disparity and to improve our ability to predict preterm birth, especially very preterm birth.
Studies have found both genetic and environmental influences on chronic periodontitis. The purpose of this study was to examine the relationships among previously identified genetic variants, smoking status, and two periodontal disease-related phenotypes (PSR1 and PSR2) in 625 Caucasian adults (aged 18–49 years). The PSR Index was used to classify participants as affected or unaffected under the PSR1 and PSR2 phenotype definitions. Using logistic regression, we found that the form of the relationship varied by single nucleotide polymorphism (SNP): For rs10457525 and rs12630931, the effects of smoking and genotype on risk were additive; whereas for rs10457526 and rs733048, smoking was not independently associated with affected status once genotype was taken into consideration. In contrast, smoking moderated the relationships of rs3870371 and rs733048 with affected status such that former and never smokers with select genotypes were at increased genetic risk. Thus, for several groups, knowledge of genotype may refine the risk prediction over that which can be determined by knowledge of smoking status alone. Future studies should replicate these findings. These findings provide the foundation for the exploration of novel pathways by which periodontitis may occur.
adult; chronic periodontitis; genetics; genomics; smoking
Simulation of genotypes in pedigrees is an important tool to evaluate the power of a linkage or an association study and to assess the empirical significance of results. SLINK is a widely-used package for pedigree simulations, but its implementation has not previously been described in a published paper. SLINK was initially derived from the LINKAGE programs. Over the 20 years since its release, SLINK has been modified to incorporate faster algorithms, notably from the linkage analysis package FASTLINK, also derived from LINKAGE. While SLINK can simulate genotypes on pedigrees of high complexity, one limitation of SLINK, as with most methods based on peeling algorithms to evaluate pedigree likelihoods, is the small number of linked markers that can be generated. The software package SUP includes an elegant wrapper for SLINK that circumvents the limitation on number of markers by using descent markers generated by SLINK to simulate a much larger number of markers on the same chromosome, linked and possibly associated with a trait locus. We have released new coordinated versions of SLINK (3.0; available from http://watson.hgen.pitt.edu) and SUP (v090804; available from http://mlemire.freeshell.org/software or http://watson.hgen.pitt.edu) that integrate the two software packages. Thereby, we have removed some of the previous limitations on the joint functionality of the programs, such as the number of founders in a pedigree. We review the history of SLINK and describe how SLINK and SUP are now coordinated to permit the simulation of large numbers of markers linked and possibly associated with a trait in large pedigrees.
Coordinated conditional simulation; SLINK; SUP; Linkage study; Association study; Pedigree, large; Pedigree, complex
Dental caries is the result of a complex interplay among environmental, behavioral, and genetic factors, with distinct patterns of decay likely due to specific etiologies. Therefore, global measures of decay, such as the DMFS index, may not be optimal for identifying risk factors that manifest as specific decay patterns, especially if the risk factors such as genetic susceptibility loci have small individual effects. We used two methods to extract patterns of decay from surface-level caries data in order to generate novel phenotypes with which to explore the genetic regulation of caries.
The 128 tooth surfaces of the permanent dentition were scored as carious or not by intra-oral examination for 1,068 participants aged 18 to 75 years from 664 biological families. Principal components analysis (PCA) and factor analysis (FA), two methods of identifying underlying patterns without a priori surface classifications, were applied to our data.
The three strongest caries patterns identified by PCA recaptured variation represented by DMFS index (correlation, r = 0.97), pit and fissure surface caries (r = 0.95), and smooth surface caries (r = 0.89). However, together, these three patterns explained only 37% of the variability in the data, indicating that a priori caries measures are insufficient for fully quantifying caries variation. In comparison, the first pattern identified by FA was strongly correlated with pit and fissure surface caries (r = 0.81), but other identified patterns, including a second pattern representing caries of the maxillary incisors, were not representative of any previously defined caries indices. Some patterns identified by PCA and FA were heritable (h2 = 30-65%, p = 0.043-0.006), whereas other patterns were not, indicating both genetic and non-genetic etiologies of individual decay patterns.
This study demonstrates the use of decay patterns as novel phenotypes to assist in understanding the multifactorial nature of dental caries.
Dental caries genetics; Heritability; Permanent dentition; Pit and fissure surfaces; Smooth surfaces; Tooth surfaces; Principal components analysis; Factor analysis; Patterns of tooth decay; Patterns of dental caries
Through extensive linkage and association analyses in multiple independent datasets, this study identified CACNG3 as the most likely AMD susceptibility gene on 16p12.
Age-related macular degeneration (AMD) is a complex disorder of the retina, characterized by drusen, geographic atrophy, and choroidal neovascularization. Cigarette smoking and the genetic variants CFH Y402H, ARMS2 A69S, CFB R32Q, and C3 R102G have been strongly and consistently associated with AMD. Multiple linkage studies have found evidence suggestive of another AMD locus on chromosome 16p12 but the gene responsible has yet to be identified.
In the initial phase of the study, single-nucleotide polymorphisms (SNPs) across chromosome 16 were examined for linkage and/or association in 575 Caucasian individuals from 148 multiplex and 77 singleton families. Additional variants were tested in an independent dataset of unrelated cases and controls. According to these results, in combination with gene expression data and biological knowledge, five genes were selected for further study: CACNG3, HS3ST4, IL4R, Q7Z6F8, and ITGAM.
After genotyping additional tagging SNPs across each gene, the strongest evidence for linkage and association was found within CACNG3 (rs757200 nonparametric LOD* = 3.3, APL (association in the presence of linkage) P = 0.06, and rs2238498 MQLS (modified quasi-likelihood score) P = 0.006 in the families; rs2283550 P = 1.3 × 10^−6, and rs4787924 P = 0.002 in the case–control dataset). After adjusting for known AMD risk factors, rs2283550 remained strongly associated (P = 2.4 × 10^−4). Furthermore, the association signal at rs4787924 was replicated in an independent dataset (P = 0.035) and in a joint analysis of all the data (P = 0.001).
These results suggest that CACNG3 is the best candidate for an AMD risk gene within the 16p12 linkage peak. More studies are needed to confirm this association and clarify the role of the gene in AMD pathogenesis.
Animal and human studies of addiction indicate that the D2 dopamine receptor (DRD2) plays a critical role in the mechanism of drug reward. D2 receptor density in the brains of alcoholics has been shown to be reduced relative to controls. Previous studies of DRD2 in association with alcohol dependence using variation in the TaqI A locus were highly controversial. Recently, a synonymous mutation, C957T, in the coding region of the human DRD2 gene has been identified which appears to have functional effects including alteration in receptor availability. In order to determine if susceptibility to alcohol dependence (AD) within multiplex alcohol dependence families would be altered by the C957T in the coding region of the D2 gene, within-family association was studied in members of Caucasian multiplex alcohol dependence families. Members of control families with no personal alcohol or substance dependence history were included for case/control comparisons. Analyses performed to detect within-family association showed evidence favoring an association for the C957T polymorphism (P = 0.038). Linkage analyses of polymorphisms in this region showed that only the C957T locus remained of interest (P = 0.015). Evidence for the C957T T allele having a role in AD susceptibility at the population level using a case/control comparison was statistically marginal (P = 0.062), but was consistent with the family data results. These results support a role for DRD2 as a susceptibility gene for alcohol dependence within multiplex families at high risk for developing alcohol dependence.
D2 receptor; alcohol dependence; multiplex families; pedigree disequilibrium test; linkage
In a genetic association study, it is often desirable to perform an overall test of whether any or all single-nucleotide polymorphisms (SNPs) in a gene are associated with a phenotype. Several such tests exist, but most of them are powerful only under very specific assumptions about the genetic effects of the individual SNPs. In addition, some of the existing tests assume that the direction of the effect of each SNP is known, which is a highly unlikely scenario. Here we propose a new kernel-based association test (KBAT) of joint association of several SNPs. Our test is non-parametric and robust, and does not make any assumption about the directions of individual SNP effects. It can be used to test multiple correlated SNPs within a gene and can also be used to test independent SNPs or genes in a biological pathway. Our test uses an analysis of variance (ANOVA) paradigm to compare variation between cases and controls to the variation within the groups. The variation is measured using kernel functions for each marker, and then a composite statistic is constructed to combine the markers into a single test. We present simulation results comparing our statistic to the U-statistic based method by Schaid et al. and another statistic by Wessel and Schork. We consider a variety of different disease models and assumptions about how many SNPs within the gene are actually associated with disease. Our results indicate that our statistic has higher power than other statistics under most realistic conditions.
genetic similarity; association study; multilocus association
Accurate genetic maps are required for successful and efficient linkage mapping of disease genes. However, most available genome-wide genetic maps were built using only small collections of pedigrees, and therefore have large sampling errors. A large set of genetic studies genotyped by the NHLBI Mammalian Genotyping Service (MGS) provide appropriate data for generating more accurate maps.
We collected a large sample of uncleaned genotype data for 461 markers generated by the MGS using the Weber screening sets 9 and 10. This collection includes genotypes for over 4,400 pedigrees containing over 17,000 genotyped individuals from different populations. We identified and cleaned numerous relationship and genotyping errors, as well as verified the marker orders. We used this dataset to test for population-specific genetic maps, and to re-estimate the genetic map distances with greater precision; standard errors for all intervals are provided. The map-interval sizes from the European (or European descent), Chinese, and Hispanic samples are in quite good agreement with each other. We found one map interval on chromosome 8p with a statistically significant size difference between the European and Chinese samples, and several map intervals with significant size differences between the African American and Chinese samples. When comparing Palauan with European samples, a statistically significant difference was detected at the telomeric region of chromosome 11p. Several significant differences were also identified between populations in chromosomal and genome lengths.
Our new population-specific screening set maps can be used to improve the accuracy of disease-mapping studies. As a result of the large sample size, the average length of the 95% confidence interval (CI) for a 10 cM map interval is only 2.4 cM, which is considerably smaller than on previously published maps.
Limited access to large samples and independent replication cohorts precludes genome-wide association (GWA) studies of rare but complex traits. To localize candidate genes with family-based GWA, a novel exploratory analysis was first tested on 1,774 major histocompatibility complex single nucleotide polymorphisms (SNPs) in 240 DNA samples from 80 children with primary liver transplantation (LTx), and their biological parents.
Initially, 57 SNPs with large differences (p<0.05) in minor allele frequencies were selected, when parents of children with early rejection (Rejectors) were compared with parents of Non-Rejectors. In hypothesis-testing of selected SNPs, the gamete competition statistic identified the minor allele G (ancestral allele T) of the SNP rs9296068, near HLA-DOA, as being significantly different (p=0.018) in parent-to-child transmission between outcome groups. Subsequent simple association testing confirmed over- and under-transmission of rs9296068 based on 1) the most significant differences between outcome groups, of 1,774 SNPs tested (p=0.002), and 2) allele (G) frequencies that were greater among Rejectors (51.4 vs. 36.8%, p=0.015), and lower among Non-Rejectors (26.8 vs. 36.8%, p=0.074), compared with 400 normal control Caucasian children. In early functional validation, a) Rejectors demonstrated significant repression of the first HLA-DOA exon closest to rs9296068, and b) Rejectors with the risk allele showed 3-fold greater intragraft content of B-lymphocytes, whose antigen-presenting function is inhibited selectively by HLA-DOA, compared with Rejectors without the allele.
The minor allele of the SNP rs9296068 is significantly associated with LTx rejection, and with enhanced B-lymphocyte participation in rejection, likely due to a dysfunctional HLA-DOA gene product.
Obesity is a complex phenotype affected by genetic and environmental influences such as sociocultural factors and individual behaviors. Previously, we performed two separate genome-wide investigations for adiposity-related traits (BMI, percentage body fat (%BF), abdominal circumference (ABDCIR), and serum leptin and serum adiponectin levels) in families from American Samoa and in families from Samoa. The two polities have a common evolutionary history but have lately been influenced by variations in economic development, leading to differences in income and wealth and in dietary and physical activity patterns. We now present a genome-wide linkage scan of the combined samples from the two polities. We adjust for environmental covariates, including polity of residence, education, cigarette smoking, and farm work, and use variance component methods to calculate univariate and bivariate multipoint lod scores. We identified a region on 9p22 with genome-wide significant linkage for the bivariate phenotypes ABDCIR–%BF (1-d.f. lod 3.30) and BMI–%BF (1-d.f. lod 3.31) and two regions with genome-wide suggestive linkage on 8p12 and 16q23 for adiponectin (lod 2.74) and the bivariate phenotype leptin–ABDCIR (1-d.f. lod 3.17), respectively. These three regions have previously been reported to be linked to adiposity-related phenotypes in independent studies. However, the differences in results between this study and our previous polity-specific studies suggest that environmental effects are of different importance in the samples. These results strongly encourage further genetic studies of adiposity-related phenotypes where extended sets of carefully measured environmental factors are taken into account.
A genome wide association study found significant association of a sequence variant, rs7566605, in the insulin-induced gene 2 (INSIG2) with obesity. However, the association remained inconclusive in follow-up studies. We tested for association of four tagging SNPs (tagSNPs) including this variant with body mass index (BMI) and abdominal circumference (ABDCIR) in the Samoans of the Western Pacific, a population with high levels of obesity.
We studied 907 adult Samoan participants from a longitudinal study of adiposity and cardiovascular disease risk in two polities, American Samoa and Samoa. Four tagSNPs were identified from the Chinese HapMap database based on pairwise r2 of ≥0.8 and minor allele frequency of ≥0.05. Genotyping was performed using the TaqMan assay. Tests of association with BMI and ABDCIR were performed under the additive model.
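The pairwise r2 criterion used for tagSNP selection can be illustrated with a short sketch. It assumes phased two-SNP haplotypes coded 0/1 and uses the standard definition r2 = D² / (pA(1 − pA) pB(1 − pB)); the function name and the toy haplotypes are hypothetical and do not come from HapMap or from this study.

```python
def ld_r2(haplotypes):
    """Pairwise LD r^2 between two biallelic SNPs.

    haplotypes: list of (a, b) tuples, each allele coded 0 or 1,
                one tuple per phased chromosome.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n              # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n              # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                                 # disequilibrium coefficient D
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    return d * d / denom

# Toy phased haplotypes (illustrative only)
toy = [(1, 1)] * 5 + [(0, 0)] * 13 + [(1, 0)] * 1 + [(0, 1)] * 1
r2 = ld_r2(toy)
print(f"r^2 = {r2:.3f}; same tagging bin at the 0.8 threshold: {r2 >= 0.8}")
```

Under a threshold of 0.8, two SNPs whose r2 reaches the threshold are treated as proxies for each other, so only one of them needs to be genotyped.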
We did not find association of rs7566605 with either BMI or ABDCIR in any group of the Samoans. However, the most distally located tagSNP in intron 3 of the gene, rs9308762, showed significant association with both BMI (p-value 0.024) and ABDCIR (p-value 0.009) in the combined sample and with BMI (p-value 0.038) in the sample from Samoa.
Although rs7566605 was not significantly associated with obesity in our study population, we cannot rule out the involvement of INSIG2 in obesity-related traits as we found significant association of another tagSNP in INSIG2 with both BMI and ABDCIR. This study suggests the importance of comprehensive assessment of sequence variants within a gene in association studies.
Recent successful discoveries of potentially causal single nucleotide polymorphisms (SNPs) for complex diseases hold great promise, and commercialization of genomics in personalized medicine has already begun. The hope is that genetic testing will benefit patients and their families, and encourage positive lifestyle changes and guide clinical decisions. However, for many complex diseases, it is arguable whether the era of genomics in personalized medicine is here yet. We focus on the clinical validity of genetic testing with an emphasis on two popular statistical methods for evaluating markers. The two methods, logistic regression and receiver operating characteristic (ROC) curve analysis, are applied to our age-related macular degeneration dataset. By using an additive model of the CFH, LOC387715, and C2 variants, the odds ratios are 2.9, 3.4, and 0.4, with p-values of 10^−13, 10^−13, and 10^−3, respectively. The area under the ROC curve (AUC) is 0.79, but assuming prevalences of 15%, 5.5%, and 1.5% (which are realistic for age groups 80 y, 65 y, and 40 y and older, respectively), only 30%, 12%, and 3% of the group classified as high risk are cases. Additionally, we present examples for four other diseases for which strongly associated variants have been discovered. In type 2 diabetes, our classification model of 12 SNPs has an AUC of only 0.64, and two SNPs achieve an AUC of only 0.56 for prostate cancer. Nine SNPs were not sufficient to improve the discrimination power over that of nongenetic predictors for risk of cardiovascular events. Finally, in Crohn's disease, a model of five SNPs, one with a quite low odds ratio of 0.26, has an AUC of only 0.66. Our analyses and examples show that strong association, although very valuable for establishing etiological hypotheses, does not guarantee effective discrimination between cases and controls.
The scientific community should be cautious to avoid overstating the value of association findings in terms of personalized medicine before their time.
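The dependence of the proportion of true cases in the high-risk group on disease prevalence follows from Bayes' rule, and can be sketched as below. The sensitivity and specificity values are hypothetical placeholders (the abstract does not report the operating point used), so the printed proportions are illustrative and are not expected to reproduce the reported figures; only the three prevalences come from the text.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Fraction of individuals classified as high risk who are true cases."""
    true_pos = sensitivity * prevalence                   # cases correctly flagged
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # non-cases wrongly flagged
    return true_pos / (true_pos + false_pos)

# Assumed operating point of the classifier (hypothetical, not from the study)
SENSITIVITY = 0.70
SPECIFICITY = 0.75

# Prevalences quoted in the text for ages 80+, 65+, and 40+ years
for prevalence in (0.15, 0.055, 0.015):
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.1%}: {ppv:.1%} of the high-risk group are cases")
```

With the test's sensitivity and specificity held fixed, the proportion of true cases among those flagged falls as prevalence falls, which is why a respectable AUC can still yield a high-risk group made up mostly of non-cases.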
Correctly merged data sets that have been independently genotyped can increase statistical power in linkage and association studies. However, alleles from microsatellite data sets genotyped with different experimental protocols or platforms cannot be accurately matched using base-pair size information alone. In a previous publication we introduced a statistical model for merging microsatellite data by matching allele frequencies between data sets. These methods are implemented in our software MicroMerge version 1 (v1). While MicroMerge v1 output can be analyzed by some genetic analysis programs, many programs cannot analyze alignments that do not match alleles one-to-one between data sets. A consequence of such alignments is that codominant genotypes must often be analyzed as phenotypes. In this paper we describe several extensions that are implemented in MicroMerge version 2 (v2).
Notably, MicroMerge v2 includes a new one-to-one alignment option that creates merged pedigree and locus files that can be handled by most genetic analysis software. Other features in MicroMerge v2 enhance the following aspects of control: 1) optimizing the algorithm for different merging scenarios, such as data sets with very different sample sizes or multiple data sets, 2) merging small data sets when a reliable set of allele frequencies is available, 3) improving the quantity of merged data, and 4) improving the quality of merged data. We present results from simulated and real microsatellite genotype data sets, and conclude with an association analysis of three familial dyslipidemia (FD) study samples genotyped at different laboratories. Independent analysis of each FD data set did not yield consistent results, but analysis of the merged data sets identified strong association at locus D11S2002.
The MicroMerge v2 features will enable merging for a variety of genotype data sets, which in turn will facilitate meta-analyses for powering association analysis.
Age-related maculopathy (ARM) is a common cause of visual impairment in the elderly populations of industrialized countries and significantly affects the quality of life of those suffering from the disease. Variants within two genes, the complement factor H (CFH) and the poorly characterized LOC387715 (ARMS2), are widely recognized as ARM risk factors. CFH is important in regulation of the alternative complement pathway, suggesting that this pathway is involved in ARM pathogenesis. Two other complement pathway genes, the closely linked complement component receptor (C2) and complement factor B (CFB), were recently shown to harbor variants associated with ARM.
We investigated two SNPs in C2 and two in CFB in independent case-control and family cohorts of white subjects and found rs547154, an intronic SNP in C2, to be significantly associated with ARM in both our case-control (P-value 0.00007) and family data (P-value 0.00001). Logistic regression analysis suggested that accounting for the effect at this locus significantly (P-value 0.002) improves the fit over a genetic risk model containing only CFH and LOC387715 effects. Modeling with the generalized multifactor dimensionality reduction method showed that adding C2 to the two-factor model of CFH and LOC387715 increases the sensitivity from 63% to 73%. However, the balanced accuracy increases only from 71% to 72%, and the specificity decreases from 80% to 72%.
C2/CFB significantly influences AMD susceptibility, and although accounting for effects at this locus does not dramatically increase the overall accuracy of the genetic risk model, the improvement over the CFH-LOC387715 model is statistically significant.
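The reported sensitivity, specificity, and balanced accuracy figures can be cross-checked with a small calculation. The sketch below assumes the usual definition of balanced accuracy as the arithmetic mean of sensitivity and specificity; the abstract does not state the definition it used, and the function and variable names are illustrative only.

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Arithmetic mean of sensitivity and specificity (fractions in [0, 1]).

    Assumption: this is the standard definition; the abstract does not
    spell out how its balanced accuracy was computed.
    """
    return (sensitivity + specificity) / 2.0


# Rounded values quoted in the abstract.
two_factor = balanced_accuracy(0.63, 0.80)    # CFH + LOC387715
three_factor = balanced_accuracy(0.73, 0.72)  # CFH + LOC387715 + C2

print(f"two-factor:   {two_factor:.3f}")
print(f"three-factor: {three_factor:.3f}")
```

Because the inputs are themselves rounded percentages, the results agree with the reported 71% and 72% only to within about one percentage point. The calculation also shows why a ten-point gain in sensitivity barely moves the balanced accuracy: it is offset by an eight-point loss in specificity.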
Rheumatoid arthritis (RA) is a multifactorial disease with complex genetic etiology, about which little is known. Here, we applied a two-stage procedure in which a quick first-stage analysis was used to narrow down targets for more thorough and detailed testing for gene × gene interaction. Potentially interesting regions were first identified by testing for major gene effects using non-parametric linkage methods. To select regions of interest, we first tested for linkage to three different RA-related traits one at a time: RA affection status and the quantitative phenotypes rheumatoid factor IgM and anti-cyclic citrullinated peptide levels. These linkage analyses identified regions on chromosomes 3, 5, 6, 8, 16, 18, 19, and 20. We subsequently analyzed the selected regions in a pairwise manner to detect gene × gene interactions influencing RA using a recently developed two-dimensional linkage method. We found evidence of interacting loci on chromosomes 5, 6, and 18.
Linkage analysis methods that incorporate etiological heterogeneity of complex diseases are likely to demonstrate greater power than traditional linkage analysis methods. Several such methods use covariates to discriminate between linked and unlinked pedigrees with respect to a certain disease locus. Here we apply several such methods including two mixture models, ordered subset analysis, and a conditional logistic model to genome scan data on the DSM-IV alcohol dependence phenotype on the Collaborative Studies on Genetics of Alcoholism families, and compare the results to traditional nonparametric linkage analysis. In general, there was little agreement among the various covariate-based linkage statistics. Linkage signals with empirical p-values less than 0.001 were detected on chromosomes 3, 4, 7, 10, and 12, with the highest peak occurring at the GABRB1 gene using the ecb21 covariate.
Using the Genetic Analysis Workshop 14 (GAW14) simulated dataset, we compare microsatellite and single-nucleotide polymorphism (SNP) markers in terms of two measures of information content: the traditional entropy-based information content measure and a new "relative information" measure. Both attempt to measure the amount of information contained in the markers about the identity-by-descent (IBD) sharing among relatives. The performance of the two information measures is compared based on their variability and ability to predict change in the LOD score (ΔLOD) as map density increases for SNP markers. Although LOD scores in a linked region are correlated with measures of information, we observe that neither measure predicts the LOD score itself very well. In an unlinked region, the LOD score is not related to either measure of information. The information content of microsatellite markers with 7.5-cM spacing is slightly higher than that of SNP markers with 3-cM spacing. At these map densities, microsatellites are found to be uniformly more informative than SNPs irrespective of their level of heterozygosity. For SNPs, we found that as the level of heterozygosity increases, the information content increases. As reported in all other previous studies, we also found that high-density SNPs have higher information content compared to low-density microsatellites. The performance of the two information measures considered here is similar, but the relative information measure predicts ΔLOD as marker density increases better than the traditional entropy-based information measure.
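As a rough illustration of the entropy-based measure mentioned above, the sketch below assumes the commonly used formulation I = 1 - E/E0. Here E is the Shannon entropy of the inheritance-vector distribution given the marker data, and E0 is the entropy of the same distribution without marker data. The abstract does not give its exact definition, so this is an assumption, and the function names are hypothetical. The "relative information" measure is not sketched because the abstract does not define it.

```python
import math
from typing import Sequence


def entropy_bits(probs: Sequence[float]) -> float:
    """Shannon entropy in bits; zero-probability states contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_content(posterior: Sequence[float], prior: Sequence[float]) -> float:
    """Entropy-based information content, assumed here to be 1 - E / E0.

    posterior: inheritance-vector probabilities given the marker data.
    prior:     inheritance-vector probabilities with no marker data.
    """
    e0 = entropy_bits(prior)
    if e0 == 0:
        raise ValueError("prior entropy is zero; information content is undefined")
    return 1.0 - entropy_bits(posterior) / e0


# Toy example: four equally likely inheritance states a priori (2 bits).
prior = [0.25, 0.25, 0.25, 0.25]
fully_informative = information_content([1.0, 0.0, 0.0, 0.0], prior)
half_informative = information_content([0.5, 0.5, 0.0, 0.0], prior)
uninformative = information_content(prior, prior)

print(fully_informative, half_informative, uninformative)
```

Under this formulation the measure runs from 0, when the markers leave the inheritance pattern as uncertain as before, to 1, when the markers resolve it completely.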
Whole genome-wide scanning for susceptibility loci based on linkage disequilibrium (LD) has been proposed as a powerful strategy for mapping common complex diseases, especially in isolated populations. We recruited 389 individuals from 175 families in the US territory of American Samoa, and 96 unrelated individuals from American Samoa and the independent country of Samoa, in order to examine background LD using a 10-centimorgan (cM) map containing 381 autosomal and 18 X-linked microsatellite markers. We tested the relationship between LD and recombination fraction by fitting a regression model. We estimated a slope of -0.021 (SE 0.00354; p < 0.0001). Based on our results, LD in the Samoan population decays steadily as the recombination fraction between autosomal markers increases. The patterns of LD observed in the Samoan population are quite similar to those previously observed in Palau but contrast markedly with those observed in a non-isolated Caucasian sample, where there is essentially no marker-to-marker LD. Our analyses support the hypothesis of a recent bottleneck, which is consistent with the known demographic history of the Samoan population. Furthermore, population substructure tests support the hypothesis that self-identified Samoans represent one homogeneous genetic population.
association mapping; linkage disequilibrium; isolated population; Samoa
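The regression of LD on recombination fraction described above can be sketched with an ordinary least-squares fit. The abstract does not say which LD statistic or model form was used, so a simple linear model is assumed here. The data points are made up purely to show the mechanics and are not the study's data, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def ols_slope(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Simple linear regression y = a + b*x.

    Returns (slope, intercept, standard error of the slope).
    """
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return slope, intercept, se_slope


# Made-up illustration: LD measure per marker pair vs. recombination fraction.
theta = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.50]
ld = [0.012, 0.010, 0.011, 0.008, 0.007, 0.004, 0.003]

slope, intercept, se_slope = ols_slope(theta, ld)
print(f"slope={slope:.4f}, intercept={intercept:.4f}, SE={se_slope:.5f}")
```

A negative slope that is large relative to its standard error corresponds to the kind of steady LD decay with increasing recombination fraction that the abstract reports.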
Current linkage analysis methods for quantitative traits do not usually incorporate imprinting effects. Here, we carried out genome-wide linkage analysis for loci influencing adult height in the Framingham Heart Study subjects using variance components while allowing for imprinting effects. We used a sex-averaged map for the 22 autosomes, while chromosomes 6, 14, 18, and 19 were also analyzed using sex-specific maps. We compared results from these four analyses: 1) non-imprinted with sex-averaged maps, 2) imprinted with sex-averaged maps, 3) non-imprinted with sex-specific maps, and 4) imprinted with sex-specific maps. We found four regions on three chromosomes (14q32, 18p11-q21, 18q21-22, and 19q13) with LOD scores above 2.0, with a maximum LOD score of 3.12, allowing for imprinting and sex-specific maps, at D18S1364 on 18q21. While we obtained significant evidence of imprinting effects in both the 18p11-q21 and 19q13 regions when using sex-averaged maps, there were no significant differences between the imprinted and non-imprinted LOD scores when we used sex-specific maps. Our results illustrate the importance of allowing for gender-specific effects in linkage analyses, whether these are in the form of gender-specific recombination frequencies, or in the form of imprinting effects.