1.  Stratified randomization controls better for batch effects in 450K methylation analysis: a cautionary tale 
Frontiers in Genetics  2014;5:354.
Background: Batch effects in DNA methylation microarray experiments can lead to spurious results if not properly handled during the plating of samples.
Methods: Two pilot studies examining the association of DNA methylation patterns across the genome with obesity in Samoan men were investigated for chip- and row-specific batch effects. For each study, the DNA of 46 obese men and 46 lean men was assayed using Illumina's Infinium HumanMethylation450 BeadChip. In the first study (Sample One), samples from obese and lean subjects were examined on separate chips. In the second study (Sample Two), the samples were balanced on the chips by lean/obese status, age group, and census region. We used methylumi, wateRmelon, and limma R packages, as well as ComBat, to analyze the data. Principal component analysis and linear regression were, respectively, employed to identify the top principal components and to test for their association with the batches and lean/obese status. To identify differentially methylated positions (DMPs) between obese and lean males at each locus, we used a moderated t-test.
Results: Chip effects were effectively removed from Sample Two but not Sample One. In addition, dramatic differences were observed between the two sets of DMP results. After “removing” batch effects with ComBat, Sample One had 94,191 probes differentially methylated at a q-value threshold of 0.05 while Sample Two had zero differentially methylated probes. The disparate results from Sample One and Sample Two likely arise due to the confounding of lean/obese status with chip and row batch effects.
Conclusion: Even the best possible statistical adjustments for batch effects may not completely remove them. Proper study design is vital for guarding against spurious findings due to such effects.
PMCID: PMC4195366  PMID: 25352862
array data; batch effects; DNA methylation; epigenetics; obesity; study design
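The balanced plating described for Sample Two in entry 1 can be illustrated with a minimal sketch. It is not the authors' procedure: the function name `stratified_plate`, the round-robin dealing scheme, the use of a single stratum label, and the choice of 8 chips are all assumptions made for illustration. The study also balanced on age group and census region, which could be folded into the stratum label.

```python
import random
from collections import defaultdict


def stratified_plate(samples, n_chips, seed=0):
    """Assign samples to chips so that each stratum is spread evenly.

    samples: list of (sample_id, stratum) pairs; stratum is a string label
             such as "obese" or "lean" (hypothetical labels).
    Returns a dict mapping chip index -> list of sample ids.
    """
    rng = random.Random(seed)

    # Group sample ids by stratum.
    by_stratum = defaultdict(list)
    for sample_id, stratum in samples:
        by_stratum[stratum].append(sample_id)

    # Shuffle within each stratum, then deal round-robin across chips.
    # The slot counter carries over between strata so chip sizes stay even.
    chips = defaultdict(list)
    slot = 0
    for stratum in sorted(by_stratum):
        ids = by_stratum[stratum]
        rng.shuffle(ids)
        for sample_id in ids:
            chips[slot % n_chips].append(sample_id)
            slot += 1

    # Randomize the position (row) of each sample within its chip.
    for ids in chips.values():
        rng.shuffle(ids)
    return dict(chips)


if __name__ == "__main__":
    # Hypothetical sample ids mirroring the 46 obese / 46 lean design.
    samples = [(f"obese_{i}", "obese") for i in range(46)]
    samples += [(f"lean_{i}", "lean") for i in range(46)]
    layout = stratified_plate(samples, n_chips=8, seed=1)
    for chip, ids in sorted(layout.items()):
        n_obese = sum(1 for s in ids if s.startswith("obese"))
        print(chip, len(ids), n_obese, len(ids) - n_obese)
```

With this layout, lean/obese status is nearly orthogonal to chip, so a chip effect cannot masquerade as a phenotype effect the way it can when the two groups are run on separate chips.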
2.  Endophenotypes for Age-Related Macular Degeneration: Extending Our Reach into the Preclinical Stages of Disease 
Journal of clinical medicine  2014;3(4):1335-1356.
The key to reducing the individual and societal burden of age-related macular degeneration (AMD)-related vision loss is to be able to initiate therapies that slow or halt the progression at a point that will yield the maximum benefit while minimizing personal risk and cost. There is a critical need to find clinical markers that, when combined with the specificity of genetic testing, will identify individuals at the earliest stages of AMD who would benefit from preventive therapies. These clinical markers are endophenotypes for AMD, present in those who are likely to develop AMD, as well as in those who have clinical evidence of AMD. Clinical characteristics associated with AMD may also be possible endophenotypes if they can be detected before or at the earliest stages of the condition, but we and others have shown that this may not always be valid. Several studies have suggested that dynamic changes in rhodopsin regeneration (dark adaptation kinetics and/or critical flicker fusion frequencies) may be more subtle indicators of AMD-associated early retinal dysfunction. One can test for the relevance of these measures using genetic risk profiles based on known genetic risk variants. These functional measures may improve the sensitivity and specificity of predictive models for AMD and may also serve to delineate clinical subtypes of AMD that may differ with respect to prognosis and treatment.
PMCID: PMC4284143  PMID: 25568804
age-related macular degeneration; endophenotype; genetic risk; preclinical diagnostics; retinal function; predictive modeling
3.  Genome-wide association study of primary dentition pit-and-fissure and smooth surface caries 
Caries research  2014;48(4):330-338.
Dental caries continues to be the most common chronic disease in children today. Despite the substantial involvement of genetics in the process of caries development, the specific genes contributing to dental caries remain largely unknown.
We performed separate genome-wide association studies of smooth and pit-and-fissure tooth surface caries experience in the primary dentitions of self-reported white children in two samples from Iowa and rural Appalachia. In total, 1006 children (ages 3-12 years) were included for smooth surface analysis, and 979 children (ages 4-14 years) for pit-and-fissure surface analysis. Associations were tested for more than 1.2 million single nucleotide polymorphisms, either genotyped or imputed.
We detected genome-wide significant signals in KPNA4 (p-value = 2.0E-9), and suggestive signals in ITGAL (p-value = 2.1E-7) and PLUNC family genes (p-value = 2.0E-6), thus nominating these novel loci as putative caries susceptibility genes. We also replicated associations observed in previous studies for MPPED2 (p-value = 6.9E-6), AJAP1 (p-value = 1.6E-6) and RPS6KA2 (p-value = 7.3E-6). Replication of these associations in additional samples, as well as experimental studies to determine the biological functions of associated genetic variants, are warranted. Ultimately, efforts such as this may lead to a better understanding of caries etiology, and could eventually facilitate the development of new interventions and preventive measures.
PMCID: PMC4043868  PMID: 24556642
genome-wide association study; primary dentition; susceptibility gene; smooth surface; pit-and-fissure surface
4.  Mega2: validated data-reformatting for linkage and association analyses 
In a typical study of the genetics of a complex human disease, many different analysis programs are used, to test for linkage and association. This requires extensive and careful data reformatting, as many of these analysis programs use differing input formats. Writing scripts to facilitate this can be tedious, time-consuming, and error-prone. To address these issues, the open source Mega2 data reformatting program provides validated and tested data conversions from several commonly-used input formats to many output formats.
Mega2, the Manipulation Environment for Genetic Analysis, facilitates the creation of analysis-ready datasets from data gathered as part of a genetic study. It transparently allows users to process genetic data for family-based or case/control studies accurately and efficiently. In addition to data validation checks, Mega2 provides analysis setup capabilities for a broad choice of commonly-used genetic analysis programs. First released in 2000, Mega2 has recently been significantly improved in a number of ways. We have rewritten it in C++ and have reduced its memory requirements. Mega2 now can read input files in LINKAGE, PLINK, and VCF/BCF formats, as well as its own specialized annotated format. It supports conversion to many commonly-used formats including SOLAR, PLINK, Merlin, Mendel, SimWalk2, Cranefoot, IQLS, FBAT, MORGAN, BEAGLE, Eigenstrat, Structure, and PLINK/SEQ. When controlled by a batch file, Mega2 can be used non-interactively in data reformatting pipelines. Support for genetic data from several other species besides humans has been added.
By providing tested and validated data reformatting, Mega2 facilitates more accurate and extensive analyses of genetic data, avoiding the need to write, debug, and maintain one’s own custom data reformatting scripts.
Mega2 is freely available at
Electronic supplementary material
The online version of this article (doi:10.1186/s13029-014-0026-y) contains supplementary material, which is available to authorized users.
PMCID: PMC4269913
Software; Linkage; Association; Human Genetics; Data management
5.  A Genome-Wide Association Study of Chronic Otitis Media with Effusion and Recurrent Otitis Media Identifies a Novel Susceptibility Locus on Chromosome 2 
Chronic otitis media with effusion (COME) and recurrent otitis media (ROM) have been shown to be heritable, but candidate gene and linkage studies to date have been equivocal. Our aim was to identify genetic susceptibility factors using a genome-wide association study (GWAS). We genotyped 602 subjects from 143 families with 373 COME/ROM subjects using the Illumina Human CNV370-Duo DNA Bead Chip (324,748 SNPs). We carried out the GWAS scan and imputed SNPs at the regions with the most significant associations. Replication genotyping in an independent family-based sample was conducted for 53 SNPs: the 41 most significant SNPs with P < 10⁻⁴ and 12 imputed SNPs with P < 10⁻⁴ on chromosome 15 (near the strongest signal). We replicated the association of rs10497394 (GWAS discovery P = 1.30 × 10⁻⁵) on chromosome 2 in the independent otitis media population (P = 4.7 × 10⁻⁵; meta-analysis P = 1.52 × 10⁻⁸). Three additional SNPs had replication P values < 0.10. Two were on chromosome 15q26.1 including rs1110060, the strongest association with COME/ROM in the primary GWAS (P = 3.4 × 10⁻⁷) in KIF7 intron 7 (P = 0.072), and rs10775247, a non-synonymous SNP in TICRR exon 2 (P = 0.075). The third SNP rs386057 was on chromosome 5 in TPPP intron 1 (P = 0.045). We have performed the first GWAS of COME/ROM and have identified a SNP, rs10497394 on chromosome 2, that is significantly associated with COME/ROM susceptibility. This SNP is within a 537 kb intergenic region, bordered by CDCA7 and SP3. The genomic and functional significance of this newly identified locus in COME/ROM pathogenesis requires additional investigation.
PMCID: PMC3825021  PMID: 23974705
otitis media; genetics; genome; susceptibility; locus
6.  Demographic, socioeconomic, and behavioral factors affecting patterns of tooth decay in the permanent dentition: Principal components and factor analyses 
Dental caries of the permanent dentition is a multi-factorial disease resulting from the complex interplay of endogenous and environmental risk factors. The disease is not easily quantified due to the innumerable possible combinations of carious lesions across individual tooth surfaces of the permanent dentition. Global measures of decay, such as the DMFS index (which was developed for surveillance applications), may not be optimal for studying the epidemiology of dental caries because they ignore the distinct patterns of decay across the dentition. We hypothesize that specific risk factors may manifest their effects on specific tooth surfaces leading to patterns of decay that can be identified and studied. In this study we utilized two statistical methods of extracting patterns of decay from surface-level caries data in order to create novel phenotypes with which to study the risk factors affecting dental caries.
Intra-oral dental examinations were performed on 1,068 participants aged 18 to 75 years to assess dental caries. The 128 tooth surfaces of the permanent dentition were scored as carious or not and used as input for principal components analysis (PCA) and factor analysis (FA), two methods of identifying underlying patterns without a priori knowledge of the patterns. Demographic (age, sex, birth year, race/ethnicity, and educational attainment), anthropometric (height, body mass index, waist circumference), endogenous (saliva flow), and environmental (tooth brushing frequency, home water source, and home water fluoride) risk factors were tested for association with the caries patterns identified by PCA and FA, as well as DMFS, for comparison. The ten strongest patterns (i.e., those that explain the most variation in the data set) extracted by PCA and FA were considered.
The three strongest patterns identified by PCA reflected (i) global extent of decay (i.e., comparable to DMFS index), (ii) pit and fissure surface caries, and (iii) smooth surface caries, respectively. The two strongest patterns identified by FA corresponded to (i) pit and fissure surface caries and (ii) maxillary incisor caries. Age and birth year were significantly associated with several patterns of decay, including global decay/DMFS index. Sex, race, educational attainment, and tooth brushing were each associated with specific patterns of decay, but not with global decay/DMFS index.
Taken together, these results support the notion that caries experience is separable into patterns attributable to distinct risk factors. This study demonstrates the utility of such novel caries patterns as new outcomes for exploring the complex, multifactorial nature of dental caries.
PMCID: PMC3568445  PMID: 23106439
dental caries; permanent dentition; pit and fissure surfaces; smooth surfaces; tooth surfaces; principal components analysis; factor analysis; tooth brushing
7.  Genetic-based prediction of disease traits: prediction is very difficult, especially about the future† 
Frontiers in Genetics  2014;5:162.
Translation of results from genetic findings to inform medical practice is a highly anticipated goal of human genetics. The aim of this paper is to review and discuss the role of genetics in medically-relevant prediction. Germline genetics presages disease onset and therefore can contribute prognostic signals that augment laboratory tests and clinical features. As such, the impact of genetic-based predictive models on clinical decisions and therapy choice could be profound. However, given that (i) medical traits result from a complex interplay between genetic and environmental factors, (ii) the underlying genetic architectures for susceptibility to common diseases are not well-understood, and (iii) replicable susceptibility alleles, in combination, account for only a moderate amount of disease heritability, there are substantial challenges to constructing and implementing genetic risk prediction models with high utility. In spite of these challenges, concerted progress has continued in this area with an ongoing accumulation of studies that identify disease predisposing genotypes. Several statistical approaches with the aim of predicting disease have been published. Here we summarize the current state of disease susceptibility mapping and pharmacogenetics efforts for risk prediction, describe methods used to construct and evaluate genetic-based predictive models, and discuss applications.
PMCID: PMC4040440  PMID: 24917882
predictive model; genetic risk; human genetics; prognosis; clinical utility
8.  Identification of a Rare Coding Variant in Complement 3 Associated with Age-related Macular Degeneration 
Nature genetics  2013;45(11):10.1038/ng.2758.
Macular degeneration is a common cause of blindness in the elderly. To identify rare coding variants associated with a large increase in risk of age-related macular degeneration (AMD), we sequenced 2,335 cases and 789 controls in 10 candidate loci (57 genes). To increase power, we augmented our control set with ancestry-matched exome sequenced controls. An analysis of coding variation in 2,268 AMD cases and 2,268 ancestry matched controls revealed two large-effect rare variants: previously described R1210C in the CFH gene (f_case = 0.51%, f_control = 0.02%, OR = 23.11), and newly identified K155Q in the C3 gene (f_case = 1.06%, f_control = 0.39%, OR = 2.68). The variants suggest decreased inhibition of C3 by Factor H, resulting in increased activation of the alternative complement pathway, as a key component of disease biology.
PMCID: PMC3812337  PMID: 24036949
9.  ADCYAP1R1 and Asthma in Puerto Rican Children 
Rationale: Epigenetic and/or genetic variation in the gene encoding the receptor for adenylate-cyclase activating polypeptide 1 (ADCYAP1R1) has been linked to post-traumatic stress disorder in adults and anxiety in children. Psychosocial stress has been linked to asthma morbidity in Puerto Rican children.
Objectives: To examine whether epigenetic or genetic variation in ADCYAP1R1 is associated with childhood asthma in Puerto Ricans.
Methods: We conducted a case-control study of 516 children ages 6–14 years living in San Juan, Puerto Rico. We assessed methylation at a CpG site in the promoter of ADCYAP1R1 (cg11218385) using a pyrosequencing assay in DNA from white blood cells. We tested whether cg11218385 methylation (range, 0.4–6.1%) is associated with asthma using logistic regression. We also examined whether exposure to violence (assessed by the Exposure to Violence [ETV] Scale in children 9 yr and older) is associated with cg11218385 methylation (using linear regression) or asthma (using logistic regression). Logistic regression was used to test for association between a single nucleotide polymorphism in ADCYAP1R1 (rs2267735) and asthma under an additive model. All multivariate models were adjusted for age, sex, household income, and principal components.
Measurements and Main Results: Each 1% increment in cg11218385 methylation was associated with increased odds of asthma (adjusted odds ratio, 1.3; 95% confidence interval, 1.0–1.6; P = 0.03). Among children 9 years and older, exposure to violence was associated with cg11218385 methylation. The C allele of single nucleotide polymorphism rs2267735 was significantly associated with increased odds of asthma (adjusted odds ratio, 1.3; 95% confidence interval, 1.02–1.67; P = 0.03).
Conclusions: Epigenetic and genetic variants in ADCYAP1R1 are associated with asthma in Puerto Rican children.
PMCID: PMC3733434  PMID: 23328528
methylation; ADCYAP1R1; childhood asthma; Puerto Ricans; violence
10.  Genome-Wide Association Study of Periodontal Health Measured by Probing Depth in Adults Ages 18−49 years 
G3: Genes|Genomes|Genetics  2013;4(2):307-314.
The etiology of chronic periodontitis clearly includes a heritable component. Our purpose was to perform a small exploratory genome-wide association study in adults ages 18–49 years to nominate genes associated with periodontal disease−related phenotypes for future consideration. Full-mouth periodontal pocket depth probing was performed on participants (N = 673), with affected status defined as two or more sextants with probing depths of 5.5 mm or greater. Two variations of this phenotype that differed in how missing teeth were treated were used in analysis. More than 1.2 million genetic markers across the genome were genotyped or imputed and tested for genetic association. We identified ten suggestive loci (p-value ≤ 1E-5), including genes/loci that have been previously implicated in chronic periodontitis: LAMA2, HAS2, CDH2, ESR1, and the genomic region on chromosome 14q21-22 between SOS2 and NIN. Moreover, we nominated novel loci not previously implicated in chronic periodontitis or related pathways, including the regions 3p22 near OSBPL10 (a lipid receptor implicated in hyperlipidemia), 4p15 near HSP90AB2P (a heat shock pseudogene), 11p15 near GVINP1 (a GTPase pseudogene), 14q31 near SEL1L (an intracellular transporter), and 18q12 in FHOD3 (an actin cytoskeleton regulator). Replication of these results in additional samples is needed. This is one of the first research efforts to identify genetic polymorphisms associated with chronic periodontitis-related phenotypes by the genome-wide association study approach. Though small, efforts such as this are needed in order to nominate novel genes and generate new hypotheses for exploration and testing in future studies.
PMCID: PMC3931564  PMID: 24347629
GWAS; chronic periodontitis
11.  Applying Novel Genome-Wide Linkage Strategies to Search for Loci Influencing Type 2 Diabetes and Adult Height in American Samoa 
Human biology  2008;80(2):99-123.
Type 2 diabetes mellitus (T2DM) is a common complex phenotype that by the year 2010 is predicted to affect 221 million people globally. In the present study we performed a genome-wide linkage scan using the allele-sharing statistic S_all implemented in Allegro and a novel two-dimensional genome-wide strategy implemented in Merloc that searches for pairwise interaction between genetic markers located on different chromosomes linked to T2DM. In addition, we used a robust score statistic from the newly developed QTL-ALL software to search for linkage to variation in adult height. The strategies were applied to a study sample consisting of 238 sib-pairs affected with T2DM from American Samoa. We did not detect any genome-wide significant susceptibility loci for T2DM. However, our two-dimensional linkage investigation detected several loci pairs of interest, including 11q22 and 21q21, 9q21 and 11q22, 1p22–p21 and 4p15, and 4p15 and 15q11–q14, with a two-loci maximum LOD score (MLS) greater than 2.00. Most detected individual loci have previously been identified as susceptibility loci for diabetes-related traits. Our two-dimensional linkage results may facilitate the selection of potential candidate genes and molecular pathways for further diabetes studies because these results, besides providing candidate loci, also demonstrate that polygenic effects may play an important role in T2DM. Linkage was detected (p value of 0.005) for variation in adult height on chromosome 9q31, which was reported previously in other populations. Our finding suggests that the 9q31 region may be a strong quantitative trait locus for adult height, which is likely to be of importance across populations.
PMCID: PMC3701160  PMID: 18720898
12.  Casares' map function: No need for a 'corrected' Haldane's map function 
Genetica  2008;135(3):305-307.
A genetic map function M(d) = RF provides a mapping from the additive genetic distance d to the non-additive recombination fraction RF between a given pair of loci, where the recombination fraction is the proportion of gametes that are recombinant between the two loci. Genetic map functions are needed because in most experiments all we can directly observe are the recombination events. However, since a recombination event is only observed if there are an odd number of crossovers between the two loci, recombination fractions are not additive. One of the most widely used map functions is Haldane's map function, which is derived under the assumptions of no chiasma and no chromatid interference, and has been in widespread use since 1919. However, Casares recently proposed a 'corrected' Haldane's map function – we show here that this 'corrected' map function is not correct due to faulty assumptions and mistakes in its derivation.
PMCID: PMC2704907  PMID: 18604586
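For readers unfamiliar with the map function discussed in entry 12, the standard form of Haldane's map function is RF = (1 − e^(−2d)) / 2, with d in Morgans. The sketch below is illustrative only and is not taken from the article; the function names are hypothetical. It shows the forward and inverse mappings, and why recombination fractions are not additive while map distances are.

```python
import math


def haldane_rf(d_morgans: float) -> float:
    """Recombination fraction for map distance d (in Morgans) under Haldane's model."""
    if d_morgans < 0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def haldane_distance(rf: float) -> float:
    """Inverse of Haldane's map function: map distance (Morgans) from RF."""
    if not 0.0 <= rf < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * rf)


def combine_rf(r1: float, r2: float) -> float:
    """RF across two adjacent intervals assuming no interference.

    A recombinant is observed only when exactly one of the two
    intervals recombines, hence r1*(1 - r2) + r2*(1 - r1).
    """
    return r1 + r2 - 2.0 * r1 * r2


if __name__ == "__main__":
    r_ab = haldane_rf(0.10)   # loci A-B, 10 cM apart
    r_bc = haldane_rf(0.20)   # loci B-C, 20 cM apart
    r_ac = haldane_rf(0.30)   # loci A-C, distances add: 30 cM
    print(r_ab + r_bc, r_ac, combine_rf(r_ab, r_bc))
```

The simple sum `r_ab + r_bc` overstates the A–C recombination fraction, whereas `combine_rf` reproduces the value obtained by adding the map distances first, which is the sense in which d is additive and RF is not.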
13.  INSIG2 variants, dietary patterns, and metabolic risk in Samoa 
Association of insulin-induced gene 2 (INSIG2) variants with obesity has been confirmed in several but not all follow-up studies. Differences in environmental factors across populations may mask some genetic associations and therefore gene-environment interactions should be explored. We hypothesized that the association between dietary patterns and components of the metabolic syndrome could be modified by INSIG2 variants.
We conducted a longitudinal study of adiposity and cardiovascular disease risk among 427 and 290 adults from Samoa and American Samoa (1990–95). Principal component analysis on food items from a validated FFQ was used to identify neo-traditional and modern dietary patterns. We explored gene-dietary pattern interactions with the INSIG2 variants rs9308762 and rs7566605.
Results for American Samoans were mostly non-significant. In Samoa, the neo-traditional dietary pattern was associated with lower triglycerides, BMI, waist circumference, systolic and diastolic blood pressure, and fasting glucose (all p-for-trend<0.05). The modern pattern was significantly associated with higher triglycerides, BMI, waist circumference, and lower HDL cholesterol (all p-for-trend<0.05). A significant interaction for triglycerides was found between the modern pattern and the rs9308762 polymorphism (p=0.04). Those from Samoa consuming the modern pattern have higher triglycerides if they are homozygous for the rs9308762 C allele.
The common INSIG2 rs9308762 variant was associated with poorer metabolic control and a greater sensitivity of triglycerides to a modern dietary pattern. Environmental factors need to be taken into account when assessing genetic associations across and within populations.
PMCID: PMC3634362  PMID: 22968099
INSIG2; dietary patterns; gene-diet interactions; metabolic risk; Samoa
14.  ASTN1 and Alcohol Dependence: Family-Based Association Analysis in Multiplex Alcohol Dependence Families 
A previous genome-wide linkage study of alcohol dependence (AD) in multiplex families found a suggestive linkage result for a region on Chromosome 1 near microsatellite markers D1S196 and D1S2878. The ASTN1 gene is in this region, a gene previously reported to be associated with substance abuse, bipolar disorder and schizophrenia. Using the same family data consisting of 330 individuals with phenotypic data and DNA, finer mapping of a 26 cM region centered on D1S196 was undertaken using SNPs with minor allele frequency (MAF) ≥ 0.15 and pair-wise linkage disequilibrium (LD) of r² < 0.8 using the HapMap CEU population. Significant FBAT P-values for SNPs within the ASTN1 gene were observed for four SNPs (rs465066, rs228008, rs6668092, and rs172917); the most significant, rs228008, within intron 8, had a P-value of 0.001. Using MQLS, which allows for inclusion of all families, we find three of these SNPs with MQLS P-values < 0.003. In addition, two neighboring SNPs (rs10798496 and rs6667588) showed significance at P = 0.002 and 0.03, respectively. Haplotype analysis was performed using the haplotype-based test function of FBAT for a block that included rs228008, rs6668092, and rs172917. This analysis found one block (GCG) over-transmitted and another (ATA) under-transmitted to affected offspring. Linkage analysis identified a region consistent with the association results. Family-based association analysis shows the ASTN1 gene significantly associated with alcohol dependence. The potential importance of the ASTN1 gene for AD risk may be related to its role in glial-guided neuronal migration.
PMCID: PMC3623684  PMID: 22488871
ASTN1; alcohol dependence; multiplex families
15.  Complement factor H genetic variant and age-related macular degeneration: effect size, modifiers and relationship to disease subtype 
Background Variation in the complement factor H gene (CFH) is associated with risk of late age-related macular degeneration (AMD). Previous studies have been case–control studies in populations of European ancestry with little differentiation in AMD subtype, and insufficient power to confirm or refute effect modification by smoking.
Methods To precisely quantify the association of the single nucleotide polymorphism (SNP rs1061170, ‘Y402H’) with risk of AMD among studies with differing study designs, participant ancestry and AMD grade and to investigate effect modification by smoking, we report two unpublished genetic association studies (n = 2759) combined with data from 24 published studies (26 studies, 26 494 individuals, including 14 174 cases of AMD) of European ancestry, 10 of which provided individual-level data used to test gene–smoking interaction; and 16 published studies from non-European ancestry.
Results In individuals of European ancestry, there was a significant association between Y402H and late-AMD with a per-allele odds ratio (OR) of 2.27 [95% confidence interval (CI) 2.10–2.45; P = 1.1 × 10⁻¹⁶¹]. There was no evidence of effect modification by smoking (P = 0.75). The frequency of Y402H varied by ancestral origin and the association with AMD in non-Europeans was less clear, limited by paucity of studies.
Conclusion The Y402H variant confers a 2-fold higher risk of late-AMD per copy in individuals of European descent. This was stable to stratification by study design and AMD classification and not modified by smoking. The lack of association in non-Europeans requires further verification. These findings are of direct relevance for disease prediction. New research is needed to ascertain if differences in circulating levels, expression or activity of factor H protein explain the genetic association.
PMCID: PMC3304526  PMID: 22253316
Age-related macular degeneration (AMD); Complement factor H gene; meta-analysis
16.  Replication of a Genome-Wide Association Study of Birth Weight in Preterm Neonates 
The Journal of pediatrics  2011;160(1):19-24.e4.
To examine associations in a preterm population between rs9883204 in ADCY5 and rs900400 near LEKR1 and CCNL1 with birth weight. Both markers were associated with birth weight in a term population in a recent genome-wide association (GWA) study by Freathy et al.
Study design
A meta-analysis of mother and infant samples was performed for associations of rs900400 and rs9883204 with birth weight in 393 families from the U.S., 265 families from Argentina and 735 mother-infant pairs from Denmark. Z scores adjusted for infant sex and gestational age were generated for each population separately and regressed on allele counts. Association evidence was combined across sites by inverse-variance weighted meta-analysis.
Each additional C allele of rs900400 (LEKR1/CCNL1) in infants was marginally associated with a 0.069 standard deviation (SD) lower birth weight (95% CI = −0.159 – 0.022, P = 0.068). This result was slightly more pronounced after adjusting for smoking (P = 0.036). There were no significant associations identified with rs9883204 or in maternal samples.
These results indicate the potential importance of this marker on birth weight irrespective of gestational age.
PMCID: PMC3237813  PMID: 21885063
Genetic; association; single nucleotide polymorphism
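The inverse-variance weighted meta-analysis mentioned in entry 16 can be sketched as a fixed-effect combination of per-site estimates. This is a generic illustration, not the authors' code: the function name is hypothetical, the normal approximation for the P value is an assumption, and the example effect sizes and standard errors are invented placeholders rather than values from the study.

```python
import math


def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance weighted meta-analysis.

    betas: per-study effect estimates (e.g. SD change in birth weight per allele)
    ses:   matching standard errors
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")

    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se ** 2) for se in ses]
    total_weight = sum(weights)

    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)

    z = pooled_beta / pooled_se
    # Two-sided P value under a standard normal approximation.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


if __name__ == "__main__":
    # Hypothetical per-site estimates for three cohorts (placeholders only).
    betas = [-0.08, -0.05, -0.07]
    ses = [0.09, 0.11, 0.06]
    print(inverse_variance_meta(betas, ses))
```

More precise studies (smaller standard errors) dominate the pooled estimate, which is why this weighting is a common default when sites differ in sample size.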
17.  Common variants in FTO are not significantly associated with obesity-related phenotypes among Samoans of Polynesia 
Annals of Human Genetics  2011;76(1):17-24.
The association between obesity and the fat mass and obesity associated (FTO) gene has been widely replicated among Caucasian populations. The limited number of studies assessing its significance in Asian populations have been somewhat conflicting. We performed a genetic association study of 51 tagging, GWAS, and imputed single nucleotide polymorphisms with twelve measures of adiposity and skeletal robustness in two Samoan populations of Polynesia. We included 465 and 624 unrelated American Samoan and Samoan individuals, respectively; these populations derive from a single genetic background traced to Southeast Asia and represent one socio-cultural unit, although they are economically disparate with distinct environmental exposures. American Samoans were significantly larger than Samoans in all measures of obesity and most measures of skeletal robustness. In separate analyses of American Samoa and Samoa, we found a total of 36 nominal associations between FTO variants and skeletal and obesity measures. The preponderance of these nominal associations (32 of 36) was observed in the Samoan population, and predominantly with skeletal rather than fat mass measures (28 of 36). All significance disappeared, however, following corrections for multiple testing. Based on these findings, it could be surmised that FTO is not likely a major obesity locus in Polynesian populations.
PMCID: PMC3272784  PMID: 22084931
obesity; FTO; association analysis; Samoa
18.  Genome-wide association scan of dental caries in the permanent dentition 
BMC Oral Health  2012;12:57.
Over 90% of adults aged 20 years or older with permanent teeth have suffered from dental caries leading to pain, infection, or even tooth loss. Although caries prevalence has decreased over the past decade, there are still about 23% of dentate adults who have untreated carious lesions in the US. Dental caries is a complex disorder affected by both individual susceptibility and environmental factors. Approximately 35-55% of caries phenotypic variation in the permanent dentition is attributable to genes, though few specific caries genes have been identified. Therefore, we conducted the first genome-wide association study (GWAS) to identify genes affecting susceptibility to caries in adults.
Five independent cohorts were included in this study, totaling more than 7000 participants. For each participant, dental caries was assessed and genetic markers (single nucleotide polymorphisms, SNPs) were genotyped or imputed across the entire genome. Due to the heterogeneity among the five cohorts regarding age, genotyping platform, quality of dental caries assessment, and study design, we first conducted genome-wide association (GWA) analyses on each of the five independent cohorts separately. We then performed three meta-analyses to combine results for: (i) the comparatively younger, Appalachian cohorts (N = 1483) with well-assessed caries phenotype, (ii) the comparatively older, non-Appalachian cohorts (N = 5960) with inferior caries phenotypes, and (iii) all five cohorts (N = 7443). Top ranking genetic loci within and across meta-analyses were scrutinized for biologically plausible roles on caries.
Different sets of genes were nominated across the three meta-analyses, especially between the younger and older age cohorts. In general, we identified several suggestive loci (P-value ≤ 10E-05) within or near genes with plausible biological roles for dental caries, including RPS6KA2 and PTK2B, involved in p38-dependent MAPK signaling, and RHOU and FZD1, involved in the Wnt signaling cascade. Both of these pathways have been implicated in dental caries. ADMTS3 and ISL1 are involved in tooth development, and TLR2 is involved in immune response to oral pathogens.
As the first GWAS for dental caries in adults, this study nominated several novel caries genes for future study, which may lead to better understanding of cariogenesis, and ultimately, to improved disease predictions, prevention, and/or treatment.
PMCID: PMC3574042  PMID: 23259602
Dental caries; Genetics; Genome wide association; Permanent dentition; Genomics
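The per-cohort analysis followed by meta-analysis described above can be sketched for a single SNP. The abstract does not name the meta-analysis method; this example assumes a sample-size-weighted Z-score (Stouffer) combination, a common choice when phenotype definitions differ across cohorts. The Z-scores and cohort sizes are hypothetical.

```python
import math


def stouffer_weighted_z(z_scores, sample_sizes):
    """Combine per-cohort Z-scores, weighting each by sqrt(sample size)."""
    if len(z_scores) != len(sample_sizes) or not z_scores:
        raise ValueError("need one sample size per Z-score")
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


def two_sided_p(z):
    """Two-sided P-value for a standard normal Z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical per-cohort results for one SNP (not from the paper).
cohort_z = [2.9, 2.4]
cohort_n = [900, 600]

combined = stouffer_weighted_z(cohort_z, cohort_n)
print(f"combined Z = {combined:.3f}, P = {two_sided_p(combined):.2e}")
```

The Z-scores must carry the sign of the effect relative to the same reference allele in every cohort; otherwise effects in opposite directions would be added rather than cancelled.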
19.  Evidence of association of APOE with age-related macular degeneration - a pooled analysis of 15 studies 
Human mutation  2011;32(12):1407-1416.
Age-related macular degeneration (AMD) is the most common cause of incurable visual impairment in high-income countries. Previous studies report inconsistent associations between AMD and apolipoprotein E (APOE), a lipid transport protein involved in low-density cholesterol modulation. Potential interaction between APOE and sex, and smoking status, has been reported. We present a pooled analysis (n=21,160) demonstrating associations between late AMD and APOε4 (OR=0.72 per haplotype; CI: 0.65–0.74; P=4.41×10⁻¹¹) and APOε2 (OR=1.83 for homozygote carriers; CI: 1.04–3.23; P=0.04), following adjustment for age-group and sex within each study and smoking status. No evidence of interaction between APOE and sex or smoking was found. Ever smokers had significant increased risk relative to never smokers for both neovascular (OR=1.54; CI: 1.38–1.72; P=2.8×10⁻¹⁵) and atrophic (OR=1.38; CI: 1.18–1.61; P=3.37×10⁻⁵) AMD but not early AMD (OR=0.94; CI: 0.86–1.03; P=0.16), implicating smoking as a major contributing factor to disease progression from early signs to the visually disabling late forms. Extended haplotype analysis incorporating rs405509 did not identify additional risks beyond ε2 and ε4 haplotypes. Our expanded analysis substantially improves our understanding of the association between the APOE locus and AMD. It further provides evidence supporting the role of cholesterol modulation, and low-density cholesterol specifically, in AMD disease etiology.
PMCID: PMC3217135  PMID: 21882290
age-related macular degeneration; AMD; apolipoprotein E; APOE; case-control association study
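The odds ratios and confidence intervals reported above come from adjusted models in a pooled analysis. As a simplified illustration of how such figures are read, the sketch below computes an unadjusted odds ratio with a Wald 95% confidence interval from a single 2×2 table. The counts are hypothetical, and the function name and layout are assumptions made for this example.

```python
import math


def odds_ratio_wald_ci(exposed_cases, unexposed_cases,
                       exposed_controls, unexposed_controls, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table."""
    cells = (exposed_cases, unexposed_cases, exposed_controls, unexposed_controls)
    if any(c <= 0 for c in cells):
        raise ValueError("all four cells must be positive for a Wald interval")
    odds_ratio = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts (not from the pooled analysis).
or_, lo, hi = odds_ratio_wald_ci(20, 80, 10, 90)
print(f"OR = {or_:.2f}, 95% CI: {lo:.2f}-{hi:.2f}")
```

An interval that spans 1, as in this toy table, corresponds to an association that is not significant at the 5% level, which is how the early-AMD result for smoking above (CI: 0.86–1.03) should be read.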
20.  Role of African Ancestry and Gene-Environment Interactions in Predicting Preterm Birth 
Obstetrics and gynecology  2011;118(5):1081-1089.
To estimate whether African ancestry, specific gene polymorphisms, and gene-environment interactions could account for some of the unexplained preterm birth variance within blacks.
We genotyped 1,509 African ancestry informative markers, cytochrome P-450 1A1 (CYP1A1) and glutathione S-transferases Theta 1 (GSTT1) variants in 1,030 self-reported black mothers. We estimated the African ancestral proportion using the ancestry informative markers for all 1,030 self-reported black mothers. We examined the effect of African ancestry and CYP1A1 and GSTT1 smoking interactions on preterm birth cases as a whole and within its subgroups: very preterm birth (gestational age less than 34 weeks); and late preterm birth (gestational age greater than 34 and less than 37 weeks). We applied logistic regression and receiver operating characteristic (ROC) curve analysis, separately, to evaluate if African ancestry and CYP1A1- and GSTT1-smoking interactions could make additional contributions to preterm birth beyond epidemiological factors.
We found significant associations of African ancestry with preterm birth (22% vs. 31%, OR=1.11; 95%CI: 1.02–1.20) and very preterm birth (23% vs. 33%, OR=1.17; 95%CI: 1.03–1.33), but not with late preterm birth (22% vs. 29%, OR=1.06; 95%CI: 0.97–1.16). In addition, the ROC curve analysis suggested that African ancestry and CYP1A1- and GSTT1-smoking interactions made substantial contributions to very preterm birth beyond epidemiologic factors.
Our data underscore the importance of simultaneously considering epidemiological factors, African ancestry, specific gene polymorphisms and gene-environment interactions to better understand preterm birth racial disparity and to improve our ability to predict preterm birth, especially very preterm birth.
PMCID: PMC3218119  PMID: 22015876
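The ROC curve analysis mentioned above compares how well models with and without the genetic terms separate cases from controls. A minimal sketch of the underlying quantity, the area under the ROC curve, is given below. It uses the pairwise (Mann-Whitney) definition, and the predicted risks are hypothetical; the study's actual models and software are not described in the abstract.

```python
def roc_auc(case_scores, control_scores):
    """Area under the ROC curve: probability that a random case
    scores higher than a random control (ties count as one half)."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks from some fitted model (not from the paper).
preterm_risk = [0.80, 0.60, 0.55, 0.30]
term_risk = [0.70, 0.40, 0.35, 0.20, 0.10]

auc = roc_auc(preterm_risk, term_risk)
print(f"AUC = {auc:.2f}")
```

Computing this value once for a model with epidemiological factors only and once for a model that adds ancestry and gene-smoking interaction terms gives the kind of comparison the abstract refers to.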
21.  Effects of Smoking and Genotype on the PSR Index of Periodontal Disease in Adults Aged 18–49 
Studies have found both genetic and environmental influences on chronic periodontitis. The purpose of this study was to examine the relationships among previously identified genetic variants, smoking status, and two periodontal disease-related phenotypes (PSR1 and PSR2) in 625 Caucasian adults (aged 18–49 years). The PSR Index was used to classify participants as affected or unaffected under the PSR1 and PSR2 phenotype definitions. Using logistic regression, we found that the form of the relationship varied by single nucleotide polymorphism (SNP): For rs10457525 and rs12630931, the effects of smoking and genotype on risk were additive; whereas for rs10457526 and rs733048, smoking was not independently associated with affected status once genotype was taken into consideration. In contrast, smoking moderated the relationships of rs3870371 and rs733048 with affected status such that former and never smokers with select genotypes were at increased genetic risk. Thus, for several groups, knowledge of genotype may refine the risk prediction over that which can be determined by knowledge of smoking status alone. Future studies should replicate these findings. These findings provide the foundation for the exploration of novel pathways by which periodontitis may occur.
PMCID: PMC3447590  PMID: 23066400
adult; chronic periodontitis; genetics; genomics; smoking
22.  Coordinated Conditional Simulation with SLINK and SUP of Many Markers Linked or Associated to a Trait in Large Pedigrees 
Human Heredity  2011;71(2):126-134.
Simulation of genotypes in pedigrees is an important tool to evaluate the power of a linkage or an association study and to assess the empirical significance of results. SLINK is a widely-used package for pedigree simulations, but its implementation has not previously been described in a published paper. SLINK was initially derived from the LINKAGE programs. Over the 20 years since its release, SLINK has been modified to incorporate faster algorithms, notably from the linkage analysis package FASTLINK, also derived from LINKAGE. While SLINK can simulate genotypes on pedigrees of high complexity, one limitation of SLINK, as with most methods based on peeling algorithms to evaluate pedigree likelihoods, is the small number of linked markers that can be generated. The software package SUP includes an elegant wrapper for SLINK that circumvents the limitation on number of markers by using descent markers generated by SLINK to simulate a much larger number of markers on the same chromosome, linked and possibly associated with a trait locus. We have released new coordinated versions of SLINK (3.0) and SUP (v090804) that integrate the two software packages. Thereby, we have removed some of the previous limitations on the joint functionality of the programs, such as the number of founders in a pedigree. We review the history of SLINK and describe how SLINK and SUP are now coordinated to permit the simulation of large numbers of markers linked and possibly associated with a trait in large pedigrees.
PMCID: PMC3136384  PMID: 21734403
Coordinated conditional simulation; SLINK; SUP; Linkage study; Association study; Pedigree, large; Pedigree, complex
23.  Variations in Apolipoprotein E Frequency With Age in a Pooled Analysis of a Large Group of Older People 
American Journal of Epidemiology  2011;173(12):1357-1364.
Variation in the apolipoprotein E gene (APOE) has been reported to be associated with longevity in humans. The authors assessed the allelic distribution of APOE isoforms ε2, ε3, and ε4 among 10,623 participants from 15 case-control and cohort studies of age-related macular degeneration (AMD) in populations of European ancestry (study dates ranged from 1990 to 2009). The authors included only the 10,623 control subjects from these studies who were classified as having no evidence of AMD, since variation within the APOE gene has previously been associated with AMD. In an analysis stratified by study center, gender, and smoking status, there was a decreasing frequency of the APOE ε4 isoform with increasing age (χ² for trend = 14.9 (1 df); P = 0.0001), with a concomitant increase in the ε3 isoform (χ² for trend = 11.3 (1 df); P = 0.001). The association with age was strongest in ε4 homozygotes; the frequency of ε4 homozygosity decreased from 2.7% for participants aged 60 years or less to 0.8% for those over age 85 years, while the proportion of participants with the ε3/ε4 genotype decreased from 26.8% to 17.5% across the same age range. Gender had no significant effect on the isoform frequencies. This study provides strong support for an association of the APOE gene with human longevity.
PMCID: PMC3145394  PMID: 21498624
aged; apolipoprotein E2; apolipoprotein E3; apolipoprotein E4; apolipoproteins E; longevity; meta-analysis; multicenter study
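The chi-square test for trend used above can be sketched as follows. The published analysis was stratified by study center, gender, and smoking status; this example is an unstratified Cochran-Armitage style trend test on hypothetical counts, intended only to show the form of the statistic. The conversion from a 1-df chi-square value to a P-value is exact and can be checked against the statistics quoted in the abstract.

```python
import math


def chi2_trend(events, totals, scores):
    """Chi-square statistic (1 df) for a linear trend in proportions
    across ordered groups (unstratified Cochran-Armitage form)."""
    n = sum(totals)
    p = sum(events) / n
    sum_ns = sum(t * s for t, s in zip(totals, scores))
    sum_ns2 = sum(t * s * s for t, s in zip(totals, scores))
    sum_rs = sum(e * s for e, s in zip(events, scores))
    observed_minus_expected = sum_rs - p * sum_ns
    variance = p * (1.0 - p) * (sum_ns2 - sum_ns ** 2 / n)
    if variance <= 0:
        raise ValueError("trend statistic is undefined for these data")
    return observed_minus_expected ** 2 / variance


def p_from_chi2_1df(chi2):
    """Upper-tail P-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical carrier counts in three ordered age bands (not from the paper).
carriers = [27, 20, 8]
band_sizes = [1000, 1000, 1000]
band_scores = [0, 1, 2]

stat = chi2_trend(carriers, band_sizes, band_scores)
print(f"chi2 for trend = {stat:.2f}, P = {p_from_chi2_1df(stat):.4f}")

# The statistics quoted in the abstract convert to P of about 0.0001 and 0.001.
print(f"{p_from_chi2_1df(14.9):.5f} {p_from_chi2_1df(11.3):.5f}")
```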
24.  Heritable patterns of tooth decay in the permanent dentition: principal components and factor analyses 
BMC Oral Health  2012;12:7.
Dental caries is the result of a complex interplay among environmental, behavioral, and genetic factors, with distinct patterns of decay likely due to specific etiologies. Therefore, global measures of decay, such as the DMFS index, may not be optimal for identifying risk factors that manifest as specific decay patterns, especially if the risk factors such as genetic susceptibility loci have small individual effects. We used two methods to extract patterns of decay from surface-level caries data in order to generate novel phenotypes with which to explore the genetic regulation of caries.
The 128 tooth surfaces of the permanent dentition were scored as carious or not by intra-oral examination for 1,068 participants aged 18 to 75 years from 664 biological families. Principal components analysis (PCA) and factor analysis (FA), two methods of identifying underlying patterns without a priori surface classifications, were applied to our data.
The three strongest caries patterns identified by PCA recaptured variation represented by DMFS index (correlation, r = 0.97), pit and fissure surface caries (r = 0.95), and smooth surface caries (r = 0.89). However, together, these three patterns explained only 37% of the variability in the data, indicating that a priori caries measures are insufficient for fully quantifying caries variation. In comparison, the first pattern identified by FA was strongly correlated with pit and fissure surface caries (r = 0.81), but other identified patterns, including a second pattern representing caries of the maxillary incisors, were not representative of any previously defined caries indices. Some patterns identified by PCA and FA were heritable (h² = 30–65%, p = 0.043–0.006), whereas other patterns were not, indicating both genetic and non-genetic etiologies of individual decay patterns.
This study demonstrates the use of decay patterns as novel phenotypes to assist in understanding the multifactorial nature of dental caries.
PMCID: PMC3328249  PMID: 22405185
Dental caries genetics; Heritability; Permanent dentition; Pit and fissure surfaces; Smooth surfaces; Tooth surfaces; Principal components analysis; Factor analysis; Patterns of tooth decay; Patterns of dental caries
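The idea of extracting a decay pattern from surface-level data by PCA can be sketched on a toy matrix. The example below finds only the first principal component, using power iteration on the sample covariance matrix, and reports the share of total variance it explains. The 0/1 matrix (rows are participants, columns are tooth surfaces) is hypothetical, and the study's actual software and preprocessing are not described in the abstract.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (loadings, explained_fraction) for the first principal
    component, found by power iteration on the covariance matrix."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(m)]
           for a in range(m)]
    total_variance = sum(cov[j][j] for j in range(m))
    if total_variance == 0:
        raise ValueError("data have no variance")
    # Start vector; assumed not to be orthogonal to the leading component.
    vec = [1.0] * m
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break
        vec = [x / norm for x in nxt]
    # Rayleigh quotient gives the variance along the component.
    eigenvalue = sum(vec[a] * sum(cov[a][b] * vec[b] for b in range(m))
                     for a in range(m))
    return vec, eigenvalue / total_variance


# Hypothetical carious (1) / sound (0) scores for 6 people x 4 surfaces.
surfaces = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
]
loadings, explained = first_principal_component(surfaces)
print([round(x, 2) for x in loadings], round(explained, 2))
```

Each participant's score on a component (the centered data multiplied by the loadings) is what would serve as a quantitative phenotype in a downstream heritability analysis.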
25.  Dissection of Chromosome 16p12 Linkage Peak Suggests a Possible Role for CACNG3 Variants in Age-Related Macular Degeneration Susceptibility 
Through extensive linkage and association analyses in multiple independent datasets, this study identified CACNG3 as the most likely AMD susceptibility gene on 16p12.
Age-related macular degeneration (AMD) is a complex disorder of the retina, characterized by drusen, geographic atrophy, and choroidal neovascularization. Cigarette smoking and the genetic variants CFH Y402H, ARMS2 A69S, CFB R32Q, and C3 R102G have been strongly and consistently associated with AMD. Multiple linkage studies have found evidence suggestive of another AMD locus on chromosome 16p12 but the gene responsible has yet to be identified.
In the initial phase of the study, single-nucleotide polymorphisms (SNPs) across chromosome 16 were examined for linkage and/or association in 575 Caucasian individuals from 148 multiplex and 77 singleton families. Additional variants were tested in an independent dataset of unrelated cases and controls. According to these results, in combination with gene expression data and biological knowledge, five genes were selected for further study: CACNG3, HS3ST4, IL4R, Q7Z6F8, and ITGAM.
After genotyping additional tagging SNPs across each gene, the strongest evidence for linkage and association was found within CACNG3 (rs757200 nonparametric LOD* = 3.3, APL (association in the presence of linkage) P = 0.06, and rs2238498 MQLS (modified quasi-likelihood score) P = 0.006 in the families; rs2283550 P = 1.3 × 10⁻⁶, and rs4787924 P = 0.002 in the case–control dataset). After adjusting for known AMD risk factors, rs2283550 remained strongly associated (P = 2.4 × 10⁻⁴). Furthermore, the association signal at rs4787924 was replicated in an independent dataset (P = 0.035) and in a joint analysis of all the data (P = 0.001).
These results suggest that CACNG3 is the best candidate for an AMD risk gene within the 16p12 linkage peak. More studies are needed to confirm this association and clarify the role of the gene in AMD pathogenesis.
PMCID: PMC3101690  PMID: 21169531

