1.  Genetic Factors In Non-smokers with Age-related Macular Degeneration Revealed Through Genome-wide Gene-Environment Interaction Analysis 
Annals of human genetics  2013;77(3):215-231.
Relatively little is known about the interaction between genes and environment in the complex etiology of age-related macular degeneration (AMD). This study aimed to identify novel factors associated with AMD by analyzing gene-smoking interactions in a genome-wide association study of 1207 AMD cases and 686 controls of Caucasian background with genotype data on 668,238 single nucleotide polymorphisms (SNPs) after quality control. Participants’ history of smoking at least 100 cigarettes lifetime was determined by a self-administered questionnaire. SNP associations modeled the effect of the minor allele additively on AMD using logistic regression, with adjustment for age, sex, and ever/never smoking. Joint effects of SNPs and smoking were examined comparing a null model containing only age, sex, and smoking against an extended model including genotypic and interaction terms. Genome-wide significant main effects were detected at three known AMD loci: CFH (P=7.51×10⁻³⁰), ARMS2 (P=1.94×10⁻²³), and RDBP/CFB/C2 (P=4.37×10⁻¹⁰), while joint effects analysis revealed three genomic regions with P<10⁻⁵. Analyses stratified by smoking found genetic associations largely restricted to non-smokers, with one notable exception: the chromosome 18q22.1 intergenic SNP rs17073641 (between SERPINB8 and CDH7), more strongly associated in non-smokers (OR=0.57, P=2.73×10⁻⁵), with an inverse association among smokers (OR=1.42, P=0.00228), suggesting that smoking modifies the effect of some genetic polymorphisms on AMD risk.
PMCID: PMC3625984  PMID: 23577725
Age-related macular degeneration; age-related maculopathy; genome wide association studies (GWAS); gene-environment interaction; genome wide gene-environment interaction studies; smoking; smoking-gene interactions
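The stratified result above (opposite odds ratios in non-smokers and smokers) can be illustrated with a minimal sketch. It is not the study's logistic-regression model: it swaps in a simpler allelic 2x2 odds ratio per stratum with a Woolf standard error, and a z statistic for the difference in log odds ratios. The function names and the allele counts are hypothetical and are not data from the paper.

```python
import math


def log_odds_ratio(a, b, c, d):
    """Log odds ratio and Woolf standard error for a 2x2 allele table.

    a, b = minor/major allele counts in cases;
    c, d = minor/major allele counts in controls.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


def heterogeneity_z(table_nonsmokers, table_smokers):
    """z statistic for a difference in log odds ratios between two strata."""
    l0, se0 = log_odds_ratio(*table_nonsmokers)
    l1, se1 = log_odds_ratio(*table_smokers)
    return (l1 - l0) / math.sqrt(se0 ** 2 + se1 ** 2)


# Hypothetical allele counts, chosen only to show opposite directions of effect.
nonsmokers = (60, 140, 90, 110)
smokers = (100, 100, 80, 120)

or_nonsmokers = math.exp(log_odds_ratio(*nonsmokers)[0])
or_smokers = math.exp(log_odds_ratio(*smokers)[0])
z = heterogeneity_z(nonsmokers, smokers)
```

A large absolute z suggests that the allelic effect differs between strata, which is the kind of pattern an interaction term in a regression model is meant to capture.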
2.  Identifying modifier loci in existing genome scan data 
Annals of human genetics  2008;72(0 5):670-675.
In many genetic disorders in which a primary disease-causing locus has been identified, evidence for additional trait variation due to genetic factors exists. These findings have led to studies seeking secondary “modifier” loci. Identification of modifier loci provides insight into disease mechanisms and may provide additional screening and treatment targets. We believe that modifier loci can be identified by re-analysis of genome screen data while controlling for primary locus effects. To test this hypothesis, we simulated multiple replicates of typical genome screening data onto two real family structures from a study of hypertrophic cardiomyopathy. With this marker data, we simulated two trait models with characteristics similar to one measure of hypertrophic cardiomyopathy. Both trait models included 3 genes. In the first, the trait was influenced by a primary gene, a secondary “modifier” gene, and a third very small effect gene. In the second, we modeled an interaction between the first two genes. We examined power and false positive rates to map the secondary locus while controlling for the effect of the primary locus with two types of analyses. First, we examined Monte Carlo Markov chain (MCMC) simultaneous segregation and linkage analysis as implemented in Loki, for which we calculated two scoring statistics. Second, we calculated LOD scores using an individual-specific liability class based on the quantitative trait value. We found that both methods produce scores that are significant on a genome-wide level in some replicates. We conclude that mapping of modifier loci in existing samples is possible with these methods.
PMCID: PMC4003897  PMID: 18494837
Modifier gene; Complex trait; Statistical Genetics; Monte Carlo Markov chain; linkage analysis
3.  Initial Assessment of the Pathogenic Mechanisms of the recently identified Alzheimer Risk Loci 
Annals of human genetics  2013;77(2):85-105.
Recent genome wide association studies have identified CLU, CR1, ABCA7, BIN1, PICALM and MS4A6A/MS4A6E, in addition to the long established APOE, as loci for Alzheimer’s disease. We have systematically examined each of these loci to assess whether common coding variability contributes to the risk of disease. We have also assessed the regional expression of all the genes in the brain and whether there is evidence of an eQTL explaining the risk. In agreement with other studies, we find that coding variability may explain the ABCA7 association, but common coding variability does not explain any of the other loci. We were not able to show that any of the loci had eQTLs within the power of this study. Furthermore, the regional expression of each of the loci did not match the pattern of brain regional distribution in Alzheimer pathology.
Although these results are mainly negative, they allow us to start defining more realistic alternative approaches to determine the role of all the genetic loci involved in Alzheimer’s disease.
PMCID: PMC3578142  PMID: 23360175
Alzheimer’s disease; genetic risk; GWAS
4.  A likelihood ratio test for genomewide association under genetic heterogeneity* 
Annals of human genetics  2013;77(2):174-182.
Most existing association tests for genome-wide association studies (GWAS) fail to account for genetic heterogeneity. Zhou and Pan proposed a binomial mixture model based association test to account for the possible genetic heterogeneity in case-control studies. The idea is elegant; however, the proposed test requires an EM-type iterative algorithm to identify the penalized maximum likelihood estimates and a permutation method to assess p-values. The intensive computational burden induced by the EM algorithm and the permutation becomes prohibitive for direct applications to genome-wide association studies. This paper develops a likelihood ratio test (LRT) for genome-wide association studies under genetic heterogeneity based on a more general alternative mixture model. In particular, a closed-form formula for the likelihood ratio test statistic is derived to avoid the EM-type iterative numerical evaluation. Moreover, an explicit asymptotic null distribution is also obtained, which avoids using permutation to obtain p-values. Thus, the proposed LRT is easy to implement for GWAS. Furthermore, numerical studies demonstrate that the LRT has power advantages over the commonly used Armitage trend test and other existing association tests under genetic heterogeneity. A breast cancer GWAS data set is used to illustrate the newly proposed LRT.
PMCID: PMC3910100  PMID: 23362943
association test; binomial mixture model; complex disease; genetic heterogeneity; genomewide association study
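The abstract does not give the closed form of the proposed LRT, so no attempt is made to reproduce it here. As a point of reference, the comparator it names, the Armitage trend test, can be sketched as follows. The sketch assumes additive weights (0, 1, 2) and a 1-df chi-square reference distribution; the function name and the genotype counts are illustrative only.

```python
import math


def armitage_trend_test(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2x3 genotype table.

    cases, controls: genotype counts ordered by number of risk alleles.
    Returns (chi-square statistic, p-value from a 1-df chi-square).
    """
    R, S = sum(cases), sum(controls)
    N = R + S
    totals = [r + s for r, s in zip(cases, controls)]

    # Trend statistic: weighted difference between case and control counts.
    T = sum(w * (r * S - s * R) for w, r, s in zip(weights, cases, controls))

    # Variance of T under the null hypothesis of no association.
    sum_wn = sum(w * n for w, n in zip(weights, totals))
    sum_w2n = sum(w * w * n for w, n in zip(weights, totals))
    var_T = (R * S / N) * (N * sum_w2n - sum_wn ** 2)

    chi2 = T * T / var_T
    # Survival function of a 1-df chi-square, written with erfc.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical genotype counts (0, 1, 2 copies of the risk allele).
chi2, p = armitage_trend_test(cases=(20, 50, 30), controls=(40, 45, 15))
```

Because the weights are fixed in advance, the test is most sensitive to additive effects, which is one reason heterogeneity-aware alternatives are of interest.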
5.  Mitochondrial DNA Diversity in Indigenous Populations of the Southern Extent of Siberia, and the Origins of Native American Haplogroups 
Annals of human genetics  2005;69(0 1):67-89.
In search of the ancestors of Native American mitochondrial DNA (mtDNA) haplogroups, we analyzed the mtDNA of 531 individuals from nine indigenous populations in Siberia. All mtDNAs were subjected to high-resolution RFLP analysis, sequencing of the control-region hypervariable segment I (HVS-I), and surveyed for additional polymorphic markers in the coding region. Furthermore, the mtDNAs selected according to haplogroup/subhaplogroup status were completely sequenced. Phylogenetic analyses of the resulting data, combined with those from previously published Siberian arctic and sub-arctic populations, revealed that remnants of the ancient Siberian gene pool are still evident in Siberian populations, suggesting that the founding haplotypes of the Native American A–D branches originated in different parts of Siberia. Thus, lineage A complete sequences revealed in the Mansi of the Lower Ob and the Ket of the Lower Yenisei belong to A1, suggesting that A1 mtDNAs occasionally found in the remnants of hunting-gathering populations of northwestern and northern Siberia belonged to a common gene pool of the Siberian progenitors of Paleoindians. Moreover, lineage B1, which is the most closely related to the American B2, occurred in the Tubalar and Tuvan inhabiting the territory between the upper reaches of the Ob River in the west, to the Upper Yenisei region in the east. Finally, the sequence variants of haplogroups C and D, which are most similar to Native American C1 and D1, were detected in the Ulchi of the Lower Amur. Overall, our data suggest that the immediate ancestors of the Siberian/Beringian migrants who gave rise to ancient (pre-Clovis) Paleoindians have a common origin with aboriginal people of the area now designated the Altai-Sayan Upland, as well as the Lower Amur/Sea of Okhotsk region.
PMCID: PMC3905771  PMID: 15638829
mtDNA variation; native Siberians; Native Americans
6.  Shared Genomic Segment Analysis: The Power to Find Rare Disease Variants 
Annals of human genetics  2012;76(6):10.1111/j.1469-1809.2012.00728.x.
Shared genomic segment (SGS) analysis is a method that uses dense SNP genotyping in high-risk pedigrees to identify regions of sharing between cases. Here, we illustrate the power of SGS to identify dominant rare risk variants. Using simulated pedigrees, we consider 12 disease models based on disease prevalence, minor allele frequency, and penetrance to represent disease loci that explain 0.2% to 99.8% of total disease risk. Pedigrees were required to contain ≥15 meioses between all cases and to be high-risk based on significant excess of disease (p<0.001 or p<0.00001). Across these scenarios the power for a single pedigree ranged widely. Nonetheless, fewer than 10 pedigrees were sufficient for excellent power in the majority of the models. Power increased with the risk attributable to the disease locus, penetrance, and the excess of disease in the pedigree. Sharing allowing for one sporadic case was uniformly more powerful than sharing using all cases. Further, we performed an SGS analysis using a large Attenuated Familial Adenomatous Polyposis pedigree and identified a 1.96 Mb region containing the known causal APC gene with genome-wide significance (p<5×10⁻⁷). SGS is a powerful method for detecting rare variants and offers a valuable complement to GWAS and linkage analysis.
PMCID: PMC3879794  PMID: 22989048
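The core idea of looking for a region shared between cases can be sketched as a search for the longest run of consecutive SNPs at which all cases could carry a common allele. This is a simplified illustration, not the authors' implementation: it assumes unphased genotypes coded as minor-allele counts (0, 1, 2), uses identity by state only, and omits the significance assessment. All names and the genotype matrix are hypothetical.

```python
def shares_allele(genotypes):
    """True if all cases could share one allele identical by state at a SNP.

    Sharing is impossible only when both opposite homozygotes (0 and 2)
    are present among the cases.
    """
    return not (0 in genotypes and 2 in genotypes)


def longest_shared_run(case_genotypes):
    """Find the longest run of consecutive SNPs with possible sharing.

    case_genotypes: one genotype list per case, all in the same SNP order.
    Returns (start_index, length) of the longest run.
    """
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, column in enumerate(zip(*case_genotypes)):
        if shares_allele(column):
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return best_start, best_len


# Hypothetical genotypes for three cases at six SNPs.
cases = [
    [0, 1, 2, 2, 1, 0],
    [2, 1, 1, 2, 0, 0],
    [1, 0, 2, 1, 1, 2],
]
start, length = longest_shared_run(cases)
```

Allowing for one sporadic case, as the abstract describes, would amount to repeating the search with each case left out in turn.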
7.  Anorectal atresia and variants at predicted regulatory sites in candidate genes 
Annals of human genetics  2012;77(1):31-46.
Anorectal atresia is a serious birth defect of largely unknown etiology but candidate genes have been identified in animal studies and human syndromes. Because alterations in the activity of these genes might lead to anorectal atresia, we selected 71 common variants predicted to be in transcription factor binding sites, CpG windows, splice sites, and miRNA target sites of 25 candidate genes, and tested for their association with anorectal atresia. The study population comprised 150 anorectal atresia cases and 623 control infants without major malformations. Variants predicted to affect transcription factor binding, splicing, and DNA methylation in WNT3A, PCSK5, TCF4, MKKS, GLI2, HOXD12, and BMP4 were associated with anorectal atresia based on a nominal P value <0.05. The GLI2 and BMP4 variants are reported to be moderately associated with gene expression changes (Spearman’s rank correlation coefficients between −0.260 and 0.226). We did not find evidence for interaction between maternal pre-pregnancy obesity and variants in MKKS, a gene previously associated with obesity, on the risk of anorectal atresia. Our results for MKKS support previously suggested associations with anorectal malformations. Our findings suggest that more research is needed to determine whether altered GLI2 and BMP4 expression is important in anorectal atresia in humans.
PMCID: PMC3535506  PMID: 23127126
anorectal malformations; imperforate anus; hindgut; congenital abnormalities
8.  Evaluating mitochondrial DNA variation in autism spectrum disorders 
Annals of human genetics  2012;77(1):9-21.
Despite the increasing speculation that oxidative stress and abnormal energy metabolism may play a role in Autism Spectrum Disorders (ASD), and the observation that patients with mitochondrial defects have symptoms consistent with ASD, there are no comprehensive published studies examining the role of mitochondrial variation in autism. Therefore, we have sought to comprehensively examine the role of mitochondrial DNA (mtDNA) variation with regard to ASD risk, employing a multi-phase approach.
In phase 1 of our experiment, we examined 132 mtDNA single-nucleotide polymorphisms (SNPs) genotyped as part of our genome-wide association studies of ASD. In phase 2, we genotyped the major European mitochondrial haplogroup-defining variants within an expanded set of autism probands and controls. Finally, in phase 3, we resequenced the entire mtDNA in a subset of our Caucasian samples (~400 proband-father pairs). In each phase we tested whether mitochondrial variation showed evidence of association with ASD. Despite a thorough interrogation of mtDNA variation, we found no evidence to suggest a major role for mtDNA variation in ASD susceptibility. Accordingly, while there may be attractive biological hints suggesting a role for mitochondria in ASD, our data indicate that mtDNA variation is not a major contributing factor to the development of ASD.
PMCID: PMC3535511  PMID: 23130936
mitochondrial DNA; autism; autism spectrum disorders; association studies; genetic
9.  Impact on Modes of Inheritance and Relative Risks of using Extreme Sampling When Designing Genetic Association Studies 
Annals of human genetics  2012;77(1):80-84.
Using extreme phenotypes for association studies can improve statistical power. We study the impact of using samples with extremely high or low traits on the alternative model space, the genotype relative risks, and the genetic models in association studies. We prove the following results: when the risk allele causes high trait values, the more extreme the high traits, the larger the genotype relative risks, which is not always true when using extreme low traits; we also prove that a genetic model theoretically changes as the sampled traits become more extreme, except for the recessive or dominant models. Practically, however, the impact of deviations from the true genetic model at a functional locus due to selective sampling is virtually negligible. The implications of our findings are discussed. Numerical values are reported for illustration.
PMCID: PMC3535545  PMID: 23163532
Association studies; Extreme sampling; Genetic models; Genotype relative risks; Replication
10.  A small number of candidate gene SNPs reveal continental ancestry in African Americans 
Annals of human genetics  2013;77(1):56-66.
Using genetic data from an obesity candidate gene study of self-reported African Americans and European Americans, we investigated the number of Ancestry Informative Markers (AIMs) and candidate gene SNPs necessary to infer continental ancestry. Proportions of African and European ancestry were assessed with STRUCTURE (K=2), using 276 AIMs. These reference values were compared to estimates derived using 120, 60, 30, and 15 SNP subsets randomly chosen from the 276 AIMs and from 1144 SNPs in 44 candidate genes. All subsets generated estimates of ancestry consistent with the reference estimates, with mean correlations greater than 0.99 for all subsets of AIMs, and mean correlations of 0.99±0.003, 0.98±0.01, 0.93±0.03, and 0.81±0.11 for subsets of 120, 60, 30, and 15 candidate gene SNPs, respectively. Among African Americans, the median absolute difference from reference African ancestry values ranged from 0.01 to 0.03 for the four AIMs subsets and from 0.03 to 0.09 for the four candidate gene SNP subsets. Furthermore, YRI/CEU Fst values provided a metric to predict the performance of candidate gene SNPs. Our results demonstrate that a small number of SNPs randomly selected from candidate genes can be used to estimate admixture proportions in African Americans reliably.
PMCID: PMC3677760  PMID: 23278390
Ancestry Informative Markers; AIMs; African Americans; candidate genes; genetic ancestry; Structure; admixture; population stratification
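The abstract uses YRI/CEU Fst as a predictor of how informative a SNP is for ancestry but does not say which estimator was used. A minimal sketch, assuming Wright's heterozygosity-based definition for one biallelic SNP with equal weighting of the two populations, is given below. The function name and the allele frequencies are illustrative.

```python
def fst_two_populations(p1, p2):
    """Wright's Fst for one biallelic SNP in two equally weighted populations.

    p1, p2: frequency of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                      # pooled heterozygosity
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    if h_total == 0:
        return 0.0  # monomorphic in both populations: no differentiation
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies in two reference populations.
fst_informative = fst_two_populations(0.1, 0.9)
fst_uninformative = fst_two_populations(0.4, 0.4)
```

SNPs with Fst near 1 separate the two reference populations almost completely, while SNPs with Fst near 0 carry little ancestry information, so a random subset of low-Fst SNPs would be expected to need more markers.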
11.  Analysis of Secondary Phenotype Involving the Interactive Effect of the Secondary Phenotype and Genetic Variants on the Primary Disease 
Annals of human genetics  2012;76(6):484-499.
A genome-wide association (GWA) study is usually designed as a case-control study, where the presence and absence of the primary disease define the cases and controls, respectively. Using the existing data from GWA studies, investigators are also trying to identify the association between genetic variants and secondary phenotypes, which are defined as traits associated with the primary disease. However, recent studies have shown that bias arises in the estimation of marker-secondary phenotype association using originally collected data. We recently proposed a bias correction approach to accurately estimate the odds ratio (OR) for marker-secondary phenotype association. In this communication, we further investigated whether our bias correction approach is robust for a scenario involving the interactive effect of the secondary phenotype and genetic variants on the primary disease. We found that in such a scenario, our bias correction approach also provides an accurate estimation of the OR for marker-secondary phenotype association. We investigated the accuracy of our approach using simulation studies and showed that the approach better controlled for type I errors than the existing approaches. We also applied our bias correction approach to the real data analysis of association between an N-acetyltransferase gene, NAT2, and smoking on the basis of colorectal cancer data.
PMCID: PMC3472120  PMID: 22881407
odds ratio; bias; secondary phenotype; SNP; genome-wide association study; frequency-matched study design
12.  Identification and Confirmation of an Exonic Splicing Enhancer Variation in Exon 5 of the Alzheimer Disease Associated PICALM Gene 
Annals of human genetics  2012;76(6):448-453.
Alzheimer’s disease (AD) is a progressive neurodegenerative disorder characterized by memory and cognitive impairment and is the leading cause of dementia in the elderly. A number of genome wide association studies and subsequent replication studies have been published recently on late onset AD (LOAD). These studies identified several new susceptibility genes including phosphatidylinositol-binding clathrin assembly protein (PICALM) on chromosome 11. The aim of our study was to examine the entire coding sequence of PICALM to determine if the association could be explained by any previously undetected sequence variation. Therefore, we sequenced 48 cases and 48 controls homozygous for the risk allele in the signal SNP rs3851179. We did not find any new variants; however, rs592297, a known coding synonymous SNP that is part of an exonic splice enhancer region in exon 5, is in strong linkage disequilibrium with rs3851179 and should be examined for functional significance in Alzheimer pathophysiology.
PMCID: PMC3472126  PMID: 22943764
Alzheimer; Neurodegenerative disease; PICALM; Sequencing; Exonic splicing
13.  Association between SNP heterozygosity and quantitative traits in the Framingham Heart Study 
Annals of human genetics  2009;73(0 4):465-473.
Associations between multilocus heterozygosity and fitness traits, also termed heterozygosity and fitness correlations (HFCs), have been reported in numerous organisms. These studies, in general, indicate a positive relationship between heterozygosity and fitness traits. We studied the association between genome-wide heterozygosity at 706 non-synonymous and synonymous SNPs and 19 quantitative traits, including morphological, biochemical and fitness traits in the Framingham Heart Study. Statistically significant association was found between heterozygosity and systolic and diastolic blood pressures as well as left ventricular diameter and wall thickness. These results suggest that heterozygosity may be associated with traits, such as blood pressure that closely track environmental variations. Balancing selection may be operating in the maintenance of heterozygosity and the major components of blood pressure and hypertension. Genome wide SNP heterozygosity may be used to understand the phenomenon of dominance as well as the evolutionary basis of many quantitative traits in humans.
PMCID: PMC3760672  PMID: 19523151
Genome wide heterozygosity; Single Nucleotide Polymorphisms; Balancing selection; inbreeding; association; plastic traits; dominance
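The exposure variable in this kind of analysis, multilocus heterozygosity, is simple to compute per individual. The sketch below assumes genotypes coded as minor-allele counts (0, 1, 2) with `None` for missing calls. The coding, the function name, and the example data are assumptions, not details taken from the study.

```python
def multilocus_heterozygosity(genotypes):
    """Proportion of successfully genotyped SNPs that are heterozygous.

    genotypes: minor-allele counts (0, 1, 2) for one individual,
    with None marking a missing call.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes for this individual")
    return sum(1 for g in called if g == 1) / len(called)


# Hypothetical individual with five SNPs, one of them missing.
mlh = multilocus_heterozygosity([0, 1, 2, 1, None])
```

The resulting per-individual value can then be used as a predictor in a regression on each quantitative trait.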
14.  Statistical tests for detecting rare variants using variance-stabilizing transformations 
Annals of human genetics  2012;76(5):402-409.
Next generation sequencing holds great promise for detecting rare variants underlying complex human traits. Due to their extremely low allele frequencies, the normality approximation for a proportion no longer works well. The Fisher’s exact method appears to be suitable but it is conservative. We investigate the utility of various variance-stabilizing transformations in single marker association analysis on rare variants. Unlike a proportion itself, the variance of the transformed proportions no longer depends on the proportion, making application of such transformations to rare variant association analysis extremely appealing. Simulation studies demonstrate that tests based on such transformations are more powerful than the Fisher’s exact test while controlling for type I error rate. Based on theoretical considerations and results from simulation studies, we recommend the test based on the Anscombe transformation over tests with other transformations.
PMCID: PMC3418475  PMID: 22724536
rare variants; sequencing; variance-stabilizing transformation; association
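A minimal sketch of a test based on the Anscombe transformation follows. It assumes the standard Anscombe arcsine form for a binomial count, arcsin(sqrt((x + 3/8)/(n + 3/4))), with approximate variance 1/(4n + 2), and compares cases and controls with a two-sample z statistic. The exact statistic recommended in the paper may differ, and the allele counts are hypothetical.

```python
import math


def anscombe(x, n):
    """Anscombe arcsine transform of x minor alleles among n chromosomes."""
    return math.asin(math.sqrt((x + 0.375) / (n + 0.75)))


def anscombe_z(x_case, n_case, x_ctrl, n_ctrl):
    """Two-sample z statistic on the transformed allele proportions.

    The variance 1/(4n + 2) does not depend on the underlying proportion,
    which is the point of a variance-stabilizing transformation.
    """
    diff = anscombe(x_case, n_case) - anscombe(x_ctrl, n_ctrl)
    variance = 1 / (4 * n_case + 2) + 1 / (4 * n_ctrl + 2)
    return diff / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided p-value from a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical rare variant: 12 copies in 2000 case chromosomes, 2 in 2000 controls.
z = anscombe_z(12, 2000, 2, 2000)
p = two_sided_p(z)
```

Because the standard error is fixed by the sample sizes alone, the statistic avoids plugging a near-zero estimated proportion into the variance, which is where the usual normal approximation breaks down.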
15.  Genome-wide association and linkage study in the Amish detects a novel candidate late-onset Alzheimer disease gene 
Annals of human genetics  2012;76(5):342-351.
To identify novel late-onset Alzheimer disease (LOAD) risk genes, we have analyzed Amish populations of Ohio and Indiana. We performed genome-wide SNP linkage and association studies on 798 individuals (109 with LOAD). We tested association using the Modified Quasi-Likelihood Score (MQLS) test and also performed two-point and multipoint linkage analyses. We found that LOAD was significantly associated with APOE (P=9.0×10⁻⁶) in all our ascertainment regions except for the Adams County, Indiana, community (P=0.55). Genome-wide, the most strongly associated SNP was rs12361953 (P=7.92×10⁻⁷). A very strong, genome-wide significant multipoint peak (recessive HLOD=6.14, dominant HLOD=6.05) was detected on 2p12. Three additional loci with multipoint HLOD scores >3 were detected on 3q26, 9q31, and 18p11. Where the linkage and association results converged, the most significantly associated SNP under the 2p12 peak was rs2974151 (P=1.29×10⁻⁴). This SNP is located in CTNNA2, which encodes catenin alpha 2, a neuronal-specific catenin known to function in the developing brain. These results identify CTNNA2 as a novel candidate LOAD gene, and implicate three other regions of the genome as novel LOAD loci. These results underscore the utility of using family-based linkage and association analysis in isolated populations to identify novel loci for traits with complex genetic architecture.
PMCID: PMC3419486  PMID: 22881374
GWAS; Linkage; founder population; Amish; Alzheimer
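For readers unfamiliar with the LOD and HLOD scores reported above, a toy sketch may help. It assumes the simplest textbook setting, phase-known meioses scored as recombinant or non-recombinant, and the admixture form of the heterogeneity LOD in which a proportion alpha of families is linked. Real pedigree analyses, including this study's multipoint analyses, need dedicated linkage software. The family data and function names are hypothetical.

```python
import math


def lod(recombinants, meioses, theta):
    """Two-point LOD score for phase-known meioses.

    theta: recombination fraction, 0 < theta <= 0.5.
    Compares the likelihood at theta with the no-linkage value theta = 0.5.
    """
    k, n = recombinants, meioses
    log_l_theta = k * math.log10(theta) + (n - k) * math.log10(1 - theta)
    log_l_null = n * math.log10(0.5)
    return log_l_theta - log_l_null


def hlod(families, theta, steps=100):
    """Heterogeneity LOD at a fixed theta, maximized over alpha on a grid.

    families: list of (recombinants, meioses) pairs, one per family.
    alpha is the assumed proportion of linked families.
    """
    best = 0.0
    for step in range(steps + 1):
        alpha = step / steps
        total = sum(
            math.log10(alpha * 10 ** lod(k, n, theta) + 1 - alpha)
            for k, n in families
        )
        best = max(best, total)
    return best


# Hypothetical families: two look linked, one looks unlinked.
families = [(0, 10), (0, 10), (5, 10)]
plain_lod = sum(lod(k, n, 0.1) for k, n in families)
het_lod = hlod(families, 0.1)
```

With alpha fixed at 1 the HLOD reduces to the ordinary summed LOD, so the maximized HLOD can never fall below it. Allowing alpha below 1 keeps unlinked families from cancelling the evidence contributed by linked ones.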
16.  Successful aging shows linkage to chromosomes 6, 7, and 14 in the Amish 
Annals of human genetics  2011;75(4):516-528.
Successful aging (SA) is a multi-dimensional phenotype involving preservation of cognitive ability, physical function, and social engagement throughout life. Multiple components of SA are heritable, supporting a genetic component. The Old Order Amish are genetically and socially isolated with homogeneous lifestyles, making them a suitable population for studying the genetics of SA. DNA and measures of SA were collected on 214 cognitively intact Amish individuals over age 80. Individuals were grouped into a 13-generation pedigree using the Anabaptist Genealogy Database. A linkage screen of 5,944 single nucleotide polymorphisms (SNPs) was performed using 12 informative sub-pedigrees with affected-only 2-point and multipoint linkage analyses. Eleven SNPs produced 2-point LOD scores >2, suggestive of linkage. Multipoint linkage analyses, allowing for heterogeneity, detected significant LOD scores on chromosomes 6 (HLOD = 4.50), 7 (LOD* = 3.11), and 14 (HLOD = 4.17), suggesting multiple new loci underlying SA.
PMCID: PMC3756593  PMID: 21668908
Amish; longevity; genetic epidemiology; family-based study; population isolate
17.  PTX3 genetic variation and dizygotic twinning in The Gambia: could pleiotropy with innate immunity explain common dizygotic twinning in Africa? 
Annals of human genetics  2012;76(6):454-463.
Dizygotic (DZ) twinning has a genetic component and is common among sub-Saharan Africans; in The Gambia its frequency is up to 3% of live births. Variation in Pentraxin 3 (PTX3), a soluble pattern recognition receptor that plays an important role both in humoral innate immunity and in female fertility, has been associated with resistance to M. tuberculosis infection and to P. aeruginosa infection in cystic fibrosis patients. We tested whether PTX3 variants in Gambian women associate with DZ twinning, by genotyping five PTX3 single nucleotide polymorphisms (SNPs) in 130 sister pairs (96 full sibs and 34 half sibs) who had DZ twins. We found that two-, three- and five-SNP haplotypes differed in frequency between twinning mothers and those without a history of twinning (from p = 0.006 to 3.03e-06 for two-SNP and three-SNP haplotypes, respectively). Twinning mothers and West African tuberculosis-controls from a previous study shared several frequent haplotypes. Most importantly, our data are consistent with the previously reported association of PTX3 and female fertility in a West African sample from Ghana. Taken together, these results indicate that selective pressure on PTX3 variants that affect the innate immune response to infectious agents could also produce the observed high incidence of DZ twinning in Gambians.
PMCID: PMC3731069  PMID: 22834944
dizygotic twinning; fertility; innate immunity; Pentraxin 3; The Gambia; Africa
18.  Diversification of the ADH1B Gene during Expansion of Modern Humans 
Annals of human genetics  2011;75(4):497-507.
A variant allele, ADH1B*48His, also known as ADH1B*2, at the human Alcohol Dehydrogenase 1B gene (ADH1B) is strongly associated with alcoholism in some populations and has an unusual geographic distribution. Strong evidence implies selection has increased the frequency of this allele in some East Asian populations but does not fully explain its geographic pattern. We have studied haplotypes of ten single nucleotide polymorphisms (SNPs) and two short tandem repeat polymorphisms (STRPs) in the ADH1B region in 2,206 individuals from a worldwide set of populations. These SNPs and STRPs define nine common haplogroups most of which have distinct geographic patterns. The haplogroups H5 and H6, both with the derived ADH1B*48His allele, appear restricted to the Middle East and East Asia, respectively. The positively selected H7 is derived from H6 by a new regulatory region variant defining SNP rs3811801 restricted to East Asia. Age estimates of the haplogroups based on the STRPs also agree with the time of the migration events estimated by other studies. H7 is estimated to have expanded recently, around 2,800 years ago, and ancient DNA samples from North China confirm its presence about that time. The dating of the H7 expansion may help understand the selective force on the ADH1B gene.
PMCID: PMC3722864  PMID: 21592108
ADH1B; Haplotype evolution; Recent expansion; Geographic distribution
19.  Bayes Factor based on the Trend Test Incorporating Hardy-Weinberg Disequilibrium: More Powerful to Detect Genetic Association 
Annals of Human Genetics  2012;76(4):301-311.
In the analysis of case-control genetic association, the trend test and Pearson’s test are the two most commonly used tests. In genome-wide association studies (GWAS), the Bayes factor is a useful tool to support significant p-values, and a better measure than the p-value when results are compared across studies with different sample sizes. When reporting the p-value of the trend test, we propose a Bayes factor directly based on the trend test. To improve the power to detect association under recessive or dominant genetic models, we propose a Bayes factor based on the trend test and incorporating Hardy-Weinberg disequilibrium in cases. When the true model is unknown, or both the trend test and Pearson’s test or other robust tests are applied in genome-wide scans, we propose a joint Bayes factor, combining the previous two Bayes factors. All three Bayes factors studied in this paper have closed forms and are easy to compute without integrations, so they can be reported along with p-values, especially in GWAS. We discuss how to use each of them and how to specify priors. Simulation studies and applications to three GWAS are provided to illustrate their usefulness in detecting non-additive gene susceptibility in practice.
PMCID: PMC3372619  PMID: 22607017
Bayes factor; Comparing association studies; Genome-wide association studies; Hardy-Weinberg disequilibrium; Pearson’s test; Trend test
20.  Similarity-based multi-marker association tests for continuous traits 
Annals of Human Genetics  2012;76(3):246-260.
Testing multiple markers simultaneously not only can capture the linkage disequilibrium patterns but also can decrease the number of tests and thus alleviate the multiple-testing penalty. If a gene is associated with a phenotype, subjects with similar genotypes in this gene should also have similar phenotypes. Based on this concept, we have developed a general framework that is applicable to continuous traits. Two similarity-based tests (namely, SIMc and SIMp tests) were derived as special cases of the general framework. In our simulation study, we compared the power of the two tests with that of the single-marker analysis, a standard haplotype regression, and a popular and powerful kernel machine regression. Our SIMc test outperforms other tests when the average r-square (a measure of linkage disequilibrium) between the causal variant and the surrounding markers is larger than 0.3 or when the causal allele is common (say, frequency = 0.3). Our SIMp test outperforms other tests when the causal variant was introduced at common haplotypes (the maximum frequency of risk haplotypes > 0.4). We also applied our two tests to an adiposity data set to show their utility.
PMCID: PMC3329946  PMID: 22497480
Haplotype; Similarity; Genomic distance; Linkage disequilibrium; Multi-marker test; Body-mass index; CPE gene
21.  Effect of Population Stratification On False Positive Rates Of Population-based Association Analyses Of Quantitative Traits 
Annals of Human Genetics  2012;76(3):237-245.
It is now well established that population stratification can result in spurious association findings in genetic case-control studies. However, very few studies have addressed similar issues for mapping quantitative traits. Since quantitative phenotypes are often precursors of clinical end-point traits and carry more information on within-genotype trait variability, it has been argued that studying these quantitative traits may be a more powerful strategy to map genes than the binary clinical end-points. Thus, it is of interest to evaluate the adverse effects of population stratification on the analyses of quantitative traits. The popular statistical tests of association for quantitative traits using population level data are ANOVA, linear regression with an additive allelic effect and Kruskal-Wallis. We have theoretically studied the marginal effects of genetic heterogeneity and phenotypic heterogeneity as well as their joint effects on the false positive rate of the three tests mentioned above. We have carried out extensive simulations under different genetic models and probability distributions of quantitative traits to assess the rate of false positives in the presence of population stratification. We find that the rate of false positives increases at a very fast rate with simultaneous increase in differences in the standardized phenotypic means and marker allele frequencies in the subpopulations.
PMCID: PMC3334349  PMID: 22497479
allelic additivity; ANOVA; genetic heterogeneity; Kruskal-Wallis test; phenotypic heterogeneity
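The stratification effect summarized in entry 21 can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' simulation design: two equally sized subpopulations, a marker with no effect on the trait, a normally distributed trait, and a large-sample normal approximation (|t| > 1.96) for the additive linear regression test. The function names and default parameters are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def false_positive_rate(freq_a, freq_b, mean_shift,
                        n_per_pop=200, n_reps=400, seed=1):
    """Fraction of null markers flagged at a nominal 5% level.

    freq_a, freq_b: minor-allele frequencies in the two subpopulations.
    mean_shift: difference in trait means, in within-population SD units.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        genotypes, traits = [], []
        for freq, mu in ((freq_a, 0.0), (freq_b, mean_shift)):
            for _ in range(n_per_pop):
                # Minor-allele count (0, 1 or 2) under Hardy-Weinberg proportions.
                g = (rng.random() < freq) + (rng.random() < freq)
                genotypes.append(g)
                # The trait depends on subpopulation only, never on genotype.
                traits.append(rng.gauss(mu, 1.0))
        # For a single additive predictor, the regression slope t-statistic
        # is a function of the genotype-trait correlation.
        r = pearson(genotypes, traits)
        n = len(traits)
        t = r * math.sqrt((n - 2) / max(1e-12, 1.0 - r * r))
        if abs(t) > 1.96:
            hits += 1
    return hits / n_reps
```

With equal allele frequencies and no mean shift, the returned rate should stay near the nominal 0.05; in this sketch it rises sharply only when both the allele frequencies and the trait means differ between subpopulations.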
22.  Application of a novel hybrid study design to explore gene-environment interactions in orofacial clefts 
Annals of Human Genetics  2012;76(3):221-236.
Orofacial clefts are common birth defects with strong evidence for both genetic and environmental causal factors. Candidate-gene studies combined with exposures known to influence the outcome provide a highly targeted approach to detecting GxE interactions. We developed a new statistical approach that combines the case-control and offspring-parent triad designs into a “hybrid design” to search for GxE interactions among 334 autosomal cleft candidate genes and maternal first-trimester exposure to smoking, alcohol, coffee, folic acid supplements, dietary folate, and vitamin A. The study population comprised 425 case-parent triads of isolated clefts and 562 control-parent triads derived from a nationwide study of orofacial clefts in Norway (1996-2001). A full maximum-likelihood model was used in combination with a Wald test statistic to screen for statistically significant GxE interaction between strata of exposed and unexposed mothers. In addition, we performed pathway-based analyses on 28 detoxification genes and 21 genes involved in folic acid metabolism. With the possible exception of the T-box 4 gene (TBX4) and dietary folate interaction in isolated CPO, there was little evidence overall of GxE interaction in our data. This study is the largest to date aimed at detecting interactions between orofacial clefts candidate genes and well-established risk exposures.
PMCID: PMC3334353  PMID: 22497478
Birth defects; orofacial cleft; cleft lip; cleft palate; genetic epidemiology
23.  Fragile X syndrome: the FMR1 CGG repeat distribution among world populations 
Annals of Human Genetics  2011;76(2):178-191.
Fragile X Syndrome (FXS) is characterized by moderate to severe intellectual disability which is accompanied by macroorchidism and distinct facial morphology. FXS is caused by the expansion of the CGG trinucleotide repeat in the 5′ untranslated region of the Fragile X mental retardation 1 (FMR1) gene. The syndrome has been studied in ethnically diverse populations around the world and has been extensively characterized in several populations. Similar to other trinucleotide expansion disorders, the gene specific instability of FMR1 is not accompanied by genomic instability. Currently we do not have a comprehensive understanding of the molecular underpinnings of gene specific instability associated with tandem repeats. Molecular evidence from in vitro experiments and animal models supports several pathways for gene specific trinucleotide repeat expansion. However, whether the mechanisms reported from other systems contribute to trinucleotide repeat expansion in humans is not clear. To understand how repeat instability in humans could occur, the CGG repeat expansion is explored through molecular analysis and population studies which characterized CGG repeat alleles of FMR1. Finally, the review discusses the relevance of these studies in understanding the mechanism of trinucleotide repeat expansion in FXS.
PMCID: PMC3288311  PMID: 22188182
FMR1 gene; fragile x mutation; prevalence
24.  Genome-wide association of serum uric acid concentration: replication of sequence variants in an island population of the Adriatic coast of Croatia 
Annals of Human Genetics  2012;76(2):121-127.
A genome-wide association study of serum uric acid levels was performed in a relatively isolated population of European descent from an island of the Adriatic coast of Croatia. The study sample included 532 unrelated and 768 related individuals from 235 pedigrees. Inflation due to relatedness was controlled by using genomic control. Genetic association was assessed with 2,241,249 SNPs in 1300 samples after adjusting for age and gender. Our study replicated four previously reported serum uric acid loci (SLC2A9, ABCG2, RREB1, and SLC22A12). The strongest association was found with a SNP in SLC2A9 (rs13129697, P=2.33×10−19), which exhibited significant gender-specific effects: 35.76 μmol/L (P=2.11×10−19) in females and 19.58 μmol/L (P=5.40×10−5) in males. Within this region of high linkage disequilibrium, we also detected a strong association with a non-synonymous SNP, rs16890979 (P=2.24×10−17), a putative causal variant for serum uric acid variation. In addition, we identified several novel loci suggestive of association with uric acid levels (SEMA5A, TMEM18, SLC28A2, and ODZ2), although the P-values (P<5×10−6) did not reach the threshold of genome-wide significance. Together, these findings provide further confirmation of previously reported uric acid-related genetic variants and highlight suggestive new loci for additional investigation.
PMCID: PMC3302578  PMID: 22229870
Serum uric acid; genome-wide association; Adriatic island population
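Entry 24 mentions genomic control as the correction for inflation due to relatedness. A minimal sketch of the standard genomic control calculation follows, assuming 1-degree-of-freedom chi-square association statistics as input; the abstract does not describe the study's actual implementation, and the function name and the convention of not deflating below 1 are assumptions made here for illustration.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom (approx.).
CHI2_1DF_MEDIAN = 0.4549


def genomic_control(chi2_stats):
    """Return (lambda, adjusted statistics) for 1-df chi-square statistics.

    lambda is the observed median divided by the expected median under the
    null; each statistic is then divided by lambda. By assumption here,
    lambda is not allowed to fall below 1, so statistics are never inflated.
    """
    lam = statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
    lam = max(lam, 1.0)
    return lam, [x / lam for x in chi2_stats]
```

In practice the inflation factor would be estimated from all genome-wide statistics at once, since the median of a small set of markers is too noisy to be useful.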
25.  From a single whole exome read to notions of clinical screening: primary ciliary dyskinesia and RSPH9 p.Lys268del in the Arabian Peninsula 
Annals of Human Genetics  2012;76(3):211-220.
Primary ciliary dyskinesia (PCD) is a genetic disorder, usually autosomal recessive, causing early respiratory disease and later subfertility. Whole exome sequencing may enable efficient analysis for locus heterogeneous disorders such as PCD. We performed whole exome sequencing on one consanguineous Saudi Arabian individual with clinically diagnosed PCD and normal laterality to attempt ab initio molecular diagnosis.
We reviewed thirteen known PCD genes and potentially autozygous regions (extended homozygosity) for homozygous exon deletions, non-dbSNP codon, splice-site base variants or small indels. Homozygous non-dbSNP changes were also reviewed exome-wide.
A single molecular read representing RSPH9 p.Lys268del was observed, with no wildtype reads and a notable deficiency of mapped reads at this location. Among all observations, RSPH9 was the strongest candidate for causality. Searching unmapped reads revealed seven more mutant reads. Direct assay for p.Lys268del (MboII digest) confirmed homozygosity in the affected individual, then confirmed homozygosity in three siblings with bronchiectasis. Our finding in southwest Saudi Arabia indicates that p.Lys268del, previously observed in two Bedouin families (Israel, UAE), is geographically widespread in the Arabian Peninsula. By analogy with cystic fibrosis CFTR p.Phe508del, screening for RSPH9 p.Lys268del (which lacks sentinel dextrocardia) in those at risk would help in early diagnosis, tailored clinical management, genetic counselling and primary prevention.
PMCID: PMC3575730  PMID: 22384920
high-throughput nucleotide sequencing; primary ciliary dyskinesia; screening

