1.  The DRD4 Exon III VNTR, Bupropion, and Associations With Prospective Abstinence 
Nicotine & Tobacco Research  2012;15(7):1190-1200.
DRD4 Exon III Variable Number of Tandem Repeat (VNTR) variation was found to interact with bupropion to influence prospective smoking abstinence in a recently published longitudinal analysis of N = 331 individuals from a randomized, double-blind, placebo-controlled trial of bupropion and intensive cognitive–behavioral mood management therapy.
We used univariate, multivariate, and longitudinal logistic regression to evaluate gene, treatment, time, and interaction effects on point prevalence and continuous abstinence at end of treatment, 6 months, and 12 months in N = 416 European ancestry participants in a double-blind pharmacogenetic efficacy trial randomizing participants to active or placebo bupropion. Participants received 10 weeks of pharmacotherapy and 7 sessions of behavioral therapy, with a target quit date 2 weeks after initiating both therapies. VNTR genotypes were coded with the long allele dominant, resulting in 4 analysis categories. Covariates included demographics, dependence measures, depressive symptoms, and genetic ancestry. We also performed genotype-stratified secondary analyses.
We observed significant effects of time in longitudinal analyses of both abstinence outcomes, of treatment in individuals with VNTR long allele genotypes for both abstinence outcomes, and of covariates in some analyses. We observed non-significantly larger differences in active versus placebo effect sizes in individuals with VNTR long allele genotypes than in individuals without the VNTR long allele, in the directions previously reported.
VNTR by treatment interaction differences between these and previous analyses may be attributable to insufficient size of the replication sample. Analyses of multiple randomized clinical trials will enable identification and validation of factors mediating treatment response.
PMCID: PMC3682839  PMID: 23212438
2.  A Monte Carlo Permutation Test for Random Mating Using Genome Sequences 
PLoS ONE  2013;8(8):e71496.
Testing for random mating of a population is important in population genetics, because deviations from randomness of mating may indicate inbreeding, population stratification, natural selection, or sampling bias. However, current methods use only observed numbers of genotypes and alleles, and do not take advantage of the fact that the advent of sequencing technology provides an opportunity to investigate this topic in unprecedented detail. To address this opportunity, a novel statistical test for random mating is required in population genomics studies for which large sequencing datasets are generally available. Here, we propose a Monte-Carlo-based-permutation test (MCP) as an approach to detect random mating. Computer simulations used to evaluate the performance of the permutation test indicate that its type I error is well controlled and that its statistical power is greater than that of the commonly used chi-square test (CHI). Our simulation study shows the power of our test is greater for datasets characterized by lower levels of migration between subpopulations. In addition, test power increases with increasing recombination rate, sample size, and divergence time of subpopulations. For populations exhibiting limited migration and having average levels of population divergence, the statistical power approaches 1 for sequences longer than 1Mbp and for samples of 400 individuals or more. Taken together, our results suggest that our permutation test is a valuable tool to detect random mating of populations, especially in population genomics studies.
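The permutation idea described above can be illustrated with a minimal single-locus sketch. This is not the authors' MCP procedure, which works on genome sequences and whose test statistic is not given in the abstract. The sketch assumes a heterozygote count as the statistic and builds the null distribution by shuffling alleles among individuals, which mimics random union of gametes. The function names, the one-sided (heterozygote deficit) p-value, and the toy genotypes are all assumptions made for illustration.

```python
import random


def count_heterozygotes(genotypes):
    """Count individuals whose two alleles differ."""
    return sum(1 for a, b in genotypes if a != b)


def permutation_test_random_mating(genotypes, n_perm=2000, seed=0):
    """One-sided Monte Carlo permutation p-value for a heterozygote deficit.

    genotypes: list of (allele, allele) pairs, one pair per individual.
    Alleles are pooled and reshuffled into pairs to simulate random mating.
    """
    rng = random.Random(seed)
    observed = count_heterozygotes(genotypes)
    alleles = [allele for pair in genotypes for allele in pair]
    at_most_observed = 0
    for _ in range(n_perm):
        rng.shuffle(alleles)
        # Re-pair neighbouring alleles to form permuted individuals.
        permuted = zip(alleles[0::2], alleles[1::2])
        if count_heterozygotes(permuted) <= observed:
            at_most_observed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (at_most_observed + 1) / (n_perm + 1)


# Toy data (hypothetical): a pooled sample of two subpopulations fixed for
# different alleles, versus a sample close to random-mating proportions.
structured = [("A", "A")] * 20 + [("a", "a")] * 20
balanced = [("A", "A")] * 10 + [("A", "a")] * 20 + [("a", "a")] * 10
print(permutation_test_random_mating(structured))
print(permutation_test_random_mating(balanced))
```

The structured sample has no heterozygotes, so almost no permutation is as extreme and the p-value is small. The balanced sample gives a large p-value, as expected under random mating.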
PMCID: PMC3734302  PMID: 23940765
3.  Resolving ambiguity in the phylogenetic relationship of genotypes A, B, and C of hepatitis B virus 
Hepatitis B virus (HBV) is an important infectious agent that causes widespread concern because billions of people are infected by at least 8 different HBV genotypes worldwide. However, reconstruction of the phylogenetic relationship between HBV genotypes is difficult. Specifically, the phylogenetic relationships among genotypes A, B, and C are not clear from previous studies because of the confounding effects of genotype recombination. In order to clarify the evolutionary relationships, a rigorous approach is required that can effectively explore genetic sequences with recombination.
In the present study, the phylogenetic relationship of the HBV genotypes was reconstructed using a consensus phylogeny of phylogenetic trees of HBV genome segments. The reliability of the reconstructed phylogeny was evaluated extensively by the agreement among local phylogenies of genome segments.
The reconstructed phylogenetic tree revealed that HBV genotypes B and C had a closer phylogenetic relationship than genotypes A and B or A and C. Evaluations showed that the consensus method was capable of reconstructing reliable phylogenetic relationships in the presence of recombinants.
The consensus method implemented in this study provides an alternative approach for reconstructing reliable phylogenetic relationships for viruses with possible genetic recombination. Our approach revealed the phylogenetic relationships of genotypes A, B, and C of HBV.
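As a rough illustration of how a consensus over segment trees can work, the sketch below applies a majority rule to rooted clades: a grouping is kept only if it appears in more than half of the per-segment trees. The abstract does not specify the authors' exact consensus procedure, so the majority rule, the nested-tuple tree encoding, the function names, and the toy trees (including the outgroup label "Out") are assumptions.

```python
from collections import Counter


def clades(tree):
    """Return (leaf set, set of clades) for a rooted nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)
    return leaves, found


def majority_rule_clades(trees):
    """Map each clade found in more than half of the trees to its frequency."""
    counts = Counter()
    for tree in trees:
        all_leaves, found = clades(tree)
        # The clade holding every leaf is trivial, so it is skipped.
        counts.update(c for c in found if c != all_leaves)
    return {c: n / len(trees) for c, n in counts.items() if n > len(trees) / 2}


# Hypothetical trees for four genome segments; one segment groups A with B,
# as a recombinant segment might.
segment_trees = [
    ((("B", "C"), "A"), "Out"),
    ((("B", "C"), "A"), "Out"),
    ((("A", "B"), "C"), "Out"),
    ((("B", "C"), "A"), "Out"),
]
for clade, support in majority_rule_clades(segment_trees).items():
    print(sorted(clade), support)
```

In this toy input the B+C grouping is supported by three of four segments and survives, while the conflicting A+B grouping from the single discordant segment is dropped.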
PMCID: PMC3682936  PMID: 23758960
Phylogeny; Hepatitis B virus; Recombination; Consensus tree
4.  Dopamine genes and nicotine dependence in treatment seeking and community smokers 
We utilized a cohort of 828 treatment-seeking self-identified white cigarette smokers (50% female) to rank candidate gene single nucleotide polymorphisms (SNPs) associated with the Fagerström Test for Nicotine Dependence (FTND), a measure of nicotine dependence which assesses quantity of cigarettes smoked and time- and place-dependent characteristics of the respondent’s smoking behavior. 1123 SNPs at 55 autosomal candidate genes, nicotinic acetylcholine receptors and genes involved in dopaminergic function, were tested for association with baseline FTND scores adjusted for age, depression, education, sex and study site. SNP P values were adjusted for the number of transmission models, the number of SNPs tested per candidate gene, and their intragenic correlation. DRD2, SLC6A3 and NR4A2 SNPs with adjusted P values < 0.10 were considered sufficiently noteworthy to justify further genetic, bioinformatic and literature analyses. Each independent signal among the top-ranked SNPs accounted for ~1% of the FTND variance in this sample. The DRD2 SNP appears to represent a novel association with nicotine dependence. The SLC6A3 SNPs have previously been shown to be associated with SLC6A3 transcription or dopamine transporter density in vitro, in vivo and ex vivo. Analysis of SLC6A3 and NR4A2 SNPs identified a statistically significant gene-gene interaction (P=0.001), consistent with in vitro evidence that the NR4A2 protein product (NURR1) regulates SLC6A3 transcription. A community cohort of N=175 multiplex ever-smoking pedigrees (N=423 ever smokers) provided nominal evidence for association with the FTND at these top-ranked SNPs, uncorrected for multiple comparisons.
PMCID: PMC3558036  PMID: 19494806
dopamine transporter; Fagerström Test for Nicotine Dependence; single nucleotide polymorphism; candidate gene association scan; gene-gene interaction
5.  Nicotine withdrawal sensitivity, linkage to chr6q26, and association of OPRM1 SNPs in the SMOking in FAMilies (SMOFAM) sample 
Nicotine withdrawal symptoms are related to smoking cessation. A Rasch model has been used to develop a unidimensional sensitivity score representing multiple correlated measures of nicotine withdrawal. A previous autosome-wide screen identified a nonparametric linkage (NPL) log-likelihood ratio (LOD) score of 2.7 on chromosome 6q26 for the sum of nine withdrawal symptoms.
The objectives of these analyses are: a) to assess the influence of nicotine withdrawal sensitivity on relapse, b) to conduct autosome-wide NPL analysis of nicotine withdrawal sensitivity among 158 pedigrees with 432 individuals with microsatellite genotypes and nicotine withdrawal scores, and c) to explore family-based association of single nucleotide polymorphisms (SNPs) at the mu opioid receptor (MOR) candidate gene (OPRM1) with nicotine withdrawal sensitivity in 172 nuclear pedigrees with 419 individuals with both SNP genotypes and nicotine withdrawal scores.
An increased risk for relapse was associated with nicotine withdrawal sensitivity score (odds ratio, OR=1.25, 95% confidence interval, 95%CI=1.10,1.42). A maximal NPL LOD score of 3.15, suggestive of significant linkage, was identified at chr6q26 for nicotine withdrawal sensitivity. Evaluation of 18 OPRM1 SNPs via the family based association test (FBAT) with the nicotine withdrawal sensitivity score identified eight tagging SNPs with global P-values<0.05 and false discovery rate Q-values<0.06.
An increased risk of relapse, suggestive linkage at chr6q26, and nominally significant association with multiple OPRM1 SNPs were found with the Rasch-modeled nicotine withdrawal sensitivity score in a multiplex smoking pedigree sample. Future studies should attempt to replicate these findings and investigate the relationship between nicotine withdrawal symptoms and variation at OPRM1.
PMCID: PMC3536862  PMID: 19959688
6.  Detection for gene-gene co-association via kernel canonical correlation analysis 
BMC Genetics  2012;13:83.
Currently, most methods for detecting gene-gene interaction (GGI) in genome-wide association studies (GWASs) are limited by their use of the single nucleotide polymorphism (SNP) as the unit of association. One way to address this drawback is to consider higher level units such as genes or regions in the analysis. Earlier we proposed a statistic based on canonical correlations (CCU) as a gene-based method for detecting gene-gene co-association. However, it can capture only linear relationships, not nonlinear correlation, between genes. We therefore proposed a counterpart (KCCU) based on kernel canonical correlation analysis (KCCA).
Through simulation, the KCCU statistic was shown to be a valid test and more powerful than the CCU statistic with respect to sample size and interaction odds ratio. Analysis of data from regions involving three genes on rheumatoid arthritis (RA) from Genetic Analysis Workshop 16 (GAW16) indicated that only the KCCU statistic was able to identify interactions reported earlier.
The KCCU statistic is a valid and powerful gene-based method for detecting gene-gene co-association.
PMCID: PMC3506484  PMID: 23039928
Genome-wide association study (GWAS); Gene-gene co-association; Gene-gene interaction (GGI); Kernel canonical correlation analysis (KCCA)
7.  Genetic Divergence Disclosing a Rapid Prehistorical Dispersion of Native Americans in Central and South America 
PLoS ONE  2012;7(9):e44788.
An accurate estimate of the divergence time between Native Americans is important for understanding the initial entry and early dispersion of human beings in the New World. Current methods for estimating the genetic divergence time of populations can depart seriously from a linear relationship with the true divergence when multiple populations differ in size and have undergone significant expansion. Here, to address this problem, we propose a novel measure to estimate the genetic divergence time of populations. Computer simulation revealed that the new measure maintained an excellent linear correlation with the population divergence time in complicated multi-population scenarios with population expansion. Utilizing the new measure and microsatellite data of 21 Native American populations, we investigated the genetic divergences of the Native American populations. The results indicated that genetic divergences between North American populations are greater than those between Central and South American populations. None of the divergences, however, were large enough to constitute convincing evidence supporting the two-wave or multi-wave migration model for the initial entry of human beings into America. The genetic affinity of the Native American populations was further explored using Neighbor-Net, and the genetic divergences suggested that these populations could be categorized into four genetic groups living in four different ecologic zones. The divergence of the population groups suggests that the early dispersion of human beings in America was a multi-step process. Further, the divergences suggest the rapid dispersion of Native Americans in Central and South America after a long standstill period in North America.
PMCID: PMC3435283  PMID: 22970308
8.  Paleolithic Contingent in Modern Japanese: Estimation and Inference using Genome-wide Data 
Scientific Reports  2012;2:355.
The genetic origins of Japanese populations have been controversial. Upper Paleolithic Japanese, i.e. Jomon, developed independently in the Japanese islands for more than 10,000 years until the isolation was ended by the influxes of continental immigrants about 2,000 years ago. However, knowledge of the origin of Jomon and its contribution to the gene pool of contemporary Japanese is still limited, despite extensive studies using mtDNA and Y chromosomes. In this report, we aimed to infer the origin of Jomon and to estimate its contribution to Japanese by fitting an admixture model with missing data from Jomon to genome-wide data from 94 worldwide populations. Our results showed that the genetic contributions of Jomon, the Paleolithic contingent in Japanese, are 54.3∼62.3% in Ryukyuans and 23.1∼39.5% in mainland Japanese. Utilizing inferred allele frequencies of the Jomon population, we further showed that the Paleolithic contingent in Japanese had a Northeast Asian origin.
PMCID: PMC3320058  PMID: 22482036
9.  Gene- or region-based association study via kernel principal component analysis 
BMC Genetics  2011;12:75.
In genetic association studies, especially in GWAS, gene- or region-based methods have become more popular for detecting the association between multiple SNPs and diseases (or traits). Kernel principal component analysis combined with logistic regression test (KPCA-LRT) has been successfully used in classifying gene expression data. Nevertheless, the purpose of an association study is to detect the correlation between genetic variations and disease rather than to classify the sample, and the genomic data are categorical rather than numerical. Although a kernel-based logistic regression model has recently been proposed for association studies, projecting the nonlinear original SNP data into a linear feature space, it is still affected by multicollinearity between the projections, which may lead to loss of power. We therefore proposed a KPCA-LRT model to avoid the multicollinearity.
Simulation results showed that KPCA-LRT was always more powerful than principal component analysis combined with logistic regression test (PCA-LRT) at different sample sizes, different significance levels and different relative risks, especially at the genewide level (1E-5) and lower relative risks (RR = 1.2, 1.3). Application to the four gene regions of rheumatoid arthritis (RA) data from Genetic Analysis Workshop 16 (GAW16) indicated that KPCA-LRT had better performance than the single-locus test and PCA-LRT.
KPCA-LRT is a valid and powerful gene- or region-based method for the analysis of GWAS data sets, especially under lower relative risks and lower significance levels.
PMCID: PMC3176196  PMID: 21871061
10.  Accelerating Haplotype-Based Genome-Wide Association Study Using Perfect Phylogeny and Phase-Known Reference Data 
PLoS ONE  2011;6(7):e22097.
The genome-wide association study (GWAS) has become a routine approach for mapping disease risk loci with the advent of large-scale genotyping technologies. Multi-allelic haplotype markers can provide superior power compared with single-SNP markers in mapping disease loci. However, the application of haplotype-based analysis to GWAS is usually bottlenecked by the prohibitive time cost of haplotype inference, also known as phasing. In this study, we developed an efficient approach to haplotype-based analysis in GWAS. By using a reference panel, our method accelerated the phasing process and reduced the potential bias generated by unrealistic assumptions in the phasing process. The haplotype-based approach delivers great power and no type I error inflation for association studies. With only a medium-size reference panel, phasing error in our method is comparable to the genotyping error afforded by commercial genotyping solutions.
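To make the role of a phase-known reference panel concrete, the sketch below resolves an unphased genotype by searching for pairs of reference haplotypes that add up to it. This brute-force lookup is only an illustration of the general idea and is not the authors' perfect-phylogeny algorithm. The 0/1 allele coding, the function name, and the toy panel are assumptions.

```python
from itertools import combinations_with_replacement


def compatible_pairs(genotype, reference_haplotypes):
    """List reference haplotype pairs whose allele counts sum to the genotype.

    genotype: tuple of minor-allele counts (0, 1 or 2) per SNP.
    reference_haplotypes: list of tuples of 0/1 alleles, phase known.
    """
    pairs = []
    for h1, h2 in combinations_with_replacement(reference_haplotypes, 2):
        if all(a + b == g for a, b, g in zip(h1, h2, genotype)):
            pairs.append((h1, h2))
    return pairs


# Hypothetical reference panel of four phase-known haplotypes over three SNPs.
panel = [(0, 0, 1), (1, 1, 0), (0, 1, 0), (1, 0, 1)]

# One compatible pair: the phase is resolved by the panel.
print(compatible_pairs((1, 2, 0), panel))
# Two compatible pairs: the panel alone leaves the phase ambiguous.
print(compatible_pairs((1, 1, 1), panel))
```

The second call shows why a practical method needs more than a lookup: when several pairs fit, some further criterion, such as haplotype frequency, is required to choose between them.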
PMCID: PMC3137625  PMID: 21789217
11.  Genome-wide Linkage of Cotinine Pharmacokinetics Suggests Candidate Regions on Chromosomes 9 and 11 
Characterizing cotinine pharmacokinetics is a useful way to study nicotine metabolism because the same liver enzyme is primarily responsible for the metabolism of both, and the clearances of nicotine and cotinine are highly correlated. We conducted a whole-genome linkage analysis to search for candidate regions influencing quantitative variation in cotinine pharmacokinetics in a large-scale pharmacokinetic study with 61 families containing 224 healthy adult participants. The strongest linkage signal was identified at 135 cM of chromosome 9 with LOD=2.81 and P=0.0002; two other suggestive linkage peaks appeared at 31.4 and 73.5 cM of chromosome 11 with LOD=1.96 (P=0.0013) and 1.94 (P=0.0014). The linkage between the three genome regions and cotinine pharmacokinetics is statistically significant, with a genome-wide empirical probability of P=0.029.
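The relationship between the LOD scores and the pointwise P values quoted above can be checked with a standard large-sample approximation: 2 ln(10) × LOD is treated as a chi-square statistic with 1 degree of freedom, and the tail probability is halved for a one-sided test. The abstract does not state how the authors obtained their P values, so this conversion is an assumption. The function name is hypothetical. Under that assumption the three reported pairs are reproduced to the quoted precision.

```python
import math


def lod_to_pointwise_p(lod):
    """Approximate one-sided pointwise p-value for a LOD score."""
    chi_square = 2.0 * math.log(10.0) * lod
    # The upper tail of a 1-df chi-square at x equals erfc(sqrt(x / 2)).
    two_sided_tail = math.erfc(math.sqrt(chi_square / 2.0))
    return 0.5 * two_sided_tail


for lod in (2.81, 1.96, 1.94):
    print(lod, round(lod_to_pointwise_p(lod), 4))
# 2.81 0.0002
# 1.96 0.0013
# 1.94 0.0014
```

The genome-wide empirical probability of P=0.029 is a different quantity and cannot be derived from this pointwise formula.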
PMCID: PMC2693302  PMID: 18785207
pharmacokinetics; nicotine; dependence; linkage analysis
Pharmacogenetics and genomics  2009;19(5):388-398.
The ratio of trans-3′-hydroxycotinine to cotinine (3HC/COT) is a marker of CYP2A6 activity, an important determinant of nicotine metabolism. This analysis sought to conduct a combined genetic epidemiologic and pharmacogenetic investigation of the 3HC/COT ratio in plasma and urine.
One hundred thirty nine twin pairs (110 monozygotic [MZ] and 29 dizygotic [DZ]) underwent a 30-minute infusion of stable isotope-labeled nicotine and its major metabolite, cotinine, followed by an 8-hour in-hospital stay. Blood and urine samples were taken at regular intervals for analysis of nicotine, cotinine, and metabolites. DNA was genotyped to confirm zygosity and for variation in the gene for the primary nicotine metabolic enzyme, CYP2A6 (variants genotyped: *1B, *1×2, *2, *4, *9, *12). Univariate biometric analyses quantified genetic and environmental influences on each measure in the presence and absence of covariates, including measured CYP2A6 genotype.
There was a substantial amount of variation in the free 3HC/COT ratio in plasma (6 hours post-infusion) attributable to additive genetic influences (67.4%, 95% CI = 55.9–76.2%). The heritability estimate was reduced to 61.0% and 49.4%, respectively, after taking into account the effect of covariates and CYP2A6 genotype. In urine (collected over 8 hours), the estimated amount of variation in the 3HC/COT ratio attributable to additive genetic influences was smaller (47.2%, 95% CI = 0–67.2%) and decreased to 44.6% and 42.0% after accounting for covariates and genotype.
Additive genetic factors are prominent in determining variation in the plasma 3HC/COT ratio but less so in determining variation in the urine 3HC/COT ratio.
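For intuition about how twin data separate genetic from environmental variance, the sketch below uses Falconer's classical approximation from monozygotic and dizygotic twin correlations. The study itself fitted univariate biometric models, not this formula, and the abstract reports no twin correlations, so the input values here are invented purely for illustration. The function name is hypothetical.

```python
def falconer_estimates(r_mz, r_dz):
    """Falconer's approximate variance decomposition from twin correlations.

    r_mz: trait correlation within monozygotic pairs.
    r_dz: trait correlation within dizygotic pairs.
    """
    return {
        # MZ pairs share all additive genetic effects, DZ pairs half of them.
        "additive_genetic": 2.0 * (r_mz - r_dz),
        "shared_environment": 2.0 * r_dz - r_mz,
        "unique_environment": 1.0 - r_mz,
    }


# Made-up correlations, not taken from the study.
estimates = falconer_estimates(r_mz=0.6, r_dz=0.4)
for component, share in estimates.items():
    print(component, round(share, 2))
```

The three components sum to 1 by construction. Model-based estimates such as those reported above also give confidence intervals, which this shortcut does not.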
PMCID: PMC2849278  PMID: 19300303
pharmacogenetics; nicotine; cotinine; metabolism; CYP2A6; twins; genetics; heritability

