1.  A keratin 15 containing stem cell population from the hair follicle contributes to squamous papilloma development in the mouse 
Molecular carcinogenesis  2012;52(10):751-759.
The multistage model of non-melanoma skin carcinogenesis has contributed significantly to our understanding of epithelial cancer in general. We used the Krt1-15CrePR1;R26R transgenic mouse to determine the contribution of keratin 15+ cells from the hair follicle to skin tumor development by following the labeled progeny of the keratin 15 expressing cells into papillomas. We present three novel observations. First, we found that keratin 15 expressing cells contribute to most of the papillomas by 20 weeks of promotion. Second, in contrast to the transient behavior of labeled keratin 15-derived progeny in skin wound healing, keratin 15 progeny persist in papillomas and some malignancies for many months following transient induction of the reporter gene. Third, papillomas have surprising heterogeneity not only in their cellular composition, but also in their expression of the codon 61 signature Ha-ras mutation, with approximately 30 percent of keratin 15-derived regions expressing the mutation. Together, these results demonstrate that keratin 15 expressing cells of the hair follicle contribute to cutaneous papillomas, that their progeny persist long term, and that a subset of the derived regions express the Ha-ras signature mutation characteristic of initiated cells.
PMCID: PMC3424369  PMID: 22431489
skin carcinogenesis; stem cells; hair follicles; papillomas
2.  Mapping genes with longitudinal phenotypes via Bayesian posterior probabilities 
BMC Proceedings  2014;8(Suppl 1):S81.
Most association studies focus on disease risk, with less attention paid to disease progression or severity. These phenotypes require longitudinal data. This paper presents a new method for analyzing longitudinal data to map genes in both population-based and family-based studies. Using simulated systolic blood pressure measurements obtained from Genetic Analysis Workshop 18, we cluster the phenotype data into trajectory subgroups. We then use the Bayesian posterior probability of being in the high subgroup as a quantitative trait in an association analysis with genotype data. This method maintains high power (>80%) in locating genes known to affect the simulated phenotype for most specified significance levels (α). We believe that this method can be useful to aid in the discovery of genes that affect severity or progression of disease.
PMCID: PMC4143622  PMID: 25519410
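The posterior-probability trait described in this abstract can be illustrated with a minimal sketch. It assumes a much simpler setting than the paper's trajectory clustering: one summary value per subject and a two-component Gaussian mixture with a shared standard deviation. The function names and all parameter values are hypothetical and are not taken from the study.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_high(x, weight_high, mean_low, mean_high, sd):
    """Bayes posterior probability that x belongs to the 'high' subgroup.

    Assumption: two Gaussian components with a common sd; the paper's
    trajectory model is richer than this.
    """
    numerator = weight_high * normal_pdf(x, mean_high, sd)
    denominator = numerator + (1.0 - weight_high) * normal_pdf(x, mean_low, sd)
    return numerator / denominator


# Hypothetical systolic blood pressure summaries (mmHg) for three subjects.
for sbp in (115.0, 130.0, 150.0):
    print(sbp, round(posterior_high(sbp, 0.5, 120.0, 140.0, 10.0), 3))
```

The resulting value lies between 0 and 1 for each subject, so it can be carried into an association analysis as a quantitative trait, which is the role the abstract assigns to it.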
3.  Single variant and multi-variant trend tests for genetic association with next generation sequencing that are robust to sequencing error 
Human heredity  2013;74(0):10.1159/000346824.
As with any new technology, next generation sequencing (NGS) has potential advantages and potential challenges. One advantage is the identification of multiple causal variants for disease that might otherwise be missed by SNP-chip technology. Potential challenges are misclassification error and the loss of power due to multiple testing.
Here, we develop an extension of the linear trend test for association that incorporates differential misclassification error and may be applied to any number of SNPs. We call the statistic the linear trend test allowing for error, applied to NGS, or LTTae,NGS. This statistic allows for differential misclassification.
The observed data are phenotypes for unrelated cases and controls, coverage, and the number of putative causal variants for every individual at all SNPs. We simulate data considering multiple factors (disease mode of inheritance, genotype relative risk, causal variant frequency, sequence error rate in cases, sequence error rate in controls, number of loci, and others) and evaluate type I error rate and power for each vector of factor settings. We compare our results with two recently published NGS statistics. Also, we create a fictitious disease model, based on downloaded 1000 Genomes data for 5 SNPs and 388 individuals, and apply our statistic to that data.
We find that the LTTae,NGS maintains the correct type I error rate in all simulations (differential and non-differential error), while the other statistics show large inflation in type I error for lower coverage. Power is approximately the same for all three statistics in the presence of non-differential error.
Application of our statistic to the 1000 Genomes data suggests that, for the data downloaded, there is a 1.5% sequence misclassification rate over all SNPs.
Finally, application of the multi-variant form of LTTae,NGS shows high power for a number of simulation settings, although it can have lower power than the corresponding single variant simulation results, most probably due to our specification of multi-variant SNP correlation values.
In conclusion, our LTTae,NGS addresses two key challenges with NGS disease studies. First, it allows for differential misclassification when computing the statistic. Second, it addresses the multiple-testing issue, in that the multi-variant form of the statistic has only one degree of freedom and provides a single p-value no matter how many loci are tested.
PMCID: PMC3863939  PMID: 23594495
next gen; rare variant; trend test; genetic association; GWAS; allele; locus
4.  Use of cetrorelix in the investigation of testosterone excess in a postmenopausal woman 
BMJ Case Reports  2011;2011:bcr0120113730.
A 65-year-old woman presented to the endocrine clinic with increasing facial hirsutism over the past 6 months. She was noted to have excess hair on forearms, back and abdomen, along with some frontal balding. There were no abnormalities of the external genitalia, blood pressure was satisfactory and weight was stable. Biochemistry confirmed elevated testosterone (4.1 nmol/l). No abnormalities were seen on CT of abdomen and pelvis, nor by transvaginal ultrasound of the ovaries. Six months after her initial clinic visit, testosterone had increased to 6.0 nmol/l, rising to 7.3 nmol/l a few months later. Testosterone failed to suppress to low-dose dexamethasone suggesting excessive adrenal production was unlikely. Urine steroid profiling revealed no abnormality of adrenal steroid metabolites. Testosterone suppression was achieved with a rapidly-acting luteinizing-hormone-releasing hormone antagonist (cetrorelix), suggesting an ovarian source of excess production. Histology following bilateral salpingo-oophorectomy revealed a benign 6 mm diameter Leydig cell tumour in the right ovary.
PMCID: PMC3083010  PMID: 22696667
5.  IL-18R1 and IL-18RAP SNPs may associate with Bronchopulmonary Dysplasia in African American infants 
Pediatric research  2012;71(1):107-114.
The genetic contribution to the development of bronchopulmonary dysplasia (BPD) in prematurely born infants is substantial, but information related to the specific genes involved is lacking. We conducted a case-control single nucleotide polymorphism (SNP) association study of candidate genes (n=601) or 6,324 SNPs in 1,091 prematurely born infants with gestational age <35 weeks, with or without neonatal lung disease including BPD. BPD was defined as need for oxygen at 28 days. Genotype analysis revealed, after multiple comparisons correction, two significant SNPs, rs3771150 (IL-18RAP) and rs3771171 (IL-18R1), in African Americans (AA) with BPD (vs. AA without BPD; q<0.05). No associations with Caucasian (CA) BPD, AA or CA RDS, or prematurity in either AA or CA, were identified with these SNPs. Respective frequencies were 0.098 and 0.093 without BPD and 0.38 for each SNP in infants with BPD. In the replication set (82 cases; 102 controls), the p-values were 0.012 for rs3771150 and 0.07 for rs3771171. Combining p-values using Fisher's method, overall p-values were 8.31E-07 for rs3771150 and 6.33E-06 for rs3771171. We conclude that IL-18RAP and IL-18R1 SNPs identify AA infants at risk for BPD. These genes may contribute to AA BPD pathogenesis via inflammatory-mediated processes and require further study.
PMCID: PMC3610412  PMID: 22289858
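Fisher's method, which the abstract uses to combine p-values across the discovery and replication sets, can be sketched as follows. The statistic is -2 times the sum of the log p-values, referred to a chi-square distribution with 2k degrees of freedom for k independent p-values. Because the degrees of freedom are even, the tail probability has a closed form and no statistics library is needed. The function name and the input p-values are hypothetical; the abstract does not report the discovery p-values, so this does not attempt to reproduce its combined values.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    Returns (statistic, combined_p). The chi-square survival function with
    2k degrees of freedom equals exp(-t/2) * sum_{i<k} (t/2)^i / i!.
    """
    k = len(p_values)
    if k == 0 or any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    term = 1.0   # (t/2)^0 / 0!
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


# Hypothetical discovery and replication p-values for one SNP.
stat, p = fisher_combined_p([0.01, 0.04])
print(round(stat, 3), p)
```

For two p-values the result reduces to x(1 - ln x) with x the product of the two, which gives a quick hand check of the implementation.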
6.  Genome-wide linkage analysis of an autosomal recessive hypotrichosis identifies a novel P2RY5 mutation 
Genomics  2008;92(5):273-278.
While there have been significant advances in understanding the genetic etiology of human hair loss over the previous decade, there remain a number of hereditary disorders for which a causative gene has yet to be identified. We studied a large, consanguineous Brazilian family that presented with sparse woolly hair at birth that progressed to severe hypotrichosis by the age of 5, in which 6 of the 14 offspring were affected. After exclusion of known candidate genes, a genome-wide scan was performed to identify the disease locus. Autozygosity mapping revealed a highly significant region of extended homozygosity (LOD score of 10.41) that contained a haplotype with a linkage LOD score of 3.28. Results of these two methods defined a 9 Mb region on chromosome 13q14.11-q14.2. The interval contains the P2RY5 gene, in which we recently identified pathogenic mutations in several families of Pakistani origin affected with autosomal recessive woolly and sparse hair. After the exclusion of several other candidate genes, we sequenced the P2RY5 gene and identified a homozygous mutation (C278Y) in all affected individuals in this family. Our findings show that mutations in P2RY5 display variable expressivity, underlying both hypotrichosis and woolly hair, and underscore the essential role of P2RY5 in the tissue integrity and the maintenance of the hair follicle.
PMCID: PMC3341170  PMID: 18692127
P2RY5; G-protein coupled receptor; autosomal recessive hypotrichosis; autosomal recessive woolly hair; variable expressivity
7.  Genome-wide association studies of adolescent idiopathic scoliosis suggest candidate susceptibility genes 
Human Molecular Genetics  2011;20(7):1456-1466.
Adolescent idiopathic scoliosis (AIS) is an unexplained and common spinal deformity seen in otherwise healthy children. Its pathophysiology is poorly understood despite intensive investigation. Although genetic underpinnings are clear, replicated susceptibility loci that could provide insight into etiology have not been forthcoming. To address these issues, we performed genome-wide association studies (GWAS) of ∼327 000 single nucleotide polymorphisms (SNPs) in 419 AIS families. We found strongest evidence of association with chromosome 3p26.3 SNPs in the proximity of the CHL1 gene (P < 8 × 10⁻⁸ for rs1400180). We genotyped additional chromosome 3p26.3 SNPs and tested replication in two follow-up case–control cohorts, obtaining strongest results when all three cohorts were combined (rs10510181 odds ratio = 1.49, 95% confidence interval = 1.29–1.73, P = 2.58 × 10⁻⁸), but these were not confirmed in a separate GWAS. CHL1 is of interest, as it encodes an axon guidance protein related to Robo3. Mutations in the Robo3 protein cause horizontal gaze palsy with progressive scoliosis (HGPPS), a rare disease marked by severe scoliosis. Other top associations in our GWAS were with SNPs in the DSCAM gene encoding an axon guidance protein in the same structural class with Chl1 and Robo3. We additionally found AIS associations with loci in CNTNAP2, supporting a previous study linking this gene with AIS. Cntnap2 is also of functional interest, as it interacts directly with L1 and Robo class proteins and participates in axon pathfinding. Our results suggest the relevance of axon guidance pathways in AIS susceptibility, although these findings require further study, particularly given the apparent genetic heterogeneity in this disease.
PMCID: PMC3049353  PMID: 21216876
8.  TDT-HET: A new transmission disequilibrium test that incorporates locus heterogeneity into the analysis of family-based association data 
BMC Bioinformatics  2012;13:13.
Locus heterogeneity is one of the most documented phenomena in genetics. To date, relatively little work has been done on the development of methods to address locus heterogeneity in genetic association analysis. Motivated by Zhou and Pan's work, we present a mixture model of linked and unlinked trios and develop a statistical method to estimate the probability that a heterozygous parent transmits the disease allele at a di-allelic locus, and the probability that any trio is in the linked group. The purpose here is the development of a test that extends the classic transmission disequilibrium test (TDT) to one that accounts for locus heterogeneity.
Our simulations suggest that, for sufficiently large sample size (1000 trios), our method has good power to detect association even when the proportion of unlinked trios is high (75%). While the median difference (TDT-HET empirical power - TDT empirical power) is approximately 0 for all modes of inheritance (MOI), there are parameter settings for which the power difference can be substantial. Our multi-locus simulations suggest that our method has good power to detect association as long as the markers are reasonably well-correlated and the genotype relative risks are large. Results of both single-locus and multi-locus simulations suggest our method maintains the correct type I error rate.
Finally, the TDT-HET statistic shows highly significant p-values for most of the idiopathic scoliosis candidate loci, and for some loci, the estimated proportion of unlinked trios approaches or exceeds 50%, suggesting the presence of locus heterogeneity.
We have developed an extension of the TDT statistic (TDT-HET) that allows for locus heterogeneity among coded trios. Benefits of our method include: estimates of parameters in the presence of heterogeneity, and reasonable power even when the proportion of linked trios is small. Also, we have extended multi-locus methods to TDT-HET and have demonstrated that the empirical power may be high to detect linkage. Last, given that we obtain PPBs, we conjecture that the TDT-HET may be a useful method for correctly identifying linked trios. We anticipate that researchers will find this property increasingly useful as they apply next-generation sequencing data in family based studies.
PMCID: PMC3292499  PMID: 22264315
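For orientation, the classic TDT that TDT-HET extends can be sketched in a few lines. This is the standard McNemar-type statistic for a di-allelic locus, not the TDT-HET mixture model itself. Among heterozygous parents, b counts transmissions of the candidate allele and c counts non-transmissions; the statistic (b - c)^2 / (b + c) is referred to a chi-square distribution with 1 degree of freedom. The function name and the counts are hypothetical.

```python
import math


def classic_tdt(transmitted, untransmitted):
    """Classic TDT statistic and p-value for a di-allelic locus.

    transmitted / untransmitted: counts, among heterozygous parents, of how
    often the candidate allele was or was not passed to the affected child.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    statistic = (transmitted - untransmitted) ** 2 / total
    # Survival function of chi-square with 1 df: P(Z^2 > s) = erfc(sqrt(s/2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts from heterozygous parents.
stat, p = classic_tdt(60, 40)
print(stat, round(p, 4))
```

Under locus heterogeneity, unlinked trios transmit the allele about half the time and pull b and c toward each other, which is the dilution that the mixture model in the abstract is designed to account for.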
9.  Catechol-O-Methyltransferase (COMT) Gene Variants: Possible Association of the Val158Met Variant With Opiate Addiction in Hispanic Women 
Catechol-O-methyltransferase (COMT) catalyzes the breakdown of catechol neurotransmitters, including dopamine, which plays a prominent role in drug reward. A common single nucleotide polymorphism (SNP), G472A, codes for a Val158Met substitution and results in a fourfold downregulation of enzyme activity. We sequenced exon IV of the COMT gene in search of novel polymorphisms and then genotyped four out of five identified by direct sequencing, using a TaqMan assay on 266 opioid-dependent and 173 control subjects. Genotype frequencies of the G472A SNP varied significantly (P = 0.029) among the three main ethnic/cultural groups (Caucasians, Hispanics, and African Americans). Using a genotype test, we found a trend to point-wise association (P = 0.053) of the G472A SNP in Hispanic subjects with opiate addiction. Further analysis of G472A genotypes in Hispanic subjects with data stratified by gender identified a point-wise significant (P = 0.049) association of G/A and A/A genotypes with opiate addiction in women, but not men. These point-wise significant results are not significant experiment-wise (at P < 0.05) after correction for multiple testing. No significant association was found with haplotypes of the three most common SNPs. Linkage disequilibrium patterns were similar for the three ethnic/cultural groups.
PMCID: PMC2909109  PMID: 18270997
dopamine metabolism; gene variant; polymorphism; gender; heroin addiction
10.  Growth mixture modeling as an exploratory analysis tool in longitudinal quantitative trait loci analysis 
BMC Proceedings  2009;3(Suppl 7):S112.
We examined the properties of growth mixture modeling in finding longitudinal quantitative trait loci in a genome-wide association study. Two software packages are commonly used in these analyses: Mplus and the SAS TRAJ procedure. We analyzed the 200 replicates of the simulated data with these programs using three tests: the likelihood-ratio test statistic, a direct test of genetic model coefficients, and the chi-square test classifying subjects based on the trajectory model's posterior Bayesian probability. The Mplus program was not effective in this application due to its computational demands. The distributions of these tests applied to genes not related to the trait were sensitive to departures from Hardy-Weinberg equilibrium. The likelihood-ratio test statistic was not usable in this application because its distribution was far from the expected asymptotic distributions when applied to markers with no genetic relation to the quantitative trait. The other two tests were satisfactory. Power was still substantial when we used markers near the gene rather than the gene itself. That is, growth mixture modeling may be useful in genome-wide association studies. For markers near the actual gene, there was somewhat greater power for the direct test of the coefficients and lesser power for the posterior Bayesian probability chi-square test.
PMCID: PMC2795884  PMID: 20017977
11.  Telomerase Levels in Schizophrenia: A Preliminary Study 
Schizophrenia research  2008;106(2-3):242-247.
We previously demonstrated that telomere length was markedly reduced in peripheral blood lymphocytes from individuals with schizophrenia. Since reduced telomere length can be caused by decreased telomerase activity, we quantitated basal telomerase activity in peripheral blood lymphocytes derived from individuals with schizophrenia (n=53), unaffected relatives (n=31) and unrelated controls (n=59). Telomerase activity varied greatly among individuals, suggesting that this enzymatic activity is affected by various factors. We observed a nominally significant decrease in telomerase activity among individuals with schizophrenia compared to unaffected individuals (unaffected relatives and unrelated controls). Further studies are needed to investigate the role of telomerase in schizophrenia.
PMCID: PMC2613190  PMID: 18829263
lymphocyte; immune; telomere length; premature aging
12.  The cost effectiveness of duplicate genotyping for testing genetic association 
Annals of human genetics  2009;73(Pt 3):370-378.
We consider a modification to the traditional genome wide association (GWA) study design: duplicate genotyping. Duplicate genotyping (re-genotyping some of the samples) has long been suggested for quality control reasons, but it has not been evaluated for its statistical cost-effectiveness. We demonstrate that when genotyping error rates are at least m%, duplicate genotyping provides a cost-effective (more statistical power for the same price) design alternative when relative genotype to phenotype/sample acquisition costs are no more than m%. Beyond cost and error rate, minor allele frequency also matters: duplicate genotyping is most cost-effective for SNPs with low minor allele frequency. In general, relative genotype to phenotype/sample acquisition costs will be low when following up a limited number of SNPs in the second stage of a two-stage GWA study design, and, thus, duplicate genotyping may be useful in these situations. In cases where many SNPs are being followed up at the second stage, duplicate genotyping only low-quality SNPs with low minor allele frequency may be cost-effective. We also find that in almost all cases where duplicate genotyping is cost-effective, the most cost-effective design strategy involves duplicate genotyping all samples. Free software is provided which evaluates the cost-effectiveness of duplicate genotyping based on user inputs.
PMCID: PMC2739690  PMID: 19344449
Genotype Error; Genome-wide Association Study; Re-genotype; Two-Stage; Power; Duplicate genotype
13.  Computing power of quantitative trait locus association mapping for haploid loci 
BMC Bioinformatics  2009;10:261.
Statistical power calculations are a critical part of any study design for gene mapping. Most calculations assume that the locus of interest is biallelic. However, there are common situations in human genetics such as X-linked loci in males where the locus is haploid. The purpose of this work is to mathematically derive the biometric model for haploid loci, and to compute power for QTL mapping when the loci are haploid.
We have derived the biometric model for power calculations for haploid loci and have developed software to perform these calculations. We have verified our calculations with independent mathematical methods.
Our results fill a need in power calculations for QTL mapping studies. Furthermore, failure to appropriately model haploid loci may cause underestimation of power.
PMCID: PMC2738682  PMID: 19698182
14.  Telemedicine and Trauma Referrals — a Plastic Surgery Pilot Project 
The Ulster Medical Journal  2009;78(2):113-114.
A pilot study of the use of digital images as an adjunct to telephone referral was undertaken. Hand trauma represented the majority of the twenty patients included in the study, and the system was found to be an effective aid to delivering appropriate management. We have found image analysis to be a useful addition to the telephone referral process already in use in our unit, but it is unlikely to replace the need for real time clinical assessment of the patient.
PMCID: PMC2699198  PMID: 19568447
15.  Computing Power and Sample Size for Case-Control Association Studies with Copy Number Polymorphism: Application of Mixture-Based Likelihood Ratio Test 
PLoS ONE  2008;3(10):e3475.
Recent studies suggest that copy number polymorphisms (CNPs) may play an important role in disease susceptibility and onset. Currently, the detection of CNPs mainly depends on microarray technology. For case-control studies, conventionally, subjects are assigned to a specific CNP category based on the continuous quantitative measure produced by microarray experiments, and cases and controls are then compared using a chi-square test of independence. The purpose of this work is to specify the likelihood ratio test statistic (LRTS) for case-control sampling design based on the underlying continuous quantitative measurement, and to assess its power and relative efficiency (as compared to the chi-square test of independence on CNP counts). The sample size and power formulas of both methods are given. For the latter, the CNPs are classified using the Bayesian classification rule. The LRTS is more powerful than this chi-square test for the alternatives considered, especially alternatives in which the at-risk CNP categories have low frequencies. An example of the application of the LRTS is given for a comparison of CNP distributions in individuals of Caucasian or Taiwanese ethnicity, where the LRTS appears to be more powerful than the chi-square test, possibly due to misclassification of the most common CNP category into a less common category.
PMCID: PMC2566806  PMID: 18941524
16.  The prevention of progression of arterial disease and diabetes (POPADAD) trial: factorial randomised placebo controlled trial of aspirin and antioxidants in patients with diabetes and asymptomatic peripheral arterial disease 
Objective To determine whether aspirin and antioxidant therapy, combined or alone, are more effective than placebo in reducing the development of cardiovascular events in patients with diabetes mellitus and asymptomatic peripheral arterial disease.
Design Multicentre, randomised, double blind, 2×2 factorial, placebo controlled trial.
Setting 16 hospital centres in Scotland, supported by 188 primary care groups.
Participants 1276 adults aged 40 or more with type 1 or type 2 diabetes and an ankle brachial pressure index of 0.99 or less but no symptomatic cardiovascular disease.
Interventions Daily, 100 mg aspirin tablet plus antioxidant capsule (n=320), aspirin tablet plus placebo capsule (n=318), placebo tablet plus antioxidant capsule (n=320), or placebo tablet plus placebo capsule (n=318).
Main outcome measures Two hierarchical composite primary end points of death from coronary heart disease or stroke, non-fatal myocardial infarction or stroke, or amputation above the ankle for critical limb ischaemia; and death from coronary heart disease or stroke.
Results No evidence was found of any interaction between aspirin and antioxidant. Overall, 116 of 638 primary events occurred in the aspirin groups compared with 117 of 638 in the no aspirin groups (18.2% v 18.3%): hazard ratio 0.98 (95% confidence interval 0.76 to 1.26). Forty three deaths from coronary heart disease or stroke occurred in the aspirin groups compared with 35 in the no aspirin groups (6.7% v 5.5%): 1.23 (0.79 to 1.93). Among the antioxidant groups 117 of 640 (18.3%) primary events occurred compared with 116 of 636 (18.2%) in the no antioxidant groups (1.03, 0.79 to 1.33). Forty two (6.6%) deaths from coronary heart disease or stroke occurred in the antioxidant groups compared with 36 (5.7%) in the no antioxidant groups (1.21, 0.78 to 1.89).
Conclusion This trial does not provide evidence to support the use of aspirin or antioxidants in primary prevention of cardiovascular events and mortality in the population with diabetes studied.
Trial registration Current Controlled Trials ISRCTN53295293.
PMCID: PMC2658865  PMID: 18927173
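The group sizes and event percentages reported in the Results follow directly from the 2×2 factorial allocation given under Interventions, and the arithmetic can be checked with a short sketch. The counts are those stated in the abstract; the variable names are illustrative only. The hazard ratios are not recomputed here because they require time-to-event data that the abstract does not provide.

```python
# Allocation from the abstract: (aspirin?, antioxidant?) -> number randomised.
arms = {
    (True, True): 320,    # aspirin + antioxidant
    (True, False): 318,   # aspirin + placebo capsule
    (False, True): 320,   # placebo tablet + antioxidant
    (False, False): 318,  # placebo tablet + placebo capsule
}

total = sum(arms.values())
aspirin_n = sum(n for (asp, _), n in arms.items() if asp)
no_aspirin_n = total - aspirin_n
antioxidant_n = sum(n for (_, anti), n in arms.items() if anti)
no_antioxidant_n = total - antioxidant_n


def percent(events, n):
    """Event percentage rounded to one decimal place, as in the abstract."""
    return round(100.0 * events / n, 1)


print(total, aspirin_n, no_aspirin_n, antioxidant_n, no_antioxidant_n)
print(percent(116, aspirin_n), percent(117, no_aspirin_n))        # primary events
print(percent(43, aspirin_n), percent(35, no_aspirin_n))          # CHD/stroke deaths
print(percent(117, antioxidant_n), percent(116, no_antioxidant_n))
print(percent(42, antioxidant_n), percent(36, no_antioxidant_n))
```

Each margin of a factorial trial pools two arms, which is why the aspirin comparison is 638 v 638 while the antioxidant comparison is 640 v 636.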
17.  Understanding Genetic Factors in Idiopathic Scoliosis, a Complex Disease of Childhood 
Current Genomics  2008;9(1):51-59.
Idiopathic scoliosis (IS) is the most common pediatric spinal deformity, affecting ~3% of children worldwide. In the U.S. alone, IS significantly impacts national health, creating disfigurement and disability for over 10% of patients and costing billions of dollars annually for treatment. Despite many investigations, the underlying etiology of IS is poorly understood. Twin studies and observations of familial aggregation reveal significant genetic contributions to IS. Several features of the disease, including potentially strong genetic effects, the early onset of disease, and standardized diagnostic criteria, make IS ideal for genomic approaches to finding risk factors. Here we comprehensively review the genetic contributions to IS and compare those findings to other well-described complex diseases such as Crohn’s disease, type 1 diabetes, psoriasis, and rheumatoid arthritis. We also summarize candidate gene studies and evaluate them in the context of possible disease etiology. Finally, we provide study designs that apply emerging genomic technologies to this disease. Existing genetic data provide testable hypotheses regarding IS etiology, and also provide proof of principle for applying high-density genome-wide methods to finding susceptibility genes and disease modifiers.
PMCID: PMC2674301  PMID: 19424484
Scoliosis; genetics; inheritance; genome-wide association.
18.  Fine mapping of a gene responsible for regulating dietary cholesterol absorption; founder effects underlie cases of phytosterolaemia in multiple communities 
Sitosterolaemia (also known as phytosterolaemia, MIM 210250) is a rare autosomal recessive inherited disorder, characterised by the presence of tendon and tuberous xanthomas, accelerated atherosclerosis and premature coronary artery disease. The defective gene is hypothesised to play an important role in regulating dietary sterol absorption and biliary secretion, thus defining a molecular mechanism whereby this physiological process is carried out. The disease locus was localised previously to chromosome 2p21, in a 15 cM interval between microsatellite markers D2S1788 and D2S1352 (based upon 10 families, maximum lodscore 4.49). In this study, we have extended these studies to include 30 families assembled from around the world. A maximum multipoint lodscore of 11.49 was obtained for marker D2S2998. Homozygosity and haplotype sharing were identified in probands from non-consanguineous marriages from a number of families, strongly supporting the existence of a founder effect among various populations. Additionally, based upon both genealogies and genotyping, two Amish/Mennonite families that were previously thought not to be related appear to indicate a founder effect in this population as well. Using both homozygosity mapping and informative recombination events, the sitosterolaemia gene is located in a region defined by markers D2S2294 and Afm210xe9, a distance of less than 2 cM.
PMCID: PMC1525333  PMID: 11378826
genetics; homozygosity mapping; linkage; cholesterol; atherosclerosis; diet
19.  A transmission disequilibrium test for general pedigrees that is robust to the presence of random genotyping errors and any number of untyped parents 
Two issues regarding the robustness of the original transmission disequilibrium test (TDT) developed by Spielman et al are: (i) missing parental genotype data and (ii) the presence of undetected genotype errors. While extensions of the TDT that are robust to items (i) and (ii) have been developed, there is to date no single TDT statistic that is robust to both for general pedigrees. We present here a likelihood method, the TDTae, which is robust to these issues in general pedigrees. The TDTae assumes a more general disease model than the traditional TDT, which assumes a multiplicative inheritance model for genotypic relative risk. Our model is based on Weinberg’s work. To assess robustness, we perform simulations. Also, we apply our method to two data sets from actual diseases: psoriasis and sitosterolemia. Maximization under alternative and null hypotheses is performed using Powell’s method. Results of our simulations indicate that our method maintains correct type I error rates at the 1, 5, and 10% levels of significance. Furthermore, a Kolmogorov–Smirnov goodness-of-fit test suggests that the data are drawn from a central χ² distribution with 2 df, the correct asymptotic null distribution. The psoriasis results suggest two loci as being significantly linked to the disease, even in the presence of genotyping errors and missing data, and the sitosterolemia results show a P-value of 1.5 × 10⁻⁹ for the marker locus nearest to the sitosterolemia disease genes. We have developed software to perform TDTae calculations, which may be accessed from our ftp site.
PMCID: PMC1356564  PMID: 15162128
misclassification; genetics; statistics
20.  Mixture modeling of microarray gene expression data 
BMC Proceedings  2007;1(Suppl 1):S50.
About 28% of genes appear to have an expression pattern that follows a mixture distribution. We use first- and second-order partial correlation coefficients to identify trios and quartets of non-sex-linked genes that are highly associated and that are also mixtures. We identified 18 trio and 35 quartet mixtures and evaluated their mixture distribution concordance. Concordance was defined as the proportion of observations that simultaneously fall in the component with the higher mean or simultaneously in the component with the lower mean based on their Bayesian posterior probabilities. These trios and quartets have a concordance rate greater than 80%. There are 33 genes involved in these trios and quartets. A factor analysis with varimax rotation identifies three gene groups based on their factor loadings. One group of 18 genes has a concordance rate of 56.7%, another group of 8 genes has a concordance rate of 60.8%, and a third group of 7 genes has a concordance rate of 69.6%. Each of these rates is highly significant, suggesting that there may be strong biological underpinnings for the mixture mechanisms of these genes. Bayesian factor screening confirms this hypothesis by identifying six single-nucleotide polymorphisms that are significantly associated with the expression phenotypes of the five most concordant genes in the first group.
PMCID: PMC2367561  PMID: 18466550
21.  Incorporation of genetic model parameters for cost-effective designs of genetic association studies using DNA pooling 
BMC Genomics  2007;8:238.
Studies of association methods using DNA pooling of single nucleotide polymorphisms (SNPs) have focused primarily on the effects of "machine-error", number of replicates, and the size of the pool. We use the non-centrality parameter (NCP) for the analysis of variance test to compute the approximate power for genetic association tests with DNA pooling data on cases and controls. We incorporate genetic model parameters into the computation of the NCP. Parameters involved in the power calculation are disease allele frequency, frequency of the marker SNP allele in coupling with the disease locus, disease prevalence, genotype relative risk, sample size, genetic model, number of pools, number of replicates of each pool, and the proportion of variance of the pooled frequency estimate due to machine variability. We compute power for different settings of number of replicates and total number of genotypings when the genetic model parameters are fixed. Several significance levels are considered, including stringent significance levels (due to the increasing popularity of 100 K and 500 K SNP "chip" data). We use a factorial design with two to four settings of each parameter and multiple regression analysis to assess which parameters most significantly affect power.
The power can increase substantially as the genotyping number increases. For a fixed number of genotypings, the power is a function of the number of replicates of each pool such that there is a setting with maximum power. The four most significant parameters affecting power for association are: (1) genotype relative risk, (2) genetic model, (3) sample size, and (4) the interaction term between disease and SNP marker allele probabilities.
For a fixed number of genotypings, there is an optimal number of replicates of each pool that increases as the number of genotypings increases. Power is not substantially reduced when the number of replicates is close to but not equal to the optimal setting.
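The link between a non-centrality parameter and approximate power can be illustrated with a deliberately simplified sketch. It assumes a 1-df chi-square test rather than the analysis of variance test used in the paper, and it does not reproduce the paper's NCP formula; the function name and the example values are hypothetical.

```python
from statistics import NormalDist


def power_from_ncp(ncp, alpha):
    """Approximate power of a 1-df chi-square test with non-centrality ncp.

    A non-central chi-square with 1 df is the square of a normal variable
    with mean sqrt(ncp) and unit variance, so the power is the probability
    that this variable falls outside the two-sided critical values.
    """
    if ncp < 0 or not 0 < alpha < 1:
        raise ValueError("need ncp >= 0 and 0 < alpha < 1")
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = ncp ** 0.5
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)


if __name__ == "__main__":
    for alpha in (0.05, 1e-4, 1e-7):  # hypothetical significance levels
        print(alpha, power_from_ncp(20.0, alpha))
```

At a fixed NCP the power falls as the significance level becomes more stringent, which is why stringent levels matter for designs with very many SNPs.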
PMCID: PMC1947971  PMID: 17634103
22.  Are Molecular Haplotypes Worth the Time and Expense? A Cost-Effective Method for Applying Molecular Haplotypes 
PLoS Genetics  2006;2(8):e127.
Because current molecular haplotyping methods are expensive and not amenable to automation, many researchers rely on statistical methods to infer haplotype pairs from multilocus genotypes, and subsequently treat these inferred haplotype pairs as observations. These procedures are prone to haplotype misclassification. We examine the effect of these misclassification errors on the false-positive rate and power for two association tests. These tests include the standard likelihood ratio test (LRTstd) and a likelihood ratio test that employs a double-sampling approach to allow for the misclassification inherent in the haplotype inference procedure (LRTae). We aim to determine the cost–benefit relationship of increasing the proportion of individuals with molecular haplotype measurements in addition to genotypes to raise the power gain of the LRTae over the LRTstd. This analysis should provide a guideline for determining the minimum number of molecular haplotypes required for desired power. Our simulations under the null hypothesis of equal haplotype frequencies in cases and controls indicate that (1) for each statistic, permutation methods maintain the correct type I error; and (2) specific multilocus genotypes that are misclassified as the incorrect haplotype pair are consistently misclassified throughout each entire dataset. Our simulations under the alternative hypothesis showed a significant power gain for the LRTae over the LRTstd for a subset of the parameter settings. Permutation methods should be used exclusively to determine significance for each statistic. For fixed cost, the power gain of the LRTae over the LRTstd varied depending on the relative costs of genotyping, molecular haplotyping, and phenotyping. The LRTae showed the greatest benefit over the LRTstd when the cost of phenotyping was very high relative to the cost of genotyping. This situation is likely to occur in a replication study as opposed to a whole-genome association study.
Localizing genes for complex genetic diseases presents a major challenge. Recent technological advances such as genotyping arrays containing hundreds of thousands of genomic “landmarks,” and databases cataloging these “landmarks” and the levels of correlation between them, have aided in these endeavors. To utilize these resources most effectively, many researchers employ a gene-mapping technique called haplotype-based association in order to examine the variation present at multiple genomic sites jointly for a role in and/or an association with the disease state. Although methods that determine haplotype pairs directly by biological assays are currently available, they are rarely used because of their expense and poor suitability for automation. Statistical methods provide an inexpensive, relatively accurate means to determine haplotype pairs. However, these statistical methods can provide erroneous results. In this article, the authors compare a standard statistical method for performing a haplotype-based association test with a method that accounts for the misclassification of haplotype pairs as part of the test. Under a number of feasible scenarios, the performance of the new test exceeded that of the standard test.
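The abstract for this entry recommends permutation methods for determining significance. A generic label-permutation p-value can be sketched as below. It is not the LRTstd or LRTae statistic; the stand-in statistic (absolute difference in mean carrier status between cases and controls), the function names, and the data are all hypothetical.

```python
import random


def freq_difference(values, labels):
    """Stand-in statistic: absolute difference in mean value, cases vs controls."""
    cases = [v for v, lab in zip(values, labels) if lab == 1]
    controls = [v for v, lab in zip(values, labels) if lab == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def permutation_pvalue(values, labels, statistic, n_perm=1000, seed=0):
    """Estimate a p-value by shuffling case/control labels n_perm times."""
    rng = random.Random(seed)
    observed = statistic(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if statistic(values, shuffled) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    carrier = [1] * 20 + [0] * 20  # hypothetical haplotype carrier status
    status = [1] * 20 + [0] * 20   # 1 = case, 0 = control
    print(permutation_pvalue(carrier, status, freq_difference))
```

Because the reference distribution is built from the data themselves, the same procedure applies to any statistic whose asymptotic null distribution is in doubt.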
PMCID: PMC1550282  PMID: 16933998
23.  LRTae: improving statistical power for genetic association with case/control data when phenotype and/or genotype misclassification errors are present 
BMC Genetics  2006;7:24.
In the field of statistical genetics, phenotype and genotype misclassification errors can substantially reduce power to detect association with genetic case/control studies. Misclassification also can bias population frequency parameters such as genotype, haplotype, or multi-locus genotype frequencies. These problems are of particular concern in case/control designs because, short of repeated sampling, there is no way to detect misclassification errors.
We developed a double-sampling procedure for case/control genetic association using a likelihood ratio test framework. Different approaches have been proposed to deal with misclassification errors. We have chosen the likelihood framework because of the ease with which misclassification probabilities may be incorporated into the statistical framework and hypothesis testing. The statistic is called the Likelihood Ratio Test allowing for errors (LRTae) and is freely available via software download.
We applied our procedure to 10,000 replicates of simulated case/control data in which we introduced phenotype misclassification errors. The phenotype considered is Ankylosing Spondylitis (AS). The power of the LRTae method was always greater than that of the LRTstd method for the significance levels considered (5%, 1%, 0.1%, 0.01%). Power gains for the LRTae method over the LRTstd method increased as the significance level became more stringent. Multi-locus genotype frequency estimates using the LRTae method were more accurate than estimates using the LRTstd method.
The LRTae method can be applied to single-locus genotypes, multi-locus genotypes, or multi-locus haplotypes in a case/control framework and can be more powerful to detect association in case/control studies when genotype and/or phenotype errors are present. Furthermore, the LRTae method provides asymptotically unbiased estimates of case and control genotype frequencies, as well as estimates of phenotype and/or genotype misclassification rates.
PMCID: PMC1471798  PMID: 16689984
24.  A detailed Hapmap of the Sitosterolemia locus spanning 69 kb; differences between Caucasians and African-Americans 
BMC Medical Genetics  2006;7:13.
Sitosterolemia is an autosomal recessive disorder that maps to the sitosterolemia locus, STSL, on human chromosome 2p21. Two genes, ABCG5 and ABCG8, comprise the STSL and mutations in either cause sitosterolemia. ABCG5 and ABCG8 are thought to have evolved by a gene duplication event and are arranged in a head-to-head configuration. We report here a detailed characterization of the STSL in Caucasian and African-American cohorts.
Caucasian and African-American DNA samples were genotyped for polymorphisms at the STSL locus, and haplotype structures were determined for this locus.
In the Caucasian population, 13 variant single nucleotide polymorphisms (SNPs) were identified, resulting in 24 different haplotypes, compared to 11 SNPs in African-Americans, resulting in 40 haplotypes. Three polymorphisms in ABCG8 were unique to the Caucasian population (E238L, INT10-50 and G575R), whereas one variant (A259V) was unique to the African-American population. Allele frequencies of SNPs also varied between these populations.
We confirmed that despite their close proximity to each other, significantly more variations are present in ABCG8 compared to ABCG5. Pairwise D' values showed wide ranges of variation, indicating some of the SNPs were in strong linkage disequilibrium (LD) and some were not. LD was more prevalent in Caucasians than in African-Americans, as would be expected. These data will be useful in analyzing the proposed role of STSL in processes ranging from responsiveness to cholesterol-lowering drugs to selective sterol absorption.
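The pairwise D' values mentioned in this abstract follow the standard normalized linkage disequilibrium measure, which can be sketched for two biallelic SNPs as below. The function name and the example frequencies are hypothetical, and haplotype frequencies are assumed to be known rather than estimated from unphased genotypes.

```python
def d_prime(p_ab, p_a, p_b):
    """Absolute normalized LD coefficient |D'| for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B.
    p_a, p_b: frequencies of allele A and allele B.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        # Largest D attainable given the allele frequencies.
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        # Largest magnitude of negative D attainable.
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # a monomorphic locus carries no LD information
    return abs(d) / d_max


if __name__ == "__main__":
    print(d_prime(0.5, 0.5, 0.5))   # complete LD
    print(d_prime(0.25, 0.5, 0.5))  # linkage equilibrium
```

Values near 1 indicate that at least one of the four possible haplotypes is rare or absent, while values near 0 indicate near-independence of the two loci.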
PMCID: PMC1413519  PMID: 16507104
25.  Precision and type I error rate in the presence of genotype errors and missing parental data: a comparison between the original transmission disequilibrium test (TDT) and TDTae statistics 
BMC Genetics  2005;6(Suppl 1):S150.
Two factors impacting robustness of the original transmission disequilibrium test (TDT) are: i) missing parental genotypes and ii) undetected genotype errors. While it is known that independently these factors can inflate false-positive rates for the original TDT, no study has considered either the joint impact of these factors on false-positive rates or the precision score of TDT statistics regarding these factors. By precision score, we mean the absolute difference between disease gene position and the position of markers whose TDT statistic exceeds some threshold.
We apply our transmission disequilibrium test allowing for errors (TDTae) and the original TDT to phenotype and modified single-nucleotide polymorphism genotype simulation data from Genetic Analysis Workshop. We modify genotype data by randomly introducing genotype errors and removing a percentage of parental genotype data. We compute empirical distributions of each statistic's precision score for a chromosome harboring a simulated disease locus. We also consider inflation in type I error by studying markers on a chromosome harboring no disease locus.
The TDTae shows median precision scores of approximately 13 cM, 2 cM, 0 cM, and 0 cM at the 5%, 1%, 0.1%, and 0.01% significance levels, respectively. By contrast, the original TDT shows median precision scores of approximately 23 cM, 21 cM, 15 cM, and 7 cM at the corresponding significance levels. For null chromosomes, the original TDT falsely rejects the null hypothesis for 28.8%, 14.8%, 5.4%, and 1.7% at the 5%, 1%, 0.1%, and 0.01% significance levels, respectively, while TDTae maintains the correct false-positive rate.
Because missing parental genotypes and undetected genotype errors are unknown to the investigator, but are expected to be increasingly prevalent in multilocus datasets, we strongly recommend TDTae methods as a standard procedure, particularly where stricter significance levels are required.
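The precision score defined in this abstract (the absolute distance between the disease gene position and each marker whose statistic exceeds a threshold) can be sketched as follows. The function name, marker map, statistic values, and threshold are hypothetical, and positions are assumed to be in cM.

```python
from statistics import median


def precision_scores(marker_positions_cm, tdt_values, disease_position_cm, threshold):
    """Distances (cM) from the disease locus to markers exceeding the threshold."""
    if len(marker_positions_cm) != len(tdt_values):
        raise ValueError("one statistic is required per marker")
    return [
        abs(position - disease_position_cm)
        for position, value in zip(marker_positions_cm, tdt_values)
        if value > threshold
    ]


if __name__ == "__main__":
    positions = [10.0, 20.0, 30.0, 40.0]  # hypothetical marker map
    values = [1.0, 5.0, 12.0, 4.5]        # hypothetical statistic values
    scores = precision_scores(positions, values, disease_position_cm=30.0, threshold=3.84)
    print(scores, median(scores))
```

A median near 0 cM means the significant markers cluster at the disease locus; larger medians indicate significant signals far from it.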
PMCID: PMC1866784  PMID: 16451611
