Objectives
There is increasing evidence that rare variants play a role in some complex traits, but their analysis is not straightforward. Locus-based tests become necessary due to low power in rare variant single-point association analyses. In addition, variant quality scores are available for sequencing data, but are rarely taken into account. Here, we propose two locus-based methods that incorporate variant quality scores: a regression-based collapsing approach and an allele-matching method.
Methods
Using simulated sequencing data, we compare 4 locus-based tests of trait association under different scenarios of data quality. We test two collapsing-based approaches and two allele-matching-based approaches; within each pair, one test takes variant quality scores into account and the other ignores them. We implement the collapsing and allele-matching approaches accounting for variant quality in the freely available ARIEL and AMELIA software.
Results
The incorporation of variant quality scores in locus-based association tests has power advantages over weighting each variant equally. The allele-matching methods are robust to the presence of both protective and risk variants in a locus, while collapsing methods exhibit a dramatic loss of power in this scenario.
Conclusions
The incorporation of variant quality scores should be a standard protocol when performing locus-based association analysis on sequencing data. The ARIEL and AMELIA software implement collapsing and allele-matching locus association analysis methods, respectively, that allow the incorporation of variant quality scores.
doi:10.1159/000336982
PMCID: PMC3477640
PMID: 22441326
Whole-genome sequencing; Exome sequencing; Association analysis; Accounting for uncertainty; Complex trait
Aims
Next-generation sequencing has opened the possibility of large-scale sequence-based disease association studies. A major challenge in interpreting whole-exome data is predicting which of the discovered variants are deleterious or neutral. To address this question in silico, we have developed a score called Combined Annotation scoRing toOL (CAROL), which combines information from 2 bioinformatics tools: PolyPhen-2 and SIFT, in order to improve the prediction of the effect of non-synonymous coding variants.
Methods
We used a weighted Z method that combines the probabilistic scores of PolyPhen-2 and SIFT. We defined 2 dataset pairs to train and test CAROL using information from the dbSNP: ‘HGMD-PUBLIC’ and 1000 Genomes Project databases. The training pair comprises a total of 980 positive control (disease-causing) and 4,845 negative control (non-disease-causing) variants. The test pair consists of 1,959 positive and 9,691 negative controls.
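As a rough illustration of the weighted Z idea named above, the following sketch combines one-sided p-value-like scores with the generic weighted Z (Stouffer) formula. It is not the CAROL implementation: the function name `weighted_z`, the treatment of each tool's score as a one-sided p-value, and the choice of weights are all assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def weighted_z(p_values, weights):
    """Combine one-sided p-values with the generic weighted Z (Stouffer) method.

    Hypothetical helper for illustration only. Each p-value must lie
    strictly between 0 and 1, otherwise inv_cdf raises StatisticsError.
    """
    if len(p_values) != len(weights) or not p_values:
        raise ValueError("p_values and weights must be non-empty and equal length")
    # Convert each p-value to a z-score (small p gives large positive z).
    z_scores = [_STD_NORMAL.inv_cdf(1.0 - p) for p in p_values]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = sqrt(sum(w * w for w in weights))
    if denominator == 0.0:
        raise ValueError("at least one weight must be non-zero")
    z_combined = numerator / denominator
    # One-sided combined p-value.
    return z_combined, 1.0 - _STD_NORMAL.cdf(z_combined)


if __name__ == "__main__":
    # Two hypothetical scores, equally weighted.
    z, p = weighted_z([0.05, 0.05], [1.0, 1.0])
    print(f"combined z = {z:.3f}, combined p = {p:.4f}")
```

With equal weights the formula reduces to the unweighted Stouffer method; unequal weights let a more reliable score contribute more to the combined statistic.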
Results
CAROL has higher predictive power and accuracy for the effect of non-synonymous variants than each individual annotation tool (PolyPhen-2 and SIFT) and benefits from higher coverage.
Conclusion
The combination of annotation tools can help improve automated prediction of whole-genome/exome non-synonymous variant functional consequences.
doi:10.1159/000334984
PMCID: PMC3390741
PMID: 22261837
CAROL; PolyPhen-2; SIFT; Weighted Z method
Objectives
Family-based association tests such as the transmission disequilibrium test (TDT) are dependent on the successful ascertainment of true nuclear family trios. Relationship misspecification inevitably occurs in a proportion of trios collected for genotyping and, if undetected, can lead to a loss of power and increased Type I error due to biases in over-transmission of common alleles. Here, we introduce a method for evaluating the authenticity of nuclear family trios.
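For readers unfamiliar with the TDT mentioned above, a minimal sketch of the classical McNemar-type statistic is given here. It only illustrates the test whose validity depends on correct trios; it is not the Bayesian trio-checking method of this abstract. The function name `tdt` and the example counts are hypothetical.

```python
from math import erfc, sqrt


def tdt(transmitted, untransmitted):
    """Classical TDT for one diallelic marker.

    transmitted:   number of times heterozygous parents transmitted allele A
    untransmitted: number of times heterozygous parents did not transmit A
    Returns the chi-square statistic (1 df) and its p-value.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous-parent) transmissions")
    chi2 = (transmitted - untransmitted) ** 2 / total
    # For 1 df, P(chi2 > x) equals erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    chi2, p = tdt(60, 40)  # hypothetical transmission counts
    print(f"TDT chi-square = {chi2:.2f}, p = {p:.4f}")
```

Because the statistic depends only on transmissions from heterozygous parents, a misspecified parent in a trio changes which transmissions are counted, which is how undetected misspecification can distort the test.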
Methods
Operating in a Bayesian framework, our approach assesses the extent of pedigree-inconsistent genotype configurations in the presence of genotyping errors. Unlike other approaches, our method: (i) utilizes information from three individuals collectively (the whole trio) rather than considering two independent pairwise relationships; (ii) down-weights SNPs with poor performance; (iii) does not require the user to pre-define a rate of genotyping error, which is often unknown to the user and seldom fixed across the different SNPs considered, as available methods unrealistically assume.
Results
Simulation studies and comparisons with a real set of data showed that our approach is more likely to correctly identify the presence of true and misspecified trios compared to available software, accurately infers the extent of relationship misspecification in a trio and accurately estimates the genotyping error rates.
Conclusions
Assessing relationship misspecification depends on the fidelity of the genotype data used. Available algorithms are not optimised for genotyping technology with varying rates of errors across the markers. Through our comparison studies, our approach is shown to outperform available methods for assessing relationship misspecifications.
doi:10.1159/000164396
PMCID: PMC3000594
PMID: 18931507
relationship misspecification; pedigree inconsistency; genotyping error
Specific language impairment (SLI) is a neurodevelopmental disorder characterized by impairments essentially restricted to the domain of language and language learning skills. This contrasts with autism, which is a pervasive developmental disorder defined by multiple impairments in language, social reciprocity, narrow interests and/or repetitive behaviors. Genetic linkage studies and family data suggest that the two disorders may have genetic components in common. Two samples, from Canada and the US, selected for specific language impairment were genotyped at loci where such common genes are likely to reside. Significant evidence for linkage was previously observed at chromosome 13q21 in our Canadian sample (HLOD 3.56) and was confirmed in our US sample (HLOD 2.61). Using the posterior probability of linkage (PPL) to combine evidence for linkage across the two samples yielded a PPL over 92%. Two additional loci on chromosomes 2 and 7 showed weak evidence for linkage. However, a marker in the cystic fibrosis transmembrane conductance regulator (7q31) showed evidence for association to SLI, confirming results from another group (O’Brien et al. 2003). Our results indicate that using samples selected for components of the autism phenotype may be a useful adjunct to autism genetics.
doi:10.1159/000077385
PMCID: PMC2976973
PMID: 15133308
Autism; Language impairment; Multiple data sets; Heterogeneity; Linkage analysis
Shtir, Corina | Nagakawa, I. Sharon | Duren, William L. | Conneely, Karen N. | Scott, Laura J. | Silander, Kaisa | Valle, Timo T. | Tuomilehto, Jaakko | Buchanan, Thomas A. | Bergman, Richard N. | Collins, Francis S. | Boehnke, Michael | Watanabe, Richard M.
Objectives
The purpose of this study was to examine carefully heterogeneity underlying evidence for linkage to type 2 diabetes (T2DM) on chromosome 6q from two sets of FUSION families.
Methods
Ordered subsets analysis (OSA) was performed on two sets of FUSION families. For OSA results showing significant improvement in evidence for linkage, T2DM-related phenotypes were compared between individuals with T2DM within the subset versus the complement.
Results
OSA revealed 105 families with the highest average HDL to total cholesterol ratio (HDL ratio) that had strongly increased evidence for linkage (MLS = 7.91 at 78.0 cM; uncorrected p = 0.00002). Subjects with T2DM within this subset were significantly leaner, had lower fasting glucose, insulin, and C-peptide, and had a more favorable cardiovascular risk profile compared to the complement set of subjects with T2DM. OSA also revealed 33 families with the lowest average fasting insulin that had increased evidence for linkage at a second locus (MLS = 3.45 at 128 cM; uncorrected p = 0.017) coincident with quantitative trait locus linkage analysis results for fasting and 2-hour insulin in subjects without T2DM.
Conclusions
These results suggest two diabetes susceptibility loci on chromosome 6q that may affect subsets of individuals with a milder form of T2DM.
doi:10.1159/000097927
PMCID: PMC2923439
PMID: 17179727
Linkage analysis; Heterogeneity; Type 2 diabetes; HDL cholesterol; Ordered subsets analysis; Chromosome 6q
The genetic etiology for many forms of hearing impairment (HI) is very diverse. Non-syndromic HI (NSHI) is one of the most heterogeneous traits known. Autosomal recessive forms of prelingual HI account for ∼75% of hereditary cases. A novel autosomal recessive NSHI locus, DFNB44, was mapped to a 20.9 cM genetic interval on chromosome 7p14.1-q11.22, according to the Marshfield genetic map, in a consanguineous Pakistani family. Multipoint linkage analysis resulted in a maximum LOD score of 5.0 at marker D7S1818. The 3-unit support interval ranged from marker D7S2209 to marker D7S2435, spanning a 30.1 Mb region on the sequence-based physical map.
doi:10.1159/000081446
PMCID: PMC2920138
PMID: 15583425
7p14.1-q11.22; DFNB44; Non-syndromic hearing impairment; Pakistan
With the widespread availability of SNP genotype data, there is great interest in analyzing pedigree haplotype data. Intermarker linkage disequilibrium for micro-satellite markers is usually low due to their physical distance; however, for dense maps of SNP markers, there can be strong linkage disequilibrium between marker loci. Linkage analysis (parametric and nonparametric) and family-based association studies are currently being carried out using dense maps of SNP marker loci. Monte Carlo methods are often used for both linkage and association studies; however, to date there are no programs available which can generate haplotype and/or genotype data consisting of a large number of loci for pedigree structures. SimPed is a program that quickly generates haplotype and/or genotype data for pedigrees of virtually any size and complexity. Marker data either in linkage disequilibrium or equilibrium can be generated for greater than 20,000 diallelic or multiallelic marker loci. Haplotypes and/or genotypes are generated for pedigree structures using specified genetic map distances and haplotype and/or allele frequencies. The simulated data generated by SimPed is useful for a variety of purposes, including evaluating methods that estimate haplotype frequencies for pedigree data, evaluating type I error due to intermarker linkage disequilibrium and estimating empirical p values for linkage and family-based association studies.
doi:10.1159/000088914
PMCID: PMC2909095
PMID: 16224189
Simulation; Pedigree structure; Type I error; Empirical p values
For autosomal recessive nonsyndromic hearing impairment, over 30 loci have been mapped and 19 genes have been identified. DFNB38, a novel locus for autosomal recessive nonsyndromic hearing impairment, was localized in a consanguineous Pakistani kindred to 6q26–q27. The affected family members present with profound prelingual sensorineural hearing impairment and use sign language for communication. Linkage was established to microsatellite markers located on chromosome 6q26–q27 (multipoint LOD score 3.6). The genetic region for DFNB38 spans 10.1 cM according to the Marshfield genetic map and is bounded by markers D6S980 and D6S1719. This genetic region corresponds to 3.4 Mb on the sequence-based physical map.
doi:10.1159/000071813
PMCID: PMC2909108
PMID: 12890929
Autosomal recessive hearing impairment; DFNB38; Gene mapping; Pakistan; 6q26–q27
Objective(s)
An individual’s risk of developing cardiovascular disease (CVD) is influenced by genetic factors. This study focussed on mapping genetic loci for CVD-risk traits in a unique population isolate derived from Norfolk Island.
Methods
This investigation focussed on 377 individuals descended from the population founders. Principal components analysis was used to extract orthogonal components from 11 cardiovascular risk traits. Multipoint variance component methods, implemented in SOLAR, were used to assess genome-wide linkage to the derived factors. A total of 285 of the 377 related individuals were informative for linkage analysis.
Results
A total of 4 principal components accounting for 83% of the total variance were derived. Principal component 1 was loaded with body size indicators; principal component 2 with body size, cholesterol and triglyceride levels; principal component 3 with the blood pressures; and principal component 4 with LDL-cholesterol and total cholesterol levels. Suggestive evidence of linkage for principal component 2 (h2=0.35) was observed on chromosome 5q35 (LOD=1.85; p=0.0008). Peak regions on chromosome 10p11.2 (LOD=1.27; p=0.005) and 12q13 (LOD=1.63; p=0.0054) were also observed to segregate with principal components 1 (h2=0.33) and 4 (h2=0.42), respectively.
Conclusion(s)
This study investigated a number of CVD risk traits in a unique isolated population. Findings support the clustering of CVD risk traits and provide interesting evidence of a region on chromosome 5q35 segregating with weight, waist circumference, HDL-c and total triglyceride levels.
doi:10.1159/000210449
PMCID: PMC2880725
PMID: 19339786
Norfolk Island; population isolate; principal component; linkage analysis; 5q35; CVD
Marazita, Mary L. | Lidral, Andrew C. | Murray, Jeffrey C. | Field, L.Leigh | Maher, Brion S. | Goldstein McHenry, Toby | Cooper, Margaret E. | Govil, Manika | Daack-Hirsch, Sandra | Riley, Bridget | Jugessur, Astanand | Felix, Temis | Morene, Lina | Mansilla, M.Adela | Vieira, Alexandre R. | Doheny, Kim | Pugh, Elizabeth | Valencia-Ramirez, Consuelo | Arcos-Burgos, Mauricio
Objectives
Non-syndromic orofacial clefts, i.e. cleft lip (CL) and cleft palate (CP), are among the most common birth defects. The goal of this study was to identify genomic regions and genes for CL with or without CP (CL/P).
Methods
We performed linkage analyses of a 10 cM genome scan in 820 multiplex CL/P families (6,565 individuals). Significant linkage results were followed by association analyses of 1,476 SNPs in candidate genes and regions, utilizing a weighted false discovery rate (wFDR) approach to control for multiple testing and incorporate the genome scan results.
Results
Significant (multipoint HLOD ≥3.2) or genome-wide-significant (HLOD ≥4.02) linkage results were found for regions 1q32, 2p13, 3q27-28, 9q21, 12p11, 14q21-24 and 16q24. SNPs in IRF6 (1q32) and in or near FOXE1 (9q21) reached formal genome-wide wFDR-adjusted significance. Further, results were phenotype dependent in that the IRF6 region results were most significant for families in which affected individuals have CL alone, and the FOXE1 region results were most significant in families in which some or all of the affected individuals have CL with CP.
Conclusions
These results highlight the importance of careful phenotypic delineation in large samples of families for genetic analyses of complex, heterogeneous traits such as CL/P.
doi:10.1159/000224636
PMCID: PMC2709160
PMID: 19521098
Cleft lip; Cleft palate; Linkage; Association; wFDR; IRF6; FOXE1; Genetics
Background/Aims
With pedigree data, genetic linkage can be detected using inheritance vector tests, which explore the discrepancy between the posterior distribution of the inheritance vectors given observed trait values and the prior distribution of the inheritance vectors. In this paper, we propose conditional inheritance vector tests for linkage localization. These conditional tests can also be used to detect additional linkage signals in the presence of previously detected causal genes.
Methods
For linkage localization, we propose to perform inheritance vector tests conditioning on the inheritance vectors at two positions bounding a test region. We can detect additional linkage signals by conducting a further conditional test in a region with no previously detected genes. We use randomized p values to extend the marginal and conditional tests when the inheritance vectors cannot be completely determined from genetic marker data.
Results
We conduct simulation studies to compare and contrast the marginal and the conditional tests and to demonstrate that randomized p values can capture both the significance and the uncertainty in the test results.
Conclusions
The simulation results demonstrate that the proposed conditional tests provide useful localization information, and with informative marker data, the uncertainty in randomized marginal and conditional test results is small.
doi:10.1159/000218112
PMCID: PMC2711517
PMID: 19439976
Conditional test; Inheritance vector; Linkage; Localization; Pedigree; Randomized p value
Objectives
Structured association tests (SAT), like any statistical model, assume that all variables are measured without error. Measurement error can bias parameter estimates and confound residual variance in linear models. It has been shown that admixture estimates can be contaminated with measurement error, causing SAT models to suffer from the same afflictions. Multiple imputation (MI) is presented as a viable tool for correcting measurement error problems in SAT linear models, with emphasis on correcting admixture estimates contaminated by measurement error.
Methods
Several MI methods are presented and compared, via simulation, in terms of controlling Type I error rates for both non-additive and additive genotype coding.
Results
Results indicate that MI using the Rubin or Cole method can be used to correct for measurement error in admixture estimates in SAT linear models.
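To make the pooling step of multiple imputation concrete, the sketch below applies the standard Rubin's rules to estimates obtained from several imputed data sets. Whether this matches the "Rubin method" compared in the study is an assumption, and the function name `pool_rubin` and the example numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation results with the standard Rubin's rules.

    estimates: point estimate from each of the m imputed data sets
    variances: squared standard error of each of those estimates
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = mean(estimates)
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 denominator)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total


if __name__ == "__main__":
    # Hypothetical regression coefficients from three imputed data sets.
    est, var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
    print(f"pooled estimate = {est:.3f}, total variance = {var:.3f}")
```

The between-imputation term is what carries the extra uncertainty due to the measurement error being corrected; it shrinks as the imputations agree more closely.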
Conclusion
Although MI can be used to correct for admixture measurement error in SAT linear models, the data should be of reasonable quality, in terms of marker informativeness, because the method borrows information from the existing data to make the measurement error corrections. If the data are of poor quality, there is little information to borrow.
doi:10.1159/000210450
PMCID: PMC2716289
PMID: 19339787
Multiple imputation; Measurement error; Admixture; Ancestry; Structured association testing
Chung, Wendy K. | Patki, Amit | Matsuoka, Naoki | Boyer, Bert B. | Liu, Nianjun | Musani, Solomon K. | Goropashnaya, Anna V. | Tan, Perciliz L. | Katsanis, Nicholas | Johnson, Stephen B. | Gregersen, Peter K. | Allison, David B. | Leibel, Rudolph L. | Tiwari, Hemant K.
Objective
Human adiposity is highly heritable, but few of the genes that predispose to obesity in most humans are known. We tested candidate genes in pathways related to food intake and energy expenditure for association with measures of adiposity.
Methods
We studied 355 genetic variants in 30 candidate genes in 7 molecular pathways related to obesity in two groups of adult subjects: 1,982 unrelated European Americans living in the New York metropolitan area drawn from the extremes of their body mass index (BMI) distribution and 593 related Yup'ik Eskimos living in rural Alaska characterized for BMI, body composition, waist circumference, and skin fold thicknesses. Data were analyzed by using a mixed model in conjunction with a false discovery rate (FDR) procedure to correct for multiple testing.
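The abstract does not say which FDR procedure was used. As one common possibility, the sketch below computes Benjamini-Hochberg adjusted p-values; treating the study's procedure as Benjamini-Hochberg is an assumption, and the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.5]  # hypothetical per-variant p-values
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"p = {p:.3f} -> adjusted = {q:.4f}")
```

A variant is declared significant at FDR level q when its adjusted p-value is at most q.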
Results
After correcting for multiple testing, two single nucleotide polymorphisms (SNPs) in Ghrelin (GHRL) (rs35682 and rs35683) were associated with BMI in the New York European Americans. This association was not replicated in the Yup'ik participants. There was no evidence for gene × gene interactions among genes within the same molecular pathway after adjusting for multiple testing via FDR control procedure.
Conclusion
Genetic variation in GHRL may have a modest impact on BMI in European Americans.
doi:10.1159/000181158
PMCID: PMC2715950
PMID: 19077438
Obesity; Body composition; Body mass index; Candidate gene; Ghrelin
Objective
Identifying genotyping errors is an important issue in genetic research, yet it has been relatively less studied in samples consisting of unrelated individuals. In this article, we consider several models of genotyping errors, which were originally proposed for pedigree data, for unrelated population samples with single nucleotide polymorphism (SNP) genotype data. The mathematical constraints are investigated for detecting genotyping errors without resampling replicates or genotyping relatives.
Methods
For the various proposed genotyping error models, we unveil the conditions under which the parameters are identifiable. These results are verified through applications to simulated and real SNP data.
Results
We show that, with constraints, two particular models provide both identifiable error rate and allele frequencies of an SNP for unrelated population data. The simulation study shows that these two models present unbiased estimates for the allele frequencies. One of the models also gives an unbiased estimate for the genotyping error rate.
Conclusion
While the Hardy-Weinberg equilibrium test can be used to detect genotyping errors, a key advantage of these models is the explicit estimates of genotyping error rates and allele frequencies. This work may help researchers to estimate error rates and to use the estimates in their analysis to increase power and decrease bias, without the extra work of genotyping family members or replicates.
doi:10.1159/000181153
PMCID: PMC2782542
PMID: 19077433
Genotyping error; Single nucleotide polymorphisms (SNPs); Identifiability
When two or more populations have been separated by geographic or cultural boundaries for many generations, drift, spontaneous mutations, differential selection pressures and other factors may lead to allele frequency differences among populations. If these ‘parental’ populations subsequently come together and begin inter-mating, disequilibrium among linked markers may span a greater genetic distance than it typically does among populations under panmixia [see glossary]. This extended disequilibrium can make association studies highly effective and more economical than disequilibrium mapping in panmictic populations since fewer marker loci are needed to detect regions of the genome that harbor phenotype-influencing loci. However, under some circumstances, this process of intermating (as well as other processes) can produce disequilibrium between pairs of unlinked loci and thus create the possibility of confounding or spurious associations due to this population stratification. Accordingly, researchers are advised to employ valid statistical tests for linkage disequilibrium mapping so that genetic association studies can be conducted with control for such confounding. Many recent papers have addressed this need. We provide a comprehensive review of advances made in recent years in correcting for population stratification and then evaluate and synthesize these methods based on statistical principles such as (1) randomization, (2) conditioning on sufficient statistics, and (3) identifying whether the method is based on testing the genotype-phenotype covariance (conditional upon familial information) and/or testing departures of the marginal distribution from the expected genotypic frequencies.
doi:10.1159/000119107
PMCID: PMC2803696
PMID: 18382087
Admixture; Ancestry; Association; Covariance-based tests; Genomic control; Linkage; Marginal-based tests; QTL; RAM; Randomization; SAT; Structure; Sufficient statistics; TDT
Combining data collected from different sources is a cost-effective and time-efficient approach for enhancing the statistical efficiency in estimating weak-to-modest genetic effects or gene-gene or gene-environment interactions. However, combining data across studies becomes complicated when data are collected under different study designs, such as family-based and unrelated individual-based (e.g., population-based case-control design). In this paper, we describe a general method that permits the joint estimation of effects on disease risk of genes, environmental factors, and gene-gene/gene-environment interactions under a hybrid design that includes cases, parents of cases, and unrelated individuals. We provide both asymptotic theory and statistical inference. Extensive simulation experiments demonstrate that the proposed estimation and inferential methods perform well in realistic settings. We illustrate the method by an application to a study of testicular cancer.
doi:10.1159/000179557
PMCID: PMC2763779
PMID: 19077426
Genetic association studies; Family studies; Population-based case control; Testicular cancer
Background
Genotyping error can increase both type I and II errors. In order to elucidate potential genotyping errors, data quality control often includes testing genotype data for deviations from Hardy-Weinberg Equilibrium (HWE).
Methods
The Hardy-Weinberg Disequilibrium (HWD) coefficient and the ability to reject the null hypothesis of HWE were calculated analytically for genotype data from parents and unaffected siblings of affected probands.
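For orientation, the sketch below estimates the HWD coefficient from observed genotype counts at one diallelic marker and computes the usual 1-df chi-square test of HWE. It is an empirical counterpart to the analytical calculations described above, not a reproduction of them. The definition D = P(AA) - p^2 is assumed here (a negative D then corresponds to an excess of heterozygotes), and the function name `hwd_test` and the example counts are hypothetical.

```python
from math import erfc, sqrt


def hwd_test(n_aa_hom, n_het, n_bb_hom):
    """Estimate the HWD coefficient and test HWE for a diallelic marker.

    n_aa_hom: count of AA homozygotes
    n_het:    count of AB heterozygotes
    n_bb_hom: count of BB homozygotes
    Returns (D, chi-square statistic with 1 df, p-value).
    """
    n = n_aa_hom + n_het + n_bb_hom
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa_hom + n_het) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("marker is monomorphic; HWE test is undefined")
    d = n_aa_hom / n - p * p              # assumed definition: P(AA) - p^2
    chi2 = n * d * d / (p * p * q * q)
    p_value = erfc(sqrt(chi2 / 2.0))      # upper tail of chi-square, 1 df
    return d, chi2, p_value


if __name__ == "__main__":
    d, chi2, pval = hwd_test(10, 80, 10)  # hypothetical counts
    print(f"D = {d:.3f}, chi-square = {chi2:.1f}, p = {pval:.2e}")
```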
Results
Genotype data from parents and unaffected siblings display deviations from HWE when functional loci or markers in LD with a functional locus are tested. For the parental genotype data, all deviations from HWE are negative, indicating an excess of heterozygous genotypes, with the strongest deviations from HWE observed for the multiplicative model. In contrast, for affected proband genotype data, there is no deviation from HWE under the multiplicative model and the deviations from HWE for the recessive model are positive. For the unaffected sibling data, patterns of deviation from HWE are similar to those observed in the proband data, with the exception of the multiplicative model, where the HWD coefficient, although close to 0, can be either positive or negative depending on the allele frequency.
Conclusion
Deviations from HWE in parental and unaffected sibling genotype data could be due to an association with the functional locus. However, these deviations are not large for genotypic relative risk ≤2.0, and therefore the power to detect them is usually low. Testing for deviations from HWE in parental and unaffected sibling genotype data is still beneficial for quality control, even though functional loci can produce an association signal in these data.
doi:10.1159/000179558
PMCID: PMC2798818
PMID: 19077427
Association studies; Data quality control; Family and population based data; Hardy-Weinberg Equilibrium
Missing genotype data can increase false-positive evidence for linkage when either parametric or nonparametric analysis is carried out ignoring intermarker linkage disequilibrium (LD). Huang et al. [1] previously demonstrated that no bias occurs in this situation for affected sib-pairs with unrelated parents when either both parents are genotyped or, when parental genotypes are missing, genotype data are available for two additional unaffected siblings. However, this is not the case for autosomal recessive consanguineous pedigrees, where missing genotype data for any pedigree member within a consanguinity loop can increase false-positive evidence for linkage. False-positive evidence for linkage is further increased when cryptic consanguinity is present. The amount of false-positive evidence for linkage, and which family members aid in its reduction, is highly dependent on which family members are genotyped. When parental genotype data are available, the false-positive evidence for linkage is usually not as strong as when parental genotype data are unavailable. For a pedigree with an affected proband whose first-cousin parents have been genotyped, further reduction in the false-positive evidence for linkage can be obtained by including genotype data from additional affected siblings of the proband or from the proband's sibling-grandparents. When parental genotypes are unavailable, false-positive evidence for linkage can be reduced by including genotype data from either unaffected siblings of the proband or the proband's married-in grandparents in the analysis.
doi:10.1159/000112367
PMCID: PMC2798807
PMID: 18073490
Consanguinity; False positives; Linkage analysis; Linkage disequilibrium (LD)
Objective
Heritable maternal and fetal thrombophilia and/or hypofibrinolysis are important causes of miscarriage. Under the constraint that fetal genotype is observed only after a live birth, estimating risk is complicated. Censoring prevents use of published statistical methodology. We propose techniques to determine whether increases in miscarriage are due to the fetal genotype, maternal genotype, or both.
Methods
We propose a study to estimate the risk of miscarriage contributed by an allele, expressed in either dominant or recessive fashion. Using a multinomial likelihood, we derive maximum likelihood estimates of risk for different genotype groups. We describe likelihood ratio tests and a planned hypothesis testing strategy.
Results
Parameter estimation is accurate (bias <0.0011, root mean squared error <0.0780, n = 500). We used simulation to estimate power for studies of three gene mutations: the 4G hypofibrinolytic mutation in the plasminogen activator inhibitor gene (PAI-1), the prothrombin G20210A mutation, and the Factor V Leiden mutation. With 500 families, our methods have approximately 90% power to detect an increase in the miscarriage rate of 0.2, above a background rate of 0.2.
Conclusion
Our statistical method can determine whether increases in miscarriage are due to fetal genotype, maternal genotype, or both despite censoring.
doi:10.1159/000164399
PMCID: PMC2755496
PMID: 18931510
PAI-1; Pregnancy loss; Thrombophilia; Hypofibrinolysis; Genetics
Combining data collected from different sources is a cost-effective and time-efficient approach for enhancing the statistical efficiency in estimating weak-to-modest genetic effects or gene-gene or gene-environment interactions. However, combining data across studies becomes complicated when data are collected under different study designs, such as family-based and unrelated individual-based (e.g., population-based case-control design). In this paper, we describe a general method that permits the joint estimation of effects on disease risk of genes, environmental factors, and gene-gene/gene-environment interactions under a hybrid design that includes cases, parents of cases, and unrelated individuals. We provide both asymptotic theory and statistical inference. Extensive simulation experiments demonstrate that the proposed estimation and inferential methods perform well in realistic settings. We illustrate the method by an application to a study of testicular cancer.
doi:10.1159/000179557
PMCID: PMC2763779
PMID: 19077426
Background
Studies of complex traits pose new challenges for current methods that evaluate association between genotypes and a specific trait. Considering possible interactions among loci leads to a dimensionality that current statistical methods cannot handle.
Methods
In this article, we evaluate a multi-marker screening algorithm, the backward genotype-trait association (BGTA) algorithm for case-control designs, which uses unphased multi-locus genotypes. BGTA carries out a global investigation on a candidate marker set and automatically screens out markers carrying little information regarding the trait in question. To address the ‘too many possible genotypes, too few informative chromosomes’ dilemma of a genomic-scale study that consists of hundreds to thousands of markers, we further investigate a BGTA-based marker selection procedure, in which the screening algorithm is repeated on a large number of random marker subsets. The results of these screenings are then aggregated into counts of how often each marker is retained by the BGTA algorithm. Markers with exceptionally high return counts are selected for further analysis.
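The aggregation step of the marker selection procedure can be sketched as follows. This is only an illustration of the random-subset bookkeeping: the screening step is passed in as a callable, and the toy_screen function below is a hypothetical stand-in that is not the BGTA algorithm. The function names, marker labels, number of subsets and subset size are all assumptions made for the example.

```python
import random
from collections import Counter


def aggregate_returns(markers, screen, n_subsets, subset_size, seed=0):
    """Count how often each marker is retained across random-subset screens.

    screen is any callable that takes a list of markers and returns the
    markers it retains; in the procedure described above this role is
    played by the BGTA screening algorithm.
    """
    rng = random.Random(seed)
    returns = Counter({m: 0 for m in markers})
    for _ in range(n_subsets):
        subset = rng.sample(markers, subset_size)
        for marker in screen(subset):
            returns[marker] += 1
    return returns


def toy_screen(subset, informative=frozenset({"m3", "m7"})):
    """Hypothetical stand-in for a screening step (NOT BGTA).

    It simply keeps markers that belong to a known 'informative' set.
    """
    return [m for m in subset if m in informative]


markers = [f"m{i}" for i in range(10)]
counts = aggregate_returns(markers, toy_screen, n_subsets=200, subset_size=4)

# Markers with the highest return counts would be carried forward.
top = [m for m, _ in counts.most_common(2)]
print(top, counts)
```

Keeping the screening step as a parameter separates the resampling and counting logic from the screening algorithm itself, so a real implementation of the screen could be substituted without changing the aggregation code.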
Results and Conclusion
In simulations under several disease models, the proposed methods prove to be more powerful in dealing with epistatic traits. We also demonstrate the proposed methods through an application to a study of inflammatory bowel disease.
doi:10.1159/000096995
PMCID: PMC2757084
PMID: 17114886
Multi-locus; Genotype; Association mapping; Case-control design; Complex traits; Epistasis