The mixed-model-based single-locus regression analysis (MMRA) method was used to analyse the common simulated dataset of the 15th QTL-MAS workshop to detect potentially significant associations between single nucleotide polymorphisms (SNPs) and the simulated trait. A Wald chi-squared statistic with df = 1 was used as the test statistic, and a permutation test was performed. To adjust for multiple testing, phenotypic observations were permuted 10,000 times against the genotype and pedigree data to obtain the threshold for declaring genome-wide significant SNPs. Linkage disequilibrium (LD) between significant SNPs was quantified in terms of D', and LD blocks were defined to indicate quantitative trait loci (QTL) regions.
The estimated heritability of the simulated trait is approximately 0.30. Eighty-two genome-wide significant SNPs (P < 0.05) were detected on chromosomes 1, 2 and 3. From the LD blocks of the significant SNPs, we confirmed five QTL regions on chromosome 1 and one on chromosome 3. No block was detected on chromosome 2, and no significant SNP was detected on chromosomes 4 and 5.
MMRA is suitable for detecting additive QTL and is fast enough to make permutation testing feasible. LD blocks can be used effectively to delineate QTL regions.
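The genome-wide permutation threshold described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the MMRA implementation: the hypothetical `snp_chi2` uses the simple 1-df statistic n·r² from a single-SNP regression as a stand-in for the mixed-model Wald chi-squared (which would also model the pedigree), and the hypothetical `genome_wide_threshold` keeps the maximum statistic over all SNPs in each permutation.

```python
import random


def snp_chi2(genotypes, phenotypes):
    """Approximate 1-df association statistic n * r^2 for one SNP.

    Assumption: stand-in for the mixed-model Wald chi-squared; it ignores
    relatedness between individuals.
    """
    n = len(phenotypes)
    mean_g = sum(genotypes) / n
    mean_p = sum(phenotypes) / n
    s_gp = sum((g - mean_g) * (p - mean_p) for g, p in zip(genotypes, phenotypes))
    s_gg = sum((g - mean_g) ** 2 for g in genotypes)
    s_pp = sum((p - mean_p) ** 2 for p in phenotypes)
    if s_gg == 0 or s_pp == 0:
        return 0.0  # monomorphic SNP or constant trait: no information
    return n * s_gp ** 2 / (s_gg * s_pp)


def genome_wide_threshold(snps, phenotypes, n_perm=10_000, alpha=0.05, seed=1):
    """Empirical genome-wide threshold from the max statistic over SNPs.

    Each permutation shuffles phenotypes against the fixed genotypes, which
    breaks any SNP-trait association while preserving LD between SNPs.
    """
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        maxima.append(max(snp_chi2(g, shuffled) for g in snps))
    maxima.sort()
    # Empirical (1 - alpha) quantile of the per-permutation maxima.
    index = min(int((1 - alpha) * n_perm), n_perm - 1)
    return maxima[index]
```

A SNP would be declared genome-wide significant when `snp_chi2` on the observed data exceeds the returned value. Recording only the per-permutation maximum is what controls the genome-wide (family-wise) error rate; with 10,000 permutations this pure-Python version is slow, and a vectorized implementation would be preferable in practice.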
Genome-wide linkage analysis using microsatellite markers has been successful in identifying numerous Mendelian and complex disease loci. The recent availability of high-density single-nucleotide polymorphism (SNP) maps provides a potentially more powerful option. Using the simulated and Collaborative Study on the Genetics of Alcoholism (COGA) datasets from Genetic Analysis Workshop 14 (GAW14), we examined how altering the density of SNP marker sets affected the overall information content, the power to detect trait loci, and the number of false-positive results. For the simulated data we used SNP maps with densities of 0.3 cM, 1 cM, 2 cM, and 3 cM. For the COGA data we combined the Illumina and Affymetrix marker sets to create a map with an average density of 0.25 cM and then, using subsamples of these markers, created maps with densities of 0.3 cM, 0.6 cM, 1 cM, 2 cM, and 3 cM. For each marker set, multipoint linkage analysis using MERLIN was performed for both dominant and recessive traits derived from marker loci. Our results showed that information content increased with map density. For the homogeneous, completely penetrant traits we created, there was only a modest difference in the ability to detect trait loci. Additionally, as map density increased, there was only a slight increase in the number of false-positive results when there was linkage disequilibrium (LD) between markers. LD between markers may have led to an increased number of false-positive regions, but no clear relationship was observed between regions of high LD and the locations of false-positive linkage signals.
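The abstract does not say how the sparser maps were subsampled from the dense map. One simple possibility, shown here purely as an assumption, is greedy thinning: walk along the map and keep a marker only if it lies at least the target spacing beyond the last kept marker. The function names are hypothetical.

```python
def thin_map(positions_cm, min_spacing_cm):
    """Greedily subsample a marker map so that consecutive retained markers
    are at least `min_spacing_cm` centimorgans apart.

    Assumption: greedy thinning; the study does not state its subsampling rule.
    """
    kept = []
    for pos in sorted(positions_cm):
        if not kept or pos - kept[-1] >= min_spacing_cm:
            kept.append(pos)
    return kept


def mean_spacing(positions_cm):
    """Average distance (cM) between adjacent markers."""
    pos = sorted(positions_cm)
    if len(pos) < 2:
        return 0.0
    return (pos[-1] - pos[0]) / (len(pos) - 1)
```

Because the rule enforces a minimum gap, the realized average spacing of each thinned map is at least the target, so `mean_spacing` is worth reporting alongside the nominal density (for example, for targets of 0.3, 0.6, 1, 2 and 3 cM).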
Endophenotypes such as behavior disorders have been increasingly adopted in genetic studies of complex traits. For efficient gene mapping, an endophenotype must be associated with the disease of interest and be heritable or co-segregate within families. In this study, we proposed a strategy for constructing endophenotypes and applied it to the Genetic Analysis Workshop 14 simulated dataset. First, generalized estimating equation models, which account for the family structure of the data, were used to identify phenotypes correlated with the disease (affected status). Endophenotypes were then constructed as functions of the identified phenotypes, taking heterogeneity into account. Genome scans of the constructed endophenotypes were carried out using family-based association analysis; for comparison, genome scans were also performed with the original affected status. Family-based association analysis using the endophenotypes correctly identified the same susceptibility gene in about 80 of the 100 replicates.
The Genetic Analysis Workshop 14 simulated data present an interesting, challenging, and plausible example of complex disease interaction. This paper summarizes how readily each of the simulated Kofendrerd Personality Disorder (KPD) genes was detected across all replicates by five standard linkage statistics. Using KPD affection status, we analyzed the microsatellite markers flanking each disease gene, plus two additional markers not linked to any disease locus. All markers were analyzed with the following two-point linkage methods: 1) the MMLS, a standard admixture LOD score maximized over θ, α, and mode of inheritance; 2) the MLS calculated by GENEHUNTER; 3) the Kong and Cox LOD score computed by MERLIN; 4) a MOD score (standard heterogeneity LOD maximized over θ, α, and a grid of genetic model parameters); and 5) the PPL, a Bayesian statistic that directly measures the strength of evidence for linkage to a marker. All major loci (D1–D4) were detectable, with varying probabilities, in the different populations. However, the modifier genes (D5 and D6) were difficult to detect, with similar distributions under the null and alternative hypotheses across populations and statistics. Pooling the four datasets in each replicate (n = 350 pedigrees) greatly improved the chance of detecting the major genes with all five methods, but did not increase the chance of detecting D5 and D6.
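The admixture (heterogeneity) LOD maximized over θ and α, which underlies the MMLS and MOD scores above, can be illustrated in its simplest form. This sketch assumes phase-known, fully informative meioses summarized per family as (recombinant, non-recombinant) counts and uses a coarse grid search; unlike the MMLS and MOD score, it does not maximize over mode of inheritance or other genetic model parameters. All names are hypothetical.

```python
import math


def family_lr(theta, recombinants, nonrecombinants):
    """Likelihood ratio L(theta) / L(0.5) for phase-known meioses.

    Assumption: every meiosis is fully informative and phase is known.
    """
    n = recombinants + nonrecombinants
    return theta ** recombinants * (1 - theta) ** nonrecombinants / 0.5 ** n


def hlod(theta, alpha, families):
    """Admixture LOD: sum over families of log10(alpha * LR_i + 1 - alpha)."""
    total = 0.0
    for r, nr in families:
        value = alpha * family_lr(theta, r, nr) + 1 - alpha
        if value <= 0:
            return float("-inf")  # e.g. theta = 0 with a recombinant and alpha = 1
        total += math.log10(value)
    return total


def max_hlod(families, steps=50):
    """Grid search over theta in [0, 0.5] and alpha (proportion linked) in [0, 1]."""
    best = (float("-inf"), None, None)
    for i in range(steps + 1):
        theta = 0.5 * i / steps
        for j in range(steps + 1):
            alpha = j / steps
            score = hlod(theta, alpha, families)
            if score > best[0]:
                best = (score, theta, alpha)
    return best  # (maximum HLOD, theta estimate, alpha estimate)
```

Because α = 0 always gives an HLOD of 0, the maximized score is never negative, which is one reason maximized statistics such as these need their own null distributions rather than the usual LOD thresholds.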
One implicit assumption in most linkage analyses is that live-born siblings unselected for a phenotype do not share alleles at any particular locus in excess of the Mendelian expectation. However, because most families are recruited for genetic studies on the basis of disease, few data are available to confirm that this is the case. We hypothesized that loci behaving in a non-Mendelian fashion could be identified using genotype data from the Framingham Heart Study families. We tested the hypothesis that live-born sibs, either stratified by gender or irrespective of gender, show excess sharing of alleles on the autosomes, i.e., transmission ratio distortion. Multipoint linkage analysis of siblings, with and without stratification by gender, was performed using an allele-sharing method. Such observations may have implications for mapping loci for complex diseases and quantitative traits in human pedigrees.
No results reached genome-wide significance. However, four regions demonstrated excess sharing of alleles at p < 0.002 when sibships were stratified by gender; three of these were present in males. Of note, a female-specific locus co-localized with a region that is linked to mean systolic blood pressure in the same families. In addition, three other regions demonstrated excess sharing of alleles in sibships irrespective of gender, including a region on chromosome 10p14-p15 (p = 7.5 × 10⁻⁴).
Although no locus showing transmission ratio distortion reached genome-wide significance, loci with suggestive evidence for linkage were detected. These may have implications for mapping susceptibility loci for complex diseases in human pedigrees.
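The abstract does not name the allele-sharing statistic used. A standard choice for testing excess sharing in sib pairs is the mean test on the proportion of alleles shared identical by descent (IBD), shown here as an assumption: under Mendelian segregation the IBD proportion π = (alleles shared)/2 has mean 1/2 and variance 1/8 for fully informative markers. The function name is hypothetical.

```python
import math


def mean_ibd_test(ibd_counts):
    """One-sided mean test for excess IBD sharing among sib pairs.

    ibd_counts: alleles shared IBD per pair (0, 1 or 2; multipoint
    estimates may be fractional). Assumption: pairs are independent and
    markers fully informative, so pi has mean 1/2 and variance 1/8.
    """
    n = len(ibd_counts)
    mean_pi = sum(c / 2 for c in ibd_counts) / n
    z = (mean_pi - 0.5) / math.sqrt(0.125 / n)
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z)
    return z, p_one_sided
```

Gender stratification amounts to calling the test separately on brother-brother, sister-sister and brother-sister pairs. Pairs drawn from the same sibship are not independent, so in practice the variance (or the p-value) would need to be adjusted, for example by permutation or by software designed for multipoint IBD analysis.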
Multivariate phenotypes underlie complex traits. Thus, instead of using the end-point trait, it may be statistically more powerful to detect linkage using a multivariate phenotype correlated with the end-point trait. In this study, we developed a reverse regression method to analyze linkage of Kofendrerd Personality Disorder affection status in the New York population of the Genetic Analysis Workshop 14 (GAW14) simulated dataset. When we used the multivariate phenotype, we obtained significant evidence of linkage near four of the six putative loci in at least 25% of the replicates. In contrast, linkage analysis using Kofendrerd Personality Disorder status as the phenotype produced significant findings near only two of the loci, and in a smaller proportion of replicates.
Genetic association studies offer an opportunity to find genetic variants underlying complex human diseases. Various tests have been developed to improve their power. However, none of these tests is uniformly best, and it is usually unclear at the outset which test is best for a specific dataset. For example, Hotelling's T² test is best for normally distributed data, but it can lose considerable power when normality does not hold. To achieve satisfactory power in most cases without compromising the overall significance level, we propose a two-stage adaptive analysis strategy: several statistics are compared on a portion of the sample in the first stage, and the most powerful statistic is then applied to the remaining samples. We evaluated this procedure by mapping the quantitative trait locus for IgM in the simulated data of Genetic Analysis Workshop 15 Problem 3. The results show that the gain in power from the two-stage adaptive procedure can be considerable when the initial choice of test statistic is wrong, whereas the loss is relatively small when the initially chosen test is optimal.
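The two-stage adaptive idea can be sketched for one SNP and one quantitative trait. This is a minimal illustration, not the authors' procedure: the two candidate statistics (absolute correlation on the raw trait and on trait ranks), the 30% stage-1 fraction, and the permutation p-values are all assumptions, and the function names are hypothetical. Because the statistic is chosen on stage-1 samples only and then tested on the disjoint stage-2 samples, the final p-value keeps its nominal level.

```python
import random


def abs_corr(x, y):
    """|Pearson correlation| between genotype scores x and trait values y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5 if sxx and syy else 0.0


def abs_rank_corr(x, y):
    """Same statistic on the ranks of y (robust to non-normal traits); ties broken by order."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    ranks = [0.0] * len(y)
    for r, i in enumerate(order):
        ranks[i] = float(r)
    return abs_corr(x, ranks)


def perm_pvalue(stat, x, y, n_perm, rng):
    """Permutation p-value for a statistic where larger means more extreme."""
    observed = stat(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        hits += stat(x, y_perm) >= observed
    return (hits + 1) / (n_perm + 1)


def two_stage(x, y, tests, stage1_frac=0.3, n_perm=499, seed=0):
    """Assumption: choose the test with the smallest stage-1 p-value,
    then report that test's p-value on the stage-2 samples only."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    rng.shuffle(idx)
    cut = int(stage1_frac * len(idx))

    def subset(ids, values):
        return [values[i] for i in ids]

    s1, s2 = idx[:cut], idx[cut:]
    stage1 = {name: perm_pvalue(f, subset(s1, x), subset(s1, y), n_perm, rng)
              for name, f in tests.items()}
    chosen = min(stage1, key=stage1.get)
    return chosen, perm_pvalue(tests[chosen], subset(s2, x), subset(s2, y), n_perm, rng)
```

For example, `two_stage(genotypes, trait, {"pearson": abs_corr, "rank": abs_rank_corr})` returns the selected test's name and its stage-2 p-value. The trade-off described in the abstract appears here directly: stage 1 spends samples on selection, which costs a little power when the default test was already the right one.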
Transmission-ratio distortion (TRD) is a phenomenon in which the segregation of alleles does not obey Mendel's laws. As a simple example, a recessive locus that causes fetal lethality will result in live-born individuals sharing more alleles at this locus than expected under Mendel's laws. This could produce apparent linkage of the phenotype 'being alive' to such a chromosomal region. Further, it could produce false-positive linkage when 'affected-only' parametric or non-parametric linkage analysis is performed. Similarly, loci demonstrating TRD may be detectable in family-based association tests as deviant transmission of alleles, so TRD could confound family-based association studies of disease. The Framingham Heart Study data available for Genetic Analysis Workshop 16 are well suited to determining whether any loci in the genome show TRD, because of the large number of individuals from families, the high-resolution genotyping, and the population-based nature of the study. We used both genome-wide linkage and family-based association methods to determine whether any loci demonstrate TRD in the Framingham Heart Study. Family-based association analysis identified thousands of loci with apparent TRD. However, the vast majority of these are likely the result of genotyping errors. With the application of strict quality control criteria to the genotype data and automated inspection of the intensity plots, we identified a small number of loci that may show true TRD, including rs1000548 in intron 6 of S-antigen (arrestin, SAG) on chromosome 2 (p = 7 × 10⁻¹⁰).
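Deviant transmission of the kind described above can be measured with a transmission/disequilibrium-type count. The abstract does not specify which family-based association test was used, so this McNemar-type TDT is an assumption; applied to all live-born offspring regardless of phenotype, a significant departure from 50:50 transmission by heterozygous parents is the TRD signal. The function name is hypothetical.

```python
import math


def tdt(b, c):
    """McNemar-type transmission test from heterozygous parents.

    b: number of transmissions of allele 1; c: transmissions of allele 2.
    Assumption: applied to all live-born offspring, so any excess is TRD
    rather than disease association. Returns (chi-square 1 df, p-value).
    """
    if b + c == 0:
        return 0.0, 1.0
    chi2 = (b - c) ** 2 / (b + c)
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of a chi-square with 1 df
    return chi2, p
```

Systematic genotype miscalls, for example heterozygotes called as homozygotes in one parent, shift b and c directly, which is consistent with the abstract's observation that most apparent TRD signals disappeared after strict quality control.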
Genetic Analysis Workshop 14 simulated data were analyzed with MASC (marker association segregation chi-squares), in which we implemented a bootstrap procedure to provide variation intervals for the parameter estimates. We model here the effect of a genetic factor, S, for Kofendrerd Personality Disorder in the region of marker C03R0281 in the Aipotu population. The goodness of fit of several one-locus, two-allele genetic models was tested. The data are not compatible with a direct effect of any single-nucleotide polymorphism (SNP) in the region (SNPs 16, 17, 18, and 19 of pack 153). We therefore conclude that the functional polymorphism has not been typed and is in linkage disequilibrium with the four studied SNPs. We obtained very large variation intervals for both the disease allele frequency and the degree of dominance. The uncertainty of the model parameters can be explained, first, by the method used, which models marginal effects when the disease is due to complex interactions; second, by the presence of different diagnostic sub-criteria that are not determined by S in the same way; and third, by the fact that the segregation of the disease in the families was not taken into account. However, we could not find any model that explained the familial segregation of the trait, namely the higher proportion of affected parents than of affected sibs.
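A percentile bootstrap of the kind used to obtain variation intervals can be sketched generically. MASC itself is not reproduced here: the `estimator` argument is a placeholder (in the study it would refit the genetic model and return, for example, the disease allele frequency), and resampling whole units such as families is an assumption about how dependence is handled. The function name is hypothetical.

```python
import random


def bootstrap_interval(units, estimator, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for estimator(units).

    Assumption: resample whole units (e.g. families) with replacement so
    that within-unit dependence is preserved; `estimator` is a placeholder
    for refitting the model of interest.
    """
    rng = random.Random(seed)
    estimates = sorted(
        estimator([rng.choice(units) for _ in units]) for _ in range(n_boot)
    )
    lo = int((1 - level) / 2 * n_boot)
    hi = min(int((1 + level) / 2 * n_boot), n_boot - 1)
    return estimates[lo], estimates[hi]
```

For example, `bootstrap_interval(data, lambda xs: sum(xs) / len(xs))` gives an approximate 95% interval for a proportion. Very wide intervals, as reported above for the allele frequency and degree of dominance, indicate that the data constrain those parameters only weakly.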
Mutations in the DFNB1 locus, where two connexin genes (GJB2 and GJB6) are located, account for half of congenital cases of nonsyndromic autosomal recessive deafness. Because of the high frequency of DFNB1 mutations and the availability of genetic diagnostic tests for these genes, they are the best candidates for developing a risk prediction model for hearing impairment. People undergoing genetic counseling are normally interested in the probability of having a hearing-impaired child given their family history. To address this, a Mendelian model has been developed that predicts the probability of being a carrier of DFNB1 mutations from the family history of deafness. This probability can serve as additional information when deciding whether a genetic test should be performed. The model incorporates the Mendelian mode of inheritance, the age of onset of the disease, and the current age of hearing family members. Carrier probabilities are obtained using Bayes’ theorem, with mutation prevalence as the prior distribution. We validated the model using information from 305 families affected by congenital or progressive nonsyndromic deafness in which genetic analysis of GJB2 and GJB6 had already been performed. The model performs well, especially for homozygous carriers, showing high discriminative power. This indicates that the proposed model can be useful in clinical counseling for autosomal recessive disorders.
hearing loss; recessive Mendelian model; predicting carrier probabilities; DFNB1; Bayes’ theorem; GJB2; GJB6
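The Bayes' theorem step described above can be illustrated for a single person. This sketch is not the published model: it assumes complete recessive penetrance with no phenocopies, summarizes age of onset as the probability that a homozygote has lost hearing by the person's current age, and does not propagate information through a pedigree. The names `hwe_prior` and `carrier_posterior` are hypothetical.

```python
def hwe_prior(q):
    """Genotype priors from a DFNB1 mutation frequency q under Hardy-Weinberg
    equilibrium (assumption). 'a' denotes the mutant allele."""
    return {"AA": (1 - q) ** 2, "Aa": 2 * q * (1 - q), "aa": q ** 2}


def carrier_posterior(prior, hearing, onset_cdf):
    """Bayes update of genotype probabilities for one person.

    prior: probabilities for 'AA', 'Aa', 'aa'.
    hearing: True if the person currently hears normally.
    onset_cdf: probability that an 'aa' person has lost hearing by their
        current age (1.0 for congenital deafness).
    Assumption: full recessive penetrance and no phenocopies.
    """
    if hearing:
        likelihood = {"AA": 1.0, "Aa": 1.0, "aa": 1.0 - onset_cdf}
    else:
        likelihood = {"AA": 0.0, "Aa": 0.0, "aa": onset_cdf}
    joint = {g: prior[g] * likelihood[g] for g in prior}
    total = sum(joint.values())
    return {g: v / total for g, v in joint.items()}
```

As a check, a hearing sibling of a child with congenital deafness whose parents are both carriers has Mendelian prior (1/4, 1/2, 1/4); conditioning on normal hearing removes the homozygous class, giving the familiar heterozygous-carrier probability of 2/3. A later age of onset (onset_cdf < 1) leaves some posterior weight on 'aa' for young hearing relatives, which is why current age enters the model.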
Complex diseases are generally thought to be influenced by multiple, possibly interacting, genes. Many association methods, referred to as single-locus methods, have been developed to identify susceptibility genes under a single-gene disease model. Multilocus methods consider the joint effects of multiple genes and environmental factors. One commonly used method for family-based association analysis is implemented in FBAT. The multifactor-dimensionality reduction method (MDR) is a multilocus method that identifies multiple genetic loci associated with the occurrence of complex disease. Many studies of late-onset complex diseases use a discordant sib-pair design. We compared FBAT and MDR in their ability to detect susceptibility loci using a discordant sib-pair dataset generated from the simulated data made available to participants in Genetic Analysis Workshop 14. Using FBAT, we identified the effect of one susceptibility locus; however, the finding was not statistically significant. We were not able to detect any of the interactions with this method, probably because FBAT is designed to find loci with major effects, not interactions. Using MDR, the best result we obtained identified two interactions, but neither reached statistical significance. This is mainly due to the heterogeneity of the disease trait and noise in the data.
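The core step of MDR can be sketched as follows: pool the individuals into multilocus genotype cells, label a cell high risk when its case:control ratio exceeds the overall ratio, and score the resulting two-class rule by classification accuracy. This is a simplified illustration; full MDR also searches over locus combinations, uses cross-validation and permutation testing, and this sketch ignores the pairing in a discordant sib-pair design. The function names are hypothetical.

```python
from collections import Counter


def cell_of(genotype, loci):
    """Multilocus genotype cell for the selected loci."""
    return tuple(genotype[i] for i in loci)


def mdr_high_risk(genotypes, status, loci):
    """Label each multilocus genotype cell high risk when its case:control
    ratio exceeds the overall ratio (status: 1 case, 0 control).

    Assumption: no cross-validation; sib-pair structure is ignored.
    """
    cases, controls = Counter(), Counter()
    for g, s in zip(genotypes, status):
        if s == 1:
            cases[cell_of(g, loci)] += 1
        else:
            controls[cell_of(g, loci)] += 1
    n_cases = sum(status)
    threshold = n_cases / (len(status) - n_cases)
    return {
        cell
        for cell in set(cases) | set(controls)
        if controls[cell] == 0 or cases[cell] / controls[cell] > threshold
    }


def mdr_accuracy(genotypes, status, loci, high_risk):
    """Fraction of individuals correctly classified by the high/low-risk rule."""
    hits = sum((cell_of(g, loci) in high_risk) == (s == 1)
               for g, s in zip(genotypes, status))
    return hits / len(status)
```

A purely interacting (XOR-like) pair of loci illustrates the contrast drawn above: each locus alone classifies no better than chance, while the two-locus cells separate cases from controls perfectly, which is the kind of signal a single-locus test such as FBAT is not designed to see.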
We studied a trend test for genetic association between disease and the number of risk alleles using case-control data. When the data are sampled from families, this trend test can be adjusted to account for correlations among family members in complex pedigrees. However, the test depends on scores based on the underlying genetic model, so it may lose substantial power when the model is misspecified. Because the mode of inheritance is unknown for complex diseases, we developed two robust trend tests for case-control studies using family data. These robust tests have relatively good power across a class of possible genetic models. The trend tests and robust trend tests were applied to the Genetic Analysis Workshop 14 dataset from the Collaborative Study on the Genetics of Alcoholism.
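The dependence of the trend test on model-based scores, and one common way of making it robust, can be sketched for unrelated cases and controls. This sketch does not include the family-based variance adjustment that is the subject of the study, and the MAX3 statistic shown (the largest |Z| over recessive, additive and dominant scores) is a widely used robust test that may differ from the two robust tests the authors developed. Function names are hypothetical.

```python
import math


def catt_z(cases, controls, x):
    """Cochran-Armitage trend Z for unrelated cases and controls.

    cases, controls: genotype counts for 0, 1, 2 risk alleles.
    Scores are (0, x, 1): x = 0 recessive, 0.5 additive, 1 dominant.
    Assumption: independent individuals (no family correction).
    """
    scores = (0.0, x, 1.0)
    R, S = sum(cases), sum(controls)
    N = R + S
    n = [r + s for r, s in zip(cases, controls)]
    t = sum(w * (r * S - s * R) for w, r, s in zip(scores, cases, controls))
    var = R * S / N * (N * sum(w * w * m for w, m in zip(scores, n))
                       - sum(w * m for w, m in zip(scores, n)) ** 2)
    return t / math.sqrt(var)


def max3(cases, controls):
    """Robust MAX3 statistic: largest |Z| over the three genetic models."""
    return max(abs(catt_z(cases, controls, x)) for x in (0.0, 0.5, 1.0))
```

For cases (10, 20, 30) and controls (30, 20, 10), the additive Z² equals 20. Because MAX3 is a maximum over correlated statistics, its null distribution is not standard normal, so its p-value would need to come from simulation, permutation, or the joint normal distribution of the three Z statistics.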
We applied a range of genome-wide association (GWA) methods to map quantitative trait loci (QTL) in the simulated dataset provided by the QTLMAS2009 workshop in order to derive a comprehensive set of results. A Gompertz curve was modelled on the yield data and showed good predictive properties. QTL analyses were performed on the raw measurements, on the individual parameters of the Gompertz curve, and on its predicted growth for each interval. Half-sib and variance component linkage analyses revealed QTL with different modes of inheritance but with low resolution. These were complemented by association studies using single markers or haplotypes and testing additive, dominance, parent-of-origin and epistatic QTL effects. All association analyses were performed on phenotypes pre-corrected for pedigree effects. These methods detected QTL positions in high concordance with each other and refined the linkage signals. Two-locus interaction analysis detected no epistatic pairs of QTL. Overall, using stringent thresholds, we identified QTL regions by linkage analysis, corroborated by six individual SNPs with significant effects and by two putatively imprinted SNPs.
We obtained consistent results across a combination of intra- and inter-family methods, using flexible linear models to evaluate a variety of models. The Gompertz curve fitted the data very well and provided complementary information on the detected QTL. Retrospective comparison of the results with the actual simulated data showed that the best results were obtained by including both yield and the Gompertz curve parameters, even though the data had been simulated using a logistic function.
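The Gompertz-derived phenotypes mentioned above (curve parameters and predicted growth per interval) can be sketched as follows. The parameterization y(t) = a·exp(−b·exp(−c·t)), with a the asymptote, b a displacement and c a growth rate, is one common form and is an assumption here, since the abstract does not give the exact form used; the function names are hypothetical.

```python
import math


def gompertz(t, a, b, c):
    """Gompertz curve y(t) = a * exp(-b * exp(-c * t)).

    Assumption: a = asymptote (final yield), b = displacement,
    c = growth rate; the study's exact parameterization is not stated.
    """
    return a * math.exp(-b * math.exp(-c * t))


def interval_growth(times, a, b, c):
    """Predicted growth between consecutive measurement times."""
    fitted = [gompertz(t, a, b, c) for t in times]
    return [later - earlier for earlier, later in zip(fitted, fitted[1:])]
```

Each individual's (a, b, c) would first be estimated from its own yield measurements, for example with `scipy.optimize.curve_fit` applied to a NumPy-vectorized version of the same function; the fitted parameters and the interval growths can then be analyzed as QTL phenotypes alongside the raw yields.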
We present a new method for fine-mapping a disease susceptibility locus using a case-control design. The new method, termed the weighted average (WA) statistic, averages the Cochran-Armitage (CA) trend test statistic and the difference between the Hardy-Weinberg disequilibrium test statistics for cases and controls (the HWD trend). The WA statistic is designed to overcome the weaknesses, and retain the strengths, of both the CA trend test and the HWD trend test. Data from three populations in the Genetic Analysis Workshop 14 (GAW14) simulated dataset (Aipotu, Karangar, and Danacaa) were first subjected to model-free linkage analysis to find regions exhibiting linkage. Then, for fine-scale mapping, 140 SNPs within the significant linkage regions were analyzed with the WA statistic on replicates of the three populations, both separately and combined. The regions that were significant in the multipoint linkage analysis were also significant in this fine-scale mapping. The most significant regions obtained with the WA statistic were on chromosome 3 (B03T3056–B03T3058, p < 1 × 10⁻¹⁰) and chromosome 9 (B09T8332–B09T8334, p = 1 × 10⁻⁶). On the simulated GAW14 data, the WA statistic performed well and could narrow down the region containing the susceptibility locus. However, the strength of the signal depends on both the strength of the linkage disequilibrium and the heterozygosity of the linked marker.
We used the Genetic Analysis Workshop 15 Problem 1 data set to search for expression phenotype quantitative trait loci in a highly selected group of genes with presumably related roles in the development of the enteric nervous system. Our strategy was to reduce the multiple-testing burden by analyzing, at the genome-wide level, a limited number of genes considered the most promising enteric nervous system candidates on the basis of mouse expression data, and then to extend the analysis to a larger number of traits only for a small number of candidate linked regions. This study design allowed us to identify a "master regulator" locus for several genes involved in the enteric nervous system, located in 9q31. In particular, one of the four traits included in the genome-wide analysis and 2 of the 57 traits in the follow-up single-chromosome analysis showed LOD scores above 2 around position 109 on chromosome 9 in univariate variance-component linkage analysis. Bivariate linkage analysis further supported the presence of a common regulatory locus, with a maximum multipoint LOD score of 5.17 and five additional LOD scores > 3 in the same region. This region is particularly interesting because a susceptibility locus for Hirschsprung disease, a disease characterized by enteric malformation, was previously mapped to 9q31. The proposed strategy of limiting the genome-wide analysis to a small number of well-characterized candidate expression phenotypes and following up the most promising results in a larger number of correlated traits may prove successful for other groups of genes involved in a common pathway.
Clinical heterogeneity of a disease may reflect an underlying genetic heterogeneity, which may hinder the detection of trait loci. Consequently, many statistical methods have been developed that allow for the detection of linkage and/or association signals in the presence of heterogeneity.
This report describes the work of two parallel investigations into similar approaches to ordered subset analysis, based on an observed covariate, in the framework of family-based association analysis using Genetic Analysis Workshop 15 simulated data.
With an appropriate choice of covariate, both approaches detect two loci that are undetectable by the classical transmission-disequilibrium test. For a third locus, which the classical transmission-disequilibrium test does detect, a substantial increase in power is shown.
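Ordered subset analysis in a family-based association setting can be sketched with TDT counts. The report does not detail the two approaches, so the following is a generic illustration under stated assumptions: families are sorted by an observed covariate (ascending), the TDT is computed on each nested subset of the first k families, the maximum is kept, and its significance is assessed by randomly reordering families. All names are hypothetical.

```python
import random


def tdt_chi2(b, c):
    """TDT chi-square from transmitted (b) / untransmitted (c) allele counts."""
    return (b - c) ** 2 / (b + c) if b + c else 0.0


def osa_max(families):
    """Maximum TDT over nested subsets of families in the given order.

    families: per-family (b, c) transmission counts.
    Returns (max statistic, number of families in the best subset).
    """
    best, best_k = 0.0, 0
    b_sum = c_sum = 0
    for k, (b, c) in enumerate(families, start=1):
        b_sum += b
        c_sum += c
        stat = tdt_chi2(b_sum, c_sum)
        if stat > best:
            best, best_k = stat, k
    return best, best_k


def ordered_subset_test(families, covariate, n_perm=999, seed=0):
    """Assumption: ascending covariate order; the p-value comes from
    permuting the family order, which accounts for taking the maximum."""
    rng = random.Random(seed)
    order = sorted(range(len(families)), key=lambda i: covariate[i])
    observed, k = osa_max([families[i] for i in order])
    shuffled = list(families)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if osa_max(shuffled)[0] >= observed:
            hits += 1
    return observed, k, (hits + 1) / (n_perm + 1)
```

When the families with the strongest transmission distortion share extreme covariate values, the best subset concentrates them and the maximum exceeds the full-sample TDT (the k = N case), which is how a locus missed by the classical test can become detectable. If both ordering directions are tried, the permutation step should repeat that choice to keep the test valid.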
Genetic mapping provides a powerful approach to identify genes and biological processes underlying any trait influenced by inheritance, including human diseases. We discuss the intellectual foundations of genetic mapping of Mendelian and complex traits in humans, examine lessons emerging from linkage analysis of Mendelian diseases and genome-wide association studies of common diseases, and discuss questions and challenges that lie ahead.
The genetic factors underlying many complex traits are not well understood. The Genetic Analysis Workshop 15 Problem 1 data provide an opportunity to explore whether gene expression data from microarrays can be used to define useful phenotypes for linkage analysis of complex diseases. We used expression profiles for multiple genes that have been associated with a disease to develop a composite 'risk profile' that can be used to map other loci involved in the same disease process. Using prostate cancer as the disease of interest, we identified 26 genes whose expression levels had previously been associated with prostate cancer and, based on individual expression levels, defined three phenotypes: high, neutral, and low risk profiles. Linkage analyses using MCLINK, a Markov chain Monte Carlo method, and MERLIN were performed for all three phenotypes, and the two methods agreed closely. Suggestive genome-wide evidence of linkage was observed on chromosomes 6 and 4. Notably, the linkage signals did not appear to be strongly influenced by the locations of the original 26 genes used in the phenotype definition, indicating that composite measures may have potential to locate additional genes in the same process. In this example, however, extreme caution is necessary in extrapolating the identified loci to prostate cancer, because of the lack of data on the expression levels of these genes in lymphoblastoid cells. Our results do indicate the potential of expression data to augment current knowledge of the relationships among genes associated with complex diseases.
Certain loci in the human genome, such as glutathione S-transferase M1 (GSTM1), do not permit heterozygotes to be reliably identified by commonly used methods. Association of such a locus with disease is therefore generally tested with a case-control design. When subjects have already been ascertained in a case-parent design, however, the question arises whether the data can still be used to test for disease association at such a locus.
A likelihood ratio test was constructed that can be used with a case-parent design but has somewhat less power than Pearson's chi-squared test applied to a case-control design. The test is illustrated on a novel dataset showing a genotype relative risk near 2 for the homozygous GSTM1 deletion genotype and autism.
Although the case-control design will remain the mainstay for a locus with a deletion, the likelihood ratio test will be useful for such a locus analyzed as part of a larger case-parent study. The likelihood ratio test has the advantage that it can incorporate complete and incomplete case-parent trios as well as independent cases and controls. Both analyses support an association of the homozygous GSTM1 deletion genotype with autism (p = 0.046 for the proposed test; p = 0.028 for the case-control analysis).
Partial least squares regression (PLSR) was used to analyze the QTLMAS 2010 workshop data, to identify genomic regions affecting either of the two traits and to estimate breeding values. PLSR was appropriate for these data because it allowed several traits to be fitted to the markers simultaneously.
A preliminary analysis showed phenotypic and genetic correlations between the two traits. Consequently, the data were analyzed jointly in a PLSR model for each chromosome independently. Regression coefficients for the markers were used to calculate the variance of each marker and inference of quantitative trait loci (QTL) was based on local maxima of a smoothed line traced through these variances. In this way, 25 QTL for the continuous trait and 22 for the discrete trait were found. There was evidence for pleiotropic QTL on chromosome 1. The 2000 most important markers were fitted in a second PLSR model to calculate breeding values of the individuals. The accuracies of these estimated breeding values ranged between 0.56 and 0.92.
The results showed that PLSR is viable for QTL analysis and for estimating breeding values from markers.
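The QTL inference step described above (per-marker variances from the regression coefficients, a smoothed line, and its local maxima) can be sketched once the PLSR coefficients are available. The PLSR fit itself is not shown. The variance formula b_j²·Var(x_j), with Var(x_j) = 2p(1 − p) for 0/1/2 coding under Hardy-Weinberg equilibrium, and the centred moving-average smoother are assumptions, since the abstract does not specify either; the function names are hypothetical.

```python
def marker_variances(coefs, genotype_vars):
    """Variance attributable to each marker: b_j^2 * Var(x_j).

    Assumption: Var(x_j) = 2p(1-p) for 0/1/2-coded markers under HWE.
    """
    return [b * b * v for b, v in zip(coefs, genotype_vars)]


def moving_average(values, half_window=2):
    """Centred moving average (assumed smoother); the window shrinks at chromosome ends."""
    n = len(values)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def local_maxima(values):
    """Indices of interior points above the left neighbour and at least the right one."""
    return [i for i in range(1, len(values) - 1)
            if values[i] > values[i - 1] and values[i] >= values[i + 1]]
```

Applied one chromosome at a time, as in the study, the indices returned by `local_maxima` mark candidate QTL positions. The window width governs the trade-off between resolution and spurious peaks, so a minimum peak height or other threshold would normally be added before declaring a QTL.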
Genome-wide association studies of discrete traits generally use simple methods of analysis based on chi-square tests for contingency tables or logistic regression, at least for an initial scan of the entire genome. Nevertheless, more power might be obtained with methods that analyze multiple markers in combination. Methods based on sliding windows, wavelets, Bayesian shrinkage, or penalized likelihood, among others, were explored by participants in Genetic Analysis Workshop 16 Group 1 to combine information across multiple markers within a region, while others used Bayesian variable selection methods for genome-wide multivariate analyses of all markers simultaneously. Imputation can be used to fill in missing markers for individual subjects within a study or in a meta-analysis of studies using different panels. Although multiple imputation should in theory give more robust tests of association, one contribution found little difference between the results of single and multiple imputation. Careful control of population stratification is essential: two contributions found that previously reported associations with two genes disappeared after more precise control. Other issues considered by this group included subgroup analysis, gene-gene interactions, and the use of biomarkers.
rheumatoid arthritis; single-nucleotide polymorphisms; multi-marker associations; imputation; population stratification; gene-gene interactions; biomarkers
Identification of the genetic basis of common traits may be hindered by underlying complex genetic architectures that are inadequately captured by existing models, including both multiallelic and multilocus modes of inheritance (MOI). One useful approach for localizing genes underlying continuous complex traits is the joint oligogenic linkage and segregation analysis implemented in the package Loki. The method uses reversible jump Markov chain Monte Carlo to eliminate the need to prespecify the number of quantitative trait loci (QTLs) in the trait model, thus providing posterior distributions for the number of QTLs in a Bayesian framework. The current implementation assumes that QTLs are diallelic and can therefore overestimate the number of linked QTLs in the presence of a multiallelic QTL. To address the possibility of multiple alleles, we extended the QTL model to allow a variable number of additive alleles at each locus. Application to simulated data shows that, under a diallelic MOI, the multiallelic and diallelic analysis models give similar results. Under a multiallelic MOI, the multiallelic analysis model provides better mixing and convergence than the diallelic model, and leads to more accurate estimates of the underlying trait MOI and model parameter values. Application to real data shows that the multiallelic model yields fewer estimated linked QTLs and that its predominant QTL model is similar to one of the two predominant models estimated from the diallelic analysis. Our results indicate that use of a multiallelic analysis model can lead to a better understanding of the genetic architecture underlying complex traits.
complex trait; MCMC; pedigree; continuous trait; Bayesian
Although trait-associated genes of complex and single-gene (Mendelian) inheritance differ substantially in their odds ratios, the authors posit that their mechanistic concordance can reveal fundamental properties of the genetic architecture, allowing automated interpretation of unique polymorphisms within a personal genome.
Materials and methods
An analytical method, SPADE-gen, spanning three biological scales was developed to demonstrate the mechanistic concordance between Mendelian and complex inheritance of Alzheimer's disease (AD) genes. The three scales are biological processes (BP), protein interaction modeling, and the protein domain implicated in the disease-associated polymorphism.
Among Gene Ontology (GO) biological processes (BP) enriched at a false discovery rate < 5% in 15 AD genes of Mendelian inheritance (Online Mendelian Inheritance in Man) and, independently, in those of complex inheritance (25 host genes of intragenic AD single-nucleotide polymorphisms confirmed in genome-wide association studies), 16 overlapped (empirical p = 0.007) and 45 were similar (empirical p < 0.009; information theory). SPAN network modeling extended the canonical pathway of AD (KEGG) with 26 new protein interactions (empirical p < 0.0001).
The study prioritized new AD-associated biological mechanisms. It focused the analysis on previously unreported interactions associated with the biological processes of polymorphisms that affect specific protein domains within characterized AD genes and their direct interactors, using (1) concordant GO-BP and (2) domain interactions within STRING protein–protein interactions corresponding to the genomic location of the AD polymorphism (e.g., EPHA1, APOE, and CD2AP).
These results are in line with unique-event polymorphism theory, indicating how disease-associated polymorphisms of Mendelian or complex inheritance relate genetically to those observed as ‘unique personal variants’. They also provide insight for identifying novel targets, for repositioning drugs, and for personal therapeutics.
Personal genomics; protein interaction networks; medicine; translational bioinformatics; complex disease; ontology; protein–protein interactions; alternative splicing; genetics; network; SNP; protein networks; text-mining; bioinformatics; knowledge representations; uncertain reasoning and decision theory; languages; computational methods
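The abstract does not state how GO enrichment was computed within SPADE-gen. A common approach, shown here only as an assumption, is a one-sided hypergeometric (over-representation) test for each GO term, followed by Benjamini-Hochberg adjustment so that terms can be selected at a false discovery rate below 5%. The function names are hypothetical, and the empirical overlap p-values reported above are not reproduced.

```python
from math import comb


def hypergeom_sf(k, population, annotated, selected):
    """P(X >= k) for X ~ Hypergeometric: `selected` genes drawn from
    `population` genes, of which `annotated` carry the GO term.

    Assumption: standard over-representation test, not necessarily SPADE-gen's.
    """
    total = comb(population, selected)
    upper = min(annotated, selected)
    return sum(
        comb(annotated, i) * comb(population - annotated, selected - i)
        for i in range(k, upper + 1)
    ) / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Terms with an adjusted value below 0.05 would form the enriched set for each gene list (Mendelian and complex), and the overlap between the two sets is what the empirical p-values above assess.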
Parent-of-origin (PofO) effects, such as imprinting, are phenomena in which homologous chromosomes exhibit differential gene expression and epigenetic modifications according to their parental origin. Such non-Mendelian inheritance patterns are generally ignored by conventional association studies, because these tests treat maternal and paternal alleles as equivalent. To identify regulatory regions that show PofO effects on gene expression (imprinted expression quantitative trait loci, ieQTLs), we developed a novel method that associates SNP genotypes of defined parental origin with gene expression levels. We applied this method to 59 HapMap phase II parent-offspring trios. In these mother/father/child trios, the rules of Mendelian inheritance allowed the parental origin to be defined for ∼95% of SNPs in each child. We used 680,475 informative SNPs and corresponding expression data for 92,167 probe sets from Affymetrix GeneChip Human Exon 1.0 ST arrays, and performed four independent cis-association analyses with the expression levels of RefSeq genes within 1 Mb using PLINK. Independent analyses of maternal and paternal genotypes identified two significant cis-ieQTLs (p < 10⁻⁷) at which expression of the genes SFT2D2 and SRRT was associated exclusively with the maternally inherited SNPs rs3753292 and rs6945374, respectively. Twenty-eight additional suggestive cis-associations with only maternal or paternal SNPs were found at a lower stringency threshold of p < 10⁻⁶, including associations with two known imprinted genes, PEG10 and TRAPPC9, demonstrating the efficacy of our method. Furthermore, comparison with the likelihood ratio test (LRT) showed that our method, which analyzes maternal and paternal genotypes independently, is more effective than the LRT for detecting imprinting effects. Our method represents a novel approach for identifying imprinted regulatory elements that control gene expression, and it suggests novel PofO effects in the human genome.
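The Mendelian rules that define parental origin in a trio can be sketched for one biallelic SNP. This is an illustrative sketch, not the study's pipeline (which used PLINK for the association step): genotypes are assumed to be coded as alternate-allele counts, and origin is left undefined when mother, father and child are all heterozygous or when the trio is Mendelian-inconsistent. The names are hypothetical.

```python
ALLELES = {0: (0,), 1: (0, 1), 2: (1,)}  # alt-allele count -> possible transmitted alleles


def parental_origin(mother, father, child):
    """Assign the child's alleles to parents for one biallelic SNP.

    Assumption: genotypes coded as alt-allele counts (0, 1, 2). Returns
    (maternal_allele, paternal_allele) coded 0/1, or None when origin is
    ambiguous (all three heterozygous) or the trio is Mendelian-inconsistent.
    """
    options = {
        (m, p)
        for m in ALLELES[mother]
        for p in ALLELES[father]
        if m + p == child
    }
    return options.pop() if len(options) == 1 else None


def split_by_origin(trios):
    """Maternal and paternal allele vectors for a list of (mother, father, child)."""
    origins = [parental_origin(*t) for t in trios]
    maternal = [o[0] if o else None for o in origins]
    paternal = [o[1] if o else None for o in origins]
    return maternal, paternal
```

The maternal and paternal vectors can then be tested separately against expression levels, which is the idea behind the independent maternal and paternal analyses above: an association present in only one vector suggests a PofO effect. The only uninformative configuration is an all-heterozygous trio, which is consistent with origin being defined for most, but not all, SNPs per child.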
Family history, which reflects both common environmental and genetic effects, is associated with an increased risk for many neuropsychiatric diseases. Investigators have identified several disease-causing mutations for specific neuropsychiatric disorders that display Mendelian segregation. Such discoveries can lead to more rational drug design and improved interventions through a better understanding of the underlying biological mechanisms. However, a key challenge for genetic discovery in complex human diseases, including neuropsychiatric disorders, is that most diseases with genetic components display non-Mendelian patterns of inheritance. Recent advances in human population genetics include high-density genome-wide analyses of single nucleotide polymorphisms (SNPs) that make it possible to study complex genetic contributions to human disease. This approach is currently the most powerful strategy for analyzing the genetics of complex diseases. Genome-wide SNP analyses often require a large collaborative effort to collect, manage, and disseminate numerous samples and the corresponding clinical data. In this review we discuss the use of publicly available biorepositories for collecting and distributing human genetic material and associated phenotypic information, and their use in genome-wide investigations of human neuropsychiatric diseases.
repository; human; neurology; consent; genetics; bioinformatics