1.  PLEK: a tool for predicting long non-coding RNAs and messenger RNAs based on an improved k-mer scheme 
BMC Bioinformatics  2014;15(1):311.
Background
High-throughput transcriptome sequencing (RNA-seq) technology promises to discover novel protein-coding and non-coding transcripts, particularly the identification of long non-coding RNAs (lncRNAs) from de novo sequencing data. This requires tools that are not restricted by prior gene annotations, genomic sequences and high-quality sequencing.
Results
We present an alignment-free tool called PLEK (predictor of long non-coding RNAs and messenger RNAs based on an improved k-mer scheme), which uses a computational pipeline based on an improved k-mer scheme and a support vector machine (SVM) algorithm to distinguish lncRNAs from messenger RNAs (mRNAs), in the absence of genomic sequences or annotations. The performance of PLEK was evaluated on well-annotated mRNA and lncRNA transcripts. 10-fold cross-validation tests on human RefSeq mRNAs and GENCODE lncRNAs indicated that our tool could achieve accuracy of up to 95.6%. We demonstrated the utility of PLEK on transcripts from other vertebrates using the model built from human datasets. PLEK attained >90% accuracy on most of these datasets. PLEK also performed well using a simulated dataset and two real de novo assembled transcriptome datasets (sequenced by PacBio and 454 platforms) with relatively high indel sequencing errors. In addition, when run single-threaded, PLEK is approximately eightfold faster than a newly developed alignment-free tool, Coding-Non-Coding Index (CNCI), and 244 times faster than the most popular alignment-based tool, Coding Potential Calculator (CPC).
Conclusions
PLEK is an efficient alignment-free computational tool to distinguish lncRNAs from mRNAs in RNA-seq transcriptomes of species lacking reference genomes. PLEK is especially suitable for PacBio or 454 sequencing data and large-scale transcriptome data. Its open-source software can be freely downloaded from https://sourceforge.net/projects/plek/files/.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2105-15-311) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2105-15-311
PMCID: PMC4177586  PMID: 25239089
RNA-seq; lncRNA; k-mer; Prediction; de novo sequencing; de novo assemble
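The abstract above describes classifying transcripts from k-mer features without any alignment. As a rough illustration of what alignment-free k-mer features look like, the sketch below computes plain relative k-mer frequencies for a sequence. It does not reproduce PLEK's "improved" k-mer scheme, whose details are not given in the abstract; the function name `kmer_frequencies` and the default `k_max=3` are assumptions made for this example, and the SVM step is left out.

```python
from itertools import product


def kmer_frequencies(seq, k_max=3):
    """Return relative k-mer frequencies for k = 1..k_max in a fixed order.

    Hypothetical helper for illustration only; not PLEK's feature scheme.
    """
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA alphabets alike
    features = []
    for k in range(1, k_max + 1):
        # Enumerate all 4**k possible k-mers so the vector layout is stable.
        kmers = ["".join(p) for p in product("ACGT", repeat=k)]
        counts = dict.fromkeys(kmers, 0)
        for i in range(max(len(seq) - k + 1, 0)):
            window = seq[i:i + k]
            if window in counts:  # skip windows with ambiguous bases such as N
                counts[window] += 1
        total = sum(counts.values())
        features.extend(counts[m] / total if total else 0.0 for m in kmers)
    return features


if __name__ == "__main__":
    vec = kmer_frequencies("ACGUACGU", k_max=2)
    print(len(vec))  # 4 monomer features followed by 16 dimer features
```

A vector like this could be passed, one row per transcript, to any standard classifier; the choice of classifier and of k is left open here.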
2.  Enhancing blue luminescence from Ce-doped ZnO nanophosphor by Li doping 
Nanoscale Research Letters  2014;9(1):480.
Undoped ZnO, Ce-doped ZnO, and (Li, Ce)-codoped ZnO nanophosphors were prepared by a sol-gel process. The effects of the additional doping with Li ions on the crystal structure, particle morphology, and luminescence properties of Ce-doped ZnO were investigated by X-ray diffraction, scanning electron microscopy, X-ray photoelectron spectroscopy, electron paramagnetic resonance spectroscopy and photoluminescence spectroscopy. The results indicate that the obtained samples are single phase, and a nanorod-shaped morphology is observed for (Li, Ce)-codoping. Under excitation with 325 nm light, Ce-doped ZnO phosphors show an ultraviolet emission, a green emission, and a blue emission caused by Zn interstitials. The spectrum of the sample codoped with a proper Li concentration features two additional emissions that can be attributed to the Ce3+ ions. With the increase of the Li doping concentration, the Ce3+ blue luminescence of (Li, Ce)-codoped ZnO is markedly enhanced, which results not only from the increase of the Ce3+ ion concentration itself but also from the energy transfer from the ZnO host material to the Ce3+ ions. This enhancement reaches a maximum at a Li content of 0.02, and then decreases sharply due to concentration quenching. These nanophosphors are promising for application in visible-light-emitting devices.
PACS
78.55.Et; 81.07.Wx; 81.20.Fw
doi:10.1186/1556-276X-9-480
PMCID: PMC4164330  PMID: 25258604
(Li, Ce)-codoped ZnO; Blue luminescence; Phosphors; Sol-gel
3.  Simulating Linkage Disequilibrium Structures in a Human Population for SNP Association Studies 
Biochemical genetics  2011;49(0):395-409.
Existing simulation methods usually simulate linkage disequilibrium (LD) structures starting with an initial population that is randomly generated according to specified allele frequencies. These randomly initialized methods might be unstable because the LD level of the initial population is generally extremely low. This study presents a new algorithm, SIMLD, to simulate genome populations with real LD structures. SIMLD begins from an initial population with the highest possible LD level, and the LD then decays to the desired level through mating and recombination over generations. SIMLD can produce case–control samples according to various disease models. Using empirical SNP marker information from three populations of HapMap data, we implement the proposed algorithm and demonstrate a set of experimental results.
doi:10.1007/s10528-011-9416-x
PMCID: PMC4116680  PMID: 21234669
Case–control; Disease models; Linkage disequilibrium; Simulation; SNPs
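To make the idea of LD "decaying" over generations concrete, the sketch below measures pairwise LD between two biallelic loci (the coefficient D and the squared correlation r²) and applies the textbook expectation that, under random mating, D shrinks by a factor of (1 - c) per generation for recombination fraction c. This is standard population genetics, not SIMLD's algorithm; the function names `ld_stats` and `expected_decay` are assumptions for this example.

```python
def ld_stats(haplotypes):
    """Return (D, r2) for a list of two-locus haplotypes coded as (0/1, 0/1)."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0  # undefined for monomorphic loci
    return d, r2


def expected_decay(d0, c, generations):
    """Expected D after some generations of random mating: D_t = D_0 * (1 - c)**t."""
    return d0 * (1 - c) ** generations


if __name__ == "__main__":
    # Only two haplotype classes present: the strongest possible LD.
    start = [(1, 1), (1, 1), (0, 0), (0, 0)]
    d0, r2 = ld_stats(start)
    print(d0, r2)
    print(expected_decay(d0, c=0.01, generations=100))
```

Starting from a high-LD population and letting recombination erode it, as the abstract describes, means the simulation approaches the target LD level from above rather than trying to build LD up from near zero.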
4.  Identification of putative pathogenic SNPs implied in schizophrenia-associated miRNAs 
BMC Bioinformatics  2014;15:194.
Background
Schizophrenia is a severe brain disorder, and single nucleotide polymorphisms (SNPs) in schizophrenia-associated miRNAs are believed to be an important cause of dysregulation, which might contribute to altered gene expression and ultimately result in the disease. Identifying causal SNPs in associated miRNAs may therefore help in understanding the mechanism of schizophrenia.
Results
To this end, a method based on detecting free energy change is proposed for identifying causal SNPs in schizophrenia-associated miRNAs. A miRNA is first segmented, and the free energy change is computed after introducing an SNP into a segment. The method successfully finds that 6 out of 32 known SNPs, as well as some artificial SNPs, could cause a significant change in free energy; these 6 known SNPs are presumed to be responsible for most cases of schizophrenia in the population.
Conclusions
The proposed method is not only a convenient way to discover causal SNPs in schizophrenia-associated miRNAs without any biochemical assay or sample comparison between cases and controls, but it also has high resolution for causal SNPs even if the SNPs are not reported for their very rare cases in the population. Moreover, the method can be applied to discover the causal SNPs in miRNAs associated with other diseases.
doi:10.1186/1471-2105-15-194
PMCID: PMC4072616  PMID: 24934851
5.  Radiosensitization Effect of Nedaplatin on Nasopharyngeal Carcinoma Cells in Different Status of Epstein-Barr Virus Infection 
BioMed Research International  2014;2014:713674.
This study aims to evaluate the radiosensitization effect of nedaplatin on nasopharyngeal carcinoma (NPC) cell lines with different Epstein-Barr virus (EBV) status. Human NPC cell lines CNE-2 (EBV-negative) and C666 (EBV-positive) were treated with 0–100 μg/mL nedaplatin, and inhibitory effects on cell viability and IC50 were calculated by MTS assay. We assessed changes in radiosensitivity of cells by MTS and colony formation assays, and detected the apoptosis index and changes in cell cycle by flow cytometry. MTS assay showed that nedaplatin caused significant cytotoxicity in CNE-2 and C666 cells in a time- and dose-dependent manner. After 24 h, nedaplatin inhibited growth of CNE-2 and C666 cells with IC50 values of 34.32 and 63.69 μg/mL, respectively. Compared with radiation alone, nedaplatin enhanced the radiation effect on both cell lines. Nedaplatin markedly increased apoptosis and cell cycle arrest in G2/M phase. Nedaplatin radiosensitized human NPC cells CNE-2 and C666, with a significantly greater effect on the former. The mechanisms of radiosensitization include induction of apoptosis and enhancement of cell cycle arrest in G2/M phase.
doi:10.1155/2014/713674
PMCID: PMC4036599  PMID: 24900979
6.  Detection of serum VEGF and MMP-9 levels by Luminex multiplexed assays in patients with breast infiltrative ductal carcinoma 
The aim of the present study was to assess the effect of the combined detection of serum vascular endothelial growth factor (VEGF) and matrix metalloproteinase-9 (MMP-9) by Luminex multiplexed assays for the diagnosis, treatment and prognosis of breast cancer. Preoperative levels of serum VEGF and MMP-9 were detected via a liquid chip-based method in 301 breast cancer cases, 83 breast fibroadenoma cases and 40 healthy adults. Postoperative levels of VEGF and MMP-9 were also detected in 118 breast cancer cases. The levels of serum VEGF and MMP-9 in patients with breast infiltrative ductal carcinoma (IDC) were higher than those in the breast fibroadenoma and healthy control groups (P<0.05); there was no statistically significant difference between the breast fibroadenoma and healthy groups (P>0.05). The levels of VEGF and MMP-9 were shown to correlate with the clinical stage, tumor size and the lymph node metastasis status. However, the levels were not associated with age or gender (P>0.05). In addition, the serum level of MMP-9 exhibited a significant correlation with the VEGF level (r=0.601, P<0.001). Subgroup analysis revealed that in patients with IDC, serum levels of VEGF and MMP-9 prior to surgery were significantly higher than those following surgery (P<0.05). Therefore, the serum levels of VEGF and MMP-9 can be used as markers for the diagnosis of breast IDC and may also be valuable for the prediction of lymph node metastasis.
doi:10.3892/etm.2014.1685
PMCID: PMC4061234  PMID: 24944618
breast infiltrative ductal carcinoma; liquid chip-based method; vascular endothelial growth factor; matrix metalloproteinase-9
7.  Correction: Ameliorative Effects of a Combination of Baicalin, Jasminoidin and Cholic Acid on Ibotenic Acid-Induced Dementia Model in Rats 
PLoS ONE  2013;8(11):10.1371/annotation/4588b718-f48e-4bf4-a387-e56f0a1be19e.
doi:10.1371/annotation/4588b718-f48e-4bf4-a387-e56f0a1be19e
PMCID: PMC3823544  PMID: 24250765
8.  Network-Based Inference Framework for Identifying Cancer Genes from Gene Expression Data 
BioMed Research International  2013;2013:401649.
Great efforts have been devoted to reducing the uncertainty of detected cancer genes, as accurate identification of oncogenes is of tremendous significance and helps unravel the biological behavior of tumors. In this paper, we present a differential network-based framework to detect biologically meaningful cancer-related genes. Firstly, a gene regulatory network construction algorithm is proposed, in which a boosting regression based on likelihood score and informative prior is employed to improve the accuracy of identification. Secondly, with the algorithm, two gene regulatory networks are constructed from case and control samples independently. Thirdly, by subtracting the two networks, a differential-network model is obtained and then used to rank differentially expressed hub genes for identification of cancer biomarkers. Compared with two existing gene-based methods (t-test and lasso), the method shows a significant improvement in accuracy on both synthetic datasets and two real breast cancer datasets. Furthermore, six identified genes (TSPYL5, CD55, CCNE2, DCK, BBC3, and MUC1) susceptible to breast cancer were verified through literature mining, GO analysis, and pathway functional enrichment analysis. Among these oncogenes, TSPYL5 and CCNE2 are already known as prognostic biomarkers in breast cancer, CD55 has been suspected of playing an important role in breast cancer prognosis from literature evidence, and the other three genes are newly discovered breast cancer biomarkers. More generally, the differential-network schema can be extended to other complex diseases for detection of disease-associated genes.
doi:10.1155/2013/401649
PMCID: PMC3774028  PMID: 24073403
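The third step described above, subtracting a control network from a case network and ranking hub genes, can be sketched as follows. This is a minimal illustration that assumes each network is already available as a dictionary of weighted edges; it does not cover the boosting-regression network construction, and the scoring rule (sum of absolute edge-weight differences per gene) and the function name `differential_hub_ranking` are assumptions for this example rather than the paper's definition.

```python
def differential_hub_ranking(case_net, control_net):
    """Rank genes by how much their edges differ between two networks.

    Each network maps an edge (gene_a, gene_b) to a numeric weight; an edge
    absent from one network is treated as weight 0.0 there.
    """
    scores = {}
    for edge in set(case_net) | set(control_net):
        diff = abs(case_net.get(edge, 0.0) - control_net.get(edge, 0.0))
        for gene in edge:
            # Credit the edge difference to both endpoint genes.
            scores[gene] = scores.get(gene, 0.0) + diff
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Toy weights and gene labels, invented for illustration.
    case = {("A", "B"): 1.0, ("A", "C"): 0.75}
    control = {("A", "B"): 0.25, ("B", "C"): 0.5}
    for gene, score in differential_hub_ranking(case, control):
        print(gene, score)
```

In this toy input, gene A ranks first because both of its edges change between the two conditions.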
9.  Ameliorative Effects of a Combination of Baicalin, Jasminoidin and Cholic Acid on Ibotenic Acid-Induced Dementia Model in Rats 
PLoS ONE  2013;8(2):e56658.
Aims
To investigate the therapeutic effects and acting mechanism of a combination of Chinese herb active components, i.e., a combination of baicalin, jasminoidin and cholic acid (CBJC) on Alzheimer’s disease (AD).
Methods
Male rats were intracerebroventricularly injected with ibotenic acid (IBO), and CBJC was orally administered. Therapeutic effect was evaluated with the Morris water maze test, FDG-PET examination, and histological examination, and the acting mechanism was studied with DNA microarrays and western blotting.
Results
CBJC treatment significantly attenuated IBO-induced abnormalities in cognition, brain functional images, and brain histological morphology. Additionally, the expression levels of 19 genes in the forebrain were significantly influenced by CBJC; approximately 60% of these genes were related to neuroprotection and neurogenesis, whereas others were related to anti-oxidation, protein degradation, cholesterol metabolism, stress response, angiogenesis, and apoptosis. Expression of these genes was increased, except for the gene related to apoptosis. Changes in expression for 5 of these genes were confirmed by western blotting.
Conclusion
CBJC can ameliorate the IBO-induced dementia in rats and may be significant in the treatment of AD. The therapeutic mechanism may be related to CBJC’s modulation of a number of processes, mainly through promotion of neuroprotection and neurogenesis, with additional promotion of anti-oxidation, protein degradation, etc.
doi:10.1371/journal.pone.0056658
PMCID: PMC3577735  PMID: 23437202
10.  An Overview of Population Genetic Data Simulation 
Abstract
Simulation studies in population genetics play an important role in helping to better understand the impact of various evolutionary and demographic scenarios on sequence variation and sequence patterns, and they also permit investigators to better assess and design analytical methods in the study of disease-associated genetic factors. To facilitate these studies, it is imperative to develop simulators with the capability to accurately generate complex genomic data under various genetic models. Currently, a number of efficient simulation software packages for large-scale genomic data are available, and new simulation programs with more sophisticated capabilities and features continue to emerge. In this article, we review the three basic simulation frameworks—coalescent, forward, and resampling—and some of the existing simulators that fall under these frameworks, comparing them with respect to their evolutionary and demographic scenarios, their computational complexity, and their specific applications. Additionally, we address some limitations in current simulation algorithms and discuss future challenges in the development of more powerful simulation tools.
doi:10.1089/cmb.2010.0188
PMCID: PMC3244809  PMID: 22149682
backward simulators; disease association study; forward simulators; genome simulation; resampling
11.  Comparative Analysis of Methods for Identifying Recurrent Copy Number Alterations in Cancer 
PLoS ONE  2012;7(12):e52516.
Recurrent copy number alterations (CNAs) play an important role in cancer genesis. While a number of computational methods have been proposed for identifying such CNAs, their relative merits remain largely unknown in practice since very few efforts have been focused on comparative analysis of the methods. To facilitate studies of recurrent CNA identification in cancer genomes, it is imperative to conduct a comprehensive comparison of performance and limitations among existing methods. In this paper, six representative methods proposed in the past six years are compared. These include one-stage and two-stage approaches, working with raw intensity ratio data and discretized data respectively. They are based on various techniques such as kernel regression, correlation matrix diagonal segmentation, semi-parametric permutation and cyclic permutation schemes. We explore multiple criteria including type I error rate, detection power, Receiver Operating Characteristics (ROC) curve and the area under curve (AUC), and computational complexity, to evaluate performance of the methods under multiple simulation scenarios. We also characterize their performance on two real datasets from lung adenocarcinoma and glioblastoma. This comparison study reveals general characteristics of the existing methods for identifying recurrent CNAs, and further provides new insights into their strengths and weaknesses. We believe it will help accelerate the development of novel and improved methods.
doi:10.1371/journal.pone.0052516
PMCID: PMC3527554  PMID: 23285074
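Among the evaluation criteria listed above, the area under the ROC curve is easy to state in code. The sketch below uses the rank-based (Mann-Whitney) formulation: AUC equals the fraction of (positive, negative) pairs in which the positive item gets the higher score, with ties counted as one half. The function name `roc_auc` and the toy scores are assumptions for this example; the abstract does not say how the compared methods' AUCs were computed.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve from detection scores and 0/1 ground-truth labels."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive correctly ranked above negative
            elif p == q:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy example: scores for four markers, two of which are truly altered.
    print(roc_auc([0.9, 0.3, 0.8, 0.1], [1, 1, 0, 0]))
```

The pairwise loop is quadratic in the number of items, which is fine for illustration; a sort-based version would be preferable for genome-wide marker sets.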
12.  Genome-wide identification of significant aberrations in cancer genome 
BMC Genomics  2012;13:342.
Background
Somatic Copy Number Alterations (CNAs) in human genomes are present in almost all human cancers. Systematic efforts to characterize such structural variants must effectively distinguish significant consensus events from random background aberrations. Here we introduce Significant Aberration in Cancer (SAIC), a new method for characterizing and assessing the statistical significance of recurrent CNA units. Three main features of SAIC include: (1) exploiting the intrinsic correlation among consecutive probes to assign a score to each CNA unit instead of single probes; (2) performing permutations on CNA units that preserve correlations inherent in the copy number data; and (3) iteratively detecting Significant Copy Number Aberrations (SCAs) and estimating an unbiased null distribution by applying an SCA-exclusive permutation scheme.
Results
We test and compare the performance of SAIC against four peer methods (GISTIC, STAC, KC-SMART, CMDS) on a large number of simulation datasets. Experimental results show that SAIC outperforms peer methods in terms of larger area under the Receiver Operating Characteristics curve and increased detection power. We then apply SAIC to analyze structural genomic aberrations acquired in four real cancer genome-wide copy number data sets (ovarian cancer, metastatic prostate cancer, lung adenocarcinoma, glioblastoma). When compared with previously reported results, SAIC successfully identifies most SCAs known to be of biological significance and associated with oncogenes (e.g., KRAS, CCNE1, and MYC) or tumor suppressor genes (e.g., CDKN2A/B). Furthermore, SAIC identifies a number of novel SCAs in these copy number data that encompass tumor related genes and may warrant further studies.
Conclusions
Supported by a well-grounded theoretical framework, SAIC has been developed and used to identify SCAs in various cancer copy number data sets, providing useful information to study the landscape of cancer genomes. Open-source and platform-independent SAIC software is implemented using C++, together with R scripts for data formatting and Perl scripts for user interfacing, and it is easy to install and efficient to use. The source code and documentation are freely available at http://www.cbil.ece.vt.edu/software.htm.
doi:10.1186/1471-2164-13-342
PMCID: PMC3428679  PMID: 22839576
13.  TAGCNA: A Method to Identify Significant Consensus Events of Copy Number Alterations in Cancer 
PLoS ONE  2012;7(7):e41082.
Somatic copy number alteration (CNA) is a common phenomenon in cancer genomes. Distinguishing significant consensus events (SCEs) from random background CNAs in a set of subjects has proven to be a valuable tool for studying cancer. In order to identify SCEs with an acceptable type I error rate, better computational approaches should be developed based on reasonable statistics and null distributions. In this article, we propose a new approach named TAGCNA for identifying SCEs in somatic CNAs that may encompass cancer driver genes. TAGCNA employs a peel-off permutation scheme to generate a reasonable null distribution based on a prior step of selecting tag CNA markers from the genome being considered. We demonstrate the statistical power of TAGCNA on simulated ground truth data, and validate its applicability using two publicly available cancer datasets: lung and prostate adenocarcinoma. TAGCNA identifies SCEs that are known to be involved with proto-oncogenes (e.g. EGFR, CDK4) and tumor suppressor genes (e.g. CDKN2A, CDKN2B), and provides many additional SCEs with potential biological relevance in these data. TAGCNA can be used to analyze the significance of CNAs in various cancers. It is implemented in R and is freely available at http://tagcna.sourceforge.net/.
doi:10.1371/journal.pone.0041082
PMCID: PMC3399811  PMID: 22815924
14.  Performance analysis of novel methods for detecting epistasis 
BMC Bioinformatics  2011;12:475.
Background
Epistasis is recognized as fundamentally important for understanding the mechanism of disease-causing genetic variation. Though many novel methods for detecting epistasis have been proposed, few studies focus on their comparison. A comprehensive comparison study is urgently needed and is a step toward applying these methods in practice.
Results
This paper aims at a comparison study of epistasis detection methods through applying related software packages on datasets. For this purpose, we categorize methods according to their search strategies, and select five representative methods (TEAM, BOOST, SNPRuler, AntEpiSeeker and epiMODE) originating from different underlying techniques for comparison. The methods are tested on simulated datasets with different size, various epistasis models, and with/without noise. The types of noise include missing data, genotyping error and phenocopy. Performance is evaluated by detection power (three forms are introduced), robustness, sensitivity and computational complexity.
Conclusions
None of the selected methods is perfect in all scenarios, and each has its own merits and limitations. In terms of detection power, AntEpiSeeker performs best on detecting epistasis displaying marginal effects (eME) and BOOST performs best on identifying epistasis displaying no marginal effects (eNME). In terms of robustness, AntEpiSeeker is robust to all types of noise on eME models, BOOST is robust to genotyping error and phenocopy on eNME models, and SNPRuler is robust to phenocopy on eME models and missing data on eNME models. In terms of sensitivity, AntEpiSeeker is the winner on eME models and both SNPRuler and BOOST perform well on eNME models. In terms of computational complexity, BOOST is the fastest among the methods. In terms of overall performance, AntEpiSeeker and BOOST are recommended as the efficient and effective methods. This comparison study may provide guidelines for applying the methods and further clues for epistasis detection.
doi:10.1186/1471-2105-12-475
PMCID: PMC3259123  PMID: 22172045
15.  The essence of linkage-based imprinting detection: Comparing power, type 1 error, and the effects of confounders in two different analysis approaches 
Annals of human genetics  2010;74(3):248-262.
Summary
Background and goal
The epigenetic phenomenon of imprinting is critical to understanding disease expression. However, it is hard to detect and can be species- and tissue-specific. One approach is to detect imprinting by taking advantage of linkage information. Although imprinting detection methods exist, the effects of potential confounders, such as heterogeneity, sex-specific penetrance, and differential sex-based ascertainment, have not been explored in depth. In this study we explored possible confounders using two different imprinting detection approaches. Our goal was to understand the essence of how imprinting and linkage interact and to elucidate the underlying issues in existing imprinting detection approaches.
Methods
One method (PP) models imprinting by maximizing lod scores with respect to parent-specific penetrances. The other method (DRF) approximates imprinting by maximizing two-point lods with respect to differential male-female recombination fractions. We compared power, type 1 error, and confounder effects in these two linkage-based imprinting detection methods using two-point linkage analysis for simplicity. We computer-simulated data, determining power and type 1 error for imprinting detection among datasets with detectable linkage. We generated data with and without imprinting, with and without heterogeneity, and with varying reduced penetrance, family and dataset size. We also examined non-imprinting situations that could mimic imprinting, e.g., sex-specific penetrances, and a scenario requiring a sex-specified affected parent for ascertainment.
Results
Without heterogeneity, PP had more imprinting-detecting power than DRF. Surprisingly, PP’s power increased when parental affectedness status was ignored, but decreased with heterogeneity. With heterogeneity, type 1 error could increase dramatically for both methods. However, DRF’s power also appeared to increase under heterogeneity, more than could be attributed to the inflated type 1 error. We determined the reasons behind these phenomena.
The presence of sex-specific penetrance increased false positives for PP but not for DRF. Ascertainment through an affected “mother” in unimprinted data did not lead to false positives with either method. For PP, increased information may depend on non-penetrant heterozygous individuals, arguing against using affected sib pairs or other affecteds-only methods.
Conclusions
The high type 1 error levels under some circumstances mean that these methods must be used cautiously. Using differential recombination fractions to approximate imprinting has certain advantages and should be incorporated into future imprinting detection programs.
doi:10.1111/j.1469-1809.2010.00568.x
PMCID: PMC2998764  PMID: 20374235
Imprinting; linkage analysis; robust methods; lod score maximization; recombination fraction; computer simulation; epigenetic phenomena; confounders
16.  Probability Theory-based SNP Association Study Method for Identifying Susceptibility Loci and Genetic Disease Models in Human Case-Control Data 
One of the most challenging tasks in studying common complex human diseases is to search for both strong and weak susceptibility single-nucleotide polymorphisms (SNPs) and to identify forms of genetic disease models. A number of methods have been proposed for this purpose. Many of them have not been validated on diverse genome datasets, so their practical performance remains unclear. In this paper, we present a novel SNP association study method based on probability theory, called ProbSNP. The method first evaluates the joint probabilities of SNPs in combination with disease status and selects those with the lowest joint probabilities as susceptibility SNPs; it then identifies some forms of genetic disease models by testing multiple-locus interactions among the selected SNPs. The joint probabilities of combined SNPs are estimated by establishing Gaussian distribution probability density functions, in which the related parameters (i.e., mean value and standard deviation) are evaluated based on allele and haplotype frequencies. Finally, we test and validate the method using various genome datasets. We find that ProbSNP is remarkably successful when applied to both simulated genome data and real genome-wide data.
doi:10.1109/TNB.2010.2070805
PMCID: PMC3029504  PMID: 20840904
Association study; SNPs; probability theory; Gaussian distribution; case-control
17.  Pattern Expression Nonnegative Matrix Factorization: Algorithm and Applications to Blind Source Separation 
Independent component analysis (ICA) is a widely applicable and effective approach in blind source separation (BSS), with the limitation that the sources must be statistically independent. However, a more common situation is blind source separation for the nonnegative linear model (NNLM), where the observations are nonnegative linear combinations of nonnegative sources, and the sources may be statistically dependent. We propose a pattern expression nonnegative matrix factorization (PE-NMF) approach from the viewpoint of using basis vectors most effectively to express patterns. Two regularization or penalty terms are added to the original loss function of a standard nonnegative matrix factorization (NMF) for effective expression of patterns with basis vectors in the PE-NMF. A learning algorithm is presented, and the convergence of the algorithm is proved theoretically. Three illustrative examples on blind source separation, including heterogeneity correction for gene microarray data, indicate that the sources can be successfully recovered with the proposed PE-NMF when the two parameters can be suitably chosen from prior knowledge of the problem.
doi:10.1155/2008/168769
PMCID: PMC2430033  PMID: 18566689
18.  Construction of the model for the Genetic Analysis Workshop 14 simulated data: genotype-phenotype relationships, gene interaction, linkage, association, disequilibrium, and ascertainment effects for a complex phenotype 
BMC Genetics  2005;6(Suppl 1):S3.
The Genetic Analysis Workshop 14 simulated dataset was designed 1) To test the ability to find genes related to a complex disease (such as alcoholism). Such a disease may be given a variety of definitions by different investigators, have associated endophenotypes that are common in the general population, and is likely to be not one disease but a heterogeneous collection of clinically similar, but genetically distinct, entities. 2) To observe the effect on genetic analysis and gene discovery of a complex set of gene × gene interactions. 3) To allow comparison of microsatellite vs. large-scale single-nucleotide polymorphism (SNP) data. 4) To allow testing of association to identify the disease gene and the effect of moderate marker × marker linkage disequilibrium. 5) To observe the effect of different ascertainment/disease definition schemes on the analysis. Data was distributed in two forms. Data distributed to participants contained about 1,000 SNPs and 400 microsatellite markers. Internet-obtainable data consisted of a finer 10,000 SNP map, which also contained data on controls. While disease characteristics and parameters were constant, four "studies" used varying ascertainment schemes based on differing beliefs about disease characteristics. One of the studies contained multiplex two- and three-generation pedigrees with at least four affected members. The simulated disease was a psychiatric condition with many associated behaviors (endophenotypes), almost all of which were genetic in origin. The underlying disease model contained four major genes and two modifier genes. The four major genes interacted with each other to produce three different phenotypes, which were themselves heterogeneous. The population parameters were calibrated so that the major genes could be discovered by linkage analysis in most datasets. The association evidence was more difficult to calibrate but was designed to find statistically significant association in 50% of datasets. 
We also simulated some marker × marker linkage disequilibrium around some of the genes and also in areas without disease genes. We tried two different methods to simulate the linkage disequilibrium.
doi:10.1186/1471-2156-6-S1-S3
PMCID: PMC1866756  PMID: 16451639
