The NF-κB and IRF transcription factor families are major players in inflammation and antiviral response and act as two major effectors of the innate immune response (IIR). The regulatory mechanisms of activation of these two pathways and their interactions during the IIR are only partially known.
Our in silico findings report that there is cross-regulation between both pathways at the level of gene transcription regulation, mediated by the presence of binding sites for both factors in promoters of genes essential for these pathways. These findings agree with recent experimental data reporting crosstalk between pathways activated by RIG-I and TLR3 receptors in response to pathogens.
We present an extended crosstalk diagram of the IRF and NF-κB pathways. We conclude that members of the NF-κB family may directly affect regulation of the IRF family, whereas IRF members affect regulation of the NF-κB family indirectly, via other transcription factors such as AP-1 and SP1.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1511-7) contains supplementary material, which is available to authorized users.
NF-κB; IRF3; Transcription factors; Crosstalk; Innate immune response
Multiple algorithms are used to predict the impact of missense mutations on protein structure and function using algorithm-generated sequence alignments or manually curated alignments. We compared the accuracy of SIFT, Align-GVGD, PolyPhen-2 and Xvar, each using its native alignment, when generating functionality predictions of well-characterized missense mutations (n = 267) within the BRCA1, MSH2, MLH1 and TP53 genes. We also evaluated the impact of the alignment employed on predictions from these algorithms (except Xvar) when supplied the same four alignments: those automatically generated by (1) SIFT, (2) PolyPhen-2, and (3) UniProt, and (4) a manually curated alignment tuned for Align-GVGD. Alignments differ in sequence composition and evolutionary depth. Data-based receiver operating characteristic curves employing the native alignment for each algorithm yield an area under the curve of 78-79% for all four algorithms. Predictions from the PolyPhen-2 algorithm were least dependent on the alignment employed. In contrast, Align-GVGD predicts all variants to be neutral when provided alignments with a large number of sequences. Of note, algorithms make different predictions of variants even when provided the same alignment and do not necessarily perform best using their own alignment. Thus, researchers should consider optimizing both the algorithm and sequence alignment employed in missense prediction.
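The area under a receiver operating characteristic curve can be read as the probability that a randomly chosen deleterious variant receives a higher score than a randomly chosen neutral one. The following is a minimal sketch of that calculation; the function name `roc_auc` and the score lists are illustrative assumptions and are not taken from the study or from any of the four tools.

```python
def roc_auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison.

    scores_pos: scores for variants known to be deleterious (hypothetical input)
    scores_neg: scores for variants known to be neutral (hypothetical input)
    A tie between a positive and a negative score counts as half a win.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives some context for the 78-79% reported above.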
multiple sequence alignment; SIFT; PolyPhen-2; Align-GVGD; Xvar; BRCA1; MSH2; MLH1; TP53
We present an integrated dynamical cross-talk model of the epithelial innate immune response (IIR) incorporating RIG-I and TLR3 as the two major pattern recognition receptors (PRR) converging on the RelA and IRF3 transcriptional effectors. bioPN simulations reproduce biologically relevant gene and protein abundance measurements in response to time-course, gene-silencing and dose-response perturbations, both at the population and single-cell level. Our computational predictions suggest that RelA and IRF3 are under auto- and cross-regulation. We predict, and confirm experimentally, that RIG-I mRNA expression is controlled by IRF7. We also predict the existence of a TLR3-dependent, IRF3-independent transcription factor (or factors) that control(s) expression of MAVS, IRF3 and members of the IKK family. Our model confirms the observed dsRNA dose-dependence of oscillatory patterns in single cells, with periods of 1–3 hr. Model fitting to time series, matched by knockdown data, suggests that the NF-κB module operates in a different regime (with different coefficient values) than in the TNFα-stimulation experiments. In future studies, this model will serve as a foundation for identification of virus-encoded IIR antagonists and examination of stochastic effects of viral replication.
Our model generates simulated time series, which reproduce the noisy oscillatory patterns of activity (with 1–3 hour period) observed in individual cells. Our work supports the hypothesis that the IIR is a phenomenon that emerged by evolution despite highly variable responses at an individual cell level.
In this study, we developed a method for modeling the progression and detection of lung cancer based on smoking behavior at an individual level. The model yields the characteristics of lung cancer in a population at the time of diagnosis. Lung cancer data from the Surveillance, Epidemiology and End Results (SEER) database collected between 2004 and 2008 were used to fit the lung cancer progression and detection model. The fitted model, combined with a smoking-based carcinogenesis model, was used to predict the distribution of age, gender, tumor size, disease stage and smoking status at diagnosis, and the results were validated against independent data from the SEER database collected from 1988 to 1999. The model accurately predicted the gender distribution and median age of lung cancer patients at diagnosis, and reasonably predicted the joint tumor size and disease stage distribution.
Sophisticated modeling techniques can be powerful tools to help us understand the effects of cancer control interventions on population trends in cancer incidence and mortality. Readers of journal articles are however rarely supplied with modeling details.
Six modeling groups collaborated as part of the National Cancer Institute’s Cancer Intervention and Surveillance Modeling Network (CISNET) to investigate the contribution of US tobacco control efforts towards reducing lung cancer deaths over the period 1975 to 2000.
The models included in this monograph were developed independently and use distinct, complementary approaches towards modeling the natural history of lung cancer. The models used the same data for inputs and agreed on the design of the analysis and the outcome measures. This article highlights aspects of the models that are most relevant to similarities of or differences between the results. Structured comparisons can increase the transparency of these complex models.
Lung cancer; population trends; tobacco control; modeling
Tumor size at diagnosis (TSD) indirectly reflects tumor growth rate. The relationship between TSD and smoking is poorly understood. The aim of the study was to determine the relationship between smoking and TSD. We reviewed the electronic medical records of 1712 newly diagnosed and previously untreated non-small cell lung cancer (NSCLC) patients and collected tumor characteristics. Demographic and epidemiologic characteristics were derived from questionnaires administered during personal interviews. Univariate and multivariate linear regression models were used to evaluate the relationship between TSD and smoking, controlling for demographic and clinical factors. We also investigated the relationship between the rs1051730 SNP in an intron of the CHRNA3 gene (the polymorphism most significantly associated with lung cancer risk and smoking behavior) and TSD. We found a strong dose-dependent relationship between TSD and smoking. Current smokers had the largest TSD and never smokers the smallest, with former smokers intermediate. In the multivariate linear regression model, smoking status (never, former, and current), histological type (adenocarcinoma vs SqCC), and gender were significant predictors of TSD. Smoking duration and intensity may explain the gender effect in predicting TSD. We found that the variant allele of rs1051730 in the CHRNA3 gene was associated with larger TSD of squamous cell carcinoma. In the multivariate linear regression model, both rs1051730 and smoking were significant predictors of the size of squamous carcinomas. We conclude that smoking is positively associated with lung tumor size at the time of diagnosis.
Lung cancer; tumor size; epidemiologic characteristics; risk factors; CHRNA3
We present codimensional PCA, a novel and straightforward method for resolving sample heterogeneity within a set of cryo-EM 2D projection images of macromolecular assemblies. The method employs Principal Component Analysis (PCA) of resampled 3D structures computed using subsets of 2D data obtained with a novel hypergeometric sampling scheme. PCA provides us with a small subset of dominating “eigenvolumes” of the system, whose reprojections are compared with experimental projection data to yield their factorial coordinates constructed in a common framework of the 3D space of the macromolecule. Codimensional PCA is unique in the dramatic reduction of dimensionality of the problem, which facilitates rapid determination of both the plausible number of conformers in the sample and their 3D structures. We applied codimensional PCA to a complex data set of the T. thermophilus 70S ribosome, identified four major conformational states, and visualized high mobility of the stalk base region.
Lung cancer is the leading cancer killer for both men and women worldwide. Over 80% of lung cancers are attributed to smoking. In this analysis, the authors propose to use a two-stage clonal expansion (TSCE) model to predict an individual’s lung cancer risk based on gender and smoking history. The TSCE model is traditionally fitted to prospective cohort data. Here, the authors describe a new method that allows for the reconstruction of cohort data from the combination of risk factor data obtained from a case-control study, and tabled incidence/mortality rate data, and discuss alternative approaches. The method is applied to fit a TSCE model based on smoking. The fitted model is validated against independent data from the control arm of a lung cancer chemoprevention trial, CARET, where it accurately predicted the number of lung cancer deaths observed.
TSCE model; lung cancer; risk prediction; smoking
The efficacy of CT screening for lung cancer remains controversial as results from the National Lung Screening Trial (NLST) are not yet available. In this study, we use data from a single-arm CT screening trial to estimate the mortality reduction using a modeling-based approach to construct a control comparison arm.
In order to estimate the potential lung cancer mortality reduction due to CT screening, a previously developed and validated model was applied to the screening trial to predict the number of lung cancer deaths in the absence of screening. Using age, gender, and smoking characteristics matching the trial participants, the model was used to simulate 5000 trials in the absence of CT screening to produce the expected number of lung cancer deaths along with confidence intervals, while adjusting for healthy volunteer bias.
There were 64 observed lung cancer deaths in the screening cohort (n=7995), while the model predicted 117.7 (95%CI: 98, 139) indicating a mortality reduction of 45.6% (p<0.001). When a more conservative healthy volunteer adjustment is applied, 111.3 lung cancer deaths are predicted (95%CI: 91, 132) for a lung-cancer-specific mortality reduction of 42.5% (p<0.001).
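The reported mortality reductions follow directly from the observed and model-predicted death counts as one minus their ratio. A minimal sketch of that arithmetic, with `mortality_reduction` as an illustrative helper name rather than anything from the study:

```python
def mortality_reduction(observed_deaths, expected_deaths):
    """Relative reduction: 1 - observed / expected (expected = model prediction without screening)."""
    return 1.0 - observed_deaths / expected_deaths


# Figures quoted in the text: 64 observed deaths against 117.7 or 111.3 predicted.
primary = mortality_reduction(64, 117.7)
conservative = mortality_reduction(64, 111.3)
print(f"{100 * primary:.1f}%  {100 * conservative:.1f}%")
```

This reproduces the point estimates only; the confidence intervals and p-values come from the simulated trials and are not recomputed here.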
These results indicate that CT screening along with early stage treatment can reduce lung cancer-specific mortality. This mortality reduction is greatly influenced by the protocol of nodule follow-up and treatment, and length of follow-up.
Lung cancer; CT screening; Mortality reduction; TSCE model; healthy volunteer effect
Analysis of protein/small molecule interactions is crucial in the discovery of new drug candidates and lead structure optimization. Small biomolecules (ligands) are highly flexible and may adopt numerous conformations upon binding to the protein. Using computer simulations instead of sophisticated laboratory procedures may significantly reduce cost of some stages of drug development. Inspired by probabilistic path planning in robotics, stochastic roadmap methodology can be regarded as a very interesting approach to effective sampling of ligand conformational space around a protein molecule. Protein-ligand interactions are divided into two parts: electrostatics, modeled by the Poisson-Boltzmann equation, and van der Waals interactions, represented by the Lennard-Jones potential. The results are promising; it can be shown that locations of binding sites predicted by the simulation are in agreement with those revealed by experimental x-ray crystallography of protein-ligand complexes. We wanted to extend our knowledge beyond the current molecular modeling tools to arrive at a better understanding of the ligand-binding process. To this end, we investigated a two-level model of protein-ligand interaction and sampling of ligand conformational space covering the entire surface of protein target. Supplementary Material is available at www.liebertonline.com/cmb.
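For orientation, the van der Waals term mentioned above is commonly written in the standard 12-6 Lennard-Jones form. The sketch below evaluates that form for a single atom pair; the parameter names `epsilon` and `sigma`, the reduced units, and the default values are assumptions for illustration, and the electrostatic (Poisson-Boltzmann) part is not shown.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones pair energy: 4*eps*((sigma/r)**12 - (sigma/r)**6).

    r       : interatomic distance (must be > 0)
    epsilon : well depth (assumed reduced units)
    sigma   : distance at which the energy crosses zero
    """
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)
```

The energy is zero at r = sigma and reaches its minimum of -epsilon at r = 2^(1/6) sigma, so steeply repulsive contacts and weakly attractive ones are both captured by one expression.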
binding site discovery; Poisson-Boltzmann equation; protein-ligand interaction; Stochastic Roadmap Simulation
The NF-κB family plays a prominent role in the innate immune response, cell cycle activation and apoptosis. Upon stimulation by pathogen-associated patterns, such as viral RNA, a kinase cascade is activated, which strips NF-κB of its inhibitor molecule IκBα and allows it to translocate into the nucleus. Once in the nucleus, it activates transcription of approximately 90 genes whose kinetics of expression differ relative to when NF-κB translocates into the nucleus, referred to as Early, Middle and Late genes. It is not obvious what mechanism is responsible for segregation of the genes’ timing of transcriptional response.
It is likely that the differences in timing are due, in part, to the number and type of transcription factor binding sites (TFBS), required for NF-κB itself as well as for the putative cofactors, in the Early versus Late genes. We therefore applied an evolutionary analysis of conserved TFBS. We also examined whether transcription dynamics were related to the presence of AU-rich elements (ARE) located in the 3′UTR of the mRNA, because recent studies have shown that the presence of AREs is associated with rapid gene induction. We found that Early genes were significantly enriched in NF-κB binding sites occurring in evolutionarily conserved domains compared to genes in the Late group. We also found that Early genes had a significantly greater number of ARE sequences in the 3′UTR of the gene. The similarities observed among the Early genes were seen in comparison with distant species, while the promoter regions of the Late genes were much more diversified. Based on the promoter structure and ARE content, Middle genes can be divided into two subgroups which show similarities to Early and Late genes, respectively.
Our data suggest that the rapid response of the NF-κB dependent Early genes may be due both to increased gene transcription driven by NF-κB loading and to the contribution of mRNA instability to the transcript profiles. Wider phylogenetic analysis of NF-κB dependent genes provides insight into the degree of cross-species similarity found in the Early genes, as opposed to the many differences in promoter structure found among the Late genes. These data suggest that activation and expression of the Late genes is much more species-specific than that of the Early genes.
Considerable effort has been expended on tobacco control strategies in the United States since the mid-1950s. However, we have little quantitative information on how changes in smoking behaviors have impacted lung cancer mortality. We quantified the cumulative impact of changes in smoking behaviors that started in the mid-1950s on lung cancer mortality in the United States over the period 1975–2000.
A consortium of six groups of investigators used common inputs consisting of simulated cohort-wise smoking histories for the birth cohorts of 1890 through 1970 and independent models to estimate the number of US lung cancer deaths averted during 1975–2000 as a result of changes in smoking behavior that began in the mid-1950s. We also estimated the number of deaths that could have been averted had tobacco control been completely effective in eliminating smoking after the Surgeon General’s first report on Smoking and Health in 1964.
Approximately 795 851 US lung cancer deaths were averted during the period 1975–2000: 552 574 among men and 243 277 among women. In the year 2000 alone, approximately 70 218 lung cancer deaths were averted: 44 135 among men and 26 083 among women. However, these numbers are estimated to represent approximately 32% of lung cancer deaths that could have potentially been averted during the period 1975–2000, 38% of the lung cancer deaths that could have been averted in 1991–2000, and 44% of lung cancer deaths that could have been averted in 2000.
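The totals quoted above are the sums of the sex-specific estimates, which can be checked in a few lines. The dictionary and function names below are illustrative only; the counts are those stated in the text.

```python
# Lung cancer deaths averted, as quoted in the text.
averted_1975_2000 = {"men": 552_574, "women": 243_277}
averted_2000 = {"men": 44_135, "women": 26_083}


def total(by_sex):
    """Sum the sex-specific counts."""
    return sum(by_sex.values())


def share(by_sex, group):
    """Fraction of the total contributed by one group."""
    return by_sex[group] / total(by_sex)


print(total(averted_1975_2000), total(averted_2000))
```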
Our results reflect the cumulative impact of changes in smoking behavior since the 1950s. Despite a large impact of changing smoking behaviors on lung cancer deaths, lung cancer remains a major public health problem. Continued efforts at tobacco control are critical to further reduce the burden of this disease.
Detection of early stage non-small cell lung cancer (NSCLC) is commonly believed to be incidental. Understanding the reasons that caused initial detection of these patients is important for early diagnosis. However, these reasons are not well studied.
We retrospectively reviewed medical records of patients diagnosed with stage I or II NSCLC between 2000 and 2009 at UT MD Anderson Cancer Center. Information on suggestive LC-symptoms or other reasons that led to detection was extracted from patients' medical records. We applied univariate and multivariate analyses to evaluate the association of suggestive LC-symptoms with tumor size and patient survival.
Of the 1396 early stage LC patients, 733 (52.5%) presented with suggestive LC-symptoms as the chief complaint. A further 347 (24.9%) and 287 (20.6%) were diagnosed because of regular check-ups and evaluations for other diseases, respectively. The proportion of suggestive LC-symptom-caused detection had a linear relationship with tumor size (correlation 0.96; p<.0001). After adjustment for age, gender, race, smoking status, therapy, and stage, symptom-caused detection showed no significant difference in overall and LC-specific survival when compared with other (non-symptom-caused) detection.
Symptoms suggestive of LC are the most common reason for detection of early NSCLC. They were also associated with tumor size at diagnosis, suggesting that early stage LC patients do develop symptoms. Presence of symptoms in early stages did not compromise survival. A symptom-based alerting system or guidelines may be worthy of further study to benefit individuals at high risk of NSCLC.
Volunteering participants in disease studies tend to be healthier than the general population partially due to specific enrollment criteria. Using modeling to accurately predict outcomes of cohort studies enrolling volunteers requires adjusting for the bias introduced in this way. Here we propose a new method to account for the effect of a specific form of healthy volunteer bias resulting from imposing disease status-related eligibility criteria, on disease-specific mortality, by explicitly modeling the length of the time interval between the moment when the subject becomes ineligible for the study, and the outcome.
Using survival time data from 1190 newly diagnosed lung cancer patients at MD Anderson Cancer Center, we model the time from clinical lung cancer diagnosis to death using an exponential distribution to approximate the length of this interval for a study where lung cancer death serves as the outcome. Incorporating this interval into our previously developed lung cancer risk model, we adjust for the effect of disease status-related eligibility criteria in predicting the number of lung cancer deaths in the control arm of CARET. The effect of the adjustment using the MD Anderson-derived approximation is compared to that based on SEER data.
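Under an exponential model, the maximum-likelihood estimate of the rate is the number of observations divided by the total observed time, assuming no censoring. The sketch below illustrates that step only; the function names and the example survival times are hypothetical, and the study's actual fitting procedure (including any handling of censored patients) is not described in the text.

```python
import math


def fit_exponential_rate(times):
    """MLE of the exponential rate for uncensored times: n / sum(times)."""
    return len(times) / sum(times)


def survival(t, rate):
    """Probability that the diagnosis-to-death interval exceeds t."""
    return math.exp(-rate * t)


# Hypothetical diagnosis-to-death intervals in years (not study data).
example_times = [1.0, 2.0, 3.0]
rate = fit_exponential_rate(example_times)
print(rate, survival(2.0, rate))
```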
Using the adjustment developed in conjunction with our existing lung cancer model, we are able to accurately predict the number of lung cancer deaths observed in the control arm of CARET.
The resulting adjustment was accurate in predicting the lower rates of disease observed in the early years while still maintaining reasonable prediction ability in the later years of the trial. This method could be used to adjust for, or predict the duration and relative effect of any possible biases related to disease-specific eligibility criteria in modeling studies of volunteer-based cohorts.
The Mayo Lung Project (MLP) was a randomized clinical trial designed to test whether periodic screening by chest X-ray reduces lung cancer (LC) mortality in high-risk male smokers. Among MLP participants, more LC deaths were found in the screening arm both at the trial’s end and after long-term follow-up. Overdiagnosis is widely cited as an explanation for the MLP results, whereas a role of excess LC risk attributable to undergoing numerous chest X-ray screenings has been largely unexamined. We examine the consistency of the MLP data with a modified two-stage clonal expansion (TSCE) model of excess LC risk.
Using a simulation model calibrated to the initial MLP data, we estimate the statistical variance of LC incidence and mortality between the screening and control arms. We derive and apply a Bayesian estimation framework using a modified version of the TSCE model to evaluate the role of excess LC risk attributable to chest X-ray screening.
Based on our simulations, we find that the overall difference in LC deaths and incidence between study and control arms is unlikely (p=0.0424, p=0.0104) assuming no LC excess risk. We estimate that the 10-year excess LC risk for a 60-year-old male smoker having received 10 chest X-ray screens is 0.574% (p=0.0021).
The excess LC risk observed among screening arm participants is statistically significant with respect to the TSCE model framework, due in part to the incorporation of key risk correlates of age and screen frequency into the estimation framework.
Lung Cancer Screening; Mathematical Model; Simulation; Mayo Lung Project
We review a large volume of literature concerning mathematical models of cancer therapy, oriented towards optimization of treatment protocols. The review, although partly idiosyncratic, covers such major areas of therapy optimization as phase-specific chemotherapy, antiangiogenic therapy and therapy under drug resistance. We start from early cell-cycle progression models, very simple but admitting explicit mathematical solutions, based on methods of control theory. We continue with more complex models involving evolution of drug resistance and pharmacokinetic and pharmacodynamic effects. Then, we consider two more recent areas: angiogenesis of tumors and molecular signaling within and among cells. We discuss biological background and mathematical techniques of this field, which has a large although only partly realized potential for contributing to cancer treatment.
mathematical modeling; chemotherapy optimization; cell cycle models; pharmacokinetics; pharmacodynamics
Alu elements occupy about eleven percent of the human genome and are still growing in copy numbers. Since Alu elements substantially impact the shape of our genome, there is a need for modeling the amplification, mutation and selection forces of these elements.
Our proposed theoretical neutral model follows a discrete-time branching process described by Griffiths and Pakes. From this model, we derive a limit frequency spectrum of the Alu element distribution, which serves as the theoretical, neutral frequency to which real Alu insertion data can be compared through statistical goodness of fit tests. Departures from the neutral frequency spectrum may indicate selection.
A comparison of the Alu sequence data, obtained by courtesy of Dr. Jerzy Jurka, with our model shows that the distributions of Alu sequences in the AluY family systematically deviate from the expected distribution derived from the branching process.
This observation suggests that Alu sequences do not evolve neutrally and might be under selection.
Macromolecular structure determination by cryo-electron microscopy (EM) and single particle analysis is based on the assumption that imaged molecules have identical structure. With the increased size of processed datasets, it becomes apparent that many complexes coexist in a mixture of conformational states or contain flexible regions. As the cryo-EM data are collected in the form of projections of imaged molecules, the information about variability of reconstructed density maps is not directly available. To address this problem, we describe a new implementation of the bootstrap resampling technique that yields estimates of voxel-by-voxel variance of a structure reconstructed from the set of its projections. We introduce a novel, highly efficient reconstruction algorithm that is based on direct Fourier inversion and incorporates correction for the transfer function of the microscope, thus extending the resolution limits of variance estimation. We also describe a validation method to determine the number of resampled volumes required to achieve a stable estimate of the variance. The proposed bootstrap method was applied to a dataset of the 70S ribosome complexed with tRNA and the elongation factor G. The variance map revealed regions of high variability: the L1 protein, the EF-G, the 30S head and the ratchet-like subunit rearrangement. The proposed method of variance estimation opens new possibilities for single particle analysis by extending applicability of the technique to heterogeneous datasets of macromolecules and to complexes with significant conformational variability.
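The bootstrap idea can be illustrated on a toy problem: resample the input set with replacement, recompute the estimate each time, and take the element-wise variance across the resampled estimates. In the sketch below a plain average stands in for 3D reconstruction, which is a deliberate simplification; it is not the direct Fourier inversion algorithm described above, and all names and defaults are assumptions.

```python
import random


def bootstrap_variance(samples, n_resamples=200, seed=0):
    """Element-wise bootstrap variance of the mean of `samples`.

    samples: list of equal-length lists of floats (toy stand-ins for images;
             averaging replaces the reconstruction step of the real method).
    Returns one variance value per element ("voxel").
    """
    rng = random.Random(seed)
    n = len(samples)
    n_vox = len(samples[0])
    estimates = []
    for _ in range(n_resamples):
        # Draw n samples with replacement and recompute the estimate.
        picked = [samples[rng.randrange(n)] for _ in range(n)]
        estimates.append([sum(s[v] for s in picked) / n for v in range(n_vox)])
    mean = [sum(e[v] for e in estimates) / n_resamples for v in range(n_vox)]
    return [
        sum((e[v] - mean[v]) ** 2 for e in estimates) / (n_resamples - 1)
        for v in range(n_vox)
    ]
```

Elements that are identical across all inputs yield zero variance, while elements that differ between inputs stand out, which is the same qualitative signal a variance map provides for flexible regions.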
single particle reconstruction; bootstrap; electron microscopy; ribosome; elongation cycle
Motivation: Interferon-β induced JAK-STAT signaling pathways contribute to mucosal immune recognition and an anti-viral state. Though the main molecular mechanisms constituting these pathways are known, neither the detailed structure of the regulatory network, nor its dynamics has yet been investigated. The objective of this work is to build a mathematical model for the pathway that would serve two purposes: (1) to reproduce experimental results in simulation of both early and late response to Interferon-β stimulation and (2) to explain experimental phenomena generating new hypotheses about regulatory mechanisms that cannot yet be tested experimentally.
Results: Experimentally determined time dependent changes in the major components of this pathway were used to build a mathematical model describing pathway dynamics in the form of ordinary differential equations. The experimental results suggested existence of unknown negative control mechanisms that were tested numerically using the model. Together, experimental and numerical data show that the epithelial JAK-STAT pathway might be subjected to previously unknown dynamic negative control mechanisms: (1) activation of dormant phosphatases and (2) inhibition of nuclear import of IRF1.
Availability: The model, written in Matlab, is available online at www.stat.rice.edu/~jsmieja/IFN
Supplementary information: Supplementary data are available at Bioinformatics online.
Numerous prospective and retrospective studies have clearly demonstrated a dose-related increased lung cancer risk associated with cigarette smoking, with evidence also for a genetic component to risk. In this study, using the two-stage clonal expansion stochastic model framework, for the first time we investigated the roles of both genetic susceptibility and smoking history in the initiation, clonal expansion, and malignant transformation processes in lung carcinogenesis, integrating information collected by a case–control study and a large-scale prospective cohort study. Our results show that individuals with suboptimal DNA repair capacity have enhanced transition rates of key events in carcinogenesis.
TSCE model; lung cancer; genetic susceptibility; cigarette smoking
Structural genomics projects such as the Protein Structure Initiative (PSI) yield many new structures, but often these have no known molecular functions. One approach to recover this information is to use 3D templates – structure-function motifs that consist of a few functionally critical amino acids and may suggest functional similarity when geometrically matched to other structures. Since experimentally determined functional sites are not common enough to define 3D templates on a large scale, this work tests a computational strategy to select relevant residues for 3D templates.
Based on evolutionary information and heuristics, an Evolutionary Trace Annotation (ETA) pipeline built templates for 98 enzymes, half taken from the PSI, and sought matches in a non-redundant structure database. On average each template matched 2.7 distinct proteins, of which 2.0 share the first three Enzyme Commission digits as the template's enzyme of origin. In many cases (61%) a single most likely function could be predicted as the annotation with the most matches, and in these cases such a plurality vote identified the correct function with 87% accuracy. ETA was also found to be complementary to sequence homology-based annotations. When matches are required to both geometrically match the 3D template and to be sequence homologs found by BLAST or PSI-BLAST, the annotation accuracy is greater than either method alone, especially in the region of lower sequence identity where homology-based annotations are least reliable.
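The plurality vote described above amounts to choosing the annotation carried by the largest number of matched proteins. A minimal sketch follows, assuming that a tie for first place yields no prediction; that tie rule, the function name, and the example labels are assumptions, not details taken from the ETA pipeline.

```python
from collections import Counter


def plurality_vote(match_annotations):
    """Return the most frequent annotation among template matches.

    match_annotations: list of function labels, one per matched protein
                       (e.g. three-digit EC strings; hypothetical input).
    Returns None when there are no matches or the top count is tied.
    """
    if not match_annotations:
        return None
    ranked = Counter(match_annotations).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # assumed behavior: no single most likely function
    return ranked[0][0]
```

For example, `plurality_vote(["3.4.21", "3.4.21", "1.1.1"])` selects `"3.4.21"`.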
These data suggest that knowledge of evolutionarily important residues improves functional annotation among distant enzyme homologs. Since, unlike other 3D template approaches, the ETA method bypasses the need for experimental knowledge of the catalytic mechanism, it should prove a useful, large scale, and general adjunct to combine with other methods to decipher protein function in the structural proteome.
The NF-κB regulatory network controls the innate immune response by transducing a variety of pathogen-derived and cytokine stimuli into well-defined single-cell gene regulatory events.
We analyze the network by means of a model combining a deterministic description for molecular species with large cellular concentrations with two classes of stochastic switches: cell-surface receptor activation by TNFα ligand, and activation of the IκBα and A20 genes by NF-κB molecules. Both stochastic switches are associated with amplification pathways capable of translating single molecular events into tens of thousands of synthesized or degraded proteins. Here, we show that at a low TNFα dose only a fraction of cells are activated, but in these activated cells the amplification mechanisms ensure that the amplitude of NF-κB nuclear translocation remains above a threshold. Similarly, a lower nuclear NF-κB concentration only reduces the probability of gene activation, but does not reduce gene expression in the responding cells.
These two effects provide a particular stochastic robustness in cell regulation, allowing cells to respond differently to the same stimuli while making their individual responses unequivocal. Both effects are likely to be crucial in the early immune response: diversity in cell responses makes the tissue defense harder to overcome by the relatively simple programs coded in viruses and other pathogens, while the more focused single-cell responses help cells to choose their individual fates, such as apoptosis or proliferation. The model supports the hypothesis that binding of single TNFα ligands is sufficient to induce massive NF-κB translocation and activation of NF-κB dependent genes.
Due to the increasing power of personal computers, as well as the availability of flexible forward-time simulation programs like simuPOP, it is now possible to simulate the evolution of complex human diseases using a forward-time approach. This approach is potentially more powerful than the coalescent approach since it allows simulations of more than one disease susceptibility locus using almost arbitrary genetic and demographic models. However, the application of such simulations has been deterred by the lack of a suitable simulation framework. For example, it is not clear when and how to introduce disease mutants—especially those under purifying selection—to an evolving population, and how to control the disease allele frequencies at the last generation. In this paper, we introduce a forward-time simulation framework that allows us to generate large multi-generation populations with complex diseases caused by unlinked disease susceptibility loci, according to specified demographic and evolutionary properties. Unrelated individuals, small or large pedigrees can be drawn from the resulting population and provide samples for a wide range of study designs and ascertainment methods. We demonstrate our simulation framework using three examples that map genes associated with affection status, a quantitative trait, and the age of onset of a hypothetical cancer, respectively. Nonadditive fitness models, population structure, and gene–gene interactions are simulated. Case-control, sibpair, and large pedigree samples are drawn from the simulated populations and are examined by a variety of gene-mapping methods.
Complex diseases such as hypertension and diabetes are usually caused by multiple disease-susceptibility genes, environment factors, and interactions between them. Simulating populations or samples with complex diseases is an effective approach to study the likely genetic architecture of these diseases and to develop more effective gene-mapping methods. Compared to traditional backward-time (coalescent) methods, population-based, forward-time simulations are more suitable for this task because they can simulate almost arbitrary demographic and genetic features. Forward-time simulations also allow the researcher to perform head-to-head comparisons among gene-mapping methods based on different study designs and ascertainment methods. Unfortunately, evolving a population generation by generation is a random process, so the fates of disease alleles are unpredictable and there is no effective way to control the disease allele frequency at the present generation. In this paper, the authors propose a simulation method that avoids these problems and makes forward-time population simulation a practical solution for the simulation of complex diseases.
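The unpredictability of allele fates in a forward-time simulation can be seen in a basic Wright-Fisher sketch, where each generation is formed by random sampling from the previous one. This is a generic textbook model written in plain Python, not the simuPOP API and not the framework proposed above; the function name, parameters, and the simple selection scheme (relative fitness 1 - s for the disease allele) are assumptions.

```python
import random


def wright_fisher(pop_size, p0, generations, s=0.0, seed=0):
    """Track one allele's frequency forward in time in a diploid population.

    pop_size    : number of diploid individuals (2 * pop_size allele copies)
    p0          : starting allele frequency
    generations : number of generations to simulate
    s           : selection coefficient against the allele (assumed scheme)
    Returns the list of frequencies, starting with p0.
    """
    rng = random.Random(seed)
    n_alleles = 2 * pop_size
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Expected frequency after selection, before random sampling.
        w = p * (1.0 - s) / (p * (1.0 - s) + (1.0 - p))
        # Binomial sampling of the next generation's allele copies.
        count = sum(1 for _ in range(n_alleles) if rng.random() < w)
        p = count / n_alleles
        trajectory.append(p)
    return trajectory
```

Running this with different seeds from the same `p0` gives different final frequencies, including loss of the allele, which is the control problem the abstracts describe.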