Head and neck squamous cell carcinoma (HNSCC) is a frequently fatal heterogeneous disease. Beyond the role of human papilloma virus (HPV), no validated molecular characterization of the disease has been established. Using an integrated genomic analysis and validation methodology, we confirm four molecular classes of HNSCC (basal, mesenchymal, atypical, and classical) consistent with signatures established for squamous carcinoma of the lung, including deregulation of the KEAP1/NFE2L2 oxidative stress pathway, differential utilization of the lineage markers SOX2 and TP63, and preference for the oncogenes PIK3CA and EGFR. For potential clinical use, the signatures are complementary to classification by HPV infection status as well as the putative high risk marker CCND1 copy number gain. A molecular etiology for the subtypes is suggested by statistically significant chromosomal gains and losses and differential cell of origin expression patterns. Model systems representative of each of the four subtypes are also presented.
Expression quantitative trait locus (eQTL) analysis is rapidly moving from a cutting-edge concept in genomics to a mature area of investigation, with important connections to genome-wide association studies for human disease, pharmacogenomics and toxicogenomics. Despite the importance of the topic, many investigators must develop their own code or use tools not specifically suited for eQTL analysis. Convenient computational tools are becoming available, but they are not widely publicized, and investigators who are interested in discovery of eQTL, or in using them to interpret genome-wide association study results, may have difficulty navigating the available resources. The purpose of this review is to help investigators find appropriate programs for eQTL analysis and interpretation.
bioinformatics; fast linear modeling; gene expression
Motivation: A number of penalization and shrinkage approaches have been proposed for the analysis of microarray gene expression data. Similar techniques are now routinely applied to RNA sequence transcriptional count data, although the value of such shrinkage has not been conclusively established. If penalization is desired, the explicit modeling of mean–variance relationships provides a flexible testing regimen that ‘borrows’ information across genes, while easily incorporating design effects and additional covariates.
Results: We describe BBSeq, which incorporates two approaches: (i) a simple beta-binomial generalized linear model, which has not been extensively tested for RNA-Seq data and (ii) an extension of an expression mean–variance modeling approach to RNA-Seq data, involving modeling of the overdispersion as a function of the mean. Our approaches are flexible, allowing for general handling of discrete experimental factors and continuous covariates. We report comparisons with other alternate methods to handle RNA-Seq data. Although penalized methods have advantages for very small sample sizes, the beta-binomial generalized linear model, combined with simple outlier detection and testing approaches, appears to have favorable characteristics in power and flexibility.
Availability: An R package containing examples and sample datasets is available at http://www.bios.unc.edu/research/genomic_software/BBSeq
Supplementary information: Supplementary data are available at Bioinformatics online.
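The beta-binomial generalized linear model named in the BBSeq abstract above can be illustrated with a short sketch. The Python below is not the BBSeq R package. It assumes one common mean/overdispersion parametrization (mean probability `p`, overdispersion `phi` strictly between 0 and 1) and a logit link, and all function names are hypothetical.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_logpmf(k, n, p, phi):
    """Log-probability of k reads out of n under a beta-binomial model.

    Assumed parametrization: mean n*p and variance
    n*p*(1-p)*(1 + (n-1)*phi), with 0 < phi < 1.
    """
    alpha = p * (1.0 - phi) / phi
    beta = (1.0 - p) * (1.0 - phi) / phi
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


def gene_log_likelihood(counts, totals, design, coefs, phi):
    """Sum of per-sample log-probabilities for one gene.

    counts: reads mapped to the gene in each sample
    totals: total reads in each sample
    design: one row of covariates per sample (first entry 1.0 for an intercept)
    coefs:  regression coefficients on the logit scale
    """
    total = 0.0
    for k, n, row in zip(counts, totals, design):
        eta = sum(b * x for b, x in zip(coefs, row))
        p = 1.0 / (1.0 + exp(-eta))  # inverse logit link
        total += beta_binomial_logpmf(k, n, p, phi)
    return total
```

Maximizing `gene_log_likelihood` over `coefs` and `phi` with a general-purpose optimizer would give per-gene estimates. Discrete factors and continuous covariates both enter through the rows of `design`.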
In genome-wide association studies, population stratification is recognized as producing inflated type I error due to the inflation of test statistics. Principal component-based methods applied to genotypes provide information about population structure, and have been widely used to control for stratification. Here we explore the precise relationship between genotype principal components and inflation of association test statistics, thereby drawing a connection between principal component-based stratification control and the alternative approach of genomic control. Our results provide an inherent justification for the use of principal components, but call into question the popular practice of selecting principal components based on significance of eigenvalues alone. We propose a new approach, called EigenCorr, which selects principal components based on both their eigenvalues and their correlation with the (disease) phenotype. Our approach tends to select fewer principal components for stratification control than does testing of eigenvalues alone, providing substantial computational savings and improvements in power. Analyses of simulated and real data demonstrate the usefulness of the proposed approach.
Genomic Control; GWAS; PCA; Population Stratification
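EigenCorr is described above as selecting principal components using both their eigenvalues and their correlation with the phenotype. As a rough illustration only, the sketch below ranks components by the product of eigenvalue and squared Pearson correlation. That scoring rule is an assumption made for this example, not the published EigenCorr statistic, and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rank_components(pc_scores, eigenvalues, phenotype):
    """Order component indices by an illustrative combined score.

    pc_scores:   one list of per-individual scores for each component
    eigenvalues: eigenvalue of each component, in the same order
    phenotype:   per-individual phenotype values (e.g. 0/1 disease status)

    Assumed score: eigenvalue * correlation**2 (not the published rule).
    """
    scored = []
    for index, (scores, eigenvalue) in enumerate(zip(pc_scores, eigenvalues)):
        r = pearson(scores, phenotype)
        scored.append((eigenvalue * r * r, index))
    return [index for _, index in sorted(scored, reverse=True)]
```

In this toy scoring, a component with a large eigenvalue but no correlation with the phenotype ranks below a smaller component that tracks the phenotype, which is the contrast with eigenvalue-only selection that the abstract draws.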
Genetic studies of lung disease in Cystic Fibrosis are hampered by the
lack of a severity measure that accounts for chronic disease progression and
mortality attrition. Further, combining analyses across studies requires common
phenotypes that are robust to study design and patient ascertainment.
Using data from the North American Cystic Fibrosis Modifier Consortium
(Canadian Consortium for CF Genetic Studies, Johns Hopkins University CF Twin
and Sibling Study, and University of North Carolina/Case Western Reserve
University Gene Modifier Study), the authors calculated age-specific CF
percentile values of FEV1 which were adjusted for CF age-specific mortality.
The phenotype was computed for 2061 patients representing the Canadian CF
population, 1137 extreme phenotype patients in the UNC/Case Western study, and
1323 patients from multiple CF sib families in the CF Twin and Sibling Study.
Despite differences in ascertainment and median age, our phenotype score was
distributed in all three samples in a manner consistent with ascertainment
differences, reflecting the lung disease severity of each individual in the
underlying population. The new phenotype score was highly correlated with the
previously recommended complex phenotype, but the new phenotype is more robust
for shorter follow-up and for extreme ages.
A disease progression and mortality adjusted phenotype reduces the need
for stratification or additional covariates, increasing statistical power and
avoiding possible distortions. This approach will facilitate large scale genetic
and environmental epidemiological studies which will provide targeted
therapeutic pathways for the clinical benefit of patients with CF.
Forced Expiratory Volume; Age Effects; Severity of Illness Index
MicroRNAs are short, non-coding RNA sequences that regulate genes at the post-transcriptional level and have been shown to be important in development, tissue differentiation, and disease. Limited attention has been given to the natural variation in miRNA expression across genetically diverse populations even though it is well established that genetic polymorphisms can have a profound effect on mRNA levels. Expression level of 577 miRNAs in the livers of 70 strains of inbred mice was assessed, and we found that miRNA expression is highly stable across different strains. Globally, the expression of miRNA target transcripts does not correlate with miRNA expression, primarily due to the low variance of miRNA but high variance of mRNA expression across strains. Our results show that there is little genetic effect on the baseline miRNA levels in murine liver. The stability of mouse liver miRNA expression in a genetically diverse population suggests that treatment-induced disruptions in liver miRNA expression, a phenomenon established for a large number of toxicants, may indicate an important mechanism for the disturbance of normal liver function, and may prove to be a useful genetic background-independent biomarker of toxicant effect.
micro RNA; liver; mouse; gene expression
Rectal cancer is often clinically resistant to radiotherapy, and there would be value in identifying molecular markers to define the biological basis for this phenomenon. NF-κB is a potentially anti-apoptotic transcription factor that has been associated with resistance to radiotherapy in model systems. This study was designed to evaluate NF-κB activation in rectal cancers being treated with chemoradiation to determine whether NF-κB activity correlates with outcome in rectal cancer.
Methods and Materials
22 patients were biopsied at multiple time points in a prospective study, and another 50 were analyzed retrospectively. Pre-treatment tumor tissue was analyzed for multiple NF-κB subunits by immunohistochemistry (IHC). Serial tumor biopsies were analyzed for NF-κB-regulated gene expression by RT-PCR and for NF-κB subunit nuclear localization by IHC.
Several NF-κB target genes (Bcl-2, cIAP-2, IL-8 and TRAF1) were significantly upregulated by a single fraction of radiotherapy at 24 hours, demonstrating for the first time that NF-κB is activated by radiotherapy in human rectal tumors. Baseline NF-κB p50 nuclear expression did not correlate with pathologic response to radiotherapy, but increasing baseline p50 was prognostic for overall survival (HR 2.15, p = 0.040).
NF-κB nuclear expression at baseline in rectal cancer is prognostic for overall survival but not predictive of response to radiotherapy. Larger patient numbers would be needed to assess the effect of NF-κB target gene upregulation on response to RT. Our results suggest that NF-κB may play an important role in tumor metastasis as opposed to resistance to chemoradiotherapy.
Variants associated with meconium ileus in cystic fibrosis (CF) were identified in 3,763 patients by GWAS. Five SNPs at two loci near SLC6A14 (min P=1.28×10⁻¹² at rs3788766), chr Xq23-24 and SLC26A9 (min P=9.88×10⁻⁹ at rs4077468), chr 1q32.1 accounted for ~5% of the phenotypic variability, and were replicated in an independent patient collection (n=2,372; P=0.001 and 0.0001 respectively). By incorporating the knowledge that disease-causing mutations in CFTR alter electrolyte and fluid flux across epithelia into a hypothesis-driven genome-wide analysis (GWAS-HD), we identified the same SLC6A14 and SLC26A9 associated SNPs, while establishing evidence for the involvement of SNPs in a third solute carrier gene, SLC9A3. In addition, GWAS-HD provided evidence of association between meconium ileus and multiple constituents of the apical plasma membrane where CFTR resides (P=0.0002, testing 155 apical genes jointly and replicated, P=0.022). These findings suggest that modulating activities of apical membrane constituents could complement current therapeutic paradigms for cystic fibrosis.
A combined genome-wide association and linkage study was used to identify loci causing variation in CF lung disease severity. A significant association (P=3.34 × 10⁻⁸) near EHF and APIP (chr11p13) was identified in F508del homozygotes (n=1,978). The association replicated in F508del homozygotes (P=0.006) from a separate family-based study (n=557), with P=1.49 × 10⁻⁹ for the three-study joint meta-analysis. Linkage analysis of 486 sibling pairs from the family-based study identified a significant QTL on chromosome 20q13.2 (LOD=5.03). Our findings provide insight into the causes of variation in lung disease severity in CF and suggest new therapeutic targets for this life-limiting disorder.
It is generally presumed that the Cystic Fibrosis (CF) population is relatively homogeneous, and predominantly of European origin. The complex ethnic make-up observed in the CF patients collected by the North American CF Modifier Gene Consortium has brought this assumption into question, and suggested the potential for population substructure in the three CF study samples collected from North America. It is well appreciated that population substructure can result in spurious genetic associations.
To understand the ethnic composition of the North American CF population, and to assess the need for population structure adjustment in genetic association studies with North American CF patients.
Genome-wide single-nucleotide polymorphisms on 3076 unrelated North American CF patients were used to perform population structure analyses. We compared self-reported ethnicity to genotype-inferred ancestry, and also examined whether geographic distribution and CFTR mutation type could explain the structure observed.
Although largely Caucasian, our analyses identified a considerable number of CF patients with admixed African-Caucasian, Mexican-Caucasian and Indian-Caucasian ancestries. Population substructure was present and comparable across the three studies of the consortium. Neither geographic distribution nor mutation type explained the population structure.
Given the ethnic diversity of the North American CF population, it is essential to carefully detect, estimate and adjust for population substructure to guard against potential spurious findings in CF genetic association studies. Other Mendelian diseases that are presumed to predominantly affect single ethnic groups may also benefit from careful analysis of population structure.
ethnicity; principal component analysis; population substructure; population stratification
Immortalized human lymphoblastoid cell lines have been used to demonstrate that it is possible to use an in vitro model system to identify genetic factors that affect responses to xenobiotics. To extend the application of such studies to investigative toxicology by assessing interindividual and population-wide variability and heritability of chemical-induced toxicity phenotypes, we have used cell lines from the Centre d'Etude du Polymorphisme Humain (CEPH) trios assembled by the HapMap Consortium. Our goal is to aid in the development of predictive in vitro genetics-anchored models of chemical-induced toxicity. Cell lines from the CEPH trios were exposed to three concentrations of 14 environmental chemicals. We assessed ATP production and caspase-3/7 activity 24 h after treatment. Replicate analyses were used to evaluate experimental variability and classify responses. We show that variability of response across the cell lines exists for some, but not all, chemicals, with perfluorooctanoic acid (PFOA) and phenobarbital eliciting the greatest degree of interindividual variability. Although the data for the chemicals used here do not show evidence for broad-sense heritability of toxicity response phenotypes, substantial cell line variation was found, and candidate genetic factors contributing to the variability in response to PFOA were investigated using genome-wide association analysis. The approach of screening chemicals for toxicity in a genetically defined yet diverse in vitro human cell-based system is potentially useful for identification of chemicals that may pose the highest risk, the extent of within-species variability in the population, and genetic loci of interest that potentially contribute to chemical susceptibility.
chemical toxicity; human; lymphoblasts; population variability
Mouse models play a crucial role in the study of human behavioral traits and diseases. Variation of gene expression in brain may play a critical role in behavioral phenotypes, and thus it is of great importance to understand regulation of transcription in mouse brain. In this study, we analyzed the role of two important factors influencing steady-state transcriptional variation in mouse brain. First, we considered the effect of assessing whole brain vs. discrete regions of the brain. Second, we investigated the genetic basis of strain effects on gene expression. We examined the transcriptome of three brain regions using Affymetrix expression arrays: whole brain, forebrain, and hindbrain in adult mice from two common inbred strains (C57BL/6J vs. NOD/ShiLtJ) with eight replicates for each brain region and strain combination. We observed significant differences between the transcriptomes of forebrain and hindbrain. In contrast, the transcriptomes of whole brain and forebrain were very similar. Using 4.3 million single-nucleotide polymorphisms identified through whole-genome sequencing of C57BL/6J and NOD/ShiLtJ strains, we investigated the relationship between strain effect in gene expression and DNA sequence similarity. We found that cis-regulatory effects play an important role in gene expression differences between strains and that the cis-regulatory elements are more often located in 5′ and/or 3′ transcript boundaries, with no apparent preference for either 5′ or 3′ ends.
Mouse Genetic Resource; Mouse Collaborative Cross; mouse; gene expression; whole brain; forebrain; hindbrain; sequence variation
Summary: seeQTL is a comprehensive and versatile eQTL database, including various eQTL studies and a meta-analysis of HapMap eQTL information. The database presents eQTL association results in a convenient browser, using both segmented local-association plots and genome-wide Manhattan plots.
Availability and implementation: seeQTL is freely available for non-commercial use at http://www.bios.unc.edu/research/genomic_software/seeQTL/.
Supplementary information: Supplementary data are available at Bioinformatics online.
Sporadic amyotrophic lateral sclerosis (sALS) is a motor neuron disease with poorly understood etiology. Results of gene expression profiling studies of whole blood from ALS patients have not been validated and are difficult to relate to ALS pathogenesis because gene expression profiles depend on the relative abundance of the different cell types present in whole blood. We conducted microarray analyses using Agilent Human Whole Genome 4 × 44k Arrays on a more homogeneous cell population, namely purified peripheral blood lymphocytes (PBLs), from ALS patients and healthy controls to identify molecular signatures possibly relevant to ALS pathogenesis.
Differentially expressed genes were determined by LIMMA (Linear Models for MicroArray) and SAM (Significance Analysis of Microarrays) analyses. The SAFE (Significance Analysis of Function and Expression) procedure was used to identify molecular pathway perturbations. Proteasome inhibition assays were conducted on cultured peripheral blood mononuclear cells (PBMCs) from ALS patients to confirm alteration of the Ubiquitin/Proteasome System (UPS).
For the first time, using SAFE in a global gene ontology analysis (gene set size 5-100), we show significant perturbation of the KEGG (Kyoto Encyclopedia of Genes and Genomes) ALS pathway of motor neuron degeneration in PBLs from ALS patients. This was the only KEGG disease pathway significantly upregulated among 25, and contributing genes, including SOD1, represented 54% of the encoded proteins or protein complexes of the KEGG ALS pathway. Further SAFE analysis, including gene set sizes >100, showed that only neurodegenerative diseases (4 out of 34 disease pathways) including ALS were significantly upregulated. Changes in UBR2 expression correlated inversely with time since onset of disease and directly with ALSFRS-R, implying that UBR2 was increased early in the course of ALS. Cultured PBMCs from ALS patients accumulated more ubiquitinated proteins than PBMCs from healthy controls in a serum-dependent manner confirming changes in this pathway.
Our study indicates that PBLs from sALS patients are strong responders to systemic signals or local signals acquired by cell trafficking, representing changes in gene expression similar to those present in brain and spinal cord of sALS patients. PBLs may provide a useful means to study ALS pathogenesis.
Variability in cystic fibrosis (CF) lung disease is partially due to non-CFTR genetic modifiers. Mucin genes are very polymorphic, and mucins play a key role in the pathogenesis of CF lung disease; therefore, mucin genes are strong candidates as genetic modifiers. DNA from CF patients recruited for extremes of lung phenotype was analyzed by Southern blot or PCR to define variable number tandem repeat (VNTR) length polymorphisms for MUC1, MUC2, MUC5AC, and MUC7. VNTR length polymorphisms were tested for association with lung disease severity and for linkage disequilibrium (LD) with flanking single nucleotide polymorphisms (SNPs). No strong associations were found for MUC1, MUC2, or MUC7. A significant association was found between the overall distribution of MUC5AC VNTR length and CF lung disease severity (p = 0.025; n = 468 patients); in addition, there was a robust association of the specific 6.4 kb HinfI VNTR fragment with severity of lung disease (p = 6.2×10⁻⁴ after Bonferroni correction). There was strong LD between MUC5AC VNTR length modes and flanking SNPs. The severity-associated 6.4 kb VNTR allele of MUC5AC was confirmed to be genetically distinct from the 6.3 kb allele, as it showed significantly stronger association with nearby SNPs. These data provide detailed respiratory mucin gene VNTR allele distributions in CF patients. Our data also show a novel link between the MUC5AC 6.4 kb VNTR allele and severity of CF lung disease. The LD pattern with surrounding SNPs suggests that the 6.4 kb allele contains, or is linked to, important functional genetic variation.
Association studies using unrelated individuals have become the most popular design for mapping complex traits. One of the major challenges of association mapping is avoiding spurious association due to population stratification. Principal component analysis (PCA) on genome-wide marker genotypes is one of the most popular population stratification control methods. It implicitly assumes that the markers are in linkage equilibrium, a condition that is rarely satisfied and that we plan to relax.
We carefully examined the impact of linkage disequilibrium (LD) on PCA, and proposed a simple modification of the standard PCA to automatically adjust for the correlations among markers.
We demonstrated that LD patterns in genome-wide association datasets can distort the techniques for stratification control, showing ‘subpopulations’ reflecting localized LD phenomena rather than plausible population structure. We showed that the proposed method effectively removes the artifactual effect of LD patterns, and successfully recovers underlying population structure that is not apparent from standard PCA.
PCA is highly influenced by sets of SNPs with high LD, obscuring the true population substructure. Our shrinkage PCA applies to all available markers, regardless of the LD patterns. The proposed method is easier to implement than most existing LD adjusted PCA methods.
PCA; Loadings; GWAS
A number of settings arise in which it is of interest to predict Principal Component (PC) scores for new observations using data from an initial sample. In this paper, we demonstrate that naive approaches to PC score prediction can be substantially biased towards 0 in the analysis of large matrices. This phenomenon is largely related to known inconsistency results for sample eigenvalues and eigenvectors as both dimensions of the matrix increase. For the spiked eigenvalue model for random matrices, we expand the generality of these results, and propose bias-adjusted PC score prediction. In addition, we compute the asymptotic correlation coefficient between PC scores from sample and population eigenvectors. Simulation and real data examples from the genetics literature show the improved bias and numerical properties of our estimators.
PCA; PC Scores; Random Matrix; PC Regression
Motivation: DNA copy number gains and losses are commonly found in tumor tissue, and some of these aberrations play a role in tumor genesis and development. Although high resolution DNA copy number data can be obtained using array-based techniques, no single method is widely used to distinguish between recurrent and sporadic copy number aberrations.
Results: Here we introduce Discovering Copy Number Aberrations Manifested In Cancer (DiNAMIC), a novel method for assessing the statistical significance of recurrent copy number aberrations. In contrast to competing procedures, the testing procedure underlying DiNAMIC is carefully motivated, and employs a novel cyclic permutation scheme. Extensive simulation studies show that DiNAMIC controls false positive discoveries in a variety of realistic scenarios. We use DiNAMIC to analyze two publicly available tumor datasets, and our results show that DiNAMIC detects multiple loci that have biological relevance.
Availability: Source code implemented in R, as well as text files containing examples and sample datasets are available at http://www.bios.unc.edu/research/genomic_software/DiNAMIC.
Supplementary information: Supplementary data are available at Bioinformatics online.
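The cyclic permutation scheme mentioned in the DiNAMIC abstract above can be pictured with a small sketch. This is an illustration of the general idea only, not the DiNAMIC code: it assumes a samples-by-markers matrix of copy number values, uses the maximum column sum as a hypothetical test statistic, and shifts each sample by an independent random offset so that the ordering of values within a sample is preserved while alignment across samples is broken.

```python
import random


def cyclic_shift(profile, offset):
    """Rotate one sample's marker profile left by `offset` positions."""
    offset %= len(profile)
    return profile[offset:] + profile[:offset]


def max_column_sum(matrix):
    """Largest per-marker sum across samples (assumed recurrence statistic)."""
    return max(sum(column) for column in zip(*matrix))


def cyclic_shift_pvalue(matrix, n_perm=1000, seed=0):
    """Permutation p-value for the maximum column sum.

    Each permutation rotates every row by its own random offset,
    then recomputes the statistic on the shifted matrix.
    """
    rng = random.Random(seed)
    observed = max_column_sum(matrix)
    n_markers = len(matrix[0])
    hits = 0
    for _ in range(n_perm):
        shifted = [cyclic_shift(row, rng.randrange(n_markers)) for row in matrix]
        if max_column_sum(shifted) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

A gain shared by every sample at the same marker yields a small p-value here, whereas gains scattered at different markers in different samples do not.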
Analysis of microarray experiments often involves testing for the overrepresentation of pre-defined sets of genes among lists of genes deemed individually significant. Most popular gene set testing methods assume the independence of genes within each set, an assumption that is seriously violated, as extensive correlation between genes is a well-documented phenomenon.
We conducted a meta-analysis of over 200 datasets from the Gene Expression Omnibus in order to demonstrate the practical impact of strong gene correlation patterns that are highly consistent across experiments. We show that a common independence assumption-based gene set testing procedure produces very high false positive rates when applied to data sets for which treatment groups have been randomized, and that gene sets with high internal correlation are more likely to be declared significant. A reanalysis of the same datasets using an array resampling approach properly controls false positive rates, leading to more parsimonious and high-confidence gene set findings, which should facilitate pathway-based interpretation of the microarray data.
These findings call into question many of the gene set testing results in the literature and argue strongly for the adoption of resampling based gene set testing criteria in the peer reviewed biomedical literature.
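The array resampling approach discussed above can be sketched as a label permutation test. The example below is a minimal illustration under stated assumptions, not the procedure used in the meta-analysis: it assumes two treatment groups coded 0 and 1, uses the mean absolute difference in group means as a hypothetical gene set statistic, and shuffles whole arrays (sample labels) so that correlation between genes is left intact in every permutation.

```python
import random


def mean_difference(values, labels):
    """Difference in mean expression between group 1 and group 0."""
    group1 = [v for v, lab in zip(values, labels) if lab == 1]
    group0 = [v for v, lab in zip(values, labels) if lab == 0]
    return sum(group1) / len(group1) - sum(group0) / len(group0)


def set_statistic(expr, gene_set, labels):
    """Mean absolute group difference over the genes in the set."""
    total = sum(abs(mean_difference(expr[gene], labels)) for gene in gene_set)
    return total / len(gene_set)


def resampling_pvalue(expr, gene_set, labels, n_perm=1000, seed=0):
    """Permutation p-value obtained by shuffling sample labels.

    expr maps a gene name to its expression values, one per array,
    in the same order as `labels`.
    """
    rng = random.Random(seed)
    observed = set_statistic(expr, gene_set, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # the same relabeling applies to every gene
        if set_statistic(expr, gene_set, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Because one relabeling is applied to all genes at once, a highly correlated gene set produces a correspondingly wide null distribution, which is what an independence-based test fails to account for.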
In quantitative-trait linkage studies using experimental crosses, the conventional normal location-shift model or other parameterizations may be unnecessarily restrictive. We generalize the mapping problem to a genuine nonparametric setup and provide a robust estimation procedure for the situation where the underlying phenotype distributions are completely unspecified. Classical Wilcoxon-Mann-Whitney statistics are employed for point and interval estimation of QTL positions and effects.
GEE; Generalized least squares estimate; Quantitative trait; Weighted least squares estimate; Wilcoxon-Mann-Whitney statistic
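For readers unfamiliar with the Wilcoxon-Mann-Whitney statistic referred to above, a minimal sketch follows. It assumes phenotypes have already been split into two groups by genotype at a single marker and shows the U statistic together with the Hodges-Lehmann shift estimate (the median of pairwise differences), which is the classical point estimate associated with this test. It does not reproduce the QTL position estimation described in the abstract, and the function names are hypothetical.

```python
from statistics import median


def mann_whitney_u(x, y):
    """Number of (x, y) pairs with x > y, counting ties as one half."""
    return sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in x
        for b in y
    )


def hodges_lehmann_shift(x, y):
    """Median of all pairwise differences x - y, a rank-based effect estimate."""
    return median(a - b for a in x for b in y)
```

Neither function assumes a normal phenotype distribution; both depend only on the ordering of, or differences between, observations in the two genotype groups.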
Improvements in high-throughput technology and its increasing use have led to the generation of many highly complex datasets that often address similar biological questions. Combining information from these studies can increase the reliability and generalizability of results and also yield new insights that guide future research.
This paper describes a novel algorithm called BLANKET for symmetric analysis of two experiments that assess informativeness of descriptors. The experiments are required to be related only in that their descriptor sets intersect substantially and their definitions of case and control are consistent. From resulting lists of n descriptors ranked by informativeness, BLANKET determines shortlists of descriptors from each experiment, generally of different lengths p and q. For any pair of shortlists, four numbers are evident: the number of descriptors appearing in both shortlists, in exactly one shortlist, or in neither shortlist. From the associated contingency table, BLANKET computes Right Fisher Exact Test (RFET) values used as scores over a plane of possible pairs of shortlist lengths [1,2]. BLANKET then chooses a pair or pairs with RFET score less than a threshold; the threshold depends upon n and shortlist length limits and represents a quality of intersection achieved by less than 5% of random lists.
Researchers seek within a universe of descriptors some minimal subset that collectively and efficiently predicts experimental outcomes. Ideally, any smaller subset should be insufficient for reliable prediction and any larger subset should have little additional accuracy. As a method, BLANKET is easy to conceptualize and presents only moderate computational complexity. Many existing databases could be mined using BLANKET to suggest optimal sets of predictive descriptors.
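The Right Fisher Exact Test score used by BLANKET can be illustrated for a single pair of shortlist lengths. The sketch below assumes both experiments rank the same n descriptors and computes the right-tail hypergeometric probability of seeing at least the observed overlap between the top p of one list and the top q of the other. The function names are hypothetical, and the scan over the plane of (p, q) pairs and the threshold calibration described above are not reproduced.

```python
from math import comb


def right_fisher_exact(n, p, q, k):
    """P(overlap >= k) for random shortlists of lengths p and q from n items."""
    upper = min(p, q)
    total = comb(n, q)
    tail = sum(comb(p, i) * comb(n - p, q - i) for i in range(k, upper + 1))
    return tail / total


def shortlist_overlap_score(ranked_a, ranked_b, p, q):
    """Overlap count and right-tail score for the top-p and top-q shortlists.

    ranked_a, ranked_b: the same descriptors, each ordered by informativeness
    in one experiment (assumed to be identical sets of length n).
    """
    overlap = len(set(ranked_a[:p]) & set(ranked_b[:q]))
    return overlap, right_fisher_exact(len(ranked_a), p, q, overlap)
```

For example, with ten descriptors and two shortlists of length three that agree completely, the score is 1/120, since only one of the 120 possible second shortlists matches the first.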
Gene expression quantitative trait locus (eQTL) mapping has become a powerful tool in systems biology. While many authors have made important discoveries using this approach, one persistent challenge in eQTL studies is the selection of loci and genes that should receive further biological investigation. In this study, we compared eQTL generated from gene expression profiling in the livers of two panels of mouse strains, 41 BXD recombinant inbred and 36 mouse diversity panel (MDP) strains. Cis-eQTL, loci in which the transcript and its maximum QTL are co-located, have been shown to be more reproducible than trans-eQTL, which are not co-located with the transcript. We observed that 9.9% of cis-eQTL and 2.0% of trans-eQTL replicated between the two panels. Notably, a significant eQTL hotspot on distal chromosome 12 observed in the BXD panel was reproduced in the MDP. Furthermore, the shorter linkage disequilibrium in the MDP strains allowed us to considerably narrow the locus and limit the number of candidate genes to a cluster of Serpin genes, which code for extracellular proteases. We conclude that this strategy has some utility in increasing confidence and resolution in eQTL mapping studies; however, due to the high false positive rate in the MDP, eQTL mapping in inbred strains is best carried out in combination with an eQTL linkage study.
Accurate prediction of in vivo toxicity from in vitro testing is a challenging problem. Large public–private consortia have been formed with the goal of improving chemical safety assessment by the means of high-throughput screening.
A wealth of available biological data requires new computational approaches to link chemical structure, in vitro data, and potential adverse health effects.
Methods and results
A database containing experimental cytotoxicity values for in vitro half-maximal inhibitory concentration (IC50) and in vivo rodent median lethal dose (LD50) for more than 300 chemicals was compiled by Zentralstelle zur Erfassung und Bewertung von Ersatz- und Ergaenzungsmethoden zum Tierversuch (ZEBET; National Center for Documentation and Evaluation of Alternative Methods to Animal Experiments). The application of conventional quantitative structure–activity relationship (QSAR) modeling approaches to predict mouse or rat acute LD50 values from chemical descriptors of ZEBET compounds yielded no statistically significant models. The analysis of these data showed no significant correlation between IC50 and LD50. However, a linear IC50 versus LD50 correlation could be established for a fraction of compounds. To capitalize on this observation, we developed a novel two-step modeling approach as follows. First, all chemicals are partitioned into two groups based on the relationship between IC50 and LD50 values: One group comprises compounds with linear IC50 versus LD50 relationships, and another group comprises the remaining compounds. Second, we built conventional binary classification QSAR models to predict the group affiliation based on chemical descriptors only. Third, we developed k-nearest neighbor continuous QSAR models for each subclass to predict LD50 values from chemical descriptors. All models were extensively validated using special protocols.
The novelty of this modeling approach is that it uses the relationships between in vivo and in vitro data only to inform the initial construction of the hierarchical two-step QSAR models. Models resulting from this approach employ chemical descriptors only for external prediction of acute rodent toxicity.
acute toxicity; computational toxicology; IC50; LD50; LOAEL; NOAEL; QSAR
We propose a statistical framework, named genoCN, to simultaneously dissect copy number states and genotypes using high-density SNP (single nucleotide polymorphism) arrays. There are at least two types of genomic DNA copy number differences: copy number variations (CNVs) and copy number aberrations (CNAs). While CNVs are naturally occurring and inheritable, CNAs are acquired somatic alterations most often observed in tumor tissues only. CNVs tend to be short and more sparsely located in the genome compared with CNAs. GenoCN consists of two components, genoCNV and genoCNA, designed for CNV and CNA studies, respectively. In contrast to most existing methods, genoCN is more flexible in that the model parameters are estimated from the data instead of being decided a priori. GenoCNA also incorporates two important strategies for CNA studies. First, the effects of tissue contamination are explicitly modeled. Second, if SNP arrays are performed for both tumor and normal tissues of one individual, the genotype calls from normal tissue are used to study CNAs in tumor tissue. We evaluated genoCN by applications to 162 HapMap individuals and a brain tumor (glioblastoma) dataset and showed that our method can successfully identify both types of copy number differences and produce high-quality genotype calls.
Chronic fatiguing illness remains a poorly understood syndrome of unknown pathogenesis. We attempted to identify biomarkers for chronic fatiguing illness using microarrays to query the transcriptome in peripheral blood leukocytes.
Cases were 44 individuals who were clinically evaluated and found to meet standard international criteria for chronic fatigue syndrome or idiopathic chronic fatigue, and controls were their monozygotic co-twins who were clinically evaluated and never had even one month of impairing fatigue. Biological sampling conditions were standardized and RNA stabilizing media were used. These methodological features provide rigorous control for bias resulting from case-control mismatched ancestry and experimental error. Individual gene expression profiles were assessed using Affymetrix Human Genome U133 Plus 2.0 arrays.
There were no significant differences in gene expression for any transcript.
Contrary to our expectations, we were unable to identify a biomarker for chronic fatiguing illness in the transcriptome of peripheral blood leukocytes, suggesting that positive findings in prior studies may have resulted from experimental bias.