High-throughput microscopic screening instruments can generate huge collections of images of live cells incubated with combinatorial libraries of fluorescent molecules. Organizing and visualizing these images to discern biologically important patterns that link back to chemical structure is a challenge. We present an analysis and visualization methodology - Cheminformatic Assisted Image Array (CAIA) - that greatly facilitates data mining efforts. For illustration, we considered a collection of microscopic images acquired from cells incubated with each member of a combinatorial library of styryl molecules being screened for candidate bioimaging probes. By sorting CAIAs based on quantitative image features, the relative contribution of each combinatorial building block on probe intracellular distribution could be visually discerned. The results revealed trends hidden in the dataset: most interestingly, the building blocks of the styryl molecules appeared to behave as chemical address tags, additively and independently encoding spatial patterns of intracellular fluorescence. Translated into practice, CAIA facilitated discovery of several outstanding styryl molecules for live cell nuclear imaging applications.
Cheminformatics; high content screening; combinatorial library; styryl; fluorescence; bioimaging; chemical address tags; QSAR; CAIA
Advances in modern neuroimaging in combination with behavioral genetics have allowed neuroscientists to investigate how genetic and environmental factors shape human brain structure and function. Estimating the heritability of brain structure and function via twin studies has become one of the major approaches in studying the genetics of the brain. In a classical twin study, heritability is estimated by computing genetic and phenotypic variation based on the similarity of monozygotic and dizygotic twins. However, heritability has traditionally been measured for univariate, scalar traits, and it is challenging to assess the heritability of a spatial process, such as a pattern of neural activity. In this work, we develop a statistical method to estimate phenotypic variance and covariance at each location in a spatial process, which in turn can be used to estimate the heritability of a spatial dataset. The method is based on a dimensionally-reduced model of spatial variation in paired images, in which adjusted least squares estimates can be used to estimate the key model parameters. The advantage of the proposed method compared to conventional methods such as voxelwise or mean-ROI approaches is demonstrated in both a simulation study and a real data study assessing genetic influence on patterns of brain activity in the visual and motor cortices in response to a simple visuomotor task.
Heritability; Intraclass Correlation; Twin Study; Spatial Analysis; Genetics
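For orientation, the conventional voxelwise baseline mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the proposed dimensionally-reduced method: it assumes Falconer's classical estimate h^2 = 2(r_MZ - r_DZ), uses a plain Pearson correlation across twin pairs in place of an intraclass correlation, and the function names and data layout (each image as a flat list of values) are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def falconer_h2(r_mz, r_dz):
    """Falconer's estimate: twice the gap between MZ and DZ twin similarity."""
    return 2.0 * (r_mz - r_dz)


def voxelwise_h2(mz_pairs, dz_pairs):
    """Heritability estimate at each location, computed independently.

    mz_pairs / dz_pairs: lists of (twin1_image, twin2_image) tuples,
    where each image is a flat list with one value per location
    (an assumed layout, chosen only for this sketch).
    """
    n_locations = len(mz_pairs[0][0])
    estimates = []
    for v in range(n_locations):
        r_mz = pearson([p[0][v] for p in mz_pairs], [p[1][v] for p in mz_pairs])
        r_dz = pearson([p[0][v] for p in dz_pairs], [p[1][v] for p in dz_pairs])
        estimates.append(falconer_h2(r_mz, r_dz))
    return estimates
```

Because each location is treated on its own, the estimates can be noisy when few twin pairs are available, which is the kind of limitation a spatial model is meant to address.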
Vaccine adverse events (VAEs) are adverse bodily changes occurring after vaccination. Understanding the adverse event (AE) profiles is a crucial step to identify serious AEs. Two different types of seasonal influenza vaccines have been used on the market: trivalent (killed) inactivated influenza vaccine (TIV) and trivalent live attenuated influenza vaccine (LAIV). Different adverse event profiles induced by these two groups of seasonal influenza vaccines were studied based on the data drawn from the CDC Vaccine Adverse Event Reporting System (VAERS). Extracted from VAERS were 37,621 AE reports for four TIVs (Afluria, Fluarix, Fluvirin, and Fluzone) and 3,707 AE reports for the only LAIV (FluMist). The AE report data were analyzed by a novel combinatorial, ontology-based detection of AE method (CODAE). CODAE detects AEs using Proportional Reporting Ratio (PRR), Chi-square significance test, and base level filtration, and groups identified AEs by ontology-based hierarchical classification. In total, 48 TIV-enriched and 68 LAIV-enriched AEs were identified (PRR>2, Chi-square score >4, and the number of cases >0.2% of total reports). These AE terms were classified using the Ontology of Adverse Events (OAE), MedDRA, and SNOMED-CT. The OAE method provided better classification results than the two other methods. Thirteen out of 48 TIV-enriched AEs were related to neurological and muscular processing such as paralysis, movement disorders, and muscular weakness. In contrast, 15 out of 68 LAIV-enriched AEs were associated with inflammatory response and respiratory system disorders. There was evidence of two severe adverse events (Guillain-Barre Syndrome and paralysis) present in TIV. Although these severe adverse events occurred at a low incidence rate, they were found to be more significantly enriched in TIV-vaccinated patients than LAIV-vaccinated patients.
Therefore, our novel combinatorial bioinformatics analysis discovered that LAIV had a lower chance of inducing these two severe adverse events than TIV. In addition, our meta-analysis found that all previously reported positive correlations between GBS and influenza vaccine immunization were based on trivalent influenza vaccines instead of monovalent influenza vaccines.
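The three screening criteria quoted above (PRR>2, Chi-square >4, cases >0.2% of reports) can be illustrated with a short sketch. It assumes the standard 2x2 formulation of the Proportional Reporting Ratio and an uncorrected Pearson chi-square; whether CODAE applies a continuity correction is not stated in the abstract, and reading the 0.2% base-level filter as a fraction of the target vaccine's reports is an assumption. The function names are hypothetical; only the two report totals come from the text.

```python
N_TIV = 37621   # total TIV reports (from the abstract)
N_LAIV = 3707   # total LAIV reports (from the abstract)


def prr(a, b, c, d):
    """Proportional Reporting Ratio for a 2x2 table.

    a: reports of the AE for the target vaccine, b: its other reports,
    c: reports of the AE for the comparator,     d: its other reports.
    """
    if c == 0:
        return float("inf")
    return (a / (a + b)) / (c / (c + d))


def chi_square(a, b, c, d):
    """Pearson chi-square for a 2x2 table (no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def is_enriched(ae_target, n_target, ae_comparator, n_comparator,
                min_prr=2.0, min_chi2=4.0, min_fraction=0.002):
    """Apply the three thresholds quoted in the abstract to one AE term."""
    a, b = ae_target, n_target - ae_target
    c, d = ae_comparator, n_comparator - ae_comparator
    return (prr(a, b, c, d) > min_prr
            and chi_square(a, b, c, d) > min_chi2
            and ae_target / n_target > min_fraction)
```

For example, `is_enriched(40, N_LAIV, 100, N_TIV)` tests whether a hypothetical AE term with 40 LAIV reports and 100 TIV reports would count as LAIV-enriched; these counts are illustrative and are not taken from VAERS.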
To further our understanding of the biology and prognostic significance of various chromosomal 13q14 deletions in CLL.
We have analyzed data from SNP 6.0 arrays to define the anatomy of various 13q14 deletions in a cohort of 255 CLL patients and have correlated two subsets of 13q14 deletions (type I: exclusive of RB1 and type II: inclusive of RB1) with patient survival. Further, we have measured the expression of the 13q14-resident microRNAs by Q-PCR in 242 CLL patients and subsequently assessed their prognostic significance. We have sequenced all coding exons of RB1 in patients with monoallelic RB1 deletion and have sequenced the 13q14-resident miR locus in all patients.
Large 13q14 (type II) deletions were detected in ~20% of all CLL patients and were associated with shortened survival. A strong association between 13q14 type II deletions and elevated genomic complexity, as measured through CLL-FISH or SNP 6.0 array profiling, was identified, suggesting that these lesions may contribute to CLL disease evolution through genomic destabilization. Sequence and copy number analysis of the RB1 gene identified a small CLL subset that is RB1 null. Finally, neither the expression levels of the 13q14-resident microRNAs nor the degree of 13q14 deletion, as measured through SNP 6.0 array-based copy number analysis, had significant prognostic importance.
Our data suggest that the clinical course of CLL is accelerated in patients with large (type II) 13q14 deletions that span the RB1 gene, therefore justifying routine identification of 13q14 subtypes in CLL management.
CLL; 13q14 deletion subtypes; survival
To explore the extent to which current knowledge about the organelle-targeting features of small molecules may be applicable towards controlling the accumulation and distribution of exogenous chemical agents inside cells, molecules with known subcellular localization properties (as reported in the scientific literature) were compiled into a single data set. This data set was compared to a reference data set of approved drug molecules derived from the DrugBank database, and to a reference data set of random organic molecules derived from the PubChem database. Cheminformatic analysis revealed that molecules with reported subcellular localizations were comparably diverse. However, the calculated physicochemical properties of molecules reported to accumulate in different organelles were markedly overlapping. In relation to the reference sets of DrugBank and PubChem molecules, molecules with reported subcellular localizations were biased towards larger, more complex chemical structures possessing multiple ionizable functional groups and higher lipophilicity. Stratifying molecules based on molecular weight revealed that many physicochemical property trends associated with specific organelles were reversed in smaller vs. larger molecules. Most likely, these reversed trends are due to the different transport mechanisms determining the subcellular localization of molecules of different sizes. Molecular weight can be dramatically altered by tagging molecules with fluorophores or by incorporating organelle targeting motifs. Generally, in order to better exploit structure-localization relationships, subcellular targeting strategies would benefit from analysis of the biodistribution effects resulting from variations in the size of the molecules.
drug transport; pharmacokinetics; biodistribution; drug targeting; databases; mathematical modeling; drug delivery; cheminformatics
An active area in cancer biomarker research is the development of statistical methods to identify expression signatures reflecting the heterogeneity of cancer across affected individuals. Tomlins et al. observed heterogeneous patterns of oncogene activation within several cancer types, and introduced a statistical method called Cancer Outlier Profile Analysis (COPA) to identify “cancer outlier genes”. Several related statistical approaches have since been developed, but the operating characteristics of these procedures (e.g. power, false positive rate) have not yet been fully characterized, especially in a proteomics setting. Here, we use simulation to identify the degree to which an outlier pattern of differential expression must hold in order for outlier-based approaches to be more effective than mean-based approaches. We also propose a diagnostic procedure that characterizes the potentially unequal levels of differential expression in the tails and in the center of a distribution of expression values. We find that for sample sizes and effect sizes typical of proteomics studies, the outlier pattern must be strong in order for outlier-based analysis to provide a meaningful benefit. This is corroborated by analysis of proteomics data from a melanoma study, in which the differential expression is most often present throughout the distribution, rather than being concentrated in the tails, albeit with a few proteins showing expression patterns consistent with outlier expression.
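To make the contrast between outlier-based and mean-based statistics concrete, here is a minimal sketch of a COPA-style score for a single gene or protein. It assumes the commonly described form of the statistic: values are centered on the pooled median, scaled by the pooled median absolute deviation (MAD), and an upper percentile of the scaled cancer samples is reported. The choice of the 90th percentile, the omission of the 1.4826 MAD scaling constant, and the function names are assumptions of this sketch.

```python
from statistics import median


def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def copa_score(normal, cancer, q=90):
    """COPA-style outlier score for one feature.

    A high score means that a subset of cancer samples sits far above
    the pooled center, even if the group means barely differ.
    """
    pooled = list(normal) + list(cancer)
    med = median(pooled)
    mad = median([abs(x - med) for x in pooled])
    if mad == 0:
        raise ValueError("MAD is zero; the feature has no robust spread")
    scaled_cancer = [(x - med) / mad for x in cancer]
    return percentile(scaled_cancer, q)
```

A feature where one cancer sample is extreme scores much higher than one where every cancer sample is shifted by a small constant amount, which is the pattern a mean-based test is better suited to detect.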
Cancers of the urinary bladder are the fifth most commonly diagnosed malignancy in the US. Early clinical diagnosis of bladder cancer remains a major challenge and the development of non-invasive methods for detection and surveillance is desirable for both patients and health care providers.
In order to identify urinary proteins with potential clinical utility we enriched and profiled the glycoprotein component of urine samples using a dual-lectin affinity chromatography and LC-MS/MS platform.
From a primary sample set obtained from 54 cancer patients and 46 controls a total of 265 distinct glycoproteins were identified with high confidence, and changes in glycoprotein abundance between groups were quantified by a label-free spectral counting method. Validation of candidate biomarker alpha-1-antitrypsin (A1AT) for disease association was performed on an independent set of 70 samples (35 cancer cases) using an ELISA. Increased levels of urinary alpha-1-antitrypsin (A1AT) glycoprotein were indicative of the presence of bladder cancer (p value < 0.0001) and augmented voided urine cytology results. A1AT detection classified bladder cancer patients with a sensitivity of 74% and specificity of 80%.
The described strategy can enable higher resolution profiling of the proteome in biological fluids by reducing complexity. Application of glycoprotein enrichment provided novel candidates for further investigation as biomarkers for the non-invasive detection of bladder cancer.
bladder cancer; glycoprotein profiling; diagnostic profile; A1AT
A mass spectrometric method was developed to elucidate the N-glycan structures of serum glycoproteins and utilize fucosylated glycans as potential markers for pancreatic cancer. This assay was applied to haptoglobin in human serum where N-glycans derived from the serum of 16 pancreatic cancer patients were compared with those from 15 individuals with benign conditions (5 normals, 5 chronic pancreatitis, and 5 type II diabetes). This assay used only 10 μL of serum where haptoglobin was extracted using a monoclonal antibody and quantitative permethylation was performed on desialylated N-glycans followed by MALDI-QIT-TOF MS analysis. Eight desialylated N-glycan structures of haptoglobin were identified where a bifucosylated tri-antennary structure was reported for the first time in pancreatic cancer samples. Both core and antennary fucosylation were elevated in pancreatic cancer samples compared to samples from benign conditions. Fucosylation degree indices were calculated and showed a significant difference between pancreatic cancer patients of all stages and the benign conditions analyzed. This study demonstrates that a serum assay based on haptoglobin fucosylation patterns using mass spectrometric analysis may serve as a novel method for the diagnosis of pancreatic cancer.
The chromosomal deletion 11q affects biology and clinical outcome in CLL but del11q-deregulated genes remain incompletely characterized.
We have employed integrated genomic profiling approaches upon CLL cases with and without del11q to identify 11q-relevant genes.
We have identified differential expression of the insulin receptor (INSR) in CLL, including high-level INSR expression in the majority of CLL with del11q. High INSR mRNA expression in 11q CLL (~10-fold higher mean levels than other genomic categories) was confirmed by Q-PCR in 247 CLL cases. INSR protein measurements in 257 CLL cases through FACS, compared with measurements in normal CD19+ B-cells and monocytes, confirmed that a subset of CLL aberrantly expresses high INSR levels. INSR stimulation by insulin in CLL cells ex vivo resulted in the activation of canonical INSR signaling pathways, including the AKT-mTOR and Ras/Raf/Erk pathways, and INSR activation partially abrogated spontaneous CLL cell apoptosis ex vivo. Higher INSR levels correlated with shorter time to first therapy (TTFT) and shorter overall survival (OS). In bivariate analysis, INSR expression predicted for rapid initial disease progression and shorter OS in ZAP-70 low/negative CLL. Finally, in multivariate analysis (ZAP-70 status, IgVH status and INSR expression), we detected elevated hazard ratios and trends for short OS for CLL cases with high INSR expression (analyzed inclusive or exclusive of cases with del11q).
Our aggregate biochemical and clinical outcome data suggest biologically meaningful elevated INSR expression in a substantial subset of all CLL cases, including many cases with del11q.
CLL; Insulin receptor; apoptosis; deletion 11q; disease progression
Systemic immunosuppression is a risk factor for melanoma, and sunburn-induced immunosuppression is thought to be causal. Genes in immunosuppression pathways are therefore candidate melanoma-susceptibility genes. If variants within these genes individually have a small effect on disease risk, the association may be undetected in genome-wide association (GWA) studies due to low power to reach a high significance level. Pathway-based approaches have been suggested as a method of incorporating a priori knowledge into the analysis of GWA studies. In this study, the association of 1113 single nucleotide polymorphisms (SNPs) in 43 genes (39 genomic regions) related to immunosuppression has been analysed using a gene-set approach in 1539 melanoma cases and 3917 controls from the GenoMEL consortium GWA study. The association between melanoma susceptibility and the whole set of tumour-immunosuppression genes, and also predefined functional subgroups of genes, was considered. The analysis was based on a measure formed by summing the evidence from the most significant SNP in each gene, and significance was evaluated empirically by case-control label permutation. An association was found between melanoma and the complete set of genes (pemp = 0.002), as well as the subgroups related to the generation of tolerogenic dendritic cells (pemp = 0.006) and secretion of suppressive factors (pemp = 0.0004), thus providing preliminary evidence of involvement of tumour-immunosuppression gene polymorphisms in melanoma susceptibility. The analysis was repeated on a second phase of the GenoMEL study, which showed no evidence of an association. As one of the first attempts to replicate a pathway-level association, our results suggest that low power and heterogeneity may present challenges.
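The gene-set measure described above (sum over genes of the evidence from the most significant SNP, with significance from case-control label permutation) can be sketched as follows. The abstract does not specify the per-SNP test, so this sketch assumes the Cochran-Armitage trend statistic, computed as N times the squared correlation between allele dosage and case status; the add-one form of the empirical p-value, the data layout, and all function names are likewise assumptions.

```python
import random


def trend_stat(dosages, labels):
    """Trend-test statistic N * r^2 between allele dosage (0/1/2) and case status (0/1)."""
    n = len(labels)
    mx, my = sum(dosages) / n, sum(labels) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, labels))
    sxx = sum((x - mx) ** 2 for x in dosages)
    syy = sum((y - my) ** 2 for y in labels)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP or single-class labels carry no evidence
    return n * sxy * sxy / (sxx * syy)


def gene_set_stat(genes, labels):
    """Sum, over genes, of the largest SNP statistic within each gene.

    genes: dict mapping gene name -> list of per-SNP dosage vectors.
    """
    return sum(max(trend_stat(snp, labels) for snp in snps)
               for snps in genes.values())


def empirical_p(genes, labels, n_perm=999, seed=0):
    """Permute case-control labels to obtain an empirical p-value."""
    rng = random.Random(seed)
    observed = gene_set_stat(genes, labels)
    permuted = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(permuted)
        if gene_set_stat(genes, permuted) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Permuting labels rather than genotypes keeps the correlation structure among SNPs intact, so taking the maximum within each gene is accounted for in the null distribution.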
Using the exome sequencing data from 697 unrelated individuals and their simulated disease phenotypes from Genetic Analysis Workshop 17, we develop and apply a gene-based method to identify the relationship between a gene with multiple rare genetic variants and a phenotype. The method is based on the Mantel test, which assesses the correlation between two distance matrices using a permutation procedure. Using up to 100,000 permutations to estimate the statistical significance in 200 replicate data sets, we found that the method had 5.1% type I error at an α level of 0.05 and had various power to detect genes with simulated genetic associations. FLT1 and KDR had the most significant correlations with Q1 and were replicated 170 and 24 times, respectively, in 200 simulated data sets using a Bonferroni corrected p-value of 0.05 as a threshold. These results suggest that the distance correlation method can be used to identify genotype-phenotype association when multiple rare genetic variants in a gene are involved.
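A minimal sketch of the Mantel test that underlies the method above: two distance matrices over the same individuals are correlated through their upper triangles, and the individuals of one matrix are permuted (rows and columns together) to build the null distribution. How genotype and phenotype distances are defined in the study is not given in the abstract, so the matrices here are generic inputs; the one-sided test, the add-one p-value, and the function names are assumptions of the sketch.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Upper-triangle entries after relabeling individuals by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]]
            for i in range(n) for j in range(i + 1, n)]


def mantel(d1, d2, n_perm=999, seed=0):
    """Return (observed correlation, one-sided permutation p-value)."""
    rng = random.Random(seed)
    order = list(range(len(d1)))
    v1 = upper_triangle(d1, order)
    r_obs = pearson(v1, upper_triangle(d2, order))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)  # permutes rows and columns of d2 jointly
        if pearson(v1, upper_triangle(d2, order)) >= r_obs - 1e-12:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)
```

Permuting whole individuals, rather than individual matrix entries, is what preserves the dependence among distances that share an individual.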
Epistasis plays an important role in genetics, evolution and crop breeding. To detect epistasis, the triple test cross (TTC) design was developed several decades ago. Classical procedures for the TTC design use only linear transformations Z1, Z2 and Z3, calculated from the TTC family means of a quantitative trait, to infer the nature of the collective additive, dominance and epistatic effects of all the genes. Although several quantitative trait loci (QTL) mapping approaches in the TTC design have been developed, these approaches do not provide a complete solution for dissecting pure main and epistatic effects. In this study, therefore, we developed a two-step approach to estimate all pure main and epistatic effects in the F2-based TTC design under the F2 and F∞ metric models. In the first step, with Z1 and Z2 the augmented main and epistatic effects in the full genetic model that simultaneously considered all putative QTL on the whole genome were estimated using an empirical Bayes approach, and with Z3 three pure epistatic effects were obtained using two-dimensional genome scans. In the second step, the three pure epistatic effects obtained in the first step were integrated with the augmented epistatic and main effects for the further estimation of all other pure effects. A series of Monte Carlo simulation experiments has been carried out to confirm the proposed method. The results from simulation experiments show that: 1) the newly defined genetic parameters could be correctly identified with satisfactory statistical power and precision; 2) the F2-based TTC design was superior to the F2 and F2:3 designs; 3) with Z1 and Z2 the statistical powers for the detection of augmented epistatic effects were substantively affected by the signs of pure epistatic effects; and 4) with Z3 the estimation of pure epistatic effects required a large sample size and family replication number.
The extension of the proposed method to other base populations is also discussed.
This study was conducted to identify novel genes with importance to the biology of adult acute myelogenous leukemia (AML).
We analyzed DNA from highly purified AML blasts and paired buccal cells from 95 patients for recurrent genomic microdeletions using ultra-high density Affymetrix SNP 6.0 array-based genomic profiling.
Through fine mapping of microdeletions on 17q, we derived a minimal deleted region of ~0.9Mb length that harbors 11 known genes; this region includes Neurofibromin 1 (NF1). Sequence analysis of all NF1 coding exons in the 11 AML cases with NF1 copy number changes identified acquired truncating frameshift mutations in 2 patients. These NF1 mutations were already present in the hematopoietic stem cell compartment. Subsequent expression analysis of NF1 mRNA in the entire AML cohort using FACS-sorted blasts as a source of RNA identified 6 patients (one with a NF1 mutation) with absent NF1 expression. The NF1 null states were associated with increased Ras-bound GTP, and shRNA-mediated NF1 suppression in primary AML blasts with wild type NF1 facilitated colony formation in methylcellulose. Primary AML blasts without functional NF1, unlike blasts with functional NF1, displayed sensitivity to rapamycin-induced apoptosis, thus identifying a dependence on mTOR signaling for survival. Finally, colony formation in methylcellulose ex vivo of NF1 null CD34+/CD38− cells sorted from AML bone marrow samples was inhibited by low dose rapamycin.
NF1 null states are present in 7/95=7% of adult AML and delineate a disease subset that could be preferentially targeted by Ras or mTOR-directed therapeutics.
AML; genomic microdeletions; NF1 mutations
Genome-wide association studies (GWAS) have identified more than 30 loci associated with type 2 diabetes (T2D) in Caucasians. However, genomic understanding of T2D in Asians, especially Han Chinese, is still limited.
Methods and Principal Findings
A two-stage GWAS was performed in Han Chinese from Mainland China. The discovery stage included 793 T2D cases and 806 healthy controls genotyped using Illumina Human 660- and 610-Quad BeadChips; and the replication stage included two independent case-control populations (a total of 4445 T2D cases and 4458 controls) genotyped using TaqMan assay. We validated the associations of KCNQ1 (rs163182, p = 2.085×10^−17, OR 1.28) and C2CD4A/B (rs1370176, p = 3.677×10^−4, OR 1.124; rs1436953, p = 7.753×10^−6, OR 1.141; rs7172432, p = 4.001×10^−5, OR 1.134) in Han Chinese.
Conclusions and Significance
Our study represents the first GWAS of T2D with both discovery and replication sample sets recruited from Han Chinese men and women residing in Mainland China. We confirmed the associations of KCNQ1 and C2CD4A/B with T2D, with the latter for the first time being examined in Han Chinese. Arguably, eight more independent loci were replicated in our GWAS.
Analyzing subpopulations of tumor cells in tissue is a challenging subject in proteomic studies. Pancreatic cancer stem cells (CSCs) are such a group of cells that only constitute 0.2-0.8% of the total tumor cells but have been found to be the origin of pancreatic cancer carcinogenesis and metastasis. Global proteome profiling of pancreatic CSCs from xenograft tumors in mice is a promising way to unveil the molecular machinery underlying the signaling pathways. However, the extremely low availability of pancreatic tissue CSCs (around 10,000 cells per xenograft tumor or patient sample) has limited the utilization of currently standard proteomic approaches which do not work effectively with such a small amount of material. Herein, we describe the profiling of the proteome of pancreatic CSCs using a capillary scale shotgun technique by coupling offline capillary isoelectric focusing (cIEF) with nano reversed phase liquid chromatography (RPLC) followed by spectral counting peptide quantification. A whole cell lysate from 10,000 cells which corresponds to ∼1 μg protein material is equally divided for three repeated cIEF separations where around 300 ng peptide material is used in each run. In comparison with a non-tumorigenic tumor cell sample, among 1159 distinct proteins identified with FDR less than 0.2%, 169 differentially expressed proteins are identified after multiple testing corrections where 24% of the proteins are upregulated in the CSCs group. Ingenuity Pathway analysis of these differential expression signatures further suggests significant involvement of signaling pathways related to apoptosis, cell proliferation, inflammation and metastasis.
cancer stem cells; capillary isoelectric focusing; two-dimensional separation; proteome profiling; differential expression; spectral count; pathway analysis
Chemical address tags can be defined as specific structural features shared by a set of bioimaging probes having a predictable influence on cell-associated visual signals obtained from these probes. Here, using a large image dataset acquired with a high content screening instrument, machine vision and cheminformatics analysis have been applied to reveal chemical address tags. With a combinatorial library of fluorescent molecules, fluorescence signal intensity, spectral, and spatial features characterizing each one of the probes' visual signals were extracted from images acquired with the three different excitation and emission channels of the imaging instrument. With multivariate regression, the additive contribution from each one of the different building blocks of the bioimaging probes towards each measured, cell-associated image-based feature was calculated. In this manner, variations in the chemical features of the molecules were associated with the resulting staining patterns, facilitating quantitative, objective analysis of chemical address tags. Hierarchical clustering and paired image-cheminformatics analysis revealed key structure-property relationships amongst many building blocks of the fluorescent molecules. The results point to different chemical modifications of the bioimaging probes that can exert similar (or different) effects on the probes' visual signals. Inspection of the clustered structures suggests intramolecular charge migration or partial charge distribution as potential mechanistic determinants of chemical address tag behavior.
Cheminformatics; machine vision; bioimaging; fluorescence; high content screening; image cytometry; combinatorial chemistry
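The idea of an additive contribution from each building block can be illustrated with a small sketch. It is not the regression used in the study: it assumes a complete two-component library in which every combination of an A block and a B block has exactly one measured image feature. Under that balanced-design assumption, the least-squares additive effects (constrained to sum to zero) reduce to row and column mean deviations from the grand mean. The function name and the table layout are hypothetical.

```python
def additive_effects(table):
    """Additive decomposition of a complete A x B feature table.

    table[i][j] is the image feature measured for the probe built from
    block A_i and block B_j. Returns (grand_mean, a_effects, b_effects,
    residuals); residuals hold what the additive model cannot explain.
    """
    n_a, n_b = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_a * n_b)
    a_eff = [sum(row) / n_b - grand for row in table]
    b_eff = [sum(table[i][j] for i in range(n_a)) / n_a - grand
             for j in range(n_b)]
    residuals = [[table[i][j] - (grand + a_eff[i] + b_eff[j])
                  for j in range(n_b)]
                 for i in range(n_a)]
    return grand, a_eff, b_eff, residuals
```

If building blocks act as independent address tags, the residuals stay small relative to the block effects; large residuals for particular combinations would point to interactions between blocks.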
Visual analytics, a technique aiding data analysis and decision making, is a novel tool that allows for a better understanding of the context of complex systems. Public health professionals can greatly benefit from this technique since context is integral in disease monitoring and biosurveillance. We propose a graphical tool that can reveal the distribution of an outcome by time and age simultaneously.
We introduce and demonstrate multi-panel (MP) graphs applied in four different settings: U.S. national influenza-associated and salmonellosis-associated hospitalizations among the older adult population (≥65 years old), 1991–2004; confirmed salmonellosis cases reported to the Massachusetts Department of Public Health for the general population, 2004–2005; and asthma-associated hospital visits for children aged 0–18 at Milwaukee Children's Hospital of Wisconsin, 1997–2006. We illustrate trends and anomalies that otherwise would be obscured by traditional visualization techniques such as case pyramids and time-series plots.
MP graphs can weave together two vital dynamics—temporality and demographics—that play important roles in the distribution and spread of diseases, making these graphs a powerful tool for public health and disease biosurveillance efforts.
Genomic complexity is present in ~15–30% of all CLL and has emerged as a strong independent predictor of rapid disease progression and short remission duration in CLL. We conducted this study to advance our understanding of the causes of genomic complexity in CLL.
We have obtained quantitative measurements of radiation-induced apoptosis and radiation-induced ATM auto-phosphorylation in purified CLL cells from 158 and 140 patients, respectively, and have employed multivariate analysis to identify independent contributions of various biological variables on genomic complexity in CLL.
Here, we identify a strong independent effect of radiation resistance on elevated genomic complexity in CLL and describe radiation resistance as a predictor for shortened CLL survival. Further, using multivariate analysis, we identify del17p/p53 aberrations, del11q, del13q14 type II (invariably resulting in Rb loss) and CD38 expression as independent predictors of genomic complexity in CLL, with aberrant p53 as a predictor of ~50% of genomic complexity in CLL. Focusing on del11q, we determined that normalized ATM activity was a modest predictor of genomic complexity but was not independent of del11q. Through SNP array-based fine mapping of del11q, we identified frequent mono-allelic loss of Mre11 and H2AFX in addition to ATM, indicative of compound del11q-resident gene defects in the DNA-ds-break response.
Our quantitative analysis links multiple molecular defects, including for the first time del11q and large 13q14 deletions (type II), to elevated genomic complexity in CLL, thereby suggesting mechanisms for the observed clinical aggressiveness of CLL in patients with unstable genomes.
CLL; genomic complexity; DNA double strand break response
Most genome-wide association (GWA) studies have focused on populations of European ancestry with limited assessment of the influence of the sequence variants on populations of other ethnicities. To determine whether markers that we have recently shown to associate with Bone Mineral Density (BMD) in Europeans also associate with BMD in East-Asians we analysed 50 markers from 23 genomic loci in samples from Korea (n = 1,397) and two Chinese Hong Kong sample sets (n = 3,869 and n = 785). Through this effort we identified fourteen loci that associated with BMD in East-Asian samples using a false discovery rate (FDR) of 0.05: 1p36 (ZBTB40, P = 4.3×10^−9), 1p31 (GPR177, P = 0.00012), 3p22 (CTNNB1, P = 0.00013), 4q22 (MEPE, P = 0.0026), 5q14 (MEF2C, P = 1.3×10^−5), 6q25 (ESR1, P = 0.0011), 7p14 (STARD3NL, P = 0.00025), 7q21 (FLJ42280, P = 0.00017), 8q24 (TNFRSF11B, P = 3.4×10^−5), 11p15 (SOX6, P = 0.00033), 11q13 (LRP5, P = 0.0033), 13q14 (TNFSF11, P = 7.5×10^−5), 16q24 (FOXL1, P = 0.0010) and 17q21 (SOST, P = 0.015). Our study marks an early effort towards the challenge of cataloguing bone density variants shared by many ethnicities by testing, in East-Asians, BMD variants that have been established in Europeans.
High-throughput post-genomic studies are now routinely and promisingly investigated in biological and biomedical research. The main statistical approach to select genes differentially expressed between two groups is to apply a t-test, which is a subject of criticism in the literature. Numerous alternatives have been developed based on different and innovative variance modeling strategies. However, a critical issue is that selecting a different test usually leads to a different gene list. In this context and given the current tendency to apply the t-test, identifying the most efficient approach in practice remains crucial. To help answer this question, we conduct a comparison of eight tests representative of variance modeling strategies in gene expression data: Welch's t-test, ANOVA, Wilcoxon's test, SAM, RVM, limma, VarMixt and SMVar. Our comparison process relies on four steps (gene list analysis, simulations, spike-in data and re-sampling) to formulate comprehensive and robust conclusions about test performance, in terms of statistical power, false-positive rate, execution time and ease of use. Our results raise concerns about the ability of some methods to control the expected number of false positives at a desirable level. Besides, two tests (limma and VarMixt) show significant improvement compared to the t-test, in particular when dealing with small sample sizes. In addition, limma presents several practical advantages, so we advocate its application to analyze gene expression data.
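As a reference point for the comparison above, Welch's t-test for a single gene can be sketched in a few lines. The statistic divides the difference in group means by a standard error built from each group's own sample variance, with degrees of freedom from the Welch-Satterthwaite approximation. The function name is hypothetical, and the p-value lookup (which requires the t distribution) is left out to keep the sketch dependency-free.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    x, y: expression values of one gene in the two groups
    (each group needs at least two observations).
    """
    n1, n2 = len(x), len(y)
    v1 = variance(x) / n1  # squared standard error contributed by group 1
    v2 = variance(y) / n2  # squared standard error contributed by group 2
    t = (mean(x) - mean(y)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df
```

With only a handful of replicates per group, the per-gene variances in the denominator are themselves poorly estimated, which is the weakness that the variance modeling strategies in the compared methods aim to address.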
The NCI60 human tumor cell line screen is a public resource for studying selective and non-selective growth inhibition of small molecules against cancer cells. By coupling growth inhibition screening data with biological characterizations of the different cell lines, it becomes possible to infer mechanisms of action underlying some of the observable patterns of selective activity. Using these data, mechanistic relationships have been identified including specific associations between single genes and small families of closely related compounds, and less specific relationships between biological processes involving several cooperating genes and broader families of compounds. Here we aim to characterize the degree to which such specific and general relationships are present in these data. A related question is whether genes tend to act with a uniform mechanism for all associated compounds, or whether multiple mechanisms are commonly involved. We address these two issues in a statistical framework placing special emphasis on the effects of measurement error in the gene expression and chemical screening data. We find that as measurement accuracy increases, the pattern of apparent associations shifts from one dominated by isolated gene/compound pairs, to one in which families consisting of an average of 25 compounds are associated to the same gene. At the same time, the number of genes that appear to play a role in influencing compound activities decreases. For less than half of the genes, the presence of both positive and negative correlations indicates pleiotropic associations with molecules via different mechanisms of action.
High throughput screen; gene expression; chemical biology; measurement error; false discovery rate; toxicity
Ovarian cancer, the second most common gynecological malignancy, accounts for 3% of all cancers among women in the United States, and has a high mortality rate, largely because existing therapies for widespread disease are rarely curative. Ovarian endometrioid adenocarcinoma (OEA) accounts for about 20% of the overall incidence of all ovarian cancer. We have used proteomics profiling to characterize low stage (FIGO stage 1 or 2) versus high stage (FIGO stage 3 or 4) human OEAs. In general, the low stage tumors lacked p53 mutations and had frequent CTNNB1, PTEN, and/or PIK3CA mutations. The high stage tumors had mutant p53, were usually high grade, and lacked mutations predicted to deregulate Wnt/β-catenin and PI3K/Pten/Akt signaling. We utilized 2-D liquid-based separation/mass mapping techniques to elucidate molecular weight and pI measurements of the differentially expressed intact proteins. We generated 2-D protein mass maps to facilitate the analysis of protein expression between the low stage and high stage tumors. These mass maps (over a pI range of 5.6–4.6) revealed that the low stage OEAs demonstrated protein over-expression at the lower pI ranges (pI 4.8–4.6) in comparison to the high stage tumors, which demonstrated protein over-expression in the higher pI ranges (pI 5.4–5.2). These data suggest that both low and high stage OEAs have characteristic pI signatures of abundant protein expression, probably reflecting, at least in part, the different signaling pathway defects that characterize each group. In this study, the low stage OEAs were distinguishable from high stage tumors based upon the proteomic profiles. Interestingly, when only high-grade (grade 2 or 3) OEAs were included in the analysis, the tumors still tended to cluster according to stage, suggesting that the altered protein expression was not solely dependent upon tumor cell differentiation. Further, these protein profiles clearly distinguish OEA from other types of ovarian cancer at the protein level.
Endometrioid ovarian cancer; Liquid-based protein separation; Mass mapping
The proteomic profiles from two distinct ovarian endometrioid tumor-derived cell lines (MDAH-2774 and TOV-112D), each with different morphological characteristics and genetic mutations, have been studied. Characterization of the differential global protein expression between these two cell lines has important implications for the understanding of the pathogenesis of ovarian endometrioid carcinoma. In this comparative proteomic study, extensive fractionation of peptides generated from whole cell trypsin digestion was achieved by coupling capillary isoelectric focusing (cIEF) in the first dimensional separation with capillary liquid chromatography (RP-HPLC) in the second dimensional separation. On-line analysis was performed using tandem mass spectra acquired by a linear ion trap mass spectrometer from triplicate runs. A total of 1749 and 1955 proteins with protein probability above 0.95 were identified from MDAH-2774 and TOV-112D, respectively, after filtering through PeptideProphet/ProteinProphet software. Differentially expressed proteins were further investigated by Ingenuity Pathway Analysis (IPA) to reveal the association with important biological functions. Canonical pathway analysis using IPA demonstrates that important signaling pathways are highly associated with one of these two cell lines versus the other, such as the PI3K/AKT pathway, which is found to be significantly predominant in MDAH-2774 but not in TOV-112D. Also, protein network analysis using IPA highlights p53 as a central hub relating to other proteins from the connectivity map. These results illustrate the utility of high throughput proteomics methods using large scale proteome profiling combined with bioinformatics tools to identify differential signaling pathways, thus contributing to the understanding of mechanisms of deregulation in neoplastic cells.
capillary isoelectric focusing; proteins; ovarian cancer; pathway analysis; quantitation; spectral count
Ovarian serous carcinomas (OSCs) comprise over half of all ovarian carcinomas and account for the majority of ovarian cancer-related deaths. We used a 2-dimensional liquid-based protein mapping strategy to characterize global protein expression patterns in 19 OSC tumor samples from 15 different patients to facilitate molecular classification of tumor stage. Protein expression profiles were produced, using pI-based separation in the first dimension and hydrophobicity-based separation in the second dimension, over a pH range of 4.0-7.0. Hierarchical clustering was applied to protein maps to indicate the tumor interrelationships. The 19 tumor samples could be classified into two different groups, one group associated with low stage (Stage 1) tumors and the other group associated with high stage (Stages 3/4) tumors. Proteins that were differentially expressed in different groups were selected for identification by LTQ-ESI-MS/MS. Fourteen of the selected proteins were over-expressed in the low stage tumors; 46 of the proteins were over-expressed in the high stage tumors. These proteins are known to play an important role in cellular functions such as glycolysis, protein biosynthesis, and cytoskeleton rearrangement and may serve as markers associated with different stages of OSCs. To further confirm the stage-dependent protein identifications, Lamin A/C and Vimentin expression in ovarian serous carcinomas was assessed by immunohistochemistry using ovarian tumor tissue microarrays for 66 samples.
Mass Mapping; Ovarian Cancer; Tumor Stage; Cancer Markers; Tissue Microarrays; 2-D Liquid Separations
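The hierarchical clustering step used to group the tumor protein maps can be illustrated with a minimal sketch. The abstract does not state the distance metric or linkage rule, so Euclidean distance and average linkage are assumptions here, and the function name `average_linkage`, the sample labels, and the intensity profiles are hypothetical rather than taken from the study.

```python
from math import dist


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering (average linkage, Euclidean distance).

    profiles: dict mapping sample name -> list of intensities.
    Returns a list of sets of sample names.
    """
    clusters = [{name} for name in profiles]

    def linkage(a, b):
        # Mean pairwise distance between the members of two clusters
        total = sum(dist(profiles[i], profiles[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (i < j) and merge j into i
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


# Hypothetical protein-map intensity profiles (assumed, not study data)
maps = {
    "T1": [1.0, 0.9, 0.1],
    "T2": [0.9, 1.0, 0.2],
    "T3": [0.1, 0.2, 1.0],
    "T4": [0.2, 0.1, 0.9],
}
groups = average_linkage(maps, 2)
```

Stopping at two clusters mirrors the two-group (low stage versus high stage) outcome described in the abstract; a full analysis would normally keep the whole merge tree and inspect it as a dendrogram.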
Pancreatic cancer is a formidable disease, and early detection biomarkers are needed to make inroads into improving the outcomes in these patients. In this work, lectin antibody microarrays were utilized to detect unique glycosylation patterns of proteins from serum. Antibodies to four potential glycoprotein markers that were found in previous studies were printed on nitrocellulose coated glass slides and these microarrays were hybridized against patient serum to extract the target glycoproteins. Lectins were then used to detect different glycan structural units on the captured glycoproteins in a sandwich assay format. The biotinylated lectins used to assess differential glycosylation patterns were Aleuria aurantia lectin (AAL), Sambucus nigra bark lectin (SNA), Maackia amurensis lectin II (MAL), Lens culinaris agglutinin (LCA), and Concanavalin A (ConA). Captured glycoproteins were evaluated on the microarray in situ by on-plate digestion and direct analysis using MALDI QIT-TOF mass spectrometry. Analysis was performed using serum from 89 normal controls, 35 chronic pancreatitis samples, 37 diabetic samples and 22 pancreatic cancer samples. We found that this method had excellent reproducibility as measured by the signal deviation of control blocks used as on-slide standards and of 41 pairs of pure technical replicates. It was possible to discriminate cancer from the other disease groups and normal samples with high sensitivity and specificity, where the response of Alpha-1-β glycoprotein to lectin SNA increased by 69% in the cancer sample compared to the other non-cancer groups (95% confidence interval 53% to 86%). These data suggest that differential glycosylation patterns detected on high throughput lectin microarrays are a promising biomarker approach for the early detection of pancreatic cancer.
Glycoproteins; Pancreatic cancer; Lectins; Antibody Array; Cancer Markers