1.  Molecular Analysis of Central Nervous System Disease Spectrum in Childhood Acute Lymphoblastic Leukemia 
Treatment of the central nervous system (CNS) is an essential therapeutic component in childhood acute lymphoblastic leukemia (ALL). The goal of this study was to identify molecular signatures distinguishing pediatric ALL patients with CNS disease from those without it. We analyzed gene expression data from 207 pediatric patients with ALL. Patients without CNS disease were classified as CNS1, while those with mild and advanced CNS disease were classified as CNS2 and CNS3, respectively. We compared gene expression levels among the three disease classes and identified gene signatures distinguishing them. Pathway analysis revealed molecular networks and biological pathways dysregulated in response to CNS disease involvement. The identified pathways included the ILK, WNT, B-cell receptor, AMPK, ERK5, and JAK signaling pathways. The results demonstrate that transcription profiling could be used to stratify patients to guide therapeutic decision-making in pediatric ALL.
PMCID: PMC4792199  PMID: 26997880
acute lymphoblastic leukemia; central nervous disease spectrum; gene expression
2.  Prognostic Biomarkers for HNSCC using Quantitative Real-Time PCR and Microarray Analysis: β-Tubulin Isotypes and the p53 Interactome 
Cytoskeleton (Hoboken, N.J.)  2014;71(11):628-637.
In 2014, more than 40,000 people in the United States will be diagnosed with head and neck squamous cell cancer (HNSCC) and nearly 8400 people will die of the disease. Little is known regarding molecular targets that might lead to better therapies and improved outcomes for these patients. The incorporation of taxanes into the standard cisplatin/5-fluorouracil initial chemotherapy for HNSCC has been associated with improved response rate and survival. Taxanes target the β-subunit of the tubulin heterodimers, the major protein in microtubules, and halt cell division at G2/M phase. Both laboratory and clinical research suggest a link between β-tubulin expression and cancer patient survival, indicating that patterns of expression for β-tubulin isotypes along with activity of tumor suppressors such as p53 or micro-RNAs could be useful prognostic biomarkers and could suggest therapeutic targets.
PMCID: PMC4367444  PMID: 25355403
tubulin; tubulin isotypes; HNSCC; p53; TUBB3
3.  Cardiac extracellular proteome profiling and membrane topology analysis using glycoproteomics 
Extracellular proteins are easily accessible, which presents a sub-proteome of molecular targets that have high diagnostic and therapeutic potential. Efforts have been made to catalogue the cardiac extracellular matridome and analyze the topology of identified proteins for the design of therapeutic targets. Although many bioinformatics tools have been developed to predict protein topology, topology has been experimentally validated for only a very small portion of membrane proteins. The aim of this study was to use a glycoproteomics and mass spectrometry approach to identify glycoproteins in the extracellular matridome of the infarcted LV and provide experimental evidence for topological determination.
Experimental design
Glycoproteomics analysis was performed on eight biological replicates of day 7 post-MI samples from wild type mice using solid-phase extraction of glycopeptides, followed by mass spectrometric identification of N-linked glycosylation sites for topology assessment.
We identified hundreds of glycoproteins and the identified N-glycosylation sites provide novel information on the correct topology for membrane proteins present in the infarct setting.
Conclusions and clinical relevance
Our data provide the foundation for future studies of the LV infarct extracellular matridome, which may facilitate the discovery of drug targets and biomarkers.
PMCID: PMC4212811  PMID: 24920555
Extracellular matridome; glycoprotein; membrane orientation; matrix metalloproteinase; proteomics; myocardial infarction; left ventricle
4.  Cross-talk between Notch and the Estrogen Receptor in Breast Cancer Suggests Novel Therapeutic Approaches 
Cancer research  2008;68(13):5226-5235.
High expression of Notch-1 and Jagged-1 mRNA correlates with poor prognosis in breast cancer. Elucidating the cross-talk between Notch and other major breast cancer pathways is necessary to determine which patients may benefit from Notch inhibitors, which agents should be combined with them, and which biomarkers indicate Notch activity in vivo. We explored expression of Notch receptors and ligands in clinical specimens, as well as activity, regulation, and effectors of Notch signaling using cell lines and xenografts. Ductal and lobular carcinomas commonly expressed Notch-1, Notch-4, and Jagged-1 at variable levels. However, in breast cancer cell lines, Notch-induced transcriptional activity did not correlate with Notch receptor levels and was highest in estrogen receptor α–negative (ERα–), Her2/Neu nonoverexpressing cells. In ERα+ cells, estradiol inhibited Notch activity and Notch-1IC nuclear levels and affected Notch-1 cellular distribution. Tamoxifen and raloxifene blocked this effect, reactivating Notch. Notch-1 induced Notch-4. Notch-4 expression in clinical specimens correlated with proliferation (Ki67). In MDA-MB231 (ERα–) cells, Notch-1 knockdown or γ-secretase inhibition decreased cyclins A and B1, causing G2 arrest, p53-independent induction of NOXA, and death. In T47D:A18 (ERα+) cells, the same targets were affected, and Notch inhibition potentiated the effects of tamoxifen. In vivo, γ-secretase inhibitor treatment arrested the growth of MDA-MB231 tumors and, in combination with tamoxifen, caused regression of T47D:A18 tumors. Our data indicate that combinations of antiestrogens and Notch inhibitors may be effective in ERα+ breast cancers and that Notch signaling is a potential therapeutic target in ERα– breast cancers.
PMCID: PMC4445363  PMID: 18593923
5.  Transcriptome Analysis of Minimal Residual Disease in Subtypes of Pediatric B Cell Acute Lymphoblastic Leukemia 
Acute lymphoblastic leukemia (ALL) is the most common childhood cancer and the leading cause of cancer-related death in children and adolescents. Minimal residual disease (MRD) is a strong, independent prognostic factor. The objective of this study was to identify molecular signatures distinguishing patients with positive MRD from those with negative MRD in different subtypes of ALL, and to identify molecular networks and biological pathways deregulated in response to positive MRD at day 46. We compared gene expression levels between patients with positive MRD and negative MRD in each subtype to identify differentially expressed genes. Hierarchical clustering was applied to determine their functional relationships. We identified subtype-specific gene signatures distinguishing patients with positive MRD from those with negative MRD. We identified the genes involved in cell cycle, apoptosis, transport, and DNA repair. We also identified molecular networks and biological pathways dysregulated in response to positive MRD, including Granzyme B, B-cell receptor, and PI3K signaling pathways.
PMCID: PMC4444133  PMID: 26056509
minimal residual disease; B-cell acute lymphoblastic leukemia; gene expression
6.  Evaluation of microarray-based DNA methylation measurement using technical replicates: the Atherosclerosis Risk In Communities (ARIC) Study 
BMC Bioinformatics  2014;15(1):312.
DNA methylation is a widely studied epigenetic phenomenon; alterations in methylation patterns influence human phenotypes and risk of disease. As part of the Atherosclerosis Risk in Communities (ARIC) study, the Illumina Infinium HumanMethylation450 (HM450) BeadChip was used to measure DNA methylation in peripheral blood obtained from ~3000 African American study participants. Over 480,000 cytosine-guanine (CpG) dinucleotide sites were surveyed on the HM450 BeadChip. To evaluate the impact of technical variation, 265 technical replicates from 130 participants were included in the study.
For each CpG site, we calculated the intraclass correlation coefficient (ICC) to compare variation of methylation levels within- and between-replicate pairs, ranging between 0 and 1. We modeled the distribution of ICC as a mixture of censored or truncated normal and normal distributions using an EM algorithm. The CpG sites were clustered into low- and high-reliability groups, according to the calculated posterior probabilities. We also demonstrated the performance of this clustering when applied to a study of association between methylation levels and smoking status of individuals. For the CpG sites showing genome-wide significant association with smoking status, most (~96%) were seen from sites in the high reliability cluster.
We suggest that CpG sites with low ICC may be excluded from subsequent association analyses, or extra caution needs to be taken for associations at such sites.
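The per-site reliability measure described above can be illustrated with a minimal sketch. It assumes a one-way random-effects ICC computed from technical replicates of equal size per participant; the abstract does not state which ICC variant was used, so the formula, the function name `icc_oneway`, the clipping of negative estimates to 0, and the beta values below are all illustrative assumptions rather than the authors' method.

```python
from statistics import mean


def icc_oneway(replicates):
    """One-way random-effects ICC for one CpG site (illustrative assumption).

    replicates: list of tuples, one tuple of replicate methylation values
    per participant; every tuple must have the same length k >= 2.
    """
    n = len(replicates)
    k = len(replicates[0]) if n else 0
    if n < 2 or k < 2 or any(len(r) != k for r in replicates):
        raise ValueError("need >= 2 participants, each with the same k >= 2 replicates")

    grand = mean(v for r in replicates for v in r)
    group_means = [mean(r) for r in replicates]

    # Mean squares from a one-way ANOVA: between participants and within replicates.
    msb = k * sum((m - grand) ** 2 for m in group_means) / (n - 1)
    msw = sum((v - m) ** 2 for r, m in zip(replicates, group_means) for v in r) / (n * (k - 1))

    denom = msb + (k - 1) * msw
    if denom == 0:
        return 0.0  # no variation at all: treat the site as uninformative
    # Negative estimates are clipped so the result lies in [0, 1] (an assumption).
    return max(0.0, (msb - msw) / denom)


# Hypothetical beta values for three participants, two replicates each.
reliable_site = [(0.10, 0.10), (0.50, 0.50), (0.90, 0.90)]
noisy_site = [(0.50, 0.10), (0.10, 0.50), (0.50, 0.10)]
print(icc_oneway(reliable_site), icc_oneway(noisy_site))
```

A site whose replicates agree closely relative to the spread between participants scores near 1, and a site dominated by replicate noise scores near 0. The mixture-model clustering step mentioned in the abstract is not reproduced here.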
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2105-15-312) contains supplementary material, which is available to authorized users.
PMCID: PMC4180315  PMID: 25239148
DNA methylation; Infinium 450 K chip; Technical error; Intraclass correlation; Normal mixture models
7.  High CD49f expression is associated with osteosarcoma tumor progression: a study using patient-derived primary cell cultures 
Cancer Medicine  2014;3(4):796-811.
Overall prognosis for osteosarcoma (OS) is poor despite aggressive treatment options. Limited access to primary tumors, technical challenges in processing OS tissues, and the lack of well-characterized primary cell cultures have hindered our ability to fully understand the properties of OS tumor initiation and progression. In this study, we have isolated and characterized cell cultures derived from four central high-grade human OS samples. Furthermore, we used the cell cultures to study the role of CD49f in OS progression. Recent studies have implicated CD49f in stemness and multipotency of both cancer stem cells and mesenchymal stem cells. Therefore, we investigated the role of CD49f in osteosarcomagenesis. First, single cell suspensions of tumor biopsies were subcultured and characterized for cell surface marker expression. Next, we characterized the growth and differentiation properties, sensitivity to chemotherapy drugs, and anchorage-independent growth. Xenograft assays showed that cell populations expressing the CD49fhi/CD90lo phenotype produced an aggressive tumor. Multiple lines of evidence demonstrated that inhibiting CD49f decreased the tumor-forming ability. Furthermore, the CD49fhi/CD90lo cell population generated more aggressive OS tumor growth, indicating that this cell surface marker could be a potential candidate for the isolation of an aggressive cell type in OS.
PMCID: PMC4303148  PMID: 24802970
Cancer stem cell; CD49f; integrin; osteosarcoma; tumor-initiating cell; tumor progression
8.  Integrative Genomic Analysis for the Discovery of Biomarkers in Prostate Cancer 
Biomarker Insights  2014;9:39-51.
Genome-wide association studies (GWAS) have achieved great success in identifying single nucleotide polymorphisms (SNPs, herein called genetic variants) and genes associated with risk of developing prostate cancer. However, GWAS do not typically link the genetic variants to the disease state or inform the broader context in which the genetic variants operate. Here, we present a novel integrative genomics approach that combines GWAS information with gene expression data to infer the causal association between gene expression and the disease and to identify the network states and biological pathways enriched for genetic variants. We identified gene regulatory networks and biological pathways enriched for genetic variants, including the prostate cancer, IGF-1, JAK2, androgen, and prolactin signaling pathways. The integration of GWAS information with gene expression data provides insights about the broader context in which genetic variants associated with an increased risk of developing prostate cancer operate.
PMCID: PMC4085106  PMID: 25057237
GWAS; genetic variants; gene expression; prostate cancer
9.  Comprehensive Assessment and Network Analysis of the Emerging Genetic Susceptibility Landscape of Prostate Cancer 
Cancer Informatics  2013;12:175-191.
Recent advances in high-throughput genotyping have made possible the identification of genetic variants associated with increased risk of developing prostate cancer using genome-wide association studies (GWAS). However, the broader context in which the identified genetic variants operate is poorly understood. Here we present a comprehensive assessment, network, and pathway analysis of the emerging genetic susceptibility landscape of prostate cancer.
We created a comprehensive catalog of genetic variants and associated genes by mining published reports and accompanying websites hosting supplementary data on GWAS. We then performed network and pathway analysis using single nucleotide polymorphism (SNP)-containing genes to identify gene regulatory networks and pathways enriched for genetic variants.
We identified multiple gene networks and pathways enriched for genetic variants including IGF-1, androgen biosynthesis and androgen signaling pathways, and the molecular mechanisms of cancer. The results provide putative functional bridges between GWAS findings and gene regulatory networks and biological pathways.
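The notion of a pathway being "enriched" for SNP-containing genes can be made concrete with a small sketch. The abstract does not name the statistic or software used, so the one-sided hypergeometric test below is an assumption chosen because it is a common way to score such overlaps; the function name `enrichment_p` and all counts are hypothetical.

```python
from math import comb


def enrichment_p(universe, pathway_size, hits, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(universe, pathway_size, hits).

    universe:     number of genes considered overall
    pathway_size: number of those genes annotated to the pathway
    hits:         number of SNP-containing genes in the universe
    overlap:      number of SNP-containing genes that fall in the pathway
    """
    if not (0 <= pathway_size <= universe and 0 <= hits <= universe):
        raise ValueError("pathway_size and hits must lie between 0 and universe")
    upper = min(pathway_size, hits)
    if not (0 <= overlap <= upper):
        raise ValueError("overlap must lie between 0 and min(pathway_size, hits)")

    total = comb(universe, hits)
    # Sum the probabilities of seeing `overlap` or more pathway genes among the hits.
    tail = sum(
        comb(pathway_size, i) * comb(universe - pathway_size, hits - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20,000 genes, a 100-gene pathway, 200 SNP-containing genes,
# 8 of which fall in the pathway (about 1 would be expected by chance).
print(enrichment_p(20000, 100, 200, 8))
```

A small tail probability indicates that more SNP-containing genes land in the pathway than chance alone would predict. In practice such p-values would also need correction for testing many pathways at once.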
PMCID: PMC3769142  PMID: 24031161
prostate cancer; GWAS; network; pathway analysis
10.  Analysis of Patterns of Gene Expression Variation within and between Ethnic Populations in Pediatric B-ALL 
Cancer Informatics  2013;12:155-173.
B-Precursor acute lymphoblastic leukemia (B-ALL) is the most common childhood cancer. Although 80% of B-ALL patients can be cured, significant challenges persist. Significant disparities in clinical outcomes and mortality rates exist between racial/ethnic populations. The objective of this study was to determine whether gene expression levels significantly differ between ethnic populations. We compared gene expression levels between four ethnic populations (Whites, Blacks, Hispanics, and Asians) in the United States. Additionally, we performed network and pathway analysis to identify gene networks and pathways. Gene expression data involved 198 samples distributed as follows: 126 Whites, 51 Hispanics, 13 Blacks, and 8 Asians. We identified 300 highly significantly (P < 0.001) differentially expressed genes between the four ethnic populations. The identified genes included PHF6, BRD3, CRLF2, and RNF135, which have been implicated in pediatric B-ALL. We identified key pathways implicated in B-ALL including the PDGF, PI3/AKT, ERBB2-ERBB3, and IL-15 signaling pathways.
PMCID: PMC3762614  PMID: 24023509
leukemia; gene expression variation; pediatric B-ALL
11.  Novel Integrative Genomics Approach for Associating GWAS Information with Intrinsic Subtypes of Breast Cancer 
Cancer Informatics  2013;12:125-142.
Genome-wide association studies (GWAS) have achieved great success in identifying common variants associated with increased risk of developing breast cancer. However, GWAS do not typically provide information about the broader context in which genetic variants operate in different subtypes of breast cancer. The objective of this study was to determine whether genes containing single nucleotide polymorphisms (SNPs, herein called genetic variants) are associated with different subtypes of breast cancer. Additionally, we sought to identify gene regulatory networks and biological pathways enriched for these genetic variants. Using supervised analysis, we identified 201 genes that were significantly associated with the six intrinsic subtypes of breast cancer. The results demonstrate that integrative genomics analysis is a powerful approach for linking GWAS information to distinct disease states and provides insights about the broader context in which genetic variants operate in different subtypes of breast cancer.
PMCID: PMC3663490  PMID: 23761956
GWAS; subtypes; breast cancer
12.  An Integrative Genomics Approach for Associating GWAS Information with Triple-Negative Breast Cancer 
Cancer Informatics  2013;12:1-20.
Genome-wide association studies (GWAS) have identified genetic variants associated with an increased risk of developing breast cancer. However, the association of genetic variants and their associated genes with the most aggressive subset of breast cancer, triple-negative breast cancer (TNBC), remains a central puzzle in molecular epidemiology. The objective of this study was to determine whether genes containing single nucleotide polymorphisms (SNPs) associated with an increased risk of developing breast cancer are connected to and could stratify different subtypes of TNBC. Additionally, we sought to identify molecular pathways and networks involved in TNBC. We performed integrative genomics analysis, combining information from GWAS involving over 400,000 cases and over 400,000 controls, with gene expression data derived from 124 breast cancer patients classified as TNBC (at the time of diagnosis) and 142 cancer-free controls. Analysis of GWAS reports produced 500 SNPs mapped to 188 genes. We identified a signature of 159 functionally related SNP-containing genes which were significantly (P < 10⁻⁵) associated with and stratified TNBC. Additionally, we identified 97 genes which were functionally related to, and had expression patterns similar to those of, SNP-containing genes. Network modeling and pathway prediction revealed multi-gene pathways including p53, NFkB, BRCA, apoptosis, DNA repair, DNA mismatch, and excision repair pathways enriched for SNPs mapped to genes significantly associated with TNBC. The results provide convincing evidence that integrating GWAS information with gene expression data provides a unified and powerful approach for biomarker discovery in TNBC.
PMCID: PMC3565545  PMID: 23423317
triple negative breast cancer; GWAS; gene expression
14.  Integrative Analysis of Response to Tamoxifen Treatment in ER-Positive Breast Cancer Using GWAS Information and Transcription Profiling 
Variable response and resistance to tamoxifen treatment in breast cancer patients remains a major clinical problem. To determine whether genes and biological pathways containing SNPs associated with risk for breast cancer are dysregulated in response to tamoxifen treatment, we performed analysis combining information from 43 genome-wide association studies with gene expression data from 298 ER+ breast cancer patients treated with tamoxifen and 125 ER+ controls. We identified 95 genes which distinguished tamoxifen treated patients from controls. Additionally, we identified 54 genes which stratified tamoxifen treated patients into two distinct groups. We identified biological pathways containing SNPs associated with risk for breast cancer, which were dysregulated in response to tamoxifen treatment. Key pathways identified included the apoptosis, P53, NFkB, DNA repair and cell cycle pathways. Combining GWAS with transcription profiling provides a unified approach for associating GWAS findings with response to drug treatment and identification of potential drug targets.
PMCID: PMC3292850  PMID: 22399860
tamoxifen; genome-wide association studies; gene expression
16.  An Integrative Genomics Approach to Biomarker Discovery in Breast Cancer 
Cancer Informatics  2011;10:185-204.
Genome-wide association studies (GWAS) have successfully identified genetic variants associated with risk for breast cancer. However, the molecular mechanisms through which the identified variants confer risk or influence phenotypic expression remain poorly understood. Here, we present a novel integrative genomics approach that combines GWAS information with gene expression data to assess the combined contribution of multiple genetic variants acting within genes and putative biological pathways, and to identify novel genes and biological pathways that could not be identified using traditional GWAS. The results show that genes containing SNPs associated with risk for breast cancer are functionally related and interact with each other in biological pathways relevant to breast cancer. Additionally, we identified novel genes that are co-expressed and interact with genes containing SNPs associated with breast cancer. Integrative analysis combining GWAS information with gene expression data provides functional bridges between GWAS findings and biological pathways involved in breast cancer.
PMCID: PMC3153161  PMID: 21869864
genome-wide association studies; gene expression; pathway
17.  Associating GWAS Information with the Notch Signaling Pathway Using Transcription Profiling 
Cancer Informatics  2011;10:93-108.
Genome-wide association studies (GWAS) have identified SNPs associated with breast cancer. However, they offer limited insights about the biological mechanisms by which SNPs confer risk. We investigated the association of GWAS information with a major oncogenic pathway in breast cancer, the Notch signaling pathway. We first identified 385 SNPs and 150 genes associated with risk for breast cancer by mining data from 41 GWAS. We then investigated their expression, along with 32 genes involved in the Notch signaling pathway using two publicly available gene expression data sets from the Caucasian (42 cases and 143 controls) and Asian (43 cases and 43 controls) populations. Pathway prediction and network modeling confirmed that Notch receptors and genes involved in the Notch signaling pathway interact with genes containing SNPs associated with risk for breast cancer. Additionally, we identified other SNP-associated biological pathways relevant to breast cancer, including the P53, apoptosis and MAP kinase pathways.
PMCID: PMC3091413  PMID: 21584266
GWAS; gene expression; Notch signaling pathway
18.  A method of predicting changes in human gene splicing induced by genetic variants in context of cis-acting elements 
BMC Bioinformatics  2010;11:22.
Polymorphic variants and mutations disrupting canonical splicing isoforms are among the leading causes of human hereditary disorders. While there is substantial evidence of aberrant splicing causing Mendelian diseases, the implication of such events in multi-genic disorders is yet to be well understood. We have developed a new tool (SpliceScan II) for predicting the effects of genetic variants on splicing and cis-regulatory elements. The novel Bayesian non-canonical 5'GC splice site (SS) sensor used in our tool allows inference on non-canonical exons.
Our tool performed favorably when compared with the existing methods in the context of genes linked to the Autism Spectrum Disorder (ASD). SpliceScan II was able to predict more aberrant splicing isoforms triggered by the mutations, as documented in DBASS5 and DBASS3 aberrant splicing databases, than other existing methods. Detrimental effects behind some of the polymorphic variations previously associated with Alzheimer's and breast cancer could be explained by changes in predicted splicing patterns.
We have developed SpliceScan II, an effective and sensitive tool for predicting the detrimental effects of genomic variants on splicing leading to Mendelian and complex hereditary disorders. The method could potentially be used to screen resequenced patient DNA to identify de novo mutations and polymorphic variants that could contribute to a genetic disorder.
PMCID: PMC3098058  PMID: 20067640
19.  Computational prediction of splicing regulatory elements shared by Tetrapoda organisms 
BMC Genomics  2009;10:508.
Auxiliary splicing sequences play an important role in ensuring accurate and efficient splicing by promoting or repressing recognition of authentic splice sites. These cis-acting motifs have been termed splicing enhancers and silencers and are located both in introns and exons. They co-evolved into an intricate splicing code together with additional functional constraints, such as tissue-specific and alternative splicing patterns. We used orthologous exons extracted from the University of California Santa Cruz multiple genome alignments of human and 22 Tetrapoda organisms to predict candidate enhancers and silencers that have reproducible and statistically significant bias towards annotated exonic boundaries.
A total of 2,546 Tetrapoda enhancers and silencers were clustered into 15 putative core motifs based on their Markov properties. Most of these elements have been identified previously, but 118 putative silencers and 260 enhancers (~15%) were novel. Examination of previously published experimental data for the presence of predicted elements showed that their mutations in 21/23 (91.3%) cases altered the splicing pattern as expected. Predicted intronic motifs flanking 3' and 5' splice sites had higher evolutionary conservation than other sequences within intronic flanks, and the intronic enhancers differed markedly between 3' and 5' intronic flanks.
Difference in intronic enhancers supporting 5' and 3' splice sites suggests an independent splicing commitment for neighboring exons. Increased evolutionary conservation for ISEs/ISSs within intronic flanks and effect of modulation of predicted elements on splicing suggest functional significance of found elements in splicing regulation. Most of the elements identified were shown to have direct implications in human splicing and therefore could be useful for building computational splicing models in biomedical research.
PMCID: PMC2777938  PMID: 19889216
20.  fMRI Study of Language Activation in Schizophrenia, Schizoaffective Disorder and in Individuals Genetically at High Risk 
Schizophrenia research  2007;96(1-3):14-24.
Structural and functional abnormalities have been found in language-related brain regions in patients with schizophrenia. We previously reported findings pointing to differences in word processing between people with schizophrenia and individuals who are at high-risk for schizophrenia using a voxel-based (whole brain) fMRI approach. We now extend this finding to specifically examine functional activity in three language related cortical regions using a larger cohort of individuals.
A visual lexical discrimination task was performed by 36 controls, 21 subjects at high genetic-risk for schizophrenia, and 20 patients with schizophrenia during blood oxygenation level dependent (BOLD) fMRI scanning. Activation in bilateral inferior frontal gyri (Brodmann's area 44-45), bilateral inferior parietal lobe (Brodmann's area 39-40), and bilateral superior temporal gyri (Brodmann's area 22) was investigated. For all subjects, two-tailed Pearson correlations were calculated between the computed laterality index and a series of cognitive test scores determining language functioning.
Regional activation in Brodmann's area 44-45 was left lateralized in normal controls, while high-risk subjects and patients with schizophrenia or schizoaffective disorder showed more bilateral activation. No significant differences among the three diagnostic groups in the other two regions of interest (Brodmann's area 22 or areas 39-40) were found. Furthermore, the apparent reasons for loss of leftward language lateralization differed between groups. In high-risk subjects, the loss of lateralization was based on reduced left hemisphere activation, while in the patient group, it was due to increased right side activation. Language ability related cognitive scores were positively correlated with the laterality indices obtained from Brodmann's areas 44-45 in the high-risk group, and with the laterality indices from Brodmann's areas 22 and 44-45 in the patient group.
This study reinforces previous language related imaging studies in high-risk subjects and patients with schizophrenia suggesting that reduced functional lateralization in language related frontal cortex may be a vulnerability marker for schizophrenia. Future studies will determine whether it is predictive of who develops illness.
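The laterality index and its correlation with cognitive scores can be sketched as follows. The abstract does not give the formula used, so the widely used definition LI = (L - R) / (L + R) is an assumption here, as are the function names and every number in the example; `pearson_r` is a plain implementation of the Pearson correlation coefficient.

```python
from math import sqrt


def laterality_index(left, right):
    """LI = (L - R) / (L + R); +1 is fully left lateralized, -1 fully right."""
    total = left + right
    if total == 0:
        raise ValueError("no activation in either hemisphere")
    return (left - right) / total


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical activated-voxel counts (left, right) in one region for four subjects,
# paired with hypothetical language test scores.
voxels = [(120, 40), (90, 60), (80, 80), (70, 90)]
scores = [52, 47, 41, 38]
indices = [laterality_index(left, right) for left, right in voxels]
print(indices, pearson_r(indices, scores))
```

Under this definition, reduced left activation and increased right activation both lower the index, which is why the two mechanisms described in the abstract can produce a similar loss of lateralization.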
PMCID: PMC2212592  PMID: 17719745
fMRI; schizophrenia; High risk, genetic; Language lateralization; ROI based study

