Results 1-14 (14)

1.  Simultaneous inferences based on empirical Bayes methods and false discovery rates in eQTL data analysis 
BMC Genomics  2013;14(Suppl 8):S8.
Genome-wide association studies (GWAS) have identified hundreds of genetic variants associated with complex human diseases, clinical conditions and traits. Genetic mapping of expression quantitative trait loci (eQTLs) is revealing novel functional effects of thousands of single nucleotide polymorphisms (SNPs). In a classical quantitative trait loci (QTL) mapping problem, multiple tests are performed to assess whether one trait is associated with any of a number of loci. In an eQTL study, by contrast, thousands of gene expression traits are measured and tested against thousands of loci, so a huge number of tests (~10^6) must be performed. This extreme multiplicity gives rise to many computational and statistical problems. In this paper we address these issues using two closely related inferential approaches: an empirical Bayes method, which retains a Bayesian flavor without requiring much prior knowledge, and the frequentist method of false discovery rates. A three-component t-mixture model is used for the parametric empirical Bayes (PEB) method, and inferences are obtained with the Expectation/Conditional Maximization Either (ECME) algorithm. A simulation study was also performed to compare PEB with a nonparametric empirical Bayes (NPEB) alternative.
The results show that PEB has an edge over NPEB. The proposed methodology has been applied to human liver cohort (LHC) data. Our method discovers more significant SNPs at FDR < 10% than the previous study by Yang et al. (Genome Research, 2010).
In contrast to previously available methods based on p-values, the empirical Bayes method uses the local false discovery rate (lfdr) as its threshold, which controls the false positive rate.
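The lfdr threshold can be made concrete with a small sketch. Under a fitted mixture, lfdr(z) = π0·f0(z)/f(z) is the posterior probability that a test statistic z came from the null component. The sketch assumes a three-component Student t mixture (null, positive-effect, negative-effect) whose weights, degrees of freedom, locations and scales are hypothetical placeholders, not ECME estimates from the paper; the helper names (t_pdf, local_fdr, select_tests) and the 0.2 cut-off are illustrative. It also uses the standard empirical Bayes identity that the FDR of a selected set can be estimated by the mean lfdr of its members.

```python
import math

def t_pdf(x, df, loc=0.0, scale=1.0):
    """Density of a location-scale Student t distribution."""
    z = (x - loc) / scale
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi) - math.log(scale))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(z * z / df))

def local_fdr(z, weights, components):
    """Posterior probability that z came from the null (first) component."""
    dens = [w * t_pdf(z, *comp) for w, comp in zip(weights, components)]
    return dens[0] / sum(dens)

def select_tests(zs, weights, components, threshold=0.2):
    """Keep tests with lfdr <= threshold; estimate their FDR as the mean lfdr."""
    lfdrs = [local_fdr(z, weights, components) for z in zs]
    chosen = [i for i, v in enumerate(lfdrs) if v <= threshold]
    est_fdr = sum(lfdrs[i] for i in chosen) / len(chosen) if chosen else 0.0
    return chosen, est_fdr

# Hypothetical fitted parameters for the null, positive-effect and
# negative-effect components, each given as (degrees of freedom, location, scale).
WEIGHTS = [0.9, 0.05, 0.05]
COMPONENTS = [(10, 0.0, 1.0), (10, 3.0, 1.0), (10, -3.0, 1.0)]

if __name__ == "__main__":
    chosen, fdr = select_tests([0.0, 5.0, -6.0], WEIGHTS, COMPONENTS)
    print(chosen, round(fdr, 3))
```

A statistic near zero gets an lfdr close to 1 and is never selected, while large positive or negative statistics fall under the threshold because one of the two non-null components explains them better than the null does.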
PMCID: PMC4042241  PMID: 24564682
2.  Alt Event Finder: a tool for extracting alternative splicing events from RNA-seq data 
BMC Genomics  2012;13(Suppl 8):S10.
Alternative splicing increases proteome diversity by expressing multiple gene isoforms that often differ in function. Identifying alternative splicing events from RNA-seq experiments is important for understanding the diversity of transcripts and for investigating the regulation of splicing.
We developed Alt Event Finder, a tool for identifying novel splicing events using transcript annotation derived from genome-guided transcript reconstruction tools, such as Cufflinks and Scripture. With a proper combination of alignment and transcript reconstruction tools, Alt Event Finder is capable of identifying novel splicing events in the human genome. We further applied Alt Event Finder to a set of RNA-seq data from rat liver tissues and identified dozens of novel cassette exon events whose splicing patterns changed after extensive alcohol exposure.
Alt Event Finder is capable of identifying de novo splicing events from data-driven transcript annotation, and is a useful tool for studying splicing regulation.
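One way to read cassette exons off a data-driven annotation is sketched below. It assumes each reconstructed transcript of a gene (for example, parsed from a Cufflinks or Scripture GTF file; parsing not shown) is a list of (start, end) exon coordinates on one strand, and it calls a middle exon a cassette exon when one isoform contains it between an upstream and a downstream exon while another isoform joins those same two flanking exons directly. The helper is hypothetical, not Alt Event Finder's implementation, and it requires exact coordinate matches for the flanking exons.

```python
from typing import Dict, List, Set, Tuple

Exon = Tuple[int, int]  # (start, end) genomic coordinates of one exon

def find_cassette_exons(transcripts: Dict[str, List[Exon]]) -> Set[Tuple[Exon, Exon, Exon]]:
    """Return (upstream, cassette, downstream) exon triples supported by the isoforms."""
    triples = set()    # consecutive exon triples seen in some isoform
    junctions = set()  # consecutive exon pairs (splice junctions) seen in some isoform
    for exons in transcripts.values():
        ordered = sorted(exons)
        for i in range(len(ordered) - 1):
            junctions.add((ordered[i], ordered[i + 1]))
        for i in range(len(ordered) - 2):
            triples.add((ordered[i], ordered[i + 1], ordered[i + 2]))
    # The middle exon is skippable if some isoform joins its two flanks directly.
    return {t for t in triples if (t[0], t[2]) in junctions}
```

In practice, reconstructed isoforms often disagree slightly at exon boundaries, so a real tool would need to tolerate small coordinate differences in the flanking exons.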
PMCID: PMC3535697  PMID: 23281921
3.  A modulator based regulatory network for ERα signaling pathway 
BMC Genomics  2012;13(Suppl 6):S6.
Estrogens control multiple functions of hormone-responsive breast cancer cells. They regulate diverse physiological processes in various tissues through genomic and non-genomic mechanisms that result in activation or repression of gene expression. Transcription regulation upon estrogen stimulation is a critical biological process underlying the onset and progression of the majority of breast cancers. ERα requires distinct co-regulators or modulators for efficient transcriptional regulation, and together they form a regulatory network. Knowing this regulatory network will enable systematic study of the effect of ERα on breast cancer.
To investigate the regulatory network of ERα and discover novel modulators of ERα functions, we proposed an analytical method based on a linear regression model to identify translational modulators and their network relationships. In the network analysis, a group of specific modulator and target genes was selected according to modulator function and ERα binding. The network formed from target genes with ERα binding is called the ERα genomic regulatory network, while the network formed from target genes without ERα binding is called the ERα non-genomic regulatory network. Considering the active or repressive function of ERα, the active or repressive function of a modulator, and the agonist or antagonist effect of a modulator on ERα, the ERα/modulator/target relationships were categorized into 27 classes.
Using gene expression data and ERα ChIP-seq data from the MCF-7 cell line, the ERα genomic and non-genomic regulatory networks were built by merging ERα/modulator/target triplets (TF, M, T), where TF refers to ERα, M to the modulator, and T to the target. Comparing the two networks, the ERα non-genomic network has a lower FDR than the genomic network. To validate these two networks, the same network analysis was performed on gene expression data from the ZR-75.1 cell line. The network overlap analysis between the two cancer cell lines showed 1% overlap for the ERα genomic regulatory network but 4% overlap for the non-genomic regulatory network.
We proposed a novel approach to infer ERα/modulator/target relationships and to construct the genomic and non-genomic regulatory networks in two cancer cell lines. We found that the non-genomic regulatory network is more reliable than the genomic regulatory network.
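The abstract does not spell out the regression model. One common way to test whether a modulator M changes the effect of a transcription factor on a target T is a linear model with an interaction term, T = b0 + b1·TF + b2·M + b3·TF·M, where a non-zero b3 suggests modulation. The sketch below fits that model by ordinary least squares under this assumption; the function names are hypothetical, and significance testing of b3 and the mapping onto the 27 relationship classes are not shown.

```python
from typing import List, Sequence

def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_triplet(tf: Sequence[float], mod: Sequence[float],
                target: Sequence[float]) -> List[float]:
    """Least-squares [b0, b1, b2, b3] for target = b0 + b1*TF + b2*M + b3*TF*M."""
    rows = [[1.0, t, m, t * m] for t, m in zip(tf, mod)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(4)]
    return solve(xtx, xty)  # normal equations (X^T X) b = X^T y
```

Under this formulation, the sign of b3 relative to b1 is one plausible way to read whether a modulator strengthens or dampens ERα's effect on a target.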
PMCID: PMC3481450  PMID: 23134758
4.  Changes in predicted protein disorder tendency may contribute to disease risk 
BMC Genomics  2011;12(Suppl 5):S2.
Recent studies suggest that many proteins, or regions of proteins, lack a stable 3D structure. These intrinsically disordered proteins and peptides are functionally important. Recent advances in next-generation sequencing technologies enable genome-wide identification of novel nucleotide variations in a specific population or cohort.
Using the exonic single nucleotide variations (SNVs) identified in the 1000 Genomes Project and distributed by Genetic Analysis Workshop 17, we systematically analysed the genetic and predicted disorder-potential features of the non-synonymous variations. The results suggest that a significant change, caused by SNVs, in the tendency of a protein region to be structured or disordered may lead to malfunction of the protein and contribute to disease risk.
After validation with functional SNVs on the traits distributed by GAW17, we conclude that it is valuable to consider structure/disorder tendencies while prioritizing and predicting mechanistic effects arising from novel genetic variations.
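The disorder predictor used in the study is not named in the abstract, so the sketch below substitutes a deliberately crude, composition-based proxy: within a window around the variant, it scores the fraction of disorder-promoting residues (A, R, G, Q, S, P, E, K) minus the fraction of order-promoting residues (W, C, F, I, Y, V, L, N), a commonly cited grouping, and reports how a missense substitution shifts that score. The window size and function names are assumptions; a real analysis would use a dedicated disorder predictor.

```python
DISORDER_PROMOTING = set("ARGQSPEK")
ORDER_PROMOTING = set("WCFIYVLN")

def window_propensity(seq: str, center: int, half_width: int = 10) -> float:
    """(disorder-promoting minus order-promoting) residue fraction around center."""
    lo, hi = max(0, center - half_width), min(len(seq), center + half_width + 1)
    window = seq[lo:hi]
    score = sum((aa in DISORDER_PROMOTING) - (aa in ORDER_PROMOTING) for aa in window)
    return score / len(window)

def disorder_shift(seq: str, pos: int, alt: str, half_width: int = 10) -> float:
    """Change in window propensity when residue `alt` replaces seq[pos] (0-based)."""
    mutant = seq[:pos] + alt + seq[pos + 1:]
    return window_propensity(mutant, pos, half_width) - window_propensity(seq, pos, half_width)
```

A negative shift means the variant pushes the local region toward order, a positive one toward disorder; substitutions within the same class leave this proxy unchanged.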
PMCID: PMC3287498  PMID: 22369681
5.  Predicting sequence and structural specificities of RNA binding regions recognized by splicing factor SRSF1 
BMC Genomics  2011;12(Suppl 5):S8.
RNA-binding proteins (RBPs) play diverse roles in eukaryotic RNA processing. Despite their pervasive functions in coding and noncoding RNA biogenesis and regulation, elucidating the sequence specificities that define protein-RNA interactions remains a major challenge. Recently, CLIP-seq (cross-linking immunoprecipitation followed by high-throughput sequencing) has been successfully used to study the transcriptome-wide binding patterns of the SRSF1, PTBP1, NOVA and Fox2 proteins. These studies either adopted traditional methods like Multiple EM for Motif Elicitation (MEME) to discover the sequence consensus of RBP binding sites or used Z-score statistics to search for overrepresented nucleotides of a certain size. We argue that most of these methods are not well suited for RNA motif identification, as they cannot incorporate the RNA structural context of protein-RNA interactions, which may affect binding specificity. Here, we describe a novel model-based approach, RNAMotifModeler, to identify the consensus of protein-RNA binding regions by integrating sequence features and RNA secondary structures.
As an example, we applied RNAMotifModeler to SRSF1 (SF2/ASF) CLIP-seq data. The sequence-structural consensus we identified is a purine-rich octamer 'AGAAGAAG' in a highly single-stranded RNA context. The unpaired probabilities (the probabilities that bases do not form pairs) at the binding sites are significantly higher than in negative controls and in the flanking sequence surrounding the binding site, indicating that SRSF1 tends to bind single-stranded RNA. Further statistical evaluations revealed that the second and fifth bases of the SRSF1 octamer motif have much stronger sequence specificities but weaker single-strandedness, while the third, fourth, sixth and seventh bases are far more likely to be single-stranded but have more degenerate sequence specificities. Therefore, we hypothesize that nucleotide specificity and secondary structure play complementary roles during binding site recognition by SRSF1.
In this study, we presented a computational model to predict the sequence consensus and optimal RNA secondary structure of protein-RNA binding regions. The successful application to SRSF1 CLIP-seq data demonstrates its potential to improve our understanding of the binding specificity of RNA binding proteins.
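The sequence-plus-structure idea can be illustrated by scanning for the reported octamer and averaging per-base unpaired probabilities at each motif position and in the flanks. The sketch assumes the unpaired probabilities have already been computed for each CLIP sequence (for example, from the base-pairing probabilities of an RNA folding tool) and are passed in as a list; it does not reproduce RNAMotifModeler's model, and the function names and 10-nt flank are hypothetical.

```python
from typing import List, Sequence

def motif_positions(seq: str, motif: str) -> List[int]:
    """0-based start positions of (possibly overlapping) exact motif matches."""
    return [i for i in range(len(seq) - len(motif) + 1) if seq[i:i + len(motif)] == motif]

def unpaired_profile(seq: str, unpaired: Sequence[float], motif: str) -> List[float]:
    """Mean unpaired probability at each motif position, averaged over all matches."""
    starts = motif_positions(seq, motif)
    if not starts:
        return []
    return [sum(unpaired[s + k] for s in starts) / len(starts) for k in range(len(motif))]

def flank_mean(unpaired: Sequence[float], starts: Sequence[int],
               motif_len: int, flank: int = 10) -> float:
    """Mean unpaired probability in the flanks on either side of each match."""
    vals = []
    for s in starts:
        left = range(max(0, s - flank), s)
        right = range(s + motif_len, min(len(unpaired), s + motif_len + flank))
        vals.extend(unpaired[i] for i in list(left) + list(right))
    return sum(vals) / len(vals) if vals else float("nan")
```

Comparing the per-position profile with the flank mean mirrors, in a very simplified form, the comparison of binding sites with their surrounding sequence described above.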
PMCID: PMC3287504  PMID: 22369183
6.  Alteration of gene expression by alcohol exposure at early neurulation 
BMC Genomics  2011;12:124.
We have previously demonstrated that alcohol exposure at early neurulation induces growth retardation, neural tube abnormalities, and alteration of DNA methylation. To explore the global gene expression changes that may underlie these developmental defects, microarray analyses were performed in a whole embryo mouse culture model that allows control over alcohol and embryonic variables.
Alcohol caused teratogenesis in the brain, heart, forelimb, and optic vesicle; a subset of the embryos also showed cranial neural tube defects. In microarray analysis (accession number GSM9545), using hypothesis-driven Gene Set Enrichment Analysis (GSEA) and intersection analysis of two independent experiments, we found a collective reduction in expression of neural specification genes (neurogenin, Sox5, Bhlhe22) and neural growth factor genes [Igf1, Efemp1, Klf10 (Tieg), and Edil3], and alteration of genes involved in cell growth, apoptosis, histone variants, and eye and heart development. There was also a reduction of retinol binding protein 1 (Rbp1), and de novo expression of aldehyde dehydrogenase 1B1 (Aldh1B1). Remarkably, four key hematopoiesis genes (glycophorin A, adducin 2, beta-2 microglobulin, and ceruloplasmin) were absent after alcohol treatment, and histone variant genes were reduced. The down-regulation of the neurospecification and neurotrophic genes was further confirmed by quantitative RT-PCR. Furthermore, the gene expression profile showed distinct subgroups corresponding to two distinct alcohol-related neural tube phenotypes: an open (ALC-NTO) and a closed neural tube (ALC-NTC). The epidermal growth factor signaling pathway and histone variants were specifically altered in ALC-NTO, and a greater number of neurotrophic/growth factor genes were down-regulated in ALC-NTO than in ALC-NTC embryos.
This study revealed a set of genes vulnerable to alcohol exposure and genes that were associated with neural tube defects during early neurulation.
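For readers unfamiliar with GSEA, the sketch below computes the weighted running-sum enrichment score of Subramanian et al. (2005) with weight exponent p = 1: walking down genes ranked by a statistic such as the alcohol-versus-control change, the sum rises at gene-set members and falls at non-members, and the score is the largest deviation from zero. A negative score means the set is concentrated at the bottom of the ranking, i.e. collectively reduced. This is not the authors' pipeline; permutation-based significance and normalization are omitted, and any gene scores used with it here are hypothetical.

```python
from typing import Dict, Set

def enrichment_score(scores: Dict[str, float], gene_set: Set[str], p: float = 1.0) -> float:
    """Weighted Kolmogorov-Smirnov-like running-sum statistic used by GSEA."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits = [g for g in ranked if g in gene_set]
    if not hits or len(hits) == len(ranked):
        raise ValueError("gene set must overlap the ranked list but not cover all of it")
    n_r = sum(abs(scores[g]) ** p for g in hits)       # normalizer for hit steps
    miss_step = 1.0 / (len(ranked) - len(hits))        # equal step for each miss
    running, best = 0.0, 0.0
    for g in ranked:
        running += abs(scores[g]) ** p / n_r if g in gene_set else -miss_step
        if abs(running) > abs(best):
            best = running  # keep the signed maximum deviation from zero
    return best
```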
PMCID: PMC3056799  PMID: 21338521
7.  2K09 and thereafter: the coming era of integrative bioinformatics, systems biology and intelligent computing for functional genomics and personalized medicine research 
BMC Genomics  2010;11(Suppl 3):I1.
Significant interest exists in establishing synergistic research in bioinformatics, systems biology and intelligent computing. Supported by the United States National Science Foundation (NSF), the International Society of Intelligent Biological Medicine (ISIBM), the International Journal of Computational Biology and Drug Design (IJCBDD) and the International Journal of Functional Informatics and Personalized Medicine, the ISIBM International Joint Conferences on Bioinformatics, Systems Biology and Intelligent Computing (ISIBM IJCBS 2009) attracted more than 300 papers and 400 researchers and medical doctors worldwide. It was the only inter/multidisciplinary conference aimed at promoting synergistic research and education in bioinformatics, systems biology and intelligent computing. The conference committee was very grateful for the valuable advice and suggestions from honorary chairs, steering committee members and scientific leaders including Dr. Michael S. Waterman (USC, Member of United States National Academy of Sciences), Dr. Chih-Ming Ho (UCLA, Member of United States National Academy of Engineering and Academician of Academia Sinica), Dr. Wing H. Wong (Stanford, Member of United States National Academy of Sciences), Dr. Ruzena Bajcsy (UC Berkeley, Member of United States National Academy of Engineering and Member of United States Institute of Medicine of the National Academies), Dr. Mary Qu Yang (United States National Institutes of Health and Oak Ridge, DOE), Dr. Andrzej Niemierko (Harvard), Dr. A. Keith Dunker (Indiana), Dr. Brian D. Athey (Michigan), Dr. Weida Tong (FDA, United States Department of Health and Human Services), Dr. Cathy H. Wu (Georgetown), Dr. Dong Xu (Missouri), Drs. Arif Ghafoor and Okan K Ersoy (Purdue), Dr. Mark Borodovsky (Georgia Tech, President of ISIBM), Dr. Hamid R. Arabnia (UGA, Vice-President of ISIBM), and other scientific leaders. The committee presented the 2009 ISIBM Outstanding Achievement Awards to Dr. Joydeep Ghosh (UT Austin), Dr. Aidong Zhang (Buffalo) and Dr. Zhi-Hua Zhou (Nanjing) for their significant contributions to the field of intelligent biological medicine.
PMCID: PMC2999338  PMID: 21143775
8.  Genome-wide prediction of cis-acting RNA elements regulating tissue-specific pre-mRNA alternative splicing 
BMC Genomics  2009;10(Suppl 1):S4.
Human genes undergo various patterns of pre-mRNA splicing across different tissues. Such variation is primarily regulated by trans-acting factors that bind to exonic and intronic cis-acting RNA elements (CAEs). Here we report a computational method to mechanistically identify cis-acting RNA elements that contribute to tissue-specific alternative splicing patterns. This method is an extension of our previous model, SplicingModeler, which predicts the significant CAEs that contribute to the splicing differences between two tissues. In this study, we introduce a tissue-specific functional-level estimation step, which allows the regulatory functions of predicted CAEs to be evaluated across more than two tissues.
Using a publicly available Affymetrix Genechip® Human Exon Array dataset, our method identifies 652 CAEs across 11 human tissues. About one third of the predicted CAEs can be mapped to known RBP (RNA binding protein) binding sites or match entries in other databases of predicted exonic splicing regulators. Interestingly, the vast majority of predicted CAEs are in intronic regulatory regions. A noticeable exception is that many exonic elements are found to regulate the alternative splicing between cerebellum and testes. Most identified elements contribute to the alternative splicing between two tissues, while some are important in multiple tissues. This suggests that genome-wide alternative splicing patterns are regulated by a combination of tissue-specific cis-acting elements and "general elements" whose functional activities are important but differ across multiple tissues.
In this study, we present a model-based computational approach to identify potential cis-acting RNA elements by treating exon splicing variation as the combinatorial effect of multiple cis-acting regulators. This methodology provides a novel evaluation of the functional levels of cis-acting RNA elements by estimating their tissue-specific functions in various tissues.
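The general idea of treating an exon's splicing difference between two tissues as a combination of contributions from cis-acting elements can be sketched as follows: count each candidate element in the regulatory region of every exon to build a design matrix, then ask which single element's counts best explain the observed differences by least squares through the origin. This is a hypothetical, greedy simplification for illustration, not SplicingModeler's estimation procedure; the function names, region sequences, candidate elements and splicing-difference values are placeholders.

```python
from typing import List, Optional, Sequence, Tuple

def count_occurrences(region: str, element: str) -> int:
    """Count (possibly overlapping) occurrences of a candidate element in a region."""
    return sum(region.startswith(element, i) for i in range(len(region) - len(element) + 1))

def design_matrix(regions: Sequence[str], elements: Sequence[str]) -> List[List[int]]:
    """Rows are exons (their regulatory regions); columns are candidate elements."""
    return [[count_occurrences(r, e) for e in elements] for r in regions]

def best_single_element(x: List[List[int]], y: Sequence[float],
                        elements: Sequence[str]) -> Optional[Tuple[str, float, float]]:
    """Element whose counts best fit y = beta * count; returns (element, beta, RSS)."""
    best = None
    for j, element in enumerate(elements):
        col = [row[j] for row in x]
        sxx = sum(c * c for c in col)
        if sxx == 0:
            continue  # element never occurs, so it cannot explain anything
        beta = sum(c * v for c, v in zip(col, y)) / sxx
        rss = sum((v - beta * c) ** 2 for c, v in zip(col, y))
        if best is None or rss < best[2]:
            best = (element, beta, rss)
    return best
```

A full combinatorial model would fit several elements jointly (for example, by repeating the selection on residuals), with the sign of each coefficient read as an enhancing or silencing effect.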
PMCID: PMC2709265  PMID: 19594881
9.  Reconstruct gene regulatory network using slice pattern model 
BMC Genomics  2009;10(Suppl 1):S2.
Gene expression time series array data have become a useful resource for investigating gene functions and the interactions between genes. However, gene expression array data are always noisy, and many linear models omit nonlinear regulatory relationships. Because of these practical limitations, inference of gene regulatory models from expression data is still far from satisfactory.
In this study, we present a model-based computational approach, Slice Pattern Model (SPM), to identify gene regulatory networks from time series gene expression array data. To assess the stability and reliability of our model, an artificial gene network was analyzed with both a traditional linear model and SPM. SPM can handle multiple transcriptional time lags and reconstructs the gene network more accurately. SPM was then applied to a 17-point time series of yeast cell-cycle gene expression data to reconstruct the regulatory network. At a reliability threshold of θ = 55%, 18 relationships between genes were identified and the transcriptional regulatory network was reconstructed. Results from previous studies support most of the gene relationships identified by SPM.
With the help of pattern recognition and similarity analysis, the SPM method limits the effect of noise. In addition, a genetic algorithm, based on a statistical method in our experiments, is used to optimize the parameters of the gene network model. The experimental results demonstrate that the gene regulatory model reconstructed using SPM is more stable and reliable than models reconstructed with the traditional linear model.
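Handling transcriptional time lags can be illustrated with a much simpler technique than SPM: shift the target series against the regulator series and keep the lag at which their Pearson correlation is strongest. The sketch below does only that; it does not implement slice patterns, the reliability threshold θ, or the genetic algorithm, and the function names and the max_lag and min_overlap parameters are assumptions.

```python
import math
from typing import Optional, Sequence, Tuple

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def best_lag(regulator: Sequence[float], target: Sequence[float], max_lag: int = 3,
             min_overlap: int = 4) -> Optional[Tuple[int, float]]:
    """Lag (in time points) at which regulator[t] best correlates with target[t + lag]."""
    best = None
    for lag in range(max_lag + 1):
        x = regulator[:len(regulator) - lag]
        y = target[lag:]
        if len(x) < min_overlap:
            break  # too few overlapping time points to trust a correlation
        r = pearson(x, y)
        if best is None or abs(r) > abs(best[1]):
            best = (lag, r)
    return best
```

Strong negative correlations are kept as well, since a repressor would be expected to anti-correlate with its target after the lag.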
PMCID: PMC2709263  PMID: 19594879
10.  Transcription factor and microRNA regulation in androgen-dependent and -independent prostate cancer cells 
BMC Genomics  2008;9(Suppl 2):S22.
Prostate cancer is one of the leading causes of cancer death in men. Androgen ablation, the most commonly used therapy for progressive prostate cancer, is ineffective once the cancer cells become androgen-independent. The regulatory mechanisms that cause this transition (from androgen-dependent to androgen-independent) remain unknown. In this study, based on microarray data comparing global gene expression patterns in prostate tissue between androgen-dependent and -independent prostate cancer patients, we identify a set of transcription factors and microRNAs that potentially cause this difference, using a model-based computational approach.
From 335 position weight matrices in the TRANSFAC database and 564 microRNAs in the microRNA registry, our model identifies 5 transcription factors and 7 microRNAs as potentially responsible for the level of androgen dependency. All 5 transcription factors are predicted to inhibit transcription in androgen-independent samples compared with androgen-dependent ones. Six of the 7 microRNAs, however, showed stimulatory effects. We also find that the expression levels of three predicted transcription factors, AP-1, STAT3 (signal transducer and activator of transcription 3), and DBP (albumin D-box), are significantly different between androgen-dependent and -independent patients. In addition, microRNA microarray data from other studies confirm that several predicted microRNAs, including miR-21, miR-135a, and miR-135b, are differentially expressed in prostate cancer cells compared with normal tissues.
We present a model-based computational approach to identify transcription factors and microRNAs influencing the progression of androgen-dependent prostate cancer to androgen-independent prostate cancer. This result suggests that the capability of transcription factors to initiate transcription and of microRNAs to facilitate mRNA degradation are both decreased in androgen-independent prostate cancer. The proposed model-based approach indicates that considering the combinatorial effects of transcription factors and microRNAs in a unified model provides additional insight into the transcriptional and post-transcriptional regulation of global gene expression in prostate cancers with different hormone dependency.
PMCID: PMC2559887  PMID: 18831788
11.  A Poisson mixture model to identify changes in RNA polymerase II binding quantity using high-throughput sequencing technology 
BMC Genomics  2008;9(Suppl 2):S23.
We present a mixture model-based analysis for identifying differences in the distribution of RNA polymerase II (Pol II) in transcribed regions, measured using ChIP-seq (chromatin immunoprecipitation followed by massively parallel sequencing). The statistical model assumes that the number of Pol II-targeted sequences contained within each genomic region follows a Poisson distribution. A Poisson mixture model was then developed to detect Pol II binding changes in transcribed regions, using an empirical approach and an expectation-maximization (EM) algorithm for estimation and inference. To reach a global maximum in the M-step, particle swarm optimization (PSO) was implemented. We applied this model to Pol II binding data generated from hormone-dependent MCF7 breast cancer cells and antiestrogen-resistant MCF7 breast cancer cells before and after treatment with 17β-estradiol (E2). We determined that in the hormone-dependent cells, ~9.9% (2527) of genes showed significant changes in Pol II binding after E2 treatment. However, only ~0.7% (172) of genes displayed significant Pol II binding changes in E2-treated antiestrogen-resistant cells. These results show that a Poisson mixture model can be used to analyze ChIP-seq data.
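A plain two-component Poisson mixture fitted by EM shows the basic machinery: the E-step computes each region's posterior membership, and the M-step updates the mixing weight and the two rates. For this simple model the M-step has a closed form, so no particle swarm optimization is needed; the paper's model, which targets binding changes between conditions, is richer than this. The function names, starting values, iteration limit, tolerance and the small EPS floor are assumptions.

```python
import math
from typing import Sequence, Tuple

EPS = 1e-12  # floor that keeps rates and weights away from log(0)

def poisson_logpmf(k: int, lam: float) -> float:
    """log P(K = k) for a Poisson distribution with rate lam > 0."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def poisson_mixture_em(counts: Sequence[int], lams: Tuple[float, float] = (1.0, 10.0),
                       weight: float = 0.5, n_iter: int = 500,
                       tol: float = 1e-10) -> Tuple[float, float, float]:
    """Fit weight * Pois(lam0) + (1 - weight) * Pois(lam1); returns (weight, lam0, lam1)."""
    lam0, lam1 = lams
    n = len(counts)
    for _ in range(n_iter):
        # E-step: posterior probability that each count came from component 0.
        resp = []
        for k in counts:
            a = math.log(weight) + poisson_logpmf(k, lam0)
            b = math.log(1.0 - weight) + poisson_logpmf(k, lam1)
            m = max(a, b)  # shift before exponentiating to avoid underflow
            ea, eb = math.exp(a - m), math.exp(b - m)
            resp.append(ea / (ea + eb))
        # M-step: closed-form updates for the mixing weight and both rates.
        r0 = sum(resp)
        r1 = n - r0
        new_w = min(max(r0 / n, EPS), 1.0 - EPS)
        new_l0 = max(sum(r * k for r, k in zip(resp, counts)) / max(r0, EPS), EPS)
        new_l1 = max(sum((1.0 - r) * k for r, k in zip(resp, counts)) / max(r1, EPS), EPS)
        shift = abs(new_w - weight) + abs(new_l0 - lam0) + abs(new_l1 - lam1)
        weight, lam0, lam1 = new_w, new_l0, new_l1
        if shift < tol:
            break
    return weight, lam0, lam1
```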
PMCID: PMC2559888  PMID: 18831789
12.  Using RNase sequence specificity to refine the identification of RNA-protein binding regions 
BMC Genomics  2008;9(Suppl 1):S17.
Massively parallel pyrosequencing is a high-throughput technology that can sequence hundreds of thousands of DNA/RNA fragments in a single experiment. Combining it with immunoprecipitation-based biochemical assays, such as cross-linking immunoprecipitation (CLIP), provides a genome-wide method to detect the sites at which proteins bind DNA or RNA. In a CLIP-pyrosequencing experiment, the resolution of the detected protein binding regions is partly determined by the length of the detected RNA fragments (CLIP amplicons) after trimming by RNase digestion. The lengths of these fragments usually range from 50 to 70 nucleotides. Many genomic regions are marked by multiple RNA fragments. In this paper, we report an empirical approach to refine the localization of protein binding regions by using the distribution pattern of the detected RNA fragments and the sequence specificity of RNase digestion. We present two regions to which multiple amplicons map as examples to demonstrate this approach.
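The refinement idea can be sketched in two parts: find the interval covered by the most overlapping fragments, then check whether each fragment's ends agree with the RNase's cleavage specificity. The abstract does not name the enzyme, so the sketch assumes cleavage 3' of G residues (the specificity of RNase T1) and half-open [start, end) coordinates; the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the reference sequence

def max_coverage_region(fragments: List[Interval]) -> Optional[Tuple[Interval, int]]:
    """Leftmost interval covered by the largest number of fragments, with that depth."""
    events = []
    for s, e in fragments:
        events.append((s, 1))
        events.append((e, -1))
    events.sort()  # at equal positions, ends (-1) sort before starts (+1)
    depth, best_depth, best = 0, 0, None
    for i, (pos, delta) in enumerate(events):
        depth += delta
        if depth > best_depth and i + 1 < len(events):
            best_depth, best = depth, (pos, events[i + 1][0])
    return (best, best_depth) if best else None

def consistent_with_cut(ref: str, frag: Interval, cut_after: str = "G") -> bool:
    """True if the base before the fragment and its last base are both cleavable."""
    s, e = frag
    return (s == 0 or ref[s - 1] == cut_after) and ref[e - 1] == cut_after
```

Fragments whose ends are inconsistent with the assumed cleavage rule could then be down-weighted before recomputing the core region.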
PMCID: PMC2386059  PMID: 18366606
13.  Identification of transcription factor and microRNA binding sites in responsible to fetal alcohol syndrome 
BMC Genomics  2008;9(Suppl 1):S19.
This is the first report using our MotifModeler informatics program to simultaneously identify transcription factor (TF) and microRNA (miRNA) binding sites from gene expression microarray data. Based on the assumption that gene expression is controlled by the combinatorial effects of transcription factors binding in the 5'-upstream regulatory region and miRNAs binding in the 3'-untranslated region (3'-UTR), we developed a model for (1) predicting the most influential cis-acting elements under a given biological condition, and (2) estimating the effects of those elements on gene expression levels. The regulatory regions, TFs and miRNAs that mediate differential gene expression in fetal alcohol syndrome were unknown; microarray data from an alcohol exposure paradigm were used. The model predicted strong inhibitory effects of 5' cis-acting elements and stimulatory effects of 3'-UTR elements under alcohol treatment. For the first time, the predictive model yielded a key hypothesis: a novel role for miRNAs in the gene expression changes associated with abnormal mouse embryo development after alcohol exposure. This suggests that disturbance of miRNA functions may contribute to alcohol-induced developmental deficiencies.
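The 3'-UTR side of such a model needs miRNA binding-site features. A minimal sketch, assuming sites are the canonical 7-nt matches complementary to miRNA nucleotides 2 to 8 (the seed region), searches a DNA-alphabet 3'-UTR for the reverse complement of that stretch. The function names are hypothetical, MotifModeler's actual site definition is not stated in the abstract, and the let-7a mature sequence used below is only an illustration.

```python
from typing import List

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}

def seed_site(mirna: str, start: int = 2, end: int = 8) -> str:
    """DNA target site complementary to miRNA nucleotides start..end (1-based, inclusive)."""
    seed = mirna.upper()[start - 1:end]
    return "".join(COMPLEMENT[b] for b in reversed(seed))

def find_sites(utr: str, mirna: str) -> List[int]:
    """0-based positions of 7-nt seed-match sites (nt 2-8) in a 3'-UTR."""
    site = seed_site(mirna)
    utr = utr.upper().replace("U", "T")  # accept RNA or DNA input
    return [i for i in range(len(utr) - len(site) + 1) if utr[i:i + len(site)] == site]
```

For let-7a (UGAGGUAGUAGGUUGUAUAGUU) the site is CTACCTC, so find_sites("AAACTACCTCAA", "UGAGGUAGUAGGUUGUAUAGUU") returns [3]. Counting such sites per gene gives the miRNA columns that sit alongside 5' transcription factor site counts in a combinatorial model.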
PMCID: PMC2386061  PMID: 18366608
14.  Artificial ants deposit pheromone to search for regulatory DNA elements 
BMC Genomics  2006;7:221.
Identification of transcription-factor binding motifs (DNA sequences) can be formulated as a combinatorial problem, for which an efficient algorithm is indispensable to predict the role of multiple binding motifs. An ant algorithm is a biology-inspired computational technique that solves a combinatorial problem by mimicking the behavior of social insects such as ants. We developed a unique version of ant algorithms to select a set of binding motifs by considering the potential contribution of every random DNA sequence 4 to 7 bp in length.
Human chondrogenesis was used as a model system. The results revealed that the ant algorithm was able to identify binding motifs known to be involved in chondrogenesis, such as AP-1, NFκB, and sox9. Some of the predicted motifs were identical to those previously derived with the genetic algorithm. Unlike the genetic algorithm, however, the ant algorithm was able to evaluate the contribution of individual binding motifs as a spectrum of distributed information and to predict core consensus motifs from a wider DNA pool.
The ant algorithm offers an efficient, reproducible procedure to predict the role of individual transcription-factor binding motifs using a unique definition of artificial ants.
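A generic ant-colony search over motif subsets shows the pheromone mechanism: each ant builds a set of k motifs, choosing each one with probability proportional to its pheromone level; pheromone then evaporates at rate rho, and each ant deposits an amount proportional to its set's score. This is a textbook-style sketch, not the authors' unique version; the function name, score function, k, colony size and evaporation rate are assumptions, and a real score would measure how well the selected 4- to 7-bp motifs explain expression data.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

def ant_select(candidates: Sequence[str], score: Callable[[List[str]], float], k: int = 3,
               n_ants: int = 20, n_iter: int = 50, rho: float = 0.1,
               seed: int = 0) -> Tuple[List[str], float, Dict[str, float]]:
    """Ant-colony search for a size-k subset of candidates that maximizes score."""
    rng = random.Random(seed)
    tau = {c: 1.0 for c in candidates}  # pheromone level per candidate motif
    best, best_score = [], float("-inf")
    for _ in range(n_iter):
        tours = []
        for _ in range(n_ants):
            pool, chosen = list(candidates), []
            for _ in range(k):
                # Pick the next motif with probability proportional to its pheromone.
                pick = rng.choices(pool, weights=[tau[c] for c in pool])[0]
                pool.remove(pick)
                chosen.append(pick)
            s = score(chosen)
            tours.append((s, chosen))
            if s > best_score:
                best, best_score = sorted(chosen), s
        # Evaporation, then deposit proportional to each ant's (non-negative) score.
        for c in tau:
            tau[c] *= 1.0 - rho
        for s, chosen in tours:
            for c in chosen:
                tau[c] += max(s, 0.0) / n_ants
    return best, best_score, tau
```

Because pheromone accumulates on motifs that repeatedly appear in high-scoring sets, the final tau values give a graded ranking of individual motifs, loosely analogous to the spectrum of distributed information the abstract describes.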
PMCID: PMC1586019  PMID: 16942615
