Results 1-16 (16)
1.  Transcriptional regulation and spatial interactions of head-to-head genes 
BMC Genomics  2014;15(1):519.
In eukaryotic genomes, about 10% of genes are arranged in a head-to-head (H2H) orientation, and the distance between the transcription start sites of each gene pair is less than 1 kb. Two genes in an H2H pair are prone to co-express and co-function. There have been many studies on bidirectional promoters. However, the mechanism by which H2H genes are regulated at the transcriptional level still needs further clarification, especially with regard to the co-regulation of H2H pairs. In this study, we first used the Hi-C data of chromatin linkages to identify spatially interacting H2H pairs, and then integrated ChIP-seq data to compare H2H gene pairs with and without evidence of spatial interactions in terms of their binding transcription factors (TFs). Using ChIP-seq and DNase-seq data, histones and DNase associated with H2H pairs were identified. Furthermore, we looked into the connections between H2H genes in a human co-expression network.
We found that i) similar to the behaviour of two genes within an H2H pair (intra-H2H pair), a gene pair involving two distinct H2H pairs (inter-H2H pair) that interact with each other spatially shares common transcription factors (TFs); ii) TFs of intra- and inter-H2H pairs are distributed differently. Factors such as HEY1, GABP, Sin3Ak-20, POL2, E2F6, and c-MYC are essential for the bidirectional transcription of intra-H2H pairs, while factors like CTCF, BDP1, GATA2, RAD21, and POL3 play important roles in coherently regulating inter-H2H pairs; iii) H2H gene blocks are enriched with hypersensitive DNase and modified histones, which participate in active transcription; and iv) H2H genes tend to be highly connected compared with non-H2H genes in the human co-expression network.
Our findings shed new light on the mechanism of the transcriptional regulation of H2H genes through their linear and spatial interactions. For intra-H2H gene pairs, transcription factors regulate transcription through bidirectional promoters, whereas for inter-H2H gene pairs, transcription factors are likely to regulate their activities depending on the spatial interaction of H2H gene pairs. In this way, two distinct groups of transcription factors mediate intra- and inter-H2H gene transcription respectively, resulting in a highly compact gene regulatory network.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-519) contains supplementary material, which is available to authorized users.
PMCID: PMC4089025  PMID: 24962804
2.  Identification of gene fusions from human lung cancer mass spectrometry data 
BMC Genomics  2013;14(Suppl 8):S5.
Tandem mass spectrometry (MS/MS) technology has been applied to identify proteins, as an ultimate approach to confirm the original genome annotation. To be able to identify gene fusion proteins, a special database containing peptides that cross over gene fusion breakpoints is needed.
It is impractical to construct a database that includes all possible fusion peptides originating from potential breakpoints. Focusing on 6259 reported and predicted gene fusion pairs from ChimerDB 2.0 and Cancer Gene Census, we created, for the first time, a database, CanProFu, that comprehensively annotates fusion peptides formed by exon-exon linkage between these pairing genes.
Applying this database to mass spectrometry datasets of 40 human non-small cell lung cancer (NSCLC) samples and 39 normal lung samples with stringent searching criteria, we were able to identify 19 unique fusion peptides characterizing gene fusion events. Among them, 11 gene fusion events were found only in NSCLC samples. In addition, 4 alternative splicing events were characterized in cancerous or normal lung samples.
The database and workflow in this work can be flexibly applied to other MS/MS based human cancer experiments to detect gene fusions as potential disease biomarkers or drug targets.
PMCID: PMC4042237  PMID: 24564548
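The idea of a database of peptides that cross a fusion breakpoint (entry 2 above) can be illustrated with a small Python sketch. This is not the CanProFu construction itself, which the abstract does not detail. It assumes that the two exon sequences are already translated in frame, that digestion follows a simple trypsin rule (cleave after K or R unless followed by P, no missed cleavages), and the function names and toy sequences are hypothetical.

```python
def tryptic_peptides(protein):
    """Full tryptic digest: return (start, end, peptide) tuples, end exclusive."""
    peptides = []
    start = 0
    for i, aa in enumerate(protein):
        last = i == len(protein) - 1
        # Cleave after K/R unless the next residue is P; always close the final peptide.
        if last or (aa in "KR" and protein[i + 1] != "P"):
            peptides.append((start, i + 1, protein[start:i + 1]))
            start = i + 1
    return peptides


def junction_peptides(upstream, downstream):
    """Peptides of the fused protein that contain residues from both partners."""
    fused = upstream + downstream
    junction = len(upstream)  # index of the first downstream residue
    return [pep for s, e, pep in tryptic_peptides(fused) if s < junction < e]


if __name__ == "__main__":
    # Toy in-frame exon translations (hypothetical, for illustration only).
    print(junction_peptides("MAAKGLSE", "TVDRPLLK"))
```

Only peptides that span the junction are diagnostic of the fusion; peptides lying entirely on one side also occur in the unfused parent proteins.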
3.  The de novo sequence origin of two long non-coding genes from an inter-genic region 
BMC Genomics  2013;14(Suppl 8):S6.
The gene Polymorphic derived intron-containing, known as Pldi, is a long non-coding RNA (lncRNA) first discovered in mouse. Although parts of its sequence were reported to be conserved in rat and human, it can only be expressed in mouse testis with a mouse-specific transcription start site. The consensus sequence of Pldi is also part of an antisense transcript AK158810 expressed in a wide range of mouse tissues.
We focused on the sequence origin of Pldi and Ak158810. We demonstrated that their sequence originated from an inter-genic region and is present only in mammals. Transposition events and chromosome rearrangements were involved in the evolution of the ancestral sequence. Moreover, we discovered that high conservation in part of this region was correlated with chromosome rearrangements, CpG demethylation and transcription factor binding motifs. These results demonstrated that multiple factors contributed to the sequence origin of Pldi.
We comprehensively analyzed the sequence origin of the Pldi-Ak158810 loci. We showed that various factors, including rearrangements and transposable elements, contributed to the formation of the sequence.
PMCID: PMC4042238  PMID: 24564579
Overlapping transcripts; Sequence Origin of Pldi and Ak158810 loci; Conserved Element; Substitution rate
4.  Evolutionary relationships between miRNA genes and their activity 
BMC Genomics  2012;13:718.
The emergence of vertebrates is characterized by a strong increase in miRNA families. MicroRNAs interact broadly with many transcripts, and the evolution of such a system is intriguing. However, evolutionary questions concerning the origin of miRNA genes and their subsequent evolution remain unexplained.
In order to systematically understand the evolutionary relationship between miRNA genes and their function, we classified known human miRNAs into eight groups based on their evolutionary ages estimated by the maximum parsimony method. New miRNA genes with new functional sequences accumulated more dynamically in vertebrates than in Drosophila. Different levels of evolutionary selection were observed over miRNA gene sequences with different times of origin. Most genic miRNAs differ from their host genes in time of origin: there is no particular relationship between the age of a miRNA and the age of its host gene, and genic miRNAs are mostly younger than the corresponding host genes. MicroRNAs that originated over different time-scales are often predicted or verified to target the same or overlapping sets of genes, opening the possibility of substantial functional redundancy among miRNAs of different ages. A higher degree of tissue specificity and a lower expression level were found in young miRNAs.
Our data showed that, compared with protein coding genes, miRNA genes are more dynamic in terms of emergence and decay. Evolution patterns are quite different between miRNAs of different ages. MicroRNA activity is under tight control, with well-regulated expression increasing and targeting decreasing over time. Our work calls attention to the study of miRNA activity with a consideration of their origin time.
PMCID: PMC3544654  PMID: 23259970
5.  Differential combinatorial regulatory network analysis related to venous metastasis of hepatocellular carcinoma 
BMC Genomics  2012;13(Suppl 8):S14.
Hepatocellular carcinoma (HCC) is one of the most fatal cancers in the world, and metastasis is a significant cause of the high mortality in patients with HCC. However, the molecular mechanism behind HCC metastasis is not fully understood. Study of regulatory networks may help investigate HCC metastasis by way of systems biology profiling.
By utilizing both sequence information and parallel microRNA (miRNA) and mRNA expression data on the same cohort of HBV-related HCC patients without or with venous metastasis, we constructed combinatorial regulatory networks of non-metastatic and metastatic HCC which contain transcription factor (TF) regulation and miRNA regulation. Differential regulation patterns, classifying marker modules, and key regulatory miRNAs were analyzed by comparing non-metastatic and metastatic networks.
Globally, TFs accounted for the main part of regulation while miRNAs accounted for the minor part. However, miRNAs displayed a more active role in the metastatic network than in the non-metastatic one. Seventeen differential regulatory modules discriminative of the metastatic status were identified as a cumulative-module classifier, which could also distinguish survival time. MiR-16, miR-30a, Let-7e and miR-204 were identified as key miRNA regulators contributing to HCC metastasis.
In this work we demonstrated an integrative approach to conduct differential combinatorial regulatory network analysis in the specific context of venous metastasis of HBV-HCC. Our results proposed possible transcriptional regulatory patterns underlying the different metastatic subgroups of HCC. The workflow in this study can be applied in similar contexts of cancer research and could also be extended to other clinical topics.
PMCID: PMC3535701  PMID: 23282077
6.  Detecting the borders between coding and non-coding DNA regions in prokaryotes based on recursive segmentation and nucleotide doublets statistics 
BMC Genomics  2012;13(Suppl 8):S19.
Detecting the borders between coding and non-coding regions is an essential step in genome annotation. Information entropy measures are useful for describing the signals in a genome sequence. However, the accuracy of previous border-finding methods based on entropic segmentation still needs to be improved.
In this study, we first applied a new recursive entropic segmentation method to DNA sequences to get preliminary significant cuts. A 22-symbol alphabet is used to capture the differential composition of nucleotide doublets and stop codon patterns along three phases in both DNA strands. This process requires no prior training datasets.
Compared with the previous segmentation methods, the experimental results on three bacterial genomes, Rickettsia prowazekii, Borrelia burgdorferi and E. coli, show that our approach improves the accuracy of finding the borders between coding and non-coding regions in DNA sequences.
This paper presents a new segmentation method for prokaryotes based on Jensen-Rényi divergence with a 22-symbol alphabet. For three bacterial genomes, compared with the A12_JR method, our method raised the accuracy of finding the borders between protein coding and non-coding regions in DNA sequences.
PMCID: PMC3535712  PMID: 23282225
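The recursive entropic segmentation summarized in entry 6 above can be sketched in Python as follows. This is a simplified illustration, not the authors' method: it uses the plain 4-letter nucleotide alphabet rather than the 22-symbol alphabet, a Rényi order of 0.5, and a fixed divergence threshold in place of a statistical significance test. The function names, `min_div` and `min_len` are all assumptions made for this example.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # simplification; the abstract describes a 22-symbol alphabet


def renyi_entropy(counts, total, alpha=0.5):
    """Rényi entropy (in bits) of the symbol frequencies in `counts`."""
    if total == 0:
        return 0.0
    s = sum((counts.get(sym, 0) / total) ** alpha for sym in ALPHABET)
    return math.log2(s) / (1.0 - alpha)


def best_cut(seq, alpha=0.5):
    """Return (position, divergence) of the cut maximizing the Jensen-Rényi divergence."""
    n = len(seq)
    whole = Counter(seq)
    h_whole = renyi_entropy(whole, n, alpha)
    left = Counter()
    best_pos, best_div = None, 0.0
    for i in range(1, n):
        left[seq[i - 1]] += 1
        right = whole - left
        # Entropy of the mixture minus the length-weighted entropies of the two parts.
        div = (h_whole
               - (i / n) * renyi_entropy(left, i, alpha)
               - ((n - i) / n) * renyi_entropy(right, n - i, alpha))
        if div > best_div:
            best_pos, best_div = i, div
    return best_pos, best_div


def segment(seq, min_div=0.1, min_len=10, offset=0):
    """Recursively cut `seq`; return sorted cut positions in the original coordinates."""
    if len(seq) < 2 * min_len:
        return []
    pos, div = best_cut(seq)
    if pos is None or div < min_div:  # fixed threshold: an assumption of this sketch
        return []
    return (segment(seq[:pos], min_div, min_len, offset)
            + [offset + pos]
            + segment(seq[pos:], min_div, min_len, offset + pos))


if __name__ == "__main__":
    print(segment("A" * 40 + "G" * 40 + "C" * 40))
```

An order below 1 is chosen because Rényi entropy is concave there, which keeps the divergence non-negative.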
7.  Towards biological characters of interactions between transcription factors and their DNA targets in mammals 
BMC Genomics  2012;13:388.
In the post-genomic era, the study of transcriptional regulation is pivotal to decoding genetic information. Transcription factors (TFs) are central proteins for transcriptional regulation, and interactions between TFs and their DNA targets (TFBSs) are important for the expression of downstream genes. However, the lack of knowledge about interactions between TFs and TFBSs still hinders investigation of the mechanism of transcription.
To expand the knowledge about interactions between TFs and TFBSs, three biological features (sequence feature, structure feature, and evolution feature) were utilized to build TFBS identification models for studying binding preference between TFs and their DNA targets in mammals. Results show that each feature performs fairly well in capturing TFBSs, and the hybrid model combining all three features is more robust for TFBS identification. Subsequently, correspondence between TFs and their TFBSs was investigated to explore interactions among them in mammals. Results indicate that TFs and TFBSs are reciprocal at the sequence, structure, and evolution levels.
Our work demonstrates that, to some extent, TFs and TFBSs have developed a coevolutionary relationship in order to keep their physical binding and maintain their regulatory functions. In summary, our work will help in understanding transcriptional regulation and interpreting the binding mechanism between proteins and DNA.
PMCID: PMC3472306  PMID: 22888987
8.  Human transcriptional interactome of chromatin contribute to gene co-expression 
BMC Genomics  2010;11:704.
The transcriptional interactome of chromatin is one of the important mechanisms in gene transcription regulation. Through chromatin conformation capture and 3D FISH experiments, several cases of chromatin interaction among sequence-distant genes or even inter-chromatin genes were reported. However, at the genomic level, there is still little evidence to support these mechanisms. Recently, based on the Hi-C experiment, a genome-wide picture of chromatin interactions in human cells was presented. It provides useful material for analysing whether the mechanism of transcriptional interactome is common.
The main aim here is to test whether the effect of the transcriptional interactome on gene co-expression exists at the genomic level. While controlling for the effects of transcription factor control similarities (TCS), we tested the correlation between Hi-C interaction and the mutual ranks of gene co-expression rates (provided by COXPRESdb) of intra-chromatin gene pairs. We used 6,084 genes with both TF annotation and co-expression information, and matched them into 273,458 pairs with similar Hi-C interaction ranks in different cell types. The results illustrate that co-expression is strongly associated with chromatin interaction. Further analysis using GO annotation reveals a potential correlation between gene function similarity, Hi-C interaction and co-expression.
According to the results of this research, the intra-chromatin interactome may be related to gene function and associated with co-expression. This study provides evidence for the effect of the transcriptional interactome on transcription regulation.
PMCID: PMC3053592  PMID: 21156067
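Entry 8 above relies on the mutual rank of gene co-expression. A minimal Python sketch of that measure follows, assuming the commonly used definition: the geometric mean of the rank of gene B in gene A's correlation-sorted partner list and the rank of A in B's list. The abstract does not spell out the formula, and the toy correlation values and function name are hypothetical. The TCS control and Hi-C matching steps are not reproduced.

```python
import math


def mutual_rank(corr, a, b):
    """Geometric mean of the two directional co-expression ranks of genes a and b.

    `corr` maps each gene to a dict of correlation values with every other gene.
    Rank 1 is the most strongly correlated partner; lower mutual rank means
    stronger, more reciprocal co-expression.
    """
    def rank(of, within):
        partners = sorted(
            (g for g in corr[within] if g != within),
            key=lambda g: corr[within][g],
            reverse=True,
        )
        return partners.index(of) + 1

    return math.sqrt(rank(b, a) * rank(a, b))


if __name__ == "__main__":
    # Toy symmetric correlation table (hypothetical values).
    corr = {
        "A": {"B": 0.90, "C": 0.50},
        "B": {"A": 0.90, "C": 0.95},
        "C": {"A": 0.50, "B": 0.95},
    }
    print(mutual_rank(corr, "A", "B"))
    print(mutual_rank(corr, "B", "C"))
```

Because the two directional ranks can differ, a pair is scored as tightly co-expressed only when each gene ranks highly in the other's list.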
9.  The ancient mammalian KRAB zinc finger gene cluster on human chromosome 8q24.3 illustrates principles of C2H2 zinc finger evolution associated with unique expression profiles in human tissues 
BMC Genomics  2010;11:206.
Expansion of multi-C2H2 domain zinc finger (ZNF) genes, including the Krüppel-associated box (KRAB) subfamily, paralleled the evolution of tetrapods, particularly in mammalian lineages. Advances in their cataloging and characterization suggest that the functions of the KRAB-ZNF gene family contributed to mammalian speciation.
Here, we characterized the human 8q24.3 ZNF cluster on the genomic, the phylogenetic, the structural and the transcriptome level. Six (ZNF7, ZNF34, ZNF250, ZNF251, ZNF252, ZNF517) of the seven locus members contain exons encoding KRAB domains, one (ZNF16) does not. They form a paralog group in which the encoded KRAB and ZNF protein domains generally share more similarities with each other than with other members of the human ZNF superfamily. The closest relatives with respect to their DNA-binding domain were ZNF7 and ZNF251. The analysis of orthologs in therian mammalian species revealed strong conservation and purifying selection of the KRAB-A and zinc finger domains. These findings underscore structural/functional constraints during evolution. Gene losses in the murine lineage (ZNF16, ZNF34, ZNF252, ZNF517) and potential protein truncations in primates (ZNF252) illustrate ongoing speciation processes. Tissue expression profiling by quantitative real-time PCR showed similar but distinct patterns for all tested ZNF genes with the most prominent expression in fetal brain. Based on accompanying expression signatures in twenty-six other human tissues ZNF34 and ZNF250 revealed the closest expression profiles. Together, the 8q24.3 ZNF genes can be assigned to a cerebellum, a testis or a prostate/thyroid subgroup. These results are consistent with potential functions of the ZNF genes in morphogenesis and differentiation. Promoter regions of the seven 8q24.3 ZNF genes display common characteristics like missing TATA-box, CpG island-association and transcription factor binding site (TFBS) modules. Common TFBS modules partly explain the observed expression pattern similarities.
The ZNF genes at human 8q24.3 form a relatively old mammalian paralog group conserved in eutherian mammals for at least 130 million years. The members persisted after initial duplications by undergoing subfunctionalizations in their expression patterns and target site recognition. KRAB-ZNF mediated repression of transcription might have shaped organogenesis in mammalian ontogeny.
PMCID: PMC2865497  PMID: 20346131
10.  Estimating accuracy of RNA-Seq and microarrays with proteomics 
BMC Genomics  2009;10:161.
Microarrays revolutionized biological research by enabling gene expression comparisons on a transcriptome-wide scale. Microarrays, however, do not estimate absolute expression level accurately. At present, high throughput sequencing is emerging as an alternative methodology for transcriptome studies. Although free of many limitations imposed by microarray design, its potential to estimate absolute transcript levels is unknown.
In this study, we evaluate the relative accuracy of microarrays and transcriptome sequencing (RNA-Seq) using a third methodology: proteomics. We find that RNA-Seq provides a better estimate of absolute expression levels.
Our result shows that in terms of overall technical performance, RNA-Seq is the technique of choice for studies that require accurate estimation of absolute transcript levels.
PMCID: PMC2676304  PMID: 19371429
11.  Genomic regions with distinct genomic distance conservation in vertebrate genomes 
BMC Genomics  2009;10:133.
A number of vertebrate highly conserved elements (HCEs) have been detected, and their genomic interval distances have been reported to be more conserved than those of protein coding genes among mammalian genomes. A characteristic of the human versus non-mammalian comparisons is a bimodal distribution of the relative distance difference of conserved consecutive HCE pairs, and it is difficult to attribute such a profile to a random assortment. We therefore undertook an analysis of the human genomic regions confined by consecutive HCE pairs common to eight genomes (human, mouse, rat, chicken, frog, zebrafish, Tetraodon and fugu).
Among HCE pairs, we found that some consistently preserve a highly conserved interval distance among genomes while others have relatively low distance conservation. Using a partition method, we detected two groups of inter-HCE regions (IHRs) with distinct distance conservation patterns in vertebrate genomes: IHR1s, which are bordered by HCE pairs with relatively small distance variation, and IHR2s, with larger distance difference values. Compared to the random background, annotated repeat sequences are significantly less frequent in IHR1s than in IHR2s, which reflects a correlation between repeat sequences and the length expansion of IHRs. Both groups of IHRs are unexpectedly more enriched in human indel (i.e. insertion and deletion) polymorphisms than the random background. The correlation between the percentage of conserved sequence and human IHR length was stronger for IHR1 than for IHR2. Both groups of IHRs are significantly enriched for CpG islands.
The data suggest that subsets of HCE pairs may undergo different evolutionary paths in light of their genomic distance conservation, and that sets of genomic regions pertaining to HCEs, as well as the regions in which HCEs reside, should be treated as integrated domains.
PMCID: PMC2667192  PMID: 19323843
12.  The conservation pattern of short linear motifs is highly correlated with the function of interacting protein domains 
BMC Genomics  2008;9:452.
Many well-represented domains recognize primary sequences usually less than 10 amino acids in length, called Short Linear Motifs (SLiMs). Accurate prediction of SLiMs has been difficult because they are short (often < 10 amino acids) and highly degenerate. In this study, we combined scoring matrices derived from peptide libraries and conservation analysis to identify protein classes enriched in functional SLiMs recognized by SH2, SH3, PDZ and S/T kinase domains.
Our combined approach revealed that SLiMs are highly conserved in proteins from functional classes that are known to interact with a specific domain, but that they are not conserved in most other protein groups. We found that SLiMs recognized by SH2 domains were highly conserved in receptor kinases/phosphatases, adaptor molecules, and tyrosine kinases/phosphatases, that SLiMs recognized by SH3 domains were highly conserved in cytoskeletal and cytoskeletal-associated proteins, that SLiMs recognized by PDZ domains were highly conserved in membrane proteins such as channels and receptors, and that SLiMs recognized by S/T kinase domains were highly conserved in adaptor molecules, S/T kinases/phosphatases, and proteins involved in transcription or cell cycle control. We studied Tyr-SLiMs recognized by SH2 domains in more detail, and found that SH2-recognized Tyr-SLiMs on the cytoplasmic side of membrane proteins are more highly conserved than those on the extra-cellular side. Also, we found that SH2-recognized Tyr-SLiMs that are associated with SH3 motifs and a tyrosine kinase phosphorylation motif are more highly conserved.
The interactome of protein domains is reflected by the evolutionary conservation of SLiMs recognized by these domains. By combining scoring matrices derived from peptide libraries with conservation analysis, we can identify the protein groups that are more likely to interact with specific domains.
PMCID: PMC2576256  PMID: 18828911
13.  The use of global transcriptional analysis to reveal the biological and cellular events involved in distinct development phases of Trichophyton rubrum conidial germination 
BMC Genomics  2007;8:100.
Conidia are considered to be the primary cause of infections by Trichophyton rubrum.
We have developed a cDNA microarray containing 10250 ESTs to monitor the transcriptional strategy of conidial germination. A total of 1561 genes that had their expression levels specially altered in the process were obtained and hierarchically clustered with respect to their expression profiles. By functional analysis, we provided a global view of an important biological system related to conidial germination, including characterization of the pattern of gene expression at sequential developmental phases, and changes of gene expression profiles corresponding to morphological transitions. We matched the EST sequences to GO terms in the Saccharomyces Genome Database (SGD). A number of homologues of Saccharomyces cerevisiae genes related to signalling pathways and some important cellular processes were found to be involved in T. rubrum germination. These genes and signalling pathways may play roles in distinct steps, such as activating conidial germination, maintenance of isotropic growth, establishment of cell polarity and morphological transitions.
Our results may provide insights into molecular mechanisms of conidial germination at the cell level, and may enhance our understanding of regulation of gene expression related to the morphological construction of T. rubrum.
PMCID: PMC1871584  PMID: 17428342
14.  Analysis of the dermatophyte Trichophyton rubrum expressed sequence tags 
BMC Genomics  2006;7:255.
Dermatophytes are the primary causative agents of dermatophytosis, a disease that affects billions of individuals worldwide. Trichophyton rubrum is the most common of the superficial fungi. Although T. rubrum is a recognized pathogen for humans, little is known about how its transcriptional pattern is related to development of the fungus and establishment of disease. It is therefore necessary to identify genes whose expression is relevant to growth, metabolism and virulence of T. rubrum.
We generated 10 cDNA libraries covering nearly the entire growth phase and used them to isolate 11,085 unique expressed sequence tags (ESTs), including 3,816 contigs and 7,269 singletons. Comparisons with the GenBank non-redundant (NR) protein database revealed putative functions or matched homologs from other organisms for 7,764 (70%) of the ESTs. The remaining 3,321 (30%) of ESTs were only weakly similar or not similar to known sequences, suggesting that these ESTs represent novel genes.
The present data provide a comprehensive view of fungal physiological processes including metabolism, sexual and asexual growth cycles, signal transduction and pathogenic mechanisms.
PMCID: PMC1621083  PMID: 17032460
15.  Exploring photosynthesis evolution by comparative analysis of metabolic networks between chloroplasts and photosynthetic bacteria 
BMC Genomics  2006;7:100.
Chloroplasts descended from cyanobacteria and have a drastically reduced genome following an endosymbiotic event. Many genes of the ancestral cyanobacterial genome have been transferred to the plant nuclear genome by horizontal gene transfer. However, a selective set of metabolism pathways is maintained in chloroplasts using both chloroplast genome encoded and nuclear genome encoded enzymes. As an organelle specialized for carrying out photosynthesis, does the chloroplast metabolic network have properties adapted for higher efficiency of photosynthesis? We compared metabolic network properties of chloroplasts and prokaryotic photosynthetic organisms, mostly cyanobacteria, based on metabolic maps derived from genome data to identify features of chloroplast network properties that are different from cyanobacteria and to analyze possible functional significance of those features.
The properties of the entire metabolic network and the sub-network that consists of reactions directly connected to the Calvin Cycle have been analyzed using hypergraph representation. Results showed that the whole metabolic networks in chloroplasts and cyanobacteria both possess small-world network properties. Although the number of compounds and reactions in chloroplasts is smaller than that in cyanobacteria, the chloroplast metabolic network has a longer average path length and a larger diameter, and is Calvin Cycle-centered, indicating an overall less dense network structure with specific, local high-density areas in chloroplasts. Moreover, the chloroplast metabolic network exhibits a better modular organization than cyanobacterial ones. Enzymes involved in the same metabolic processes tend to cluster into the same module in chloroplasts.
In summary, the differences in metabolic network properties may reflect the evolutionary changes during endosymbiosis that led to the improvement of the photosynthesis efficiency in higher plants. Our findings are consistent with the notion that since the light energy absorption, transfer and conversion is highly efficient even in photosynthetic bacteria, the further improvements in photosynthetic efficiency in higher plants may rely on changes in metabolic network properties.
PMCID: PMC1524952  PMID: 16646993
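The network properties compared in entry 15 above (average path length and diameter) can be computed with a short Python sketch. It treats the network as a plain undirected graph stored as an adjacency dictionary, which is a simplification of the hypergraph representation mentioned in the abstract. The function names and the toy graphs are hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest path length (in edges) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def path_stats(graph):
    """Return (average shortest path length, diameter) over all reachable ordered pairs."""
    total, pairs, diameter = 0, 0, 0
    for source in graph:
        for target, d in bfs_distances(graph, source).items():
            if target != source:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    return (total / pairs if pairs else 0.0), diameter


if __name__ == "__main__":
    # A 4-node chain versus a 4-node star: same size, different compactness.
    chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    star = {"hub": ["x", "y", "z"], "x": ["hub"], "y": ["hub"], "z": ["hub"]}
    print(path_stats(chain))
    print(path_stats(star))
```

The chain has the longer average path and larger diameter even though both toy graphs have the same number of nodes, mirroring the kind of contrast the abstract draws between chloroplast and cyanobacterial networks.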
16.  Genomic characterization of ribitol teichoic acid synthesis in Staphylococcus aureus: genes, genomic organization and gene duplication 
BMC Genomics  2006;7:74.
Staphylococcus aureus or MRSA (Methicillin Resistant S. aureus), is an acquired pathogen and the primary cause of nosocomial infections worldwide. In S. aureus, teichoic acid is an essential component of the cell wall, and its biosynthesis is not yet well characterized. Studies in Bacillus subtilis have discovered two different pathways of teichoic acid biosynthesis, in two strains W23 and 168 respectively, namely teichoic acid ribitol (tar) and teichoic acid glycerol (tag). The genes involved in these two pathways are also characterized, tarA, tarB, tarD, tarI, tarJ, tarK, tarL for the tar pathway, and tagA, tagB, tagD, tagE, tagF for the tag pathway. With the genome sequences of several MRSA strains: Mu50, MW2, N315, MRSA252, COL as well as methicillin susceptible strain MSSA476 available, a comparative genomic analysis was performed to characterize teichoic acid biosynthesis in these S. aureus strains.
We identified all S. aureus tar and tag gene orthologs in the selected S. aureus strains which would contribute to teichoic acid synthesis. Based on our identification of genes orthologous to tarI, tarJ, tarL, which are specific to the tar pathway in B. subtilis W23, we also concluded that tar is the major teichoic acid biogenesis pathway in S. aureus. Further analyses indicated that the S. aureus tar genes, unlike the divergon organization in B. subtilis, are organized into several clusters in cis. Most interestingly, compared with genes in the B. subtilis tar pathway, the S. aureus tar-specific genes (tarI, tarJ, tarL) are duplicated in all six S. aureus genomes.
In the S. aureus strains we analyzed, tar (teichoic acid ribitol) is the main teichoic acid biogenesis pathway. The tar genes are organized into several genomic groups in cis, and the genes specific to tar (relative to tag), tarI, tarJ and tarL, are duplicated. The genomic organization of the S. aureus tar pathway suggests that its regulation differs from that of the B. subtilis tar or tag pathways, whose genes are grouped in two operons in a divergon structure.
PMCID: PMC1458327  PMID: 16595020