1.  TGF-β induces the expression of SAP30L, a novel nuclear protein 
BMC Genomics  2003;4:53.
We have previously set up an in vitro mesenchymal-epithelial cell co-culture model which mimics the intestinal crypt villus axis biology in terms of epithelial cell differentiation. In this model the fibroblast-induced epithelial cell differentiation from secretory crypt cells to absorptive enterocytes is mediated via transforming growth factor-β (TGF-β), the major inhibitory regulator of epithelial cell proliferation known to induce differentiation in intestinal epithelial cells. The aim of this study was to identify novel genes whose products would play a role in this TGF-β-induced differentiation.
Differential display analysis resulted in the identification of a novel TGF-β upregulated mRNA species, the Sin3-associated protein 30-like, SAP30L. The mRNA is expressed in several human tissues and codes for a nuclear protein of 183 amino acids 70% identical with Sin3 associated protein 30 (SAP30). The predicted nuclear localization signal of SAP30L is sufficient for nuclear transport of the protein although mutating it does not completely remove SAP30L from the nuclei. In the nuclei SAP30L concentrates in small bodies which were shown by immunohistochemistry to colocalize with PML bodies only partially.
By reason of its nuclear localization and close homology to SAP30 we believe that SAP30L might have a role in recruiting the Sin3-histone deacetylase complex to specific corepressor complexes in response to TGF-β, leading to the silencing of proliferation-driving genes in the differentiating intestinal epithelial cells.
PMCID: PMC319701  PMID: 14680513
2.  Genomic characterization of a repetitive motif strongly associated with developmental genes in Drosophila 
BMC Genomics  2003;4:52.
Non-coding DNA represents a high proportion of all metazoan genomes. Although an undetermined fraction of this DNA may be considered devoid of any function, it also contains important information residing in specific cis-regulatory sequences.
We report a 27 bp motif that is overrepresented within the fly genome. This motif does not show any significant similarity with transposon sequences and is strongly associated with genes involved in development and/or signal transduction. The 27 bp motif is preferentially located within introns, and has a tendency to be present in multiple copies around genes. Furthermore, it is often found embedded in known non-coding regulatory regions. The regulatory network defined by this motif is partially shared in D. pseudoobscura.
We have identified a 27 bp cis-regulatory sequence widely distributed within the Drosophila genome in association with developmental genes. This motif may be very useful towards the annotation of functional regulatory regions within the Drosophila genome and the construction of regulatory networks of Drosophila development.
PMCID: PMC327093  PMID: 14675495
3.  Metabolic reconstruction of sulfur assimilation in the extremophile Acidithiobacillus ferrooxidans based on genome analysis 
BMC Genomics  2003;4:51.
Acidithiobacillus ferrooxidans is a gamma-proteobacterium that lives at pH2 and obtains energy by the oxidation of sulfur and iron. It is used in the biomining industry for the recovery of metals and is one of the causative agents of acid mine drainage. Effective tools for the study of its genetics and physiology are not in widespread use and, despite considerable effort, an understanding of its unusual physiology remains at a rudimentary level. Nearly complete genome sequences of A. ferrooxidans are available from two public sources and we have exploited this information to reconstruct aspects of its sulfur metabolism.
Two candidate mechanisms for sulfate uptake from the environment were detected but both belong to large paralogous families of membrane transporters and their identification remains tentative. Prospective genes, pathways and regulatory mechanisms were identified that are likely to be involved in the assimilation of sulfate into cysteine and in the formation of Fe-S centers. Genes and regulatory networks were also uncovered that may link sulfur assimilation with nitrogen fixation, hydrogen utilization and sulfur reduction. Potential pathways were identified for sulfation of extracellular metabolites that may possibly be involved in cellular attachment to pyrite, sulfur and other solid substrates.
A bioinformatic analysis of the genome sequence of A. ferrooxidans has revealed candidate genes, metabolic processes and control mechanisms potentially involved in aspects of sulfur metabolism. Metabolic modeling provides an important preliminary step in understanding the unusual physiology of this extremophile, especially given the severe difficulties involved in its genetic manipulation and biochemical analysis.
PMCID: PMC324559  PMID: 14675496
4.  The WD-repeat protein superfamily in Arabidopsis: conservation and divergence in structure and function 
BMC Genomics  2003;4:50.
The WD motif (also known as the Trp-Asp or WD40 motif) is found in a multitude of eukaryotic proteins involved in a variety of cellular processes. Where studied, repeated WD motifs act as a site for protein-protein interaction, and proteins containing WD repeats (WDRs) are known to serve as platforms for the assembly of protein complexes or mediators of transient interplay among other proteins. In the model plant Arabidopsis thaliana, members of this superfamily are increasingly being recognized as key regulators of plant-specific developmental events.
We analyzed the predicted complement of WDR proteins from Arabidopsis, and compared this to those from budding yeast, fruit fly and human to illustrate both conservation and divergence in structure and function. This analysis identified 237 potential Arabidopsis proteins containing four or more recognizable copies of the motif. These were classified into 143 distinct families, 49 of which contained more than one Arabidopsis member. Approximately 113 of these families or individual proteins showed clear homology with WDR proteins from the other eukaryotes analyzed. Where conservation was found, it often extended across all of these organisms, suggesting that many of these proteins are linked to basic cellular mechanisms. The functional characterization of conserved WDR proteins in Arabidopsis reveals that these proteins help adapt basic mechanisms for plant-specific processes.
Our results show that most Arabidopsis WDR proteins are strongly conserved across eukaryotes, including those that have been found to play key roles in plant-specific processes, with diversity in function conferred at least in part by divergence in upstream signaling pathways, downstream regulatory targets and/or structure outside of the WDR regions.
PMCID: PMC317288  PMID: 14672542
5.  Relationship between gene co-expression and probe localization on microarray slides 
BMC Genomics  2003;4:49.
Microarray technology allows simultaneous measurement of thousands of genes in a single experiment. This is a potentially useful tool for evaluating co-expression of genes and extraction of useful functional and chromosomal structural information about genes.
In this work we studied the association between the co-expression of genes, their location on the chromosome and their location on the microarray slides by analyzing a number of eukaryotic expression datasets derived from S. cerevisiae, C. elegans, and D. melanogaster. We find that in several different yeast microarray experiments the distribution of the number of gene pairs with correlated expression profiles as a function of chromosomal spacing is peaked at short separations and has two superimposed periodicities. The longer periodicity has a spacing of 22 genes (~42 Kb), and the shorter periodicity is 2 genes (~4 Kb).
The relative positioning of DNA probes on microarray slides and source plates introduces subtle but significant correlations between pairs of genes. Careful consideration of this spatial artifact is important for analysis of microarray expression data. It is particularly relevant to recent microarray analyses that suggest that co-expressed genes cluster along chromosomes or are spaced by multiples of a fixed number of genes along the chromosome.
PMCID: PMC317287  PMID: 14667251
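The distribution described in entry 5 (number of correlated gene pairs as a function of chromosomal spacing) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the Pearson threshold of 0.8, and the toy expression profiles are all assumptions made for illustration, and spacing is measured simply as the difference in gene order along one chromosome.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no correlation signal
    return sxy / sqrt(sxx * syy)


def correlated_pairs_by_spacing(profiles, threshold=0.8):
    """Count gene pairs with correlation >= threshold, keyed by spacing.

    `profiles` is a list of expression profiles ordered by gene position
    along one chromosome; spacing is the difference in gene index.
    The threshold is an illustrative assumption, not a value from the paper.
    """
    counts = Counter()
    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            if pearson(profiles[i], profiles[j]) >= threshold:
                counts[j - i] += 1
    return counts


# Hypothetical toy data: genes 0 and 2 rise together, genes 1 and 3 fall together.
toy_profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [4.0, 3.0, 2.0, 1.0],
    [2.0, 4.0, 6.0, 8.0],
    [8.0, 6.0, 4.0, 2.0],
]
histogram = correlated_pairs_by_spacing(toy_profiles)
for spacing in sorted(histogram):
    print(spacing, histogram[spacing])
```

In the toy data every correlated pair sits two genes apart, so the histogram has a single peak at spacing 2. On real arrays, the same histogram computed with genes ordered by their position on the slide or source plate, rather than on the chromosome, would be one way to look for the spatial artifact the abstract warns about.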
6.  Phylogenomic identification of five new human homologs of the DNA repair enzyme AlkB 
BMC Genomics  2003;4:48.
Combination of biochemical and bioinformatic analyses led to the discovery of oxidative demethylation – a novel DNA repair mechanism catalyzed by the Escherichia coli AlkB protein and its two human homologs, hABH2 and hABH3. This discovery was based on the prediction made by Aravind and Koonin that AlkB is a member of the 2OG-Fe2+ oxygenase superfamily.
In this article, we report identification and sequence analysis of five human members of the (2OG-Fe2+) oxygenase superfamily designated here as hABH4 through hABH8. These experimentally uncharacterized and poorly annotated genes were not associated with the AlkB family in any database, but are predicted here to be phylogenetically and functionally related to the AlkB family (and specifically to the lineage that groups together hABH2 and hABH3) rather than to any other oxygenase family. Our analysis reveals the history of ABH gene duplications in the evolution of vertebrate genomes.
We hypothesize that hABH4–8 could either be back-up enzymes for hABH1-3 or may code for novel DNA or RNA repair activities. For example, enzymes that can dealkylate N3-methylpurines or N7-methylpurines in DNA have not been described. Our analysis will guide experimental confirmation of these novel human putative DNA repair enzymes.
PMCID: PMC317286  PMID: 14667252
phylogenomics; bioinformatics; dealkylation; demethylation; dioxygenases
7.  Identification and phylogenetic analysis of Dictyostelium discoideum kinesin proteins 
BMC Genomics  2003;4:47.
Kinesins constitute a large superfamily of motor proteins in eukaryotic cells. They perform diverse tasks such as vesicle and organelle transport and chromosomal segregation in a microtubule- and ATP-dependent manner. In recent years, the genomes of a number of eukaryotic organisms have been completely sequenced. Subsequent studies revealed and classified the full set of members of the kinesin superfamily expressed by these organisms. For Dictyostelium discoideum, only five kinesin superfamily proteins (Kif's) have been reported so far.
Here, we report the identification of thirteen kinesin genes exploiting the information from the raw shotgun reads of the Dictyostelium discoideum genome project. A phylogenetic tree of 390 kinesin motor domain sequences was built, grouping the Dictyostelium kinesins into nine subfamilies. According to known cellular functions or strong homologies to kinesins of other organisms, four of the Dictyostelium kinesins are involved in organelle transport, six are implicated in cell division processes, two are predicted to perform multiple functions, and one kinesin may be the founder of a new subclass.
This analysis of the Dictyostelium genome led to the identification of eight new kinesin motor proteins. According to an exhaustive phylogenetic comparison, Dictyostelium contains the same subset of kinesins that higher eukaryotes need to perform mitosis. Some of the kinesins are implicated in intracellular traffic and a small number have unpredictable functions.
PMCID: PMC305348  PMID: 14641909
8.  Microarray analysis of tumor necrosis factor α induced gene expression in U373 human glioblastoma cells 
BMC Genomics  2003;4:46.
Tumor necrosis factor α (TNF) is able to induce a variety of biological responses in the nervous system including inflammation and neuroprotection. Human astrocytoma cells U373 have been widely used as a model for inflammatory cytokine actions in the nervous system. Here we used cDNA microarrays to analyze the time course of the transcriptional response from 1 h up to 12 h post TNF treatment in comparison to untreated U373 cells. TNF strongly activated the NF-κB transcriptional pathway and is linked to other pathways via the NF-κB target genes JUNB and IRF-1. Part of the TNF-induced gene expression could be inhibited by pharmacological inhibition of NF-κB with pyrrolidine-dithiocarbamate (PDTC). NF-κB comprises a family of transcription factors which are involved in the inducible expression of genes regulating neuronal survival, inflammatory response, cancer and innate immunity.
In this study we show that numerous genes responded to TNF (> 880 from 7500 tested) with a more than two-fold induction rate. Several novel TNF-responsive genes (about 60% of the genes regulated by a factor ≥ 3) were detected. A comparison of our TNF-induced gene expression profiles of U373 with profiles from 3T3 and HeLa cells revealed a striking cell-type specificity. SCYA2 (MCP-1, CCL2, MCAF) was induced in U373 cells in a sustained manner and at the highest level of all analyzed genes. MCP-1 protein expression, as monitored with immunofluorescence and ELISA, correlated exactly with microarray data. Based on these data and on evidence from literature we suggest a model for the potential neurodegenerative effect of NF-κB in astroglia: Activation of NF-κB via TNF results in a strongly increased production of MCP-1. This leads to an exacerbation of neurodegeneration in stroke or multiple sclerosis, presumably via infiltration of macrophages.
The vast majority of genes regulated more than 3-fold were previously not linked to tumor necrosis factor α, as a search of the published literature revealed. Striking co-regulation of several functional groups, such as proteasome and ribosomal proteins, was detected.
PMCID: PMC317285  PMID: 14641910
9.  In silico and in vivo analysis reveal a novel gene in Saccharomyces cerevisiae trehalose metabolism 
BMC Genomics  2003;4:45.
The ability to respond rapidly to fluctuations in environmental conditions is decisive for cell survival. Under these conditions trehalose has an essential protective function and its concentration increases in response to enhanced expression of trehalose synthase genes, TPS1, TPS2, TPS3 and TSL1. Intriguingly, the NTH1 gene, which encodes neutral trehalase, is highly expressed at the same time. We have previously shown that trehalase remains in its inactive non-phosphorylated form by the action of an endogenous inhibitor. Recently, a comprehensive two-hybrid analysis revealed a 41-kDa protein encoded by the YLR270w ORF, which interacts with NTH1p.
In this work we investigate the involvement of this trehalase-associated protein in the regulation of trehalase activity. The neutral trehalase activity in the ylr270w mutant strain was about 4-fold higher than in the control strain. After in vitro activation by PKA, the total trehalase activity of the ylr270w mutant increased 3-fold when compared to a control strain. The expression of the NTH1 gene promoter fused to the heterologous reporter lacZ gene was evaluated. The mutant strain lacking YLR270w exhibited a 2-fold increase in the NTH1-lacZ basal expression when compared to the wild type strain.
These results strongly indicate a central role for Ylr270p in inhibiting trehalase activity, as well as in the regulation of its expression, preventing a wasteful futile cycle of synthesis-degradation of trehalose.
PMCID: PMC280675  PMID: 14614785
trehalase; trehalose; DCS1; YLR270w; yeast; Saccharomyces cerevisiae
10.  Amplified RNA degradation in T7-amplification methods results in biased microarray hybridizations 
BMC Genomics  2003;4:44.
The amplification of RNA with the T7-System is a widely used technique for obtaining increased amounts of RNA starting from limited material. The amplified RNA (aRNA) can subsequently be used for microarray hybridizations, warranting sufficient signal for image analysis. We describe here an amplification-time dependent degradation of aRNA in prolonged standard T7 amplification protocols, that results in lower average size aRNA and decreased yields.
A time-dependent degradation of amplified RNA (aRNA) could be observed when using the classical "Eberwine" T7-Amplification method. When the amplification was conducted for more than 4 hours, the resulting aRNA showed a significantly smaller size distribution on gel electrophoresis and a concomitant reduction of aRNA yield. The degradation of aRNA could be correlated to the presence of the T7 RNA Polymerase in the amplification cocktail. The aRNA degradation resulted in a strong bias in microarray hybridizations with a high coefficient of variation and a significant reduction of signals of certain transcripts, that seem to be susceptible to this RNA degrading activity. The time-dependent degradation of these transcripts was verified by a real-time PCR approach.
It is important to perform amplifications not longer than 4 hours as there is a characteristic 'quality vs. yield' situation for longer amplification times. When conducting microarray hybridizations it is important not to compare results obtained with aRNA from different amplification times.
PMCID: PMC280674  PMID: 14606961
RNA; amplification; bias; T7 Polymerase; microarray
11.  Expression of alternatively spliced isoforms of human Sp7 in osteoblast-like cells 
BMC Genomics  2003;4:43.
Osteogenic and chondrocytic differentiation involves a cascade of coordinated transcription factor gene expression that regulates proliferation and matrix protein formation in a defined temporo-spatial manner. Bone morphogenetic protein-2 induces expression of the murine Osterix/Specificity protein-7 (Sp7) transcription factor that is required for osteoblast differentiation and bone formation. Regulation of its expression may prove useful for mediating skeletal repair.
Sp7, the human homologue of the mouse Osterix gene, maps to 12q13.13, close to Sp1 and homeobox gene cluster-C. The first two exons of the 3-exon gene are alternatively spliced, encoding a 431-residue long protein isoform and an amino-terminus truncated 413-residue short protein isoform. The human Sp7 protein is a member of the Sp family having 78% identity with Sp1 in the three, Cys2-His2 type, DNA-binding zinc-fingers, but there is little homology elsewhere. The Sp7 mRNA was expressed in human foetal osteoblasts and craniofacial osteoblasts, chondrocytes and the osteosarcoma cell lines HOS and MG63, but was not detected in adult femoral osteoblasts. Generally, the expression of the short (or beta) protein isoform of Sp7 was much higher than the long (or alpha) protein isoform. No expression of either isoform was found in a panel of other cell types. However, in tissues, low levels of Sp7 were detected in testis, heart, brain, placenta, lung, pancreas, ovary and spleen.
Sp7 expression in humans is largely confined to osteoblasts and chondrocytes, both of which differentiate from the mesenchymal lineage. Of the two protein isoforms, the short isoform is most abundant.
PMCID: PMC280673  PMID: 14604442
12.  PCAS – a precomputed proteome annotation database resource 
BMC Genomics  2003;4:42.
Many model proteomes or "complete" sets of proteins of given organisms are now publicly available. Much effort has been invested in computational annotation of those "draft" proteomes. Motif or domain based algorithms play a pivotal role in functional classification of proteins. Employing most available computational algorithms, mainly motif or domain recognition algorithms, we set out to develop an online proteome annotation system with integrated proteome annotation data to complement existing resources.
We report here the development of PCAS (ProteinCentric Annotation System) as an online resource of pre-computed proteome annotation data. We applied most available motif or domain databases and their analysis methods, including hmmpfam search of HMMs in Pfam, SMART and TIGRFAM, RPS-PSIBLAST search of PSSMs in CDD, pfscan of PROSITE patterns and profiles, as well as PSI-BLAST search of SUPERFAMILY PSSMs. In addition, signal peptide and TM are predicted using SignalP and TMHMM respectively. We mapped SUPERFAMILY and COGs to InterPro, so the motif or domain databases are integrated through InterPro. PCAS displays table summaries of pre-computed data and a graphical presentation of motifs or domains relative to the protein. As of now, PCAS contains human IPI, mouse IPI, and rat IPI, A. thaliana, C. elegans, D. melanogaster, S. cerevisiae, and S. pombe proteome.
PCAS is available at
PCAS gives better annotation coverage for model proteomes by employing a wider collection of available algorithms. Besides presenting the most confident annotation data, PCAS also allows customized queries so users can inspect statistically less significant boundary information as well. Therefore, besides providing general annotation information, PCAS could be used as a discovery platform. We plan to update PCAS twice a year. We will upgrade PCAS when new proteome annotation algorithms are identified.
PMCID: PMC293463  PMID: 14594458
13.  Multifactorial experimental design and the transitivity of ratios with spotted DNA microarrays 
BMC Genomics  2003;4:41.
Multifactorial experimental designs using DNA microarrays are becoming increasingly common, but the extent of the transitivity of cDNA microarray expression measurements across multiple samples has yet to be explored.
A strong correlation between direct and transitive inference for significantly differentially expressed genes is demonstrated, using subsets of a dye-swap loop design.
In experimental design, opportunities for transitive inference should be exploited, while always ensuring that comparisons of greatest interest comprise direct hybridizations.
PMCID: PMC239860  PMID: 14525623
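The transitivity of ratios discussed in entry 13 rests on a simple identity: on a log scale, the ratio of sample A to sample C equals the sum of the A-to-B and B-to-C ratios. A minimal sketch of comparing a direct measurement with a transitive inference follows; the intensities and the function name are hypothetical and noise-free, whereas real hybridizations would agree only approximately.

```python
from math import log2


def transitive_log_ratio(log_ab, log_bc):
    """Infer log2(A/C) from two measured log ratios that share sample B."""
    return log_ab + log_bc


# Hypothetical, noise-free intensities for one gene in samples A, B and C.
a, b, c = 800.0, 400.0, 100.0

direct = log2(a / c)  # A and C hybridized on the same array
transitive = transitive_log_ratio(log2(a / b), log2(b / c))  # via sample B

print(direct, transitive)
```

With measured data, each hybridization adds its own error, so a transitive estimate accumulates the noise of two arrays. That is consistent with the abstract's recommendation to reserve direct hybridizations for the comparisons of greatest interest.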
14.  Differential representation of sunflower ESTs in enriched organ-specific cDNA libraries in a small scale sequencing project 
BMC Genomics  2003;4:40.
Subtractive hybridization methods are valuable tools for identifying differentially regulated genes in a given tissue. They avoid redundant sequencing of clones representing the same expressed genes and maximize detection of low-abundance transcripts, thus improving the efficiency and cost-effectiveness of small-scale cDNA sequencing projects aimed at the specific identification of useful genes for breeding purposes. The objective of this work is to evaluate alternative strategies to high-throughput sequencing projects for the identification of novel genes differentially expressed in sunflower as a source of organ-specific genetic markers that can be functionally associated with important traits.
Differential organ-specific ESTs were generated from leaf, stem, root and flower bud at two developmental stages (R1 and R4). The use of different sources of RNA as tester and driver cDNA for the construction of differential libraries was evaluated as a tool for detection of rare or low-abundance transcripts. Organ-specificity ranged from 75 to 100% of non-redundant sequences in the different cDNA libraries. Sequence redundancy varied according to the target and driver cDNA used in each case. The R4 flower cDNA library was the least redundant library, with 62% of unique sequences. Out of a total of 919 sequences that were edited and annotated, 318 were non-redundant sequences. Comparison against sequences in public databases showed that 60% of non-redundant sequences showed significant similarity to known sequences. The number of predicted novel genes varied among the different cDNA libraries, ranging from 56% in the R4 flower to 16% in the R1 flower bud library. Comparison with sunflower ESTs in public databases showed that 197 of the non-redundant sequences (60%) did not exhibit significant similarity to previously reported sunflower ESTs. This approach helped to successfully isolate a significant number of newly reported sequences putatively related to responses to important agronomic traits and key regulatory and physiological genes.
The application of suppressed subtracted hybridization technology not only enabled the cost-effective isolation of differentially expressed sequences but also allowed the identification of novel sequences in sunflower from a relatively small number of analyzed sequences when compared to major sequencing projects.
PMCID: PMC270089  PMID: 14519210
Helianthus annuus; sunflower; EST; SSH; organ-specific transcripts
15.  A novel design of whole-genome microarray probes for Saccharomyces cerevisiae which minimizes cross-hybridization 
BMC Genomics  2003;4:38.
Numerous DNA microarray hybridization experiments have been performed in yeast over the last years using either synthetic oligonucleotides or PCR-amplified coding sequences as probes. The design and quality of the microarray probes are of critical importance for hybridization experiments as well as subsequent analysis of the data.
We present here a novel design of Saccharomyces cerevisiae microarrays based on a refined annotation of the genome and with the aim of reducing cross-hybridization between related sequences. An effort was made to design probes of similar lengths, preferably located in the 3'-end of reading frames. The sequence of each gene was compared against the entire yeast genome and optimal sub-segments giving no predicted cross-hybridization were selected. A total of 5660 novel probes (more than 97% of the yeast genes) were designed. For the remaining 143 genes, cross-hybridization was unavoidable. Using a set of 18 deletant strains, we have experimentally validated our cross-hybridization procedure. Sensitivity, reproducibility and dynamic range of these new microarrays have been measured. Based on this experience, we have written a novel program to design long oligonucleotides for microarray hybridizations of complete genome sequences.
A validated procedure to predict cross-hybridization in microarray probe design was defined in this work. Subsequently, a novel Saccharomyces cerevisiae microarray (which minimizes cross-hybridization) was designed and constructed. Arrays are available at Eurogentec S.A. Finally, we propose a novel design program, OliD, which allows automatic oligonucleotide design for microarrays. The OliD program is available from the authors.
PMCID: PMC239980  PMID: 14499002
16.  The interrelationship between DRIM gene expression and cytogenetic and phenotypic characteristics in human breast tumor cell lines 
BMC Genomics  2003;4:39.
In order to facilitate the identification of genes involved in the metastatic phenotype we have previously developed a pair of cell lines from the human breast carcinoma cell line MDA-MB-435, which have diametrically opposite metastatic potential in athymic mice. Differential display analysis of this model previously identified a novel gene, DRIM (down regulated in metastasis), the decreased expression of which correlated with metastatic capability. DRIM encodes a protein comprising 2785 amino acids with significant homology to a protein in yeast and C. elegans, but little else is currently known about its function or pattern of expression. In a detailed analysis of the DRIM gene locus we quantitatively evaluated gene dosage and the expression of DRIM transcripts in a panel of breast cell lines of known metastatic phenotype.
Fluorescent in situ hybridization (FISH) analyses mapped a single DRIM gene locus to human chromosome 12q23~24, a region of conserved synteny to mouse chromosome 10. We confirmed higher expression of DRIM mRNA in the non-metastatic MDA-MB-435 clone NM2C5, relative to its metastatic counterpart M4A4, but this appeared to be due to the presence of an extra copy of the DRIM gene in the cell line's genome. The other non-metastatic cell lines in the series (T47D, MCF-7, SK-BR-3 and ZR-75-1) contained either 3 or 4 chromosomal copies of the DRIM gene. However, the expression level of DRIM mRNA in M4A4 was found to be 2–4 fold higher than in unrelated breast cells of non-metastatic phenotype.
Whilst DRIM expression is decreased in metastatic M4A4 cells relative to its non-metastatic isogenic counterpart, neither DRIM gene dosage nor DRIM mRNA levels correlated with metastatic propensity in a series of human breast tumor cell lines examined. Collectively, these findings indicate that the expression pattern of the DRIM gene in relation to the pathogenesis of breast tumor metastasis is more complex than previously recognized.
PMCID: PMC222913  PMID: 14503924
17.  An improved probability mapping approach to assess genome mosaicism 
BMC Genomics  2003;4:37.
Maximum likelihood and posterior probability mapping are useful visualization techniques that are used to ascertain the mosaic nature of prokaryotic genomes. However, posterior probabilities, especially when calculated for four-taxon cases, tend to overestimate the support for tree topologies. Furthermore, because of poor taxon sampling four-taxon analyses suffer from sensitivity to the long branch attraction artifact. Here we extend the probability mapping approach by improving taxon sampling of the analyzed datasets, and by using bootstrap support values, a more conservative tool to assess reliability.
Quartets of orthologous proteins were complemented with homologs from selected reference genomes. The mapping of bootstrap support values from these extended datasets gives results similar to the original maximum likelihood and posterior probability mapping. The more conservative nature of the plotted support values allows further analyses to focus on those protein families that strongly disagree with the majority or plurality of genes present in the analyzed genomes.
Posterior probability is a non-conservative measure for support, and posterior probability mapping only provides a quick estimation of phylogenetic information content of four genomes. This approach can be utilized as a pre-screen to select genes that might have been horizontally transferred. Better taxon sampling combined with subtree analyses prevents the inconsistencies associated with four-taxon analyses, but retains the power of visual representation. Nevertheless, a case-by-case inspection of individual multi-taxon phylogenies remains necessary to differentiate unrecognized paralogy and shared phylogenetic reconstruction artifacts from horizontal gene transfer events.
PMCID: PMC222983  PMID: 12974984
maximum likelihood mapping; long-branch attraction; horizontal gene transfer; taxon sampling; bootstrap support values mapping
18.  Increased retention of functional fusions to toxic genes in new two-hybrid libraries of the E. coli strain MG1655 and B. subtilis strain 168 genomes, prepared without passaging through E. coli 
BMC Genomics  2003;4:36.
Cloning of genes in expression libraries, such as the yeast two-hybrid system (Y2H), is based on the assumption that the loss of target genes is minimal, or at worst, managable. However, the expression of genes or gene fragments that are capable of interacting with E. coli or yeast gene products in these systems has been shown to be growth inhibitory, and therefore these clones are underrepresented (or completely lost) in the amplified library.
Analysis of candidate genes as Y2H fusion constructs has shown that, while stable in E. coli and yeast for genetic studies, they are rapidly lost in growth conditions for genomic libraries. This includes the rapid loss of a fragment of the E. coli cell division gene ftsZ which encodes the binding site for ZipA and FtsA. Expression of this clone causes slower growth in E. coli. This clone is also rapidly lost in yeast, when expressed from a GAL1 promoter, relative to a vector control, but is stable when the promoter is repressed. We have demonstrated in this report that the construction of libraries for the E. coli and B. subtilis genomes without passaging through E. coli is practical, but the number of transformants is less than for libraries cloned using E. coli as a host. Analysis of several clones in the libraries that are strongly growth inhibitory in E. coli include genes for many essential cellular processes, such as transcription, translation, cell division, and transport.
Y2H clones capable of interacting with E. coli and yeast targets are rapidly lost, causing a loss of complexity. The strategy for preparing Y2H libraries described here allows the retention of genes that are toxic when inappropriately expressed in E. coli or yeast, including many genes that represent potential antibacterial targets. While these methods are generally applicable to the generation of Y2H libraries from any source, including mammalian and plant genomes, the potential of functional clones interacting with host proteins to inhibit growth would make this approach most relevant for the study of prokaryotic genomes.
PMCID: PMC212392  PMID: 12964949
19.  The Anopheles gambiae glutathione transferase supergene family: annotation, phylogeny and expression profiles 
BMC Genomics  2003;4:35.
Twenty-eight genes putatively encoding cytosolic glutathione transferases have been identified in the Anopheles gambiae genome. We manually annotated these genes and then confirmed the annotation by sequencing of A. gambiae cDNAs. Phylogenetic analysis with the 37 putative GST genes from Drosophila and representative GSTs from other taxa was undertaken to develop a nomenclature for insect GSTs. The epsilon class of insect GSTs has previously been implicated in conferring insecticide resistance in several insect species. We compared the expression level of all members of this GST class in two strains of A. gambiae to determine whether epsilon GST expression is correlated with insecticide resistance status.
Two A. gambiae GSTs are alternatively spliced, resulting in a maximum number of 32 transcripts encoding cytosolic GSTs. We detected cDNAs for 31 of these in adult mosquitoes. There are at least six different classes of GSTs in insects but 20 of the A. gambiae GSTs belong to the two insect specific classes, delta and epsilon. Members of these two GST classes are clustered on chromosome arms 2L and 3R respectively. Two members of the GST supergene family are intronless. Amongst the remainder, there are 13 unique intron positions but within the epsilon and delta classes, there is considerable conservation of intron positions. Five of the eight epsilon GSTs are overexpressed in a DDT resistant strain of A. gambiae.
The GST supergene family in A. gambiae is extensive and regulation of transcription of these genes is complex. Expression profiling of the epsilon class supports earlier predictions that this class is important in conferring insecticide resistance.
PMCID: PMC194574  PMID: 12914673
20.  Application of comparative genomics in the identification and analysis of novel families of membrane-associated receptors in bacteria 
BMC Genomics  2003;4:34.
A great diversity of multi-pass membrane receptors, typically with 7 transmembrane (TM) helices, is observed in the eukaryote crown group. So far, they are relatively rare in the prokaryotes, and are restricted to the well-characterized sensory rhodopsins of various phototrophic prokaryotes.
Utilizing the currently available wealth of prokaryotic genomic sequences, we set up a computational screen to identify putative 7 TM and other multi-pass membrane receptors in prokaryotes. As a result of this procedure we were able to recover two widespread families of 7 TM receptors in bacteria that are distantly related to the eukaryotic 7 TM receptors and prokaryotic rhodopsins. Using sequence profile analysis, we were able to establish that the first members of these receptor families contain one of two distinct N-terminal extracellular globular domains, which are predicted to bind ligands such as carbohydrates. In their intracellular portions they contain fusions to a variety of signaling domains, which suggest that they are likely to transduce signals via cyclic AMP, cyclic diguanylate, histidine phosphorylation, dephosphorylation, and through direct interactions with DNA. The second family of bacterial 7 TM receptors possesses an α-helical extracellular domain, and is predicted to transduce a signal via an intracellular HD hydrolase domain. Based on comparative analysis of gene neighborhoods, this receptor is predicted to function as a regulator of the diacylglycerol-kinase-dependent glycerolipid pathway. Additionally, our procedure also recovered other types of putative prokaryotic multi-pass membrane associated receptor domains. Of these, we characterized two widespread, evolutionarily mobile multi-TM domains that are fused to a variety of C-terminal intracellular signaling domains. One of these, typified by the Gram-positive LytS protein, is predicted to be a potential sensor of murein derivatives, whereas the other one, typified by the Escherichia coli UhpB protein, is predicted to function as a sensor of conformational changes occurring in associated membrane proteins.
We present evidence for considerable variety in the types of uncharacterized surface receptors in bacteria, and reconstruct the evolutionary processes that model their diversity. The identification of novel receptor families in prokaryotes is likely to aid in the experimental analysis of signal transduction and environmental responses of several bacteria, including pathogens such as Leptospira, Treponema, Corynebacterium, Coxiella, Bacillus anthracis and Cytophaga.
PMCID: PMC212514  PMID: 12914674
21.  Identification of a novel Drosophila gene, beltless, using injectable embryonic and adult RNA interference (RNAi) 
BMC Genomics  2003;4:33.
RNA interference (RNAi) is a process triggered by a double-stranded RNA that leads to targeted down-regulation/silencing of gene expression and can be used for functional genomics; i.e. loss-of-function studies. Here we report on the use of RNAi in the identification of a developmentally important novel Drosophila (fruit fly) gene (corresponding to a putative gene CG5652/GM06434), that we named beltless based on an embryonic loss-of-function phenotype.
Beltless mRNA is expressed in all developmental stages except in 0–6 h embryos. In situ RT-PCR localized beltless mRNA in the ventral cord and brain of late stage embryos and in the nervous system, ovaries, and the accessory glands of adult flies. RNAi was induced by injection of short (22 bp) beltless double-stranded RNAs into embryos or into adult flies. Embryonic RNAi altered cuticular phenotypes ranging from partially-formed to missing denticle belts (thus beltless) of the abdominal segments A2–A4. Embryonic beltless RNAi was lethal. Adult RNAi resulted in the shrinkage of the ovaries by half and reduced the number of eggs laid. We also examined Df(1)RK4 flies in which deletion removes 16 genes, including beltless. In some embryos, we observed cuticular abnormalities similar to our findings with beltless RNAi. After differentiating Df(1)RK4 embryos into those with visible denticle belts and those missing denticle belts, we assayed the presence of beltless mRNA; no beltless mRNA was detectable in embryos with missing denticle belts.
We have identified a developmentally important novel Drosophila gene, beltless, which has been characterized in loss-of-function studies using RNA interference. The putative beltless protein shares homologies with the C. elegans nose resistant to fluoxetine (NRF) NRF-6 gene, as well as with several uncharacterized C. elegans and Drosophila melanogaster genes, some with prominent acyltransferase domains. Future studies should elucidate the role and mechanism of action of beltless during Drosophila development and in adults, including in the adult nervous system.
PMCID: PMC194572  PMID: 12914675
22.  Molecular cloning, genomic characterization and over-expression of a novel gene, XRRA1, identified from human colorectal cancer cell HCT116Clone2_XRR and macaque testis 
BMC Genomics  2003;4:32.
As part of our investigation into the genetic basis of tumor cell radioresponse, we have isolated several clones with a wide range of responses to X-radiation (XR) from an unirradiated human colorectal tumor cell line, HCT116. Using human cDNA microarrays, we recently identified a novel gene that was down-regulated two-fold in an XR-resistant cell clone, HCT116Clone2_XRR. We have named this gene X-ray radiation resistance associated 1 (XRRA1) (GenBank BK000541). Here, we present the first report on the molecular cloning, genomic characterization and over-expression of the XRRA1 gene.
We found that XRRA1 was expressed predominantly in testis of both human and macaque. cDNA microarray analysis showed three-fold higher expression of XRRA1 in macaque testis relative to other tissues. We further cloned the macaque XRRA1 cDNA (GenBank AB072776) and a human XRRA1 splice variant from HCT116Clone2_XRR (GenBank AY163836). In silico analysis revealed the full-length human XRRA1, mouse, rat and bovine Xrra1 cDNAs. The XRRA1 gene comprises 11 exons and spans 64 kb on chromosome 11q13.3. Human and macaque cDNAs share 96% homology. Human XRRA1 cDNA is 1987 nt long and encodes a protein of 559 aa. XRRA1 protein is highly conserved in human, macaque, mouse, rat, pig, and bovine. GFP-XRRA1 fusion protein was detected in both the nucleus and cytoplasm of HCT116 clones and COS-7 cells. Interestingly, we found evidence that COS-7 cells which over-expressed XRRA1 lacked Ku86 (Ku80, XRCC5), a non-homologous end joining (NHEJ) DNA repair molecule, in the nucleus. RT-PCR analysis showed differential expression of XRRA1 after XR in HCT116 clones manifesting significantly different XR responses. Further, we found that XRRA1 was expressed in most tumor cell types. Surprisingly, mouse Xrra1 was detected in mouse embryonic stem cells R1.
Both XRRA1 cDNA and protein are highly conserved among mammals, suggesting that XRRA1 may have similar functions. Our results also suggest that the genetic modulation of XRRA1 may affect the XR responses of HCT116 clones and that XRRA1 may have a role in the response of human tumor and normal cells to XR. XRRA1 might be correlated with cancer development and might also be an early expressed gene.
PMCID: PMC194569  PMID: 12908878
23.  Congruence of tissue expression profiles from Gene Expression Atlas, SAGEmap and TissueInfo databases 
BMC Genomics  2003;4:31.
Extracting biological knowledge from large amounts of gene expression information deposited in public databases is a major challenge of the postgenomic era. Additional insights may be derived by data integration and cross-platform comparisons of expression profiles. However, database meta-analysis is complicated by differences in experimental technologies, data post-processing, database formats, and inconsistent gene and sample annotation.
We have analysed expression profiles from three public databases: Gene Expression Atlas, SAGEmap and TissueInfo. These are repositories of oligonucleotide microarray, Serial Analysis of Gene Expression and Expressed Sequence Tag human gene expression data respectively. We devised a method, Preferential Expression Measure, to identify genes that are significantly over- or under-expressed in any given tissue. We examined intra- and inter-database consistency of Preferential Expression Measures. There was good correlation between replicate experiments of oligonucleotide microarray data, but there was less coherence in expression profiles as measured by Serial Analysis of Gene Expression and Expressed Sequence Tag counts. We investigated inter-database correlations for six tissue categories, for which data were present in the three databases. Significant positive correlations were found for brain, prostate and vascular endothelium but not for ovary, kidney, and pancreas.
We show that data from Gene Expression Atlas, SAGEmap and TissueInfo can be integrated using the UniGene gene index, and that expression profiles correlate relatively well when large numbers of tags are available or when tissue cellular composition is simple. Finally, in the case of brain, we demonstrate that when PEM values show good correlation, predictions of tissue-specific expression based on integrated data are very accurate.
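The abstract does not state how the Preferential Expression Measure is computed. As a rough illustration only, the sketch below assumes one plausible form: the log10 ratio of a gene's observed count in a tissue to the count expected if the gene were expressed uniformly across tissues. The function name, the input layout, and the formula itself are assumptions made for this example and are not taken from the abstract.

```python
import math

def preferential_expression(counts):
    """Score each (tissue, gene) pair by log10(observed / expected).

    counts: dict mapping tissue -> dict mapping gene -> tag count.
    The expected count assumes uniform expression across tissues
    (an assumption of this sketch, not a formula from the abstract).
    """
    # Total tags per tissue and overall.
    tissue_totals = {t: sum(genes.values()) for t, genes in counts.items()}
    total = sum(tissue_totals.values())

    # Total tags per gene across all tissues.
    gene_totals = {}
    for genes in counts.values():
        for gene, n in genes.items():
            gene_totals[gene] = gene_totals.get(gene, 0) + n

    scores = {}
    for tissue, genes in counts.items():
        scores[tissue] = {}
        for gene, observed in genes.items():
            expected = gene_totals[gene] * tissue_totals[tissue] / total
            # Skip zero counts, where the logarithm is undefined.
            if observed > 0 and expected > 0:
                scores[tissue][gene] = math.log10(observed / expected)
    return scores

# Hypothetical toy counts, for illustration only.
toy = {
    "brain": {"geneA": 90, "geneB": 10},
    "kidney": {"geneA": 10, "geneB": 90},
}
scores = preferential_expression(toy)
```

Under this assumed definition, positive scores mark genes over-represented in a tissue and negative scores mark under-represented ones, so profiles from different databases could be compared by correlating their scores for the same tissue.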
PMCID: PMC183867  PMID: 12885301
24.  Analysis of the conservation of synteny between Fugu and human chromosome 12 
BMC Genomics  2003;4:30.
The pufferfish Fugu rubripes (Fugu) with its compact genome is increasingly recognized as an important vertebrate model for comparative genomic studies. In particular, large regions of conserved synteny between human and Fugu genomes indicate its utility to identify disease-causing genes. The human chromosome 12p12 is frequently deleted in various hematological malignancies and solid tumors, but the actual tumor suppressor gene remains unidentified.
We investigated approximately 200 kb of the genomic region surrounding the ETV6 locus in Fugu (fETV6) in order to find conserved functional features, such as genes or regulatory regions, that could give insight into the nature of the genes targeted by deletions in human cancer cells. Seven genes were identified near the fETV6 locus. We found that the synteny with human chromosome 12 was conserved, but extensive genomic rearrangements occurred between the Fugu and human ETV6 loci.
This comparative analysis led to the identification of previously uncharacterized genes in the human genome and some potentially important regulatory sequences as well. This is a good indication that the analysis of the compact Fugu genome will be valuable to identify functional features that have been conserved throughout the evolution of vertebrates.
PMCID: PMC179898  PMID: 12877756
25.  Comparative genomic analysis reveals independent expansion of a lineage-specific gene family in vertebrates: The class II cytokine receptors and their ligands in mammals and fish 
BMC Genomics  2003;4:29.
The high degree of sequence conservation between coding regions in fish and mammals can be exploited to identify genes in mammalian genomes by comparison with the sequence of similar genes in fish. Conversely, experimentally characterized mammalian genes may be used to annotate fish genomes. However, gene families that escape this principle include the rapidly diverging cytokines that regulate the immune system, and their receptors. A classic example is the class II helical cytokines (HCII) including type I, type II and lambda interferons, IL10 related cytokines (IL10, IL19, IL20, IL22, IL24 and IL26) and their receptors (HCRII). Despite the report of a near complete pufferfish (Takifugu rubripes) genome sequence, these genes remain undescribed in fish.
We have used an original strategy based both on conserved amino acid sequence and gene structure to identify HCII and HCRII in the genome of another pufferfish, Tetraodon nigroviridis, which is amenable to laboratory experiments. The 15 genes that were identified are highly divergent and include a single interferon molecule, three IL10 related cytokines and their potential receptors together with two Tissue Factors (TF). Some of these genes form tandem clusters on the Tetraodon genome. Their expression pattern was determined in different tissues. Most importantly, Tetraodon interferon was identified and we show that the recombinant protein can induce antiviral MX gene expression in Tetraodon primary kidney cells. Similar results were obtained in zebrafish, which has 7 MX genes.
We propose a scheme for the evolution of HCII and their receptors during the radiation of bony vertebrates and suggest that the diversification that played an important role in the fine-tuning of the ancestral mechanism for host defense against infections probably followed different pathways in amniotes and fish.
PMCID: PMC179897  PMID: 12869211
