1.  Comparison of gene expression signatures of diamide, H2O2 and menadione exposed Aspergillus nidulans cultures – linking genome-wide transcriptional changes to cellular physiology 
BMC Genomics  2005;6:182.
In addition to their cytotoxic nature, reactive oxygen species (ROS) are also signal molecules in diverse cellular processes in eukaryotic organisms. Linking genome-wide transcriptional changes to cellular physiology in oxidative stress-exposed Aspergillus nidulans cultures provides the opportunity to estimate the sizes of peroxide (O₂²⁻), superoxide (O₂•⁻) and glutathione/glutathione disulphide (GSH/GSSG) redox imbalance responses.
Genome-wide transcriptional changes triggered by diamide, H2O2 and menadione in A. nidulans vegetative tissues were recorded using DNA microarrays containing 3533 unique PCR-amplified probes. Evaluation of LOESS-normalized data indicated that 2499 gene probes were affected by at least one stress-inducing agent. The stresses induced by diamide and H2O2 were pulse-like, with recovery after 1 h exposure time, while no recovery was observed with menadione. The distribution of stress-responsive gene probes among major physiological functional categories was approximately the same for each agent. The gene group sizes solely responsive to changes in intracellular O₂²⁻, O₂•⁻ concentrations or to GSH/GSSG redox imbalance were estimated at 7.7, 32.6 and 13.0%, respectively. Gene groups responsive to diamide, H2O2 and menadione treatments and gene groups influenced by GSH/GSSG, O₂²⁻ and O₂•⁻ were only partly overlapping, with distinct enrichment profiles within functional categories. Changes in the GSH/GSSG redox state influenced expression of genes coding for PBS2-like MAPK kinase homologue, PSK2 kinase homologue, AtfA transcription factor, and many elements of ubiquitin tagging, cell division cycle regulators, translation machinery proteins, defense and stress proteins, transport proteins as well as many enzymes of the primary and secondary metabolisms. Meanwhile, a separate set of genes encoding transport proteins, CpcA and JlbA amino acid starvation-responsive transcription factors, and some elements of sexual development and sporulation was ROS responsive.
The existence of separate O₂²⁻, O₂•⁻ and GSH/GSSG responsive gene groups in a eukaryotic genome has been demonstrated. Oxidant-triggered, genome-wide transcriptional changes should be analyzed considering changes in oxidative stress-responsive physiological conditions and not correlating them directly to the chemistry and concentrations of the oxidative stress-inducing agent.
PMCID: PMC1352360  PMID: 16368011
2.  Sal-Site: Integrating new and existing ambystomatid salamander research and informational resources 
BMC Genomics  2005;6:181.
Salamanders of the genus Ambystoma are a unique model organism system because they enable natural history and biomedical research in the laboratory or field. We developed Sal-Site to integrate new and existing ambystomatid salamander research resources in support of this model system. Sal-Site hosts six important resources: 1) Salamander Genome Project: an information-based web-site describing progress in genome resource development, 2) Ambystoma EST Database: a database of manually edited and analyzed contigs assembled from ESTs that were collected from A. tigrinum tigrinum and A. mexicanum, 3) Ambystoma Gene Collection: a database containing full-length protein-coding sequences, 4) Ambystoma Map and Marker Collection: an image and database resource that shows the location of mapped markers on linkage groups, provides information about markers, and provides integrating links to Ambystoma EST Database and Ambystoma Gene Collection databases, 5) Ambystoma Genetic Stock Center: a website and collection of databases that describe an NSF funded salamander rearing facility that generates and distributes biological materials to researchers and educators throughout the world, and 6) Ambystoma Research Coordination Network: a web-site detailing current research projects and activities involving an international group of researchers. Sal-Site is accessible at .
PMCID: PMC1351182  PMID: 16359543
3.  A method for accurate detection of genomic microdeletions using real-time quantitative PCR 
BMC Genomics  2005;6:180.
Quantitative Polymerase Chain Reaction (qPCR) is a well-established method for quantifying levels of gene expression, but has not been routinely applied to the detection of constitutional copy number alterations of human genomic DNA. Microdeletions or microduplications of the human genome are associated with a variety of genetic disorders. Although clinical laboratories routinely use fluorescence in situ hybridization (FISH) to identify such cryptic genomic alterations, there remain a significant number of individuals in whom constitutional genomic imbalance is suspected, based on clinical parameters, but cannot be readily detected using current cytogenetic techniques.
In this study, a novel application for real-time qPCR is presented that can be used to reproducibly detect chromosomal microdeletions and microduplications. This approach was applied to DNA from a series of patient samples and controls to validate genomic copy number alteration at cytoband 22q11. The study group comprised 12 patients with clinical symptoms of chromosome 22q11 deletion syndrome (22q11DS), 1 patient trisomic for 22q11 and 4 normal controls. Six of the patients (group 1) had known hemizygous deletions, as detected by standard diagnostic FISH, whilst the remaining 6 patients (group 2) were classified as 22q11DS negative using the clinical FISH assay. Screening of the patients and controls with a set of 10 real-time qPCR primers, spanning the 22q11.2-deleted region and flanking sequence, confirmed the FISH assay results for all patients with 100% concordance. Moreover, this qPCR enabled a refinement of the region of deletion at 22q11. Analysis of DNA from the chromosome 22 trisomic sample demonstrated genomic duplication within 22q11.
In this paper we present a qPCR approach for the detection of chromosomal microdeletions and microduplications. The strategic use of in silico modelling for qPCR primer design avoids regions of repetitive DNA, whilst providing a level of genomic resolution greater than standard cytogenetic assays. The implementation of qPCR detection in clinical laboratories will address the need to replace complex, expensive and time-consuming FISH screening to detect genomic microdeletions or duplications of clinical importance.
PMCID: PMC1327677  PMID: 16351727
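The copy-number reasoning in entry 3 can be illustrated with a minimal sketch. It assumes the widely used comparative Ct (2^-ΔΔCt) calculation with ideal amplification efficiency; the abstract does not state which quantification model the authors used. The function names and the classification cut-offs (1.5 and 2.5 copies) are hypothetical and chosen only for illustration.

```python
def relative_copy_number(ct_target_sample, ct_ref_sample,
                         ct_target_control, ct_ref_control):
    """Estimate the copy number of a target locus in a diploid genome.

    Assumes the comparative Ct method with 100% amplification efficiency
    (one cycle difference = a two-fold difference in template).
    """
    # Normalise the target locus to a reference locus within each DNA sample.
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    # Compare the test sample with a normal (two-copy) control.
    delta_delta_ct = delta_sample - delta_control
    # The control carries 2 copies, so scale the fold change by 2.
    return 2.0 * 2.0 ** (-delta_delta_ct)


def classify(copies, low=1.5, high=2.5):
    """Label a copy-number estimate; the cut-offs are illustrative assumptions."""
    if copies < low:
        return "deletion"
    if copies > high:
        return "duplication"
    return "normal"


# Hypothetical Ct values: the target amplifies one cycle later than expected,
# which corresponds to half the template, i.e. a hemizygous deletion.
estimate = relative_copy_number(26.0, 24.0, 25.0, 24.0)
print(estimate, classify(estimate))
```

In practice each primer pair would need its own efficiency check, and replicate reactions would be averaged before classification.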
4.  GPX-Macrophage Expression Atlas: A database for expression profiles of macrophages challenged with a variety of pro-inflammatory, anti-inflammatory, benign and pathogen insults 
BMC Genomics  2005;6:178.
Macrophages play an integral role in the host immune system, bridging innate and adaptive immunity. As such, they are finely attuned to extracellular and intracellular stimuli and respond by rapidly initiating multiple signalling cascades with diverse effector functions. The macrophage cell is therefore an experimentally and clinically amenable biological system for the mapping of biological pathways. The goal of the macrophage expression atlas is to systematically investigate the pathway biology and interaction network of macrophages challenged with a variety of insults, in particular via infection and activation with key inflammatory mediators. As an important first step towards this we present a single searchable database resource containing high-throughput macrophage gene expression studies.
The GPX Macrophage Expression Atlas (GPX-MEA) is an online resource for gene expression based studies of a range of macrophage cell types following treatment with pathogens and immune modulators. GPX-MEA follows the MIAME standard and includes an objective quality score with each experiment. It places special emphasis on rigorously capturing the experimental design and enables the searching of expression data from different microarray experiments. Studies may be queried on the basis of experimental parameters, sample information and quality assessment score. The ability to compare the expression values of individual genes across multiple experiments is provided. In addition, the database offers access to experimental annotation and analysis files and includes experiments and raw data previously unavailable to the research community.
GPX-MEA is the first example of a quality scored gene expression database focussed on a macrophage cellular system that allows efficient identification of transcriptional patterns. The resource will provide novel insights into the phenotypic response of macrophages to a variety of benign, inflammatory, and pathogen insults. GPX-MEA is available through the GPX website at .
PMCID: PMC1351201  PMID: 16343346
5.  Comparative genome analysis reveals a conserved family of actin-like proteins in apicomplexan parasites 
BMC Genomics  2005;6:179.
The phylum Apicomplexa is an early-branching eukaryotic lineage that contains a number of important human and animal pathogens. Their complex life cycles and unique cytoskeletal features distinguish them from other model eukaryotes. Apicomplexans rely on actin-based motility for cell invasion, yet the regulation of this system remains largely unknown. Consequently, we focused our efforts on identifying actin-related proteins in the recently completed genomes of Toxoplasma gondii, Plasmodium spp., Cryptosporidium spp., and Theileria spp.
Comparative genomic and phylogenetic studies of apicomplexan genomes reveal that most contain only a single conventional actin and yet they each have 8–10 additional actin-related proteins. Among these are a highly conserved Arp1 protein (likely part of a conserved dynactin complex), and Arp4 and Arp6 homologues (subunits of the chromatin-remodeling machinery). In contrast, apicomplexans lack canonical Arp2 or Arp3 proteins, suggesting they lost the Arp2/3 actin polymerization complex on their evolutionary path towards intracellular parasitism. Seven of these actin-like proteins (ALPs) are novel to apicomplexans. They show no phylogenetic associations to the known Arp groups and likely serve functions specific to this important group of intracellular parasites.
The large diversity of actin-like proteins in apicomplexans suggests that the actin protein family has diverged to fulfill various roles in the unique biology of intracellular parasites. Conserved Arps likely participate in vesicular transport and gene expression, while apicomplexan-specific ALPs may control unique biological traits such as actin-based gliding motility.
PMCID: PMC1334187  PMID: 16343347
6.  Comparative analysis of programmed cell death pathways in filamentous fungi 
BMC Genomics  2005;6:177.
Fungi can undergo autophagic- or apoptotic-type programmed cell death (PCD) on exposure to antifungal agents, developmental signals, and stress factors. Filamentous fungi can also exhibit a form of cell death called heterokaryon incompatibility (HI) triggered by fusion between two genetically incompatible individuals. With the availability of recently sequenced genomes of Aspergillus fumigatus and several related species, we were able to define putative components of fungi-specific death pathways and the ancestral core apoptotic machinery shared by all fungi and metazoa.
Phylogenetic profiling of HI-associated proteins from four Aspergilli and seven other fungal species revealed lineage-specific protein families, orphan genes, and core genes conserved across all fungi and metazoa. The Aspergilli-specific domain architectures include NACHT family NTPases, which may function as key integrators of stress and nutrient availability signals. They are often found fused to putative effector domains such as Pfs, SesB/LipA, and a newly identified domain, HET-s/LopB. Many putative HI inducers and mediators are specific to filamentous fungi and not found in unicellular yeasts. In addition to their role in HI, several of them appear to be involved in regulation of cell cycle, development and sexual differentiation. Finally, the Aspergilli possess many putative downstream components of the mammalian apoptotic machinery including several proteins not found in the model yeast, Saccharomyces cerevisiae.
Our analysis identified more than 100 putative PCD associated genes in the Aspergilli, which may help expand the range of currently available treatments for aspergillosis and other invasive fungal diseases. The list includes species-specific protein families as well as conserved core components of the ancestral PCD machinery shared by fungi and metazoa.
PMCID: PMC1325252  PMID: 16336669
7.  Sequence comparisons of plasmids pBJS-O of Spiroplasma citri and pSKU146 of S. kunkelii: implications for plasmid evolution 
BMC Genomics  2005;6:175.
Spiroplasma citri BR3-3X and S. kunkelii CR2-3X cause serious diseases worldwide on citrus and maize species, respectively. S. citri BR3-3X harbors a plasmid, pBJS-Original (pBJS-O), that encodes the spiroplasma adhesion related protein 1 (SARP1), a protein implicated in binding of the pathogen to cells of its leafhopper vector, Circulifer tenellus. The S. kunkelii CR2-3X plasmid, pSKU146, encodes a homolog of SARP1, Sk-ARP1. Due to the close phylogenetic relationship of the two pathogens, we hypothesized that the two plasmids are closely related as well.
The nucleotide sequence of pBJS-O was determined and compared to the sequences of a plasmid from BR3-T (pBJS-T), which is a multiply passaged leafhopper transmissible derivative of BR3-3X, and to known plasmid sequences including that of pSKU146. In addition to arp1, the 13,374 bp pBJS-O sequence putatively contains nine genes, recognized as open reading frames (ORFs). Several pBJS-O ORFs have homologs on pSKU146. However, the sequences flanking soj-like genes on both plasmids were found to be more distant from one another than sequences in any other region. Further, unlike pSKU146, pBJS-O lacks the conserved oriT region characteristic of the IncP group of bacterial plasmids. We were unable to identify a region in pBJS-O resembling a known plasmid origin of transfer. In regions where sequence was available for the plasmid from both BR3-3X and BR3-T, the pBJS-T sequence had a 0.4 kb deletion relative to its progenitor, pBJS-O. Southern blot hybridization of extrachromosomal DNA from various S. citri strains and spiroplasma species to an arp-specific probe and a probe made from the entire plasmid DNA of BR3-3X revealed limited conservation of both sequences in the genus Spiroplasma. Finally, we also report the presence on the BR3-3X chromosome of arp2, an S. citri homolog of arp1 that encodes the predicted protein SARP2. The C-terminal domain of SARP2 is homologous to that of SARP1, but its N-terminal domain is distinct.
Our data suggest that pBJS is a novel S. citri plasmid that does not belong to any known plasmid incompatibility group. The differences between pBJS-O and pSKU146 suggest that one or more events of recombination have contributed to the divergence of the plasmids of the two sister Spiroplasma species; the plasmid from S. citri itself has diverged slightly during the derivation of S. citri BR3-T from BR3-3X. Our data also show that pBJS-O encodes the putative adhesin SARP1. The presence of traE and mob on pBJS-O suggests a role for the plasmid in spiroplasmal conjugation.
PMCID: PMC1318496  PMID: 16336638
8.  ASAP: Amplification, sequencing & annotation of plastomes 
BMC Genomics  2005;6:176.
Availability of DNA sequence information is vital for pursuing structural, functional and comparative genomics studies in plastids. Traditionally, the first step in mining the valuable information within a chloroplast genome requires sequencing a chloroplast plasmid library or BAC clones. These activities involve complicated preparatory procedures like chloroplast DNA isolation or identification of the appropriate BAC clones to be sequenced. Rolling circle amplification (RCA) is being used currently to amplify the chloroplast genome from purified chloroplast DNA and the resulting products are sheared and cloned prior to sequencing. Herein we present a universal high-throughput, rapid PCR-based technique to amplify, sequence and assemble plastid genome sequence from diverse species in a short time and at reasonable cost from total plant DNA, using the large inverted repeat region from strawberry and peach as proof of concept. The method exploits the highly conserved coding regions or intergenic regions of plastid genes. Using an informatics approach, chloroplast DNA sequence information from 5 available eudicot plastomes was aligned to identify the most conserved regions. Cognate primer pairs were then designed to generate ~1 – 1.2 kb overlapping amplicons from the inverted repeat region in 14 diverse genera.
100% coverage of the inverted repeat region was obtained from Arabidopsis, tobacco, orange, strawberry, peach, lettuce, tomato and Amaranthus. Over 80% coverage was obtained from distant species, including Ginkgo, loblolly pine and Equisetum. Sequence from the inverted repeat region of strawberry and peach plastome was obtained, annotated and analyzed. Additionally, a polymorphic region identified from gel electrophoresis was sequenced from tomato and Amaranthus. Sequence analysis revealed large deletions in these species relative to tobacco plastome thus exhibiting the utility of this method for structural and comparative genomics studies.
This simple, inexpensive method now allows immediate access to plastid sequence, increasing experimental throughput and serving generally as a universal platform for plastid genome characterization. The method applies well to whole genome studies and speeds assessment of variability across species, making it a useful tool in plastid structural genomics.
PMCID: PMC1318494  PMID: 16336644
9.  Bacterial genome adaptation to niches: Divergence of the potential virulence genes in three Burkholderia species of different survival strategies 
BMC Genomics  2005;6:174.
Two closely related species Burkholderia mallei (Bm) and Burkholderia pseudomallei (Bp) are serious human health hazards and are potential bio-warfare agents, whereas another closely related species Burkholderia thailandensis (Bt) is a non-pathogenic saprophyte. To investigate the genomic factors resulting in such a dramatic difference, we first identified the Bm genes responsive to the mouse environment, and then examined the divergence of these genes in Bp and Bt.
The genes down-expressed, which largely encode cell growth-related proteins, are conserved well in all three species, whereas those up-expressed, which include potential virulence genes, are less well conserved or absent, notably in Bt. However, a substantial number of up-expressed genes are still conserved in Bt. Bm and Bp further diverged from each other in a small number of genes resulting from unit number changes in simple sequence repeats (ssr) in the homologs.
Our data suggest that divergent evolution of a small set of genes, rather than acquisition or loss of pathogenic islands, is associated with the development of different life styles in these bacteria of similar genomic contents. Further divergence between Bm and Bp mediated by ssr changes may reflect different adaptive processes of Bm and Bp fine-tuning into their host environments.
PMCID: PMC1343551  PMID: 16336651
10.  The odorant receptor repertoire of teleost fish 
BMC Genomics  2005;6:173.
Vertebrate odorant receptors comprise three types of G protein-coupled receptors: the OR, V1R and V2R receptors. The OR superfamily contains over 1,000 genes in some mammalian species, representing the largest gene superfamily in the mammalian genome.
To facilitate an informed analysis of OR gene phylogeny, we identified the complete set of 143 OR genes in the zebrafish genome, as well as the OR repertoires in two pufferfish species, fugu (44 genes) and tetraodon (42 genes). Although the genomes analyzed here contain fewer genes than in mammalian species, the teleost OR genes can be grouped into a larger number of major clades, representing greater overall OR diversity in the fish.
Based on the phylogeny of fish and mammalian repertoires, we propose a model for OR gene evolution in which different ancestral OR genes or gene families were selectively lost or expanded in different vertebrate lineages. In addition, our calculations of the ratios of non-synonymous to synonymous codon substitutions among more recently expanding OR subgroups in zebrafish implicate residues that may be involved in odorant binding.
PMCID: PMC1325023  PMID: 16332259
11.  The angiotensin-converting enzyme (ACE) gene family of Anopheles gambiae 
BMC Genomics  2005;6:172.
Members of the M2 family of peptidases, related to mammalian angiotensin converting enzyme (ACE), play important roles in regulating a number of physiological processes. As more invertebrate genomes are sequenced, there is increasing evidence of a variety of M2 peptidase genes, even within a single species. The function of these ACE-like proteins is largely unknown. Sequencing of the A. gambiae genome has revealed a number of ACE-like genes but probable errors in the Ensembl annotation have left the number of ACE-like genes, and their structure, unclear.
TBLASTN and sequence analysis of cDNAs revealed that the A. gambiae genome contains nine genes (AnoACE genes) which code for proteins with similarity to mammalian ACE. Eight of these genes code for putative single domain enzymes similar to other insect ACEs described so far. AnoACE9, however, has several features in common with mammalian somatic ACE such as a two domain structure and a hydrophobic C terminus. Four of the AnoACE genes (2, 3, 7 and 9) were shown to be expressed at a variety of developmental stages. Expression of AnoACE3, AnoACE7 and AnoACE9 is induced by a blood meal, with AnoACE7 showing the largest (approximately 10-fold) induction.
Genes coding for two-domain ACEs have arisen several times during the course of evolution suggesting a common selective advantage to having an ACE with two active-sites in tandem in a single protein. AnoACE7 belongs to a sub-group of insect ACEs which are likely to be membrane-bound and which have an unusual, conserved gene structure.
PMCID: PMC1325048  PMID: 16329762
12.  An EST-based approach for identifying genes expressed in the intestine and gills of pre-smolt Atlantic salmon (Salmo salar) 
BMC Genomics  2005;6:171.
The Atlantic salmon is an important aquaculture species and a very interesting species biologically, since it spawns in fresh water and develops through several stages before becoming a smolt, the stage at which it migrates to the sea to feed. The dramatic change of habitat requires physiological, morphological and behavioural changes to prepare the salmon for its new environment. These changes are called the parr-smolt transformation or smoltification, and pre-adapt the salmon for survival and growth in the marine environment. The development of hypo-osmotic regulatory ability plays an important part in facilitating the transition from rivers to the sea. The physiological mechanisms behind the developmental changes are largely unknown. An understanding of the transformation process will be vital to the future of the aquaculture industry. A knowledge of which genes are expressed prior to the smoltification process is an important basis for further studies.
In all, 2974 unique sequences, consisting of 779 contigs and 2195 singlets, were generated for Atlantic salmon from two cDNA libraries constructed from the gills and the intestine, accession numbers [Genbank: CK877169-CK879929, CK884015-CK886537 and CN181112-CN181464]. Nearly 50% of the sequences were assigned putative functions because they showed similarity to known genes, mostly from other species, in one or more of the databases used. The Swiss-Prot database returned significant hits for 1005 sequences. These could be assigned predicted gene products, and 967 were annotated using Gene Ontology (GO) terms for molecular function, biological process and/or cellular component, employing an annotation transfer procedure.
This paper describes the construction of two cDNA libraries from pre-smolt Atlantic salmon (Salmo salar) and the subsequent EST sequencing, clustering and assigning of putative function to 1005 genes expressed in the gills and/or intestine.
PMCID: PMC1318472  PMID: 16321156
13.  Efficient single nucleotide polymorphism discovery in laboratory rat strains using wild rat-derived SNP candidates 
BMC Genomics  2005;6:170.
The laboratory rat (Rattus norvegicus) is an important model for studying many aspects of human health and disease. Detailed knowledge of genetic variation between strains is important from a biomedical, particularly pharmacogenetic, point of view and is useful for marker selection for genetic cloning and association studies.
We show that Single Nucleotide Polymorphisms (SNPs) in commonly used rat strains are surprisingly well represented in wild rat isolates. Shotgun sequencing of 814 Kbp in one wild rat resulted in the identification of 485 SNPs as compared with the Brown Norway genome sequence. Genotyping 36 commonly used inbred rat strains showed that 84% of these alleles are also polymorphic in a representative set of laboratory rat strains.
We postulate that shotgun sequencing in a wild rat sample and subsequent genotyping in multiple laboratory or domesticated strains, rather than direct shotgun sequencing of multiple strains, could be the most efficient SNP discovery approach. For the rat, laboratory strains still harbor a large portion of the haplotypes present in wild isolates, suggesting a relatively recent common origin and supporting the idea that rat inbred strains, in contrast to mouse inbred strains, originate from a single species, R. norvegicus.
PMCID: PMC1318490  PMID: 16316463
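The figures quoted in entry 13 imply a SNP density and an expected yield of markers that are easy to check. The short calculation below is only a back-of-the-envelope sketch: it assumes "814 Kbp" means 814,000 bp and that the 84% figure applies to all 485 candidate SNPs. The variable names are illustrative.

```python
# Figures taken from the abstract of entry 13.
SNPS_FOUND = 485            # SNPs identified in the wild rat versus Brown Norway
SEQUENCED_BP = 814_000      # "814 Kbp" of shotgun sequence (assumed 1 Kbp = 1000 bp)
FRACTION_POLYMORPHIC = 0.84  # share also polymorphic among laboratory strains

# Density of candidate SNPs in the sequenced wild rat DNA.
snps_per_kb = SNPS_FOUND / (SEQUENCED_BP / 1000)
bp_per_snp = SEQUENCED_BP / SNPS_FOUND

# Approximate number of candidates expected to be usable in laboratory strains.
polymorphic_in_lab = round(SNPS_FOUND * FRACTION_POLYMORPHIC)

print(f"{snps_per_kb:.2f} SNPs per kb, one SNP every {bp_per_snp:.0f} bp")
print(f"about {polymorphic_in_lab} SNPs polymorphic in laboratory strains")
```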
14.  The EH1 motif in metazoan transcription factors 
BMC Genomics  2005;6:169.
The Engrailed Homology 1 (EH1) motif is a small region, believed to have evolved convergently in homeobox and forkhead containing proteins, that interacts with the Drosophila protein groucho (C. elegans unc-37, Human Transducin-like Enhancers of Split). The small size of the motif makes its reliable identification by computational means difficult. I have systematically searched the predicted proteomes of Drosophila, C. elegans and human for further instances of the motif.
Using motif identification methods and database searching techniques, I delimit which homeobox and forkhead domain containing proteins also have likely EH1 motifs. I show that despite low database search scores, there is a significant association of the motif with transcription factor function. I further show that likely EH1 motifs are found in combination with T-Box, Zinc Finger and Doublesex domains as well as discussing other plausible candidate associations. I identify strong candidate EH1 motifs in basal metazoan phyla.
Candidate EH1 motifs exist in combination with a variety of transcription factor domains, suggesting that these proteins have repressor functions. The distribution of the EH1 motif is suggestive of convergent evolution, although in many cases, the motif has been conserved throughout bilaterian orthologs. Groucho mediated repression was established prior to the evolution of bilateria.
PMCID: PMC1310626  PMID: 16309560
15.  Genes involved in complex adaptive processes tend to have highly conserved upstream regions in mammalian genomes 
BMC Genomics  2005;6:168.
Recent advances in genome sequencing suggest a remarkable conservation in gene content of mammalian organisms. The similarity in gene repertoire present in different organisms has increased interest in studying regulatory mechanisms of gene expression aimed at elucidating the differences in phenotypes. In particular, a proximal promoter region contains a large number of regulatory elements that control the expression of its downstream gene. Although many studies have focused on identification of these elements, a broader picture on the complexity of transcriptional regulation of different biological processes has not been addressed in mammals. The regulatory complexity may strongly correlate with gene function, as different evolutionary forces must act on the regulatory systems under different biological conditions. We investigate this hypothesis by comparing the conservation of promoters upstream of genes classified in different functional categories.
By conducting a rank correlation analysis between functional annotation and upstream sequence alignment scores obtained by human-mouse and human-dog comparison, we found a significantly greater conservation of the upstream sequence of genes involved in development, cell communication, neural functions and signaling processes than those involved in more basic processes shared with unicellular organisms such as metabolism and ribosomal function. This observation persists after controlling for G+C content. Considering conservation as a functional signature, we hypothesize a higher density of cis-regulatory elements upstream of genes participating in complex and adaptive processes.
We identified a class of functions that are associated with either high or low promoter conservation in mammals. We detected a significant tendency for complex and adaptive processes to be associated with higher promoter conservation, despite the fact that they have emerged relatively recently during evolution. We described and contrasted several hypotheses that provide a deeper insight into how transcriptional complexity might have emerged during evolution.
PMCID: PMC1310621  PMID: 16309559
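The rank correlation analysis described in entry 15 can be sketched as follows. This assumes Spearman's coefficient computed on a binary category-membership indicator against upstream alignment scores; the abstract does not say which rank statistic or encoding the authors used. The function names and the example data are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: 1 = gene annotated to a functional category, 0 = not,
# paired with made-up upstream alignment scores for the same genes.
in_category = [1, 1, 1, 0, 0, 0]
alignment_score = [0.91, 0.84, 0.77, 0.52, 0.60, 0.41]
print(spearman(in_category, alignment_score))
```

A positive coefficient here would indicate that genes in the category tend to have better-conserved upstream sequence; a real analysis would also need a significance test and correction for testing many categories.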
16.  Characterization of 954 bovine full-CDS cDNA sequences 
BMC Genomics  2005;6:166.
Genome assemblies rely on the existence of transcript sequence to stitch together contigs, verify assembly of whole genome shotgun reads, and annotate genes. Functional genomics studies also rely on transcript sequence to create expression microarrays or interpret digital tag data produced by methods such as Serial Analysis of Gene Expression (SAGE). Transcript sequence can be predicted based on reconstruction from overlapping expressed sequence tags (EST) that are obtained by single-pass sequencing of random cDNA clones, but these reconstructions are prone to errors caused by alternative splice forms, transcripts from gene families with related sequences, and expressed pseudogenes. These errors confound genome assembly and annotation. The most useful transcript sequences are derived by complete insert sequencing of clones containing the entire length, or at least the full protein coding sequence (CDS) portion, of the source mRNA. While the bovine genome sequencing initiative is nearing completion, there is currently a paucity of bovine full-CDS mRNA and protein sequence data to support bovine genome assembly and functional genomics studies. Consequently, the production of high-quality bovine full-CDS cDNA sequences will enhance the bovine genome assembly and functional studies of bovine genes and gene products. The goal of this investigation was to identify and characterize the full-CDS sequences of bovine transcripts from clones identified in non-full-length enriched cDNA libraries. In contrast to several recent full-length cDNA investigations, these full-CDS cDNAs were selected, sequenced, and annotated without the benefit of the target organism's genomic sequence, by using comparison of bovine EST sequence to existing human mRNA to identify likely full-CDS clones for full-length insert cDNA (FLIC) sequencing.
The predicted bovine protein lengths, 5' UTR lengths, and Kozak consensus sequences from 954 bovine FLIC sequences (bFLICs; average length 1713 nt, representing 762 distinct loci) are all consistent with previously sequenced mammalian full-length transcripts.
In most cases, the bFLICs span the entire CDS of the genes, providing the basis for creating predicted bovine protein sequences to support proteomics and comparative evolutionary research as well as functional genomics and genome annotation. The results demonstrate the utility of the comparative approach in obtaining predicted protein sequences in other species.
PMCID: PMC1314900  PMID: 16305752
17.  Analysis of vertebrate genomes suggests a new model for clade B serpin evolution 
BMC Genomics  2005;6:167.
The human genome contains 13 clade B serpin genes at two loci, 6p25 and 18q21. The three genes at 6p25 all conform to a 7-exon gene structure with conserved intron positioning and phasing; however, at 18q21 there are two 7-exon genes and eight genes with an additional exon, yielding an 8-exon structure. Currently, it is not known how these two loci evolved, nor which gene structure arose first – did the 8-exon genes gain an exon, or did the 7-exon genes lose one? Here we use the genomes of diverse vertebrate species to plot the emergence of clade B serpin genes and to identify the point at which the two genomic structures arose.
Analysis of the chicken genome indicated the presence of a single clade B serpin gene locus, containing orthologues of both human loci and both genomic structures. The frog genome and the genomes of three fish species presented progressively simpler loci, although only the 7-exon structure could be identified. The Serpinb12 gene contains seven exons in the frog genome, but eight exons in chickens and humans, indicating that the additional exon evolved in this gene.
We propose a new model for clade B serpin evolution from a single 7-exon gene (either Serpinb1 or Serpinb6). An additional exon was gained in the Serpinb12 gene between the tetrapoda and amniota radiations to produce the 8-exon structure. Both structures were then duplicated at a single locus until a chromosomal breakage occurred at some point along the mammalian lineage resulting in the two modern loci.
PMCID: PMC1308813  PMID: 16305753
18.  Expression and genomic organization of zonadhesin-like genes in three species of fish give insight into the evolutionary history of a mosaic protein 
BMC Genomics  2005;6:165.
The mosaic sperm protein zonadhesin (ZAN) has been characterized in mammals and is implicated in species-specific egg-sperm binding interactions. The genomic structure and testes-specific expression of zonadhesin is known for many mammalian species. All zonadhesin genes characterized to date consist of meprin A5 antigen receptor tyrosine phosphatase mu (MAM) domains, mucin tandem repeats, and von Willebrand (VWD) adhesion domains. Here we investigate the genomic structure and expression of zonadhesin-like genes in three species of fish.
The cDNA and corresponding genomic locus of a zonadhesin-like gene (zlg) in Atlantic salmon (Salmo salar) were sequenced. Zlg is similar in adhesion domain content to mammalian zonadhesin; however, the domain order is altered. Analysis of puffer fish (Takifugu rubripes) and zebrafish (Danio rerio) sequence data identified zonadhesin (zan) genes that share the same domain order, content, and a conserved syntenic relationship with mammalian zonadhesin. A zonadhesin-like gene in D. rerio was also identified. Unlike mammalian zonadhesin, D. rerio zan and S. salar zlg were expressed in the gut and not in the testes.
We characterized likely orthologs of zonadhesin in both T. rubripes and D. rerio and uncovered zonadhesin-like genes in S. salar and D. rerio. Each of these genes contains MAM, mucin, and VWD domains. While these domains are associated with several proteins that show prominent gut expression, their combination is unique to zonadhesin and zonadhesin-like genes in vertebrates. The expression patterns of fish zonadhesin and zonadhesin-like genes suggest that the reproductive role of zonadhesin evolved later in the mammalian lineage.
PMCID: PMC1325057  PMID: 16303057
19.  Efficient gene-driven germ-line point mutagenesis of C57BL/6J mice 
BMC Genomics  2005;6:164.
Analysis of an allelic series of point mutations in a gene, generated by N-ethyl-N-nitrosourea (ENU) mutagenesis, is a valuable method for discovering the full scope of its biological function. Here we present an efficient gene-driven approach for identifying ENU-induced point mutations in any gene in C57BL/6J mice. The advantage of such an approach is that it allows one to select any gene of interest in the mouse genome and to go directly from DNA sequence to mutant mice.
We produced the Cryopreserved Mutant Mouse Bank (CMMB), which is an archive of DNA, cDNA, tissues, and sperm from 4,000 G1 male offspring of ENU-treated C57BL/6J males mated to untreated C57BL/6J females. Each mouse in the CMMB carries a large number of random heterozygous point mutations throughout the genome. High-throughput Temperature Gradient Capillary Electrophoresis (TGCE) was employed to perform a 32-Mbp sequence-driven screen for mutations in 38 PCR amplicons from 11 genes in DNA and/or cDNA from the CMMB mice. DNA sequence analysis of heteroduplex-forming amplicons identified by TGCE revealed 22 mutations in 10 genes for an overall mutation frequency of 1 in 1.45 Mbp. All 22 mutations are single base pair substitutions, and nine of them (41%) result in nonconservative amino acid substitutions. Intracytoplasmic sperm injection (ICSI) of cryopreserved spermatozoa into B6D2F1 or C57BL/6J ova was used to recover mutant mice for nine of the mutations to date.
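The mutation frequency quoted above follows directly from the screen size and the number of confirmed mutations. The short sketch below reproduces that arithmetic from the figures in the abstract; the variable and function names are illustrative and are not taken from the paper.

```python
# Figures quoted in the abstract above.
screened_mbp = 32.0      # total sequence screened, in Mbp
mutations_found = 22     # confirmed ENU-induced point mutations
nonconservative = 9      # mutations causing nonconservative substitutions


def mbp_per_mutation(screened: float, mutations: int) -> float:
    """Return the average amount of screened sequence (Mbp) per mutation."""
    if mutations <= 0:
        raise ValueError("mutations must be positive")
    return screened / mutations


frequency = mbp_per_mutation(screened_mbp, mutations_found)
nonconservative_pct = 100.0 * nonconservative / mutations_found

print(f"1 mutation per {frequency:.2f} Mbp")
print(f"{nonconservative_pct:.0f}% nonconservative")
```

Both values round to the figures reported in the abstract (1 in 1.45 Mbp and 41%).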
The inbred C57BL/6J CMMB, together with TGCE mutation screening and ICSI for the recovery of mutant mice, represents a valuable gene-driven approach for the functional annotation of the mammalian genome and for the generation of mouse models of human genetic diseases. The ability of ENU to induce mutations that cause various types of changes in proteins will provide additional insights into the functions of mammalian proteins that may not be detectable by knockout mutations.
PMCID: PMC1325271  PMID: 16300676
20.  An acquisition account of genomic islands based on genome signature comparisons 
BMC Genomics  2005;6:163.
Recent analyses of prokaryotic genome sequences have demonstrated the important force horizontal gene transfer constitutes in genome evolution. Horizontally acquired sequences are detectable by, among others, their dinucleotide composition (genome signature) dissimilarity with the host genome. Genomic islands (GIs) comprise important and interesting horizontally transferred sequences, but information about acquisition events or relatedness between GIs is scarce. In Vibrio vulnificus CMCP6, 10 and 11 GIs have previously been identified in the sequenced chromosomes I and II, respectively. We assessed the compositional similarity and putative acquisition account of these GIs using the genome signature. For this analysis we developed a new algorithm, available as a web application.
Of 21 GIs, VvI-1 and VvI-10 of chromosome I have similar genome signatures, and while artificially divided due to a linear annotation, they are adjacent on the circular chromosome and therefore comprise one GI. Similarly, GIs VvI-3 and VvI-4 of chromosome I together with the region between these two islands are compositionally similar, suggesting that they form one GI (making a total of 19 GIs in chromosome I + chromosome II). Cluster analysis assigned the 19 GIs to 11 different branches above our conservative threshold. This suggests a limited number of compositionally similar donors or intragenomic dispersion of ancestral acquisitions. Furthermore, 2 GIs of chromosome II cluster with chromosome I, while none of the 19 GIs group with chromosome II, suggesting a unidirectional dispersal of large anomalous gene clusters from chromosome I to chromosome II.
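The abstract does not describe the authors' algorithm, so the following is only a minimal sketch of the general idea of comparing genome signatures. It assumes the commonly used definition of dinucleotide relative abundance, rho_xy = f_xy / (f_x * f_y), computed over both strands, and takes the dissimilarity to be the mean absolute difference over the 16 dinucleotides. The function names and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def genome_signature(seq: str) -> dict:
    """Dinucleotide relative abundances rho_xy = f_xy / (f_x * f_y).

    Counts are pooled over the sequence and its reverse complement.
    Positions containing characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    strands = (seq, seq.translate(_COMPLEMENT)[::-1])
    mono, di = Counter(), Counter()
    for strand in strands:
        mono.update(b for b in strand if b in BASES)
        di.update(
            strand[i:i + 2]
            for i in range(len(strand) - 1)
            if strand[i] in BASES and strand[i + 1] in BASES
        )
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_mono == 0 or n_di == 0:
        raise ValueError("sequence too short or contains no A/C/G/T")
    signature = {}
    for x, y in product(BASES, repeat=2):
        fx, fy = mono[x] / n_mono, mono[y] / n_mono
        fxy = di[x + y] / n_di
        # A base that never occurs gives an undefined ratio; report 0.0.
        signature[x + y] = fxy / (fx * fy) if fx and fy else 0.0
    return signature


def signature_distance(seq_a: str, seq_b: str) -> float:
    """Mean absolute difference between two genome signatures."""
    sig_a, sig_b = genome_signature(seq_a), genome_signature(seq_b)
    return sum(abs(sig_a[k] - sig_b[k]) for k in sig_a) / 16


# Toy sequences standing in for a genomic island and its host chromosome.
island = "AATT" * 50
host = "ACGT" * 50
print(signature_distance(island, host))
```

In practice a candidate island would be compared with the host chromosome and with other islands, and islands with small mutual distances would be grouped as compositionally similar.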
From the results, we infer 10 compositionally dissimilar donors for 19 GIs in the V. vulnificus CMCP6 genome, including chromosome I donating to chromosome II. This suggests multiple transfer events from individual donor types or from donors with similar genome signatures. Applied to other prokaryotes, this approach may elucidate the acquisition account in their genome sequences, and facilitate donor identification of GIs.
PMCID: PMC1310630  PMID: 16297239
21.  Assessing the precision of high-throughput computational and laboratory approaches for the genome-wide identification of protein subcellular localization in bacteria 
BMC Genomics  2005;6:162.
Identification of a bacterial protein's subcellular localization (SCL) is important for genome annotation, function prediction and drug or vaccine target identification. Subcellular fractionation techniques combined with recent proteomics technology permit the identification of large numbers of proteins from distinct bacterial compartments. However, the fractionation of a complex structure like the cell into several subcellular compartments is not a trivial task. Contamination from other compartments may occur, and some proteins may reside in multiple localizations. New computational methods have been reported over the past few years that now permit much more accurate, genome-wide analysis of the SCL of protein sequences deduced from genomes. There is a need to compare such computational methods with laboratory proteomics approaches to identify the most effective current approach for genome-wide localization characterization and annotation.
In this study, ten subcellular proteome analyses of bacterial compartments were reviewed. PSORTb version 2.0 was used to computationally predict the localization of proteins reported in these publications, and these computational predictions were then compared to the localizations determined by the proteomics study. By using a combined approach, we were able to identify a number of contaminants and proteins with dual localizations, and were able to more accurately identify membrane subproteomes. Our results allowed us to estimate the precision level of laboratory subproteome studies and we show here that, on average, recent high-precision computational methods such as PSORTb now have a lower error rate than laboratory methods.
We have performed the first focused comparison of genome-wide proteomic and computational methods for subcellular localization identification, and show that computational methods have now attained a level of precision that exceeds that of high-throughput laboratory approaches. We note that analysis of all cellular fractions collectively is required to effectively provide localization information from laboratory studies, and we propose an overall approach to genome-wide subcellular localization characterization that capitalizes on the complementary nature of current laboratory and computational methods.
PMCID: PMC1314894  PMID: 16288665
22.  Construction and characterization of a genomic BAC library for the Mus m. musculus mouse subspecies (PWD/Ph inbred strain) 
BMC Genomics  2005;6:161.
The genome of classical laboratory strains of mice is an artificial mosaic of genomes originating from several mouse subspecies, with predominant representation (>90%) of the Mus m. domesticus component. Mice of another subspecies, East European/Asian Mus m. musculus, can interbreed with the classical laboratory strains to generate hybrids with unprecedented phenotypic and genotypic variations. To study these variations in depth we prepared the first genomic large insert BAC library from an inbred strain derived purely from the Mus m. musculus subspecies. The library will be used to seek and characterize genomic sequences controlling specific monogenic and polygenic complex traits, including modifiers of dominant and recessive mutations.
A representative mouse genomic BAC library was derived from a female mouse of the PWD/Ph inbred strain of the Mus m. musculus subspecies. The library consists of 144 768 primary clones, of which 97% contain an insert of 120 kb average size. The library represents the equivalent of 6.7 mouse haploid genomes, as estimated from the total number of clones carrying genomic DNA inserts and from the average insert size. The clones were arrayed in duplicate onto eight high-density membranes that were screened with seven single-copy gene probes. The individual probes identified four to eleven positive clones, corresponding to 6.9-fold coverage of the mouse genome. Eighty-seven BAC-ends of PWD/Ph clones were sequenced, edited, and aligned with the mouse C57BL/6J (B6) genome. Seventy-three BAC-ends displayed unique hits on the B6 genome and their alignment revealed 0.92 single nucleotide polymorphisms (SNPs) per 100 bp. Insertions and deletions represented 0.3% of the BAC end sequences.
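The genome-equivalent estimate above can be reproduced from the clone count, the fraction of clones carrying inserts, and the average insert size. The sketch below does so under one explicit assumption: the abstract does not state the haploid genome size used, so a value of 2.5 Gb is assumed here purely for illustration. All names are hypothetical.

```python
# Figures quoted in the abstract above.
primary_clones = 144_768
insert_fraction = 0.97        # fraction of clones carrying an insert
mean_insert_kb = 120.0        # average insert size, in kb

# ASSUMPTION: haploid mouse genome size; not stated in the abstract.
assumed_genome_kb = 2.5e6     # 2.5 Gb expressed in kb


def genome_equivalents(clones: int, fraction: float,
                       insert_kb: float, genome_kb: float) -> float:
    """Return library coverage as a multiple of the haploid genome."""
    if genome_kb <= 0:
        raise ValueError("genome_kb must be positive")
    total_insert_kb = clones * fraction * insert_kb
    return total_insert_kb / genome_kb


coverage = genome_equivalents(primary_clones, insert_fraction,
                              mean_insert_kb, assumed_genome_kb)
print(f"{coverage:.1f} genome equivalents")
```

With the assumed genome size the result rounds to the 6.7 genome equivalents reported in the abstract; a different genome size would shift the estimate proportionally.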
Analysis of the novel genomic library for the PWD/Ph inbred strain demonstrated coverage of almost seven mouse genome equivalents and a capability to recover clones for specific regions of the PWD/Ph genome. The single nucleotide polymorphism frequency between the strains PWD/Ph and C57BL/6J was 0.92/100 bp, a value significantly higher than between classical laboratory strains. The library will serve as a resource for dissecting the phenotypic and genotypic variations between mice of the Mus m. musculus subspecies and classical laboratory mouse strains.
PMCID: PMC1299325  PMID: 16288658
23.  Leveraging human genomic information to identify nonhuman primate sequences for expression array development 
BMC Genomics  2005;6:160.
Nonhuman primates (NHPs) are essential for biomedical research due to their similarities to humans. The utility of NHPs will be greatly increased by the application of genomics-based approaches such as gene expression profiling. Sequence information from the 3' end of genes is the key resource needed to create oligonucleotide expression arrays.
We have developed the algorithms and procedures necessary to quickly acquire sequence information from the 3' end of nonhuman primate orthologs of human genes. To accomplish this, we identified terminal exons of over 15,000 human genes by aligning mRNA sequences with genomic sequence. We found the mean length of complete last exons to be approximately 1,400 bp, significantly longer than previous estimates. We designed primers to amplify genomic DNA fragments that included at least 300 bp of the terminal exon. We cloned and sequenced the PCR products representing over 5,500 Macaca mulatta (rhesus monkey) orthologs of human genes. This sequence information has been used to select probes for rhesus gene expression profiling. We have also tested 10 sets of primers with genomic DNA from Macaca fascicularis (Cynomolgus monkey), Papio hamadryas (Baboon), and Chlorocebus aethiops (African green monkey, vervet). The results indicate that the primers developed for this study will be useful for acquiring sequence from the 3' end of genes for other nonhuman primate species.
This study demonstrates that human genomic DNA sequence can be leveraged to obtain sequence from the 3' end of NHP orthologs and that this sequence can then be used to generate NHP oligonucleotide microarrays. Affymetrix and Agilent used sequences obtained with this approach in the design of their rhesus macaque oligonucleotide microarrays.
PMCID: PMC1314899  PMID: 16288651
24.  Preferential attachment in the evolution of metabolic networks 
BMC Genomics  2005;6:159.
Many biological networks show some characteristics of scale-free networks. Scale-free networks can evolve through preferential attachment, where new nodes are preferentially attached to well-connected nodes. In networks that have evolved through preferential attachment, older nodes should have a higher average connectivity than younger nodes. Here we have investigated preferential attachment in the context of metabolic networks.
The connectivities of the enzymes in the metabolic network of Escherichia coli were determined and representatives for these enzymes were located in 11 eukaryotes, 17 archaea and 46 bacteria. E. coli enzymes which have representatives in eukaryotes have a higher average connectivity while enzymes which are represented only in the prokaryotes, and especially the enzymes only present in βγ-proteobacteria, have lower connectivities than expected by chance. Interestingly, the enzymes which have been proposed as candidates for horizontal gene transfer have a higher average connectivity than the other enzymes. Furthermore, it was found that new edges are added to the highly connected enzymes at a faster rate than to enzymes with low connectivities, which is consistent with preferential attachment.
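The expectation that older nodes end up better connected can be illustrated with a generic growth simulation. The sketch below is not the authors' analysis of E. coli enzymes; it is a minimal Barabási-Albert style model in which each new node attaches to existing nodes with probability proportional to their current degree. The network size, number of edges per node, and seed are arbitrary illustrative choices.

```python
import random


def grow_network(n_nodes: int, edges_per_node: int = 2, seed: int = 0) -> list:
    """Grow a network by preferential attachment and return node degrees.

    Nodes are indexed by age: node 0 is the oldest, n_nodes - 1 the youngest.
    """
    rng = random.Random(seed)
    core = edges_per_node + 1
    if n_nodes < core:
        raise ValueError("n_nodes must be at least edges_per_node + 1")
    degree = [0] * n_nodes
    # Each node appears in `endpoints` once per edge end, so a uniform draw
    # from this list selects a node with probability proportional to degree.
    endpoints = []
    # Start from a small fully connected core.
    for i in range(core):
        for j in range(i + 1, core):
            degree[i] += 1
            degree[j] += 1
            endpoints += [i, j]
    for new in range(core, n_nodes):
        targets = set()
        while len(targets) < edges_per_node:
            targets.add(rng.choice(endpoints))
        for target in sorted(targets):
            degree[new] += 1
            degree[target] += 1
            endpoints += [new, target]
    return degree


degrees = grow_network(2000)
tenth = len(degrees) // 10
oldest_mean = sum(degrees[:tenth]) / tenth
youngest_mean = sum(degrees[-tenth:]) / tenth
print(f"oldest 10%: {oldest_mean:.1f}, youngest 10%: {youngest_mean:.1f}")
```

In this toy model the oldest tenth of the nodes has a clearly higher mean degree than the youngest tenth, which is the qualitative pattern the study tests for by using phylogenetic distribution as a proxy for enzyme age.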
Here, we have found indications of preferential attachment in the metabolic network of E. coli. A possible biological explanation for preferential attachment growth of metabolic networks is that novel enzymes created through gene duplication maintain some of the compounds involved in the original reaction throughout their future evolution. In addition, we found that enzymes which are candidates for horizontal gene transfer have a higher average connectivity than other enzymes. This indicates that while new enzymes are attached preferentially to highly connected enzymes, these highly connected enzymes have sometimes been introduced into the E. coli genome by horizontal gene transfer. We speculate that E. coli has adjusted its metabolic network to a changing environment by replacing the relatively central enzymes with better adapted orthologs from other prokaryotic species.
PMCID: PMC1316878  PMID: 16281983
25.  Large-scale genetic variation of the symbiosis-required megaplasmid pSymA revealed by comparative genomic analysis of Sinorhizobium meliloti natural strains 
BMC Genomics  2005;6:158.
Sinorhizobium meliloti is a soil bacterium that forms nitrogen-fixing nodules on the roots of leguminous plants such as alfalfa (Medicago sativa). This species occupies different ecological niches, being present as a free-living soil bacterium and as a symbiont of plant root nodules. The genome of the type strain Rm 1021 contains one chromosome and two megaplasmids for a total genome size of 6 Mb. We applied comparative genomic hybridisation (CGH) on oligonucleotide microarrays to estimate genetic variation at the genomic level in four natural strains, two isolated from Italian agricultural soil and two from desert soil in the Aral Sea region.
From 4.6 to 5.7 percent of the genes showed a pattern of hybridisation concordant with deletion, nucleotide divergence or ORF duplication when compared to the type strain Rm 1021. A large number of these polymorphisms were confirmed by sequencing and Southern blot. A statistically significant fraction of these variable genes was found on the pSymA megaplasmid and grouped in clusters. These variable genes were found to be mainly transposases or genes with unknown function.
The results indicate that the symbiosis-required megaplasmid pSymA can be considered the major hot-spot for intra-specific differentiation in S. meliloti.
PMCID: PMC1298293  PMID: 16283928
