1.  Evolution of gene fusions: horizontal transfer versus independent events 
Genome Biology  2002;3(5):research0024.1-research0024.13.
Background
Gene fusions can be used as tools for functional prediction and also as evolutionary markers. Fused genes often show a scattered phyletic distribution, which suggests a role for processes other than vertical inheritance in their evolution.
Results
The evolutionary history of gene fusions was studied by phylogenetic analysis of the domains in the fused proteins and the orthologous domains that form stand-alone proteins. Clustering of fusion components from phylogenetically distant species was construed as evidence of dissemination of the fused genes by horizontal transfer. Of the 51 examined gene fusions that are represented in at least two of the three primary kingdoms (Bacteria, Archaea and Eukaryota), 31 were most probably disseminated by cross-kingdom horizontal gene transfer, whereas 14 appeared to have evolved independently in different kingdoms and two were probably inherited from the common ancestor of modern life forms. On many occasions, the evolutionary scenario also involves one or more secondary fissions of the fusion gene. For approximately half of the fusions, stand-alone forms of the fusion components are encoded by juxtaposed genes, which are known or predicted to belong to the same operon in some of the prokaryotic genomes. This indicates that evolution of gene fusions often, if not always, involves an intermediate stage, during which the future fusion components exist as juxtaposed and co-regulated, but still distinct, genes within operons.
Conclusion
These findings suggest a major role for horizontal transfer of gene fusions in the evolution of protein-domain architectures, but also indicate that independent fusions of the same pair of domains in distant species are not uncommon, which suggests positive selection for the multidomain architectures.
PMCID: PMC115226  PMID: 12049665
2.  Two C or not two C: recurrent disruption of Zn-ribbons, gene duplication, lineage-specific gene loss, and horizontal gene transfer in evolution of bacterial ribosomal proteins 
Genome Biology  2001;2(9):research0033.1-research0033.14.
Background
Ribosomal proteins are encoded in all genomes of cellular life forms and are, generally, well conserved during evolution. In prokaryotes, the genes for most ribosomal proteins are clustered in several highly conserved operons, which ensures efficient co-regulation of their expression. Duplications of ribosomal-protein genes are infrequent, and given their coordinated expression and functioning, it is generally assumed that ribosomal-protein genes are unlikely to undergo horizontal transfer. However, with the accumulation of numerous complete genome sequences of prokaryotes, several paralogous pairs of ribosomal protein genes have been identified. Here we analyze all such cases and attempt to reconstruct the evolutionary history of these ribosomal proteins.
Results
Complete bacterial genomes were searched for duplications of ribosomal proteins. Ribosomal proteins L36, L33, L31 and S14 are each duplicated in several bacterial genomes, and ribosomal proteins L11, L28, L7/L12, S1, S15 and S18 are so far duplicated in only one genome each. Sequence analysis of the four ribosomal proteins for which paralogs were detected in several genomes, of two of the ribosomal proteins duplicated in one genome (L28 and S18), and of ribosomal protein L32 showed that each of them comes in two distinct versions. One form contains a predicted metal-binding Zn-ribbon that consists of four conserved cysteines (in some cases replaced by histidines), whereas in the second form these metal-chelating residues are completely or partially replaced. Typically, genomes containing paralogous genes for these ribosomal proteins encode both versions, designated C+ and C-, respectively. Analysis of phylogenetic trees for these seven ribosomal proteins, combined with comparison of genomic contexts for the respective genes, indicates that in most, if not all, cases their evolution involved a duplication of the ancestral C+ form early in bacterial evolution, with subsequent alternative loss of the C+ and C- forms in different lineages. Additionally, evidence was obtained for a role of horizontal gene transfer in the evolution of these ribosomal proteins, with multiple cases of gene displacement 'in situ', that is, without a change of the gene order in the recipient genome.
Conclusions
A more complex picture of evolution of bacterial ribosomal proteins than previously suspected is emerging from these results, with major contributions of lineage-specific gene loss and horizontal gene transfer. The recurrent theme of emergence and disruption of Zn-ribbons in bacterial ribosomal proteins awaits a functional interpretation.
PMCID: PMC56895  PMID: 11574053
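The C+ versus C- distinction above rests on whether the four Zn-chelating cysteines (or histidines) are retained. A minimal sketch, assuming a Zn-ribbon can be approximated as two C/H-x2-C/H knuckles separated by a spacer of 8 to 25 residues (an illustrative range, not one given in the study), shows how sequences might be sorted into the two forms with a regular expression. The pattern and the function name `classify_zn_ribbon` are hypothetical.

```python
import re

# Hypothetical Zn-ribbon pattern: two "knuckles" of the form C/H-x-x-C/H
# separated by a spacer. The 8-25 residue spacer is an assumption for
# illustration, not a value taken from the study.
ZN_RIBBON = re.compile(r"[CH]..[CH].{8,25}[CH]..[CH]")


def classify_zn_ribbon(protein_seq: str) -> str:
    """Return 'C+' if a putative four-ligand Zn-ribbon motif is found, else 'C-'."""
    return "C+" if ZN_RIBBON.search(protein_seq.upper()) else "C-"
```

A regular expression only flags candidate motifs; the study relied on alignments and phylogenetic trees, and a real screen would also check that the ligands fall at the conserved alignment positions.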
5.  Phylogenomics of prokaryotic ribosomal proteins 
Genome Biology  2011;12(Suppl 1):P30.
doi:10.1186/gb-2011-12-s1-p30
PMCID: PMC3439054
6.  The origin and early evolution of eukaryotes in the light of phylogenomics 
Genome Biology  2010;11(5):209.
Comparative genomics and new phylogenies of eukaryote groups suggest a scenario in which the mitochondrial endosymbiosis triggered the origin of eukaryotes.
Phylogenomics of eukaryote supergroups suggests a highly complex last common ancestor of eukaryotes and a key role of mitochondrial endosymbiosis in the origin of eukaryotes.
doi:10.1186/gb-2010-11-5-209
PMCID: PMC2898073  PMID: 20441612
7.  A novel family of P-loop NTPases with an unusual phyletic distribution and transmembrane segments inserted within the NTPase domain 
Genome Biology  2004;5(5):R30.
This study characterizes the KAP protein family - a newly identified sub-group of the P-loop NTPases, which have transmembrane helices inserted into the P-loop NTPase domain. Their unusual phyletic distribution suggests KAP proteins were transferred from bacteria to animals by horizontal gene transfer.
Background
Recent sequence-structure studies on P-loop-fold NTPases have substantially advanced the existing understanding of their evolution and functional diversity. These studies provide a framework for characterization of novel lineages within this fold and prediction of their functional properties.
Results
Using sequence profile searches and homology-based structure prediction, we have identified a previously uncharacterized family of P-loop NTPases, which includes Kidins220/ARMS, a neuronal membrane protein and receptor tyrosine kinase substrate that is conserved in animals; the F-plasmid PifA protein involved in phage T7 exclusion; and several uncharacterized bacterial proteins. We refer to these (predicted) NTPases as the KAP family, after Kidins220/ARMS and PifA. The KAP family NTPases are sporadically distributed across a wide phylogenetic range in bacteria but among the eukaryotes are represented only in animals. Many of the prokaryotic KAP NTPases are encoded in plasmids and tend to undergo disruption to form pseudogenes. A unique feature of all eukaryotic and certain bacterial KAP NTPases is the presence of two or four transmembrane helices inserted into the P-loop NTPase domain. These transmembrane helices anchor KAP NTPases in the membrane such that the P-loop domain is located on the intracellular side. We show that the KAP family belongs to the same major division of the P-loop NTPase fold as the AAA+, ABC, RecA-like, VirD4-like, PilT-like, and AP/NACHT-like NTPase classes. In addition to the KAP family, we identified another small family of predicted bacterial NTPases, with two transmembrane helices inserted into the P-loop domain. This family is not specifically related to the KAP NTPases, suggesting independent acquisition of the transmembrane helices.
Conclusions
We predict that KAP family NTPases function principally in the NTP-dependent dynamics of protein complexes, especially those associated with the intracellular surface of cell membranes. Animal KAP NTPases, including Kidins220/ARMS, are likely to function as NTP-dependent regulators of the assembly of membrane-associated signaling complexes involved in neurite growth and development. One possible function of the prokaryotic KAP NTPases might be in the exclusion of selfish replicons, such as viruses, from the host cells. Phylogenetic analysis and phyletic patterns suggest that the common ancestor of the animals acquired a KAP NTPase via lateral transfer from bacteria. However, an earlier transfer into eukaryotes followed by multiple losses in several eukaryotic lineages cannot be ruled out.
PMCID: PMC416466  PMID: 15128444
8.  A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes 
Genome Biology  2004;5(2):R7.
We examined functional and evolutionary patterns in the recently constructed set of 5,873 clusters of predicted orthologs from seven eukaryotic genomes. The analysis reveals a conserved core of largely essential eukaryotic genes as well as major diversification and innovation associated with evolution of eukaryotic genomes.
Background
Sequencing the genomes of multiple, taxonomically diverse eukaryotes enables in-depth comparative-genomic analysis, which is expected to help in reconstructing ancestral eukaryotic genomes and major events in eukaryotic evolution and in making functional predictions for currently uncharacterized conserved genes.
Results
We examined functional and evolutionary patterns in the recently constructed set of 5,873 clusters of predicted orthologs (eukaryotic orthologous groups or KOGs) from seven eukaryotic genomes: Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens, Arabidopsis thaliana, Saccharomyces cerevisiae, Schizosaccharomyces pombe and Encephalitozoon cuniculi. Conservation of KOGs through the phyletic range of eukaryotes strongly correlates with their functions and with the effect of gene knockout on the organism's viability. The approximately 40% of KOGs that are represented in six or seven species are enriched in proteins responsible for housekeeping functions, particularly translation and RNA processing. These conserved KOGs are often essential for survival and might approximate the minimal set of essential eukaryotic genes. The 131 single-member, pan-eukaryotic KOGs we identified were examined in detail. For around 20 that remained uncharacterized, functions were predicted by in-depth sequence analysis and examination of genomic context. Nearly all these proteins are subunits of known or predicted multiprotein complexes, in agreement with the balance hypothesis of evolution of gene copy number. Other KOGs show a variety of phyletic patterns, which points to major contributions of lineage-specific gene loss and the 'invention' of genes new to eukaryotic evolution. Examination of the sets of KOGs lost in individual lineages reveals co-elimination of functionally connected genes. Parsimonious scenarios of eukaryotic genome evolution and gene sets for ancestral eukaryotic forms were reconstructed. The gene set of the last common ancestor of the crown group consists of 3,413 KOGs and largely includes proteins involved in genome replication and expression, and central metabolism. 
Only 44% of the KOGs, mostly from the reconstructed gene set of the last common ancestor of the crown group, have detectable homologs in prokaryotes; the remainder apparently evolved via duplication with divergence and invention of new genes.
Conclusions
The KOG analysis reveals a conserved core of largely essential eukaryotic genes as well as major diversification and innovation associated with evolution of eukaryotic genomes. The results provide quantitative support for major trends of eukaryotic evolution noticed previously at the qualitative level and a basis for detailed reconstruction of evolution of eukaryotic genomes and biology of ancestral forms.
PMCID: PMC395751  PMID: 14759257
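Reconstructing an ancestral gene set from phyletic patterns can be illustrated with a Dollo-style parsimony rule: if each KOG arose only once and was later lost in some lineages, then a KOG found in at least two of the clades that diverged at the root must have been present in their common ancestor. The sketch below assumes a hypothetical trichotomy of animals, plants and fungi (with E. cuniculi placed among fungi) and uses made-up species codes; the study's actual parsimony procedure may weight gains and losses differently.

```python
from typing import Dict, List, Set

# Illustrative assumption: the crown-group root is a trichotomy of
# animals, plants and fungi; species codes are hypothetical abbreviations.
ROOT_CLADES: List[Set[str]] = [
    {"Cel", "Dme", "Hsa"},  # animals
    {"Ath"},                # plants
    {"Sce", "Spo", "Ecu"},  # fungi (microsporidia placed here by assumption)
]


def in_root_ancestor(species: Set[str], clades: List[Set[str]] = ROOT_CLADES) -> bool:
    """Dollo-style rule: a gene gained once and never regained must have
    been present at the root if it occurs in at least two root clades."""
    return sum(1 for clade in clades if species & clade) >= 2


def ancestral_gene_set(phyletic: Dict[str, Set[str]]) -> Set[str]:
    """Return the IDs of orthologous groups assigned to the root ancestor."""
    return {kog for kog, species in phyletic.items() if in_root_ancestor(species)}
```

For example, `ancestral_gene_set({"KOG_A": {"Hsa", "Ath"}, "KOG_B": {"Sce", "Spo"}})` returns `{"KOG_A"}`, because KOG_B is confined to fungi. Horizontal transfer between clades would violate the single-gain assumption and inflate the reconstructed ancestral set.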
9.  Evolution of mosaic operons by horizontal gene transfer and gene displacement in situ 
Genome Biology  2003;4(9):R55.
Comparative genomics and phylogenetic analysis have been used to examine horizontal transfer of entire operons versus displacement of individual genes within operons by horizontally acquired orthologs and independent assembly of the same or similar operons from genes with different phylogenetic affinities.
Background
Shuffling and disruption of operons and horizontal gene transfer are major contributors to the new, dynamic view of prokaryotic evolution. Under the 'selfish operon' hypothesis, operons are viewed as mobile genetic entities that are constantly disseminated via horizontal gene transfer, although their retention could be favored by the advantage of coregulation of functionally linked genes. Here we apply comparative genomics and phylogenetic analysis to examine horizontal transfer of entire operons versus displacement of individual genes within operons by horizontally acquired orthologs and independent assembly of the same or similar operons from genes with different phylogenetic affinities.
Results
Since a substantial number of operons have been identified experimentally in only a few model bacteria, evolutionarily conserved gene strings were analyzed as surrogates of operons. The phylogenetic affinities within these predicted operons were assessed first by sequence similarity analysis and then by phylogenetic analysis, including statistical tests of tree topology. Numerous cases of apparent horizontal transfer of entire operons were detected. However, it was shown that apparent horizontal transfer of individual genes or arrays of genes within operons is not uncommon either and results in xenologous gene displacement in situ, that is, displacement of an ancestral gene by a horizontally transferred ortholog from a taxonomically distant organism without change of the local gene organization. On rarer occasions, operons might have evolved via independent assembly, in part from horizontally acquired genes.
Conclusions
The discovery of in situ gene displacement shows that the combination of rampant horizontal gene transfer with selection for preservation of operon structure provides for events in prokaryotic evolution that, a priori, seem improbable. These findings also emphasize that not all aspects of operon evolution are selfish, with operon integrity maintained by purifying selection at the organism level.
PMCID: PMC193655  PMID: 12952534
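Using evolutionarily conserved gene strings as surrogates for operons can be sketched by counting, across genomes, pairs of orthologous groups that sit next to each other on the same strand. In this minimal sketch each genome is assumed to be an ordered list of hypothetical (COG, strand) tuples; intergenic distances and the wrap-around of circular chromosomes are ignored for brevity, and the function names are illustrative.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# A gene is a hypothetical (COG id, strand) tuple, with strand "+" or "-".
Gene = Tuple[str, str]


def adjacent_pairs(genome: List[Gene]) -> Set[Tuple[str, str]]:
    """Unordered COG pairs that are neighbours on the same strand in one genome."""
    pairs: Set[Tuple[str, str]] = set()
    for (cog1, strand1), (cog2, strand2) in zip(genome, genome[1:]):
        if strand1 == strand2 and cog1 != cog2:
            first, second = sorted((cog1, cog2))  # order-independent key
            pairs.add((first, second))
    return pairs


def conserved_pairs(genomes: Dict[str, List[Gene]], min_genomes: int = 2) -> Set[Tuple[str, str]]:
    """COG pairs that are adjacent and co-directional in at least min_genomes genomes."""
    counts: Counter = Counter()
    for genome in genomes.values():
        counts.update(adjacent_pairs(genome))  # each genome counts a pair at most once
    return {pair for pair, n in counts.items() if n >= min_genomes}
```

Raising `min_genomes` trades sensitivity for confidence that a conserved adjacency reflects selection rather than chance.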
10.  Getting positive about selection 
Genome Biology  2003;4(8):331.
A report on the 68th Symposium on Quantitative Biology, 'The Genome of Homo sapiens', Cold Spring Harbor, USA, 28 May-2 June 2003.
PMCID: PMC193638  PMID: 12914654
11.  Comparative genomics of archaea: how much have we learned in six years, and what's next? 
Genome Biology  2003;4(8):115.
Archaea comprise one of the three distinct domains of life (with bacteria and eukaryotes). With 16 complete archaeal genomes sequenced to date, comparative genomics has revealed a conserved core of 313 genes that are represented in all sequenced archaeal genomes, plus a variable 'shell' that is prone to lineage-specific gene loss and horizontal gene exchange. The majority of archaeal genes have not been experimentally characterized, but novel functional pathways have been predicted.
PMCID: PMC193635  PMID: 12914651
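The split into a conserved core and a variable shell amounts to set operations over the gene families present in each genome. The sketch below assumes each genome has already been reduced to a set of orthologous-group identifiers; the function name and the genome and family labels in the test data are hypothetical.

```python
from functools import reduce
from typing import Dict, Set, Tuple


def core_and_shell(genomes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split pooled gene families into a core (present in every genome)
    and a shell (present in at least one, but not all, genomes)."""
    families = list(genomes.values())
    union: Set[str] = set().union(*families)
    core: Set[str] = set(reduce(set.intersection, families)) if families else set()
    return core, union - core
```

In this framing, each newly sequenced genome can only shrink the core or enlarge the shell, which is one reason a core count is always stated relative to a particular set of genomes.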
12.  The rhomboids: a nearly ubiquitous family of intramembrane serine proteases that probably evolved by multiple ancient horizontal gene transfers 
Genome Biology  2003;4(3):R19.
The near-universal presence of the rhomboid family in bacteria, archaea and eukaryotes might suggest that it was inherited from the last universal common ancestor; however, phylogenetic tree analysis indicates a likely bacterial origin with subsequent dissemination by horizontal gene transfer.
Background
The rhomboid family of polytopic membrane proteins shows a level of evolutionary conservation unique among membrane proteins. They are present in nearly all the sequenced genomes of archaea, bacteria and eukaryotes, with the exception of several species with small genomes. On the basis of experimental studies with the developmental regulator rhomboid from Drosophila and the AarA protein from the bacterium Providencia stuartii, the rhomboids are thought to be intramembrane serine proteases whose signaling function is conserved in eukaryotes and prokaryotes.
Results
Phylogenetic tree analysis carried out using several independent methods for tree construction and the corresponding statistical tests suggests that, despite its broad distribution in all three superkingdoms, the rhomboid family was not present in the last universal common ancestor of extant life forms. Instead, we propose that rhomboids evolved in bacteria and have been acquired by archaea and eukaryotes through several independent horizontal gene transfers. In eukaryotes, two distinct, ancient acquisitions apparently gave rise to the two major subfamilies, typified by rhomboid and PARL (presenilins-associated rhomboid-like protein), respectively. Subsequent evolution of the rhomboid family in eukaryotes proceeded by multiple duplications and functional diversification through the addition of extra transmembrane helices and other domains in different orientations relative to the conserved core that harbors the protease activity.
Conclusions
Although the near-universal presence of the rhomboid family in bacteria, archaea and eukaryotes appears to suggest that this protein is part of the heritage of the last universal common ancestor, phylogenetic tree analysis indicates a likely bacterial origin with subsequent dissemination by horizontal gene transfer. This emphasizes the importance of explicit phylogenetic analysis for the reconstruction of ancestral life forms. A hypothetical scenario for the origin of intracellular membrane proteases from membrane transporters is proposed.
doi:10.1186/gb-2003-4-3-r19
PMCID: PMC153459  PMID: 12620104
13.  Extensive domain shuffling in transcription regulators of DNA viruses and implications for the origin of fungal APSES transcription factors 
Genome Biology  2002;3(3):research0012.1-research0012.11.
Background
Viral DNA-binding proteins have served as good models to study the biochemistry of transcription regulation and chromatin dynamics. Computational analysis of viral DNA-binding regulatory proteins and identification of their previously undetected homologs encoded by cellular genomes might lead to a better understanding of their function and evolution in both viral and cellular systems.
Results
The phyletic range and the conserved DNA-binding domains of the viral regulatory proteins of the poxvirus D6R/N1R and baculoviral Bro protein families have not been previously defined. Using computational analysis, we show that the amino-terminal module of the D6R/N1R proteins defines a novel, conserved DNA-binding domain (the KilA-N domain) that is found in a wide range of proteins of large bacterial and eukaryotic DNA viruses. The KilA-N domain is suggested to be homologous to the fungal DNA-binding APSES domain. We provide evidence for the KilA-N and APSES domains sharing a common fold with the nucleic acid-binding modules of the LAGLIDADG nucleases and the amino-terminal domains of the tRNA endonuclease. The amino-terminal module of the Bro proteins is another, distinct DNA-binding domain (the Bro-N domain) that is present in proteins whose domain architectures parallel those of the KilA-N domain-containing proteins. A detailed analysis of the KilA-N and Bro-N domains and the associated domains points to extensive domain shuffling and lineage-specific gene family expansion within DNA virus genomes.
Conclusions
We define a large class of novel viral DNA-binding proteins and their cellular homologs and identify their domain architectures. On the basis of phyletic pattern analysis we present evidence for a probable viral origin of the fungus-specific cell-cycle regulatory transcription factors containing the APSES DNA-binding domain. We also demonstrate the extensive role of lineage-specific gene expansion and domain shuffling, within a limited set of approximately 24 domains, in the generation of the diversity of virus-specific regulatory proteins.
PMCID: PMC88810  PMID: 11897024
14.  Selection in the evolution of gene duplications 
Genome Biology  2002;3(2):research0008.1-research0008.9.
Background
Gene duplications have a major role in the evolution of new biological functions. Theoretical studies often assume that a duplication per se is selectively neutral and that, following a duplication, one of the gene copies is freed from purifying (stabilizing) selection, which creates the potential for evolution of a new function.
Results
In search of systematic evidence of accelerated evolution after duplication, we used data from 26 bacterial, six archaeal, and seven eukaryotic genomes to compare the mode and strength of selection acting on recently duplicated genes (paralogs) and on similarly diverged, unduplicated orthologous genes in different species. We find that the ratio of nonsynonymous to synonymous substitutions (Kn/Ks) in most paralogous pairs is <<1 and that paralogs typically evolve at similar rates, without significant asymmetry, indicating that both paralogs produced by a duplication are subject to purifying selection. This selection is, however, substantially weaker than the purifying selection affecting unduplicated orthologs that have diverged to the same extent as the analyzed paralogs. Most of the recently duplicated genes appear to be involved in various forms of environmental response; in particular, many of them encode membrane and secreted proteins.
Conclusions
The results of this analysis indicate that recently duplicated paralogs evolve faster than orthologs with the same level of divergence and similar functions, but apparently do not experience a phase of neutral evolution. We hypothesize that gene duplications that persist in an evolving lineage are beneficial from the time of their origin, due primarily to a protein dosage effect in response to variable environmental conditions; duplications are likely to give rise to new functions at a later phase of their evolution once a higher level of divergence is reached.
PMCID: PMC65685  PMID: 11864370
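The Kn/Ks ratio compares the rate of amino-acid-changing substitutions with that of silent ones; values well below 1 indicate purifying selection. Assuming the numbers of nonsynonymous and synonymous sites and differences have already been counted for a pair of aligned coding sequences (for example by the Nei-Gojobori method, which is not reproduced here), the final step can be sketched with a Jukes-Cantor correction. The abstract does not state which estimator the study used, so this is an illustration rather than the authors' procedure, and the function names are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def kn_ks(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """Kn/Ks from pre-counted sites and differences (Nei-Gojobori style)."""
    kn = jukes_cantor(nonsyn_diffs / nonsyn_sites)  # nonsynonymous distance
    ks = jukes_cantor(syn_diffs / syn_sites)        # synonymous distance
    if ks == 0.0:
        raise ValueError("Ks is zero; the ratio is undefined")
    return kn / ks
```

For instance, 3 nonsynonymous differences over 300 sites against 20 synonymous differences over 100 sites gives a ratio far below 1, the pattern the study reports for most paralogous pairs.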
15.  Constant relative rate of protein evolution and detection of functional diversification among bacterial, archaeal and eukaryotic proteins 
Genome Biology  2001;2(12):research0053.1-research0053.9.
Background
Detection of changes in a protein's evolutionary rate may reveal cases of change in that protein's function. We developed and implemented a simple relative rates test in an attempt to assess the rate constancy of protein evolution and to detect cases of functional diversification between orthologous proteins. The test was performed on clusters of orthologous protein sequences from complete bacterial genomes (Chlamydia trachomatis, C. muridarum and Chlamydophila pneumoniae), complete archaeal genomes (Pyrococcus horikoshii, P. abyssi and P. furiosus) and partially sequenced mammalian genomes (human, mouse and rat).
Results
Amino-acid sequence evolution rates are significantly correlated on different branches of phylogenetic trees representing the great majority of analyzed orthologous protein sets from all three domains of life. However, approximately 1% of the proteins from each group of species deviate from this pattern and instead show variation that is consistent with an acceleration of the rate of amino-acid substitution, which may be due to functional diversification. Most of the putative functionally diversified proteins from all three species groups are predicted to function at the periphery of the cells and mediate their interaction with the environment.
Conclusions
Relative rates of protein evolution are remarkably constant for the three species groups analyzed here. Deviations from this rate constancy are probably due to changes in selective constraints associated with diversification between orthologs. Functional diversification between orthologs is thought to be a relatively rare event. However, the resolution afforded by the test designed specifically for genomic-scale datasets allowed us to identify numerous cases of possible functional diversification between orthologous proteins.
PMCID: PMC64838  PMID: 11790256
16.  Quod erat demonstrandum? The mystery of experimental validation of apparently erroneous computational analyses of protein sequences 
Genome Biology  2001;2(12):research0051.1-research0051.11.
Background
Computational predictions are critical for directing the experimental study of protein functions. Therefore it is paradoxical when an apparently erroneous computational prediction seems to be supported by experiment.
Results
We analyzed six cases where application of novel or conventional computational methods for protein sequence and structure analysis led to non-trivial predictions that were subsequently supported by direct experiments. We show that, on all six occasions, the original prediction was unjustified, and in at least three cases, an alternative, well-supported computational prediction, incompatible with the original one, could be derived. The most unusual cases involved the identification of an archaeal cysteinyl-tRNA synthetase, a dihydropteroate synthase and a thymidylate synthase, for which experimental verifications of apparently erroneous computational predictions were reported. Using sequence-profile analysis, multiple alignment and secondary-structure prediction, we have identified the unique archaeal 'cysteinyl-tRNA synthetase' as a homolog of extracellular polygalactosaminidases, and the 'dihydropteroate synthase' as a member of the β-lactamase-like superfamily of metal-dependent hydrolases.
Conclusions
In each of the analyzed cases, the original computational predictions could be refuted and, in some instances, alternative strongly supported predictions were obtained. The nature of the experimental evidence that appears to support these predictions remains an open question. Some of these experiments might signify discovery of extremely unusual forms of the respective enzymes, whereas the results of others could be due to artifacts.
PMCID: PMC64836  PMID: 11790254
17.  An apology for orthologs - or brave new memes 
Genome Biology  2001;2(4):comment1005.1-comment1005.2.
PMCID: PMC138920  PMID: 11305932
18.  The DNA-repair protein AlkB, EGL-9, and leprecan define new families of 2-oxoglutarate- and iron-dependent dioxygenases 
Genome Biology  2001;2(3):research0007.1-research0007.8.
Background:
Protein fold recognition using sequence profile searches frequently allows prediction of the structure and biochemical mechanisms of proteins with an important biological function but unknown biochemical activity. Here we describe such predictions resulting from an analysis of the 2-oxoglutarate (2OG) and Fe(II)-dependent oxygenases, a class of enzymes that are widespread in eukaryotes and bacteria and catalyze a variety of reactions typically involving the oxidation of an organic substrate using a dioxygen molecule.
Results:
We employ sequence profile analysis to show that the DNA repair protein AlkB, the extracellular matrix protein leprecan, the disease-resistance-related protein EGL-9 and several uncharacterized proteins define novel families of enzymes of the 2OG-Fe(II) oxygenase superfamily. The identification of AlkB as a member of the 2OG-Fe(II) oxygenase superfamily suggests that this protein catalyzes oxidative detoxification of alkylated bases. More distant homologs of AlkB were detected in eukaryotes and in plant RNA viruses, leading to the hypothesis that these proteins might be involved in RNA demethylation. The EGL-9 protein from Caenorhabditis elegans is necessary for normal muscle function and its inactivation results in resistance against paralysis induced by the Pseudomonas aeruginosa toxin. EGL-9 and leprecan are predicted to be novel protein hydroxylases that might be involved in the generation of substrates for protein glycosylation.
Conclusions:
Here, using sequence profile searches, we show that several previously undetected protein families contain the 2OG-Fe(II) oxygenase fold. This allows us to predict the catalytic activity for a wide range of biologically important, but biochemically uncharacterized proteins from eukaryotes and bacteria.
PMCID: PMC30706  PMID: 11276424
19.  Interkingdom gene fusions 
Genome Biology  2000;1(6):research0013.1-13.13.
Background:
Genome comparisons have revealed major lateral gene transfer between the three primary kingdoms of life - Bacteria, Archaea, and Eukarya. Another important evolutionary phenomenon involves the evolutionary mobility of protein domains that form versatile multidomain architectures. We were interested in investigating the possibility of a combination of these phenomena, with an invading gene merging with a pre-existing gene in the recipient genome.
Results:
Complete genomes of fifteen bacteria, four archaea and one eukaryote were searched for interkingdom gene fusions (IKFs); that is, genes coding for proteins that apparently consist of domains originating from different primary kingdoms. Phylogenetic analysis supported 37 cases of IKF, each of which includes a 'native' domain and a horizontally acquired 'alien' domain. IKFs could have evolved via lateral transfer of a gene coding for the alien domain (or a larger protein containing this domain) followed by recombination with a native gene. For several IKFs, this scenario is supported by the presence of a gene coding for a second, stand-alone version of the alien domain in the recipient genome. Among the genomes investigated, the greatest number of IKFs has been detected in Mycobacterium tuberculosis, where they are almost always accompanied by a stand-alone alien domain. For most of the IKF cases detected in other genomes, the stand-alone counterpart is missing.
Conclusions:
The results of comparative genome analysis show that IKF formation is a real, but relatively rare, evolutionary phenomenon. We hypothesize that IKFs are formed primarily via the proposed two-stage mechanism, but other than in the Actinomycetes, in which IKF generation seems to be an active, ongoing process, most of the stand-alone intermediates have been eliminated, perhaps because of functional redundancy.
PMCID: PMC16144  PMID: 11178267
20.  Towards understanding the first genome sequence of a crenarchaeon by genome annotation using clusters of orthologous groups of proteins (COGs) 
Genome Biology  2000;1(5):research0009.1-research0009.19.
Background:
Standard archival sequence databases have not been designed as tools for genome annotation and are far from being optimal for this purpose. We used the database of Clusters of Orthologous Groups of proteins (COGs) to reannotate the genomes of two archaea, Aeropyrum pernix, the first member of the Crenarchaea to be sequenced, and Pyrococcus abyssi.
Results:
A. pernix and P. abyssi proteins were assigned to COGs using the COGNITOR program; the results were verified on a case-by-case basis and augmented by additional database searches using the PSI-BLAST and TBLASTN programs. Functions were predicted for over 300 A. pernix proteins that could not be assigned a function by conventional methods with a conservative sequence-similarity threshold, an increase of approximately 50% over the original annotation. A. pernix shares most of the conserved core of proteins that were previously identified in the Euryarchaeota. Cluster analysis or distance matrix tree construction based on the co-occurrence of genomes in COGs showed that A. pernix forms a distinct group within the archaea, although a grouping with the two Pyrococcus species, indicative of similar repertoires of conserved genes, was observed. No indication of a specific relationship between Crenarchaeota and eukaryotes was obtained in these analyses. Several proteins that are conserved in Euryarchaeota and most bacteria are unexpectedly missing in A. pernix, including the entire set of de novo purine biosynthesis enzymes, the GTPase FtsZ (a key component of the bacterial and euryarchaeal cell-division machinery), and the tRNA-specific pseudouridine synthase, previously considered universal. A. pernix is represented in 48 COGs that do not contain any euryarchaeal members. Many of these proteins are TCA cycle and electron transport chain enzymes, reflecting the aerobic lifestyle of A. pernix.
Conclusions:
Special-purpose databases that are organized on the basis of phylogenetic analysis and carefully curated for known and predicted protein functions significantly improve genome annotation. A differential genome display approach supports systematic investigation of the common and distinct features of gene repertoires. In some cases it reveals unexpected connections, which may indicate functional similarities between phylogenetically distant organisms or lateral gene exchange between them.
PMCID: PMC15027  PMID: 11178258
21.  The alpha/beta fold uracil DNA glycosylases: a common origin with diverse fates 
Genome Biology  2000;1(4):research0007.1-research0007.8.
Background:
Uracil DNA glycosylases (UDGs) are major repair enzymes that protect DNA from mutational damage caused by uracil, which is incorporated as a result of a polymerase error or arises from deamination of cytosine. Four distinct families of UDGs have been identified. They show very limited sequence similarity to each other, although two of them have been shown to possess the same structural fold. The structural and evolutionary relationships among the remaining UDG families remain uncertain.
Results:
Using sequence profile searches, multiple alignment analysis and protein structure comparisons, we show here that all known UDGs possess the same fold and must have evolved from a common ancestor. All UDGs catalyze essentially the same reaction. Even so, significant changes in the configuration of the catalytic residues were detected within their common fold, and these changes probably produce differences in the biochemistry of these enzymes. The extreme sequence divergence of the UDGs is unusual for enzymes with the same principal activity. It is probably explained by the predominant role of uracil flipping over the catalytic action of individual polar residues, with the flipping driven by the conformational strain that the enzyme imposes on uracil-containing DNA. We predict two previously undetected families of UDGs and outline a hypothetical scenario for their evolution.
Conclusions:
UDGs form a single protein superfamily with a distinct structural fold and a common evolutionary origin. However, the different families differ in catalytic mechanism, and these differences, combined with the way the catalytic pocket is constructed, have resulted in extreme sequence divergence of these enzymes.
PMCID: PMC15025  PMID: 11178247
22.  Encapsulated in silica: genome, proteome and physiology of the thermophilic bacterium Anoxybacillus flavithermus WK1 
Genome Biology  2008;9(11):R161.
Sequencing of the complete genome of Anoxybacillus flavithermus reveals enzymes that are required for silica adaptation and biofilm formation.
Background
Gram-positive bacteria of the genus Anoxybacillus have been found in diverse thermophilic habitats, such as geothermal hot springs and manure, and in processed foods such as gelatin and milk powder. Anoxybacillus flavithermus is a facultatively anaerobic bacterium found in super-saturated silica solutions and in opaline silica sinter. Because A. flavithermus can grow in super-saturated silica solutions, it is an ideal subject for studying the processes of sinter formation. These processes might be similar to the biomineralization processes that occurred at the dawn of life.
Results
We report here the complete genome sequence of A. flavithermus strain WK1, isolated from the waste water drain at the Wairakei geothermal power station in New Zealand. It consists of a single chromosome of 2,846,746 base pairs and is predicted to encode 2,863 proteins. In silico genome analysis identified several enzymes that could be involved in silica adaptation and biofilm formation, and their predicted functions were experimentally validated in vitro. Proteomic analysis confirmed the regulation of biofilm-related proteins and of enzymes that are crucial for the synthesis of long-chain polyamines, which are constituents of silica nanospheres.
Conclusions
Microbial fossils preserved in silica and silica sinters are excellent objects for studying ancient life, which is a new paleobiological frontier. An integrated analysis of the A. flavithermus genome and proteome provides the first glimpse of metabolic adaptation during silicification and sinter formation. Comparative genome analysis suggests extensive gene loss in the Anoxybacillus/Geobacillus branch after its divergence from other bacilli.
doi:10.1186/gb-2008-9-11-r161
PMCID: PMC2614493  PMID: 19014707
23.  A genomic analysis of the archaeal system Ignicoccus hospitalis-Nanoarchaeum equitans 
Genome Biology  2008;9(11):R158.
Sequencing of the complete genome of Ignicoccus hospitalis gives insight into its association with another species of Archaea, Nanoarchaeum equitans.
Background
The relationship between the hyperthermophiles Ignicoccus hospitalis and Nanoarchaeum equitans is the only known example of a specific association between two species of Archaea. Little is known about the mechanisms that enable this relationship.
Results
We sequenced the complete genome of I. hospitalis and found it to be the smallest among independent, free-living organisms. A comparative genomic reconstruction suggests that the I. hospitalis lineage has lost most of the genes associated with the heterotrophic metabolism that is characteristic of most Crenarchaeota. A low frequency of paralogs and the fragmentation of many operons also point to a streamlined genome. However, this streamlining appears to be partially balanced by lateral gene transfer from archaeal and bacterial sources.
Conclusions
A combination of genomic and cellular features suggests highly efficient adaptation to the low energy yield of sulfur-hydrogen respiration and efficient inorganic carbon and nitrogen assimilation. Evidence of lateral gene exchange between N. equitans and I. hospitalis indicates that the relationship has impacted both genomes. This association is the simplest symbiotic system known to date and a unique model for studying mechanisms of interspecific relationships at the genomic and metabolic levels.
doi:10.1186/gb-2008-9-11-r158
PMCID: PMC2614490  PMID: 19000309