Despite a ubiquitous distribution [1] and a diversity that may parallel that of the Bacteria (for a recent review see [2]), the Archaea remain the least explored of life's domains. Whereas 21 different phyla are identified in the Bacteria (National Center for Biotechnology Information (NCBI) Taxonomy Database, as of October 2004 [3]), known cultivable archaeal species fall into only two distinct phyla - the Crenarchaeota and the Euryarchaeota [4] - on the basis of small subunit rRNA (SSU rRNA) (NCBI Taxonomy Database, as of October 2004 [3]). A number of non-cultivated species that do not group with either Crenarchaeota or Euryarchaeota have been tentatively assigned to a third phylum, the Korarchaeota [5]. However, this group, like those formed by other environmental 16S rRNA sequences, may be artefactual [2].
The Crenarchaeota/Euryarchaeota divide indicated by SSU rRNA phylogenies is strongly supported by comparative genomics, as a number of genes present in euryarchaeal genomes are missing altogether in crenarchaeal ones and vice versa. These differences are not trivial, as they concern key proteins involved in DNA replication and chromosome structure. For example, the Crenarchaeota lack both DNA polymerases of the D family and eukaryotic-like histones, which are present in the Euryarchaeota [6,7]. Similarly, replication protein RPA and cell-division protein FtsZ remain exclusive to the Euryarchaeota [8], while only the Crenarchaeota harbor the ribosomal protein S30 (COG4919). This suggests that members of these two archaeal subdomains may employ critically different molecular strategies for key cellular processes. The distinctiveness of the phyla Euryarchaeota and Crenarchaeota is further strengthened by phylogenetic analysis ([9,10] and this work) and is likely to remain unaffected even when additional cultivable species are described. Such a dramatic split is intriguing, as it may be more profound than that separating the different bacterial phyla, and it leaves open different scenarios for the origin of these important differences during early archaeal evolution.
Karl Stetter and his colleagues recently described a novel archaeal species - Nanoarchaeum equitans - representing the smallest known living cell [11]. This tiny hyperthermophile grows and divides at the surface of crenarchaeal Ignicoccus species and cannot be cultivated independently, indicating an obligate symbiotic, and possibly parasitic, lifestyle [12]. Sequencing of the N. equitans genome revealed the smallest cellular genome presently known (480 kb) and raised fascinating questions regarding the origin and evolution of this archaeon [13]. Indeed, in contrast to typical genomes from parasitic/symbiotic microbes [14-16], that of N. equitans does not show any evidence of decaying genes and contains a full complement of tightly packed genes encoding informational proteins [13]. This suggests that the establishment of the dependence relationship between N. equitans and Ignicoccus is probably very ancient. In a phylogeny of 14 archaeal taxa based on a concatenation of 35 ribosomal proteins and rooted by eukaryotic sequences, N. equitans emerged as the first archaeal lineage, that is, before the divergence of the two main archaeal phyla, the Euryarchaeota and the Crenarchaeota [13]. This is consistent with the early emergence of N. equitans in a phylogeny based on SSU rRNA [12], and with the proposal that N. equitans should be considered as the representative of a novel and very ancient archaeal phylum, the Nanoarchaeota [11].
Testing the phylogenetic position of N. equitans is thus crucial to deciphering the history of the archaeal domain. For instance, if the divergence of this lineage indeed preceded the divergence of Euryarchaeota and Crenarchaeota, features common to N. equitans and any other archaeal taxa could probably be considered as ancestral characters (provided that lateral gene transfers (LGTs) are excluded). In particular, the most parsimonious interpretation of the presence in the N. equitans genome of all those genes that are otherwise found only in the Euryarchaeota [13] is that the corresponding proteins were present in the last archaeal ancestor and were subsequently lost in the Crenarchaeota. However, the hypothesis of an early divergence of the Nanoarchaeota should be treated with caution. There are now several examples in which fast-evolving taxa are mistakenly assigned to early branches because of a long branch attraction (LBA) artifact due to their high evolutionary rates [17], especially when a distant outgroup is used [18-21]. Similarly, since adaptation to a symbiotic or parasitic lifestyle may have accelerated the evolutionary rate of N. equitans, its basal position in phylogenetic analyses using distant eukaryotic sequences as the outgroup [13] may be strongly affected by LBA.
We tested the position of N. equitans in the archaeal phylogeny using a dataset of concatenated ribosomal proteins larger than that used by Waters and colleagues [13] and a much broader taxonomic sampling, and without including any outgroup, in order to reduce LBA. By applying phylogenetic approaches that accurately handle reconstruction biases, we show that the early emergence of N. equitans observed in previous analyses probably resulted from an LBA artifact due to the fast evolutionary rate of this archaeon, possibly worsened by LGT affecting a fraction of its ribosomal proteins. Indeed, the phylogenies based on our new ribosomal protein dataset and on additional single genes suggest that N. equitans is more likely to be a very divergent euryarchaeon - possibly a sister lineage of Thermococcales - than a new and ancestral archaeal phylum. This is consistent with further evidence gathered from analyses of close BLAST hits across the whole genome complement of this taxon.
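
As an illustration of what a concatenated ribosomal protein dataset involves, the following is a minimal sketch in Python of joining per-protein alignments into a single supermatrix. It is not the pipeline used in this work. The function name `concatenate_alignments`, the taxon labels and the toy sequences are all hypothetical placeholders. The sketch assumes that each protein has already been aligned separately, and that a taxon lacking a given protein is padded with gap characters.

```python
from typing import Dict, List

# One alignment maps a taxon label to its aligned sequence.
Alignment = Dict[str, str]


def concatenate_alignments(alignments: List[Alignment], missing: str = "-") -> Alignment:
    """Join per-protein alignments, in order, into one supermatrix.

    Taxa absent from a given alignment are padded with `missing`
    so that every row of the result has the same length.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, missing * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy input: invented sequences, not real ribosomal protein data.
protein_1 = {"N_equitans": "MKV-", "euryarchaeon_X": "MKVA", "crenarchaeon_Y": "MRVA"}
protein_2 = {"N_equitans": "GQ", "euryarchaeon_X": "GK"}  # crenarchaeon_Y lacks this protein

supermatrix = concatenate_alignments([protein_1, protein_2])
for taxon, row in supermatrix.items():
    print(taxon, row)
```

In practice, proteins suspected of LGT would be removed from the input list before concatenation, which is one reason to keep the per-protein alignments separate until this final step.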