We searched 687,835 eubacterial proteins in 162 complete and 13 incomplete proteomes, 60,382 archaebacterial proteins in 27 complete proteomes, and 231,229 eukaryotic proteins in 23 complete proteomes, totaling 979,446 screened proteins in 212 complete and 13 incomplete proteomes (Tables S1). Since we aimed at maximizing the sensitivity of detection, we used one of the most sensitive tools with a permissive cut-off. Our final fold predictions were evaluated and are supported by several considerations, including fold assignment program scores, secondary structure prediction agreement, atomic model evaluation by statistical potential (Text S1), and, for a selected protein, limited proteolysis (see below).
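The dataset totals quoted above can be cross-checked with a short tally. This is only an illustrative sketch: the dictionary layout and variable names are ours, and the numbers are copied from the text rather than recomputed from the underlying proteome files.

```python
# Per-domain counts as reported in the text:
# (screened proteins, complete proteomes, incomplete proteomes).
# The grouping into a dictionary is an illustrative assumption.
datasets = {
    "eubacteria": (687_835, 162, 13),
    "archaebacteria": (60_382, 27, 0),
    "eukaryotes": (231_229, 23, 0),
}

# Sum each column across the three domains.
total_proteins = sum(proteins for proteins, _, _ in datasets.values())
total_complete = sum(complete for _, complete, _ in datasets.values())
total_incomplete = sum(incomplete for _, _, incomplete in datasets.values())

print(total_proteins, total_complete, total_incomplete)
```

The sums agree with the totals stated in the text (979,446 proteins in 212 complete and 13 incomplete proteomes).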
Number of membrane coat-like proteins in the PVC superphylum.
G. obscuriglobus proteins have the MC architecture.
At least four MCs are expected to be found in each eukaryotic proteome, corresponding to clathrin, Sec31, the pair of homologues α- and β'-COP, and one nucleoporin. We found at least four MC proteins in most eukaryotes, with a few exceptions such as Plasmodium falciparum, where we found only two. This might reflect a failure to detect all MCs in this organism but is perhaps more likely due to its peculiar cell biology, given that in all other eukaryotes our method recovered at least one copy of all four groups of MCs.
Thus, proteins predicted to have the MC architecture were detected in all eukaryotes, as expected. However, they were also unexpectedly detected in the proteomes of several members of the bacterial Planctomycetes-Verrucomicrobia-Chlamydiae (PVC) superphylum (Tables S1). We found 11, 11, 8, and 5 genes coding for MC-like proteins in the proteomes of the Planctomycetes B. marina, P. maris, G. obscuriglobus, and R. baltica, respectively; 16, 14, and 9 in the Verrucomicrobiae V. spinosum, C. flavus, and P. parvula, respectively; and 9 in the Lentisphaerae L. araneosa. We did not find MC-like protein coding genes in the proteomes of the Planctomycetes Candidatus Kuenenia stuttgartiensis, of the Verrucomicrobiae A. muciniphila, M. infernorum, O. bacterium, and O. terrae, or of the Lentisphaerae V. vadensis. Notably, we found no MC-like proteins in the Chlamydiae. Most of the sequences identified are annotated as uncharacterized or predicted proteins. All PVC MC-like proteins are derived from a single common ancestor, since they detect each other after a few rounds of PSI-BLAST. Sequence-similarity based clustering of these sequences suggests that the most recent common ancestor of these organisms may have contained more than one such protein; all of the dendrograms obtained from these analyses contained several well-supported groups of sequences whose species composition is inconsistent with the presence of a single MC protein in the most recent common PVC ancestor (Figure S1).
Sequence searches using PVC MC-like proteins as queries do not detect any sequences other than the PVC MC-like proteins, and such searches starting from the eukaryotic MCs do not detect any bacterial proteins, as reported previously. These two facts demonstrate the necessity of using our structure-based search protocol. Despite the lack of significant sequence similarity between eukaryotic and prokaryotic MCs, similarity in predicted secondary structure content and architecture (i.e., domain composition and organization) links both sets of proteins at the structural level (Figures S2), without implying homology (see Discussion).
Secondary and tertiary structure of MC proteins.
The presence of proteins with the MC architecture in a bacterial phylum was unexpected. PVC is a monophyletic group whose members have dramatically different lifestyles and colonize a wide range of habitats. However, they also share several unexpected similarities that lend support to the monophyly of this supergroup. Unlike most other prokaryotes, members of the PVC superphylum have a compartmentalized cell plan. G. obscuriglobus, a member of the Planctomycetes phylum, is unique among prokaryotes in having cytoplasmic invaginations of the internal membrane that sometimes appear to surround the DNA with a double membrane envelope. Thus, we focused our analysis on G. obscuriglobus. To avoid artefacts related to sample fixation in conventional EM, we first investigated the membrane morphology in high-pressure frozen and freeze-substituted G. obscuriglobus cells. We observed that the internal membrane morphology of G. obscuriglobus is variable and changes considerably during growth on solid culture medium. The main phenotypic observation is an irregular volume of the paryphoplasm, the space between the inner and outer membranes. In large colonies after 2 wk of growth, the paryphoplasm can occupy up to 50% of the cell volume and frequently includes vesicle-like structures containing dark particles, most likely ribosomes. The vesicle content appears to differ in composition from the cytoplasm, since it is darker and denser in the electron micrographs, and the vesicle compartments are therefore presumably closed. The vesicles are unlikely to be artefactual, as they were observed with two different fixation/substitution methods, osmium tetroxide-acetone and uranyl acetate-acetone, and have previously been reported using freeze fracturing.
The Gemmata membrane morphology is variable.
To localize one of the identified proteins, we cloned, overexpressed, and purified one of the G. obscuriglobus MC-like proteins, gp4978, in Escherichia coli. Limited proteolysis supports the predicted MC architecture, as protease-accessible sites are positioned similarly to those in eukaryotic MC proteins. We then raised polyclonal antibodies against the gp4978 protein to investigate its localization in the cell. The antibodies recognized the tagged gp4978 protein in expressing E. coli cells but not in control extracts, indicating that they are specific for the protein (Figure S10). Western blot of G. obscuriglobus cell extracts indicated that the serum does not cross-react with other proteins, despite sequence identities ranging from 22% to 28% among the G. obscuriglobus MC-like proteins. Additionally, we characterized the specificity of the antibody using immuno-labeling. As only limited labeling was observed outside the cell and pre-immune serum did not label the G. obscuriglobus cells, we concluded that the antibody is specific for gp4978. Labeling was not observed on control E. coli.
We performed a quantitative immuno-localization analysis on high-pressure frozen and freeze-substituted G. obscuriglobus cells with affinity-purified anti-gp4978 antibodies and secondary protein A-gold labeling. We initially analyzed cells with marked cytoplasmic membrane invaginations, most of which have paryphoplasm of considerable volume. In such cells, >95% of the antibody-gold particles localized in the paryphoplasm (n = 507). In Gemmata cells, labeling was not observed with two control sera, raised against human Mel-28 and Aequorea victoria green fluorescent protein, respectively.
We then focused on cells with vesicles in the paryphoplasmic space. Most gp4978 either localized free in the paryphoplasm or in proximity to vesicle membranes. Fifty-nine percent of the gold particles were located in the paryphoplasm more than 10 nm from any membrane, and 28% were adjacent to the paryphoplasmic surface of a vesicle membrane. In addition, 5% were in contact with the outer membrane, 4% with the inner membrane, and 5% were located in the cytoplasm (n = 494 from four independent experiments). Thus, a significant fraction (>1/3) of the paryphoplasmic pool of gp4978 associates with intracytoplasmic membranes.
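The membrane-associated fraction can be estimated from the reported percentages. The text does not state exactly which categories make up the "paryphoplasmic pool" or the membrane-associated set, so the sketch below evaluates two possible groupings. Both groupings, the category labels, and the helper `membrane_fraction` are our assumptions, not the authors' stated procedure.

```python
# Reported percentages of gold particles by location (n = 494).
# They sum to 101, presumably because each value was rounded.
percent = {
    "paryphoplasm_free": 59,   # more than 10 nm from any membrane
    "vesicle_membrane": 28,    # adjacent to a vesicle membrane
    "outer_membrane": 5,
    "inner_membrane": 4,
    "cytoplasm": 5,
}


def membrane_fraction(membrane_keys, pool_keys):
    """Share of the chosen pool that falls in the membrane-associated categories."""
    pool = sum(percent[key] for key in pool_keys)
    bound = sum(percent[key] for key in membrane_keys)
    return bound / pool


# Assumed grouping 1: vesicle and inner membranes count as membrane-associated.
vesicle_and_inner = membrane_fraction(
    ["vesicle_membrane", "inner_membrane"],
    ["paryphoplasm_free", "vesicle_membrane", "inner_membrane"],
)

# Assumed grouping 2: the outer membrane is counted as well.
all_membranes = membrane_fraction(
    ["vesicle_membrane", "inner_membrane", "outer_membrane"],
    ["paryphoplasm_free", "vesicle_membrane", "inner_membrane", "outer_membrane"],
)

print(round(vesicle_and_inner, 3), round(all_membranes, 3))
```

Under both assumed groupings the fraction exceeds one third (32/91 and 37/96, respectively), consistent with the statement above. Counting vesicle membranes alone against the free paryphoplasmic particles would give 28/87, slightly below one third.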
Sub-population of gp4978 associates with membranes.
Eukaryotic MCs interact tightly with dynamic bent membranes. Thus, the membrane localization of the Gemmata MC-like protein is similar to that of eukaryotic MCs. We therefore investigated the possibility of lateral gene transfer between a eukaryote and the bacteria by comparing the GC content and codon usage of the corresponding genes, and did not detect evidence of lateral gene transfer involving the planctomycete MC-like proteins. The codon usage and GC content of the MC-like protein genes are not significantly different from those of other planctomycete genes, nor are they significantly similar to those of any genes from other proteomes, including eukaryotes (Tables S3