To obtain an overview of the phylogenetic and habitat distribution of NHases, we created hidden Markov models (HMMs) for each of the two subunits, based on 42 α and 48 β subunit sequences, and screened 12,126,382 proteins (or protein fragments) from UniRef and from seven metagenomic data sets originating from diverse environments. In total, this homology search identified 324 α subunit sequences (including 14 from thiocyanate hydratases, SCNases) and 265 β subunit sequences (including 4 from SCNases). The α subunit HMM appears to be more sensitive when applied to fragmented sequences: the ratio of α to β sequences is not 1:1 as expected (for fully sequenced genomes, this ratio is obtained; see Table S1). Nevertheless, the HMMs identify both subunits in most of the UniRef species that harbor NHases and also in some of the metagenomic scaffolds.
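The per-subunit counting behind the α:β ratio can be illustrated with a small Python sketch. It assumes that the HMM search was run with HMMER's hmmsearch and its --tblout per-sequence table; the model names NHase_alpha and NHase_beta and the E-value cutoff are hypothetical placeholders, not values from this study.

```python
from collections import Counter


def count_hits(tblout_text: str, evalue_cutoff: float = 1e-5) -> Counter:
    """Count distinct target sequences per query HMM in HMMER --tblout text.

    Columns are whitespace-separated: target name (0), target accession (1),
    query name (2), query accession (3), full-sequence E-value (4), ...
    The default cutoff is a hypothetical placeholder.
    """
    seen = set()
    counts: Counter = Counter()
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split()
        target, query, evalue = fields[0], fields[2], float(fields[4])
        if evalue <= evalue_cutoff and (target, query) not in seen:
            seen.add((target, query))
            counts[query] += 1
    return counts


def alpha_beta_ratio(counts: Counter, alpha: str = "NHase_alpha",
                     beta: str = "NHase_beta") -> float:
    """Ratio of alpha to beta hits; about 1.0 is expected for complete genomes."""
    return counts[alpha] / counts[beta]


# Totals reported in the text: 324 alpha and 265 beta hits.
print(round(324 / 265, 2))  # -> 1.22
```

Applied to the totals reported above, the ratio is about 1.22 rather than 1, i.e. the excess of α hits that the text tentatively attributes to the higher sensitivity of the α model on fragmented sequences.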
To confirm that the identified sequences are NHases, to study the taxonomic distribution of the source organisms and possibly to define new subgroups, we constructed maximum-likelihood trees for both subunits. These trees confirmed that the detected sequences are NHases and show clustering by taxonomy. They indicate that all sequences, including the metagenomic ones, appear to originate from bacterial species, with a large fraction of proteobacterial NHases found in the Global Ocean Sampling Expedition data set (Table S1 and Figure S1). There is one notable and surprising exception to this observation: as deposited in the UniRef database, both subunits are contained in a single hypothetical open reading frame (UniProt identifier A9V2C1) of the recently sequenced choanoflagellate Monosiga brevicollis.
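Spotting a fused protein such as A9V2C1 amounts to asking which single protein is hit by both subunit models, and in which order. The Python sketch below assumes HMM hits have already been reduced to hypothetical (protein_id, subunit, start, end) tuples in protein coordinates; the tuple layout and the example IDs and coordinates are illustrative only, not data from this study.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical hit layout: (protein_id, subunit 'alpha' or 'beta', start, end)
Hit = Tuple[str, str, int, int]


def fused_proteins(hits: List[Hit]) -> Dict[str, str]:
    """Return {protein_id: order} for proteins hit by both subunit models.

    order is 'beta-alpha' if the beta hit starts N-terminal of the alpha
    hit, otherwise 'alpha-beta'.
    """
    starts: Dict[str, Dict[str, int]] = defaultdict(dict)
    for protein, subunit, start, _end in hits:
        # keep the most N-terminal start position per subunit
        prev = starts[protein].get(subunit)
        starts[protein][subunit] = start if prev is None else min(prev, start)
    result = {}
    for protein, s in starts.items():
        if "alpha" in s and "beta" in s:
            result[protein] = "beta-alpha" if s["beta"] < s["alpha"] else "alpha-beta"
    return result


hits = [
    ("fused_example", "beta", 5, 210),     # hypothetical coordinates
    ("fused_example", "alpha", 260, 480),
    ("bacterial_alpha", "alpha", 1, 200),  # single-subunit protein, ignored
]
print(fused_proteins(hits))  # -> {'fused_example': 'beta-alpha'}
```

Proteins carrying only one subunit are silently dropped, so the output lists fusion candidates only; whether such a hit reflects a real fusion still has to be checked against the gene model, as done for Monosiga in the following paragraphs.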
Maximum-likelihood tree of the NHase α subunit sequences.
The unicellular Monosiga brevicollis is one of more than 125 known choanoflagellates, which are the closest known relatives of metazoans (i.e., they are more closely related to animals than plants and fungi are). Choanoflagellates can form simple multicellular colonies and occur in marine, brackish and freshwater habitats, where they use their apical flagellum to capture bacterial prey.
As Monosiga would be the first eukaryote known to harbor an NHase, we analyzed the corresponding gene and the encoded protein in detail.
The putative NHase is 496 amino acids long and contains the two subunits, which are usually encoded separately, fused into a single protein and connected by a histidine-rich stretch. Both subunits appear complete, and the putative metal-ion-binding active site in the α subunit (single-letter code CXXCSC), which is necessary for NHase function, appears conserved. The order of the two subunits in the Monosiga brevicollis coding region differs from the operon structure of most bacteria: the β subunit is located at the 5′ end and the α subunit at the 3′ end, whereas in bacteria the subunits are usually arranged in the order α-β (5′ to 3′). The phylogenetic analysis shows that the protein clusters with NHases of proteobacterial origin, and a BLAST-based analysis clearly identifies proteobacterial sequences as the most similar homologs (Methods S1 and Methods S2).
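Checking whether the active-site motif is conserved can be sketched as a simple pattern search: in the single-letter notation CXXCSC, each X stands for any residue, which corresponds to "." in a regular expression. This Python sketch is illustrative only; the toy sequence is hypothetical and is not the Monosiga protein.

```python
import re
from typing import List

# CXXCSC: X is any residue, so each X becomes '.' in the regular expression.
ACTIVE_SITE = re.compile(r"C..CSC")


def find_active_site(seq: str) -> List[int]:
    """Return 1-based start positions of non-overlapping CXXCSC matches."""
    return [m.start() + 1 for m in ACTIVE_SITE.finditer(seq.upper())]


toy = "MSEAVCTLCSCGWK"  # hypothetical toy sequence
print(find_active_site(toy))  # -> [6]
```

A plain pattern match only confirms that the motif is present; it says nothing about whether the surrounding residues fold into a functional site, which is why the text speaks of a putative active site.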
Scheme of the genomic region, the ESTs and the protein of the NHase in Monosiga brevicollis.
To rule out contamination and to check for likely functionality, we analyzed genomic features and EST (expressed sequence tag) data. Expression of the gene is strongly supported by the existence of two ESTs covering a large portion of the gene. Furthermore, one EST (accession number JGI_XYM3899.rev) suggests that the gene contains a 96-bp intron within the region encoding the active site. The GC content of the corresponding transcripts (59.4%) differs only slightly from the median GC content of all Monosiga transcripts (56.9%), which supports the assumption that this is a genuine Monosiga gene and not bacterial contamination of the genome sequence.
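The GC comparison used as a contamination check can be sketched in a few lines of Python. This is a minimal illustration, assuming GC content is computed over unambiguous A/C/G/T bases only and compared with the median of a background set of transcripts; the exact procedure used in the study is not specified here.

```python
from statistics import median
from typing import Iterable


def gc_content(seq: str) -> float:
    """Percentage of G and C among unambiguous A/C/G/T bases (ambiguous bases ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt if acgt else 0.0


def gc_deviation(candidate: str, background: Iterable[str]) -> float:
    """Candidate GC% minus the median GC% of the background transcripts."""
    return gc_content(candidate) - median(gc_content(s) for s in background)


# Values reported in the text: 59.4% (NHase transcripts) vs. 56.9% (median of all transcripts)
print(round(59.4 - 56.9, 1))  # -> 2.5
```

A deviation of about 2.5 percentage points is small relative to typical differences between genomes, which is the intuition behind treating it as evidence against contamination; a more rigorous check would compare the value to the spread of the background distribution rather than only its median.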
Putative amidases could be detected with HMMs in the Monosiga protein set (as in other eukaryotes), but their genes are located far from the NHase gene in the genome and show only low similarity to the NHase-associated amidases of bacteria. Although the identified amidases do not appear to have been transferred from a proteobacterial donor together with the NHase, an existing Monosiga amidase may have taken over this function; however, we cannot exclude that the NHase products are processed differently in this choanoflagellate.
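The statement that the amidase genes lie far from the NHase gene can be made concrete as a co-localization check on gene coordinates. The Python sketch below is hypothetical throughout: the Gene record, the scaffold names, the coordinates and the 10 kb window are illustrative assumptions, not values from the Monosiga genome or from this study.

```python
from typing import List, NamedTuple, Optional


class Gene(NamedTuple):
    name: str
    scaffold: str
    start: int  # 1-based, inclusive
    end: int


def distance(a: Gene, b: Gene) -> Optional[int]:
    """Gap in bp between two genes; None if on different scaffolds, 0 if overlapping."""
    if a.scaffold != b.scaffold:
        return None
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def nearby(target: Gene, others: List[Gene], window: int = 10_000) -> List[str]:
    """Names of genes within `window` bp of target (window is a hypothetical cutoff)."""
    return [g.name for g in others
            if (d := distance(target, g)) is not None and d <= window]


nhase = Gene("nhase", "scaffold_1", 50_000, 52_000)           # hypothetical
amidases = [Gene("amiA", "scaffold_1", 400_000, 401_500),     # hypothetical
            Gene("amiB", "scaffold_7", 1_000, 2_500)]         # hypothetical
print(nearby(nhase, amidases))  # -> []
```

An empty result corresponds to the situation described in the text, where no amidase gene sits next to the NHase as it typically would in a bacterial operon; the window size is the main design choice and would need to reflect typical operon or gene-cluster spans.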